Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | .. BSD LICENSE |
2 | Copyright(c) 2010-2016 Intel Corporation. All rights reserved. | |
3 | All rights reserved. | |
4 | ||
5 | Redistribution and use in source and binary forms, with or without | |
6 | modification, are permitted provided that the following conditions | |
7 | are met: | |
8 | ||
9 | * Redistributions of source code must retain the above copyright | |
10 | notice, this list of conditions and the following disclaimer. | |
11 | * Redistributions in binary form must reproduce the above copyright | |
12 | notice, this list of conditions and the following disclaimer in | |
13 | the documentation and/or other materials provided with the | |
14 | distribution. | |
15 | * Neither the name of Intel Corporation nor the names of its | |
16 | contributors may be used to endorse or promote products derived | |
17 | from this software without specific prior written permission. | |
18 | ||
19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS | |
20 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT | |
21 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR | |
22 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT | |
23 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, | |
24 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT | |
25 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, | |
26 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY | |
27 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | |
28 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
30 | ||
31 | Vhost Library | |
32 | ============= | |
33 | ||
34 | The vhost library implements a user space virtio net server allowing the user | |
35 | to manipulate the virtio ring directly. In other words, it allows the user | |
36 | to fetch/put packets from/to the VM virtio net device. To achieve this, a | |
37 | vhost library should be able to: | |
38 | ||
39 | * Access the guest memory: | |
40 | ||
41 | For QEMU, this is done by using the ``-object memory-backend-file,share=on,...`` |
42 | option, which makes QEMU create a file to serve as the guest RAM. | |
43 | The ``share=on`` option allows another process to map that file, which | |
44 | means it can access the guest RAM. | |
45 | ||
46 | * Know all the necessary information about the vring: | |
47 | ||
48 | This includes information such as where the available ring is stored. Vhost defines some | |
49 | messages (passed through a Unix domain socket file) to tell the backend all | |
50 | the information it needs to know how to manipulate the vring. | |
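The shared-file mechanism described above can be sketched in plain C. This is a minimal illustration, not DPDK or QEMU code: ``create_backing_file`` and ``map_shared`` are hypothetical helper names, and the temporary file merely stands in for the file QEMU creates for guest RAM. The point is that two ``MAP_SHARED`` mappings of the same file observe each other's writes, which is what lets a backend see guest memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create and size a backing file, standing in for the file QEMU creates
 * for guest RAM. path_template must end in "XXXXXX" (mkstemp contract).
 * Returns an open file descriptor, or -1 on error. */
static int create_backing_file(char *path_template, size_t len)
{
    assert(path_template != NULL);
    int fd = mkstemp(path_template);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        unlink(path_template);
        return -1;
    }
    return fd;
}

/* Map the file with MAP_SHARED so that writes are visible through every
 * other shared mapping of the same file. Returns NULL on error. */
static void *map_shared(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

In a real deployment the two mappings live in different processes (QEMU and the vhost backend); mapping the file twice in one process is only a convenient way to observe the same sharing behaviour.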
51 | ||
52 | ||
53 | Vhost API Overview | |
54 | ------------------ | |
55 | ||
56 | The following is an overview of the Vhost API functions: | |
57 | ||
58 | * ``rte_vhost_driver_register(path, flags)`` | |
59 | ||
60 | This function registers a vhost driver into the system. ``path`` specifies | |
61 | the Unix domain socket file path. | |
62 | ||
63 | Currently supported flags are: | |
64 | ||
65 | - ``RTE_VHOST_USER_CLIENT`` | |
66 | ||
67 | DPDK vhost-user will act as the client when this flag is given. See below | |
68 | for an explanation. | |
69 | ||
70 | - ``RTE_VHOST_USER_NO_RECONNECT`` | |
71 | ||
72 | When DPDK vhost-user acts as the client it will keep trying to reconnect | |
73 | to the server (QEMU) until it succeeds. This is useful in two cases: | |
74 | ||
75 | * When QEMU is not started yet. | |
76 | * When QEMU restarts (for example due to a guest OS reboot). | |
77 | ||
78 | This reconnect option is enabled by default. However, it can be turned off | |
79 | by setting this flag. | |
80 | ||
81 | - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` | |
82 | ||
83 | Dequeue zero copy will be enabled when this flag is set. It is disabled by | |
84 | default. | |
85 | ||
86 | There are some facts (including limitations) you should be aware of when | |
87 | setting this flag: | |
88 | ||
89 | * Zero copy is not beneficial for small packets (typically for packet sizes | |
90 | below 512 bytes). | |
91 | ||
92 | * Zero copy works well for the VM2VM case. For iperf between two VMs, the | |
93 | boost could be above 70% (when TSO is enabled). | |
94 | ||
95 | * For the VM2NIC case, ``nb_tx_desc`` has to be small enough: <= 64 if the | |
96 | virtio indirect feature is not enabled and <= 128 if it is enabled. | |
97 | ||
98 | This is because when dequeue zero copy is enabled, the guest Tx used vring is | |
99 | updated only when the corresponding mbuf is freed. Thus, ``nb_tx_desc`` | |
100 | has to be small enough that the PMD driver runs out of available Tx | |
101 | descriptors and frees mbufs in a timely manner. Otherwise, the guest Tx | |
102 | vring would be starved. | |
103 | ||
104 | * Guest memory should be backed by huge pages to achieve better | |
105 | performance. A 1G page size works best. | |
106 | ||
107 | When dequeue zero copy is enabled, a mapping from guest physical addresses | |
108 | to host physical addresses has to be established. Using non-huge pages means | |
109 | far more page segments. To keep it simple, DPDK vhost does a linear search | |
110 | of those segments, so the fewer the segments, the quicker the mapping is | |
111 | found. NOTE: this may be sped up with a tree search in the future. | |
112 | ||
113 | * ``rte_vhost_driver_session_start()`` | |
114 | ||
115 | This function starts the vhost session loop to handle vhost messages. It | |
116 | starts an infinite loop, therefore it should be called in a dedicated | |
117 | thread. | |
118 | ||
119 | * ``rte_vhost_driver_callback_register(virtio_net_device_ops)`` | |
120 | ||
121 | This function registers a set of callbacks, to let DPDK applications take | |
122 | the appropriate action when some events happen. The following events are | |
123 | currently supported: | |
124 | ||
125 | * ``new_device(int vid)`` | |
126 | ||
127 | This callback is invoked when a virtio net device becomes ready. ``vid`` | |
128 | is the virtio net device ID. | |
129 | ||
130 | * ``destroy_device(int vid)`` | |
131 | ||
132 | This callback is invoked when a virtio net device shuts down (or when the | |
133 | vhost connection is broken). | |
134 | ||
135 | * ``vring_state_changed(int vid, uint16_t queue_id, int enable)`` | |
136 | ||
137 | This callback is invoked when a specific queue's state is changed, for | |
138 | example to enabled or disabled. | |
139 | ||
140 | * ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)`` | |
141 | ||
142 | Transmits (enqueues) ``count`` packets from host to guest. | |
143 | ||
144 | * ``rte_vhost_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count)`` | |
145 | ||
146 | Receives (dequeues) ``count`` packets from the guest and stores them in ``pkts``. | |
147 | ||
148 | * ``rte_vhost_feature_disable/rte_vhost_feature_enable(feature_mask)`` | |
149 | ||
150 | These functions disable/enable some features. For example, they can be used to | |
151 | disable the mergeable buffers and TSO features, both of which are enabled by | |
152 | default. | |
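The linear guest-physical to host-physical lookup mentioned under ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` above can be illustrated with a small standalone sketch. The structure ``gpa_hpa_segment`` and the function ``gpa_to_hpa`` are hypothetical names chosen for this example, and returning 0 for "not found" is an assumption of the sketch; the actual DPDK implementation may differ. It shows why the cost grows with the number of segments, and hence why huge pages help.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One physically contiguous segment of guest memory (hypothetical layout). */
struct gpa_hpa_segment {
    uint64_t guest_phys_addr; /* segment start in guest physical space */
    uint64_t host_phys_addr;  /* same segment's start in host physical space */
    uint64_t size;            /* segment length in bytes */
};

/* Linear search: return the host physical address for gpa, or 0 if the
 * range [gpa, gpa + len) does not fit entirely inside a single segment. */
static uint64_t gpa_to_hpa(const struct gpa_hpa_segment *segs, size_t nr_segs,
                           uint64_t gpa, uint64_t len)
{
    assert(segs != NULL || nr_segs == 0);
    for (size_t i = 0; i < nr_segs; i++) {
        const struct gpa_hpa_segment *s = &segs[i];
        /* Written to avoid overflow in gpa + len. */
        if (gpa >= s->guest_phys_addr &&
            len <= s->size &&
            gpa - s->guest_phys_addr <= s->size - len)
            return s->host_phys_addr + (gpa - s->guest_phys_addr);
    }
    return 0;
}
```

Every lookup walks the segment array from the start, so a guest backed by a handful of 1G pages resolves in a few comparisons, while the same guest backed by small pages may need many more.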
153 | ||
154 | ||
155 | Vhost-user Implementations | |
156 | -------------------------- | |
157 | ||
158 | Vhost-user uses Unix domain sockets for passing messages. This means the DPDK | |
159 | vhost-user implementation has two options: | |
160 | ||
161 | * DPDK vhost-user acts as the server. | |
162 | ||
163 | DPDK will create a Unix domain socket server file and listen for | |
164 | connections from the frontend. | |
165 | ||
166 | Note that this is the default mode, and was the only mode before DPDK v16.07. | |
167 | ||
168 | ||
169 | * DPDK vhost-user acts as the client. | |
170 | ||
171 | Unlike the server mode, this mode doesn't create the socket file; | |
172 | it just tries to connect to the server (which is responsible for creating | |
173 | the file instead). | |
174 | ||
175 | When the DPDK vhost-user application restarts, DPDK vhost-user will try to | |
176 | connect to the server again. This is how the "reconnect" feature works. | |
177 | ||
178 | .. Note:: | |
179 | * The "reconnect" feature requires **QEMU v2.7** (or above). | |
180 | ||
181 | * The vhost supported features must be exactly the same before and | |
182 | after the restart. For example, if TSO is disabled and then enabled, | |
183 | nothing will work and undefined issues might happen. | |
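The client-side connect-and-retry behaviour can be sketched with ordinary Unix domain socket calls. This is an illustration only, not the DPDK implementation: ``connect_with_retry`` is a hypothetical helper, and unlike the behaviour described for DPDK (which keeps trying until it succeeds) the sketch bounds the number of attempts so that it always terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Try to connect to the Unix domain socket at path, retrying up to
 * max_attempts times with delay_ms between attempts. Returns a connected
 * descriptor, or -1 if every attempt failed. */
static int connect_with_retry(const char *path, int max_attempts, long delay_ms)
{
    struct sockaddr_un addr;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;
        close(fd);
        /* The server is not there (yet); wait before trying again. */
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
    return -1;
}
```

The same loop covers both cases listed for the reconnect option: the server has not started yet, or it went away and came back, since in either case ``connect()`` simply fails until the socket file is being listened on again.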
184 | ||
185 | No matter which mode is used, once a connection is established, DPDK | |
186 | vhost-user will start receiving and processing vhost messages from QEMU. | |
187 | ||
188 | For messages with a file descriptor, the file descriptor can be used directly | |
189 | in the vhost process as it is already installed by the Unix domain socket. | |
190 | ||
191 | The supported vhost messages are: | |
192 | ||
193 | * ``VHOST_SET_MEM_TABLE`` | |
194 | * ``VHOST_SET_VRING_KICK`` | |
195 | * ``VHOST_SET_VRING_CALL`` | |
196 | * ``VHOST_SET_LOG_FD`` | |
197 | * ``VHOST_SET_VRING_ERR`` | |
198 | ||
199 | For the ``VHOST_SET_MEM_TABLE`` message, QEMU sends information for each | |
200 | memory region and its file descriptor in the ancillary data of the message. | |
201 | The file descriptor is used to map that region. | |
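Passing a file descriptor in the ancillary data of a Unix domain socket message uses the standard ``SCM_RIGHTS`` mechanism. The sketch below shows that mechanism in isolation; ``send_fd`` and ``recv_fd`` are hypothetical helper names, the one-byte payload is a placeholder, and the real vhost-user message layout is not reproduced here.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send a one-byte payload with fd attached as SCM_RIGHTS ancillary data.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char payload = 'M';
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;

    assert(sock >= 0 && fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a message sent by send_fd(). Returns the descriptor the kernel
 * installed in this process, or -1 on error. */
static int recv_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The descriptor number returned by ``recv_fd`` generally differs from the one passed to ``send_fd``, but both refer to the same open file, so the receiver can use it right away, for example as the ``fd`` argument to ``mmap()`` when mapping a memory region.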
202 | ||
203 | ``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into | |
204 | the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove | |
205 | the vhost device from the data plane. | |
206 | ||
207 | When the socket connection is closed, vhost will destroy the device. | |
208 | ||
209 | Vhost supported vSwitch reference | |
210 | --------------------------------- | |
211 | ||
212 | For more vhost details and how to support vhost in vSwitch, please refer to | |
213 | the vhost example in the DPDK Sample Applications Guide. |