1 ..
2 Licensed under the Apache License, Version 2.0 (the "License"); you may
3 not use this file except in compliance with the License. You may obtain
4 a copy of the License at
5
6 http://www.apache.org/licenses/LICENSE-2.0
7
8 Unless required by applicable law or agreed to in writing, software
9 distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
10 WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
11 License for the specific language governing permissions and limitations
12 under the License.
13
14 Convention for heading levels in Open vSwitch documentation:
15
16 ======= Heading 0 (reserved for the title in a document)
17 ------- Heading 1
18 ~~~~~~~ Heading 2
19 +++++++ Heading 3
20 ''''''' Heading 4
21
22 Avoid deeper levels because they do not render well.
23
24 =====================
25 DPDK vHost User Ports
26 =====================
27
The DPDK datapath provides DPDK-backed vHost user ports as a primary way to
interact with guests. For more information on vHost User, refer to the `QEMU
documentation`_ on the topic.
31
32 Quick Example
33 -------------
34
35 This example demonstrates how to add two ``dpdkvhostuserclient`` ports to an
36 existing bridge called ``br0``::
37
38 $ ovs-vsctl add-port br0 dpdkvhostclient0 \
39 -- set Interface dpdkvhostclient0 type=dpdkvhostuserclient \
40 options:vhost-server-path=/tmp/dpdkvhostclient0
41 $ ovs-vsctl add-port br0 dpdkvhostclient1 \
42 -- set Interface dpdkvhostclient1 type=dpdkvhostuserclient \
43 options:vhost-server-path=/tmp/dpdkvhostclient1
44
45 For the above examples to work, an appropriate server socket must be created
46 at the paths specified (``/tmp/dpdkvhostclient0`` and
47 ``/tmp/dpdkvhostclient1``). These sockets can be created with QEMU; see the
48 :ref:`vhost-user client <dpdk-vhost-user-client>` section for details.
49
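To verify that the ports were created with the expected type and socket path,
you can inspect the Interface records; for example::

    $ ovs-vsctl get Interface dpdkvhostclient0 type
    $ ovs-vsctl get Interface dpdkvhostclient0 options:vhost-server-path
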
50 vhost-user vs. vhost-user-client
51 --------------------------------
52
53 Open vSwitch provides two types of vHost User ports:
54
55 - vhost-user (``dpdkvhostuser``)
56
57 - vhost-user-client (``dpdkvhostuserclient``)
58
59 vHost User uses a client-server model. The server creates/manages/destroys the
60 vHost User sockets, and the client connects to the server. Depending on which
61 port type you use, ``dpdkvhostuser`` or ``dpdkvhostuserclient``, a different
62 configuration of the client-server model is used.
63
64 For vhost-user ports, Open vSwitch acts as the server and QEMU the client. This
65 means if OVS dies, all VMs **must** be restarted. On the other hand, for
66 vhost-user-client ports, OVS acts as the client and QEMU the server. This means
67 OVS can die and be restarted without issue, and it is also possible to restart
68 an instance itself. For this reason, vhost-user-client ports are the preferred
type for all known use cases; the only limitation is that vhost-user-client
ports require QEMU version 2.7 or later. Ports of type vhost-user are currently
deprecated and will be removed in a future release.
72
73 .. _dpdk-vhost-user:
74
75 vhost-user
76 ----------
77
78 .. important::
79
80 Use of vhost-user ports requires QEMU >= 2.2; vhost-user ports are
81 *deprecated*.
82
83 To use vhost-user ports, you must first add said ports to the switch. DPDK
84 vhost-user ports can have arbitrary names with the exception of forward and
85 backward slashes, which are prohibited. For vhost-user, the port type is
86 ``dpdkvhostuser``::
87
88 $ ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 \
89 type=dpdkvhostuser
90
91 This action creates a socket located at
92 ``/usr/local/var/run/openvswitch/vhost-user-1``, which you must provide to your
93 VM on the QEMU command line.
94
95 .. note::
96
97 If you wish for the vhost-user sockets to be created in a sub-directory of
98 ``/usr/local/var/run/openvswitch``, you may specify this directory in the
99 ovsdb like so::
100
101 $ ovs-vsctl --no-wait \
102 set Open_vSwitch . other_config:vhost-sock-dir=subdir
103
104 Once the vhost-user ports have been added to the switch, they must be added to
105 the guest. There are two ways to do this: using QEMU directly, or using
106 libvirt.
107
108 .. note::
109 IOMMU is not supported with vhost-user ports.
110
111 Adding vhost-user ports to the guest (QEMU)
112 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
113
To begin, you must attach the vhost-user device sockets to the guest. To do
this, pass the following parameters to QEMU::
116
117 -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
118 -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
119 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
120
121 where ``vhost-user-1`` is the name of the vhost-user port added to the switch.
122
Repeat the above parameters for multiple devices, changing the chardev ``path``
and ``id`` as necessary. Note that a separate and different chardev ``path``
needs to be specified for each vhost-user device. For example, if you have a
second vhost-user port named ``vhost-user-2``, append an additional set of
parameters to your QEMU command line::
128
129 -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
130 -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
131 -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
132
In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
access a virtio-net device's virtual rings and packet buffers by mapping the
VM's physical memory, which must be backed by hugetlbfs. To enable vhost-user
ports to map the VM's memory into their process address space, pass the
following parameters to QEMU::
138
139 -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
140 -numa node,memdev=mem -mem-prealloc
141
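For the parameters above to work, hugepages must already be allocated and
mounted on the host; the 4096M example requires at least 2048 2 MB pages. A
minimal sketch of one way to set this up (the page count and mount point are
illustrative) is::

    $ sysctl vm.nr_hugepages=2048
    $ mkdir -p /dev/hugepages
    $ mount -t hugetlbfs hugetlbfs /dev/hugepages  # only if not already mounted
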
142 Finally, you may wish to enable multiqueue support. This is optional but,
143 should you wish to enable it, run::
144
145 -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
146 -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
147 -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
148
149 where:
150
151 ``$q``
152 The number of queues
153 ``$v``
154 The number of vectors, which is ``$q`` * 2 + 2
155
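For example, a device with four queues would use ``queues=4`` and
``vectors=10`` (4 * 2 + 2)::

    -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
    -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=4
    -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=10
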
The vhost-user interface will be automatically reconfigured with the required
number of rx and tx queues after the virtio device connects. Manual
configuration of ``n_rxq`` is not supported because OVS will work properly only
if ``n_rxq`` matches the number of queues configured in QEMU.
160
At least 2 PMDs should be configured for the vswitch when using multiqueue.
Using a single PMD will cause traffic to be enqueued to the same vhost queue
rather than being distributed among different vhost queues for a vhost-user
interface.
165
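The number of PMD threads is controlled by the ``pmd-cpu-mask`` setting in
``other_config``. For example, a mask selecting two cores (the core IDs implied
here are illustrative and should match your system) is::

    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6
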
If traffic destined for a VM configured with multiqueue arrives at the vswitch
167 via a physical DPDK port, then the number of rxqs should also be set to at
168 least 2 for that physical DPDK port. This is required to increase the
169 probability that a different PMD will handle the multiqueue transmission to the
170 guest using a different vhost queue.
171
172 If one wishes to use multiple queues for an interface in the guest, the driver
173 in the guest operating system must be configured to do so. It is recommended
174 that the number of queues configured be equal to ``$q``.
175
176 For example, this can be done for the Linux kernel virtio-net driver with::
177
178 $ ethtool -L <DEV> combined <$q>
179
180 where:
181
``-L``
  Changes the number of channels of the specified network device
``combined``
  Changes the number of multi-purpose channels.
186
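The channel configuration currently in effect can be checked from the guest
with the corresponding query option::

    $ ethtool -l <DEV>
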
187 Adding vhost-user ports to the guest (libvirt)
188 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
189
190 To begin, you must change the user and group that qemu runs under, and restart
191 libvirtd.
192
193 - In ``/etc/libvirt/qemu.conf`` add/edit the following lines::
194
195 user = "root"
196 group = "root"
197
- Finally, restart the libvirtd process. For example, on Fedora::
199
200 $ systemctl restart libvirtd.service
201
202 Once complete, instantiate the VM. A sample XML configuration file is provided
203 at the :ref:`end of this file <dpdk-vhost-user-xml>`. Save this file, then
204 create a VM using this file::
205
206 $ virsh create demovm.xml
207
208 Once created, you can connect to the guest console::
209
210 $ virsh console demovm
211
The demovm XML configuration is aimed at achieving out-of-the-box performance
for the VM. These enhancements include:

- The vCPUs are pinned to the cores of CPU socket 0 using ``vcpupin``.

- The NUMA cell and shared memory access are configured using
  ``memAccess='shared'``.

- Mergeable rx buffers are disabled using ``mrg_rxbuf='off'``.
220
221 Refer to the `libvirt documentation <http://libvirt.org/formatdomain.html>`__
222 for more information.
223
224 .. _dpdk-vhost-user-client:
225
226 vhost-user-client
227 -----------------
228
229 .. important::
230
   Use of vhost-user-client ports requires QEMU >= 2.7
232
233 To use vhost-user-client ports, you must first add said ports to the switch.
234 Like DPDK vhost-user ports, DPDK vhost-user-client ports can have mostly
235 arbitrary names. However, the name given to the port does not govern the name
236 of the socket device. Instead, this must be configured by the user by way of a
237 ``vhost-server-path`` option. For vhost-user-client, the port type is
238 ``dpdkvhostuserclient``::
239
240 $ VHOST_USER_SOCKET_PATH=/path/to/socket
241 $ ovs-vsctl add-port br0 vhost-client-1 \
242 -- set Interface vhost-client-1 type=dpdkvhostuserclient \
243 options:vhost-server-path=$VHOST_USER_SOCKET_PATH
244
245 Once the vhost-user-client ports have been added to the switch, they must be
246 added to the guest. Like vhost-user ports, there are two ways to do this: using
247 QEMU directly, or using libvirt. Only the QEMU case is covered here.
248
249 Adding vhost-user-client ports to the guest (QEMU)
250 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
251
252 Attach the vhost-user device sockets to the guest. To do this, you must pass
253 the following parameters to QEMU::
254
255 -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
256 -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
257 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
258
where ``$VHOST_USER_SOCKET_PATH`` is the path to the socket configured for the
corresponding ``dpdkvhostuserclient`` port via its ``vhost-server-path`` option.
260
261 If the corresponding ``dpdkvhostuserclient`` port has not yet been configured
262 in OVS with ``vhost-server-path=/path/to/socket``, QEMU will print a log
263 similar to the following::
264
265 QEMU waiting for connection on: disconnected:unix:/path/to/socket,server
266
QEMU will wait until the port is created successfully in OVS before booting
the VM. One benefit of using this mode is the ability of vHost ports to
'reconnect' in the event of the switch crashing or being brought down. Once it
is brought back up, the vHost ports will reconnect automatically and normal
service will resume.
271
272 vhost-user-client IOMMU Support
273 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
274
275 vhost IOMMU is a feature which restricts the vhost memory that a virtio device
276 can access, and as such is useful in deployments in which security is a
277 concern.
278
IOMMU support may be enabled via a global config value,
``vhost-iommu-support``. Setting this to true enables vhost IOMMU support for
all vhost ports when/where available::
282
283 $ ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
284
285 The default value is false.
286
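If the value has been set explicitly, it can be read back with::

    $ ovs-vsctl --if-exists get Open_vSwitch . other_config:vhost-iommu-support
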
287 .. important::
288
289 Changing this value requires restarting the daemon.
290
291 .. important::
292
293 Enabling the IOMMU feature also enables the vhost user reply-ack protocol;
294 this is known to work on QEMU v2.10.0, but is buggy on older versions
295 (2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feature is disabled by
296 default (and should remain so if using the aforementioned versions of
297 QEMU). Starting with QEMU v2.9.1, vhost-iommu-support can safely be
298 enabled, even without having an IOMMU device, with no performance penalty.
299
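Note that ``vhost-iommu-support`` only covers the vhost side; for the guest to
make use of a vIOMMU, QEMU must also emulate one and expose it to the virtio
device. The options below are a sketch of one common way to do this and are not
taken from this document; the exact set may differ between QEMU versions::

    -machine q35,accel=kvm,kernel-irqchip=split
    -device intel-iommu,intremap=on,device-iotlb=on
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,iommu_platform=on,ats=on
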
300 .. _dpdk-testpmd:
301
302 DPDK in the Guest
303 -----------------
304
The DPDK ``testpmd`` application can be run in guest VMs for high speed packet
forwarding between vhostuser ports. DPDK and the testpmd application have to be
compiled on the guest VM. Below are the steps for setting up the testpmd
application in the VM.
309
310 .. note::
311
312 Support for DPDK in the guest requires QEMU >= 2.2
313
314 To begin, instantiate a guest as described in :ref:`dpdk-vhost-user` or
315 :ref:`dpdk-vhost-user-client`. Once started, connect to the VM, download the
DPDK sources to the VM and build DPDK::
317
318 $ cd /root/dpdk/
319 $ wget http://fast.dpdk.org/rel/dpdk-17.11.1.tar.xz
320 $ tar xf dpdk-17.11.1.tar.xz
321 $ export DPDK_DIR=/root/dpdk/dpdk-stable-17.11.1
322 $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
323 $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
324 $ cd $DPDK_DIR
325 $ make install T=$DPDK_TARGET DESTDIR=install
326
327 Build the test-pmd application::
328
329 $ cd app/test-pmd
330 $ export RTE_SDK=$DPDK_DIR
331 $ export RTE_TARGET=$DPDK_TARGET
332 $ make
333
334 Setup huge pages and DPDK devices using UIO::
335
336 $ sysctl vm.nr_hugepages=1024
337 $ mkdir -p /dev/hugepages
338 $ mount -t hugetlbfs hugetlbfs /dev/hugepages # only if not already mounted
339 $ modprobe uio
340 $ insmod $DPDK_BUILD/kmod/igb_uio.ko
341 $ $DPDK_DIR/usertools/dpdk-devbind.py --status
342 $ $DPDK_DIR/usertools/dpdk-devbind.py -b igb_uio 00:03.0 00:04.0
343
344 .. note::
345
   The PCI IDs of the vhost ports can be retrieved using::
347
348 lspci | grep Ethernet
349
350 Finally, start the application::
351
352 # TODO
353
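Until the example above is filled in, a rough sketch of a basic interactive
run is shown below; the binary location, core list, memory size and forwarding
options are assumptions that may need adjusting for your build and VM::

    $ $DPDK_BUILD/app/testpmd -l 0,1 --socket-mem 512 -- --burst=64 -i
    testpmd> set fwd mac
    testpmd> start
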
354 .. _dpdk-vhost-user-xml:
355
356 Sample XML
357 ----------
358
359 ::
360
361 <domain type='kvm'>
362 <name>demovm</name>
363 <uuid>4a9b3f53-fa2a-47f3-a757-dd87720d9d1d</uuid>
364 <memory unit='KiB'>4194304</memory>
365 <currentMemory unit='KiB'>4194304</currentMemory>
366 <memoryBacking>
367 <hugepages>
368 <page size='2' unit='M' nodeset='0'/>
369 </hugepages>
370 </memoryBacking>
371 <vcpu placement='static'>2</vcpu>
372 <cputune>
373 <shares>4096</shares>
374 <vcpupin vcpu='0' cpuset='4'/>
375 <vcpupin vcpu='1' cpuset='5'/>
376 <emulatorpin cpuset='4,5'/>
377 </cputune>
378 <os>
379 <type arch='x86_64' machine='pc'>hvm</type>
380 <boot dev='hd'/>
381 </os>
382 <features>
383 <acpi/>
384 <apic/>
385 </features>
386 <cpu mode='host-model'>
387 <model fallback='allow'/>
388 <topology sockets='2' cores='1' threads='1'/>
389 <numa>
390 <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
391 </numa>
392 </cpu>
393 <on_poweroff>destroy</on_poweroff>
394 <on_reboot>restart</on_reboot>
395 <on_crash>destroy</on_crash>
396 <devices>
397 <emulator>/usr/bin/qemu-system-x86_64</emulator>
398 <disk type='file' device='disk'>
399 <driver name='qemu' type='qcow2' cache='none'/>
400 <source file='/root/CentOS7_x86_64.qcow2'/>
401 <target dev='vda' bus='virtio'/>
402 </disk>
403 <interface type='vhostuser'>
404 <mac address='00:00:00:00:00:01'/>
405 <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser0' mode='client'/>
406 <model type='virtio'/>
407 <driver queues='2'>
408 <host mrg_rxbuf='on'/>
409 </driver>
410 </interface>
411 <interface type='vhostuser'>
412 <mac address='00:00:00:00:00:02'/>
413 <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser1' mode='client'/>
414 <model type='virtio'/>
415 <driver queues='2'>
416 <host mrg_rxbuf='on'/>
417 </driver>
418 </interface>
419 <serial type='pty'>
420 <target port='0'/>
421 </serial>
422 <console type='pty'>
423 <target type='serial' port='0'/>
424 </console>
425 </devices>
426 </domain>
427
428 .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
429
430 vhost-user Dequeue Zero Copy (experimental)
431 -------------------------------------------
432
433 Normally when dequeuing a packet from a vHost User device, a memcpy operation
434 must be used to copy that packet from guest address space to host address
435 space. This memcpy can be removed by enabling dequeue zero-copy like so::
436
437 $ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
438 dpdkvhostuserclient0 type=dpdkvhostuserclient \
439 options:vhost-server-path=/tmp/dpdkvhostclient0 \
440 options:dq-zero-copy=true
441
442 With this feature enabled, a reference (pointer) to the packet is passed to
443 the host, instead of a copy of the packet. Removing this memcpy can give a
444 performance improvement for some use cases, for example switching large packets
between different VMs. However, additional packet loss may be observed.
446
447 Note that the feature is disabled by default and must be explicitly enabled
448 by setting the ``dq-zero-copy`` option to ``true`` while specifying the
449 ``vhost-server-path`` option as above. If you wish to split out the command
450 into multiple commands as below, ensure ``dq-zero-copy`` is set before
451 ``vhost-server-path``::
452
453 $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
454 $ ovs-vsctl set Interface dpdkvhostuserclient0 \
455 options:vhost-server-path=/tmp/dpdkvhostclient0
456
457 The feature is only available to ``dpdkvhostuserclient`` port types.
458
459 A limitation exists whereby if packets from a vHost port with
460 ``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number of tx
461 descriptors (``n_txq_desc``) for that port must be reduced to a smaller number,
462 128 being the recommended value. This can be achieved by issuing the following
463 command::
464
465 $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
466
Note: The sum of the tx descriptors of all ``dpdk`` ports the VM will send to
should not exceed 128. For example, in the case of a bond over two physical
ports in balance-tcp mode, one must divide 128 by the number of links in the
bond.
470
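For example, for a bond over two physical ports, each port would be limited to
64 descriptors (the port names here are illustrative)::

    $ ovs-vsctl set Interface dpdkport0 options:n_txq_desc=64
    $ ovs-vsctl set Interface dpdkport1 options:n_txq_desc=64
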
471 Refer to :ref:`dpdk-queues-sizes` for more information.
472
The reason for this limitation is how the zero copy functionality is
implemented. The vHost device's 'tx used vring', a virtio structure used for
tracking used (i.e. sent) descriptors, will only be updated when the NIC frees
the corresponding mbuf. If we don't free the mbufs frequently enough, that
vring will be starved and packets will no longer be processed. One way to
ensure we don't encounter this scenario is to configure ``n_txq_desc`` to a
small enough number such that the 'mbuf free threshold' for the NIC will be hit
more often and thus free mbufs more frequently. The value of 128 is suggested,
but values of 64 and 256 have been tested and verified to work too, with
differing performance characteristics. A value of 512 can also be used, if the
virtio queue size in the guest is increased to 1024 (available to configure in
QEMU versions v2.10 and greater). This value can be set like so::
485
486 $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
487 -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
488 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
489 tx_queue_size=1024
490
491 Because of this limitation, this feature is considered 'experimental'.
492
The feature currently does not fully work with QEMU >= v2.7 due to a bug in
DPDK which will be addressed in an upcoming release. The patch to fix this
issue can be found on `Patchwork
<http://dpdk.org/dev/patchwork/patch/32198/>`__.
498
Further information can be found in the `DPDK documentation
<http://dpdk.readthedocs.io/en/v17.11/prog_guide/vhost_lib.html>`__.