..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.
..
   Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

   Avoid deeper levels because they do not render well.

=====================
DPDK vHost User Ports
=====================
The DPDK datapath provides DPDK-backed vHost user ports as a primary way to
interact with guests. For more information on vHost User, refer to the `QEMU
documentation`_ on the topic.
.. important::

   To use any DPDK-backed interface, you must ensure your bridge is configured
   correctly. For more information, refer to :doc:`bridge`.
Quick Example
-------------

This example demonstrates how to add two ``dpdkvhostuserclient`` ports to an
existing bridge called ``br0``::
    $ ovs-vsctl add-port br0 dpdkvhostclient0 \
        -- set Interface dpdkvhostclient0 type=dpdkvhostuserclient \
           options:vhost-server-path=/tmp/dpdkvhostclient0
    $ ovs-vsctl add-port br0 dpdkvhostclient1 \
        -- set Interface dpdkvhostclient1 type=dpdkvhostuserclient \
           options:vhost-server-path=/tmp/dpdkvhostclient1
For the above examples to work, an appropriate server socket must be created
at the paths specified (``/tmp/dpdkvhostclient0`` and
``/tmp/dpdkvhostclient1``). These sockets can be created with QEMU; see the
:ref:`vhost-user client <dpdk-vhost-user-client>` section for details.
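The two ``add-port`` invocations above follow one pattern (port name, port
type, socket path), so they can be scripted. The sketch below, which only
*prints* the commands rather than running them, assumes the same ``br0``
bridge, a ``dpdkvhostclientN`` naming scheme, and a ``/tmp`` socket directory;
all three are illustrative choices, not requirements.

```shell
# Sketch: emit (not run) the ovs-vsctl commands for N vhost-user-client ports.
# Bridge name, port names, and socket directory are illustrative assumptions.
gen_vhost_client_cmds() {
    n=$1 dir=$2 i=0
    while [ "$i" -lt "$n" ]; do
        printf 'ovs-vsctl add-port br0 dpdkvhostclient%d -- set Interface dpdkvhostclient%d type=dpdkvhostuserclient options:vhost-server-path=%s/dpdkvhostclient%d\n' \
            "$i" "$i" "$dir" "$i"
        i=$((i + 1))
    done
}

gen_vhost_client_cmds 2 /tmp
```

Piping the output through ``sh`` (after review) would then create the ports.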
vhost-user vs. vhost-user-client
--------------------------------

Open vSwitch provides two types of vHost User ports:

- vhost-user (``dpdkvhostuser``)

- vhost-user-client (``dpdkvhostuserclient``)
vHost User uses a client-server model. The server creates/manages/destroys the
vHost User sockets, and the client connects to the server. Depending on which
port type you use, ``dpdkvhostuser`` or ``dpdkvhostuserclient``, a different
configuration of the client-server model is used.
For vhost-user ports, Open vSwitch acts as the server and QEMU the client. This
means if OVS dies, all VMs **must** be restarted. On the other hand, for
vhost-user-client ports, OVS acts as the client and QEMU the server. This means
OVS can die and be restarted without issue, and it is also possible to restart
an instance itself. For this reason, vhost-user-client ports are the preferred
type for all known use cases; the only limitation is that vhost-user-client
mode ports require QEMU version >= 2.7. Ports of type vhost-user are currently
deprecated and will be removed in a future release.
.. _dpdk-vhost-user:

vhost-user
----------

.. important::

   Use of vhost-user ports requires QEMU >= 2.2; vhost-user ports are
   *deprecated*.
To use vhost-user ports, you must first add said ports to the switch. DPDK
vhost-user ports can have arbitrary names with the exception of forward and
backward slashes, which are prohibited. For vhost-user, the port type is
``dpdkvhostuser``::

    $ ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 \
        type=dpdkvhostuser
This action creates a socket located at
``/usr/local/var/run/openvswitch/vhost-user-1``, which you must provide to your
VM on the QEMU command line.
If you wish for the vhost-user sockets to be created in a sub-directory of
``/usr/local/var/run/openvswitch``, you may specify this directory in the
ovsdb like so::

    $ ovs-vsctl --no-wait \
        set Open_vSwitch . other_config:vhost-sock-dir=subdir
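With ``vhost-sock-dir`` set, each port's socket appears under that
sub-directory of the run directory. A small illustration of the resulting
path, assuming the default ``/usr/local/var/run/openvswitch`` run directory
and the values used above:

```shell
# Illustration: where a vhost-user socket lands once vhost-sock-dir is set.
rundir=/usr/local/var/run/openvswitch   # assumed default OVS run directory
subdir=subdir                           # value given to vhost-sock-dir above
port=vhost-user-1                       # port (and therefore socket) name
echo "$rundir/$subdir/$port"
```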
Once the vhost-user ports have been added to the switch, they must be added to
the guest. There are two ways to do this: using QEMU directly, or using
libvirt.

.. note::

   IOMMU and Post-copy Live Migration are not supported with vhost-user ports.
Adding vhost-user ports to the guest (QEMU)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To begin, you must attach the vhost-user device sockets to the guest. To do
this, you must pass the following parameters to QEMU::

    -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1

where ``vhost-user-1`` is the name of the vhost-user port added to the switch.
Repeat the above parameters for multiple devices, changing the chardev ``path``
and ``id`` as necessary. Note that a separate and different chardev ``path``
needs to be specified for each vhost-user device. For example, if you have a
second vhost-user port named ``vhost-user-2``, append your QEMU command line
with an additional set of parameters::

    -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
    -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
    -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
access a virtio-net device's virtual rings and packet buffers by mapping the
VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
memory into their process address space, pass the following parameters to
QEMU::

    -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
    -numa node,memdev=mem -mem-prealloc
Finally, you may wish to enable multiqueue support. This is optional but,
should you wish to enable it, run::

    -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
    -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
    -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
where:

``$q``
  The number of queues
``$v``
  The number of vectors, which is ``$q`` * 2 + 2
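The ``$v`` value follows mechanically from ``$q``, so it is easy to compute in
a launch script. A minimal sketch of that arithmetic (the queue count of 4 is
just an example):

```shell
# Compute the QEMU 'vectors' value from the queue count: $q * 2 + 2.
queues=4
vectors=$((queues * 2 + 2))
echo "queues=$queues vectors=$vectors"   # queues=4 vectors=10
```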
The vhost-user interface will be automatically reconfigured with the required
number of Rx and Tx queues after connection of the virtio device. Manual
configuration of ``n_rxq`` is not supported because OVS will work properly only
if ``n_rxq`` matches the number of queues configured in QEMU.
At least two PMDs should be configured for the vswitch when using multiqueue.
Using a single PMD will cause traffic to be enqueued to the same vhost queue
rather than being distributed among different vhost queues for a vhost-user
interface in the guest.
If traffic destined for a VM configured with multiqueue arrives to the vswitch
via a physical DPDK port, then the number of Rx queues should also be set to at
least two for that physical DPDK port. This is required to increase the
probability that a different PMD will handle the multiqueue transmission to the
guest using a different vhost queue.

If one wishes to use multiple queues for an interface in the guest, the driver
in the guest operating system must be configured to do so. It is recommended
that the number of queues configured be equal to ``$q``.
For example, this can be done for the Linux kernel virtio-net driver with::

    $ ethtool -L <DEV> combined <$q>

where:

``-L``
  Changes the numbers of channels of the specified network device
``combined``
  Changes the number of multi-purpose channels.
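If you want to check the current channel count before changing it,
``ethtool -l <DEV>`` (lowercase ``-l``) prints both the pre-set maximum and
the current value. Below is a hypothetical parser for that output; the sample
text is canned for illustration, not captured from a real device:

```shell
# Extract the *current* "Combined" channel count from `ethtool -l` output.
# The last "Combined:" line printed is the current hardware setting.
current_combined() {
    awk '/Combined:/ { last = $2 } END { print last }'
}

sample='Channel parameters for eth0:
Pre-set maximums:
Combined:       4
Current hardware settings:
Combined:       2'

printf '%s\n' "$sample" | current_combined   # prints 2
```

Comparing this value against ``$q`` tells you whether ``ethtool -L`` still
needs to be run.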
Adding vhost-user ports to the guest (libvirt)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To begin, you must change the user and group that qemu runs under, and restart
libvirtd.

- In ``/etc/libvirt/qemu.conf`` add/edit the following lines::

      user = "root"
      group = "root"

- Finally, restart the libvirtd process, for example, on Fedora::

      $ systemctl restart libvirtd.service
Once complete, instantiate the VM. A sample XML configuration file is provided
at the :ref:`end of this file <dpdk-vhost-user-xml>`. Save this file, then
create a VM using this file::

    $ virsh create demovm.xml

Once created, you can connect to the guest console::

    $ virsh console demovm
The demovm xml configuration is aimed at achieving out-of-the-box performance
on the VM. These enhancements include:

- The vcpus are pinned to the cores of CPU socket 0 using ``vcpupin``.

- The NUMA cell and memory are configured as shared using
  ``memAccess='shared'``.

- Mergeable receive buffers are controlled via ``mrg_rxbuf`` (set
  ``mrg_rxbuf='off'`` to disable them).
Refer to the `libvirt documentation <http://libvirt.org/formatdomain.html>`__
for more information.
.. _dpdk-vhost-user-client:

vhost-user-client
-----------------

.. important::

   Use of vhost-user-client ports requires QEMU >= 2.7
To use vhost-user-client ports, you must first add said ports to the switch.
Like DPDK vhost-user ports, DPDK vhost-user-client ports can have mostly
arbitrary names. However, the name given to the port does not govern the name
of the socket device. Instead, this must be configured by the user by way of a
``vhost-server-path`` option. For vhost-user-client, the port type is
``dpdkvhostuserclient``::

    $ VHOST_USER_SOCKET_PATH=/path/to/socket
    $ ovs-vsctl add-port br0 vhost-client-1 \
        -- set Interface vhost-client-1 type=dpdkvhostuserclient \
           options:vhost-server-path=$VHOST_USER_SOCKET_PATH
Once the vhost-user-client ports have been added to the switch, they must be
added to the guest. Like vhost-user ports, there are two ways to do this: using
QEMU directly, or using libvirt. Only the QEMU case is covered here.
Adding vhost-user-client ports to the guest (QEMU)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Attach the vhost-user device sockets to the guest. To do this, you must pass
the following parameters to QEMU::

    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
where ``$VHOST_USER_SOCKET_PATH`` is the socket path configured on the
corresponding vhost-user-client port (``vhost-client-1`` above) added to the
switch.
If the corresponding ``dpdkvhostuserclient`` port has not yet been configured
in OVS with ``vhost-server-path=/path/to/socket``, QEMU will print a log
similar to the following::

    QEMU waiting for connection on: disconnected:unix:/path/to/socket,server
QEMU will wait until the port is created successfully in OVS to boot the VM.
One benefit of using this mode is the ability for vHost ports to 'reconnect' in
the event of the switch crashing or being brought down. Once it is brought back
up, the vHost ports will reconnect automatically and normal service will
resume.
vhost-user-client IOMMU Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vhost IOMMU is a feature which restricts the vhost memory that a virtio device
can access, and as such is useful in deployments in which security is a
concern.
IOMMU support may be enabled via a global config value,
``vhost-iommu-support``. Setting this to ``true`` enables vhost IOMMU support
for all vhost ports when/where available::

    $ ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

The default value is ``false``.
.. important::

   Changing this value requires restarting the daemon.
Enabling the IOMMU feature also enables the vhost user reply-ack protocol;
this is known to work on QEMU v2.10.0, but is buggy on older versions
(2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feature is disabled by
default (and should remain so if using the aforementioned versions of
QEMU). Starting with QEMU v2.9.1, vhost-iommu-support can safely be
enabled, even without having an IOMMU device, with no performance penalty.
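Provisioning scripts can encode the version guidance above before flipping the
config knob. A sketch using version-aware sorting (GNU ``sort -V``), treating
2.9.1 as the minimum safe QEMU version per the text above:

```shell
# Exit 0 (true) when the given QEMU version is >= 2.9.1, the first version
# on which vhost-iommu-support is considered safe to enable.
iommu_safe() {
    [ "$(printf '%s\n' "$1" 2.9.1 | sort -V | head -n1)" = "2.9.1" ]
}

iommu_safe 2.10.0 && echo "2.10.0: safe"
iommu_safe 2.8.0  || echo "2.8.0: unsafe"
```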
vhost-user-client Post-copy Live Migration Support (experimental)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``Post-copy`` migration is the migration mode in which the destination CPUs are
started before all the memory has been transferred. Its main advantage is a
predictable migration time. It is mostly used as a second phase after the
normal 'pre-copy' migration in case the latter takes too long to converge.
More information can be found in QEMU `docs`_.

.. _`docs`: https://git.qemu.org/?p=qemu.git;a=blob;f=docs/devel/migration.rst
Post-copy support may be enabled via a global config value
``vhost-postcopy-support``. Setting this to ``true`` enables Post-copy support
for all vhost-user-client ports::

    $ ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true

The default value is ``false``.
.. important::

   Changing this value requires restarting the daemon.
DPDK's post-copy migration mode uses the userfaultfd syscall to communicate
with the kernel about page-fault handling and uses shared memory based on huge
pages. The destination host's Linux kernel should therefore support userfaultfd
over shared hugetlbfs. This feature was only introduced in upstream kernel
version 4.11.
The post-copy feature has been supported in DPDK since 18.11.0 and in QEMU
since 2.12.0. However, QEMU >= 3.0.1 is suggested because migration recovery
for post-copy was fixed in 3.0 and a few additional bug fixes (such as a
userfaultfd leak) were released in 3.0.1.
The DPDK post-copy feature requires that the guest memory not be populated in
advance (the application must not call mlock* syscalls). Consequently, the
mlockall and dequeue zero-copy features are incompatible with post-copy.
Note that during migration of a vhost-user device, PMD threads hang while
faulted pages are downloaded from the source host. Transferring a 1 GB hugepage
across a 10 Gbps link may be unacceptably slow, so the recommended hugepage
size is 2 MB.
DPDK in the Guest
-----------------

The DPDK ``testpmd`` application can be run in guest VMs for high-speed packet
forwarding between vhostuser ports. DPDK and the testpmd application must be
compiled on the guest VM. Below are the steps for setting up the testpmd
application in the VM.
.. important::

   Support for DPDK in the guest requires QEMU >= 2.2
To begin, instantiate a guest as described in :ref:`dpdk-vhost-user` or
:ref:`dpdk-vhost-user-client`. Once started, connect to the VM, download the
DPDK sources to the VM and build DPDK::

    $ cd /root/dpdk/
    $ wget http://fast.dpdk.org/rel/dpdk-18.11.2.tar.xz
    $ tar xf dpdk-18.11.2.tar.xz
    $ export DPDK_DIR=/root/dpdk/dpdk-stable-18.11.2
    $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
    $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
    $ cd $DPDK_DIR
    $ make install T=$DPDK_TARGET DESTDIR=install
Build the test-pmd application::

    $ cd app/test-pmd
    $ export RTE_SDK=$DPDK_DIR
    $ export RTE_TARGET=$DPDK_TARGET
    $ make
Setup huge pages and DPDK devices using UIO::

    $ sysctl vm.nr_hugepages=1024
    $ mkdir -p /dev/hugepages
    $ mount -t hugetlbfs hugetlbfs /dev/hugepages  # only if not already mounted
    $ modprobe uio
    $ insmod $DPDK_BUILD/kmod/igb_uio.ko
    $ $DPDK_DIR/usertools/dpdk-devbind.py --status
    $ $DPDK_DIR/usertools/dpdk-devbind.py -b igb_uio 00:03.0 00:04.0
.. note::

   The PCI IDs of the vhost ports can be retrieved using::

       $ lspci | grep Ethernet

Finally, start the application::

    $ ./testpmd
.. _dpdk-vhost-user-xml:

Sample XML
----------

::

    <domain type='kvm'>
      <name>demovm</name>
      <uuid>4a9b3f53-fa2a-47f3-a757-dd87720d9d1d</uuid>
      <memory unit='KiB'>4194304</memory>
      <currentMemory unit='KiB'>4194304</currentMemory>
      <memoryBacking>
        <hugepages>
          <page size='2' unit='M' nodeset='0'/>
        </hugepages>
      </memoryBacking>
      <vcpu placement='static'>2</vcpu>
      <cputune>
        <shares>4096</shares>
        <vcpupin vcpu='0' cpuset='4'/>
        <vcpupin vcpu='1' cpuset='5'/>
        <emulatorpin cpuset='4,5'/>
      </cputune>
      <os>
        <type arch='x86_64' machine='pc'>hvm</type>
        <boot dev='hd'/>
      </os>
      <features>
        <acpi/>
        <apic/>
      </features>
      <cpu mode='host-model'>
        <model fallback='allow'/>
        <topology sockets='2' cores='1' threads='1'/>
        <numa>
          <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
        </numa>
      </cpu>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>destroy</on_crash>
      <devices>
        <emulator>/usr/bin/qemu-system-x86_64</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2' cache='none'/>
          <source file='/root/CentOS7_x86_64.qcow2'/>
          <target dev='vda' bus='virtio'/>
        </disk>
        <interface type='vhostuser'>
          <mac address='00:00:00:00:00:01'/>
          <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser0' mode='client'/>
          <model type='virtio'/>
          <driver queues='2'>
            <host mrg_rxbuf='on'/>
          </driver>
        </interface>
        <interface type='vhostuser'>
          <mac address='00:00:00:00:00:02'/>
          <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser1' mode='client'/>
          <model type='virtio'/>
          <driver queues='2'>
            <host mrg_rxbuf='on'/>
          </driver>
        </interface>
        <serial type='pty'>
          <target port='0'/>
        </serial>
        <console type='pty'>
          <target type='serial' port='0'/>
        </console>
      </devices>
    </domain>
.. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
Jumbo Frames
------------

DPDK vHost User ports can be configured to use Jumbo Frames. For more
information, refer to :doc:`jumbo-frames`.
vhost tx retries
----------------

When sending a batch of packets to a vhost-user or vhost-user-client interface,
it may happen that some but not all of the packets in the batch can be sent to
the guest. This is often because there are not enough free descriptors in the
virtqueue for all the packets in the batch to be sent. In this case there will
be a retry, with a default maximum of 8 retries. If at any time no packets can
be sent, it may mean the guest is not accepting packets, so there are no (more)
retries.
.. note::

   The maximum vhost Tx batch size is defined by NETDEV_MAX_BURST, and is
   currently set to 32.
Tx retries may be reduced or even avoided by some external configuration, such
as increasing the virtqueue size through the ``rx_queue_size`` parameter
introduced in QEMU 2.7.0 / libvirt 2.3.0::

    <interface type='vhostuser'>
      <mac address='56:48:4f:53:54:01'/>
      <source type='unix' path='/tmp/dpdkvhostclient0' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' tx_queue_size='1024'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x10' function='0x0'/>
    </interface>
The guest application will also need to provide enough descriptors. For
example, with ``testpmd`` the following command line arguments can be used::

    --rxd=1024 --txd=1024
The guest should also have sufficient cores dedicated for consuming and
processing packets at the required rate.
The amount of Tx retries on a vhost-user or vhost-user-client interface can be
shown with this command::

    $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
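One way to consume this counter in monitoring is to compare readings taken
some interval apart, since the absolute value only grows. A trivial sketch of
the delta computation; the two sample readings below are made up for
illustration, not real OVS output:

```shell
# Delta between two successive readings of the tx_retries counter.
retry_delta() {
    echo $(($2 - $1))
}

before=120   # hypothetical earlier reading of statistics:tx_retries
after=176    # hypothetical later reading
retry_delta "$before" "$after"   # prints 56
```

A steadily growing delta suggests the guest is not draining its virtqueue fast
enough.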
vhost-user Dequeue Zero Copy (experimental)
-------------------------------------------
Normally when dequeuing a packet from a vHost User device, a memcpy operation
must be used to copy that packet from guest address space to host address
space. This memcpy can be removed by enabling dequeue zero-copy like so::

    $ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
        dpdkvhostuserclient0 type=dpdkvhostuserclient \
        options:vhost-server-path=/tmp/dpdkvhostclient0 \
        options:dq-zero-copy=true
With this feature enabled, a reference (pointer) to the packet is passed to
the host, instead of a copy of the packet. Removing this memcpy can give a
performance improvement for some use cases, for example switching large packets
between different VMs. However, additional packet loss may be observed.
Note that the feature is disabled by default and must be explicitly enabled
by setting the ``dq-zero-copy`` option to ``true`` while specifying the
``vhost-server-path`` option as above. If you wish to split out the command
into multiple commands as below, ensure ``dq-zero-copy`` is set before
``vhost-server-path``::

    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
    $ ovs-vsctl set Interface dpdkvhostuserclient0 \
        options:vhost-server-path=/tmp/dpdkvhostclient0
The feature is only available to ``dpdkvhostuserclient`` port types.
A limitation exists whereby if packets from a vHost port with
``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number of Tx
descriptors (``n_txq_desc``) for that port must be reduced to a smaller number,
128 being the recommended value. This can be achieved by issuing the following
command::

    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
Note: The sum of the Tx descriptors of all ``dpdk`` ports the VM will send to
should not exceed 128. For example, in the case of a bond over two physical
ports in balance-tcp mode, one must divide 128 by the number of links in the
bond.
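The division described above can be sketched as follows; the two-link bond is
just an example, and the total of 128 is the recommendation from the text, not
a hard OVS limit:

```shell
# Split the recommended total of 128 tx descriptors evenly across bond links.
txq_desc_per_link() {
    echo $((128 / $1))
}

txq_desc_per_link 2   # balance-tcp bond over two links: prints 64
```

Each physical port in the bond would then be configured with
``options:n_txq_desc=64``.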
Refer to :ref:`dpdk-queues-sizes` for more information.
The reason for this limitation is due to how the zero-copy functionality is
implemented. The vHost device's 'tx used vring', a virtio structure used for
tracking used (i.e. sent) descriptors, will only be updated when the NIC frees
the corresponding mbuf. If we don't free the mbufs frequently enough, that
vring will be starved and packets will no longer be processed. One way to
ensure we don't encounter this scenario is to configure ``n_txq_desc`` to a
small enough number such that the 'mbuf free threshold' for the NIC will be hit
more often and thus free mbufs more frequently. The value of 128 is suggested,
but values of 64 and 256 have been tested and verified to work too, with
differing performance characteristics. A value of 512 can be used too, if the
virtio queue size in the guest is increased to 1024 (available to configure in
QEMU versions v2.10 and greater). This value can be set like so::
    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
      rx_queue_size=1024
Because of this limitation, this feature is considered 'experimental'.
.. important::

   Post-copy Live Migration is not compatible with dequeue zero copy.
Further information can be found in the `DPDK documentation
<https://doc.dpdk.org/guides-18.11/prog_guide/vhost_lib.html>`__.