..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

         =======  Heading 0 (reserved for the title in a document)
         -------  Heading 1
         ~~~~~~~  Heading 2
         +++++++  Heading 3
         '''''''  Heading 4

      Avoid deeper levels because they do not render well.
=====================
DPDK vHost User Ports
=====================

The DPDK datapath provides DPDK-backed vHost user ports as a primary way to
interact with guests. For more information on vHost User, refer to the `QEMU
documentation`_ on the same.
Quick Example
-------------

This example demonstrates how to add two ``dpdkvhostuserclient`` ports to an
existing bridge called ``br0``::
    $ ovs-vsctl add-port br0 dpdkvhostclient0 \
        -- set Interface dpdkvhostclient0 type=dpdkvhostuserclient \
           options:vhost-server-path=/tmp/dpdkvhostclient0
    $ ovs-vsctl add-port br0 dpdkvhostclient1 \
        -- set Interface dpdkvhostclient1 type=dpdkvhostuserclient \
           options:vhost-server-path=/tmp/dpdkvhostclient1
For the above examples to work, an appropriate server socket must be created
at the paths specified (``/tmp/dpdkvhostclient0`` and
``/tmp/dpdkvhostclient1``). These sockets can be created with QEMU; see the
:ref:`vhost-user client <dpdk-vhost-user-client>` section for details.
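Since the client port can only connect once the server socket exists, it can be useful to wait for the socket before proceeding. The helper below is an illustrative sketch, not part of OVS; the socket path and retry count are assumptions.

```shell
#!/bin/sh
# Illustrative sketch (not part of OVS): poll until a vhost-user server
# socket appears, so the dpdkvhostuserclient port is only used once QEMU
# has created it. Path and timeout are assumptions.
wait_for_socket() {
    path=$1
    tries=${2:-50}                   # number of 0.1s polls before giving up
    while [ "$tries" -gt 0 ]; do
        [ -S "$path" ] && return 0   # -S: path exists and is a socket
        tries=$((tries - 1))
        sleep 0.1
    done
    return 1
}

if wait_for_socket /tmp/dpdkvhostclient0 5; then
    echo "socket ready"
else
    echo "socket not present yet"
fi
```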
vhost-user vs. vhost-user-client
--------------------------------
Open vSwitch provides two types of vHost User ports:

- vhost-user (``dpdkvhostuser``)

- vhost-user-client (``dpdkvhostuserclient``)
vHost User uses a client-server model. The server creates/manages/destroys the
vHost User sockets, and the client connects to the server. Depending on which
port type you use, ``dpdkvhostuser`` or ``dpdkvhostuserclient``, a different
configuration of the client-server model is used.
For vhost-user ports, Open vSwitch acts as the server and QEMU the client. This
means if OVS dies, all VMs **must** be restarted. On the other hand, for
vhost-user-client ports, OVS acts as the client and QEMU the server. This means
OVS can die and be restarted without issue, and it is also possible to restart
an instance itself. For this reason, vhost-user-client ports are the preferred
type for all known use cases; the only limitation is that vhost-user client
mode ports require QEMU version 2.7 or later. Ports of type vhost-user are
currently deprecated and will be removed in a future release.
.. _dpdk-vhost-user:

vhost-user
----------

.. important::

   Use of vhost-user ports requires QEMU >= 2.2; vhost-user ports are
   *deprecated*.
To use vhost-user ports, you must first add said ports to the switch. DPDK
vhost-user ports can have arbitrary names with the exception of forward and
backward slashes, which are prohibited. For vhost-user, the port type is
``dpdkvhostuser``::

    $ ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 \
        type=dpdkvhostuser
This action creates a socket located at
``/usr/local/var/run/openvswitch/vhost-user-1``, which you must provide to your
VM on the QEMU command line.
If you wish for the vhost-user sockets to be created in a sub-directory of
``/usr/local/var/run/openvswitch``, you may specify this directory in the
database like so::

    $ ovs-vsctl --no-wait \
        set Open_vSwitch . other_config:vhost-sock-dir=subdir
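The resulting socket location is simply the run directory joined with the configured sub-directory and the port name. The sketch below is illustrative only; the run directory and port name are assumptions, not OVS output.

```shell
#!/bin/sh
# Sketch of the socket path produced when vhost-sock-dir=subdir is set.
# Directory and port name below are assumptions for illustration.
run_dir=/usr/local/var/run/openvswitch
sock_subdir=subdir
port=vhost-user-1
echo "$run_dir/$sock_subdir/$port"
# -> /usr/local/var/run/openvswitch/subdir/vhost-user-1
```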
Once the vhost-user ports have been added to the switch, they must be added to
the guest. There are two ways to do this: using QEMU directly, or using
libvirt.

.. note::

   IOMMU is not supported with vhost-user ports.
Adding vhost-user ports to the guest (QEMU)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To begin, you must attach the vhost-user device sockets to the guest. To do
this, you must pass the following parameters to QEMU::

    -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
where ``vhost-user-1`` is the name of the vhost-user port added to the switch.
Repeat the above parameters for multiple devices, changing the chardev ``path``
and ``id`` as necessary. Note that a separate and different chardev ``path``
needs to be specified for each vhost-user device. For example, if you have a
second vhost-user port named ``vhost-user-2``, you append your QEMU command
line with an additional set of parameters::

    -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
    -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
    -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user
ports access a virtio-net device's virtual rings and packet buffers mapping the
VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
memory into their process address space, pass the following parameters to
QEMU::

    -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
    -numa node,memdev=mem -mem-prealloc
Finally, you may wish to enable multiqueue support. This is optional but,
should you wish to enable it, run::

    -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
    -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
    -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
where:

``$q``
    The number of queues

``$v``
    The number of vectors, which is ``$q`` * 2 + 2
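The relationship between ``$q`` and ``$v`` can be sketched as a quick shell calculation; the queue count below is chosen arbitrarily for illustration.

```shell
#!/bin/sh
# Compute the vector count for a given queue count, per the formula above:
# vectors = queues * 2 + 2.
q=4                # illustrative queue count
v=$((q * 2 + 2))   # per the formula above
echo "queues=$q vectors=$v"
# -> queues=4 vectors=10
```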
The vhost-user interface will be automatically reconfigured with the required
number of rx and tx queues after connection of the virtio device. Manual
configuration of ``n_rxq`` is not supported because OVS will work properly only
if ``n_rxq`` matches the number of queues configured in QEMU.
At least 2 PMDs should be configured for the vswitch when using multiqueue.
Using a single PMD will cause traffic to be enqueued to the same vhost queue
rather than being distributed among different vhost queues for a vhost-user
interface in the guest.
If traffic destined for a VM configured with multiqueue arrives at the vswitch
via a physical DPDK port, then the number of rxqs should also be set to at
least 2 for that physical DPDK port. This is required to increase the
probability that a different PMD will handle the multiqueue transmission to the
guest using a different vhost queue.
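The rxq count of a physical DPDK port can be raised with the ``n_rxq`` option; the port name ``dpdk-p0`` below is illustrative.

```shell
# Illustrative only: set two rx queues on a physical DPDK port named dpdk-p0.
ovs-vsctl set Interface dpdk-p0 options:n_rxq=2
```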
If one wishes to use multiple queues for an interface in the guest, the driver
in the guest operating system must be configured to do so. It is recommended
that the number of queues configured be equal to ``$q``.
For example, this can be done for the Linux kernel virtio-net driver with::

    $ ethtool -L <DEV> combined <$q>
where:

``-L``
    Changes the numbers of channels of the specified network device

``combined``
    Changes the number of multi-purpose channels.
Adding vhost-user ports to the guest (libvirt)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To begin, you must change the user and group that qemu runs under, and restart
libvirtd.

- In ``/etc/libvirt/qemu.conf`` add/edit the following lines::

      user = "root"
      group = "root"

- Finally, restart the libvirtd process. For example, on Fedora::

      $ systemctl restart libvirtd.service
Once complete, instantiate the VM. A sample XML configuration file is provided
at the :ref:`end of this file <dpdk-vhost-user-xml>`. Save this file, then
create a VM using this file::

    $ virsh create demovm.xml

Once created, you can connect to the guest console::

    $ virsh console demovm
The demovm xml configuration is aimed at achieving out-of-the-box performance
on the VM. These enhancements include:

- The vcpus are pinned to the cores of the CPU socket 0 using ``vcpupin``.

- NUMA cell and memory sharing are configured using ``memAccess='shared'``.

- Mergeable rx buffers are disabled using ``mrg_rxbuf='off'``.
Refer to the `libvirt documentation <http://libvirt.org/formatdomain.html>`__
for more information.
.. _dpdk-vhost-user-client:

vhost-user-client
-----------------

.. important::

   Use of vhost-user-client ports requires QEMU >= 2.7
To use vhost-user-client ports, you must first add said ports to the switch.
Like DPDK vhost-user ports, DPDK vhost-user-client ports can have mostly
arbitrary names. However, the name given to the port does not govern the name
of the socket device. Instead, this must be configured by the user by way of a
``vhost-server-path`` option. For vhost-user-client, the port type is
``dpdkvhostuserclient``::

    $ VHOST_USER_SOCKET_PATH=/path/to/socket
    $ ovs-vsctl add-port br0 vhost-client-1 \
        -- set Interface vhost-client-1 type=dpdkvhostuserclient \
             options:vhost-server-path=$VHOST_USER_SOCKET_PATH
Once the vhost-user-client ports have been added to the switch, they must be
added to the guest. Like vhost-user ports, there are two ways to do this: using
QEMU directly, or using libvirt. Only the QEMU case is covered here.
Adding vhost-user-client ports to the guest (QEMU)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Attach the vhost-user device sockets to the guest. To do this, you must pass
the following parameters to QEMU::

    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
where ``$VHOST_USER_SOCKET_PATH`` is the socket path configured via the
``vhost-server-path`` option of the corresponding port on the switch.
If the corresponding ``dpdkvhostuserclient`` port has not yet been configured
in OVS with ``vhost-server-path=/path/to/socket``, QEMU will print a log
similar to the following::

    QEMU waiting for connection on: disconnected:unix:/path/to/socket,server
QEMU will wait until the port is created successfully in OVS to boot the VM.
One benefit of using this mode is the ability for vHost ports to 'reconnect' in
the event of the switch crashing or being brought down. Once it is brought back
up, the vHost ports will reconnect automatically and normal service will
resume.
vhost-user-client IOMMU Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vhost IOMMU is a feature which restricts the vhost memory that a virtio device
can access, and as such is useful in deployments in which security is a
concern.

IOMMU support may be enabled via a global config value,
``vhost-iommu-support``. Setting this to true enables vhost IOMMU support for
all vhost ports when/where available::

    $ ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
The default value is false.

.. important::

    Changing this value requires restarting the daemon.
.. important::

    Enabling the IOMMU feature also enables the vhost user reply-ack protocol;
    this is known to work on QEMU v2.10.0, but is buggy on older versions
    (2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feature is disabled by
    default (and should remain so if using the aforementioned versions of
    QEMU). Starting with QEMU v2.9.1, vhost-iommu-support can safely be
    enabled, even without having an IOMMU device, with no performance penalty.
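The version guidance above can be sketched as a small gating helper. This is an illustrative script, not part of OVS; the version strings are examples, and ``sort -V`` (GNU version sort) is assumed to be available.

```shell
#!/bin/sh
# Illustrative sketch: decide whether vhost-iommu-support is safe to enable
# for a given QEMU version (reply-ack is buggy on 2.7.0 - 2.9.0 inclusive,
# and fine starting with 2.9.1).
iommu_safe() {
    # sort -V orders version strings; safe if the given version is >= 2.9.1,
    # i.e. 2.9.1 sorts first (or the two are equal).
    [ "$(printf '%s\n%s\n' 2.9.1 "$1" | sort -V | head -n1)" = "2.9.1" ]
}

iommu_safe 2.10.0 && echo "2.10.0: safe to enable"
iommu_safe 2.8.0  || echo "2.8.0: leave disabled"
```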
DPDK in the Guest
-----------------

The DPDK ``testpmd`` application can be run in guest VMs for high speed packet
forwarding between vhostuser ports. DPDK and the testpmd application have to
be compiled on the guest VM. Below are the steps for setting up the testpmd
application in the VM.
.. note::

   Support for DPDK in the guest requires QEMU >= 2.2
To begin, instantiate a guest as described in :ref:`dpdk-vhost-user` or
:ref:`dpdk-vhost-user-client`. Once started, connect to the VM, download the
DPDK sources to the VM and build DPDK::

    $ cd /root/dpdk/
    $ wget http://fast.dpdk.org/rel/dpdk-17.11.1.tar.xz
    $ tar xf dpdk-17.11.1.tar.xz
    $ export DPDK_DIR=/root/dpdk/dpdk-stable-17.11.1
    $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
    $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
    $ cd $DPDK_DIR
    $ make install T=$DPDK_TARGET DESTDIR=install
Build the test-pmd application::

    $ cd app/test-pmd
    $ export RTE_SDK=$DPDK_DIR
    $ export RTE_TARGET=$DPDK_TARGET
    $ make
Setup huge pages and DPDK devices using UIO::

    $ sysctl vm.nr_hugepages=1024
    $ mkdir -p /dev/hugepages
    $ mount -t hugetlbfs hugetlbfs /dev/hugepages  # only if not already mounted
    $ modprobe uio
    $ insmod $DPDK_BUILD/kmod/igb_uio.ko
    $ $DPDK_DIR/usertools/dpdk-devbind.py --status
    $ $DPDK_DIR/usertools/dpdk-devbind.py -b igb_uio 00:03.0 00:04.0
.. note::

   The PCI IDs of the vhost ports can be retrieved using::

       lspci | grep Ethernet
Finally, start the application::

    $ ./testpmd
.. _dpdk-vhost-user-xml:

Sample XML
----------

::

    <domain type='kvm'>
      <name>demovm</name>
      <uuid>4a9b3f53-fa2a-47f3-a757-dd87720d9d1d</uuid>
      <memory unit='KiB'>4194304</memory>
      <currentMemory unit='KiB'>4194304</currentMemory>
      <memoryBacking>
        <hugepages>
          <page size='2' unit='M' nodeset='0'/>
        </hugepages>
      </memoryBacking>
      <vcpu placement='static'>2</vcpu>
      <cputune>
        <shares>4096</shares>
        <vcpupin vcpu='0' cpuset='4'/>
        <vcpupin vcpu='1' cpuset='5'/>
        <emulatorpin cpuset='4,5'/>
      </cputune>
      <os>
        <type arch='x86_64' machine='pc'>hvm</type>
        <boot dev='hd'/>
      </os>
      <features>
        <acpi/>
        <apic/>
      </features>
      <cpu mode='host-model'>
        <model fallback='allow'/>
        <topology sockets='2' cores='1' threads='1'/>
        <numa>
          <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
        </numa>
      </cpu>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>destroy</on_crash>
      <devices>
        <emulator>/usr/bin/qemu-system-x86_64</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2' cache='none'/>
          <source file='/root/CentOS7_x86_64.qcow2'/>
          <target dev='vda' bus='virtio'/>
        </disk>
        <interface type='vhostuser'>
          <mac address='00:00:00:00:00:01'/>
          <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser0' mode='client'/>
          <model type='virtio'/>
          <driver queues='2'>
            <host mrg_rxbuf='on'/>
          </driver>
        </interface>
        <interface type='vhostuser'>
          <mac address='00:00:00:00:00:02'/>
          <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser1' mode='client'/>
          <model type='virtio'/>
          <driver queues='2'>
            <host mrg_rxbuf='on'/>
          </driver>
        </interface>
        <serial type='pty'>
          <target port='0'/>
        </serial>
        <console type='pty'>
          <target type='serial' port='0'/>
        </console>
      </devices>
    </domain>
.. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
vhost-user Dequeue Zero Copy (experimental)
-------------------------------------------
Normally when dequeuing a packet from a vHost User device, a memcpy operation
must be used to copy that packet from guest address space to host address
space. This memcpy can be removed by enabling dequeue zero-copy like so::

    $ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
        dpdkvhostuserclient0 type=dpdkvhostuserclient \
        options:vhost-server-path=/tmp/dpdkvhostclient0 \
        options:dq-zero-copy=true
With this feature enabled, a reference (pointer) to the packet is passed to
the host, instead of a copy of the packet. Removing this memcpy can give a
performance improvement for some use cases, for example switching large packets
between different VMs. However, additional packet loss may be observed.
Note that the feature is disabled by default and must be explicitly enabled
by setting the ``dq-zero-copy`` option to ``true`` while specifying the
``vhost-server-path`` option as above. If you wish to split out the command
into multiple commands as below, ensure ``dq-zero-copy`` is set before
``vhost-server-path``::

    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
    $ ovs-vsctl set Interface dpdkvhostuserclient0 \
        options:vhost-server-path=/tmp/dpdkvhostclient0
The feature is only available to ``dpdkvhostuserclient`` port types.
A limitation exists whereby if packets from a vHost port with
``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number of tx
descriptors (``n_txq_desc``) for that port must be reduced to a smaller number,
128 being the recommended value. This can be achieved by issuing the following
command::

    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
.. note::

   The sum of the tx descriptors of all ``dpdk`` ports the VM will send to
   should not exceed 128. For example, in case of a bond over two physical
   ports in balance-tcp mode, one must divide 128 by the number of links in
   the bond.
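The descriptor budgeting above amounts to a simple division. The sketch below is illustrative only; the bond size is an assumption.

```shell
#!/bin/sh
# With dq-zero-copy, tx descriptors across all dpdk ports the VM sends to
# should sum to at most 128, so split the budget across the bond links.
budget=128
links=2                       # e.g. a balance-tcp bond over two physical ports
per_port=$((budget / links))
echo "n_txq_desc per port: $per_port"
# -> n_txq_desc per port: 64
```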
Refer to :ref:`dpdk-queues-sizes` for more information.
The reason for this limitation is due to how the zero copy functionality is
implemented. The vHost device's 'tx used vring', a virtio structure used for
tracking used (i.e. sent) descriptors, will only be updated when the NIC frees
the corresponding mbuf. If we don't free the mbufs frequently enough, that
vring will be starved and packets will no longer be processed. One way to
ensure we don't encounter this scenario is to configure ``n_txq_desc`` to a
small enough number such that the 'mbuf free threshold' for the NIC will be hit
more often and thus free mbufs more frequently. The value of 128 is suggested,
but values of 64 and 256 have been tested and verified to work too, with
differing performance characteristics. A value of 512 can be used too, if the
virtio queue size in the guest is increased to 1024 (available to configure in
QEMU versions v2.10 and greater). This value can be set like so::
    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
      tx_queue_size=1024
Because of this limitation, this feature is considered 'experimental'.
The feature currently does not fully work with QEMU >= v2.7 due to a bug in
DPDK which will be addressed in an upcoming release. The patch to fix this
issue can be found on
`Patchwork
<http://dpdk.org/dev/patchwork/patch/32198/>`__
Further information can be found in the
`DPDK documentation
<http://dpdk.readthedocs.io/en/v17.11/prog_guide/vhost_lib.html>`__