Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.

Convention for heading levels in Open vSwitch documentation:

  =======  Heading 0 (reserved for the title in a document)
  -------  Heading 1
  ~~~~~~~  Heading 2
  +++++++  Heading 3
  '''''''  Heading 4

  Avoid deeper levels because they do not render well.

=================================
Open vSwitch with DPDK (Advanced)
=================================

The Advanced Install Guide explains how to improve OVS performance when using
the DPDK datapath. This guide provides information on tuning, system
configuration, troubleshooting, static code analysis and testcases.

Building as a Shared Library
----------------------------

DPDK can be built as either a static or a shared library, and applications
using the DPDK datapath link against it. When building OVS with DPDK, you can
link Open vSwitch against the shared DPDK library.

.. note::
  Minor performance loss is seen with OVS when using the shared DPDK library,
  as compared to the static library.

To build Open vSwitch using DPDK as a shared library, first refer to the `DPDK
installation guide`_ for download instructions for DPDK and OVS.

Once DPDK and OVS have been downloaded, you must configure the DPDK library
accordingly. Simply set ``CONFIG_RTE_BUILD_SHARED_LIB=y`` in
``config/common_base``; DPDK can then be built and installed as usual. For
example::

    $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
    $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
    $ make install T=$DPDK_TARGET DESTDIR=install

Once DPDK is built, export the DPDK shared library location and set up OVS as
detailed in the `DPDK installation guide`_::

    $ export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib
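
To verify that ovs-vswitchd was indeed linked against the shared DPDK
libraries, you can inspect its dynamic dependencies as a quick sanity check;
the exact ``librte_*`` library names vary with the DPDK version::

    $ ldd $(which ovs-vswitchd) | grep librte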

System Configuration
--------------------

To achieve optimal OVS performance, the system should be configured
appropriately. This includes BIOS tweaks, GRUB cmdline additions, an
understanding of NUMA nodes, and apt selection of PCIe slots for NIC
placement.

Recommended BIOS Settings
~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table:: Recommended BIOS Settings
   :header-rows: 1

   * - Setting
     - Value
   * - C3 Power State
     - Disabled
   * - C6 Power State
     - Disabled
   * - MLC Streamer
     - Enabled
   * - MLC Spatial Prefetcher
     - Enabled
   * - DCU Data Prefetcher
     - Enabled
   * - DCU Instruction Prefetcher
     - Enabled
   * - CPU Power and Performance
     - Performance
   * - Memory RAS and Performance Config -> NUMA optimized
     - Enabled

PCIe Slot Consideration
~~~~~~~~~~~~~~~~~~~~~~~

The fastpath performance can be affected by factors related to the placement
of the NIC, such as the channel speed between the PCIe slot and the CPU, or
the proximity of the PCIe slot to the CPU cores running the DPDK application.
Listed below are the steps to identify the right PCIe slot.

#. Retrieve host details using ``dmidecode``. For example::

       $ dmidecode -t baseboard | grep "Product Name"

#. Download the technical specification for the product listed, e.g.
   S2600WT2.

#. Check the Product Architecture Overview for the riser slot placement, CPU
   sharing info and also PCIe channel speeds.

   For example: on S2600WT, CPU1 and CPU2 share Riser Slot 1, with the channel
   speed between CPU1 and Riser Slot 1 at 32 GB/s and between CPU2 and Riser
   Slot 1 at 16 GB/s. Running the DPDK app on CPU1 cores with the NIC inserted
   into the riser card slots will optimize OVS performance in this case.

#. Check the Riser Card #1 - Root Port mapping information for the available
   slots and individual bus speeds. On S2600WT, slots 1 and 2 have high bus
   speeds and are potential slots for NIC placement.
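
Once the NIC is installed, you can cross-check which NUMA node the device is
attached to via sysfs. The PCI address below is only an example; substitute
the address reported for your NIC by ``lspci``::

    $ lspci | grep -i ethernet
    $ cat /sys/bus/pci/devices/0000:05:00.0/numa_node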

Advanced Hugepage Setup
~~~~~~~~~~~~~~~~~~~~~~~

Allocate and mount 1 GB hugepages.

- For persistent allocation of huge pages, add the following options to the
  kernel bootline::

      default_hugepagesz=1GB hugepagesz=1G hugepages=N

  For platforms supporting multiple huge page sizes, add multiple options::

      default_hugepagesz=<size> hugepagesz=<size> hugepages=N

  where:

  ``N``
    number of huge pages requested
  ``size``
    huge page size with an optional suffix ``[kKmMgG]``

- For run-time allocation of huge pages::

      $ echo N > /sys/devices/system/node/nodeX/hugepages/hugepages-1048576kB/nr_hugepages

  where:

  ``N``
    number of huge pages requested
  ``X``
    NUMA node on which to allocate them

  .. note::
    For run-time allocation of 1G huge pages, the Contiguous Memory Allocator
    (``CONFIG_CMA``) has to be supported by the kernel; check your Linux
    distribution.

Now mount the huge pages, if not already mounted::

    $ mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
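
To verify the allocation, check ``/proc/meminfo``; it should report a non-zero
number of hugepages of the requested size::

    $ grep -i hugepages /proc/meminfo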

Enable HyperThreading
~~~~~~~~~~~~~~~~~~~~~

With HyperThreading, or SMT, enabled, a physical core appears as two logical
cores. SMT can be utilized to spawn worker threads on logical cores of the
same physical core, thereby saving additional cores.

With DPDK, when pinning pmd threads to logical cores, care must be taken to
set the correct bits of the ``pmd-cpu-mask`` to ensure that the pmd threads
are pinned to SMT siblings.

Take a sample system configuration, with 2 sockets, 2 * 10-core processors,
and HT enabled. This gives us a total of 40 logical cores. To identify the
physical core shared by two logical cores, run::

    $ cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list

where ``N`` is the logical core number.

In this example, it would show that cores ``1`` and ``21`` share the same
physical core. Thus, the ``pmd-cpu-mask`` to enable two pmd threads running
on these two logical cores (one physical core) is::

    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=200002
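
The mask is simply the bitwise OR of ``1 << N`` for each chosen logical core
``N``. As a quick sketch, the mask for SMT siblings 1 and 21 can be computed
with::

    $ python -c 'print(hex((1 << 1) | (1 << 21)))'
    0x200002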

Isolate Cores
~~~~~~~~~~~~~

The ``isolcpus`` option can be used to isolate cores from the Linux scheduler.
The isolated cores can then be used to run HPC applications or threads
exclusively. This helps improve application performance due to zero context
switching and minimal cache thrashing. To run platform logic on core 0 and
isolate cores 1 to 19 from the scheduler, add ``isolcpus=1-19`` to the GRUB
cmdline.

.. note::
  It has been verified that in some circumstances core isolation has minimal
  advantage, owing to the maturity of the Linux scheduler.

NUMA/Cluster-on-Die
~~~~~~~~~~~~~~~~~~~

Ideally inter-NUMA datapaths should be avoided where possible as packets will
go across QPI and there may be a slight performance penalty when compared with
intra-NUMA datapaths. On Intel Xeon Processor E5 v3, Cluster On Die is
introduced on models that have 10 cores or more. This makes it possible to
logically split a socket into two NUMA regions; again, it is preferred where
possible to keep critical datapaths within a single cluster.

It is good practice to ensure that threads that are in the datapath are pinned
to cores in the same NUMA area, e.g. pmd threads and QEMU vCPUs responsible
for forwarding. If DPDK is built with ``CONFIG_RTE_LIBRTE_VHOST_NUMA=y``,
vHost User ports automatically detect the NUMA socket of the QEMU vCPUs and
will be serviced by a PMD from the same node provided a core on this node is
enabled in the ``pmd-cpu-mask``. ``libnuma`` packages are required for this
feature.
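
The NUMA topology of the host, including the cores and memory belonging to
each node, can be inspected with ``numactl`` (from the ``numactl`` package)::

    $ numactl --hardware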

Compiler Optimizations
~~~~~~~~~~~~~~~~~~~~~~

The default compiler optimization level is ``-O2``. Changing this to a more
aggressive optimization such as ``-O3 -march=native`` with gcc (verified on
5.3.1) can produce performance gains, though not significant ones.
``-march=native`` produces code optimized for the local machine and should
only be used when the software is compiled on the testbed on which it will
run.
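
As a sketch, one way to pass such flags when building OVS, in addition to your
usual configure options (e.g. ``--with-dpdk``), is::

    $ ./configure CFLAGS="-O3 -march=native"
    $ make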

Performance Tuning
------------------

Affinity
~~~~~~~~

For superior performance, DPDK pmd threads and QEMU vCPU threads need to be
affinitized accordingly.

- PMD thread Affinity

  A poll mode driver (pmd) thread handles the I/O of all DPDK interfaces
  assigned to it. A pmd thread polls the ports for incoming packets, switches
  the packets and sends them to a tx port. A pmd thread is CPU-bound, and
  needs to be affinitized to isolated cores for optimum performance.

  By setting a bit in the mask, a pmd thread is created and pinned to the
  corresponding CPU core. For example, to run a pmd thread on core 2::

      $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4

  .. note::
    A pmd thread on a NUMA node is only created if there is at least one DPDK
    interface from that NUMA node added to OVS.
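
  To confirm which pmd threads were created and where they are pinned, query
  the pmd statistics at run time; the output includes the NUMA node and core
  id of each pmd thread::

      $ ovs-appctl dpif-netdev/pmd-stats-show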

- QEMU vCPU thread Affinity

  A VM performing simple packet forwarding or running complex packet pipelines
  has to ensure that the vCPU threads performing the work have as much CPU
  occupancy as possible.

  For example, on a multicore VM, multiple QEMU vCPU threads will be spawned.
  When the DPDK ``testpmd`` application that does packet forwarding is
  invoked, the ``taskset`` command should be used to affinitize the vCPU
  threads to the dedicated isolated cores on the host system, as sketched
  below.
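
  As a sketch, assuming the guest is already running and the vCPU thread IDs
  (the ``12345``/``12346`` values below are hypothetical) have been identified
  under ``/proc/<qemu_pid>/task``, each vCPU thread can be re-pinned at run
  time::

      $ taskset -pc 4 12345    # pin the first vCPU thread to isolated core 4
      $ taskset -pc 5 12346    # pin the second vCPU thread to isolated core 5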

Multiple Poll-Mode Driver Threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With pmd multi-threading support, OVS creates one pmd thread for each NUMA
node by default. However, in cases where there are multiple ports/rxqs
producing traffic, performance can be improved by creating multiple pmd
threads running on separate cores. These pmd threads can share the workload
by each being responsible for different ports/rxqs. Assignment of ports/rxqs
to pmd threads is done automatically.

A set bit in the mask means a pmd thread is created and pinned to the
corresponding CPU core. For example, to run pmd threads on cores 1 and 2::

    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6

When using dpdk and dpdkvhostuser ports in a bi-directional VM loopback as
shown below, spreading the workload over 2 or 4 pmd threads shows significant
improvements, as there will be more total CPU occupancy available::

    NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
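
To see how the rx queues of each port were distributed among the pmd threads,
you can query the assignment at run time::

    $ ovs-appctl dpif-netdev/pmd-rxq-show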

DPDK Physical Port Rx Queues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    $ ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>

The command above sets the number of rx queues for a DPDK physical interface.
The rx queues are assigned to pmd threads on the same NUMA node in a
round-robin fashion.
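
For example, to give ``dpdk0`` two rx queues::

    $ ovs-vsctl set Interface dpdk0 options:n_rxq=2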

DPDK Physical Port Queue Sizes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    $ ovs-vsctl set Interface dpdk0 options:n_rxq_desc=<integer>
    $ ovs-vsctl set Interface dpdk0 options:n_txq_desc=<integer>

The commands above set the number of rx/tx descriptors that the NIC associated
with dpdk0 will be initialised with.

Different ``n_rxq_desc`` and ``n_txq_desc`` configurations yield different
benefits in terms of throughput and latency for different scenarios.
Generally, smaller queue sizes can have a positive impact on latency at the
expense of throughput. The opposite is often true for larger queue sizes.

.. note::
  Increasing the number of rx descriptors, e.g. to 4096, may have a negative
  impact on performance due to the fact that non-vectorised DPDK rx functions
  may be used. This is dependent on the driver in use, but is true for the
  commonly used i40e and ixgbe DPDK drivers.

Exact Match Cache
~~~~~~~~~~~~~~~~~

Each pmd thread contains one Exact Match Cache (EMC). After initial flow setup
in the datapath, the EMC contains a single table and provides the lowest level
(fastest) switching for DPDK ports. If there is a miss in the EMC, the next
level where switching will occur is the datapath classifier. Missing in the
EMC and looking up in the datapath classifier incurs a significant performance
penalty. If lookup misses occur in the EMC because it is too small to handle
the number of flows, its size can be increased. The EMC size can be modified
by editing the define ``EM_FLOW_HASH_SHIFT`` in ``lib/dpif-netdev.c``.
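
Since the number of EMC entries is ``1 << EM_FLOW_HASH_SHIFT``, incrementing
the define by one doubles the cache size. As a sketch, assuming you are in the
root of the OVS source tree, the define could be bumped before rebuilding::

    $ sed -i 's/#define EM_FLOW_HASH_SHIFT .*/#define EM_FLOW_HASH_SHIFT 14/' lib/dpif-netdev.c
    $ make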

As mentioned above, an EMC is per pmd thread. An alternative way of increasing
the aggregate number of possible flow entries in the EMC, while avoiding
datapath classifier lookups, is to have multiple pmd threads running.

Rx Mergeable Buffers
~~~~~~~~~~~~~~~~~~~~

Rx mergeable buffers is a virtio feature that allows chaining of multiple
virtio descriptors to handle large packet sizes. Large packets are handled by
reserving and chaining multiple free descriptors together. Mergeable buffer
support is negotiated between the virtio driver and virtio device and is
supported by the DPDK vhost library. This behavior is supported and enabled
by default; however, if the user knows that rx mergeable buffers are not
needed, i.e. jumbo frames are not needed, it can be forced off by adding
``mrg_rxbuf=off`` to the QEMU command line options. By not reserving multiple
chains of descriptors, more individual virtio descriptors are available for rx
to the guest using dpdkvhost ports, and this can improve performance.
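
For example, the virtio device parameter shown with ``mrg_rxbuf=on`` in the
Jumbo Frames section below would instead read::

    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off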

OVS Testcases
-------------

PHY-VM-PHY (vHost Loopback)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `DPDK installation guide`_ details steps for the PHY-VM-PHY loopback
testcase and packet forwarding using the DPDK testpmd application in the guest
VM. For users wishing to do packet forwarding using the kernel stack instead,
run the below commands on the guest::

    $ ifconfig eth1 1.1.1.2/24
    $ ifconfig eth2 1.1.2.2/24
    $ systemctl stop firewalld.service
    $ systemctl stop iptables.service
    $ sysctl -w net.ipv4.ip_forward=1
    $ sysctl -w net.ipv4.conf.all.rp_filter=0
    $ sysctl -w net.ipv4.conf.eth1.rp_filter=0
    $ sysctl -w net.ipv4.conf.eth2.rp_filter=0
    $ route add -net 1.1.2.0/24 eth2
    $ route add -net 1.1.1.0/24 eth1
    $ arp -s 1.1.2.99 DE:AD:BE:EF:CA:FE
    $ arp -s 1.1.1.99 DE:AD:BE:EF:CA:EE

PHY-VM-PHY (IVSHMEM)
~~~~~~~~~~~~~~~~~~~~

IVSHMEM can also be validated using the PHY-VM-PHY configuration. To begin,
follow the steps described in the `DPDK installation guide`_ to create and
initialize the database, start ovs-vswitchd and add ``dpdk``-type devices to
bridge ``br0``. Once complete, follow the below steps:

1. Add a DPDK ring port to the bridge::

       $ ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr

2. Build the modified QEMU

   QEMU must be patched to enable IVSHMEM support::

       $ cd /usr/src/
       $ wget http://wiki.qemu.org/download/qemu-2.2.1.tar.bz2
       $ tar -jxvf qemu-2.2.1.tar.bz2
       $ cd /usr/src/qemu-2.2.1
       $ wget https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/patches/ivshmem-qemu-2.2.1.patch
       $ patch -p1 < ivshmem-qemu-2.2.1.patch
       $ ./configure --target-list=x86_64-softmmu --enable-debug --extra-cflags='-g'
       $ make -j 4

3. Generate the QEMU command line::

       $ mkdir -p /usr/src/cmdline_generator
       $ cd /usr/src/cmdline_generator
       $ wget https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/cmdline_generator.c
       $ wget https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/Makefile
       $ export RTE_SDK=/usr/src/dpdk-16.07
       $ export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
       $ make
       $ ./build/cmdline_generator -m -p dpdkr0 XXX
       $ cmdline=`cat OVSMEMPOOL`

4. Start the guest VM::

       $ export VM_NAME=ivshmem-vm
       $ export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2
       $ export QEMU_BIN=/usr/src/qemu-2.2.1/x86_64-softmmu/qemu-system-x86_64

       $ taskset 0x20 $QEMU_BIN -cpu host -smp 2,cores=2 -hda $QCOW2_IMAGE \
           -m 4096 --enable-kvm -name $VM_NAME -nographic -vnc :2 \
           -pidfile /tmp/vm1.pid $cmdline

5. Build and run the sample ``dpdkr`` app in the VM::

       $ echo 1024 > /proc/sys/vm/nr_hugepages
       $ mount -t hugetlbfs nodev /dev/hugepages  # if not already mounted

       # Build the DPDK ring application in the VM
       $ export RTE_SDK=/root/dpdk-16.07
       $ export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
       $ make

       # Run the dpdkr application
       $ ./build/dpdkr -c 1 -n 4 -- -n 0
       # where "-n 0" refers to ring '0', i.e. dpdkr0

PHY-VM-PHY (vHost Multiqueue)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vHost Multiqueue functionality can also be validated using the PHY-VM-PHY
configuration. To begin, follow the steps described in the `DPDK installation
guide`_ to create and initialize the database, start ovs-vswitchd and add
``dpdk``-type devices to bridge ``br0``. Once complete, follow the below
steps:

1. Configure PMD and RXQs.

   For example, set the number of dpdk port rx queues to at least 2. The
   number of rx queues at the vhost-user interface gets automatically
   configured after virtio device connection and doesn't need manual
   configuration::

       $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=c
       $ ovs-vsctl set Interface dpdk0 options:n_rxq=2
       $ ovs-vsctl set Interface dpdk1 options:n_rxq=2

2. Instantiate the guest VM using the QEMU cmdline.

   We must configure with appropriate software versions to ensure this
   feature works:

   .. list-table:: Guest VM Configuration
      :header-rows: 1

      * - Setting
        - Value
      * - QEMU version
        - 2.5.0
      * - QEMU thread affinity
        - 2 cores (taskset 0x30)
      * - Memory
        - 4 GB
      * - Cores
        - 2
      * - Queues
        - 2

   To do this, instantiate the guest as follows::

       $ export VM_NAME=vhost-vm
       $ export GUEST_MEM=4096M
       $ export QCOW2_IMAGE=/root/Fedora22_x86_64.qcow2
       $ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
       $ taskset 0x30 qemu-system-x86_64 -cpu host -smp 2,cores=2 -m 4096M \
           -drive file=$QCOW2_IMAGE --enable-kvm -name $VM_NAME \
           -nographic -numa node,memdev=mem -mem-prealloc \
           -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on \
           -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0 \
           -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 \
           -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6 \
           -chardev socket,id=char2,path=$VHOST_SOCK_DIR/dpdkvhostuser1 \
           -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=2 \
           -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=6

   .. note::
     The queue value above should match the queues configured in OVS, while
     the vector value should be set to "number of queues x 2 + 2".

3. Configure the guest interface.

   Assuming there are 2 interfaces in the guest named eth0 and eth1, check
   the channel configuration and set the number of combined channels to 2 for
   best performance::

       $ ethtool -L eth0 combined 2
       $ ethtool -L eth1 combined 2

   More information can be found in the vHost Walkthrough section.

4. Configure kernel packet forwarding.

   Configure IP and enable interfaces::

       $ ifconfig eth0 5.5.5.1/24 up
       $ ifconfig eth1 90.90.90.1/24 up

   Configure IP forwarding and add route entries::

       $ sysctl -w net.ipv4.ip_forward=1
       $ sysctl -w net.ipv4.conf.all.rp_filter=0
       $ sysctl -w net.ipv4.conf.eth0.rp_filter=0
       $ sysctl -w net.ipv4.conf.eth1.rp_filter=0
       $ ip route add 2.1.1.0/24 dev eth1
       $ route add default gw 2.1.1.2 eth1
       $ route add default gw 90.90.90.90 eth1
       $ arp -s 90.90.90.90 DE:AD:BE:EF:CA:FE
       $ arp -s 2.1.1.2 DE:AD:BE:EF:CA:FA

   Check traffic on multiple queues::

       $ cat /proc/interrupts | grep virtio

vHost Walkthrough
-----------------

Two types of vHost User ports are available in OVS:

- vhost-user (``dpdkvhostuser``)

- vhost-user-client (``dpdkvhostuserclient``)

vHost User uses a client-server model. The server creates/manages/destroys the
vHost User sockets, and the client connects to the server. Depending on which
port type you use, ``dpdkvhostuser`` or ``dpdkvhostuserclient``, a different
configuration of the client-server model is used.

For vhost-user ports, Open vSwitch acts as the server and QEMU the client. For
vhost-user-client ports, Open vSwitch acts as the client and QEMU the server.

vhost-user
~~~~~~~~~~

1. Install the prerequisites:

   - QEMU version >= 2.2

2. Add vhost-user ports to the switch.

   Unlike DPDK ring ports, DPDK vhost-user ports can have arbitrary names,
   except that forward and backward slashes are prohibited in the names.

   For vhost-user, the name of the port type is ``dpdkvhostuser``::

       $ ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 \
           type=dpdkvhostuser

   This action creates a socket located at
   ``/usr/local/var/run/openvswitch/vhost-user-1``, which you must provide to
   your VM on the QEMU command line. More instructions on this can be found
   in the next section, "Adding vhost-user ports to VM".

   .. note::
     If you wish for the vhost-user sockets to be created in a sub-directory
     of ``/usr/local/var/run/openvswitch``, you may specify this directory in
     the ovsdb like so::

         $ ovs-vsctl --no-wait \
             set Open_vSwitch . other_config:vhost-sock-dir=subdir

3. Add vhost-user ports to the VM

   1. Configure sockets

      Pass the following parameters to QEMU to attach a vhost-user device::

          -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
          -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
          -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1

      where ``vhost-user-1`` is the name of the vhost-user port added to the
      switch.

      Repeat the above parameters for multiple devices, changing the chardev
      ``path`` and ``id`` as necessary. Note that a separate and different
      chardev ``path`` needs to be specified for each vhost-user device. For
      example, if you have a second vhost-user port named ``vhost-user-2``,
      append your QEMU command line with an additional set of parameters::

          -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
          -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
          -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2

   2. Configure hugepages

      QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
      access a virtio-net device's virtual rings and packet buffers by
      mapping the VM's physical memory on hugetlbfs. To enable vhost-user
      ports to map the VM's memory into their process address space, pass the
      following parameters to QEMU::

          -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
          -numa node,memdev=mem -mem-prealloc

   3. Enable multiqueue support (optional)

      QEMU needs to be configured to use multiqueue::

          -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
          -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
          -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v

      where:

      ``$q``
        The number of queues
      ``$v``
        The number of vectors, which is ``$q`` * 2 + 2

      The vhost-user interface will be automatically reconfigured with the
      required number of rx and tx queues after connection of the virtio
      device. Manual configuration of ``n_rxq`` is not supported because OVS
      will work properly only if ``n_rxq`` matches the number of queues
      configured in QEMU.

      At least 2 PMDs should be configured for the vswitch when using
      multiqueue. Using a single PMD will cause traffic to be enqueued to the
      same vhost queue rather than being distributed among different vhost
      queues for a vhost-user interface.

      If traffic destined for a VM configured with multiqueue arrives at the
      vswitch via a physical DPDK port, then the number of rxqs should also
      be set to at least 2 for that physical DPDK port. This is required to
      increase the probability that a different PMD will handle the
      multiqueue transmission to the guest using a different vhost queue.

      If one wishes to use multiple queues for an interface in the guest, the
      driver in the guest operating system must be configured to do so. It is
      recommended that the number of queues configured be equal to ``$q``.

      For example, this can be done for the Linux kernel virtio-net driver
      with::

          $ ethtool -L <DEV> combined <$q>

      where:

      ``-L``
        Changes the numbers of channels of the specified network device
      ``combined``
        Changes the number of multi-purpose channels.

Configure the VM using libvirt
++++++++++++++++++++++++++++++

You can also build and configure the VM using libvirt rather than QEMU by
itself.

1. Change the user/group, access control policy and restart libvirtd.

   - In ``/etc/libvirt/qemu.conf`` add/edit the following lines::

         user = "root"
         group = "root"

   - Disable SELinux or set to permissive mode::

         $ setenforce 0

   - Restart the libvirtd process. For example, on Fedora::

         $ systemctl restart libvirtd.service

2. Instantiate the VM

   - Copy the XML configuration described in the `DPDK installation guide`_.

   - Start the VM::

         $ virsh create demovm.xml

   - Connect to the guest console::

         $ virsh console demovm

3. Configure the VM

   The demovm XML configuration is aimed at achieving out-of-the-box
   performance on the VM.

   - The vcpus are pinned to the cores of CPU socket 0 using ``vcpupin``.

   - The NUMA cell and shared memory are configured using
     ``memAccess='shared'``.

   - Mergeable buffers are disabled with ``mrg_rxbuf='off'``.

Refer to the `libvirt documentation <http://libvirt.org/formatdomain.html>`__
for more information.

vhost-user-client
~~~~~~~~~~~~~~~~~

1. Install the prerequisites:

   - QEMU version >= 2.7

2. Add vhost-user-client ports to the switch.

   Unlike vhost-user ports, the name given to the port does not govern the
   name of the socket device. ``vhost-server-path`` reflects the full path of
   the socket that has been or will be created by QEMU for the given vHost
   User client port.

   For vhost-user-client, the name of the port type is
   ``dpdkvhostuserclient``::

       $ VHOST_USER_SOCKET_PATH=/path/to/socket
       $ ovs-vsctl add-port br0 vhost-client-1 \
           -- set Interface vhost-client-1 type=dpdkvhostuserclient \
              options:vhost-server-path=$VHOST_USER_SOCKET_PATH

3. Add vhost-user-client ports to the VM

   1. Configure sockets

      Pass the following parameters to QEMU to attach a vhost-user device::

          -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
          -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
          -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1

      where ``$VHOST_USER_SOCKET_PATH`` is the socket path configured on the
      corresponding ``vhost-client-1`` port added to the switch.

      If the corresponding ``dpdkvhostuserclient`` port has not yet been
      configured in OVS with ``vhost-server-path=/path/to/socket``, QEMU will
      print a log similar to the following::

          QEMU waiting for connection on: disconnected:unix:/path/to/socket,server

      QEMU will wait until the port is created successfully in OVS to boot
      the VM.

   One benefit of using this mode is the ability for vHost ports to
   'reconnect' in the event of the switch crashing or being brought down.
   Once it is brought back up, the vHost ports will reconnect automatically
   and normal service will resume.

DPDK Backend Inside VM
~~~~~~~~~~~~~~~~~~~~~~

Additional configuration is required if you want to run ovs-vswitchd with the
DPDK backend inside a QEMU virtual machine. ovs-vswitchd creates separate DPDK
tx queues for each CPU core available. This operation fails inside a QEMU
virtual machine because, by default, the virtio NIC provided to the guest is
configured to support only a single tx queue and a single rx queue. To change
this behavior, you need to turn on the ``mq`` (multiqueue) property of all
``virtio-net-pci`` devices emulated by QEMU and used by DPDK. You may do it
manually (by changing the QEMU command line) or, if you use libvirt, by adding
the following string to the ``<interface>`` sections of all network devices
used by DPDK::

    <driver name='vhost' queues='N'/>

where:

``N``
  determines how many queues can be used by the guest.

This requires QEMU >= 2.2.
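
If changing the QEMU command line manually instead, a sketch of the
equivalent for one device mirrors the multiqueue parameters shown earlier::

    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6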

QoS
---

Assuming you have a vhost-user port transmitting traffic consisting of packets
of size 64 bytes, the following command would limit the egress transmission
rate of the port to ~1,000,000 packets per second (a CIR of 46,000,000
bytes/sec corresponds to 1,000,000 packets/sec x 46B, the payload portion of a
minimally-sized 64B Ethernet frame once the 14B L2 header and 4B CRC are
excluded)::

    $ ovs-vsctl set port vhost-user0 qos=@newqos -- \
        --id=@newqos create qos type=egress-policer other-config:cir=46000000 \
        other-config:cbs=2048

To examine the QoS configuration of the port, run::

    $ ovs-appctl -t ovs-vswitchd qos/show vhost-user0

To clear the QoS configuration from the port and ovsdb, run::

    $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos

Refer to vswitch.xml for more details on egress-policer.

Rate Limiting
-------------

Here is an example of ingress policing usage. Assuming you have a vhost-user
port receiving traffic consisting of packets of size 64 bytes, the following
command would limit the reception rate of the port to ~1,000,000 packets per
second (368,000 kbits/sec = 46B x 8 bits x 1,000,000 packets/sec)::

    $ ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000 \
        ingress_policing_burst=1000

To examine the ingress policer configuration of the port::

    $ ovs-vsctl list interface vhost-user0

To clear the ingress policer configuration from the port::

    $ ovs-vsctl set interface vhost-user0 ingress_policing_rate=0

Refer to vswitch.xml for more details on ingress-policer.

Flow Control
------------

Flow control can be enabled only on DPDK physical ports. To enable flow
control support at the tx side while adding a port, run::

    $ ovs-vsctl add-port br0 dpdk0 -- \
        set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true

Similarly, to enable rx flow control, run::

    $ ovs-vsctl add-port br0 dpdk0 -- \
        set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true

To enable flow control auto-negotiation, run::

    $ ovs-vsctl add-port br0 dpdk0 -- \
        set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true

To turn on tx flow control at run time (after the port has been added to
OVS), run::

    $ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true

The flow control parameters can be turned off by setting the respective
parameter to ``false``. To disable flow control at the tx side, run::

    $ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false

pdump
-----

Pdump allows you to listen on DPDK ports and view the traffic that is passing
on them. To use this utility, one must have libpcap installed on the system.
Furthermore, DPDK must be built with ``CONFIG_RTE_LIBRTE_PDUMP=y`` and
``CONFIG_RTE_LIBRTE_PMD_PCAP=y``.

.. warning::
  A performance decrease is expected when using a monitoring application like
  the DPDK pdump app.

To use pdump, simply launch OVS as usual. Then, navigate to the ``app/pdump``
directory in DPDK, ``make`` the application and run like so::

    $ sudo ./build/app/dpdk-pdump -- \
        --pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap \
        --server-socket-path=/usr/local/var/run/openvswitch

The above command captures traffic received on queue 0 of port 0 and stores it
in ``/tmp/pkts.pcap``. Other combinations of port numbers, queue numbers and
pcap locations are of course also available to use. For example, to capture
all packets that traverse port 0 in a single pcap file::

    $ sudo ./build/app/dpdk-pdump -- \
        --pdump 'port=0,queue=*,rx-dev=/tmp/pkts.pcap,tx-dev=/tmp/pkts.pcap' \
        --server-socket-path=/usr/local/var/run/openvswitch

``server-socket-path`` must be set to the value of ``ovs_rundir()``, which
typically resolves to ``/usr/local/var/run/openvswitch``.

Many tools are available to view the contents of the pcap file. One example is
tcpdump. Issue the following command to view the contents of ``pkts.pcap``::

    $ tcpdump -r pkts.pcap

More information on the pdump app and its usage can be found in the `DPDK docs
<http://dpdk.org/doc/guides/sample_app_ug/pdump.html>`__.

Jumbo Frames
------------

By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
enable Jumbo Frames support for a DPDK port, change the Interface's
``mtu_request`` attribute to a sufficiently large value. For example, to add a
DPDK phy port with an MTU of 9000::

    $ ovs-vsctl add-port br0 dpdk0 \
        -- set Interface dpdk0 type=dpdk \
        -- set Interface dpdk0 mtu_request=9000

Similarly, to change the MTU of an existing port to 6200::

    $ ovs-vsctl set Interface dpdk0 mtu_request=6200

Some additional configuration is needed to take advantage of jumbo frames with
vHost ports:

1. *Mergeable buffers* must be enabled for vHost ports, as demonstrated in the
   QEMU command line snippet below::

       -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
       -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on

2. Where virtio devices are bound to the Linux kernel driver in a guest
   environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
   MTU of those logical network interfaces must also be increased to a
   sufficiently large value. This avoids segmentation of Jumbo Frames received
   in the guest. Note that 'MTU' refers to the length of the IP packet only,
   and not that of the entire frame.

   To calculate the exact MTU of a standard IPv4 frame, subtract the L2 header
   and CRC lengths (i.e. 18B) from the max supported frame size. So, to set
   the MTU for a 9018B Jumbo Frame::

       $ ifconfig eth1 mtu 9000

When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
increased, such that a full Jumbo Frame of a specific size may be accommodated
within a single mbuf segment.

Jumbo frame support has been validated against 9728B frames, which is the
largest frame size supported by the Fortville NIC using the DPDK i40e driver,
but larger frames and other DPDK NIC drivers may be supported. These cases are
common for use cases involving East-West traffic only.

vsperf
------

The vsperf project aims to develop a vSwitch test framework that can be used
to validate the suitability of different vSwitch implementations in a telco
deployment environment. More information can be found on the `OPNFV wiki
<https://wiki.opnfv.org/display/vsperf/VSperf+Home>`__.

Bug Reporting
-------------

Report problems to bugs@openvswitch.org.

.. _DPDK installation guide: INSTALL.DPDK.rst