..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

======================
Open vSwitch with DPDK
======================

This document describes how to build and install Open vSwitch using a DPDK
datapath. Open vSwitch can use the DPDK library to operate entirely in
userspace.

.. seealso::

   The :doc:`releases FAQ </faq/releases>` lists support for the required
   versions of DPDK for each version of Open vSwitch.

Build requirements
------------------

In addition to the requirements described in :doc:`general`, building Open
vSwitch with DPDK will require the following:

- DPDK 17.11

- A `DPDK supported NIC`_

  Only required when physical ports are in use

- A suitable kernel

  On Linux distributions running kernel version >= 3.0, only `IOMMU` needs to
  be enabled via the grub cmdline, assuming you are using **VFIO**. For older
  kernels, ensure the kernel is built with ``UIO``, ``HUGETLBFS``,
  ``PROC_PAGE_MONITOR``, ``HPET`` and ``HPET_MMAP`` support. If these are not
  present, it will be necessary to upgrade your kernel or build a custom
  kernel with these flags enabled.

Detailed system requirements can be found at `DPDK requirements`_.

.. _DPDK supported NIC: http://dpdk.org/doc/nics
.. _DPDK requirements: http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html

Installing
----------

Install DPDK
~~~~~~~~~~~~

#. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::

       $ cd /usr/src/
       $ wget http://fast.dpdk.org/rel/dpdk-17.11.tar.xz
       $ tar xf dpdk-17.11.tar.xz
       $ export DPDK_DIR=/usr/src/dpdk-17.11
       $ cd $DPDK_DIR

#. (Optional) Configure DPDK as a shared library

   DPDK can be built as either a static library or a shared library. By
   default, it is configured for the former. If you wish to use the latter,
   set ``CONFIG_RTE_BUILD_SHARED_LIB=y`` in ``$DPDK_DIR/config/common_base``.

   .. note::

      Minor performance loss is expected when using OVS with a shared DPDK
      library compared to a static DPDK library.

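   One way to flip this option is with a quick ``sed`` invocation (an
   illustrative command; the shipped default in ``common_base`` is
   ``CONFIG_RTE_BUILD_SHARED_LIB=n``)::

       $ sed -i 's/CONFIG_RTE_BUILD_SHARED_LIB=n/CONFIG_RTE_BUILD_SHARED_LIB=y/' \
             $DPDK_DIR/config/common_base
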
#. Configure and install DPDK

   Build and install the DPDK library::

       $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
       $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
       $ make install T=$DPDK_TARGET DESTDIR=install

#. (Optional) Export the DPDK shared library location

   If DPDK was built as a shared library, export the path to this library for
   use when building OVS::

       $ export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib

.. _DPDK sources: http://dpdk.org/rel

Install OVS
~~~~~~~~~~~

OVS can be installed using different methods. For OVS to use the DPDK
datapath, it has to be configured with DPDK support (``--with-dpdk``).

.. note::
   This section focuses on a generic recipe that suits most cases. For
   distribution-specific instructions, refer to one of the more relevant
   guides.

.. _OVS sources: http://openvswitch.org/releases/

#. Ensure the standard OVS requirements, described in
   :ref:`general-build-reqs`, are installed

#. Bootstrap, if required, as described in :ref:`general-bootstrapping`

#. Configure the package using the ``--with-dpdk`` flag::

       $ ./configure --with-dpdk=$DPDK_BUILD

   where ``DPDK_BUILD`` is the path to the built DPDK library. This can be
   skipped if the DPDK library is installed in its default location.

   If no path is provided to ``--with-dpdk``, but a pkg-config configuration
   for libdpdk is available, the include paths will be generated via an
   equivalent ``pkg-config --cflags libdpdk``.

   .. note::
      While ``--with-dpdk`` is required, you can pass any other configuration
      option described in :ref:`general-configuring`.

#. Build and install OVS, as described in :ref:`general-building`

Additional information can be found in :doc:`general`.

.. note::
   If you are running using the Fedora or Red Hat package, the Open vSwitch
   daemon will run as a non-root user. This implies that you must have a
   working IOMMU. Visit the `RHEL README`__ for additional information.

__ https://github.com/openvswitch/ovs/blob/master/rhel/README.RHEL.rst

Setup
-----

Setup Hugepages
~~~~~~~~~~~~~~~

Allocate a number of 2M Huge pages:

- For persistent allocation of huge pages, write to the hugepages.conf file
  in `/etc/sysctl.d`::

      $ echo 'vm.nr_hugepages=2048' > /etc/sysctl.d/hugepages.conf

- For run-time allocation of huge pages, use the ``sysctl`` utility::

      $ sysctl -w vm.nr_hugepages=N  # where N = No. of 2M huge pages

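As a quick sanity check on the value of ``N``: backing, say, 4 GB of DPDK
memory with 2 MB pages requires 2048 of them. The arithmetic (an illustrative
calculation, not an OVS command) is::

    $ echo $(( 4 * 1024 / 2 ))  # 4 GB expressed in MB, divided by the 2 MB page size
    2048
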
To verify hugepage configuration::

    $ grep HugePages_ /proc/meminfo

Mount the hugepages, if not already mounted by default::

    $ mount -t hugetlbfs none /dev/hugepages

.. _dpdk-vfio:

Setup DPDK devices using VFIO
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

VFIO is preferred to the UIO driver when using recent versions of DPDK. VFIO
requires support from both the kernel and BIOS. For the former, kernel
version > 3.6 must be used. For the latter, you must enable VT-d in the BIOS
and ensure this is configured via grub. To ensure VT-d is enabled via the
BIOS, run::

    $ dmesg | grep -e DMAR -e IOMMU

If VT-d is not enabled in the BIOS, enable it now.

To ensure VT-d is enabled in the kernel, run::

    $ cat /proc/cmdline | grep iommu=pt
    $ cat /proc/cmdline | grep intel_iommu=on

If VT-d is not enabled in the kernel, enable it now.

Once VT-d is correctly configured, load the required modules and bind the NIC
to the VFIO driver::

    $ modprobe vfio-pci
    $ /usr/bin/chmod a+x /dev/vfio
    $ /usr/bin/chmod 0666 /dev/vfio/*
    $ $DPDK_DIR/usertools/dpdk-devbind.py --bind=vfio-pci eth1
    $ $DPDK_DIR/usertools/dpdk-devbind.py --status

Setup OVS
~~~~~~~~~

Open vSwitch should be started as described in :doc:`general` with the
exception of ovs-vswitchd, which requires some special configuration to
enable DPDK functionality. DPDK configuration arguments can be passed to
ovs-vswitchd via the ``other_config`` column of the ``Open_vSwitch`` table.
At a minimum, the ``dpdk-init`` option must be set to ``true``. For example::

    $ export PATH=$PATH:/usr/local/share/openvswitch/scripts
    $ export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
    $ ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start

There are many other configuration options, the most important of which are
listed below. Defaults will be provided for all values not explicitly set.

``dpdk-init``
  Specifies whether OVS should initialize and support DPDK ports. This is a
  boolean, and defaults to false.

``dpdk-lcore-mask``
  Specifies the CPU cores on which dpdk lcore threads should be spawned and
  expects a hex string (e.g. ``0x123``).

``dpdk-socket-mem``
  Comma-separated list of memory to pre-allocate from hugepages on specific
  sockets.

``dpdk-hugepage-dir``
  Directory where hugetlbfs is mounted.

``vhost-sock-dir``
  Option to set the path to the vhost-user unix socket files.

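As an illustration, several of these options can be set in a single command
(the mask and memory values below are examples only, and should be chosen to
match your own core layout and NUMA topology)::

    $ ovs-vsctl --no-wait set Open_vSwitch . \
        other_config:dpdk-lcore-mask=0x1 \
        other_config:dpdk-socket-mem="1024,0" \
        other_config:dpdk-hugepage-dir=/dev/hugepages
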
If allocating more than one GB of hugepages, you can configure the amount of
memory used from any given NUMA node. For example, to use 1GB from NUMA node
0 and 0GB from all other NUMA nodes, run::

    $ ovs-vsctl --no-wait set Open_vSwitch . \
        other_config:dpdk-socket-mem="1024,0"

or::

    $ ovs-vsctl --no-wait set Open_vSwitch . \
        other_config:dpdk-socket-mem="1024"

.. note::
   Changing any of these options requires restarting the ovs-vswitchd
   application.

See the section ``Performance Tuning`` for important DPDK customizations.

Validating
----------

At this point you can use ovs-vsctl to set up bridges and other Open vSwitch
features. Seeing as we've configured the DPDK datapath, we will use DPDK-type
ports. For example, to create a userspace bridge named ``br0`` and add two
``dpdk`` ports to it, run::

    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
    $ ovs-vsctl add-port br0 myportnameone -- set Interface myportnameone \
        type=dpdk options:dpdk-devargs=0000:06:00.0
    $ ovs-vsctl add-port br0 myportnametwo -- set Interface myportnametwo \
        type=dpdk options:dpdk-devargs=0000:06:00.1

DPDK devices will not be available for use until a valid dpdk-devargs is
specified.

Refer to ovs-vsctl(8) and :doc:`/howto/dpdk` for more details.

Performance Tuning
------------------

To achieve optimal OVS performance, the system can be tuned in a number of
ways: BIOS tweaks, GRUB cmdline additions, an understanding of the NUMA
topology, and careful selection of PCIe slots for NIC placement.

.. note::

   This section is optional. Once installed as described above, OVS with DPDK
   will work out of the box.

Recommended BIOS Settings
~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table:: Recommended BIOS Settings
   :header-rows: 1

   * - Setting
     - Value
   * - C3 Power State
     - Disabled
   * - C6 Power State
     - Disabled
   * - MLC Streamer
     - Enabled
   * - MLC Spatial Prefetcher
     - Enabled
   * - DCU Data Prefetcher
     - Enabled
   * - DCA
     - Enabled
   * - CPU Power and Performance
     - Performance
   * - Memory RAS and Performance Config -> NUMA optimized
     - Enabled

PCIe Slot Selection
~~~~~~~~~~~~~~~~~~~

The fastpath performance can be affected by factors related to the placement
of the NIC, such as channel speeds between the PCIe slot and CPU, or the
proximity of the PCIe slot to the CPU cores running the DPDK application.
Listed below are the steps to identify the right PCIe slot.

#. Retrieve host details using ``dmidecode``. For example::

       $ dmidecode -t baseboard | grep "Product Name"

#. Download the technical specification for the product listed, e.g. S2600WT2

#. Check the Product Architecture Overview for the riser slot placement, CPU
   sharing info and also PCIe channel speeds

   For example: on S2600WT, CPU1 and CPU2 share Riser Slot 1, with the channel
   speed between CPU1 and Riser Slot 1 at 32GB/s, and between CPU2 and Riser
   Slot 1 at 16GB/s. Running the DPDK app on CPU1 cores with the NIC inserted
   into the Riser card slots will optimize OVS performance in this case.

#. Check the Riser Card #1 - Root Port mapping information for the available
   slots and individual bus speeds. In S2600WT, slot 1 and slot 2 have high
   bus speeds and are potential slots for NIC placement.

Advanced Hugepage Setup
~~~~~~~~~~~~~~~~~~~~~~~

Allocate and mount 1 GB hugepages.

- For persistent allocation of huge pages, add the following options to the
  kernel bootline::

      default_hugepagesz=1GB hugepagesz=1G hugepages=N

  For platforms supporting multiple huge page sizes, add multiple options::

      default_hugepagesz=<size> hugepagesz=<size> hugepages=N

  where:

  ``N``
    number of huge pages requested
  ``size``
    huge page size with an optional suffix ``[kKmMgG]``

- For run-time allocation of huge pages::

      $ echo N > /sys/devices/system/node/nodeX/hugepages/hugepages-1048576kB/nr_hugepages

  where:

  ``N``
    number of huge pages requested
  ``X``
    NUMA Node

  .. note::
     For run-time allocation of 1G huge pages, the Contiguous Memory
     Allocator (``CONFIG_CMA``) has to be supported by the kernel; check your
     Linux distro.

Now mount the huge pages, if not already done so::

    $ mount -t hugetlbfs -o pagesize=1G none /dev/hugepages

Isolate Cores
~~~~~~~~~~~~~

The ``isolcpus`` option can be used to isolate cores from the Linux
scheduler. The isolated cores can then be dedicated to running HPC
applications or threads. This improves application performance due to zero
context switching and minimal cache thrashing. To run platform logic on core
0 and isolate cores 1 through 19 from the scheduler, add ``isolcpus=1-19`` to
the GRUB cmdline.

.. note::
   It has been verified that in some circumstances core isolation offers
   minimal advantage, due to the maturity of the Linux scheduler.

Compiler Optimizations
~~~~~~~~~~~~~~~~~~~~~~

The default compiler optimization level is ``-O2``. Changing this to a more
aggressive optimization such as ``-O3 -march=native`` with gcc (verified on
5.3.1) can produce performance gains, though not significant ones.
``-march=native`` will produce code optimized for the local machine, and
should only be used when the software is compiled on the testbed it will run
on.

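If you do want to experiment with these flags, they can be passed to the OVS
build via ``CFLAGS`` on the standard configure step (an illustrative
invocation, not an officially recommended setting)::

    $ ./configure --with-dpdk=$DPDK_BUILD CFLAGS="-O3 -march=native"
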
Multiple Poll-Mode Driver Threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With pmd multi-threading support, OVS creates one pmd thread for each NUMA
node by default, if there is at least one DPDK interface from that NUMA node
added to OVS. However, in cases where there are multiple ports/rxqs producing
traffic, performance can be improved by creating multiple pmd threads running
on separate cores. These pmd threads can share the workload by each being
responsible for different ports/rxqs. Assignment of ports/rxqs to pmd threads
is done automatically.

A set bit in the mask means a pmd thread is created and pinned to the
corresponding CPU core. For example, to run pmd threads on cores 1 and 2::

    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6

When using dpdk and dpdkvhostuser ports in a bi-directional VM loopback as
shown below, spreading the workload over 2 or 4 pmd threads shows significant
improvements, as there will be more total CPU occupancy available::

    NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port1

Refer to ovs-vswitchd.conf.db(5) for additional information on configuration
options.

Affinity
~~~~~~~~

For superior performance, DPDK pmd threads and QEMU vCPU threads need to be
affinitized accordingly.

- PMD thread Affinity

  A poll mode driver (pmd) thread handles the I/O of all DPDK interfaces
  assigned to it. A pmd thread polls the ports for incoming packets, switches
  the packets and sends them to the tx port. A pmd thread is CPU bound, and
  needs to be affinitized to isolated cores for optimum performance. Even
  though a PMD thread may exist, the thread only starts consuming CPU cycles
  if there is at least one receive queue assigned to the pmd.

  .. note::
     On NUMA systems, PCI devices are also local to a NUMA node. Unbound rx
     queues for a PCI device will be assigned to a pmd on its local NUMA node
     if a non-isolated PMD exists on that NUMA node. If not, the queue will
     be assigned to a non-isolated pmd on a remote NUMA node. This will
     result in reduced maximum throughput on that device and possibly on
     other devices assigned to that pmd thread. If such a queue assignment is
     made, a warning message will be logged: "There's no available
     (non-isolated) pmd thread on numa node N. Queue Q on port P will be
     assigned to the pmd on core C (numa node N'). Expect reduced
     performance."

  Binding PMD threads to cores is described in the above section
  ``Multiple Poll-Mode Driver Threads``.

- QEMU vCPU thread Affinity

  A VM performing simple packet forwarding or running complex packet
  pipelines has to ensure that the vCPU threads performing the work have as
  much CPU occupancy as possible.

  For example, on a multicore VM, multiple QEMU vCPU threads will be spawned.
  When the DPDK ``testpmd`` application that does packet forwarding is
  invoked, the ``taskset`` command should be used to affinitize the vCPU
  threads to the dedicated isolated cores on the host system.

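For example, a running vCPU thread can be pinned to a host core with
``taskset`` (the thread ID and core number here are hypothetical
placeholders)::

    $ taskset -pc 4 <vcpu-thread-id>
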
Enable HyperThreading
~~~~~~~~~~~~~~~~~~~~~

With HyperThreading, or SMT, enabled, a physical core appears as two logical
cores. SMT can be utilized to spawn worker threads on logical cores of the
same physical core, thereby saving additional cores.

With DPDK, when pinning pmd threads to logical cores, care must be taken to
set the correct bits of the ``pmd-cpu-mask`` to ensure that the pmd threads
are pinned to SMT siblings.

Take a sample system configuration, with 2 sockets, 2 * 10 core processors,
and HT enabled. This gives us a total of 40 logical cores. To identify the
physical core shared by two logical cores, run::

    $ cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list

where ``N`` is the logical core number.

In this example, it would show that cores ``1`` and ``21`` share the same
physical core. Logical cores can be specified in pmd-cpu-masks similarly to
physical cores, as described in ``Multiple Poll-Mode Driver Threads``.

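Continuing that example, a ``pmd-cpu-mask`` that pins pmd threads to the
sibling cores ``1`` and ``21`` needs bits 1 and 21 set. The mask value can be
computed with shell arithmetic (an illustrative calculation)::

    $ printf '0x%x\n' $(( (1 << 1) | (1 << 21) ))
    0x200002
    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x200002
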
NUMA/Cluster-on-Die
~~~~~~~~~~~~~~~~~~~

Ideally inter-NUMA datapaths should be avoided where possible, as packets
will go across QPI and there may be a slight performance penalty when
compared with intra-NUMA datapaths. On Intel Xeon Processor E5 v3,
Cluster-on-Die is introduced on models that have 10 cores or more. This makes
it possible to logically split a socket into two NUMA regions, and again it
is preferred where possible to keep critical datapaths within one cluster.

It is good practice to ensure that threads that are in the datapath are
pinned to cores in the same NUMA area, e.g. pmd threads and QEMU vCPUs
responsible for forwarding. If DPDK is built with
``CONFIG_RTE_LIBRTE_VHOST_NUMA=y``, vHost User ports automatically detect the
NUMA socket of the QEMU vCPUs and will be serviced by a PMD from the same
node, provided a core on this node is enabled in the ``pmd-cpu-mask``.
``libnuma`` packages are required for this feature.

Binding PMD threads is described in the above section
``Multiple Poll-Mode Driver Threads``.

DPDK Physical Port Rx Queues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    $ ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>

The above command sets the number of rx queues for a DPDK physical interface.
The rx queues are assigned to pmd threads on the same NUMA node in a
round-robin fashion.

.. _dpdk-queues-sizes:

DPDK Physical Port Queue Sizes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    $ ovs-vsctl set Interface dpdk0 options:n_rxq_desc=<integer>
    $ ovs-vsctl set Interface dpdk0 options:n_txq_desc=<integer>

The above commands set the number of rx/tx descriptors that the NIC
associated with dpdk0 will be initialised with.

Different ``n_rxq_desc`` and ``n_txq_desc`` configurations yield different
benefits in terms of throughput and latency for different scenarios.
Generally, smaller queue sizes can have a positive impact on latency at the
expense of throughput. The opposite is often true for larger queue sizes.

.. note::
   Increasing the number of rx descriptors, e.g. to 4096, may have a negative
   impact on performance due to the fact that non-vectorised DPDK rx
   functions may be used. This is dependent on the driver in use, but is true
   for the commonly used i40e and ixgbe DPDK drivers.

Exact Match Cache
~~~~~~~~~~~~~~~~~

Each pmd thread contains one Exact Match Cache (EMC). After initial flow
setup in the datapath, the EMC contains a single table and provides the
lowest level (fastest) switching for DPDK ports. If there is a miss in the
EMC, then the next level where switching will occur is the datapath
classifier. Missing in the EMC and looking up in the datapath classifier
incurs a significant performance penalty. If lookup misses occur in the EMC
because it is too small to handle the number of flows, its size can be
increased. The EMC size can be modified by editing the define
``EM_FLOW_HASH_SHIFT`` in ``lib/dpif-netdev.c``.

As mentioned above, an EMC is per pmd thread. An alternative way of
increasing the aggregate amount of possible flow entries in the EMC, and
avoiding datapath classifier lookups, is to have multiple pmd threads
running.

Rx Mergeable Buffers
~~~~~~~~~~~~~~~~~~~~

Rx mergeable buffers is a virtio feature that allows chaining of multiple
virtio descriptors to handle large packet sizes. Large packets are handled by
reserving and chaining multiple free descriptors together. Mergeable buffer
support is negotiated between the virtio driver and virtio device and is
supported by the DPDK vhost library. This behavior is supported and enabled
by default. However, in the case where the user knows that rx mergeable
buffers are not needed, i.e. jumbo frames are not needed, it can be forced
off by adding ``mrg_rxbuf=off`` to the QEMU command line options. By not
reserving multiple chains of descriptors, more individual virtio descriptors
will be available for rx to the guest using dpdkvhost ports, and this can
improve performance.

Output Packet Batching
~~~~~~~~~~~~~~~~~~~~~~

To take advantage of batched transmit functions, OVS collects packets in
intermediate queues before sending them when processing a batch of received
packets. Even if packets are matched by different flows, OVS uses a single
send operation for all packets destined to the same output port.

Furthermore, OVS is able to buffer packets in these intermediate queues for a
configurable amount of time to reduce the frequency of send bursts at medium
load levels, when the packet receive rate is high but the receive batch size
is still very small. This is particularly beneficial for packets transmitted
to VMs using an interrupt-driven virtio driver, where the interrupt overhead
is significant for the OVS PMD, the host operating system and the guest
driver.

The ``tx-flush-interval`` parameter can be used to specify the time in
microseconds OVS should wait between two send bursts to a given port (default
is ``0``). When the intermediate queue fills up before that time is over, the
buffered packet batch is sent immediately::

    $ ovs-vsctl set Open_vSwitch . other_config:tx-flush-interval=50

This parameter influences both throughput and latency, depending on the
traffic load on the port. In general, lower values decrease latency while
higher values may be useful to achieve higher throughput.

Low traffic (``packet rate < 1 / tx-flush-interval``) should not experience
any significant latency or throughput increase as packets are forwarded
immediately.

At intermediate load levels
(``1 / tx-flush-interval < packet rate < 32 / tx-flush-interval``) traffic
should experience an average latency increase of up to
``1 / 2 * tx-flush-interval`` and a possible throughput improvement.

Very high traffic (``packet rate >> 32 / tx-flush-interval``) should
experience an average latency increase equal to ``32 / (2 * packet rate)``.
Most send batches in this case will contain the maximum number of packets
(``32``).

A ``tx-flush-interval`` value of ``50`` microseconds has been shown to
provide a good performance increase in a ``PHY-VM-PHY`` scenario on an
``x86`` system for interrupt-driven guests, while keeping the latency
increase at a reasonable level:

  https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341628.html

.. note::
   The throughput impact of this option depends significantly on the scenario
   and the traffic patterns. For example: a ``tx-flush-interval`` value of
   ``50`` microseconds shows performance degradation in a ``PHY-VM-PHY`` with
   bonded PHY scenario while testing with ``256 - 1024`` packet flows:

   https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341700.html

The average number of packets per output batch can be checked in PMD stats::

    $ ovs-appctl dpif-netdev/pmd-stats-show

Limitations
-----------

- Currently DPDK ports do not use HW offload functionality.
- Network Interface Firmware requirements: Each release of DPDK is validated
  against a specific firmware version for a supported Network Interface. New
  firmware versions introduce bug fixes, performance improvements and new
  functionality that DPDK leverages. The validated firmware versions are
  available as part of the release notes for DPDK. It is recommended that
  users update Network Interface firmware to match what has been validated
  for the DPDK release.

  The latest list of validated firmware versions can be found in the `DPDK
  release notes`_.

.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_17_11.html

- Upper bound MTU: DPDK device drivers differ in how the L2 frame for a given
  MTU value is calculated, e.g. the i40e driver includes 2 x vlan headers in
  MTU overhead, the em driver includes 1 x vlan header, and the ixgbe driver
  does not include a vlan header in overhead. Currently it is not possible
  for OVS DPDK to know what upper bound MTU value is supported for a given
  device. As such, OVS DPDK must provision for the case where the L2 frame
  for a given MTU includes 2 x vlan headers. This reduces the upper bound MTU
  value for devices that do not include vlan headers in their L2 frames by 8
  bytes, e.g. the upper bound MTU for ixgbe devices is reduced from 9710 to
  9702. This workaround is temporary and is expected to be removed once a
  method is provided by DPDK to query the upper bound MTU value for a given
  device.

Reporting Bugs
--------------

Report problems to bugs@openvswitch.org.