..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

======================
Open vSwitch with DPDK
======================

This document describes how to build and install Open vSwitch using a DPDK
datapath. Open vSwitch can use the DPDK library to operate entirely in
userspace.

.. seealso::

   The :doc:`releases FAQ </faq/releases>` lists support for the required
   versions of DPDK for each version of Open vSwitch.

Build requirements
------------------

In addition to the requirements described in :doc:`general`, building Open
vSwitch with DPDK will require the following:

- DPDK 17.11.2

- A `DPDK supported NIC`_

  Only required when physical ports are in use

- A suitable kernel

  On Linux distributions running kernel version >= 3.0, only `IOMMU` needs to
  be enabled via the GRUB cmdline, assuming you are using **VFIO**. For older
  kernels, ensure the kernel is built with ``UIO``, ``HUGETLBFS``,
  ``PROC_PAGE_MONITOR``, ``HPET``, and ``HPET_MMAP`` support. If these are not
  present, it will be necessary to upgrade your kernel or build a custom
  kernel with these flags enabled.
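
  For example, with VFIO on an Intel system, enabling the IOMMU typically
  amounts to adding the following to the kernel cmdline in your bootloader
  configuration (an illustrative sketch; exact steps vary by distribution)::

      intel_iommu=on iommu=pt
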

Detailed system requirements can be found at `DPDK requirements`_.

.. _DPDK supported NIC: http://dpdk.org/doc/nics
.. _DPDK requirements: http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html

Installing
----------

Install DPDK
~~~~~~~~~~~~

#. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::

       $ cd /usr/src/
       $ wget http://fast.dpdk.org/rel/dpdk-17.11.2.tar.xz
       $ tar xf dpdk-17.11.2.tar.xz
       $ export DPDK_DIR=/usr/src/dpdk-stable-17.11.2
       $ cd $DPDK_DIR

#. (Optional) Configure DPDK as a shared library

   DPDK can be built as either a static library or a shared library. By
   default, it is configured for the former. If you wish to use the latter,
   set ``CONFIG_RTE_BUILD_SHARED_LIB=y`` in ``$DPDK_DIR/config/common_base``.

   .. note::

      Minor performance loss is expected when using OVS with a shared DPDK
      library compared to a static DPDK library.

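
   If you prefer to make this change from the shell, a one-liner along these
   lines should work (a convenience sketch, not an official step)::

       $ sed -i 's/CONFIG_RTE_BUILD_SHARED_LIB=n/CONFIG_RTE_BUILD_SHARED_LIB=y/' \
             $DPDK_DIR/config/common_base
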
#. Configure and install DPDK

   Build and install the DPDK library::

       $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
       $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
       $ make install T=$DPDK_TARGET DESTDIR=install

#. (Optional) Export the DPDK shared library location

   If DPDK was built as a shared library, export the path to this library for
   use when building OVS::

       $ export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib

.. _DPDK sources: http://dpdk.org/rel

Install OVS
~~~~~~~~~~~

OVS can be installed using different methods. For OVS to use the DPDK
datapath, it has to be configured with DPDK support (``--with-dpdk``).

.. note::
   This section focuses on a generic recipe that suits most cases. For
   distribution-specific instructions, refer to one of the more relevant
   guides.

.. _OVS sources: http://openvswitch.org/releases/

#. Ensure the standard OVS requirements, described in
   :ref:`general-build-reqs`, are installed

#. Bootstrap, if required, as described in :ref:`general-bootstrapping`

#. Configure the package using the ``--with-dpdk`` flag::

       $ ./configure --with-dpdk=$DPDK_BUILD

   where ``DPDK_BUILD`` is the path to the built DPDK library. This can be
   skipped if the DPDK library is installed in its default location.

   If no path is provided to ``--with-dpdk``, but a pkg-config configuration
   for libdpdk is available, the include paths will be generated via an
   equivalent ``pkg-config --cflags libdpdk``.

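
   For example, whether a system-wide DPDK installation is visible to
   pkg-config can be checked beforehand (an optional sanity check, not a
   required step)::

       $ pkg-config --modversion libdpdk
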
   .. note::
      While ``--with-dpdk`` is required, you can pass any other configuration
      option described in :ref:`general-configuring`.

#. Build and install OVS, as described in :ref:`general-building`

   Additional information can be found in :doc:`general`.

.. note::
   If you are using the Fedora or Red Hat package, the Open vSwitch daemon
   will run as a non-root user. This implies that you must have a working
   IOMMU. Visit the `RHEL README`__ for additional information.

__ https://github.com/openvswitch/ovs/blob/master/rhel/README.RHEL.rst

Setup
-----

Setup Hugepages
~~~~~~~~~~~~~~~

Allocate a number of 2M Huge pages:

- For persistent allocation of huge pages, write to the ``hugepages.conf``
  file in `/etc/sysctl.d`::

      $ echo 'vm.nr_hugepages=2048' > /etc/sysctl.d/hugepages.conf

- For run-time allocation of huge pages, use the ``sysctl`` utility::

      $ sysctl -w vm.nr_hugepages=N  # where N = No. of 2M huge pages

To verify hugepage configuration::

    $ grep HugePages_ /proc/meminfo

Mount the hugepages, if not already mounted by default::

    $ mount -t hugetlbfs none /dev/hugepages

.. _dpdk-vfio:

Setup DPDK devices using VFIO
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

VFIO is preferred to the UIO driver when using recent versions of DPDK. VFIO
requires support from both the kernel and BIOS. For the former, kernel
version > 3.6 must be used. For the latter, you must enable VT-d in the BIOS
and ensure it is configured via GRUB. To verify that VT-d is enabled in the
BIOS, run::

    $ dmesg | grep -e DMAR -e IOMMU

If VT-d is not enabled in the BIOS, enable it now.

To verify that VT-d is enabled in the kernel, run::

    $ cat /proc/cmdline | grep iommu=pt
    $ cat /proc/cmdline | grep intel_iommu=on

If VT-d is not enabled in the kernel, enable it now.

Once VT-d is correctly configured, load the required modules and bind the NIC
to the VFIO driver::

    $ modprobe vfio-pci
    $ /usr/bin/chmod a+x /dev/vfio
    $ /usr/bin/chmod 0666 /dev/vfio/*
    $ $DPDK_DIR/usertools/dpdk-devbind.py --bind=vfio-pci eth1
    $ $DPDK_DIR/usertools/dpdk-devbind.py --status

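
Devices can also be bound by PCI address rather than by interface name, which
is useful once the interface name no longer exists after unbinding the kernel
driver (the address below is the example device used later in this guide)::

    $ $DPDK_DIR/usertools/dpdk-devbind.py --bind=vfio-pci 0000:06:00.0
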
Setup OVS
~~~~~~~~~

Open vSwitch should be started as described in :doc:`general` with the
exception of ovs-vswitchd, which requires some special configuration to enable
DPDK functionality. DPDK configuration arguments can be passed to ovs-vswitchd
via the ``other_config`` column of the ``Open_vSwitch`` table. At a minimum,
the ``dpdk-init`` option must be set to either ``true`` or ``try``.
For example::

    $ export PATH=$PATH:/usr/local/share/openvswitch/scripts
    $ export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
    $ ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start

There are many other configuration options; the most important of them are
listed below. Defaults will be provided for all values not explicitly set.

``dpdk-init``
  Specifies whether OVS should initialize and support DPDK ports. This field
  can be either ``true`` or ``try``.
  A value of ``true`` will cause the ovs-vswitchd process to abort on
  initialization failure.
  A value of ``try`` means that the ovs-vswitchd process will continue running
  even if the EAL initialization fails.

``dpdk-lcore-mask``
  Specifies the CPU cores on which dpdk lcore threads should be spawned and
  expects a hex string (e.g. ``0x123``).

``dpdk-socket-mem``
  Comma-separated list of memory amounts to pre-allocate from hugepages on
  specific sockets.

``dpdk-hugepage-dir``
  Directory where hugetlbfs is mounted.

``vhost-sock-dir``
  Option to set the path to the vhost-user unix socket files.

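
For example, to pin the DPDK lcore threads to core 2 (an illustrative mask;
choose cores appropriate to your system)::

    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x4
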
If allocating more than one GB hugepage, you can configure the amount of
memory used from any given NUMA node. For example, to use 1GB from NUMA node
0 and 0GB for all other NUMA nodes, run::

    $ ovs-vsctl --no-wait set Open_vSwitch . \
        other_config:dpdk-socket-mem="1024,0"

or::

    $ ovs-vsctl --no-wait set Open_vSwitch . \
        other_config:dpdk-socket-mem="1024"

.. note::
   Changing any of these options requires restarting the ovs-vswitchd
   application.

See the section ``Performance Tuning`` for important DPDK customizations.

Validating
----------

DPDK support can be confirmed by validating the ``dpdk_initialized`` boolean
value from the ovsdb. A value of ``true`` means that the DPDK EAL
initialization succeeded::

    $ ovs-vsctl get Open_vSwitch . dpdk_initialized
    true

Additionally, the library version linked to ovs-vswitchd can be confirmed
from the ovs-vswitchd logs, or by running either of the following commands::

    $ ovs-vswitchd --version
    ovs-vswitchd (Open vSwitch) 2.9.0
    DPDK 17.11.0
    $ ovs-vsctl get Open_vSwitch . dpdk_version
    "DPDK 17.11.0"

At this point you can use ovs-vsctl to set up bridges and other Open vSwitch
features. Since we have configured the DPDK datapath, we will use DPDK-type
ports. For example, to create a userspace bridge named ``br0`` and add two
``dpdk`` ports to it, run::

    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
    $ ovs-vsctl add-port br0 myportnameone -- set Interface myportnameone \
        type=dpdk options:dpdk-devargs=0000:06:00.0
    $ ovs-vsctl add-port br0 myportnametwo -- set Interface myportnametwo \
        type=dpdk options:dpdk-devargs=0000:06:00.1

DPDK devices will not be available for use until a valid dpdk-devargs is
specified.
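
Whether a port has been initialised successfully can be checked by inspecting
its database record, for example (the ``error`` and ``status`` columns are
the fields of interest here)::

    $ ovs-vsctl list Interface myportnameone
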

Refer to ovs-vsctl(8) and :doc:`/howto/dpdk` for more details.

Performance Tuning
------------------

To achieve optimal OVS performance, the system can be tuned in several areas:
BIOS settings, GRUB cmdline additions, an understanding of the NUMA topology,
and careful selection of PCIe slots for NIC placement.

.. note::

   This section is optional. Once installed as described above, OVS with DPDK
   will work out of the box.

Recommended BIOS Settings
~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table:: Recommended BIOS Settings
   :header-rows: 1

   * - Setting
     - Value
   * - C3 Power State
     - Disabled
   * - C6 Power State
     - Disabled
   * - MLC Streamer
     - Enabled
   * - MLC Spatial Prefetcher
     - Enabled
   * - DCU Data Prefetcher
     - Enabled
   * - DCA
     - Enabled
   * - CPU Power and Performance
     - Performance
   * - Memory RAS and Performance Config -> NUMA optimized
     - Enabled

PCIe Slot Selection
~~~~~~~~~~~~~~~~~~~

The fastpath performance can be affected by factors related to the placement
of the NIC, such as the channel speed between the PCIe slot and the CPU, or
the proximity of the PCIe slot to the CPU cores running the DPDK application.
Listed below are the steps to identify the right PCIe slot (a sysfs check is
shown after the list).

#. Retrieve host details using ``dmidecode``. For example::

       $ dmidecode -t baseboard | grep "Product Name"

#. Download the technical specification for the product listed, e.g. S2600WT2

#. Check the Product Architecture Overview for the riser slot placement, CPU
   sharing information, and PCIe channel speeds

   For example: on the S2600WT, CPU1 and CPU2 share Riser Slot 1, with the
   channel speed between CPU1 and Riser Slot 1 at 32GB/s, and between CPU2
   and Riser Slot 1 at 16GB/s. Running the DPDK application on CPU1 cores
   with the NIC inserted into the Riser card slots will optimize OVS
   performance in this case.

#. Check the Riser Card #1 - Root Port mapping information for the available
   slots and individual bus speeds. On the S2600WT, slots 1 and 2 have high
   bus speeds and are potential slots for NIC placement.
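
In addition to the board documentation, the NUMA node that a NIC is attached
to can be read directly from sysfs; for example, for the device used
elsewhere in this guide::

    $ cat /sys/bus/pci/devices/0000:06:00.0/numa_node
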

Advanced Hugepage Setup
~~~~~~~~~~~~~~~~~~~~~~~

Allocate and mount 1 GB hugepages.

- For persistent allocation of huge pages, add the following options to the
  kernel bootline::

      default_hugepagesz=1GB hugepagesz=1G hugepages=N

  For platforms supporting multiple huge page sizes, add multiple options::

      default_hugepagesz=<size> hugepagesz=<size> hugepages=N

  where:

  ``N``
    number of huge pages requested
  ``size``
    huge page size with an optional suffix ``[kKmMgG]``

- For run-time allocation of huge pages::

      $ echo N > /sys/devices/system/node/nodeX/hugepages/hugepages-1048576kB/nr_hugepages

  where:

  ``N``
    number of huge pages requested
  ``X``
    NUMA Node

  .. note::
     For run-time allocation of 1G huge pages, the Contiguous Memory Allocator
     (``CONFIG_CMA``) has to be supported by the kernel; check your Linux
     distribution.

Now mount the huge pages, if you have not already done so::

    $ mount -t hugetlbfs -o pagesize=1G none /dev/hugepages

Isolate Cores
~~~~~~~~~~~~~

The ``isolcpus`` option can be used to isolate cores from the Linux scheduler.
The isolated cores can then be dedicated to running HPC applications or
threads, improving application performance thanks to reduced context switching
and minimal cache thrashing. To run platform logic on core 0 and isolate cores
1 through 19 from the scheduler, add ``isolcpus=1-19`` to the GRUB cmdline.

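
A minimal sketch of the corresponding bootloader change, assuming a GRUB-based
system (the file shown is typical; regenerating the GRUB configuration
afterwards is distribution-specific)::

    # /etc/default/grub
    GRUB_CMDLINE_LINUX="isolcpus=1-19"
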
.. note::
   In some circumstances, core isolation has been found to offer only minimal
   benefit, due to the maturity of the Linux scheduler.

Compiler Optimizations
~~~~~~~~~~~~~~~~~~~~~~

The default compiler optimization level is ``-O2``. Changing this to a more
aggressive compiler optimization such as ``-O3 -march=native`` with gcc
(verified on 5.3.1) can produce performance gains, though not significant
ones. ``-march=native`` will produce code optimized for the local machine and
should only be used when the software is compiled on the testbed itself.
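
A hedged example of passing these flags when configuring OVS (standard
autoconf behavior; drop ``-march=native`` if the binaries must run on other
machines)::

    $ ./configure --with-dpdk=$DPDK_BUILD CFLAGS="-O3 -march=native"
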

Multiple Poll-Mode Driver Threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With pmd multi-threading support, OVS creates one pmd thread for each NUMA
node by default, if there is at least one DPDK interface from that NUMA node
added to OVS. However, in cases where there are multiple ports/rxq's producing
traffic, performance can be improved by creating multiple pmd threads running
on separate cores. These pmd threads can share the workload by each being
responsible for different ports/rxq's. Assignment of ports/rxq's to pmd
threads is done automatically.

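
The resulting assignment of rx queues to pmd threads can be inspected at any
time with (the exact output format varies between OVS releases)::

    $ ovs-appctl dpif-netdev/pmd-rxq-show
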
PMD threads can be created and pinned to CPU cores by setting
``pmd-cpu-mask``: a set bit in the mask means a pmd thread is created and
pinned to the corresponding CPU core. For example, to run pmd threads on
cores 1 and 2::

    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6

When using dpdk and dpdkvhostuser ports in a bi-directional VM loopback as
shown below, spreading the workload over 2 or 4 pmd threads shows significant
improvements, as there will be more total CPU occupancy available::

    NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1

Refer to ovs-vswitchd.conf.db(5) for additional information on configuration
options.

Affinity
~~~~~~~~

For superior performance, DPDK pmd threads and QEMU vCPU threads need to be
affinitized accordingly.

- PMD thread Affinity

  A poll mode driver (pmd) thread handles the I/O of all DPDK interfaces
  assigned to it. A pmd thread polls the ports for incoming packets, switches
  the packets, and sends them to the tx port. A pmd thread is CPU bound, and
  needs to be affinitized to isolated cores for optimum performance. Even
  though a pmd thread may exist, the thread only starts consuming CPU cycles
  if there is at least one receive queue assigned to it.

  .. note::
     On NUMA systems, PCI devices are also local to a NUMA node. Unbound rx
     queues for a PCI device will be assigned to a pmd on its local NUMA node
     if a non-isolated pmd exists on that NUMA node. If not, the queue will be
     assigned to a non-isolated pmd on a remote NUMA node. This will result in
     reduced maximum throughput on that device and possibly on other devices
     assigned to that pmd thread. If such a queue assignment is made, a
     warning message will be logged: "There's no available (non-isolated) pmd
     thread on numa node N. Queue Q on port P will be assigned to the pmd on
     core C (numa node N'). Expect reduced performance."

  Binding PMD threads to cores is described in the above section
  ``Multiple Poll-Mode Driver Threads``.

- QEMU vCPU thread Affinity

  A VM performing simple packet forwarding or running complex packet
  pipelines has to ensure that the vCPU threads performing the work have as
  much CPU occupancy as possible.

  For example, on a multicore VM, multiple QEMU vCPU threads will be spawned.
  When the DPDK ``testpmd`` application that does packet forwarding is
  invoked, the ``taskset`` command should be used to affinitize the vCPU
  threads to the dedicated isolated cores on the host system.
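
  For example, to pin a vCPU thread whose PID is 1776 to host core 4 (both
  values are purely illustrative)::

      $ taskset -pc 4 1776
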

Enable HyperThreading
~~~~~~~~~~~~~~~~~~~~~

With HyperThreading, or SMT, enabled, a physical core appears as two logical
cores. SMT can be utilized to spawn worker threads on logical cores of the
same physical core, thereby saving additional cores.

With DPDK, when pinning pmd threads to logical cores, care must be taken to
set the correct bits of the ``pmd-cpu-mask`` to ensure that the pmd threads
are pinned to SMT siblings.

Take a sample system configuration with 2 sockets, each with a 10-core
processor, and HT enabled. This gives us a total of 40 logical cores. To
identify the physical core shared by two logical cores, run::

    $ cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list

where ``N`` is the logical core number.

In this example, it would show that cores ``1`` and ``21`` share the same
physical core. Logical cores can be specified in pmd-cpu-masks similarly to
physical cores, as described in ``Multiple Poll-Mode Driver Threads``.
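
For instance, for logical core 1 on such a system (output shown is
illustrative)::

    $ cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
    1,21

A ``pmd-cpu-mask`` of ``0x200002`` would then pin pmd threads to both SMT
siblings of that physical core (bit 1 for core 1 and bit 21 for core 21).
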

NUMA/Cluster-on-Die
~~~~~~~~~~~~~~~~~~~

Ideally, inter-NUMA datapaths should be avoided where possible, as packets
will go across QPI and there may be a slight performance penalty when compared
with intra-NUMA datapaths. On Intel Xeon Processor E5 v3, Cluster On Die is
introduced on models that have 10 cores or more. This makes it possible to
logically split a socket into two NUMA regions, and again it is preferred
where possible to keep critical datapaths within a single cluster.

It is good practice to ensure that threads that are in the datapath are pinned
to cores in the same NUMA area, e.g. pmd threads and QEMU vCPUs responsible
for forwarding. If DPDK is built with ``CONFIG_RTE_LIBRTE_VHOST_NUMA=y``,
vHost User ports automatically detect the NUMA socket of the QEMU vCPUs and
will be serviced by a PMD from the same node, provided a core on this node is
enabled in the ``pmd-cpu-mask``. ``libnuma`` packages are required for this
feature.
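
The NUMA topology of the host (nodes, their CPUs and memory) can be inspected
with, for example (requires the ``numactl`` package)::

    $ numactl --hardware
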

Binding PMD threads is described in the above section
``Multiple Poll-Mode Driver Threads``.

DPDK Physical Port Rx Queues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    $ ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>

The above command sets the number of rx queues for the DPDK physical
interface. The rx queues are assigned to pmd threads on the same NUMA node in
a round-robin fashion.
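
For example, to configure two rx queues on the port added earlier in this
guide (the port name and queue count are illustrative)::

    $ ovs-vsctl set Interface myportnameone options:n_rxq=2
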

.. _dpdk-queues-sizes:

DPDK Physical Port Queue Sizes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    $ ovs-vsctl set Interface dpdk0 options:n_rxq_desc=<integer>
    $ ovs-vsctl set Interface dpdk0 options:n_txq_desc=<integer>

The above commands set the number of rx/tx descriptors that the NIC associated
with dpdk0 will be initialised with.

Different ``n_rxq_desc`` and ``n_txq_desc`` configurations yield different
benefits in terms of throughput and latency for different scenarios.
Generally, smaller queue sizes can have a positive impact on latency at the
expense of throughput. The opposite is often true for larger queue sizes.
Note: increasing the number of rx descriptors, e.g. to 4096, may have a
negative impact on performance because non-vectorised DPDK rx functions may be
used. This is dependent on the driver in use, but is true for the commonly
used i40e and ixgbe DPDK drivers.

Exact Match Cache
~~~~~~~~~~~~~~~~~

Each pmd thread contains one Exact Match Cache (EMC). After initial flow setup
in the datapath, the EMC contains a single table and provides the lowest level
(fastest) switching for DPDK ports. If there is a miss in the EMC, the next
level where switching will occur is the datapath classifier. Missing in the
EMC and looking up in the datapath classifier incurs a significant performance
penalty. If lookup misses occur in the EMC because it is too small to handle
the number of flows, its size can be increased. The EMC size can be modified
by editing the define ``EM_FLOW_HASH_SHIFT`` in ``lib/dpif-netdev.c``.
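
For example, the define can be located in the source tree before rebuilding::

    $ grep -n 'EM_FLOW_HASH_SHIFT' lib/dpif-netdev.c
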

As mentioned above, an EMC is per pmd thread. An alternative way of increasing
the aggregate number of possible flow entries in the EMC, and of avoiding
datapath classifier lookups, is to have multiple pmd threads running.

Rx Mergeable Buffers
~~~~~~~~~~~~~~~~~~~~

Rx mergeable buffers is a virtio feature that allows chaining of multiple
virtio descriptors to handle large packet sizes. Large packets are handled by
reserving and chaining multiple free descriptors together. Mergeable buffer
support is negotiated between the virtio driver and virtio device and is
supported by the DPDK vhost library. This behavior is supported and enabled by
default; however, if the user knows that rx mergeable buffers are not needed,
i.e. jumbo frames are not needed, it can be forced off by adding
``mrg_rxbuf=off`` to the QEMU command line options. By not reserving multiple
chains of descriptors, more individual virtio descriptors are made available
for rx to the guest using dpdkvhost ports, and this can improve performance.
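
A hypothetical fragment of a QEMU command line with the feature disabled (only
the ``mrg_rxbuf=off`` property is relevant here; the rest of the device
definition will differ per setup)::

    -device virtio-net-pci,netdev=net0,mrg_rxbuf=off
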

Output Packet Batching
~~~~~~~~~~~~~~~~~~~~~~

To take advantage of batched transmit functions, OVS collects packets in
intermediate queues before sending them when processing a batch of received
packets. Even if packets are matched by different flows, OVS uses a single
send operation for all packets destined to the same output port.

Furthermore, OVS is able to buffer packets in these intermediate queues for a
configurable amount of time to reduce the frequency of send bursts at medium
load levels, when the packet receive rate is high but the receive batch size
is still very small. This is particularly beneficial for packets transmitted
to VMs using an interrupt-driven virtio driver, where the interrupt overhead
is significant for the OVS PMD, the host operating system and the guest
driver.

The ``tx-flush-interval`` parameter can be used to specify the time in
microseconds OVS should wait between two send bursts to a given port (default
is ``0``). When the intermediate queue fills up before that time is over, the
buffered packet batch is sent immediately::

    $ ovs-vsctl set Open_vSwitch . other_config:tx-flush-interval=50

This parameter influences both throughput and latency, depending on the
traffic load on the port. In general, lower values decrease latency while
higher values may be useful to achieve higher throughput.

Low traffic (``packet rate < 1 / tx-flush-interval``) should not experience
any significant latency or throughput increase as packets are forwarded
immediately.

At intermediate load levels
(``1 / tx-flush-interval < packet rate < 32 / tx-flush-interval``) traffic
should experience an average latency increase of up to
``1 / 2 * tx-flush-interval`` and a possible throughput improvement.

Very high traffic (``packet rate >> 32 / tx-flush-interval``) should
experience an average latency increase equal to ``32 / (2 * packet rate)``.
Most send batches in this case will contain the maximum number of packets
(``32``).

A ``tx-flush-interval`` value of ``50`` microseconds has been shown to provide
a good performance increase in a ``PHY-VM-PHY`` scenario on an ``x86`` system
for interrupt-driven guests, while keeping the latency increase at a
reasonable level:

  https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341628.html

.. note::
   The throughput impact of this option depends significantly on the scenario
   and the traffic patterns. For example, a ``tx-flush-interval`` value of
   ``50`` microseconds shows performance degradation in a ``PHY-VM-PHY`` with
   bonded PHY scenario while testing with ``256 - 1024`` packet flows:

   https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341700.html

The average number of packets per output batch can be checked in PMD stats::

    $ ovs-appctl dpif-netdev/pmd-stats-show

Limitations
------------

- Currently, DPDK ports do not use HW offload functionality.
- Network Interface Firmware requirements: Each release of DPDK is
  validated against a specific firmware version for a supported Network
  Interface. New firmware versions introduce bug fixes, performance
  improvements and new functionality that DPDK leverages. The validated
  firmware versions are available as part of the release notes for
  DPDK. It is recommended that users update Network Interface firmware
  to match what has been validated for the DPDK release.

  The latest list of validated firmware versions can be found in the `DPDK
  release notes`_.

.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_17_11.html

- Upper bound MTU: DPDK device drivers differ in how the L2 frame for a
  given MTU value is calculated, e.g. the i40e driver includes 2 x vlan
  headers in the MTU overhead, the em driver includes 1 x vlan header, and
  the ixgbe driver does not include a vlan header in the overhead. Currently
  it is not possible for OVS DPDK to know what upper bound MTU value is
  supported for a given device. As such, OVS DPDK must provision for the case
  where the L2 frame for a given MTU includes 2 x vlan headers. This reduces
  the upper bound MTU value for devices that do not include vlan headers in
  their L2 frames by 8 bytes, e.g. the upper bound MTU for ixgbe devices is
  reduced from 9710 to 9702. This workaround is temporary and is expected to
  be removed once a method is provided by DPDK to query the upper bound MTU
  value for a given device.

Reporting Bugs
--------------

Report problems to bugs@openvswitch.org.