OVS DPDK ADVANCED INSTALL GUIDE
===============================

## Contents

1. [Overview](#overview)
2. [Building Shared Library](#build)
3. [System configuration](#sysconf)
4. [Performance Tuning](#perftune)
5. [OVS Testcases](#ovstc)
6. [Vhost Walkthrough](#vhost)
7. [QOS](#qos)
8. [Rate Limiting](#rl)
9. [Flow Control](#fc)
10. [Pdump](#pdump)
11. [Jumbo Frames](#jumbo)
12. [Vsperf](#vsperf)

## <a name="overview"></a> 1. Overview

The Advanced Install Guide explains how to improve OVS performance using the
DPDK datapath. This guide also provides information on tuning, system
configuration, troubleshooting, static code analysis and testcases.

## <a name="build"></a> 2. Building Shared Library

DPDK can be built as a static or a shared library and is linked by applications
using the DPDK datapath. This section lists the steps to build the shared
library and dynamically link DPDK against OVS.

Note: A minor performance loss is seen with OVS when using the shared DPDK
library as compared to the static library.

Check the [INSTALL DPDK] and [INSTALL OVS] sections of INSTALL.DPDK for
download instructions for DPDK and OVS.

  * Configure the DPDK library

  Set `CONFIG_RTE_BUILD_SHARED_LIB=y` in `config/common_base`
  to generate the shared DPDK library.

  * Build and install DPDK

  For the default install (without IVSHMEM), set `export DPDK_TARGET=x86_64-native-linuxapp-gcc`.
  For the IVSHMEM case, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`.

  ```
  export DPDK_DIR=/usr/src/dpdk-16.07
  export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
  make install T=$DPDK_TARGET DESTDIR=install
  ```

  * Build, install and set up OVS.

  Export the DPDK shared library location and set up OVS as listed in
  section 3.3 of INSTALL.DPDK.

  `export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib`
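
  To confirm that OVS was indeed linked dynamically against DPDK, a quick check
  such as the following can be used (a minimal sketch; the binary path assumes
  an in-tree build, adjust it for an installed `ovs-vswitchd`):

  ```
  # List the DPDK shared objects that ovs-vswitchd is linked against
  ldd vswitchd/ovs-vswitchd | grep librte
  ```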
## <a name="sysconf"></a> 3. System Configuration

To achieve optimal OVS performance, the system can be configured accordingly;
this includes BIOS tweaks, Grub cmdline additions, a good understanding of the
NUMA nodes and careful selection of PCIe slots for NIC placement.

### 3.1 Recommended BIOS settings

  ```
  | Settings                   | values      | comments
  |----------------------------|-------------|----------
  | C3 power state             | Disabled    | -
  | C6 power state             | Disabled    | -
  | MLC Streamer               | Enabled     | -
  | MLC Spatial prefetcher     | Enabled     | -
  | DCU Data prefetcher        | Enabled     | -
  | DCA                        | Enabled     | -
  | CPU power and performance  | Performance | -
  | Memory RAS and perf        |             | -
  |   config -> NUMA optimized | Enabled     | -
  ```

### 3.2 PCIe Slot Selection

The fastpath performance also depends on factors such as the NIC placement,
the channel speed between the PCIe slot and the CPU, and the proximity of the
PCIe slot to the CPU cores running the DPDK application. Listed below are the
steps to identify the right PCIe slot; a runtime check of the negotiated link
speed is sketched after the list.

- Retrieve host details using the cmd `dmidecode -t baseboard | grep "Product Name"`
- Download the technical specification for the product listed, e.g. S2600WT2.
- Check the Product Architecture Overview for the riser slot placement,
  CPU sharing info and PCIe channel speeds.

  Example: On the S2600WT, CPU1 and CPU2 share Riser Slot 1, with the channel
  speed between CPU1 and Riser Slot 1 at 32GB/s and between CPU2 and Riser
  Slot 1 at 16GB/s. Running the DPDK app on CPU1 cores with the NIC inserted
  into the riser card slots will optimize OVS performance in this case.

- Check the Riser Card #1 - Root Port mapping information for the available
  slots and individual bus speeds. On the S2600WT, slot 1 and slot 2 have high
  bus speeds and are potential slots for NIC placement.
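
As a complementary runtime check (a minimal sketch; the PCI bus address below
is a placeholder, take the real one from `lspci` output on your host), the
negotiated PCIe link speed and width of the NIC's slot can be read with:

```
# List Ethernet devices and note the NIC's bus address (e.g. 05:00.0)
lspci | grep -i ethernet
# Show the negotiated link speed/width for that device
sudo lspci -s 05:00.0 -vv | grep -i "LnkSta:"
```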
### 3.3 Advanced Hugepage setup

  Allocate and mount 1G huge pages:

  - For persistent allocation of huge pages, add the following options to the
    kernel bootline:

    `default_hugepagesz=1GB hugepagesz=1G hugepages=N`

    For platforms supporting multiple huge page sizes, add the options

    `default_hugepagesz=<size> hugepagesz=<size> hugepages=N`
    where 'N' = number of huge pages requested, 'size' = huge page size,
    optional suffix [kKmMgG]

  - For run-time allocation of huge pages

    `echo N > /sys/devices/system/node/nodeX/hugepages/hugepages-1048576kB/nr_hugepages`
    where 'N' = number of huge pages requested, 'X' = NUMA node

    Note: For run-time allocation of 1G huge pages, the Contiguous Memory
    Allocator (CONFIG_CMA) has to be supported by the kernel; check your
    Linux distro.

  - Mount huge pages

    `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`

    Note: Mount hugepages if not already mounted by default.
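
  To verify that the pages were actually allocated and mounted, the standard
  kernel interfaces can be checked (a minimal sketch; node numbers depend on
  the host):

  ```
  # Overall hugepage counters
  grep -i huge /proc/meminfo
  # Free 1G pages per NUMA node
  cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages
  # Confirm the hugetlbfs mount
  mount | grep hugetlbfs
  ```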
### 3.4 Enable Hyperthreading

  Requires BIOS changes.

  With HT/SMT enabled, a physical core appears as two logical cores.
  SMT can be utilized to spawn worker threads on logical cores of the same
  physical core, thereby saving additional cores.

  With DPDK, when pinning pmd threads to logical cores, care must be taken
  to set the correct bits in the pmd-cpu-mask to ensure that the pmd threads
  are pinned to SMT siblings.

  Example system configuration:
  Dual socket machine, 2x 10 core processors, HT enabled, 40 logical cores

  To use two logical cores which share the same physical core for pmd threads,
  the following command can be used to identify a pair of logical cores.

  `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, where N is
  the logical core number.

  In this example, it would show that cores 1 and 21 share the same physical
  core. The pmd-cpu-mask to enable two pmd threads running on these two
  logical cores (one physical core) is:

  `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=200002`
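
  The mask can be derived directly from the sibling pair reported above (a
  minimal sketch, assuming siblings 1 and 21 as in the example):

  ```
  # Show the SMT sibling of logical core 1 (expected: "1,21" in this example)
  cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
  # Build a hex mask with bits 1 and 21 set -> 200002
  printf '%x\n' $(( (1 << 1) | (1 << 21) ))
  ```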
### 3.5 Isolate cores

  The 'isolcpus' option can be used to isolate cores from the Linux scheduler.
  The isolated cores can then be dedicated to running HPC applications/threads.
  This helps achieve better application performance due to zero context
  switching and minimal cache thrashing. To run platform logic on core 0 and
  isolate cores 1 to 19 from the scheduler, add `isolcpus=1-19` to the GRUB
  cmdline, as sketched below.

  Note: It has been verified that in some circumstances core isolation brings
  minimal advantage, due to the mature Linux scheduler.
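
  A minimal sketch of the corresponding GRUB configuration (paths and the
  regeneration command assume a grub2-based distro such as Fedora/CentOS; the
  hugepage options from section 3.3 are shown combined with isolcpus and are
  illustrative values):

  ```
  # In /etc/default/grub, append to the existing GRUB_CMDLINE_LINUX, e.g.:
  #   GRUB_CMDLINE_LINUX="... isolcpus=1-19 default_hugepagesz=1G hugepagesz=1G hugepages=8"
  # Then regenerate the grub config and reboot:
  grub2-mkconfig -o /boot/grub2/grub.cfg
  ```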
### 3.6 NUMA/Cluster on Die

  Ideally inter-NUMA datapaths should be avoided where possible, as packets
  will go across QPI and there may be a slight performance penalty when
  compared with intra-NUMA datapaths. On the Intel Xeon Processor E5 v3,
  Cluster On Die is introduced on models that have 10 cores or more.
  This makes it possible to logically split a socket into two NUMA regions,
  and again it is preferred where possible to keep critical datapaths
  within one cluster.

  It is good practice to ensure that threads that are in the datapath are
  pinned to cores in the same NUMA area, e.g. pmd threads and the QEMU vCPUs
  responsible for forwarding. If DPDK is built with
  CONFIG_RTE_LIBRTE_VHOST_NUMA=y, vHost User ports automatically
  detect the NUMA socket of the QEMU vCPUs and will be serviced by a PMD
  from the same node, provided a core on this node is enabled in the
  pmd-cpu-mask. libnuma packages are required for this feature.
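
  Before pinning, the NUMA node of the NIC and the host's CPU-to-node layout
  can be checked as follows (a minimal sketch; the PCI address is a placeholder
  and `numactl` may need to be installed):

  ```
  # NUMA node of a candidate DPDK NIC (PCI address is an example)
  cat /sys/bus/pci/devices/0000:05:00.0/numa_node
  # CPU and memory layout per NUMA node
  numactl --hardware
  ```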
### 3.7 Compiler Optimizations

  The default compiler optimization level is '-O2'. Changing this to a more
  aggressive optimization such as '-O3 -march=native' with gcc (verified on
  5.3.1) can produce performance gains, though they are not significant.
  '-march=native' produces code optimized for the local machine and should
  only be used when the software is compiled on the testbed it will run on.
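
  For example (a minimal sketch; assumes OVS is being built from source as
  described in INSTALL.DPDK), the flags can be passed to the OVS build via
  CFLAGS:

  ```
  # Build OVS with more aggressive optimizations tuned for the local CPU
  make CFLAGS='-O3 -march=native'
  ```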
## <a name="perftune"></a> 4. Performance Tuning

### 4.1 Affinity

For superior performance, DPDK pmd threads and Qemu vCPU threads
need to be affinitized accordingly.

  * PMD thread Affinity

  A poll mode driver (pmd) thread handles the I/O of all DPDK
  interfaces assigned to it. A pmd thread shall poll the ports
  for incoming packets, switch the packets and send them to the tx port.
  A pmd thread is CPU bound, and needs to be affinitized to isolated
  cores for optimum performance.

  By setting a bit in the mask, a pmd thread is created and pinned
  to the corresponding CPU core. e.g. to run a pmd thread on core 2:

  `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`

  Note: A pmd thread on a NUMA node is only created if there is
  at least one DPDK interface from that NUMA node added to OVS.

  * Qemu vCPU thread Affinity

  A VM performing simple packet forwarding or running complex packet
  pipelines has to ensure that the vCPU threads performing the work have
  as much CPU occupancy as possible.

  Example: On a multicore VM, multiple QEMU vCPU threads shall be spawned;
  when the DPDK 'testpmd' application that does the packet forwarding is
  invoked, the 'taskset' cmd should be used to affinitize the vCPU threads
  to dedicated isolated cores on the host system, as sketched below.
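
  A minimal sketch of pinning the vCPU threads from the host (the thread IDs
  and core numbers are placeholders, and a single running VM is assumed):

  ```
  # List the thread IDs (SPID) of the QEMU process and identify the vCPU threads
  ps -T -p $(pidof qemu-system-x86_64) -o spid,comm
  # Pin each vCPU thread to a dedicated isolated host core
  taskset -pc 4 <vcpu_tid_0>
  taskset -pc 5 <vcpu_tid_1>
  ```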
### 4.2 Multiple poll mode driver threads

  With pmd multi-threading support, OVS creates one pmd thread for each NUMA
  node by default. However, in cases where there are multiple ports/rxqs
  producing traffic, performance can be improved by creating multiple pmd
  threads running on separate cores. These pmd threads can then share the
  workload by each being responsible for different ports/rxqs. Assignment of
  ports/rxqs to pmd threads is done automatically.

  A set bit in the mask means a pmd thread is created and pinned
  to the corresponding CPU core. e.g. to run pmd threads on cores 1 and 2:

  `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`

  For example, when using dpdk and dpdkvhostuser ports in a bi-directional
  VM loopback as shown below, spreading the workload over 2 or 4 pmd
  threads shows significant improvements as there will be more total CPU
  occupancy available.

  NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
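
  Once multiple pmd threads are running, their utilization can be checked from
  the vswitch (a minimal sketch; the exact output format varies between OVS
  releases):

  ```
  # Per-pmd packet and cycle statistics
  ovs-appctl dpif-netdev/pmd-stats-show
  # Reset the counters once the system reaches a steady state
  ovs-appctl dpif-netdev/pmd-stats-clear
  ```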
### 4.3 DPDK physical port Rx Queues

  `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`

  The command above sets the number of rx queues for a DPDK physical interface.
  The rx queues are assigned to pmd threads on the same NUMA node in a
  round-robin fashion.
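
  The resulting rxq-to-pmd distribution can be inspected with the command
  below (a minimal sketch):

  ```
  # Show which rx queues are polled by which pmd thread
  ovs-appctl dpif-netdev/pmd-rxq-show
  ```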
### 4.4 Exact Match Cache

  Each pmd thread contains one EMC. After initial flow setup in the
  datapath, the EMC contains a single table and provides the lowest level
  (fastest) switching for DPDK ports. If there is a miss in the EMC then
  the next level where switching will occur is the datapath classifier.
  Missing in the EMC and looking up in the datapath classifier incurs a
  significant performance penalty. If lookup misses occur in the EMC
  because it is too small to handle the number of flows, its size can
  be increased. The EMC size can be modified by editing the define
  EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c.

  As mentioned above, an EMC is per pmd thread. So an alternative way of
  increasing the aggregate number of possible flow entries in the EMC and
  avoiding datapath classifier lookups is to have multiple pmd threads
  running. This can be done as described in section 4.2.
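
  A minimal sketch of the define-based tuning described above (the default
  shift value may differ between releases; OVS must be rebuilt and reinstalled
  for the change to take effect):

  ```
  # Locate the current EMC size definition in the OVS source tree
  grep -n "EM_FLOW_HASH_SHIFT" lib/dpif-netdev.c
  # Edit the define to a larger shift (each +1 doubles the number of EMC
  # entries per pmd thread), then rebuild and reinstall OVS.
  ```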
### 4.5 Rx Mergeable buffers

  Rx mergeable buffers is a virtio feature that allows chaining of multiple
  virtio descriptors to handle large packet sizes. As such, large packets
  are handled by reserving and chaining multiple free descriptors
  together. Mergeable buffer support is negotiated between the virtio
  driver and virtio device and is supported by the DPDK vhost library.
  This behavior is typically supported and enabled by default; however,
  in the case where the user knows that rx mergeable buffers are not needed,
  i.e. jumbo frames are not needed, it can be forced off by adding
  mrg_rxbuf=off to the QEMU command line options. By not reserving multiple
  chains of descriptors, more individual virtio descriptors are made available
  for rx to the guest using dpdkvhost ports, and this can improve performance.
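
  For example (a minimal sketch, mirroring the vhost-user QEMU arguments used
  later in this guide), mergeable buffers can be disabled on the virtio device
  like so:

  ```
  -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
  -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
  -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off
  ```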
## <a name="ovstc"></a> 5. OVS Testcases

### 5.1 PHY-VM-PHY [VHOST LOOPBACK]

Section 5.2 in the INSTALL.DPDK guide lists the steps for the PVP loopback
testcase and packet forwarding using the DPDK testpmd application in the guest
VM. For users wanting to do packet forwarding using the kernel stack, the
steps are listed below.

  ```
  ifconfig eth1 1.1.1.2/24
  ifconfig eth2 1.1.2.2/24
  systemctl stop firewalld.service
  systemctl stop iptables.service
  sysctl -w net.ipv4.ip_forward=1
  sysctl -w net.ipv4.conf.all.rp_filter=0
  sysctl -w net.ipv4.conf.eth1.rp_filter=0
  sysctl -w net.ipv4.conf.eth2.rp_filter=0
  route add -net 1.1.2.0/24 eth2
  route add -net 1.1.1.0/24 eth1
  arp -s 1.1.2.99 DE:AD:BE:EF:CA:FE
  arp -s 1.1.1.99 DE:AD:BE:EF:CA:EE
  ```
### 5.2 PHY-VM-PHY [IVSHMEM]

  The steps (1-5) in section 3.3 of the INSTALL.DPDK guide will create &
  initialize the DB, start vswitchd and add dpdk devices to bridge br0.

  1. Add a DPDK ring port to the bridge

     ```
     ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr
     ```

  2. Build a modified Qemu (Qemu-2.2.1 + ivshmem-qemu-2.2.1.patch)

     ```
     cd /usr/src/
     wget http://wiki.qemu.org/download/qemu-2.2.1.tar.bz2
     tar -jxvf qemu-2.2.1.tar.bz2
     cd /usr/src/qemu-2.2.1
     wget https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/patches/ivshmem-qemu-2.2.1.patch
     patch -p1 < ivshmem-qemu-2.2.1.patch
     ./configure --target-list=x86_64-softmmu --enable-debug --extra-cflags='-g'
     make -j 4
     ```

  3. Generate the Qemu commandline

     ```
     mkdir -p /usr/src/cmdline_generator
     cd /usr/src/cmdline_generator
     wget https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/cmdline_generator.c
     wget https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/Makefile
     export RTE_SDK=/usr/src/dpdk-16.07
     export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
     make
     ./build/cmdline_generator -m -p dpdkr0 XXX
     cmdline=`cat OVSMEMPOOL`
     ```

  4. Start the Guest VM

     ```
     export VM_NAME=ivshmem-vm
     export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2
     export QEMU_BIN=/usr/src/qemu-2.2.1/x86_64-softmmu/qemu-system-x86_64

     taskset 0x20 $QEMU_BIN -cpu host -smp 2,cores=2 -hda $QCOW2_IMAGE -m 4096 --enable-kvm -name $VM_NAME -nographic -vnc :2 -pidfile /tmp/vm1.pid $cmdline
     ```

  5. Run the sample "dpdk ring" app in the VM

     ```
     echo 1024 > /proc/sys/vm/nr_hugepages
     mount -t hugetlbfs nodev /dev/hugepages (if not already mounted)

     # Build the DPDK ring application in the VM
     export RTE_SDK=/root/dpdk-16.07
     export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
     make

     # Run the dpdkring application
     ./build/dpdkr -c 1 -n 4 -- -n 0
     where "-n 0" refers to ring '0', i.e. dpdkr0
     ```
### 5.3 PHY-VM-PHY [VHOST MULTIQUEUE]

  The steps (1-5) in section 3.3 of the [INSTALL DPDK] guide will create &
  initialize the DB, start vswitchd and add dpdk devices to bridge br0.

  1. Configure PMDs and RXQs. For example, set the number of dpdk port rx
     queues to at least 2. The number of rx queues at the vhost-user interface
     gets automatically configured after virtio device connection and doesn't
     need manual configuration.

     ```
     ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=c
     ovs-vsctl set Interface dpdk0 options:n_rxq=2
     ovs-vsctl set Interface dpdk1 options:n_rxq=2
     ```

  2. Instantiate the Guest VM using the Qemu cmdline

     Guest Configuration

     ```
     | configuration        | values   | comments
     |----------------------|----------|-----------------
     | qemu version         | 2.5.0    | -
     | qemu thread affinity | 2 cores  | taskset 0x30
     | memory               | 4GB      | -
     | cores                | 2        | -
     | Qcow2 image          | Fedora22 | -
     | multiqueue           | on       | -
     ```

     Instantiate the Guest

     ```
     export VM_NAME=vhost-vm
     export GUEST_MEM=4096M
     export QCOW2_IMAGE=/root/Fedora22_x86_64.qcow2
     export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch

     taskset 0x30 qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6 -chardev socket,id=char2,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=2 -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=6
     ```

     Note: The queue value above should match the queues configured in OVS,
     and the vector value should be set to 'no. of queues x 2 + 2'.

  3. Guest interface configuration

     Assuming there are 2 interfaces in the guest named eth0 and eth1, check
     the channel configuration and set the number of combined channels to 2
     for the virtio devices. More information can be found in the
     [Vhost walkthrough] section.

     ```
     ethtool -l eth0
     ethtool -L eth0 combined 2
     ethtool -L eth1 combined 2
     ```

  4. Kernel Packet forwarding

     Configure IP addresses and enable the interfaces

     ```
     ifconfig eth0 5.5.5.1/24 up
     ifconfig eth1 90.90.90.1/24 up
     ```

     Configure IP forwarding and add route entries

     ```
     sysctl -w net.ipv4.ip_forward=1
     sysctl -w net.ipv4.conf.all.rp_filter=0
     sysctl -w net.ipv4.conf.eth0.rp_filter=0
     sysctl -w net.ipv4.conf.eth1.rp_filter=0
     ip route add 2.1.1.0/24 dev eth1
     route add default gw 2.1.1.2 eth1
     route add default gw 90.90.90.90 eth1
     arp -s 90.90.90.90 DE:AD:BE:EF:CA:FE
     arp -s 2.1.1.2 DE:AD:BE:EF:CA:FA
     ```

     Check traffic on multiple queues

     ```
     cat /proc/interrupts | grep virtio
     ```
## <a name="vhost"></a> 6. Vhost Walkthrough

### 6.1 vhost-user

  - Prerequisites:

    QEMU version >= 2.2

  - Adding vhost-user ports to Switch

    Unlike DPDK ring ports, DPDK vhost-user ports can have arbitrary names,
    except that forward and backward slashes are prohibited in the names.

    For vhost-user, the name of the port type is `dpdkvhostuser`

    ```
    ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
    type=dpdkvhostuser
    ```

    This action creates a socket located at
    `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
    to your VM on the QEMU command line. More instructions on this can be
    found in the next section "Adding vhost-user ports to VM".

    Note: If you wish for the vhost-user sockets to be created in a
    sub-directory of `/usr/local/var/run/openvswitch`, you may specify
    this directory in the ovsdb like so:

    `./utilities/ovs-vsctl --no-wait \
      set Open_vSwitch . other_config:vhost-sock-dir=subdir`

  - Adding vhost-user ports to VM

    1. Configure sockets

       Pass the following parameters to QEMU to attach a vhost-user device:

       ```
       -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
       -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
       -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
       ```

       where vhost-user-1 is the name of the vhost-user port added
       to the switch.
       Repeat the above parameters for multiple devices, changing the
       chardev path and id as necessary. Note that a separate and different
       chardev path needs to be specified for each vhost-user device. For
       example, if you have a second vhost-user port named 'vhost-user-2',
       append your QEMU command line with an additional set of parameters:

       ```
       -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
       -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
       -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
       ```

    2. Configure huge pages

       QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
       a virtio-net device's virtual rings and packet buffers by mapping the
       VM's physical memory on hugetlbfs. To enable vhost-user ports to map the
       VM's memory into their process address space, pass the following
       parameters to QEMU:

       ```
       -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
       share=on -numa node,memdev=mem -mem-prealloc
       ```

    3. Enable multiqueue support (OPTIONAL)

       QEMU needs to be configured to use multiqueue.
       The $q below is the number of queues.
       The $v is the number of vectors, which is '$q x 2 + 2'.

       ```
       -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
       -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
       -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
       ```

       The vhost-user interface will be automatically reconfigured with the
       required number of rx and tx queues after connection of the virtio
       device. Manual configuration of `n_rxq` is not supported because OVS
       will work properly only if `n_rxq` matches the number of queues
       configured in QEMU.

       At least 2 PMDs should be configured for the vswitch when using
       multiqueue. Using a single PMD will cause traffic to be enqueued to the
       same vhost queue rather than being distributed among different vhost
       queues for a vhost-user interface.

       If traffic destined for a VM configured with multiqueue arrives at the
       vswitch via a physical DPDK port, then the number of rxqs should also be
       set to at least 2 for that physical DPDK port. This is required to
       increase the probability that a different PMD will handle the multiqueue
       transmission to the guest using a different vhost queue.

       If one wishes to use multiple queues for an interface in the guest, the
       driver in the guest operating system must be configured to do so. It is
       recommended that the number of queues configured be equal to '$q'.

       For example, this can be done for the Linux kernel virtio-net driver
       with:

       ```
       ethtool -L <DEV> combined <$q>
       ```

       where `-L`: Changes the numbers of channels of the specified network
       device and `combined`: Changes the number of multi-purpose channels.

  - VM Configuration with libvirt

    * Change the user/group, access control policy and restart libvirtd.

      - In `/etc/libvirt/qemu.conf` add/edit the following lines

        ```
        user = "root"
        group = "root"
        ```

      - Disable SELinux or set it to permissive mode

        `setenforce 0`

      - Restart the libvirtd process, for example, on Fedora

        `systemctl restart libvirtd.service`

    * Instantiate the VM

      - Copy the xml configuration from [Guest VM using libvirt] into the
        workspace.

      - Start the VM.

        `virsh create demovm.xml`

      - Connect to the guest console

        `virsh console demovm`

    * VM configuration

      The demovm xml configuration is aimed at achieving out-of-the-box
      performance on the VM.

      - The vcpus are pinned to the cores of CPU socket 0 using vcpupin.

      - Configure the NUMA cell and shared memory using memAccess='shared'.

      - Disable mergeable buffers with mrg_rxbuf='off'.

      Note: For information on libvirt and further tuning refer to [libvirt].
### 6.2 DPDK backend inside VM

  Please note that additional configuration is required if you want to run
  ovs-vswitchd with the DPDK backend inside a QEMU virtual machine.
  Ovs-vswitchd creates separate DPDK TX queues for each CPU core available.
  This operation fails inside a QEMU virtual machine because, by default, the
  VirtIO NIC provided to the guest is configured to support only a single TX
  queue and a single RX queue. To change this behavior, you need to turn on
  the 'mq' (multiqueue) property of all virtio-net-pci devices emulated by
  QEMU and used by DPDK. You may do this manually (by changing the QEMU
  command line) or, if you use Libvirt, by adding the following string:

  `<driver name='vhost' queues='N'/>`

  to the <interface> sections of all network devices used by DPDK. Parameter
  'N' determines how many queues can be used by the guest. This may not work
  with old versions of QEMU found in some distros; QEMU version >= 2.2 is
  required.
## <a name="qos"></a> 7. QOS

Here is an example of QOS usage.
Assuming you have a vhost-user port transmitting traffic consisting of
packets of size 64 bytes, the following command would limit the egress
transmission rate of the port to ~1,000,000 packets per second:

`ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create qos
type=egress-policer other-config:cir=46000000 other-config:cbs=2048`

Here, the cir (committed information rate) is expressed in bytes per second of
IP packet data: a 64-byte frame carries 46 bytes once the 14-byte Ethernet
header and 4-byte CRC are excluded, so 1,000,000 pps x 46 bytes = 46,000,000.

To examine the QoS configuration of the port:

`ovs-appctl -t ovs-vswitchd qos/show vhost-user0`

To clear the QoS configuration from the port and ovsdb, use the following:

`ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos`

For more details regarding egress-policer parameters please refer to
vswitch.xml.

## <a name="rl"></a> 8. Rate Limiting

Here is an example of ingress policing usage.
Assuming you have a vhost-user port receiving traffic consisting of
packets of size 64 bytes, the following command would limit the reception
rate of the port to ~1,000,000 packets per second:

`ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000
 ingress_policing_burst=1000`

Here, ingress_policing_rate is expressed in kbps:
1,000,000 pps x 46 bytes x 8 bits / 1000 = 368,000 kbps.

To examine the ingress policer configuration of the port:

`ovs-vsctl list interface vhost-user0`

To clear the ingress policer configuration from the port, use the following:

`ovs-vsctl set interface vhost-user0 ingress_policing_rate=0`

For more details regarding ingress-policer see vswitch.xml.
## <a name="fc"></a> 9. Flow Control

Flow control can be enabled only on DPDK physical ports.
To enable flow control support at the tx side while adding a port, add the
'tx-flow-ctrl' option to 'ovs-vsctl add-port' as in the example below.

```
ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true
```

Similarly, to enable rx flow control,

```
ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true
```

And to enable flow control auto-negotiation,

```
ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true
```

To turn on tx flow control at run time (after the port has been added to OVS),
the command-line input will be:

`ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true`

The flow control parameters can be turned off by setting the respective
parameter to 'false'. To disable flow control at the tx side:

`ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
## <a name="pdump"></a> 10. Pdump

Pdump allows you to listen on DPDK ports and view the traffic that is
passing on them. To use this utility, one must have libpcap installed
on the system. Furthermore, DPDK must be built with CONFIG_RTE_LIBRTE_PDUMP=y
and CONFIG_RTE_LIBRTE_PMD_PCAP=y.

To use pdump, simply launch OVS as usual. Then, navigate to the 'app/pdump'
directory in DPDK, 'make' the application and run it like so:

```
sudo ./build/app/dpdk_pdump --
--pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap
--server-socket-path=/usr/local/var/run/openvswitch
```

The above command captures traffic received on queue 0 of port 0 and stores
it in /tmp/pkts.pcap. Other combinations of port numbers, queue numbers and
pcap locations are of course also available to use. 'server-socket-path' must
be set to the value of ovs_rundir(), which typically resolves to
'/usr/local/var/run/openvswitch'.
More information on the pdump app and its usage can be found at the link below.

http://dpdk.org/doc/guides/sample_app_ug/pdump.html

Many tools are available to view the contents of the pcap file. One example is
tcpdump. Issue the following command to view the contents of 'pkts.pcap':

`tcpdump -r pkts.pcap`

A performance decrease is expected when using a monitoring application like
the DPDK pdump app.
## <a name="jumbo"></a> 11. Jumbo Frames

By default, DPDK ports are configured with the standard Ethernet MTU (1500B).
To enable Jumbo Frames support for a DPDK port, change the Interface's
`mtu_request` attribute to a sufficiently large value.

e.g. Add a DPDK Phy port with an MTU of 9000:

`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`

e.g. Change the MTU of an existing port to 6200:

`ovs-vsctl set Interface dpdk0 mtu_request=6200`

When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
increased, such that a full Jumbo Frame of a specific size may be accommodated
within a single mbuf segment.

Jumbo frame support has been validated against 9728B frames (the largest frame
size supported by the Fortville NIC), using the DPDK `i40e` driver, but larger
frames (particularly in use cases involving East-West traffic only) and other
DPDK NIC drivers may be supported.

### 11.1 vHost Ports and Jumbo Frames

Some additional configuration is needed to take advantage of jumbo frames with
vHost ports:

  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
     the QEMU command line snippet below:

     ```
     '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
     '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
     ```

  2. Where virtio devices are bound to the Linux kernel driver in a guest
     environment (i.e. interfaces are not bound to an in-guest DPDK driver),
     the MTU of those logical network interfaces must also be increased to a
     sufficiently large value. This avoids segmentation of Jumbo Frames
     received in the guest. Note that 'MTU' refers to the length of the IP
     packet only, and not that of the entire frame.

     To calculate the exact MTU of a standard IPv4 frame, subtract the L2
     header and CRC lengths (i.e. 18B) from the max supported frame size.
     So, to set the MTU for a 9018B Jumbo Frame:

     ```
     ifconfig eth1 mtu 9000
     ```
## <a name="vsperf"></a> 12. Vsperf

The Vsperf project goal is to develop a vSwitch test framework that can be
used to validate the suitability of different vSwitch implementations in a
Telco deployment environment. More information can be found at the link below.

https://wiki.opnfv.org/display/vsperf/VSperf+Home


Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.


[INSTALL.userspace.md]: INSTALL.userspace.md
[INSTALL.md]: INSTALL.md
[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
[DPDK Docs]: http://dpdk.org/doc
[libvirt]: http://libvirt.org/formatdomain.html
[Guest VM using libvirt]: INSTALL.DPDK.md#ovstc
[Vhost walkthrough]: INSTALL.DPDK.md#vhost
[INSTALL DPDK]: INSTALL.DPDK.md#build
[INSTALL OVS]: INSTALL.DPDK.md#build