..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.
============================
Using Open vSwitch with DPDK
============================

This document describes how to use Open vSwitch with the DPDK datapath.

.. important::

   Using the DPDK datapath requires building OVS with DPDK support. Refer to
   :doc:`/intro/install/dpdk` for more information.

Ports and Bridges
-----------------
ovs-vsctl can be used to set up bridges and other Open vSwitch features.
Bridges should be created with ``datapath_type=netdev``::

    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

ovs-vsctl can also be used to add DPDK devices. OVS expects DPDK device names
to start with ``dpdk`` and end with a port ID. ovs-vswitchd should print the
number of DPDK devices found in the log file::

    $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    $ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
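
To confirm the devices were found, check the ovs-vswitchd log; the path below
assumes a default install from source and may differ on your system::

    $ grep -i dpdk /usr/local/var/log/openvswitch/ovs-vswitchd.log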

After the DPDK ports get added to the switch, a polling thread continuously
polls DPDK devices and consumes 100% of the core, as can be checked from the
``top`` and ``ps`` commands::

    $ top -H
    $ ps -eLo pid,psr,comm | grep pmd

Creating bonds of DPDK interfaces is slightly different to creating bonds of
system interfaces. For DPDK, the interface type must be explicitly set. For
example::

    $ ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 \
        -- set Interface dpdk0 type=dpdk \
        -- set Interface dpdk1 type=dpdk

To stop ovs-vswitchd and delete the bridge, run::

    $ ovs-appctl -t ovs-vswitchd exit
    $ ovs-appctl -t ovsdb-server exit
    $ ovs-vsctl del-br br0

PMD Thread Statistics
---------------------

To show current stats::

    $ ovs-appctl dpif-netdev/pmd-stats-show

To clear previous stats::

    $ ovs-appctl dpif-netdev/pmd-stats-clear

Port/RXQ Assignment to PMD Threads
----------------------------------

To show port/rxq assignment::

    $ ovs-appctl dpif-netdev/pmd-rxq-show

To change the default rxq assignment to pmd threads, rxqs may be manually
pinned to desired cores using::

    $ ovs-vsctl set Interface <iface> \
        other_config:pmd-rxq-affinity=<rxq-affinity-list>

where:

- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values

For example::

    $ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
        other_config:pmd-rxq-affinity="0:3,1:7,3:8"

This will ensure:

- Queue #0 pinned to core 3
- Queue #1 pinned to core 7
- Queue #2 not pinned
- Queue #3 pinned to core 8

After that, PMD threads on cores where RX queues were pinned will become
``isolated``. This means that such a thread will poll only pinned RX queues.

.. warning::

   If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
   not be polled. Also, if a provided ``core_id`` is not available (e.g. the
   ``core_id`` is not in ``pmd-cpu-mask``), the RX queue will not be polled by
   any PMD thread.
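
For example, to make cores 3, 7, and 8 from the pinning example above
available to PMD threads, set ``pmd-cpu-mask`` with those bits set (bits 3, 7,
and 8 give ``0x188``)::

    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x188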

QoS
---

Assuming you have a vhost-user port transmitting traffic consisting of packets
of size 64 bytes, the following command would limit the egress transmission
rate of the port to ~1,000,000 packets per second::

    $ ovs-vsctl set port vhost-user0 qos=@newqos -- \
        --id=@newqos create qos type=egress-policer other-config:cir=46000000 \
        other-config:cbs=2048
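
The ``cir`` value here (in bytes/sec) follows from the target rate:
46000000 = 1,000,000 x (64 - 18), i.e. the packet rate multiplied by the
packet size less the 18B of L2 header and CRC (the same 18B used in the
`Jumbo Frames`_ calculation below).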

To examine the QoS configuration of the port, run::

    $ ovs-appctl -t ovs-vswitchd qos/show vhost-user0

To clear the QoS configuration from the port and ovsdb, run::

    $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos

Refer to vswitch.xml for more details on egress-policer.

Rate Limiting
-------------

Here is an example of ingress policing usage. Assuming you have a vhost-user
port receiving traffic consisting of packets of size 64 bytes, the following
command would limit the reception rate of the port to ~1,000,000 packets per
second::

    $ ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000 \
        ingress_policing_burst=1000
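
The same arithmetic applies here, with the rate expressed in kbps:
1,000,000 packets/sec x (64 - 18)B x 8 bits / 1000 = 368000 kbps.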

To examine the ingress policer configuration of the port::

    $ ovs-vsctl list interface vhost-user0

To clear the ingress policer configuration from the port::

    $ ovs-vsctl set interface vhost-user0 ingress_policing_rate=0

Refer to vswitch.xml for more details on ingress policing.

Flow Control
------------

Flow control can be enabled only on DPDK physical ports. To enable flow
control support on the tx side while adding a port, run::

    $ ovs-vsctl add-port br0 dpdk0 -- \
        set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true

Similarly, to enable rx flow control, run::

    $ ovs-vsctl add-port br0 dpdk0 -- \
        set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true

To enable flow control auto-negotiation, run::

    $ ovs-vsctl add-port br0 dpdk0 -- \
        set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true

To turn on tx flow control at run time for an existing port, run::

    $ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true

The flow control parameters can be turned off by setting the respective
parameter to ``false``. To disable flow control on the tx side, run::

    $ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false

pdump
-----

pdump allows you to listen on DPDK ports and view the traffic that is passing
on them. To use this utility, one must have libpcap installed on the system.
Furthermore, DPDK must be built with ``CONFIG_RTE_LIBRTE_PDUMP=y`` and
``CONFIG_RTE_LIBRTE_PMD_PCAP=y``.

.. warning::

   A performance decrease is expected when using a monitoring application like
   the DPDK pdump app.

To use pdump, simply launch OVS as usual, then navigate to the ``app/pdump``
directory in DPDK, ``make`` the application and run it like so::

    $ sudo ./build/app/dpdk-pdump -- \
        --pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap \
        --server-socket-path=/usr/local/var/run/openvswitch

The above command captures traffic received on queue 0 of port 0 and stores it
in ``/tmp/pkts.pcap``. Other combinations of port numbers, queue numbers and
pcap locations are of course also available to use. For example, to capture
all packets that traverse port 0 in a single pcap file::

    $ sudo ./build/app/dpdk-pdump -- \
        --pdump 'port=0,queue=*,rx-dev=/tmp/pkts.pcap,tx-dev=/tmp/pkts.pcap' \
        --server-socket-path=/usr/local/var/run/openvswitch

``server-socket-path`` must be set to the value of ``ovs_rundir()``, which
typically resolves to ``/usr/local/var/run/openvswitch``.

Many tools are available to view the contents of the pcap file. One example is
tcpdump. Issue the following command to view the contents of ``pkts.pcap``::

    $ tcpdump -r pkts.pcap

More information on the pdump app and its usage can be found in the `DPDK docs
<http://dpdk.org/doc/guides/tools/pdump.html>`__.

Jumbo Frames
------------

By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
enable Jumbo Frames support for a DPDK port, change the Interface's
``mtu_request`` attribute to a sufficiently large value. For example, to add a
DPDK physical port with an MTU of 9000::

    $ ovs-vsctl add-port br0 dpdk0 \
        -- set Interface dpdk0 type=dpdk \
        -- set Interface dpdk0 mtu_request=9000

Similarly, to change the MTU of an existing port to 6200::

    $ ovs-vsctl set Interface dpdk0 mtu_request=6200
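
``mtu_request`` is a request: the value actually in effect can be checked by
reading the interface's ``mtu`` column, which only changes if the NIC supports
the requested size::

    $ ovs-vsctl get Interface dpdk0 mtu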

Some additional configuration is needed to take advantage of jumbo frames with
vHost ports:

1. *Mergeable buffers* must be enabled for vHost ports, as demonstrated in the
   QEMU command line snippet below::

       -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
       -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on

2. Where virtio devices are bound to the Linux kernel driver in a guest
   environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
   MTU of those logical network interfaces must also be increased to a
   sufficiently large value. This avoids segmentation of Jumbo Frames received
   in the guest. Note that 'MTU' refers to the length of the IP packet only,
   and not that of the entire frame.

   To calculate the exact MTU of a standard IPv4 frame, subtract the L2 header
   and CRC lengths (i.e. 18B) from the max supported frame size. So, to set
   the MTU for a 9018B Jumbo Frame::

       $ ifconfig eth1 mtu 9000

When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
increased, such that a full Jumbo Frame of a specific size may be accommodated
within a single mbuf segment.

Jumbo frame support has been validated against 9728B frames, which is the
largest frame size supported by the Fortville NIC using the DPDK i40e driver,
but larger frames and other DPDK NIC drivers may be supported. These cases are
common for use cases involving East-West traffic only.

Rx Checksum Offload
-------------------

By default, DPDK physical ports are enabled with Rx checksum offload. Rx
checksum offload can be configured on a DPDK physical port either when adding
it or at run time.

To disable Rx checksum offload when adding a DPDK port dpdk0::

    $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
        options:rx-checksum-offload=false

Similarly, to disable Rx checksum offloading on an existing DPDK port dpdk0::

    $ ovs-vsctl set Interface dpdk0 type=dpdk options:rx-checksum-offload=false

Rx checksum offload can offer a performance improvement only for tunneling
traffic in OVS-DPDK, because the checksum validation of tunnel packets is
offloaded to the NIC. Enabling Rx checksum offload may also slightly reduce
the performance of non-tunnel traffic, specifically for smaller packets.
DPDK vectorization is disabled when checksum offloading is configured on DPDK
physical ports, which in turn affects non-tunnel traffic performance. It is
therefore advised to turn off Rx checksum offload for non-tunnel use cases to
achieve the best performance.
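
Conversely, for tunnel-heavy workloads, the offload can be turned back on at
run time by setting the option to ``true``::

    $ ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true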

.. _dpdk-ovs-in-guest:

OVS with DPDK Inside VMs
------------------------

Additional configuration is required if you want to run ovs-vswitchd with the
DPDK backend inside a QEMU virtual machine. ovs-vswitchd creates separate DPDK
TX queues for each CPU core available. This operation fails inside a QEMU
virtual machine because, by default, the VirtIO NIC provided to the guest
supports only a single TX queue and a single RX queue. To change this
behavior, you need to turn on the ``mq`` (multiqueue) property of all
``virtio-net-pci`` devices emulated by QEMU and used by DPDK. You may do it
manually (by changing the QEMU command line) or, if you use Libvirt, by adding
the following string to the ``<interface>`` sections of all network devices
used by DPDK::

    <driver name='vhost' queues='N'/>

where:

``N``
  determines how many queues can be used by the guest.

This requires QEMU >= 2.2.
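
When configuring QEMU directly rather than through Libvirt, the equivalent is
to add ``queues=N`` to the vhost-user netdev and enable ``mq`` on the matching
virtio device. A sketch, with illustrative names and ``N`` = 4 (``vectors`` is
``N x 2 + 2``, as in the multiqueue example below)::

    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce,queues=4 \
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=10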

.. _dpdk-phy-phy:

PHY-PHY
-------

Add a userspace bridge and two ``dpdk`` (PHY) ports::

    # Add userspace bridge
    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

    # Add two dpdk ports
    $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    $ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk

Add test flows to forward packets between DPDK port 0 and port 1::

    # Clear current flows
    $ ovs-ofctl del-flows br0

    # Add flows between port 1 (dpdk0) and port 2 (dpdk1)
    $ ovs-ofctl add-flow br0 in_port=1,action=output:2
    $ ovs-ofctl add-flow br0 in_port=2,action=output:1

Transmit traffic into either port. You should see it returned via the other.

.. _dpdk-vhost-loopback:

PHY-VM-PHY (vHost Loopback)
---------------------------

Add a userspace bridge, two ``dpdk`` (PHY) ports, and two ``dpdkvhostuser``
ports::

    # Add userspace bridge
    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

    # Add two dpdk ports
    $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    $ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk

    # Add two dpdkvhostuser ports
    $ ovs-vsctl add-port br0 dpdkvhostuser0 \
        -- set Interface dpdkvhostuser0 type=dpdkvhostuser
    $ ovs-vsctl add-port br0 dpdkvhostuser1 \
        -- set Interface dpdkvhostuser1 type=dpdkvhostuser

Add test flows to forward packets between DPDK devices and VM ports::

    # Clear current flows
    $ ovs-ofctl del-flows br0

    # Add flows
    $ ovs-ofctl add-flow br0 in_port=1,action=output:3
    $ ovs-ofctl add-flow br0 in_port=3,action=output:1
    $ ovs-ofctl add-flow br0 in_port=4,action=output:2
    $ ovs-ofctl add-flow br0 in_port=2,action=output:4

    # Dump flows
    $ ovs-ofctl dump-flows br0

Create a VM using the following configuration:

+----------------------+--------+-----------------+
| Configuration        | Values | Comments        |
+======================+========+=================+
| QEMU version         | 2.2.0  | n/a             |
+----------------------+--------+-----------------+
| QEMU thread affinity | core 5 | taskset 0x20    |
+----------------------+--------+-----------------+
| Memory               | 4GB    | n/a             |
+----------------------+--------+-----------------+
| Cores                | 2      | n/a             |
+----------------------+--------+-----------------+
| Qcow2 image          | CentOS7| n/a             |
+----------------------+--------+-----------------+
| mrg_rxbuf            | off    | n/a             |
+----------------------+--------+-----------------+

You can do this directly with QEMU via the ``qemu-system-x86_64`` application::

    $ export VM_NAME=vhost-vm
    $ export GUEST_MEM=3072M
    $ export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2
    $ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch

    $ taskset 0x20 qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm \
        -m $GUEST_MEM -drive file=$QCOW2_IMAGE --nographic -snapshot \
        -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 \
        -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on \
        -chardev socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 \
        -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
        -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off \
        -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 \
        -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
        -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off

For an explanation of this command, along with alternative approaches such as
booting the VM via libvirt, refer to :doc:`/topics/dpdk/vhost-user`.

Once the guest is configured and booted, configure DPDK packet forwarding
within the guest. To accomplish this, build the ``testpmd`` application as
described in :ref:`dpdk-testpmd`. Once compiled, run the application; the last
two commands below are entered at the testpmd prompt::

    $ cd $DPDK_DIR/app/test-pmd
    $ ./testpmd -c 0x3 -n 4 --socket-mem 1024 -- \
        --burst=64 -i --txqflags=0xf00 --disable-hw-vlan
    testpmd> set fwd mac retry
    testpmd> start

When you finish testing, bind the vNICs back to the kernel::

    $ $DPDK_DIR/tools/dpdk-devbind.py --bind=virtio-pci 0000:00:03.0
    $ $DPDK_DIR/tools/dpdk-devbind.py --bind=virtio-pci 0000:00:04.0

.. note::

   Valid PCI IDs must be passed in the above example. The PCI IDs can be
   retrieved like so::

       $ $DPDK_DIR/tools/dpdk-devbind.py --status

More information on the dpdkvhostuser ports can be found in
:doc:`/topics/dpdk/vhost-user`.

PHY-VM-PHY (vHost Loopback) (Kernel Forwarding)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:ref:`dpdk-vhost-loopback` details the steps for the PHY-VM-PHY loopback
testcase and packet forwarding using the DPDK testpmd application in the guest
VM. If you wish to forward packets using the kernel stack instead, run the
following commands on the guest::

    $ ifconfig eth1 1.1.1.2/24
    $ ifconfig eth2 1.1.2.2/24
    $ systemctl stop firewalld.service
    $ systemctl stop iptables.service
    $ sysctl -w net.ipv4.ip_forward=1
    $ sysctl -w net.ipv4.conf.all.rp_filter=0
    $ sysctl -w net.ipv4.conf.eth1.rp_filter=0
    $ sysctl -w net.ipv4.conf.eth2.rp_filter=0
    $ route add -net 1.1.2.0/24 eth2
    $ route add -net 1.1.1.0/24 eth1
    $ arp -s 1.1.2.99 DE:AD:BE:EF:CA:FE
    $ arp -s 1.1.1.99 DE:AD:BE:EF:CA:EE

PHY-VM-PHY (vHost Multiqueue)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vHost Multiqueue functionality can also be validated using the PHY-VM-PHY
configuration. To begin, follow the steps described in :ref:`dpdk-phy-phy` to
create and initialize the database, start ovs-vswitchd and add ``dpdk``-type
devices to bridge ``br0``. Once complete, follow the below steps:

1. Configure PMD and RXQs.

   For example, set the number of dpdk port rx queues to at least 2. The
   number of rx queues at the vhost-user interface gets automatically
   configured after virtio device connection and doesn't need manual
   configuration::

       $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xc
       $ ovs-vsctl set Interface dpdk0 options:n_rxq=2
       $ ovs-vsctl set Interface dpdk1 options:n_rxq=2

2. Instantiate the guest VM using the QEMU command line.

   We must configure with appropriate software versions to ensure this feature
   is supported.

   .. list-table:: Recommended settings
      :header-rows: 1

      * - Setting
        - Value
      * - QEMU version
        - 2.5.0
      * - QEMU thread affinity
        - 2 cores (taskset 0x30)
      * - Memory
        - 4 GB
      * - Cores
        - 2
      * - Distro
        - Fedora 22
      * - Multiqueue
        - Enabled

   To do this, instantiate the guest as follows::

       $ export VM_NAME=vhost-vm
       $ export GUEST_MEM=4096M
       $ export QCOW2_IMAGE=/root/Fedora22_x86_64.qcow2
       $ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
       $ taskset 0x30 qemu-system-x86_64 -cpu host -smp 2,cores=2 -m 4096M \
           -drive file=$QCOW2_IMAGE --enable-kvm -name $VM_NAME \
           -nographic -numa node,memdev=mem -mem-prealloc \
           -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on \
           -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0 \
           -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 \
           -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6 \
           -chardev socket,id=char2,path=$VHOST_SOCK_DIR/dpdkvhostuser1 \
           -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=2 \
           -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=6

   .. note::
      The ``queues`` value above should match the queues configured in OVS.
      The ``vectors`` value should be set to "number of queues x 2 + 2"; with
      2 queues, that gives 6.

3. Configure the guest interfaces.

   Assuming there are two interfaces in the guest named eth0 and eth1, check
   the channel configuration and set the number of combined channels to 2 for
   the virtio devices::

       $ ethtool -l eth0
       $ ethtool -L eth0 combined 2
       $ ethtool -L eth1 combined 2

   More information can be found in the vHost walkthrough section.

4. Configure kernel packet forwarding.

   Configure IP and enable interfaces::

       $ ifconfig eth0 5.5.5.1/24 up
       $ ifconfig eth1 90.90.90.1/24 up

   Configure IP forwarding and add route entries::

       $ sysctl -w net.ipv4.ip_forward=1
       $ sysctl -w net.ipv4.conf.all.rp_filter=0
       $ sysctl -w net.ipv4.conf.eth0.rp_filter=0
       $ sysctl -w net.ipv4.conf.eth1.rp_filter=0
       $ ip route add 2.1.1.0/24 dev eth1
       $ route add default gw 2.1.1.2 eth1
       $ route add default gw 90.90.90.90 eth1
       $ arp -s 90.90.90.90 DE:AD:BE:EF:CA:FE
       $ arp -s 2.1.1.2 DE:AD:BE:EF:CA:FA

   Check traffic on multiple queues::

       $ cat /proc/interrupts | grep virtio