..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License.  You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels, because they do not render well.

========================
Open vSwitch with AF_XDP
========================
This document describes how to build and install Open vSwitch using the
AF_XDP netdev.

.. warning::
   The AF_XDP support of Open vSwitch is considered 'experimental', and it
   is not compiled in by default.
Introduction
------------

AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology.  It aims to have comparable
performance to DPDK while cooperating better with the existing kernel
networking stack.  An AF_XDP socket receives and sends packets through an
eBPF/XDP program attached to the netdev, bypassing several of the Linux
kernel's subsystems.  As a result, an AF_XDP socket shows much better
performance than AF_PACKET.  For more details about AF_XDP, please see the
Linux kernel's Documentation/networking/af_xdp.rst.
AF_XDP Netdev
-------------

OVS has several netdev types, e.g., system, tap, or dpdk.  The AF_XDP
feature adds a new netdev type called "afxdp" and implements its
configuration, packet reception, and transmit functions.  Since the AF_XDP
socket, called xsk, operates in userspace, once ovs-vswitchd receives
packets from an xsk, the afxdp netdev reuses the existing userspace
dpif-netdev datapath.  As a result, most of the packet processing happens
in userspace instead of in the Linux kernel.
The architecture looks like this::

               _
              |   +-------------------+
              |   |    ovs-vswitchd   |<-->ovsdb-server
              |   +-------------------+
              |   |      ofproto      |<-->OpenFlow controllers
              |   +--------+-+--------+
              |   | netdev | |ofproto-|
     userspace|   +--------+ |  dpif  |
              |   | afxdp  | +--------+
              |   | netdev | |  dpif  |
              |   +---||---+ +--------+
              |       ||     |  dpif- |
              |       ||     | netdev |
              |_      ||     +--------+
                      ||
               _  +---||-----+--------+
              |   | AF_XDP prog +     |
        kernel|   |   xsk_map         |
              |_  +--------||---------+
                           ||
                     physical NIC
Build requirements
------------------

In addition to the requirements described in :doc:`general`, building Open
vSwitch with AF_XDP will require the following:

- libbpf from the kernel source tree (kernel 5.0.0 or later)

- Linux kernel XDP support, with the following options (required):

  * CONFIG_BPF_SYSCALL=y

  * CONFIG_XDP_SOCKETS=y

- The following optional Kconfig options are also recommended, but not
  required:

  * CONFIG_BPF_JIT=y (Performance)

  * CONFIG_HAVE_BPF_JIT=y (Performance)

  * CONFIG_XDP_SOCKETS_DIAG=y (Debugging)

- Once your AF_XDP-enabled kernel is ready, if possible, run
  ``./xdpsock -r -N -z -i <your device>`` under linux/samples/bpf.
  This is an OVS-independent benchmark tool for AF_XDP, and it verifies
  that your basic kernel requirements for AF_XDP are met.
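Whether a given kernel build has the required options can be checked by
grepping its config listing.  A minimal sketch follows; the ``check_opt``
helper and the inline sample config are illustrative only, and on a real
system you would read ``/boot/config-$(uname -r)`` or ``/proc/config.gz``
instead:

```shell
# check_opt NAME CONFIG_TEXT: prints "ok" if NAME=y is set, else "missing".
check_opt() {
    if printf '%s\n' "$2" | grep -q "^$1=y"; then
        echo ok
    else
        echo missing
    fi
}

# Sample config listing (assumption for illustration); on a real system:
#   sample=$(cat "/boot/config-$(uname -r)")
sample="CONFIG_BPF_SYSCALL=y
CONFIG_XDP_SOCKETS=y
CONFIG_BPF_JIT=y"

for opt in CONFIG_BPF_SYSCALL CONFIG_XDP_SOCKETS; do
    echo "$opt: $(check_opt "$opt" "$sample")"
done
```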
Installing
----------

For OVS to use the AF_XDP netdev, it has to be configured with LIBBPF
support.  First, clone a recent version of the Linux bpf-next tree::

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Second, go into the Linux source directory and build libbpf in the tools
directory::

  cd bpf-next/tools/lib/bpf/
  make && make install
  make install_headers

Make sure xsk.h and bpf.h are installed in the system's header path,
e.g. /usr/local/include/bpf/ or /usr/include/bpf/.

Make sure libbpf.so is installed correctly::

  ldconfig
  ldconfig -p | grep libbpf

Third, ensure the standard OVS requirements are installed and
bootstrap/configure the package::

  ./boot.sh && ./configure --enable-afxdp

Finally, build and install OVS::

  make
  make install

To kick start end-to-end autotesting::

  uname -a    # make sure you have a 5.0+ kernel
  make check-afxdp TESTSUITEFLAGS='1'

Not all test cases pass at this time.  Currently all cvlan tests are
skipped due to kernel issues.

If a test case fails, check the log at::

  tests/system-afxdp-testsuite.dir/<test num>/system-afxdp-testsuite.log
Setup AF_XDP netdev
-------------------

Before running OVS with AF_XDP, make sure the libbpf and libelf are
set up correctly::

  ldd vswitchd/ovs-vswitchd

Open vSwitch should be started using the userspace datapath as described
in :doc:`general`::

  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev

Make sure your device driver supports AF_XDP.  The afxdp netdev supports
the following additional options (see ``man ovs-vswitchd.conf.db`` for
more details):

* ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
  ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. the best of
  the supported modes, so in most cases you don't need to change it.

* ``use-need-wakeup``: defaults to ``true`` if libbpf supports it,
  otherwise ``false``.
For example, to use 1 PMD (on core 4) on a 1-queue (queue 0) device,
configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
``n_rxq``::

  ethtool -L enp2s0 combined 1
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
    other_config:pmd-rxq-affinity="0:4"

Or, use 4 PMDs/cores and 4 queues by doing::

  ethtool -L enp2s0 combined 4
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
    options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"

``pmd-rxq-affinity`` is optional; if not specified, the system will
auto-assign queues to PMDs.  ``n_rxq`` defaults to ``1``.
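The ``pmd-cpu-mask`` values above are hexadecimal bitmasks in which bit N
selects CPU core N.  A small sketch of the arithmetic (the
``mask_for_cores`` helper is hypothetical, not an OVS tool):

```shell
# pmd-cpu-mask is a hex bitmask of CPU cores: bit N selects core N.
# mask_for_cores (hypothetical helper): print the mask for a core list.
mask_for_cores() {
    mask=0
    for c in "$@"; do
        mask=$((mask | (1 << c)))
    done
    printf '0x%x\n' "$mask"
}

mask_for_cores 4          # core 4 only       -> 0x10
mask_for_cores 1 2 4 5    # cores 1, 2, 4, 5  -> 0x36
```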
To validate that the bridge has successfully instantiated, you can use::

  ovs-vsctl show

It should list the bridge and its ports.  Otherwise, enable debugging by::

  ovs-appctl vlog/set netdev_afxdp::dbg

To check which XDP mode was chosen by ``best-effort``, you can look for
``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::

  # ovs-appctl dpctl/show
    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
                      xdp-mode=best-effort,
                      xdp-mode-in-use=native-with-zerocopy)
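Conceptually, ``best-effort`` probes the modes in decreasing order of
capability and keeps the first one that attaches.  A hypothetical sketch of
that selection logic (``try_attach`` is a stand-in for the real probing,
which happens inside netdev-afxdp, not in a shell script):

```shell
# Hypothetical sketch of "best-effort" mode selection: try the most
# capable XDP mode first and fall back until one attaches.
pick_xdp_mode() {
    for mode in native-with-zerocopy native generic; do
        if try_attach "$mode"; then
            echo "$mode"
            return 0
        fi
    done
    return 1
}

# Stand-in probe for illustration: pretend the driver supports native
# mode but not zero-copy.
try_attach() { [ "$1" = "native" ]; }

pick_xdp_mode
```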
References
----------

Most of the design details are described in the paper presented at the
Linux Plumbers Conference 2018, "Bringing the Power of eBPF to Open
vSwitch" [1], section 4, and in the slides [2][4].
"The Path to DPDK Speeds for AF_XDP" [3] gives a very good introduction
to current and future AF_XDP work.

[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf

[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf

[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf

[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
Performance Tuning
------------------

The name of the game is to keep your CPU running in userspace, allowing the
PMD to keep polling the AF_XDP queues without any interference from the
kernel:

#. Make sure everything is on the same NUMA node (memory used by AF_XDP,
   PMD running cores, device plug-in slot).

#. Isolate the PMD cores using the ``isolcpus`` kernel boot parameter in
   the GRUB configuration.

#. Do not route device IRQs to the PMD running cores.

#. Note that the Spectre and Meltdown fixes increase the overhead of
   system calls.
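For instance, reserving the PMD core used in the earlier examples (core 4)
could look like the following GRUB fragment.  This is an illustrative
config sketch only; the file path and the command that regenerates the
GRUB configuration vary by distribution:

```shell
# /etc/default/grub (illustrative): keep core 4 free of scheduled tasks
# for the PMD thread, and steer device IRQs to the remaining cores.
GRUB_CMDLINE_LINUX_DEFAULT="isolcpus=4 irqaffinity=0-3"
# Then regenerate the GRUB config and reboot, e.g. on Debian/Ubuntu:
#   update-grub && reboot
```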
Debugging performance issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While running the traffic, use the Linux perf tool to see where your CPU
spends its cycles::

  cd bpf-next/tools/perf
  make
  ./perf record -p `pidof ovs-vswitchd` sleep 10
  ./perf report

Measure your system call rate by doing::

  pstree -p `pidof ovs-vswitchd`
  strace -c -p <your pmd's PID>

Or, use the OVS PMD tool::

  ovs-appctl dpif-netdev/pmd-stats-show
Example Script
--------------

Below is a script using namespaces and a veth peer::

  ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
    --disable-system --detach
  ovs-vsctl -- add-br br0 -- set Bridge br0 \
    protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
    fail-mode=secure datapath_type=netdev
  ovs-appctl vlog/set netdev_afxdp::dbg

  ip netns add at_ns0
  ip link add p0 type veth peer name afxdp-p0
  ip link set p0 netns at_ns0
  ip link set dev afxdp-p0 up
  ovs-vsctl add-port br0 afxdp-p0 -- \
    set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"

  ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
  ip addr add "10.1.1.1/24" dev p0
  ip link set dev p0 up
  NS_EXEC_HEREDOC

  ip netns add at_ns1
  ip link add p1 type veth peer name afxdp-p1
  ip link set p1 netns at_ns1
  ip link set dev afxdp-p1 up
  ovs-vsctl add-port br0 afxdp-p1 -- \
    set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"

  ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
  ip addr add "10.1.1.2/24" dev p1
  ip link set dev p1 up
  NS_EXEC_HEREDOC

  ip netns exec at_ns0 ping -i .2 10.1.1.2
Limitations/Known Issues
------------------------

#. The device's NUMA ID is always reported as 0; a way to find the NUMA ID
   from a netdev is needed.
#. No QoS support, because the AF_XDP netdev bypasses the Linux TC layer.
   A possible workaround is to use the OpenFlow meter action.
#. Most of the testing has been done using a single i40e port.  Multiple
   ports and the ixgbe driver still need to be tested.
#. No latency test results yet (TODO).
#. Due to limitations of the current upstream kernel, various offloads
   (vlan, cvlan) do not work over virtual interfaces (i.e. veth pairs).
   Also, TCP does not work over virtual interfaces (veth) in generic XDP
   mode.  Some more information and a possible workaround are available
   `here <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__.
   For TAP interfaces, generic mode seems to work fine (TCP works) and may
   even provide better performance than native mode in some cases.
PVP using tap device
--------------------

Assume you have enp2s0 as the physical NIC, and a tap device connected to
a VM.  First, start OVS, then add the physical port::

  ethtool -L enp2s0 combined 1
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"

Start a VM with virtio and the tap device::

  qemu-system-x86_64 -hda ubuntu1810.qcow \
  -m 4096 \
  -cpu host,+x2apic -enable-kvm \
  -device virtio-net-pci,mac=00:02:00:00:00:01,netdev=net0,mq=on,\
  vectors=10,mrg_rxbuf=on,rx_queue_size=1024 \
  -netdev type=tap,id=net0,vhost=on,queues=8 \
  -object memory-backend-file,id=mem,size=4096M,\
  mem-path=/dev/hugepages,share=on \
  -numa node,memdev=mem -mem-prealloc -smp 2

Create the OpenFlow rules::

  ovs-vsctl add-port br0 tap0 -- set interface tap0
  ovs-ofctl del-flows br0
  ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:tap0"
  ovs-ofctl add-flow br0 "in_port=tap0, actions=output:enp2s0"

Inside the VM, use xdp_rxq_info to bounce back the traffic::

  ./xdp_rxq_info --dev ens3 --action XDP_TX
PVP using vhostuser device
--------------------------

First, build OVS with DPDK and AF_XDP::

  ./configure --enable-afxdp --with-dpdk=<dpdk path>
  make -j4 && make install

Create a vhost-user port from OVS::

  ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev \
    other_config:pmd-cpu-mask=0xfff
  ovs-vsctl add-port br0 vhost-user-1 \
    -- set Interface vhost-user-1 type=dpdkvhostuser

Start the VM using vhost-user mode::

  qemu-system-x86_64 -hda ubuntu1810.qcow \
  -m 4096 \
  -cpu host,+x2apic -enable-kvm \
  -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 \
  -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=4 \
  -device virtio-net-pci,mac=00:00:00:00:00:01,\
  netdev=mynet1,mq=on,vectors=10 \
  -object memory-backend-file,id=mem,size=4096M,\
  mem-path=/dev/hugepages,share=on \
  -numa node,memdev=mem -mem-prealloc -smp 2

Set up the OpenFlow rules::

  ovs-ofctl del-flows br0
  ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:vhost-user-1"
  ovs-ofctl add-flow br0 "in_port=vhost-user-1, actions=output:enp2s0"

Inside the VM, use xdp_rxq_info to drop or bounce back the traffic::

  ./xdp_rxq_info --dev ens3 --action XDP_DROP
  ./xdp_rxq_info --dev ens3 --action XDP_TX
PCP container using veth
------------------------

Create a namespace and a veth peer::

  ip netns add at_ns0
  ip link add p0 type veth peer name afxdp-p0
  ip link set p0 netns at_ns0
  ip link set dev afxdp-p0 up
  ip netns exec at_ns0 ip link set dev p0 up

Attach the veth port to br0 (Linux kernel mode)::

  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0

Or, attach it using AF_XDP::

  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"

Set up the OpenFlow rules::

  ovs-ofctl del-flows br0
  ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:afxdp-p0"
  ovs-ofctl add-flow br0 "in_port=afxdp-p0, actions=output:enp2s0"

In the namespace, drop or bounce back the packets::

  ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_DROP
  ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_TX
Bug Reporting
-------------

Please report problems to dev@openvswitch.org.