..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

========================
Open vSwitch with AF_XDP
========================

This document describes how to build and install Open vSwitch using
AF_XDP netdev.

.. warning::
   The AF_XDP support of Open vSwitch is considered 'experimental',
   and it is not compiled in by default.

Introduction
------------
AF_XDP, Address Family of the eXpress Data Path, is a Linux socket type
built upon the eBPF and XDP technology. It aims to provide performance
comparable to DPDK while cooperating better with the existing kernel
networking stack. An AF_XDP socket receives and sends packets through an
eBPF/XDP program attached to the netdev, bypassing several of the Linux
kernel's subsystems. As a result, an AF_XDP socket shows much better
performance than AF_PACKET. For more details about AF_XDP, please see
the Linux kernel's Documentation/networking/af_xdp.rst.

AF_XDP Netdev
-------------
OVS has several netdev types, e.g., system, tap, and dpdk. The AF_XDP
feature adds a new netdev type called "afxdp" and implements its
configuration, packet reception, and transmit functions. Since the
AF_XDP socket, called xsk, operates in userspace, once ovs-vswitchd
receives packets from the xsk, the afxdp netdev re-uses the existing
userspace dpif-netdev datapath. As a result, most of the packet
processing happens in userspace instead of in the Linux kernel.

::

             _
            |   +-------------------+
            |   |    ovs-vswitchd   |<-->ovsdb-server
            |   +-------------------+
            |   |      ofproto      |<-->OpenFlow controllers
            |   +--------+-+--------+
            |   | netdev | |ofproto-|
  userspace |   +--------+ |  dpif  |
            |   | afxdp  | +--------+
            |   | netdev | |  dpif  |
            |   +---||---+ +--------+
            |       ||     |  dpif- |
            |       ||     | netdev |
            |_      ||     +--------+
                    ||
             _  +---||-----+--------+
            |   |  AF_XDP prog +    |
     kernel |   |     xsk_map       |
            |_  +--------||---------+
                         ||
                      physical
                         NIC

Build requirements
------------------

In addition to the requirements described in :doc:`general`, building Open
vSwitch with AF_XDP will require the following:

- libbpf from the kernel source tree (kernel 5.0.0 or later)

- Linux kernel XDP support, with the following options (required):

  * CONFIG_BPF=y

  * CONFIG_BPF_SYSCALL=y

  * CONFIG_XDP_SOCKETS=y

- The following optional Kconfig options are also recommended, but not
  required:

  * CONFIG_BPF_JIT=y (Performance)

  * CONFIG_HAVE_BPF_JIT=y (Performance)

  * CONFIG_XDP_SOCKETS_DIAG=y (Debugging)

- Once your AF_XDP-enabled kernel is ready, if possible, run
  **./xdpsock -r -N -z -i <your device>** under linux/samples/bpf.
  This is an OVS-independent benchmark tool for AF_XDP, and it verifies
  that your kernel meets the basic requirements for AF_XDP.

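The required Kconfig options above can also be checked with a short
script. A minimal sketch, assuming the running kernel's config is
exposed at /boot/config-$(uname -r) (this path is distribution-dependent;
some distributions use /proc/config.gz instead)::

  # Check the required AF_XDP Kconfig options on the running kernel.
  config="/boot/config-$(uname -r)"
  for opt in CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_XDP_SOCKETS; do
      if grep -q "^${opt}=y" "$config" 2>/dev/null; then
          echo "$opt: ok"
      else
          echo "$opt: missing (or $config not readable)"
      fi
  done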

Installing
----------
For OVS to use the AF_XDP netdev, it has to be configured with libbpf
support. First, clone a recent version of the Linux bpf-next tree::

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Second, go into the Linux source directory and build libbpf in the tools
directory::

  cd bpf-next/
  cd tools/lib/bpf/
  make && make install
  make install_headers

.. note::
   Make sure xsk.h and bpf.h are installed in the system's include path,
   e.g. /usr/local/include/bpf/ or /usr/include/bpf/.

Make sure that libbpf.so is installed correctly::

  ldconfig
  ldconfig -p | grep libbpf

Third, ensure the standard OVS requirements are installed and
bootstrap/configure the package::

  ./boot.sh && ./configure --enable-afxdp

Finally, build and install OVS::

  make && make install

To kick-start end-to-end autotesting::

  uname -a # make sure you have a 5.0+ kernel
  make check-afxdp TESTSUITEFLAGS='1'

.. note::
   Not all test cases pass at this time. Currently all cvlan tests are
   skipped due to kernel issues.

If a test case fails, check the log at::

  cat \
  tests/system-afxdp-testsuite.dir/<test num>/system-afxdp-testsuite.log

Setup AF_XDP netdev
-------------------
Before running OVS with AF_XDP, make sure that libbpf and libelf are
set up correctly::

  ldd vswitchd/ovs-vswitchd

Open vSwitch should be started using the userspace datapath as described
in :doc:`general`::

  ovs-vswitchd ...
  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev

Make sure your device driver supports AF_XDP. netdev-afxdp supports
the following additional options (see ``man ovs-vswitchd.conf.db`` for
more details):

* ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
  ``native`` or ``generic``. Defaults to ``best-effort``, i.e. the best
  of the supported modes, so in most cases you don't need to change it.

* ``use-need-wakeup``: defaults to ``true`` if libbpf supports it,
  otherwise ``false``.

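For example, to pin a port to one specific mode instead of relying on
``best-effort`` (``enp2s0`` is a placeholder device name)::

  # Request zero-copy native mode explicitly for this port.
  ovs-vsctl set interface enp2s0 options:xdp-mode=native-with-zerocopy

Requesting an explicit mode that the driver cannot satisfy should fail
during port setup, which can help catch misconfiguration early.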
For example, to use 1 PMD (on core 4) on a 1-queue (queue 0) device,
configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
``n_rxq``::

  ethtool -L enp2s0 combined 1
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
    other_config:pmd-rxq-affinity="0:4"

Or, use 4 PMDs/cores (cores 1-4) and 4 queues by doing::

  ethtool -L enp2s0 combined 4
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1e
  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
    options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"

.. note::
   ``pmd-rxq-affinity`` is optional. If not specified, the system will
   auto-assign queues to PMDs. ``n_rxq`` equals ``1`` by default.

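The ``pmd-cpu-mask`` value is a hexadecimal bitmask with one bit set per
PMD core. A small helper sketch for computing it from a list of core IDs
(cores 1-4 here match the four-queue affinity example above)::

  # Build a pmd-cpu-mask bitmask from a list of core IDs.
  mask=0
  for core in 1 2 3 4; do
      mask=$((mask | (1 << core)))
  done
  printf '0x%x\n' "$mask"   # prints 0x1e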
To validate that the bridge has successfully instantiated, run::

  ovs-vsctl show

The output should show something like::

  Port "ens802f0"
   Interface "ens802f0"
      type: afxdp
      options: {n_rxq="1"}

Otherwise, enable debugging by::

  ovs-appctl vlog/set netdev_afxdp::dbg

To check which XDP mode was chosen by ``best-effort``, you can look for
``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::

  # ovs-appctl dpctl/show
  netdev@ovs-netdev:
    <...>
    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
                      xdp-mode=best-effort,
                      xdp-mode-in-use=native-with-zerocopy)

References
----------
Most of the design details are described in the paper presented at
Linux Plumbers Conference 2018, "Bringing the Power of eBPF to Open
vSwitch" [1], section 4, and in the slides [2][4].
"The Path to DPDK Speeds for AF XDP" [3] gives a very good introduction
to current and future AF_XDP work.

[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf

[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf

[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf

[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp


Performance Tuning
------------------
The name of the game is to keep your CPU running in userspace, allowing
the PMD to keep polling the AF_XDP queues without any interference from
the kernel.

#. Make sure everything is in the same NUMA node (memory used by AF_XDP,
   PMD running cores, device plug-in slot).

#. Isolate your CPUs by adding ``isolcpus`` to the grub configuration.

#. IRQs should not be assigned to the PMD running cores.

#. Note that the Spectre and Meltdown fixes increase the overhead of
   system calls.

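As a sketch of the CPU isolation and IRQ points above (the core numbers,
the device name enp2s0, and the grub file path are placeholders for your
system)::

  # Reserve cores 4-7 for PMD threads: add isolcpus to the kernel command
  # line, e.g. GRUB_CMDLINE_LINUX="... isolcpus=4-7" in /etc/default/grub,
  # then regenerate the grub config and reboot.

  # Steer the NIC's IRQs onto cores 0-3, away from the PMD cores
  # (check /proc/interrupts for your device's IRQ numbers):
  for irq in $(awk -F: '/enp2s0/ {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
      echo 0f > /proc/irq/$irq/smp_affinity
  done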

Debugging performance issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While running the traffic, use the Linux perf tool to see where your CPU
spends its cycles::

  cd bpf-next/tools/perf
  make
  ./perf record -p `pidof ovs-vswitchd` sleep 10
  ./perf report

Measure your system call rate by doing::

  pstree -p `pidof ovs-vswitchd`
  strace -c -p <your pmd's PID>

Or, use the OVS PMD tool::

  ovs-appctl dpif-netdev/pmd-stats-show

Example Script
--------------

Below is a script using namespaces and a veth peer::

  #!/bin/bash
  ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
    --disable-system --detach
  ovs-vsctl -- add-br br0 -- set Bridge br0 \
    protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
    fail-mode=secure datapath_type=netdev

  ip netns add at_ns0
  ovs-appctl vlog/set netdev_afxdp::dbg

  ip link add p0 type veth peer name afxdp-p0
  ip link set p0 netns at_ns0
  ip link set dev afxdp-p0 up
  ovs-vsctl add-port br0 afxdp-p0 -- \
    set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"

  ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
  ip addr add "10.1.1.1/24" dev p0
  ip link set dev p0 up
  NS_EXEC_HEREDOC

  ip netns add at_ns1
  ip link add p1 type veth peer name afxdp-p1
  ip link set p1 netns at_ns1
  ip link set dev afxdp-p1 up

  ovs-vsctl add-port br0 afxdp-p1 -- \
    set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
  ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
  ip addr add "10.1.1.2/24" dev p1
  ip link set dev p1 up
  NS_EXEC_HEREDOC

  ip netns exec at_ns0 ping -i .2 10.1.1.2

Limitations/Known Issues
------------------------
#. The device's NUMA ID is always 0; we need a way to find the NUMA ID
   from a netdev.
#. No QoS support, because the AF_XDP netdev bypasses the Linux TC layer.
   A possible workaround is to use the OpenFlow meter action.
#. Most of the tests are done using a single i40e port. Multiple ports
   and the ixgbe driver still need to be tested.
#. No latency test results yet (TODO).
#. Due to limitations of the current upstream kernel, various offloads
   (vlan, cvlan) do not work over virtual interfaces (i.e. veth pairs).
   Also, TCP does not work over virtual interfaces (veth) in generic XDP
   mode. Some more information and a possible workaround are available
   `here <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ .
   For TAP interfaces generic mode seems to work fine (TCP works) and may
   even provide better performance than native mode in some cases.

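As a sketch of the OpenFlow meter workaround for QoS mentioned above
(the meter ID, rate, and port names are illustrative)::

  # Hypothetical rate limit: drop traffic arriving on enp2s0 above 10 Mbps.
  ovs-ofctl -O OpenFlow13 add-meter br0 "meter=1,kbps,band=type=drop,rate=10000"
  ovs-ofctl -O OpenFlow13 add-flow br0 \
      "in_port=enp2s0,actions=meter:1,output:afxdp-p0"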
PVP using tap device
--------------------
Assume you have enp2s0 as the physical NIC and a tap device connected to
the VM. First, start OVS, then add the physical port::

  ethtool -L enp2s0 combined 1
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"

Start a VM with virtio and the tap device::

  qemu-system-x86_64 -hda ubuntu1810.qcow \
    -m 4096 \
    -cpu host,+x2apic -enable-kvm \
    -device virtio-net-pci,mac=00:02:00:00:00:01,netdev=net0,mq=on,\
      vectors=10,mrg_rxbuf=on,rx_queue_size=1024 \
    -netdev type=tap,id=net0,vhost=on,queues=8 \
    -object memory-backend-file,id=mem,size=4096M,\
      mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem -mem-prealloc -smp 2

Create OpenFlow rules::

  ovs-vsctl add-port br0 tap0 -- set interface tap0
  ovs-ofctl del-flows br0
  ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:tap0"
  ovs-ofctl add-flow br0 "in_port=tap0, actions=output:enp2s0"

Inside the VM, use xdp_rxq_info to bounce back the traffic::

  ./xdp_rxq_info --dev ens3 --action XDP_TX

PVP using vhostuser device
--------------------------
First, build OVS with DPDK and AF_XDP::

  ./configure --enable-afxdp --with-dpdk=<dpdk path>
  make -j4 && make install

Create a vhost-user port from OVS::

  ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev \
    other_config:pmd-cpu-mask=0xfff
  ovs-vsctl add-port br0 vhost-user-1 \
    -- set Interface vhost-user-1 type=dpdkvhostuser

Start the VM using vhost-user mode::

  qemu-system-x86_64 -hda ubuntu1810.qcow \
    -m 4096 \
    -cpu host,+x2apic -enable-kvm \
    -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 \
    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=4 \
    -device virtio-net-pci,mac=00:00:00:00:00:01,\
      netdev=mynet1,mq=on,vectors=10 \
    -object memory-backend-file,id=mem,size=4096M,\
      mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem -mem-prealloc -smp 2

Setup the OpenFlow rules::

  ovs-ofctl del-flows br0
  ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:vhost-user-1"
  ovs-ofctl add-flow br0 "in_port=vhost-user-1, actions=output:enp2s0"

Inside the VM, use xdp_rxq_info to drop or bounce back the traffic::

  ./xdp_rxq_info --dev ens3 --action XDP_DROP
  ./xdp_rxq_info --dev ens3 --action XDP_TX

PCP container using veth
------------------------
Create a namespace and veth peer devices::

  ip netns add at_ns0
  ip link add p0 type veth peer name afxdp-p0
  ip link set p0 netns at_ns0
  ip link set dev afxdp-p0 up
  ip netns exec at_ns0 ip link set dev p0 up

Attach the veth port to br0 (linux kernel mode)::

  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0

Or, use AF_XDP::

  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"

Setup the OpenFlow rules::

  ovs-ofctl del-flows br0
  ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:afxdp-p0"
  ovs-ofctl add-flow br0 "in_port=afxdp-p0, actions=output:enp2s0"

In the namespace, drop or bounce back the packets::

  ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_DROP
  ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_TX

Bug Reporting
-------------

Please report problems to dev@openvswitch.org.