Using Open vSwitch with DPDK
============================
Open vSwitch can use the Intel(R) DPDK library to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.
The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.
This version of Open vSwitch should be built manually with "configure"
and "make".
Building and Installing:
------------------------
DPDK 1.6 is the recommended version.

DPDK:
Set the DPDK directory, e.g.: export DPDK_DIR=/usr/src/dpdk-1.6.0r2
cd $DPDK_DIR
Update config/defconfig_x86_64-default-linuxapp-gcc so that DPDK generates a
single library file:
CONFIG_RTE_BUILD_COMBINE_LIBS=y
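If the option already appears in the defconfig as "=n", a sed one-liner can
flip it (a sketch, run from $DPDK_DIR; adjust if your config file differs):

  sed -i 's/CONFIG_RTE_BUILD_COMBINE_LIBS=n/CONFIG_RTE_BUILD_COMBINE_LIBS=y/' \
     config/defconfig_x86_64-default-linuxapp-gcc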
make install T=x86_64-default-linuxapp-gcc
For details refer to http://dpdk.org/
Linux kernel:
Refer to intel-dpdk-getting-started-guide.pdf for the DPDK kernel
requirements.
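As a quick sanity check, you can confirm that the running kernel was built
with hugetlbfs support (a sketch; the config file path varies by distro):

  grep CONFIG_HUGETLBFS /boot/config-`uname -r`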
OVS:
cd $OVS_DIR/openvswitch
./boot.sh
export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc
./configure --with-dpdk=$DPDK_BUILD
make
Refer to INSTALL.userspace for the general requirements of building
userspace OVS.
Using the DPDK with ovs-vswitchd:
---------------------------------
Setup system boot:
   To the kernel bootline, add: default_hugepagesz=1GB hugepagesz=1G hugepages=1
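After rebooting, you can verify that the kernel accepted the parameters and
reserved the huge page, e.g.:

   cat /proc/cmdline
   grep HugePages_ /proc/meminfo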
First setup DPDK devices:
  - Insert uio.ko
    e.g. modprobe uio
  - Insert igb_uio.ko
    e.g. insmod DPDK/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko
  - Bind network device to igb_uio.
    e.g. DPDK/tools/pci_unbind.py --bind=igb_uio eth1
    Alternate binding method:
     Find target Ethernet devices
      lspci -nn|grep Ethernet
     Bring down the target devices (e.g. eth2, eth3)
      ifconfig eth2 down
      ifconfig eth3 down
     Look at current devices (e.g. ixgbe devices)
      ls /sys/bus/pci/drivers/ixgbe/
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind
     Unbind target pci devices from their current driver (e.g. 02:00.0 ...)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
     Bind to the target driver (e.g. igb_uio)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/igb_uio/bind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/igb_uio/bind
     Check the binding for the listed devices
      ls /sys/bus/pci/drivers/igb_uio
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind
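     Alternatively, list the binding status with the DPDK tool (an assumption:
     that your DPDK version's pci_unbind.py supports the --status option)
      DPDK/tools/pci_unbind.py --status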
  - Load the ovs kernel module
    e.g. modprobe openvswitch
  - Mount the hugetlbfs filesystem
    e.g. mount -t hugetlbfs -o pagesize=1G none /mnt/huge/
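If the mount point does not already exist, create it first with
"mkdir -p /mnt/huge". You can then verify the mount, e.g.:

      mount | grep hugetlbfs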
Refer to http://www.dpdk.org/doc/quick-start to verify the DPDK setup.
Start ovsdb-server as discussed in the INSTALL doc:
  First time only, create (or clear) the database:
    mkdir -p /usr/local/etc/openvswitch
    mkdir -p /usr/local/var/run/openvswitch
    rm -f /usr/local/etc/openvswitch/conf.db
    ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
        ./vswitchd/vswitch.ovsschema
  Start the database server:
    ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
        --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
        --private-key=db:Open_vSwitch,SSL,private_key \
        --certificate=db:Open_vSwitch,SSL,certificate \
        --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
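  With --pidfile and --detach the server runs in the background; its control
  socket should then exist at the path given to --remote:
    ls /usr/local/var/run/openvswitch/db.sock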
  First time after db creation, initialize:
    ./utilities/ovs-vsctl --no-wait init
DPDK configuration arguments can be passed to ovs-vswitchd via the --dpdk
argument. The DPDK -c (coremask) argument is ignored by ovs-dpdk, but it is a
required parameter for DPDK initialization.
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
If more than 1 GB of huge pages has been allocated, set the amount to use and
restrict it to NUMA node 0 memory:

   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
      -- unix:$DB_SOCK --pidfile --detach
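To confirm that ovs-vswitchd started and detached successfully, check for its
pid file (assuming the default install prefix):

   cat /usr/local/var/run/openvswitch/ovs-vswitchd.pid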
To use ovs-vswitchd with DPDK, create a bridge with datapath_type
"netdev" in the configuration database. For example:

    ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
Now you can add dpdk devices. OVS expects DPDK device names to start with
"dpdk" and end with a port id. ovs-vswitchd should print the number of dpdk
devices found.
    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
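You can verify that the bridge and the dpdk ports were created:

    ovs-vsctl show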
Once the first DPDK port is added to ovs-vswitchd, it creates a polling thread
that polls the dpdk devices in a continuous loop. Therefore the CPU
utilization for that thread is always 100%.
Test flow script across NICs (assuming ovs in /usr/src/ovs):
  Assume 1.1.1.1 on NIC port 1 (dpdk0)
  Assume 1.1.1.2 on NIC port 2 (dpdk1)
############################# Script:

#! /bin/sh
# Move to command directory
cd /usr/src/ovs/utilities/
# Clear current flows
./ovs-ofctl del-flows br0

# Add flows between port 1 (dpdk0) and port 2 (dpdk1)
./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,\
nw_dst=1.1.1.2,idle_timeout=0,action=output:2
./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=1.1.1.2,\
nw_dst=1.1.1.1,idle_timeout=0,action=output:1

######################################
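After running the script, you can confirm that the flows were installed:

   ./ovs-ofctl dump-flows br0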
Ideally, for maximum throughput, the 100% CPU-bound task should not be
scheduled out, since that temporarily halts the process. The following
affinitization methods will help.
At this time all ovs-vswitchd tasks end up being affinitized to cpu core 0,
but this may change. Let's pick a target core for the 100% task to run on,
e.g. core 7. Also assume a dual 8-core Sandy Bridge system with hyperthreading
enabled. (A different cpu configuration will have different core mask
requirements.)
To give the 100% task better ownership of its core, isolation may be useful.
To the kernel bootline, add a core isolation list for core 7 and its
associated hyperthread sibling, core 23:
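   e.g. isolcpus=7,23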
Reboot the system for the isolation to take effect, then restart everything.
List the threads (and their pids) of ovs-vswitchd:
   top -p `pidof ovs-vswitchd` -H -d1
Look for the pmd* thread, which polls the dpdk devices; this will be the 100%
CPU-bound task. Using this thread's pid, affinitize it to core 7
(mask 0x080), for example:
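   taskset -p 0x080 {thread pid, e.g. 1762}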
   pid 1762's current affinity mask: 1
   pid 1762's new affinity mask: 80
Assume that all other ovs-vswitchd threads are to run on the other socket 0
cores. Affinitize the rest of the ovs-vswitchd thread ids to 0x0FF007F:
   taskset -p 0x0FF007F {thread pid, e.g. 1738}
   pid 1738's current affinity mask: 1
   pid 1738's new affinity mask: ff007f
Core 23 is left idle, which allows core 7 to run at full rate.

Future changes may eliminate the need for cpu core affinitization.
DPDK Rings:
-----------

Following the steps above to create a bridge, you can now add dpdk rings
as a port to the vswitch. OVS will expect the DPDK ring device name to
start with "dpdkr" and end with a port id.
    ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr
DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings. It is based on an application from the base dpdk directory,
modified to work with the ring naming used within ovs.
The application is located in tests/ovs_client. To run the client:

      cd /usr/src/ovs/tests/
      ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
In the case of the dpdkr0 example above, the "port id you gave dpdkr" is 0.

It is essential to have --proc-type=secondary.
The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit queue of
the same ring. It is a trivial loopback application.
In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu. You can download the
patch and a getting started guide at:
https://01.org/packet-processing/downloads
A general rule of thumb for better performance is that the client
application should not be assigned the same dpdk core mask "-c" as
the vswitchd.
Restrictions:
-------------

  - This support is for physical NICs. It has been tested with Intel NICs only.
  - The vswitchd userspace datapath does affinitize the polling thread, but it
    is assumed that devices are on NUMA node 0. Therefore, if a device is
    attached to a non-zero NUMA node, switching performance will be suboptimal.
  - There are a fixed number of polling threads and a fixed number of
    per-device queues configured.
  - Only a 1500-byte MTU works; a few changes are needed in the DPDK lib to
    fix this issue.
  - Currently the DPDK port does not make use of any offload functionality.
  - The shared memory is currently restricted to the use of 1GB huge pages.
  - All huge pages are shared among the host, clients, virtual machines, etc.
Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.