Using Open vSwitch with DPDK
============================

Open vSwitch can use the Intel(R) DPDK library to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.

The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.

This version of Open vSwitch should be built manually with "configure"
and "make".

Building and Installing:
------------------------

DPDK 1.7 is required.

DPDK:
Set the DPDK directory, e.g.: export DPDK_DIR=/usr/src/dpdk-1.7.0
cd $DPDK_DIR
Update config/common_linuxapp so that DPDK generates a single library file:
CONFIG_RTE_BUILD_COMBINE_LIBS=y

make install T=x86_64-native-linuxapp-gcc
For details refer to http://dpdk.org/

Linux kernel:
Refer to intel-dpdk-getting-started-guide.pdf to understand the
DPDK kernel requirements.

OVS:
cd $OVS_DIR/openvswitch
./boot.sh
export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/
./configure --with-dpdk=$DPDK_BUILD
make

Refer to INSTALL.userspace for general requirements of building
userspace OVS.

Using the DPDK with ovs-vswitchd:
---------------------------------

Setup system boot:
   kernel bootline, add: default_hugepagesz=1GB hugepagesz=1G hugepages=1

First setup DPDK devices:
  - insert uio.ko
    e.g. modprobe uio
  - insert igb_uio.ko
    e.g. insmod $DPDK_BUILD/kmod/igb_uio.ko
  - Bind network device to igb_uio.
    e.g. $DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1
    Alternate binding method:
     Find target Ethernet devices
      lspci -nn|grep Ethernet
     Bring Down (e.g. eth2, eth3)
      ifconfig eth2 down
      ifconfig eth3 down
     Look at current devices (e.g. ixgbe devices)
      ls /sys/bus/pci/drivers/ixgbe/
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind
     Unbind target pci devices from current driver (e.g. 02:00.0 ...)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
     Bind to target driver (e.g. igb_uio)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/igb_uio/bind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/igb_uio/bind
     Check binding for listed devices
      ls /sys/bus/pci/drivers/igb_uio
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind
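
As a complement to listing the driver directory, the binding of a single
device can also be checked through its "driver" symlink in sysfs. The
following is only a sketch: pci_driver and the SYSFS_ROOT variable are
hypothetical names introduced here for illustration and are not part of
OVS or DPDK; SYSFS_ROOT exists only so the helper can be pointed at a
test directory and defaults to /sys.

```shell
#!/bin/sh
# pci_driver: hypothetical helper (not shipped with OVS or DPDK).
# Prints the name of the kernel driver that a PCI device is bound to.
# Returns non-zero and prints nothing if the device has no driver.
# SYSFS_ROOT is an assumption added for testing; it defaults to /sys.
pci_driver() (
    link=${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver
    # The "driver" entry is a symlink that only exists while bound.
    [ -L "$link" ] || exit 1
    basename "$(readlink "$link")"
)

# Example use on a live system (device address taken from the text above):
#   pci_driver 0000:02:00.0
```

After a successful bind as described above, the helper would be expected
to print igb_uio for both 0000:02:00.0 and 0000:02:00.1.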

Prepare system:
  - mount hugetlbfs
    e.g. mount -t hugetlbfs -o pagesize=1G none /mnt/huge/
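
Before starting vswitchd it can help to confirm that the kernel actually
reserved the huge pages requested on the boot line. A minimal sketch,
assuming the usual /proc/meminfo field names HugePages_Total and
Hugepagesize; hugepages_summary is a hypothetical helper name used only
for this illustration.

```shell
#!/bin/sh
# hugepages_summary: hypothetical helper (not shipped with OVS or DPDK).
# Reads /proc/meminfo-style text on stdin and prints
# "<number of huge pages> <huge page size in kB>".
hugepages_summary() {
    awk '/^HugePages_Total:/ { total = $2 }
         /^Hugepagesize:/    { size = $2 }
         END { print total + 0, size + 0 }'
}

# Example use on a live system:
#   hugepages_summary < /proc/meminfo
```

With the boot line shown earlier (one 1G page), the output would be
expected to report one page of 1048576 kB.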

Refer to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.

Start ovsdb-server as discussed in INSTALL doc:
  Summary e.g.:
    First time only db creation (or clearing):
      mkdir -p /usr/local/etc/openvswitch
      mkdir -p /usr/local/var/run/openvswitch
      rm -f /usr/local/etc/openvswitch/conf.db
      cd $OVS_DIR
      ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
        ./vswitchd/vswitch.ovsschema
    start ovsdb-server
      cd $OVS_DIR
      ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
          --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
          --private-key=db:Open_vSwitch,SSL,private_key \
          --certificate=db:Open_vSwitch,SSL,certificate \
          --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
    First time after db creation, initialize:
      cd $OVS_DIR
      ./utilities/ovs-vsctl --no-wait init

Start vswitchd:
DPDK configuration arguments can be passed to vswitchd via the `--dpdk`
argument. It must be the first argument passed to the vswitchd process.
The DPDK -c argument is ignored by ovs-dpdk, but it is a required
parameter for DPDK initialization.

   e.g.
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach

If more than 1 GB of huge pages is allocated, set the amount to use and
take the memory from NUMA node 0:

   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
      -- unix:$DB_SOCK --pidfile --detach

To use ovs-vswitchd with DPDK, create a bridge with datapath_type
"netdev" in the configuration database.  For example:

    ovs-vsctl add-br br0
    ovs-vsctl set bridge br0 datapath_type=netdev

Now you can add DPDK devices. OVS expects a DPDK device name to start with
dpdk and end with a portid. vswitchd should print the number of DPDK devices
found.

    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
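
The naming rule above (prefix dpdk followed by a numeric portid) can be
checked before calling ovs-vsctl. The sketch below is illustrative only:
dpdk_port_id is a hypothetical helper, and it assumes the portid is a
plain decimal number, which matches the dpdk0 and dpdk1 examples but is
not spelled out further in this file.

```shell
#!/bin/sh
# dpdk_port_id: hypothetical helper (not shipped with OVS or DPDK).
# Prints the numeric portid of a name of the form dpdk<N>.
# Returns non-zero for any other name, including ring names such as dpdkr0.
dpdk_port_id() (
    name=$1
    case $name in
        dpdk*) ;;
        *) exit 1 ;;
    esac
    id=${name#dpdk}
    # Reject an empty suffix or any non-digit character.
    case $id in
        ''|*[!0-9]*) exit 1 ;;
    esac
    printf '%s\n' "$id"
)

dpdk_port_id dpdk0
dpdk_port_id dpdk1
```

A script could use the exit status to refuse names such as eth1 before
passing them to "ovs-vsctl add-port ... type=dpdk".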

Once the first DPDK port is added to vswitchd, it creates a polling thread
and polls the DPDK device in a continuous loop. Therefore CPU utilization
for that thread is always 100%.

Test flow script across NICs (assuming ovs in /usr/src/ovs):
  Assume 1.1.1.1 on NIC port 1 (dpdk0)
  Assume 1.1.1.2 on NIC port 2 (dpdk1)
  Execute script:

############################# Script:

#! /bin/sh

# Move to command directory

cd /usr/src/ovs/utilities/

# Clear current flows
./ovs-ofctl del-flows br0

# Add flows between port 1 (dpdk0) to port 2 (dpdk1)
./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,\
nw_dst=1.1.1.2,idle_timeout=0,action=output:2
./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=1.1.1.2,\
nw_dst=1.1.1.1,idle_timeout=0,action=output:1

######################################
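
The two add-flow commands in the script are mirror images of each other.
For other port or address pairs, the flow specifications could be
generated rather than typed by hand. This is only a sketch; flow_pair is
a hypothetical helper, and it reproduces exactly the match fields used in
the script above without adding any others.

```shell
#!/bin/sh
# flow_pair: hypothetical helper (not shipped with OVS).
# Arguments: <port A> <IP on port A> <port B> <IP on port B>
# Prints two flow specifications, one per direction, in the same form
# as the add-flow arguments used in the test script.
flow_pair() (
    port_a=$1 ip_a=$2 port_b=$3 ip_b=$4
    fmt='in_port=%s,dl_type=0x800,nw_src=%s,nw_dst=%s,idle_timeout=0,action=output:%s\n'
    printf "$fmt" "$port_a" "$ip_a" "$ip_b" "$port_b"
    printf "$fmt" "$port_b" "$ip_b" "$ip_a" "$port_a"
)

flow_pair 1 1.1.1.1 2 1.1.1.2

# Possible use, from the utilities directory:
#   flow_pair 1 1.1.1.1 2 1.1.1.2 | while read -r flow; do
#       ./ovs-ofctl add-flow br0 "$flow"
#   done
```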

Ideally, for maximum throughput, the 100% task should not be scheduled out,
since being scheduled out temporarily halts the process. The following
affinitization methods will help.

At this time all ovs-vswitchd tasks end up being affinitized to cpu core 0
but this may change. Let's pick a target core for the 100% task to run on,
e.g. core 7. Also assume a dual 8 core sandy bridge system with
hyperthreading enabled. (A different cpu configuration will have different
core mask requirements).

To give the 100% task better ownership of its core, isolation may be useful.
To the kernel bootline add a core isolation list for core 7 and its
associated hyperthread core 23
  e.g. isolcpus=7,23
Reboot the system for isolation to take effect, then restart everything.

List threads (and their pid) of ovs-vswitchd
  top -p `pidof ovs-vswitchd` -H -d1

Look for the pmd* thread which is polling dpdk devices; this will be the
100% CPU bound task. Using this thread pid, affinitize it to core 7
(mask 0x080), example pid 1762

taskset -p 080 1762
  pid 1762's current affinity mask: 1
  pid 1762's new affinity mask: 80

Assume that all other ovs-vswitchd threads are to run on the other socket 0
cores. Affinitize the rest of the ovs-vswitchd thread ids to 0x0FF007F

taskset -p 0x0FF007F {thread pid, e.g. 1738}
  pid 1738's current affinity mask: 1
  pid 1738's new affinity mask: ff007f
. . .

Core 23 is left idle, which allows core 7 to run at full rate.
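
Since a different cpu configuration needs different masks, it may be
easier to compute them from a list of core ids than to work them out by
hand. A minimal sketch follows; cores_to_mask is a hypothetical helper,
and the core numbering (0-7 on socket 0, hyperthread siblings 16-23) is
the assumption stated earlier for the example system.

```shell
#!/bin/sh
# cores_to_mask: hypothetical helper (not shipped with OVS or DPDK).
# Prints the hexadecimal affinity mask (without 0x prefix) in which one
# bit is set for each core id given as an argument.
cores_to_mask() (
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
)

# Core 7 only: the mask used for the polling thread.
cores_to_mask 7
# Cores 0-6 plus their hyperthread siblings 16-22.
cores_to_mask 0 1 2 3 4 5 6 16 17 18 19 20 21 22
```

The first call prints 80, matching the taskset example. The second prints
7f007f. The mask ff007f used in the example above also has bit 23 set, so
it does not by itself exclude core 23; under the stated assumptions, core
23 stays idle because of the isolcpus setting.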

Future changes may change the need for cpu core affinitization.

DPDK Rings :
------------

Following the steps above to create a bridge, you can now add dpdk rings
as a port to the vswitch.  OVS will expect the DPDK ring device name to
start with dpdkr and end with a portid.

    ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr

DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings.  This is from the base dpdk directory and modified to work
with the ring naming used within ovs.

location tests/ovs_client

To run the client :

    ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"

In the case of the dpdkr example above the "port id you gave dpdkr" is 0.

It is essential to have --proc-type=secondary

The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit ring of
the ethernet ring.  It is a trivial loopback application.

In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu.  You can download the
patch and getting started guide at :

https://01.org/packet-processing/downloads

A general rule of thumb for better performance is that the client
application should not be assigned the same dpdk core mask "-c" as
the vswitchd.

Restrictions:
-------------

  - This support is for physical NICs. It has been tested with Intel NICs
    only.
  - The vswitchd userspace datapath does affine the polling thread, but it
    is assumed that devices are on numa node 0. Therefore if a device is
    attached to a non-zero numa node, switching performance will be
    suboptimal.
  - There is a fixed number of polling threads and a fixed number of
    per-device queues configured.
  - Works with a 1500 MTU only; a few changes in the DPDK lib are needed
    to fix this issue.
  - Currently a DPDK port does not make use of any offload functionality.
  ivshmem
  - The shared memory is currently restricted to the use of 1GB
    huge pages.
  - All huge pages are shared amongst the host, clients, virtual
    machines etc.

Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.