                   Using Open vSwitch with DPDK
                   ============================

Open vSwitch can use the Intel(R) DPDK library to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.

The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.

This version of Open vSwitch should be built manually with "configure"
and "make".

Building and Installing:
------------------------

DPDK 1.6 is recommended.

DPDK:
Set the DPDK directory, e.g.: export DPDK_DIR=/usr/src/dpdk-1.6.0r2
cd $DPDK_DIR
Update config/defconfig_x86_64-default-linuxapp-gcc so that DPDK generates
a single library file:
CONFIG_RTE_BUILD_COMBINE_LIBS=y

make install T=x86_64-default-linuxapp-gcc
For details refer to http://dpdk.org/

Linux kernel:
Refer to intel-dpdk-getting-started-guide.pdf to understand the
DPDK kernel requirements.

OVS:
cd $OVS_DIR/openvswitch
./boot.sh
export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc
./configure --with-dpdk=$DPDK_BUILD
make

Refer to INSTALL.userspace for general requirements of building
userspace OVS.

Using the DPDK with ovs-vswitchd:
---------------------------------

Set up system boot:
  Add to the kernel boot line: default_hugepagesz=1GB hugepagesz=1G hugepages=1

First set up the DPDK devices:
  - Insert uio.ko
    e.g. modprobe uio
  - Insert igb_uio.ko
    e.g. insmod DPDK/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko
  - Bind the network device to igb_uio.
    e.g. DPDK/tools/pci_unbind.py --bind=igb_uio eth1
    Alternate binding method:
     Find target Ethernet devices
      lspci -nn|grep Ethernet
     Bring down (e.g. eth2, eth3)
      ifconfig eth2 down
      ifconfig eth3 down
     Look at current devices (e.g. ixgbe devices)
      ls /sys/bus/pci/drivers/ixgbe/
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind
     Unbind target PCI devices from the current driver (e.g. 02:00.0 ...)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
     Bind to the target driver (e.g. igb_uio)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/igb_uio/bind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/igb_uio/bind
     Check binding for listed devices
      ls /sys/bus/pci/drivers/igb_uio
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind

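The alternate binding sequence above repeats the same two sysfs writes for
each device. The following is a minimal sketch of a helper that prints those
writes for review before they are run as root. The function name
print_rebind_cmds is hypothetical and is not part of OVS or DPDK; the PCI
addresses and driver names are the ones used in the example above.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of OVS or DPDK): print, but do not
# execute, the sysfs writes that move one PCI device between drivers.
print_rebind_cmds() {
    pci=$1    # full PCI address, e.g. 0000:02:00.0
    from=$2   # driver the device is currently bound to, e.g. ixgbe
    to=$3     # driver to bind the device to, e.g. igb_uio
    printf 'echo %s > /sys/bus/pci/drivers/%s/unbind\n' "$pci" "$from"
    printf 'echo %s > /sys/bus/pci/drivers/%s/bind\n' "$pci" "$to"
}

# Print the commands for the two example devices.
for dev in 0000:02:00.0 0000:02:00.1; do
    print_rebind_cmds "$dev" ixgbe igb_uio
done
```

Printing rather than executing keeps the sketch safe to run on any machine;
the printed lines can be pasted into a root shell once they look right.
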
Prepare system:
  - Load the OVS kernel module
    e.g. modprobe openvswitch
  - Mount hugetlbfs
    e.g. mount -t hugetlbfs -o pagesize=1G none /mnt/huge/

Refer to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.

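Before starting OVS it can help to confirm that the kernel actually reserved
the huge pages requested on the boot line. The following is a minimal sketch
that reads the standard HugePages_Total, HugePages_Free and Hugepagesize
fields from /proc/meminfo. The function name hugepage_summary and its output
format are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: summarize huge page settings from a meminfo-style
# file. Defaults to /proc/meminfo; a different path can be passed for testing.
hugepage_summary() {
    meminfo=${1:-/proc/meminfo}
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size = $2 }
        END { printf "total=%d free=%d size_kB=%d\n", total, free, size }
    ' "$meminfo"
}

# Only query the live system when the file is readable (i.e. on Linux).
if [ -r /proc/meminfo ]; then
    hugepage_summary
fi
```

With the boot line shown earlier, one would expect a size of 1048576 kB and a
total of at least 1; a total of 0 suggests the boot parameters did not take
effect.
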
Start ovsdb-server as discussed in the INSTALL doc:
  Summary e.g.:
    First time only db creation (or clearing):
      mkdir -p /usr/local/etc/openvswitch
      mkdir -p /usr/local/var/run/openvswitch
      rm /usr/local/etc/openvswitch/conf.db
      cd $OVS_DIR
      ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
        ./vswitchd/vswitch.ovsschema
    Start ovsdb-server:
      cd $OVS_DIR
      ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
          --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
          --private-key=db:Open_vSwitch,SSL,private_key \
          --certificate=db:Open_vSwitch,SSL,certificate \
          --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
    First time after db creation, initialize:
      cd $OVS_DIR
      ./utilities/ovs-vsctl --no-wait init

Start vswitchd:
DPDK configuration arguments can be passed to vswitchd via the `--dpdk`
argument. The DPDK argument -c is ignored by ovs-dpdk, but it is a required
parameter for DPDK initialization.

   e.g.
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach

If more than 1 GB of huge pages is allocated, set the amount to use and draw
it from NUMA node 0 memory:

   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
      -- unix:$DB_SOCK --pidfile --detach

To use ovs-vswitchd with DPDK, create a bridge with datapath_type
"netdev" in the configuration database. For example:

    ovs-vsctl add-br br0
    ovs-vsctl set bridge br0 datapath_type=netdev

Now you can add DPDK devices. OVS expects DPDK device names to start with
"dpdk" and end with a port ID. vswitchd should print the number of DPDK
devices found.

    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk

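The naming rule above (a fixed prefix followed by a numeric port ID) can be
checked before calling ovs-vsctl. The following is a minimal sketch, not the
parser OVS itself uses: the function name dpdk_port_id is hypothetical. It
also accepts the "dpdkr" prefix that this document describes for DPDK rings.

```shell
#!/bin/sh
# Hypothetical helper: print the port ID encoded in a device name of the
# form dpdk<N> or dpdkr<N>, or return non-zero if the name does not match.
dpdk_port_id() {
    case "$1" in
        dpdkr[0-9]*) _id=${1#dpdkr} ;;   # ring device, e.g. dpdkr0
        dpdk[0-9]*)  _id=${1#dpdk} ;;    # physical device, e.g. dpdk0
        *) return 1 ;;
    esac
    # Reject names with anything other than digits after the prefix.
    case "$_id" in
        *[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$_id"
}

for name in dpdk0 dpdk1 eth1; do
    if port=$(dpdk_port_id "$name"); then
        echo "$name: port id $port"
    else
        echo "$name: not a DPDK device name"
    fi
done
```
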
Once the first DPDK port is added to vswitchd, it creates a polling thread
that polls the DPDK device in a continuous loop. CPU utilization for that
thread is therefore always 100%.

Test flow script across NICs (assuming OVS in /usr/src/ovs):
  Assume 1.1.1.1 on NIC port 1 (dpdk0)
  Assume 1.1.1.2 on NIC port 2 (dpdk1)
  Execute script:

############################# Script:

#! /bin/sh

# Move to command directory

cd /usr/src/ovs/utilities/

# Clear current flows
./ovs-ofctl del-flows br0

# Add flows between port 1 (dpdk0) and port 2 (dpdk1)
./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,\
nw_dst=1.1.1.2,idle_timeout=0,action=output:2
./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=1.1.1.2,\
nw_dst=1.1.1.1,idle_timeout=0,action=output:1

######################################

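The two add-flow commands in the script above are mirror images of each
other. The following is a minimal sketch that generates such a pair for any
two ports and addresses. The function name print_pair_flows is hypothetical,
and the sketch only prints the commands so that it can be run without a
bridge present.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the pair of ovs-ofctl add-flow commands
# that forward IPv4 traffic in both directions between two ports.
print_pair_flows() {
    br=$1; port_a=$2; ip_a=$3; port_b=$4; ip_b=$5
    fmt='./ovs-ofctl add-flow %s in_port=%s,dl_type=0x800,nw_src=%s,nw_dst=%s,idle_timeout=0,action=output:%s\n'
    # Direction A -> B
    printf "$fmt" "$br" "$port_a" "$ip_a" "$ip_b" "$port_b"
    # Direction B -> A
    printf "$fmt" "$br" "$port_b" "$ip_b" "$ip_a" "$port_a"
}

# Same ports and addresses as the script above.
print_pair_flows br0 1 1.1.1.1 2 1.1.1.2
```
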
Ideally, for maximum throughput, the 100% task should not be scheduled out,
which temporarily halts the process. The following affinitization methods
will help.

At this time all ovs-vswitchd tasks end up being affinitized to CPU core 0,
but this may change. Let's pick a target core for the 100% task to run on,
e.g. core 7. Also assume a dual 8-core Sandy Bridge system with
hyperthreading enabled. (A different CPU configuration will have different
core mask requirements.)

To give the 100% task better ownership of its core, isolation may be useful.
To the kernel boot line, add a core isolation list for core 7 and its
associated hyperthread core 23:
  e.g. isolcpus=7,23
Reboot the system for isolation to take effect, then restart everything.

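Which logical CPU is the hyperthread sibling of a given core depends on the
machine. The following is a minimal sketch that looks it up through the Linux
sysfs file thread_siblings_list. The function name thread_siblings and its
optional second argument (an alternate sysfs root, useful for testing) are
assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the logical CPUs that share a physical core
# with the given CPU, as reported by the kernel's CPU topology files.
thread_siblings() {
    cpu=$1
    root=${2:-/sys/devices/system/cpu}
    cat "$root/cpu$cpu/topology/thread_siblings_list"
}

# Query CPU 0 on the live system when the topology file is available.
if [ -r /sys/devices/system/cpu/cpu0/topology/thread_siblings_list ]; then
    thread_siblings 0
fi
```

On the system assumed in the text, calling thread_siblings 7 would presumably
list cores 7 and 23, which are the values to pass to isolcpus.
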
List the threads (and their PIDs) of ovs-vswitchd:
  top -p `pidof ovs-vswitchd` -H -d1

Look for the pmd* thread that is polling DPDK devices; this will be the 100%
CPU-bound task. Using this thread's PID, affinitize it to core 7 (mask 0x080),
for example PID 1762:

taskset -p 080 1762
  pid 1762's current affinity mask: 1
  pid 1762's new affinity mask: 80

Assume that all other ovs-vswitchd threads are to run on the other socket 0
cores. Affinitize the rest of the ovs-vswitchd thread IDs to 0x0FF007F:

taskset -p 0x0FF007F {thread pid, e.g. 1738}
  pid 1738's current affinity mask: 1
  pid 1738's new affinity mask: ff007f
. . .

Core 23 is left idle, which allows core 7 to run at full rate.

Future changes may change the need for CPU core affinitization.

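The hexadecimal masks passed to taskset above have one bit set per allowed
core (bit N for core N). The following is a minimal sketch that builds such a
mask from a list of core numbers; the function name cores_to_mask is
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of core numbers into the hexadecimal
# affinity mask that taskset expects (bit N set means core N is allowed).
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}

# Core 7 alone, as used for the pmd thread.
cores_to_mask 7
# Cores 0-6 and 16-23, the set encoded by the 0x0FF007F mask.
cores_to_mask 0 1 2 3 4 5 6 16 17 18 19 20 21 22 23
```

The two calls print 0x80 and 0xff007f, matching the masks in the taskset
examples. On a machine with a different core layout, the core lists would
need to be adjusted accordingly.
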
DPDK Rings:
-----------

Following the steps above to create a bridge, you can now add DPDK rings
as ports to the vswitch. OVS will expect the DPDK ring device name to
start with dpdkr and end with a port ID.

    ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr

DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings. This is from the base DPDK directory and modified to work
with the ring naming used within OVS.

location tests/ovs_client

To run the client:

    ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"

In the case of the dpdkr example above, the "port id you gave dpdkr" is 0.

It is essential to have --proc-type=secondary.

The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit ring of
the ethernet ring. It is a trivial loopback application.

In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu. You can download the
patch and getting started guide at:

https://01.org/packet-processing/downloads

A general rule of thumb for better performance is that the client
application should not be assigned the same DPDK core mask "-c" as
the vswitchd.

Restrictions:
-------------

  - This support is for physical NICs. I have tested with Intel NICs only.
  - The vswitchd userspace datapath does affine the polling thread, but it is
    assumed that devices are on NUMA node 0. Therefore, if a device is
    attached to a non-zero NUMA node, switching performance will be
    suboptimal.
  - There is a fixed number of polling threads and a fixed number of
    per-device queues configured.
  - Works with 1500 MTU only; a few changes in the DPDK library are needed
    to fix this issue.
  - Currently a DPDK port does not make use of any offload functionality.
  ivshmem
  - The shared memory is currently restricted to the use of 1 GB
    huge pages.
  - All huge pages are shared amongst the host, clients, virtual
    machines, etc.

Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.