Using Open vSwitch with DPDK
============================

Open vSwitch can use the Intel(R) DPDK library to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.

The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.

This version of Open vSwitch should be built manually with "configure"
and "make".

Building and Installing:
------------------------

DPDK 1.6 is recommended.

DPDK:
  Set the directory, e.g.:
    export DPDK_DIR=/usr/src/dpdk-1.6.0r2
    cd $DPDK_DIR
  Update config/defconfig_x86_64-default-linuxapp-gcc so that DPDK
  generates a single library file:
    CONFIG_RTE_BUILD_COMBINE_LIBS=y

  make install T=x86_64-default-linuxapp-gcc
  For details refer to http://dpdk.org/

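The defconfig change can also be scripted. This is a minimal sketch, not part of DPDK or OVS: the helper name set_config_option is hypothetical, and it assumes GNU sed (for -i). It rewrites the key if the file already sets it and appends it otherwise. If the option is only defined in a file that the defconfig includes, whether an appended line takes effect depends on the DPDK build honoring the last assignment of a key, so check the generated build configuration afterwards.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a DPDK-style config file.
# If the key is already present (optionally commented out with '#'),
# that line is rewritten; otherwise KEY=VALUE is appended.
set_config_option() {
    file=$1 key=$2 value=$3
    if grep -q "^#\{0,1\}${key}=" "$file"; then
        # GNU sed in-place edit of the existing assignment
        sed -i "s|^#\{0,1\}${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (run from $DPDK_DIR):
# set_config_option config/defconfig_x86_64-default-linuxapp-gcc \
#     CONFIG_RTE_BUILD_COMBINE_LIBS y
```
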
Linux kernel:
  Refer to intel-dpdk-getting-started-guide.pdf for the DPDK kernel
  requirements.

OVS:
  cd $OVS_DIR
  ./boot.sh
  export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc
  ./configure --with-dpdk=$DPDK_BUILD
  make

Refer to INSTALL.userspace for the general requirements of building
userspace OVS.

Using the DPDK with ovs-vswitchd:
---------------------------------

Set up system boot:
  Add the following to the kernel boot line:
    default_hugepagesz=1GB hugepagesz=1G hugepages=1

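After rebooting, it is worth confirming that the boot line took effect. A minimal sketch, with has_1g_hugepages as a hypothetical helper name, that checks the kernel command line (/proc/cmdline by default) for a 1 GB hugepagesz setting. It only inspects the command line; /proc/meminfo remains the place to confirm how many pages were actually reserved.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line requests
# 1 GB huge pages. Reads /proc/cmdline unless a string is passed in.
has_1g_hugepages() {
    cmdline=${1:-$(cat /proc/cmdline)}
    # Pad with spaces so only whole "hugepagesz=..." words match
    # (default_hugepagesz=... alone does not count).
    case " $cmdline " in
        *" hugepagesz=1G "*|*" hugepagesz=1GB "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# has_1g_hugepages || echo "1G hugepages not requested on boot line" >&2
```
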
First set up the DPDK devices:
  - insert uio.ko
    e.g. modprobe uio
  - insert igb_uio.ko
    e.g. insmod DPDK/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko
  - Bind the network device to igb_uio.
    e.g. DPDK/tools/pci_unbind.py --bind=igb_uio eth1
    Alternate binding method:
      Find the target Ethernet devices:
        lspci -nn|grep Ethernet
      Bring them down (e.g. eth2, eth3):
        ifconfig eth2 down
        ifconfig eth3 down
      Look at the current devices (e.g. ixgbe devices):
        ls /sys/bus/pci/drivers/ixgbe/
        0000:02:00.0 0000:02:00.1 bind module new_id remove_id uevent unbind
      Unbind the target PCI devices from their current driver (e.g. 02:00.0 ...):
        echo 0000:02:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
        echo 0000:02:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
      Bind them to the target driver (e.g. igb_uio):
        echo 0000:02:00.0 > /sys/bus/pci/drivers/igb_uio/bind
        echo 0000:02:00.1 > /sys/bus/pci/drivers/igb_uio/bind
      Check the binding for the listed devices:
        ls /sys/bus/pci/drivers/igb_uio
        0000:02:00.0 0000:02:00.1 bind module new_id remove_id uevent unbind

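The manual unbind/bind sequence above can be wrapped in a small function. This is a sketch, not a DPDK tool: rebind_pci and the SYSFS override variable are hypothetical names introduced here, and it assumes the device is currently bound to the source driver, i.e. its address appears in that driver's sysfs directory as in the ls output above. Run it as root after bringing the interface down.

```shell
#!/bin/sh
# Hypothetical helper: move a PCI device from one driver to another by
# writing its address to the sysfs unbind/bind files. SYSFS defaults to
# /sys and can point at a scratch directory for a dry run.
SYSFS=${SYSFS:-/sys}

rebind_pci() {
    dev=$1 from=$2 to=$3          # e.g. 0000:02:00.0 ixgbe igb_uio
    drivers=$SYSFS/bus/pci/drivers
    # A bound device shows up as an entry in its driver's directory.
    if [ ! -e "$drivers/$from/$dev" ]; then
        echo "$dev is not bound to $from" >&2
        return 1
    fi
    echo "$dev" > "$drivers/$from/unbind" || return 1
    echo "$dev" > "$drivers/$to/bind"
}

# Example:
# rebind_pci 0000:02:00.0 ixgbe igb_uio
# rebind_pci 0000:02:00.1 ixgbe igb_uio
```
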
Prepare the system:
  - load the OVS kernel module
    e.g. modprobe openvswitch
  - mount hugetlbfs (the mount point must exist)
    e.g. mkdir -p /mnt/huge
         mount -t hugetlbfs -o pagesize=1G none /mnt/huge/

Refer to http://www.dpdk.org/doc/quick-start for verifying the DPDK setup.

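To avoid mounting hugetlbfs twice when the preparation steps are rerun, the mount can be made conditional. A sketch, assuming the Linux /proc/mounts format (device, mount point, type, options, ...); is_hugetlbfs_mounted and the MOUNTS override are hypothetical names. Note that /proc/mounts lists the mount point without a trailing slash, so pass /mnt/huge rather than /mnt/huge/.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a hugetlbfs file system is mounted at
# the given directory. MOUNTS defaults to /proc/mounts.
is_hugetlbfs_mounted() {
    dir=$1
    # Field 2 is the mount point, field 3 the file system type.
    awk -v dir="$dir" '$2 == dir && $3 == "hugetlbfs" { found = 1 }
        END { exit !found }' "${MOUNTS:-/proc/mounts}"
}

# Example: mount only if needed.
# mkdir -p /mnt/huge
# is_hugetlbfs_mounted /mnt/huge ||
#     mount -t hugetlbfs -o pagesize=1G none /mnt/huge
```
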
Start ovsdb-server as described in the INSTALL document:
  Summary, e.g.:
    First time only: create (or clear) the database:
      mkdir -p /usr/local/etc/openvswitch
      mkdir -p /usr/local/var/run/openvswitch
      rm -f /usr/local/etc/openvswitch/conf.db
      cd $OVS_DIR
      ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
          ./vswitchd/vswitch.ovsschema
    Start ovsdb-server:
      cd $OVS_DIR
      ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
          --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
          --private-key=db:Open_vSwitch,SSL,private_key \
          --certificate=db:Open_vSwitch,SSL,certificate \
          --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
    First time after creating the database, initialize it:
      cd $OVS_DIR
      ./utilities/ovs-vsctl --no-wait init

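The first-time database creation can be guarded so that rerunning it neither fails nor wipes an existing database. A sketch using the paths from the example above; ensure_ovs_db and the OVSDB_TOOL override are hypothetical names. Unlike the summary above, it deliberately does not delete an existing conf.db; remove the file by hand when the database should be cleared.

```shell
#!/bin/sh
# Hypothetical helper: create the OVS configuration database only if it
# does not exist yet. OVSDB_TOOL defaults to the in-tree ovsdb-tool.
OVS_DIR=${OVS_DIR:-.}
OVSDB_TOOL=${OVSDB_TOOL:-$OVS_DIR/ovsdb/ovsdb-tool}

ensure_ovs_db() {
    db=${1:-/usr/local/etc/openvswitch/conf.db}
    schema=${2:-$OVS_DIR/vswitchd/vswitch.ovsschema}
    if [ -e "$db" ]; then
        echo "$db already exists, leaving it untouched"
        return 0
    fi
    mkdir -p "$(dirname "$db")" || return 1
    "$OVSDB_TOOL" create "$db" "$schema"
}

# Example (from $OVS_DIR):
# ensure_ovs_db && ./utilities/ovs-vsctl --no-wait init
```

The ovs-vsctl init step still needs a running ovsdb-server, as in the sequence above.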
Start vswitchd:
  DPDK configuration arguments can be passed to vswitchd via the `--dpdk`
  argument. The DPDK argument -c is ignored by ovs-dpdk, but it is a
  required parameter for DPDK initialization.

  e.g.
    export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
    ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach

  If more than 1 GB of huge pages is allocated, set the amount and use
  NUMA node 0 memory:

    ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
        -- unix:$DB_SOCK --pidfile --detach

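To choose a --socket-mem value, the reserved huge page memory can be read from /proc/meminfo. A sketch, with hugepage_mb and MEMINFO as hypothetical names, that multiplies HugePages_Total by Hugepagesize. The total covers all NUMA nodes, so on a multi-socket system the pages actually available on node 0 may be fewer.

```shell
#!/bin/sh
# Hypothetical helper: print the total reserved huge page memory in MB,
# computed as HugePages_Total x Hugepagesize (kB) / 1024.
hugepage_mb() {
    awk '/^HugePages_Total:/ { n = $2 }
         /^Hugepagesize:/    { kb = $2 }
         END { print n * kb / 1024 }' "${MEMINFO:-/proc/meminfo}"
}

# Example: give all reserved huge page memory to NUMA node 0.
# ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem "$(hugepage_mb),0" \
#     -- unix:$DB_SOCK --pidfile --detach
```
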
To use ovs-vswitchd with DPDK, create a bridge with datapath_type
"netdev" in the configuration database. For example:

    ovs-vsctl add-br br0
    ovs-vsctl set bridge br0 datapath_type=netdev

Now you can add DPDK devices. OVS expects DPDK device names to start with
"dpdk" and end with a port id. vswitchd should print the number of DPDK
devices found.

    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk

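When several physical ports are bound, the add-port commands can be generated in a loop that follows the dpdk<portid> naming. A sketch; add_dpdk_ports and the OVS_VSCTL override are hypothetical, and it assumes the ports are numbered consecutively from 0. Setting OVS_VSCTL=echo turns it into a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: add COUNT physical DPDK ports, dpdk0 .. dpdk(COUNT-1),
# to BRIDGE. OVS_VSCTL defaults to ovs-vsctl (set to "echo" for a dry run).
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}

add_dpdk_ports() {
    bridge=$1 count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        $OVS_VSCTL add-port "$bridge" "dpdk$i" -- set Interface "dpdk$i" type=dpdk || return 1
        i=$((i + 1))
    done
}

# Example: the two add-port commands above.
# add_dpdk_ports br0 2
```
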
Once the first DPDK port is added to vswitchd, it creates a polling thread
that polls the DPDK devices in a continuous loop. Therefore CPU utilization
for that thread is always 100%.

Test flow script across NICs (assuming OVS is in /usr/src/ovs):
  Assume 1.1.1.1 on NIC port 1 (dpdk0)
  Assume 1.1.1.2 on NIC port 2 (dpdk1)
  Execute the script:

############################# Script:

#! /bin/sh

# Move to the command directory
cd /usr/src/ovs/utilities/ || exit 1

# Clear current flows
./ovs-ofctl del-flows br0

# Add flows between port 1 (dpdk0) and port 2 (dpdk1).
# Each flow is quoted so that ovs-ofctl receives it as a single argument.
./ovs-ofctl add-flow br0 \
    "in_port=1,dl_type=0x800,nw_src=1.1.1.1,nw_dst=1.1.1.2,idle_timeout=0,actions=output:2"
./ovs-ofctl add-flow br0 \
    "in_port=2,dl_type=0x800,nw_src=1.1.1.2,nw_dst=1.1.1.1,idle_timeout=0,actions=output:1"

######################################

Ideally, for maximum throughput, the 100% task should never be scheduled
out, since that temporarily halts it. The following affinitization methods
help.

At this time all ovs-vswitchd tasks end up being affinitized to CPU core 0,
but this may change. Let's pick a target core for the 100% task to run on,
e.g. core 7. Also assume a dual 8-core Sandy Bridge system with
hyperthreading enabled, where the hyperthread sibling of core 7 is core 23.
(A different CPU configuration will have different core mask requirements.)

To give the 100% task better ownership of its core, isolation may be
useful. Add a core isolation list for core 7 and its hyperthread sibling
core 23 to the kernel boot line:
  e.g. isolcpus=7,23
Reboot the system for isolation to take effect, then restart everything.

List the threads (and their ids) of ovs-vswitchd:
  top -p `pidof ovs-vswitchd` -H -d1

Look for the pmd* thread that is polling the DPDK devices; this will be the
100% CPU-bound task. Using this thread id, affinitize it to core 7
(mask 0x080), e.g. thread id 1762:

  taskset -p 080 1762
  pid 1762's current affinity mask: 1
  pid 1762's new affinity mask: 80

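Rather than reading the thread id off top, the polling threads can be found by name under /proc. A sketch, assuming the Linux /proc/<pid>/task/<tid>/comm files and that the polling threads' names start with "pmd" as described above; pmd_tids and the PROC override are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print the ids of the threads of process PID whose
# name (comm) starts with "pmd". PROC defaults to /proc.
pmd_tids() {
    pid=$1
    for comm in "${PROC:-/proc}/$pid"/task/*/comm; do
        [ -r "$comm" ] || continue      # no match: glob stays literal
        case $(cat "$comm") in
            pmd*)
                tid=${comm%/comm}       # strip trailing /comm
                echo "${tid##*/}"       # keep the last path component
                ;;
        esac
    done
}

# Example: pin every polling thread to core 7 (mask 0x80).
# for tid in $(pmd_tids "$(pidof ovs-vswitchd)"); do taskset -p 080 "$tid"; done
```
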
Assume that all other ovs-vswitchd threads should run on the other socket 0
cores (0-6 and their hyperthread siblings 16-22). Affinitize the rest of the
ovs-vswitchd thread ids to 0x07F007F:

  taskset -p 0x07F007F {thread id, e.g. 1738}
  pid 1738's current affinity mask: 1
  pid 1738's new affinity mask: 7f007f
  . . .

Core 23 is left idle, which allows core 7 to run at full rate.

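Writing masks by hand is error-prone, so a small converter from a CPU list to a taskset mask may help. A sketch, with cpus_to_mask as a hypothetical helper, assuming the example topology above where socket 0 consists of cores 0-7 and their hyperthread siblings 16-23; the socket 0 CPUs other than 7 and 23 are then 0-6,16-22.

```shell
#!/bin/sh
# Hypothetical helper: convert a CPU list such as "7" or "0-6,16-22"
# into a hexadecimal affinity mask suitable for taskset.
cpus_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,                               # split the list on commas
    for range in $1; do
        first=${range%-*}               # "16-22" -> 16, "7" -> 7
        last=${range#*-}                # "16-22" -> 22, "7" -> 7
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$((mask | (1 << cpu))) # set the bit for this CPU
            cpu=$((cpu + 1))
        done
    done
    IFS=$old_ifs
    printf '0x%x\n' "$mask"
}
```

For example, cpus_to_mask 7 prints 0x80 and cpus_to_mask 0-6,16-22 prints 0x7f007f.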
Future changes may alter the need for CPU core affinitization.

DPDK Rings:
-----------

Following the steps above to create a bridge, you can now add DPDK rings
as ports to the vswitch. OVS expects DPDK ring device names to start with
"dpdkr" and end with a port id.

    ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr

DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings. It is taken from the base DPDK directory and modified to work
with the ring naming used within OVS.

Location: tests/ovs_client

To run the client:

    ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"

In the case of the dpdkr example above, the "port id you gave dpdkr" is 0.

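The value passed after the client's second "-n" is the numeric suffix of the dpdkr interface name. A sketch of extracting it, with dpdkr_port_id as a hypothetical name; it rejects names that do not follow the dpdkr<portid> convention.

```shell
#!/bin/sh
# Hypothetical helper: print the port id of a DPDK ring interface name
# (dpdkr<N>), or fail if the name does not follow that convention.
dpdkr_port_id() {
    case $1 in
        dpdkr|dpdkr*[!0-9]*) return 1 ;;   # missing or non-numeric id
        dpdkr*) echo "${1#dpdkr}" ;;       # strip the dpdkr prefix
        *) return 1 ;;                     # not a ring name at all
    esac
}

# Example:
# ovsclient -c 1 -n 4 --proc-type=secondary -- -n "$(dpdkr_port_id dpdkr0)"
```
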
It is essential to run the client with --proc-type=secondary.

The application simply receives an mbuf on the receive queue of the
Ethernet ring and then places that same mbuf on the transmit ring of
the same Ethernet ring. It is a trivial loopback application.

In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched QEMU. You can download the
patch and getting started guide at:

https://01.org/packet-processing/downloads

A general rule of thumb for better performance is that the client
application should not be assigned the same DPDK core mask "-c" as
vswitchd.

Restrictions:
-------------

  - This support is for physical NICs. It has been tested with Intel NICs only.
  - The vswitchd userspace datapath does affinitize the polling thread, but
    it assumes that devices are on NUMA node 0. Therefore, if a device is
    attached to a non-zero NUMA node, switching performance will be
    suboptimal.
  - There is a fixed number of polling threads and a fixed number of
    per-device queues configured.
  - Only a 1500 MTU is supported; a few changes in the DPDK library are
    needed to fix this issue.
  - Currently DPDK ports do not make use of any offload functionality.
  ivshmem:
  - The shared memory is currently restricted to the use of 1 GB
    huge pages.
  - All huge pages are shared amongst the host, clients, virtual
    machines, etc.

Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.