Commit | Line | Data |
---|---|---|
542cc9bb TG |
1 | Using Open vSwitch with DPDK |
2 | ============================ | |
3 | ||
4 | Open vSwitch can use Intel(R) DPDK lib to operate entirely in | |
5 | userspace. This file explains how to install and use Open vSwitch in | |
6 | such a mode. | |
7 | ||
8 | The DPDK support of Open vSwitch is considered experimental. | |
9 | It has not been thoroughly tested. | |
10 | ||
11 | This version of Open vSwitch should be built manually with `configure` | |
12 | and `make`. | |
13 | ||
14 | OVS needs a system with 1GB hugepages support. | |
15 | ||
16 | Building and Installing: | |
17 | ------------------------ | |
18 | ||
19 | Required: DPDK 1.7 | |
20 | ||
21 | 1. Configure, build & install DPDK: | |
22 | 1. Set `$DPDK_DIR` | |
23 | ||
24 | ``` | |
25 | export DPDK_DIR=/usr/src/dpdk-1.7.1 | |
26 | cd $DPDK_DIR | |
27 | ``` | |
28 | ||
29 | 2. Update `config/common_linuxapp` so that DPDK generates a single library file. | |
30 | (This modification is also required for the IVSHMEM build.) | |
31 | ||
32 | `CONFIG_RTE_BUILD_COMBINE_LIBS=y` | |
33 | ||
34 | Then run `make install` to build and install the library. | |
35 | For a default install without IVSHMEM: | |
36 | ||
37 | `make install T=x86_64-native-linuxapp-gcc` | |
38 | ||
39 | To include IVSHMEM (shared memory): | |
40 | ||
41 | `make install T=x86_64-ivshmem-linuxapp-gcc` | |
42 | ||
43 | For further details refer to http://dpdk.org/ | |
44 | ||
45 | 2. Configure & build the Linux kernel: | |
46 | ||
47 | Refer to intel-dpdk-getting-started-guide.pdf to understand the | |
48 | DPDK kernel requirements. | |
49 | ||
50 | 3. Configure & build OVS: | |
51 | ||
52 | * Non IVSHMEM: | |
53 | ||
54 | `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/` | |
55 | ||
56 | * IVSHMEM: | |
57 | ||
58 | `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/` | |
59 | ||
60 | ``` | |
61 | cd $OVS_DIR/openvswitch | |
62 | ./boot.sh | |
63 | ./configure --with-dpdk=$DPDK_BUILD | |
64 | make | |
65 | ``` | |
66 | ||
67 | For better performance, you can enable aggressive compiler optimizations and | |
68 | use special instructions (popcnt, crc32) that may not be available on all | |
69 | machines. Instead of typing `make`, type: | |
70 | ||
71 | `make CFLAGS='-O3 -march=native'` | |
72 | ||
9feb1017 | 73 | Refer to [INSTALL.userspace.md] for general requirements of building userspace OVS. |
542cc9bb TG |
74 | |
75 | Using the DPDK with ovs-vswitchd: | |
76 | --------------------------------- | |
77 | ||
78 | 1. Set up system boot: | |
79 | Add the following options to the kernel bootline: | |
80 | ||
81 | `default_hugepagesz=1GB hugepagesz=1G hugepages=1` | |
82 | ||
83 | 2. Set up DPDK devices: | |
491c2ea3 MG |
84 | |
85 | DPDK devices can be set up using either the VFIO (for DPDK 1.7+) or UIO | |
86 | modules. UIO requires inserting an out-of-tree driver, igb_uio.ko, that is | |
87 | available in DPDK. Setup for both methods is described below. | |
88 | ||
89 | * UIO: | |
90 | 1. Insert uio.ko: `modprobe uio` | |
91 | 2. Insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko` | |
92 | 3. Bind network device to igb_uio: | |
93 | `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1` | |
94 | ||
95 | * VFIO: | |
96 | ||
97 | VFIO needs to be supported in the kernel and the BIOS. More information | |
98 | can be found in the [DPDK Linux GSG]. | |
99 | ||
100 | 1. Insert vfio-pci.ko: `modprobe vfio-pci` | |
101 | 2. Set correct permissions on vfio device: `sudo /usr/bin/chmod a+x /dev/vfio` | |
102 | and: `sudo /usr/bin/chmod 0666 /dev/vfio/*` | |
103 | 3. Bind network device to vfio-pci: | |
104 | `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1` | |
542cc9bb TG |
105 | |
106 | 3. Mount the hugetlbfs filesystem: | |
107 | ||
108 | `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages` | |
109 | ||
110 | Refer to http://www.dpdk.org/doc/quick-start to verify the DPDK setup. | |
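
Before starting the daemons, it can help to confirm that the hugepages requested on the kernel bootline were actually reserved. The following is a minimal sketch, not part of OVS or DPDK: `hugepage_summary` is a hypothetical helper name, and it assumes the standard `HugePages_Total`, `HugePages_Free` and `Hugepagesize` fields of `/proc/meminfo`.

```shell
#!/bin/sh
# hugepage_summary [FILE]
# Hypothetical helper: summarizes the hugepage counters found in a
# meminfo-style file (defaults to /proc/meminfo).
hugepage_summary() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "HugePages_Total:" { total = $2 }
        $1 == "HugePages_Free:"  { free = $2 }
        $1 == "Hugepagesize:"    { size_kb = $2 }
        END { printf "total=%d free=%d size_kB=%d\n", total, free, size_kb }
    ' "$meminfo"
}
```

Running `hugepage_summary` with no argument reads `/proc/meminfo`. With the bootline options above, one would expect a total of at least 1 and a page size of 1048576 kB.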
111 | ||
a52b0492 GS |
112 | 4. Follow the instructions in [INSTALL.md] to install only the |
113 | userspace daemons and utilities (via 'make install'). | |
542cc9bb TG |
114 | 1. First time only db creation (or clearing): |
115 | ||
a52b0492 GS |
116 | ``` |
117 | mkdir -p /usr/local/etc/openvswitch | |
118 | mkdir -p /usr/local/var/run/openvswitch | |
119 | rm -f /usr/local/etc/openvswitch/conf.db | |
120 | ovsdb-tool create /usr/local/etc/openvswitch/conf.db \ | |
121 | /usr/local/share/openvswitch/vswitch.ovsschema | |
122 | ``` | |
542cc9bb | 123 | |
a52b0492 | 124 | 2. Start ovsdb-server |
542cc9bb | 125 | |
a52b0492 GS |
126 | ``` |
127 | ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ | |
542cc9bb TG |
128 | --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ |
129 | --private-key=db:Open_vSwitch,SSL,private_key \ | |
130 | --certificate=db:Open_vSwitch,SSL,certificate \ | |
131 | --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach | |
a52b0492 | 132 | ``` |
542cc9bb TG |
133 | |
134 | 3. First time after db creation, initialize: | |
135 | ||
a52b0492 GS |
136 | ``` |
137 | ovs-vsctl --no-wait init | |
138 | ``` | |
542cc9bb TG |
139 | |
140 | 5. Start vswitchd: | |
141 | ||
142 | DPDK configuration arguments can be passed to vswitchd via the `--dpdk` | |
143 | argument, which must be the first argument passed to the vswitchd process. | |
144 | The DPDK `-c` argument is ignored by ovs-dpdk, but it is a required | |
145 | parameter for DPDK initialization. | |
146 | ||
a52b0492 GS |
147 | ``` |
148 | export DB_SOCK=/usr/local/var/run/openvswitch/db.sock | |
149 | ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach | |
150 | ``` | |
542cc9bb | 151 | |
a52b0492 GS |
152 | If you allocated more than one 1GB hugepage (as for IVSHMEM), set the amount |
153 | of memory with `--socket-mem` and use NUMA node 0 memory: | |
542cc9bb | 154 | |
a52b0492 GS |
155 | ``` |
156 | ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \ | |
157 | -- unix:$DB_SOCK --pidfile --detach | |
158 | ``` | |
542cc9bb TG |
159 | |
160 | 6. Add bridge & ports | |
161 | ||
162 | To use ovs-vswitchd with DPDK, create a bridge with datapath_type | |
163 | "netdev" in the configuration database. For example: | |
164 | ||
a52b0492 | 165 | `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev` |
542cc9bb TG |
166 | |
167 | Now you can add DPDK devices. OVS expects DPDK device names to start with dpdk | |
a52b0492 GS |
168 | and end with a portid. vswitchd should print (in the log file) the number |
169 | of DPDK devices found. | |
542cc9bb | 170 | |
a52b0492 GS |
171 | ``` |
172 | ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk | |
173 | ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk | |
174 | ``` | |
542cc9bb | 175 | |
a52b0492 GS |
176 | Once the first DPDK port is added, vswitchd creates a polling thread that |
177 | polls the DPDK device in a continuous loop. Therefore CPU utilization | |
178 | for that thread is always 100%. | |
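
When several DPDK ports are needed, the `dpdk<portid>` naming rule makes the commands easy to generate. The sketch below only prints the `ovs-vsctl` commands for review and does not run them; `print_add_port_cmds` is a hypothetical helper name, and it assumes port ids are numbered consecutively from 0.

```shell
#!/bin/sh
# print_add_port_cmds BRIDGE COUNT
# Hypothetical helper: prints one add-port command per DPDK port,
# assuming port ids 0 .. COUNT-1.
print_add_port_cmds() {
    bridge=$1
    count=$2
    portid=0
    while [ "$portid" -lt "$count" ]; do
        echo "ovs-vsctl add-port $bridge dpdk$portid -- set Interface dpdk$portid type=dpdk"
        portid=$(( portid + 1 ))
    done
}

# Prints the two commands for dpdk0 and dpdk1 on br0
print_add_port_cmds br0 2
```

Printing instead of executing keeps the sketch safe to try on a machine without a running vswitchd; the output can be piped to `sh` once it looks right.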
542cc9bb TG |
179 | |
180 | 7. Add test flows | |
181 | ||
182 | Test flow script across NICs (assuming OVS is in /usr/src/ovs). | |
183 | Execute the script: | |
184 | ||
185 | ``` | |
186 | #! /bin/sh | |
187 | # Move to command directory | |
188 | cd /usr/src/ovs/utilities/ | |
189 | ||
190 | # Clear current flows | |
191 | ./ovs-ofctl del-flows br0 | |
192 | ||
193 | # Add flows between port 1 (dpdk0) and port 2 (dpdk1) | |
194 | ./ovs-ofctl add-flow br0 in_port=1,action=output:2 | |
195 | ./ovs-ofctl add-flow br0 in_port=2,action=output:1 | |
196 | ``` | |
197 | ||
198 | 8. Performance tuning | |
199 | ||
200 | With pmd multi-threading support, OVS creates one pmd thread for each | |
201 | numa node by default. The pmd thread handles the I/O of all DPDK | |
202 | interfaces on the same numa node. The following two commands can be used | |
203 | to configure the multi-threading behavior. | |
204 | ||
a52b0492 | 205 | `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>` |
542cc9bb | 206 | |
a52b0492 GS |
207 | The command above asks for a CPU mask for setting the affinity of pmd |
208 | threads. A set bit in the mask means a pmd thread is created and pinned | |
209 | to the corresponding CPU core. For more information, please refer to | |
542cc9bb TG |
210 | `man ovs-vswitchd.conf.db` |
211 | ||
a52b0492 | 212 | `ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=<integer>` |
542cc9bb TG |
213 | |
214 | The command above sets the number of rx queues of each DPDK interface. The | |
215 | rx queues are assigned to pmd threads on the same numa node in round-robin | |
216 | fashion. For more information, please refer to `man ovs-vswitchd.conf.db` | |
217 | ||
218 | For maximum throughput, the pmd thread should ideally never be scheduled | |
219 | out, since that temporarily halts its execution. The following | |
220 | affinitization methods can help. | |
221 | ||
222 | Let's pick cores 4,6,8,10 for pmd threads to run on. Also assume a dual 8-core | |
223 | Sandy Bridge system with hyperthreading enabled, where CPU1 has cores 0,...,7 | |
224 | and 16,...,23 & CPU2 has cores 8,...,15 & 24,...,31. (A different CPU | |
225 | configuration could have different core mask requirements.) | |
226 | ||
227 | Add a core isolation list to the kernel bootline for the cores and their | |
228 | associated hyperthread cores (e.g. isolcpus=4,20,6,22,8,24,10,26). Reboot the | |
229 | system for the isolation to take effect, then restart everything. | |
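
The isolation list in the example pairs each chosen core with its hyperthread sibling. A minimal sketch of that pairing follows, assuming (as in the example topology only) that the sibling of core N is core N+16. `isolcpus_list` is a hypothetical helper name; on a real system the sibling ids can be read from `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` instead of assuming an offset.

```shell
#!/bin/sh
# isolcpus_list OFFSET CORE...
# Hypothetical helper: prints each core followed by its assumed
# hyperthread sibling (core + OFFSET) as an isolcpus= boot option.
isolcpus_list() {
    offset=$1
    shift
    list=""
    for core in "$@"; do
        # Append "core,sibling", comma-separated from earlier entries
        list="${list:+$list,}$core,$(( core + offset ))"
    done
    printf 'isolcpus=%s\n' "$list"
}

# Prints isolcpus=4,20,6,22,8,24,10,26
isolcpus_list 16 4 6 8 10
```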
230 | ||
231 | Configure pmd threads on cores 4,6,8,10 using 'pmd-cpu-mask': | |
232 | ||
a52b0492 | 233 | `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=00000550` |
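
The mask value is a bitmap in which bit N selects core N, so cores 4,6,8,10 give 0x10 + 0x40 + 0x100 + 0x400 = 0x550. A small sketch for deriving the hex string from a core list is shown here; `cores_to_mask` is a hypothetical helper name, not an OVS utility.

```shell
#!/bin/sh
# cores_to_mask CORE...
# Hypothetical helper: sets bit N for every core id N given and
# prints the resulting mask in hex (without leading zeros).
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

# Prints 550, matching the 00000550 mask used above
cores_to_mask 4 6 8 10
```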
542cc9bb TG |
234 | |
235 | You should be able to check that pmd threads are pinned to the correct cores | |
236 | via: | |
237 | ||
a52b0492 GS |
238 | ``` |
239 | top -p `pidof ovs-vswitchd` -H -d1 | |
240 | ``` | |
542cc9bb TG |
241 | |
242 | Note, the pmd threads on a numa node are only created if there is at least | |
243 | one DPDK interface from the numa node that has been added to OVS. | |
244 | ||
245 | Note, core 0 is always reserved for non-pmd threads and should never be set | |
246 | in the cpu mask. | |
247 | ||
248 | DPDK Rings : | |
249 | ------------ | |
250 | ||
251 | Following the steps above to create a bridge, you can now add dpdk rings | |
252 | as a port to the vswitch. OVS will expect the DPDK ring device name to | |
253 | start with dpdkr and end with a portid. | |
254 | ||
a52b0492 | 255 | `ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr` |
542cc9bb TG |
256 | |
257 | DPDK rings client test application | |
258 | ||
259 | Included in the test directory is a sample DPDK application for testing | |
260 | the rings. This is from the base dpdk directory and modified to work | |
261 | with the ring naming used within ovs. | |
262 | ||
263 | Location: tests/ovs_client | |
264 | ||
265 | To run the client : | |
266 | ||
a52b0492 GS |
267 | ``` |
268 | cd /usr/src/ovs/tests/ | |
269 | ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr" | |
270 | ``` | |
542cc9bb TG |
271 | |
272 | In the case of the dpdkr example above the "port id you gave dpdkr" is 0. | |
273 | ||
274 | It is essential to have --proc-type=secondary | |
275 | ||
276 | The application simply receives an mbuf on the receive queue of the | |
277 | ethernet ring and then places that same mbuf on the transmit ring of | |
278 | the ethernet ring. It is a trivial loopback application. | |
279 | ||
280 | DPDK rings in VM (IVSHMEM shared memory communications) | |
281 | ------------------------------------------------------- | |
282 | ||
283 | In addition to executing the client in the host, you can execute it within | |
284 | a guest VM. To do so you will need a patched qemu. You can download the | |
285 | patch and getting started guide at : | |
286 | ||
287 | https://01.org/packet-processing/downloads | |
288 | ||
289 | A general rule of thumb for better performance is that the client | |
290 | application should not be assigned the same dpdk core mask "-c" as | |
291 | the vswitchd. | |
292 | ||
293 | Restrictions: | |
294 | ------------- | |
295 | ||
296 | - This support is for physical NICs. It has been tested with Intel NICs only. | |
297 | - Works only with a 1500 MTU; a few changes in the DPDK lib are needed to fix this issue. | |
298 | - Currently the DPDK port does not make use of any offload functionality. | |
299 | ||
300 | ivshmem: | |
301 | - The shared memory is currently restricted to the use of 1GB | |
302 | huge pages. | |
303 | - All huge pages are shared amongst the host, clients, virtual | |
304 | machines etc. | |
305 | ||
306 | Bug Reporting: | |
307 | -------------- | |
308 | ||
309 | Please report problems to bugs@openvswitch.org. | |
9feb1017 TG |
310 | |
311 | [INSTALL.userspace.md]:INSTALL.userspace.md | |
312 | [INSTALL.md]:INSTALL.md | |
491c2ea3 | 313 | [DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules |