Using Open vSwitch with DPDK
============================

Open vSwitch can use the Intel(R) DPDK library to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.

The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.

This version of Open vSwitch should be built manually with `configure`
and `make`.

OVS needs a system with 1GB hugepages support.

Building and Installing:
------------------------

Required: DPDK 2.0
Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
on Debian/Ubuntu)

1. Configure build & install DPDK:
  1. Set `$DPDK_DIR`

     ```
     export DPDK_DIR=/usr/src/dpdk-2.0
     cd $DPDK_DIR
     ```

  2. Update `config/common_linuxapp` so that DPDK generates a single lib
     file (this modification is also required for the IVSHMEM build):

     `CONFIG_RTE_BUILD_COMBINE_LIBS=y`

     Update `config/common_linuxapp` so that DPDK is built with vhost
     libraries:

     `CONFIG_RTE_LIBRTE_VHOST=y`

     Then run `make install` to build and install the library.
     For default install without IVSHMEM:

     `make install T=x86_64-native-linuxapp-gcc`

     To include IVSHMEM (shared memory):

     `make install T=x86_64-ivshmem-linuxapp-gcc`

     For further details refer to http://dpdk.org/

2. Configure & build the Linux kernel:

   Refer to intel-dpdk-getting-started-guide.pdf to understand the DPDK
   kernel requirements.

3. Configure & build OVS:

   * Non-IVSHMEM:

     `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/`

   * IVSHMEM:

     `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/`

   ```
   cd $OVS_DIR/openvswitch
   ./boot.sh
   ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"]
   make
   ```

   Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress
   DPDK cast-align warnings.

   For better performance one can enable aggressive compiler optimizations
   and use the special instructions (popcnt, crc32) that may not be
   available on all machines. Instead of typing `make`, type:

   `make CFLAGS='-O3 -march=native'`

Refer to [INSTALL.userspace.md] for general requirements of building
userspace OVS.

Using the DPDK with ovs-vswitchd:
---------------------------------

1. Set up system boot
   Add the following options to the kernel bootline:

   `default_hugepagesz=1GB hugepagesz=1G hugepages=1`

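   On many distributions these options are applied via the bootloader
   configuration rather than passed by hand. A sketch, assuming a
   Debian-style `/etc/default/grub` (the file location and the update
   command vary by distro):

   ```
   GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1GB hugepagesz=1G hugepages=1"
   ```

   After editing, regenerate the bootloader configuration (e.g. with
   `update-grub`) and reboot.
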
2. Set up DPDK devices:

   DPDK devices can be set up using either the VFIO (for DPDK 1.7+) or UIO
   modules. UIO requires inserting an out-of-tree driver igb_uio.ko that is
   available in DPDK. Setup for both methods is described below.

   * UIO:
     1. Insert uio.ko: `modprobe uio`
     2. Insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko`
     3. Bind network device to igb_uio:
        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`

   * VFIO:

     VFIO needs to be supported in the kernel and the BIOS. More information
     can be found in the [DPDK Linux GSG].

     1. Insert vfio-pci.ko: `modprobe vfio-pci`
     2. Set correct permissions on the vfio device:
        `sudo /usr/bin/chmod a+x /dev/vfio` and:
        `sudo /usr/bin/chmod 0666 /dev/vfio/*`
     3. Bind network device to vfio-pci:
        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`

3. Mount the hugetlbfs filesystem:

   `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`

   Refer to http://www.dpdk.org/doc/quick-start to verify the DPDK setup.
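
   A quick sanity check of the hugepage setup (a sketch assuming a typical
   Linux host; the exact counts and mount points will vary):

   ```
   # Hugepage counters; HugePages_Total should reflect the bootline setting
   grep Huge /proc/meminfo

   # The hugetlbfs mount created above should appear here
   mount | grep hugetlbfs || true
   ```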

4. Follow the instructions in [INSTALL.md] to install only the
   userspace daemons and utilities (via 'make install').
   1. First time only, create (or clear) the database:

      ```
      mkdir -p /usr/local/etc/openvswitch
      mkdir -p /usr/local/var/run/openvswitch
      rm /usr/local/etc/openvswitch/conf.db
      ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
        /usr/local/share/openvswitch/vswitch.ovsschema
      ```

   2. Start ovsdb-server:

      ```
      ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
        --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
        --private-key=db:Open_vSwitch,SSL,private_key \
        --certificate=db:Open_vSwitch,SSL,certificate \
        --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
      ```

   3. First time after db creation, initialize:

      ```
      ovs-vsctl --no-wait init
      ```

5. Start vswitchd:

   DPDK configuration arguments can be passed to vswitchd via the `--dpdk`
   argument. This must be the first argument passed to the vswitchd process.
   The DPDK argument -c is ignored by ovs-dpdk, but it is a required
   parameter for DPDK initialization.

   ```
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
   ```

   If you have allocated more than one GB hugepage (as for IVSHMEM), set
   the amount and use NUMA node 0 memory:

   ```
   ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
     -- unix:$DB_SOCK --pidfile --detach
   ```

6. Add bridge & ports

   To use ovs-vswitchd with DPDK, create a bridge with datapath_type
   "netdev" in the configuration database. For example:

   `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`

   Now you can add DPDK devices. OVS expects DPDK device names to start
   with "dpdk" and end with a port id. vswitchd should print (in the log
   file) the number of dpdk devices found.

   ```
   ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
   ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
   ```

   Once the first DPDK port is added to vswitchd, it creates a polling
   thread that polls the DPDK devices in a continuous loop. Therefore CPU
   utilization for that thread is always 100%.

   Note: creating bonds of DPDK interfaces is slightly different from
   creating bonds of system interfaces. For DPDK, the interface type must
   be explicitly set, for example:

   ```
   ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 type=dpdk -- set Interface dpdk1 type=dpdk
   ```

7. Add test flows

   Test flow script across NICs (assuming ovs in /usr/src/ovs):
   Execute script:

   ```
   #! /bin/sh
   # Move to command directory
   cd /usr/src/ovs/utilities/

   # Clear current flows
   ./ovs-ofctl del-flows br0

   # Add flows between port 1 (dpdk0) to port 2 (dpdk1)
   ./ovs-ofctl add-flow br0 in_port=1,action=output:2
   ./ovs-ofctl add-flow br0 in_port=2,action=output:1
   ```

8. Performance tuning

   With pmd multi-threading support, OVS creates one pmd thread for each
   NUMA node by default. The pmd thread handles the I/O of all DPDK
   interfaces on the same NUMA node. The following two commands can be used
   to configure the multi-threading behavior.

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`

   The command above asks for a CPU mask for setting the affinity of pmd
   threads. A set bit in the mask means a pmd thread is created and pinned
   to the corresponding CPU core. For more information, please refer to
   `man ovs-vswitchd.conf.db`

   `ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=<integer>`

   The command above sets the number of rx queues of each DPDK interface.
   The rx queues are assigned to pmd threads on the same NUMA node in
   round-robin fashion. For more information, please refer to
   `man ovs-vswitchd.conf.db`

   Ideally, for maximum throughput, the pmd thread should not be scheduled
   out, as this temporarily halts its execution. The following
   affinitization methods can help.

   Let's pick cores 4, 6, 8 and 10 for the pmd threads to run on. Also
   assume a dual 8-core Sandy Bridge system with hyperthreading enabled,
   where CPU1 has cores 0-7 and 16-23, and CPU2 has cores 8-15 and 24-31.
   (A different CPU configuration may require a different core mask.)

   To the kernel bootline, add a core isolation list for these cores and
   their hyperthread siblings (e.g. isolcpus=4,20,6,22,8,24,10,26). Reboot
   the system for the isolation to take effect, then restart everything.

   Configure pmd threads on cores 4, 6, 8 and 10 using 'pmd-cpu-mask':

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=00000550`
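
   The mask above can be derived mechanically from the core list with plain
   shell arithmetic. This helper is a convenience sketch, not part of OVS:

   ```
   # Build a pmd-cpu-mask by setting bit <core> for each chosen core
   mask=0
   for core in 4 6 8 10; do
       mask=$(( mask | (1 << core) ))
   done
   printf 'pmd-cpu-mask=%08x\n' "$mask"   # prints pmd-cpu-mask=00000550
   ```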

   You should be able to check that pmd threads are pinned to the correct
   cores via:

   ```
   top -p `pidof ovs-vswitchd` -H -d1
   ```

   Note, the pmd threads on a NUMA node are only created if there is at
   least one DPDK interface from that NUMA node added to OVS.

   To understand where most of the time is spent and whether the caches are
   effective, these commands can be used:

   ```
   ovs-appctl dpif-netdev/pmd-stats-clear # To reset statistics
   ovs-appctl dpif-netdev/pmd-stats-show
   ```

DPDK Rings:
-----------

Following the steps above to create a bridge, you can now add DPDK rings
as ports to the vswitch. OVS will expect the DPDK ring device name to
start with dpdkr and end with a portid.

`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr`

DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings. This is from the base dpdk directory and modified to work
with the ring naming used within ovs.

Location: tests/ovs_client

To run the client:

```
cd /usr/src/ovs/tests/
ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
```

In the case of the dpdkr example above, the "port id you gave dpdkr" is 0.

It is essential to have `--proc-type=secondary`.

The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit ring of
the ethernet ring. It is a trivial loopback application.

DPDK rings in VM (IVSHMEM shared memory communications)
-------------------------------------------------------

In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu. You can download the
patch and getting started guide at:

https://01.org/packet-processing/downloads

A general rule of thumb for better performance is that the client
application should not be assigned the same dpdk core mask "-c" as
the vswitchd.

DPDK vhost:
-----------

DPDK 2.0 supports two types of vhost:

1. vhost-user
2. vhost-cuse

Whichever type of vhost is enabled in the specified DPDK build is the type
that will be enabled in OVS. By default, vhost-user is enabled in DPDK.
Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports
will be enabled in OVS.
Please note that support for vhost-cuse is intended to be deprecated in OVS
in a future release.

DPDK vhost-user:
----------------

The following sections describe the use of vhost-user 'dpdkvhostuser' ports
with OVS.

DPDK vhost-user Prerequisites:
------------------------------

1. DPDK 2.0 with vhost support enabled as documented in the "Building and
   Installing" section

2. QEMU version v2.1.0+

   QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if
   providing your VM with memory greater than 1GB due to potential issues
   with memory mapping larger areas.

Adding DPDK vhost-user ports to the Switch:
-------------------------------------------

Following the steps above to create a bridge, you can now add DPDK
vhost-user as a port to the vswitch. Unlike DPDK ring ports, DPDK
vhost-user ports can have arbitrary names.

- For vhost-user, the name of the port type is `dpdkvhostuser`

  ```
  ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
  type=dpdkvhostuser
  ```

  This action creates a socket located at
  `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
  to your VM on the QEMU command line. More instructions on this can be
  found in the next section "DPDK vhost-user VM configuration".
  Note: If you wish for the vhost-user sockets to be created in a
  directory other than `/usr/local/var/run/openvswitch`, you may specify
  another location on the ovs-vswitchd command line like so:

  `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`

DPDK vhost-user VM configuration:
---------------------------------

Follow the steps below to attach vhost-user port(s) to a VM.

1. Configure sockets.
   Pass the following parameters to QEMU to attach a vhost-user device:

   ```
   -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
   -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
   -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
   ```

   ...where vhost-user-1 is the name of the vhost-user port added
   to the switch.
   Repeat the above parameters for multiple devices, changing the
   chardev path and id as necessary. Note that a separate and different
   chardev path needs to be specified for each vhost-user device. For
   example, if you have a second vhost-user port named 'vhost-user-2',
   append your QEMU command line with an additional set of parameters:

   ```
   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
   ```

2. Configure huge pages.
   QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
   a virtio-net device's virtual rings and packet buffers mapping the VM's
   physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
   memory into their process address space, pass the following parameters
   to QEMU:

   ```
   -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
   share=on
   -numa node,memdev=mem -mem-prealloc
   ```

DPDK vhost-cuse:
----------------

The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports
with OVS.

DPDK vhost-cuse Prerequisites:
------------------------------

1. DPDK 2.0 with vhost support enabled as documented in the "Building and
   Installing" section.
   As an additional step, you must enable vhost-cuse in DPDK by setting the
   following additional flag in `config/common_linuxapp`:

   `CONFIG_RTE_LIBRTE_VHOST_USER=n`

   Following this, rebuild DPDK as per the instructions in the "Building and
   Installing" section. Finally, rebuild OVS as per step 3 in the "Building
   and Installing" section - OVS will detect that DPDK has vhost-cuse
   libraries compiled and in turn will enable support for it in the switch
   and disable vhost-user support.

2. Insert the Cuse module:

   `modprobe cuse`

3. Build and insert the `eventfd_link` module:

   ```
   cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
   make
   insmod $DPDK_DIR/lib/librte_vhost/eventfd_link/eventfd_link.ko
   ```

4. QEMU version v2.1.0+

   vhost-cuse will work with QEMU v2.1.0 and above, however it is
   recommended to use v2.2.0 if providing your VM with memory greater than
   1GB due to potential issues with memory mapping larger areas.
   Note: QEMU v1.6.2 will also work, with slightly different command line
   parameters, which are specified later in this document.

Adding DPDK vhost-cuse ports to the Switch:
-------------------------------------------

Following the steps above to create a bridge, you can now add DPDK
vhost-cuse as a port to the vswitch. Unlike DPDK ring ports, DPDK
vhost-cuse ports can have arbitrary names.

- For vhost-cuse, the name of the port type is `dpdkvhostcuse`

  ```
  ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
  type=dpdkvhostcuse
  ```

  When attaching vhost-cuse ports to QEMU, the name provided during the
  add-port operation must match the ifname parameter on the QEMU command
  line. More instructions on this can be found in the next section.

DPDK vhost-cuse VM configuration:
---------------------------------

vhost-cuse ports use a Linux* character device to communicate with QEMU.
By default it is set to `/dev/vhost-net`. It is possible to reuse this
standard device for DPDK vhost, which makes setup a little simpler, but it
is better practice to specify an alternative character device in order to
avoid any conflicts if kernel vhost is to be used in parallel.

1. This step is only needed if using an alternative character device.

   The new character device filename must be specified on the vswitchd
   commandline:

   `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1 ...`

   Note that the `--cuse_dev_name` argument and associated string must be
   the first arguments after `--dpdk` and come before the EAL arguments. In
   the example above, the character device to be used will be
   `/dev/my-vhost-net`.

2. This step is only needed if reusing the standard character device. It
   will conflict with the kernel vhost character device, so the user must
   first remove it.

   `rm -rf /dev/vhost-net`

3a. Configure virtio-net adaptors:
   The following parameters must be passed to the QEMU binary:

   ```
   -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
   -device virtio-net-pci,netdev=net1,mac=<mac>
   ```

   Repeat the above parameters for multiple devices.

   The DPDK vhost library will negotiate its own features, so they
   need not be passed in as command line params. Note that as offloads are
   disabled this is the equivalent of setting:

   `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`

3b. If using an alternative character device, it must also be explicitly
   passed to QEMU using the `vhostfd` argument:

   ```
   -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
   vhostfd=<open_fd>
   -device virtio-net-pci,netdev=net1,mac=<mac>
   ```

   The open file descriptor must be passed to QEMU running as a child
   process. This could be done with a simple python script.

   ```
   #!/usr/bin/python
   import os
   import subprocess

   fd = os.open("/dev/usvhost", os.O_RDWR)
   subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,\
   vhost=on,vhostfd=" + str(fd) + " ...", shell=True)
   ```

   Alternatively, the `qemu-wrap.py` script can be used to automate the
   requirements specified above and can be used in conjunction with libvirt
   if desired. See the "DPDK vhost VM configuration with QEMU wrapper"
   section below.

4. Configure huge pages:
   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
   virtio-net device's virtual rings and packet buffers mapping the VM's
   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
   memory into their process address space, pass the following parameters
   to QEMU:

   `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
   share=on -numa node,memdev=mem -mem-prealloc`

   Note: For use with an earlier QEMU version such as v1.6.2, use the
   following to configure hugepages instead:

   `-mem-path /dev/hugepages -mem-prealloc`

DPDK vhost-cuse VM configuration with QEMU wrapper:
---------------------------------------------------

The QEMU wrapper script automatically detects and calls QEMU with the
necessary parameters. It performs the following actions:

* Automatically detects the location of the hugetlbfs and inserts this
  into the command line parameters.
* Automatically opens file descriptors for each virtio-net device and
  inserts these into the command line parameters.
* Calls QEMU passing both the command line parameters passed to the
  script itself and those it has auto-detected.

Before use, you **must** edit the configuration parameters section of the
script to point to the correct emulator location and set additional
settings. Of these settings, `emul_path` and `us_vhost_path` **must** be
set. All other settings are optional.

To use directly from the command line simply pass the wrapper some of the
QEMU parameters: it will configure the rest. For example:

```
qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
--enable-kvm -nographic -vnc none -net none -netdev tap,id=net1,
script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci,
netdev=net1,mac=00:00:00:00:00:01
```

DPDK vhost-cuse VM configuration with libvirt:
----------------------------------------------

If you are using libvirt, you must enable libvirt to access the character
device by adding it to the controllers cgroup for libvirtd using the
following steps.

1. In `/etc/libvirt/qemu.conf` add/edit the following lines:

   ```
   1) clear_emulator_capabilities = 0
   2) user = "root"
   3) group = "root"
   4) cgroup_device_acl = [
      "/dev/null", "/dev/full", "/dev/zero",
      "/dev/random", "/dev/urandom",
      "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
      "/dev/rtc", "/dev/hpet", "/dev/net/tun",
      "/dev/<my-vhost-device>",
      "/dev/hugepages"]
   ```

   <my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-net`
   device. If you have specified a different name on the ovs-vswitchd
   commandline using the "--cuse_dev_name" parameter, please specify that
   filename instead.

2. Disable SELinux or set to permissive mode

3. Restart the libvirtd process
   For example, on Fedora:

   `systemctl restart libvirtd.service`

After successfully editing the configuration, you may launch your
vhost-enabled VM. The XML describing the VM can be configured like so
within the <qemu:commandline> section:

1. Set up shared hugepages:

   ```
   <qemu:arg value='-object'/>
   <qemu:arg value='memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on'/>
   <qemu:arg value='-numa'/>
   <qemu:arg value='node,memdev=mem'/>
   <qemu:arg value='-mem-prealloc'/>
   ```

2. Set up your tap devices:

   ```
   <qemu:arg value='-netdev'/>
   <qemu:arg value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on'/>
   <qemu:arg value='-device'/>
   <qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/>
   ```

   Repeat for as many devices as are desired, modifying the id, ifname
   and mac as necessary.

   Again, if you are using an alternative character device (other than
   `/dev/vhost-net`), please specify the file descriptor like so:

   `<qemu:arg value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on,vhostfd=<open_fd>'/>`

   Where <open_fd> refers to the open file descriptor of the character
   device. Instructions on how to retrieve the file descriptor can be found
   in the "DPDK vhost-cuse VM configuration" section.
   Alternatively, the process is automated with the qemu-wrap.py script,
   detailed in the next section.

Now you may launch your VM using virt-manager, or like so:

`virsh create my_vhost_vm.xml`

DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
---------------------------------------------------------------

To use the qemu-wrapper script in conjunction with libvirt, follow the
steps in the previous section before proceeding with the following steps:

1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH)
   Ideally in the same directory that the QEMU binary is located.

2. Ensure that the script has the same owner/group and file permissions
   as the QEMU binary.

3. Update the VM xml file using "virsh edit VM.xml"

   1. Set the VM to use the launch script.
      Set the emulator path contained in the `<emulator><emulator/>` tags.
      For example, replace:

      `<emulator>/usr/bin/qemu-kvm<emulator/>`

      with:

      `<emulator>/usr/bin/qemu-wrap.py<emulator/>`

4. Edit the Configuration Parameters section of the script to point to
   the correct emulator location and set any additional options. If you are
   using an alternative character device name, please set "us_vhost_path"
   to the location of that device. The script will automatically detect and
   insert the correct "vhostfd" value in the QEMU command line arguments.

5. Use virt-manager to launch the VM

Running ovs-vswitchd with DPDK backend inside a VM
--------------------------------------------------

Please note that additional configuration is required if you want to run
ovs-vswitchd with the DPDK backend inside a QEMU virtual machine.
ovs-vswitchd creates separate DPDK TX queues for each CPU core available.
This operation fails inside a QEMU virtual machine because, by default,
the VirtIO NIC provided to the guest is configured to support only a
single TX queue and a single RX queue. To change this behavior, you need
to turn on the 'mq' (multiqueue) property of all virtio-net-pci devices
emulated by QEMU and used by DPDK. You may do it manually (by changing the
QEMU command line) or, if you use Libvirt, by adding the following string:

`<driver name='vhost' queues='N'/>`

to the <interface> sections of all network devices used by DPDK. The
parameter 'N' determines how many queues can be used by the guest.
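
For illustration, a hypothetical libvirt <interface> element with the
multiqueue driver line in place (the source network, model and queue count
here are placeholder values):

```
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <driver name='vhost' queues='4'/>
</interface>
```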

Restrictions:
-------------

  - Works with 1500 MTU only; a few changes are needed in the DPDK lib to
    fix this issue.
  - Currently a DPDK port does not make use of any offload functionality.
  - DPDK-vHost support works with 1G huge pages.

  ivshmem:
  - If you run Open vSwitch with smaller page sizes (e.g. 2MB), you may be
    unable to share any rings or mempools with a virtual machine.
    This is because the current implementation of ivshmem works by sharing
    a single 1GB huge page from the host operating system to any guest
    operating system through the Qemu ivshmem device. When using smaller
    page sizes, multiple pages may be required to hold the ring descriptors
    and buffer pools. The Qemu ivshmem device does not allow you to share
    multiple file descriptors to the guest operating system. However, if you
    want to share dpdkr rings with other processes on the host, you can do
    this with smaller page sizes.

Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.

[INSTALL.userspace.md]: INSTALL.userspace.md
[INSTALL.md]: INSTALL.md
[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
[DPDK Docs]: http://dpdk.org/doc