Using Open vSwitch with DPDK
============================

Open vSwitch can use Intel(R) DPDK lib to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.

The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.

This version of Open vSwitch should be built manually with `configure`
and `make`.

OVS needs a system with 1GB hugepages support.

Building and Installing:
------------------------

Required: DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)

1. Configure build & install DPDK:
  1. Set `$DPDK_DIR`

     ```
     export DPDK_DIR=/usr/src/dpdk-2.0
     cd $DPDK_DIR
     ```

  2. Update `config/common_linuxapp` so that DPDK generates a single lib file
     (this modification is also required for the IVSHMEM build):

     `CONFIG_RTE_BUILD_COMBINE_LIBS=y`

     Update `config/common_linuxapp` so that DPDK is built with vhost
     libraries; currently, OVS only supports vhost-cuse, so DPDK vhost-user
     libraries should be explicitly turned off (they are enabled by default
     in DPDK 2.0):

     `CONFIG_RTE_LIBRTE_VHOST=y`
     `CONFIG_RTE_LIBRTE_VHOST_USER=n`

     Then run `make install` to build and install the library.
     For default install without IVSHMEM:

     `make install T=x86_64-native-linuxapp-gcc`

     To include IVSHMEM (shared memory):

     `make install T=x86_64-ivshmem-linuxapp-gcc`

     For further details refer to http://dpdk.org/

2. Configure & build the Linux kernel:

   Refer to intel-dpdk-getting-started-guide.pdf to understand the
   DPDK kernel requirements.

3. Configure & build OVS:

   * Non-IVSHMEM:

     `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/`

   * IVSHMEM:

     `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/`

   ```
   cd $(OVS_DIR)/openvswitch
   ./boot.sh
   ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"]
   make
   ```

   Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress
   DPDK cast-align warnings.

   For better performance one can enable aggressive compiler optimizations
   and use special instructions (popcnt, crc32) that may not be available on
   all machines. Instead of typing `make`, type:

   `make CFLAGS='-O3 -march=native'`

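   Before committing to `-march=native`, you can confirm the host actually
   advertises those instructions. A minimal sketch (the sample `flags` line
   below is illustrative; on a real system read it with
   `grep -m1 '^flags' /proc/cpuinfo`):

   ```
   # Check a cpuinfo-style flags line for popcnt and sse4_2 (the crc32 insns).
   flags="fpu vme de pse msr sse sse2 ssse3 sse4_1 sse4_2 popcnt aes avx"
   ok=yes
   for f in popcnt sse4_2; do
     case " $flags " in
       *" $f "*) ;;           # flag present
       *)        ok=no ;;     # flag missing
     esac
   done
   echo "popcnt/crc32 available: $ok"
   ```
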
   Refer to [INSTALL.userspace.md] for general requirements of building
   userspace OVS.

Using the DPDK with ovs-vswitchd:
---------------------------------

1. Setup system boot
   Add the following options to the kernel bootline:

   `default_hugepagesz=1GB hugepagesz=1G hugepages=1`

2. Setup DPDK devices:

   DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO
   modules. UIO requires inserting an out-of-tree driver, igb_uio.ko, that
   is available in DPDK. Setup for both methods is described below.

   * UIO:
     1. insert uio.ko: `modprobe uio`
     2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko`
     3. Bind network device to igb_uio:
        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`

   * VFIO:

     VFIO needs to be supported in the kernel and the BIOS. More information
     can be found in the [DPDK Linux GSG].

     1. Insert vfio-pci.ko: `modprobe vfio-pci`
     2. Set correct permissions on the vfio device: `sudo /usr/bin/chmod a+x /dev/vfio`
        and: `sudo /usr/bin/chmod 0666 /dev/vfio/*`
     3. Bind network device to vfio-pci:
        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`

3. Mount the hugetlbfs filesystem:

   `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`

   Refer to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.

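   To sanity-check the mount, you can inspect `/proc/mounts`. A sketch that
   parses a sample mount entry (on a live system substitute
   `grep hugetlbfs /proc/mounts`; the exact `pagesize=` spelling can vary by
   kernel version):

   ```
   # A 1G hugetlbfs mount typically appears in /proc/mounts like this:
   line="none /dev/hugepages hugetlbfs rw,relatime,pagesize=1024M 0 0"
   case "$line" in
     *hugetlbfs*pagesize=1024M*) result="1G hugetlbfs mounted" ;;
     *)                          result="no 1G hugetlbfs mount" ;;
   esac
   echo "$result"
   ```
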
4. Follow the instructions in [INSTALL.md] to install only the
   userspace daemons and utilities (via 'make install').
   1. First time only db creation (or clearing):

      ```
      mkdir -p /usr/local/etc/openvswitch
      mkdir -p /usr/local/var/run/openvswitch
      rm /usr/local/etc/openvswitch/conf.db
      ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
          /usr/local/share/openvswitch/vswitch.ovsschema
      ```

   2. Start ovsdb-server

      ```
      ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
          --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
          --private-key=db:Open_vSwitch,SSL,private_key \
          --certificate=db:Open_vSwitch,SSL,certificate \
          --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
      ```

   3. First time after db creation, initialize:

      ```
      ovs-vsctl --no-wait init
      ```

5. Start vswitchd:

   DPDK configuration arguments can be passed to vswitchd via the `--dpdk`
   argument. This needs to be the first argument passed to the vswitchd
   process. The dpdk argument -c is ignored by ovs-dpdk, but it is a
   required parameter for dpdk initialization.

   ```
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
   ```

   If allocating more than one GB hugepage (as for IVSHMEM), set the amount
   and use NUMA node 0 memory:

   ```
   ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
       -- unix:$DB_SOCK --pidfile --detach
   ```

6. Add bridge & ports

   To use ovs-vswitchd with DPDK, create a bridge with datapath_type
   "netdev" in the configuration database. For example:

   `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`

   Now you can add dpdk devices. OVS expects DPDK device names to start with
   "dpdk" and end with a portid. vswitchd should print (in the log file) the
   number of dpdk devices found.

   ```
   ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
   ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
   ```

   Once the first DPDK port is added to vswitchd, it creates a polling
   thread and polls the dpdk device in a continuous loop. Therefore CPU
   utilization for that thread is always 100%.

   Note: creating bonds of DPDK interfaces is slightly different to creating
   bonds of system interfaces. For DPDK, the interface type must be
   explicitly set, for example:

   ```
   ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 \
       -- set Interface dpdk0 type=dpdk \
       -- set Interface dpdk1 type=dpdk
   ```

7. Add test flows

   Test flow script across NICs (assuming ovs in /usr/src/ovs):
   Execute script:

   ```
   #! /bin/sh
   # Move to command directory
   cd /usr/src/ovs/utilities/

   # Clear current flows
   ./ovs-ofctl del-flows br0

   # Add flows between port 1 (dpdk0) and port 2 (dpdk1)
   ./ovs-ofctl add-flow br0 in_port=1,action=output:2
   ./ovs-ofctl add-flow br0 in_port=2,action=output:1
   ```

8. Performance tuning

   With pmd multi-threading support, OVS creates one pmd thread for each
   numa node by default. The pmd thread handles the I/O of all DPDK
   interfaces on the same numa node. The following two commands can be used
   to configure the multi-threading behavior.

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`

   The command above asks for a CPU mask for setting the affinity of pmd
   threads. A set bit in the mask means a pmd thread is created and pinned
   to the corresponding CPU core. For more information, please refer to
   `man ovs-vswitchd.conf.db`
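
   As a worked example of building such a mask (a sketch; the core numbers
   are only illustrative), set bit `core` for every core that should run a
   pmd thread:

   ```
   # OR together 1 << core for each core that should host a pmd thread.
   mask=0
   for core in 4 6 8 10; do
     mask=$(( mask | (1 << core) ))
   done
   printf 'pmd-cpu-mask=%x\n' "$mask"
   ```

   For cores 4, 6, 8 and 10 this prints `pmd-cpu-mask=550`, matching the
   `00000550` mask used in the affinitization example below.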

   `ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=<integer>`

   The command above sets the number of rx queues for each DPDK interface.
   The rx queues are assigned to pmd threads on the same numa node in a
   round-robin fashion. For more information, please refer to
   `man ovs-vswitchd.conf.db`

   Ideally, for maximum throughput, the pmd thread should not be scheduled
   out, as that temporarily halts its execution. The following
   affinitization methods can help.

   Let's pick cores 4,6,8,10 for the pmd threads to run on. Also assume a
   dual 8-core Sandy Bridge system with hyperthreading enabled, where CPU1
   has cores 0,...,7 and 16,...,23 and CPU2 has cores 8,...,15 and
   24,...,31. (A different cpu configuration will have different core mask
   requirements.)

   To the kernel bootline, add a core isolation list for the chosen cores
   and their hyperthread siblings (e.g. isolcpus=4,20,6,22,8,24,10,26).
   Reboot the system for the isolation to take effect, then restart
   everything.

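   The hyperthread siblings above follow the pattern sibling = core + 16 on
   this example topology, so the isolation list can be generated rather than
   typed by hand; a sketch under that assumption:

   ```
   # Emit an isolcpus list covering each pmd core and its hyperthread
   # sibling (sibling = core + 16 on the example dual 8-core HT topology).
   list=""
   for core in 4 6 8 10; do
     list="${list:+$list,}$core,$((core + 16))"
   done
   echo "isolcpus=$list"
   ```
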
   Configure pmd threads on cores 4,6,8,10 using 'pmd-cpu-mask':

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=00000550`

   You should be able to check that pmd threads are pinned to the correct
   cores via:

   ```
   top -p `pidof ovs-vswitchd` -H -d1
   ```

   Note, the pmd threads on a numa node are only created if there is at
   least one DPDK interface from that numa node added to OVS.

   To understand where most of the time is spent and whether the caches are
   effective, these commands can be used:

   ```
   ovs-appctl dpif-netdev/pmd-stats-clear # To reset statistics
   ovs-appctl dpif-netdev/pmd-stats-show
   ```

DPDK Rings:
-----------

Following the steps above to create a bridge, you can now add dpdk rings
as a port to the vswitch. OVS will expect the DPDK ring device name to
start with dpdkr and end with a portid.

`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr`

DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings. This is from the base dpdk directory and modified to work
with the ring naming used within ovs.

Location: tests/ovs_client

To run the client:

```
cd /usr/src/ovs/tests/
ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
```

In the case of the dpdkr example above, the "port id you gave dpdkr" is 0.

It is essential to have `--proc-type=secondary`.

The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit ring of
the ethernet ring. It is a trivial loopback application.

DPDK rings in VM (IVSHMEM shared memory communications)
-------------------------------------------------------

In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu. You can download the
patch and getting started guide at:

https://01.org/packet-processing/downloads

A general rule of thumb for better performance is that the client
application should not be assigned the same dpdk core mask "-c" as
the vswitchd.

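A quick way to verify that the two masks are disjoint (a sketch with
illustrative values) is to AND them and check for zero:

```
# Example: vswitchd on core 0 (-c 0x1), client on core 1 (-c 0x2).
vswitchd_mask=0x1
client_mask=0x2
if [ $(( vswitchd_mask & client_mask )) -eq 0 ]; then
  overlap=no
else
  overlap=yes
fi
echo "core masks overlap: $overlap"
```
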
DPDK vhost:
-----------

vhost-cuse is only supported at present, i.e. not using the standard QEMU
vhost-user interface. It is intended that vhost-user support will be added
in future releases when supported in DPDK, and that vhost-cuse will
eventually be deprecated. See [DPDK Docs] for more info on vhost.

Prerequisites:
1. Insert the Cuse module:

   `modprobe cuse`

2. Build and insert the `eventfd_link` module:

   ```
   cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
   make
   insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
   ```

Following the steps above to create a bridge, you can now add DPDK vhost
as a port to the vswitch.

`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`

Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:

`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`

However, please note that when attaching userspace devices to QEMU, the
name provided during the add-port operation must match the ifname parameter
on the QEMU command line.


DPDK vhost VM configuration:
----------------------------

vhost ports use a Linux* character device to communicate with QEMU.
By default it is set to `/dev/vhost-net`. It is possible to reuse this
standard device for DPDK vhost, which makes setup a little simpler, but it
is better practice to specify an alternative character device in order to
avoid any conflicts if kernel vhost is to be used in parallel.

1. This step is only needed if using an alternative character device.

   The new character device filename must be specified on the vswitchd
   commandline:

   `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1 ...`

   Note that the `--cuse_dev_name` argument and associated string must be
   the first arguments after `--dpdk` and come before the EAL arguments. In
   the example above, the character device to be used will be
   `/dev/my-vhost-net`.

2. This step is only needed if reusing the standard character device. It
   will conflict with the kernel vhost character device, so the user must
   first remove it.

   `rm -rf /dev/vhost-net`


3a. Configure virtio-net adaptors:
    The following parameters must be passed to the QEMU binary:

    ```
    -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
    -device virtio-net-pci,netdev=net1,mac=<mac>
    ```

    Repeat the above parameters for multiple devices.

    The DPDK vhost library will negotiate its own features, so they
    need not be passed in as command line params. Note that as offloads are
    disabled this is the equivalent of setting:

    `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`

3b. If using an alternative character device, it must also be explicitly
    passed to QEMU using the `vhostfd` argument:

    ```
    -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
    vhostfd=<open_fd>
    -device virtio-net-pci,netdev=net1,mac=<mac>
    ```

    The open file descriptor must be passed to QEMU running as a child
    process. This could be done with a simple python script; note that the
    descriptor must be converted to a string before concatenation:

    ```
    #!/usr/bin/python
    import os
    import subprocess

    fd = os.open("/dev/usvhost", os.O_RDWR)
    subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,"
                    "vhost=on,vhostfd=" + str(fd) + " ...", shell=True)
    ```

    Alternatively the `qemu-wrap.py` script can be used to automate the
    requirements specified above and can be used in conjunction with libvirt
    if desired. See the "DPDK vhost VM configuration with QEMU wrapper"
    section below.


4. Configure huge pages:
   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
   virtio-net device's virtual rings and packet buffers by mapping the VM's
   physical memory on hugetlbfs. To enable vhost ports to map the VM's
   memory into their process address space, pass the following parameters
   to QEMU:

   `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
   share=on -numa node,memdev=mem -mem-prealloc`

424 | ||
425 | DPDK vhost VM configuration with QEMU wrapper: | |
426 | ---------------------------------------------- | |
427 | ||
428 | The QEMU wrapper script automatically detects and calls QEMU with the | |
429 | necessary parameters. It performs the following actions: | |
430 | ||
431 | * Automatically detects the location of the hugetlbfs and inserts this | |
432 | into the command line parameters. | |
433 | * Automatically open file descriptors for each virtio-net device and | |
434 | inserts this into the command line parameters. | |
435 | * Calls QEMU passing both the command line parameters passed to the | |
436 | script itself and those it has auto-detected. | |
437 | ||
438 | Before use, you **must** edit the configuration parameters section of the | |
439 | script to point to the correct emulator location and set additional | |
440 | settings. Of these settings, `emul_path` and `us_vhost_path` **must** be | |
441 | set. All other settings are optional. | |
442 | ||
443 | To use directly from the command line simply pass the wrapper some of the | |
444 | QEMU parameters: it will configure the rest. For example: | |
445 | ||
446 | ``` | |
447 | qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4 | |
448 | --enable-kvm -nographic -vnc none -net none -netdev tap,id=net1, | |
449 | script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci, | |
450 | netdev=net1,mac=00:00:00:00:00:01 | |
5568661c | 451 | ``` |
DPDK vhost VM configuration with libvirt:
-----------------------------------------

If you are using libvirt, you must enable libvirt to access the character
device by adding it to the controllers cgroup for libvirtd using the
following steps.

1. In `/etc/libvirt/qemu.conf` add/edit the following lines:

   ```
   clear_emulator_capabilities = 0
   user = "root"
   group = "root"
   cgroup_device_acl = [
       "/dev/null", "/dev/full", "/dev/zero",
       "/dev/random", "/dev/urandom",
       "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
       "/dev/rtc", "/dev/hpet", "/dev/net/tun",
       "/dev/<my-vhost-device>",
       "/dev/hugepages"]
   ```

   <my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-net`
   device. If you have specified a different name on the ovs-vswitchd
   commandline using the "--cuse_dev_name" parameter, please specify that
   filename instead.

2. Disable SELinux or set it to permissive mode.

3. Restart the libvirtd process.
   For example, on Fedora:

   `systemctl restart libvirtd.service`

After successfully editing the configuration, you may launch your
vhost-enabled VM. The XML describing the VM can be configured like so
within the <qemu:commandline> section:

1. Set up shared hugepages:

   ```
   <qemu:arg value='-object'/>
   <qemu:arg value='memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on'/>
   <qemu:arg value='-numa'/>
   <qemu:arg value='node,memdev=mem'/>
   <qemu:arg value='-mem-prealloc'/>
   ```

2. Set up your tap devices:

   ```
   <qemu:arg value='-netdev'/>
   <qemu:arg value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on'/>
   <qemu:arg value='-device'/>
   <qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/>
   ```

   Repeat for as many devices as are desired, modifying the id, ifname
   and mac as necessary.

   Again, if you are using an alternative character device (other than
   `/dev/vhost-net`), please specify the file descriptor like so:

   `<qemu:arg value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on,vhostfd=<open_fd>'/>`

   Where <open_fd> refers to the open file descriptor of the character
   device. Instructions on how to retrieve the file descriptor can be found
   in the "DPDK vhost VM configuration" section. Alternatively, the process
   is automated with the qemu-wrap.py script, detailed in the next section.

Now you may launch your VM using virt-manager, or like so:

`virsh create my_vhost_vm.xml`


DPDK vhost VM configuration with libvirt and QEMU wrapper:
----------------------------------------------------------

To use the qemu-wrapper script in conjunction with libvirt, follow the
steps in the previous section before proceeding with the following steps:

1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH),
   ideally in the same directory that the QEMU binary is located.

2. Ensure that the script has the same owner/group and file permissions
   as the QEMU binary.

3. Update the VM xml file using "virsh edit VM.xml"

   1. Set the VM to use the launch script.
      Set the emulator path contained in the `<emulator><emulator/>` tags.
      For example, replace:

      `<emulator>/usr/bin/qemu-kvm<emulator/>`

      with:

      `<emulator>/usr/bin/qemu-wrap.py<emulator/>`

4. Edit the Configuration Parameters section of the script to point to
   the correct emulator location and set any additional options. If you are
   using an alternative character device name, please set "us_vhost_path" to
   the location of that device. The script will automatically detect and
   insert the correct "vhostfd" value in the QEMU command line arguments.

5. Use virt-manager to launch the VM.

Running ovs-vswitchd with DPDK backend inside a VM
--------------------------------------------------

Please note that additional configuration is required if you want to run
ovs-vswitchd with the DPDK backend inside a QEMU virtual machine.
Ovs-vswitchd creates separate DPDK TX queues for each CPU core available.
This operation fails inside a QEMU virtual machine because, by default, the
VirtIO NIC provided to the guest is configured to support only a single TX
queue and a single RX queue. To change this behavior, you need to turn on
the 'mq' (multiqueue) property of all virtio-net-pci devices emulated by
QEMU and used by DPDK. You may do it manually (by changing the QEMU command
line) or, if you use Libvirt, by adding the following string:

`<driver name='vhost' queues='N'/>`

to <interface> sections of all network devices used by DPDK. Parameter 'N'
determines how many queues can be used by the guest.

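When configuring QEMU manually instead, the equivalent is enabling
multiqueue on both the netdev and the device. A sketch (the tap name, MAC
and queue count of 4 are placeholders; `vectors` is conventionally set to
2*queues+2):

```
-netdev tap,id=net1,script=no,downscript=no,ifname=if1,vhost=on,queues=4
-device virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01,mq=on,vectors=10
```
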
Restrictions:
-------------

  - Only works with 1500 MTU; a few changes are needed in the DPDK lib to
    fix this issue.
  - Currently the DPDK port does not make use of any offload functionality.
  - DPDK-vHost support works with 1G huge pages.

  ivshmem:
  - If you run Open vSwitch with smaller page sizes (e.g. 2MB), you may be
    unable to share any rings or mempools with a virtual machine.
    This is because the current implementation of ivshmem works by sharing
    a single 1GB huge page from the host operating system to any guest
    operating system through the Qemu ivshmem device. When using smaller
    page sizes, multiple pages may be required to hold the ring descriptors
    and buffer pools. The Qemu ivshmem device does not allow you to share
    multiple file descriptors to the guest operating system. However, if you
    want to share dpdkr rings with other processes on the host, you can do
    this with smaller page sizes.

Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.


[INSTALL.userspace.md]:INSTALL.userspace.md
[INSTALL.md]:INSTALL.md
[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
[DPDK Docs]: http://dpdk.org/doc