Using Open vSwitch with DPDK
============================

Open vSwitch can use the Intel(R) DPDK library to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.

The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.

This version of Open vSwitch should be built manually with `configure`
and `make`.

OVS needs a system with 1GB hugepages support.

Building and Installing:
------------------------

Required: DPDK 2.0
Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
on Debian/Ubuntu)

1. Configure build & install DPDK:
  1. Set `$DPDK_DIR`

     ```
     export DPDK_DIR=/usr/src/dpdk-2.0
     cd $DPDK_DIR
     ```

  2. Update `config/common_linuxapp` so that DPDK generates a single lib file.
     (This modification is also required for the IVSHMEM build.)

     `CONFIG_RTE_BUILD_COMBINE_LIBS=y`

     Update `config/common_linuxapp` so that DPDK is built with vhost
     libraries.

     `CONFIG_RTE_LIBRTE_VHOST=y`

     Then run `make install` to build and install the library.
     For a default install without IVSHMEM:

     `make install T=x86_64-native-linuxapp-gcc`

     To include IVSHMEM (shared memory):

     `make install T=x86_64-ivshmem-linuxapp-gcc`

     For further details refer to http://dpdk.org/

2. Configure & build the Linux kernel:

   Refer to intel-dpdk-getting-started-guide.pdf for the DPDK kernel
   requirements.

3. Configure & build OVS:

   * Non-IVSHMEM:

     `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/`

   * IVSHMEM:

     `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/`

   ```
   cd $OVS_DIR/openvswitch
   ./boot.sh
   ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"]
   make
   ```

   Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress
   DPDK cast-align warnings.

For better performance one can enable aggressive compiler optimizations and
use special instructions (popcnt, crc32) that may not be available on all
machines. Instead of typing `make`, type:

`make CFLAGS='-O3 -march=native'`

Refer to [INSTALL.userspace.md] for general requirements of building
userspace OVS.

Using the DPDK with ovs-vswitchd:
---------------------------------

1. Setup system boot.
   Add the following options to the kernel bootline:

   `default_hugepagesz=1GB hugepagesz=1G hugepages=1`

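   After rebooting, you can optionally confirm that the hugepages were
   actually reserved; this is a generic Linux check, not something specific
   to OVS or DPDK:

   ```
   grep Huge /proc/meminfo   # HugePages_Total should match the bootline
   cat /proc/cmdline         # should contain the options added above
   ```
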
2. Setup DPDK devices:

   DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO
   modules. UIO requires inserting an out-of-tree driver, igb_uio.ko, that is
   available in DPDK. Setup for both methods is described below; a sketch for
   verifying the binding follows the list.

   * UIO:
     1. insert uio.ko: `modprobe uio`
     2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko`
     3. Bind network device to igb_uio:
        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`

   * VFIO:

     VFIO needs to be supported in the kernel and the BIOS. More information
     can be found in the [DPDK Linux GSG].

     1. Insert vfio-pci.ko: `modprobe vfio-pci`
     2. Set correct permissions on the vfio device: `sudo /usr/bin/chmod a+x /dev/vfio`
        and: `sudo /usr/bin/chmod 0666 /dev/vfio/*`
     3. Bind network device to vfio-pci:
        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`

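   For either method, you can verify which devices are bound to DPDK-capable
   drivers using the same bind script shipped with DPDK:

   ```
   $DPDK_DIR/tools/dpdk_nic_bind.py --status
   ```
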
3. Mount the hugetlbfs filesystem:

   `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`

   Refer to http://www.dpdk.org/doc/quick-start for verifying the DPDK setup.

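   Optionally, to make the mount persistent across reboots, an fstab entry
   along these lines can be added (a standard hugetlbfs mount, shown here
   only as an example):

   ```
   none /dev/hugepages hugetlbfs pagesize=1G 0 0
   ```
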
4. Follow the instructions in [INSTALL.md] to install only the
   userspace daemons and utilities (via 'make install').
   1. First-time only: create (or clear) the database:

      ```
      mkdir -p /usr/local/etc/openvswitch
      mkdir -p /usr/local/var/run/openvswitch
      rm /usr/local/etc/openvswitch/conf.db
      ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
          /usr/local/share/openvswitch/vswitch.ovsschema
      ```

   2. Start ovsdb-server:

      ```
      ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
          --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
          --private-key=db:Open_vSwitch,SSL,private_key \
          --certificate=db:Open_vSwitch,SSL,certificate \
          --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
      ```

   3. First time after db creation, initialize:

      ```
      ovs-vsctl --no-wait init
      ```

5. Start vswitchd:

   DPDK configuration arguments can be passed to vswitchd via the `--dpdk`
   argument. This needs to be the first argument passed to the vswitchd
   process. The DPDK `-c` argument is ignored by ovs-dpdk, but it is a
   required parameter for DPDK initialization.

   ```
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
   ```

   If you have allocated more than one GB hugepage (as for IVSHMEM), set the
   amount and use NUMA node 0 memory:

   ```
   ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
      -- unix:$DB_SOCK --pidfile --detach
   ```

6. Add bridge & ports

   To use ovs-vswitchd with DPDK, create a bridge with datapath_type
   "netdev" in the configuration database. For example:

   `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`

   Now you can add dpdk devices. OVS expects DPDK device names to start with
   "dpdk" and end with a port id. vswitchd should print (in the log file) the
   number of dpdk devices found.

   ```
   ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
   ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
   ```

   Once the first DPDK port is added to vswitchd, it creates a polling thread
   and polls the dpdk device in a continuous loop. Therefore CPU utilization
   for that thread is always 100%.

   Note: creating bonds of DPDK interfaces is slightly different from creating
   bonds of system interfaces. For DPDK, the interface type must be explicitly
   set, for example:

   ```
   ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 \
       -- set Interface dpdk0 type=dpdk \
       -- set Interface dpdk1 type=dpdk
   ```

7. Add test flows

   Test flow script across NICs (assuming OVS is in /usr/src/ovs):
   Execute the script:

   ```
   #! /bin/sh
   # Move to command directory
   cd /usr/src/ovs/utilities/

   # Clear current flows
   ./ovs-ofctl del-flows br0

   # Add flows between port 1 (dpdk0) and port 2 (dpdk1)
   ./ovs-ofctl add-flow br0 in_port=1,action=output:2
   ./ovs-ofctl add-flow br0 in_port=2,action=output:1
   ```
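
   To verify that the flows were installed, you can dump the flow table of
   the bridge with the standard `ovs-ofctl` command:

   ```
   ./ovs-ofctl dump-flows br0
   ```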

8. Performance tuning

   With pmd multi-threading support, OVS creates one pmd thread for each
   numa node by default. The pmd thread handles the I/O of all DPDK
   interfaces on the same numa node. The following two commands can be used
   to configure the multi-threading behavior.

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`

   The command above asks for a CPU mask for setting the affinity of pmd
   threads. A set bit in the mask means a pmd thread is created and pinned
   to the corresponding CPU core. For more information, please refer to
   `man ovs-vswitchd.conf.db`

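   As a worked example, the mask for cores 4, 6, 8 and 10 used later in this
   section can be computed with shell arithmetic (plain POSIX shell; nothing
   OVS-specific is assumed here):

   ```
   printf '%x\n' $(( (1 << 4) | (1 << 6) | (1 << 8) | (1 << 10) ))  # -> 550
   ```
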
   `ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=<integer>`

   The command above sets the number of rx queues for each DPDK interface.
   The rx queues are assigned to pmd threads on the same numa node in
   round-robin fashion. For more information, please refer to
   `man ovs-vswitchd.conf.db`

   Ideally, for maximum throughput, the pmd thread should not be scheduled
   out, which would temporarily halt its execution. The following
   affinitization methods can help.

   Let's pick cores 4, 6, 8 and 10 for the pmd threads to run on. Also assume
   a dual 8-core Sandy Bridge system with hyperthreading enabled, where CPU1
   has cores 0,...,7 and 16,...,23 and CPU2 has cores 8,...,15 and 24,...,31.
   (A different cpu configuration could have different core mask
   requirements.)

   To the kernel bootline, add a core isolation list for the chosen cores and
   their hyperthread siblings (e.g. isolcpus=4,20,6,22,8,24,10,26). Reboot
   the system for the isolation to take effect, then restart everything.

   Configure the pmd threads on cores 4, 6, 8 and 10 using 'pmd-cpu-mask':

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=00000550`

   You should be able to check that the pmd threads are pinned to the correct
   cores via:

   ```
   top -p `pidof ovs-vswitchd` -H -d1
   ```

   Note: the pmd threads on a numa node are only created if there is at least
   one DPDK interface from that numa node added to OVS.

   To understand where most of the time is spent and whether the caches are
   effective, these commands can be used:

   ```
   ovs-appctl dpif-netdev/pmd-stats-clear   # To reset statistics
   ovs-appctl dpif-netdev/pmd-stats-show
   ```

DPDK Rings:
-----------

Following the steps above to create a bridge, you can now add dpdk rings
as a port to the vswitch. OVS will expect the DPDK ring device name to
start with "dpdkr" and end with a port id.

`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr`

DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings. It is taken from the base dpdk directory and modified to work
with the ring naming used within ovs.

Location: tests/ovs_client

To run the client:

```
cd /usr/src/ovs/tests/
ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
```

In the case of the dpdkr example above, the "port id you gave dpdkr" is 0.

It is essential to have `--proc-type=secondary`.

The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit queue of
the same ring. It is a trivial loopback application.

DPDK rings in VM (IVSHMEM shared memory communications)
-------------------------------------------------------

In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu. You can download the
patch and a getting started guide at:

https://01.org/packet-processing/downloads

A general rule of thumb for better performance is that the client
application should not be assigned the same dpdk core mask `-c` as
the vswitchd.

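For example, if ovs-vswitchd was started with `-c 0x1` (core 0), the client
could be run on core 1 instead; the mask below is purely illustrative:

```
ovsclient -c 0x2 -n 4 --proc-type=secondary -- -n 0
```
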
DPDK vhost:
-----------

DPDK 2.0 supports two types of vhost:

1. vhost-user
2. vhost-cuse

Whichever type of vhost is enabled in the specified DPDK build is the type
that will be enabled in OVS. By default, vhost-user is enabled in DPDK, so
unless vhost-cuse has been enabled in DPDK, vhost-user ports will be enabled
in OVS.
Please note that support for vhost-cuse is intended to be deprecated in OVS
in a future release.

DPDK vhost-user:
----------------

The following sections describe the use of vhost-user 'dpdkvhostuser' ports
with OVS.

DPDK vhost-user Prerequisites:
------------------------------

1. DPDK 2.0 with vhost support enabled, as documented in the "Building and
   Installing" section.

2. QEMU version v2.1.0+

   QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if providing
   your VM with memory greater than 1GB, due to potential issues with memory
   mapping larger areas.

Adding DPDK vhost-user ports to the Switch:
-------------------------------------------

Following the steps above to create a bridge, you can now add DPDK vhost-user
as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can
have arbitrary names.

  - For vhost-user, the name of the port type is `dpdkvhostuser`

    ```
    ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
    type=dpdkvhostuser
    ```

    This action creates a socket located at
    `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
    to your VM on the QEMU command line. More instructions on this can be
    found in the next section "DPDK vhost-user VM configuration".
    Note: If you wish for the vhost-user sockets to be created in a
    directory other than `/usr/local/var/run/openvswitch`, you may specify
    another location on the ovs-vswitchd command line like so:

    `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`

DPDK vhost-user VM configuration:
---------------------------------
Follow the steps below to attach vhost-user port(s) to a VM.

1. Configure sockets.
   Pass the following parameters to QEMU to attach a vhost-user device:

   ```
   -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
   -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
   -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
   ```

   ...where vhost-user-1 is the name of the vhost-user port added
   to the switch.
   Repeat the above parameters for multiple devices, changing the
   chardev path and id as necessary. Note that a separate and different
   chardev path needs to be specified for each vhost-user device. For
   example, if you have a second vhost-user port named 'vhost-user-2', you
   append your QEMU command line with an additional set of parameters:

   ```
   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
   ```

2. Configure huge pages.
   QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
   a virtio-net device's virtual rings and packet buffers mapping the VM's
   physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
   memory into their process address space, pass the following parameters
   to QEMU:

   ```
   -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
   share=on
   -numa node,memdev=mem -mem-prealloc
   ```

DPDK vhost-cuse:
----------------

The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports
with OVS.

DPDK vhost-cuse Prerequisites:
------------------------------

1. DPDK 2.0 with vhost support enabled, as documented in the "Building and
   Installing" section.
   As an additional step, you must enable vhost-cuse in DPDK by setting the
   following additional flag in `config/common_linuxapp`:

   `CONFIG_RTE_LIBRTE_VHOST_USER=n`

   Following this, rebuild DPDK as per the instructions in the "Building and
   Installing" section. Finally, rebuild OVS as per step 3 in the "Building
   and Installing" section - OVS will detect that DPDK has vhost-cuse
   libraries compiled and in turn will enable support for it in the switch
   and disable vhost-user support.

2. Insert the Cuse module:

   `modprobe cuse`

3. Build and insert the `eventfd_link` module:

   ```
   cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
   make
   insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
   ```

4. QEMU version v2.1.0+

   vhost-cuse will work with QEMU v2.1.0 and above; however, it is
   recommended to use v2.2.0 if providing your VM with memory greater than
   1GB, due to potential issues with memory mapping larger areas.
   Note: QEMU v1.6.2 will also work, with slightly different command line
   parameters, which are specified later in this document.

Adding DPDK vhost-cuse ports to the Switch:
-------------------------------------------

Following the steps above to create a bridge, you can now add DPDK vhost-cuse
as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports can
have arbitrary names.

  - For vhost-cuse, the name of the port type is `dpdkvhostcuse`

    ```
    ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
    type=dpdkvhostcuse
    ```

    When attaching vhost-cuse ports to QEMU, the name provided during the
    add-port operation must match the ifname parameter on the QEMU command
    line. More instructions on this can be found in the next section.

DPDK vhost-cuse VM configuration:
---------------------------------

   vhost-cuse ports use a Linux* character device to communicate with QEMU.
   By default it is set to `/dev/vhost-net`. It is possible to reuse this
   standard device for DPDK vhost, which makes setup a little simpler, but it
   is better practice to specify an alternative character device in order to
   avoid any conflicts if kernel vhost is to be used in parallel.

1. This step is only needed if using an alternative character device.

   The new character device filename must be specified on the vswitchd
   commandline:

   `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1 ...`

   Note that the `--cuse_dev_name` argument and associated string must be the
   first arguments after `--dpdk` and come before the EAL arguments. In the
   example above, the character device to be used will be `/dev/my-vhost-net`.

2. This step is only needed if reusing the standard character device. It will
   conflict with the kernel vhost character device, so the user must first
   remove it.

   `rm -rf /dev/vhost-net`

3a. Configure virtio-net adaptors:
    The following parameters must be passed to the QEMU binary:

     ```
     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
     -device virtio-net-pci,netdev=net1,mac=<mac>
     ```

     Repeat the above parameters for multiple devices.

     The DPDK vhost library will negotiate its own features, so they
     need not be passed in as command line params. Note that as offloads are
     disabled, this is the equivalent of setting:

     `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`

3b. If using an alternative character device, it must also be explicitly
    passed to QEMU using the `vhostfd` argument:

     ```
     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
     vhostfd=<open_fd>
     -device virtio-net-pci,netdev=net1,mac=<mac>
     ```

     The open file descriptor must be passed to QEMU running as a child
     process. This could be done with a simple python script (note that the
     descriptor must be converted to a string before it can be concatenated
     into the command line):

       ```
       #!/usr/bin/python
       import os
       import subprocess

       # Open the vhost character device; the fd is inherited by the child.
       fd = os.open("/dev/usvhost", os.O_RDWR)
       subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,"
                       "vhost=on,vhostfd=" + str(fd) + "...", shell=True)
       ```

     Alternatively the `qemu-wrap.py` script can be used to automate the
     requirements specified above and can be used in conjunction with libvirt
     if desired. See the "DPDK vhost VM configuration with QEMU wrapper"
     section below.

4. Configure huge pages:
   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
   virtio-net device's virtual rings and packet buffers mapping the VM's
   physical memory on hugetlbfs. To enable vhost ports to map the VM's
   memory into their process address space, pass the following parameters
   to QEMU:

   `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
   share=on -numa node,memdev=mem -mem-prealloc`

   Note: For use with an earlier QEMU version such as v1.6.2, use the
   following to configure hugepages instead:

   `-mem-path /dev/hugepages -mem-prealloc`

DPDK vhost-cuse VM configuration with QEMU wrapper:
---------------------------------------------------
The QEMU wrapper script automatically detects and calls QEMU with the
necessary parameters. It performs the following actions:

  * Automatically detects the location of the hugetlbfs and inserts this
    into the command line parameters.
  * Automatically opens file descriptors for each virtio-net device and
    inserts these into the command line parameters.
  * Calls QEMU, passing both the command line parameters passed to the
    script itself and those it has auto-detected.

Before use, you **must** edit the configuration parameters section of the
script to point to the correct emulator location and set additional
settings. Of these settings, `emul_path` and `us_vhost_path` **must** be
set. All other settings are optional.

To use directly from the command line, simply pass the wrapper some of the
QEMU parameters: it will configure the rest. For example:

```
qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
  --enable-kvm -nographic -vnc none -net none -netdev tap,id=net1,
  script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci,
  netdev=net1,mac=00:00:00:00:00:01
```

DPDK vhost-cuse VM configuration with libvirt:
----------------------------------------------

If you are using libvirt, you must enable libvirt to access the character
device by adding it to the controllers cgroup for libvirtd, using the
following steps.

  1. In `/etc/libvirt/qemu.conf` add/edit the following lines:

     ```
     1) clear_emulator_capabilities = 0
     2) user = "root"
     3) group = "root"
     4) cgroup_device_acl = [
        "/dev/null", "/dev/full", "/dev/zero",
        "/dev/random", "/dev/urandom",
        "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
        "/dev/rtc", "/dev/hpet", "/dev/net/tun",
        "/dev/<my-vhost-device>",
        "/dev/hugepages"]
     ```

     <my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-net`
     device. If you have specified a different name on the ovs-vswitchd
     commandline using the "--cuse_dev_name" parameter, please specify that
     filename instead.

  2. Disable SELinux or set it to permissive mode.

  3. Restart the libvirtd process.
     For example, on Fedora:

     `systemctl restart libvirtd.service`

After successfully editing the configuration, you may launch your
vhost-enabled VM. The XML describing the VM can be configured like so
within the <qemu:commandline> section:

  1. Set up shared hugepages:

     ```
     <qemu:arg value='-object'/>
     <qemu:arg value='memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on'/>
     <qemu:arg value='-numa'/>
     <qemu:arg value='node,memdev=mem'/>
     <qemu:arg value='-mem-prealloc'/>
     ```

  2. Set up your tap devices:

     ```
     <qemu:arg value='-netdev'/>
     <qemu:arg value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on'/>
     <qemu:arg value='-device'/>
     <qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/>
     ```

     Repeat for as many devices as are desired, modifying the id, ifname
     and mac as necessary.

     Again, if you are using an alternative character device (other than
     `/dev/vhost-net`), please specify the file descriptor like so:

     `<qemu:arg value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on,vhostfd=<open_fd>'/>`

     Where <open_fd> refers to the open file descriptor of the character
     device. Instructions on how to retrieve the file descriptor can be
     found in the "DPDK vhost VM configuration" section.
     Alternatively, the process is automated with the qemu-wrap.py script,
     detailed in the next section.

Now you may launch your VM using virt-manager, or like so:

    `virsh create my_vhost_vm.xml`

DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
---------------------------------------------------------------

To use the qemu-wrapper script in conjunction with libvirt, follow the
steps in the previous section before proceeding with the following steps:

  1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH),
     ideally in the same directory that the QEMU binary is located in.

  2. Ensure that the script has the same owner/group and file permissions
     as the QEMU binary.

  3. Update the VM xml file using "virsh edit VM.xml"

     1. Set the VM to use the launch script.
        Set the emulator path contained in the `<emulator></emulator>` tags.
        For example, replace:

        `<emulator>/usr/bin/qemu-kvm</emulator>`

        with:

        `<emulator>/usr/bin/qemu-wrap.py</emulator>`

  4. Edit the Configuration Parameters section of the script to point to
     the correct emulator location and set any additional options. If you
     are using an alternative character device name, please set
     "us_vhost_path" to the location of that device. The script will
     automatically detect and insert the correct "vhostfd" value in the
     QEMU command line arguments.

  5. Use virt-manager to launch the VM.

Running ovs-vswitchd with DPDK backend inside a VM
--------------------------------------------------

Please note that additional configuration is required if you want to run
ovs-vswitchd with the DPDK backend inside a QEMU virtual machine.
Ovs-vswitchd creates separate DPDK TX queues for each CPU core available.
This operation fails inside a QEMU virtual machine because, by default, the
VirtIO NIC provided to the guest is configured to support only a single TX
queue and a single RX queue. To change this behavior, you need to turn on
the 'mq' (multiqueue) property of all virtio-net-pci devices emulated by
QEMU and used by DPDK. You may do it manually (by changing the QEMU command
line) or, if you use Libvirt, by adding the following string:

`<driver name='vhost' queues='N'/>`

to the <interface> sections of all network devices used by DPDK. Parameter
'N' determines how many queues can be used by the guest.
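
If you change the QEMU command line manually instead, the equivalent uses
QEMU's generic virtio-net multiqueue options; the queue count below is only
an example, with `vectors` conventionally set to 2*queues+2:

```
-netdev tap,id=net1,vhost=on,queues=4
-device virtio-net-pci,netdev=net1,mq=on,vectors=10
```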

Restrictions:
-------------

  - Only works with 1500 MTU; a few changes are needed in the DPDK lib to
    fix this issue.
  - Currently the DPDK port does not make use of any offload functionality.
  - DPDK-vHost support works with 1G huge pages.

  ivshmem:
  - If you run Open vSwitch with smaller page sizes (e.g. 2MB), you may be
    unable to share any rings or mempools with a virtual machine.
    This is because the current implementation of ivshmem works by sharing
    a single 1GB huge page from the host operating system to any guest
    operating system through the Qemu ivshmem device. When using smaller
    page sizes, multiple pages may be required to hold the ring descriptors
    and buffer pools. The Qemu ivshmem device does not allow you to share
    multiple file descriptors to the guest operating system. However, if you
    want to share dpdkr rings with other processes on the host, you can do
    this with smaller page sizes.

Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.

[INSTALL.userspace.md]: INSTALL.userspace.md
[INSTALL.md]: INSTALL.md
[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
[DPDK Docs]: http://dpdk.org/doc