Using Open vSwitch with DPDK
============================

Open vSwitch can use the Intel(R) DPDK library to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.

The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.

This version of Open vSwitch should be built manually with `configure`
and `make`.

OVS needs a system with 1GB hugepages support.

Building and Installing:
------------------------

Required: DPDK 16.04
Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
on Debian/Ubuntu)

1. Configure build & install DPDK:
  1. Set `$DPDK_DIR`

     ```
     export DPDK_DIR=/usr/src/dpdk-16.04
     cd $DPDK_DIR
     ```

  2. Then run `make install` to build and install the library.
     For default install without IVSHMEM:

     `make install T=x86_64-native-linuxapp-gcc DESTDIR=install`

     To include IVSHMEM (shared memory):

     `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install`

     For further details refer to http://dpdk.org/

2. Configure & build the Linux kernel:

   Refer to intel-dpdk-getting-started-guide.pdf for understanding
   DPDK kernel requirements.

3. Configure & build OVS:

   * Non-IVSHMEM:

     `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/`

   * IVSHMEM:

     `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/`

   ```
   cd $OVS_DIR/
   ./boot.sh
   ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"]
   make
   ```

   Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress
   DPDK cast-align warnings.

   For better performance one can enable aggressive compiler optimizations
   and use special instructions (popcnt, crc32) that may not be available
   on all machines. Instead of typing `make`, type:

   `make CFLAGS='-O3 -march=native'`

Refer to [INSTALL.userspace.md] for general requirements of building
userspace OVS.

Using the DPDK with ovs-vswitchd:
---------------------------------

1. Setup system boot
   Add the following options to the kernel bootline:

   `default_hugepagesz=1GB hugepagesz=1G hugepages=1`
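
   After a reboot, it is worth confirming that the hugepages were actually
   reserved before going further (a quick sanity check; the exact counts
   depend on your bootline):

   ```
   # Confirm the options made it onto the kernel command line
   cat /proc/cmdline

   # Confirm the kernel reserved the pages (for 1G pages,
   # Hugepagesize reads 1048576 kB)
   grep -i huge /proc/meminfo
   ```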

2. Setup DPDK devices:

   DPDK devices can be set up using either the VFIO (for DPDK 1.7+) or UIO
   modules. UIO requires inserting an out-of-tree driver, igb_uio.ko, that
   is available in DPDK. Setup for both methods is described below.

   * UIO:
     1. Insert uio.ko: `modprobe uio`
     2. Insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko`
     3. Bind network device to igb_uio:
        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`

   * VFIO:

     VFIO needs to be supported in the kernel and the BIOS. More information
     can be found in the [DPDK Linux GSG].

     1. Insert vfio-pci.ko: `modprobe vfio-pci`
     2. Set correct permissions on the vfio device:
        `sudo /usr/bin/chmod a+x /dev/vfio`
        and: `sudo /usr/bin/chmod 0666 /dev/vfio/*`
     3. Bind network device to vfio-pci:
        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`
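
   Whichever module is used, the binding can be confirmed with the status
   option of the same script; the device should now be listed under
   "Network devices using DPDK-compatible driver":

   ```
   # Show which devices are bound to DPDK-compatible vs. kernel drivers
   $DPDK_DIR/tools/dpdk_nic_bind.py --status
   ```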

3. Mount the hugetlbfs filesystem:

   `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`

   Refer to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.
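
   To make the mount persistent across reboots, an entry can be added to
   /etc/fstab (a sketch; adjust the mount point and page size to match
   your setup):

   ```
   # /etc/fstab entry for a 1G-page hugetlbfs mount
   none /dev/hugepages hugetlbfs pagesize=1G 0 0
   ```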

4. Follow the instructions in [INSTALL.md] to install only the
   userspace daemons and utilities (via 'make install').
   1. First time only db creation (or clearing):

      ```
      mkdir -p /usr/local/etc/openvswitch
      mkdir -p /usr/local/var/run/openvswitch
      rm /usr/local/etc/openvswitch/conf.db
      ovsdb-tool create /usr/local/etc/openvswitch/conf.db  \
          /usr/local/share/openvswitch/vswitch.ovsschema
      ```

   2. Start ovsdb-server

      ```
      ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
          --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
          --private-key=db:Open_vSwitch,SSL,private_key \
          --certificate=db:Open_vSwitch,SSL,certificate \
          --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
      ```

   3. First time after db creation, initialize:

      ```
      ovs-vsctl --no-wait init
      ```

5. Start vswitchd:

   DPDK configuration arguments can be passed to vswitchd via the
   Open_vSwitch other_config column. The recognized configuration options
   are listed below. Defaults will be provided for all values not
   explicitly set.

   * dpdk-init
     Specifies whether OVS should initialize and support DPDK ports. This is
     a boolean, and defaults to false.

   * dpdk-lcore-mask
     Specifies the CPU cores on which dpdk lcore threads should be spawned.
     The DPDK lcore threads are used for DPDK library tasks, such as
     library internal message processing, logging, etc. The value should be
     in the form of a hex string (e.g. '0x123'), similar to the 'taskset'
     mask input.
     If not specified, the value will be determined by choosing the lowest
     CPU core from the initial CPU affinity list. Otherwise, the value will
     be passed directly to the DPDK library.
     For performance reasons, it is best to set this to a single core on
     the system, rather than allow lcore threads to float.

   * dpdk-alloc-mem
     This sets the total memory to preallocate from hugepages regardless of
     processor socket. It is recommended to use dpdk-socket-mem instead.

   * dpdk-socket-mem
     Comma-separated list of memory to pre-allocate from hugepages on
     specific sockets.

   * dpdk-hugepage-dir
     Directory where hugetlbfs is mounted

   * dpdk-extra
     Extra arguments to provide to DPDK EAL, as previously specified on the
     command line. Do not pass '--no-huge' to the system in this way. Support
     for running the system without hugepages is nonexistent.

   * cuse-dev-name
     Option to set the vhost_cuse character device name.

   * vhost-sock-dir
     Option to set the path to the vhost_user unix socket files.

   NOTE: Changing any of these options requires restarting the ovs-vswitchd
   application.
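
   For example, a minimal DPDK configuration written before the daemon is
   started might look like the following (the core mask and memory split
   are illustrative values only):

   ```
   # Enable DPDK, pin the lcore thread to core 1, preallocate 1GB on node 0
   ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true \
       other_config:dpdk-lcore-mask=0x2 other_config:dpdk-socket-mem="1024,0"
   ```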

   Open vSwitch can be started as normal. DPDK will be initialized as long
   as the dpdk-init option has been set to 'true'.

   ```
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
   ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
   ```

   If more than one GB of hugepages was allocated (as for IVSHMEM), set the
   amount to use, here taking memory from NUMA node 0 only:

   ```
   ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0"
   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
   ```

6. Add bridge & ports

   To use ovs-vswitchd with DPDK, create a bridge with datapath_type
   "netdev" in the configuration database. For example:

   `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`

   Now you can add dpdk devices. OVS expects DPDK device names to start with
   "dpdk" and end with a port id. vswitchd should print (in the log file)
   the number of dpdk devices found.

   ```
   ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
   ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
   ```

   Once the first DPDK port is added to vswitchd, it creates a polling
   thread and polls the dpdk device in a continuous loop. Therefore CPU
   utilization for that thread is always 100%.

   Note: creating bonds of DPDK interfaces is slightly different from
   creating bonds of system interfaces. For DPDK, the interface type must
   be explicitly set. For example:

   ```
   ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 type=dpdk -- set Interface dpdk1 type=dpdk
   ```

7. Add test flows

   Test flow script across NICs (assuming ovs in /usr/src/ovs):
   Execute script:

   ```
   #! /bin/sh
   # Move to command directory
   cd /usr/src/ovs/utilities/

   # Clear current flows
   ./ovs-ofctl del-flows br0

   # Add flows between port 1 (dpdk0) to port 2 (dpdk1)
   ./ovs-ofctl add-flow br0 in_port=1,action=output:2
   ./ovs-ofctl add-flow br0 in_port=2,action=output:1
   ```
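
   The installed flows can then be verified from the same directory:

   ```
   # List the flows currently installed on the bridge
   ./ovs-ofctl dump-flows br0
   ```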

8. QoS usage example

   Assuming you have a vhost-user port transmitting traffic consisting of
   packets of size 64 bytes, the following command would limit the egress
   transmission rate of the port to ~1,000,000 packets per second:

   `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create qos
   type=egress-policer other-config:cir=46000000 other-config:cbs=2048`

   To examine the QoS configuration of the port:

   `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`

   To clear the QoS configuration from the port and ovsdb, use the following:

   `ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos`

   For more details regarding egress-policer parameters, please refer to
   vswitch.xml.

Performance Tuning:
-------------------

1. PMD affinitization

   A poll mode driver (pmd) thread handles the I/O of all DPDK
   interfaces assigned to it. A pmd thread will busy loop through
   the assigned port/rxq's polling for packets, switch the packets
   and send to a tx port if required. Typically, it is found that
   a pmd thread is CPU bound, meaning that the greater the CPU
   occupancy the pmd thread can get, the better the performance. To
   that end, it is good practice to ensure that a pmd thread has as
   many cycles on a core available to it as possible. This can be
   achieved by affinitizing the pmd thread with a core that has no
   other workload. See section 7 below for a description of how to
   isolate cores for this purpose also.

   The following command can be used to specify the affinity of the
   pmd thread(s).

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`

   By setting a bit in the mask, a pmd thread is created and pinned
   to the corresponding CPU core. e.g. to run a pmd thread on core 1:

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2`

   For more information, please refer to the Open_vSwitch TABLE section in

   `man ovs-vswitchd.conf.db`

   Note that a pmd thread on a NUMA node is only created if there is
   at least one DPDK interface from that NUMA node added to OVS.

2. Multiple poll mode driver threads

   With pmd multi-threading support, OVS creates one pmd thread
   for each NUMA node by default. However, in cases where there are
   multiple ports/rxq's producing traffic, performance can be improved
   by creating multiple pmd threads running on separate cores. These
   pmd threads can then share the workload by each being responsible
   for different ports/rxq's. Assignment of ports/rxq's to pmd threads
   is done automatically.

   The following command can be used to specify the affinity of the
   pmd threads.

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`

   A set bit in the mask means a pmd thread is created and pinned
   to the corresponding CPU core. e.g. to run pmd threads on cores 1 and 2:

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`

   For more information, please refer to the Open_vSwitch TABLE section in

   `man ovs-vswitchd.conf.db`

   For example, when using dpdk and dpdkvhostuser ports in a bi-directional
   VM loopback as shown below, spreading the workload over 2 or 4 pmd
   threads shows significant improvements as there will be more total CPU
   occupancy available.

   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1

   The following command can be used to confirm that the port/rxq assignment
   to pmd threads is as required:

   `ovs-appctl dpif-netdev/pmd-rxq-show`

   This can also be checked with:

   ```
   top -H
   taskset -p <pid_of_pmd>
   ```

   To understand where most of the pmd thread time is spent and whether the
   caches are being utilized, these commands can be used:

   ```
   # Clear previous stats
   ovs-appctl dpif-netdev/pmd-stats-clear

   # Check current stats
   ovs-appctl dpif-netdev/pmd-stats-show
   ```

3. DPDK port Rx Queues

   `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`

   The command above sets the number of rx queues for the DPDK interface.
   The rx queues are assigned to pmd threads on the same NUMA node in a
   round-robin fashion. For more information, please refer to the
   Open_vSwitch TABLE section in

   `man ovs-vswitchd.conf.db`
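
   As a concrete illustration (the port name and queue count are examples
   only), two rx queues could be requested on dpdk0 and the resulting
   distribution checked with the command from section 2:

   ```
   # Request two rx queues on dpdk0
   ovs-vsctl set Interface dpdk0 options:n_rxq=2

   # Confirm how the queues were assigned to pmd threads
   ovs-appctl dpif-netdev/pmd-rxq-show
   ```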

4. Exact Match Cache

   Each pmd thread contains one EMC. After initial flow setup in the
   datapath, the EMC contains a single table and provides the lowest level
   (fastest) switching for DPDK ports. If there is a miss in the EMC, then
   the next level where switching will occur is the datapath classifier.
   Missing in the EMC and looking up in the datapath classifier incurs a
   significant performance penalty. If lookup misses occur in the EMC
   because it is too small to handle the number of flows, its size can
   be increased. The EMC size can be modified by editing the define
   EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c.

   As mentioned above, an EMC is per pmd thread. So an alternative way of
   increasing the aggregate amount of possible flow entries in the EMC and
   avoiding datapath classifier lookups is to have multiple pmd threads
   running. This can be done as described in section 2.
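
   A sketch of that workflow (check the actual define in your tree before
   editing; increasing the shift by one doubles the number of EMC entries):

   ```
   # Locate the current EMC size define
   grep -n 'EM_FLOW_HASH_SHIFT' lib/dpif-netdev.c

   # After editing the define to a larger shift, rebuild and reinstall OVS
   make && make install
   ```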

5. Compiler options

   The default compiler optimization level is '-O2'. Changing this to
   more aggressive compiler optimizations such as '-O3' or
   '-Ofast -march=native' with gcc can produce performance gains.
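
   For example, following the same pattern as the build step earlier:

   ```
   # Rebuild OVS with more aggressive optimization flags
   make CFLAGS='-Ofast -march=native'
   ```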

6. Simultaneous Multithreading (SMT)

   With SMT enabled, one physical core appears as two logical cores
   which can improve performance.

   SMT can be utilized to add additional pmd threads without consuming
   additional physical cores. Additional pmd threads may be added in the
   same manner as described in section 2. If trying to minimize the use
   of physical cores for pmd threads, care must be taken to set the
   correct bits in the pmd-cpu-mask to ensure that the pmd threads are
   pinned to SMT siblings.

   For example, when using 2x 10-core processors in a dual-socket system
   with HT enabled, /proc/cpuinfo will report 40 logical cores. To use
   two logical cores which share the same physical core for pmd threads,
   the following command can be used to identify a pair of logical cores:

   `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`

   where N is the logical core number. In this example, it would show that
   cores 1 and 21 share the same physical core. The pmd-cpu-mask to enable
   two pmd threads running on these two logical cores (one physical core)
   is:

   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=100002`
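
   To list the sibling pairs for every core at once, a small convenience
   loop such as this can be used (a sketch, not required by OVS):

   ```
   # Print each logical core alongside its SMT siblings
   for c in /sys/devices/system/cpu/cpu[0-9]*; do
       echo "$c: $(cat $c/topology/thread_siblings_list)"
   done
   ```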

   Note that SMT is enabled by the Hyper-Threading section in the
   BIOS, and as such will apply to the whole system. So the impact of
   enabling/disabling it for the whole system should be considered,
   e.g. if workloads on the system can scale across multiple cores,
   SMT may be very beneficial. However, if they do not and perform best
   on a single physical core, SMT may not be beneficial.

7. The isolcpus kernel boot parameter

   isolcpus can be used on the kernel bootline to isolate cores from the
   kernel scheduler and hence dedicate them to OVS or other packet
   forwarding related workloads. For example, a Linux kernel boot-line
   could be:

   `GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=4 default_hugepagesz=1G intel_iommu=off isolcpus=1-19"`
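
   After editing the bootline (typically in /etc/default/grub), the
   bootloader configuration must be regenerated and the host rebooted for
   the change to take effect; the exact command varies by distribution
   (the form below is the Debian/Ubuntu one):

   ```
   # Regenerate grub.cfg so the new bootline is used on next boot
   sudo update-grub
   sudo reboot
   ```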

8. NUMA/Cluster On Die

   Ideally inter-NUMA datapaths should be avoided where possible as packets
   will go across QPI and there may be a slight performance penalty when
   compared with intra-NUMA datapaths. On Intel Xeon Processor E5 v3,
   Cluster On Die is introduced on models that have 10 cores or more.
   This makes it possible to logically split a socket into two NUMA regions,
   and again it is preferred where possible to keep critical datapaths
   within the one cluster.

   It is good practice to ensure that threads that are in the datapath are
   pinned to cores in the same NUMA area, e.g. pmd threads and QEMU vCPUs
   responsible for forwarding.

9. Rx Mergeable buffers

   Rx mergeable buffers is a virtio feature that allows chaining of multiple
   virtio descriptors to handle large packet sizes. As such, large packets
   are handled by reserving and chaining multiple free descriptors
   together. Mergeable buffer support is negotiated between the virtio
   driver and virtio device and is supported by the DPDK vhost library.
   This behavior is typically supported and enabled by default, however
   in the case where the user knows that rx mergeable buffers are not
   needed, i.e. jumbo frames are not needed, it can be forced off by adding
   mrg_rxbuf=off to the QEMU command line options. By not reserving multiple
   chains of descriptors, more individual virtio descriptors are made
   available for rx to the guest using dpdkvhost ports, and this can
   improve performance.

10. Packet processing in the guest

    It is good practice, whether simply forwarding packets from one
    interface to another or doing more complex packet processing in the
    guest, to ensure that the thread performing this work has as much CPU
    occupancy as possible. For example, when the DPDK sample application
    `testpmd` is used to forward packets in the guest, multiple QEMU vCPU
    threads can be created. Taskset can then be used to affinitize the
    vCPU thread responsible for forwarding to a dedicated core not used
    for other general processing on the host system.

11. DPDK virtio pmd in the guest

    dpdkvhostcuse or dpdkvhostuser ports can be used to accelerate the path
    to the guest using the DPDK vhost library. This library is compatible
    with virtio-net drivers in the guest, but significantly better
    performance can be observed when using the DPDK virtio pmd driver in
    the guest. The DPDK `testpmd` application can be used in the guest as an
    example application that forwards packets from one DPDK vhost port to
    another. An example of running `testpmd` in the guest can be seen here:

    `./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i --txqflags=0xf00 --disable-hw-vlan --forward-mode=io --auto-start`

    See below for information on dpdkvhostcuse and dpdkvhostuser ports.
    See [DPDK Docs] for more information on `testpmd`.

DPDK Rings:
------------

Following the steps above to create a bridge, you can now add dpdk rings
as a port to the vswitch. OVS will expect the DPDK ring device name to
start with dpdkr and end with a portid.

`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr`

DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings. This is from the base dpdk directory and modified to work
with the ring naming used within ovs.

Location: tests/ovs_client

To run the client:

```
cd /usr/src/ovs/tests/
ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
```

In the case of the dpdkr example above, the "port id you gave dpdkr" is 0.

It is essential to have `--proc-type=secondary`.

The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit ring of
the ethernet ring. It is a trivial loopback application.

DPDK rings in VM (IVSHMEM shared memory communications)
--------------------------------------------------------

In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu. You can download the
patch and a getting started guide at:

https://01.org/packet-processing/downloads

A general rule of thumb for better performance is that the client
application should not be assigned the same dpdk core mask "-c" as
the vswitchd.

DPDK vhost:
-----------

DPDK 16.04 supports two types of vhost:

1. vhost-user
2. vhost-cuse

Whatever type of vhost is enabled in the DPDK build specified is the type
that will be enabled in OVS. By default, vhost-user is enabled in DPDK.
Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports
will be enabled in OVS.
Please note that support for vhost-cuse is intended to be deprecated in OVS
in a future release.

DPDK vhost-user:
----------------

The following sections describe the use of vhost-user 'dpdkvhostuser' ports
with OVS.

DPDK vhost-user Prerequisites:
------------------------------

1. DPDK 16.04 with vhost support enabled as documented in the "Building and
   Installing" section

2. QEMU version v2.1.0+

   QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if
   providing your VM with memory greater than 1GB due to potential issues
   with memory mapping larger areas.

Adding DPDK vhost-user ports to the Switch:
-------------------------------------------

Following the steps above to create a bridge, you can now add DPDK
vhost-user as a port to the vswitch. Unlike DPDK ring ports, DPDK
vhost-user ports can have arbitrary names, except that forward and backward
slashes are prohibited in the names.

- For vhost-user, the name of the port type is `dpdkvhostuser`

  ```
  ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
      type=dpdkvhostuser
  ```

  This action creates a socket located at
  `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
  to your VM on the QEMU command line. More instructions on this can be
  found in the next section "DPDK vhost-user VM configuration".
- If you wish for the vhost-user sockets to be created in a sub-directory of
  `/usr/local/var/run/openvswitch`, you may specify this directory in the
  ovsdb like so:

  `./utilities/ovs-vsctl --no-wait \
    set Open_vSwitch . other_config:vhost-sock-dir=subdir`

DPDK vhost-user VM configuration:
---------------------------------

Follow the steps below to attach vhost-user port(s) to a VM.

1. Configure sockets.
   Pass the following parameters to QEMU to attach a vhost-user device:

   ```
   -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
   -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
   -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
   ```

   ...where vhost-user-1 is the name of the vhost-user port added
   to the switch.
   Repeat the above parameters for multiple devices, changing the
   chardev path and id as necessary. Note that a separate and different
   chardev path needs to be specified for each vhost-user device. For
   example, if you have a second vhost-user port named 'vhost-user-2',
   append your QEMU command line with an additional set of parameters:

   ```
   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
   ```

2. Configure huge pages.
   QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
   a virtio-net device's virtual rings and packet buffers mapping the VM's
   physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
   memory into their process address space, pass the following parameters
   to QEMU:

   ```
   -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
   share=on
   -numa node,memdev=mem -mem-prealloc
   ```

3. Optional: Enable multiqueue support
   The vhost-user interface must be configured in Open vSwitch with the
   desired number of queues with:

   ```
   ovs-vsctl set Interface vhost-user-2 options:n_rxq=<requested queues>
   ```

   QEMU needs to be configured as well.
   The $q below should match the queues requested in OVS (if $q is more,
   packets will not be received).
   The $v is the number of vectors, which is '$q x 2 + 2'.

   ```
   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
   ```

   If one wishes to use multiple queues for an interface in the guest, the
   driver in the guest operating system must be configured to do so. It is
   recommended that the number of queues configured be equal to '$q'.

   For example, this can be done for the Linux kernel virtio-net driver
   with:

   ```
   ethtool -L <DEV> combined <$q>
   ```

   A note on the command above:

   `-L`: Changes the numbers of channels of the specified network device

   `combined`: Changes the number of multi-purpose channels.
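
   To inspect the current and maximum channel counts before and after the
   change, the query form of the same option can be used:

   ```
   # Show current and maximum channel counts for the device
   ethtool -l <DEV>
   ```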

DPDK vhost-cuse:
----------------

The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports
with OVS.

DPDK vhost-cuse Prerequisites:
------------------------------

1. DPDK 16.04 with vhost support enabled as documented in the "Building and
   Installing" section.
   As an additional step, you must enable vhost-cuse in DPDK by setting the
   following additional flag in `config/common_base`:

   `CONFIG_RTE_LIBRTE_VHOST_USER=n`

   Following this, rebuild DPDK as per the instructions in the "Building and
   Installing" section. Finally, rebuild OVS as per step 3 in the "Building
   and Installing" section - OVS will detect that DPDK has vhost-cuse
   libraries compiled and in turn will enable support for it in the switch
   and disable vhost-user support.

2. Insert the Cuse module:

   `modprobe cuse`

3. Build and insert the `eventfd_link` module:

   ```
   cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
   make
   insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
   ```

4. QEMU version v2.1.0+

   vhost-cuse will work with QEMU v2.1.0 and above, however it is
   recommended to use v2.2.0 if providing your VM with memory greater than
   1GB due to potential issues with memory mapping larger areas.
   Note: QEMU v1.6.2 will also work, with slightly different command line
   parameters, which are specified later in this document.
Adding DPDK vhost-cuse ports to the Switch:
-------------------------------------------

Following the steps above to create a bridge, you can now add DPDK
vhost-cuse as a port to the vswitch. Unlike DPDK ring ports, DPDK
vhost-cuse ports can have arbitrary names.

- For vhost-cuse, the name of the port type is `dpdkvhostcuse`

  ```
  ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
      type=dpdkvhostcuse
  ```

  When attaching vhost-cuse ports to QEMU, the name provided during the
  add-port operation must match the ifname parameter on the QEMU command
  line. More instructions on this can be found in the next section.

DPDK vhost-cuse VM configuration:
---------------------------------

vhost-cuse ports use a Linux* character device to communicate with QEMU.
By default it is set to `/dev/vhost-net`. It is possible to reuse this
standard device for DPDK vhost, which makes setup a little simpler, but it
is better practice to specify an alternative character device in order to
avoid any conflicts if kernel vhost is to be used in parallel.

1. This step is only needed if using an alternative character device.

   The new character device filename must be specified in the ovsdb:

   `./utilities/ovs-vsctl --no-wait set Open_vSwitch . \
     other_config:cuse-dev-name=my-vhost-net`

   In the example above, the character device to be used will be
   `/dev/my-vhost-net`.

2. This step is only needed if reusing the standard character device. It
   will conflict with the kernel vhost character device, so the user must
   first remove it.

   `rm -rf /dev/vhost-net`
3a. Configure virtio-net adaptors:
    The following parameters must be passed to the QEMU binary:

    ```
    -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
    -device virtio-net-pci,netdev=net1,mac=<mac>
    ```

    Repeat the above parameters for multiple devices.

    The DPDK vhost library will negotiate its own features, so they
    need not be passed in as command line params. Note that as offloads are
    disabled this is the equivalent of setting:

    `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`

3b. If using an alternative character device, it must also be explicitly
    passed to QEMU using the `vhostfd` argument:

    ```
    -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
    vhostfd=<open_fd>
    -device virtio-net-pci,netdev=net1,mac=<mac>
    ```

    The open file descriptor must be passed to QEMU running as a child
    process. This could be done with a simple python script:

    ```
    #!/usr/bin/python
    import os
    import subprocess

    # Open the vhost-cuse character device; the fd is inherited by QEMU
    fd = os.open("/dev/usvhost", os.O_RDWR)
    subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,"
                    "vhost=on,vhostfd=" + str(fd) + " ...", shell=True)
    ```

    Alternatively the `qemu-wrap.py` script can be used to automate the
    requirements specified above and can be used in conjunction with libvirt
    if desired. See the "DPDK vhost VM configuration with QEMU wrapper"
    section below.
4. Configure huge pages:
   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
   virtio-net device's virtual rings and packet buffers mapping the VM's
   physical memory on hugetlbfs. To enable vhost ports to map the VM's
   memory into their process address space, pass the following parameters
   to QEMU:

   `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
   share=on -numa node,memdev=mem -mem-prealloc`

   Note: For use with an earlier QEMU version such as v1.6.2, use the
   following to configure hugepages instead:

   `-mem-path /dev/hugepages -mem-prealloc`

DPDK vhost-cuse VM configuration with QEMU wrapper:
---------------------------------------------------

The QEMU wrapper script automatically detects and calls QEMU with the
necessary parameters. It performs the following actions:

* Automatically detects the location of the hugetlbfs and inserts this
  into the command line parameters.
* Automatically opens file descriptors for each virtio-net device and
  inserts these into the command line parameters.
* Calls QEMU passing both the command line parameters passed to the
  script itself and those it has auto-detected.

Before use, you **must** edit the configuration parameters section of the
script to point to the correct emulator location and set additional
settings. Of these settings, `emul_path` and `us_vhost_path` **must** be
set. All other settings are optional.

To use directly from the command line, simply pass the wrapper some of the
QEMU parameters: it will configure the rest. For example:

```
qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
--enable-kvm -nographic -vnc none -net none -netdev tap,id=net1,
script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci,
netdev=net1,mac=00:00:00:00:00:01
```

DPDK vhost-cuse VM configuration with libvirt:
----------------------------------------------

If you are using libvirt, you must enable libvirt to access the character
device by adding it to the controllers cgroup for libvirtd using the
following steps.

1. In `/etc/libvirt/qemu.conf` add/edit the following lines:

   ```
   1) clear_emulator_capabilities = 0
   2) user = "root"
   3) group = "root"
   4) cgroup_device_acl = [
          "/dev/null", "/dev/full", "/dev/zero",
          "/dev/random", "/dev/urandom",
          "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
          "/dev/rtc", "/dev/hpet", "/dev/net/tun",
          "/dev/<my-vhost-device>",
          "/dev/hugepages"]
   ```

   <my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-net`
   device. If you have specified a different name in the database
   using the "other_config:cuse-dev-name" parameter, please specify that
   filename instead.

2. Disable SELinux or set to permissive mode

3. Restart the libvirtd process
   For example, on Fedora:

   `systemctl restart libvirtd.service`

After successfully editing the configuration, you may launch your
vhost-enabled VM. The XML describing the VM can be configured like so
within the <qemu:commandline> section:

1. Set up shared hugepages:

   ```
   <qemu:arg value='-object'/>
   <qemu:arg value='memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on'/>
   <qemu:arg value='-numa'/>
   <qemu:arg value='node,memdev=mem'/>
   <qemu:arg value='-mem-prealloc'/>
   ```

2. Set up your tap devices:

   ```
   <qemu:arg value='-netdev'/>
   <qemu:arg value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on'/>
   <qemu:arg value='-device'/>
   <qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/>
   ```

   Repeat for as many devices as are desired, modifying the id, ifname
   and mac as necessary.

   Again, if you are using an alternative character device (other than
   `/dev/vhost-net`), please specify the file descriptor like so:

   `<qemu:arg value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on,vhostfd=<open_fd>'/>`

   Where <open_fd> refers to the open file descriptor of the character
   device. Instructions on how to retrieve the file descriptor can be found
   in the "DPDK vhost-cuse VM configuration" section.
   Alternatively, the process is automated with the qemu-wrap.py script,
   detailed in the next section.

Now you may launch your VM using virt-manager, or like so:

`virsh create my_vhost_vm.xml`

DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
---------------------------------------------------------------

To use the qemu-wrapper script in conjunction with libvirt, follow the
steps in the previous section before proceeding with the following steps:

1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH).
   Ideally in the same directory that the QEMU binary is located.

2. Ensure that the script has the same owner/group and file permissions
   as the QEMU binary.

3. Update the VM xml file using "virsh edit VM.xml"

   1. Set the VM to use the launch script.
      Set the emulator path contained in the `<emulator></emulator>` tags.
      For example, replace:

      `<emulator>/usr/bin/qemu-kvm</emulator>`

      with:

      `<emulator>/usr/bin/qemu-wrap.py</emulator>`

4. Edit the Configuration Parameters section of the script to point to
   the correct emulator location and set any additional options. If you are
   using an alternative character device name, please set "us_vhost_path"
   to the location of that device. The script will automatically detect and
   insert the correct "vhostfd" value in the QEMU command line arguments.

5. Use virt-manager to launch the VM

Running ovs-vswitchd with DPDK backend inside a VM
--------------------------------------------------

Please note that additional configuration is required if you want to run
ovs-vswitchd with DPDK backend inside a QEMU virtual machine. Ovs-vswitchd
creates separate DPDK TX queues for each CPU core available. This operation
fails inside a QEMU virtual machine because, by default, the VirtIO NIC
provided to the guest is configured to support only single TX queue and
single RX queue. To change this behavior, you need to turn on the 'mq'
(multiqueue) property of all virtio-net-pci devices emulated by QEMU and
used by DPDK. You may do it manually (by changing the QEMU command line)
or, if you use Libvirt, by adding the following string:

`<driver name='vhost' queues='N'/>`

to <interface> sections of all network devices used by DPDK. Parameter 'N'
determines how many queues can be used by the guest.

Restrictions:
-------------

- Only works with 1500 MTU; a few changes are needed in the DPDK lib to
  fix this issue.
- Currently the DPDK port does not make use of any offload functionality.
- DPDK-vHost support works with 1G huge pages.

ivshmem:
- If you run Open vSwitch with smaller page sizes (e.g. 2MB), you may be
  unable to share any rings or mempools with a virtual machine.
  This is because the current implementation of ivshmem works by sharing
  a single 1GB huge page from the host operating system to any guest
  operating system through the Qemu ivshmem device. When using smaller
  page sizes, multiple pages may be required to hold the ring descriptors
  and buffer pools. The Qemu ivshmem device does not allow you to share
  multiple file descriptors to the guest operating system. However, if you
  want to share dpdkr rings with other processes on the host, you can do
  this with smaller page sizes.

Platform and Network Interface:
- By default with DPDK 16.04, a maximum of 64 TX queues can be used with an
  Intel XL710 Network Interface on a platform with more than 64 logical
  cores. If a user attempts to add an XL710 interface as a DPDK port type to
  a system as described above, an error will be reported that initialization
  failed for the 65th queue. OVS will then roll back to the previous
  successful queue initialization and use that value as the total number of
  TX queues available with queue locking. If a user wishes to use more than
  64 queues and avoid locking, then the
  `CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF` config parameter in DPDK must be
  increased to the desired number of queues. Both DPDK and OVS must be
  recompiled for this change to take effect.

Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.

[INSTALL.userspace.md]: INSTALL.userspace.md
[INSTALL.md]: INSTALL.md
[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
[DPDK Docs]: http://dpdk.org/doc