..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

============================
Using Open vSwitch with DPDK
============================

This document describes how to use Open vSwitch with the DPDK datapath.

.. important::

    Using the DPDK datapath requires building OVS with DPDK support. Refer to
    :doc:`/intro/install/dpdk` for more information.

Ports and Bridges
-----------------

ovs-vsctl can be used to set up bridges and other Open vSwitch features.
Bridges should be created with a ``datapath_type=netdev``::

    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

ovs-vsctl can also be used to add DPDK devices. ovs-vswitchd should print the
number of dpdk devices found in the log file::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0
    $ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
        options:dpdk-devargs=0000:01:00.1

After the DPDK ports are added to the switch, a polling thread continuously
polls the DPDK devices and consumes 100% of its core, as can be checked with
the ``top`` and ``ps`` commands::

    $ top -H
    $ ps -eLo pid,psr,comm | grep pmd

Creating bonds of DPDK interfaces is slightly different from creating bonds of
system interfaces. For DPDK, the interface type and devargs must be explicitly
set. For example::

    $ ovs-vsctl add-bond br0 dpdkbond p0 p1 \
        -- set Interface p0 type=dpdk options:dpdk-devargs=0000:01:00.0 \
        -- set Interface p1 type=dpdk options:dpdk-devargs=0000:01:00.1

To stop ovs-vswitchd and delete the bridge, run::

    $ ovs-appctl -t ovs-vswitchd exit
    $ ovs-appctl -t ovsdb-server exit
    $ ovs-vsctl del-br br0

PMD Thread Statistics
---------------------

To show current stats::

    $ ovs-appctl dpif-netdev/pmd-stats-show

To clear previous stats::

    $ ovs-appctl dpif-netdev/pmd-stats-clear

Port/RXQ Assignment to PMD Threads
----------------------------------

To show port/rxq assignment::

    $ ovs-appctl dpif-netdev/pmd-rxq-show

To change the default rxq assignment to pmd threads, rxqs may be manually
pinned to desired cores using::

    $ ovs-vsctl set Interface <iface> \
        other_config:pmd-rxq-affinity=<rxq-affinity-list>

where:

- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values

For example::

    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
        other_config:pmd-rxq-affinity="0:3,1:7,3:8"

This will ensure that:

- Queue #0 is pinned to core 3
- Queue #1 is pinned to core 7
- Queue #2 is not pinned
- Queue #3 is pinned to core 8

After that, PMD threads on cores where RX queues are pinned will become
``isolated``. This means that these threads will poll only pinned RX queues.

.. warning::
    If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues
    will not be polled. Also, if the provided ``core_id`` is not available
    (e.g., if that ``core_id`` is not in ``pmd-cpu-mask``), the RX queue will
    not be polled by any PMD thread.

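After pinning, the resulting distribution can be re-checked with the same
command shown above; depending on the OVS version, the output may also flag
which PMD threads have become isolated::

    $ ovs-appctl dpif-netdev/pmd-rxq-show
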
QoS
---

Assuming you have a vhost-user port transmitting traffic consisting of packets
of size 64 bytes, the following command would limit the egress transmission
rate of the port to ~1,000,000 packets per second::

    $ ovs-vsctl set port vhost-user0 qos=@newqos -- \
        --id=@newqos create qos type=egress-policer other-config:cir=46000000 \
        other-config:cbs=2048

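As a rough sanity check of the figure above (assuming the policer accounts for
each 64 B frame minus the 14 B Ethernet header and 4 B CRC, i.e. 46 B, matching
the 18 B of L2 overhead noted in the Jumbo Frames section below)::

    1,000,000 pkt/s x 46 B/pkt = 46,000,000 B/s  (other-config:cir)
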
To examine the QoS configuration of the port, run::

    $ ovs-appctl -t ovs-vswitchd qos/show vhost-user0

To clear the QoS configuration from the port and ovsdb, run::

    $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos

Refer to vswitch.xml for more details on egress-policer.

Rate Limiting
-------------

Here is an example of ingress policing usage. Assuming you have a vhost-user
port receiving traffic consisting of packets of size 64 bytes, the following
command would limit the reception rate of the port to ~1,000,000 packets per
second::

    $ ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000 \
        ingress_policing_burst=1000

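The rate here is expressed in kilobits per second; assuming the same 46 B
accounted per 64 B frame as in the QoS example above, the value works out as::

    46,000,000 B/s x 8 = 368,000,000 bit/s = 368,000 kbit/s  (ingress_policing_rate)
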
To examine the ingress policer configuration of the port::

    $ ovs-vsctl list interface vhost-user0

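Since ``list`` prints the entire Interface record, the output can be narrowed
to the policer fields, for example (a convenience only)::

    $ ovs-vsctl list interface vhost-user0 | grep ingress_policing
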
To clear the ingress policer configuration from the port::

    $ ovs-vsctl set interface vhost-user0 ingress_policing_rate=0

Refer to vswitch.xml for more details on ingress-policer.

Flow Control
------------

Flow control can be enabled only on DPDK physical ports. To enable flow
control support on the tx side while adding a port, run::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 options:tx-flow-ctrl=true

Similarly, to enable rx flow control, run::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 options:rx-flow-ctrl=true

To enable flow control auto-negotiation, run::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 options:flow-ctrl-autoneg=true

To turn on tx flow control at run time for an existing port, run::

    $ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=true

The flow control parameters can be turned off by setting the respective
parameter to ``false``. To disable flow control on the tx side, run::

    $ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=false

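Presumably the rx side can be disabled at run time in the same way, using the
``rx-flow-ctrl`` option documented above::

    $ ovs-vsctl set Interface dpdk-p0 options:rx-flow-ctrl=false
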
pdump
-----

pdump allows you to listen on DPDK ports and view the traffic that is passing
on them. To use this utility, one must have libpcap installed on the system.
Furthermore, DPDK must be built with ``CONFIG_RTE_LIBRTE_PDUMP=y`` and
``CONFIG_RTE_LIBRTE_PMD_PCAP=y``.

.. warning::
    A performance decrease is expected when using a monitoring application
    like the DPDK pdump app.

To use pdump, simply launch OVS as usual, then navigate to the ``app/pdump``
directory in DPDK, ``make`` the application and run it like so::

    $ sudo ./build/app/dpdk-pdump -- \
        --pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap \
        --server-socket-path=/usr/local/var/run/openvswitch

The above command captures traffic received on queue 0 of port 0 and stores it
in ``/tmp/pkts.pcap``. Other combinations of port numbers, queue numbers and
pcap locations are of course also available to use. For example, to capture
all packets that traverse port 0 in a single pcap file::

    $ sudo ./build/app/dpdk-pdump -- \
        --pdump 'port=0,queue=*,rx-dev=/tmp/pkts.pcap,tx-dev=/tmp/pkts.pcap' \
        --server-socket-path=/usr/local/var/run/openvswitch

``server-socket-path`` must be set to the value of ``ovs_rundir()``, which
typically resolves to ``/usr/local/var/run/openvswitch``.

Many tools are available to view the contents of the pcap file. One example is
tcpdump. Issue the following command to view the contents of ``pkts.pcap``::

    $ tcpdump -r pkts.pcap

More information on the pdump app and its usage can be found in the `DPDK docs
<http://dpdk.org/doc/guides/tools/pdump.html>`__.

Jumbo Frames
------------

By default, DPDK ports are configured with a standard Ethernet MTU (1500B). To
enable Jumbo Frames support for a DPDK port, change the Interface's
``mtu_request`` attribute to a sufficiently large value. For example, to add a
DPDK Phy port with an MTU of 9000::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 mtu_request=9000

Similarly, to change the MTU of an existing port to 6200::

    $ ovs-vsctl set Interface dpdk-p0 mtu_request=6200

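To confirm the value actually applied, the Interface table's read-only ``mtu``
column can be read back (shown here as a quick check; the column is assumed to
reflect the operative MTU)::

    $ ovs-vsctl get Interface dpdk-p0 mtu
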
Some additional configuration is needed to take advantage of jumbo frames with
vHost ports:

1. *mergeable buffers* must be enabled for vHost ports, as demonstrated in the
   QEMU command line snippet below::

       -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
       -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on

2. Where virtio devices are bound to the Linux kernel driver in a guest
   environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
   MTU of those logical network interfaces must also be increased to a
   sufficiently large value. This avoids segmentation of Jumbo Frames received
   in the guest. Note that 'MTU' refers to the length of the IP packet only,
   and not that of the entire frame.

   To calculate the exact MTU of a standard IPv4 frame, subtract the L2 header
   and CRC lengths (i.e. 18B) from the max supported frame size. So, to set
   the MTU for a 9018B Jumbo Frame::

       $ ip link set eth1 mtu 9000

When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
increased, such that a full Jumbo Frame of a specific size may be accommodated
within a single mbuf segment.

Jumbo frame support has been validated against 9728B frames, which is the
largest frame size supported by the Fortville NIC using the DPDK i40e driver,
but larger frames and other DPDK NIC drivers may be supported. These cases are
common for use cases involving East-West traffic only.

Rx Checksum Offload
-------------------

By default, DPDK physical ports are enabled with Rx checksum offload. Rx
checksum offload can be configured on a DPDK physical port either when adding
it or at run time.

To disable Rx checksum offload when adding a DPDK port dpdk-p0::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 options:rx-checksum-offload=false

Similarly, to disable Rx checksum offload on an existing DPDK port dpdk-p0::

    $ ovs-vsctl set Interface dpdk-p0 options:rx-checksum-offload=false

Rx checksum offload can offer a performance improvement only for tunneling
traffic in OVS-DPDK, because the checksum validation of tunnel packets is
offloaded to the NIC. Enabling Rx checksum offload may also slightly reduce
the performance of non-tunnel traffic, specifically for smaller packet sizes.
DPDK vectorization is disabled when checksum offloading is configured on DPDK
physical ports, which in turn affects non-tunnel traffic performance. It is
therefore advised to turn off Rx checksum offload for non-tunnel traffic use
cases to achieve the best performance.

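Conversely, for tunnel-heavy workloads where the offload was previously
disabled, it should be possible to switch it back on at run time using the
same option::

    $ ovs-vsctl set Interface dpdk-p0 options:rx-checksum-offload=true
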
.. _extended-statistics:

Extended Statistics
-------------------

The DPDK Extended Statistics API allows a PMD to expose a unique set of
statistics. The extended statistics are implemented and supported only for
DPDK physical and vHost ports.

To enable these statistics, you have to enable OpenFlow 1.4 support for OVS.
Configure bridge br0 to support OpenFlow version 1.4::

    $ ovs-vsctl set bridge br0 datapath_type=netdev \
        protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14

Check the OVSDB protocols column in the bridge table to verify that OpenFlow
1.4 support is enabled for OVS::

    $ ovsdb-client dump Bridge protocols

Query the port statistics by explicitly specifying the -O OpenFlow14 option::

    $ ovs-ofctl -O OpenFlow14 dump-ports br0

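To restrict the output to a single port rather than every port on the bridge,
the port name can also be passed as an extra argument (shown here as an
optional convenience)::

    $ ovs-ofctl -O OpenFlow14 dump-ports br0 dpdk-p0
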
Note: vHost ports support only partial statistics: RX packet size based
counters are supported, but TX packet size counters are not.

.. _port-hotplug:

Port Hotplug
------------

OVS supports port hotplugging, allowing the use of ports that were not bound
to DPDK when vswitchd was started.
In order to attach a port, it has to be bound to DPDK using the
``dpdk-devbind.py`` script::

    $ $DPDK_DIR/tools/dpdk-devbind.py --bind=igb_uio 0000:01:00.0

Then it can be attached to OVS::

    $ ovs-vsctl add-port br0 dpdkx -- set Interface dpdkx type=dpdk \
        options:dpdk-devargs=0000:01:00.0

Detaching will be performed while processing the del-port command::

    $ ovs-vsctl del-port dpdkx

This feature is not supported with VFIO and does not work with some NICs.
For more information please refer to the `DPDK Port Hotplug Framework
<http://dpdk.org/doc/guides/prog_guide/port_hotplug_framework.html#hotplug>`__.

.. _vdev-support:

Vdev Support
------------

DPDK provides drivers for both physical and virtual devices. Physical DPDK
devices are added to OVS by specifying a valid PCI address in 'dpdk-devargs'.
Virtual DPDK devices which do not have PCI addresses can be added using a
different format for 'dpdk-devargs'.

Typically, the format expected is 'eth_<driver_name><x>' where 'x' is a
unique identifier of your choice for the given port.

For example, to add a dpdk port that uses the 'null' DPDK PMD driver::

    $ ovs-vsctl add-port br0 null0 -- set Interface null0 type=dpdk \
        options:dpdk-devargs=eth_null0

Similarly, to add a dpdk port that uses the 'af_packet' DPDK PMD driver::

    $ ovs-vsctl add-port br0 myeth0 -- set Interface myeth0 type=dpdk \
        options:dpdk-devargs=eth_af_packet0,iface=eth0

More information on the different types of virtual DPDK PMDs can be found in
the `DPDK documentation
<http://dpdk.org/doc/guides/nics/overview.html>`__.

Note: Not all DPDK virtual PMD drivers have been tested and verified to work.

EMC Insertion Probability
-------------------------

By default, 1 in every 100 flows is inserted into the Exact Match Cache (EMC).
It is possible to change this insertion probability by setting the
``emc-insert-inv-prob`` option::

    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=N

where:

``N``
  is a positive integer representing the inverse probability of insertion,
  i.e. on average 1 in every N packets with a unique flow will generate an
  EMC insertion.

If ``N`` is set to 1, an insertion will be performed for every flow. If set to
0, no insertions will be performed and the EMC will effectively be disabled.

With the default ``N`` of 100, a higher proportion of megaflow hits will
initially be observed in the pmd stats::

    $ ovs-appctl dpif-netdev/pmd-stats-show

For certain traffic profiles with many parallel flows, it's recommended to set
``N`` to '0' to achieve higher forwarding performance.

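For example, to disable EMC insertion entirely for such workloads::

    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=0
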
For more information on the EMC refer to :doc:`/intro/install/dpdk`.

.. _dpdk-ovs-in-guest:

OVS with DPDK Inside VMs
------------------------

Additional configuration is required if you want to run ovs-vswitchd with the
DPDK backend inside a QEMU virtual machine. ovs-vswitchd creates separate DPDK
TX queues for each CPU core available. This operation fails inside a QEMU
virtual machine because, by default, the VirtIO NIC provided to the guest is
configured to support only a single TX queue and a single RX queue. To change
this behavior, you need to turn on the ``mq`` (multiqueue) property of all
``virtio-net-pci`` devices emulated by QEMU and used by DPDK. You may do this
manually (by changing the QEMU command line) or, if you use Libvirt, by adding
the following string to the ``<interface>`` sections of all network devices
used by DPDK::

    <driver name='vhost' queues='N'/>

where:

``N``
  determines how many queues can be used by the guest.

This requires QEMU >= 2.2.

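If QEMU is invoked directly instead of through Libvirt, the equivalent is to
add ``queues=N`` to the ``-netdev`` option and ``mq=on`` (with ``vectors`` set
to ``2N + 2``) to the ``-device virtio-net-pci`` option. A minimal sketch,
assuming a vhost-user backed interface and mirroring the multiqueue invocation
shown in the PHY-VM-PHY (vHost Multiqueue) section below::

    -chardev socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 \
    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce,queues=N \
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=<2N+2>
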
.. _dpdk-phy-phy:

PHY-PHY
-------

Add a userspace bridge and two ``dpdk`` (PHY) ports::

    # Add userspace bridge
    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

    # Add two dpdk ports
    $ ovs-vsctl add-port br0 phy0 -- set Interface phy0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 ofport_request=1

    $ ovs-vsctl add-port br0 phy1 -- set Interface phy1 type=dpdk \
        options:dpdk-devargs=0000:01:00.1 ofport_request=2

Add test flows to forward packets between DPDK port 0 and port 1::

    # Clear current flows
    $ ovs-ofctl del-flows br0

    # Add flows between port 1 (phy0) and port 2 (phy1)
    $ ovs-ofctl add-flow br0 in_port=1,action=output:2
    $ ovs-ofctl add-flow br0 in_port=2,action=output:1

Transmit traffic into either port. You should see it returned via the other.

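One simple way to confirm the loopback is working is to watch the packet and
byte counters on the flows (or ports) increment while traffic is running::

    $ ovs-ofctl dump-flows br0
    $ ovs-ofctl dump-ports br0
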
.. _dpdk-vhost-loopback:

PHY-VM-PHY (vHost Loopback)
---------------------------

Add a userspace bridge, two ``dpdk`` (PHY) ports, and two ``dpdkvhostuser``
ports::

    # Add userspace bridge
    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

    # Add two dpdk ports
    $ ovs-vsctl add-port br0 phy0 -- set Interface phy0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 ofport_request=1

    $ ovs-vsctl add-port br0 phy1 -- set Interface phy1 type=dpdk \
        options:dpdk-devargs=0000:01:00.1 ofport_request=2

    # Add two dpdkvhostuser ports
    $ ovs-vsctl add-port br0 dpdkvhostuser0 \
        -- set Interface dpdkvhostuser0 type=dpdkvhostuser ofport_request=3
    $ ovs-vsctl add-port br0 dpdkvhostuser1 \
        -- set Interface dpdkvhostuser1 type=dpdkvhostuser ofport_request=4

Add test flows to forward packets between DPDK devices and VM ports::

    # Clear current flows
    $ ovs-ofctl del-flows br0

    # Add flows
    $ ovs-ofctl add-flow br0 in_port=1,action=output:3
    $ ovs-ofctl add-flow br0 in_port=3,action=output:1
    $ ovs-ofctl add-flow br0 in_port=4,action=output:2
    $ ovs-ofctl add-flow br0 in_port=2,action=output:4

    # Dump flows
    $ ovs-ofctl dump-flows br0

Create a VM using the following configuration:

+----------------------+--------+-----------------+
| configuration        | values | comments        |
+======================+========+=================+
| qemu version         | 2.2.0  | n/a             |
+----------------------+--------+-----------------+
| qemu thread affinity | core 5 | taskset 0x20    |
+----------------------+--------+-----------------+
| memory               | 4GB    | n/a             |
+----------------------+--------+-----------------+
| cores                | 2      | n/a             |
+----------------------+--------+-----------------+
| Qcow2 image          | CentOS7| n/a             |
+----------------------+--------+-----------------+
| mrg_rxbuf            | off    | n/a             |
+----------------------+--------+-----------------+

You can do this directly with QEMU via the ``qemu-system-x86_64``
application::

    $ export VM_NAME=vhost-vm
    $ export GUEST_MEM=3072M
    $ export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2
    $ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch

    $ taskset 0x20 qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm \
        -m $GUEST_MEM -drive file=$QCOW2_IMAGE --nographic -snapshot \
        -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 \
        -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on \
        -chardev socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 \
        -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
        -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off \
        -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 \
        -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
        -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off

For an explanation of this command, along with alternative approaches such as
booting the VM via libvirt, refer to :doc:`/topics/dpdk/vhost-user`.

Once the guest is configured and booted, configure DPDK packet forwarding
within the guest. To accomplish this, build the ``testpmd`` application as
described in :ref:`dpdk-testpmd`. Once compiled, run the application::

    $ cd $DPDK_DIR/app/test-pmd
    $ ./testpmd -c 0x3 -n 4 --socket-mem 1024 -- \
        --burst=64 -i --txqflags=0xf00 --disable-hw-vlan
    $ set fwd mac retry
    $ start

When you finish testing, bind the vNICs back to the kernel::

    $ $DPDK_DIR/tools/dpdk-devbind.py --bind=virtio-pci 0000:00:03.0
    $ $DPDK_DIR/tools/dpdk-devbind.py --bind=virtio-pci 0000:00:04.0

.. note::

    Valid PCI IDs must be passed in the above example. The PCI IDs can be
    retrieved like so::

        $ $DPDK_DIR/tools/dpdk-devbind.py --status

More information on the dpdkvhostuser ports can be found in
:doc:`/topics/dpdk/vhost-user`.

PHY-VM-PHY (vHost Loopback) (Kernel Forwarding)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:ref:`dpdk-vhost-loopback` details the steps for the PHY-VM-PHY loopback test
case, with packet forwarding done by the DPDK testpmd application in the guest
VM. To do packet forwarding using the kernel stack instead, run the following
commands on the guest::

    $ ip addr add 1.1.1.2/24 dev eth1
    $ ip addr add 1.1.2.2/24 dev eth2
    $ ip link set eth1 up
    $ ip link set eth2 up
    $ systemctl stop firewalld.service
    $ systemctl stop iptables.service
    $ sysctl -w net.ipv4.ip_forward=1
    $ sysctl -w net.ipv4.conf.all.rp_filter=0
    $ sysctl -w net.ipv4.conf.eth1.rp_filter=0
    $ sysctl -w net.ipv4.conf.eth2.rp_filter=0
    $ route add -net 1.1.2.0/24 eth2
    $ route add -net 1.1.1.0/24 eth1
    $ arp -s 1.1.2.99 DE:AD:BE:EF:CA:FE
    $ arp -s 1.1.1.99 DE:AD:BE:EF:CA:EE

PHY-VM-PHY (vHost Multiqueue)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vHost Multiqueue functionality can also be validated using the PHY-VM-PHY
configuration. To begin, follow the steps described in :ref:`dpdk-phy-phy` to
create and initialize the database, start ovs-vswitchd and add ``dpdk``-type
devices to bridge ``br0``. Once complete, follow the steps below:

1. Configure PMD and RXQs.

   For example, set the number of dpdk port rx queues to at least 2. The
   number of rx queues at the vhost-user interface is configured automatically
   after the virtio device connects and does not need manual configuration::

       $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xc
       $ ovs-vsctl set Interface phy0 options:n_rxq=2
       $ ovs-vsctl set Interface phy1 options:n_rxq=2

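   Once the PMDs are running, the resulting distribution of rx queues across
   the two PMD threads can be confirmed with::

       $ ovs-appctl dpif-netdev/pmd-rxq-show
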
2. Instantiate the guest VM using the QEMU command line.

   We must configure with appropriate software versions to ensure this feature
   is supported.

   .. list-table:: Recommended VM Configuration
      :header-rows: 1

      * - Setting
        - Value
      * - QEMU version
        - 2.5.0
      * - QEMU thread affinity
        - 2 cores (taskset 0x30)
      * - Memory
        - 4 GB
      * - Cores
        - 2
      * - Distro
        - Fedora 22
      * - Multiqueue
        - Enabled

   To do this, instantiate the guest as follows::

       $ export VM_NAME=vhost-vm
       $ export GUEST_MEM=4096M
       $ export QCOW2_IMAGE=/root/Fedora22_x86_64.qcow2
       $ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
       $ taskset 0x30 qemu-system-x86_64 -cpu host -smp 2,cores=2 -m 4096M \
           -drive file=$QCOW2_IMAGE --enable-kvm -name $VM_NAME \
           -nographic -numa node,memdev=mem -mem-prealloc \
           -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on \
           -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0 \
           -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 \
           -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6 \
           -chardev socket,id=char2,path=$VHOST_SOCK_DIR/dpdkvhostuser1 \
           -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=2 \
           -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=6

   .. note::
      The queue value above should match the queues configured in OVS. The
      vector value should be set to "number of queues x 2 + 2".

3. Configure the guest interface

   Assuming there are 2 interfaces in the guest named eth0 and eth1, check the
   channel configuration and set the number of combined channels to 2 for
   virtio devices::

       $ ethtool -l eth0
       $ ethtool -L eth0 combined 2
       $ ethtool -L eth1 combined 2

   More information can be found in :doc:`/topics/dpdk/vhost-user`.

4. Configure kernel packet forwarding

   Configure IP and enable interfaces::

       $ ip addr add 5.5.5.1/24 dev eth0
       $ ip addr add 90.90.90.1/24 dev eth1
       $ ip link set eth0 up
       $ ip link set eth1 up

   Configure IP forwarding and add route entries::

       $ sysctl -w net.ipv4.ip_forward=1
       $ sysctl -w net.ipv4.conf.all.rp_filter=0
       $ sysctl -w net.ipv4.conf.eth0.rp_filter=0
       $ sysctl -w net.ipv4.conf.eth1.rp_filter=0
       $ ip route add 2.1.1.0/24 dev eth1
       $ route add default gw 2.1.1.2 eth1
       $ route add default gw 90.90.90.90 eth1
       $ arp -s 90.90.90.90 DE:AD:BE:EF:CA:FE
       $ arp -s 2.1.1.2 DE:AD:BE:EF:CA:FA

   Check traffic on multiple queues::

       $ cat /proc/interrupts | grep virtio