..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      ======= Heading 0 (reserved for the title in a document)
      ------- Heading 1
      ~~~~~~~ Heading 2
      +++++++ Heading 3
      ''''''' Heading 4

      Avoid deeper levels because they do not render well.

============================
Using Open vSwitch with DPDK
============================

This document describes how to use Open vSwitch with the DPDK datapath.

.. important::

   Using the DPDK datapath requires building OVS with DPDK support. Refer to
   :doc:`/intro/install/dpdk` for more information.

Ports and Bridges
-----------------

ovs-vsctl can be used to set up bridges and other Open vSwitch features.
Bridges should be created with ``datapath_type=netdev``::

    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

ovs-vsctl can also be used to add DPDK devices. ovs-vswitchd should print the
number of DPDK devices found in the log file::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
          options:dpdk-devargs=0000:01:00.0
    $ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
          options:dpdk-devargs=0000:01:00.1

After the DPDK ports are added to the switch, a polling thread continuously
polls the DPDK devices and consumes 100% of its core, as can be seen from the
``top`` and ``ps`` commands::

    $ top -H
    $ ps -eLo pid,psr,comm | grep pmd

Creating bonds of DPDK interfaces is slightly different from creating bonds of
system interfaces. For DPDK, the interface type and devargs must be explicitly
set. For example::

    $ ovs-vsctl add-bond br0 dpdkbond p0 p1 \
        -- set Interface p0 type=dpdk options:dpdk-devargs=0000:01:00.0 \
        -- set Interface p1 type=dpdk options:dpdk-devargs=0000:01:00.1
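
If needed, the state of the bond can then be inspected with ``ovs-appctl``,
using the ``dpdkbond`` port name from the example above::

    $ ovs-appctl bond/show dpdkbond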

To stop ovs-vswitchd and delete the bridge, run::

    $ ovs-appctl -t ovs-vswitchd exit
    $ ovs-appctl -t ovsdb-server exit
    $ ovs-vsctl del-br br0

PMD Thread Statistics
---------------------

To show current stats::

    $ ovs-appctl dpif-netdev/pmd-stats-show

To clear previous stats::

    $ ovs-appctl dpif-netdev/pmd-stats-clear

Port/RXQ Assignment to PMD Threads
----------------------------------

To show port/rxq assignment::

    $ ovs-appctl dpif-netdev/pmd-rxq-show

To change the default rxq assignment to pmd threads, rxqs may be manually
pinned to desired cores using::

    $ ovs-vsctl set Interface <iface> \
        other_config:pmd-rxq-affinity=<rxq-affinity-list>

where:

- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values

For example::

    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
        other_config:pmd-rxq-affinity="0:3,1:7,3:8"

This will ensure:

- Queue #0 pinned to core 3
- Queue #1 pinned to core 7
- Queue #2 not pinned
- Queue #3 pinned to core 8

After that, PMD threads on cores where RX queues were pinned will become
``isolated``. This means that these threads will poll only pinned RX queues.

.. warning::
   If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
   not be polled. Also, if the provided ``core_id`` is not available (e.g. the
   ``core_id`` is not in ``pmd-cpu-mask``), the RX queue will not be polled by
   any PMD thread.

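To undo a manual pinning and return the rxqs to the default assignment, the
``pmd-rxq-affinity`` key can be removed again; a minimal example, assuming the
port configured above::

    $ ovs-vsctl remove Interface dpdk-p0 other_config pmd-rxq-affinity
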
QoS
---

Assuming you have a vhost-user port transmitting traffic consisting of packets
of size 64 bytes, the following command would limit the egress transmission
rate of the port to ~1,000,000 packets per second::

    $ ovs-vsctl set port vhost-user0 qos=@newqos -- \
        --id=@newqos create qos type=egress-policer other-config:cir=46000000 \
        other-config:cbs=2048
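
The ``cir`` value above can be derived as follows, assuming the policer
accounts for the packet size less the 14B Ethernet header and 4B CRC::

    1,000,000 packets/sec x (64 - 18) bytes/packet = 46,000,000 bytes/sec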

To examine the QoS configuration of the port, run::

    $ ovs-appctl -t ovs-vswitchd qos/show vhost-user0

To clear the QoS configuration from the port and ovsdb, run::

    $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos

Refer to vswitch.xml for more details on egress-policer.

Rate Limiting
-------------

Here is an example of ingress policing usage. Assuming you have a vhost-user
port receiving traffic consisting of packets of size 64 bytes, the following
command would limit the reception rate of the port to ~1,000,000 packets per
second::

    $ ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000 \
        ingress_policing_burst=1000
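
``ingress_policing_rate`` is expressed in kbits per second; the value above
can be derived in the same way as the ``cir`` value, assuming the same 46B
accounted per 64B packet::

    1,000,000 packets/sec x 46 bytes x 8 bits/byte / 1000 = 368,000 kbits/sec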

To examine the ingress policer configuration of the port::

    $ ovs-vsctl list interface vhost-user0

To clear the ingress policer configuration from the port::

    $ ovs-vsctl set interface vhost-user0 ingress_policing_rate=0

Refer to vswitch.xml for more details on ingress-policer.

Flow Control
------------

Flow control can be enabled only on DPDK physical ports. To enable flow
control support on the tx side while adding a port, run::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 options:tx-flow-ctrl=true

Similarly, to enable rx flow control, run::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 options:rx-flow-ctrl=true

To enable flow control auto-negotiation, run::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 options:flow-ctrl-autoneg=true

To turn on tx flow control at run time for an existing port, run::

    $ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=true

A flow control parameter can be turned off by setting it to ``false``. To
disable flow control on the tx side, run::

    $ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=false

pdump
-----

pdump allows you to listen on DPDK ports and view the traffic that is passing
on them. To use this utility, one must have libpcap installed on the system.
Furthermore, DPDK must be built with ``CONFIG_RTE_LIBRTE_PDUMP=y`` and
``CONFIG_RTE_LIBRTE_PMD_PCAP=y``.

.. warning::
   A performance decrease is expected when using a monitoring application like
   the DPDK pdump app.

To use pdump, simply launch OVS as usual, then navigate to the ``app/pdump``
directory in DPDK, ``make`` the application and run it like so::

    $ sudo ./build/app/dpdk-pdump -- \
        --pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap \
        --server-socket-path=/usr/local/var/run/openvswitch

The above command captures traffic received on queue 0 of port 0 and stores it
in ``/tmp/pkts.pcap``. Other combinations of port numbers, queue numbers and
pcap locations are of course also available to use. For example, to capture
all packets that traverse port 0 in a single pcap file::

    $ sudo ./build/app/dpdk-pdump -- \
        --pdump 'port=0,queue=*,rx-dev=/tmp/pkts.pcap,tx-dev=/tmp/pkts.pcap' \
        --server-socket-path=/usr/local/var/run/openvswitch

``server-socket-path`` must be set to the value of ``ovs_rundir()``, which
typically resolves to ``/usr/local/var/run/openvswitch``.

Many tools are available to view the contents of the pcap file. One example is
tcpdump. Issue the following command to view the contents of ``pkts.pcap``::

    $ tcpdump -r pkts.pcap

More information on the pdump app and its usage can be found in the `DPDK docs
<http://dpdk.org/doc/guides/tools/pdump.html>`__.

Jumbo Frames
------------

By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
enable Jumbo Frames support for a DPDK port, change the Interface's
``mtu_request`` attribute to a sufficiently large value. For example, to add a
DPDK phy port with an MTU of 9000::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
          options:dpdk-devargs=0000:01:00.0 mtu_request=9000

Similarly, to change the MTU of an existing port to 6200::

    $ ovs-vsctl set Interface dpdk-p0 mtu_request=6200

Some additional configuration is needed to take advantage of jumbo frames with
vHost ports:

1. *mergeable buffers* must be enabled for vHost ports, as demonstrated in the
   QEMU command line snippet below::

       -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
       -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on

2. Where virtio devices are bound to the Linux kernel driver in a guest
   environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
   MTU of those logical network interfaces must also be increased to a
   sufficiently large value. This avoids segmentation of Jumbo Frames received
   in the guest. Note that 'MTU' refers to the length of the IP packet only,
   and not that of the entire frame.

   To calculate the exact MTU of a standard IPv4 frame, subtract the L2 header
   and CRC lengths (i.e. 18B) from the max supported frame size. So, to set
   the MTU for a 9018B Jumbo Frame::

       $ ifconfig eth1 mtu 9000

When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
increased, such that a full Jumbo Frame of a specific size may be accommodated
within a single mbuf segment.

Jumbo frame support has been validated against 9728B frames, which is the
largest frame size supported by the Fortville NIC using the DPDK i40e driver,
but larger frames and other DPDK NIC drivers may be supported. These cases are
common for use cases involving East-West traffic only.

Rx Checksum Offload
-------------------

By default, DPDK physical ports are enabled with Rx checksum offload. Rx
checksum offload can be configured on a DPDK physical port either when adding
the port or at run time.

To disable Rx checksum offload when adding a DPDK port dpdk-p0::

    $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
        options:dpdk-devargs=0000:01:00.0 options:rx-checksum-offload=false

Similarly, to disable Rx checksum offloading on an existing DPDK port
dpdk-p0::

    $ ovs-vsctl set Interface dpdk-p0 options:rx-checksum-offload=false

Rx checksum offload can offer a performance improvement only for tunneling
traffic in OVS-DPDK, because the checksum validation of tunnel packets is
offloaded to the NIC. Enabling Rx checksum offload may also slightly reduce
the performance of non-tunnel traffic, specifically for smaller packets. DPDK
vectorization is disabled when checksum offloading is configured on DPDK
physical ports, which in turn affects non-tunnel traffic performance. It is
therefore advised to turn off Rx checksum offload for non-tunnel traffic use
cases to achieve the best performance.
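
To turn the offload back on for such a port later, set the option back to
``true``; for example::

    $ ovs-vsctl set Interface dpdk-p0 options:rx-checksum-offload=true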

.. _extended-statistics:

Extended Statistics
-------------------

The DPDK Extended Statistics API allows a PMD to expose a unique set of
statistics. The extended statistics are implemented and supported only for
DPDK physical and vHost ports.

To enable these statistics, you have to enable OpenFlow 1.4 support for OVS.
To do so, configure bridge br0 to support OpenFlow version 1.4::

    $ ovs-vsctl set bridge br0 datapath_type=netdev \
        protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14

To check whether OpenFlow 1.4 support is enabled for OVS, inspect the
``protocols`` column in the ``Bridge`` table::

    $ ovsdb-client dump Bridge protocols

Query the port statistics by explicitly specifying the ``-O OpenFlow14``
option::

    $ ovs-ofctl -O OpenFlow14 dump-ports br0

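To restrict the output to a single port, the port may be named explicitly
(using the ``dpdk-p0`` port from the earlier examples)::

    $ ovs-ofctl -O OpenFlow14 dump-ports br0 dpdk-p0
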
Note: vHost ports support only partial statistics: RX packet size-based
counters are supported, but TX packet size counters are not included.

.. _port-hotplug:

Port Hotplug
------------

OVS supports port hotplugging, allowing the use of ports that were not bound
to DPDK when vswitchd was started. In order to attach a port, it has to be
bound to DPDK using the ``dpdk_nic_bind.py`` script::

    $ $DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio 0000:01:00.0

Then it can be attached to OVS::

    $ ovs-vsctl add-port br0 dpdkx -- set Interface dpdkx type=dpdk \
        options:dpdk-devargs=0000:01:00.0

It is also possible to detach a port from OVS. First remove the port using the
``del-port`` command, then detach it using::

    $ ovs-appctl netdev-dpdk/detach 0000:01:00.0

This feature is not supported with VFIO and does not work with some NICs. For
more information please refer to the `DPDK Port Hotplug Framework
<http://dpdk.org/doc/guides/prog_guide/port_hotplug_framework.html#hotplug>`__.

.. _vdev-support:

Vdev Support
------------

DPDK provides drivers for both physical and virtual devices. Physical DPDK
devices are added to OVS by specifying a valid PCI address in
``dpdk-devargs``. Virtual DPDK devices which do not have PCI addresses can be
added using a different format for ``dpdk-devargs``.

Typically, the format expected is ``eth_<driver_name><x>``, where ``x`` is a
number between 0 and RTE_MAX_ETHPORTS - 1 (31).

For example, to add a dpdk port that uses the ``null`` DPDK PMD driver::

    $ ovs-vsctl add-port br0 null0 -- set Interface null0 type=dpdk \
        options:dpdk-devargs=eth_null0

Similarly, to add a dpdk port that uses the ``af_packet`` DPDK PMD driver::

    $ ovs-vsctl add-port br0 myeth0 -- set Interface myeth0 type=dpdk \
        options:dpdk-devargs=eth_af_packet0,iface=eth0

More information on the different types of virtual DPDK PMDs can be found in
the `DPDK documentation <http://dpdk.org/doc/guides/nics/overview.html>`__.

Note: Not all DPDK virtual PMD drivers have been tested and verified to work.

EMC Insertion Probability
-------------------------

By default, 1 in every 100 flows is inserted into the Exact Match Cache (EMC).
It is possible to change this insertion probability by setting the
``emc-insert-inv-prob`` option::

    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=N

where:

``N``
  is a positive integer representing the inverse probability of insertion,
  i.e. on average 1 in every N packets with a unique flow will generate an EMC
  insertion.

If ``N`` is set to 1, an insertion will be performed for every flow. If set to
0, no insertions will be performed and the EMC will effectively be disabled.
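
For example, to insert every flow into the EMC, or to disable the EMC
entirely::

    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=1
    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=0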

For more information on the EMC refer to :doc:`/intro/install/dpdk`.

.. _dpdk-ovs-in-guest:

OVS with DPDK Inside VMs
------------------------

Additional configuration is required if you want to run ovs-vswitchd with the
DPDK backend inside a QEMU virtual machine. ovs-vswitchd creates separate DPDK
TX queues for each CPU core available. This operation fails inside a QEMU
virtual machine because, by default, the VirtIO NIC provided to the guest is
configured to support only a single TX queue and a single RX queue. To change
this behavior, you need to turn on the ``mq`` (multiqueue) property of all
``virtio-net-pci`` devices emulated by QEMU and used by DPDK. You may do this
manually (by changing the QEMU command line) or, if you use Libvirt, by adding
the following string to the ``<interface>`` sections of all network devices
used by DPDK::

    <driver name='vhost' queues='N'/>

where:

``N``
  determines how many queues can be used by the guest.

This requires QEMU >= 2.2.
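
If configuring QEMU manually instead, the equivalent is to enable ``mq`` on
each ``virtio-net-pci`` device. A minimal sketch with two queues, assuming a
vhost-backed tap device (the netdev id and ifname here are illustrative)::

    -netdev tap,id=mynet1,ifname=tap0,script=no,vhost=on,queues=2 \
    -device virtio-net-pci,netdev=mynet1,mq=on,vectors=6

The ``vectors`` value follows the "number of queues x 2 + 2" rule described in
the multiqueue example below.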

.. _dpdk-phy-phy:

PHY-PHY
-------

Add a userspace bridge and two ``dpdk`` (PHY) ports::

    # Add userspace bridge
    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

    # Add two dpdk ports
    $ ovs-vsctl add-port br0 phy0 -- set Interface phy0 type=dpdk \
          options:dpdk-devargs=0000:01:00.0 ofport_request=1

    $ ovs-vsctl add-port br0 phy1 -- set Interface phy1 type=dpdk \
          options:dpdk-devargs=0000:01:00.1 ofport_request=2

Add test flows to forward packets between DPDK port 0 and port 1::

    # Clear current flows
    $ ovs-ofctl del-flows br0

    # Add flows between port 1 (phy0) to port 2 (phy1)
    $ ovs-ofctl add-flow br0 in_port=1,action=output:2
    $ ovs-ofctl add-flow br0 in_port=2,action=output:1

Transmit traffic into either port. You should see it returned via the other.

.. _dpdk-vhost-loopback:

PHY-VM-PHY (vHost Loopback)
---------------------------

Add a userspace bridge, two ``dpdk`` (PHY) ports, and two ``dpdkvhostuser``
ports::

    # Add userspace bridge
    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

    # Add two dpdk ports
    $ ovs-vsctl add-port br0 phy0 -- set Interface phy0 type=dpdk \
          options:dpdk-devargs=0000:01:00.0 ofport_request=1

    $ ovs-vsctl add-port br0 phy1 -- set Interface phy1 type=dpdk \
          options:dpdk-devargs=0000:01:00.1 ofport_request=2

    # Add two dpdkvhostuser ports
    $ ovs-vsctl add-port br0 dpdkvhostuser0 \
        -- set Interface dpdkvhostuser0 type=dpdkvhostuser ofport_request=3
    $ ovs-vsctl add-port br0 dpdkvhostuser1 \
        -- set Interface dpdkvhostuser1 type=dpdkvhostuser ofport_request=4

Add test flows to forward packets between DPDK devices and VM ports::

    # Clear current flows
    $ ovs-ofctl del-flows br0

    # Add flows
    $ ovs-ofctl add-flow br0 in_port=1,action=output:3
    $ ovs-ofctl add-flow br0 in_port=3,action=output:1
    $ ovs-ofctl add-flow br0 in_port=4,action=output:2
    $ ovs-ofctl add-flow br0 in_port=2,action=output:4

    # Dump flows
    $ ovs-ofctl dump-flows br0

Create a VM using the following configuration:

+----------------------+--------+-----------------+
| configuration        | values | comments        |
+======================+========+=================+
| qemu version         | 2.2.0  | n/a             |
+----------------------+--------+-----------------+
| qemu thread affinity | core 5 | taskset 0x20    |
+----------------------+--------+-----------------+
| memory               | 4GB    | n/a             |
+----------------------+--------+-----------------+
| cores                | 2      | n/a             |
+----------------------+--------+-----------------+
| Qcow2 image          | CentOS7| n/a             |
+----------------------+--------+-----------------+
| mrg_rxbuf            | off    | n/a             |
+----------------------+--------+-----------------+

You can do this directly with QEMU via the ``qemu-system-x86_64``
application::

    $ export VM_NAME=vhost-vm
    $ export GUEST_MEM=3072M
    $ export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2
    $ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch

    $ taskset 0x20 qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm \
        -m $GUEST_MEM -drive file=$QCOW2_IMAGE --nographic -snapshot \
        -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 \
        -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on \
        -chardev socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 \
        -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
        -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off \
        -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 \
        -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
        -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off

For an explanation of this command, along with alternative approaches such as
booting the VM via libvirt, refer to :doc:`/topics/dpdk/vhost-user`.

Once the guest is configured and booted, configure DPDK packet forwarding
within the guest. To accomplish this, build the ``testpmd`` application as
described in :ref:`dpdk-testpmd`. Once compiled, run the application::

    $ cd $DPDK_DIR/app/test-pmd
    $ ./testpmd -c 0x3 -n 4 --socket-mem 1024 -- \
        --burst=64 -i --txqflags=0xf00 --disable-hw-vlan
    $ set fwd mac retry
    $ start

When you finish testing, bind the vNICs back to the kernel::

    $ $DPDK_DIR/tools/dpdk-devbind.py --bind=virtio-pci 0000:00:03.0
    $ $DPDK_DIR/tools/dpdk-devbind.py --bind=virtio-pci 0000:00:04.0

.. note::

   Valid PCI IDs must be passed in the above example. The PCI IDs can be
   retrieved like so::

       $ $DPDK_DIR/tools/dpdk-devbind.py --status

More information on the dpdkvhostuser ports can be found in
:doc:`/topics/dpdk/vhost-user`.

PHY-VM-PHY (vHost Loopback) (Kernel Forwarding)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:ref:`dpdk-vhost-loopback` details the steps for the PHY-VM-PHY loopback test
case, with packet forwarding done by the DPDK testpmd application in the guest
VM. To do packet forwarding using the kernel stack instead, run the following
commands in the guest::

    $ ifconfig eth1 1.1.1.2/24
    $ ifconfig eth2 1.1.2.2/24
    $ systemctl stop firewalld.service
    $ systemctl stop iptables.service
    $ sysctl -w net.ipv4.ip_forward=1
    $ sysctl -w net.ipv4.conf.all.rp_filter=0
    $ sysctl -w net.ipv4.conf.eth1.rp_filter=0
    $ sysctl -w net.ipv4.conf.eth2.rp_filter=0
    $ route add -net 1.1.2.0/24 eth2
    $ route add -net 1.1.1.0/24 eth1
    $ arp -s 1.1.2.99 DE:AD:BE:EF:CA:FE
    $ arp -s 1.1.1.99 DE:AD:BE:EF:CA:EE

PHY-VM-PHY (vHost Multiqueue)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vHost Multiqueue functionality can also be validated using the PHY-VM-PHY
configuration. To begin, follow the steps described in :ref:`dpdk-phy-phy` to
create and initialize the database, start ovs-vswitchd and add ``dpdk``-type
devices to bridge ``br0``. Once complete, follow the below steps:

1. Configure PMD and RXQs.

   For example, set the number of dpdk port rx queues to at least 2. The
   number of rx queues at the vhost-user interface gets automatically
   configured after virtio device connection and doesn't need manual
   configuration::

       $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xc
       $ ovs-vsctl set Interface phy0 options:n_rxq=2
       $ ovs-vsctl set Interface phy1 options:n_rxq=2

2. Instantiate the guest VM using the QEMU cmdline.

   The guest must be configured with appropriate software versions to ensure
   this feature is supported.

   .. list-table:: Recommended VM settings
      :header-rows: 1

      * - Setting
        - Value
      * - QEMU version
        - 2.5.0
      * - QEMU thread affinity
        - 2 cores (taskset 0x30)
      * - Memory
        - 4 GB
      * - Cores
        - 2
      * - Distro
        - Fedora 22
      * - Multiqueue
        - Enabled

   To do this, instantiate the guest as follows::

       $ export VM_NAME=vhost-vm
       $ export GUEST_MEM=4096M
       $ export QCOW2_IMAGE=/root/Fedora22_x86_64.qcow2
       $ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
       $ taskset 0x30 qemu-system-x86_64 -cpu host -smp 2,cores=2 -m 4096M \
           -drive file=$QCOW2_IMAGE --enable-kvm -name $VM_NAME \
           -nographic -numa node,memdev=mem -mem-prealloc \
           -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on \
           -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0 \
           -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 \
           -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6 \
           -chardev socket,id=char2,path=$VHOST_SOCK_DIR/dpdkvhostuser1 \
           -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=2 \
           -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=6

   .. note::
      The queue value above should match the number of queues configured in
      OVS, while the vector value should be set to "number of queues x 2 + 2".

3. Configure the guest interfaces.

   Assuming there are 2 interfaces in the guest named eth0 and eth1, check the
   channel configuration and set the number of combined channels to 2 for the
   virtio devices::

       $ ethtool -l eth0
       $ ethtool -L eth0 combined 2
       $ ethtool -L eth1 combined 2

   More information can be found in the vHost walkthrough section.

4. Configure kernel packet forwarding.

   Configure IP addresses and enable the interfaces::

       $ ifconfig eth0 5.5.5.1/24 up
       $ ifconfig eth1 90.90.90.1/24 up

   Configure IP forwarding and add route entries::

       $ sysctl -w net.ipv4.ip_forward=1
       $ sysctl -w net.ipv4.conf.all.rp_filter=0
       $ sysctl -w net.ipv4.conf.eth0.rp_filter=0
       $ sysctl -w net.ipv4.conf.eth1.rp_filter=0
       $ ip route add 2.1.1.0/24 dev eth1
       $ route add default gw 2.1.1.2 eth1
       $ route add default gw 90.90.90.90 eth1
       $ arp -s 90.90.90.90 DE:AD:BE:EF:CA:FE
       $ arp -s 2.1.1.2 DE:AD:BE:EF:CA:FA

   Check traffic on multiple queues::

       $ cat /proc/interrupts | grep virtio