..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2015 Intel Corporation.

Poll Mode Driver for Emulated Virtio NIC
========================================

Virtio is a para-virtualization framework initiated by IBM and supported by the KVM hypervisor.
In the Data Plane Development Kit (DPDK),
we provide a virtio Poll Mode Driver (PMD) as a software solution, compared to the SR-IOV hardware solution,
for fast guest VM to guest VM communication and guest VM to host communication.

Vhost is a kernel acceleration module for the virtio qemu back end.
The DPDK extends kni to support a vhost raw socket interface,
which enables vhost to directly read/write packets from/to a physical port.
With this enhancement, virtio can achieve quite promising performance.

For basic qemu-KVM installation and the use of other Intel EM poll mode drivers in the guest VM,
please refer to the chapter "Driver for VM Emulated Devices".

In this chapter, we will demonstrate the usage of the virtio PMD with two back ends:
the standard qemu vhost back end and the vhost kni back end.

Virtio Implementation in DPDK
-----------------------------

For details about the virtio spec, refer to the Virtio PCI Card Specification written by Rusty Russell.

As a PMD, virtio provides packet reception and transmission callbacks, virtio_recv_pkts and virtio_xmit_pkts.

In virtio_recv_pkts, the indexes in the range [vq->vq_used_cons_idx, vq->vq_ring.used->idx) of the vring are available for virtio to burst out.

In virtio_xmit_pkts, the same index range in the vring is available for virtio to clean.
Virtio will enqueue the packets to be transmitted into the vring, advance vq->vq_ring.avail->idx,
and then notify the host back end if necessary.

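The index range above wraps modulo 2^16, since vring indexes are 16-bit free-running counters. A minimal sketch of how the number of used descriptors available to burst out could be computed (the helper name is hypothetical, not the driver's actual code):

```c
#include <stdint.h>

/* Hypothetical sketch: the number of used descriptors the driver may
 * consume is the distance between the device's free-running used index
 * and the driver's consumed index. uint16_t subtraction handles the
 * 2^16 wrap-around automatically. */
static uint16_t
nb_used_descs(uint16_t used_idx, uint16_t used_cons_idx)
{
        return (uint16_t)(used_idx - used_cons_idx);
}
```

The same arithmetic applies on the transmit side when computing how many sent descriptors are ready to be cleaned.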
Features and Limitations of virtio PMD
--------------------------------------

In this release, the virtio PMD provides the basic functionality of packet reception and transmission.

* It supports mergeable buffers per packet when receiving packets and scattered buffers per packet
  when transmitting packets. The supported packet size is from 64 to 1518 bytes.

* It supports multicast packets and promiscuous mode.

* The descriptor number for the Rx/Tx queue is hard-coded to be 256 by qemu 2.7 and below.
  If given a different descriptor number by the upper application,
  the virtio PMD generates a warning and falls back to the hard-coded value.
  The Rx queue size is configurable, up to 1024, since qemu 2.8 and above; it is 256
  by default. The Tx queue size is still hard-coded to be 256.

* Features of mac/vlan filter are supported; negotiation with the vhost back end is needed to support them.
  When the back end cannot support the vlan filter, the virtio app in the guest should not enable the vlan filter,
  in order to make sure the virtio port is configured correctly, e.g. do not specify '--enable-hw-vlan' on the
  testpmd command line.

* ``RTE_PKTMBUF_HEADROOM`` should be defined
  no less than ``sizeof(struct virtio_net_hdr_mrg_rxbuf)``, which is 12 bytes, when mergeable or
  ``VIRTIO_F_VERSION_1`` is set, and
  no less than ``sizeof(struct virtio_net_hdr)``, which is 10 bytes, when using non-mergeable buffers.

* Virtio does not support runtime configuration.

* Virtio supports Link State interrupt.

* Virtio supports Rx interrupt (so far, only 1:1 queue/interrupt mapping is supported).

* Virtio supports software vlan stripping and insertion.

* Virtio supports using port IO to get PCI resources when the uio/igb_uio module is not available.

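The headroom requirement above can be expressed as a compile-time check. The struct layouts below follow the virtio spec and are reproduced here only for illustration; the real definitions and the actual ``RTE_PKTMBUF_HEADROOM`` value (128 by default) live in the DPDK headers:

```c
#include <stdint.h>

/* Header layouts per the virtio spec (illustrative copies). */
struct virtio_net_hdr {            /* 10 bytes */
        uint8_t  flags;
        uint8_t  gso_type;
        uint16_t hdr_len;
        uint16_t gso_size;
        uint16_t csum_start;
        uint16_t csum_offset;
};

struct virtio_net_hdr_mrg_rxbuf {  /* 12 bytes */
        struct virtio_net_hdr hdr;
        uint16_t num_buffers;
};

/* Hypothetical stand-in for the DPDK configuration value. */
#define RTE_PKTMBUF_HEADROOM 128

/* The larger (mergeable) header must fit in the headroom. */
_Static_assert(RTE_PKTMBUF_HEADROOM >= sizeof(struct virtio_net_hdr_mrg_rxbuf),
               "headroom too small for mergeable virtio-net header");
```
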
Prerequisites
-------------

The following prerequisites apply:

* In the BIOS, turn VT-x and VT-d on.

* Linux kernel with the KVM module; vhost module loaded and ioeventfd supported.
  The qemu standard back end without vhost support isn't tested, and probably isn't supported.

Virtio with kni vhost Back End
------------------------------

This section demonstrates a kni vhost back end example setup for Phy-VM communication.

.. _figure_host_vm_comms:

.. figure:: img/host_vm_comms.*

   Host2VM Communication Example Using kni vhost Back End


Host2VM communication example

#. Load the kni kernel module:

   .. code-block:: console

      insmod rte_kni.ko

   Other basic DPDK preparations such as hugepage enabling and uio port binding are not listed here.
   Please refer to the *DPDK Getting Started Guide* for detailed instructions.

#. Launch the kni user application:

   .. code-block:: console

      examples/kni/build/app/kni -l 0-3 -n 4 -- -p 0x1 -P --config="(0,1,3)"

   This command generates one network device vEth0 for the physical port.
   If more physical ports are specified, the generated network devices will be vEth1, vEth2, and so on.

   For each physical port, kni creates two user threads.
   One thread loops to fetch packets from the physical NIC port into the kni receive queue.
   The other user thread loops to send packets from the kni transmit queue.

   For each physical port, kni also creates a kernel thread that retrieves packets from the kni receive queue,
   places them onto kni's raw socket's queue, and wakes up the vhost kernel thread to exchange packets with the virtio virt queue.

   For more details about kni, please refer to :ref:`kni`.

#. Enable the kni raw socket functionality for the specified physical NIC port,
   get the generated file descriptor, and set it in the qemu command line parameters.
   Always remember to set ioeventfd_on and vhost_on.

   Example:

   .. code-block:: console

      echo 1 > /sys/class/net/vEth0/sock_en
      fd=`cat /sys/class/net/vEth0/sock_fd`
      exec qemu-system-x86_64 -enable-kvm -cpu host \
      -m 2048 -smp 4 -name dpdk-test1-vm1 \
      -drive file=/data/DPDKVMS/dpdk-vm.img \
      -netdev tap,fd=$fd,id=mynet_kni,script=no,vhost=on \
      -device virtio-net-pci,netdev=mynet_kni,bus=pci.0,addr=0x3,ioeventfd=on \
      -vnc :1 -daemonize

   In the above example, virtio port 0 in the guest VM will be associated with vEth0, which in turn corresponds to a physical port,
   which means received packets come from vEth0 and transmitted packets are sent to vEth0.

#. In the guest, bind the virtio device to the uio_pci_generic kernel module and start the forwarding application.
   When the virtio port in the guest bursts Rx, it is getting packets from the
   raw socket's receive queue.
   When the virtio port bursts Tx, it is sending packets to the tx_q.

   .. code-block:: console

      modprobe uio
      echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
      modprobe uio_pci_generic
      python usertools/dpdk-devbind.py -b uio_pci_generic 00:03.0

   We use testpmd as the forwarding application in this example.

   .. figure:: img/console.*

      Running testpmd

#. Use the IXIA packet generator to inject a packet stream into the KNI physical port.

   The packet reception and transmission flow path is:

   IXIA packet generator->82599 PF->KNI Rx queue->KNI raw socket queue->Guest
   VM virtio port 0 Rx burst->Guest VM virtio port 0 Tx burst->KNI Tx queue
   ->82599 PF->IXIA packet generator

Virtio with qemu virtio Back End
--------------------------------

.. _figure_host_vm_comms_qemu:

.. figure:: img/host_vm_comms_qemu.*

   Host2VM Communication Example Using qemu vhost Back End


.. code-block:: console

   qemu-system-x86_64 -enable-kvm -cpu host -m 2048 -smp 2 \
   -mem-path /dev/hugepages -mem-prealloc \
   -drive file=/data/DPDKVMS/dpdk-vm1 \
   -netdev tap,id=vm1_p1,ifname=tap0,script=no,vhost=on \
   -device virtio-net-pci,netdev=vm1_p1,bus=pci.0,addr=0x3,ioeventfd=on \
   -device pci-assign,host=04:10.1

In this example, the packet reception flow path is:

   IXIA packet generator->82599 PF->Linux Bridge->TAP0's socket queue->Guest
   VM virtio port 0 Rx burst->Guest VM 82599 VF port1 Tx burst->IXIA packet
   generator

The packet transmission flow is:

   IXIA packet generator->Guest VM 82599 VF port1 Rx burst->Guest VM virtio
   port 0 Tx burst->tap->Linux Bridge->82599 PF->IXIA packet generator


Virtio PMD Rx/Tx Callbacks
--------------------------

The virtio driver has 4 Rx callbacks and 3 Tx callbacks.

Rx callbacks:

#. ``virtio_recv_pkts``:
   Regular version without mergeable Rx buffer support.

#. ``virtio_recv_mergeable_pkts``:
   Regular version with mergeable Rx buffer support.

#. ``virtio_recv_pkts_vec``:
   Vector version without mergeable Rx buffer support; it also fixes the available
   ring indexes and uses vector instructions to optimize performance.

#. ``virtio_recv_mergeable_pkts_inorder``:
   In-order version with mergeable Rx buffer support.

Tx callbacks:

#. ``virtio_xmit_pkts``:
   Regular version.

#. ``virtio_xmit_pkts_simple``:
   Vector version that fixes the available ring indexes to optimize performance.

#. ``virtio_xmit_pkts_inorder``:
   In-order version.

By default, the non-vector callbacks are used:

* For Rx: if mergeable Rx buffers is disabled, then ``virtio_recv_pkts`` is
  used; otherwise ``virtio_recv_mergeable_pkts``.

* For Tx: ``virtio_xmit_pkts``.

Vector callbacks will be used when:

* ``txmode.offloads`` is set to ``0x0``, which implies:

  * Single segment is specified.

  * No offload support is needed.

* Mergeable Rx buffers is disabled.

The corresponding callbacks are:

* For Rx: ``virtio_recv_pkts_vec``.

* For Tx: ``virtio_xmit_pkts_simple``.


Example of using the vector version of the virtio poll mode driver in
``testpmd``::

   testpmd -l 0-2 -n 4 -- -i --tx-offloads=0x0 --rxq=1 --txq=1 --nb-cores=1

In-order callbacks only work on a simulated virtio user vdev.

* For Rx: if mergeable Rx buffers is enabled and in-order is enabled, then
  ``virtio_recv_mergeable_pkts_inorder`` is used.

* For Tx: if in-order is enabled, then ``virtio_xmit_pkts_inorder`` is used.

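The Rx callback selection rules above can be sketched as a decision function. This is a simplification with hypothetical flag names; the real driver derives these conditions from negotiated features and configured offloads:

```c
/* Hypothetical simplification of the Rx callback choice described above:
 * in-order (with mergeable buffers) wins first, then the vector path
 * (which requires mergeable buffers to be disabled), then the regular
 * mergeable and non-mergeable versions. */
static const char *
select_rx_callback(int mergeable, int vector_ok, int in_order)
{
        if (in_order && mergeable)
                return "virtio_recv_mergeable_pkts_inorder";
        if (vector_ok && !mergeable)
                return "virtio_recv_pkts_vec";
        if (mergeable)
                return "virtio_recv_mergeable_pkts";
        return "virtio_recv_pkts";
}
```
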
Interrupt mode
--------------

.. _virtio_interrupt_mode:

There are three kinds of interrupts from a virtio device over the PCI bus: config
interrupt, Rx interrupts, and Tx interrupts. Config interrupt is used for
notification of device configuration changes, especially link status (lsc).
Interrupt mode is translated into Rx interrupts in the context of DPDK.

.. Note::

   Virtio PMD already has support for receiving lsc from qemu when the link
   status changes, especially when vhost user disconnects. However, it fails
   to do that if the VM is created by qemu 2.6.2 or below, since the
   capability to detect vhost user disconnection was introduced in qemu 2.7.0.

Prerequisites for Rx interrupts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To support Rx interrupts:

#. Check if the guest kernel supports VFIO-NOIOMMU:

   Linux has supported VFIO-NOIOMMU since 4.8.0. Make sure the guest
   kernel is compiled with:

   .. code-block:: console

      CONFIG_VFIO_NOIOMMU=y

#. Properly set msix vectors when starting the VM:

   Enable multi-queue when starting the VM, and specify msix vectors on the qemu
   cmdline. For N queue pairs, (N+1) is the minimum, and (2N+2) is mostly recommended.

   .. code-block:: console

      $(QEMU) ... -device virtio-net-pci,mq=on,vectors=2N+2 ...

#. In the VM, insert the vfio module in NOIOMMU mode:

   .. code-block:: console

      modprobe vfio enable_unsafe_noiommu_mode=1
      modprobe vfio-pci

#. In the VM, bind the virtio device with vfio-pci:

   .. code-block:: console

      python usertools/dpdk-devbind.py -b vfio-pci 00:03.0

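The msix vector arithmetic in step 2 can be sketched as follows; the helper names are hypothetical, and the exact per-vector accounting is an illustration of the stated (N+1)/(2N+2) guidance rather than qemu's internal rule:

```c
/* Hypothetical helpers mirroring the guidance above: for N queue pairs,
 * (N+1) vectors is the minimum and (2N+2) is mostly recommended. */
static int min_msix_vectors(int n_queue_pairs)
{
        return n_queue_pairs + 1;
}

static int recommended_msix_vectors(int n_queue_pairs)
{
        return 2 * n_queue_pairs + 2;
}
```

For example, a VM with 4 queue pairs would be started with at least ``vectors=5``, and preferably ``vectors=10``.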
Example
~~~~~~~

Here we use l3fwd-power as an example to show how to get started.

Example:

.. code-block:: console

   $ l3fwd-power -l 0-1 -- -p 1 -P --config="(0,0,1)" \
                           --no-numa --parse-ptype


Virtio PMD arguments
--------------------

The user can specify the arguments below in devargs.

#. ``vdpa``:

   A virtio device can also be driven by a vDPA (vhost data path acceleration)
   driver, and work as a HW vhost back end. This argument is used to specify
   that a virtio device needs to work in vDPA mode.
   (Default: 0 (disabled))

#. ``mrg_rxbuf``:

   It is used to enable the virtio device mergeable Rx buffer feature.
   (Default: 1 (enabled))

#. ``in_order``:

   It is used to enable the virtio device in-order feature.
   (Default: 1 (enabled))