Commit | Line | Data |
---|---|---|
2a26ed8e MCC |
1 | .. include:: <isonum.txt> |
2 | ||
3 | ===================== | |
4 | VFIO Mediated devices | |
5 | ===================== | |
6 | ||
7 | :Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved. | |
8 | :Author: Neo Jia <cjia@nvidia.com> | |
9 | :Author: Kirti Wankhede <kwankhede@nvidia.com> | |
10 | ||
11 | This program is free software; you can redistribute it and/or modify | |
12 | it under the terms of the GNU General Public License version 2 as | |
13 | published by the Free Software Foundation. | |
14 | ||
8e1c5a40 KW |
15 | |
16 | Virtual Function I/O (VFIO) Mediated devices[1] | |
17 | =============================================== | |
18 | ||
19 | The number of use cases for virtualizing DMA devices that do not have built-in | |
20 | SR-IOV capability is increasing. Previously, to virtualize such devices, | |
21 | developers had to create their own management interfaces and APIs, and then | |
22 | integrate them with user space software. To simplify integration with user space | |
23 | software, we have identified common requirements and a unified management | |
24 | interface for such devices. | |
25 | ||
26 | The VFIO driver framework provides unified APIs for direct device access. It is | |
27 | an IOMMU/device-agnostic framework for exposing direct device access to user | |
28 | space in a secure, IOMMU-protected environment. This framework is used for | |
29 | multiple devices, such as GPUs, network adapters, and compute accelerators. With | |
30 | direct device access, virtual machines or user space applications have direct | |
31 | access to the physical device. This framework is reused for mediated devices. | |
32 | ||
33 | The mediated core driver provides a common interface for mediated device | |
34 | management that can be used by drivers of different devices. This module | |
35 | provides a generic interface to perform these operations: | |
36 | ||
37 | * Create and destroy a mediated device | |
38 | * Add a mediated device to and remove it from a mediated bus driver | |
39 | * Add a mediated device to and remove it from an IOMMU group | |
40 | ||
41 | The mediated core driver also provides an interface to register a bus driver. | |
42 | For example, the mediated VFIO mdev driver is designed for mediated devices and | |
43 | supports VFIO APIs. The mediated bus driver adds a mediated device to and | |
44 | removes it from a VFIO group. | |
45 | ||
46 | The following high-level block diagram shows the main components and interfaces | |
47 | in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM | |
2a26ed8e | 48 | devices as examples, as these devices are the first devices to use this module:: |
8e1c5a40 KW |
49 | |
50 | +---------------+ | |
51 | | | | |
52 | | +-----------+ | mdev_register_driver() +--------------+ | |
53 | | | | +<------------------------+ | | |
54 | | | mdev | | | | | |
55 | | | bus | +------------------------>+ vfio_mdev.ko |<-> VFIO user | |
56 | | | driver | | probe()/remove() | | APIs | |
57 | | | | | +--------------+ | |
58 | | +-----------+ | | |
59 | | | | |
60 | | MDEV CORE | | |
61 | | MODULE | | |
62 | | mdev.ko | | |
63 | | +-----------+ | mdev_register_device() +--------------+ | |
64 | | | | +<------------------------+ | | |
65 | | | | | | nvidia.ko |<-> physical | |
66 | | | | +------------------------>+ | device | |
67 | | | | | callbacks +--------------+ | |
68 | | | Physical | | | |
69 | | | device | | mdev_register_device() +--------------+ | |
70 | | | interface | |<------------------------+ | | |
71 | | | | | | i915.ko |<-> physical | |
72 | | | | +------------------------>+ | device | |
73 | | | | | callbacks +--------------+ | |
74 | | | | | | |
75 | | | | | mdev_register_device() +--------------+ | |
76 | | | | +<------------------------+ | | |
77 | | | | | | ccw_device.ko|<-> physical | |
78 | | | | +------------------------>+ | device | |
79 | | | | | callbacks +--------------+ | |
80 | | +-----------+ | | |
81 | +---------------+ | |
82 | ||
83 | ||
84 | Registration Interfaces | |
85 | ======================= | |
86 | ||
87 | The mediated core driver provides the following types of registration | |
88 | interfaces: | |
89 | ||
90 | * Registration interface for a mediated bus driver | |
91 | * Physical device driver interface | |
92 | ||
93 | Registration Interface for a Mediated Bus Driver | |
94 | ------------------------------------------------ | |
95 | ||
96 | The registration interface for a mediated bus driver provides the following | |
2a26ed8e | 97 | structure to represent a mediated device's driver:: |
8e1c5a40 KW |
98 | |
99 | /* | |
100 | * struct mdev_driver [2] - Mediated device's driver | |
101 | * @name: driver name | |
102 | * @probe: called when new device created | |
103 | * @remove: called when device removed | |
104 | * @driver: device driver structure | |
105 | */ | |
106 | struct mdev_driver { | |
107 | const char *name; | |
108 | int (*probe) (struct device *dev); | |
109 | void (*remove) (struct device *dev); | |
110 | struct device_driver driver; | |
111 | }; | |
112 | ||
113 | A mediated bus driver for mdev should use this structure in the function calls | |
114 | to register and unregister itself with the core driver: | |
115 | ||
2a26ed8e | 116 | * Register:: |
8e1c5a40 | 117 | |
2a26ed8e | 118 | extern int mdev_register_driver(struct mdev_driver *drv, |
8e1c5a40 KW |
119 | struct module *owner); |
120 | ||
2a26ed8e | 121 | * Unregister:: |
8e1c5a40 | 122 | |
2a26ed8e | 123 | extern void mdev_unregister_driver(struct mdev_driver *drv); |
8e1c5a40 KW |
124 | |
125 | The mediated bus driver is responsible for adding mediated devices to the VFIO | |
126 | group when devices are bound to the driver and removing mediated devices from | |
127 | the VFIO group when devices are unbound from the driver. | |
128 | ||
129 | ||
130 | Physical Device Driver Interface | |
131 | -------------------------------- | |
132 | ||
42930553 AW |
133 | The physical device driver interface provides the mdev_parent_ops[3] structure |
134 | to define the APIs to manage work in the mediated core driver that is related | |
135 | to the physical device. | |
8e1c5a40 | 136 | |
42930553 | 137 | The structures in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
138 | |
139 | * dev_attr_groups: attributes of the parent device | |
140 | * mdev_attr_groups: attributes of the mediated device | |
141 | * supported_config: attributes to define supported configurations | |
142 | ||
42930553 | 143 | The functions in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
144 | |
145 | * create: allocate basic resources in a driver for a mediated device | |
146 | * remove: free resources in a driver when a mediated device is destroyed | |
147 | ||
42930553 | 148 | The callbacks in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
149 | |
150 | * open: open callback of mediated device | |
151 | * close: close callback of mediated device | |
152 | * ioctl: ioctl callback of mediated device | |
153 | * read: read emulation callback | |
154 | * write: write emulation callback | |
155 | * mmap: mmap emulation callback | |
156 | ||
42930553 | 157 | A driver should use the mdev_parent_ops structure in the function call to |
2a26ed8e | 158 | register itself with the mdev core driver:: |
8e1c5a40 | 159 | |
2a26ed8e MCC |
160 | extern int mdev_register_device(struct device *dev, |
161 | const struct mdev_parent_ops *ops); | |
8e1c5a40 | 162 | |
42930553 | 163 | However, the mdev_parent_ops structure is not required in the function call |
2a26ed8e | 164 | that a driver should use to unregister itself with the mdev core driver:: |
8e1c5a40 | 165 | |
2a26ed8e | 166 | extern void mdev_unregister_device(struct device *dev); |
8e1c5a40 KW |
167 | |
168 | ||
169 | Mediated Device Management Interface Through sysfs | |
170 | ================================================== | |
171 | ||
172 | The management interface through sysfs enables user space software, such as | |
173 | libvirt, to query and configure mediated devices in a hardware-agnostic fashion. | |
174 | This management interface provides flexibility to the underlying physical | |
175 | device's driver to support features such as: | |
176 | ||
177 | * Mediated device hot plug | |
178 | * Multiple mediated devices in a single virtual machine | |
179 | * Multiple mediated devices from different physical devices | |
180 | ||
181 | Links in the mdev_bus Class Directory | |
182 | ------------------------------------- | |
183 | The /sys/class/mdev_bus/ directory contains links to devices that are registered | |
184 | with the mdev core driver. | |
185 | ||
186 | Directories and Files Under the sysfs for Each Physical Device | |
187 | -------------------------------------------------------------- | |
188 | ||
2a26ed8e MCC |
189 | :: |
190 | ||
191 | |- [parent physical device] | |
192 | |--- Vendor-specific-attributes [optional] | |
193 | |--- [mdev_supported_types] | |
194 | | |--- [<type-id>] | |
195 | | | |--- create | |
196 | | | |--- name | |
197 | | | |--- available_instances | |
198 | | | |--- device_api | |
199 | | | |--- description | |
200 | | | |--- [devices] | |
201 | | |--- [<type-id>] | |
202 | | | |--- create | |
203 | | | |--- name | |
204 | | | |--- available_instances | |
205 | | | |--- device_api | |
206 | | | |--- description | |
207 | | | |--- [devices] | |
208 | | |--- [<type-id>] | |
209 | | |--- create | |
210 | | |--- name | |
211 | | |--- available_instances | |
212 | | |--- device_api | |
213 | | |--- description | |
214 | | |--- [devices] | |
8e1c5a40 KW |
215 | |
216 | * [mdev_supported_types] | |
217 | ||
218 | The list of currently supported mediated device types and their details. | |
219 | ||
220 | [<type-id>], device_api, and available_instances are mandatory attributes | |
221 | that should be provided by the vendor driver. | |
222 | ||
223 | * [<type-id>] | |
224 | ||
1c4f128e SD |
225 | The [<type-id>] name is created by adding the device driver string as a prefix |
226 | to the string provided by the vendor driver. The format of this name is as | |
2a26ed8e | 227 | follows:: |
8e1c5a40 KW |
228 | |
229 | sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name); | |
230 | ||
9372e6fe | 231 | (or using mdev_parent_dev(mdev) to arrive at the parent device outside |
2a26ed8e | 232 | of the core mdev code) |
9372e6fe | 233 | |
8e1c5a40 KW |
234 | * device_api |
235 | ||
236 | This attribute should show which device API is being created, for example, | |
237 | "vfio-pci" for a PCI device. | |
238 | ||
239 | * available_instances | |
240 | ||
241 | This attribute should show the number of devices of type <type-id> that can be | |
242 | created. | |
243 | ||
244 | * [devices] | |
245 | ||
246 | This directory contains links to the devices of type <type-id> that have been | |
2a26ed8e | 247 | created. |
8e1c5a40 KW |
248 | |
249 | * name | |
250 | ||
251 | This attribute should show a human-readable name. This attribute is optional. | |
252 | ||
253 | * description | |
254 | ||
255 | This attribute should show a brief description of the type and its features. | |
256 | This attribute is optional. | |
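
The [<type-id>] naming rule above can be illustrated in user space. The following is a minimal sketch, not kernel code: mdev_format_type_id() is a hypothetical helper name, and the "mtty" and "2" inputs are assumptions chosen to match the mtty-2 type of the sample driver.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Build "<driver>-<group>" into buf, mirroring the "%s-%s" format quoted
 * above. Hypothetical user space helper, not part of the mdev API.
 * Returns 0 on success, -1 if formatting failed or the result did not fit.
 */
int mdev_format_type_id(char *buf, size_t len,
                        const char *driver, const char *group)
{
    int n = snprintf(buf, len, "%s-%s", driver, group);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A management tool could use such a helper to predict the directory name under mdev_supported_types before looking it up. Treating a truncated name as an error avoids silently matching the wrong type.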
257 | ||
258 | Directories and Files Under the sysfs for Each mdev Device | |
259 | ---------------------------------------------------------- | |
260 | ||
2a26ed8e MCC |
261 | :: |
262 | ||
263 | |- [parent phy device] | |
264 | |--- [$MDEV_UUID] | |
8e1c5a40 KW |
265 | |--- remove |
266 | |--- mdev_type {link to its type} | |
267 | |--- vendor-specific-attributes [optional] | |
268 | ||
269 | * remove (write only) | |
2a26ed8e | 270 | |
8e1c5a40 KW |
271 | Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can |
272 | fail the remove() callback if that device is active and the vendor driver | |
273 | doesn't support hot unplug. | |
274 | ||
2a26ed8e MCC |
275 | Example:: |
276 | ||
8e1c5a40 KW |
277 | # echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove |
278 | ||
2a26ed8e | 279 | Mediated Device Hot Plug
8e1c5a40 KW |
280 | ------------------------ |
281 | ||
282 | Mediated devices can be created and assigned at runtime. The procedure to hot | |
283 | plug a mediated device is the same as the procedure to hot plug a PCI device. | |
284 | ||
285 | Translation APIs for Mediated Devices | |
286 | ===================================== | |
287 | ||
288 | The following APIs are provided for translating user pfn to host pfn in a VFIO | |
2a26ed8e | 289 | driver:: |
8e1c5a40 | 290 | |
2a26ed8e MCC |
291 | extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, |
292 | int npage, int prot, unsigned long *phys_pfn); | |
8e1c5a40 | 293 | |
2a26ed8e MCC |
294 | extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, |
295 | int npage); | |
8e1c5a40 KW |
296 | |
297 | These functions call back into the back-end IOMMU module by using the pin_pages | |
298 | and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently | |
299 | these callbacks are supported in the TYPE1 IOMMU module. To enable them for | |
300 | other IOMMU back-end modules, such as the PPC64 sPAPR module, those modules | |
301 | must provide these two callback functions. | |
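
Both functions take an array of page frame numbers and a page count. The following user space sketch shows one way a caller might derive those two values from a byte range. It is an illustration only: fill_user_pfns() is a hypothetical helper, the 4 KiB page size (SKETCH_PAGE_SHIFT) is an assumption, and address overflow is not handled.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Fill pfns[] with the page frame number of every page touched by the
 * byte range [addr, addr + len). Returns the number of pages, or -1 if
 * the range needs more than max entries.
 */
int fill_user_pfns(unsigned long addr, unsigned long len,
                   unsigned long *pfns, int max)
{
    unsigned long first, last, i, n;

    if (len == 0)
        return 0;
    first = addr >> SKETCH_PAGE_SHIFT;
    last = (addr + len - 1) >> SKETCH_PAGE_SHIFT;
    n = last - first + 1;
    if (max < 0 || n > (unsigned long)max)
        return -1;
    for (i = 0; i < n; i++)
        pfns[i] = first + i;
    return (int)n;
}
```

The returned count corresponds to the npage argument and the filled array to user_pfn. A range that starts or ends in the middle of a page still counts that whole page, which is why an unaligned 4 KiB range yields two entries.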
302 | ||
9d1a546c KW |
303 | Using the Sample Code |
304 | ===================== | |
305 | ||
306 | mtty.c in the samples/vfio-mdev/ directory is a sample driver program to | |
307 | demonstrate how to use the mediated device framework. | |
308 | ||
309 | The sample driver creates an mdev device that simulates a serial port over a PCI | |
310 | card. | |
311 | ||
312 | 1. Build and load the mtty.ko module. | |
313 | ||
314 | This step creates a dummy device, /sys/devices/virtual/mtty/mtty/ | |
315 | ||
2a26ed8e MCC |
316 | Files in this device directory in sysfs are similar to the following:: |
317 | ||
318 | # tree /sys/devices/virtual/mtty/mtty/ | |
319 | /sys/devices/virtual/mtty/mtty/ | |
320 | |-- mdev_supported_types | |
321 | | |-- mtty-1 | |
322 | | | |-- available_instances | |
323 | | | |-- create | |
324 | | | |-- device_api | |
325 | | | |-- devices | |
326 | | | `-- name | |
327 | | `-- mtty-2 | |
328 | | |-- available_instances | |
329 | | |-- create | |
330 | | |-- device_api | |
331 | | |-- devices | |
332 | | `-- name | |
333 | |-- mtty_dev | |
334 | | `-- sample_mtty_dev | |
335 | |-- power | |
336 | | |-- autosuspend_delay_ms | |
337 | | |-- control | |
338 | | |-- runtime_active_time | |
339 | | |-- runtime_status | |
340 | | `-- runtime_suspended_time | |
341 | |-- subsystem -> ../../../../class/mtty | |
342 | `-- uevent | |
9d1a546c KW |
343 | |
344 | 2. Create a mediated device by using the dummy device that you created in the | |
2a26ed8e | 345 | previous step:: |
9d1a546c | 346 | |
2a26ed8e | 347 | # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \ |
9d1a546c KW |
348 | /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create |
349 | ||
2a26ed8e | 350 | 3. Add parameters to qemu-kvm:: |
9d1a546c | 351 | |
2a26ed8e MCC |
352 | -device vfio-pci,\ |
353 | sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 | |
9d1a546c KW |
354 | |
355 | 4. Boot the VM. | |
356 | ||
357 | In the Linux guest VM, with no hardware on the host, the device appears | |
2a26ed8e MCC |
358 | as follows:: |
359 | ||
360 | # lspci -s 00:05.0 -xxvv | |
361 | 00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550]) | |
362 | Subsystem: Device 4348:3253 | |
363 | Physical Slot: 5 | |
364 | Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- | |
365 | Stepping- SERR- FastB2B- DisINTx- | |
366 | Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- | |
367 | <TAbort- <MAbort- >SERR- <PERR- INTx- | |
368 | Interrupt: pin A routed to IRQ 10 | |
369 | Region 0: I/O ports at c150 [size=8] | |
370 | Region 1: I/O ports at c158 [size=8] | |
371 | Kernel driver in use: serial | |
372 | 00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00 | |
373 | 10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00 | |
374 | 20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32 | |
375 | 30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00 | |
376 | ||
377 | In the Linux guest VM, dmesg output for the device is as follows:: | |
378 | ||
379 | serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10 | |
380 | 0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A | |
381 | 0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A | |
382 | ||
383 | ||
384 | 5. In the Linux guest VM, check the serial ports:: | |
385 | ||
386 | # setserial -g /dev/ttyS* | |
387 | /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4 | |
388 | /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10 | |
389 | /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10 | |
9d1a546c | 390 | |
ce8cd407 | 391 | 6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or |
9d1a546c KW |
392 | /dev/ttyS2 with hardware flow control disabled. |
393 | ||
394 | 7. Type data on the minicom terminal or send data to the terminal emulation | |
395 | program and read the data. | |
396 | ||
397 | Data is looped back from the host's mtty driver. | |
398 | ||
2a26ed8e | 399 | 8. Destroy the mediated device that you created:: |
9d1a546c | 400 | |
2a26ed8e | 401 | # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove |
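
Steps 2 and 3 above both build a string from the parent device path, the type name, and the UUID. A tool that automates them could assemble those strings as in this minimal sketch. The helper names mdev_create_path() and mdev_qemu_device_arg() are hypothetical, and the paths simply follow the layout shown in the steps.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Build "<parent>/mdev_supported_types/<type_id>/create", the file that
 * step 2 writes the UUID to. Returns 0, or -1 if buf is too small.
 */
int mdev_create_path(char *buf, size_t len,
                     const char *parent, const char *type_id)
{
    int n = snprintf(buf, len, "%s/mdev_supported_types/%s/create",
                     parent, type_id);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Build the value passed to the qemu-kvm -device option in step 3.
 * Returns 0, or -1 if buf is too small.
 */
int mdev_qemu_device_arg(char *buf, size_t len, const char *uuid)
{
    int n = snprintf(buf, len,
                     "vfio-pci,sysfsdev=/sys/bus/mdev/devices/%s", uuid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the sample driver, the parent path would be /sys/devices/virtual/mtty/mtty (without a trailing slash) and the type name mtty-2. The UUID is then written to the resulting create path, and the second string follows -device on the qemu-kvm command line.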
9d1a546c | 402 | |
8e1c5a40 | 403 | References |
9d1a546c | 404 | ========== |
8e1c5a40 | 405 | |
2a26ed8e MCC |
406 | 1. See Documentation/vfio.txt for more information on VFIO. |
407 | 2. struct mdev_driver in include/linux/mdev.h | |
408 | 3. struct mdev_parent_ops in include/linux/mdev.h | |
409 | 4. struct vfio_iommu_driver_ops in include/linux/vfio.h |