[[qm_pci_passthrough]]
PCI(e) Passthrough
------------------
ifdef::wiki[]
:pve-toplevel:
endif::wiki[]

PCI(e) passthrough is a mechanism to give a virtual machine control over
a PCI device from the host. This can have some advantages over using
virtualized hardware, for example lower latency, higher performance, or more
features (e.g., offloading).

But, if you pass through a device to a virtual machine, you cannot use that
device anymore on the host or in any other VM.

Note that, while PCI passthrough is available for i440fx and q35 machines, PCIe
passthrough is only available on q35 machines. This does not mean that
PCIe capable devices that are passed through as PCI devices will only run at
PCI speeds. Passing through devices as PCIe just sets a flag for the guest to
tell it that the device is a PCIe device instead of a "really fast legacy PCI
device". Some guest applications benefit from this.

General Requirements
~~~~~~~~~~~~~~~~~~~~

Since passthrough is performed on real hardware, it needs to fulfill some
requirements. A brief overview of these requirements is given below; for more
information on specific devices, see
https://pve.proxmox.com/wiki/PCI_Passthrough[PCI Passthrough Examples].

Hardware
^^^^^^^^
Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement
**U**nit) interrupt remapping; this includes the CPU and the motherboard.

Generally, Intel systems with VT-d and AMD systems with AMD-Vi support this.
But it is not guaranteed that everything will work out of the box, due
to bad hardware implementations and missing or low quality drivers.

Further, server grade hardware often has better support than consumer grade
hardware, but even then, many modern systems can support this.

Please refer to your hardware vendor to check if they support this feature
under Linux for your specific setup.

Determining PCI Card Address
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The easiest way is to use the GUI to add a device of type "Host PCI" in the VM's
hardware tab. Alternatively, you can use the command line.

You can locate your card using

----
lspci
----
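
The address is at the start of each output line, in `bus:device.function`
form. A discrete GPU, for example, might show up similar to this (illustrative
output, the exact devices and addresses will differ on your system):

----
01:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP108 High Definition Audio Controller (rev a1)
----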

Configuration
^^^^^^^^^^^^^

Once you have ensured that your hardware supports passthrough, you will need to
do some configuration to enable PCI(e) passthrough.

.IOMMU

First, you will have to enable IOMMU support in your BIOS/UEFI. Usually the
corresponding setting is called `IOMMU` or `VT-d`, but you should find the exact
option name in the manual of your motherboard.

For Intel CPUs, you also need to enable the IOMMU on the
xref:sysboot_edit_kernel_cmdline[kernel command line] by adding:

----
intel_iommu=on
----

For AMD CPUs it should be enabled automatically.
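
How the kernel command line is edited depends on your bootloader; the section
linked above covers the details. As a rough sketch, on a system that boots via
GRUB, you would extend `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, for
example:

----
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
----

and then run `update-grub`. Systems booting via `systemd-boot` (for example,
ZFS installations managed by `proxmox-boot-tool`) keep their parameters in
`/etc/kernel/cmdline` and apply them with `proxmox-boot-tool refresh` instead.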

.IOMMU Passthrough Mode

If your hardware supports IOMMU passthrough mode, enabling this mode might
increase performance.
This is because VMs then bypass the (default) DMA translation normally
performed by the hypervisor and instead pass DMA requests directly to the
hardware IOMMU. To enable these options, add:

----
iommu=pt
----

to the xref:sysboot_edit_kernel_cmdline[kernel command line].

.Kernel Modules

//TODO: remove `vfio_virqfd` stuff with eol of pve 7
You have to make sure the following modules are loaded. This can be achieved by
adding them to `/etc/modules`. In kernels newer than 6.2 ({pve} 8 and onward)
the 'vfio_virqfd' module is part of the 'vfio' module, therefore loading
'vfio_virqfd' in {pve} 8 and newer is not necessary.

----
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd #not needed if on kernel 6.2 or newer
----

[[qm_pci_passthrough_update_initramfs]]
After changing anything module-related, you need to refresh your
`initramfs`. On {pve} this can be done by executing:

----
# update-initramfs -u -k all
----

To check if the modules are being loaded, the output of

----
# lsmod | grep vfio
----

should include the four modules from above.

.Finish Configuration

Finally, reboot to bring the changes into effect and check that IOMMU is indeed
enabled. The output of

----
# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
----

should display that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is
enabled; depending on hardware and kernel, the exact message can vary.
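
For example, on an Intel system the output might contain lines similar to the
following (illustrative, the messages differ between platforms and kernel
versions):

----
DMAR: IOMMU enabled
DMAR-IR: Enabled IRQ remapping in x2apic mode
----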

For notes on how to troubleshoot or verify if IOMMU is working as intended, please
see the https://pve.proxmox.com/wiki/PCI_Passthrough#Verifying_IOMMU_parameters[Verifying IOMMU Parameters]
section in our wiki.

It is also important that the device(s) you want to pass through
are in a *separate* `IOMMU` group. This can be checked with a call to the {pve}
API:

----
# pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""
----

It is okay if the device is in an `IOMMU` group together with its functions
(e.g. a GPU with the HDMI Audio device) or with its root port or PCI(e) bridge.

.PCI(e) slots
[NOTE]
====
Some platforms handle their physical PCI(e) slots differently. So, sometimes
it can help to put the card in another PCI(e) slot, if you do not get the
desired `IOMMU` group separation.
====

.Unsafe interrupts
[NOTE]
====
For some platforms, it may be necessary to allow unsafe interrupts.
For this, add the following line to a file ending with `.conf' in
*/etc/modprobe.d/*:

----
options vfio_iommu_type1 allow_unsafe_interrupts=1
----

Please be aware that this option can make your system unstable.
====

GPU Passthrough Notes
^^^^^^^^^^^^^^^^^^^^^

It is not possible to display the frame buffer of the GPU via NoVNC or SPICE on
the {pve} web interface.

When passing through a whole GPU or a vGPU and graphic output is wanted, one
has to either physically connect a monitor to the card, or configure remote
desktop software (for example, VNC or RDP) inside the guest.

If you want to use the GPU as a hardware accelerator, for example, for
programs using OpenCL or CUDA, this is not required.

Host Device Passthrough
~~~~~~~~~~~~~~~~~~~~~~~

The most used variant of PCI(e) passthrough is to pass through a whole
PCI(e) card, for example a GPU or a network card.

Host Configuration
^^^^^^^^^^^^^^^^^^

{pve} tries to automatically make the PCI(e) device unavailable for the host.
However, if this doesn't work, there are two things that can be done:

* pass the device IDs to the options of the 'vfio-pci' module by adding
+
----
options vfio-pci ids=1234:5678,4321:8765
----
+
to a .conf file in */etc/modprobe.d/* where `1234:5678` and `4321:8765` are
the vendor and device IDs obtained by:
+
----
# lspci -nn
----
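+
The IDs are the `[vendor:device]` pair in square brackets near the end of each
line. Illustrative output (the exact IDs depend on your device):
+
----
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP108 [GeForce GT 1030] [10de:1d01] (rev a1)
----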

* blacklist the driver on the host completely, ensuring that it is free to bind
for passthrough, with
+
----
blacklist DRIVERNAME
----
+
in a .conf file in */etc/modprobe.d/*.
+
To find the driver name, execute
+
----
# lspci -k
----
+
for example:
+
----
# lspci -k | grep -A 3 "VGA"
----
+
will output something similar to
+
----
01:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] GP108 [GeForce GT 1030]
Kernel driver in use: <some-module>
Kernel modules: <some-module>
----
+
Now we can blacklist the drivers by writing them into a .conf file:
+
----
echo "blacklist <some-module>" >> /etc/modprobe.d/blacklist.conf
----

For both methods you need to
xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and
reboot after that.

Should this not work, you might need to set a soft dependency to load the GPU
modules before loading 'vfio-pci'. This can be done with the 'softdep' flag, see
also the manpages on 'modprobe.d' for more information.

For example, if you are using drivers named <some-module>:

----
# echo "softdep <some-module> pre: vfio-pci" >> /etc/modprobe.d/<some-module>.conf
----

.Verify Configuration

To check if your changes were successful, you can use

----
# lspci -nnk
----

and check your device entry. If it says

----
Kernel driver in use: vfio-pci
----

or the 'in use' line is missing entirely, the device is ready to be used for
passthrough.

[[qm_pci_passthrough_vm_config]]
VM Configuration
^^^^^^^^^^^^^^^^
When passing through a GPU, the best compatibility is reached when using
'q35' as machine type, 'OVMF' ('UEFI' for VMs) instead of SeaBIOS and PCIe
instead of PCI. Note that if you want to use 'OVMF' for GPU passthrough, the
GPU needs to have a UEFI capable ROM, otherwise use SeaBIOS instead. To check if
the ROM is UEFI capable, see the
https://pve.proxmox.com/wiki/PCI_Passthrough#How_to_know_if_a_graphics_card_is_UEFI_.28OVMF.29_compatible[PCI Passthrough Examples]
wiki.

Furthermore, when using OVMF, it may be possible to disable VGA arbitration,
reducing the amount of legacy code that needs to run during boot. To disable
VGA arbitration:

----
echo "options vfio-pci ids=<vendor-id>,<device-id> disable_vga=1" > /etc/modprobe.d/vfio.conf
----

replacing the <vendor-id> and <device-id> with the ones obtained from:

----
# lspci -nn
----

PCI devices can be added in the web interface in the hardware section of the VM.
Alternatively, you can use the command line; set the *hostpciX* option in the VM
configuration, for example by executing:

----
# qm set VMID -hostpci0 00:02.0
----

or by adding a line to the VM configuration file:

----
hostpci0: 00:02.0
----

If your device has multiple functions (e.g., ``00:02.0`' and ``00:02.1`'),
you can pass them all through together with the shortened syntax ``00:02`'.
This is equivalent to checking the ``All Functions`' checkbox in the
web interface.
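
For example, to pass through both functions of the device above in one go:

----
# qm set VMID -hostpci0 00:02
----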

There are some options which may be necessary, depending on the device
and guest OS:

* *x-vga=on|off* marks the PCI(e) device as the primary GPU of the VM.
With this enabled the *vga* configuration option will be ignored.

* *pcie=on|off* tells {pve} to use a PCIe or PCI port. Some guest/device
combinations require PCIe rather than PCI. PCIe is only available for 'q35'
machine types.

* *rombar=on|off* makes the firmware ROM visible for the guest. Default is on.
Some PCI(e) devices need this disabled.

* *romfile=<path>* is an optional path to a ROM file for the device to use.
This is a relative path under */usr/share/kvm/* (see the example below).
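
As a sketch of the `romfile` option, assuming you have placed a ROM dump at
*/usr/share/kvm/vbios.bin* (the file name is just an example):

----
# qm set VMID -hostpci0 01:00,romfile=vbios.bin
----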

.Example

An example of PCIe passthrough with a GPU set to primary:

----
# qm set VMID -hostpci0 02:00,pcie=on,x-vga=on
----

.PCI ID overrides

You can override the PCI vendor ID, device ID, and subsystem IDs that will be
seen by the guest. This is useful if your device is a variant with an ID that
your guest's drivers don't recognize, but you want to force those drivers to be
loaded anyway (e.g. if you know your device shares the same chipset as a
supported variant).

The available options are `vendor-id`, `device-id`, `sub-vendor-id`, and
`sub-device-id`. You can set any or all of these to override your device's
default IDs.

For example:

----
# qm set VMID -hostpci0 02:00,device-id=0x10f6,sub-vendor-id=0x0000
----

SR-IOV
~~~~~~

Another variant for passing through PCI(e) devices is to use the hardware
virtualization features of your devices, if available.

.Enabling SR-IOV
[NOTE]
====
To use SR-IOV, platform support is especially important. It may be necessary
to enable this feature in the BIOS/UEFI first, or to use a specific PCI(e) port
for it to work. If in doubt, consult the manual of the platform or contact its
vendor.
====

'SR-IOV' (**S**ingle-**R**oot **I**nput/**O**utput **V**irtualization) enables
a single device to provide multiple 'VF' (**V**irtual **F**unctions) to the
system. Each of those 'VF' can be used in a different VM, with full hardware
features and also better performance and lower latency than software
virtualized devices.

Currently, the most common use cases for this are NICs (**N**etwork
**I**nterface **C**ards) with SR-IOV support, which can provide multiple VFs
per physical port. This allows features such as checksum offloading to be used
inside a VM, reducing the (host) CPU overhead.

Host Configuration
^^^^^^^^^^^^^^^^^^

Generally, there are two methods for enabling virtual functions on a device.

* Sometimes there is an option for the driver module, e.g. for some
Intel drivers:
+
----
max_vfs=4
----
+
which could be put in a file with a '.conf' ending under */etc/modprobe.d/*.
(Do not forget to update your initramfs after that.)
+
Please refer to your driver module documentation for the exact
parameters and options.

* The second, more generic, approach is using `sysfs`.
If the device and driver support this, you can change the number of VFs on
the fly. For example, to set up 4 VFs on device 0000:01:00.0 execute:
+
----
# echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
----
+
To make this change persistent, you can use the `sysfsutils` Debian package.
After installation, configure it via */etc/sysfs.conf* or a `FILE.conf' in
*/etc/sysfs.d/* (see the example below).
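
A minimal sketch of such a persistent setting, assuming the device from the
example above and the usual `sysfsutils` configuration format (attribute paths
relative to `/sys`, the file name is arbitrary):

----
# /etc/sysfs.d/sriov.conf
bus/pci/devices/0000:01:00.0/sriov_numvfs = 4
----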

VM Configuration
^^^^^^^^^^^^^^^^

After creating VFs, you should see them as separate PCI(e) devices when
listing them with `lspci`. Get their ID and pass them through like a
xref:qm_pci_passthrough_vm_config[normal PCI(e) device].

Mediated Devices (vGPU, GVT-g)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mediated devices are another method to reuse features and performance from
physical hardware for virtualized hardware. These are most commonly found in
virtualized GPU setups such as Intel's GVT-g and NVIDIA's vGPUs used in their
GRID technology.

With this, a physical card is able to create virtual cards, similar to SR-IOV.
The difference is that mediated devices do not appear as PCI(e) devices in the
host, and are as such only suited for use in virtual machines.

Host Configuration
^^^^^^^^^^^^^^^^^^

In general, your card's driver must support this feature, otherwise it will
not work. So please refer to your vendor for compatible drivers and how to
configure them.

Intel's drivers for GVT-g are integrated in the kernel and should work
with 5th, 6th and 7th generation Intel Core Processors, as well as E3 v4, E3
v5 and E3 v6 Xeon Processors.

To enable it for Intel Graphics, you have to make sure to load the module
'kvmgt' (for example via `/etc/modules`) and to enable it on the
xref:sysboot_edit_kernel_cmdline[kernel command line] by adding the following parameter:

----
i915.enable_gvt=1
----

After that, remember to
xref:qm_pci_passthrough_update_initramfs[update the `initramfs`],
and reboot your host.

VM Configuration
^^^^^^^^^^^^^^^^

To use a mediated device, simply specify the `mdev` property on a `hostpciX`
VM configuration option.

You can get the supported devices via 'sysfs'. For example, to list the
supported types for the device '0000:00:02.0' you would simply execute:

----
# ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types
----

Each entry is a directory which contains the following important files:

* 'available_instances' contains the number of still available instances of
this type; each 'mdev' in use in a VM reduces this.
* 'description' contains a short description of the capabilities of the type.
* 'create' is the endpoint to create such a device; {pve} does this
automatically for you, if a 'hostpciX' option with `mdev` is configured.
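
These files can be inspected directly. For example, using the `i915-GVTg_V5_4`
type from the configuration example below:

----
# cat /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/description
# cat /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/available_instances
----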

Example configuration with an `Intel GVT-g vGPU` (`Intel Skylake 6700k`):

----
# qm set VMID -hostpci0 00:02.0,mdev=i915-GVTg_V5_4
----

With this set, {pve} automatically creates such a device on VM start, and
cleans it up again when the VM stops.

Use in Clusters
~~~~~~~~~~~~~~~

It is also possible to map devices on a cluster level, so that they can be
properly used with HA, hardware changes are detected, and non-root users
can configure them. See xref:resource_mapping[Resource Mapping]
for details on that.

[[qm_pci_viommu]]
vIOMMU (emulated IOMMU)
~~~~~~~~~~~~~~~~~~~~~~~

vIOMMU is the emulation of a hardware IOMMU within a virtual machine, providing
improved memory access control and security for virtualized I/O devices. Using
the vIOMMU option also allows you to pass through PCI devices to level-2 VMs in
level-1 VMs via https://pve.proxmox.com/wiki/Nested_Virtualization[Nested Virtualization].
There are currently two vIOMMU implementations available: Intel and VirtIO.

Host requirement:

* Add `intel_iommu=on` or `amd_iommu=on` depending on your CPU to your kernel
command line.

Intel vIOMMU
^^^^^^^^^^^^

Intel vIOMMU specific VM requirements:

* Whether you are using an Intel or AMD CPU on your host, it is important to set
`intel_iommu=on` in the VM's kernel parameters.

* To use Intel vIOMMU you need to set *q35* as the machine type.

If all requirements are met, you can add `viommu=intel` to the machine parameter
in the configuration of the VM that should be able to pass through PCI devices.

----
# qm set VMID -machine q35,viommu=intel
----

https://wiki.qemu.org/Features/VT-d[QEMU documentation for VT-d]

VirtIO vIOMMU
^^^^^^^^^^^^^

This vIOMMU implementation is more recent and does not have as many limitations
as Intel vIOMMU, but is currently less used in production and less documented.

With VirtIO vIOMMU there is *no* need to set any kernel parameters. It is also
*not* necessary to use q35 as the machine type, but it is advisable if you want
to use PCIe.

----
# qm set VMID -machine q35,viommu=virtio
----

https://web.archive.org/web/20230804075844/https://michael2012z.medium.com/virtio-iommu-789369049443[Blog-Post by Michael Zhao explaining virtio-iommu]

ifdef::wiki[]

See Also
~~~~~~~~

* link:/wiki/Pci_passthrough[PCI Passthrough Examples]

endif::wiki[]