[[qm_pci_passthrough]]
PCI(e) Passthrough
------------------
ifdef::wiki[]
:pve-toplevel:
endif::wiki[]

PCI(e) passthrough is a mechanism to give a virtual machine control over
a PCI device from the host. This can have some advantages over using
virtualized hardware, for example lower latency, higher performance, or more
features (e.g., offloading).

But, if you pass through a device to a virtual machine, you cannot use that
device anymore on the host or in any other VM.

Note that, while PCI passthrough is available for i440fx and q35 machines, PCIe
passthrough is only available on q35 machines. This does not mean that
PCIe capable devices that are passed through as PCI devices will only run at
PCI speeds. Passing through devices as PCIe just sets a flag for the guest to
tell it that the device is a PCIe device instead of a "really fast legacy PCI
device". Some guest applications benefit from this.

General Requirements
~~~~~~~~~~~~~~~~~~~~

Since passthrough is performed on real hardware, it needs to fulfill some
requirements. A brief overview of these requirements is given below; for more
information on specific devices, see
https://pve.proxmox.com/wiki/PCI_Passthrough[PCI Passthrough Examples].

Hardware
^^^^^^^^
Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement
**U**nit) interrupt remapping; this includes the CPU and the motherboard.

Generally, Intel systems with VT-d and AMD systems with AMD-Vi support this.
But it is not guaranteed that everything will work out of the box, due
to bad hardware implementations and missing or low-quality drivers.

Further, server grade hardware often has better support than consumer grade
hardware, but even then, many modern systems can support this.

Please refer to your hardware vendor to check if they support this feature
under Linux for your specific setup.

Determining PCI Card Address
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The easiest way is to use the GUI to add a device of type "Host PCI" in the VM's
hardware tab. Alternatively, you can use the command line.

You can locate your card using

----
lspci
----
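
If the output is long, you can narrow it down with `grep`; for example, to show
only graphics devices (the filter pattern here is just an illustration, adjust
it to the device you are looking for):

----
# lspci -nn | grep -i vga
----

The PCI(e) address is the value at the beginning of each line, for example
`01:00.0`.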

Configuration
^^^^^^^^^^^^^

Once you have ensured that your hardware supports passthrough, you will need to
do some configuration to enable PCI(e) passthrough.

.IOMMU

First, you will have to enable IOMMU support in your BIOS/UEFI. Usually the
corresponding setting is called `IOMMU` or `VT-d`, but you should find the exact
option name in the manual of your motherboard.

For Intel CPUs, you also need to enable the IOMMU on the
xref:sysboot_edit_kernel_cmdline[kernel command line] by adding:

----
intel_iommu=on
----

For AMD CPUs it should be enabled automatically.
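
For example, on a host that boots via GRUB (a sketch; hosts booting via
`systemd-boot` edit `/etc/kernel/cmdline` instead, see the section linked
above), you would extend the existing options in `/etc/default/grub`:

----
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
----

and then apply the change with:

----
# update-grub
----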

.IOMMU Passthrough Mode

If your hardware supports IOMMU passthrough mode, enabling this mode might
increase performance.
This is because VMs then bypass the (default) DMA translation normally
performed by the hypervisor and instead pass DMA requests directly to the
hardware IOMMU. To enable this option, add:

----
iommu=pt
----

to the xref:sysboot_edit_kernel_cmdline[kernel command line].
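
Both options are combined on the same command line; on an Intel system, the
relevant part of the kernel command line would then, for example, look like:

----
intel_iommu=on iommu=pt
----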

.Kernel Modules

//TODO: remove `vfio_virqfd` stuff with eol of pve 7
You have to make sure the following modules are loaded. This can be achieved by
adding them to `/etc/modules`. In kernels newer than 6.2 ({pve} 8 and onward)
the 'vfio_virqfd' module is part of the 'vfio' module, therefore loading
'vfio_virqfd' in {pve} 8 and newer is not necessary.

----
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd #not needed if on kernel 6.2 or newer
----
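
For example, the modules can be appended to `/etc/modules` in one step (a
sketch, omitting 'vfio_virqfd', which is not needed on {pve} 8 or newer as
noted above):

----
# cat <<EOF >> /etc/modules
vfio
vfio_iommu_type1
vfio_pci
EOF
----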

[[qm_pci_passthrough_update_initramfs]]
After changing anything module-related, you need to refresh your
`initramfs`. On {pve} this can be done by executing:

----
# update-initramfs -u -k all
----

To check if the modules are being loaded, the output of

----
# lsmod | grep vfio
----

should include the four modules from above.

.Finish Configuration

Finally, reboot to bring the changes into effect and check that it is indeed
enabled.

----
# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
----

should display that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is
enabled; the exact message can vary depending on hardware and kernel.

For notes on how to troubleshoot or verify if IOMMU is working as intended, please
see the https://pve.proxmox.com/wiki/PCI_Passthrough#Verifying_IOMMU_parameters[Verifying IOMMU Parameters]
section in our wiki.
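
For instance, a quick check is whether interrupt remapping is reported as
enabled (the exact message depends on whether the system uses VT-d or AMD-Vi):

----
# dmesg | grep 'remapping'
----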

It is also important that the device(s) you want to pass through
are in a *separate* `IOMMU` group. This can be checked with a call to the {pve}
API:

----
# pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""
----

It is okay if the device is in an `IOMMU` group together with its functions
(e.g. a GPU with the HDMI Audio device) or with its root port or PCI(e) bridge.
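
You can also inspect the groups directly via sysfs, where every assigned device
appears as a symlink below its group:

----
# find /sys/kernel/iommu_groups/ -type l
----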

.PCI(e) slots
[NOTE]
====
Some platforms handle their physical PCI(e) slots differently. So, sometimes
it can help to put the card in another PCI(e) slot, if you do not get the
desired `IOMMU` group separation.
====

.Unsafe interrupts
[NOTE]
====
For some platforms, it may be necessary to allow unsafe interrupts.
For this, add the following line to a file ending with `.conf' in
*/etc/modprobe.d/*:

----
options vfio_iommu_type1 allow_unsafe_interrupts=1
----

Please be aware that this option can make your system unstable.
====

GPU Passthrough Notes
^^^^^^^^^^^^^^^^^^^^^

It is not possible to display the frame buffer of the GPU via NoVNC or SPICE on
the {pve} web interface.

When passing through a whole GPU or a vGPU and graphic output is wanted, one
has to either physically connect a monitor to the card, or configure remote
desktop software (for example, VNC or RDP) inside the guest.

If you want to use the GPU as a hardware accelerator, for example, for
programs using OpenCL or CUDA, this is not required.

Host Device Passthrough
~~~~~~~~~~~~~~~~~~~~~~~

The most common variant of PCI(e) passthrough is to pass through a whole
PCI(e) card, for example a GPU or a network card.


Host Configuration
^^^^^^^^^^^^^^^^^^

{pve} tries to automatically make the PCI(e) device unavailable to the host.
However, if this doesn't work, there are two things that can be done:

* pass the device IDs to the options of the 'vfio-pci' module by adding
+
----
options vfio-pci ids=1234:5678,4321:8765
----
+
to a .conf file in */etc/modprobe.d/*, where `1234:5678` and `4321:8765` are
the vendor and device IDs obtained by the following command (see also the
example after this list):
+
----
# lspci -nn
----

* blacklist the driver on the host completely, ensuring that it is free to bind
for passthrough, with
+
----
blacklist DRIVERNAME
----
+
in a .conf file in */etc/modprobe.d/*.
+
To find the driver name, execute
+
----
# lspci -k
----
+
for example:
+
----
# lspci -k | grep -A 3 "VGA"
----
+
will output something similar to
+
----
01:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] GP108 [GeForce GT 1030]
Kernel driver in use: <some-module>
Kernel modules: <some-module>
----
+
Now we can blacklist the drivers by writing them into a .conf file:
+
----
echo "blacklist <some-module>" >> /etc/modprobe.d/blacklist.conf
----
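
For the first method, the numeric IDs are the values shown in square brackets
in the `lspci -nn` output. You can limit the output to a single device by its
address, for example (the address `01:00` here is just an illustration):

----
# lspci -nn -s 01:00
----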

For both methods you need to
xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and
reboot after that.

Should this not work, you might need to set a soft dependency to load the GPU
modules before loading 'vfio-pci'. This can be done with the 'softdep' flag; see
also the manpages on 'modprobe.d' for more information.

For example, if you are using drivers named <some-module>:

----
# echo "softdep <some-module> pre: vfio-pci" >> /etc/modprobe.d/<some-module>.conf
----


.Verify Configuration

To check if your changes were successful, you can use

----
# lspci -nnk
----

and check your device entry. If it says

----
Kernel driver in use: vfio-pci
----

or the 'in use' line is missing entirely, the device is ready to be used for
passthrough.
[[qm_pci_passthrough_vm_config]]
VM Configuration
^^^^^^^^^^^^^^^^
When passing through a GPU, the best compatibility is reached when using
'q35' as machine type, 'OVMF' ('UEFI' for VMs) instead of SeaBIOS and PCIe
instead of PCI. Note that if you want to use 'OVMF' for GPU passthrough, the
GPU needs to have a UEFI capable ROM; otherwise use SeaBIOS instead. To check if
the ROM is UEFI capable, see the
https://pve.proxmox.com/wiki/PCI_Passthrough#How_to_know_if_a_graphics_card_is_UEFI_.28OVMF.29_compatible[PCI Passthrough Examples]
wiki.

Furthermore, when using OVMF, it may be possible to disable VGA arbitration,
reducing the amount of legacy code that needs to run during boot. To disable
VGA arbitration:

----
echo "options vfio-pci ids=<vendor-id>,<device-id> disable_vga=1" > /etc/modprobe.d/vfio.conf
----

replacing the <vendor-id> and <device-id> with the ones obtained from:

----
# lspci -nn
----

PCI devices can be added in the web interface in the hardware section of the VM.
Alternatively, you can use the command line; set the *hostpciX* option in the VM
configuration, for example by executing:

----
# qm set VMID -hostpci0 00:02.0
----

or by adding a line to the VM configuration file:

----
hostpci0: 00:02.0
----


If your device has multiple functions (e.g., `00:02.0` and `00:02.1`),
you can pass them through all together with the shortened syntax `00:02`.
This is equivalent to checking the "All Functions" checkbox in the
web interface.

There are some options which may be necessary, depending on the device
and guest OS:

* *x-vga=on|off* marks the PCI(e) device as the primary GPU of the VM.
With this enabled the *vga* configuration option will be ignored.

* *pcie=on|off* tells {pve} to use a PCIe or PCI port. Some guest/device
combinations require PCIe rather than PCI. PCIe is only available for 'q35'
machine types.

* *rombar=on|off* makes the firmware ROM visible to the guest. Default is on.
Some PCI(e) devices need this disabled.

* *romfile=<path>* is an optional path to a ROM file for the device to use.
This is a relative path under */usr/share/kvm/*.

.Example

An example of PCIe passthrough with a GPU set to primary:

----
# qm set VMID -hostpci0 02:00,pcie=on,x-vga=on
----

.PCI ID overrides

You can override the PCI vendor ID, device ID, and subsystem IDs that will be
seen by the guest. This is useful if your device is a variant with an ID that
your guest's drivers don't recognize, but you want to force those drivers to be
loaded anyway (e.g. if you know your device shares the same chipset as a
supported variant).

The available options are `vendor-id`, `device-id`, `sub-vendor-id`, and
`sub-device-id`. You can set any or all of these to override your device's
default IDs.

For example:

----
# qm set VMID -hostpci0 02:00,device-id=0x10f6,sub-vendor-id=0x0000
----

SR-IOV
~~~~~~

Another variant for passing through PCI(e) devices is to use the hardware
virtualization features of your devices, if available.

.Enabling SR-IOV
[NOTE]
====
To use SR-IOV, platform support is especially important. It may be necessary
to enable this feature in the BIOS/UEFI first, or to use a specific PCI(e) port
for it to work. If in doubt, consult the manual of the platform or contact its
vendor.
====

'SR-IOV' (**S**ingle-**R**oot **I**nput/**O**utput **V**irtualization) enables
a single device to provide multiple 'VF' (**V**irtual **F**unctions) to the
system. Each of those 'VF' can be used in a different VM, with full hardware
features and also better performance and lower latency than software
virtualized devices.

Currently, the most common use case for this is NICs (**N**etwork
**I**nterface **C**ards) with SR-IOV support, which can provide multiple VFs per
physical port. This allows features such as checksum offloading to be used
inside a VM, reducing the (host) CPU overhead.

Host Configuration
^^^^^^^^^^^^^^^^^^

Generally, there are two methods for enabling virtual functions on a device.

* Sometimes there is an option for the driver module, e.g., for some
Intel drivers:
+
----
max_vfs=4
----
+
which could be put in a file with a '.conf' ending under */etc/modprobe.d/*.
(Do not forget to update your initramfs after that.)
+
Please refer to your driver module documentation for the exact
parameters and options.

* The second, more generic, approach is using `sysfs`.
If the device and driver support this, you can change the number of VFs on
the fly. For example, to set up 4 VFs on device 0000:01:00.0 execute:
+
----
# echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
----
+
To make this change persistent, you can use the `sysfsutils` Debian package.
After installation, configure it via */etc/sysfs.conf* or a `FILE.conf' in
*/etc/sysfs.d/* (see the example after this list).
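
For example (a sketch, assuming the device address from above and a
hypothetical file name), */etc/sysfs.d/sriov.conf* could contain:

----
bus/pci/devices/0000:01:00.0/sriov_numvfs = 4
----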

VM Configuration
^^^^^^^^^^^^^^^^

After creating VFs, you should see them as separate PCI(e) devices when
outputting them with `lspci`. Get their ID and pass them through like a
xref:qm_pci_passthrough_vm_config[normal PCI(e) device].
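
For example (a sketch; the addresses depend on your hardware), VFs of a NIC are
often listed with a "Virtual Function" label and can be passed through
individually:

----
# lspci | grep -i "virtual function"
# qm set VMID -hostpci0 01:10.0
----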

Mediated Devices (vGPU, GVT-g)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mediated devices are another method to reuse features and performance from
physical hardware for virtualized hardware. These are most commonly found in
virtualized GPU setups such as Intel's GVT-g and NVIDIA's vGPUs used in their
GRID technology.

With this, a physical card is able to create virtual cards, similar to SR-IOV.
The difference is that mediated devices do not appear as PCI(e) devices in the
host, and are as such only suited for use in virtual machines.

Host Configuration
^^^^^^^^^^^^^^^^^^

In general, your card's driver must support this feature, otherwise it will
not work. So please refer to your vendor for compatible drivers and how to
configure them.

Intel's drivers for GVT-g are integrated in the kernel and should work
with 5th, 6th and 7th generation Intel Core Processors, as well as E3 v4, E3
v5 and E3 v6 Xeon Processors.

To enable it for Intel Graphics, you have to make sure to load the module
'kvmgt' (for example via `/etc/modules`) and to enable it on the
xref:sysboot_edit_kernel_cmdline[kernel command line] by adding the following parameter:

----
i915.enable_gvt=1
----

After that, remember to
xref:qm_pci_passthrough_update_initramfs[update the `initramfs`],
and reboot your host.

VM Configuration
^^^^^^^^^^^^^^^^

To use a mediated device, simply specify the `mdev` property on a `hostpciX`
VM configuration option.

You can get the supported devices via 'sysfs'. For example, to list the
supported types for the device '0000:00:02.0' you would simply execute:

----
# ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types
----

Each entry is a directory which contains the following important files:

* 'available_instances' contains the number of still available instances of
this type; each 'mdev' in use by a VM reduces this.
* 'description' contains a short description of the capabilities of the type.
* 'create' is the endpoint to create such a device; {pve} does this
automatically for you if a 'hostpciX' option with `mdev` is configured.
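
For example, to inspect a specific type (using the 'i915-GVTg_V5_4' type from
the example below; the available type names depend on your hardware):

----
# cat /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/description
# cat /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/available_instances
----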

Example configuration with an `Intel GVT-g vGPU` (`Intel Skylake 6700k`):

----
# qm set VMID -hostpci0 00:02.0,mdev=i915-GVTg_V5_4
----

With this set, {pve} automatically creates such a device on VM start, and
cleans it up again when the VM stops.

Use in Clusters
~~~~~~~~~~~~~~~

It is also possible to map devices on a cluster level, so that they can be
properly used with HA, hardware changes are detected, and non-root users
can configure them. See xref:resource_mapping[Resource Mapping]
for details on that.

[[qm_pci_viommu]]
vIOMMU (emulated IOMMU)
~~~~~~~~~~~~~~~~~~~~~~~

vIOMMU is the emulation of a hardware IOMMU within a virtual machine, providing
improved memory access control and security for virtualized I/O devices. Using
the vIOMMU option also allows you to pass through PCI devices to level-2 VMs in
level-1 VMs via https://pve.proxmox.com/wiki/Nested_Virtualization[Nested Virtualization].
There are currently two vIOMMU implementations available: Intel and VirtIO.

Host requirement:

* Add `intel_iommu=on` or `amd_iommu=on` depending on your CPU to your kernel
command line.

Intel vIOMMU
^^^^^^^^^^^^

Intel vIOMMU specific VM requirements:

* Whether you are using an Intel or AMD CPU on your host, it is important to set
`intel_iommu=on` in the VM's kernel parameters.

* To use Intel vIOMMU you need to set *q35* as the machine type.

If all requirements are met, you can add `viommu=intel` to the machine parameter
in the configuration of the VM that should be able to pass through PCI devices.

----
# qm set VMID -machine q35,viommu=intel
----

https://wiki.qemu.org/Features/VT-d[QEMU documentation for VT-d]

VirtIO vIOMMU
^^^^^^^^^^^^^

This vIOMMU implementation is more recent and does not have as many limitations
as Intel vIOMMU, but it is currently less used in production and less well
documented.

With VirtIO vIOMMU there is *no* need to set any kernel parameters. It is also
*not* necessary to use q35 as the machine type, but it is advisable if you want
to use PCIe.

----
# qm set VMID -machine q35,viommu=virtio
----

https://web.archive.org/web/20230804075844/https://michael2012z.medium.com/virtio-iommu-789369049443[Blog-Post by Michael Zhao explaining virtio-iommu]

ifdef::wiki[]

See Also
~~~~~~~~

* link:/wiki/Pci_passthrough[PCI Passthrough Examples]

endif::wiki[]