[[qm_pci_passthrough]]
PCI(e) Passthrough
------------------
ifdef::wiki[]
:pve-toplevel:
endif::wiki[]

PCI(e) passthrough is a mechanism to give a virtual machine control over
a PCI device from the host. This can have some advantages over using
virtualized hardware, for example lower latency, higher performance, or more
features (e.g., offloading).

But, if you pass through a device to a virtual machine, you cannot use that
device anymore on the host or in any other VM.

General Requirements
~~~~~~~~~~~~~~~~~~~~

Since passthrough is a feature which also needs hardware support, there are
some requirements to check and preparations to be done to make it work.


Hardware
^^^^^^^^
Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement
**U**nit) interrupt remapping. This includes the CPU and the mainboard.

Generally, Intel systems with VT-d, and AMD systems with AMD-Vi support this.
But it is not guaranteed that everything will work out of the box, due
to bad hardware implementation and missing or low quality drivers.

Further, server grade hardware often has better support than consumer grade
hardware, but even then, many modern systems can support this.

Please refer to your hardware vendor to check if they support this feature
under Linux for your specific setup.


Configuration
^^^^^^^^^^^^^

Once you have ensured that your hardware supports passthrough, you will need
to do some configuration to enable PCI(e) passthrough.


.IOMMU

First, the IOMMU support has to be enabled in your BIOS/UEFI. Most often, that
option is named `IOMMU` or `VT-d`, but check the manual for your motherboard
for the exact option you need to enable.

Then, the IOMMU might need to be activated on the
xref:sysboot_edit_kernel_cmdline[kernel commandline].
(On newer kernels, this should not be necessary.)

The command line parameters are:

* for Intel CPUs:
+
----
 intel_iommu=on
----
* for AMD CPUs it should be enabled automatically.

.Kernel Modules

You have to make sure the following modules are loaded. This can be achieved by
adding them to `/etc/modules`:

----
 vfio
 vfio_iommu_type1
 vfio_pci
 vfio_virqfd
----
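
As a minimal sketch, the module names can be appended in a way that is safe to
repeat. The helper name `ensure_modules` is an assumption made for
illustration and is not part of {pve}. It also assumes the file ends with a
newline and lists one module name per line without indentation.

```shell
# Sketch: append each module name to a modules file unless it is already listed.
# ensure_modules is a hypothetical helper, not a {pve} tool.
ensure_modules() {
    file="$1"; shift
    touch "$file" || return 1
    for mod in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string
        grep -qxF -- "$mod" "$file" || printf '%s\n' "$mod" >> "$file"
    done
}

# Example (as root):
#   ensure_modules /etc/modules vfio vfio_iommu_type1 vfio_pci vfio_virqfd
```

Running the helper twice leaves the file unchanged the second time, so it can
be used in provisioning scripts.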

[[qm_pci_passthrough_update_initramfs]]
After changing anything related to kernel modules, you need to refresh your
`initramfs`. On {pve} this can be done by executing:

----
# update-initramfs -u -k all
----

.Finish Configuration

Finally, reboot to bring the changes into effect and check that it is indeed
enabled.

----
# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
----

The output should show that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is
enabled. Depending on hardware and kernel, the exact message can vary.

It is also important that the device(s) you want to pass through
are in a *separate* `IOMMU` group. This can be checked with:

----
# find /sys/kernel/iommu_groups/ -type l
----

It is okay if the device is in an `IOMMU` group together with its functions
(e.g. a GPU with the HDMI Audio device) or with its root port or PCI(e) bridge.
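
The raw `find` output can be hard to read on systems with many devices. The
following sketch prints one line per device together with its group number.
The function name `list_iommu_groups` and its optional directory argument are
assumptions made for illustration. Groups are listed in glob order, not in
numeric order.

```shell
# Sketch: print "group <N>: <PCI address>" for every device entry found below
# the given iommu_groups directory (default: /sys/kernel/iommu_groups).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # glob did not match anything
        group="${dev%/devices/*}"          # strip "/devices/<address>"
        printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
    done
}

# Example:
#   list_iommu_groups
```

If the device you want to pass through shares its group number with unrelated
devices, the separation requirement described above is not met.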

.PCI(e) slots
[NOTE]
====
Some platforms handle their physical PCI(e) slots differently. So, sometimes
it can help to put the card in another PCI(e) slot, if you do not get the
desired `IOMMU` group separation.
====

.Unsafe interrupts
[NOTE]
====
For some platforms, it may be necessary to allow unsafe interrupts.
For this, add the following line to a file ending with `.conf` in
*/etc/modprobe.d/*:

----
 options vfio_iommu_type1 allow_unsafe_interrupts=1
----

Please be aware that this option can make your system unstable.
====

GPU Passthrough Notes
^^^^^^^^^^^^^^^^^^^^^

It is not possible to display the frame buffer of the GPU via NoVNC or SPICE on
the {pve} web interface.

When passing through a whole GPU or a vGPU and graphic output is wanted, one
has to either physically connect a monitor to the card, or configure a remote
desktop software (for example, VNC or RDP) inside the guest.

If you want to use the GPU as a hardware accelerator, for example, for
programs using OpenCL or CUDA, this is not required.

Host Device Passthrough
~~~~~~~~~~~~~~~~~~~~~~~

The most used variant of PCI(e) passthrough is to pass through a whole
PCI(e) card, for example a GPU or a network card.


Host Configuration
^^^^^^^^^^^^^^^^^^

In this case, the host must not use the card. There are two methods to achieve
this:

* pass the device IDs to the options of the 'vfio-pci' module by adding
+
----
 options vfio-pci ids=1234:5678,4321:8765
----
+
to a .conf file in */etc/modprobe.d/* where `1234:5678` and `4321:8765` are
the vendor and device IDs obtained by:
+
----
# lspci -nn
----

* blacklist the driver completely on the host, ensuring that it is free to bind
for passthrough, with
+
----
 blacklist DRIVERNAME
----
+
in a .conf file in */etc/modprobe.d/*.

For both methods you need to
xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and
reboot after that.
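
For the first method, the `options` line can be derived from the `lspci -nn`
output instead of being typed by hand. The following is a sketch under the
assumption that the vendor and device IDs appear as `[xxxx:xxxx]` in each
line. The function name `vfio_ids_line` is hypothetical.

```shell
# Sketch: read `lspci -nn` lines on stdin and print a modprobe options line
# for all devices whose slot address starts with the given prefix.
vfio_ids_line() {
    slot="$1"
    ids=$(awk -v s="$slot" 'index($0, s) == 1' \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | sed 's/^\[//; s/\]$//' \
        | sort -u | paste -sd, -)
    # Print nothing (and return non-zero) if no device matched the prefix.
    [ -n "$ids" ] && printf 'options vfio-pci ids=%s\n' "$ids"
}

# Example: collect the IDs of all functions of the card in slot 01:00
#   lspci -nn | vfio_ids_line 01:00
```

Review the printed line before saving it to a .conf file in
*/etc/modprobe.d/*, since every device with a matching ID will be claimed by
'vfio-pci', not only the one in the given slot.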

.Verify Configuration

To check if your changes were successful, you can use

----
# lspci -nnk
----

and check your device entry. If it says

----
Kernel driver in use: vfio-pci
----

or the 'in use' line is missing entirely, the device is ready to be used for
passthrough.
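
This check can also be scripted. The sketch below assumes the usual
`lspci -nnk` layout, where each device starts on an unindented line and its
details follow on indented lines. The function name `driver_in_use` is
hypothetical.

```shell
# Sketch: read `lspci -nnk` output on stdin and print the driver bound to the
# device at the given slot address. Prints "none" if there is no 'in use'
# line for that device (or if the device is not listed at all).
driver_in_use() {
    awk -v slot="$1" '
        /^[^ \t]/ { current = $1 }        # unindented line starts a new device
        current == slot && /Kernel driver in use:/ { driver = $NF }
        END { print (driver == "" ? "none" : driver) }
    '
}

# Example:
#   lspci -nnk | driver_in_use 01:00.0
```

A result of `vfio-pci` or `none` corresponds to the two cases described above.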

[[qm_pci_passthrough_vm_config]]
VM Configuration
^^^^^^^^^^^^^^^^
To pass through the device you need to set the *hostpciX* option in the VM
configuration, for example by executing:

----
# qm set VMID -hostpci0 00:02.0
----

If your device has multiple functions (e.g., `00:02.0` and `00:02.1`),
you can pass them through all together with the shortened syntax `00:02`.
This is equivalent to checking the 'All Functions' checkbox in the
web-interface.

There are some options which may be necessary, depending on the device
and guest OS:

* *x-vga=on|off* marks the PCI(e) device as the primary GPU of the VM.
With this enabled the *vga* configuration option will be ignored.

* *pcie=on|off* tells {pve} to use a PCIe or PCI port. Some guest/device
combinations require PCIe rather than PCI. PCIe is only available for 'q35'
machine types.

* *rombar=on|off* makes the firmware ROM visible for the guest. Default is on.
Some PCI(e) devices need this disabled.

* *romfile=<path>* is an optional path to a ROM file for the device to use.
This is a relative path under */usr/share/kvm/*.

.Example

An example of PCIe passthrough with a GPU set to primary:

----
# qm set VMID -hostpci0 02:00,pcie=on,x-vga=on
----

.PCI ID overrides

You can override the PCI vendor ID, device ID, and subsystem IDs that will be
seen by the guest. This is useful if your device is a variant with an ID that
your guest's drivers don't recognize, but you want to force those drivers to be
loaded anyway (e.g. if you know your device shares the same chipset as a
supported variant).

The available options are `vendor-id`, `device-id`, `sub-vendor-id`, and
`sub-device-id`. You can set any or all of these to override your device's
default IDs.

For example:

----
# qm set VMID -hostpci0 02:00,device-id=0x10f6,sub-vendor-id=0x0000
----


Other considerations
^^^^^^^^^^^^^^^^^^^^

When passing through a GPU, the best compatibility is reached when using
'q35' as machine type, 'OVMF' ('EFI' for VMs) instead of SeaBIOS and PCIe
instead of PCI. Note that if you want to use 'OVMF' for GPU passthrough, the
GPU needs to have an EFI capable ROM, otherwise use SeaBIOS instead.

SR-IOV
~~~~~~

Another variant for passing through PCI(e) devices is to use the hardware
virtualization features of your devices, if available.

'SR-IOV' (**S**ingle-**R**oot **I**nput/**O**utput **V**irtualization) enables
a single device to provide multiple 'VF' (**V**irtual **F**unctions) to the
system. Each of those 'VF' can be used in a different VM, with full hardware
features and also better performance and lower latency than software
virtualized devices.

Currently, the most common use case for this is NICs (**N**etwork
**I**nterface **C**ard) with SR-IOV support, which can provide multiple VFs per
physical port. This allows features such as checksum offloading to be used
inside a VM, reducing the (host) CPU overhead.


Host Configuration
^^^^^^^^^^^^^^^^^^

Generally, there are two methods for enabling virtual functions on a device.

* sometimes there is an option for the driver module e.g. for some
Intel drivers
+
----
 max_vfs=4
----
+
which could be put in a file with '.conf' ending under */etc/modprobe.d/*.
(Do not forget to update your initramfs after that)
+
Please refer to your driver module documentation for the exact
parameters and options.

* The second, more generic, approach is using the `sysfs`.
If a device and driver support this you can change the number of VFs on
the fly. For example, to set up 4 VFs on device 0000:01:00.0 execute:
+
----
# echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
----
+
To make this change persistent you can use the `sysfsutils` Debian package.
After installation configure it via */etc/sysfs.conf* or a `FILE.conf` in
*/etc/sysfs.d/*.
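
The `sysfs` approach can be wrapped in a small helper that refuses values the
device cannot provide. This is a sketch only: the function name `set_numvfs`
is hypothetical, and it relies on the `sriov_totalvfs` attribute that the
kernel exposes next to `sriov_numvfs`.

```shell
# Sketch: set the number of VFs for a device directory such as
# /sys/bus/pci/devices/0000:01:00.0, refusing values above sriov_totalvfs.
set_numvfs() {
    dev="$1"; want="$2"
    [ -r "$dev/sriov_totalvfs" ] || { echo "no SR-IOV support at $dev" >&2; return 1; }
    total=$(cat "$dev/sriov_totalvfs")
    [ "$want" -le "$total" ] || { echo "$want exceeds sriov_totalvfs ($total)" >&2; return 1; }
    # The kernel rejects changing a non-zero VF count directly, so reset first.
    # Note that this removes any VFs that currently exist on the device.
    echo 0 > "$dev/sriov_numvfs"
    echo "$want" > "$dev/sriov_numvfs"
}

# Example (as root):
#   set_numvfs /sys/bus/pci/devices/0000:01:00.0 4
```

With `sysfsutils`, the persistent equivalent is assumed to be an entry of the
form `bus/pci/devices/0000:01:00.0/sriov_numvfs = 4`, with the path given
relative to `/sys`. Check the package documentation for the exact syntax.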

VM Configuration
^^^^^^^^^^^^^^^^

After creating VFs, you should see them as separate PCI(e) devices when
outputting them with `lspci`. Get their ID and pass them through like a
xref:qm_pci_passthrough_vm_config[normal PCI(e) device].

Other considerations
^^^^^^^^^^^^^^^^^^^^

For this feature, platform support is especially important. It may be necessary
to enable this feature in the BIOS/EFI first, or to use a specific PCI(e) port
for it to work. If in doubt, consult the manual of the platform or contact its
vendor.

Mediated Devices (vGPU, GVT-g)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mediated devices are another method to reuse features and performance from
physical hardware for virtualized hardware. These are most commonly found in
virtualized GPU setups such as Intel's GVT-g and NVIDIA's vGPUs used in their
GRID technology.

With this, a physical card is able to create virtual cards, similar to SR-IOV.
The difference is that mediated devices do not appear as PCI(e) devices in the
host, and are as such only suited for use in virtual machines.


Host Configuration
^^^^^^^^^^^^^^^^^^

In general your card's driver must support that feature, otherwise it will
not work. So please refer to your vendor for compatible drivers and how to
configure them.

Intel's drivers for GVT-g are integrated in the Kernel and should work
with 5th, 6th and 7th generation Intel Core Processors, as well as E3 v4, E3
v5 and E3 v6 Xeon Processors.

To enable it for Intel Graphics, you have to make sure to load the module
'kvmgt' (for example via `/etc/modules`) and to enable it on the
xref:sysboot_edit_kernel_cmdline[Kernel commandline] by adding the following
parameter:

----
 i915.enable_gvt=1
----

After that remember to
xref:qm_pci_passthrough_update_initramfs[update the `initramfs`],
and reboot your host.

VM Configuration
^^^^^^^^^^^^^^^^

To use a mediated device, simply specify the `mdev` property on a `hostpciX`
VM configuration option.

You can get the supported devices via the 'sysfs'. For example, to list the
supported types for the device '0000:00:02.0' you would simply execute:

----
# ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types
----

Each entry is a directory which contains the following important files:

* 'available_instances' contains the number of instances of this type that are
still available; each 'mdev' in use by a VM reduces this.
* 'description' contains a short description about the capabilities of the type
* 'create' is the endpoint to create such a device, {pve} does this
automatically for you, if a 'hostpciX' option with `mdev` is configured.
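
To see at a glance which types can still be instantiated, the
'available_instances' files can be read in a loop. The following is a sketch;
the function name `list_mdev_types` and its optional device directory
argument are assumptions made for illustration.

```shell
# Sketch: print "<type> <available_instances>" for each mdev type offered by
# the device directory (default: /sys/bus/pci/devices/0000:00:02.0).
list_mdev_types() {
    types="${1:-/sys/bus/pci/devices/0000:00:02.0}/mdev_supported_types"
    for t in "$types"/*; do
        [ -r "$t/available_instances" ] || continue   # no types or unreadable
        printf '%s %s\n' "${t##*/}" "$(cat "$t/available_instances")"
    done
}

# Example:
#   list_mdev_types /sys/bus/pci/devices/0000:00:02.0
```

The first column is the value to use for the `mdev` property. A count of 0
means no further instance of that type can be created at the moment.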

Example configuration with an `Intel GVT-g vGPU` (`Intel Skylake 6700k`):

----
# qm set VMID -hostpci0 00:02.0,mdev=i915-GVTg_V5_4
----

With this set, {pve} automatically creates such a device on VM start, and
cleans it up again when the VM stops.

ifdef::wiki[]

See Also
~~~~~~~~

* link:/wiki/Pci_passthrough[PCI Passthrough Examples]

endif::wiki[]