Commit | Line | Data |
---|---|---|
6e4c46c4 DC |
1 | [[qm_pci_passthrough]] |
2 | PCI(e) Passthrough | |
3 | ------------------ | |
e582833b DC |
4 | ifdef::wiki[] |
5 | :pve-toplevel: | |
6 | endif::wiki[] | |
6e4c46c4 DC |
7 | |
8 | PCI(e) passthrough is a mechanism to give a virtual machine control over | |
49f20f1b TL |
9 | a PCI device from the host. This can have some advantages over using |
10 | virtualized hardware, for example lower latency, higher performance, or more | |
11 | features (e.g., offloading). | |
6e4c46c4 | 12 | |
49f20f1b | 13 | But, if you pass through a device to a virtual machine, you cannot use that |
6e4c46c4 DC |
14 | device anymore on the host or in any other VM. |
15 | ||
16 | General Requirements | |
17 | ~~~~~~~~~~~~~~~~~~~~ | |
18 | ||
19 | Since passthrough is a feature which also needs hardware support, there are | |
49f20f1b TL |
20 | some requirements to check and preparations to be done to make it work. |
21 | ||
6e4c46c4 DC |
22 | |
23 | Hardware | |
24 | ^^^^^^^^ | |
49f20f1b TL |
25 | Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement |
26 | **U**nit) interrupt remapping; this includes both the CPU and the mainboard. | |
6e4c46c4 | 27 | |
49f20f1b TL |
28 | Generally, Intel systems with VT-d, and AMD systems with AMD-Vi support this. |
29 | But it is not guaranteed that everything will work out of the box, due | |
30 | to bad hardware implementation and missing or low quality drivers. | |
6e4c46c4 | 31 | |
49f20f1b | 32 | Further, server-grade hardware often has better support than consumer-grade |
6e4c46c4 DC |
33 | hardware, but even then, many modern systems can support this. |
34 | ||
49f20f1b | 35 | Please refer to your hardware vendor to check if they support this feature |
a22d7c24 | 36 | under Linux for your specific setup. |
49f20f1b | 37 | |
6e4c46c4 DC |
38 | |
39 | Configuration | |
40 | ^^^^^^^^^^^^^ | |
41 | ||
49f20f1b TL |
42 | Once you have ensured that your hardware supports passthrough, you will need to do |
43 | some configuration to enable PCI(e) passthrough. | |
6e4c46c4 | 44 | |
6e4c46c4 | 45 | |
39d84f28 | 46 | .IOMMU |
6e4c46c4 | 47 | |
63f0bb9d DC |
48 | First, the IOMMU support has to be enabled in your BIOS/UEFI. Most often, that |
49 | option is named `IOMMU` or `VT-d`, but check the manual for your motherboard | |
50 | for the exact option you need to enable. | |
51 | ||
e51a78cd | 52 | Then, the IOMMU might need to be activated on the |
69055103 | 53 | xref:sysboot_edit_kernel_cmdline[kernel commandline]. |
e51a78cd | 54 | (On newer kernels, this should not be necessary.) |
1748211a SI |
55 | |
56 | The command line parameters are: | |
6e4c46c4 | 57 | |
49f20f1b TL |
58 | * for Intel CPUs: |
59 | + | |
60 | ---- | |
61 | intel_iommu=on | |
62 | ---- | |
0c54d612 | 63 | * for AMD CPUs it should be enabled automatically. |
6e4c46c4 | 64 | |
a4c60848 DC |
65 | |
66 | If your hardware supports it, enabling IOMMU passthrough mode might increase | |
67 | performance, because then the VMs bypass the (default) DMA translation | |
68 | which is normally done by the hypervisor, before handing DMA requests off to | |
69 | the hardware IOMMU. You can enable it by adding | |
70 | ||
71 | ---- | |
72 | iommu.passthrough=1 | |
73 | ---- | |
74 | ||
75 | or | |
76 | ||
77 | ---- | |
78 | iommu=pt | |
79 | ---- | |
80 | ||
81 | to the kernel commandline. | |
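After rebooting, it can be useful to confirm that a parameter actually reached the running kernel. The following is a minimal sketch that checks a command line string for an exact parameter; the helper name `has_cmdline_param` is hypothetical and not part of {pve}.

```shell
# Hypothetical helper: succeeds if the exact parameter ($1) appears as a
# space-separated word in the kernel command line string ($2).
has_cmdline_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage on a running host (reads the real command line):
# has_cmdline_param iommu=pt "$(cat /proc/cmdline)" && echo "iommu=pt is set"
```

Matching whole words avoids false positives, for example `iommu=on` is not reported as present just because `intel_iommu=on` is.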
82 | ||
39d84f28 | 83 | .Kernel Modules |
6e4c46c4 | 84 | |
49f20f1b TL |
85 | You have to make sure the following modules are loaded. This can be achieved by |
86 | adding them to `/etc/modules`: | |
6e4c46c4 | 87 | |
49f20f1b | 88 | ---- |
6e4c46c4 DC |
89 | vfio |
90 | vfio_iommu_type1 | |
91 | vfio_pci | |
92 | vfio_virqfd | |
49f20f1b | 93 | ---- |
6e4c46c4 | 94 | |
49f20f1b | 95 | [[qm_pci_passthrough_update_initramfs]] |
6e4c46c4 | 96 | After changing anything related to kernel modules, you need to refresh your |
49f20f1b | 97 | `initramfs`. On {pve} this can be done by executing: |
6e4c46c4 DC |
98 | |
99 | ---- | |
49f20f1b | 100 | # update-initramfs -u -k all |
6e4c46c4 DC |
101 | ---- |
102 | ||
39d84f28 | 103 | .Finish Configuration |
49f20f1b TL |
104 | |
105 | Finally, reboot to bring the changes into effect and check that the IOMMU is | |
106 | indeed enabled. | |
6e4c46c4 DC |
107 | |
108 | ---- | |
5e235b99 | 109 | # dmesg | grep -e DMAR -e IOMMU -e AMD-Vi |
6e4c46c4 DC |
110 | ---- |
111 | ||
49f20f1b TL |
112 | should display that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is |
113 | enabled; the exact message can vary depending on hardware and kernel. | |
6e4c46c4 DC |
114 | |
115 | It is also important that the device(s) you want to pass through | |
49f20f1b | 116 | are in a *separate* `IOMMU` group. This can be checked with: |
6e4c46c4 DC |
117 | |
118 | ---- | |
49f20f1b | 119 | # find /sys/kernel/iommu_groups/ -type l |
6e4c46c4 DC |
120 | ---- |
121 | ||
49f20f1b | 122 | It is okay if the device is in an `IOMMU` group together with its functions |
6e4c46c4 DC |
123 | (e.g. a GPU with the HDMI Audio device) or with its root port or PCI(e) bridge. |
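The raw `find` output can be hard to read when there are many groups. As a minimal sketch, the function below prints one line per device with its group number; the name `list_iommu_groups` and its optional root-directory argument are assumptions made for illustration and testing.

```shell
# Print one "group <number> <PCI address>" line per device, sorted by group.
# $1 optionally overrides the sysfs root (an assumption, useful for testing).
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: no IOMMU groups present
        group=${dev%/devices/*}        # strip "/devices/<address>"
        group=${group##*/}             # keep only the group number
        printf 'group %s %s\n' "$group" "${dev##*/}"
    done | sort -k2,2n -k3,3
}
```

Each printed address can then be passed to `lspci -nns <address>` to see which device it refers to.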
124 | ||
125 | .PCI(e) slots | |
126 | [NOTE] | |
127 | ==== | |
49f20f1b TL |
128 | Some platforms handle their physical PCI(e) slots differently. So, sometimes |
129 | it can help to put the card in another PCI(e) slot, if you do not get the | |
130 | desired `IOMMU` group separation. | |
6e4c46c4 DC |
131 | ==== |
132 | ||
133 | .Unsafe interrupts | |
134 | [NOTE] | |
135 | ==== | |
136 | For some platforms, it may be necessary to allow unsafe interrupts. | |
49f20f1b TL |
137 | For this, add the following line to a file ending in `.conf` in |
138 | */etc/modprobe.d/*: | |
6e4c46c4 | 139 | |
49f20f1b | 140 | ---- |
6e4c46c4 | 141 | options vfio_iommu_type1 allow_unsafe_interrupts=1 |
49f20f1b | 142 | ---- |
6e4c46c4 DC |
143 | |
144 | Please be aware that this option can make your system unstable. | |
145 | ==== | |
146 | ||
082b32fb TL |
147 | GPU Passthrough Notes |
148 | ^^^^^^^^^^^^^^^^^^^^^ | |
13cae0c1 | 149 | |
082b32fb TL |
150 | It is not possible to display the frame buffer of the GPU via NoVNC or SPICE on |
151 | the {pve} web interface. | |
13cae0c1 | 152 | |
082b32fb TL |
153 | When passing through a whole GPU or a vGPU and graphic output is wanted, one |
154 | has to either physically connect a monitor to the card, or configure a remote | |
155 | desktop software (for example, VNC or RDP) inside the guest. | |
13cae0c1 | 156 | |
082b32fb TL |
157 | If you want to use the GPU as a hardware accelerator, for example, for |
158 | programs using OpenCL or CUDA, this is not required. | |
13cae0c1 | 159 | |
49f20f1b | 160 | Host Device Passthrough |
6e4c46c4 DC |
161 | ~~~~~~~~~~~~~~~~~~~~~~~ |
162 | ||
163 | The most used variant of PCI(e) passthrough is to pass through a whole | |
49f20f1b TL |
164 | PCI(e) card, for example a GPU or a network card. |
165 | ||
6e4c46c4 DC |
166 | |
167 | Host Configuration | |
168 | ^^^^^^^^^^^^^^^^^^ | |
169 | ||
eebb3506 | 170 | In this case, the host must not use the card. There are two methods to achieve |
49f20f1b | 171 | this: |
6e4c46c4 | 172 | |
49f20f1b TL |
173 | * pass the device IDs to the options of the 'vfio-pci' module by adding |
174 | + | |
175 | ---- | |
6e4c46c4 | 176 | options vfio-pci ids=1234:5678,4321:8765 |
6e4c46c4 | 177 | ---- |
49f20f1b TL |
178 | + |
179 | to a .conf file in */etc/modprobe.d/* where `1234:5678` and `4321:8765` are | |
180 | the vendor and device IDs obtained by: | |
181 | + | |
182 | ---- | |
eebb3506 | 183 | # lspci -nn |
6e4c46c4 DC |
184 | ---- |
185 | ||
49f20f1b TL |
186 | * blacklist the driver completely on the host, ensuring that it is free to bind |
187 | for passthrough, with | |
188 | + | |
189 | ---- | |
6e4c46c4 | 190 | blacklist DRIVERNAME |
49f20f1b TL |
191 | ---- |
192 | + | |
193 | in a .conf file in */etc/modprobe.d/*. | |
6e4c46c4 | 194 | |
49f20f1b TL |
195 | For both methods you need to |
196 | xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and | |
197 | reboot after that. | |
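For the first method, the vendor and device IDs can be collected from `lspci -nn` output instead of being copied by hand. The following is a sketch under the assumption that IDs appear in the usual `[vvvv:dddd]` form; the function name `vfio_ids_line` is hypothetical.

```shell
# Read `lspci -nn` lines on stdin and print a matching modprobe options line.
# Class codes such as [0300] are ignored because they contain no colon.
vfio_ids_line() {
    ids=$(grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]' | sort -u | paste -sd, -)
    [ -n "$ids" ] || return 1          # no IDs found in the input
    printf 'options vfio-pci ids=%s\n' "$ids"
}

# Usage, for all functions of the device at 01:00:
# lspci -nn -s 01:00 | vfio_ids_line
```

Review the printed line before placing it in a `.conf` file, since every device sharing those IDs will be claimed by 'vfio-pci'.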
6e4c46c4 | 198 | |
eebb3506 SR |
199 | .Verify Configuration |
200 | ||
201 | To check if your changes were successful, you can use | |
202 | ||
203 | ---- | |
204 | # lspci -nnk | |
205 | ---- | |
206 | ||
207 | and check your device entry. If it says | |
208 | ||
209 | ---- | |
210 | Kernel driver in use: vfio-pci | |
211 | ---- | |
212 | ||
213 | or the 'in use' line is missing entirely, the device is ready to be used for | |
214 | passthrough. | |
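This check can also be scripted. The sketch below reads `lspci -nnk` output and reports the bound driver, or `none` when the 'in use' line is missing; the helper name `bound_driver` is an assumption for illustration.

```shell
# Read `lspci -nnk -s <address>` output on stdin and print the driver from the
# "Kernel driver in use" line, or "none" if the device has no such line.
bound_driver() {
    awk -F': ' '
        /Kernel driver in use:/ { print $2; found = 1 }
        END { if (!found) print "none" }
    '
}

# Usage:
# lspci -nnk -s 01:00.0 | bound_driver
```

A result of `vfio-pci` or `none` corresponds to the two ready states described above; any other driver name means the host still uses the device.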
215 | ||
49f20f1b | 216 | [[qm_pci_passthrough_vm_config]] |
6e4c46c4 DC |
217 | VM Configuration |
218 | ^^^^^^^^^^^^^^^^ | |
49f20f1b TL |
219 | To pass through the device you need to set the *hostpciX* option in the VM |
220 | configuration, for example by executing: | |
6e4c46c4 DC |
221 | |
222 | ---- | |
49f20f1b | 223 | # qm set VMID -hostpci0 00:02.0 |
6e4c46c4 DC |
224 | ---- |
225 | ||
5ee3d3cd | 226 | If your device has multiple functions (e.g., ``00:02.0`' and ``00:02.1`' ), |
1fa89424 DC |
227 | you can pass them through all together with the shortened syntax ``00:02`'. |
228 | This is equivalent to checking the ``All Functions`' checkbox in the | |
229 | web interface. | |
6e4c46c4 DC |
230 | |
231 | There are some options which may be necessary, depending on the device | |
49f20f1b TL |
232 | and guest OS: |
233 | ||
234 | * *x-vga=on|off* marks the PCI(e) device as the primary GPU of the VM. | |
235 | With this enabled the *vga* configuration option will be ignored. | |
6e4c46c4 | 236 | |
6e4c46c4 | 237 | * *pcie=on|off* tells {pve} to use a PCIe or PCI port. Some guest/device |
49f20f1b TL |
238 | combinations require PCIe rather than PCI. PCIe is only available for 'q35' |
239 | machine types. | |
240 | ||
6e4c46c4 DC |
241 | * *rombar=on|off* makes the firmware ROM visible for the guest. Default is on. |
242 | Some PCI(e) devices need this disabled. | |
49f20f1b | 243 | |
6e4c46c4 | 244 | * *romfile=<path>* is an optional path to a ROM file for the device to use. |
49f20f1b TL |
245 | This is a relative path under */usr/share/kvm/*. |
246 | ||
39d84f28 | 247 | .Example |
6e4c46c4 DC |
248 | |
249 | An example of PCIe passthrough with a GPU set to primary: | |
250 | ||
251 | ---- | |
49f20f1b | 252 | # qm set VMID -hostpci0 02:00,pcie=on,x-vga=on |
6e4c46c4 DC |
253 | ---- |
254 | ||
cf2da2d8 NS |
255 | .PCI ID overrides |
256 | ||
257 | You can override the PCI vendor ID, device ID, and subsystem IDs that will be | |
258 | seen by the guest. This is useful if your device is a variant with an ID that | |
259 | your guest's drivers don't recognize, but you want to force those drivers to be | |
260 | loaded anyway (e.g. if you know your device shares the same chipset as a | |
261 | supported variant). | |
262 | ||
263 | The available options are `vendor-id`, `device-id`, `sub-vendor-id`, and | |
264 | `sub-device-id`. You can set any or all of these to override your device's | |
265 | default IDs. | |
266 | ||
267 | For example: | |
268 | ||
269 | ---- | |
270 | # qm set VMID -hostpci0 02:00,device-id=0x10f6,sub-vendor-id=0x0000 | |
271 | ---- | |
272 | ||
49f20f1b | 273 | |
6e4c46c4 DC |
274 | Other considerations |
275 | ^^^^^^^^^^^^^^^^^^^^ | |
276 | ||
277 | When passing through a GPU, the best compatibility is reached when using | |
49f20f1b TL |
278 | 'q35' as machine type, 'OVMF' ('EFI' for VMs) instead of SeaBIOS and PCIe |
279 | instead of PCI. Note that if you want to use 'OVMF' for GPU passthrough, the | |
280 | GPU needs to have an EFI capable ROM, otherwise use SeaBIOS instead. | |
6e4c46c4 DC |
281 | |
282 | SR-IOV | |
283 | ~~~~~~ | |
284 | ||
49f20f1b TL |
285 | Another variant for passing through PCI(e) devices is to use the hardware |
286 | virtualization features of your devices, if available. | |
287 | ||
288 | 'SR-IOV' (**S**ingle-**R**oot **I**nput/**O**utput **V**irtualization) enables | |
289 | a single device to provide multiple 'VF' (**V**irtual **F**unctions) to the | |
290 | system. Each of those 'VF' can be used in a different VM, with full hardware | |
291 | features and also better performance and lower latency than software | |
292 | virtualized devices. | |
6e4c46c4 | 293 | |
49f20f1b TL |
294 | Currently, the most common use case for this is NICs (**N**etwork |
295 | **I**nterface **C**ard) with SR-IOV support, which can provide multiple VFs per | |
296 | physical port. This allows features such as checksum offloading to be used | |
297 | inside a VM, reducing the (host) CPU overhead. | |
6e4c46c4 | 298 | |
6e4c46c4 DC |
299 | |
300 | Host Configuration | |
301 | ^^^^^^^^^^^^^^^^^^ | |
302 | ||
49f20f1b | 303 | Generally, there are two methods for enabling virtual functions on a device. |
6e4c46c4 | 304 | |
49f20f1b | 305 | * sometimes there is an option for the driver module e.g. for some |
6e4c46c4 | 306 | Intel drivers |
49f20f1b TL |
307 | + |
308 | ---- | |
6e4c46c4 | 309 | max_vfs=4 |
49f20f1b TL |
310 | ---- |
311 | + | |
312 | which could be put in a file with a '.conf' ending under */etc/modprobe.d/*. | |
6e4c46c4 | 313 | (Do not forget to update your initramfs after that.) |
49f20f1b | 314 | + |
6e4c46c4 DC |
315 | Please refer to your driver module documentation for the exact |
316 | parameters and options. | |
317 | ||
49f20f1b TL |
318 | * The second, more generic, approach is using the `sysfs`. |
319 | If a device and driver support this, you can change the number of VFs on | |
320 | the fly. For example, to set up 4 VFs on device 0000:01:00.0 execute: | |
321 | + | |
6e4c46c4 | 322 | ---- |
49f20f1b | 323 | # echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs |
6e4c46c4 | 324 | ---- |
49f20f1b TL |
325 | + |
326 | To make this change persistent you can use the `sysfsutils` Debian package. | |
39d84f28 | 327 | After installation, configure it via */etc/sysfs.conf* or a `FILE.conf` in |
49f20f1b | 328 | */etc/sysfs.d/*. |
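When scripting the `sysfs` approach, it can help to compare the requested count against the maximum the device reports in `sriov_totalvfs`. The following is a minimal sketch; the function name `set_numvfs` is hypothetical, and writing `0` before the new value is an assumption based on drivers that refuse to change a non-zero VF count directly.

```shell
# Set the number of VFs for a PCI device directory ($1) to the count in $2,
# refusing values above what the device reports in sriov_totalvfs.
set_numvfs() {
    devdir=$1
    want=$2
    total=$(cat "$devdir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, device supports at most $total" >&2
        return 1
    fi
    echo 0 > "$devdir/sriov_numvfs"        # assumption: reset before changing the count
    echo "$want" > "$devdir/sriov_numvfs"
}

# Usage (as root):
# set_numvfs /sys/bus/pci/devices/0000:01:00.0 4
```

Note that resetting to `0` removes any existing VFs, so this should not be run while VMs are using them.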
6e4c46c4 DC |
329 | |
330 | VM Configuration | |
331 | ^^^^^^^^^^^^^^^^ | |
332 | ||
49f20f1b TL |
333 | After creating VFs, you should see them as separate PCI(e) devices when |
334 | outputting them with `lspci`. Get their ID and pass them through like a | |
335 | xref:qm_pci_passthrough_vm_config[normal PCI(e) device]. | |
6e4c46c4 DC |
336 | |
337 | Other considerations | |
338 | ^^^^^^^^^^^^^^^^^^^^ | |
339 | ||
340 | For this feature, platform support is especially important. It may be necessary | |
49f20f1b TL |
341 | to enable this feature in the BIOS/EFI first, or to use a specific PCI(e) port |
342 | for it to work. If in doubt, consult the manual of the platform or contact its | |
343 | vendor. | |
050192c5 | 344 | |
d25f097c TL |
345 | Mediated Devices (vGPU, GVT-g) |
346 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
050192c5 | 347 | |
a22d7c24 | 348 | Mediated devices are another method to reuse features and performance from |
d25f097c | 349 | physical hardware for virtualized hardware. These are most commonly found in |
3a433e9b | 350 | virtualized GPU setups such as Intel's GVT-g and NVIDIA's vGPUs used in their |
d25f097c TL |
351 | GRID technology. |
352 | ||
353 | With this, a physical card is able to create virtual cards, similar to SR-IOV. | |
354 | The difference is that mediated devices do not appear as PCI(e) devices in the | |
355 | host, and as such are only suited for use in virtual machines. | |
050192c5 | 356 | |
050192c5 DC |
357 | |
358 | Host Configuration | |
359 | ^^^^^^^^^^^^^^^^^^ | |
360 | ||
d25f097c | 361 | In general your card's driver must support that feature, otherwise it will |
a22d7c24 | 362 | not work. So please refer to your vendor for compatible drivers and how to |
050192c5 DC |
363 | configure them. |
364 | ||
3a433e9b | 365 | Intel's drivers for GVT-g are integrated in the kernel and should work |
a22d7c24 SR |
366 | with 5th, 6th and 7th generation Intel Core Processors, as well as E3 v4, E3 |
367 | v5 and E3 v6 Xeon Processors. | |
050192c5 | 368 | |
1748211a SI |
369 | To enable it for Intel Graphics, you have to make sure to load the module |
370 | 'kvmgt' (for example via `/etc/modules`) and to enable it on the | |
69055103 | 371 | xref:sysboot_edit_kernel_cmdline[kernel commandline] by adding the following parameter: |
050192c5 DC |
372 | |
373 | ---- | |
374 | i915.enable_gvt=1 | |
375 | ---- | |
376 | ||
377 | After that remember to | |
378 | xref:qm_pci_passthrough_update_initramfs[update the `initramfs`], | |
1748211a | 379 | and reboot your host. |
050192c5 DC |
380 | |
381 | VM Configuration | |
382 | ^^^^^^^^^^^^^^^^ | |
383 | ||
d25f097c TL |
384 | To use a mediated device, simply specify the `mdev` property on a `hostpciX` |
385 | VM configuration option. | |
050192c5 | 386 | |
d25f097c TL |
387 | You can get the supported devices via the 'sysfs'. For example, to list the |
388 | supported types for the device '0000:00:02.0' you would simply execute: | |
050192c5 DC |
389 | |
390 | ---- | |
391 | # ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types | |
392 | ---- | |
393 | ||
394 | Each entry is a directory which contains the following important files: | |
395 | ||
d25f097c TL |
396 | * 'available_instances' contains the number of instances of this type that are |
397 | still available; each 'mdev' used in a VM reduces this. | |
050192c5 | 398 | * 'description' contains a short description about the capabilities of the type |
d25f097c TL |
399 | * 'create' is the endpoint to create such a device, {pve} does this |
400 | automatically for you, if a 'hostpciX' option with `mdev` is configured. | |
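To see at a glance which types can still be instantiated, the files described above can be read in a loop. This is a minimal sketch; the function name `list_mdev_types` is an assumption, and it takes the PCI device directory as its only argument.

```shell
# For the PCI device directory in $1, print each supported mdev type together
# with the number of instances that are still available.
list_mdev_types() {
    for type in "$1"/mdev_supported_types/*; do
        [ -d "$type" ] || continue     # device offers no mediated types
        printf '%s %s\n' "${type##*/}" "$(cat "$type/available_instances")"
    done
}

# Usage:
# list_mdev_types /sys/bus/pci/devices/0000:00:02.0
```

A type showing `0` cannot be used for another VM until an existing instance is released.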
050192c5 | 401 | |
d25f097c | 402 | Example configuration with an `Intel GVT-g vGPU` (`Intel Skylake 6700k`): |
050192c5 DC |
403 | |
404 | ---- | |
405 | # qm set VMID -hostpci0 00:02.0,mdev=i915-GVTg_V5_4 | |
406 | ---- | |
407 | ||
408 | With this set, {pve} automatically creates such a device on VM start, and | |
409 | cleans it up again when the VM stops. | |
e582833b DC |
410 | |
411 | ifdef::wiki[] | |
412 | ||
413 | See Also | |
414 | ~~~~~~~~ | |
415 | ||
416 | * link:/wiki/Pci_passthrough[PCI Passthrough Examples] | |
417 | ||
418 | endif::wiki[] |