Commit | Line | Data |
---|---|---|
1 | [[qm_pci_passthrough]] | |
2 | PCI(e) Passthrough | |
3 | ------------------ | |
4 | ifdef::wiki[] | |
5 | :pve-toplevel: | |
6 | endif::wiki[] | |
7 | ||
8 | PCI(e) passthrough is a mechanism to give a virtual machine control over | |
9 | a PCI device from the host. This can have some advantages over using | |
10 | virtualized hardware, for example lower latency, higher performance, or more | |
11 | features (e.g., offloading). | |
12 | ||
13 | But, if you pass through a device to a virtual machine, you cannot use that | |
14 | device anymore on the host or in any other VM. | |
15 | ||
16 | General Requirements | |
17 | ~~~~~~~~~~~~~~~~~~~~ | |
18 | ||
19 | Since passthrough is a feature which also needs hardware support, there are | |
20 | some requirements to check and preparations to be done to make it work. | |
21 | ||
22 | ||
23 | Hardware | |
24 | ^^^^^^^^ | |
25 | Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement | |
26 | **U**nit) interrupt remapping. This includes the CPU and the mainboard. | |
27 | ||
28 | Generally, Intel systems with VT-d, and AMD systems with AMD-Vi support this. | |
29 | But it is not guaranteed that everything will work out of the box, due | |
30 | to bad hardware implementations and missing or low quality drivers. | |
31 | ||
32 | Further, server grade hardware often has better support than consumer grade | |
33 | hardware, but even so, many modern systems can support this. | |
34 | ||
35 | Please refer to your hardware vendor to check if they support this feature | |
36 | under Linux for your specific setup. | |
37 | ||
38 | ||
39 | Configuration | |
40 | ^^^^^^^^^^^^^ | |
41 | ||
42 | Once you have ensured that your hardware supports passthrough, you will need to do | |
43 | some configuration to enable PCI(e) passthrough. | |
44 | ||
45 | ||
46 | .IOMMU | |
47 | ||
48 | The IOMMU has to be activated on the | |
49 | xref:sysboot_edit_kernel_cmdline[kernel command line]. | |
50 | ||
51 | The command line parameters are: | |
52 | ||
53 | * for Intel CPUs: | |
54 | + | |
55 | ---- | |
56 | intel_iommu=on | |
57 | ---- | |
58 | * for AMD CPUs: | |
59 | + | |
60 | ---- | |
61 | amd_iommu=on | |
62 | ---- | |
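Whether the parameter actually reached the running kernel can be checked after a reboot. The following is a minimal sketch; the function name `has_kernel_param` is hypothetical, and the optional second argument exists only so the function can be tried against a sample file instead of `/proc/cmdline`.

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a complete parameter on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline (the command line of the running kernel).
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Put every parameter on its own line, then look for an exact match.
    tr -s ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example, `has_kernel_param intel_iommu=on && echo "IOMMU parameter is set"` prints the message only if the parameter is present on the running kernel's command line.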
63 | ||
64 | ||
65 | .Kernel Modules | |
66 | ||
67 | You have to make sure the following modules are loaded. This can be achieved by | |
68 | adding them to `/etc/modules`: | |
69 | ||
70 | ---- | |
71 | vfio | |
72 | vfio_iommu_type1 | |
73 | vfio_pci | |
74 | vfio_virqfd | |
75 | ---- | |
76 | ||
77 | [[qm_pci_passthrough_update_initramfs]] | |
78 | After changing anything related to kernel modules, you need to refresh your | |
79 | `initramfs`. On {pve} this can be done by executing: | |
80 | ||
81 | ---- | |
82 | # update-initramfs -u -k all | |
83 | ---- | |
84 | ||
85 | If you are using `systemd-boot` make sure to | |
86 | xref:sysboot_systemd_boot_refresh[sync the new initramfs to the bootable partitions]. | |
87 | ||
88 | .Finish Configuration | |
89 | ||
90 | Finally, reboot to bring the changes into effect and check that the IOMMU is | |
91 | indeed enabled: | |
92 | ||
93 | ---- | |
94 | # dmesg | grep -e DMAR -e IOMMU -e AMD-Vi | |
95 | ---- | |
96 | ||
97 | The output should show that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is | |
98 | enabled. Depending on hardware and kernel, the exact message can vary. | |
99 | ||
100 | It is also important that the device(s) you want to pass through | |
101 | are in a *separate* `IOMMU` group. This can be checked with: | |
102 | ||
103 | ---- | |
104 | # find /sys/kernel/iommu_groups/ -type l | |
105 | ---- | |
106 | ||
107 | It is okay if the device is in an `IOMMU` group together with its functions | |
108 | (e.g. a GPU with the HDMI Audio device) or with its root port or PCI(e) bridge. | |
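The raw `find` output can be hard to scan. The following minimal sketch prints one line per device together with its group number. The function name `list_iommu_groups` is hypothetical, and the optional argument (the directory to scan) is there only so the function can be tried on a test directory.

```shell
#!/bin/sh
# list_iommu_groups [ROOT]
# Prints "group N: PCI-ADDRESS" for every device below ROOT,
# which defaults to /sys/kernel/iommu_groups.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded pattern if no groups exist (IOMMU disabled).
        [ -e "$dev" ] || [ -L "$dev" ] || continue
        group=${dev#"$root"/}   # strip the leading ROOT/
        group=${group%%/*}      # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

Calling `list_iommu_groups` without arguments on the host lists the real groups. Groups appear in the shell's glob order, so group 10 is listed before group 2.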
109 | ||
110 | .PCI(e) slots | |
111 | [NOTE] | |
112 | ==== | |
113 | Some platforms handle their physical PCI(e) slots differently. So, sometimes | |
114 | it can help to put the card in another PCI(e) slot, if you do not get the | |
115 | desired `IOMMU` group separation. | |
116 | ==== | |
117 | ||
118 | .Unsafe interrupts | |
119 | [NOTE] | |
120 | ==== | |
121 | For some platforms, it may be necessary to allow unsafe interrupts. | |
122 | To do so, add the following line to a file ending with `.conf` in | |
123 | */etc/modprobe.d/*: | |
124 | ||
125 | ---- | |
126 | options vfio_iommu_type1 allow_unsafe_interrupts=1 | |
127 | ---- | |
128 | ||
129 | Please be aware that this option can make your system unstable. | |
130 | ==== | |
131 | ||
132 | GPU Passthrough Notes | |
133 | ^^^^^^^^^^^^^^^^^^^^^ | |
134 | ||
135 | It is not possible to display the frame buffer of the GPU via NoVNC or SPICE on | |
136 | the {pve} web interface. | |
137 | ||
138 | When passing through a whole GPU or a vGPU and graphical output is wanted, one | |
139 | has to either physically connect a monitor to the card, or configure remote | |
140 | desktop software (for example, VNC or RDP) inside the guest. | |
141 | ||
142 | If you want to use the GPU as a hardware accelerator, for example, for | |
143 | programs using OpenCL or CUDA, this is not required. | |
144 | ||
145 | Host Device Passthrough | |
146 | ~~~~~~~~~~~~~~~~~~~~~~~ | |
147 | ||
148 | The most used variant of PCI(e) passthrough is to pass through a whole | |
149 | PCI(e) card, for example a GPU or a network card. | |
150 | ||
151 | ||
152 | Host Configuration | |
153 | ^^^^^^^^^^^^^^^^^^ | |
154 | ||
155 | In this case, the host must not use the card. There are two methods to achieve | |
156 | this: | |
157 | ||
158 | * pass the device IDs to the options of the 'vfio-pci' module by adding | |
159 | + | |
160 | ---- | |
161 | options vfio-pci ids=1234:5678,4321:8765 | |
162 | ---- | |
163 | + | |
164 | to a .conf file in */etc/modprobe.d/* where `1234:5678` and `4321:8765` are | |
165 | the vendor and device IDs obtained by: | |
166 | + | |
167 | ---- | |
168 | # lspci -nn | |
169 | ---- | |
170 | ||
171 | * blacklist the driver completely on the host, ensuring that the device is free to bind | |
172 | for passthrough, with | |
173 | + | |
174 | ---- | |
175 | blacklist DRIVERNAME | |
176 | ---- | |
177 | + | |
178 | in a .conf file in */etc/modprobe.d/*. | |
179 | ||
180 | For both methods you need to | |
181 | xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and | |
182 | reboot after that. | |
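For the first method, the options line can be derived from the `lspci -nn` output instead of copying the IDs by hand. The following is a minimal sketch; the function name `vfio_ids_line` is hypothetical, and it assumes the vendor and device ID is the last bracketed `xxxx:xxxx` pair on each line, as `lspci -nn` prints it.

```shell
#!/bin/sh
# vfio_ids_line
# Reads `lspci -nn` lines on stdin and prints a matching
# "options vfio-pci ids=..." line. Fails if no ID was found.
vfio_ids_line() {
    # Take the last [vendor:device] pair of each line, drop duplicates
    # (e.g. two identical cards) and join the IDs with commas.
    ids=$(sed -nE 's/.*\[([0-9a-f]{4}:[0-9a-f]{4})\].*/\1/p' | sort -u | paste -sd, -)
    [ -n "$ids" ] || return 1
    printf 'options vfio-pci ids=%s\n' "$ids"
}
```

For example, `lspci -nn | grep -i nvidia | vfio_ids_line` prints a line for all devices whose description mentions NVIDIA. Review the result before writing it to a file in */etc/modprobe.d/*, since every device with a listed ID will be claimed by `vfio-pci`.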
183 | ||
184 | [[qm_pci_passthrough_vm_config]] | |
185 | VM Configuration | |
186 | ^^^^^^^^^^^^^^^^ | |
187 | To pass through the device you need to set the *hostpciX* option in the VM | |
188 | configuration, for example by executing: | |
189 | ||
190 | ---- | |
191 | # qm set VMID -hostpci0 00:02.0 | |
192 | ---- | |
193 | ||
194 | If your device has multiple functions (e.g., `00:02.0` and `00:02.1`), | |
195 | you can pass them all through together with the shortened syntax `00:02`. | |
196 | ||
197 | There are some options which may be necessary, depending on the device | |
198 | and guest OS: | |
199 | ||
200 | * *x-vga=on|off* marks the PCI(e) device as the primary GPU of the VM. | |
201 | With this enabled the *vga* configuration option will be ignored. | |
202 | ||
203 | * *pcie=on|off* tells {pve} to use a PCIe or PCI port. Some guest/device | |
204 | combinations require PCIe rather than PCI. PCIe is only available for 'q35' | |
205 | machine types. | |
206 | ||
207 | * *rombar=on|off* makes the firmware ROM visible for the guest. Default is on. | |
208 | Some PCI(e) devices need this disabled. | |
209 | ||
210 | * *romfile=<path>* is an optional path to a ROM file for the device to use. | |
211 | This is a relative path under */usr/share/kvm/*. | |
212 | ||
213 | .Example | |
214 | ||
215 | An example of PCIe passthrough with a GPU set to primary: | |
216 | ||
217 | ---- | |
218 | # qm set VMID -hostpci0 02:00,pcie=on,x-vga=on | |
219 | ---- | |
220 | ||
221 | ||
222 | Other considerations | |
223 | ^^^^^^^^^^^^^^^^^^^^ | |
224 | ||
225 | When passing through a GPU, the best compatibility is reached when using | |
226 | 'q35' as machine type, 'OVMF' ('EFI' for VMs) instead of SeaBIOS and PCIe | |
227 | instead of PCI. Note that if you want to use 'OVMF' for GPU passthrough, the | |
228 | GPU needs to have an EFI capable ROM, otherwise use SeaBIOS instead. | |
229 | ||
230 | SR-IOV | |
231 | ~~~~~~ | |
232 | ||
233 | Another variant for passing through PCI(e) devices is to use the hardware | |
234 | virtualization features of your devices, if available. | |
235 | ||
236 | 'SR-IOV' (**S**ingle-**R**oot **I**nput/**O**utput **V**irtualization) enables | |
237 | a single device to provide multiple 'VF' (**V**irtual **F**unctions) to the | |
238 | system. Each of those 'VF' can be used in a different VM, with full hardware | |
239 | features and also better performance and lower latency than software | |
240 | virtualized devices. | |
241 | ||
242 | Currently, the most common use case for this is NICs (**N**etwork | |
243 | **I**nterface **C**ard) with SR-IOV support, which can provide multiple VFs per | |
244 | physical port. This allows features such as checksum offloading to be used | |
245 | inside a VM, reducing the (host) CPU overhead. | |
246 | ||
247 | ||
248 | Host Configuration | |
249 | ^^^^^^^^^^^^^^^^^^ | |
250 | ||
251 | Generally, there are two methods for enabling virtual functions on a device. | |
252 | ||
253 | * sometimes there is an option for the driver module, e.g., for some | |
254 | Intel drivers | |
255 | + | |
256 | ---- | |
257 | max_vfs=4 | |
258 | ---- | |
259 | + | |
260 | which could be put in a file ending with '.conf' under */etc/modprobe.d/*. | |
261 | (Do not forget to update your initramfs after that.) | |
262 | + | |
263 | Please refer to your driver module documentation for the exact | |
264 | parameters and options. | |
265 | ||
266 | * The second, more generic, approach is using `sysfs`. | |
267 | If a device and driver support this, you can change the number of VFs on | |
268 | the fly. For example, to set up 4 VFs on device 0000:01:00.0 execute: | |
269 | + | |
270 | ---- | |
271 | # echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs | |
272 | ---- | |
273 | + | |
274 | To make this change persistent you can use the `sysfsutils` Debian package. | |
275 | After installation, configure it via */etc/sysfs.conf* or a `FILE.conf` in | |
276 | */etc/sysfs.d/*. | |
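When scripting the `sysfs` approach, it can help to compare the requested number against the device's limit first. The following is a minimal sketch; the function name `set_sriov_vfs` is hypothetical, and it assumes the driver exposes the `sriov_totalvfs` and `sriov_numvfs` attributes in the device's `sysfs` directory.

```shell
#!/bin/sh
# set_sriov_vfs DEVICE_DIR COUNT
# DEVICE_DIR is the sysfs directory of the physical function,
# for example /sys/bus/pci/devices/0000:01:00.0
set_sriov_vfs() {
    dev_dir=$1
    count=$2
    if [ ! -r "$dev_dir/sriov_totalvfs" ]; then
        echo "no SR-IOV attributes found in $dev_dir" >&2
        return 1
    fi
    total=$(cat "$dev_dir/sriov_totalvfs")
    if [ "$count" -gt "$total" ]; then
        echo "requested $count VFs, but the device supports at most $total" >&2
        return 1
    fi
    # The kernel rejects a change from one non-zero VF count to another,
    # so disable the existing VFs before setting the new number.
    echo 0 > "$dev_dir/sriov_numvfs" &&
    echo "$count" > "$dev_dir/sriov_numvfs"
}
```

For example, `set_sriov_vfs /sys/bus/pci/devices/0000:01:00.0 4` has the same effect as the `echo` command above, but fails early if the device cannot provide 4 VFs. Resetting the count to 0 removes existing VFs, so do not run it while VMs are using them.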
277 | ||
278 | VM Configuration | |
279 | ^^^^^^^^^^^^^^^^ | |
280 | ||
281 | After creating VFs, you should see them as separate PCI(e) devices when | |
282 | outputting them with `lspci`. Get their ID and pass them through like a | |
283 | xref:qm_pci_passthrough_vm_config[normal PCI(e) device]. | |
284 | ||
285 | Other considerations | |
286 | ^^^^^^^^^^^^^^^^^^^^ | |
287 | ||
288 | For this feature, platform support is especially important. It may be necessary | |
289 | to enable this feature in the BIOS/EFI first, or to use a specific PCI(e) port | |
290 | for it to work. When in doubt, consult the manual of the platform or contact its | |
291 | vendor. | |
292 | ||
293 | Mediated Devices (vGPU, GVT-g) | |
294 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
295 | ||
296 | Mediated devices are another method to reuse features and performance from | |
297 | physical hardware for virtualized hardware. These are most commonly found in | |
298 | virtualized GPU setups such as Intel's GVT-g and NVIDIA's vGPUs used in their | |
299 | GRID technology. | |
300 | ||
301 | With this, a physical card is able to create virtual cards, similar to SR-IOV. | |
302 | The difference is that mediated devices do not appear as PCI(e) devices in the | |
303 | host, and as such are only suited for use in virtual machines. | |
304 | ||
305 | ||
306 | Host Configuration | |
307 | ^^^^^^^^^^^^^^^^^^ | |
308 | ||
309 | In general, your card's driver must support that feature, otherwise it will | |
310 | not work. So please refer to your vendor for compatible drivers and how to | |
311 | configure them. | |
312 | ||
313 | Intel's drivers for GVT-g are integrated in the kernel and should work | |
314 | with 5th, 6th and 7th generation Intel Core Processors, as well as E3 v4, E3 | |
315 | v5 and E3 v6 Xeon Processors. | |
316 | ||
317 | To enable it for Intel Graphics, you have to make sure to load the module | |
318 | 'kvmgt' (for example via `/etc/modules`) and add the following parameter to the | |
319 | xref:sysboot_edit_kernel_cmdline[kernel command line]: | |
320 | ||
321 | ---- | |
322 | i915.enable_gvt=1 | |
323 | ---- | |
324 | ||
325 | After that remember to | |
326 | xref:qm_pci_passthrough_update_initramfs[update the `initramfs`], | |
327 | and reboot your host. | |
328 | ||
329 | VM Configuration | |
330 | ^^^^^^^^^^^^^^^^ | |
331 | ||
332 | To use a mediated device, simply specify the `mdev` property on a `hostpciX` | |
333 | VM configuration option. | |
334 | ||
335 | You can get the supported devices via the 'sysfs'. For example, to list the | |
336 | supported types for the device '0000:00:02.0' you would simply execute: | |
337 | ||
338 | ---- | |
339 | # ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types | |
340 | ---- | |
341 | ||
342 | Each entry is a directory which contains the following important files: | |
343 | ||
344 | * 'available_instances' contains the number of instances of this type that are | |
345 | still available; each 'mdev' used in a VM reduces this. | |
346 | * 'description' contains a short description of the capabilities of the type. | |
347 | * 'create' is the endpoint to create such a device. {pve} does this | |
348 | automatically for you, if a 'hostpciX' option with `mdev` is configured. | |
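The files described above can be summarized in one pass. The following is a minimal sketch; the function name `list_mdev_types` is hypothetical, and it prints only the first line of each `description` file.

```shell
#!/bin/sh
# list_mdev_types DEVICE_DIR
# DEVICE_DIR is the sysfs directory of the physical device,
# for example /sys/bus/pci/devices/0000:00:02.0
# Prints one tab-separated line per type: name, available instances, description.
list_mdev_types() {
    types_dir=$1/mdev_supported_types
    for type_dir in "$types_dir"/*; do
        # Skip the unexpanded pattern if the device offers no mdev types.
        [ -d "$type_dir" ] || continue
        avail=$(cat "$type_dir/available_instances" 2>/dev/null)
        desc=$(head -n 1 "$type_dir/description" 2>/dev/null)
        printf '%s\tavailable=%s\t%s\n' "${type_dir##*/}" "${avail:-?}" "$desc"
    done
}
```

For example, `list_mdev_types /sys/bus/pci/devices/0000:00:02.0` lists the types that can be used as the `mdev` value for that device. A type with `available=0` cannot be assigned to another VM at the moment.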
349 | ||
350 | Example configuration with an `Intel GVT-g vGPU` (`Intel Skylake 6700k`): | |
351 | ||
352 | ---- | |
353 | # qm set VMID -hostpci0 00:02.0,mdev=i915-GVTg_V5_4 | |
354 | ---- | |
355 | ||
356 | With this set, {pve} automatically creates such a device on VM start, and | |
357 | cleans it up again when the VM stops. | |
358 | ||
359 | ifdef::wiki[] | |
360 | ||
361 | See Also | |
362 | ~~~~~~~~ | |
363 | ||
364 | * link:/wiki/Pci_passthrough[PCI Passthrough Examples] | |
365 | ||
366 | endif::wiki[] |