[[qm_pci_passthrough]]
PCI(e) Passthrough
------------------

PCI(e) passthrough is a mechanism to give a virtual machine control over
a PCI device from the host. This can have some advantages over using
virtualized hardware, for example lower latency, higher performance, or more
features (e.g., offloading).

But, if you pass through a device to a virtual machine, you cannot use that
device anymore on the host or in any other VM.

General Requirements
~~~~~~~~~~~~~~~~~~~~

Since passthrough is a feature which also needs hardware support, there are
some requirements to check and preparations to be done to make it work.


Hardware
^^^^^^^^
Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement
**U**nit) interrupt remapping; this includes the CPU and the mainboard.

Generally, Intel systems with VT-d, and AMD systems with AMD-Vi support this.
But it is not guaranteed that everything will work out of the box, due
to bad hardware implementations and missing or low quality drivers.

Further, server grade hardware often has better support than consumer grade
hardware, but even then, many modern systems can support this.

Please refer to your hardware vendor to check if they support this feature
under Linux for your specific setup.


Configuration
^^^^^^^^^^^^^

Once you have ensured that your hardware supports passthrough, you will need
to do some configuration to enable PCI(e) passthrough.


.IOMMU

The IOMMU has to be activated on the kernel command line. The easiest way is to
enable it through GRUB. Edit '/etc/default/grub' and add the following to the
'GRUB_CMDLINE_LINUX_DEFAULT' variable:

* for Intel CPUs:
+
----
intel_iommu=on
----
* for AMD CPUs:
+
----
amd_iommu=on
----

[[qm_pci_passthrough_update_grub]]
To bring this change into effect, make sure you run:

----
# update-grub
----

.Kernel Modules

You have to make sure the following modules are loaded. This can be achieved by
adding them to '/etc/modules':

----
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
----

[[qm_pci_passthrough_update_initramfs]]
After changing anything related to modules, you need to refresh your
`initramfs`. On {pve} this can be done by executing:

----
# update-initramfs -u -k all
----

.Finish Configuration

Finally reboot to bring the changes into effect and check that it is indeed
enabled.

----
# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
----

should display that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is
enabled; depending on hardware and kernel, the exact message can vary.

It is also important that the device(s) you want to pass through
are in a *separate* `IOMMU` group. This can be checked with:

----
# find /sys/kernel/iommu_groups/ -type l
----

It is okay if the device is in an `IOMMU` group together with its functions
(e.g. a GPU with the HDMI Audio device) or with its root port or PCI(e) bridge.
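To see at a glance which devices share a group, the symlink paths printed by
`find` can be rearranged into readable `group: device` lines. This is a minimal
sketch (the `list_groups` helper name is made up); on a real host you could
additionally run `lspci -nns` on each device address for a human-readable
description:

----
# Turn iommu_groups symlink paths (one per line on stdin) into
# "IOMMU group <N>: <device>" lines.
list_groups() {
    awk -F/ '/iommu_groups/ { print "IOMMU group " $(NF-2) ": " $NF }'
}

find /sys/kernel/iommu_groups/ -type l 2>/dev/null | list_groups
----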

.PCI(e) slots
[NOTE]
====
Some platforms handle their physical PCI(e) slots differently. So, sometimes
it can help to put the card into another PCI(e) slot, if you do not get the
desired `IOMMU` group separation.
====

.Unsafe interrupts
[NOTE]
====
For some platforms, it may be necessary to allow unsafe interrupts.
For this add the following line to a file ending with '.conf' in
*/etc/modprobe.d/*:

----
options vfio_iommu_type1 allow_unsafe_interrupts=1
----

Please be aware that this option can make your system unstable.
====

Host Device Passthrough
~~~~~~~~~~~~~~~~~~~~~~~

The most used variant of PCI(e) passthrough is to pass through a whole
PCI(e) card, for example a GPU or a network card.


Host Configuration
^^^^^^^^^^^^^^^^^^

In this case, the host cannot use the card. There are two methods to achieve
this:

* pass the device IDs to the options of the 'vfio-pci' module by adding
+
----
options vfio-pci ids=1234:5678,4321:8765
----
+
to a .conf file in */etc/modprobe.d/* where `1234:5678` and `4321:8765` are
the vendor and device IDs obtained by:
+
----
# lspci -nn
----

* blacklist the driver completely on the host, ensuring that it is free to bind
for passthrough, with
+
----
blacklist DRIVERNAME
----
+
in a .conf file in */etc/modprobe.d/*.

For both methods you need to
xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and
reboot after that.
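Since `lspci -nn` prints the numeric IDs in square brackets at the end of each
line, they can be extracted with a short filter. This is only a sketch,
assuming the usual `lspci -nn` output format; the `vfio_ids` helper name and
the commented usage line are made up:

----
# Pull the [vendor:device] ID pair out of lspci -nn output lines.
vfio_ids() {
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# On a real host, e.g. for all functions of the card in slot 01:00:
# lspci -nn -s 01:00 | vfio_ids
----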

[[qm_pci_passthrough_vm_config]]
VM Configuration
^^^^^^^^^^^^^^^^
To pass through the device you need to set the *hostpciX* option in the VM
configuration, for example by executing:

----
# qm set VMID -hostpci0 00:02.0
----

If your device has multiple functions, you can pass them all through together
with the shortened syntax ``00:02`'.

There are some options which may be necessary, depending on the device
and guest OS:

* *x-vga=on|off* marks the PCI(e) device as the primary GPU of the VM.
With this enabled the *vga* configuration option will be ignored.

* *pcie=on|off* tells {pve} to use a PCIe or PCI port. Some guest/device
combinations require PCIe rather than PCI. PCIe is only available for 'q35'
machine types.

* *rombar=on|off* makes the firmware ROM visible for the guest. Default is on.
Some PCI(e) devices need this disabled.

* *romfile=<path>* is an optional path to a ROM file for the device to use.
This is a relative path under */usr/share/kvm/*.

.Example

An example of PCIe passthrough with a GPU set to primary:

----
# qm set VMID -hostpci0 02:00,pcie=on,x-vga=on
----


Other considerations
^^^^^^^^^^^^^^^^^^^^

When passing through a GPU, the best compatibility is reached when using
'q35' as machine type, 'OVMF' ('EFI' for VMs) instead of SeaBIOS and PCIe
instead of PCI. Note that if you want to use 'OVMF' for GPU passthrough, the
GPU needs to have an EFI capable ROM, otherwise use SeaBIOS instead.

SR-IOV
~~~~~~

Another variant for passing through PCI(e) devices is to use the hardware
virtualization features of your devices, if available.

'SR-IOV' (**S**ingle-**R**oot **I**nput/**O**utput **V**irtualization) enables
a single device to provide multiple 'VF' (**V**irtual **F**unctions) to the
system. Each of those VFs can be used in a different VM, with full hardware
features and also better performance and lower latency than software
virtualized devices.

Currently, the most common use case for this are NICs (**N**etwork
**I**nterface **C**ard) with SR-IOV support, which can provide multiple VFs per
physical port. This allows features such as checksum offloading to be used
inside a VM, reducing the (host) CPU overhead.


Host Configuration
^^^^^^^^^^^^^^^^^^

Generally, there are two methods for enabling virtual functions on a device.

* sometimes there is an option for the driver module, e.g. for some
Intel drivers:
+
----
max_vfs=4
----
+
which could be put in a file with a '.conf' ending under */etc/modprobe.d/*.
(Do not forget to update your initramfs after that.)
+
Please refer to your driver module documentation for the exact
parameters and options.

* The second, more generic, approach is using `sysfs`.
If a device and driver support this, you can change the number of VFs on
the fly. For example, to set up 4 VFs on device 0000:01:00.0 execute:
+
----
# echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
----
+
To make this change persistent, you can use the `sysfsutils` Debian package.
After installation, configure it via */etc/sysfs.conf* or a 'FILE.conf' in
*/etc/sysfs.d/*.
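For example, to recreate the 4 VFs from above on each boot, a `sysfs.conf`-style
entry such as the following could be used (the file name and device address are
only examples; attribute paths in this format are relative to */sys*):

----
# /etc/sysfs.d/sriov.conf (example file name)
bus/pci/devices/0000:01:00.0/sriov_numvfs = 4
----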

VM Configuration
^^^^^^^^^^^^^^^^

After creating VFs, you should see them as separate PCI(e) devices when
outputting them with `lspci`. Get their ID and pass them through like a
xref:qm_pci_passthrough_vm_config[normal PCI(e) device].
6e4c46c4 DC |
270 | |
271 | Other considerations | |
272 | ^^^^^^^^^^^^^^^^^^^^ | |
273 | ||
274 | For this feature, platform support is especially important. It may be necessary | |
49f20f1b TL |
275 | to enable this feature in the BIOS/EFI first, or to use a specific PCI(e) port |
276 | for it to work. In doubt, consult the manual of the platform or contact its | |
277 | vendor. | |

Mediated Devices (vGPU, GVT-g)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mediated devices are another method to reuse features and performance from
physical hardware for virtualized hardware. These are found most commonly in
virtualized GPU setups, such as Intel's GVT-g and NVIDIA's vGPUs used in their
GRID technology.

With this, a physical card is able to create virtual cards, similar to SR-IOV.
The difference is that mediated devices do not appear as PCI(e) devices in the
host, and are as such only suited for use in virtual machines.


Host Configuration
^^^^^^^^^^^^^^^^^^

In general, your card's driver must support that feature, otherwise it will
not work. So please refer to your vendor for compatible drivers and how to
configure them.

Intel's drivers for GVT-g are integrated in the kernel and should work
with the 5th, 6th and 7th generation Intel Core processors; furthermore, the
E3 v4, E3 v5 and E3 v6 Xeon processors are supported.

To enable it for Intel graphics, you have to make sure to load the module
'kvmgt' (for example via '/etc/modules') and to enable it on the kernel
command line. For this you can edit '/etc/default/grub' and add the following
to the 'GRUB_CMDLINE_LINUX_DEFAULT' variable:

----
i915.enable_gvt=1
----

After that remember to
xref:qm_pci_passthrough_update_initramfs[update the `initramfs`],
xref:qm_pci_passthrough_update_grub[update grub] and
reboot your host.

VM Configuration
^^^^^^^^^^^^^^^^

To use a mediated device, simply specify the `mdev` property in a `hostpciX`
VM configuration option.

You can get the supported devices via 'sysfs'. For example, to list the
supported types for the device '0000:00:02.0' you would simply execute:

----
# ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types
----

Each entry is a directory which contains the following important files:

* 'available_instances' contains the number of still available instances of
this type; each 'mdev' use in a VM reduces this.
* 'description' contains a short description about the capabilities of the type.
* 'create' is the endpoint to create such a device; {pve} does this
automatically for you, if a 'hostpciX' option with `mdev` is configured.
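As an illustration of how these files fit together, the following sketch picks
the first type that still has a free instance. The `first_free_type` helper is
made up and reads `type count` pairs from standard input, so on a real host you
would feed it the values collected from the directories above:

----
# Print the first mdev type that still has free instances.
# Input: lines of the form "<type> <available_instances>".
first_free_type() {
    awk '$2 > 0 { print $1; exit }'
}

# On a real host (illustrative, adjust the PCI address):
# for t in /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/*; do
#     echo "$(basename "$t") $(cat "$t/available_instances")"
# done | first_free_type
----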

Example configuration with an `Intel GVT-g vGPU` (`Intel Skylake 6700k`):

----
# qm set VMID -hostpci0 00:02.0,mdev=i915-GVTg_V5_4
----

With this set, {pve} automatically creates such a device on VM start, and
cleans it up again when the VM stops.