Commit | Line | Data |
---|---|---|
6e4c46c4 DC |
1 | [[qm_pci_passthrough]] |
2 | PCI(e) Passthrough | |
3 | ------------------ | |
4 | ||
5 | PCI(e) passthrough is a mechanism to give a virtual machine control over | |
49f20f1b TL |
6 | a PCI device from the host. This can have some advantages over using |
7 | virtualized hardware, for example lower latency, higher performance, or more | |
8 | features (e.g., offloading). | |
6e4c46c4 | 9 | |
49f20f1b | 10 | But, if you pass through a device to a virtual machine, you cannot use that |
6e4c46c4 DC |
11 | device anymore on the host or in any other VM. |
12 | ||
13 | General Requirements | |
14 | ~~~~~~~~~~~~~~~~~~~~ | |
15 | ||
16 | Since passthrough is a feature which also needs hardware support, there are | |
49f20f1b TL |
17 | some requirements to check and preparations to be done to make it work. |
18 | ||
6e4c46c4 DC |
19 | |
20 | Hardware | |
21 | ^^^^^^^^ | |
49f20f1b TL |
22 | Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement |
23 | **U**nit) interrupt remapping; this includes the CPU and the mainboard. | |
6e4c46c4 | 24 | |
49f20f1b TL |
25 | Generally, Intel systems with VT-d, and AMD systems with AMD-Vi support this. |
26 | But it is not guaranteed that everything will work out of the box, due | |
27 | to bad hardware implementation and missing or low quality drivers. | |
6e4c46c4 | 28 | |
49f20f1b | 29 | Further, server grade hardware often has better support than consumer grade |
6e4c46c4 DC |
30 | hardware, but even then, many modern systems can support this. |
31 | ||
49f20f1b TL |
32 | Please refer to your hardware vendor to check if they support this feature |
33 | under Linux for your specific setup. | |
34 | ||
6e4c46c4 DC |
35 | |
36 | Configuration | |
37 | ^^^^^^^^^^^^^ | |
38 | ||
49f20f1b TL |
39 | Once you have ensured that your hardware supports passthrough, you will need to do |
40 | some configuration to enable PCI(e) passthrough. | |
6e4c46c4 | 41 | |
6e4c46c4 | 42 | |
49f20f1b TL |
43 | IOMMU |
44 | +++++ | |
6e4c46c4 | 45 | |
49f20f1b TL |
46 | The IOMMU has to be activated on the kernel command line. The easiest way is to |
47 | enable it through GRUB. Edit `'/etc/default/grub'' and add the following to the | |
48 | 'GRUB_CMDLINE_LINUX_DEFAULT' variable: | |
6e4c46c4 | 49 | |
49f20f1b TL |
50 | * for Intel CPUs: |
51 | + | |
52 | ---- | |
53 | intel_iommu=on | |
54 | ---- | |
55 | * for AMD CPUs: | |
56 | + | |
57 | ---- | |
6e4c46c4 | 58 | amd_iommu=on |
49f20f1b | 59 | ---- |
6e4c46c4 | 60 | |
49f20f1b TL |
61 | To put this change into effect, make sure you run: |
62 | ||
63 | ---- | |
64 | # update-grub | |
65 | ---- | |
6e4c46c4 | 66 | |
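As a rough sketch, the choice between the two parameters above can be derived from the `vendor_id` field in `/proc/cpuinfo`. The helper name `iommu_kernel_param` is hypothetical and not part of {pve}; it only prints the parameter and leaves editing `/etc/default/grub` to you.

```shell
# Print the IOMMU kernel parameter for a given CPU vendor string.
# $1: vendor_id as reported in /proc/cpuinfo (GenuineIntel or AuthenticAMD)
iommu_kernel_param() {
    case "$1" in
        GenuineIntel) echo "intel_iommu=on" ;;
        AuthenticAMD) echo "amd_iommu=on" ;;
        *)
            echo "unknown CPU vendor: $1" >&2
            return 1
            ;;
    esac
}

# Possible usage on the host (left commented out on purpose):
#   vendor=$(awk -F': ' '/^vendor_id/ { print $2; exit }' /proc/cpuinfo)
#   iommu_kernel_param "$vendor"
```

The printed value still has to be appended to 'GRUB_CMDLINE_LINUX_DEFAULT' by hand, followed by `update-grub`.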
49f20f1b TL |
67 | Kernel Modules |
68 | ++++++++++++++ | |
6e4c46c4 | 69 | |
49f20f1b TL |
70 | You have to make sure the following modules are loaded. This can be achieved by |
71 | adding them to `'/etc/modules'': | |
6e4c46c4 | 72 | |
49f20f1b | 73 | ---- |
6e4c46c4 DC |
74 | vfio |
75 | vfio_iommu_type1 | |
76 | vfio_pci | |
77 | vfio_virqfd | |
49f20f1b | 78 | ---- |
6e4c46c4 | 79 | |
49f20f1b | 80 | [[qm_pci_passthrough_update_initramfs]] |
6e4c46c4 | 81 | After changing anything module-related, you need to refresh your |
49f20f1b | 82 | `initramfs`. On {pve} this can be done by executing: |
6e4c46c4 DC |
83 | |
84 | ---- | |
49f20f1b | 85 | # update-initramfs -u -k all |
6e4c46c4 DC |
86 | ---- |
87 | ||
49f20f1b TL |
88 | Finish Configuration |
89 | ++++++++++++++++++++ | |
90 | ||
91 | Finally reboot to bring the changes into effect and check that it is indeed | |
92 | enabled. | |
6e4c46c4 DC |
93 | |
94 | ---- | |
49f20f1b | 95 | # dmesg | grep -e DMAR -e IOMMU -e AMD-Vi |
6e4c46c4 DC |
96 | ---- |
97 | ||
49f20f1b TL |
98 | The output should show that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is |
99 | enabled; the exact message can vary depending on hardware and kernel. | |
6e4c46c4 DC |
100 | |
101 | It is also important that the device(s) you want to pass through | |
49f20f1b | 102 | are in a *separate* `IOMMU` group. This can be checked with: |
6e4c46c4 DC |
103 | |
104 | ---- | |
49f20f1b | 105 | # find /sys/kernel/iommu_groups/ -type l |
6e4c46c4 DC |
106 | ---- |
107 | ||
49f20f1b | 108 | It is okay if the device is in an `IOMMU` group together with its functions |
6e4c46c4 DC |
109 | (e.g. a GPU with the HDMI Audio device) or with its root port or PCI(e) bridge. |
110 | ||
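The raw `find` output can be hard to scan on hosts with many devices. The following is a minimal sketch that prints one `group N: ADDRESS` line per device, sorted by group number. The function name `list_iommu_groups` and its optional root-directory argument are assumptions made for illustration.

```shell
# List every device together with its IOMMU group number.
# $1: optional root directory (defaults to the real sysfs location)
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob pattern if no groups exist.
        [ -e "$dev" ] || continue
        group="${dev#"$root"/}"   # strip the root prefix
        group="${group%%/*}"      # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done | LC_ALL=C sort -k2,2n
}
```

Running `list_iommu_groups` without arguments on the host shows at a glance whether the device you want to pass through shares its group with unrelated devices.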
111 | .PCI(e) slots | |
112 | [NOTE] | |
113 | ==== | |
49f20f1b TL |
114 | Some platforms handle their physical PCI(e) slots differently. So, sometimes |
115 | it can help to put the card in another PCI(e) slot, if you do not get the | |
116 | desired `IOMMU` group separation. | |
6e4c46c4 DC |
117 | ==== |
118 | ||
119 | .Unsafe interrupts | |
120 | [NOTE] | |
121 | ==== | |
122 | For some platforms, it may be necessary to allow unsafe interrupts. | |
49f20f1b TL |
123 | To do so, add the following line to a file ending in `.conf' in |
124 | */etc/modprobe.d/*: | |
6e4c46c4 | 125 | |
49f20f1b | 126 | ---- |
6e4c46c4 | 127 | options vfio_iommu_type1 allow_unsafe_interrupts=1 |
49f20f1b | 128 | ---- |
6e4c46c4 DC |
129 | |
130 | Please be aware that this option can make your system unstable. | |
131 | ==== | |
132 | ||
49f20f1b | 133 | Host Device Passthrough |
6e4c46c4 DC |
134 | ~~~~~~~~~~~~~~~~~~~~~~~ |
135 | ||
136 | The most used variant of PCI(e) passthrough is to pass through a whole | |
49f20f1b TL |
137 | PCI(e) card, for example a GPU or a network card. |
138 | ||
6e4c46c4 DC |
139 | |
140 | Host Configuration | |
141 | ^^^^^^^^^^^^^^^^^^ | |
142 | ||
49f20f1b TL |
143 | In this case, the host cannot use the card. There are two methods to achieve |
144 | this: | |
6e4c46c4 | 145 | |
49f20f1b TL |
146 | * pass the device IDs to the options of the 'vfio-pci' module by adding |
147 | + | |
148 | ---- | |
6e4c46c4 | 149 | options vfio-pci ids=1234:5678,4321:8765 |
6e4c46c4 | 150 | ---- |
49f20f1b TL |
151 | + |
152 | to a .conf file in */etc/modprobe.d/* where `1234:5678` and `4321:8765` are | |
153 | the vendor and device IDs obtained by: | |
154 | + | |
155 | ---- | |
156 | # lspci -nn | |
6e4c46c4 DC |
157 | ---- |
158 | ||
49f20f1b TL |
159 | * blacklist the driver completely on the host, ensuring that it is free to bind |
160 | for passthrough, with | |
161 | + | |
162 | ---- | |
6e4c46c4 | 163 | blacklist DRIVERNAME |
49f20f1b TL |
164 | ---- |
165 | + | |
166 | in a .conf file in */etc/modprobe.d/*. | |
6e4c46c4 | 167 | |
49f20f1b TL |
168 | For both methods you need to |
169 | xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and | |
170 | reboot after that. | |
6e4c46c4 | 171 | |
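For the first method, the vendor and device IDs appear in square brackets at the end of each `lspci -nn` line. A minimal sketch for turning that output into the `options` line is shown below; the helper name `vfio_ids_line` is hypothetical, and the target file name under */etc/modprobe.d/* is up to you.

```shell
# Read `lspci -nn` lines on stdin and print a matching vfio-pci options line.
vfio_ids_line() {
    # Extract the [vendor:device] pair from each line, then join with commas.
    ids=$(sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' | paste -sd, -)
    if [ -n "$ids" ]; then
        printf 'options vfio-pci ids=%s\n' "$ids"
    else
        echo "no vendor:device IDs found in input" >&2
        return 1
    fi
}

# Possible usage on the host (left commented out on purpose):
#   lspci -nn -s 01:00 | vfio_ids_line
```

Review the generated line before saving it, since every device with a listed ID will be claimed by 'vfio-pci'.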
49f20f1b | 172 | [[qm_pci_passthrough_vm_config]] |
6e4c46c4 DC |
173 | VM Configuration |
174 | ^^^^^^^^^^^^^^^^ | |
49f20f1b TL |
175 | To pass through the device you need to set the *hostpciX* option in the VM |
176 | configuration, for example by executing: | |
6e4c46c4 DC |
177 | |
178 | ---- | |
49f20f1b | 179 | # qm set VMID -hostpci0 00:02.0 |
6e4c46c4 DC |
180 | ---- |
181 | ||
182 | If your device has multiple functions, you can pass them through all together | |
49f20f1b | 183 | with the shortened syntax ``00:02`'. |
6e4c46c4 DC |
184 | |
185 | There are some options which may be necessary, depending on the device | |
49f20f1b TL |
186 | and guest OS: |
187 | ||
188 | * *x-vga=on|off* marks the PCI(e) device as the primary GPU of the VM. | |
189 | With this enabled the *vga* configuration option will be ignored. | |
6e4c46c4 | 190 | |
6e4c46c4 | 191 | * *pcie=on|off* tells {pve} to use a PCIe or PCI port. Some guest/device |
49f20f1b TL |
192 | combinations require PCIe rather than PCI. PCIe is only available for 'q35' |
193 | machine types. | |
194 | ||
6e4c46c4 DC |
195 | * *rombar=on|off* makes the firmware ROM visible for the guest. Default is on. |
196 | Some PCI(e) devices need this disabled. | |
49f20f1b | 197 | |
6e4c46c4 | 198 | * *romfile=<path>* is an optional path to a ROM file for the device to use. |
49f20f1b TL |
199 | This is a relative path under */usr/share/kvm/*. |
200 | ||
201 | Example | |
202 | +++++++ | |
6e4c46c4 DC |
203 | |
204 | An example of PCIe passthrough with a GPU set to primary: | |
205 | ||
206 | ---- | |
49f20f1b | 207 | # qm set VMID -hostpci0 02:00,pcie=on,x-vga=on |
6e4c46c4 DC |
208 | ---- |
209 | ||
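The value passed to *hostpciX* is simply the PCI address followed by comma-separated options. As an illustration only, a small helper could assemble it; `hostpci_value` is a hypothetical name, and the sketch assumes you want all functions of the device, so it drops a trailing function number.

```shell
# Build a hostpciX value from a PCI address and optional flags.
# $1: PCI address, with or without function (e.g. 02:00.0 or 02:00)
# remaining arguments: options such as pcie=on or x-vga=on
hostpci_value() {
    value="${1%.*}"   # "02:00.0" -> "02:00"; "02:00" stays unchanged
    shift
    for opt in "$@"; do
        value="$value,$opt"
    done
    printf '%s\n' "$value"
}

# Possible usage on the host (left commented out on purpose):
#   qm set VMID -hostpci0 "$(hostpci_value 02:00.0 pcie=on x-vga=on)"
```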
49f20f1b | 210 | |
6e4c46c4 DC |
211 | Other considerations |
212 | ^^^^^^^^^^^^^^^^^^^^ | |
213 | ||
214 | When passing through a GPU, the best compatibility is reached when using | |
49f20f1b TL |
215 | 'q35' as machine type, 'OVMF' ('EFI' for VMs) instead of SeaBIOS and PCIe |
216 | instead of PCI. Note that if you want to use 'OVMF' for GPU passthrough, the | |
217 | GPU needs to have an EFI capable ROM, otherwise use SeaBIOS instead. | |
6e4c46c4 DC |
218 | |
219 | SR-IOV | |
220 | ~~~~~~ | |
221 | ||
49f20f1b TL |
222 | Another variant for passing through PCI(e) devices is to use the hardware |
223 | virtualization features of your devices, if available. | |
224 | ||
225 | 'SR-IOV' (**S**ingle-**R**oot **I**nput/**O**utput **V**irtualization) enables | |
226 | a single device to provide multiple 'VF' (**V**irtual **F**unctions) to the | |
227 | system. Each of those 'VF' can be used in a different VM, with full hardware | |
228 | features and also better performance and lower latency than software | |
229 | virtualized devices. | |
6e4c46c4 | 230 | |
49f20f1b TL |
231 | Currently, the most common use case for this is NICs (**N**etwork |
232 | **I**nterface **C**ard) with SR-IOV support, which can provide multiple VFs per | |
233 | physical port. This allows features such as checksum offloading to | |
234 | be used inside a VM, reducing the (host) CPU overhead. | |
6e4c46c4 | 235 | |
6e4c46c4 DC |
236 | |
237 | Host Configuration | |
238 | ^^^^^^^^^^^^^^^^^^ | |
239 | ||
49f20f1b | 240 | Generally, there are two methods for enabling virtual functions on a device. |
6e4c46c4 | 241 | |
49f20f1b | 242 | * sometimes there is an option for the driver module e.g. for some |
6e4c46c4 | 243 | Intel drivers |
49f20f1b TL |
244 | + |
245 | ---- | |
6e4c46c4 | 246 | max_vfs=4 |
49f20f1b TL |
247 | ---- |
248 | + | |
249 | which could be put in a file with a '.conf' ending under */etc/modprobe.d/*. | |
6e4c46c4 | 250 | (Do not forget to update your initramfs after that) |
49f20f1b | 251 | + |
6e4c46c4 DC |
252 | Please refer to your driver module documentation for the exact |
253 | parameters and options. | |
254 | ||
49f20f1b TL |
255 | * The second, more generic, approach is using the `sysfs`. |
256 | If a device and driver support this, you can change the number of VFs on | |
257 | the fly. For example, to set up 4 VFs on device 0000:01:00.0 execute: | |
258 | + | |
6e4c46c4 | 259 | ---- |
49f20f1b | 260 | # echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs |
6e4c46c4 | 261 | ---- |
49f20f1b TL |
262 | + |
263 | To make this change persistent you can use the `sysfsutils` Debian package. | |
264 | After installation configure it via */etc/sysfs.conf* or a `FILE.conf' in | |
265 | */etc/sysfs.d/*. | |
6e4c46c4 DC |
266 | |
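The `sysfs` approach can be wrapped in a small helper that first checks the device limit in `sriov_totalvfs`. This is a sketch under assumptions: the name `set_numvfs` and its optional third argument (an alternative device root) are hypothetical. Note that it resets the count to 0 first, because the kernel refuses to change one non-zero VF count directly to another; this removes any existing VFs, so do not run it while VMs are using them.

```shell
# Set the number of virtual functions for a device via sysfs.
# $1: PCI address (e.g. 0000:01:00.0)
# $2: desired number of VFs
# $3: optional device root (defaults to /sys/bus/pci/devices)
set_numvfs() {
    dev_dir="${3:-/sys/bus/pci/devices}/$1"
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    if [ "$2" -gt "$total" ]; then
        echo "device $1 supports at most $total VFs" >&2
        return 1
    fi
    # Disable existing VFs before writing the new count.
    echo 0 > "$dev_dir/sriov_numvfs" &&
        echo "$2" > "$dev_dir/sriov_numvfs"
}

# Possible usage on the host (left commented out on purpose):
#   set_numvfs 0000:01:00.0 4
```

For the persistent variant with `sysfsutils`, the equivalent entry is expected to be a line of the form `bus/pci/devices/0000:01:00.0/sriov_numvfs = 4`, with the path given relative to */sys/*; check the package documentation for the exact syntax.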
267 | VM Configuration | |
268 | ^^^^^^^^^^^^^^^^ | |
269 | ||
49f20f1b TL |
270 | After creating VFs, you should see them as separate PCI(e) devices when |
271 | outputting them with `lspci`. Get their ID and pass them through like a | |
272 | xref:qm_pci_passthrough_vm_config[normal PCI(e) device]. | |
6e4c46c4 DC |
273 | |
274 | Other considerations | |
275 | ^^^^^^^^^^^^^^^^^^^^ | |
276 | ||
277 | For this feature, platform support is especially important. It may be necessary | |
49f20f1b TL |
278 | to enable this feature in the BIOS/EFI first, or to use a specific PCI(e) port |
279 | for it to work. If in doubt, consult the manual of the platform or contact its | |
280 | vendor. |