PCI EXPRESS GUIDELINES
======================

1. Introduction
===============
This document proposes best practices for using PCI Express and PCI devices
in PCI Express based machines and explains the reasoning behind them.

The following presentations accompany this document:
 (1) Q35 overview.
     http://wiki.qemu.org/images/4/4e/Q35.pdf
 (2) A comparison between PCI and PCI Express technologies.
     http://wiki.qemu.org/images/f/f6/PCIvsPCIe.pdf

Note: The usage examples are not intended to replace the full
documentation; use the QEMU help to retrieve all options.

2. Device placement strategy
============================
QEMU does not have a clear socket-device matching mechanism
and allows any PCI/PCI Express device to be plugged into any
PCI/PCI Express slot.
Plugging a PCI device into a PCI Express slot might not always work and
has no "bare metal" equivalent, since it cannot be done on physical hardware.
Plugging a PCI Express device into a PCI slot hides the Extended
Configuration Space and is therefore also not recommended.

The recommendation is to separate the PCI Express and PCI hierarchies.
PCI Express devices should be plugged only into PCI Express Root Ports and
PCI Express Downstream Ports.

2.1 Root Bus (pcie.0)
=====================
Place only the following kinds of devices directly on the Root Complex:
    (1) PCI Devices (e.g. network card, graphics card, IDE controller),
        not controllers. Place only legacy PCI devices on
        the Root Complex. These will be considered Integrated Endpoints.
        Note: Integrated Endpoints are not hot-pluggable.

        Although the PCI Express spec does not forbid PCI Express devices as
        Integrated Endpoints, existing hardware mostly integrates legacy PCI
        devices with the Root Complex. Guest OSes are suspected to behave
        strangely when PCI Express devices are integrated
        with the Root Complex.

    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
        hierarchies.

    (3) DMI-PCI Bridges (i82801b11-bridge), for starting legacy PCI
        hierarchies.

    (4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses
        are needed.

   pcie.0 bus
   ----------------------------------------------------------------------------
        |                |                    |                  |
   -----------   ------------------   ------------------   --------------
   | PCI Dev |   | PCIe Root Port |   | DMI-PCI Bridge |   |  pxb-pcie  |
   -----------   ------------------   ------------------   --------------

2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use:
      -device <dev>[,bus=pcie.0]
2.1.2 To expose a new PCI Express Root Bus use:
      -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
      Only PCI Express Root Ports and DMI-PCI Bridges can be connected
      to the pcie.1 bus. The bus must be named explicitly, otherwise
      the device is placed on pcie.0:
      -device ioh3420,id=root_port1,bus=pcie.1[,chassis=x][,slot=y][,addr=z] \
      -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1

2.2 PCI Express only hierarchy
==============================
Always use PCI Express Root Ports to start PCI Express hierarchies.

A PCI Express Root Bus supports up to 32 devices. Since each
PCI Express Root Port is a function and a multi-function
device may support up to 8 functions, the maximum possible
number of PCI Express Root Ports per PCI Express Root Bus is 256.

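The limit above is simple arithmetic, sketched here in Python. The constant and
function names are illustrative only; the figures (32 slots, 8 functions) come
from the paragraph above. Slots already taken by other devices on the root bus
lower the result accordingly.

```python
# Figures from the text: 32 device slots per root bus,
# up to 8 functions per multi-function device.
SLOTS_PER_BUS = 32
FUNCTIONS_PER_SLOT = 8

def max_root_ports(free_slots=SLOTS_PER_BUS, functions=FUNCTIONS_PER_SLOT):
    """Upper bound on Root Ports if every free slot holds a full multi-function group."""
    return free_slots * functions

print(max_root_ports())               # all 32 slots free
print(max_root_ports(free_slots=28))  # e.g. 4 slots used by other devices
```
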
Prefer grouping PCI Express Root Ports into multi-function devices
to keep a simple flat hierarchy that is enough for most scenarios.
Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
if there is no more room for PCI Express Root Ports.
See section 4 for further justification.

Plug only PCI Express devices into PCI Express Ports.

   pcie.0 bus
   ----------------------------------------------------------------------------------
        |                 |                                    |
   -------------    -------------                        -------------
   | Root Port |    | Root Port |                        | Root Port |
   -------------    -------------                        -------------
         |                            -------------------------|------------------------
    ------------                      |                 -----------------              |
    | PCIe Dev |                      |    PCI Express  | Upstream Port |              |
    ------------                      |      Switch     -----------------              |
                                      |                  |            |                |
                                      |    -------------------    -------------------  |
                                      |    | Downstream Port |    | Downstream Port |  |
                                      |    -------------------    -------------------  |
                                      -------------|-----------------------|------------
                                             ------------
                                             | PCIe Dev |
                                             ------------

2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
      -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
      -device <dev>,bus=root_port1
2.2.2 Using multi-function PCI Express Root Ports:
      -device ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] \
      -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
      -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0]
2.2.3 Plugging a PCI Express device into a Switch:
      -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] \
      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1] \
      -device <dev>,bus=downstream_port1

Notes:
    - The (slot, chassis) pair is mandatory and must be unique for each
      PCI Express Root Port. slot defaults to 0 when not specified.
    - The 'addr' parameter can be 0 for all the examples above.

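As an illustration of 2.2.2 and the notes above, the following Python sketch
builds the -device arguments for one multi-function group of Root Ports. The
helper name, the id scheme and the choice of deriving chassis from a running
index are assumptions made for this example; only the option names shown in
the examples above are used. Uniqueness of the (slot, chassis) pair is obtained
by giving every port its own chassis and leaving slot at 0.

```python
def root_port_group(pci_slot, first_index, count, bus="pcie.0"):
    """Return -device arguments for `count` ioh3420 Root Ports sharing one bus slot.

    pci_slot    -- device number on the root bus (the 'z' in addr=z.f)
    first_index -- hypothetical running counter used for ids and chassis values
    """
    if not 1 <= count <= 8:
        raise ValueError("a multi-function device has 1 to 8 functions")
    args = []
    for fn in range(count):
        index = first_index + fn
        opts = [
            "ioh3420",
            f"id=root_port{index}",
            f"bus={bus}",
            f"chassis={index}",              # unique chassis per port
            "slot=0",                        # so (slot, chassis) is unique
            f"addr={pci_slot:#x}.{fn:#x}",   # slot.function, written in hex
        ]
        if fn == 0:
            # Function 0 carries multifunction=on, as in example 2.2.2.
            opts.insert(2, "multifunction=on")
        args += ["-device", ",".join(opts)]
    return args

for arg in root_port_group(pci_slot=2, first_index=1, count=3):
    print(arg)
```

The returned list can be appended to an argument vector for the QEMU binary;
a second group would be created with a different pci_slot and a first_index
that continues where the previous group stopped.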
2.3 PCI only hierarchy
======================
Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
but, as mentioned in section 5, doing so means the legacy PCI
device in question cannot be hot-unplugged.
Otherwise, use DMI-PCI Bridges (i82801b11-bridge) in combination
with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.

Prefer flat hierarchies. For most scenarios a single DMI-PCI Bridge
(having 32 slots) and several PCI-PCI Bridges attached to it
(each also supporting 32 slots) will support hundreds of legacy devices.
The recommendation is to populate one PCI-PCI Bridge under the DMI-PCI Bridge
until it is full and then plug in a new PCI-PCI Bridge.

   pcie.0 bus
   ----------------------------------------------
        |                            |
   -----------               ------------------
   | PCI Dev |               | DMI-PCI Bridge |
   -----------               ------------------
                               |            |
                  ------------------    ------------------
                  | PCI-PCI Bridge |    | PCI-PCI Bridge |   ...
                  ------------------    ------------------
                                         |           |
                                  -----------     -----------
                                  | PCI Dev |     | PCI Dev |
                                  -----------     -----------

2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use:
      -device <dev>[,bus=pcie.0]
2.3.2 Plugging a PCI device into a PCI-PCI Bridge:
      -device i82801b11-bridge,id=dmi_pci_bridge1[,bus=pcie.0] \
      -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y] \
      -device <dev>,bus=pci_bridge1[,addr=x]
      Note that 'addr' cannot be 0 unless the shpc=off parameter is passed to
      the PCI Bridge.

3. IO space issues
==================
The PCI Express Root Ports and PCI Express Downstream Ports are seen by
Firmware/Guest OS as PCI-PCI Bridges. As required by the PCI spec, each
such Port should have a 4K IO range reserved for it, even though only one
(multifunction) device can be plugged into each Port. This results in
poor IO space utilization.

The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
by not allocating IO space for each PCI Express Root / PCI Express
Downstream Port if:
    (1) the port is empty, or
    (2) the device behind the port has no IO BARs.

The IO space is very limited, to 65536 byte-wide IO ports, and may even be
fragmented by fixed IO ports owned by platform devices, resulting in at most
10 PCI Express Root Ports or PCI Express Downstream Ports per system
if devices with IO BARs are used in the PCI Express hierarchy. Using the
proposed device placing strategy solves this issue by using only
PCI Express devices within the PCI Express hierarchy.

The PCI Express spec requires that PCI Express devices work properly
without using IO ports. The PCI hierarchy has no such limitations.

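The numbers in this section can be checked with a few lines of Python. This is
only a sketch: the names are illustrative, and the 40K figure for the space
that is usable in practice is the approximate value quoted in section 5.1, not
a measured one.

```python
IO_PORTS = 65536      # byte-wide IO ports available in total (from the text)
WINDOW = 4 * 1024     # 4K IO range reserved per Root Port / Downstream Port

def io_windows(usable_bytes=IO_PORTS, window=WINDOW):
    """Number of whole 4K IO windows that fit into the given IO space."""
    return usable_bytes // window

print(io_windows())           # theoretical ceiling with the full IO space
print(io_windows(40 * 1024))  # with roughly 40K left after fixed platform ports
```

The first value is the ceiling if nothing else used IO ports; the second
matches the limit of about 10 ports given above.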
4. Bus number issues
====================
Each PCI domain can have at most 256 buses, and the QEMU PCI Express
machines do not support multiple PCI domains even if extra Root
Complexes (pxb-pcie) are used.

Each element of the PCI Express hierarchy (Root Complexes,
PCI Express Root Ports, PCI Express Downstream/Upstream Ports)
uses one bus number. Since only one (multifunction) device
can be attached to a PCI Express Root Port or PCI Express Downstream
Port, it is advised to plan in advance for the expected number of
devices to prevent bus number starvation.

Avoiding PCI Express Switches (and thereby striving for a 'flatter' PCI
Express hierarchy) lets the hierarchy avoid spending bus numbers on
Upstream Ports.

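A small Python sketch of the bookkeeping described above, counting only the
bus numbers used below the root bus. It applies the rule that every Root Port,
Upstream Port and Downstream Port uses one bus number; the function names and
the single-switch layout are assumptions chosen for illustration.

```python
def bus_numbers_flat(devices):
    """Each device sits behind its own Root Port: one bus number per device."""
    return devices

def bus_numbers_one_switch(devices):
    """All devices sit behind one switch plugged into one Root Port."""
    root_port = 1               # the Root Port hosting the switch
    upstream_port = 1           # the switch's Upstream Port
    downstream_ports = devices  # one Downstream Port per device
    return root_port + upstream_port + downstream_ports

for n in (1, 8, 32):
    print(n, bus_numbers_flat(n), bus_numbers_one_switch(n))
```
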
The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
number space. All bus numbers assigned to the buses recursively behind a
given pxb-pcie device's root bus must fit between the bus_nr property of
that pxb-pcie device, and the lowest of the higher bus_nr properties
that the command line sets for other pxb-pcie devices.

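The partitioning rule can be made concrete with a short Python sketch. The
function name is hypothetical, and the sketch assumes that pcie.0 itself starts
at bus number 0, so that the lowest bus_nr also bounds the range left for
pcie.0.

```python
def root_bus_ranges(pxb_bus_nrs, last_bus=255):
    """Return {first_bus: last_usable_bus} for pcie.0 and each pxb-pcie device."""
    for nr in pxb_bus_nrs:
        if not 0 < nr <= last_bus:
            raise ValueError(f"bus_nr {nr} is outside 1..{last_bus}")
    # pcie.0 (assumed to start at bus 0) plus every pxb-pcie bus_nr, in order.
    starts = sorted({0, *pxb_bus_nrs})
    if len(starts) != len(pxb_bus_nrs) + 1:
        raise ValueError("bus_nr values must be unique")
    # Each range ends just below the next higher start; the last ends at 255.
    ends = [nxt - 1 for nxt in starts[1:]] + [last_bus]
    return dict(zip(starts, ends))

print(root_bus_ranges([128, 64]))
```

With bus_nr=64 and bus_nr=128 on the command line, the hierarchy behind pcie.0
has to fit into buses 0..63, the first pxb-pcie into 64..127 and the second
into 128..255.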
5. Hot-plug
===========
The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
do not support hot-plug, so any devices plugged into Root Complexes
cannot be hot-plugged/hot-unplugged:
    (1) PCI Express Integrated Endpoints
    (2) PCI Express Root Ports
    (3) DMI-PCI Bridges
    (4) pxb-pcie

Be aware that PCI Express Downstream Ports can't be hot-plugged into
an existing PCI Express Upstream Port.

PCI devices can be hot-plugged into PCI-PCI Bridges. The PCI hot-plug is ACPI
based and can work side by side with the PCI Express native hot-plug.

PCI Express devices can be natively hot-plugged/hot-unplugged into/from
PCI Express Root Ports (and PCI Express Downstream Ports).

5.1 Planning for hot-plug:
    (1) PCI hierarchy
        Leave enough PCI-PCI Bridge slots empty or add one
        or more empty PCI-PCI Bridges to the DMI-PCI Bridge.

        For each such PCI-PCI Bridge the Guest Firmware is expected to reserve
        a 4K IO range and a 2M MMIO range to be used for all devices behind it.

        Because of the hard IO limit of around 10 PCI Bridges (~ 40K space)
        per system, don't use more than 9 PCI-PCI Bridges, leaving 4K for the
        Integrated Endpoints. (The PCI Express hierarchy needs no IO space.)

    (2) PCI Express hierarchy:
        Leave enough PCI Express Root Ports empty. Use multifunction
        PCI Express Root Ports (up to 8 ports per pcie.0 slot)
        on the Root Complex(es) to keep the
        hierarchy as flat as possible, thereby saving PCI bus numbers.
        Don't use PCI Express Switches if you don't have
        to: each one uses an extra PCI bus (for its Upstream Port)
        that could be put to better use with another Root Port or Downstream
        Port, which may come in handy for hot-plugging another device.

5.2 Hot-plug example:
Using HMP: (add -monitor stdio to QEMU command line)
  device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI Bridge Id>

6. Device assignment
====================
Host devices are mostly PCI Express and should be plugged only into
PCI Express Root Ports or PCI Express Downstream Ports.
PCI-PCI Bridge slots can be used for legacy PCI host devices.

6.1 How to detect if a device is PCI Express:
  > lspci -s 03:00.0 -v (as root)

    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
    Flags: bus master, fast devsel, latency 0, IRQ 50
    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: [c8] Power Management version 3
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [40] Express Endpoint, MSI 00
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
    Capabilities: [14c] Latency Tolerance Reporting
    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014

If you can see the "Express Endpoint" capability in the
output, then the device is indeed PCI Express.

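The same check can be scripted. The Python sketch below scans text captured
from lspci -v for a capability line whose name starts with "Express"; the
function name is hypothetical and the pattern is an assumption based on the
output format shown above. It accepts any Express capability (Endpoint, Root
Port, ...), and it returns False when the capability list is not visible, for
example when lspci was not run as root.

```python
import re

# Matches lines such as "Capabilities: [40] Express Endpoint, MSI 00".
_EXPRESS_CAP = re.compile(r"^\s*Capabilities: \[[^\]]*\] Express\b", re.MULTILINE)

def is_pci_express(lspci_verbose_output):
    """Return True if the given `lspci -v` text lists a PCI Express capability."""
    return bool(_EXPRESS_CAP.search(lspci_verbose_output))

sample = """03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
    Capabilities: [c8] Power Management version 3
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [40] Express Endpoint, MSI 00
"""
print(is_pci_express(sample))
```
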
7. Virtio devices
=================
Virtio devices plugged into the PCI hierarchy or as Integrated Endpoints
will remain PCI and have transitional behavior by default.
Transitional virtio devices work in both IO and MMIO modes depending on
the guest support. The Guest firmware will assign both IO and MMIO resources
to transitional virtio devices.

Virtio devices plugged into PCI Express ports are PCI Express devices and
have "1.0" behavior by default without IO support.
In both cases the disable-legacy and disable-modern properties can be used
to override the behavior.

Note that setting disable-legacy=off will enable legacy mode for
PCI Express virtio devices, causing them to
require IO space, which, given the limited available IO space, may quickly
lead to resource exhaustion, and is therefore strongly discouraged.

8. Conclusion
=============
The proposal offers a usage model that is easy to understand and follow
and at the same time overcomes the PCI Express architecture limitations.