XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE, for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  The hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  The hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

The interrupt mode used by the machine is negotiated with the guest
O/S during the Client Architecture Support (CAS) negotiation
sequence. The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for
the layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in byte 23 of the device
tree property ``ibm,arch-vec-5-platform-support``, and the OS indicates
its selection for XIVE in byte 23 of the ``ibm,architecture-vec-5``
property.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) and also on the machine property
``ic-mode``, which can be set on the command line. It can take the
following values: ``xics``, ``xive``, and ``dual``, which is the
default. ``dual`` means that both the XICS **and** XIVE modes are
supported; if the guest OS supports XIVE, the XIVE mode is selected.

The chosen interrupt mode is activated by a reconfiguration performed
during a machine reset.
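
The negotiation above can be summarized with a small model. The
following Python sketch is hypothetical: ``IntMode``, ``ADVERTISED``,
``CasError`` and ``cas_select`` are illustrative names, not QEMU code.
It assumes a POWER9 CPU, assumes that a guest supporting XIVE also
supports XICS, and does not model the encoding of byte 23.

```python
from enum import Enum


class IntMode(Enum):
    XICS = "XICS"  # Legacy Compatibility Mode
    XIVE = "XIVE"  # XIVE native exploitation mode


# Modes offered to the guest for each ic-mode value, assuming a POWER9 CPU.
ADVERTISED = {
    "xics": {IntMode.XICS},
    "xive": {IntMode.XIVE},
    "dual": {IntMode.XICS, IntMode.XIVE},  # the default
}


class CasError(Exception):
    """The guest asked for a mode that the machine does not offer."""


def cas_select(ic_mode: str, guest_supports_xive: bool) -> IntMode:
    """Hypothetical model of the CAS outcome, not QEMU's implementation."""
    offered = ADVERTISED[ic_mode]
    # A XIVE-capable guest selects XIVE whenever the machine offers it.
    if guest_supports_xive and IntMode.XIVE in offered:
        return IntMode.XIVE
    # Otherwise the guest falls back to the legacy XICS interface.
    if IntMode.XICS in offered:
        return IntMode.XICS
    raise CasError("Guest requested unavailable interrupt mode (XICS)")


if __name__ == "__main__":
    for ic_mode in ("dual", "xive", "xics"):
        for xive_guest in (True, False):
            try:
                outcome = cas_select(ic_mode, xive_guest).value
            except CasError as err:
                outcome = f"error: {err}"
            print(f"ic-mode={ic_mode} xive_guest={xive_guest}: {outcome}")
```

With ``dual``, the outcome depends only on the guest; ``xics`` and
``xive`` force a single mode, and ``xive`` fails for a legacy guest.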

KVM negotiation
---------------

When the guest starts under KVM, the capabilities of the host kernel
and QEMU are also negotiated. Depending on the version of the host
kernel, KVM may or may not advertise the XIVE capability to QEMU.

Nevertheless, the interrupt modes available in the machine should not
depend on the XIVE KVM capability of the host. On older kernels
without XIVE KVM support, QEMU falls back to the emulated XIVE device;
on newer kernels (>= 5.2), it uses the KVM XIVE device.

XIVE native exploitation mode is not supported for KVM nested guests,
that is, VMs running under an L1 hypervisor (KVM on pSeries). In that
case, the hypervisor does not advertise the KVM capability and QEMU
uses the emulated XIVE device, as with older versions of KVM.

As a final refinement, the user can also control the use of the KVM
device with the machine option ``kernel_irqchip``.


XIVE support in KVM
~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels with XIVE KVM support are the following:

============== ============= ============= ================
ic-mode        kernel_irqchip
-------------- --------------------------------------------
/              allowed       off           on
               (default)
============== ============= ============= ================
dual (default) XIVE KVM      XIVE emul.    XIVE KVM
xive           XIVE KVM      XIVE emul.    XIVE KVM
xics           XICS KVM      XICS emul.    XICS KVM
============== ============= ============= ================

For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

============== ============= ============= ================
ic-mode        kernel_irqchip
-------------- --------------------------------------------
/              allowed       off           on
               (default)
============== ============= ============= ================
dual (default) XICS KVM      XICS emul.    XICS KVM
xive           QEMU error(3) QEMU error(3) QEMU error(3)
xics           XICS KVM      XICS emul.    XICS KVM
============== ============= ============= ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
mode (XICS), either don't set the ic-mode machine property or try
ic-mode=xics or ic-mode=dual``.
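
On a host with XIVE KVM support, the two tables above reduce to a
simple rule: CAS decides the mode, and ``kernel_irqchip=off`` alone
selects the emulated device. The following Python sketch encodes that
reading of the tables; ``resolve_with_xive_kvm`` is a hypothetical
helper, not QEMU code, and it returns the same strings as the table
cells.

```python
def resolve_with_xive_kvm(ic_mode: str, kernel_irqchip: str,
                          guest_supports_xive: bool) -> str:
    """Hypothetical model of the two tables above (host with XIVE KVM).

    ic_mode is "dual", "xive" or "xics"; kernel_irqchip is "allowed"
    (the default), "off" or "on". Returns the matching table cell.
    """
    # A legacy guest cannot run on a XIVE-only machine: CAS fails (3).
    if ic_mode == "xive" and not guest_supports_xive:
        return "QEMU error(3)"
    # CAS picks XIVE only when both the machine and the guest support it.
    mode = "XIVE" if guest_supports_xive and ic_mode != "xics" else "XICS"
    # Both KVM devices exist on this host, so only kernel_irqchip=off
    # selects the emulated device.
    backend = "emul." if kernel_irqchip == "off" else "KVM"
    return f"{mode} {backend}"


if __name__ == "__main__":
    for guest in (True, False):
        print("XIVE guest" if guest else "legacy guest")
        for ic_mode in ("dual", "xive", "xics"):
            row = [resolve_with_xive_kvm(ic_mode, k, guest)
                   for k in ("allowed", "off", "on")]
            print(f"  {ic_mode:<5}", row)
```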


No XIVE support in KVM
~~~~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels without XIVE KVM support are the following:

============== ============= ============= ================
ic-mode        kernel_irqchip
-------------- --------------------------------------------
/              allowed       off           on
               (default)
============== ============= ============= ================
dual (default) XIVE emul.(1) XIVE emul.    QEMU error(2)
xive           XIVE emul.(1) XIVE emul.    QEMU error(2)
xics           XICS KVM      XICS emul.    XICS KVM
============== ============= ============= ================

(1) QEMU warns with ``warning: kernel_irqchip requested but unavailable:
IRQ_XIVE capability must be present for KVM``. In some cases (old host
kernels or KVM nested guests), QEMU may also hit a QEMU/KVM
incompatibility caused by device destruction during reset, and then
fails with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``.

(2) QEMU fails with ``kernel_irqchip requested but unavailable:
IRQ_XIVE capability must be present for KVM``.

For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

============== ============= ============= ================
ic-mode        kernel_irqchip
-------------- --------------------------------------------
/              allowed       off           on
               (default)
============== ============= ============= ================
dual (default) QEMU error(4) XICS emul.    QEMU error(4)
xive           QEMU error(3) QEMU error(3) QEMU error(3)
xics           XICS KVM      XICS emul.    XICS KVM
============== ============= ============= ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
mode (XICS), either don't set the ic-mode machine property or try
ic-mode=xics or ic-mode=dual``.

(4) QEMU/KVM incompatibility due to device destruction during reset.
QEMU fails with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``.
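
Without XIVE KVM support (but with the XICS KVM device still
available), the tables above follow a slightly longer decision. The
Python sketch below is one reading of those tables, not QEMU's logic;
``resolve_without_xive_kvm`` is a hypothetical helper that returns the
table cells, including the footnote markers.

```python
def resolve_without_xive_kvm(ic_mode: str, kernel_irqchip: str,
                             guest_supports_xive: bool) -> str:
    """Hypothetical model of the two tables above (host without XIVE KVM
    support, but with the XICS KVM device). Returns the table cell."""
    # A legacy guest cannot run on a XIVE-only machine: CAS fails (3).
    if ic_mode == "xive" and not guest_supports_xive:
        return "QEMU error(3)"
    # With kernel_irqchip=off, QEMU emulates the selected controller.
    if kernel_irqchip == "off":
        mode = "XIVE" if guest_supports_xive and ic_mode != "xics" else "XICS"
        return f"{mode} emul."
    # The XICS KVM device is still available on this host.
    if ic_mode == "xics":
        return "XICS KVM"
    if guest_supports_xive:
        # XIVE is selected but KVM cannot provide it: QEMU falls back to
        # emulation with a warning (1), or fails if KVM was required (2).
        if kernel_irqchip == "allowed":
            return "XIVE emul.(1)"
        return "QEMU error(2)"
    # ic-mode=dual, legacy guest, in-kernel irqchip allowed or required:
    # the device destruction incompatibility described in (4).
    return "QEMU error(4)"


if __name__ == "__main__":
    for guest in (True, False):
        print("XIVE guest" if guest else "legacy guest")
        for ic_mode in ("dual", "xive", "xics"):
            row = [resolve_without_xive_kvm(ic_mode, k, guest)
                   for k in ("allowed", "off", "on")]
            print(f"  {ic_mode:<5}", row)
```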


XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected should contain:

- ``device_type``

  The value should be "power-ivpe".

- ``compatible``

  The value should be "ibm,power-ivpe".

- ``reg``

  Contains the base address and size of the thread interrupt
  management areas (TIMA) for the User level and the Guest OS
  level. Only the Guest OS level is currently taken into account.

- ``ibm,xive-eq-sizes``

  The sizes of the event queues: one cell per supported size,
  containing the log2 of the size, in ascending order.

- ``ibm,xive-lisn-ranges``

  The ranges of interrupt numbers assigned to the guest for the IPIs.
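
The ``ibm,xive-eq-sizes`` encoding can be illustrated with a short
Python sketch. Device tree cells are big-endian 32-bit values; the
helper names ``encode_eq_sizes`` and ``decode_eq_sizes`` are
hypothetical, and the sizes used in the example are illustrative, not
the values QEMU advertises.

```python
import struct


def encode_eq_sizes(sizes_in_bytes):
    """Build an ``ibm,xive-eq-sizes`` value: one big-endian 32-bit cell
    per supported event queue size, holding log2 of the size, ascending."""
    shifts = set()
    for size in sizes_in_bytes:
        # Only power-of-two sizes can be described by their log2.
        if size <= 0 or size & (size - 1):
            raise ValueError(f"event queue size {size} is not a power of two")
        shifts.add(size.bit_length() - 1)
    return b"".join(struct.pack(">I", shift) for shift in sorted(shifts))


def decode_eq_sizes(prop):
    """Inverse of encode_eq_sizes: return the queue sizes in bytes."""
    count = len(prop) // 4
    return [1 << shift for shift in struct.unpack(f">{count}I", prop)]


if __name__ == "__main__":
    prop = encode_eq_sizes([65536, 4096])  # illustrative sizes only
    print(prop.hex())             # 0000000c00000010
    print(decode_eq_sizes(prop))  # [4096, 65536]
```

Assuming 4-byte event queue entries, a 64 KiB queue (log2 = 16) holds
16384 entries, which matches the maximum shown in the ``info pic``
routing output.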

The root node also exports:

- ``ibm,plat-res-int-priorities``

  Contains a list of priorities that the hypervisor has reserved for
  its own use.

IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide and is the
same for both interrupt modes. The ranges are defined as follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1002 .. 0x10FF`` unused
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32x4 LSIs for PHB devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs (dynamically allocated)
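
The layout above can be expressed as a lookup table. In the following
Python sketch, ``IRQ_RANGES``, ``classify_irq`` and ``check_layout``
are illustrative names. ``phb_lsi`` is a hypothetical helper that
assumes the 32x4 LSI range is split into four consecutive numbers per
PHB (one per interrupt pin), which is consistent with the four LSIs
``0x1200 .. 0x1203`` in the monitor sample.

```python
# (first, last, use) for each range of the 8K pseries IRQ number space.
IRQ_RANGES = [
    (0x0000, 0x0FFF, "CPU IPI"),
    (0x1000, 0x1000, "EPOW"),
    (0x1001, 0x1001, "HOTPLUG"),
    (0x1002, 0x10FF, "unused"),
    (0x1100, 0x11FF, "VIO device"),
    (0x1200, 0x127F, "PHB LSI"),
    (0x1280, 0x12FF, "unused"),
    (0x1300, 0x1FFF, "PHB MSI"),
]
IRQ_SPACE_SIZE = 0x2000  # 8K interrupt numbers

PHB_LSI_BASE = 0x1200
LSIS_PER_PHB = 4  # assumption: one LSI per interrupt pin, INTA..INTD


def classify_irq(lisn: int) -> str:
    """Return the use of an interrupt number according to the layout."""
    for first, last, use in IRQ_RANGES:
        if first <= lisn <= last:
            return use
    raise ValueError(f"LISN {lisn:#x} is outside the IRQ number space")


def phb_lsi(phb_index: int, pin: int) -> int:
    """Hypothetical helper: LSI number of a PHB interrupt pin, assuming
    4 consecutive numbers per PHB in the 32x4 LSI range."""
    if not (0 <= phb_index < 32 and 0 <= pin < LSIS_PER_PHB):
        raise ValueError("PHB index or pin out of range")
    return PHB_LSI_BASE + phb_index * LSIS_PER_PHB + pin


def check_layout() -> None:
    """The ranges must be contiguous and cover exactly 0 .. 8K-1."""
    expected = 0
    for first, last, _ in IRQ_RANGES:
        assert first == expected and last >= first
        expected = last + 1
    assert expected == IRQ_SPACE_SIZE
```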

Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

  (qemu) info pic
  CPU[0000]:   QW NSR CPPR IPB LSMFB ACK# INC AGE PIPR W2
  CPU[0000]: USER  00   00  00    00   00  00  00   00 00000000
  CPU[0000]:   OS  00   ff  00    00   ff  00  ff   ff 80000400
  CPU[0000]: POOL  00   00  00    00   00  00  00   00 00000000
  CPU[0000]: PHYS  00   00  00    00   00  00  00   ff 00000000
  ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and
only the O/S and USER register rings make sense. ``W2`` contains the
vCPU CAM line, which is set to the VP identifier.
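
The ``W2`` value of the OS ring can be split into its parts with a
small Python sketch. The layout is an assumption taken from the XIVE
thread interrupt management area, where the most significant bit of
the OS ring W2 flags a valid context and the low 24 bits hold the CAM
line; the constant and function names are hypothetical.

```python
# Assumed layout of the OS ring W2 word (constant names are hypothetical):
# the most significant bit flags a valid context and the low 24 bits
# hold the CAM line.
OS_W2_VALID = 0x80000000
OS_W2_CAM_MASK = 0x00FFFFFF


def decode_os_w2(w2: int) -> tuple:
    """Split a W2 value into (valid, CAM line)."""
    return bool(w2 & OS_W2_VALID), w2 & OS_W2_CAM_MASK


if __name__ == "__main__":
    valid, cam = decode_os_w2(0x80000400)  # OS ring of CPU[0000]
    print(valid, hex(cam))                 # True 0x400
```

Under this assumption, the sample shows a valid OS context for CPU 0
whose CAM line, the VP identifier, is ``0x400``.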

Then comes the routing information, which aggregates the EAS and the
END configuration:

::

  ...
  LISN         PQ   EISN     CPU/PRIO EQ
  00000000 MSI --   00000010 0/6      380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00000001 MSI --   00000010 1/6      305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00000002 MSI --   00000010 2/6      220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00000003 MSI --   00000010 3/6      201/16384 @1fc390000 ^1 [ 80000010 ... ]
  00000004 MSI -Q M 00000000
  00000005 MSI -Q M 00000000
  00000006 MSI -Q M 00000000
  00000007 MSI -Q M 00000000
  00001000 MSI --   00000012 0/6      380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001001 MSI --   00000013 0/6      380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001100 MSI --   00000100 1/6      305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001101 MSI -Q M 00000000
  00001200 LSI -Q M 00000000
  00001201 LSI -Q M 00000000
  00001202 LSI -Q M 00000000
  00001203 LSI -Q M 00000000
  00001300 MSI --   00000102 1/6      305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001301 MSI --   00000103 2/6      220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00001302 MSI --   00000104 3/6      201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source, in
  the range ``[ 0x0 ... 0x1FFF ]``, and its type: ``MSI`` or ``LSI``.
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  An ``M`` indicates that the source is *MASKED* at the EAS level.
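
The four PQ states can be modeled as a small state machine driven by
event triggers and EOIs. The Python sketch below follows the usual
XIVE Event State Buffer semantics, where P is the high bit and Q the
low bit; it is an illustration, and ``PQ``, ``pq_label``, ``trigger``
and ``eoi`` are hypothetical names, not QEMU functions.

```python
from enum import IntEnum


class PQ(IntEnum):
    RESET = 0b00    # "--" ready to take events
    OFF = 0b01      # "-Q" source is off
    PENDING = 0b10  # "P-" event sent, EOI pending
    QUEUED = 0b11   # "PQ" event queued


def pq_label(pq: PQ) -> str:
    """Render the PQ bits as in ``info pic``."""
    return ("P" if pq & 0b10 else "-") + ("Q" if pq & 0b01 else "-")


def trigger(pq: PQ) -> tuple:
    """A new event arrives: return (new state, forward to the router)."""
    if pq == PQ.RESET:
        return PQ.PENDING, True
    if pq in (PQ.PENDING, PQ.QUEUED):
        # An event is already in flight: just remember this one.
        return PQ.QUEUED, False
    return PQ.OFF, False  # events are dropped while the source is off


def eoi(pq: PQ) -> tuple:
    """The O/S EOIs the source: return (new state, forward an event)."""
    if pq == PQ.QUEUED:
        # An event was queued meanwhile: forward it now.
        return PQ.PENDING, True
    if pq in (PQ.RESET, PQ.PENDING):
        return PQ.RESET, False
    return PQ.OFF, False


if __name__ == "__main__":
    state = PQ.RESET
    for step in (trigger, trigger, eoi, eoi):
        state, forwarded = step(state)
        print(f"{step.__name__:<7} -> {pq_label(state)} forwarded={forwarded}")
```

The ``-Q`` (OFF) state absorbs both triggers and EOIs, which is why the
masked sources in the sample stay at ``-Q``.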

The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the
  event queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and the
  priority of the queue serving the source.
- The ``EQ`` column outputs:

  - the current index of the event queue / the maximum number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries pushed in the event queue
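
The column descriptions can be checked against the sample with a small
parser. This Python sketch is hypothetical (``parse_route``, ``Route``,
``EventQueue`` and ``decode_eq_entry`` are illustrative names) and
assumes whitespace-separated fields as printed by ``info pic``. It also
assumes the XIVE event queue entry layout in which the most significant
bit is the generation (toggle) bit and the low 31 bits hold the EISN;
this is consistent with the sample, where LISN 0 has EISN
``00000010``, its queue toggle is ``^1`` and the queue holds
``80000010``.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventQueue:
    index: int    # current index in the event queue
    entries: int  # maximum number of entries
    address: int  # O/S event queue address
    toggle: int   # toggle (generation) bit
    last: List[int] = field(default_factory=list)  # last pushed entries


@dataclass
class Route:
    lisn: int
    kind: str     # "MSI" or "LSI"
    pq: str       # "--", "P-", "PQ" or "-Q"
    masked: bool  # "M": masked at the EAS level
    eisn: int
    cpu: Optional[int] = None
    prio: Optional[int] = None
    eq: Optional[EventQueue] = None


def parse_route(line: str) -> Route:
    """Parse one routing line of ``info pic`` (hypothetical helper)."""
    tok = line.split()
    masked = tok[3] == "M"
    rest = tok[4:] if masked else tok[3:]
    route = Route(int(tok[0], 16), tok[1], tok[2], masked, int(rest[0], 16))
    if len(rest) > 1:  # the source is routed to an event queue
        cpu, prio = rest[1].split("/")
        index, entries = rest[2].split("/")
        start, end = rest.index("[") + 1, rest.index("]")
        route.cpu, route.prio = int(cpu), int(prio)
        route.eq = EventQueue(
            index=int(index),
            entries=int(entries),
            address=int(rest[3].lstrip("@"), 16),
            toggle=int(rest[4].lstrip("^")),
            last=[int(t, 16) for t in rest[start:end] if t != "..."],
        )
    return route


def decode_eq_entry(entry: int) -> tuple:
    """Split a 32-bit event queue entry into (generation bit, EISN)."""
    return entry >> 31, entry & 0x7FFFFFFF


if __name__ == "__main__":
    sample = ("00000000 MSI -- 00000010 0/6 380/16384 "
              "@1fe3e0000 ^1 [ 80000010 ... ]")
    r = parse_route(sample)
    print(r.cpu, r.prio, r.eq.index, r.eq.entries, hex(r.eq.address))
    print(decode_eq_entry(r.eq.last[0]))  # (1, 16)
```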