XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  The hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  The hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

The interrupt mode used by the machine is negotiated with the guest
O/S during the Client Architecture Support (CAS) negotiation
sequence. The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for
the layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in byte 23 of the
"ibm,arch-vec-5-platform-support" device tree property. The O/S
indicates its selection for XIVE in byte 23 of the
"ibm,architecture-vec-5" property.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) and also on the machine property
``ic-mode``, which can be set on the command line. It can take the
following values: ``xics``, ``xive`` and ``dual``. Currently ``xics``
is the default, but this may change in the future.

The chosen interrupt mode is activated after a reconfiguration done
during a machine reset.
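
As an illustration of the rules above, the following C sketch models
which modes the machine advertises for a given ``ic-mode`` value and
how a single mode is then chosen. It is a simplified model, not QEMU's
implementation: the names ``ic_mode``, ``supported_modes()`` and
``negotiate()`` are hypothetical, the byte 23 encodings are not
modeled, and the guest is assumed to always support XICS and to prefer
XIVE when both sides support it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names mirroring the ic-mode values, not QEMU's own. */
typedef enum { IC_MODE_XICS, IC_MODE_XIVE, IC_MODE_DUAL } ic_mode;

/* Bit flags for the interrupt modes the machine can advertise. */
enum { MODE_XICS = 1 << 0, MODE_XIVE = 1 << 1 };

/* Modes advertised at CAS time: XIVE additionally needs a POWER9 CPU. */
static unsigned supported_modes(ic_mode mode, bool cpu_is_power9)
{
    unsigned modes = 0;

    if (mode == IC_MODE_XICS || mode == IC_MODE_DUAL) {
        modes |= MODE_XICS;
    }
    if ((mode == IC_MODE_XIVE || mode == IC_MODE_DUAL) && cpu_is_power9) {
        modes |= MODE_XIVE;
    }
    return modes;
}

/* The two modes are mutually exclusive, so exactly one is returned,
 * or 0 when the guest cannot use any advertised mode. */
static unsigned negotiate(unsigned supported, bool guest_supports_xive)
{
    if (guest_supports_xive && (supported & MODE_XIVE)) {
        return MODE_XIVE;
    }
    if (supported & MODE_XICS) {
        return MODE_XICS;   /* assumed: every guest can fall back to XICS */
    }
    return 0;
}
```

In this model, ``ic-mode=dual`` on POWER9 lets a XIVE-capable guest
select XIVE while an older guest falls back to XICS, whereas
``ic-mode=xive`` leaves no usable mode (0) for a guest without XIVE
support.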

XIVE Device tree properties
---------------------------

When the *XIVE native exploitation mode* is selected, the PAPR
interrupt controller node should contain the following properties:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  management areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the supported sizes of the event queues: one cell per supported
  size, containing the log2 of that size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the interrupt number ranges assigned to the guest for the IPIs.

The root node also exports:

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.
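
The ``ibm,xive-eq-sizes`` and ``ibm,plat-res-int-priorities``
descriptions above can be made concrete with a small C sketch. It
assumes the usual device tree convention that a cell is a 32-bit
big-endian value. ``encode_eq_sizes()`` and ``pick_priority()`` are
hypothetical helpers, the reserved priorities are modeled as a plain
list (not any particular cell encoding), and the policy of taking the
highest-numbered unreserved priority in 0..7 is only one plausible
guest choice, not something this document specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value as a big-endian device tree cell. */
static void put_cell(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Encode "ibm,xive-eq-sizes": one cell per supported size, holding
 * log2 of the size, in ascending order. Returns the length in bytes,
 * or 0 if the sizes are not strictly ascending. */
static size_t encode_eq_sizes(const uint32_t *log2_sizes, size_t n,
                              uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && log2_sizes[i] <= log2_sizes[i - 1]) {
            return 0;
        }
        put_cell(out + 4 * i, log2_sizes[i]);
    }
    return 4 * n;
}

/* Pick the highest priority in 0..7 that is not reserved by the
 * hypervisor. Returns -1 if every priority is reserved. */
static int pick_priority(const uint8_t *reserved, size_t n)
{
    for (int prio = 7; prio >= 0; prio--) {
        bool taken = false;

        for (size_t i = 0; i < n; i++) {
            if (reserved[i] == prio) {
                taken = true;
            }
        }
        if (!taken) {
            return prio;
        }
    }
    return -1;
}
```

For example, log2 sizes 12 and 16 (4 KiB and 64 KiB queues) produce an
8-byte property, and with only priority 7 reserved the sketch picks
priority 6.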

IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide
(``0x0000 .. 0x1FFF``) and is the same for both interrupt modes. The
different ranges are defined as follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32 PHB devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs
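
The layout above can be captured in a lookup function, for example
when post-processing ``info pic`` output. This C sketch copies the
range bounds from the list; the function name and labels are
illustrative, and numbers that the list does not assign (such as
``0x1002 .. 0x10FF``) are reported as ``unused``.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Classify a pseries IRQ number (LISN) using the ranges listed above.
 * Illustrative only: the function name and labels are not QEMU's. */
static const char *lisn_range(uint32_t lisn)
{
    if (lisn > 0x1FFF) {
        return "out of range";      /* beyond the 8K IRQ space */
    } else if (lisn <= 0x0FFF) {
        return "CPU IPI";           /* only used under XIVE */
    } else if (lisn == 0x1000) {
        return "EPOW";
    } else if (lisn == 0x1001) {
        return "HOTPLUG";
    } else if (lisn >= 0x1100 && lisn <= 0x11FF) {
        return "VIO";
    } else if (lisn >= 0x1200 && lisn <= 0x127F) {
        return "PHB";
    } else if (lisn >= 0x1300) {
        return "PHB MSI";           /* 0x1300 up to 0x1FFF */
    }
    return "unused";                /* 0x1002..0x10FF, 0x1280..0x12FF */
}
```

For instance, the LSIs ``0x1200 .. 0x1203`` in the ``info pic`` sample
below fall into the PHB range, and ``0x1300 .. 0x1302`` are PHB MSIs.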

Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

  (qemu) info pic
  CPU[0000]:   QW NSR CPPR IPB LSMFB ACK# INC AGE PIPR W2
  CPU[0000]: USER  00   00  00    00   00  00  00   00 00000000
  CPU[0000]:   OS  00   ff  00    00   ff  00  ff   ff 80000400
  CPU[0000]: POOL  00   00  00    00   00  00  00   00 00000000
  CPU[0000]: PHYS  00   00  00    00   00  00  00   ff 00000000
  ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and
only the O/S and USER register rings make sense. ``W2`` contains the
vCPU CAM line, which is set to the VP identifier.
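
To read the ``W2`` column, the following C sketch splits the word into
a valid flag and a CAM line. The bit layout is an assumption made for
illustration (most significant bit as the valid flag, remaining bits as
the CAM line); check QEMU's XIVE register definitions before relying on
it. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed W2 layout (illustrative): bit 31 is the valid flag,
 * bits 30..0 hold the CAM line. */
#define W2_VALID    0x80000000u
#define W2_CAM_MASK 0x7FFFFFFFu

/* True if the CAM line in W2 is marked valid. */
static bool w2_is_valid(uint32_t w2)
{
    return (w2 & W2_VALID) != 0;
}

/* Extract the CAM line (the VP identifier on a pseries machine). */
static uint32_t w2_cam_line(uint32_t w2)
{
    return w2 & W2_CAM_MASK;
}
```

Under that assumption, the ``OS`` ring value ``80000400`` of
``CPU[0000]`` decodes to a valid CAM line of ``0x400``, which would be
the VP identifier of that vCPU.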

Then comes the routing information, which aggregates the EAS and the
END configuration:

::

  ...
  LISN         PQ   EISN     CPU/PRIO EQ
  00000000 MSI --   00000010      0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00000001 MSI --   00000010      1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00000002 MSI --   00000010      2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00000003 MSI --   00000010      3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ]
  00000004 MSI -Q M 00000000
  00000005 MSI -Q M 00000000
  00000006 MSI -Q M 00000000
  00000007 MSI -Q M 00000000
  00001000 MSI --   00000012      0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001001 MSI --   00000013      0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001100 MSI --   00000100      1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001101 MSI -Q M 00000000
  00001200 LSI -Q M 00000000
  00001201 LSI -Q M 00000000
  00001202 LSI -Q M 00000000
  00001203 LSI -Q M 00000000
  00001300 MSI --   00000102      1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001301 MSI --   00000103      2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00001302 MSI --   00000104      3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type: ``MSI`` or ``LSI``.
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  An ``M`` indicates that the source is *MASKED* at the EAS level.
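
The PQ states listed above behave as a small state machine. The C
sketch below renders the bits as the ``PQ`` column does and models how
a trigger and an EOI move between states. The encoding (P in bit 1, Q
in bit 0, so that ``-Q`` is 0b01) follows the column order; the
transition rules reflect the usual XIVE ESB semantics and are an
assumption here, as are all the names. EAS masking (the ``M`` flag) is
a separate check and is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding: P is bit 1 and Q is bit 0, matching the "PQ" order. */
enum { PQ_RESET = 0x0, PQ_OFF = 0x1, PQ_PENDING = 0x2, PQ_QUEUED = 0x3 };

/* Render the two bits the way the PQ column prints them. */
static const char *pq_str(uint8_t pq)
{
    static const char *const names[] = { "--", "-Q", "P-", "PQ" };
    return names[pq & 0x3];
}

/* A new event reaches the source; returns true if it is forwarded. */
static bool pq_trigger(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case PQ_RESET:                 /* ready: forward, then wait for EOI */
        *pq = PQ_PENDING;
        return true;
    case PQ_PENDING:               /* EOI still pending: remember event */
    case PQ_QUEUED:
        *pq = PQ_QUEUED;
        return false;
    default:                       /* PQ_OFF: the event is ignored */
        return false;
    }
}

/* The O/S EOIs the source; returns true if a queued event is re-sent. */
static bool pq_eoi(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case PQ_RESET:
    case PQ_PENDING:               /* nothing queued: back to ready */
        *pq = PQ_RESET;
        return false;
    case PQ_QUEUED:                /* replay the queued event */
        *pq = PQ_PENDING;
        return true;
    default:                       /* PQ_OFF stays off */
        return false;
    }
}
```

A typical sequence in this model is ``--`` (ready), a trigger to
``P-`` (event forwarded), a second trigger to ``PQ`` (queued, not
forwarded), an EOI back to ``P-`` that re-sends the queued event, and a
final EOI to ``--``.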

The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs:

  - the current index in the event queue / the maximum number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed into the event queue
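
The ``EQ`` fields can be illustrated with a minimal ring-buffer model.
In this C sketch (all names hypothetical), each 4-byte entry stores the
EISN in its low 31 bits and the current toggle bit in bit 31, and the
toggle flips whenever the index wraps, so that the O/S can tell fresh
entries from stale ones without clearing the queue. This layout is an
assumption, but it is consistent with the sample output: a 64 KiB queue
(log2 size 16) holds 16384 entries, and pushing EISN ``0x10`` with
toggle 1 yields ``80000010``.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event queue model; field names are illustrative. */
typedef struct {
    uint32_t *entries;    /* O/S event queue memory */
    uint32_t nr_entries;  /* must be a power of two */
    uint32_t index;       /* next slot to be written */
    uint32_t toggle;      /* current toggle (generation) bit, 0 or 1 */
} event_queue;

/* Number of 4-byte entries in a queue of 2^log2_size bytes. */
static uint32_t eq_entries(uint32_t log2_size)
{
    return (1u << log2_size) / 4;
}

/* Push an EISN: bit 31 holds the toggle, bits 30..0 the event data.
 * The toggle flips each time the index wraps back to 0. */
static void eq_push(event_queue *eq, uint32_t eisn)
{
    eq->entries[eq->index] = (eq->toggle << 31) | (eisn & 0x7FFFFFFFu);
    eq->index = (eq->index + 1) & (eq->nr_entries - 1);
    if (eq->index == 0) {
        eq->toggle ^= 1;
    }
}
```

With a 4-entry queue starting at toggle 1, four pushes wrap the index
to 0 and flip the toggle to 0, so the next entry is written with bit 31
clear.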