Commit | Line | Data |
---|---|---|
4d2e26a3 | 1 | ==================================== |
a9282d01 IM |
2 | Coherent Accelerator Interface (CXL) |
3 | ==================================== | |
4 | ||
5 | Introduction | |
6 | ============ | |
7 | ||
8 | The coherent accelerator interface is designed to allow the | |
9 | coherent connection of accelerators (FPGAs and other devices) to a | |
10 | POWER system. These devices need to adhere to the Coherent | |
11 | Accelerator Interface Architecture (CAIA). | |
12 | ||
13 | IBM refers to this as the Coherent Accelerator Processor Interface | |
14 | or CAPI. In the kernel it's referred to by the name CXL to avoid | |
15 | confusion with the ISDN CAPI subsystem. | |
16 | ||
17 | Coherent in this context means that the accelerator and CPUs can | |
18 | both access system memory directly and with the same effective | |
19 | addresses. | |
20 | ||
21 | ||
22 | Hardware overview | |
23 | ================= | |
24 | ||
4d2e26a3 MCC |
25 | :: |
26 | ||
f24be42a | 27 | POWER8/9 FPGA |
a9282d01 IM |
28 | +----------+ +---------+ |
29 | | | | | | |
30 | | CPU | | AFU | | |
31 | | | | | | |
32 | | | | | | |
33 | | | | | | |
34 | +----------+ +---------+ | |
35 | | PHB | | | | |
36 | | +------+ | PSL | | |
37 | | | CAPP |<------>| | | |
38 | +---+------+ PCIE +---------+ | |
39 | ||
f24be42a | 40 | The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP) |
a9282d01 IM |
41 | unit which is part of the PCIe Host Bridge (PHB). This is managed |
42 | by Linux by calls into OPAL. Linux doesn't directly program the | |
43 | CAPP. | |
44 | ||
45 | The FPGA (or coherently attached device) consists of two parts: | |
46 | the POWER Service Layer (PSL) and the Accelerator Function Unit | |
47 | (AFU). The AFU is used to implement specific functionality behind | |
48 | the PSL. The PSL, among other things, provides memory address | |
49 | translation services to allow each AFU direct access to userspace | |
50 | memory. | |
51 | ||
52 | The AFU is the core part of the accelerator (e.g. the compression, | |
53 | crypto etc. function). The kernel has no knowledge of the function | |
54 | of the AFU. Only userspace interacts directly with the AFU. | |
55 | ||
56 | The PSL provides the translation and interrupt services that the | |
57 | AFU needs. This is what the kernel interacts with. For example, if | |
58 | the AFU needs to read a particular effective address, it sends | |
59 | that address to the PSL, the PSL then translates it, fetches the | |
60 | data from memory and returns it to the AFU. If the PSL has a | |
61 | translation miss, it interrupts the kernel and the kernel services | |
62 | the fault. The context in which this fault is serviced is based on | |
63 | who owns that acceleration function. | |
64 | ||
4d2e26a3 MCC |
65 | - POWER8 and PSL Version 8 are compliant with CAIA Version 1.0. |
66 | - POWER9 and PSL Version 9 are compliant with CAIA Version 2.0. | |
67 | ||
f24be42a | 68 | PSL Version 9 provides new features such as: |
4d2e26a3 | 69 | |
f24be42a CL |
70 | * Interaction with the nest MMU on the P9 chip. |
71 | * Native DMA support. | |
72 | * Supports sending ASB_Notify messages for host thread wakeup. | |
73 | * Supports Atomic operations. | |
4d2e26a3 | 74 | * etc. |
f24be42a CL |
75 | |
76 | Cards with a PSL9 won't work on a POWER8 system and cards with a | |
77 | PSL8 won't work on a POWER9 system. | |
a9282d01 IM |
78 | |
79 | AFU Modes | |
80 | ========= | |
81 | ||
82 | There are two programming modes supported by the AFU: dedicated | |
83 | and AFU directed. An AFU may support one or both modes. | |
84 | ||
85 | When using dedicated mode only one MMU context is supported. In | |
86 | this mode, only one userspace process can use the accelerator at | |
87 | a time. | |
88 | ||
89 | When using AFU directed mode, up to 16K simultaneous contexts can | |
90 | be supported. This means up to 16K simultaneous userspace | |
91 | applications may use the accelerator (although specific AFUs may | |
92 | support fewer). In this mode, the AFU sends a 16 bit context ID | |
93 | with each of its requests. This tells the PSL which context is | |
94 | associated with each operation. If the PSL can't translate an | |
95 | operation, the ID can also be accessed by the kernel so it can | |
96 | determine the userspace context associated with an operation. | |
97 | ||
98 | ||
99 | MMIO space | |
100 | ========== | |
101 | ||
102 | A portion of the accelerator MMIO space can be directly mapped | |
103 | from the AFU to userspace. Either the whole space can be mapped or | |
104 | just a per context portion. The hardware is self describing, hence | |
105 | the kernel can determine the offset and size of the per context | |
106 | portion. | |
107 | ||
108 | ||
109 | Interrupts | |
110 | ========== | |
111 | ||
112 | AFUs may generate interrupts that are destined for userspace. These | |
113 | are received by the kernel as hardware interrupts and passed on to | |
114 | userspace by a read syscall documented below. | |
115 | ||
116 | Data storage faults and error interrupts are handled by the kernel | |
117 | driver. | |
118 | ||
119 | ||
120 | Work Element Descriptor (WED) | |
121 | ============================= | |
122 | ||
123 | The WED is a 64-bit parameter passed to the AFU when a context is | |
124 | started. Its format is up to the AFU hence the kernel has no | |
125 | knowledge of what it represents. Typically it will be the | |
126 | effective address of a work queue or status block where the AFU | |
127 | and userspace can share control and status information. | |
128 | ||
129 | ||
130 | ||
131 | ||
132 | User API | |
133 | ======== | |
134 | ||
594ff7d0 | 135 | 1. AFU character devices |
8f97986c | 136 | ^^^^^^^^^^^^^^^^^^^^^^^^ |
594ff7d0 | 137 | |
a9282d01 IM |
138 | For AFUs operating in AFU directed mode, two character device |
139 | files will be created. /dev/cxl/afu0.0m will correspond to a | |
140 | master context and /dev/cxl/afu0.0s will correspond to a slave | |
141 | context. Master contexts have access to the full MMIO space an | |
142 | AFU provides. Slave contexts have access to only the per process | |
143 | MMIO space an AFU provides. | |
144 | ||
145 | For AFUs operating in dedicated process mode, the driver will | |
146 | only create a single character device per AFU called | |
147 | /dev/cxl/afu0.0d. This will have access to the entire MMIO space | |
148 | that the AFU provides (like master contexts in AFU directed). | |
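The naming scheme above can be sketched as a small helper. This is only an illustration: ``cxl_afu_path()`` is a hypothetical name, not part of the kernel API or libcxl, and it assumes that the two numbers in ``afuX.Y`` are a card index and an AFU index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the character device path for an AFU context type.
 * mode is 'm' (master), 's' (slave) or 'd' (dedicated).
 * Returns 0 on success, -1 on an unknown mode or a too-small buffer.
 * Hypothetical helper for illustration only.
 */
static int cxl_afu_path(char *buf, size_t len, unsigned card, unsigned afu,
                        char mode)
{
    int n;

    assert(buf != NULL);
    if (mode != 'm' && mode != 's' && mode != 'd')
        return -1;
    n = snprintf(buf, len, "/dev/cxl/afu%u.%u%c", card, afu, mode);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting path would then be passed to open(2); which of the three device files exists depends on the mode the AFU is operating in.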
149 | ||
150 | The types described below are defined in include/uapi/misc/cxl.h. | |
151 | ||
152 | The following file operations are supported on both slave and | |
153 | master devices. | |
154 | ||
dc12f20b | 155 | A userspace library libcxl is available here: |
4d2e26a3 | 156 | |
aee85fb6 | 157 | https://github.com/ibm-capi/libcxl |
4d2e26a3 | 158 | |
aee85fb6 | 159 | This provides a C interface to this kernel API. |
a9282d01 IM |
160 | |
161 | open | |
162 | ---- | |
163 | ||
164 | Opens the device and allocates a file descriptor to be used with | |
165 | the rest of the API. | |
166 | ||
167 | A dedicated mode AFU only has one context and only allows the | |
168 | device to be opened once. | |
169 | ||
170 | An AFU directed mode AFU can have many contexts; the device can be | |
171 | opened once for each context that is available. | |
172 | ||
173 | When all available contexts are allocated the open call will fail | |
174 | and return -ENOSPC. | |
175 | ||
4d2e26a3 MCC |
176 | Note: |
177 | IRQs need to be allocated for each context, which may limit | |
a9282d01 IM |
178 | the number of contexts that can be created, and therefore |
179 | how many times the device can be opened. The POWER8 CAPP | |
180 | supports 2040 IRQs and 3 are used by the kernel, so 2037 are | |
181 | left. If 1 IRQ is needed per context, then only 2037 | |
182 | contexts can be allocated. If 4 IRQs are needed per context, | |
183 | then only 2037/4 = 509 contexts can be allocated. | |
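The arithmetic in the note can be written out as a short function. This is a sketch of the calculation only; ``cxl_max_contexts()`` is a hypothetical name, and the real limit may also be bounded by other resources.

```c
#include <assert.h>

/*
 * Number of contexts that fit into the IRQ budget once the IRQs
 * reserved by the kernel are subtracted. Integer division rounds down,
 * since a context cannot be given a partial set of IRQs.
 */
static unsigned cxl_max_contexts(unsigned total_irqs, unsigned kernel_irqs,
                                 unsigned irqs_per_context)
{
    assert(irqs_per_context > 0);
    if (kernel_irqs >= total_irqs)
        return 0;
    return (total_irqs - kernel_irqs) / irqs_per_context;
}
```

With the POWER8 figures from the note, ``cxl_max_contexts(2040, 3, 4)`` evaluates to 509.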
184 | ||
185 | ||
186 | ioctl | |
187 | ----- | |
188 | ||
189 | CXL_IOCTL_START_WORK: | |
190 | Starts the AFU context and associates it with the current | |
191 | process. Once this ioctl is successfully executed, all memory | |
192 | mapped into this process is accessible to this AFU context | |
193 | using the same effective addresses. No additional calls are | |
194 | required to map/unmap memory. The AFU memory context will be | |
195 | updated as userspace allocates and frees memory. This ioctl | |
196 | returns once the AFU context is started. | |
197 | ||
4d2e26a3 MCC |
198 | Takes a pointer to a struct cxl_ioctl_start_work |
199 | ||
200 | :: | |
a9282d01 IM |
201 | |
202 | struct cxl_ioctl_start_work { | |
203 | __u64 flags; | |
204 | __u64 work_element_descriptor; | |
205 | __u64 amr; | |
206 | __s16 num_interrupts; | |
207 | __s16 reserved1; | |
208 | __s32 reserved2; | |
209 | __u64 reserved3; | |
210 | __u64 reserved4; | |
211 | __u64 reserved5; | |
212 | __u64 reserved6; | |
213 | }; | |
214 | ||
215 | flags: | |
216 | Indicates which optional fields in the structure are | |
217 | valid. | |
218 | ||
219 | work_element_descriptor: | |
220 | The Work Element Descriptor (WED) is a 64-bit argument | |
221 | defined by the AFU. Typically this is an effective | |
222 | address pointing to an AFU specific structure | |
223 | describing what work to perform. | |
224 | ||
225 | amr: | |
226 | Authority Mask Register (AMR), same as the powerpc | |
227 | AMR. This field is only used by the kernel when the | |
228 | corresponding CXL_START_WORK_AMR value is specified in | |
229 | flags. If not specified the kernel will use a default | |
230 | value of 0. | |
231 | ||
232 | num_interrupts: | |
233 | Number of userspace interrupts to request. This field | |
234 | is only used by the kernel when the corresponding | |
235 | CXL_START_WORK_NUM_IRQS value is specified in flags. | |
236 | If not specified the minimum number required by the | |
237 | AFU will be allocated. The min and max number can be | |
238 | obtained from sysfs. | |
239 | ||
240 | reserved fields: | |
241 | For ABI padding and future extensions | |
242 | ||
243 | CXL_IOCTL_GET_PROCESS_ELEMENT: | |
244 | Get the current context id, also known as the process element. | |
245 | The value is returned from the kernel as a __u32. | |
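As an illustration of how the flags field gates the optional fields of struct cxl_ioctl_start_work, the sketch below fills a local mirror of the structure. The mirror struct, the ``SKETCH_`` flag values and ``prepare_start_work()`` are assumptions made for this example; real code should include the uapi header and use the CXL_START_WORK_* definitions it provides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_ioctl_start_work, for illustration only. */
struct start_work_sketch {
    uint64_t flags;
    uint64_t work_element_descriptor;
    uint64_t amr;
    int16_t  num_interrupts;
    int16_t  reserved1;
    int32_t  reserved2;
    uint64_t reserved3;
    uint64_t reserved4;
    uint64_t reserved5;
    uint64_t reserved6;
};

/* Assumed flag values; take the real ones from the uapi header. */
#define SKETCH_START_WORK_AMR      0x1ULL
#define SKETCH_START_WORK_NUM_IRQS 0x2ULL

/*
 * Zero the structure (reserved fields included), set the WED, and
 * request num_irqs interrupts. A negative num_irqs leaves the flag
 * clear so the kernel picks the AFU's minimum.
 */
static void prepare_start_work(struct start_work_sketch *w, uint64_t wed,
                               int16_t num_irqs)
{
    assert(w != NULL);
    memset(w, 0, sizeof(*w));
    w->work_element_descriptor = wed;
    if (num_irqs >= 0) {
        w->flags |= SKETCH_START_WORK_NUM_IRQS;
        w->num_interrupts = num_irqs;
    }
}
```

In real code the filled structure would be passed by pointer to ioctl(2) with CXL_IOCTL_START_WORK on the AFU file descriptor.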
246 | ||
247 | ||
248 | mmap | |
249 | ---- | |
250 | ||
251 | An AFU may have an MMIO space to facilitate communication with the | |
252 | AFU. If it does, the MMIO space can be accessed via mmap. The size | |
253 | and contents of this area are specific to the particular AFU. The | |
254 | size can be discovered via sysfs. | |
255 | ||
256 | In AFU directed mode, master contexts are allowed to map all of | |
257 | the MMIO space and slave contexts are allowed to only map the per | |
258 | process MMIO space associated with the context. In dedicated | |
259 | process mode the entire MMIO space can always be mapped. | |
260 | ||
261 | This mmap call must be done after the START_WORK ioctl. | |
262 | ||
263 | Care should be taken when accessing MMIO space. Only 32-bit and 64-bit | |
264 | accesses are supported by POWER8. Also, the AFU will be designed | |
265 | with a specific endianness, so all MMIO accesses should consider | |
266 | endianness (the endian(3) variants such as le64toh() and | |
267 | be64toh() are recommended). These endian issues apply equally to shared | |
268 | memory queues the WED may describe. | |
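A minimal sketch of endian-aware 64-bit accessors follows, assuming a hypothetical AFU whose registers are little-endian. The helper names are invented for this example; a big-endian AFU would use be64toh() and htobe64() instead.

```c
#define _DEFAULT_SOURCE /* for le64toh()/htole64() under strict -std modes */
#include <assert.h>
#include <endian.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit register that the AFU defines as little-endian. */
static uint64_t afu_read64_le(const volatile uint64_t *reg)
{
    assert(reg != NULL);
    return le64toh(*reg);
}

/* Write a host-order value to a little-endian 64-bit register. */
static void afu_write64_le(volatile uint64_t *reg, uint64_t value)
{
    assert(reg != NULL);
    *reg = htole64(value);
}
```

In real use, ``reg`` would point into the region returned by mmap(2) on the AFU file descriptor. Each accessor performs a single 64-bit access, which matches the access sizes the text says POWER8 supports.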
269 | ||
270 | ||
271 | read | |
272 | ---- | |
273 | ||
274 | Reads events from the AFU. Blocks if no events are pending | |
275 | (unless O_NONBLOCK is supplied). Returns -EIO in the case of an | |
276 | unrecoverable error or if the card is removed. | |
277 | ||
278 | read() will always return an integral number of events. | |
279 | ||
280 | The buffer passed to read() must be at least 4K bytes. | |
281 | ||
282 | The result of the read will be a buffer of one or more events. | |
4d2e26a3 | 283 | Each event is of type struct cxl_event, of varying size:: |
a9282d01 IM |
284 | |
285 | struct cxl_event { | |
286 | struct cxl_event_header header; | |
287 | union { | |
288 | struct cxl_event_afu_interrupt irq; | |
289 | struct cxl_event_data_storage fault; | |
290 | struct cxl_event_afu_error afu_error; | |
291 | }; | |
292 | }; | |
293 | ||
4d2e26a3 MCC |
294 | The struct cxl_event_header is defined as |
295 | ||
296 | :: | |
a9282d01 IM |
297 | |
298 | struct cxl_event_header { | |
299 | __u16 type; | |
300 | __u16 size; | |
301 | __u16 process_element; | |
302 | __u16 reserved1; | |
303 | }; | |
304 | ||
305 | type: | |
306 | This defines the type of event. The type determines how | |
307 | the rest of the event is structured. These types are | |
308 | described below and defined by enum cxl_event_type. | |
309 | ||
310 | size: | |
311 | This is the size of the event in bytes including the | |
312 | struct cxl_event_header. The start of the next event can | |
313 | be found at this offset from the start of the current | |
314 | event. | |
315 | ||
316 | process_element: | |
317 | Context ID of the event. | |
318 | ||
319 | reserved field: | |
320 | For future extensions and padding. | |
321 | ||
322 | If the event type is CXL_EVENT_AFU_INTERRUPT then the event | |
4d2e26a3 MCC |
323 | structure is defined as |
324 | ||
325 | :: | |
a9282d01 IM |
326 | |
327 | struct cxl_event_afu_interrupt { | |
328 | __u16 flags; | |
329 | __u16 irq; /* Raised AFU interrupt number */ | |
330 | __u32 reserved1; | |
331 | }; | |
332 | ||
333 | flags: | |
334 | These flags indicate which optional fields are present | |
335 | in this struct. Currently all fields are mandatory. | |
336 | ||
337 | irq: | |
338 | The IRQ number sent by the AFU. | |
339 | ||
340 | reserved field: | |
341 | For future extensions and padding. | |
342 | ||
343 | If the event type is CXL_EVENT_DATA_STORAGE then the event | |
4d2e26a3 MCC |
344 | structure is defined as |
345 | ||
346 | :: | |
a9282d01 IM |
347 | |
348 | struct cxl_event_data_storage { | |
349 | __u16 flags; | |
350 | __u16 reserved1; | |
351 | __u32 reserved2; | |
352 | __u64 addr; | |
353 | __u64 dsisr; | |
354 | __u64 reserved3; | |
355 | }; | |
356 | ||
357 | flags: | |
358 | These flags indicate which optional fields are present in | |
359 | this struct. Currently all fields are mandatory. | |
360 | ||
361 | addr: | |
362 | The address that the AFU unsuccessfully attempted to | |
363 | access. Valid accesses will be handled transparently by the | |
364 | kernel but invalid accesses will generate this event. | |
365 | ||
366 | dsisr: | |
367 | This field gives information on the type of fault. It is a | |
368 | copy of the DSISR from the PSL hardware when the address | |
369 | fault occurred. The form of the DSISR is as defined in the | |
370 | CAIA. | |
371 | ||
372 | reserved fields: | |
373 | For future extensions | |
374 | ||
375 | If the event type is CXL_EVENT_AFU_ERROR then the event structure | |
4d2e26a3 MCC |
376 | is defined as |
377 | ||
378 | :: | |
a9282d01 IM |
379 | |
380 | struct cxl_event_afu_error { | |
381 | __u16 flags; | |
382 | __u16 reserved1; | |
383 | __u32 reserved2; | |
384 | __u64 error; | |
385 | }; | |
386 | ||
387 | flags: | |
388 | These flags indicate which optional fields are present in | |
389 | this struct. Currently all fields are mandatory. | |
390 | ||
391 | error: | |
392 | Error status from the AFU. Defined by the AFU. | |
393 | ||
394 | reserved fields: | |
395 | For future extensions and padding | |
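Because events vary in size, a buffer returned by read() has to be walked using the size field of each header. The sketch below shows one way to do that on a local mirror of struct cxl_event_header; the mirror struct and ``count_events()`` are assumptions made for this example, and real code should use the types from the uapi header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header, for illustration only. */
struct event_header_sketch {
    uint16_t type;
    uint16_t size;
    uint16_t process_element;
    uint16_t reserved1;
};

/*
 * Walk len bytes returned by read() and count the events they hold.
 * Returns the event count, or -1 if a size field is inconsistent
 * with the amount of data available.
 */
static int count_events(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL);
    while (off < len) {
        struct event_header_sketch hdr;

        if (len - off < sizeof(hdr))
            return -1;
        /* memcpy avoids assuming the buffer is suitably aligned */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        /* hdr.type selects which union member follows the header */
        count++;
        off += hdr.size; /* size includes the header itself */
    }
    return count;
}
```

A real consumer would dispatch on the type field inside the loop rather than just counting, and would pass a buffer of at least 4K bytes to read() as required above.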
396 | ||
594ff7d0 CL |
397 | |
398 | 2. Card character device (PowerVM guest only) | |
8f97986c | 399 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
594ff7d0 CL |
400 | |
401 | In a PowerVM guest, an extra character device is created for the | |
402 | card. The device is only used to write (flash) a new image on the | |
403 | FPGA accelerator. Once the image is written and verified, the | |
404 | device tree is updated and the card is reset to reload the updated | |
405 | image. | |
406 | ||
407 | open | |
408 | ---- | |
409 | ||
410 | Opens the device and allocates a file descriptor to be used with | |
411 | the rest of the API. The device can only be opened once. | |
412 | ||
413 | ioctl | |
414 | ----- | |
415 | ||
4d2e26a3 | 416 | CXL_IOCTL_DOWNLOAD_IMAGE / CXL_IOCTL_VALIDATE_IMAGE: |
594ff7d0 CL |
417 | Starts and controls flashing a new FPGA image. Partial |
418 | reconfiguration is not supported (yet), so the image must contain | |
419 | a copy of the PSL and AFU(s). Since an image can be quite large, | |
420 | the caller may have to iterate, splitting the image into smaller | |
421 | chunks. | |
422 | ||
4d2e26a3 MCC |
423 | Takes a pointer to a struct cxl_adapter_image:: |
424 | ||
594ff7d0 CL |
425 | struct cxl_adapter_image { |
426 | __u64 flags; | |
427 | __u64 data; | |
428 | __u64 len_data; | |
429 | __u64 len_image; | |
430 | __u64 reserved1; | |
431 | __u64 reserved2; | |
432 | __u64 reserved3; | |
433 | __u64 reserved4; | |
434 | }; | |
435 | ||
436 | flags: | |
437 | These flags indicate which optional fields are present in | |
438 | this struct. Currently all fields are mandatory. | |
439 | ||
440 | data: | |
441 | Pointer to a buffer with part of the image to write to the | |
442 | card. | |
443 | ||
444 | len_data: | |
445 | Size of the buffer pointed to by data. | |
446 | ||
447 | len_image: | |
448 | Full size of the image. | |
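The chunked iteration described above can be sketched as follows. The text does not spell out which of the two ioctls is issued per chunk or any ordering rules, so this example only shows how the descriptor fields relate to one chunk. The mirror struct and ``fill_chunk()`` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_adapter_image, for illustration only. */
struct adapter_image_sketch {
    uint64_t flags;
    uint64_t data;
    uint64_t len_data;
    uint64_t len_image;
    uint64_t reserved1;
    uint64_t reserved2;
    uint64_t reserved3;
    uint64_t reserved4;
};

/*
 * Fill the descriptor for chunk number index of an image split into
 * pieces of at most chunk_len bytes. Returns 1 if a chunk was
 * described, 0 if index is past the end of the image.
 */
static int fill_chunk(struct adapter_image_sketch *ai,
                      const unsigned char *image, uint64_t image_len,
                      uint64_t chunk_len, uint64_t index)
{
    uint64_t off, left;

    assert(ai != NULL && image != NULL && chunk_len > 0);
    if (index > image_len / chunk_len) /* also rules out overflow below */
        return 0;
    off = index * chunk_len;
    if (off >= image_len)
        return 0;
    left = image_len - off;
    memset(ai, 0, sizeof(*ai));
    ai->data = (uint64_t)(uintptr_t)(image + off); /* pointer to this chunk */
    ai->len_data = left < chunk_len ? left : chunk_len;
    ai->len_image = image_len;                     /* always the full size */
    return 1;
}
```

A caller would loop over increasing index values until the function returns 0, issuing the ioctl with each filled descriptor.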
449 | ||
450 | ||
a9282d01 IM |
451 | Sysfs Class |
452 | =========== | |
453 | ||
454 | A cxl sysfs class is added under /sys/class/cxl to facilitate | |
455 | enumeration and tuning of the accelerators. Its layout is | |
456 | described in Documentation/ABI/testing/sysfs-class-cxl | |
457 | ||
aee85fb6 | 458 | |
a9282d01 IM |
459 | Udev rules |
460 | ========== | |
461 | ||
462 | The following udev rules could be used to create a symlink to the | |
463 | most logical chardev to use in any programming mode (afuX.Yd for | |
464 | dedicated, afuX.Ys for AFU directed), since the API is virtually | |
4d2e26a3 | 465 | identical for each:: |
a9282d01 IM |
466 | |
467 | SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b" | |
468 | SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \ | |
469 | KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b" |