Commit | Line | Data |
---|---|---|
5fb004a2 DG |
1 | APEI table generation and CPER records |
2 | ====================================== | |
3 | ||
4 | .. | |
5 | Copyright (c) 2020 HUAWEI TECHNOLOGIES CO., LTD. | |
6 | ||
7 | This work is licensed under the terms of the GNU GPL, version 2 or later. | |
8 | See the COPYING file in the top-level directory. | |
9 | ||
10 | Design Details | |
11 | -------------- | |
12 | ||
13 | :: | |
14 | ||
15 |           etc/acpi/tables                        etc/hardware_errors | |
16 |         ====================                =============================== | |
17 |   + +--------------------------+            +----------------------------+ | |
18 |   | | HEST                     | +--------->|    error_block_address1    |------+ | |
19 |   | +--------------------------+ |          +----------------------------+      | | |
20 |   | | GHES1                    | | +------->|    error_block_address2    |------+-+ | |
21 |   | +--------------------------+ | |        +----------------------------+      | | | |
22 |   | | .................        | | |        |       ..............       |      | | | |
23 |   | | error_status_address-----+-+ |        -----------------------------+      | | | |
24 |   | | .................        |   |   +--->|    error_block_addressN    |------+-+---+ | |
25 |   | | read_ack_register--------+-+ |   |    +----------------------------+      | |   | | |
26 |   | | read_ack_preserve        | +-+---+--->|     read_ack_register1     |      | |   | | |
27 |   | | read_ack_write           |   |   |    +----------------------------+      | |   | | |
28 |   + +--------------------------+   | +-+--->|     read_ack_register2     |      | |   | | |
29 |   | | GHES2                    |   | | |    +----------------------------+      | |   | | |
30 |   + +--------------------------+   | | |    |       .............        |      | |   | | |
31 |   | | .................        |   | | |    +----------------------------+      | |   | | |
32 |   | | error_status_address-----+---+ | | +->|     read_ack_registerN     |      | |   | | |
33 |   | | .................        |     | | |  +----------------------------+      | |   | | |
34 |   | | read_ack_register--------+-----+ | |  |Generic Error Status Block 1|<-----+ |   | | |
35 |   | | read_ack_preserve        |       | |  |-+------------------------+-+        |   | | |
36 |   | | read_ack_write           |       | |  | |          CPER          | |        |   | | |
37 |   + +--------------------------|       | |  | |          CPER          | |        |   | | |
38 |   | | ...............          |       | |  | |          ....          | |        |   | | |
39 |   + +--------------------------+       | |  | |          CPER          | |        |   | | |
40 |   | | GHESN                    |       | |  |-+------------------------+-|        |   | | |
41 |   + +--------------------------+       | |  |Generic Error Status Block 2|<-------+   | | |
42 |   | | .................        |       | |  |-+------------------------+-+            | | |
43 |   | | error_status_address-----+-------+ |  | |          CPER          | |            | | |
44 |   | | .................        |         |  | |          CPER          | |            | | |
45 |   | | read_ack_register--------+---------+  | |          ....          | |            | | |
46 |   | | read_ack_preserve        |            | |          CPER          | |            | | |
47 |   | | read_ack_write           |            +-+------------------------+-+            | | |
48 |   + +--------------------------+            |         ..........         |            | | |
49 |                                             |----------------------------+            | | |
50 |                                             |Generic Error Status Block N |<----------+ | |
51 |                                             |-+-------------------------+-+ | |
52 |                                             | |          CPER           | | | |
53 |                                             | |          CPER           | | | |
54 |                                             | |          ....           | | | |
55 |                                             | |          CPER           | | | |
56 |                                             +-+-------------------------+-+ | |
57 | ||
58 | ||
59 | (1) QEMU generates the ACPI HEST table. This table goes in the current | |
60 | "etc/acpi/tables" fw_cfg blob. Each error source has different | |
61 | notification types. | |
62 | ||
63 | (2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU | |
64 | also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob | |
65 | contains an address registers table and an Error Status Data Block table. | |
66 | ||
67 | (3) The address registers table contains N Error Block Address entries | |
68 | and N Read Ack Register entries. Each entry is 8 bytes in size. | |
69 | The Error Status Data Block table contains N Error Status Data Block | |
70 | entries. Each entry is 4096 (0x1000) bytes in size. The total size | |
71 | of the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes. | |
72 | N is the number of kinds of hardware error sources. | |
73 | ||
74 | (4) QEMU generates the ACPI linker/loader script for the firmware. The | |
75 | firmware pre-allocates memory for "etc/acpi/tables" and "etc/hardware_errors" | |
76 | and copies the blob contents there. | |
77 | ||
78 | (5) QEMU generates N ADD_POINTER commands, which patch addresses in the | |
79 | "error_status_address" fields of the HEST table with a pointer to the | |
80 | corresponding "error_block_address" entry in the "etc/hardware_errors" blob. | |
81 | ||
82 | (6) QEMU generates N ADD_POINTER commands, which patch addresses in the | |
83 | "read_ack_register" fields of the HEST table with a pointer to the | |
84 | corresponding "read_ack_register" within the "etc/hardware_errors" blob. | |
85 | ||
86 | (7) QEMU generates N ADD_POINTER commands for the firmware, which patch | |
87 | addresses in the "error_block_address" fields with a pointer to the | |
88 | respective "Error Status Data Block" in the "etc/hardware_errors" blob. | |
89 | ||
90 | (8) QEMU defines a third, write-only fw_cfg blob called | |
91 | "etc/hardware_errors_addr". Through that blob, the firmware can send back | |
92 | the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr" | |
93 | blob contains an 8-byte entry. QEMU generates a single WRITE_POINTER command | |
94 | for the firmware. The firmware will write back the start address of the | |
95 | "etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr". | |
96 | ||
97 | (9) When QEMU gets a SIGBUS from the kernel, it writes the CPER into the | |
98 | corresponding "Error Status Data Block" in guest memory and then injects a | |
99 | platform-specific interrupt (on the arm/virt machine, a Synchronous External | |
100 | Abort) to notify the guest. | |
101 | ||
102 | (10) This notification (in virtual hardware) is handled by the guest | |
103 | kernel. On receiving it, the guest APEI driver can read the CPER error | |
104 | record and take appropriate action. | |
105 | ||
106 | (11) kvm_arch_on_sigbus_vcpu() uses source_id as an index into "etc/hardware_errors" | |
107 | to find the "Error Status Data Block" entry corresponding to the error source. The | |
108 | supported source_id values should therefore be assigned here and not changed afterwards, | |
109 | so that the error is written into the expected "Error Status Data Block" even if the | |
110 | guest is migrated to a newer QEMU. |
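
The blob layout from step (3) and the source_id lookup from step (11) can be modelled with a short Python sketch. This is an illustration only, not QEMU code. The function names, the 0-based source_id, the little-endian encoding of the 8-byte entries and the guest base address are all assumptions made for the example. The ordering of the three regions (error block addresses, then read ack registers, then Error Status Data Blocks) follows the diagram above.

```python
import struct

ADDRESS_ENTRY_SIZE = 8      # step (3): each address register entry is 8 bytes
ERROR_BLOCK_SIZE = 0x1000   # step (3): each Error Status Data Block is 4096 bytes


def blob_size(n):
    """Total size of "etc/hardware_errors" for n error sources."""
    return n * ADDRESS_ENTRY_SIZE * 2 + n * ERROR_BLOCK_SIZE


def error_block_address_offset(source_id):
    """Offset of the error_block_address entry (assumes 0-based source_id)."""
    return source_id * ADDRESS_ENTRY_SIZE


def read_ack_register_offset(n, source_id):
    """Offset of the read_ack_register entry; these follow the n address entries."""
    return (n + source_id) * ADDRESS_ENTRY_SIZE


def error_status_block_offset(n, source_id):
    """Offset of the Error Status Data Block; these follow both register arrays."""
    return 2 * n * ADDRESS_ENTRY_SIZE + source_id * ERROR_BLOCK_SIZE


def patch_error_block_addresses(blob, n, guest_base):
    """Model the end result of step (7): every error_block_address entry holds
    the guest address of its Error Status Data Block (little-endian assumed)."""
    for i in range(n):
        struct.pack_into("<Q", blob, error_block_address_offset(i),
                         guest_base + error_status_block_offset(n, i))


def find_error_status_block(blob, source_id):
    """Model the lookup in step (11): index the address table by source_id."""
    (addr,) = struct.unpack_from("<Q", blob,
                                 error_block_address_offset(source_id))
    return addr
```

For example, with N = 3 the blob is 3 * 16 + 3 * 4096 = 12336 bytes, and the Error Status Data Block for source_id 1 starts 48 + 4096 = 4144 bytes into the blob. Because the lookup depends only on source_id and the fixed entry sizes, renumbering the sources would make the same index resolve to a different block, which is why step (11) asks for the values to stay stable.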