Commit | Line | Data |
---|---|---|
b27a2d04 MCC |
1 | .. include:: <isonum.txt> |
2 | ||
9c058d24 MCC |
3 | ============================================ |
4 | Reliability, Availability and Serviceability | |
5 | ============================================ | |
6 | ||
7 | RAS concepts | |
8 | ************ | |
9 | ||
10 | Reliability, Availability and Serviceability (RAS) is a concept used on | |
9f02a486 | 11 | servers meant to measure their robustness. |
9c058d24 MCC |
12 | |
13 | Reliability | |
14 | is the probability that a system will produce correct outputs. | |
15 | ||
16 | * Generally measured as Mean Time Between Failures (MTBF) | |
17 | * Enhanced by features that help to avoid, detect and repair hardware faults | |
18 | ||
19 | Availability | |
20 | is the probability that a system is operational at a given time | |
21 | ||
22 | * Generally measured as a percentage of downtime per a period of time | |
23 | * Often uses mechanisms to detect and correct hardware faults in | |
24 | runtime; | |
25 | ||
26 | Serviceability (or maintainability) | |
27 | is the simplicity and speed with which a system can be repaired or | |
28 | maintained | |
29 | ||
30 | * Generally measured on Mean Time Between Repair (MTBR) | |
31 | ||
32 | Improving RAS | |
33 | ------------- | |
34 | ||
35 | In order to reduce systems downtime, a system should be capable of detecting | |
36 | hardware errors and, when possible, correcting them at runtime. It should | |
37 | also provide mechanisms to detect hardware degradation, in order to warn | |
38 | the system administrator to take the action of replacing a component before | |
39 | it causes data loss or system downtime. | |
40 | ||
41 | Among the monitoring measures, the most usual ones include: | |
42 | ||
43 | * CPU – detect errors at instruction execution and at L1/L2/L3 caches; | |
44 | * Memory – add error correction logic (ECC) to detect and correct errors; | |
9f02a486 | 45 | * I/O – add CRC checksums for transferred data; |
9c058d24 MCC |
46 | * Storage – RAID, journal file systems, checksums, |
47 | Self-Monitoring, Analysis and Reporting Technology (SMART). | |
48 | ||
49 | By monitoring the number of occurrences of error detections, it is possible | |
50 | to identify if the probability of hardware errors is increasing and, in such | |
9f02a486 | 51 | case, perform preventive maintenance to replace a degraded component while |
9c058d24 MCC |
52 | those errors are correctable. |
53 | ||
54 | Types of errors | |
55 | --------------- | |
56 | ||
57 | Most mechanisms used on modern systems use technologies like Hamming | |
58 | Codes that allow error correction when the number of errors on a bit packet | |
59 | is below a threshold. If the number of errors is above that threshold, those | |
60 | mechanisms can indicate with a high degree of confidence that an error | |
61 | happened, but they can't correct it. | |
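
The correction threshold can be illustrated with a toy code. The following is a minimal sketch using the textbook Hamming(7,4) code, which corrects a single flipped bit per 7-bit codeword. It is not the code used by any particular memory controller, and the function names are hypothetical:

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome).

    A syndrome of 0 means no error was detected. Otherwise it is the
    1-based position that the decoder assumes was flipped and corrects.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the suspected bit back
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
word = hamming74_encode(data)

# One flipped bit: below the threshold, so the data is recovered.
one_error = list(word)
one_error[4] ^= 1
print(hamming74_decode(one_error))

# Two flipped bits: an error is still signalled, but the "correction"
# flips a third bit and the returned data is wrong.
two_errors = list(word)
two_errors[0] ^= 1
two_errors[1] ^= 1
print(hamming74_decode(two_errors))
```

With one flipped bit the decoder returns the original data and a non-zero syndrome. With two flipped bits the syndrome is still non-zero, but the data is miscorrected, which is why real memory controllers typically add an extra parity bit (SECDED) to tell the two cases apart.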
62 | ||
63 | Also, sometimes an error occurs on a component that is not in use. For | |
64 | example, on a part of the memory that is not currently allocated. | |
65 | ||
66 | That defines some categories of errors: | |
67 | ||
68 | * **Correctable Error (CE)** - the error detection mechanism detected and | |
69 | corrected the error. Such errors are usually not fatal, although some | |
70 | Kernel mechanisms allow the system administrator to consider them as fatal. | |
71 | ||
72 | * **Uncorrected Error (UE)** - the number of errors was above the error | |
73 | correction threshold, and the system was unable to auto-correct. | |
74 | ||
75 | * **Fatal Error** - when a UE happens on a critical component of the | |
76 | system (for example, a piece of the Kernel got corrupted by a UE), the | |
77 | only reliable way to avoid data corruption is to hang or reboot the machine. | |
78 | ||
79 | * **Non-fatal Error** - when a UE happens on an unused component, | |
80 | like a CPU in power down state or an unused memory bank, the system may | |
81 | still run, eventually replacing the affected hardware by a hot spare, | |
82 | if available. | |
83 | ||
9332ef9d | 84 | Also, when an error happens on a userspace process, it is possible to |
9c058d24 MCC |
85 | kill such process and let userspace restart it. |
86 | ||
87 | The mechanism for handling non-fatal errors is usually complex and may | |
88 | require the help of some userspace application, in order to apply the | |
89 | policy desired by the system administrator. | |
90 | ||
91 | Identifying a bad hardware component | |
92 | ------------------------------------ | |
93 | ||
94 | Just detecting a hardware flaw is usually not enough, as the system needs | |
95 | to pinpoint the minimal replaceable unit (MRU) that should be exchanged | |
96 | to make the hardware reliable again. | |
97 | ||
98 | So, it requires not only error logging facilities, but also mechanisms that | |
99 | will translate the error message to the silkscreen or component label for | |
100 | the MRU. | |
101 | ||
102 | This is typically very complex for memory, as modern CPUs interleave memory | |
103 | from different memory modules, in order to provide better performance. The | |
104 | DMI BIOS usually has a list of memory module labels, which can be obtained | |
105 | using the ``dmidecode`` tool. For example, on a desktop machine, it shows:: | |
106 | ||
107 | Memory Device | |
108 | Total Width: 64 bits | |
109 | Data Width: 64 bits | |
110 | Size: 16384 MB | |
111 | Form Factor: SODIMM | |
112 | Set: None | |
113 | Locator: ChannelA-DIMM0 | |
114 | Bank Locator: BANK 0 | |
115 | Type: DDR4 | |
116 | Type Detail: Synchronous | |
117 | Speed: 2133 MHz | |
118 | Rank: 2 | |
119 | Configured Clock Speed: 2133 MHz | |
120 | ||
121 | On the above example, a DDR4 SO-DIMM memory module is located at the | |
122 | system's memory labeled as "BANK 0", as given by the *bank locator* field. | |
123 | Please notice that, on such system, the *total width* is equal to the | |
9f02a486 | 124 | *data width*. It means that such memory module doesn't have error |
9c058d24 MCC |
125 | detection/correction mechanisms. |
126 | ||
127 | Unfortunately, not all systems use the same field to specify the memory | |
128 | bank. On this example, from an older server, ``dmidecode`` shows:: | |
129 | ||
130 | Memory Device | |
131 | Array Handle: 0x1000 | |
132 | Error Information Handle: Not Provided | |
133 | Total Width: 72 bits | |
134 | Data Width: 64 bits | |
135 | Size: 8192 MB | |
136 | Form Factor: DIMM | |
137 | Set: 1 | |
138 | Locator: DIMM_A1 | |
139 | Bank Locator: Not Specified | |
140 | Type: DDR3 | |
141 | Type Detail: Synchronous Registered (Buffered) | |
142 | Speed: 1600 MHz | |
143 | Rank: 2 | |
144 | Configured Clock Speed: 1600 MHz | |
145 | ||
146 | There, the DDR3 RDIMM memory module is located at the system's memory labeled | |
147 | as "DIMM_A1", as given by the *locator* field. Please notice that this | |
9f02a486 | 148 | memory module has 64 bits of *data width* and 72 bits of *total width*. So, |
9c058d24 MCC |
149 | it has 8 extra bits to be used by error detection and correction mechanisms. |
150 | Such kind of memory is called Error-correcting code memory (ECC memory). | |
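
The comparison between *total width* and *data width* can be automated. The sketch below parses ``dmidecode`` text shaped like the two examples above and reports the number of extra bits per module. The function name is hypothetical, and the parser only assumes the ``Memory Device``, ``Total Width``, ``Data Width`` and ``Locator`` fields shown in this document:

```python
import re


def ecc_check_bits(dmidecode_text):
    """Map each module's Locator label to (total width - data width) in bits."""
    result = {}
    # dmidecode separates records with blank lines.
    for block in re.split(r"\n\s*\n", dmidecode_text):
        if "Memory Device" not in block:
            continue
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        try:
            total = int(fields["Total Width"].split()[0])
            data = int(fields["Data Width"].split()[0])
        except (KeyError, IndexError, ValueError):
            continue  # e.g. an empty slot reporting "Unknown"
        result[fields.get("Locator", "?")] = total - data
    return result


sample = """Memory Device
    Total Width: 72 bits
    Data Width: 64 bits
    Locator: DIMM_A1
    Bank Locator: Not Specified
"""
print(ecc_check_bits(sample))
```

A result of 0 extra bits suggests a module without ECC, while 8 extra bits on a 64-bit data width matches the ECC example above.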
151 | ||
152 | To make things even worse, it is not uncommon for systems with different | |
153 | labels on their system boards to use exactly the same BIOS, meaning that | |
154 | the labels provided by the BIOS won't match the real ones. | |
155 | ||
156 | ECC memory | |
157 | ---------- | |
158 | ||
159 | As mentioned in the previous section, ECC memory has extra bits to be | |
160 | used for error correction. So, on 64 bit systems, a memory module | |
161 | has 64 bits of *data width*, and 72 bits of *total width*. So, there are | |
162 | 8 extra bits to be used for the error detection and correction | |
163 | mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_. | |
164 | ||
165 | So, when the CPU requests the memory controller to write a word with | |
166 | *data width*, the memory controller calculates the *syndrome* in real time, | |
167 | using Hamming code, or some other error correction code, like SECDED+, | |
168 | producing a code with *total width* size. Such code is then written | |
169 | on the memory modules. | |
170 | ||
171 | At read, the *total width* bits code is converted back, using the same | |
172 | ECC code used on write, producing a word with *data width* and a *syndrome*. | |
173 | The word with *data width* is sent to the CPU, even when errors happen. | |
174 | ||
175 | The memory controller also looks at the *syndrome* in order to check if | |
176 | there was an error, and if the ECC code was able to fix such error. | |
177 | If the error was corrected, a Corrected Error (CE) happened. If not, an | |
178 | Uncorrected Error (UE) happened. | |
179 | ||
180 | The information about the CE/UE errors is stored on some special registers | |
181 | at the memory controller and can be accessed by reading such registers, | |
182 | either by BIOS, by some special CPUs or by Linux EDAC driver. On x86 64 | |
183 | bit CPUs, such errors can also be retrieved via the Machine Check | |
184 | Architecture (MCA)\ [#f3]_. | |
185 | ||
186 | .. [#f1] Please notice that several memory controllers allow operation on a | |
187 | mode called "Lock-Step", where it groups two memory modules together, | |
188 | doing 128-bit reads/writes. That gives 16 bits for error correction, which | |
9f02a486 | 189 | significantly improves the error correction mechanism, at the expense |
9c058d24 MCC |
190 | that, when an error happens, there's no way to know what memory module is |
191 | to blame. So, it has to blame both memory modules. | |
192 | ||
193 | .. [#f2] Some memory controllers also allow using memory in mirror mode. | |
194 | On such mode, the same data is written to two memory modules. At read, | |
195 | the system checks both memory modules, in order to check if both provide | |
196 | identical data. On such configuration, when an error happens, there's no | |
197 | way to know what memory module is to blame. So, it has to blame both | |
198 | memory modules (or 4 memory modules, if the system is also on Lock-step | |
199 | mode). | |
200 | ||
201 | .. [#f3] For more details about the Machine Check Architecture (MCA), | |
202 | please read Documentation/x86/x86_64/machinecheck at the Kernel tree. | |
203 | ||
da9bb1d2 | 204 | EDAC - Error Detection And Correction |
9c058d24 | 205 | ************************************* |
87f24c3a | 206 | |
b27a2d04 | 207 | .. note:: |
e34217c0 | 208 | |
9c058d24 | 209 | "bluesmoke" was the name for this device driver subsystem when it |
b27a2d04 MCC |
210 | was "out-of-tree" and maintained at http://bluesmoke.sourceforge.net. |
211 | That site is mostly archaic now and can be used only for historical | |
212 | purposes. | |
87f24c3a | 213 | |
9c058d24 MCC |
214 | When the subsystem was pushed upstream for the first time, on |
215 | Kernel 2.6.16, it was renamed to ``EDAC``. | |
b27a2d04 MCC |
216 | |
217 | Purpose | |
043b4318 | 218 | ------- |
da9bb1d2 | 219 | |
b27a2d04 | 220 | The ``edac`` kernel module's goal is to detect and report hardware errors |
043b4318 | 221 | that occur within the computer system running under Linux.
87f24c3a | 222 | |
b27a2d04 | 223 | Memory |
043b4318 | 224 | ------ |
87f24c3a | 225 | |
043b4318 BP |
226 | Memory Correctable Errors (CE) and Uncorrectable Errors (UE) are the |
227 | primary errors being harvested. These types of errors are harvested by | |
b27a2d04 | 228 | the ``edac_mc`` device. |
da9bb1d2 AC |
229 | |
230 | Detecting CE events, then harvesting those events and reporting them, | |
b27a2d04 | 231 | **can**, but need not necessarily, be a predictor of future UE events. With
043b4318 BP |
232 | CE events only, the system can and will continue to operate as no data |
233 | has been damaged yet. | |
234 | ||
235 | However, preventive maintenance and proactive part replacement of memory | |
9c058d24 | 236 | modules exhibiting CEs can reduce the likelihood of the dreaded UE events |
043b4318 | 237 | and system panics. |
da9bb1d2 | 238 | |
b27a2d04 | 239 | Other hardware elements |
043b4318 | 240 | ----------------------- |
87f24c3a | 241 | |
b27a2d04 | 242 | A new feature for EDAC, the ``edac_device`` class of device, was added in |
87f24c3a DT |
243 | the 2.6.23 version of the kernel. |
244 | ||
245 | This new device type allows for non-memory type of ECC hardware detectors | |
246 | to have their states harvested and presented to userspace via the sysfs | |
247 | interface. | |
248 | ||
043b4318 BP |
249 | Some architectures have ECC detectors for L1, L2 and L3 caches, |
250 | along with DMA engines, fabric switches, main data path switches, | |
251 | interconnections, and various other hardware data paths. If the hardware | |
252 | reports it, then an ``edac_device`` device can probably be constructed to | |
253 | harvest and present that to userspace. | |
87f24c3a DT |
254 | |
255 | ||
b27a2d04 | 256 | PCI bus scanning |
043b4318 | 257 | ---------------- |
da9bb1d2 | 258 | |
043b4318 BP |
259 | In addition, PCI devices are scanned for PCI Bus Parity and SERR Errors |
260 | in order to determine if errors are occurring during data transfers. | |
87f24c3a | 261 | |
da9bb1d2 | 262 | The presence of PCI Parity errors must be examined with a grain of salt. |
b27a2d04 | 263 | There are several add-in adapters that do **not** follow the PCI specification |
da9bb1d2 AC |
264 | with regards to Parity generation and reporting. The specification says |
265 | the vendor should tie the parity status bits to 0 if they do not intend | |
266 | to generate parity. Some vendors do not do this, and thus the parity bit | |
267 | can "float" giving false positives. | |
268 | ||
043b4318 BP |
269 | There is a PCI device attribute located in sysfs that is checked by |
270 | the EDAC PCI scanning code. If that attribute is set, PCI parity/error | |
b27a2d04 | 271 | scanning is skipped for that device. The attribute is:: |
87f24c3a DT |
272 | |
273 | broken_parity_status | |
274 | ||
b27a2d04 | 275 | and is located in ``/sys/devices/pci<XXX>/0000:XX:YY.Z`` directories for |
87f24c3a DT |
276 | PCI devices. |
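
To see which devices have this attribute set, the files can be read from userspace. The following is a minimal sketch. It assumes that the per-device directories are also reachable through ``/sys/bus/pci/devices`` and that the attribute reads as ``1`` when set; the function name is hypothetical:

```python
from pathlib import Path


def devices_with_broken_parity(pci_devices_dir="/sys/bus/pci/devices"):
    """Return the names of PCI devices whose broken_parity_status reads 1."""
    flagged = []
    for attr in sorted(Path(pci_devices_dir).glob("*/broken_parity_status")):
        try:
            if attr.read_text().strip() == "1":
                flagged.append(attr.parent.name)
        except OSError:
            continue  # device went away or the attribute is unreadable
    return flagged


if __name__ == "__main__":
    for name in devices_with_broken_parity():
        print(name)
```

The directory is passed as a parameter so that the function can be tried against a copy of the sysfs tree instead of a live system.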
277 | ||
da9bb1d2 | 278 | |
b27a2d04 | 279 | Versioning |
043b4318 | 280 | ---------- |
da9bb1d2 | 281 | |
b27a2d04 | 282 | EDAC is composed of a "core" module (``edac_core.ko``) and several Memory |
043b4318 BP |
283 | Controller (MC) driver modules. On a given system, the CORE is loaded |
284 | and one MC driver will be loaded. Both the CORE and the MC driver (or | |
b27a2d04 | 285 | ``edac_device`` driver) have individual versions that reflect current |
043b4318 | 286 | release level of their respective modules. |
87f24c3a | 287 | |
043b4318 BP |
288 | Thus, to "report" on what version a system is running, one must report |
289 | both the CORE's and the MC driver's versions. | |
da9bb1d2 AC |
290 | |
291 | ||
b27a2d04 | 292 | Loading |
043b4318 | 293 | ------- |
da9bb1d2 | 294 | |
b27a2d04 MCC |
295 | If ``edac`` was statically linked with the kernel then no loading |
296 | is necessary. If ``edac`` was built as modules then simply modprobe | |
297 | the ``edac`` pieces that you need. You should be able to modprobe | |
043b4318 BP |
298 | hardware-specific modules and have the dependencies load the necessary |
299 | core modules. | |
da9bb1d2 | 300 | |
b27a2d04 | 301 | Example:: |
da9bb1d2 | 302 | |
b27a2d04 | 303 | $ modprobe amd76x_edac |
da9bb1d2 | 304 | |
b27a2d04 MCC |
305 | loads both the ``amd76x_edac.ko`` memory controller module and the |
306 | ``edac_mc.ko`` core module. | |
da9bb1d2 AC |
307 | |
308 | ||
b27a2d04 | 309 | Sysfs interface |
043b4318 | 310 | --------------- |
da9bb1d2 | 311 | |
b27a2d04 | 312 | EDAC presents a ``sysfs`` interface for control and reporting purposes. It |
043b4318 | 313 | lives in the /sys/devices/system/edac directory. |
87f24c3a | 314 | |
043b4318 | 315 | Within this directory there currently reside 2 components: |
da9bb1d2 | 316 | |
b27a2d04 | 317 | ======= ============================== |
da9bb1d2 | 318 | mc memory controller(s) system |
49c0dab7 | 319 | pci PCI control and status system |
b27a2d04 | 320 | ======= ============================== |
da9bb1d2 AC |
321 | |
322 | ||
043b4318 | 323 | |
da9bb1d2 | 324 | Memory Controller (mc) Model |
043b4318 | 325 | ---------------------------- |
da9bb1d2 | 326 | |
9c058d24 | 327 | Each ``mc`` device controls a set of memory modules [#f4]_. These modules |
b27a2d04 | 328 | are laid out in a Chip-Select Row (``csrowX``) and Channel table (``chX``). |
043b4318 | 329 | There can be multiple csrows and multiple channels. |
da9bb1d2 | 330 | |
9c058d24 MCC |
331 | .. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely |
332 | used to refer to a memory module, although there are other memory | |
333 | packaging alternatives, like SO-DIMM, SIMM, etc. Along this document, | |
334 | and inside the EDAC system, the term "dimm" is used for all memory | |
335 | modules, even when they use a different kind of packaging. | |
336 | ||
043b4318 BP |
337 | Memory controllers allow for several csrows, with 8 csrows being a |
338 | typical value. Yet, the actual number of csrows depends on the layout of | |
9c058d24 MCC |
339 | a given motherboard, memory controller and memory module characteristics. |
340 | ||
341 | Dual channels allow for dual data length (e. g. 128 bits, on 64 bit systems) | |
342 | data transfers to/from the CPU from/to memory. Some newer chipsets allow | |
343 | for more than 2 channels, like Fully Buffered DIMMs (FB-DIMMs) memory | |
344 | controllers. The following example will assume 2 channels: | |
345 | ||
346 | +------------+-----------------------+ | |
82a19551 JC |
347 | | CS Rows | Channels | |
348 | +------------+-----------+-----------+ | |
349 | | | ``ch0`` | ``ch1`` | | |
9c058d24 MCC |
350 | +============+===========+===========+ |
351 | | ``csrow0`` | DIMM_A0 | DIMM_B0 | | |
352 | +------------+ | | | |
353 | | ``csrow1`` | | | | |
354 | +------------+-----------+-----------+ | |
355 | | ``csrow2`` | DIMM_A1 | DIMM_B1 | | |
356 | +------------+ | | | |
357 | | ``csrow3`` | | | | |
358 | +------------+-----------+-----------+ | |
359 | ||
360 | In the above example, there are 4 physical slots on the motherboard | |
da9bb1d2 AC |
361 | for memory DIMMs: |
362 | ||
9c058d24 MCC |
363 | +---------+---------+ |
364 | | DIMM_A0 | DIMM_B0 | | |
365 | +---------+---------+ | |
366 | | DIMM_A1 | DIMM_B1 | | |
367 | +---------+---------+ | |
da9bb1d2 | 368 | |
043b4318 | 369 | Labels for these slots are usually silk-screened on the motherboard. |
b27a2d04 | 370 | Slots labeled ``A`` are channel 0 in this example. Slots labeled ``B`` are |
043b4318 BP |
371 | channel 1. Notice that there are two csrows possible on a physical DIMM. |
372 | These csrows are allocated their csrow assignment based on the slot into | |
373 | which the memory DIMM is placed. Thus, when 1 DIMM is placed in each | |
374 | Channel, the csrows cross both DIMMs. | |
da9bb1d2 AC |
375 | |
376 | Memory DIMMs come single or dual "ranked". A rank is a populated csrow. | |
377 | Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above | |
9c058d24 MCC |
378 | will have just one csrow (csrow0). csrow1 will be empty. On the other |
379 | hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0 | |
380 | and csrow1 will be populated. The pattern repeats itself for csrow2 and | |
da9bb1d2 AC |
381 | csrow3. |
382 | ||
043b4318 BP |
383 | The representation of the above is reflected in the directory |
384 | tree in EDAC's sysfs interface. Starting in directory | |
9c058d24 MCC |
385 | ``/sys/devices/system/edac/mc``, each memory controller will be |
386 | represented by its own ``mcX`` directory, where ``X`` is the | |
387 | index of the MC:: | |
da9bb1d2 AC |
388 | |
389 | ..../edac/mc/ | |
390 | | | |
391 | |->mc0 | |
392 | |->mc1 | |
393 | |->mc2 | |
394 | .... | |
395 | ||
b27a2d04 MCC |
396 | Under each ``mcX`` directory each ``csrowX`` is again represented by a |
397 | ``csrowX``, where ``X`` is the csrow index:: | |
da9bb1d2 AC |
398 | |
399 | .../mc/mc0/ | |
400 | | | |
401 | |->csrow0 | |
402 | |->csrow2 | |
403 | |->csrow3 | |
404 | .... | |
405 | ||
043b4318 BP |
406 | Notice that there is no csrow1, which indicates that csrow0 is composed |
407 | of single-ranked DIMMs. This should also apply in both Channels, in | |
408 | order to have dual-channel mode be operational. Since both csrow2 and | |
409 | csrow3 are populated, this indicates a dual ranked set of DIMMs for | |
410 | channels 0 and 1. | |
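
The relationship between DIMM ranks and populated csrows in this example can be written down as a small model. This is only a sketch of the 2-channel, 4-slot layout used above, with two csrows reserved per slot pair. Real memory controllers may assign csrows differently, and the function name is hypothetical:

```python
def populated_csrows(ranks_per_slot_pair, csrows_per_slot=2):
    """List the csrow directories expected for the example layout.

    ranks_per_slot_pair[i] is the rank count of the DIMMs placed in
    slot pair i (DIMM_Ai and DIMM_Bi), or 0 if the pair is empty.
    """
    rows = []
    for slot, ranks in enumerate(ranks_per_slot_pair):
        if not 0 <= ranks <= csrows_per_slot:
            raise ValueError(f"unsupported rank count: {ranks}")
        base = slot * csrows_per_slot
        rows.extend(f"csrow{base + r}" for r in range(ranks))
    return rows


# Single-ranked DIMMs in DIMM_A0/DIMM_B0, dual-ranked in DIMM_A1/DIMM_B1.
print(populated_csrows([1, 2]))
```

For the configuration in the comment, the model yields ``csrow0``, ``csrow2`` and ``csrow3``, matching the directory listing shown above.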
da9bb1d2 | 411 | |
b27a2d04 | 412 | Within each of the ``mcX`` and ``csrowX`` directories are several EDAC |
043b4318 | 413 | control and attribute files. |
da9bb1d2 | 414 | |
b27a2d04 MCC |
415 | ``mcX`` directories |
416 | ------------------- | |
da9bb1d2 | 417 | |
b27a2d04 MCC |
418 | In ``mcX`` directories are EDAC control and attribute files for |
419 | this ``X`` instance of the memory controllers. | |
da9bb1d2 | 420 | |
8b6f04ce | 421 | For a description of the sysfs API, please see: |
b27a2d04 | 422 | |
3aae9edd | 423 | Documentation/ABI/testing/sysfs-devices-edac |
da9bb1d2 AC |
424 | |
425 | ||
032d0ab7 MCC |
426 | ``dimmX`` or ``rankX`` directories |
427 | ---------------------------------- | |
428 | ||
429 | The recommended way to use the EDAC subsystem is to look at the information | |
430 | provided by the ``dimmX`` or ``rankX`` directories [#f5]_. | |
431 | ||
432 | A typical EDAC system has the following structure under | |
433 | ``/sys/devices/system/edac/``\ [#f6]_:: | |
434 | ||
435 | /sys/devices/system/edac/ | |
436 | ├── mc | |
437 | │ ├── mc0 | |
438 | │ │ ├── ce_count | |
439 | │ │ ├── ce_noinfo_count | |
440 | │ │ ├── dimm0 | |
4fb6fde7 | 441 | │ │ │ ├── dimm_ce_count |
032d0ab7 MCC |
442 | │ │ │ ├── dimm_dev_type |
443 | │ │ │ ├── dimm_edac_mode | |
444 | │ │ │ ├── dimm_label | |
445 | │ │ │ ├── dimm_location | |
446 | │ │ │ ├── dimm_mem_type | |
4fb6fde7 | 447 | │ │ │ ├── dimm_ue_count |
032d0ab7 MCC |
448 | │ │ │ ├── size |
449 | │ │ │ └── uevent | |
450 | │ │ ├── max_location | |
451 | │ │ ├── mc_name | |
452 | │ │ ├── reset_counters | |
453 | │ │ ├── seconds_since_reset | |
454 | │ │ ├── size_mb | |
455 | │ │ ├── ue_count | |
456 | │ │ ├── ue_noinfo_count | |
457 | │ │ └── uevent | |
458 | │ ├── mc1 | |
459 | │ │ ├── ce_count | |
460 | │ │ ├── ce_noinfo_count | |
461 | │ │ ├── dimm0 | |
4fb6fde7 | 462 | │ │ │ ├── dimm_ce_count |
032d0ab7 MCC |
463 | │ │ │ ├── dimm_dev_type |
464 | │ │ │ ├── dimm_edac_mode | |
465 | │ │ │ ├── dimm_label | |
466 | │ │ │ ├── dimm_location | |
467 | │ │ │ ├── dimm_mem_type | |
4fb6fde7 | 468 | │ │ │ ├── dimm_ue_count |
032d0ab7 MCC |
469 | │ │ │ ├── size |
470 | │ │ │ └── uevent | |
471 | │ │ ├── max_location | |
472 | │ │ ├── mc_name | |
473 | │ │ ├── reset_counters | |
474 | │ │ ├── seconds_since_reset | |
475 | │ │ ├── size_mb | |
476 | │ │ ├── ue_count | |
477 | │ │ ├── ue_noinfo_count | |
478 | │ │ └── uevent | |
479 | │ └── uevent | |
480 | └── uevent | |
481 | ||
482 | In the ``dimmX`` directories are EDAC control and attribute files for | |
483 | this ``X`` memory module: | |
484 | ||
485 | - ``size`` - Total memory managed by this memory module attribute file | |
486 | ||
487 | This attribute file displays, in count of megabytes, the memory | |
488 | that this memory module contains. | |
489 | ||
4fb6fde7 AM |
490 | - ``dimm_ue_count`` - Uncorrectable Errors count attribute file |
491 | ||
492 | This attribute file displays the total count of uncorrectable | |
493 | errors that have occurred on this DIMM. If panic_on_ue is set | |
494 | this counter will not have a chance to increment, since EDAC | |
495 | will panic the system. | |
496 | ||
497 | - ``dimm_ce_count`` - Correctable Errors count attribute file | |
498 | ||
499 | This attribute file displays the total count of correctable | |
500 | errors that have occurred on this DIMM. This count is very | |
501 | important to examine. CEs provide early indications that a | |
502 | DIMM is beginning to fail. This count field should be | |
503 | monitored for non-zero values, which should be reported | |
504 | to the system administrator. | |
505 | ||
032d0ab7 MCC |
506 | - ``dimm_dev_type`` - Device type attribute file |
507 | ||
508 | This attribute file will display what type of DRAM device is | |
509 | being utilized on this DIMM. | |
510 | Examples: | |
511 | ||
512 | - x1 | |
513 | - x2 | |
514 | - x4 | |
515 | - x8 | |
516 | ||
517 | - ``dimm_edac_mode`` - EDAC Mode of operation attribute file | |
518 | ||
519 | This attribute file will display what type of Error detection | |
520 | and correction is being utilized. | |
521 | ||
522 | - ``dimm_label`` - memory module label control file | |
523 | ||
524 | This control file allows this DIMM to have a label assigned | |
525 | to it. With this label in the module, when errors occur | |
526 | the output can provide the DIMM label in the system log. | |
527 | This becomes vital for panic events to isolate the | |
528 | cause of the UE event. | |
529 | ||
530 | DIMM Labels must be assigned after booting, with information | |
531 | that correctly identifies the physical slot with its | |
532 | silk screen label. This information is currently very | |
533 | motherboard specific and determination of this information | |
534 | must occur in userland at this time. | |
535 | ||
536 | - ``dimm_location`` - location of the memory module | |
537 | ||
538 | The location can have up to 3 levels, and describe how the | |
539 | memory controller identifies the location of a memory module. | |
540 | Depending on the type of memory and memory controller, it | |
541 | can be: | |
542 | ||
543 | - *csrow* and *channel* - used when the memory controller | |
544 | doesn't identify a single DIMM - e. g. in ``rankX`` dir; | |
545 | - *branch*, *channel*, *slot* - typically used on FB-DIMM memory | |
546 | controllers; | |
547 | - *channel*, *slot* - used on Nehalem and newer Intel drivers. | |
548 | ||
549 | - ``dimm_mem_type`` - Memory Type attribute file | |
550 | ||
551 | This attribute file will display what type of memory is currently | |
552 | on this csrow. Normally, either buffered or unbuffered memory. | |
553 | Examples: | |
554 | ||
555 | - Registered-DDR | |
556 | - Unbuffered-DDR | |
557 | ||
558 | .. [#f5] On some systems, the memory controller doesn't have any logic | |
559 | to identify the memory module. On such systems, the directory is called ``rankX`` and works in a similar way to the ``csrowX`` directories. | |
560 | On modern Intel memory controllers, the memory controller identifies the | |
561 | memory modules directly. On such systems, the directory is called ``dimmX``. | |
562 | ||
563 | .. [#f6] There are also some ``power`` directories and ``subsystem`` | |
564 | symlinks inside the sysfs mapping that are automatically created by | |
565 | the sysfs subsystem. Currently, they serve no purpose. | |
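
The per-module counters described in this section can be collected with a few lines of userspace code. The sketch below walks the ``mcX/dimmX`` directories shown in the tree above and reads ``dimm_ce_count``, ``dimm_ue_count`` and ``dimm_label``. The function name is hypothetical, and systems that expose ``rankX`` directories instead would need a different glob pattern:

```python
from pathlib import Path


def read_dimm_error_counts(edac_mc_dir="/sys/devices/system/edac/mc"):
    """Return {"mcX/dimmY": {"label": str, "ce": int, "ue": int}}."""
    counts = {}
    for dimm in sorted(Path(edac_mc_dir).glob("mc*/dimm*")):
        try:
            ce = int((dimm / "dimm_ce_count").read_text())
            ue = int((dimm / "dimm_ue_count").read_text())
            label_file = dimm / "dimm_label"
            label = label_file.read_text().strip() if label_file.is_file() else ""
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        counts[f"{dimm.parent.name}/{dimm.name}"] = {
            "label": label,
            "ce": ce,
            "ue": ue,
        }
    return counts


if __name__ == "__main__":
    for name, info in read_dimm_error_counts().items():
        if info["ce"] or info["ue"]:
            print(f"{name} ({info['label']}): CE={info['ce']} UE={info['ue']}")
```

Run periodically, a script like this gives the non-zero ``dimm_ce_count`` monitoring recommended above. The threshold at which a module should be replaced is a policy decision left to the system administrator.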
da9bb1d2 | 566 | |
b27a2d04 MCC |
567 | ``csrowX`` directories |
568 | ---------------------- | |
043b4318 | 569 | |
9c058d24 | 570 | When CONFIG_EDAC_LEGACY_SYSFS is enabled, sysfs will contain the ``csrowX`` |
043b4318 BP |
571 | directories. As this API doesn't work properly for Rambus, FB-DIMMs and |
572 | modern Intel Memory Controllers, this is being deprecated in favor of | |
9c058d24 | 573 | ``dimmX`` directories. |
8b6f04ce | 574 | |
b27a2d04 MCC |
575 | In the ``csrowX`` directories are EDAC control and attribute files for |
576 | this ``X`` instance of csrow: | |
da9bb1d2 AC |
577 | |
578 | ||
b27a2d04 | 579 | - ``ue_count`` - Total Uncorrectable Errors count attribute file |
da9bb1d2 AC |
580 | |
581 | This attribute file displays the total count of uncorrectable | |
582 | errors that have occurred on this csrow. If panic_on_ue is set | |
583 | this counter will not have a chance to increment, since EDAC | |
584 | will panic the system. | |
585 | ||
586 | ||
b27a2d04 | 587 | - ``ce_count`` - Total Correctable Errors count attribute file |
da9bb1d2 AC |
588 | |
589 | This attribute file displays the total count of correctable | |
043b4318 BP |
590 | errors that have occurred on this csrow. This count is very |
591 | important to examine. CEs provide early indications that a | |
592 | DIMM is beginning to fail. This count field should be | |
593 | monitored for non-zero values, which should be reported | |
594 | to the system administrator. | |
da9bb1d2 AC |
595 | |
596 | ||
b27a2d04 | 597 | - ``size_mb`` - Total memory managed by this csrow attribute file |
da9bb1d2 | 598 | |
3aae9edd | 599 | This attribute file displays, in count of megabytes, the memory |
f3479816 | 600 | that this csrow contains. |
da9bb1d2 AC |
601 | |
602 | ||
b27a2d04 | 603 | - ``mem_type`` - Memory Type attribute file |
da9bb1d2 AC |
604 | |
605 | This attribute file will display what type of memory is currently | |
606 | on this csrow. Normally, either buffered or unbuffered memory. | |
49c0dab7 | 607 | Examples: |
da9bb1d2 | 608 | |
b27a2d04 MCC |
609 | - Registered-DDR |
610 | - Unbuffered-DDR | |
da9bb1d2 | 611 | |
da9bb1d2 | 612 | |
b27a2d04 | 613 | - ``edac_mode`` - EDAC Mode of operation attribute file |
da9bb1d2 AC |
614 | |
615 | This attribute file will display what type of Error detection | |
616 | and correction is being utilized. | |
617 | ||
618 | ||
b27a2d04 | 619 | - ``dev_type`` - Device type attribute file |
da9bb1d2 | 620 | |
49c0dab7 DT |
621 | This attribute file will display what type of DRAM device is |
622 | being utilized on this DIMM. | |
623 | Examples: | |
da9bb1d2 | 624 | |
b27a2d04 MCC |
625 | - x1 |
626 | - x2 | |
627 | - x4 | |
628 | - x8 | |
da9bb1d2 | 629 | |
da9bb1d2 | 630 | |
b27a2d04 | 631 | - ``ch0_ce_count`` - Channel 0 CE Count attribute file |
da9bb1d2 AC |
632 | |
633 | This attribute file will display the count of CEs on this | |
634 | DIMM located in channel 0. | |
635 | ||
636 | ||
b27a2d04 | 637 | - ``ch0_ue_count`` - Channel 0 UE Count attribute file |
da9bb1d2 AC |
638 | |
639 | This attribute file will display the count of UEs on this | |
640 | DIMM located in channel 0. | |
641 | ||
642 | ||
b27a2d04 | 643 | - ``ch0_dimm_label`` - Channel 0 DIMM Label control file |
da9bb1d2 | 644 | |
da9bb1d2 AC |
645 | |
646 | This control file allows this DIMM to have a label assigned | |
647 | to it. With this label in the module, when errors occur | |
648 | the output can provide the DIMM label in the system log. | |
649 | This becomes vital for panic events to isolate the | |
650 | cause of the UE event. | |
651 | ||
652 | DIMM Labels must be assigned after booting, with information | |
653 | that correctly identifies the physical slot with its | |
654 | silk screen label. This information is currently very | |
655 | motherboard specific and determination of this information | |
656 | must occur in userland at this time. | |
657 | ||
658 | ||
b27a2d04 | 659 | - ``ch1_ce_count`` - Channel 1 CE Count attribute file |
da9bb1d2 | 660 | |
da9bb1d2 AC |
661 | |
662 | This attribute file will display the count of CEs on this | |
663 | DIMM located in channel 1. | |
664 | ||
665 | ||
b27a2d04 | 666 | - ``ch1_ue_count`` - Channel 1 UE Count attribute file |
da9bb1d2 | 667 | |
da9bb1d2 AC |
668 | |
669 | This attribute file will display the count of UEs on this | |
670 | DIMM located in channel 1. | |
671 | ||
672 | ||
b27a2d04 | 673 | - ``ch1_dimm_label`` - Channel 1 DIMM Label control file |
da9bb1d2 AC |
674 | |
675 | This control file allows this DIMM to have a label assigned | |
676 | to it. With this label in the module, when errors occur | |
677 | the output can provide the DIMM label in the system log. | |
678 | This becomes vital for panic events to isolate the | |
679 | cause of the UE event. | |
680 | ||
681 | DIMM Labels must be assigned after booting, with information | |
682 | that correctly identifies the physical slot with its | |
683 | silk screen label. This information is currently very | |
684 | motherboard specific and determination of this information | |
685 | must occur in userland at this time. | |
686 | ||
043b4318 | 687 | |
b27a2d04 | 688 | System Logging |
043b4318 | 689 | -------------- |
da9bb1d2 | 690 | |
043b4318 | 691 | If logging for UEs and CEs is enabled, then system logs will contain |
b27a2d04 | 692 | information indicating that errors have been detected:: |
da9bb1d2 | 693 | |
b27a2d04 MCC |
694 | EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, channel 1 "DIMM_B1": amd76x_edac |
695 | EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, channel 1 "DIMM_B1": amd76x_edac | |
da9bb1d2 AC |
696 | |
697 | ||
698 | The structure of the message is: | |
b27a2d04 MCC |
699 | |
700 | +---------------------------------------+-------------+ | |
82a19551 | 701 | | Content | Example | |
b27a2d04 MCC |
702 | +=======================================+=============+ |
703 | | The memory controller | MC0 | | |
704 | +---------------------------------------+-------------+ | |
705 | | Error type | CE | | |
706 | +---------------------------------------+-------------+ | |
707 | | Memory page | 0x283 | | |
708 | +---------------------------------------+-------------+ | |
709 | | Offset in the page | 0xce0 | | |
710 | +---------------------------------------+-------------+ | |
711 | | The byte granularity | grain 8 | | |
712 | | or resolution of the error | | | |
713 | +---------------------------------------+-------------+ | |
714 | | The error syndrome | 0x6ec3 | |
715 | +---------------------------------------+-------------+ | |
82a19551 | 716 | | Memory row | row 0 | |
b27a2d04 MCC |
717 | +---------------------------------------+-------------+ |
718 | | Memory channel | channel 1 | | |
719 | +---------------------------------------+-------------+ | |
720 | | DIMM label, if set prior | DIMM_B1 | |
721 | +---------------------------------------+-------------+ | |
722 | | And then an optional, driver-specific | | | |
723 | | message that may have additional | | | |
724 | | information. | | | |
725 | +---------------------------------------+-------------+ | |
da9bb1d2 | 726 | |
043b4318 BP |
727 | Both UEs and CEs with no info will lack all but memory controller, error |
728 | type, a notice of "no info" and then an optional, driver-specific error | |
729 | message. | |
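The field layout above can be split mechanically. The following is a minimal sketch, assuming the exact field order of the sample messages and a DIMM label without spaces; ``parse_edac_line`` is a hypothetical helper name, not part of any EDAC tool, and "no info" messages are not handled:

```shell
# Split one EDAC memory controller log line into key=value pairs.
# Hypothetical helper: assumes the field order shown in the sample
# messages and a DIMM label that contains no spaces.
parse_edac_line() {
    printf '%s\n' "$1" | awk '
        $1 == "EDAC" && $4 == "page" {
            sub(/:$/, "", $2)               # drop the colon after the controller name
            for (i = 5; i <= 13; i += 2)    # drop trailing commas from the values
                sub(/,$/, "", $i)
            label = $16
            gsub(/^"|":$/, "", label)       # drop the quotes and colon around the label
            printf "mc=%s type=%s page=%s offset=%s grain=%s syndrome=%s row=%s channel=%s label=%s\n",
                   $2, $3, $5, $7, $9, $11, $13, $15, label
        }'
}
```

Lines that do not match the expected layout produce no output, so the helper can be fed a whole log without filtering it first.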
da9bb1d2 AC |
730 | |
731 | ||
da9bb1d2 | 732 | PCI Bus Parity Detection |
043b4318 | 733 | ------------------------ |
da9bb1d2 | 734 | |
043b4318 BP |
735 | On Header Type 00 devices, the primary status register is checked for any |
736 | parity error regardless of whether parity is enabled on the device or |
737 | not. (The spec indicates parity is generated in some cases.) On Header |
738 | Type 01 bridges, the secondary status register is also checked to see |
739 | if a parity error occurred on the bus on the other side of the bridge. |
da9bb1d2 AC |
740 | |
741 | ||
b27a2d04 | 742 | Sysfs configuration |
043b4318 | 743 | ------------------- |
da9bb1d2 | 744 | |
b27a2d04 MCC |
745 | Under ``/sys/devices/system/edac/pci`` are control and attribute files as |
746 | follows: | |
da9bb1d2 AC |
747 | |
748 | ||
b27a2d04 | 749 | - ``check_pci_parity`` - Enable/Disable PCI Parity checking control file |
da9bb1d2 AC |
750 | |
751 | This control file enables or disables the PCI Bus Parity scanning | |
752 | operation. Writing a 1 to this file enables the scanning. Writing | |
753 | a 0 to this file disables the scanning. | |
754 | ||
b27a2d04 MCC |
755 | Enable:: |
756 | ||
757 | echo "1" >/sys/devices/system/edac/pci/check_pci_parity | |
da9bb1d2 | 758 | |
b27a2d04 | 759 | Disable:: |
da9bb1d2 | 760 | |
b27a2d04 | 761 | echo "0" >/sys/devices/system/edac/pci/check_pci_parity |
da9bb1d2 | 762 | |
327dafb1 | 763 | |
b27a2d04 | 764 | - ``pci_parity_count`` - Parity Count |
327dafb1 AJ |
765 | |
766 | This attribute file will display the number of parity errors that | |
767 | have been detected. | |
768 | ||
769 | ||
b27a2d04 | 770 | Module parameters |
043b4318 | 771 | ----------------- |
327dafb1 | 772 | |
b27a2d04 | 773 | - ``edac_mc_panic_on_ue`` - Panic on UE control file |
327dafb1 AJ |
774 | |
775 | An uncorrectable error will cause a machine panic. This is usually | |
776 | desirable. It is a bad idea to continue when an uncorrectable error | |
777 | occurs - it is indeterminate what was uncorrected and the operating | |
778 | system context might be so mangled that continuing will lead to further | |
779 | corruption. If the kernel has MCE configured, then EDAC will never | |
780 | notice the UE. | |
781 | ||
b27a2d04 MCC |
782 | LOAD TIME:: |
783 | ||
784 | module/kernel parameter: edac_mc_panic_on_ue=[0|1] | |
785 | ||
786 | RUN TIME:: | |
327dafb1 | 787 | |
b27a2d04 | 788 | echo "1" > /sys/module/edac_core/parameters/edac_mc_panic_on_ue |
327dafb1 AJ |
789 | |
790 | ||
b27a2d04 | 791 | - ``edac_mc_log_ue`` - Log UE control file |
327dafb1 | 792 | |
327dafb1 AJ |
793 | |
794 | Generate kernel messages describing uncorrectable errors. These errors | |
795 | are reported through the system message log. UE statistics |
796 | will be accumulated even when UE logging is disabled. | |
797 | ||
b27a2d04 | 798 | LOAD TIME:: |
327dafb1 | 799 | |
b27a2d04 | 800 | module/kernel parameter: edac_mc_log_ue=[0|1] |
327dafb1 | 801 | |
b27a2d04 | 802 | RUN TIME:: |
327dafb1 | 803 | |
b27a2d04 MCC |
804 | echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ue |
805 | ||
806 | ||
807 | - ``edac_mc_log_ce`` - Log CE control file | |
327dafb1 | 808 | |
327dafb1 AJ |
809 | |
810 | Generate kernel messages describing correctable errors. These | |
811 | errors are reported through the system message log. |
812 | CE statistics will be accumulated even when CE logging is disabled. | |
813 | ||
b27a2d04 MCC |
814 | LOAD TIME:: |
815 | ||
816 | module/kernel parameter: edac_mc_log_ce=[0|1] | |
817 | ||
818 | RUN TIME:: | |
327dafb1 | 819 | |
b27a2d04 | 820 | echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ce |
327dafb1 AJ |
821 | |
822 | ||
b27a2d04 | 823 | - ``edac_mc_poll_msec`` - Polling period control file |
327dafb1 | 824 | |
327dafb1 AJ |
825 | |
826 | The time period, in milliseconds, for polling for error information. | |
827 | Too small a value wastes resources. Too large a value might delay | |
828 | necessary handling of errors and might lose valuable information for |
829 | locating the error. 1000 milliseconds (once each second) is the current |
830 | default. Systems which require all the bandwidth they can get may |
831 | increase this. |
832 | ||
b27a2d04 | 833 | LOAD TIME:: |
327dafb1 | 834 | |
b27a2d04 | 835 | module/kernel parameter: edac_mc_poll_msec=<milliseconds> |
327dafb1 | 836 | |
b27a2d04 | 837 | RUN TIME:: |
da9bb1d2 | 838 | |
b27a2d04 | 839 | echo "1000" > /sys/module/edac_core/parameters/edac_mc_poll_msec |
da9bb1d2 | 840 | |
b27a2d04 MCC |
841 | |
842 | - ``panic_on_pci_parity`` - Panic on PCI PARITY Error | |
da9bb1d2 AC |
843 | |
844 | ||
3aae9edd | 845 | This control file enables or disables panicking when a parity |
da9bb1d2 AC |
846 | error has been detected. |
847 | ||
848 | ||
b27a2d04 MCC |
849 | module/kernel parameter:: |
850 | ||
851 | edac_panic_on_pci_pe=[0|1] | |
852 | ||
853 | Enable:: | |
854 | ||
855 | echo "1" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe | |
da9bb1d2 | 856 | |
b27a2d04 | 857 | Disable:: |
da9bb1d2 | 858 | |
b27a2d04 | 859 | echo "0" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe |
da9bb1d2 AC |
860 | |
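To review the current settings in one pass, the parameter files can be listed together. This is a minimal sketch: ``edac_show_params`` is a hypothetical helper name, and the optional directory argument is an assumption added so that the function can be tried on a machine without EDAC loaded:

```shell
# Print every EDAC core module parameter as "name=value".
# Hypothetical helper: the directory defaults to the path used in this
# section; another directory may be passed for testing.
edac_show_params() {
    dir=${1:-/sys/module/edac_core/parameters}
    for f in "$dir"/*; do
        [ -r "$f" ] || continue     # skip missing or unreadable entries
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Called without arguments it reads ``/sys/module/edac_core/parameters``; if that directory does not exist, it prints nothing.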
861 | ||
862 | ||
043b4318 BP |
863 | EDAC device type |
864 | ---------------- | |
87f24c3a | 865 | |
66c222a0 | 866 | In the header file, edac_pci.h, there is a series of edac_device structures |
87f24c3a DT |
867 | and APIs for the EDAC_DEVICE. |
868 | ||
869 | User space access to an edac_device is through the sysfs interface. | |
870 | ||
b27a2d04 MCC |
871 | At the location ``/sys/devices/system/edac`` (sysfs) new edac_device devices |
872 | will appear. | |
87f24c3a | 873 | |
b27a2d04 MCC |
874 | There is a three level tree beneath the above ``edac`` directory. For example, |
875 | the ``test_device_edac`` device (found at the http://bluesmoke.sourceforge.net |
876 | website) installs itself as:: | |
87f24c3a | 877 | |
b27a2d04 | 878 | /sys/devices/system/edac/test-instance |
87f24c3a | 879 | |
b27a2d04 | 880 | In this directory are various controls, a symlink and one or more ``instance`` |
c98be0c9 | 881 | directories. |
87f24c3a DT |
882 | |
883 | The standard default controls are: | |
884 | ||
b27a2d04 | 885 | ============== ======================================================= |
87f24c3a DT |
886 | log_ce boolean to log CE events |
887 | log_ue boolean to log UE events | |
b27a2d04 | 888 | panic_on_ue boolean to ``panic`` the system if a UE is encountered |
87f24c3a DT |
889 | (default off, can be set true via startup script) |
890 | poll_msec time period between POLL cycles for events | |
b27a2d04 | 891 | ============== ======================================================= |
87f24c3a DT |
892 | |
893 | The test_device_edac device adds at least one custom control of its own: |
894 | ||
b27a2d04 | 895 | ============== ================================================== |
87f24c3a DT |
896 | test_bits which in the current test driver does nothing but |
897 | show how it is installed. A ported driver can | |
898 | add one or more such controls and/or attributes | |
899 | for specific uses. | |
900 | One out-of-tree driver uses controls here to allow | |
901 | for ERROR INJECTION operations to hardware | |
902 | injection registers | |
b27a2d04 | 903 | ============== ================================================== |
87f24c3a DT |
904 | |
905 | The symlink points to the 'struct dev' that is registered for this edac_device. | |
906 | ||
b27a2d04 | 907 | Instances |
043b4318 | 908 | --------- |
87f24c3a | 909 | |
b27a2d04 MCC |
910 | One or more instance directories are present. For the ``test_device_edac`` |
911 | case: | |
87f24c3a | 912 | |
b27a2d04 MCC |
913 | +----------------+ |
914 | | test-instance0 | | |
915 | +----------------+ | |
87f24c3a DT |
916 | |
917 | ||
918 | In this directory there are two default counter attributes, which are totals of | |
919 | the counters in deeper subdirectories. |
920 | ||
b27a2d04 | 921 | ============== ==================================== |
87f24c3a DT |
922 | ce_count total of CE events of subdirectories |
923 | ue_count total of UE events of subdirectories | |
b27a2d04 | 924 | ============== ==================================== |
87f24c3a | 925 | |
b27a2d04 | 926 | Blocks |
043b4318 | 927 | ------ |
87f24c3a | 928 | |
b27a2d04 MCC |
929 | At the lowest directory level is the ``block`` directory. There can be 0, 1 |
930 | or more blocks specified in each instance: | |
87f24c3a | 931 | |
b27a2d04 MCC |
932 | +-------------+ |
933 | | test-block0 | | |
934 | +-------------+ | |
87f24c3a DT |
935 | |
936 | In this directory the default attributes are: | |
937 | ||
b27a2d04 MCC |
938 | ============== ================================================ |
939 | ce_count which is counter of CE events for this ``block`` | |
87f24c3a | 940 | of hardware being monitored |
b27a2d04 | 941 | ue_count which is counter of UE events for this ``block`` |
87f24c3a | 942 | of hardware being monitored |
b27a2d04 | 943 | ============== ================================================ |
87f24c3a DT |
944 | |
945 | ||
b27a2d04 | 946 | The ``test_device_edac`` device adds 4 attributes and 1 control: |
87f24c3a | 947 | |
b27a2d04 | 948 | ================== ==================================================== |
87f24c3a DT |
949 | test-block-bits-0 for every POLL cycle this counter |
950 | is incremented | |
951 | test-block-bits-1 every 10 cycles, this counter is bumped once, | |
952 | and test-block-bits-0 is set to 0 | |
953 | test-block-bits-2 every 100 cycles, this counter is bumped once, | |
954 | and test-block-bits-1 is set to 0 | |
955 | test-block-bits-3 every 1000 cycles, this counter is bumped once, | |
956 | and test-block-bits-2 is set to 0 | |
b27a2d04 | 957 | ================== ==================================================== |
87f24c3a DT |
958 | |
959 | ||
b27a2d04 | 960 | ================== ==================================================== |
87f24c3a DT |
961 | reset-counters writing anything to this control will |
962 | reset all the above counters. | |
b27a2d04 | 963 | ================== ==================================================== |
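Read together, the four counters behave like the decimal digits of the number of POLL cycles. The following sketch only models the description in the table above; it is not taken from the test driver's source, and ``simulate_test_block_bits`` is a hypothetical name:

```shell
# Model the test-block-bits counters after a given number of POLL cycles.
# Prints the values of bits-0, bits-1, bits-2 and bits-3 in that order.
# Runs in a subshell so its variables do not leak into the caller.
simulate_test_block_bits() (
    cycles=$1
    b0=0; b1=0; b2=0; b3=0
    i=0
    while [ "$i" -lt "$cycles" ]; do
        i=$((i + 1))
        b0=$((b0 + 1))                                           # every cycle
        if [ "$b0" -eq 10 ]; then b0=0; b1=$((b1 + 1)); fi       # every 10 cycles
        if [ "$b1" -eq 10 ]; then b1=0; b2=$((b2 + 1)); fi       # every 100 cycles
        if [ "$b2" -eq 10 ]; then b2=0; b3=$((b3 + 1)); fi       # every 1000 cycles
    done
    echo "$b0 $b1 $b2 $b3"
)
```

For example, ``simulate_test_block_bits 1234`` prints ``4 3 2 1``.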
87f24c3a DT |
964 | |
965 | ||
b27a2d04 | 966 | Use of the ``test_device_edac`` driver should enable others to create their own |
87f24c3a DT |
967 | unique drivers for their hardware systems. |
968 | ||
b27a2d04 MCC |
969 | The ``test_device_edac`` sample driver is located at the |
970 | http://bluesmoke.sourceforge.net project site for EDAC. | |
87f24c3a | 971 | |
043b4318 | 972 | |
e4b53016 MCC |
973 | Usage of EDAC APIs on Nehalem and newer Intel CPUs |
974 | -------------------------------------------------- | |
31983a04 | 975 | |
e4b53016 MCC |
976 | On older Intel architectures, the memory controller was part of the North |
977 | Bridge chipset. Nehalem, Sandy Bridge, Ivy Bridge, Haswell, Sky Lake and | |
978 | newer Intel architectures integrated an enhanced version of the memory | |
979 | controller (MC) inside the CPUs. | |
31983a04 | 980 | |
e4b53016 MCC |
981 | This chapter covers the differences of the enhanced memory controllers |
982 | found on newer Intel CPUs, which are handled by drivers such as |
983 | ``i7core_edac``, ``sb_edac`` and ``sbx_edac``. |
984 | ||
985 | .. note:: | |
986 | ||
987 | The Xeon E7 processor families use a separate chip for the memory | |
988 | controller, called Intel Scalable Memory Buffer. This section doesn't | |
989 | apply to such families. |
990 | ||
991 | 1) There is one Memory Controller per Quick Path Interconnect |
c3444363 MCC |
992 | (QPI). At the driver, the term "socket" means one QPI. This is |
993 | associated with a physical CPU socket. | |
31983a04 MCC |
994 | |
995 | Each MC has 3 physical read channels, 3 physical write channels and |
c94bed8e | 996 | 3 logical channels. The driver currently sees it as just 3 channels. |
31983a04 MCC |
997 | Each channel can have up to 3 DIMMs. |
998 | ||
999 | The smallest known unit is the DIMM. There is no information about csrows. |
3aae9edd | 1000 | As the EDAC API treats the csrow as its smallest unit, the driver sequentially |
e4b53016 | 1001 | maps each channel/DIMM pair to a different csrow. |
c3444363 | 1002 | |
b27a2d04 MCC |
1003 | For example, supposing the following layout:: |
1004 | ||
c3444363 MCC |
1005 | Ch0 phy rd0, wr0 (0x063f4031): 2 ranks, UDIMMs |
1006 | dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400 | |
1007 | dimm 1 1024 Mb offset: 4, bank: 8, rank: 1, row: 0x4000, col: 0x400 | |
1008 | Ch1 phy rd1, wr1 (0x063f4031): 2 ranks, UDIMMs | |
1009 | dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400 | |
1010 | Ch2 phy rd3, wr3 (0x063f4031): 2 ranks, UDIMMs | |
1011 | dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400 | |
b27a2d04 MCC |
1012 | |
1013 | The driver will map it as:: | |
1014 | ||
c3444363 MCC |
1015 | csrow0: channel 0, dimm0 |
1016 | csrow1: channel 0, dimm1 | |
1017 | csrow2: channel 1, dimm0 | |
1018 | csrow3: channel 2, dimm0 | |
1019 | ||
b27a2d04 | 1020 | It exports one DIMM per csrow. |
31983a04 | 1021 | |
c3444363 | 1022 | Each QPI is exported as a different memory controller. |
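The sequential assignment described above can be sketched as follows. This is only an illustration of the numbering rule, not the driver's code; ``map_csrows`` is a hypothetical helper that reads one "channel dimm" pair per line from standard input:

```shell
# Number populated channel/DIMM pairs sequentially as csrows.
# Input: one "channel dimm" pair per line, in the order the DIMMs are found.
map_csrows() {
    awk '{ printf "csrow%d: channel %s, dimm%s\n", NR - 1, $1, $2 }'
}
```

Feeding it the four pairs of the layout above (``0 0``, ``0 1``, ``1 0`` and ``2 0``) yields the csrow0 to csrow3 mapping shown.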
31983a04 | 1023 | |
e4b53016 MCC |
1024 | 2) The MC has the ability to inject errors to test drivers. The drivers |
1025 | implement this functionality via some error injection nodes: | |
31983a04 MCC |
1026 | |
1027 | For injecting a memory error, there are some sysfs nodes, under | |
b27a2d04 | 1028 | ``/sys/devices/system/edac/mc/mc?/``: |
31983a04 | 1029 | |
b27a2d04 | 1030 | - ``inject_addrmatch/*``: |
31983a04 | 1031 | Controls the error injection mask register. It is possible to specify |
b27a2d04 MCC |
1032 | several characteristics of the address to match an error code:: |
1033 | ||
31983a04 MCC |
1034 | dimm = the affected dimm. Numbers are relative to a channel; |
1035 | rank = the memory rank; | |
1036 | channel = the channel that will generate an error; | |
1037 | bank = the affected bank; | |
1038 | page = the page address; | |
1039 | column (or col) = the address column. | |
b27a2d04 | 1040 | |
31983a04 MCC |
1041 | Each of the above values can be set to "any" to match any valid value. |
1042 | ||
1043 | At driver init, all values are set to any. | |
1044 | ||
1045 | For example, to generate an error at rank 1 of dimm 2, for any channel, | |
b27a2d04 MCC |
1046 | any bank, any page, any column:: |
1047 | ||
35be9544 MCC |
1048 | echo 2 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/dimm |
1049 | echo 1 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/rank | |
31983a04 | 1050 | |
b27a2d04 MCC |
1051 | To return to the default behaviour of matching any, you can do:: |
1052 | ||
35be9544 MCC |
1053 | echo any >/sys/devices/system/edac/mc/mc0/inject_addrmatch/dimm |
1054 | echo any >/sys/devices/system/edac/mc/mc0/inject_addrmatch/rank | |
31983a04 | 1055 | |
b27a2d04 MCC |
1056 | - ``inject_eccmask``: |
1057 | specifies which bits will be affected by the error. |
1058 | |
1059 | - ``inject_section``: |
1060 | specifies which ECC cache section will get the error:: |
31983a04 | 1061 | |
31983a04 MCC |
1062 | 3 for both |
1063 | 2 for the highest | |
1064 | 1 for the lowest | |
1065 | ||
b27a2d04 MCC |
1066 | - ``inject_type``: |
1067 | specifies the type of error, being a combination of the following bits:: | |
1068 | ||
31983a04 MCC |
1069 | bit 0 - repeat |
1070 | bit 1 - ecc | |
1071 | bit 2 - parity | |
1072 | ||
b27a2d04 MCC |
1073 | - ``inject_enable``: |
1074 | starts the error generation when a nonzero value is written. |
31983a04 MCC |
1075 | |
1076 | All inject variables can be read. Root permission is needed to write them. |
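Since ``inject_type`` is a bit mask, the value to write is the bitwise OR of the selected bits. A minimal sketch, using the bit positions listed above; ``inject_type_value`` is a hypothetical helper name:

```shell
# Compute the inject_type value from a list of flag names.
# Bit positions follow the list above: repeat = bit 0, ecc = bit 1,
# parity = bit 2. Runs in a subshell, so "exit" only ends the function.
inject_type_value() (
    val=0
    for flag in "$@"; do
        case $flag in
            repeat) val=$((val | 1)) ;;
            ecc)    val=$((val | 2)) ;;
            parity) val=$((val | 4)) ;;
            *) echo "unknown flag: $flag" >&2; exit 1 ;;
        esac
    done
    echo "$val"
)
```

``inject_type_value ecc`` prints ``2``, and ``inject_type_value repeat ecc`` prints ``3``. The result can then be written to the ``inject_type`` node as root.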
1077 | ||
1078 | The datasheet states that the error will only be generated after a write to an |
1079 | address that matches inject_addrmatch. It seems, however, that reading will | |
1080 | also produce an error. | |
1081 | ||
1082 | For example, the following code will generate an error for any write access | |
b27a2d04 | 1083 | at socket 0, on any DIMM/address on channel 2:: |
31983a04 | 1084 | |
b27a2d04 MCC |
1085 | echo 2 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/channel |
1086 | echo 2 >/sys/devices/system/edac/mc/mc0/inject_type | |
1087 | echo 64 >/sys/devices/system/edac/mc/mc0/inject_eccmask | |
1088 | echo 3 >/sys/devices/system/edac/mc/mc0/inject_section | |
1089 | echo 1 >/sys/devices/system/edac/mc/mc0/inject_enable | |
1090 | dd if=/dev/mem of=/dev/null seek=16k bs=4k count=1 >& /dev/null | |
31983a04 | 1091 | |
c3444363 MCC |
1092 | For socket 1, replace "mc0" with "mc1" in the above |
1093 | commands. |
1094 | ||
b27a2d04 | 1095 | The generated error message will look like:: |
31983a04 | 1096 | |
b27a2d04 | 1097 | EDAC MC0: UE row 0, channel-a= 0 channel-b= 0 labels "-": NON_FATAL (addr = 0x0075b980, socket=0, Dimm=0, Channel=2, syndrome=0x00000040, count=1, Err=8c0000400001009f:4000080482 (read error: read ECC error)) |
31983a04 | 1098 | |
e4b53016 | 1099 | 3) Corrected Error memory register counters |
31983a04 | 1100 | |
e4b53016 MCC |
1101 | Those newer MCs have some registers to count memory errors. The driver |
1102 | uses those registers to report Corrected Errors on devices with Registered | |
1103 | DIMMs. | |
31983a04 | 1104 | |
e4b53016 MCC |
1105 | However, those counters don't work with Unregistered DIMMs. As the chipset |
1106 | offers some counters that also work with UDIMMs (but with a worse level of | |
35be9544 MCC |
1107 | granularity than the default ones), the driver exposes those registers for |
1108 | UDIMM memories. | |
c3444363 | 1109 | |
b27a2d04 | 1110 | They can be read by looking at the contents of ``all_channel_counts/``:: |
31983a04 | 1111 | |
b27a2d04 | 1112 | $ for i in /sys/devices/system/edac/mc/mc0/all_channel_counts/*; do echo $i; cat $i; done |
35be9544 MCC |
1113 | /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0 |
1114 | 0 | |
1115 | /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1 | |
1116 | 0 | |
1117 | /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2 | |
1118 | 0 | |
c3444363 MCC |
1119 | |
1120 | What happens here is that errors on different csrows, but at the same | |
1121 | dimm number will increment the same counter. | |
b27a2d04 MCC |
1122 | So, in this memory mapping:: |
1123 | ||
c3444363 MCC |
1124 | csrow0: channel 0, dimm0 |
1125 | csrow1: channel 0, dimm1 | |
1126 | csrow2: channel 1, dimm0 | |
1127 | csrow3: channel 2, dimm0 | |
b27a2d04 | 1128 | |
35be9544 | 1129 | The hardware will increment udimm0 for an error at the first dimm at either |
b27a2d04 MCC |
1130 | csrow0, csrow2 or csrow3; |
1131 | ||
35be9544 | 1132 | The hardware will increment udimm1 for an error at the second dimm at either |
b27a2d04 MCC |
1133 | csrow0, csrow2 or csrow3; |
1134 | ||
35be9544 | 1135 | The hardware will increment udimm2 for an error at the third dimm at either |
b27a2d04 | 1136 | csrow0, csrow2 or csrow3; |
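Because each ``udimm`` counter aggregates several csrows, a per-controller total is sometimes more useful than the individual values. A minimal sketch, assuming each counter file holds a single decimal number as in the listing above; ``sum_udimm_counts`` is a hypothetical helper, and the optional directory argument is an assumption added for testing:

```shell
# Sum all udimm counters of one memory controller.
# Hypothetical helper: the directory defaults to the mc0 path used above.
# Runs in a subshell so its variables do not leak into the caller.
sum_udimm_counts() (
    dir=${1:-/sys/devices/system/edac/mc/mc0/all_channel_counts}
    total=0
    for f in "$dir"/udimm*; do
        [ -r "$f" ] || continue     # skip missing or unreadable entries
        total=$((total + $(cat "$f")))
    done
    echo "$total"
)
```

For another socket, pass the matching directory, for example the ``all_channel_counts`` directory under ``mc1``.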
c3444363 MCC |
1137 | |
1138 | 4) Standard error counters | |
1139 | ||
1140 | The standard error counters are generated when an mcelog error is received | |
e4b53016 MCC |
1141 | by the driver. Since, with UDIMMs, this is counted by software, it is |
1142 | possible that some errors could be lost. With RDIMMs, they display the |
35be9544 | 1143 | contents of the registers. |
043b4318 | 1144 | |
b27a2d04 MCC |
1145 | Reference documents used on ``amd64_edac`` |
1146 | ------------------------------------------ | |
1147 | ||
1148 | ``amd64_edac`` module is based on the following documents | |
6b7464b7 AG |
1149 | (available from http://support.amd.com/en-us/search/tech-docs): |
1150 | ||
b27a2d04 | 1151 | 1. :Title: BIOS and Kernel Developer's Guide for AMD Athlon 64 and AMD |
6b7464b7 | 1152 | Opteron Processors |
b27a2d04 MCC |
1153 | :AMD publication #: 26094 |
1154 | :Revision: 3.26 | |
1155 | :Link: http://support.amd.com/TechDocs/26094.PDF | |
6b7464b7 | 1156 | |
b27a2d04 | 1157 | 2. :Title: BIOS and Kernel Developer's Guide for AMD NPT Family 0Fh |
6b7464b7 | 1158 | Processors |
b27a2d04 MCC |
1159 | :AMD publication #: 32559 |
1160 | :Revision: 3.00 | |
1161 | :Issue Date: May 2006 | |
1162 | :Link: http://support.amd.com/TechDocs/32559.pdf | |
6b7464b7 | 1163 | |
b27a2d04 | 1164 | 3. :Title: BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h |
6b7464b7 | 1165 | Processors |
b27a2d04 MCC |
1166 | :AMD publication #: 31116 |
1167 | :Revision: 3.00 | |
1168 | :Issue Date: September 07, 2007 | |
1169 | :Link: http://support.amd.com/TechDocs/31116.pdf | |
6b7464b7 | 1170 | |
b27a2d04 | 1171 | 4. :Title: BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h |
6b7464b7 | 1172 | Models 30h-3Fh Processors |
b27a2d04 MCC |
1173 | :AMD publication #: 49125 |
1174 | :Revision: 3.06 | |
1175 | :Issue Date: 2/12/2015 (latest release) | |
1176 | :Link: http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf | |
6b7464b7 | 1177 | |
b27a2d04 | 1178 | 5. :Title: BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h |
6b7464b7 | 1179 | Models 60h-6Fh Processors |
b27a2d04 MCC |
1180 | :AMD publication #: 50742 |
1181 | :Revision: 3.01 | |
1182 | :Issue Date: 7/23/2015 (latest release) | |
1183 | :Link: http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf | |
6b7464b7 | 1184 | |
b27a2d04 | 1185 | 6. :Title: BIOS and Kernel Developer's Guide (BKDG) for AMD Family 16h |
6b7464b7 | 1186 | Models 00h-0Fh Processors |
b27a2d04 MCC |
1187 | :AMD publication #: 48751 |
1188 | :Revision: 3.03 | |
1189 | :Issue Date: 2/23/2015 (latest release) | |
1190 | :Link: http://support.amd.com/TechDocs/48751_16h_bkdg.pdf | |
1191 | ||
1192 | Credits | |
1193 | ======= | |
1194 | ||
1195 | * Written by Doug Thompson <dougthompson@xmission.com> | |
6b7464b7 | 1196 | |
b27a2d04 MCC |
1197 | - 7 Dec 2005 |
1198 | - 17 Jul 2007 Updated | |
043b4318 | 1199 | |
b27a2d04 | 1200 | * |copy| Mauro Carvalho Chehab |
043b4318 | 1201 | |
b27a2d04 | 1202 | - 05 Aug 2009 Nehalem interface |
e4b53016 | 1203 | - 26 Oct 2016 Converted to ReST and cleanups at the Nehalem section |
043b4318 | 1204 | |
b27a2d04 | 1205 | * EDAC authors/maintainers: |
043b4318 | 1206 | |
b27a2d04 MCC |
1207 | - Doug Thompson, Dave Jiang, Dave Peterson et al, |
1208 | - Mauro Carvalho Chehab | |
1209 | - Borislav Petkov | |
1210 | - original author: Thayne Harbaugh |