.. SPDX-License-Identifier: GPL-2.0
.. _imc:

===================================
IMC (In-Memory Collection Counters)
===================================

Anju T Sudhakar, 10 May 2019

.. contents::
    :depth: 3


Basic overview
==============

IMC (In-Memory Collection Counters) is a hardware monitoring facility that
collects large numbers of hardware performance events at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by a Nest IMC microcode which runs in the OCC
(On-Chip Controller) complex. The microcode collects the counter data and moves
the nest IMC counter data to memory.

The Core and Thread IMC PMU counters are handled in the core. Core-level PMU
counters give us the IMC counters' data per core, and thread-level PMU counters
give us the IMC counters' data per CPU thread.

OPAL obtains the IMC PMU and supported events information from the IMC Catalog
and passes it on to the kernel via the device tree. The event information
contains:

- Event name
- Event Offset
- Event description

and possibly also:

- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their supported
events. In those cases, the scale and unit properties for those events must be
inherited from the PMU.
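
The inheritance rule can be sketched as follows. This is only an illustration
and not the kernel's implementation: the dictionary layout, the helper name
`resolve_event`, and all sample values (scale, unit, offsets) are assumptions
made for the example.

.. code-block:: python

  def resolve_event(pmu, event):
      """Return a copy of the event with scale and unit filled in from the PMU."""
      resolved = dict(event)
      for key in ("scale", "unit"):
          # An event-level value wins; otherwise fall back to the PMU-wide value.
          if key not in resolved and key in pmu:
              resolved[key] = pmu[key]
      return resolved

  # Hypothetical PMU description carrying a common scale and unit.
  pmu = {"name": "nest_mcs01", "scale": "1.2207e-4", "unit": "MiB"}

  # Hypothetical events; the second one overrides the unit.
  events = [
      {"name": "PM_MCS01_64B_RD_DISP_PORT01", "offset": 0x118},
      {"name": "PM_MCS01_64B_RD_DISP_PORT23", "offset": 0x120, "unit": "KiB"},
  ]

  resolved = [resolve_event(pmu, event) for event in events]

Here the first event picks up both values from the PMU, while the second keeps
its own unit and inherits only the scale.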

The event offset is the location in memory where the counter data is
accumulated.
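
As a rough sketch of what reading such a counter involves, the snippet below
pulls a value out of a byte buffer at a given event offset and computes the
difference between two reads. It assumes that each counter is a 64-bit
big-endian value; the buffer, the offset `0x10` and the counts are made up for
the example.

.. code-block:: python

  import struct

  def read_counter(buf, offset):
      """Read one 64-bit big-endian counter stored at byte `offset` in `buf`."""
      (value,) = struct.unpack_from(">Q", buf, offset)
      return value

  # Hypothetical counter region with one counter at offset 0x10.
  region = bytearray(64)

  struct.pack_into(">Q", region, 0x10, 1000)   # simulated first snapshot
  before = read_counter(region, 0x10)

  struct.pack_into(">Q", region, 0x10, 1750)   # simulated later snapshot
  after = read_counter(region, 0x10)

  # Counters accumulate, so the interesting number is the difference.
  delta = after - before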

The IMC catalog is available at:
    https://github.com/open-power/ima-catalog

The kernel discovers the IMC counters information in the device tree at the
`imc-counters` device node, which has a compatible field
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
and their events' information and registers each PMU and its attributes in the
kernel.
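
The compatible check can be illustrated with a small sketch. It assumes the raw
bytes of a node's `compatible` property are already at hand, and relies on the
device tree convention that this property is a list of NUL-terminated strings.
The helper names are hypothetical, and this is not how the kernel itself
performs the match.

.. code-block:: python

  IMC_COMPATIBLE = "ibm,opal-in-memory-counters"

  def compatible_strings(raw):
      """Split a raw `compatible` property into its individual strings."""
      return [s.decode("ascii") for s in raw.split(b"\0") if s]

  def is_imc_node(raw_compatible):
      """Return True if the property lists the IMC compatible string."""
      return IMC_COMPATIBLE in compatible_strings(raw_compatible)

  # Hypothetical property contents for the imc-counters node.
  sample = b"ibm,opal-in-memory-counters\0"
  found = is_imc_node(sample)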

IMC example usage
=================

.. code-block:: sh

  # perf list
  [...]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/           [Kernel PMU event]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/           [Kernel PMU event]
  [...]
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                 [Kernel PMU event]
  core_imc/CPM_1THRD_NON_IDLE_INST/                 [Kernel PMU event]
  [...]
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/               [Kernel PMU event]
  thread_imc/CPM_1THRD_NON_IDLE_INST/               [Kernel PMU event]

To see per-chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:

.. code-block:: sh

  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket

To see non-idle instructions for core 0:

.. code-block:: sh

  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle cycles for a "make":

.. code-block:: sh

  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make


IMC Trace-mode
===============

POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In
Accumulation mode, event counts are accumulated in system memory. The
hypervisor then reads the posted counts periodically or when requested. In IMC
Trace mode, the 64-bit trace SCOM value is initialized with the event
information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the
event to be monitored and the sampling duration. On each overflow in the
CPMCxSEL, hardware snapshots the program counter along with event counts and
writes them into the memory pointed to by LDBAR.

LDBAR is a 64-bit special-purpose per-thread register. It has bits to indicate
whether hardware is configured for accumulation or trace mode.

LDBAR Register Layout
---------------------

    +-------+----------------------+
    | 0     | Enable/Disable       |
    +-------+----------------------+
    | 1     | 0: Accumulation Mode |
    |       +----------------------+
    |       | 1: Trace Mode        |
    +-------+----------------------+
    | 2:3   | Reserved             |
    +-------+----------------------+
    | 4:6   | PB scope             |
    +-------+----------------------+
    | 7     | Reserved             |
    +-------+----------------------+
    | 8:50  | Counter Address      |
    +-------+----------------------+
    | 51:63 | Reserved             |
    +-------+----------------------+
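
The layout can be exercised with a short decoding sketch. It assumes the bit
positions in the table use IBM bit numbering, in which bit 0 is the most
significant bit of the 64-bit register. The helper names and the example value
are hypothetical, and the counter address is returned as the raw field value
only.

.. code-block:: python

  def ibm_field(value, start, end, width=64):
      """Extract bits start..end (inclusive), where bit 0 is the MSB."""
      shift = (width - 1) - end
      mask = (1 << (end - start + 1)) - 1
      return (value >> shift) & mask

  def decode_ldbar(ldbar):
      """Split an LDBAR value into the fields listed in the table."""
      return {
          "enabled": bool(ibm_field(ldbar, 0, 0)),
          "mode": "trace" if ibm_field(ldbar, 1, 1) else "accumulation",
          "pb_scope": ibm_field(ldbar, 4, 6),
          "counter_address_field": ibm_field(ldbar, 8, 50),
      }

  # Hypothetical value: enabled, trace mode, address field set to 0x1234.
  example = (1 << 63) | (1 << 62) | (0x1234 << 13)
  decoded = decode_ldbar(example)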

TRACE_IMC_SCOM bit representation
---------------------------------

    +-------+------------+
    | 0:1   | SAMPSEL    |
    +-------+------------+
    | 2:33  | CPMC_LOAD  |
    +-------+------------+
    | 34:40 | CPMC1SEL   |
    +-------+------------+
    | 41:47 | CPMC2SEL   |
    +-------+------------+
    | 48:50 | BUFFERSIZE |
    +-------+------------+
    | 51:63 | RESERVED   |
    +-------+------------+

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with event counts, updates the
memory, and reloads the CPMC_LOAD value for the next sampling duration. IMC
hardware does not support exceptions, so it quietly wraps around when the
memory buffer reaches the end.
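
To make the field positions concrete, here is a sketch that packs and unpacks a
64-bit word following the TRACE_IMC_SCOM table. It assumes IBM bit numbering
(bit 0 is the most significant bit). The function names and the field values
are made up for the example and are not the values the kernel programs.

.. code-block:: python

  # (start, end) bit positions taken from the TRACE_IMC_SCOM table.
  FIELDS = {
      "SAMPSEL": (0, 1),
      "CPMC_LOAD": (2, 33),
      "CPMC1SEL": (34, 40),
      "CPMC2SEL": (41, 47),
      "BUFFERSIZE": (48, 50),
  }

  def pack_trace_scom(**values):
      """Build a 64-bit word from named field values, checking their widths."""
      word = 0
      for name, value in values.items():
          start, end = FIELDS[name]
          width = end - start + 1
          if not 0 <= value < (1 << width):
              raise ValueError(f"{name} does not fit in {width} bits")
          word |= value << (63 - end)
      return word

  def unpack_trace_scom(word):
      """Recover every named field from a 64-bit word."""
      return {
          name: (word >> (63 - end)) & ((1 << (end - start + 1)) - 1)
          for name, (start, end) in FIELDS.items()
      }

  # Hypothetical field values, chosen only to exercise each field.
  scom = pack_trace_scom(SAMPSEL=1, CPMC_LOAD=0xFFFF0000, CPMC1SEL=2,
                         CPMC2SEL=0, BUFFERSIZE=1)
  fields = unpack_trace_scom(scom)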

*Currently the event monitored for trace-mode is fixed as cycle.*

Trace IMC example usage
=======================

.. code-block:: sh

  # perf list
  [....]
  trace_imc/trace_cycles/                            [Kernel PMU event]

To record an application/process with the trace-imc event:

.. code-block:: sh

  # perf record -e trace_imc/trace_cycles/ yes > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]

The generated `perf.data` can be read using `perf report`.

Benefits of using IMC trace-mode
================================

PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace
mode snapshots the program counter and writes it to memory. This also provides
a way for the operating system to do instruction sampling in real time without
PMI processing overhead.

Performance data using `perf top` with and without the trace-imc event:

The PMI counts before and after `perf top` is run without the trace-imc event,
and then again after it is run with the trace-imc event:

.. code-block:: sh

  # grep PMI /proc/interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  # ./perf top
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts
  # ./perf top -e trace_imc/trace_cycles/
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts


That is, the PMI counts do not increment when using the `trace_imc` event.
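
The comparison can also be automated. The sketch below parses the `PMI:` line
from text in the format printed by `grep PMI /proc/interrupts` and computes the
per-CPU difference between two snapshots. The helper name `pmi_counts` is
hypothetical, and the two snapshots reuse the counts shown in the session
above.

.. code-block:: python

  def pmi_counts(interrupts_text):
      """Return the per-CPU PMI counts from /proc/interrupts-style text."""
      for line in interrupts_text.splitlines():
          fields = line.split()
          if fields and fields[0] == "PMI:":
              counts = []
              for field in fields[1:]:
                  if not field.isdigit():
                      break  # the description text starts here
                  counts.append(int(field))
              return counts
      raise ValueError("no PMI line found")

  # Snapshots taken before and after running perf top with trace_imc.
  before = "PMI: 39735 8710 17338 17801 Performance monitoring interrupts"
  after = "PMI: 39735 8710 17338 17801 Performance monitoring interrupts"

  # A list of zeros means no PMIs were raised in between.
  delta = [b - a for a, b in zip(pmi_counts(before), pmi_counts(after))]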