Commit | Line | Data |
---|---|---|
1d8c8b20 | 1 | perf-stat(1) |
6e6b754f | 2 | ============ |
1d8c8b20 IM |
3 | |
4 | NAME | |
5 | ---- | |
6 | perf-stat - Run a command and gather performance counter statistics | |
7 | ||
8 | SYNOPSIS | |
9 | -------- | |
10 | [verse] | |
8c207692 SB |
11 | 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command> |
12 | 'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>] | |
4979d0c7 | 13 | 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>] |
ba6039b6 | 14 | 'perf stat' report [-i file] |
1d8c8b20 IM |
15 | |
16 | DESCRIPTION | |
17 | ----------- | |
18 | This command runs a command and gathers performance counter statistics | |
19 | from it. | |
20 | ||
21 | ||
22 | OPTIONS | |
23 | ------- | |
24 | <command>...:: | |
25 | Any command you can specify in a shell. | |
26 | ||
4979d0c7 JO |
27 | record:: |
28 | See STAT RECORD. | |
20c84e95 | 29 | |
ba6039b6 JO |
30 | report:: |
31 | See STAT REPORT. | |
32 | ||
1d8c8b20 IM |
33 | -e:: |
34 | --event=:: | |
f9ab9c19 CS |
35 | Select the PMU event. Selection can be: |
36 | ||
37 | - a symbolic event name (use 'perf list' to list all events) | |
38 | ||
39 | - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a | |
40 | hexadecimal event descriptor. | |
41 | ||
42 | - a symbolically formed event like 'pmu/param1=0x3,param2/' where | |
43 | param1 and param2 are defined as formats for the PMU in | |
44 | /sys/bus/event_source/devices/<pmu>/format/* | |
45 | ||
46 | - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/' | |
47 | where M, N, K are numbers (in decimal, hex, octal format). | |
48 | Acceptable values for each of 'config', 'config1' and 'config2' | |
49 | parameters are defined by corresponding entries in | |
50 | /sys/bus/event_source/devices/<pmu>/format/* | |
1d8c8b20 | 51 | |
20c84e95 | 52 | -i:: |
2e6cdf99 SE |
53 | --no-inherit:: |
54 | child tasks do not inherit counters | |
20c84e95 IM |
55 | -p:: |
56 | --pid=<pid>:: | |
b52956c9 | 57 | stat events on existing process id (comma separated list) |
8c207692 SB |
58 | |
59 | -t:: | |
60 | --tid=<tid>:: | |
b52956c9 | 61 | stat events on existing thread id (comma separated list) |
8c207692 | 62 | |
20c84e95 | 63 | |
1d8c8b20 | 64 | -a:: |
8c207692 | 65 | --all-cpus:: |
0d79f8b9 | 66 | system-wide collection from all CPUs (default if no target is specified) |
1d8c8b20 | 67 | |
b26bc5a7 | 68 | -c:: |
8c207692 SB |
69 | --scale:: |
70 | scale/normalize counter values | |
71 | ||
f594bae0 BP |
72 | -d:: |
73 | --detailed:: | |
74 | print more detailed statistics, can be specified up to 3 times | |
75 | ||
76 | -d: detailed events, L1 and LLC data cache | |
77 | -d -d: more detailed events, dTLB and iTLB events | |
78 | -d -d -d: very detailed events, adding prefetch events | |
79 | ||
8c207692 SB |
80 | -r:: |
81 | --repeat=<n>:: | |
a7e191c3 | 82 | repeat command and print average + stddev (max: 100). 0 means forever. |
1d8c8b20 | 83 | |
5af52b51 | 84 | -B:: |
8c207692 | 85 | --big-num:: |
5af52b51 SE |
86 | print large numbers with thousands' separators according to locale |
87 | ||
c45c6ea2 SE |
88 | -C:: |
89 | --cpu=:: | |
8c207692 SB |
90 | Count only on the list of CPUs provided. Multiple CPUs can be provided as a |
91 | comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. | |
c45c6ea2 SE |
92 | In per-thread mode, this option is ignored. The -a option is still necessary |
93 | to activate system-wide monitoring. Default is to count on all CPUs. | |
94 | ||
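The list syntax accepted by -C can be illustrated with a small sketch. This is not perf's own parser; the function name `expand_cpu_list` is hypothetical, and mixing single CPUs and ranges in one list (as in "0-2,5") is an assumption that the text above does not state explicitly.

```python
def expand_cpu_list(spec):
    """Expand a -C style CPU list such as "0,1" or "0-2" into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range "first-last" is inclusive on both ends.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(expand_cpu_list("0,1"))
print(expand_cpu_list("0-2"))
```

A list containing a space, such as "0, 1", would reach perf as two separate shell arguments, which is why the list must be written with no space.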
f5b4a9c3 SE |
95 | -A:: |
96 | --no-aggr:: | |
efc9c056 | 97 | Do not aggregate counts across all monitored CPUs. |
f5b4a9c3 | 98 | |
8c207692 SB |
99 | -n:: |
100 | --null:: | |
101 | null run - don't start any counters | |
102 | ||
103 | -v:: | |
104 | --verbose:: | |
105 | be more verbose (show counter open errors, etc) | |
106 | ||
d7470b6a SE |
107 | -x SEP:: |
108 | --field-separator SEP:: | |
109 | print counts using a CSV-style output to make it easy to import directly into | |
110 | spreadsheets. Columns are separated by the string specified in SEP. | |
111 | ||
023695d9 SE |
112 | -G name:: |
113 | --cgroup name:: | |
114 | monitor only in the container (cgroup) called "name". This option is available only | |
115 | in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to | |
116 | container "name" are monitored when they run on the monitored CPUs. Multiple cgroups | |
117 | can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup | |
118 | to first event, second cgroup to second event and so on. It is possible to provide | |
119 | an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have | |
120 | corresponding events, i.e., they always refer to events defined earlier on the command | |
121 | line. | |
122 | ||
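The positional pairing of cgroups and events described for -G can be sketched as follows. This only models the rule as stated above and is not how perf implements it; `pair_cgroups` is a hypothetical helper, the event names are illustrative, and treating events without a listed cgroup as unrestricted is an assumption.

```python
def pair_cgroups(events, cgroup_arg):
    """Pair each event with the cgroup at the same position in a -G argument.

    An empty entry (as in "foo,,bar") means the event is monitored all the time.
    """
    names = cgroup_arg.split(",")
    if len(names) > len(events):
        # Cgroups always refer to events defined earlier on the command line.
        raise ValueError("more cgroups than events")
    # Assumption: events with no cgroup listed are left unrestricted.
    names += [""] * (len(events) - len(names))
    return [(event, name or None) for event, name in zip(events, names)]


for event, cgroup in pair_cgroups(["cycles", "instructions", "cache-misses"], "foo,,bar"):
    print(event, "->", cgroup)
```

Here the first event is counted only in cgroup "foo", the second is counted all the time, and the third only in "bar".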
4aa9015f | 123 | -o file:: |
56f3bae7 | 124 | --output file:: |
4aa9015f SE |
125 | Print the output into the designated file. |
126 | ||
127 | --append:: | |
128 | Append to the output file designated with the -o option. Ignored if -o is not specified. | |
129 | ||
56f3bae7 JC |
130 | --log-fd:: |
131 | ||
132 | Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive | |
133 | with it. --append may be used here. Examples: | |
134 | 3>results perf stat --log-fd 3 -- $cmd | |
135 | 3>>results perf stat --log-fd 3 --append -- $cmd | |
136 | ||
1f16c575 PZ |
137 | --pre:: |
138 | --post:: | |
139 | Pre and post measurement hooks, e.g.: | |
140 | ||
141 | perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage | |
56f3bae7 | 142 | |
13370a9b SE |
143 | -I msecs:: |
144 | --interval-print msecs:: | |
19afd104 KL |
145 | Print count deltas every N milliseconds (minimum: 10ms). |
146 | The overhead percentage could be high in some cases, for instance with small, sub-100ms intervals. Use with caution. | |
147 | example: 'perf stat -I 1000 -e cycles -a sleep 5' | |
56f3bae7 | 148 | |
54b50916 AK |
149 | --metric-only:: |
150 | Only print computed metrics. Print them in a single line. | |
206cab65 | 151 | Don't show any raw values. Not supported with --per-thread. |
54b50916 | 152 | |
d4304958 | 153 | --per-socket:: |
d7e7a451 SE |
154 | Aggregate counts per processor socket for system-wide mode measurements. This |
155 | is a useful mode to detect imbalance between sockets. To enable this mode, | |
d4304958 | 156 | use --per-socket in addition to -a (system-wide). The output includes the |
d7e7a451 SE |
157 | socket number and the number of online processors on that socket. This is |
158 | useful to gauge the amount of aggregation. | |
159 | ||
12c08a9f SE |
160 | --per-core:: |
161 | Aggregate counts per physical processor for system-wide mode measurements. This | |
162 | is a useful mode to detect imbalance between physical cores. To enable this mode, | |
163 | use --per-core in addition to -a (system-wide). The output includes the | |
164 | core number and the number of online logical processors on that physical processor. | |
165 | ||
32b8af82 JO |
166 | --per-thread:: |
167 | Aggregate counts per monitored thread, when monitoring threads (-t option) | |
168 | or processes (-p option). | |
169 | ||
41191688 | 170 | -D msecs:: |
8f3dd2b0 | 171 | --delay msecs:: |
41191688 AK |
172 | After starting the program, wait msecs before measuring. This is useful to |
173 | filter out the startup phase of the program, which is often very different. | |
174 | ||
4cabc3d1 AK |
175 | -T:: |
176 | --transaction:: | |
177 | ||
178 | Print statistics of transactional execution if supported. | |
179 | ||
4979d0c7 JO |
180 | STAT RECORD |
181 | ----------- | |
182 | Stores stat data into perf data file. | |
183 | ||
184 | -o file:: | |
185 | --output file:: | |
186 | Output file name. | |
187 | ||
ba6039b6 JO |
188 | STAT REPORT |
189 | ----------- | |
190 | Reads and reports stat data from perf data file. | |
191 | ||
192 | -i file:: | |
193 | --input file:: | |
194 | Input file name. | |
195 | ||
89af4e05 JO |
196 | --per-socket:: |
197 | Aggregate counts per processor socket for system-wide mode measurements. | |
198 | ||
199 | --per-core:: | |
200 | Aggregate counts per physical processor for system-wide mode measurements. | |
201 | ||
202 | -A:: | |
203 | --no-aggr:: | |
204 | Do not aggregate counts across all monitored CPUs. | |
205 | ||
44b1e60a AK |
206 | --topdown:: |
207 | Print top down level 1 metrics if supported by the CPU. This makes it possible to | |
208 | determine bottlenecks in the CPU pipeline for CPU-bound workloads, | |
209 | by breaking the cycles consumed down into frontend bound, backend bound, | |
210 | bad speculation and retiring. | |
211 | ||
212 | Frontend bound means that the CPU cannot fetch and decode instructions fast | |
213 | enough. Backend bound means that computation or memory access is the | |
214 | bottleneck. Bad speculation means that the CPU wasted cycles due to branch | |
215 | mispredictions and similar issues. Retiring means that the CPU computed without | |
216 | an apparent bottleneck. The reported bottleneck is the real bottleneck only | |
217 | if the workload is actually bound by the CPU and not by something else. | |
218 | ||
219 | For best results it is usually a good idea to use it with interval | |
220 | mode like -I 1000, as the bottleneck of workloads can change often. | |
221 | ||
222 | The top down metrics are collected per core instead of per | |
223 | CPU thread. Per core mode is automatically enabled | |
224 | and -a (global monitoring) is needed, requiring root rights or | |
225 | kernel.perf_event_paranoid=-1. | |
226 | ||
227 | Topdown uses the full Performance Monitoring Unit, and needs | |
228 | disabling of the NMI watchdog (as root): | |
229 | echo 0 > /proc/sys/kernel/nmi_watchdog | |
230 | for best results. Otherwise the bottlenecks may be inconsistent | |
231 | on workloads with changing phases. | |
232 | ||
233 | This enables --metric-only, unless overridden with --no-metric-only. | |
234 | ||
235 | To interpret the results, it is usually necessary to know which | |
236 | CPUs the workload runs on. If needed, the CPUs can be forced using | |
237 | taskset. | |
4979d0c7 | 238 | |
430daf2d AK |
239 | --no-merge:: |
240 | Do not merge results from same PMUs. | |
241 | ||
1d8c8b20 IM |
242 | EXAMPLES |
243 | -------- | |
244 | ||
20c84e95 | 245 | $ perf stat -- make -j |
1d8c8b20 | 246 | |
20c84e95 | 247 | Performance counter stats for 'make -j': |
1d8c8b20 | 248 | |
20c84e95 IM |
249 | 8117.370256 task clock ticks # 11.281 CPU utilization factor |
250 | 678 context switches # 0.000 M/sec | |
251 | 133 CPU migrations # 0.000 M/sec | |
252 | 235724 pagefaults # 0.029 M/sec | |
253 | 24821162526 CPU cycles # 3057.784 M/sec | |
254 | 18687303457 instructions # 2302.138 M/sec | |
255 | 172158895 cache references # 21.209 M/sec | |
256 | 27075259 cache misses # 3.335 M/sec | |
1d8c8b20 | 257 | |
20c84e95 | 258 | Wall-clock time elapsed: 719.554352 msecs |
1d8c8b20 | 259 | |
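The derived columns in the sample output above can be reproduced from the raw values. The sketch below assumes that the "task clock ticks" value is in milliseconds and that each M/sec figure is the counter value divided by the task clock time; both are inferred from the numbers in the sample, not stated by it. The function names are hypothetical.

```python
task_clock_msecs = 8117.370256
wall_clock_msecs = 719.554352
cpu_cycles = 24821162526


def rate_m_per_sec(count, task_clock_msecs):
    """Millions of events per second of task clock time."""
    # count / msecs is events per millisecond; dividing by 1000 gives events
    # per microsecond, which equals millions of events per second.
    return count / (task_clock_msecs * 1000.0)


def cpu_utilization(task_clock_msecs, wall_clock_msecs):
    """Task clock time divided by elapsed wall-clock time."""
    return task_clock_msecs / wall_clock_msecs


print(f"{rate_m_per_sec(cpu_cycles, task_clock_msecs):.3f} M/sec")
print(f"{cpu_utilization(task_clock_msecs, wall_clock_msecs):.3f} CPU utilization factor")
```

With these inputs the two lines print 3057.784 and 11.281, matching the "CPU cycles" and "task clock ticks" rows of the sample. A utilization factor above 1 means that, on average, more than one CPU was busy during the run.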
6b45f7b2 AK |
260 | CSV FORMAT |
261 | ---------- | |
262 | ||
263 | With -x, perf stat can produce not-quite-CSV output: | |
264 | commas in the output are not put into "". To make the output easy to parse, | |
265 | it is recommended to use a different separator character, like -x \; | |
266 | ||
267 | The fields are in this order: | |
268 | ||
269 | - optional usec time stamp in fractions of second (with -I xxx) | |
270 | - optional CPU, core, or socket identifier | |
271 | - optional number of logical CPUs aggregated | |
272 | - counter value | |
273 | - unit of the counter value or empty | |
274 | - event name | |
275 | - run time of counter | |
276 | - percentage of measurement time the counter was running | |
277 | - optional variance if multiple values are collected with -r | |
278 | - optional metric value | |
279 | - optional unit of metric | |
280 | ||
281 | Additional metrics may be printed with all earlier fields being empty. | |
282 | ||
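A minimal sketch of reading one line of this output follows. It assumes the field order listed above, a run with -x \; and without -I or per-CPU, per-core or per-socket aggregation (so none of the optional leading fields are present). The function name `parse_stat_line` and the sample line are made up for illustration and are not captured from a real run.

```python
def parse_stat_line(line, sep=";"):
    """Split one 'perf stat -x' output line into named fields.

    Assumes no optional leading fields (time stamp, CPU identifier,
    number of aggregated CPUs) are present.
    """
    parts = line.rstrip("\n").split(sep)
    if len(parts) < 5:
        raise ValueError("expected at least 5 fields, got %d" % len(parts))
    value_text, unit, event, run_time, percent_running = parts[:5]
    try:
        value = float(value_text)
    except ValueError:
        value = None  # the counter value was not numeric
    return {
        "value": value,
        "unit": unit or None,                 # empty field means no unit
        "event": event,
        "run_time": run_time,                 # kept as text
        "percent_running": percent_running,   # kept as text
        "extra": parts[5:],                   # optional variance / metric fields
    }


# Hypothetical sample line, not real perf output.
print(parse_stat_line("678;;context-switches;8117370256;100.00"))
```

The trailing optional fields are returned unparsed in "extra" because their positions depend on which of them are present.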
1d8c8b20 IM |
283 | SEE ALSO |
284 | -------- | |
386b05e3 | 285 | linkperf:perf-top[1], linkperf:perf-list[1] |