perf-stat(1)
============

NAME
----
perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
--------
[verse]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
'perf stat' report [-i file]

DESCRIPTION
-----------
This command runs a command and gathers performance counter statistics
from it.


OPTIONS
-------
<command>...::
	Any command you can specify in a shell.

record::
	See STAT RECORD.

report::
	See STAT REPORT.

-e::
--event=::
	Select the PMU event. Selection can be:

	- a symbolic event name (use 'perf list' to list all events)

	- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
	  hexadecimal event descriptor.

	- a symbolically formed event like 'pmu/param1=0x3,param2/' where
	  param1 and param2 are defined as formats for the PMU in
	  /sys/bus/event_source/devices/<pmu>/format/*

	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
	  where M, N, K are numbers (in decimal, hex, or octal format).
	  Acceptable values for each of 'config', 'config1' and 'config2'
	  parameters are defined by corresponding entries in
	  /sys/bus/event_source/devices/<pmu>/format/*
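
As a rough illustration of the rNNN form, the sketch below packs an event select and a unit mask into one hexadecimal descriptor. It assumes the common x86 layout (event select in bits 0-7, unit mask in bits 8-15); other architectures encode raw events differently. The helper name `raw_event` and the sample values are hypothetical, not part of perf.

```python
def raw_event(eventsel: int, umask: int) -> str:
    """Build an rNNN string.

    Assumption: x86-style layout, with the event select in bits 0-7
    and the unit mask in bits 8-15 of the raw descriptor.
    """
    if not 0 <= eventsel <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("eventsel and umask must each fit in 8 bits")
    return "r{:x}".format((umask << 8) | eventsel)


# Hypothetical values, chosen only to show the bit packing.
print(raw_event(0x2E, 0x41))  # prints r412e
```

The resulting string could then be passed as `-e r412e`; whether that descriptor names a real event depends on the CPU.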

-i::
--no-inherit::
	child tasks do not inherit counters

-p::
--pid=<pid>::
	stat events on existing process id (comma separated list)

-t::
--tid=<tid>::
        stat events on existing thread id (comma separated list)


-a::
--all-cpus::
        system-wide collection from all CPUs (default if no target is specified)

-c::
--scale::
	scale/normalize counter values

-d::
--detailed::
	print more detailed statistics, can be specified up to 3 times

	         -d:     detailed events, L1 and LLC data cache
	      -d -d:     more detailed events, dTLB and iTLB events
	   -d -d -d:     very detailed events, adding prefetch events

-r::
--repeat=<n>::
	repeat command and print average + stddev (max: 100). 0 means forever.

-B::
--big-num::
        print large numbers with thousands' separators according to locale

-C::
--cpu=::
Count only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
In per-thread mode, this option is ignored. The -a option is still necessary
to activate system-wide monitoring. Default is to count on all CPUs.
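
The CPU list syntax described above can be expanded as in the following sketch. It is not perf's own parser; the function name `parse_cpu_list` is hypothetical, and accepting a mix of single CPUs and ranges in one list (such as 0-2,5) is an assumption.

```python
from typing import List


def parse_cpu_list(spec: str) -> List[int]:
    """Expand a list such as '0,1' or '0-2' into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range such as 0-2 includes both end points.
            first, last = part.split("-", 1)
            lo, hi = int(first), int(last)
            if lo > hi:
                raise ValueError("bad CPU range: " + part)
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))  # [0, 1]
print(parse_cpu_list("0-2"))  # [0, 1, 2]
```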

-A::
--no-aggr::
Do not aggregate counts across all monitored CPUs.

-n::
--null::
        null run - don't start any counters

-v::
--verbose::
        be more verbose (show counter open errors, etc)

-x SEP::
--field-separator SEP::
print counts using a CSV-style output to make it easy to import directly into
spreadsheets. Columns are separated by the string specified in SEP.

-G name::
--cgroup name::
monitor only in the container (cgroup) called "name". This option is available only
in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
to first event, second cgroup to second event and so on. It is possible to provide
an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
corresponding events, i.e., they always refer to events defined earlier on the command
line.
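
The positional pairing of cgroups with events can be pictured with the sketch below. It only models the rule stated above and does not call perf. The helper name `pair_cgroups` is hypothetical, and treating events beyond the end of the cgroup list as unrestricted is an assumption.

```python
from typing import List, Optional, Tuple


def pair_cgroups(events: List[str], cgroup_arg: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each event with the cgroup at the same position in the -G list.

    An empty entry (as in 'foo,,bar') means the event is not restricted
    to a cgroup; that case is represented here as None.
    """
    cgroups = cgroup_arg.split(",")
    if len(cgroups) > len(events):
        raise ValueError("each cgroup needs an event defined before it")
    pairs = []
    for i, event in enumerate(events):
        # Assumption: events past the end of the cgroup list are unrestricted.
        name = cgroups[i] if i < len(cgroups) else ""
        pairs.append((event, name or None))
    return pairs


print(pair_cgroups(["cycles", "instructions", "cache-misses"], "foo,,bar"))
```

With the three events shown, the first is counted in cgroup foo, the second is counted all the time, and the third is counted in cgroup bar.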

-o file::
--output file::
Print the output into the designated file.

--append::
Append to the output file designated with the -o option. Ignored if -o is not specified.

--log-fd::

Log output to fd, instead of stderr.  Complementary to --output, and mutually exclusive
with it.  --append may be used here.  Examples:
     3>results  perf stat --log-fd 3          -- $cmd
     3>>results perf stat --log-fd 3 --append -- $cmd

--pre::
--post::
	Pre and post measurement hooks, e.g.:

perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage

-I msecs::
--interval-print msecs::
Print count deltas every msecs milliseconds (minimum: 10ms).
The overhead percentage could be high in some cases, for instance with small, sub-100ms intervals.  Use with caution.
	Example: 'perf stat -I 1000 -e cycles -a sleep 5'

--metric-only::
Only print computed metrics. Print them in a single line.
Don't show any raw values. Not supported with --per-thread.

--per-socket::
Aggregate counts per processor socket for system-wide mode measurements.  This
is a useful mode to detect imbalance between sockets.  To enable this mode,
use --per-socket in addition to -a (system-wide).  The output includes the
socket number and the number of online processors on that socket. This is
useful to gauge the amount of aggregation.

--per-core::
Aggregate counts per physical processor for system-wide mode measurements.  This
is a useful mode to detect imbalance between physical cores.  To enable this mode,
use --per-core in addition to -a (system-wide).  The output includes the
core number and the number of online logical processors on that physical processor.

--per-thread::
Aggregate counts per monitored thread, when monitoring threads (-t option)
or processes (-p option).

-D msecs::
--delay msecs::
After starting the program, wait msecs before measuring. This is useful to
filter out the startup phase of the program, which is often very different
from the rest of the run.

-T::
--transaction::

Print statistics of transactional execution if supported.

STAT RECORD
-----------
Stores stat data into perf data file.

-o file::
--output file::
Output file name.

STAT REPORT
-----------
Reads and reports stat data from perf data file.

-i file::
--input file::
Input file name.

--per-socket::
Aggregate counts per processor socket for system-wide mode measurements.

--per-core::
Aggregate counts per physical processor for system-wide mode measurements.

-A::
--no-aggr::
Do not aggregate counts across all monitored CPUs.

--topdown::
Print top down level 1 metrics if supported by the CPU. This makes it
possible to determine bottlenecks in the CPU pipeline for CPU-bound workloads,
by breaking the cycles consumed down into frontend bound, backend bound,
bad speculation and retiring.

Frontend bound means that the CPU cannot fetch and decode instructions fast
enough. Backend bound means that computation or memory access is the
bottleneck. Bad speculation means that the CPU wasted cycles due to branch
mispredictions and similar issues. Retiring means that the CPU computed without
an apparent bottleneck. The reported bottleneck is only the real bottleneck
if the workload is actually bound by the CPU and not by something else.

For best results it is usually a good idea to use it with interval
mode like -I 1000, as the bottleneck of workloads can change often.

The top down metrics are collected per core instead of per
CPU thread. Per core mode is automatically enabled
and -a (global monitoring) is needed, requiring root rights or
kernel.perf_event_paranoid=-1.

Topdown uses the full Performance Monitoring Unit. For best results,
disable the NMI watchdog (as root):
echo 0 > /proc/sys/kernel/nmi_watchdog
Otherwise the bottlenecks may be inconsistent
on workloads with changing phases.

This enables --metric-only, unless overridden with --no-metric-only.

To interpret the results, it is usually necessary to know which
CPUs the workload runs on. If needed, the CPUs can be forced using
taskset.

--no-merge::
Do not merge results from the same PMUs.

EXAMPLES
--------

$ perf stat -- make -j

 Performance counter stats for 'make -j':

    8117.370256  task clock ticks     #      11.281 CPU utilization factor
            678  context switches     #       0.000 M/sec
            133  CPU migrations       #       0.000 M/sec
         235724  pagefaults           #       0.029 M/sec
    24821162526  CPU cycles           #    3057.784 M/sec
    18687303457  instructions         #    2302.138 M/sec
      172158895  cache references     #      21.209 M/sec
       27075259  cache misses         #       3.335 M/sec

 Wall-clock time elapsed:   719.554352 msecs
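
The numbers in the sample output above can be combined into a few derived ratios, as sketched below. The values are copied from the sample. Reading the right-hand M/sec column as the count divided by the task clock time is an assumption that appears to match the figures shown.

```python
# Values copied from the sample output above.
task_clock_msec = 8117.370256
cycles = 24821162526
instructions = 18687303457
cache_refs = 172158895
cache_misses = 27075259

# Instructions retired per CPU cycle.
ipc = instructions / cycles

# Fraction of cache references that missed.
miss_ratio = cache_misses / cache_refs

# Assumption: M/sec is the count per microsecond of task clock time.
cycles_m_per_sec = cycles / (task_clock_msec * 1000)

print(f"instructions per cycle: {ipc:.3f}")
print(f"cache miss ratio: {miss_ratio:.1%}")
print(f"cycles: {cycles_m_per_sec:.3f} M/sec")
```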

CSV FORMAT
----------

With -x, perf stat is able to output a not-quite-CSV format.
Commas in the output are not put into "". To make it easy to parse,
it is recommended to use a different character like -x \;

The fields are in this order:

	- optional usec time stamp in fractions of a second (with -I xxx)
	- optional CPU, core, or socket identifier
	- optional number of logical CPUs aggregated
	- counter value
	- unit of the counter value or empty
	- event name
	- run time of counter
	- percentage of measurement time the counter was running
	- optional variance if multiple values are collected with -r
	- optional metric value
	- optional unit of metric

Additional metrics may be printed with all earlier fields being empty.
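
A minimal sketch of reading one such line follows. It assumes output produced with -x \; and without -I, per-CPU, per-core, or per-socket aggregation, or -r, so none of the optional leading fields or the variance field are present. The function name `parse_stat_line` and the sample line are hypothetical and not captured from a real run.

```python
from typing import Dict, Optional

# Field order from the list above, with the optional leading fields
# and the optional variance field left out (see the assumptions above).
FIELDS = ["value", "unit", "event", "run_time", "pct_running",
          "metric_value", "metric_unit"]


def parse_stat_line(line: str, sep: str = ";") -> Dict[str, Optional[str]]:
    """Split one output line into named fields; empty fields become None."""
    parts = line.rstrip("\n").split(sep)
    # Pad short lines so that missing trailing fields map to None.
    parts += [""] * (len(FIELDS) - len(parts))
    return {name: (text or None) for name, text in zip(FIELDS, parts)}


# Hypothetical line, for illustration only.
row = parse_stat_line("18687303457;;instructions;8117370256;100.00;0.75;insn per cycle")
print(row["event"], row["value"], row["metric_value"], row["metric_unit"])
```

Values are kept as strings here, so the caller can decide how to treat a counter value that is not a plain number.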

SEE ALSO
--------
linkperf:perf-top[1], linkperf:perf-list[1]