perf-top(1)
===========

NAME
----
perf-top - System profiling tool.

SYNOPSIS
--------
[verse]
'perf top' [-e <EVENT> | --event=EVENT] [<options>]

DESCRIPTION
-----------
This command generates and displays a performance counter profile in real time.


OPTIONS
-------
-a::
--all-cpus::
        System-wide collection.  (default)

-c <count>::
--count=<count>::
	Event period to sample.

-C <cpu-list>::
--cpu=<cpu-list>::
	Monitor only the CPUs in the given list. Multiple CPUs can be given as a
	comma-separated list with no spaces: 0,1. Ranges of CPUs are specified with -: 0-2.
	The default is to monitor all CPUs.
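
As an illustration of the CPU list syntax, the following Python sketch expands a list into individual CPU numbers. The helper name `parse_cpu_list` is hypothetical and this is not how perf itself parses the option; it only mirrors the documented comma and range forms.

```python
def parse_cpu_list(spec: str) -> list[int]:
    """Expand a CPU list such as "0,1" or "0-2" into sorted CPU numbers."""
    cpus: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            # A range such as "0-2" includes both endpoints.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))
print(parse_cpu_list("0-2"))
```

Both forms can be combined in one list, for example "0-2,5".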

-d <seconds>::
--delay=<seconds>::
	Number of seconds to delay between refreshes.

-e <event>::
--event=<event>::
	Select the PMU event. Selection can be a symbolic event name
	(use 'perf list' to list all events) or a raw PMU
	event (eventsel+umask) in the form of rNNN where NNN is a
	hexadecimal event descriptor.
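
A raw event descriptor can be split into its fields with a few bit operations. The Python sketch below assumes the common x86 layout, where the event select occupies bits 0-7 and the unit mask bits 8-15; other architectures encode raw events differently, and the helper name `decode_raw_event` is hypothetical.

```python
def decode_raw_event(name: str) -> tuple[int, int]:
    """Split an rNNN raw event into (event_select, umask).

    Assumption: x86-style layout, event select in bits 0-7 and
    unit mask in bits 8-15 of the hexadecimal descriptor.
    """
    if not name.startswith("r"):
        raise ValueError("raw events have the form rNNN")
    config = int(name[1:], 16)
    event_select = config & 0xFF
    umask = (config >> 8) & 0xFF
    return event_select, umask


event_select, umask = decode_raw_event("r01c2")
print(f"eventsel={event_select:#04x} umask={umask:#04x}")
```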

-E <entries>::
--entries=<entries>::
	Display this many functions.

-f <count>::
--count-filter=<count>::
	Only display functions with more events than this.

--group::
        Put the counters into a counter group.

--group-sort-idx::
	Sort the output by the event at index n in the group. If n is invalid,
	sort by the first event. Multiple groups with different numbers of
	events are supported. WARNING: This should be used on grouped events.

-F <freq>::
--freq=<freq>::
	Profile at this frequency. Use 'max' to use the current maximum
	allowed frequency, i.e. the value in the kernel.perf_event_max_sample_rate
	sysctl.

-i::
--no-inherit::
	Child tasks do not inherit counters.

-k <path>::
--vmlinux=<path>::
	Path to vmlinux.  Required for annotation functionality.

--ignore-vmlinux::
	Ignore vmlinux files.

--kallsyms=<file>::
	Path to the kallsyms file.

-m <pages>::
--mmap-pages=<pages>::
	Number of mmap data pages (must be a power of two) or size
	specification with appended unit character - B/K/M/G. The
	size is rounded up to the nearest power-of-two number of pages.
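
The rounding rule can be sketched in Python as follows. This is an illustration of the documented behavior, not perf's own code: the function name `mmap_data_pages`, the 4096-byte page size, and the 1024-based unit multipliers are assumptions.

```python
# Assumed byte multipliers for the B/K/M/G unit characters.
UNIT_BYTES = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def mmap_data_pages(spec: str, page_size: int = 4096) -> int:
    """Turn a --mmap-pages argument into a power-of-two page count."""
    unit = spec[-1]
    if unit in UNIT_BYTES:
        size = int(spec[:-1]) * UNIT_BYTES[unit]
        pages = -(-size // page_size)  # ceiling division
        if pages < 1:
            raise ValueError("size must be positive")
        # Round the page count up to the next power of two.
        return 1 << (pages - 1).bit_length()
    pages = int(spec)
    if pages < 1 or pages & (pages - 1):
        raise ValueError("page count must be a power of two")
    return pages


print(mmap_data_pages("128"))
print(mmap_data_pages("300K"))
```

With the assumed 4096-byte pages, 300K covers 75 pages, which is rounded up to 128.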

-p <pid>::
--pid=<pid>::
	Profile events on existing process ID (comma-separated list).

-t <tid>::
--tid=<tid>::
        Profile events on existing thread ID (comma separated list).

-u::
--uid=::
        Record events in threads owned by uid. Name or number.

-r <priority>::
--realtime=<priority>::
	Collect data with this RT SCHED_FIFO priority.

--sym-annotate=<symbol>::
        Annotate this symbol.

-K::
--hide_kernel_symbols::
        Hide kernel symbols.

-U::
--hide_user_symbols::
        Hide user symbols.

--demangle-kernel::
        Demangle kernel symbols.

-D::
--dump-symtab::
        Dump the symbol table used for profiling.

-v::
--verbose::
	Be more verbose (show counter open errors, etc).

-z::
--zero::
	Zero history across display updates.

-s::
--sort::
	Sort by key(s): pid, comm, dso, symbol, parent, srcline, weight,
	local_weight, abort, in_tx, transaction, overhead, sample, period.
	Please see description of --sort in the perf-report man page.

--fields=::
	Specify output field - multiple keys can be specified in CSV format.
	Following fields are available:
	overhead, overhead_sys, overhead_us, overhead_children, sample and period.
	Also it can contain any sort key(s).

	By default, every sort key not specified in --fields is appended
	automatically.

-n::
--show-nr-samples::
	Show a column with the number of samples.

--show-total-period::
	Show a column with the sum of periods.

--dsos::
	Only consider symbols in these dsos.  This option will affect the
	percentage of the overhead column.  See --percentage for more info.

--comms::
	Only consider symbols in these comms.  This option will affect the
	percentage of the overhead column.  See --percentage for more info.

--symbols::
	Only consider these symbols.  This option will affect the
	percentage of the overhead column.  See --percentage for more info.

-M::
--disassembler-style=:: Set disassembler style for objdump.

--prefix=PREFIX::
--prefix-strip=N::
        Remove the first N entries from source file path names in executables
        and add PREFIX. This allows displaying source code compiled on systems
        with a different file system layout.

--source::
	Interleave source code with assembly code. Enabled by default,
	disable with --no-source.

--asm-raw::
	Show raw instruction encoding of assembly instructions.

-g::
	Enables call-graph (stack chain/backtrace) recording.

--call-graph [mode,type,min[,limit],order[,key][,branch]]::
	Setup and enable call-graph (stack chain/backtrace) recording,
	implies -g.  See `--call-graph` section in perf-record and
	perf-report man pages for details.

--children::
	Accumulate callchain of children to parent entry so that they can
	show up in the output.  The output will have a new "Children" column
	and will be sorted on that data.  It requires the -g/--call-graph option
	to be enabled.  See the `overhead calculation' section for more details.
	Enabled by default, disable with --no-children.

--max-stack::
	Set the stack depth limit when parsing the callchain, anything
	beyond the specified depth will be ignored. This is a trade-off
	between information loss and faster processing especially for
	workloads that can have a very long callchain stack.

	Default: /proc/sys/kernel/perf_event_max_stack when present, 127 otherwise.
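
The default described above amounts to a read with a fallback. A minimal Python sketch, in which the function name `default_max_stack` is hypothetical and the fallback mirrors the documented value of 127:

```python
from pathlib import Path


def default_max_stack(
    sysctl_path: str = "/proc/sys/kernel/perf_event_max_stack",
    fallback: int = 127,
) -> int:
    """Return the sysctl value when the file is readable, else the fallback."""
    try:
        return int(Path(sysctl_path).read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a number: use the fallback.
        return fallback


print(default_max_stack())
```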

--ignore-callees=<regex>::
        Ignore callees of the function(s) matching the given regex.
        This has the effect of collecting the callers of each such
        function into one place in the call-graph tree.

--percent-limit::
	Do not show entries which have an overhead under that percent.
	(Default: 0).

--percentage::
	Determine how to display the overhead percentage of filtered entries.
	Filters can be applied by --comms, --dsos and/or --symbols options and
	Zoom operations on the TUI (thread, dso, etc).

	"relative" means it's relative to filtered entries only so that the
	sum of shown entries will be always 100%. "absolute" means it retains
	the original value before and after the filter is applied.
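
The difference between the two modes can be shown with a small Python sketch. The function name `overhead_percent` and the sample periods are made up for illustration; this is not perf's implementation.

```python
def overhead_percent(
    periods: dict[str, int], shown: set[str], mode: str
) -> dict[str, float]:
    """Compute overhead percentages for the entries that pass a filter."""
    if mode == "relative":
        # Denominator: only the entries that survived the filter.
        total = sum(period for name, period in periods.items() if name in shown)
    elif mode == "absolute":
        # Denominator: all entries, filtered out or not.
        total = sum(periods.values())
    else:
        raise ValueError("mode must be 'relative' or 'absolute'")
    return {
        name: 100.0 * period / total
        for name, period in periods.items()
        if name in shown
    }


periods = {"a": 50, "b": 30, "c": 20}  # hypothetical per-symbol periods
print(overhead_percent(periods, {"a", "b"}, "relative"))
print(overhead_percent(periods, {"a", "b"}, "absolute"))
```

With "c" filtered out, the relative values sum to 100%, while the absolute values stay at what they were before filtering.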

-w::
--column-widths=<width[,width...]>::
	Force each column width to the provided list, for large terminal
	readability.  0 means no limit (default behavior).

--proc-map-timeout::
	Processing /proc/XXX/maps for pre-existing threads may take
	a long time, because the file may be huge. A timeout is needed
	in such cases.
	This option sets the timeout limit. The default value is 500 ms.


-b::
--branch-any::
	Enable taken branch stack sampling. Any type of taken branch may be sampled.
	This is a shortcut for --branch-filter any. See --branch-filter for more information.

-j::
--branch-filter::
	Enable taken branch stack sampling. Each sample captures a series of consecutive
	taken branches. The number of branches captured with each sample depends on the
	underlying hardware, the type of branches of interest, and the executed code.
	It is possible to select the types of branches captured by enabling filters.
	For a full list of modifiers please see the perf record manpage.

	The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
	The privilege levels may be omitted, in which case, the privilege levels of the associated
	event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
	levels are subject to permissions.  When sampling on multiple events, branch stack sampling
	is enabled for all the sampling events. The sampled branch type is the same for all events.
	The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
	Note that this feature may not be available on all processors.

--raw-trace::
	When displaying traceevent output, do not use print fmt or plugins.

--hierarchy::
	Enable hierarchy output.

--overwrite::
	Enable this to use just the most recent records, which helps on high core count
	machines such as Knights Landing/Mill. It is currently disabled by default because
	the pausing used in this technique leads to loss of metadata events such
	as PERF_RECORD_MMAP, which makes 'perf top' unable to resolve samples, so that
	many unknown samples appear in the UI. Enable this if you are on such a
	machine and profiling a workload that doesn't create short-lived threads and/or
	doesn't use many executable mmap operations. Work is being planned to solve
	this situation; until then, this will remain disabled by default.

--force::
	Don't do ownership validation.

--num-thread-synthesize::
	The number of threads to run when synthesizing events for existing processes.
	By default, the number of threads equals the number of online CPUs.

--namespaces::
	Record events of type PERF_RECORD_NAMESPACES and display it with the
	'cgroup_id' sort key.

-G name::
--cgroup name::
Monitor only in the container (cgroup) called "name". This option is available only
in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
to first event, second cgroup to second event and so on. It is possible to provide
an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
corresponding events, i.e., they always refer to events defined earlier on the command
line. If the user wants to track multiple events for a specific cgroup, the user can
use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
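
The positional pairing of cgroups and events can be sketched in Python. The function name `assign_cgroups` is hypothetical. Leaving trailing events without a cgroup when the list is shorter than the event list (and has more than one entry) is an assumption of this sketch, since the text above does not describe that case.

```python
from typing import Optional


def assign_cgroups(events: list[str], cgroup_arg: str) -> list[tuple[str, Optional[str]]]:
    """Pair each event with the cgroup at the same position in the -G list."""
    names = cgroup_arg.split(",")
    if len(names) > len(events):
        raise ValueError("each cgroup must refer to an event given earlier")
    if len(names) == 1:
        # A single cgroup applies to every event: '-G foo' acts like '-G foo,foo'.
        names = names * len(events)
    # Assumption of this sketch: events beyond the list get no cgroup.
    names = names + [""] * (len(events) - len(names))
    # An empty entry means "no cgroup", i.e. monitor all the time.
    return [(event, name or None) for event, name in zip(events, names)]


print(assign_cgroups(["e1", "e2", "e3"], "foo,,bar"))
print(assign_cgroups(["e1", "e2"], "foo"))
```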

--all-cgroups::
	Record events of type PERF_RECORD_CGROUP and display it with the
	'cgroup' sort key.

--switch-on EVENT_NAME::
	Only consider events after this event is found.

	E.g.:

           Find out where broadcast packets are handled:

		perf probe -L icmp_rcv

	   Insert a probe there:

		perf probe icmp_rcv:59

	   Start perf top and ask it to only consider the cycles events when a
           broadcast packet arrives. This will show a menu with two entries and
           will start counting when a broadcast packet arrives:

		perf top -e cycles,probe:icmp_rcv --switch-on=probe:icmp_rcv

	   Alternatively one can ask for --group and then two overhead columns
           will appear, the first for cycles and the second for the switch-on event.

		perf top --group -e cycles,probe:icmp_rcv --switch-on=probe:icmp_rcv

	This can be useful for measuring a workload only after some initialization
	phase is over, i.e. insert a perf probe at that point and use the above
	examples, replacing probe:icmp_rcv with the just-after-init probe.

--switch-off EVENT_NAME::
	Stop considering events after this event is found.

--show-on-off-events::
	Show the --switch-on/off events too. This currently has no effect in
	'perf top', but the default may change so that the switch-on/off events
	are not shown in --group mode and, if there is only one event besides the
	on/off ones, 'perf top' goes straight to the histogram browser, as it
	does when no events are explicitly specified.

--stitch-lbr::
	Show callgraph with stitched LBRs, which may give a more complete
	callgraph. The option must be used with --call-graph lbr recording.
	Disabled by default. In common cases with call stack overflows,
	it can recreate better call stacks than the default lbr call stack
	output. But this approach is not foolproof. There can be cases
	where it creates incorrect call stacks from incorrect matches.
	The known limitations include exception handling such as
	setjmp/longjmp, where calls and returns do not match.

ifdef::HAVE_LIBPFM[]
--pfm-events events::
Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
including support for event filters. For example '--pfm-events
inst_retired:any_p:u:c=1:i'. More than one event can be passed to the
option using the comma separator. Hardware events and generic hardware
events cannot be mixed together. The latter must be used with the -e
option. The -e option and this one can be mixed and matched.  Events
can be grouped using the {} notation.
endif::HAVE_LIBPFM[]

INTERACTIVE PROMPTING KEYS
--------------------------

[d]::
	Display refresh delay.

[e]::
	Number of entries to display.

[E]::
	Event to display when multiple counters are active.

[f]::
	Profile display filter (>= hit count).

[F]::
	Annotation display filter (>= % of total).

[s]::
	Annotate symbol.

[S]::
	Stop annotation, return to full profile display.

[K]::
	Hide kernel symbols.

[U]::
	Hide user symbols.

[z]::
	Toggle event count zeroing across display updates.

[qQ]::
	Quit.

Pressing any unmapped key displays a menu, and prompts for input.

include::callchain-overhead-calculation.txt[]

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-report[1]