# Profiling with perf

This is a guide for how to profile rustc with [perf](https://perf.wiki.kernel.org/index.php/Main_Page).

## Initial steps

- Get a clean checkout of rust-lang/master, or whatever it is you want
  to profile.
- Set the following settings in your `config.toml`:
  - `debuginfo-level = 1` - enables line debuginfo
  - `use-jemalloc = false` - lets you do memory use profiling with valgrind
  - leave everything else at the defaults
- Run `./x.py build` to get a full build
- Make a rustup toolchain pointing to that result
  - see [the "build and run" section for instructions][b-a-r]; a minimal
    sketch of these steps also follows below

[b-a-r]: ../building/how-to-build-and-run.html#toolchain
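
For reference, here is a minimal sketch of those setup steps. The
toolchain name `rust-prof` and the `x86_64-unknown-linux-gnu` target
directory are hypothetical examples; adjust them to your machine and
to whichever stage you actually built:

```bash
# config.toml (the settings assumed by the list above):
#   [rust]
#   debuginfo-level = 1
#   use-jemalloc = false

# Full build of the compiler.
./x.py build

# Expose the resulting stage1 build as a rustup toolchain named "rust-prof".
rustup toolchain link rust-prof build/x86_64-unknown-linux-gnu/stage1
```
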
## Gathering a perf profile

perf is an excellent tool on Linux that can be used to gather and
analyze all kinds of information. Mostly it is used to figure out
where a program spends its time. It can also be used for other sorts
of events, though, like cache misses and so forth.

### The basics

The basic `perf` command is this:

```bash
perf record -F99 --call-graph dwarf XXX
```

The `-F99` tells perf to sample at 99 Hz, which avoids generating too
much data for longer runs (why 99 Hz you ask? It is often chosen
because it is unlikely to be in lockstep with other periodic
activity). The `--call-graph dwarf` tells perf to get call-graph
information from debuginfo, which is accurate. The `XXX` is the
command you want to profile. So, for example, you might do:

```bash
perf record -F99 --call-graph dwarf cargo +<toolchain> rustc
```

to run `cargo` -- here `<toolchain>` should be the name of the toolchain
you made in the beginning. But there are some things to be aware of:

- You probably don't want to profile the time spent building
  dependencies. So something like `cargo build; cargo clean -p $C` may
  be helpful (where `$C` is the crate name)
  - Though usually I just do `touch src/lib.rs` and rebuild instead. =)
- You probably don't want incremental compilation messing about with your
  profile. So something like `CARGO_INCREMENTAL=0` can be helpful. A
  combined sketch of these steps follows this list.

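
Putting those two points together, a minimal sketch (using the
`touch src/lib.rs` approach rather than `cargo clean -p`) might look
like this:

```bash
# Build once so that all dependencies are compiled and cached.
cargo +<toolchain> build

# Force just the leaf crate to be recompiled next time.
touch src/lib.rs

# Profile that rebuild with incremental compilation disabled.
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo +<toolchain> rustc
```
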
### Gathering a perf profile from a `perf.rust-lang.org` test

Often we want to analyze a specific test from `perf.rust-lang.org`. To
do that, the first step is to clone
[the rustc-perf repository][rustc-perf-gh]:

```bash
git clone https://github.com/rust-lang-nursery/rustc-perf
```

[rustc-perf-gh]: https://github.com/rust-lang-nursery/rustc-perf

#### Doing it the easy way

Once you've cloned the repo, you can use the `collector` executable to
do profiling for you! You can find
[instructions in the rustc-perf readme][rustc-perf-readme].

[rustc-perf-readme]: https://github.com/rust-lang-nursery/rustc-perf/blob/master/collector/README.md#profiling

For example, to measure the clap-rs test, you might do:

```bash
./target/release/collector \
    --output-repo /path/to/place/output \
    profile perf-record \
    --rustc /path/to/rustc/executable/from/your/build/directory \
    --cargo `which cargo` \
    --filter clap-rs \
    --builds Check
```

You can also use that same command to run cachegrind or other profiling
tools; a sketch follows below.

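
For example, a cachegrind run would swap out the profiler argument.
This is only a sketch; it assumes the collector accepts `cachegrind`
as a profiler name, as described in the rustc-perf readme:

```bash
./target/release/collector \
    --output-repo /path/to/place/output \
    profile cachegrind \
    --rustc /path/to/rustc/executable/from/your/build/directory \
    --cargo `which cargo` \
    --filter clap-rs \
    --builds Check
```
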
#### Doing it the hard way

If you prefer to run things manually, that is also possible. You first
need to find the source for the test you want. Sources for the tests
are found in [the `collector/benchmarks` directory][dir]. So let's go
into the directory of a specific test; we'll use `clap-rs` as an
example:

[dir]: https://github.com/rust-lang-nursery/rustc-perf/tree/master/collector/benchmarks

```bash
cd collector/benchmarks/clap-rs
```

In this case, let's say we want to profile the `cargo check`
performance. I would first run some basic commands to build the
dependencies:

```bash
# Setup: first clean out any old results and build the dependencies:
cargo +<toolchain> clean
CARGO_INCREMENTAL=0 cargo +<toolchain> check
```

(Again, `<toolchain>` should be replaced with the name of the
toolchain we made in the first step.)

Next: we want to record the execution time for *just* the clap-rs crate,
running cargo check. I tend to use `cargo rustc` for this, since it
also allows me to add explicit flags, which we'll do later on.

```bash
touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo +<toolchain> rustc --profile check --lib
```

Note that final command: it's a doozy! It uses the `cargo rustc`
command, which executes rustc with (potentially) additional options;
the `--profile check` and `--lib` options specify that we are doing a
`cargo check` execution, and that this is a library (not a binary).

At this point, we can use `perf` tooling to analyze the results. For example:

```bash
perf report
```

will open up an interactive TUI program. In simple cases, that can be
enough. For more detailed examination, the [`perf-focus` tool][pf]
can be helpful; it is covered below.

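
If you just want a quick, non-interactive summary, `perf report` can
also print to stdout; this is a minimal sketch, and nothing below
depends on it:

```bash
# Flat text report instead of the TUI; --no-children reports each
# symbol's own ("self") samples instead of accumulating callees into callers.
perf report --stdio --no-children
```
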
**A note of caution.** Each of the rustc-perf tests is its own special
snowflake. In particular, some of them are not libraries, in which
case you would want to do `touch src/main.rs` and avoid passing
`--lib`. I'm not sure how best to tell which test is which, to be
honest, but one rough check is sketched below.

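
As a rough check (my own heuristic, not something rustc-perf
guarantees), look at what the benchmark's `src/` directory contains:

```bash
# If src/lib.rs exists, the `--lib` / `touch src/lib.rs` approach applies;
# if there is only a src/main.rs, touch that instead and drop `--lib`.
ls src/
```
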
### Gathering NLL data

If you want to profile an NLL run, you can just pass extra options to
the `cargo rustc` command, like so:

```bash
touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo +<toolchain> rustc --profile check --lib -- -Zborrowck=mir
```

[pf]: https://github.com/nikomatsakis/perf-focus

## Analyzing a perf profile with `perf focus`

Once you've gathered a perf profile, we want to get some information
about it. For this, I personally use [perf focus][pf]. It's a
simple but useful tool that lets you answer queries like:

- "how much time was spent in function F" (no matter where it was called from)
- "how much time was spent in function F when it was called from G"
- "how much time was spent in function F *excluding* time spent in G"
- "what functions does F call and how much time does it spend in them"

To understand how it works, you have to know just a bit about
perf. Basically, perf works by *sampling* your process on a regular
basis (or whenever some event occurs). For each sample, perf gathers a
backtrace. `perf focus` lets you write a regular expression that tests
which functions appear in that backtrace, and then tells you what
percentage of samples had a backtrace that met the regular
expression. It's probably easiest to explain by walking through how I
would analyze NLL performance.

### Installing `perf-focus`

You can install perf-focus using `cargo install`:

```bash
cargo install perf-focus
```

### Example: How much time is spent in MIR borrowck?

Let's say we've gathered the NLL data for a test. We'd like to know
how much time it is spending in the MIR borrow-checker. The "main"
function of the MIR borrowck is called `do_mir_borrowck`, so we can do
this command:

```bash
$ perf focus '{do_mir_borrowck}'
Matcher    : {do_mir_borrowck}
Matches    : 228
Not Matches: 542
Percentage : 29%
```

The `'{do_mir_borrowck}'` argument is called the **matcher**. It
specifies the test to be applied on the backtrace. In this case, the
`{X}` indicates that there must be *some* function on the backtrace
that meets the regular expression `X`. Here, that regex is
just the name of the function we want (in fact, it's a subset of the name;
the full name includes a bunch of other stuff, like the module
path). In this mode, perf-focus just prints out the percentage of
samples where `do_mir_borrowck` was on the stack: in this case, 29%.

**A note about c++filt.** To get the data from `perf`, `perf focus`
currently executes `perf script` (perhaps there is a better
way...). I've sometimes found that `perf script` outputs C++ mangled
names. This is annoying. You can tell by running `perf script |
head` yourself — if you see names like `5rustc6middle` instead of
`rustc::middle`, then you have the same problem. You can solve this
by doing:

```bash
perf script | c++filt | perf focus --from-stdin ...
```

This will pipe the output from `perf script` through `c++filt` and
should mostly convert those names into a more friendly format. The
`--from-stdin` flag to `perf focus` tells it to get its data from
stdin, rather than executing `perf script` itself. We should make this more
convenient (at worst, maybe add a `c++filt` option to `perf focus`, or
just always use it — it's pretty harmless).

### Example: How much time does MIR borrowck spend solving traits?

Perhaps we'd like to know how much time MIR borrowck spends in the
trait checker. We can ask this using a more complex regex:

```bash
$ perf focus '{do_mir_borrowck}..{^rustc::traits}'
Matcher    : {do_mir_borrowck},..{^rustc::traits}
Matches    : 12
Not Matches: 1311
Percentage : 0%
```

Here we used the `..` operator to ask "how often do we have
`do_mir_borrowck` on the stack and then, later, some function whose
name begins with `rustc::traits`?" (basically, code in that module). It
turns out the answer is "almost never" — only 12 samples fit that
description (if you ever see *no* samples, that often indicates your
query is messed up).

If you're curious, you can find out exactly which samples by using the
`--print-match` option. This will print out the full backtrace for
each sample. The `|` at the front of the line indicates the part that
the regular expression matched.

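
For instance, re-running the query above with that option would look
like this (the output, which is just the matched backtraces, is
omitted here):

```bash
perf focus '{do_mir_borrowck}..{^rustc::traits}' --print-match
```
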
### Example: Where does MIR borrowck spend its time?

Often we want to do more "explorational" queries. Like, we know that
MIR borrowck is 29% of the time, but where does that time get spent?
For that, the `--tree-callees` option is often the best tool. You
usually also want to give `--tree-min-percent` or
`--tree-max-depth`. The result looks like this:

```bash
$ perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 43%

Tree
| matched `{do_mir_borrowck}` (43% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (20% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (13% total, 0% self)
: : : | core::ops::function::FnOnce::call_once (5% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (5% total, 3% self)
: : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (3% total, 0% self)
: | rustc::mir::visit::Visitor::visit_mir (8% total, 6% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (5% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)
```

What happens with `--tree-callees` is that

- we find each sample matching the regular expression
- we look at the code that occurs *after* the regex match and try
  to build up a call tree

The `--tree-min-percent 3` option says "only show me things that take
more than 3% of the time". Without this, the tree often gets really
noisy and includes random stuff like the innards of
malloc. `--tree-max-depth` can be useful too; it just limits how many
levels we print.

For each line, we display the percent of time in that function
altogether ("total") and the percent of time spent in **just that
function and not some callee of that function** ("self"). Usually
"total" is the more interesting number, but not always.

### Relative percentages

By default, all percentages in perf-focus are relative to the **total program
execution**. This is useful to help you keep perspective — often as
we drill down to find hot spots, we can lose sight of the fact that,
in terms of overall program execution, this "hot spot" is actually not
important. It also ensures that percentages between different queries
are easily compared against one another.

That said, sometimes it's useful to get relative percentages, so `perf
focus` offers a `--relative` option. In this case, the percentages are
listed only for samples that match (vs all samples). So for example we
could get our percentages relative to the borrowck itself
like so:

```bash
$ perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 100%

Tree
| matched `{do_mir_borrowck}` (100% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (47% total, 0% self) [...]
: | rustc::mir::visit::Visitor::visit_mir (19% total, 15% self) [...]
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (13% total, 0% self) [...]
: | rustc_mir::dataflow::do_dataflow (8% total, 1% self) [...]
```

Here you see that `compute_regions` came up as "47% total" — that
means that 47% of `do_mir_borrowck` is spent in that function. Before,
we saw 20% — that's because `do_mir_borrowck` itself is only 43% of
the total time (and `.47 * .43 = .20`).