# Profiling with perf

This is a guide for how to profile rustc with [perf](https://perf.wiki.kernel.org/index.php/Main_Page).

## Initial steps

- Get a clean checkout of rust-lang/master, or whatever it is you want
  to profile.
- Set the following settings in your `config.toml`:
  - `debuginfo-level = 1` - enables line debuginfo
  - `use-jemalloc = false` - lets you do memory use profiling with valgrind
  - leave everything else at the defaults
- Run `./x.py build` to get a full build
- Make a rustup toolchain pointing to that result
  - see [the "build and run" section for instructions][b-a-r]; a minimal
    sketch of these steps also follows below

[b-a-r]: ../building/how-to-build-and-run.html#toolchain
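
For reference, here is a minimal sketch of those setup steps. The
toolchain name `rust-prof` and the `x86_64-unknown-linux-gnu` target
directory are hypothetical examples; adjust them to your machine and
to whichever stage you actually built:

```bash
# config.toml (the settings assumed by the list above):
#   [rust]
#   debuginfo-level = 1
#   use-jemalloc = false

# Full build of the compiler.
./x.py build

# Expose the resulting stage1 build as a rustup toolchain named "rust-prof".
rustup toolchain link rust-prof build/x86_64-unknown-linux-gnu/stage1
```
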
## Gathering a perf profile

perf is an excellent tool on Linux that can be used to gather and
analyze all kinds of information. Mostly it is used to figure out
where a program spends its time. It can also be used for other sorts
of events, though, like cache misses and so forth.

### The basics

The basic `perf` command is this:

```bash
perf record -F99 --call-graph dwarf XXX
```

The `-F99` tells perf to sample at 99 Hz, which avoids generating too
much data for longer runs (why 99 Hz you ask? It is often chosen
because it is unlikely to be in lockstep with other periodic
activity). The `--call-graph dwarf` tells perf to get call-graph
information from debuginfo, which is accurate. The `XXX` is the
command you want to profile. So, for example, you might do:

```bash
perf record -F99 --call-graph dwarf cargo +<toolchain> rustc
```

to run `cargo` -- here `<toolchain>` should be the name of the toolchain
you made in the beginning. But there are some things to be aware of:

- You probably don't want to profile the time spent building
  dependencies. So something like `cargo build; cargo clean -p $C` may
  be helpful (where `$C` is the crate name)
  - Though usually I just do `touch src/lib.rs` and rebuild instead. =)
- You probably don't want incremental compilation messing about with your
  profile. So something like `CARGO_INCREMENTAL=0` can be helpful. A
  combined sketch of these steps follows this list.

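
Putting those two points together, a minimal sketch (using the
`touch src/lib.rs` approach rather than `cargo clean -p`) might look
like this:

```bash
# Build once so that all dependencies are compiled and cached.
cargo +<toolchain> build

# Force just the leaf crate to be recompiled next time.
touch src/lib.rs

# Profile that rebuild with incremental compilation disabled.
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo +<toolchain> rustc
```
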
### Gathering a perf profile from a `perf.rust-lang.org` test

Often we want to analyze a specific test from `perf.rust-lang.org`. To
do that, the first step is to clone
[the rustc-perf repository][rustc-perf-gh]:

```bash
git clone https://github.com/rust-lang-nursery/rustc-perf
```

[rustc-perf-gh]: https://github.com/rust-lang-nursery/rustc-perf

#### Doing it the easy way

Once you've cloned the repo, you can use the `collector` executable to
do profiling for you! You can find
[instructions in the rustc-perf readme][rustc-perf-readme].

[rustc-perf-readme]: https://github.com/rust-lang-nursery/rustc-perf/blob/master/collector/README.md#profiling

For example, to measure the clap-rs test, you might do:

```bash
./target/release/collector \
    --output-repo /path/to/place/output \
    profile perf-record \
    --rustc /path/to/rustc/executable/from/your/build/directory \
    --cargo `which cargo` \
    --filter clap-rs \
    --builds Check
```

You can also use that same command to run cachegrind or other profiling
tools; a sketch follows below.

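
For example, a cachegrind run would swap out the profiler argument.
This is only a sketch; it assumes the collector accepts `cachegrind`
as a profiler name, as described in the rustc-perf readme:

```bash
./target/release/collector \
    --output-repo /path/to/place/output \
    profile cachegrind \
    --rustc /path/to/rustc/executable/from/your/build/directory \
    --cargo `which cargo` \
    --filter clap-rs \
    --builds Check
```
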
#### Doing it the hard way

If you prefer to run things manually, that is also possible. You first
need to find the source for the test you want. Sources for the tests
are found in [the `collector/benchmarks` directory][dir]. So let's go
into the directory of a specific test; we'll use `clap-rs` as an
example:

[dir]: https://github.com/rust-lang-nursery/rustc-perf/tree/master/collector/benchmarks

```bash
cd collector/benchmarks/clap-rs
```

In this case, let's say we want to profile the `cargo check`
performance. I would first run some basic commands to build the
dependencies:

```bash
# Setup: first clean out any old results and build the dependencies:
cargo +<toolchain> clean
CARGO_INCREMENTAL=0 cargo +<toolchain> check
```

(Again, `<toolchain>` should be replaced with the name of the
toolchain we made in the first step.)

Next: we want to record the execution time for *just* the clap-rs crate,
running cargo check. I tend to use `cargo rustc` for this, since it
also allows me to add explicit flags, which we'll do later on.

```bash
touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo +<toolchain> rustc --profile check --lib
```

Note that final command: it's a doozy! It uses the `cargo rustc`
command, which executes rustc with (potentially) additional options;
the `--profile check` and `--lib` options specify that we are doing a
`cargo check` execution, and that this is a library (not a binary).

At this point, we can use `perf` tooling to analyze the results. For example:

```bash
perf report
```

will open up an interactive TUI program. In simple cases, that can be
enough. For more detailed examination, the [`perf-focus` tool][pf]
can be helpful; it is covered below.

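
If you just want a quick, non-interactive summary, `perf report` can
also print to stdout; this is a minimal sketch, and nothing below
depends on it:

```bash
# Flat text report instead of the TUI; --no-children reports each
# symbol's own ("self") samples instead of accumulating callees into callers.
perf report --stdio --no-children
```
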
**A note of caution.** Each of the rustc-perf tests is its own special
snowflake. In particular, some of them are not libraries, in which
case you would want to do `touch src/main.rs` and avoid passing
`--lib`. I'm not sure how best to tell which test is which, to be
honest, but one rough check is sketched below.

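
As a rough check (my own heuristic, not something rustc-perf
guarantees), look at what the benchmark's `src/` directory contains:

```bash
# If src/lib.rs exists, the `--lib` / `touch src/lib.rs` approach applies;
# if there is only a src/main.rs, touch that instead and drop `--lib`.
ls src/
```
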
### Gathering NLL data

If you want to profile an NLL run, you can just pass extra options to
the `cargo rustc` command, like so:

```bash
touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo +<toolchain> rustc --profile check --lib -- -Zborrowck=mir
```

[pf]: https://github.com/nikomatsakis/perf-focus

## Analyzing a perf profile with `perf focus`

Once you've gathered a perf profile, we want to get some information
about it. For this, I personally use [perf focus][pf]. It's a
simple but useful tool that lets you answer queries like:

- "how much time was spent in function F" (no matter where it was called from)
- "how much time was spent in function F when it was called from G"
- "how much time was spent in function F *excluding* time spent in G"
- "what functions does F call and how much time does it spend in them"

To understand how it works, you have to know just a bit about
perf. Basically, perf works by *sampling* your process on a regular
basis (or whenever some event occurs). For each sample, perf gathers a
backtrace. `perf focus` lets you write a regular expression that tests
which functions appear in that backtrace, and then tells you what
percentage of samples had a backtrace that met the regular
expression. It's probably easiest to explain by walking through how I
would analyze NLL performance.

### Installing `perf-focus`

You can install perf-focus using `cargo install`:

```bash
cargo install perf-focus
```

### Example: How much time is spent in MIR borrowck?

Let's say we've gathered the NLL data for a test. We'd like to know
how much time it is spending in the MIR borrow-checker. The "main"
function of the MIR borrowck is called `do_mir_borrowck`, so we can do
this command:

```bash
$ perf focus '{do_mir_borrowck}'
Matcher    : {do_mir_borrowck}
Matches    : 228
Not Matches: 542
Percentage : 29%
```

The `'{do_mir_borrowck}'` argument is called the **matcher**. It
specifies the test to be applied on the backtrace. In this case, the
`{X}` indicates that there must be *some* function on the backtrace
that meets the regular expression `X`. Here, that regex is
just the name of the function we want (in fact, it's a subset of the name;
the full name includes a bunch of other stuff, like the module
path). In this mode, perf-focus just prints out the percentage of
samples where `do_mir_borrowck` was on the stack: in this case, 29%.

**A note about c++filt.** To get the data from `perf`, `perf focus`
currently executes `perf script` (perhaps there is a better
way...). I've sometimes found that `perf script` outputs C++ mangled
names. This is annoying. You can tell by running `perf script |
head` yourself — if you see names like `5rustc6middle` instead of
`rustc::middle`, then you have the same problem. You can solve this
by doing:

```bash
perf script | c++filt | perf focus --from-stdin ...
```

This will pipe the output from `perf script` through `c++filt` and
should mostly convert those names into a more friendly format. The
`--from-stdin` flag to `perf focus` tells it to get its data from
stdin, rather than executing `perf script` itself. We should make this more
convenient (at worst, maybe add a `c++filt` option to `perf focus`, or
just always use it — it's pretty harmless).

### Example: How much time does MIR borrowck spend solving traits?

Perhaps we'd like to know how much time MIR borrowck spends in the
trait checker. We can ask this using a more complex regex:

```bash
$ perf focus '{do_mir_borrowck}..{^rustc::traits}'
Matcher    : {do_mir_borrowck},..{^rustc::traits}
Matches    : 12
Not Matches: 1311
Percentage : 0%
```

Here we used the `..` operator to ask "how often do we have
`do_mir_borrowck` on the stack and then, later, some function whose
name begins with `rustc::traits`?" (basically, code in that module). It
turns out the answer is "almost never" — only 12 samples fit that
description (if you ever see *no* samples, that often indicates your
query is messed up).

If you're curious, you can find out exactly which samples by using the
`--print-match` option. This will print out the full backtrace for
each sample. The `|` at the front of the line indicates the part that
the regular expression matched.

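
For instance, re-running the query above with that option would look
like this (the output, which is just the matched backtraces, is
omitted here):

```bash
perf focus '{do_mir_borrowck}..{^rustc::traits}' --print-match
```
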
### Example: Where does MIR borrowck spend its time?

Often we want to do more "explorational" queries. Like, we know that
MIR borrowck is 29% of the time, but where does that time get spent?
For that, the `--tree-callees` option is often the best tool. You
usually also want to give `--tree-min-percent` or
`--tree-max-depth`. The result looks like this:

```bash
$ perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 43%

Tree
| matched `{do_mir_borrowck}` (43% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (20% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (13% total, 0% self)
: : : | core::ops::function::FnOnce::call_once (5% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (5% total, 3% self)
: : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (3% total, 0% self)
: | rustc::mir::visit::Visitor::visit_mir (8% total, 6% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (5% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)
```

What happens with `--tree-callees` is that

- we find each sample matching the regular expression
- we look at the code that occurs *after* the regex match and try
  to build up a call tree

The `--tree-min-percent 3` option says "only show me things that take
more than 3% of the time". Without this, the tree often gets really
noisy and includes random stuff like the innards of
malloc. `--tree-max-depth` can be useful too; it just limits how many
levels we print.

For each line, we display the percent of time in that function
altogether ("total") and the percent of time spent in **just that
function and not some callee of that function** ("self"). Usually
"total" is the more interesting number, but not always.

### Relative percentages

By default, all percentages in perf-focus are relative to the **total program
execution**. This is useful to help you keep perspective — often as
we drill down to find hot spots, we can lose sight of the fact that,
in terms of overall program execution, this "hot spot" is actually not
important. It also ensures that percentages between different queries
are easily compared against one another.

That said, sometimes it's useful to get relative percentages, so `perf
focus` offers a `--relative` option. In this case, the percentages are
listed only for samples that match (vs all samples). So for example we
could get our percentages relative to the borrowck itself
like so:

```bash
$ perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 100%

Tree
| matched `{do_mir_borrowck}` (100% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (47% total, 0% self) [...]
: | rustc::mir::visit::Visitor::visit_mir (19% total, 15% self) [...]
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (13% total, 0% self) [...]
: | rustc_mir::dataflow::do_dataflow (8% total, 1% self) [...]
```

Here you see that `compute_regions` came up as "47% total" — that
means that 47% of `do_mir_borrowck` is spent in that function. Before,
we saw 20% — that's because `do_mir_borrowck` itself is only 43% of
the total time (and `.47 * .43 = .20`).