# Profiling the compiler

This section talks about how to profile the compiler and find out where it spends its time.

Depending on what you're trying to measure, there are several different approaches:

- If you want to see if a PR improves or regresses compiler performance:
  - The [rustc-perf](https://github.com/rust-lang/rustc-perf) project makes this easy and can be triggered to run on a PR via the `@rustc-perf` bot.

- If you want a medium-to-high level overview of where `rustc` is spending its time:
  - The `-Z self-profile` flag and [measureme](https://github.com/rust-lang/measureme) tools offer a query-based approach to profiling.
    See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information.

- If you want function level performance data or even just more details than the above approaches:
  - Consider using a native code profiler such as [perf](profiling/with_perf.html)
  - or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision,
    full-featured graphical interface.

- If you want a nice visual representation of the compile times of your crate graph,
  you can use [cargo's `-Z timings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings),
  e.g., `cargo -Z timings build`.
  You can use this flag on the compiler itself with `CARGOFLAGS="-Z timings" ./x.py build`.

- If you want to profile memory usage, you can use various tools depending on what operating system
  you are using.
  - For Windows, read our [WPA guide](profiling/wpa_profiling.html).

## Optimizing rustc's bootstrap times with `cargo-llvm-lines`

Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
number of lines of LLVM IR across all instantiations of a generic function.
Since most of the time compiling rustc is spent in LLVM, the idea is that by
reducing the amount of code passed to LLVM, compiling rustc gets faster.

To use `cargo-llvm-lines` together with rustc's somewhat custom build process, you can use
`-C save-temps` to obtain the required LLVM IR. The option preserves temporary work products
created during compilation. Among those is the LLVM IR that serves as input to the
optimization pipeline, which is ideal for our purposes. It is stored in LLVM bitcode format
in files with the `*.no-opt.bc` extension.

Example usage:
```
cargo install cargo-llvm-lines
# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P

# Do a clean before every run, to not mix in the results from previous runs.
./x.py clean
env RUSTFLAGS=-Csave-temps ./x.py build --stage 0 compiler/rustc

# Single crate, e.g., rustc_middle. (Relies on the glob support of your shell.)
# Convert unoptimized LLVM bitcode into human-readable LLVM assembly accepted by cargo-llvm-lines.
for f in build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_middle-*.no-opt.bc; do
  ./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-dis "$f"
done
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_middle-*.ll > llvm-lines-middle.txt

# Specify all crates of the compiler.
for f in build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/*.no-opt.bc; do
  ./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-dis "$f"
done
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/*.ll > llvm-lines.txt
```
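
The paths above hard-code the `x86_64-unknown-linux-gnu` host triple. On another host, a small
wrapper can build them instead. The following is a minimal sketch and not part of `x.py`:
`deps_dir` and `disassemble_all` are hypothetical helper names, and the directory layout is
assumed to match the example above.

```shell
# Hypothetical helpers; the build directory layout is assumed to match the example above.

# Print the stage0 deps directory for a host triple.
# $1: host triple, e.g. x86_64-unknown-linux-gnu
deps_dir() {
    printf 'build/%s/stage0-rustc/%s/release/deps\n' "$1" "$1"
}

# Run llvm-dis on every unoptimized bitcode file in that directory.
# $1: host triple
# $2: optional file-name prefix, e.g. rustc_middle- (default: all crates)
disassemble_all() {
    dir=$(deps_dir "$1")
    for f in "$dir"/"${2:-}"*.no-opt.bc; do
        # If the glob matched nothing, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        "./build/$1/llvm/bin/llvm-dis" "$f"
    done
}
```

For example, `disassemble_all "$(rustc -vV | sed -n 's/^host: //p')" rustc_middle-` would
disassemble only the `rustc_middle` bitcode, assuming `rustc -vV` reports the same host triple
that `x.py` built for.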

Example output for the compiler:
```
  Lines            Copies           Function name
  -----            ------           -------------
  45207720 (100%)  1583774 (100%)  (TOTAL)
   2102350 (4.7%)   146650 (9.3%)  core::ptr::drop_in_place
    615080 (1.4%)     8392 (0.5%)  std::thread::local::LocalKey<T>::try_with
    594296 (1.3%)     1780 (0.1%)  hashbrown::raw::RawTable<T>::rehash_in_place
    592071 (1.3%)     9691 (0.6%)  core::option::Option<T>::map
    528172 (1.2%)     5741 (0.4%)  core::alloc::layout::Layout::array
    466854 (1.0%)     8863 (0.6%)  core::ptr::swap_nonoverlapping_one
    412736 (0.9%)     1780 (0.1%)  hashbrown::raw::RawTable<T>::resize
    367776 (0.8%)     2554 (0.2%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    367507 (0.8%)      643 (0.0%)  rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
    355882 (0.8%)     6332 (0.4%)  alloc::alloc::box_free
    354556 (0.8%)    14213 (0.9%)  core::ptr::write
    354361 (0.8%)     3590 (0.2%)  core::iter::traits::iterator::Iterator::fold
    347761 (0.8%)     3873 (0.2%)  rustc_middle::ty::context::tls::set_tlv
    337534 (0.7%)     2377 (0.2%)  alloc::raw_vec::RawVec<T,A>::allocate_in
    331690 (0.7%)     3192 (0.2%)  hashbrown::raw::RawTable<T>::find
    328756 (0.7%)     3978 (0.3%)  rustc_middle::ty::context::tls::with_context_opt
    326903 (0.7%)      642 (0.0%)  rustc_query_system::query::plumbing::try_execute_query
```
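
Much of this list comes from `core`, `alloc`, and `hashbrown`, so it can help to look only at
functions defined in the compiler's own crates. The sketch below assumes the five-column layout
shown above; `filter_llvm_lines` is a hypothetical helper name, not a `cargo-llvm-lines` feature.

```shell
# Hypothetical helper: keep only rows whose function name starts with a prefix.
# $1: prefix to match, e.g. rustc_
# $2: file containing `cargo llvm-lines` output
filter_llvm_lines() {
    awk -v prefix="$1" '
        # Data rows start with a line count; the function name begins at field 5.
        $1 ~ /^[0-9]+$/ && index($5, prefix) == 1 { print }
    ' "$2"
}
```

Running `filter_llvm_lines rustc_ llvm-lines.txt | head` would then list the largest entries
whose names begin with `rustc_`, in the order the original output listed them.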

Since this doesn't seem to work with incremental compilation or `x.py check`,
you will be compiling rustc _a lot_.
I recommend changing a few settings in `config.toml` to make it bearable:
```
[rust]
# A debug build takes _a third_ as long on my machine,
# but compiling more than stage0 rustc becomes unbearably slow.
optimize = false

# We can't use incremental anyway, so we disable it for a little speed boost.
incremental = false
# We won't be running it, so no point in compiling debug checks.
debug = false

# Using a single codegen unit gives less output, but is slower to compile.
codegen-units = 0  # num_cpus
```

The llvm-lines output is affected by several options.
`optimize = false` increases it from 2.1GB to 3.5GB and `codegen-units = 0` to 4.1GB.

MIR optimizations have little impact. Compared to the default
`RUSTFLAGS="-Z mir-opt-level=1"`, level 0 adds 0.3GB and level 2 removes 0.2GB.
As of <!-- date: 2021-01 --> January 2021, inlining happens only in
LLVM, but this might change in the future.
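
Because these settings shift the numbers, a before/after comparison is probably only meaningful
when both runs use the same configuration. As a minimal sketch, the `(TOTAL)` rows of two saved
outputs can be compared like this; `total_lines` and `llvm_lines_delta` are hypothetical helper
names, and the row format is assumed to match the example output above.

```shell
# Hypothetical helpers for comparing two saved `cargo llvm-lines` outputs.

# Print the line count from the (TOTAL) row.
# $1: file containing `cargo llvm-lines` output
total_lines() {
    awk '$NF == "(TOTAL)" { print $1; exit }' "$1"
}

# Print the change in total LLVM IR lines; negative means less IR after the change.
# $1: baseline output, $2: output after the change
llvm_lines_delta() {
    before=$(total_lines "$1")
    after=$(total_lines "$2")
    echo "$((after - before))"
}
```

For instance, `llvm_lines_delta llvm-lines-before.txt llvm-lines.txt` prints a negative number
when the change reduced the amount of IR handed to LLVM; both file names here are placeholders.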