# Profiling the compiler

This section describes how to profile the compiler and find out where it spends its time.

Depending on what you're trying to measure, there are several different approaches:

- If you want to see if a PR improves or regresses compiler performance:
  - The [rustc-perf](https://github.com/rust-lang/rustc-perf) project makes this easy and can be triggered to run on a PR via the `@rustc-perf` bot.

- If you want a medium-to-high level overview of where `rustc` is spending its time:
  - The `-Z self-profile` flag and [measureme](https://github.com/rust-lang/measureme) tools offer a query-based approach to profiling.
    See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information.

- If you want function-level performance data or even just more details than the above approaches:
  - Consider using a native code profiler such as [perf](profiling/with_perf.html),
    or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision,
    full-featured graphical interface.

- If you want a nice visual representation of the compile times of your crate graph,
  you can use [cargo's `-Z timings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings),
  e.g. `cargo -Z timings build`.
  You can use this flag on the compiler itself with `CARGOFLAGS="-Z timings" ./x.py build`.

- If you want to profile memory usage, you can use various tools depending on what operating system
  you are using.
  - For Windows, read our [WPA guide](profiling/wpa_profiling.html).

## Optimizing rustc's bootstrap times with `cargo-llvm-lines`

Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
number of lines of LLVM IR across all instantiations of a generic function.
Since most of the time compiling rustc is spent in LLVM, the idea is that by
reducing the amount of code passed to LLVM, compiling rustc gets faster.

To use `cargo-llvm-lines` together with rustc's somewhat custom build process, you can use
`-C save-temps` to obtain the required LLVM IR. The option preserves temporary work products
created during compilation. Among those is the LLVM IR that represents the input to the
optimization pipeline, which is ideal for our purposes. It is stored in LLVM bitcode format,
in files with the `*.no-opt.bc` extension.

Example usage:
```
cargo install cargo-llvm-lines
# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P

# Do a clean before every run, to not mix in the results from previous runs.
./x.py clean
env RUSTFLAGS=-Csave-temps ./x.py build --stage 0 compiler/rustc

# Single crate, e.g., rustc_middle. (Relies on the glob support of your shell.)
# Convert unoptimized LLVM bitcode into human-readable LLVM assembly accepted by cargo-llvm-lines.
for f in build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_middle-*.no-opt.bc; do
  ./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-dis "$f"
done
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_middle-*.ll > llvm-lines-middle.txt

# Specify all crates of the compiler.
for f in build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/*.no-opt.bc; do
  ./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-dis "$f"
done
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/*.ll > llvm-lines.txt
```
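
The loops above hard-code the host triple and the crate name. As a sketch, they could be wrapped in a small shell function. `disassemble_crate` and its parameters are hypothetical names, not part of `x.py` or `cargo-llvm-lines`, and the directory layout is assumed to match the paths used above.

```shell
# Hypothetical helper: disassemble the unoptimized bitcode of one crate.
# Usage: disassemble_crate BUILD_DIR HOST_TRIPLE CRATE_NAME
# Assumes the stage0 layout used in the example above.
disassemble_crate() {
    build_dir=$1
    host=$2
    crate=$3
    deps="$build_dir/$host/stage0-rustc/$host/release/deps"
    for f in "$deps/$crate"-*.no-opt.bc; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$f" ] || continue
        # llvm-dis writes its output next to the input, with a .ll suffix.
        "$build_dir/$host/llvm/bin/llvm-dis" "$f"
    done
}
```

One possible call is `disassemble_crate build "$(rustc -vV | sed -n 's/^host: //p')" rustc_middle`, which reads the host triple from the `host:` line that `rustc -vV` prints.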

Example output for the compiler:
```
  Lines            Copies          Function name
  -----            ------          -------------
  45207720 (100%)  1583774 (100%)  (TOTAL)
   2102350 (4.7%)   146650 (9.3%)  core::ptr::drop_in_place
    615080 (1.4%)     8392 (0.5%)  std::thread::local::LocalKey<T>::try_with
    594296 (1.3%)     1780 (0.1%)  hashbrown::raw::RawTable<T>::rehash_in_place
    592071 (1.3%)     9691 (0.6%)  core::option::Option<T>::map
    528172 (1.2%)     5741 (0.4%)  core::alloc::layout::Layout::array
    466854 (1.0%)     8863 (0.6%)  core::ptr::swap_nonoverlapping_one
    412736 (0.9%)     1780 (0.1%)  hashbrown::raw::RawTable<T>::resize
    367776 (0.8%)     2554 (0.2%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    367507 (0.8%)      643 (0.0%)  rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
    355882 (0.8%)     6332 (0.4%)  alloc::alloc::box_free
    354556 (0.8%)    14213 (0.9%)  core::ptr::write
    354361 (0.8%)     3590 (0.2%)  core::iter::traits::iterator::Iterator::fold
    347761 (0.8%)     3873 (0.2%)  rustc_middle::ty::context::tls::set_tlv
    337534 (0.7%)     2377 (0.2%)  alloc::raw_vec::RawVec<T,A>::allocate_in
    331690 (0.7%)     3192 (0.2%)  hashbrown::raw::RawTable<T>::find
    328756 (0.7%)     3978 (0.3%)  rustc_middle::ty::context::tls::with_context_opt
    326903 (0.7%)      642 (0.0%)  rustc_query_system::query::plumbing::try_execute_query
```
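
To see how much a whole crate or module contributes, the rows of a report can be summed by name prefix. This is a minimal sketch that assumes the column layout shown above (`Lines (pct) Copies (pct) Name`). `llvm_lines_for_prefix` is a hypothetical helper, and it only looks at the first whitespace-separated word of each function name.

```shell
# Hypothetical helper: sum the "Lines" and "Copies" columns of a report
# for every function whose name starts with PREFIX.
# Usage: llvm_lines_for_prefix PREFIX REPORT
llvm_lines_for_prefix() {
    awk -v prefix="$1" '
        # Data rows look like: LINES (PCT) COPIES (PCT) NAME
        # The header and separator rows are skipped because $1 is not a number.
        $1 ~ /^[0-9]+$/ && index($5, prefix) == 1 {
            lines += $1
            copies += $3
        }
        END { printf "%d %d\n", lines, copies }
    ' "$2"
}
```

On a file containing only the rows shown above, `llvm_lines_for_prefix hashbrown:: llvm-lines.txt` prints `1338722 6752`, the combined lines and copies of the three `hashbrown` entries.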

Since this doesn't seem to work with incremental compilation or `x.py check`,
you will be compiling rustc _a lot_.
I recommend changing a few settings in `config.toml` to make it bearable:
```
[rust]
# A debug build takes _a third_ as long on my machine,
# but compiling more than stage0 rustc becomes unbearably slow.
optimize = false

# We can't use incremental anyway, so we disable it for a little speed boost.
incremental = false
# We won't be running it, so no point in compiling debug checks.
debug = false

# A single codegen unit would give less output, but is slower to compile,
# so use one codegen unit per CPU instead (0 means num_cpus).
codegen-units = 0
```

The llvm-lines output is affected by several options.
`optimize = false` increases it from 2.1GB to 3.5GB and `codegen-units = 0` to 4.1GB.

MIR optimizations have little impact. Compared to the default `RUSTFLAGS="-Z mir-opt-level=1"`,
level 0 adds 0.3GB and level 2 removes 0.2GB.
As of <!-- date: 2021-01 --> January 2021, inlining happens only in
LLVM, but this might change in the future.
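
Since the numbers depend on the build settings, it can help to save one report per configuration and compare their `(TOTAL)` rows. A minimal sketch, again assuming the report layout shown earlier; `llvm_lines_total` and `llvm_lines_delta` are hypothetical helper names:

```shell
# Hypothetical helper: print the total LLVM IR line count of a report,
# taken from the first column of its (TOTAL) row.
llvm_lines_total() {
    awk '$5 == "(TOTAL)" { print $1; exit }' "$1"
}

# Print the change in total lines between two reports (second minus first).
# A negative result means the second build passed less IR to LLVM.
llvm_lines_delta() {
    before=$(llvm_lines_total "$1")
    after=$(llvm_lines_total "$2")
    echo $((after - before))
}
```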