# Profiling the compiler

This section describes how to profile the compiler and find out where it spends its time.

Depending on what you're trying to measure, there are several different approaches:

- If you want to see if a PR improves or regresses compiler performance:
  - The [rustc-perf](https://github.com/rust-lang/rustc-perf) project makes this easy and can be triggered to run on a PR via the `@rustc-perf` bot.

- If you want a medium-to-high level overview of where `rustc` is spending its time:
  - The `-Z self-profile` flag and [measureme](https://github.com/rust-lang/measureme) tools offer a query-based approach to profiling.
    See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information.

- If you want function-level performance data, or simply more detail than the above approaches provide:
  - Consider using a native code profiler such as [perf](profiling/with_perf.html)
  - or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision,
    full-featured graphical interface.

- If you want a nice visual representation of the compile times of your crate graph,
  you can use [cargo's `-Z timings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings),
  e.g. `cargo -Z timings build`.
  You can use this flag on the compiler itself with `CARGOFLAGS="-Z timings" ./x.py build`.

- If you want to profile memory usage, you can use various tools depending on what operating system
  you are using.
  - For Windows, read our [WPA guide](profiling/wpa_profiling.html).

## Optimizing rustc's bootstrap times with `cargo-llvm-lines`

Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
number of lines of LLVM IR across all instantiations of a generic function.
Since most of the time compiling rustc is spent in LLVM, the idea is that by
reducing the amount of code passed to LLVM, compiling rustc gets faster.

To use `cargo-llvm-lines` with rustc's somewhat custom build process, you can use
`-C save-temps` to obtain the required LLVM IR. The option preserves temporary work products
created during compilation. Among those is the LLVM IR that serves as input to the
optimization pipeline, which is ideal for our purposes. It is stored in LLVM bitcode
format, in files with the `*.no-opt.bc` extension.

Example usage:
```
cargo install cargo-llvm-lines
# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P

# Do a clean before every run, to not mix in the results from previous runs.
./x.py clean
env RUSTFLAGS=-Csave-temps ./x.py build --stage 0 compiler/rustc

# Single crate, e.g., rustc_middle. (Relies on the glob support of your shell.)
# Convert unoptimized LLVM bitcode into the human-readable LLVM assembly accepted by cargo-llvm-lines.
for f in build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_middle-*.no-opt.bc; do
  ./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-dis "$f"
done
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_middle-*.ll > llvm-lines-middle.txt

# Specify all crates of the compiler.
for f in build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/*.no-opt.bc; do
  ./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-dis "$f"
done
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/*.ll > llvm-lines.txt
```
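
The paths above hard-code the `x86_64-unknown-linux-gnu` host triple. As a minimal sketch, assuming the build directory layout shown above, the two helpers below build the same paths for an arbitrary triple. `deps_dir` and `bitcode_glob` are hypothetical names introduced here for illustration; they are not part of `x.py` or `cargo-llvm-lines`.

```shell
# Hypothetical helper: print the stage0 deps directory for a host triple,
# following the layout used in the example above.
deps_dir() {
  local triple="$1"
  printf 'build/%s/stage0-rustc/%s/release/deps\n' "$triple" "$triple"
}

# Hypothetical helper: print the glob pattern matching the unoptimized
# bitcode of one crate, or of all crates if no crate name is given.
bitcode_glob() {
  local triple="$1" crate="${2:-}"
  if [ -n "$crate" ]; then
    printf '%s/%s-*.no-opt.bc\n' "$(deps_dir "$triple")" "$crate"
  else
    printf '%s/*.no-opt.bc\n' "$(deps_dir "$triple")"
  fi
}
```

You could obtain the triple of your host with `rustc -vV | sed -n 's/^host: //p'` and then loop with `for f in $(bitcode_glob "$triple" rustc_middle); do ...; done`, leaving the command substitution unquoted so that the shell expands the glob.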

Example output for the compiler:
```
  Lines            Copies          Function name
  -----            ------          -------------
  45207720 (100%)  1583774 (100%)  (TOTAL)
   2102350 (4.7%)   146650 (9.3%)  core::ptr::drop_in_place
    615080 (1.4%)     8392 (0.5%)  std::thread::local::LocalKey<T>::try_with
    594296 (1.3%)     1780 (0.1%)  hashbrown::raw::RawTable<T>::rehash_in_place
    592071 (1.3%)     9691 (0.6%)  core::option::Option<T>::map
    528172 (1.2%)     5741 (0.4%)  core::alloc::layout::Layout::array
    466854 (1.0%)     8863 (0.6%)  core::ptr::swap_nonoverlapping_one
    412736 (0.9%)     1780 (0.1%)  hashbrown::raw::RawTable<T>::resize
    367776 (0.8%)     2554 (0.2%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    367507 (0.8%)      643 (0.0%)  rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
    355882 (0.8%)     6332 (0.4%)  alloc::alloc::box_free
    354556 (0.8%)    14213 (0.9%)  core::ptr::write
    354361 (0.8%)     3590 (0.2%)  core::iter::traits::iterator::Iterator::fold
    347761 (0.8%)     3873 (0.2%)  rustc_middle::ty::context::tls::set_tlv
    337534 (0.7%)     2377 (0.2%)  alloc::raw_vec::RawVec<T,A>::allocate_in
    331690 (0.7%)     3192 (0.2%)  hashbrown::raw::RawTable<T>::find
    328756 (0.7%)     3978 (0.3%)  rustc_middle::ty::context::tls::with_context_opt
    326903 (0.7%)      642 (0.0%)  rustc_query_system::query::plumbing::try_execute_query
```
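
The full report is long, so it can help to narrow it down. The sketch below assumes the column layout shown above (line count, line percentage, copy count, copy percentage, function name) and keeps only the functions whose share of total lines reaches a threshold. `top_functions` is a hypothetical helper name, not a `cargo-llvm-lines` feature.

```shell
# Hypothetical helper: print report rows whose "Lines" percentage is at
# least $1. Assumes the five-column layout of the example output above.
top_functions() {
  local min_percent="$1" file="$2"
  awk -v min="$min_percent" '
    $1 !~ /^[0-9]+$/ { next }      # skip the header and separator rows
    $5 == "(TOTAL)"  { next }      # skip the summary row
    {
      pct = $2
      gsub(/[()%]/, "", pct)       # "(4.7%)" becomes "4.7"
      if (pct + 0 >= min + 0) print
    }
  ' "$file"
}
```

For example, `top_functions 1.0 llvm-lines.txt` would list only the functions that account for at least 1% of all LLVM IR lines.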

Since this doesn't seem to work with incremental compilation or `x.py check`,
you will be compiling rustc _a lot_.
I recommend changing a few settings in `config.toml` to make it bearable:
```
[rust]
# A debug build takes _a third_ as long on my machine,
# but compiling more than stage0 rustc becomes unbearably slow.
optimize = false

# We can't use incremental anyway, so we disable it for a little speed boost.
incremental = false
# We won't be running it, so no point in compiling debug checks.
debug = false

# Using a single codegen unit gives less output, but is slower to compile.
codegen-units = 0  # num_cpus
```

The llvm-lines output is affected by several options.
`optimize = false` increases it from 2.1GB to 3.5GB and `codegen-units = 0` to 4.1GB.
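
When you compare configurations or measure the effect of a change, one simple approach is to save a report per run and compare the `(TOTAL)` rows. This is a sketch that assumes the report layout shown earlier; `total_lines` and `lines_delta` are hypothetical helper names, and the file names in the usage note are placeholders.

```shell
# Hypothetical helper: extract the total LLVM IR line count from a saved report.
total_lines() {
  awk '$5 == "(TOTAL)" { print $1 }' "$1"
}

# Hypothetical helper: print the change in total lines between two reports
# (second report minus first report).
lines_delta() {
  local before after
  before=$(total_lines "$1")
  after=$(total_lines "$2")
  echo $(( after - before ))
}
```

Running `lines_delta llvm-lines-before.txt llvm-lines-after.txt` prints a negative number if the second run produced fewer lines of LLVM IR.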

MIR optimizations have little impact. Compared to the default `RUSTFLAGS="-Z
mir-opt-level=1"`, level 0 adds 0.3GB and level 2 removes 0.2GB.
As of <!-- date: 2021-01 --> January 2021, inlining happens only in
LLVM, but this might change in the future.
	111	As of <!-- date: 2021-01 --> January 2021, inlining currently only happens in
	112	LLVM but this might change in the future.