# Optimizations: the speed size tradeoff

Everyone wants their program to be super fast and super small but it's usually
not possible to have both characteristics. This section discusses the
different optimization levels that `rustc` provides and how they affect the
execution time and binary size of a program.

## No optimizations

This is the default. When you call `cargo build` you use the development (AKA
`dev`) profile. This profile is optimized for debugging so it enables debug
information and does *not* enable any optimizations, i.e. it uses
`-C opt-level=0`.

At least for bare metal development, debuginfo is zero cost in the sense that it
won't occupy space in Flash / ROM so we actually recommend that you enable
debuginfo in the release profile -- it is disabled by default. That will let you
use breakpoints when debugging release builds.

``` toml
[profile.release]
# symbols are nice and they don't increase the size on Flash
debug = true
```

Building without optimizations is great for debugging because stepping through
the code feels like you are executing the program statement by statement, plus
you can `print` stack variables and function arguments in GDB. When the code is
optimized, trying to print variables results in `$0 = <value optimized out>`
being printed.

The biggest downside of the `dev` profile is that the resulting binary will be
huge and slow. The size is usually more of a problem because unoptimized
binaries can occupy dozens of KiB of Flash, which your target device may not
have -- the result: your unoptimized binary doesn't fit in your device!

Can we have smaller, debugger friendly binaries? Yes, there's a trick.

### Optimizing dependencies

There's a Cargo feature named [`profile-overrides`] that lets you
override the optimization level of dependencies. You can use that feature to
optimize all dependencies for size while keeping the top crate unoptimized and
debugger friendly.

[`profile-overrides`]: https://doc.rust-lang.org/cargo/reference/profiles.html#overrides

Here's an example:

``` toml
# Cargo.toml
[package]
name = "app"
# ..

[profile.dev.package."*"] # +
opt-level = "z" # +
```

Without the override:

``` console
$ cargo size --bin app -- -A
app  :
section               size        addr
.vector_table         1024   0x8000000
.text                 9060   0x8000400
.rodata               1708   0x8002780
.data                    0  0x20000000
.bss                     4  0x20000000
```

With the override:

``` console
$ cargo size --bin app -- -A
app  :
section               size        addr
.vector_table         1024   0x8000000
.text                 3490   0x8000400
.rodata               1100   0x80011c0
.data                    0  0x20000000
.bss                     4  0x20000000
```

That's a 6 KiB reduction in Flash usage without any loss in the debuggability of
the top crate. If you step into a dependency then you'll start seeing those
`<value optimized out>` messages again but it's usually the case that you want
to debug the top crate and not the dependencies. And if you *do* need to debug a
dependency then you can use the `profile-overrides` feature to exclude a
particular dependency from being optimized. See the example below:

``` toml
# ..

# don't optimize the `cortex-m-rt` crate
[profile.dev.package.cortex-m-rt] # +
opt-level = 0 # +

# but do optimize all the other dependencies
[profile.dev.package."*"]
codegen-units = 1 # better optimizations
opt-level = "z"
```

Now the top crate and `cortex-m-rt` are debugger friendly!

## Optimize for speed

As of 2018-09-18 `rustc` supports three "optimize for speed" levels:
`opt-level = 1`, `2` and `3`. When you run `cargo build --release` you are
using the release profile which defaults to `opt-level = 3`.

Both `opt-level = 2` and `3` optimize for speed at the expense of binary size,
but level `3` does more vectorization and inlining than level `2`. In
particular, you'll see that at `opt-level` equal to or greater than `2` LLVM will
unroll loops. Loop unrolling has a rather high cost in terms of Flash / ROM
(e.g. from 26 bytes to 194 for a loop that zeroes an array) but can also halve
the execution time given the right conditions (e.g. number of iterations is big
enough).
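
To make the "zero this array" case concrete, here is a minimal sketch of the kind of loop being discussed. The function name `zero` and the `u32` element type are assumptions made for this illustration; the byte counts quoted above were not necessarily measured on this exact code. Depending on the target and compiler version, LLVM may also replace such a loop with a call to `memset` instead of unrolling it.

```rust
/// Sets every element of `slice` to zero.
///
/// Hypothetical example: at the size-oriented levels this tends to stay a
/// compact loop, while at `opt-level = 2` / `3` LLVM may unroll it, trading
/// Flash for speed.
pub fn zero(slice: &mut [u32]) {
    for elem in slice.iter_mut() {
        *elem = 0;
    }
}
```

Building a function like this at different `opt-level`s and comparing the `.text` size reported by `cargo size` is one way to see the tradeoff on your own target.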

Currently there's no way to disable loop unrolling in `opt-level = 2` and `3` so
if you can't afford its cost you should optimize your program for size.

## Optimize for size

As of 2018-09-18 `rustc` supports two "optimize for size" levels:
`opt-level = "s"` and `"z"`. These names were inherited from clang / LLVM and
are not too descriptive but `"z"` is meant to give the idea that it produces
smaller binaries than `"s"`.

If you want your release binaries to be optimized for size then change the
`profile.release.opt-level` setting in `Cargo.toml` as shown below.

``` toml
[profile.release]
# or "z"
opt-level = "s"
```

These two optimization levels greatly reduce LLVM's inline threshold, a metric
used to decide whether to inline a function or not. One of Rust's principles is
zero cost abstractions; these abstractions tend to use a lot of newtypes and
small functions to hold invariants (e.g. functions that borrow an inner value
like `deref`, `as_ref`) so a low inline threshold can make LLVM miss
optimization opportunities (e.g. eliminate dead branches, inline calls to
closures).
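
As a rough illustration of the pattern described above, the sketch below shows a newtype that holds an invariant and exposes its inner value through tiny accessor functions. The names `Percent` and `map_percent` are hypothetical and exist only for this example. Each of these functions is "zero cost" only if the compiler inlines it; when it is not inlined, every access becomes a real function call.

```rust
use core::ops::Deref;

/// Hypothetical newtype: a percentage that is always in `0..=100`.
pub struct Percent(u8);

impl Percent {
    /// Returns `None` when `value` would break the `0..=100` invariant.
    pub fn new(value: u8) -> Option<Self> {
        if value <= 100 {
            Some(Percent(value))
        } else {
            None
        }
    }
}

impl Deref for Percent {
    type Target = u8;

    // A one-line accessor: free when inlined, a real call otherwise.
    fn deref(&self) -> &u8 {
        &self.0
    }
}

impl AsRef<u8> for Percent {
    fn as_ref(&self) -> &u8 {
        &self.0
    }
}

/// Applies `f` to the inner value; the closure call is another inlining
/// candidate.
pub fn map_percent<F: Fn(u8) -> u32>(p: &Percent, f: F) -> u32 {
    f(**p)
}
```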

When optimizing for size you may want to try increasing the inline threshold to
see if that has any effect on the binary size. The recommended way to change the
inline threshold is to append the `-C inline-threshold` flag to the other
rustflags in `.cargo/config`.

``` toml
# .cargo/config
# this assumes that you are using the cortex-m-quickstart template
[target.'cfg(all(target_arch = "arm", target_os = "none"))']
rustflags = [
  # ..
  "-C", "inline-threshold=123", # +
]
```

What value to use? [As of 1.29.0 these are the inline thresholds that the
different optimization levels use][inline-threshold]:

[inline-threshold]: https://github.com/rust-lang/rust/blob/1.29.0/src/librustc_codegen_llvm/back/write.rs#L2105-L2122

- `opt-level = 3` uses 275
- `opt-level = 2` uses 225
- `opt-level = "s"` uses 75
- `opt-level = "z"` uses 25

You should try `225` and `275` when optimizing for size.