# Optimizations: the speed size tradeoff

Everyone wants their program to be super fast and super small but it's usually
not possible to have both characteristics. This section discusses the
different optimization levels that `rustc` provides and how they affect the
execution time and binary size of a program.

## No optimizations

This is the default. When you call `cargo build` you use the development (AKA
`dev`) profile. This profile is optimized for debugging, so it enables debug
information and does *not* enable any optimizations, i.e. it uses
`-C opt-level=0`.

At least for bare metal development, debuginfo is zero cost in the sense that it
won't occupy space in Flash / ROM, so we actually recommend that you enable
debuginfo in the release profile -- it is disabled by default. That will let you
use breakpoints when debugging release builds.

``` toml
[profile.release]
# symbols are nice and they don't increase the size on Flash
debug = true
```

Building without optimizations is great for debugging because stepping through
the code feels like you are executing the program statement by statement, plus
you can `print` stack variables and function arguments in GDB. When the code is
optimized, trying to print variables results in `$0 = <value optimized out>`
being printed.

The biggest downside of the `dev` profile is that the resulting binary will be
huge and slow. The size is usually more of a problem because unoptimized
binaries can occupy dozens of KiB of Flash, which your target device may not
have -- the result: your unoptimized binary doesn't fit in your device!

Can we have smaller, debugger friendly binaries? Yes, there's a trick.

### Optimizing dependencies

There's a Cargo feature named [`profile-overrides`] that lets you
override the optimization level of dependencies. You can use that feature to
optimize all dependencies for size while keeping the top crate unoptimized and
debugger friendly.

[`profile-overrides`]: https://doc.rust-lang.org/cargo/reference/profiles.html#overrides

Here's an example:

``` toml
# Cargo.toml
[package]
name = "app"
# ..

[profile.dev.package."*"] # +
opt-level = "z" # +
```

Without the override:

``` console
$ cargo size --bin app -- -A
app  :
section               size        addr
.vector_table         1024   0x8000000
.text                 9060   0x8000400
.rodata               1708   0x8002780
.data                    0  0x20000000
.bss                     4  0x20000000
```

With the override:

``` console
$ cargo size --bin app -- -A
app  :
section               size        addr
.vector_table         1024   0x8000000
.text                 3490   0x8000400
.rodata               1100   0x80011c0
.data                    0  0x20000000
.bss                     4  0x20000000
```
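The effect of the override can be quantified by summing the `.text` and `.rodata` sizes from the two listings; `.vector_table` is identical in both, and `.data` and `.bss` are unchanged. Below is a minimal sketch of that arithmetic. The `flash_bytes` helper is a hypothetical name introduced only for this illustration.

``` rust
/// Hypothetical helper: bytes of Flash taken by code plus read-only data.
fn flash_bytes(text: u32, rodata: u32) -> u32 {
    text + rodata
}

fn main() {
    // Sizes copied from the two `cargo size` listings above.
    let without_override = flash_bytes(9060, 1708);
    let with_override = flash_bytes(3490, 1100);
    let saved = without_override - with_override;

    // 10768 - 4590 = 6178 bytes, i.e. roughly 6 KiB
    println!("saved {} bytes ({:.2} KiB)", saved, f64::from(saved) / 1024.0);
}
```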

That's a 6 KiB reduction in Flash usage without any loss in the debuggability of
the top crate. If you step into a dependency then you'll start seeing those
`<value optimized out>` messages again but it's usually the case that you want
to debug the top crate and not the dependencies. And if you *do* need to debug a
dependency then you can use the `profile-overrides` feature to exclude a
particular dependency from being optimized. See example below:

``` toml
# ..

# don't optimize the `cortex-m-rt` crate
[profile.dev.package.cortex-m-rt] # +
opt-level = 0 # +

# but do optimize all the other dependencies
[profile.dev.package."*"]
codegen-units = 1 # better optimizations
opt-level = "z"
```

Now the top crate and `cortex-m-rt` are debugger friendly!

## Optimize for speed

As of 2018-09-18 `rustc` supports three "optimize for speed" levels:
`opt-level = 1`, `2` and `3`. When you run `cargo build --release` you are using
the release profile, which defaults to `opt-level = 3`.

Both `opt-level = 2` and `3` optimize for speed at the expense of binary size,
but level `3` does more vectorization and inlining than level `2`. In
particular, you'll see that at `opt-level` equal to or greater than `2` LLVM will
unroll loops. Loop unrolling has a rather high cost in terms of Flash / ROM
(e.g. from 26 bytes to 194 for a loop that zeroes an array) but can also halve
the execution time given the right conditions (e.g. the number of iterations is
big enough).

Currently there's no way to disable loop unrolling in `opt-level = 2` and `3`, so
if you can't afford its cost you should optimize your program for size.
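For reference, here is a minimal sketch of the kind of loop that zeroes an array. The text does not show the exact code behind the 26 and 194 byte figures, so the `zero` function, the `u32` element type and the buffer length of 64 are all assumptions made for illustration.

``` rust
/// Hypothetical example: writes zero to every element of `buf`.
/// At `opt-level = 2` or `3` LLVM may unroll a loop like this one.
pub fn zero(buf: &mut [u32]) {
    for word in buf.iter_mut() {
        *word = 0;
    }
}

fn main() {
    // Start from a non-zero pattern so the effect of `zero` is visible.
    let mut buf = [0xdead_beef_u32; 64];
    zero(&mut buf);
    assert!(buf.iter().all(|&w| w == 0));
}
```

Depending on the target and compiler version, LLVM may also replace such a loop with a call to `memset`, so the generated code is best checked by inspecting the disassembly of your own build.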

## Optimize for size

As of 2018-09-18 `rustc` supports two "optimize for size" levels:
`opt-level = "s"` and `"z"`. These names were inherited from clang / LLVM and are
not too descriptive, but `"z"` is meant to give the idea that it produces smaller
binaries than `"s"`.

If you want your release binaries to be optimized for size then change the
`profile.release.opt-level` setting in `Cargo.toml` as shown below.

``` toml
[profile.release]
# or "z"
opt-level = "s"
```

These two optimization levels greatly reduce LLVM's inline threshold, a metric
used to decide whether to inline a function or not. One of Rust's principles is
zero cost abstractions; these abstractions tend to use a lot of newtypes and
small functions to hold invariants (e.g. functions that borrow an inner value
like `deref`, `as_ref`), so a low inline threshold can make LLVM miss
optimization opportunities (e.g. eliminate dead branches, inline calls to
closures).

When optimizing for size you may want to try increasing the inline threshold to
see if that has any effect on the binary size. The recommended way to change the
inline threshold is to append the `-C inline-threshold` flag to the other
rustflags in `.cargo/config`.

``` toml
# .cargo/config
# this assumes that you are using the cortex-m-quickstart template
[target.'cfg(all(target_arch = "arm", target_os = "none"))']
rustflags = [
  # ..
  "-C", "inline-threshold=123", # +
]
```

What value to use? [As of 1.29.0 these are the inline thresholds that the
different optimization levels use][inline-threshold]:

[inline-threshold]: https://github.com/rust-lang/rust/blob/1.29.0/src/librustc_codegen_llvm/back/write.rs#L2105-L2122

- `opt-level = 3` uses 275
- `opt-level = 2` uses 225
- `opt-level = "s"` uses 75
- `opt-level = "z"` uses 25

You should try `225` and `275` when optimizing for size.
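When judging whether a higher threshold helps, it can be useful to keep in mind the kind of code the threshold affects. The sketch below shows a newtype that holds an invariant behind small accessor functions, as described earlier in this section. The `Even` type and the `half` function are hypothetical names made up for this illustration; they do not come from any crate.

``` rust
use core::ops::Deref;

/// Hypothetical newtype whose invariant is that the inner value is even.
pub struct Even(u32);

impl Even {
    /// Returns `None` if `value` would break the invariant.
    pub fn new(value: u32) -> Option<Even> {
        if value % 2 == 0 {
            Some(Even(value))
        } else {
            None
        }
    }
}

// Tiny borrowing functions like these are cheap only when they get inlined.
impl Deref for Even {
    type Target = u32;

    fn deref(&self) -> &u32 {
        &self.0
    }
}

impl AsRef<u32> for Even {
    fn as_ref(&self) -> &u32 {
        &self.0
    }
}

/// Goes through `deref` to read the inner value.
pub fn half(n: &Even) -> u32 {
    **n / 2
}

fn main() {
    let ten = Even::new(10).expect("10 is even");
    assert_eq!(half(&ten), 5);
    assert_eq!(*ten.as_ref(), 10);
    assert!(Even::new(3).is_none());
}
```

If calls such as `deref` are not inlined, each one remains a real function call, which can cost both size and speed. That is why trying a larger threshold may be worthwhile even in a size-optimized build.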