# Optimizations: the speed size tradeoff

Everyone wants their program to be super fast and super small but it's usually
not possible to have both characteristics. This section discusses the
different optimization levels that `rustc` provides and how they affect the
execution time and binary size of a program.

## No optimizations

This is the default. When you call `cargo build` you use the development (AKA
`dev`) profile. This profile is optimized for debugging, so it enables debug
information and does *not* enable any optimizations, i.e. it uses
`-C opt-level=0`.

At least for bare metal development, debuginfo is zero cost in the sense that it
won't occupy space in Flash / ROM, so we actually recommend that you enable
debuginfo in the release profile -- it is disabled by default. That will let you
use breakpoints when debugging release builds.

``` toml
[profile.release]
# symbols are nice and they don't increase the size on Flash
debug = true
```

Building without optimizations is great for debugging because stepping through
the code feels like you are executing the program statement by statement, plus
you can `print` stack variables and function arguments in GDB. When the code is
optimized, trying to print variables results in `$0 = <value optimized out>`
being printed.

The biggest downside of the `dev` profile is that the resulting binary will be
huge and slow. The size is usually more of a problem because unoptimized
binaries can occupy dozens of KiB of Flash, which your target device may not
have -- the result: your unoptimized binary doesn't fit in your device!

Can we have smaller, debugger friendly binaries? Yes, there's a trick.

### Optimizing dependencies

There's a Cargo feature named [`profile-overrides`] that lets you
override the optimization level of dependencies. You can use that feature to
optimize all dependencies for size while keeping the top crate unoptimized and
debugger friendly.

[`profile-overrides`]: https://doc.rust-lang.org/cargo/reference/profiles.html#overrides

Here's an example:

``` toml
# Cargo.toml
[package]
name = "app"
# ..

[profile.dev.package."*"] # +
opt-level = "z" # +
```

Without the override:

``` console
$ cargo size --bin app -- -A
app  :
section               size        addr
.vector_table         1024   0x8000000
.text                 9060   0x8000400
.rodata               1708   0x8002780
.data                    0  0x20000000
.bss                     4  0x20000000
```

With the override:

``` console
$ cargo size --bin app -- -A
app  :
section               size        addr
.vector_table         1024   0x8000000
.text                 3490   0x8000400
.rodata               1100   0x80011c0
.data                    0  0x20000000
.bss                     4  0x20000000
```
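
The difference between the two listings can be tallied with a few lines of
Rust. This is only a sketch for checking the arithmetic: the `Sections` type and
its field names are made up for this illustration, the numbers are copied from
the two `cargo size` outputs above, and it assumes that every listed section
except `.bss` takes up space in Flash.

``` rust
/// Section sizes in bytes, as reported by `cargo size -- -A`.
/// Hypothetical helper type; it is not part of any crate.
pub struct Sections {
    vector_table: u32,
    text: u32,
    rodata: u32,
    data: u32,
}

impl Sections {
    /// Bytes stored in Flash. `.bss` is left out on the assumption that it
    /// is zero-initialized RAM and has no image in Flash.
    pub const fn flash(&self) -> u32 {
        self.vector_table + self.text + self.rodata + self.data
    }
}

/// Sizes without the `profile.dev.package."*"` override.
pub const WITHOUT_OVERRIDE: Sections = Sections {
    vector_table: 1024,
    text: 9060,
    rodata: 1708,
    data: 0,
};

/// Sizes with the override.
pub const WITH_OVERRIDE: Sections = Sections {
    vector_table: 1024,
    text: 3490,
    rodata: 1100,
    data: 0,
};

/// Flash bytes saved by optimizing the dependencies for size.
pub fn flash_saved() -> u32 {
    WITHOUT_OVERRIDE.flash() - WITH_OVERRIDE.flash()
}
```

With these numbers `flash_saved()` comes out to 6178 bytes.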

That's a 6 KiB reduction in Flash usage without any loss in the debuggability of
the top crate. If you step into a dependency then you'll start seeing those
`<value optimized out>` messages again but it's usually the case that you want
to debug the top crate and not the dependencies. And if you *do* need to debug a
dependency then you can use the `profile-overrides` feature to exclude a
particular dependency from being optimized. See example below:

``` toml
# ..

# don't optimize the `cortex-m-rt` crate
[profile.dev.package.cortex-m-rt] # +
opt-level = 0 # +

# but do optimize all the other dependencies
[profile.dev.package."*"]
codegen-units = 1 # better optimizations
opt-level = "z"
```

Now the top crate and `cortex-m-rt` are debugger friendly!

## Optimize for speed

As of 2018-09-18 `rustc` supports three "optimize for speed" levels:
`opt-level = 1`, `2` and `3`. When you run `cargo build --release` you are using
the release profile, which defaults to `opt-level = 3`.

Both `opt-level = 2` and `3` optimize for speed at the expense of binary size,
but level `3` does more vectorization and inlining than level `2`. In
particular, you'll see that at `opt-level` equal to or greater than `2` LLVM will
unroll loops. Loop unrolling has a rather high cost in terms of Flash / ROM
(e.g. from 26 bytes to 194 for a loop that zeroes an array) but can also halve
the execution time given the right conditions (e.g. the number of iterations is
large enough).
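
For reference, a loop of the kind mentioned above might look like the following
sketch. The function name `zero` and the `u32` element type are assumptions, and
this is not necessarily the code behind the 26 and 194 byte figures. Comparing
its disassembly at different optimization levels (e.g. with `cargo objdump` from
`cargo-binutils`) should show whether the loop was unrolled.

``` rust
/// Sets every element of `buf` to zero with a plain loop.
///
/// `#[inline(never)]` keeps the function as a separate symbol so that it is
/// easy to find in the disassembly.
#[inline(never)]
pub fn zero(buf: &mut [u32]) {
    for word in buf.iter_mut() {
        *word = 0;
    }
}
```

Depending on the target and compiler version, LLVM may also replace such a loop
with a call to `memset`, so treat the exact sizes as illustrative.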

Currently there's no way to disable loop unrolling in `opt-level = 2` and `3` so
if you can't afford its cost you should optimize your program for size.

## Optimize for size

As of 2018-09-18 `rustc` supports two "optimize for size" levels:
`opt-level = "s"` and `"z"`. These names were inherited from clang / LLVM and
are not too descriptive, but `"z"` is meant to give the idea that it produces
smaller binaries than `"s"`.

If you want your release binaries to be optimized for size then change the
`profile.release.opt-level` setting in `Cargo.toml` as shown below.

``` toml
[profile.release]
# or "z"
opt-level = "s"
```

These two optimization levels greatly reduce LLVM's inline threshold, a metric
used to decide whether to inline a function or not. One of Rust's principles is
zero cost abstractions; these abstractions tend to use a lot of newtypes and
small functions to uphold invariants (e.g. functions that borrow an inner value
like `deref`, `as_ref`), so a low inline threshold can make LLVM miss
optimization opportunities (e.g. eliminating dead branches, inlining calls to
closures).
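
To make this concrete, here is a minimal sketch of such an abstraction.
`Percent`, `scale` and `half_scale` are hypothetical names invented for this
example. Each function is tiny, so the abstraction is only zero cost if the
calls to `new` and `deref` get inlined; with a low inline threshold LLVM may
emit real function calls instead.

``` rust
use core::ops::Deref;

/// Hypothetical newtype: a percentage that is always in `0..=100`.
pub struct Percent(u8);

impl Percent {
    /// The only constructor, so the range invariant always holds.
    pub fn new(value: u8) -> Option<Self> {
        if value <= 100 {
            Some(Percent(value))
        } else {
            None
        }
    }
}

impl Deref for Percent {
    type Target = u8;

    /// Borrows the inner value; free at runtime only when inlined.
    fn deref(&self) -> &u8 {
        &self.0
    }
}

/// Scales `max` by the given percentage.
pub fn scale(duty: &Percent, max: u32) -> u32 {
    u32::from(**duty) * max / 100
}

/// Once `Percent::new(50)` is inlined the compiler can see that the `None`
/// arm is never taken and remove that branch.
pub fn half_scale() -> u32 {
    match Percent::new(50) {
        Some(p) => scale(&p, 1000),
        None => 0,
    }
}
```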

When optimizing for size you may want to try increasing the inline threshold to
see if that has any effect on the binary size. The recommended way to change the
inline threshold is to append the `-C inline-threshold` flag to the other
rustflags in `.cargo/config`.

``` toml
# .cargo/config
# this assumes that you are using the cortex-m-quickstart template
[target.'cfg(all(target_arch = "arm", target_os = "none"))']
rustflags = [
  # ..
  "-C", "inline-threshold=123", # +
]
```

What value to use? [As of 1.29.0 these are the inline thresholds that the
different optimization levels use][inline-threshold]:

[inline-threshold]: https://github.com/rust-lang/rust/blob/1.29.0/src/librustc_codegen_llvm/back/write.rs#L2105-L2122

- `opt-level = 3` uses 275
- `opt-level = 2` uses 225
- `opt-level = "s"` uses 75
- `opt-level = "z"` uses 25

You should try `225` and `275` when optimizing for size.