# Optimizations: the speed size tradeoff

Everyone wants their program to be super fast and super small, but it's usually
not possible to have both. This section discusses the different optimization
levels that `rustc` provides and how they affect the execution time and binary
size of a program.

## No optimizations

This is the default. When you call `cargo build` you use the development (AKA
`dev`) profile. This profile is optimized for debugging, so it enables debug
information and does *not* enable any optimizations, i.e. it uses
`-C opt-level=0`.

At least for bare metal development, debuginfo is zero cost in the sense that it
won't occupy space in Flash / ROM, so we actually recommend that you enable
debuginfo in the release profile, where it is disabled by default. That will let
you use breakpoints when debugging release builds.

``` toml
[profile.release]
debug = true # symbols are nice and they don't increase the size on Flash
```

No optimizations is great for debugging because stepping through the code feels
like you are executing the program statement by statement, plus you can `print`
stack variables and function arguments in GDB. When the code is optimized, trying
to print variables results in `$0 = <value optimized out>` being printed.

The biggest downside of the `dev` profile is that the resulting binary will be
huge and slow. The size is usually more of a problem because unoptimized
binaries can occupy dozens of KiB of Flash, which your target device may not
have. The result: your unoptimized binary doesn't fit in your device!

Can we have smaller, debugger friendly binaries? Yes, there's a trick.

### Optimizing dependencies

There's a Cargo feature named [`profile-overrides`] that lets you
override the optimization level of dependencies. You can use that feature to
optimize all dependencies for size while keeping the top crate unoptimized and
debugger friendly.

[`profile-overrides`]: https://doc.rust-lang.org/cargo/reference/profiles.html#overrides

``` toml
[profile.dev.package."*"] # +
opt-level = "z" # + (size level; "s" is the other option)
```

Without the override:

``` text
$ cargo size --bin app -- -A
.vector_table 1024 0x8000000
.rodata       1708 0x8002780
```

With the override:

``` text
$ cargo size --bin app -- -A
.vector_table 1024 0x8000000
.rodata       1100 0x80011c0
```

That's a 6 KiB reduction in Flash usage without any loss in the debuggability of
the top crate. If you step into a dependency then you'll start seeing those
`<value optimized out>` messages again, but it's usually the case that you want
to debug the top crate and not the dependencies. And if you *do* need to debug a
dependency, you can use the `profile-overrides` feature to exclude a
particular dependency from being optimized. See the example below:

``` toml
# don't optimize the `cortex-m-rt` crate
[profile.dev.package.cortex-m-rt] # +
opt-level = 0 # +

# but do optimize all the other dependencies
[profile.dev.package."*"]
codegen-units = 1 # better optimizations
opt-level = "z"
```

Now the top crate and `cortex-m-rt` are debugger friendly!
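
As a rough sanity check on the 6 KiB figure, the two `.rodata` rows from the
`cargo size` listings can be compared directly. The sketch below is only an
illustration: the `Section` type and the helper names are hypothetical, and it
assumes that `.rodata` is the last section placed in Flash, ignoring `.data`
initializers and any sections not shown in the listings.

``` rust
/// One row of `cargo size -- -A` output: section size and start address.
pub struct Section {
    pub size: u32,
    pub addr: u32,
}

/// Start of Flash in the listings (the address of `.vector_table`).
pub const FLASH_START: u32 = 0x0800_0000;

/// `.rodata` row without the profile override.
pub const RODATA_BEFORE: Section = Section { size: 1708, addr: 0x0800_2780 };

/// `.rodata` row with the profile override.
pub const RODATA_AFTER: Section = Section { size: 1100, addr: 0x0800_11c0 };

/// Offset of the first byte past `section`, relative to the start of Flash.
pub fn end_offset(section: &Section) -> u32 {
    section.addr + section.size - FLASH_START
}

/// Bytes saved, assuming both sections mark the end of the Flash image.
pub fn flash_saved(before: &Section, after: &Section) -> u32 {
    end_offset(before) - end_offset(after)
}
```

Under those assumptions, `flash_saved(&RODATA_BEFORE, &RODATA_AFTER)` evaluates
to 6176 bytes, which is roughly 6 KiB.
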

## Optimize for speed

As of 2018-09-18 `rustc` supports three "optimize for speed" levels: `opt-level
= 1`, `2` and `3`. When you run `cargo build --release` you are using the release
profile, which defaults to `opt-level = 3`.

Both `opt-level = 2` and `3` optimize for speed at the expense of binary size,
but level `3` does more vectorization and inlining than level `2`. In
particular, you'll see that at `opt-level` equal to or greater than `2` LLVM will
unroll loops. Loop unrolling has a rather high cost in terms of Flash / ROM
(e.g. from 26 bytes to 194 for a "zero this array" loop) but can also halve the
execution time given the right conditions (e.g. the number of iterations is big
enough).

Currently there's no way to disable loop unrolling in `opt-level = 2` and `3`, so
if you can't afford its cost you should optimize your program for size.
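
For illustration, a "zero this array" loop of the kind mentioned above might
look like the following sketch. The function name `zero` and the `u32` element
type are assumptions made for this example; it is not the exact code behind the
byte counts quoted above.

``` rust
/// Sets every element of `buf` to zero using a plain loop.
///
/// At `opt-level = 2` or `3` the compiler is free to unroll this loop; the
/// code it actually emits depends on the target and the toolchain version.
pub fn zero(buf: &mut [u32]) {
    for word in buf.iter_mut() {
        *word = 0;
    }
}
```

Building the same function at different optimization levels and comparing the
section sizes is one way to see what the loop costs on a particular target.
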

## Optimize for size

As of 2018-09-18 `rustc` supports two "optimize for size" levels: `opt-level =
"s"` and `"z"`. These names were inherited from clang / LLVM and are not too
descriptive, but `"z"` is meant to give the idea that it produces smaller
binaries than `"s"`.

If you want your release binaries to be optimized for size then change the
`profile.release.opt-level` setting in `Cargo.toml` to `"s"` or `"z"`.

These two optimization levels greatly reduce LLVM's inline threshold, a metric
used to decide whether to inline a function or not. One of Rust's principles is
zero cost abstractions; these abstractions tend to use a lot of newtypes and
small functions to uphold invariants (e.g. functions that borrow an inner value
like `deref`, `as_ref`), so a low inline threshold can make LLVM miss
optimization opportunities (e.g. eliminating dead branches or inlining calls).
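
To make the pattern concrete, here is a minimal sketch of a newtype with the
kind of small borrowing functions described above. The `Percent` type and the
`scale` function are hypothetical names invented for this example. Each
accessor is a single field borrow, so it only becomes free of cost once it has
been inlined into its caller.

``` rust
use core::ops::Deref;

/// Hypothetical newtype: a percentage that is always in `0..=100`.
pub struct Percent(u8);

impl Percent {
    /// Returns `None` when `value` would break the invariant.
    pub fn new(value: u8) -> Option<Self> {
        if value <= 100 {
            Some(Percent(value))
        } else {
            None
        }
    }
}

impl Deref for Percent {
    type Target = u8;

    /// Borrows the inner value; trivially small, so a good inlining candidate.
    fn deref(&self) -> &u8 {
        &self.0
    }
}

impl AsRef<u8> for Percent {
    /// Same borrow, exposed through `AsRef`.
    fn as_ref(&self) -> &u8 {
        &self.0
    }
}

/// Scales `full` by `pct`, going through `deref` to read the inner value.
pub fn scale(full: u32, pct: &Percent) -> u32 {
    full * u32::from(**pct) / 100
}
```

If calls like `deref` are left as real function calls, the caller cannot see
through them, which is the kind of missed opportunity the paragraph above
describes.
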

When optimizing for size you may want to try increasing the inline threshold to
see if that has any effect on the binary size. The recommended way to change the
inline threshold is to append the `-C inline-threshold` flag to the other
rustflags in `.cargo/config.toml`.

``` toml
# this assumes that you are using the cortex-m-quickstart template
[target.'cfg(all(target_arch = "arm", target_os = "none"))']
rustflags = [
  # ..
  "-C", "inline-threshold=123", # +
]
```

What value to use? [As of 1.29.0 these are the inline thresholds that the
different optimization levels use][inline-threshold]:

[inline-threshold]: https://github.com/rust-lang/rust/blob/1.29.0/src/librustc_codegen_llvm/back/write.rs#L2105-L2122

- `opt-level = 3` uses 275
- `opt-level = 2` uses 225
- `opt-level = "s"` uses 75
- `opt-level = "z"` uses 25

You should try `225` and `275` when optimizing for size.