]>
Commit | Line | Data |
---|---|---|
a1dfa0c6 XL |
1 | # The MIR (Mid-level IR) |
2 | ||
6a06907d XL |
3 | <!-- toc --> |
4 | ||
a1dfa0c6 XL |
5 | MIR is Rust's _Mid-level Intermediate Representation_. It is |
6 | constructed from [HIR](../hir.html). MIR was introduced in | |
7 | [RFC 1211]. It is a radically simplified form of Rust that is used for | |
8 | certain flow-sensitive safety checks – notably the borrow checker! – | |
9 | and also for optimization and code generation. | |
10 | ||
11 | If you'd like a very high-level introduction to MIR, as well as some | |
12 | of the compiler concepts that it relies on (such as control-flow | |
13 | graphs and desugaring), you may enjoy the | |
14 | [rust-lang blog post that introduced MIR][blog]. | |
15 | ||
16 | [blog]: https://blog.rust-lang.org/2016/04/19/MIR.html | |
17 | ||
18 | ## Introduction to MIR | |
19 | ||
6a06907d | 20 | MIR is defined in the [`compiler/rustc_middle/src/mir/`][mir] module, but much of the code |
3c0e092e XL |
21 | that manipulates it is found in [`compiler/rustc_mir_build`][mirmanip_build], |
22 | [`compiler/rustc_mir_transform`][mirmanip_transform], and | |
23 | [`compiler/rustc_mir_dataflow`][mirmanip_dataflow]. | |
a1dfa0c6 | 24 | |
60c5eb7d | 25 | [RFC 1211]: https://rust-lang.github.io/rfcs/1211-mir.html |
a1dfa0c6 XL |
26 | |
27 | Some of the key characteristics of MIR are: | |
28 | ||
29 | - It is based on a [control-flow graph][cfg]. | |
30 | - It does not have nested expressions. | |
31 | - All types in MIR are fully explicit. | |
32 | ||
33 | [cfg]: ../appendix/background.html#cfg | |
34 | ||
35 | ## Key MIR vocabulary | |
36 | ||
37 | This section introduces the key concepts of MIR, summarized here: | |
38 | ||
39 | - **Basic blocks**: units of the control-flow graph, consisting of: | |
40 | - **statements:** actions with one successor | |
41 | - **terminators:** actions with potentially multiple successors; always at | |
42 | the end of a block | |
43 | - (if you're not familiar with the term *basic block*, see the [background | |
44 | chapter][cfg]) | |
45 | - **Locals:** Memory locations allocated on the stack (conceptually, at | |
46 | least), such as function arguments, local variables, and | |
47 | temporaries. These are identified by an index, written with a | |
48 | leading underscore, like `_1`. There is also a special "local" | |
49 | (`_0`) allocated to store the return value. | |
50 | - **Places:** expressions that identify a location in memory, like `_1` or | |
51 | `_1.f`. | |
52 | - **Rvalues:** expressions that produce a value. The "R" stands for | |
53 | the fact that these are the "right-hand side" of an assignment. | |
54 | - **Operands:** the arguments to an rvalue, which can either be a | |
55 | constant (like `22`) or a place (like `_1`). | |
56 | ||
3c0e092e | 57 | You can get a feeling for how MIR is constructed by translating simple |
a1dfa0c6 XL |
58 | programs into MIR and reading the pretty printed output. In fact, the |
59 | playground makes this easy, since it supplies a MIR button that will | |
60 | show you the MIR for your program. Try putting this program into play | |
61 | (or [clicking on this link][sample-play]), and then clicking the "MIR" | |
62 | button on the top: | |
63 | ||
64 | [sample-play]: https://play.rust-lang.org/?gist=30074856e62e74e91f06abd19bd72ece&version=stable | |
65 | ||
66 | ```rust | |
67 | fn main() { | |
68 | let mut vec = Vec::new(); | |
69 | vec.push(1); | |
70 | vec.push(2); | |
71 | } | |
72 | ``` | |
73 | ||
74 | You should see something like: | |
75 | ||
76 | ```mir | |
77 | // WARNING: This output format is intended for human consumers only | |
78 | // and is subject to change without notice. Knock yourself out. | |
79 | fn main() -> () { | |
80 | ... | |
81 | } | |
82 | ``` | |
83 | ||
84 | This is the MIR format for the `main` function. | |
85 | ||
86 | **Variable declarations.** If we drill in a bit, we'll see it begins | |
87 | with a bunch of variable declarations. They look like this: | |
88 | ||
89 | ```mir | |
90 | let mut _0: (); // return place | |
74b04a01 | 91 | let mut _1: std::vec::Vec<i32>; // in scope 0 at src/main.rs:2:9: 2:16 |
a1dfa0c6 XL |
92 | let mut _2: (); |
93 | let mut _3: &mut std::vec::Vec<i32>; | |
94 | let mut _4: (); | |
95 | let mut _5: &mut std::vec::Vec<i32>; | |
96 | ``` | |
97 | ||
98 | You can see that variables in MIR don't have names, they have indices, | |
99 | like `_0` or `_1`. We also intermingle the user's variables (e.g., | |
74b04a01 XL |
100 | `_1`) with temporary values (e.g., `_2` or `_3`). You can tell apart |
101 | user-defined variables because they have debuginfo associated to them (see below). | |
102 | ||
103 | **User variable debuginfo.** Below the variable declarations, we find the only | |
104 | hint that `_1` represents a user variable: | |
105 | ```mir | |
106 | scope 1 { | |
107 | debug vec => _1; // in scope 1 at src/main.rs:2:9: 2:16 | |
108 | } | |
109 | ``` | |
110 | Each `debug <Name> => <Place>;` annotation describes a named user variable, | |
111 | and where (i.e. the place) a debugger can find the data of that variable. | |
112 | Here the mapping is trivial, but optimizations may complicate the place, | |
113 | or lead to multiple user variables sharing the same place. | |
114 | Additionally, closure captures are described using the same system, and so | |
115 | they're complicated even without optimizations, e.g.: `debug x => (*((*_1).0: &T));`. | |
116 | ||
117 | The "scope" blocks (e.g., `scope 1 { .. }`) describe the lexical structure of | |
118 | the source program (which names were in scope when), so any part of the program | |
119 | annotated with `// in scope 0` would be missing `vec`, if you were stepping | |
120 | through the code in a debugger, for example. | |
a1dfa0c6 XL |
121 | |
122 | **Basic blocks.** Reading further, we see our first **basic block** (naturally | |
123 | it may look slightly different when you view it, and I am ignoring some of the | |
124 | comments): | |
125 | ||
126 | ```mir | |
127 | bb0: { | |
128 | StorageLive(_1); | |
129 | _1 = const <std::vec::Vec<T>>::new() -> bb2; | |
130 | } | |
131 | ``` | |
132 | ||
133 | A basic block is defined by a series of **statements** and a final | |
134 | **terminator**. In this case, there is one statement: | |
135 | ||
136 | ```mir | |
137 | StorageLive(_1); | |
138 | ``` | |
139 | ||
140 | This statement indicates that the variable `_1` is "live", meaning | |
141 | that it may be used later – this will persist until we encounter a | |
142 | `StorageDead(_1)` statement, which indicates that the variable `_1` is | |
143 | done being used. These "storage statements" are used by LLVM to | |
144 | allocate stack space. | |
145 | ||
146 | The **terminator** of the block `bb0` is the call to `Vec::new`: | |
147 | ||
148 | ```mir | |
149 | _1 = const <std::vec::Vec<T>>::new() -> bb2; | |
150 | ``` | |
151 | ||
152 | Terminators are different from statements because they can have more | |
153 | than one successor – that is, control may flow to different | |
154 | places. Function calls like the call to `Vec::new` are always | |
155 | terminators because of the possibility of unwinding, although in the | |
156 | case of `Vec::new` we are able to see that indeed unwinding is not | |
416331ca | 157 | possible, and hence we list only one successor block, `bb2`. |
a1dfa0c6 XL |
158 | |
159 | If we look ahead to `bb2`, we will see it looks like this: | |
160 | ||
161 | ```mir | |
162 | bb2: { | |
163 | StorageLive(_3); | |
164 | _3 = &mut _1; | |
165 | _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4]; | |
166 | } | |
167 | ``` | |
168 | ||
169 | Here there are two statements: another `StorageLive`, introducing the `_3` | |
170 | temporary, and then an assignment: | |
171 | ||
172 | ```mir | |
173 | _3 = &mut _1; | |
174 | ``` | |
175 | ||
176 | Assignments in general have the form: | |
177 | ||
178 | ```text | |
179 | <Place> = <Rvalue> | |
180 | ``` | |
181 | ||
182 | A place is an expression like `_3`, `_3.f` or `*_3` – it denotes a | |
183 | location in memory. An **Rvalue** is an expression that creates a | |
184 | value: in this case, the rvalue is a mutable borrow expression, which | |
185 | looks like `&mut <Place>`. So we can kind of define a grammar for | |
186 | rvalues like so: | |
187 | ||
188 | ```text | |
189 | <Rvalue> = & (mut)? <Place> | |
190 | | <Operand> + <Operand> | |
191 | | <Operand> - <Operand> | |
192 | | ... | |
193 | ||
194 | <Operand> = Constant | |
195 | | copy Place | |
196 | | move Place | |
197 | ``` | |
198 | ||
199 | As you can see from this grammar, rvalues cannot be nested – they can | |
200 | only reference places and constants. Moreover, when you use a place, | |
201 | we indicate whether we are **copying it** (which requires that the | |
202 | place have a type `T` where `T: Copy`) or **moving it** (which works | |
203 | for a place of any type). So, for example, if we had the expression `x | |
60c5eb7d | 204 | = a + b + c` in Rust, that would get compiled to two statements and a |
a1dfa0c6 XL |
205 | temporary: |
206 | ||
207 | ```mir | |
208 | TMP1 = a + b | |
209 | x = TMP1 + c | |
210 | ``` | |
211 | ||
212 | ([Try it and see][play-abc], though you may want to do release mode to skip | |
213 | over the overflow checks.) | |
214 | ||
215 | [play-abc]: https://play.rust-lang.org/?gist=1751196d63b2a71f8208119e59d8a5b6&version=stable | |
216 | ||
217 | ## MIR data types | |
218 | ||
6a06907d | 219 | The MIR data types are defined in the [`compiler/rustc_middle/src/mir/`][mir] |
a1dfa0c6 XL |
220 | module. Each of the key concepts mentioned in the previous section |
221 | maps in a fairly straightforward way to a Rust type. | |
222 | ||
6a06907d | 223 | The main MIR data type is [`Body`]. It contains the data for a single |
a1dfa0c6 XL |
224 | function (along with sub-instances of Mir for "promoted constants", |
225 | but [you can read about those below](#promoted)). | |
226 | ||
227 | - **Basic blocks**: The basic blocks are stored in the field | |
6a06907d XL |
228 | [`Body::basic_blocks`][basicblocks]; this is a vector |
229 | of [`BasicBlockData`] structures. Nobody ever references a | |
230 | basic block directly: instead, we pass around [`BasicBlock`] | |
231 | values, which are [newtype'd] indices into this vector. | |
232 | - **Statements** are represented by the type [`Statement`]. | |
233 | - **Terminators** are represented by the [`Terminator`]. | |
234 | - **Locals** are represented by a [newtype'd] index type [`Local`]. | |
235 | The data for a local variable is found in the | |
236 | [`Body::local_decls`][localdecls] vector). There is also a special constant | |
237 | [`RETURN_PLACE`] identifying the special "local" representing the return value. | |
238 | - **Places** are identified by the enum [`Place`]. There are a few | |
239 | variants: | |
a1dfa0c6 XL |
240 | - Local variables like `_1` |
241 | - Static variables `FOO` | |
242 | - **Projections**, which are fields or other things that "project | |
6a06907d XL |
243 | out" from a base place. These are represented by the type |
244 | [`ProjectionElem`]. So e.g. the place `_1.f` is a projection, | |
245 | with `f` being the "projection element" and `_1` being the base | |
a1dfa0c6 | 246 | path. `*_1` is also a projection, with the `*` being represented |
6a06907d XL |
247 | by the [`ProjectionElem::Deref`] element. |
248 | - **Rvalues** are represented by the enum [`Rvalue`]. | |
249 | - **Operands** are represented by the enum [`Operand`]. | |
a1dfa0c6 XL |
250 | |
251 | ## Representing constants | |
252 | ||
253 | *to be written* | |
254 | ||
255 | <a name="promoted"></a> | |
256 | ||
257 | ### Promoted constants | |
258 | ||
6a06907d | 259 | See the const-eval WG's [docs on promotion](https://github.com/rust-lang/const-eval/blob/master/promotion.md). |
a1dfa0c6 XL |
260 | |
261 | ||
6a06907d | 262 | [mir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/index.html |
3c0e092e XL |
263 | [mirmanip_build]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_build/index.html |
264 | [mirmanip_transform]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/index.html | |
265 | [mirmanip_dataflow]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_dataflow/index.html | |
6a06907d | 266 | [`Body`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html |
ba9703b0 | 267 | [newtype'd]: ../appendix/glossary.html#newtype |
6a06907d XL |
268 | [basicblocks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html#structfield.basic_blocks |
269 | [`BasicBlock`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.BasicBlock.html | |
270 | [`BasicBlockData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.BasicBlockData.html | |
271 | [`Statement`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Statement.html | |
272 | [`Terminator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/terminator/struct.Terminator.html | |
273 | [`Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Local.html | |
274 | [localdecls]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html#structfield.local_decls | |
275 | [`RETURN_PLACE`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/constant.RETURN_PLACE.html | |
276 | [`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Place.html | |
277 | [`ProjectionElem`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.ProjectionElem.html | |
278 | [`ProjectionElem::Deref`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.ProjectionElem.html#variant.Deref | |
279 | [`Rvalue`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.Rvalue.html | |
280 | [`Operand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.Operand.html |