]>
Commit | Line | Data |
---|---|---|
a1dfa0c6 XL |
1 | # The MIR (Mid-level IR) |
2 | ||
6a06907d XL |
3 | <!-- toc --> |
4 | ||
a1dfa0c6 XL |
5 | MIR is Rust's _Mid-level Intermediate Representation_. It is |
6 | constructed from [HIR](../hir.html). MIR was introduced in | |
7 | [RFC 1211]. It is a radically simplified form of Rust that is used for | |
8 | certain flow-sensitive safety checks – notably the borrow checker! – | |
9 | and also for optimization and code generation. | |
10 | ||
11 | If you'd like a very high-level introduction to MIR, as well as some | |
12 | of the compiler concepts that it relies on (such as control-flow | |
13 | graphs and desugaring), you may enjoy the | |
14 | [rust-lang blog post that introduced MIR][blog]. | |
15 | ||
16 | [blog]: https://blog.rust-lang.org/2016/04/19/MIR.html | |
17 | ||
18 | ## Introduction to MIR | |
19 | ||
6a06907d XL |
20 | MIR is defined in the [`compiler/rustc_middle/src/mir/`][mir] module, but much of the code |
21 | that manipulates it is found in [`compiler/rustc_mir`][mirmanip]. | |
a1dfa0c6 | 22 | |
60c5eb7d | 23 | [RFC 1211]: https://rust-lang.github.io/rfcs/1211-mir.html |
a1dfa0c6 XL |
24 | |
25 | Some of the key characteristics of MIR are: | |
26 | ||
27 | - It is based on a [control-flow graph][cfg]. | |
28 | - It does not have nested expressions. | |
29 | - All types in MIR are fully explicit. | |
30 | ||
31 | [cfg]: ../appendix/background.html#cfg | |
32 | ||
33 | ## Key MIR vocabulary | |
34 | ||
35 | This section introduces the key concepts of MIR, summarized here: | |
36 | ||
37 | - **Basic blocks**: units of the control-flow graph, consisting of: | |
38 | - **statements:** actions with one successor | |
39 | - **terminators:** actions with potentially multiple successors; always at | |
40 | the end of a block | |
41 | - (if you're not familiar with the term *basic block*, see the [background | |
42 | chapter][cfg]) | |
43 | - **Locals:** Memory locations allocated on the stack (conceptually, at | |
44 | least), such as function arguments, local variables, and | |
45 | temporaries. These are identified by an index, written with a | |
46 | leading underscore, like `_1`. There is also a special "local" | |
47 | (`_0`) allocated to store the return value. | |
48 | - **Places:** expressions that identify a location in memory, like `_1` or | |
49 | `_1.f`. | |
50 | - **Rvalues:** expressions that produce a value. The "R" stands for | |
51 | the fact that these are the "right-hand side" of an assignment. | |
52 | - **Operands:** the arguments to an rvalue, which can either be a | |
53 | constant (like `22`) or a place (like `_1`). | |
54 | ||
55 | You can get a feeling for how MIR is structed by translating simple | |
56 | programs into MIR and reading the pretty printed output. In fact, the | |
57 | playground makes this easy, since it supplies a MIR button that will | |
58 | show you the MIR for your program. Try putting this program into play | |
59 | (or [clicking on this link][sample-play]), and then clicking the "MIR" | |
60 | button on the top: | |
61 | ||
62 | [sample-play]: https://play.rust-lang.org/?gist=30074856e62e74e91f06abd19bd72ece&version=stable | |
63 | ||
64 | ```rust | |
65 | fn main() { | |
66 | let mut vec = Vec::new(); | |
67 | vec.push(1); | |
68 | vec.push(2); | |
69 | } | |
70 | ``` | |
71 | ||
72 | You should see something like: | |
73 | ||
74 | ```mir | |
75 | // WARNING: This output format is intended for human consumers only | |
76 | // and is subject to change without notice. Knock yourself out. | |
77 | fn main() -> () { | |
78 | ... | |
79 | } | |
80 | ``` | |
81 | ||
82 | This is the MIR format for the `main` function. | |
83 | ||
84 | **Variable declarations.** If we drill in a bit, we'll see it begins | |
85 | with a bunch of variable declarations. They look like this: | |
86 | ||
87 | ```mir | |
88 | let mut _0: (); // return place | |
74b04a01 | 89 | let mut _1: std::vec::Vec<i32>; // in scope 0 at src/main.rs:2:9: 2:16 |
a1dfa0c6 XL |
90 | let mut _2: (); |
91 | let mut _3: &mut std::vec::Vec<i32>; | |
92 | let mut _4: (); | |
93 | let mut _5: &mut std::vec::Vec<i32>; | |
94 | ``` | |
95 | ||
96 | You can see that variables in MIR don't have names, they have indices, | |
97 | like `_0` or `_1`. We also intermingle the user's variables (e.g., | |
74b04a01 XL |
98 | `_1`) with temporary values (e.g., `_2` or `_3`). You can tell apart |
99 | user-defined variables because they have debuginfo associated to them (see below). | |
100 | ||
101 | **User variable debuginfo.** Below the variable declarations, we find the only | |
102 | hint that `_1` represents a user variable: | |
103 | ```mir | |
104 | scope 1 { | |
105 | debug vec => _1; // in scope 1 at src/main.rs:2:9: 2:16 | |
106 | } | |
107 | ``` | |
108 | Each `debug <Name> => <Place>;` annotation describes a named user variable, | |
109 | and where (i.e. the place) a debugger can find the data of that variable. | |
110 | Here the mapping is trivial, but optimizations may complicate the place, | |
111 | or lead to multiple user variables sharing the same place. | |
112 | Additionally, closure captures are described using the same system, and so | |
113 | they're complicated even without optimizations, e.g.: `debug x => (*((*_1).0: &T));`. | |
114 | ||
115 | The "scope" blocks (e.g., `scope 1 { .. }`) describe the lexical structure of | |
116 | the source program (which names were in scope when), so any part of the program | |
117 | annotated with `// in scope 0` would be missing `vec`, if you were stepping | |
118 | through the code in a debugger, for example. | |
a1dfa0c6 XL |
119 | |
120 | **Basic blocks.** Reading further, we see our first **basic block** (naturally | |
121 | it may look slightly different when you view it, and I am ignoring some of the | |
122 | comments): | |
123 | ||
124 | ```mir | |
125 | bb0: { | |
126 | StorageLive(_1); | |
127 | _1 = const <std::vec::Vec<T>>::new() -> bb2; | |
128 | } | |
129 | ``` | |
130 | ||
131 | A basic block is defined by a series of **statements** and a final | |
132 | **terminator**. In this case, there is one statement: | |
133 | ||
134 | ```mir | |
135 | StorageLive(_1); | |
136 | ``` | |
137 | ||
138 | This statement indicates that the variable `_1` is "live", meaning | |
139 | that it may be used later – this will persist until we encounter a | |
140 | `StorageDead(_1)` statement, which indicates that the variable `_1` is | |
141 | done being used. These "storage statements" are used by LLVM to | |
142 | allocate stack space. | |
143 | ||
144 | The **terminator** of the block `bb0` is the call to `Vec::new`: | |
145 | ||
146 | ```mir | |
147 | _1 = const <std::vec::Vec<T>>::new() -> bb2; | |
148 | ``` | |
149 | ||
150 | Terminators are different from statements because they can have more | |
151 | than one successor – that is, control may flow to different | |
152 | places. Function calls like the call to `Vec::new` are always | |
153 | terminators because of the possibility of unwinding, although in the | |
154 | case of `Vec::new` we are able to see that indeed unwinding is not | |
416331ca | 155 | possible, and hence we list only one successor block, `bb2`. |
a1dfa0c6 XL |
156 | |
157 | If we look ahead to `bb2`, we will see it looks like this: | |
158 | ||
159 | ```mir | |
160 | bb2: { | |
161 | StorageLive(_3); | |
162 | _3 = &mut _1; | |
163 | _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4]; | |
164 | } | |
165 | ``` | |
166 | ||
167 | Here there are two statements: another `StorageLive`, introducing the `_3` | |
168 | temporary, and then an assignment: | |
169 | ||
170 | ```mir | |
171 | _3 = &mut _1; | |
172 | ``` | |
173 | ||
174 | Assignments in general have the form: | |
175 | ||
176 | ```text | |
177 | <Place> = <Rvalue> | |
178 | ``` | |
179 | ||
180 | A place is an expression like `_3`, `_3.f` or `*_3` – it denotes a | |
181 | location in memory. An **Rvalue** is an expression that creates a | |
182 | value: in this case, the rvalue is a mutable borrow expression, which | |
183 | looks like `&mut <Place>`. So we can kind of define a grammar for | |
184 | rvalues like so: | |
185 | ||
186 | ```text | |
187 | <Rvalue> = & (mut)? <Place> | |
188 | | <Operand> + <Operand> | |
189 | | <Operand> - <Operand> | |
190 | | ... | |
191 | ||
192 | <Operand> = Constant | |
193 | | copy Place | |
194 | | move Place | |
195 | ``` | |
196 | ||
197 | As you can see from this grammar, rvalues cannot be nested – they can | |
198 | only reference places and constants. Moreover, when you use a place, | |
199 | we indicate whether we are **copying it** (which requires that the | |
200 | place have a type `T` where `T: Copy`) or **moving it** (which works | |
201 | for a place of any type). So, for example, if we had the expression `x | |
60c5eb7d | 202 | = a + b + c` in Rust, that would get compiled to two statements and a |
a1dfa0c6 XL |
203 | temporary: |
204 | ||
205 | ```mir | |
206 | TMP1 = a + b | |
207 | x = TMP1 + c | |
208 | ``` | |
209 | ||
210 | ([Try it and see][play-abc], though you may want to do release mode to skip | |
211 | over the overflow checks.) | |
212 | ||
213 | [play-abc]: https://play.rust-lang.org/?gist=1751196d63b2a71f8208119e59d8a5b6&version=stable | |
214 | ||
215 | ## MIR data types | |
216 | ||
6a06907d | 217 | The MIR data types are defined in the [`compiler/rustc_middle/src/mir/`][mir] |
a1dfa0c6 XL |
218 | module. Each of the key concepts mentioned in the previous section |
219 | maps in a fairly straightforward way to a Rust type. | |
220 | ||
6a06907d | 221 | The main MIR data type is [`Body`]. It contains the data for a single |
a1dfa0c6 XL |
222 | function (along with sub-instances of Mir for "promoted constants", |
223 | but [you can read about those below](#promoted)). | |
224 | ||
225 | - **Basic blocks**: The basic blocks are stored in the field | |
6a06907d XL |
226 | [`Body::basic_blocks`][basicblocks]; this is a vector |
227 | of [`BasicBlockData`] structures. Nobody ever references a | |
228 | basic block directly: instead, we pass around [`BasicBlock`] | |
229 | values, which are [newtype'd] indices into this vector. | |
230 | - **Statements** are represented by the type [`Statement`]. | |
231 | - **Terminators** are represented by the [`Terminator`]. | |
232 | - **Locals** are represented by a [newtype'd] index type [`Local`]. | |
233 | The data for a local variable is found in the | |
234 | [`Body::local_decls`][localdecls] vector). There is also a special constant | |
235 | [`RETURN_PLACE`] identifying the special "local" representing the return value. | |
236 | - **Places** are identified by the enum [`Place`]. There are a few | |
237 | variants: | |
a1dfa0c6 XL |
238 | - Local variables like `_1` |
239 | - Static variables `FOO` | |
240 | - **Projections**, which are fields or other things that "project | |
6a06907d XL |
241 | out" from a base place. These are represented by the type |
242 | [`ProjectionElem`]. So e.g. the place `_1.f` is a projection, | |
243 | with `f` being the "projection element" and `_1` being the base | |
a1dfa0c6 | 244 | path. `*_1` is also a projection, with the `*` being represented |
6a06907d XL |
245 | by the [`ProjectionElem::Deref`] element. |
246 | - **Rvalues** are represented by the enum [`Rvalue`]. | |
247 | - **Operands** are represented by the enum [`Operand`]. | |
a1dfa0c6 XL |
248 | |
249 | ## Representing constants | |
250 | ||
251 | *to be written* | |
252 | ||
253 | <a name="promoted"></a> | |
254 | ||
255 | ### Promoted constants | |
256 | ||
6a06907d | 257 | See the const-eval WG's [docs on promotion](https://github.com/rust-lang/const-eval/blob/master/promotion.md). |
a1dfa0c6 XL |
258 | |
259 | ||
6a06907d XL |
260 | [mir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/index.html |
261 | [mirmanip]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/index.html | |
262 | [`Body`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html | |
ba9703b0 | 263 | [newtype'd]: ../appendix/glossary.html#newtype |
6a06907d XL |
264 | [basicblocks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html#structfield.basic_blocks |
265 | [`BasicBlock`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.BasicBlock.html | |
266 | [`BasicBlockData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.BasicBlockData.html | |
267 | [`Statement`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Statement.html | |
268 | [`Terminator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/terminator/struct.Terminator.html | |
269 | [`Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Local.html | |
270 | [localdecls]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html#structfield.local_decls | |
271 | [`RETURN_PLACE`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/constant.RETURN_PLACE.html | |
272 | [`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Place.html | |
273 | [`ProjectionElem`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.ProjectionElem.html | |
274 | [`ProjectionElem::Deref`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.ProjectionElem.html#variant.Deref | |
275 | [`Rvalue`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.Rvalue.html | |
276 | [`Operand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.Operand.html |