]> git.proxmox.com Git - rustc.git/blame - src/doc/rustc-dev-guide/src/mir/index.md
Merge tag 'debian/1.52.1+dfsg1-1_exp2' into proxmox/buster
[rustc.git] / src / doc / rustc-dev-guide / src / mir / index.md
CommitLineData
a1dfa0c6
XL
1# The MIR (Mid-level IR)
2
6a06907d
XL
3<!-- toc -->
4
a1dfa0c6
XL
5MIR is Rust's _Mid-level Intermediate Representation_. It is
6constructed from [HIR](../hir.html). MIR was introduced in
7[RFC 1211]. It is a radically simplified form of Rust that is used for
8certain flow-sensitive safety checks – notably the borrow checker! –
9and also for optimization and code generation.
10
11If you'd like a very high-level introduction to MIR, as well as some
12of the compiler concepts that it relies on (such as control-flow
13graphs and desugaring), you may enjoy the
14[rust-lang blog post that introduced MIR][blog].
15
16[blog]: https://blog.rust-lang.org/2016/04/19/MIR.html
17
18## Introduction to MIR
19
6a06907d
XL
20MIR is defined in the [`compiler/rustc_middle/src/mir/`][mir] module, but much of the code
21that manipulates it is found in [`compiler/rustc_mir`][mirmanip].
a1dfa0c6 22
60c5eb7d 23[RFC 1211]: https://rust-lang.github.io/rfcs/1211-mir.html
a1dfa0c6
XL
24
25Some of the key characteristics of MIR are:
26
27- It is based on a [control-flow graph][cfg].
28- It does not have nested expressions.
29- All types in MIR are fully explicit.
30
31[cfg]: ../appendix/background.html#cfg
32
33## Key MIR vocabulary
34
35This section introduces the key concepts of MIR, summarized here:
36
37- **Basic blocks**: units of the control-flow graph, consisting of:
38 - **statements:** actions with one successor
39 - **terminators:** actions with potentially multiple successors; always at
40 the end of a block
41 - (if you're not familiar with the term *basic block*, see the [background
42 chapter][cfg])
43- **Locals:** Memory locations allocated on the stack (conceptually, at
44 least), such as function arguments, local variables, and
45 temporaries. These are identified by an index, written with a
46 leading underscore, like `_1`. There is also a special "local"
47 (`_0`) allocated to store the return value.
48- **Places:** expressions that identify a location in memory, like `_1` or
49 `_1.f`.
50- **Rvalues:** expressions that produce a value. The "R" stands for
51 the fact that these are the "right-hand side" of an assignment.
52 - **Operands:** the arguments to an rvalue, which can either be a
53 constant (like `22`) or a place (like `_1`).
54
55You can get a feeling for how MIR is structed by translating simple
56programs into MIR and reading the pretty printed output. In fact, the
57playground makes this easy, since it supplies a MIR button that will
58show you the MIR for your program. Try putting this program into play
59(or [clicking on this link][sample-play]), and then clicking the "MIR"
60button on the top:
61
62[sample-play]: https://play.rust-lang.org/?gist=30074856e62e74e91f06abd19bd72ece&version=stable
63
64```rust
65fn main() {
66 let mut vec = Vec::new();
67 vec.push(1);
68 vec.push(2);
69}
70```
71
72You should see something like:
73
74```mir
75// WARNING: This output format is intended for human consumers only
76// and is subject to change without notice. Knock yourself out.
77fn main() -> () {
78 ...
79}
80```
81
82This is the MIR format for the `main` function.
83
84**Variable declarations.** If we drill in a bit, we'll see it begins
85with a bunch of variable declarations. They look like this:
86
87```mir
88let mut _0: (); // return place
74b04a01 89let mut _1: std::vec::Vec<i32>; // in scope 0 at src/main.rs:2:9: 2:16
a1dfa0c6
XL
90let mut _2: ();
91let mut _3: &mut std::vec::Vec<i32>;
92let mut _4: ();
93let mut _5: &mut std::vec::Vec<i32>;
94```
95
96You can see that variables in MIR don't have names, they have indices,
97like `_0` or `_1`. We also intermingle the user's variables (e.g.,
74b04a01
XL
98`_1`) with temporary values (e.g., `_2` or `_3`). You can tell apart
99user-defined variables because they have debuginfo associated to them (see below).
100
101**User variable debuginfo.** Below the variable declarations, we find the only
102hint that `_1` represents a user variable:
103```mir
104scope 1 {
105 debug vec => _1; // in scope 1 at src/main.rs:2:9: 2:16
106}
107```
108Each `debug <Name> => <Place>;` annotation describes a named user variable,
109and where (i.e. the place) a debugger can find the data of that variable.
110Here the mapping is trivial, but optimizations may complicate the place,
111or lead to multiple user variables sharing the same place.
112Additionally, closure captures are described using the same system, and so
113they're complicated even without optimizations, e.g.: `debug x => (*((*_1).0: &T));`.
114
115The "scope" blocks (e.g., `scope 1 { .. }`) describe the lexical structure of
116the source program (which names were in scope when), so any part of the program
117annotated with `// in scope 0` would be missing `vec`, if you were stepping
118through the code in a debugger, for example.
a1dfa0c6
XL
119
120**Basic blocks.** Reading further, we see our first **basic block** (naturally
121it may look slightly different when you view it, and I am ignoring some of the
122comments):
123
124```mir
125bb0: {
126 StorageLive(_1);
127 _1 = const <std::vec::Vec<T>>::new() -> bb2;
128}
129```
130
131A basic block is defined by a series of **statements** and a final
132**terminator**. In this case, there is one statement:
133
134```mir
135StorageLive(_1);
136```
137
138This statement indicates that the variable `_1` is "live", meaning
139that it may be used later – this will persist until we encounter a
140`StorageDead(_1)` statement, which indicates that the variable `_1` is
141done being used. These "storage statements" are used by LLVM to
142allocate stack space.
143
144The **terminator** of the block `bb0` is the call to `Vec::new`:
145
146```mir
147_1 = const <std::vec::Vec<T>>::new() -> bb2;
148```
149
150Terminators are different from statements because they can have more
151than one successor – that is, control may flow to different
152places. Function calls like the call to `Vec::new` are always
153terminators because of the possibility of unwinding, although in the
154case of `Vec::new` we are able to see that indeed unwinding is not
416331ca 155possible, and hence we list only one successor block, `bb2`.
a1dfa0c6
XL
156
157If we look ahead to `bb2`, we will see it looks like this:
158
159```mir
160bb2: {
161 StorageLive(_3);
162 _3 = &mut _1;
163 _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4];
164}
165```
166
167Here there are two statements: another `StorageLive`, introducing the `_3`
168temporary, and then an assignment:
169
170```mir
171_3 = &mut _1;
172```
173
174Assignments in general have the form:
175
176```text
177<Place> = <Rvalue>
178```
179
180A place is an expression like `_3`, `_3.f` or `*_3` – it denotes a
181location in memory. An **Rvalue** is an expression that creates a
182value: in this case, the rvalue is a mutable borrow expression, which
183looks like `&mut <Place>`. So we can kind of define a grammar for
184rvalues like so:
185
186```text
187<Rvalue> = & (mut)? <Place>
188 | <Operand> + <Operand>
189 | <Operand> - <Operand>
190 | ...
191
192<Operand> = Constant
193 | copy Place
194 | move Place
195```
196
197As you can see from this grammar, rvalues cannot be nested – they can
198only reference places and constants. Moreover, when you use a place,
199we indicate whether we are **copying it** (which requires that the
200place have a type `T` where `T: Copy`) or **moving it** (which works
201for a place of any type). So, for example, if we had the expression `x
60c5eb7d 202= a + b + c` in Rust, that would get compiled to two statements and a
a1dfa0c6
XL
203temporary:
204
205```mir
206TMP1 = a + b
207x = TMP1 + c
208```
209
210([Try it and see][play-abc], though you may want to do release mode to skip
211over the overflow checks.)
212
213[play-abc]: https://play.rust-lang.org/?gist=1751196d63b2a71f8208119e59d8a5b6&version=stable
214
215## MIR data types
216
6a06907d 217The MIR data types are defined in the [`compiler/rustc_middle/src/mir/`][mir]
a1dfa0c6
XL
218module. Each of the key concepts mentioned in the previous section
219maps in a fairly straightforward way to a Rust type.
220
6a06907d 221The main MIR data type is [`Body`]. It contains the data for a single
a1dfa0c6
XL
222function (along with sub-instances of Mir for "promoted constants",
223but [you can read about those below](#promoted)).
224
225- **Basic blocks**: The basic blocks are stored in the field
6a06907d
XL
226 [`Body::basic_blocks`][basicblocks]; this is a vector
227 of [`BasicBlockData`] structures. Nobody ever references a
228 basic block directly: instead, we pass around [`BasicBlock`]
229 values, which are [newtype'd] indices into this vector.
230- **Statements** are represented by the type [`Statement`].
231- **Terminators** are represented by the [`Terminator`].
232- **Locals** are represented by a [newtype'd] index type [`Local`].
233 The data for a local variable is found in the
234 [`Body::local_decls`][localdecls] vector). There is also a special constant
235 [`RETURN_PLACE`] identifying the special "local" representing the return value.
236- **Places** are identified by the enum [`Place`]. There are a few
237 variants:
a1dfa0c6
XL
238 - Local variables like `_1`
239 - Static variables `FOO`
240 - **Projections**, which are fields or other things that "project
6a06907d
XL
241 out" from a base place. These are represented by the type
242 [`ProjectionElem`]. So e.g. the place `_1.f` is a projection,
243 with `f` being the "projection element" and `_1` being the base
a1dfa0c6 244 path. `*_1` is also a projection, with the `*` being represented
6a06907d
XL
245 by the [`ProjectionElem::Deref`] element.
246- **Rvalues** are represented by the enum [`Rvalue`].
247- **Operands** are represented by the enum [`Operand`].
a1dfa0c6
XL
248
249## Representing constants
250
251*to be written*
252
253<a name="promoted"></a>
254
255### Promoted constants
256
6a06907d 257See the const-eval WG's [docs on promotion](https://github.com/rust-lang/const-eval/blob/master/promotion.md).
a1dfa0c6
XL
258
259
6a06907d
XL
260[mir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/index.html
261[mirmanip]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/index.html
262[`Body`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html
ba9703b0 263[newtype'd]: ../appendix/glossary.html#newtype
6a06907d
XL
264[basicblocks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html#structfield.basic_blocks
265[`BasicBlock`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.BasicBlock.html
266[`BasicBlockData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.BasicBlockData.html
267[`Statement`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Statement.html
268[`Terminator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/terminator/struct.Terminator.html
269[`Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Local.html
270[localdecls]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html#structfield.local_decls
271[`RETURN_PLACE`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/constant.RETURN_PLACE.html
272[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Place.html
273[`ProjectionElem`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.ProjectionElem.html
274[`ProjectionElem::Deref`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.ProjectionElem.html#variant.Deref
275[`Rvalue`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.Rvalue.html
276[`Operand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.Operand.html