]> git.proxmox.com Git - rustc.git/blob - src/doc/rustc-dev-guide/src/miri.md
New upstream version 1.54.0+dfsg1
[rustc.git] / src / doc / rustc-dev-guide / src / miri.md
1 # Miri
2
3 <!-- toc -->
4
5 The Miri (**MIR** **I**nterpreter) engine is a virtual machine for executing MIR without
6 compiling to machine code. It is usually invoked via `tcx.const_eval_*` functions.
7 In the following, we will refer to the Miri engine as just "Miri", but note that
8 there also is a stand-alone
9 [tool called "Miri"](https://github.com/rust-lang/miri/) that is based on the
10 engine (sometimes referred to as Miri-the-tool to disambiguate it from the
11 engine).
12
13 If you start out with a constant:
14
15 ```rust
16 const FOO: usize = 1 << 12;
17 ```
18
19 rustc doesn't actually invoke anything until the constant is either used or
20 placed into metadata.
21
22 Once you have a use-site like:
23
24 ```rust,ignore
25 type Foo = [u8; FOO - 42];
26 ```
27
28 The compiler needs to figure out the length of the array before being able to
29 create items that use the type (locals, constants, function arguments, ...).
30
31 To obtain the (in this case empty) parameter environment, one can call
32 `let param_env = tcx.param_env(length_def_id);`. The `GlobalId` needed is
33
34 ```rust,ignore
35 let gid = GlobalId {
36 promoted: None,
37 instance: Instance::mono(length_def_id),
38 };
39 ```
40
41 Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of
42 the MIR of the array length expression. The MIR will look something like this:
43
44 ```mir
45 Foo::{{constant}}#0: usize = {
46 let mut _0: usize;
47 let mut _1: (usize, bool);
48
49 bb0: {
50 _1 = CheckedSub(const FOO, const 42usize);
51 assert(!move (_1.1: bool), "attempt to subtract with overflow") -> bb1;
52 }
53
54 bb1: {
55 _0 = move (_1.0: usize);
56 return;
57 }
58 }
59 ```
60
61 Before the evaluation, a virtual memory location (in this case essentially a
62 `vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result.
63
64 At the start of the evaluation, `_0` and `_1` are
65 `Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`. This is quite
66 a mouthful: [`Operand`] can represent either data stored somewhere in the
67 [interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization)
68 immediate data stored in-line. And [`Immediate`] can either be a single
69 (potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer),
70 or a pair of two of them. In our case, the single scalar value is *not* (yet)
71 initialized.
72
73 When the initialization of `_1` is invoked, the value of the `FOO` constant is
74 required, and triggers another call to `tcx.const_eval_*`, which will not be shown
75 here. If the evaluation of FOO is successful, `42` will be subtracted from its
76 value `4096` and the result stored in `_1` as
77 `Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. },
78 Scalar::Raw { data: 0, .. })`. The first part of the pair is the computed value,
79 the second part is a bool that's true if an overflow happened. A `Scalar::Raw`
80 also stores the size (in bytes) of this scalar value; we are eliding that here.
81
82 The next statement asserts that said boolean is `0`. In case the assertion
83 fails, its error message is used for reporting a compile-time error.
84
85 Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw {
86 data: 4054, .. }))` is stored in the virtual memory was allocated before the
87 evaluation. `_0` always refers to that location directly.
88
89 After the evaluation is done, the return value is converted from [`Operand`] to
90 [`ConstValue`] by [`op_to_const`]: the former representation is geared towards
91 what is needed *during* cost evaluation, while [`ConstValue`] is shaped by the
92 needs of the remaining parts of the compiler that consume the results of const
93 evaluation. As part of this conversion, for types with scalar values, even if
94 the resulting [`Operand`] is `Indirect`, it will return an immediate
95 `ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::ByRef`).
96 This makes using the result much more efficient and also more convenient, as no
97 further queries need to be executed in order to get at something as simple as a
98 `usize`.
99
100 Future evaluations of the same constants will not actually invoke
101 Miri, but just use the cached result.
102
103 [`Operand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/interpret/enum.Operand.html
104 [`Immediate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/interpret/enum.Immediate.html
105 [`ConstValue`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/enum.ConstValue.html
106 [`Scalar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/enum.Scalar.html
107 [`op_to_const`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/const_eval/eval_queries/fn.op_to_const.html
108
109 ## Datastructures
110
111 Miri's outside-facing datastructures can be found in
112 [rustc_middle/src/mir/interpret](https://github.com/rust-lang/rust/blob/master/compiler/rustc_middle/src/mir/interpret).
113 This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A
114 `ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin
115 pointer), `Slice` (to represent byte slices and strings, as needed for pattern
116 matching) or `ByRef`, which is used for anything else and refers to a virtual
117 allocation. These allocations can be accessed via the methods on
118 `tcx.interpret_interner`. A `Scalar` is either some `Raw` integer or a pointer;
119 see [the next section](#memory) for more on that.
120
121 If you are expecting a numeric result, you can use `eval_usize` (panics on
122 anything that can't be representad as a `u64`) or `try_eval_usize` which results
123 in an `Option<u64>` yielding the `Scalar` if possible.
124
125 ## Memory
126
127 To support any kind of pointers, Miri needs to have a "virtual memory" that the
128 pointers can point to. This is implemented in the [`Memory`] type. In the
129 simplest model, every global variable, stack variable and every dynamic
130 allocation corresponds to an [`Allocation`] in that memory. (Actually using an
131 allocation for every MIR stack variable would be very inefficient; that's why we
132 have `Operand::Immediate` for stack variables that are both small and never have
133 their address taken. But that is purely an optimization.)
134
135 Such an `Allocation` is basically just a sequence of `u8` storing the value of
136 each byte in this allocation. (Plus some extra data, see below.) Every
137 `Allocation` has a globally unique `AllocId` assigned in `Memory`. With that, a
138 [`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and
139 an offset into the allocation (indicating which byte of the allocation the
140 pointer points to). It may seem odd that a `Pointer` is not just an integer
141 address, but remember that during const evaluation, we cannot know at which
142 actual integer address the allocation will end up -- so we use `AllocId` as
143 symbolic base addresses, which means we need a separate offset. (As an aside,
144 it turns out that pointers at run-time are
145 [more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).)
146
147 These allocations exist so that references and raw pointers have something to
148 point to. There is no global linear heap in which things are allocated, but each
149 allocation (be it for a local variable, a static or a (future) heap allocation)
150 gets its own little memory with exactly the required size. So if you have a
151 pointer to an allocation for a local variable `a`, there is no possible (no
152 matter how unsafe) operation that you can do that would ever change said pointer
153 to a pointer to a different local variable `b`.
154 Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays the same.
155
156 This, however, causes a problem when we want to store a `Pointer` into an
157 `Allocation`: we cannot turn it into a sequence of `u8` of the right length!
158 `AllocId` and offset together are twice as big as a pointer "seems" to be. This
159 is what the `relocation` field of `Allocation` is for: the byte offset of the
160 `Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored
161 out-of-band. The two are reassembled when the `Pointer` is read from memory.
162 The other bit of extra data an `Allocation` needs is `undef_mask` for keeping
163 track of which of its bytes are initialized.
164
165 ### Global memory and exotic allocations
166
167 `Memory` exists only during the Miri evaluation; it gets destroyed when the
168 final value of the constant is computed. In case that constant contains any
169 pointers, those get "interned" and moved to a global "const eval memory" that is
170 part of `TyCtxt`. These allocations stay around for the remaining computation
171 and get serialized into the final output (so that dependent crates can use
172 them).
173
174 Moreover, to also support function pointers, the global memory in `TyCtxt` can
175 also contain "virtual allocations": instead of an `Allocation`, these contain an
176 `Instance`. That allows a `Pointer` to point to either normal data or a
177 function, which is needed to be able to evaluate casts from function pointers to
178 raw pointers.
179
180 Finally, the [`GlobalAlloc`] type used in the global memory also contains a
181 variant `Static` that points to a particular `const` or `static` item. This is
182 needed to support circular statics, where we need to have a `Pointer` to a
183 `static` for which we cannot yet have an `Allocation` as we do not know the
184 bytes of its value.
185
186 [`Memory`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/interpret/struct.Memory.html
187 [`Allocation`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/struct.Allocation.html
188 [`Pointer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/struct.Pointer.html
189 [`GlobalAlloc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/enum.GlobalAlloc.html
190
191 ### Pointer values vs Pointer types
192
193 One common cause of confusion in Miri is that being a pointer *value* and having
194 a pointer *type* are entirely independent properties. By "pointer value", we
195 refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into
196 Miri's virtual memory. This is in contrast to `Scalar::Raw`, which is just some
197 concrete integer.
198
199 However, a variable of pointer or reference *type*, such as `*const T` or `&T`,
200 does not have to have a pointer *value*: it could be obtaining by casting or
201 transmuting an integer to a pointer (as of <!-- date: 2021-01 --> January 2021
202 that is hard to do in const eval, but eventually `transmute` will be stable as a
203 `const fn`). And similarly, when casting or transmuting a reference to some
204 actual allocation to an integer, we end up with a pointer *value*
205 (`Scalar::Ptr`) at integer *type* (`usize`). This is a problem because we
206 cannot meaningfully perform integer operations such as division on pointer
207 values.
208
209 ## Interpretation
210
211 Although the main entry point to constant evaluation is the `tcx.const_eval_*`
212 functions, there are additional functions in
213 [rustc_mir/src/const_eval.rs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/const_eval/index.html)
214 that allow accessing the fields of a `ConstValue` (`ByRef` or otherwise). You should
215 never have to access an `Allocation` directly except for translating it to the
216 compilation target (at the moment just LLVM).
217
218 Miri starts by creating a virtual stack frame for the current constant that is
219 being evaluated. There's essentially no difference between a constant and a
220 function with no arguments, except that constants do not allow local (named)
221 variables at the time of writing this guide.
222
223 A stack frame is defined by the `Frame` type in
224 [rustc_mir/src/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/master/compiler/rustc_mir/src/interpret/eval_context.rs)
225 and contains all the local
226 variables memory (`None` at the start of evaluation). Each frame refers to the
227 evaluation of either the root constant or subsequent calls to `const fn`. The
228 evaluation of another constant simply calls `tcx.const_eval_*`, which produce an
229 entirely new and independent stack frame.
230
231 The frames are just a `Vec<Frame>`, there's no way to actually refer to a
232 `Frame`'s memory even if horrible shenanigans are done via unsafe code. The only
233 memory that can be referred to are `Allocation`s.
234
235 Miri now calls the `step` method (in
236 [rustc_mir/src/interpret/step.rs](https://github.com/rust-lang/rust/blob/master/compiler/rustc_mir/src/interpret/step.rs)
237 ) until it either returns an error or has no further statements to execute. Each
238 statement will now initialize or modify the locals or the virtual memory
239 referred to by a local. This might require evaluating other constants or
240 statics, which just recursively invokes `tcx.const_eval_*`.