]> git.proxmox.com Git - rustc.git/blob - src/doc/rustc-dev-guide/src/type-inference.md
New upstream version 1.52.0~beta.3+dfsg1
[rustc.git] / src / doc / rustc-dev-guide / src / type-inference.md
1 # Type inference
2
3 <!-- toc -->
4
5 Type inference is the process of automatic detection of the type of an
6 expression.
7
8 It is what allows Rust to work with fewer or no type annotations,
9 making things easier for users:
10
11 ```rust
12 fn main() {
13 let mut things = vec![];
14 things.push("thing");
15 }
16 ```
17
18 Here, the type of `things` is *inferred* to be `Vec<&str>` because of the value
19 we push into `things`.
20
21 The type inference is based on the standard Hindley-Milner (HM) type inference
22 algorithm, but extended in various way to accommodate subtyping, region
23 inference, and higher-ranked types.
24
25 ## A note on terminology
26
27 We use the notation `?T` to refer to inference variables, also called
28 existential variables.
29
30 We use the terms "region" and "lifetime" interchangeably. Both refer to
31 the `'a` in `&'a T`.
32
33 The term "bound region" refers to a region that is bound in a function
34 signature, such as the `'a` in `for<'a> fn(&'a u32)`. A region is
35 "free" if it is not bound.
36
37 ## Creating an inference context
38
39 You create and "enter" an inference context by doing something like
40 the following:
41
42 ```rust,ignore
43 tcx.infer_ctxt().enter(|infcx| {
44 // Use the inference context `infcx` here.
45 })
46 ```
47
48 Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some
49 fresh `'cx`, while `'tcx` is the same as outside the inference context.
50 (Again, see the [`ty` chapter][ty-ch] for more details on this setup.)
51
52 [ty-ch]: ty.html
53
54 The `tcx.infer_ctxt` method actually returns a builder, which means
55 there are some kinds of configuration you can do before the `infcx` is
56 created. See `InferCtxtBuilder` for more information.
57
58 <a name="vars"></a>
59
60 ## Inference variables
61
62 The main purpose of the inference context is to house a bunch of
63 **inference variables** – these represent types or regions whose precise
64 value is not yet known, but will be uncovered as we perform type-checking.
65
66 If you're familiar with the basic ideas of unification from H-M type
67 systems, or logic languages like Prolog, this is the same concept. If
68 you're not, you might want to read a tutorial on how H-M type
69 inference works, or perhaps this blog post on
70 [unification in the Chalk project].
71
72 [Unification in the Chalk project]: http://smallcultfollowing.com/babysteps/blog/2017/03/25/unification-in-chalk-part-1/
73
74 All told, the inference context stores four kinds of inference variables
75 (as of <!-- date: 2018-01 --> January 2018):
76
77 - Type variables, which come in three varieties:
78 - General type variables (the most common). These can be unified with any
79 type.
80 - Integral type variables, which can only be unified with an integral type,
81 and arise from an integer literal expression like `22`.
82 - Float type variables, which can only be unified with a float type, and
83 arise from a float literal expression like `22.0`.
84 - Region variables, which represent lifetimes, and arise all over the place.
85
86 All the type variables work in much the same way: you can create a new
87 type variable, and what you get is `Ty<'tcx>` representing an
88 unresolved type `?T`. Then later you can apply the various operations
89 that the inferencer supports, such as equality or subtyping, and it
90 will possibly **instantiate** (or **bind**) that `?T` to a specific
91 value as a result.
92
93 The region variables work somewhat differently, and are described
94 below in a separate section.
95
96 ## Enforcing equality / subtyping
97
98 The most basic operations you can perform in the type inferencer is
99 **equality**, which forces two types `T` and `U` to be the same. The
100 recommended way to add an equality constraint is to use the `at`
101 method, roughly like so:
102
103 ```rust,ignore
104 infcx.at(...).eq(t, u);
105 ```
106
107 The first `at()` call provides a bit of context, i.e. why you are
108 doing this unification, and in what environment, and the `eq` method
109 performs the actual equality constraint.
110
111 When you equate things, you force them to be precisely equal. Equating
112 returns an `InferResult` – if it returns `Err(err)`, then equating
113 failed, and the enclosing `TypeError` will tell you what went wrong.
114
115 The success case is perhaps more interesting. The "primary" return
116 type of `eq` is `()` – that is, when it succeeds, it doesn't return a
117 value of any particular interest. Rather, it is executed for its
118 side-effects of constraining type variables and so forth. However, the
119 actual return type is not `()`, but rather `InferOk<()>`. The
120 `InferOk` type is used to carry extra trait obligations – your job is
121 to ensure that these are fulfilled (typically by enrolling them in a
122 fulfillment context). See the [trait chapter] for more background on that.
123
124 [trait chapter]: traits/resolution.html
125
126 You can similarly enforce subtyping through `infcx.at(..).sub(..)`. The same
127 basic concepts as above apply.
128
129 ## "Trying" equality
130
131 Sometimes you would like to know if it is *possible* to equate two
132 types without error. You can test that with `infcx.can_eq` (or
133 `infcx.can_sub` for subtyping). If this returns `Ok`, then equality
134 is possible – but in all cases, any side-effects are reversed.
135
136 Be aware, though, that the success or failure of these methods is always
137 **modulo regions**. That is, two types `&'a u32` and `&'b u32` will
138 return `Ok` for `can_eq`, even if `'a != 'b`. This falls out from the
139 "two-phase" nature of how we solve region constraints.
140
141 ## Snapshots
142
143 As described in the previous section on `can_eq`, often it is useful
144 to be able to do a series of operations and then roll back their
145 side-effects. This is done for various reasons: one of them is to be
146 able to backtrack, trying out multiple possibilities before settling
147 on which path to take. Another is in order to ensure that a series of
148 smaller changes take place atomically or not at all.
149
150 To allow for this, the inference context supports a `snapshot` method.
151 When you call it, it will start recording changes that occur from the
152 operations you perform. When you are done, you can either invoke
153 `rollback_to`, which will undo those changes, or else `confirm`, which
154 will make them permanent. Snapshots can be nested as long as you follow
155 a stack-like discipline.
156
157 Rather than use snapshots directly, it is often helpful to use the
158 methods like `commit_if_ok` or `probe` that encapsulate higher-level
159 patterns.
160
161 ## Subtyping obligations
162
163 One thing worth discussing is subtyping obligations. When you force
164 two types to be a subtype, like `?T <: i32`, we can often convert those
165 into equality constraints. This follows from Rust's rather limited notion
166 of subtyping: so, in the above case, `?T <: i32` is equivalent to `?T = i32`.
167
168 However, in some cases we have to be more careful. For example, when
169 regions are involved. So if you have `?T <: &'a i32`, what we would do
170 is to first "generalize" `&'a i32` into a type with a region variable:
171 `&'?b i32`, and then unify `?T` with that (`?T = &'?b i32`). We then
172 relate this new variable with the original bound:
173
174 ```text
175 &'?b i32 <: &'a i32
176 ```
177
178 This will result in a region constraint (see below) of `'?b: 'a`.
179
180 One final interesting case is relating two unbound type variables,
181 like `?T <: ?U`. In that case, we can't make progress, so we enqueue
182 an obligation `Subtype(?T, ?U)` and return it via the `InferOk`
183 mechanism. You'll have to try again when more details about `?T` or
184 `?U` are known.
185
186 ## Region constraints
187
188 Regions are inferenced somewhat differently from types. Rather than
189 eagerly unifying things, we simply collect constraints as we go, but
190 make (almost) no attempt to solve regions. These constraints have the
191 form of an "outlives" constraint:
192
193 ```text
194 'a: 'b
195 ```
196
197 Actually the code tends to view them as a subregion relation, but it's the same
198 idea:
199
200 ```text
201 'b <= 'a
202 ```
203
204 (There are various other kinds of constraints, such as "verifys"; see
205 the [`region_constraints`] module for details.)
206
207 There is one case where we do some amount of eager unification. If you have an
208 equality constraint between two regions
209
210 ```text
211 'a = 'b
212 ```
213
214 we will record that fact in a unification table. You can then use
215 [`opportunistic_resolve_var`] to convert `'b` to `'a` (or vice
216 versa). This is sometimes needed to ensure termination of fixed-point
217 algorithms.
218
219 [`region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/index.html
220 [`opportunistic_resolve_var`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/struct.RegionConstraintCollector.html#method.opportunistic_resolve_var
221
222 ## Extracting region constraints
223
224 Ultimately, region constraints are only solved at the very end of
225 type-checking, once all other constraints are known. There are two
226 ways to solve region constraints right now: lexical and
227 non-lexical. Eventually there will only be one.
228
229 To solve **lexical** region constraints, you invoke
230 [`resolve_regions_and_report_errors`]. This "closes" the region
231 constraint process and invokes the [`lexical_region_resolve`] code. Once
232 this is done, any further attempt to equate or create a subtyping
233 relationship will yield an ICE.
234
235 Non-lexical region constraints are not handled within the inference
236 context. Instead, the NLL solver (actually, the MIR type-checker)
237 invokes [`take_and_reset_region_constraints`] periodically. This
238 extracts all of the outlives constraints from the region solver, but
239 leaves the set of variables intact. This is used to get *just* the
240 region constraints that resulted from some particular point in the
241 program, since the NLL solver needs to know not just *what* regions
242 were subregions, but also *where*. Finally, the NLL solver invokes
243 [`take_region_var_origins`], which "closes" the region constraint
244 process in the same way as normal solving.
245
246 [`resolve_regions_and_report_errors`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.resolve_regions_and_report_errors
247 [`lexical_region_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/lexical_region_resolve/index.html
248 [`take_and_reset_region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_and_reset_region_constraints
249 [`take_region_var_origins`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_region_var_origins
250
251 ## Lexical region resolution
252
253 Lexical region resolution is done by initially assigning each region
254 variable to an empty value. We then process each outlives constraint
255 repeatedly, growing region variables until a fixed-point is reached.
256 Region variables can be grown using a least-upper-bound relation on
257 the region lattice in a fairly straightforward fashion.