]>
Commit | Line | Data |
---|---|---|
a1dfa0c6 XL |
1 | # Type inference |
2 | ||
6a06907d XL |
3 | <!-- toc --> |
4 | ||
a1dfa0c6 XL |
5 | Type inference is the process of automatic detection of the type of an |
6 | expression. | |
7 | ||
8 | It is what allows Rust to work with fewer or no type annotations, | |
9 | making things easier for users: | |
10 | ||
dc9dc135 | 11 | ```rust |
a1dfa0c6 XL |
12 | fn main() { |
13 | let mut things = vec![]; | |
dc9dc135 | 14 | things.push("thing"); |
a1dfa0c6 XL |
15 | } |
16 | ``` | |
17 | ||
dc9dc135 | 18 | Here, the type of `things` is *inferred* to be `Vec<&str>` because of the value |
a1dfa0c6 XL |
19 | we push into `things`. |
20 | ||
21 | The type inference is based on the standard Hindley-Milner (HM) type inference | |
22 | algorithm, but extended in various way to accommodate subtyping, region | |
23 | inference, and higher-ranked types. | |
24 | ||
25 | ## A note on terminology | |
26 | ||
27 | We use the notation `?T` to refer to inference variables, also called | |
28 | existential variables. | |
29 | ||
30 | We use the terms "region" and "lifetime" interchangeably. Both refer to | |
31 | the `'a` in `&'a T`. | |
32 | ||
33 | The term "bound region" refers to a region that is bound in a function | |
34 | signature, such as the `'a` in `for<'a> fn(&'a u32)`. A region is | |
35 | "free" if it is not bound. | |
36 | ||
37 | ## Creating an inference context | |
38 | ||
39 | You create and "enter" an inference context by doing something like | |
40 | the following: | |
41 | ||
42 | ```rust,ignore | |
43 | tcx.infer_ctxt().enter(|infcx| { | |
44 | // Use the inference context `infcx` here. | |
45 | }) | |
46 | ``` | |
47 | ||
dc9dc135 XL |
48 | Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some |
49 | fresh `'cx`, while `'tcx` is the same as outside the inference context. | |
a1dfa0c6 XL |
50 | (Again, see the [`ty` chapter][ty-ch] for more details on this setup.) |
51 | ||
74b04a01 XL |
52 | [ty-ch]: ty.html |
53 | ||
a1dfa0c6 XL |
54 | The `tcx.infer_ctxt` method actually returns a builder, which means |
55 | there are some kinds of configuration you can do before the `infcx` is | |
56 | created. See `InferCtxtBuilder` for more information. | |
57 | ||
58 | <a name="vars"></a> | |
59 | ||
60 | ## Inference variables | |
61 | ||
62 | The main purpose of the inference context is to house a bunch of | |
63 | **inference variables** – these represent types or regions whose precise | |
64 | value is not yet known, but will be uncovered as we perform type-checking. | |
65 | ||
66 | If you're familiar with the basic ideas of unification from H-M type | |
67 | systems, or logic languages like Prolog, this is the same concept. If | |
68 | you're not, you might want to read a tutorial on how H-M type | |
69 | inference works, or perhaps this blog post on | |
70 | [unification in the Chalk project]. | |
71 | ||
72 | [Unification in the Chalk project]: http://smallcultfollowing.com/babysteps/blog/2017/03/25/unification-in-chalk-part-1/ | |
73 | ||
6a06907d XL |
74 | All told, the inference context stores four kinds of inference variables |
75 | (as of <!-- date: 2018-01 --> January 2018): | |
a1dfa0c6 XL |
76 | |
77 | - Type variables, which come in three varieties: | |
78 | - General type variables (the most common). These can be unified with any | |
79 | type. | |
80 | - Integral type variables, which can only be unified with an integral type, | |
81 | and arise from an integer literal expression like `22`. | |
82 | - Float type variables, which can only be unified with a float type, and | |
83 | arise from a float literal expression like `22.0`. | |
84 | - Region variables, which represent lifetimes, and arise all over the place. | |
85 | ||
86 | All the type variables work in much the same way: you can create a new | |
87 | type variable, and what you get is `Ty<'tcx>` representing an | |
88 | unresolved type `?T`. Then later you can apply the various operations | |
89 | that the inferencer supports, such as equality or subtyping, and it | |
90 | will possibly **instantiate** (or **bind**) that `?T` to a specific | |
91 | value as a result. | |
92 | ||
93 | The region variables work somewhat differently, and are described | |
94 | below in a separate section. | |
95 | ||
96 | ## Enforcing equality / subtyping | |
97 | ||
98 | The most basic operations you can perform in the type inferencer is | |
99 | **equality**, which forces two types `T` and `U` to be the same. The | |
100 | recommended way to add an equality constraint is to use the `at` | |
101 | method, roughly like so: | |
102 | ||
103 | ```rust,ignore | |
104 | infcx.at(...).eq(t, u); | |
105 | ``` | |
106 | ||
107 | The first `at()` call provides a bit of context, i.e. why you are | |
108 | doing this unification, and in what environment, and the `eq` method | |
109 | performs the actual equality constraint. | |
110 | ||
111 | When you equate things, you force them to be precisely equal. Equating | |
112 | returns an `InferResult` – if it returns `Err(err)`, then equating | |
113 | failed, and the enclosing `TypeError` will tell you what went wrong. | |
114 | ||
115 | The success case is perhaps more interesting. The "primary" return | |
116 | type of `eq` is `()` – that is, when it succeeds, it doesn't return a | |
117 | value of any particular interest. Rather, it is executed for its | |
118 | side-effects of constraining type variables and so forth. However, the | |
119 | actual return type is not `()`, but rather `InferOk<()>`. The | |
120 | `InferOk` type is used to carry extra trait obligations – your job is | |
121 | to ensure that these are fulfilled (typically by enrolling them in a | |
122 | fulfillment context). See the [trait chapter] for more background on that. | |
123 | ||
124 | [trait chapter]: traits/resolution.html | |
125 | ||
126 | You can similarly enforce subtyping through `infcx.at(..).sub(..)`. The same | |
127 | basic concepts as above apply. | |
128 | ||
129 | ## "Trying" equality | |
130 | ||
131 | Sometimes you would like to know if it is *possible* to equate two | |
132 | types without error. You can test that with `infcx.can_eq` (or | |
133 | `infcx.can_sub` for subtyping). If this returns `Ok`, then equality | |
134 | is possible – but in all cases, any side-effects are reversed. | |
135 | ||
136 | Be aware, though, that the success or failure of these methods is always | |
137 | **modulo regions**. That is, two types `&'a u32` and `&'b u32` will | |
138 | return `Ok` for `can_eq`, even if `'a != 'b`. This falls out from the | |
139 | "two-phase" nature of how we solve region constraints. | |
140 | ||
141 | ## Snapshots | |
142 | ||
143 | As described in the previous section on `can_eq`, often it is useful | |
144 | to be able to do a series of operations and then roll back their | |
145 | side-effects. This is done for various reasons: one of them is to be | |
146 | able to backtrack, trying out multiple possibilities before settling | |
147 | on which path to take. Another is in order to ensure that a series of | |
148 | smaller changes take place atomically or not at all. | |
149 | ||
150 | To allow for this, the inference context supports a `snapshot` method. | |
151 | When you call it, it will start recording changes that occur from the | |
152 | operations you perform. When you are done, you can either invoke | |
153 | `rollback_to`, which will undo those changes, or else `confirm`, which | |
ba9703b0 | 154 | will make them permanent. Snapshots can be nested as long as you follow |
a1dfa0c6 XL |
155 | a stack-like discipline. |
156 | ||
157 | Rather than use snapshots directly, it is often helpful to use the | |
158 | methods like `commit_if_ok` or `probe` that encapsulate higher-level | |
159 | patterns. | |
160 | ||
161 | ## Subtyping obligations | |
162 | ||
163 | One thing worth discussing is subtyping obligations. When you force | |
164 | two types to be a subtype, like `?T <: i32`, we can often convert those | |
165 | into equality constraints. This follows from Rust's rather limited notion | |
166 | of subtyping: so, in the above case, `?T <: i32` is equivalent to `?T = i32`. | |
167 | ||
168 | However, in some cases we have to be more careful. For example, when | |
169 | regions are involved. So if you have `?T <: &'a i32`, what we would do | |
170 | is to first "generalize" `&'a i32` into a type with a region variable: | |
171 | `&'?b i32`, and then unify `?T` with that (`?T = &'?b i32`). We then | |
172 | relate this new variable with the original bound: | |
173 | ||
174 | ```text | |
175 | &'?b i32 <: &'a i32 | |
176 | ``` | |
177 | ||
178 | This will result in a region constraint (see below) of `'?b: 'a`. | |
179 | ||
180 | One final interesting case is relating two unbound type variables, | |
181 | like `?T <: ?U`. In that case, we can't make progress, so we enqueue | |
182 | an obligation `Subtype(?T, ?U)` and return it via the `InferOk` | |
183 | mechanism. You'll have to try again when more details about `?T` or | |
184 | `?U` are known. | |
185 | ||
186 | ## Region constraints | |
187 | ||
188 | Regions are inferenced somewhat differently from types. Rather than | |
189 | eagerly unifying things, we simply collect constraints as we go, but | |
190 | make (almost) no attempt to solve regions. These constraints have the | |
191 | form of an "outlives" constraint: | |
192 | ||
193 | ```text | |
194 | 'a: 'b | |
195 | ``` | |
196 | ||
197 | Actually the code tends to view them as a subregion relation, but it's the same | |
198 | idea: | |
199 | ||
200 | ```text | |
201 | 'b <= 'a | |
202 | ``` | |
203 | ||
204 | (There are various other kinds of constraints, such as "verifys"; see | |
ba9703b0 | 205 | the [`region_constraints`] module for details.) |
a1dfa0c6 XL |
206 | |
207 | There is one case where we do some amount of eager unification. If you have an | |
208 | equality constraint between two regions | |
209 | ||
210 | ```text | |
211 | 'a = 'b | |
212 | ``` | |
213 | ||
214 | we will record that fact in a unification table. You can then use | |
ba9703b0 | 215 | [`opportunistic_resolve_var`] to convert `'b` to `'a` (or vice |
a1dfa0c6 XL |
216 | versa). This is sometimes needed to ensure termination of fixed-point |
217 | algorithms. | |
218 | ||
ba9703b0 XL |
219 | [`region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/index.html |
220 | [`opportunistic_resolve_var`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/struct.RegionConstraintCollector.html#method.opportunistic_resolve_var | |
221 | ||
a1dfa0c6 XL |
222 | ## Extracting region constraints |
223 | ||
224 | Ultimately, region constraints are only solved at the very end of | |
225 | type-checking, once all other constraints are known. There are two | |
226 | ways to solve region constraints right now: lexical and | |
227 | non-lexical. Eventually there will only be one. | |
228 | ||
229 | To solve **lexical** region constraints, you invoke | |
ba9703b0 XL |
230 | [`resolve_regions_and_report_errors`]. This "closes" the region |
231 | constraint process and invokes the [`lexical_region_resolve`] code. Once | |
a1dfa0c6 XL |
232 | this is done, any further attempt to equate or create a subtyping |
233 | relationship will yield an ICE. | |
234 | ||
235 | Non-lexical region constraints are not handled within the inference | |
236 | context. Instead, the NLL solver (actually, the MIR type-checker) | |
ba9703b0 | 237 | invokes [`take_and_reset_region_constraints`] periodically. This |
a1dfa0c6 XL |
238 | extracts all of the outlives constraints from the region solver, but |
239 | leaves the set of variables intact. This is used to get *just* the | |
240 | region constraints that resulted from some particular point in the | |
241 | program, since the NLL solver needs to know not just *what* regions | |
ba9703b0 XL |
242 | were subregions, but also *where*. Finally, the NLL solver invokes |
243 | [`take_region_var_origins`], which "closes" the region constraint | |
a1dfa0c6 XL |
244 | process in the same way as normal solving. |
245 | ||
ba9703b0 XL |
246 | [`resolve_regions_and_report_errors`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.resolve_regions_and_report_errors |
247 | [`lexical_region_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/lexical_region_resolve/index.html | |
248 | [`take_and_reset_region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_and_reset_region_constraints | |
249 | [`take_region_var_origins`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_region_var_origins | |
250 | ||
a1dfa0c6 XL |
251 | ## Lexical region resolution |
252 | ||
253 | Lexical region resolution is done by initially assigning each region | |
254 | variable to an empty value. We then process each outlives constraint | |
255 | repeatedly, growing region variables until a fixed-point is reached. | |
256 | Region variables can be grown using a least-upper-bound relation on | |
257 | the region lattice in a fairly straightforward fashion. |