src/doc/rustc-dev-guide/src/type-inference.md

   1 # Type inference
   2
   3 <!-- toc -->
   4
   5 Type inference is the process of automatic detection of the type of an
   6 expression.
   7
   8 It is what allows Rust to work with fewer or no type annotations,
   9 making things easier for users:
  10
  11 ```rust
  12 fn main() {
  13     let mut things = vec![];
  14     things.push("thing");
  15 }
  16 ```
  17
  18 Here, the type of `things` is *inferred* to be `Vec<&str>` because of the value
  19 we push into `things`.
  20
  21 The type inference is based on the standard Hindley-Milner (HM) type inference
  22 algorithm, but extended in various way to accommodate subtyping, region
  23 inference, and higher-ranked types.
  24
  25 ## A note on terminology
  26
  27 We use the notation `?T` to refer to inference variables, also called
  28 existential variables.
  29
  30 We use the terms "region" and "lifetime" interchangeably. Both refer to
  31 the `'a` in `&'a T`.
  32
  33 The term "bound region" refers to a region that is bound in a function
  34 signature, such as the `'a` in `for<'a> fn(&'a u32)`. A region is
  35 "free" if it is not bound.
  36
  37 ## Creating an inference context
  38
  39 You create and "enter" an inference context by doing something like
  40 the following:
  41
  42 ```rust,ignore
  43 tcx.infer_ctxt().enter(|infcx| {
  44     // Use the inference context `infcx` here.
  45 })
  46 ```
  47
  48 Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some
  49 fresh `'cx`, while `'tcx` is the same as outside the inference context.
  50 (Again, see the [`ty` chapter][ty-ch] for more details on this setup.)
  51
  52 [ty-ch]: ty.html
  53
  54 The `tcx.infer_ctxt` method actually returns a builder, which means
  55 there are some kinds of configuration you can do before the `infcx` is
  56 created. See `InferCtxtBuilder` for more information.
  57
  58 <a name="vars"></a>
  59
  60 ## Inference variables
  61
  62 The main purpose of the inference context is to house a bunch of
  63 **inference variables** – these represent types or regions whose precise
  64 value is not yet known, but will be uncovered as we perform type-checking.
  65
  66 If you're familiar with the basic ideas of unification from H-M type
  67 systems, or logic languages like Prolog, this is the same concept. If
  68 you're not, you might want to read a tutorial on how H-M type
  69 inference works, or perhaps this blog post on
  70 [unification in the Chalk project].
  71
  72 [Unification in the Chalk project]: http://smallcultfollowing.com/babysteps/blog/2017/03/25/unification-in-chalk-part-1/
  73
  74 All told, the inference context stores four kinds of inference variables
  75 (as of <!-- date: 2018-01 --> January 2018):
  76
  77 - Type variables, which come in three varieties:
  78   - General type variables (the most common). These can be unified with any
  79     type.
  80   - Integral type variables, which can only be unified with an integral type,
  81     and arise from an integer literal expression like `22`.
  82   - Float type variables, which can only be unified with a float type, and
  83     arise from a float literal expression like `22.0`.
  84 - Region variables, which represent lifetimes, and arise all over the place.
  85
  86 All the type variables work in much the same way: you can create a new
  87 type variable, and what you get is `Ty<'tcx>` representing an
  88 unresolved type `?T`. Then later you can apply the various operations
  89 that the inferencer supports, such as equality or subtyping, and it
  90 will possibly **instantiate** (or **bind**) that `?T` to a specific
  91 value as a result.
  92
  93 The region variables work somewhat differently, and are described
  94 below in a separate section.
  95
  96 ## Enforcing equality / subtyping
  97
  98 The most basic operations you can perform in the type inferencer is
  99 **equality**, which forces two types `T` and `U` to be the same. The
 100 recommended way to add an equality constraint is to use the `at`
 101 method, roughly like so:
 102
 103 ```rust,ignore
 104 infcx.at(...).eq(t, u);
 105 ```
 106
 107 The first `at()` call provides a bit of context, i.e. why you are
 108 doing this unification, and in what environment, and the `eq` method
 109 performs the actual equality constraint.
 110
 111 When you equate things, you force them to be precisely equal. Equating
 112 returns an `InferResult` – if it returns `Err(err)`, then equating
 113 failed, and the enclosing `TypeError` will tell you what went wrong.
 114
 115 The success case is perhaps more interesting. The "primary" return
 116 type of `eq` is `()` – that is, when it succeeds, it doesn't return a
 117 value of any particular interest. Rather, it is executed for its
 118 side-effects of constraining type variables and so forth. However, the
 119 actual return type is not `()`, but rather `InferOk<()>`. The
 120 `InferOk` type is used to carry extra trait obligations – your job is
 121 to ensure that these are fulfilled (typically by enrolling them in a
 122 fulfillment context). See the [trait chapter] for more background on that.
 123
 124 [trait chapter]: traits/resolution.html
 125
 126 You can similarly enforce subtyping through `infcx.at(..).sub(..)`. The same
 127 basic concepts as above apply.
 128
 129 ## "Trying" equality
 130
 131 Sometimes you would like to know if it is *possible* to equate two
 132 types without error.  You can test that with `infcx.can_eq` (or
 133 `infcx.can_sub` for subtyping). If this returns `Ok`, then equality
 134 is possible – but in all cases, any side-effects are reversed.
 135
 136 Be aware, though, that the success or failure of these methods is always
 137 **modulo regions**. That is, two types `&'a u32` and `&'b u32` will
 138 return `Ok` for `can_eq`, even if `'a != 'b`.  This falls out from the
 139 "two-phase" nature of how we solve region constraints.
 140
 141 ## Snapshots
 142
 143 As described in the previous section on `can_eq`, often it is useful
 144 to be able to do a series of operations and then roll back their
 145 side-effects. This is done for various reasons: one of them is to be
 146 able to backtrack, trying out multiple possibilities before settling
 147 on which path to take. Another is in order to ensure that a series of
 148 smaller changes take place atomically or not at all.
 149
 150 To allow for this, the inference context supports a `snapshot` method.
 151 When you call it, it will start recording changes that occur from the
 152 operations you perform. When you are done, you can either invoke
 153 `rollback_to`, which will undo those changes, or else `confirm`, which
 154 will make them permanent. Snapshots can be nested as long as you follow
 155 a stack-like discipline.
 156
 157 Rather than use snapshots directly, it is often helpful to use the
 158 methods like `commit_if_ok` or `probe` that encapsulate higher-level
 159 patterns.
 160
 161 ## Subtyping obligations
 162
 163 One thing worth discussing is subtyping obligations. When you force
 164 two types to be a subtype, like `?T <: i32`, we can often convert those
 165 into equality constraints. This follows from Rust's rather limited notion
 166 of subtyping: so, in the above case, `?T <: i32` is equivalent to `?T = i32`.
 167
 168 However, in some cases we have to be more careful. For example, when
 169 regions are involved. So if you have `?T <: &'a i32`, what we would do
 170 is to first "generalize" `&'a i32` into a type with a region variable:
 171 `&'?b i32`, and then unify `?T` with that (`?T = &'?b i32`). We then
 172 relate this new variable with the original bound:
 173
 174 ```text
 175 &'?b i32 <: &'a i32
 176 ```
 177
 178 This will result in a region constraint (see below) of `'?b: 'a`.
 179
 180 One final interesting case is relating two unbound type variables,
 181 like `?T <: ?U`.  In that case, we can't make progress, so we enqueue
 182 an obligation `Subtype(?T, ?U)` and return it via the `InferOk`
 183 mechanism. You'll have to try again when more details about `?T` or
 184 `?U` are known.
 185
 186 ## Region constraints
 187
 188 Regions are inferenced somewhat differently from types. Rather than
 189 eagerly unifying things, we simply collect constraints as we go, but
 190 make (almost) no attempt to solve regions. These constraints have the
 191 form of an "outlives" constraint:
 192
 193 ```text
 194 'a: 'b
 195 ```
 196
 197 Actually the code tends to view them as a subregion relation, but it's the same
 198 idea:
 199
 200 ```text
 201 'b <= 'a
 202 ```
 203
 204 (There are various other kinds of constraints, such as "verifys"; see
 205 the [`region_constraints`] module for details.)
 206
 207 There is one case where we do some amount of eager unification. If you have an
 208 equality constraint between two regions
 209
 210 ```text
 211 'a = 'b
 212 ```
 213
 214 we will record that fact in a unification table. You can then use
 215 [`opportunistic_resolve_var`] to convert `'b` to `'a` (or vice
 216 versa). This is sometimes needed to ensure termination of fixed-point
 217 algorithms.
 218
 219 [`region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/index.html
 220 [`opportunistic_resolve_var`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/struct.RegionConstraintCollector.html#method.opportunistic_resolve_var
 221
 222 ## Extracting region constraints
 223
 224 Ultimately, region constraints are only solved at the very end of
 225 type-checking, once all other constraints are known. There are two
 226 ways to solve region constraints right now: lexical and
 227 non-lexical. Eventually there will only be one.
 228
 229 To solve **lexical** region constraints, you invoke
 230 [`resolve_regions_and_report_errors`].  This "closes" the region
 231 constraint process and invokes the [`lexical_region_resolve`] code. Once
 232 this is done, any further attempt to equate or create a subtyping
 233 relationship will yield an ICE.
 234
 235 Non-lexical region constraints are not handled within the inference
 236 context. Instead, the NLL solver (actually, the MIR type-checker)
 237 invokes [`take_and_reset_region_constraints`] periodically. This
 238 extracts all of the outlives constraints from the region solver, but
 239 leaves the set of variables intact. This is used to get *just* the
 240 region constraints that resulted from some particular point in the
 241 program, since the NLL solver needs to know not just *what* regions
 242 were subregions, but also *where*. Finally, the NLL solver invokes
 243 [`take_region_var_origins`], which "closes" the region constraint
 244 process in the same way as normal solving.
 245
 246 [`resolve_regions_and_report_errors`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.resolve_regions_and_report_errors
 247 [`lexical_region_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/lexical_region_resolve/index.html
 248 [`take_and_reset_region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_and_reset_region_constraints
 249 [`take_region_var_origins`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_region_var_origins
 250
 251 ## Lexical region resolution
 252
 253 Lexical region resolution is done by initially assigning each region
 254 variable to an empty value. We then process each outlives constraint
 255 repeatedly, growing region variables until a fixed-point is reached.
 256 Region variables can be grown using a least-upper-bound relation on
 257 the region lattice in a fairly straightforward fashion.