[rustc.git] / src / doc / rustc-dev-guide / src / type-inference.md

# Type inference

<!-- toc -->

Type inference is the process of automatic detection of the type of an
expression.

It is what allows Rust to work with fewer or no type annotations,
making things easier for users:

```rust
fn main() {
    let mut things = vec![];
    things.push("thing");
}
```

Here, the type of `things` is *inferred* to be `Vec<&str>` because of the value
we push into `things`.

The type inference is based on the standard Hindley-Milner (HM) type inference
algorithm, but extended in various way to accommodate subtyping, region
inference, and higher-ranked types.

## A note on terminology

We use the notation `?T` to refer to inference variables, also called
existential variables.

We use the terms "region" and "lifetime" interchangeably. Both refer to
the `'a` in `&'a T`.

The term "bound region" refers to a region that is bound in a function
signature, such as the `'a` in `for<'a> fn(&'a u32)`. A region is
"free" if it is not bound.

## Creating an inference context

You create and "enter" an inference context by doing something like
the following:

```rust,ignore
tcx.infer_ctxt().enter(|infcx| {
    // Use the inference context `infcx` here.
})
```

Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some
fresh `'cx`, while `'tcx` is the same as outside the inference context.
(Again, see the [`ty` chapter][ty-ch] for more details on this setup.)

[ty-ch]: ty.html

The `tcx.infer_ctxt` method actually returns a builder, which means
there are some kinds of configuration you can do before the `infcx` is
created. See `InferCtxtBuilder` for more information.

<a name="vars"></a>

## Inference variables

The main purpose of the inference context is to house a bunch of
**inference variables** – these represent types or regions whose precise
value is not yet known, but will be uncovered as we perform type-checking.

If you're familiar with the basic ideas of unification from H-M type
systems, or logic languages like Prolog, this is the same concept. If
you're not, you might want to read a tutorial on how H-M type
inference works, or perhaps this blog post on
[unification in the Chalk project].

[Unification in the Chalk project]: http://smallcultfollowing.com/babysteps/blog/2017/03/25/unification-in-chalk-part-1/

All told, the inference context stores four kinds of inference variables
(as of <!-- date: 2018-01 --> January 2018):

- Type variables, which come in three varieties:
  - General type variables (the most common). These can be unified with any
    type.
  - Integral type variables, which can only be unified with an integral type,
    and arise from an integer literal expression like `22`.
  - Float type variables, which can only be unified with a float type, and
    arise from a float literal expression like `22.0`.
- Region variables, which represent lifetimes, and arise all over the place.

All the type variables work in much the same way: you can create a new
type variable, and what you get is `Ty<'tcx>` representing an
unresolved type `?T`. Then later you can apply the various operations
that the inferencer supports, such as equality or subtyping, and it
will possibly **instantiate** (or **bind**) that `?T` to a specific
value as a result.

The region variables work somewhat differently, and are described
below in a separate section.

## Enforcing equality / subtyping

The most basic operations you can perform in the type inferencer is
**equality**, which forces two types `T` and `U` to be the same. The
recommended way to add an equality constraint is to use the `at`
method, roughly like so:

```rust,ignore
infcx.at(...).eq(t, u);
```

The first `at()` call provides a bit of context, i.e. why you are
doing this unification, and in what environment, and the `eq` method
performs the actual equality constraint.

When you equate things, you force them to be precisely equal. Equating
returns an `InferResult` – if it returns `Err(err)`, then equating
failed, and the enclosing `TypeError` will tell you what went wrong.

The success case is perhaps more interesting. The "primary" return
type of `eq` is `()` – that is, when it succeeds, it doesn't return a
value of any particular interest. Rather, it is executed for its
side-effects of constraining type variables and so forth. However, the
actual return type is not `()`, but rather `InferOk<()>`. The
`InferOk` type is used to carry extra trait obligations – your job is
to ensure that these are fulfilled (typically by enrolling them in a
fulfillment context). See the [trait chapter] for more background on that.

[trait chapter]: traits/resolution.html

You can similarly enforce subtyping through `infcx.at(..).sub(..)`. The same
basic concepts as above apply.

## "Trying" equality

Sometimes you would like to know if it is *possible* to equate two
types without error.  You can test that with `infcx.can_eq` (or
`infcx.can_sub` for subtyping). If this returns `Ok`, then equality
is possible – but in all cases, any side-effects are reversed.

Be aware, though, that the success or failure of these methods is always
**modulo regions**. That is, two types `&'a u32` and `&'b u32` will
return `Ok` for `can_eq`, even if `'a != 'b`.  This falls out from the
"two-phase" nature of how we solve region constraints.

## Snapshots

As described in the previous section on `can_eq`, often it is useful
to be able to do a series of operations and then roll back their
side-effects. This is done for various reasons: one of them is to be
able to backtrack, trying out multiple possibilities before settling
on which path to take. Another is in order to ensure that a series of
smaller changes take place atomically or not at all.

To allow for this, the inference context supports a `snapshot` method.
When you call it, it will start recording changes that occur from the
operations you perform. When you are done, you can either invoke
`rollback_to`, which will undo those changes, or else `confirm`, which
will make them permanent. Snapshots can be nested as long as you follow
a stack-like discipline.

Rather than use snapshots directly, it is often helpful to use the
methods like `commit_if_ok` or `probe` that encapsulate higher-level
patterns.

## Subtyping obligations

One thing worth discussing is subtyping obligations. When you force
two types to be a subtype, like `?T <: i32`, we can often convert those
into equality constraints. This follows from Rust's rather limited notion
of subtyping: so, in the above case, `?T <: i32` is equivalent to `?T = i32`.

However, in some cases we have to be more careful. For example, when
regions are involved. So if you have `?T <: &'a i32`, what we would do
is to first "generalize" `&'a i32` into a type with a region variable:
`&'?b i32`, and then unify `?T` with that (`?T = &'?b i32`). We then
relate this new variable with the original bound:

```text
&'?b i32 <: &'a i32
```

This will result in a region constraint (see below) of `'?b: 'a`.

One final interesting case is relating two unbound type variables,
like `?T <: ?U`.  In that case, we can't make progress, so we enqueue
an obligation `Subtype(?T, ?U)` and return it via the `InferOk`
mechanism. You'll have to try again when more details about `?T` or
`?U` are known.

## Region constraints

Regions are inferenced somewhat differently from types. Rather than
eagerly unifying things, we simply collect constraints as we go, but
make (almost) no attempt to solve regions. These constraints have the
form of an "outlives" constraint:

```text
'a: 'b
```

Actually the code tends to view them as a subregion relation, but it's the same
idea:

```text
'b <= 'a
```

(There are various other kinds of constraints, such as "verifys"; see
the [`region_constraints`] module for details.)

There is one case where we do some amount of eager unification. If you have an
equality constraint between two regions

```text
'a = 'b
```

we will record that fact in a unification table. You can then use
[`opportunistic_resolve_var`] to convert `'b` to `'a` (or vice
versa). This is sometimes needed to ensure termination of fixed-point
algorithms.

[`region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/index.html
[`opportunistic_resolve_var`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/struct.RegionConstraintCollector.html#method.opportunistic_resolve_var

## Extracting region constraints

Ultimately, region constraints are only solved at the very end of
type-checking, once all other constraints are known. There are two
ways to solve region constraints right now: lexical and
non-lexical. Eventually there will only be one.

To solve **lexical** region constraints, you invoke
[`resolve_regions_and_report_errors`].  This "closes" the region
constraint process and invokes the [`lexical_region_resolve`] code. Once
this is done, any further attempt to equate or create a subtyping
relationship will yield an ICE.

Non-lexical region constraints are not handled within the inference
context. Instead, the NLL solver (actually, the MIR type-checker)
invokes [`take_and_reset_region_constraints`] periodically. This
extracts all of the outlives constraints from the region solver, but
leaves the set of variables intact. This is used to get *just* the
region constraints that resulted from some particular point in the
program, since the NLL solver needs to know not just *what* regions
were subregions, but also *where*. Finally, the NLL solver invokes
[`take_region_var_origins`], which "closes" the region constraint
process in the same way as normal solving.

[`resolve_regions_and_report_errors`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.resolve_regions_and_report_errors
[`lexical_region_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/lexical_region_resolve/index.html
[`take_and_reset_region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_and_reset_region_constraints
[`take_region_var_origins`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_region_var_origins

## Lexical region resolution

Lexical region resolution is done by initially assigning each region
variable to an empty value. We then process each outlives constraint
repeatedly, growing region variables until a fixed-point is reached.
Region variables can be grown using a least-upper-bound relation on
the region lattice in a fairly straightforward fashion.
Commit	Line	Data
a1dfa0c6 XL	1	# Type inference
a1dfa0c6 XL	2
6a06907d XL	3	<!-- toc -->
6a06907d XL	4
a1dfa0c6 XL	5	Type inference is the process of automatic detection of the type of an
	6	expression.
	7
	8	It is what allows Rust to work with fewer or no type annotations,
	9	making things easier for users:
	10
dc9dc135	11	```rust
a1dfa0c6 XL	12	fn main() {
a1dfa0c6 XL	13	let mut things = vec![];
dc9dc135	14	things.push("thing");
a1dfa0c6 XL	15	}
	16	```
	17
dc9dc135	18	Here, the type of `things` is inferred to be `Vec<&str>` because of the value
a1dfa0c6 XL	19	we push into `things`.
	20
	21	The type inference is based on the standard Hindley-Milner (HM) type inference
	22	algorithm, but extended in various way to accommodate subtyping, region
	23	inference, and higher-ranked types.
	24
	25	## A note on terminology
	26
	27	We use the notation `?T` to refer to inference variables, also called
	28	existential variables.
	29
	30	We use the terms "region" and "lifetime" interchangeably. Both refer to
	31	the `'a` in `&'a T`.
	32
	33	The term "bound region" refers to a region that is bound in a function
	34	signature, such as the `'a` in `for<'a> fn(&'a u32)`. A region is
	35	"free" if it is not bound.
	36
	37	## Creating an inference context
	38
	39	You create and "enter" an inference context by doing something like
	40	the following:
	41
	42	```rust,ignore
	43	tcx.infer_ctxt().enter(\|infcx\| {
	44	// Use the inference context `infcx` here.
	45	})
	46	```
	47
dc9dc135 XL	48	Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some
dc9dc135 XL	49	fresh `'cx`, while `'tcx` is the same as outside the inference context.
a1dfa0c6 XL	50	(Again, see the [`ty` chapter][ty-ch] for more details on this setup.)
a1dfa0c6 XL	51
74b04a01 XL	52	[ty-ch]: ty.html
74b04a01 XL	53
a1dfa0c6 XL	54	The `tcx.infer_ctxt` method actually returns a builder, which means
	55	there are some kinds of configuration you can do before the `infcx` is
	56	created. See `InferCtxtBuilder` for more information.
	57
	58	<a name="vars"></a>
	59
	60	## Inference variables
	61
	62	The main purpose of the inference context is to house a bunch of
	63	inference variables – these represent types or regions whose precise
	64	value is not yet known, but will be uncovered as we perform type-checking.
	65
	66	If you're familiar with the basic ideas of unification from H-M type
	67	systems, or logic languages like Prolog, this is the same concept. If
	68	you're not, you might want to read a tutorial on how H-M type
	69	inference works, or perhaps this blog post on
	70	[unification in the Chalk project].
	71
	72	[Unification in the Chalk project]: http://smallcultfollowing.com/babysteps/blog/2017/03/25/unification-in-chalk-part-1/
	73
6a06907d XL	74	All told, the inference context stores four kinds of inference variables
6a06907d XL	75	(as of <!-- date: 2018-01 --> January 2018):
a1dfa0c6 XL	76
	77	- Type variables, which come in three varieties:
	78	- General type variables (the most common). These can be unified with any
	79	type.
	80	- Integral type variables, which can only be unified with an integral type,
	81	and arise from an integer literal expression like `22`.
	82	- Float type variables, which can only be unified with a float type, and
	83	arise from a float literal expression like `22.0`.
	84	- Region variables, which represent lifetimes, and arise all over the place.
	85
	86	All the type variables work in much the same way: you can create a new
	87	type variable, and what you get is `Ty<'tcx>` representing an
	88	unresolved type `?T`. Then later you can apply the various operations
	89	that the inferencer supports, such as equality or subtyping, and it
	90	will possibly instantiate (or bind) that `?T` to a specific
	91	value as a result.
	92
	93	The region variables work somewhat differently, and are described
	94	below in a separate section.
	95
	96	## Enforcing equality / subtyping
	97
	98	The most basic operations you can perform in the type inferencer is
	99	equality, which forces two types `T` and `U` to be the same. The
	100	recommended way to add an equality constraint is to use the `at`
	101	method, roughly like so:
	102
	103	```rust,ignore
	104	infcx.at(...).eq(t, u);
	105	```
	106
	107	The first `at()` call provides a bit of context, i.e. why you are
	108	doing this unification, and in what environment, and the `eq` method
	109	performs the actual equality constraint.
	110
	111	When you equate things, you force them to be precisely equal. Equating
	112	returns an `InferResult` – if it returns `Err(err)`, then equating
	113	failed, and the enclosing `TypeError` will tell you what went wrong.
	114
	115	The success case is perhaps more interesting. The "primary" return
	116	type of `eq` is `()` – that is, when it succeeds, it doesn't return a
	117	value of any particular interest. Rather, it is executed for its
	118	side-effects of constraining type variables and so forth. However, the
	119	actual return type is not `()`, but rather `InferOk<()>`. The
	120	`InferOk` type is used to carry extra trait obligations – your job is
	121	to ensure that these are fulfilled (typically by enrolling them in a
	122	fulfillment context). See the [trait chapter] for more background on that.
	123
	124	[trait chapter]: traits/resolution.html
	125
	126	You can similarly enforce subtyping through `infcx.at(..).sub(..)`. The same
	127	basic concepts as above apply.
	128
	129	## "Trying" equality
	130
	131	Sometimes you would like to know if it is possible to equate two
	132	types without error. You can test that with `infcx.can_eq` (or
	133	`infcx.can_sub` for subtyping). If this returns `Ok`, then equality
	134	is possible – but in all cases, any side-effects are reversed.
	135
	136	Be aware, though, that the success or failure of these methods is always
	137	modulo regions. That is, two types `&'a u32` and `&'b u32` will
	138	return `Ok` for `can_eq`, even if `'a != 'b`. This falls out from the
	139	"two-phase" nature of how we solve region constraints.
140
141	## Snapshots
142
143	As described in the previous section on `can_eq`, often it is useful
144	to be able to do a series of operations and then roll back their
145	side-effects. This is done for various reasons: one of them is to be
146	able to backtrack, trying out multiple possibilities before settling
147	on which path to take. Another is in order to ensure that a series of
148	smaller changes take place atomically or not at all.
149
150	To allow for this, the inference context supports a `snapshot` method.
151	When you call it, it will start recording changes that occur from the
152	operations you perform. When you are done, you can either invoke
153	`rollback_to`, which will undo those changes, or else `confirm`, which
ba9703b0	154	will make them permanent. Snapshots can be nested as long as you follow
a1dfa0c6 XL	155	a stack-like discipline.
	156
	157	Rather than use snapshots directly, it is often helpful to use the
	158	methods like `commit_if_ok` or `probe` that encapsulate higher-level
	159	patterns.
	160
	161	## Subtyping obligations
	162
	163	One thing worth discussing is subtyping obligations. When you force
	164	two types to be a subtype, like `?T <: i32`, we can often convert those
	165	into equality constraints. This follows from Rust's rather limited notion
	166	of subtyping: so, in the above case, `?T <: i32` is equivalent to `?T = i32`.
	167
	168	However, in some cases we have to be more careful. For example, when
	169	regions are involved. So if you have `?T <: &'a i32`, what we would do
	170	is to first "generalize" `&'a i32` into a type with a region variable:
	171	`&'?b i32`, and then unify `?T` with that (`?T = &'?b i32`). We then
	172	relate this new variable with the original bound:
	173
	174	```text
	175	&'?b i32 <: &'a i32
	176	```
	177
	178	This will result in a region constraint (see below) of `'?b: 'a`.
	179
	180	One final interesting case is relating two unbound type variables,
	181	like `?T <: ?U`. In that case, we can't make progress, so we enqueue
	182	an obligation `Subtype(?T, ?U)` and return it via the `InferOk`
	183	mechanism. You'll have to try again when more details about `?T` or
	184	`?U` are known.
	185
	186	## Region constraints
	187
	188	Regions are inferenced somewhat differently from types. Rather than
	189	eagerly unifying things, we simply collect constraints as we go, but
	190	make (almost) no attempt to solve regions. These constraints have the
	191	form of an "outlives" constraint:
	192
	193	```text
	194	'a: 'b
	195	```
	196
	197	Actually the code tends to view them as a subregion relation, but it's the same
	198	idea:
	199
	200	```text
	201	'b <= 'a
	202	```
	203
	204	(There are various other kinds of constraints, such as "verifys"; see
ba9703b0	205	the [`region_constraints`] module for details.)
a1dfa0c6 XL	206
	207	There is one case where we do some amount of eager unification. If you have an
	208	equality constraint between two regions
	209
	210	```text
	211	'a = 'b
	212	```
	213
	214	we will record that fact in a unification table. You can then use
ba9703b0	215	[`opportunistic_resolve_var`] to convert `'b` to `'a` (or vice
a1dfa0c6 XL	216	versa). This is sometimes needed to ensure termination of fixed-point
	217	algorithms.
	218
ba9703b0 XL	219	[`region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/index.html
	220	[`opportunistic_resolve_var`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/region_constraints/struct.RegionConstraintCollector.html#method.opportunistic_resolve_var
	221
a1dfa0c6 XL	222	## Extracting region constraints
	223
	224	Ultimately, region constraints are only solved at the very end of
	225	type-checking, once all other constraints are known. There are two
	226	ways to solve region constraints right now: lexical and
	227	non-lexical. Eventually there will only be one.
	228
	229	To solve lexical region constraints, you invoke
ba9703b0 XL	230	[`resolve_regions_and_report_errors`]. This "closes" the region
ba9703b0 XL	231	constraint process and invokes the [`lexical_region_resolve`] code. Once
a1dfa0c6 XL	232	this is done, any further attempt to equate or create a subtyping
	233	relationship will yield an ICE.
	234
	235	Non-lexical region constraints are not handled within the inference
	236	context. Instead, the NLL solver (actually, the MIR type-checker)
ba9703b0	237	invokes [`take_and_reset_region_constraints`] periodically. This
a1dfa0c6 XL	238	extracts all of the outlives constraints from the region solver, but
	239	leaves the set of variables intact. This is used to get just the
	240	region constraints that resulted from some particular point in the
	241	program, since the NLL solver needs to know not just what regions
ba9703b0 XL	242	were subregions, but also where. Finally, the NLL solver invokes
ba9703b0 XL	243	[`take_region_var_origins`], which "closes" the region constraint
a1dfa0c6 XL	244	process in the same way as normal solving.
a1dfa0c6 XL	245
ba9703b0 XL	246	[`resolve_regions_and_report_errors`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.resolve_regions_and_report_errors
	247	[`lexical_region_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/lexical_region_resolve/index.html
	248	[`take_and_reset_region_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_and_reset_region_constraints
	249	[`take_region_var_origins`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.take_region_var_origins
	250
a1dfa0c6 XL	251	## Lexical region resolution
	252
	253	Lexical region resolution is done by initially assigning each region
	254	variable to an empty value. We then process each outlives constraint
	255	repeatedly, growing region variables until a fixed-point is reached.
	256	Region variables can be grown using a least-upper-bound relation on
	257	the region lattice in a fairly straightforward fashion.