]> git.proxmox.com Git - rustc.git/blame - src/librustc/middle/infer/region_inference/README.md
Imported Upstream version 1.8.0+dfsg1
[rustc.git] / src / librustc / middle / infer / region_inference / README.md
CommitLineData
85aaf69f
SL
1Region inference
2
3# Terminology
4
b039eaaf 5Note that we use the terms region and lifetime interchangeably.
85aaf69f
SL
6
7# Introduction
8
9Region inference uses a somewhat more involved algorithm than type
b039eaaf 10inference. It is not the most efficient thing ever written though it
85aaf69f
SL
11seems to work well enough in practice (famous last words). The reason
12that we use a different algorithm is because, unlike with types, it is
13impractical to hand-annotate with regions (in some cases, there aren't
14even the requisite syntactic forms). So we have to get it right, and
15it's worth spending more time on a more involved analysis. Moreover,
16regions are a simpler case than types: they don't have aggregate
17structure, for example.
18
19Unlike normal type inference, which is similar in spirit to H-M and thus
20works progressively, the region type inference works by accumulating
21constraints over the course of a function. Finally, at the end of
22processing a function, we process and solve the constraints all at
23once.
24
25The constraints are always of one of three possible forms:
26
b039eaaf
SL
27- `ConstrainVarSubVar(Ri, Rj)` states that region variable Ri must be
28 a subregion of Rj
29- `ConstrainRegSubVar(R, Ri)` states that the concrete region R (which
30 must not be a variable) must be a subregion of the variable Ri
31- `ConstrainVarSubReg(Ri, R)` states the variable Ri shoudl be less
32 than the concrete region R. This is kind of deprecated and ought to
33 be replaced with a verify (they essentially play the same role).
34
35In addition to constraints, we also gather up a set of "verifys"
36(what, you don't think Verify is a noun? Get used to it my
37friend!). These represent relations that must hold but which don't
38influence inference proper. These take the form of:
39
40- `VerifyRegSubReg(Ri, Rj)` indicates that Ri <= Rj must hold,
41 where Rj is not an inference variable (and Ri may or may not contain
42 one). This doesn't influence inference because we will already have
43 inferred Ri to be as small as possible, so then we just test whether
44 that result was less than Rj or not.
45- `VerifyGenericBound(R, Vb)` is a more complex expression which tests
46 that the region R must satisfy the bound `Vb`. The bounds themselves
47 may have structure like "must outlive one of the following regions"
48 or "must outlive ALL of the following regions. These bounds arise
49 from constraints like `T: 'a` -- if we know that `T: 'b` and `T: 'c`
50 (say, from where clauses), then we can conclude that `T: 'a` if `'b:
51 'a` *or* `'c: 'a`.
85aaf69f
SL
52
53# Building up the constraints
54
55Variables and constraints are created using the following methods:
56
57- `new_region_var()` creates a new, unconstrained region variable;
b039eaaf
SL
58- `make_subregion(Ri, Rj)` states that Ri is a subregion of Rj
59- `lub_regions(Ri, Rj) -> Rk` returns a region Rk which is
60 the smallest region that is greater than both Ri and Rj
61- `glb_regions(Ri, Rj) -> Rk` returns a region Rk which is
62 the greatest region that is smaller than both Ri and Rj
85aaf69f
SL
63
64The actual region resolution algorithm is not entirely
65obvious, though it is also not overly complex.
66
67## Snapshotting
68
69It is also permitted to try (and rollback) changes to the graph. This
70is done by invoking `start_snapshot()`, which returns a value. Then
71later you can call `rollback_to()` which undoes the work.
72Alternatively, you can call `commit()` which ends all snapshots.
73Snapshots can be recursive---so you can start a snapshot when another
74is in progress, but only the root snapshot can "commit".
75
85aaf69f
SL
76## The problem
77
78Basically our input is a directed graph where nodes can be divided
79into two categories: region variables and concrete regions. Each edge
80`R -> S` in the graph represents a constraint that the region `R` is a
81subregion of the region `S`.
82
83Region variable nodes can have arbitrary degree. There is one region
84variable node per region variable.
85
86Each concrete region node is associated with some, well, concrete
87region: e.g., a free lifetime, or the region for a particular scope.
88Note that there may be more than one concrete region node for a
89particular region value. Moreover, because of how the graph is built,
90we know that all concrete region nodes have either in-degree 1 or
91out-degree 1.
92
93Before resolution begins, we build up the constraints in a hashmap
94that maps `Constraint` keys to spans. During resolution, we construct
95the actual `Graph` structure that we describe here.
96
b039eaaf
SL
97## Computing the values for region variables
98
99The algorithm is a simple dataflow algorithm. Each region variable
100begins as empty. We iterate over the constraints, and for each constraint
101we grow the relevant region variable to be as big as it must be to meet all the
102constraints. This means the region variables can grow to be `'static` if
103necessary.
104
105## Verification
106
107After all constraints are fully propoagated, we do a "verification"
108step where we walk over the verify bounds and check that they are
109satisfied. These bounds represent the "maximal" values that a region
110variable can take on, basically.
85aaf69f
SL
111
112# The Region Hierarchy
113
114## Without closures
115
116Let's first consider the region hierarchy without thinking about
117closures, because they add a lot of complications. The region
118hierarchy *basically* mirrors the lexical structure of the code.
119There is a region for every piece of 'evaluation' that occurs, meaning
120every expression, block, and pattern (patterns are considered to
121"execute" by testing the value they are applied to and creating any
122relevant bindings). So, for example:
123
c1a9b12d 124 fn foo(x: isize, y: isize) { // -+
85aaf69f
SL
125 // +------------+ // |
126 // | +-----+ // |
127 // | +-+ +-+ +-+ // |
128 // | | | | | | | // |
129 // v v v v v v v // |
130 let z = x + y; // |
131 ... // |
132 } // -+
133
134 fn bar() { ... }
135
136In this example, there is a region for the fn body block as a whole,
137and then a subregion for the declaration of the local variable.
138Within that, there are sublifetimes for the assignment pattern and
139also the expression `x + y`. The expression itself has sublifetimes
140for evaluating `x` and `y`.
141
142## Function calls
143
144Function calls are a bit tricky. I will describe how we handle them
145*now* and then a bit about how we can improve them (Issue #6268).
146
147Consider a function call like `func(expr1, expr2)`, where `func`,
148`arg1`, and `arg2` are all arbitrary expressions. Currently,
149we construct a region hierarchy like:
150
151 +----------------+
152 | |
153 +--+ +---+ +---+|
154 v v v v v vv
155 func(expr1, expr2)
156
157Here you can see that the call as a whole has a region and the
158function plus arguments are subregions of that. As a side-effect of
159this, we get a lot of spurious errors around nested calls, in
160particular when combined with `&mut` functions. For example, a call
161like this one
162
163 self.foo(self.bar())
164
165where both `foo` and `bar` are `&mut self` functions will always yield
166an error.
167
168Here is a more involved example (which is safe) so we can see what's
169going on:
170
c1a9b12d 171 struct Foo { f: usize, g: usize }
85aaf69f 172 ...
c1a9b12d 173 fn add(p: &mut usize, v: usize) {
85aaf69f
SL
174 *p += v;
175 }
176 ...
c1a9b12d 177 fn inc(p: &mut usize) -> usize {
85aaf69f
SL
178 *p += 1; *p
179 }
180 fn weird() {
181 let mut x: Box<Foo> = box Foo { ... };
182 'a: add(&mut (*x).f,
183 'b: inc(&mut (*x).f)) // (..)
184 }
185
186The important part is the line marked `(..)` which contains a call to
187`add()`. The first argument is a mutable borrow of the field `f`. The
188second argument also borrows the field `f`. Now, in the current borrow
189checker, the first borrow is given the lifetime of the call to
190`add()`, `'a`. The second borrow is given the lifetime of `'b` of the
191call to `inc()`. Because `'b` is considered to be a sublifetime of
192`'a`, an error is reported since there are two co-existing mutable
193borrows of the same data.
194
195However, if we were to examine the lifetimes a bit more carefully, we
196can see that this error is unnecessary. Let's examine the lifetimes
197involved with `'a` in detail. We'll break apart all the steps involved
198in a call expression:
199
200 'a: {
201 'a_arg1: let a_temp1: ... = add;
c1a9b12d
SL
202 'a_arg2: let a_temp2: &'a mut usize = &'a mut (*x).f;
203 'a_arg3: let a_temp3: usize = {
85aaf69f
SL
204 let b_temp1: ... = inc;
205 let b_temp2: &'b = &'b mut (*x).f;
206 'b_call: b_temp1(b_temp2)
207 };
208 'a_call: a_temp1(a_temp2, a_temp3) // (**)
209 }
210
211Here we see that the lifetime `'a` includes a number of substatements.
212In particular, there is this lifetime I've called `'a_call` that
213corresponds to the *actual execution of the function `add()`*, after
214all arguments have been evaluated. There is a corresponding lifetime
215`'b_call` for the execution of `inc()`. If we wanted to be precise
216about it, the lifetime of the two borrows should be `'a_call` and
217`'b_call` respectively, since the references that were created
218will not be dereferenced except during the execution itself.
219
220However, this model by itself is not sound. The reason is that
221while the two references that are created will never be used
222simultaneously, it is still true that the first reference is
223*created* before the second argument is evaluated, and so even though
224it will not be *dereferenced* during the evaluation of the second
225argument, it can still be *invalidated* by that evaluation. Consider
226this similar but unsound example:
227
c1a9b12d 228 struct Foo { f: usize, g: usize }
85aaf69f 229 ...
c1a9b12d 230 fn add(p: &mut usize, v: usize) {
85aaf69f
SL
231 *p += v;
232 }
233 ...
c1a9b12d 234 fn consume(x: Box<Foo>) -> usize {
85aaf69f
SL
235 x.f + x.g
236 }
237 fn weird() {
238 let mut x: Box<Foo> = box Foo { ... };
239 'a: add(&mut (*x).f, consume(x)) // (..)
240 }
241
242In this case, the second argument to `add` actually consumes `x`, thus
243invalidating the first argument.
244
245So, for now, we exclude the `call` lifetimes from our model.
246Eventually I would like to include them, but we will have to make the
247borrow checker handle this situation correctly. In particular, if
248there is a reference created whose lifetime does not enclose
249the borrow expression, we must issue sufficient restrictions to ensure
250that the pointee remains valid.
251
c34b1796
AL
252## Modeling closures
253
254Integrating closures properly into the model is a bit of
255work-in-progress. In an ideal world, we would model closures as
256closely as possible after their desugared equivalents. That is, a
257closure type would be modeled as a struct, and the region hierarchy of
258different closure bodies would be completely distinct from all other
259fns. We are generally moving in that direction but there are
260complications in terms of the implementation.
261
262In practice what we currently do is somewhat different. The basis for
263the current approach is the observation that the only time that
264regions from distinct fn bodies interact with one another is through
265an upvar or the type of a fn parameter (since closures live in the fn
266body namespace, they can in fact have fn parameters whose types
267include regions from the surrounding fn body). For these cases, there
268are separate mechanisms which ensure that the regions that appear in
269upvars/parameters outlive the dynamic extent of each call to the
270closure:
271
2721. Types must outlive the region of any expression where they are used.
273 For a closure type `C` to outlive a region `'r`, that implies that the
274 types of all its upvars must outlive `'r`.
2752. Parameters must outlive the region of any fn that they are passed to.
276
277Therefore, we can -- sort of -- assume that any region from an
278enclosing fns is larger than any region from one of its enclosed
279fn. And that is precisely what we do: when building the region
280hierarchy, each region lives in its own distinct subtree, but if we
281are asked to compute the `LUB(r1, r2)` of two regions, and those
282regions are in disjoint subtrees, we compare the lexical nesting of
283the two regions.
284
285*Ideas for improving the situation:* (FIXME #3696) The correctness
286argument here is subtle and a bit hand-wavy. The ideal, as stated
287earlier, would be to model things in such a way that it corresponds
288more closely to the desugared code. The best approach for doing this
289is a bit unclear: it may in fact be possible to *actually* desugar
290before we start, but I don't think so. The main option that I've been
291thinking through is imposing a "view shift" as we enter the fn body,
292so that regions appearing in the types of fn parameters and upvars are
293translated from being regions in the outer fn into free region
294parameters, just as they would be if we applied the desugaring. The
295challenge here is that type inference may not have fully run, so the
296types may not be fully known: we could probably do this translation
297lazilly, as type variables are instantiated. We would also have to
298apply a kind of inverse translation to the return value. This would be
299a good idea anyway, as right now it is possible for free regions
300instantiated within the closure to leak into the parent: this
301currently leads to type errors, since those regions cannot outlive any
302expressions within the parent hierarchy. Much like the current
303handling of closures, there are no known cases where this leads to a
304type-checking accepting incorrect code (though it sometimes rejects
305what might be considered correct code; see rust-lang/rust#22557), but
306it still doesn't feel like the right approach.
85aaf69f
SL
307
308### Skolemization
309
310For a discussion on skolemization and higher-ranked subtyping, please
311see the module `middle::infer::higher_ranked::doc`.