]>
Commit | Line | Data |
---|---|---|
85aaf69f SL |
1 | % The Borrow Checker |
2 | ||
0531ce1d XL |
3 | > WARNING: This README is more or less obsolete, and will be removed |
4 | > soon! The new system is described in the [rustc guide]. | |
5 | ||
6 | [rustc guide]: https://rust-lang-nursery.github.io/rustc-guide/mir-borrowck.html | |
7 | ||
85aaf69f SL |
8 | This pass has the job of enforcing memory safety. This is a subtle |
9 | topic. This docs aim to explain both the practice and the theory | |
10 | behind the borrow checker. They start with a high-level overview of | |
11 | how it works, and then proceed to dive into the theoretical | |
12 | background. Finally, they go into detail on some of the more subtle | |
13 | aspects. | |
14 | ||
15 | # Table of contents | |
16 | ||
17 | These docs are long. Search for the section you are interested in. | |
18 | ||
19 | - Overview | |
20 | - Formal model | |
21 | - Borrowing and loans | |
22 | - Moves and initialization | |
23 | - Drop flags and structural fragments | |
24 | - Future work | |
25 | ||
26 | # Overview | |
27 | ||
28 | The borrow checker checks one function at a time. It operates in two | |
29 | passes. The first pass, called `gather_loans`, walks over the function | |
30 | and identifies all of the places where borrows (e.g., `&` expressions | |
31 | and `ref` bindings) and moves (copies or captures of a linear value) | |
32 | occur. It also tracks initialization sites. For each borrow and move, | |
33 | it checks various basic safety conditions at this time (for example, | |
34 | that the lifetime of the borrow doesn't exceed the lifetime of the | |
35 | value being borrowed, or that there is no move out of an `&T` | |
36 | referent). | |
37 | ||
38 | It then uses the dataflow module to propagate which of those borrows | |
39 | may be in scope at each point in the procedure. A loan is considered | |
40 | to come into scope at the expression that caused it and to go out of | |
41 | scope when the lifetime of the resulting reference expires. | |
42 | ||
43 | Once the in-scope loans are known for each point in the program, the | |
44 | borrow checker walks the IR again in a second pass called | |
45 | `check_loans`. This pass examines each statement and makes sure that | |
46 | it is safe with respect to the in-scope loans. | |
47 | ||
48 | # Formal model | |
49 | ||
50 | Throughout the docs we'll consider a simple subset of Rust in which | |
2c00a5a8 | 51 | you can only borrow from places, defined like so: |
85aaf69f SL |
52 | |
53 | ```text | |
2c00a5a8 | 54 | P = x | P.f | *P |
85aaf69f SL |
55 | ``` |
56 | ||
2c00a5a8 XL |
57 | Here `x` represents some variable, `P.f` is a field reference, |
58 | and `*P` is a pointer dereference. There is no auto-deref or other | |
85aaf69f SL |
59 | niceties. This means that if you have a type like: |
60 | ||
c34b1796 AL |
61 | ```rust |
62 | struct S { f: i32 } | |
85aaf69f SL |
63 | ``` |
64 | ||
65 | and a variable `a: Box<S>`, then the rust expression `a.f` would correspond | |
2c00a5a8 | 66 | to an `P` of `(*a).f`. |
85aaf69f SL |
67 | |
68 | Here is the formal grammar for the types we'll consider: | |
69 | ||
70 | ```text | |
c34b1796 AL |
71 | TY = i32 | bool | S<'LT...> | Box<TY> | & 'LT MQ TY |
72 | MQ = mut | imm | |
85aaf69f SL |
73 | ``` |
74 | ||
75 | Most of these types should be pretty self explanatory. Here `S` is a | |
76 | struct name and we assume structs are declared like so: | |
77 | ||
78 | ```text | |
79 | SD = struct S<'LT...> { (f: TY)... } | |
80 | ``` | |
81 | ||
82 | # Borrowing and loans | |
83 | ||
84 | ## An intuitive explanation | |
85 | ||
86 | ### Issuing loans | |
87 | ||
88 | Now, imagine we had a program like this: | |
89 | ||
c34b1796 AL |
90 | ```rust |
91 | struct Foo { f: i32, g: i32 } | |
85aaf69f SL |
92 | ... |
93 | 'a: { | |
c34b1796 AL |
94 | let mut x: Box<Foo> = ...; |
95 | let y = &mut (*x).f; | |
96 | x = ...; | |
85aaf69f SL |
97 | } |
98 | ``` | |
99 | ||
100 | This is of course dangerous because mutating `x` will free the old | |
101 | value and hence invalidate `y`. The borrow checker aims to prevent | |
102 | this sort of thing. | |
103 | ||
104 | #### Loans and restrictions | |
105 | ||
106 | The way the borrow checker works is that it analyzes each borrow | |
2c00a5a8 | 107 | expression (in our simple model, that's stuff like `&P`, though in |
85aaf69f SL |
108 | real life there are a few other cases to consider). For each borrow |
109 | expression, it computes a `Loan`, which is a data structure that | |
110 | records (1) the value being borrowed, (2) the mutability and scope of | |
111 | the borrow, and (3) a set of restrictions. In the code, `Loan` is a | |
112 | struct defined in `middle::borrowck`. Formally, we define `LOAN` as | |
113 | follows: | |
114 | ||
115 | ```text | |
2c00a5a8 XL |
116 | LOAN = (P, LT, MQ, RESTRICTION*) |
117 | RESTRICTION = (P, ACTION*) | |
85aaf69f SL |
118 | ACTION = MUTATE | CLAIM | FREEZE |
119 | ``` | |
120 | ||
2c00a5a8 | 121 | Here the `LOAN` tuple defines the place `P` being borrowed; the |
85aaf69f SL |
122 | lifetime `LT` of that borrow; the mutability `MQ` of the borrow; and a |
123 | list of restrictions. The restrictions indicate actions which, if | |
124 | taken, could invalidate the loan and lead to type safety violations. | |
125 | ||
2c00a5a8 | 126 | Each `RESTRICTION` is a pair of a restrictive place `P` (which will |
85aaf69f SL |
127 | either be the path that was borrowed or some prefix of the path that |
128 | was borrowed) and a set of restricted actions. There are three kinds | |
2c00a5a8 | 129 | of actions that may be restricted for the path `P`: |
85aaf69f | 130 | |
2c00a5a8 XL |
131 | - `MUTATE` means that `P` cannot be assigned to; |
132 | - `CLAIM` means that the `P` cannot be borrowed mutably; | |
133 | - `FREEZE` means that the `P` cannot be borrowed immutably; | |
85aaf69f | 134 | |
2c00a5a8 XL |
135 | Finally, it is never possible to move from a place that appears in a |
136 | restriction. This implies that the "empty restriction" `(P, [])`, | |
85aaf69f | 137 | which contains an empty set of actions, still has a purpose---it |
2c00a5a8 | 138 | prevents moves from `P`. I chose not to make `MOVE` a fourth kind of |
85aaf69f SL |
139 | action because that would imply that sometimes moves are permitted |
140 | from restricted values, which is not the case. | |
141 | ||
142 | #### Example | |
143 | ||
144 | To give you a better feeling for what kind of restrictions derived | |
145 | from a loan, let's look at the loan `L` that would be issued as a | |
146 | result of the borrow `&mut (*x).f` in the example above: | |
147 | ||
148 | ```text | |
149 | L = ((*x).f, 'a, mut, RS) where | |
150 | RS = [((*x).f, [MUTATE, CLAIM, FREEZE]), | |
151 | (*x, [MUTATE, CLAIM, FREEZE]), | |
152 | (x, [MUTATE, CLAIM, FREEZE])] | |
153 | ``` | |
154 | ||
155 | The loan states that the expression `(*x).f` has been loaned as | |
156 | mutable for the lifetime `'a`. Because the loan is mutable, that means | |
157 | that the value `(*x).f` may be mutated via the newly created reference | |
158 | (and *only* via that pointer). This is reflected in the | |
159 | restrictions `RS` that accompany the loan. | |
160 | ||
161 | The first restriction `((*x).f, [MUTATE, CLAIM, FREEZE])` states that | |
162 | the lender may not mutate, freeze, nor alias `(*x).f`. Mutation is | |
163 | illegal because `(*x).f` is only supposed to be mutated via the new | |
164 | reference, not by mutating the original path `(*x).f`. Freezing is | |
165 | illegal because the path now has an `&mut` alias; so even if we the | |
166 | lender were to consider `(*x).f` to be immutable, it might be mutated | |
167 | via this alias. They will be enforced for the lifetime `'a` of the | |
168 | loan. After the loan expires, the restrictions no longer apply. | |
169 | ||
170 | The second restriction on `*x` is interesting because it does not | |
171 | apply to the path that was lent (`(*x).f`) but rather to a prefix of | |
172 | the borrowed path. This is due to the rules of inherited mutability: | |
173 | if the user were to assign to (or freeze) `*x`, they would indirectly | |
174 | overwrite (or freeze) `(*x).f`, and thus invalidate the reference | |
175 | that was created. In general it holds that when a path is | |
176 | lent, restrictions are issued for all the owning prefixes of that | |
177 | path. In this case, the path `*x` owns the path `(*x).f` and, | |
62682a34 | 178 | because `x` has ownership, the path `x` owns the path `*x`. |
85aaf69f SL |
179 | Therefore, borrowing `(*x).f` yields restrictions on both |
180 | `*x` and `x`. | |
181 | ||
182 | ### Checking for illegal assignments, moves, and reborrows | |
183 | ||
184 | Once we have computed the loans introduced by each borrow, the borrow | |
185 | checker uses a data flow propagation to compute the full set of loans | |
186 | in scope at each expression and then uses that set to decide whether | |
187 | that expression is legal. Remember that the scope of loan is defined | |
188 | by its lifetime LT. We sometimes say that a loan which is in-scope at | |
189 | a particular point is an "outstanding loan", and the set of | |
190 | restrictions included in those loans as the "outstanding | |
191 | restrictions". | |
192 | ||
193 | The kinds of expressions which in-scope loans can render illegal are: | |
194 | - *assignments* (`lv = v`): illegal if there is an in-scope restriction | |
195 | against mutating `lv`; | |
196 | - *moves*: illegal if there is any in-scope restriction on `lv` at all; | |
197 | - *mutable borrows* (`&mut lv`): illegal there is an in-scope restriction | |
198 | against claiming `lv`; | |
199 | - *immutable borrows* (`&lv`): illegal there is an in-scope restriction | |
200 | against freezing `lv`. | |
201 | ||
202 | ## Formal rules | |
203 | ||
204 | Now that we hopefully have some kind of intuitive feeling for how the | |
205 | borrow checker works, let's look a bit more closely now at the precise | |
c34b1796 | 206 | conditions that it uses. |
85aaf69f SL |
207 | |
208 | I will present the rules in a modified form of standard inference | |
209 | rules, which looks as follows: | |
210 | ||
211 | ```text | |
212 | PREDICATE(X, Y, Z) // Rule-Name | |
213 | Condition 1 | |
214 | Condition 2 | |
215 | Condition 3 | |
216 | ``` | |
217 | ||
218 | The initial line states the predicate that is to be satisfied. The | |
219 | indented lines indicate the conditions that must be met for the | |
220 | predicate to be satisfied. The right-justified comment states the name | |
221 | of this rule: there are comments in the borrowck source referencing | |
222 | these names, so that you can cross reference to find the actual code | |
223 | that corresponds to the formal rule. | |
224 | ||
225 | ### Invariants | |
226 | ||
227 | I want to collect, at a high-level, the invariants the borrow checker | |
228 | maintains. I will give them names and refer to them throughout the | |
229 | text. Together these invariants are crucial for the overall soundness | |
230 | of the system. | |
231 | ||
232 | **Mutability requires uniqueness.** To mutate a path | |
233 | ||
234 | **Unique mutability.** There is only one *usable* mutable path to any | |
235 | given memory at any given time. This implies that when claiming memory | |
236 | with an expression like `p = &mut x`, the compiler must guarantee that | |
237 | the borrowed value `x` can no longer be mutated so long as `p` is | |
238 | live. (This is done via restrictions, read on.) | |
239 | ||
240 | **.** | |
241 | ||
242 | ||
243 | ### The `gather_loans` pass | |
244 | ||
245 | We start with the `gather_loans` pass, which walks the AST looking for | |
246 | borrows. For each borrow, there are three bits of information: the | |
2c00a5a8 | 247 | place `P` being borrowed and the mutability `MQ` and lifetime `LT` |
85aaf69f SL |
248 | of the resulting pointer. Given those, `gather_loans` applies four |
249 | validity tests: | |
250 | ||
2c00a5a8 XL |
251 | 1. `MUTABILITY(P, MQ)`: The mutability of the reference is |
252 | compatible with the mutability of `P` (i.e., not borrowing immutable | |
85aaf69f SL |
253 | data as mutable). |
254 | ||
2c00a5a8 XL |
255 | 2. `ALIASABLE(P, MQ)`: The aliasability of the reference is |
256 | compatible with the aliasability of `P`. The goal is to prevent | |
85aaf69f SL |
257 | `&mut` borrows of aliasability data. |
258 | ||
2c00a5a8 | 259 | 3. `LIFETIME(P, LT, MQ)`: The lifetime of the borrow does not exceed |
85aaf69f SL |
260 | the lifetime of the value being borrowed. |
261 | ||
2c00a5a8 | 262 | 4. `RESTRICTIONS(P, LT, ACTIONS) = RS`: This pass checks and computes the |
85aaf69f SL |
263 | restrictions to maintain memory safety. These are the restrictions |
264 | that will go into the final loan. We'll discuss in more detail below. | |
265 | ||
266 | ## Checking mutability | |
267 | ||
268 | Checking mutability is fairly straightforward. We just want to prevent | |
c34b1796 AL |
269 | immutable data from being borrowed as mutable. Note that it is ok to borrow |
270 | mutable data as immutable, since that is simply a freeze. The judgement | |
2c00a5a8 | 271 | `MUTABILITY(P, MQ)` means the mutability of `P` is compatible with a borrow |
c34b1796 AL |
272 | of mutability `MQ`. The Rust code corresponding to this predicate is the |
273 | function `check_mutability` in `middle::borrowck::gather_loans`. | |
85aaf69f SL |
274 | |
275 | ### Checking mutability of variables | |
276 | ||
277 | *Code pointer:* Function `check_mutability()` in `gather_loans/mod.rs`, | |
278 | but also the code in `mem_categorization`. | |
279 | ||
280 | Let's begin with the rules for variables, which state that if a | |
281 | variable is declared as mutable, it may be borrowed any which way, but | |
c34b1796 | 282 | otherwise the variable must be borrowed as immutable: |
85aaf69f SL |
283 | |
284 | ```text | |
285 | MUTABILITY(X, MQ) // M-Var-Mut | |
286 | DECL(X) = mut | |
287 | ||
c34b1796 | 288 | MUTABILITY(X, imm) // M-Var-Imm |
85aaf69f | 289 | DECL(X) = imm |
85aaf69f SL |
290 | ``` |
291 | ||
292 | ### Checking mutability of owned content | |
293 | ||
62682a34 | 294 | Fields and boxes inherit their mutability from |
85aaf69f | 295 | their base expressions, so both of their rules basically |
2c00a5a8 | 296 | delegate the check to the base expression `P`: |
85aaf69f SL |
297 | |
298 | ```text | |
2c00a5a8 XL |
299 | MUTABILITY(P.f, MQ) // M-Field |
300 | MUTABILITY(P, MQ) | |
85aaf69f | 301 | |
2c00a5a8 XL |
302 | MUTABILITY(*P, MQ) // M-Deref-Unique |
303 | TYPE(P) = Box<Ty> | |
304 | MUTABILITY(P, MQ) | |
85aaf69f SL |
305 | ``` |
306 | ||
307 | ### Checking mutability of immutable pointer types | |
308 | ||
309 | Immutable pointer types like `&T` can only | |
c34b1796 | 310 | be borrowed if MQ is immutable: |
85aaf69f SL |
311 | |
312 | ```text | |
2c00a5a8 XL |
313 | MUTABILITY(*P, imm) // M-Deref-Borrowed-Imm |
314 | TYPE(P) = &Ty | |
85aaf69f SL |
315 | ``` |
316 | ||
317 | ### Checking mutability of mutable pointer types | |
318 | ||
319 | `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut: | |
320 | ||
321 | ```text | |
2c00a5a8 XL |
322 | MUTABILITY(*P, MQ) // M-Deref-Borrowed-Mut |
323 | TYPE(P) = &mut Ty | |
85aaf69f SL |
324 | ``` |
325 | ||
326 | ## Checking aliasability | |
327 | ||
c34b1796 | 328 | The goal of the aliasability check is to ensure that we never permit `&mut` |
2c00a5a8 XL |
329 | borrows of aliasable data. The judgement `ALIASABLE(P, MQ)` means the |
330 | aliasability of `P` is compatible with a borrow of mutability `MQ`. The Rust | |
c34b1796 AL |
331 | code corresponding to this predicate is the function `check_aliasability()` in |
332 | `middle::borrowck::gather_loans`. | |
85aaf69f SL |
333 | |
334 | ### Checking aliasability of variables | |
335 | ||
336 | Local variables are never aliasable as they are accessible only within | |
337 | the stack frame. | |
338 | ||
339 | ```text | |
340 | ALIASABLE(X, MQ) // M-Var-Mut | |
341 | ``` | |
342 | ||
343 | ### Checking aliasable of owned content | |
344 | ||
345 | Owned content is aliasable if it is found in an aliasable location: | |
346 | ||
347 | ```text | |
2c00a5a8 XL |
348 | ALIASABLE(P.f, MQ) // M-Field |
349 | ALIASABLE(P, MQ) | |
85aaf69f | 350 | |
2c00a5a8 XL |
351 | ALIASABLE(*P, MQ) // M-Deref-Unique |
352 | ALIASABLE(P, MQ) | |
85aaf69f SL |
353 | ``` |
354 | ||
cc61c64b | 355 | ### Checking aliasability of immutable pointer types |
85aaf69f SL |
356 | |
357 | Immutable pointer types like `&T` are aliasable, and hence can only be | |
358 | borrowed immutably: | |
359 | ||
360 | ```text | |
2c00a5a8 XL |
361 | ALIASABLE(*P, imm) // M-Deref-Borrowed-Imm |
362 | TYPE(P) = &Ty | |
85aaf69f SL |
363 | ``` |
364 | ||
cc61c64b | 365 | ### Checking aliasability of mutable pointer types |
85aaf69f SL |
366 | |
367 | `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut: | |
368 | ||
369 | ```text | |
2c00a5a8 XL |
370 | ALIASABLE(*P, MQ) // M-Deref-Borrowed-Mut |
371 | TYPE(P) = &mut Ty | |
85aaf69f SL |
372 | ``` |
373 | ||
374 | ## Checking lifetime | |
375 | ||
376 | These rules aim to ensure that no data is borrowed for a scope that exceeds | |
377 | its lifetime. These two computations wind up being intimately related. | |
2c00a5a8 XL |
378 | Formally, we define a predicate `LIFETIME(P, LT, MQ)`, which states that |
379 | "the place `P` can be safely borrowed for the lifetime `LT` with mutability | |
85aaf69f SL |
380 | `MQ`". The Rust code corresponding to this predicate is the module |
381 | `middle::borrowck::gather_loans::lifetime`. | |
382 | ||
85aaf69f SL |
383 | ### Checking lifetime of variables |
384 | ||
385 | The rule for variables states that a variable can only be borrowed a | |
386 | lifetime `LT` that is a subregion of the variable's scope: | |
387 | ||
388 | ```text | |
389 | LIFETIME(X, LT, MQ) // L-Local | |
c34b1796 | 390 | LT <= block where X is declared |
85aaf69f SL |
391 | ``` |
392 | ||
393 | ### Checking lifetime for owned content | |
394 | ||
62682a34 | 395 | The lifetime of a field or box is the same as the lifetime |
85aaf69f SL |
396 | of its owner: |
397 | ||
398 | ```text | |
2c00a5a8 XL |
399 | LIFETIME(P.f, LT, MQ) // L-Field |
400 | LIFETIME(P, LT, MQ) | |
85aaf69f | 401 | |
2c00a5a8 XL |
402 | LIFETIME(*P, LT, MQ) // L-Deref-Send |
403 | TYPE(P) = Box<Ty> | |
404 | LIFETIME(P, LT, MQ) | |
85aaf69f SL |
405 | ``` |
406 | ||
407 | ### Checking lifetime for derefs of references | |
408 | ||
409 | References have a lifetime `LT'` associated with them. The | |
410 | data they point at has been guaranteed to be valid for at least this | |
411 | lifetime. Therefore, the borrow is valid so long as the lifetime `LT` | |
412 | of the borrow is shorter than the lifetime `LT'` of the pointer | |
413 | itself: | |
414 | ||
415 | ```text | |
2c00a5a8 XL |
416 | LIFETIME(*P, LT, MQ) // L-Deref-Borrowed |
417 | TYPE(P) = <' Ty OR <' mut Ty | |
85aaf69f SL |
418 | LT <= LT' |
419 | ``` | |
420 | ||
421 | ## Computing the restrictions | |
422 | ||
423 | The final rules govern the computation of *restrictions*, meaning that | |
424 | we compute the set of actions that will be illegal for the life of the | |
2c00a5a8 | 425 | loan. The predicate is written `RESTRICTIONS(P, LT, ACTIONS) = |
85aaf69f | 426 | RESTRICTION*`, which can be read "in order to prevent `ACTIONS` from |
2c00a5a8 | 427 | occurring on `P`, the restrictions `RESTRICTION*` must be respected |
85aaf69f SL |
428 | for the lifetime of the loan". |
429 | ||
430 | Note that there is an initial set of restrictions: these restrictions | |
431 | are computed based on the kind of borrow: | |
432 | ||
433 | ```text | |
2c00a5a8 XL |
434 | &mut P => RESTRICTIONS(P, LT, MUTATE|CLAIM|FREEZE) |
435 | &P => RESTRICTIONS(P, LT, MUTATE|CLAIM) | |
85aaf69f SL |
436 | ``` |
437 | ||
438 | The reasoning here is that a mutable borrow must be the only writer, | |
439 | therefore it prevents other writes (`MUTATE`), mutable borrows | |
440 | (`CLAIM`), and immutable borrows (`FREEZE`). An immutable borrow | |
441 | permits other immutable borrows but forbids writes and mutable borrows. | |
85aaf69f SL |
442 | |
443 | ### Restrictions for loans of a local variable | |
444 | ||
445 | The simplest case is a borrow of a local variable `X`: | |
446 | ||
447 | ```text | |
448 | RESTRICTIONS(X, LT, ACTIONS) = (X, ACTIONS) // R-Variable | |
449 | ``` | |
450 | ||
451 | In such cases we just record the actions that are not permitted. | |
452 | ||
453 | ### Restrictions for loans of fields | |
454 | ||
455 | Restricting a field is the same as restricting the owner of that | |
456 | field: | |
457 | ||
458 | ```text | |
2c00a5a8 XL |
459 | RESTRICTIONS(P.f, LT, ACTIONS) = RS, (P.f, ACTIONS) // R-Field |
460 | RESTRICTIONS(P, LT, ACTIONS) = RS | |
85aaf69f SL |
461 | ``` |
462 | ||
463 | The reasoning here is as follows. If the field must not be mutated, | |
464 | then you must not mutate the owner of the field either, since that | |
465 | would indirectly modify the field. Similarly, if the field cannot be | |
466 | frozen or aliased, we cannot allow the owner to be frozen or aliased, | |
467 | since doing so indirectly freezes/aliases the field. This is the | |
468 | origin of inherited mutability. | |
469 | ||
470 | ### Restrictions for loans of owned referents | |
471 | ||
472 | Because the mutability of owned referents is inherited, restricting an | |
473 | owned referent is similar to restricting a field, in that it implies | |
62682a34 | 474 | restrictions on the pointer. However, boxes have an important |
2c00a5a8 XL |
475 | twist: if the owner `P` is mutated, that causes the owned referent |
476 | `*P` to be freed! So whenever an owned referent `*P` is borrowed, we | |
477 | must prevent the box `P` from being mutated, which means | |
85aaf69f | 478 | that we always add `MUTATE` and `CLAIM` to the restriction set imposed |
2c00a5a8 | 479 | on `P`: |
85aaf69f SL |
480 | |
481 | ```text | |
2c00a5a8 XL |
482 | RESTRICTIONS(*P, LT, ACTIONS) = RS, (*P, ACTIONS) // R-Deref-Send-Pointer |
483 | TYPE(P) = Box<Ty> | |
484 | RESTRICTIONS(P, LT, ACTIONS|MUTATE|CLAIM) = RS | |
85aaf69f SL |
485 | ``` |
486 | ||
487 | ### Restrictions for loans of immutable borrowed referents | |
488 | ||
489 | Immutable borrowed referents are freely aliasable, meaning that | |
490 | the compiler does not prevent you from copying the pointer. This | |
491 | implies that issuing restrictions is useless. We might prevent the | |
2c00a5a8 XL |
492 | user from acting on `*P` itself, but there could be another path |
493 | `*P1` that refers to the exact same memory, and we would not be | |
85aaf69f SL |
494 | restricting that path. Therefore, the rule for `&Ty` pointers |
495 | always returns an empty set of restrictions, and it only permits | |
496 | restricting `MUTATE` and `CLAIM` actions: | |
497 | ||
498 | ```text | |
2c00a5a8 XL |
499 | RESTRICTIONS(*P, LT, ACTIONS) = [] // R-Deref-Imm-Borrowed |
500 | TYPE(P) = <' Ty | |
85aaf69f SL |
501 | LT <= LT' // (1) |
502 | ACTIONS subset of [MUTATE, CLAIM] | |
503 | ``` | |
504 | ||
505 | The reason that we can restrict `MUTATE` and `CLAIM` actions even | |
506 | without a restrictions list is that it is never legal to mutate nor to | |
507 | borrow mutably the contents of a `&Ty` pointer. In other words, | |
508 | those restrictions are already inherent in the type. | |
509 | ||
510 | Clause (1) in the rule for `&Ty` deserves mention. Here I | |
511 | specify that the lifetime of the loan must be less than the lifetime | |
512 | of the `&Ty` pointer. In simple cases, this clause is redundant, since | |
513 | the `LIFETIME()` function will already enforce the required rule: | |
514 | ||
c34b1796 AL |
515 | ```rust |
516 | fn foo(point: &'a Point) -> &'static i32 { | |
85aaf69f SL |
517 | &point.x // Error |
518 | } | |
519 | ``` | |
520 | ||
521 | The above example fails to compile both because of clause (1) above | |
522 | but also by the basic `LIFETIME()` check. However, in more advanced | |
523 | examples involving multiple nested pointers, clause (1) is needed: | |
524 | ||
c34b1796 AL |
525 | ```rust |
526 | fn foo(point: &'a &'b mut Point) -> &'b i32 { | |
85aaf69f SL |
527 | &point.x // Error |
528 | } | |
529 | ``` | |
530 | ||
531 | The `LIFETIME` rule here would accept `'b` because, in fact, the | |
532 | *memory is* guaranteed to remain valid (i.e., not be freed) for the | |
533 | lifetime `'b`, since the `&mut` pointer is valid for `'b`. However, we | |
534 | are returning an immutable reference, so we need the memory to be both | |
535 | valid and immutable. Even though `point.x` is referenced by an `&mut` | |
536 | pointer, it can still be considered immutable so long as that `&mut` | |
537 | pointer is found in an aliased location. That means the memory is | |
538 | guaranteed to be *immutable* for the lifetime of the `&` pointer, | |
539 | which is only `'a`, not `'b`. Hence this example yields an error. | |
540 | ||
541 | As a final twist, consider the case of two nested *immutable* | |
542 | pointers, rather than a mutable pointer within an immutable one: | |
543 | ||
c34b1796 AL |
544 | ```rust |
545 | fn foo(point: &'a &'b Point) -> &'b i32 { | |
85aaf69f SL |
546 | &point.x // OK |
547 | } | |
548 | ``` | |
549 | ||
550 | This function is legal. The reason for this is that the inner pointer | |
551 | (`*point : &'b Point`) is enough to guarantee the memory is immutable | |
552 | and valid for the lifetime `'b`. This is reflected in | |
553 | `RESTRICTIONS()` by the fact that we do not recurse (i.e., we impose | |
2c00a5a8 | 554 | no restrictions on `P`, which in this particular case is the pointer |
85aaf69f SL |
555 | `point : &'a &'b Point`). |
556 | ||
557 | #### Why both `LIFETIME()` and `RESTRICTIONS()`? | |
558 | ||
559 | Given the previous text, it might seem that `LIFETIME` and | |
560 | `RESTRICTIONS` should be folded together into one check, but there is | |
561 | a reason that they are separated. They answer separate concerns. | |
562 | The rules pertaining to `LIFETIME` exist to ensure that we don't | |
563 | create a borrowed pointer that outlives the memory it points at. So | |
564 | `LIFETIME` prevents a function like this: | |
565 | ||
c34b1796 AL |
566 | ```rust |
567 | fn get_1<'a>() -> &'a i32 { | |
85aaf69f SL |
568 | let x = 1; |
569 | &x | |
570 | } | |
571 | ``` | |
572 | ||
573 | Here we would be returning a pointer into the stack. Clearly bad. | |
574 | ||
575 | However, the `RESTRICTIONS` rules are more concerned with how memory | |
576 | is used. The example above doesn't generate an error according to | |
577 | `RESTRICTIONS` because, for local variables, we don't require that the | |
578 | loan lifetime be a subset of the local variable lifetime. The idea | |
579 | here is that we *can* guarantee that `x` is not (e.g.) mutated for the | |
580 | lifetime `'a`, even though `'a` exceeds the function body and thus | |
581 | involves unknown code in the caller -- after all, `x` ceases to exist | |
582 | after we return and hence the remaining code in `'a` cannot possibly | |
583 | mutate it. This distinction is important for type checking functions | |
584 | like this one: | |
585 | ||
c34b1796 AL |
586 | ```rust |
587 | fn inc_and_get<'a>(p: &'a mut Point) -> &'a i32 { | |
85aaf69f SL |
588 | p.x += 1; |
589 | &p.x | |
590 | } | |
591 | ``` | |
592 | ||
593 | In this case, we take in a `&mut` and return a frozen borrowed pointer | |
594 | with the same lifetime. So long as the lifetime of the returned value | |
595 | doesn't exceed the lifetime of the `&mut` we receive as input, this is | |
596 | fine, though it may seem surprising at first (it surprised me when I | |
597 | first worked it through). After all, we're guaranteeing that `*p` | |
598 | won't be mutated for the lifetime `'a`, even though we can't "see" the | |
599 | entirety of the code during that lifetime, since some of it occurs in | |
600 | our caller. But we *do* know that nobody can mutate `*p` except | |
601 | through `p`. So if we don't mutate `*p` and we don't return `p`, then | |
602 | we know that the right to mutate `*p` has been lost to our caller -- | |
603 | in terms of capability, the caller passed in the ability to mutate | |
604 | `*p`, and we never gave it back. (Note that we can't return `p` while | |
605 | `*p` is borrowed since that would be a move of `p`, as `&mut` pointers | |
606 | are affine.) | |
607 | ||
85aaf69f SL |
608 | ### Restrictions for loans of mutable borrowed referents |
609 | ||
610 | Mutable borrowed pointers are guaranteed to be the only way to mutate | |
611 | their referent. This permits us to take greater license with them; for | |
612 | example, the referent can be frozen simply be ensuring that we do not | |
613 | use the original pointer to perform mutate. Similarly, we can allow | |
614 | the referent to be claimed, so long as the original pointer is unused | |
615 | while the new claimant is live. | |
616 | ||
617 | The rule for mutable borrowed pointers is as follows: | |
618 | ||
619 | ```text | |
2c00a5a8 XL |
620 | RESTRICTIONS(*P, LT, ACTIONS) = RS, (*P, ACTIONS) // R-Deref-Mut-Borrowed |
621 | TYPE(P) = <' mut Ty | |
85aaf69f | 622 | LT <= LT' // (1) |
2c00a5a8 | 623 | RESTRICTIONS(P, LT, ACTIONS) = RS // (2) |
85aaf69f SL |
624 | ``` |
625 | ||
626 | Let's examine the two numbered clauses: | |
627 | ||
628 | Clause (1) specifies that the lifetime of the loan (`LT`) cannot | |
629 | exceed the lifetime of the `&mut` pointer (`LT'`). The reason for this | |
630 | is that the `&mut` pointer is guaranteed to be the only legal way to | |
631 | mutate its referent -- but only for the lifetime `LT'`. After that | |
632 | lifetime, the loan on the referent expires and hence the data may be | |
633 | modified by its owner again. This implies that we are only able to | |
634 | guarantee that the referent will not be modified or aliased for a | |
635 | maximum of `LT'`. | |
636 | ||
637 | Here is a concrete example of a bug this rule prevents: | |
638 | ||
c34b1796 | 639 | ```rust |
85aaf69f | 640 | // Test region-reborrow-from-shorter-mut-ref.rs: |
cc61c64b | 641 | fn copy_borrowed_ptr<'a,'b,T>(x: &'a mut &'b mut T) -> &'b mut T { |
85aaf69f SL |
642 | &mut **p // ERROR due to clause (1) |
643 | } | |
644 | fn main() { | |
645 | let mut x = 1; | |
646 | let mut y = &mut x; // <-'b-----------------------------+ | |
647 | // +-'a--------------------+ | | |
648 | // v v | | |
649 | let z = copy_borrowed_ptr(&mut y); // y is lent | | |
650 | *y += 1; // Here y==z, so both should not be usable... | | |
651 | *z += 1; // ...and yet they would be, but for clause 1. | | |
652 | } // <------------------------------------------------------+ | |
653 | ``` | |
654 | ||
655 | Clause (2) propagates the restrictions on the referent to the pointer | |
62682a34 | 656 | itself. This is the same as with an box, though the |
85aaf69f SL |
657 | reasoning is mildly different. The basic goal in all cases is to |
658 | prevent the user from establishing another route to the same data. To | |
659 | see what I mean, let's examine various cases of what can go wrong and | |
660 | show how it is prevented. | |
661 | ||
662 | **Example danger 1: Moving the base pointer.** One of the simplest | |
663 | ways to violate the rules is to move the base pointer to a new name | |
664 | and access it via that new name, thus bypassing the restrictions on | |
665 | the old name. Here is an example: | |
666 | ||
c34b1796 | 667 | ```rust |
85aaf69f | 668 | // src/test/compile-fail/borrowck-move-mut-base-ptr.rs |
c34b1796 AL |
669 | fn foo(t0: &mut i32) { |
670 | let p: &i32 = &*t0; // Freezes `*t0` | |
85aaf69f SL |
671 | let t1 = t0; //~ ERROR cannot move out of `t0` |
672 | *t1 = 22; // OK, not a write through `*t0` | |
673 | } | |
674 | ``` | |
675 | ||
676 | Remember that `&mut` pointers are linear, and hence `let t1 = t0` is a | |
677 | move of `t0` -- or would be, if it were legal. Instead, we get an | |
2c00a5a8 | 678 | error, because clause (2) imposes restrictions on `P` (`t0`, here), |
85aaf69f SL |
679 | and any restrictions on a path make it impossible to move from that |
680 | path. | |
681 | ||
682 | **Example danger 2: Claiming the base pointer.** Another possible | |
683 | danger is to mutably borrow the base path. This can lead to two bad | |
684 | scenarios. The most obvious is that the mutable borrow itself becomes | |
685 | another path to access the same data, as shown here: | |
686 | ||
c34b1796 | 687 | ```rust |
85aaf69f | 688 | // src/test/compile-fail/borrowck-mut-borrow-of-mut-base-ptr.rs |
c34b1796 AL |
689 | fn foo<'a>(mut t0: &'a mut i32, |
690 | mut t1: &'a mut i32) { | |
691 | let p: &i32 = &*t0; // Freezes `*t0` | |
85aaf69f SL |
692 | let mut t2 = &mut t0; //~ ERROR cannot borrow `t0` |
693 | **t2 += 1; // Mutates `*t0` | |
694 | } | |
695 | ``` | |
696 | ||
697 | In this example, `**t2` is the same memory as `*t0`. Because `t2` is | |
698 | an `&mut` pointer, `**t2` is a unique path and hence it would be | |
699 | possible to mutate `**t2` even though that memory was supposed to be | |
700 | frozen by the creation of `p`. However, an error is reported -- the | |
701 | reason is that the freeze `&*t0` will restrict claims and mutation | |
702 | against `*t0` which, by clause 2, in turn prevents claims and mutation | |
703 | of `t0`. Hence the claim `&mut t0` is illegal. | |
704 | ||
705 | Another danger with an `&mut` pointer is that we could swap the `t0` | |
706 | value away to create a new path: | |
707 | ||
c34b1796 | 708 | ```rust |
85aaf69f | 709 | // src/test/compile-fail/borrowck-swap-mut-base-ptr.rs |
c34b1796 AL |
710 | fn foo<'a>(mut t0: &'a mut i32, |
711 | mut t1: &'a mut i32) { | |
712 | let p: &i32 = &*t0; // Freezes `*t0` | |
85aaf69f SL |
713 | swap(&mut t0, &mut t1); //~ ERROR cannot borrow `t0` |
714 | *t1 = 22; | |
715 | } | |
716 | ``` | |
717 | ||
718 | This is illegal for the same reason as above. Note that if we added | |
719 | back a swap operator -- as we used to have -- we would want to be very | |
720 | careful to ensure this example is still illegal. | |
721 | ||
722 | **Example danger 3: Freeze the base pointer.** In the case where the | |
723 | referent is claimed, even freezing the base pointer can be dangerous, | |
724 | as shown in the following example: | |
725 | ||
c34b1796 | 726 | ```rust |
85aaf69f | 727 | // src/test/compile-fail/borrowck-borrow-of-mut-base-ptr.rs |
c34b1796 AL |
728 | fn foo<'a>(mut t0: &'a mut i32, |
729 | mut t1: &'a mut i32) { | |
730 | let p: &mut i32 = &mut *t0; // Claims `*t0` | |
85aaf69f | 731 | let mut t2 = &t0; //~ ERROR cannot borrow `t0` |
c34b1796 | 732 | let q: &i32 = &*t2; // Freezes `*t0` but not through `*p` |
85aaf69f SL |
733 | *p += 1; // violates type of `*q` |
734 | } | |
735 | ``` | |
736 | ||
737 | Here the problem is that `*t0` is claimed by `p`, and hence `p` wants | |
738 | to be the controlling pointer through which mutation or freezes occur. | |
c34b1796 | 739 | But `t2` would -- if it were legal -- have the type `& &mut i32`, and |
85aaf69f SL |
740 | hence would be a mutable pointer in an aliasable location, which is |
741 | considered frozen (since no one can write to `**t2` as it is not a | |
c34b1796 | 742 | unique path). Therefore, we could reasonably create a frozen `&i32` |
85aaf69f SL |
743 | pointer pointing at `*t0` that coexists with the mutable pointer `p`, |
744 | which is clearly unsound. | |
745 | ||
746 | However, it is not always unsafe to freeze the base pointer. In | |
747 | particular, if the referent is frozen, there is no harm in it: | |
748 | ||
c34b1796 | 749 | ```rust |
85aaf69f | 750 | // src/test/run-pass/borrowck-borrow-of-mut-base-ptr-safe.rs |
c34b1796 AL |
751 | fn foo<'a>(mut t0: &'a mut i32, |
752 | mut t1: &'a mut i32) { | |
753 | let p: &i32 = &*t0; // Freezes `*t0` | |
85aaf69f | 754 | let mut t2 = &t0; |
c34b1796 AL |
755 | let q: &i32 = &*t2; // Freezes `*t0`, but that's ok... |
756 | let r: &i32 = &*t0; // ...after all, could do same thing directly. | |
85aaf69f SL |
757 | } |
758 | ``` | |
759 | ||
760 | In this case, creating the alias `t2` of `t0` is safe because the only | |
761 | thing `t2` can be used for is to further freeze `*t0`, which is | |
762 | already frozen. In particular, we cannot assign to `*t0` through the | |
763 | new alias `t2`, as demonstrated in this test case: | |
764 | ||
c34b1796 | 765 | ```rust |
85aaf69f | 766 | // src/test/run-pass/borrowck-borrow-mut-base-ptr-in-aliasable-loc.rs |
c34b1796 | 767 | fn foo(t0: & &mut i32) { |
85aaf69f | 768 | let t1 = t0; |
c34b1796 | 769 | let p: &i32 = &**t0; |
85aaf69f SL |
770 | **t1 = 22; //~ ERROR cannot assign |
771 | } | |
772 | ``` | |
773 | ||
774 | This distinction is reflected in the rules. When doing an `&mut` | |
775 | borrow -- as in the first example -- the set `ACTIONS` will be | |
776 | `CLAIM|MUTATE|FREEZE`, because claiming the referent implies that it | |
777 | cannot be claimed, mutated, or frozen by anyone else. These | |
778 | restrictions are propagated back to the base path and hence the base | |
779 | path is considered unfreezable. | |
780 | ||
781 | In contrast, when the referent is merely frozen -- as in the second | |
782 | example -- the set `ACTIONS` will be `CLAIM|MUTATE`, because freezing | |
783 | the referent implies that it cannot be claimed or mutated but permits | |
784 | others to freeze. Hence when these restrictions are propagated back to | |
785 | the base path, it will still be considered freezable. | |
786 | ||
787 | ||
788 | ||
abe05a73 XL |
789 | **FIXME [RFC 1751](https://github.com/rust-lang/rfcs/issues/1751) |
790 | Restrictions against mutating the base pointer.** | |
791 | When an `&mut` pointer is frozen or claimed, we currently pass along the | |
85aaf69f SL |
792 | restriction against MUTATE to the base pointer. I do not believe this |
793 | restriction is needed. It dates from the days when we had a way to | |
794 | mutate that preserved the value being mutated (i.e., swap). Nowadays | |
795 | the only form of mutation is assignment, which destroys the pointer | |
796 | being mutated -- therefore, a mutation cannot create a new path to the | |
797 | same data. Rather, it removes an existing path. This implies that not | |
798 | only can we permit mutation, we can have mutation kill restrictions in | |
799 | the dataflow sense. | |
800 | ||
801 | **WARNING:** We do not currently have `const` borrows in the | |
802 | language. If they are added back in, we must ensure that they are | |
803 | consistent with all of these examples. The crucial question will be | |
804 | what sorts of actions are permitted with a `&const &mut` pointer. I | |
805 | would suggest that an `&mut` referent found in an `&const` location be | |
806 | prohibited from both freezes and claims. This would avoid the need to | |
807 | prevent `const` borrows of the base pointer when the referent is | |
808 | borrowed. | |
809 | ||
c34b1796 AL |
810 | [ Previous revisions of this document discussed `&const` in more detail. |
811 | See the revision history. ] | |
812 | ||
85aaf69f SL |
813 | # Moves and initialization |
814 | ||
815 | The borrow checker is also in charge of ensuring that: | |
816 | ||
817 | - all memory which is accessed is initialized | |
818 | - immutable local variables are assigned at most once. | |
819 | ||
820 | These are two separate dataflow analyses built on the same | |
821 | framework. Let's look at checking that memory is initialized first; | |
822 | the checking of immutable local variable assignments works in a very | |
823 | similar way. | |
824 | ||
825 | To track the initialization of memory, we actually track all the | |
826 | points in the program that *create uninitialized memory*, meaning | |
827 | moves and the declaration of uninitialized variables. For each of | |
828 | these points, we create a bit in the dataflow set. Assignments to a | |
829 | variable `x` or path `a.b.c` kill the move/uninitialization bits for | |
830 | those paths and any subpaths (e.g., `x`, `x.y`, `a.b.c`, `*a.b.c`). | |
831 | Bits are unioned when two control-flow paths join. Thus, the | |
832 | presence of a bit indicates that the move may have occurred without an | |
833 | intervening assignment to the same memory. At each use of a variable, | |
834 | we examine the bits in scope, and check that none of them are | |
835 | moves/uninitializations of the variable that is being used. | |
836 | ||
837 | Let's look at a simple example: | |
838 | ||
c34b1796 AL |
839 | ```rust |
840 | fn foo(a: Box<i32>) { | |
841 | let b: Box<i32>; // Gen bit 0. | |
85aaf69f SL |
842 | |
843 | if cond { // Bits: 0 | |
844 | use(&*a); | |
845 | b = a; // Gen bit 1, kill bit 0. | |
846 | use(&*b); | |
847 | } else { | |
848 | // Bits: 0 | |
849 | } | |
850 | // Bits: 0,1 | |
851 | use(&*a); // Error. | |
852 | use(&*b); // Error. | |
853 | } | |
854 | ||
c34b1796 | 855 | fn use(a: &i32) { } |
85aaf69f SL |
856 | ``` |
857 | ||
858 | In this example, the variable `b` is created uninitialized. In one | |
859 | branch of an `if`, we then move the variable `a` into `b`. Once we | |
860 | exit the `if`, therefore, it is an error to use `a` or `b` since both | |
861 | are only conditionally initialized. I have annotated the dataflow | |
862 | state using comments. There are two dataflow bits, with bit 0 | |
863 | corresponding to the creation of `b` without an initializer, and bit 1 | |
864 | corresponding to the move of `a`. The assignment `b = a` both | |
865 | generates bit 1, because it is a move of `a`, and kills bit 0, because | |
866 | `b` is now initialized. On the else branch, though, `b` is never | |
867 | initialized, and so bit 0 remains untouched. When the two flows of | |
868 | control join, we union the bits from both sides, resulting in both | |
869 | bits 0 and 1 being set. Thus any attempt to use `a` uncovers the bit 1 | |
870 | from the "then" branch, showing that `a` may be moved, and any attempt | |
871 | to use `b` uncovers bit 0, from the "else" branch, showing that `b` | |
872 | may not be initialized. | |
873 | ||
874 | ## Initialization of immutable variables | |
875 | ||
876 | Initialization of immutable variables works in a very similar way, | |
877 | except that: | |
878 | ||
879 | 1. we generate bits for each assignment to a variable; | |
880 | 2. the bits are never killed except when the variable goes out of scope. | |
881 | ||
882 | Thus the presence of an assignment bit indicates that the assignment | |
883 | may have occurred. Note that assignments are only killed when the | |
884 | variable goes out of scope, as it is not relevant whether or not there | |
885 | has been a move in the meantime. Using these bits, we can declare that | |
886 | an assignment to an immutable variable is legal iff there is no other | |
887 | assignment bit to that same variable in scope. | |
888 | ||
889 | ## Why is the design made this way? | |
890 | ||
891 | It may seem surprising that we assign dataflow bits to *each move* | |
892 | rather than *each path being moved*. This is somewhat less efficient, | |
893 | since on each use, we must iterate through all moves and check whether | |
894 | any of them correspond to the path in question. Similar concerns apply | |
895 | to the analysis for double assignments to immutable variables. The | |
896 | main reason to do it this way is that it allows us to print better | |
897 | error messages, because when a use occurs, we can print out the | |
898 | precise move that may be in scope, rather than simply having to say | |
899 | "the variable may not be initialized". | |
900 | ||
901 | ## Data structures used in the move analysis | |
902 | ||
903 | The move analysis maintains several data structures that enable it to | |
904 | cross-reference moves and assignments to determine when they may be | |
905 | moving/assigning the same memory. These are all collected into the | |
906 | `MoveData` and `FlowedMoveData` structs. The former represents the set | |
907 | of move paths, moves, and assignments, and the latter adds in the | |
908 | results of a dataflow computation. | |
909 | ||
910 | ### Move paths | |
911 | ||
912 | The `MovePath` tree tracks every path that is moved or assigned to. | |
913 | These paths have the same form as the `LoanPath` data structure, which | |
2c00a5a8 | 914 | in turn is the "real world version of the places `P` that we |
85aaf69f SL |
915 | introduced earlier. The difference between a `MovePath` and a `LoanPath` |
916 | is that move paths are: | |
917 | ||
918 | 1. Canonicalized, so that we have exactly one copy of each, and | |
919 | we can refer to move paths by index; | |
920 | 2. Cross-referenced with other paths into a tree, so that given a move | |
921 | path we can efficiently find all parent move paths and all | |
922 | extensions (e.g., given the `a.b` move path, we can easily find the | |
923 | move path `a` and also the move paths `a.b.c`) | |
924 | 3. Cross-referenced with moves and assignments, so that we can | |
925 | easily find all moves and assignments to a given path. | |
926 | ||
927 | The mechanism that we use is to create a `MovePath` record for each | |
928 | move path. These are arranged in an array and are referenced using | |
929 | `MovePathIndex` values, which are newtype'd indices. The `MovePath` | |
930 | structs are arranged into a tree, representing using the standard | |
931 | Knuth representation where each node has a child 'pointer' and a "next | |
932 | sibling" 'pointer'. In addition, each `MovePath` has a parent | |
933 | 'pointer'. In this case, the 'pointers' are just `MovePathIndex` | |
934 | values. | |
935 | ||
936 | In this way, if we want to find all base paths of a given move path, | |
937 | we can just iterate up the parent pointers (see `each_base_path()` in | |
938 | the `move_data` module). If we want to find all extensions, we can | |
939 | iterate through the subtree (see `each_extending_path()`). | |
940 | ||
941 | ### Moves and assignments | |
942 | ||
943 | There are structs to represent moves (`Move`) and assignments | |
944 | (`Assignment`), and these are also placed into arrays and referenced | |
945 | by index. All moves of a particular path are arranged into a linked | |
946 | lists, beginning with `MovePath.first_move` and continuing through | |
947 | `Move.next_move`. | |
948 | ||
949 | We distinguish between "var" assignments, which are assignments to a | |
950 | variable like `x = foo`, and "path" assignments (`x.f = foo`). This | |
951 | is because we need to assign dataflows to the former, but not the | |
952 | latter, so as to check for double initialization of immutable | |
953 | variables. | |
954 | ||
955 | ### Gathering and checking moves | |
956 | ||
957 | Like loans, we distinguish two phases. The first, gathering, is where | |
958 | we uncover all the moves and assignments. As with loans, we do some | |
959 | basic sanity checking in this phase, so we'll report errors if you | |
960 | attempt to move out of a borrowed pointer etc. Then we do the dataflow | |
961 | (see `FlowedMoveData::new`). Finally, in the `check_loans.rs` code, we | |
962 | walk back over, identify all uses, assignments, and captures, and | |
963 | check that they are legal given the set of dataflow bits we have | |
964 | computed for that program point. | |
965 | ||
966 | # Drop flags and structural fragments | |
967 | ||
968 | In addition to the job of enforcing memory safety, the borrow checker | |
969 | code is also responsible for identifying the *structural fragments* of | |
970 | data in the function, to support out-of-band dynamic drop flags | |
971 | allocated on the stack. (For background, see [RFC PR #320].) | |
972 | ||
973 | [RFC PR #320]: https://github.com/rust-lang/rfcs/pull/320 | |
974 | ||
975 | Semantically, each piece of data that has a destructor may need a | |
976 | boolean flag to indicate whether or not its destructor has been run | |
977 | yet. However, in many cases there is no need to actually maintain such | |
978 | a flag: It can be apparent from the code itself that a given path is | |
979 | always initialized (or always deinitialized) when control reaches the | |
980 | end of its owner's scope, and thus we can unconditionally emit (or | |
981 | not) the destructor invocation for that path. | |
982 | ||
983 | A simple example of this is the following: | |
984 | ||
985 | ```rust | |
c34b1796 AL |
986 | struct D { p: i32 } |
987 | impl D { fn new(x: i32) -> D { ... } | |
85aaf69f SL |
988 | impl Drop for D { ... } |
989 | ||
990 | fn foo(a: D, b: D, t: || -> bool) { | |
991 | let c: D; | |
992 | let d: D; | |
993 | if t() { c = b; } | |
994 | } | |
995 | ``` | |
996 | ||
997 | At the end of the body of `foo`, the compiler knows that `a` is | |
998 | initialized, introducing a drop obligation (deallocating the boxed | |
999 | integer) for the end of `a`'s scope that is run unconditionally. | |
1000 | Likewise the compiler knows that `d` is not initialized, and thus it | |
1001 | leave out the drop code for `d`. | |
1002 | ||
1003 | The compiler cannot statically know the drop-state of `b` nor `c` at | |
1004 | the end of their scope, since that depends on the value of | |
1005 | `t`. Therefore, we need to insert boolean flags to track whether we | |
1006 | need to drop `b` and `c`. | |
1007 | ||
1008 | However, the matter is not as simple as just mapping local variables | |
1009 | to their corresponding drop flags when necessary. In particular, in | |
1010 | addition to being able to move data out of local variables, Rust | |
1011 | allows one to move values in and out of structured data. | |
1012 | ||
1013 | Consider the following: | |
1014 | ||
1015 | ```rust | |
1016 | struct S { x: D, y: D, z: D } | |
1017 | ||
1018 | fn foo(a: S, mut b: S, t: || -> bool) { | |
1019 | let mut c: S; | |
1020 | let d: S; | |
1021 | let e: S = a.clone(); | |
1022 | if t() { | |
1023 | c = b; | |
1024 | b.x = e.y; | |
1025 | } | |
1026 | if t() { c.y = D::new(4); } | |
1027 | } | |
1028 | ``` | |
1029 | ||
1030 | As before, the drop obligations of `a` and `d` can be statically | |
1031 | determined, and again the state of `b` and `c` depend on dynamic | |
1032 | state. But additionally, the dynamic drop obligations introduced by | |
1033 | `b` and `c` are not just per-local boolean flags. For example, if the | |
1034 | first call to `t` returns `false` and the second call `true`, then at | |
1035 | the end of their scope, `b` will be completely initialized, but only | |
1036 | `c.y` in `c` will be initialized. If both calls to `t` return `true`, | |
1037 | then at the end of their scope, `c` will be completely initialized, | |
1038 | but only `b.x` will be initialized in `b`, and only `e.x` and `e.z` | |
1039 | will be initialized in `e`. | |
1040 | ||
1041 | Note that we need to cover the `z` field in each case in some way, | |
1042 | since it may (or may not) need to be dropped, even though `z` is never | |
1043 | directly mentioned in the body of the `foo` function. We call a path | |
1044 | like `b.z` a *fragment sibling* of `b.x`, since the field `z` comes | |
1045 | from the same structure `S` that declared the field `x` in `b.x`. | |
1046 | ||
1047 | In general we need to maintain boolean flags that match the | |
1048 | `S`-structure of both `b` and `c`. In addition, we need to consult | |
1049 | such a flag when doing an assignment (such as `c.y = D::new(4);` | |
1050 | above), in order to know whether or not there is a previous value that | |
1051 | needs to be dropped before we do the assignment. | |
1052 | ||
1053 | So for any given function, we need to determine what flags are needed | |
1054 | to track its drop obligations. Our strategy for determining the set of | |
1055 | flags is to represent the fragmentation of the structure explicitly: | |
1056 | by starting initially from the paths that are explicitly mentioned in | |
1057 | moves and assignments (such as `b.x` and `c.y` above), and then | |
1058 | traversing the structure of the path's type to identify leftover | |
1059 | *unmoved fragments*: assigning into `c.y` means that `c.x` and `c.z` | |
1060 | are leftover unmoved fragments. Each fragment represents a drop | |
1061 | obligation that may need to be tracked. Paths that are only moved or | |
1062 | assigned in their entirety (like `a` and `d`) are treated as a single | |
1063 | drop obligation. | |
1064 | ||
1065 | The fragment construction process works by piggy-backing on the | |
1066 | existing `move_data` module. We already have callbacks that visit each | |
1067 | direct move and assignment; these form the basis for the sets of | |
1068 | moved_leaf_paths and assigned_leaf_paths. From these leaves, we can | |
1069 | walk up their parent chain to identify all of their parent paths. | |
1070 | We need to identify the parents because of cases like the following: | |
1071 | ||
1072 | ```rust | |
1073 | struct Pair<X,Y>{ x: X, y: Y } | |
1074 | fn foo(dd_d_d: Pair<Pair<Pair<D, D>, D>, D>) { | |
1075 | other_function(dd_d_d.x.y); | |
1076 | } | |
1077 | ``` | |
1078 | ||
1079 | In this code, the move of the path `dd_d.x.y` leaves behind not only | |
1080 | the fragment drop-obligation `dd_d.x.x` but also `dd_d.y` as well. | |
1081 | ||
1082 | Once we have identified the directly-referenced leaves and their | |
1083 | parents, we compute the left-over fragments, in the function | |
1084 | `fragments::add_fragment_siblings`. As of this writing this works by | |
1085 | looking at each directly-moved or assigned path P, and blindly | |
1086 | gathering all sibling fields of P (as well as siblings for the parents | |
1087 | of P, etc). After accumulating all such siblings, we filter out the | |
1088 | entries added as siblings of P that turned out to be | |
1089 | directly-referenced paths (or parents of directly referenced paths) | |
1090 | themselves, thus leaving the never-referenced "left-overs" as the only | |
1091 | thing left from the gathering step. | |
1092 | ||
1093 | ## Array structural fragments | |
1094 | ||
1095 | A special case of the structural fragments discussed above are | |
1096 | the elements of an array that has been passed by value, such as | |
1097 | the following: | |
1098 | ||
1099 | ```rust | |
c34b1796 | 1100 | fn foo(a: [D; 10], i: i32) -> D { |
85aaf69f SL |
1101 | a[i] |
1102 | } | |
1103 | ``` | |
1104 | ||
1105 | The above code moves a single element out of the input array `a`. | |
1106 | The remainder of the array still needs to be dropped; i.e., it | |
1107 | is a structural fragment. Note that after performing such a move, | |
1108 | it is not legal to read from the array `a`. There are a number of | |
1109 | ways to deal with this, but the important thing to note is that | |
1110 | the semantics needs to distinguish in some manner between a | |
1111 | fragment that is the *entire* array versus a fragment that represents | |
1112 | all-but-one element of the array. A place where that distinction | |
1113 | would arise is the following: | |
1114 | ||
1115 | ```rust | |
c34b1796 | 1116 | fn foo(a: [D; 10], b: [D; 10], i: i32, t: bool) -> D { |
85aaf69f SL |
1117 | if t { |
1118 | a[i] | |
1119 | } else { | |
1120 | b[i] | |
1121 | } | |
1122 | ||
1123 | // When control exits, we will need either to drop all of `a` | |
1124 | // and all-but-one of `b`, or to drop all of `b` and all-but-one | |
1125 | // of `a`. | |
1126 | } | |
1127 | ``` | |
1128 | ||
94b46f34 | 1129 | There are a number of ways that the codegen backend could choose to |
85aaf69f | 1130 | compile this (e.g. a `[bool; 10]` array for each such moved array; |
c34b1796 | 1131 | or an `Option<usize>` for each moved array). From the viewpoint of the |
85aaf69f SL |
1132 | borrow-checker, the important thing is to record what kind of fragment |
1133 | is implied by the relevant moves. | |
1134 | ||
1135 | # Future work | |
1136 | ||
1137 | While writing up these docs, I encountered some rules I believe to be | |
1138 | stricter than necessary: | |
1139 | ||
2c00a5a8 | 1140 | - I think restricting the `&mut` P against moves and `ALIAS` is sufficient, |
85aaf69f SL |
1141 | `MUTATE` and `CLAIM` are overkill. `MUTATE` was necessary when swap was |
1142 | a built-in operator, but as it is not, it is implied by `CLAIM`, | |
1143 | and `CLAIM` is implied by `ALIAS`. The only net effect of this is an | |
1144 | extra error message in some cases, though. | |
1145 | - I have not described how closures interact. Current code is unsound. | |
1146 | I am working on describing and implementing the fix. | |
1147 | - If we wish, we can easily extend the move checking to allow finer-grained | |
1148 | tracking of what is initialized and what is not, enabling code like | |
1149 | this: | |
1150 | ||
1151 | a = x.f.g; // x.f.g is now uninitialized | |
1152 | // here, x and x.f are not usable, but x.f.h *is* | |
1153 | x.f.g = b; // x.f.g is not initialized | |
1154 | // now x, x.f, x.f.g, x.f.h are all usable | |
1155 | ||
1156 | What needs to change here, most likely, is that the `moves` module | |
1157 | should record not only what paths are moved, but what expressions | |
1158 | are actual *uses*. For example, the reference to `x` in `x.f.g = b` | |
1159 | is not a true *use* in the sense that it requires `x` to be fully | |
1160 | initialized. This is in fact why the above code produces an error | |
1161 | today: the reference to `x` in `x.f.g = b` is considered illegal | |
1162 | because `x` is not fully initialized. | |
1163 | ||
1164 | There are also some possible refactorings: | |
1165 | ||
1166 | - It might be nice to replace all loan paths with the MovePath mechanism, | |
1167 | since they allow lightweight comparison using an integer. |