]> git.proxmox.com Git - rustc.git/blame - src/librustc/dep_graph/README.md
Imported Upstream version 1.9.0+dfsg1
[rustc.git] / src / librustc / dep_graph / README.md
CommitLineData
9cc50fc6
SL
1# Dependency graph for incremental compilation
2
3This module contains the infrastructure for managing the incremental
4compilation dependency graph. This README aims to explain how it ought
5to be used. In this document, we'll first explain the overall
6strategy, and then share some tips for handling specific scenarios.
7
8The high-level idea is that we want to instrument the compiler to
9track which parts of the AST and other IR are read/written by what.
10This way, when we come back later, we can look at this graph and
11determine what work needs to be redone.
12
13### The dependency graph
14
15The nodes of the graph are defined by the enum `DepNode`. They represent
16one of three things:
17
181. HIR nodes (like `Hir(DefId)`) represent the HIR input itself.
192. Data nodes (like `ItemSignature(DefId)`) represent some computed
20 information about a particular item.
213. Procedure notes (like `CoherenceCheckImpl(DefId)`) represent some
22 procedure that is executing. Usually this procedure is
23 performing some kind of check for errors. You can think of them as
24 computed values where the value being computed is `()` (and the
25 value may fail to be computed, if an error results).
26
27An edge `N1 -> N2` is added between two nodes if either:
28
29- the value of `N1` is used to compute `N2`;
30- `N1` is read by the procedure `N2`;
31- the procedure `N1` writes the value `N2`.
32
33The latter two conditions are equivalent to the first one if you think
34of procedures as values.
35
36### Basic tracking
37
38There is a very general strategy to ensure that you have a correct, if
39sometimes overconservative, dependency graph. The two main things you have
40to do are (a) identify shared state and (b) identify the current tasks.
41
42### Identifying shared state
43
44Identify "shared state" that will be written by one pass and read by
45another. In particular, we need to identify shared state that will be
46read "across items" -- that is, anything where changes in one item
47could invalidate work done for other items. So, for example:
48
491. The signature for a function is "shared state".
502. The computed type of some expression in the body of a function is
51 not shared state, because if it changes it does not itself
52 invalidate other functions (though it may be that it causes new
53 monomorphizations to occur, but that's handled independently).
54a0048b 54
9cc50fc6
SL
55Put another way: if the HIR for an item changes, we are going to
56recompile that item for sure. But we need the dep tracking map to tell
57us what *else* we have to recompile. Shared state is anything that is
58used to communicate results from one item to another.
59
60### Identifying the current task
61
62The dep graph always tracks a current task: this is basically the
63`DepNode` that the compiler is computing right now. Typically it would
64be a procedure node, but it can also be a data node (as noted above,
65the two are kind of equivalent).
66
67You set the current task by calling `dep_graph.in_task(node)`. For example:
68
69```rust
70let _task = tcx.dep_graph.in_task(DepNode::Privacy);
71```
72
73Now all the code until `_task` goes out of scope will be considered
74part of the "privacy task".
75
76The tasks are maintained in a stack, so it is perfectly fine to nest
77one task within another. Because pushing a task is considered to be
78computing a value, when you nest a task `N2` inside of a task `N1`, we
79automatically add an edge `N2 -> N1` (since `N1` presumably needed the
80result of `N2` to complete):
81
82```rust
83let _n1 = tcx.dep_graph.in_task(DepNode::N1);
84let _n2 = tcx.dep_graph.in_task(DepNode::N2);
85// this will result in an edge N1 -> n2
86```
87
88### Ignore tasks
89
90Although it is rarely needed, you can also push a special "ignore"
91task:
92
93```rust
94let _ignore = tc.dep_graph.in_ignore();
95```
96
97This will cause all read/write edges to be ignored until it goes out
98of scope or until something else is pushed. For example, we could
99suppress the edge between nested tasks like so:
100
101```rust
102let _n1 = tcx.dep_graph.in_task(DepNode::N1);
103let _ignore = tcx.dep_graph.in_ignore();
104let _n2 = tcx.dep_graph.in_task(DepNode::N2);
105// now no edge is added
106```
107
108### Tracking reads and writes
109
110We need to identify what shared state is read/written by the current
111task as it executes. The most fundamental way of doing that is to invoke
112the `read` and `write` methods on `DepGraph`:
113
114```rust
115// Adds an edge from DepNode::Hir(some_def_id) to the current task
116tcx.dep_graph.read(DepNode::Hir(some_def_id))
117
118// Adds an edge from the current task to DepNode::ItemSignature(some_def_id)
119tcx.dep_graph.write(DepNode::ItemSignature(some_def_id))
120```
121
122However, you should rarely need to invoke those methods directly.
123Instead, the idea is to *encapsulate* shared state into some API that
124will invoke `read` and `write` automatically. The most common way to
125do this is to use a `DepTrackingMap`, described in the next section,
126but any sort of abstraction barrier will do. In general, the strategy
127is that getting access to information implicitly adds an appropriate
128`read`. So, for example, when you use the
129`dep_graph::visit_all_items_in_krate` helper method, it will visit
130each item `X`, start a task `Foo(X)` for that item, and automatically
131add an edge `Hir(X) -> Foo(X)`. This edge is added because the code is
132being given access to the HIR node for `X`, and hence it is expected
133to read from it. Similarly, reading from the `tcache` map for item `X`
134(which is a `DepTrackingMap`, described below) automatically invokes
135`dep_graph.read(ItemSignature(X))`.
136
137To make this strategy work, a certain amount of indirection is
138required. For example, modules in the HIR do not have direct pointers
139to the items that they contain. Rather, they contain node-ids -- one
140can then ask the HIR map for the item with a given node-id. This gives
141us an opportunity to add an appropriate read edge.
142
143#### Explicit calls to read and write when starting a new subtask
144
145One time when you *may* need to call `read` and `write` directly is
146when you push a new task onto the stack, either by calling `in_task`
147as shown above or indirectly, such as with the `memoize` pattern
148described below. In that case, any data that the task has access to
149from the surrounding environment must be explicitly "read". For
150example, in `librustc_typeck`, the collection code visits all items
151and, among other things, starts a subtask producing its signature
152(what follows is simplified pseudocode, of course):
153
154```rust
155fn visit_item(item: &hir::Item) {
156 // Here, current subtask is "Collect(X)", and an edge Hir(X) -> Collect(X)
157 // has automatically been added by `visit_all_items_in_krate`.
158 let sig = signature_of_item(item);
159}
160
161fn signature_of_item(item: &hir::Item) {
162 let def_id = tcx.map.local_def_id(item.id);
163 let task = tcx.dep_graph.in_task(DepNode::ItemSignature(def_id));
164 tcx.dep_graph.read(DepNode::Hir(def_id)); // <-- the interesting line
165 ...
166}
167```
168
169Here you can see that, in `signature_of_item`, we started a subtask
170corresponding to producing the `ItemSignature`. This subtask will read from
171`item` -- but it gained access to `item` implicitly. This means that if it just
172reads from `item`, there would be missing edges in the graph:
173
174 Hir(X) --+ // added by the explicit call to `read`
175 | |
176 | +---> ItemSignature(X) -> Collect(X)
177 | ^
178 | |
179 +---------------------------------+ // added by `visit_all_items_in_krate`
54a0048b 180
9cc50fc6
SL
181In particular, the edge from `Hir(X)` to `ItemSignature(X)` is only
182present because we called `read` ourselves when entering the `ItemSignature(X)`
183task.
184
185So, the rule of thumb: when entering a new task yourself, register
186reads on any shared state that you inherit. (This actually comes up
187fairly infrequently though: the main place you need caution is around
188memoization.)
189
190#### Dependency tracking map
191
192`DepTrackingMap` is a particularly convenient way to correctly store
193shared state. A `DepTrackingMap` is a special hashmap that will add
194edges automatically when `get` and `insert` are called. The idea is
195that, when you get/insert a value for the key `K`, we will add an edge
196from/to the node `DepNode::Variant(K)` (for some variant specific to
197the map).
198
199Each `DepTrackingMap` is parameterized by a special type `M` that
200implements `DepTrackingMapConfig`; this trait defines the key and value
201types of the map, and also defines a fn for converting from the key to
202a `DepNode` label. You don't usually have to muck about with this by
203hand, there is a macro for creating it. You can see the complete set
204of `DepTrackingMap` definitions in `librustc/middle/ty/maps.rs`.
205
206As an example, let's look at the `adt_defs` map. The `adt_defs` map
207maps from the def-id of a struct/enum to its `AdtDef`. It is defined
208using this macro:
209
210```rust
211dep_map_ty! { AdtDefs: ItemSignature(DefId) -> ty::AdtDefMaster<'tcx> }
212// ~~~~~~~ ~~~~~~~~~~~~~ ~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
213// | | Key type Value type
214// | DepNode variant
215// Name of map id type
216```
217
218this indicates that a map id type `AdtDefs` will be created. The key
219of the map will be a `DefId` and value will be
220`ty::AdtDefMaster<'tcx>`. The `DepNode` will be created by
221`DepNode::ItemSignature(K)` for a given key.
222
223Once that is done, you can just use the `DepTrackingMap` like any
224other map:
225
226```rust
227let mut map: DepTrackingMap<M> = DepTrackingMap::new(dep_graph);
228map.insert(key, value); // registers dep_graph.write
229map.get(key; // registers dep_graph.read
230```
231
232#### Memoization
233
234One particularly interesting case is memoization. If you have some
235shared state that you compute in a memoized fashion, the correct thing
236to do is to define a `RefCell<DepTrackingMap>` for it and use the
237`memoize` helper:
238
239```rust
240map.memoize(key, || /* compute value */)
241```
242
243This will create a graph that looks like
244
245 ... -> MapVariant(key) -> CurrentTask
246
247where `MapVariant` is the `DepNode` variant that the map is associated with,
248and `...` are whatever edges the `/* compute value */` closure creates.
249
250In particular, using the memoize helper is much better than writing
251the obvious code yourself:
252
253```
254if let Some(result) = map.get(key) {
255 return result;
256}
257let value = /* compute value */;
258map.insert(key, value);
259```
260
261If you write that code manually, the dependency graph you get will
262include artificial edges that are not necessary. For example, imagine that
263two tasks, A and B, both invoke the manual memoization code, but A happens
264to go first. The resulting graph will be:
265
266 ... -> A -> MapVariant(key) -> B
267 ~~~~~~~~~~~~~~~~~~~~~~~~~~~ // caused by A writing to MapVariant(key)
268 ~~~~~~~~~~~~~~~~~~~~ // caused by B reading from MapVariant(key)
269
270This graph is not *wrong*, but it encodes a path from A to B that
271should not exist. In contrast, using the memoized helper, you get:
272
273 ... -> MapVariant(key) -> A
274 |
275 +----------> B
54a0048b
SL
276
277which is much cleaner.
9cc50fc6
SL
278
279**Be aware though that the closure is executed with `MapVariant(key)`
280pushed onto the stack as the current task!** That means that you must
281add explicit `read` calls for any shared state that it accesses
282implicitly from its environment. See the section on "explicit calls to
283read and write when starting a new subtask" above for more details.
284
285### How to decide where to introduce a new task
286
287Certainly, you need at least one task on the stack: any attempt to
288`read` or `write` shared state will panic if there is no current
289task. But where does it make sense to introduce subtasks? The basic
290rule is that a subtask makes sense for any discrete unit of work you
291may want to skip in the future. Adding a subtask separates out the
292reads/writes from *that particular subtask* versus the larger
293context. An example: you might have a 'meta' task for all of borrow
294checking, and then subtasks for borrow checking individual fns. (Seen
295in this light, memoized computations are just a special case where we
296may want to avoid redoing the work even within the context of one
297compilation.)
298
299The other case where you might want a subtask is to help with refining
300the reads/writes for some later bit of work that needs to be memoized.
301For example, we create a subtask for type-checking the body of each
302fn. However, in the initial version of incr. comp. at least, we do
303not expect to actually *SKIP* type-checking -- we only expect to skip
304trans. However, it's still useful to create subtasks for type-checking
305individual items, because, otherwise, if a fn sig changes, we won't
306know which callers are affected -- in fact, because the graph would be
307so coarse, we'd just have to retrans everything, since we can't
308distinguish which fns used which fn sigs.
309
310### Testing the dependency graph
311
312There are various ways to write tests against the dependency graph.
313The simplest mechanism are the
314`#[rustc_if_this_changed]` and `#[rustc_then_this_would_need]`
315annotations. These are used in compile-fail tests to test whether the
316expected set of paths exist in the dependency graph. As an example,
317see `src/test/compile-fail/dep-graph-caller-callee.rs`.
318
319The idea is that you can annotate a test like:
320
321```rust
322#[rustc_if_this_changed]
323fn foo() { }
324
325#[rustc_then_this_would_need(TypeckItemBody)] //~ ERROR OK
326fn bar() { foo(); }
327
328#[rustc_then_this_would_need(TypeckItemBody)] //~ ERROR no path
329fn baz() { }
330```
331
332This will check whether there is a path in the dependency graph from
333`Hir(foo)` to `TypeckItemBody(bar)`. An error is reported for each
334`#[rustc_then_this_would_need]` annotation that indicates whether a
335path exists. `//~ ERROR` annotations can then be used to test if a
336path is found (as demonstrated above).
337
338### Debugging the dependency graph
339
340The compiler is also capable of dumping the dependency graph for your
341debugging pleasure. To do so, pass the `-Z dump-dep-graph` flag. The
342graph will be dumped to `dep_graph.{txt,dot}` in the current
343directory. You can override the filename with the `RUST_DEP_GRAPH`
344environment variable.
345
346Frequently, though, the full dep graph is quite overwhelming and not
347particularly helpful. Therefore, the compiler also allows you to filter
348the graph. You can filter in three ways:
349
3501. All edges originating in a particular set of nodes (usually a single node).
3512. All edges reaching a particular set of nodes.
3523. All edges that lie between given start and end nodes.
353
354To filter, use the `RUST_DEP_GRAPH_FILTER` environment variable, which should
355look like one of the following:
356
357```
358source_filter // nodes originating from source_filter
359-> target_filter // nodes that can reach target_filter
360source_filter -> target_filter // nodes in between source_filter and target_filter
361```
362
363`source_filter` and `target_filter` are a `&`-separated list of strings.
364A node is considered to match a filter if all of those strings appear in its
365label. So, for example:
366
367```
368RUST_DEP_GRAPH_FILTER='-> TypeckItemBody'
369```
370
371would select the predecessors of all `TypeckItemBody` nodes. Usually though you
372want the `TypeckItemBody` node for some particular fn, so you might write:
373
374```
375RUST_DEP_GRAPH_FILTER='-> TypeckItemBody & bar'
376```
377
378This will select only the `TypeckItemBody` nodes for fns with `bar` in their name.
379
380Perhaps you are finding that when you change `foo` you need to re-type-check `bar`,
381but you don't think you should have to. In that case, you might do:
382
383```
384RUST_DEP_GRAPH_FILTER='Hir&foo -> TypeckItemBody & bar'
385```
386
387This will dump out all the nodes that lead from `Hir(foo)` to
388`TypeckItemBody(bar)`, from which you can (hopefully) see the source
389of the erroneous edge.