]> git.proxmox.com Git - rustc.git/blame - src/librustc/dep_graph/README.md
New upstream version 1.12.0+dfsg1
[rustc.git] / src / librustc / dep_graph / README.md
CommitLineData
9cc50fc6
SL
1# Dependency graph for incremental compilation
2
3This module contains the infrastructure for managing the incremental
4compilation dependency graph. This README aims to explain how it ought
5to be used. In this document, we'll first explain the overall
6strategy, and then share some tips for handling specific scenarios.
7
8The high-level idea is that we want to instrument the compiler to
9track which parts of the AST and other IR are read/written by what.
10This way, when we come back later, we can look at this graph and
11determine what work needs to be redone.
12
13### The dependency graph
14
15The nodes of the graph are defined by the enum `DepNode`. They represent
16one of three things:
17
181. HIR nodes (like `Hir(DefId)`) represent the HIR input itself.
192. Data nodes (like `ItemSignature(DefId)`) represent some computed
20 information about a particular item.
213. Procedure notes (like `CoherenceCheckImpl(DefId)`) represent some
22 procedure that is executing. Usually this procedure is
23 performing some kind of check for errors. You can think of them as
24 computed values where the value being computed is `()` (and the
25 value may fail to be computed, if an error results).
26
27An edge `N1 -> N2` is added between two nodes if either:
28
29- the value of `N1` is used to compute `N2`;
30- `N1` is read by the procedure `N2`;
31- the procedure `N1` writes the value `N2`.
32
33The latter two conditions are equivalent to the first one if you think
34of procedures as values.
35
36### Basic tracking
37
38There is a very general strategy to ensure that you have a correct, if
39sometimes overconservative, dependency graph. The two main things you have
40to do are (a) identify shared state and (b) identify the current tasks.
41
42### Identifying shared state
43
44Identify "shared state" that will be written by one pass and read by
45another. In particular, we need to identify shared state that will be
46read "across items" -- that is, anything where changes in one item
47could invalidate work done for other items. So, for example:
48
491. The signature for a function is "shared state".
502. The computed type of some expression in the body of a function is
51 not shared state, because if it changes it does not itself
52 invalidate other functions (though it may be that it causes new
53 monomorphizations to occur, but that's handled independently).
54a0048b 54
9cc50fc6
SL
55Put another way: if the HIR for an item changes, we are going to
56recompile that item for sure. But we need the dep tracking map to tell
57us what *else* we have to recompile. Shared state is anything that is
58used to communicate results from one item to another.
59
60### Identifying the current task
61
62The dep graph always tracks a current task: this is basically the
63`DepNode` that the compiler is computing right now. Typically it would
64be a procedure node, but it can also be a data node (as noted above,
65the two are kind of equivalent).
66
67You set the current task by calling `dep_graph.in_task(node)`. For example:
68
69```rust
70let _task = tcx.dep_graph.in_task(DepNode::Privacy);
71```
72
73Now all the code until `_task` goes out of scope will be considered
74part of the "privacy task".
75
76The tasks are maintained in a stack, so it is perfectly fine to nest
77one task within another. Because pushing a task is considered to be
78computing a value, when you nest a task `N2` inside of a task `N1`, we
79automatically add an edge `N2 -> N1` (since `N1` presumably needed the
80result of `N2` to complete):
81
82```rust
83let _n1 = tcx.dep_graph.in_task(DepNode::N1);
84let _n2 = tcx.dep_graph.in_task(DepNode::N2);
85// this will result in an edge N1 -> n2
86```
87
88### Ignore tasks
89
90Although it is rarely needed, you can also push a special "ignore"
91task:
92
93```rust
94let _ignore = tc.dep_graph.in_ignore();
95```
96
97This will cause all read/write edges to be ignored until it goes out
98of scope or until something else is pushed. For example, we could
99suppress the edge between nested tasks like so:
100
101```rust
102let _n1 = tcx.dep_graph.in_task(DepNode::N1);
103let _ignore = tcx.dep_graph.in_ignore();
104let _n2 = tcx.dep_graph.in_task(DepNode::N2);
105// now no edge is added
106```
107
108### Tracking reads and writes
109
110We need to identify what shared state is read/written by the current
111task as it executes. The most fundamental way of doing that is to invoke
112the `read` and `write` methods on `DepGraph`:
113
114```rust
115// Adds an edge from DepNode::Hir(some_def_id) to the current task
116tcx.dep_graph.read(DepNode::Hir(some_def_id))
117
118// Adds an edge from the current task to DepNode::ItemSignature(some_def_id)
119tcx.dep_graph.write(DepNode::ItemSignature(some_def_id))
120```
121
122However, you should rarely need to invoke those methods directly.
123Instead, the idea is to *encapsulate* shared state into some API that
124will invoke `read` and `write` automatically. The most common way to
125do this is to use a `DepTrackingMap`, described in the next section,
126but any sort of abstraction barrier will do. In general, the strategy
127is that getting access to information implicitly adds an appropriate
128`read`. So, for example, when you use the
129`dep_graph::visit_all_items_in_krate` helper method, it will visit
130each item `X`, start a task `Foo(X)` for that item, and automatically
131add an edge `Hir(X) -> Foo(X)`. This edge is added because the code is
132being given access to the HIR node for `X`, and hence it is expected
133to read from it. Similarly, reading from the `tcache` map for item `X`
134(which is a `DepTrackingMap`, described below) automatically invokes
135`dep_graph.read(ItemSignature(X))`.
136
5bcae85e
SL
137**Note:** adding `Hir` nodes requires a bit of caution due to the
138"inlining" that old trans and constant evaluation still use. See the
139section on inlining below.
140
9cc50fc6
SL
141To make this strategy work, a certain amount of indirection is
142required. For example, modules in the HIR do not have direct pointers
143to the items that they contain. Rather, they contain node-ids -- one
144can then ask the HIR map for the item with a given node-id. This gives
145us an opportunity to add an appropriate read edge.
146
147#### Explicit calls to read and write when starting a new subtask
148
149One time when you *may* need to call `read` and `write` directly is
150when you push a new task onto the stack, either by calling `in_task`
151as shown above or indirectly, such as with the `memoize` pattern
152described below. In that case, any data that the task has access to
153from the surrounding environment must be explicitly "read". For
154example, in `librustc_typeck`, the collection code visits all items
155and, among other things, starts a subtask producing its signature
156(what follows is simplified pseudocode, of course):
157
158```rust
159fn visit_item(item: &hir::Item) {
160 // Here, current subtask is "Collect(X)", and an edge Hir(X) -> Collect(X)
161 // has automatically been added by `visit_all_items_in_krate`.
162 let sig = signature_of_item(item);
163}
164
165fn signature_of_item(item: &hir::Item) {
166 let def_id = tcx.map.local_def_id(item.id);
167 let task = tcx.dep_graph.in_task(DepNode::ItemSignature(def_id));
168 tcx.dep_graph.read(DepNode::Hir(def_id)); // <-- the interesting line
169 ...
170}
171```
172
173Here you can see that, in `signature_of_item`, we started a subtask
174corresponding to producing the `ItemSignature`. This subtask will read from
175`item` -- but it gained access to `item` implicitly. This means that if it just
176reads from `item`, there would be missing edges in the graph:
177
178 Hir(X) --+ // added by the explicit call to `read`
179 | |
180 | +---> ItemSignature(X) -> Collect(X)
181 | ^
182 | |
183 +---------------------------------+ // added by `visit_all_items_in_krate`
54a0048b 184
9cc50fc6
SL
185In particular, the edge from `Hir(X)` to `ItemSignature(X)` is only
186present because we called `read` ourselves when entering the `ItemSignature(X)`
187task.
188
189So, the rule of thumb: when entering a new task yourself, register
190reads on any shared state that you inherit. (This actually comes up
191fairly infrequently though: the main place you need caution is around
192memoization.)
193
194#### Dependency tracking map
195
196`DepTrackingMap` is a particularly convenient way to correctly store
197shared state. A `DepTrackingMap` is a special hashmap that will add
198edges automatically when `get` and `insert` are called. The idea is
199that, when you get/insert a value for the key `K`, we will add an edge
200from/to the node `DepNode::Variant(K)` (for some variant specific to
201the map).
202
203Each `DepTrackingMap` is parameterized by a special type `M` that
204implements `DepTrackingMapConfig`; this trait defines the key and value
205types of the map, and also defines a fn for converting from the key to
206a `DepNode` label. You don't usually have to muck about with this by
207hand, there is a macro for creating it. You can see the complete set
208of `DepTrackingMap` definitions in `librustc/middle/ty/maps.rs`.
209
210As an example, let's look at the `adt_defs` map. The `adt_defs` map
211maps from the def-id of a struct/enum to its `AdtDef`. It is defined
212using this macro:
213
214```rust
215dep_map_ty! { AdtDefs: ItemSignature(DefId) -> ty::AdtDefMaster<'tcx> }
216// ~~~~~~~ ~~~~~~~~~~~~~ ~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
217// | | Key type Value type
218// | DepNode variant
219// Name of map id type
220```
221
222this indicates that a map id type `AdtDefs` will be created. The key
223of the map will be a `DefId` and value will be
224`ty::AdtDefMaster<'tcx>`. The `DepNode` will be created by
225`DepNode::ItemSignature(K)` for a given key.
226
227Once that is done, you can just use the `DepTrackingMap` like any
228other map:
229
230```rust
231let mut map: DepTrackingMap<M> = DepTrackingMap::new(dep_graph);
232map.insert(key, value); // registers dep_graph.write
233map.get(key; // registers dep_graph.read
234```
235
236#### Memoization
237
238One particularly interesting case is memoization. If you have some
239shared state that you compute in a memoized fashion, the correct thing
240to do is to define a `RefCell<DepTrackingMap>` for it and use the
241`memoize` helper:
242
243```rust
244map.memoize(key, || /* compute value */)
245```
246
247This will create a graph that looks like
248
249 ... -> MapVariant(key) -> CurrentTask
250
251where `MapVariant` is the `DepNode` variant that the map is associated with,
252and `...` are whatever edges the `/* compute value */` closure creates.
253
254In particular, using the memoize helper is much better than writing
255the obvious code yourself:
256
257```
258if let Some(result) = map.get(key) {
259 return result;
260}
261let value = /* compute value */;
262map.insert(key, value);
263```
264
265If you write that code manually, the dependency graph you get will
266include artificial edges that are not necessary. For example, imagine that
267two tasks, A and B, both invoke the manual memoization code, but A happens
268to go first. The resulting graph will be:
269
270 ... -> A -> MapVariant(key) -> B
271 ~~~~~~~~~~~~~~~~~~~~~~~~~~~ // caused by A writing to MapVariant(key)
272 ~~~~~~~~~~~~~~~~~~~~ // caused by B reading from MapVariant(key)
273
274This graph is not *wrong*, but it encodes a path from A to B that
275should not exist. In contrast, using the memoized helper, you get:
276
277 ... -> MapVariant(key) -> A
278 |
279 +----------> B
54a0048b
SL
280
281which is much cleaner.
9cc50fc6
SL
282
283**Be aware though that the closure is executed with `MapVariant(key)`
284pushed onto the stack as the current task!** That means that you must
285add explicit `read` calls for any shared state that it accesses
286implicitly from its environment. See the section on "explicit calls to
287read and write when starting a new subtask" above for more details.
288
289### How to decide where to introduce a new task
290
291Certainly, you need at least one task on the stack: any attempt to
292`read` or `write` shared state will panic if there is no current
293task. But where does it make sense to introduce subtasks? The basic
294rule is that a subtask makes sense for any discrete unit of work you
295may want to skip in the future. Adding a subtask separates out the
296reads/writes from *that particular subtask* versus the larger
297context. An example: you might have a 'meta' task for all of borrow
298checking, and then subtasks for borrow checking individual fns. (Seen
299in this light, memoized computations are just a special case where we
300may want to avoid redoing the work even within the context of one
301compilation.)
302
303The other case where you might want a subtask is to help with refining
304the reads/writes for some later bit of work that needs to be memoized.
305For example, we create a subtask for type-checking the body of each
306fn. However, in the initial version of incr. comp. at least, we do
307not expect to actually *SKIP* type-checking -- we only expect to skip
308trans. However, it's still useful to create subtasks for type-checking
309individual items, because, otherwise, if a fn sig changes, we won't
310know which callers are affected -- in fact, because the graph would be
311so coarse, we'd just have to retrans everything, since we can't
312distinguish which fns used which fn sigs.
313
314### Testing the dependency graph
315
316There are various ways to write tests against the dependency graph.
317The simplest mechanism are the
318`#[rustc_if_this_changed]` and `#[rustc_then_this_would_need]`
319annotations. These are used in compile-fail tests to test whether the
320expected set of paths exist in the dependency graph. As an example,
321see `src/test/compile-fail/dep-graph-caller-callee.rs`.
322
323The idea is that you can annotate a test like:
324
325```rust
326#[rustc_if_this_changed]
327fn foo() { }
328
329#[rustc_then_this_would_need(TypeckItemBody)] //~ ERROR OK
330fn bar() { foo(); }
331
332#[rustc_then_this_would_need(TypeckItemBody)] //~ ERROR no path
333fn baz() { }
334```
335
336This will check whether there is a path in the dependency graph from
337`Hir(foo)` to `TypeckItemBody(bar)`. An error is reported for each
338`#[rustc_then_this_would_need]` annotation that indicates whether a
339path exists. `//~ ERROR` annotations can then be used to test if a
340path is found (as demonstrated above).
341
342### Debugging the dependency graph
343
344The compiler is also capable of dumping the dependency graph for your
345debugging pleasure. To do so, pass the `-Z dump-dep-graph` flag. The
346graph will be dumped to `dep_graph.{txt,dot}` in the current
347directory. You can override the filename with the `RUST_DEP_GRAPH`
348environment variable.
349
350Frequently, though, the full dep graph is quite overwhelming and not
351particularly helpful. Therefore, the compiler also allows you to filter
352the graph. You can filter in three ways:
353
3541. All edges originating in a particular set of nodes (usually a single node).
3552. All edges reaching a particular set of nodes.
3563. All edges that lie between given start and end nodes.
357
358To filter, use the `RUST_DEP_GRAPH_FILTER` environment variable, which should
359look like one of the following:
360
361```
362source_filter // nodes originating from source_filter
363-> target_filter // nodes that can reach target_filter
364source_filter -> target_filter // nodes in between source_filter and target_filter
365```
366
367`source_filter` and `target_filter` are a `&`-separated list of strings.
368A node is considered to match a filter if all of those strings appear in its
369label. So, for example:
370
371```
372RUST_DEP_GRAPH_FILTER='-> TypeckItemBody'
373```
374
375would select the predecessors of all `TypeckItemBody` nodes. Usually though you
376want the `TypeckItemBody` node for some particular fn, so you might write:
377
378```
379RUST_DEP_GRAPH_FILTER='-> TypeckItemBody & bar'
380```
381
382This will select only the `TypeckItemBody` nodes for fns with `bar` in their name.
383
384Perhaps you are finding that when you change `foo` you need to re-type-check `bar`,
385but you don't think you should have to. In that case, you might do:
386
387```
388RUST_DEP_GRAPH_FILTER='Hir&foo -> TypeckItemBody & bar'
389```
390
391This will dump out all the nodes that lead from `Hir(foo)` to
392`TypeckItemBody(bar)`, from which you can (hopefully) see the source
393of the erroneous edge.
5bcae85e
SL
394
395### Inlining of HIR nodes
396
397For the time being, at least, we still sometimes "inline" HIR nodes
398from other crates into the current HIR map. This creates a weird
399scenario where the same logical item (let's call it `X`) has two
400def-ids: the original def-id `X` and a new, inlined one `X'`. `X'` is
401in the current crate, but it's not like other HIR nodes: in
402particular, when we restart compilation, it will not be available to
403hash. Therefore, we do not want `Hir(X')` nodes appearing in our
404graph. Instead, we want a "read" of `Hir(X')` to be represented as a
405read of `MetaData(X)`, since the metadata for `X` is where the inlined
406representation originated in the first place.
407
408To achieve this, the HIR map will detect if the def-id originates in
409an inlined node and add a dependency to a suitable `MetaData` node
410instead. If you are reading a HIR node and are not sure if it may be
411inlined or not, you can use `tcx.map.read(node_id)` and it will detect
412whether the node is inlined or not and do the right thing. You can
413also use `tcx.map.is_inlined_def_id()` and
414`tcx.map.is_inlined_node_id()` to test.