]> git.proxmox.com Git - rustc.git/blame - src/doc/rustc-dev-guide/src/queries/incremental-compilation.md
New upstream version 1.44.1+dfsg1
[rustc.git] / src / doc / rustc-dev-guide / src / queries / incremental-compilation.md
CommitLineData
a1dfa0c6
XL
1# Incremental compilation
2
3The incremental compilation scheme is, in essence, a surprisingly
4simple extension to the overall query system. We'll start by describing
5a slightly simplified variant of the real thing – the "basic algorithm" –
6and then describe some possible improvements.
7
8## The basic algorithm
9
10The basic algorithm is
11called the **red-green** algorithm[^salsa]. The high-level idea is
12that, after each run of the compiler, we will save the results of all
13the queries that we do, as well as the **query DAG**. The
14**query DAG** is a [DAG] that indexes which queries executed which
15other queries. So, for example, there would be an edge from a query Q1
16to another query Q2 if computing Q1 required computing Q2 (note that
17because queries cannot depend on themselves, this results in a DAG and
18not a general graph).
19
20[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
21
22On the next run of the compiler, then, we can sometimes reuse these
23query results to avoid re-executing a query. We do this by assigning
24every query a **color**:
25
26- If a query is colored **red**, that means that its result during
27 this compilation has **changed** from the previous compilation.
28- If a query is colored **green**, that means that its result is
29 the **same** as the previous compilation.
30
31There are two key insights here:
32
33- First, if all the inputs to query Q are colored green, then the
34 query Q **must** result in the same value as last time and hence
35 need not be re-executed (or else the compiler is not deterministic).
36- Second, even if some inputs to a query changes, it may be that it
37 **still** produces the same result as the previous compilation. In
38 particular, the query may only use part of its input.
39 - Therefore, after executing a query, we always check whether it
40 produced the same result as the previous time. **If it did,** we
41 can still mark the query as green, and hence avoid re-executing
42 dependent queries.
43
44### The try-mark-green algorithm
45
46At the core of incremental compilation is an algorithm called
47"try-mark-green". It has the job of determining the color of a given
48query Q (which must not have yet been executed). In cases where Q has
49red inputs, determining Q's color may involve re-executing Q so that
50we can compare its output, but if all of Q's inputs are green, then we
51can conclude that Q must be green without re-executing it or inspecting
52its value at all. In the compiler, this allows us to avoid
53deserializing the result from disk when we don't need it, and in fact
54enables us to sometimes skip *serializing* the result as well
55(see the refinements section below).
56
57Try-mark-green works as follows:
58
59- First check if the query Q was executed during the previous compilation.
60 - If not, we can just re-execute the query as normal, and assign it the
61 color of red.
62- If yes, then load the 'dependent queries' of Q.
63- If there is a saved result, then we load the `reads(Q)` vector from the
64 query DAG. The "reads" is the set of queries that Q executed during
65 its execution.
66 - For each query R in `reads(Q)`, we recursively demand the color
67 of R using try-mark-green.
68 - Note: it is important that we visit each node in `reads(Q)` in same order
69 as they occurred in the original compilation. See [the section on the
70 query DAG below](#dag).
71 - If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is
72 dirty.
73 - We re-execute Q and compare the hash of its result to the hash of the
74 result from the previous compilation.
75 - If the hash has not changed, we can mark Q as **green** and return.
76 - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that
77 case, we can color Q as **green** and return.
78
79<a name="dag"></a>
80
81### The query DAG
82
83The query DAG code is stored in
ba9703b0 84[`src/librustc_middle/dep_graph`][dep_graph]. Construction of the DAG is done
a1dfa0c6
XL
85by instrumenting the query execution.
86
87One key point is that the query DAG also tracks ordering; that is, for
88each query Q, we not only track the queries that Q reads, we track the
89**order** in which they were read. This allows try-mark-green to walk
90those queries back in the same order. This is important because once a
91subquery comes back as red, we can no longer be sure that Q will continue
92along the same path as before. That is, imagine a query like this:
93
94```rust,ignore
95fn main_query(tcx) {
96 if tcx.subquery1() {
97 tcx.subquery2()
98 } else {
99 tcx.subquery3()
100 }
101}
102```
103
104Now imagine that in the first compilation, `main_query` starts by
105executing `subquery1`, and this returns true. In that case, the next
106query `main_query` executes will be `subquery2`, and `subquery3` will
107not be executed at all.
108
109But now imagine that in the **next** compilation, the input has
110changed such that `subquery1` returns **false**. In this case, `subquery2`
111would never execute. If try-mark-green were to visit `reads(main_query)` out
112of order, however, it might visit `subquery2` before `subquery1`, and hence
113execute it.
114This can lead to ICEs and other problems in the compiler.
115
ba9703b0 116[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc_middle/dep_graph
a1dfa0c6
XL
117
118## Improvements to the basic algorithm
119
120In the description of the basic algorithm, we said that at the end of
121compilation we would save the results of all the queries that were
122performed. In practice, this can be quite wasteful – many of those
123results are very cheap to recompute, and serializing and deserializing
124them is not a particular win. In practice, what we would do is to save
125**the hashes** of all the subqueries that we performed. Then, in select cases,
126we **also** save the results.
127
128This is why the incremental algorithm separates computing the
129**color** of a node, which often does not require its value, from
130computing the **result** of a node. Computing the result is done via a simple
131algorithm like so:
132
133- Check if a saved result for Q is available. If so, compute the color of Q.
134 If Q is green, deserialize and return the saved result.
135- Otherwise, execute Q.
136 - We can then compare the hash of the result and color Q as green if
137 it did not change.
138
532ac7d7
XL
139## Resources
140The initial design document can be found at https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md, which expands on the memoization details, provides more high-level overview and motivation for this system.
141
a1dfa0c6
XL
142# Footnotes
143
144[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis