]>
Commit | Line | Data |
---|---|---|
a1dfa0c6 XL |
1 | # Incremental compilation |
2 | ||
3 | The incremental compilation scheme is, in essence, a surprisingly | |
4 | simple extension to the overall query system. We'll start by describing | |
5 | a slightly simplified variant of the real thing – the "basic algorithm" – | |
6 | and then describe some possible improvements. | |
7 | ||
8 | ## The basic algorithm | |
9 | ||
10 | The basic algorithm is | |
11 | called the **red-green** algorithm[^salsa]. The high-level idea is | |
12 | that, after each run of the compiler, we will save the results of all | |
13 | the queries that we do, as well as the **query DAG**. The | |
14 | **query DAG** is a [DAG] that indexes which queries executed which | |
15 | other queries. So, for example, there would be an edge from a query Q1 | |
16 | to another query Q2 if computing Q1 required computing Q2 (note that | |
17 | because queries cannot depend on themselves, this results in a DAG and | |
18 | not a general graph). | |
19 | ||
20 | [DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph | |
21 | ||
22 | On the next run of the compiler, then, we can sometimes reuse these | |
23 | query results to avoid re-executing a query. We do this by assigning | |
24 | every query a **color**: | |
25 | ||
26 | - If a query is colored **red**, that means that its result during | |
27 | this compilation has **changed** from the previous compilation. | |
28 | - If a query is colored **green**, that means that its result is | |
29 | the **same** as the previous compilation. | |
30 | ||
31 | There are two key insights here: | |
32 | ||
33 | - First, if all the inputs to query Q are colored green, then the | |
34 | query Q **must** result in the same value as last time and hence | |
35 | need not be re-executed (or else the compiler is not deterministic). | |
36 | - Second, even if some inputs to a query changes, it may be that it | |
37 | **still** produces the same result as the previous compilation. In | |
38 | particular, the query may only use part of its input. | |
39 | - Therefore, after executing a query, we always check whether it | |
40 | produced the same result as the previous time. **If it did,** we | |
41 | can still mark the query as green, and hence avoid re-executing | |
42 | dependent queries. | |
43 | ||
44 | ### The try-mark-green algorithm | |
45 | ||
46 | At the core of incremental compilation is an algorithm called | |
47 | "try-mark-green". It has the job of determining the color of a given | |
48 | query Q (which must not have yet been executed). In cases where Q has | |
49 | red inputs, determining Q's color may involve re-executing Q so that | |
50 | we can compare its output, but if all of Q's inputs are green, then we | |
51 | can conclude that Q must be green without re-executing it or inspecting | |
52 | its value at all. In the compiler, this allows us to avoid | |
53 | deserializing the result from disk when we don't need it, and in fact | |
54 | enables us to sometimes skip *serializing* the result as well | |
55 | (see the refinements section below). | |
56 | ||
57 | Try-mark-green works as follows: | |
58 | ||
59 | - First check if the query Q was executed during the previous compilation. | |
60 | - If not, we can just re-execute the query as normal, and assign it the | |
61 | color of red. | |
62 | - If yes, then load the 'dependent queries' of Q. | |
63 | - If there is a saved result, then we load the `reads(Q)` vector from the | |
64 | query DAG. The "reads" is the set of queries that Q executed during | |
65 | its execution. | |
66 | - For each query R in `reads(Q)`, we recursively demand the color | |
67 | of R using try-mark-green. | |
68 | - Note: it is important that we visit each node in `reads(Q)` in same order | |
69 | as they occurred in the original compilation. See [the section on the | |
70 | query DAG below](#dag). | |
71 | - If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is | |
72 | dirty. | |
73 | - We re-execute Q and compare the hash of its result to the hash of the | |
74 | result from the previous compilation. | |
75 | - If the hash has not changed, we can mark Q as **green** and return. | |
76 | - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that | |
77 | case, we can color Q as **green** and return. | |
78 | ||
79 | <a name="dag"></a> | |
80 | ||
81 | ### The query DAG | |
82 | ||
83 | The query DAG code is stored in | |
ba9703b0 | 84 | [`src/librustc_middle/dep_graph`][dep_graph]. Construction of the DAG is done |
a1dfa0c6 XL |
85 | by instrumenting the query execution. |
86 | ||
87 | One key point is that the query DAG also tracks ordering; that is, for | |
88 | each query Q, we not only track the queries that Q reads, we track the | |
89 | **order** in which they were read. This allows try-mark-green to walk | |
90 | those queries back in the same order. This is important because once a | |
91 | subquery comes back as red, we can no longer be sure that Q will continue | |
92 | along the same path as before. That is, imagine a query like this: | |
93 | ||
94 | ```rust,ignore | |
95 | fn main_query(tcx) { | |
96 | if tcx.subquery1() { | |
97 | tcx.subquery2() | |
98 | } else { | |
99 | tcx.subquery3() | |
100 | } | |
101 | } | |
102 | ``` | |
103 | ||
104 | Now imagine that in the first compilation, `main_query` starts by | |
105 | executing `subquery1`, and this returns true. In that case, the next | |
106 | query `main_query` executes will be `subquery2`, and `subquery3` will | |
107 | not be executed at all. | |
108 | ||
109 | But now imagine that in the **next** compilation, the input has | |
110 | changed such that `subquery1` returns **false**. In this case, `subquery2` | |
111 | would never execute. If try-mark-green were to visit `reads(main_query)` out | |
112 | of order, however, it might visit `subquery2` before `subquery1`, and hence | |
113 | execute it. | |
114 | This can lead to ICEs and other problems in the compiler. | |
115 | ||
ba9703b0 | 116 | [dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc_middle/dep_graph |
a1dfa0c6 XL |
117 | |
118 | ## Improvements to the basic algorithm | |
119 | ||
120 | In the description of the basic algorithm, we said that at the end of | |
121 | compilation we would save the results of all the queries that were | |
122 | performed. In practice, this can be quite wasteful – many of those | |
123 | results are very cheap to recompute, and serializing and deserializing | |
124 | them is not a particular win. In practice, what we would do is to save | |
125 | **the hashes** of all the subqueries that we performed. Then, in select cases, | |
126 | we **also** save the results. | |
127 | ||
128 | This is why the incremental algorithm separates computing the | |
129 | **color** of a node, which often does not require its value, from | |
130 | computing the **result** of a node. Computing the result is done via a simple | |
131 | algorithm like so: | |
132 | ||
133 | - Check if a saved result for Q is available. If so, compute the color of Q. | |
134 | If Q is green, deserialize and return the saved result. | |
135 | - Otherwise, execute Q. | |
136 | - We can then compare the hash of the result and color Q as green if | |
137 | it did not change. | |
138 | ||
532ac7d7 XL |
139 | ## Resources |
140 | The initial design document can be found at https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md, which expands on the memoization details, provides more high-level overview and motivation for this system. | |
141 | ||
a1dfa0c6 XL |
142 | # Footnotes |
143 | ||
144 | [^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis |