]>
Commit | Line | Data |
---|---|---|
a1dfa0c6 XL |
1 | # Incremental compilation |
2 | ||
6a06907d XL |
3 | <!-- toc --> |
4 | ||
a1dfa0c6 XL |
5 | The incremental compilation scheme is, in essence, a surprisingly |
6 | simple extension to the overall query system. We'll start by describing | |
7 | a slightly simplified variant of the real thing – the "basic algorithm" – | |
8 | and then describe some possible improvements. | |
9 | ||
10 | ## The basic algorithm | |
11 | ||
12 | The basic algorithm is | |
13 | called the **red-green** algorithm[^salsa]. The high-level idea is | |
14 | that, after each run of the compiler, we will save the results of all | |
15 | the queries that we do, as well as the **query DAG**. The | |
16 | **query DAG** is a [DAG] that indexes which queries executed which | |
6a06907d | 17 | other queries. So, for example, there would be an [edge] from a query Q1 |
a1dfa0c6 XL |
18 | to another query Q2 if computing Q1 required computing Q2 (note that |
19 | because queries cannot depend on themselves, this results in a DAG and | |
20 | not a general graph). | |
21 | ||
22 | [DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph | |
23 | ||
24 | On the next run of the compiler, then, we can sometimes reuse these | |
25 | query results to avoid re-executing a query. We do this by assigning | |
26 | every query a **color**: | |
27 | ||
28 | - If a query is colored **red**, that means that its result during | |
29 | this compilation has **changed** from the previous compilation. | |
30 | - If a query is colored **green**, that means that its result is | |
31 | the **same** as the previous compilation. | |
32 | ||
33 | There are two key insights here: | |
34 | ||
35 | - First, if all the inputs to query Q are colored green, then the | |
36 | query Q **must** result in the same value as last time and hence | |
37 | need not be re-executed (or else the compiler is not deterministic). | |
38 | - Second, even if some inputs to a query changes, it may be that it | |
39 | **still** produces the same result as the previous compilation. In | |
40 | particular, the query may only use part of its input. | |
41 | - Therefore, after executing a query, we always check whether it | |
42 | produced the same result as the previous time. **If it did,** we | |
43 | can still mark the query as green, and hence avoid re-executing | |
44 | dependent queries. | |
45 | ||
46 | ### The try-mark-green algorithm | |
47 | ||
48 | At the core of incremental compilation is an algorithm called | |
49 | "try-mark-green". It has the job of determining the color of a given | |
50 | query Q (which must not have yet been executed). In cases where Q has | |
51 | red inputs, determining Q's color may involve re-executing Q so that | |
52 | we can compare its output, but if all of Q's inputs are green, then we | |
53 | can conclude that Q must be green without re-executing it or inspecting | |
54 | its value at all. In the compiler, this allows us to avoid | |
55 | deserializing the result from disk when we don't need it, and in fact | |
56 | enables us to sometimes skip *serializing* the result as well | |
57 | (see the refinements section below). | |
58 | ||
59 | Try-mark-green works as follows: | |
60 | ||
61 | - First check if the query Q was executed during the previous compilation. | |
62 | - If not, we can just re-execute the query as normal, and assign it the | |
63 | color of red. | |
64 | - If yes, then load the 'dependent queries' of Q. | |
65 | - If there is a saved result, then we load the `reads(Q)` vector from the | |
66 | query DAG. The "reads" is the set of queries that Q executed during | |
67 | its execution. | |
68 | - For each query R in `reads(Q)`, we recursively demand the color | |
69 | of R using try-mark-green. | |
70 | - Note: it is important that we visit each node in `reads(Q)` in same order | |
71 | as they occurred in the original compilation. See [the section on the | |
72 | query DAG below](#dag). | |
73 | - If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is | |
74 | dirty. | |
75 | - We re-execute Q and compare the hash of its result to the hash of the | |
76 | result from the previous compilation. | |
77 | - If the hash has not changed, we can mark Q as **green** and return. | |
78 | - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that | |
79 | case, we can color Q as **green** and return. | |
80 | ||
81 | <a name="dag"></a> | |
82 | ||
83 | ### The query DAG | |
84 | ||
85 | The query DAG code is stored in | |
6a06907d | 86 | [`compiler/rustc_middle/src/dep_graph`][dep_graph]. Construction of the DAG is done |
a1dfa0c6 XL |
87 | by instrumenting the query execution. |
88 | ||
89 | One key point is that the query DAG also tracks ordering; that is, for | |
90 | each query Q, we not only track the queries that Q reads, we track the | |
91 | **order** in which they were read. This allows try-mark-green to walk | |
92 | those queries back in the same order. This is important because once a | |
93 | subquery comes back as red, we can no longer be sure that Q will continue | |
94 | along the same path as before. That is, imagine a query like this: | |
95 | ||
96 | ```rust,ignore | |
97 | fn main_query(tcx) { | |
98 | if tcx.subquery1() { | |
99 | tcx.subquery2() | |
100 | } else { | |
101 | tcx.subquery3() | |
102 | } | |
103 | } | |
104 | ``` | |
105 | ||
106 | Now imagine that in the first compilation, `main_query` starts by | |
107 | executing `subquery1`, and this returns true. In that case, the next | |
108 | query `main_query` executes will be `subquery2`, and `subquery3` will | |
109 | not be executed at all. | |
110 | ||
111 | But now imagine that in the **next** compilation, the input has | |
112 | changed such that `subquery1` returns **false**. In this case, `subquery2` | |
113 | would never execute. If try-mark-green were to visit `reads(main_query)` out | |
114 | of order, however, it might visit `subquery2` before `subquery1`, and hence | |
115 | execute it. | |
116 | This can lead to ICEs and other problems in the compiler. | |
117 | ||
6a06907d | 118 | [dep_graph]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/dep_graph/index.html |
a1dfa0c6 XL |
119 | |
120 | ## Improvements to the basic algorithm | |
121 | ||
122 | In the description of the basic algorithm, we said that at the end of | |
123 | compilation we would save the results of all the queries that were | |
124 | performed. In practice, this can be quite wasteful – many of those | |
125 | results are very cheap to recompute, and serializing and deserializing | |
126 | them is not a particular win. In practice, what we would do is to save | |
127 | **the hashes** of all the subqueries that we performed. Then, in select cases, | |
128 | we **also** save the results. | |
129 | ||
130 | This is why the incremental algorithm separates computing the | |
131 | **color** of a node, which often does not require its value, from | |
132 | computing the **result** of a node. Computing the result is done via a simple | |
133 | algorithm like so: | |
134 | ||
135 | - Check if a saved result for Q is available. If so, compute the color of Q. | |
136 | If Q is green, deserialize and return the saved result. | |
137 | - Otherwise, execute Q. | |
138 | - We can then compare the hash of the result and color Q as green if | |
139 | it did not change. | |
140 | ||
532ac7d7 | 141 | ## Resources |
6a06907d XL |
142 | The initial design document can be found [here][initial-design], which expands |
143 | on the memoization details, provides more high-level overview and motivation | |
144 | for this system. | |
532ac7d7 | 145 | |
a1dfa0c6 XL |
146 | # Footnotes |
147 | ||
148 | [^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis | |
6a06907d XL |
149 | |
150 | [edge]: https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms#edge | |
151 | [initial-design]: https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md |