]> git.proxmox.com Git - rustc.git/blame - vendor/salsa/book/src/rfcs/RFC0005-Durability.md
New upstream version 1.48.0+dfsg1
[rustc.git] / vendor / salsa / book / src / rfcs / RFC0005-Durability.md
CommitLineData
f035d41b
XL
1# Summary
2
3- Introduce a user-visibile concept of `Durability`
4- Adjusting the "durability" of an input can allow salsa to skip a lot of validation work
5- Garbage collection -- particularly of interned values -- however becomes more complex
6- Possible future expansion: automatic detection of more "durable" input values
7
8# Motivation
9
10## Making validation faster by optimizing for "durability"
11
12Presently, salsa's validation logic requires traversing all
13dependencies to check that they have not changed. This can sometimes
14be quite costly in practice: rust-analyzer for example sometimes
15spends as much as 90ms revalidating the results from a no-op
16change. One option to improve this is simply optimization --
17[salsa#176] for example reduces validation times significantly, and
18there remains opportunity to do better still. However, even if we are
19able to traverse the dependency graph more efficiently, it will still
20be an O(n) process. It would be nice if we could do better.
21
22[salsa#176]: https://github.com/salsa-rs/salsa/pull/176
23
24One observation is that, in practice, there are often input values
25that are known to change quite infrequently. For example, in
26rust-analyzer, the standard library and crates downloaded from
27crates.io are unlikely to change (though changes are possible; see
28below). Similarly, the `Cargo.toml` file for a project changes
29relatively infrequently compared to the sources. We say then that
30these inputs are more **durable** -- that is, they change less frequently.
31
32This RFC proposes a mechanism to take advantage of durability for
33optimization purposes. Imagine that we have some query Q that depends
34solely on the standard library. The idea is that we can track the last
35revision R when the standard library was changed. Then, when
36traversing dependencies, we can skip traversing the dependencies of Q
37if it was last validated after the revision R. Put another way, we
38only need to traverse the dependencies of Q when the standard library
39changes -- which is unusual. If the standard library *does* change,
40for example by user's tinkering with the internal sources, then yes we
41walk the dependencies of Q to see if it is affected.
42
43# User's guide
44
45## The durability type
46
47We add a new type `salsa::Durability` which has there associated constants:
48
3dfed10e 49```rust,ignore
f035d41b
XL
50#[derive(Copy, Clone, Debug, Ord)]
51pub struct Durability(..);
52
53impl Durability {
54 // Values that change regularly, like the source to the current crate.
55 pub const LOW: Durability;
56
57 // Values that change infrequently, like Cargo.toml.
58 pub const MEDIUM: Durability;
59
60 // Values that are not expected to change, like sources from crates.io or the stdlib.
61 pub const HIGH: Durability;
62}
63```
64
65h## Specifying the durability of an input
66
67When setting an input `foo`, one can now invoke a method
68`set_foo_with_durability`, which takes a `Durability` as the final
69argument:
70
3dfed10e 71```rust,ignore
f035d41b
XL
72// db.set_foo(key, value) is equivalent to:
73db.set_foo_with_durability(key, value, Durability::LOW);
74
75// This would indicate that `foo` is not expected to change:
76db.set_foo_with_durability(key, value, Durability::HIGH);
77```
78
79## Durability of interned values
80
81Interned values are always considered `Durability::HIGH`. This makes
82sense as many queries that only use high durability inputs will also
83make use of interning internally. A consequence of this is that they
84will not be garbage collected unless you use the specific patterns
85recommended below.
86
87## Synthetic writes
88
89Finally, we add one new method, `synthetic_write(durability)`,
90available on the salsa runtime:
91
3dfed10e 92```rust,ignore
f035d41b
XL
93db.salsa_runtime().synthetic_write(Durability::HIGH)
94```
95
96As the name suggests, `synthetic_write` causes salsa to act *as
97though* a write to an input of the given durability had taken
98place. This can be used for benchmarking, but it's also important to
99controlling what values get garbaged collected, as described below.
100
101## Tracing and garbage collection
102
103Durability affects garbage collection. The `SweepStrategy` struct is
104modified as follows:
105
3dfed10e 106```rust,ignore
f035d41b
XL
107/// Sweeps values which may be outdated, but which have not
108/// been verified since the start of the current collection.
109/// These are typically memoized values from previous computations
110/// that are no longer relevant.
111pub fn sweep_outdated(self) -> SweepStrategy;
112
113/// Sweeps values which have not been verified since the start
114/// of the current collection, even if they are known to be
115/// up to date. This can be used to collect "high durability" values
116/// that are not *directly* used by the main query.
117///
118/// So, for example, imagine a main query `result` which relies
119/// on another query `threshold` and (indirectly) on a `threshold_inner`:
120///
121/// ```
122/// result(10) [durability: Low]
123/// |
124/// v
125/// threshold(10) [durability: High]
126/// |
127/// v
128/// threshold_inner(10) [durability: High]
129/// ```
130///
131/// If you modify a low durability input and then access `result`,
132/// then `result(10)` and its *immediate* dependencies will
133/// be considered "verified". However, because `threshold(10)`
134/// has high durability and no high durability input was modified,
135/// we will not verify *its* dependencies, so `threshold_inner` is not
136/// verified (but it is also not outdated).
137///
138/// Collecting unverified things would therefore collect `threshold_inner(10)`.
139/// Collecting only *outdated* things (i.e., with `sweep_outdated`)
140/// would collect nothing -- but this does mean that some high durability
141/// queries that are no longer relevant to your main query may stick around.
142///
143/// To get the most precise garbage collection, do a synthetic write with
144/// high durability -- this will force us to verify *all* values. You can then
145/// sweep unverified values.
146pub fn sweep_unverified(self) -> SweepStrategy;
147```
148
149# Reference guide
150
151## Review: The need for GC to collect outdated values
152
153In general, salsa's lazy validation scheme can lead to the accumulation
154of garbage that is no longer needed. Consider a query like this one:
155
3dfed10e 156```rust,ignore
f035d41b
XL
157fn derived1(db: &impl Database, start: usize) {
158 let middle = self.input(start);
159 self.derived2(middle)
160}
161```
162
163Now imagine that, on some particular run, we compute `derived1(22)`:
164
165- `derived1(22)`
166 - executes `input(22)`, which returns `44`
167 - then executes `derived2(44)`
168
169The end result of this execution will be a dependency graph
170like:
171
3dfed10e 172```notrust
f035d41b
XL
173derived1(22) -> derived2(44)
174 |
175 v
176input(22)
177```
178
179Now. imagine that the user modifies `input(22)` to have the value `45`.
180The next time `derived1(22)` executes, it will load `input(22)` as before,
181but then execute `derived2(45)`. This leaves us with a dependency
182graph as follows:
183
3dfed10e 184```notrust
f035d41b
XL
185derived1(22) -> derived2(45)
186 |
187 v
188input(22) derived2(44)
189```
190
191Notice that we still see `derived2(44)` in the graph. This is because
192we memoized the result in last round and then simply had no use for it
193in this round. The role of GC is to collect "outdated" values like
194this one.
195
196###Review: Tracing and GC before durability
197
198In the absence of durability, when you execute a query Q in some new
199revision where Q has not previously executed, salsa must trace back
200through all the queries that Q depends on to ensure that they are
201still up to date. As each of Q's dependencies is validated, we mark it
202to indicate that it has been checked in the current revision (and
203thus, within a particular revision, we would never validate or trace a
204particular query twice).
205
206So, to continue our example, when we first executed `derived1(22)`
207in revision R1, we might have had a graph like:
208
209
3dfed10e 210```notrust
f035d41b
XL
211derived1(22) -> derived2(44)
212[verified: R1] [verified: R1]
213 |
214 v
215input(22)
216```
217
218Now, after we modify `input(22)` and execute `derived1(22)` again, we
219would have a graph like:
220
3dfed10e 221```notrust
f035d41b
XL
222derived1(22) -> derived2(45)
223[verified: R2] [verified: R2]
224 |
225 v
226input(22) derived2(44)
227 [verified: R1]
228```
229
230Note that `derived2(44)`, the outdated value, never had its "verified"
231revision updated, because we never accessed it.
232
233Salsa leverages this validation stamp to serve as the "marking" phase
234of a simple mark-sweep garbage collector. The idea is that the sweep
235method can collect any values that are "outdated" (whose "verified"
236revision is less than the current revision).
237
238The intended model is that one can do a "mark-sweep" style garbage
239collection like so:
240
3dfed10e 241```rust,ignore
f035d41b
XL
242// Modify some input, triggering a new revision.
243db.set_input(22, 45);
244
245// The **mark** phase: execute the "main query", with the intention
246// that we wish to retain all the memoized values needed to compute
247// this main query, but discard anything else. For example, in an IDE
248// context, this might be a "compute all errors" query.
249db.derived1(22);
250
251// The **sweep** phase: discard anything that was not traced during
252// the mark phase.
253db.sweep_all(...);
254```
255
256In the case of our example, when we execute `sweep_all`, it would
257collect `derived2(44)`.
258
259## Challenge: Durability lets us avoid tracing
260
261This tracing model is affected by the move to durability. Now, if some
262derived value has a high durability, we may skip tracing its
263descendants altogether. This means that they would never be "verified"
264-- that is, their "verified date" would never be updated.
265
266This is why we modify the definition of "outdated" as follows:
267
268- For a query value `Q` with durability `D`, let `R_lc` be the revision when
269 values of durability `D` last changed. Let `R_v` be the revision when
270 `Q` was last verified.
271- `Q` is outdated if `R_v < R_lc`.
272 - In other words, if `Q` may have changed since it was last verified.
273
274## Collecting interned and untracked values
275
276Most values can be collected whenever we like without influencing
277correctness. However, interned values and those with untracked
278dependencies are an exception -- **they can only be collected when
279outdated**. This is because their values may not be reproducible --
280in other words, re-executing an interning query (or one with untracked
281dependencies, which can read arbitrary program state) twice in a row
282may produce a different value. In the case of an interning query, for
283example, we may wind up using a different integer than we did before.
284If the query is outdated, this is not a problem: anything that
285dependend on its result must also be outdated, and hence would be
286re-executed and would observe the new value. But if the query is *not*
287outdated, then we could get inconsistent result.s
288
289# Alternatives and future work
290
291## Rejected: Arbitrary durabilities
292
293We considered permitting arbitrary "levels" of durability -- for
294example, allowing the user to specify a number -- rather than offering
295just three. Ultimately it seemed like that level of control wasn't
296really necessary and that having just three levels would be sufficient
297and simpler.
298
299## Rejected: Durability lattices
300
301We also considered permitting a "lattice" of durabilities -- e.g., to
302mirror the crate DAG in rust-analyzer -- but this is tricky because
303the lattice itself would be dependent on other inputs.
304