]> git.proxmox.com Git - rustc.git/blame - vendor/salsa/book/src/rfcs/RFC0006-Dynamic-Databases.md
New upstream version 1.48.0+dfsg1
[rustc.git] / vendor / salsa / book / src / rfcs / RFC0006-Dynamic-Databases.md
CommitLineData
3dfed10e
XL
1# Dynamic databases
2
3## Metadata
4
5* Author: nikomatsakis
6* Date: 2020-06-29
7* Introduced in: [salsa-rs/salsa#1](https://github.com/salsa-rs/salsa/pull/1) (please update once you open your PR)
8
9## Summary
10
11* Retool Salsa's setup so that the generated code for a query group is not
12 dependent on the final database type, and interacts with the database only
13 through `dyn` trait values.
14* This imposes a certain amount of indirecton but has the benefit that when a
15 query group definition changes, less code must be recompiled as a result.
16* Key changes include:
17 * Database keys are "interned" in the database to produce a
18 `DatabaseKeyIndex`.
19 * The values for cached query are stored directly in the hashtable instead of
20 in an `Arc`. There is still an Arc per cached query, but it stores the
21 dependency information.
22 * The various traits are changed to make `salsa::Database` dyn-safe. Invoking
23 methods on the runtime must now go through a `salsa::Runtime` trait.
24 * The `salsa::requires` functionality is removed.
25* Upsides of the proposal:
26 * Potentially improved recompilation time. Minimal code is regenerated.
27 * Removing the `DatabaseData` unsafe code hack that was required by slots.
28* Downsides of the proposal:
29 * The effect on runtime performance must be measured.
30 * `DatabaseKeyIndex` values will leak, as we propose no means to reclaim them.
31 However, the same is true of `Slot` values today.
32 * Storing values for the tables directly in the hashtable makes it less
33 obvious how we would return references to them in a safe fashion (before, I
34 had planned to have a separate module that held onto the Arc for the slot,
35 so we were sure the value would not be deallocated; one can still imagine
36 supporting this feature, but it would require some fancier unsafe code
37 reasoning, although it would be more efficient.)
38 * The `salsa::requires` functionality is removed.
39
40## Motivation
41
42Under the current salsa setup, all of the "glue code" that manages cache
43invalidation and other logic is ultimately parameterized by a type `DB` that
44refers to the full database. The problem is that, if you consider a typical
45salsa crate graph, the actual value for that type is not available until the
46final database crate is compiled:
47
48```mermaid
49graph TD;
50 Database["Crate that defines the database"];
51 QueryGroupA["Crate with query group A"];
52 QueryGroupB["Crate with query group B"];
53 SalsaCrate["the `salsa` crate"];
54 Database -- depends on --> QueryGroupA;
55 Database -- depends on --> QueryGroupB;
56 QueryGroupA -- depends on --> SalsaCrate;
57 QueryGroupB -- depends on --> SalsaCrate;
58```
59
60The result is that we do not actually compile a good part of the code from
61`QueryGroupA` or `QueryGroupB` until we build the final database crate.
62
63### What you can do today: dyn traits
64
65What you can do today is to use define a "dyn-compatible" query group
66trait and then write your derived functions using a `dyn` type as the
67argument:
68
69```rust,ignore
70#[salsa::query_group(QueryGroupAStorage)]
71trait QueryGroupA {
72 fn derived(&self, key: usize) -> usize;
73}
74
75fn derived(db: &dyn QueryGroupA, key: usize) -> usize {
76 key * 2
77}
78```
79
80This has the benefit that the `derived` function is not generic. However, it's
81still true that the glue code salsa makes will be generic over a `DB` type --
82this includes the impl of `QueryGroupA` but also the `Slot` and other machinery.
83This means that even if the only change is to query group B, in a different
84crate, the glue code for query group A ultimately has to be recompiled whenever
85the `Database` crate is rebuilt (though incremental compilation may help here).
86Moreover, as reported in [salsa-rs/salsa#220], measurements of rust-analyzer
87suggest that this code may be duplicated and accounting for more of the binary
88than we would expect.
89
90FIXME: I'd like to have better measurements on the above!
91
92[salsa-rs/salsa#220]: https://github.com/salsa-rs/salsa/issues/220
93
94### Our goal
95
96The primary goal of this RFC is to make it so that the glue code we generate for
97query groups is not dependent on the database type, thus enabling better
98incremental rebuilds.
99
100## User's guide
101
102Most of the changes in this RFC are "under the hood". But there are various user
103visibile changes proposed here.
104
105### All query groups must be dyn safe
106
107The largest one is that **all Salsa query groups must now be dyn-safe**. The
108existing salsa query methods are all dyn-safe, so what this really implies is
109that one cannot have super-traits that use generic methods or other things that
110are not dyn safe. For example, this query group would be illegal:
111
112```rust,ignore
113#[salsa::query_group(QueryGroupAStorage)]
114trait QueryGroupA: Foo {
115}
116
117trait Foo {
118 fn method<T>(t: T) { }
119}
120```
121
122We could support query groups that are not dyn safe, but it would require us to
123have two "similar but different" ways of generating plumbing, and I'm not
124convinced that it's worth it. Moreover, it would require some form of opt-in so
125that would be a measure of user complexity as well.
126
127### All query functions must take a dyn database
128
129You used to be able to implement queries by using `impl MyDatabase`, like so:
130
131```rust,ignore
132fn my_query(db: &impl MyDatabase, ...) { .. }
133```
134
135but you must now use `dyn MyDatabase`:
136
137```rust,ignore
138fn my_query(db: &dyn MyDatabase, ...) { .. }
139```
140
141### Databases embed a `Storage<DB>` with a fixed field name
142
143The "Hello World" database becomes the following:
144
145```rust,ignore
146#[salsa::database(QueryGroup1, ..., QueryGroupN)]
147struct MyDatabase {
148 storage: salsa::Storage<Self>
149}
150
151impl salsa::Database for MyDatabase {}
152```
153
154In particular:
155
156* You now embed a `salsa::Storage<Self>` instead of a `salsa::Runtime<Self>`
157* The field **must** be named `storage` by default; we can include a `#[salsa::storge_field(xxx)]` annotation to change that default if desired.
158 * Or we could scrape the struct declaration and infer it, I suppose.
159* You no longer have to define the `salsa_runtime` and `salsa_runtime_mut` methods, they move to the `DatabaseOps` trait and are manually implemented by doing `self.storage.runtime()` and so forth.
160
161Why these changes, and what is this `Storage` struct? This is because the actual
162storage for queries is moving outside of the runtime. The Storage struct just
163combines the `Runtime` (whose type no longer references `DB` directly) with an
164`Arc<DB::Storage>`. The full type of `Storage`, since it includes the database
165type, cannot appear in any public interface, it is just used by the various
166implementations that are created by `salsa::database`.
167
168### Instead of `db.query(Q)`, you write `Q.in_db(&db)`
169
170As a consequence of the previous point, the existing `query` and `query_mut`
171methods on the `salsa::Database` trait are changed to methods on the query types
172themselves. So instead of `db.query(SomeQuery)`, one would write
173`SomeQuery.in_db(&db)` (or `in_db_mut`). This both helps by making the
174`salsa::Database` trait dyn-safe and also works better with the new use of `dyn`
175types, since it permits a coercion from `&db` to the appropriate `dyn` database
176type at the point of call.
177
178### The salsa-event mechanism will move to dynamic dispatch
179
180A further consequence is that the existing `salsa_event` method will be
181simplified and made suitable for dynamic dispatch. It used to take a closure
182that would produce the event if necessary; it now simply takes the event itself.
183This is partly because events themselves no longer contain complex information:
184they used to have database-keys, which could require expensive cloning, but they
185now have simple indices.
186
187```rust,ignore
188fn salsa_event(&self, event: Event) {
189 #![allow(unused_variables)]
190}
191```
192
193This may imply some runtime cost, since various parts of the machinery invoke
194`salsa_event`, and those calls will now be virtual calls. They would previously
195have been static calls that would likely have been optimized away entirely.
196
197It is however possible that ThinLTO or other such optimization could remove
198those calls, this has not been tested, and in any case the runtime effects are
199not expected to be high, since all the calls will always go to the same
200function.
201
202### The `salsa::requires` function is removed
203
204We currently offer a feature for "private" dependencies between query groups
205called `#[salsa::requires(ExtraDatabase)]`. This then requires query
206functions to be written like:
207
208```rust,ignore
209fn query_fn(db: &impl Database + ExtraDatabase, ...) { }
210```
211
212This format is not compatible with `dyn`, so this feature is removed.
213
214## Reference guide
215
216### Example
217
218To explain the proposal, we'll use the Hello World example, lightly adapted:
219
220```rust,ignore
221#[salsa::query_group(HelloWorldStorage)]
222trait HelloWorld: salsa::Database {
223 #[salsa::input]
224 fn input_string(&self, key: ()) -> Arc<String>;
225
226 fn length(&self, key: ()) -> usize;
227}
228
229fn length(db: &dyn HelloWorld, (): ()) -> usize {
230 // Read the input string:
231 let input_string = db.input_string(());
232
233 // Return its length:
234 input_string.len()
235}
236
237#[salsa::database(HelloWorldStorage)]
238struct DatabaseStruct {
239 runtime: salsa::Runtime<DatabaseStruct>,
240}
241
242impl salsa::Database for DatabaseStruct {
243 fn salsa_runtime(&self) -> &salsa::Runtime<Self> {
244 &self.runtime
245 }
246
247 fn salsa_runtime_mut(&mut self) -> &mut salsa::Runtime<Self> {
248 &mut self.runtime
249 }
250}
251```
252
253### Identifying queries using the `DatabaseKeyIndex`
254
255We introduce the following struct that represents a database key using a series
256of indices:
257
258```rust,ignore
259struct DatabaseKeyIndex {
260 /// Identifies the query group.
261 group_index: u16,
262
263 /// Identifies the query within the group.
264 query_index: u16,
265
266 /// Identifies the key within the query.
267 key_index: u32,
268}
269```
270
271This struct allows the various query group structs to refer to database keys
272without having to use a type like `DB::DatabaseKey` that is dependent on the
273`DB`.
274
275The group/query indices will be assigned by the `salsa::database` and
276`salsa::query_group` macros respectively. When query group storage is created,
277it will be passed in its group index by the database. Each query will be able to
278access its query-index through the `Query` trait, as they are statically known
279at the time that the query is compiled (the group index, in contrast, depends on
280the full set of groups for the database).
281
282The key index can be assigned by the query as it executes without any central
283coordination. Each query will use a `IndexMap` (from the `indexmap` crate)
284mapping `Q::Key -> QueryState`. Inserting new keys into this map also creates
285new indices, and it is possible to index into the map in O(1) time later to
286obtain the state (or key) from a given query. This map replaces the existing
287`Q::Key -> Arc<Slot<..>>` map that is used today.
288
289One notable implication: we cannot remove entries from the query index map
290(e.g., for GC) because that would invalidate the existing indices. We can
291however replace the query-state with a "not computed" value. This is not new:
292slots already take this approach today. In principle, we could extend the
293tracing GC to permit compressing and perhaps even rewriting indices, but it's
294not clear that this is a problem in practice.
295
296The `DatabaseKeyIndex` also supports a `debug` method that returns a value with
297a human readable `debug!` output, so that you can do `debug!("{:?}",
298index.debug(db))`. This works by generating a `fmt_debug` method that is
299supported by the various query groups.
300
301### The various query traits are not generic over a database
302
303Today, the `Query`, `QueryFunction`, and `QueryGroup` traits are generic over
304the database `DB`, which allows them to name the final database type and
305associated types derived from it. In the new scheme, we never want to do that,
306and so instead they will now have an associated type, `DynDb`, that maps to the
307`dyn` version of the query group trait that the query is associated with.
308
309Therefore `QueryFunction` for example can become:
310
311```rust,ignore
312pub trait QueryFunction: Query {
313 fn execute(db: &Self::DynDb, key: Self::Key) -> Self::Value;
314 fn recover(db: &Self::DynDb, cycle: &[DB::DatabaseKey], key: &Self::Key) -> Option<Self::Value> {
315 let _ = (db, cycle, key);
316 None
317 }
318}
319```
320
321### Storing query results and tracking dependencies
322
323In today's setup, we have all the data for a particular query stored in a
324`Slot<Q, DB, MP>`, and these slots hold references to one another to track
325dependencies. Because the type of each slot is specific to the particular query
326`Q`, the references between slots are done using a `Arc<dyn DatabaseSlot<DB>>`
327handle. This requires some unsafe hacks, including the `DatabaseData` associated
328type.
329
330This RFC proposes to alter this setup. Dependencies will store a `DatabaseIndex`
331instead. This means that validating dependencies is less efficient, as we no
332longer have a direct pointer to the dependency information but instead must
333execute three index lookups (one to find the query group, one to locate the
334query, and then one to locate the key). Similarly the LRU list can be reverted
335to a `LinkedHashMap` of indices.
336
337We may tinker with other approaches too: the key change in the RFC is that we
338do not need to store a `DB::DatabaseKey` or `Slot<..DB..>`, but instead can use
339some type for dependencies that is independent of the dtabase type `DB`.
340
341### Dispatching methods from a `DatabaseKeyIndex`
342
343There are a number of methods that can be dispatched through the database
344interface on a `DatabaseKeyIndex`. For example, we already mentioned
345`fmt_debug`, which emits a debug representation of the key, but there is also
346`maybe_changed_since`, which checks whether the value for a given key may have
347changed since the given revision. Each of these methods is a member of the
348`DatabaseOps` trait and they are dispatched as follows.
349
350First, the `#[salsa::database]` procedural macro is the one which
351generates the `DatabaseOps` impl for the database. This base method
352simply matches on the group index to determine which query group
353contains the key, and then dispatches to an inherent
354method defined on the appropriate query group struct:
355
356```rust,ignore
357impl salsa::plumbing::DatabaseOps for DatabaseStruct {
358 // We'll use the `fmt_debug` method as an example
359 fn fmt_debug(&self, index: DatabaseKeyIndex, fmt: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
360 match index.group_index() {
361 0 => {
362 let storage = <Self as HasQueryGroup<HelloWorld>>::group_storage(self);
363 storage.fmt_debug(index, fmt)
364 }
365
366 _ => panic!("Invalid index")
367 }
368 }
369}
370```
371
372The query group struct has a very similar inherent method that dispatches based
373on the query index and invokes a method on the query storage:
374
375```rust,ignore
376impl HelloWorldGroupStorage__ {
377 // We'll use the `fmt_debug` method as an example
378 fn fmt_debug(&self, index: DatabaseKeyIndex, fmt: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
379 match index.query_index() {
380 0 => self.appropriate_query_field.fmt_debug(index, fmt),
381 1 => ...
382 _ => panic!("Invalid index")
383 }
384 }
385}
386```
387
388Finally, the query storage can use the key index to lookup the appropriate
389data from the `FxIndexSet`.
390
391### Wrap runtime in a `Storage<DB>` type
392
393The Salsa runtime is currently `Runtime<DB>` but it will change to just
394`Runtime` and thus not be generic over the database. This means it can be
395referenced directly by query storage implementations. This is very useful
396because it allows that type to have a number of `pub(crate)` details that query
397storage implementations make use of but which are not exposed as part of our
398public API.
399
400However, the `Runtime` crate used to contain a `DB::Storage`, and without the
401`DB` in its type, it no longer can. Therefore, we will introduce a new type
402`Storage<DB>` type which is defined like so:
403
404```rust,ignore
405pub struct Storage<DB: DatabaseImpl> {
406 query_store: Arc<DB::DatabaseStorage>,
407 runtime: Runtime,
408}
409
410impl<DB> Storage<DB> {
411 pub fn query_store(&self) -> &DB::DatabaseStorage {
412 &self.query_store
413 }
414
415 pub fn salsa_runtime(&self) -> &Runtime {
416 &self.runtime
417 }
418
419 pub fn salsa_runtime_mut(&mut self) -> &mut Runtime {
420 &self.runtime
421 }
422
423 /// Used for parallel queries
424 pub fn snapshot(&self) -> Self {
425 Storage {
426 query_store: query_store.clone(),
427 runtime: runtime.snapshot(),
428 }
429 }
430}
431```
432
433The user is expected to include a field `storage: Storage<DB>` in their database
434definition. The `salsa::database` procedural macro, when it generates impls of
435traits like `HasQueryGroup`, will embed code like `self.storage` that looks for
436that field.
437
438### `salsa_runtime` methods move to `DatabaseOps` trait
439
440The `salsa_runtime` methods used to be manually implemented by users to define
441the field that contains the salsa runtime. This was always boilerplate. The
442`salsa::database` macro now handles that job by defining them to invoke the
443corresponding methods on `Storage`.
444
445### Salsa database trait becomes dyn safe
446
447Under this proposal, the Salsa database must be dyn safe. This implies that
448we have to make a few changes:
449
450* The `query` and `query_mut` methods move to an extension trait.
451* The `DatabaseStorageTypes` supertrait is removed (that trait is renamed and altered, see next section).
452* The `salsa_event` method changes, as described in the User's guide.
453
454### Salsa database trait requires `'static`, at least for now
455
456One downside of this proposal is that the `salsa::Database` trait now has a
457`'static` bound. This is a result of the lack of GATs -- in particular, the
458queries expect a `Q::DynDb` as argument. In the query definition, we have
459something like `type DynDb = dyn QueryGroupDatabase`, which in turn defaults to
460`dyn::QueryGroupDatabase + 'static`.
461
462At the moment, this limitation is harmless, since salsa databases don't support
463generic parameters. But it would be good to lift in the future, especially as we
464would like to support arena allocation and other such patterns. The limitation
465could be overcome in the future by:
466
467* converting to a GAT like `DynDb<'a>`, if those were available;
468* or by simulating GATs by introducing a trait to carry the `DynDb` definition,
469 like `QueryDb<'a>`, where `Query` has the supertrait `for<'a> Self:
470 QueryDb<'a>`. This would permit the `DynDb` type to be referenced by writing
471 `<Q as QueryDb<'a>>::DynDb`.
472
473### Salsa query group traits are extended with `Database` and `HasQueryGroup` supertrait
474
475When `#[salsa::query_group]` is applied to a trait, we currently generate a copy
476of the trait that is "more or less" unmodified (although we sometimes add
477additional synthesized methods, such as the `set` method for an input). Under
478this proposal, we will also introduce a `HasQueryGroup<QueryGroupStorage>`
479supertrait. Therefore the following input:
480
481```rust,ignore
482#[salsa::query_group(HelloWorldStorage)]
483trait HelloWorld { .. }
484```
485
486will generate a trait like:
487
488```rust,ignore
489trait HelloWorld:
490 salsa::Database +
491 salsa::plumbing::HasQueryGroup<HelloWorldStorage>
492{
493 ..
494}
495```
496
497The `Database` trait is the standard `salsa::Database` trait and contains
498various helper methods. The `HasQueryGroup` trait is implemented by the database
499and defines various plumbing methods that are used by the storage
500implementations.
501
502One downside of this is that `salsa::Database` methods become available on the
503trait; we might want to give internal plumbing methods more obscure names.
504
505#### Bounds were already present on the blanket impl of salsa query group trait
506
507The new bounds that are appearing on the trait were always present on the
508blanket impl that the `salsa::query_group` procedural macro generated, which
509looks like so (and continues unchanged under this RFC):
510
511```rust,ignore
512impl<DB> HelloWorld for DB
513where
514 DB: salsa::Database +
515 DB: salsa::plumbing::HasQueryGroup<HelloWorldStorage>
516{
517 ...
518}
519```
520
521The reason we generate the impl is so that the `salsa::database` procedural
522macro can simply create the `HasQueryGroup` impl and never needs to know the
523name of the `HelloWorld` trait, only the `HelloWorldStorage` type.
524
525### Storage types no longer parameterized by the database
526
527Today's storage types, such as `Derived`, are parameterized over both a query `Q` and a `DB` (along with the memoization policy `MP`):
528
529```rust,ignore
530// Before this RFC:
531pub struct DerivedStorage<DB, Q, MP>
532where
533 Q: QueryFunction<DB>,
534 DB: Database + HasQueryGroup<Q::Group>,
535 MP: MemoizationPolicy<DB, Q>,
536```
537
538The `DB` parameter should no longer be needed after the previously described
539changes are made, so that the signature looks like:
540
541```rust,ignore
542// Before this RFC:
543pub struct DerivedStorage<Q, MP>
544where
545 Q: QueryFunction,
546 MP: MemoizationPolicy<DB, Q>,
547```
548
549## Alternatives and future work
550
551The 'linch-pin' of this design is the `DatabaseKeyIndex` type, which allows for
552signatures to refer to "any query in the system" without reference to the `DB`
553type. The biggest downside of the system is that this type is an integer which
554then requires a tracing GC to recover index values. The primary alternative
555would be to use an `Arc`-like scheme,but this has some severe downsides:
556
557* Requires reference counting, allocation
558* Hashing and equality comparisons have more data to process versus an integer
559* Equality comparisons must still be deep since you may have older and newer keys co-existing
560* Requires a `Arc<dyn DatabaseKey>`-like setup, which then encounters the
561 problems that this type is not `Send` or `Sync`, leading to hacks like the
562 `DB::DatabaseData` we use today.