src/doc/rustc-guide/src/query.md

   1 # Queries: demand-driven compilation
   2
   3 As described in [the high-level overview of the compiler][hl], the
   4 Rust compiler is current transitioning from a traditional "pass-based"
   5 setup to a "demand-driven" system. **The Compiler Query System is the
   6 key to our new demand-driven organization.** The idea is pretty
   7 simple. You have various queries that compute things about the input
   8 – for example, there is a query called `type_of(def_id)` that, given
   9 the def-id of some item, will compute the type of that item and return
  10 it to you.
  11
  12 [hl]: high-level-overview.html
  13
  14 Query execution is **memoized** – so the first time you invoke a
  15 query, it will go do the computation, but the next time, the result is
  16 returned from a hashtable. Moreover, query execution fits nicely into
  17 **incremental computation**; the idea is roughly that, when you do a
  18 query, the result **may** be returned to you by loading stored data
  19 from disk (but that's a separate topic we won't discuss further here).
  20
  21 The overall vision is that, eventually, the entire compiler
  22 control-flow will be query driven. There will effectively be one
  23 top-level query ("compile") that will run compilation on a crate; this
  24 will in turn demand information about that crate, starting from the
  25 *end*.  For example:
  26
  27 - This "compile" query might demand to get a list of codegen-units
  28   (i.e. modules that need to be compiled by LLVM).
  29 - But computing the list of codegen-units would invoke some subquery
  30   that returns the list of all modules defined in the Rust source.
  31 - That query in turn would invoke something asking for the HIR.
  32 - This keeps going further and further back until we wind up doing the
  33   actual parsing.
  34
  35 However, that vision is not fully realized. Still, big chunks of the
  36 compiler (for example, generating MIR) work exactly like this.
  37
  38 ### Incremental Compilation in Detail
  39
  40 The [Incremental Compilation in Detail][query-model] chapter gives a more
  41 in-depth description of what queries are and how they work.
  42 If you intend to write a query of your own, this is a good read.
  43
  44 ### Invoking queries
  45
  46 To invoke a query is simple. The tcx ("type context") offers a method
  47 for each defined query. So, for example, to invoke the `type_of`
  48 query, you would just do this:
  49
  50 ```rust,ignore
  51 let ty = tcx.type_of(some_def_id);
  52 ```
  53
  54 ### How the compiler executes a query
  55
  56 So you may be wondering what happens when you invoke a query
  57 method. The answer is that, for each query, the compiler maintains a
  58 cache – if your query has already been executed, then, the answer is
  59 simple: we clone the return value out of the cache and return it
  60 (therefore, you should try to ensure that the return types of queries
  61 are cheaply cloneable; insert a `Rc` if necessary).
  62
  63 #### Providers
  64
  65 If, however, the query is *not* in the cache, then the compiler will
  66 try to find a suitable **provider**. A provider is a function that has
  67 been defined and linked into the compiler somewhere that contains the
  68 code to compute the result of the query.
  69
  70 **Providers are defined per-crate.** The compiler maintains,
  71 internally, a table of providers for every crate, at least
  72 conceptually. Right now, there are really two sets: the providers for
  73 queries about the **local crate** (that is, the one being compiled)
  74 and providers for queries about **external crates** (that is,
  75 dependencies of the local crate). Note that what determines the crate
  76 that a query is targeting is not the *kind* of query, but the *key*.
  77 For example, when you invoke `tcx.type_of(def_id)`, that could be a
  78 local query or an external query, depending on what crate the `def_id`
  79 is referring to (see the `self::keys::Key` trait for more information
  80 on how that works).
  81
  82 Providers always have the same signature:
  83
  84 ```rust,ignore
  85 fn provider<'tcx>(
  86     tcx: TyCtxt<'tcx>,
  87     key: QUERY_KEY,
  88 ) -> QUERY_RESULT {
  89     ...
  90 }
  91 ```
  92
  93 Providers take two arguments: the `tcx` and the query key.
  94 They return the result of the query.
  95
  96 ####  How providers are setup
  97
  98 When the tcx is created, it is given the providers by its creator using
  99 the `Providers` struct. This struct is generated by the macros here, but it
 100 is basically a big list of function pointers:
 101
 102 ```rust,ignore
 103 struct Providers {
 104     type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>,
 105     ...
 106 }
 107 ```
 108
 109 At present, we have one copy of the struct for local crates, and one
 110 for external crates, though the plan is that we may eventually have
 111 one per crate.
 112
 113 These `Provider` structs are ultimately created and populated by
 114 `librustc_driver`, but it does this by distributing the work
 115 throughout the other `rustc_*` crates. This is done by invoking
 116 various `provide` functions. These functions tend to look something
 117 like this:
 118
 119 ```rust,ignore
 120 pub fn provide(providers: &mut Providers) {
 121     *providers = Providers {
 122         type_of,
 123         ..*providers
 124     };
 125 }
 126 ```
 127
 128 That is, they take an `&mut Providers` and mutate it in place. Usually
 129 we use the formulation above just because it looks nice, but you could
 130 as well do `providers.type_of = type_of`, which would be equivalent.
 131 (Here, `type_of` would be a top-level function, defined as we saw
 132 before.) So, if we want to add a provider for some other query,
 133 let's call it `fubar`, into the crate above, we might modify the `provide()`
 134 function like so:
 135
 136 ```rust,ignore
 137 pub fn provide(providers: &mut Providers) {
 138     *providers = Providers {
 139         type_of,
 140         fubar,
 141         ..*providers
 142     };
 143 }
 144
 145 fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: DefId) -> Fubar<'tcx> { ... }
 146 ```
 147
 148 N.B. Most of the `rustc_*` crates only provide **local
 149 providers**. Almost all **extern providers** wind up going through the
 150 [`rustc_metadata` crate][rustc_metadata], which loads the information from the
 151 crate metadata. But in some cases there are crates that provide queries for
 152 *both* local and external crates, in which case they define both a
 153 `provide` and a `provide_extern` function that `rustc_driver` can
 154 invoke.
 155
 156 [rustc_metadata]: https://github.com/rust-lang/rust/tree/master/src/librustc_metadata
 157
 158 ### Adding a new kind of query
 159
 160 So suppose you want to add a new kind of query, how do you do so?
 161 Well, defining a query takes place in two steps:
 162
 163 1. first, you have to specify the query name and arguments; and then,
 164 2. you have to supply query providers where needed.
 165
 166 To specify the query name and arguments, you simply add an entry to
 167 the big macro invocation in
 168 [`src/librustc/query/mod.rs`][query-mod], which looks something like:
 169
 170 [query-mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/query/index.html
 171
 172 ```rust,ignore
 173 rustc_queries! {
 174     Other {
 175         /// Records the type of every item.
 176         query type_of(key: DefId) -> Ty<'tcx> {
 177             cache { key.is_local() }
 178         }
 179     }
 180
 181     ...
 182 }
 183 ```
 184
 185 Queries are grouped into categories (`Other`, `Codegen`, `TypeChecking`, etc.).
 186 Each group contains one or more queries. Each query definition is broken up like
 187 this:
 188
 189 ```rust,ignore
 190 query type_of(key: DefId) -> Ty<'tcx> { ... }
 191 ^^    ^^^^^^^      ^^^^^     ^^^^^^^^   ^^^
 192 |     |            |         |          |
 193 |     |            |         |          query modifiers
 194 |     |            |         result type of query
 195 |     |            query key type
 196 |     name of query
 197 query keyword
 198 ```
 199
 200 Let's go over them one by one:
 201
 202 - **Query keyword:** indicates a start of a query definition.
 203 - **Name of query:** the name of the query method
 204   (`tcx.type_of(..)`). Also used as the name of a struct
 205   (`ty::queries::type_of`) that will be generated to represent
 206   this query.
 207 - **Query key type:** the type of the argument to this query.
 208   This type must implement the `ty::query::keys::Key` trait, which
 209   defines (for example) how to map it to a crate, and so forth.
 210 - **Result type of query:** the type produced by this query. This type
 211   should (a) not use `RefCell` or other interior mutability and (b) be
 212   cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
 213   non-trivial data types.
 214   - The one exception to those rules is the `ty::steal::Steal` type,
 215     which is used to cheaply modify MIR in place. See the definition
 216     of `Steal` for more details. New uses of `Steal` should **not** be
 217     added without alerting `@rust-lang/compiler`.
 218 - **Query modifiers:** various flags and options that customize how the
 219   query is processed.
 220
 221 So, to add a query:
 222
 223 - Add an entry to `rustc_queries!` using the format above.
 224 - Link the provider by modifying the appropriate `provide` method;
 225   or add a new one if needed and ensure that `rustc_driver` is invoking it.
 226
 227 #### Query structs and descriptions
 228
 229 For each kind, the `rustc_queries` macro will generate a "query struct"
 230 named after the query. This struct is a kind of a place-holder
 231 describing the query. Each such struct implements the
 232 `self::config::QueryConfig` trait, which has associated types for the
 233 key/value of that particular query. Basically the code generated looks something
 234 like this:
 235
 236 ```rust,ignore
 237 // Dummy struct representing a particular kind of query:
 238 pub struct type_of<'tcx> { data: PhantomData<&'tcx ()> }
 239
 240 impl<'tcx> QueryConfig for type_of<'tcx> {
 241   type Key = DefId;
 242   type Value = Ty<'tcx>;
 243
 244   const NAME: QueryName = QueryName::type_of;
 245   const CATEGORY: ProfileCategory = ProfileCategory::Other;
 246 }
 247 ```
 248
 249 There is an additional trait that you may wish to implement called
 250 `self::config::QueryDescription`. This trait is used during cycle
 251 errors to give a "human readable" name for the query, so that we can
 252 summarize what was happening when the cycle occurred. Implementing
 253 this trait is optional if the query key is `DefId`, but if you *don't*
 254 implement it, you get a pretty generic error ("processing `foo`...").
 255 You can put new impls into the `config` module. They look something like this:
 256
 257 ```rust,ignore
 258 impl<'tcx> QueryDescription for queries::type_of<'tcx> {
 259     fn describe(tcx: TyCtxt, key: DefId) -> String {
 260         format!("computing the type of `{}`", tcx.def_path_str(key))
 261     }
 262 }
 263 ```
 264
 265 Another option is to add `desc` modifier:
 266
 267 ```rust,ignore
 268 rustc_queries! {
 269     Other {
 270         /// Records the type of every item.
 271         query type_of(key: DefId) -> Ty<'tcx> {
 272             desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) }
 273         }
 274     }
 275 }
 276 ```
 277
 278 `rustc_queries` macro will generate an appropriate `impl` automatically.
 279
 280 [query-model]: queries/incremental-compilation-in-detail.md