]>
Commit | Line | Data |
---|---|---|
a1dfa0c6 XL |
1 | # High-level overview of the compiler source |
2 | ||
3 | ## Crate structure | |
4 | ||
5 | The main Rust repository consists of a `src` directory, under which | |
6 | there live many crates. These crates contain the sources for the | |
7 | standard library and the compiler. This document, of course, focuses | |
8 | on the latter. | |
9 | ||
74b04a01 | 10 | Rustc consists of a number of crates, including `rustc_ast`, |
60c5eb7d | 11 | `rustc`, `rustc_target`, `rustc_codegen`, `rustc_driver`, and |
a1dfa0c6 XL |
12 | many more. The source for each crate can be found in a directory |
13 | like `src/libXXX`, where `XXX` is the crate name. | |
14 | ||
15 | (N.B. The names and divisions of these crates are not set in | |
16 | stone and may change over time. For the time being, we tend towards a | |
17 | finer-grained division to help with compilation time, though as incremental | |
18 | compilation improves, that may change.) | |
19 | ||
20 | The dependency structure of these crates is roughly a diamond: | |
21 | ||
22 | ```text | |
23 | rustc_driver | |
24 | / | \ | |
25 | / | \ | |
26 | / | \ | |
27 | / v \ | |
28 | rustc_codegen rustc_borrowck ... rustc_metadata | |
29 | \ | / | |
30 | \ | / | |
31 | \ | / | |
32 | \ v / | |
33 | rustc | |
34 | | | |
35 | v | |
74b04a01 | 36 | rustc_ast |
a1dfa0c6 XL |
37 | / \ |
38 | / \ | |
dfeec247 | 39 | rustc_span rustc_builtin_macros |
a1dfa0c6 XL |
40 | ``` |
41 | ||
42 | The `rustc_driver` crate, at the top of this lattice, is effectively | |
43 | the "main" function for the rust compiler. It doesn't have much "real | |
44 | code", but instead ties together all of the code defined in the other | |
45 | crates and defines the overall flow of execution. (As we transition | |
46 | more and more to the [query model], however, the | |
47 | "flow" of compilation is becoming less centrally defined.) | |
48 | ||
49 | At the other extreme, the `rustc` crate defines the common and | |
50 | pervasive data structures that all the rest of the compiler uses | |
51 | (e.g. how to represent types, traits, and the program itself). It | |
52 | also contains some amount of the compiler itself, although that is | |
53 | relatively limited. | |
54 | ||
55 | Finally, all the crates in the bulge in the middle define the bulk of | |
56 | the compiler – they all depend on `rustc`, so that they can make use | |
57 | of the various types defined there, and they export public routines | |
58 | that `rustc_driver` will invoke as needed (more and more, what these | |
59 | crates export are "query definitions", but those are covered later | |
60 | on). | |
61 | ||
62 | Below `rustc` lie various crates that make up the parser and error | |
63 | reporting mechanism. For historical reasons, these crates do not have | |
64 | the `rustc_` prefix, but they are really just as much an internal part | |
65 | of the compiler and not intended to be stable (though they do wind up | |
66 | getting used by some crates in the wild; a practice we hope to | |
67 | gradually phase out). | |
68 | ||
a1dfa0c6 XL |
69 | ## The main stages of compilation |
70 | ||
71 | The Rust compiler is in a bit of transition right now. It used to be a | |
72 | purely "pass-based" compiler, where we ran a number of passes over the | |
73 | entire program, and each did a particular check of transformation. We | |
74 | are gradually replacing this pass-based code with an alternative setup | |
75 | based on on-demand **queries**. In the query-model, we work backwards, | |
76 | executing a *query* that expresses our ultimate goal (e.g. "compile | |
77 | this crate"). This query in turn may make other queries (e.g. "get me | |
78 | a list of all modules in the crate"). Those queries make other queries | |
79 | that ultimately bottom out in the base operations, like parsing the | |
80 | input, running the type-checker, and so forth. This on-demand model | |
81 | permits us to do exciting things like only do the minimal amount of | |
82 | work needed to type-check a single function. It also helps with | |
83 | incremental compilation. (For details on defining queries, check out | |
84 | the [query model].) | |
85 | ||
86 | Regardless of the general setup, the basic operations that the | |
87 | compiler must perform are the same. The only thing that changes is | |
88 | whether these operations are invoked front-to-back, or on demand. In | |
89 | order to compile a Rust crate, these are the general steps that we | |
90 | take: | |
91 | ||
92 | 1. **Parsing input** | |
93 | - this processes the `.rs` files and produces the AST | |
94 | ("abstract syntax tree") | |
74b04a01 | 95 | - the AST is defined in `src/librustc_ast/ast.rs`. It is intended to match the lexical |
a1dfa0c6 XL |
96 | syntax of the Rust language quite closely. |
97 | 2. **Name resolution, macro expansion, and configuration** | |
98 | - once parsing is complete, we process the AST recursively, resolving | |
99 | paths and expanding macros. This same process also processes `#[cfg]` | |
100 | nodes, and hence may strip things out of the AST as well. | |
101 | 3. **Lowering to HIR** | |
102 | - Once name resolution completes, we convert the AST into the HIR, | |
103 | or "[high-level intermediate representation]". The HIR is defined in | |
104 | `src/librustc/hir/`; that module also includes the [lowering] code. | |
105 | - The HIR is a lightly desugared variant of the AST. It is more processed | |
106 | than the AST and more suitable for the analyses that follow. | |
107 | It is **not** required to match the syntax of the Rust language. | |
108 | - As a simple example, in the **AST**, we preserve the parentheses | |
109 | that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse | |
110 | into distinct trees, even though they are equivalent. In the | |
111 | HIR, however, parentheses nodes are removed, and those two | |
112 | expressions are represented in the same way. | |
113 | 3. **Type-checking and subsequent analyses** | |
114 | - An important step in processing the HIR is to perform type | |
115 | checking. This process assigns types to every HIR expression, | |
116 | for example, and also is responsible for resolving some | |
117 | "type-dependent" paths, such as field accesses (`x.f` – we | |
118 | can't know what field `f` is being accessed until we know the | |
119 | type of `x`) and associated type references (`T::Item` – we | |
120 | can't know what type `Item` is until we know what `T` is). | |
121 | - Type checking creates "side-tables" (`TypeckTables`) that include | |
122 | the types of expressions, the way to resolve methods, and so forth. | |
123 | - After type-checking, we can do other analyses, such as privacy checking. | |
124 | 4. **Lowering to MIR and post-processing** | |
125 | - Once type-checking is done, we can lower the HIR into MIR ("middle IR"), | |
126 | which is a **very** desugared version of Rust, well suited to borrowck | |
127 | but also to certain high-level optimizations. | |
128 | 5. **Translation to LLVM and LLVM optimizations** | |
129 | - From MIR, we can produce LLVM IR. | |
130 | - LLVM then runs its various optimizations, which produces a number of | |
131 | `.o` files (one for each "codegen unit"). | |
132 | 6. **Linking** | |
133 | - Finally, those `.o` files are linked together. | |
134 | ||
135 | ||
136 | [query model]: query.html | |
137 | [high-level intermediate representation]: hir.html | |
60c5eb7d | 138 | [lowering]: lowering.html |