Commit | Line | Data |
---|---|---|
c34b1796 AL |
1 | An informal guide to reading and working on the rustc compiler. |
2 | ================================================================== | |
3 | ||
4 | If you wish to expand on this document, or have a more experienced | |
5 | Rust contributor add anything else to it, please get in touch: | |
6 | ||
e9174d1e | 7 | * https://internals.rust-lang.org/ |
c34b1796 AL |
8 | * https://chat.mibbit.com/?server=irc.mozilla.org&channel=%23rust |
9 | ||
10 | or file a bug: | |
11 | ||
12 | https://github.com/rust-lang/rust/issues | |
13 | ||
14 | Your concerns are probably the same as someone else's. | |
15 | ||
ea8adc8c XL |
16 | You may also be interested in the |
17 | [Rust Forge](https://forge.rust-lang.org/), which includes a number of | |
18 | interesting bits of information. | |
19 | ||
20 | Finally, at the end of this file is a GLOSSARY defining a number of | |
21 | common (and not necessarily obvious!) names that are used in the Rust | |
22 | compiler code. If you see some funky name and you'd like to know what | |
23 | it stands for, check there! | |
24 | ||
c34b1796 AL |
25 | The crates of rustc |
26 | =================== | |
27 | ||
ea8adc8c XL |
28 | Rustc consists of a number of crates, including `syntax`, |
29 | `rustc`, `rustc_back`, `rustc_trans`, `rustc_driver`, and | |
30 | many more. The source for each crate can be found in a directory | |
31 | like `src/libXXX`, where `XXX` is the crate name. | |
32 | ||
33 | (NB. The names and divisions of these crates are not set in | |
34 | stone and may change over time -- for the time being, we tend towards | |
35 | a finer-grained division to help with compilation time, though as | |
36 | incremental compilation improves that may change.) | |
37 | ||
38 | The dependency structure of these crates is roughly a diamond: | |
39 | ||
40 | ``` | |
41 | rustc_driver | |
42 | / | \ | |
43 | / | \ | |
44 | / | \ | |
45 | / v \ | |
46 | rustc_trans rustc_borrowck ... rustc_metadata | |
47 | \ | / | |
48 | \ | / | |
49 | \ | / | |
50 | \ v / | |
51 | rustc | |
52 | | | |
53 | v | |
54 | syntax | |
55 | / \ | |
56 | / \ | |
57 | syntax_pos syntax_ext | |
58 | ``` | |
59 | ||
60 | The `rustc_driver` crate, at the top of this lattice, is effectively | |
61 | the "main" function for the Rust compiler. It doesn't have much "real | |
62 | code", but instead ties together all of the code defined in the other | |
63 | crates and defines the overall flow of execution. (As we transition | |
64 | more and more to the [query model](ty/maps/README.md), however, the | |
65 | "flow" of compilation is becoming less centrally defined.) | |
66 | ||
67 | At the other extreme, the `rustc` crate defines the common and | |
68 | pervasive data structures that all the rest of the compiler uses | |
69 | (e.g., how to represent types, traits, and the program itself). It | |
70 | also contains some amount of the compiler itself, although that is | |
71 | relatively limited. | |
72 | ||
73 | Finally, all the crates in the bulge in the middle define the bulk of | |
74 | the compiler -- they all depend on `rustc`, so that they can make use | |
75 | of the various types defined there, and they export public routines | |
76 | that `rustc_driver` will invoke as needed (more and more, what these | |
77 | crates export are "query definitions", but those are covered later | |
78 | on). | |
79 | ||
80 | Below `rustc` lie various crates that make up the parser and error | |
81 | reporting mechanism. For historical reasons, these crates do not have | |
82 | the `rustc_` prefix, but they are really just as much an internal part | |
83 | of the compiler and not intended to be stable (though they do wind up | |
84 | getting used by some crates in the wild, a practice we hope to | |
85 | gradually phase out). | |
86 | ||
87 | Each crate has a `README.md` file that describes, at a high-level, | |
88 | what it contains, and tries to give some kind of explanation (some | |
89 | better than others). | |
90 | ||
91 | The compiler process | |
92 | ==================== | |
93 | ||
94 | The Rust compiler is in a bit of transition right now. It used to be a | |
95 | purely "pass-based" compiler, where we ran a number of passes over the | |
96 | entire program, and each did a particular check or transformation. | |
97 | ||
98 | We are gradually replacing this pass-based code with an alternative | |
99 | setup based on on-demand **queries**. In the query model, we work | |
100 | backwards, executing a *query* that expresses our ultimate goal (e.g., | |
abe05a73 | 101 | "compile this crate"). This query in turn may make other queries |
ea8adc8c XL |
102 | (e.g., "get me a list of all modules in the crate"). Those queries |
103 | make other queries that ultimately bottom out in the base operations, | |
104 | like parsing the input, running the type-checker, and so forth. This | |
105 | on-demand model permits us to do exciting things like only do the | |
106 | minimal amount of work needed to type-check a single function. It also | |
107 | helps with incremental compilation. (For details on defining queries, | |
108 | check out `src/librustc/ty/maps/README.md`.) | |
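The on-demand idea can be illustrated with a toy sketch. Everything here (`Query`, `QueryCx`, the hard-coded module names) is hypothetical and invented for illustration; it is not the compiler's actual query machinery, which is described in the README mentioned above. The sketch only shows the shape: a goal query asks for other queries, each result is cached, and asking for one small thing runs only the providers it needs.

```rust
use std::collections::HashMap;

/// Hypothetical query keys; the real compiler defines far more.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Query {
    CompileCrate,
    ModuleList,
    TypeCheck(String),
}

/// A toy query context that caches results and counts provider runs.
struct QueryCx {
    cache: HashMap<Query, Vec<String>>,
    executed: usize,
}

impl QueryCx {
    fn new() -> Self {
        QueryCx { cache: HashMap::new(), executed: 0 }
    }

    /// Return the cached result, or run the provider once and cache it.
    fn get(&mut self, query: Query) -> Vec<String> {
        if let Some(hit) = self.cache.get(&query) {
            return hit.clone();
        }
        self.executed += 1;
        let result = match &query {
            // Base operations: these stand in for parsing, type-checking, etc.
            Query::ModuleList => vec!["a".to_string(), "b".to_string()],
            Query::TypeCheck(module) => vec![format!("checked {}", module)],
            // The goal query works backwards by issuing other queries.
            Query::CompileCrate => {
                let mut out = Vec::new();
                for module in self.get(Query::ModuleList) {
                    out.extend(self.get(Query::TypeCheck(module)));
                }
                out
            }
        };
        self.cache.insert(query, result.clone());
        result
    }
}

fn main() {
    let mut cx = QueryCx::new();
    // Checking a single module runs exactly one provider.
    cx.get(Query::TypeCheck("a".to_string()));
    println!("providers run so far: {}", cx.executed);
    // The full goal reuses the cached result for module `a`.
    let out = cx.get(Query::CompileCrate);
    println!("{:?} (providers run: {})", out, cx.executed);
}
```

The cache is also what makes this model a natural fit for incremental compilation: a result that is still valid does not need to be recomputed.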
109 | ||
110 | Regardless of the general setup, the basic operations that the | |
111 | compiler must perform are the same. The only thing that changes is | |
112 | whether these operations are invoked front-to-back, or on demand. In | |
113 | order to compile a Rust crate, these are the general steps that we | |
114 | take: | |
115 | ||
116 | 1. **Parsing input** | |
117 | - this processes the `.rs` files and produces the AST ("abstract syntax tree") | |
118 | - the AST is defined in `syntax/ast.rs`. It is intended to match the lexical | |
119 | syntax of the Rust language quite closely. | |
120 | 2. **Name resolution, macro expansion, and configuration** | |
121 | - once parsing is complete, we process the AST recursively, resolving paths | |
122 | and expanding macros. This same step also handles `#[cfg]` nodes, and hence | |
123 | may strip things out of the AST as well. | |
124 | 3. **Lowering to HIR** | |
125 | - Once name resolution completes, we convert the AST into the HIR, | |
126 | or "high-level IR". The HIR is defined in `src/librustc/hir/`; that module also includes | |
127 | the lowering code. | |
128 | - The HIR is a lightly desugared variant of the AST. It is more processed than the | |
129 | AST and more suitable for the analyses that follow. It is **not** required to match | |
130 | the syntax of the Rust language. | |
131 | - As a simple example, in the **AST**, we preserve the parentheses | |
132 | that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse | |
133 | into distinct trees, even though they are equivalent. In the | |
134 | HIR, however, parentheses nodes are removed, and those two | |
135 | expressions are represented in the same way. | |
136 | 4. **Type-checking and subsequent analyses** | |
137 | - An important step in processing the HIR is to perform type | |
138 | checking. This process assigns types to every HIR expression, | |
139 | for example, and is also responsible for resolving some | |
140 | "type-dependent" paths, such as field accesses (`x.f` -- we | |
141 | can't know what field `f` is being accessed until we know the | |
142 | type of `x`) and associated type references (`T::Item` -- we | |
143 | can't know what type `Item` is until we know what `T` is). | |
144 | - Type checking creates "side-tables" (`TypeckTables`) that include | |
145 | the types of expressions, the way to resolve methods, and so forth. | |
146 | - After type-checking, we can do other analyses, such as privacy checking. | |
147 | 5. **Lowering to MIR and post-processing** | |
148 | - Once type-checking is done, we can lower the HIR into MIR ("mid-level IR"), which | |
149 | is a **very** desugared version of Rust, well suited to borrowck and also to | |
150 | certain high-level optimizations. | |
151 | 6. **Translation to LLVM and LLVM optimizations** | |
152 | - From MIR, we can produce LLVM IR. | |
153 | - LLVM then runs its various optimizations, which produces a number of `.o` files | |
154 | (one for each "codegen unit"). | |
155 | 7. **Linking** | |
156 | - Finally, those `.o` files are linked together. | |
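The parentheses example from the "Lowering to HIR" step can be made concrete with a small sketch. The `AstExpr` and `HirExpr` types below are hypothetical and drastically simplified; they are not the definitions from `syntax/ast.rs` or `src/librustc/hir/`. They only illustrate how two distinct ASTs can lower to the same HIR once parenthesis nodes are dropped.

```rust
/// Hypothetical, simplified AST: parentheses are kept as written.
#[derive(Debug, PartialEq)]
enum AstExpr {
    Lit(i64),
    Add(Box<AstExpr>, Box<AstExpr>),
    Paren(Box<AstExpr>),
}

/// Hypothetical, simplified HIR: there is no parenthesis node at all.
#[derive(Debug, PartialEq)]
enum HirExpr {
    Lit(i64),
    Add(Box<HirExpr>, Box<HirExpr>),
}

/// Lower an AST expression to HIR, discarding parenthesis nodes.
fn lower(expr: &AstExpr) -> HirExpr {
    match expr {
        AstExpr::Lit(n) => HirExpr::Lit(*n),
        AstExpr::Add(l, r) => HirExpr::Add(Box::new(lower(l)), Box::new(lower(r))),
        AstExpr::Paren(inner) => lower(inner),
    }
}

fn lit(n: i64) -> Box<AstExpr> {
    Box::new(AstExpr::Lit(n))
}

/// The tree for `1 + 2 + 3`, which groups as `(1 + 2) + 3`.
fn plain() -> AstExpr {
    AstExpr::Add(Box::new(AstExpr::Add(lit(1), lit(2))), lit(3))
}

/// The tree for `((1 + 2) + 3)`, with both pairs of parentheses kept.
fn parenthesized() -> AstExpr {
    let inner = AstExpr::Paren(Box::new(AstExpr::Add(lit(1), lit(2))));
    AstExpr::Paren(Box::new(AstExpr::Add(Box::new(inner), lit(3))))
}

fn main() {
    // Distinct as ASTs...
    assert_ne!(plain(), parenthesized());
    // ...but identical once lowered.
    assert_eq!(lower(&plain()), lower(&parenthesized()));
    println!("{:?}", lower(&parenthesized()));
}
```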
157 | ||
158 | Glossary | |
159 | ======== | |
160 | ||
161 | The compiler uses a number of...idiosyncratic abbreviations and | |
162 | things. This glossary attempts to list them and give you a few | |
163 | pointers for understanding them better. | |
164 | ||
abe05a73 | 165 | - AST -- the **abstract syntax tree** produced by the `syntax` crate; reflects user syntax |
ea8adc8c XL |
166 | very closely. |
167 | - codegen unit -- when we produce LLVM IR, we group the Rust code into a number of codegen | |
168 | units. LLVM processes each of these units independently of the others, | |
169 | enabling parallelism. They are also the unit of incremental re-use. | |
170 | - cx -- we tend to use "cx" as an abbreviation for context. See also tcx, infcx, etc. | |
171 | - `DefId` -- an index identifying a **definition** (see `librustc/hir/def_id.rs`). Uniquely | |
172 | identifies a `DefPath`. | |
173 | - HIR -- the **High-level IR**, created by lowering and desugaring the AST. See `librustc/hir`. | |
174 | - `HirId` -- identifies a particular node in the HIR by combining a | |
175 | def-id with an "intra-definition offset". | |
176 | - `'gcx` -- the lifetime of the global arena (see `librustc/ty`). | |
177 | - generics -- the set of generic type parameters defined on a type or item | |
178 | - ICE -- internal compiler error: a crash in the compiler itself. | |
179 | - infcx -- the inference context (see `librustc/infer`) | |
180 | - MIR -- the **Mid-level IR** that is created after type-checking for use by borrowck and trans. | |
181 | Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is | |
182 | found in `src/librustc_mir`. | |
183 | - obligation -- something that must be proven by the trait system; see `librustc/traits`. | |
184 | - local crate -- the crate currently being compiled. | |
185 | - node-id or `NodeId` -- an index identifying a particular node in the | |
186 | AST or HIR; gradually being phased out and replaced with `HirId`. | |
187 | - query -- a sub-computation performed during compilation; see `librustc/ty/maps`. | |
188 | - provider -- the function that executes a query; see `librustc/ty/maps`. | |
189 | - sess -- the **compiler session**, which stores global data used throughout compilation | |
190 | - side tables -- because the AST and HIR are immutable once created, we often carry extra | |
191 | information about them in the form of hashtables, indexed by the id of a particular node. | |
192 | - span -- a location in the user's source code, used for error | |
193 | reporting primarily. These are like a file-name/line-number/column | |
194 | tuple on steroids: they carry a start/end point, and also track | |
195 | macro expansions and compiler desugaring. All while being packed | |
196 | into a few bytes (really, it's an index into a table). See the | |
197 | `Span` datatype for more. | |
198 | - substs -- the **substitutions** for a given generic type or item | |
199 | (e.g., the `i32, u32` in `HashMap<i32, u32>`) | |
200 | - tcx -- the "typing context", main data structure of the compiler (see `librustc/ty`). | |
201 | - trans -- the code to **translate** MIR into LLVM IR. | |
202 | - trait reference -- a trait and values for its type parameters (see `librustc/ty`). | |
203 | - ty -- the internal representation of a **type** (see `librustc/ty`). |
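As a rough illustration of the "side tables" and `HirId` entries above, here is a minimal sketch. The types are hypothetical stand-ins, not the compiler's real definitions: the field names `owner` and `local_id`, the `TypeTable` struct, and the use of string type names are all assumptions made for the example. The point is only that the tree itself stays immutable while extra facts live in a hashtable keyed by node id.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `DefId`: an index identifying a definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefId(u32);

/// Hypothetical stand-in for `HirId`: a def-id plus an
/// "intra-definition offset" within that definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct HirId {
    owner: DefId,
    local_id: u32,
}

/// A toy side table: types are stored next to the tree, not inside it.
#[derive(Default)]
struct TypeTable {
    node_types: HashMap<HirId, &'static str>,
}

impl TypeTable {
    /// Record the type computed for a node.
    fn record(&mut self, id: HirId, ty: &'static str) {
        self.node_types.insert(id, ty);
    }

    /// Look up a node's type, if one has been recorded.
    fn type_of(&self, id: HirId) -> Option<&'static str> {
        self.node_types.get(&id).copied()
    }
}

fn main() {
    let func = DefId(0);
    let first_expr = HirId { owner: func, local_id: 0 };
    let second_expr = HirId { owner: func, local_id: 1 };

    let mut table = TypeTable::default();
    table.record(first_expr, "i32");

    println!("{:?}", table.type_of(first_expr));
    println!("{:?}", table.type_of(second_expr));
}
```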