]> git.proxmox.com Git - rustc.git/blame - src/doc/rustc-dev-guide/src/high-level-overview.md
New upstream version 1.43.0+dfsg1
[rustc.git] / src / doc / rustc-dev-guide / src / high-level-overview.md
CommitLineData
a1dfa0c6
XL
1# High-level overview of the compiler source
2
3## Crate structure
4
5The main Rust repository consists of a `src` directory, under which
6there live many crates. These crates contain the sources for the
7standard library and the compiler. This document, of course, focuses
8on the latter.
9
74b04a01 10Rustc consists of a number of crates, including `rustc_ast`,
60c5eb7d 11`rustc`, `rustc_target`, `rustc_codegen`, `rustc_driver`, and
a1dfa0c6
XL
12many more. The source for each crate can be found in a directory
13like `src/libXXX`, where `XXX` is the crate name.
14
15(N.B. The names and divisions of these crates are not set in
16stone and may change over time. For the time being, we tend towards a
17finer-grained division to help with compilation time, though as incremental
18compilation improves, that may change.)
19
20The dependency structure of these crates is roughly a diamond:
21
22```text
23 rustc_driver
24 / | \
25 / | \
26 / | \
27 / v \
28rustc_codegen rustc_borrowck ... rustc_metadata
29 \ | /
30 \ | /
31 \ | /
32 \ v /
33 rustc
34 |
35 v
74b04a01 36 rustc_ast
a1dfa0c6
XL
37 / \
38 / \
dfeec247 39 rustc_span rustc_builtin_macros
a1dfa0c6
XL
40```
41
42The `rustc_driver` crate, at the top of this lattice, is effectively
43the "main" function for the rust compiler. It doesn't have much "real
44code", but instead ties together all of the code defined in the other
45crates and defines the overall flow of execution. (As we transition
46more and more to the [query model], however, the
47"flow" of compilation is becoming less centrally defined.)
48
49At the other extreme, the `rustc` crate defines the common and
50pervasive data structures that all the rest of the compiler uses
51(e.g. how to represent types, traits, and the program itself). It
52also contains some amount of the compiler itself, although that is
53relatively limited.
54
55Finally, all the crates in the bulge in the middle define the bulk of
56the compiler – they all depend on `rustc`, so that they can make use
57of the various types defined there, and they export public routines
58that `rustc_driver` will invoke as needed (more and more, what these
59crates export are "query definitions", but those are covered later
60on).
61
62Below `rustc` lie various crates that make up the parser and error
63reporting mechanism. For historical reasons, these crates do not have
64the `rustc_` prefix, but they are really just as much an internal part
65of the compiler and not intended to be stable (though they do wind up
66getting used by some crates in the wild; a practice we hope to
67gradually phase out).
68
a1dfa0c6
XL
69## The main stages of compilation
70
71The Rust compiler is in a bit of transition right now. It used to be a
72purely "pass-based" compiler, where we ran a number of passes over the
73entire program, and each did a particular check of transformation. We
74are gradually replacing this pass-based code with an alternative setup
75based on on-demand **queries**. In the query-model, we work backwards,
76executing a *query* that expresses our ultimate goal (e.g. "compile
77this crate"). This query in turn may make other queries (e.g. "get me
78a list of all modules in the crate"). Those queries make other queries
79that ultimately bottom out in the base operations, like parsing the
80input, running the type-checker, and so forth. This on-demand model
81permits us to do exciting things like only do the minimal amount of
82work needed to type-check a single function. It also helps with
83incremental compilation. (For details on defining queries, check out
84the [query model].)
85
86Regardless of the general setup, the basic operations that the
87compiler must perform are the same. The only thing that changes is
88whether these operations are invoked front-to-back, or on demand. In
89order to compile a Rust crate, these are the general steps that we
90take:
91
921. **Parsing input**
93 - this processes the `.rs` files and produces the AST
94 ("abstract syntax tree")
74b04a01 95 - the AST is defined in `src/librustc_ast/ast.rs`. It is intended to match the lexical
a1dfa0c6
XL
96 syntax of the Rust language quite closely.
972. **Name resolution, macro expansion, and configuration**
98 - once parsing is complete, we process the AST recursively, resolving
99 paths and expanding macros. This same process also processes `#[cfg]`
100 nodes, and hence may strip things out of the AST as well.
1013. **Lowering to HIR**
102 - Once name resolution completes, we convert the AST into the HIR,
103 or "[high-level intermediate representation]". The HIR is defined in
104 `src/librustc/hir/`; that module also includes the [lowering] code.
105 - The HIR is a lightly desugared variant of the AST. It is more processed
106 than the AST and more suitable for the analyses that follow.
107 It is **not** required to match the syntax of the Rust language.
108 - As a simple example, in the **AST**, we preserve the parentheses
109 that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse
110 into distinct trees, even though they are equivalent. In the
111 HIR, however, parentheses nodes are removed, and those two
112 expressions are represented in the same way.
1133. **Type-checking and subsequent analyses**
114 - An important step in processing the HIR is to perform type
115 checking. This process assigns types to every HIR expression,
116 for example, and also is responsible for resolving some
117 "type-dependent" paths, such as field accesses (`x.f` – we
118 can't know what field `f` is being accessed until we know the
119 type of `x`) and associated type references (`T::Item` – we
120 can't know what type `Item` is until we know what `T` is).
121 - Type checking creates "side-tables" (`TypeckTables`) that include
122 the types of expressions, the way to resolve methods, and so forth.
123 - After type-checking, we can do other analyses, such as privacy checking.
1244. **Lowering to MIR and post-processing**
125 - Once type-checking is done, we can lower the HIR into MIR ("middle IR"),
126 which is a **very** desugared version of Rust, well suited to borrowck
127 but also to certain high-level optimizations.
1285. **Translation to LLVM and LLVM optimizations**
129 - From MIR, we can produce LLVM IR.
130 - LLVM then runs its various optimizations, which produces a number of
131 `.o` files (one for each "codegen unit").
1326. **Linking**
133 - Finally, those `.o` files are linked together.
134
135
136[query model]: query.html
137[high-level intermediate representation]: hir.html
60c5eb7d 138[lowering]: lowering.html