An informal guide to reading and working on the rustc compiler.
==================================================================

If you wish to expand on this document, or have a more experienced
Rust contributor add anything else to it, please get in touch:

* https://internals.rust-lang.org/
* https://chat.mibbit.com/?server=irc.mozilla.org&channel=%23rust

or file a bug:

https://github.com/rust-lang/rust/issues

Your concerns are probably the same as someone else's.

You may also be interested in the
[Rust Forge](https://forge.rust-lang.org/), which includes a number of
interesting bits of information.

Finally, at the end of this file is a GLOSSARY defining a number of
common (and not necessarily obvious!) names that are used in the Rust
compiler code. If you see some funky name and you'd like to know what
it stands for, check there!

The crates of rustc
===================

Rustc consists of a number of crates, including `syntax`,
`rustc`, `rustc_back`, `rustc_trans`, `rustc_driver`, and
many more. The source for each crate can be found in a directory
like `src/libXXX`, where `XXX` is the crate name.

(NB. The names and divisions of these crates are not set in
stone and may change over time -- for the time being, we tend towards
a finer-grained division to help with compilation time, though as
incremental compilation improves that may change.)

The dependency structure of these crates is roughly a diamond:

```
                 rustc_driver
               /      |       \
             /        |         \
           /          |           \
         /            v             \
rustc_trans    rustc_borrowck   ...  rustc_metadata
         \            |            /
           \          |          /
             \        |        /
               \      v      /
                    rustc
                      |
                      v
                    syntax
                    /    \
                  /       \
           syntax_pos  syntax_ext
```
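
The diagram can also be read as a dependency graph. The sketch below is a toy program for illustration only. It is not part of rustc or its build system. It encodes the upper part of the diagram, leaving out `syntax_ext` and the `...` crates. A depth-first post-order walk then computes one order in which the crates could be built, so that each crate comes after everything it depends on. The `deps`, `visit`, and `build_order` functions are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of the upper part of the diagram: each crate maps to the
/// crates it depends on. (Hypothetical; the real lists are much longer.)
fn deps() -> HashMap<&'static str, Vec<&'static str>> {
    let mut m = HashMap::new();
    m.insert("rustc_driver", vec!["rustc_trans", "rustc_borrowck", "rustc_metadata"]);
    m.insert("rustc_trans", vec!["rustc"]);
    m.insert("rustc_borrowck", vec!["rustc"]);
    m.insert("rustc_metadata", vec!["rustc"]);
    m.insert("rustc", vec!["syntax"]);
    m.insert("syntax", vec!["syntax_pos"]);
    m.insert("syntax_pos", vec![]);
    m
}

/// Depth-first post-order walk: a crate is pushed only after all of its
/// dependencies. Assumes the graph is acyclic (there is no cycle detection).
fn visit(
    krate: &'static str,
    graph: &HashMap<&'static str, Vec<&'static str>>,
    seen: &mut HashSet<&'static str>,
    order: &mut Vec<&'static str>,
) {
    if !seen.insert(krate) {
        return; // already handled via another path through the diamond
    }
    for &dep in &graph[krate] {
        visit(dep, graph, seen, order);
    }
    order.push(krate);
}

/// Returns one valid build order for `root` and everything below it.
fn build_order(root: &'static str) -> Vec<&'static str> {
    let graph = deps();
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    visit(root, &graph, &mut seen, &mut order);
    order
}

fn main() {
    println!("{}", build_order("rustc_driver").join(" -> "));
}
```

Running it prints `syntax_pos -> syntax -> rustc -> rustc_trans -> rustc_borrowck -> rustc_metadata -> rustc_driver`. Although three crates depend on `rustc`, it appears only once, which is what the diamond shape implies.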

The `rustc_driver` crate, at the top of this lattice, is effectively
the "main" function for the Rust compiler. It doesn't have much "real
code", but instead ties together all of the code defined in the other
crates and defines the overall flow of execution. (As we transition
more and more to the [query model](ty/maps/README.md), however, the
"flow" of compilation is becoming less centrally defined.)

At the other extreme, the `rustc` crate defines the common and
pervasive data structures that all the rest of the compiler uses
(e.g., how to represent types, traits, and the program itself). It
also contains some amount of the compiler itself, although that is
relatively limited.

Finally, all the crates in the bulge in the middle define the bulk of
the compiler -- they all depend on `rustc`, so that they can make use
of the various types defined there, and they export public routines
that `rustc_driver` will invoke as needed (more and more, what these
crates export are "query definitions", but those are covered later
on).
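
The following is one plausible shape for "crates export query definitions". It assumes queries are dispatched through a table of function pointers. A central type defines one slot per query, each crate exposes a function that fills in the slots it implements, and the driver assembles the table before anything runs. Every name here (`Ctx`, `Providers`, `provide_parsing`, `provide_analysis`, `item_count`, `has_items`) is hypothetical. They were chosen for illustration and are not taken from rustc.

```rust
/// Hypothetical query context; here it only holds the provider table.
struct Ctx {
    providers: Providers,
}

/// One function-pointer slot per query. A central crate would define this
/// table; other crates fill in the slots for the queries they implement.
struct Providers {
    item_count: fn(&Ctx, &str) -> usize,
    has_items: fn(&Ctx, &str) -> bool,
}

impl Default for Providers {
    /// Unfilled slots panic, so a missing registration fails loudly.
    fn default() -> Self {
        Providers {
            item_count: |_, _| panic!("no provider for `item_count`"),
            has_items: |_, _| panic!("no provider for `has_items`"),
        }
    }
}

/// Stand-in for what a parsing crate might register.
fn provide_parsing(p: &mut Providers) {
    p.item_count = item_count;
}

/// Provider: counts occurrences of `fn ` in some source text.
fn item_count(_cx: &Ctx, src: &str) -> usize {
    src.matches("fn ").count()
}

/// Stand-in for what an analysis crate might register.
fn provide_analysis(p: &mut Providers) {
    p.has_items = has_items;
}

/// Provider that invokes another query through the shared table,
/// without naming the function that implements it.
fn has_items(cx: &Ctx, src: &str) -> bool {
    (cx.providers.item_count)(cx, src) > 0
}

fn main() {
    // The "driver": start from empty slots and let each crate register.
    let mut providers = Providers::default();
    provide_parsing(&mut providers);
    provide_analysis(&mut providers);
    let cx = Ctx { providers };
    println!("{}", (cx.providers.has_items)(&cx, "fn main() {}"));
}
```

Here `has_items` reaches `item_count` only through the table, so the analysis code needs no direct link to the parsing code. This loosely mirrors how the middle crates depend only on `rustc` and are wired together by `rustc_driver`.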

Below `rustc` lie various crates that make up the parser and error
reporting mechanism. For historical reasons, these crates do not have
the `rustc_` prefix, but they are really just as much an internal part
of the compiler and not intended to be stable (though they do wind up
getting used by some crates in the wild, a practice we hope to
gradually phase out).

Each crate has a `README.md` file that describes, at a high level,
what it contains, and tries to give some kind of explanation (some
better than others).

The compiler process
====================

The Rust compiler is in a bit of transition right now. It used to be a
purely "pass-based" compiler, where we ran a number of passes over the
entire program, and each did a particular check or transformation.

We are gradually replacing this pass-based code with an alternative
setup based on on-demand **queries**. In the query model, we work
backwards, executing a *query* that expresses our ultimate goal (e.g.,
"compile this crate"). This query in turn may make other queries
(e.g., "get me a list of all modules in the crate"). Those queries
make other queries that ultimately bottom out in the base operations,
like parsing the input, running the type-checker, and so forth. This
on-demand model permits us to do exciting things like doing only the
minimal amount of work needed to type-check a single function. It also
helps with incremental compilation. (For details on defining queries,
check out `src/librustc/ty/maps/README.md`.)
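
Here is a minimal sketch of the on-demand, memoized style described above. It assumes a toy setting where "parsing" a module just counts its whitespace-separated tokens. `Ctx`, `parse`, `modules`, `compile_crate`, and `provider_runs` are all hypothetical names, not rustc's API. The real query system is considerably more involved.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Toy query context. (Hypothetical; not rustc's API.)
struct Ctx {
    /// Input: module name -> source text.
    source: HashMap<&'static str, &'static str>,
    /// Memoized results of the `parse` query, keyed by module.
    parse_cache: RefCell<HashMap<&'static str, usize>>,
    /// Log of which providers actually ran, to make the caching visible.
    provider_runs: RefCell<Vec<String>>,
}

impl Ctx {
    fn new(source: HashMap<&'static str, &'static str>) -> Self {
        Ctx {
            source,
            parse_cache: RefCell::new(HashMap::new()),
            provider_runs: RefCell::new(Vec::new()),
        }
    }

    /// Base query: "parse" one module (here, just count its tokens).
    fn parse(&self, module: &'static str) -> usize {
        if let Some(&n) = self.parse_cache.borrow().get(module) {
            return n; // cache hit: no work is redone
        }
        self.provider_runs.borrow_mut().push(format!("parse({})", module));
        let n = self.source[module].split_whitespace().count();
        self.parse_cache.borrow_mut().insert(module, n);
        n
    }

    /// Query: "get me a list of all modules in the crate".
    fn modules(&self) -> Vec<&'static str> {
        let mut names: Vec<&'static str> = self.source.keys().copied().collect();
        names.sort(); // deterministic order
        names
    }

    /// Top-level query: "compile this crate", defined only in terms of
    /// other queries, each of which runs on demand.
    fn compile_crate(&self) -> usize {
        self.modules().into_iter().map(|m| self.parse(m)).sum()
    }
}

fn main() {
    let mut source = HashMap::new();
    source.insert("a", "fn f ( ) { }");
    source.insert("b", "fn g ( ) { }");
    let cx = Ctx::new(source);

    cx.parse("b"); // only module "b" is parsed
    let total = cx.compile_crate(); // reuses the cached result for "b"
    println!("{} tokens; providers run: {:?}", total, cx.provider_runs.borrow());
}
```

Running it prints `12 tokens; providers run: ["parse(b)", "parse(a)"]`. The early `parse("b")` does only the work for that one module. The later `compile_crate` query pulls in `"a"` on demand and reuses the cached result for `"b"`.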

Regardless of the general setup, the basic operations that the
compiler must perform are the same. The only thing that changes is
whether these operations are invoked front-to-back, or on demand. In
order to compile a Rust crate, these are the general steps that we
take:

1. **Parsing input**
   - This processes the `.rs` files and produces the AST ("abstract syntax tree").
   - The AST is defined in `syntax/ast.rs`. It is intended to match the lexical
     syntax of the Rust language quite closely.
2. **Name resolution, macro expansion, and configuration**
   - Once parsing is complete, we process the AST recursively, resolving paths
     and expanding macros. This same process also handles `#[cfg]` nodes, and hence
     may strip things out of the AST as well.
3. **Lowering to HIR**
   - Once name resolution completes, we convert the AST into the HIR,
     or "high-level IR". The HIR is defined in `src/librustc/hir/`; that module also includes
     the lowering code.
   - The HIR is a lightly desugared variant of the AST. It is more processed than the
     AST and more suitable for the analyses that follow. It is **not** required to match
     the syntax of the Rust language.
   - As a simple example, in the **AST**, we preserve the parentheses
     that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse
     into distinct trees, even though they are equivalent. In the
     HIR, however, parentheses nodes are removed, and those two
     expressions are represented in the same way.
4. **Type-checking and subsequent analyses**
   - An important step in processing the HIR is to perform type
     checking. This process assigns types to every HIR expression,
     for example, and is also responsible for resolving some
     "type-dependent" paths, such as field accesses (`x.f` -- we
     can't know what field `f` is being accessed until we know the
     type of `x`) and associated type references (`T::Item` -- we
     can't know what type `Item` is until we know what `T` is).
   - Type checking creates "side-tables" (`TypeckTables`) that include
     the types of expressions, the way to resolve methods, and so forth.
   - After type-checking, we can do other analyses, such as privacy checking.
5. **Lowering to MIR and post-processing**
   - Once type-checking is done, we can lower the HIR into MIR ("middle IR"), which
     is a **very** desugared version of Rust, well suited to borrow checking and also to
     certain high-level optimizations.
6. **Translation to LLVM and LLVM optimizations**
   - From MIR, we can produce LLVM IR.
   - LLVM then runs its various optimizations, which produce a number of `.o` files
     (one for each "codegen unit").
7. **Linking**
   - Finally, those `.o` files are linked together.
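
The following toy sketch makes the parentheses example from step 3 concrete. The `Ast` and `Hir` enums, the `lower` function, and the small constructor helpers are all hypothetical. They are far simpler than rustc's real data structures. The point is that lowering is a recursive walk that can drop nodes later analyses don't need (here, `Paren`). As a result, the two source expressions end up with identical HIR.

```rust
/// Toy AST: keeps the parentheses the user wrote. (Hypothetical types.)
#[derive(Debug)]
enum Ast {
    Lit(i64),
    Add(Box<Ast>, Box<Ast>),
    Paren(Box<Ast>),
}

/// Toy HIR: has no parenthesis node at all.
#[derive(Debug, PartialEq)]
enum Hir {
    Lit(i64),
    Add(Box<Hir>, Box<Hir>),
}

/// Lowering: a recursive walk that drops `Paren` nodes.
fn lower(e: &Ast) -> Hir {
    match e {
        Ast::Lit(n) => Hir::Lit(*n),
        Ast::Add(l, r) => Hir::Add(Box::new(lower(l)), Box::new(lower(r))),
        Ast::Paren(inner) => lower(inner),
    }
}

// Small constructors to keep the example readable.
fn lit(n: i64) -> Box<Ast> { Box::new(Ast::Lit(n)) }
fn add(l: Box<Ast>, r: Box<Ast>) -> Box<Ast> { Box::new(Ast::Add(l, r)) }
fn paren(e: Box<Ast>) -> Box<Ast> { Box::new(Ast::Paren(e)) }

fn main() {
    // `1 + 2 + 3` (addition is left-associative)
    let plain = add(add(lit(1), lit(2)), lit(3));
    // `((1 + 2) + 3)`
    let parens = paren(add(paren(add(lit(1), lit(2))), lit(3)));

    // Distinct ASTs, identical HIR.
    assert_eq!(lower(&plain), lower(&parens));
    println!("{:?}", lower(&parens)); // Add(Add(Lit(1), Lit(2)), Lit(3))
}
```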

Glossary
========

The compiler uses a number of...idiosyncratic abbreviations and
things. This glossary attempts to list them and give you a few
pointers for understanding them better.

- AST -- the **abstract syntax tree** produced by the `syntax` crate; reflects user syntax
  very closely.
- codegen unit -- when we produce LLVM IR, we group the Rust code into a number of codegen
  units. Each of these units is processed by LLVM independently of the others,
  enabling parallelism. They are also the unit of incremental re-use.
- cx -- we tend to use "cx" as an abbreviation for context. See also tcx, infcx, etc.
- `DefId` -- an index identifying a **definition** (see `librustc/hir/def_id.rs`). Uniquely
  identifies a `DefPath`.
- HIR -- the **High-level IR**, created by lowering and desugaring the AST. See `librustc/hir`.
- `HirId` -- identifies a particular node in the HIR by combining a
  def-id with an "intra-definition offset".
- `'gcx` -- the lifetime of the global arena (see `librustc/ty`).
- generics -- the set of generic type parameters defined on a type or item.
- ICE -- internal compiler error. This is what happens when the compiler crashes.
- infcx -- the inference context (see `librustc/infer`).
- MIR -- the **Mid-level IR** that is created after type-checking for use by borrowck and trans.
  Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is
  found in `src/librustc_mir`.
- obligation -- something that must be proven by the trait system; see `librustc/traits`.
- local crate -- the crate currently being compiled.
- node-id or `NodeId` -- an index identifying a particular node in the
  AST or HIR; gradually being phased out and replaced with `HirId`.
- query -- a sub-computation performed on demand during compilation; see `librustc/ty/maps`.
- provider -- the function that executes a query; see `librustc/ty/maps`.
- sess -- the **compiler session**, which stores global data used throughout compilation.
- side tables -- because the AST and HIR are immutable once created, we often carry extra
  information about them in the form of hashtables, indexed by the id of a particular node.
- span -- a location in the user's source code, used primarily for error
  reporting. These are like a file-name/line-number/column
  tuple on steroids: they carry a start/end point, and also track
  macro expansions and compiler desugaring. All this while being packed
  into a few bytes (really, it's an index into a table). See the
  `Span` datatype for more.
- substs -- the **substitutions** for a given generic type or item
  (e.g., the `i32, u32` in `HashMap<i32, u32>`).
- tcx -- the "typing context", main data structure of the compiler (see `librustc/ty`).
- trans -- the code to **translate** MIR into LLVM IR.
- trait reference -- a trait and values for its type parameters (see `librustc/ty`).
- ty -- the internal representation of a **type** (see `librustc/ty`).
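
The following sketch illustrates the "side tables" entry. It type-checks a tiny immutable expression tree and records each node's type in a `HashMap` keyed by node id, rather than writing the types into the tree. `NodeId`, `Ty`, `Expr`, `TypeTable`, and `check` are hypothetical stand-ins loosely named after the concepts above. They are not rustc's actual `NodeId` or `TypeckTables`.

```rust
use std::collections::HashMap;

/// Hypothetical node id, standing in for `NodeId`/`HirId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct NodeId(u32);

#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Int,
    Bool,
}

/// A toy immutable tree: every node carries its id.
#[allow(dead_code)] // literal values and `BoolLit` are unused in this sketch
enum Expr {
    IntLit(NodeId, i64),
    BoolLit(NodeId, bool),
    Eq(NodeId, Box<Expr>, Box<Expr>),
}

/// The "side table": type-checking results keyed by node id,
/// stored next to the tree rather than inside it.
type TypeTable = HashMap<NodeId, Ty>;

/// Records a type for every node without modifying the tree itself.
fn check(e: &Expr, table: &mut TypeTable) -> Ty {
    let (id, ty) = match e {
        Expr::IntLit(id, _) => (*id, Ty::Int),
        Expr::BoolLit(id, _) => (*id, Ty::Bool),
        Expr::Eq(id, l, r) => {
            let (lt, rt) = (check(l, table), check(r, table));
            assert_eq!(lt, rt, "operands of `==` must have the same type");
            (*id, Ty::Bool)
        }
    };
    table.insert(id, ty);
    ty
}

fn main() {
    // `1 == 2`, with ids assigned bottom-up as 0, 1, 2.
    let e = Expr::Eq(
        NodeId(2),
        Box::new(Expr::IntLit(NodeId(0), 1)),
        Box::new(Expr::IntLit(NodeId(1), 2)),
    );
    let mut table = TypeTable::new();
    let ty = check(&e, &mut table);
    // The tree is untouched; its types live in the side table.
    println!("{:?} {:?}", ty, table[&NodeId(0)]); // Bool Int
}
```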