# Overview of the Compiler

<!-- toc -->

This chapter is about the overall process of compiling a program -- how
everything fits together.

The Rust compiler is special in two ways: it does things to your code that
other compilers don't do (e.g. borrow checking) and it has a lot of
unconventional implementation choices (e.g. queries). We will talk about these
in turn in this chapter, and in the rest of the guide, we will look at all the
individual pieces in more detail.

## What the compiler does to your code

So first, let's look at what the compiler does to your code. For now, we will
avoid mentioning how the compiler implements these steps except as needed;
we'll talk about that later.

### Invocation

Compilation begins when a user writes a Rust source program in text
and invokes the `rustc` compiler on it. The work that the compiler needs to
perform is defined by command-line options. For example, it is possible to
enable nightly features (`-Z` flags), perform `check`-only builds, or emit
LLVM-IR rather than executable machine code. The `rustc` executable call may
be indirect through the use of `cargo`.

Command line argument parsing occurs in the [`rustc_driver`]. This crate
defines the compile configuration that is requested by the user and passes it
to the rest of the compilation process as a [`rustc_interface::Config`].

### Lexing and parsing

The raw Rust source text is analyzed by a low-level *lexer* located in
[`rustc_lexer`]. At this stage, the source text is turned into a stream of
atomic source code units known as _tokens_. The lexer supports the
Unicode character encoding.
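To make the idea of a token stream concrete, here is a minimal sketch of a lexer. It is not how `rustc_lexer` is implemented: the `Token` enum and the `tokenize` function are hypothetical names invented for this illustration, and the sketch throws away whitespace, which the real lexer does not do.

```rust
/// A deliberately tiny token type (hypothetical; the real token kinds in
/// `rustc_lexer` are far richer).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Number(String),
    Punct(char),
}

/// Splits `src` into tokens, skipping whitespace.
pub fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Identifiers and keywords: `char::is_alphabetic` is Unicode-aware.
            let mut ident = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    ident.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else if c.is_ascii_digit() {
            // A run of ASCII digits becomes one number token.
            let mut number = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_ascii_digit() {
                    number.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Number(number));
        } else {
            // Everything else is treated as single-character punctuation.
            tokens.push(Token::Punct(c));
            chars.next();
        }
    }
    tokens
}
```

With this sketch, `tokenize("let x = 42;")` yields the identifiers `let` and `x`, the punctuation `=`, the number `42`, and the punctuation `;`, in that order.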

The token stream passes through a higher-level lexer located in
[`rustc_parse`] to prepare for the next stage of the compile process. The
[`StringReader`] struct is used at this stage to perform a set of validations
and turn strings into interned symbols (_interning_ is discussed later).
[String interning] is a way of storing only one immutable
copy of each distinct string value.
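As a rough illustration of string interning, the sketch below hands out a small integer handle for each distinct string. The `Interner` and `Symbol` types here are assumptions made up for this example and are much simpler than the compiler's own symbol machinery.

```rust
use std::collections::HashMap;

/// A cheap, copyable handle standing in for an interned string
/// (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// A toy interner. For simplicity it keeps one copy of each string in
/// `strings` and a second copy as the lookup key in `map`.
#[derive(Default)]
pub struct Interner {
    map: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or creates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.map.insert(s.to_owned(), sym);
        sym
    }

    /// Looks up the text behind a symbol produced by this interner.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Interning `"foo"` twice gives back the same `Symbol`, so comparing two names becomes an integer comparison instead of a string comparison.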

The lexer has a small interface and doesn't depend directly on the
diagnostic infrastructure in `rustc`. Instead it provides diagnostics as plain
data which are emitted in `rustc_parse::lexer` as real diagnostics.
The lexer preserves full fidelity information for both IDEs and proc macros.

The *parser* [translates the token stream from the lexer into an Abstract Syntax
Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax
analysis. The crate entry points for the parser are the
[`Parser::parse_crate_mod()`][parse_crate_mod] and [`Parser::parse_mod()`][parse_mod]
methods found in [`rustc_parse::parser::Parser`]. The external module parsing
entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod].
The macro parser entry point is [`Parser::parse_nonterminal()`][parse_nonterminal].

Parsing is performed with a set of `Parser` utility methods including `bump`,
`check`, `eat`, `expect`, and `look_ahead`.
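The following is a minimal sketch of recursive descent built on helpers with the same names as the utility methods listed above. Only the names are borrowed: `ToyParser`, `Tok`, and every signature are assumptions for illustration, and the sketch evaluates arithmetic directly instead of building an AST.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok {
    Num(i64),
    Plus,
    Star,
    LParen,
    RParen,
}

/// A toy parser over a pre-lexed token list (hypothetical type).
pub struct ToyParser {
    tokens: Vec<Tok>,
    pos: usize,
}

impl ToyParser {
    pub fn new(tokens: Vec<Tok>) -> Self {
        ToyParser { tokens, pos: 0 }
    }

    /// Returns the token `n` positions ahead without consuming anything.
    fn look_ahead(&self, n: usize) -> Option<Tok> {
        self.tokens.get(self.pos + n).copied()
    }

    /// Advances past the current token.
    fn bump(&mut self) {
        self.pos += 1;
    }

    /// Tests whether the current token is `tok` without consuming it.
    fn check(&self, tok: Tok) -> bool {
        self.look_ahead(0) == Some(tok)
    }

    /// Consumes the current token if it is `tok`.
    fn eat(&mut self, tok: Tok) -> bool {
        let found = self.check(tok);
        if found {
            self.bump();
        }
        found
    }

    /// Like `eat`, but a mismatch is reported as an error.
    fn expect(&mut self, tok: Tok) -> Result<(), String> {
        if self.eat(tok) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", tok, self.look_ahead(0)))
        }
    }

    /// expr := term ('+' term)*
    pub fn parse_expr(&mut self) -> Result<i64, String> {
        let mut value = self.parse_term()?;
        while self.eat(Tok::Plus) {
            value += self.parse_term()?;
        }
        Ok(value)
    }

    /// term := atom ('*' atom)*
    fn parse_term(&mut self) -> Result<i64, String> {
        let mut value = self.parse_atom()?;
        while self.eat(Tok::Star) {
            value *= self.parse_atom()?;
        }
        Ok(value)
    }

    /// atom := NUMBER | '(' expr ')'
    fn parse_atom(&mut self) -> Result<i64, String> {
        match self.look_ahead(0) {
            Some(Tok::Num(n)) => {
                self.bump();
                Ok(n)
            }
            Some(Tok::LParen) => {
                self.bump();
                let value = self.parse_expr()?;
                self.expect(Tok::RParen)?;
                Ok(value)
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}
```

Each grammar rule maps to one `parse_*` method, and operator precedence falls out of which method calls which: `parse_expr` calls `parse_term`, so `*` binds tighter than `+`.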

Parsing is organized by semantic construct. Separate
`parse_*` methods can be found in the [`rustc_parse`][rustc_parse_parser_dir]
directory. The source file name follows the construct name. For example, the
following files are found in the parser:

- `expr.rs`
- `pat.rs`
- `ty.rs`
- `stmt.rs`

This naming scheme is used across many compiler stages. You will find
either a file or directory with the same name across the parsing, lowering,
type checking, THIR lowering, and MIR building sources.

Macro expansion, AST validation, name resolution, and early linting also take place
during this stage.

The parser uses the standard `DiagnosticBuilder` API for error handling, but we
try to recover, parsing a superset of Rust's grammar, while also emitting an error.
`rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST nodes are returned from the parser.

### HIR lowering

Next, we take the AST and convert it to [High-Level Intermediate
Representation (HIR)][hir], a more compiler-friendly representation of the
AST. This process is called "lowering". It involves a lot of desugaring of things
like loops and `async fn`.
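To give a feel for what desugaring a loop means, the sketch below writes the same computation twice: once with a `for` loop, and once with the `loop` plus `match` shape that a `for` loop roughly corresponds to. The hand-written version is an approximation for illustration, not the exact HIR the compiler produces.

```rust
/// Sums a slice with an ordinary `for` loop.
pub fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

/// Roughly the same loop after desugaring: the iterator is created
/// explicitly and driven by hand until it returns `None`.
pub fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match Iterator::next(&mut iter) {
            Some(v) => total += *v,
            None => break,
        }
    }
    total
}
```

Both functions return the same result for any input. After lowering, later stages only need to understand `loop` and `match`, not every surface-level loop form.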

We then use the HIR to do [*type inference*] (the process of automatic
detection of the type of an expression), [*trait solving*] (the process
of pairing up an impl with each reference to a trait), and [*type
checking*]. Type checking is the process of converting the types found in the HIR
([`hir::Ty`]), which represent what the user wrote,
into the internal representation used by the compiler ([`Ty<'tcx>`]).
That information is used to verify the type safety, correctness and
coherence of the types used in the program.

### MIR lowering

The HIR is then [lowered to Mid-level Intermediate Representation (MIR)][mir],
which is used for [borrow checking].

Along the way, we also construct the THIR, which is an even more desugared HIR.
THIR is used for pattern and exhaustiveness checking. It is also more
convenient to convert into MIR than HIR is.

We do [many optimizations on the MIR][mir-opt] because it is still
generic: optimizing before monomorphization improves the code we generate
later and improves compilation speed too.
MIR is a higher level (and generic) representation, so it is easier to do
some optimizations at MIR level than at LLVM-IR level. For example, LLVM
doesn't seem to be able to optimize the pattern that the [`simplify_try`] MIR
optimization looks for.

Rust code is _monomorphized_, which means making copies of all the generic
code with the type parameters replaced by concrete types. To do
this, we need to collect a list of what concrete types to generate code for.
This is called _monomorphization collection_ and it happens at the MIR level.
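As a hand-written illustration of monomorphization, the sketch below shows one generic function next to the concrete copies that would be needed if it were used with `i32` and `f64`. The function names are made up for this example, and the compiler of course generates such copies internally instead of as source code.

```rust
/// Generic source code: one definition, many possible instantiations.
pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut best = *items.first()?;
    for &item in items {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

/// What a monomorphized copy for `T = i32` looks like when written by hand.
pub fn largest_i32(items: &[i32]) -> Option<i32> {
    let mut best = *items.first()?;
    for &item in items {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

/// And the copy for `T = f64`.
pub fn largest_f64(items: &[f64]) -> Option<f64> {
    let mut best = *items.first()?;
    for &item in items {
        if item > best {
            best = item;
        }
    }
    Some(best)
}
```

If a program only ever calls `largest` with `i32` and `f64`, the collection step would record exactly those two instantiations, and no code would be generated for any other type.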

### Code generation

We then begin what is vaguely called _code generation_ or _codegen_.
The [code generation stage][codegen] is when higher level
representations of source are turned into an executable binary. `rustc`
uses LLVM for code generation. The first step is to convert the MIR
to LLVM Intermediate Representation (LLVM IR). This is where the MIR
is actually monomorphized, according to the list we created in the
previous step.
The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
LLVM then emits machine code, which is basically assembly code with additional
low-level types and annotations added (e.g. an ELF object or WASM).
The different libraries/binaries are then linked together to produce the final
binary.

[String interning]: https://en.wikipedia.org/wiki/String_interning
[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
[`rustc_driver`]: rustc-driver.md
[`rustc_interface::Config`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/interface/struct.Config.html
[lex]: the-parser.md
[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
[`rustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html
[*type inference*]: type-inference.md
[*trait solving*]: traits/resolution.md
[*type checking*]: type-checking.md
[mir]: mir/index.md
[borrow checking]: borrow_check.md
[mir-opt]: mir/optimizations.md
[`simplify_try`]: https://github.com/rust-lang/rust/pull/66282
[codegen]: backend/codegen.md
[parse_nonterminal]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_nonterminal
[parse_crate_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_crate_mod
[parse_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_mod
[`rustc_parse::parser::Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
[parse_external_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html
[rustc_parse_parser_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_parse/src/parser
[`hir::Ty`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html
[`Ty<'tcx>`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html

## How it does it

Ok, so now that we have a high-level view of what the compiler does to your
code, let's take a high-level view of _how_ it does all that stuff. There are a
lot of constraints and conflicting goals that the compiler needs to
satisfy/optimize for. For example,

- Compilation speed: how fast is it to compile a program. More/better
  compile-time analyses often mean compilation is slower.
  - Also, we want to support incremental compilation, so we need to take that
    into account. How can we keep track of what work needs to be redone and
    what can be reused if the user modifies their program?
    - Also we can't store too much stuff in the incremental cache because
      it would take a long time to load from disk and it could take a lot
      of space on the user's system...
- Compiler memory usage: while compiling a program, we don't want to use more
  memory than we need.
- Program speed: how fast is your compiled program? More/better compile-time
  analyses often mean the compiler can do better optimizations.
- Program size: how large is the compiled binary? Similar to the previous
  point.
- Compiler compilation speed: how long does it take to compile the compiler?
  This impacts contributors and compiler maintenance.
- Implementation complexity: building a compiler is one of the hardest
  things a person/group can do, and Rust is not a very simple language, so how
  do we make the compiler's code base manageable?
- Compiler correctness: the binaries produced by the compiler should do what
  the input programs say they do, and should continue to do so despite the
  tremendous amount of change constantly going on.
- Integration: a number of other tools need to use the compiler in
  various ways (e.g. cargo, clippy, miri) that must be supported.
- Compiler stability: the compiler should not crash or fail ungracefully on the
  stable channel.
- Rust stability: the compiler must respect Rust's stability guarantees by not
  breaking programs that previously compiled despite the many changes that are
  always going on to its implementation.
- Limitations of other tools: rustc uses LLVM in its backend, and LLVM has some
  strengths we leverage and some limitations/weaknesses we need to work around.

So, as you read through the rest of the guide, keep these things in mind. They
will often inform decisions that we make.

### Intermediate representations

As with most compilers, `rustc` uses some intermediate representations (IRs) to
facilitate computations. In general, working directly with the source code is
extremely inconvenient and error-prone. Source code is designed to be human-friendly while at
the same time being unambiguous, but it's less convenient for doing something
like, say, type checking.

Instead most compilers, including `rustc`, build some sort of IR out of the
source code which is easier to analyze. `rustc` has a few IRs, each optimized
for different purposes:

- Token stream: the lexer produces a stream of tokens directly from the source
  code. This stream of tokens is easier for the parser to deal with than raw
  text.
- Abstract Syntax Tree (AST): the abstract syntax tree is built from the stream
  of tokens produced by the lexer. It represents
  pretty much exactly what the user wrote. It helps to do some syntactic sanity
  checking (e.g. checking that a type is expected where the user wrote one).
- High-level IR (HIR): This is a sort of desugared AST. It's still close
  to what the user wrote syntactically, but it includes some implicit things
  such as some elided lifetimes, etc. This IR is amenable to type checking.
- Typed HIR (THIR): This is an intermediate between HIR and MIR, and used to be called
  High-level Abstract IR (HAIR). It is like the HIR but it is fully typed and a bit
  more desugared (e.g. method calls and implicit dereferences are made fully explicit).
  Moreover, it is easier to lower to MIR from THIR than from HIR.
- Middle-level IR (MIR): This IR is basically a Control-Flow Graph (CFG). A CFG
  is a type of diagram that shows the basic blocks of a program and how control
  flow can go between them. Likewise, MIR also has a bunch of basic blocks with
  simple typed statements inside them (e.g. assignment, simple computations,
  etc) and control flow edges to other basic blocks (e.g., calls, dropping
  values). MIR is used for borrow checking and other
  important dataflow-based checks, such as checking for uninitialized values.
  It is also used for a series of optimizations and for constant evaluation (via
  MIRI). Because MIR is still generic, we can do a lot of analyses here more
  efficiently than after monomorphization.
- LLVM IR: This is the standard form of all input to the LLVM compiler. LLVM IR
  is a sort of typed assembly language with lots of annotations. It's
  a standard format that is used by all compilers that use LLVM (e.g. the clang
  C compiler also outputs LLVM IR). LLVM IR is designed to be easy for other
  compilers to emit and also rich enough for LLVM to run a bunch of
  optimizations on it.
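To make the description of MIR as a control-flow graph more tangible, here is a toy CFG with basic blocks, terminators, and a reachability walk. All of the types and functions (`BasicBlock`, `Terminator`, `successors`, `reachable`) are simplified stand-ins invented for this sketch; the real MIR data structures carry far more information.

```rust
/// How control leaves a basic block (a tiny, hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum Terminator {
    /// Unconditional jump to another block.
    Goto(usize),
    /// Two-way branch on a boolean local.
    SwitchBool { cond: usize, if_true: usize, if_false: usize },
    /// Leave the function.
    Return,
}

/// A straight-line run of statements ending in exactly one terminator.
pub struct BasicBlock {
    pub statements: Vec<String>,
    pub terminator: Terminator,
}

/// The control-flow edges leaving `block`.
pub fn successors(block: &BasicBlock) -> Vec<usize> {
    match block.terminator {
        Terminator::Goto(target) => vec![target],
        Terminator::SwitchBool { if_true, if_false, .. } => vec![if_true, if_false],
        Terminator::Return => vec![],
    }
}

/// Marks every block reachable from block 0 with a depth-first walk.
pub fn reachable(blocks: &[BasicBlock]) -> Vec<bool> {
    let mut seen = vec![false; blocks.len()];
    let mut stack = vec![0];
    while let Some(bb) = stack.pop() {
        if bb >= blocks.len() || seen[bb] {
            continue;
        }
        seen[bb] = true;
        stack.extend(successors(&blocks[bb]));
    }
    seen
}
```

A walk like `reachable` is the kind of graph traversal that dataflow-based checks are built on: because every edge is explicit in the terminators, analyses do not need to understand surface syntax such as `if` or `while`.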

One other thing to note is that many values in the compiler are _interned_.
This is a performance and memory optimization in which we allocate the values
in a special allocator called an _arena_. Then, we pass around references to
the values allocated in the arena. This allows us to make sure that identical
values (e.g. types in your program) are only allocated once and can be compared
cheaply by comparing pointers. Many of the intermediate representations are
interned.
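The pointer-comparison benefit can be sketched as follows. This example is an assumption-heavy simplification: it uses `Rc` in place of an arena, and `TyKind`, `Ty`, and `TyInterner` are toy types defined here, not the compiler's own.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// A shared handle to an interned type (toy stand-in; `Rc` replaces the arena).
pub type Ty = Rc<TyKind>;

/// A tiny, hypothetical universe of types.
#[derive(Debug, PartialEq, Eq, Hash)]
pub enum TyKind {
    Int,
    Bool,
    Ref(Ty),
}

/// Keeps at most one allocation per structurally distinct type.
#[derive(Default)]
pub struct TyInterner {
    set: HashSet<Ty>,
}

impl TyInterner {
    /// Returns the existing allocation for `kind` if there is one,
    /// otherwise allocates it and remembers it.
    pub fn intern(&mut self, kind: TyKind) -> Ty {
        if let Some(existing) = self.set.get(&kind) {
            return Rc::clone(existing);
        }
        let ty = Rc::new(kind);
        self.set.insert(Rc::clone(&ty));
        ty
    }
}
```

Once every type goes through `intern`, two handles describe the same type exactly when `Rc::ptr_eq` says they point at the same allocation, so equality checks no longer need to walk nested structures.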

### Queries

The first big implementation choice is the _query_ system. The Rust compiler
uses a query system which is unlike most textbook compilers, which are
organized as a series of passes over the code that execute sequentially. The
compiler does this to make incremental compilation possible -- that is, if the
user makes a change to their program and recompiles, we want to do as little
redundant work as possible to produce the new binary.

In `rustc`, all the major steps above are organized as a bunch of queries that
call each other. For example, there is a query to ask for the type of something
and another to ask for the optimized MIR of a function. These
queries can call each other and are all tracked through the query system.
The results of the queries are cached on disk so that we can tell which
queries' results changed from the last compilation and only redo those. This is
how incremental compilation works.
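The core idea of queries that call each other and cache their results can be sketched in a few lines. Everything here is a toy assumption: `ToyCtxt`, `FnInfo`, and the `total_size` query are invented for illustration, the cache lives only in memory, and there is no dependency tracking or cycle detection as a real query system would need.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical per-function input: its own size and the functions it calls.
pub struct FnInfo {
    pub own_size: u32,
    pub callees: Vec<&'static str>,
}

/// A toy context holding the inputs and the in-memory query cache.
pub struct ToyCtxt {
    inputs: HashMap<&'static str, FnInfo>,
    cache: RefCell<HashMap<&'static str, u32>>,
    provider_runs: Cell<u32>,
}

impl ToyCtxt {
    pub fn new(inputs: HashMap<&'static str, FnInfo>) -> Self {
        ToyCtxt {
            inputs,
            cache: RefCell::new(HashMap::new()),
            provider_runs: Cell::new(0),
        }
    }

    /// How many times the query body actually had to run.
    pub fn provider_runs(&self) -> u32 {
        self.provider_runs.get()
    }

    /// Query: the size of `name` plus everything it transitively calls.
    /// Panics on unknown names and does not handle recursive call graphs.
    pub fn total_size(&self, name: &'static str) -> u32 {
        // Fast path: reuse a previously computed result.
        let cached = self.cache.borrow().get(name).copied();
        if let Some(size) = cached {
            return size;
        }
        // Slow path: compute the result, invoking the same query on callees.
        self.provider_runs.set(self.provider_runs.get() + 1);
        let info = self.inputs.get(name).expect("unknown function");
        let callee_sizes: u32 = info.callees.iter().map(|&c| self.total_size(c)).sum();
        let size = info.own_size + callee_sizes;
        self.cache.borrow_mut().insert(name, size);
        size
    }
}
```

Asking for one result pulls in whatever other results it depends on, and each result is computed at most once. A real incremental system additionally records which queries each result depended on, so it can decide what to recompute after an edit.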

In principle, for the query-fied steps, we do each of the above for each item
individually. For example, we will take the HIR for a function and use queries
to ask for the LLVM IR for that HIR. This drives the generation of optimized
MIR, which drives the borrow checker, which drives the generation of MIR, and
so on.

... except that this is very over-simplified. In fact, some queries are not
cached on disk, and some parts of the compiler have to run for all code anyway
for correctness even if the code is dead code (e.g. the borrow checker). For
example, [currently the `mir_borrowck` query is first executed on all functions
of a crate.][passes] Then the codegen backend invokes the
`collect_and_partition_mono_items` query, which first recursively requests the
`optimized_mir` for all reachable functions, which in turn runs `mir_borrowck`
for that function and then creates codegen units. This kind of split will need
to remain to ensure that unreachable functions still have their errors emitted.

[passes]: https://github.com/rust-lang/rust/blob/45ebd5808afd3df7ba842797c0fcd4447ddf30fb/src/librustc_interface/passes.rs#L824

Moreover, the compiler wasn't originally built to use a query system; the query
system has been retrofitted into the compiler, so parts of it are not query-fied
yet. Also, LLVM isn't our code, so that isn't query-fied either. The plan is to
eventually query-fy all of the steps listed in the previous section,
but as of <!-- date-check --> November 2022, only the steps between HIR and
LLVM IR are query-fied. That is, lexing, parsing, name resolution, and macro
expansion are done all at once for the whole program.

One other thing to mention here is the all-important "typing context",
[`TyCtxt`], which is a giant struct that is at the center of all things.
(Note that the name is mostly historic. This is _not_ a "typing context" in the
sense of `Γ` or `Δ` from type theory. The name is retained because that's what
the name of the struct is in the source code.) All
queries are defined as methods on the [`TyCtxt`] type, and the in-memory query
cache is stored there too. In the code, there is usually a variable called
`tcx` which is a handle on the typing context. You will also see lifetimes with
the name `'tcx`, which means that something is tied to the lifetime of the
`TyCtxt` (usually it is stored or interned there).

[`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html

### `ty::Ty`

Types are really important in Rust, and they form the core of a lot of compiler
analyses. The main type (in the compiler) that represents types (in the user's
program) is [`rustc_middle::ty::Ty`][ty]. This is so important that we have a whole chapter
on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is the way
`rustc` represents types!

Also note that the `rustc_middle::ty` module defines the `TyCtxt` struct we mentioned before.

[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html

### Parallelism

Compiler performance is a problem that we would like to improve on
(and are always working on). One aspect of that is parallelizing
`rustc` itself.

Currently, there is only one part of rustc that is parallel by default: codegen.

The rest of the compiler is not yet parallel. There have been
lots of efforts spent on this, but it is generally a hard problem. The current
approach is to turn `RefCell`s into `Mutex`s -- that is, we
switch to thread-safe internal mutability. However, there are ongoing
challenges with lock contention, maintaining query-system invariants under
concurrency, and the complexity of the code base. One can try out the current
work by enabling parallel compilation in `config.toml`. It's still early days,
but there are already some promising performance improvements.
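The move from `RefCell` to `Mutex` can be illustrated with a small standalone example. This is only a sketch of thread-safe interior mutability using the standard library; the `parallel_count` function is hypothetical and says nothing about how the compiler's own synchronization is structured.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from several threads.
/// A `RefCell<u32>` could not be shared across threads like this,
/// because `RefCell` is not `Sync`; `Mutex` provides the same kind of
/// interior mutability with locking.
pub fn parallel_count(workers: usize, per_worker: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // Each increment takes the lock briefly; heavy use of one
                    // lock like this is where contention comes from.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

The result is always `workers * per_worker`, but every increment has to wait for the lock, which hints at why lock contention is listed among the ongoing challenges.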

### Bootstrapping

`rustc` itself is written in Rust. So how do we compile the compiler? We use an
older compiler to compile the newer compiler. This is called [_bootstrapping_].

Bootstrapping has a lot of interesting implications. For example, it means
that one of the major users of Rust is the Rust compiler, so we are
constantly testing our own software ("eating our own dogfood").

For more details on bootstrapping, see
[the bootstrapping section of the guide][rustc-bootstrap].

[_bootstrapping_]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
[rustc-bootstrap]: building/bootstrapping.md

<!--
# Unresolved Questions

- Does LLVM ever do optimizations in debug builds?
- How do I explore phases of the compile process in my own sources (lexer,
  parser, HIR, etc)? - e.g., `cargo rustc -- -Z unpretty=hir-tree` allows you to
  view HIR representation
- What is the main source entry point for `X`?
- Where do phases diverge for cross-compilation to machine code across
  different platforms?
-->

# References

- Command line parsing
  - Guide: [The Rustc Driver and Interface](rustc-driver.md)
  - Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/)
  - Main entry point: [`rustc_session::config::build_session_options`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/config/fn.build_session_options.html)
- Lexical Analysis: Lex the user program to a stream of tokens
  - Guide: [Lexing and Parsing](the-parser.md)
  - Lexer definition: [`rustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html)
  - Main entry point: [`rustc_lexer::cursor::Cursor::advance_token`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/cursor/struct.Cursor.html#method.advance_token)
- Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST)
  - Guide: [Lexing and Parsing](the-parser.md)
  - Guide: [Macro Expansion](macro-expansion.md)
  - Guide: [Name Resolution](name-resolution.md)
  - Parser definition: [`rustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html)
  - Main entry points:
    - [Entry point for first file in crate](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/passes/fn.parse.html)
    - [Entry point for outline module parsing](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html)
    - [Entry point for macro fragments][parse_nonterminal]
  - AST definition: [`rustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html)
  - Feature gating: **TODO**
  - Early linting: **TODO**
- The High Level Intermediate Representation (HIR)
  - Guide: [The HIR](hir.md)
  - Guide: [Identifiers in the HIR](hir.md#identifiers-in-the-hir)
  - Guide: [The HIR Map](hir.md#the-hir-map)
  - Guide: [Lowering AST to HIR](lowering.md)
  - How to view HIR representation for your code: `cargo rustc -- -Z unpretty=hir-tree`
  - Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
  - Main entry point: **TODO**
  - Late linting: **TODO**
- Type Inference
  - Guide: [Type Inference](type-inference.md)
  - Guide: [The ty Module: Representing Types](ty.md) (semantics)
  - Main entry point (type inference): [`InferCtxtBuilder::enter`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxtBuilder.html#method.enter)
  - Main entry point (type checking bodies): [the `typeck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.typeck)
    - These two functions can't be decoupled.
- The Mid Level Intermediate Representation (MIR)
  - Guide: [The MIR (Mid level IR)](mir/index.md)
  - Definition: [`rustc_middle/src/mir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/index.html)
  - Definition of sources that manipulate the MIR: [`rustc_mir_build`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_build/index.html), [`rustc_mir_dataflow`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_dataflow/index.html), [`rustc_mir_transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/index.html)
- The Borrow Checker
  - Guide: [MIR Borrow Check](borrow_check.md)
  - Definition: [`rustc_borrowck`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_borrowck/index.html)
  - Main entry point: [`mir_borrowck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_borrowck/fn.mir_borrowck.html)
- MIR Optimizations
  - Guide: [MIR Optimizations](mir/optimizations.md)
  - Definition: [`rustc_mir_transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/index.html)
  - Main entry point: [`optimized_mir` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/fn.optimized_mir.html)
- Code Generation
  - Guide: [Code Generation](backend/codegen.md)
  - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?**
  - Main entry point: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html)
    - This monomorphizes and produces LLVM IR for one codegen unit. It then
      starts a background thread to run LLVM, which must be joined later.
    - Monomorphization happens lazily via [`FunctionCx::monomorphize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.monomorphize) and [`rustc_codegen_ssa::base::codegen_instance`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_instance.html)