# From MIR to Binaries

All of the preceding chapters of this guide have one thing in common: we never
generated any executable machine code at all! With this chapter, all of that
changes.

So far, we've shown how the compiler can take raw source code in text format
and transform it into [MIR]. We have also shown how the compiler performs various
analyses on the code to detect things like type or lifetime errors. Now, we
will finally take the MIR and produce some executable machine code.

[MIR]: ./mir/index.md

> NOTE: This part of a compiler is often called the _backend_. The term is a bit
> overloaded because in the compiler source, it usually refers to the "codegen
> backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend"
> in this part, we are referring to the "codegen backend".

So what do we need to do?

0. First, we need to collect the set of things to generate code for. In
   particular, we need to find out which concrete types to substitute for
   generic ones, since we need to generate code for the concrete types.
   Generating code for the concrete types (i.e. emitting a copy of the code for
   each concrete type) is called _monomorphization_, so the process of
   collecting all the concrete types is called _monomorphization collection_.
1. Next, we need to actually lower the MIR to a codegen IR
   (usually LLVM IR) for each concrete type we collected.
2. Finally, we need to invoke LLVM or Cranelift, which runs a series of
   optimization passes and generates machine code in object files; these are
   then linked together into an executable binary.
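To make step 0 more concrete, here is a minimal sketch of monomorphization collection as a worklist over a heavily simplified, hypothetical program representation. `ToyFn`, `TyArg`, `Instance`, and `collect` are names invented for this illustration, not rustc's data structures; the toy assumes every generic function has exactly one type parameter and ignores traits, closures, and everything else a real collector must handle. Starting from `main`, it substitutes concrete types for `T` at each call site and records every distinct instance it reaches.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// A type argument at a call site in the toy program (hypothetical).
enum TyArg {
    /// The caller's own generic parameter `T`.
    Param,
    /// A concrete type, named by a string such as "u32".
    Concrete(&'static str),
}

/// A toy function body, reduced to the calls it makes. Each call names the
/// callee and, if the callee is generic, the type argument passed to it.
struct ToyFn {
    calls: Vec<(&'static str, Option<TyArg>)>,
}

/// A monomorphized instance: function name plus its concrete type argument
/// (`None` for non-generic functions).
type Instance = (&'static str, Option<&'static str>);

/// Worklist-based collection: start from the root, substitute concrete types
/// for `T` at each call site, and record every distinct instance reached.
fn collect(fns: &HashMap<&'static str, ToyFn>, root: &'static str) -> BTreeSet<Instance> {
    let mut seen: BTreeSet<Instance> = BTreeSet::new();
    let mut queue: VecDeque<Instance> = VecDeque::from([(root, None)]);
    while let Some(inst) = queue.pop_front() {
        if !seen.insert(inst) {
            continue; // already collected; don't walk its body again
        }
        let (name, subst) = inst;
        for (callee, arg) in &fns[name].calls {
            let concrete = match arg {
                None => None,
                Some(TyArg::Concrete(ty)) => Some(*ty),
                // Substitution: replace `T` with the caller's concrete type.
                Some(TyArg::Param) => Some(subst.expect("`T` used outside a generic fn")),
            };
            queue.push_back((*callee, concrete));
        }
    }
    seen
}

fn main() {
    let mut fns = HashMap::new();
    // fn main() { print_it::<u32>(); print_it::<String>(); helper(); }
    fns.insert("main", ToyFn { calls: vec![
        ("print_it", Some(TyArg::Concrete("u32"))),
        ("print_it", Some(TyArg::Concrete("String"))),
        ("helper", None),
    ] });
    // fn print_it<T>() { wrap::<T>(); }
    fns.insert("print_it", ToyFn { calls: vec![("wrap", Some(TyArg::Param))] });
    // fn wrap<T>() {}
    fns.insert("wrap", ToyFn { calls: vec![] });
    // fn helper() { print_it::<u32>(); }
    fns.insert("helper", ToyFn { calls: vec![("print_it", Some(TyArg::Concrete("u32")))] });

    for (name, ty) in &collect(&fns, "main") {
        match ty {
            Some(t) => println!("{name}::<{t}>"),
            None => println!("{name}"),
        }
    }
}
```

Running it collects six instances: `main`, `helper`, and one copy each of `print_it` and `wrap` for `u32` and for `String`. `print_it::<u32>` is reached from both `main` and `helper` but is collected only once, which is why the set of seen instances matters.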

[codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html

The codegen code is somewhat complex for a few reasons:

- Support for multiple codegen backends (LLVM and Cranelift). We try to share as much
  backend code between them as possible, so a lot of it is generic over the
  codegen implementation. This means that there are often many layers of
  abstraction.
- For performance, codegen happens asynchronously in another thread.
- The actual codegen is done by a third-party library (either LLVM or Cranelift).
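The first two factors above can be illustrated with a minimal sketch, assuming a drastically simplified design. `ToyBackend`, `FakeLlvm`, `FakeCranelift`, and `codegen_all` are hypothetical names invented for this example; they are not the traits or functions in `rustc_codegen_ssa`, and the "IR" strings are placeholders rather than real output of either backend. The driver is written once, generic over the backend, and hands each item to a separate worker thread over a channel.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical backend interface. Shared driver code is generic over it,
/// so each backend only implements the backend-specific part.
trait ToyBackend: Send + 'static {
    /// Lower one (already monomorphized) item to a placeholder "IR" string.
    fn lower(&self, item: &str) -> String;
}

struct FakeLlvm;
struct FakeCranelift;

impl ToyBackend for FakeLlvm {
    fn lower(&self, item: &str) -> String {
        format!("define void @{item}()")
    }
}

impl ToyBackend for FakeCranelift {
    fn lower(&self, item: &str) -> String {
        format!("function %{item}()")
    }
}

/// Backend-agnostic driver: sends every item to a codegen worker thread
/// and collects the lowered results in the order they were sent.
fn codegen_all<B: ToyBackend>(backend: B, items: &[&str]) -> Vec<String> {
    let (work_tx, work_rx) = mpsc::channel::<String>();
    let (done_tx, done_rx) = mpsc::channel::<String>();

    // The worker owns the backend and lowers items as they arrive.
    let worker = thread::spawn(move || {
        for item in work_rx {
            done_tx.send(backend.lower(&item)).unwrap();
        }
        // `done_tx` is dropped here, which closes the results channel.
    });

    for item in items {
        work_tx.send(item.to_string()).unwrap();
    }
    drop(work_tx); // no more work: lets the worker's loop end

    let results: Vec<String> = done_rx.iter().collect();
    worker.join().unwrap();
    results
}

fn main() {
    let items = ["main", "print_it_u32"];
    for line in codegen_all(FakeLlvm, &items) {
        println!("{line}");
    }
    for line in codegen_all(FakeCranelift, &items) {
        println!("{line}");
    }
}
```

The toy uses a single worker thread and plain strings; it is only meant to show why code that is shared across backends and run off the main thread tends to accumulate layers of abstraction (a trait, a generic driver, and channel plumbing) before any real code is generated.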

Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code
(i.e. independent of LLVM or Cranelift), while the [`rustc_codegen_llvm`][llvm]
crate contains code specific to LLVM codegen.

[ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html
[llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html

At a very high level, the entry point is
[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. This function starts the
process discussed in the rest of this chapter.