Commit | Line | Data |
---|---|---|
60c5eb7d XL |
1 | # Backend Agnostic Codegen |
2 | ||
6a06907d | 3 | <!-- toc --> |
60c5eb7d | 4 | |
6a06907d XL |
5 | As of <!-- date: 2021-01 --> January 2021, `rustc_codegen_ssa` provides an |
6 | abstract interface for all backends to implement, to allow other codegen | |
7 | backends (e.g. [Cranelift]). | |
8 | ||
9 | [Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/HEAD/cranelift | |
60c5eb7d XL |
10 | |
11 | > The following is a copy/paste of a README from the rust-lang/rust repo. | |
12 | > Please submit a PR if it needs updating. | |
13 | ||
14 | # Refactoring of `rustc_codegen_llvm` | |
15 | by Denis Merigoux, October 23rd 2018 | |
16 | ||
17 | ## State of the code before the refactoring | |
18 | ||
19 | All the code related to the compilation of MIR into LLVM IR was contained | |
20 | inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most | |
21 | important elements: | |
22 | * the `back` folder (7,800 LOC) implements the mechanisms for creating the | |
23 | different object files and archives through LLVM, but also the communication | |
24 | mechanisms for parallel code generation; | |
25 | * the `debuginfo` (3,200 LOC) folder contains all code that passes debug | |
26 | information down to LLVM; | |
27 | * the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with | |
28 | LLVM using the C++ API; | |
29 | * the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM | |
30 | IR; | |
31 | * the `base.rs` (1,300 LOC) file contains some helper functions but also the | |
32 | high-level code that launches the code generation and distributes the work; | |
33 | * the `builder.rs` (1,200 LOC) file contains all the functions generating | |
34 | individual LLVM IR instructions inside a basic block; | |
35 | * the `common.rs` (450 LOC) contains various helper functions and all the | |
36 | functions generating LLVM static values; | |
37 | * the `type_.rs` (300 LOC) defines most of the type translations to LLVM IR. | |
38 | ||
39 | The goal of this refactoring is to separate inside this crate code that is | |
40 | specific to LLVM from code that can be reused for other rustc backends. For | |
41 | instance, the `mir` folder is almost entirely backend-agnostic but it relies | |
42 | heavily on other parts of the crate. The separation of the code must not affect | |
43 | the logic of the code nor its performance. | |
44 | ||
45 | For these reasons, the separation process involves two transformations that | |
46 | have to be done at the same time for the resulting code to compile: | |
47 | ||
48 | 1. replace all the LLVM-specific types with generics inside function signatures | |
49 | and structure definitions; | |
50 | 2. encapsulate all functions calling the LLVM FFI inside a set of traits that | |
51 | will define the interface between backend-agnostic code and the backend. | |
52 | ||
53 | While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new | |
54 | traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name | |
55 | suggestion by @eddyb). | |
56 | ||
57 | ## Generic types and structures | |
58 | ||
59 | @irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a | |
60 | generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This | |
61 | work has been extended to all structures inside the `mir` folder and elsewhere, | |
62 | as well as for LLVM's `BasicBlock` and `Type` types. | |
63 | ||
64 | The two most important structures for the LLVM codegen are `CodegenCx` and | |
65 | `Builder`. They are parametrized by multiple lifetime parameters and the type | |
66 | for `Value`. | |
67 | ||
68 | ```rust,ignore | |
69 | struct CodegenCx<'ll, 'tcx> { | |
70 | /* ... */ | |
71 | } | |
72 | ||
73 | struct Builder<'a, 'll, 'tcx> { | |
74 | cx: &'a CodegenCx<'ll, 'tcx>, | |
75 | /* ... */ | |
76 | } | |
77 | ``` | |
78 | ||
79 | `CodegenCx` is used to compile one codegen-unit that can contain multiple | |
80 | functions, whereas `Builder` is created to compile one basic block. | |
81 | ||
82 | The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime | |
83 | parameters, which correspond to the following: | |
84 | * `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt` | |
85 | containing the program's information; | |
86 | * `'a` is a short-lived reference to a `CodegenCx` or another object inside a | |
87 | struct; | |
88 | * `'ll` is the lifetime of references to LLVM objects such as `Value` or | |
89 | `Type`. | |
90 | ||
91 | Although there are already many lifetime parameters in the code, making it | |
92 | generic uncovered situations where the borrow-checker was passing only due to | |
93 | the special nature of the LLVM objects manipulated (they are extern pointers). | |
74b04a01 | 94 | For instance, an additional lifetime parameter had to be added to |
60c5eb7d XL |
95 | `LocalAnalyzer` in `analyze.rs`, leading to the definition: |
96 | ||
97 | ```rust,ignore | |
98 | struct LocalAnalyzer<'mir, 'a, 'tcx> { | |
99 | /* ... */ | |
100 | } | |
101 | ``` | |
102 | ||
103 | However, the two most important structures `CodegenCx` and `Builder` are not | |
104 | defined in the backend-agnostic code. Indeed, their content is highly specific | |
105 | to the backend and it makes more sense to leave their definition to the backend | |
106 | implementor than to allow just a narrow spot via a generic field for the | |
107 | backend's context. | |
108 | ||
109 | ## Traits and interface | |
110 | ||
111 | Because they have to be defined by the backend, `CodegenCx` and `Builder` will | |
112 | be the structures implementing all the traits defining the backend's interface. | |
113 | These traits are defined in the folder `rustc_codegen_ssa/traits` and all the | |
114 | backend-agnostic code is parametrized by them. For instance, let us explain how | |
115 | a function in `base.rs` is parametrized: | |
116 | ||
117 | ```rust,ignore | |
118 | pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>( | |
119 | cx: &'a Bx::CodegenCx, | |
120 | instance: Instance<'tcx> | |
121 | ) { | |
122 | /* ... */ | |
123 | } | |
124 | ``` | |
125 | ||
126 | In this signature, we have the two lifetime parameters explained earlier and | |
127 | the master type `Bx` which satisfies the trait `BuilderMethods` corresponding | |
128 | to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait | |
129 | defines an associated type `Bx::CodegenCx` that itself satisfies the | |
130 | `CodegenMethods` trait implemented by the struct `CodegenCx`. | |
131 | ||
132 | On the trait side, here is an example with part of the definition of | |
133 | `BuilderMethods` in `traits/builder.rs`: | |
134 | ||
135 | ```rust,ignore | |
136 | pub trait BuilderMethods<'a, 'tcx>: | |
137 | HasCodegen<'tcx> | |
138 | + DebugInfoBuilderMethods<'tcx> | |
139 | + ArgTypeMethods<'tcx> | |
140 | + AbiBuilderMethods<'tcx> | |
141 | + IntrinsicCallMethods<'tcx> | |
142 | + AsmBuilderMethods<'tcx> | |
143 | { | |
144 | fn new_block<'b>( | |
145 | cx: &'a Self::CodegenCx, | |
146 | llfn: Self::Function, | |
147 | name: &'b str | |
148 | ) -> Self; | |
149 | /* ... */ | |
150 | fn cond_br( | |
151 | &mut self, | |
152 | cond: Self::Value, | |
153 | then_llbb: Self::BasicBlock, | |
154 | else_llbb: Self::BasicBlock, | |
155 | ); | |
156 | /* ... */ | |
157 | } | |
158 | ``` | |
159 | ||
160 | Finally, a master structure implementing the `ExtraBackendMethods` trait is | |
161 | used for high-level codegen-driving functions like `codegen_crate` in | |
162 | `base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`. | |
163 | `ExtraBackendMethods` should be implemented by the same structure that | |
164 | implements the `CodegenBackend` defined in | |
165 | `rustc_codegen_utils/codegen_backend.rs`. | |
166 | ||
167 | During the traitification process, certain functions have been converted from | |
168 | methods of a local structure to methods of `CodegenCx` or `Builder` and a | |
169 | corresponding `self` parameter has been added. Indeed, LLVM stores information | |
170 | internally that it can access when called through its API. This information | |
171 | does not show up in a Rust data structure carried around when these methods are | |
172 | called. However, when implementing a Rust backend for `rustc`, these methods | |
173 | will need information from `CodegenCx`, hence the additional parameter (unused | |
174 | in the LLVM implementation of the trait). | |
175 | ||
176 | ## State of the code after the refactoring | |
177 | ||
178 | The traits offer an API which is very similar to the API of LLVM. This is not | |
179 | the best solution since LLVM has a very special way of doing things: when | |
6a06907d | 180 | adding another backend, the trait definitions might be changed in order to |
60c5eb7d XL |
181 | offer more flexibility. |
182 | ||
183 | However, the current separation between backend-agnostic and LLVM-specific code | |
74b04a01 | 184 | has allowed the reuse of a significant part of the old `rustc_codegen_llvm`. |
60c5eb7d XL |
185 | Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the |
186 | most important elements: | |
187 | ||
188 | * `back` folder: 3,800 (BA) vs 4,100 (LLVM); | |
189 | * `mir` folder: 4,400 (BA) vs 0 (LLVM); | |
190 | * `base.rs`: 1,100 (BA) vs 250 (LLVM); | |
191 | * `builder.rs`: 1,400 (BA) vs 0 (LLVM); | |
192 | * `common.rs`: 350 (BA) vs 350 (LLVM). | |
193 | ||
194 | The `debuginfo` folder has been left almost untouched by the splitting and is | |
195 | specific to LLVM. Only its high-level features have been traitified. | |
196 | ||
197 | The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the | |
198 | 27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new | |
199 | 18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized | |
200 | `rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of | |
201 | approximately 10,000 LOC that would otherwise have had to be duplicated between | |
202 | the multiple backends of `rustc`. | |
203 | ||
204 | The refactored version of `rustc`'s backend introduced no regressions in the | |
205 | test suite or in performance benchmarks, which is consistent with the nature | |
206 | of the refactoring, which used only compile-time parametricity (no trait | |
207 | objects). |
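
To make the idea of compile-time parametricity concrete, here is a minimal sketch of the pattern described in "Traits and interface": backend-agnostic code is generic over a builder trait with associated types, and each backend supplies its own implementation. All names in it (`BuilderSketch`, `ToyBuilder`, `codegen_sum`) are hypothetical and much simpler than the real traits in `rustc_codegen_ssa`; the toy backend just records textual instructions.

```rust
// Hypothetical, heavily simplified stand-in for a builder trait.
// It is NOT the real `BuilderMethods` trait from `rustc_codegen_ssa`.
trait BuilderSketch {
    // Each backend chooses its own representation of a value.
    type Value: Copy;

    fn const_i32(&mut self, v: i32) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
}

// Backend-agnostic code: generic over `Bx`, so the compiler monomorphizes
// it for each backend. No trait objects (`dyn`) are involved.
fn codegen_sum<Bx: BuilderSketch>(bx: &mut Bx, operands: &[i32]) -> Bx::Value {
    let mut acc = bx.const_i32(0);
    for &op in operands {
        let v = bx.const_i32(op);
        acc = bx.add(acc, v);
    }
    acc
}

// A toy backend (an assumption for illustration) that emits textual
// instructions and uses virtual register indices as values.
struct ToyBuilder {
    insts: Vec<String>,
    next: usize,
}

impl ToyBuilder {
    // Allocate the next virtual register index.
    fn fresh(&mut self) -> usize {
        let r = self.next;
        self.next += 1;
        r
    }
}

impl BuilderSketch for ToyBuilder {
    type Value = usize;

    fn const_i32(&mut self, v: i32) -> usize {
        let r = self.fresh();
        self.insts.push(format!("%{} = const {}", r, v));
        r
    }

    fn add(&mut self, lhs: usize, rhs: usize) -> usize {
        let r = self.fresh();
        self.insts.push(format!("%{} = add %{}, %{}", r, lhs, rhs));
        r
    }
}

fn main() {
    let mut bx = ToyBuilder { insts: Vec::new(), next: 0 };
    let result = codegen_sum(&mut bx, &[1, 2]);
    for inst in &bx.insts {
        println!("{}", inst);
    }
    println!("result in %{}", result);
}
```

Because `codegen_sum` is resolved statically for `ToyBuilder`, calls such as `bx.add(..)` are ordinary direct calls, which is why a refactoring of this style would not be expected to add dynamic-dispatch overhead.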