# Backend Agnostic Codegen

<!-- toc -->

As of <!-- date: 2021-01 --> January 2021, `rustc_codegen_ssa` provides an
abstract interface for all backends to implement, which makes other codegen
backends (e.g. [Cranelift]) possible.

[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/HEAD/cranelift

> The following is a copy/paste of a README from the rust-lang/rust repo.
> Please submit a PR if it needs updating.

# Refactoring of `rustc_codegen_llvm`
by Denis Merigoux, October 23rd 2018

## State of the code before the refactoring

All the code related to the compilation of MIR into LLVM IR was contained
inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most
important elements:
* the `back` folder (7,800 LOC) implements the mechanisms for creating the
  different object files and archives through LLVM, but also the communication
  mechanisms for parallel code generation;
* the `debuginfo` (3,200 LOC) folder contains all code that passes debug
  information down to LLVM;
* the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with
  LLVM using the C++ API;
* the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM
  IR;
* the `base.rs` (1,300 LOC) file contains some helper functions but also the
  high-level code that launches the code generation and distributes the work;
* the `builder.rs` (1,200 LOC) file contains all the functions generating
  individual LLVM IR instructions inside a basic block;
* the `common.rs` (450 LOC) file contains various helper functions and all the
  functions generating LLVM static values;
* the `type_.rs` (300 LOC) file defines most of the type translations to LLVM
  IR.

The goal of this refactoring is to separate, inside this crate, code that is
specific to LLVM from code that can be reused for other rustc backends. For
instance, the `mir` folder is almost entirely backend-agnostic but it relies
heavily on other parts of the crate. The separation of the code must affect
neither the logic of the code nor its performance.

For these reasons, the separation process involves two transformations that
have to be done at the same time for the resulting code to compile:

1. replace all the LLVM-specific types with generics inside function signatures
   and structure definitions;
2. encapsulate all functions calling the LLVM FFI inside a set of traits that
   will define the interface between backend-agnostic code and the backend.

While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name
suggestion by @eddyb).
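
The two transformations listed above can be illustrated with a toy example.
This is only a minimal sketch: `ToyValue`, `ToyBuilder`, `ToyBackend` and
`lower_sum` are hypothetical names invented for illustration, not items from
`rustc_codegen_ssa`. The concrete value type becomes an associated type
(transformation 1) and the operations that would call into the backend sit
behind a trait (transformation 2).

```rust
// Hypothetical, simplified names throughout; none of this is the real rustc API.

/// Stand-in for a backend-specific value handle (LLVM would use `&'ll Value`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ToyValue(pub i64);

/// Transformation 2: operations that used to call the backend directly are
/// collected behind a trait.
pub trait ToyBuilder {
    /// Transformation 1: the concrete value type becomes a generic one.
    type Value: Copy;

    fn const_int(&mut self, n: i64) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
}

/// Backend-agnostic code: written once, against the trait only.
pub fn lower_sum<B: ToyBuilder>(bx: &mut B, a: i64, b: i64) -> B::Value {
    let lhs = bx.const_int(a);
    let rhs = bx.const_int(b);
    bx.add(lhs, rhs)
}

/// A toy backend that folds constants eagerly and records what it "emits".
#[derive(Default)]
pub struct ToyBackend {
    pub emitted: Vec<String>,
}

impl ToyBuilder for ToyBackend {
    type Value = ToyValue;

    fn const_int(&mut self, n: i64) -> ToyValue {
        self.emitted.push(format!("const {}", n));
        ToyValue(n)
    }

    fn add(&mut self, lhs: ToyValue, rhs: ToyValue) -> ToyValue {
        self.emitted.push(format!("add {} {}", lhs.0, rhs.0));
        ToyValue(lhs.0 + rhs.0)
    }
}
```

In this sketch `lower_sum` never mentions `ToyBackend`, so a second backend
would only need its own `impl ToyBuilder`.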

## Generic types and structures

@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
work has been extended to all structures inside the `mir` folder and elsewhere,
as well as to LLVM's `BasicBlock` and `Type` types.

The two most important structures for the LLVM codegen are `CodegenCx` and
`Builder`. They are parametrized by multiple lifetime parameters and the type
for `Value`.

```rust,ignore
struct CodegenCx<'ll, 'tcx> {
    /* ... */
}

struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
    /* ... */
}
```

`CodegenCx` is used to compile one codegen-unit that can contain multiple
functions, whereas `Builder` is created to compile one basic block.

The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
parameters, which correspond to the following:
* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt`
  containing the program's information;
* `'a` is the lifetime of a short-lived reference to a `CodegenCx` or another
  object inside a struct;
* `'ll` is the lifetime of references to LLVM objects such as `Value` or
  `Type`.

Although there are already many lifetime parameters in the code, making it
generic uncovered situations where the borrow-checker was passing only due to
the special nature of the LLVM objects manipulated (they are extern pointers).
For instance, an additional lifetime parameter had to be added to
`LocalAnalyzer` in `analyze.rs`, leading to the definition:

```rust,ignore
struct LocalAnalyzer<'mir, 'a, 'tcx> {
    /* ... */
}
```

However, the two most important structures `CodegenCx` and `Builder` are not
defined in the backend-agnostic code. Indeed, their content is highly specific
to the backend and it makes more sense to leave their definition to the backend
implementor than to allow just a narrow spot via a generic field for the
backend's context.

## Traits and interface

Because they have to be defined by the backend, `CodegenCx` and `Builder` will
be the structures implementing all the traits defining the backend's interface.
These traits are defined in the folder `rustc_codegen_ssa/traits` and all the
backend-agnostic code is parametrized by them. For instance, let us explain how
a function in `base.rs` is parametrized:

```rust,ignore
pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    instance: Instance<'tcx>
) {
    /* ... */
}
```

In this signature, we have the two lifetime parameters explained earlier and
the master type `Bx` which satisfies the trait `BuilderMethods` corresponding
to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait
defines an associated type `Bx::CodegenCx` that itself satisfies the
`CodegenMethods` trait implemented by the struct `CodegenCx`.
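
To make the relationship between `Bx` and `Bx::CodegenCx` more concrete, here
is a small self-contained sketch of the same pattern. All names
(`ToyCodegenMethods`, `ToyBuilderMethods`, `ToyCx`, `ToyBx`,
`describe_instance`) are hypothetical and the traits are drastically reduced
compared to the real ones; the only point is that a function generic over the
builder type gets the matching context type through an associated type.

```rust
// Hypothetical miniature of the pattern; the real traits have many more
// methods and supertraits.

/// Interface of the per-codegen-unit context.
pub trait ToyCodegenMethods {
    fn unit_name(&self) -> &str;
}

/// Interface of the builder; it names its context type as an associated type.
pub trait ToyBuilderMethods<'a>: Sized {
    type CodegenCx: ToyCodegenMethods + 'a;

    fn build(cx: &'a Self::CodegenCx) -> Self;
    fn cx(&self) -> &Self::CodegenCx;
}

/// Backend-agnostic function: only `Bx` is a parameter, the context type
/// follows from it.
pub fn describe_instance<'a, Bx: ToyBuilderMethods<'a>>(
    cx: &'a Bx::CodegenCx,
    instance: &str,
) -> String {
    let bx = Bx::build(cx);
    format!("{} in {}", instance, bx.cx().unit_name())
}

/// Toy backend: the context structure.
pub struct ToyCx {
    pub name: String,
}

/// Toy backend: the builder structure, borrowing its context.
pub struct ToyBx<'a> {
    cx: &'a ToyCx,
}

impl ToyCodegenMethods for ToyCx {
    fn unit_name(&self) -> &str {
        &self.name
    }
}

impl<'a> ToyBuilderMethods<'a> for ToyBx<'a> {
    type CodegenCx = ToyCx;

    fn build(cx: &'a ToyCx) -> Self {
        ToyBx { cx }
    }

    fn cx(&self) -> &ToyCx {
        self.cx
    }
}
```

A caller picks the backend by naming the builder type only, for example
`describe_instance::<ToyBx<'_>>(&cx, "main")` with `cx` a `ToyCx`.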

On the trait side, here is an example with part of the definition of
`BuilderMethods` in `traits/builder.rs`:

```rust,ignore
pub trait BuilderMethods<'a, 'tcx>:
    HasCodegen<'tcx>
    + DebugInfoBuilderMethods<'tcx>
    + ArgTypeMethods<'tcx>
    + AbiBuilderMethods<'tcx>
    + IntrinsicCallMethods<'tcx>
    + AsmBuilderMethods<'tcx>
{
    fn new_block<'b>(
        cx: &'a Self::CodegenCx,
        llfn: Self::Function,
        name: &'b str
    ) -> Self;
    /* ... */
    fn cond_br(
        &mut self,
        cond: Self::Value,
        then_llbb: Self::BasicBlock,
        else_llbb: Self::BasicBlock,
    );
    /* ... */
}
```

Finally, a master structure implementing the `ExtraBackendMethods` trait is
used for high-level codegen-driving functions like `codegen_crate` in
`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`.
`ExtraBackendMethods` should be implemented by the same structure that
implements the `CodegenBackend` trait defined in
`rustc_codegen_utils/codegen_backend.rs`.

During the traitification process, certain functions have been converted from
methods of a local structure to methods of `CodegenCx` or `Builder` and a
corresponding `self` parameter has been added. Indeed, LLVM stores information
internally that it can access when called through its API. This information
does not show up in a Rust data structure carried around when these methods are
called. However, when implementing a Rust backend for `rustc`, these methods
will need information from `CodegenCx`, hence the additional parameter (unused
in the LLVM implementation of the trait).
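
The role of the extra `self` parameter can be sketched as follows. The trait
and both context types are hypothetical and only mimic the situation described
above: one implementation ignores `self` because the underlying library is
assumed to track the needed state itself, while the other has to read that
state from its own context.

```rust
/// Hypothetical trait method that takes `&self` even though some backends
/// do not need it.
pub trait ToyTypeMethods {
    /// Returns the backend's name for an integer type that is `bits` wide.
    fn int_type(&self, bits: u32) -> String;
}

/// Mimics a backend whose library keeps its state internally.
pub struct FfiLikeCx;

impl ToyTypeMethods for FfiLikeCx {
    fn int_type(&self, bits: u32) -> String {
        // Nothing is read from `self` here.
        format!("i{}", bits)
    }
}

/// Mimics a pure-Rust backend that must carry its own state around.
pub struct RustLikeCx {
    pub pointer_bits: u32,
}

impl ToyTypeMethods for RustLikeCx {
    fn int_type(&self, bits: u32) -> String {
        // The answer depends on information stored in the context.
        if bits == self.pointer_bits {
            String::from("iptr")
        } else {
            format!("int{}", bits)
        }
    }
}
```

Backend-agnostic code calls `cx.int_type(bits)` in both cases and does not
need to know whether `self` is actually consulted.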

## State of the code after the refactoring

The traits offer an API which is very similar to the API of LLVM. This is not
the best solution since LLVM has a very special way of doing things: when
adding another backend, the trait definitions might have to change in order to
offer more flexibility.

However, the current separation between backend-agnostic and LLVM-specific code
has allowed the reuse of a significant part of the old `rustc_codegen_llvm`.
Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the
most important elements:

* `back` folder: 3,800 (BA) vs 4,100 (LLVM);
* `mir` folder: 4,400 (BA) vs 0 (LLVM);
* `base.rs`: 1,100 (BA) vs 250 (LLVM);
* `builder.rs`: 1,400 (BA) vs 0 (LLVM);
* `common.rs`: 350 (BA) vs 350 (LLVM).

The `debuginfo` folder has been left almost untouched by the splitting and is
specific to LLVM. Only its high-level features have been traitified.

The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the
27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new
18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized
`rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of
approximately 10,000 LOC that would otherwise have had to be duplicated between
the multiple backends of `rustc`.

The refactored version of `rustc`'s backend introduced no regressions in the
test suite or in performance benchmarks, which is consistent with the nature of
the refactoring: it used only compile-time parametricity (no trait objects).
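
The difference between compile-time parametricity and trait objects can be
shown with a short sketch. The names `ToyEmit`, `Doubler`, `run_static` and
`run_dynamic` are hypothetical; the refactoring described here corresponds to
the `run_static` style, where the compiler generates a specialized copy of the
function for each backend type instead of dispatching through a vtable.

```rust
/// Hypothetical backend interface with a single operation.
pub trait ToyEmit {
    fn emit(&self, n: u32) -> u32;
}

/// Hypothetical backend.
pub struct Doubler;

impl ToyEmit for Doubler {
    fn emit(&self, n: u32) -> u32 {
        n * 2
    }
}

/// Compile-time parametricity: monomorphized for each `B`, so the call to
/// `emit` is resolved statically and can be inlined.
pub fn run_static<B: ToyEmit>(backend: &B, n: u32) -> u32 {
    backend.emit(n)
}

/// Trait object: a single compiled copy, with `emit` called through a vtable.
pub fn run_dynamic(backend: &dyn ToyEmit, n: u32) -> u32 {
    backend.emit(n)
}
```

Both functions return the same result for the same backend; they differ only
in how the call to `emit` is dispatched.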