# Backend Agnostic Codegen

In the future, it would be nice to allow other codegen backends (e.g.
[Cranelift][cranelift]). To this end, `librustc_codegen_ssa` provides an
abstract interface for all backends to implement.

[cranelift]: https://github.com/CraneStation/cranelift

> The following is a copy/paste of a README from the rust-lang/rust repo.
> Please submit a PR if it needs updating.

# Refactoring of `rustc_codegen_llvm`
by Denis Merigoux, October 23rd 2018

## State of the code before the refactoring

All the code related to the compilation of MIR into LLVM IR was contained
inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most
important elements:
* the `back` folder (7,800 LOC) implements the mechanisms for creating the
  different object files and archives through LLVM, but also the communication
  mechanisms for parallel code generation;
* the `debuginfo` (3,200 LOC) folder contains all code that passes debug
  information down to LLVM;
* the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with
  LLVM using the C++ API;
* the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM
  IR;
* the `base.rs` (1,300 LOC) file contains some helper functions but also the
  high-level code that launches the code generation and distributes the work;
* the `builder.rs` (1,200 LOC) file contains all the functions generating
  individual LLVM IR instructions inside a basic block;
* the `common.rs` (450 LOC) file contains various helper functions and all the
  functions generating LLVM static values;
* the `type_.rs` (300 LOC) file defines most of the type translations to LLVM
  IR.

The goal of this refactoring is to separate, inside this crate, code that is
specific to LLVM from code that can be reused for other rustc backends. For
instance, the `mir` folder is almost entirely backend-agnostic but it relies
heavily on other parts of the crate. The separation of the code must not affect
the logic of the code nor its performance.

For these reasons, the separation process involves two transformations that
have to be done at the same time for the resulting code to compile:

1. replace all the LLVM-specific types with generics inside function signatures
   and structure definitions;
2. encapsulate all functions calling the LLVM FFI inside a set of traits that
   will define the interface between backend-agnostic code and the backend.

While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name
suggestion by @eddyb).

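The two transformations above can be illustrated with a minimal sketch. It
does not use any item from `rustc`: `ToyBackend`, `TextBackend`, `EvalBackend`
and `sum_of_constants` are hypothetical names chosen for this example. The
backend-specific value type becomes an associated type (transformation 1) and
the operations that would otherwise call the backend directly are grouped in a
trait (transformation 2), so that the shared function is written only once.

```rust
// All names below are hypothetical stand-ins, not items from rustc.

/// Transformation 2: operations that used to call the backend directly are
/// collected in a trait.
trait ToyBackend {
    /// Transformation 1: the backend-specific value type becomes generic.
    type Value;

    fn const_u64(&self, n: u64) -> Self::Value;
    fn add(&self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
}

/// Backend-agnostic code, written once against the trait.
fn sum_of_constants<B: ToyBackend>(backend: &B, items: &[u64]) -> B::Value {
    let mut acc = backend.const_u64(0);
    for &n in items {
        let v = backend.const_u64(n);
        acc = backend.add(acc, v);
    }
    acc
}

/// A toy backend that emits a textual expression.
struct TextBackend;

impl ToyBackend for TextBackend {
    type Value = String;

    fn const_u64(&self, n: u64) -> String {
        n.to_string()
    }
    fn add(&self, lhs: String, rhs: String) -> String {
        format!("({} + {})", lhs, rhs)
    }
}

/// A toy backend that evaluates the expression directly.
struct EvalBackend;

impl ToyBackend for EvalBackend {
    type Value = u64;

    fn const_u64(&self, n: u64) -> u64 {
        n
    }
    fn add(&self, lhs: u64, rhs: u64) -> u64 {
        lhs + rhs
    }
}

fn main() {
    // The same generic function drives both backends.
    println!("{}", sum_of_constants(&TextBackend, &[1, 2]));
    println!("{}", sum_of_constants(&EvalBackend, &[1, 2]));
}
```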
## Generic types and structures

@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
work has been extended to all structures inside the `mir` folder and elsewhere,
as well as for LLVM's `BasicBlock` and `Type` types.

The two most important structures for the LLVM codegen are `CodegenCx` and
`Builder`. They are parametrized by multiple lifetime parameters and the type
for `Value`.

```rust,ignore
struct CodegenCx<'ll, 'tcx> {
    /* ... */
}

struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
    /* ... */
}
```

`CodegenCx` is used to compile one codegen-unit that can contain multiple
functions, whereas `Builder` is created to compile one basic block.

The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
parameters, which correspond to the following:
* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt`
  containing the program's information;
* `'a` is the lifetime of a short-lived reference to a `CodegenCx` or another
  object inside a struct;
* `'ll` is the lifetime of references to LLVM objects such as `Value` or
  `Type`.

Although there are already many lifetime parameters in the code, making it
generic uncovered situations where the borrow-checker was passing only due to
the special nature of the LLVM objects manipulated (they are extern pointers).
For instance, an additional lifetime parameter had to be added to
`LocalAnalyzer` in `analyze.rs`, leading to the definition:

```rust,ignore
struct LocalAnalyzer<'mir, 'a, 'tcx> {
    /* ... */
}
```

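The roles of the `'tcx`, `'ll` and `'a` lifetimes described above can be
sketched with toy types. Everything here (`ToyTcx`, `ToyValue`,
`ToyCodegenCx`, `ToyBuilder`, `first_value_id`) is a hypothetical stand-in
that only mirrors the shape of the real structures. The point is that a
reference tied to `'ll` stays usable after the short-lived builder borrowing
the context for `'a` is gone.

```rust
// Hypothetical stand-ins, not rustc items.

/// Long-lived program information (plays the role of what `'tcx` refers to).
struct ToyTcx {
    crate_name: String,
}

/// A backend object such as a value (plays the role of what `'ll` refers to).
struct ToyValue {
    id: u32,
}

/// Per-codegen-unit context: refers to backend objects and program information.
struct ToyCodegenCx<'ll, 'tcx> {
    tcx: &'tcx ToyTcx,
    values: Vec<&'ll ToyValue>,
}

/// Short-lived builder: borrows the context for `'a` only.
struct ToyBuilder<'a, 'll, 'tcx> {
    cx: &'a ToyCodegenCx<'ll, 'tcx>,
}

impl<'a, 'll, 'tcx> ToyBuilder<'a, 'll, 'tcx> {
    fn describe(&self) -> String {
        format!("{}: {} value(s)", self.cx.tcx.crate_name, self.cx.values.len())
    }

    /// The result is tied to `'ll`, not to the builder's own borrow.
    fn first_value(&self) -> Option<&'ll ToyValue> {
        self.cx.values.first().copied()
    }
}

fn first_value_id<'ll, 'tcx>(tcx: &'tcx ToyTcx, values: &'ll [ToyValue]) -> Option<u32> {
    let cx = ToyCodegenCx { tcx, values: values.iter().collect() };
    let first: Option<&'ll ToyValue> = {
        let bx = ToyBuilder { cx: &cx };
        bx.first_value()
    }; // the builder is dropped here, `first` is still valid
    first.map(|v| v.id)
}

fn main() {
    let tcx = ToyTcx { crate_name: "demo".to_string() };
    let values = vec![ToyValue { id: 7 }, ToyValue { id: 9 }];
    let cx = ToyCodegenCx { tcx: &tcx, values: values.iter().collect() };
    let bx = ToyBuilder { cx: &cx };
    println!("{}", bx.describe());
    println!("{:?}", first_value_id(&tcx, &values));
}
```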
However, the two most important structures `CodegenCx` and `Builder` are not
defined in the backend-agnostic code. Indeed, their content is highly specific
to the backend and it makes more sense to leave their definition to the backend
implementor than to allow just a narrow spot via a generic field for the
backend's context.

## Traits and interface

Because they have to be defined by the backend, `CodegenCx` and `Builder` will
be the structures implementing all the traits defining the backend's interface.
These traits are defined in the folder `rustc_codegen_ssa/traits` and all the
backend-agnostic code is parametrized by them. For instance, here is how a
function in `base.rs` is parametrized:

```rust,ignore
pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    instance: Instance<'tcx>
) {
    /* ... */
}
```

In this signature, we have the two lifetime parameters explained earlier and
the master type `Bx` which satisfies the trait `BuilderMethods` corresponding
to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait
defines an associated type `Bx::CodegenCx` that itself satisfies the
`CodegenMethods` trait implemented by the struct `CodegenCx`.

On the trait side, here is an example with part of the definition of
`BuilderMethods` in `traits/builder.rs`:

```rust,ignore
pub trait BuilderMethods<'a, 'tcx>:
    HasCodegen<'tcx>
    + DebugInfoBuilderMethods<'tcx>
    + ArgTypeMethods<'tcx>
    + AbiBuilderMethods<'tcx>
    + IntrinsicCallMethods<'tcx>
    + AsmBuilderMethods<'tcx>
{
    fn new_block<'b>(
        cx: &'a Self::CodegenCx,
        llfn: Self::Function,
        name: &'b str
    ) -> Self;
    /* ... */
    fn cond_br(
        &mut self,
        cond: Self::Value,
        then_llbb: Self::BasicBlock,
        else_llbb: Self::BasicBlock,
    );
    /* ... */
}
```

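To make the relationship between `Bx` and `Bx::CodegenCx` more concrete, here
is a self-contained sketch of the same pattern at a much smaller scale. The
traits `MiniCodegenMethods` and `MiniBuilderMethods`, the toy backend (`ToyCx`,
`ToyBuilder`) and the reduced `codegen_instance` are all hypothetical; they
drop the `'tcx` lifetime, the supertraits and almost every method of the real
interface. The sketch only shows that a function generic over the builder type
gets the context type through the associated type.

```rust
use std::cell::RefCell;

// Simplified, hypothetical stand-ins: the real traits have many more methods
// and supertraits.

/// Interface of the per-codegen-unit context.
trait MiniCodegenMethods {
    fn emit(&self, line: String);
}

/// Interface of the builder; it names its context through an associated type.
trait MiniBuilderMethods<'a>: Sized {
    type CodegenCx: MiniCodegenMethods;

    fn new_block(cx: &'a Self::CodegenCx, name: &str) -> Self;
    fn ret_void(&mut self);
}

/// Backend-agnostic code: generic over `Bx` only, the context type follows.
fn codegen_instance<'a, Bx: MiniBuilderMethods<'a>>(cx: &'a Bx::CodegenCx, fn_name: &str) {
    cx.emit(format!("define {}", fn_name));
    let mut bx = Bx::new_block(cx, "entry");
    bx.ret_void();
}

/// Toy backend context that records the emitted instructions as text.
#[derive(Default)]
struct ToyCx {
    lines: RefCell<Vec<String>>,
}

impl MiniCodegenMethods for ToyCx {
    fn emit(&self, line: String) {
        self.lines.borrow_mut().push(line);
    }
}

/// Toy backend builder, borrowing its context for `'a`.
struct ToyBuilder<'a> {
    cx: &'a ToyCx,
    block: String,
}

impl<'a> MiniBuilderMethods<'a> for ToyBuilder<'a> {
    type CodegenCx = ToyCx;

    fn new_block(cx: &'a ToyCx, name: &str) -> Self {
        ToyBuilder { cx, block: name.to_string() }
    }
    fn ret_void(&mut self) {
        self.cx.emit(format!("{}: ret void", self.block));
    }
}

fn main() {
    let cx = ToyCx::default();
    // Only the builder type is named; `ToyCx` is found via `Bx::CodegenCx`.
    codegen_instance::<ToyBuilder>(&cx, "f");
    for line in cx.lines.borrow().iter() {
        println!("{}", line);
    }
}
```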
Finally, a master structure implementing the `ExtraBackendMethods` trait is
used for high-level codegen-driving functions like `codegen_crate` in
`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`.
`ExtraBackendMethods` should be implemented by the same structure that
implements the `CodegenBackend` trait defined in
`rustc_codegen_utils/codegen_backend.rs`.

During the traitification process, certain functions have been converted from
methods of a local structure to methods of `CodegenCx` or `Builder` and a
corresponding `self` parameter has been added. Indeed, LLVM stores information
internally that it can access when called through its API. This information
does not show up in a Rust data structure carried around when these methods are
called. However, when implementing a Rust backend for `rustc`, these methods
will need information from `CodegenCx`, hence the additional parameter (unused
in the LLVM implementation of the trait).

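The reason for the extra `self` parameter can be sketched as follows. The
names are hypothetical (`ToyTypeMethods`, `FfiLikeCx`, `InterningCx`) and the
"FFI-like" backend is only a model of a library that keeps its state
internally. One implementation ignores `self`, while a backend that owns its
data in Rust needs it to reach its own tables.

```rust
use std::cell::RefCell;

// Hypothetical stand-ins, not rustc items.

/// After traitification, the type constructor takes `&self`.
trait ToyTypeMethods {
    type Type;
    fn type_int(&self, bits: u32) -> Self::Type;
}

/// Models a backend whose library keeps its own state: `self` is unused.
struct FfiLikeCx;

impl ToyTypeMethods for FfiLikeCx {
    type Type = u32;

    fn type_int(&self, bits: u32) -> u32 {
        // Stands in for an opaque handle returned by a foreign library.
        bits
    }
}

/// Models a backend written in Rust: types are interned in the context,
/// so the method has to go through `self`.
#[derive(Default)]
struct InterningCx {
    types: RefCell<Vec<u32>>,
}

impl ToyTypeMethods for InterningCx {
    type Type = usize;

    fn type_int(&self, bits: u32) -> usize {
        let mut types = self.types.borrow_mut();
        // Reuse the existing entry when this width was already interned.
        if let Some(idx) = types.iter().position(|&b| b == bits) {
            return idx;
        }
        types.push(bits);
        types.len() - 1
    }
}

fn main() {
    let cx = InterningCx::default();
    let a = cx.type_int(32);
    let b = cx.type_int(64);
    println!("{} {} {}", a, b, cx.type_int(32));
    println!("{}", FfiLikeCx.type_int(32));
}
```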
## State of the code after the refactoring

The traits offer an API which is very similar to the API of LLVM. This is not
the best solution since LLVM has a very special way of doing things: when
adding another backend, the trait definitions might be changed in order to
offer more flexibility.

However, the current separation between backend-agnostic and LLVM-specific code
has allowed the reuse of a significant part of the old `rustc_codegen_llvm`.
Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the
most important elements:

* `back` folder: 3,800 (BA) vs 4,100 (LLVM);
* `mir` folder: 4,400 (BA) vs 0 (LLVM);
* `base.rs`: 1,100 (BA) vs 250 (LLVM);
* `builder.rs`: 1,400 (BA) vs 0 (LLVM);
* `common.rs`: 350 (BA) vs 350 (LLVM).

The `debuginfo` folder has been left almost untouched by the splitting and is
specific to LLVM. Only its high-level features have been traitified.

The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the
27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new
18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized
`rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of
approximately 10,000 LOC that would otherwise have had to be duplicated between
the multiple backends of `rustc`.

The refactored version of `rustc`'s backend introduced no regressions in the
test suite or in the performance benchmarks, which is consistent with the
nature of the refactoring, which used only compile-time parametricity (no trait
objects).
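
The distinction between compile-time parametricity and trait objects can be
shown with a small, hypothetical example (`ToyBackend`, `BackendA`, `BackendB`,
`run_static` and `run_dynamic` are names invented here). The generic function
is compiled once per backend type, so calls to `name` are resolved statically;
the `dyn` version keeps a single copy and dispatches through a vtable at run
time. The refactoring described above relies on the first form.

```rust
// Hypothetical stand-ins, not rustc items.

trait ToyBackend {
    fn name(&self) -> &'static str;
}

struct BackendA;
struct BackendB;

impl ToyBackend for BackendA {
    fn name(&self) -> &'static str {
        "A"
    }
}

impl ToyBackend for BackendB {
    fn name(&self) -> &'static str {
        "B"
    }
}

/// Compile-time parametricity: one copy of this function is generated per
/// backend type, and `name` is called directly.
fn run_static<B: ToyBackend>(backend: &B) -> String {
    format!("codegen with {}", backend.name())
}

/// Trait object: a single copy, `name` is called through a vtable.
fn run_dynamic(backend: &dyn ToyBackend) -> String {
    format!("codegen with {}", backend.name())
}

fn main() {
    println!("{}", run_static(&BackendA));
    println!("{}", run_dynamic(&BackendB));
}
```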