# Backend Agnostic Codegen

<!-- toc -->

[`rustc_codegen_ssa`]
provides an abstract interface for all backends to implement,
namely LLVM, [Cranelift], and [GCC].

[Cranelift]: https://github.com/rust-lang/rustc_codegen_cranelift
[GCC]: https://github.com/rust-lang/rustc_codegen_gcc
[`rustc_codegen_ssa`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html

Below is some background information on the refactoring that created this
abstract interface.

## Refactoring of `rustc_codegen_llvm`
by Denis Merigoux, October 23rd 2018

### State of the code before the refactoring

All the code related to the compilation of MIR into LLVM IR was contained
inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most
important elements:
* the `back` folder (7,800 LOC) implements the mechanisms for creating the
  different object files and archives through LLVM, but also the communication
  mechanisms for parallel code generation;
* the `debuginfo` folder (3,200 LOC) contains all code that passes debug
  information down to LLVM;
* the `llvm` folder (2,200 LOC) defines the FFI necessary to communicate with
  LLVM using the C++ API;
* the `mir` folder (4,300 LOC) implements the actual lowering from MIR to LLVM
  IR;
* the `base.rs` file (1,300 LOC) contains some helper functions but also the
  high-level code that launches the code generation and distributes the work;
* the `builder.rs` file (1,200 LOC) contains all the functions generating
  individual LLVM IR instructions inside a basic block;
* the `common.rs` file (450 LOC) contains various helper functions and all the
  functions generating LLVM static values;
* the `type_.rs` file (300 LOC) defines most of the type translations to LLVM IR.

The goal of this refactoring is to separate, inside this crate, code that is
specific to LLVM from code that can be reused for other rustc backends. For
instance, the `mir` folder is almost entirely backend-agnostic, but it relies
heavily on other parts of the crate. The separation of the code must affect
neither the logic of the code nor its performance.

For these reasons, the separation process involves two transformations that
have to be done at the same time for the resulting code to compile:

1. replace all the LLVM-specific types by generics inside function signatures
   and structure definitions;
2. encapsulate all functions calling the LLVM FFI inside a set of traits that
   will define the interface between backend-agnostic code and the backend.
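
The two transformations can be illustrated with a minimal sketch. It assumes a made-up trait, `ArithBuilder`, and two toy backends. None of these names come from `rustc_codegen_ssa`. The agnostic function `codegen_sum` never mentions a concrete value type (transformation 1). Everything it needs from a backend goes through the trait (transformation 2).

```rust
/// Transformation 2: the operations agnostic code needs are gathered in a trait.
/// `ArithBuilder` and every other name here are hypothetical.
pub trait ArithBuilder {
    /// Transformation 1: the backend's value type is an associated type,
    /// so no backend-specific type appears in agnostic signatures.
    type Value: Copy;
    fn const_int(&mut self, v: i64) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
}

/// Backend-agnostic code: written once, usable with any implementor.
pub fn codegen_sum<Bx: ArithBuilder>(bx: &mut Bx, xs: &[i64]) -> Bx::Value {
    let mut acc = bx.const_int(0);
    for &x in xs {
        let v = bx.const_int(x);
        acc = bx.add(acc, v);
    }
    acc
}

/// Toy backend that folds constants eagerly instead of emitting IR.
pub struct Interp;

impl ArithBuilder for Interp {
    type Value = i64;
    fn const_int(&mut self, v: i64) -> i64 {
        v
    }
    fn add(&mut self, lhs: i64, rhs: i64) -> i64 {
        lhs + rhs
    }
}

/// Toy backend that records IR-like text; its values are register numbers.
#[derive(Default)]
pub struct TextIr {
    pub lines: Vec<String>,
    next: u32,
}

impl TextIr {
    /// Allocates the next register number.
    fn fresh(&mut self) -> u32 {
        let r = self.next;
        self.next += 1;
        r
    }
}

impl ArithBuilder for TextIr {
    type Value = u32;
    fn const_int(&mut self, v: i64) -> u32 {
        let r = self.fresh();
        self.lines.push(format!("%{r} = const {v}"));
        r
    }
    fn add(&mut self, lhs: u32, rhs: u32) -> u32 {
        let r = self.fresh();
        self.lines.push(format!("%{r} = add %{lhs}, %{rhs}"));
        r
    }
}
```

The same generic call behaves differently per backend. `codegen_sum(&mut Interp, &[1, 2, 3])` folds to `6`. On a `TextIr`, the same call records instructions instead.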

While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name
suggested by @eddyb).

### Generic types and structures

@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
work has been extended to all structures inside the `mir` folder and elsewhere,
as well as to LLVM's `BasicBlock` and `Type` types.
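
Here is a minimal sketch of this idea. It assumes hypothetical stand-in structs for LLVM's opaque types and a simplified trait named `BackendTypes` (an assumption for this sketch, not the real definition). The generic types become associated types, and a toy context sets each one to a shared reference with lifetime `'ll`, as the LLVM backend does.

```rust
/// Hypothetical stand-ins for LLVM's opaque `Value`, `BasicBlock` and `Type`.
pub struct Value {
    pub name: String,
}
pub struct BasicBlock {
    pub label: String,
}
pub struct Type {
    pub bits: u32,
}

/// Associated types take the place of concrete LLVM types in signatures.
pub trait BackendTypes {
    type Value: Copy;
    type BasicBlock: Copy;
    type Type: Copy;
}

/// Toy context whose objects are owned elsewhere and stay valid for `'ll`.
pub struct ToyCx<'ll> {
    pub values: &'ll [Value],
}

impl<'ll> BackendTypes for ToyCx<'ll> {
    // Each generic type is a plain shared reference, as with `&'ll Value`.
    type Value = &'ll Value;
    type BasicBlock = &'ll BasicBlock;
    type Type = &'ll Type;
}

/// Backend-agnostic helper: it relies only on `Value: Copy`.
pub fn first_or<Cx: BackendTypes>(vals: &[Cx::Value], default: Cx::Value) -> Cx::Value {
    vals.first().copied().unwrap_or(default)
}
```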

The two most important structures for the LLVM codegen are `CodegenCx` and
`Builder`. They are parametrized by multiple lifetime parameters and the type
for `Value`.

```rust,ignore
struct CodegenCx<'ll, 'tcx> {
    /* ... */
}

struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
    /* ... */
}
```

`CodegenCx` is used to compile one codegen-unit that can contain multiple
functions, whereas `Builder` is created to compile one basic block.
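
The following toy sketch makes this division of labor concrete. All types in it are hypothetical and far simpler than the real ones. One `CodegenCx` owns every function of a codegen unit. Each short-lived `Builder` borrows that context and fills a single basic block. Because several builders share `&CodegenCx`, the context uses interior mutability (`RefCell`) for the data it accumulates.

```rust
use std::cell::RefCell;

/// Hypothetical toy IR: a function is a list of labelled basic blocks.
pub struct Block {
    pub label: String,
    pub insts: Vec<String>,
}

pub struct Function {
    pub name: String,
    pub blocks: Vec<Block>,
}

/// One context per codegen unit; it owns every function of that unit.
/// Builders only hold `&CodegenCx`, so mutation goes through a `RefCell`.
pub struct CodegenCx {
    pub functions: RefCell<Vec<Function>>,
}

impl CodegenCx {
    pub fn new() -> Self {
        CodegenCx { functions: RefCell::new(Vec::new()) }
    }

    /// Declares a function and returns its index.
    pub fn declare_fn(&self, name: &str) -> usize {
        let mut fns = self.functions.borrow_mut();
        fns.push(Function { name: name.to_string(), blocks: Vec::new() });
        fns.len() - 1
    }
}

/// One builder per basic block; it borrows the context for `'a`.
pub struct Builder<'a> {
    cx: &'a CodegenCx,
    func: usize,
    block: Block,
}

impl<'a> Builder<'a> {
    pub fn new_block(cx: &'a CodegenCx, func: usize, label: &str) -> Self {
        let block = Block { label: label.to_string(), insts: Vec::new() };
        Builder { cx, func, block }
    }

    /// Appends one textual instruction to the current block.
    pub fn inst(&mut self, text: &str) {
        self.block.insts.push(text.to_string());
    }

    /// Hands the finished block back to its function in the context.
    pub fn finish(self) {
        self.cx.functions.borrow_mut()[self.func].blocks.push(self.block);
    }
}
```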

The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
parameters, which correspond to the following:
* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt`
  containing the program's information;
* `'a` is a short-lived reference to a `CodegenCx` or another object inside a
  struct;
* `'ll` is the lifetime of references to LLVM objects such as `Value` or
  `Type`.
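
The sketch below shows how these lifetimes relate in practice. It uses hypothetical stand-ins for `TyCtxt` and LLVM's `Value`. A reference obtained through a short `'a` borrow of the context can still carry the longer `'ll` or `'tcx` lifetime, so it remains valid after the `Builder` is dropped.

```rust
/// Hypothetical stand-in for `TyCtxt<'tcx>`: a handle to data living for `'tcx`.
#[derive(Clone, Copy)]
pub struct TyCtxt<'tcx> {
    pub crate_name: &'tcx str,
}

/// Hypothetical stand-in for an LLVM value; real ones are owned by LLVM.
pub struct Value {
    pub id: u32,
}

pub struct CodegenCx<'ll, 'tcx> {
    pub tcx: TyCtxt<'tcx>,
    /// Backend objects owned outside this struct, valid for `'ll`.
    pub values: &'ll [Value],
}

pub struct Builder<'a, 'll, 'tcx> {
    /// Short-lived borrow of the context.
    pub cx: &'a CodegenCx<'ll, 'tcx>,
}

impl<'a, 'll, 'tcx> Builder<'a, 'll, 'tcx> {
    /// The result is tied to `'ll`, not to the short `'a` borrow,
    /// so it stays usable after this builder is gone.
    pub fn value(&self, idx: usize) -> &'ll Value {
        let values: &'ll [Value] = self.cx.values;
        &values[idx]
    }

    /// Likewise, data from the type context keeps its full `'tcx` lifetime.
    pub fn crate_name(&self) -> &'tcx str {
        self.cx.tcx.crate_name
    }
}
```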

Although there are already many lifetime parameters in the code, making it
generic uncovered situations where the borrow checker was passing only due to
the special nature of the LLVM objects manipulated (they are extern pointers).
For instance, an additional lifetime parameter had to be added to
`LocalAnalyzer` in `analyze.rs`, leading to the definition:

```rust,ignore
struct LocalAnalyzer<'mir, 'a, 'tcx> {
    /* ... */
}
```

However, the two most important structures, `CodegenCx` and `Builder`, are not
defined in the backend-agnostic code. Indeed, their content is highly specific
to the backend, and it makes more sense to leave their definition to the backend
implementor than to allow just a narrow spot via a generic field for the
backend's context.

### Traits and interface

Because they have to be defined by the backend, `CodegenCx` and `Builder` will
be the structures implementing all the traits defining the backend's interface.
These traits are defined in the folder `rustc_codegen_ssa/traits` and all the
backend-agnostic code is parametrized by them. For instance, let us explain how
a function in `base.rs` is parametrized:

```rust,ignore
pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    instance: Instance<'tcx>
) {
    /* ... */
}
```

In this signature, we have the two lifetime parameters explained earlier and
the master type `Bx`, which satisfies the trait `BuilderMethods` corresponding
to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait
defines an associated type `Bx::CodegenCx` that itself satisfies the
`CodegenMethods` trait implemented by the struct `CodegenCx`.
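
Here is a much reduced, self-contained version of this pattern. The trait names echo the text. Their contents (`declare_fn`, `build`, `ret`) and the toy backend are assumptions made for illustration. Unlike the real function, this `codegen_instance` returns the builder so that the result can be inspected.

```rust
use std::cell::RefCell;

/// Hypothetical, heavily cut-down versions of the traits named above.
pub trait CodegenMethods {
    fn declare_fn(&self, name: &str) -> usize;
}

pub trait BuilderMethods<'a>: Sized {
    /// The backend's own context type, reached as `Bx::CodegenCx`.
    type CodegenCx: CodegenMethods + 'a;
    fn build(cx: &'a Self::CodegenCx, func: usize) -> Self;
    fn ret(&mut self);
}

/// Backend-agnostic: only `Bx` and its associated `CodegenCx` are known here.
pub fn codegen_instance<'a, Bx: BuilderMethods<'a>>(cx: &'a Bx::CodegenCx, name: &str) -> Bx {
    let func = cx.declare_fn(name);
    let mut bx = Bx::build(cx, func);
    bx.ret();
    bx
}

/// A toy backend's context: it records declared function names.
pub struct ToyCx {
    pub names: RefCell<Vec<String>>,
}

impl CodegenMethods for ToyCx {
    fn declare_fn(&self, name: &str) -> usize {
        let mut names = self.names.borrow_mut();
        names.push(name.to_string());
        names.len() - 1
    }
}

/// A toy backend's builder, tied to its context by `'a`.
pub struct ToyBuilder<'a> {
    pub cx: &'a ToyCx,
    pub func: usize,
    pub insts: Vec<&'static str>,
}

impl<'a> BuilderMethods<'a> for ToyBuilder<'a> {
    type CodegenCx = ToyCx;
    fn build(cx: &'a ToyCx, func: usize) -> Self {
        ToyBuilder { cx, func, insts: Vec::new() }
    }
    fn ret(&mut self) {
        self.insts.push("ret");
    }
}
```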

On the trait side, here is an example with part of the definition of
`BuilderMethods` in `traits/builder.rs`:

```rust,ignore
pub trait BuilderMethods<'a, 'tcx>:
    HasCodegen<'tcx>
    + DebugInfoBuilderMethods<'tcx>
    + ArgTypeMethods<'tcx>
    + AbiBuilderMethods<'tcx>
    + IntrinsicCallMethods<'tcx>
    + AsmBuilderMethods<'tcx>
{
    fn new_block<'b>(
        cx: &'a Self::CodegenCx,
        llfn: Self::Function,
        name: &'b str
    ) -> Self;
    /* ... */
    fn cond_br(
        &mut self,
        cond: Self::Value,
        then_llbb: Self::BasicBlock,
        else_llbb: Self::BasicBlock,
    );
    /* ... */
}
```
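
The sketch below keeps only `new_block` and `cond_br`. It replaces the list of supertraits with a single hypothetical `BackendTypes` supertrait that supplies the associated types. The toy `TextBuilder` backend needs no context data (`CodegenCx = ()`) and prints branches as text. It is an assumption made for illustration, not part of `rustc`.

```rust
/// Hypothetical supertrait that only supplies the associated types.
pub trait BackendTypes {
    type Value: Copy;
    type BasicBlock: Copy;
    type Function: Copy;
}

/// A cut-down, hypothetical `BuilderMethods` with the two methods shown above.
pub trait BuilderMethods<'a>: BackendTypes + Sized {
    type CodegenCx: 'a;
    fn new_block<'b>(cx: &'a Self::CodegenCx, llfn: Self::Function, name: &'b str) -> Self;
    fn cond_br(&mut self, cond: Self::Value, then_llbb: Self::BasicBlock, else_llbb: Self::BasicBlock);
}

/// Toy backend: values are register numbers, blocks and functions are indices.
pub struct TextBuilder<'a> {
    pub cx: &'a (),
    pub label: String,
    pub insts: Vec<String>,
}

impl<'a> BackendTypes for TextBuilder<'a> {
    type Value = u32;
    type BasicBlock = usize;
    type Function = usize;
}

impl<'a> BuilderMethods<'a> for TextBuilder<'a> {
    // This toy needs no context data.
    type CodegenCx = ();
    fn new_block<'b>(cx: &'a (), llfn: usize, name: &'b str) -> Self {
        // The name is copied, so the `'b` borrow need not outlive the builder.
        TextBuilder { cx, label: format!("fn{llfn}.{name}"), insts: Vec::new() }
    }
    fn cond_br(&mut self, cond: u32, then_llbb: usize, else_llbb: usize) {
        self.insts.push(format!("br %{cond}, bb{then_llbb}, bb{else_llbb}"));
    }
}

/// Backend-agnostic code written only against the traits.
pub fn emit_branch<'a, Bx: BuilderMethods<'a>>(
    cx: &'a Bx::CodegenCx,
    llfn: Bx::Function,
    cond: Bx::Value,
    then_llbb: Bx::BasicBlock,
    else_llbb: Bx::BasicBlock,
) -> Bx {
    let mut bx = Bx::new_block(cx, llfn, "entry");
    bx.cond_br(cond, then_llbb, else_llbb);
    bx
}
```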

Finally, a master structure implementing the `ExtraBackendMethods` trait is
used for high-level codegen-driving functions like `codegen_crate` in
`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`.
`ExtraBackendMethods` should be implemented by the same structure that
implements the `CodegenBackend` defined in
`rustc_codegen_utils/codegen_backend.rs`.
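
Here is a minimal sketch of this arrangement. Hypothetical, cut-down traits stand in for `CodegenBackend` and `ExtraBackendMethods`; their real methods are not reproduced. Like the empty `LlvmCodegenBackend`, the backend struct carries no data. The sketch assumes the high-level trait is object-safe, so that a driver can hold it as a `Box<dyn ...>`. Meanwhile, the generic `codegen_crate` is monomorphized for the concrete backend.

```rust
/// Hypothetical object-safe, high-level trait playing the role of `CodegenBackend`.
pub trait CodegenBackend {
    fn name(&self) -> &'static str;
    fn codegen(&self, items: &[&str]) -> Vec<String>;
}

/// Hypothetical hooks for generic driver code, playing the role of `ExtraBackendMethods`.
pub trait ExtraBackendMethods {
    fn compile_item(&self, item: &str) -> String;
}

/// Generic driver in the spirit of `codegen_crate`: monomorphized per backend.
pub fn codegen_crate<B: ExtraBackendMethods>(backend: &B, items: &[&str]) -> Vec<String> {
    items.iter().map(|item| backend.compile_item(item)).collect()
}

/// Like `LlvmCodegenBackend`, the backend struct itself carries no data.
pub struct ToyCodegenBackend;

impl ExtraBackendMethods for ToyCodegenBackend {
    fn compile_item(&self, item: &str) -> String {
        format!("toy:{item}")
    }
}

/// The same struct implements both traits; the object-safe entry point
/// forwards into the generic driver.
impl CodegenBackend for ToyCodegenBackend {
    fn name(&self) -> &'static str {
        "toy"
    }
    fn codegen(&self, items: &[&str]) -> Vec<String> {
        codegen_crate(self, items)
    }
}
```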

During the traitification process, certain functions have been converted from
methods of a local structure to methods of `CodegenCx` or `Builder`, and a
corresponding `self` parameter has been added. Indeed, LLVM stores information
internally that it can access when called through its API. This information
does not show up in a Rust data structure carried around when these methods are
called. However, when implementing a backend written in Rust for `rustc`, these
methods will need information from `CodegenCx`, hence the additional parameter
(unused in the LLVM implementation of the trait).
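
The following hypothetical sketch shows why the extra `self` parameter is harmless for LLVM yet needed by other backends. An LLVM-like context answers from state held outside any Rust struct (modeled here by a `static`) and ignores `self`. A context written in Rust reads the same information from its own fields. The trait and type names are assumptions.

```rust
/// Stands in for information LLVM keeps internally, outside any Rust struct.
static FOREIGN_POINTER_BITS: u32 = 64;

/// Hypothetical trait: the method takes `&self` so any backend *can* read its context.
pub trait LayoutMethods {
    fn pointer_bits(&self) -> u32;
}

/// LLVM-like backend: answers from the foreign state, so `self` goes unused.
pub struct LlvmLikeCx;

impl LayoutMethods for LlvmLikeCx {
    fn pointer_bits(&self) -> u32 {
        FOREIGN_POINTER_BITS
    }
}

/// Backend written in Rust: the same information lives in its own context.
pub struct RustCx {
    pub pointer_bits: u32,
}

impl LayoutMethods for RustCx {
    fn pointer_bits(&self) -> u32 {
        self.pointer_bits
    }
}

/// Backend-agnostic code calls the method the same way for both.
pub fn pointer_bytes<Cx: LayoutMethods>(cx: &Cx) -> u32 {
    cx.pointer_bits() / 8
}
```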

### State of the code after the refactoring

The traits offer an API which is very similar to the API of LLVM. This is not
the best solution since LLVM has a very special way of doing things: when
adding another backend, the trait definitions might need to be changed in order
to offer more flexibility.

However, the current separation between backend-agnostic and LLVM-specific code
has allowed the reuse of a significant part of the old `rustc_codegen_llvm`.
Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the
most important elements:

* `back` folder: 3,800 (BA) vs 4,100 (LLVM);
* `mir` folder: 4,400 (BA) vs 0 (LLVM);
* `base.rs`: 1,100 (BA) vs 250 (LLVM);
* `builder.rs`: 1,400 (BA) vs 0 (LLVM);
* `common.rs`: 350 (BA) vs 350 (LLVM).

The `debuginfo` folder has been left almost untouched by the splitting and is
specific to LLVM. Only its high-level features have been traitified.

The new `traits` folder has 1,500 LOC of trait definitions alone. Overall, the
old 27,000-LOC `rustc_codegen_llvm` has been split into the new 18,500-LOC
`rustc_codegen_llvm` and the 12,000-LOC `rustc_codegen_ssa`. We can say that
this refactoring allowed the reuse of approximately 10,000 LOC that would
otherwise have had to be duplicated between the multiple backends of `rustc`.

The refactored version of `rustc`'s backend introduced no regressions in the
test suite or in performance benchmarks, which is consistent with the nature
of the refactoring: it used only compile-time parametricity (no trait
objects).
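
This hypothetical sketch illustrates what "compile-time parametricity" means here by writing the same helper twice. The first version is generic over the backend: static dispatch, with one monomorphized copy per backend. The second takes a trait object, so the associated type must be fixed in the `dyn` type and every call goes through a vtable. The refactoring took the first route.

```rust
/// Hypothetical backend trait with an associated value type.
pub trait Backend {
    type Value: Copy;
    fn const_u64(&mut self, v: u64) -> Self::Value;
    fn mul(&mut self, a: Self::Value, b: Self::Value) -> Self::Value;
}

/// Static dispatch: one copy of this function is generated per backend type,
/// and each call to `bx.mul` is a direct, inlinable call.
pub fn square_static<B: Backend>(bx: &mut B, x: u64) -> B::Value {
    let v = bx.const_u64(x);
    bx.mul(v, v)
}

/// Dynamic dispatch, for contrast: the associated type must be pinned in the
/// trait-object type, and each call is resolved through a vtable at run time.
pub fn square_dyn(bx: &mut dyn Backend<Value = u64>, x: u64) -> u64 {
    let v = bx.const_u64(x);
    bx.mul(v, v)
}

/// Toy backend that evaluates directly.
pub struct Eval;

impl Backend for Eval {
    type Value = u64;
    fn const_u64(&mut self, v: u64) -> u64 {
        v
    }
    fn mul(&mut self, a: u64, b: u64) -> u64 {
        a * b
    }
}
```

The generic form also lets each backend keep its own `Value` type, such as a reference with a lifetime, without committing to one concrete type in the interface.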