# Backend Agnostic Codegen

<!-- toc -->

As of <!-- date: 2021-01 --> January 2021, `rustc_codegen_ssa` provides an
abstract interface for all backends to implement, which makes it possible to
add other codegen backends (e.g. [Cranelift]).

[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/HEAD/cranelift

> The following is a copy/paste of a README from the rust-lang/rust repo.
> Please submit a PR if it needs updating.

# Refactoring of `rustc_codegen_llvm`
by Denis Merigoux, October 23rd 2018

## State of the code before the refactoring

All the code related to the compilation of MIR into LLVM IR was contained
inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most
important elements:
* the `back` folder (7,800 LOC) implements the mechanisms for creating the
  different object files and archive through LLVM, but also the communication
  mechanisms for parallel code generation;
* the `debuginfo` (3,200 LOC) folder contains all code that passes debug
  information down to LLVM;
* the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with
  LLVM using the C++ API;
* the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM
  IR;
* the `base.rs` (1,300 LOC) file contains some helper functions but also the
  high-level code that launches the code generation and distributes the work;
* the `builder.rs` (1,200 LOC) file contains all the functions generating
  individual LLVM IR instructions inside a basic block;
* the `common.rs` (450 LOC) contains various helper functions and all the
  functions generating LLVM static values;
* the `type_.rs` (300 LOC) defines most of the type translations to LLVM IR.

The goal of this refactoring is to separate, inside this crate, the code that
is specific to LLVM from the code that can be reused for other rustc backends.
For instance, the `mir` folder is almost entirely backend-agnostic, but it
relies heavily on other parts of the crate. The separation of the code must not
affect the logic of the code nor its performance.

For these reasons, the separation process involves two transformations that
have to be done at the same time for the resulting code to compile:

1. replace all the LLVM-specific types by generics inside function signatures
   and structure definitions;
2. encapsulate all functions calling the LLVM FFI inside a set of traits that
   will define the interface between backend-agnostic code and the backend.

While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name
suggestion by @eddyb).
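
The two transformations can be illustrated with a toy example. The sketch below
is not taken from `rustc`: `ToyBuilder`, `TextBackend` and `emit_sum` are
hypothetical names, and the "backend" only records textual instructions. It
shows a concrete value type replaced by an associated type, and backend calls
hidden behind a trait, so that `emit_sum` can be written once for any backend.

```rust
// Hypothetical stand-ins: none of these names exist in rustc.

/// Interface between backend-agnostic code and a backend (transformation 2).
pub trait ToyBuilder {
    /// Backend-specific value handle, replacing a concrete type (transformation 1).
    type Value: Copy;

    fn const_i32(&mut self, n: i32) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
}

/// Backend-agnostic code: written once, against the trait only.
pub fn emit_sum<B: ToyBuilder>(bx: &mut B, a: i32, b: i32) -> B::Value {
    let lhs = bx.const_i32(a);
    let rhs = bx.const_i32(b);
    bx.add(lhs, rhs)
}

/// A toy backend that records textual instructions instead of calling an FFI.
#[derive(Default)]
pub struct TextBackend {
    pub instructions: Vec<String>,
}

impl ToyBuilder for TextBackend {
    /// Here a value is just the index of the instruction that produced it.
    type Value = usize;

    fn const_i32(&mut self, n: i32) -> usize {
        self.instructions.push(format!("const {n}"));
        self.instructions.len() - 1
    }

    fn add(&mut self, lhs: usize, rhs: usize) -> usize {
        self.instructions.push(format!("add %{lhs}, %{rhs}"));
        self.instructions.len() - 1
    }
}
```

A second backend would only need another `impl ToyBuilder`; `emit_sum` itself
would not change.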

## Generic types and structures

@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
work has been extended to all structures inside the `mir` folder and elsewhere,
as well as for LLVM's `BasicBlock` and `Type` types.

The two most important structures for the LLVM codegen are `CodegenCx` and
`Builder`. They are parametrized by multiple lifetime parameters and the type
for `Value`.

```rust,ignore
struct CodegenCx<'ll, 'tcx> {
    /* ... */
}

struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
    /* ... */
}
```

`CodegenCx` is used to compile one codegen-unit that can contain multiple
functions, whereas `Builder` is created to compile one basic block.

The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
parameters, which correspond to the following:
* `'tcx` is the longest lifetime, corresponding to the original `TyCtxt`
  containing the program's information;
* `'a` is the lifetime of a short-lived reference to a `CodegenCx` or another
  object inside a struct;
* `'ll` is the lifetime of references to LLVM objects such as `Value` or
  `Type`.
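
As a rough illustration of how the three lifetimes relate, here is a
self-contained sketch. All the `Toy*` types are hypothetical stand-ins (the
real `TyCtxt`, `CodegenCx` and `Builder` are far richer); the point is only
that a value handed out by a builder is tied to `'ll` or `'tcx`, not to the
short borrow `'a`.

```rust
// Hypothetical stand-ins; these are not the rustc types.

/// Stand-in for the compiler context; data borrowed for `'tcx` lives here.
pub struct ToyTyCtxt {
    pub crate_name: String,
}

/// Stand-in for an object owned by the backend (e.g. by an LLVM module).
pub struct ToyLlvmValue {
    pub name: String,
}

/// Per-codegen-unit context: borrows backend objects for `'ll` and the
/// program information for `'tcx`.
pub struct ToyCodegenCx<'ll, 'tcx> {
    pub tcx: &'tcx ToyTyCtxt,
    pub values: Vec<&'ll ToyLlvmValue>,
}

/// Short-lived builder: `'a` is only the borrow of the context.
pub struct ToyBuilder<'a, 'll, 'tcx> {
    pub cx: &'a ToyCodegenCx<'ll, 'tcx>,
}

impl<'a, 'll, 'tcx> ToyBuilder<'a, 'll, 'tcx> {
    /// The returned reference is tied to `'ll`, so it may outlive the builder.
    pub fn first_value(&self) -> Option<&'ll ToyLlvmValue> {
        self.cx.values.first().copied()
    }

    /// Likewise, program information is tied to `'tcx`.
    pub fn crate_name(&self) -> &'tcx str {
        let tcx: &'tcx ToyTyCtxt = self.cx.tcx;
        &tcx.crate_name
    }
}
```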

Although there are already many lifetime parameters in the code, making it
generic uncovered situations where the borrow-checker was passing only due to
the special nature of the LLVM objects manipulated (they are extern pointers).
For instance, an additional lifetime parameter had to be added to
`LocalAnalyzer` in `analyze.rs`, leading to the definition:

```rust,ignore
struct LocalAnalyzer<'mir, 'a, 'tcx> {
    /* ... */
}
```

However, the two most important structures `CodegenCx` and `Builder` are not
defined in the backend-agnostic code. Indeed, their content is highly specific
to the backend and it makes more sense to leave their definition to the backend
implementor than to allow just a narrow spot via a generic field for the
backend's context.

## Traits and interface

Because they have to be defined by the backend, `CodegenCx` and `Builder` will
be the structures implementing all the traits defining the backend's interface.
These traits are defined in the folder `rustc_codegen_ssa/traits` and all the
backend-agnostic code is parametrized by them. For instance, let us explain how
a function in `base.rs` is parametrized:

```rust,ignore
pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    instance: Instance<'tcx>
) {
    /* ... */
}
```

In this signature, we have the two lifetime parameters explained earlier and
the master type `Bx` which satisfies the trait `BuilderMethods` corresponding
to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait
defines an associated type `Bx::CodegenCx` that itself satisfies the
`CodegenMethods` trait implemented by the struct `CodegenCx`.
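
The relationship between the builder type and its associated context type can
be sketched in isolation. The traits below are hypothetical and heavily
simplified (`ToyBuilderMethods`, `ToyCodegenMethods`, `codegen_function` and
the two structs are invented for illustration and drop the `'tcx` parameter);
they only mirror the shape of the signature above.

```rust
// Hypothetical, simplified traits; not the ones in `rustc_codegen_ssa/traits`.

/// Interface of the per-codegen-unit context.
pub trait ToyCodegenMethods {
    fn unit_name(&self) -> &str;
}

/// Interface of the builder; it names its context through an associated type.
pub trait ToyBuilderMethods<'a>: Sized {
    type CodegenCx: ToyCodegenMethods + 'a;

    fn build(cx: &'a Self::CodegenCx) -> Self;
    fn emit(&mut self, what: &str);
    fn finish(self) -> Vec<String>;
}

/// Backend-agnostic function: generic over the builder only; the context type
/// is reached through `Bx::CodegenCx`.
pub fn codegen_function<'a, Bx: ToyBuilderMethods<'a>>(
    cx: &'a Bx::CodegenCx,
    name: &str,
) -> Vec<String> {
    let mut bx = Bx::build(cx);
    bx.emit(&format!("define {name}"));
    bx.emit("ret");
    bx.finish()
}

/// A toy backend: the context and the builder are defined by the backend.
pub struct ToyCx {
    pub name: String,
}

impl ToyCodegenMethods for ToyCx {
    fn unit_name(&self) -> &str {
        &self.name
    }
}

pub struct ToyBx<'a> {
    cx: &'a ToyCx,
    out: Vec<String>,
}

impl<'a> ToyBuilderMethods<'a> for ToyBx<'a> {
    type CodegenCx = ToyCx;

    fn build(cx: &'a ToyCx) -> Self {
        ToyBx { cx, out: Vec::new() }
    }

    fn emit(&mut self, what: &str) {
        // Prefix each line with the codegen unit it belongs to.
        self.out.push(format!("[{}] {}", self.cx.unit_name(), what));
    }

    fn finish(self) -> Vec<String> {
        self.out
    }
}
```

A caller picks the backend by naming the builder type, for example
`codegen_function::<ToyBx<'_>>(&cx, "main")`.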

On the trait side, here is an example with part of the definition of
`BuilderMethods` in `traits/builder.rs`:

```rust,ignore
pub trait BuilderMethods<'a, 'tcx>:
    HasCodegen<'tcx>
    + DebugInfoBuilderMethods<'tcx>
    + ArgTypeMethods<'tcx>
    + AbiBuilderMethods<'tcx>
    + IntrinsicCallMethods<'tcx>
    + AsmBuilderMethods<'tcx>
{
    fn new_block<'b>(
        cx: &'a Self::CodegenCx,
        llfn: Self::Function,
        name: &'b str
    ) -> Self;
    /* ... */
    fn cond_br(
        &mut self,
        cond: Self::Value,
        then_llbb: Self::BasicBlock,
        else_llbb: Self::BasicBlock,
    );
    /* ... */
}
```

Finally, a master structure implementing the `ExtraBackendMethods` trait is
used for high-level codegen-driving functions like `codegen_crate` in
`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`.
`ExtraBackendMethods` should be implemented by the same structure that
implements the `CodegenBackend` defined in
`rustc_codegen_utils/codegen_backend.rs`.

During the traitification process, certain functions have been converted from
methods of a local structure to methods of `CodegenCx` or `Builder` and a
corresponding `self` parameter has been added. Indeed, LLVM stores information
internally that it can access when called through its API. This information
does not show up in a Rust data structure carried around when these methods are
called. However, when implementing a Rust backend for `rustc`, these methods
will need information from `CodegenCx`, hence the additional parameter (unused
in the LLVM implementation of the trait).
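
The reason for the extra `self` parameter can be shown with a small sketch. It
is purely illustrative: `ToyTypeMethods`, `OpaqueLibraryCx` and `RustBackendCx`
are hypothetical names, and the "external library" is simulated. One
implementation ignores `self` because its state is assumed to live elsewhere,
while the other keeps its table in the context and therefore needs it.

```rust
use std::cell::RefCell;

// Hypothetical names throughout; this is not the `rustc_codegen_ssa` API.

/// Trait method with a `&self` receiver, as added during traitification.
pub trait ToyTypeMethods {
    /// Returns a handle for the integer type with the given bit width.
    fn int_type(&self, bits: u32) -> usize;
}

/// Models a backend whose underlying library keeps its own type table:
/// the context is not consulted, so `self` goes unused.
pub struct OpaqueLibraryCx;

impl ToyTypeMethods for OpaqueLibraryCx {
    fn int_type(&self, bits: u32) -> usize {
        // Stand-in for a call into an external library that tracks types itself.
        bits as usize
    }
}

/// Models a backend written in Rust: the type table must live in the context.
#[derive(Default)]
pub struct RustBackendCx {
    interned: RefCell<Vec<u32>>,
}

impl ToyTypeMethods for RustBackendCx {
    fn int_type(&self, bits: u32) -> usize {
        let mut table = self.interned.borrow_mut();
        // Reuse the existing entry if this width was already interned.
        if let Some(idx) = table.iter().position(|&b| b == bits) {
            return idx;
        }
        table.push(bits);
        table.len() - 1
    }
}
```

Both implementations satisfy the same trait, so backend-agnostic code can call
`int_type` without knowing where the state is stored.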

## State of the code after the refactoring

The traits offer an API which is very similar to the API of LLVM. This is not
the best solution since LLVM has a very special way of doing things: when
adding another backend, the trait definitions might need to change in order to
offer more flexibility.

However, the current separation between backend-agnostic and LLVM-specific code
has allowed the reuse of a significant part of the old `rustc_codegen_llvm`.
Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the
most important elements:

* `back` folder: 3,800 (BA) vs 4,100 (LLVM);
* `mir` folder: 4,400 (BA) vs 0 (LLVM);
* `base.rs`: 1,100 (BA) vs 250 (LLVM);
* `builder.rs`: 1,400 (BA) vs 0 (LLVM);
* `common.rs`: 350 (BA) vs 350 (LLVM).

The `debuginfo` folder has been left almost untouched by the splitting and is
specific to LLVM. Only its high-level features have been traitified.

The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the
27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new
18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized
`rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of
approximately 10,000 LOC that would otherwise have had to be duplicated between
the multiple backends of `rustc`.

The refactored version of `rustc`'s backend introduced no regression in the
test suite or in performance benchmarks, which is consistent with the nature
of the refactoring: it used only compile-time parametricity (no trait
objects).
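
For readers unfamiliar with the distinction, the sketch below contrasts the two
forms of dispatch on a toy trait. `Emit`, `Doubler`, `run_static` and
`run_dynamic` are hypothetical names; only the generic form corresponds to the
compile-time parametricity mentioned above, and the trait-object form is shown
purely for contrast.

```rust
// Hypothetical toy trait and backend, for illustration only.

pub trait Emit {
    fn emit(&self, n: u32) -> u32;
}

pub struct Doubler;

impl Emit for Doubler {
    fn emit(&self, n: u32) -> u32 {
        n * 2
    }
}

/// Compile-time parametricity: the function is monomorphized for each `B`,
/// so the call to `emit` is resolved statically.
pub fn run_static<B: Emit>(backend: &B, n: u32) -> u32 {
    backend.emit(n)
}

/// Trait object: a single compiled function, with `emit` looked up through a
/// vtable at run time.
pub fn run_dynamic(backend: &dyn Emit, n: u32) -> u32 {
    backend.emit(n)
}
```

Both functions compute the same result; they differ only in how the call to
`emit` is resolved.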