# From MIR to Binaries

All of the preceding chapters of this guide have one thing in common: we never
generated any executable machine code at all! With this chapter, all of that
changes.

So far, we've shown how the compiler can take raw source code in text format
and transform it into [MIR]. We have also shown how the compiler does various
analyses on the code to detect things like type or lifetime errors. Now, we
will finally take the MIR and produce some executable machine code.

[MIR]: ./mir/index.md

> NOTE: This part of a compiler is often called the _backend_. The term is a bit
> overloaded because in the compiler source, it usually refers to the "codegen
> backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend"
> in this part, we are referring to the "codegen backend".

So what do we need to do?

0. First, we need to collect the set of things to generate code for. In
   particular, we need to find out which concrete types to substitute for
   generic ones, since we need to generate code for the concrete types.
   Generating code for the concrete types (i.e. emitting a copy of the code for
   each concrete type) is called _monomorphization_, so the process of
   collecting all the concrete types is called _monomorphization collection_.
1. Next, we need to actually lower the MIR to a codegen IR
   (usually LLVM IR) for each concrete type we collected.
2. Finally, we need to invoke LLVM or Cranelift, which runs a bunch of
   optimization passes, generates executable code, and links together an
   executable binary.
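
To make the idea of monomorphization concrete, here is a minimal sketch in
ordinary Rust (not compiler code). The function name `size_in_bytes` is made up
for illustration. The source contains one generic definition, but it is used at
two concrete types, so code has to be generated for both `size_in_bytes::<u8>`
and `size_in_bytes::<u64>`.

```rust
use std::mem::size_of;

/// A single generic definition in the source (hypothetical example function).
/// There is nothing to generate code for until `T` is known.
fn size_in_bytes<T>(_value: T) -> usize {
    // `size_of::<T>()` differs per concrete type, so each instantiation
    // needs its own copy of this function body.
    size_of::<T>()
}

fn main() {
    // Two uses at different concrete types: a collector would have to
    // record both `size_in_bytes::<u8>` and `size_in_bytes::<u64>`.
    let small = size_in_bytes(1_u8);
    let large = size_in_bytes(1_u64);
    println!("u8: {small} byte(s), u64: {large} byte(s)");
}
```

Conceptually, the set `{size_in_bytes::<u8>, size_in_bytes::<u64>}` is the kind
of thing that monomorphization collection gathers before any lowering happens.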

[codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html

The code for codegen is actually a bit complex due to a few factors:

- Support for multiple codegen backends (LLVM and Cranelift). We try to share as much
  backend code between them as possible, so a lot of it is generic over the
  codegen implementation. This means that there are often a lot of layers of
  abstraction.
- Codegen happens asynchronously in another thread for performance.
- The actual codegen is done by a third-party library (either LLVM or Cranelift).
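
As a rough illustration of the "codegen in another thread" point above, the
following sketch uses only the standard library. It is not how `rustc`
structures its worker threads; `WorkItem`, `compile`, and `run_pipeline` are
hypothetical names, and "compiling" is reduced to producing a file name.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a unit of work handed to the backend;
// this is not a rustc type.
struct WorkItem {
    name: String,
}

// Hypothetical stand-in for the expensive backend work.
fn compile(item: &WorkItem) -> String {
    format!("{}.o", item.name)
}

// The main thread sends work items over a channel while a worker
// thread consumes them, so producing and compiling can overlap.
fn run_pipeline(names: &[&str]) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<WorkItem>();

    let worker = thread::spawn(move || {
        // Runs until every sender has been dropped.
        rx.iter().map(|item| compile(&item)).collect::<Vec<String>>()
    });

    for name in names {
        tx.send(WorkItem { name: name.to_string() })
            .expect("worker thread hung up");
    }
    // Dropping the sender tells the worker there is no more work.
    drop(tx);

    worker.join().expect("worker thread panicked")
}

fn main() {
    let objects = run_pipeline(&["foo", "bar"]);
    println!("{objects:?}");
}
```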

Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code
(i.e. independent of LLVM or Cranelift), while the [`rustc_codegen_llvm`][llvm]
crate contains code specific to LLVM codegen.
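
The split between backend-agnostic and backend-specific code can be pictured
with a small trait-based sketch. Everything here is an assumption made for
illustration: the `Backend` trait, the `ToyLlvm` and `ToyCranelift` types, and
the emitted strings are hypothetical and much simpler than the real traits in
`rustc_codegen_ssa`.

```rust
// Hypothetical trait playing the role of the backend abstraction.
trait Backend {
    fn name(&self) -> &'static str;
    // Emit a textual "instruction" for an addition (illustrative only).
    fn emit_add(&self, lhs: &str, rhs: &str) -> String;
}

// Toy backends; they only format strings and call no real library.
struct ToyLlvm;
struct ToyCranelift;

impl Backend for ToyLlvm {
    fn name(&self) -> &'static str {
        "toy-llvm"
    }
    fn emit_add(&self, lhs: &str, rhs: &str) -> String {
        format!("add i32 {lhs}, {rhs}")
    }
}

impl Backend for ToyCranelift {
    fn name(&self) -> &'static str {
        "toy-cranelift"
    }
    fn emit_add(&self, lhs: &str, rhs: &str) -> String {
        format!("iadd {lhs}, {rhs}")
    }
}

// Shared, backend-agnostic logic: written once, generic over `B`.
fn lower_add<B: Backend>(backend: &B, lhs: &str, rhs: &str) -> String {
    format!("[{}] {}", backend.name(), backend.emit_add(lhs, rhs))
}

fn main() {
    println!("{}", lower_add(&ToyLlvm, "%a", "%b"));
    println!("{}", lower_add(&ToyCranelift, "v0", "v1"));
}
```

The shared function `lower_add` corresponds to the kind of code that would live
in the backend-agnostic crate, while each `impl Backend` corresponds to a
backend-specific crate. The cost of this design is the extra layer of
indirection mentioned above.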

[ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html
[llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html

At a very high level, the entry point is
[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. This function starts the
process discussed in the rest of this chapter.