]> git.proxmox.com Git - rustc.git/blame - src/doc/rustc-dev-guide/src/appendix/background.md
New upstream version 1.68.2+dfsg1
[rustc.git] / src / doc / rustc-dev-guide / src / appendix / background.md
CommitLineData
6a06907d 1# Background topics
a1dfa0c6
XL
2
3This section covers a numbers of common compiler terms that arise in
4this guide. We try to give the general definition while providing some
5Rust-specific context.
6
7<a name="cfg"></a>
8
9## What is a control-flow graph?
10
6a06907d 11A control-flow graph (CFG) is a common term from compilers. If you've ever
a1dfa0c6
XL
12used a flow-chart, then the concept of a control-flow graph will be
13pretty familiar to you. It's a representation of your program that
6a06907d 14clearly exposes the underlying control flow.
a1dfa0c6
XL
15
16A control-flow graph is structured as a set of **basic blocks**
17connected by edges. The key idea of a basic block is that it is a set
18of statements that execute "together" – that is, whenever you branch
19to a basic block, you start at the first statement and then execute
20all the remainder. Only at the end of the block is there the
21possibility of branching to more than one place (in MIR, we call that
22final statement the **terminator**):
23
24```mir
25bb0: {
26 statement0;
27 statement1;
28 statement2;
29 ...
30 terminator;
31}
32```
33
34Many expressions that you are used to in Rust compile down to multiple
35basic blocks. For example, consider an if statement:
36
37```rust,ignore
38a = 1;
39if some_variable {
40 b = 1;
41} else {
42 c = 1;
43}
44d = 1;
45```
46
6a06907d
XL
47This would compile into four basic blocks in MIR. In textual form, it looks like
48this:
a1dfa0c6
XL
49
50```mir
51BB0: {
52 a = 1;
6a06907d
XL
53 if some_variable {
54 goto BB1;
55 } else {
56 goto BB2;
57 }
a1dfa0c6
XL
58}
59
60BB1: {
61 b = 1;
62 goto BB3;
63}
64
65BB2: {
66 c = 1;
67 goto BB3;
68}
69
70BB3: {
71 d = 1;
6a06907d 72 ...
a1dfa0c6
XL
73}
74```
75
6a06907d
XL
76In graphical form, it looks like this:
77
78```
79 BB0
80 +--------------------+
81 | a = 1; |
82 +--------------------+
83 / \
84 if some_variable else
85 / \
86 BB1 / \ BB2
87 +-----------+ +-----------+
88 | b = 1; | | c = 1; |
89 +-----------+ +-----------+
90 \ /
91 \ /
92 \ BB3 /
93 +----------+
94 | d = 1; |
95 | ... |
96 +----------+
97```
98
a1dfa0c6
XL
99When using a control-flow graph, a loop simply appears as a cycle in
100the graph, and the `break` keyword translates into a path out of that
101cycle.
102
103<a name="dataflow"></a>
104
105## What is a dataflow analysis?
106
107[*Static Program Analysis*](https://cs.au.dk/~amoeller/spa/) by Anders Møller
108and Michael I. Schwartzbach is an incredible resource!
109
6a06907d
XL
110_Dataflow analysis_ is a type of static analysis that is common in many
111compilers. It describes a general technique, rather than a particular analysis.
112
113The basic idea is that we can walk over a [control-flow graph (CFG)](#cfg) and
114keep track of what some value could be. At the end of the walk, we might have
115shown that some claim is true or not necessarily true (e.g. "this variable must
116be initialized"). `rustc` tends to do dataflow analyses over the MIR, since MIR
117is already a CFG.
118
119For example, suppose we want to check that `x` is initialized before it is used
120in this snippet:
121
122```rust,ignore
123fn foo() {
124 let mut x;
125
126 if some_cond {
127 x = 1;
128 }
129
130 dbg!(x);
131}
132```
133
134A CFG for this code might look like this:
135
136```txt
137 +------+
138 | Init | (A)
139 +------+
140 | \
141 | if some_cond
142 else \ +-------+
143 | \| x = 1 | (B)
144 | +-------+
145 | /
146 +---------+
147 | dbg!(x) | (C)
148 +---------+
149```
150
151We can do the dataflow analysis as follows: we will start off with a flag `init`
152which indicates if we know `x` is initialized. As we walk the CFG, we will
153update the flag. At the end, we can check its value.
154
155So first, in block (A), the variable `x` is declared but not initialized, so
156`init = false`. In block (B), we initialize the value, so we know that `x` is
157initialized. So at the end of (B), `init = true`.
158
159Block (C) is where things get interesting. Notice that there are two incoming
160edges, one from (A) and one from (B), corresponding to whether `some_cond` is true or not.
161But we cannot know that! It could be the case the `some_cond` is always true,
162so that `x` is actually always initialized. It could also be the case that
163`some_cond` depends on something random (e.g. the time), so `x` may not be
164initialized. In general, we cannot know statically (due to [Rice's
165Theorem][rice]). So what should the value of `init` be in block (C)?
166
167[rice]: https://en.wikipedia.org/wiki/Rice%27s_theorem
168
169Generally, in dataflow analyses, if a block has multiple parents (like (C) in
170our example), its dataflow value will be some function of all its parents (and
171of course, what happens in (C)). Which function we use depends on the analysis
172we are doing.
173
174In this case, we want to be able to prove definitively that `x` must be
175initialized before use. This forces us to be conservative and assume that
176`some_cond` might be false sometimes. So our "merging function" is "and". That
177is, `init = true` in (C) if `init = true` in (A) _and_ in (B) (or if `x` is
178initialized in (C)). But this is not the case; in particular, `init = false` in
179(A), and `x` is not initialized in (C). Thus, `init = false` in (C); we can
180report an error that "`x` may not be initialized before use".
181
182There is definitely a lot more that can be said about dataflow analyses. There is an
183extensive body of research literature on the topic, including a lot of theory.
184We only discussed a forwards analysis, but backwards dataflow analysis is also
185useful. For example, rather than starting from block (A) and moving forwards,
186we might have started with the usage of `x` and moved backwards to try to find
187its initialization.
a1dfa0c6
XL
188
189<a name="quantified"></a>
190
191## What is "universally quantified"? What about "existentially quantified"?
192
6a06907d
XL
193In math, a predicate may be _universally quantified_ or _existentially
194quantified_:
195
196- _Universal_ quantification:
197 - the predicate holds if it is true for all possible inputs.
198 - Traditional notation: ∀x: P(x). Read as "for all x, P(x) holds".
199- _Existential_ quantification:
200 - the predicate holds if there is any input where it is true, i.e., there
201 only has to be a single input.
202 - Traditional notation: ∃x: P(x). Read as "there exists x such that P(x) holds".
203
204In Rust, they come up in type checking and trait solving. For example,
205
206```rust,ignore
207fn foo<T>()
208```
209This function claims that the function is well-typed for all types `T`: `∀ T: well_typed(foo)`.
210
211Another example:
212
213```rust,ignore
214fn foo<'a>(_: &'a usize)
215```
216This function claims that for any lifetime `'a` (determined by the
217caller), it is well-typed: `∀ 'a: well_typed(foo)`.
218
219Another example:
220
221```rust,ignore
222fn foo<F>()
223where for<'a> F: Fn(&'a u8)
224```
225This function claims that it is well-typed for all types `F` such that for all
226lifetimes `'a`, `F: Fn(&'a u8)`: `∀ F: ∀ 'a: (F: Fn(&'a u8)) => well_typed(foo)`.
227
228One more example:
229
230```rust,ignore
231fn foo(_: dyn Debug)
232```
233This function claims that there exists some type `T` that implements `Debug`
234such that the function is well-typed: `∃ T: (T: Debug) and well_typed(foo)`.
a1dfa0c6
XL
235
236<a name="variance"></a>
237
6a06907d
XL
238## What is a de Bruijn Index?
239
240[De Bruijn indices][wikideb] are a way of representing, using only integers,
241which variables are bound in which binders. They were originally invented for
242use in lambda calculus evaluation (see [this Wikipedia article][wikideb] for
243more). In `rustc`, we use de Bruijn indices to [represent generic types][sub].
244
245[wikideb]: https://en.wikipedia.org/wiki/De_Bruijn_index
246[sub]: ../generics.md
247
248Here is a basic example of how de Bruijn indices might be used for closures (we
249don't actually do this in `rustc` though!):
250
251```rust,ignore
252|x| {
253 f(x) // de Bruijn index of `x` is 1 because `x` is bound 1 level up
254
255 |y| {
256 g(x, y) // index of `x` is 2 because it is bound 2 levels up
257 // index of `y` is 1 because it is bound 1 level up
258 }
259}
260```
261
262## What are co- and contra-variance?
a1dfa0c6
XL
263
264Check out the subtyping chapter from the
265[Rust Nomicon](https://doc.rust-lang.org/nomicon/subtyping.html).
266
267See the [variance](../variance.html) chapter of this guide for more info on how
268the type checker handles variance.
269
270<a name="free-vs-bound"></a>
271
272## What is a "free region" or a "free variable"? What about "bound region"?
273
274Let's describe the concepts of free vs bound in terms of program
275variables, since that's the thing we're most familiar with.
276
6a06907d
XL
277- Consider this expression, which creates a closure: `|a, b| a + b`.
278 Here, the `a` and `b` in `a + b` refer to the arguments that the closure will
279 be given when it is called. We say that the `a` and `b` there are **bound** to
280 the closure, and that the closure signature `|a, b|` is a **binder** for the
281 names `a` and `b` (because any references to `a` or `b` within refer to the
282 variables that it introduces).
283- Consider this expression: `a + b`. In this expression, `a` and `b` refer to
284 local variables that are defined *outside* of the expression. We say that
285 those variables **appear free** in the expression (i.e., they are **free**,
286 not **bound** (tied up)).
a1dfa0c6
XL
287
288So there you have it: a variable "appears free" in some
289expression/statement/whatever if it refers to something defined
290outside of that expressions/statement/whatever. Equivalently, we can
291then refer to the "free variables" of an expression – which is just
292the set of variables that "appear free".
293
294So what does this have to do with regions? Well, we can apply the
295analogous concept to type and regions. For example, in the type `&'a
296u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it
297does not.
60c5eb7d
XL
298
299# Further Reading About Compilers
300
301> Thanks to `mem`, `scottmcm`, and `Levi` on the official Discord for the
302> recommendations, and to `tinaun` for posting a link to a [twitter thread from
303> Graydon Hoare](https://twitter.com/graydon_pub/status/1039615569132118016)
304> which had some more recommendations!
6a06907d 305>
60c5eb7d
XL
306> Other sources: https://gcc.gnu.org/wiki/ListOfCompilerBooks
307>
308> If you have other suggestions, please feel free to open an issue or PR.
309
310## Books
311- [Types and Programming Languages](https://www.cis.upenn.edu/~bcpierce/tapl/)
312- [Programming Language Pragmatics](https://www.cs.rochester.edu/~scott/pragmatics/)
313- [Practical Foundations for Programming Languages](https://www.cs.cmu.edu/~rwh/pfpl/2nded.pdf)
6a06907d 314- [Compilers: Principles, Techniques, and Tools, 2nd Edition](https://www.pearson.com/us/higher-education/program/Aho-Compilers-Principles-Techniques-and-Tools-2nd-Edition/PGM167067.html)
60c5eb7d 315- [Garbage Collection: Algorithms for Automatic Dynamic Memory Management](https://www.cs.kent.ac.uk/people/staff/rej/gcbook/)
6a06907d 316- [Linkers and Loaders](https://www.amazon.com/Linkers-Kaufmann-Software-Engineering-Programming/dp/1558604960) (There are also free versions of this, but the version we had linked seems to be offline at the moment.)
60c5eb7d
XL
317- [Advanced Compiler Design and Implementation](https://www.goodreads.com/book/show/887908.Advanced_Compiler_Design_and_Implementation)
318- [Building an Optimizing Compiler](https://www.goodreads.com/book/show/2063103.Building_an_Optimizing_Compiler)
319- [Crafting Interpreters](http://www.craftinginterpreters.com/)
320
321## Courses
322- [University of Oregon Programming Languages Summer School archive](https://www.cs.uoregon.edu/research/summerschool/archives.html)
323
324## Wikis
325- [Wikipedia](https://en.wikipedia.org/wiki/List_of_programming_languages_by_type)
326- [Esoteric Programming Languages](https://esolangs.org/wiki/Main_Page)
327- [Stanford Encyclopedia of Philosophy](https://plato.stanford.edu/index.html)
328- [nLab](https://ncatlab.org/nlab/show/HomePage)
329
330## Misc Papers and Blog Posts
6522a427 331- [Programming in Martin-Löf's Type Theory](https://www.researchgate.net/publication/213877272_Programming_in_Martin-Lof's_Type_Theory)
6a06907d 332- [Polymorphism, Subtyping, and Type Inference in MLsub](https://dl.acm.org/doi/10.1145/3093333.3009882)