# Background topics

This section covers a number of common compiler terms that arise in
this guide. We try to give the general definition while providing some
Rust-specific context.

<a name="cfg"></a>

## What is a control-flow graph?

A control-flow graph (CFG) is a common term from compilers. If you've ever
used a flow-chart, then the concept of a control-flow graph will be
pretty familiar to you. It's a representation of your program that
clearly exposes the underlying control flow.

A control-flow graph is structured as a set of **basic blocks**
connected by edges. The key idea of a basic block is that it is a set
of statements that execute "together" – that is, whenever you branch
to a basic block, you start at the first statement and then execute
all the remainder. Only at the end of the block is there the
possibility of branching to more than one place (in MIR, we call that
final statement the **terminator**):

```mir
bb0: {
    statement0;
    statement1;
    statement2;
    ...
    terminator;
}
```

Many expressions that you are used to in Rust compile down to multiple
basic blocks. For example, consider an if statement:

```rust,ignore
a = 1;
if some_variable {
    b = 1;
} else {
    c = 1;
}
d = 1;
```

This would compile into four basic blocks in MIR. In textual form, it looks
like this:

```mir
BB0: {
    a = 1;
    if some_variable {
        goto BB1;
    } else {
        goto BB2;
    }
}

BB1: {
    b = 1;
    goto BB3;
}

BB2: {
    c = 1;
    goto BB3;
}

BB3: {
    d = 1;
    ...
}
```

In graphical form, it looks like this:

```
                BB0
       +--------------------+
       | a = 1;             |
       +--------------------+
             /      \
  if some_variable   else
           /          \
     BB1  /            \  BB2
+-----------+       +-----------+
| b = 1;    |       | c = 1;    |
+-----------+       +-----------+
          \            /
           \          /
            \  BB3   /
           +----------+
           | d = 1;   |
           | ...      |
           +----------+
```
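The block-and-terminator structure described above can be sketched as a small data type. This is a hypothetical toy model for illustration only, not rustc's actual MIR representation (which uses richer types such as `BasicBlockData` and `TerminatorKind`):

```rust
// Toy CFG for the if-example above: each basic block is a list of
// statements plus exactly one terminator, and only the terminator
// can branch.
enum Terminator {
    Goto(usize),                               // unconditional jump
    Branch { then_bb: usize, else_bb: usize }, // two-way branch
    Return,                                    // leave the function
}

struct BasicBlock {
    statements: Vec<String>, // statements execute "together", in order
    terminator: Terminator,
}

// The edges of the CFG are determined entirely by the terminators.
fn successors(bb: &BasicBlock) -> Vec<usize> {
    match bb.terminator {
        Terminator::Goto(t) => vec![t],
        Terminator::Branch { then_bb, else_bb } => vec![then_bb, else_bb],
        Terminator::Return => vec![],
    }
}

fn main() {
    // BB0 -> {BB1, BB2} -> BB3, matching the diagram above.
    let cfg = vec![
        BasicBlock {
            statements: vec!["a = 1".into()],
            terminator: Terminator::Branch { then_bb: 1, else_bb: 2 },
        },
        BasicBlock { statements: vec!["b = 1".into()], terminator: Terminator::Goto(3) },
        BasicBlock { statements: vec!["c = 1".into()], terminator: Terminator::Goto(3) },
        BasicBlock { statements: vec!["d = 1".into()], terminator: Terminator::Return },
    ];
    for (i, bb) in cfg.iter().enumerate() {
        println!("BB{i}: {:?} -> {:?}", bb.statements, successors(bb));
    }
}
```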

When using a control-flow graph, a loop simply appears as a cycle in
the graph, and the `break` keyword translates into a path out of that
cycle.

<a name="dataflow"></a>

## What is a dataflow analysis?

[*Static Program Analysis*](https://cs.au.dk/~amoeller/spa/) by Anders Møller
and Michael I. Schwartzbach is an incredible resource!

_Dataflow analysis_ is a type of static analysis that is common in many
compilers. It describes a general technique, rather than a particular analysis.

The basic idea is that we can walk over a [control-flow graph (CFG)](#cfg) and
keep track of what some value could be. At the end of the walk, we might have
shown that some claim is true or not necessarily true (e.g. "this variable must
be initialized"). `rustc` tends to do dataflow analyses over the MIR, since MIR
is already a CFG.

For example, suppose we want to check that `x` is initialized before it is used
in this snippet:

```rust,ignore
fn foo() {
    let mut x;

    if some_cond {
        x = 1;
    }

    dbg!(x);
}
```

A CFG for this code might look like this:

```txt
    +------+
    | Init | (A)
    +------+
       |   \
       |    if some_cond
     else    \  +-------+
       |      \ | x = 1 | (B)
       |        +-------+
       |       /
    +---------+
    | dbg!(x) | (C)
    +---------+
```

We can do the dataflow analysis as follows: we will start off with a flag
`init` which indicates if we know `x` is initialized. As we walk the CFG, we
will update the flag. At the end, we can check its value.

So first, in block (A), the variable `x` is declared but not initialized, so
`init = false`. In block (B), we initialize the value, so we know that `x` is
initialized. So at the end of (B), `init = true`.

Block (C) is where things get interesting. Notice that there are two incoming
edges, one from (A) and one from (B), corresponding to whether `some_cond` is
true or not. But we cannot know that! It could be the case that `some_cond` is
always true, so that `x` is actually always initialized. It could also be the
case that `some_cond` depends on something random (e.g. the time), so `x` may
not be initialized. In general, we cannot know statically (due to [Rice's
Theorem][rice]). So what should the value of `init` be in block (C)?

[rice]: https://en.wikipedia.org/wiki/Rice%27s_theorem

Generally, in dataflow analyses, if a block has multiple parents (like (C) in
our example), its dataflow value will be some function of all its parents (and
of course, what happens in (C)). Which function we use depends on the analysis
we are doing.

In this case, we want to be able to prove definitively that `x` must be
initialized before use. This forces us to be conservative and assume that
`some_cond` might be false sometimes. So our "merging function" is "and". That
is, `init = true` in (C) if `init = true` in (A) _and_ in (B) (or if `x` is
initialized in (C)). But this is not the case; in particular, `init = false` in
(A), and `x` is not initialized in (C). Thus, `init = false` in (C); we can
report an error that "`x` may not be initialized before use".

There is definitely a lot more that can be said about dataflow analyses. There
is an extensive body of research literature on the topic, including a lot of
theory. We only discussed a forwards analysis, but backwards dataflow analysis
is also useful. For example, rather than starting from block (A) and moving
forwards, we might have started with the usage of `x` and moved backwards to
try to find its initialization.
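The forward analysis above, with "and" as the merging function, can be sketched in a few lines. This is a minimal illustration, not rustc's actual dataflow framework; the predecessor table and `assigns` flags are hypothetical encodings of the three-block example:

```rust
// `preds[i]` lists the predecessors of block `i`; `assigns[i]` says whether
// block `i` itself initializes `x`. The returned `init[i]` means "`x` is
// definitely initialized at the end of block `i`".
//
// Blocks are assumed to be in topological order and the CFG acyclic, so a
// single forward pass suffices; a real analysis iterates to a fixed point.
fn must_init(preds: &[Vec<usize>], assigns: &[bool]) -> Vec<bool> {
    let mut init = vec![false; preds.len()];
    for i in 0..preds.len() {
        // Merge with "and": `x` must be initialized on *every* incoming
        // path. The entry block has no predecessors and starts with
        // `init = false`.
        let merged = !preds[i].is_empty() && preds[i].iter().all(|&p| init[p]);
        init[i] = merged || assigns[i];
    }
    init
}

fn main() {
    // Blocks: 0 = (A) entry, 1 = (B) `x = 1`, 2 = (C) `dbg!(x)`.
    let preds = vec![vec![], vec![0], vec![0, 1]];
    let init = must_init(&preds, &[false, true, false]);
    // `init = false` at (C): "`x` may not be initialized before use".
    println!("init at (C): {}", init[2]);
}
```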

<a name="quantified"></a>

## What is "universally quantified"? What about "existentially quantified"?

In math, a predicate may be _universally quantified_ or _existentially
quantified_:

- _Universal_ quantification:
  - the predicate holds if it is true for all possible inputs.
  - Traditional notation: ∀x: P(x). Read as "for all x, P(x) holds".
- _Existential_ quantification:
  - the predicate holds if there is any input where it is true, i.e., there
    only has to be a single input.
  - Traditional notation: ∃x: P(x). Read as "there exists x such that P(x) holds".

In Rust, they come up in type checking and trait solving. For example,

```rust,ignore
fn foo<T>()
```

This function claims that it is well-typed for all types `T`: `∀ T: well_typed(foo)`.

Another example:

```rust,ignore
fn foo<'a>(_: &'a usize)
```

This function claims that for any lifetime `'a` (determined by the
caller), it is well-typed: `∀ 'a: well_typed(foo)`.

Another example:

```rust,ignore
fn foo<F>()
where for<'a> F: Fn(&'a u8)
```

This function claims that it is well-typed for all types `F` such that for all
lifetimes `'a`, `F: Fn(&'a u8)`: `∀ F: ∀ 'a: (F: Fn(&'a u8)) => well_typed(foo)`.

One more example:

```rust,ignore
fn foo(_: dyn Debug)
```

This function claims that there exists some type `T` that implements `Debug`
such that the function is well-typed: `∃ T: (T: Debug) and well_typed(foo)`.
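The contrast can also be seen in runnable form. In this sketch (the helper names `describe_all` and `describe_dyn` are made up for illustration), a generic function is checked once for *all* types `T: Debug`, while a trait object only promises that *some* `Debug` type sits behind the reference:

```rust
use std::fmt::Debug;

// Universal: ∀ T where T: Debug. The caller picks T, and the body must be
// well-typed for every such T.
fn describe_all<T: Debug>(x: T) -> String {
    format!("{x:?}")
}

// Existential (via a trait object): the callee only knows that *some* type
// implementing Debug exists behind the reference.
fn describe_dyn(x: &dyn Debug) -> String {
    format!("{x:?}")
}

fn main() {
    // The same generic function works for unrelated types...
    println!("{}", describe_all(5u8));
    println!("{}", describe_all("hi"));

    // ...and a heterogeneous collection is possible precisely because each
    // element is "some type that implements Debug".
    let v: Vec<Box<dyn Debug>> = vec![Box::new(1), Box::new("two")];
    for x in &v {
        println!("{}", describe_dyn(x.as_ref()));
    }
}
```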

<a name="variance"></a>

## What is a de Bruijn Index?

[De Bruijn indices][wikideb] are a way of representing, using only integers,
which variables are bound in which binders. They were originally invented for
use in lambda calculus evaluation (see [this Wikipedia article][wikideb] for
more). In `rustc`, we use de Bruijn indices to [represent generic types][sub].

[wikideb]: https://en.wikipedia.org/wiki/De_Bruijn_index
[sub]: ../generics.md

Here is a basic example of how de Bruijn indices might be used for closures (we
don't actually do this in `rustc` though!):

```rust,ignore
|x| {
    f(x) // de Bruijn index of `x` is 1 because `x` is bound 1 level up

    |y| {
        g(x, y) // index of `x` is 2 because it is bound 2 levels up
                // index of `y` is 1 because it is bound 1 level up
    }
}
```
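Computing such an index is just a matter of searching a stack of binders from the innermost outward. A tiny sketch (purely illustrative; not how rustc represents binders):

```rust
// `binders` is ordered outermost..innermost, one entry per enclosing binder.
// The de Bruijn index is 1-based: 1 means "bound by the nearest binder".
// A variable no binder introduces is free, so there is no index for it.
fn de_bruijn_index(binders: &[&str], name: &str) -> Option<usize> {
    binders
        .iter()
        .rev() // search from the innermost binder outward
        .position(|&b| b == name)
        .map(|i| i + 1)
}

fn main() {
    // Inside the inner closure of `|x| { ... |y| g(x, y) ... }`,
    // the binder stack is ["x", "y"].
    let binders = ["x", "y"];
    println!("y -> {:?}", de_bruijn_index(&binders, "y")); // bound 1 level up
    println!("x -> {:?}", de_bruijn_index(&binders, "x")); // bound 2 levels up
    println!("z -> {:?}", de_bruijn_index(&binders, "z")); // free
}
```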

## What are co- and contra-variance?

Check out the subtyping chapter from the
[Rust Nomicon](https://doc.rust-lang.org/nomicon/subtyping.html).

See the [variance](../variance.html) chapter of this guide for more info on how
the type checker handles variance.

<a name="free-vs-bound"></a>

## What is a "free region" or a "free variable"? What about "bound region"?

Let's describe the concepts of free vs bound in terms of program
variables, since that's the thing we're most familiar with.

- Consider this expression, which creates a closure: `|a, b| a + b`.
  Here, the `a` and `b` in `a + b` refer to the arguments that the closure will
  be given when it is called. We say that the `a` and `b` there are **bound** to
  the closure, and that the closure signature `|a, b|` is a **binder** for the
  names `a` and `b` (because any references to `a` or `b` within refer to the
  variables that it introduces).
- Consider this expression: `a + b`. In this expression, `a` and `b` refer to
  local variables that are defined *outside* of the expression. We say that
  those variables **appear free** in the expression (i.e., they are **free**,
  not **bound** (tied up)).

So there you have it: a variable "appears free" in some
expression/statement/whatever if it refers to something defined
outside of that expression/statement/whatever. Equivalently, we can
then refer to the "free variables" of an expression – which is just
the set of variables that "appear free".

So what does this have to do with regions? Well, we can apply the
analogous concept to types and regions. For example, in the type `&'a
u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it
does not.
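The free/bound distinction can be made concrete as a tiny free-variable computation over a toy expression type (hypothetical types for illustration; rustc works over its own IRs):

```rust
use std::collections::BTreeSet;

// A toy expression language: variables, addition, and closures (binders).
enum Expr {
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Closure(Vec<String>, Box<Expr>), // `|params| body`
}

// Collect into `out` every variable that appears free in `e`, given the
// names currently bound by enclosing binders.
fn free_vars(e: &Expr, bound: &mut Vec<String>, out: &mut BTreeSet<String>) {
    match e {
        Expr::Var(name) => {
            // A variable appears free only if no enclosing binder introduces it.
            if !bound.contains(name) {
                out.insert(name.clone());
            }
        }
        Expr::Add(l, r) => {
            free_vars(l, bound, out);
            free_vars(r, bound, out);
        }
        Expr::Closure(params, body) => {
            let depth = bound.len();
            bound.extend(params.iter().cloned()); // params are bound inside
            free_vars(body, bound, out);
            bound.truncate(depth); // leave the binder's scope
        }
    }
}

fn main() {
    // `a + b` alone: both `a` and `b` appear free.
    // `|a, b| a + b`: the binder captures them, so nothing is free.
    let body = Expr::Add(
        Box::new(Expr::Var("a".into())),
        Box::new(Expr::Var("b".into())),
    );
    let closure = Expr::Closure(vec!["a".into(), "b".into()], Box::new(body));
    let mut free = BTreeSet::new();
    free_vars(&closure, &mut Vec::new(), &mut free);
    println!("free variables of |a, b| a + b: {free:?}");
}
```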

# Further Reading About Compilers

> Thanks to `mem`, `scottmcm`, and `Levi` on the official Discord for the
> recommendations, and to `tinaun` for posting a link to a [twitter thread from
> Graydon Hoare](https://twitter.com/graydon_pub/status/1039615569132118016)
> which had some more recommendations!
>
> Other sources: https://gcc.gnu.org/wiki/ListOfCompilerBooks
>
> If you have other suggestions, please feel free to open an issue or PR.

## Books

- [Types and Programming Languages](https://www.cis.upenn.edu/~bcpierce/tapl/)
- [Programming Language Pragmatics](https://www.cs.rochester.edu/~scott/pragmatics/)
- [Practical Foundations for Programming Languages](https://www.cs.cmu.edu/~rwh/pfpl/2nded.pdf)
- [Compilers: Principles, Techniques, and Tools, 2nd Edition](https://www.pearson.com/us/higher-education/program/Aho-Compilers-Principles-Techniques-and-Tools-2nd-Edition/PGM167067.html)
- [Garbage Collection: Algorithms for Automatic Dynamic Memory Management](https://www.cs.kent.ac.uk/people/staff/rej/gcbook/)
- [Linkers and Loaders](https://www.amazon.com/Linkers-Kaufmann-Software-Engineering-Programming/dp/1558604960) (There are also free versions of this, but the version we had linked seems to be offline at the moment.)
- [Advanced Compiler Design and Implementation](https://www.goodreads.com/book/show/887908.Advanced_Compiler_Design_and_Implementation)
- [Building an Optimizing Compiler](https://www.goodreads.com/book/show/2063103.Building_an_Optimizing_Compiler)
- [Crafting Interpreters](http://www.craftinginterpreters.com/)

## Courses

- [University of Oregon Programming Languages Summer School archive](https://www.cs.uoregon.edu/research/summerschool/archives.html)

## Wikis

- [Wikipedia](https://en.wikipedia.org/wiki/List_of_programming_languages_by_type)
- [Esoteric Programming Languages](https://esolangs.org/wiki/Main_Page)
- [Stanford Encyclopedia of Philosophy](https://plato.stanford.edu/index.html)
- [nLab](https://ncatlab.org/nlab/show/HomePage)

## Misc Papers and Blog Posts

- [Programming in Martin-Löf's Type Theory](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.6683&rep=rep1&type=pdf)
- [Polymorphism, Subtyping, and Type Inference in MLsub](https://dl.acm.org/doi/10.1145/3093333.3009882)