# Subtyping and Variance

Subtyping is a relationship between types that allows statically typed
languages to be a bit more flexible and permissive.

Subtyping in Rust is a bit different from subtyping in other languages. This
makes it harder to give simple examples, which is a problem since subtyping,
and especially variance, is already hard to understand properly. As in,
even compiler writers mess it up all the time.

To keep things simple, this section will consider a small extension to the
Rust language that adds a new and simpler subtyping relationship. After
establishing concepts and issues under this simpler system,
we will then relate it back to how subtyping actually occurs in Rust.

So here's our simple extension, *Objective Rust*, featuring three new types:

```rust
trait Animal {
    fn snuggle(&self);
    fn eat(&mut self);
}

trait Cat: Animal {
    fn meow(&self);
}

trait Dog: Animal {
    fn bark(&self);
}
```

But unlike normal traits, we can use them as concrete and sized types, just like structs.

Now, say we have a very simple function that takes an Animal, like this:

<!-- ignore: simplified code -->
```rust,ignore
fn love(pet: Animal) {
    pet.snuggle();
}
```

By default, static types must match *exactly* for a program to compile. As such,
this code won't compile:

<!-- ignore: simplified code -->
```rust,ignore
let mr_snuggles: Cat = ...;
love(mr_snuggles);         // ERROR: expected Animal, found Cat
```

Mr. Snuggles is a Cat, and Cats aren't *exactly* Animals, so we can't love him! 😿

This is annoying because Cats *are* Animals. They support every operation
an Animal supports, so intuitively `love` shouldn't care if we pass it a `Cat`.
We should be able to just **forget** the non-animal parts of our `Cat`, as they
aren't necessary to love it.

This is exactly the problem that *subtyping* is intended to fix. Because Cats are just
Animals **and more**, we say Cat is a *subtype* of Animal (because Cats are a *subset*
of all the Animals). Equivalently, we say that Animal is a *supertype* of Cat.
With subtypes, we can tweak our overly strict static type system
with a simple rule: anywhere a value of type `T` is expected, we will also
accept values that are subtypes of `T`.

Or more concretely: anywhere an Animal is expected, a Cat or Dog will also work.

As we will see throughout the rest of this section, subtyping is a lot more complicated
and subtle than this, but this simple rule is a very good 99% intuition. And unless you
write unsafe code, the compiler will automatically handle all the corner cases for you.

But this is the Rustonomicon. We're writing unsafe code, so we need to understand how
this stuff really works, and how we can mess it up.

The core problem is that this rule, naively applied, will lead to *meowing Dogs*. That is,
we can convince someone that a Dog is actually a Cat. This completely destroys the fabric
of our static type system, making it worse than useless (and leading to Undefined Behavior).

Here's a simple example of this happening when we apply subtyping in a completely naive
"find and replace" way.

<!-- ignore: simplified code -->
```rust,ignore
fn evil_feeder(pet: &mut Animal) {
    let spike: Dog = ...;

    // `pet` is an Animal, and Dog is a subtype of Animal,
    // so this should be fine, right..?
    *pet = spike;
}

fn main() {
    let mut mr_snuggles: Cat = ...;
    evil_feeder(&mut mr_snuggles);  // Replaces mr_snuggles with a Dog
    mr_snuggles.meow();             // OH NO, MEOWING DOG!
}
```

Clearly, we need a more robust system than "find and replace". That system is *variance*,
which is a set of rules governing how subtyping should compose. Most importantly, variance
defines situations where subtyping should be disabled.

But before we get into variance, let's take a quick peek at where subtyping actually occurs in
Rust: *lifetimes*!

> NOTE: The typed-ness of lifetimes is a fairly arbitrary construct that some
> disagree with. However it simplifies our analysis to treat lifetimes
> and types uniformly.

Lifetimes are just regions of code, and regions can be partially ordered with the *contains*
(outlives) relationship. Subtyping on lifetimes is in terms of that relationship:
if `'big: 'small` ("big contains small" or "big outlives small"), then `'big` is a subtype
of `'small`. This is a large source of confusion, because it seems backwards
to many: the bigger region is a *subtype* of the smaller region. But it makes
sense if you consider our Animal example: Cat is an Animal *and more*,
just as `'big` is `'small` *and more*.

Put another way, if someone wants a reference that lives for `'small`,
usually what they actually mean is that they want a reference that lives
for *at least* `'small`. They don't actually care if the lifetimes match
exactly. So it should be ok for us to **forget** that something lives for
`'big` and only remember that it lives for `'small`.

The meowing dog problem for lifetimes will result in us being able to
store a short-lived reference in a place that expects a longer-lived one,
creating a dangling reference and letting us use-after-free.

It will be useful to note that `'static`, the forever lifetime, is a subtype of
every lifetime because by definition it outlives everything. We will be using
this relationship in later examples to keep them as simple as possible.
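
As a rough illustration of "forgetting" `'static`, here is a minimal sketch in real Rust. The helper names `longer` and `pick` are made up for this example. Both arguments of `longer` must share one lifetime, so the `&'static str` is quietly treated as living only as long as the borrow of `local`:

```rust
// Returns whichever string slice is longer; both inputs share one lifetime `'a`.
fn longer<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

// Hypothetical helper that mixes a `'static` reference with a short-lived one.
fn pick() -> String {
    let forever: &'static str = "meow"; // lives for the whole program
    let local = String::from("bark!");  // lives only until `pick` returns

    // `forever` is a `&'static str`, but since `'static` is a subtype of every
    // lifetime, it is accepted where a reference with the shorter lifetime
    // of `&local` is expected.
    let chosen = longer(forever, &local);
    chosen.to_string()
}
```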

With all that said, we still have no idea how to actually *use* subtyping of lifetimes,
because nothing ever has type `'a`. Lifetimes only occur as part of some larger type
like `&'a u32` or `IterMut<'a, u32>`. To apply lifetime subtyping, we need to know
how to compose subtyping. Once again, we need *variance*.

## Variance

Variance is where things get a bit complicated.

Variance is a property that *type constructors* have with respect to their
arguments. A type constructor in Rust is any generic type with unbound arguments.
For instance `Vec` is a type constructor that takes a type `T` and returns
`Vec<T>`. `&` and `&mut` are type constructors that take two inputs: a
lifetime, and a type to point to.

> NOTE: For convenience we will often refer to `F<T>` as a type constructor just so
> that we can easily talk about `T`. Hopefully this is clear in context.

A type constructor F's *variance* is how the subtyping of its inputs affects the
subtyping of its outputs. There are three kinds of variance in Rust. Given two
types `Sub` and `Super`, where `Sub` is a subtype of `Super`:

* `F` is *covariant* if `F<Sub>` is a subtype of `F<Super>` (subtyping "passes through")
* `F` is *contravariant* if `F<Super>` is a subtype of `F<Sub>` (subtyping is "inverted")
* `F` is *invariant* otherwise (no subtyping relationship exists)

If `F` has multiple type parameters, we can talk about the individual variances
by saying that, for example, `F<T, U>` is covariant over `T` and invariant over `U`.

It is very useful to keep in mind that covariance is, in practical terms, "the"
variance. Almost all consideration of variance is in terms of whether something
should be covariant or invariant. Actually witnessing contravariance is quite difficult
in Rust, though it does in fact exist.

Here is a table of important variances which the rest of this section will be devoted
to trying to explain:

|   |                 |    'a     |         T         |     U     |
|---|-----------------|:---------:|:-----------------:|:---------:|
| * | `&'a T `        | covariant | covariant         |           |
| * | `&'a mut T`     | covariant | invariant         |           |
| * | `Box<T>`        |           | covariant         |           |
|   | `Vec<T>`        |           | covariant         |           |
| * | `UnsafeCell<T>` |           | invariant         |           |
|   | `Cell<T>`       |           | invariant         |           |
| * | `fn(T) -> U`    |           | **contra**variant | covariant |
|   | `*const T`      |           | covariant         |           |
|   | `*mut T`        |           | invariant         |           |

The types with \*'s are the ones we will be focusing on, as they are in
some sense "fundamental". All the others can be understood by analogy to these:

* `Vec<T>` and all other owning pointers and collections follow the same logic as `Box<T>`
* `Cell<T>` and all other interior mutability types follow the same logic as `UnsafeCell<T>`
* `*const T` follows the logic of `&T`
* `*mut T` follows the logic of `&mut T` (or `UnsafeCell<T>`)

For more types, see the ["Variance" section][variance-table] in the reference.

[variance-table]: ../reference/subtyping.html#variance

> NOTE: the *only* source of contravariance in the language is the arguments to
> a function, which is why it really doesn't come up much in practice. Invoking
> contravariance involves higher-order programming with function pointers that
> take references with specific lifetimes (as opposed to the usual "any lifetime",
> which gets into higher rank lifetimes, which work independently of subtyping).

Ok, that's enough type theory! Let's try to apply the concept of variance to Rust
and look at some examples.

First off, let's revisit the meowing dog example:

<!-- ignore: simplified code -->
```rust,ignore
fn evil_feeder(pet: &mut Animal) {
    let spike: Dog = ...;

    // `pet` is an Animal, and Dog is a subtype of Animal,
    // so this should be fine, right..?
    *pet = spike;
}

fn main() {
    let mut mr_snuggles: Cat = ...;
    evil_feeder(&mut mr_snuggles);  // Replaces mr_snuggles with a Dog
    mr_snuggles.meow();             // OH NO, MEOWING DOG!
}
```

If we look at our table of variances, we see that `&mut T` is *invariant* over `T`.
As it turns out, this completely fixes the issue! With invariance, the fact that
Cat is a subtype of Animal doesn't matter; `&mut Cat` still won't be a subtype of
`&mut Animal`. The static type checker will then correctly stop us from passing
a Cat into `evil_feeder`.

The soundness of subtyping is based on the idea that it's ok to forget unnecessary
details. But with references, there's always someone that remembers those details:
the value being referenced. That value expects those details to keep being true,
and may behave incorrectly if its expectations are violated.

The problem with making `&mut T` covariant over `T` is that it gives us the power
to modify the original value *when we don't remember all of its constraints*.
And so, we can make someone have a Dog when they're certain they still have a Cat.

With that established, we can easily see why `&T` being covariant over `T` *is*
sound: it doesn't let you modify the value, only look at it. Without any way to
mutate, there's no way for us to mess with any details. We can also see why
`UnsafeCell` and all the other interior mutability types must be invariant: they
make `&T` work like `&mut T`!
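
Since Objective Rust doesn't exist, the closest real-Rust analogue uses lifetimes in place of Cat and Animal. The sketch below (with a made-up function name, `view_as_shorter`) relies on `&'a T` being covariant over `T`: the inner `&'static str` can be viewed as a shorter-lived `&'a str` through a shared reference.

```rust
// Covariance of `&'a T` over `T`: the referent type `&'static str` is a
// subtype of `&'a str`, so `&'a &'static str` is a subtype of `&'a &'a str`.
fn view_as_shorter<'a>(r: &'a &'static str) -> &'a &'a str {
    r
}

// The same signature with `&'a mut` in place of `&'a` is rejected by the
// compiler, because `&mut T` is invariant over `T`.
```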

Now what about the lifetime on references? Why is it ok for both kinds of references
to be covariant over their lifetimes? Well, here's a two-pronged argument:

First and foremost, subtyping references based on their lifetimes is *the entire point
of subtyping in Rust*. The only reason we have subtyping is so we can pass
long-lived things where short-lived things are expected. So it better work!

Second, and more seriously, lifetimes are only a part of the reference itself. The
type of the referent is shared knowledge, which is why adjusting that type in only
one place (the reference) can lead to issues. But if you shrink down a reference's
lifetime when you hand it to someone, that lifetime information isn't shared in
any way. There are now two independent references with independent lifetimes.
There's no way to mess with the original reference's lifetime using the other one.

Or rather, the only way to mess with someone's lifetime is to build a meowing dog.
But as soon as you try to build a meowing dog, the lifetime should be wrapped up
in an invariant type, preventing the lifetime from being shrunk. To understand this
better, let's port the meowing dog problem over to real Rust.

In the meowing dog problem we take a subtype (Cat), convert it into a supertype
(Animal), and then use that fact to overwrite the subtype with a value that satisfies
the constraints of the supertype but not the subtype (Dog).

So with lifetimes, we want to take a long-lived thing, convert it into a
short-lived thing, and then use that to write something that doesn't live long
enough into the place expecting something long-lived.

Here it is:

```rust,compile_fail
fn evil_feeder<T>(input: &mut T, val: T) {
    *input = val;
}

fn main() {
    let mut mr_snuggles: &'static str = "meow! :3";  // mr. snuggles forever!!
    {
        let spike = String::from("bark! >:V");
        let spike_str: &str = &spike;                // Only lives for the block
        evil_feeder(&mut mr_snuggles, spike_str);    // EVIL!
    }
    println!("{}", mr_snuggles);                     // Use after free?
}
```

And what do we get when we run this?

```text
error[E0597]: `spike` does not live long enough
  --> src/main.rs:9:31
   |
6  |     let mut mr_snuggles: &'static str = "meow! :3";  // mr. snuggles forever!!
   |                          ------------ type annotation requires that `spike` is borrowed for `'static`
...
9  |         let spike_str: &str = &spike;                // Only lives for the block
   |                               ^^^^^^ borrowed value does not live long enough
10 |         evil_feeder(&mut mr_snuggles, spike_str);    // EVIL!
11 |     }
   |     - `spike` dropped here while still borrowed
```

Good, it doesn't compile! Let's break down what's happening here in detail.

First let's look at the new `evil_feeder` function:

```rust
fn evil_feeder<T>(input: &mut T, val: T) {
    *input = val;
}
```

All it does is take a mutable reference and a value and overwrite the referent with it.
What's important about this function is that it creates a type equality constraint. It
clearly says in its signature the referent and the value must be the *exact same* type.

Meanwhile, in the caller we pass in `&mut &'static str` and `&'spike_str str`.

Because `&mut T` is invariant over `T`, the compiler concludes it can't apply any subtyping
to the first argument, and so `T` must be exactly `&'static str`.

The other argument is only an `&'a str`, which *is* covariant over `'a`. So the compiler
adopts a constraint: `&'spike_str str` must be a subtype of `&'static str` (inclusive),
which in turn implies `'spike_str` must be a subtype of `'static` (inclusive). Which is to say,
`'spike_str` must contain `'static`. But only one thing contains `'static` -- `'static` itself!

This is why we get an error when we try to assign `&spike` to `spike_str`. The
compiler has worked backwards to conclude `spike_str` must live forever, and `&spike`
simply can't live that long.

So even though references are covariant over their lifetimes, they "inherit" invariance
whenever they're put into a context that could do something bad with that. In this case,
we inherited invariance as soon as we put our reference inside an `&mut T`.
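
For contrast, here is a sketch of a variant that does compile. It is not part of the example above; the wrapper function `feed` is made up for illustration. The only change is that `mr_snuggles` no longer claims to be `'static`, so covariance can shrink the literal's lifetime *before* it ends up behind an `&mut`:

```rust
fn evil_feeder<T>(input: &mut T, val: T) {
    *input = val;
}

// Hypothetical wrapper so the result can be inspected.
fn feed() -> String {
    let spike = String::from("bark! >:V");

    // No `'static` annotation: the literal is still a `&'static str`, but it is
    // stored as a shorter-lived `&str` that `spike` can also satisfy.
    let mut mr_snuggles: &str = "meow! :3";

    // `T` is now the same short-lived `&str` for both arguments, so the
    // type equality constraint imposed by `evil_feeder` is met.
    evil_feeder(&mut mr_snuggles, spike.as_str());
    mr_snuggles.to_string()
}
```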

As it turns out, the argument for why it's ok for Box (and Vec, HashMap, etc.) to
be covariant is pretty similar to the argument for why it's ok for
lifetimes to be covariant: as soon as you try to stuff them in something like a
mutable reference, they inherit invariance and you're prevented from doing anything
bad.

However, Box makes it easier to focus on the by-value aspect of references that we
partially glossed over.

Unlike a lot of languages which allow values to be freely aliased at all times,
Rust has a very strict rule: if you're allowed to mutate or move a value, you
are guaranteed to be the only one with access to it.

Consider the following code:

<!-- ignore: simplified code -->
```rust,ignore
let mr_snuggles: Box<Cat> = ...;
let spike: Box<Dog> = ...;

let mut pet: Box<Animal>;
pet = mr_snuggles;
pet = spike;
```

There is no problem at all with the fact that we have forgotten that `mr_snuggles` was a Cat,
or that we overwrote him with a Dog, because as soon as we moved `mr_snuggles` to a variable
that only knew he was an Animal, **we destroyed the only thing in the universe that
remembered he was a Cat**!

In contrast to the argument about immutable references being soundly covariant because they
don't let you change anything, owned values can be covariant because they make you
change *everything*. There is no connection between old locations and new locations.
Applying by-value subtyping is an irreversible act of knowledge destruction, and
without any memory of how things used to be, no one can be tricked into acting on
that old information!
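
The same by-value argument can be sketched in real Rust with lifetimes standing in for Cat and Dog. The names `forget_static` and `overwrite` below are invented for this example. Moving a `Box<&'static str>` into a slot that only promises a shorter lifetime is fine, and so is overwriting it afterwards with a genuinely short-lived reference:

```rust
// Covariance of `Box<T>` over `T`: moving the box "forgets" `'static`.
fn forget_static<'a>(b: Box<&'static str>) -> Box<&'a str> {
    b
}

// Hypothetical helper returning what `pet` held before and after the overwrite.
fn overwrite() -> (String, String) {
    let local = String::from("bark!");

    // After the move, nothing remembers that the boxed reference was `'static`.
    let mut pet: Box<&str> = forget_static(Box::new("meow"));
    let before = pet.to_string();

    // So replacing it with a reference that only lives as long as `local` is sound.
    pet = Box::new(local.as_str());
    (before, pet.to_string())
}
```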

Only one thing left to explain: function pointers.

To see why `fn(T) -> U` should be covariant over `U`, consider the following signature:

<!-- ignore: simplified code -->
```rust,ignore
fn get_animal() -> Animal;
```

This function claims to produce an Animal. As such, it is perfectly valid to
provide a function with the following signature instead:

<!-- ignore: simplified code -->
```rust,ignore
fn get_animal() -> Cat;
```

After all, Cats are Animals, so always producing a Cat is a perfectly valid way
to produce Animals. Or to relate it back to real Rust: if we need a function
that is supposed to produce something that lives for `'short`, it's perfectly
fine for it to produce something that lives for `'long`. We don't care, we can
just forget that fact.
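
In real Rust this can be sketched with function pointers whose return types differ only in lifetime. The function names `get_static` and `shorten_output` are assumptions made for this illustration:

```rust
// Produces something that lives forever.
fn get_static() -> &'static str {
    "meow"
}

// Covariance over the return type: a function pointer that produces a
// `&'static str` can be used wherever one producing a `&'a str` is expected.
fn shorten_output<'a>(f: fn() -> &'static str) -> fn() -> &'a str {
    f
}
```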

However, the same logic does not apply to *arguments*. Consider trying to satisfy:

<!-- ignore: simplified code -->
```rust,ignore
fn handle_animal(Animal);
```

with:

<!-- ignore: simplified code -->
```rust,ignore
fn handle_animal(Cat);
```

The first function can accept Dogs, but the second function absolutely can't.
Covariance doesn't work here. But if we flip it around, it actually *does*
work! If we need a function that can handle Cats, a function that can handle *any*
Animal will surely work fine. Or to relate it back to real Rust: if we need a
function that can handle anything that lives for at least `'long`, it's perfectly
fine for it to be able to handle anything that lives for at least `'short`.

And that's why function types, unlike anything else in the language, are
**contra**variant over their arguments.
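
A minimal sketch of contravariance in real Rust, again using lifetimes. The names `count_bytes` and `narrow_input` are hypothetical. A function pointer that accepts a reference with some specific lifetime `'a` can stand in for one that only ever receives `'static` references:

```rust
// Works for a string slice of any lifetime.
fn count_bytes(s: &str) -> usize {
    s.len()
}

// Contravariance over the argument type: `&'static str` is a subtype of
// `&'a str`, so `fn(&'a str) -> usize` is a subtype of `fn(&'static str) -> usize`.
fn narrow_input<'a>(f: fn(&'a str) -> usize) -> fn(&'static str) -> usize {
    f
}
```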

Now, this is all well and good for the types the standard library provides, but
how is variance determined for types that *you* define? A struct, informally
speaking, inherits the variance of its fields. If a struct `MyType`
has a generic argument `A` that is used in a field `a`, then MyType's variance
over `A` is exactly `a`'s variance over `A`.

However if `A` is used in multiple fields:

* If all uses of `A` are covariant, then MyType is covariant over `A`
* If all uses of `A` are contravariant, then MyType is contravariant over `A`
* Otherwise, MyType is invariant over `A`

```rust
use std::cell::Cell;

struct MyType<'a, 'b, A: 'a, B: 'b, C, D, E, F, G, H, In, Out, Mixed> {
    a: &'a A,     // covariant over 'a and A
    b: &'b mut B, // covariant over 'b and invariant over B

    c: *const C,  // covariant over C
    d: *mut D,    // invariant over D

    e: E,         // covariant over E
    f: Vec<F>,    // covariant over F
    g: Cell<G>,   // invariant over G

    h1: H,        // would also be covariant over H except...
    h2: Cell<H>,  // invariant over H, because invariance wins all conflicts

    i: fn(In) -> Out,       // contravariant over In, covariant over Out

    k1: fn(Mixed) -> usize, // would be contravariant over Mixed except...
    k2: Mixed,              // invariant over Mixed, because invariance wins all conflicts
}
```
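
To see inherited variance do something observable, here is a small sketch with a made-up struct `Collar`. Its only use of `'a` is in a shared reference, so it should be covariant over `'a`, which lets a `Collar<'static>` be passed off as a `Collar<'a>`:

```rust
// `name` is covariant over 'a, and it is the only use of 'a,
// so `Collar<'a>` is covariant over 'a as well.
struct Collar<'a> {
    name: &'a str,
}

// Compiles only because `Collar` is covariant over its lifetime parameter.
// If `name` were a `Cell<&'a str>` instead, `Collar` would be invariant
// over 'a and this function would be rejected.
fn shrink<'a>(c: Collar<'static>) -> Collar<'a> {
    c
}
```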