New upstream version 1.63.0+dfsg1
# PhantomData

When working with unsafe code, we can often end up in a situation where
types or lifetimes are logically associated with a struct, but not actually
part of a field. This most commonly occurs with lifetimes. For instance, the
`Iter` for `&'a [T]` is (approximately) defined as follows:

```rust,compile_fail
struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
}
```

However, because `'a` is unused within the struct's body, it's *unbounded*.
[Because of the troubles this has historically caused][unused-param],
unbounded lifetimes and types are *forbidden* in struct definitions.
Therefore we must somehow refer to these types in the body.
Correctly doing this is necessary to have correct variance and drop checking.

[unused-param]: https://rust-lang.github.io/rfcs/0738-variance.html#the-corner-case-unused-parameters-and-parameters-that-are-only-used-unsafely

We do this using `PhantomData`, which is a special marker type. `PhantomData`
consumes no space, but simulates a field of the given type for the purpose of
static analysis. This was deemed to be less error-prone than explicitly telling
the type-system the kind of variance that you want, while also providing other
useful things such as the information needed by drop check.

`Iter` logically contains a bunch of `&'a T`s, so this is exactly what we tell
the `PhantomData` to simulate:

```rust
use std::marker;

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: marker::PhantomData<&'a T>,
}
```

And that's it. The lifetime will be bounded, and your iterator will be covariant
over `'a` and `T`. Everything Just Works.
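To see the marker at work, here is a minimal sketch of how such an `Iter` could be built and driven. The constructor `new` and the helpers `shorten` and `sum_with_iter` are hypothetical names introduced only for this illustration, and the sketch ignores zero-sized `T` (for which `ptr == end` from the start); it is not how the standard library's iterator is written.

```rust
use std::marker::PhantomData;

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> Iter<'a, T> {
    // Hypothetical constructor: `'a` is tied to the borrowed slice only
    // through the `PhantomData<&'a T>` field.
    fn new(slice: &'a [T]) -> Self {
        let range = slice.as_ptr_range();
        Iter { ptr: range.start, end: range.end, _marker: PhantomData }
    }
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.ptr == self.end {
            None
        } else {
            // SAFETY: `ptr` lies within the slice borrowed for `'a` and has
            // not reached `end` yet (zero-sized `T` is not handled here).
            unsafe {
                let item = &*self.ptr;
                self.ptr = self.ptr.add(1);
                Some(item)
            }
        }
    }
}

// Compiles only because `Iter` is covariant over `'a`: an iterator over a
// longer-lived slice can be used where a shorter-lived one is expected.
fn shorten<'short, 'long: 'short, T>(it: Iter<'long, T>) -> Iter<'short, T> {
    it
}

// Hypothetical helper that drives the iterator to completion.
fn sum_with_iter(values: &[u32]) -> u32 {
    shorten(Iter::new(values)).sum()
}
```

Because `PhantomData` is zero-sized, this `Iter` is still exactly two pointers wide.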

## Generic parameters and drop-checking

In the past, there used to be another thing to take into consideration.

This very documentation used to say:

> Another important example is Vec, which is (approximately) defined as follows:
>
> ```rust
> struct Vec<T> {
>     data: *const T, // *const for variance!
>     len: usize,
>     cap: usize,
> }
> ```
>
> Unlike the previous example, it *appears* that everything is exactly as we
> want. Every generic argument to Vec shows up in at least one field.
> Good to go!
>
> Nope.
>
> The drop checker will generously determine that `Vec<T>` does not own any values
> of type T. This will in turn make it conclude that it doesn't need to worry
> about Vec dropping any T's in its destructor for determining drop check
> soundness. This will in turn allow people to create unsoundness using
> Vec's destructor.
>
> In order to tell the drop checker that we *do* own values of type T, and
> therefore may drop some T's when *we* drop, we must add an extra `PhantomData`
> saying exactly that:
>
> ```rust
> use std::marker;
>
> struct Vec<T> {
>     data: *const T, // *const for variance!
>     len: usize,
>     cap: usize,
>     _owns_T: marker::PhantomData<T>,
> }
> ```

But ever since [RFC 1238](https://rust-lang.github.io/rfcs/1238-nonparametric-dropck.html),
**this is no longer true or necessary**.

If you were to write:

```rust
struct Vec<T> {
    data: *const T, // `*const` for variance!
    len: usize,
    cap: usize,
}

# #[cfg(any())]
impl<T> Drop for Vec<T> { /* … */ }
```

then the existence of that `impl<T> Drop for Vec<T>` makes Rust consider
that `Vec<T>` _owns_ values of type `T` (more precisely: that it may use values of
type `T` in its `Drop` implementation), and Rust will thus not allow them to _dangle_
when a `Vec<T>` is dropped.

**Adding an extra `_owns_T: PhantomData<T>` field is thus _superfluous_ and accomplishes nothing**.
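As a small illustration of this point, the sketch below defines an owning pointer with a `Drop` impl and no `PhantomData<T>` at all. The names `MyBox`, `CountDrops` and `drops_observed` are hypothetical and exist only for this example; the allocation is borrowed from `Box` to keep the code short.

```rust
use std::cell::Cell;

// Hypothetical owning pointer: a raw pointer field and no `PhantomData<T>`.
struct MyBox<T> {
    ptr: *const T, // `*const` for variance!
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        MyBox { ptr: Box::into_raw(Box::new(value)) as *const T }
    }
}

// The presence of this impl is what makes the drop checker assume that a
// `T` may be used when a `MyBox<T>` is dropped.
impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::into_raw` in `new` and is released
        // only here, exactly once.
        unsafe { drop(Box::from_raw(self.ptr as *mut T)) }
    }
}

// Helper type that records each time one of its values is dropped.
struct CountDrops<'c>(&'c Cell<u32>);

impl Drop for CountDrops<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

fn drops_observed() -> u32 {
    let counter = Cell::new(0);
    {
        let _owned = MyBox::new(CountDrops(&counter));
    } // <- `_owned` is dropped here, which drops the inner `CountDrops`
    counter.get()
}
```

Here `counter` is declared before `_owned`, so it strictly outlives it and the drop checker accepts the code.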

___

But this situation can sometimes lead to overly restrictive code. That's why the
standard library uses an unstable and `unsafe` attribute to opt back into the old
"unchecked" drop-checking behavior that this very documentation warned about: the
`#[may_dangle]` attribute.

### An exception: the special case of the standard library and its unstable `#[may_dangle]`

This section can be skipped if you are only writing your own library code; but if you are
curious about what the standard library does with the actual `Vec` definition, you'll notice
that it still needs to use a `_owns_T: PhantomData<T>` field for soundness.

<details><summary>Click here to see why</summary>

Consider the following example:

```rust
fn main() {
    let mut v: Vec<&str> = Vec::new();
    let s: String = "Short-lived".into();
    v.push(&s);
    drop(s);
} // <- `v` is dropped here
```

With a classical `impl<T> Drop for Vec<T> {` definition, the above [is denied].

[is denied]: https://rust.godbolt.org/z/ans15Kqz3

Indeed, in this case we have a `Vec</* T = */ &'s str>`, a vector of `'s`-lived references
to `str`ings. But `let s: String` is dropped before the `Vec` is, and
thus `'s` **has expired** by the time the `Vec` is dropped and
`impl<'s> Drop for Vec<&'s str> {` is used.

This means that if such a `Drop` were to be used, it would be dealing with an _expired_, or
_dangling_ lifetime `'s`. But this is contrary to Rust principles, where by default all
Rust references involved in a function signature are non-dangling and valid to dereference.

This is why Rust has to conservatively deny this snippet.

And yet, in the case of the real `Vec`, the `Drop` impl does not care about `&'s str`,
_since it has no drop glue of its own_: it only wants to deallocate the backing buffer.

In other words, it would be nice if the above snippet were somehow accepted, by
special-casing `Vec`, or by relying on some special property of `Vec`: `Vec` could try to
_promise not to use the `&'s str`s it holds when being dropped_.

This is the kind of `unsafe` promise that can be expressed with `#[may_dangle]`:

```rust ,ignore
unsafe impl<#[may_dangle] 's> Drop for Vec<&'s str> { /* … */ }
```

or, more generally:

```rust ,ignore
unsafe impl<#[may_dangle] T> Drop for Vec<T> { /* … */ }
```

is the `unsafe` way to opt out of this conservative assumption that Rust's drop
checker makes about type parameters of a dropped instance not being allowed to dangle.

And when this is done, such as in the standard library, we need to be careful in the
case where `T` has drop glue of its own. In this instance, imagine replacing the
`&'s str`s with a `struct PrintOnDrop<'s> /* = */ (&'s str);` which would have a
`Drop` impl wherein the inner `&'s str` would be dereferenced and printed to the screen.

Indeed, `Drop for Vec<T> {`, before deallocating the backing buffer, does have to transitively
drop each `T` item when it has drop glue; in the case of `PrintOnDrop<'s>`, it means that
`Drop for Vec<PrintOnDrop<'s>>` has to transitively drop the `PrintOnDrop<'s>` elements before
deallocating the backing buffer.

So when we said that `'s` `#[may_dangle]`, it was an excessively loose statement. We'd rather want
to say: "`'s` may dangle provided it not be involved in some transitive drop glue". Or, more generally,
"`T` may dangle provided it not be involved in some transitive drop glue". This "exception to the
exception" is a pervasive situation whenever **we own a `T`**. That's why Rust's `#[may_dangle]` is
smart enough to know of this opt-out, and will thus be disabled _when the generic parameter is held
in an owned fashion_ by the fields of the struct.

This is why the standard library ends up with:

```rust
# #[cfg(any())]
// we pinky-swear not to use `T` when dropping a `Vec`…
unsafe impl<#[may_dangle] T> Drop for Vec<T> {
    fn drop(&mut self) {
        unsafe {
            if mem::needs_drop::<T>() {
                /* … except here, that is, … */
                ptr::drop_in_place::<[T]>(/* … */);
            }
            // …
            dealloc(/* … */)
            // …
        }
    }
}

struct Vec<T> {
    // … except for the fact that a `Vec` owns `T` items and
    // may thus be dropping `T` items on drop!
    _owns_T: core::marker::PhantomData<T>,

    ptr: *const T, // `*const` for variance (but this does not express ownership of a `T` *per se*)
    len: usize,
    cap: usize,
}
```
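The difference between plain references and elements with drop glue can be observed with the real `Vec` on stable Rust. The sketch below is only an illustration: `LogOnDrop` is a hypothetical variant of the `PrintOnDrop` described above that records into a shared log instead of printing, and both function names are made up for this example.

```rust
use std::cell::RefCell;

// Hypothetical variant of `PrintOnDrop`: it pushes the string into a shared
// log instead of printing it, so the effect can be inspected afterwards.
struct LogOnDrop<'s>(&'s str, &'s RefCell<Vec<String>>);

impl Drop for LogOnDrop<'_> {
    fn drop(&mut self) {
        // The `&'s str` is dereferenced here, so `'s` must still be live.
        self.1.borrow_mut().push(self.0.to_owned());
    }
}

// Plain references have no drop glue, so the standard `Vec` lets them
// dangle by the time the vector itself is dropped.
fn dangling_plain_references() -> usize {
    let mut v: Vec<&str> = Vec::new();
    let s: String = "Short-lived".into();
    v.push(&s);
    let len = v.len();
    drop(s);
    len
} // <- `v` is dropped here, after `s`

// Elements with drop glue are dropped along with the `Vec`, so whatever
// they borrow has to outlive the vector.
fn logged_on_drop() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    let s: String = "Long enough".into();
    {
        let mut v = Vec::new();
        v.push(LogOnDrop(&s, &log));
        // Adding `drop(s);` at this point is expected to be rejected,
        // because dropping `v` runs `LogOnDrop::drop`.
    } // <- `v` is dropped here, while `s` and `log` are still alive
    log.into_inner()
}
```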

</details>

___

Raw pointers that own an allocation are such a pervasive pattern that the
standard library made a utility for itself called `Unique<T>` which:

* wraps a `*const T` for variance
* includes a `PhantomData<T>`
* auto-derives `Send`/`Sync` as if `T` were contained
* marks the pointer as `NonZero` for the null-pointer optimization
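A rough approximation of these four properties can be written on stable Rust. The sketch below is not the standard library's definition: `MyUnique` and its methods are hypothetical names, and it assumes `std::ptr::NonNull<T>` as the non-null, covariant pointer in place of a `*const T` marked `NonZero`.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical stand-in for the standard library's internal `Unique<T>`.
struct MyUnique<T> {
    ptr: NonNull<T>,       // never null and covariant over `T`
    _owns: PhantomData<T>, // behaves as if a `T` were stored inline
}

// Assumption of this sketch: the wrapper is the sole owner of its `T`, so it
// is made exactly as `Send`/`Sync` as `T` itself.
unsafe impl<T: Send> Send for MyUnique<T> {}
unsafe impl<T: Sync> Sync for MyUnique<T> {}

impl<T> MyUnique<T> {
    // Returns `None` for a null pointer.
    fn new(ptr: *mut T) -> Option<Self> {
        NonNull::new(ptr).map(|ptr| MyUnique { ptr, _owns: PhantomData })
    }

    fn as_ptr(&self) -> *mut T {
        self.ptr.as_ptr()
    }
}

// Null-pointer optimization: `None` is stored in the null niche, so the
// `Option` wrapper costs no extra space.
fn option_is_pointer_sized() -> bool {
    size_of::<Option<MyUnique<u64>>>() == size_of::<*const u64>()
}
```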

## Table of `PhantomData` patterns

Here’s a table of all the wonderful ways `PhantomData` could be used:

| Phantom type                | `'a`      | `T`                         |
|-----------------------------|-----------|-----------------------------|
| `PhantomData<T>`            | -         | covariant (with drop check) |
| `PhantomData<&'a T>`        | covariant | covariant                   |
| `PhantomData<&'a mut T>`    | covariant | invariant                   |
| `PhantomData<*const T>`     | -         | covariant                   |
| `PhantomData<*mut T>`       | -         | invariant                   |
| `PhantomData<fn(T)>`        | -         | contravariant               |
| `PhantomData<fn() -> T>`    | -         | covariant                   |
| `PhantomData<fn(T) -> T>`   | -         | invariant                   |
| `PhantomData<Cell<&'a ()>>` | invariant | -                           |
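A few of these rows can be checked directly with the compiler. In the sketch below, the marker types `Covariant`, `Invariant` and `Contravariant` and the helper functions are hypothetical names chosen for this illustration; whether each conversion function compiles is what demonstrates the variance.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical marker types, one per variance.
struct Covariant<'a>(PhantomData<&'a ()>);
struct Invariant<'a>(PhantomData<Cell<&'a ()>>);
struct Contravariant<T>(PhantomData<fn(T)>);

// Covariance: a marker carrying the longer lifetime is accepted where one
// carrying the shorter lifetime is expected.
fn shorten<'short, 'long: 'short>(c: Covariant<'long>) -> Covariant<'short> {
    c
}

// Contravariance: the direction is reversed, so a marker over the
// shorter-lived reference converts to one over the longer-lived reference.
fn lengthen<'short, 'long: 'short>(
    c: Contravariant<&'short str>,
) -> Contravariant<&'long str> {
    c
}

// Invariance: no such conversion exists. The following is expected to be
// rejected if uncommented:
// fn shorten_inv<'short, 'long: 'short>(i: Invariant<'long>) -> Invariant<'short> { i }

// Every marker is zero-sized, whatever its variance.
fn marker_sizes() -> [usize; 3] {
    [
        size_of::<Covariant<'static>>(),
        size_of::<Invariant<'static>>(),
        size_of::<Contravariant<String>>(),
    ]
}
```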