]>
Commit | Line | Data |
---|---|---|
c1a9b12d SL |
1 | % Choosing your Guarantees |
2 | ||
e9174d1e | 3 | One important feature of Rust is that it lets us control the costs and guarantees |
c1a9b12d SL |
4 | of a program. |
5 | ||
6 | There are various “wrapper type” abstractions in the Rust standard library which embody | |
7 | a multitude of tradeoffs between cost, ergonomics, and guarantees. Many let one choose between | |
8 | run time and compile time enforcement. This section will explain a few selected abstractions in | |
9 | detail. | |
10 | ||
11 | Before proceeding, it is highly recommended that one reads about [ownership][ownership] and | |
12 | [borrowing][borrowing] in Rust. | |
13 | ||
14 | [ownership]: ownership.html | |
15 | [borrowing]: references-and-borrowing.html | |
16 | ||
17 | # Basic pointer types | |
18 | ||
19 | ## `Box<T>` | |
20 | ||
e9174d1e SL |
21 | [`Box<T>`][box] is an “owned” pointer, or a “box”. While it can hand |
22 | out references to the contained data, it is the only owner of the data. In particular, consider | |
23 | the following: | |
c1a9b12d SL |
24 | |
25 | ```rust | |
26 | let x = Box::new(1); | |
27 | let y = x; | |
28 | // x no longer accessible here | |
29 | ``` | |
30 | ||
31 | Here, the box was _moved_ into `y`. As `x` no longer owns it, the compiler will no longer allow the | |
32 | programmer to use `x` after this. A box can similarly be moved _out_ of a function by returning it. | |
33 | ||
34 | When a box (that hasn't been moved) goes out of scope, destructors are run. These destructors take | |
35 | care of deallocating the inner data. | |
36 | ||
37 | This is a zero-cost abstraction for dynamic allocation. If you want to allocate some memory on the | |
38 | heap and safely pass around a pointer to that memory, this is ideal. Note that you will only be | |
39 | allowed to share references to this by the regular borrowing rules, checked at compile time. | |
40 | ||
41 | [box]: ../std/boxed/struct.Box.html | |
42 | ||
e9174d1e | 43 | ## `&T` and `&mut T` |
c1a9b12d SL |
44 | |
45 | These are immutable and mutable references respectively. They follow the “read-write lock” | |
46 | pattern, such that one may either have only one mutable reference to some data, or any number of | |
47 | immutable ones, but not both. This guarantee is enforced at compile time, and has no visible cost at | |
48 | runtime. In most cases these two pointer types suffice for sharing cheap references between sections | |
49 | of code. | |
50 | ||
51 | These pointers cannot be copied in such a way that they outlive the lifetime associated with them. | |
52 | ||
53 | ## `*const T` and `*mut T` | |
54 | ||
9cc50fc6 | 55 | These are C-like raw pointers with no lifetime or ownership attached to them. They point to |
c1a9b12d SL |
56 | some location in memory with no other restrictions. The only guarantee that these provide is that |
57 | they cannot be dereferenced except in code marked `unsafe`. | |
58 | ||
59 | These are useful when building safe, low cost abstractions like `Vec<T>`, but should be avoided in | |
60 | safe code. | |
61 | ||
62 | ## `Rc<T>` | |
63 | ||
64 | This is the first wrapper we will cover that has a runtime cost. | |
65 | ||
66 | [`Rc<T>`][rc] is a reference counted pointer. In other words, this lets us have multiple "owning" | |
67 | pointers to the same data, and the data will be dropped (destructors will be run) when all pointers | |
68 | are out of scope. | |
69 | ||
70 | Internally, it contains a shared “reference count” (also called “refcount”), | |
71 | which is incremented each time the `Rc` is cloned, and decremented each time one of the `Rc`s goes | |
72 | out of scope. The main responsibility of `Rc<T>` is to ensure that destructors are called for shared | |
73 | data. | |
74 | ||
75 | The internal data here is immutable, and if a cycle of references is created, the data will be | |
76 | leaked. If we want data that doesn't leak when there are cycles, we need a garbage collector. | |
77 | ||
78 | #### Guarantees | |
79 | ||
80 | The main guarantee provided here is that the data will not be destroyed until all references to it | |
81 | are out of scope. | |
82 | ||
83 | This should be used when we wish to dynamically allocate and share some data (read-only) between | |
84 | various portions of your program, where it is not certain which portion will finish using the pointer | |
85 | last. It's a viable alternative to `&T` when `&T` is either impossible to statically check for | |
86 | correctness, or creates extremely unergonomic code where the programmer does not wish to spend the | |
87 | development cost of working with. | |
88 | ||
89 | This pointer is _not_ thread safe, and Rust will not let it be sent or shared with other threads. | |
90 | This lets one avoid the cost of atomics in situations where they are unnecessary. | |
91 | ||
92 | There is a sister smart pointer to this one, `Weak<T>`. This is a non-owning, but also non-borrowed, | |
93 | smart pointer. It is also similar to `&T`, but it is not restricted in lifetime—a `Weak<T>` | |
94 | can be held on to forever. However, it is possible that an attempt to access the inner data may fail | |
95 | and return `None`, since this can outlive the owned `Rc`s. This is useful for cyclic | |
96 | data structures and other things. | |
97 | ||
98 | #### Cost | |
99 | ||
100 | As far as memory goes, `Rc<T>` is a single allocation, though it will allocate two extra words (i.e. | |
101 | two `usize` values) as compared to a regular `Box<T>` (for "strong" and "weak" refcounts). | |
102 | ||
103 | `Rc<T>` has the computational cost of incrementing/decrementing the refcount whenever it is cloned | |
104 | or goes out of scope respectively. Note that a clone will not do a deep copy, rather it will simply | |
105 | increment the inner reference count and return a copy of the `Rc<T>`. | |
106 | ||
107 | [rc]: ../std/rc/struct.Rc.html | |
108 | ||
109 | # Cell types | |
110 | ||
111 | `Cell`s provide interior mutability. In other words, they contain data which can be manipulated even | |
112 | if the type cannot be obtained in a mutable form (for example, when it is behind an `&`-ptr or | |
113 | `Rc<T>`). | |
114 | ||
115 | [The documentation for the `cell` module has a pretty good explanation for these][cell-mod]. | |
116 | ||
117 | These types are _generally_ found in struct fields, but they may be found elsewhere too. | |
118 | ||
119 | ## `Cell<T>` | |
120 | ||
121 | [`Cell<T>`][cell] is a type that provides zero-cost interior mutability, but only for `Copy` types. | |
122 | Since the compiler knows that all the data owned by the contained value is on the stack, there's | |
123 | no worry of leaking any data behind references (or worse!) by simply replacing the data. | |
124 | ||
125 | It is still possible to violate your own invariants using this wrapper, so be careful when using it. | |
126 | If a field is wrapped in `Cell`, it's a nice indicator that the chunk of data is mutable and may not | |
127 | stay the same between the time you first read it and when you intend to use it. | |
128 | ||
129 | ```rust | |
130 | use std::cell::Cell; | |
131 | ||
132 | let x = Cell::new(1); | |
133 | let y = &x; | |
134 | let z = &x; | |
135 | x.set(2); | |
136 | y.set(3); | |
137 | z.set(4); | |
138 | println!("{}", x.get()); | |
139 | ``` | |
140 | ||
141 | Note that here we were able to mutate the same value from various immutable references. | |
142 | ||
143 | This has the same runtime cost as the following: | |
144 | ||
145 | ```rust,ignore | |
146 | let mut x = 1; | |
147 | let y = &mut x; | |
148 | let z = &mut x; | |
149 | x = 2; | |
150 | *y = 3; | |
151 | *z = 4; | |
152 | println!("{}", x); | |
153 | ``` | |
154 | ||
155 | but it has the added benefit of actually compiling successfully. | |
156 | ||
157 | #### Guarantees | |
158 | ||
159 | This relaxes the “no aliasing with mutability” restriction in places where it's | |
160 | unnecessary. However, this also relaxes the guarantees that the restriction provides; so if your | |
161 | invariants depend on data stored within `Cell`, you should be careful. | |
162 | ||
163 | This is useful for mutating primitives and other `Copy` types when there is no easy way of | |
164 | doing it in line with the static rules of `&` and `&mut`. | |
165 | ||
166 | `Cell` does not let you obtain interior references to the data, which makes it safe to freely | |
167 | mutate. | |
168 | ||
169 | #### Cost | |
170 | ||
171 | There is no runtime cost to using `Cell<T>`, however if you are using it to wrap larger (`Copy`) | |
172 | structs, it might be worthwhile to instead wrap individual fields in `Cell<T>` since each write is | |
173 | otherwise a full copy of the struct. | |
174 | ||
175 | ||
176 | ## `RefCell<T>` | |
177 | ||
178 | [`RefCell<T>`][refcell] also provides interior mutability, but isn't restricted to `Copy` types. | |
179 | ||
180 | Instead, it has a runtime cost. `RefCell<T>` enforces the read-write lock pattern at runtime (it's | |
181 | like a single-threaded mutex), unlike `&T`/`&mut T` which do so at compile time. This is done by the | |
182 | `borrow()` and `borrow_mut()` functions, which modify an internal reference count and return smart | |
183 | pointers which can be dereferenced immutably and mutably respectively. The refcount is restored when | |
184 | the smart pointers go out of scope. With this system, we can dynamically ensure that there are never | |
185 | any other borrows active when a mutable borrow is active. If the programmer attempts to make such a | |
186 | borrow, the thread will panic. | |
187 | ||
188 | ```rust | |
189 | use std::cell::RefCell; | |
190 | ||
191 | let x = RefCell::new(vec![1,2,3,4]); | |
192 | { | |
193 | println!("{:?}", *x.borrow()) | |
194 | } | |
195 | ||
196 | { | |
197 | let mut my_ref = x.borrow_mut(); | |
198 | my_ref.push(1); | |
199 | } | |
200 | ``` | |
201 | ||
202 | Similar to `Cell`, this is mainly useful for situations where it's hard or impossible to satisfy the | |
203 | borrow checker. Generally we know that such mutations won't happen in a nested form, but it's good | |
204 | to check. | |
205 | ||
206 | For large, complicated programs, it becomes useful to put some things in `RefCell`s to make things | |
54a0048b | 207 | simpler. For example, a lot of the maps in the `ctxt` struct in the Rust compiler internals |
c1a9b12d SL |
208 | are inside this wrapper. These are only modified once (during creation, which is not right after |
209 | initialization) or a couple of times in well-separated places. However, since this struct is | |
210 | pervasively used everywhere, juggling mutable and immutable pointers would be hard (perhaps | |
211 | impossible) and probably form a soup of `&`-ptrs which would be hard to extend. On the other hand, | |
212 | the `RefCell` provides a cheap (not zero-cost) way of safely accessing these. In the future, if | |
213 | someone adds some code that attempts to modify the cell when it's already borrowed, it will cause a | |
214 | (usually deterministic) panic which can be traced back to the offending borrow. | |
215 | ||
216 | Similarly, in Servo's DOM there is a lot of mutation, most of which is local to a DOM type, but some | |
217 | of which crisscrosses the DOM and modifies various things. Using `RefCell` and `Cell` to guard all | |
218 | mutation lets us avoid worrying about mutability everywhere, and it simultaneously highlights the | |
219 | places where mutation is _actually_ happening. | |
220 | ||
221 | Note that `RefCell` should be avoided if a mostly simple solution is possible with `&` pointers. | |
222 | ||
223 | #### Guarantees | |
224 | ||
225 | `RefCell` relaxes the _static_ restrictions preventing aliased mutation, and replaces them with | |
226 | _dynamic_ ones. As such the guarantees have not changed. | |
227 | ||
228 | #### Cost | |
229 | ||
230 | `RefCell` does not allocate, but it contains an additional "borrow state" | |
231 | indicator (one word in size) along with the data. | |
232 | ||
233 | At runtime each borrow causes a modification/check of the refcount. | |
234 | ||
235 | [cell-mod]: ../std/cell/ | |
236 | [cell]: ../std/cell/struct.Cell.html | |
237 | [refcell]: ../std/cell/struct.RefCell.html | |
c1a9b12d SL |
238 | |
239 | # Synchronous types | |
240 | ||
241 | Many of the types above cannot be used in a threadsafe manner. Particularly, `Rc<T>` and | |
242 | `RefCell<T>`, which both use non-atomic reference counts (_atomic_ reference counts are those which | |
243 | can be incremented from multiple threads without causing a data race), cannot be used this way. This | |
244 | makes them cheaper to use, but we need thread safe versions of these too. They exist, in the form of | |
e9174d1e | 245 | `Arc<T>` and `Mutex<T>`/`RwLock<T>` |
c1a9b12d SL |
246 | |
247 | Note that the non-threadsafe types _cannot_ be sent between threads, and this is checked at compile | |
248 | time. | |
249 | ||
250 | There are many useful wrappers for concurrent programming in the [sync][sync] module, but only the | |
251 | major ones will be covered below. | |
252 | ||
253 | [sync]: ../std/sync/index.html | |
254 | ||
255 | ## `Arc<T>` | |
256 | ||
9cc50fc6 | 257 | [`Arc<T>`][arc] is a version of `Rc<T>` that uses an atomic reference count (hence, "Arc"). |
c1a9b12d SL |
258 | This can be sent freely between threads. |
259 | ||
260 | C++'s `shared_ptr` is similar to `Arc`, however in the case of C++ the inner data is always mutable. | |
261 | For semantics similar to that from C++, we should use `Arc<Mutex<T>>`, `Arc<RwLock<T>>`, or | |
262 | `Arc<UnsafeCell<T>>`[^4] (`UnsafeCell<T>` is a cell type that can be used to hold any data and has | |
263 | no runtime cost, but accessing it requires `unsafe` blocks). The last one should only be used if we | |
264 | are certain that the usage won't cause any memory unsafety. Remember that writing to a struct is not | |
265 | an atomic operation, and many functions like `vec.push()` can reallocate internally and cause unsafe | |
266 | behavior, so even monotonicity may not be enough to justify `UnsafeCell`. | |
267 | ||
268 | [^4]: `Arc<UnsafeCell<T>>` actually won't compile since `UnsafeCell<T>` isn't `Send` or `Sync`, but we can wrap it in a type and implement `Send`/`Sync` for it manually to get `Arc<Wrapper<T>>` where `Wrapper` is `struct Wrapper<T>(UnsafeCell<T>)`. | |
269 | ||
270 | #### Guarantees | |
271 | ||
272 | Like `Rc`, this provides the (thread safe) guarantee that the destructor for the internal data will | |
273 | be run when the last `Arc` goes out of scope (barring any cycles). | |
274 | ||
275 | #### Cost | |
276 | ||
277 | This has the added cost of using atomics for changing the refcount (which will happen whenever it is | |
278 | cloned or goes out of scope). When sharing data from an `Arc` in a single thread, it is preferable | |
279 | to share `&` pointers whenever possible. | |
280 | ||
281 | [arc]: ../std/sync/struct.Arc.html | |
282 | ||
283 | ## `Mutex<T>` and `RwLock<T>` | |
284 | ||
285 | [`Mutex<T>`][mutex] and [`RwLock<T>`][rwlock] provide mutual-exclusion via RAII guards (guards are | |
286 | objects which maintain some state, like a lock, until their destructor is called). For both of | |
287 | these, the mutex is opaque until we call `lock()` on it, at which point the thread will block | |
288 | until a lock can be acquired, and then a guard will be returned. This guard can be used to access | |
289 | the inner data (mutably), and the lock will be released when the guard goes out of scope. | |
290 | ||
291 | ```rust,ignore | |
292 | { | |
293 | let guard = mutex.lock(); | |
294 | // guard dereferences mutably to the inner type | |
295 | *guard += 1; | |
296 | } // lock released when destructor runs | |
297 | ``` | |
298 | ||
299 | ||
300 | `RwLock` has the added benefit of being efficient for multiple reads. It is always safe to have | |
301 | multiple readers to shared data as long as there are no writers; and `RwLock` lets readers acquire a | |
302 | "read lock". Such locks can be acquired concurrently and are kept track of via a reference count. | |
303 | Writers must obtain a "write lock" which can only be obtained when all readers have gone out of | |
304 | scope. | |
305 | ||
306 | #### Guarantees | |
307 | ||
308 | Both of these provide safe shared mutability across threads, however they are prone to deadlocks. | |
309 | Some level of additional protocol safety can be obtained via the type system. | |
310 | ||
311 | #### Costs | |
312 | ||
313 | These use internal atomic-like types to maintain the locks, which are pretty costly (they can block | |
314 | all memory reads across processors till they're done). Waiting on these locks can also be slow when | |
315 | there's a lot of concurrent access happening. | |
316 | ||
317 | [rwlock]: ../std/sync/struct.RwLock.html | |
318 | [mutex]: ../std/sync/struct.Mutex.html | |
319 | [sessions]: https://github.com/Munksgaard/rust-sessions | |
320 | ||
321 | # Composition | |
322 | ||
b039eaaf | 323 | A common gripe when reading Rust code is with types like `Rc<RefCell<Vec<T>>>` (or even more |
c1a9b12d SL |
324 | complicated compositions of such types). It's not always clear what the composition does, or why the |
325 | author chose one like this (and when one should be using such a composition in one's own code) | |
326 | ||
327 | Usually, it's a case of composing together the guarantees that you need, without paying for stuff | |
328 | that is unnecessary. | |
329 | ||
330 | For example, `Rc<RefCell<T>>` is one such composition. `Rc<T>` itself can't be dereferenced mutably; | |
331 | because `Rc<T>` provides sharing and shared mutability can lead to unsafe behavior, so we put | |
332 | `RefCell<T>` inside to get dynamically verified shared mutability. Now we have shared mutable data, | |
333 | but it's shared in a way that there can only be one mutator (and no readers) or multiple readers. | |
334 | ||
335 | Now, we can take this a step further, and have `Rc<RefCell<Vec<T>>>` or `Rc<Vec<RefCell<T>>>`. These | |
336 | are both shareable, mutable vectors, but they're not the same. | |
337 | ||
338 | With the former, the `RefCell<T>` is wrapping the `Vec<T>`, so the `Vec<T>` in its entirety is | |
339 | mutable. At the same time, there can only be one mutable borrow of the whole `Vec` at a given time. | |
340 | This means that your code cannot simultaneously work on different elements of the vector from | |
341 | different `Rc` handles. However, we are able to push and pop from the `Vec<T>` at will. This is | |
9cc50fc6 | 342 | similar to a `&mut Vec<T>` with the borrow checking done at runtime. |
c1a9b12d SL |
343 | |
344 | With the latter, the borrowing is of individual elements, but the overall vector is immutable. Thus, | |
345 | we can independently borrow separate elements, but we cannot push or pop from the vector. This is | |
9cc50fc6 | 346 | similar to a `&mut [T]`[^3], but, again, the borrow checking is at runtime. |
c1a9b12d SL |
347 | |
348 | In concurrent programs, we have a similar situation with `Arc<Mutex<T>>`, which provides shared | |
349 | mutability and ownership. | |
350 | ||
351 | When reading code that uses these, go in step by step and look at the guarantees/costs provided. | |
352 | ||
353 | When choosing a composed type, we must do the reverse; figure out which guarantees we want, and at | |
354 | which point of the composition we need them. For example, if there is a choice between | |
355 | `Vec<RefCell<T>>` and `RefCell<Vec<T>>`, we should figure out the tradeoffs as done above and pick | |
356 | one. | |
357 | ||
358 | [^3]: `&[T]` and `&mut [T]` are _slices_; they consist of a pointer and a length and can refer to a portion of a vector or array. `&mut [T]` can have its elements mutated, however its length cannot be touched. |