src/doc/book/choosing-your-guarantees.md

   1 % Choosing your Guarantees
   2
   3 One important feature of Rust is that it lets us control the costs and guarantees
   4 of a program.
   5
   6 There are various &ldquo;wrapper type&rdquo; abstractions in the Rust standard library which embody
   7 a multitude of tradeoffs between cost, ergonomics, and guarantees. Many let one choose between
   8 run time and compile time enforcement. This section will explain a few selected abstractions in
   9 detail.
  10
  11 Before proceeding, it is highly recommended that one reads about [ownership][ownership] and
  12 [borrowing][borrowing] in Rust.
  13
  14 [ownership]: ownership.html
  15 [borrowing]: references-and-borrowing.html
  16
  17 # Basic pointer types
  18
  19 ## `Box<T>`
  20
  21 [`Box<T>`][box] is an &ldquo;owned&rdquo; pointer, or a &ldquo;box&rdquo;. While it can hand
  22 out references to the contained data, it is the only owner of the data. In particular, consider
  23 the following:
  24
  25 ```rust
  26 let x = Box::new(1);
  27 let y = x;
  28 // `x` is no longer accessible here.
  29 ```
  30
  31 Here, the box was _moved_ into `y`. As `x` no longer owns it, the compiler will no longer allow the
  32 programmer to use `x` after this. A box can similarly be moved _out_ of a function by returning it.
  33
  34 When a box (that hasn't been moved) goes out of scope, destructors are run. These destructors take
  35 care of deallocating the inner data.
  36
  37 This is a zero-cost abstraction for dynamic allocation. If you want to allocate some memory on the
  38 heap and safely pass around a pointer to that memory, this is ideal. Note that you will only be
  39 allowed to share references to this by the regular borrowing rules, checked at compile time.
  40
  41 [box]: ../std/boxed/struct.Box.html
  42
  43 ## `&T` and `&mut T`
  44
  45 These are immutable and mutable references respectively. They follow the &ldquo;read-write lock&rdquo;
  46 pattern, such that one may either have only one mutable reference to some data, or any number of
  47 immutable ones, but not both. This guarantee is enforced at compile time, and has no visible cost at
  48 runtime. In most cases these two pointer types suffice for sharing cheap references between sections
  49 of code.
  50
  51 These pointers cannot be copied in such a way that they outlive the lifetime associated with them.
  52
  53 ## `*const T` and `*mut T`
  54
  55 These are C-like raw pointers with no lifetime or ownership attached to them. They point to
  56 some location in memory with no other restrictions. The only guarantee that these provide is that
  57 they cannot be dereferenced except in code marked `unsafe`.
  58
  59 These are useful when building safe, low cost abstractions like `Vec<T>`, but should be avoided in
  60 safe code.
  61
  62 ## `Rc<T>`
  63
  64 This is the first wrapper we will cover that has a runtime cost.
  65
  66 [`Rc<T>`][rc] is a reference counted pointer. In other words, this lets us have multiple "owning"
  67 pointers to the same data, and the data will be dropped (destructors will be run) when all pointers
  68 are out of scope.
  69
  70 Internally, it contains a shared &ldquo;reference count&rdquo; (also called &ldquo;refcount&rdquo;),
  71 which is incremented each time the `Rc` is cloned, and decremented each time one of the `Rc`s goes
  72 out of scope. The main responsibility of `Rc<T>` is to ensure that destructors are called for shared
  73 data.
  74
  75 The internal data here is immutable, and if a cycle of references is created, the data will be
  76 leaked. If we want data that doesn't leak when there are cycles, we need a garbage collector.
  77
  78 #### Guarantees
  79
  80 The main guarantee provided here is that the data will not be destroyed until all references to it
  81 are out of scope.
  82
  83 This should be used when we wish to dynamically allocate and share some data (read-only) between
  84 various portions of your program, where it is not certain which portion will finish using the pointer
  85 last. It's a viable alternative to `&T` when `&T` is either impossible to statically check for
  86 correctness, or creates extremely unergonomic code where the programmer does not wish to spend the
  87 development cost of working with.
  88
  89 This pointer is _not_ thread safe, and Rust will not let it be sent or shared with other threads.
  90 This lets one avoid the cost of atomics in situations where they are unnecessary.
  91
  92 There is a sister smart pointer to this one, `Weak<T>`. This is a non-owning, but also non-borrowed,
  93 smart pointer. It is also similar to `&T`, but it is not restricted in lifetime&mdash;a `Weak<T>`
  94 can be held on to forever. However, it is possible that an attempt to access the inner data may fail
  95 and return `None`, since this can outlive the owned `Rc`s. This is useful for cyclic
  96 data structures and other things.
  97
  98 #### Cost
  99
 100 As far as memory goes, `Rc<T>` is a single allocation, though it will allocate two extra words (i.e.
 101 two `usize` values) as compared to a regular `Box<T>` (for "strong" and "weak" refcounts).
 102
 103 `Rc<T>` has the computational cost of incrementing/decrementing the refcount whenever it is cloned
 104 or goes out of scope respectively. Note that a clone will not do a deep copy, rather it will simply
 105 increment the inner reference count and return a copy of the `Rc<T>`.
 106
 107 [rc]: ../std/rc/struct.Rc.html
 108
 109 # Cell types
 110
 111 `Cell`s provide interior mutability. In other words, they contain data which can be manipulated even
 112 if the type cannot be obtained in a mutable form (for example, when it is behind an `&`-ptr or
 113 `Rc<T>`).
 114
 115 [The documentation for the `cell` module has a pretty good explanation for these][cell-mod].
 116
 117 These types are _generally_ found in struct fields, but they may be found elsewhere too.
 118
 119 ## `Cell<T>`
 120
 121 [`Cell<T>`][cell] is a type that provides zero-cost interior mutability, but only for `Copy` types.
 122 Since the compiler knows that all the data owned by the contained value is on the stack, there's
 123 no worry of leaking any data behind references (or worse!) by simply replacing the data.
 124
 125 It is still possible to violate your own invariants using this wrapper, so be careful when using it.
 126 If a field is wrapped in `Cell`, it's a nice indicator that the chunk of data is mutable and may not
 127 stay the same between the time you first read it and when you intend to use it.
 128
 129 ```rust
 130 use std::cell::Cell;
 131
 132 let x = Cell::new(1);
 133 let y = &x;
 134 let z = &x;
 135 x.set(2);
 136 y.set(3);
 137 z.set(4);
 138 println!("{}", x.get());
 139 ```
 140
 141 Note that here we were able to mutate the same value from various immutable references.
 142
 143 This has the same runtime cost as the following:
 144
 145 ```rust,ignore
 146 let mut x = 1;
 147 let y = &mut x;
 148 let z = &mut x;
 149 x = 2;
 150 *y = 3;
 151 *z = 4;
 152 println!("{}", x);
 153 ```
 154
 155 but it has the added benefit of actually compiling successfully.
 156
 157 #### Guarantees
 158
 159 This relaxes the &ldquo;no aliasing with mutability&rdquo; restriction in places where it's
 160 unnecessary. However, this also relaxes the guarantees that the restriction provides; so if your
 161 invariants depend on data stored within `Cell`, you should be careful.
 162
 163 This is useful for mutating primitives and other `Copy` types when there is no easy way of
 164 doing it in line with the static rules of `&` and `&mut`.
 165
 166 `Cell` does not let you obtain interior references to the data, which makes it safe to freely
 167 mutate.
 168
 169 #### Cost
 170
 171 There is no runtime cost to using `Cell<T>`, however if you are using it to wrap larger (`Copy`)
 172 structs, it might be worthwhile to instead wrap individual fields in `Cell<T>` since each write is
 173 otherwise a full copy of the struct.
 174
 175
 176 ## `RefCell<T>`
 177
 178 [`RefCell<T>`][refcell] also provides interior mutability, but isn't restricted to `Copy` types.
 179
 180 Instead, it has a runtime cost. `RefCell<T>` enforces the read-write lock pattern at runtime (it's
 181 like a single-threaded mutex), unlike `&T`/`&mut T` which do so at compile time. This is done by the
 182 `borrow()` and `borrow_mut()` functions, which modify an internal reference count and return smart
 183 pointers which can be dereferenced immutably and mutably respectively. The refcount is restored when
 184 the smart pointers go out of scope. With this system, we can dynamically ensure that there are never
 185 any other borrows active when a mutable borrow is active. If the programmer attempts to make such a
 186 borrow, the thread will panic.
 187
 188 ```rust
 189 use std::cell::RefCell;
 190
 191 let x = RefCell::new(vec![1,2,3,4]);
 192 {
 193     println!("{:?}", *x.borrow())
 194 }
 195
 196 {
 197     let mut my_ref = x.borrow_mut();
 198     my_ref.push(1);
 199 }
 200 ```
 201
 202 Similar to `Cell`, this is mainly useful for situations where it's hard or impossible to satisfy the
 203 borrow checker. Generally we know that such mutations won't happen in a nested form, but it's good
 204 to check.
 205
 206 For large, complicated programs, it becomes useful to put some things in `RefCell`s to make things
 207 simpler. For example, a lot of the maps in the `ctxt` struct in the Rust compiler internals
 208 are inside this wrapper. These are only modified once (during creation, which is not right after
 209 initialization) or a couple of times in well-separated places. However, since this struct is
 210 pervasively used everywhere, juggling mutable and immutable pointers would be hard (perhaps
 211 impossible) and probably form a soup of `&`-ptrs which would be hard to extend. On the other hand,
 212 the `RefCell` provides a cheap (not zero-cost) way of safely accessing these. In the future, if
 213 someone adds some code that attempts to modify the cell when it's already borrowed, it will cause a
 214 (usually deterministic) panic which can be traced back to the offending borrow.
 215
 216 Similarly, in Servo's DOM there is a lot of mutation, most of which is local to a DOM type, but some
 217 of which crisscrosses the DOM and modifies various things. Using `RefCell` and `Cell` to guard all
 218 mutation lets us avoid worrying about mutability everywhere, and it simultaneously highlights the
 219 places where mutation is _actually_ happening.
 220
 221 Note that `RefCell` should be avoided if a mostly simple solution is possible with `&` pointers.
 222
 223 #### Guarantees
 224
 225 `RefCell` relaxes the _static_ restrictions preventing aliased mutation, and replaces them with
 226 _dynamic_ ones. As such the guarantees have not changed.
 227
 228 #### Cost
 229
 230 `RefCell` does not allocate, but it contains an additional "borrow state"
 231 indicator (one word in size) along with the data.
 232
 233 At runtime each borrow causes a modification/check of the refcount.
 234
 235 [cell-mod]: ../std/cell/index.html
 236 [cell]: ../std/cell/struct.Cell.html
 237 [refcell]: ../std/cell/struct.RefCell.html
 238
 239 # Synchronous types
 240
 241 Many of the types above cannot be used in a threadsafe manner. Particularly, `Rc<T>` and
 242 `RefCell<T>`, which both use non-atomic reference counts (_atomic_ reference counts are those which
 243 can be incremented from multiple threads without causing a data race), cannot be used this way. This
 244 makes them cheaper to use, but we need thread safe versions of these too. They exist, in the form of
 245 `Arc<T>` and `Mutex<T>`/`RwLock<T>`
 246
 247 Note that the non-threadsafe types _cannot_ be sent between threads, and this is checked at compile
 248 time.
 249
 250 There are many useful wrappers for concurrent programming in the [sync][sync] module, but only the
 251 major ones will be covered below.
 252
 253 [sync]: ../std/sync/index.html
 254
 255 ## `Arc<T>`
 256
 257 [`Arc<T>`][arc] is a version of `Rc<T>` that uses an atomic reference count (hence, "Arc").
 258 This can be sent freely between threads.
 259
 260 C++'s `shared_ptr` is similar to `Arc`, however in the case of C++ the inner data is always mutable.
 261 For semantics similar to that from C++, we should use `Arc<Mutex<T>>`, `Arc<RwLock<T>>`, or
 262 `Arc<UnsafeCell<T>>`[^4] (`UnsafeCell<T>` is a cell type that can be used to hold any data and has
 263 no runtime cost, but accessing it requires `unsafe` blocks). The last one should only be used if we
 264 are certain that the usage won't cause any memory unsafety. Remember that writing to a struct is not
 265 an atomic operation, and many functions like `vec.push()` can reallocate internally and cause unsafe
 266 behavior, so even monotonicity may not be enough to justify `UnsafeCell`.
 267
 268 [^4]: `Arc<UnsafeCell<T>>` actually won't compile since `UnsafeCell<T>` isn't `Send` or `Sync`, but we can wrap it in a type and implement `Send`/`Sync` for it manually to get `Arc<Wrapper<T>>` where `Wrapper` is `struct Wrapper<T>(UnsafeCell<T>)`.
 269
 270 #### Guarantees
 271
 272 Like `Rc`, this provides the (thread safe) guarantee that the destructor for the internal data will
 273 be run when the last `Arc` goes out of scope (barring any cycles).
 274
 275 #### Cost
 276
 277 This has the added cost of using atomics for changing the refcount (which will happen whenever it is
 278 cloned or goes out of scope). When sharing data from an `Arc` in a single thread, it is preferable
 279 to share `&` pointers whenever possible.
 280
 281 [arc]: ../std/sync/struct.Arc.html
 282
 283 ## `Mutex<T>` and `RwLock<T>`
 284
 285 [`Mutex<T>`][mutex] and [`RwLock<T>`][rwlock] provide mutual-exclusion via RAII guards (guards are
 286 objects which maintain some state, like a lock, until their destructor is called). For both of
 287 these, the mutex is opaque until we call `lock()` on it, at which point the thread will block
 288 until a lock can be acquired, and then a guard will be returned. This guard can be used to access
 289 the inner data (mutably), and the lock will be released when the guard goes out of scope.
 290
 291 ```rust,ignore
 292 {
 293     let guard = mutex.lock();
 294     // `guard` dereferences mutably to the inner type.
 295     *guard += 1;
 296 } // Lock is released when destructor runs.
 297 ```
 298
 299
 300 `RwLock` has the added benefit of being efficient for multiple reads. It is always safe to have
 301 multiple readers to shared data as long as there are no writers; and `RwLock` lets readers acquire a
 302 "read lock". Such locks can be acquired concurrently and are kept track of via a reference count.
 303 Writers must obtain a "write lock" which can only be obtained when all readers have gone out of
 304 scope.
 305
 306 #### Guarantees
 307
 308 Both of these provide safe shared mutability across threads, however they are prone to deadlocks.
 309 Some level of additional protocol safety can be obtained via the type system.
 310
 311 #### Costs
 312
 313 These use internal atomic-like types to maintain the locks, which are pretty costly (they can block
 314 all memory reads across processors till they're done). Waiting on these locks can also be slow when
 315 there's a lot of concurrent access happening.
 316
 317 [rwlock]: ../std/sync/struct.RwLock.html
 318 [mutex]: ../std/sync/struct.Mutex.html
 319 [sessions]: https://github.com/Munksgaard/rust-sessions
 320
 321 # Composition
 322
 323 A common gripe when reading Rust code is with types like `Rc<RefCell<Vec<T>>>` (or even more
 324 complicated compositions of such types). It's not always clear what the composition does, or why the
 325 author chose one like this (and when one should be using such a composition in one's own code)
 326
 327 Usually, it's a case of composing together the guarantees that you need, without paying for stuff
 328 that is unnecessary.
 329
 330 For example, `Rc<RefCell<T>>` is one such composition. `Rc<T>` itself can't be dereferenced mutably;
 331 because `Rc<T>` provides sharing and shared mutability can lead to unsafe behavior, so we put
 332 `RefCell<T>` inside to get dynamically verified shared mutability. Now we have shared mutable data,
 333 but it's shared in a way that there can only be one mutator (and no readers) or multiple readers.
 334
 335 Now, we can take this a step further, and have `Rc<RefCell<Vec<T>>>` or `Rc<Vec<RefCell<T>>>`. These
 336 are both shareable, mutable vectors, but they're not the same.
 337
 338 With the former, the `RefCell<T>` is wrapping the `Vec<T>`, so the `Vec<T>` in its entirety is
 339 mutable. At the same time, there can only be one mutable borrow of the whole `Vec` at a given time.
 340 This means that your code cannot simultaneously work on different elements of the vector from
 341 different `Rc` handles. However, we are able to push and pop from the `Vec<T>` at will. This is
 342 similar to a `&mut Vec<T>` with the borrow checking done at runtime.
 343
 344 With the latter, the borrowing is of individual elements, but the overall vector is immutable. Thus,
 345 we can independently borrow separate elements, but we cannot push or pop from the vector. This is
 346 similar to a `&mut [T]`[^3], but, again, the borrow checking is at runtime.
 347
 348 In concurrent programs, we have a similar situation with `Arc<Mutex<T>>`, which provides shared
 349 mutability and ownership.
 350
 351 When reading code that uses these, go in step by step and look at the guarantees/costs provided.
 352
 353 When choosing a composed type, we must do the reverse; figure out which guarantees we want, and at
 354 which point of the composition we need them. For example, if there is a choice between
 355 `Vec<RefCell<T>>` and `RefCell<Vec<T>>`, we should figure out the tradeoffs as done above and pick
 356 one.
 357
 358 [^3]: `&[T]` and `&mut [T]` are _slices_; they consist of a pointer and a length and can refer to a portion of a vector or array. `&mut [T]` can have its elements mutated, however its length cannot be touched.