src/doc/nomicon/src/leaking.md

# Leaking

Ownership-based resource management is intended to simplify composition. You
acquire resources when you create the object, and you release the resources when
it gets destroyed. Since destruction is handled for you, it means you can't
forget to release the resources, and it happens as soon as possible! Surely this
is perfect and all of our problems are solved.

Everything is terrible and we have new and exotic problems to try to solve.

Many people like to believe that Rust eliminates resource leaks. In practice,
this is basically true. You would be surprised to see a Safe Rust program
leak resources in an uncontrolled way.

However from a theoretical perspective this is absolutely not the case, no
matter how you look at it. In the strictest sense, "leaking" is so abstract as
to be unpreventable. It's quite trivial to initialize a collection at the start
of a program, fill it with tons of objects with destructors, and then enter an
infinite event loop that never refers to it. The collection will sit around
uselessly, holding on to its precious resources until the program terminates (at
which point all those resources would have been reclaimed by the OS anyway).

We may consider a more restricted form of leak: failing to drop a value that is
unreachable. Rust also doesn't prevent this. In fact Rust *has a function for
doing this*: `mem::forget`. This function consumes the value it is passed *and
then doesn't run its destructor*.
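
As a small sketch of the difference, the following compares `drop` with
`mem::forget`. The `Noisy` type and the `drop_vs_forget` function are made up
for illustration; all they do is count how many times a destructor ran.

```rust
use std::cell::Cell;
use std::mem;

// Hypothetical type: its destructor bumps a counter so we can observe it.
struct Noisy<'a>(&'a Cell<usize>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns the destructor count seen after a `drop` and then after a `forget`.
fn drop_vs_forget() -> (usize, usize) {
    let drops = Cell::new(0);

    // A normal drop runs the destructor once.
    drop(Noisy(&drops));
    let after_drop = drops.get();

    // `forget` consumes the value, but the destructor never runs.
    mem::forget(Noisy(&drops));
    let after_forget = drops.get();

    (after_drop, after_forget)
}
```

Both counts come out as 1: the forgotten value is gone, yet its destructor
never fired.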

In the past `mem::forget` was marked as unsafe as a sort of lint against using
it, since failing to call a destructor is generally not a well-behaved thing to
do (though useful for some special unsafe code). However this was generally
determined to be an untenable stance to take: there are many ways to fail to
call a destructor in safe code. The most famous example is creating a cycle of
reference-counted pointers using interior mutability.
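
Here is a minimal sketch of such a cycle, written entirely in safe code. The
`Node` type and the `drops_observed` function are hypothetical names invented
for this example; the shared counter only exists so we can see which
destructors ran.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

// Hypothetical node type: counts destructor runs in a shared counter.
struct Node {
    drops: Rc<Cell<usize>>,
    // Interior mutability lets us point `a` at `b` after both exist.
    next: RefCell<Option<Rc<Node>>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Builds two nodes, optionally links them into a cycle, lets them go out of
// scope, and returns how many `Node` destructors ran.
fn drops_observed(make_cycle: bool) -> usize {
    let drops = Rc::new(Cell::new(0));
    {
        let a = Rc::new(Node {
            drops: Rc::clone(&drops),
            next: RefCell::new(None),
        });
        let b = Rc::new(Node {
            drops: Rc::clone(&drops),
            next: RefCell::new(Some(Rc::clone(&a))),
        });
        if make_cycle {
            // Now a -> b -> a. Neither strong count can ever reach zero.
            *a.next.borrow_mut() = Some(Rc::clone(&b));
        }
    }
    drops.get()
}
```

Without the cycle both nodes are dropped as usual. With it, no destructor runs
at all, and there isn't a single `unsafe` block or `mem::forget` in sight.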

It is reasonable for safe code to assume that destructor leaks do not happen, as
any program that leaks destructors is probably wrong. However *unsafe* code
cannot rely on destructors to be run in order to be safe. For most types this
doesn't matter: if you leak the destructor then the type is by definition
inaccessible, so it doesn't matter, right? For instance, if you leak a `Box<u8>`
then you waste some memory but that's hardly going to violate memory-safety.

However, we must be careful with destructor leaks when it comes to *proxy*
types. These are types which manage access to a distinct object, but don't
actually own it. Proxy objects are quite rare. Proxy objects you'll need to care
about are even rarer. However we'll focus on three interesting examples in the
standard library:

* `vec::Drain`
* `Rc`
* `thread::scoped::JoinGuard`

## Drain

`drain` is a collections API that moves data out of the container without
consuming the container. This enables us to reuse the allocation of a `Vec`
after claiming ownership over all of its contents. It produces an iterator
(Drain) that returns the contents of the Vec by-value.

Now, consider Drain in the middle of iteration: some values have been moved out,
and others haven't. This means that part of the Vec is now full of logically
uninitialized data! We could backshift all the elements in the Vec every time we
remove a value, but this would have pretty catastrophic performance
consequences.

Instead, we would like Drain to fix the Vec's backing storage when it is
dropped. It should run itself to completion, backshift any elements that weren't
removed (drain supports subranges), and then fix Vec's `len`. It's even
unwinding-safe! Easy!

Now consider the following:

<!-- ignore: simplified code -->
```rust,ignore
use std::mem;

let mut vec = vec![Box::new(0); 4];

{
    // start draining, vec can no longer be accessed
    let mut drainer = vec.drain(..);

    // pull out two elements and immediately drop them
    drainer.next();
    drainer.next();

    // get rid of drainer, but don't call its destructor
    mem::forget(drainer);
}

// Oops, vec[0] was dropped, we're reading a pointer into freed memory!
println!("{}", vec[0]);
```

This is pretty clearly Not Good. Unfortunately, we're kind of stuck between a
rock and a hard place: maintaining consistent state at every step has an
enormous cost (and would negate any benefits of the API). Failing to maintain
consistent state gives us Undefined Behavior in safe code (making the API
unsound).

So what can we do? Well, we can pick a trivially consistent state: set the Vec's
len to be 0 when we start the iteration, and fix it up if necessary in the
destructor. That way, if everything executes like normal we get the desired
behavior with minimal overhead. But if someone has the *audacity* to
mem::forget us in the middle of the iteration, all that does is *leak even more*
(and possibly leave the Vec in an unexpected but otherwise consistent state).
Since we've accepted that mem::forget is safe, this is definitely safe. We call
leaks causing more leaks a *leak amplification*.
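
To watch this happen with the real `Vec::drain`, here is a sketch that forgets a
half-used `Drain` and then looks at the Vec. The `Counted` type and the
`forget_drain` function are hypothetical helpers made up for the example.

```rust
use std::cell::Cell;
use std::mem;

// Hypothetical element type: counts how many destructors have run.
struct Counted<'a>(&'a Cell<usize>);

impl Drop for Counted<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Forgets a half-used `Drain` and reports the Vec's length afterwards.
fn forget_drain(drops: &Cell<usize>) -> usize {
    let mut v: Vec<Counted<'_>> = (0..4).map(|_| Counted(drops)).collect();
    {
        let mut drainer = v.drain(..);

        // Two elements are moved out and dropped right away.
        drainer.next();
        drainer.next();

        // The Drain's destructor never gets a chance to fix up `v`.
        mem::forget(drainer);
    }
    // Touching `v` again is fine: whatever it still claims to hold is valid.
    v.len()
}
```

With the strategy described above we would expect the length to be 0 and only
two destructors to have run, with the other two elements leaked. The standard
library documentation only promises that a forgotten `Drain` may leak elements
arbitrarily, so treat the exact numbers as an implementation detail.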

## Rc

Rc is an interesting case because at first glance it doesn't appear to be a
proxy value at all. After all, it manages the data it points to, and dropping
all the Rcs for a value will drop that value. Leaking an Rc doesn't seem like it
would be particularly dangerous. It will leave the refcount permanently
incremented and prevent the data from being freed or dropped, but that seems
just like Box, right?

Nope.

Let's consider a simplified implementation of Rc:

<!-- ignore: simplified code -->
```rust,ignore
struct Rc<T> {
    ptr: *mut RcBox<T>,
}

struct RcBox<T> {
    data: T,
    ref_count: usize,
}

impl<T> Rc<T> {
    fn new(data: T) -> Self {
        unsafe {
            // Wouldn't it be nice if heap::allocate worked like this?
            let ptr = heap::allocate::<RcBox<T>>();
            ptr::write(ptr, RcBox {
                data,
                ref_count: 1,
            });
            Rc { ptr }
        }
    }

    fn clone(&self) -> Self {
        unsafe {
            (*self.ptr).ref_count += 1;
        }
        Rc { ptr: self.ptr }
    }
}

impl<T> Drop for Rc<T> {
    fn drop(&mut self) {
        unsafe {
            (*self.ptr).ref_count -= 1;
            if (*self.ptr).ref_count == 0 {
                // drop the data and then free it
                ptr::read(self.ptr);
                heap::deallocate(self.ptr);
            }
        }
    }
}
```

This code contains an implicit and subtle assumption: `ref_count` can fit in a
`usize`, because there can't be more than `usize::MAX` Rcs in memory. However
this itself assumes that the `ref_count` accurately reflects the number of Rcs
in memory, which we know is false with `mem::forget`. Using `mem::forget` we can
overflow the `ref_count`, and then get it down to 0 with outstanding Rcs. Then
we can happily use-after-free the inner data. Bad Bad Not Good.

This can be solved by checking the `ref_count` for overflow whenever it is
incremented and doing *something* if it would wrap. The standard library's
stance is to just abort, because your program has become horribly degenerate.
Also *oh my gosh* it's such a ridiculous corner case.
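
A minimal sketch of that check might look like the following. The
`increment_or_abort` helper is a hypothetical name, and this is not the standard
library's actual code; it only illustrates the "check, then abort" idea that a
`clone` like the one above could call.

```rust
use std::process;

// Hypothetical helper: bump a reference count, aborting instead of wrapping.
fn increment_or_abort(ref_count: &mut usize) {
    match ref_count.checked_add(1) {
        Some(n) => *ref_count = n,
        // The count no longer fits in a usize, so it can't be trusted to
        // describe how many handles exist. Give up rather than risk a
        // use-after-free later.
        None => process::abort(),
    }
}
```

Aborting (rather than panicking) means no unwinding code ever gets to observe
the broken count.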

## thread::scoped::JoinGuard

The thread::scoped API was intended to allow threads to be spawned that
reference data on their parent's stack without any synchronization over that
data by ensuring the parent joins the thread before any of the shared data goes
out of scope.

<!-- ignore: simplified code -->
```rust,ignore
pub fn scoped<'a, F>(f: F) -> JoinGuard<'a>
    where F: FnOnce() + Send + 'a
```

Here `f` is some closure for the other thread to execute. Saying that
`F: Send + 'a` is saying that it closes over data that lives for `'a`, and it
either owns that data or the data is Sync (implying `&data` is Send).

Because JoinGuard has a lifetime, it keeps all the data it closes over
borrowed in the parent thread. This means the JoinGuard can't outlive
the data that the other thread is working on. When the JoinGuard *does* get
dropped it blocks the parent thread, ensuring the child terminates before any
of the closed-over data goes out of scope in the parent.

Usage looked like:

<!-- ignore: simplified code -->
```rust,ignore
let mut data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
{
    // `guards` must be mutable because we push into it below
    let mut guards = vec![];
    for x in &mut data {
        // Move the mutable reference into the closure, and execute
        // it on a different thread. The closure has a lifetime bound
        // by the lifetime of the mutable reference `x` we store in it.
        // The guard that is returned is in turn assigned the lifetime
        // of the closure, so it also mutably borrows `data` as `x` did.
        // This means we cannot access `data` until the guard goes away.
        let guard = thread::scoped(move || {
            *x *= 2;
        });
        // store the thread's guard for later
        guards.push(guard);
    }
    // All guards are dropped here, forcing the threads to join
    // (this thread blocks here until the others terminate).
    // Once the threads join, the borrow expires and the data becomes
    // accessible again in this thread.
}
// data is definitely mutated here.
```

In principle, this totally works! Rust's ownership system perfectly ensures it!
...except it relies on a destructor being called to be safe.

<!-- ignore: simplified code -->
```rust,ignore
let mut data = Box::new(0);
{
    let guard = thread::scoped(|| {
        // This is at best a data race. At worst, it's also a use-after-free.
        *data += 1;
    });
    // The guard is forgotten, which expires the loan without blocking this
    // thread.
    mem::forget(guard);
}
// So the Box is dropped here while the scoped thread may or may not be trying
// to access it.
```

Dang. Here the destructor running was pretty fundamental to the API, and it had
to be scrapped in favor of a completely different design.
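
For comparison, today's standard library offers `std::thread::scope`, a
closure-based scoped-thread API. Treating it as the "completely different
design" mentioned above is an assumption on our part, but it shows one way to
avoid depending on a destructor. The `double_all` helper below is a hypothetical
name for the sketch.

```rust
use std::thread;

// Hypothetical helper: doubles every element, one scoped thread per element.
fn double_all(data: &mut [i32]) {
    thread::scope(|s| {
        for x in data.iter_mut() {
            // Each thread gets exclusive access to exactly one element.
            s.spawn(move || {
                *x *= 2;
            });
        }
        // Safety doesn't hinge on any guard being dropped here:
        // `thread::scope` itself joins every spawned thread before it returns.
    });
    // All threads have finished, so `data` is fully updated at this point.
}
```

The join happens inside `thread::scope` rather than in a destructor the caller
could `mem::forget`, so leaking a handle can no longer let a thread outlive the
data it borrows.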