% Concurrency

Concurrency and parallelism are incredibly important topics in computer
science, and are also a hot topic in industry today. Computers are gaining more
and more cores, yet many programmers aren't prepared to fully utilize them.

Rust's memory safety features also apply to its concurrency story. Even
concurrent Rust programs must be memory safe, having no data races. Rust's type
system is up to the task, and gives you powerful ways to reason about
concurrent code at compile time.

Before we talk about the concurrency features that come with Rust, it's important
to understand something: Rust is low-level enough that the vast majority of
this is provided by the standard library, not by the language. This means that
if you don't like some aspect of the way Rust handles concurrency, you can
implement an alternative way of doing things.
[mio](https://github.com/carllerche/mio) is a real-world example of this
principle in action.

## Background: `Send` and `Sync`

Concurrency is difficult to reason about. In Rust, we have a strong, static
type system to help us reason about our code. As such, Rust gives us two traits
to help us make sense of code that can possibly be concurrent.

### `Send`

The first trait we're going to talk about is
[`Send`](../std/marker/trait.Send.html). When a type `T` implements `Send`, it
indicates that something of this type is able to have ownership transferred
safely between threads.

This is important to enforce certain restrictions. For example, if we have a
channel connecting two threads, we would want to be able to send some data
down the channel and to the other thread. Therefore, we'd ensure that `Send` was
implemented for that type.

Conversely, if we were wrapping a library with [FFI][ffi] that isn't
thread-safe, we wouldn't want to implement `Send`, and so the compiler will help
us enforce that it can't leave the current thread.

[ffi]: ffi.html

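As a minimal sketch of what "ownership transferred between threads" looks like in practice, the helper below moves a value into a new thread and hands it back. The name `round_trip` is hypothetical (it is not part of the standard library), and the sketch uses `std::thread::spawn`, which is introduced later in this chapter.

```rust
use std::thread;

// Hypothetical helper: moves `value` into a new thread and back again.
// The `Send` bound is what allows the value to cross the thread boundary
// in both directions.
fn round_trip<T: Send + 'static>(value: T) -> T {
    thread::spawn(move || value).join().unwrap()
}

fn main() {
    let numbers = vec![1, 2, 3];
    let returned = round_trip(numbers);
    println!("{:?}", returned);

    // A type that is not `Send`, such as `std::rc::Rc<i32>`, would be
    // rejected at compile time:
    // let rc = round_trip(std::rc::Rc::new(1));
}
```
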
### `Sync`

The second of these traits is called [`Sync`](../std/marker/trait.Sync.html).
When a type `T` implements `Sync`, it indicates that something
of this type has no possibility of introducing memory unsafety when used from
multiple threads concurrently through shared references. This implies that
types which don't have [interior mutability](mutability.html) are inherently
`Sync`, which includes simple primitive types (like `u8`) and aggregate types
containing them.

For sharing references across threads, Rust provides a wrapper type called
`Arc<T>`. `Arc<T>` implements `Send` and `Sync` if and only if `T` implements
both `Send` and `Sync`. For example, an object of type `Arc<RefCell<U>>` cannot
be transferred across threads because
[`RefCell`](choosing-your-guarantees.html#refcellt) does not implement
`Sync`; consequently, `Arc<RefCell<U>>` does not implement `Send`.

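A small sketch may help here. Assuming a hypothetical helper named `parallel_sums`, each worker thread below reads the same vector through its own `Arc` handle. This compiles because `Vec<u32>` is both `Send` and `Sync`; wrapping the vector in a `RefCell` instead would be rejected by the compiler.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical helper: every worker reads the same vector through its own
// `Arc` handle and returns the sum it computed.
fn parallel_sums(values: Vec<u32>, workers: usize) -> Vec<u32> {
    let shared = Arc::new(values);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own handle to the shared, read-only data.
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.iter().sum::<u32>())
        })
        .collect();

    // Wait for every worker and collect the sums in spawn order.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", parallel_sums(vec![1, 2, 3], 3));
}
```
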
These two traits allow you to use the type system to make strong guarantees
about the properties of your code under concurrency. Before we demonstrate
why, we need to learn how to create a concurrent Rust program in the first
place!

## Threads

Rust's standard library provides a library for threads, which allow you to
run Rust code in parallel. Here's a basic example of using `std::thread`:

```rust
use std::thread;

fn main() {
    thread::spawn(|| {
        println!("Hello from a thread!");
    });
}
```

The `thread::spawn()` function accepts a [closure](closures.html), which is executed in a
new thread. It returns a handle to the thread, which can be used to
wait for the child thread to finish and extract its result:

```rust
use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        "Hello from a thread!"
    });

    println!("{}", handle.join().unwrap());
}
```

As closures can capture variables from their environment, we can also try to
bring some data into the other thread:

```rust,ignore
use std::thread;

fn main() {
    let x = 1;
    thread::spawn(|| {
        println!("x is {}", x);
    });
}
```

However, this gives us an error:

```text
5:19: 7:6 error: closure may outlive the current function, but it
                 borrows `x`, which is owned by the current function
...
5:19: 7:6 help: to force the closure to take ownership of `x` (and any other referenced variables),
          use the `move` keyword, as shown:
      thread::spawn(move || {
          println!("x is {}", x);
      });
```

This is because by default closures capture variables by reference, and thus the
closure only captures a _reference to `x`_. This is a problem, because the
thread may outlive the scope of `x`, leading to a dangling pointer.

To fix this, we use a `move` closure as mentioned in the error message. `move`
closures are explained in depth [here](closures.html#move-closures); basically
they move variables from their environment into themselves.

```rust
use std::thread;

fn main() {
    let x = 1;
    thread::spawn(move || {
        println!("x is {}", x);
    });
}
```

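The `move` closure can also be combined with the handle returned by `thread::spawn()`, so that the result computed from the captured value comes back to the caller. The following is only a sketch; `describe_in_thread` is a hypothetical name chosen for illustration.

```rust
use std::thread;

// Hypothetical helper: `x` is moved into the closure, so the new thread owns
// its own copy and nothing can dangle when this function returns.
fn describe_in_thread(x: i32) -> String {
    let handle = thread::spawn(move || format!("x is {}", x));

    // `join()` waits for the thread and yields the closure's return value.
    handle.join().unwrap()
}

fn main() {
    println!("{}", describe_in_thread(1));
}
```
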
Many languages have the ability to execute threads, but it's wildly unsafe.
There are entire books about how to prevent errors that occur from shared
mutable state. Rust helps out with its type system here as well, by preventing
data races at compile time. Let's talk about how you actually share things
between threads.

## Safe Shared Mutable State

Due to Rust's type system, we have a concept that sounds like a lie: "safe
shared mutable state." Many programmers agree that shared mutable state is
very, very bad.

Someone once said this:

> Shared mutable state is the root of all evil. Most languages attempt to deal
> with this problem through the 'mutable' part, but Rust deals with it by
> solving the 'shared' part.

The same [ownership system](ownership.html) that helps prevent using pointers
incorrectly also helps rule out data races, one of the worst kinds of
concurrency bugs.

As an example, here is a Rust program that would have a data race in many
languages. It will not compile:

```rust,ignore
use std::thread;
use std::time::Duration;

fn main() {
    let mut data = vec![1, 2, 3];

    for i in 0..3 {
        thread::spawn(move || {
            data[0] += i;
        });
    }

    thread::sleep(Duration::from_millis(50));
}
```

This gives us an error:

```text
8:17 error: capture of moved value: `data`
        data[0] += i;
        ^~~~
```

Rust knows this wouldn't be safe! If we had a reference to `data` in each
thread, and the thread takes ownership of the reference, we'd have three owners!
`data` gets moved out of `main` in the first call to `spawn()`, so subsequent
calls in the loop cannot use this variable.

So, we need some type that lets us have more than one owning reference to a
value. Usually, we'd use `Rc<T>` for this, which is a reference counted type
that provides shared ownership. It has some runtime bookkeeping that keeps track
of the number of references to it, hence the "reference count" part of its name.

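To make the bookkeeping visible, here is a single-threaded sketch that reads the count with `Rc::strong_count`. The function name `counts_while_cloning` is hypothetical and exists only for this illustration.

```rust
use std::rc::Rc;

// Hypothetical helper: reports the reference count before, during, and after
// a second owner exists.
fn counts_while_cloning() -> (usize, usize, usize) {
    let data = Rc::new(vec![1, 2, 3]);
    let before = Rc::strong_count(&data);

    let during = {
        // A second owning handle to the same vector.
        let _second = Rc::clone(&data);
        Rc::strong_count(&data)
    }; // `_second` is dropped here, so the count goes back down.

    let after = Rc::strong_count(&data);
    (before, during, after)
}

fn main() {
    println!("{:?}", counts_while_cloning());
}
```
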
Calling `clone()` on an `Rc<T>` will return a new owned reference and bump the
internal reference count. We create one of these for each thread:

```rust,ignore
use std::thread;
use std::time::Duration;
use std::rc::Rc;

fn main() {
    let mut data = Rc::new(vec![1, 2, 3]);

    for i in 0..3 {
        // Create a new owned reference:
        let data_ref = data.clone();

        // Use it in a thread:
        thread::spawn(move || {
            data_ref[0] += i;
        });
    }

    thread::sleep(Duration::from_millis(50));
}
```

This won't work, however, and will give us the error:

```text
13:9: 13:22 error: the trait bound `alloc::rc::Rc<collections::vec::Vec<i32>> : core::marker::Send`
            is not satisfied
...
13:9: 13:22 note: `alloc::rc::Rc<collections::vec::Vec<i32>>`
            cannot be sent between threads safely
```

As the error message mentions, `Rc` cannot be sent between threads safely. This
is because the internal reference count is not maintained in a thread-safe
manner and can have a data race.

To solve this, we'll use `Arc<T>`, Rust's standard atomic reference count type.

The "atomic" part means `Arc<T>` can safely be accessed from multiple threads.
To do this, `Arc<T>` updates its internal count with indivisible (atomic)
operations, which can't have data races.

In essence, `Arc<T>` is a type that lets us share ownership of data _across
threads_.

```rust,ignore
use std::thread;
use std::sync::Arc;
use std::time::Duration;

fn main() {
    let mut data = Arc::new(vec![1, 2, 3]);

    for i in 0..3 {
        let data = data.clone();
        thread::spawn(move || {
            data[0] += i;
        });
    }

    thread::sleep(Duration::from_millis(50));
}
```

Similarly to last time, we use `clone()` to create a new owned handle.
This handle is then moved into the new thread.

And... still gives us an error.

```text
<anon>:11:24 error: cannot borrow immutable borrowed content as mutable
<anon>:11                    data[0] += i;
                             ^~~~
```

`Arc<T>` by default has immutable contents. It allows the _sharing_ of data
between threads, but shared mutable data is unsafe and, when threads are
involved, can cause data races!

Usually when we wish to make something in an immutable position mutable, we use
`Cell<T>` or `RefCell<T>`, which allow safe mutation via runtime checks or
otherwise (see also: [Choosing Your Guarantees](choosing-your-guarantees.html)).
However, similar to `Rc`, these are not thread safe. If we try using these, we
will get an error about these types not being `Sync`, and the code will fail to
compile.

It looks like we need some type that allows us to safely mutate a shared value
across threads, for example a type that can ensure only one thread at a time is
able to mutate the value inside it.

For that, we can use the `Mutex<T>` type!

Here's the working version:

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));

    for i in 0..3 {
        let data = data.clone();
        thread::spawn(move || {
            let mut data = data.lock().unwrap();
            data[0] += i;
        });
    }

    thread::sleep(Duration::from_millis(50));
}
```

Note that the value of `i` is bound (copied) to the closure and not shared
among the threads.

We're "locking" the mutex here. A mutex (short for "mutual exclusion"), as
mentioned, only allows one thread at a time to access a value. When we wish to
access the value, we use `lock()` on it. This will "lock" the mutex, and no
other thread will be able to lock it (and hence, do anything with the value)
until we're done with it. If a thread attempts to lock a mutex which is already
locked, it will wait until the other thread releases the lock.

The lock "release" here is implicit; when the result of the lock (in this case,
`data`) goes out of scope, the lock is automatically released.

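The scope-based release can be sketched with a single thread. In the hypothetical function `bump_twice` below, an inner block limits how long the first guard lives, so the mutex can be locked a second time afterwards.

```rust
use std::sync::Mutex;

// Hypothetical helper: locks the same mutex twice in a row.
fn bump_twice(counter: &Mutex<i32>) -> i32 {
    {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
    } // `guard` goes out of scope here, which releases the lock.

    // Locking again works only because the first guard is already gone;
    // with the first guard still alive, this call would not be able to
    // acquire the lock.
    let mut guard = counter.lock().unwrap();
    *guard += 1;
    *guard
}

fn main() {
    let counter = Mutex::new(0);
    println!("{}", bump_twice(&counter));
}
```
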
Note that the [`lock`](../std/sync/struct.Mutex.html#method.lock) method of
[`Mutex`](../std/sync/struct.Mutex.html) has this signature:

```rust,ignore
fn lock(&self) -> LockResult<MutexGuard<T>>
```

Because `Send` is not implemented for `MutexGuard<T>`, the guard cannot
cross thread boundaries, ensuring thread-locality of lock acquire and release.

Let's examine the body of the thread more closely:

```rust
# use std::sync::{Arc, Mutex};
# use std::thread;
# use std::time::Duration;
# fn main() {
#     let data = Arc::new(Mutex::new(vec![1, 2, 3]));
#     for i in 0..3 {
#         let data = data.clone();
thread::spawn(move || {
    let mut data = data.lock().unwrap();
    data[0] += i;
});
#     }
#     thread::sleep(Duration::from_millis(50));
# }
```

First, we call `lock()`, which acquires the mutex's lock. Because this may fail,
it returns a `Result<T, E>`, and because this is just an example, we `unwrap()`
it to get a reference to the data. Real code would have more robust error handling
here. We're then free to mutate it, since we have the lock.

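One possible shape for that error handling is sketched below. `lock()` returns an error when another thread panicked while holding the lock (the mutex is then "poisoned"), and `PoisonError::into_inner()` still hands back the guard. Whether recovering like this is appropriate depends on the application; the helper names `lock_recovering` and `poison_and_recover` are hypothetical.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

// Hypothetical helper: returns the guard even if the mutex was poisoned.
fn lock_recovering(data: &Mutex<Vec<i32>>) -> MutexGuard<'_, Vec<i32>> {
    match data.lock() {
        Ok(guard) => guard,
        // Another thread panicked while holding the lock; take the guard anyway.
        Err(poisoned) => poisoned.into_inner(),
    }
}

// Hypothetical demonstration: poison the mutex on purpose, then recover.
fn poison_and_recover() -> Vec<i32> {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));

    let worker = {
        let data = Arc::clone(&data);
        thread::spawn(move || {
            let mut guard = data.lock().unwrap();
            guard[0] += 10;
            panic!("simulated failure while holding the lock");
        })
    };

    // The worker panicked, so `join()` reports an error and the mutex is poisoned.
    assert!(worker.join().is_err());
    assert!(data.is_poisoned());

    let snapshot = lock_recovering(&data).clone();
    snapshot
}

fn main() {
    println!("{:?}", poison_and_recover());
}
```
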
Lastly, while the threads are running, we wait on a short timer. But
this is not ideal: we may have picked a reasonable amount of time to
wait, but it's more likely we'll either be waiting longer than
necessary or not long enough, depending on just how much time the
threads actually take to finish computing when the program runs.

A more precise alternative to the timer would be to use one of the
mechanisms provided by the Rust standard library for synchronizing
threads with each other. Let's talk about one of them: channels.

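Joining the handles returned by `thread::spawn()`, as in the earlier `handle.join()` example, is another such mechanism. As a sketch, the mutex example could wait for its threads like this instead of sleeping; the function name `add_indices_in_threads` is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical variant of the mutex example that joins its threads
// rather than waiting on a timer.
fn add_indices_in_threads() -> Vec<i32> {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));

    let handles: Vec<_> = (0..3)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                let mut data = data.lock().unwrap();
                data[0] += i;
            })
        })
        .collect();

    // Block until every thread has finished its update.
    for handle in handles {
        handle.join().unwrap();
    }

    let result = data.lock().unwrap().clone();
    result
}

fn main() {
    println!("{:?}", add_indices_in_threads());
}
```
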
## Channels

Here's a version of our code that uses channels for synchronization, rather
than waiting for a specific time:

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::sync::mpsc;

fn main() {
    let data = Arc::new(Mutex::new(0));

    // `tx` is the "transmitter" or "sender".
    // `rx` is the "receiver".
    let (tx, rx) = mpsc::channel();

    for _ in 0..10 {
        let (data, tx) = (data.clone(), tx.clone());

        thread::spawn(move || {
            let mut data = data.lock().unwrap();
            *data += 1;

            tx.send(()).unwrap();
        });
    }

    for _ in 0..10 {
        rx.recv().unwrap();
    }
}
```

We use the `mpsc::channel()` function to construct a new channel. Each thread
`send`s a simple `()` down the channel, and the main thread waits for ten of
them to arrive.

While this channel is sending a generic signal, we can send any data that
is `Send` over the channel!

```rust
use std::thread;
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel();

    for i in 0..10 {
        let tx = tx.clone();

        thread::spawn(move || {
            let answer = i * i;

            tx.send(answer).unwrap();
        });
    }

    for _ in 0..10 {
        println!("{}", rx.recv().unwrap());
    }
}
```

Here we create 10 threads, asking each to calculate the square of a number (`i`
at the time of `spawn()`), and then `send()` back the answer over the channel.

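The receiving loop does not have to know how many messages to expect. As a sketch, assuming a hypothetical helper `collect_squares`, the main thread can drop its own sender and then iterate over the receiver, which ends once every remaining sender has been dropped. The results are sorted because the threads may finish in any order.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: squares `0..n` in separate threads and gathers
// the answers through a channel.
fn collect_squares(n: u64) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();

    for i in 0..n {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(i * i).unwrap();
        });
    }

    // Drop the original sender so the channel closes once all threads finish.
    drop(tx);

    // `iter()` yields values until every sender is gone.
    let mut squares: Vec<u64> = rx.iter().collect();
    squares.sort();
    squares
}

fn main() {
    println!("{:?}", collect_squares(4));
}
```
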
## Panics

A `panic!` will crash the currently executing thread. You can use Rust's
threads as a simple isolation mechanism:

```rust
use std::thread;

let handle = thread::spawn(move || {
    panic!("oops!");
});

let result = handle.join();

assert!(result.is_err());
```

`JoinHandle::join()` gives us a `Result` back, which allows us to check if the thread
has panicked or not.
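
The `Err` value carries the panic payload as a `Box<dyn Any + Send>`. As a sketch, the hypothetical helper `run_isolated` below runs a closure in its own thread and tries to turn the payload into a readable message. It assumes the payload is a `&str` or a `String`, which is the case for panics raised with `panic!` and a message; other payload types fall through to a placeholder.

```rust
use std::thread;

// Hypothetical helper: runs `work` in its own thread and reports a panic
// as an error message instead of crashing the caller.
fn run_isolated<F>(work: F) -> Result<(), String>
where
    F: FnOnce() + Send + 'static,
{
    match thread::spawn(work).join() {
        Ok(()) => Ok(()),
        Err(payload) => {
            // Try the two payload types produced by `panic!` with a message.
            let message = if let Some(s) = payload.downcast_ref::<&str>() {
                s.to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic payload".to_string()
            };
            Err(message)
        }
    }
}

fn main() {
    println!("{:?}", run_isolated(|| {}));
    println!("{:?}", run_isolated(|| {
        panic!("oops!");
    }));
}
```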