src/doc/book/second-edition/src/ch19-01-unsafe-rust.md

## Unsafe Rust

All the code we’ve discussed so far has had Rust’s memory safety guarantees
enforced at compile time. However, Rust has a second language hidden inside it
that doesn’t enforce these memory safety guarantees: it’s called *unsafe Rust*
and works just like regular Rust, but gives us extra superpowers.

Unsafe Rust exists because, by nature, static analysis is conservative. When
the compiler tries to determine whether or not code upholds the guarantees,
it’s better for it to reject some valid programs than to accept some invalid
programs. The code might be okay, but as far as Rust is able to tell, it’s
not! In these cases, we can use unsafe code to tell the compiler,
“trust me, I know what I’m doing.” The downside is that we use it at our own
risk: if we use unsafe code incorrectly, problems due to memory unsafety, such
as null pointer dereferencing, can occur.

Another reason Rust has an unsafe alter ego is that the underlying computer
hardware is inherently unsafe. If Rust didn’t let us do unsafe operations, we
couldn’t do certain tasks. Rust needs to allow us to do low-level systems
programming, such as directly interacting with the operating system or even
writing our own operating system. Working with low-level systems programming is
one of the goals of the language. Let’s explore what we can do with unsafe Rust
and how to do it.

### Unsafe Superpowers

To switch to unsafe Rust, we use the `unsafe` keyword, and then start a new
block that holds the unsafe code. We can take four actions in unsafe Rust,
which we call *unsafe superpowers*, that we can’t in safe Rust. Those
superpowers are the ability to:

* Dereference a raw pointer
* Call an unsafe function or method
* Access or modify a mutable static variable
* Implement an unsafe trait

It’s important to understand that `unsafe` doesn’t turn off the borrow checker
or disable any other of Rust’s safety checks: if you use a reference in unsafe
code, it will still be checked. The `unsafe` keyword only gives us access to
these four features that are then not checked by the compiler for memory
safety. We still get some degree of safety inside of an unsafe block.

In addition, `unsafe` does not mean the code inside the block is necessarily
dangerous or that it will definitely have memory safety problems: the intent is
that as the programmer, we’ll ensure the code inside an `unsafe` block will
access memory in a valid way.

People are fallible, and mistakes will happen, but by requiring these four
unsafe operations to be inside blocks annotated with `unsafe` we’ll know that
any errors related to memory safety must be within an `unsafe` block. Keep
`unsafe` blocks small; you’ll be thankful later when you investigate memory
bugs.

To isolate unsafe code as much as possible, it’s best to enclose unsafe code
within a safe abstraction and provide a safe API, which we’ll discuss later in
the chapter when we examine unsafe functions and methods. Parts of the standard
library are implemented as safe abstractions over unsafe code that has been
audited. Wrapping unsafe code in a safe abstraction prevents uses of `unsafe`
from leaking out into all the places that you or your users might want to use
the functionality implemented with `unsafe` code, because using a safe
abstraction is safe.

Let’s look at each of the four unsafe superpowers in turn. We’ll also look at
some abstractions that provide a safe interface to unsafe code.

### Dereferencing a Raw Pointer

In Chapter 4, in the “Dangling References” section, we mentioned that the
compiler ensures references are always valid. Unsafe Rust has two new types
called *raw pointers* that are similar to references. As with references, raw
pointers can be immutable or mutable and are written as `*const T` and
`*mut T`, respectively. The asterisk isn’t the dereference operator; it’s part
of the type name. In the context of raw pointers, “immutable” means that the
pointer can’t be directly assigned to after being dereferenced.

Unlike references and smart pointers, raw pointers:

* Are allowed to ignore the borrowing rules by having both immutable and
  mutable pointers or multiple mutable pointers to the same location
* Aren’t guaranteed to point to valid memory
* Are allowed to be null
* Don’t implement any automatic cleanup

By opting out of having Rust enforce these guarantees, we give up guaranteed
safety in exchange for greater performance or the ability to interface with
another language or hardware where Rust’s guarantees don’t apply.
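
As a small illustration of the properties listed above, the following sketch
builds one raw pointer from a live reference and one null pointer, then
inspects them. The `describe` helper is a hypothetical name introduced for this
example. Note that checking `is_null` only rules out null; it does not prove
that a non-null pointer is valid.

```rust
use std::ptr;

// Hypothetical helper for this sketch: reports whether a raw pointer is null.
// Inspecting the pointer itself needs no `unsafe`; only dereferencing does.
fn describe(p: *const i32) -> &'static str {
    if p.is_null() {
        "null"
    } else {
        "non-null"
    }
}

fn main() {
    let value = 7;
    let valid: *const i32 = &value; // coerced from a reference, so valid here
    let null: *const i32 = ptr::null(); // raw pointers are allowed to be null

    println!("valid is {}", describe(valid));
    println!("null is {}", describe(null));

    // We only dereference the pointer that we know came from a live reference.
    let read = unsafe { *valid };
    println!("read {}", read);
}
```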

Listing 19-1 shows how to create an immutable and a mutable raw pointer from
references.

```rust
let mut num = 5;

let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;
```

<span class="caption">Listing 19-1: Creating raw pointers from references</span>

Notice that we don’t include the `unsafe` keyword in this code. We can create
raw pointers in safe code; we just can’t dereference raw pointers outside an
unsafe block, as you’ll see in a bit.

We’ve created raw pointers by using `as` to cast an immutable and a mutable
reference into their corresponding raw pointer types. Because we created them
directly from references guaranteed to be valid, we know these particular raw
pointers are valid, but we can’t make that assumption about just any raw
pointer.

Next, we’ll create a raw pointer whose validity we can’t be so certain of.
Listing 19-2 shows how to create a raw pointer to an arbitrary location in
memory. Trying to use arbitrary memory is undefined: there might be data at
that address or there might not, the compiler might optimize the code so there
is no memory access, or the program might error with a segmentation fault.
Usually, there is no good reason to write code like this, but it is possible:

```rust
let address = 0x012345usize;
let r = address as *const i32;
```

<span class="caption">Listing 19-2: Creating a raw pointer to an arbitrary
memory address</span>

Recall that safe code can create raw pointers but can’t *dereference* them to
read the data being pointed to. In Listing 19-3, we use the dereference
operator `*` on a raw pointer, which requires an `unsafe` block.

```rust
let mut num = 5;

let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;

unsafe {
    println!("r1 is: {}", *r1);
    println!("r2 is: {}", *r2);
}
```

<span class="caption">Listing 19-3: Dereferencing raw pointers within an
`unsafe` block</span>

Creating a pointer does no harm; it’s only when we try to access the value that
it points at that we might end up dealing with an invalid value.

Note also that in Listings 19-1 and 19-3 we created `*const i32` and `*mut i32`
raw pointers that both pointed to the same memory location, where `num` is
stored. If we instead tried to create an immutable and a mutable reference to
`num`, the code would not have compiled because Rust’s ownership rules don’t
allow a mutable reference at the same time as any immutable references. With
raw pointers, we can create a mutable pointer and an immutable pointer to the
same location, and change data through the mutable pointer, potentially
creating a data race. Be careful!
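
To make the aliasing point concrete, here is a minimal single-threaded sketch
in which a `*mut i32` and a `*const i32` refer to the same value. The function
name `bump_via_raw` is hypothetical, and both pointers are deliberately derived
from the same mutable reference, an assumption of this sketch that keeps every
access valid.

```rust
// Hypothetical helper: writes through a mutable raw pointer, then reads the
// same location through an immutable raw pointer.
fn bump_via_raw(num: &mut i32) -> i32 {
    let r2 = num as *mut i32;
    let r1 = r2 as *const i32; // points to the same location as `r2`

    unsafe {
        *r2 += 1; // change the data through the mutable pointer
        *r1 // observe the change through the immutable pointer
    }
}

fn main() {
    let mut num = 5;
    let seen = bump_via_raw(&mut num);
    println!("seen through r1: {}, num is now: {}", seen, num);
}
```

Because only one thread touches `num`, there is no data race here; the point is
that the compiler no longer checks the aliasing for us.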

With all of these dangers, why would we ever use raw pointers? One major use
case is when interfacing with C code, as you’ll see in the next section,
“Calling an Unsafe Function or Method.” Another case is when building up safe
abstractions that the borrow checker doesn’t understand. We’ll introduce unsafe
functions and then look at an example of a safe abstraction that uses unsafe
code.

### Calling an Unsafe Function or Method

The second type of operation that requires an unsafe block is calling unsafe
functions. Unsafe functions and methods look exactly like regular functions and
methods, but they have an extra `unsafe` before the rest of the definition. The
`unsafe` keyword in this context indicates the function has requirements we
need to uphold when we call this function, because Rust can’t guarantee we’ve
met these requirements. By calling an unsafe function within an `unsafe` block,
we’re saying that we’ve read this function’s documentation and take
responsibility for upholding the function’s contracts.

Here is an unsafe function named `dangerous` that doesn’t do anything in its
body:

```rust
unsafe fn dangerous() {}

unsafe {
    dangerous();
}
```

We must call the `dangerous` function within a separate `unsafe` block. If we
try to call `dangerous` without the `unsafe` block, we’ll get an error:

```text
error[E0133]: call to unsafe function requires unsafe function or block
 -->
  |
4 |     dangerous();
  |     ^^^^^^^^^^^ call to unsafe function
```

By inserting the `unsafe` block around our call to `dangerous`, we’re asserting
to Rust that we’ve read the function’s documentation, we understand how to use
it properly, and we’ve verified that we’re fulfilling the contract of the
function.

Bodies of unsafe functions are effectively `unsafe` blocks, so to perform other
unsafe operations within an unsafe function, we don’t need to add another
`unsafe` block.
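
As an illustration of a function with a contract the compiler can’t check, the
sketch below wraps the standard library’s unsafe `get_unchecked` slice method.
The name `nth_unchecked` and its `# Safety` comment are assumptions made for
this example, not part of any library.

```rust
/// Returns the element at `idx` without performing a bounds check.
///
/// # Safety
///
/// The caller must guarantee that `idx < values.len()`.
unsafe fn nth_unchecked(values: &[i32], idx: usize) -> i32 {
    // The body of an unsafe function may call other unsafe functions directly.
    *values.get_unchecked(idx)
}

fn main() {
    let values = [10, 20, 30];
    let idx = 1;

    // Check the documented requirement ourselves before calling.
    assert!(idx < values.len());
    let second = unsafe { nth_unchecked(&values, idx) };
    println!("second element: {}", second);
}
```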

#### Creating a Safe Abstraction over Unsafe Code

Just because a function contains unsafe code doesn’t mean we need to mark the
entire function as unsafe. In fact, wrapping unsafe code in a safe function is
a common abstraction. As an example, let’s study a function from the standard
library, `split_at_mut`, that requires some unsafe code and explore how we
might implement it. This safe method is defined on mutable slices: it takes one
slice and makes it two by splitting the slice at the index given as an
argument. Listing 19-4 shows how to use `split_at_mut`.

```rust
let mut v = vec![1, 2, 3, 4, 5, 6];

let r = &mut v[..];

let (a, b) = r.split_at_mut(3);

assert_eq!(a, &mut [1, 2, 3]);
assert_eq!(b, &mut [4, 5, 6]);
```

<span class="caption">Listing 19-4: Using the safe `split_at_mut`
function</span>

We can’t implement this function using only safe Rust. An attempt might look
something like Listing 19-5, which won’t compile. For simplicity, we’ll
implement `split_at_mut` as a function rather than a method and only for slices
of `i32` values rather than for a generic type `T`.

```rust,ignore
fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();

    assert!(mid <= len);

    (&mut slice[..mid],
     &mut slice[mid..])
}
```

<span class="caption">Listing 19-5: An attempted implementation of
`split_at_mut` using only safe Rust</span>

This function first gets the total length of the slice. Then it asserts that
the index given as a parameter is within the slice by checking that it’s less
than or equal to the length. The assertion means that if we pass an index that
is greater than the length of the slice, the function will panic before it
attempts to use that index.

Then we return two mutable slices in a tuple: one from the start of the
original slice to the `mid` index and another from `mid` to the end of the
slice.

When we try to compile the code in Listing 19-5, we’ll get an error:

```text
error[E0499]: cannot borrow `*slice` as mutable more than once at a time
 -->
  |
6 |     (&mut slice[..mid],
  |           ----- first mutable borrow occurs here
7 |      &mut slice[mid..])
  |           ^^^^^ second mutable borrow occurs here
8 | }
  | - first borrow ends here
```

Rust’s borrow checker can’t understand that we’re borrowing different parts of
the slice; it only knows that we’re borrowing from the same slice twice.
Borrowing different parts of a slice is fundamentally okay because the two
slices aren’t overlapping, but Rust isn’t smart enough to know this. When we
know code is okay, but Rust doesn’t, it’s time to reach for unsafe code.

Listing 19-6 shows how to use an `unsafe` block, a raw pointer, and some calls
to unsafe functions to make the implementation of `split_at_mut` work.

```rust
use std::slice;

fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();
    let ptr = slice.as_mut_ptr();

    assert!(mid <= len);

    unsafe {
        (slice::from_raw_parts_mut(ptr, mid),
         slice::from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
    }
}
```

<span class="caption">Listing 19-6: Using unsafe code in the implementation of
the `split_at_mut` function</span>

Recall from “The Slice Type” section in Chapter 4 that slices are a pointer to
some data and the length of the slice. We use the `len` method to get the
length of a slice and the `as_mut_ptr` method to access the raw pointer of a
slice. In this case, because we have a mutable slice to `i32` values,
`as_mut_ptr` returns a raw pointer with the type `*mut i32`, which we’ve stored
in the variable `ptr`.

We keep the assertion that the `mid` index is within the slice. Then we get to
the unsafe code: the `slice::from_raw_parts_mut` function takes a raw pointer
and a length, and creates a slice. We use this function to create a slice that
starts from `ptr` and is `mid` items long. Then we call the `offset` method on
`ptr` with `mid` as an argument to get a raw pointer that starts at `mid`, and
we create a slice using that pointer and the remaining number of items after
`mid` as the length.

The function `slice::from_raw_parts_mut` is unsafe because it takes a raw
pointer and must trust that this pointer is valid. The `offset` method on raw
pointers is also unsafe, because it must trust that the offset location is also
a valid pointer. Therefore, we had to put an `unsafe` block around our calls to
`slice::from_raw_parts_mut` and `offset` so we could call them. By looking at
the code and by adding the assertion that `mid` must be less than or equal to
`len`, we can tell that all the raw pointers used within the `unsafe` block
will be valid pointers to data within the slice. This is an acceptable and
appropriate use of `unsafe`.

Note that we don’t need to mark the resulting `split_at_mut` function as
`unsafe`, and we can call this function from safe Rust. We’ve created a safe
abstraction to the unsafe code with an implementation of the function that uses
`unsafe` code in a safe way, because it creates only valid pointers from the
data this function has access to.
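
The text above restricts the example to `i32` for simplicity. As a sketch of
how the same idea might extend to a generic type `T`, the hypothetical
`split_at_mut_generic` below reuses the body of Listing 19-6 unchanged apart
from the types; it is not the standard library’s actual implementation.

```rust
use std::slice;

// Hypothetical generic variant of the function in Listing 19-6.
fn split_at_mut_generic<T>(slice: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = slice.len();
    let ptr = slice.as_mut_ptr();

    // Same guard as before: both halves must lie within the original slice.
    assert!(mid <= len);

    unsafe {
        (slice::from_raw_parts_mut(ptr, mid),
         slice::from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
    }
}

fn main() {
    let mut letters = vec!['a', 'b', 'c', 'd'];

    {
        // Callers stay in safe Rust and can mutate both halves independently.
        let (left, right) = split_at_mut_generic(&mut letters, 1);
        left[0] = 'x';
        right[0] = 'y';
    }

    println!("{:?}", letters);
}
```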

In contrast, the use of `slice::from_raw_parts_mut` in Listing 19-7 would
likely crash when the slice is used. This code takes an arbitrary memory
location and creates a slice ten thousand items long:

```rust
use std::slice;

let address = 0x012345usize;
let r = address as *mut i32;

let slice = unsafe {
    slice::from_raw_parts_mut(r, 10000)
};
```

<span class="caption">Listing 19-7: Creating a slice from an arbitrary memory
location</span>

We don’t own the memory at this arbitrary location, and there is no guarantee
that the slice this code creates contains valid `i32` values. Attempting to use
`slice` as though it’s a valid slice results in undefined behavior.

#### Using `extern` Functions to Call External Code

Sometimes, your Rust code might need to interact with code written in another
language. For this, Rust has a keyword, `extern`, that facilitates the creation
and use of a *Foreign Function Interface (FFI)*. An FFI is a way for a
programming language to define functions and enable a different (foreign)
programming language to call those functions.

Listing 19-8 demonstrates how to set up an integration with the `abs` function
from the C standard library. Functions declared within `extern` blocks are
always unsafe to call from Rust code. The reason is that other languages don’t
enforce Rust’s rules and guarantees, and Rust can’t check them, so
responsibility falls on the programmer to ensure safety.

<span class="filename">Filename: src/main.rs</span>

```rust
extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}
```

<span class="caption">Listing 19-8: Declaring and calling an `extern` function
defined in another language</span>

Within the `extern "C"` block, we list the names and signatures of external
functions from another language we want to call. The `"C"` part defines which
*application binary interface (ABI)* the external function uses: the ABI
defines how to call the function at the assembly level. The `"C"` ABI is the
most common and follows the C programming language’s ABI.
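
Tying this back to safe abstractions, one way to use such a declaration is to
hide the `unsafe` call behind a safe function. The sketch below rests on
several assumptions: `c_abs` is a hypothetical wrapper, C’s `int` is taken to
be 32 bits wide on the target (as Listing 19-8 also assumes), and the block is
spelled `unsafe extern`, which newer compilers accept and the 2024 edition
requires. On an older toolchain, drop the leading `unsafe` to match Listing
19-8.

```rust
unsafe extern "C" {
    // `abs` from the C standard library, declared as in Listing 19-8.
    fn abs(input: i32) -> i32;
}

// Hypothetical safe wrapper around the foreign function.
fn c_abs(input: i32) -> i32 {
    // C leaves `abs` undefined when the result can't be represented, so we
    // reject that one input before crossing the language boundary.
    assert!(input != i32::MIN, "abs(i32::MIN) is not representable");

    unsafe { abs(input) }
}

fn main() {
    println!("Absolute value of -3 according to C: {}", c_abs(-3));
}
```

Callers of `c_abs` never write `unsafe` themselves; the one precondition we
know about is checked inside the wrapper.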

> #### Calling Rust Functions from Other Languages
>
> We can also use `extern` to create an interface that allows other languages
> to call Rust functions. Instead of an `extern` block, we add the `extern`
> keyword and specify the ABI to use just before the `fn` keyword. We also need
> to add a `#[no_mangle]` annotation to tell the Rust compiler not to mangle
> the name of this function. *Mangling* is when a compiler changes the name
> we’ve given a function to a different name that contains more information for
> other parts of the compilation process to consume but is less human readable.
> Every programming language compiler mangles names slightly differently, so
> for a Rust function to be nameable by other languages, we must disable the
> Rust compiler’s name mangling.
>
> In the following example, we make the `call_from_c` function accessible from
> C code, after it’s compiled to a shared library and linked from C:
>
> ```rust
> #[no_mangle]
> pub extern "C" fn call_from_c() {
>     println!("Just called a Rust function from C!");
> }
> ```
>
> This usage of `extern` does not require `unsafe`.

### Accessing or Modifying a Mutable Static Variable

Until now, we’ve not talked about *global variables*, which Rust does support
but which can be problematic with Rust’s ownership rules. If two threads are
accessing the same mutable global variable, it can cause a data race.

In Rust, global variables are called *static* variables. Listing 19-9 shows an
example declaration and use of a static variable with a string slice as a
value.

<span class="filename">Filename: src/main.rs</span>

```rust
static HELLO_WORLD: &str = "Hello, world!";

fn main() {
    println!("name is: {}", HELLO_WORLD);
}
```

<span class="caption">Listing 19-9: Defining and using an immutable static
variable</span>

Static variables are similar to constants, which we discussed in the
“Differences Between Variables and Constants” section in Chapter 3. The names
of static variables are in `SCREAMING_SNAKE_CASE` by convention, and we *must*
annotate the variable’s type, which is `&'static str` in this example. Static
variables can only store references with the `'static` lifetime, which means
the Rust compiler can figure out the lifetime; we don’t need to annotate it
explicitly. Accessing an immutable static variable is safe.

Constants and immutable static variables might seem similar, but a subtle
difference is that values in a static variable have a fixed address in memory.
Using the value will always access the same data. Constants, on the other hand,
are allowed to duplicate their data whenever they’re used.
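
The fixed-address property can be observed directly. In this sketch, `GREETING`
and `greeting_addr` are hypothetical names; the function returns the address of
the static as a raw pointer, and every call yields the same address. No
equivalent guarantee exists for a constant, since its value may be duplicated
at each use.

```rust
static GREETING: &str = "Hello, world!";

// Hypothetical helper: returns the address where the static itself is stored.
fn greeting_addr() -> *const &'static str {
    &GREETING
}

fn main() {
    let first = greeting_addr();
    let second = greeting_addr();

    // Both calls refer to the same fixed location in memory.
    assert_eq!(first, second);
    println!("GREETING lives at {:p}", first);
}
```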

Another difference between constants and static variables is that static
variables can be mutable. Accessing and modifying mutable static variables is
*unsafe*. Listing 19-10 shows how to declare, access, and modify a mutable
static variable named `COUNTER`.

<span class="filename">Filename: src/main.rs</span>

```rust
static mut COUNTER: u32 = 0;

fn add_to_count(inc: u32) {
    unsafe {
        COUNTER += inc;
    }
}

fn main() {
    add_to_count(3);

    unsafe {
        println!("COUNTER: {}", COUNTER);
    }
}
```

<span class="caption">Listing 19-10: Reading from or writing to a mutable
static variable is unsafe</span>

As with regular variables, we specify mutability using the `mut` keyword. Any
code that reads from or writes to `COUNTER` must be within an `unsafe` block.
This code compiles and prints `COUNTER: 3` as we would expect because it’s
single threaded. Having multiple threads access `COUNTER` would likely result
in data races.

With mutable data that is globally accessible, it’s difficult to ensure there
are no data races, which is why Rust considers mutable static variables to be
unsafe. Where possible, it’s preferable to use the concurrency techniques and
thread-safe smart pointers we discussed in Chapter 16, so the compiler checks
that data is accessed from different threads safely.
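
One possible safe alternative for the counter in Listing 19-10 is an atomic
integer from `std::sync::atomic`. This is a sketch of that option rather than a
prescribed replacement; the `current` helper and the thread and iteration
counts are illustrative assumptions.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// An immutable static holding an atomic: no `static mut`, no `unsafe`.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    COUNTER.fetch_add(inc, Ordering::SeqCst);
}

// Hypothetical helper for reading the current value.
fn current() -> u32 {
    COUNTER.load(Ordering::SeqCst)
}

fn main() {
    // Four threads each add 1 to the counter 250 times.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..250 {
                    add_to_count(1);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("COUNTER: {}", current());
}
```

Because every update goes through `fetch_add`, no increments are lost even
though several threads modify the counter at once.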

### Implementing an Unsafe Trait

The final action that only works with `unsafe` is implementing an unsafe trait.
A trait is unsafe when at least one of its methods has some invariant that the
compiler can’t verify. We can declare that a trait is `unsafe` by adding the
`unsafe` keyword before `trait`; then the implementation of the trait must be
marked as `unsafe` too, as shown in Listing 19-11.

```rust
unsafe trait Foo {
    // methods go here
}

unsafe impl Foo for i32 {
    // method implementations go here
}
```

<span class="caption">Listing 19-11: Defining and implementing an unsafe
trait</span>

By using `unsafe impl`, we’re promising that we’ll uphold the invariants that
the compiler can’t verify.

As an example, recall the `Sync` and `Send` marker traits we discussed in the
“Extensible Concurrency with the `Sync` and `Send` Traits” section in Chapter
16: the compiler implements these traits automatically if our types are
composed entirely of `Send` and `Sync` types. If we implement a type that
contains a type that is not `Send` or `Sync`, such as raw pointers, and we want
to mark that type as `Send` or `Sync`, we must use `unsafe`. Rust can’t verify
that our type upholds the guarantees that it can be safely sent across threads
or accessed from multiple threads; therefore, we need to do those checks
manually and indicate that we have done so with `unsafe`.
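
As a sketch of the manual reasoning described above, consider a hypothetical
`OwnedInt` type that holds a raw pointer to a heap allocation it exclusively
owns. The raw pointer field stops the compiler from implementing `Send`
automatically, so we assert it ourselves; the justification in the comment is
our claim, not something the compiler verifies.

```rust
use std::thread;

// Hypothetical type: exclusively owns one heap-allocated `i32`.
struct OwnedInt(*mut i32);

impl OwnedInt {
    fn new(value: i32) -> OwnedInt {
        OwnedInt(Box::into_raw(Box::new(value)))
    }

    fn get(&self) -> i32 {
        // Valid because the pointer came from `Box::into_raw` in `new`
        // and is only freed in `drop`.
        unsafe { *self.0 }
    }
}

impl Drop for OwnedInt {
    fn drop(&mut self) {
        // Rebuild the `Box` so the allocation is freed exactly once.
        unsafe { drop(Box::from_raw(self.0)) }
    }
}

// Our promise: no other pointer to the allocation exists, so moving an
// `OwnedInt` to another thread moves sole ownership of the data with it.
unsafe impl Send for OwnedInt {}

fn main() {
    let boxed = OwnedInt::new(42);

    // Without the `unsafe impl Send` above, this `spawn` would not compile.
    let handle = thread::spawn(move || boxed.get());
    println!("read on another thread: {}", handle.join().unwrap());
}
```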

### When to Use Unsafe Code

Using `unsafe` to take one of the four actions (superpowers) just discussed
isn’t wrong or even frowned upon. But it is trickier to get `unsafe` code
correct because the compiler can’t help uphold memory safety. When you have a
reason to use `unsafe` code, you can do so, and having the explicit `unsafe`
annotation makes it easier to track down the source of problems if they occur.
 530 reason to use `unsafe` code, you can do so, and having the explicit `unsafe`
 531 annotation makes it easier to track down the source of problems if they occur.