## Unsafe Rust

All the code we’ve discussed so far has had Rust’s memory safety guarantees
enforced at compile time. However, Rust has a second language hidden inside it
that doesn’t enforce these memory safety guarantees: it’s called *unsafe Rust*
and works just like regular Rust, but gives us extra superpowers.

Unsafe Rust exists because, by nature, static analysis is conservative. When
the compiler tries to determine whether or not code upholds the guarantees,
it’s better for it to reject some valid programs than to accept some invalid
programs. Although the code might be okay, if the Rust compiler doesn’t have
enough information to be confident, it will reject the code. In these cases,
you can use unsafe code to tell the compiler, “Trust me, I know what I’m
doing.” The downside is that you use it at your own risk: if you use unsafe
code incorrectly, problems due to memory unsafety, such as null pointer
dereferencing, can occur.

Another reason Rust has an unsafe alter ego is that the underlying computer
hardware is inherently unsafe. If Rust didn’t let you do unsafe operations, you
couldn’t do certain tasks. Rust needs to allow you to do low-level systems
programming, such as directly interacting with the operating system or even
writing your own operating system. Working with low-level systems programming
is one of the goals of the language. Let’s explore what we can do with unsafe
Rust and how to do it.

### Unsafe Superpowers

To switch to unsafe Rust, use the `unsafe` keyword and then start a new block
that holds the unsafe code. You can take five actions in unsafe Rust, called
*unsafe superpowers*, that you can’t take in safe Rust. Those superpowers
include the ability to:

* Dereference a raw pointer
* Call an unsafe function or method
* Access or modify a mutable static variable
* Implement an unsafe trait
* Access fields of `union`s

It’s important to understand that `unsafe` doesn’t turn off the borrow checker
or disable any other of Rust’s safety checks: if you use a reference in unsafe
code, it will still be checked. The `unsafe` keyword only gives you access to
these five features that are then not checked by the compiler for memory
safety. You’ll still get some degree of safety inside of an unsafe block.

In addition, `unsafe` does not mean the code inside the block is necessarily
dangerous or that it will definitely have memory safety problems: the intent is
that as the programmer, you’ll ensure the code inside an `unsafe` block will
access memory in a valid way.

People are fallible, and mistakes will happen, but by requiring these five
unsafe operations to be inside blocks annotated with `unsafe` you’ll know that
any errors related to memory safety must be within an `unsafe` block. Keep
`unsafe` blocks small; you’ll be thankful later when you investigate memory
bugs.

To isolate unsafe code as much as possible, it’s best to enclose unsafe code
within a safe abstraction and provide a safe API, which we’ll discuss later in
the chapter when we examine unsafe functions and methods. Parts of the standard
library are implemented as safe abstractions over unsafe code that has been
audited. Wrapping unsafe code in a safe abstraction prevents uses of `unsafe`
from leaking out into all the places that you or your users might want to use
the functionality implemented with `unsafe` code, because using a safe
abstraction is safe.
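
As a small illustration of that idea, here is a sketch of a safe function that wraps a single unsafe call. The function name `checked_get` is hypothetical; `get_unchecked` is a real unsafe slice method that skips bounds checking, so the wrapper performs the bounds check itself before calling it.

```rust
/// Returns the element at `index`, or `None` if `index` is out of bounds.
/// Hypothetical helper for illustration: the standard library's safe
/// `slice::get` already provides this behavior.
fn checked_get(values: &[i32], index: usize) -> Option<i32> {
    if index < values.len() {
        // SAFETY: we just checked that `index` is in bounds, which is the
        // precondition `get_unchecked` requires.
        Some(unsafe { *values.get_unchecked(index) })
    } else {
        None
    }
}

fn main() {
    let values = [10, 20, 30];
    println!("{:?}", checked_get(&values, 1)); // prints "Some(20)"
    println!("{:?}", checked_get(&values, 5)); // prints "None"
}
```

Callers of `checked_get` never write `unsafe`: the check inside the function is what makes the unsafe call sound, so the function as a whole can be exposed as safe.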

Let’s look at each of these superpowers in turn, concentrating on the first
four; accessing union fields is needed mainly when interfacing with unions
declared in C code. We’ll also look at some abstractions that provide a safe
interface to unsafe code.

### Dereferencing a Raw Pointer

In Chapter 4, in the [“Dangling References”][dangling-references]<!-- ignore
--> section, we mentioned that the compiler ensures references are always
valid. Unsafe Rust has two new types called *raw pointers* that are similar to
references. As with references, raw pointers can be immutable or mutable and
are written as `*const T` and `*mut T`, respectively. The asterisk isn’t the
dereference operator; it’s part of the type name. In the context of raw
pointers, *immutable* means that the pointer can’t be directly assigned to
after being dereferenced.

Unlike references and smart pointers, raw pointers:

* Are allowed to ignore the borrowing rules by having both immutable and
  mutable pointers or multiple mutable pointers to the same location
* Aren’t guaranteed to point to valid memory
* Are allowed to be null
* Don’t implement any automatic cleanup

By opting out of having Rust enforce these guarantees, you can give up
guaranteed safety in exchange for greater performance or the ability to
interface with another language or hardware where Rust’s guarantees don’t apply.
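
The first and third properties in the list above can be seen in a short sketch. The helper `write_twice` is hypothetical; `std::ptr::null` and the raw-pointer `is_null` method are standard library APIs. Dereferencing the pointers, which the listings below explain, has to happen inside an `unsafe` block.

```rust
use std::ptr;

/// Hypothetical helper: writes through two `*mut i32` copies of the same
/// pointer and returns the final value.
fn write_twice(start: i32) -> i32 {
    let mut value = start;
    // Two mutable raw pointers to one location. The borrow checker doesn't
    // track raw pointers, so this compiles, unlike two live `&mut value` borrows.
    let p1: *mut i32 = &mut value;
    let p2: *mut i32 = p1;
    unsafe {
        // SAFETY: both pointers come from `value`, which is still alive, and
        // no reference to `value` is used while the pointers are in use.
        *p1 += 1;
        *p2 *= 10;
    }
    value
}

fn main() {
    // Creating a null raw pointer is allowed in safe code; a reference can
    // never be null.
    let null: *const i32 = ptr::null();
    println!("null.is_null() = {}", null.is_null()); // prints "null.is_null() = true"

    println!("{}", write_twice(4)); // prints "50"
}
```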

Listing 19-1 shows how to create an immutable and a mutable raw pointer from
references.

```rust
let mut num = 5;

let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;
```

<span class="caption">Listing 19-1: Creating raw pointers from references</span>

Notice that we don’t include the `unsafe` keyword in this code. We can create
raw pointers in safe code; we just can’t dereference raw pointers outside an
unsafe block, as you’ll see in a bit.

We’ve created raw pointers by using `as` to cast an immutable and a mutable
reference into their corresponding raw pointer types. Because we created them
directly from references guaranteed to be valid, we know these particular raw
pointers are valid, but we can’t make that assumption about just any raw
pointer.

Next, we’ll create a raw pointer whose validity we can’t be so certain of.
Listing 19-2 shows how to create a raw pointer to an arbitrary location in
memory. Trying to use arbitrary memory is undefined behavior: there might be
data at that address or there might not, the compiler might optimize the code
so there is no memory access, or the program might error with a segmentation
fault. Usually, there is no good reason to write code like this, but it is
possible.

```rust
let address = 0x012345usize;
let r = address as *const i32;
```

<span class="caption">Listing 19-2: Creating a raw pointer to an arbitrary
memory address</span>

Recall that we can create raw pointers in safe code, but we can’t *dereference*
raw pointers and read the data being pointed to. In Listing 19-3, we use the
dereference operator `*` on a raw pointer, which requires an `unsafe` block.

```rust,unsafe
let mut num = 5;

let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;

unsafe {
    println!("r1 is: {}", *r1);
    println!("r2 is: {}", *r2);
}
```

<span class="caption">Listing 19-3: Dereferencing raw pointers within an
`unsafe` block</span>

Creating a pointer does no harm; it’s only when we try to access the value that
it points at that we might end up dealing with an invalid value.

Note also that in Listings 19-1 and 19-3, we created `*const i32` and `*mut i32`
raw pointers that both pointed to the same memory location, where `num` is
stored. If we instead tried to create an immutable and a mutable reference to
`num`, the code would not have compiled because Rust’s ownership rules don’t
allow a mutable reference at the same time as any immutable references. With
raw pointers, we can create a mutable pointer and an immutable pointer to the
same location and change data through the mutable pointer, potentially creating
a data race. Be careful!
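
The following sketch changes data through a mutable raw pointer and reads the change back through an immutable one aimed at the same location. The helper name `write_then_read` is hypothetical. Unlike Listing 19-3, it derives the `*const i32` from the `*mut i32` instead of from a separate reference, so both pointers refer to `num` through the same original borrow; because everything runs on one thread, there is no data race.

```rust
/// Hypothetical helper: changes `num` through a `*mut i32` and observes the
/// change through a `*const i32` that points to the same location.
fn write_then_read(start: i32, new_value: i32) -> (i32, i32) {
    let mut num = start;

    // Derive both pointers from a single mutable borrow of `num`.
    let r2: *mut i32 = &mut num;
    let r1: *const i32 = r2;

    unsafe {
        // SAFETY: `num` is alive for this whole block, this code is single
        // threaded, and no reference to `num` is used while the pointers are.
        let before = *r1;
        *r2 = new_value;
        let after = *r1;
        (before, after)
    }
}

fn main() {
    let (before, after) = write_then_read(5, 10);
    println!("before: {}, after: {}", before, after); // prints "before: 5, after: 10"
}
```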

With all of these dangers, why would you ever use raw pointers? One major use
case is when interfacing with C code, as you’ll see in the next section,
[“Calling an Unsafe Function or
Method.”](#calling-an-unsafe-function-or-method)<!-- ignore --> Another case is
when building up safe abstractions that the borrow checker doesn’t understand.
We’ll introduce unsafe functions and then look at an example of a safe
abstraction that uses unsafe code.

### Calling an Unsafe Function or Method

The second type of operation that requires an unsafe block is calling unsafe
functions. Unsafe functions and methods look exactly like regular functions and
methods, but they have an extra `unsafe` before the rest of the definition. The
`unsafe` keyword in this context indicates the function has requirements we
need to uphold when we call this function, because Rust can’t guarantee we’ve
met these requirements. By calling an unsafe function within an `unsafe` block,
we’re saying that we’ve read this function’s documentation and take
responsibility for upholding the function’s contracts.

Here is an unsafe function named `dangerous` that doesn’t do anything in its
body:

```rust,unsafe
unsafe fn dangerous() {}

unsafe {
    dangerous();
}
```

We must call the `dangerous` function within a separate `unsafe` block. If we
try to call `dangerous` without the `unsafe` block, we’ll get an error:

```text
error[E0133]: call to unsafe function requires unsafe function or block
 -->
  |
4 |     dangerous();
  |     ^^^^^^^^^^^ call to unsafe function
```

By inserting the `unsafe` block around our call to `dangerous`, we’re asserting
to Rust that we’ve read the function’s documentation, we understand how to use
it properly, and we’ve verified that we’re fulfilling the contract of the
function.

Bodies of unsafe functions are effectively `unsafe` blocks, so to perform other
unsafe operations within an unsafe function, we don’t need to add another
`unsafe` block. Starting with the Rust 2024 edition, however, the compiler
warns by default about unsafe operations that aren’t inside their own `unsafe`
block, even within an unsafe function.
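
In practice, an unsafe function documents the contract its callers must uphold, conventionally in a `# Safety` section of its doc comment, and each call site explains why that contract holds. The sketch below uses a hypothetical function named `read_at`; `as_ptr` and the raw-pointer `add` method are standard library APIs.

```rust
/// Returns the element `index` places after `ptr`.
///
/// # Safety
///
/// `ptr` must point to the start of a valid, initialized array of `i32`
/// values with more than `index` elements.
unsafe fn read_at(ptr: *const i32, index: usize) -> i32 {
    // An explicit inner block marks exactly which operation is unsafe.
    unsafe { *ptr.add(index) }
}

fn main() {
    let values = [1, 2, 3, 4];
    // SAFETY: `values` has 4 elements and 2 < 4, so the contract holds.
    let third = unsafe { read_at(values.as_ptr(), 2) };
    println!("{}", third); // prints "3"
}
```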

#### Creating a Safe Abstraction over Unsafe Code

Just because a function contains unsafe code doesn’t mean we need to mark the
entire function as unsafe. In fact, wrapping unsafe code in a safe function is
a common abstraction. As an example, let’s study a function from the standard
library, `split_at_mut`, that requires some unsafe code and explore how we
might implement it. This safe method is defined on mutable slices: it takes one
slice and makes it two by splitting the slice at the index given as an
argument. Listing 19-4 shows how to use `split_at_mut`.

```rust
let mut v = vec![1, 2, 3, 4, 5, 6];

let r = &mut v[..];

let (a, b) = r.split_at_mut(3);

assert_eq!(a, &mut [1, 2, 3]);
assert_eq!(b, &mut [4, 5, 6]);
```

<span class="caption">Listing 19-4: Using the safe `split_at_mut`
function</span>

We can’t implement this function using only safe Rust. An attempt might look
something like Listing 19-5, which won’t compile. For simplicity, we’ll
implement `split_at_mut` as a function rather than a method and only for slices
of `i32` values rather than for a generic type `T`.

```rust,ignore,does_not_compile
fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();

    assert!(mid <= len);

    (&mut slice[..mid],
     &mut slice[mid..])
}
```

<span class="caption">Listing 19-5: An attempted implementation of
`split_at_mut` using only safe Rust</span>

This function first gets the total length of the slice. Then it asserts that
the index given as a parameter is within the slice by checking whether it’s
less than or equal to the length. The assertion means that if we pass an index
that is greater than the length to split the slice at, the function will panic
before it attempts to use that index.

Then we return two mutable slices in a tuple: one from the start of the
original slice to the `mid` index and another from `mid` to the end of the
slice.

When we try to compile the code in Listing 19-5, we’ll get an error.

```text
error[E0499]: cannot borrow `*slice` as mutable more than once at a time
 -->
  |
6 |     (&mut slice[..mid],
  |           ----- first mutable borrow occurs here
7 |      &mut slice[mid..])
  |           ^^^^^ second mutable borrow occurs here
8 | }
  | - first borrow ends here
```

Rust’s borrow checker can’t understand that we’re borrowing different parts of
the slice; it only knows that we’re borrowing from the same slice twice.
Borrowing different parts of a slice is fundamentally okay because the two
slices aren’t overlapping, but Rust isn’t smart enough to know this. When we
know code is okay, but Rust doesn’t, it’s time to reach for unsafe code.

Listing 19-6 shows how to use an `unsafe` block, a raw pointer, and some calls
to unsafe functions to make the implementation of `split_at_mut` work.

```rust,unsafe
use std::slice;

fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();
    let ptr = slice.as_mut_ptr();

    assert!(mid <= len);

    unsafe {
        (slice::from_raw_parts_mut(ptr, mid),
         slice::from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
    }
}
```

<span class="caption">Listing 19-6: Using unsafe code in the implementation of
the `split_at_mut` function</span>

Recall from [“The Slice Type”][the-slice-type]<!-- ignore --> section in
Chapter 4 that slices are a pointer to some data and the length of the slice.
We use the `len` method to get the length of a slice and the `as_mut_ptr`
method to access the raw pointer of a slice. In this case, because we have a
mutable slice to `i32` values, `as_mut_ptr` returns a raw pointer with the type
`*mut i32`, which we’ve stored in the variable `ptr`.

We keep the assertion that the `mid` index is within the slice. Then we get to
the unsafe code: the `slice::from_raw_parts_mut` function takes a raw pointer
and a length, and it creates a slice. We use this function to create a slice
that starts from `ptr` and is `mid` items long. Then we call the `offset`
method on `ptr` with `mid` as an argument to get a raw pointer that starts at
`mid`, and we create a slice using that pointer and the remaining number of
items after `mid` as the length.

The function `slice::from_raw_parts_mut` is unsafe because it takes a raw
pointer and must trust that this pointer is valid. The `offset` method on raw
pointers is also unsafe, because it must trust that the offset location is also
a valid pointer. Therefore, we had to put an `unsafe` block around our calls to
`slice::from_raw_parts_mut` and `offset` so we could call them. By looking at
the code and by adding the assertion that `mid` must be less than or equal to
`len`, we can tell that all the raw pointers used within the `unsafe` block
will be valid pointers to data within the slice. This is an acceptable and
appropriate use of `unsafe`.

Note that we don’t need to mark the resulting `split_at_mut` function as
`unsafe`, and we can call this function from safe Rust. We’ve created a safe
abstraction to the unsafe code with an implementation of the function that uses
`unsafe` code in a safe way, because it creates only valid pointers from the
data this function has access to.
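
Listing 19-6 handles only `i32` slices. The same approach generalizes to any element type `T`, as in the sketch below; it is an illustration, not the standard library’s actual source. It swaps `offset(mid as isize)` for the raw-pointer `add` method, which takes a `usize` count directly and is unsafe for the same reason.

```rust
use std::slice;

/// A generic version of Listing 19-6 that works for slices of any `T`.
fn split_at_mut<T>(values: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // Without this check, the pointers created below could point past the end.
    assert!(mid <= len);

    unsafe {
        // SAFETY: `mid <= len`, so `[0, mid)` and `[mid, len)` are
        // non-overlapping ranges inside the original slice, and both returned
        // slices borrow from `values` for the same lifetime.
        (
            slice::from_raw_parts_mut(ptr, mid),
            // `ptr.add(mid)` computes the same address as `ptr.offset(mid as isize)`.
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut words = vec!["a", "b", "c", "d"];
    let (left, right) = split_at_mut(&mut words[..], 1);
    left[0] = "A";
    right[2] = "D";
    println!("{:?}", words); // prints ["A", "b", "c", "D"]
}
```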

In contrast, the use of `slice::from_raw_parts_mut` in Listing 19-7 would
likely crash when the slice is used. This code takes an arbitrary memory
location and creates a slice 10,000 items long.

```rust,unsafe
use std::slice;

let address = 0x01234usize;
let r = address as *mut i32;

let slice: &[i32] = unsafe {
    slice::from_raw_parts_mut(r, 10000)
};
```

<span class="caption">Listing 19-7: Creating a slice from an arbitrary memory
location</span>

We don’t own the memory at this arbitrary location, and there is no guarantee
that the slice this code creates contains valid `i32` values. Attempting to use
`slice` as though it’s a valid slice results in undefined behavior.

#### Using `extern` Functions to Call External Code

Sometimes, your Rust code might need to interact with code written in another
language. For this, Rust has a keyword, `extern`, that facilitates the creation
and use of a *Foreign Function Interface (FFI)*. An FFI is a way for a
programming language to define functions and enable a different (foreign)
programming language to call those functions.

Listing 19-8 demonstrates how to set up an integration with the `abs` function
from the C standard library. Functions declared within `extern` blocks are
always unsafe to call from Rust code. The reason is that other languages don’t
enforce Rust’s rules and guarantees, and Rust can’t check them, so
responsibility falls on the programmer to ensure safety.

<span class="filename">Filename: src/main.rs</span>

```rust,unsafe
extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}
```

<span class="caption">Listing 19-8: Declaring and calling an `extern` function
defined in another language</span>

Within the `extern "C"` block, we list the names and signatures of external
functions from another language we want to call. The `"C"` part defines which
*application binary interface (ABI)* the external function uses: the ABI
defines how to call the function at the assembly level. The `"C"` ABI is the
most common and follows the C programming language’s ABI.
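
Passing pointers across the FFI boundary is where the raw pointers from earlier in this chapter come in. The sketch below declares C’s `strlen` and wraps it in a hypothetical safe function named `c_length`, using the standard library’s `CStr` type to guarantee the nul-terminated string that `strlen` expects. It assumes a C standard library is linked, as it is for Rust programs on common platforms, and uses the `unsafe extern` block form, which newer Rust versions accept and the 2024 edition requires.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Declares `strlen` from the C standard library. Listing 19-8 uses the older
// plain `extern "C"` form; `unsafe extern` is the form the 2024 edition requires.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: asks C for the length of a C string owned by Rust.
fn c_length(text: &CStr) -> usize {
    // SAFETY: `CStr` guarantees a valid pointer to a nul-terminated buffer,
    // which is exactly what `strlen` requires.
    unsafe { strlen(text.as_ptr()) }
}

fn main() {
    let text = CStr::from_bytes_with_nul(b"hello\0").expect("exactly one trailing nul byte");
    println!("{}", c_length(text)); // prints "5"
}
```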

> #### Calling Rust Functions from Other Languages
>
> We can also use `extern` to create an interface that allows other languages
> to call Rust functions. Instead of an `extern` block, we add the `extern`
> keyword and specify the ABI to use just before the `fn` keyword. We also need
> to add a `#[no_mangle]` annotation to tell the Rust compiler not to mangle
> the name of this function. *Mangling* is when a compiler changes the name
> we’ve given a function to a different name that contains more information for
> other parts of the compilation process to consume but is less human readable.
> Every programming language compiler mangles names slightly differently, so
> for a Rust function to be nameable by other languages, we must disable the
> Rust compiler’s name mangling.
>
> In the following example, we make the `call_from_c` function accessible from
> C code, after it’s compiled to a shared library and linked from C:
>
> ```rust
> #[no_mangle]
> pub extern "C" fn call_from_c() {
>     println!("Just called a Rust function from C!");
> }
> ```
>
> This usage of `extern` does not require `unsafe`.

### Accessing or Modifying a Mutable Static Variable

Until now, we’ve not talked about *global variables*, which Rust does support
but which can be problematic with Rust’s ownership rules. If two threads are
accessing the same mutable global variable, it can cause a data race.

In Rust, global variables are called *static* variables. Listing 19-9 shows an
example declaration and use of a static variable with a string slice as a
value.

<span class="filename">Filename: src/main.rs</span>

```rust
static HELLO_WORLD: &str = "Hello, world!";

fn main() {
    println!("name is: {}", HELLO_WORLD);
}
```

<span class="caption">Listing 19-9: Defining and using an immutable static
variable</span>

Static variables are similar to constants, which we discussed in the
[“Differences Between Variables and
Constants”][differences-between-variables-and-constants]<!-- ignore -->
section in Chapter 3. The names of static variables are in
`SCREAMING_SNAKE_CASE` by convention, and we *must* annotate the variable’s
type, which is `&'static str` in this example. Static variables can only store
references with the `'static` lifetime, which means the Rust compiler can
figure out the lifetime; we don’t need to annotate it explicitly. Accessing an
immutable static variable is safe.

Constants and immutable static variables might seem similar, but a subtle
difference is that values in a static variable have a fixed address in memory.
Using the value will always access the same data. Constants, on the other hand,
are allowed to duplicate their data whenever they’re used.
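
The fixed-address property can be observed directly. In the sketch below, `ANSWER` and the helper `answer_ref` are hypothetical names; `std::ptr::eq` compares two references by address. The same check is deliberately not made for a constant, because Rust doesn’t promise whether different uses of a `const` share an address.

```rust
static ANSWER: i32 = 42;

/// Hypothetical helper: returns a reference to the single `ANSWER` static.
fn answer_ref() -> &'static i32 {
    &ANSWER
}

fn main() {
    let a = answer_ref();
    let b = answer_ref();
    // Every reference to a static points at the same memory location.
    println!("same address: {}", std::ptr::eq(a, b)); // prints "same address: true"
}
```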

Another difference between constants and static variables is that static
variables can be mutable. Accessing and modifying mutable static variables is
*unsafe*. Listing 19-10 shows how to declare, access, and modify a mutable
static variable named `COUNTER`.

<span class="filename">Filename: src/main.rs</span>

```rust,unsafe
static mut COUNTER: u32 = 0;

fn add_to_count(inc: u32) {
    unsafe {
        COUNTER += inc;
    }
}

fn main() {
    add_to_count(3);

    unsafe {
        // Copy the value out first: passing `COUNTER` to `println!` directly
        // would create a reference to the mutable static, which newer
        // compilers warn about or reject.
        let count = COUNTER;
        println!("COUNTER: {}", count);
    }
}
```

<span class="caption">Listing 19-10: Reading from or writing to a mutable
static variable is unsafe</span>

As with regular variables, we specify mutability using the `mut` keyword. Any
code that reads from or writes to `COUNTER` must be within an `unsafe` block.
This code compiles and prints `COUNTER: 3` as we would expect because it’s
single threaded. Having multiple threads access `COUNTER` would likely result
in data races.

With mutable data that is globally accessible, it’s difficult to ensure there
are no data races, which is why Rust considers mutable static variables to be
unsafe. Where possible, it’s preferable to use the concurrency techniques and
thread-safe smart pointers we discussed in Chapter 16 so the compiler checks
that data is accessed safely from different threads.
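
As one example of such a technique, the counter from Listing 19-10 can be rewritten with an atomic integer from `std::sync::atomic`, making it safe to update from several threads without `static mut` or `unsafe`. This is a sketch: the helper `count_from_threads` is hypothetical, and `Ordering::Relaxed` is sufficient here only because the code needs the final total after joining the threads, not any ordering relative to other memory operations.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// An atomic static needs neither `mut` nor `unsafe`: the atomic type itself
// makes concurrent updates well defined.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    // `fetch_add` performs the read-modify-write as a single atomic step.
    COUNTER.fetch_add(inc, Ordering::Relaxed);
}

/// Hypothetical helper: spawns `threads` threads that each add `inc`, waits
/// for all of them, and returns the resulting count.
fn count_from_threads(threads: u32, inc: u32) -> u32 {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || add_to_count(inc)))
        .collect();
    for handle in handles {
        // Joining guarantees each thread's update is visible afterward.
        handle.join().unwrap();
    }
    COUNTER.load(Ordering::Relaxed)
}

fn main() {
    println!("COUNTER: {}", count_from_threads(4, 3)); // prints "COUNTER: 12"
}
```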

### Implementing an Unsafe Trait

Another action that works only with `unsafe` is implementing an unsafe trait.
A trait is unsafe when implementing it requires upholding some invariant that
the compiler can’t verify, whether that invariant concerns one of the trait’s
methods or the implementing type itself. We can declare that a trait is
`unsafe` by adding the `unsafe` keyword before `trait` and marking the
implementation of the trait as `unsafe` too, as shown in Listing 19-11.

```rust,unsafe
unsafe trait Foo {
    // methods go here
}

unsafe impl Foo for i32 {
    // method implementations go here
}
```

<span class="caption">Listing 19-11: Defining and implementing an unsafe
trait</span>

By using `unsafe impl`, we’re promising that we’ll uphold the invariants that
the compiler can’t verify.

As an example, recall the `Sync` and `Send` marker traits we discussed in the
[“Extensible Concurrency with the `Sync` and `Send`
Traits”][extensible-concurrency-with-the-sync-and-send-traits]<!-- ignore -->
section in Chapter 16: the compiler implements these traits automatically if
our types are composed entirely of `Send` and `Sync` types. If we implement a
type that contains a type that is not `Send` or `Sync`, such as raw pointers,
and we want to mark that type as `Send` or `Sync`, we must use `unsafe`. Rust
can’t verify that our type upholds the guarantees that it can be safely sent
across threads or accessed from multiple threads; therefore, we need to do
those checks manually and indicate that we have done so with `unsafe`.
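
To make that concrete, here is a sketch of a type that holds a raw pointer and therefore isn’t `Send` automatically. The type `OwnedPtr`, its methods, and `read_on_other_thread` are hypothetical; `Box::into_raw` and `Box::from_raw` are standard library functions for converting between a `Box` and a raw pointer. The `unsafe impl Send` line is our promise, backed only by our own reasoning, that moving an `OwnedPtr` to another thread is sound.

```rust
use std::thread;

/// Hypothetical type that owns one heap-allocated `i32` through a raw
/// pointer. Because it contains a `*mut i32`, the compiler does not
/// implement `Send` for it automatically.
struct OwnedPtr(*mut i32);

// SAFETY: `OwnedPtr` is the only owner of its allocation (it can't be cloned
// or copied), so moving it to another thread moves sole access to that memory.
unsafe impl Send for OwnedPtr {}

impl OwnedPtr {
    fn new(value: i32) -> Self {
        OwnedPtr(Box::into_raw(Box::new(value)))
    }

    fn get(&self) -> i32 {
        // SAFETY: the pointer came from `Box::into_raw` and stays valid until `drop`.
        unsafe { *self.0 }
    }
}

impl Drop for OwnedPtr {
    fn drop(&mut self) {
        // SAFETY: the pointer came from `Box::into_raw` and is freed exactly once.
        unsafe { drop(Box::from_raw(self.0)) }
    }
}

/// Moves an `OwnedPtr` into a new thread and reads the value there.
fn read_on_other_thread(value: i32) -> i32 {
    let owned = OwnedPtr::new(value);
    // Without the `unsafe impl Send` above, this `spawn` call would not compile.
    let handle = thread::spawn(move || owned.get());
    handle.join().unwrap()
}

fn main() {
    println!("{}", read_on_other_thread(42)); // prints "42"
}
```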

### When to Use Unsafe Code

Using `unsafe` to take one of the five actions (superpowers) listed earlier
isn’t wrong or even frowned upon. But it is trickier to get `unsafe` code
correct because the compiler can’t help uphold memory safety. When you have a
reason to use `unsafe` code, you can do so, and having the explicit `unsafe`
annotation makes it easier to track down the source of problems if they occur.
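
The remaining superpower from the start of this chapter, accessing fields of a `union`, is unsafe because Rust can’t know which field was last written. A `union` is declared like a struct, but all of its fields share the same storage, which is mainly useful for interfacing with unions in C code. The sketch below uses the hypothetical names `IntOrFloat` and `float_bits`; in real code, the standard library’s `f32::to_bits` does the same conversion without `unsafe`.

```rust
/// A hypothetical union whose two fields share the same 4 bytes.
/// `#[repr(C)]` gives it the layout a C compiler would use.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

/// Returns the raw bit pattern of `x` by writing one field and reading the other.
fn float_bits(x: f32) -> u32 {
    let value = IntOrFloat { f: x };
    // SAFETY: both fields are 4 bytes, and every 4-byte bit pattern is a
    // valid `u32`, so reading `i` after writing `f` is sound.
    unsafe { value.i }
}

fn main() {
    // Prints "0x3f800000", the IEEE 754 single-precision encoding of 1.0.
    println!("{:#x}", float_bits(1.0));
}
```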

[dangling-references]: ch04-02-references-and-borrowing.html#dangling-references
[differences-between-variables-and-constants]: ch03-01-variables-and-mutability.html#differences-between-variables-and-constants
[extensible-concurrency-with-the-sync-and-send-traits]: ch16-04-extensible-concurrency-sync-and-send.html#extensible-concurrency-with-the-sync-and-send-traits
[the-slice-type]: ch04-03-slices.html#the-slice-type