src/doc/book/src/ch19-01-unsafe-rust.md

   1 ## Unsafe Rust
   2
   3 All the code we’ve discussed so far has had Rust’s memory safety guarantees
   4 enforced at compile time. However, Rust has a second language hidden inside it
   5 that doesn’t enforce these memory safety guarantees: it’s called *unsafe Rust*
   6 and works just like regular Rust, but gives us extra superpowers.
   7
   8 Unsafe Rust exists because, by nature, static analysis is conservative. When
   9 the compiler tries to determine whether or not code upholds the guarantees,
  10 it’s better for it to reject some valid programs rather than accept some
  11 invalid programs. Although the code *might* be okay, if the Rust compiler
  12 doesn’t have enough information to be confident, it will reject the code. In
  13 these cases, you can use unsafe code to tell the compiler, “Trust me, I know
  14 what I’m doing.” The downside is that you use it at your own risk: if you use
  15 unsafe code incorrectly, problems due to memory unsafety, such as null pointer
  16 dereferencing, can occur.
  17
  18 Another reason Rust has an unsafe alter ego is that the underlying computer
  19 hardware is inherently unsafe. If Rust didn’t let you do unsafe operations, you
  20 couldn’t do certain tasks. Rust needs to allow you to do low-level systems
  21 programming, such as directly interacting with the operating system or even
  22 writing your own operating system. Working with low-level systems programming
  23 is one of the goals of the language. Let’s explore what we can do with unsafe
  24 Rust and how to do it.
  25
  26 ### Unsafe Superpowers
  27
  28 To switch to unsafe Rust, use the `unsafe` keyword and then start a new block
  29 that holds the unsafe code. You can take five actions in unsafe Rust, called
  30 *unsafe superpowers*, that you can’t take in safe Rust. Those superpowers are
  31 the ability to:
  32
  33 * Dereference a raw pointer
  34 * Call an unsafe function or method
  35 * Access or modify a mutable static variable
  36 * Implement an unsafe trait
  37 * Access fields of `union`s
  38
  39 It’s important to understand that `unsafe` doesn’t turn off the borrow checker
  40 or disable any other of Rust’s safety checks: if you use a reference in unsafe
  41 code, it will still be checked. The `unsafe` keyword only gives you access to
  42 these five features that are then not checked by the compiler for memory
  43 safety. You’ll still get some degree of safety inside of an unsafe block.
  44
  45 In addition, `unsafe` does not mean the code inside the block is necessarily
  46 dangerous or that it will definitely have memory safety problems: the intent is
  47 that as the programmer, you’ll ensure the code inside an `unsafe` block will
  48 access memory in a valid way.
  49
  50 People are fallible, and mistakes will happen, but by requiring these five
  51 unsafe operations to be inside blocks annotated with `unsafe` you’ll know that
  52 any errors related to memory safety must be within an `unsafe` block. Keep
  53 `unsafe` blocks small; you’ll be thankful later when you investigate memory
  54 bugs.
  55
  56 To isolate unsafe code as much as possible, it’s best to enclose unsafe code
  57 within a safe abstraction and provide a safe API, which we’ll discuss later in
  58 the chapter when we examine unsafe functions and methods. Parts of the standard
  59 library are implemented as safe abstractions over unsafe code that has been
  60 audited. Wrapping unsafe code in a safe abstraction prevents uses of `unsafe`
  61 from leaking out into all the places that you or your users might want to use
  62 the functionality implemented with `unsafe` code, because using a safe
  63 abstraction is safe.
  64
  65 Let’s look at each of the five unsafe superpowers in turn. We’ll also look at
  66 some abstractions that provide a safe interface to unsafe code.
  67
  68 ### Dereferencing a Raw Pointer
  69
  70 In Chapter 4, in the [“Dangling References”][dangling-references]<!-- ignore
  71 --> section, we mentioned that the compiler ensures references are always
  72 valid. Unsafe Rust has two new types called *raw pointers* that are similar to
  73 references. As with references, raw pointers can be immutable or mutable and
  74 are written as `*const T` and `*mut T`, respectively. The asterisk isn’t the
  75 dereference operator; it’s part of the type name. In the context of raw
  76 pointers, *immutable* means that the pointer can’t be directly assigned to
  77 after being dereferenced.
  78
  79 Different from references and smart pointers, raw pointers:
  80
  81 * Are allowed to ignore the borrowing rules by having both immutable and
  82   mutable pointers or multiple mutable pointers to the same location
  83 * Aren’t guaranteed to point to valid memory
  84 * Are allowed to be null
  85 * Don’t implement any automatic cleanup
  86
  87 By opting out of having Rust enforce these guarantees, you can give up
  88 guaranteed safety in exchange for greater performance or the ability to
  89 interface with another language or hardware where Rust’s guarantees don’t apply.
  90
  91 Listing 19-1 shows how to create an immutable and a mutable raw pointer from
  92 references.
  93
  94 ```rust
  95 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-01/src/main.rs:here}}
  96 ```
  97
  98 <span class="caption">Listing 19-1: Creating raw pointers from references</span>
  99
 100 Notice that we don’t include the `unsafe` keyword in this code. We can create
 101 raw pointers in safe code; we just can’t dereference raw pointers outside an
 102 unsafe block, as you’ll see in a bit.
 103
 104 We’ve created raw pointers by using `as` to cast an immutable and a mutable
 105 reference into their corresponding raw pointer types. Because we created them
 106 directly from references guaranteed to be valid, we know these particular raw
 107 pointers are valid, but we can’t make that assumption about just any raw
 108 pointer.
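
As a small illustration of why validity can’t be assumed for an arbitrary raw pointer, the following sketch inspects raw pointers entirely in safe code. The `describe` helper is hypothetical and exists only for this example; `is_null` is a safe method on raw pointers. A non-null result says nothing about whether the memory can be read, so the sketch never dereferences anything.

```rust
use std::ptr;

// Hypothetical helper for illustration: classifies a raw pointer without
// dereferencing it, so no `unsafe` block is needed.
fn describe(p: *const i32) -> &'static str {
    if p.is_null() {
        "null"
    } else {
        // Non-null does not imply valid: the address may still be dangling.
        "non-null"
    }
}

fn main() {
    let num = 5;
    // A reference coerces to the matching raw pointer type.
    let from_reference: *const i32 = &num;
    let null: *const i32 = ptr::null();

    println!("from_reference is {}", describe(from_reference));
    println!("null is {}", describe(null));
}
```

Only the pointer created from `&num` is known to be valid here, and that knowledge comes from how it was created, not from anything the pointer type itself guarantees.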
 109
 110 Next, we’ll create a raw pointer whose validity we can’t be so certain of.
 111 Listing 19-2 shows how to create a raw pointer to an arbitrary location in
 112 memory. Trying to use arbitrary memory is undefined behavior: there might be data at
 113 that address or there might not, the compiler might optimize the code so there
 114 is no memory access, or the program might error with a segmentation fault.
 115 Usually, there is no good reason to write code like this, but it is possible.
 116
 117 ```rust
 118 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-02/src/main.rs:here}}
 119 ```
 120
 121 <span class="caption">Listing 19-2: Creating a raw pointer to an arbitrary
 122 memory address</span>
 123
 124 Recall that we can create raw pointers in safe code, but we can’t *dereference*
 125 raw pointers and read the data being pointed to outside an `unsafe` block. In Listing 19-3,
 126 we use the dereference operator `*` on raw pointers, which requires an `unsafe` block.
 127
 128 ```rust
 129 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-03/src/main.rs:here}}
 130 ```
 131
 132 <span class="caption">Listing 19-3: Dereferencing raw pointers within an
 133 `unsafe` block</span>
 134
 135 Creating a pointer does no harm; it’s only when we try to access the value that
 136 it points at that we might end up dealing with an invalid value.
 137
 138 Note also that in Listings 19-1 and 19-3, we created `*const i32` and `*mut i32`
 139 raw pointers that both pointed to the same memory location, where `num` is
 140 stored. If we instead tried to create an immutable and a mutable reference to
 141 `num`, the code would not have compiled because Rust’s borrowing rules don’t
 142 allow a mutable reference at the same time as any immutable references. With
 143 raw pointers, we can create a mutable pointer and an immutable pointer to the
 144 same location and change data through the mutable pointer, potentially creating
 145 a data race. Be careful!
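
A minimal sketch of this aliasing follows. The function name `bump_through_raw` is hypothetical, and deriving both pointers from a single mutable borrow is a choice made here to keep the example simple; it is not necessarily how the listings build their pointers.

```rust
// Hypothetical helper: writes through a `*mut i32` and reads the result back
// through a `*const i32` that aliases the same location.
fn bump_through_raw(mut num: i32) -> (i32, i32) {
    // Both pointers are derived from the same mutable borrow, so they alias.
    let r2: *mut i32 = &mut num;
    let r1: *const i32 = r2;

    // SAFETY: `num` is a live local, both pointers point at it, and nothing
    // else accesses `num` while the pointers are in use.
    unsafe {
        *r2 += 1;
        (*r1, *r2)
    }
}

fn main() {
    let (seen_by_const, seen_by_mut) = bump_through_raw(5);
    println!("r1 is: {seen_by_const}, r2 is: {seen_by_mut}");
}
```

The write through the mutable pointer is visible through the immutable one because both name the same memory. In a single thread that is harmless; the data race risk appears once such pointers are shared across threads without synchronization.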
 146
 147 With all of these dangers, why would you ever use raw pointers? One major use
 148 case is when interfacing with C code, as you’ll see in the next section,
 149 [“Calling an Unsafe Function or
 150 Method.”](#calling-an-unsafe-function-or-method)<!-- ignore --> Another case is
 151 when building up safe abstractions that the borrow checker doesn’t understand.
 152 We’ll introduce unsafe functions and then look at an example of a safe
 153 abstraction that uses unsafe code.
 154
 155 ### Calling an Unsafe Function or Method
 156
 157 The second type of operation that requires an unsafe block is calling unsafe
 158 functions. Unsafe functions and methods look exactly like regular functions and
 159 methods, but they have an extra `unsafe` before the rest of the definition. The
 160 `unsafe` keyword in this context indicates the function has requirements we
 161 need to uphold when we call this function, because Rust can’t guarantee we’ve
 162 met these requirements. By calling an unsafe function within an `unsafe` block,
 163 we’re saying that we’ve read this function’s documentation and take
 164 responsibility for upholding the function’s contracts.
 165
 166 Here is an unsafe function named `dangerous` that doesn’t do anything in its
 167 body:
 168
 169 ```rust
 170 {{#rustdoc_include ../listings/ch19-advanced-features/no-listing-01-unsafe-fn/src/main.rs:here}}
 171 ```
 172
 173 We must call the `dangerous` function within a separate `unsafe` block. If we
 174 try to call `dangerous` without the `unsafe` block, we’ll get an error:
 175
 176 ```console
 177 {{#include ../listings/ch19-advanced-features/output-only-01-missing-unsafe/output.txt}}
 178 ```
 179
 180 By inserting the `unsafe` block around our call to `dangerous`, we’re asserting
 181 to Rust that we’ve read the function’s documentation, we understand how to use
 182 it properly, and we’ve verified that we’re fulfilling the contract of the
 183 function.
 184
 185 Bodies of unsafe functions are effectively `unsafe` blocks, so to perform other
 186 unsafe operations within an unsafe function, we don’t need to add another
 187 `unsafe` block.
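
To show what a function “contract” can look like in practice, here is a sketch of an unsafe function whose requirement is written in a `# Safety` doc section, together with a caller that upholds it. The names `element_unchecked` and `first_or_zero` are hypothetical; `get_unchecked` is the standard library’s unchecked slice accessor. The inner `unsafe` block is not required by the rule just described; it is written out here because the `unsafe_op_in_unsafe_fn` lint, which to our knowledge warns by default starting with the 2024 edition, asks for it.

```rust
/// Returns the element at `index` without a bounds check.
///
/// # Safety
///
/// `index` must be less than `values.len()`.
unsafe fn element_unchecked(values: &[i32], index: usize) -> i32 {
    // SAFETY: the caller promises that `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}

// A safe caller: it checks the precondition itself before the unsafe call.
fn first_or_zero(values: &[i32]) -> i32 {
    if values.is_empty() {
        0
    } else {
        // SAFETY: the slice is non-empty, so index 0 is in bounds.
        unsafe { element_unchecked(values, 0) }
    }
}

fn main() {
    println!("{}", first_or_zero(&[7, 8, 9]));
    println!("{}", first_or_zero(&[]));
}
```

The `// SAFETY:` comments are a common convention, not a compiler requirement: each one records why the contract holds at that particular call site.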
 188
 189 #### Creating a Safe Abstraction over Unsafe Code
 190
 191 Just because a function contains unsafe code doesn’t mean we need to mark the
 192 entire function as unsafe. In fact, wrapping unsafe code in a safe function is
 193 a common abstraction. As an example, let’s study a function from the standard
 194 library, `split_at_mut`, that requires some unsafe code and explore how we
 195 might implement it. This safe method is defined on mutable slices: it takes one
 196 slice and makes it two by splitting the slice at the index given as an
 197 argument. Listing 19-4 shows how to use `split_at_mut`.
 198
 199 ```rust
 200 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-04/src/main.rs:here}}
 201 ```
 202
 203 <span class="caption">Listing 19-4: Using the safe `split_at_mut`
 204 function</span>
 205
 206 We can’t implement this function using only safe Rust. An attempt might look
 207 something like Listing 19-5, which won’t compile. For simplicity, we’ll
 208 implement `split_at_mut` as a function rather than a method and only for slices
 209 of `i32` values rather than for a generic type `T`.
 210
 211 ```rust,ignore,does_not_compile
 212 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-05/src/main.rs:here}}
 213 ```
 214
 215 <span class="caption">Listing 19-5: An attempted implementation of
 216 `split_at_mut` using only safe Rust</span>
 217
 218 This function first gets the total length of the slice. Then it asserts that
 219 the index given as a parameter is within the slice by checking whether it’s
 220 less than or equal to the length. The assertion means that if we pass an index
 221 that is greater than the length to split the slice at, the function will panic
 222 before it attempts to use that index.
 223
 224 Then we return two mutable slices in a tuple: one from the start of the
 225 original slice to the `mid` index and another from `mid` to the end of the
 226 slice.
 227
 228 When we try to compile the code in Listing 19-5, we’ll get an error.
 229
 230 ```console
 231 {{#include ../listings/ch19-advanced-features/listing-19-05/output.txt}}
 232 ```
 233
 234 Rust’s borrow checker can’t understand that we’re borrowing different parts of
 235 the slice; it only knows that we’re borrowing from the same slice twice.
 236 Borrowing different parts of a slice is fundamentally okay because the two
 237 slices aren’t overlapping, but Rust isn’t smart enough to know this. When we
 238 know code is okay, but Rust doesn’t, it’s time to reach for unsafe code.
 239
 240 Listing 19-6 shows how to use an `unsafe` block, a raw pointer, and some calls
 241 to unsafe functions to make the implementation of `split_at_mut` work.
 242
 243 ```rust
 244 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-06/src/main.rs:here}}
 245 ```
 246
 247 <span class="caption">Listing 19-6: Using unsafe code in the implementation of
 248 the `split_at_mut` function</span>
 249
 250 Recall from the [“The Slice Type”][the-slice-type]<!-- ignore --> section in
 251 Chapter 4 that a slice is a pointer to some data and the length of the slice.
 252 We use the `len` method to get the length of a slice and the `as_mut_ptr`
 253 method to access the raw pointer of a slice. In this case, because we have a
 254 mutable slice to `i32` values, `as_mut_ptr` returns a raw pointer with the type
 255 `*mut i32`, which we’ve stored in the variable `ptr`.
 256
 257 We keep the assertion that the `mid` index is within the slice. Then we get to
 258 the unsafe code: the `slice::from_raw_parts_mut` function takes a raw pointer
 259 and a length, and it creates a slice. We use this function to create a slice
 260 that starts from `ptr` and is `mid` items long. Then we call the `add`
 261 method on `ptr` with `mid` as an argument to get a raw pointer that starts at
 262 `mid`, and we create a slice using that pointer and the remaining number of
 263 items after `mid` as the length.
 264
 265 The function `slice::from_raw_parts_mut` is unsafe because it takes a raw
 266 pointer and must trust that this pointer is valid. The `add` method on raw
 267 pointers is also unsafe, because it must trust that the offset location is also
 268 a valid pointer. Therefore, we had to put an `unsafe` block around our calls to
 269 `slice::from_raw_parts_mut` and `add` so we could call them. By looking at
 270 the code and by adding the assertion that `mid` must be less than or equal to
 271 `len`, we can tell that all the raw pointers used within the `unsafe` block
 272 will be valid pointers to data within the slice. This is an acceptable and
 273 appropriate use of `unsafe`.
 274
 275 Note that we don’t need to mark the resulting `split_at_mut` function as
 276 `unsafe`, and we can call this function from safe Rust. We’ve created a safe
 277 abstraction to the unsafe code with an implementation of the function that uses
 278 `unsafe` code in a safe way, because it creates only valid pointers from the
 279 data this function has access to.
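
The steps described above can be sketched as follows. This is a reconstruction from the prose (take the length, take the raw pointer, assert, then build two slices), written for `i32` slices only, and it is not claimed to match the listing or the standard library’s implementation line for line. The parameter name `values` is an assumption.

```rust
use std::slice;

fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // Panic before any unsafe code runs if `mid` is out of range.
    assert!(mid <= len);

    // SAFETY: `mid <= len`, so `ptr..ptr+mid` and `ptr+mid..ptr+len` both lie
    // within the original slice and do not overlap.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];

    let (left, right) = split_at_mut(&mut v, 3);
    left[0] = 10;
    right[0] = 40;

    println!("{:?}", v); // prints [10, 2, 3, 40, 5, 6]
}
```

Callers never write `unsafe` themselves: the assertion plus the pointer arithmetic inside the function is what makes the two returned slices safe to hand out.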
 280
 281 In contrast, the use of `slice::from_raw_parts_mut` in Listing 19-7 would
 282 likely crash when the slice is used. This code takes an arbitrary memory
 283 location and creates a slice 10,000 items long.
 284
 285 ```rust
 286 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-07/src/main.rs:here}}
 287 ```
 288
 289 <span class="caption">Listing 19-7: Creating a slice from an arbitrary memory
 290 location</span>
 291
 292 We don’t own the memory at this arbitrary location, and there is no guarantee
 293 that the slice this code creates contains valid `i32` values. Attempting to use
 294 `slice` as though it’s a valid slice results in undefined behavior.
 295
 296 #### Using `extern` Functions to Call External Code
 297
 298 Sometimes, your Rust code might need to interact with code written in another
 299 language. For this, Rust has a keyword, `extern`, that facilitates the creation
 300 and use of a *Foreign Function Interface (FFI)*. An FFI is a way for a
 301 programming language to define functions and enable a different (foreign)
 302 programming language to call those functions.
 303
 304 Listing 19-8 demonstrates how to set up an integration with the `abs` function
 305 from the C standard library. Functions declared within `extern` blocks are
 306 always unsafe to call from Rust code. The reason is that other languages don’t
 307 enforce Rust’s rules and guarantees, and Rust can’t check them, so
 308 responsibility falls on the programmer to ensure safety.
 309
 310 <span class="filename">Filename: src/main.rs</span>
 311
 312 ```rust
 313 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-08/src/main.rs}}
 314 ```
 315
 316 <span class="caption">Listing 19-8: Declaring and calling an `extern` function
 317 defined in another language</span>
 318
 319 Within the `extern "C"` block, we list the names and signatures of external
 320 functions from another language we want to call. The `"C"` part defines which
 321 *application binary interface (ABI)* the external function uses: the ABI
 322 defines how to call the function at the assembly level. The `"C"` ABI is the
 323 most common and follows the C programming language’s ABI.
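
As a further sketch of the same pattern, the example below declares C’s `strlen` and hides the unsafe call behind a safe function. The wrapper name `c_string_len` is hypothetical, and the declaration assumes that C’s `size_t` matches Rust’s `usize` on the target platform and that the C standard library is linked, which is the usual situation for programs using `std`.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

extern "C" {
    // Assumed signature: `size_t strlen(const char *s);`
    fn strlen(s: *const c_char) -> usize;
}

// Hypothetical safe wrapper: builds a NUL-terminated copy of `text` and asks
// C for its length in bytes.
fn c_string_len(text: &str) -> usize {
    let c_text = CString::new(text).expect("text must not contain NUL bytes");

    // SAFETY: `c_text` is a valid NUL-terminated string that stays alive for
    // the duration of the call, and `strlen` only reads from it.
    unsafe { strlen(c_text.as_ptr()) }
}

fn main() {
    println!("strlen says: {}", c_string_len("hello"));
}
```

Rust can’t check that the declared signature matches the real C function, so getting the declaration right is part of the programmer’s responsibility.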
 324
 325 > #### Calling Rust Functions from Other Languages
 326 >
 327 > We can also use `extern` to create an interface that allows other languages
 328 > to call Rust functions. Instead of an `extern` block, we add the `extern`
 329 > keyword and specify the ABI to use just before the `fn` keyword. We also need
 330 > to add a `#[no_mangle]` annotation to tell the Rust compiler not to mangle
 331 > the name of this function. *Mangling* is when a compiler changes the name
 332 > we’ve given a function to a different name that contains more information for
 333 > other parts of the compilation process to consume but is less human readable.
 334 > Every programming language compiler mangles names slightly differently, so
 335 > for a Rust function to be nameable by other languages, we must disable the
 336 > Rust compiler’s name mangling.
 337 >
 338 > In the following example, we make the `call_from_c` function accessible from
 339 > C code, after it’s compiled to a shared library and linked from C:
 340 >
 341 > ```rust
 342 > #[no_mangle]
 343 > pub extern "C" fn call_from_c() {
 344 >     println!("Just called a Rust function from C!");
 345 > }
 346 > ```
 347 >
 348 > This usage of `extern` does not require `unsafe`.
 349
 350 ### Accessing or Modifying a Mutable Static Variable
 351
 352 Until now, we’ve not talked about *global variables*, which Rust does support
 353 but which can be problematic with Rust’s ownership rules. If two threads are
 354 accessing the same mutable global variable, it can cause a data race.
 355
 356 In Rust, global variables are called *static* variables. Listing 19-9 shows an
 357 example declaration and use of a static variable with a string slice as a
 358 value.
 359
 360 <span class="filename">Filename: src/main.rs</span>
 361
 362 ```rust
 363 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-09/src/main.rs}}
 364 ```
 365
 366 <span class="caption">Listing 19-9: Defining and using an immutable static
 367 variable</span>
 368
 369 Static variables are similar to constants, which we discussed in the
 370 [“Differences Between Variables and
 371 Constants”][differences-between-variables-and-constants]<!-- ignore -->
 372 section in Chapter 3. The names of static variables are in
 373 `SCREAMING_SNAKE_CASE` by convention. Static variables can only store
 374 references with the `'static` lifetime, which means the Rust compiler can
 375 figure out the lifetime and we aren’t required to annotate it explicitly.
 376 Accessing an immutable static variable is safe.
 377
 378 Constants and immutable static variables might seem similar, but a subtle
 379 difference is that values in a static variable have a fixed address in memory.
 380 Using the value will always access the same data. Constants, on the other hand,
 381 are allowed to duplicate their data whenever they’re used.
 382
 383 Another difference between constants and static variables is that static
 384 variables can be mutable. Accessing and modifying mutable static variables is
 385 *unsafe*. Listing 19-10 shows how to declare, access, and modify a mutable
 386 static variable named `COUNTER`.
 387
 388 <span class="filename">Filename: src/main.rs</span>
 389
 390 ```rust
 391 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-10/src/main.rs}}
 392 ```
 393
 394 <span class="caption">Listing 19-10: Reading from or writing to a mutable
 395 static variable is unsafe</span>
 396
 397 As with regular variables, we specify mutability using the `mut` keyword. Any
 398 code that reads from or writes to `COUNTER` must be within an `unsafe` block. This
 399 code compiles and prints `COUNTER: 3` as we would expect because it’s single
 400 threaded. Having multiple threads access `COUNTER` would likely result in data
 401 races.
 402
 403 With mutable data that is globally accessible, it’s difficult to ensure there
 404 are no data races, which is why Rust considers mutable static variables to be
 405 unsafe. Where possible, it’s preferable to use the concurrency techniques and
 406 thread-safe smart pointers we discussed in Chapter 16 so the compiler checks
 407 that data is accessed from different threads safely.
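
One way to get a global counter without `static mut` is an atomic integer from `std::sync::atomic`. The sketch below is an alternative under that assumption, not a rewrite of Listing 19-10; the helper `add_from_threads` is hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// An immutable static with interior mutability: updating it needs no `unsafe`.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    COUNTER.fetch_add(inc, Ordering::SeqCst);
}

// Hypothetical helper: bumps the counter once from each of `threads` threads.
fn add_from_threads(threads: u32) {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(|| add_to_count(1)))
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
}

fn main() {
    add_from_threads(4);
    println!("COUNTER: {}", COUNTER.load(Ordering::SeqCst));
}
```

Because every update goes through an atomic operation, concurrent increments can’t race, and the compiler accepts the shared static in safe code.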
 408
 409 ### Implementing an Unsafe Trait
 410
 411 Another use case for `unsafe` is implementing an unsafe trait. A trait is
 412 unsafe when at least one of its methods has some invariant that the compiler
 413 can’t verify. We declare that a trait is `unsafe` by adding the `unsafe`
 414 keyword before `trait`, and we must mark each implementation of the trait as
 415 `unsafe` too, as shown in Listing 19-11.
 416
 417 ```rust
 418 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-11/src/main.rs}}
 419 ```
 420
 421 <span class="caption">Listing 19-11: Defining and implementing an unsafe
 422 trait</span>
 423
 424 By using `unsafe impl`, we’re promising that we’ll uphold the invariants that
 425 the compiler can’t verify.
 426
 427 As an example, recall the `Sync` and `Send` marker traits we discussed in the
 428 [“Extensible Concurrency with the `Sync` and `Send`
 429 Traits”][extensible-concurrency-with-the-sync-and-send-traits]<!-- ignore -->
 430 section in Chapter 16: the compiler implements these traits automatically if
 431 our types are composed entirely of `Send` and `Sync` types. If we implement a
 432 type that contains a type that is not `Send` or `Sync`, such as raw pointers,
 433 and we want to mark that type as `Send` or `Sync`, we must use `unsafe`. Rust
 434 can’t verify that our type upholds the guarantees that it can be safely sent
 435 across threads or accessed from multiple threads; therefore, we need to do
 436 those checks manually and indicate that we have done so with `unsafe`.
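
The following sketch shows what that manual reasoning can look like. The type `OwnedCounter` and the function `increment_on_other_thread` are hypothetical: the type holds a raw pointer, so the compiler does not implement `Send` for it automatically, and we assert `Send` ourselves with a written justification.

```rust
use std::thread;

// Hypothetical type: uniquely owns a heap allocation through a raw pointer.
struct OwnedCounter {
    ptr: *mut u32,
}

impl OwnedCounter {
    fn new(start: u32) -> Self {
        OwnedCounter {
            ptr: Box::into_raw(Box::new(start)),
        }
    }

    fn increment(&mut self) -> u32 {
        // SAFETY: `ptr` came from `Box::into_raw` in `new`, is owned only by
        // `self`, and is freed only in `drop`.
        unsafe {
            *self.ptr += 1;
            *self.ptr
        }
    }
}

impl Drop for OwnedCounter {
    fn drop(&mut self) {
        // SAFETY: `ptr` was produced by `Box::into_raw` and is freed exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}

// SAFETY: an `OwnedCounter` is the only handle to its allocation, so moving it
// to another thread moves all access along with it.
unsafe impl Send for OwnedCounter {}

fn increment_on_other_thread(mut counter: OwnedCounter) -> u32 {
    thread::spawn(move || counter.increment())
        .join()
        .expect("worker thread panicked")
}

fn main() {
    println!("{}", increment_on_other_thread(OwnedCounter::new(41)));
}
```

Without the `unsafe impl Send` line, the call to `thread::spawn` is rejected because `*mut u32` is not `Send`. The argument in the comment is ours to get right; the compiler takes it on trust.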
 437
 438 ### Accessing Fields of a Union
 439
 440 The final action that works only with `unsafe` is accessing fields of a
 441 *union*. A `union` is similar to a `struct`, but only one declared field is
 442 used in a particular instance at one time. Unions are primarily used to
 443 interface with unions in C code. Accessing union fields is unsafe because Rust
 444 can’t guarantee the type of the data currently being stored in the union
 445 instance. You can learn more about unions in [the reference][reference].
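
A short sketch makes the rule concrete. The union `IntOrFloat` and the function `float_bits` are hypothetical names chosen for this example; writing a union field is safe, while reading one requires `unsafe`.

```rust
// Hypothetical union: both fields share the same four bytes of storage.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

// Reinterprets the bytes of an `f32` as a `u32` by writing one field and
// reading the other.
fn float_bits(value: f32) -> u32 {
    // Initializing a union field is safe.
    let u = IntOrFloat { f: value };

    // SAFETY: both fields are four bytes wide and every bit pattern is a
    // valid `u32`, so reading `i` after writing `f` is well defined.
    unsafe { u.i }
}

fn main() {
    println!("{:#x}", float_bits(1.0));
}
```

For this particular conversion, safe code would normally call `f32::to_bits`; the union is used here only to show where the `unsafe` block is required.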
 446
 447 ### When to Use Unsafe Code
 448
 449 Using `unsafe` to take one of the five actions (superpowers) just discussed
 450 isn’t wrong or even frowned upon. But it is trickier to get `unsafe` code
 451 correct because the compiler can’t help uphold memory safety. When you have a
 452 reason to use `unsafe` code, you can do so, and having the explicit `unsafe`
 453 annotation makes it easier to track down the source of problems when they occur.
 454
 455 [dangling-references]:
 456 ch04-02-references-and-borrowing.html#dangling-references
 457 [differences-between-variables-and-constants]:
 458 ch03-01-variables-and-mutability.html#constants
 459 [extensible-concurrency-with-the-sync-and-send-traits]:
 460 ch16-04-extensible-concurrency-sync-and-send.html#extensible-concurrency-with-the-sync-and-send-traits
 461 [the-slice-type]: ch04-03-slices.html#the-slice-type
 462 [reference]: ../reference/items/unions.html