New upstream version 1.63.0+dfsg1
1 ## Unsafe Rust
2
3 All the code we’ve discussed so far has had Rust’s memory safety guarantees
4 enforced at compile time. However, Rust has a second language hidden inside it
5 that doesn’t enforce these memory safety guarantees: it’s called *unsafe Rust*
6 and works just like regular Rust, but gives us extra superpowers.
7
8 Unsafe Rust exists because, by nature, static analysis is conservative. When
9 the compiler tries to determine whether or not code upholds the guarantees,
10 it’s better for it to reject some valid programs than to accept some invalid
11 programs. Although the code *might* be okay, if the Rust compiler doesn’t have
12 enough information to be confident, it will reject the code. In these cases,
13 you can use unsafe code to tell the compiler, “Trust me, I know what I’m
14 doing.” Be warned, however, that you use unsafe Rust at your own risk: if you
15 use unsafe code incorrectly, problems can occur due to memory unsafety, such as
16 null pointer dereferencing.
17
18 Another reason Rust has an unsafe alter ego is that the underlying computer
19 hardware is inherently unsafe. If Rust didn’t let you do unsafe operations, you
20 couldn’t do certain tasks. Rust needs to allow you to do low-level systems
21 programming, such as directly interacting with the operating system or even
22 writing your own operating system. Working with low-level systems programming
23 is one of the goals of the language. Let’s explore what we can do with unsafe
24 Rust and how to do it.
25
26 ### Unsafe Superpowers
27
28 To switch to unsafe Rust, use the `unsafe` keyword and then start a new block
29 that holds the unsafe code. You can take five actions in unsafe Rust that you
30 can’t in safe Rust, which we call *unsafe superpowers*. Those superpowers
31 include the ability to:
32
33 * Dereference a raw pointer
34 * Call an unsafe function or method
35 * Access or modify a mutable static variable
36 * Implement an unsafe trait
37 * Access fields of `union`s
38
39 It’s important to understand that `unsafe` doesn’t turn off the borrow checker
40 or disable any other of Rust’s safety checks: if you use a reference in unsafe
41 code, it will still be checked. The `unsafe` keyword only gives you access to
42 these five features that are then not checked by the compiler for memory
43 safety. You’ll still get some degree of safety inside of an unsafe block.
44
45 In addition, `unsafe` does not mean the code inside the block is necessarily
46 dangerous or that it will definitely have memory safety problems: the intent is
47 that as the programmer, you’ll ensure the code inside an `unsafe` block will
48 access memory in a valid way.
49
50 People are fallible, and mistakes will happen, but by requiring these five
51 unsafe operations to be inside blocks annotated with `unsafe` you’ll know that
52 any errors related to memory safety must be within an `unsafe` block. Keep
53 `unsafe` blocks small; you’ll be thankful later when you investigate memory
54 bugs.
55
56 To isolate unsafe code as much as possible, it’s best to enclose unsafe code
57 within a safe abstraction and provide a safe API, which we’ll discuss later in
58 the chapter when we examine unsafe functions and methods. Parts of the standard
59 library are implemented as safe abstractions over unsafe code that has been
60 audited. Wrapping unsafe code in a safe abstraction prevents uses of `unsafe`
61 from leaking out into all the places that you or your users might want to use
62 the functionality implemented with `unsafe` code, because using a safe
63 abstraction is safe.
64
65 Let’s look at each of the five unsafe superpowers in turn. We’ll also look at
66 some abstractions that provide a safe interface to unsafe code.
67
68 ### Dereferencing a Raw Pointer
69
70 In Chapter 4, in the [“Dangling References”][dangling-references]<!-- ignore
71 --> section, we mentioned that the compiler ensures references are always
72 valid. Unsafe Rust has two new types called *raw pointers* that are similar to
73 references. As with references, raw pointers can be immutable or mutable and
74 are written as `*const T` and `*mut T`, respectively. The asterisk isn’t the
75 dereference operator; it’s part of the type name. In the context of raw
76 pointers, *immutable* means that the pointer can’t be directly assigned to
77 after being dereferenced.
78
Unlike references and smart pointers, raw pointers:
80
81 * Are allowed to ignore the borrowing rules by having both immutable and
82 mutable pointers or multiple mutable pointers to the same location
83 * Aren’t guaranteed to point to valid memory
84 * Are allowed to be null
85 * Don’t implement any automatic cleanup
86
87 By opting out of having Rust enforce these guarantees, you can give up
88 guaranteed safety in exchange for greater performance or the ability to
89 interface with another language or hardware where Rust’s guarantees don’t apply.
90
91 Listing 19-1 shows how to create an immutable and a mutable raw pointer from
92 references.
93
94 ```rust
95 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-01/src/main.rs:here}}
96 ```
97
98 <span class="caption">Listing 19-1: Creating raw pointers from references</span>
99
100 Notice that we don’t include the `unsafe` keyword in this code. We can create
101 raw pointers in safe code; we just can’t dereference raw pointers outside an
102 unsafe block, as you’ll see in a bit.
103
104 We’ve created raw pointers by using `as` to cast an immutable and a mutable
105 reference into their corresponding raw pointer types. Because we created them
106 directly from references guaranteed to be valid, we know these particular raw
107 pointers are valid, but we can’t make that assumption about just any raw
108 pointer.
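
As a minimal sketch of the casts just described, the following program builds both kinds of raw pointer from references to one local variable. The function name `pointers_share_address` is a hypothetical helper introduced for illustration, and the code only compares addresses, so it needs no `unsafe`.

```rust
/// Builds one immutable and one mutable raw pointer to the same `i32`
/// and reports whether they hold the same address.
fn pointers_share_address() -> bool {
    let mut num = 5;

    // Casting a reference with `as` yields a raw pointer; no `unsafe` needed.
    let r1 = &num as *const i32;
    let r2 = &mut num as *mut i32;

    // Comparing addresses never dereferences, so this is still safe code.
    r1 == r2 as *const i32
}

fn main() {
    println!("same address: {}", pointers_share_address());
}
```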
109
110 To demonstrate this, next we’ll create a raw pointer whose validity we can’t be
111 so certain of. Listing 19-2 shows how to create a raw pointer to an arbitrary
location in memory. Trying to use arbitrary memory is undefined behavior: there might be
113 data at that address or there might not, the compiler might optimize the code
114 so there is no memory access, or the program might error with a segmentation
115 fault. Usually, there is no good reason to write code like this, but it is
116 possible.
117
118 ```rust
119 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-02/src/main.rs:here}}
120 ```
121
122 <span class="caption">Listing 19-2: Creating a raw pointer to an arbitrary
123 memory address</span>
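
One way such a listing might look is sketched here. The address `0x012345` and the helper `pointer_from_address` are assumptions chosen for illustration; the program creates and prints the pointer but never reads through it.

```rust
/// Turns an integer address into a raw pointer without touching memory.
fn pointer_from_address(address: usize) -> *const i32 {
    address as *const i32
}

fn main() {
    // Hypothetical address; nothing is known to live there.
    let address = 0x012345usize;
    let r = pointer_from_address(address);

    // Formatting with `{:p}` prints the address only; it does not dereference.
    println!("r points at {:p}", r);
}
```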
124
Recall that we can create raw pointers in safe code, but we can’t *dereference*
raw pointers and read the data being pointed to there. In Listing 19-3, we use
the dereference operator `*` on a raw pointer, which requires an `unsafe` block.
128
129 ```rust
130 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-03/src/main.rs:here}}
131 ```
132
133 <span class="caption">Listing 19-3: Dereferencing raw pointers within an
134 `unsafe` block</span>
135
136 Creating a pointer does no harm; it’s only when we try to access the value that
137 it points at that we might end up dealing with an invalid value.
138
Note also that in Listings 19-1 and 19-3, we created `*const i32` and `*mut i32`
140 raw pointers that both pointed to the same memory location, where `num` is
141 stored. If we instead tried to create an immutable and a mutable reference to
142 `num`, the code would not have compiled because Rust’s ownership rules don’t
143 allow a mutable reference at the same time as any immutable references. With
144 raw pointers, we can create a mutable pointer and an immutable pointer to the
145 same location and change data through the mutable pointer, potentially creating
146 a data race. Be careful!
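
The following sketch shows two aliasing raw pointers in use: it reads through the immutable pointer, writes through the mutable one, and reads again. It is not the book's listing. As a conservative assumption, it derives the `*const` pointer from the `*mut` pointer instead of casting two separate references, so that both pointers remain usable at the same time. The name `read_write_read` is hypothetical.

```rust
/// Reads, writes, and rereads one `i32` through two aliasing raw pointers.
fn read_write_read() -> (i32, i32) {
    let mut num = 5;

    // Derive the const pointer from the mutable one so both stay usable together.
    let r2 = &mut num as *mut i32;
    let r1 = r2 as *const i32;

    // SAFETY: both pointers refer to `num`, which is alive, aligned, and
    // initialized, and no reference to `num` is used while they are in use.
    unsafe {
        let before = *r1;
        *r2 = 10;
        let after = *r1;
        (before, after)
    }
}

fn main() {
    let (before, after) = read_write_read();
    println!("before: {before}, after: {after}");
}
```

The write through `r2` is visible through `r1` because both name the same location. In a multithreaded program, nothing would stop two threads from doing this concurrently.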
147
148 With all of these dangers, why would you ever use raw pointers? One major use
149 case is when interfacing with C code, as you’ll see in the next section,
150 [“Calling an Unsafe Function or
151 Method.”](#calling-an-unsafe-function-or-method)<!-- ignore --> Another case is
152 when building up safe abstractions that the borrow checker doesn’t understand.
153 We’ll introduce unsafe functions and then look at an example of a safe
154 abstraction that uses unsafe code.
155
156 ### Calling an Unsafe Function or Method
157
158 The second type of operation you can perform in an unsafe block is calling
159 unsafe functions. Unsafe functions and methods look exactly like regular
160 functions and methods, but they have an extra `unsafe` before the rest of the
161 definition. The `unsafe` keyword in this context indicates the function has
162 requirements we need to uphold when we call this function, because Rust can’t
163 guarantee we’ve met these requirements. By calling an unsafe function within an
164 `unsafe` block, we’re saying that we’ve read this function’s documentation and
165 take responsibility for upholding the function’s contracts.
166
167 Here is an unsafe function named `dangerous` that doesn’t do anything in its
168 body:
169
170 ```rust
171 {{#rustdoc_include ../listings/ch19-advanced-features/no-listing-01-unsafe-fn/src/main.rs:here}}
172 ```
173
174 We must call the `dangerous` function within a separate `unsafe` block. If we
175 try to call `dangerous` without the `unsafe` block, we’ll get an error:
176
177 ```console
178 {{#include ../listings/ch19-advanced-features/output-only-01-missing-unsafe/output.txt}}
179 ```
180
181 With the `unsafe` block, we’re asserting to Rust that we’ve read the function’s
182 documentation, we understand how to use it properly, and we’ve verified that
183 we’re fulfilling the contract of the function.
184
185 Bodies of unsafe functions are effectively `unsafe` blocks, so to perform other
186 unsafe operations within an unsafe function, we don’t need to add another
187 `unsafe` block.
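
To make the idea of a contract concrete, here is a sketch of an unsafe function that, unlike the empty `dangerous` above, actually has a requirement to uphold. The pointer parameter, the `# Safety` wording, and the wrapper `call_dangerous` are assumptions made for illustration.

```rust
/// # Safety
///
/// `ptr` must be non-null, aligned, and point to an initialized `i32`.
unsafe fn dangerous(ptr: *const i32) -> i32 {
    // Inside an `unsafe fn`, the dereference needs no extra `unsafe` block
    // on the toolchain this chapter targets (newer compilers may warn).
    *ptr
}

/// Safe wrapper that upholds the contract of `dangerous`.
fn call_dangerous() -> i32 {
    let value = 42;
    // SAFETY: the pointer comes from a live reference to `value`.
    unsafe { dangerous(&value as *const i32) }
}

fn main() {
    println!("read {}", call_dangerous());
}
```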
188
189 #### Creating a Safe Abstraction over Unsafe Code
190
191 Just because a function contains unsafe code doesn’t mean we need to mark the
192 entire function as unsafe. In fact, wrapping unsafe code in a safe function is
193 a common abstraction. As an example, let’s study the `split_at_mut` function
194 from the standard library, which requires some unsafe code. We’ll explore how
195 we might implement it. This safe method is defined on mutable slices: it takes
196 one slice and makes it two by splitting the slice at the index given as an
197 argument. Listing 19-4 shows how to use `split_at_mut`.
198
199 ```rust
200 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-04/src/main.rs:here}}
201 ```
202
203 <span class="caption">Listing 19-4: Using the safe `split_at_mut`
204 function</span>
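
A short sketch of calling the standard library's `split_at_mut` follows. The vector contents and the helper name `split_demo` are assumptions for illustration; the point is that both halves can be mutated while the other is still borrowed.

```rust
/// Splits a six-element vector at index 3 and mutates both halves.
fn split_demo() -> (Vec<i32>, Vec<i32>) {
    let mut v = vec![1, 2, 3, 4, 5, 6];
    let r = &mut v[..];

    // `a` covers indices 0..3 and `b` covers 3..6 of the original slice.
    let (a, b) = r.split_at_mut(3);

    // Both mutable halves are live at the same time.
    a[0] += 10;
    b[0] += 10;

    (a.to_vec(), b.to_vec())
}

fn main() {
    let (a, b) = split_demo();
    println!("{a:?} {b:?}");
}
```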
205
206 We can’t implement this function using only safe Rust. An attempt might look
207 something like Listing 19-5, which won’t compile. For simplicity, we’ll
208 implement `split_at_mut` as a function rather than a method and only for slices
209 of `i32` values rather than for a generic type `T`.
210
211 ```rust,ignore,does_not_compile
212 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-05/src/main.rs:here}}
213 ```
214
215 <span class="caption">Listing 19-5: An attempted implementation of
216 `split_at_mut` using only safe Rust</span>
217
218 This function first gets the total length of the slice. Then it asserts that
219 the index given as a parameter is within the slice by checking whether it’s
220 less than or equal to the length. The assertion means that if we pass an index
221 that is greater than the length to split the slice at, the function will panic
222 before it attempts to use that index.
223
224 Then we return two mutable slices in a tuple: one from the start of the
225 original slice to the `mid` index and another from `mid` to the end of the
226 slice.
227
228 When we try to compile the code in Listing 19-5, we’ll get an error.
229
230 ```console
231 {{#include ../listings/ch19-advanced-features/listing-19-05/output.txt}}
232 ```
233
234 Rust’s borrow checker can’t understand that we’re borrowing different parts of
235 the slice; it only knows that we’re borrowing from the same slice twice.
236 Borrowing different parts of a slice is fundamentally okay because the two
237 slices aren’t overlapping, but Rust isn’t smart enough to know this. When we
238 know code is okay, but Rust doesn’t, it’s time to reach for unsafe code.
239
240 Listing 19-6 shows how to use an `unsafe` block, a raw pointer, and some calls
241 to unsafe functions to make the implementation of `split_at_mut` work.
242
243 ```rust
244 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-06/src/main.rs:here}}
245 ```
246
247 <span class="caption">Listing 19-6: Using unsafe code in the implementation of
248 the `split_at_mut` function</span>
249
Recall from the [“The Slice Type”][the-slice-type]<!-- ignore --> section in
Chapter 4 that a slice is a pointer to some data and the length of the slice.
252 We use the `len` method to get the length of a slice and the `as_mut_ptr`
253 method to access the raw pointer of a slice. In this case, because we have a
254 mutable slice to `i32` values, `as_mut_ptr` returns a raw pointer with the type
255 `*mut i32`, which we’ve stored in the variable `ptr`.
256
257 We keep the assertion that the `mid` index is within the slice. Then we get to
258 the unsafe code: the `slice::from_raw_parts_mut` function takes a raw pointer
259 and a length, and it creates a slice. We use this function to create a slice
260 that starts from `ptr` and is `mid` items long. Then we call the `add`
261 method on `ptr` with `mid` as an argument to get a raw pointer that starts at
262 `mid`, and we create a slice using that pointer and the remaining number of
263 items after `mid` as the length.
264
265 The function `slice::from_raw_parts_mut` is unsafe because it takes a raw
266 pointer and must trust that this pointer is valid. The `add` method on raw
267 pointers is also unsafe, because it must trust that the offset location is also
268 a valid pointer. Therefore, we had to put an `unsafe` block around our calls to
269 `slice::from_raw_parts_mut` and `add` so we could call them. By looking at
270 the code and by adding the assertion that `mid` must be less than or equal to
271 `len`, we can tell that all the raw pointers used within the `unsafe` block
272 will be valid pointers to data within the slice. This is an acceptable and
273 appropriate use of `unsafe`.
274
275 Note that we don’t need to mark the resulting `split_at_mut` function as
276 `unsafe`, and we can call this function from safe Rust. We’ve created a safe
277 abstraction to the unsafe code with an implementation of the function that uses
278 `unsafe` code in a safe way, because it creates only valid pointers from the
279 data this function has access to.
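
Putting the steps described above together, an implementation along these lines would compile. It follows the description in the text (the `len` and `as_mut_ptr` calls, the assertion, then `slice::from_raw_parts_mut` and `add`); the parameter name `values` and the `main` function are assumptions for illustration.

```rust
use std::slice;

/// Splits `values` into two non-overlapping mutable slices at `mid`.
fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // Panics before any unsafe code runs if `mid` is out of bounds.
    assert!(mid <= len);

    // SAFETY: `ptr` is valid for `len` elements, `mid <= len`, and the two
    // ranges `0..mid` and `mid..len` do not overlap.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];
    let (a, b) = split_at_mut(&mut v, 3);
    a[0] = 10;
    b[2] = 60;
    println!("{v:?}");
}
```

The assertion is what makes the safe signature honest: without it, a caller could pass a `mid` beyond the end of the slice and the function would hand out slices over memory it does not own.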
280
281 In contrast, the use of `slice::from_raw_parts_mut` in Listing 19-7 would
282 likely crash when the slice is used. This code takes an arbitrary memory
283 location and creates a slice 10,000 items long.
284
285 ```rust
286 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-07/src/main.rs:here}}
287 ```
288
289 <span class="caption">Listing 19-7: Creating a slice from an arbitrary memory
290 location</span>
291
292 We don’t own the memory at this arbitrary location, and there is no guarantee
293 that the slice this code creates contains valid `i32` values. Attempting to use
294 `values` as though it’s a valid slice results in undefined behavior.
295
296 #### Using `extern` Functions to Call External Code
297
298 Sometimes, your Rust code might need to interact with code written in another
299 language. For this, Rust has the keyword `extern` that facilitates the creation
300 and use of a *Foreign Function Interface (FFI)*. An FFI is a way for a
301 programming language to define functions and enable a different (foreign)
302 programming language to call those functions.
303
304 Listing 19-8 demonstrates how to set up an integration with the `abs` function
305 from the C standard library. Functions declared within `extern` blocks are
306 always unsafe to call from Rust code. The reason is that other languages don’t
307 enforce Rust’s rules and guarantees, and Rust can’t check them, so
308 responsibility falls on the programmer to ensure safety.
309
310 <span class="filename">Filename: src/main.rs</span>
311
312 ```rust
313 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-08/src/main.rs}}
314 ```
315
316 <span class="caption">Listing 19-8: Declaring and calling an `extern` function
317 defined in another language</span>
318
319 Within the `extern "C"` block, we list the names and signatures of external
320 functions from another language we want to call. The `"C"` part defines which
321 *application binary interface (ABI)* the external function uses: the ABI
322 defines how to call the function at the assembly level. The `"C"` ABI is the
323 most common and follows the C programming language’s ABI.
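
The following sketch declares `abs` in an `extern "C"` block and calls it through a small safe wrapper. The wrapper `absolute_value` and its `i32::MIN` check are assumptions added for illustration. The block uses the plain `extern "C"` form that matches this chapter; newer Rust editions may require writing `unsafe extern "C"` instead.

```rust
// Signature of `abs` from the C standard library. Rust cannot check that
// this declaration matches the real function; that is our responsibility.
extern "C" {
    fn abs(input: i32) -> i32;
}

/// Safe wrapper around C's `abs`.
fn absolute_value(input: i32) -> i32 {
    // C's `abs` is undefined for the most negative `int`, so rule it out here.
    assert!(input != i32::MIN, "abs(i32::MIN) is not representable");

    // SAFETY: `abs` has no other preconditions, and `i32` matches C's `int`
    // on the platforms this sketch assumes.
    unsafe { abs(input) }
}

fn main() {
    println!("Absolute value of -3 according to C: {}", absolute_value(-3));
}
```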
324
325 > #### Calling Rust Functions from Other Languages
326 >
327 > We can also use `extern` to create an interface that allows other languages
> to call Rust functions. Instead of creating a whole `extern` block, we add
329 > the `extern` keyword and specify the ABI to use just before the `fn` keyword
330 > for the relevant function. We also need to add a `#[no_mangle]` annotation to
331 > tell the Rust compiler not to mangle the name of this function. *Mangling* is
332 > when a compiler changes the name we’ve given a function to a different name
333 > that contains more information for other parts of the compilation process to
334 > consume but is less human readable. Every programming language compiler
335 > mangles names slightly differently, so for a Rust function to be nameable by
336 > other languages, we must disable the Rust compiler’s name mangling.
337 >
338 > In the following example, we make the `call_from_c` function accessible from
339 > C code, after it’s compiled to a shared library and linked from C:
340 >
> ```rust
> #[no_mangle]
> pub extern "C" fn call_from_c() {
>     println!("Just called a Rust function from C!");
> }
> ```
347 >
348 > This usage of `extern` does not require `unsafe`.
349
350 ### Accessing or Modifying a Mutable Static Variable
351
In this book, we’ve not yet talked about *global variables*, which Rust does
support but which can be problematic with Rust’s ownership rules. If two threads
access the same mutable global variable, the result can be a data race.
355
356 In Rust, global variables are called *static* variables. Listing 19-9 shows an
357 example declaration and use of a static variable with a string slice as a
358 value.
359
360 <span class="filename">Filename: src/main.rs</span>
361
362 ```rust
363 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-09/src/main.rs}}
364 ```
365
366 <span class="caption">Listing 19-9: Defining and using an immutable static
367 variable</span>
368
369 Static variables are similar to constants, which we discussed in the
370 [“Differences Between Variables and
371 Constants”][differences-between-variables-and-constants]<!-- ignore --> section
372 in Chapter 3. The names of static variables are in `SCREAMING_SNAKE_CASE` by
373 convention. Static variables can only store references with the `'static`
374 lifetime, which means the Rust compiler can figure out the lifetime and we
375 aren’t required to annotate it explicitly. Accessing an immutable static
376 variable is safe.
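
A minimal sketch of an immutable static holding a string slice might look like this. The name `HELLO_WORLD`, its value, and the helper `greeting` are assumptions for illustration.

```rust
// The `'static` lifetime of the reference is inferred; no annotation needed.
static HELLO_WORLD: &str = "Hello, world!";

/// Returns the static's value; reading an immutable static needs no `unsafe`.
fn greeting() -> &'static str {
    HELLO_WORLD
}

fn main() {
    println!("name is: {}", greeting());
}
```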
377
378 A subtle difference between constants and immutable static variables is that
379 values in a static variable have a fixed address in memory. Using the value
380 will always access the same data. Constants, on the other hand, are allowed to
381 duplicate their data whenever they’re used. Another difference is that static
382 variables can be mutable. Accessing and modifying mutable static variables is
383 *unsafe*. Listing 19-10 shows how to declare, access, and modify a mutable
384 static variable named `COUNTER`.
385
386 <span class="filename">Filename: src/main.rs</span>
387
388 ```rust
389 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-10/src/main.rs}}
390 ```
391
392 <span class="caption">Listing 19-10: Reading from or writing to a mutable
393 static variable is unsafe</span>
394
395 As with regular variables, we specify mutability using the `mut` keyword. Any
code that reads from or writes to `COUNTER` must be within an `unsafe` block. This
397 code compiles and prints `COUNTER: 3` as we would expect because it’s single
398 threaded. Having multiple threads access `COUNTER` would likely result in data
399 races.
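
Here is a sketch of what declaring, modifying, and reading `COUNTER` could look like. The helper names `add_to_count` and `current_count` are assumptions, and the safety comments state the single-threaded assumption the code depends on.

```rust
static mut COUNTER: u32 = 0;

/// Adds `inc` to the global counter.
fn add_to_count(inc: u32) {
    // SAFETY: sound only because this sketch assumes a single thread.
    unsafe {
        COUNTER += inc;
    }
}

/// Returns a copy of the counter's current value.
fn current_count() -> u32 {
    // SAFETY: same single-threaded assumption; the value is copied out,
    // so no reference to the static escapes.
    unsafe { COUNTER }
}

fn main() {
    add_to_count(3);
    println!("COUNTER: {}", current_count());
}
```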
400
401 With mutable data that is globally accessible, it’s difficult to ensure there
402 are no data races, which is why Rust considers mutable static variables to be
403 unsafe. Where possible, it’s preferable to use the concurrency techniques and
thread-safe smart pointers we discussed in Chapter 16 so the compiler checks
that data is accessed safely from different threads.
406
407 ### Implementing an Unsafe Trait
408
409 We can use `unsafe` to implement an unsafe trait. A trait is unsafe when at
410 least one of its methods has some invariant that the compiler can’t verify. We
411 declare that a trait is `unsafe` by adding the `unsafe` keyword before `trait`
412 and marking the implementation of the trait as `unsafe` too, as shown in
413 Listing 19-11.
414
415 ```rust
416 {{#rustdoc_include ../listings/ch19-advanced-features/listing-19-11/src/main.rs}}
417 ```
418
419 <span class="caption">Listing 19-11: Defining and implementing an unsafe
420 trait</span>
421
422 By using `unsafe impl`, we’re promising that we’ll uphold the invariants that
423 the compiler can’t verify.
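
To illustrate the kind of invariant involved, the sketch below defines a hypothetical marker trait, `ZeroIsValid`, whose implementors promise that an all-zero bit pattern is a valid value of the type. The trait, the function `zeroed_value`, and the chosen implementors are all assumptions for illustration; only `std::mem::zeroed` comes from the standard library.

```rust
/// # Safety
///
/// Implementors promise that the all-zero bit pattern is a valid value of
/// the implementing type. The compiler cannot check this.
unsafe trait ZeroIsValid {}

// SAFETY: zero is a valid `u32`, and an all-zero `[u8; 4]` is valid too.
unsafe impl ZeroIsValid for u32 {}
unsafe impl ZeroIsValid for [u8; 4] {}

/// Safe constructor that relies on the implementor's promise.
fn zeroed_value<T: ZeroIsValid>() -> T {
    // SAFETY: `T: ZeroIsValid` guarantees all-zero bits form a valid `T`.
    unsafe { std::mem::zeroed() }
}

fn main() {
    let n: u32 = zeroed_value();
    let bytes: [u8; 4] = zeroed_value();
    println!("{n} {bytes:?}");
}
```

Because the trait is `unsafe`, a wrong implementation (for a reference type, say) is the implementor's fault, and `zeroed_value` can stay a safe function.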
424
425 As an example, recall the `Sync` and `Send` marker traits we discussed in the
426 [“Extensible Concurrency with the `Sync` and `Send`
427 Traits”][extensible-concurrency-with-the-sync-and-send-traits]<!-- ignore -->
428 section in Chapter 16: the compiler implements these traits automatically if
429 our types are composed entirely of `Send` and `Sync` types. If we implement a
430 type that contains a type that is not `Send` or `Sync`, such as raw pointers,
431 and we want to mark that type as `Send` or `Sync`, we must use `unsafe`. Rust
432 can’t verify that our type upholds the guarantees that it can be safely sent
433 across threads or accessed from multiple threads; therefore, we need to do
434 those checks manually and indicate as such with `unsafe`.
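
As a sketch of that manual promise, the following code wraps a raw pointer in a hypothetical type, `OwnedRaw`, and marks it `Send` so that it can be moved into a spawned thread. The type, its methods, and the function `increment_on_other_thread` are assumptions for illustration.

```rust
use std::thread;

/// Hypothetical wrapper around a raw pointer that owns a heap-allocated `i32`.
/// Dropping it without calling `into_box` leaks the allocation.
struct OwnedRaw(*mut i32);

// SAFETY: `OwnedRaw` is the sole owner of its allocation, so moving it to
// another thread cannot create shared access. The compiler cannot verify this.
unsafe impl Send for OwnedRaw {}

impl OwnedRaw {
    fn new(value: i32) -> Self {
        OwnedRaw(Box::into_raw(Box::new(value)))
    }

    /// Reclaims the allocation; consuming `self` prevents a double free.
    fn into_box(self) -> Box<i32> {
        // SAFETY: the pointer came from `Box::into_raw` in `new` and is
        // reclaimed exactly once because `self` is consumed.
        unsafe { Box::from_raw(self.0) }
    }
}

/// Moves the wrapper into a new thread, increments the value there,
/// and returns the result.
fn increment_on_other_thread(value: i32) -> i32 {
    let raw = OwnedRaw::new(value);

    // Without the `unsafe impl Send` above, this `spawn` would not compile.
    let handle = thread::spawn(move || {
        let mut boxed = raw.into_box();
        *boxed += 1;
        *boxed
    });

    handle.join().expect("worker thread panicked")
}

fn main() {
    println!("{}", increment_on_other_thread(41));
}
```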
435
436 ### Accessing Fields of a Union
437
438 The final action that works only with `unsafe` is accessing fields of a
439 *union*. A `union` is similar to a `struct`, but only one declared field is
440 used in a particular instance at one time. Unions are primarily used to
441 interface with unions in C code. Accessing union fields is unsafe because Rust
442 can’t guarantee the type of the data currently being stored in the union
443 instance. You can learn more about unions in [the Rust Reference][reference].
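
A small sketch of a union and an unsafe field access follows. The union `IntOrFloat` and the function `float_bits` are hypothetical names; the example writes the `f` field and reads the same bytes back through the `i` field.

```rust
/// Hypothetical union whose two fields share the same four bytes.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

/// Returns the raw bit pattern of `value` by reading the other union field.
fn float_bits(value: f32) -> u32 {
    // Writing a union field is safe; only one field is initialized here.
    let u = IntOrFloat { f: value };

    // SAFETY: both fields are four bytes wide, and every bit pattern is a
    // valid `u32`, so reading `i` after writing `f` is well defined.
    unsafe { u.i }
}

fn main() {
    println!("{:#x}", float_bits(1.0));
}
```

Reading in the other direction would need the same care: the compiler does not track which field was written last, so the `unsafe` block is where we vouch for the stored bytes being valid for the field we read.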
444
445 ### When to Use Unsafe Code
446
447 Using `unsafe` to take one of the five actions (superpowers) just discussed
448 isn’t wrong or even frowned upon. But it is trickier to get `unsafe` code
449 correct because the compiler can’t help uphold memory safety. When you have a
450 reason to use `unsafe` code, you can do so, and having the explicit `unsafe`
451 annotation makes it easier to track down the source of problems when they occur.
452
453 [dangling-references]:
454 ch04-02-references-and-borrowing.html#dangling-references
455 [differences-between-variables-and-constants]:
456 ch03-01-variables-and-mutability.html#constants
457 [extensible-concurrency-with-the-sync-and-send-traits]:
458 ch16-04-extensible-concurrency-sync-and-send.html#extensible-concurrency-with-the-sync-and-send-traits
459 [the-slice-type]: ch04-03-slices.html#the-slice-type
460 [reference]: ../reference/items/unions.html