Commit | Line | Data |
---|---|---|
5e7ed085 FG |
1 | <!-- DO NOT EDIT THIS FILE. |
2 | ||
3 | This file is periodically generated from the content in the `/src/` | |
4 | directory, so all fixes need to be made in `/src/`. | |
5 | --> | |
6 | ||
7 | [TOC] | |
8 | ||
9 | # Advanced Features | |
10 | ||
11 | By now, you’ve learned the most commonly used parts of the Rust programming | |
12 | language. Before we do one more project in Chapter 20, we’ll look at a few | |
923072b8 FG |
13 | aspects of the language you might run into every once in a while, but may not |
14 | use every day. You can use this chapter as a reference for when you encounter | |
15 | any unknowns. The features covered here are useful in very specific situations. | |
16 | Although you might not reach for them often, we want to make sure you have a | |
17 | grasp of all the features Rust has to offer. | |
5e7ed085 FG |
18 | |
19 | In this chapter, we’ll cover: | |
20 | ||
21 | * Unsafe Rust: how to opt out of some of Rust’s guarantees and take | |
22 | responsibility for manually upholding those guarantees | |
23 | * Advanced traits: associated types, default type parameters, fully qualified | |
24 | syntax, supertraits, and the newtype pattern in relation to traits | |
25 | * Advanced types: more about the newtype pattern, type aliases, the never type, | |
26 | and dynamically sized types | |
27 | * Advanced functions and closures: function pointers and returning closures | |
28 | * Macros: ways to define code that defines more code at compile time | |
29 | ||
30 | It’s a panoply of Rust features with something for everyone! Let’s dive in! | |
31 | ||
32 | ## Unsafe Rust | |
33 | ||
34 | All the code we’ve discussed so far has had Rust’s memory safety guarantees | |
35 | enforced at compile time. However, Rust has a second language hidden inside it | |
36 | that doesn’t enforce these memory safety guarantees: it’s called *unsafe Rust* | |
37 | and works just like regular Rust, but gives us extra superpowers. | |
38 | ||
39 | Unsafe Rust exists because, by nature, static analysis is conservative. When | |
40 | the compiler tries to determine whether or not code upholds the guarantees, | |
923072b8 FG |
41 | it’s better for it to reject some valid programs than to accept some invalid |
42 | programs. Although the code *might* be okay, if the Rust compiler doesn’t have | |
43 | enough information to be confident, it will reject the code. In these cases, | |
44 | you can use unsafe code to tell the compiler, “Trust me, I know what I’m | |
45 | doing.” Be warned, however, that you use unsafe Rust at your own risk: if you | |
46 | use unsafe code incorrectly, problems can occur due to memory unsafety, such as | |
47 | null pointer dereferencing. | |
5e7ed085 FG |
48 | |
49 | Another reason Rust has an unsafe alter ego is that the underlying computer | |
50 | hardware is inherently unsafe. If Rust didn’t let you do unsafe operations, you | |
51 | couldn’t do certain tasks. Rust needs to allow you to do low-level systems | |
52 | programming, such as directly interacting with the operating system or even | |
53 | writing your own operating system. Working with low-level systems programming | |
54 | is one of the goals of the language. Let’s explore what we can do with unsafe | |
55 | Rust and how to do it. | |
56 | ||
57 | ### Unsafe Superpowers | |
58 | ||
59 | To switch to unsafe Rust, use the `unsafe` keyword and then start a new block | |
923072b8 FG |
60 | that holds the unsafe code. You can take five actions in unsafe Rust that you |
61 | can’t in safe Rust, which we call *unsafe superpowers*. Those superpowers | |
62 | include the ability to: | |
5e7ed085 FG |
63 | |
64 | * Dereference a raw pointer | |
65 | * Call an unsafe function or method | |
66 | * Access or modify a mutable static variable | |
67 | * Implement an unsafe trait | |
68 | * Access fields of `union`s | |
69 | ||
70 | It’s important to understand that `unsafe` doesn’t turn off the borrow checker | |
71 | or disable any other of Rust’s safety checks: if you use a reference in unsafe | |
72 | code, it will still be checked. The `unsafe` keyword only gives you access to | |
73 | these five features that are then not checked by the compiler for memory | |
74 | safety. You’ll still get some degree of safety inside of an unsafe block. | |
75 | ||
76 | In addition, `unsafe` does not mean the code inside the block is necessarily | |
77 | dangerous or that it will definitely have memory safety problems: the intent is | |
78 | that as the programmer, you’ll ensure the code inside an `unsafe` block will | |
79 | access memory in a valid way. | |
80 | ||
81 | People are fallible, and mistakes will happen, but by requiring these five | |
82 | unsafe operations to be inside blocks annotated with `unsafe` you’ll know that | |
83 | any errors related to memory safety must be within an `unsafe` block. Keep | |
84 | `unsafe` blocks small; you’ll be thankful later when you investigate memory | |
85 | bugs. | |
86 | ||
87 | To isolate unsafe code as much as possible, it’s best to enclose unsafe code | |
88 | within a safe abstraction and provide a safe API, which we’ll discuss later in | |
89 | the chapter when we examine unsafe functions and methods. Parts of the standard | |
90 | library are implemented as safe abstractions over unsafe code that has been | |
91 | audited. Wrapping unsafe code in a safe abstraction prevents uses of `unsafe` | |
92 | from leaking out into all the places that you or your users might want to use | |
93 | the functionality implemented with `unsafe` code, because using a safe | |
94 | abstraction is safe. | |
95 | ||
96 | Let’s look at each of the five unsafe superpowers in turn. We’ll also look at | |
97 | some abstractions that provide a safe interface to unsafe code. | |
98 | ||
99 | ### Dereferencing a Raw Pointer | |
100 | ||
101 | In Chapter 4, in the “Dangling References” section, we mentioned that the | |
102 | compiler ensures references are always valid. Unsafe Rust has two new types | |
103 | called *raw pointers* that are similar to references. As with references, raw | |
104 | pointers can be immutable or mutable and are written as `*const T` and `*mut | |
105 | T`, respectively. The asterisk isn’t the dereference operator; it’s part of the | |
106 | type name. In the context of raw pointers, *immutable* means that the pointer | |
107 | can’t be directly assigned to after being dereferenced. | |
108 | ||
109 | Unlike references and smart pointers, raw pointers: | |
110 | ||
111 | * Are allowed to ignore the borrowing rules by having both immutable and | |
112 | mutable pointers or multiple mutable pointers to the same location | |
113 | * Aren’t guaranteed to point to valid memory | |
114 | * Are allowed to be null | |
115 | * Don’t implement any automatic cleanup | |
116 | ||
117 | By opting out of having Rust enforce these guarantees, you can give up | |
118 | guaranteed safety in exchange for greater performance or the ability to | |
119 | interface with another language or hardware where Rust’s guarantees don’t apply. | |
120 | ||
121 | Listing 19-1 shows how to create an immutable and a mutable raw pointer from | |
122 | references. | |
123 | ||
124 | ``` | |
125 | let mut num = 5; | |
126 | ||
127 | let r1 = &num as *const i32; | |
128 | let r2 = &mut num as *mut i32; | |
129 | ``` | |
130 | ||
131 | Listing 19-1: Creating raw pointers from references | |
132 | ||
133 | Notice that we don’t include the `unsafe` keyword in this code. We can create | |
134 | raw pointers in safe code; we just can’t dereference raw pointers outside an | |
135 | unsafe block, as you’ll see in a bit. | |
136 | ||
137 | We’ve created raw pointers by using `as` to cast an immutable and a mutable | |
138 | reference into their corresponding raw pointer types. Because we created them | |
139 | directly from references guaranteed to be valid, we know these particular raw | |
140 | pointers are valid, but we can’t make that assumption about just any raw | |
141 | pointer. | |
142 | ||
923072b8 FG |
143 | To demonstrate this, next we’ll create a raw pointer whose validity we can’t be |
144 | so certain of. Listing 19-2 shows how to create a raw pointer to an arbitrary | |
145 | location in memory. Trying to use arbitrary memory is undefined behavior: there might be | |
146 | data at that address or there might not, the compiler might optimize the code | |
147 | so there is no memory access, or the program might error with a segmentation | |
148 | fault. Usually, there is no good reason to write code like this, but it is | |
149 | possible. | |
5e7ed085 FG |
150 | |
151 | ``` | |
152 | let address = 0x012345usize; | |
153 | let r = address as *const i32; | |
154 | ``` | |
155 | ||
156 | Listing 19-2: Creating a raw pointer to an arbitrary memory address | |
157 | ||
158 | Recall that we can create raw pointers in safe code, but we can’t *dereference* | |
159 | raw pointers and read the data being pointed to. In Listing 19-3, we use the | |
160 | dereference operator `*` on a raw pointer, which requires an `unsafe` block. | |
161 | ||
162 | ``` | |
163 | let mut num = 5; | |
164 | ||
165 | let r1 = &num as *const i32; | |
166 | let r2 = &mut num as *mut i32; | |
167 | ||
168 | unsafe { | |
169 | println!("r1 is: {}", *r1); | |
170 | println!("r2 is: {}", *r2); | |
171 | } | |
172 | ``` | |
173 | ||
174 | Listing 19-3: Dereferencing raw pointers within an `unsafe` block | |
175 | ||
176 | Creating a pointer does no harm; it’s only when we try to access the value that | |
177 | it points at that we might end up dealing with an invalid value. | |
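As a small illustration of the point that creating and inspecting a raw pointer is safe while dereferencing it is not, here is a minimal sketch. The helper name `describe` is hypothetical and not part of the chapter's listings; it only checks for null and never reads through the pointer, so it needs no `unsafe`.

```rust
use std::ptr;

/// Hypothetical helper: classifies a raw pointer without dereferencing it,
/// so no `unsafe` block is needed here.
fn describe(p: *const i32) -> &'static str {
    if p.is_null() {
        "null"
    } else {
        "non-null"
    }
}

fn main() {
    let num = 5;
    let valid: *const i32 = &num; // a reference coerces to a raw pointer
    let null: *const i32 = ptr::null();

    println!("valid is {}", describe(valid));
    println!("null is {}", describe(null));

    // A non-null pointer is not necessarily valid. Here we know it is,
    // because it was created from a live reference to `num`.
    if !valid.is_null() {
        unsafe {
            println!("valid points at: {}", *valid);
        }
    }
}
```

A null check rules out only one kind of invalid pointer; a dangling or misaligned pointer would pass it, which is why the dereference still has to sit inside an `unsafe` block.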
178 | ||
179 | Note also that in Listings 19-1 and 19-3, we created `*const i32` and `*mut i32` | |
180 | raw pointers that both pointed to the same memory location, where `num` is | |
181 | stored. If we instead tried to create an immutable and a mutable reference to | |
182 | `num`, the code would not have compiled because Rust’s ownership rules don’t | |
183 | allow a mutable reference at the same time as any immutable references. With | |
184 | raw pointers, we can create a mutable pointer and an immutable pointer to the | |
185 | same location and change data through the mutable pointer, potentially creating | |
186 | a data race. Be careful! | |
187 | ||
188 | With all of these dangers, why would you ever use raw pointers? One major use | |
189 | case is when interfacing with C code, as you’ll see in the next section, | |
190 | “Calling an Unsafe Function or Method.” Another case is when building up safe | |
191 | abstractions that the borrow checker doesn’t understand. We’ll introduce unsafe | |
192 | functions and then look at an example of a safe abstraction that uses unsafe | |
193 | code. | |
194 | ||
195 | ### Calling an Unsafe Function or Method | |
196 | ||
923072b8 FG |
197 | The second type of operation you can perform in an unsafe block is calling |
198 | unsafe functions. Unsafe functions and methods look exactly like regular | |
199 | functions and methods, but they have an extra `unsafe` before the rest of the | |
200 | definition. The `unsafe` keyword in this context indicates the function has | |
201 | requirements we need to uphold when we call this function, because Rust can’t | |
202 | guarantee we’ve met these requirements. By calling an unsafe function within an | |
203 | `unsafe` block, we’re saying that we’ve read this function’s documentation and | |
204 | take responsibility for upholding the function’s contracts. | |
5e7ed085 FG |
205 | |
206 | Here is an unsafe function named `dangerous` that doesn’t do anything in its | |
207 | body: | |
208 | ||
209 | ``` | |
210 | unsafe fn dangerous() {} | |
211 | ||
212 | unsafe { | |
213 | dangerous(); | |
214 | } | |
215 | ``` | |
216 | ||
217 | We must call the `dangerous` function within a separate `unsafe` block. If we | |
218 | try to call `dangerous` without the `unsafe` block, we’ll get an error: | |
219 | ||
220 | ``` | |
221 | error[E0133]: call to unsafe function is unsafe and requires unsafe function or block | |
222 | --> src/main.rs:4:5 | |
223 | | | |
224 | 4 | dangerous(); | |
225 | | ^^^^^^^^^^^ call to unsafe function | |
226 | | | |
227 | = note: consult the function's documentation for information on how to avoid undefined behavior | |
228 | ``` | |
229 | ||
923072b8 FG |
230 | With the `unsafe` block, we’re asserting to Rust that we’ve read the function’s |
231 | documentation, we understand how to use it properly, and we’ve verified that | |
232 | we’re fulfilling the contract of the function. | |
5e7ed085 FG |
233 | |
234 | Bodies of unsafe functions are effectively `unsafe` blocks, so to perform other | |
235 | unsafe operations within an unsafe function, we don’t need to add another | |
236 | `unsafe` block. | |
237 | ||
238 | #### Creating a Safe Abstraction over Unsafe Code | |
239 | ||
240 | Just because a function contains unsafe code doesn’t mean we need to mark the | |
241 | entire function as unsafe. In fact, wrapping unsafe code in a safe function is | |
923072b8 FG |
242 | a common abstraction. As an example, let’s study the `split_at_mut` function |
243 | from the standard library, which requires some unsafe code. We’ll explore how | |
244 | we might implement it. This safe method is defined on mutable slices: it takes | |
245 | one slice and makes it two by splitting the slice at the index given as an | |
5e7ed085 FG |
246 | argument. Listing 19-4 shows how to use `split_at_mut`. |
247 | ||
248 | ``` | |
249 | let mut v = vec![1, 2, 3, 4, 5, 6]; | |
250 | ||
251 | let r = &mut v[..]; | |
252 | ||
253 | let (a, b) = r.split_at_mut(3); | |
254 | ||
255 | assert_eq!(a, &mut [1, 2, 3]); | |
256 | assert_eq!(b, &mut [4, 5, 6]); | |
257 | ``` | |
258 | ||
259 | Listing 19-4: Using the safe `split_at_mut` function | |
260 | ||
261 | We can’t implement this function using only safe Rust. An attempt might look | |
262 | something like Listing 19-5, which won’t compile. For simplicity, we’ll | |
263 | implement `split_at_mut` as a function rather than a method and only for slices | |
264 | of `i32` values rather than for a generic type `T`. | |
265 | ||
266 | ``` | |
267 | fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) { | |
268 | let len = values.len(); | |
269 | ||
270 | assert!(mid <= len); | |
271 | ||
272 | (&mut values[..mid], &mut values[mid..]) | |
273 | } | |
274 | ``` | |
275 | ||
276 | Listing 19-5: An attempted implementation of `split_at_mut` using only safe Rust | |
277 | ||
278 | This function first gets the total length of the slice. Then it asserts that | |
279 | the index given as a parameter is within the slice by checking whether it’s | |
280 | less than or equal to the length. The assertion means that if we pass an index | |
281 | that is greater than the length to split the slice at, the function will panic | |
282 | before it attempts to use that index. | |
283 | ||
284 | Then we return two mutable slices in a tuple: one from the start of the | |
285 | original slice to the `mid` index and another from `mid` to the end of the | |
286 | slice. | |
287 | ||
288 | When we try to compile the code in Listing 19-5, we’ll get an error: | |
289 | ||
290 | ``` | |
291 | error[E0499]: cannot borrow `*values` as mutable more than once at a time | |
292 | --> src/main.rs:6:31 | |
293 | | | |
294 | 1 | fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) { | |
295 | | - let's call the lifetime of this reference `'1` | |
296 | ... | |
297 | 6 | (&mut values[..mid], &mut values[mid..]) | |
298 | | --------------------------^^^^^^-------- | |
299 | | | | | | |
300 | | | | second mutable borrow occurs here | |
301 | | | first mutable borrow occurs here | |
302 | | returning this value requires that `*values` is borrowed for `'1` | |
303 | ``` | |
304 | ||
305 | Rust’s borrow checker can’t understand that we’re borrowing different parts of | |
306 | the slice; it only knows that we’re borrowing from the same slice twice. | |
307 | Borrowing different parts of a slice is fundamentally okay because the two | |
308 | slices aren’t overlapping, but Rust isn’t smart enough to know this. When we | |
309 | know code is okay, but Rust doesn’t, it’s time to reach for unsafe code. | |
310 | ||
311 | Listing 19-6 shows how to use an `unsafe` block, a raw pointer, and some calls | |
312 | to unsafe functions to make the implementation of `split_at_mut` work. | |
313 | ||
314 | ``` | |
315 | use std::slice; | |
316 | ||
317 | fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) { | |
318 | [1] let len = values.len(); | |
319 | [2] let ptr = values.as_mut_ptr(); | |
320 | ||
321 | [3] assert!(mid <= len); | |
322 | ||
323 | [4] unsafe { | |
324 | ( | |
325 | [5] slice::from_raw_parts_mut(ptr, mid), | |
326 | [6] slice::from_raw_parts_mut(ptr.add(mid), len - mid), | |
327 | ) | |
328 | } | |
329 | } | |
330 | ``` | |
331 | ||
332 | Listing 19-6: Using unsafe code in the implementation of the `split_at_mut` | |
333 | function | |
334 | ||
923072b8 FG |
335 | |
336 | Recall from “The Slice Type” section in Chapter 4 that a slice is a pointer to | |
5e7ed085 FG |
337 | some data and the length of the slice. We use the `len` method to get the |
338 | length of a slice [1] and the `as_mut_ptr` method to access the raw pointer of | |
339 | a slice [2]. In this case, because we have a mutable slice to `i32` values, | |
340 | `as_mut_ptr` returns a raw pointer with the type `*mut i32`, which we’ve stored | |
341 | in the variable `ptr`. | |
342 | ||
343 | We keep the assertion that the `mid` index is within the slice [3]. Then we get | |
344 | to the unsafe code [4]: the `slice::from_raw_parts_mut` function takes a raw | |
923072b8 FG |
345 | pointer and a length, and it creates a slice. We use it to create a slice that |
346 | starts from `ptr` and is `mid` items long [5]. Then we call the `add` method on | |
347 | `ptr` with `mid` as an argument to get a raw pointer that starts at `mid`, and | |
348 | we create a slice using that pointer and the remaining number of items after | |
349 | `mid` as the length [6]. | |
5e7ed085 FG |
350 | |
351 | The function `slice::from_raw_parts_mut` is unsafe because it takes a raw | |
352 | pointer and must trust that this pointer is valid. The `add` method on raw | |
353 | pointers is also unsafe, because it must trust that the offset location is also | |
354 | a valid pointer. Therefore, we had to put an `unsafe` block around our calls to | |
355 | `slice::from_raw_parts_mut` and `add` so we could call them. By looking at | |
356 | the code and by adding the assertion that `mid` must be less than or equal to | |
357 | `len`, we can tell that all the raw pointers used within the `unsafe` block | |
358 | will be valid pointers to data within the slice. This is an acceptable and | |
359 | appropriate use of `unsafe`. | |
360 | ||
361 | Note that we don’t need to mark the resulting `split_at_mut` function as | |
362 | `unsafe`, and we can call this function from safe Rust. We’ve created a safe | |
363 | abstraction to the unsafe code with an implementation of the function that uses | |
364 | `unsafe` code in a safe way, because it creates only valid pointers from the | |
365 | data this function has access to. | |
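To see the safe abstraction at work, the sketch below repeats the function from Listing 19-6 (without the numbered markers) and calls it from entirely safe code. The `main` function and the values written into the slices are illustrative assumptions, not part of the original listing.

```rust
use std::slice;

fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // Panic before any pointer arithmetic if `mid` is out of bounds.
    assert!(mid <= len);

    // SAFETY: `mid <= len`, so both halves lie inside the original slice
    // and do not overlap.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];

    // No `unsafe` is needed at the call site.
    let (a, b) = split_at_mut(&mut v, 2);
    a[0] = 10;
    b[0] = 30;

    println!("{:?}", v);
}
```

Both halves can be mutated at the same time, which is exactly what the safe-only attempt in Listing 19-5 could not express.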
366 | ||
367 | In contrast, the use of `slice::from_raw_parts_mut` in Listing 19-7 would | |
368 | likely crash when the slice is used. This code takes an arbitrary memory | |
369 | location and creates a slice 10,000 items long. | |
370 | ||
371 | ``` | |
372 | use std::slice; | |
373 | ||
374 | let address = 0x01234usize; | |
375 | let r = address as *mut i32; | |
376 | ||
377 | let values: &[i32] = unsafe { slice::from_raw_parts_mut(r, 10000) }; | |
378 | ``` | |
379 | ||
380 | Listing 19-7: Creating a slice from an arbitrary memory location | |
381 | ||
382 | We don’t own the memory at this arbitrary location, and there is no guarantee | |
383 | that the slice this code creates contains valid `i32` values. Attempting to use | |
384 | `values` as though it’s a valid slice results in undefined behavior. | |
385 | ||
386 | #### Using `extern` Functions to Call External Code | |
387 | ||
388 | Sometimes, your Rust code might need to interact with code written in another | |
923072b8 | 389 | language. For this, Rust has the keyword `extern` that facilitates the creation |
5e7ed085 FG |
390 | and use of a *Foreign Function Interface (FFI)*. An FFI is a way for a |
391 | programming language to define functions and enable a different (foreign) | |
392 | programming language to call those functions. | |
393 | ||
394 | Listing 19-8 demonstrates how to set up an integration with the `abs` function | |
395 | from the C standard library. Functions declared within `extern` blocks are | |
396 | always unsafe to call from Rust code. The reason is that other languages don’t | |
397 | enforce Rust’s rules and guarantees, and Rust can’t check them, so | |
398 | responsibility falls on the programmer to ensure safety. | |
399 | ||
400 | Filename: src/main.rs | |
401 | ||
402 | ``` | |
403 | extern "C" { | |
404 | fn abs(input: i32) -> i32; | |
405 | } | |
406 | ||
407 | fn main() { | |
408 | unsafe { | |
409 | println!("Absolute value of -3 according to C: {}", abs(-3)); | |
410 | } | |
411 | } | |
412 | ``` | |
413 | ||
414 | Listing 19-8: Declaring and calling an `extern` function defined in another | |
415 | language | |
416 | ||
417 | Within the `extern "C"` block, we list the names and signatures of external | |
418 | functions from another language we want to call. The `"C"` part defines which | |
419 | *application binary interface (ABI)* the external function uses: the ABI | |
420 | defines how to call the function at the assembly level. The `"C"` ABI is the | |
421 | most common and follows the C programming language’s ABI. | |
422 | ||
5e7ed085 FG |
431 | > #### Calling Rust Functions from Other Languages |
432 | > | |
433 | > We can also use `extern` to create an interface that allows other languages | |
923072b8 FG |
434 | > to call Rust functions. Instead of creating a whole `extern` block, we add |
435 | > the `extern` keyword and specify the ABI to use just before the `fn` keyword | |
436 | > for the relevant function. We also need to add a `#[no_mangle]` annotation to | |
437 | > tell the Rust compiler not to mangle the name of this function. *Mangling* is | |
438 | > when a compiler changes the name we’ve given a function to a different name | |
439 | > that contains more information for other parts of the compilation process to | |
440 | > consume but is less human readable. Every programming language compiler | |
441 | > mangles names slightly differently, so for a Rust function to be nameable by | |
442 | > other languages, we must disable the Rust compiler’s name mangling. | |
5e7ed085 FG |
443 | > |
444 | > In the following example, we make the `call_from_c` function accessible from | |
445 | > C code, after it’s compiled to a shared library and linked from C: | |
446 | > | |
447 | > ``` | |
448 | > #[no_mangle] | |
449 | > pub extern "C" fn call_from_c() { | |
450 | > println!("Just called a Rust function from C!"); | |
451 | > } | |
452 | > ``` | |
453 | > | |
454 | > This usage of `extern` does not require `unsafe`. | |
455 | ||
456 | ### Accessing or Modifying a Mutable Static Variable | |
457 | ||
923072b8 FG |
458 | In this book, we’ve not yet talked about *global variables*, which Rust does |
459 | support but can be problematic with Rust’s ownership rules. If two threads are | |
5e7ed085 FG |
460 | accessing the same mutable global variable, it can cause a data race. |
461 | ||
462 | In Rust, global variables are called *static* variables. Listing 19-9 shows an | |
463 | example declaration and use of a static variable with a string slice as a | |
464 | value. | |
465 | ||
466 | Filename: src/main.rs | |
467 | ||
468 | ``` | |
469 | static HELLO_WORLD: &str = "Hello, world!"; | |
470 | ||
471 | fn main() { | |
472 | println!("name is: {}", HELLO_WORLD); | |
473 | } | |
474 | ``` | |
475 | ||
476 | Listing 19-9: Defining and using an immutable static variable | |
477 | ||
478 | Static variables are similar to constants, which we discussed in the | |
479 | “Differences Between Variables and Constants” section in Chapter 3. The names | |
480 | of static variables are in `SCREAMING_SNAKE_CASE` by convention. Static | |
481 | variables can only store references with the `'static` lifetime, which means | |
482 | the Rust compiler can figure out the lifetime and we aren’t required to | |
483 | annotate it explicitly. Accessing an immutable static variable is safe. | |
484 | ||
923072b8 FG |
485 | A subtle difference between constants and immutable static variables is that |
486 | values in a static variable have a fixed address in memory. Using the value | |
487 | will always access the same data. Constants, on the other hand, are allowed to | |
488 | duplicate their data whenever they’re used. Another difference is that static | |
5e7ed085 FG |
489 | variables can be mutable. Accessing and modifying mutable static variables is |
490 | *unsafe*. Listing 19-10 shows how to declare, access, and modify a mutable | |
491 | static variable named `COUNTER`. | |
492 | ||
493 | Filename: src/main.rs | |
494 | ||
495 | ``` | |
496 | static mut COUNTER: u32 = 0; | |
497 | ||
498 | fn add_to_count(inc: u32) { | |
499 | unsafe { | |
500 | COUNTER += inc; | |
501 | } | |
502 | } | |
503 | ||
504 | fn main() { | |
505 | add_to_count(3); | |
506 | ||
507 | unsafe { | |
508 | println!("COUNTER: {}", COUNTER); | |
509 | } | |
510 | } | |
511 | ``` | |
512 | ||
513 | Listing 19-10: Reading from or writing to a mutable static variable is unsafe | |
514 | ||
515 | As with regular variables, we specify mutability using the `mut` keyword. Any | |
516 | code that reads from or writes to `COUNTER` must be within an `unsafe` block. This | |
517 | code compiles and prints `COUNTER: 3` as we would expect because it’s single | |
518 | threaded. Having multiple threads access `COUNTER` would likely result in data | |
519 | races. | |
520 | ||
521 | With mutable data that is globally accessible, it’s difficult to ensure there | |
522 | are no data races, which is why Rust considers mutable static variables to be | |
523 | unsafe. Where possible, it’s preferable to use the concurrency techniques and | |
524 | thread-safe smart pointers we discussed in Chapter 16 so the compiler checks | |
525 | that data is accessed safely from different threads. | |
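One possible safe alternative to the `static mut` counter is sketched below. It assumes an atomic integer fits the use case; the chapter itself does not prescribe this type, and a `Mutex` would be another option. `AtomicU32` lives in the standard library, and the thread count of four is an arbitrary choice for the illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// An immutable static holding an atomic: no `mut`, so no `unsafe` is needed.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    // `fetch_add` performs the read-modify-write as one atomic step.
    COUNTER.fetch_add(inc, Ordering::SeqCst);
}

fn main() {
    // Four threads each add 3 to the shared counter.
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| add_to_count(3)))
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("COUNTER: {}", COUNTER.load(Ordering::SeqCst));
}
```

Because every update goes through the atomic's methods, the compiler can accept concurrent access without any `unsafe` blocks.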
526 | ||
527 | ### Implementing an Unsafe Trait | |
528 | ||
923072b8 FG |
529 | We can use `unsafe` to implement an unsafe trait. A trait is unsafe when at |
530 | least one of its methods has some invariant that the compiler can’t verify. We | |
531 | declare that a trait is `unsafe` by adding the `unsafe` keyword before `trait` | |
532 | and marking the implementation of the trait as `unsafe` too, as shown in | |
533 | Listing 19-11. | |
5e7ed085 FG |
534 | |
535 | ``` | |
536 | unsafe trait Foo { | |
537 | // methods go here | |
538 | } | |
539 | ||
540 | unsafe impl Foo for i32 { | |
541 | // method implementations go here | |
542 | } | |
543 | ||
544 | fn main() {} | |
545 | ``` | |
546 | ||
547 | Listing 19-11: Defining and implementing an unsafe trait | |
548 | ||
549 | By using `unsafe impl`, we’re promising that we’ll uphold the invariants that | |
550 | the compiler can’t verify. | |
551 | ||
552 | As an example, recall the `Sync` and `Send` marker traits we discussed in the | |
553 | “Extensible Concurrency with the `Sync` and `Send` Traits” section in Chapter | |
554 | 16: the compiler implements these traits automatically if our types are | |
555 | composed entirely of `Send` and `Sync` types. If we implement a type that | |
556 | contains a type that is not `Send` or `Sync`, such as raw pointers, and we want | |
557 | to mark that type as `Send` or `Sync`, we must use `unsafe`. Rust can’t verify | |
558 | that our type upholds the guarantees that it can be safely sent across threads | |
559 | or accessed from multiple threads; therefore, we need to do those checks | |
560 | manually and indicate as such with `unsafe`. | |
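The following is a minimal sketch of marking a type that contains a raw pointer as `Send`. The `Wrapper` type and the `bump_on_other_thread` function are hypothetical names invented for this illustration. The safety argument rests on an assumption we uphold by hand: only one thread holds the pointer at any moment.

```rust
use std::thread;

// Contains a raw pointer, so the compiler does not implement `Send` for it.
struct Wrapper(*mut i32);

// SAFETY (our promise, not checked by the compiler): a `Wrapper` is only
// ever used by one thread at a time, so moving it across threads is sound.
unsafe impl Send for Wrapper {}

fn bump_on_other_thread(start: i32) -> i32 {
    // Hand ownership of a heap allocation to a raw pointer.
    let wrapped = Wrapper(Box::into_raw(Box::new(start)));

    let handle = thread::spawn(move || {
        let w = wrapped; // move the whole wrapper into the thread
        // SAFETY: this thread is the only holder of the pointer, and it
        // came from a live `Box`.
        unsafe { *w.0 += 1 };
        w
    });

    let w = handle.join().unwrap();
    // SAFETY: the pointer came from `Box::into_raw` and has not been freed.
    let boxed = unsafe { Box::from_raw(w.0) };
    *boxed
}

fn main() {
    println!("result: {}", bump_on_other_thread(41));
}
```

Without the `unsafe impl Send` line, `thread::spawn` would reject the closure because `*mut i32` is not `Send`.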
561 | ||
562 | ### Accessing Fields of a Union | |
563 | ||
564 | The final action that works only with `unsafe` is accessing fields of a | |
565 | *union*. A `union` is similar to a `struct`, but only one declared field is | |
566 | used in a particular instance at one time. Unions are primarily used to | |
567 | interface with unions in C code. Accessing union fields is unsafe because Rust | |
568 | can’t guarantee the type of the data currently being stored in the union | |
569 | instance. You can learn more about unions in the Rust Reference at | |
570 | *https://doc.rust-lang.org/reference/items/unions.html*. | |
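As a brief illustration, the sketch below defines a union and reads a field other than the one that was written. The `IntOrFloat` type and the `float_bits` function are hypothetical names chosen for this example. The read is sound here only because both fields are four bytes and every bit pattern is a valid `u32`.

```rust
// `repr(C)` gives the union the same layout a C compiler would use.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

fn float_bits(x: f32) -> u32 {
    // Creating a union and writing a field is safe.
    let u = IntOrFloat { f: x };

    // Reading a field is unsafe: Rust can't know which field was last written.
    // SAFETY: both fields are 4 bytes and any bit pattern is a valid `u32`.
    unsafe { u.i }
}

fn main() {
    println!("{:#x}", float_bits(1.0));
}
```

The standard library's `f32::to_bits` gives the same result without `unsafe`, so a union like this would mostly be of interest when matching a C definition.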
571 | ||
572 | ### When to Use Unsafe Code | |
573 | ||
923072b8 FG |
574 | Using `unsafe` to use one of the five superpowers just discussed isn’t wrong or |
575 | even frowned upon, but it is trickier to get `unsafe` code correct because the | |
576 | compiler can’t help uphold memory safety. When you have a reason to use | |
577 | `unsafe` code, you can do so, and having the explicit `unsafe` annotation makes | |
578 | it easier to track down the source of problems when they occur. | |
5e7ed085 FG |
579 | |
580 | ## Advanced Traits | |
581 | ||
582 | We first covered traits in the “Traits: Defining Shared Behavior” section of | |
583 | Chapter 10, but we didn’t discuss the more advanced details. Now that you know | |
584 | more about Rust, we can get into the nitty-gritty. | |
585 | ||
586 | ### Specifying Placeholder Types in Trait Definitions with Associated Types | |
587 | ||
588 | *Associated types* connect a type placeholder with a trait such that the trait | |
589 | method definitions can use these placeholder types in their signatures. The | |
923072b8 FG |
590 | implementor of a trait will specify the concrete type to be used instead of the |
591 | placeholder type for the particular implementation. That way, we can define a | |
592 | trait that uses some types without needing to know exactly what those types are | |
593 | until the trait is implemented. | |
5e7ed085 FG |
594 | |
595 | We’ve described most of the advanced features in this chapter as being rarely | |
596 | needed. Associated types are somewhere in the middle: they’re used more rarely | |
597 | than features explained in the rest of the book but more commonly than many of | |
598 | the other features discussed in this chapter. | |
599 | ||
600 | One example of a trait with an associated type is the `Iterator` trait that the | |
601 | standard library provides. The associated type is named `Item` and stands in | |
602 | for the type of the values the type implementing the `Iterator` trait is | |
923072b8 FG |
603 | iterating over. The definition of the `Iterator` trait is as shown in Listing |
604 | 19-12. | |
5e7ed085 FG |
605 | |
606 | ``` | |
607 | pub trait Iterator { | |
608 | type Item; | |
609 | ||
610 | fn next(&mut self) -> Option<Self::Item>; | |
611 | } | |
612 | ``` | |
613 | ||
614 | Listing 19-12: The definition of the `Iterator` trait that has an associated | |
615 | type `Item` | |
616 | ||
923072b8 FG |
617 | The type `Item` is a placeholder, and the `next` method’s definition shows that |
618 | it will return values of type `Option<Self::Item>`. Implementors of the | |
5e7ed085 FG |
619 | `Iterator` trait will specify the concrete type for `Item`, and the `next` |
620 | method will return an `Option` containing a value of that concrete type. | |
621 | ||
622 | Associated types might seem like a similar concept to generics, in that the | |
623 | latter allow us to define a function without specifying what types it can | |
923072b8 FG |
624 | handle. To examine the difference between the two concepts, we’ll look at an |
625 | implementation of the `Iterator` trait on a type named `Counter` that specifies | |
626 | the `Item` type is `u32`: | |
5e7ed085 FG |
627 | |
628 | Filename: src/lib.rs | |
629 | ||
630 | ``` | |
631 | impl Iterator for Counter { | |
632 |     type Item = u32; | |
633 | ||
634 |     fn next(&mut self) -> Option<Self::Item> { | |
635 |         // --snip-- | |
636 | ``` | |
637 | ||
638 | This syntax seems comparable to that of generics. So why not just define the | |
639 | `Iterator` trait with generics, as shown in Listing 19-13? | |
640 | ||
641 | ``` | |
642 | pub trait Iterator<T> { | |
643 |     fn next(&mut self) -> Option<T>; | |
644 | } | |
645 | ``` | |
646 | ||
647 | Listing 19-13: A hypothetical definition of the `Iterator` trait using generics | |
648 | ||
649 | The difference is that when using generics, as in Listing 19-13, we must | |
650 | annotate the types in each implementation; because we can also implement | |
651 | `Iterator<String> for Counter` or any other type, we could have multiple | |
652 | implementations of `Iterator` for `Counter`. In other words, when a trait has a | |
653 | generic parameter, it can be implemented for a type multiple times, changing | |
654 | the concrete types of the generic type parameters each time. When we use the | |
655 | `next` method on `Counter`, we would have to provide type annotations to | |
656 | indicate which implementation of `Iterator` we want to use. | |
657 | ||
658 | With associated types, we don’t need to annotate types because we can’t | |
659 | implement a trait on a type multiple times. In Listing 19-12 with the | |
660 | definition that uses associated types, we can only choose what the type of | |
661 | `Item` will be once, because there can only be one `impl Iterator for Counter`. | |
662 | We don’t have to specify that we want an iterator of `u32` values everywhere | |
663 | that we call `next` on `Counter`. | |
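To make the contrast concrete, here is a minimal sketch that puts both designs on one type. The `GenericSource<T>` trait and its `next_value` method are hypothetical stand-ins for the generic `Iterator<T>` of Listing 19-13, and the `Counter` struct with a `count` field is assumed for illustration:

```rust
// Hypothetical generic counterpart of `Iterator`, standing in for the
// `Iterator<T>` of Listing 19-13. It is not part of the standard library.
trait GenericSource<T> {
    fn next_value(&mut self) -> Option<T>;
}

// Assumed for illustration: a counter that tracks how far it has counted.
struct Counter {
    count: u32,
}

// A trait with a generic parameter can be implemented several times
// for the same type, once per concrete `T`...
impl GenericSource<u32> for Counter {
    fn next_value(&mut self) -> Option<u32> {
        self.count += 1;
        Some(self.count)
    }
}

impl GenericSource<String> for Counter {
    fn next_value(&mut self) -> Option<String> {
        self.count += 1;
        Some(format!("#{}", self.count))
    }
}

// ...whereas a trait with an associated type gets exactly one
// implementation per type, so `Item` is chosen once.
impl Iterator for Counter {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        self.count += 1;
        Some(self.count)
    }
}

fn main() {
    let mut counter = Counter { count: 0 };

    // With the generic trait, an annotation has to pick the implementation.
    let as_number: Option<u32> = counter.next_value();
    let as_text = <Counter as GenericSource<String>>::next_value(&mut counter);

    // With the associated type, no annotation is needed.
    let item = counter.next();

    println!("{:?} {:?} {:?}", as_number, as_text, item);
}
```

Removing the `Option<u32>` annotation from `as_number` should make the compiler ask for a type annotation, because both `GenericSource` implementations would then be candidates.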
664 | ||
923072b8 FG |
665 | Associated types also become part of the trait’s contract: implementors of the |
666 | trait must provide a type to stand in for the associated type placeholder. | |
667 | Associated types often have a name that describes how the type will be used, | |
668 | and documenting the associated type in the API documentation is good practice. | |
669 | ||
5e7ed085 FG |
677 | ### Default Generic Type Parameters and Operator Overloading |
678 | ||
679 | When we use generic type parameters, we can specify a default concrete type for | |
680 | the generic type. This eliminates the need for implementors of the trait to | |
923072b8 FG |
681 | specify a concrete type if the default type works. You specify a default type |
682 | when declaring a generic type with the `<PlaceholderType=ConcreteType>` syntax. | |
5e7ed085 | 683 | |
923072b8 FG |
684 | A great example of a situation where this technique is useful is with *operator |
685 | overloading*, in which you customize the behavior of an operator (such as `+`) | |
686 | in particular situations. | |
5e7ed085 FG |
687 | |
688 | Rust doesn’t allow you to create your own operators or overload arbitrary | |
689 | operators. But you can overload the operations and corresponding traits listed | |
690 | in `std::ops` by implementing the traits associated with the operator. For | |
691 | example, in Listing 19-14 we overload the `+` operator to add two `Point` | |
692 | instances together. We do this by implementing the `Add` trait on a `Point` | |
693 | struct: | |
694 | ||
695 | Filename: src/main.rs | |
696 | ||
697 | ``` | |
698 | use std::ops::Add; | |
699 | ||
700 | #[derive(Debug, Copy, Clone, PartialEq)] | |
701 | struct Point { | |
702 |     x: i32, | |
703 |     y: i32, | |
704 | } | |
705 | ||
706 | impl Add for Point { | |
707 |     type Output = Point; | |
708 | ||
709 |     fn add(self, other: Point) -> Point { | |
710 |         Point { | |
711 |             x: self.x + other.x, | |
712 |             y: self.y + other.y, | |
713 |         } | |
714 |     } | |
715 | } | |
716 | ||
717 | fn main() { | |
718 |     assert_eq!( | |
719 |         Point { x: 1, y: 0 } + Point { x: 2, y: 3 }, | |
720 |         Point { x: 3, y: 3 } | |
721 |     ); | |
722 | } | |
723 | ``` | |
724 | ||
725 | Listing 19-14: Implementing the `Add` trait to overload the `+` operator for | |
726 | `Point` instances | |
727 | ||
728 | The `add` method adds the `x` values of two `Point` instances and the `y` | |
729 | values of two `Point` instances to create a new `Point`. The `Add` trait has an | |
730 | associated type named `Output` that determines the type returned from the `add` | |
731 | method. | |
732 | ||
733 | The default generic type in this code is within the `Add` trait. Here is its | |
734 | definition: | |
735 | ||
736 | ``` | |
737 | trait Add<Rhs=Self> { | |
738 |     type Output; | |
739 | ||
740 |     fn add(self, rhs: Rhs) -> Self::Output; | |
741 | } | |
742 | ``` | |
743 | ||
744 | This code should look generally familiar: a trait with one method and an | |
745 | associated type. The new part is `Rhs=Self`: this syntax is called *default | |
746 | type parameters*. The `Rhs` generic type parameter (short for “right hand | |
747 | side”) defines the type of the `rhs` parameter in the `add` method. If we don’t | |
748 | specify a concrete type for `Rhs` when we implement the `Add` trait, the type | |
749 | of `Rhs` will default to `Self`, which will be the type we’re implementing | |
750 | `Add` on. | |
751 | ||
752 | When we implemented `Add` for `Point`, we used the default for `Rhs` because we | |
753 | wanted to add two `Point` instances. Let’s look at an example of implementing | |
754 | the `Add` trait where we want to customize the `Rhs` type rather than using the | |
755 | default. | |
756 | ||
757 | We have two structs, `Millimeters` and `Meters`, holding values in different | |
758 | units. This thin wrapping of an existing type in another struct is known as the | |
759 | *newtype pattern*, which we describe in more detail in the “Using the Newtype | |
923072b8 FG |
760 | Pattern to Implement External Traits on External Types” section. We want to add |
761 | values in millimeters to values in meters and have the implementation of `Add` | |
762 | do the conversion correctly. We can implement `Add` for `Millimeters` with | |
763 | `Meters` as the `Rhs`, as shown in Listing 19-15. | |
5e7ed085 FG |
764 | |
765 | Filename: src/lib.rs | |
766 | ||
767 | ``` | |
768 | use std::ops::Add; | |
769 | ||
770 | struct Millimeters(u32); | |
771 | struct Meters(u32); | |
772 | ||
773 | impl Add<Meters> for Millimeters { | |
774 |     type Output = Millimeters; | |
775 | ||
776 |     fn add(self, other: Meters) -> Millimeters { | |
777 |         Millimeters(self.0 + (other.0 * 1000)) | |
778 |     } | |
779 | } | |
780 | ``` | |
781 | ||
782 | Listing 19-15: Implementing the `Add` trait on `Millimeters` to add | |
783 | `Millimeters` to `Meters` | |
784 | ||
785 | To add `Millimeters` and `Meters`, we specify `impl Add<Meters>` to set the | |
786 | value of the `Rhs` type parameter instead of using the default of `Self`. | |
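As a quick check of Listing 19-15, the following sketch repeats the two structs and adds a `main` function that uses the `+` operator with mixed units. The `Debug` and `PartialEq` derives are our addition so that the result can be compared and printed:

```rust
use std::ops::Add;

// The derives are an addition for this sketch; Listing 19-15 has none.
#[derive(Debug, PartialEq)]
struct Millimeters(u32);
struct Meters(u32);

impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        // Convert meters to millimeters before adding.
        Millimeters(self.0 + (other.0 * 1000))
    }
}

fn main() {
    let total = Millimeters(500) + Meters(2);
    assert_eq!(total, Millimeters(2500));
    println!("{:?}", total);

    // The reverse order, `Meters(2) + Millimeters(500)`, would not compile:
    // only `Add<Meters> for Millimeters` has been implemented.
}
```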
787 | ||
788 | You’ll use default type parameters in two main ways: | |
789 | ||
790 | * To extend a type without breaking existing code | |
791 | * To allow customization in specific cases most users won’t need | |
792 | ||
793 | The standard library’s `Add` trait is an example of the second purpose: | |
794 | usually, you’ll add two like types, but the `Add` trait provides the ability to | |
795 | customize beyond that. Using a default type parameter in the `Add` trait | |
796 | definition means you don’t have to specify the extra parameter most of the | |
797 | time. In other words, a bit of implementation boilerplate isn’t needed, making | |
798 | it easier to use the trait. | |
799 | ||
800 | The first purpose is similar to the second but in reverse: if you want to add a | |
801 | type parameter to an existing trait, you can give it a default to allow | |
802 | extension of the functionality of the trait without breaking the existing | |
803 | implementation code. | |
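The first purpose can be sketched as follows. Everything here is hypothetical: imagine a `Scale` trait that originally had no type parameter and took a `u32` factor, and that later gained a `Factor` parameter with `u32` as its default. The existing `impl Scale for Length` keeps compiling unchanged, while new code can opt in to another factor type:

```rust
// Hypothetical trait. Suppose the first version was:
//     trait Scale { fn scale(self, factor: u32) -> Self; }
// Adding `Factor` with a default keeps that version's impls valid.
trait Scale<Factor = u32> {
    fn scale(self, factor: Factor) -> Self;
}

#[derive(Debug, PartialEq)]
struct Length(f64);

// Written against the original trait: no parameter is named,
// so `Factor` falls back to the default, `u32`.
impl Scale for Length {
    fn scale(self, factor: u32) -> Self {
        Length(self.0 * f64::from(factor))
    }
}

// New code can choose a different type for `Factor`.
impl Scale<f64> for Length {
    fn scale(self, factor: f64) -> Self {
        Length(self.0 * factor)
    }
}

fn main() {
    // The argument type selects which implementation is used.
    assert_eq!(Length(2.0).scale(3u32), Length(6.0));
    assert_eq!(Length(2.0).scale(0.5f64), Length(1.0));
}
```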
804 | ||
805 | ### Fully Qualified Syntax for Disambiguation: Calling Methods with the Same Name | |
806 | ||
807 | Nothing in Rust prevents a trait from having a method with the same name as | |
808 | another trait’s method, nor does Rust prevent you from implementing both traits | |
809 | on one type. It’s also possible to implement a method directly on the type with | |
810 | the same name as methods from traits. | |
811 | ||
812 | When calling methods with the same name, you’ll need to tell Rust which one you | |
813 | want to use. Consider the code in Listing 19-16 where we’ve defined two traits, | |
814 | `Pilot` and `Wizard`, that both have a method called `fly`. We then implement | |
815 | both traits on a type `Human` that already has a method named `fly` implemented | |
816 | on it. Each `fly` method does something different. | |
817 | ||
818 | Filename: src/main.rs | |
819 | ||
820 | ``` | |
821 | trait Pilot { | |
822 |     fn fly(&self); | |
823 | } | |
824 | ||
825 | trait Wizard { | |
826 |     fn fly(&self); | |
827 | } | |
828 | ||
829 | struct Human; | |
830 | ||
831 | impl Pilot for Human { | |
832 |     fn fly(&self) { | |
833 |         println!("This is your captain speaking."); | |
834 |     } | |
835 | } | |
836 | ||
837 | impl Wizard for Human { | |
838 |     fn fly(&self) { | |
839 |         println!("Up!"); | |
840 |     } | |
841 | } | |
842 | ||
843 | impl Human { | |
844 |     fn fly(&self) { | |
845 |         println!("*waving arms furiously*"); | |
846 |     } | |
847 | } | |
848 | ``` | |
849 | ||
850 | Listing 19-16: Two traits are defined to have a `fly` method and are | |
851 | implemented on the `Human` type, and a `fly` method is implemented on `Human` | |
852 | directly | |
853 | ||
854 | When we call `fly` on an instance of `Human`, the compiler defaults to calling | |
855 | the method that is directly implemented on the type, as shown in Listing 19-17. | |
856 | ||
857 | Filename: src/main.rs | |
858 | ||
859 | ``` | |
860 | fn main() { | |
861 |     let person = Human; | |
862 |     person.fly(); | |
863 | } | |
864 | ``` | |
865 | ||
866 | Listing 19-17: Calling `fly` on an instance of `Human` | |
867 | ||
868 | Running this code will print `*waving arms furiously*`, showing that Rust | |
869 | called the `fly` method implemented on `Human` directly. | |
870 | ||
871 | To call the `fly` methods from either the `Pilot` trait or the `Wizard` trait, | |
872 | we need to use more explicit syntax to specify which `fly` method we mean. | |
873 | Listing 19-18 demonstrates this syntax. | |
874 | ||
875 | Filename: src/main.rs | |
876 | ||
877 | ``` | |
878 | fn main() { | |
879 |     let person = Human; | |
880 |     Pilot::fly(&person); | |
881 |     Wizard::fly(&person); | |
882 |     person.fly(); | |
883 | } | |
884 | ``` | |
885 | ||
886 | Listing 19-18: Specifying which trait’s `fly` method we want to call | |
887 | ||
888 | Specifying the trait name before the method name clarifies to Rust which | |
889 | implementation of `fly` we want to call. We could also write | |
890 | `Human::fly(&person)`, which is equivalent to the `person.fly()` that we used | |
891 | in Listing 19-18, but this is a bit longer to write if we don’t need to | |
892 | disambiguate. | |
893 | ||
894 | Running this code prints the following: | |
895 | ||
896 | ``` | |
897 | $ cargo run | |
898 | Compiling traits-example v0.1.0 (file:///projects/traits-example) | |
899 | Finished dev [unoptimized + debuginfo] target(s) in 0.46s | |
900 | Running `target/debug/traits-example` | |
901 | This is your captain speaking. | |
902 | Up! | |
903 | *waving arms furiously* | |
904 | ``` | |
905 | ||
906 | Because the `fly` method takes a `self` parameter, if we had two *types* that | |
907 | both implement one *trait*, Rust could figure out which implementation of a | |
908 | trait to use based on the type of `self`. | |
909 | ||
910 | However, associated functions that are not methods don’t have a `self` | |
911 | parameter. When there are multiple types or traits that define non-method | |
912 | functions with the same function name, Rust doesn’t always know which type you | |
923072b8 FG |
913 | mean unless you use *fully qualified syntax*. For example, in Listing 19-19 we |
914 | create a trait for an animal shelter that wants to name all baby dogs *Spot*. | |
915 | We make an `Animal` trait with an associated non-method function `baby_name`. | |
916 | The `Animal` trait is implemented for the struct `Dog`, on which we also | |
917 | provide an associated non-method function `baby_name` directly. | |
5e7ed085 FG |
918 | |
919 | Filename: src/main.rs | |
920 | ||
921 | ``` | |
922 | trait Animal { | |
923 |     fn baby_name() -> String; | |
924 | } | |
925 | ||
926 | struct Dog; | |
927 | ||
928 | impl Dog { | |
929 |     fn baby_name() -> String { | |
930 |         String::from("Spot") | |
931 |     } | |
932 | } | |
933 | ||
934 | impl Animal for Dog { | |
935 |     fn baby_name() -> String { | |
936 |         String::from("puppy") | |
937 |     } | |
938 | } | |
939 | ||
940 | fn main() { | |
941 |     println!("A baby dog is called a {}", Dog::baby_name()); | |
942 | } | |
943 | ``` | |
944 | ||
945 | Listing 19-19: A trait with an associated function and a type with an | |
946 | associated function of the same name that also implements the trait | |
947 | ||
923072b8 FG |
948 | We implement the code for naming all puppies Spot in the `baby_name` associated |
949 | function that is defined on `Dog`. The `Dog` type also implements the trait | |
950 | `Animal`, which describes characteristics that all animals have. Baby dogs are | |
951 | called puppies, and that is expressed in the implementation of the `Animal` | |
952 | trait on `Dog` in the `baby_name` function associated with the `Animal` trait. | |
5e7ed085 FG |
953 | |
954 | In `main`, we call the `Dog::baby_name` function, which calls the associated | |
955 | function defined on `Dog` directly. This code prints the following: | |
956 | ||
957 | ``` | |
958 | A baby dog is called a Spot | |
959 | ``` | |
960 | ||
961 | This output isn’t what we wanted. We want to call the `baby_name` function that | |
962 | is part of the `Animal` trait that we implemented on `Dog` so the code prints | |
963 | `A baby dog is called a puppy`. The technique of specifying the trait name that | |
964 | we used in Listing 19-18 doesn’t help here; if we change `main` to the code in | |
965 | Listing 19-20, we’ll get a compilation error. | |
966 | ||
967 | Filename: src/main.rs | |
968 | ||
969 | ``` | |
970 | fn main() { | |
971 |     println!("A baby dog is called a {}", Animal::baby_name()); | |
972 | } | |
973 | ``` | |
974 | ||
975 | Listing 19-20: Attempting to call the `baby_name` function from the `Animal` | |
976 | trait, but Rust doesn’t know which implementation to use | |
977 | ||
978 | Because `Animal::baby_name` doesn’t have a `self` parameter, and there could be | |
979 | other types that implement the `Animal` trait, Rust can’t figure out which | |
980 | implementation of `Animal::baby_name` we want. We’ll get this compiler error: | |
981 | ||
982 | ``` | |
983 | error[E0283]: type annotations needed | |
984 | --> src/main.rs:20:43 | |
985 | | | |
986 | 20 | println!("A baby dog is called a {}", Animal::baby_name()); | |
987 | | ^^^^^^^^^^^^^^^^^ cannot infer type | |
988 | | | |
989 | = note: cannot satisfy `_: Animal` | |
990 | ``` | |
991 | ||
992 | To disambiguate and tell Rust that we want to use the implementation of | |
993 | `Animal` for `Dog` as opposed to the implementation of `Animal` for some other | |
994 | type, we need to use fully qualified syntax. Listing 19-21 demonstrates how to | |
995 | use fully qualified syntax. | |
996 | ||
997 | Filename: src/main.rs | |
998 | ||
999 | ``` | |
1000 | fn main() { | |
1001 |     println!("A baby dog is called a {}", <Dog as Animal>::baby_name()); | |
1002 | } | |
1003 | ``` | |
1004 | ||
1005 | Listing 19-21: Using fully qualified syntax to specify that we want to call the | |
1006 | `baby_name` function from the `Animal` trait as implemented on `Dog` | |
1007 | ||
1008 | We’re providing Rust with a type annotation within the angle brackets, which | |
1009 | indicates we want to call the `baby_name` function from the `Animal` trait as | |
1010 | implemented on `Dog` by saying that we want to treat the `Dog` type as an | |
1011 | `Animal` for this function call. This code will now print what we want: | |
1012 | ||
1013 | ``` | |
1014 | A baby dog is called a puppy | |
1015 | ``` | |
1016 | ||
1017 | In general, fully qualified syntax is defined as follows: | |
1018 | ||
1019 | ``` | |
1020 | <Type as Trait>::function(receiver_if_method, next_arg, ...); | |
1021 | ``` | |
1022 | ||
1023 | For associated functions that aren’t methods, there would not be a `receiver`: | |
1024 | there would only be the list of other arguments. You could use fully qualified | |
1025 | syntax everywhere that you call functions or methods. However, you’re allowed | |
1026 | to omit any part of this syntax that Rust can figure out from other information | |
1027 | in the program. You only need to use this more verbose syntax in cases where | |
1028 | there are multiple implementations that use the same name and Rust needs help | |
1029 | to identify which implementation you want to call. | |
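The sketch below lines up the different call forms on the `Human` example from Listing 19-16. As an assumption made only for this sketch, each `fly` method returns its message as a `&'static str` instead of printing it, so the forms can be compared with `assert_eq!`:

```rust
trait Pilot {
    fn fly(&self) -> &'static str;
}

trait Wizard {
    fn fly(&self) -> &'static str;
}

struct Human;

impl Pilot for Human {
    fn fly(&self) -> &'static str {
        "This is your captain speaking."
    }
}

impl Wizard for Human {
    fn fly(&self) -> &'static str {
        "Up!"
    }
}

impl Human {
    fn fly(&self) -> &'static str {
        "*waving arms furiously*"
    }
}

fn main() {
    let person = Human;

    // Method-call form: resolves to the method implemented on `Human` directly.
    assert_eq!(person.fly(), "*waving arms furiously*");

    // Naming only the type: still the method implemented on `Human` directly.
    assert_eq!(Human::fly(&person), "*waving arms furiously*");

    // Naming only the trait: Rust infers the type from the receiver.
    assert_eq!(Pilot::fly(&person), "This is your captain speaking.");

    // Fully qualified: both the type and the trait are spelled out.
    assert_eq!(<Human as Wizard>::fly(&person), "Up!");
}
```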
1030 | ||
1031 | ### Using Supertraits to Require One Trait’s Functionality Within Another Trait | |
1032 | ||
923072b8 FG |
1033 | Sometimes, you might write a trait definition that depends on another trait: |
1034 | for a type to implement the first trait, you want to require that type to also | |
1035 | implement the second trait. You would do this so that your trait definition can | |
1036 | make use of the associated items of the second trait. The trait your trait | |
1037 | definition is relying on is called a *supertrait* of your trait. | |
5e7ed085 FG |
1038 | |
1039 | For example, let’s say we want to make an `OutlinePrint` trait with an | |
923072b8 FG |
1040 | `outline_print` method that will print a given value formatted so that it’s | |
1041 | framed in asterisks. That is, given a `Point` struct that implements the | |
1042 | standard library trait `Display` to result in `(x, y)`, when we | |
5e7ed085 FG |
1043 | call `outline_print` on a `Point` instance that has `1` for `x` and `3` for |
1044 | `y`, it should print the following: | |
1045 | ||
1046 | ``` | |
1047 | ********** | |
1048 | *        * | |
1049 | * (1, 3) * | |
1050 | *        * | |
1051 | ********** | |
1052 | ``` | |
1053 | ||
923072b8 FG |
1054 | In the implementation of the `outline_print` method, we want to use the |
1055 | `Display` trait’s functionality. Therefore, we need to specify that the | |
1056 | `OutlinePrint` trait will work only for types that also implement `Display` and | |
1057 | provide the functionality that `OutlinePrint` needs. We can do that in the | |
1058 | trait definition by specifying `OutlinePrint: Display`. This technique is | |
1059 | similar to adding a trait bound to the trait. Listing 19-22 shows a | |
1060 | definition of the `OutlinePrint` trait with a default `outline_print` method. | |
5e7ed085 FG |
1061 | |
1062 | Filename: src/main.rs | |
1063 | ||
1064 | ``` | |
1065 | use std::fmt; | |
1066 | ||
1067 | trait OutlinePrint: fmt::Display { | |
1068 |     fn outline_print(&self) { | |
1069 |         let output = self.to_string(); | |
1070 |         let len = output.len(); | |
1071 |         println!("{}", "*".repeat(len + 4)); | |
1072 |         println!("*{}*", " ".repeat(len + 2)); | |
1073 |         println!("* {} *", output); | |
1074 |         println!("*{}*", " ".repeat(len + 2)); | |
1075 |         println!("{}", "*".repeat(len + 4)); | |
1076 |     } | |
1077 | } | |
1078 | ``` | |
1079 | ||
1080 | Listing 19-22: Implementing the `OutlinePrint` trait that requires the | |
1081 | functionality from `Display` | |
1082 | ||
1083 | Because we’ve specified that `OutlinePrint` requires the `Display` trait, we | |
1084 | can use the `to_string` method that is automatically implemented for any type | |
1085 | that implements `Display`. If we tried to use `to_string` without adding a | |
1086 | colon and specifying the `Display` trait after the trait name, we’d get an | |
1087 | error saying that no method named `to_string` was found for the type `&Self` in | |
1088 | the current scope. | |
1089 | ||
1090 | Let’s see what happens when we try to implement `OutlinePrint` on a type that | |
1091 | doesn’t implement `Display`, such as the `Point` struct: | |
1092 | ||
1093 | Filename: src/main.rs | |
1094 | ||
1095 | ``` | |
1096 | struct Point { | |
1097 |     x: i32, | |
1098 |     y: i32, | |
1099 | } | |
1100 | ||
1101 | impl OutlinePrint for Point {} | |
1102 | ``` | |
1103 | ||
1104 | We get an error saying that `Display` is required but not implemented: | |
1105 | ||
1106 | ``` | |
1107 | error[E0277]: `Point` doesn't implement `std::fmt::Display` | |
1108 | --> src/main.rs:20:6 | |
1109 | | | |
1110 | 20 | impl OutlinePrint for Point {} | |
1111 | | ^^^^^^^^^^^^ `Point` cannot be formatted with the default formatter | |
1112 | | | |
1113 | = help: the trait `std::fmt::Display` is not implemented for `Point` | |
1114 | = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead | |
1115 | note: required by a bound in `OutlinePrint` | |
1116 | --> src/main.rs:3:21 | |
1117 | | | |
1118 | 3 | trait OutlinePrint: fmt::Display { | |
1119 | | ^^^^^^^^^^^^ required by this bound in `OutlinePrint` | |
1120 | ``` | |
1121 | ||
1122 | To fix this, we implement `Display` on `Point` and satisfy the constraint that | |
1123 | `OutlinePrint` requires, like so: | |
1124 | ||
1125 | Filename: src/main.rs | |
1126 | ||
1127 | ``` | |
1128 | use std::fmt; | |
1129 | ||
1130 | impl fmt::Display for Point { | |
1131 |     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | |
1132 |         write!(f, "({}, {})", self.x, self.y) | |
1133 |     } | |
1134 | } | |
1135 | ``` | |
1136 | ||
1137 | Then implementing the `OutlinePrint` trait on `Point` will compile | |
1138 | successfully, and we can call `outline_print` on a `Point` instance to display | |
1139 | it within an outline of asterisks. | |
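Putting the pieces together, a complete program might look like the sketch below. One detail is our own assumption rather than part of Listing 19-22: the framed text is built in a hypothetical `outline` helper method that returns a `String`, and `outline_print` simply prints that string. This keeps the behavior the same while making the result easy to inspect:

```rust
use std::fmt;

trait OutlinePrint: fmt::Display {
    // Hypothetical helper (not in Listing 19-22): builds the framed
    // text as a `String` instead of printing it line by line.
    fn outline(&self) -> String {
        // `to_string` is available because `Display` is a supertrait.
        let output = self.to_string();
        let len = output.len();
        let border = "*".repeat(len + 4);
        let padding = format!("*{}*", " ".repeat(len + 2));
        format!("{0}\n{1}\n* {2} *\n{1}\n{0}", border, padding, output)
    }

    fn outline_print(&self) {
        println!("{}", self.outline());
    }
}

struct Point {
    x: i32,
    y: i32,
}

// Satisfies the supertrait requirement of `OutlinePrint`.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// The default methods are enough, so the impl block stays empty.
impl OutlinePrint for Point {}

fn main() {
    let point = Point { x: 1, y: 3 };
    point.outline_print();
}
```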
1140 | ||
1141 | ### Using the Newtype Pattern to Implement External Traits on External Types | |
1142 | ||
1143 | In Chapter 10 in the “Implementing a Trait on a Type” section, we mentioned the | |
923072b8 FG |
1144 | orphan rule that states we’re only allowed to implement a trait on a type if |
1145 | either the trait or the type, or both, are local to our crate. | |
1146 | It’s possible to get | |
5e7ed085 FG |
1147 | around this restriction using the *newtype pattern*, which involves creating a |
1148 | new type in a tuple struct. (We covered tuple structs in the “Using Tuple | |
1149 | Structs without Named Fields to Create Different Types” section of Chapter 5.) | |
1150 | The tuple struct will have one field and be a thin wrapper around the type we | |
1151 | want to implement a trait for. Then the wrapper type is local to our crate, and | |
1152 | we can implement the trait on the wrapper. *Newtype* is a term that originates | |
1153 | from the Haskell programming language. There is no runtime performance penalty | |
1154 | for using this pattern, and the wrapper type is elided at compile time. | |
1155 | ||
1156 | As an example, let’s say we want to implement `Display` on `Vec<T>`, which the | |
1157 | orphan rule prevents us from doing directly because the `Display` trait and the | |
1158 | `Vec<T>` type are defined outside our crate. We can make a `Wrapper` struct | |
1159 | that holds an instance of `Vec<T>`; then we can implement `Display` on | |
1160 | `Wrapper` and use the `Vec<T>` value, as shown in Listing 19-23. | |
1161 | ||
1162 | Filename: src/main.rs | |
1163 | ||
1164 | ``` | |
1165 | use std::fmt; | |
1166 | ||
1167 | struct Wrapper(Vec<String>); | |
1168 | ||
1169 | impl fmt::Display for Wrapper { | |
1170 |     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | |
1171 |         write!(f, "[{}]", self.0.join(", ")) | |
1172 |     } | |
1173 | } | |
1174 | ||
1175 | fn main() { | |
1176 |     let w = Wrapper(vec![String::from("hello"), String::from("world")]); | |
1177 |     println!("w = {}", w); | |
1178 | } | |
1179 | ``` | |
1180 | ||
1181 | Listing 19-23: Creating a `Wrapper` type around `Vec<String>` to implement | |
1182 | `Display` | |
1183 | ||
1184 | The implementation of `Display` uses `self.0` to access the inner `Vec<T>`, | |
1185 | because `Wrapper` is a tuple struct and `Vec<T>` is the item at index 0 in the | |
1186 | tuple. Then we can use the functionality of the `Display` trait on `Wrapper`. | |
1187 | ||
1188 | The downside of using this technique is that `Wrapper` is a new type, so it | |
1189 | doesn’t have the methods of the value it’s holding. We would have to implement | |
1190 | all the methods of `Vec<T>` directly on `Wrapper` such that the methods | |
1191 | delegate to `self.0`, which would allow us to treat `Wrapper` exactly like a | |
1192 | `Vec<T>`. If we wanted the new type to have every method the inner type has, | |
1193 | implementing the `Deref` trait (discussed in Chapter 15 in the “Treating Smart | |
1194 | Pointers Like Regular References with the `Deref` Trait” section) on the | |
1195 | `Wrapper` to return the inner type would be a solution. If we don’t want the | |
1196 | `Wrapper` type to have all the methods of the inner type—for example, to | |
1197 | restrict the `Wrapper` type’s behavior—we would have to implement just the | |
1198 | methods we do want manually. | |
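As a minimal sketch of the `Deref` approach, the code below extends the `Wrapper` from Listing 19-23 with a `Deref` implementation whose target is the inner `Vec<String>`. The choice of `len` and `first` as the forwarded methods is ours, picked only to show the effect:

```rust
use std::fmt;
use std::ops::Deref;

struct Wrapper(Vec<String>);

impl Deref for Wrapper {
    type Target = Vec<String>;

    // Hand out a reference to the wrapped vector.
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let w = Wrapper(vec![String::from("hello"), String::from("world")]);

    // `len` and `first` are not defined on `Wrapper`; deref coercion
    // finds them on the inner `Vec<String>`.
    println!("{} has {} items, starting with {:?}", w, w.len(), w.first());
}
```

Only methods that take `&self` are reachable this way. Methods that need `&mut self`, such as `push`, would also require an implementation of `DerefMut`.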
1199 | ||
923072b8 FG |
1200 | This newtype pattern is also useful even when traits are not involved. Let’s |
1201 | switch focus and look at some advanced ways to interact with Rust’s type system. | |
5e7ed085 FG |
1202 | |
1203 | ## Advanced Types | |
1204 | ||
923072b8 FG |
1205 | The Rust type system has some features that we’ve so far mentioned but haven’t |
1206 | yet discussed. We’ll start by discussing newtypes in general as we examine why | |
1207 | newtypes are useful as types. Then we’ll move on to type aliases, a feature | |
1208 | similar to newtypes but with slightly different semantics. We’ll also discuss | |
1209 | the `!` type and dynamically sized types. | |
5e7ed085 FG |
1210 | |
1211 | ### Using the Newtype Pattern for Type Safety and Abstraction | |
1212 | ||
1213 | > Note: This section assumes you’ve read the earlier section “Using the | |
1214 | > Newtype Pattern to Implement External Traits on External | |
1215 | > Types.” | |
1216 | ||
923072b8 FG |
1217 | The newtype pattern is also useful for tasks beyond those we’ve discussed so |
1218 | far, including statically enforcing that values are never confused and | |
1219 | indicating the units of a value. You saw an example of using newtypes to | |
1220 | indicate units in Listing 19-15: recall that the `Millimeters` and `Meters` | |
1221 | structs wrapped `u32` values in a newtype. If we wrote a function with a | |
1222 | parameter of type `Millimeters`, we couldn’t compile a program that | |
1223 | accidentally tried to call that function with a value of type `Meters` or a | |
1224 | plain `u32`. | |
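A small sketch shows this check in action. The `describe_gap` function is hypothetical and exists only to give us a parameter of type `Millimeters`:

```rust
struct Millimeters(u32);
struct Meters(u32);

// Hypothetical function, for illustration: it accepts only `Millimeters`.
fn describe_gap(gap: Millimeters) -> String {
    format!("gap of {} mm", gap.0)
}

fn main() {
    println!("{}", describe_gap(Millimeters(25)));

    // Neither of these calls would compile, because the argument
    // types don't match the `Millimeters` parameter:
    // describe_gap(Meters(25));
    // describe_gap(25_u32);

    // A value in meters has to be converted explicitly first.
    let span = Meters(2);
    println!("{}", describe_gap(Millimeters(span.0 * 1000)));
}
```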
5e7ed085 | 1225 | |
923072b8 | 1226 | We can also use the newtype pattern to abstract away some implementation |
5e7ed085 FG |
1227 | details of a type: the new type can expose a public API that is different from |
1228 | the API of the private inner type. | |
1229 | ||
1230 | Newtypes can also hide internal implementation. For example, we could provide a | |
1231 | `People` type to wrap a `HashMap<i32, String>` that stores a person’s ID | |
1232 | associated with their name. Code using `People` would only interact with the | |
1233 | public API we provide, such as a method to add a name string to the `People` | |
1234 | collection; that code wouldn’t need to know that we assign an `i32` ID to names | |
1235 | internally. The newtype pattern is a lightweight way to achieve encapsulation | |
1236 | to hide implementation details, which we discussed in the “Encapsulation that | |
1237 | Hides Implementation Details” section of Chapter 17. | |
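One possible shape for such a `People` type is sketched below. The method names `new`, `add`, and `contains` are assumptions, as is the simple scheme of using the current number of entries as the next ID:

```rust
use std::collections::HashMap;

// The inner map is a private field, so code outside this module
// can't reach it directly.
pub struct People(HashMap<i32, String>);

impl People {
    pub fn new() -> People {
        People(HashMap::new())
    }

    // Callers pass only a name; the `i32` ID is assigned internally.
    // Assumption for this sketch: entries are never removed, so the
    // current length is always an unused ID.
    pub fn add(&mut self, name: &str) {
        let id = self.0.len() as i32;
        self.0.insert(id, name.to_string());
    }

    pub fn contains(&self, name: &str) -> bool {
        self.0.values().any(|stored| stored.as_str() == name)
    }
}

fn main() {
    let mut people = People::new();
    people.add("Ferris");

    println!("Ferris is known: {}", people.contains("Ferris"));
}
```

Because callers never see the `HashMap` or the IDs, we could later switch to a different collection or ID scheme without changing code that uses `People`.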
1238 | ||
1239 | ### Creating Type Synonyms with Type Aliases | |
1240 | ||
923072b8 FG |
1241 | Rust provides the ability to declare a *type alias* to give an existing type |
1242 | another name. For this we use the `type` keyword. For example, we can create | |
1243 | the alias `Kilometers` to `i32` like so: | |
5e7ed085 FG |
1244 | |
1245 | ``` | |
1246 | type Kilometers = i32; | |
1247 | ``` | |
1248 | ||
1249 | Now, the alias `Kilometers` is a *synonym* for `i32`; unlike the `Millimeters` | |
1250 | and `Meters` types we created in Listing 19-15, `Kilometers` is not a separate, | |
1251 | new type. Values that have the type `Kilometers` will be treated the same as | |
1252 | values of type `i32`: | |
1253 | ||
1254 | ``` | |
1255 | type Kilometers = i32; | |
1256 | ||
1257 | let x: i32 = 5; | |
1258 | let y: Kilometers = 5; | |
1259 | ||
1260 | println!("x + y = {}", x + y); | |
1261 | ``` | |
1262 | ||
1263 | Because `Kilometers` and `i32` are the same type, we can add values of both | |
1264 | types and we can pass `Kilometers` values to functions that take `i32` | |
1265 | parameters. However, using this method, we don’t get the type checking benefits | |
923072b8 FG |
1266 | that we get from the newtype pattern discussed earlier. In other words, if we |
1267 | mix up `Kilometers` and `i32` values somewhere, the compiler will not give us | |
1268 | an error. | |
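The sketch below shows how easily such a mix-up slips through. The `Miles` alias and the `remaining` function are hypothetical additions for illustration:

```rust
type Kilometers = i32;
// Hypothetical second alias to the same underlying type.
type Miles = i32;

// Hypothetical function that is meant to work in kilometers only.
fn remaining(total: Kilometers, driven: Kilometers) -> Kilometers {
    total - driven
}

fn main() {
    let trip: Kilometers = 100;
    let driven: Miles = 30;

    // This compiles even though the units differ: both aliases
    // are just other names for `i32`.
    println!("{} left", remaining(trip, driven));
}
```

Had `Kilometers` and `Miles` been newtypes, like `Millimeters` and `Meters` in Listing 19-15, the call to `remaining` would have been rejected at compile time.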
1269 | ||
5e7ed085 FG |
1278 | |
1279 | The main use case for type synonyms is to reduce repetition. For example, we | |
1280 | might have a lengthy type like this: | |
1281 | ||
1282 | ``` | |
1283 | Box<dyn Fn() + Send + 'static> | |
1284 | ``` | |
1285 | ||
1286 | Writing this lengthy type in function signatures and as type annotations all | |
1287 | over the code can be tiresome and error prone. Imagine having a project full of | |
1288 | code like that in Listing 19-24. | |
1289 | ||
1290 | ``` | |
1291 | let f: Box<dyn Fn() + Send + 'static> = Box::new(|| println!("hi")); | |
1292 | ||
1293 | fn takes_long_type(f: Box<dyn Fn() + Send + 'static>) { | |
1294 |     // --snip-- | |
1295 | } | |
1296 | ||
1297 | fn returns_long_type() -> Box<dyn Fn() + Send + 'static> { | |
1298 |     // --snip-- | |
1299 | } | |
1300 | ``` | |
1301 | ||
1302 | Listing 19-24: Using a long type in many places | |
1303 | ||
1304 | A type alias makes this code more manageable by reducing the repetition. In | |
1305 | Listing 19-25, we’ve introduced an alias named `Thunk` for the verbose type and | |
1306 | can replace all uses of the type with the shorter alias `Thunk`. | |
1307 | ||
1308 | ``` | |
1309 | type Thunk = Box<dyn Fn() + Send + 'static>; | |
1310 | ||
1311 | let f: Thunk = Box::new(|| println!("hi")); | |
1312 | ||
1313 | fn takes_long_type(f: Thunk) { | |
1314 | // --snip-- | |
1315 | } | |
1316 | ||
1317 | fn returns_long_type() -> Thunk { | |
1318 | // --snip-- | |
1319 | } | |
1320 | ``` | |
1321 | ||
1322 | Listing 19-25: Introducing a type alias `Thunk` to reduce repetition | |
1323 | ||
1324 | This code is much easier to read and write! Choosing a meaningful name for a | |
1325 | type alias can help communicate your intent as well (*thunk* is a word for code | |
1326 | to be evaluated at a later time, so it’s an appropriate name for a closure that | |
1327 | gets stored). | |
1328 | ||
1329 | Type aliases are also commonly used with the `Result<T, E>` type for reducing | |
1330 | repetition. Consider the `std::io` module in the standard library. I/O | |
1331 | operations often return a `Result<T, E>` to handle situations when operations | |
1332 | fail to work. This library has a `std::io::Error` struct that represents all | |
1333 | possible I/O errors. Many of the functions in `std::io` return | |
1334 | `Result<T, E>` where the `E` is `std::io::Error`, such as these functions in | |
1335 | the `Write` trait: | |
1336 | ||
1337 | ``` | |
1338 | use std::fmt; | |
1339 | use std::io::Error; | |
1340 | ||
1341 | pub trait Write { | |
1342 | fn write(&mut self, buf: &[u8]) -> Result<usize, Error>; | |
1343 | fn flush(&mut self) -> Result<(), Error>; | |
1344 | ||
1345 | fn write_all(&mut self, buf: &[u8]) -> Result<(), Error>; | |
1346 | fn write_fmt(&mut self, fmt: fmt::Arguments) -> Result<(), Error>; | |
1347 | } | |
1348 | ``` | |
1349 | ||
1350 | The `Result<..., Error>` is repeated a lot. As such, `std::io` has this type | |
1351 | alias declaration: | |
1352 | ||
1353 | ``` | |
1354 | type Result<T> = std::result::Result<T, std::io::Error>; | |
1355 | ``` | |
1356 | ||
1357 | Because this declaration is in the `std::io` module, we can use the fully | |
923072b8 | 1358 | qualified alias `std::io::Result<T>`; that is, a `Result<T, E>` with the `E` |
5e7ed085 FG |
1359 | filled in as `std::io::Error`. The `Write` trait function signatures end up |
1360 | looking like this: | |
1361 | ||
1362 | ``` | |
1363 | pub trait Write { | |
1364 | fn write(&mut self, buf: &[u8]) -> Result<usize>; | |
1365 | fn flush(&mut self) -> Result<()>; | |
1366 | ||
1367 | fn write_all(&mut self, buf: &[u8]) -> Result<()>; | |
1368 | fn write_fmt(&mut self, fmt: fmt::Arguments) -> Result<()>; | |
1369 | } | |
1370 | ``` | |
1371 | ||
1372 | The type alias helps in two ways: it makes code easier to write *and* it gives | |
1373 | us a consistent interface across all of `std::io`. Because it’s an alias, it’s | |
1374 | just another `Result<T, E>`, which means we can use any methods that work on | |
1375 | `Result<T, E>` with it, as well as special syntax like the `?` operator. | |
1376 | ||
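The same pattern works in our own code. The following is a minimal sketch, not taken from the standard library: the alias fixes the error type to `std::num::ParseIntError`, and `parse_pair` is a hypothetical helper invented for this illustration.

```rust
use std::num::ParseIntError;

// A module-local alias in the style of `std::io::Result<T>`: the error type
// is fixed, so signatures only spell out the success type.
type Result<T> = std::result::Result<T, ParseIntError>;

// `parse_pair` is a hypothetical helper used only for illustration.
fn parse_pair(a: &str, b: &str) -> Result<(i32, i32)> {
    // `?` works because the alias is still an ordinary `Result<T, E>`.
    let first = a.trim().parse::<i32>()?;
    let second = b.trim().parse::<i32>()?;
    Ok((first, second))
}

fn main() {
    println!("{:?}", parse_pair("1", " 2 "));
    // Methods defined on `Result<T, E>` are available through the alias too.
    println!("{}", parse_pair("1", "two").is_err());
}
```

Because the alias shadows the prelude's `Result` inside its module, code that needs a different error type there has to write `std::result::Result<T, E>` in full.
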
1377 | ### The Never Type that Never Returns | |
1378 | ||
1379 | Rust has a special type named `!` that’s known in type theory lingo as the | |
1380 | *empty type* because it has no values. We prefer to call it the *never type* | |
1381 | because it stands in the place of the return type when a function will never | |
1382 | return. Here is an example: | |
1383 | ||
1384 | ``` | |
1385 | fn bar() -> ! { | |
1386 | // --snip-- | |
1387 | } | |
1388 | ``` | |
1389 | ||
1390 | This code is read as “the function `bar` returns never.” Functions that return | |
1391 | never are called *diverging functions*. We can’t create values of the type `!` | |
1392 | so `bar` can never possibly return. | |
1393 | ||
1394 | But what use is a type you can never create values for? Recall the code from | |
923072b8 FG |
1395 | Listing 2-5, part of the number guessing game; we’ve reproduced a bit of it |
1396 | here in Listing 19-26. | |
5e7ed085 FG |
1397 | |
1398 | ``` | |
1399 | let guess: u32 = match guess.trim().parse() { | |
1400 | Ok(num) => num, | |
1401 | Err(_) => continue, | |
1402 | }; | |
1403 | ``` | |
1404 | ||
1405 | Listing 19-26: A `match` with an arm that ends in `continue` | |
1406 | ||
1407 | At the time, we skipped over some details in this code. In Chapter 6 in “The | |
923072b8 FG |
1408 | `match` Control Flow Operator” section, we discussed that `match` arms must all |
1409 | return the same type. So, for example, the following code doesn’t work: | |
5e7ed085 FG |
1410 | |
1411 | ``` | |
1412 | let guess = match guess.trim().parse() { | |
1413 | Ok(_) => 5, | |
1414 | Err(_) => "hello", | |
1415 | }; | |
1416 | ``` | |
1417 | ||
1418 | The type of `guess` in this code would have to be an integer *and* a string, | |
1419 | and Rust requires that `guess` have only one type. So what does `continue` | |
1420 | return? How were we allowed to return a `u32` from one arm and have another arm | |
1421 | that ends with `continue` in Listing 19-26? | |
1422 | ||
1423 | As you might have guessed, `continue` has the type `!`. That is, when Rust | |
1424 | computes the type of `guess`, it looks at both match arms, the former with a | |
1425 | value of type `u32` and the latter with the type `!`. Because `!` can never | |
1426 | have a value, Rust decides that the type of `guess` is `u32`. | |
1427 | ||
1428 | The formal way of describing this behavior is that expressions of type `!` can | |
1429 | be coerced into any other type. We’re allowed to end this `match` arm with | |
1430 | `continue` because `continue` doesn’t return a value; instead, it moves control | |
1431 | back to the top of the loop, so in the `Err` case, we never assign a value to | |
1432 | `guess`. | |
1433 | ||
923072b8 FG |
1434 | The never type is useful with the `panic!` macro as well. Recall the `unwrap` |
1435 | function that we call on `Option<T>` values to produce a value or panic with | |
1436 | this definition: | |
5e7ed085 FG |
1437 | |
1438 | ``` | |
1439 | impl<T> Option<T> { | |
1440 | pub fn unwrap(self) -> T { | |
1441 | match self { | |
1442 | Some(val) => val, | |
1443 | None => panic!("called `Option::unwrap()` on a `None` value"), | |
1444 | } | |
1445 | } | |
1446 | } | |
1447 | ``` | |
1448 | ||
1449 | In this code, the same thing happens as in the `match` in Listing 19-26: Rust | |
1450 | sees that `val` has the type `T` and `panic!` has the type `!`, so the result | |
1451 | of the overall `match` expression is `T`. This code works because `panic!` | |
1452 | doesn’t produce a value; it ends the program. In the `None` case, we won’t be | |
1453 | returning a value from `unwrap`, so this code is valid. | |
1454 | ||
1455 | One final expression that has the type `!` is a `loop`: | |
1456 | ||
1457 | ``` | |
1458 | print!("forever "); | |
1459 | ||
1460 | loop { | |
1461 | print!("and ever "); | |
1462 | } | |
1463 | ``` | |
1464 | ||
1465 | Here, the loop never ends, so `!` is the type of the expression. However, this | |
1466 | wouldn’t be true if we included a `break`, because the loop would terminate | |
1467 | when it got to the `break`. | |
1468 | ||
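To see these rules side by side, here is a small sketch. The function names `fail`, `value_or_fail`, and `first_power_of_two_above` are hypothetical and exist only for this illustration.

```rust
// `fail` is a diverging function: it never returns, so its return type is `!`.
fn fail(msg: &str) -> ! {
    panic!("{}", msg);
}

// Both arms must agree on a type. The `None` arm has type `!`, which coerces
// to `u32`, so the whole `match` is a `u32`.
fn value_or_fail(input: Option<u32>) -> u32 {
    match input {
        Some(n) => n,
        None => fail("no value provided"),
    }
}

// A `loop` that contains `break` with a value has the type of that value
// rather than `!`.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut n = 1;
    loop {
        if n > limit {
            break n;
        }
        n *= 2;
    }
}

fn main() {
    println!("{}", value_or_fail(Some(7)));
    println!("{}", first_power_of_two_above(100));
}
```
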
1469 | ### Dynamically Sized Types and the `Sized` Trait | |
1470 | ||
923072b8 FG |
1471 | Rust needs to know certain details about its types, such as how much space to |
1472 | allocate for a value of a particular type. This leaves one corner of its type | |
1473 | system a little confusing at first: the concept of *dynamically sized types*. | |
1474 | Sometimes referred to as *DSTs* or *unsized types*, these types let us write | |
1475 | code using values whose size we can know only at runtime. | |
5e7ed085 FG |
1476 | |
1477 | Let’s dig into the details of a dynamically sized type called `str`, which | |
1478 | we’ve been using throughout the book. That’s right, not `&str`, but `str` on | |
1479 | its own, is a DST. We can’t know how long the string is until runtime, meaning | |
1480 | we can’t create a variable of type `str`, nor can we take an argument of type | |
1481 | `str`. Consider the following code, which does not work: | |
1482 | ||
1483 | ``` | |
1484 | let s1: str = "Hello there!"; | |
1485 | let s2: str = "How's it going?"; | |
1486 | ``` | |
1487 | ||
1488 | Rust needs to know how much memory to allocate for any value of a particular | |
1489 | type, and all values of a type must use the same amount of memory. If Rust | |
1490 | allowed us to write this code, these two `str` values would need to take up the | |
1491 | same amount of space. But they have different lengths: `s1` needs 12 bytes of | |
1492 | storage and `s2` needs 15. This is why it’s not possible to create a variable | |
1493 | holding a dynamically sized type. | |
1494 | ||
1495 | So what do we do? In this case, you already know the answer: we make the types | |
923072b8 FG |
1496 | of `s1` and `s2` a `&str` rather than a `str`. Recall from the “String Slices” |
1497 | section of Chapter 4 that the slice data structure just stores the starting | |
1498 | position and the length of the slice. So although a `&T` is a single value that | |
1499 | stores the memory address of where the `T` is located, a `&str` is *two* | |
1500 | values: the address of the `str` and its length. As such, we can know the size | |
1501 | of a `&str` value at compile time: it’s twice the size of a `usize`. That is, | |
1502 | we always know the size of a `&str`, no matter how long the string it refers to | |
1503 | is. In general, this is the way in which dynamically sized types are used in | |
1504 | Rust: they have an extra bit of metadata that stores the size of the dynamic | |
1505 | information. The golden rule of dynamically sized types is that we must always | |
1506 | put values of dynamically sized types behind a pointer of some kind. | |
5e7ed085 FG |
1507 | |
1508 | We can combine `str` with all kinds of pointers: for example, `Box<str>` or | |
1509 | `Rc<str>`. In fact, you’ve seen this before but with a different dynamically | |
1510 | sized type: traits. Every trait is a dynamically sized type we can refer to by | |
1511 | using the name of the trait. In Chapter 17 in the “Using Trait Objects That | |
1512 | Allow for Values of Different Types” section, we mentioned that to use traits | |
1513 | as trait objects, we must put them behind a pointer, such as `&dyn Trait` or | |
1514 | `Box<dyn Trait>` (`Rc<dyn Trait>` would work too). | |
1515 | ||
923072b8 FG |
1516 | To work with DSTs, Rust provides the `Sized` trait to determine whether or not |
1517 | a type’s size is known at compile time. This trait is automatically implemented | |
1518 | for everything whose size is known at compile time. In addition, Rust | |
1519 | implicitly adds a bound on `Sized` to every generic function. That is, a | |
1520 | generic function definition like this: | |
5e7ed085 FG |
1521 | |
1522 | ``` | |
1523 | fn generic<T>(t: T) { | |
1524 | // --snip-- | |
1525 | } | |
1526 | ``` | |
1527 | ||
1528 | is actually treated as though we had written this: | |
1529 | ||
1530 | ``` | |
1531 | fn generic<T: Sized>(t: T) { | |
1532 | // --snip-- | |
1533 | } | |
1534 | ``` | |
1535 | ||
1536 | By default, generic functions will work only on types that have a known size at | |
1537 | compile time. However, you can use the following special syntax to relax this | |
1538 | restriction: | |
1539 | ||
1540 | ``` | |
1541 | fn generic<T: ?Sized>(t: &T) { | |
1542 | // --snip-- | |
1543 | } | |
1544 | ``` | |
1545 | ||
1546 | A trait bound on `?Sized` means “`T` may or may not be `Sized`” and this | |
1547 | notation overrides the default that generic types must have a known size at | |
1548 | compile time. The `?Trait` syntax with this meaning is only available for | |
1549 | `Sized`, not any other traits. | |
1550 | ||
1551 | Also note that we switched the type of the `t` parameter from `T` to `&T`. | |
1552 | Because the type might not be `Sized`, we need to use it behind some kind of | |
1553 | pointer. In this case, we’ve chosen a reference. | |
1554 | ||
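As a brief sketch of what the relaxed bound buys us, the hypothetical helper `byte_size` below accepts references to both sized and dynamically sized values. It relies on `std::mem::size_of_val`, which is itself declared with a `?Sized` bound.

```rust
use std::mem::size_of_val;

// `byte_size` is a hypothetical helper. Without `?Sized`, calling it with a
// `&str` or a slice reference would be rejected, because `T` would have to
// be `str` or `[u16]`, neither of which is `Sized`.
fn byte_size<T: ?Sized>(t: &T) -> usize {
    size_of_val(t)
}

fn main() {
    let s: &str = "Hello there!";
    let slice: &[u16] = &[1, 2, 3];
    let n: u64 = 0;

    println!("{}", byte_size(s)); // T = str
    println!("{}", byte_size(slice)); // T = [u16]
    println!("{}", byte_size(&n)); // T = u64
}
```
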
1555 | Next, we’ll talk about functions and closures! | |
1556 | ||
1557 | ## Advanced Functions and Closures | |
1558 | ||
1559 | This section explores some advanced features related to functions and closures, | |
1560 | including function pointers and returning closures. | |
1561 | ||
1562 | ### Function Pointers | |
1563 | ||
1564 | We’ve talked about how to pass closures to functions; you can also pass regular | |
1565 | functions to functions! This technique is useful when you want to pass a | |
923072b8 FG |
1566 | function you’ve already defined rather than defining a new closure. Functions |
1567 | coerce to the type `fn` (with a lowercase f), not to be confused with the `Fn` | |
1568 | closure trait. The `fn` type is called a *function pointer*. Passing functions | |
5e7ed085 | 1569 | with function pointers will allow you to use functions as arguments to other |
923072b8 FG |
1570 | functions. |
1571 | ||
1572 | The syntax for specifying that a parameter is a function pointer is similar to | |
1573 | that of closures, as shown in Listing 19-27, where we’ve defined a function | |
1574 | `add_one` that adds one to its parameter. The function `do_twice` takes two | |
1575 | parameters: a function pointer to any function that takes an `i32` parameter | |
1576 | and returns an `i32`, and one `i32` value. The `do_twice` function calls the | |
1577 | function `f` twice, passing it the `arg` value, then adds the two function call | |
1578 | results together. The `main` function calls `do_twice` with the arguments | |
1579 | `add_one` and `5`. | |
5e7ed085 FG |
1580 | |
1581 | Filename: src/main.rs | |
1582 | ||
1583 | ``` | |
1584 | fn add_one(x: i32) -> i32 { | |
1585 | x + 1 | |
1586 | } | |
1587 | ||
1588 | fn do_twice(f: fn(i32) -> i32, arg: i32) -> i32 { | |
1589 | f(arg) + f(arg) | |
1590 | } | |
1591 | ||
1592 | fn main() { | |
1593 | let answer = do_twice(add_one, 5); | |
1594 | ||
1595 | println!("The answer is: {}", answer); | |
1596 | } | |
1597 | ``` | |
1598 | ||
1599 | Listing 19-27: Using the `fn` type to accept a function pointer as an argument | |
1600 | ||
1601 | This code prints `The answer is: 12`. We specify that the parameter `f` in | |
1602 | `do_twice` is an `fn` that takes one parameter of type `i32` and returns an | |
1603 | `i32`. We can then call `f` in the body of `do_twice`. In `main`, we can pass | |
1604 | the function name `add_one` as the first argument to `do_twice`. | |
1605 | ||
1606 | Unlike closures, `fn` is a type rather than a trait, so we specify `fn` as the | |
1607 | parameter type directly rather than declaring a generic type parameter with one | |
1608 | of the `Fn` traits as a trait bound. | |
1609 | ||
1610 | Function pointers implement all three of the closure traits (`Fn`, `FnMut`, and | |
923072b8 | 1611 | `FnOnce`), meaning you can always pass a function pointer as an argument for a |
5e7ed085 FG |
1612 | function that expects a closure. It’s best to write functions using a generic |
1613 | type and one of the closure traits so your functions can accept either | |
1614 | functions or closures. | |
1615 | ||
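For instance, a generic variant of `do_twice` might look like the following sketch. It is our own rewrite of the function from Listing 19-27 for illustration, with a hypothetical `offset` variable to show a capturing closure.

```rust
fn add_one(x: i32) -> i32 {
    x + 1
}

// Generic over any type that implements `Fn(i32) -> i32`, so both function
// pointers and closures are accepted.
fn do_twice<F>(f: F, arg: i32) -> i32
where
    F: Fn(i32) -> i32,
{
    f(arg) + f(arg)
}

fn main() {
    let offset = 10;

    // A named function works, as before.
    println!("{}", do_twice(add_one, 5));

    // A closure that captures `offset` works too. It could not be passed to
    // a parameter of type `fn(i32) -> i32`.
    println!("{}", do_twice(|x| x + offset, 5));
}
```
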
923072b8 FG |
1616 | That said, one example of where you would want to only accept `fn` and not |
1617 | closures is when interfacing with external code that doesn’t have closures: C | |
1618 | functions can accept functions as arguments, but C doesn’t have closures. | |
5e7ed085 FG |
1619 | |
1620 | As an example of where you could use either a closure defined inline or a named | |
923072b8 FG |
1621 | function, let’s look at a use of the `map` method provided by the `Iterator` |
1622 | trait in the standard library. To use the `map` function to turn a | |
5e7ed085 FG |
1623 | vector of numbers into a vector of strings, we could use a closure, like this: |
1624 | ||
1625 | ``` | |
1626 | let list_of_numbers = vec![1, 2, 3]; | |
1627 | let list_of_strings: Vec<String> = | |
1628 | list_of_numbers.iter().map(|i| i.to_string()).collect(); | |
1629 | ``` | |
1630 | ||
1631 | Or we could name a function as the argument to `map` instead of the closure, | |
1632 | like this: | |
1633 | ||
1634 | ``` | |
1635 | let list_of_numbers = vec![1, 2, 3]; | |
1636 | let list_of_strings: Vec<String> = | |
1637 | list_of_numbers.iter().map(ToString::to_string).collect(); | |
1638 | ``` | |
1639 | ||
1640 | Note that we must use the fully qualified syntax that we talked about earlier | |
1641 | in the “Advanced Traits” section because there are multiple functions available | |
923072b8 FG |
1642 | named `to_string`. |
1643 | ||
1644 | Here, we’re using the `to_string` function defined in the | |
5e7ed085 FG |
1645 | `ToString` trait, which the standard library has implemented for any type that |
1646 | implements `Display`. | |
1647 | ||
1648 | Recall from the “Enum values” section of Chapter 6 that the name of each enum | |
1649 | variant that we define also becomes an initializer function. We can use these | |
1650 | initializer functions as function pointers that implement the closure traits, | |
1651 | which means we can specify the initializer functions as arguments for methods | |
1652 | that take closures, like so: | |
1653 | ||
1654 | ``` | |
1655 | enum Status { | |
1656 | Value(u32), | |
1657 | Stop, | |
1658 | } | |
1659 | ||
1660 | let list_of_statuses: Vec<Status> = (0u32..20).map(Status::Value).collect(); | |
1661 | ``` | |
1662 | ||
1663 | Here we create `Status::Value` instances using each `u32` value in the range | |
1664 | that `map` is called on by using the initializer function of `Status::Value`. | |
1665 | Some people prefer this style, and some people prefer to use closures. They | |
1666 | compile to the same code, so use whichever style is clearer to you. | |
1667 | ||
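The same technique applies to tuple struct constructors and to `Some` from the standard library. In this sketch, `Meters` is a hypothetical tuple struct used only for illustration.

```rust
#[derive(Debug, PartialEq)]
struct Meters(u32);

fn main() {
    // The name of a tuple struct is also a function from its field to the
    // struct, so it can be passed to `map` directly.
    let lengths: Vec<Meters> = (1u32..4).map(Meters).collect();
    println!("{:?}", lengths);

    // `Some` is the initializer function of the `Option::Some` variant.
    let wrapped: Vec<Option<u32>> = (1u32..4).map(Some).collect();
    println!("{:?}", wrapped);

    // An initializer can also be stored explicitly as a function pointer.
    let make: fn(u32) -> Meters = Meters;
    println!("{:?}", make(42));
}
```
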
1668 | ### Returning Closures | |
1669 | ||
1670 | Closures are represented by traits, which means you can’t return closures | |
1671 | directly. In most cases where you might want to return a trait, you can instead | |
1672 | use the concrete type that implements the trait as the return value of the | |
923072b8 | 1673 | function. However, you can’t do that with closures because they don’t have a |
5e7ed085 FG |
1674 | concrete type that you can name; the function pointer type `fn` works as a |
1675 | return type only for closures that capture nothing from their environment. | |
1676 | ||
1677 | The following code tries to return a closure directly, but it won’t compile: | |
1678 | ||
1679 | ``` | |
1680 | fn returns_closure() -> dyn Fn(i32) -> i32 { | |
1681 | |x| x + 1 | |
1682 | } | |
1683 | ``` | |
1684 | ||
1685 | The compiler error is as follows: | |
1686 | ||
1687 | ``` | |
1688 | error[E0746]: return type cannot have an unboxed trait object | |
1689 | --> src/lib.rs:1:25 | |
1690 | | | |
1691 | 1 | fn returns_closure() -> dyn Fn(i32) -> i32 { | |
1692 | | ^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time | |
1693 | | | |
1694 | = note: for information on `impl Trait`, see <https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits> | |
1695 | help: use `impl Fn(i32) -> i32` as the return type, as all return paths are of type `[closure@src/lib.rs:2:5: 2:14]`, which implements `Fn(i32) -> i32` | |
1696 | | | |
1697 | 1 | fn returns_closure() -> impl Fn(i32) -> i32 { | |
1698 | | ~~~~~~~~~~~~~~~~~~~ | |
1699 | ``` | |
1700 | ||
1701 | The error references the `Sized` trait again! Rust doesn’t know how much space | |
1702 | it will need to store the closure. We saw a solution to this problem earlier. | |
1703 | We can use a trait object: | |
1704 | ||
1705 | ``` | |
1706 | fn returns_closure() -> Box<dyn Fn(i32) -> i32> { | |
1707 | Box::new(|x| x + 1) | |
1708 | } | |
1709 | ``` | |
1710 | ||
1711 | This code will compile just fine. For more about trait objects, refer to the | |
1712 | section “Using Trait Objects That Allow for Values of Different Types” in | |
1713 | Chapter 17. | |
1714 | ||
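The compiler's help text above also points at `impl Trait` as an alternative. The sketch below contrasts the two options; `returns_adder` and `returns_op` are hypothetical names chosen for this illustration.

```rust
// `impl Trait` works when every return path yields the same closure type.
// The closure is returned by value, with no heap allocation.
fn returns_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

// Each closure has its own distinct type, so a function that picks between
// two closures needs a trait object to give them a common return type.
fn returns_op(double: bool) -> Box<dyn Fn(i32) -> i32> {
    if double {
        Box::new(|x: i32| x * 2)
    } else {
        Box::new(|x: i32| x + 1)
    }
}

fn main() {
    let add_three = returns_adder(3);
    println!("{}", add_three(4));
    println!("{}", returns_op(true)(5));
    println!("{}", returns_op(false)(5));
}
```
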
1715 | Next, let’s look at macros! | |
1716 | ||
1717 | ## Macros | |
1718 | ||
1719 | We’ve used macros like `println!` throughout this book, but we haven’t fully | |
1720 | explored what a macro is and how it works. The term *macro* refers to a family | |
1721 | of features in Rust: *declarative* macros with `macro_rules!` and three kinds | |
1722 | of *procedural* macros: | |
1723 | ||
1724 | * Custom `#[derive]` macros that specify code added with the `derive` attribute | |
1725 | used on structs and enums | |
1726 | * Attribute-like macros that define custom attributes usable on any item | |
1727 | * Function-like macros that look like function calls but operate on the tokens | |
1728 | specified as their argument | |
1729 | ||
1730 | We’ll talk about each of these in turn, but first, let’s look at why we even | |
1731 | need macros when we already have functions. | |
1732 | ||
1733 | ### The Difference Between Macros and Functions | |
1734 | ||
1735 | Fundamentally, macros are a way of writing code that writes other code, which | |
1736 | is known as *metaprogramming*. In Appendix C, we discuss the `derive` | |
1737 | attribute, which generates an implementation of various traits for you. We’ve | |
1738 | also used the `println!` and `vec!` macros throughout the book. All of these | |
1739 | macros *expand* to produce more code than the code you’ve written manually. | |
1740 | ||
1741 | Metaprogramming is useful for reducing the amount of code you have to write and | |
1742 | maintain, which is also one of the roles of functions. However, macros have | |
1743 | some additional powers that functions don’t. | |
1744 | ||
1745 | A function signature must declare the number and type of parameters the | |
1746 | function has. Macros, on the other hand, can take a variable number of | |
1747 | parameters: we can call `println!("hello")` with one argument or | |
1748 | `println!("hello {}", name)` with two arguments. Also, macros are expanded | |
1749 | before the compiler interprets the meaning of the code, so a macro can, for | |
1750 | example, implement a trait on a given type. A function can’t, because it gets | |
1751 | called at runtime and a trait needs to be implemented at compile time. | |
1752 | ||
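As a rough sketch of that last point, the macro below implements a trait for several types in one call. The `Describe` trait and the `impl_describe!` macro are hypothetical names made up for this example, and the `macro_rules!` syntax it uses is explained in the next section.

```rust
// A hypothetical trait with one associated function.
trait Describe {
    fn describe() -> String;
}

// Expands to one `impl Describe for ...` block per type passed in. A
// function could not do this, because trait implementations must exist at
// compile time.
macro_rules! impl_describe {
    ( $( $t:ty ),* ) => {
        $(
            impl Describe for $t {
                fn describe() -> String {
                    // `stringify!` turns the matched tokens into a string.
                    format!("type {}", stringify!($t))
                }
            }
        )*
    };
}

impl_describe!(u8, i32, String);

fn main() {
    println!("{}", <u8 as Describe>::describe());
    println!("{}", String::describe());
}
```
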
1753 | The downside to implementing a macro instead of a function is that macro | |
1754 | definitions are more complex than function definitions because you’re writing | |
1755 | Rust code that writes Rust code. Due to this indirection, macro definitions are | |
1756 | generally more difficult to read, understand, and maintain than function | |
1757 | definitions. | |
1758 | ||
1759 | Another important difference between macros and functions is that you must | |
1760 | define macros or bring them into scope *before* you call them in a file, as | |
1761 | opposed to functions you can define anywhere and call anywhere. | |
1762 | ||
1763 | ### Declarative Macros with `macro_rules!` for General Metaprogramming | |
1764 | ||
923072b8 FG |
1765 | The most widely used form of macros in Rust is the *declarative macro*. These |
1766 | are also sometimes referred to as “macros by example,” “`macro_rules!` macros,” | |
1767 | or just plain “macros.” At their core, declarative macros allow you to write | |
5e7ed085 FG |
1768 | something similar to a Rust `match` expression. As discussed in Chapter 6, |
1769 | `match` expressions are control structures that take an expression, compare the | |
1770 | resulting value of the expression to patterns, and then run the code associated | |
1771 | with the matching pattern. Macros also compare a value to patterns that are | |
1772 | associated with particular code: in this situation, the value is the literal | |
1773 | Rust source code passed to the macro; the patterns are compared with the | |
1774 | structure of that source code; and the code associated with each pattern, when | |
1775 | matched, replaces the code passed to the macro. This all happens during | |
1776 | compilation. | |
1777 | ||
1778 | To define a macro, you use the `macro_rules!` construct. Let’s explore how to | |
1779 | use `macro_rules!` by looking at how the `vec!` macro is defined. Chapter 8 | |
1780 | covered how we can use the `vec!` macro to create a new vector with particular | |
1781 | values. For example, the following macro creates a new vector containing three | |
1782 | integers: | |
1783 | ||
1784 | ``` | |
1785 | let v: Vec<u32> = vec![1, 2, 3]; | |
1786 | ``` | |
1787 | ||
1788 | We could also use the `vec!` macro to make a vector of two integers or a vector | |
1789 | of five string slices. We wouldn’t be able to use a function to do the same | |
1790 | because we wouldn’t know the number or type of values up front. | |
1791 | ||
1792 | Listing 19-28 shows a slightly simplified definition of the `vec!` macro. | |
1793 | ||
1794 | Filename: src/lib.rs | |
1795 | ||
1796 | ``` | |
1797 | [1] #[macro_export] | |
1798 | [2] macro_rules! vec { | |
1799 | [3] ( $( $x:expr ),* ) => { | |
1800 | { | |
1801 | let mut temp_vec = Vec::new(); | |
1802 | [4] $( | |
1803 | [5] temp_vec.push($x [6]); | |
1804 | )* | |
1805 | [7] temp_vec | |
1806 | } | |
1807 | }; | |
1808 | } | |
1809 | ``` | |
1810 | ||
1811 | Listing 19-28: A simplified version of the `vec!` macro definition | |
1812 | ||
1813 | > Note: The actual definition of the `vec!` macro in the standard library | |
1814 | > includes code to preallocate the correct amount of memory up front. That code | |
1815 | > is an optimization that we don’t include here to make the example simpler. | |
1816 | ||
1817 | The `#[macro_export]` annotation [1] indicates that this macro should be made | |
1818 | available whenever the crate in which the macro is defined is brought into | |
1819 | scope. Without this annotation, the macro can’t be brought into scope. | |
1820 | ||
1821 | We then start the macro definition with `macro_rules!` and the name of the | |
1822 | macro we’re defining *without* the exclamation mark [2]. The name, in this case | |
1823 | `vec`, is followed by curly brackets denoting the body of the macro definition. | |
1824 | ||
1825 | The structure in the `vec!` body is similar to the structure of a `match` | |
1826 | expression. Here we have one arm with the pattern `( $( $x:expr ),* )`, | |
1827 | followed by `=>` and the block of code associated with this pattern [3]. If the | |
1828 | pattern matches, the associated block of code will be emitted. Given that this | |
1829 | is the only pattern in this macro, there is only one valid way to match; any | |
1830 | other pattern will result in an error. More complex macros will have more than | |
1831 | one arm. | |
1832 | ||
1833 | Valid pattern syntax in macro definitions is different from the pattern syntax | |
1834 | covered in Chapter 18 because macro patterns are matched against Rust code | |
1835 | structure rather than values. Let’s walk through what the pattern pieces in | |
1836 | Listing 19-28 mean; for the full macro pattern syntax, see the Rust Reference | |
1837 | at *https://doc.rust-lang.org/reference/macros-by-example.html*. | |
1838 | ||
923072b8 FG |
1839 | First, we use a set of parentheses to encompass the whole pattern. We use a |
1840 | dollar sign (`$`) to declare a variable in the macro system that will contain | |
1841 | the Rust code matching the pattern. The dollar sign makes it clear this is a | |
1842 | macro variable as opposed to a regular Rust variable. | |
1843 | Next comes a set of parentheses that captures values that match the | |
5e7ed085 FG |
1844 | pattern within the parentheses for use in the replacement code. Within `$()` is |
1845 | `$x:expr`, which matches any Rust expression and gives the expression the name | |
1846 | `$x`. | |
1847 | ||
1848 | The comma following `$()` indicates that a literal comma separator character | |
1849 | must appear between successive pieces of code that match `$()`. The `*` | |
1850 | specifies that the pattern matches zero or more of whatever precedes the `*`. | |
1851 | ||
1852 | When we call this macro with `vec![1, 2, 3];`, the `$x` pattern matches three | |
1853 | times with the three expressions `1`, `2`, and `3`. | |
1854 | ||
1855 | Now let’s look at the pattern in the body of the code associated with this arm: | |
1856 | `temp_vec.push()` [5] within `$()*` [4][7] is generated for each part that | |
1857 | matches `$()` in the pattern zero or more times depending on how many times the | |
1858 | pattern matches. The `$x` [6] is replaced with each expression matched. When we | |
1859 | call this macro with `vec![1, 2, 3];`, the code generated that replaces this | |
1860 | macro call will be the following: | |
1861 | ||
1862 | ``` | |
1863 | { | |
1864 | let mut temp_vec = Vec::new(); | |
1865 | temp_vec.push(1); | |
1866 | temp_vec.push(2); | |
1867 | temp_vec.push(3); | |
1868 | temp_vec | |
1869 | } | |
1870 | ``` | |
1871 | ||
1872 | We’ve defined a macro that can take any number of arguments of any type and can | |
1873 | generate code to create a vector containing the specified elements. | |
1874 | ||
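Macros with more than one arm follow the same shape. The sketch below is our own variation, not the standard library's definition: `my_vec!` is a hypothetical name chosen to avoid shadowing the real `vec!`, and it adds a `[value; count]` arm plus an optional trailing comma.

```rust
macro_rules! my_vec {
    // Arm 1: `my_vec![value; count]` repeats one value `count` times.
    ( $elem:expr ; $n:expr ) => {
        {
            let mut temp_vec = Vec::new();
            temp_vec.resize($n, $elem);
            temp_vec
        }
    };
    // Arm 2: a comma-separated list. `$(,)?` additionally accepts one
    // optional trailing comma.
    ( $( $x:expr ),* $(,)? ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

fn main() {
    // Arms are tried in order; this call fails to match arm 1 at the first
    // comma and falls through to arm 2.
    let listed = my_vec![1, 2, 3,];
    let repeated = my_vec![0u8; 3];
    println!("{:?} {:?}", listed, repeated);
}
```
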
923072b8 FG |
1875 | To learn more about how to write macros, consult the online documentation or |
1876 | other resources, such as “The Little Book of Rust Macros” at | |
1877 | *https://veykril.github.io/tlborm/* started by Daniel Keep and continued by | |
1878 | Lukas Wirth. | |
5e7ed085 | 1879 | |
923072b8 FG |
1880 | <!-- Not sure what "In the future, Rust will have a second kind of declarative |
1881 | macro" means here. I suspect we're "stuck" with the two kinds of macros we | |
1882 | already have today, at least I don't see much energy in pushing to add a third | |
1883 | just yet. | |
1884 | /JT --> | |
1885 | <!-- Yeah, great catch, I think that part was back when we had more dreams that | |
1886 | have now been postponed/abandoned. I've removed. /Carol --> | |
5e7ed085 | 1887 | |
923072b8 | 1888 | ### Procedural Macros for Generating Code from Attributes |
5e7ed085 | 1889 | |
923072b8 FG |
1890 | The second form of macros is the *procedural macro*, which acts more like a |
1891 | function (and is a type of procedure). Procedural macros accept some code as an | |
1892 | input, operate on that code, and produce some code as an output rather than | |
1893 | matching against patterns and replacing the code with other code as declarative | |
1894 | macros do. The three kinds of procedural macros are custom derive, | |
1895 | attribute-like, and function-like, and all work in a similar fashion. | |
5e7ed085 FG |
1896 | |
1897 | When creating procedural macros, the definitions must reside in their own crate | |
1898 | with a special crate type. This is for complex technical reasons that we hope | |
923072b8 FG |
1899 | to eliminate in the future. In Listing 19-29, we show how to define a |
1900 | procedural macro, where `some_attribute` is a placeholder for using a specific | |
5e7ed085 FG |
1901 | macro variety. |
1902 | ||
1903 | Filename: src/lib.rs | |
1904 | ||
1905 | ``` | |
1906 | use proc_macro; | |
1907 | ||
1908 | #[some_attribute] | |
1909 | pub fn some_name(input: TokenStream) -> TokenStream { | |
1910 | } | |
1911 | ``` | |
1912 | ||
1913 | Listing 19-29: An example of defining a procedural macro | |
1914 | ||
1915 | The function that defines a procedural macro takes a `TokenStream` as an input | |
1916 | and produces a `TokenStream` as an output. The `TokenStream` type is defined by | |
1917 | the `proc_macro` crate that is included with Rust and represents a sequence of | |
1918 | tokens. This is the core of the macro: the source code that the macro is | |
1919 | operating on makes up the input `TokenStream`, and the code the macro produces | |
1920 | is the output `TokenStream`. The function also has an attribute attached to it | |
1921 | that specifies which kind of procedural macro we’re creating. We can have | |
1922 | multiple kinds of procedural macros in the same crate. | |
1923 | ||
1924 | Let’s look at the different kinds of procedural macros. We’ll start with a | |
1925 | custom derive macro and then explain the small dissimilarities that make the | |
1926 | other forms different. | |
1927 | ||
1928 | ### How to Write a Custom `derive` Macro | |
1929 | ||
1930 | Let’s create a crate named `hello_macro` that defines a trait named | |
1931 | `HelloMacro` with one associated function named `hello_macro`. Rather than | |
923072b8 FG |
1932 | making our users implement the `HelloMacro` trait for each of their types, |
1933 | we’ll provide a procedural macro so users can annotate their type with | |
5e7ed085 FG |
1934 | `#[derive(HelloMacro)]` to get a default implementation of the `hello_macro` |
1935 | function. The default implementation will print `Hello, Macro! My name is | |
1936 | TypeName!` where `TypeName` is the name of the type on which this trait has | |
1937 | been defined. In other words, we’ll write a crate that enables another | |
1938 | programmer to write code like Listing 19-30 using our crate. | |
1939 | ||
1940 | Filename: src/main.rs | |
1941 | ||
1942 | ``` | |
1943 | use hello_macro::HelloMacro; | |
1944 | use hello_macro_derive::HelloMacro; | |
1945 | ||
1946 | #[derive(HelloMacro)] | |
1947 | struct Pancakes; | |
1948 | ||
1949 | fn main() { | |
1950 |     Pancakes::hello_macro(); | |
1951 | } | |
1952 | ``` | |
1953 | ||
1954 | Listing 19-30: The code a user of our crate will be able to write when using | |
1955 | our procedural macro | |
1956 | ||
1957 | This code will print `Hello, Macro! My name is Pancakes!` when we’re done. The | |
1958 | first step is to make a new library crate, like this: | |
1959 | ||
1960 | ``` | |
1961 | $ cargo new hello_macro --lib | |
1962 | ``` | |
1963 | ||
1964 | Next, we’ll define the `HelloMacro` trait and its associated function: | |
1965 | ||
1966 | Filename: src/lib.rs | |
1967 | ||
1968 | ``` | |
1969 | pub trait HelloMacro { | |
1970 |     fn hello_macro(); | |
1971 | } | |
1972 | ``` | |
1973 | ||
1974 | We have a trait and its function. At this point, our crate user could implement | |
1975 | the trait to achieve the desired functionality, like so: | |
1976 | ||
1977 | ``` | |
1978 | use hello_macro::HelloMacro; | |
1979 | ||
1980 | struct Pancakes; | |
1981 | ||
1982 | impl HelloMacro for Pancakes { | |
1983 |     fn hello_macro() { | |
1984 |         println!("Hello, Macro! My name is Pancakes!"); | |
1985 |     } | |
1986 | } | |
1987 | ||
1988 | fn main() { | |
1989 |     Pancakes::hello_macro(); | |
1990 | } | |
1991 | ``` | |
1992 | ||
1993 | However, they would need to write the implementation block for each type they | |
1994 | wanted to use with `hello_macro`; we want to spare them from having to do this | |
1995 | work. | |
1996 | ||
1997 | Additionally, we can’t yet provide the `hello_macro` function with a default | |
1998 | implementation that will print the name of the type the trait is implemented | |
1999 | on: Rust doesn’t have reflection capabilities, so it can’t look up the type’s | |
2000 | name at runtime. We need a macro to generate code at compile time. | |
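
For comparison, here is a minimal sketch of generating the implementation at compile time with a declarative macro instead. The `impl_hello_macro!` macro and the `String` return type are assumptions made for illustration (the chapter's trait prints directly and returns nothing); returning the greeting just makes the result easy to check.

```rust
// Assumption: a variant of the chapter's trait that returns the greeting
// instead of printing it.
pub trait HelloMacro {
    fn hello_macro() -> String;
}

// Hypothetical helper macro, not part of the chapter's crates. It writes one
// `impl` block per listed type, and `stringify!` supplies the type name at
// compile time, so no runtime reflection is needed.
macro_rules! impl_hello_macro {
    ($($name:ident),+) => {
        $(
            impl HelloMacro for $name {
                fn hello_macro() -> String {
                    format!("Hello, Macro! My name is {}!", stringify!($name))
                }
            }
        )+
    };
}

struct Pancakes;
struct Waffles;

impl_hello_macro!(Pancakes, Waffles);

fn main() {
    println!("{}", Pancakes::hello_macro());
    println!("{}", Waffles::hello_macro());
}
```

This spares users from writing each `impl` block by hand, but they still have to invoke the macro separately from the type definition. A custom derive lets them attach the behavior directly with `#[derive(HelloMacro)]`.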
2001 | ||
2002 | The next step is to define the procedural macro. At the time of this writing, | |
2003 | procedural macros need to be in their own crate. Eventually, this restriction | |
2004 | might be lifted. The convention for structuring crates and macro crates is as | |
2005 | follows: for a crate named `foo`, a custom derive procedural macro crate is | |
2006 | called `foo_derive`. Let’s start a new crate called `hello_macro_derive` inside | |
2007 | our `hello_macro` project: | |
2008 | ||
2009 | ``` | |
2010 | $ cargo new hello_macro_derive --lib | |
2011 | ``` | |
2012 | ||
2013 | Our two crates are tightly related, so we create the procedural macro crate | |
2014 | within the directory of our `hello_macro` crate. If we change the trait | |
2015 | definition in `hello_macro`, we’ll have to change the implementation of the | |
2016 | procedural macro in `hello_macro_derive` as well. The two crates will need to | |
2017 | be published separately, and programmers using these crates will need to add | |
2018 | both as dependencies and bring them both into scope. We could instead have the | |
2019 | `hello_macro` crate use `hello_macro_derive` as a dependency and re-export the | |
2020 | procedural macro code. However, the way we’ve structured the project makes it | |
2021 | possible for programmers to use `hello_macro` even if they don’t want the | |
2022 | `derive` functionality. | |
2023 | ||
2024 | We need to declare the `hello_macro_derive` crate as a procedural macro crate. | |
2025 | We’ll also need functionality from the `syn` and `quote` crates, as you’ll see | |
2026 | in a moment, so we need to add them as dependencies. Add the following to the | |
2027 | *Cargo.toml* file for `hello_macro_derive`: | |
2028 | ||
2029 | Filename: hello_macro_derive/Cargo.toml | |
2030 | ||
2031 | ``` | |
2032 | [lib] | |
2033 | proc-macro = true | |
2034 | ||
2035 | [dependencies] | |
2036 | syn = "1.0" | |
2037 | quote = "1.0" | |
2038 | ``` | |
2039 | ||
2040 | To start defining the procedural macro, place the code in Listing 19-31 into | |
2041 | your *src/lib.rs* file for the `hello_macro_derive` crate. Note that this code | |
2042 | won’t compile until we add a definition for the `impl_hello_macro` function. | |
2043 | ||
2044 | Filename: hello_macro_derive/src/lib.rs | |
2045 | ||
2046 | ``` | |
2047 | use proc_macro::TokenStream; | |
2048 | use quote::quote; | |
2049 | use syn; | |
2050 | ||
2051 | #[proc_macro_derive(HelloMacro)] | |
2052 | pub fn hello_macro_derive(input: TokenStream) -> TokenStream { | |
2053 |     // Construct a representation of Rust code as a syntax tree | |
2054 |     // that we can manipulate | |
2055 |     let ast = syn::parse(input).unwrap(); | |
2056 | ||
2057 |     // Build the trait implementation | |
2058 |     impl_hello_macro(&ast) | |
2059 | } | |
2060 | ``` | |
2061 | ||
2062 | Listing 19-31: Code that most procedural macro crates will require in order to | |
2063 | process Rust code | |
2064 | ||
2065 | Notice that we’ve split the code into the `hello_macro_derive` function, which | |
2066 | is responsible for parsing the `TokenStream`, and the `impl_hello_macro` | |
2067 | function, which is responsible for transforming the syntax tree: this makes | |
2068 | writing a procedural macro more convenient. The code in the outer function | |
2069 | (`hello_macro_derive` in this case) will be the same for almost every | |
2070 | procedural macro crate you see or create. The code you specify in the body of | |
2071 | the inner function (`impl_hello_macro` in this case) will be different | |
2072 | depending on your procedural macro’s purpose. | |
2073 | ||
2074 | We’ve introduced three new crates: `proc_macro`, `syn` (available from | |
2075 | *https://crates.io/crates/syn*), and `quote` (available from | |
2076 | *https://crates.io/crates/quote*). The `proc_macro` crate comes with Rust, so | |
2077 | we didn’t need to add that to the dependencies in *Cargo.toml*. The | |
2078 | `proc_macro` crate is the compiler’s API that allows us to read and manipulate | |
2079 | Rust code from our code. | |
2080 | ||
2081 | The `syn` crate parses Rust code from a string into a data structure that we | |
2082 | can perform operations on. The `quote` crate turns `syn` data structures back | |
2083 | into Rust code. These crates make it much simpler to parse any sort of Rust | |
2084 | code we might want to handle: writing a full parser for Rust code is no simple | |
2085 | task. | |
2086 | ||
2087 | The `hello_macro_derive` function will be called when a user of our library | |
2088 | specifies `#[derive(HelloMacro)]` on a type. This is possible because we’ve | |
2089 | annotated the `hello_macro_derive` function here with `proc_macro_derive` and | |
923072b8 | 2090 | specified the name `HelloMacro`, which matches our trait name; this is the |
5e7ed085 FG |
2091 | convention most procedural macros follow. |
2092 | ||
2093 | The `hello_macro_derive` function first converts the `input` from a | |
2094 | `TokenStream` to a data structure that we can then interpret and perform | |
2095 | operations on. This is where `syn` comes into play. The `parse` function in | |
2096 | `syn` takes a `TokenStream` and returns a `DeriveInput` struct representing the | |
2097 | parsed Rust code. Listing 19-32 shows the relevant parts of the `DeriveInput` | |
2098 | struct we get from parsing the `struct Pancakes;` string: | |
2099 | ||
2100 | ``` | |
2101 | DeriveInput { | |
2102 |     // --snip-- | |
2103 | ||
2104 |     ident: Ident { | |
2105 |         ident: "Pancakes", | |
2106 |         span: #0 bytes(95..103) | |
2107 |     }, | |
2108 |     data: Struct( | |
2109 |         DataStruct { | |
2110 |             struct_token: Struct, | |
2111 |             fields: Unit, | |
2112 |             semi_token: Some( | |
2113 |                 Semi | |
2114 |             ) | |
2115 |         } | |
2116 |     ) | |
2117 | } | |
2118 | ``` | |
2119 | ||
2120 | Listing 19-32: The `DeriveInput` instance we get when parsing the code that has | |
2121 | the macro’s attribute in Listing 19-30 | |
2122 | ||
2123 | The fields of this struct show that the Rust code we’ve parsed is a unit struct | |
2124 | with the `ident` (identifier, meaning the name) of `Pancakes`. There are more | |
2125 | fields on this struct for describing all sorts of Rust code; check the `syn` | |
2126 | documentation for `DeriveInput` at | |
2127 | *https://docs.rs/syn/1.0/syn/struct.DeriveInput.html* for more information. | |
2128 | ||
2129 | Soon we’ll define the `impl_hello_macro` function, which is where we’ll build | |
2130 | the new Rust code we want to include. But before we do, note that the output | |
2131 | for our derive macro is also a `TokenStream`. The returned `TokenStream` is | |
2132 | added to the code that our crate users write, so when they compile their crate, | |
2133 | they’ll get the extra functionality that we provide in the modified | |
2134 | `TokenStream`. | |
2135 | ||
2136 | You might have noticed that we’re calling `unwrap` to cause the | |
2137 | `hello_macro_derive` function to panic if the call to the `syn::parse` function | |
2138 | fails here. It’s necessary for our procedural macro to panic on errors because | |
2139 | `proc_macro_derive` functions must return `TokenStream` rather than `Result` to | |
2140 | conform to the procedural macro API. We’ve simplified this example by using | |
2141 | `unwrap`; in production code, you should provide more specific error messages | |
2142 | about what went wrong by using `panic!` or `expect`. | |
2143 | ||
2144 | Now that we have the code to turn the annotated Rust code from a `TokenStream` | |
2145 | into a `DeriveInput` instance, let’s generate the code that implements the | |
2146 | `HelloMacro` trait on the annotated type, as shown in Listing 19-33. | |
2147 | ||
2148 | Filename: hello_macro_derive/src/lib.rs | |
2149 | ||
2150 | ``` | |
2151 | fn impl_hello_macro(ast: &syn::DeriveInput) -> TokenStream { | |
2152 |     let name = &ast.ident; | |
2153 |     let generated = quote! { | |
2154 |         impl HelloMacro for #name { | |
2155 |             fn hello_macro() { | |
2156 |                 println!("Hello, Macro! My name is {}!", stringify!(#name)); | |
2157 |             } | |
2158 |         } | |
2159 |     }; | |
2160 |     generated.into() | |
2161 | } | |
2162 | ``` | |
2163 | ||
2164 | Listing 19-33: Implementing the `HelloMacro` trait using the parsed Rust code | |
2165 | ||
2166 | We get an `Ident` struct instance containing the name (identifier) of the | |
2167 | annotated type using `ast.ident`. The struct in Listing 19-32 shows that when | |
2168 | we run the `impl_hello_macro` function on the code in Listing 19-30, the | |
2169 | `ident` we get will have the `ident` field with a value of `"Pancakes"`. Thus, | |
2170 | the `name` variable in Listing 19-33 will contain an `Ident` struct instance | |
2171 | that, when printed, will be the string `"Pancakes"`, the name of the struct in | |
2172 | Listing 19-30. | |
2173 | ||
2174 | The `quote!` macro lets us define the Rust code that we want to return. The | |
2175 | compiler expects something different from the direct result of the `quote!` | |
2176 | macro’s execution, so we need to convert it to a `TokenStream`. We do this by | |
2177 | calling the `into` method, which consumes this intermediate representation and | |
2178 | returns a value of the required `TokenStream` type. | |
2179 | ||
2180 | The `quote!` macro also provides some very cool templating mechanics: we can | |
2181 | enter `#name`, and `quote!` will replace it with the value in the variable | |
2182 | `name`. You can even do some repetition similar to the way regular macros work. | |
2183 | Check out the `quote` crate’s docs at *https://docs.rs/quote* for a thorough | |
2184 | introduction. | |
2185 | ||
2186 | We want our procedural macro to generate an implementation of our `HelloMacro` | |
2187 | trait for the type the user annotated, which we can get by using `#name`. The | |
923072b8 | 2188 | trait implementation has the one function `hello_macro`, whose body contains the |
5e7ed085 FG |
2189 | functionality we want to provide: printing `Hello, Macro! My name is` and then |
2190 | the name of the annotated type. | |
2191 | ||
2192 | The `stringify!` macro used here is built into Rust. It takes a Rust | |
2193 | expression, such as `1 + 2`, and at compile time turns the expression into a | |
2194 | string literal, such as `"1 + 2"`. This is different from `format!` or | |
2195 | `println!`, macros that evaluate the expression and then turn the result into | |
2196 | a `String`. There is a possibility that the `#name` input might be an | |
2197 | expression to print literally, so we use `stringify!`. Using `stringify!` also | |
2198 | saves an allocation by converting `#name` to a string literal at compile time. | |
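
The difference is easy to see side by side. In this small sketch, the function names `literal_text` and `evaluated_text` are ours, chosen only for illustration:

```rust
// `stringify!` never evaluates its argument: the tokens become a string
// literal at compile time, so no allocation happens at runtime.
fn literal_text() -> &'static str {
    stringify!(1 + 2)
}

// `format!` evaluates the expression first and then allocates a `String`
// holding the result.
fn evaluated_text() -> String {
    format!("{}", 1 + 2)
}

fn main() {
    println!("stringify!: {}", literal_text());
    println!("format!:    {}", evaluated_text());
}
```

The first function yields the text `1 + 2`, and the second yields `3`.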
2199 | ||
2200 | At this point, `cargo build` should complete successfully in both `hello_macro` | |
2201 | and `hello_macro_derive`. Let’s hook up these crates to the code in Listing | |
2202 | 19-30 to see the procedural macro in action! Create a new binary project in | |
2203 | your *projects* directory using `cargo new pancakes`. We need to add | |
2204 | `hello_macro` and `hello_macro_derive` as dependencies in the `pancakes` | |
2205 | crate’s *Cargo.toml*. If you’re publishing your versions of `hello_macro` and | |
2206 | `hello_macro_derive` to *https://crates.io/*, they would be regular | |
2207 | dependencies; if not, you can specify them as `path` dependencies as follows: | |
2208 | ||
2209 | ``` | |
2210 | [dependencies] | |
2211 | hello_macro = { path = "../hello_macro" } | |
2212 | hello_macro_derive = { path = "../hello_macro/hello_macro_derive" } | |
2213 | ``` | |
2214 | ||
2215 | Put the code in Listing 19-30 into *src/main.rs*, and run `cargo run`: it | |
2216 | should print `Hello, Macro! My name is Pancakes!` The implementation of the | |
2217 | `HelloMacro` trait from the procedural macro was included without the | |
2218 | `pancakes` crate needing to implement it; the `#[derive(HelloMacro)]` added the | |
2219 | trait implementation. | |
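
To make the result concrete, the following hand-written program is roughly equivalent to what the compiler sees once the derive has run, assuming the macro substitutes `Pancakes` for `#name` as described above. It is a sketch of the expansion, not actual compiler output.

```rust
pub trait HelloMacro {
    fn hello_macro();
}

struct Pancakes;

// Approximately the code the derive macro adds for `Pancakes`:
// every `#name` in the `quote!` template has become `Pancakes`.
impl HelloMacro for Pancakes {
    fn hello_macro() {
        println!("Hello, Macro! My name is {}!", stringify!(Pancakes));
    }
}

fn main() {
    // Prints: Hello, Macro! My name is Pancakes!
    Pancakes::hello_macro();
}
```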
2220 | ||
2221 | Next, let’s explore how the other kinds of procedural macros differ from custom | |
2222 | derive macros. | |
2223 | ||
2224 | ### Attribute-like Macros | |
2225 | ||
2226 | Attribute-like macros are similar to custom derive macros, but instead of | |
2227 | generating code for the `derive` attribute, they allow you to create new | |
2228 | attributes. They’re also more flexible: `derive` only works for structs and | |
2229 | enums; attributes can be applied to other items as well, such as functions. | |
2230 | Here’s an example of using an attribute-like macro: say you have an attribute | |
2231 | named `route` that annotates functions when using a web application framework: | |
2232 | ||
2233 | ``` | |
2234 | #[route(GET, "/")] | |
2235 | fn index() { | |
2236 | ``` | |
2237 | ||
2238 | This `#[route]` attribute would be defined by the framework as a procedural | |
2239 | macro. The signature of the macro definition function would look like this: | |
2240 | ||
2241 | ``` | |
2242 | #[proc_macro_attribute] | |
2243 | pub fn route(attr: TokenStream, item: TokenStream) -> TokenStream { | |
2244 | ``` | |
2245 | ||
2246 | Here, we have two parameters of type `TokenStream`. The first is for the | |
2247 | contents of the attribute: the `GET, "/"` part. The second is the body of the | |
2248 | item the attribute is attached to: in this case, `fn index() {}` and the rest | |
2249 | of the function’s body. | |
2250 | ||
2251 | Other than that, attribute-like macros work the same way as custom derive | |
2252 | macros: you create a crate with the `proc-macro` crate type and implement a | |
2253 | function that generates the code you want! | |
2254 | ||
2255 | ### Function-like Macros | |
2256 | ||
2257 | Function-like macros define macros that look like function calls. Similarly to | |
2258 | `macro_rules!` macros, they’re more flexible than functions; for example, they | |
2259 | can take an unknown number of arguments. However, `macro_rules!` macros can be | |
2260 | defined only using the match-like syntax we discussed in the section | |
2261 | “Declarative Macros with `macro_rules!` for General Metaprogramming” earlier. | |
2262 | Function-like macros take a `TokenStream` parameter and their definition | |
2263 | manipulates that `TokenStream` using Rust code as the other two types of | |
2264 | procedural macros do. An example of a function-like macro is an `sql!` macro | |
2265 | that might be called like so: | |
2266 | ||
2267 | ``` | |
2268 | let sql = sql!(SELECT * FROM posts WHERE id=1); | |
2269 | ``` | |
2270 | ||
2271 | This macro would parse the SQL statement inside it and check that it’s | |
2272 | syntactically correct, which is much more complex processing than a | |
2273 | `macro_rules!` macro can do. The `sql!` macro would be defined like this: | |
2274 | ||
2275 | ``` | |
2276 | #[proc_macro] | |
2277 | pub fn sql(input: TokenStream) -> TokenStream { | |
2278 | ``` | |
2279 | ||
2280 | This definition is similar to the custom derive macro’s signature: we receive | |
2281 | the tokens that are inside the parentheses and return the code we wanted to | |
2282 | generate. | |
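
A function-like procedural macro can't be defined and called within a single file, so as a rough illustration of the "unknown number of arguments" point, here is a declarative macro with the same call syntax. The `sum_all!` macro is hypothetical and exists only for this sketch.

```rust
// Hypothetical macro, for illustration only. Unlike a function, it accepts
// any number of comma-separated expressions.
macro_rules! sum_all {
    // No arguments: the sum is zero.
    () => { 0 };
    // One or more arguments: chain them together with `+`.
    ($first:expr $(, $rest:expr)*) => {
        $first $(+ $rest)*
    };
}

fn main() {
    println!("{}", sum_all!());
    println!("{}", sum_all!(1));
    println!("{}", sum_all!(1, 2, 3));
}
```

A function-like procedural macro is invoked the same way, but receives its arguments as a `TokenStream` and can run arbitrary Rust code over them instead of being limited to pattern matching.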
2283 | ||
923072b8 FG |
2284 | <!-- I may get a few looks for this, but I wonder if we should trim the |
2285 | procedural macros section above a bit. There's a lot of information in there, | |
2286 | but it feels like something we could intro and then point people off to other | |
2287 | materials for. Reason being (and I know I may be in the minority here), | |
2288 | procedural macros are something we should use only rarely in our Rust projects. | |
2289 | They are a burden on the compiler, have the potential to hurt readability and | |
2290 | maintainability, and... you know the saying with great power comes great | |
2291 | responsibility and all that. /JT --> | |
2292 | <!-- I think we felt obligated to have this section when procedural macros were | |
2293 | introduced because there wasn't any documentation for them. I feel like the | |
2294 | custom derive is the most common kind people want to make... While I'd love to | |
2295 | not have to maintain this section, I asked around and people seemed generally | |
2296 | in favor of keeping it, so I think I will, for now. /Carol --> | |
2297 | ||
5e7ed085 FG |
2298 | ## Summary |
2299 | ||
923072b8 FG |
2300 | Whew! Now you have some Rust features in your toolbox that you likely won’t use |
2301 | often, but you’ll know they’re available in very particular circumstances. | |
2302 | We’ve introduced several complex topics so that when you encounter them in | |
2303 | error message suggestions or in other people’s code, you’ll be able to | |
2304 | recognize these concepts and syntax. Use this chapter as a reference to guide | |
2305 | you to solutions. | |
5e7ed085 FG |
2306 | |
2307 | Next, we’ll put everything we’ve discussed throughout the book into practice | |
2308 | and do one more project! |