<!-- DO NOT EDIT THIS FILE.

This file is periodically generated from the content in the `/src/`
directory, so all fixes need to be made in `/src/`.
-->

[TOC]

# Advanced Features

By now, you’ve learned the most commonly used parts of the Rust programming
language. Before we do one more project in Chapter 20, we’ll look at a few
aspects of the language you might run into every once in a while, but may not
use every day. You can use this chapter as a reference for when you encounter
any unknowns. The features covered here are useful in very specific situations.
Although you might not reach for them often, we want to make sure you have a
grasp of all the features Rust has to offer.

In this chapter, we’ll cover:

* Unsafe Rust: how to opt out of some of Rust’s guarantees and take
  responsibility for manually upholding those guarantees
* Advanced traits: associated types, default type parameters, fully qualified
  syntax, supertraits, and the newtype pattern in relation to traits
* Advanced types: more about the newtype pattern, type aliases, the never type,
  and dynamically sized types
* Advanced functions and closures: function pointers and returning closures
* Macros: ways to define code that defines more code at compile time

It’s a panoply of Rust features with something for everyone! Let’s dive in!

## Unsafe Rust

All the code we’ve discussed so far has had Rust’s memory safety guarantees
enforced at compile time. However, Rust has a second language hidden inside it
that doesn’t enforce these memory safety guarantees: it’s called *unsafe Rust*
and works just like regular Rust, but gives us extra superpowers.

Unsafe Rust exists because, by nature, static analysis is conservative. When
the compiler tries to determine whether or not code upholds the guarantees,
it’s better for it to reject some valid programs than to accept some invalid
programs. Although the code *might* be okay, if the Rust compiler doesn’t have
enough information to be confident, it will reject the code. In these cases,
you can use unsafe code to tell the compiler, “Trust me, I know what I’m
doing.” Be warned, however, that you use unsafe Rust at your own risk: if you
use unsafe code incorrectly, problems can occur due to memory unsafety, such as
null pointer dereferencing.

Another reason Rust has an unsafe alter ego is that the underlying computer
hardware is inherently unsafe. If Rust didn’t let you do unsafe operations, you
couldn’t do certain tasks. Rust needs to allow you to do low-level systems
programming, such as directly interacting with the operating system or even
writing your own operating system. Working with low-level systems programming
is one of the goals of the language. Let’s explore what we can do with unsafe
Rust and how to do it.

### Unsafe Superpowers

To switch to unsafe Rust, use the `unsafe` keyword and then start a new block
that holds the unsafe code. You can take five actions in unsafe Rust that you
can’t in safe Rust, which we call *unsafe superpowers*. Those superpowers
include the ability to:

* Dereference a raw pointer
* Call an unsafe function or method
* Access or modify a mutable static variable
* Implement an unsafe trait
* Access fields of `union`s

It’s important to understand that `unsafe` doesn’t turn off the borrow checker
or disable any other of Rust’s safety checks: if you use a reference in unsafe
code, it will still be checked. The `unsafe` keyword only gives you access to
these five features that are then not checked by the compiler for memory
safety. You’ll still get some degree of safety inside of an unsafe block.

In addition, `unsafe` does not mean the code inside the block is necessarily
dangerous or that it will definitely have memory safety problems: the intent is
that as the programmer, you’ll ensure the code inside an `unsafe` block will
access memory in a valid way.

People are fallible, and mistakes will happen, but by requiring these five
unsafe operations to be inside blocks annotated with `unsafe`, you’ll know that
any errors related to memory safety must be within an `unsafe` block. Keep
`unsafe` blocks small; you’ll be thankful later when you investigate memory
bugs.

To isolate unsafe code as much as possible, it’s best to enclose unsafe code
within a safe abstraction and provide a safe API, which we’ll discuss later in
the chapter when we examine unsafe functions and methods. Parts of the standard
library are implemented as safe abstractions over unsafe code that has been
audited. Wrapping unsafe code in a safe abstraction prevents uses of `unsafe`
from leaking out into all the places that you or your users might want to use
the functionality implemented with `unsafe` code, because using a safe
abstraction is safe.

Let’s look at each of the five unsafe superpowers in turn. We’ll also look at
some abstractions that provide a safe interface to unsafe code.

### Dereferencing a Raw Pointer

In Chapter 4, in the “Dangling References” section, we mentioned that the
compiler ensures references are always valid. Unsafe Rust has two new types
called *raw pointers* that are similar to references. As with references, raw
pointers can be immutable or mutable and are written as `*const T` and `*mut
T`, respectively. The asterisk isn’t the dereference operator; it’s part of the
type name. In the context of raw pointers, *immutable* means that the pointer
can’t be directly assigned to after being dereferenced.

Different from references and smart pointers, raw pointers:

* Are allowed to ignore the borrowing rules by having both immutable and
  mutable pointers or multiple mutable pointers to the same location
* Aren’t guaranteed to point to valid memory
* Are allowed to be null
* Don’t implement any automatic cleanup

By opting out of having Rust enforce these guarantees, you can give up
guaranteed safety in exchange for greater performance or the ability to
interface with another language or hardware where Rust’s guarantees don’t apply.

Listing 19-1 shows how to create an immutable and a mutable raw pointer from
references.

```
let mut num = 5;

let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;
```

Listing 19-1: Creating raw pointers from references

Notice that we don’t include the `unsafe` keyword in this code. We can create
raw pointers in safe code; we just can’t dereference raw pointers outside an
unsafe block, as you’ll see in a bit.

We’ve created raw pointers by using `as` to cast an immutable and a mutable
reference into their corresponding raw pointer types. Because we created them
directly from references guaranteed to be valid, we know these particular raw
pointers are valid, but we can’t make that assumption about just any raw
pointer.

To demonstrate this, next we’ll create a raw pointer whose validity we can’t be
so certain of. Listing 19-2 shows how to create a raw pointer to an arbitrary
location in memory. Trying to use arbitrary memory is undefined: there might be
data at that address or there might not, the compiler might optimize the code
so there is no memory access, or the program might error with a segmentation
fault. Usually, there is no good reason to write code like this, but it is
possible.

```
let address = 0x012345usize;
let r = address as *const i32;
```

Listing 19-2: Creating a raw pointer to an arbitrary memory address

Recall that we can create raw pointers in safe code, but we can’t *dereference*
raw pointers and read the data being pointed to. In Listing 19-3, we use the
dereference operator `*` on a raw pointer, which requires an `unsafe` block.

```
let mut num = 5;

let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;

unsafe {
    println!("r1 is: {}", *r1);
    println!("r2 is: {}", *r2);
}
```

Listing 19-3: Dereferencing raw pointers within an `unsafe` block

Creating a pointer does no harm; it’s only when we try to access the value that
it points at that we might end up dealing with an invalid value.

Note also that in Listings 19-1 and 19-3, we created `*const i32` and `*mut i32`
raw pointers that both pointed to the same memory location, where `num` is
stored. If we instead tried to create an immutable and a mutable reference to
`num`, the code would not have compiled because Rust’s ownership rules don’t
allow a mutable reference at the same time as any immutable references. With
raw pointers, we can create a mutable pointer and an immutable pointer to the
same location and change data through the mutable pointer, potentially creating
a data race. Be careful!

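As a minimal sketch of the aliasing just described, the hypothetical
`write_then_read` function below writes through a `*mut i32` and reads the same
location back through a `*const i32`. Unlike Listing 19-3, it creates the
mutable pointer with the standard `std::ptr::addr_of_mut!` macro and derives the
immutable pointer from it by casting. That choice is an assumption on our part,
made so that no intermediate reference is created between the two pointers; it
is one way to write this, not the only one.

```rust
use std::ptr;

// Hypothetical helper for illustration; it is not one of the chapter's listings.
fn write_then_read(value: i32) -> i32 {
    let mut num = 5;

    // Create a mutable raw pointer without going through a reference, then
    // derive an immutable raw pointer to the same location from it.
    let r2: *mut i32 = ptr::addr_of_mut!(num);
    let r1: *const i32 = r2 as *const i32;

    unsafe {
        // Both pointers alias `num`; the borrow checker doesn't track them,
        // so keeping these accesses valid is our responsibility.
        *r2 = value;
        *r1
    }
}

fn main() {
    println!("read back: {}", write_then_read(42));
}
```
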
With all of these dangers, why would you ever use raw pointers? One major use
case is when interfacing with C code, as you’ll see in the next section,
“Calling an Unsafe Function or Method.” Another case is when building up safe
abstractions that the borrow checker doesn’t understand. We’ll introduce unsafe
functions and then look at an example of a safe abstraction that uses unsafe
code.

### Calling an Unsafe Function or Method

The second type of operation you can perform in an unsafe block is calling
unsafe functions. Unsafe functions and methods look exactly like regular
functions and methods, but they have an extra `unsafe` before the rest of the
definition. The `unsafe` keyword in this context indicates the function has
requirements we need to uphold when we call this function, because Rust can’t
guarantee we’ve met these requirements. By calling an unsafe function within an
`unsafe` block, we’re saying that we’ve read this function’s documentation and
take responsibility for upholding the function’s contracts.

Here is an unsafe function named `dangerous` that doesn’t do anything in its
body:

```
unsafe fn dangerous() {}

unsafe {
    dangerous();
}
```

We must call the `dangerous` function within a separate `unsafe` block. If we
try to call `dangerous` without the `unsafe` block, we’ll get an error:

```
error[E0133]: call to unsafe function is unsafe and requires unsafe function or block
 --> src/main.rs:4:5
  |
4 |     dangerous();
  |     ^^^^^^^^^^^ call to unsafe function
  |
  = note: consult the function's documentation for information on how to avoid undefined behavior
```

With the `unsafe` block, we’re asserting to Rust that we’ve read the function’s
documentation, we understand how to use it properly, and we’ve verified that
we’re fulfilling the contract of the function.

Bodies of unsafe functions are effectively `unsafe` blocks, so to perform other
unsafe operations within an unsafe function, we don’t need to add another
`unsafe` block.

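To make the idea of a contract more concrete, here is a small sketch of an
unsafe function that actually has one. The name `read_or_default` and its
`# Safety` wording are our own, chosen for illustration; the point is only that
the requirements the caller must uphold are written down, and that the call
site states why they hold. The inner `unsafe` block is optional here and is
kept just to mark the exact operation that needs care.

```rust
use std::ptr;

/// Reads the `i32` behind `p`, or returns `default` if `p` is null.
///
/// # Safety
///
/// `p` must be either null or a valid, aligned pointer to an initialized
/// `i32` that is not being written to during this call.
unsafe fn read_or_default(p: *const i32, default: i32) -> i32 {
    if p.is_null() {
        default
    } else {
        // A non-null pointer is not automatically valid; we rely on the
        // caller having upheld the contract documented above.
        unsafe { *p }
    }
}

fn main() {
    let num = 7;
    let valid = &num as *const i32;
    let null: *const i32 = ptr::null();

    // SAFETY: `valid` comes from a live reference, and `null` is null.
    let (a, b) = unsafe { (read_or_default(valid, 0), read_or_default(null, 0)) };
    println!("a = {}, b = {}", a, b);
}
```
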
#### Creating a Safe Abstraction over Unsafe Code

Just because a function contains unsafe code doesn’t mean we need to mark the
entire function as unsafe. In fact, wrapping unsafe code in a safe function is
a common abstraction. As an example, let’s study the `split_at_mut` function
from the standard library, which requires some unsafe code. We’ll explore how
we might implement it. This safe method is defined on mutable slices: it takes
one slice and makes it two by splitting the slice at the index given as an
argument. Listing 19-4 shows how to use `split_at_mut`.

```
let mut v = vec![1, 2, 3, 4, 5, 6];

let r = &mut v[..];

let (a, b) = r.split_at_mut(3);

assert_eq!(a, &mut [1, 2, 3]);
assert_eq!(b, &mut [4, 5, 6]);
```

Listing 19-4: Using the safe `split_at_mut` function

We can’t implement this function using only safe Rust. An attempt might look
something like Listing 19-5, which won’t compile. For simplicity, we’ll
implement `split_at_mut` as a function rather than a method and only for slices
of `i32` values rather than for a generic type `T`.

```
fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();

    assert!(mid <= len);

    (&mut values[..mid], &mut values[mid..])
}
```

Listing 19-5: An attempted implementation of `split_at_mut` using only safe Rust

This function first gets the total length of the slice. Then it asserts that
the index given as a parameter is within the slice by checking whether it’s
less than or equal to the length. The assertion means that if we pass an index
that is greater than the length to split the slice at, the function will panic
before it attempts to use that index.

Then we return two mutable slices in a tuple: one from the start of the
original slice to the `mid` index and another from `mid` to the end of the
slice.

When we try to compile the code in Listing 19-5, we’ll get an error:

```
error[E0499]: cannot borrow `*values` as mutable more than once at a time
 --> src/main.rs:6:31
  |
1 | fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
  |                         - let's call the lifetime of this reference `'1`
...
6 |     (&mut values[..mid], &mut values[mid..])
  |     --------------------------^^^^^^--------
  |     |     |                   |
  |     |     |                   second mutable borrow occurs here
  |     |     first mutable borrow occurs here
  |     returning this value requires that `*values` is borrowed for `'1`
```

Rust’s borrow checker can’t understand that we’re borrowing different parts of
the slice; it only knows that we’re borrowing from the same slice twice.
Borrowing different parts of a slice is fundamentally okay because the two
slices aren’t overlapping, but Rust isn’t smart enough to know this. When we
know code is okay, but Rust doesn’t, it’s time to reach for unsafe code.

Listing 19-6 shows how to use an `unsafe` block, a raw pointer, and some calls
to unsafe functions to make the implementation of `split_at_mut` work.

```
use std::slice;

fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    [1] let len = values.len();
    [2] let ptr = values.as_mut_ptr();

    [3] assert!(mid <= len);

    [4] unsafe {
        (
            [5] slice::from_raw_parts_mut(ptr, mid),
            [6] slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

Listing 19-6: Using unsafe code in the implementation of the `split_at_mut`
function

Recall from “The Slice Type” section in Chapter 4 that a slice is a pointer to
some data and the length of the slice. We use the `len` method to get the
length of a slice [1] and the `as_mut_ptr` method to access the raw pointer of
a slice [2]. In this case, because we have a mutable slice to `i32` values,
`as_mut_ptr` returns a raw pointer with the type `*mut i32`, which we’ve stored
in the variable `ptr`.

We keep the assertion that the `mid` index is within the slice [3]. Then we get
to the unsafe code [4]: the `slice::from_raw_parts_mut` function takes a raw
pointer and a length, and it creates a slice. We use it to create a slice that
starts from `ptr` and is `mid` items long [5]. Then we call the `add` method on
`ptr` with `mid` as an argument to get a raw pointer that starts at `mid`, and
we create a slice using that pointer and the remaining number of items after
`mid` as the length [6].

The function `slice::from_raw_parts_mut` is unsafe because it takes a raw
pointer and must trust that this pointer is valid. The `add` method on raw
pointers is also unsafe, because it must trust that the offset location is also
a valid pointer. Therefore, we had to put an `unsafe` block around our calls to
`slice::from_raw_parts_mut` and `add` so we could call them. By looking at
the code and by adding the assertion that `mid` must be less than or equal to
`len`, we can tell that all the raw pointers used within the `unsafe` block
will be valid pointers to data within the slice. This is an acceptable and
appropriate use of `unsafe`.

Note that we don’t need to mark the resulting `split_at_mut` function as
`unsafe`, and we can call this function from safe Rust. We’ve created a safe
abstraction to the unsafe code with an implementation of the function that uses
`unsafe` code in a safe way, because it creates only valid pointers from the
data this function has access to.

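Listing 19-6 is restricted to `i32` for simplicity. As a sketch of how the same
approach might carry over to a generic element type `T`, the function below
repeats the listing’s steps without the numbered annotations. The name
`split_at_mut_generic` is hypothetical, chosen so it isn’t mistaken for the
standard library method, and this is not presented as the standard library’s
actual implementation.

```rust
use std::slice;

// Hypothetical generic variant of Listing 19-6.
fn split_at_mut_generic<T>(values: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // The same bounds check as in the listing keeps both halves in range.
    assert!(mid <= len);

    unsafe {
        (
            // `ptr.add(mid)` advances by `mid` elements of `T`, not bytes,
            // so the two slices cover [0, mid) and [mid, len) without overlap.
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut words = vec![String::from("a"), String::from("b"), String::from("c")];

    // Callers stay entirely in safe Rust and can mutate both halves at once.
    let (left, right) = split_at_mut_generic(&mut words, 1);
    left[0].push('!');
    right[1].push('?');

    println!("{:?}", words);
}
```
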
In contrast, the use of `slice::from_raw_parts_mut` in Listing 19-7 would
likely crash when the slice is used. This code takes an arbitrary memory
location and creates a slice 10,000 items long.

```
use std::slice;

let address = 0x01234usize;
let r = address as *mut i32;

let values: &[i32] = unsafe { slice::from_raw_parts_mut(r, 10000) };
```

Listing 19-7: Creating a slice from an arbitrary memory location

We don’t own the memory at this arbitrary location, and there is no guarantee
that the slice this code creates contains valid `i32` values. Attempting to use
`values` as though it’s a valid slice results in undefined behavior.

#### Using `extern` Functions to Call External Code

Sometimes, your Rust code might need to interact with code written in another
language. For this, Rust has the keyword `extern` that facilitates the creation
and use of a *Foreign Function Interface (FFI)*. An FFI is a way for a
programming language to define functions and enable a different (foreign)
programming language to call those functions.

Listing 19-8 demonstrates how to set up an integration with the `abs` function
from the C standard library. Functions declared within `extern` blocks are
always unsafe to call from Rust code. The reason is that other languages don’t
enforce Rust’s rules and guarantees, and Rust can’t check them, so
responsibility falls on the programmer to ensure safety.

Filename: src/main.rs

```
extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}
```

Listing 19-8: Declaring and calling an `extern` function defined in another
language

Within the `extern "C"` block, we list the names and signatures of external
functions from another language we want to call. The `"C"` part defines which
*application binary interface (ABI)* the external function uses: the ABI
defines how to call the function at the assembly level. The `"C"` ABI is the
most common and follows the C programming language’s ABI.

> #### Calling Rust Functions from Other Languages
>
> We can also use `extern` to create an interface that allows other languages
> to call Rust functions. Instead of creating a whole `extern` block, we add
> the `extern` keyword and specify the ABI to use just before the `fn` keyword
> for the relevant function. We also need to add a `#[no_mangle]` annotation to
> tell the Rust compiler not to mangle the name of this function. *Mangling* is
> when a compiler changes the name we’ve given a function to a different name
> that contains more information for other parts of the compilation process to
> consume but is less human readable. Every programming language compiler
> mangles names slightly differently, so for a Rust function to be nameable by
> other languages, we must disable the Rust compiler’s name mangling.
>
> In the following example, we make the `call_from_c` function accessible from
> C code, after it’s compiled to a shared library and linked from C:
>
> ```
> #[no_mangle]
> pub extern "C" fn call_from_c() {
>     println!("Just called a Rust function from C!");
> }
> ```
>
> This usage of `extern` does not require `unsafe`.

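The sidebar’s point that defining an `extern "C"` function needs no `unsafe`
can be checked from Rust alone. The sketch below is illustrative only: the
function name `add_from_rust` is hypothetical, and we leave off `#[no_mangle]`
on purpose, assuming the function would be handed to foreign code as a callback
pointer rather than looked up by name.

```rust
// Hypothetical function that uses the C calling convention. Without
// `#[no_mangle]` its symbol name stays mangled, so C code can't link to it by
// name, but its address can still be passed around as a C-ABI function pointer.
pub extern "C" fn add_from_rust(a: i32, b: i32) -> i32 {
    // `wrapping_add` avoids an overflow panic inside a C-ABI function.
    a.wrapping_add(b)
}

fn main() {
    // The ABI is part of the function pointer's type.
    let callback: extern "C" fn(i32, i32) -> i32 = add_from_rust;

    // Defining and calling the function from Rust requires no `unsafe`.
    println!("2 + 3 = {}", callback(2, 3));
}
```
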
### Accessing or Modifying a Mutable Static Variable

In this book, we’ve not yet talked about *global variables*, which Rust does
support but can be problematic with Rust’s ownership rules. If two threads are
accessing the same mutable global variable, it can cause a data race.

In Rust, global variables are called *static* variables. Listing 19-9 shows an
example declaration and use of a static variable with a string slice as a
value.

Filename: src/main.rs

```
static HELLO_WORLD: &str = "Hello, world!";

fn main() {
    println!("name is: {}", HELLO_WORLD);
}
```

Listing 19-9: Defining and using an immutable static variable

Static variables are similar to constants, which we discussed in the
“Differences Between Variables and Constants” section in Chapter 3. The names
of static variables are in `SCREAMING_SNAKE_CASE` by convention. Any reference
stored in a static variable must have the `'static` lifetime, which means
the Rust compiler can figure out the lifetime and we aren’t required to
annotate it explicitly. Accessing an immutable static variable is safe.

A subtle difference between constants and immutable static variables is that
values in a static variable have a fixed address in memory. Using the value
will always access the same data. Constants, on the other hand, are allowed to
duplicate their data whenever they’re used. Another difference is that static
variables can be mutable. Accessing and modifying mutable static variables is
*unsafe*. Listing 19-10 shows how to declare, access, and modify a mutable
static variable named `COUNTER`.

Filename: src/main.rs

```
static mut COUNTER: u32 = 0;

fn add_to_count(inc: u32) {
    unsafe {
        COUNTER += inc;
    }
}

fn main() {
    add_to_count(3);

    unsafe {
        println!("COUNTER: {}", COUNTER);
    }
}
```

Listing 19-10: Reading from or writing to a mutable static variable is unsafe

As with regular variables, we specify mutability using the `mut` keyword. Any
code that reads from or writes to `COUNTER` must be within an `unsafe` block.
This code compiles and prints `COUNTER: 3` as we would expect because it’s
single threaded. Having multiple threads access `COUNTER` would likely result
in data races.

With mutable data that is globally accessible, it’s difficult to ensure there
are no data races, which is why Rust considers mutable static variables to be
unsafe. Where possible, it’s preferable to use the concurrency techniques and
thread-safe smart pointers we discussed in Chapter 16 so the compiler checks
that data is accessed safely from different threads.

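One way that advice might look in practice is sketched below: the counter from
Listing 19-10 rewritten as an immutable `static` holding a `Mutex<u32>`, so no
`static mut` and no `unsafe` are needed even with several threads. This is our
own illustration, not a listing from the book; the helper `current_count` is
hypothetical, and the sketch assumes a toolchain where `Mutex::new` can be used
in a `static` initializer (Rust 1.63 or later).

```rust
use std::sync::Mutex;
use std::thread;

// The static itself is immutable; the mutex provides checked interior
// mutability, so every access below is safe code.
static COUNTER: Mutex<u32> = Mutex::new(0);

fn add_to_count(inc: u32) {
    let mut count = COUNTER.lock().unwrap();
    *count += inc;
}

// Hypothetical helper that reads the current value under the lock.
fn current_count() -> u32 {
    *COUNTER.lock().unwrap()
}

fn main() {
    // Four threads each add to the counter 1,000 times.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1000 {
                    add_to_count(1);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("COUNTER: {}", current_count());
}
```
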
### Implementing an Unsafe Trait

We can use `unsafe` to implement an unsafe trait. A trait is unsafe when at
least one of its methods has some invariant that the compiler can’t verify. We
declare that a trait is `unsafe` by adding the `unsafe` keyword before `trait`
and marking the implementation of the trait as `unsafe` too, as shown in
Listing 19-11.

```
unsafe trait Foo {
    // methods go here
}

unsafe impl Foo for i32 {
    // method implementations go here
}

fn main() {}
```

Listing 19-11: Defining and implementing an unsafe trait

By using `unsafe impl`, we’re promising that we’ll uphold the invariants that
the compiler can’t verify.

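To show what such a promise can look like, here is a sketch built around a
hypothetical unsafe trait named `Zeroable`. It is a marker trait with no
methods, which is a simplification on our part: the invariant lives in the
trait’s documentation, and the safe generic function `zeroed` relies on every
implementor having upheld it.

```rust
use std::mem;

/// Hypothetical marker trait for illustration.
///
/// # Safety
///
/// Implementors promise that a value whose bytes are all zero is a valid
/// value of the implementing type.
unsafe trait Zeroable {}

// For integers, the all-zero bit pattern is simply the number 0.
unsafe impl Zeroable for i32 {}
unsafe impl Zeroable for u8 {}

// A safe function that depends on the promise made by each `unsafe impl`.
fn zeroed<T: Zeroable>() -> T {
    // SAFETY: `T: Zeroable` guarantees all-zero bytes form a valid `T`.
    unsafe { mem::zeroed() }
}

fn main() {
    let a: i32 = zeroed();
    let b: u8 = zeroed();
    println!("a = {}, b = {}", a, b);
}
```
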
As an example, recall the `Sync` and `Send` marker traits we discussed in the
“Extensible Concurrency with the `Sync` and `Send` Traits” section in Chapter
16: the compiler implements these traits automatically if our types are
composed entirely of `Send` and `Sync` types. If we implement a type that
contains a type that is not `Send` or `Sync`, such as raw pointers, and we want
to mark that type as `Send` or `Sync`, we must use `unsafe`. Rust can’t verify
that our type upholds the guarantees that it can be safely sent across threads
or accessed from multiple threads; therefore, we need to do those checks
manually and indicate as such with `unsafe`.

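The following sketch illustrates that situation with a hypothetical `OwnedInt`
type that holds a raw pointer to a heap allocation it alone owns. Because of
the raw pointer field, the compiler does not implement `Send` for it
automatically; the `unsafe impl` records our own reasoning that moving the
value to another thread is fine. The type and the helper
`read_on_other_thread` are assumptions made for illustration.

```rust
use std::thread;

// Hypothetical type that uniquely owns a heap-allocated `i32`.
struct OwnedInt {
    ptr: *mut i32,
}

impl OwnedInt {
    fn new(value: i32) -> OwnedInt {
        OwnedInt {
            ptr: Box::into_raw(Box::new(value)),
        }
    }

    fn get(&self) -> i32 {
        // SAFETY: `ptr` came from `Box::into_raw` and is freed only in `drop`.
        unsafe { *self.ptr }
    }
}

impl Drop for OwnedInt {
    fn drop(&mut self) {
        // SAFETY: we own the allocation and drop it exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}

// SAFETY: `OwnedInt` is the sole owner of its allocation and shares the
// pointer with nothing else, so moving it to another thread is sound.
unsafe impl Send for OwnedInt {}

fn read_on_other_thread(value: i32) -> i32 {
    let owned = OwnedInt::new(value);
    // Without the `unsafe impl Send` above, this `spawn` would not compile.
    thread::spawn(move || owned.get()).join().unwrap()
}

fn main() {
    println!("read on another thread: {}", read_on_other_thread(7));
}
```
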
### Accessing Fields of a Union

The final action that works only with `unsafe` is accessing fields of a
*union*. A `union` is similar to a `struct`, but only one declared field is
used in a particular instance at one time. Unions are primarily used to
interface with unions in C code. Accessing union fields is unsafe because Rust
can’t guarantee the type of the data currently being stored in the union
instance. You can learn more about unions in the Rust Reference at
*https://doc.rust-lang.org/reference/items/unions.html*.

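As a brief sketch, the hypothetical `IntOrFloat` union below stores either a
`u32` or an `f32` in the same four bytes. Creating the union is safe; reading a
field is not, because the compiler can’t know which field was written last. In
this particular case the read is fine, since both fields are the same size and
every bit pattern is a valid `u32`.

```rust
// Hypothetical union for illustration; `#[repr(C)]` gives it the layout a C
// union with the same fields would have.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

// Returns the raw bits of `x` by writing one field and reading the other.
fn float_bits(x: f32) -> u32 {
    let value = IntOrFloat { f: x };

    // SAFETY: both fields are four bytes, and any bit pattern is a valid `u32`.
    unsafe { value.i }
}

fn main() {
    println!("bits of 1.0: {:#x}", float_bits(1.0));
}
```
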
### When to Use Unsafe Code

Using `unsafe` to use one of the five superpowers just discussed isn’t wrong or
even frowned upon, but it is trickier to get `unsafe` code correct because the
compiler can’t help uphold memory safety. When you have a reason to use
`unsafe` code, you can do so, and having the explicit `unsafe` annotation makes
it easier to track down the source of problems when they occur.

## Advanced Traits

We first covered traits in the “Traits: Defining Shared Behavior” section of
Chapter 10, but we didn’t discuss the more advanced details. Now that you know
more about Rust, we can get into the nitty-gritty.

### Specifying Placeholder Types in Trait Definitions with Associated Types

*Associated types* connect a type placeholder with a trait such that the trait
method definitions can use these placeholder types in their signatures. The
implementor of a trait will specify the concrete type to be used instead of the
placeholder type for the particular implementation. That way, we can define a
trait that uses some types without needing to know exactly what those types are
until the trait is implemented.

We’ve described most of the advanced features in this chapter as being rarely
needed. Associated types are somewhere in the middle: they’re used more rarely
than features explained in the rest of the book but more commonly than many of
the other features discussed in this chapter.

One example of a trait with an associated type is the `Iterator` trait that the
standard library provides. The associated type is named `Item` and stands in
for the type of the values the type implementing the `Iterator` trait is
iterating over. The definition of the `Iterator` trait is as shown in Listing
19-12.

```
pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}
```

Listing 19-12: The definition of the `Iterator` trait that has an associated
type `Item`

The type `Item` is a placeholder, and the `next` method’s definition shows that
it will return values of type `Option<Self::Item>`. Implementors of the
`Iterator` trait will specify the concrete type for `Item`, and the `next`
method will return an `Option` containing a value of that concrete type.

Associated types might seem like a similar concept to generics, in that the
latter allow us to define a function without specifying what types it can
handle. To examine the difference between the two concepts, we’ll look at an
implementation of the `Iterator` trait on a type named `Counter` that specifies
the `Item` type is `u32`:

Filename: src/lib.rs

```
impl Iterator for Counter {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        // --snip--
```

This syntax seems comparable to that of generics. So why not just define the
`Iterator` trait with generics, as shown in Listing 19-13?

```
pub trait Iterator<T> {
    fn next(&mut self) -> Option<T>;
}
```

Listing 19-13: A hypothetical definition of the `Iterator` trait using generics

The difference is that when using generics, as in Listing 19-13, we must
annotate the types in each implementation; because we can also implement
`Iterator<String> for Counter` or any other type, we could have multiple
implementations of `Iterator` for `Counter`. In other words, when a trait has a
generic parameter, it can be implemented for a type multiple times, changing
the concrete types of the generic type parameters each time. When we use the
`next` method on `Counter`, we would have to provide type annotations to
indicate which implementation of `Iterator` we want to use.

With associated types, we don’t need to annotate types because we can’t
implement a trait on a type multiple times. In Listing 19-12 with the
definition that uses associated types, we can only choose what the type of
`Item` will be once, because there can only be one `impl Iterator for Counter`.
We don’t have to specify that we want an iterator of `u32` values everywhere
that we call `next` on `Counter`.

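The contrast can be seen side by side in the sketch below. It implements the
real `Iterator` trait for a `Counter` that counts from 1 to 5, and also a
hypothetical generic trait named `GenericIterator<T>` with a method
`next_item`, renamed so it doesn’t clash with the standard `Iterator`. The
second trait is implemented twice for the same type, and each call then has to
say which implementation it means.

```rust
struct Counter {
    count: u32,
}

// Associated type: there can be only one `impl Iterator for Counter`.
impl Iterator for Counter {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.count < 5 {
            self.count += 1;
            Some(self.count)
        } else {
            None
        }
    }
}

// Hypothetical generic counterpart of the trait in Listing 19-13.
trait GenericIterator<T> {
    fn next_item(&mut self) -> Option<T>;
}

// Generic parameter: the same type can implement the trait more than once.
impl GenericIterator<u32> for Counter {
    fn next_item(&mut self) -> Option<u32> {
        self.next()
    }
}

impl GenericIterator<String> for Counter {
    fn next_item(&mut self) -> Option<String> {
        self.next().map(|n| n.to_string())
    }
}

fn main() {
    let mut counter = Counter { count: 0 };

    // No annotation needed: `Item` can only be `u32`.
    let first = counter.next();

    // An annotation is required here to pick one of the two implementations.
    let second: Option<String> = counter.next_item();
    let third = GenericIterator::<u32>::next_item(&mut counter);

    println!("{:?} {:?} {:?}", first, second, third);
}
```
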
Associated types also become part of the trait’s contract: implementors of the
trait must provide a type to stand in for the associated type placeholder.
Associated types often have a name that describes how the type will be used,
and documenting the associated type in the API documentation is good practice.

 677 ### Default Generic Type Parameters and Operator Overloading
 678
 679 When we use generic type parameters, we can specify a default concrete type for
 680 the generic type. This eliminates the need for implementors of the trait to
 681 specify a concrete type if the default type works. You specify a default type
 682 when declaring a generic type with the `<PlaceholderType=ConcreteType>` syntax.
 683
 684 A great example of a situation where this technique is useful is with *operator
 685 overloading*, in which you customize the behavior of an operator (such as `+`)
 686 in particular situations.
 687
Rust doesn’t allow you to create your own operators or overload arbitrary
operators. But you can overload the operators that have corresponding traits
listed in `std::ops` by implementing those traits. For
 691 example, in Listing 19-14 we overload the `+` operator to add two `Point`
 692 instances together. We do this by implementing the `Add` trait on a `Point`
 693 struct:
 694
 695 Filename: src/main.rs
 696
 697 ```
 698 use std::ops::Add;
 699
 700 #[derive(Debug, Copy, Clone, PartialEq)]
 701 struct Point {
 702     x: i32,
 703     y: i32,
 704 }
 705
 706 impl Add for Point {
 707     type Output = Point;
 708
 709     fn add(self, other: Point) -> Point {
 710         Point {
 711             x: self.x + other.x,
 712             y: self.y + other.y,
 713         }
 714     }
 715 }
 716
 717 fn main() {
 718     assert_eq!(
 719         Point { x: 1, y: 0 } + Point { x: 2, y: 3 },
 720         Point { x: 3, y: 3 }
 721     );
 722 }
 723 ```
 724
 725 Listing 19-14: Implementing the `Add` trait to overload the `+` operator for
 726 `Point` instances
 727
 728 The `add` method adds the `x` values of two `Point` instances and the `y`
 729 values of two `Point` instances to create a new `Point`. The `Add` trait has an
 730 associated type named `Output` that determines the type returned from the `add`
 731 method.
 732
 733 The default generic type in this code is within the `Add` trait. Here is its
 734 definition:
 735
 736 ```
 737 trait Add<Rhs=Self> {
 738     type Output;
 739
 740     fn add(self, rhs: Rhs) -> Self::Output;
 741 }
 742 ```
 743
 744 This code should look generally familiar: a trait with one method and an
 745 associated type. The new part is `Rhs=Self`: this syntax is called *default
 746 type parameters*. The `Rhs` generic type parameter (short for “right hand
 747 side”) defines the type of the `rhs` parameter in the `add` method. If we don’t
 748 specify a concrete type for `Rhs` when we implement the `Add` trait, the type
 749 of `Rhs` will default to `Self`, which will be the type we’re implementing
 750 `Add` on.
 751
 752 When we implemented `Add` for `Point`, we used the default for `Rhs` because we
 753 wanted to add two `Point` instances. Let’s look at an example of implementing
 754 the `Add` trait where we want to customize the `Rhs` type rather than using the
 755 default.
 756
 757 We have two structs, `Millimeters` and `Meters`, holding values in different
 758 units. This thin wrapping of an existing type in another struct is known as the
 759 *newtype pattern*, which we describe in more detail in the “Using the Newtype
 760 Pattern to Implement External Traits on External Types” section. We want to add
 761 values in millimeters to values in meters and have the implementation of `Add`
 762 do the conversion correctly. We can implement `Add` for `Millimeters` with
 763 `Meters` as the `Rhs`, as shown in Listing 19-15.
 764
 765 Filename: src/lib.rs
 766
 767 ```
 768 use std::ops::Add;
 769
 770 struct Millimeters(u32);
 771 struct Meters(u32);
 772
 773 impl Add<Meters> for Millimeters {
 774     type Output = Millimeters;
 775
 776     fn add(self, other: Meters) -> Millimeters {
 777         Millimeters(self.0 + (other.0 * 1000))
 778     }
 779 }
 780 ```
 781
 782 Listing 19-15: Implementing the `Add` trait on `Millimeters` to add
 783 `Millimeters` to `Meters`
 784
 785 To add `Millimeters` and `Meters`, we specify `impl Add<Meters>` to set the
 786 value of the `Rhs` type parameter instead of using the default of `Self`.
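
To see the implementation in use, here is a small sketch that repeats Listing
19-15 and adds a hypothetical `total_length` function. The `Debug` and
`PartialEq` derives on `Millimeters` are our addition so that the result can be
compared; they are not part of the listing.

```rust
use std::ops::Add;

// The derives are an addition for this sketch, so results can be compared.
#[derive(Debug, PartialEq)]
struct Millimeters(u32);
struct Meters(u32);

impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        Millimeters(self.0 + (other.0 * 1000))
    }
}

// Hypothetical helper: 500 mm plus 2 m, expressed in millimeters.
fn total_length() -> Millimeters {
    Millimeters(500) + Meters(2)
}
```

The reverse order, `Meters(2) + Millimeters(500)`, would not compile, because
we have only implemented `Add<Meters>` for `Millimeters` and not
`Add<Millimeters>` for `Meters`.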
 787
 788 You’ll use default type parameters in two main ways:
 789
 790 * To extend a type without breaking existing code
 791 * To allow customization in specific cases most users won’t need
 792
 793 The standard library’s `Add` trait is an example of the second purpose:
 794 usually, you’ll add two like types, but the `Add` trait provides the ability to
 795 customize beyond that. Using a default type parameter in the `Add` trait
 796 definition means you don’t have to specify the extra parameter most of the
 797 time. In other words, a bit of implementation boilerplate isn’t needed, making
 798 it easier to use the trait.
 799
 800 The first purpose is similar to the second but in reverse: if you want to add a
 801 type parameter to an existing trait, you can give it a default to allow
 802 extension of the functionality of the trait without breaking the existing
 803 implementation code.
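
The first purpose can be sketched as follows. The `Describe` trait, the
`Plain` and `Loud` marker types, and the `Cat` struct are all hypothetical
names invented for this example. We assume `Describe` originally had no type
parameter and that `Style` was added later with a default.

```rust
// Hypothetical marker types that select a formatting style.
struct Plain;
struct Loud;

// Assume `Describe` started out without a type parameter. Adding `Style`
// with a default keeps existing `impl Describe for ...` blocks compiling.
trait Describe<Style = Plain> {
    fn describe(&self) -> String;
}

struct Cat;

// Written before the parameter existed; it now means `Describe<Plain>`.
impl Describe for Cat {
    fn describe(&self) -> String {
        String::from("a cat")
    }
}

// Newer code can opt in to the extra parameter.
impl Describe<Loud> for Cat {
    fn describe(&self) -> String {
        String::from("A CAT!")
    }
}

// A bound of `Describe` with no argument uses the default, `Describe<Plain>`.
fn plain<T: Describe>(item: &T) -> String {
    item.describe()
}

fn loud<T: Describe<Loud>>(item: &T) -> String {
    item.describe()
}
```

Code that only ever wrote `impl Describe for Cat` or `T: Describe` is
unaffected by the new parameter, which is the point of giving it a default.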
 804
 805 ### Fully Qualified Syntax for Disambiguation: Calling Methods with the Same Name
 806
 807 Nothing in Rust prevents a trait from having a method with the same name as
 808 another trait’s method, nor does Rust prevent you from implementing both traits
 809 on one type. It’s also possible to implement a method directly on the type with
 810 the same name as methods from traits.
 811
 812 When calling methods with the same name, you’ll need to tell Rust which one you
 813 want to use. Consider the code in Listing 19-16 where we’ve defined two traits,
 814 `Pilot` and `Wizard`, that both have a method called `fly`. We then implement
 815 both traits on a type `Human` that already has a method named `fly` implemented
 816 on it. Each `fly` method does something different.
 817
 818 Filename: src/main.rs
 819
 820 ```
 821 trait Pilot {
 822     fn fly(&self);
 823 }
 824
 825 trait Wizard {
 826     fn fly(&self);
 827 }
 828
 829 struct Human;
 830
 831 impl Pilot for Human {
 832     fn fly(&self) {
 833         println!("This is your captain speaking.");
 834     }
 835 }
 836
 837 impl Wizard for Human {
 838     fn fly(&self) {
 839         println!("Up!");
 840     }
 841 }
 842
 843 impl Human {
 844     fn fly(&self) {
 845         println!("*waving arms furiously*");
 846     }
 847 }
 848 ```
 849
 850 Listing 19-16: Two traits are defined to have a `fly` method and are
 851 implemented on the `Human` type, and a `fly` method is implemented on `Human`
 852 directly
 853
 854 When we call `fly` on an instance of `Human`, the compiler defaults to calling
 855 the method that is directly implemented on the type, as shown in Listing 19-17.
 856
 857 Filename: src/main.rs
 858
 859 ```
 860 fn main() {
 861     let person = Human;
 862     person.fly();
 863 }
 864 ```
 865
 866 Listing 19-17: Calling `fly` on an instance of `Human`
 867
 868 Running this code will print `*waving arms furiously*`, showing that Rust
 869 called the `fly` method implemented on `Human` directly.
 870
 871 To call the `fly` methods from either the `Pilot` trait or the `Wizard` trait,
 872 we need to use more explicit syntax to specify which `fly` method we mean.
 873 Listing 19-18 demonstrates this syntax.
 874
 875 Filename: src/main.rs
 876
 877 ```
 878 fn main() {
 879     let person = Human;
 880     Pilot::fly(&person);
 881     Wizard::fly(&person);
 882     person.fly();
 883 }
 884 ```
 885
 886 Listing 19-18: Specifying which trait’s `fly` method we want to call
 887
 888 Specifying the trait name before the method name clarifies to Rust which
 889 implementation of `fly` we want to call. We could also write
 890 `Human::fly(&person)`, which is equivalent to the `person.fly()` that we used
 891 in Listing 19-18, but this is a bit longer to write if we don’t need to
 892 disambiguate.
 893
 894 Running this code prints the following:
 895
 896 ```
 897 $ cargo run
 898    Compiling traits-example v0.1.0 (file:///projects/traits-example)
 899     Finished dev [unoptimized + debuginfo] target(s) in 0.46s
 900      Running `target/debug/traits-example`
 901 This is your captain speaking.
 902 Up!
 903 *waving arms furiously*
 904 ```
 905
 906 Because the `fly` method takes a `self` parameter, if we had two *types* that
 907 both implement one *trait*, Rust could figure out which implementation of a
 908 trait to use based on the type of `self`.
 909
 910 However, associated functions that are not methods don’t have a `self`
 911 parameter. When there are multiple types or traits that define non-method
functions with the same function name, Rust doesn’t always know which type you
 913 mean unless you use *fully qualified syntax*. For example, in Listing 19-19 we
 914 create a trait for an animal shelter that wants to name all baby dogs *Spot*.
 915 We make an `Animal` trait with an associated non-method function `baby_name`.
 916 The `Animal` trait is implemented for the struct `Dog`, on which we also
 917 provide an associated non-method function `baby_name` directly.
 918
 919 Filename: src/main.rs
 920
 921 ```
 922 trait Animal {
 923     fn baby_name() -> String;
 924 }
 925
 926 struct Dog;
 927
 928 impl Dog {
 929     fn baby_name() -> String {
 930         String::from("Spot")
 931     }
 932 }
 933
 934 impl Animal for Dog {
 935     fn baby_name() -> String {
 936         String::from("puppy")
 937     }
 938 }
 939
 940 fn main() {
 941     println!("A baby dog is called a {}", Dog::baby_name());
 942 }
 943 ```
 944
 945 Listing 19-19: A trait with an associated function and a type with an
 946 associated function of the same name that also implements the trait
 947
 948 We implement the code for naming all puppies Spot in the `baby_name` associated
 949 function that is defined on `Dog`. The `Dog` type also implements the trait
 950 `Animal`, which describes characteristics that all animals have. Baby dogs are
 951 called puppies, and that is expressed in the implementation of the `Animal`
 952 trait on `Dog` in the `baby_name` function associated with the `Animal` trait.
 953
 954 In `main`, we call the `Dog::baby_name` function, which calls the associated
 955 function defined on `Dog` directly. This code prints the following:
 956
 957 ```
 958 A baby dog is called a Spot
 959 ```
 960
 961 This output isn’t what we wanted. We want to call the `baby_name` function that
 962 is part of the `Animal` trait that we implemented on `Dog` so the code prints
 963 `A baby dog is called a puppy`. The technique of specifying the trait name that
 964 we used in Listing 19-18 doesn’t help here; if we change `main` to the code in
 965 Listing 19-20, we’ll get a compilation error.
 966
 967 Filename: src/main.rs
 968
 969 ```
 970 fn main() {
 971     println!("A baby dog is called a {}", Animal::baby_name());
 972 }
 973 ```
 974
 975 Listing 19-20: Attempting to call the `baby_name` function from the `Animal`
 976 trait, but Rust doesn’t know which implementation to use
 977
 978 Because `Animal::baby_name` doesn’t have a `self` parameter, and there could be
 979 other types that implement the `Animal` trait, Rust can’t figure out which
 980 implementation of `Animal::baby_name` we want. We’ll get this compiler error:
 981
 982 ```
 983 error[E0283]: type annotations needed
 984   --> src/main.rs:20:43
 985    |
 986 20 |     println!("A baby dog is called a {}", Animal::baby_name());
 987    |                                           ^^^^^^^^^^^^^^^^^ cannot infer type
 988    |
 989    = note: cannot satisfy `_: Animal`
 990 ```
 991
 992 To disambiguate and tell Rust that we want to use the implementation of
 993 `Animal` for `Dog` as opposed to the implementation of `Animal` for some other
 994 type, we need to use fully qualified syntax. Listing 19-21 demonstrates how to
 995 use fully qualified syntax.
 996
 997 Filename: src/main.rs
 998
 999 ```
1000 fn main() {
1001     println!("A baby dog is called a {}", <Dog as Animal>::baby_name());
1002 }
1003 ```
1004
1005 Listing 19-21: Using fully qualified syntax to specify that we want to call the
1006 `baby_name` function from the `Animal` trait as implemented on `Dog`
1007
1008 We’re providing Rust with a type annotation within the angle brackets, which
indicates we want to call the `baby_name` function from the `Animal` trait as
1010 implemented on `Dog` by saying that we want to treat the `Dog` type as an
1011 `Animal` for this function call. This code will now print what we want:
1012
1013 ```
1014 A baby dog is called a puppy
1015 ```
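
To illustrate why the type has to be named, here is a sketch with a second
implementor. The `Cat` struct and the `baby_names` function are hypothetical
additions, and the functions return their results rather than printing them
so they are easy to check.

```rust
trait Animal {
    fn baby_name() -> String;
}

struct Dog;
// Hypothetical second implementor, added for illustration.
struct Cat;

impl Animal for Dog {
    fn baby_name() -> String {
        String::from("puppy")
    }
}

impl Animal for Cat {
    fn baby_name() -> String {
        String::from("kitten")
    }
}

// With two implementors, only the type inside the angle brackets tells Rust
// which `baby_name` we mean.
fn baby_names() -> (String, String) {
    (<Dog as Animal>::baby_name(), <Cat as Animal>::baby_name())
}
```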
1016
1017 In general, fully qualified syntax is defined as follows:
1018
1019 ```
1020 <Type as Trait>::function(receiver_if_method, next_arg, ...);
1021 ```
1022
1023 For associated functions that aren’t methods, there would not be a `receiver`:
1024 there would only be the list of other arguments. You could use fully qualified
1025 syntax everywhere that you call functions or methods. However, you’re allowed
1026 to omit any part of this syntax that Rust can figure out from other information
1027 in the program. You only need to use this more verbose syntax in cases where
1028 there are multiple implementations that use the same name and Rust needs help
1029 to identify which implementation you want to call.
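
For methods, the receiver becomes the first argument. The following sketch
reuses the `Pilot`, `Wizard`, and `Human` names from Listing 19-16, with one
assumption: the `fly` methods return their text instead of printing it, and
the `all_flights` function is a hypothetical helper.

```rust
trait Pilot {
    fn fly(&self) -> &'static str;
}

trait Wizard {
    fn fly(&self) -> &'static str;
}

struct Human;

impl Pilot for Human {
    fn fly(&self) -> &'static str {
        "This is your captain speaking."
    }
}

impl Wizard for Human {
    fn fly(&self) -> &'static str {
        "Up!"
    }
}

impl Human {
    fn fly(&self) -> &'static str {
        "*waving arms furiously*"
    }
}

fn all_flights(person: &Human) -> [&'static str; 3] {
    [
        // Fully qualified: type, trait, then the receiver as first argument.
        <Human as Pilot>::fly(person),
        <Human as Wizard>::fly(person),
        // Naming only the type selects the method defined directly on it.
        Human::fly(person),
    ]
}
```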
1030
1031 ### Using Supertraits to Require One Trait’s Functionality Within Another Trait
1032
1033 Sometimes, you might write a trait definition that depends on another trait:
1034 for a type to implement the first trait, you want to require that type to also
1035 implement the second trait. You would do this so that your trait definition can
1036 make use of the associated items of the second trait. The trait your trait
1037 definition is relying on is called a *supertrait* of your trait.
1038
1039 For example, let’s say we want to make an `OutlinePrint` trait with an
`outline_print` method that will print a given value formatted so that it’s
1041 framed in asterisks. That is, given a `Point` struct that implements the
1042 standard library trait `Display` to result in `(x, y)`, when we
1043 call `outline_print` on a `Point` instance that has `1` for `x` and `3` for
1044 `y`, it should print the following:
1045
1046 ```
1047 **********
1048 *        *
1049 * (1, 3) *
1050 *        *
1051 **********
1052 ```
1053
1054 In the implementation of the `outline_print` method, we want to use the
1055 `Display` trait’s functionality. Therefore, we need to specify that the
1056 `OutlinePrint` trait will work only for types that also implement `Display` and
1057 provide the functionality that `OutlinePrint` needs. We can do that in the
1058 trait definition by specifying `OutlinePrint: Display`. This technique is
1059 similar to adding a trait bound to the trait. Listing 19-22 shows an
1060 implementation of the `OutlinePrint` trait.
1061
1062 Filename: src/main.rs
1063
1064 ```
1065 use std::fmt;
1066
1067 trait OutlinePrint: fmt::Display {
1068     fn outline_print(&self) {
1069         let output = self.to_string();
1070         let len = output.len();
1071         println!("{}", "*".repeat(len + 4));
1072         println!("*{}*", " ".repeat(len + 2));
1073         println!("* {} *", output);
1074         println!("*{}*", " ".repeat(len + 2));
1075         println!("{}", "*".repeat(len + 4));
1076     }
1077 }
1078 ```
1079
1080 Listing 19-22: Implementing the `OutlinePrint` trait that requires the
1081 functionality from `Display`
1082
1083 Because we’ve specified that `OutlinePrint` requires the `Display` trait, we
1084 can use the `to_string` function that is automatically implemented for any type
1085 that implements `Display`. If we tried to use `to_string` without adding a
1086 colon and specifying the `Display` trait after the trait name, we’d get an
1087 error saying that no method named `to_string` was found for the type `&Self` in
1088 the current scope.
1089
1090 Let’s see what happens when we try to implement `OutlinePrint` on a type that
1091 doesn’t implement `Display`, such as the `Point` struct:
1092
1093 Filename: src/main.rs
1094
1095 ```
1096 struct Point {
1097     x: i32,
1098     y: i32,
1099 }
1100
1101 impl OutlinePrint for Point {}
1102 ```
1103
1104 We get an error saying that `Display` is required but not implemented:
1105
1106 ```
1107 error[E0277]: `Point` doesn't implement `std::fmt::Display`
1108   --> src/main.rs:20:6
1109    |
1110 20 | impl OutlinePrint for Point {}
1111    |      ^^^^^^^^^^^^ `Point` cannot be formatted with the default formatter
1112    |
1113    = help: the trait `std::fmt::Display` is not implemented for `Point`
1114    = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
1115 note: required by a bound in `OutlinePrint`
1116   --> src/main.rs:3:21
1117    |
1118 3  | trait OutlinePrint: fmt::Display {
1119    |                     ^^^^^^^^^^^^ required by this bound in `OutlinePrint`
1120 ```
1121
1122 To fix this, we implement `Display` on `Point` and satisfy the constraint that
1123 `OutlinePrint` requires, like so:
1124
1125 Filename: src/main.rs
1126
1127 ```
1128 use std::fmt;
1129
1130 impl fmt::Display for Point {
1131     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
1132         write!(f, "({}, {})", self.x, self.y)
1133     }
1134 }
1135 ```
1136
1137 Then implementing the `OutlinePrint` trait on `Point` will compile
1138 successfully, and we can call `outline_print` on a `Point` instance to display
1139 it within an outline of asterisks.
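
Putting the pieces together might look like the sketch below. To make the
result easy to check, we assume a hypothetical `outline` helper method that
builds the framed text as a `String`; `outline_print` then prints it. That
helper is not part of Listing 19-22.

```rust
use std::fmt;

trait OutlinePrint: fmt::Display {
    // Hypothetical helper (not in Listing 19-22): build the framed text.
    fn outline(&self) -> String {
        // `to_string` is available because `Display` is a supertrait.
        let output = self.to_string();
        let len = output.len();
        let border = "*".repeat(len + 4);
        let padding = format!("*{}*", " ".repeat(len + 2));
        format!(
            "{}\n{}\n* {} *\n{}\n{}",
            border, padding, output, padding, border
        )
    }

    fn outline_print(&self) {
        println!("{}", self.outline());
    }
}

struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// Allowed now that `Point` satisfies the `Display` requirement.
impl OutlinePrint for Point {}
```

Calling `Point { x: 1, y: 3 }.outline_print()` from `main` would then print
the framed `(1, 3)`.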
1140
1141 ### Using the Newtype Pattern to Implement External Traits on External Types
1142
In Chapter 10 in the “Implementing a Trait on a Type” section, we mentioned the
orphan rule that states we’re only allowed to implement a trait on a type if
either the trait or the type is local to our crate. It’s possible to get
around this restriction using the *newtype pattern*, which involves creating a
1148 new type in a tuple struct. (We covered tuple structs in the “Using Tuple
1149 Structs without Named Fields to Create Different Types” section of Chapter 5.)
1150 The tuple struct will have one field and be a thin wrapper around the type we
1151 want to implement a trait for. Then the wrapper type is local to our crate, and
1152 we can implement the trait on the wrapper. *Newtype* is a term that originates
1153 from the Haskell programming language. There is no runtime performance penalty
1154 for using this pattern, and the wrapper type is elided at compile time.
1155
1156 As an example, let’s say we want to implement `Display` on `Vec<T>`, which the
1157 orphan rule prevents us from doing directly because the `Display` trait and the
1158 `Vec<T>` type are defined outside our crate. We can make a `Wrapper` struct
1159 that holds an instance of `Vec<T>`; then we can implement `Display` on
1160 `Wrapper` and use the `Vec<T>` value, as shown in Listing 19-23.
1161
1162 Filename: src/main.rs
1163
1164 ```
1165 use std::fmt;
1166
1167 struct Wrapper(Vec<String>);
1168
1169 impl fmt::Display for Wrapper {
1170     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
1171         write!(f, "[{}]", self.0.join(", "))
1172     }
1173 }
1174
1175 fn main() {
1176     let w = Wrapper(vec![String::from("hello"), String::from("world")]);
1177     println!("w = {}", w);
1178 }
1179 ```
1180
1181 Listing 19-23: Creating a `Wrapper` type around `Vec<String>` to implement
1182 `Display`
1183
The implementation of `Display` uses `self.0` to access the inner
`Vec<String>`, because `Wrapper` is a tuple struct and the `Vec<String>` is the
item at index 0 in the tuple. Then we can use the functionality of the
`Display` trait on `Wrapper`.
1187
1188 The downside of using this technique is that `Wrapper` is a new type, so it
1189 doesn’t have the methods of the value it’s holding. We would have to implement
1190 all the methods of `Vec<T>` directly on `Wrapper` such that the methods
1191 delegate to `self.0`, which would allow us to treat `Wrapper` exactly like a
1192 `Vec<T>`. If we wanted the new type to have every method the inner type has,
1193 implementing the `Deref` trait (discussed in Chapter 15 in the “Treating Smart
1194 Pointers Like Regular References with the `Deref` Trait” section) on the
1195 `Wrapper` to return the inner type would be a solution. If we don’t want the
1196 `Wrapper` type to have all the methods of the inner type—for example, to
1197 restrict the `Wrapper` type’s behavior—we would have to implement just the
1198 methods we do want manually.
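
As a rough sketch of the `Deref` approach, we can extend the `Wrapper` from
Listing 19-23 as shown here. The `describe` function is a hypothetical helper
added to show both traits at work.

```rust
use std::fmt;
use std::ops::Deref;

struct Wrapper(Vec<String>);

impl Deref for Wrapper {
    type Target = Vec<String>;

    // Hand out a reference to the inner vector.
    fn deref(&self) -> &Vec<String> {
        &self.0
    }
}

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

// Hypothetical helper: `len` is not defined on `Wrapper`, so the call
// reaches the inner `Vec<String>` through deref coercion.
fn describe(w: &Wrapper) -> String {
    format!("{} holds {} items", w, w.len())
}
```

The trade-off is that every method of the inner type becomes reachable through
`Wrapper`, so this approach suits cases where we don’t need to restrict the
wrapper’s behavior.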
1199
1200 This newtype pattern is also useful even when traits are not involved. Let’s
1201 switch focus and look at some advanced ways to interact with Rust’s type system.
1202
1203 ## Advanced Types
1204
1205 The Rust type system has some features that we’ve so far mentioned but haven’t
1206 yet discussed. We’ll start by discussing newtypes in general as we examine why
1207 newtypes are useful as types. Then we’ll move on to type aliases, a feature
1208 similar to newtypes but with slightly different semantics. We’ll also discuss
1209 the `!` type and dynamically sized types.
1210
1211 ### Using the Newtype Pattern for Type Safety and Abstraction
1212
1213 > Note: This section assumes you’ve read the earlier section “Using the
1214 > Newtype Pattern to Implement External Traits on External
1215 > Types.”
1216
1217 The newtype pattern is also useful for tasks beyond those we’ve discussed so
1218 far, including statically enforcing that values are never confused and
1219 indicating the units of a value. You saw an example of using newtypes to
1220 indicate units in Listing 19-15: recall that the `Millimeters` and `Meters`
1221 structs wrapped `u32` values in a newtype. If we wrote a function with a
1222 parameter of type `Millimeters`, we couldn’t compile a program that
1223 accidentally tried to call that function with a value of type `Meters` or a
1224 plain `u32`.
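
A small sketch of that guarantee follows. The functions `cut_to_length` and
`from_meters` are hypothetical names chosen for this example.

```rust
struct Millimeters(u32);
struct Meters(u32);

// Hypothetical function that only accepts millimeters.
fn cut_to_length(length: Millimeters) -> u32 {
    length.0
}

// Hypothetical explicit conversion between the two units.
fn from_meters(length: Meters) -> Millimeters {
    Millimeters(length.0 * 1000)
}

// These calls would be rejected at compile time with a mismatched types error:
//     cut_to_length(Meters(2));
//     cut_to_length(2000);
```

The only way to pass a length measured in meters is to convert it first, as in
`cut_to_length(from_meters(Meters(2)))`.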
1225
1226 We can also use the newtype pattern to abstract away some implementation
1227 details of a type: the new type can expose a public API that is different from
1228 the API of the private inner type.
1229
1230 Newtypes can also hide internal implementation. For example, we could provide a
1231 `People` type to wrap a `HashMap<i32, String>` that stores a person’s ID
1232 associated with their name. Code using `People` would only interact with the
1233 public API we provide, such as a method to add a name string to the `People`
1234 collection; that code wouldn’t need to know that we assign an `i32` ID to names
1235 internally. The newtype pattern is a lightweight way to achieve encapsulation
1236 to hide implementation details, which we discussed in the “Encapsulation that
1237 Hides Implementation Details” section of Chapter 17.
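
One way such a `People` type might look is sketched below. The method names
`new`, `add`, `contains`, and `count`, and the way IDs are assigned, are all
assumptions made for illustration.

```rust
use std::collections::HashMap;

// The inner map is private, so callers never see the `i32` IDs.
pub struct People(HashMap<i32, String>);

impl People {
    pub fn new() -> People {
        People(HashMap::new())
    }

    // Assumption for this sketch: the next ID is the current number of entries.
    pub fn add(&mut self, name: &str) {
        let id = self.0.len() as i32;
        self.0.insert(id, name.to_string());
    }

    pub fn contains(&self, name: &str) -> bool {
        self.0.values().any(|stored| stored == name)
    }

    pub fn count(&self) -> usize {
        self.0.len()
    }
}
```

Because the `HashMap` never appears in the public API, we could later change
how IDs are assigned or stored without affecting code that uses `People`.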
1238
1239 ### Creating Type Synonyms with Type Aliases
1240
1241 Rust provides the ability to declare a *type alias* to give an existing type
1242 another name. For this we use the `type` keyword. For example, we can create
1243 the alias `Kilometers` to `i32` like so:
1244
1245 ```
1246 type Kilometers = i32;
1247 ```
1248
1249 Now, the alias `Kilometers` is a *synonym* for `i32`; unlike the `Millimeters`
1250 and `Meters` types we created in Listing 19-15, `Kilometers` is not a separate,
1251 new type. Values that have the type `Kilometers` will be treated the same as
1252 values of type `i32`:
1253
1254 ```
1255 type Kilometers = i32;
1256
1257 let x: i32 = 5;
1258 let y: Kilometers = 5;
1259
1260 println!("x + y = {}", x + y);
1261 ```
1262
1263 Because `Kilometers` and `i32` are the same type, we can add values of both
1264 types and we can pass `Kilometers` values to functions that take `i32`
1265 parameters. However, using this method, we don’t get the type checking benefits
1266 that we get from the newtype pattern discussed earlier. In other words, if we
1267 mix up `Kilometers` and `i32` values somewhere, the compiler will not give us
1268 an error.
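
The following sketch shows the kind of mistake the compiler will not catch.
The `Miles` alias and both functions are hypothetical additions.

```rust
type Kilometers = i32;
// Hypothetical second alias for the same underlying type.
type Miles = i32;

fn remaining(total: Kilometers, driven: Kilometers) -> Kilometers {
    total - driven
}

fn mixed_up() -> Kilometers {
    let driven: Miles = 30;
    // This compiles even though `driven` is in miles: both aliases are `i32`.
    remaining(100, driven)
}
```

Had `Kilometers` and `Miles` been newtypes instead, the call in `mixed_up`
would have been rejected with a mismatched types error.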
1269
1279 The main use case for type synonyms is to reduce repetition. For example, we
1280 might have a lengthy type like this:
1281
1282 ```
1283 Box<dyn Fn() + Send + 'static>
1284 ```
1285
1286 Writing this lengthy type in function signatures and as type annotations all
1287 over the code can be tiresome and error prone. Imagine having a project full of
1288 code like that in Listing 19-24.
1289
1290 ```
1291 let f: Box<dyn Fn() + Send + 'static> = Box::new(|| println!("hi"));
1292
1293 fn takes_long_type(f: Box<dyn Fn() + Send + 'static>) {
1294     // --snip--
1295 }
1296
1297 fn returns_long_type() -> Box<dyn Fn() + Send + 'static> {
1298     // --snip--
1299 }
1300 ```
1301
1302 Listing 19-24: Using a long type in many places
1303
1304 A type alias makes this code more manageable by reducing the repetition. In
1305 Listing 19-25, we’ve introduced an alias named `Thunk` for the verbose type and
1306 can replace all uses of the type with the shorter alias `Thunk`.
1307
1308 ```
1309 type Thunk = Box<dyn Fn() + Send + 'static>;
1310
1311 let f: Thunk = Box::new(|| println!("hi"));
1312
1313 fn takes_long_type(f: Thunk) {
1314     // --snip--
1315 }
1316
1317 fn returns_long_type() -> Thunk {
1318     // --snip--
1319 }
1320 ```
1321
1322 Listing 19-25: Introducing a type alias `Thunk` to reduce repetition
1323
1324 This code is much easier to read and write! Choosing a meaningful name for a
1325 type alias can help communicate your intent as well (*thunk* is a word for code
1326 to be evaluated at a later time, so it’s an appropriate name for a closure that
1327 gets stored).
1328
1329 Type aliases are also commonly used with the `Result<T, E>` type for reducing
1330 repetition. Consider the `std::io` module in the standard library. I/O
1331 operations often return a `Result<T, E>` to handle situations when operations
1332 fail to work. This library has a `std::io::Error` struct that represents all
1333 possible I/O errors. Many of the functions in `std::io` will be returning
1334 `Result<T, E>` where the `E` is `std::io::Error`, such as these functions in
1335 the `Write` trait:
1336
1337 ```
1338 use std::fmt;
1339 use std::io::Error;
1340
1341 pub trait Write {
1342     fn write(&mut self, buf: &[u8]) -> Result<usize, Error>;
1343     fn flush(&mut self) -> Result<(), Error>;
1344
1345     fn write_all(&mut self, buf: &[u8]) -> Result<(), Error>;
1346     fn write_fmt(&mut self, fmt: fmt::Arguments) -> Result<(), Error>;
1347 }
1348 ```
1349
1350 The `Result<..., Error>` is repeated a lot. As such, `std::io` has this type
1351 alias declaration:
1352
1353 ```
1354 type Result<T> = std::result::Result<T, std::io::Error>;
1355 ```
1356
1357 Because this declaration is in the `std::io` module, we can use the fully
1358 qualified alias `std::io::Result<T>`; that is, a `Result<T, E>` with the `E`
1359 filled in as `std::io::Error`. The `Write` trait function signatures end up
1360 looking like this:
1361
1362 ```
1363 pub trait Write {
1364     fn write(&mut self, buf: &[u8]) -> Result<usize>;
1365     fn flush(&mut self) -> Result<()>;
1366
1367     fn write_all(&mut self, buf: &[u8]) -> Result<()>;
1368     fn write_fmt(&mut self, fmt: fmt::Arguments) -> Result<()>;
1369 }
1370 ```
1371
1372 The type alias helps in two ways: it makes code easier to write *and* it gives
1373 us a consistent interface across all of `std::io`. Because it’s an alias, it’s
1374 just another `Result<T, E>`, which means we can use any methods that work on
1375 `Result<T, E>` with it, as well as special syntax like the `?` operator.
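
For instance, a function of our own can return `std::io::Result<T>` and use
the `?` operator on calls to `Write` methods. The `write_greeting` function
below is a hypothetical example, not part of the standard library.

```rust
use std::io::{self, Write};

// Hypothetical helper: writes a greeting and reports how many bytes it wrote.
fn write_greeting<W: Write>(out: &mut W, name: &str) -> io::Result<usize> {
    let prefix = b"Hello, ";
    // Each call returns `io::Result<()>`; `?` passes any error to our caller.
    out.write_all(prefix)?;
    out.write_all(name.as_bytes())?;
    out.flush()?;
    Ok(prefix.len() + name.len())
}
```

Any type that implements `Write` can be passed in, such as a `Vec<u8>` used as
an in-memory buffer, and the returned value supports the usual `Result`
methods like `unwrap` and `is_ok`.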
1376
1377 ### The Never Type that Never Returns
1378
1379 Rust has a special type named `!` that’s known in type theory lingo as the
1380 *empty type* because it has no values. We prefer to call it the *never type*
1381 because it stands in the place of the return type when a function will never
1382 return. Here is an example:
1383
1384 ```
1385 fn bar() -> ! {
1386     // --snip--
1387 }
1388 ```
1389
1390 This code is read as “the function `bar` returns never.” Functions that return
1391 never are called *diverging functions*. We can’t create values of the type `!`
1392 so `bar` can never possibly return.
1393
1394 But what use is a type you can never create values for? Recall the code from
1395 Listing 2-5, part of the number guessing game; we’ve reproduced a bit of it
1396 here in Listing 19-26.
1397
1398 ```
1399 let guess: u32 = match guess.trim().parse() {
1400     Ok(num) => num,
1401     Err(_) => continue,
1402 };
1403 ```
1404
1405 Listing 19-26: A `match` with an arm that ends in `continue`
1406
1407 At the time, we skipped over some details in this code. In Chapter 6 in “The
1408 `match` Control Flow Operator” section, we discussed that `match` arms must all
1409 return the same type. So, for example, the following code doesn’t work:
1410
1411 ```
1412 let guess = match guess.trim().parse() {
1413     Ok(_) => 5,
1414     Err(_) => "hello",
1415 };
1416 ```
1417
1418 The type of `guess` in this code would have to be an integer *and* a string,
1419 and Rust requires that `guess` have only one type. So what does `continue`
1420 return? How were we allowed to return a `u32` from one arm and have another arm
1421 that ends with `continue` in Listing 19-26?
1422
As you might have guessed, `continue` has the type `!`. That is, when Rust
computes the type of `guess`, it looks at both match arms, the former with the
type `u32` and the latter with the type `!`. Because `!` can never have a
value, Rust decides that the type of `guess` is `u32`.
1427
1428 The formal way of describing this behavior is that expressions of type `!` can
1429 be coerced into any other type. We’re allowed to end this `match` arm with
1430 `continue` because `continue` doesn’t return a value; instead, it moves control
1431 back to the top of the loop, so in the `Err` case, we never assign a value to
1432 `guess`.
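
The same coercion applies to calls to our own diverging functions. In this
sketch, `bail` and `parse_or_bail` are hypothetical names; `bail` always
panics, so it can be declared as returning `!`.

```rust
// Hypothetical diverging helper: it always panics, so it never returns.
fn bail(input: &str) -> ! {
    panic!("not a number: {}", input)
}

fn parse_or_bail(input: &str) -> u32 {
    match input.trim().parse::<u32>() {
        Ok(num) => num,
        // This arm has the type `!`, which coerces to `u32`.
        Err(_) => bail(input),
    }
}
```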
1433
The never type is useful with the `panic!` macro as well. Recall the `unwrap`
function that we call on `Option<T>` values to produce a value or panic. Here
is its definition:
1437
1438 ```
1439 impl<T> Option<T> {
1440     pub fn unwrap(self) -> T {
1441         match self {
1442             Some(val) => val,
1443             None => panic!("called `Option::unwrap()` on a `None` value"),
1444         }
1445     }
1446 }
1447 ```
1448
1449 In this code, the same thing happens as in the `match` in Listing 19-26: Rust
1450 sees that `val` has the type `T` and `panic!` has the type `!`, so the result
1451 of the overall `match` expression is `T`. This code works because `panic!`
1452 doesn’t produce a value; it ends the program. In the `None` case, we won’t be
1453 returning a value from `unwrap`, so this code is valid.
1454
1455 One final expression that has the type `!` is a `loop`:
1456
1457 ```
1458 print!("forever ");
1459
1460 loop {
1461     print!("and ever ");
1462 }
1463 ```
1464
Here, the loop never ends, so `!` is the type of the expression. However, this
1466 wouldn’t be true if we included a `break`, because the loop would terminate
1467 when it got to the `break`.
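
By contrast, a `loop` that can `break` with a value has the type of that
value. The function name in this sketch is hypothetical.

```rust
// Hypothetical helper: the smallest power of two strictly above `limit`.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut candidate = 1;
    // This `loop` contains a `break` with a value, so its type is `u32`
    // rather than `!`, and it can serve as the function's return value.
    loop {
        if candidate > limit {
            break candidate;
        }
        candidate *= 2;
    }
}
```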
1468
1469 ### Dynamically Sized Types and the `Sized` Trait
1470
1471 Rust needs to know certain details about its types, such as how much space to
1472 allocate for a value of a particular type. This leaves one corner of its type
1473 system a little confusing at first: the concept of *dynamically sized types*.
1474 Sometimes referred to as *DSTs* or *unsized types*, these types let us write
1475 code using values whose size we can know only at runtime.
1476
1477 Let’s dig into the details of a dynamically sized type called `str`, which
1478 we’ve been using throughout the book. That’s right, not `&str`, but `str` on
1479 its own, is a DST. We can’t know how long the string is until runtime, meaning
1480 we can’t create a variable of type `str`, nor can we take an argument of type
1481 `str`. Consider the following code, which does not work:
1482
1483 ```
1484 let s1: str = "Hello there!";
1485 let s2: str = "How's it going?";
1486 ```
1487
1488 Rust needs to know how much memory to allocate for any value of a particular
1489 type, and all values of a type must use the same amount of memory. If Rust
1490 allowed us to write this code, these two `str` values would need to take up the
1491 same amount of space. But they have different lengths: `s1` needs 12 bytes of
1492 storage and `s2` needs 15. This is why it’s not possible to create a variable
1493 holding a dynamically sized type.
1494
1495 So what do we do? In this case, you already know the answer: we make the types
1496 of `s1` and `s2` a `&str` rather than a `str`. Recall from the “String Slices”
1497 section of Chapter 4 that the slice data structure just stores the starting
1498 position and the length of the slice. So although a `&T` is a single value that
1499 stores the memory address of where the `T` is located, a `&str` is *two*
1500 values: the address of the `str` and its length. As such, we can know the size
of a `&str` value at compile time: it’s twice the size of a `usize`. That is,
1502 we always know the size of a `&str`, no matter how long the string it refers to
1503 is. In general, this is the way in which dynamically sized types are used in
1504 Rust: they have an extra bit of metadata that stores the size of the dynamic
1505 information. The golden rule of dynamically sized types is that we must always
1506 put values of dynamically sized types behind a pointer of some kind.
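
The size claim can be checked directly. The following sketch uses `std::mem::size_of` and `std::mem::size_of_val` from the standard library; the helper `words` is a hypothetical name introduced only for this illustration.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical helper: how many `usize`-sized words a value of type `T` occupies.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // A reference to a sized type is a single pointer-sized value.
    assert_eq!(words::<&u8>(), 1);

    // A `&str` carries an address plus a length: two pointer-sized values.
    assert_eq!(words::<&str>(), 2);

    let s1: &str = "Hello there!";
    let s2: &str = "How's it going?";

    // The references have the same size even though the strings differ in length.
    assert_eq!(size_of_val(&s1), size_of_val(&s2));
    println!("{} bytes and {} bytes of string data", s1.len(), s2.len());
}
```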
1507
1508 We can combine `str` with all kinds of pointers: for example, `Box<str>` or
1509 `Rc<str>`. In fact, you’ve seen this before but with a different dynamically
1510 sized type: traits. Every trait is a dynamically sized type we can refer to by
1511 using the name of the trait. In Chapter 17 in the “Using Trait Objects That
1512 Allow for Values of Different Types” section, we mentioned that to use traits
1513 as trait objects, we must put them behind a pointer, such as `&dyn Trait` or
1514 `Box<dyn Trait>` (`Rc<dyn Trait>` would work too).
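
As a small sketch of these combinations, the code below puts `str` behind `Box` and `Rc` and stores trait objects in a vector. The `show` function is a hypothetical helper, not part of any library.

```rust
use std::fmt::Display;
use std::rc::Rc;

// Hypothetical helper: formats any value behind a `&dyn Display` trait object.
fn show(item: &dyn Display) -> String {
    format!("<{}>", item)
}

fn main() {
    // `str` behind owning pointers instead of a plain reference.
    let boxed: Box<str> = Box::from("Hello there!");
    let shared: Rc<str> = Rc::from("How's it going?");
    println!("{} {}", boxed, shared);

    // `dyn Display` is also unsized, so it lives behind a pointer too.
    let items: Vec<Box<dyn Display>> = vec![Box::new(5), Box::new("five")];
    for item in &items {
        println!("{}", show(item.as_ref()));
    }
}
```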
1515
1516 To work with DSTs, Rust provides the `Sized` trait to determine whether or not
1517 a type’s size is known at compile time. This trait is automatically implemented
1518 for everything whose size is known at compile time. In addition, Rust
1519 implicitly adds a bound on `Sized` to every generic function. That is, a
1520 generic function definition like this:
1521
1522 ```
1523 fn generic<T>(t: T) {
1524     // --snip--
1525 }
1526 ```
1527
1528 is actually treated as though we had written this:
1529
1530 ```
1531 fn generic<T: Sized>(t: T) {
1532     // --snip--
1533 }
1534 ```
1535
1536 By default, generic functions will work only on types that have a known size at
1537 compile time. However, you can use the following special syntax to relax this
1538 restriction:
1539
1540 ```
1541 fn generic<T: ?Sized>(t: &T) {
1542     // --snip--
1543 }
1544 ```
1545
1546 A trait bound on `?Sized` means “`T` may or may not be `Sized`” and this
1547 notation overrides the default that generic types must have a known size at
1548 compile time. The `?Trait` syntax with this meaning is only available for
1549 `Sized`, not any other traits.
1550
1551 Also note that we switched the type of the `t` parameter from `T` to `&T`.
1552 Because the type might not be `Sized`, we need to use it behind some kind of
1553 pointer. In this case, we’ve chosen a reference.
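
Here is a minimal sketch of a function with a relaxed bound, called with both unsized and sized types. The name `describe` and the extra `Debug` bound are assumptions made for the example.

```rust
use std::fmt::Debug;

// `T: ?Sized` lets callers pass unsized types such as `str` or `[u8]`,
// as long as the value arrives behind a reference.
fn describe<T: ?Sized + Debug>(t: &T) -> String {
    format!("{:?} occupies {} bytes", t, std::mem::size_of_val(t))
}

fn main() {
    println!("{}", describe("hi")); // T = str
    println!("{}", describe(&[1u8, 2, 3][..])); // T = [u8]
    println!("{}", describe(&7u32)); // T = u32, a sized type still works
}
```

Without `?Sized`, the first two calls would be rejected because `str` and `[u8]` do not implement `Sized`.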
1554
1555 Next, we’ll talk about functions and closures!
1556
1557 ## Advanced Functions and Closures
1558
1559 This section explores some advanced features related to functions and closures,
1560 including function pointers and returning closures.
1561
1562 ### Function Pointers
1563
1564 We’ve talked about how to pass closures to functions; you can also pass regular
1565 functions to functions! This technique is useful when you want to pass a
1566 function you’ve already defined rather than defining a new closure. Functions
1567 coerce to the type `fn` (with a lowercase f), not to be confused with the `Fn`
1568 closure trait. The `fn` type is called a *function pointer*. Passing functions
1569 with function pointers will allow you to use functions as arguments to other
1570 functions.
1571
1572 The syntax for specifying that a parameter is a function pointer is similar to
1573 that of closures, as shown in Listing 19-27, where we’ve defined a function
1574 `add_one` that adds one to its parameter. The function `do_twice` takes two
1575 parameters: a function pointer to any function that takes an `i32` parameter
and returns an `i32`, and one `i32` value. The `do_twice` function calls the
1577 function `f` twice, passing it the `arg` value, then adds the two function call
1578 results together. The `main` function calls `do_twice` with the arguments
1579 `add_one` and `5`.
1580
1581 Filename: src/main.rs
1582
1583 ```
1584 fn add_one(x: i32) -> i32 {
1585     x + 1
1586 }
1587
1588 fn do_twice(f: fn(i32) -> i32, arg: i32) -> i32 {
1589     f(arg) + f(arg)
1590 }
1591
1592 fn main() {
1593     let answer = do_twice(add_one, 5);
1594
1595     println!("The answer is: {}", answer);
1596 }
1597 ```
1598
1599 Listing 19-27: Using the `fn` type to accept a function pointer as an argument
1600
1601 This code prints `The answer is: 12`. We specify that the parameter `f` in
1602 `do_twice` is an `fn` that takes one parameter of type `i32` and returns an
1603 `i32`. We can then call `f` in the body of `do_twice`. In `main`, we can pass
1604 the function name `add_one` as the first argument to `do_twice`.
1605
1606 Unlike closures, `fn` is a type rather than a trait, so we specify `fn` as the
1607 parameter type directly rather than declaring a generic type parameter with one
1608 of the `Fn` traits as a trait bound.
1609
1610 Function pointers implement all three of the closure traits (`Fn`, `FnMut`, and
1611 `FnOnce`), meaning you can always pass a function pointer as an argument for a
1612 function that expects a closure. It’s best to write functions using a generic
1613 type and one of the closure traits so your functions can accept either
1614 functions or closures.
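
As a sketch of that advice, here is a generic variant of `do_twice` from Listing 19-27. The name `do_twice_generic` is hypothetical; the point is that one signature accepts a named function, a capturing closure, and an explicit function pointer.

```rust
fn add_one(x: i32) -> i32 {
    x + 1
}

// Generic over any `Fn(i32) -> i32`, so both functions and closures fit.
fn do_twice_generic<F>(f: F, arg: i32) -> i32
where
    F: Fn(i32) -> i32,
{
    f(arg) + f(arg)
}

fn main() {
    let offset = 10;

    // A named function.
    println!("{}", do_twice_generic(add_one, 5));

    // A closure that captures `offset`; this could not be passed as `fn`.
    println!("{}", do_twice_generic(|x| x + offset, 5));

    // An explicit function pointer also satisfies the `Fn` bound.
    let ptr: fn(i32) -> i32 = add_one;
    println!("{}", do_twice_generic(ptr, 5));
}
```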
1615
1616 That said, one example of where you would want to only accept `fn` and not
1617 closures is when interfacing with external code that doesn’t have closures: C
1618 functions can accept functions as arguments, but C doesn’t have closures.
1619
1620 As an example of where you could use either a closure defined inline or a named
1621 function, let’s look at a use of the `map` method provided by the `Iterator`
1622 trait in the standard library. To use the `map` function to turn a
1623 vector of numbers into a vector of strings, we could use a closure, like this:
1624
1625 ```
1626 let list_of_numbers = vec![1, 2, 3];
1627 let list_of_strings: Vec<String> =
1628     list_of_numbers.iter().map(|i| i.to_string()).collect();
1629 ```
1630
1631 Or we could name a function as the argument to `map` instead of the closure,
1632 like this:
1633
1634 ```
1635 let list_of_numbers = vec![1, 2, 3];
1636 let list_of_strings: Vec<String> =
1637     list_of_numbers.iter().map(ToString::to_string).collect();
1638 ```
1639
1640 Note that we must use the fully qualified syntax that we talked about earlier
1641 in the “Advanced Traits” section because there are multiple functions available
1642 named `to_string`.
1643
1644 Here, we’re using the `to_string` function defined in the
1645 `ToString` trait, which the standard library has implemented for any type that
1646 implements `Display`.
1647
1648 Recall from the “Enum values” section of Chapter 6 that the name of each enum
1649 variant that we define also becomes an initializer function. We can use these
1650 initializer functions as function pointers that implement the closure traits,
1651 which means we can specify the initializer functions as arguments for methods
1652 that take closures, like so:
1653
1654 ```
1655 enum Status {
1656     Value(u32),
1657     Stop,
1658 }
1659
1660 let list_of_statuses: Vec<Status> = (0u32..20).map(Status::Value).collect();
1661 ```
1662
1663 Here we create `Status::Value` instances using each `u32` value in the range
1664 that `map` is called on by using the initializer function of `Status::Value`.
1665 Some people prefer this style, and some people prefer to use closures. They
1666 compile to the same code, so use whichever style is clearer to you.
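
The following sketch shows the same idea in a complete program. The helper `to_statuses` and the derived `Debug` and `PartialEq` traits are additions for illustration only.

```rust
#[derive(Debug, PartialEq)]
enum Status {
    Value(u32),
    Stop,
}

// Hypothetical helper: wraps each number using the variant's initializer function.
fn to_statuses(range: std::ops::Range<u32>) -> Vec<Status> {
    range.map(Status::Value).collect()
}

fn main() {
    let mut statuses = to_statuses(0..3);
    statuses.push(Status::Stop);
    println!("{:?}", statuses);

    // The initializer coerces to an ordinary function pointer.
    let make: fn(u32) -> Status = Status::Value;
    assert_eq!(make(7), Status::Value(7));

    // Standard library variants such as `Some` work the same way.
    let wrapped: Vec<Option<u32>> = (0..3).map(Some).collect();
    println!("{:?}", wrapped);
}
```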
1667
1668 ### Returning Closures
1669
Closures are represented by traits, which means you can’t return closures
directly. In most cases where you might want to return a trait, you can instead
use the concrete type that implements the trait as the return value of the
function. However, you can’t do that with closures because they don’t have a
concrete type that you can name; the function pointer type `fn` is not a
general substitute, for example, because a closure that captures values from
its environment can’t be coerced to `fn`.
1676
1677 The following code tries to return a closure directly, but it won’t compile:
1678
1679 ```
1680 fn returns_closure() -> dyn Fn(i32) -> i32 {
1681     |x| x + 1
1682 }
1683 ```
1684
1685 The compiler error is as follows:
1686
1687 ```
1688 error[E0746]: return type cannot have an unboxed trait object
1689  --> src/lib.rs:1:25
1690   |
1691 1 | fn returns_closure() -> dyn Fn(i32) -> i32 {
1692   |                         ^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
1693   |
1694   = note: for information on `impl Trait`, see <https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits>
1695 help: use `impl Fn(i32) -> i32` as the return type, as all return paths are of type `[closure@src/lib.rs:2:5: 2:14]`, which implements `Fn(i32) -> i32`
1696   |
1697 1 | fn returns_closure() -> impl Fn(i32) -> i32 {
1698   |                         ~~~~~~~~~~~~~~~~~~~
1699 ```
1700
The error points to the same problem the `Sized` trait describes: Rust doesn’t
know how much space it will need to store the closure. We saw a solution to
this problem earlier. We can use a trait object:
1704
1705 ```
1706 fn returns_closure() -> Box<dyn Fn(i32) -> i32> {
1707     Box::new(|x| x + 1)
1708 }
1709 ```
1710
1711 This code will compile just fine. For more about trait objects, refer to the
1712 section “Using Trait Objects That Allow for Values of Different Types” in
1713 Chapter 17.
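
The compiler output above also suggests `impl Fn(i32) -> i32` as the return type. The sketch below shows both options side by side. The function `returns_either` is a hypothetical example of a case where `impl Trait` is not enough, because each branch produces a different closure type.

```rust
// The compiler's suggestion: `impl Fn` works when every return path
// produces the same closure type.
fn returns_closure() -> impl Fn(i32) -> i32 {
    |x| x + 1
}

// Hypothetical example: two different closures, so a trait object is needed.
fn returns_either(double: bool) -> Box<dyn Fn(i32) -> i32> {
    if double {
        Box::new(|x| x * 2)
    } else {
        Box::new(|x| x + 1)
    }
}

fn main() {
    let add_one = returns_closure();
    println!("{}", add_one(5));
    println!("{}", returns_either(true)(5));
    println!("{}", returns_either(false)(5));
}
```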
1714
1715 Next, let’s look at macros!
1716
1717 ## Macros
1718
1719 We’ve used macros like `println!` throughout this book, but we haven’t fully
1720 explored what a macro is and how it works. The term *macro* refers to a family
1721 of features in Rust: *declarative* macros with `macro_rules!` and three kinds
1722 of *procedural* macros:
1723
1724 * Custom `#[derive]` macros that specify code added with the `derive` attribute
1725   used on structs and enums
1726 * Attribute-like macros that define custom attributes usable on any item
1727 * Function-like macros that look like function calls but operate on the tokens
1728   specified as their argument
1729
1730 We’ll talk about each of these in turn, but first, let’s look at why we even
1731 need macros when we already have functions.
1732
1733 ### The Difference Between Macros and Functions
1734
1735 Fundamentally, macros are a way of writing code that writes other code, which
1736 is known as *metaprogramming*. In Appendix C, we discuss the `derive`
1737 attribute, which generates an implementation of various traits for you. We’ve
1738 also used the `println!` and `vec!` macros throughout the book. All of these
1739 macros *expand* to produce more code than the code you’ve written manually.
1740
1741 Metaprogramming is useful for reducing the amount of code you have to write and
1742 maintain, which is also one of the roles of functions. However, macros have
1743 some additional powers that functions don’t.
1744
1745 A function signature must declare the number and type of parameters the
1746 function has. Macros, on the other hand, can take a variable number of
1747 parameters: we can call `println!("hello")` with one argument or
1748 `println!("hello {}", name)` with two arguments. Also, macros are expanded
1749 before the compiler interprets the meaning of the code, so a macro can, for
1750 example, implement a trait on a given type. A function can’t, because it gets
1751 called at runtime and a trait needs to be implemented at compile time.
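
As a sketch of that second point, the following declarative macro accepts a variable number of types and implements a trait for each one at compile time. The trait `Describe` and the macro `impl_describe!` are hypothetical names invented for this example.

```rust
trait Describe {
    fn describe() -> String;
}

// Hypothetical macro: accepts any number of types and emits one `impl` per type.
macro_rules! impl_describe {
    ( $( $t:ty ),* ) => {
        $(
            impl Describe for $t {
                fn describe() -> String {
                    format!("I am {}", stringify!($t))
                }
            }
        )*
    };
}

// Expanded at compile time into three separate trait implementations.
impl_describe!(u8, i32, String);

fn main() {
    println!("{}", <u8 as Describe>::describe());
    println!("{}", <i32 as Describe>::describe());
    println!("{}", <String as Describe>::describe());
}
```

A function could not do this work, because the list of types has to be known before the program is compiled.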
1752
1753 The downside to implementing a macro instead of a function is that macro
1754 definitions are more complex than function definitions because you’re writing
1755 Rust code that writes Rust code. Due to this indirection, macro definitions are
1756 generally more difficult to read, understand, and maintain than function
1757 definitions.
1758
1759 Another important difference between macros and functions is that you must
1760 define macros or bring them into scope *before* you call them in a file, as
1761 opposed to functions you can define anywhere and call anywhere.
1762
1763 ### Declarative Macros with `macro_rules!` for General Metaprogramming
1764
1765 The most widely used form of macros in Rust is the *declarative macro*. These
1766 are also sometimes referred to as “macros by example,” “`macro_rules!` macros,”
1767 or just plain “macros.” At their core, declarative macros allow you to write
1768 something similar to a Rust `match` expression. As discussed in Chapter 6,
1769 `match` expressions are control structures that take an expression, compare the
1770 resulting value of the expression to patterns, and then run the code associated
1771 with the matching pattern. Macros also compare a value to patterns that are
1772 associated with particular code: in this situation, the value is the literal
1773 Rust source code passed to the macro; the patterns are compared with the
1774 structure of that source code; and the code associated with each pattern, when
1775 matched, replaces the code passed to the macro. This all happens during
1776 compilation.
1777
1778 To define a macro, you use the `macro_rules!` construct. Let’s explore how to
1779 use `macro_rules!` by looking at how the `vec!` macro is defined. Chapter 8
1780 covered how we can use the `vec!` macro to create a new vector with particular
1781 values. For example, the following macro creates a new vector containing three
1782 integers:
1783
1784 ```
1785 let v: Vec<u32> = vec![1, 2, 3];
1786 ```
1787
1788 We could also use the `vec!` macro to make a vector of two integers or a vector
1789 of five string slices. We wouldn’t be able to use a function to do the same
1790 because we wouldn’t know the number or type of values up front.
1791
1792 Listing 19-28 shows a slightly simplified definition of the `vec!` macro.
1793
1794 Filename: src/lib.rs
1795
1796 ```
1797 [1] #[macro_export]
1798 [2] macro_rules! vec {
1799     [3] ( $( $x:expr ),* ) => {
1800         {
1801             let mut temp_vec = Vec::new();
1802             [4] $(
1803                 [5] temp_vec.push($x [6]);
1804             )*
1805             [7] temp_vec
1806         }
1807     };
1808 }
1809 ```
1810
1811 Listing 19-28: A simplified version of the `vec!` macro definition
1812
1813 > Note: The actual definition of the `vec!` macro in the standard library
1814 > includes code to preallocate the correct amount of memory up front. That code
1815 > is an optimization that we don’t include here to make the example simpler.
1816
1817 The `#[macro_export]` annotation [1] indicates that this macro should be made
1818 available whenever the crate in which the macro is defined is brought into
1819 scope. Without this annotation, the macro can’t be brought into scope.
1820
1821 We then start the macro definition with `macro_rules!` and the name of the
1822 macro we’re defining *without* the exclamation mark [2]. The name, in this case
1823 `vec`, is followed by curly brackets denoting the body of the macro definition.
1824
1825 The structure in the `vec!` body is similar to the structure of a `match`
1826 expression. Here we have one arm with the pattern `( $( $x:expr ),* )`,
1827 followed by `=>` and the block of code associated with this pattern [3]. If the
1828 pattern matches, the associated block of code will be emitted. Given that this
1829 is the only pattern in this macro, there is only one valid way to match; any
1830 other pattern will result in an error. More complex macros will have more than
1831 one arm.
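
For comparison, here is a sketch of a macro with two arms. The macro `max_of!` is hypothetical and is not part of the standard library. The arms are tried from top to bottom, and the second arm invokes the macro recursively on the remaining expressions.

```rust
// Hypothetical macro with two arms, tried from top to bottom.
macro_rules! max_of {
    // Arm 1: a single expression is its own maximum.
    ( $x:expr ) => {
        $x
    };
    // Arm 2: compare the first expression with the maximum of the rest.
    ( $x:expr, $( $rest:expr ),+ ) => {{
        let first = $x;
        let others = max_of!( $( $rest ),+ );
        if first > others { first } else { others }
    }};
}

fn main() {
    println!("{}", max_of!(3));
    println!("{}", max_of!(3, 9, 4));
}
```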
1832
1833 Valid pattern syntax in macro definitions is different than the pattern syntax
1834 covered in Chapter 18 because macro patterns are matched against Rust code
1835 structure rather than values. Let’s walk through what the pattern pieces in
1836 Listing 19-28 mean; for the full macro pattern syntax, see the Rust Reference
1837 at *https://doc.rust-lang.org/reference/macros-by-example.html*.
1838
1839 First, we use a set of parentheses to encompass the whole pattern. We use a
1840 dollar sign (`$`) to declare a variable in the macro system that will contain
1841 the Rust code matching the pattern. The dollar sign makes it clear this is a
1842 macro variable as opposed to a regular Rust variable.
1843 Next comes a set of parentheses that captures values that match the
1844 pattern within the parentheses for use in the replacement code. Within `$()` is
1845 `$x:expr`, which matches any Rust expression and gives the expression the name
1846 `$x`.
1847
The comma following `$()` indicates that a literal comma separator character
must appear between each instance of the code that matches the code within
`$()`. The `*` specifies that the pattern matches zero or more of whatever
precedes the `*`.
1851
1852 When we call this macro with `vec![1, 2, 3];`, the `$x` pattern matches three
1853 times with the three expressions `1`, `2`, and `3`.
1854
1855 Now let’s look at the pattern in the body of the code associated with this arm:
1856 `temp_vec.push()` [5] within `$()*` [4][7] is generated for each part that
1857 matches `$()` in the pattern zero or more times depending on how many times the
1858 pattern matches. The `$x` [6] is replaced with each expression matched. When we
1859 call this macro with `vec![1, 2, 3];`, the code generated that replaces this
1860 macro call will be the following:
1861
1862 ```
1863 {
1864     let mut temp_vec = Vec::new();
1865     temp_vec.push(1);
1866     temp_vec.push(2);
1867     temp_vec.push(3);
1868     temp_vec
1869 }
1870 ```
1871
1872 We’ve defined a macro that can take any number of arguments of any type and can
1873 generate code to create a vector containing the specified elements.
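
To experiment with the pattern, you can compile a variant of Listing 19-28 without the numbered annotations. The sketch below renames the macro to `my_vec!` (a hypothetical name, chosen so it does not shadow the standard `vec!`) and adds `$(,)?` to the pattern so that one trailing comma is accepted. Neither change comes from the standard library’s definition.

```rust
// Hypothetical variant of Listing 19-28, renamed so it does not shadow `vec!`.
macro_rules! my_vec {
    // `$(,)?` additionally accepts one optional trailing comma.
    ( $( $x:expr ),* $(,)? ) => {
        {
            #[allow(unused_mut)] // an empty invocation never calls `push`
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

fn main() {
    let numbers = my_vec![1, 2, 3];
    let words = my_vec!["a", "b",];
    let empty: Vec<u32> = my_vec![];
    println!("{:?} {:?} {:?}", numbers, words, empty);
}
```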
1874
1875 To learn more about how to write macros, consult the online documentation or
1876 other resources, such as “The Little Book of Rust Macros” at
1877 *https://veykril.github.io/tlborm/* started by Daniel Keep and continued by
1878 Lukas Wirth.
1879
1888 ### Procedural Macros for Generating Code from Attributes
1889
1890 The second form of macros is the *procedural macro*, which acts more like a
1891 function (and is a type of procedure). Procedural macros accept some code as an
1892 input, operate on that code, and produce some code as an output rather than
1893 matching against patterns and replacing the code with other code as declarative
1894 macros do. The three kinds of procedural macros are custom derive,
1895 attribute-like, and function-like, and all work in a similar fashion.
1896
1897 When creating procedural macros, the definitions must reside in their own crate
1898 with a special crate type. This is for complex technical reasons that we hope
1899 to eliminate in the future. In Listing 19-29, we show how to define a
1900 procedural macro, where `some_attribute` is a placeholder for using a specific
1901 macro variety.
1902
1903 Filename: src/lib.rs
1904
```
use proc_macro::TokenStream;

#[some_attribute]
pub fn some_name(input: TokenStream) -> TokenStream {
    // --snip--
}
```
1912
1913 Listing 19-29: An example of defining a procedural macro
1914
1915 The function that defines a procedural macro takes a `TokenStream` as an input
1916 and produces a `TokenStream` as an output. The `TokenStream` type is defined by
1917 the `proc_macro` crate that is included with Rust and represents a sequence of
1918 tokens. This is the core of the macro: the source code that the macro is
1919 operating on makes up the input `TokenStream`, and the code the macro produces
1920 is the output `TokenStream`. The function also has an attribute attached to it
1921 that specifies which kind of procedural macro we’re creating. We can have
1922 multiple kinds of procedural macros in the same crate.
1923
1924 Let’s look at the different kinds of procedural macros. We’ll start with a
1925 custom derive macro and then explain the small dissimilarities that make the
1926 other forms different.
1927
1928 ### How to Write a Custom `derive` Macro
1929
1930 Let’s create a crate named `hello_macro` that defines a trait named
1931 `HelloMacro` with one associated function named `hello_macro`. Rather than
1932 making our users implement the `HelloMacro` trait for each of their types,
1933 we’ll provide a procedural macro so users can annotate their type with
1934 `#[derive(HelloMacro)]` to get a default implementation of the `hello_macro`
1935 function. The default implementation will print `Hello, Macro! My name is
1936 TypeName!` where `TypeName` is the name of the type on which this trait has
1937 been defined. In other words, we’ll write a crate that enables another
1938 programmer to write code like Listing 19-30 using our crate.
1939
1940 Filename: src/main.rs
1941
1942 ```
1943 use hello_macro::HelloMacro;
1944 use hello_macro_derive::HelloMacro;
1945
1946 #[derive(HelloMacro)]
1947 struct Pancakes;
1948
1949 fn main() {
1950     Pancakes::hello_macro();
1951 }
1952 ```
1953
1954 Listing 19-30: The code a user of our crate will be able to write when using
1955 our procedural macro
1956
1957 This code will print `Hello, Macro! My name is Pancakes!` when we’re done. The
1958 first step is to make a new library crate, like this:
1959
1960 ```
1961 $ cargo new hello_macro --lib
1962 ```
1963
1964 Next, we’ll define the `HelloMacro` trait and its associated function:
1965
1966 Filename: src/lib.rs
1967
1968 ```
1969 pub trait HelloMacro {
1970     fn hello_macro();
1971 }
1972 ```
1973
1974 We have a trait and its function. At this point, our crate user could implement
1975 the trait to achieve the desired functionality, like so:
1976
1977 ```
1978 use hello_macro::HelloMacro;
1979
1980 struct Pancakes;
1981
1982 impl HelloMacro for Pancakes {
1983     fn hello_macro() {
1984         println!("Hello, Macro! My name is Pancakes!");
1985     }
1986 }
1987
1988 fn main() {
1989     Pancakes::hello_macro();
1990 }
1991 ```
1992
1993 However, they would need to write the implementation block for each type they
1994 wanted to use with `hello_macro`; we want to spare them from having to do this
1995 work.
1996
Additionally, we can’t yet provide the `hello_macro` function with a default
implementation that will print the name of the type the trait is implemented
1999 on: Rust doesn’t have reflection capabilities, so it can’t look up the type’s
2000 name at runtime. We need a macro to generate code at compile time.
2001
2002 The next step is to define the procedural macro. At the time of this writing,
2003 procedural macros need to be in their own crate. Eventually, this restriction
2004 might be lifted. The convention for structuring crates and macro crates is as
2005 follows: for a crate named `foo`, a custom derive procedural macro crate is
2006 called `foo_derive`. Let’s start a new crate called `hello_macro_derive` inside
2007 our `hello_macro` project:
2008
2009 ```
2010 $ cargo new hello_macro_derive --lib
2011 ```
2012
2013 Our two crates are tightly related, so we create the procedural macro crate
2014 within the directory of our `hello_macro` crate. If we change the trait
2015 definition in `hello_macro`, we’ll have to change the implementation of the
2016 procedural macro in `hello_macro_derive` as well. The two crates will need to
2017 be published separately, and programmers using these crates will need to add
2018 both as dependencies and bring them both into scope. We could instead have the
2019 `hello_macro` crate use `hello_macro_derive` as a dependency and re-export the
2020 procedural macro code. However, the way we’ve structured the project makes it
2021 possible for programmers to use `hello_macro` even if they don’t want the
2022 `derive` functionality.
2023
2024 We need to declare the `hello_macro_derive` crate as a procedural macro crate.
2025 We’ll also need functionality from the `syn` and `quote` crates, as you’ll see
2026 in a moment, so we need to add them as dependencies. Add the following to the
2027 *Cargo.toml* file for `hello_macro_derive`:
2028
2029 Filename: hello_macro_derive/Cargo.toml
2030
2031 ```
2032 [lib]
2033 proc-macro = true
2034
2035 [dependencies]
2036 syn = "1.0"
2037 quote = "1.0"
2038 ```
2039
2040 To start defining the procedural macro, place the code in Listing 19-31 into
2041 your *src/lib.rs* file for the `hello_macro_derive` crate. Note that this code
2042 won’t compile until we add a definition for the `impl_hello_macro` function.
2043
2044 Filename: hello_macro_derive/src/lib.rs
2045
2046 ```
2047 use proc_macro::TokenStream;
2048 use quote::quote;
2049 use syn;
2050
2051 #[proc_macro_derive(HelloMacro)]
2052 pub fn hello_macro_derive(input: TokenStream) -> TokenStream {
2053     // Construct a representation of Rust code as a syntax tree
2054     // that we can manipulate
2055     let ast = syn::parse(input).unwrap();
2056
2057     // Build the trait implementation
2058     impl_hello_macro(&ast)
2059 }
2060 ```
2061
2062 Listing 19-31: Code that most procedural macro crates will require in order to
2063 process Rust code
2064
2065 Notice that we’ve split the code into the `hello_macro_derive` function, which
2066 is responsible for parsing the `TokenStream`, and the `impl_hello_macro`
2067 function, which is responsible for transforming the syntax tree: this makes
2068 writing a procedural macro more convenient. The code in the outer function
2069 (`hello_macro_derive` in this case) will be the same for almost every
2070 procedural macro crate you see or create. The code you specify in the body of
2071 the inner function (`impl_hello_macro` in this case) will be different
2072 depending on your procedural macro’s purpose.
2073
2074 We’ve introduced three new crates: `proc_macro`, `syn` (available from
2075 *https://crates.io/crates/syn*), and `quote` (available from
2076 *https://crates.io/crates/quote*). The `proc_macro` crate comes with Rust, so
2077 we didn’t need to add that to the dependencies in *Cargo.toml*. The
2078 `proc_macro` crate is the compiler’s API that allows us to read and manipulate
2079 Rust code from our code.
2080
2081 The `syn` crate parses Rust code from a string into a data structure that we
2082 can perform operations on. The `quote` crate turns `syn` data structures back
2083 into Rust code. These crates make it much simpler to parse any sort of Rust
2084 code we might want to handle: writing a full parser for Rust code is no simple
2085 task.
2086
2087 The `hello_macro_derive` function will be called when a user of our library
2088 specifies `#[derive(HelloMacro)]` on a type. This is possible because we’ve
2089 annotated the `hello_macro_derive` function here with `proc_macro_derive` and
2090 specified the name `HelloMacro`, which matches our trait name; this is the
2091 convention most procedural macros follow.
2092
2093 The `hello_macro_derive` function first converts the `input` from a
2094 `TokenStream` to a data structure that we can then interpret and perform
2095 operations on. This is where `syn` comes into play. The `parse` function in
2096 `syn` takes a `TokenStream` and returns a `DeriveInput` struct representing the
2097 parsed Rust code. Listing 19-32 shows the relevant parts of the `DeriveInput`
2098 struct we get from parsing the `struct Pancakes;` string:
2099
2100 ```
2101 DeriveInput {
2102     // --snip--
2103
2104     ident: Ident {
2105         ident: "Pancakes",
2106         span: #0 bytes(95..103)
2107     },
2108     data: Struct(
2109         DataStruct {
2110             struct_token: Struct,
2111             fields: Unit,
2112             semi_token: Some(
2113                 Semi
2114             )
2115         }
2116     )
2117 }
2118 ```
2119
2120 Listing 19-32: The `DeriveInput` instance we get when parsing the code that has
2121 the macro’s attribute in Listing 19-30
2122
2123 The fields of this struct show that the Rust code we’ve parsed is a unit struct
2124 with the `ident` (identifier, meaning the name) of `Pancakes`. There are more
2125 fields on this struct for describing all sorts of Rust code; check the `syn`
2126 documentation for `DeriveInput` at
2127 *https://docs.rs/syn/1.0/syn/struct.DeriveInput.html* for more information.
2128
2129 Soon we’ll define the `impl_hello_macro` function, which is where we’ll build
2130 the new Rust code we want to include. But before we do, note that the output
2131 for our derive macro is also a `TokenStream`. The returned `TokenStream` is
2132 added to the code that our crate users write, so when they compile their crate,
2133 they’ll get the extra functionality that we provide in the modified
2134 `TokenStream`.
2135
2136 You might have noticed that we’re calling `unwrap` to cause the
2137 `hello_macro_derive` function to panic if the call to the `syn::parse` function
2138 fails here. It’s necessary for our procedural macro to panic on errors because
2139 `proc_macro_derive` functions must return `TokenStream` rather than `Result` to
2140 conform to the procedural macro API. We’ve simplified this example by using
2141 `unwrap`; in production code, you should provide more specific error messages
2142 about what went wrong by using `panic!` or `expect`.
2143
2144 Now that we have the code to turn the annotated Rust code from a `TokenStream`
2145 into a `DeriveInput` instance, let’s generate the code that implements the
2146 `HelloMacro` trait on the annotated type, as shown in Listing 19-33.
2147
2148 Filename: hello_macro_derive/src/lib.rs
2149
2150 ```
2151 fn impl_hello_macro(ast: &syn::DeriveInput) -> TokenStream {
2152     let name = &ast.ident;
2153     let gen = quote! {
2154         impl HelloMacro for #name {
2155             fn hello_macro() {
2156                 println!("Hello, Macro! My name is {}!", stringify!(#name));
2157             }
2158         }
2159     };
2160     gen.into()
2161 }
2162 ```
2163
2164 Listing 19-33: Implementing the `HelloMacro` trait using the parsed Rust code
2165
We get an `Ident` struct instance containing the name (identifier) of the
annotated type using `ast.ident`. The struct in Listing 19-32 shows that when
we run the `impl_hello_macro` function on the code in Listing 19-30, the
`ident` field will have a value of `"Pancakes"`. Thus, the `name` variable in
Listing 19-33 will contain an `Ident` struct instance that, when printed, will
be the string `"Pancakes"`, the name of the struct in Listing 19-30.
2173
The `quote!` macro lets us define the Rust code that we want to return. The
compiler expects something different from the direct result of the `quote!`
macro’s execution, so we need to convert it to a `TokenStream`. We do this by
calling the `into` method, which consumes this intermediate representation and
returns a value of the required `TokenStream` type.
2179
2180 The `quote!` macro also provides some very cool templating mechanics: we can
2181 enter `#name`, and `quote!` will replace it with the value in the variable
2182 `name`. You can even do some repetition similar to the way regular macros work.
2183 Check out the `quote` crate’s docs at *https://docs.rs/quote* for a thorough
2184 introduction.
2185
2186 We want our procedural macro to generate an implementation of our `HelloMacro`
2187 trait for the type the user annotated, which we can get by using `#name`. The
2188 trait implementation has the one function `hello_macro`, whose body contains the
2189 functionality we want to provide: printing `Hello, Macro! My name is` and then
2190 the name of the annotated type.
2191
The `stringify!` macro used here is built into Rust. It takes a Rust
expression, such as `1 + 2`, and at compile time turns the expression into a
string literal, such as `"1 + 2"`. This is different from `format!` or
`println!`, which evaluate the expression and then turn the result into a
`String`. The `#name` input might be an expression to print literally, so we
use `stringify!`. Using `stringify!` also saves an allocation by converting
`#name` to a string literal at compile time.
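
The contrast between `stringify!` and `format!` can be seen in a small standalone program. This is only an illustrative sketch; the helper function names are made up for this example.

```rust
/// Returns the tokens `1 + 2` as a string literal; nothing is evaluated.
fn literal_tokens() -> &'static str {
    stringify!(1 + 2)
}

/// Evaluates `1 + 2` at runtime and formats the result into a new `String`.
fn evaluated_result() -> String {
    format!("{}", 1 + 2)
}

fn main() {
    // The expression is kept as text, not computed.
    assert_eq!(literal_tokens(), "1 + 2");

    // The expression is computed first, then turned into a `String`.
    assert_eq!(evaluated_result(), "3");

    // An identifier such as a type name is stringified the same way,
    // which is what happens to `#name` in our derive macro.
    assert_eq!(stringify!(Pancakes), "Pancakes");

    println!("{} vs. {}", literal_tokens(), evaluated_result());
}
```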
2199
2200 At this point, `cargo build` should complete successfully in both `hello_macro`
2201 and `hello_macro_derive`. Let’s hook up these crates to the code in Listing
2202 19-30 to see the procedural macro in action! Create a new binary project in
2203 your *projects* directory using `cargo new pancakes`. We need to add
2204 `hello_macro` and `hello_macro_derive` as dependencies in the `pancakes`
2205 crate’s *Cargo.toml*. If you’re publishing your versions of `hello_macro` and
2206 `hello_macro_derive` to *https://crates.io/*, they would be regular
2207 dependencies; if not, you can specify them as `path` dependencies as follows:
2208
```
[dependencies]
hello_macro = { path = "../hello_macro" }
hello_macro_derive = { path = "../hello_macro/hello_macro_derive" }
```
2214
Put the code in Listing 19-30 into *src/main.rs*, and run `cargo run`: it
should print `Hello, Macro! My name is Pancakes!`. The implementation of the
`HelloMacro` trait from the procedural macro was included without the
`pancakes` crate needing to implement it; the `#[derive(HelloMacro)]` added the
trait implementation.
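
To make the result concrete, here is a hand-written sketch of roughly what the `pancakes` crate amounts to once the derive has run. The `impl` block is the `quote!` template from Listing 19-33 with `#name` replaced by `Pancakes`. Putting the trait in the same file is an assumption made so the sketch compiles on its own; in the real project the trait comes from the `hello_macro` crate.

```rust
// Defined inline here only so the example is self-contained; in the real
// project this trait lives in the `hello_macro` crate.
pub trait HelloMacro {
    fn hello_macro();
}

struct Pancakes;

// Approximately what `#[derive(HelloMacro)]` generates for `Pancakes`.
impl HelloMacro for Pancakes {
    fn hello_macro() {
        println!("Hello, Macro! My name is {}!", stringify!(Pancakes));
    }
}

fn main() {
    Pancakes::hello_macro();
}
```

Writing this `impl` by hand for every type is exactly the repetition the derive macro removes.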
2220
2221 Next, let’s explore how the other kinds of procedural macros differ from custom
2222 derive macros.
2223
2224 ### Attribute-like macros
2225
Attribute-like macros are similar to custom derive macros, but instead of
generating code for the `derive` attribute, they allow you to create new
attributes. They’re also more flexible: `derive` works only for structs,
enums, and unions; attributes can be applied to other items as well, such as
functions. Here’s an example of using an attribute-like macro: say you have an
attribute named `route` that annotates functions when using a web application
framework:
2232
```
#[route(GET, "/")]
fn index() {
```
2237
2238 This `#[route]` attribute would be defined by the framework as a procedural
2239 macro. The signature of the macro definition function would look like this:
2240
```
#[proc_macro_attribute]
pub fn route(attr: TokenStream, item: TokenStream) -> TokenStream {
```
2245
2246 Here, we have two parameters of type `TokenStream`. The first is for the
2247 contents of the attribute: the `GET, "/"` part. The second is the body of the
2248 item the attribute is attached to: in this case, `fn index() {}` and the rest
2249 of the function’s body.
2250
2251 Other than that, attribute-like macros work the same way as custom derive
2252 macros: you create a crate with the `proc-macro` crate type and implement a
2253 function that generates the code you want!
2254
2255 ### Function-like macros
2256
Function-like macros define macros that look like function calls. Like
`macro_rules!` macros, they’re more flexible than functions; for example, they
can take an unknown number of arguments. However, `macro_rules!` macros can be
defined only using the match-like syntax we discussed in the section
“Declarative Macros with `macro_rules!` for General Metaprogramming” earlier.
Function-like macros take a `TokenStream` parameter, and their definition
manipulates that `TokenStream` using Rust code, as the other two types of
procedural macros do. An example of a function-like macro is an `sql!` macro
that might be called like so:
2266
```
let sql = sql!(SELECT * FROM posts WHERE id=1);
```
2270
2271 This macro would parse the SQL statement inside it and check that it’s
2272 syntactically correct, which is much more complex processing than a
2273 `macro_rules!` macro can do. The `sql!` macro would be defined like this:
2274
```
#[proc_macro]
pub fn sql(input: TokenStream) -> TokenStream {
```
2279
2280 This definition is similar to the custom derive macro’s signature: we receive
2281 the tokens that are inside the parentheses and return the code we wanted to
2282 generate.
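
For contrast, here is a minimal sketch of what the match-like `macro_rules!` syntax can already do with an unknown number of arguments. The `sum_all!` macro is hypothetical and exists only for this illustration. It handles simple repetition well, but it could not validate something like SQL, which is where a function-like procedural macro becomes useful.

```rust
// Hypothetical declarative macro: adds up any number of comma-separated
// expressions using pattern matching and repetition.
macro_rules! sum_all {
    // No arguments: the sum is zero.
    () => { 0 };
    // One expression followed by zero or more `, expression` pairs.
    ($first:expr $(, $rest:expr)*) => {
        $first $(+ $rest)*
    };
}

fn main() {
    assert_eq!(sum_all!(), 0);
    assert_eq!(sum_all!(1), 1);
    assert_eq!(sum_all!(1, 2, 3), 6);
}
```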
2283
2298 ## Summary
2299
Whew! Now you have some Rust features in your toolbox that you likely won’t use
often, but you’ll know they’re available in very particular circumstances.
We’ve introduced several complex topics so that when you encounter them in
error message suggestions or in other people’s code, you’ll be able to
recognize these concepts and syntax. Use this chapter as a reference to guide
you to solutions.
2306
2307 Next, we’ll put everything we’ve discussed throughout the book into practice
2308 and do one more project!