src/doc/book/src/ch12-03-improving-error-handling-and-modularity.md

   1 ## Refactoring to Improve Modularity and Error Handling
   2
   3 To improve our program, we’ll fix four problems that have to do with the
   4 program’s structure and how it’s handling potential errors.
   5
   6 First, our `main` function now performs two tasks: it parses arguments and
   7 reads files. For such a small function, this isn’t a major problem. However, if
   8 we continue to grow our program inside `main`, the number of separate tasks the
   9 `main` function handles will increase. As a function gains responsibilities, it
  10 becomes more difficult to reason about, harder to test, and harder to change
  11 without breaking one of its parts. It’s best to separate functionality so each
  12 function is responsible for one task.
  13
  14 This issue also ties into the second problem: although `query` and `filename`
  15 are configuration variables to our program, variables like `contents` are used
  16 to perform the program’s logic. The longer `main` becomes, the more variables
  17 we’ll need to bring into scope; the more variables we have in scope, the harder
  18 it will be to keep track of the purpose of each. It’s best to group the
  19 configuration variables into one structure to make their purpose clear.
  20
  21 The third problem is that we’ve used `expect` to print an error message when
  22 reading the file fails, but the error message just prints `Something went wrong
  23 reading the file`. Reading a file can fail in a number of ways: for example,
  24 the file could be missing, or we might not have permission to open it. Right
  25 now, regardless of the situation, we’d print the `Something went wrong reading
  26 the file` error message, which wouldn’t give the user any information!
  27
  28 Fourth, we use `expect` to handle an error, and if the user runs our program
  29 without specifying enough arguments, they’ll get an `index out of bounds`
  30 error from Rust that doesn’t clearly explain the problem. It would
  31 be best if all the error-handling code were in one place so future maintainers
  32 had only one place to consult in the code if the error-handling logic needed to
  33 change. Having all the error-handling code in one place will also ensure that
  34 we’re printing messages that will be meaningful to our end users.
  35
  36 Let’s address these four problems by refactoring our project.
  37
  38 ### Separation of Concerns for Binary Projects
  39
  40 The organizational problem of allocating responsibility for multiple tasks to
  41 the `main` function is common to many binary projects. As a result, the Rust
  42 community has developed a process to use as a guideline for splitting the
  43 separate concerns of a binary program when `main` starts getting large. The
  44 process has the following steps:
  45
  46 * Split your program into a *main.rs* and a *lib.rs* and move your program’s
  47   logic to *lib.rs*.
  48 * As long as your command line parsing logic is small, it can remain in
  49   *main.rs*.
  50 * When the command line parsing logic starts getting complicated, extract it
  51   from *main.rs* and move it to *lib.rs*.
  52
  53 The responsibilities that remain in the `main` function after this process
  54 should be limited to the following:
  55
  56 * Calling the command line parsing logic with the argument values
  57 * Setting up any other configuration
  58 * Calling a `run` function in *lib.rs*
  59 * Handling the error if `run` returns an error
  60
  61 This pattern is about separating concerns: *main.rs* handles running the
  62 program, and *lib.rs* handles all the logic of the task at hand. Because you
  63 can’t test the `main` function directly, this structure lets you test all of
  64 your program’s logic by moving it into functions in *lib.rs*. The only code
  65 that remains in *main.rs* will be small enough to verify its correctness by
  66 reading it. Let’s rework our program by following this process.
  67
  68 #### Extracting the Argument Parser
  69
  70 We’ll extract the functionality for parsing arguments into a function that
  71 `main` will call to prepare for moving the command line parsing logic to
  72 *src/lib.rs*. Listing 12-5 shows the new start of `main` that calls a new
  73 function `parse_config`, which we’ll define in *src/main.rs* for the moment.
  74
  75 <span class="filename">Filename: src/main.rs</span>
  76
  77 ```rust,ignore
  78 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-05/src/main.rs:here}}
  79 ```
  80
  81 <span class="caption">Listing 12-5: Extracting a `parse_config` function from
  82 `main`</span>
  83
  84 We’re still collecting the command line arguments into a vector, but instead of
  85 assigning the argument value at index 1 to the variable `query` and the
  86 argument value at index 2 to the variable `filename` within the `main`
  87 function, we pass the whole vector to the `parse_config` function. The
  88 `parse_config` function then holds the logic that determines which argument
  89 goes in which variable and passes the values back to `main`. We still create
  90 the `query` and `filename` variables in `main`, but `main` no longer has the
  91 responsibility of determining how the command line arguments and variables
  92 correspond.
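
As a rough illustration of this step, a `parse_config` function along the following lines would do the job. This is a sketch, not the contents of Listing 12-5 (which is included from a separate file), and the sample argument vector in `main` is a hypothetical stand-in for `env::args().collect()` so the example runs without command line input.

```rust
// Sketch: pull the query and filename out of the argument vector.
// Index 0 holds the program name, so the real arguments start at index 1.
fn parse_config(args: &[String]) -> (&str, &str) {
    let query = &args[1];
    let filename = &args[2];

    (query, filename)
}

fn main() {
    // Hypothetical stand-in for `std::env::args().collect()`.
    let args: Vec<String> = vec![
        String::from("minigrep"),
        String::from("the"),
        String::from("poem.txt"),
    ];

    // `main` still creates the variables, but no longer decides which
    // argument goes in which variable.
    let (query, filename) = parse_config(&args);

    println!("Searching for {}", query);
    println!("In file {}", filename);
}
```

Note that this version still indexes into `args` directly, so it panics when fewer than three items are supplied.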
  93
  94 This rework may seem like overkill for our small program, but we’re refactoring
  95 in small, incremental steps. After making this change, run the program again to
  96 verify that the argument parsing still works. It’s good to check your progress
  97 often, to help identify the cause of problems when they occur.
  98
  99 #### Grouping Configuration Values
 100
 101 We can take another small step to improve the `parse_config` function further.
 102 At the moment, we’re returning a tuple, but then we immediately break that
 103 tuple into individual parts again. This is a sign that perhaps we don’t have
 104 the right abstraction yet.
 105
 106 Another indicator that shows there’s room for improvement is the `config` part
 107 of `parse_config`, which implies that the two values we return are related and
 108 are both part of one configuration value. We’re not currently conveying this
 109 meaning in the structure of the data other than by grouping the two values into
 110 a tuple; we could put the two values into one struct and give each of the
 111 struct fields a meaningful name. Doing so will make it easier for future
 112 maintainers of this code to understand how the different values relate to each
 113 other and what their purpose is.
 114
 115 > Note: Using primitive values when a complex type would be more appropriate is
 116 > an anti-pattern known as *primitive obsession*.
 117
 118 Listing 12-6 shows the improvements to the `parse_config` function.
 119
 120 <span class="filename">Filename: src/main.rs</span>
 121
 122 ```rust,should_panic
 123 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-06/src/main.rs:here}}
 124 ```
 125
 126 <span class="caption">Listing 12-6: Refactoring `parse_config` to return an
 127 instance of a `Config` struct</span>
 128
 129 We’ve added a struct named `Config` defined to have fields named `query` and
 130 `filename`. The signature of `parse_config` now indicates that it returns a
 131 `Config` value. In the body of `parse_config`, where we used to return string
 132 slices that reference `String` values in `args`, we now define `Config` to
 133 contain owned `String` values. The `args` variable in `main` is the owner of
 134 the argument values and is only letting the `parse_config` function borrow
 135 them, which means we’d violate Rust’s borrowing rules if `Config` tried to take
 136 ownership of the values in `args`.
 137
 138 We could manage the `String` data in a number of different ways, but the
 139 easiest, though somewhat inefficient, route is to call the `clone` method on
 140 the values. This will make a full copy of the data for the `Config` instance to
 141 own, which takes more time and memory than storing a reference to the string
 142 data. However, cloning the data also makes our code very straightforward
 143 because we don’t have to manage the lifetimes of the references; in this
 144 circumstance, giving up a little performance to gain simplicity is a worthwhile
 145 trade-off.
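
The `clone`-based approach described above might look roughly like the sketch below. It assumes a `Config` struct with the two fields named in the text; the sample arguments are hypothetical, and the actual Listing 12-6 may differ in detail.

```rust
// Sketch: group the two related values in a struct with named fields.
struct Config {
    query: String,
    filename: String,
}

fn parse_config(args: &[String]) -> Config {
    // `clone` gives `Config` its own copies of the strings, so `args`
    // keeps ownership of the originals and no lifetimes are involved.
    let query = args[1].clone();
    let filename = args[2].clone();

    Config { query, filename }
}

fn main() {
    // Hypothetical stand-in for `std::env::args().collect()`.
    let args: Vec<String> = vec![
        String::from("minigrep"),
        String::from("the"),
        String::from("poem.txt"),
    ];

    let config = parse_config(&args);

    println!("Searching for {}", config.query);
    println!("In file {}", config.filename);

    // `args` is still fully usable because only copies were moved.
    println!("Argument count: {}", args.len());
}
```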
 146
 147 > ### The Trade-Offs of Using `clone`
 148 >
 149 > There’s a tendency among many Rustaceans to avoid using `clone` to fix
 150 > ownership problems because of its runtime cost. In
 151 > [Chapter 13][ch13]<!-- ignore -->, you’ll learn how to use more efficient
 152 > methods in this type of situation. But for now, it’s okay to copy a few
 153 > strings to continue making progress because you’ll make these copies only
 154 > once and your filename and query string are very small. It’s better to have
 155 > a working program that’s a bit inefficient than to try to hyperoptimize code
 156 > on your first pass. As you become more experienced with Rust, it’ll be
 157 > easier to start with the most efficient solution, but for now, it’s
 158 > perfectly acceptable to call `clone`.
 159
 160 We’ve updated `main` so it places the instance of `Config` returned by
 161 `parse_config` into a variable named `config`, and we updated the code that
 162 previously used the separate `query` and `filename` variables so it now uses
 163 the fields on the `Config` struct instead.
 164
 165 Now our code more clearly conveys that `query` and `filename` are related and
 166 that their purpose is to configure how the program will work. Any code that
 167 uses these values knows to find them in the `config` instance in the fields
 168 named for their purpose.
 169
 170 #### Creating a Constructor for `Config`
 171
 172 So far, we’ve extracted the logic responsible for parsing the command line
 173 arguments from `main` and placed it in the `parse_config` function. Doing so
 174 helped us to see that the `query` and `filename` values were related and that
 175 this relationship should be conveyed in our code. We then added a `Config`
 176 struct to name the related purpose of `query` and `filename` and to return the
 177 values from `parse_config` under meaningful struct field names.
 178
 179 So now that the purpose of the `parse_config` function is to create a `Config`
 180 instance, we can change `parse_config` from a plain function to a function
 181 named `new` that is associated with the `Config` struct. Making this change
 182 will make the code more idiomatic. We can create instances of types in the
 183 standard library, such as `String`, by calling `String::new`. Similarly, by
 184 changing `parse_config` into a `new` function associated with `Config`, we’ll
 185 be able to create instances of `Config` by calling `Config::new`. Listing 12-7
 186 shows the changes we need to make.
 187
 188 <span class="filename">Filename: src/main.rs</span>
 189
 190 ```rust,should_panic
 191 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-07/src/main.rs:here}}
 192 ```
 193
 194 <span class="caption">Listing 12-7: Changing `parse_config` into
 195 `Config::new`</span>
 196
 197 We’ve updated `main` where we were calling `parse_config` to instead call
 198 `Config::new`. We’ve changed the name of `parse_config` to `new` and moved it
 199 within an `impl` block, which associates the `new` function with `Config`. Try
 200 compiling this code again to make sure it works.
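
For orientation, the move from a free function to an associated function could be sketched as follows. This is an illustrative approximation of the change, not the text of Listing 12-7, and the sample arguments are hypothetical.

```rust
struct Config {
    query: String,
    filename: String,
}

impl Config {
    // Same body as the former `parse_config`; only the name and the
    // location (inside an `impl` block) have changed.
    fn new(args: &[String]) -> Config {
        let query = args[1].clone();
        let filename = args[2].clone();

        Config { query, filename }
    }
}

fn main() {
    // Hypothetical stand-in for `std::env::args().collect()`.
    let args: Vec<String> = vec![
        String::from("minigrep"),
        String::from("the"),
        String::from("poem.txt"),
    ];

    // Called with the type name as a prefix, just like `String::new`.
    let config = Config::new(&args);

    println!("Searching for {}", config.query);
    println!("In file {}", config.filename);
}
```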
 201
 202 ### Fixing the Error Handling
 203
 204 Now we’ll work on fixing our error handling. Recall that attempting to access
 205 the values in the `args` vector at index 1 or index 2 will cause the program to
 206 panic if the vector contains fewer than three items. Try running the program
 207 without any arguments; it will look like this:
 208
 209 ```console
 210 {{#include ../listings/ch12-an-io-project/listing-12-07/output.txt}}
 211 ```
 212
 213 The line `index out of bounds: the len is 1 but the index is 1` is an error
 214 message intended for programmers. It won’t help our end users understand what
 215 happened and what they should do instead. Let’s fix that now.
 216
 217 #### Improving the Error Message
 218
 219 In Listing 12-8, we add a check in the `new` function that will verify that the
 220 slice is long enough before accessing indices 1 and 2. If the slice isn’t long
 221 enough, the program panics and displays a better error message than the `index
 222 out of bounds` message.
 223
 224 <span class="filename">Filename: src/main.rs</span>
 225
 226 ```rust,ignore
 227 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-08/src/main.rs:here}}
 228 ```
 229
 230 <span class="caption">Listing 12-8: Adding a check for the number of
 231 arguments</span>
 232
 233 This code is similar to [the `Guess::new` function we wrote in Listing
 234 9-10][ch9-custom-types]<!-- ignore -->, where we called `panic!` when the
 235 `value` argument was out of the range of valid values. Instead of checking for
 236 a range of values here, we’re checking that the length of `args` is at least 3,
 237 so the rest of the function can operate under the assumption that this
 238 condition has been met. If `args` has fewer than three items, the check
 239 fails, and we call the `panic!` macro to end the program immediately.
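
A length check of this kind can be sketched as shown below. The panic message `"not enough arguments"` matches the string quoted later in this section; the rest is an assumption about what the function body looks like, with hypothetical sample arguments in `main`.

```rust
struct Config {
    query: String,
    filename: String,
}

impl Config {
    fn new(args: &[String]) -> Config {
        // Guard: indices 1 and 2 exist only if there are at least 3 items.
        if args.len() < 3 {
            panic!("not enough arguments");
        }

        let query = args[1].clone();
        let filename = args[2].clone();

        Config { query, filename }
    }
}

fn main() {
    // Hypothetical stand-in for `std::env::args().collect()`.
    let args: Vec<String> = vec![
        String::from("minigrep"),
        String::from("the"),
        String::from("poem.txt"),
    ];

    let config = Config::new(&args);

    println!("Searching for {}", config.query);
    println!("In file {}", config.filename);
}
```

Passing a vector with only the program name to this `Config::new` would trigger the custom panic message instead of the indexing panic.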
 240
 241 With these extra few lines of code in `new`, let’s run the program without any
 242 arguments again to see what the error looks like now:
 243
 244 ```console
 245 {{#include ../listings/ch12-an-io-project/listing-12-08/output.txt}}
 246 ```
 247
 248 This output is better: we now have a reasonable error message. However, we also
 249 have extraneous information we don’t want to give to our users. Perhaps using
 250 the technique we used in Listing 9-10 isn’t the best to use here: a call to
 251 `panic!` is more appropriate for a programming problem than a usage problem,
 252 [as discussed in Chapter 9][ch9-error-guidelines]<!-- ignore -->. Instead, we
 253 can use the other technique you learned about in Chapter 9—[returning a
 254 `Result`][ch9-result]<!-- ignore --> that indicates either success or an error.
 255
 256 #### Returning a `Result` from `new` Instead of Calling `panic!`
 257
 258 We can instead return a `Result` value that will contain a `Config` instance in
 259 the successful case and will describe the problem in the error case. When
 260 `Config::new` is communicating to `main`, we can use the `Result` type to
 261 signal there was a problem. Then we can change `main` to convert an `Err`
 262 variant into a more practical error for our users without the surrounding text
 263 about `thread 'main'` and `RUST_BACKTRACE` that a call to `panic!` causes.
 264
 265 Listing 12-9 shows the changes we need to make to the return value of
 266 `Config::new` and the body of the function needed to return a `Result`. Note
 267 that this won’t compile until we update `main` as well, which we’ll do in the
 268 next listing.
 269
 270 <span class="filename">Filename: src/main.rs</span>
 271
 272 ```rust,ignore
 273 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-09/src/main.rs:here}}
 274 ```
 275
 276 <span class="caption">Listing 12-9: Returning a `Result` from
 277 `Config::new`</span>
 278
 279 Our `new` function now returns a `Result` with a `Config` instance in the
 280 success case and a `&str` in the error case.
 281
 282 We’ve made two changes in the body of the `new` function: instead of calling
 283 `panic!` when the user doesn’t pass enough arguments, we now return an `Err`
 284 value, and we’ve wrapped the `Config` return value in an `Ok`. These changes
 285 make the function conform to its new type signature.
 286
 287 Returning an `Err` value from `Config::new` allows the `main` function to
 288 handle the `Result` value returned from the `new` function and exit the process
 289 more cleanly in the error case.
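
The two changes just described can be sketched as follows. The `&'static str` error type and the message text are taken from the surrounding discussion; the `match` in `main` is only a hypothetical way of showing both outcomes, not the handling the chapter goes on to use.

```rust
struct Config {
    query: String,
    filename: String,
}

impl Config {
    fn new(args: &[String]) -> Result<Config, &'static str> {
        // Report the problem to the caller instead of panicking.
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let filename = args[2].clone();

        // The success value is now wrapped in `Ok`.
        Ok(Config { query, filename })
    }
}

fn main() {
    // Hypothetical input with too few arguments to show the error path.
    let too_few: Vec<String> = vec![String::from("minigrep")];

    match Config::new(&too_few) {
        Ok(config) => println!("Searching for {} in {}", config.query, config.filename),
        Err(message) => println!("Could not build Config: {}", message),
    }
}
```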
 290
 291 #### Calling `Config::new` and Handling Errors
 292
 293 To handle the error case and print a user-friendly message, we need to update
 294 `main` to handle the `Result` being returned by `Config::new`, as shown in
 295 Listing 12-10. We’ll also take the responsibility of exiting the command line
 296 tool with a nonzero error code away from `panic!` and instead implement it by
 297 hand. A nonzero exit status is a convention to signal to the process that
 298 called our program that the program exited with an error state.
 299
 300 <span class="filename">Filename: src/main.rs</span>
 301
 302 ```rust,ignore
 303 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-10/src/main.rs:here}}
 304 ```
 305
 306 <span class="caption">Listing 12-10: Exiting with an error code if creating a
 307 new `Config` fails</span>
 308
 309 In this listing, we’ve used a method we haven’t covered before:
 310 `unwrap_or_else`, which is defined on `Result<T, E>` by the standard library.
 311 Using `unwrap_or_else` allows us to define some custom, non-`panic!` error
 312 handling. If the `Result` is an `Ok` value, this method’s behavior is similar
 313 to `unwrap`: it returns the inner value `Ok` is wrapping. However, if the value
 314 is an `Err` value, this method calls the code in the *closure*, which is an
 315 anonymous function we define and pass as an argument to `unwrap_or_else`. We’ll
 316 cover closures in more detail in [Chapter 13][ch13]<!-- ignore -->. For now,
 317 you just need to know that `unwrap_or_else` will pass the inner value of the
 318 `Err`, which in this case is the static string `"not enough arguments"` that we
 319 added in Listing 12-9, to our closure in the argument `err` that appears
 320 between the vertical pipes. The code in the closure can then use the `err`
 321 value when it runs.
 322
 323 We’ve added a new `use` line to bring `process` from the standard library into
 324 scope. The code in the closure that will be run in the error case is only two
 325 lines: we print the `err` value and then call `process::exit`. The
 326 `process::exit` function will stop the program immediately and return the
 327 number that was passed as the exit status code. This is similar to the
 328 `panic!`-based handling we used in Listing 12-8, but we no longer get all the
 329 extra output. Let’s try it:
 330
 331 ```console
 332 {{#include ../listings/ch12-an-io-project/listing-12-10/output.txt}}
 333 ```
 334
 335 Great! This output is much friendlier for our users.
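
To make the `unwrap_or_else` pattern concrete, here is a minimal sketch of how `main` might handle the `Result`. The message prefix `Problem parsing arguments:` and the sample arguments are assumptions made for illustration; with the three sample arguments the closure is never run, so the sketch exits normally.

```rust
use std::process;

struct Config {
    query: String,
    filename: String,
}

impl Config {
    fn new(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        Ok(Config {
            query: args[1].clone(),
            filename: args[2].clone(),
        })
    }
}

fn main() {
    // Hypothetical stand-in for `std::env::args().collect()`.
    let args: Vec<String> = vec![
        String::from("minigrep"),
        String::from("the"),
        String::from("poem.txt"),
    ];

    // On `Ok`, the closure is skipped and the inner `Config` is returned.
    // On `Err`, the closure receives the error string as `err`.
    let config = Config::new(&args).unwrap_or_else(|err| {
        println!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

    println!("Searching for {}", config.query);
    println!("In file {}", config.filename);
}
```

Because `process::exit` never returns, the closure does not need to produce a `Config` of its own.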
 336
 337 ### Extracting Logic from `main`
 338
 339 Now that we’ve finished refactoring the configuration parsing, let’s turn to
 340 the program’s logic. As we stated in [“Separation of Concerns for Binary
 341 Projects”](#separation-of-concerns-for-binary-projects)<!-- ignore -->, we’ll
 342 extract a function named `run` that will hold all the logic currently in the
 343 `main` function that isn’t involved with setting up configuration or handling
 344 errors. When we’re done, `main` will be concise and easy to verify by
 345 inspection, and we’ll be able to write tests for all the other logic.
 346
 347 Listing 12-11 shows the extracted `run` function. For now, we’re just making
 348 the small, incremental improvement of extracting the function. We’re still
 349 defining the function in *src/main.rs*.
 350
 351 <span class="filename">Filename: src/main.rs</span>
 352
 353 ```rust,ignore
 354 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-11/src/main.rs:here}}
 355 ```
 356
 357 <span class="caption">Listing 12-11: Extracting a `run` function containing the
 358 rest of the program logic</span>
 359
 360 The `run` function now contains all the remaining logic from `main`, starting
 361 from reading the file. The `run` function takes the `Config` instance as an
 362 argument.
 363
 364 #### Returning Errors from the `run` Function
 365
 366 With the remaining program logic separated into the `run` function, we can
 367 improve the error handling, as we did with `Config::new` in Listing 12-9.
 368 Instead of allowing the program to panic by calling `expect`, the `run`
 369 function will return a `Result<T, E>` when something goes wrong. This will let
 370 us further consolidate into `main` the logic around handling errors in a
 371 user-friendly way. Listing 12-12 shows the changes we need to make to the
 372 signature and body of `run`.
 373
 374 <span class="filename">Filename: src/main.rs</span>
 375
 376 ```rust,ignore
 377 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-12/src/main.rs:here}}
 378 ```
 379
 380 <span class="caption">Listing 12-12: Changing the `run` function to return
 381 `Result`</span>
 382
 383 We’ve made three significant changes here. First, we changed the return type of
 384 the `run` function to `Result<(), Box<dyn Error>>`. This function previously
 385 returned the unit type, `()`, and we keep that as the value returned in the
 386 `Ok` case.
 387
 388 For the error type, we used the *trait object* `Box<dyn Error>` (and we’ve
 389 brought `std::error::Error` into scope with a `use` statement at the top).
 390 We’ll cover trait objects in [Chapter 17][ch17]<!-- ignore -->. For now, just
 391 know that `Box<dyn Error>` means the function will return a type that
 392 implements the `Error` trait, but we don’t have to specify what particular type
 393 the return value will be. This gives us flexibility to return error values that
 394 may be of different types in different error cases. The `dyn` keyword is short
 395 for “dynamic.”
 396
 397 Second, we’ve removed the call to `expect` in favor of the `?` operator, as we
 398 talked about in [Chapter 9][ch9-question-mark]<!-- ignore -->. Rather than
 399 `panic!` on an error, `?` will return the error value from the current function
 400 for the caller to handle.
 401
 402 Third, the `run` function now returns an `Ok` value in the success case. We’ve
 403 declared the `run` function’s success type as `()` in the signature, which
 404 means we need to wrap the unit type value in the `Ok` value. This `Ok(())`
 405 syntax might look a bit strange at first, but using `()` like this is the
 406 idiomatic way to indicate that we’re calling `run` for its side effects only;
 407 it doesn’t return a value we need.
 408
 409 When you run this code, it will compile but will display a warning:
 410
 411 ```console
 412 {{#include ../listings/ch12-an-io-project/listing-12-12/output.txt}}
 413 ```
 414
 415 Rust tells us that our code ignored the `Result` value and the `Result` value
 416 might indicate that an error occurred. But we’re not checking to see whether or
 417 not there was an error, and the compiler reminds us that we probably meant to
 418 have some error-handling code here! Let’s rectify that problem now.
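
The three changes to `run` discussed in this section can be sketched as below. The `With text:` output line and the file name in `main` are assumptions for illustration; the file name is deliberately one that is not expected to exist, so the error path is what gets shown.

```rust
use std::error::Error;
use std::fs;

struct Config {
    query: String,
    filename: String,
}

// Change 1: the return type is now `Result<(), Box<dyn Error>>`.
fn run(config: Config) -> Result<(), Box<dyn Error>> {
    // Change 2: `?` returns the I/O error to the caller instead of panicking.
    let contents = fs::read_to_string(config.filename)?;

    println!("With text:\n{}", contents);

    // Change 3: the success case wraps the unit value in `Ok`.
    Ok(())
}

fn main() {
    // Hypothetical configuration pointing at a file that should not exist.
    let config = Config {
        query: String::from("the"),
        filename: String::from("this-file-should-not-exist.txt"),
    };

    println!("Searching for {}", config.query);

    // Inspect the returned `Result` rather than ignoring it.
    match run(config) {
        Ok(()) => println!("run finished without error"),
        Err(e) => println!("run reported an error: {}", e),
    }
}
```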
 419
 420 #### Handling Errors Returned from `run` in `main`
 421
 422 We’ll check for errors and handle them using a technique similar to one we used
 423 with `Config::new` in Listing 12-10, but with a slight difference:
 424
 425 <span class="filename">Filename: src/main.rs</span>
 426
 427 ```rust,ignore
 428 {{#rustdoc_include ../listings/ch12-an-io-project/no-listing-01-handling-errors-in-main/src/main.rs:here}}
 429 ```
 430
 431 We use `if let` rather than `unwrap_or_else` to check whether `run` returns an
 432 `Err` value and call `process::exit(1)` if it does. The `run` function doesn’t
 433 return a value that we want to `unwrap` in the same way that `Config::new`
 434 returns the `Config` instance. Because `run` returns `()` in the success case,
 435 we only care about detecting an error, so we don’t need `unwrap_or_else` to
 436 return the unwrapped value because it would only be `()`.
 437
 438 The body of the `if let` block and the body of the closure passed to
 439 `unwrap_or_else` do the same thing in both cases: we print the error and exit.
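
The `if let` form can be tried in isolation with the sketch below. The `run` function here is a hypothetical stand-in that fails on request, so both outcomes can be exercised without touching the file system; the `Application error:` prefix is likewise an assumption.

```rust
use std::error::Error;
use std::process;

// Hypothetical stand-in for the real `run`: it returns an error only
// when `should_fail` is true.
fn run(should_fail: bool) -> Result<(), Box<dyn Error>> {
    if should_fail {
        return Err("simulated failure".into());
    }

    Ok(())
}

fn main() {
    // Only the error case needs handling; `Ok(())` carries nothing
    // worth unwrapping.
    if let Err(e) = run(false) {
        println!("Application error: {}", e);
        process::exit(1);
    }

    println!("run completed successfully");
}
```

Changing the call to `run(true)` would print the error and exit with status 1.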
 440
 441 ### Splitting Code into a Library Crate
 442
 443 Our `minigrep` project is looking good so far! Now we’ll split the
 444 *src/main.rs* file and put some code into the *src/lib.rs* file so we can test
 445 it and have a *src/main.rs* file with fewer responsibilities.
 446
 447 Let’s move all the code that isn’t the `main` function from *src/main.rs* to
 448 *src/lib.rs*:
 449
 450 * The `run` function definition
 451 * The relevant `use` statements
 452 * The definition of `Config`
 453 * The `Config::new` function definition
 454
 455 The contents of *src/lib.rs* should have the signatures shown in Listing 12-13
 456 (we’ve omitted the bodies of the functions for brevity). Note that this won’t
 457 compile until we modify *src/main.rs* in Listing 12-14.
 458
 459 <span class="filename">Filename: src/lib.rs</span>
 460
 461 ```rust,ignore
 462 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-13/src/lib.rs:here}}
 463 ```
 464
 465 <span class="caption">Listing 12-13: Moving `Config` and `run` into
 466 *src/lib.rs*</span>
 467
 468 We’ve made liberal use of the `pub` keyword: on `Config`, on its fields and its
 469 `new` function, and on the `run` function. We now have a library crate that has a
 470 public API that we can test!
 471
 472 Now we need to bring the code we moved to *src/lib.rs* into the scope of the
 473 binary crate in *src/main.rs*, as shown in Listing 12-14.
 474
 475 <span class="filename">Filename: src/main.rs</span>
 476
 477 ```rust,ignore
 478 {{#rustdoc_include ../listings/ch12-an-io-project/listing-12-14/src/main.rs:here}}
 479 ```
 480
 481 <span class="caption">Listing 12-14: Using the `minigrep` library crate in
 482 *src/main.rs*</span>
 483
 484 We add a `use minigrep::Config` line to bring the `Config` type from the
 485 library crate into the binary crate’s scope, and we prefix the `run` function
 486 with our crate name. Now all the functionality should be connected and should
 487 work. Run the program with `cargo run` and make sure everything works
 488 correctly.
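
The overall shape of the split can be approximated in a single file. In the sketch below, an inline `minigrep` module stands in for the library crate in *src/lib.rs*, and the code after it plays the role of *src/main.rs*; a real project would use two files instead. The sample file written to the temporary directory and the printed messages are hypothetical.

```rust
// Stand-in for src/lib.rs: everything the binary needs is marked `pub`.
mod minigrep {
    use std::error::Error;
    use std::fs;

    pub struct Config {
        pub query: String,
        pub filename: String,
    }

    impl Config {
        pub fn new(args: &[String]) -> Result<Config, &'static str> {
            if args.len() < 3 {
                return Err("not enough arguments");
            }

            Ok(Config {
                query: args[1].clone(),
                filename: args[2].clone(),
            })
        }
    }

    pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
        let contents = fs::read_to_string(config.filename)?;
        println!("With text:\n{}", contents);
        Ok(())
    }
}

// Stand-in for src/main.rs from here on.
use minigrep::Config;
use std::process;

fn main() {
    // Hypothetical sample file so the sketch runs without real input.
    let path = std::env::temp_dir().join("minigrep_sketch_poem.txt");
    std::fs::write(&path, "I'm nobody! Who are you?\n").expect("could not write sample file");

    let args = vec![
        String::from("minigrep"),
        String::from("nobody"),
        path.to_string_lossy().into_owned(),
    ];

    let config = Config::new(&args).unwrap_or_else(|err| {
        println!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

    println!("Searching for {}", config.query);

    // `run` is reached through the crate (here: module) name.
    if let Err(e) = minigrep::run(config) {
        println!("Application error: {}", e);
        process::exit(1);
    }
}
```

Removing any of the `pub` keywords in the module makes the corresponding item unreachable from `main`, which is the same visibility rule that applies across the real crate boundary.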
 489
 490 Whew! That was a lot of work, but we’ve set ourselves up for success in the
 491 future. Now it’s much easier to handle errors, and we’ve made the code more
 492 modular. Almost all of our work will be done in *src/lib.rs* from here on out.
 493
 494 Let’s take advantage of this newfound modularity by doing something that would
 495 have been difficult with the old code but is easy with the new code: we’ll
 496 write some tests!
 497
 498 [ch13]: ch13-00-functional-features.html
 499 [ch9-custom-types]: ch09-03-to-panic-or-not-to-panic.html#creating-custom-types-for-validation
 500 [ch9-error-guidelines]: ch09-03-to-panic-or-not-to-panic.html#guidelines-for-error-handling
 501 [ch9-result]: ch09-02-recoverable-errors-with-result.html
 502 [ch17]: ch17-00-oop.html
 503 [ch9-question-mark]: ch09-02-recoverable-errors-with-result.html#a-shortcut-for-propagating-errors-the--operator