<!-- DO NOT EDIT THIS FILE.

This file is periodically generated from the content in the `/src/`
directory, so all fixes need to be made in `/src/`.
-->

[TOC]

# An I/O Project: Building a Command Line Program

This chapter is a recap of the many skills you’ve learned so far and an
exploration of a few more standard library features. We’ll build a command line
tool that interacts with file and command line input/output to practice some of
the Rust concepts you now have under your belt.

Rust’s speed, safety, single binary output, and cross-platform support make it
an ideal language for creating command line tools, so for our project, we’ll
make our own version of the classic command line search tool `grep`
(**g**lobally search a **r**egular **e**xpression and **p**rint). In the
simplest use case, `grep` searches a specified file for a specified string. To
do so, `grep` takes as its arguments a file path and a string. Then it reads
the file, finds lines in that file that contain the string argument, and prints
those lines.
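
The core of that behavior, which keeps only the lines that contain a query string, fits in a few lines. The sketch below assumes a plain case-sensitive substring match with no regular expression support. The `matching_lines` name is hypothetical and is not part of the chapter's code.

```
/// Hypothetical helper: return every line of `contents` that contains
/// `query` (case-sensitive substring match, no regex support).
fn matching_lines<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines() // splits on `\n` and strips a trailing `\r`
        .filter(|line| line.contains(query))
        .collect()
}
```

The returned slices borrow from `contents`, so the signature ties their lifetime to that argument rather than to `query`.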

Along the way, we’ll show how to make our command line tool use the terminal
features that many other command line tools use. We’ll read the value of an
environment variable to allow the user to configure the behavior of our tool.
We’ll also print error messages to the standard error console stream (`stderr`)
instead of standard output (`stdout`), so, for example, the user can redirect
successful output to a file while still seeing error messages onscreen.
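
To make the `stdout`/`stderr` split concrete, here is a minimal sketch, not the chapter's code, of a hypothetical `report` helper. It sends successful results to one writer and errors to another. The writers are taken as parameters so the behavior can be checked. In a real program, a caller would pass `std::io::stdout()` and `std::io::stderr()`.

```
use std::io::{self, Write};

/// Hypothetical helper: write `Ok` text to `out` and `Err` text to `err`.
/// With `out` = stdout and `err` = stderr, redirecting stdout to a file
/// still leaves error messages visible on the terminal.
fn report<W: Write, E: Write>(
    result: Result<&str, &str>,
    out: &mut W,
    err: &mut E,
) -> io::Result<()> {
    match result {
        Ok(line) => writeln!(out, "{line}"),
        Err(msg) => writeln!(err, "Problem: {msg}"),
    }
}
```

The standard library's `eprintln!` macro is the shorthand for printing a line to `stderr` when you don't need this kind of indirection.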

One Rust community member, Andrew Gallant, has already created a fully
featured, very fast version of `grep`, called `ripgrep`. By comparison, our
version will be fairly simple, but this chapter will give you some of the
background knowledge you need to understand a real-world project such as
`ripgrep`.

Our `grep` project will combine a number of concepts you’ve learned so far:

* Organizing code (using what you learned about modules in Chapter 7)
* Using vectors and strings (collections, Chapter 8)
* Handling errors (Chapter 9)
* Using traits and lifetimes where appropriate (Chapter 10)
* Writing tests (Chapter 11)

We’ll also briefly introduce closures, iterators, and trait objects, which
Chapters 13 and 17 will cover in detail.

## Accepting Command Line Arguments

Let’s create a new project with, as always, `cargo new`. We’ll call our project
`minigrep` to distinguish it from the `grep` tool that you might already have
on your system.

```
$ cargo new minigrep
     Created binary (application) `minigrep` project
$ cd minigrep
```

The first task is to make `minigrep` accept its two command line arguments: the
file path and a string to search for. That is, we want to be able to run our
program with `cargo run`, two hyphens to indicate the following arguments are
for our program rather than for `cargo`, a string to search for, and a path to
a file to search in, like so:

```
$ cargo run -- searchstring example-filename.txt
```

Right now, the program generated by `cargo new` cannot process arguments we
give it. Some existing libraries on *https://crates.io/* can help with writing
a program that accepts command line arguments, but because you’re just learning
this concept, let’s implement this capability ourselves.

### Reading the Argument Values

To enable `minigrep` to read the values of command line arguments we pass to
it, we’ll need the `std::env::args` function provided in Rust’s standard
library. This function returns an iterator of the command line arguments passed
to `minigrep`. We’ll cover iterators fully in Chapter 13. For now, you only
need to know two details about iterators: iterators produce a series of values,
and we can call the `collect` method on an iterator to turn it into a
collection, such as a vector, that contains all the elements the iterator
produces.
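
As a small illustration of those two details, the sketch below consumes an iterator of `String` values in two ways. It transforms each value as the iterator produces it, and it collects the values unchanged into a vector. Both helper names are hypothetical. They accept any `Iterator<Item = String>`, so a program could pass `env::args()` and a test can pass a hand-built list.

```
/// Hypothetical helper: label each value with its position as the
/// iterator produces it, then collect the labels into a `Vec`.
fn numbered<I: Iterator<Item = String>>(args: I) -> Vec<String> {
    args.enumerate()
        .map(|(i, arg)| format!("{i}: {arg}"))
        .collect()
}

/// Hypothetical helper: `collect` gathers every produced value into a `Vec`.
fn gather<I: Iterator<Item = String>>(args: I) -> Vec<String> {
    args.collect()
}
```

In `main`, `gather(env::args())` would build the same vector as `env::args().collect()`.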

The code in Listing 12-1 allows your `minigrep` program to read any command
line arguments passed to it and then collect the values into a vector.

Filename: src/main.rs

```
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    dbg!(args);
}
```

Listing 12-1: Collecting the command line arguments into a vector and printing
them

First, we bring the `std::env` module into scope with a `use` statement so we
can use its `args` function. Notice that the `std::env::args` function is
nested in two levels of modules. As we discussed in Chapter 7, in cases where
the desired function is nested in more than one module, we’ve chosen to bring
the parent module into scope rather than the function. By doing so, we can
easily use other functions from `std::env`. It’s also less ambiguous than
adding `use std::env::args` and then calling the function with just `args`,
because `args` might easily be mistaken for a function that’s defined in the
current module.

> ### The `args` Function and Invalid Unicode
>
> Note that `std::env::args` will panic if any argument contains invalid
> Unicode. If your program needs to accept arguments containing invalid
> Unicode, use `std::env::args_os` instead. That function returns an iterator
> that produces `OsString` values instead of `String` values. We’ve chosen to
> use `std::env::args` here for simplicity, because `OsString` values differ
> per platform and are more complex to work with than `String` values.
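
If you do use `args_os`, one way to get printable text back is `OsStr::to_string_lossy`. It replaces invalid sequences with U+FFFD instead of panicking. Here is a minimal sketch with a hypothetical `lossy_args` helper. Whether lossy conversion is acceptable depends on your program, because a converted path containing invalid Unicode no longer names the original file.

```
use std::ffi::OsString;

/// Hypothetical helper: convert `OsString` arguments (as produced by
/// `std::env::args_os`) to `String`s without panicking. Invalid Unicode
/// is replaced with U+FFFD REPLACEMENT CHARACTER.
fn lossy_args<I: Iterator<Item = OsString>>(args: I) -> Vec<String> {
    args.map(|arg| arg.to_string_lossy().into_owned()).collect()
}
```

In a program, you would call it as `lossy_args(std::env::args_os())`.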

On the first line of `main`, we call `env::args`, and we immediately use
`collect` to turn the iterator into a vector containing all the values produced
by the iterator. We can use the `collect` function to create many kinds of
collections, so we explicitly annotate the type of `args` to specify that we
want a vector of strings. Although we very rarely need to annotate types in
Rust, `collect` is one function you do often need to annotate because Rust
isn’t able to infer the kind of collection you want.
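
The annotation can go on the variable, or on `collect` itself using the "turbofish" (`::<>`) syntax. The same call also builds a different collection when you ask for one. The short sketch below uses hypothetical function names and works on a string slice, so it can be checked without real command line input.

```
use std::collections::HashSet;

/// Annotating the binding tells `collect` which collection to build.
fn to_vec_annotated(words: &[&str]) -> Vec<String> {
    let collected: Vec<String> = words.iter().map(|w| w.to_string()).collect();
    collected
}

/// Same result, with the target type given via the turbofish instead.
fn to_vec_turbofish(words: &[&str]) -> Vec<String> {
    words.iter().map(|w| w.to_string()).collect::<Vec<String>>()
}

/// Asking for a `HashSet` instead drops duplicate values.
fn to_set(words: &[&str]) -> HashSet<String> {
    words.iter().map(|w| w.to_string()).collect()
}
```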

Finally, we print the vector using the `dbg!` macro. Let’s try running the code
first with no arguments and then with two arguments:

```
$ cargo run
--snip--
[src/main.rs:5] args = [
    "target/debug/minigrep",
]
```

```
$ cargo run -- needle haystack
--snip--
[src/main.rs:5] args = [
    "target/debug/minigrep",
    "needle",
    "haystack",
]
```

Notice that the first value in the vector is `"target/debug/minigrep"`, which
is the name of our binary. This matches the behavior of the arguments list in
C, letting programs use the name by which they were invoked in their execution.
It’s often convenient to have access to the program name in case you want to
print it in messages or change behavior of the program based on what command
line alias was used to invoke the program. But for the purposes of this
chapter, we’ll ignore it and save only the two arguments we need.

### Saving the Argument Values in Variables

The program is currently able to access the values specified as command line
arguments. Now we need to save the values of the two arguments in variables so
we can use the values throughout the rest of the program. We do that in Listing
12-2.

Filename: src/main.rs

```
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();

    let query = &args[1];
    let file_path = &args[2];

    println!("Searching for {}", query);
    println!("In file {}", file_path);
}
```

Listing 12-2: Creating variables to hold the query argument and file path
argument

As we saw when we printed the vector, the program’s name takes up the first
value in the vector at `args[0]`, so we’re starting arguments at index `1`. The
first argument `minigrep` takes is the string we’re searching for, so we put a
reference to the first argument in the variable `query`. The second argument
will be the file path, so we put a reference to the second argument in the
variable `file_path`.
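
Indexing with `args[1]` and `args[2]` is the simplest approach. Slices also offer methods that separate the program name from the rest without hard-coded indices. One alternative, sketched here but not used in this chapter, is `split_first`. It returns the first element and the remaining slice, or `None` if the slice is empty. The `split_program_name` name is hypothetical.

```
/// Hypothetical helper: separate `args[0]` (the program name) from the
/// user-supplied arguments. Returns `None` only if `args` is empty.
fn split_program_name(args: &[String]) -> Option<(&str, &[String])> {
    let (program, rest) = args.split_first()?;
    Some((program.as_str(), rest))
}
```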

We temporarily print the values of these variables to prove that the code is
working as we intend. Let’s run this program again with the arguments `test`
and `sample.txt`:

```
$ cargo run -- test sample.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep test sample.txt`
Searching for test
In file sample.txt
```

Great, the program is working! The values of the arguments we need are being
saved into the right variables. Later we’ll add some error handling to deal
with certain potential erroneous situations, such as when the user provides no
arguments; for now, we’ll ignore that situation and work on adding file-reading
capabilities instead.

## Reading a File

Now we’ll add functionality to read the file specified in the `file_path`
argument. First, we need a sample file to test it with: we’ll use a file with a
small amount of text over multiple lines with some repeated words. Listing 12-3
has an Emily Dickinson poem that will work well! Create a file called
*poem.txt* at the root level of your project, and enter the poem “I’m Nobody!
Who are you?”

Filename: poem.txt

```
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!
```

Listing 12-3: A poem by Emily Dickinson makes a good test case

With the text in place, edit *src/main.rs* and add code to read the file, as
shown in Listing 12-4.

Filename: src/main.rs

```
use std::env;
[1] use std::fs;

fn main() {
    // --snip--
    println!("In file {}", file_path);

    [2] let contents = fs::read_to_string(file_path)
        .expect("Should have been able to read the file");

    [3] println!("With text:\n{contents}");
}
```

Listing 12-4: Reading the contents of the file specified by the second argument

First, we bring in a relevant part of the standard library with a `use`
statement: we need `std::fs` to handle files [1].

In `main`, the new statement `fs::read_to_string` takes the `file_path`, opens
that file, and returns a `std::io::Result<String>` of the file’s contents [2].

After that, we again add a temporary `println!` statement that prints the value
of `contents` after the file is read, so we can check that the program is
working so far [3].

Let’s run this code with any string as the first command line argument (because
we haven’t implemented the searching part yet) and the *poem.txt* file as the
second argument:

```
$ cargo run -- the poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep the poem.txt`
Searching for the
In file poem.txt
With text:
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!
```

Great! The code read and then printed the contents of the file. But the code
has a few flaws. At the moment, the `main` function has multiple
responsibilities: generally, functions are clearer and easier to maintain if
each function is responsible for only one idea. The other problem is that we’re
not handling errors as well as we could. The program is still small, so these
flaws aren’t a big problem, but as the program grows, it will be harder to fix
them cleanly. It’s good practice to begin refactoring early on when developing
a program, because it’s much easier to refactor smaller amounts of code. We’ll
do that next.

## Refactoring to Improve Modularity and Error Handling

To improve our program, we’ll fix four problems that have to do with the
program’s structure and how it’s handling potential errors. First, our `main`
function now performs two tasks: it parses arguments and reads files. As our
program grows, the number of separate tasks the `main` function handles will
increase. As a function gains responsibilities, it becomes more difficult to
reason about, harder to test, and harder to change without breaking one of its
parts. It’s best to separate functionality so each function is responsible for
one task.

This issue also ties into the second problem: although `query` and `file_path`
are configuration variables to our program, variables like `contents` are used
to perform the program’s logic. The longer `main` becomes, the more variables
we’ll need to bring into scope; the more variables we have in scope, the harder
it will be to keep track of the purpose of each. It’s best to group the
configuration variables into one structure to make their purpose clear.

The third problem is that we’ve used `expect` to print an error message when
reading the file fails, but the error message just prints `Should have been
able to read the file`. Reading a file can fail in a number of ways: for
example, the file could be missing, or we might not have permission to open it.
Right now, regardless of the situation, we’d print the same error message for
everything, which wouldn’t give the user any information!
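
The error value already says what went wrong. Here is a sketch, assuming a per-kind message is what you want, of inspecting the `std::io::Error` from `fs::read_to_string` through its `kind` method instead of discarding it with `expect`. The `read_for_user` name and its messages are hypothetical. The chapter's own approach to reporting errors comes later.

```
use std::fs;
use std::io::ErrorKind;

/// Hypothetical helper: read a file, turning each failure into a message
/// that says what actually went wrong.
fn read_for_user(file_path: &str) -> Result<String, String> {
    fs::read_to_string(file_path).map_err(|e| match e.kind() {
        ErrorKind::NotFound => format!("{file_path}: no such file"),
        ErrorKind::PermissionDenied => format!("{file_path}: permission denied"),
        // Any other kind (for example, InvalidData for non-UTF-8 contents):
        // fall back to the error's own description.
        _ => format!("{file_path}: {e}"),
    })
}
```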

Fourth, we use `expect` repeatedly to handle different errors, and if the user
runs our program without specifying enough arguments, they’ll get an `index out
of bounds` error from Rust that doesn’t clearly explain the problem. It would
be best if all the error-handling code were in one place so future maintainers
had only one place to consult if the error-handling logic needed to change.
Having all the error-handling code in one place will also ensure that we’re
printing messages that will be meaningful to our end users.
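
The following sketch shows one way to avoid the `index out of bounds` panic by using `slice::get`. It returns an `Option` instead of panicking, so a missing argument can become a message you choose. The `arg_at` helper and its wording are hypothetical.

```
/// Hypothetical helper: fetch argument `index`, or explain what is missing.
fn arg_at<'a>(args: &'a [String], index: usize, name: &str) -> Result<&'a str, String> {
    args.get(index) // `None` instead of a panic when out of bounds
        .map(|s| s.as_str())
        .ok_or_else(|| format!("missing required argument: {name}"))
}
```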

Let’s address these four problems by refactoring our project.

### Separation of Concerns for Binary Projects

The organizational problem of allocating responsibility for multiple tasks to
the `main` function is common to many binary projects. As a result, the Rust
community has developed guidelines for splitting the separate concerns of a
binary program when `main` starts getting large. This process has the following
steps:

* Split your program into a *main.rs* and a *lib.rs* and move your program’s
  logic to *lib.rs*.
* As long as your command line parsing logic is small, it can remain in
  *main.rs*.
* When the command line parsing logic starts getting complicated, extract it
  from *main.rs* and move it to *lib.rs*.

The responsibilities that remain in the `main` function after this process
should be limited to the following:

* Calling the command line parsing logic with the argument values
* Setting up any other configuration
* Calling a `run` function in *lib.rs*
* Handling the error if `run` returns an error

This pattern is about separating concerns: *main.rs* handles running the
program, and *lib.rs* handles all the logic of the task at hand. Because you
can’t test the `main` function directly, this structure lets you test all of
your program’s logic by moving it into functions in *lib.rs*. The code that
remains in *main.rs* will be small enough to verify its correctness by reading
it. Let’s rework our program by following this process.
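
Roughly, the end state of that process might look like the sketch below. To keep it in one self-contained block, the would-be *lib.rs* contents live in an inline module named `minigrep_lib`, which is a stand-in assumed here and not a real crate. A hypothetical `entry` function stands in for `main`. It returns an exit code instead of calling `std::process::exit`, so it can be exercised directly. Having `run` return `Result<_, Box<dyn Error>>` is one common choice, not the only one.

```
mod minigrep_lib {
    // Everything in this module would live in src/lib.rs and be testable.
    use std::error::Error;
    use std::fs;

    pub struct Config {
        pub query: String,
        pub file_path: String,
    }

    /// Parsing logic: returns an error instead of panicking on bad input.
    pub fn parse_config(args: &[String]) -> Result<Config, String> {
        match args {
            [_, query, file_path, ..] => Ok(Config {
                query: query.clone(),
                file_path: file_path.clone(),
            }),
            _ => Err("usage: minigrep <query> <file_path>".to_string()),
        }
    }

    /// Program logic: read the file and keep the matching lines.
    pub fn run(config: &Config) -> Result<Vec<String>, Box<dyn Error>> {
        let contents = fs::read_to_string(&config.file_path)?;
        Ok(contents
            .lines()
            .filter(|line| line.contains(config.query.as_str()))
            .map(String::from)
            .collect())
    }
}

/// Stand-in for `main`: parse, call `run`, and handle any error.
/// Returns the exit code a real `main` would pass to `process::exit`.
fn entry(args: &[String]) -> i32 {
    let config = match minigrep_lib::parse_config(args) {
        Ok(config) => config,
        Err(msg) => {
            eprintln!("Problem parsing arguments: {msg}");
            return 1;
        }
    };
    match minigrep_lib::run(&config) {
        Ok(lines) => {
            for line in lines {
                println!("{line}");
            }
            0
        }
        Err(e) => {
            eprintln!("Application error: {e}");
            1
        }
    }
}
```

`entry` only wires the pieces together, which is the short, easily read remainder the guidelines aim for.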

#### Extracting the Argument Parser

We’ll extract the functionality for parsing arguments into a function that
`main` will call to prepare for moving the command line parsing logic to
*src/lib.rs*. Listing 12-5 shows the new start of `main` that calls a new
function `parse_config`, which we’ll define in *src/main.rs* for the moment.

Filename: src/main.rs

```
fn main() {
    let args: Vec<String> = env::args().collect();

    let (query, file_path) = parse_config(&args);

    // --snip--
}

fn parse_config(args: &[String]) -> (&str, &str) {
    let query = &args[1];
    let file_path = &args[2];

    (query, file_path)
}
```

Listing 12-5: Extracting a `parse_config` function from `main`

We’re still collecting the command line arguments into a vector, but instead of
assigning the argument value at index 1 to the variable `query` and the
argument value at index 2 to the variable `file_path` within the `main`
function, we pass the whole vector to the `parse_config` function. The
`parse_config` function then holds the logic that determines which argument
goes in which variable and passes the values back to `main`. We still create
the `query` and `file_path` variables in `main`, but `main` no longer has the
responsibility of determining how the command line arguments and variables
correspond.
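
The signature `fn parse_config(args: &[String]) -> (&str, &str)` compiles without lifetime annotations because of the elision rules from Chapter 10. When a function has exactly one input lifetime, that lifetime is assigned to every output reference. The sketch below writes the same function with explicit lifetimes. It has a hypothetical name to avoid clashing with the listing, and it makes clear that both returned slices borrow from `args`.

```
/// Explicit-lifetime version of Listing 12-5's `parse_config`:
/// both returned `&str`s borrow from `args`, so they cannot outlive it.
fn parse_config_explicit<'a>(args: &'a [String]) -> (&'a str, &'a str) {
    let query = &args[1]; // `&'a String`, deref-coerced to `&'a str` on return
    let file_path = &args[2];

    (query, file_path)
}
```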

This rework may seem like overkill for our small program, but we’re refactoring
in small, incremental steps. After making this change, run the program again to
verify that the argument parsing still works. It’s good to check your progress
often, to help identify the cause of problems when they occur.

#### Grouping Configuration Values

We can take another small step to improve the `parse_config` function further.
At the moment, we’re returning a tuple, but then we immediately break that
tuple into individual parts again. This is a sign that perhaps we don’t have
the right abstraction yet.

Another indicator that shows there’s room for improvement is the `config` part
of `parse_config`, which implies that the two values we return are related and
are both part of one configuration value. We’re not currently conveying this
meaning in the structure of the data other than by grouping the two values into
a tuple; we’ll instead put the two values into one struct and give each of the
struct fields a meaningful name. Doing so will make it easier for future
maintainers of this code to understand how the different values relate to each
other and what their purpose is.

Listing 12-6 shows the improvements to the `parse_config` function.

Filename: src/main.rs

```
fn main() {
    let args: Vec<String> = env::args().collect();

    [1] let config = parse_config(&args);

    [2] println!("Searching for {}", config.query);
    [3] println!("In file {}", config.file_path);

    [4] let contents = fs::read_to_string(config.file_path)
        .expect("Should have been able to read the file");

    // --snip--
}

[5] struct Config {
    query: String,
    file_path: String,
}

[6] fn parse_config(args: &[String]) -> Config {
    [7] let query = args[1].clone();
    [8] let file_path = args[2].clone();

    Config { query, file_path }
}
```

Listing 12-6: Refactoring `parse_config` to return an instance of a `Config`
struct

We’ve added a struct named `Config` defined to have fields named `query` and
`file_path` [5]. The signature of `parse_config` now indicates that it returns a
`Config` value [6]. In the body of `parse_config`, where we used to return
string slices that reference `String` values in `args`, we now define `Config`
to contain owned `String` values. The `args` variable in `main` is the owner of
the argument values and is only letting the `parse_config` function borrow
them, which means we’d violate Rust’s borrowing rules if `Config` tried to take
ownership of the values in `args`.

There are a number of ways we could manage the `String` data; the easiest,
though somewhat inefficient, route is to call the `clone` method on the values
[7][8]. This will make a full copy of the data for the `Config` instance to
own, which takes more time and memory than storing a reference to the string
data. However, cloning the data also makes our code very straightforward
because we don’t have to manage the lifetimes of the references; in this
circumstance, giving up a little performance to gain simplicity is a worthwhile
trade-off.

> ### The Trade-Offs of Using `clone`
>
> There’s a tendency among many Rustaceans to avoid using `clone` to fix
> ownership problems because of its runtime cost. In Chapter 13, you’ll learn
> how to use more efficient methods in this type of situation. But for now,
> it’s okay to copy a few strings to continue making progress because you’ll
> make these copies only once and your file path and query string are very
> small. It’s better to have a working program that’s a bit inefficient than to
> try to hyperoptimize code on your first pass. As you become more experienced
> with Rust, it’ll be easier to start with the most efficient solution, but for
> now, it’s perfectly acceptable to call `clone`.
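
For comparison, the "storing a reference" option mentioned earlier would look roughly like the sketch below. The struct borrows from `args` instead of cloning, at the cost of a lifetime parameter that follows it everywhere it goes. The `BorrowedConfig` and `parse_borrowed_config` names are hypothetical. This chapter sticks with the cloning version.

```
/// Hypothetical borrowing variant of `Config`: no copies are made, but the
/// struct cannot outlive the `args` vector it points into.
struct BorrowedConfig<'a> {
    query: &'a str,
    file_path: &'a str,
}

fn parse_borrowed_config(args: &[String]) -> BorrowedConfig<'_> {
    BorrowedConfig {
        query: &args[1], // `&String` coerces to `&str` in field initializers
        file_path: &args[2],
    }
}
```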

We’ve updated `main` so it places the instance of `Config` returned by
`parse_config` into a variable named `config` [1], and we updated the code that
previously used the separate `query` and `file_path` variables so it now uses
the fields on the `Config` struct instead [2][3][4].

Now our code more clearly conveys that `query` and `file_path` are related and
that their purpose is to configure how the program will work. Any code that
uses these values knows to find them in the `config` instance in the fields
named for their purpose.

#### Creating a Constructor for `Config`

So far, we’ve extracted the logic responsible for parsing the command line
arguments from `main` and placed it in the `parse_config` function. Doing so
helped us to see that the `query` and `file_path` values were related and that
relationship should be conveyed in our code. We then added a `Config` struct to
name the related purpose of `query` and `file_path` and to let the
`parse_config` function return the values under field names that describe them.

So now that the purpose of the `parse_config` function is to create a `Config`
instance, we can change `parse_config` from a plain function to a function
named `new` that is associated with the `Config` struct. Making this change
will make the code more idiomatic. We can create instances of types in the
standard library, such as `String`, by calling `String::new`. Similarly, by
changing `parse_config` into a `new` function associated with `Config`, we’ll
be able to create instances of `Config` by calling `Config::new`. Listing 12-7
shows the changes we need to make.

Filename: src/main.rs

```
fn main() {
    let args: Vec<String> = env::args().collect();

    [1] let config = Config::new(&args);

    // --snip--
}

// --snip--

[2] impl Config {
    [3] fn new(args: &[String]) -> Config {
        let query = args[1].clone();
        let file_path = args[2].clone();

        Config { query, file_path }
    }
}
```

Listing 12-7: Changing `parse_config` into `Config::new`

We’ve updated `main` where we were calling `parse_config` to instead call
`Config::new` [1]. We’ve changed the name of `parse_config` to `new` [3] and
moved it within an `impl` block [2], which associates the `new` function with
`Config`. Try compiling this code again to make sure it works.

### Fixing the Error Handling

Now we’ll work on fixing our error handling. Recall that attempting to access
the values in the `args` vector at index 1 or index 2 will cause the program to
panic if the vector contains fewer than three items. Try running the program
without any arguments; it will look like this:

```
$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep`
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/main.rs:27:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

The line `index out of bounds: the len is 1 but the index is 1` is an error
message intended for programmers. It won’t help our end users understand what
they should do instead. Let’s fix that now.

#### Improving the Error Message

In Listing 12-8, we add a check in the `new` function that will verify that the
slice is long enough before accessing indices 1 and 2. If the slice isn’t long
enough, the program panics and displays a better error message.

Filename: src/main.rs

```
    // --snip--
    fn new(args: &[String]) -> Config {
        if args.len() < 3 {
            panic!("not enough arguments");
        }
        // --snip--
```

Listing 12-8: Adding a check for the number of arguments

This code is similar to the `Guess::new` function we wrote in Listing 9-13,
where we called `panic!` when the `value` argument was out of the range of
valid values. Instead of checking for a range of values here, we’re checking
that the length of `args` is at least 3, so the rest of the function can
operate under the assumption that this condition has been met. If `args` has
fewer than three items, the `if` condition is true, and we call the `panic!`
macro to end the program immediately.
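To see the check in context, here is a minimal, self-contained sketch of a complete `new` with the length check in place. It assumes `Config` has only the `query` and `file_path` fields used so far in this chapter. The point is that once the check has passed, indexing at 1 and 2 can no longer panic.

```rust
/// Sketch of the chapter's `Config`, assuming only the two fields used so far.
pub struct Config {
    pub query: String,
    pub file_path: String,
}

impl Config {
    /// Panics with a readable message if too few arguments were given.
    /// Index 0 holds the program name, so we need at least three items.
    pub fn new(args: &[String]) -> Config {
        if args.len() < 3 {
            panic!("not enough arguments");
        }

        // The check above guarantees that indices 1 and 2 exist.
        let query = args[1].clone();
        let file_path = args[2].clone();

        Config { query, file_path }
    }
}
```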

With these extra few lines of code in `new`, let’s run the program without any
arguments again to see what the error looks like now:

```
$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep`
thread 'main' panicked at 'not enough arguments', src/main.rs:26:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

This output is better: we now have a reasonable error message. However, we also
have extraneous information we don’t want to give to our users. Perhaps the
technique we used in Listing 9-13 isn’t the best one to use here: a call to
`panic!` is more appropriate for a programming problem than a usage problem, as
discussed in Chapter 9. Instead, we’ll use the other technique you learned
about in Chapter 9: returning a `Result` that indicates either success or an
error.

#### Returning a `Result` Instead of Calling `panic!`

We can instead return a `Result` value that will contain a `Config` instance in
the successful case and will describe the problem in the error case. We’re also
going to change the function name from `new` to `build` because many
programmers expect `new` functions to never fail. When `Config::build` is
communicating to `main`, we can use the `Result` type to signal there was a
problem. Then we can change `main` to convert an `Err` variant into a more
practical error for our users without the surrounding text about `thread
'main'` and `RUST_BACKTRACE` that a call to `panic!` causes.

Listing 12-9 shows the changes we need to make to the return value of the
function we’re now calling `Config::build` and the body of the function needed
to return a `Result`. Note that this won’t compile until we update `main` as
well, which we’ll do in the next listing.

Filename: src/main.rs

```
impl Config {
    fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        Ok(Config { query, file_path })
    }
}
```

<!---

Similar to above, I think having infallible constructors are a bit more Rust-y.
For times where you need to construct and that construction can potentially
fail, we should use a different name than `new` to key people in that they
aren't getting a constructor but instead they're maybe getting the type they
want (or maybe an error).

/JT --->
<!-- Ok, you've convinced me to change the name from `new` to `build` even
though I don't think the "constructors should be infallible" philosophy is
universal. /Carol -->

Listing 12-9: Returning a `Result` from `Config::build`

Our `build` function returns a `Result` with a `Config` instance in the success
case and a `&'static str` in the error case. Our error values will always be
string literals that have the `'static` lifetime.

We’ve made two changes in the body of the function: instead of calling `panic!`
when the user doesn’t pass enough arguments, we now return an `Err` value, and
we’ve wrapped the `Config` return value in an `Ok`. These changes make the
function conform to its new type signature.

Returning an `Err` value from `Config::build` allows the `main` function to
handle the `Result` value returned from the `build` function and exit the
process more cleanly in the error case.
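Assembled into one piece, Listing 12-9 and the surrounding definitions might look like the following sketch. It again assumes the two-field `Config` from earlier. Deriving `Debug` is an addition made here only so the error case can be inspected with `unwrap_err`:

```rust
/// Sketch of the chapter's `Config`; `Debug` is derived only for this sketch
/// so that `unwrap_err` can be called on the result.
#[derive(Debug)]
pub struct Config {
    pub query: String,
    pub file_path: String,
}

impl Config {
    /// Returns `Err` instead of panicking when arguments are missing.
    pub fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            // A string literal has the `'static` lifetime, matching the signature.
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        Ok(Config { query, file_path })
    }
}
```

Both outcomes are now ordinary values, so the caller decides what a failure means for the user.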

#### Calling `Config::build` and Handling Errors

To handle the error case and print a user-friendly message, we need to update
`main` to handle the `Result` being returned by `Config::build`, as shown in
Listing 12-10. We’ll also take the responsibility of exiting the command line
tool with a nonzero error code away from `panic!` and instead implement it by
hand. A nonzero exit status is a convention to signal to the process that
called our program that the program exited with an error state.

Filename: src/main.rs

```
[1] use std::process;

fn main() {
    let args: Vec<String> = env::args().collect();

    [2] let config = Config::build(&args).unwrap_or_else([3]|err[4]| {
        [5] println!("Problem parsing arguments: {err}");
        [6] process::exit(1);
    });

    // --snip--
```

Listing 12-10: Exiting with an error code if building a `Config` fails

In this listing, we’ve used a method we haven’t covered in detail yet:
`unwrap_or_else`, which is defined on `Result<T, E>` by the standard library
[2]. Using `unwrap_or_else` allows us to define some custom, non-`panic!` error
handling. If the `Result` is an `Ok` value, this method’s behavior is similar
to `unwrap`: it returns the inner value `Ok` is wrapping. However, if the value
is an `Err` value, this method calls the code in the *closure*, which is an
anonymous function we define and pass as an argument to `unwrap_or_else` [3].
We’ll cover closures in more detail in Chapter 13. For now, you just need to
know that `unwrap_or_else` will pass the inner value of the `Err`, which in
this case is the static string `"not enough arguments"` that we added in
Listing 12-9, to our closure in the argument `err` that appears between the
vertical pipes [4]. The code in the closure can then use the `err` value when
it runs.
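The following sketch looks at `unwrap_or_else` on its own. It uses a hypothetical `parse_or_zero` function (not part of `minigrep`) whose closure returns a fallback value instead of exiting. As in Listing 12-10, the closure receives the `Err` payload through the parameter between the vertical pipes:

```rust
/// Hypothetical helper (not part of minigrep): parses `input` as an `i32`,
/// falling back to 0 when parsing fails.
fn parse_or_zero(input: &str) -> i32 {
    // On `Ok`, `unwrap_or_else` returns the wrapped number directly.
    // On `Err`, it calls the closure with the error bound to `err`,
    // and the closure's return value (here, 0) is used instead.
    input.parse::<i32>().unwrap_or_else(|err| {
        println!("Problem parsing {input:?}: {err}");
        0
    })
}
```

In Listing 12-10, the closure calls `process::exit` instead. That function never returns, and its return type `!` can stand in for any type, so the closure still fits where a `Config` is expected.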

We’ve added a new `use` line to bring `process` from the standard library into
scope [1]. The code in the closure that will be run in the error case is only
two lines: we print the `err` value [5] and then call `process::exit` [6]. The
`process::exit` function will stop the program immediately and return the
number that was passed as the exit status code. This is similar to the
`panic!`-based handling we used in Listing 12-8, but we no longer get all the
extra output. Let’s try it:

```
$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.48s
     Running `target/debug/minigrep`
Problem parsing arguments: not enough arguments
```

Great! This output is much friendlier for our users.

### Extracting Logic from `main`

Now that we’ve finished refactoring the configuration parsing, let’s turn to
the program’s logic. As we stated in “Separation of Concerns for Binary
Projects”, we’ll extract a function named `run` that will hold all the logic
currently in the `main` function that isn’t involved with setting up
configuration or handling errors. When we’re done, `main` will be concise and
easy to verify by inspection, and we’ll be able to write tests for all the
other logic.

Listing 12-11 shows the extracted `run` function. For now, we’re just making
the small, incremental improvement of extracting the function. We’re still
defining the function in *src/main.rs*.

Filename: src/main.rs

```
fn main() {
    // --snip--

    println!("Searching for {}", config.query);
    println!("In file {}", config.file_path);

    run(config);
}

fn run(config: Config) {
    let contents = fs::read_to_string(config.file_path)
        .expect("Should have been able to read the file");

    println!("With text:\n{contents}");
}

// --snip--
```

Listing 12-11: Extracting a `run` function containing the rest of the program
logic

The `run` function now contains all the remaining logic from `main`, starting
from reading the file. The `run` function takes the `Config` instance as an
argument.

#### Returning Errors from the `run` Function

With the remaining program logic separated into the `run` function, we can
improve the error handling, as we did with `Config::build` in Listing 12-9.
Instead of allowing the program to panic by calling `expect`, the `run`
function will return a `Result<T, E>` when something goes wrong. This will let
us further consolidate the logic around handling errors into `main` in a
user-friendly way. Listing 12-12 shows the changes we need to make to the
signature and body of `run`.

Filename: src/main.rs

```
[1] use std::error::Error;

// --snip--

[2] fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?[3];

    println!("With text:\n{contents}");

    [4] Ok(())
}
```

Listing 12-12: Changing the `run` function to return `Result`

We’ve made three significant changes here. First, we changed the return type of
the `run` function to `Result<(), Box<dyn Error>>` [2]. This function previously
returned the unit type, `()`, and we keep that as the value returned in the
`Ok` case.

For the error type, we used the *trait object* `Box<dyn Error>` (and we’ve
brought `std::error::Error` into scope with a `use` statement at the top [1]).
We’ll cover trait objects in Chapter 17. For now, just know that `Box<dyn
Error>` means the function will return a type that implements the `Error`
trait, but we don’t have to specify what particular type the return value will
be. This gives us flexibility to return error values that may be of different
types in different error cases. The `dyn` keyword is short for “dynamic.”
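As an illustration of that flexibility, the hypothetical `read_number` function below can fail in two unrelated ways. Reading the file produces a `std::io::Error`, and parsing its contents produces a `std::num::ParseIntError`. Both types implement `Error`, so both fit in the same `Box<dyn Error>` return type. The function and the one-number-per-file format are assumptions made for this sketch:

```rust
use std::error::Error;
use std::fs;

/// Hypothetical helper: reads a file that is expected to hold one integer.
fn read_number(path: &str) -> Result<i64, Box<dyn Error>> {
    // Fails with a `std::io::Error` if the file can't be read.
    let text = fs::read_to_string(path)?;
    // Fails with a `std::num::ParseIntError` if the text isn't an integer.
    let number = text.trim().parse::<i64>()?;
    Ok(number)
}
```

A caller that needs to know which failure happened can call `downcast_ref` on the boxed error. For `minigrep`, printing the message is enough.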

Second, we’ve removed the call to `expect` in favor of the `?` operator [3], as
we talked about in Chapter 9. Rather than `panic!` on an error, `?` will return
the error value from the current function for the caller to handle.
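Roughly speaking, `?` on a `Result` works like the `match` in the second function below. It unwraps an `Ok` value, and for an `Err` it converts the error with `From::from` and returns early. Both functions are hypothetical helpers for this sketch. The hand-written version is a simplification, not the compiler's exact expansion:

```rust
use std::error::Error;
use std::fs;

/// Hypothetical helper using the `?` operator.
fn read_with_question_mark(path: &str) -> Result<String, Box<dyn Error>> {
    let contents = fs::read_to_string(path)?;
    Ok(contents)
}

/// The same logic with the early return written out by hand.
fn read_with_match(path: &str) -> Result<String, Box<dyn Error>> {
    let contents = match fs::read_to_string(path) {
        Ok(text) => text,
        // `From::from` turns the `io::Error` into a `Box<dyn Error>`.
        Err(e) => return Err(From::from(e)),
    };
    Ok(contents)
}
```

That `From::from` conversion is why `?` can turn a `std::io::Error` into the `Box<dyn Error>` that `run` returns.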

Third, the `run` function now returns an `Ok` value in the success case [4].
We’ve declared the `run` function’s success type as `()` in the signature,
which means we need to wrap the unit type value in the `Ok` value. This
`Ok(())` syntax might look a bit strange at first, but using `()` like this is
the idiomatic way to indicate that we’re calling `run` for its side effects
only; it doesn’t return a value we need.

When you run this code, it will compile but will display a warning:

```
warning: unused `Result` that must be used
  --> src/main.rs:19:5
   |
19 |     run(config);
   |     ^^^^^^^^^^^^
   |
   = note: `#[warn(unused_must_use)]` on by default
   = note: this `Result` may be an `Err` variant, which should be handled
```

Rust tells us that our code ignored the `Result` value and that the `Result`
value might indicate that an error occurred. But we’re not checking to see
whether or not there was an error, and the compiler reminds us that we probably
meant to have some error-handling code here! Let’s rectify that problem now.

#### Handling Errors Returned from `run` in `main`

We’ll check for errors and handle them using a technique similar to one we used
with `Config::build` in Listing 12-10, but with a slight difference:

Filename: src/main.rs

```
fn main() {
    // --snip--

    println!("Searching for {}", config.query);
    println!("In file {}", config.file_path);

    if let Err(e) = run(config) {
        println!("Application error: {e}");

        process::exit(1);
    }
}
```

We use `if let` rather than `unwrap_or_else` to check whether `run` returns an
`Err` value and call `process::exit(1)` if it does. The `run` function doesn’t
return a value that we want to `unwrap` in the same way that `Config::build`
returns the `Config` instance. Because `run` returns `()` in the success case,
we only care about detecting an error, so we don’t need `unwrap_or_else` to
return the unwrapped value, which would only be `()`.

The body of the `if let` block and the body of the closure we passed to
`unwrap_or_else` do the same thing: we print the error and exit.
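The sketch below isolates that decision so it can be checked without ending the process. `fake_run` and `exit_code` are hypothetical stand-ins. In the real `main`, the body of the `if let` calls `process::exit(1)` rather than returning a number.

```rust
use std::error::Error;

/// Hypothetical stand-in for `run`: fails when `should_fail` is true.
fn fake_run(should_fail: bool) -> Result<(), Box<dyn Error>> {
    if should_fail {
        // A `&str` converts into a `Box<dyn Error>` via `From`.
        return Err("something went wrong".into());
    }
    Ok(())
}

/// Mirrors the `if let` in `main`: report the error and choose an exit status.
fn exit_code(result: Result<(), Box<dyn Error>>) -> i32 {
    if let Err(e) = result {
        println!("Application error: {e}");
        return 1;
    }
    // Nothing to unwrap in the success case: `Ok(())` carries no data.
    0
}
```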

### Splitting Code into a Library Crate

Our `minigrep` project is looking good so far! Now we’ll split the
*src/main.rs* file and put some code into the *src/lib.rs* file. That way we
can test the code and have a *src/main.rs* file with fewer responsibilities.

Let’s move all the code that isn’t the `main` function from *src/main.rs* to
*src/lib.rs*:

* The `run` function definition
* The relevant `use` statements
* The definition of `Config`
* The `Config::build` function definition

The contents of *src/lib.rs* should have the signatures shown in Listing 12-13
(we’ve omitted the bodies of the functions for brevity). Note that this won’t
compile until we modify *src/main.rs* in Listing 12-14.

Filename: src/lib.rs

```
use std::error::Error;
use std::fs;

pub struct Config {
    pub query: String,
    pub file_path: String,
}

impl Config {
    pub fn build(args: &[String]) -> Result<Config, &'static str> {
        // --snip--
    }
}

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    // --snip--
}
```

Listing 12-13: Moving `Config` and `run` into *src/lib.rs*

We’ve made liberal use of the `pub` keyword: on `Config`, on its fields and its
`build` method, and on the `run` function. We now have a library crate that has
a public API we can test!

Now we need to bring the code we moved to *src/lib.rs* into the scope of the
binary crate in *src/main.rs*, as shown in Listing 12-14.

Filename: src/main.rs

```
use std::env;
use std::process;

use minigrep::Config;

fn main() {
    // --snip--
    if let Err(e) = minigrep::run(config) {
        // --snip--
    }
}
```

Listing 12-14: Using the `minigrep` library crate in *src/main.rs*

We add a `use minigrep::Config` line to bring the `Config` type from the
library crate into the binary crate’s scope, and we prefix the `run` function
with our crate name. Now all the functionality should be connected and should
work. Run the program with `cargo run` and make sure everything works
correctly.

Whew! That was a lot of work, but we’ve set ourselves up for success in the
future. Now it’s much easier to handle errors, and we’ve made the code more
modular. Almost all of our work will be done in *src/lib.rs* from here on out.

Let’s take advantage of this newfound modularity by doing something that would
have been difficult with the old code but is easy with the new code: we’ll
write some tests!

## Developing the Library’s Functionality with Test-Driven Development

Now that we’ve extracted the logic into *src/lib.rs* and left the argument
collecting and error handling in *src/main.rs*, it’s much easier to write tests
for the core functionality of our code. We can call functions directly with
various arguments and check return values without having to call our binary
from the command line.

In this section, we’ll add the searching logic to the `minigrep` program
using the test-driven development (TDD) process with the following steps:

1. Write a test that fails and run it to make sure it fails for the reason you
   expect.
2. Write or modify just enough code to make the new test pass.
3. Refactor the code you just added or changed and make sure the tests
   continue to pass.
4. Repeat from step 1!

Though it’s just one of many ways to write software, TDD can help drive code
design. Writing the test before you write the code that makes the test pass
helps to maintain high test coverage throughout the process.

We’ll test drive the implementation of the functionality that will actually do
the searching for the query string in the file contents and produce a list of
lines that match the query. We’ll add this functionality in a function called
`search`.

### Writing a Failing Test

Because we don’t need them anymore, let’s remove the `println!` statements from
*src/lib.rs* and *src/main.rs* that we used to check the program’s behavior.
Then, in *src/lib.rs*, add a `tests` module with a test function, as we did in
Chapter 11. The test function specifies the behavior we want the `search`
function to have: it will take a query and the text to search, and it will
return only the lines from the text that contain the query. Listing 12-15 shows
this test, which won’t compile yet.

Filename: src/lib.rs

```
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn one_result() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.";

        assert_eq!(vec!["safe, fast, productive."], search(query, contents));
    }
}
```

Listing 12-15: Creating a failing test for the `search` function we wish we had

This test searches for the string `"duct"`. The text we’re searching is three
lines, only one of which contains `"duct"`. (Note that the backslash after the
opening double quote tells Rust not to put a newline character at the beginning
of the contents of this string literal.) We assert that the value returned from
the `search` function contains only the line we expect.
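The backslash behavior is easy to check in isolation. In a Rust string literal, a `\` at the end of a line skips the newline and any leading whitespace on the next line, so the first two functions in this sketch return identical strings. The function names are made up for this illustration:

```rust
/// The three-line text from Listing 12-15, written with a line continuation.
fn with_continuation() -> &'static str {
    "\
Rust:
safe, fast, productive.
Pick three."
}

/// The same text written with explicit `\n` escapes.
fn with_escapes() -> &'static str {
    "Rust:\nsafe, fast, productive.\nPick three."
}

/// Leading whitespace after a continuation is skipped as well.
fn indented_continuation() -> &'static str {
    "\
        Rust:"
}
```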

We aren’t yet able to run this test and watch it fail because the test doesn’t
even compile: the `search` function doesn’t exist yet! In accordance with TDD
principles, we’ll add just enough code to get the test to compile and run by
adding a definition of the `search` function that always returns an empty
vector, as shown in Listing 12-16. Then the test should compile and fail
because an empty vector doesn’t match a vector containing the line `"safe,
fast, productive."`

Filename: src/lib.rs

```
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    vec![]
}
```

Listing 12-16: Defining just enough of the `search` function so our test will
compile

Notice that we need to define an explicit lifetime `'a` in the signature of
`search` and use that lifetime with the `contents` argument and the return
value. Recall from Chapter 10 that the lifetime parameters specify which
argument lifetime is connected to the lifetime of the return value. In this
case, we indicate that the returned vector should contain string slices that
reference slices of the argument `contents` (rather than the argument `query`).

In other words, we tell Rust that the data returned by the `search` function
will live as long as the data passed into the `search` function in the
`contents` argument. This is important! The data referenced *by* a slice needs
to be valid for the reference to be valid; if the compiler assumes we’re making
string slices of `query` rather than `contents`, it will do its safety checking
incorrectly.
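One way to see what the `'a` annotation buys us: the returned slices borrow only from `contents`, so they stay usable after `query` is gone. This sketch copies the finished `search` from Listing 12-19. It adds a hypothetical `matches_for` helper whose query is a local `String`, dropped before the results are used:

```rust
/// `search` as it ends up in Listing 12-19: results borrow from `contents` only.
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();
    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }
    results
}

/// Hypothetical helper: the query is an owned `String` local to this function.
fn matches_for<'a>(word: &str, contents: &'a str) -> Vec<&'a str> {
    let query = String::from(word);
    // Returning the results is allowed because they borrow `contents`,
    // not `query`, which is dropped when this function returns.
    search(&query, contents)
}
```

Now suppose `search` used the signature the compiler suggests in the error message that follows, where both parameters share `'a`. Then `matches_for` would not compile, because its results would be treated as possibly borrowing from the local `query`.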

If we forget the lifetime annotations and try to compile this function, we’ll
get this error:

```
error[E0106]: missing lifetime specifier
  --> src/lib.rs:28:51
   |
28 | pub fn search(query: &str, contents: &str) -> Vec<&str> {
   |                      ----            ----         ^ expected named lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `query` or `contents`
help: consider introducing a named lifetime parameter
   |
28 | pub fn search<'a>(query: &'a str, contents: &'a str) -> Vec<&'a str> {
   |              ++++         ++                 ++              ++
```

Rust can’t possibly know which of the two arguments we need, so we need to tell
it explicitly. Because `contents` is the argument that contains all of our text
and we want to return the parts of that text that match, we know `contents` is
the argument that should be connected to the return value using the lifetime
syntax.

Other programming languages don’t require you to connect arguments to return
values in the signature, but this practice will get easier over time. You might
want to compare this example with the “Validating References with Lifetimes”
section in Chapter 10.

Now let’s run the test:

```
$ cargo test
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished test [unoptimized + debuginfo] target(s) in 0.97s
     Running unittests (target/debug/deps/minigrep-9cd200e5fac0fc94)

running 1 test
test tests::one_result ... FAILED

failures:

---- tests::one_result stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `["safe, fast, productive."]`,
 right: `[]`', src/lib.rs:44:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::one_result

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass '--lib'
```

Great, the test fails, exactly as we expected. Let’s get the test to pass!

### Writing Code to Pass the Test

Currently, our test is failing because we always return an empty vector. To fix
that and implement `search`, our program needs to follow these steps:

* Iterate through each line of the contents.
* Check whether the line contains our query string.
* If it does, add it to the list of values we’re returning.
* If it doesn’t, do nothing.
* Return the list of results that match.

Let’s work through each step, starting with iterating through lines.

#### Iterating Through Lines with the `lines` Method

Rust has a helpful method to handle line-by-line iteration of strings,
conveniently named `lines`, that works as shown in Listing 12-17. Note this
won’t compile yet.

Filename: src/lib.rs

```
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    for line in contents.lines() {
        // do something with line
    }
}
```

Listing 12-17: Iterating through each line in `contents`

The `lines` method returns an iterator. We’ll talk about iterators in depth in
Chapter 13, but recall that you saw this way of using an iterator in Listing
3-5, where we used a `for` loop with an iterator to run some code on each item
in a collection.
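Here is a quick sketch of what `lines` yields. Each item is a `&str` slice of the original string with its line terminator removed. It splits on both `\n` and `\r\n`, and a final newline does not produce an extra empty line. The `collect_lines` helper is hypothetical:

```rust
/// Hypothetical helper: collects the lines of `text` as slices borrowed from it.
fn collect_lines(text: &str) -> Vec<&str> {
    // Each item from `lines()` excludes its trailing `\n` or `\r\n`.
    text.lines().collect()
}
```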

#### Searching Each Line for the Query

Next, we’ll check whether the current line contains our query string.
Fortunately, strings have a helpful method named `contains` that does this for
us! Add a call to the `contains` method in the `search` function, as shown in
Listing 12-18. Note this still won’t compile yet.

Filename: src/lib.rs

```
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    for line in contents.lines() {
        if line.contains(query) {
            // do something with line
        }
    }
}
```

Listing 12-18: Adding functionality to see whether the line contains the string
in `query`

At the moment, we’re building up functionality. To get it to compile, we need
to return a value from the body as we indicated we would in the function
signature.
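Keep in mind that `contains` does an exact substring comparison, so the search we’re building is case-sensitive. To make that visible, the hypothetical `count_matches` function in this sketch uses the same loop and `contains` call as Listing 12-18, but it returns a count so that it compiles:

```rust
/// Hypothetical helper: counts the lines of `contents` that contain `query`.
fn count_matches(query: &str, contents: &str) -> usize {
    let mut count = 0;
    for line in contents.lines() {
        // `contains` performs an exact, case-sensitive substring check.
        if line.contains(query) {
            count += 1;
        }
    }
    count
}
```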

#### Storing Matching Lines

To finish this function, we need a way to store the matching lines that we want
to return. For that, we can make a mutable vector before the `for` loop and
call the `push` method to store a `line` in the vector. After the `for` loop,
we return the vector, as shown in Listing 12-19.

Filename: src/lib.rs

```
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }

    results
}
```

Listing 12-19: Storing the lines that match so we can return them

Now the `search` function should return only the lines that contain `query`,
and our test should pass. Let’s run the test:

```
$ cargo test
--snip--
running 1 test
test tests::one_result ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```

Our test passed, so we know it works!

At this point, we could consider opportunities for refactoring the
implementation of the `search` function while keeping the tests passing to
maintain the same functionality. The code in the `search` function isn’t too
bad, but it doesn’t take advantage of some useful features of iterators. We’ll
return to this example in Chapter 13, where we’ll explore iterators in detail
and look at how to improve it.
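As a preview, one possible iterator-based refactoring replaces the mutable vector and the `for` loop with the `filter` and `collect` methods. Treat this as a sketch of the idea rather than the exact version Chapter 13 arrives at. The test from Listing 12-15 is reused unchanged to confirm the behavior stays the same:

```rust
/// Same contract as Listing 12-19, written with iterator adapters.
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines()
        // Keep only the lines that contain the query.
        .filter(|line| line.contains(query))
        // Gather the surviving slices into a `Vec<&'a str>`.
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn one_result() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.";

        assert_eq!(vec!["safe, fast, productive."], search(query, contents));
    }
}
```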

#### Using the `search` Function in the `run` Function

Now that the `search` function is working and tested, we need to call `search`
from our `run` function. We need to pass the `config.query` value and the
`contents` that `run` reads from the file to the `search` function. Then `run`
will print each line returned from `search`:

Filename: src/lib.rs

```
pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    for line in search(&config.query, &contents) {
        println!("{line}");
    }

    Ok(())
}
```

We’re still using a `for` loop to iterate over each line returned from
`search` and print it.
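Because `run` prints with `println!`, its output is hard to check from a test. One common alternative is to pass in any destination that implements `std::io::Write`. This sketch assumes you’re free to change `run`’s signature. The name `run_to` is hypothetical, and `Config` and `search` are repeated so the block stands alone:

```rust
use std::error::Error;
use std::fs;
use std::io::Write;

pub struct Config {
    pub query: String,
    pub file_path: String,
}

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents.lines().filter(|line| line.contains(query)).collect()
}

/// Hypothetical variant of `run` that writes matches to `out` instead of stdout.
pub fn run_to<W: Write>(config: &Config, out: &mut W) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(&config.file_path)?;

    for line in search(&config.query, &contents) {
        // `writeln!` returns an `io::Result`, so `?` propagates write failures too.
        writeln!(out, "{line}")?;
    }

    Ok(())
}
```

Calling `run_to(&config, &mut std::io::stdout())` from `main` would print the same lines `run` does today. A test can pass a `Vec<u8>` instead and inspect what was written.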

Now the entire program should work! Let’s try it out, first with a word that
should return exactly one line from the Emily Dickinson poem, “frog”:

```
$ cargo run -- frog poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.38s
     Running `target/debug/minigrep frog poem.txt`
How public, like a frog
```

Cool! Now let’s try a word that will match multiple lines, like “body”:

```
$ cargo run -- body poem.txt
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep body poem.txt`
I'm nobody! Who are you?
Are you nobody, too?
How dreary to be somebody!
```

And finally, let’s make sure that we don’t get any lines when we search for a
word that isn’t anywhere in the poem, such as “monomorphization”:

```
$ cargo run -- monomorphization poem.txt
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep monomorphization poem.txt`
```

1334 | Excellent! We’ve built our own mini version of a classic tool and learned a lot | |
1335 | about how to structure applications. We’ve also learned a bit about file input | |
1336 | and output, lifetimes, testing, and command line parsing. | |
1337 | ||
1338 | To round out this project, we’ll briefly demonstrate how to work with | |
1339 | environment variables and how to print to standard error, both of which are | |
1340 | useful when you’re writing command line programs. | |
1341 | ||
1342 | ## Working with Environment Variables | |
1343 | ||
1344 | We’ll improve `minigrep` by adding an extra feature: an option for | |
1345 | case-insensitive searching that the user can turn on via an environment | |
1346 | variable. We could make this feature a command line option and require that | |
04454e1e FG |
1347 | users enter it each time they want it to apply, but by instead making it an |
1348 | environment variable, we allow our users to set the environment variable once | |
1349 | and have all their searches be case insensitive in that terminal session. | |
a2a8927a XL |
1350 | |
### Writing a Failing Test for the Case-Insensitive `search` Function

We first add a new `search_case_insensitive` function that will be called when
the environment variable has a value. We’ll continue to follow the TDD process,
so the first step is again to write a failing test. We’ll add a new test for
the new `search_case_insensitive` function and rename our old test from
`one_result` to `case_sensitive` to clarify the differences between the two
tests, as shown in Listing 12-20.

Filename: src/lib.rs

```
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn case_sensitive() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.
Duct tape.";

        assert_eq!(vec!["safe, fast, productive."], search(query, contents));
    }

    #[test]
    fn case_insensitive() {
        let query = "rUsT";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";

        assert_eq!(
            vec!["Rust:", "Trust me."],
            search_case_insensitive(query, contents)
        );
    }
}
```

Listing 12-20: Adding a new failing test for the case-insensitive function
we’re about to add

Note that we’ve edited the old test’s `contents` too. We’ve added a new line
with the text `"Duct tape."` using a capital D that shouldn’t match the query
`"duct"` when we’re searching in a case-sensitive manner. Changing the old test
in this way helps ensure that we don’t accidentally break the case-sensitive
search functionality that we’ve already implemented. This test should pass now
and should continue to pass as we work on the case-insensitive search.

The new test for the case-*insensitive* search uses `"rUsT"` as its query. In
the `search_case_insensitive` function we’re about to add, the query `"rUsT"`
should match the line containing `"Rust:"` with a capital R and match the line
`"Trust me."` even though both have different casing from the query. This is
our failing test, and it will fail to compile because we haven’t yet defined
the `search_case_insensitive` function. Feel free to add a skeleton
implementation that always returns an empty vector, similar to the way we did
for the `search` function in Listing 12-16, to see the test compile and fail.

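A skeleton like the following is enough to get the new test to compile. It is a sketch modeled on the placeholder `search` from Listing 12-16: because it always returns an empty vector, the `case_insensitive` test compiles and then fails on its assertion, which is the failing state TDD asks for at this step. The compiler will warn about the unused parameters until the real body is written.

```rust
// Placeholder modeled on Listing 12-16: it compiles but finds nothing,
// so a test that expects matches will fail until we implement it.
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    vec![]
}

fn main() {
    let found = search_case_insensitive("rUsT", "Rust:\nTrust me.");
    // Prints "0 matches" until the real implementation is written.
    println!("{} matches", found.len());
}
```
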
### Implementing the `search_case_insensitive` Function

The `search_case_insensitive` function, shown in Listing 12-21, will be almost
the same as the `search` function. The only difference is that we’ll lowercase
the `query` and each `line` so whatever the case of the input arguments,
they’ll be the same case when we check whether the line contains the query.

Filename: src/lib.rs

```
pub fn search_case_insensitive<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    let query = query.to_lowercase(); // [1]
    let mut results = Vec::new();

    for line in contents.lines() {
        // [2] lowercase the line; [3] borrow the lowercased query with `&`
        if line.to_lowercase().contains(&query) {
            results.push(line);
        }
    }

    results
}
```

Listing 12-21: Defining the `search_case_insensitive` function to lowercase the
query and the line before comparing them

First, we lowercase the `query` string and store it in a shadowed variable with
the same name [1]. Calling `to_lowercase` on the query is necessary so that no
matter whether the user’s query is `"rust"`, `"RUST"`, `"Rust"`, or `"rUsT"`,
we’ll treat the query as if it were `"rust"` and be insensitive to the case.
While `to_lowercase` will handle basic Unicode, it won’t be 100% accurate,
because lowercasing isn’t the same as full Unicode case folding. If we were
writing a real application, we’d want to do a bit more work here, but this
section is about environment variables, not Unicode, so we’ll leave it at that.

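To see one way plain lowercasing falls short, consider the German letter `ß`, which full Unicode case folding treats as equivalent to `ss`. The sketch below uses only the standard library’s `to_lowercase` and `to_uppercase`; the street name is made-up sample data. Lowercasing leaves `ß` unchanged, so a query of `STRASSE` doesn’t match a line containing `straße`, even though a German reader would expect it to.

```rust
fn main() {
    // Lowercasing maps "SS" to "ss" but leaves "ß" unchanged.
    let query = "STRASSE".to_lowercase(); // "strasse"
    let line = "Hauptstraße 1".to_lowercase(); // "hauptstraße 1"

    // No match: "hauptstraße 1" does not contain "strasse".
    println!("match: {}", line.contains(&query));

    // Uppercasing goes the other way and turns "ß" into "SS", showing
    // that these simple case mappings are not symmetric.
    println!("{}", "straße".to_uppercase()); // "STRASSE"
}
```

The standard library doesn’t offer full case folding, so a production tool would need additional work (or a dedicated Unicode library) to treat such strings as equal.
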
Note that `query` is now a `String` rather than a string slice, because calling
`to_lowercase` creates new data rather than referencing existing data. Say the
query is `"rUsT"`, as an example: that string slice doesn’t contain a lowercase
`u` or `t` for us to use, so we have to allocate a new `String` containing
`"rust"`. When we pass `query` as an argument to the `contains` method now, we
need to add an ampersand [3] because `contains` accepts a string slice (or a
reference to a `String`) as its pattern, not an owned `String`.

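The following sketch isolates that borrow. `to_lowercase` returns an owned `String`. `contains` accepts `&String` and `&str` as patterns, but not an owned `String`, so we pass `&query`. Calling `as_str` is an equivalent alternative that spells out the conversion to `&str`.

```rust
fn main() {
    // `to_lowercase` allocates a new, owned `String`.
    let query: String = "rUsT".to_lowercase();
    let line = "trust me.";

    // Borrowing the `String` gives a `&String`, which works as a pattern.
    println!("{}", line.contains(&query));
    // `as_str` produces a `&str` explicitly; the result is the same.
    println!("{}", line.contains(query.as_str()));

    // `line.contains(query)` would not compile: an owned `String`
    // is not one of the pattern types `contains` accepts.
}
```
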
Next, we add a call to `to_lowercase` on each `line` to lowercase all
characters [2]. Now that we’ve converted `line` and `query` to lowercase, we’ll
find matches no matter what the case of the query is.

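As a quick check of that claim, the sketch below repeats Listing 12-21 (with the callout markers removed) and runs it with several spellings of the same query. Every spelling should yield the same lines. Those lines also keep their original casing, because the function lowercases only temporary copies for the comparison and pushes the untouched `line` slice.

```rust
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let query = query.to_lowercase();
    let mut results = Vec::new();

    for line in contents.lines() {
        // Compare lowercased copies, but keep the original slice.
        if line.to_lowercase().contains(&query) {
            results.push(line);
        }
    }

    results
}

fn main() {
    let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";

    for query in ["rust", "RUST", "Rust", "rUsT"] {
        // Each query prints ["Rust:", "Trust me."].
        println!("{query}: {:?}", search_case_insensitive(query, contents));
    }
}
```
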
Let’s see if this implementation passes the tests:

```
running 2 tests
test tests::case_insensitive ... ok
test tests::case_sensitive ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```

Great! They passed. Now, let’s call the new `search_case_insensitive` function
from the `run` function. First, we’ll add a configuration option to the
`Config` struct to switch between case-sensitive and case-insensitive search.
Adding this field will cause compiler errors because we aren’t initializing
this field anywhere yet:

<!-- JT: I decided to change the field name and the environment variable to be
called `ignore_case` to avoid some double-negative confusion detailed in this
issue: https://github.com/rust-lang/book/issues/1898 I'd love your thoughts
especially on the names and logic throughout this section! Thank you!!
/Carol -->
<!---

I've left a few comments. I think the name reads okay.

I think my recurring thought here (which you'll see in a few places), is if
we want to make a constructor, to do more work outside of the constructor so
that the constructor itself is infallible.

Or, we could rename the function to something like `build_config` or something.
Then it feels a bit more like "well sure, it's possible it could fail to build
if I give it something wrong".

Not sure if we have space here, but in a real-world version of your example we'd
probably use the builder pattern, so you could optionally grab the value
from the environment or not. When you're writing tests, things that find their
settings from the env tend to be a pain as you have to incorporate that into
the testing. But if you can use the builder pattern, you can use another way
of configuring the value that is more test-friendly.

/JT --->
<!-- While I do love the builder pattern, that would be a bigger change than I
want to make right now to explain it thoroughly. /Carol -->

Filename: src/lib.rs

```
pub struct Config {
    pub query: String,
    pub file_path: String,
    pub ignore_case: bool,
}
```

We added the `ignore_case` field that holds a Boolean. Next, we need the
`run` function to check the `ignore_case` field’s value and use that to
decide whether to call the `search` function or the `search_case_insensitive`
function, as shown in Listing 12-22. This still won’t compile.

Filename: src/lib.rs

```
pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    let results = if config.ignore_case {
        search_case_insensitive(&config.query, &contents)
    } else {
        search(&config.query, &contents)
    };

    for line in results {
        println!("{line}");
    }

    Ok(())
}
```

Listing 12-22: Calling either `search` or `search_case_insensitive` based on
the value in `config.ignore_case`

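The `if`/`else` in Listing 12-22 chooses which function to call and calls it in the same step. As an alternative design, you could choose the function once and keep it as a function pointer. The sketch below does this with a hypothetical `pick_search` helper that isn’t part of `minigrep`. Both search functions have the same signature, so each one coerces to the same `fn` pointer type. The two search bodies are written with iterators for brevity but behave like the chapter’s versions.

```rust
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents.lines().filter(|line| line.contains(query)).collect()
}

pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let query = query.to_lowercase();
    contents
        .lines()
        .filter(|line| line.to_lowercase().contains(&query))
        .collect()
}

// Hypothetical helper: returns the search function to use. The `for<'a>`
// says the pointer works for any lifetime of `contents`.
fn pick_search(ignore_case: bool) -> for<'a> fn(&str, &'a str) -> Vec<&'a str> {
    if ignore_case {
        search_case_insensitive
    } else {
        search
    }
}

fn main() {
    let contents = "Rust:\nsafe, fast, productive.\nPick three.\nTrust me.";
    let search_fn = pick_search(true);
    for line in search_fn("rUsT", contents) {
        println!("{line}");
    }
}
```

Either form works. The function pointer version is mainly useful if the same choice is applied in several places.
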
Finally, we need to check for the environment variable. The functions for
working with environment variables are in the `env` module in the standard
library, so we bring that module into scope at the top of *src/lib.rs*. Then
we’ll use the `var` function from the `env` module to check whether any value
has been set for an environment variable named `IGNORE_CASE`, as shown in
Listing 12-23.

Filename: src/lib.rs

```
use std::env;
// --snip--

impl Config {
    pub fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        let ignore_case = env::var("IGNORE_CASE").is_ok();

        Ok(Config {
            query,
            file_path,
            ignore_case,
        })
    }
}
```

<!---

Same comment on this one, too. We can largely avoid confusion here I think by
just not naming it `new`.

Taking in the args as a slice of Strings also feels less Rust-y than having
names for the two parameters that will be part of the Config.

/JT --->
<!-- I've changed the name from `new` to `build`, and we change the way this
gets arguments somewhat in chapter 13. The *real* Rusty way would be to use an
argument parser crate like clap... and I don't want to use external crates in
the book any more than the one usage of rand in chapter 2 :) /Carol -->

Listing 12-23: Checking for any value in an environment variable named
`IGNORE_CASE`

Here, we create a new variable, `ignore_case`. To set its value, we call the
`env::var` function and pass it the name of the `IGNORE_CASE` environment
variable. The `env::var` function returns a `Result` that will be the
successful `Ok` variant that contains the value of the environment variable if
the environment variable is set to any value. It will return the `Err` variant
if the environment variable is not set (or if its value isn’t valid Unicode).

We’re using the `is_ok` method on the `Result` to check whether the environment
variable is set, which means the program should do a case-insensitive search.
If the `IGNORE_CASE` environment variable isn’t set to anything, `is_ok` will
return `false` and the program will perform a case-sensitive search. We don’t
care about the *value* of the environment variable, just whether it’s set or
unset, so we’re checking `is_ok` rather than using `unwrap`, `expect`, or any
of the other methods we’ve seen on `Result`.

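Two consequences are worth making concrete. First, because only `is_ok` is checked, `IGNORE_CASE=0` also turns on case-insensitive search. Second, `env::var` returns `Err(VarError::NotPresent)` when the variable is unset and `Err(VarError::NotUnicode(_))` when its value isn’t valid Unicode. The sketch below uses two hypothetical helpers, `ignore_case_from` and `ignore_case_strict`, that take the `Result` from `env::var` as a parameter. This design keeps the decision testable without modifying the process environment. `ignore_case_strict` is an assumed stricter policy that accepts only `1` or `true`; the chapter doesn’t require it.

```rust
use std::env::{self, VarError};

// Hypothetical helper mirroring Listing 12-23: any readable value,
// including "0", means "ignore case".
fn ignore_case_from(var: Result<String, VarError>) -> bool {
    var.is_ok()
}

// Hypothetical stricter policy: only "1" or "true" turn the option on.
fn ignore_case_strict(var: Result<String, VarError>) -> bool {
    matches!(var.as_deref(), Ok("1") | Ok("true"))
}

fn main() {
    // Reads the real environment, just as `Config::build` does.
    println!("lenient: {}", ignore_case_from(env::var("IGNORE_CASE")));
    println!("strict: {}", ignore_case_strict(env::var("IGNORE_CASE")));
}
```

If you only care whether the variable exists, even with a non-Unicode value, `env::var_os` returns an `Option<OsString>` instead of a `Result`.
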
We pass the value in the `ignore_case` variable to the `Config` instance so the
`run` function can read that value and decide whether to call
`search_case_insensitive` or `search`, as we implemented in Listing 12-22.

Let’s give it a try! First, we’ll run our program without the environment
variable set and with the query `to`, which should match any line that contains
the lowercase string “to”:

```
$ cargo run -- to poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep to poem.txt`
Are you nobody, too?
How dreary to be somebody!
```

Looks like that still works! Now let’s run the program with `IGNORE_CASE`
set to `1` but with the same query, `to`:

```
$ IGNORE_CASE=1 cargo run -- to poem.txt
```

If you’re using PowerShell, you will need to set the environment variable and
run the program as separate commands:

```
PS> $Env:IGNORE_CASE=1; cargo run -- to poem.txt
```

This will make `IGNORE_CASE` persist for the remainder of your shell
session. It can be unset with the `Remove-Item` cmdlet:

```
PS> Remove-Item Env:IGNORE_CASE
```

We should get lines that contain “to” that might have uppercase letters:

```
Are you nobody, too?
How dreary to be somebody!
To tell your name the livelong day
To an admiring bog!
```

Excellent, we also got lines containing “To”! Our `minigrep` program can now do
case-insensitive searching controlled by an environment variable. Now you know
how to manage options set using either command line arguments or environment
variables.

Some programs allow arguments *and* environment variables for the same
configuration. In those cases, the programs decide that one or the other takes
precedence. For another exercise on your own, try controlling case sensitivity
through either a command line argument or an environment variable. Decide
whether the command line argument or the environment variable should take
precedence if the program is run with one set to case sensitive and one set to
ignore case.

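One possible shape for that exercise is sketched below under stated assumptions. The `--ignore-case` and `--case-sensitive` flags and the `resolve_ignore_case` function are hypothetical and not part of `minigrep`. This version lets an explicit command line flag win over the environment variable. Representing the flag as an `Option<bool>` keeps “not given” distinct from “given as false”.

```rust
use std::env;

// Hypothetical rule: an explicit command line choice wins; otherwise
// fall back to whether IGNORE_CASE is set.
fn resolve_ignore_case(cli_choice: Option<bool>, env_is_set: bool) -> bool {
    cli_choice.unwrap_or(env_is_set)
}

fn main() {
    let args: Vec<String> = env::args().collect();

    // Assumed flags for illustration only.
    let cli_choice = if args.iter().any(|a| a == "--ignore-case") {
        Some(true)
    } else if args.iter().any(|a| a == "--case-sensitive") {
        Some(false)
    } else {
        None
    };

    let env_is_set = env::var("IGNORE_CASE").is_ok();
    println!("ignore_case = {}", resolve_ignore_case(cli_choice, env_is_set));
}
```

Making the environment variable win instead would only require changing `resolve_ignore_case`, which is one reason to keep the rule in a small function of its own.
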
The `std::env` module contains many more useful features for dealing with
environment variables: check out its documentation to see what is available.

## Writing Error Messages to Standard Error Instead of Standard Output

At the moment, we’re writing all of our output to the terminal using the
`println!` macro. In most terminals, there are two kinds of output: *standard
output* (`stdout`) for general information and *standard error* (`stderr`) for
error messages. This distinction enables users to choose to direct the
successful output of a program to a file but still print error messages to the
screen.

The `println!` macro is only capable of printing to standard output, so we
have to use something else to print to standard error.

### Checking Where Errors Are Written

First, let’s observe how the content printed by `minigrep` is currently being
written to standard output, including any error messages we want to write to
standard error instead. We’ll do that by redirecting the standard output stream
to a file while intentionally causing an error. We won’t redirect the standard
error stream, so any content sent to standard error will continue to display on
the screen.

Command line programs are expected to send error messages to the standard error
stream so we can still see error messages on the screen even if we redirect the
standard output stream to a file. Our program is not currently well-behaved:
we’re about to see that it saves the error message output to a file instead!

To demonstrate this behavior, we’ll run the program with `>` and the file path,
*output.txt*, that we want to redirect the standard output stream to. We won’t
pass any arguments, which should cause an error:

```
$ cargo run > output.txt
```

The `>` syntax tells the shell to write the contents of standard output to
*output.txt* instead of the screen. We didn’t see the error message we were
expecting printed to the screen, so that means it must have ended up in the
file. This is what *output.txt* contains:

```
Problem parsing arguments: not enough arguments
```

Yup, our error message is being printed to standard output. It’s much more
useful for error messages like this to be printed to standard error so only
data from a successful run ends up in the file. We’ll change that.

### Printing Errors to Standard Error

We’ll use the code in Listing 12-24 to change how error messages are printed.
Because of the refactoring we did earlier in this chapter, all the code that
prints error messages is in one function, `main`. The standard library provides
the `eprintln!` macro that prints to the standard error stream, so let’s change
the two places we were calling `println!` to print errors to use `eprintln!`
instead.

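For comparison, `eprintln!` behaves much like writing a line with `writeln!` to `std::io::stderr()`, except that `eprintln!` panics if the write fails, while `writeln!` returns an `io::Result`. The sketch below uses a hypothetical `report_error` helper that writes to any `Write` implementor. Passing `io::stderr()` sends the message to standard error, and passing a `Vec<u8>` captures it so a test can inspect it.

```rust
use std::io::{self, Write};

// Hypothetical helper: writes the parsing error to any writer.
fn report_error<W: Write>(out: &mut W, err: &str) -> io::Result<()> {
    writeln!(out, "Problem parsing arguments: {err}")
}

fn main() {
    let err = "not enough arguments";

    // Goes to standard error, so `cargo run > output.txt` won't capture it.
    eprintln!("Problem parsing arguments: {err}");

    // Roughly equivalent; `report_error` returns an `io::Result`, which
    // we handle with `expect` here.
    report_error(&mut io::stderr(), err).expect("failed to write to stderr");
}
```
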
Filename: src/main.rs

```
use std::env;
use std::process;

use minigrep::Config;

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::build(&args).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    if let Err(e) = minigrep::run(config) {
        eprintln!("Application error: {e}");

        process::exit(1);
    }
}
```

Listing 12-24: Writing error messages to standard error instead of standard
output using `eprintln!`

Let’s now run the program again in the same way, without any arguments and
redirecting standard output with `>`:

```
$ cargo run > output.txt
Problem parsing arguments: not enough arguments
```

Now we see the error onscreen and *output.txt* contains nothing, which is the
behavior we expect of command line programs.

Let’s run the program again with arguments that don’t cause an error but still
redirect standard output to a file, like so:

```
$ cargo run -- to poem.txt > output.txt
```

We won’t see any output to the terminal, and *output.txt* will contain our
results:

Filename: output.txt

```
Are you nobody, too?
How dreary to be somebody!
```

This demonstrates that we’re now using standard output for successful output
and standard error for error output as appropriate.

## Summary

This chapter recapped some of the major concepts you’ve learned so far and
covered how to perform common I/O operations in Rust. By using command line
arguments, files, environment variables, and the `eprintln!` macro for printing
errors, you’re now prepared to write command line applications. Combined with
the concepts in previous chapters, your code will be well organized, store data
effectively in the appropriate data structures, handle errors nicely, and be
well tested.

Next, we’ll explore some Rust features that were influenced by functional
languages: closures and iterators.