## Improving our I/O Project

We can improve our implementation of the I/O project in Chapter 12 by using
iterators to make places in the code clearer and more concise. Let’s take a
look at how iterators can improve our implementation of both the `Config::new`
function and the `search` function.

### Removing a `clone` Using an Iterator

In Listing 12-6, we added code that took a slice of `String` values and created
an instance of the `Config` struct by indexing into the slice and cloning the
values, allowing the `Config` struct to own those values. Listing 13-24
reproduces the implementation of the `Config::new` function as it was at the
end of Chapter 12:

<span class="filename">Filename: src/lib.rs</span>

```rust,ignore
impl Config {
    pub fn new(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let filename = args[2].clone();

        let case_sensitive = env::var("CASE_INSENSITIVE").is_err();

        Ok(Config { query, filename, case_sensitive })
    }
}
```

<span class="caption">Listing 13-24: Reproduction of the `Config::new` function
from the end of Chapter 12</span>

At the time, we said not to worry about the inefficient `clone` calls because
we would remove them in the future. Well, that time is now!

We needed `clone` here because the parameter `args` is a slice of `String`
elements that the `new` function doesn’t own. To return ownership of a
`Config` instance, we had to clone the values for the `query` and `filename`
fields of `Config` so that the `Config` instance could own its values.

With our new knowledge about iterators, we can change the `new` function to
take ownership of an iterator as its argument instead of borrowing a slice.
We’ll use the iterator functionality instead of the code that checks the
length of the slice and indexes into specific locations. This will clarify
what the `Config::new` function is doing, because the iterator will take care
of accessing the values.

Once `Config::new` takes ownership of the iterator and stops using indexing
operations that borrow, we can move the `String` values from the iterator into
`Config` rather than calling `clone` and making a new allocation.

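To see the difference in isolation, here is a minimal sketch that is not part
of the I/O project. The function names `second_by_clone` and `second_by_move`
are hypothetical, and `impl Iterator<Item = String>` stands in for the iterator
that `env::args` returns. The first function borrows a slice and must clone to
hand back an owned `String`; the second owns its iterator and can simply move
the value out.

```rust
// Hypothetical helpers for illustration; neither is part of the I/O project.

/// Borrows a slice: the caller keeps ownership, so we must clone to get an
/// owned `String`.
fn second_by_clone(args: &[String]) -> Option<String> {
    args.get(1).cloned()
}

/// Owns an iterator: each call to `next` hands over an owned `String`, so no
/// clone is needed.
fn second_by_move(mut args: impl Iterator<Item = String>) -> Option<String> {
    args.next(); // skip the first item
    args.next() // move the second item out to the caller
}
```

Calling `second_by_move` with `vec![String::from("prog"),
String::from("needle")].into_iter()` returns `Some` containing the original
`"needle"` string, without copying its contents.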

#### Using the Iterator Returned by `env::args` Directly

In your I/O project’s *src/main.rs*, let’s change the start of the `main`
function from this code that we had at the end of Chapter 12:

```rust,ignore
fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::new(&args).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

    // ...snip...
}
```

Change it to the code in Listing 13-25:

<span class="filename">Filename: src/main.rs</span>

```rust,ignore
fn main() {
    let config = Config::new(env::args()).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

    // ...snip...
}
```

<span class="caption">Listing 13-25: Passing the return value of `env::args` to
`Config::new`</span>

The `env::args` function returns an iterator! Rather than collecting the
iterator values into a vector and then passing a slice to `Config::new`, we
now pass ownership of the iterator returned from `env::args` directly to
`Config::new`.

Next, we need to update the definition of `Config::new`. In your I/O project’s
*src/lib.rs* file, let’s change the signature of `Config::new` to look like
Listing 13-26:

<span class="filename">Filename: src/lib.rs</span>

```rust,ignore
impl Config {
    pub fn new(args: std::env::Args) -> Result<Config, &'static str> {
        // ...snip...
```

<span class="caption">Listing 13-26: Updating the signature of `Config::new` to
expect an iterator</span>

The standard library documentation for the `env::args` function shows that the
type of the iterator it returns is `std::env::Args`. We’ve updated the
signature of the `Config::new` function so that the parameter `args` has the
type `std::env::Args` instead of `&[String]`.

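Naming the concrete type `std::env::Args` means the function can only be
called with the value returned by `env::args`. As a point of comparison, and
as an assumption on our part rather than something this chapter’s listings do,
a generic parameter bounded by `Iterator<Item = String>` would accept any
iterator of owned strings. The function names below are hypothetical:

```rust
use std::env;

// Accepts only the concrete iterator type returned by `env::args`,
// like the signature in Listing 13-26.
fn first_from_args(mut args: env::Args) -> Option<String> {
    args.next()
}

// Hypothetical generic variant: accepts any iterator that yields owned
// `String` values, including `env::Args`.
fn first_from_any<I>(mut args: I) -> Option<String>
where
    I: Iterator<Item = String>,
{
    args.next()
}
```

The generic form is convenient in tests, because we can pass something like
`vec![String::from("a")].into_iter()` instead of real command line arguments.
The listings in this section keep the concrete `std::env::Args` type.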
#### Using `Iterator` Trait Methods Instead of Indexing

Next, we’ll fix the body of `Config::new`. The standard library documentation
also mentions that `std::env::Args` implements the `Iterator` trait, so we know
we can call the `next` method on it! Listing 13-27 updates the code from
Listing 12-23 to use the `next` method:

<span class="filename">Filename: src/lib.rs</span>

```rust
# use std::env;
#
# struct Config {
#     query: String,
#     filename: String,
#     case_sensitive: bool,
# }
#
impl Config {
    pub fn new(mut args: std::env::Args) -> Result<Config, &'static str> {
        args.next();

        let query = match args.next() {
            Some(arg) => arg,
            None => return Err("Didn't get a query string"),
        };

        let filename = match args.next() {
            Some(arg) => arg,
            None => return Err("Didn't get a file name"),
        };

        let case_sensitive = env::var("CASE_INSENSITIVE").is_err();

        Ok(Config {
            query, filename, case_sensitive
        })
    }
}
```

<span class="caption">Listing 13-27: Changing the body of `Config::new` to use
iterator methods</span>

Remember that the first value in the return value of `env::args` is the name of
the program. We want to ignore that and get to the next value, so first we call
`next` and do nothing with the return value. Second, we call `next` to get the
value we want to put in the `query` field of `Config`. If `next` returns a
`Some`, we use a `match` to extract the value. If it returns `None`, not enough
arguments were given, and we return early with an `Err` value. We do the same
thing for the `filename` value.

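The two `match` expressions follow the same pattern: take the next value or
return an error. As an alternative sketch, not the code this chapter uses, the
`ok_or` method on `Option` combined with the `?` operator expresses the same
logic more compactly. The function name `query_and_filename` and the tuple
return type are hypothetical, chosen so the example stands alone without the
`Config` struct:

```rust
// Hypothetical standalone function; the chapter's code lives in `Config::new`.
fn query_and_filename(
    mut args: impl Iterator<Item = String>,
) -> Result<(String, String), &'static str> {
    args.next(); // discard the program name

    // `ok_or` turns `None` into `Err(..)`, and `?` returns that error early.
    let query = args.next().ok_or("Didn't get a query string")?;
    let filename = args.next().ok_or("Didn't get a file name")?;

    Ok((query, filename))
}
```

Given the arguments `prog`, `needle`, and `haystack.txt`, this returns `Ok`
with the pair `("needle", "haystack.txt")`. With only `prog` and `needle`, it
returns `Err("Didn't get a file name")`.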

### Making Code Clearer with Iterator Adaptors

We can also take advantage of iterators in the `search` function of our I/O
project, reproduced here in Listing 13-28 as it was at the end of Chapter 12:

<span class="filename">Filename: src/lib.rs</span>

```rust,ignore
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }

    results
}
```

<span class="caption">Listing 13-28: The implementation of the `search`
function from Chapter 12</span>

We can write this code more concisely by using iterator adaptor methods.
Doing so also lets us avoid having a mutable intermediate `results` vector.
The functional programming style prefers to minimize the amount of mutable
state to make code clearer. Removing the mutable state might also make it
easier to enhance the search to run in parallel in the future, because we
wouldn’t have to manage concurrent access to the `results` vector. Listing
13-29 shows this change:

<span class="filename">Filename: src/lib.rs</span>

```rust,ignore
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents.lines()
        .filter(|line| line.contains(query))
        .collect()
}
```

<span class="caption">Listing 13-29: Using iterator adaptor methods in the
implementation of the `search` function</span>

Recall that the purpose of the `search` function is to return all lines in
`contents` that contain the `query`. Similar to the `filter` example in
Listing 13-19, this code uses the `filter` adaptor to keep only the lines for
which `line.contains(query)` returns `true`. We then collect the matching
lines into another vector with `collect`. Much simpler! Feel free to make the
same change to use iterator methods in the `search_case_insensitive` function
as well.

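As one possible way to do that, here is a sketch of `search_case_insensitive`
written with iterator adaptors. It assumes the function has the same signature
as `search` and lowercases both the query and each line before comparing them;
adjust it if your version from Chapter 12 differs:

```rust
// A sketch under the assumptions stated above, not a listing from this chapter.
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    // Lowercase the query once, outside the closure.
    let query = query.to_lowercase();

    contents.lines()
        // Keep only the lines whose lowercased form contains the query.
        .filter(|line| line.to_lowercase().contains(&query))
        .collect()
}
```

Searching for `"rUsT"` in the text `"Rust:\nTrust me."` returns both lines,
because the comparison ignores case.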

The next logical question is which style you should choose in your own code:
the original implementation in Listing 13-28 or the version using iterators in
Listing 13-29. Most Rust programmers prefer to use the iterator style. It’s a
bit tougher to get the hang of at first, but once you get a feel for the
various iterator adaptors and what they do, iterators can be easier to
understand. Instead of fiddling with the various bits of looping and building
new vectors, the code focuses on the high-level objective of the loop. This
abstracts away some of the commonplace code so that it’s easier to see the
concepts that are unique to this code, such as the filtering condition each
element in the iterator must pass.

But are the two implementations truly equivalent? The intuitive assumption
might be that the more low-level loop will be faster. Let’s talk about
performance.