]>
Commit | Line | Data |
---|---|---|
cc61c64b XL |
1 | |
2 | [TOC] | |
3 | ||
4 | # Common Collections | |
5 | ||
ea8adc8c XL |
6 | Rust’s standard library includes a number of very useful data structures called |
7 | *collections*. Most other data types represent one specific value, but | |
cc61c64b XL |
8 | collections can contain multiple values. Unlike the built-in array and tuple |
9 | types, the data these collections point to is stored on the heap, which means | |
10 | the amount of data does not need to be known at compile time and can grow or | |
11 | shrink as the program runs. Each kind of collection has different capabilities | |
ea8adc8c XL |
12 | and costs, and choosing an appropriate one for your current situation is a |
13 | skill you’ll develop over time. In this chapter, we’ll discuss three | |
14 | collections that are used very often in Rust programs: | |
cc61c64b XL |
15 | |
16 | * A *vector* allows us to store a variable number of values next to each other. | |
ea8adc8c XL |
17 | * A *string* is a collection of characters. We’ve discussed the `String` type |
18 | previously, but in this chapter we’ll talk about it in depth. | |
3b2f2976 | 19 | * A *hash map* allows us to associate a value with a particular key. It’s a |
cc61c64b XL |
20 | particular implementation of the more general data structure called a *map*. |
21 | ||
22 | To learn about the other kinds of collections provided by the standard library, | |
ea8adc8c | 23 | see the documentation at *https://doc.rust-lang.org/stable/std/collections/*. |
cc61c64b | 24 | |
ea8adc8c XL |
25 | We’ll discuss how to create and update vectors, strings, and hash maps, as well |
26 | as what makes each special. | |
cc61c64b | 27 | |
abe05a73 | 28 | ## Vectors Store Lists of Values |
cc61c64b | 29 | |
ea8adc8c XL |
30 | The first collection type we’ll look at is `Vec<T>`, also known as a *vector*. |
31 | Vectors allow us to store more than one value in a single data structure that | |
32 | puts all the values next to each other in memory. Vectors can only store values | |
33 | of the same type. They are useful in situations in which you have a list of | |
34 | items, such as the lines of text in a file or the prices of items in a shopping | |
35 | cart. | |
cc61c64b XL |
36 | |
37 | ### Creating a New Vector | |
38 | ||
ea8adc8c XL |
39 | To create a new, empty vector, we can call the `Vec::new` function as shown in |
40 | Listing 8-1: | |
cc61c64b | 41 | |
abe05a73 | 42 | ``` |
cc61c64b XL |
43 | let v: Vec<i32> = Vec::new(); |
44 | ``` | |
45 | ||
ea8adc8c XL |
46 | Listing 8-1: Creating a new, empty vector to hold values of type `i32` |
47 | ||
48 | Note that we added a type annotation here. Because we aren’t inserting any | |
49 | values into this vector, Rust doesn’t know what kind of elements we intend to | |
50 | store. This is an important point. Vectors are implemented using generics; | |
51 | we’ll cover how to use generics with your own types in Chapter 10. For now, | |
52 | know that the `Vec<T>` type provided by the standard library can hold any type, | |
53 | and when a specific vector holds a specific type, the type is specified within | |
54 | angle brackets. In Listing 8-1, we’ve told Rust that the `Vec<T>` in `v` will | |
cc61c64b XL |
55 | hold elements of the `i32` type. |
56 | ||
ea8adc8c XL |
57 | In more realistic code, Rust can often infer the type of value we want to store |
58 | once we insert values, so you rarely need to do this type annotation. It’s more | |
59 | common to create a `Vec<T>` that has initial values, and Rust provides the | |
60 | `vec!` macro for convenience. The macro will create a new vector that holds the | |
61 | values we give it. Listing 8-2 creates a new `Vec<i32>` that holds the values | |
62 | `1`, `2`, and `3`: | |
cc61c64b | 63 | |
abe05a73 | 64 | ``` |
cc61c64b XL |
65 | let v = vec![1, 2, 3]; |
66 | ``` | |
67 | ||
ea8adc8c XL |
68 | Listing 8-2: Creating a new vector containing values |
69 | ||
cc61c64b | 70 | Because we’ve given initial `i32` values, Rust can infer that the type of `v` |
ea8adc8c XL |
71 | is `Vec<i32>`, and the type annotation isn’t necessary. Next, we’ll look at how |
72 | to modify a vector. | |
cc61c64b XL |
73 | |
74 | ### Updating a Vector | |
75 | ||
ea8adc8c XL |
76 | To create a vector and then add elements to it, we can use the `push` method as |
77 | shown in Listing 8-3: | |
cc61c64b | 78 | |
abe05a73 | 79 | ``` |
cc61c64b XL |
80 | let mut v = Vec::new(); |
81 | ||
82 | v.push(5); | |
83 | v.push(6); | |
84 | v.push(7); | |
85 | v.push(8); | |
86 | ``` | |
87 | ||
ea8adc8c XL |
88 | Listing 8-3: Using the `push` method to add values to a vector |
89 | ||
90 | As with any variable, as discussed in Chapter 3, if we want to be able to | |
91 | change its value, we need to make it mutable using the `mut` keyword. The | |
cc61c64b XL |
92 | numbers we place inside are all of type `i32`, and Rust infers this from the |
93 | data, so we don’t need the `Vec<i32>` annotation. | |
94 | ||
ea8adc8c | 95 | ### Dropping a Vector Drops Its Elements |
cc61c64b | 96 | |
ea8adc8c XL |
97 | Like any other `struct`, a vector will be freed when it goes out of scope, as |
98 | annotated in Listing 8-4: | |
cc61c64b | 99 | |
abe05a73 | 100 | ``` |
cc61c64b XL |
101 | { |
102 | let v = vec![1, 2, 3, 4]; | |
103 | ||
104 | // do stuff with v | |
105 | ||
106 | } // <- v goes out of scope and is freed here | |
107 | ``` | |
108 | ||
ea8adc8c XL |
109 | Listing 8-4: Showing where the vector and its elements are dropped |
110 | ||
cc61c64b XL |
111 | When the vector gets dropped, all of its contents will also be dropped, meaning |
112 | those integers it holds will be cleaned up. This may seem like a | |
ea8adc8c | 113 | straightforward point but can get a bit more complicated when we start to |
cc61c64b XL |
114 | introduce references to the elements of the vector. Let’s tackle that next! |
115 | ||
116 | ### Reading Elements of Vectors | |
117 | ||
118 | Now that you know how to create, update, and destroy vectors, knowing how to | |
119 | read their contents is a good next step. There are two ways to reference a | |
120 | value stored in a vector. In the examples, we’ve annotated the types of the | |
121 | values that are returned from these functions for extra clarity. | |
122 | ||
ea8adc8c | 123 | Listing 8-5 shows both methods of accessing a value in a vector either with |
cc61c64b XL |
124 | indexing syntax or the `get` method: |
125 | ||
abe05a73 | 126 | ``` |
cc61c64b XL |
127 | let v = vec![1, 2, 3, 4, 5]; |
128 | ||
129 | let third: &i32 = &v[2]; | |
130 | let third: Option<&i32> = v.get(2); | |
131 | ``` | |
132 | ||
ea8adc8c XL |
133 | Listing 8-5: Using indexing syntax or the `get` method to access an item in a |
134 | vector | |
cc61c64b | 135 | |
ea8adc8c XL |
136 | Note two details here. First, we use the index value of `2` to get the third |
137 | element: vectors are indexed by number, starting at zero. Second, the two | |
138 | different ways to get the third element are by using `&` and `[]`, which gives | |
139 | us a reference, or by using the `get` method with the index passed as an | |
140 | argument, which gives us an `Option<&T>`. | |
141 | ||
142 | The reason Rust has two ways to reference an element is so you can choose how | |
143 | the program behaves when you try to use an index value that the vector doesn’t | |
abe05a73 XL |
144 | have an element for. As an example, let's see what a program will do if it has |
145 | a vector that holds five elements and then tries to access an element at index | |
146 | 100, as shown in Listing 8-6: | |
cc61c64b | 147 | |
abe05a73 | 148 | ``` |
cc61c64b XL |
149 | let v = vec![1, 2, 3, 4, 5]; |
150 | ||
151 | let does_not_exist = &v[100]; | |
152 | let does_not_exist = v.get(100); | |
153 | ``` | |
154 | ||
ea8adc8c XL |
155 | Listing 8-6: Attempting to access the element at index 100 in a vector |
156 | containing 5 elements | |
157 | ||
158 | When you run this code, the first `[]` method will cause a `panic!` because it | |
159 | references a nonexistent element. This method is best used when you want your | |
160 | program to consider an attempt to access an element past the end of the vector | |
161 | to be a fatal error that crashes the program. | |
cc61c64b | 162 | |
ea8adc8c XL |
163 | When the `get` method is passed an index that is outside the vector, it returns |
164 | `None` without panicking. You would use this method if accessing an element | |
165 | beyond the range of the vector happens occasionally under normal circumstances. | |
166 | Your code will then have logic to handle having either `Some(&element)` or | |
167 | `None`, as discussed in Chapter 6. For example, the index could be coming from | |
168 | a person entering a number. If they accidentally enter a number that’s too | |
169 | large and the program gets a `None` value, you could tell the user how many | |
abe05a73 | 170 | items are in the current vector and give them another chance to enter a valid |
ea8adc8c | 171 | value. That would be more user-friendly than crashing the program due to a typo! |
cc61c64b XL |
172 | |
173 | #### Invalid References | |
174 | ||
ea8adc8c XL |
175 | When the program has a valid reference, the borrow checker enforces the |
176 | ownership and borrowing rules (covered in Chapter 4) to ensure this reference | |
177 | and any other references to the contents of the vector remain valid. Recall the | |
178 | rule that states we can’t have mutable and immutable references in the same | |
179 | scope. That rule applies in Listing 8-7 where we hold an immutable reference to | |
abe05a73 XL |
180 | the first element in a vector and try to add an element to the end, which won't |
181 | work: | |
cc61c64b | 182 | |
abe05a73 | 183 | ``` |
cc61c64b XL |
184 | let mut v = vec![1, 2, 3, 4, 5]; |
185 | ||
186 | let first = &v[0]; | |
187 | ||
188 | v.push(6); | |
189 | ``` | |
190 | ||
ea8adc8c XL |
191 | Listing 8-7: Attempting to add an element to a vector while holding a reference |
192 | to an item | |
cc61c64b | 193 | |
ea8adc8c XL |
194 | Compiling this code will result in this error: |
195 | ||
abe05a73 | 196 | ``` |
cc61c64b XL |
197 | error[E0502]: cannot borrow `v` as mutable because it is also borrowed as |
198 | immutable | |
199 | | | |
200 | 4 | let first = &v[0]; | |
201 | | - immutable borrow occurs here | |
202 | 5 | | |
203 | 6 | v.push(6); | |
204 | | ^ mutable borrow occurs here | |
205 | 7 | } | |
206 | | - immutable borrow ends here | |
207 | ``` | |
208 | ||
ea8adc8c XL |
209 | The code in Listing 8-7 might look like it should work: why should a reference |
210 | to the first element care about what changes at the end of the vector? The | |
211 | reason behind this error is due to the way vectors work: adding a new element | |
cc61c64b | 212 | onto the end of the vector might require allocating new memory and copying the |
ea8adc8c XL |
213 | old elements to the new space if there isn’t enough room to put all the |
214 | elements next to each other where the vector was. In that case, the reference | |
215 | to the first element would be pointing to deallocated memory. The borrowing | |
216 | rules prevent programs from ending up in that situation. | |
217 | ||
218 | > Note: For more on the implementation details of the `Vec<T>` type, see “The | |
219 | > Nomicon” at https://doc.rust-lang.org/stable/nomicon/vec.html. | |
220 | ||
221 | ### Iterating Over the Values in a Vector | |
222 | ||
abe05a73 XL |
223 | If we want to access each element in a vector in turn, we can iterate through |
224 | all of the elements rather than use indexes to access one at a time. Listing | |
225 | 8-8 shows how to use a `for` loop to get immutable references to each element | |
226 | in a vector of `i32` values and print them out: | |
ea8adc8c | 227 | |
abe05a73 | 228 | ``` |
ea8adc8c XL |
229 | let v = vec![100, 32, 57]; |
230 | for i in &v { | |
231 | println!("{}", i); | |
232 | } | |
233 | ``` | |
234 | ||
235 | Listing 8-8: Printing each element in a vector by iterating over the elements | |
236 | using a `for` loop | |
cc61c64b | 237 | |
ea8adc8c | 238 | We can also iterate over mutable references to each element in a mutable vector |
abe05a73 | 239 | in order to make changes to all the elements. The `for` loop in Listing 8-9 |
ea8adc8c XL |
240 | will add `50` to each element: |
241 | ||
abe05a73 | 242 | ``` |
ea8adc8c XL |
243 | let mut v = vec![100, 32, 57]; |
244 | for i in &mut v { | |
245 | *i += 50; | |
246 | } | |
247 | ``` | |
248 | ||
249 | Listing 8-9: Iterating over mutable references to elements in a vector | |
250 | ||
abe05a73 XL |
251 | To change the value that the mutable reference refers to, we have to use the |
252 | dereference operator (`*`) to get to the value in `i` before we can use the | |
253 | `+=` operator . | |
cc61c64b XL |
254 | |
255 | ### Using an Enum to Store Multiple Types | |
256 | ||
257 | At the beginning of this chapter, we said that vectors can only store values | |
ea8adc8c XL |
258 | that are the same type. This can be inconvenient; there are definitely use |
259 | cases for needing to store a list of items of different types. Fortunately, the | |
260 | variants of an enum are defined under the same enum type, so when we need to | |
261 | store elements of a different type in a vector, we can define and use an enum! | |
262 | ||
263 | For example, let’s say we want to get values from a row in a spreadsheet where | |
264 | some of the columns in the row contain integers, some floating-point numbers, | |
cc61c64b | 265 | and some strings. We can define an enum whose variants will hold the different |
abe05a73 | 266 | value types, and then all the enum variants will be considered the same type: |
ea8adc8c | 267 | that of the enum. Then we can create a vector that holds that enum and so, |
abe05a73 | 268 | ultimately, holds different types. We’ve demonstrated this in Listing 8-10: |
cc61c64b | 269 | |
abe05a73 | 270 | ``` |
cc61c64b XL |
271 | enum SpreadsheetCell { |
272 | Int(i32), | |
273 | Float(f64), | |
274 | Text(String), | |
275 | } | |
276 | ||
277 | let row = vec![ | |
278 | SpreadsheetCell::Int(3), | |
279 | SpreadsheetCell::Text(String::from("blue")), | |
280 | SpreadsheetCell::Float(10.12), | |
281 | ]; | |
282 | ``` | |
283 | ||
abe05a73 XL |
284 | Listing 8-10: Defining an `enum` to store values of different types in one |
285 | vector | |
ea8adc8c XL |
286 | |
287 | The reason Rust needs to know what types will be in the vector at compile time | |
288 | is so it knows exactly how much memory on the heap will be needed to store each | |
289 | element. A secondary advantage is that we can be explicit about what types are | |
290 | allowed in this vector. If Rust allowed a vector to hold any type, there would | |
291 | be a chance that one or more of the types would cause errors with the | |
292 | operations performed on the elements of the vector. Using an enum plus a | |
293 | `match` expression means that Rust will ensure at compile time that we always | |
294 | handle every possible case, as discussed in Chapter 6. | |
cc61c64b | 295 | |
abe05a73 XL |
296 | If you don’t know the exhaustive set of types the program will get at runtime |
297 | to store in a vector when you’re writing a program, the enum technique won’t | |
ea8adc8c | 298 | work. Instead, you can use a trait object, which we’ll cover in Chapter 17. |
cc61c64b | 299 | |
ea8adc8c XL |
300 | Now that we’ve discussed some of the most common ways to use vectors, be sure |
301 | to review the API documentation for all the many useful methods defined on | |
abe05a73 | 302 | `Vec<T>` by the standard library. For example, in addition to `push`, a `pop` |
ea8adc8c XL |
303 | method removes and returns the last element. Let’s move on to the next |
304 | collection type: `String`! | |
cc61c64b | 305 | |
abe05a73 | 306 | ## Strings Store UTF-8 Encoded Text |
cc61c64b | 307 | |
ea8adc8c XL |
308 | We talked about strings in Chapter 4, but we’ll look at them in more depth now. |
309 | New Rustaceans commonly get stuck on strings due to a combination of three | |
310 | concepts: Rust’s propensity for exposing possible errors, strings being a more | |
311 | complicated data structure than many programmers give them credit for, and | |
312 | UTF-8. These concepts combine in a way that can seem difficult when you’re | |
313 | coming from other programming languages. | |
cc61c64b | 314 | |
ea8adc8c | 315 | This discussion of strings is in the collections chapter because strings are |
cc61c64b XL |
316 | implemented as a collection of bytes plus some methods to provide useful |
317 | functionality when those bytes are interpreted as text. In this section, we’ll | |
ea8adc8c | 318 | talk about the operations on `String` that every collection type has, such as |
cc61c64b XL |
319 | creating, updating, and reading. We’ll also discuss the ways in which `String` |
320 | is different than the other collections, namely how indexing into a `String` is | |
ea8adc8c XL |
321 | complicated by the differences between how people and computers interpret |
322 | `String` data. | |
323 | ||
324 | ### What Is a String? | |
325 | ||
326 | We’ll first define what we mean by the term *string*. Rust has only one string | |
327 | type in the core language, which is the string slice `str` that is usually seen | |
328 | in its borrowed form `&str`. In Chapter 4, we talked about *string slices*, | |
329 | which are references to some UTF-8 encoded string data stored elsewhere. String | |
330 | literals, for example, are stored in the binary output of the program and are | |
331 | therefore string slices. | |
332 | ||
333 | The `String` type is provided in Rust’s standard library rather than coded into | |
334 | the core language and is a growable, mutable, owned, UTF-8 encoded string type. | |
335 | When Rustaceans refer to “strings” in Rust, they usually mean the `String` and | |
336 | the string slice `&str` types, not just one of those types. Although this | |
337 | section is largely about `String`, both types are used heavily in Rust’s | |
338 | standard library and both `String` and string slices are UTF-8 encoded. | |
cc61c64b XL |
339 | |
340 | Rust’s standard library also includes a number of other string types, such as | |
ea8adc8c | 341 | `OsString`, `OsStr`, `CString`, and `CStr`. Library crates can provide even |
cc61c64b XL |
342 | more options for storing string data. Similar to the `*String`/`*Str` naming, |
343 | they often provide an owned and borrowed variant, just like `String`/`&str`. | |
ea8adc8c XL |
344 | These string types can store text in different encodings or be represented in |
345 | memory in a different way, for example. We won’t discuss these other string | |
cc61c64b XL |
346 | types in this chapter; see their API documentation for more about how to use |
347 | them and when each is appropriate. | |
348 | ||
349 | ### Creating a New String | |
350 | ||
abe05a73 XL |
351 | Many of the same operations available with `Vec<T>` are available with `String` |
352 | as well, starting with the `new` function to create a string, shown in Listing | |
353 | 8-11: | |
cc61c64b | 354 | |
abe05a73 | 355 | ``` |
ea8adc8c | 356 | let mut s = String::new(); |
cc61c64b XL |
357 | ``` |
358 | ||
abe05a73 | 359 | Listing 8-11: Creating a new, empty `String` |
cc61c64b | 360 | |
ea8adc8c XL |
361 | This line creates a new empty string called `s` that we can then load data |
362 | into. Often, we’ll have some initial data that we want to start the string | |
cc61c64b | 363 | with. For that, we use the `to_string` method, which is available on any type |
abe05a73 | 364 | that implements the `Display` trait, which string literals do. Listing 8-12 |
ea8adc8c | 365 | shows two examples: |
cc61c64b | 366 | |
abe05a73 | 367 | ``` |
cc61c64b XL |
368 | let data = "initial contents"; |
369 | ||
370 | let s = data.to_string(); | |
371 | ||
372 | // the method also works on a literal directly: | |
373 | let s = "initial contents".to_string(); | |
374 | ``` | |
375 | ||
abe05a73 | 376 | Listing 8-12: Using the `to_string` method to create a `String` from a string |
ea8adc8c XL |
377 | literal |
378 | ||
379 | This code creates a string containing `initial contents`. | |
cc61c64b XL |
380 | |
381 | We can also use the function `String::from` to create a `String` from a string | |
abe05a73 | 382 | literal. The code in Listing 8-13 is equivalent to the code from Listing 8-12 |
ea8adc8c | 383 | that uses `to_string`: |
cc61c64b | 384 | |
abe05a73 | 385 | ``` |
cc61c64b XL |
386 | let s = String::from("initial contents"); |
387 | ``` | |
388 | ||
abe05a73 | 389 | Listing 8-13: Using the `String::from` function to create a `String` from a |
ea8adc8c XL |
390 | string literal |
391 | ||
392 | Because strings are used for so many things, we can use many different generic | |
393 | APIs for strings, providing us with a lot of options. Some of them can seem | |
394 | redundant, but they all have their place! In this case, `String::from` and | |
395 | `to_string` do the same thing, so which you choose is a matter of style. | |
cc61c64b XL |
396 | |
397 | Remember that strings are UTF-8 encoded, so we can include any properly encoded | |
abe05a73 | 398 | data in them, as shown in Listing 8-14: |
cc61c64b | 399 | |
abe05a73 | 400 | ``` |
ea8adc8c XL |
401 | let hello = String::from("السلام عليكم"); |
402 | let hello = String::from("Dobrý den"); | |
403 | let hello = String::from("Hello"); | |
404 | let hello = String::from("שָׁלוֹם"); | |
405 | let hello = String::from("नमस्ते"); | |
406 | let hello = String::from("こんにちは"); | |
407 | let hello = String::from("안녕하세요"); | |
408 | let hello = String::from("你好"); | |
409 | let hello = String::from("Olá"); | |
410 | let hello = String::from("Здравствуйте"); | |
411 | let hello = String::from("Hola"); | |
cc61c64b XL |
412 | ``` |
413 | ||
abe05a73 | 414 | Listing 8-14: Storing greetings in different languages in strings |
ea8adc8c XL |
415 | |
416 | All of these are valid `String` values. | |
417 | ||
cc61c64b XL |
418 | ### Updating a String |
419 | ||
ea8adc8c | 420 | A `String` can grow in size and its contents can change, just like the contents |
abe05a73 XL |
421 | of a `Vec<T>`, by pushing more data into it. In addition, we can conveniently |
422 | use the `+` operator or the `format!` macro to concatenate `String` values | |
423 | together. | |
cc61c64b | 424 | |
ea8adc8c | 425 | #### Appending to a String with `push_str` and `push` |
cc61c64b | 426 | |
ea8adc8c | 427 | We can grow a `String` by using the `push_str` method to append a string slice, |
abe05a73 | 428 | as shown in Listing 8-15: |
cc61c64b | 429 | |
abe05a73 | 430 | ``` |
cc61c64b XL |
431 | let mut s = String::from("foo"); |
432 | s.push_str("bar"); | |
433 | ``` | |
434 | ||
abe05a73 | 435 | Listing 8-15: Appending a string slice to a `String` using the `push_str` method |
ea8adc8c XL |
436 | |
437 | After these two lines, `s` will contain `foobar`. The `push_str` method takes a | |
cc61c64b | 438 | string slice because we don’t necessarily want to take ownership of the |
abe05a73 | 439 | parameter. For example, the code in Listing 8-16 shows that it would be |
ea8adc8c | 440 | unfortunate if we weren’t able to use `s2` after appending its contents to `s1`: |
cc61c64b | 441 | |
abe05a73 | 442 | ``` |
cc61c64b | 443 | let mut s1 = String::from("foo"); |
ea8adc8c | 444 | let s2 = "bar"; |
cc61c64b | 445 | s1.push_str(&s2); |
ea8adc8c | 446 | println!("s2 is {}", s2); |
cc61c64b XL |
447 | ``` |
448 | ||
abe05a73 | 449 | Listing 8-16: Using a string slice after appending its contents to a `String` |
ea8adc8c XL |
450 | |
451 | If the `push_str` method took ownership of `s2`, we wouldn’t be able to print | |
452 | out its value on the last line. However, this code works as we’d expect! | |
453 | ||
454 | The `push` method takes a single character as a parameter and adds it to the | |
abe05a73 XL |
455 | `String`. Listing 8-17 shows code that adds the letter l character to a |
456 | `String` using the `push` method: | |
cc61c64b | 457 | |
abe05a73 | 458 | ``` |
cc61c64b XL |
459 | let mut s = String::from("lo"); |
460 | s.push('l'); | |
461 | ``` | |
462 | ||
abe05a73 | 463 | Listing 8-17: Adding one character to a `String` value using `push` |
cc61c64b | 464 | |
ea8adc8c | 465 | As a result of this code, `s` will contain `lol`. |
cc61c64b | 466 | |
ea8adc8c XL |
467 | #### Concatenation with the `+` Operator or the `format!` Macro |
468 | ||
469 | Often, we’ll want to combine two existing strings. One way is to use the `+` | |
abe05a73 | 470 | operator, as shown in Listing 8-18: |
cc61c64b | 471 | |
abe05a73 | 472 | ``` |
cc61c64b XL |
473 | let s1 = String::from("Hello, "); |
474 | let s2 = String::from("world!"); | |
475 | let s3 = s1 + &s2; // Note that s1 has been moved here and can no longer be used | |
476 | ``` | |
477 | ||
abe05a73 | 478 | Listing 8-18: Using the `+` operator to combine two `String` values into a new |
ea8adc8c XL |
479 | `String` value |
480 | ||
abe05a73 | 481 | The string `s3` will contain `Hello, world!` as a result of this code. The |
ea8adc8c | 482 | reason `s1` is no longer valid after the addition and the reason we used a |
cc61c64b XL |
483 | reference to `s2` has to do with the signature of the method that gets called |
484 | when we use the `+` operator. The `+` operator uses the `add` method, whose | |
485 | signature looks something like this: | |
486 | ||
abe05a73 | 487 | ``` |
cc61c64b XL |
488 | fn add(self, s: &str) -> String { |
489 | ``` | |
490 | ||
ea8adc8c XL |
491 | This isn’t the exact signature that’s in the standard library: in the standard |
492 | library, `add` is defined using generics. Here, we’re looking at the signature | |
493 | of `add` with concrete types substituted for the generic ones, which is what | |
494 | happens when we call this method with `String` values. We’ll discuss generics | |
495 | in Chapter 10. This signature gives us the clues we need to understand the | |
496 | tricky bits of the `+` operator. | |
497 | ||
498 | First, `s2` has an `&`, meaning that we’re adding a *reference* of the second | |
499 | string to the first string because of the `s` parameter in the `add` function: | |
500 | we can only add a `&str` to a `String`; we can’t add two `String` values | |
501 | together. But wait - the type of `&s2` is `&String`, not `&str`, as specified | |
abe05a73 XL |
502 | in the second parameter to `add`. So why does Listing 8-18 compile? |
503 | ||
504 | The reason we’re able to use `&s2` in the call to `add` is that the compiler | |
505 | can *coerce* the `&String` argument into a `&str`. When we call the `add` | |
506 | method, Rust uses a *deref coercion*, which here turns `&s2` into `&s2[..]`. | |
507 | We’ll discuss deref coercion in more depth in Chapter 15. Because `add` does | |
508 | not take ownership of the `s` parameter, `s2` will still be a valid `String` | |
509 | after this operation. | |
cc61c64b XL |
510 | |
511 | Second, we can see in the signature that `add` takes ownership of `self`, | |
abe05a73 | 512 | because `self` does *not* have an `&`. This means `s1` in Listing 8-18 will be |
ea8adc8c XL |
513 | moved into the `add` call and no longer be valid after that. So although `let |
514 | s3 = s1 + &s2;` looks like it will copy both strings and create a new one, this | |
515 | statement actually takes ownership of `s1`, appends a copy of the contents of | |
516 | `s2`, and then returns ownership of the result. In other words, it looks like | |
517 | it’s making a lot of copies but isn’t: the implementation is more efficient | |
cc61c64b XL |
518 | than copying. |
519 | ||
520 | If we need to concatenate multiple strings, the behavior of `+` gets unwieldy: | |
521 | ||
abe05a73 | 522 | ``` |
cc61c64b XL |
523 | let s1 = String::from("tic"); |
524 | let s2 = String::from("tac"); | |
525 | let s3 = String::from("toe"); | |
526 | ||
527 | let s = s1 + "-" + &s2 + "-" + &s3; | |
528 | ``` | |
529 | ||
ea8adc8c XL |
530 | At this point, `s` will be `tic-tac-toe`. With all of the `+` and `"` |
531 | characters, it’s difficult to see what’s going on. For more complicated string | |
cc61c64b XL |
532 | combining, we can use the `format!` macro: |
533 | ||
abe05a73 | 534 | ``` |
cc61c64b XL |
535 | let s1 = String::from("tic"); |
536 | let s2 = String::from("tac"); | |
537 | let s3 = String::from("toe"); | |
538 | ||
539 | let s = format!("{}-{}-{}", s1, s2, s3); | |
540 | ``` | |
541 | ||
ea8adc8c XL |
542 | This code also sets `s` to `tic-tac-toe`. The `format!` macro works in the same |
543 | way as `println!`, but instead of printing the output to the screen, it returns | |
544 | a `String` with the contents. The version of the code using `format!` is much | |
545 | easier to read and also doesn’t take ownership of any of its parameters. | |
cc61c64b XL |
546 | |
547 | ### Indexing into Strings | |
548 | ||
ea8adc8c XL |
549 | In many other programming languages, accessing individual characters in a |
550 | string by referencing them by index is a valid and common operation. However, | |
551 | if we try to access parts of a `String` using indexing syntax in Rust, we’ll | |
abe05a73 | 552 | get an error. Consider the invalid code in Listing 8-19: |
cc61c64b | 553 | |
abe05a73 | 554 | ``` |
cc61c64b XL |
555 | let s1 = String::from("hello"); |
556 | let h = s1[0]; | |
557 | ``` | |
558 | ||
abe05a73 | 559 | Listing 8-19: Attempting to use indexing syntax with a String |
ea8adc8c XL |
560 | |
561 | This code will result in the following error: | |
cc61c64b | 562 | |
abe05a73 XL |
563 | ``` |
564 | error[E0277]: the trait bound `std::string::String: std::ops::Index<{integer}>` is not satisfied | |
565 | --> | |
566 | | | |
567 | 3 | let h = s1[0]; | |
568 | | ^^^^^ the type `std::string::String` cannot be indexed by `{integer}` | |
569 | | | |
570 | = help: the trait `std::ops::Index<{integer}>` is not implemented for `std::string::String` | |
cc61c64b XL |
571 | ``` |
572 | ||
ea8adc8c XL |
573 | The error and the note tell the story: Rust strings don’t support indexing. But |
574 | why not? To answer that question, we need to discuss how Rust stores strings in | |
575 | memory. | |
cc61c64b XL |
576 | |
577 | #### Internal Representation | |
578 | ||
ea8adc8c | 579 | A `String` is a wrapper over a `Vec<u8>`. Let’s look at some of our properly |
abe05a73 | 580 | encoded UTF-8 example strings from Listing 8-14. First, this one: |
cc61c64b | 581 | |
abe05a73 | 582 | ``` |
cc61c64b XL |
583 | let len = String::from("Hola").len(); |
584 | ``` | |
585 | ||
586 | In this case, `len` will be four, which means the `Vec` storing the string | |
ea8adc8c XL |
587 | “Hola” is four bytes long. Each of these letters takes one byte when encoded in |
588 | UTF-8. But what about the following line? | |
cc61c64b | 589 | |
abe05a73 | 590 | ``` |
cc61c64b XL |
591 | let len = String::from("Здравствуйте").len(); |
592 | ``` | |
593 | ||
abe05a73 XL |
594 | Note that this string begins with the capital Cyrillic letter Ze, not the |
595 | Arabic number 3. Asked how long the string is, you might say 12. However, | |
596 | Rust’s answer is 24: that’s the number of bytes it takes to encode | |
597 | “Здравствуйте” in UTF-8, because each Unicode scalar value takes two bytes of | |
598 | storage. Therefore, an index into the string’s bytes will not always correlate | |
599 | to a valid Unicode scalar value. To demonstrate, consider this invalid Rust | |
600 | code: | |
cc61c64b | 601 | |
abe05a73 | 602 | ``` |
cc61c64b XL |
603 | let hello = "Здравствуйте"; |
604 | let answer = &hello[0]; | |
605 | ``` | |
606 | ||
607 | What should the value of `answer` be? Should it be `З`, the first letter? When | |
608 | encoded in UTF-8, the first byte of `З` is `208`, and the second is `151`, so | |
609 | `answer` should in fact be `208`, but `208` is not a valid character on its | |
ea8adc8c XL |
610 | own. Returning `208` is likely not what a user would want if they asked for the |
611 | first letter of this string; however, that’s the only data that Rust has at | |
612 | byte index 0. Returning the byte value is probably not what users want, even if | |
613 | the string contains only Latin letters: if `&"hello"[0]` was valid code that | |
614 | returned the byte value, it would return `104`, not `h`. To avoid returning an | |
615 | unexpected value and causing bugs that might not be discovered immediately, | |
616 | Rust doesn’t compile this code at all and prevents misunderstandings earlier in | |
617 | the development process. | |
cc61c64b | 618 | |
ea8adc8c | 619 | #### Bytes and Scalar Values and Grapheme Clusters! Oh My! |
cc61c64b | 620 | |
ea8adc8c XL |
621 | Another point about UTF-8 is that there are actually three relevant ways to |
622 | look at strings from Rust’s perspective: as bytes, scalar values, and grapheme | |
623 | clusters (the closest thing to what we would call *letters*). | |
cc61c64b XL |
624 | |
625 | If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is | |
626 | ultimately stored as a `Vec` of `u8` values that looks like this: | |
627 | ||
abe05a73 | 628 | ``` |
cc61c64b XL |
629 | [224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, |
630 | 224, 165, 135] | |
631 | ``` | |
632 | ||
ea8adc8c | 633 | That’s 18 bytes and is how computers ultimately store this data. If we look at |
cc61c64b XL |
634 | them as Unicode scalar values, which are what Rust’s `char` type is, those |
635 | bytes look like this: | |
636 | ||
abe05a73 | 637 | ``` |
cc61c64b XL |
638 | ['न', 'म', 'स', '्', 'त', 'े'] |
639 | ``` | |
640 | ||
ea8adc8c | 641 | There are six `char` values here, but the fourth and sixth are not letters: |
cc61c64b XL |
642 | they’re diacritics that don’t make sense on their own. Finally, if we look at |
643 | them as grapheme clusters, we’d get what a person would call the four letters | |
ea8adc8c | 644 | that make up the Hindi word: |
cc61c64b | 645 | |
abe05a73 | 646 | ``` |
cc61c64b XL |
647 | ["न", "म", "स्", "ते"] |
648 | ``` | |
649 | ||
650 | Rust provides different ways of interpreting the raw string data that computers | |
651 | store so that each program can choose the interpretation it needs, no matter | |
652 | what human language the data is in. | |
653 | ||
ea8adc8c | 654 | A final reason Rust doesn’t allow us to index into a `String` to get a |
cc61c64b | 655 | character is that indexing operations are expected to always take constant time |
ea8adc8c XL |
656 | (O(1)). But it isn’t possible to guarantee that performance with a `String`, |
657 | because Rust would have to walk through the contents from the beginning to the | |
658 | index to determine how many valid characters there were. | |
cc61c64b XL |
659 | |
660 | ### Slicing Strings | |
661 | ||
ea8adc8c XL |
662 | Indexing into a string is often a bad idea because it’s not clear what the |
663 | return type of the string indexing operation should be: a byte value, a | |
664 | character, a grapheme cluster, or a string slice. Therefore, Rust asks you to | |
665 | be more specific if you really need to use indices to create string slices. To | |
666 | be more specific in your indexing and indicate that you want a string slice, | |
667 | rather than indexing using `[]` with a single number, you can use `[]` with a | |
668 | range to create a string slice containing particular bytes: | |
cc61c64b | 669 | |
abe05a73 | 670 | ``` |
cc61c64b XL |
671 | let hello = "Здравствуйте"; |
672 | ||
673 | let s = &hello[0..4]; | |
674 | ``` | |
675 | ||
676 | Here, `s` will be a `&str` that contains the first four bytes of the string. | |
ea8adc8c XL |
677 | Earlier, we mentioned that each of these characters was two bytes, which means |
678 | `s` will be `Зд`. | |
cc61c64b | 679 | |
ea8adc8c XL |
680 | What would happen if we used `&hello[0..1]`? The answer: Rust will panic at |
681 | runtime in the same way that accessing an invalid index in a vector does: | |
cc61c64b | 682 | |
abe05a73 XL |
683 | ``` |
684 | thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside 'З' (bytes 0..2) of `Здравствуйте`', src/libcore/str/mod.rs:2188:4 | |
cc61c64b XL |
685 | ``` |
686 | ||
ea8adc8c XL |
687 | You should use ranges to create string slices with caution, because it can |
688 | crash your program. | |
cc61c64b XL |
689 | |
690 | ### Methods for Iterating Over Strings | |
691 | ||
ea8adc8c | 692 | Fortunately, we can access elements in a string in other ways. |
cc61c64b XL |
693 | |
694 | If we need to perform operations on individual Unicode scalar values, the best | |
ea8adc8c | 695 | way to do so is to use the `chars` method. Calling `chars` on “नमस्ते” separates |
abe05a73 | 696 | out and returns six values of type `char`, and we can iterate over the result |
ea8adc8c | 697 | in order to access each element: |
cc61c64b | 698 | |
abe05a73 | 699 | ``` |
cc61c64b XL |
700 | for c in "नमस्ते".chars() { |
701 | println!("{}", c); | |
702 | } | |
703 | ``` | |
704 | ||
ea8adc8c | 705 | This code will print the following: |
cc61c64b | 706 | |
abe05a73 | 707 | ``` |
cc61c64b XL |
708 | न |
709 | म | |
710 | स | |
711 | ् | |
712 | त | |
713 | े | |
714 | ``` | |
715 | ||
716 | The `bytes` method returns each raw byte, which might be appropriate for your | |
717 | domain: | |
718 | ||
abe05a73 | 719 | ``` |
cc61c64b XL |
720 | for b in "नमस्ते".bytes() { |
721 | println!("{}", b); | |
722 | } | |
723 | ``` | |
724 | ||
725 | This code will print the 18 bytes that make up this `String`, starting with: | |
726 | ||
abe05a73 | 727 | ``` |
cc61c64b XL |
728 | 224 |
729 | 164 | |
730 | 168 | |
731 | 224 | |
732 | // ... etc | |
733 | ``` | |
734 | ||
ea8adc8c XL |
735 | But be sure to remember that valid Unicode scalar values may be made up of more |
736 | than one byte. | |
cc61c64b XL |
737 | |
738 | Getting grapheme clusters from strings is complex, so this functionality is not | |
abe05a73 XL |
739 | provided by the standard library. Crates are available on |
740 | crates.io at *https://crates.io* if this is the functionality you need. | |
cc61c64b | 741 | |
ea8adc8c | 742 | ### Strings Are Not So Simple |
cc61c64b XL |
743 | |
744 | To summarize, strings are complicated. Different programming languages make | |
745 | different choices about how to present this complexity to the programmer. Rust | |
746 | has chosen to make the correct handling of `String` data the default behavior | |
ea8adc8c XL |
747 | for all Rust programs, which means programmers have to put more thought into |
748 | handling UTF-8 data upfront. This trade-off exposes more of the complexity of | |
749 | strings than other programming languages do but prevents you from having to | |
750 | handle errors involving non-ASCII characters later in your development life | |
751 | cycle. | |
cc61c64b | 752 | |
ea8adc8c | 753 | Let’s switch to something a bit less complex: hash maps! |
cc61c64b | 754 | |
abe05a73 | 755 | ## Hash Maps Store Keys Associated with Values |
cc61c64b XL |
756 | |
757 | The last of our common collections is the *hash map*. The type `HashMap<K, V>` | |
758 | stores a mapping of keys of type `K` to values of type `V`. It does this via a | |
759 | *hashing function*, which determines how it places these keys and values into | |
760 | memory. Many different programming languages support this kind of data | |
ea8adc8c XL |
761 | structure, but often use a different name, such as hash, map, object, hash |
762 | table, or associative array, just to name a few. | |
cc61c64b | 763 | |
ea8adc8c XL |
764 | Hash maps are useful for when you want to look up data not by an index, as you |
765 | can with vectors, but by using a key that can be of any type. For example, in a | |
766 | game, you could keep track of each team’s score in a hash map where each key is | |
767 | a team’s name and the values are each team’s score. Given a team name, you can | |
768 | retrieve its score. | |
cc61c64b | 769 | |
ea8adc8c XL |
770 | We’ll go over the basic API of hash maps in this section, but many more goodies |
771 | are hiding in the functions defined on `HashMap<K, V>` by the standard library. | |
772 | As always, check the standard library documentation for more information. | |
cc61c64b XL |
773 | |
774 | ### Creating a New Hash Map | |
775 | ||
ea8adc8c | 776 | We can create an empty hash map with `new` and add elements with `insert`. In |
abe05a73 | 777 | Listing 8-20, we’re keeping track of the scores of two teams whose names are |
ea8adc8c XL |
778 | Blue and Yellow. The Blue team will start with 10 points, and the Yellow team |
779 | starts with 50: | |
cc61c64b | 780 | |
abe05a73 | 781 | ``` |
cc61c64b XL |
782 | use std::collections::HashMap; |
783 | ||
784 | let mut scores = HashMap::new(); | |
785 | ||
786 | scores.insert(String::from("Blue"), 10); | |
787 | scores.insert(String::from("Yellow"), 50); | |
788 | ``` | |
789 | ||
abe05a73 | 790 | Listing 8-20: Creating a new hash map and inserting some keys and values |
ea8adc8c | 791 | |
cc61c64b XL |
792 | Note that we need to first `use` the `HashMap` from the collections portion of |
793 | the standard library. Of our three common collections, this one is the least | |
abe05a73 XL |
794 | often used, so it’s not included in the features brought into scope |
795 | automatically in the prelude. Hash maps also have less support from the | |
796 | standard library; there’s no built-in macro to construct them, for example. | |
cc61c64b XL |
797 | |
798 | Just like vectors, hash maps store their data on the heap. This `HashMap` has | |
799 | keys of type `String` and values of type `i32`. Like vectors, hash maps are | |
800 | homogeneous: all of the keys must have the same type, and all of the values | |
801 | must have the same type. | |
802 | ||
803 | Another way of constructing a hash map is by using the `collect` method on a | |
804 | vector of tuples, where each tuple consists of a key and its value. The | |
ea8adc8c | 805 | `collect` method gathers data into a number of collection types, including |
cc61c64b XL |
806 | `HashMap`. For example, if we had the team names and initial scores in two |
807 | separate vectors, we can use the `zip` method to create a vector of tuples | |
808 | where “Blue” is paired with 10, and so forth. Then we can use the `collect` | |
abe05a73 | 809 | method to turn that vector of tuples into a `HashMap` as shown in Listing 8-21: |
cc61c64b | 810 | |
abe05a73 | 811 | ``` |
cc61c64b XL |
812 | use std::collections::HashMap; |
813 | ||
814 | let teams = vec![String::from("Blue"), String::from("Yellow")]; | |
815 | let initial_scores = vec![10, 50]; | |
816 | ||
817 | let scores: HashMap<_, _> = teams.iter().zip(initial_scores.iter()).collect(); | |
818 | ``` | |
819 | ||
abe05a73 | 820 | Listing 8-21: Creating a hash map from a list of teams and a list of scores |
ea8adc8c | 821 | |
cc61c64b XL |
822 | The type annotation `HashMap<_, _>` is needed here because it’s possible to |
823 | `collect` into many different data structures, and Rust doesn’t know which you | |
824 | want unless you specify. For the type parameters for the key and value types, | |
ea8adc8c XL |
825 | however, we use underscores, and Rust can infer the types that the hash map |
826 | contains based on the types of the data in the vectors. | |
cc61c64b XL |
827 | |
828 | ### Hash Maps and Ownership | |
829 | ||
830 | For types that implement the `Copy` trait, like `i32`, the values are copied | |
831 | into the hash map. For owned values like `String`, the values will be moved and | |
abe05a73 | 832 | the hash map will be the owner of those values as demonstrated in Listing 8-22: |
cc61c64b | 833 | |
abe05a73 | 834 | ``` |
cc61c64b XL |
835 | use std::collections::HashMap; |
836 | ||
837 | let field_name = String::from("Favorite color"); | |
838 | let field_value = String::from("Blue"); | |
839 | ||
840 | let mut map = HashMap::new(); | |
841 | map.insert(field_name, field_value); | |
ea8adc8c XL |
842 | // field_name and field_value are invalid at this point, try using them and |
843 | // see what compiler error you get! | |
cc61c64b XL |
844 | ``` |
845 | ||
abe05a73 | 846 | Listing 8-22: Showing that keys and values are owned by the hash map once |
ea8adc8c XL |
847 | they’re inserted |
848 | ||
849 | We aren’t able to use the variables `field_name` and `field_value` after | |
850 | they’ve been moved into the hash map with the call to `insert`. | |
cc61c64b | 851 | |
ea8adc8c XL |
852 | If we insert references to values into the hash map, the values won’t be moved |
853 | into the hash map. The values that the references point to must be valid for at | |
854 | least as long as the hash map is valid. We’ll talk more about these issues in | |
855 | the “Validating References with Lifetimes” section in Chapter 10. | |
cc61c64b XL |
856 | |
857 | ### Accessing Values in a Hash Map | |
858 | ||
ea8adc8c | 859 | We can get a value out of the hash map by providing its key to the `get` method |
abe05a73 | 860 | as shown in Listing 8-23: |
cc61c64b | 861 | |
abe05a73 | 862 | ``` |
cc61c64b XL |
863 | use std::collections::HashMap; |
864 | ||
865 | let mut scores = HashMap::new(); | |
866 | ||
867 | scores.insert(String::from("Blue"), 10); | |
868 | scores.insert(String::from("Yellow"), 50); | |
869 | ||
870 | let team_name = String::from("Blue"); | |
871 | let score = scores.get(&team_name); | |
872 | ``` | |
873 | ||
abe05a73 | 874 | Listing 8-23: Accessing the score for the Blue team stored in the hash map |
ea8adc8c | 875 | |
cc61c64b | 876 | Here, `score` will have the value that’s associated with the Blue team, and the |
ea8adc8c XL |
877 | result will be `Some(&10)`. The result is wrapped in `Some` because `get` |
878 | returns an `Option<&V>`; if there’s no value for that key in the hash map, | |
879 | `get` will return `None`. The program will need to handle the `Option` in one | |
880 | of the ways that we covered in Chapter 6. | |
cc61c64b XL |
881 | |
882 | We can iterate over each key/value pair in a hash map in a similar manner as we | |
883 | do with vectors, using a `for` loop: | |
884 | ||
abe05a73 | 885 | ``` |
cc61c64b XL |
886 | use std::collections::HashMap; |
887 | ||
888 | let mut scores = HashMap::new(); | |
889 | ||
890 | scores.insert(String::from("Blue"), 10); | |
891 | scores.insert(String::from("Yellow"), 50); | |
892 | ||
893 | for (key, value) in &scores { | |
894 | println!("{}: {}", key, value); | |
895 | } | |
896 | ``` | |
897 | ||
ea8adc8c | 898 | This code will print each pair in an arbitrary order: |
cc61c64b | 899 | |
abe05a73 | 900 | ``` |
cc61c64b XL |
901 | Yellow: 50 |
902 | Blue: 10 | |
903 | ``` | |
904 | ||
905 | ### Updating a Hash Map | |
906 | ||
ea8adc8c XL |
907 | Although the number of keys and values is growable, each key can only have one |
908 | value associated with it at a time. When we want to change the data in a hash | |
909 | map, we have to decide how to handle the case when a key already has a value | |
910 | assigned. We could replace the old value with the new value, completely | |
911 | disregarding the old value. We could keep the old value and ignore the new | |
912 | value, and only add the new value if the key *doesn’t* already have a value. Or | |
913 | we could combine the old value and the new value. Let’s look at how to do each | |
914 | of these! | |
cc61c64b XL |
915 | |
916 | #### Overwriting a Value | |
917 | ||
ea8adc8c XL |
918 | If we insert a key and a value into a hash map, and then insert that same key |
919 | with a different value, the value associated with that key will be replaced. | |
abe05a73 | 920 | Even though the code in Listing 8-24 calls `insert` twice, the hash map will |
ea8adc8c XL |
921 | only contain one key/value pair because we’re inserting the value for the Blue |
922 | team’s key both times: | |
cc61c64b | 923 | |
abe05a73 | 924 | ``` |
cc61c64b XL |
925 | use std::collections::HashMap; |
926 | ||
927 | let mut scores = HashMap::new(); | |
928 | ||
929 | scores.insert(String::from("Blue"), 10); | |
930 | scores.insert(String::from("Blue"), 25); | |
931 | ||
932 | println!("{:?}", scores); | |
933 | ``` | |
934 | ||
abe05a73 | 935 | Listing 8-24: Replacing a value stored with a particular key |
ea8adc8c XL |
936 | |
937 | This code will print `{"Blue": 25}`. The original value of `10` has been | |
938 | overwritten. | |
cc61c64b XL |
939 | |
940 | #### Only Insert If the Key Has No Value | |
941 | ||
ea8adc8c XL |
942 | It’s common to check whether a particular key has a value, and if it doesn’t, |
943 | insert a value for it. Hash maps have a special API for this called `entry` | |
944 | that takes the key we want to check as a parameter. The return value of the | |
945 | `entry` function is an enum called `Entry` that represents a value that might | |
946 | or might not exist. Let’s say we want to check whether the key for the Yellow | |
cc61c64b | 947 | team has a value associated with it. If it doesn’t, we want to insert the value |
ea8adc8c | 948 | 50, and the same for the Blue team. Using the `entry` API, the code looks like |
abe05a73 | 949 | Listing 8-25: |
cc61c64b | 950 | |
abe05a73 | 951 | ``` |
cc61c64b XL |
952 | use std::collections::HashMap; |
953 | ||
954 | let mut scores = HashMap::new(); | |
955 | scores.insert(String::from("Blue"), 10); | |
956 | ||
957 | scores.entry(String::from("Yellow")).or_insert(50); | |
958 | scores.entry(String::from("Blue")).or_insert(50); | |
959 | ||
960 | println!("{:?}", scores); | |
961 | ``` | |
962 | ||
abe05a73 | 963 | Listing 8-25: Using the `entry` method to only insert if the key does not |
ea8adc8c XL |
964 | already have a value |
965 | ||
966 | The `or_insert` method on `Entry` is defined to return the value for the | |
967 | corresponding `Entry` key if that key exists, and if not, inserts the parameter | |
968 | as the new value for this key and returns the modified `Entry`. This technique | |
969 | is much cleaner than writing the logic ourselves, and in addition, plays more | |
970 | nicely with the borrow checker. | |
cc61c64b | 971 | |
abe05a73 | 972 | Running the code in Listing 8-25 will print `{"Yellow": 50, "Blue": 10}`. The |
ea8adc8c XL |
973 | first call to `entry` will insert the key for the Yellow team with the value |
974 | `50` because the Yellow team doesn’t have a value already. The second call to | |
975 | `entry` will not change the hash map because the Blue team already has the | |
976 | value `10`. | |
cc61c64b | 977 | |
ea8adc8c | 978 | #### Updating a Value Based on the Old Value |
cc61c64b | 979 | |
ea8adc8c | 980 | Another common use case for hash maps is to look up a key’s value and then |
abe05a73 | 981 | update it based on the old value. For instance, Listing 8-26 shows code that |
ea8adc8c XL |
982 | counts how many times each word appears in some text. We use a hash map with |
983 | the words as keys and increment the value to keep track of how many times we’ve | |
984 | seen that word. If it’s the first time we’ve seen a word, we’ll first insert | |
985 | the value `0`: | |
cc61c64b | 986 | |
abe05a73 | 987 | ``` |
cc61c64b XL |
988 | use std::collections::HashMap; |
989 | ||
990 | let text = "hello world wonderful world"; | |
991 | ||
992 | let mut map = HashMap::new(); | |
993 | ||
994 | for word in text.split_whitespace() { | |
995 | let count = map.entry(word).or_insert(0); | |
996 | *count += 1; | |
997 | } | |
998 | ||
999 | println!("{:?}", map); | |
1000 | ``` | |
1001 | ||
abe05a73 | 1002 | Listing 8-26: Counting occurrences of words using a hash map that stores words |
ea8adc8c XL |
1003 | and counts |
1004 | ||
1005 | This code will print `{"world": 2, "hello": 1, "wonderful": 1}`. The | |
1006 | `or_insert` method actually returns a mutable reference (`&mut V`) to the value | |
1007 | for this key. Here we store that mutable reference in the `count` variable, so | |
1008 | in order to assign to that value we must first dereference `count` using the | |
1009 | asterisk (`*`). The mutable reference goes out of scope at the end of the `for` | |
1010 | loop, so all of these changes are safe and allowed by the borrowing rules. | |
cc61c64b XL |
1011 | |
1012 | ### Hashing Function | |
1013 | ||
1014 | By default, `HashMap` uses a cryptographically secure hashing function that can | |
1015 | provide resistance to Denial of Service (DoS) attacks. This is not the fastest | |
ea8adc8c | 1016 | hashing algorithm available, but the trade-off for better security that comes |
cc61c64b XL |
1017 | with the drop in performance is worth it. If you profile your code and find |
1018 | that the default hash function is too slow for your purposes, you can switch to | |
1019 | another function by specifying a different *hasher*. A hasher is a type that | |
ea8adc8c | 1020 | implements the `BuildHasher` trait. We’ll talk about traits and how to |
3b2f2976 | 1021 | implement them in Chapter 10. You don’t necessarily have to implement your own |
abe05a73 XL |
1022 | hasher from scratch; crates.io at *https://crates.io* has libraries shared by |
1023 | other Rust users that provide hashers implementing many common hashing | |
1024 | algorithms. | |
cc61c64b XL |
1025 | |
1026 | ## Summary | |
1027 | ||
ea8adc8c XL |
1028 | Vectors, strings, and hash maps will provide a large amount of functionality |
1029 | that you need in programs where you need to store, access, and modify data. | |
1030 | Here are some exercises you should now be equipped to solve: | |
cc61c64b XL |
1031 | |
1032 | * Given a list of integers, use a vector and return the mean (average), median | |
1033 | (when sorted, the value in the middle position), and mode (the value that | |
1034 | occurs most often; a hash map will be helpful here) of the list. | |
ea8adc8c XL |
1035 | * Convert strings to pig latin. The first consonant of each word is moved to |
1036 | the end of the word and “ay” is added, so “first” becomes “irst-fay.” Words | |
1037 | that start with a vowel have “hay” added to the end instead (“apple” becomes | |
1038 | “apple-hay”). Keep in mind the details about UTF-8 encoding! | |
cc61c64b | 1039 | * Using a hash map and vectors, create a text interface to allow a user to add |
ea8adc8c XL |
1040 | employee names to a department in a company. For example, “Add Sally to |
1041 | Engineering” or “Add Amir to Sales.” Then let the user retrieve a list of all | |
cc61c64b XL |
1042 | people in a department or all people in the company by department, sorted |
1043 | alphabetically. | |
1044 | ||
ea8adc8c XL |
1045 | The standard library API documentation describes methods that vectors, strings, |
1046 | and hash maps have that will be helpful for these exercises! | |
cc61c64b | 1047 | |
ea8adc8c XL |
1048 | We’re getting into more complex programs in which operations can fail; so, it’s |
1049 | a perfect time to discuss error handling next! |