]> git.proxmox.com Git - rustc.git/blame - src/doc/book/second-edition/nostarch/chapter08.md
New upstream version 1.23.0+dfsg1
[rustc.git] / src / doc / book / second-edition / nostarch / chapter08.md
CommitLineData
cc61c64b
XL
1
2[TOC]
3
4# Common Collections
5
ea8adc8c
XL
6Rust’s standard library includes a number of very useful data structures called
7*collections*. Most other data types represent one specific value, but
cc61c64b
XL
8collections can contain multiple values. Unlike the built-in array and tuple
9types, the data these collections point to is stored on the heap, which means
10the amount of data does not need to be known at compile time and can grow or
11shrink as the program runs. Each kind of collection has different capabilities
ea8adc8c
XL
12and costs, and choosing an appropriate one for your current situation is a
13skill you’ll develop over time. In this chapter, we’ll discuss three
14collections that are used very often in Rust programs:
cc61c64b
XL
15
16* A *vector* allows us to store a variable number of values next to each other.
ea8adc8c
XL
17* A *string* is a collection of characters. We’ve discussed the `String` type
18 previously, but in this chapter we’ll talk about it in depth.
3b2f2976 19* A *hash map* allows us to associate a value with a particular key. It’s a
cc61c64b
XL
20 particular implementation of the more general data structure called a *map*.
21
22To learn about the other kinds of collections provided by the standard library,
ea8adc8c 23see the documentation at *https://doc.rust-lang.org/stable/std/collections/*.
cc61c64b 24
ea8adc8c
XL
25We’ll discuss how to create and update vectors, strings, and hash maps, as well
26as what makes each special.
cc61c64b 27
abe05a73 28## Vectors Store Lists of Values
cc61c64b 29
ea8adc8c
XL
30The first collection type we’ll look at is `Vec<T>`, also known as a *vector*.
31Vectors allow us to store more than one value in a single data structure that
32puts all the values next to each other in memory. Vectors can only store values
33of the same type. They are useful in situations in which you have a list of
34items, such as the lines of text in a file or the prices of items in a shopping
35cart.
cc61c64b
XL
36
37### Creating a New Vector
38
ea8adc8c
XL
39To create a new, empty vector, we can call the `Vec::new` function as shown in
40Listing 8-1:
cc61c64b 41
abe05a73 42```
cc61c64b
XL
43let v: Vec<i32> = Vec::new();
44```
45
ea8adc8c
XL
46Listing 8-1: Creating a new, empty vector to hold values of type `i32`
47
48Note that we added a type annotation here. Because we aren’t inserting any
49values into this vector, Rust doesn’t know what kind of elements we intend to
50store. This is an important point. Vectors are implemented using generics;
51we’ll cover how to use generics with your own types in Chapter 10. For now,
52know that the `Vec<T>` type provided by the standard library can hold any type,
53and when a specific vector holds a specific type, the type is specified within
54angle brackets. In Listing 8-1, we’ve told Rust that the `Vec<T>` in `v` will
cc61c64b
XL
55hold elements of the `i32` type.
56
ea8adc8c
XL
57In more realistic code, Rust can often infer the type of value we want to store
58once we insert values, so you rarely need to do this type annotation. It’s more
59common to create a `Vec<T>` that has initial values, and Rust provides the
60`vec!` macro for convenience. The macro will create a new vector that holds the
61values we give it. Listing 8-2 creates a new `Vec<i32>` that holds the values
62`1`, `2`, and `3`:
cc61c64b 63
abe05a73 64```
cc61c64b
XL
65let v = vec![1, 2, 3];
66```
67
ea8adc8c
XL
68Listing 8-2: Creating a new vector containing values
69
cc61c64b 70Because we’ve given initial `i32` values, Rust can infer that the type of `v`
ea8adc8c
XL
71is `Vec<i32>`, and the type annotation isn’t necessary. Next, we’ll look at how
72to modify a vector.
cc61c64b
XL
73
74### Updating a Vector
75
ea8adc8c
XL
76To create a vector and then add elements to it, we can use the `push` method as
77shown in Listing 8-3:
cc61c64b 78
abe05a73 79```
cc61c64b
XL
80let mut v = Vec::new();
81
82v.push(5);
83v.push(6);
84v.push(7);
85v.push(8);
86```
87
ea8adc8c
XL
88Listing 8-3: Using the `push` method to add values to a vector
89
90As with any variable, as discussed in Chapter 3, if we want to be able to
91change its value, we need to make it mutable using the `mut` keyword. The
cc61c64b
XL
92numbers we place inside are all of type `i32`, and Rust infers this from the
93data, so we don’t need the `Vec<i32>` annotation.
94
ea8adc8c 95### Dropping a Vector Drops Its Elements
cc61c64b 96
ea8adc8c
XL
97Like any other `struct`, a vector will be freed when it goes out of scope, as
98annotated in Listing 8-4:
cc61c64b 99
abe05a73 100```
cc61c64b
XL
101{
102 let v = vec![1, 2, 3, 4];
103
104 // do stuff with v
105
106} // <- v goes out of scope and is freed here
107```
108
ea8adc8c
XL
109Listing 8-4: Showing where the vector and its elements are dropped
110
cc61c64b
XL
111When the vector gets dropped, all of its contents will also be dropped, meaning
112those integers it holds will be cleaned up. This may seem like a
ea8adc8c 113straightforward point but can get a bit more complicated when we start to
cc61c64b
XL
114introduce references to the elements of the vector. Let’s tackle that next!
115
116### Reading Elements of Vectors
117
118Now that you know how to create, update, and destroy vectors, knowing how to
119read their contents is a good next step. There are two ways to reference a
120value stored in a vector. In the examples, we’ve annotated the types of the
121values that are returned from these functions for extra clarity.
122
ea8adc8c 123Listing 8-5 shows both methods of accessing a value in a vector either with
cc61c64b
XL
124indexing syntax or the `get` method:
125
abe05a73 126```
cc61c64b
XL
127let v = vec![1, 2, 3, 4, 5];
128
129let third: &i32 = &v[2];
130let third: Option<&i32> = v.get(2);
131```
132
ea8adc8c
XL
133Listing 8-5: Using indexing syntax or the `get` method to access an item in a
134vector
cc61c64b 135
ea8adc8c
XL
136Note two details here. First, we use the index value of `2` to get the third
137element: vectors are indexed by number, starting at zero. Second, the two
138different ways to get the third element are by using `&` and `[]`, which gives
139us a reference, or by using the `get` method with the index passed as an
140argument, which gives us an `Option<&T>`.
141
142The reason Rust has two ways to reference an element is so you can choose how
143the program behaves when you try to use an index value that the vector doesn’t
abe05a73
XL
144have an element for. As an example, let's see what a program will do if it has
145a vector that holds five elements and then tries to access an element at index
146100, as shown in Listing 8-6:
cc61c64b 147
abe05a73 148```
cc61c64b
XL
149let v = vec![1, 2, 3, 4, 5];
150
151let does_not_exist = &v[100];
152let does_not_exist = v.get(100);
153```
154
ea8adc8c
XL
155Listing 8-6: Attempting to access the element at index 100 in a vector
156containing 5 elements
157
158When you run this code, the first `[]` method will cause a `panic!` because it
159references a nonexistent element. This method is best used when you want your
160program to consider an attempt to access an element past the end of the vector
161to be a fatal error that crashes the program.
cc61c64b 162
ea8adc8c
XL
163When the `get` method is passed an index that is outside the vector, it returns
164`None` without panicking. You would use this method if accessing an element
165beyond the range of the vector happens occasionally under normal circumstances.
166Your code will then have logic to handle having either `Some(&element)` or
167`None`, as discussed in Chapter 6. For example, the index could be coming from
168a person entering a number. If they accidentally enter a number that’s too
169large and the program gets a `None` value, you could tell the user how many
abe05a73 170items are in the current vector and give them another chance to enter a valid
ea8adc8c 171value. That would be more user-friendly than crashing the program due to a typo!
cc61c64b
XL
172
173#### Invalid References
174
ea8adc8c
XL
175When the program has a valid reference, the borrow checker enforces the
176ownership and borrowing rules (covered in Chapter 4) to ensure this reference
177and any other references to the contents of the vector remain valid. Recall the
178rule that states we can’t have mutable and immutable references in the same
179scope. That rule applies in Listing 8-7 where we hold an immutable reference to
abe05a73
XL
180the first element in a vector and try to add an element to the end, which won't
181work:
cc61c64b 182
abe05a73 183```
cc61c64b
XL
184let mut v = vec![1, 2, 3, 4, 5];
185
186let first = &v[0];
187
188v.push(6);
189```
190
ea8adc8c
XL
191Listing 8-7: Attempting to add an element to a vector while holding a reference
192to an item
cc61c64b 193
ea8adc8c
XL
194Compiling this code will result in this error:
195
abe05a73 196```
cc61c64b
XL
197error[E0502]: cannot borrow `v` as mutable because it is also borrowed as
198immutable
199 |
2004 | let first = &v[0];
201 | - immutable borrow occurs here
2025 |
2036 | v.push(6);
204 | ^ mutable borrow occurs here
2057 | }
206 | - immutable borrow ends here
207```
208
ea8adc8c
XL
209The code in Listing 8-7 might look like it should work: why should a reference
210to the first element care about what changes at the end of the vector? The
211reason behind this error is due to the way vectors work: adding a new element
cc61c64b 212onto the end of the vector might require allocating new memory and copying the
ea8adc8c
XL
213old elements to the new space if there isn’t enough room to put all the
214elements next to each other where the vector was. In that case, the reference
215to the first element would be pointing to deallocated memory. The borrowing
216rules prevent programs from ending up in that situation.
217
218> Note: For more on the implementation details of the `Vec<T>` type, see “The
219> Nomicon” at https://doc.rust-lang.org/stable/nomicon/vec.html.
220
221### Iterating Over the Values in a Vector
222
abe05a73
XL
223If we want to access each element in a vector in turn, we can iterate through
224all of the elements rather than use indexes to access one at a time. Listing
2258-8 shows how to use a `for` loop to get immutable references to each element
226in a vector of `i32` values and print them out:
ea8adc8c 227
abe05a73 228```
ea8adc8c
XL
229let v = vec![100, 32, 57];
230for i in &v {
231 println!("{}", i);
232}
233```
234
235Listing 8-8: Printing each element in a vector by iterating over the elements
236using a `for` loop
cc61c64b 237
ea8adc8c 238We can also iterate over mutable references to each element in a mutable vector
abe05a73 239in order to make changes to all the elements. The `for` loop in Listing 8-9
ea8adc8c
XL
240will add `50` to each element:
241
abe05a73 242```
ea8adc8c
XL
243let mut v = vec![100, 32, 57];
244for i in &mut v {
245 *i += 50;
246}
247```
248
249Listing 8-9: Iterating over mutable references to elements in a vector
250
abe05a73
XL
251To change the value that the mutable reference refers to, we have to use the
252dereference operator (`*`) to get to the value in `i` before we can use the
253`+=` operator .
cc61c64b
XL
254
255### Using an Enum to Store Multiple Types
256
257At the beginning of this chapter, we said that vectors can only store values
ea8adc8c
XL
258that are the same type. This can be inconvenient; there are definitely use
259cases for needing to store a list of items of different types. Fortunately, the
260variants of an enum are defined under the same enum type, so when we need to
261store elements of a different type in a vector, we can define and use an enum!
262
263For example, let’s say we want to get values from a row in a spreadsheet where
264some of the columns in the row contain integers, some floating-point numbers,
cc61c64b 265and some strings. We can define an enum whose variants will hold the different
abe05a73 266value types, and then all the enum variants will be considered the same type:
ea8adc8c 267that of the enum. Then we can create a vector that holds that enum and so,
abe05a73 268ultimately, holds different types. We’ve demonstrated this in Listing 8-10:
cc61c64b 269
abe05a73 270```
cc61c64b
XL
271enum SpreadsheetCell {
272 Int(i32),
273 Float(f64),
274 Text(String),
275}
276
277let row = vec![
278 SpreadsheetCell::Int(3),
279 SpreadsheetCell::Text(String::from("blue")),
280 SpreadsheetCell::Float(10.12),
281];
282```
283
abe05a73
XL
284Listing 8-10: Defining an `enum` to store values of different types in one
285vector
ea8adc8c
XL
286
287The reason Rust needs to know what types will be in the vector at compile time
288is so it knows exactly how much memory on the heap will be needed to store each
289element. A secondary advantage is that we can be explicit about what types are
290allowed in this vector. If Rust allowed a vector to hold any type, there would
291be a chance that one or more of the types would cause errors with the
292operations performed on the elements of the vector. Using an enum plus a
293`match` expression means that Rust will ensure at compile time that we always
294handle every possible case, as discussed in Chapter 6.
cc61c64b 295
abe05a73
XL
296If you don’t know the exhaustive set of types the program will get at runtime
297to store in a vector when you’re writing a program, the enum technique won’t
ea8adc8c 298work. Instead, you can use a trait object, which we’ll cover in Chapter 17.
cc61c64b 299
ea8adc8c
XL
300Now that we’ve discussed some of the most common ways to use vectors, be sure
301to review the API documentation for all the many useful methods defined on
abe05a73 302`Vec<T>` by the standard library. For example, in addition to `push`, a `pop`
ea8adc8c
XL
303method removes and returns the last element. Let’s move on to the next
304collection type: `String`!
cc61c64b 305
abe05a73 306## Strings Store UTF-8 Encoded Text
cc61c64b 307
ea8adc8c
XL
308We talked about strings in Chapter 4, but we’ll look at them in more depth now.
309New Rustaceans commonly get stuck on strings due to a combination of three
310concepts: Rust’s propensity for exposing possible errors, strings being a more
311complicated data structure than many programmers give them credit for, and
312UTF-8. These concepts combine in a way that can seem difficult when you’re
313coming from other programming languages.
cc61c64b 314
ea8adc8c 315This discussion of strings is in the collections chapter because strings are
cc61c64b
XL
316implemented as a collection of bytes plus some methods to provide useful
317functionality when those bytes are interpreted as text. In this section, we’ll
ea8adc8c 318talk about the operations on `String` that every collection type has, such as
cc61c64b
XL
319creating, updating, and reading. We’ll also discuss the ways in which `String`
320is different than the other collections, namely how indexing into a `String` is
ea8adc8c
XL
321complicated by the differences between how people and computers interpret
322`String` data.
323
324### What Is a String?
325
326We’ll first define what we mean by the term *string*. Rust has only one string
327type in the core language, which is the string slice `str` that is usually seen
328in its borrowed form `&str`. In Chapter 4, we talked about *string slices*,
329which are references to some UTF-8 encoded string data stored elsewhere. String
330literals, for example, are stored in the binary output of the program and are
331therefore string slices.
332
333The `String` type is provided in Rust’s standard library rather than coded into
334the core language and is a growable, mutable, owned, UTF-8 encoded string type.
335When Rustaceans refer to “strings” in Rust, they usually mean the `String` and
336the string slice `&str` types, not just one of those types. Although this
337section is largely about `String`, both types are used heavily in Rust’s
338standard library and both `String` and string slices are UTF-8 encoded.
cc61c64b
XL
339
340Rust’s standard library also includes a number of other string types, such as
ea8adc8c 341`OsString`, `OsStr`, `CString`, and `CStr`. Library crates can provide even
cc61c64b
XL
342more options for storing string data. Similar to the `*String`/`*Str` naming,
343they often provide an owned and borrowed variant, just like `String`/`&str`.
ea8adc8c
XL
344These string types can store text in different encodings or be represented in
345memory in a different way, for example. We won’t discuss these other string
cc61c64b
XL
346types in this chapter; see their API documentation for more about how to use
347them and when each is appropriate.
348
349### Creating a New String
350
abe05a73
XL
351Many of the same operations available with `Vec<T>` are available with `String`
352as well, starting with the `new` function to create a string, shown in Listing
3538-11:
cc61c64b 354
abe05a73 355```
ea8adc8c 356let mut s = String::new();
cc61c64b
XL
357```
358
abe05a73 359Listing 8-11: Creating a new, empty `String`
cc61c64b 360
ea8adc8c
XL
361This line creates a new empty string called `s` that we can then load data
362into. Often, we’ll have some initial data that we want to start the string
cc61c64b 363with. For that, we use the `to_string` method, which is available on any type
abe05a73 364that implements the `Display` trait, which string literals do. Listing 8-12
ea8adc8c 365shows two examples:
cc61c64b 366
abe05a73 367```
cc61c64b
XL
368let data = "initial contents";
369
370let s = data.to_string();
371
372// the method also works on a literal directly:
373let s = "initial contents".to_string();
374```
375
abe05a73 376Listing 8-12: Using the `to_string` method to create a `String` from a string
ea8adc8c
XL
377literal
378
379This code creates a string containing `initial contents`.
cc61c64b
XL
380
381We can also use the function `String::from` to create a `String` from a string
abe05a73 382literal. The code in Listing 8-13 is equivalent to the code from Listing 8-12
ea8adc8c 383that uses `to_string`:
cc61c64b 384
abe05a73 385```
cc61c64b
XL
386let s = String::from("initial contents");
387```
388
abe05a73 389Listing 8-13: Using the `String::from` function to create a `String` from a
ea8adc8c
XL
390string literal
391
392Because strings are used for so many things, we can use many different generic
393APIs for strings, providing us with a lot of options. Some of them can seem
394redundant, but they all have their place! In this case, `String::from` and
395`to_string` do the same thing, so which you choose is a matter of style.
cc61c64b
XL
396
397Remember that strings are UTF-8 encoded, so we can include any properly encoded
abe05a73 398data in them, as shown in Listing 8-14:
cc61c64b 399
abe05a73 400```
ea8adc8c
XL
401let hello = String::from("السلام عليكم");
402let hello = String::from("Dobrý den");
403let hello = String::from("Hello");
404let hello = String::from("שָׁלוֹם");
405let hello = String::from("नमस्ते");
406let hello = String::from("こんにちは");
407let hello = String::from("안녕하세요");
408let hello = String::from("你好");
409let hello = String::from("Olá");
410let hello = String::from("Здравствуйте");
411let hello = String::from("Hola");
cc61c64b
XL
412```
413
abe05a73 414Listing 8-14: Storing greetings in different languages in strings
ea8adc8c
XL
415
416All of these are valid `String` values.
417
cc61c64b
XL
418### Updating a String
419
ea8adc8c 420A `String` can grow in size and its contents can change, just like the contents
abe05a73
XL
421of a `Vec<T>`, by pushing more data into it. In addition, we can conveniently
422use the `+` operator or the `format!` macro to concatenate `String` values
423together.
cc61c64b 424
ea8adc8c 425#### Appending to a String with `push_str` and `push`
cc61c64b 426
ea8adc8c 427We can grow a `String` by using the `push_str` method to append a string slice,
abe05a73 428as shown in Listing 8-15:
cc61c64b 429
abe05a73 430```
cc61c64b
XL
431let mut s = String::from("foo");
432s.push_str("bar");
433```
434
abe05a73 435Listing 8-15: Appending a string slice to a `String` using the `push_str` method
ea8adc8c
XL
436
437After these two lines, `s` will contain `foobar`. The `push_str` method takes a
cc61c64b 438string slice because we don’t necessarily want to take ownership of the
abe05a73 439parameter. For example, the code in Listing 8-16 shows that it would be
ea8adc8c 440unfortunate if we weren’t able to use `s2` after appending its contents to `s1`:
cc61c64b 441
abe05a73 442```
cc61c64b 443let mut s1 = String::from("foo");
ea8adc8c 444let s2 = "bar";
cc61c64b 445s1.push_str(&s2);
ea8adc8c 446println!("s2 is {}", s2);
cc61c64b
XL
447```
448
abe05a73 449Listing 8-16: Using a string slice after appending its contents to a `String`
ea8adc8c
XL
450
451If the `push_str` method took ownership of `s2`, we wouldn’t be able to print
452out its value on the last line. However, this code works as we’d expect!
453
454The `push` method takes a single character as a parameter and adds it to the
abe05a73
XL
455`String`. Listing 8-17 shows code that adds the letter l character to a
456`String` using the `push` method:
cc61c64b 457
abe05a73 458```
cc61c64b
XL
459let mut s = String::from("lo");
460s.push('l');
461```
462
abe05a73 463Listing 8-17: Adding one character to a `String` value using `push`
cc61c64b 464
ea8adc8c 465As a result of this code, `s` will contain `lol`.
cc61c64b 466
ea8adc8c
XL
467#### Concatenation with the `+` Operator or the `format!` Macro
468
469Often, we’ll want to combine two existing strings. One way is to use the `+`
abe05a73 470operator, as shown in Listing 8-18:
cc61c64b 471
abe05a73 472```
cc61c64b
XL
473let s1 = String::from("Hello, ");
474let s2 = String::from("world!");
475let s3 = s1 + &s2; // Note that s1 has been moved here and can no longer be used
476```
477
abe05a73 478Listing 8-18: Using the `+` operator to combine two `String` values into a new
ea8adc8c
XL
479`String` value
480
abe05a73 481The string `s3` will contain `Hello, world!` as a result of this code. The
ea8adc8c 482reason `s1` is no longer valid after the addition and the reason we used a
cc61c64b
XL
483reference to `s2` has to do with the signature of the method that gets called
484when we use the `+` operator. The `+` operator uses the `add` method, whose
485signature looks something like this:
486
abe05a73 487```
cc61c64b
XL
488fn add(self, s: &str) -> String {
489```
490
ea8adc8c
XL
491This isn’t the exact signature that’s in the standard library: in the standard
492library, `add` is defined using generics. Here, we’re looking at the signature
493of `add` with concrete types substituted for the generic ones, which is what
494happens when we call this method with `String` values. We’ll discuss generics
495in Chapter 10. This signature gives us the clues we need to understand the
496tricky bits of the `+` operator.
497
498First, `s2` has an `&`, meaning that we’re adding a *reference* of the second
499string to the first string because of the `s` parameter in the `add` function:
500we can only add a `&str` to a `String`; we can’t add two `String` values
501together. But wait - the type of `&s2` is `&String`, not `&str`, as specified
abe05a73
XL
502in the second parameter to `add`. So why does Listing 8-18 compile?
503
504The reason we’re able to use `&s2` in the call to `add` is that the compiler
505can *coerce* the `&String` argument into a `&str`. When we call the `add`
506method, Rust uses a *deref coercion*, which here turns `&s2` into `&s2[..]`.
507We’ll discuss deref coercion in more depth in Chapter 15. Because `add` does
508not take ownership of the `s` parameter, `s2` will still be a valid `String`
509after this operation.
cc61c64b
XL
510
511Second, we can see in the signature that `add` takes ownership of `self`,
abe05a73 512because `self` does *not* have an `&`. This means `s1` in Listing 8-18 will be
ea8adc8c
XL
513moved into the `add` call and no longer be valid after that. So although `let
514s3 = s1 + &s2;` looks like it will copy both strings and create a new one, this
515statement actually takes ownership of `s1`, appends a copy of the contents of
516`s2`, and then returns ownership of the result. In other words, it looks like
517it’s making a lot of copies but isn’t: the implementation is more efficient
cc61c64b
XL
518than copying.
519
520If we need to concatenate multiple strings, the behavior of `+` gets unwieldy:
521
abe05a73 522```
cc61c64b
XL
523let s1 = String::from("tic");
524let s2 = String::from("tac");
525let s3 = String::from("toe");
526
527let s = s1 + "-" + &s2 + "-" + &s3;
528```
529
ea8adc8c
XL
530At this point, `s` will be `tic-tac-toe`. With all of the `+` and `"`
531characters, it’s difficult to see what’s going on. For more complicated string
cc61c64b
XL
532combining, we can use the `format!` macro:
533
abe05a73 534```
cc61c64b
XL
535let s1 = String::from("tic");
536let s2 = String::from("tac");
537let s3 = String::from("toe");
538
539let s = format!("{}-{}-{}", s1, s2, s3);
540```
541
ea8adc8c
XL
542This code also sets `s` to `tic-tac-toe`. The `format!` macro works in the same
543way as `println!`, but instead of printing the output to the screen, it returns
544a `String` with the contents. The version of the code using `format!` is much
545easier to read and also doesn’t take ownership of any of its parameters.
cc61c64b
XL
546
547### Indexing into Strings
548
ea8adc8c
XL
549In many other programming languages, accessing individual characters in a
550string by referencing them by index is a valid and common operation. However,
551if we try to access parts of a `String` using indexing syntax in Rust, we’ll
abe05a73 552get an error. Consider the invalid code in Listing 8-19:
cc61c64b 553
abe05a73 554```
cc61c64b
XL
555let s1 = String::from("hello");
556let h = s1[0];
557```
558
abe05a73 559Listing 8-19: Attempting to use indexing syntax with a String
ea8adc8c
XL
560
561This code will result in the following error:
cc61c64b 562
abe05a73
XL
563```
564error[E0277]: the trait bound `std::string::String: std::ops::Index<{integer}>` is not satisfied
565 -->
566 |
5673 | let h = s1[0];
568 | ^^^^^ the type `std::string::String` cannot be indexed by `{integer}`
569 |
570 = help: the trait `std::ops::Index<{integer}>` is not implemented for `std::string::String`
cc61c64b
XL
571```
572
ea8adc8c
XL
573The error and the note tell the story: Rust strings don’t support indexing. But
574why not? To answer that question, we need to discuss how Rust stores strings in
575memory.
cc61c64b
XL
576
577#### Internal Representation
578
ea8adc8c 579A `String` is a wrapper over a `Vec<u8>`. Let’s look at some of our properly
abe05a73 580encoded UTF-8 example strings from Listing 8-14. First, this one:
cc61c64b 581
abe05a73 582```
cc61c64b
XL
583let len = String::from("Hola").len();
584```
585
586In this case, `len` will be four, which means the `Vec` storing the string
ea8adc8c
XL
587“Hola” is four bytes long. Each of these letters takes one byte when encoded in
588UTF-8. But what about the following line?
cc61c64b 589
abe05a73 590```
cc61c64b
XL
591let len = String::from("Здравствуйте").len();
592```
593
abe05a73
XL
594Note that this string begins with the capital Cyrillic letter Ze, not the
595Arabic number 3. Asked how long the string is, you might say 12. However,
596Rust’s answer is 24: that’s the number of bytes it takes to encode
597“Здравствуйте” in UTF-8, because each Unicode scalar value takes two bytes of
598storage. Therefore, an index into the string’s bytes will not always correlate
599to a valid Unicode scalar value. To demonstrate, consider this invalid Rust
600code:
cc61c64b 601
abe05a73 602```
cc61c64b
XL
603let hello = "Здравствуйте";
604let answer = &hello[0];
605```
606
607What should the value of `answer` be? Should it be `З`, the first letter? When
608encoded in UTF-8, the first byte of `З` is `208`, and the second is `151`, so
609`answer` should in fact be `208`, but `208` is not a valid character on its
ea8adc8c
XL
610own. Returning `208` is likely not what a user would want if they asked for the
611first letter of this string; however, that’s the only data that Rust has at
612byte index 0. Returning the byte value is probably not what users want, even if
613the string contains only Latin letters: if `&"hello"[0]` was valid code that
614returned the byte value, it would return `104`, not `h`. To avoid returning an
615unexpected value and causing bugs that might not be discovered immediately,
616Rust doesn’t compile this code at all and prevents misunderstandings earlier in
617the development process.
cc61c64b 618
ea8adc8c 619#### Bytes and Scalar Values and Grapheme Clusters! Oh My!
cc61c64b 620
ea8adc8c
XL
621Another point about UTF-8 is that there are actually three relevant ways to
622look at strings from Rust’s perspective: as bytes, scalar values, and grapheme
623clusters (the closest thing to what we would call *letters*).
cc61c64b
XL
624
625If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is
626ultimately stored as a `Vec` of `u8` values that looks like this:
627
abe05a73 628```
cc61c64b
XL
629[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164,
630224, 165, 135]
631```
632
ea8adc8c 633That’s 18 bytes and is how computers ultimately store this data. If we look at
cc61c64b
XL
634them as Unicode scalar values, which are what Rust’s `char` type is, those
635bytes look like this:
636
abe05a73 637```
cc61c64b
XL
638['न', 'म', 'स', '्', 'त', 'े']
639```
640
ea8adc8c 641There are six `char` values here, but the fourth and sixth are not letters:
cc61c64b
XL
642they’re diacritics that don’t make sense on their own. Finally, if we look at
643them as grapheme clusters, we’d get what a person would call the four letters
ea8adc8c 644that make up the Hindi word:
cc61c64b 645
abe05a73 646```
cc61c64b
XL
647["न", "म", "स्", "ते"]
648```
649
650Rust provides different ways of interpreting the raw string data that computers
651store so that each program can choose the interpretation it needs, no matter
652what human language the data is in.
653
ea8adc8c 654A final reason Rust doesn’t allow us to index into a `String` to get a
cc61c64b 655character is that indexing operations are expected to always take constant time
ea8adc8c
XL
656(O(1)). But it isn’t possible to guarantee that performance with a `String`,
657because Rust would have to walk through the contents from the beginning to the
658index to determine how many valid characters there were.
cc61c64b
XL
659
660### Slicing Strings
661
ea8adc8c
XL
662Indexing into a string is often a bad idea because it’s not clear what the
663return type of the string indexing operation should be: a byte value, a
664character, a grapheme cluster, or a string slice. Therefore, Rust asks you to
665be more specific if you really need to use indices to create string slices. To
666be more specific in your indexing and indicate that you want a string slice,
667rather than indexing using `[]` with a single number, you can use `[]` with a
668range to create a string slice containing particular bytes:
cc61c64b 669
abe05a73 670```
cc61c64b
XL
671let hello = "Здравствуйте";
672
673let s = &hello[0..4];
674```
675
676Here, `s` will be a `&str` that contains the first four bytes of the string.
ea8adc8c
XL
677Earlier, we mentioned that each of these characters was two bytes, which means
678`s` will be `Зд`.
cc61c64b 679
ea8adc8c
XL
680What would happen if we used `&hello[0..1]`? The answer: Rust will panic at
681runtime in the same way that accessing an invalid index in a vector does:
cc61c64b 682
abe05a73
XL
683```
684thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside 'З' (bytes 0..2) of `Здравствуйте`', src/libcore/str/mod.rs:2188:4
cc61c64b
XL
685```
686
ea8adc8c
XL
687You should use ranges to create string slices with caution, because it can
688crash your program.
cc61c64b
XL
689
690### Methods for Iterating Over Strings
691
ea8adc8c 692Fortunately, we can access elements in a string in other ways.
cc61c64b
XL
693
694If we need to perform operations on individual Unicode scalar values, the best
ea8adc8c 695way to do so is to use the `chars` method. Calling `chars` on “नमस्ते” separates
abe05a73 696out and returns six values of type `char`, and we can iterate over the result
ea8adc8c 697in order to access each element:
cc61c64b 698
abe05a73 699```
cc61c64b
XL
700for c in "नमस्ते".chars() {
701 println!("{}", c);
702}
703```
704
ea8adc8c 705This code will print the following:
cc61c64b 706
abe05a73 707```
cc61c64b
XL
708
709
710
711
712
713
714```
715
716The `bytes` method returns each raw byte, which might be appropriate for your
717domain:
718
abe05a73 719```
cc61c64b
XL
720for b in "नमस्ते".bytes() {
721 println!("{}", b);
722}
723```
724
725This code will print the 18 bytes that make up this `String`, starting with:
726
abe05a73 727```
cc61c64b
XL
728224
729164
730168
731224
732// ... etc
733```
734
ea8adc8c
XL
735But be sure to remember that valid Unicode scalar values may be made up of more
736than one byte.
cc61c64b
XL
737
738Getting grapheme clusters from strings is complex, so this functionality is not
abe05a73
XL
739provided by the standard library. Crates are available on
740crates.io at *https://crates.io* if this is the functionality you need.
cc61c64b 741
ea8adc8c 742### Strings Are Not So Simple
cc61c64b
XL
743
744To summarize, strings are complicated. Different programming languages make
745different choices about how to present this complexity to the programmer. Rust
746has chosen to make the correct handling of `String` data the default behavior
ea8adc8c
XL
747for all Rust programs, which means programmers have to put more thought into
748handling UTF-8 data upfront. This trade-off exposes more of the complexity of
749strings than other programming languages do but prevents you from having to
750handle errors involving non-ASCII characters later in your development life
751cycle.
cc61c64b 752
ea8adc8c 753Let’s switch to something a bit less complex: hash maps!
cc61c64b 754
abe05a73 755## Hash Maps Store Keys Associated with Values
cc61c64b
XL
756
757The last of our common collections is the *hash map*. The type `HashMap<K, V>`
758stores a mapping of keys of type `K` to values of type `V`. It does this via a
759*hashing function*, which determines how it places these keys and values into
760memory. Many different programming languages support this kind of data
ea8adc8c
XL
761structure, but often use a different name, such as hash, map, object, hash
762table, or associative array, just to name a few.
cc61c64b 763
ea8adc8c
XL
764Hash maps are useful for when you want to look up data not by an index, as you
765can with vectors, but by using a key that can be of any type. For example, in a
766game, you could keep track of each team’s score in a hash map where each key is
767a team’s name and the values are each team’s score. Given a team name, you can
768retrieve its score.
cc61c64b 769
ea8adc8c
XL
770We’ll go over the basic API of hash maps in this section, but many more goodies
771are hiding in the functions defined on `HashMap<K, V>` by the standard library.
772As always, check the standard library documentation for more information.
cc61c64b
XL
773
774### Creating a New Hash Map
775
ea8adc8c 776We can create an empty hash map with `new` and add elements with `insert`. In
abe05a73 777Listing 8-20, we’re keeping track of the scores of two teams whose names are
ea8adc8c
XL
778Blue and Yellow. The Blue team will start with 10 points, and the Yellow team
779starts with 50:
cc61c64b 780
abe05a73 781```
cc61c64b
XL
782use std::collections::HashMap;
783
784let mut scores = HashMap::new();
785
786scores.insert(String::from("Blue"), 10);
787scores.insert(String::from("Yellow"), 50);
788```
789
abe05a73 790Listing 8-20: Creating a new hash map and inserting some keys and values
ea8adc8c 791
cc61c64b
XL
792Note that we need to first `use` the `HashMap` from the collections portion of
793the standard library. Of our three common collections, this one is the least
abe05a73
XL
794often used, so it’s not included in the features brought into scope
795automatically in the prelude. Hash maps also have less support from the
796standard library; there’s no built-in macro to construct them, for example.
cc61c64b
XL
797
798Just like vectors, hash maps store their data on the heap. This `HashMap` has
799keys of type `String` and values of type `i32`. Like vectors, hash maps are
800homogeneous: all of the keys must have the same type, and all of the values
801must have the same type.
802
803Another way of constructing a hash map is by using the `collect` method on a
804vector of tuples, where each tuple consists of a key and its value. The
ea8adc8c 805`collect` method gathers data into a number of collection types, including
cc61c64b
XL
806`HashMap`. For example, if we had the team names and initial scores in two
807separate vectors, we can use the `zip` method to create a vector of tuples
808where “Blue” is paired with 10, and so forth. Then we can use the `collect`
abe05a73 809method to turn that vector of tuples into a `HashMap` as shown in Listing 8-21:
cc61c64b 810
abe05a73 811```
cc61c64b
XL
812use std::collections::HashMap;
813
814let teams = vec![String::from("Blue"), String::from("Yellow")];
815let initial_scores = vec![10, 50];
816
817let scores: HashMap<_, _> = teams.iter().zip(initial_scores.iter()).collect();
818```
819
abe05a73 820Listing 8-21: Creating a hash map from a list of teams and a list of scores
ea8adc8c 821
cc61c64b
XL
822The type annotation `HashMap<_, _>` is needed here because it’s possible to
823`collect` into many different data structures, and Rust doesn’t know which you
824want unless you specify. For the type parameters for the key and value types,
ea8adc8c
XL
825however, we use underscores, and Rust can infer the types that the hash map
826contains based on the types of the data in the vectors.
cc61c64b
XL
827
828### Hash Maps and Ownership
829
830For types that implement the `Copy` trait, like `i32`, the values are copied
831into the hash map. For owned values like `String`, the values will be moved and
abe05a73 832the hash map will be the owner of those values as demonstrated in Listing 8-22:
cc61c64b 833
abe05a73 834```
cc61c64b
XL
835use std::collections::HashMap;
836
837let field_name = String::from("Favorite color");
838let field_value = String::from("Blue");
839
840let mut map = HashMap::new();
841map.insert(field_name, field_value);
ea8adc8c
XL
842// field_name and field_value are invalid at this point, try using them and
843// see what compiler error you get!
cc61c64b
XL
844```
845
abe05a73 846Listing 8-22: Showing that keys and values are owned by the hash map once
ea8adc8c
XL
847they’re inserted
848
849We aren’t able to use the variables `field_name` and `field_value` after
850they’ve been moved into the hash map with the call to `insert`.
cc61c64b 851
ea8adc8c
XL
852If we insert references to values into the hash map, the values won’t be moved
853into the hash map. The values that the references point to must be valid for at
854least as long as the hash map is valid. We’ll talk more about these issues in
855the “Validating References with Lifetimes” section in Chapter 10.
cc61c64b
XL
856
857### Accessing Values in a Hash Map
858
ea8adc8c 859We can get a value out of the hash map by providing its key to the `get` method
abe05a73 860as shown in Listing 8-23:
cc61c64b 861
abe05a73 862```
cc61c64b
XL
863use std::collections::HashMap;
864
865let mut scores = HashMap::new();
866
867scores.insert(String::from("Blue"), 10);
868scores.insert(String::from("Yellow"), 50);
869
870let team_name = String::from("Blue");
871let score = scores.get(&team_name);
872```
873
abe05a73 874Listing 8-23: Accessing the score for the Blue team stored in the hash map
ea8adc8c 875
cc61c64b 876Here, `score` will have the value that’s associated with the Blue team, and the
ea8adc8c
XL
877result will be `Some(&10)`. The result is wrapped in `Some` because `get`
878returns an `Option<&V>`; if there’s no value for that key in the hash map,
879`get` will return `None`. The program will need to handle the `Option` in one
880of the ways that we covered in Chapter 6.
cc61c64b
XL
881
882We can iterate over each key/value pair in a hash map in a similar manner as we
883do with vectors, using a `for` loop:
884
abe05a73 885```
cc61c64b
XL
886use std::collections::HashMap;
887
888let mut scores = HashMap::new();
889
890scores.insert(String::from("Blue"), 10);
891scores.insert(String::from("Yellow"), 50);
892
893for (key, value) in &scores {
894 println!("{}: {}", key, value);
895}
896```
897
ea8adc8c 898This code will print each pair in an arbitrary order:
cc61c64b 899
abe05a73 900```
cc61c64b
XL
901Yellow: 50
902Blue: 10
903```
904
905### Updating a Hash Map
906
ea8adc8c
XL
907Although the number of keys and values is growable, each key can only have one
908value associated with it at a time. When we want to change the data in a hash
909map, we have to decide how to handle the case when a key already has a value
910assigned. We could replace the old value with the new value, completely
911disregarding the old value. We could keep the old value and ignore the new
912value, and only add the new value if the key *doesn’t* already have a value. Or
913we could combine the old value and the new value. Let’s look at how to do each
914of these!
cc61c64b
XL
915
916#### Overwriting a Value
917
ea8adc8c
XL
918If we insert a key and a value into a hash map, and then insert that same key
919with a different value, the value associated with that key will be replaced.
abe05a73 920Even though the code in Listing 8-24 calls `insert` twice, the hash map will
ea8adc8c
XL
921only contain one key/value pair because we’re inserting the value for the Blue
922team’s key both times:
cc61c64b 923
abe05a73 924```
cc61c64b
XL
925use std::collections::HashMap;
926
927let mut scores = HashMap::new();
928
929scores.insert(String::from("Blue"), 10);
930scores.insert(String::from("Blue"), 25);
931
932println!("{:?}", scores);
933```
934
abe05a73 935Listing 8-24: Replacing a value stored with a particular key
ea8adc8c
XL
936
937This code will print `{"Blue": 25}`. The original value of `10` has been
938overwritten.
cc61c64b
XL
939
940#### Only Insert If the Key Has No Value
941
ea8adc8c
XL
942It’s common to check whether a particular key has a value, and if it doesn’t,
943insert a value for it. Hash maps have a special API for this called `entry`
944that takes the key we want to check as a parameter. The return value of the
945`entry` function is an enum called `Entry` that represents a value that might
946or might not exist. Let’s say we want to check whether the key for the Yellow
cc61c64b 947team has a value associated with it. If it doesn’t, we want to insert the value
ea8adc8c 94850, and the same for the Blue team. Using the `entry` API, the code looks like
abe05a73 949Listing 8-25:
cc61c64b 950
abe05a73 951```
cc61c64b
XL
952use std::collections::HashMap;
953
954let mut scores = HashMap::new();
955scores.insert(String::from("Blue"), 10);
956
957scores.entry(String::from("Yellow")).or_insert(50);
958scores.entry(String::from("Blue")).or_insert(50);
959
960println!("{:?}", scores);
961```
962
abe05a73 963Listing 8-25: Using the `entry` method to only insert if the key does not
ea8adc8c
XL
964already have a value
965
966The `or_insert` method on `Entry` is defined to return the value for the
967corresponding `Entry` key if that key exists, and if not, inserts the parameter
968as the new value for this key and returns the modified `Entry`. This technique
969is much cleaner than writing the logic ourselves, and in addition, plays more
970nicely with the borrow checker.
cc61c64b 971
abe05a73 972Running the code in Listing 8-25 will print `{"Yellow": 50, "Blue": 10}`. The
ea8adc8c
XL
973first call to `entry` will insert the key for the Yellow team with the value
974`50` because the Yellow team doesn’t have a value already. The second call to
975`entry` will not change the hash map because the Blue team already has the
976value `10`.
cc61c64b 977
ea8adc8c 978#### Updating a Value Based on the Old Value
cc61c64b 979
ea8adc8c 980Another common use case for hash maps is to look up a key’s value and then
abe05a73 981update it based on the old value. For instance, Listing 8-26 shows code that
ea8adc8c
XL
982counts how many times each word appears in some text. We use a hash map with
983the words as keys and increment the value to keep track of how many times we’ve
984seen that word. If it’s the first time we’ve seen a word, we’ll first insert
985the value `0`:
cc61c64b 986
abe05a73 987```
cc61c64b
XL
988use std::collections::HashMap;
989
990let text = "hello world wonderful world";
991
992let mut map = HashMap::new();
993
994for word in text.split_whitespace() {
995 let count = map.entry(word).or_insert(0);
996 *count += 1;
997}
998
999println!("{:?}", map);
1000```
1001
abe05a73 1002Listing 8-26: Counting occurrences of words using a hash map that stores words
ea8adc8c
XL
1003and counts
1004
1005This code will print `{"world": 2, "hello": 1, "wonderful": 1}`. The
1006`or_insert` method actually returns a mutable reference (`&mut V`) to the value
1007for this key. Here we store that mutable reference in the `count` variable, so
1008in order to assign to that value we must first dereference `count` using the
1009asterisk (`*`). The mutable reference goes out of scope at the end of the `for`
1010loop, so all of these changes are safe and allowed by the borrowing rules.
cc61c64b
XL
1011
1012### Hashing Function
1013
1014By default, `HashMap` uses a cryptographically secure hashing function that can
1015provide resistance to Denial of Service (DoS) attacks. This is not the fastest
ea8adc8c 1016hashing algorithm available, but the trade-off for better security that comes
cc61c64b
XL
1017with the drop in performance is worth it. If you profile your code and find
1018that the default hash function is too slow for your purposes, you can switch to
1019another function by specifying a different *hasher*. A hasher is a type that
ea8adc8c 1020implements the `BuildHasher` trait. We’ll talk about traits and how to
3b2f2976 1021implement them in Chapter 10. You don’t necessarily have to implement your own
abe05a73
XL
1022hasher from scratch; crates.io at *https://crates.io* has libraries shared by
1023other Rust users that provide hashers implementing many common hashing
1024algorithms.
cc61c64b
XL
1025
1026## Summary
1027
ea8adc8c
XL
1028Vectors, strings, and hash maps will provide a large amount of functionality
1029that you need in programs where you need to store, access, and modify data.
1030Here are some exercises you should now be equipped to solve:
cc61c64b
XL
1031
1032* Given a list of integers, use a vector and return the mean (average), median
1033 (when sorted, the value in the middle position), and mode (the value that
1034 occurs most often; a hash map will be helpful here) of the list.
ea8adc8c
XL
1035* Convert strings to pig latin. The first consonant of each word is moved to
1036 the end of the word and “ay” is added, so “first” becomes “irst-fay.” Words
1037 that start with a vowel have “hay” added to the end instead (“apple” becomes
1038 “apple-hay”). Keep in mind the details about UTF-8 encoding!
cc61c64b 1039* Using a hash map and vectors, create a text interface to allow a user to add
ea8adc8c
XL
1040 employee names to a department in a company. For example, “Add Sally to
1041 Engineering” or “Add Amir to Sales.” Then let the user retrieve a list of all
cc61c64b
XL
1042 people in a department or all people in the company by department, sorted
1043 alphabetically.
1044
ea8adc8c
XL
1045The standard library API documentation describes methods that vectors, strings,
1046and hash maps have that will be helpful for these exercises!
cc61c64b 1047
ea8adc8c
XL
1048We’re getting into more complex programs in which operations can fail; so, it’s
1049a perfect time to discuss error handling next!