## The Slice Type

Another data type that does not have ownership is the *slice*. Slices let you
reference a contiguous sequence of elements in a collection rather than the
whole collection.

Here’s a small programming problem: write a function that takes a string and
returns the first word it finds in that string. If the function doesn’t find a
space in the string, the whole string must be one word, so the entire string
should be returned.

Let’s think about the signature of this function:

```rust,ignore
fn first_word(s: &String) -> ?
```

This function, `first_word`, has a `&String` as a parameter. We don’t want
ownership, so this is fine. But what should we return? We don’t really have a
way to talk about *part* of a string. However, we could return the index of the
end of the word. Let’s try that, as shown in Listing 4-7.

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-07/src/main.rs:here}}
```

<span class="caption">Listing 4-7: The `first_word` function that returns a
byte index value into the `String` parameter</span>

Because we need to go through the `String` element by element and check whether
a value is a space, we’ll convert our `String` to an array of bytes using the
`as_bytes` method:

```rust,ignore
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-07/src/main.rs:as_bytes}}
```

Next, we create an iterator over the array of bytes using the `iter` method:

```rust,ignore
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-07/src/main.rs:iter}}
```

We’ll discuss iterators in more detail in Chapter 13. For now, know that `iter`
is a method that returns each element in a collection and that `enumerate`
wraps the result of `iter` and returns each element as part of a tuple instead.
The first element of the tuple returned from `enumerate` is the index, and the
second element is a reference to the element. This is a bit more convenient
than calculating the index ourselves.

Because the `enumerate` method returns a tuple, we can use patterns to
destructure that tuple, just like everywhere else in Rust. So in the `for`
loop, we specify a pattern that has `i` for the index in the tuple and `&item`
for the single byte in the tuple. Because we get a reference to the element
from `.iter().enumerate()`, we use `&` in the pattern.

Inside the `for` loop, we search for the byte that represents the space by
using the byte literal syntax. If we find a space, we return the position.
Otherwise, we return the length of the string by using `s.len()`:

```rust,ignore
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-07/src/main.rs:inside_for}}
```
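
Putting the three steps together, the function described above could look roughly like the following. This is a minimal sketch reconstructed from the prose (`as_bytes`, `iter().enumerate()`, the `b' '` byte literal, and `s.len()`), and it is not necessarily identical to the included listing file; the `main` function is only an illustrative driver.

```rust
// Returns the byte index of the end of the first word in `s`:
// the position of the first space, or the length of the string
// if there is no space.
fn first_word(s: &String) -> usize {
    // View the string as a slice of bytes.
    let bytes = s.as_bytes();

    // `enumerate` yields (index, &byte) tuples; `&item` destructures the reference.
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }

    // No space found: the whole string is one word.
    s.len()
}

fn main() {
    let s = String::from("hello world");
    println!("first word ends at byte {}", first_word(&s)); // prints 5
}
```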

We now have a way to find out the index of the end of the first word in the
string, but there’s a problem. We’re returning a `usize` on its own, but it’s
only a meaningful number in the context of the `&String`. In other words,
because it’s a separate value from the `String`, there’s no guarantee that it
will still be valid in the future. Consider the program in Listing 4-8 that
uses the `first_word` function from Listing 4-7.

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-08/src/main.rs:here}}
```

<span class="caption">Listing 4-8: Storing the result from calling the
`first_word` function and then changing the `String` contents</span>

This program compiles without any errors and would also do so if we used `word`
after calling `s.clear()`. Because `word` isn’t connected to the state of `s`
at all, `word` still contains the value `5`. We could use that value `5` with
the variable `s` to try to extract the first word out, but this would be a bug
because the contents of `s` have changed since we saved `5` in `word`.
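
To make the stale index concrete, here is a small sketch, assuming the same index-returning `first_word` as described for Listing 4-7. It uses the standard `str::get` method, which returns `None` instead of panicking, to show that the saved index no longer refers to anything in the emptied string.

```rust
// Index-returning version of `first_word`, as described in the text.
fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }
    s.len()
}

fn main() {
    let mut s = String::from("hello world");
    let word = first_word(&s); // `word` is 5

    s.clear(); // `s` is now ""

    // `word` is still 5, but that index is out of range for the empty string.
    // `get` returns None here; indexing with `&s[..word]` would panic instead.
    match s.get(..word) {
        Some(first) => println!("first word: {}", first),
        None => println!("index {} is no longer valid for {:?}", word, s),
    }
}
```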

Having to worry about the index in `word` getting out of sync with the data in
`s` is tedious and error prone! Managing these indices is even more brittle if
we write a `second_word` function. Its signature would have to look like this:

```rust,ignore
fn second_word(s: &String) -> (usize, usize) {
```

Now we’re tracking a starting *and* an ending index, and we have even more
values that were calculated from data in a particular state but aren’t tied to
that state at all. We now have three unrelated variables floating around that
need to be kept in sync.

Luckily, Rust has a solution to this problem: string slices.

### String Slices

A *string slice* is a reference to part of a `String`, and it looks like this:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-17-slice/src/main.rs:here}}
```

This is similar to taking a reference to the whole `String` but with the extra
`[0..5]` bit. Rather than a reference to the entire `String`, it’s a reference
to a portion of the `String`.

We can create slices using a range within brackets by specifying
`[starting_index..ending_index]`, where `starting_index` is the first position
in the slice and `ending_index` is one more than the last position in the
slice. Internally, the slice data structure stores the starting position and
the length of the slice, which corresponds to `ending_index` minus
`starting_index`. So in the case of `let world = &s[6..11];`, `world` would be
a slice that contains a pointer to the byte at index 6 of `s` with a length
value of 5.

Figure 4-6 shows this in a diagram.

<img alt="world containing a pointer to the byte at index 6 of String s and a length 5" src="img/trpl04-06.svg" class="center" style="width: 50%;" />

<span class="caption">Figure 4-6: String slice referring to part of a
`String`</span>
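
The pointer-plus-length layout can be observed directly. The following is a small sketch; `slice_parts` is a hypothetical helper introduced only for illustration, and it assumes `slice` was actually taken from `s`.

```rust
// Hypothetical helper: returns (byte offset of `slice` within `s`, length of `slice`).
// Assumes `slice` borrows from `s`; otherwise the offset is meaningless.
fn slice_parts(s: &str, slice: &str) -> (usize, usize) {
    let offset = slice.as_ptr() as usize - s.as_ptr() as usize;
    (offset, slice.len())
}

fn main() {
    let s = String::from("hello world");
    let world = &s[6..11];

    let (offset, len) = slice_parts(&s, world);
    // The slice starts 6 bytes into `s` and covers 11 - 6 = 5 bytes.
    println!("{}: offset {}, length {}", world, offset, len);
}
```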

With Rust’s `..` range syntax, if you want to start at the first index (zero),
you can drop the value before the two periods. In other words, these are equal:

```rust
let s = String::from("hello");

let slice = &s[0..2];
let slice = &s[..2];
```

By the same token, if your slice includes the last byte of the `String`, you
can drop the trailing number. That means these are equal:

```rust
let s = String::from("hello");

let len = s.len();

let slice = &s[3..len];
let slice = &s[3..];
```

You can also drop both values to take a slice of the entire string. So these
are equal:

```rust
let s = String::from("hello");

let len = s.len();

let slice = &s[0..len];
let slice = &s[..];
```

> Note: String slice range indices must occur at valid UTF-8 character
> boundaries. If you attempt to create a string slice in the middle of a
> multibyte character, your program will exit with an error. For the purposes
> of introducing string slices, we are assuming ASCII only in this section; a
> more thorough discussion of UTF-8 handling is in the [“Storing UTF-8 Encoded
> Text with Strings”][strings]<!-- ignore --> section of Chapter 8.
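
As a brief illustration of the boundary rule in the note, the sketch below uses the standard `str::is_char_boundary` and `str::get` methods. `checked_slice` is a hypothetical helper name; it returns `None` rather than exiting with an error when a range does not fall on character boundaries.

```rust
// Hypothetical helper: returns the slice `start..end`, or None if the range
// is out of bounds or does not fall on UTF-8 character boundaries.
fn checked_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    // "h\u{e9}llo" is "héllo"; the 'é' occupies two bytes (indices 1 and 2).
    let s = "h\u{e9}llo";

    // Index 2 is in the middle of 'é', so it is not a character boundary.
    println!("boundary at 2? {}", s.is_char_boundary(2));

    // Slicing up to index 3 ends after 'é' and succeeds.
    println!("{:?}", checked_slice(s, 0, 3));

    // Slicing up to index 2 would split 'é'; `&s[0..2]` would panic,
    // while `get` reports the problem as None.
    println!("{:?}", checked_slice(s, 0, 2));
}
```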

With all this information in mind, let’s rewrite `first_word` to return a
slice. The type that signifies “string slice” is written as `&str`:

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-18-first-word-slice/src/main.rs:here}}
```

We get the index for the end of the word in the same way as we did in Listing
4-7, by looking for the first occurrence of a space. When we find a space, we
return a string slice using the start of the string and the index of the space
as the starting and ending indices.

Now when we call `first_word`, we get back a single value that is tied to the
underlying data. The value is made up of a reference to the starting point of
the slice and the number of elements in the slice.
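
Based on that description, the slice-returning function could be sketched as follows. This is reconstructed from the prose and may differ in detail from the included listing file; `main` is only an illustrative driver.

```rust
// Returns the first word of `s` as a string slice that borrows from `s`.
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            // Slice from the start of the string up to (not including) the space.
            return &s[0..i];
        }
    }

    // No space found: the whole string is the first word.
    &s[..]
}

fn main() {
    let s = String::from("hello world");
    println!("first word: {}", first_word(&s)); // prints "first word: hello"
}
```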

Returning a slice would also work for a `second_word` function:

```rust,ignore
fn second_word(s: &String) -> &str {
```
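
The text only gives the signature, so the body below is one possible sketch rather than a reference implementation. It assumes words are separated by single ASCII spaces and returns an empty slice when there is no second word; both choices are assumptions made for illustration.

```rust
// One possible `second_word`: returns the text between the first and second
// space, or everything after the first space, or "" if there is no space.
// Assumes words are separated by single ASCII spaces.
fn second_word(s: &String) -> &str {
    let bytes = s.as_bytes();
    // Byte index where the second word starts, once the first space is seen.
    let mut start: Option<usize> = None;

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            if let Some(st) = start {
                // Second space: the second word ends here.
                return &s[st..i];
            }
            // First space: the second word begins right after it.
            start = Some(i + 1);
        }
    }

    match start {
        Some(st) => &s[st..], // the second word runs to the end of the string
        None => "",           // no space at all, so no second word
    }
}

fn main() {
    let s = String::from("hello world again");
    println!("second word: {}", second_word(&s));
}
```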

We now have a straightforward API that’s much harder to mess up, because the
compiler will ensure the references into the `String` remain valid. Remember
the bug in the program in Listing 4-8, when we got the index to the end of the
first word but then cleared the string so our index was invalid? That code was
logically incorrect but didn’t show any immediate errors. The problems would
show up later if we kept trying to use the first word index with an emptied
string. Slices make this bug impossible and let us know we have a problem with
our code much sooner. Using the slice version of `first_word` will produce a
compile-time error:

<span class="filename">Filename: src/main.rs</span>

```rust,ignore,does_not_compile
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-19-slice-error/src/main.rs:here}}
```

Here’s the compiler error:

```console
{{#include ../listings/ch04-understanding-ownership/no-listing-19-slice-error/output.txt}}
```

Recall from the borrowing rules that if we have an immutable reference to
something, we cannot also take a mutable reference. Because `clear` needs to
truncate the `String`, it needs to get a mutable reference. Rust disallows
this, and compilation fails. Not only has Rust made our API easier to use, but
it has also eliminated an entire class of errors at compile time!
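
For contrast, here is a sketch of two ways the program can be arranged so that it does compile, assuming the slice-returning `first_word` described earlier. Either the slice is used for the last time before `clear` is called, or the first word is copied into an owned `String` so that nothing borrows from the original.

```rust
// Slice-returning `first_word`, as described in the text.
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }
    &s[..]
}

fn main() {
    // Option 1: finish using the slice before mutating the String.
    let mut s = String::from("hello world");
    let word = first_word(&s);
    println!("the first word is: {}", word); // last use of the immutable borrow
    s.clear(); // fine: `word` is not used after this point

    // Option 2: copy the word into an owned String that does not borrow from `t`.
    let mut t = String::from("hello world");
    let owned: String = first_word(&t).to_string();
    t.clear();
    println!("the first word is still: {}", owned);
}
```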

#### String Literals Are Slices

Recall that we talked about string literals being stored inside the binary. Now
that we know about slices, we can properly understand string literals:

```rust
let s = "Hello, world!";
```

The type of `s` here is `&str`: it’s a slice pointing to that specific point of
the binary. This is also why string literals are immutable; `&str` is an
immutable reference.

#### String Slices as Parameters

Knowing that you can take slices of literals and `String` values leads us to
one more improvement on `first_word`, and that’s its signature:

```rust,ignore
fn first_word(s: &String) -> &str {
```

A more experienced Rustacean would write the signature shown in Listing 4-9
instead because it allows us to use the same function on both `&String` values
and `&str` values.

```rust,ignore
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-09/src/main.rs:here}}
```

<span class="caption">Listing 4-9: Improving the `first_word` function by using
a string slice for the type of the `s` parameter</span>

If we have a string slice, we can pass that directly. If we have a `String`, we
can pass a slice of the entire `String`. Defining a function to take a string
slice instead of a reference to a `String` makes our API more general and useful
without losing any functionality:

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-09/src/main.rs:usage}}
```
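
As a rough, self-contained sketch of the same idea (not necessarily identical to the included listing), the `&str` version can be called with a slice of a `String`, a slice of a literal, or a literal directly. The variable names here are illustrative.

```rust
// `first_word` taking a string slice, so it works for both `String` data
// and string literals.
fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }
    &s[..]
}

fn main() {
    let my_string = String::from("hello world");
    // Pass a slice of the entire `String`.
    println!("{}", first_word(&my_string[..]));

    let my_string_literal = "goodbye world";
    // Pass a slice of a string literal.
    println!("{}", first_word(&my_string_literal[..]));
    // A string literal already is a `&str`, so it can be passed directly.
    println!("{}", first_word(my_string_literal));
}
```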

### Other Slices

String slices, as you might imagine, are specific to strings. But there’s a
more general slice type, too. Consider this array:

```rust
let a = [1, 2, 3, 4, 5];
```

Just as we might want to refer to a part of a string, we might want to refer
to part of an array. We’d do so like this:

```rust
let a = [1, 2, 3, 4, 5];

let slice = &a[1..3];
```

This slice has the type `&[i32]`. It works the same way as string slices do, by
storing a reference to the first element and a length. You’ll use this kind of
slice for all sorts of other collections. We’ll discuss these collections in
detail when we talk about vectors in Chapter 8.
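
A short sketch of why `&[i32]` is useful as a parameter type: the same function can accept part of an array or part of a vector. The `sum` helper is a hypothetical name introduced for illustration.

```rust
// Hypothetical helper: adds up the elements of any `i32` slice.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let a = [1, 2, 3, 4, 5];
    let slice = &a[1..3]; // refers to the elements 2 and 3
    assert_eq!(slice, &[2, 3]);

    let v = vec![10, 20, 30];

    println!("{}", sum(slice));   // sums part of an array
    println!("{}", sum(&v[..2])); // sums part of a vector
    println!("{}", sum(&a));      // a whole array coerces to a slice
}
```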

## Summary

The concepts of ownership, borrowing, and slices ensure memory safety in Rust
programs at compile time. The Rust language gives you control over your memory
usage in the same way as other systems programming languages, but having the
owner of data automatically clean up that data when the owner goes out of scope
means you don’t have to write and debug extra code to get this control.

Ownership affects how lots of other parts of Rust work, so we’ll talk about
these concepts further throughout the rest of the book. Let’s move on to
Chapter 5 and look at grouping pieces of data together in a `struct`.

[strings]: ch08-02-strings.html#storing-utf-8-encoded-text-with-strings