## The Slice Type

Another data type that does not have ownership is the *slice*. Slices let you
reference a contiguous sequence of elements in a collection rather than the
whole collection.

Here’s a small programming problem: write a function that takes a string and
returns the first word it finds in that string. If the function doesn’t find a
space in the string, the whole string must be one word, so the entire string
should be returned.

Let’s think about the signature of this function:

```rust,ignore
fn first_word(s: &String) -> ?
```

This function, `first_word`, has a `&String` as a parameter. We don’t want
ownership, so this is fine. But what should we return? We don’t really have a
way to talk about *part* of a string. However, we could return the index of the
end of the word. Let’s try that, as shown in Listing 4-7.

<span class="filename">Filename: src/main.rs</span>

```rust
fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }

    s.len()
}
```

<span class="caption">Listing 4-7: The `first_word` function that returns a
byte index value into the `String` parameter</span>

Because we need to go through the `String` element by element and check whether
a value is a space, we’ll convert our `String` to an array of bytes using the
`as_bytes` method:

```rust,ignore
    let bytes = s.as_bytes();
```

Next, we create an iterator over the array of bytes using the `iter` method:

```rust,ignore
    for (i, &item) in bytes.iter().enumerate() {
```

We’ll discuss iterators in more detail in Chapter 13. For now, know that `iter`
is a method that returns each element in a collection and that `enumerate`
wraps the result of `iter` and returns each element as part of a tuple instead.
The first element of the tuple returned from `enumerate` is the index, and the
second element is a reference to the element. This is a bit more convenient
than calculating the index ourselves.

Because the `enumerate` method returns a tuple, we can use patterns to
destructure that tuple, just like everywhere else in Rust. So in the `for`
loop, we specify a pattern that has `i` for the index in the tuple and `&item`
for the single byte in the tuple. Because we get a reference to the element
from `.iter().enumerate()`, we use `&` in the pattern.
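
To make these tuples concrete, here is a small sketch that collects what `bytes.iter().enumerate()` yields for a short string. The helper name `indexed_bytes` is hypothetical and exists only for this illustration; it uses the same `(i, &item)` pattern as Listing 4-7, so each `item` comes out as a plain `u8` rather than a `&u8`.

```rust
// Hypothetical helper: gathers the (index, byte) pairs produced by
// `bytes.iter().enumerate()`, using the same `(i, &item)` pattern as Listing 4-7.
fn indexed_bytes(s: &String) -> Vec<(usize, u8)> {
    let bytes = s.as_bytes();
    let mut pairs = Vec::new();

    // Each element from `enumerate` is a `(usize, &u8)` tuple; the `&item`
    // pattern destructures the reference so `item` is a plain `u8`.
    for (i, &item) in bytes.iter().enumerate() {
        pairs.push((i, item));
    }

    pairs
}

fn main() {
    let s = String::from("hi you");
    for (i, item) in indexed_bytes(&s) {
        // `item as char` is fine here because the input is ASCII.
        println!("index {}: byte {} ({:?})", i, item, item as char);
    }
}
```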

Inside the `for` loop, we search for the byte that represents the space by
using the byte literal syntax. If we find a space, we return the position.
Otherwise, we return the length of the string by using `s.len()`:

```rust,ignore
        if item == b' ' {
            return i;
        }
    }

    s.len()
```
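
Putting the pieces back together, the sketch below repeats Listing 4-7 so it runs on its own and exercises both ways out of the function: the early `return i` when a space is found, and `s.len()` when none is. It also confirms that the byte literal `b' '` is simply the `u8` value 32. The sample strings are arbitrary choices for illustration.

```rust
// Listing 4-7, repeated so this block is self-contained.
fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        // `b' '` is a byte literal: the `u8` value of an ASCII space.
        if item == b' ' {
            return i; // index of the first space, i.e., the end of the first word
        }
    }

    s.len() // no space: the whole string is one word
}

fn main() {
    assert_eq!(b' ', 32u8);
    println!("{}", first_word(&String::from("hi there"))); // prints 2
    println!("{}", first_word(&String::from("hello"))); // prints 5
}
```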

We now have a way to find out the index of the end of the first word in the
string, but there’s a problem. We’re returning a `usize` on its own, but it’s
only a meaningful number in the context of the `&String`. In other words,
because it’s a separate value from the `String`, there’s no guarantee that it
will still be valid in the future. Consider the program in Listing 4-8 that
uses the `first_word` function from Listing 4-7.

<span class="filename">Filename: src/main.rs</span>

```rust
# fn first_word(s: &String) -> usize {
#     let bytes = s.as_bytes();
#
#     for (i, &item) in bytes.iter().enumerate() {
#         if item == b' ' {
#             return i;
#         }
#     }
#
#     s.len()
# }
#
fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s); // word will get the value 5

    s.clear(); // this empties the String, making it equal to ""

    // word still has the value 5 here, but there's no more string that
    // we could meaningfully use the value 5 with. word is now totally invalid!
}
```

<span class="caption">Listing 4-8: Storing the result from calling the
`first_word` function and then changing the `String` contents</span>

This program compiles without any errors and would also do so if we used `word`
after calling `s.clear()`. Because `word` isn’t connected to the state of `s`
at all, `word` still contains the value `5`. We could use that value `5` with
the variable `s` to try to extract the first word out, but this would be a bug
because the contents of `s` have changed since we saved `5` in `word`.
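
The sketch below makes that bug observable. It uses the standard `str::get` method, which returns `None` instead of panicking when a range is out of bounds; the wrapper name `slice_to` is hypothetical. Note that the stale index does not always fail loudly: if `s` is refilled with different text, the old `5` silently selects the wrong bytes.

```rust
fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }
    s.len()
}

// Hypothetical helper: slice `s` up to a previously computed index.
// `str::get` returns `None` rather than panicking if the range is invalid.
fn slice_to(s: &String, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    let mut s = String::from("hello world");
    let word = first_word(&s); // 5

    s.clear();
    // `word` is still 5, but `s` is empty: the index points past the end.
    println!("{:?}", slice_to(&s, word)); // prints None

    s.push_str("goodbye");
    // Worse: with new contents the stale index "works" but gives the wrong text.
    println!("{:?}", slice_to(&s, word)); // prints Some("goodb")
}
```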

Having to worry about the index in `word` getting out of sync with the data in
`s` is tedious and error prone! Managing these indices is even more brittle if
we write a `second_word` function. Its signature would have to look like this:

```rust,ignore
fn second_word(s: &String) -> (usize, usize) {
```

Now we’re tracking a starting *and* an ending index, and we have even more
values that were calculated from data in a particular state but aren’t tied to
that state at all. We now have three unrelated variables floating around that
need to be kept in sync.
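
For a sense of how much bookkeeping the tuple-returning version needs, here is one possible index-based `second_word`. The body is a hypothetical sketch rather than code from this chapter; it assumes words are separated by single ASCII spaces and that a string with no second word yields an empty range at its end.

```rust
// Hypothetical index-based `second_word`: returns the (start, end) byte
// range of the second space-separated word.
fn second_word(s: &String) -> (usize, usize) {
    let bytes = s.as_bytes();
    let mut start = None; // where the second word begins, once we know

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            if let Some(st) = start {
                return (st, i); // second space: the second word ends here
            }
            start = Some(i + 1); // first space: the second word starts after it
        }
    }

    match start {
        Some(st) => (st, s.len()),  // second word runs to the end of the string
        None => (s.len(), s.len()), // assumption: no space means no second word
    }
}

fn main() {
    let s = String::from("hello big world");
    let (start, end) = second_word(&s);
    // Two loose numbers that are only meaningful while `s` stays unchanged.
    println!("{}..{} -> {}", start, end, &s[start..end]); // prints 6..9 -> big
}
```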

Luckily, Rust has a solution to this problem: string slices.

### String Slices

A *string slice* is a reference to part of a `String`, and it looks like this:

```rust
let s = String::from("hello world");

let hello = &s[0..5];
let world = &s[6..11];
```

This is similar to taking a reference to the whole `String` but with the extra
`[0..5]` bit. Rather than a reference to the entire `String`, it’s a reference
to a portion of the `String`.

We can create slices using a range within brackets by specifying
`[starting_index..ending_index]`, where `starting_index` is the first position
in the slice and `ending_index` is one more than the last position in the
slice. Internally, the slice data structure stores the starting position and
the length of the slice, which corresponds to `ending_index` minus
`starting_index`. So in the case of `let world = &s[6..11];`, `world` would be
a slice that contains a pointer to the byte at index 6 of `s` with a length
value of 5.
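
One way to check the pointer-plus-length description is to compare addresses. In this sketch the helper `offset_in` is hypothetical; it relies only on the standard `as_ptr` and `len` methods of `str`, and it assumes `part` really was sliced out of `whole`.

```rust
// Hypothetical helper: how many bytes into `whole` the slice `part` starts.
// Only meaningful when `part` was created by slicing `whole`.
fn offset_in(whole: &str, part: &str) -> usize {
    part.as_ptr() as usize - whole.as_ptr() as usize
}

fn main() {
    let s = String::from("hello world");
    let world = &s[6..11];

    // The slice points into `s`'s own buffer (no bytes were copied)...
    println!("offset = {}", offset_in(&s, world)); // prints offset = 6
    // ...and records a length of ending_index - starting_index.
    println!("len = {}", world.len()); // prints len = 5
}
```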

Figure 4-6 shows this in a diagram.

<img alt="world containing a pointer to the byte at index 6 of String s and a length 5" src="img/trpl04-06.svg" class="center" style="width: 50%;" />

<span class="caption">Figure 4-6: String slice referring to part of a
`String`</span>

With Rust’s `..` range syntax, if you want to start at the first index (zero),
you can drop the value before the two periods. In other words, these are equal:

```rust
let s = String::from("hello");

let slice = &s[0..2];
let slice = &s[..2];
```

By the same token, if your slice includes the last byte of the `String`, you
can drop the trailing number. That means these are equal:

```rust
let s = String::from("hello");

let len = s.len();

let slice = &s[3..len];
let slice = &s[3..];
```

You can also drop both values to take a slice of the entire string. So these
are equal:

```rust
let s = String::from("hello");

let len = s.len();

let slice = &s[0..len];
let slice = &s[..];
```
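
The three equivalences above can be checked directly with `assert_eq!`. This is only a quick sanity sketch reusing the same `"hello"` string; if any pair differed, the program would panic instead of printing.

```rust
fn main() {
    let s = String::from("hello");
    let len = s.len();

    // Omitting the start index means "from index 0".
    assert_eq!(&s[0..2], &s[..2]);
    // Omitting the end index means "through the last byte".
    assert_eq!(&s[3..len], &s[3..]);
    // Omitting both selects the whole string.
    assert_eq!(&s[0..len], &s[..]);

    println!("{} {} {}", &s[..2], &s[3..], &s[..]); // prints he lo hello
}
```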

> Note: String slice range indices must occur at valid UTF-8 character
> boundaries. If you attempt to create a string slice in the middle of a
> multibyte character, your program will panic at runtime and exit with an
> error. For the purposes of introducing string slices, we are assuming ASCII
> only in this section; a more thorough discussion of UTF-8 handling is in the
> [“Storing UTF-8 Encoded Text with Strings”][strings]<!-- ignore --> section
> of Chapter 8.
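
To see the boundary rule in action without crashing, the sketch below uses the standard `str::is_char_boundary` and `str::get` methods. The Cyrillic sample text is an assumption chosen because each of its letters takes 2 bytes in UTF-8, and the wrapper `prefix` is hypothetical.

```rust
// Hypothetical helper: the first `end` bytes of `s`, or `None` if `end`
// is past the end or falls inside a multibyte character.
fn prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    let hello = "Здравствуйте"; // 12 letters, 2 bytes each in UTF-8
    println!("{}", hello.len()); // prints 24: `len` counts bytes, not characters

    // Byte 1 is in the middle of 'З', so it is not a valid slice boundary.
    println!("{}", hello.is_char_boundary(1)); // prints false
    println!("{:?}", prefix(hello, 1)); // prints None
    println!("{:?}", prefix(hello, 2)); // prints Some("З")

    // By contrast, `&hello[..1]` compiles but panics at runtime.
}
```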

With all this information in mind, let’s rewrite `first_word` to return a
slice. The type that signifies “string slice” is written as `&str`:

<span class="filename">Filename: src/main.rs</span>

```rust
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}
```

We get the index for the end of the word in the same way as we did in Listing
4-7, by looking for the first occurrence of a space. When we find a space, we
return a string slice using the start of the string and the index of the space
as the starting and ending indices.

Now when we call `first_word`, we get back a single value that is tied to the
underlying data. The value is made up of a reference to the starting point of
the slice and the number of elements in the slice.
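
A quick way to see that the returned value is tied to the underlying data is to compare addresses: the slice returned by `first_word` begins at the same byte as the `String`’s buffer. The sketch repeats the function from above so it compiles on its own; the sample strings are illustrative.

```rust
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i]; // from the start up to (not including) the space
        }
    }

    &s[..] // no space: the whole string
}

fn main() {
    let s = String::from("hello world");
    let word = first_word(&s);

    println!("{}", word); // prints hello
    // Same starting address: `word` borrows from `s` instead of copying it.
    println!("{}", word.as_ptr() == s.as_ptr()); // prints true
}
```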

Returning a slice would also work for a `second_word` function:

```rust,ignore
fn second_word(s: &String) -> &str {
```
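
Here is one way the slice-returning `second_word` could be filled in. Because the text shows only its signature, treat the body as a hypothetical sketch: it assumes words are separated by single ASCII spaces and returns an empty string when there is no second word.

```rust
// Hypothetical slice-returning `second_word`.
fn second_word(s: &String) -> &str {
    let bytes = s.as_bytes();
    let mut start = None; // byte index where the second word begins

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            if let Some(st) = start {
                return &s[st..i]; // second space: the word ends here
            }
            start = Some(i + 1); // first space: the word starts after it
        }
    }

    match start {
        Some(st) => &s[st..], // the second word runs to the end
        None => "",           // assumption: no space means no second word
    }
}

fn main() {
    let s = String::from("hello big world");
    println!("{}", second_word(&s)); // prints big
}
```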

We now have a straightforward API that’s much harder to mess up, because the
compiler will ensure the references into the `String` remain valid. Remember
the bug in the program in Listing 4-8, when we got the index to the end of the
first word but then cleared the string so our index was invalid? That code was
logically incorrect but didn’t show any immediate errors. The problems would
show up later if we kept trying to use the first word index with an emptied
string. Slices make this bug impossible and let us know we have a problem with
our code much sooner. Using the slice version of `first_word` in the same way
produces a compile-time error:

<span class="filename">Filename: src/main.rs</span>

```rust,ignore,does_not_compile
fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s);

    s.clear(); // error!

    println!("the first word is: {}", word);
}
```

Here’s the compiler error:

```text
error[E0502]: cannot borrow `s` as mutable because it is also borrowed as immutable
  --> src/main.rs:18:5
   |
16 |     let word = first_word(&s);
   |                           -- immutable borrow occurs here
17 |
18 |     s.clear(); // error!
   |     ^^^^^^^^^ mutable borrow occurs here
19 |
20 |     println!("the first word is: {}", word);
   |                                       ---- immutable borrow later used here
```

Recall from the borrowing rules that if we have an immutable reference to
something, we cannot also take a mutable reference. Because `clear` needs to
truncate the `String`, it needs to get a mutable reference. The `println!`
after the call to `clear` still uses the reference in `word`, so the immutable
borrow is active at that point. Rust disallows the mutable reference in `clear`
and the immutable reference in `word` from existing at the same time, and
compilation fails. Not only has Rust made our API easier to use, but it has
also eliminated an entire class of errors at compile time!
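
Since the conflict comes from `word` still being used after `clear`, one sketch of a fix is to finish using `word` before mutating `s`. The borrow checker treats an immutable borrow as lasting only until its last use, so this version compiles. If the word were needed after clearing, copying it into an owned `String` (here with `to_string`) is another option; which one fits depends on the program.

```rust
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }
    &s[..]
}

fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s);
    println!("the first word is: {}", word); // last use of `word`: borrow ends

    // An owned copy does not borrow from `s` at all.
    let saved = first_word(&s).to_string();

    s.clear(); // OK: no immutable borrow of `s` is still in use
    println!("saved: {}, s is now {:?}", saved, s); // prints saved: hello, s is now ""
}
```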

#### String Literals Are Slices

Recall that we talked about string literals being stored inside the binary. Now
that we know about slices, we can properly understand string literals:

```rust
let s = "Hello, world!";
```

The type of `s` here is `&str`: it’s a slice pointing to that specific point of
the binary. This is also why string literals are immutable; `&str` is an
immutable reference.
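
Spelling out the types makes this visible. In the sketch below, the function name `greeting` is hypothetical, and the annotations only restate what the compiler already infers; the `'static` lifetime in the return type (lifetimes are covered later in the book) reflects that the literal’s bytes live in the binary for the whole run.

```rust
// Hypothetical function returning a string literal; its type is `&'static str`.
fn greeting() -> &'static str {
    "Hello, world!"
}

fn main() {
    let s: &str = greeting(); // a slice pointing into the program's binary
    println!("{} ({} bytes)", s, s.len()); // prints Hello, world! (13 bytes)

    // `&str` is an immutable reference, so there is no way to change the
    // text through `s`; e.g. `s.push_str("!")` would not compile.
}
```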

#### String Slices as Parameters

Knowing that you can take slices of literals and `String` values leads us to
one more improvement on `first_word`, and that’s its signature:

```rust,ignore
fn first_word(s: &String) -> &str {
```

A more experienced Rustacean would write the signature shown in Listing 4-9
instead because it allows us to use the same function on both `&String` values
and `&str` values.

```rust,ignore
fn first_word(s: &str) -> &str {
```

<span class="caption">Listing 4-9: Improving the `first_word` function by using
a string slice for the type of the `s` parameter</span>

If we have a string slice, we can pass that directly. If we have a `String`, we
can pass a slice of the entire `String`. Defining a function to take a string
slice instead of a reference to a `String` makes our API more general and useful
without losing any functionality:

<span class="filename">Filename: src/main.rs</span>

```rust
# fn first_word(s: &str) -> &str {
#     let bytes = s.as_bytes();
#
#     for (i, &item) in bytes.iter().enumerate() {
#         if item == b' ' {
#             return &s[0..i];
#         }
#     }
#
#     &s[..]
# }
fn main() {
    let my_string = String::from("hello world");

    // first_word works on slices of `String`s
    let word = first_word(&my_string[..]);

    let my_string_literal = "hello world";

    // first_word works on slices of string literals
    let word = first_word(&my_string_literal[..]);

    // Because string literals *are* string slices already,
    // this works too, without the slice syntax!
    let word = first_word(my_string_literal);
}
```
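
Because the parameter is `&str`, the function also accepts slices of just part of a `String`, and it accepts a plain `&String` too: Rust’s deref coercion converts `&String` to `&str` automatically at the call site. The sketch below checks each case; the specific ranges are illustrative choices.

```rust
fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }
    &s[..]
}

fn main() {
    let my_string = String::from("hello world");

    // Partial slices of a `String` work.
    println!("{}", first_word(&my_string[0..6])); // "hello " -> prints hello
    println!("{}", first_word(&my_string[6..])); // "world" -> prints world

    // A `&String` works too, via deref coercion to `&str`.
    println!("{}", first_word(&my_string)); // prints hello
}
```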

### Other Slices

String slices, as you might imagine, are specific to strings. But there’s a
more general slice type, too. Consider this array:

```rust
let a = [1, 2, 3, 4, 5];
```

Just as we might want to refer to a part of a string, we might want to refer
to part of an array. We’d do so like this:

```rust
let a = [1, 2, 3, 4, 5];

let slice = &a[1..3];
```

This slice has the type `&[i32]`. It works the same way as string slices do, by
storing a reference to the first element and a length. You’ll use this kind of
slice for all sorts of other collections. We’ll discuss these collections in
detail when we talk about vectors in Chapter 8.
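
As with strings, one function written against the slice type works for a whole array, part of an array, or other collections. The helper `total` below is hypothetical; it sums whatever `&[i32]` it is given, and the vector call is only a preview of the Chapter 8 material.

```rust
// Hypothetical helper: accepts any slice of `i32`s.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let a = [1, 2, 3, 4, 5];
    let slice = &a[1..3]; // type `&[i32]`: elements at indices 1 and 2

    println!("{:?}", slice); // prints [2, 3]
    println!("{}", total(slice)); // prints 5
    println!("{}", total(&a)); // `&[i32; 5]` coerces to `&[i32]`; prints 15
    println!("{}", total(&vec![10, 20])); // `&Vec<i32>` coerces too; prints 30
}
```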

## Summary

The concepts of ownership, borrowing, and slices ensure memory safety in Rust
programs at compile time. The Rust language gives you control over your memory
usage in the same way as other systems programming languages, but having the
owner of data automatically clean up that data when the owner goes out of scope
means you don’t have to write and debug extra code to get this control.

Ownership affects how lots of other parts of Rust work, so we’ll talk about
these concepts further throughout the rest of the book. Let’s move on to
Chapter 5 and look at grouping pieces of data together in a `struct`.

[strings]: ch08-02-strings.html#storing-utf-8-encoded-text-with-strings