## What Is Ownership?

*Ownership* is a set of rules that govern how a Rust program manages memory.
All programs have to manage the way they use a computer’s memory while running.
Some languages have garbage collection that regularly looks for no-longer-used
memory as the program runs; in other languages, the programmer must explicitly
allocate and free the memory. Rust uses a third approach: memory is managed
through a system of ownership with a set of rules that the compiler checks. If
any of the rules are violated, the program won’t compile. None of the features
of ownership will slow down your program while it’s running.

Because ownership is a new concept for many programmers, it does take some time
to get used to. The good news is that the more experienced you become with Rust
and the rules of the ownership system, the easier you’ll find it to naturally
develop code that is safe and efficient. Keep at it!

When you understand ownership, you’ll have a solid foundation for understanding
the features that make Rust unique. In this chapter, you’ll learn ownership by
working through some examples that focus on a very common data structure:
strings.

> ### The Stack and the Heap
>
> Many programming languages don’t require you to think about the stack and the
> heap very often. But in a systems programming language like Rust, whether a
> value is on the stack or the heap affects how the language behaves and why
> you have to make certain decisions. Parts of ownership will be described in
> relation to the stack and the heap later in this chapter, so here is a brief
> explanation in preparation.
>
> Both the stack and the heap are parts of memory available to your code to use
> at runtime, but they are structured in different ways. The stack stores
> values in the order it gets them and removes the values in the opposite
> order. This is referred to as *last in, first out*. Think of a stack of
> plates: when you add more plates, you put them on top of the pile, and when
> you need a plate, you take one off the top. Adding or removing plates from
> the middle or bottom wouldn’t work as well! Adding data is called *pushing
> onto the stack*, and removing data is called *popping off the stack*. All
> data stored on the stack must have a known, fixed size. Data with an unknown
> size at compile time or a size that might change must be stored on the heap
> instead.
>
> The heap is less organized: when you put data on the heap, you request a
> certain amount of space. The memory allocator finds an empty spot in the heap
> that is big enough, marks it as being in use, and returns a *pointer*, which
> is the address of that location. This process is called *allocating on the
> heap* and is sometimes abbreviated as just *allocating* (pushing values onto
> the stack is not considered allocating). Because the pointer to the heap is a
> known, fixed size, you can store the pointer on the stack, but when you want
> the actual data, you must follow the pointer. Think of being seated at a
> restaurant. When you enter, you state the number of people in your group, and
> the host finds an empty table that fits everyone and leads you there. If
> someone in your group comes late, they can ask where you’ve been seated to
> find you.
>
> Pushing to the stack is faster than allocating on the heap because the
> allocator never has to search for a place to store new data; that location is
> always at the top of the stack. Comparatively, allocating space on the heap
> requires more work because the allocator must first find a big enough space
> to hold the data and then perform bookkeeping to prepare for the next
> allocation.
>
> Accessing data in the heap is slower than accessing data on the stack because
> you have to follow a pointer to get there. Contemporary processors are faster
> if they jump around less in memory. Continuing the analogy, consider a server
> at a restaurant taking orders from many tables. It’s most efficient to get
> all the orders at one table before moving on to the next table. Taking an
> order from table A, then an order from table B, then one from A again, and
> then one from B again would be a much slower process. By the same token, a
> processor can do its job better if it works on data that’s close to other
> data (as it is on the stack) rather than farther away (as it can be on the
> heap).
>
> When your code calls a function, the values passed into the function
> (including, potentially, pointers to data on the heap) and the function’s
> local variables get pushed onto the stack. When the function is over, those
> values get popped off the stack.
>
> Keeping track of what parts of code are using what data on the heap,
> minimizing the amount of duplicate data on the heap, and cleaning up unused
> data on the heap so you don’t run out of space are all problems that ownership
> addresses. Once you understand ownership, you won’t need to think about the
> stack and the heap very often, but knowing that the main purpose of ownership
> is to manage heap data can help explain why it works the way it does.

### Ownership Rules

First, let’s take a look at the ownership rules. Keep these rules in mind as we
work through the examples that illustrate them:

* Each value in Rust has an *owner*.
* There can only be one owner at a time.
* When the owner goes out of scope, the value will be dropped.
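
As a rough preview, the three rules can be seen together in a few lines. This is only a sketch: the function `owned_len` and the variables `a` and `b` are made up for illustration, and the move on the second line is explained in detail later in this section.

```rust
// Hypothetical helper for illustration; not one of the book's listings.
fn owned_len() -> usize {
    let len;
    {
        let a = String::from("hi"); // Rule 1: the value has an owner, `a`.
        let b = a; // Rule 2: one owner at a time; `b` owns it now, `a` is unusable.
        len = b.len();
    } // Rule 3: `b` goes out of scope here, so the value is dropped.
    len
}

fn main() {
    println!("length was {}", owned_len());
}
```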

### Variable Scope

Now that we’re past basic Rust syntax, we won’t include all the `fn main() {`
code in examples, so if you’re following along, make sure to put the following
examples inside a `main` function manually. As a result, our examples will be a
bit more concise, letting us focus on the actual details rather than
boilerplate code.

As a first example of ownership, we’ll look at the *scope* of some variables. A
scope is the range within a program for which an item is valid. Take the
following variable:

```rust
let s = "hello";
```

The variable `s` refers to a string literal, where the value of the string is
hardcoded into the text of our program. The variable is valid from the point at
which it’s declared until the end of the current *scope*. Listing 4-1 shows a
program with comments annotating where the variable `s` would be valid.

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-01/src/main.rs:here}}
```

<span class="caption">Listing 4-1: A variable and the scope in which it is
valid</span>

In other words, there are two important points in time here:

* When `s` comes *into* scope, it is valid.
* It remains valid until it goes *out of* scope.
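
The two points in time can be annotated directly in code. The following is a minimal sketch written for this explanation, not necessarily the contents of Listing 4-1; the function name `greeting_len` is an assumption.

```rust
// Hypothetical function used only to give the scope somewhere to live.
fn greeting_len() -> usize {
    // `s` is not valid here; it has not been declared yet.
    let len;
    {
        let s = "hello"; // `s` comes into scope and is valid from here on.
        len = s.len(); // use `s` while it is in scope
    } // The inner scope ends here, and `s` is no longer valid.

    // Using `s` on this line would not compile, because `s` is out of scope.
    len
}

fn main() {
    println!("{}", greeting_len());
}
```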

At this point, the relationship between scopes and when variables are valid is
similar to that in other programming languages. Now we’ll build on top of this
understanding by introducing the `String` type.

### The `String` Type

To illustrate the rules of ownership, we need a data type that is more complex
than those we covered in the [“Data Types”][data-types]<!-- ignore --> section
of Chapter 3. The types covered previously are of a known size, can be stored
on the stack and popped off the stack when their scope is over, and can be
quickly and trivially copied to make a new, independent instance if another
part of code needs to use the same value in a different scope. But we want to
look at data that is stored on the heap and explore how Rust knows when to
clean up that data, and the `String` type is a great example.

We’ll concentrate on the parts of `String` that relate to ownership. These
aspects also apply to other complex data types, whether they are provided by
the standard library or created by you. We’ll discuss `String` in more depth in
[Chapter 8][ch8]<!-- ignore -->.

We’ve already seen string literals, where a string value is hardcoded into our
program. String literals are convenient, but they aren’t suitable for every
situation in which we may want to use text. One reason is that they’re
immutable. Another is that not every string value can be known when we write
our code: for example, what if we want to take user input and store it? For
these situations, Rust has a second string type, `String`. This type manages
data allocated on the heap and as such is able to store an amount of text that
is unknown to us at compile time. You can create a `String` from a string
literal using the `from` function, like so:

```rust
let s = String::from("hello");
```

The double colon `::` operator allows us to namespace this particular `from`
function under the `String` type rather than using some sort of name like
`string_from`. We’ll discuss this syntax more in the [“Method
Syntax”][method-syntax]<!-- ignore --> section of Chapter 5, and when we talk
about namespacing with modules in [“Paths for Referring to an Item in the
Module Tree”][paths-module-tree]<!-- ignore --> in Chapter 7.

This kind of string *can* be mutated:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-01-can-mutate-string/src/main.rs:here}}
```
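
One way such a mutation might look is sketched below. The function name `build_greeting` is an assumption made for this example; `push_str` is the standard `String` method for appending a string slice.

```rust
// Hypothetical helper; shows a `String` growing after it is created.
fn build_greeting() -> String {
    let mut s = String::from("hello"); // `mut` is needed to change `s`
    s.push_str(", world!"); // appends a literal to the end of the `String`
    s
}

fn main() {
    println!("{}", build_greeting()); // prints "hello, world!"
}
```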

So, what’s the difference here? Why can `String` be mutated but literals
cannot? The difference is in how these two types deal with memory.

### Memory and Allocation

In the case of a string literal, we know the contents at compile time, so the
text is hardcoded directly into the final executable. This is why string
literals are fast and efficient. But these properties only come from the string
literal’s immutability. Unfortunately, we can’t put a blob of memory into the
binary for each piece of text whose size is unknown at compile time and whose
size might change while running the program.

With the `String` type, in order to support a mutable, growable piece of text,
we need to allocate an amount of memory on the heap, unknown at compile time,
to hold the contents. This means:

* The memory must be requested from the memory allocator at runtime.
* We need a way of returning this memory to the allocator when we’re done with
  our `String`.

That first part is done by us: when we call `String::from`, its implementation
requests the memory it needs. This is pretty much universal in programming
languages.

However, the second part is different. In languages with a *garbage collector
(GC)*, the GC keeps track of and cleans up memory that isn’t being used
anymore, and we don’t need to think about it. In most languages without a GC,
it’s our responsibility to identify when memory is no longer being used and to
call code to explicitly free it, just as we did to request it. Doing this
correctly has historically been a difficult programming problem. If we forget,
we’ll waste memory. If we do it too early, we’ll have an invalid variable. If
we do it twice, that’s a bug too. We need to pair exactly one `allocate` with
exactly one `free`.

Rust takes a different path: the memory is automatically returned once the
variable that owns it goes out of scope. Here’s a version of our scope example
from Listing 4-1 using a `String` instead of a string literal:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-02-string-scope/src/main.rs:here}}
```

There is a natural point at which we can return the memory our `String` needs
to the allocator: when `s` goes out of scope. When a variable goes out of
scope, Rust calls a special function for us. This function is called
[`drop`][drop]<!-- ignore -->, and it’s where the author of `String` can put
the code to return the memory. Rust calls `drop` automatically at the closing
curly bracket.
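
To watch this happen, we can give a type of our own a `drop` function. The sketch below is an illustration under assumptions: the `Noisy` type, the `DROPS` counter, and the function `drops_after_inner_scope` are all invented for this example, and it uses traits and atomics that the book has not covered at this point.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative counter of how many `Noisy` values have been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// `Noisy` is a hypothetical type whose only job is to report its own cleanup.
struct Noisy {
    name: String,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.name);
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn drops_after_inner_scope() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let _n = Noisy { name: String::from("inner") };
        // `_n` is valid here and nothing has been dropped yet.
    } // Rust calls `drop` for `_n` at this closing curly bracket.
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("drops observed: {}", drops_after_inner_scope());
}
```

Running `main` prints `dropping inner` before `drops observed: 1`, showing that the cleanup code ran as soon as the inner scope ended.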

> Note: In C++, this pattern of deallocating resources at the end of an item’s
> lifetime is sometimes called *Resource Acquisition Is Initialization (RAII)*.
> The `drop` function in Rust will be familiar to you if you’ve used RAII
> patterns.

This pattern has a profound impact on the way Rust code is written. It may seem
simple right now, but the behavior of code can be unexpected in more
complicated situations when we want to have multiple variables use the data
we’ve allocated on the heap. Let’s explore some of those situations now.

<!-- Old heading. Do not remove or links may break. -->
<a id="ways-variables-and-data-interact-move"></a>

#### Variables and Data Interacting with Move

Multiple variables can interact with the same data in different ways in Rust.
Let’s look at an example using an integer in Listing 4-2.

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-02/src/main.rs:here}}
```

<span class="caption">Listing 4-2: Assigning the integer value of variable `x`
to `y`</span>

We can probably guess what this is doing: “bind the value `5` to `x`; then make
a copy of the value in `x` and bind it to `y`.” We now have two variables, `x`
and `y`, and both equal `5`. This is indeed what is happening, because integers
are simple values with a known, fixed size, and these two `5` values are pushed
onto the stack.

Now let’s look at the `String` version:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-03-string-move/src/main.rs:here}}
```

This looks very similar, so we might assume that the way it works would be the
same: that is, the second line would make a copy of the value in `s1` and bind
it to `s2`. But this isn’t quite what happens.

Take a look at Figure 4-1 to see what is happening to `String` under the
covers. A `String` is made up of three parts, shown on the left: a pointer to
the memory that holds the contents of the string, a length, and a capacity.
This group of data is stored on the stack. On the right is the memory on the
heap that holds the contents.

<img alt="Two tables: the first table contains the representation of s1 on the
stack, consisting of its length (5), capacity (5), and a pointer to the first
value in the second table. The second table contains the representation of the
string data on the heap, byte by byte." src="img/trpl04-01.svg" class="center"
style="width: 50%;" />

<span class="caption">Figure 4-1: Representation in memory of a `String`
holding the value `"hello"` bound to `s1`</span>

The length is how much memory, in bytes, the contents of the `String` are
currently using. The capacity is the total amount of memory, in bytes, that the
`String` has received from the allocator. The difference between length and
capacity matters, but not in this context, so for now, it’s fine to ignore the
capacity.
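
The three parts can be inspected with the standard `len`, `capacity`, and `as_ptr` methods. This is a small sketch; the helper `hello_len_and_capacity` is hypothetical, and the exact capacity and the size of the stack part are implementation details that may differ between platforms and Rust versions. The only guarantee relied on here is that the capacity is at least the length.

```rust
// Hypothetical helper returning the length and capacity of a fresh `String`.
fn hello_len_and_capacity() -> (usize, usize) {
    let s1 = String::from("hello");
    (s1.len(), s1.capacity())
}

fn main() {
    let (len, capacity) = hello_len_and_capacity();
    println!("length = {len}, capacity = {capacity}");

    let s1 = String::from("hello");
    // Address of the first byte of the contents on the heap.
    println!("heap data starts at {:p}", s1.as_ptr());
    // Size of the part stored on the stack; it does not depend on the text.
    println!("stack part is {} bytes", std::mem::size_of::<String>());
}
```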

When we assign `s1` to `s2`, the `String` data is copied, meaning we copy the
pointer, the length, and the capacity that are on the stack. We do not copy the
data on the heap that the pointer refers to. In other words, the data
representation in memory looks like Figure 4-2.

<img alt="Three tables: tables s1 and s2 representing those strings on the
stack, respectively, and both pointing to the same string data on the heap."
src="img/trpl04-02.svg" class="center" style="width: 50%;" />

<span class="caption">Figure 4-2: Representation in memory of the variable `s2`
that has a copy of the pointer, length, and capacity of `s1`</span>

The representation does *not* look like Figure 4-3, which is what memory would
look like if Rust instead copied the heap data as well. If Rust did this, the
operation `s2 = s1` could be very expensive in terms of runtime performance if
the data on the heap were large.

<img alt="Four tables: two tables representing the stack data for s1 and s2,
and each points to its own copy of string data on the heap."
src="img/trpl04-03.svg" class="center" style="width: 50%;" />

<span class="caption">Figure 4-3: Another possibility for what `s2 = s1` might
do if Rust copied the heap data as well</span>

Earlier, we said that when a variable goes out of scope, Rust automatically
calls the `drop` function and cleans up the heap memory for that variable. But
Figure 4-2 shows both data pointers pointing to the same location. This is a
problem: when `s2` and `s1` go out of scope, they will both try to free the
same memory. This is known as a *double free* error and is one of the memory
safety bugs we mentioned previously. Freeing memory twice can lead to memory
corruption, which can potentially lead to security vulnerabilities.

To ensure memory safety, after the line `let s2 = s1;`, Rust considers `s1` as
no longer valid. Therefore, Rust doesn’t need to free anything when `s1` goes
out of scope. Check out what happens when you try to use `s1` after `s2` is
created; it won’t work:

```rust,ignore,does_not_compile
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-04-cant-use-after-move/src/main.rs:here}}
```

You’ll get an error like this because Rust prevents you from using the
invalidated variable:

```console
{{#include ../listings/ch04-understanding-ownership/no-listing-04-cant-use-after-move/output.txt}}
```

If you’ve heard the terms *shallow copy* and *deep copy* while working with
other languages, the concept of copying the pointer, length, and capacity
without copying the data probably sounds like making a shallow copy. But
because Rust also invalidates the first variable, instead of being called a
shallow copy, it’s known as a *move*. In this example, we would say that `s1`
was *moved* into `s2`. So, what actually happens is shown in Figure 4-4.

<img alt="Three tables: tables s1 and s2 representing those strings on the
stack, respectively, and both pointing to the same string data on the heap.
Table s1 is grayed out because s1 is no longer valid; only s2 can be used to
access the heap data." src="img/trpl04-04.svg" class="center" style="width:
50%;" />

<span class="caption">Figure 4-4: Representation in memory after `s1` has been
invalidated</span>

That solves our problem! With only `s2` valid, when it goes out of scope it
alone will free the memory, and we’re done.
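
We can check that a move leaves the heap data where it is by comparing the buffer address before and after. This is a sketch for illustration; the function name `moved_pointer_is_same` is an assumption, and `as_ptr` is used only to read the address.

```rust
// Hypothetical helper: returns true if the heap buffer did not move.
fn moved_pointer_is_same() -> bool {
    let s1 = String::from("hello");
    let before = s1.as_ptr(); // address of the heap data while `s1` owns it

    let s2 = s1; // move: only the pointer, length, and capacity are copied
    // Using `s1` from here on would not compile, because it was moved.

    before == s2.as_ptr() // same address, so no heap data was copied
}

fn main() {
    println!("same buffer after move: {}", moved_pointer_is_same());
}
```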

In addition, there’s a design choice that’s implied by this: Rust will never
automatically create “deep” copies of your data. Therefore, any *automatic*
copying can be assumed to be inexpensive in terms of runtime performance.

<!-- Old heading. Do not remove or links may break. -->
<a id="ways-variables-and-data-interact-clone"></a>

#### Variables and Data Interacting with Clone

If we *do* want to deeply copy the heap data of the `String`, not just the
stack data, we can use a common method called `clone`. We’ll discuss method
syntax in Chapter 5, but because methods are a common feature in many
programming languages, you’ve probably seen them before.

Here’s an example of the `clone` method in action:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-05-clone/src/main.rs:here}}
```

This works just fine and explicitly produces the behavior shown in Figure 4-3,
where the heap data *does* get copied.
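
A quick way to see the deep copy is to compare both the text and the buffer addresses of the two strings. The helper `clone_is_independent` below is hypothetical and only meant as a sketch.

```rust
// Hypothetical helper: returns (same text?, same heap buffer?).
fn clone_is_independent() -> (bool, bool) {
    let s1 = String::from("hello");
    let s2 = s1.clone(); // allocates a new buffer and copies the bytes

    let same_text = s1 == s2; // both are still valid and hold "hello"
    let same_buffer = s1.as_ptr() == s2.as_ptr(); // two separate allocations
    (same_text, same_buffer)
}

fn main() {
    let (same_text, same_buffer) = clone_is_independent();
    println!("same text: {same_text}, same buffer: {same_buffer}");
}
```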

When you see a call to `clone`, you know that some arbitrary code is being
executed and that code may be expensive. It’s a visual indicator that something
different is going on.

#### Stack-Only Data: Copy

There’s another wrinkle we haven’t talked about yet. This code using
integers—part of which was shown in Listing 4-2—works and is valid:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-06-copy/src/main.rs:here}}
```

But this code seems to contradict what we just learned: we don’t have a call to
`clone`, but `x` is still valid and wasn’t moved into `y`.

The reason is that types such as integers that have a known size at compile
time are stored entirely on the stack, so copies of the actual values are quick
to make. That means there’s no reason we would want to prevent `x` from being
valid after we create the variable `y`. In other words, there’s no difference
between deep and shallow copying here, so calling `clone` wouldn’t do anything
different from the usual shallow copying, and we can leave it out.
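
The sketch below shows that the two integers are independent after the assignment: changing the copy leaves the original untouched. The function name `copy_then_change` is an assumption made for this example.

```rust
// Hypothetical helper: copies an integer, changes the copy, returns both.
fn copy_then_change() -> (i32, i32) {
    let x = 5;
    let mut y = x; // `i32` is copied, so `x` stays valid
    y += 1; // changing `y` has no effect on `x`
    (x, y)
}

fn main() {
    let (x, y) = copy_then_change();
    println!("x = {x}, y = {y}"); // prints "x = 5, y = 6"
}
```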

Rust has a special annotation called the `Copy` trait that we can place on
types that are stored on the stack, as integers are (we’ll talk more about
traits in [Chapter 10][traits]<!-- ignore -->). If a type implements the `Copy`
trait, variables that use it do not move, but rather are trivially copied,
making them still valid after assignment to another variable.

Rust won’t let us annotate a type with `Copy` if the type, or any of its parts,
has implemented the `Drop` trait. If the type needs something special to happen
when the value goes out of scope and we add the `Copy` annotation to that type,
we’ll get a compile-time error. To learn about how to add the `Copy` annotation
to your type to implement the trait, see [“Derivable
Traits”][derivable-traits]<!-- ignore --> in Appendix C.
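
As a minimal sketch of the annotation, here is a small struct that derives `Copy`. The `Point` type and the function `copy_point` are invented for this example; structs and `derive` are covered in later chapters.

```rust
// `Point` is a hypothetical type. All of its fields are `Copy`, so it may
// derive `Copy` too (`Clone` is required alongside it).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn copy_point() -> (Point, Point) {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // copied rather than moved, so `p1` is still valid
    (p1, p2)
}

fn main() {
    let (p1, p2) = copy_point();
    println!("{p1:?} {p2:?}");
    // Adding `impl Drop for Point` as well would be rejected at compile time.
}
```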

So, what types implement the `Copy` trait? You can check the documentation for
the given type to be sure, but as a general rule, any group of simple scalar
values can implement `Copy`, and nothing that requires allocation or is some
form of resource can implement `Copy`. Here are some of the types that
implement `Copy`:

* All the integer types, such as `u32`.
* The Boolean type, `bool`, with values `true` and `false`.
* All the floating-point types, such as `f64`.
* The character type, `char`.
* Tuples, if they only contain types that also implement `Copy`. For example,
  `(i32, i32)` implements `Copy`, but `(i32, String)` does not.
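
One way to ask the compiler whether a type is `Copy` is a function with a `Copy` bound on its type parameter. The helper `assert_copy` is hypothetical, and generic bounds like this are only introduced in Chapter 10, so treat this as a sketch rather than something you need to understand fully yet.

```rust
// Hypothetical helper: a call compiles only when `T` implements `Copy`.
fn assert_copy<T: Copy>() {}

fn main() {
    assert_copy::<u32>();
    assert_copy::<bool>();
    assert_copy::<f64>();
    assert_copy::<char>();
    assert_copy::<(i32, i32)>();

    // Uncommenting either line below makes compilation fail, because
    // `String` does not implement `Copy`:
    // assert_copy::<String>();
    // assert_copy::<(i32, String)>();
}
```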

### Ownership and Functions

The mechanics of passing a value to a function are similar to those when
assigning a value to a variable. Passing a variable to a function will move or
copy, just as assignment does. Listing 4-3 has an example with some annotations
showing where variables go into and out of scope.

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-03/src/main.rs}}
```

<span class="caption">Listing 4-3: Functions with ownership and scope
annotated</span>

If we tried to use `s` after the call to `takes_ownership`, Rust would throw a
compile-time error. These static checks protect us from mistakes. Try adding
code to `main` that uses `s` and `x` to see where you can use them and where
the ownership rules prevent you from doing so.
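
The following sketch shows one move and one copy across a function call. It is a variation written for this explanation, not necessarily the contents of Listing 4-3: the name `makes_copy`, the parameter names, and the return values are assumptions added so the effect is easy to observe.

```rust
// Takes ownership of its argument; the `String` is dropped when it returns.
fn takes_ownership(some_string: String) -> usize {
    println!("{some_string}");
    some_string.len()
} // `some_string` goes out of scope here and its heap memory is freed.

// Receives a copy of the integer; the caller's variable is unaffected.
fn makes_copy(some_integer: i32) -> i32 {
    println!("{some_integer}");
    some_integer + 1
}

fn main() {
    let s = String::from("hello");
    let n = takes_ownership(s); // `s` moves into the function
    // Using `s` here would not compile, because it was moved.

    let x = 5;
    let y = makes_copy(x); // `i32` is `Copy`, so `x` is copied
    println!("{n} {x} {y}"); // `x` is still usable here
}
```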

### Return Values and Scope

Returning values can also transfer ownership. Listing 4-4 shows an example of a
function that returns some value, with similar annotations as those in Listing
4-3.

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-04/src/main.rs}}
```

<span class="caption">Listing 4-4: Transferring ownership of return
values</span>

The ownership of a variable follows the same pattern every time: assigning a
value to another variable moves it. When a variable that includes data on the
heap goes out of scope, the value will be cleaned up by `drop` unless ownership
of the data has been moved to another variable.
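
A minimal sketch of ownership moving out of and back through functions might look like this. The function and variable names are assumptions chosen for this illustration and may differ from Listing 4-4.

```rust
// Creates a `String` and moves it out to the caller.
fn gives_ownership() -> String {
    let some_string = String::from("yours");
    some_string // returned, so it is moved rather than dropped
}

// Takes a `String` and moves the same value back to the caller.
fn takes_and_gives_back(a_string: String) -> String {
    a_string
}

fn main() {
    let s1 = gives_ownership(); // the return value moves into `s1`
    let s2 = String::from("hello");
    let s3 = takes_and_gives_back(s2); // `s2` moves in, the result moves into `s3`
    println!("{s1} {s3}");
} // `s1` and `s3` are dropped here; `s2` was moved, so nothing happens for it.
```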

While this works, taking ownership and then returning ownership with every
function is a bit tedious. What if we want to let a function use a value but
not take ownership? It’s quite annoying that anything we pass in also needs to
be passed back if we want to use it again, in addition to any data resulting
from the body of the function that we might want to return as well.

Rust does let us return multiple values using a tuple, as shown in Listing 4-5.

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-05/src/main.rs}}
```

<span class="caption">Listing 4-5: Returning ownership of parameters</span>
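
For reference, here is a sketch of the tuple approach. The name `calculate_length` and the variable names are assumptions for this example and may not match Listing 4-5.

```rust
// Returns the `String` it was given together with its length, so the
// caller gets ownership back.
fn calculate_length(s: String) -> (String, usize) {
    let length = s.len();
    (s, length)
}

fn main() {
    let s1 = String::from("hello");
    let (s2, len) = calculate_length(s1); // `s1` moves in, `s2` gets it back
    println!("The length of '{s2}' is {len}.");
}
```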

But this is too much ceremony and a lot of work for a concept that should be
common. Luckily for us, Rust has a feature for using a value without
transferring ownership, called *references*.

[data-types]: ch03-02-data-types.html#data-types
[ch8]: ch08-02-strings.html
[traits]: ch10-02-traits.html
[derivable-traits]: appendix-03-derivable-traits.html
[method-syntax]: ch05-03-method-syntax.html#method-syntax
[paths-module-tree]: ch07-03-paths-for-referring-to-an-item-in-the-module-tree.html
[drop]: ../std/ops/trait.Drop.html#tymethod.drop