## What Is Ownership?

*Ownership* is a set of rules that govern how a Rust program manages memory.
All programs have to manage the way they use a computer’s memory while running.
Some languages have garbage collection that regularly looks for no-longer-used
memory as the program runs; in other languages, the programmer must explicitly
allocate and free the memory. Rust uses a third approach: memory is managed
through a system of ownership with a set of rules that the compiler checks. If
any of the rules are violated, the program won’t compile. None of the features
of ownership will slow down your program while it’s running.

Because ownership is a new concept for many programmers, it does take some time
to get used to. The good news is that the more experienced you become with Rust
and the rules of the ownership system, the easier you’ll find it to naturally
develop code that is safe and efficient. Keep at it!

When you understand ownership, you’ll have a solid foundation for understanding
the features that make Rust unique. In this chapter, you’ll learn ownership by
working through some examples that focus on a very common data structure:
strings.

> ### The Stack and the Heap
>
> Many programming languages don’t require you to think about the stack and the
> heap very often. But in a systems programming language like Rust, whether a
> value is on the stack or the heap affects how the language behaves and why
> you have to make certain decisions. Parts of ownership will be described in
> relation to the stack and the heap later in this chapter, so here is a brief
> explanation in preparation.
>
> Both the stack and the heap are parts of memory available to your code to use
> at runtime, but they are structured in different ways. The stack stores
> values in the order it gets them and removes the values in the opposite
> order. This is referred to as *last in, first out*. Think of a stack of
> plates: when you add more plates, you put them on top of the pile, and when
> you need a plate, you take one off the top. Adding or removing plates from
> the middle or bottom wouldn’t work as well! Adding data is called *pushing
> onto the stack*, and removing data is called *popping off the stack*. All
> data stored on the stack must have a known, fixed size. Data with an unknown
> size at compile time or a size that might change must be stored on the heap
> instead.
>
> The heap is less organized: when you put data on the heap, you request a
> certain amount of space. The memory allocator finds an empty spot in the heap
> that is big enough, marks it as being in use, and returns a *pointer*, which
> is the address of that location. This process is called *allocating on the
> heap* and is sometimes abbreviated as just *allocating* (pushing values onto
> the stack is not considered allocating). Because the pointer to the heap is a
> known, fixed size, you can store the pointer on the stack, but when you want
> the actual data, you must follow the pointer. Think of being seated at a
> restaurant. When you enter, you state the number of people in your group, and
> the host finds an empty table that fits everyone and leads you there. If
> someone in your group comes late, they can ask where you’ve been seated to
> find you.
>
> Pushing to the stack is faster than allocating on the heap because the
> allocator never has to search for a place to store new data; that location is
> always at the top of the stack. Comparatively, allocating space on the heap
> requires more work because the allocator must first find a big enough space
> to hold the data and then perform bookkeeping to prepare for the next
> allocation.
>
> Accessing data in the heap is slower than accessing data on the stack because
> you have to follow a pointer to get there. Contemporary processors are faster
> if they jump around less in memory. Continuing the analogy, consider a server
> at a restaurant taking orders from many tables. It’s most efficient to get
> all the orders at one table before moving on to the next table. Taking an
> order from table A, then an order from table B, then one from A again, and
> then one from B again would be a much slower process. By the same token, a
> processor can do its job better if it works on data that’s close to other
> data (as it is on the stack) rather than farther away (as it can be on the
> heap).
>
> When your code calls a function, the values passed into the function
> (including, potentially, pointers to data on the heap) and the function’s
> local variables get pushed onto the stack. When the function is over, those
> values get popped off the stack.
>
> Keeping track of what parts of code are using what data on the heap,
> minimizing the amount of duplicate data on the heap, and cleaning up unused
> data on the heap so you don’t run out of space are all problems that ownership
> addresses. Once you understand ownership, you won’t need to think about the
> stack and the heap very often, but knowing that the main purpose of ownership
> is to manage heap data can help explain why it works the way it does.
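
To make the fixed-size versus growable distinction concrete, here is a minimal sketch using the standard `std::mem::size_of` and `size_of_val` functions. It assumes only what the box above describes: a `String` value has a fixed-size part, and its text lives in a separate heap buffer that can grow, so the size of the `String` value itself doesn’t change as text is added. The helper name `string_sizes` is hypothetical.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical helper: returns (size of the String value before pushing,
// size of the String value after pushing, bytes of text now on the heap).
fn string_sizes() -> (usize, usize, usize) {
    let mut s = String::from("hi");
    let before = size_of_val(&s); // the fixed-size part, not the text
    s.push_str(", this text lives in the heap buffer");
    let after = size_of_val(&s); // unchanged: only the heap buffer grew
    (before, after, s.len())
}

fn main() {
    // An i32 always occupies exactly 4 bytes, known at compile time.
    assert_eq!(size_of::<i32>(), 4);

    let (before, after, text_len) = string_sizes();
    assert_eq!(before, size_of::<String>());
    assert_eq!(before, after);
    println!("String value: {before} bytes; text on the heap: {text_len} bytes");
}
```

The exact value of `size_of::<String>()` depends on the target platform, which is why the sketch compares sizes instead of hardcoding one.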

### Ownership Rules

First, let’s take a look at the ownership rules. Keep these rules in mind as we
work through the examples that illustrate them:

* Each value in Rust has an *owner*.
* There can only be one owner at a time.
* When the owner goes out of scope, the value will be dropped.
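
Here is a minimal sketch that ties each rule to a line of code, using only `String::from` and plain assignment. The helper name `rules_demo` is hypothetical, and the sections that follow explain each step in detail.

```rust
// Hypothetical helper that walks through the three ownership rules.
fn rules_demo() -> usize {
    // Rule 1: the String value has an owner, the variable `a`.
    let a = String::from("hello");

    // Rule 2: only one owner at a time. This assignment moves ownership
    // to `b`, and `a` can no longer be used.
    let b = a;
    let len = b.len();

    {
        // Rule 3: `c` owns a String that is dropped (its heap memory is
        // freed) when `c` goes out of scope at this block's closing brace.
        let c = String::from("temporary");
        assert_eq!(c.len(), 9);
    }

    len
} // `b` goes out of scope here, so its String is dropped as well.

fn main() {
    assert_eq!(rules_demo(), 5);
}
```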

### Variable Scope

Now that we’re past basic Rust syntax, we won’t include all the `fn main() {`
code in examples, so if you’re following along, make sure to put the following
examples inside a `main` function manually. As a result, our examples will be a
bit more concise, letting us focus on the actual details rather than
boilerplate code.

As a first example of ownership, we’ll look at the *scope* of some variables. A
scope is the range within a program for which an item is valid. Take the
following variable:

```rust
let s = "hello";
```

The variable `s` refers to a string literal, where the value of the string is
hardcoded into the text of our program. The variable is valid from the point at
which it’s declared until the end of the current *scope*. Listing 4-1 shows a
program with comments annotating where the variable `s` would be valid.

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-01/src/main.rs:here}}
```

<span class="caption">Listing 4-1: A variable and the scope in which it is
valid</span>

In other words, there are two important points in time here:

* When `s` comes *into* scope, it is valid.
* It remains valid until it goes *out of* scope.
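
The same idea with a nested block, as a minimal sketch (the function name `scope_demo` is hypothetical): a variable declared inside a block is valid only until that block’s closing brace, while a variable declared outside it stays valid afterward.

```rust
// Hypothetical helper showing where two variables are valid.
fn scope_demo() -> String {
    let outer = "outer"; // `outer` is valid from here to the end of the function
    let combined;
    {
        let inner = "inner"; // `inner` is valid from here...
        combined = format!("{outer} and {inner}");
    } // ...to here; after this brace, `inner` is out of scope
    // Uncommenting the next line would fail to compile, because `inner`
    // no longer exists in this scope:
    // println!("{inner}");
    combined
}

fn main() {
    println!("{}", scope_demo());
}
```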

At this point, the relationship between scopes and when variables are valid is
similar to that in other programming languages. Now we’ll build on top of this
understanding by introducing the `String` type.

### The `String` Type

To illustrate the rules of ownership, we need a data type that is more complex
than those we covered in the [“Data Types”][data-types]<!-- ignore --> section
of Chapter 3. The types covered previously are of a known size, can be stored
on the stack and popped off the stack when their scope is over, and can be
quickly and trivially copied to make a new, independent instance if another
part of code needs to use the same value in a different scope. But we want to
look at data that is stored on the heap and explore how Rust knows when to
clean up that data, and the `String` type is a great example.

We’ll concentrate on the parts of `String` that relate to ownership. These
aspects also apply to other complex data types, whether they are provided by
the standard library or created by you. We’ll discuss `String` in more depth in
[Chapter 8][ch8]<!-- ignore -->.

We’ve already seen string literals, where a string value is hardcoded into our
program. String literals are convenient, but they aren’t suitable for every
situation in which we may want to use text. One reason is that they’re
immutable. Another is that not every string value can be known when we write
our code: for example, what if we want to take user input and store it? For
these situations, Rust has a second string type, `String`. This type manages
data allocated on the heap and as such is able to store an amount of text that
is unknown to us at compile time. You can create a `String` from a string
literal using the `from` function, like so:

```rust
let s = String::from("hello");
```

The double colon `::` operator allows us to namespace this particular `from`
function under the `String` type rather than using some sort of name like
`string_from`. We’ll discuss this syntax more in the [“Method
Syntax”][method-syntax]<!-- ignore --> section of Chapter 5, and when we talk
about namespacing with modules in [“Paths for Referring to an Item in the
Module Tree”][paths-module-tree]<!-- ignore --> in Chapter 7.

This kind of string *can* be mutated:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-01-can-mutate-string/src/main.rs:here}}
```

So, what’s the difference here? Why can `String` be mutated but literals
cannot? The difference is in how these two types deal with memory.

### Memory and Allocation

In the case of a string literal, we know the contents at compile time, so the
text is hardcoded directly into the final executable. This is why string
literals are fast and efficient. But these properties only come from the string
literal’s immutability. Unfortunately, we can’t put a blob of memory into the
binary for each piece of text whose size is unknown at compile time and whose
size might change while running the program.

With the `String` type, in order to support a mutable, growable piece of text,
we need to allocate an amount of memory on the heap, unknown at compile time,
to hold the contents. This means:

* The memory must be requested from the memory allocator at runtime.
* We need a way of returning this memory to the allocator when we’re done with
  our `String`.
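
As a small illustration, assuming only `String::new` and `push_str`, the hypothetical function `build` below creates a `String` whose final length depends on a value that isn’t known until the program runs. A string literal couldn’t do this, because its text is fixed in the binary.

```rust
// Hypothetical helper: builds a String by repeating "ab" `count` times.
fn build(count: usize) -> String {
    let mut s = String::new(); // empty; no heap buffer allocated yet
    for _ in 0..count {
        s.push_str("ab"); // asks the allocator for more heap memory as needed
    }
    s
}

fn main() {
    // The count depends on how the program is invoked, so it isn't known
    // at compile time. In a real program it might come from user input.
    let count = std::env::args_os().count() + 2;
    let s = build(count);
    assert_eq!(s.len(), 2 * count);
    println!("{s}");
}
```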

That first part is done by us: when we call `String::from`, its implementation
requests the memory it needs. This is pretty much universal in programming
languages.

However, the second part is different. In languages with a *garbage collector
(GC)*, the GC keeps track of and cleans up memory that isn’t being used
anymore, and we don’t need to think about it. In most languages without a GC,
it’s our responsibility to identify when memory is no longer being used and to
call code to explicitly free it, just as we did to request it. Doing this
correctly has historically been a difficult programming problem. If we forget,
we’ll waste memory. If we do it too early, we’ll have an invalid variable. If
we do it twice, that’s a bug too. We need to pair exactly one `allocate` with
exactly one `free`.

Rust takes a different path: the memory is automatically returned once the
variable that owns it goes out of scope. Here’s a version of our scope example
from Listing 4-1 using a `String` instead of a string literal:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-02-string-scope/src/main.rs:here}}
```

There is a natural point at which we can return the memory our `String` needs
to the allocator: when `s` goes out of scope. When a variable goes out of
scope, Rust calls a special function for us. This function is called
[`drop`][drop]<!-- ignore -->, and it’s where the author of `String` can put
the code to return the memory. Rust calls `drop` automatically at the closing
curly bracket.
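
You can watch this happen by implementing the `Drop` trait on a type of your own. The sketch below is illustrative: `Noisy`, its `log` field, and `scope_events` are hypothetical names, and appending to the log stands in for the memory-freeing work that `String`’s own `drop` performs.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that records when Rust calls its `drop`.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn scope_events() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let s = Noisy { name: "s", log: Rc::clone(&log) };
        log.borrow_mut().push(format!("{} is valid", s.name));
    } // `s` goes out of scope here, so Rust calls `drop` automatically
    log.borrow_mut().push(String::from("after the closing brace"));
    log.take()
}

fn main() {
    for event in scope_events() {
        println!("{event}");
    }
}
```

Running it prints `s is valid`, then `drop s`, then `after the closing brace`, so `drop` ran exactly at the inner block’s closing brace.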

> Note: In C++, this pattern of deallocating resources at the end of an item’s
> lifetime is sometimes called *Resource Acquisition Is Initialization (RAII)*.
> The `drop` function in Rust will be familiar to you if you’ve used RAII
> patterns.

This pattern has a profound impact on the way Rust code is written. It may seem
simple right now, but the behavior of code can be unexpected in more
complicated situations when we want to have multiple variables use the data
we’ve allocated on the heap. Let’s explore some of those situations now.

<!-- Old heading. Do not remove or links may break. -->
<a id="ways-variables-and-data-interact-move"></a>

#### Variables and Data Interacting with Move

Multiple variables can interact with the same data in different ways in Rust.
Let’s look at an example using an integer in Listing 4-2.

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-02/src/main.rs:here}}
```

<span class="caption">Listing 4-2: Assigning the integer value of variable `x`
to `y`</span>

We can probably guess what this is doing: “bind the value `5` to `x`; then make
a copy of the value in `x` and bind it to `y`.” We now have two variables, `x`
and `y`, and both equal `5`. This is indeed what is happening, because integers
are simple values with a known, fixed size, and these two `5` values are pushed
onto the stack.

Now let’s look at the `String` version:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-03-string-move/src/main.rs:here}}
```

This looks very similar, so we might assume that the way it works would be the
same: that is, the second line would make a copy of the value in `s1` and bind
it to `s2`. But this isn’t quite what happens.

Take a look at Figure 4-1 to see what is happening to `String` under the
covers. A `String` is made up of three parts, shown on the left: a pointer to
the memory that holds the contents of the string, a length, and a capacity.
This group of data is stored on the stack. On the right is the memory on the
heap that holds the contents.

<img alt="Two tables: the first table contains the representation of s1 on the
stack, consisting of its length (5), capacity (5), and a pointer to the first
value in the second table. The second table contains the representation of the
string data on the heap, byte by byte." src="img/trpl04-01.svg" class="center"
style="width: 50%;" />

<span class="caption">Figure 4-1: Representation in memory of a `String`
holding the value `"hello"` bound to `s1`</span>

The length is how much memory, in bytes, the contents of the `String` are
currently using. The capacity is the total amount of memory, in bytes, that the
`String` has received from the allocator. The difference between length and
capacity matters, but not in this context, so for now, it’s fine to ignore the
capacity.
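
If you’re curious how the two numbers relate, this sketch uses the standard `len`, `capacity`, and `String::with_capacity` methods. It only checks lower bounds on capacity, because the standard library guarantees that `with_capacity` reserves *at least* the requested amount, not an exact figure; `length_and_capacity` is a hypothetical name.

```rust
// Hypothetical helper returning (length, capacity) before and after a push.
fn length_and_capacity() -> ((usize, usize), (usize, usize)) {
    // Ask for room for at least 10 bytes up front.
    let mut s = String::with_capacity(10);
    let empty = (s.len(), s.capacity());
    // '\u{e9}' is 'é', which takes 2 bytes in UTF-8, so this adds 6 bytes.
    s.push_str("h\u{e9}llo");
    let filled = (s.len(), s.capacity());
    (empty, filled)
}

fn main() {
    let ((len0, cap0), (len1, cap1)) = length_and_capacity();
    assert_eq!(len0, 0);
    assert!(cap0 >= 10); // only a lower bound is guaranteed
    assert_eq!(len1, 6); // length counts bytes, not characters
    assert!(cap1 >= len1);
    println!("empty: len {len0}, cap {cap0}; filled: len {len1}, cap {cap1}");
}
```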

When we assign `s1` to `s2`, the `String` data is copied, meaning we copy the
pointer, the length, and the capacity that are on the stack. We do not copy the
data on the heap that the pointer refers to. In other words, the data
representation in memory looks like Figure 4-2.

<img alt="Three tables: tables s1 and s2 representing those strings on the
stack, respectively, and both pointing to the same string data on the heap."
src="img/trpl04-02.svg" class="center" style="width: 50%;" />

<span class="caption">Figure 4-2: Representation in memory of the variable `s2`
that has a copy of the pointer, length, and capacity of `s1`</span>
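
One way to observe Figure 4-2 from code is to compare the address of the heap buffer before and after the assignment, using the standard `as_ptr` method. This is an illustrative sketch: `moved_pointer_matches` is a hypothetical name, and inspecting raw pointer values like this is useful only for demonstration.

```rust
// Hypothetical helper: does `s2` point at the same heap buffer `s1` used?
fn moved_pointer_matches() -> bool {
    let s1 = String::from("hello");
    let heap_before = s1.as_ptr(); // address of the first byte on the heap
    let s2 = s1; // copies pointer, length, and capacity; the bytes stay put
    let heap_after = s2.as_ptr();
    assert_eq!(s2, "hello");
    heap_before == heap_after
}

fn main() {
    assert!(moved_pointer_matches());
}
```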

The representation does *not* look like Figure 4-3, which is what memory would
look like if Rust instead copied the heap data as well. If Rust did this, the
operation `s2 = s1` could be very expensive in terms of runtime performance if
the data on the heap were large.

<img alt="Four tables: two tables representing the stack data for s1 and s2,
and each points to its own copy of string data on the heap."
src="img/trpl04-03.svg" class="center" style="width: 50%;" />

<span class="caption">Figure 4-3: Another possibility for what `s2 = s1` might
do if Rust copied the heap data as well</span>

Earlier, we said that when a variable goes out of scope, Rust automatically
calls the `drop` function and cleans up the heap memory for that variable. But
Figure 4-2 shows both data pointers pointing to the same location. This is a
problem: when `s2` and `s1` go out of scope, they will both try to free the
same memory. This is known as a *double free* error and is one of the memory
safety bugs we mentioned previously. Freeing memory twice can lead to memory
corruption, which can potentially lead to security vulnerabilities.

To ensure memory safety, after the line `let s2 = s1;`, Rust considers `s1` as
no longer valid. Therefore, Rust doesn’t need to free anything when `s1` goes
out of scope. Check out what happens when you try to use `s1` after `s2` is
created; it won’t work:

```rust,ignore,does_not_compile
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-04-cant-use-after-move/src/main.rs:here}}
```

You’ll get an error like this because Rust prevents you from using the
invalidated variable:

```console
{{#include ../listings/ch04-understanding-ownership/no-listing-04-cant-use-after-move/output.txt}}
```

If you’ve heard the terms *shallow copy* and *deep copy* while working with
other languages, the concept of copying the pointer, length, and capacity
without copying the data probably sounds like making a shallow copy. But
because Rust also invalidates the first variable, instead of being called a
shallow copy, it’s known as a *move*. In this example, we would say that `s1`
was *moved* into `s2`. So, what actually happens is shown in Figure 4-4.

<img alt="Three tables: tables s1 and s2 representing those strings on the
stack, respectively, and both pointing to the same string data on the heap.
Table s1 is grayed out because s1 is no longer valid; only s2 can be used to
access the heap data." src="img/trpl04-04.svg" class="center" style="width:
50%;" />

<span class="caption">Figure 4-4: Representation in memory after `s1` has been
invalidated</span>

That solves our problem! With only `s2` valid, when it goes out of scope it
alone will free the memory, and we’re done.
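
A sketch that counts how many times `drop` runs when a value is moved. It assumes a hypothetical `Tracked` type whose `Drop` implementation increments a shared counter (`Rc` and `Cell` are used only to share that counter with the caller). The count comes out as one, not two: Rust skips dropping the moved-from variable.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type that bumps a shared counter each time it is dropped.
struct Tracked {
    drops: Rc<Cell<usize>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn drops_after_move() -> usize {
    let drops = Rc::new(Cell::new(0));
    {
        let s1 = Tracked { drops: Rc::clone(&drops) };
        let s2 = s1; // move: `s1` is no longer valid
        assert_eq!(drops.get(), 0); // the move itself drops nothing
        assert_eq!(s2.drops.get(), 0);
    } // only `s2` is dropped here; `s1` was moved, so it is skipped
    drops.get()
}

fn main() {
    assert_eq!(drops_after_move(), 1);
}
```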

In addition, there’s a design choice that’s implied by this: Rust will never
automatically create “deep” copies of your data. Therefore, any *automatic*
copying can be assumed to be inexpensive in terms of runtime performance.

<!-- Old heading. Do not remove or links may break. -->
<a id="ways-variables-and-data-interact-clone"></a>

#### Variables and Data Interacting with Clone

If we *do* want to deeply copy the heap data of the `String`, not just the
stack data, we can use a common method called `clone`. We’ll discuss method
syntax in Chapter 5, but because methods are a common feature in many
programming languages, you’ve probably seen them before.

Here’s an example of the `clone` method in action:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-05-clone/src/main.rs:here}}
```

This works just fine and explicitly produces the behavior shown in Figure 4-3,
where the heap data *does* get copied.
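
To contrast with a move, this sketch checks that `clone` gives the new `String` its own heap buffer, as in Figure 4-3, again using the standard `as_ptr` method; `clone_has_own_buffer` is a hypothetical name.

```rust
// Hypothetical helper: after `clone`, do the two Strings use separate buffers?
fn clone_has_own_buffer() -> bool {
    let s1 = String::from("hello");
    let s2 = s1.clone(); // deep copy: allocates a new buffer and copies the bytes
    // Both variables are still valid, with equal contents...
    assert_eq!(s1, s2);
    // ...but they point to different heap locations.
    s1.as_ptr() != s2.as_ptr()
}

fn main() {
    assert!(clone_has_own_buffer());
}
```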

When you see a call to `clone`, you know that some arbitrary code is being
executed and that code may be expensive. It’s a visual indicator that something
different is going on.

#### Stack-Only Data: Copy

There’s another wrinkle we haven’t talked about yet. This code using
integers, part of which was shown in Listing 4-2, works and is valid:

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-06-copy/src/main.rs:here}}
```

But this code seems to contradict what we just learned: we don’t have a call to
`clone`, but `x` is still valid and wasn’t moved into `y`.

The reason is that types such as integers that have a known size at compile
time are stored entirely on the stack, so copies of the actual values are quick
to make. That means there’s no reason we would want to prevent `x` from being
valid after we create the variable `y`. In other words, there’s no difference
between deep and shallow copying here, so calling `clone` wouldn’t do anything
different from the usual shallow copying, and we can leave it out.

Rust has a special annotation called the `Copy` trait that we can place on
types that are stored on the stack, as integers are (we’ll talk more about
traits in [Chapter 10][traits]<!-- ignore -->). If a type implements the `Copy`
trait, variables that use it do not move, but rather are trivially copied,
making them still valid after assignment to another variable.

Rust won’t let us annotate a type with `Copy` if the type, or any of its parts,
has implemented the `Drop` trait. If the type needs something special to happen
when the value goes out of scope and we add the `Copy` annotation to that type,
we’ll get a compile-time error. To learn about how to add the `Copy` annotation
to your type to implement the trait, see [“Derivable
Traits”][derivable-traits]<!-- ignore --> in Appendix C.
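
As a sketch of how that annotation looks in practice, the hypothetical `Point` type below derives `Copy` (which also requires `Clone`) because both of its fields are `i32`. The commented-out `Drop` implementation shows the combination that, per the paragraph above, Rust rejects.

```rust
// Hypothetical type: both fields are i32, so the whole struct can be Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Uncommenting this would cause a compile-time error, because a type
// that implements Drop is not allowed to implement Copy:
// impl Drop for Point {
//     fn drop(&mut self) {}
// }

fn copy_point() -> (Point, Point) {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // copied, not moved
    (p1, p2) // `p1` is still valid here
}

fn main() {
    let (p1, p2) = copy_point();
    assert_eq!(p1, p2);
    println!("{p1:?} {p2:?}");
}
```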

So, what types implement the `Copy` trait? You can check the documentation for
the given type to be sure, but as a general rule, any group of simple scalar
values can implement `Copy`, and nothing that requires allocation or is some
form of resource can implement `Copy`. Here are some of the types that
implement `Copy`:

* All the integer types, such as `u32`.
* The Boolean type, `bool`, with values `true` and `false`.
* All the floating-point types, such as `f64`.
* The character type, `char`.
* Tuples, if they only contain types that also implement `Copy`. For example,
  `(i32, i32)` implements `Copy`, but `(i32, String)` does not.
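
If you want the compiler itself to confirm that a type implements `Copy`, one common trick (not a function the standard library provides) is an empty generic function with a `Copy` bound. `assert_copy` and `check_list` are hypothetical names; every call below compiles because each type appears in the list above.

```rust
// Hypothetical helper: compiles only when `T` implements `Copy`.
fn assert_copy<T: Copy>() {}

fn check_list() -> bool {
    assert_copy::<u32>();
    assert_copy::<bool>();
    assert_copy::<f64>();
    assert_copy::<char>();
    assert_copy::<(i32, i32)>();
    // This call would not compile, because `String` does not implement `Copy`:
    // assert_copy::<(i32, String)>();
    true
}

fn main() {
    assert!(check_list());
}
```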

### Ownership and Functions

The mechanics of passing a value to a function are similar to those when
assigning a value to a variable. Passing a variable to a function will move or
copy, just as assignment does. Listing 4-3 has an example with some annotations
showing where variables go into and out of scope.

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-03/src/main.rs}}
```

<span class="caption">Listing 4-3: Functions with ownership and scope
annotated</span>

If we tried to use `s` after the call to `takes_ownership`, Rust would throw a
compile-time error. These static checks protect us from mistakes. Try adding
code to `main` that uses `s` and `x` to see where you can use them and where
the ownership rules prevent you from doing so.
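
A compact sketch of the same behavior, using hypothetical functions `consume`, `double`, and `call_both`: passing a `String` moves it into the function, while passing an `i32` copies it, so only the integer remains usable in the caller afterward.

```rust
// Takes ownership of `s`; the String is dropped when this function returns.
fn consume(s: String) -> usize {
    s.len()
} // `s` goes out of scope here and its heap memory is freed

// Receives a copy of `n`; the caller's variable is unaffected.
fn double(n: i32) -> i32 {
    n * 2
}

// Hypothetical helper that calls both functions.
fn call_both() -> (usize, i32, i32) {
    let s = String::from("hello");
    let x = 21;
    let len = consume(s); // `s` moves into `consume`
    let doubled = double(x); // `x` is copied into `double`
    // Using `s` here would not compile, but `x` is still valid:
    (len, doubled, x)
}

fn main() {
    assert_eq!(call_both(), (5, 42, 21));
}
```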

### Return Values and Scope

Returning values can also transfer ownership. Listing 4-4 shows an example of a
function that returns some value, with similar annotations as those in Listing
4-3.

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-04/src/main.rs}}
```

<span class="caption">Listing 4-4: Transferring ownership of return
values</span>

The ownership of a variable follows the same pattern every time: assigning a
value to another variable moves it. When a variable that includes data on the
heap goes out of scope, the value will be cleaned up by `drop` unless ownership
of the data has been moved to another variable.
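
To see that handing ownership back through a return value doesn’t copy the heap data either, this sketch (with a hypothetical `append_world` function) compares `as_ptr` values around the call. It reserves spare capacity first so that `push_str` has no reason to reallocate; without that, the buffer could move for a reason unrelated to ownership.

```rust
// Hypothetical function: takes ownership, appends text, gives ownership back.
fn append_world(mut s: String) -> String {
    s.push_str(", world");
    s // returning `s` moves ownership back to the caller
}

fn round_trip() -> (String, bool) {
    // Reserve enough room up front so `push_str` won't need to reallocate.
    let mut s = String::with_capacity(32);
    s.push_str("hello");
    let before = s.as_ptr();
    let s = append_world(s); // moved into the function, then moved back out
    let same_buffer = before == s.as_ptr();
    (s, same_buffer)
}

fn main() {
    let (s, same_buffer) = round_trip();
    assert_eq!(s, "hello, world");
    assert!(same_buffer);
}
```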

While this works, taking ownership and then returning ownership with every
function is a bit tedious. What if we want to let a function use a value but
not take ownership? It’s quite annoying that anything we pass in also needs to
be passed back if we want to use it again, in addition to any data resulting
from the body of the function that we might want to return as well.

Rust does let us return multiple values using a tuple, as shown in Listing 4-5.

<span class="filename">Filename: src/main.rs</span>

```rust
{{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-05/src/main.rs}}
```

<span class="caption">Listing 4-5: Returning ownership of parameters</span>

But this is too much ceremony and a lot of work for a concept that should be
common. Luckily for us, Rust has a feature for using a value without
transferring ownership, called *references*.

[data-types]: ch03-02-data-types.html#data-types
[ch8]: ch08-02-strings.html
[traits]: ch10-02-traits.html
[derivable-traits]: appendix-03-derivable-traits.html
[method-syntax]: ch05-03-method-syntax.html#method-syntax
[paths-module-tree]: ch07-03-paths-for-referring-to-an-item-in-the-module-tree.html
[drop]: ../std/ops/trait.Drop.html#tymethod.drop