src/doc/book/src/ch04-01-what-is-ownership.md

   1 ## What Is Ownership?
   2
   3 Rust’s central feature is *ownership*. Although the feature is straightforward
   4 to explain, it has deep implications for the rest of the language.
   5
   6 All programs have to manage the way they use a computer’s memory while running.
   7 Some languages have garbage collection that regularly looks for no-longer-used
   8 memory as the program runs; in other languages, the programmer must explicitly
   9 allocate and free the memory. Rust uses a third approach: memory is managed
  10 through a system of ownership with a set of rules that the compiler checks at
  11 compile time. None of the ownership features slow down your program while it’s
  12 running.
  13
  14 Because ownership is a new concept for many programmers, it does take some time
  15 to get used to. The good news is that the more experienced you become with Rust
  16 and the rules of the ownership system, the more you’ll be able to naturally
  17 develop code that is safe and efficient. Keep at it!
  18
  19 When you understand ownership, you’ll have a solid foundation for understanding
  20 the features that make Rust unique. In this chapter, you’ll learn ownership by
  21 working through some examples that focus on a very common data structure:
  22 strings.
  23
  24 > ### The Stack and the Heap
  25 >
  26 > In many programming languages, you don’t have to think about the stack and
  27 > the heap very often. But in a systems programming language like Rust, whether
  28 > a value is on the stack or the heap has more of an effect on how the language
  29 > behaves and why you have to make certain decisions. Parts of ownership will
  30 > be described in relation to the stack and the heap later in this chapter, so
  31 > here is a brief explanation in preparation.
  32 >
  33 > Both the stack and the heap are parts of memory that are available to your
  34 > code to use at runtime, but they are structured in different ways. The stack
  35 > stores values in the order it gets them and removes the values in the
  36 > opposite order. This is referred to as *last in, first out*. Think of a stack
  37 > of plates: when you add more plates, you put them on top of the pile, and
  38 > when you need a plate, you take one off the top. Adding or removing plates
  39 > from the middle or bottom wouldn’t work as well! Adding data is called
  40 > *pushing onto the stack*, and removing data is called *popping off the stack*.
  41 >
  42 > All data stored on the stack must have a known, fixed size. Data with an
  43 > unknown size at compile time or a size that might change must be stored on
  44 > the heap instead. The heap is less organized: when you put data on the heap,
  45 > you request a certain amount of space. The memory allocator finds an empty
  46 > spot in the heap that is big enough, marks it as being in use, and returns a
  47 > *pointer*, which is the address of that location. This process is called
  48 > *allocating on the heap* and is sometimes abbreviated as just *allocating*.
  49 > Pushing values onto the stack is not considered allocating. Because the
  50 > pointer is a known, fixed size, you can store the pointer on the stack, but
  51 > when you want the actual data, you must follow the pointer.
  52 >
  53 > Think of being seated at a restaurant. When you enter, you state the number of
  54 > people in your group, and the staff finds an empty table that fits everyone
  55 > and leads you there. If someone in your group comes late, they can ask where
  56 > you’ve been seated to find you.
  57 >
  58 > Pushing to the stack is faster than allocating on the heap because the
  59 > allocator never has to search for a place to store new data; that
  60 > location is always at the top of the stack. Comparatively, allocating space
  61 > on the heap requires more work, because the allocator must first find
  62 > a big enough space to hold the data and then perform bookkeeping to prepare
  63 > for the next allocation.
  64 >
  65 > Accessing data in the heap is slower than accessing data on the stack because
  66 > you have to follow a pointer to get there. Contemporary processors are faster
  67 > if they jump around less in memory. Continuing the analogy, consider a server
  68 > at a restaurant taking orders from many tables. It’s most efficient to get
  69 > all the orders at one table before moving on to the next table. Taking an
  70 > order from table A, then an order from table B, then one from A again, and
  71 > then one from B again would be a much slower process. By the same token, a
  72 > processor can do its job better if it works on data that’s close to other
  73 > data (as it is on the stack) rather than farther away (as it can be on the
  74 > heap). Allocating a large amount of space on the heap can also take time.
  75 >
  76 > When your code calls a function, the values passed into the function
  77 > (including, potentially, pointers to data on the heap) and the function’s
  78 > local variables get pushed onto the stack. When the function is over, those
  79 > values get popped off the stack.
  80 >
  81 > Keeping track of what parts of code are using what data on the heap,
  82 > minimizing the amount of duplicate data on the heap, and cleaning up unused
  83 > data on the heap so you don’t run out of space are all problems that ownership
  84 > addresses. Once you understand ownership, you won’t need to think about the
  85 > stack and the heap very often, but knowing that managing heap data is why
  86 > ownership exists can help explain why it works the way it does.
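
The following is a minimal sketch of the stack/heap split described above, not part of the original listings. It uses `String`, which is introduced later in this chapter, as an example of a value whose text lives on the heap. The helper `handle_size` is a hypothetical name chosen for illustration. It reports the size of the fixed-size part of a value, which does not depend on how much heap data the value points to.

```rust
use std::mem::size_of_val;

// Hypothetical helper: size in bytes of the fixed-size part of a `String`
// (the part that can live on the stack), whatever text it holds.
fn handle_size(s: String) -> usize {
    size_of_val(&s)
}

fn main() {
    let n: i32 = 5; // known, fixed size: stored directly on the stack
    let short = String::from("hi"); // the text itself is stored on the heap
    let long = String::from("a much longer piece of text");

    println!("i32 occupies {} bytes", size_of_val(&n));
    // Both calls report the same number, because only the pointer and its
    // bookkeeping are counted, not the heap data being pointed to.
    println!("short handle: {} bytes", handle_size(short));
    println!("long handle:  {} bytes", handle_size(long));
}
```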
  87
  88 ### Ownership Rules
  89
  90 First, let’s take a look at the ownership rules. Keep these rules in mind as we
  91 work through the examples that illustrate them:
  92
  93 * Each value in Rust has an *owner*.
  94 * There can only be one owner at a time.
  95 * When the owner goes out of scope, the value will be dropped.
  96
  97 ### Variable Scope
  98
  99 We’ve walked through an example of a Rust program already in Chapter 2. Now
 100 that we’re past basic syntax, we won’t include all the `fn main() {` code in
 101 examples, so if you’re following along, you’ll have to put the following
 102 examples inside a `main` function manually. As a result, our examples will be a
 103 bit more concise, letting us focus on the actual details rather than
 104 boilerplate code.
 105
 106 As a first example of ownership, we’ll look at the *scope* of some variables. A
 107 scope is the range within a program for which an item is valid. Let’s say we
 108 have a variable that looks like this:
 109
 110 ```rust
 111 let s = "hello";
 112 ```
 113
 114 The variable `s` refers to a string literal, where the value of the string is
 115 hardcoded into the text of our program. The variable is valid from the point at
 116 which it’s declared until the end of the current *scope*. Listing 4-1 has
 117 comments annotating where the variable `s` is valid.
 118
 119 ```rust
 120 {{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-01/src/main.rs:here}}
 121 ```
 122
 123 <span class="caption">Listing 4-1: A variable and the scope in which it is
 124 valid</span>
 125
 126 In other words, there are two important points in time here:
 127
 128 * When `s` comes *into scope*, it is valid.
 129 * It remains valid until it goes *out of scope*.
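
As a rough sketch of these two points (an illustration written for this section, not the contents of Listing 4-1), the hypothetical helper `greeting_length` below declares `s` inside an inner block and uses it only while it is in scope:

```rust
// Hypothetical helper that mirrors the scope rules described above.
fn greeting_length() -> usize {
    let length;
    {
        // `s` is not valid here; it has not been declared yet
        let s = "hello"; // `s` is valid from this point forward
        length = s.len(); // do something with `s` while it is valid
    } // this scope is now over, and `s` is no longer valid
    length
}

fn main() {
    println!("length = {}", greeting_length());
}
```

Referring to `s` after the inner closing bracket would be rejected by the compiler, because the name no longer exists at that point.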
 130
 131 At this point, the relationship between scopes and when variables are valid is
 132 similar to that in other programming languages. Now we’ll build on top of this
 133 understanding by introducing the `String` type.
 134
 135 ### The `String` Type
 136
 137 To illustrate the rules of ownership, we need a data type that is more complex
 138 than the ones we covered in the [“Data Types”][data-types]<!-- ignore -->
 139 section of Chapter 3. The types covered previously are all stored on the stack
 140 and popped off the stack when their scope is over, but we want to look at data
 141 that is stored on the heap and explore how Rust knows when to clean up that
 142 data.
 143
 144 We’ll use `String` as the example here and concentrate on the parts of `String`
 145 that relate to ownership. These aspects also apply to other complex data types,
 146 whether they are provided by the standard library or created by you. We’ll
 147 discuss `String` in more depth in Chapter 8.
 148
 149 We’ve already seen string literals, where a string value is hardcoded into our
 150 program. String literals are convenient, but they aren’t suitable for every
 151 situation in which we may want to use text. One reason is that they’re
 152 immutable. Another is that not every string value can be known when we write
 153 our code: for example, what if we want to take user input and store it? For
 154 these situations, Rust has a second string type, `String`. This type is
 155 allocated on the heap and as such is able to store an amount of text that is
 156 unknown to us at compile time. You can create a `String` from a string literal
 157 using the `from` function, like so:
 158
 159 ```rust
 160 let s = String::from("hello");
 161 ```
 162
 163 The double colon (`::`) is an operator that allows us to namespace this
 164 particular `from` function under the `String` type rather than using some sort
 165 of name like `string_from`. We’ll discuss this syntax more in the [“Method
 166 Syntax”][method-syntax]<!-- ignore --> section of Chapter 5 and when we talk
 167 about namespacing with modules in [“Paths for Referring to an Item in the
 168 Module Tree”][paths-module-tree]<!-- ignore --> in Chapter 7.
 169
 170 This kind of string *can* be mutated:
 171
 172 ```rust
 173 {{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-01-can-mutate-string/src/main.rs:here}}
 174 ```
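
A minimal sketch of such a mutation might look like the following. The function name `build_greeting` is an assumption made for this example; `String::from` and `push_str` are standard library functions.

```rust
// Hypothetical helper: creates a `String` and then grows it in place.
fn build_greeting() -> String {
    let mut s = String::from("hello"); // `mut` is required to change `s`
    s.push_str(", world!"); // push_str() appends a string literal to a String
    s
}

fn main() {
    println!("{}", build_greeting());
}
```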
 175
 176 So, what’s the difference here? Why can `String` be mutated but literals
 177 cannot? The difference is how these two types deal with memory.
 178
 179 ### Memory and Allocation
 180
 181 In the case of a string literal, we know the contents at compile time, so the
 182 text is hardcoded directly into the final executable. This is why string
 183 literals are fast and efficient. But these properties only come from the string
 184 literal’s immutability. Unfortunately, we can’t put a blob of memory into the
 185 binary for each piece of text whose size is unknown at compile time and whose
 186 size might change while running the program.
 187
 188 With the `String` type, in order to support a mutable, growable piece of text,
 189 we need to allocate an amount of memory on the heap, unknown at compile time,
 190 to hold the contents. This means:
 191
 192 * The memory must be requested from the memory allocator at runtime.
 193 * We need a way of returning this memory to the allocator when we’re
 194   done with our `String`.
 195
 196 That first part is done by us: when we call `String::from`, its implementation
 197 requests the memory it needs. This is pretty much universal in programming
 198 languages.
 199
 200 However, the second part is different. In languages with a *garbage collector
 201 (GC)*, the GC keeps track of and cleans up memory that isn’t being used anymore,
 202 and we don’t need to think about it. Without a GC, it’s our responsibility to
 203 identify when memory is no longer being used and call code to explicitly return
 204 it, just as we did to request it. Doing this correctly has historically been a
 205 difficult programming problem. If we forget, we’ll waste memory. If we do it
 206 too early, we’ll have an invalid variable. If we do it twice, that’s a bug too.
 207 We need to pair exactly one `allocate` with exactly one `free`.
 208
 209 Rust takes a different path: the memory is automatically returned once the
 210 variable that owns it goes out of scope. Here’s a version of our scope example
 211 from Listing 4-1 using a `String` instead of a string literal:
 212
 213 ```rust
 214 {{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-02-string-scope/src/main.rs:here}}
 215 ```
 216
 217 There is a natural point at which we can return the memory our `String` needs
 218 to the allocator: when `s` goes out of scope. When a variable goes out
 219 of scope, Rust calls a special function for us. This function is called `drop`,
 220 and it’s where the author of `String` can put the code to return the memory.
 221 Rust calls `drop` automatically at the closing curly bracket.
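
To make the timing of `drop` visible, here is a small sketch that uses features covered in later chapters (traits and a static counter). The names `Guard`, `DROPS`, and `drops_after_inner_scope` are hypothetical and exist only for this illustration. `Guard` stands in for any type that owns a resource.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Guard` values have been dropped so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type standing in for anything that owns a resource.
struct Guard;

impl Drop for Guard {
    // Rust calls this automatically when a `Guard` goes out of scope.
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns how many drops happened while the inner scope ran.
fn drops_after_inner_scope() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let _guard = Guard; // `_guard` is valid from this point forward
    } // `_guard` goes out of scope here, so `drop` runs at this bracket
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("drops in inner scope: {}", drops_after_inner_scope());
}
```

Note that no code in `drops_after_inner_scope` calls `drop` explicitly; the cleanup is tied to the closing curly bracket.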
 222
 223 > Note: In C++, this pattern of deallocating resources at the end of an item’s
 224 > lifetime is sometimes called *Resource Acquisition Is Initialization (RAII)*.
 225 > The `drop` function in Rust will be familiar to you if you’ve used RAII
 226 > patterns.
 227
 228 This pattern has a profound impact on the way Rust code is written. It may seem
 229 simple right now, but the behavior of code can be unexpected in more
 230 complicated situations when we want to have multiple variables use the data
 231 we’ve allocated on the heap. Let’s explore some of those situations now.
 232
 233 #### Ways Variables and Data Interact: Move
 234
 235 Multiple variables can interact with the same data in different ways in Rust.
 236 Let’s look at an example using an integer in Listing 4-2.
 237
 238 ```rust
 239 {{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-02/src/main.rs:here}}
 240 ```
 241
 242 <span class="caption">Listing 4-2: Assigning the integer value of variable `x`
 243 to `y`</span>
 244
 245 We can probably guess what this is doing: “bind the value `5` to `x`; then make
 246 a copy of the value in `x` and bind it to `y`.” We now have two variables, `x`
 247 and `y`, and both equal `5`. This is indeed what is happening, because integers
 248 are simple values with a known, fixed size, and these two `5` values are pushed
 249 onto the stack.
 250
 251 Now let’s look at the `String` version:
 252
 253 ```rust
 254 {{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-03-string-move/src/main.rs:here}}
 255 ```
 256
 257 This looks very similar to the previous code, so we might assume that the way
 258 it works would be the same: that is, the second line would make a copy of the
 259 value in `s1` and bind it to `s2`. But this isn’t quite what happens.
 260
 261 Take a look at Figure 4-1 to see what is happening to `String` under the
 262 covers. A `String` is made up of three parts, shown on the left: a pointer to
 263 the memory that holds the contents of the string, a length, and a capacity.
 264 This group of data is stored on the stack. On the right is the memory on the
 265 heap that holds the contents.
 266
 267 <img alt="String in memory" src="img/trpl04-01.svg" class="center" style="width: 50%;" />
 268
 269 <span class="caption">Figure 4-1: Representation in memory of a `String`
 270 holding the value `"hello"` bound to `s1`</span>
 271
 272 The length is how much memory, in bytes, the contents of the `String` are
 273 currently using. The capacity is the total amount of memory, in bytes, that the
 274 `String` has received from the allocator. The difference between length
 275 and capacity matters, but not in this context, so for now, it’s fine to ignore
 276 the capacity.
 277
 278 When we assign `s1` to `s2`, the `String` data is copied, meaning we copy the
 279 pointer, the length, and the capacity that are on the stack. We do not copy the
 280 data on the heap that the pointer refers to. In other words, the data
 281 representation in memory looks like Figure 4-2.
 282
 283 <img alt="s1 and s2 pointing to the same value" src="img/trpl04-02.svg" class="center" style="width: 50%;" />
 284
 285 <span class="caption">Figure 4-2: Representation in memory of the variable `s2`
 286 that has a copy of the pointer, length, and capacity of `s1`</span>
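
One way to observe this, sketched under the assumption that comparing raw pointers is an acceptable illustration at this stage, is to record the address of the heap buffer before the assignment and compare it afterward. The helper name `move_keeps_heap_buffer` is hypothetical; `as_ptr`, `len`, and `capacity` are standard library methods.

```rust
// Hypothetical helper: checks that assigning `s1` to `s2` copies only the
// pointer, length, and capacity, and leaves the heap data where it was.
fn move_keeps_heap_buffer() -> bool {
    let s1 = String::from("hello");

    // Record the three parts stored on the stack.
    let ptr_before = s1.as_ptr(); // address of the heap buffer
    let len_before = s1.len();
    let cap_before = s1.capacity();

    let s2 = s1; // the stack data is copied into `s2`

    ptr_before == s2.as_ptr() && len_before == s2.len() && cap_before == s2.capacity()
}

fn main() {
    println!("same heap buffer: {}", move_keeps_heap_buffer());
}
```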
 287
 288 The representation does *not* look like Figure 4-3, which is what memory would
 289 look like if Rust instead copied the heap data as well. If Rust did this, the
 290 operation `s2 = s1` could be very expensive in terms of runtime performance if
 291 the data on the heap were large.
 292
 293 <img alt="s1 and s2 to two places" src="img/trpl04-03.svg" class="center" style="width: 50%;" />
 294
 295 <span class="caption">Figure 4-3: Another possibility for what `s2 = s1` might
 296 do if Rust copied the heap data as well</span>
 297
 298 Earlier, we said that when a variable goes out of scope, Rust automatically
 299 calls the `drop` function and cleans up the heap memory for that variable. But
 300 Figure 4-2 shows both data pointers pointing to the same location. This is a
 301 problem: when `s2` and `s1` go out of scope, they will both try to free the
 302 same memory. This is known as a *double free* error and is one of the memory
 303 safety bugs we mentioned previously. Freeing memory twice can lead to memory
 304 corruption, which can potentially lead to security vulnerabilities.
 305
 306 To ensure memory safety, there’s one more detail to what happens in this
 307 situation in Rust. Instead of trying to copy the allocated memory, Rust
 308 considers `s1` to no longer be valid and, therefore, Rust doesn’t need to free
 309 anything when `s1` goes out of scope. Check out what happens when you try to
 310 use `s1` after `s2` is created; it won’t work:
 311
 312 ```rust,ignore,does_not_compile
 313 {{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-04-cant-use-after-move/src/main.rs:here}}
 314 ```
 315
 316 You’ll get an error like this because Rust prevents you from using the
 317 invalidated variable:
 318
 319 ```console
 320 {{#include ../listings/ch04-understanding-ownership/no-listing-04-cant-use-after-move/output.txt}}
 321 ```
 322
 323 If you’ve heard the terms *shallow copy* and *deep copy* while working with
 324 other languages, the concept of copying the pointer, length, and capacity
 325 without copying the data probably sounds like making a shallow copy. But
 326 because Rust also invalidates the first variable, instead of being called a
 327 shallow copy, it’s known as a *move*. In this example, we would say that
 328 `s1` was *moved* into `s2`. So what actually happens is shown in Figure 4-4.
 329
 330 <img alt="s1 moved to s2" src="img/trpl04-04.svg" class="center" style="width: 50%;" />
 331
 332 <span class="caption">Figure 4-4: Representation in memory after `s1` has been
 333 invalidated</span>
 334
 335 That solves our problem! With only `s2` valid, when it goes out of scope, it
 336 alone will free the memory, and we’re done.
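
A minimal sketch of a move, with the invalid use left as a comment so that the example compiles, could look like this. The function name `moved_greeting` is an assumption made for illustration.

```rust
// Hypothetical helper: moves a `String` from `s1` into `s2`.
fn moved_greeting() -> String {
    let s1 = String::from("hello");
    let s2 = s1; // `s1` is moved into `s2` and is no longer valid

    // Uncommenting the next line makes compilation fail, because `s1` was
    // moved in the line above:
    // println!("{}, world!", s1);

    s2 // only `s2` owns the heap data now
}

fn main() {
    println!("{}, world!", moved_greeting());
}
```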
 337
 338 In addition, there’s a design choice that’s implied by this: Rust will never
 339 automatically create “deep” copies of your data. Therefore, any *automatic*
 340 copying can be assumed to be inexpensive in terms of runtime performance.
 341
 342 #### Ways Variables and Data Interact: Clone
 343
 344 If we *do* want to deeply copy the heap data of the `String`, not just the
 345 stack data, we can use a common method called `clone`. We’ll discuss method
 346 syntax in Chapter 5, but because methods are a common feature in many
 347 programming languages, you’ve probably seen them before.
 348
 349 Here’s an example of the `clone` method in action:
 350
 351 ```rust
 352 {{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-05-clone/src/main.rs:here}}
 353 ```
 354
 355 This works just fine and explicitly produces the behavior shown in Figure 4-3,
 356 where the heap data *does* get copied.
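
As a sketch of how this differs from a move, the hypothetical helper `clone_has_own_buffer` below checks that the clone has equal contents but its own heap buffer, and that both variables remain usable:

```rust
// Hypothetical helper: verifies that `clone` copies the heap data.
fn clone_has_own_buffer() -> bool {
    let s1 = String::from("hello");
    let s2 = s1.clone(); // deep copy: new heap allocation, same contents

    // `s1` is still valid here because it was cloned, not moved.
    s1 == s2 && s1.as_ptr() != s2.as_ptr()
}

fn main() {
    let s1 = String::from("hello");
    let s2 = s1.clone();
    println!("s1 = {}, s2 = {}", s1, s2);
    println!("separate buffers: {}", clone_has_own_buffer());
}
```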
 357
 358 When you see a call to `clone`, you know that some arbitrary code is being
 359 executed and that code may be expensive. It’s a visual indicator that something
 360 different is going on.
 361
 362 #### Stack-Only Data: Copy
 363
 364 There’s another wrinkle we haven’t talked about yet. This code using integers –
 365 part of which was shown in Listing 4-2 – works and is valid:
 366
 367 ```rust
 368 {{#rustdoc_include ../listings/ch04-understanding-ownership/no-listing-06-copy/src/main.rs:here}}
 369 ```
 370
 371 But this code seems to contradict what we just learned: we don’t have a call to
 372 `clone`, but `x` is still valid and wasn’t moved into `y`.
 373
 374 The reason is that types such as integers that have a known size at compile
 375 time are stored entirely on the stack, so copies of the actual values are quick
 376 to make. That means there’s no reason we would want to prevent `x` from being
 377 valid after we create the variable `y`. In other words, there’s no difference
 378 between deep and shallow copying here, so calling `clone` wouldn’t do anything
 379 different from the usual shallow copying and we can leave it out.
 380
 381 Rust has a special annotation called the `Copy` trait that we can place on
 382 types like integers that are stored on the stack (we’ll talk more about traits
 383 in Chapter 10). If a type has the `Copy` trait, an older variable is still
 384 usable after assignment. Rust won’t let us annotate a type with the `Copy`
 385 trait if the type, or any of its parts, has implemented the `Drop` trait. If
 386 the type needs something special to happen when the value goes out of scope and
 387 we add the `Copy` annotation to that type, we’ll get a compile-time error. To
 388 learn about how to add the `Copy` annotation to your type, see [“Derivable
 389 Traits”][derivable-traits]<!-- ignore --> in Appendix C.
 390
 391 So what types are `Copy`? You can check the documentation for the given type to
 392 be sure, but as a general rule, any group of simple scalar values can be
 393 `Copy`, and nothing that requires allocation or is some form of resource is
 394 `Copy`. Here are some of the types that are `Copy`:
 395
 396 * All the integer types, such as `u32`.
 397 * The Boolean type, `bool`, with values `true` and `false`.
 398 * All the floating point types, such as `f64`.
 399 * The character type, `char`.
 400 * Tuples, if they only contain types that are also `Copy`. For example,
 401   `(i32, i32)` is `Copy`, but `(i32, String)` is not.
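
The sketch below illustrates these rules with an integer, a tuple, and a small struct. The struct `Point` and the function `copy_demo` are hypothetical names introduced for this example, and the `derive` attribute it uses is covered in later chapters.

```rust
// Hypothetical type: every field is `Copy`, so the struct can be `Copy` too.
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

// Each original variable is still usable after being assigned elsewhere.
fn copy_demo() -> (i32, i32, i32) {
    let x = 5;
    let y = x; // `x` is copied, not moved

    let pair = (1, 2);
    let pair2 = pair; // a tuple of `Copy` types is also `Copy`

    let p = Point { x: 3, y: 4 };
    let q = p; // `p` remains valid because `Point` is `Copy`

    (x + y, pair.0 + pair2.1, p.x + q.y)
}

fn main() {
    let (a, b, c) = copy_demo();
    println!("{} {} {}", a, b, c);
}
```

Adding a `String` field to `Point` would make the `Copy` derive fail to compile, because `String` is not `Copy`.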
 402
 403 ### Ownership and Functions
 404
 405 The semantics for passing a value to a function are similar to those for
 406 assigning a value to a variable. Passing a variable to a function will move or
 407 copy, just as assignment does. Listing 4-3 has an example with some annotations
 408 showing where variables go into and out of scope.
 409
 410 <span class="filename">Filename: src/main.rs</span>
 411
 412 ```rust
 413 {{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-03/src/main.rs}}
 414 ```
 415
 416 <span class="caption">Listing 4-3: Functions with ownership and scope
 417 annotated</span>
 418
 419 If we tried to use `s` after the call to `takes_ownership`, Rust would report a
 420 compile-time error. These static checks protect us from mistakes. Try adding
 421 code to `main` that uses `s` and `x` to see where you can use them and where
 422 the ownership rules prevent you from doing so.
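
For readers who want something to experiment with, here is a rough sketch in the spirit of that exercise. It is not the text of Listing 4-3: the function bodies, the return values, and the name `makes_copy` are assumptions made for this illustration.

```rust
// Takes ownership of its argument; the `String` is dropped when the
// function ends. Returning the length is an assumption for this sketch.
fn takes_ownership(some_string: String) -> usize {
    some_string.len()
} // `some_string` goes out of scope here and its memory is freed

// Receives a copy of the integer; the caller's variable stays valid.
fn makes_copy(some_integer: i32) -> i32 {
    some_integer + 1
}

fn main() {
    let s = String::from("hello");
    let length = takes_ownership(s); // `s` is moved into the function
    // println!("{}", s); // would not compile: `s` is no longer valid

    let x = 5;
    let y = makes_copy(x); // `i32` is `Copy`, so `x` is still usable
    println!("length = {}, x = {}, y = {}", length, x, y);
}
```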
 423
 424 ### Return Values and Scope
 425
 426 Returning values can also transfer ownership. Listing 4-4 is an example with
 427 similar annotations to those in Listing 4-3.
 428
 429 <span class="filename">Filename: src/main.rs</span>
 430
 431 ```rust
 432 {{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-04/src/main.rs}}
 433 ```
 434
 435 <span class="caption">Listing 4-4: Transferring ownership of return
 436 values</span>
 437
 438 The ownership of a variable follows the same pattern every time: assigning a
 439 value to another variable moves it. When a variable that includes data on the
 440 heap goes out of scope, the value will be cleaned up by `drop` unless the data
 441 has been moved to be owned by another variable.
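
A small sketch of this pattern follows. The function names `gives_ownership` and `takes_and_gives_back` are hypothetical choices for this example and are not quoted from Listing 4-4.

```rust
// Creates a `String` and moves it out to the caller.
fn gives_ownership() -> String {
    let some_string = String::from("yours");
    some_string // returned, so ownership moves to the calling function
}

// Takes ownership of a `String` and hands it straight back.
fn takes_and_gives_back(a_string: String) -> String {
    a_string
}

fn main() {
    let s1 = gives_ownership(); // the return value is moved into `s1`
    let s2 = String::from("hello");
    let s3 = takes_and_gives_back(s2); // `s2` moves in, result moves to `s3`
    println!("{} {}", s1, s3);
} // `s1` and `s3` are dropped here; `s2` was moved, so nothing happens for it
```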
 442
 443 Taking ownership and then returning ownership with every function is a bit
 444 tedious. What if we want to let a function use a value but not take ownership?
 445 It’s quite annoying that anything we pass in also needs to be passed back if we
 446 want to use it again, in addition to any data resulting from the body of the
 447 function that we might want to return as well.
 448
 449 It’s possible to return multiple values using a tuple, as shown in Listing 4-5.
 450
 451 <span class="filename">Filename: src/main.rs</span>
 452
 453 ```rust
 454 {{#rustdoc_include ../listings/ch04-understanding-ownership/listing-04-05/src/main.rs}}
 455 ```
 456
 457 <span class="caption">Listing 4-5: Returning ownership of parameters</span>
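
As an illustration of the tuple approach (a sketch with an assumed function name, `calculate_length`, rather than the listing itself), a function can return both the original `String` and a value computed from it:

```rust
// Hypothetical helper: returns the `String` it was given together with
// its length, so the caller gets ownership back.
fn calculate_length(s: String) -> (String, usize) {
    let length = s.len(); // len() returns the length in bytes
    (s, length)
}

fn main() {
    let s1 = String::from("hello");
    let (s2, len) = calculate_length(s1); // `s1` moves in, comes back as `s2`
    println!("The length of '{}' is {}.", s2, len);
}
```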
 458
 459 But this is too much ceremony and a lot of work for a concept that should be
 460 common. Luckily for us, Rust has a feature for this concept, called
 461 *references*.
 462
 463 [data-types]: ch03-02-data-types.html#data-types
 464 [derivable-traits]: appendix-03-derivable-traits.html
 465 [method-syntax]: ch05-03-method-syntax.html#method-syntax
 466 [paths-module-tree]: ch07-03-paths-for-referring-to-an-item-in-the-module-tree.html