Imported Upstream version 1.0.0~beta
[rustc.git] / src / doc / trpl / pointers.md
% Pointers

Rust's pointers are one of its most distinctive and compelling features. Pointers
are also one of the more confusing topics for newcomers to Rust. They can also
be confusing for people coming from other languages that support pointers, such
as C++. This guide will help you understand this important topic.

Be sceptical of non-reference pointers in Rust: use them for a deliberate
purpose, not just to make the compiler happy. Each pointer type comes with an
explanation of when it is appropriate to use. Default to references
unless you're in one of those specific situations.

You may be interested in the [cheat sheet](#cheat-sheet), which gives a quick
overview of the types, names, and purpose of the various pointers.

# An introduction

If you aren't familiar with the concept of pointers, here's a short
introduction. Pointers are a fundamental concept in systems programming
languages, so it's important to understand them.

## Pointer Basics

When you create a new variable binding, you're giving a name to a value that's
stored at a particular location on the stack. (If you're not familiar with the
*heap* vs. *stack*, please check out [this Stack Overflow
question](http://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap),
as the rest of this guide assumes you know the difference.) Like this:

```{rust}
let x = 5;
let y = 8;
```

| location | value |
|----------|-------|
| 0xd3e030 | 5     |
| 0xd3e028 | 8     |

We're making up memory locations here; they're just sample values. Anyway, the
point is that `x`, the name we're using for our variable, corresponds to the
memory location `0xd3e030`, and the value at that location is `5`. When we
refer to `x`, we get the corresponding value. Hence, `x` is `5`.

Let's introduce a pointer. In some languages, there is just one type of
'pointer,' but in Rust, we have many types. In this case, we'll use a Rust
*reference*, which is the simplest kind of pointer.

```{rust}
let x = 5;
let y = 8;
let z = &y;
```

| location | value    |
|----------|----------|
| 0xd3e030 | 5        |
| 0xd3e028 | 8        |
| 0xd3e020 | 0xd3e028 |

See the difference? Rather than holding an ordinary value, a pointer holds a
location in memory: in this case, the location of `y`. `x` and `y` have the
type `i32`, but `z` has the type `&i32`. We can print this location using the
`{:p}` format string:

```{rust}
let x = 5;
let y = 8;
let z = &y;

println!("{:p}", z);
```

This would print `0xd3e028`, with our fictional memory addresses.

Because `i32` and `&i32` are different types, we can't, for example, add them
together:

```{rust,ignore}
let x = 5;
let y = 8;
let z = &y;

println!("{}", x + z);
```

This gives us an error:

```text
hello.rs:6:24: 6:25 error: mismatched types: expected `_`, found `&_` (expected integral variable, found &-ptr)
hello.rs:6     println!("{}", x + z);
                                  ^
```

We can *dereference* the pointer by using the `*` operator. Dereferencing a
pointer means accessing the value at the location stored in the pointer. This
will work:

```{rust}
let x = 5;
let y = 8;
let z = &y;

println!("{}", x + *z);
```

It prints `13`.
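
As a small check on the ideas so far, here is a sketch that compares where two references point. The helper `same_location` is a made-up name for illustration; it relies on `std::ptr::eq`, which compares addresses rather than the values stored there.

```rust
use std::ptr;

// Hypothetical helper: true when both references hold the same address.
fn same_location(a: &i32, b: &i32) -> bool {
    ptr::eq(a, b)
}

fn main() {
    let x = 8;
    let y = 8;
    let z = &y;
    let w = &y;

    // `z` and `w` both store the location of `y`, so the addresses match.
    println!("{:p} {:p}", z, w);
    println!("{}", same_location(z, w));

    // `x` holds an equal value but lives at a different location.
    println!("{}", same_location(&x, z));

    // Dereferencing either reference reads the value stored at that location.
    println!("{}", *z + *w);
}
```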

That's it! That's all pointers are: they point to some memory location. Not
much else to them. Now that we've discussed the *what* of pointers, let's
talk about the *why*.

## Pointer uses

Rust's pointers are quite useful, but in different ways than in other systems
languages. We'll talk about best practices for Rust pointers later in
the guide, but here are some ways that pointers are useful in other languages:

In C, a string is a pointer to a sequence of `char`s, ending with a null byte.
The only way to use strings is to get quite familiar with pointers.

Pointers are useful to point to memory locations that are not on the stack. For
instance, our earlier example used two stack variables, so we were able to give
them names. But if we allocated some heap memory, we wouldn't have that name
available. In C, `malloc` is used to allocate heap memory, and it returns a
pointer.

As a more general variant of the previous two points, any time you have a
structure that can change in size, you need a pointer. You can't tell at
compile time how much memory to allocate, so you have to use a pointer to
point at the memory where it will be allocated, and deal with it at run time.
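
Rust's growable vector, `Vec<T>`, is one concrete instance of this: the variable on the stack is a fixed-size handle that holds a pointer to a heap buffer, and only the buffer grows. A minimal sketch, where `handle_sizes` is a hypothetical helper name:

```rust
use std::mem::size_of_val;

// Hypothetical helper: pushes `n` items and reports the size of the `Vec`
// handle itself (not the heap buffer) before and after.
fn handle_sizes(n: usize) -> (usize, usize) {
    let mut v: Vec<i32> = Vec::new();
    let before = size_of_val(&v);
    for i in 0..n {
        v.push(i as i32);
    }
    (before, size_of_val(&v))
}

fn main() {
    let (before, after) = handle_sizes(1000);
    // The handle is the same size either way; the elements live behind its pointer.
    println!("{} {}", before, after);
}
```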

Pointers are useful in languages that are pass-by-value, rather than
pass-by-reference. Basically, languages can make two choices (this is made-up
syntax, not Rust):

```text
func foo(x) {
    x = 5
}

func main() {
    i = 1
    foo(i)
    // what is the value of i here?
}
```

In languages that are pass-by-value, `foo` will get a copy of `i`, and so
the original version of `i` is not modified. At the comment, `i` will still be
`1`. In a language that is pass-by-reference, `foo` will get a reference to `i`,
and therefore, can change its value. At the comment, `i` will be `5`.

So what do pointers have to do with this? Well, since pointers point to a
location in memory...

```text
func foo(&i32 x) {
    *x = 5
}

func main() {
    i = 1
    foo(&i)
    // what is the value of i here?
}
```

Even in a language which is pass-by-value, `i` will be `5` at the comment. You
see, because the argument `x` is a pointer, we do send a copy over to `foo`,
but because it points at a memory location, which we then assign to, the
original value is still changed. This pattern is called
*pass-reference-by-value*. Tricky!
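
The same contrast can be written in real Rust. This is only a sketch: `by_value` and `through_pointer` are made-up names, and `&mut` is the mutable reference type covered later in this guide.

```rust
// Receives a copy of the integer; the caller's variable is untouched.
fn by_value(mut x: i32) -> i32 {
    x += 4;
    x
}

// Receives a copy of the *pointer*; writing through it changes the original.
fn through_pointer(x: &mut i32) {
    *x = 5;
}

fn main() {
    let mut i = 1;

    let changed_copy = by_value(i);
    println!("{} {}", i, changed_copy); // `i` is still 1 here

    through_pointer(&mut i);
    println!("{}", i); // `i` is now 5
}
```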

## Common pointer problems

We've talked about pointers, and we've sung their praises. So what's the
downside? Well, Rust attempts to mitigate each of these kinds of problems,
but here are problems with pointers in other languages:

Uninitialized pointers can cause a problem. For example, what does this program
do?

```{ignore}
&int x;
*x = 5; // whoops!
```

Who knows? We just declare a pointer, but don't point it at anything, and then
set the memory location that it points at to be `5`. But which location? Nobody
knows. This might be harmless, and it might be catastrophic.

When you combine pointers and functions, it's easy to accidentally invalidate
the memory the pointer is pointing to. For example:

```text
func make_pointer(): &int {
    x = 5;

    return &x;
}

func main() {
    &int i = make_pointer();
    *i = 5; // uh oh!
}
```

`x` is local to the `make_pointer` function, and therefore, is invalid as soon
as `make_pointer` returns. But we return a pointer to its memory location, and
so back in `main`, we try to use that pointer, and it's a very similar
situation to our first one. Writing to invalid memory locations is bad.
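
In Rust, a function shaped like `make_pointer` is rejected at compile time, so the dangling pointer never exists. One way to get a similar effect is to return an owned, heap-allocated value instead. A sketch, with `make_boxed` as a made-up name:

```rust
// Hypothetical counterpart to `make_pointer`: instead of returning a pointer
// to a local, it moves the value to the heap and hands ownership to the caller.
fn make_boxed() -> Box<i32> {
    let x = 5;
    Box::new(x)
}

fn main() {
    let mut i = make_boxed();

    // The memory is still valid here, because the caller now owns it.
    *i = 6;
    println!("{}", i);
}
```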

As one last example of a big problem with pointers, *aliasing* can be an
issue. Two pointers are said to alias when they point at the same location
in memory. Like this:

```text
func mutate(&int i, int j) {
    *i = j;
}

func main() {
    x = 5;
    y = &x;
    z = &x; // y and z are aliased

    run_in_new_thread(mutate, y, 1);
    run_in_new_thread(mutate, z, 100);

    // what is the value of x here?
}
```

In this made-up example, `run_in_new_thread` spins up a new thread, and calls
the given function name with its arguments. Since we have two threads, and
they're both operating on aliases to `x`, we can't tell which one finishes
first, and therefore, the value of `x` is actually non-deterministic. Worse,
what if one of them had invalidated the memory location they pointed to? We'd
have the same problem as before, where we'd be setting an invalid location.

## Conclusion

That's a basic overview of pointers as a general concept. As we alluded to
before, Rust has different kinds of pointers, rather than just one, and
mitigates all of the problems that we talked about, too. This does mean that
Rust pointers are slightly more complicated than in other languages, but
it's worth it to not have the problems that simple pointers have.

# References

The most basic type of pointer that Rust has is called a *reference*. Rust
references look like this:

```{rust}
let x = 5;
let y = &x;

println!("{}", *y);
println!("{:p}", y);
println!("{}", y);
```

We'd say "`y` is a reference to `x`." The first `println!` prints out the
value of `y`'s referent by using the dereference operator, `*`. The second
one prints out the memory location that `y` points to, by using the pointer
format string. The third `println!` *also* prints out the value of `y`'s
referent, because formatting a reference with `{}` prints the value it refers
to.

Here's a function that takes a reference:

```{rust}
fn succ(x: &i32) -> i32 { *x + 1 }
```

You can also use `&` as an operator to create a reference, so we can
call this function in two different ways:

```{rust}
fn succ(x: &i32) -> i32 { *x + 1 }

fn main() {
    let x = 5;
    let y = &x;

    println!("{}", succ(y));
    println!("{}", succ(&x));
}
```

Both of these `println!`s will print out `6`.

Of course, if this were real code, we wouldn't bother with the reference, and
just write:

```{rust}
fn succ(x: i32) -> i32 { x + 1 }
```

References are immutable by default:

```{rust,ignore}
let x = 5;
let y = &x;

*y = 5; // error: cannot assign to immutable borrowed content `*y`
```

A reference can be made mutable with `mut`, but only if its referent is also
mutable. This works:

```{rust}
let mut x = 5;
let y = &mut x;
```

This does not:

```{rust,ignore}
let x = 5;
let y = &mut x; // error: cannot borrow immutable local variable `x` as mutable
```

Immutable pointers are allowed to alias:

```{rust}
let x = 5;
let y = &x;
let z = &x;
```

Mutable ones, however, are not:

```{rust,ignore}
let mut x = 5;
let y = &mut x;
let z = &mut x; // error: cannot borrow `x` as mutable more than once at a time
```
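
What the rule forbids is two mutable references that are live at the same time. Mutable borrows that end before the next one begins are fine. A small sketch (`bump_twice` is a hypothetical name), using inner blocks to make the extent of each borrow explicit:

```rust
// Hypothetical helper: takes two mutable borrows of `x`, one after the other.
fn bump_twice() -> i32 {
    let mut x = 5;

    {
        let y = &mut x;
        *y += 1;
    } // the borrow held by `y` ends here

    {
        let z = &mut x; // allowed: no other borrow of `x` is live
        *z += 1;
    }

    x
}

fn main() {
    println!("{}", bump_twice());
}
```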

Despite their complete safety, a reference's representation at runtime is the
same as that of an ordinary pointer in a C program. They introduce zero
overhead. The compiler does all safety checks at compile time. The theory that
allows for this was originally called *region pointers*. Region pointers
evolved into what we know today as *lifetimes*.

Here's the simple explanation: would you expect this code to compile?

```{rust,ignore}
fn main() {
    println!("{}", x);
    let x = 5;
}
```

Probably not. That's because you know that the name `x` is valid from where
it's declared to when it goes out of scope. In this case, that's the end of
the `main` function. So you know this code will cause an error. We call this
duration a *lifetime*. Let's try a more complex example:

```{rust}
fn main() {
    let mut x = 5;

    if x < 10 {
        let y = &x;

        println!("Oh no: {}", y);
        return;
    }

    x -= 1;

    println!("Oh no: {}", x);
}
```

Here, we're borrowing a pointer to `x` inside of the `if`. The compiler, however,
is able to determine that that pointer will go out of scope without `x` being
mutated, and therefore, lets us pass. This wouldn't work:

```{rust,ignore}
fn main() {
    let mut x = 5;

    if x < 10 {
        let y = &x;

        x -= 1;

        println!("Oh no: {}", y);
        return;
    }

    x -= 1;

    println!("Oh no: {}", x);
}
```

It gives this error:

```text
test.rs:7:9: 7:15 error: cannot assign to `x` because it is borrowed
test.rs:7         x -= 1;
                  ^~~~~~
test.rs:5:18: 5:19 note: borrow of `x` occurs here
test.rs:5         let y = &x;
                           ^
```

As you might guess, this kind of analysis is complex for a human, and therefore
hard for a computer, too! There is an entire [guide devoted to references, ownership,
and lifetimes](ownership.html) that goes into this topic in
great detail, so if you want the full details, check that out.

## Best practices

In general, prefer stack allocation over heap allocation. Using references to
stack allocated information is preferred whenever possible. Therefore,
references are the default pointer type you should use, unless you have a
specific reason to use a different type. The best practices section for each
of the other pointer types explains when that type is appropriate.

Use references when you want to use a pointer, but do not want to take ownership.
References just borrow the value, which is more polite if you don't need
ownership. In other words, prefer:

```{rust}
fn succ(x: &i32) -> i32 { *x + 1 }
```

to

```{rust}
fn succ(x: Box<i32>) -> i32 { *x + 1 }
```

As a corollary to that rule, references allow you to accept a wide variety of
other pointers, so you don't have to write a separate variant of your function
for each pointer type. In other words, prefer:

```{rust}
fn succ(x: &i32) -> i32 { *x + 1 }
```

to

```{rust}
use std::rc::Rc;

fn box_succ(x: Box<i32>) -> i32 { *x + 1 }

fn rc_succ(x: Rc<i32>) -> i32 { *x + 1 }
```

Note that the caller of your function will have to modify their calls slightly:

```{rust}
use std::rc::Rc;

fn succ(x: &i32) -> i32 { *x + 1 }

let ref_x = &5;
let box_x = Box::new(5);
let rc_x = Rc::new(5);

succ(ref_x);
succ(&*box_x);
succ(&*rc_x);
```

The initial `*` dereferences the pointer, and then `&` takes a reference to
those contents.

# Boxes

`Box<T>` is Rust's *boxed pointer* type. Boxes provide the simplest form of
heap allocation in Rust. Creating a box looks like this:

```{rust}
let x = Box::new(5);
```

Boxes are heap allocated and they are deallocated automatically by Rust when
they go out of scope:

```{rust}
{
    let x = Box::new(5);

    // stuff happens

} // x is destructed and its memory is freed here
```

However, boxes do _not_ use reference counting or garbage collection. Boxes are
what's called an *affine type*. This means that the Rust compiler, at compile
time, determines when the box comes into and goes out of scope, and inserts the
appropriate calls there.

You don't need to fully grok the theory of affine types to grok boxes, though.
As a rough approximation, you can treat this Rust code:

```{rust}
{
    let x = Box::new(5);

    // stuff happens
}
```

As being similar to this C code:

```c
{
    int *x;
    x = (int *)malloc(sizeof(int));
    *x = 5;

    // stuff happens

    free(x);
}
```

Of course, this is a 10,000-foot view. It leaves out destructors, for example.
But the general idea is correct: you get the semantics of `malloc`/`free`, but
with some improvements:

1. It's impossible to allocate the incorrect amount of memory, because Rust
   figures it out from the types.
2. You cannot forget to `free` memory you've allocated, because Rust does it
   for you.
3. Rust ensures that this `free` happens at the right time, when the memory is
   truly no longer used. Use-after-free is not possible.
4. Rust enforces that no other writable pointers alias this heap memory,
   which means writing to an invalid pointer is not possible.

See the section on references or the [ownership guide](ownership.html)
for more detail on how lifetimes work.
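
The deallocation point can be observed with a destructor. The sketch below is illustrative only: `Tracked`, `DROPS`, and `drops_at_scope_end` are made-up names, and the counter exists purely to make the drop visible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been destroyed so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type whose destructor records that it ran.
struct Tracked(i32);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns how many destructors ran while the inner scope was exited.
fn drops_at_scope_end() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let x = Box::new(Tracked(5));
        println!("inside the scope: {}", x.0);
    } // `x` goes out of scope: the destructor runs and the heap memory is freed
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("destructors run: {}", drops_at_scope_end());
}
```

No explicit call releases the box; the compiler inserts the cleanup at the closing brace of the inner block.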

Using boxes and references together is very common. For example:

```{rust}
fn add_one(x: &i32) -> i32 {
    *x + 1
}

fn main() {
    let x = Box::new(5);

    println!("{}", add_one(&*x));
}
```

In this case, Rust knows that `x` is being *borrowed* by the `add_one()`
function, and since it's only reading the value, allows it.

We can borrow `x` as read-only multiple times, even simultaneously:

```{rust}
fn add(x: &i32, y: &i32) -> i32 {
    *x + *y
}

fn main() {
    let x = Box::new(5);

    println!("{}", add(&*x, &*x));
    println!("{}", add(&*x, &*x));
}
```

We can mutably borrow `x` multiple times, but only if `x` itself is mutable, and
it may not be *simultaneously* borrowed:

```{rust,ignore}
fn increment(x: &mut i32) {
    *x += 1;
}

fn main() {
    // If variable x is not "mut", this will not compile
    let mut x = Box::new(5);

    increment(&mut x);
    increment(&mut x);
    println!("{}", x);
}
```

Notice the signature of `increment()` requests a mutable reference.

## Best practices

Boxes are most appropriate to use when defining recursive data structures.

### Recursive data structures

Sometimes, you need a recursive data structure. The simplest is known as a
*cons list*:

```{rust}
#[derive(Debug)]
enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

fn main() {
    let list: List<i32> = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))));
    println!("{:?}", list);
}
```

This prints:

```text
Cons(1, Box(Cons(2, Box(Cons(3, Box(Nil))))))
```

The reference to another `List` inside of the `Cons` enum variant must be a box,
because we don't know the length of the list. Because we don't know the length,
we don't know the size, and therefore, we need to heap allocate our list.

Working with recursive or other unknown-sized data structures is the primary
use-case for boxes.
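
Walking such a list means following one box after another. Here is a minimal sketch, assuming a non-generic `List` of `i32` to keep it short; `sum` is a hypothetical helper, not part of any library:

```rust
// A non-generic variant of the list above, kept small for the example.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

// Hypothetical helper: adds up every element by following the boxes.
fn sum(list: &List) -> i32 {
    match *list {
        // `ref rest` borrows the boxed tail instead of moving it out.
        List::Cons(value, ref rest) => value + sum(rest),
        List::Nil => 0,
    }
}

fn main() {
    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))));
    println!("{}", sum(&list));
}
```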

# Rc and Arc

This part is coming soon.

## Best practices

This part is coming soon.

# Raw Pointers

This part is coming soon.

## Best practices

This part is coming soon.

# Creating your own Pointers

This part is coming soon.

## Best practices

This part is coming soon.

# Patterns and `ref`

When you're trying to match something that's stored in a pointer, there may be
a situation where matching directly isn't the best option available. Let's see
how to properly handle this:

```{rust,ignore}
fn possibly_print(x: &Option<String>) {
    match *x {
        // BAD: cannot move out of a `&`
        // Some(s) => println!("{}", s),

        // GOOD: instead take a reference into the memory of the `Option`
        Some(ref s) => println!("{}", *s),
        None => {}
    }
}
```

The `ref s` here means that `s` will be of type `&String`, rather than type
`String`.

This is important when the type you're trying to get access to has a destructor
and you don't want to move it; you just want a reference to it.
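
Because `ref` only borrows, the matched value is still intact afterwards. A small sketch of that, where `name_len` is a made-up function:

```rust
// Hypothetical helper: reads the string's length without taking ownership.
fn name_len(x: &Option<String>) -> usize {
    match *x {
        // `s` has type `&String`; nothing is moved out of the `Option`.
        Some(ref s) => s.len(),
        None => 0,
    }
}

fn main() {
    let name = Some(String::from("Ferris"));

    println!("{}", name_len(&name));

    // `name` was only borrowed, so it can still be used here.
    println!("{:?}", name);
}
```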

# Cheat Sheet

Here's a quick rundown of Rust's pointer types:

| Type         | Name                | Summary                                                             |
|--------------|---------------------|---------------------------------------------------------------------|
| `&T`         | Reference           | Allows one or more references to read `T`                           |
| `&mut T`     | Mutable Reference   | Allows a single reference to read and write `T`                     |
| `Box<T>`     | Box                 | Heap allocated `T` with a single owner that may read and write `T`  |
| `Rc<T>`      | "arr cee" pointer   | Heap allocated `T` with many readers                                |
| `Arc<T>`     | Arc pointer         | Same as above, but safe sharing across threads                      |
| `*const T`   | Raw pointer         | Unsafe read access to `T`                                           |
| `*mut T`     | Mutable raw pointer | Unsafe read and write access to `T`                                 |
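
The `Rc<T>` row can be illustrated briefly, even though its own section is not yet written. This is only a sketch of the "many readers" idea: `readers_sharing` is a hypothetical name, while `Rc::clone` and `Rc::strong_count` are standard library functions.

```rust
use std::rc::Rc;

// Hypothetical helper: creates three handles to one heap value and reports
// how many handles exist, plus a sum read through two of them.
fn readers_sharing(value: i32) -> (usize, i32) {
    let first = Rc::new(value);
    let second = Rc::clone(&first); // another reader of the same allocation
    let third = Rc::clone(&first);

    (Rc::strong_count(&first), *second + *third)
}

fn main() {
    let (handles, total) = readers_sharing(5);
    println!("{} handles, total {}", handles, total);
}
```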

# Related resources

* [API documentation for Box](../std/boxed/index.html)
* [Ownership guide](ownership.html)
* [Cyclone paper on regions](http://www.cs.umd.edu/projects/cyclone/papers/cyclone-regions.pdf), which inspired Rust's lifetime system