Imported Upstream version 1.5.0+dfsg1
% Strings

Strings are an important concept for any programmer to master. Rust’s string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
tricky, and strings are a re-sizable data structure. That being said, Rust’s
strings also work differently than in some other systems languages, such as C.

Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values
encoded as a stream of UTF-8 bytes. All strings are guaranteed to be valid
UTF-8. Additionally, unlike in some systems languages such as C, strings are
not null-terminated and can contain null bytes.
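To see the last point in action, here is a small sketch. The `describe` helper is ours, purely for illustration. A string literal with an embedded `\0` escape keeps every byte after the null, and `len` counts the null byte like any other byte.

```rust
/// Hypothetical helper: returns the length in bytes and whether `s`
/// contains a null byte.
fn describe(s: &str) -> (usize, bool) {
    (s.len(), s.contains('\0'))
}

fn main() {
    // The `\0` escape embeds a null byte in the middle of the literal.
    let s = "nul\0byte";

    // `len` counts bytes: 3 + 1 + 4 = 8, and the null byte is one of them.
    let (len, has_nul) = describe(s);
    println!("{} bytes, contains a null byte: {}", len, has_nul);

    // The null byte is not a terminator: the text after it is still part of `s`.
    println!("ends with \"byte\": {}", s.ends_with("byte"));
}
```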

Rust has two main types of strings: `&str` and `String`. Let’s talk about
`&str` first. These are called ‘string slices’. String literals are of the type
`&'static str`:

```rust
let greeting = "Hello there."; // greeting: &'static str
```

This string is statically allocated, meaning that it’s saved inside our
compiled program, and exists for the entire time it runs. The `greeting`
binding is a reference to this statically allocated string. String slices
have a fixed size and cannot be mutated.
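The literal lives in the compiled program. Because of that, a function can hand out a `&'static str` to it without any risk of the reference outliving the data. Here is a minimal sketch using a hypothetical `greeting` function:

```rust
/// Hypothetical example: returns a string literal. The literal is stored in
/// the compiled program, so the returned `&'static str` stays valid after
/// this function returns.
fn greeting() -> &'static str {
    "Hello there."
}

fn main() {
    let g = greeting(); // g: &'static str
    println!("{}", g);

    // `&str` has no `push_str` method, so this line would not compile:
    // g.push_str("!");
}
```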

A `String`, on the other hand, is a heap-allocated string. This string is
growable, and is also guaranteed to be UTF-8. `String`s are commonly created by
converting from a string slice using the `to_string` method.

```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```
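A `String` owns its heap buffer, so you can also start from an empty one and grow it. This sketch uses a hypothetical `build_greeting` helper. It calls `String::with_capacity` to reserve some room up front, then appends with `push_str` for string slices and `push` for a single `char`. The buffer still grows if the text outgrows that initial capacity.

```rust
/// Hypothetical helper: builds a `String` piece by piece from an empty one.
fn build_greeting(name: &str) -> String {
    // Reserve room up front; the `String` can still grow past this if needed.
    let mut s = String::with_capacity(16);
    s.push_str("Hello, ");
    s.push_str(name);
    s.push('!'); // `push` appends a single `char`
    s
}

fn main() {
    let s = build_greeting("world");
    println!("{} ({} bytes, capacity {})", s, s.len(), s.capacity());
}
```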

`String`s will coerce into `&str` with an `&`:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(&s);
}
```

This coercion does not happen for functions that accept one of `&str`’s traits
instead of `&str` itself. For example, [`TcpStream::connect`][connect] takes a
parameter of any type that implements `ToSocketAddrs`. A `&str` is okay, but a
`String` must be explicitly converted using `&*`.

```rust,no_run
use std::net::TcpStream;

TcpStream::connect("192.168.0.1:3000"); // &str parameter

let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // convert addr_string to &str
```
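The same rule applies to any generic function, not just ones from the standard library. In this sketch, `Shout` and `shout_it` are hypothetical stand-ins for `ToSocketAddrs` and `TcpStream::connect`. The trait is implemented only for `&str`. If you pass `&s`, the compiler looks for an implementation on `&String` and does not coerce.

```rust
/// Hypothetical trait, implemented only for `&str`.
trait Shout {
    fn shout(&self) -> String;
}

impl<'a> Shout for &'a str {
    fn shout(&self) -> String {
        self.to_uppercase()
    }
}

/// Generic over the trait, the way `TcpStream::connect` is generic over
/// `ToSocketAddrs`; no deref coercion is applied to `x`.
fn shout_it<T: Shout>(x: T) -> String {
    x.shout()
}

fn main() {
    let s = "hello".to_string();

    // shout_it(&s); // error: `&String` does not implement `Shout`
    let loud = shout_it(&*s); // `&*s` is explicitly a `&str`
    println!("{}", loud);
}
```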

Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. No reason to do that unless you have to!
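One way to see the difference is to compare where the bytes live. This sketch uses a hypothetical `same_buffer` helper. A `&str` borrowed from a `String` points into that `String`’s own heap buffer. By contrast, `to_string` produces a new `String` with a buffer of its own.

```rust
/// Hypothetical helper: true if both slices start at the same address.
fn same_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let s = String::from("Hello");

    // Borrowing `s` as a `&str` copies nothing: the slice points into `s`'s buffer.
    let view: &str = &s;
    println!("view shares s's buffer: {}", same_buffer(&s, view));

    // `to_string` allocates a new buffer and copies the bytes into it.
    let copy = view.to_string();
    println!("copy shares s's buffer: {}", same_buffer(&s, &copy));
}
```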

## Indexing

Because strings are valid UTF-8, they do not support indexing with a single
integer:

```rust,ignore
let s = "hello";

println!("The first letter of s is {}", s[0]); // ERROR!!!
```

Usually, access to a vector with `[]` is very fast. But, because each character
in a UTF-8 encoded string can be multiple bytes, you have to walk over the
string to find the nᵗʰ letter. This is a significantly more expensive
operation, and we don’t want `[]` to hide that cost. Furthermore, ‘letter’
isn’t something defined in Unicode, exactly. We can choose to look at a string
as individual bytes, or as codepoints:

```rust
let hachiko = "忠犬ハチ公";

for b in hachiko.as_bytes() {
    print!("{}, ", b);
}

println!("");

for c in hachiko.chars() {
    print!("{}, ", c);
}

println!("");
```

This prints:

```text
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,
```

As you can see, there are more bytes than `char`s.
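You can count both directly. `len` returns the number of bytes, while `chars().count()` walks the string to count codepoints. The second string below shows why ‘letter’ is slippery. An `e` followed by U+0301 COMBINING ACUTE ACCENT usually displays as a single ‘é’, yet it is two `char`s and three bytes. The `byte_and_char_counts` helper is hypothetical.

```rust
/// Hypothetical helper: returns (UTF-8 byte count, `char` count) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Each of these five characters takes three bytes in UTF-8.
    println!("{:?}", byte_and_char_counts("忠犬ハチ公")); // (15, 5)

    // 'e' plus U+0301 COMBINING ACUTE ACCENT: one visible letter,
    // but two `char`s (codepoints) and three bytes.
    println!("{:?}", byte_and_char_counts("e\u{301}")); // (3, 2)
}
```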

You can get something similar to an index like this:

```rust
# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // kinda like hachiko[1], but an Option<char>
```

This emphasizes that we have to walk from the beginning of the list of `chars`.
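Sometimes what you need is a byte offset, for example to slice the string. In that case, `char_indices` yields each `char` together with the byte offset where it starts. It still walks from the beginning, just like `chars().nth()`. Here is a sketch with a hypothetical `nth_char_offset` helper:

```rust
/// Hypothetical helper: returns the byte offset where the `n`th `char`
/// (counting from 0) starts, or `None` if `s` has fewer than `n + 1` chars.
/// Like `chars().nth(n)`, this walks the string from the beginning.
fn nth_char_offset(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    let hachiko = "忠犬ハチ公";

    // '犬' is the second char, but it starts at byte 3.
    println!("{:?}", nth_char_offset(hachiko, 1)); // Some(3)
    println!("{:?}", hachiko.char_indices().nth(1)); // Some((3, '犬'))
}
```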

## Slicing

You can get a slice of a string with slicing syntax:

```rust
let dog = "hachiko";
let hachi = &dog[0..5];
```

But note that these are _byte_ offsets, not _character_ offsets. Since ‘忠’
takes up three bytes, byte offset 2 falls in the middle of it, so this will
fail at runtime:

```rust,should_panic
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
```

with this error:

```text
thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on
character boundary'
```
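If you would rather check than panic, there are two tools. `is_char_boundary` tells you whether a byte offset falls between characters. `str::get` returns an `Option` instead of panicking. This sketch assumes a newer compiler than the one this chapter was written for, because both methods were stabilized later. The `checked_slice` helper is hypothetical.

```rust
/// Hypothetical helper: returns the bytes `start..end` of `s` as a `&str`,
/// or `None` if either end falls inside a multi-byte character
/// (or past the end of `s`).
fn checked_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let dog = "忠犬ハチ公";

    // Byte 2 is in the middle of '忠' (bytes 0..3), so it is not a boundary.
    println!("{}", dog.is_char_boundary(2)); // false
    println!("{:?}", checked_slice(dog, 0, 2)); // None, instead of a panic
    println!("{:?}", checked_slice(dog, 0, 3)); // Some("忠")
}
```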

## Concatenation

If you have a `String`, you can concatenate a `&str` to the end of it:

```rust
let hello = "Hello ".to_string();
let world = "world!";

let hello_world = hello + world;
```

But if you have two `String`s, you need an `&`:

```rust
let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;
```

This is because `&String` can automatically coerce to a `&str`. This is a
feature called ‘[`Deref` coercions][dc]’.
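Note that `+` takes ownership of the `String` on its left and appends to that `String`’s buffer, reallocating if needed. As a result, `hello` can’t be used afterwards. When you want to keep both inputs, the `format!` macro only borrows its arguments. Here is a sketch with a hypothetical `concat_format` helper:

```rust
/// Hypothetical helper: concatenates without taking ownership of either input.
fn concat_format(hello: &str, world: &str) -> String {
    format!("{}{}", hello, world)
}

fn main() {
    let hello = "Hello ".to_string();
    let world = "world!".to_string();

    // `format!` only borrows, so both `String`s are still usable afterwards.
    let a = concat_format(&hello, &world);

    // `+` moves `hello`; `&world` is coerced from `&String` to `&str`.
    let b = hello + &world;
    // println!("{}", hello); // error: `hello` was moved by `+`

    println!("{} / {}", a, b);
}
```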

[dc]: deref-coercions.html
[connect]: ../std/net/struct.TcpStream.html#method.connect