Imported Upstream version 1.9.0+dfsg1
% Strings

Strings are an important concept for any programmer to master. Rust’s string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
tricky, and strings are a re-sizable data structure. That being said, Rust’s
strings also work differently than in some other systems languages, such as C.

Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values
encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid
encoding of UTF-8 sequences. Additionally, unlike some systems languages,
strings are not null-terminated and can contain null bytes.
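
As a minimal sketch of those two guarantees, the following checks that a NUL byte is ordinary string data and that bytes which are not valid UTF-8 are rejected on conversion. The helper name `is_valid_utf8` is hypothetical and exists only for this illustration.

```rust
use std::str;

// Hypothetical helper: reports whether a byte sequence is valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // A NUL byte in the middle of a string is just another byte.
    let with_nul = "foo\0bar";
    assert_eq!(with_nul.len(), 7);
    assert!(is_valid_utf8(with_nul.as_bytes()));

    // The byte 0xFF never appears in UTF-8, so this is not a valid string.
    assert!(!is_valid_utf8(&[0x66, 0x6f, 0xff]));
}
```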

Rust has two main types of strings: `&str` and `String`. Let’s talk about
`&str` first. These are called ‘string slices’. A string slice has a fixed
size, and cannot be mutated. It is a reference to a sequence of UTF-8 bytes.

```rust
let greeting = "Hello there."; // greeting: &'static str
```

`"Hello there."` is a string literal and its type is `&'static str`. A string
literal is a string slice that is statically allocated, meaning that it’s saved
inside our compiled program, and exists for the entire duration it runs. The
`greeting` binding is a reference to this statically allocated string. Any
function expecting a string slice will also accept a string literal.

String literals can span multiple lines. There are two forms. The first will
include the newline and the leading spaces:

```rust
let s = "foo
    bar";

assert_eq!("foo\n    bar", s);
```

The second, with a `\`, trims the spaces and the newline:

```rust
let s = "foo\
    bar";

assert_eq!("foobar", s);
```

Note that you normally cannot access a `str` directly, but only through a `&str`
reference. This is because `str` is an unsized type which requires additional
runtime information to be usable. For more information see the chapter on
[unsized types][ut].
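
One way to see that additional runtime information is to compare reference sizes. This is a small sketch rather than anything from the chapter itself: a `&str` carries both an address and a length in bytes, so it is twice as wide as a reference to a sized type.

```rust
use std::mem::size_of;

fn main() {
    // A `&str` is a "fat" pointer: an address plus a length in bytes.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    // A reference to a sized type is a single pointer.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
}
```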

Rust has more than just `&str`s, though. A `String` is a heap-allocated string.
This string is growable, and is also guaranteed to be UTF-8. `String`s are
commonly created by converting from a string slice using the `to_string`
method.

```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```

`String`s will coerce into `&str` with an `&`:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(&s);
}
```

This coercion does not happen for functions that accept one of `&str`’s traits
instead of `&str`. For example, [`TcpStream::connect`][connect] has a parameter
whose type must implement the `ToSocketAddrs` trait. A `&str` is okay but a
`String` must be explicitly converted using `&*`.

```rust,no_run
use std::net::TcpStream;

TcpStream::connect("192.168.0.1:3000"); // &str parameter

let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // convert addr_string to &str
```

Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. No reason to do that unless you have to!
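
The difference in cost can be observed by comparing buffer addresses. In this sketch, the helper `shares_buffer` is a hypothetical name introduced for illustration; it reports whether two slices start at the same address.

```rust
// Hypothetical helper: true if both slices start at the same address.
fn shares_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let owned = "Hello".to_string();

    // Borrowing the String as a slice points at its existing buffer;
    // nothing is copied.
    let view: &str = &owned;
    assert!(shares_buffer(&owned, view));

    // Converting the slice back to a String allocates a new buffer
    // and copies the bytes into it.
    let copy = view.to_string();
    assert!(!shares_buffer(&owned, &copy));
}
```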

## Indexing

Because strings are valid UTF-8, they do not support indexing:

```rust,ignore
let s = "hello";

println!("The first letter of s is {}", s[0]); // ERROR!!!
```

Usually, access to a vector with `[]` is very fast. But, because each character
in a UTF-8 encoded string can be multiple bytes, you have to walk over the
string to find the nᵗʰ letter of a string. This is a significantly more
expensive operation, and we don’t want to be misleading. Furthermore, ‘letter’
isn’t something defined in Unicode, exactly. We can choose to look at a string as
individual bytes, or as codepoints:

```rust
let hachiko = "忠犬ハチ公";

for b in hachiko.as_bytes() {
    print!("{}, ", b);
}

println!("");

for c in hachiko.chars() {
    print!("{}, ", c);
}

println!("");
```

This prints:

```text
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,
```

As you can see, there are more bytes than `char`s.
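
The same difference shows up when counting. A short sketch, using a hypothetical helper `byte_and_char_counts`: `len` reports the length in bytes, while `chars().count()` has to walk the whole string.

```rust
// Hypothetical helper: returns (number of bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Each of these five characters takes three bytes in UTF-8.
    assert_eq!(byte_and_char_counts("忠犬ハチ公"), (15, 5));

    // For ASCII-only text the two counts agree.
    assert_eq!(byte_and_char_counts("hello"), (5, 5));
}
```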

You can get something similar to an index like this:

```rust
# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // kinda like hachiko[1]
```

This emphasizes that we have to walk from the beginning of the list of `chars`.
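
Note that `nth` returns an `Option<char>`, because the string may run out before the requested position. A brief sketch, with `nth_char` as a hypothetical wrapper name:

```rust
// Hypothetical wrapper: the n-th char of `s`, counting from zero.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let hachiko = "忠犬ハチ公";

    // The second character is found by walking past the first.
    assert_eq!(nth_char(hachiko, 1), Some('犬'));

    // There are only five characters, so position 5 does not exist.
    assert_eq!(nth_char(hachiko, 5), None);
}
```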

## Slicing

You can get a slice of a string with slicing syntax:

```rust
let dog = "hachiko";
let hachi = &dog[0..5];
```

But note that these are _byte_ offsets, not _character_ offsets. So
this will fail at runtime:

```rust,should_panic
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
```

with this error:

```text
thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on
character boundary'
```
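
One possible way to avoid that panic is to compute byte offsets from character positions before slicing. The following is a sketch under that assumption, not the chapter's own method; `first_chars` is a hypothetical helper built on `char_indices`, and `is_char_boundary` is used only to show which offsets are safe.

```rust
// Hypothetical helper: returns the first `n` characters of `s` as a slice,
// or all of `s` if it has fewer than `n` characters.
fn first_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `byte_offset` is where character number `n` starts, so it is
        // always a valid place to cut.
        Some((byte_offset, _)) => &s[..byte_offset],
        None => s,
    }
}

fn main() {
    let dog = "忠犬ハチ公";

    // Byte 2 falls inside the first character; byte 3 is where it ends.
    assert!(!dog.is_char_boundary(2));
    assert!(dog.is_char_boundary(3));

    assert_eq!(first_chars(dog, 2), "忠犬");
}
```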

## Concatenation

If you have a `String`, you can concatenate a `&str` to the end of it:

```rust
let hello = "Hello ".to_string();
let world = "world!";

let hello_world = hello + world;
```

But if you have two `String`s, you need an `&`:

```rust
let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;
```

This is because `&String` can automatically coerce to a `&str`. This is a
feature called ‘[`Deref` coercions][dc]’.
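
It may also help to see what happens to the operands afterwards. In this sketch, `join` is a hypothetical helper that wraps `+`: the left-hand `String` is moved into the result, while the right-hand side is only borrowed. `format!` is shown as an alternative that borrows all of its arguments.

```rust
// Hypothetical helper: appends `tail` to `head`, consuming `head`.
fn join(head: String, tail: &str) -> String {
    head + tail
}

fn main() {
    let hello = "Hello ".to_string();
    let world = "world!".to_string();

    // `hello` is moved into `join`; `world` is only borrowed.
    let hello_world = join(hello, &world);
    assert_eq!(hello_world, "Hello world!");

    // `world` is still usable here, but `hello` is not.
    // `format!` borrows its arguments, so nothing is moved.
    let twice = format!("{}{}", hello_world, world);
    println!("{}", twice);
}
```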

[ut]: unsized-types.html
[dc]: deref-coercions.html
[connect]: ../std/net/struct.TcpStream.html#method.connect