% Strings

Strings are an important concept for any programmer to master. Rust’s string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
tricky, and strings are a re-sizable data structure. That being said, Rust’s
strings also work differently than in some other systems languages, such as C.

Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values
encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid
encoding of UTF-8 sequences. Additionally, unlike some systems languages,
strings are not null-terminated and can contain null bytes.
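
As a small illustration of that last point, the sketch below embeds a null byte
in the middle of a string literal. The helper `count_nul_bytes` is a made-up
name for this example, not part of the standard library.

```rust
// Hypothetical helper for illustration: counts the null bytes in a string slice.
fn count_nul_bytes(s: &str) -> usize {
    s.bytes().filter(|&b| b == 0).count()
}

fn main() {
    // The `\0` escape places a null byte between "foo" and "bar".
    let s = "foo\0bar";

    // The length travels with the slice, so the null byte does not end the string.
    assert_eq!(7, s.len());
    assert_eq!(1, count_nul_bytes(s));
}
```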

Rust has two main types of strings: `&str` and `String`. Let’s talk about
`&str` first. These are called ‘string slices’. A string slice has a fixed
size, and cannot be mutated. It is a reference to a sequence of UTF-8 bytes.

```rust
let greeting = "Hello there."; // greeting: &'static str
```

`"Hello there."` is a string literal and its type is `&'static str`. A string
literal is a string slice that is statically allocated, meaning that it’s saved
inside our compiled program, and exists for the entire duration it runs. The
`greeting` binding is a reference to this statically allocated string. Any
function expecting a string slice will also accept a string literal.

String literals can span multiple lines. There are two forms. The first will
include the newline and the leading spaces:

```rust
let s = "foo
    bar";

assert_eq!("foo\n    bar", s);
```

The second, with a `\`, trims the spaces and the newline:

```rust
let s = "foo\
    bar";

assert_eq!("foobar", s);
```

Note that you normally cannot access a `str` directly, but only through a `&str`
reference. This is because `str` is an unsized type which requires additional
runtime information to be usable. For more information see the chapter on
[unsized types][ut].

Rust has more than just `&str`s, though. A `String` is a heap-allocated string.
This string is growable, and is also guaranteed to be UTF-8. `String`s are
commonly created by converting from a string slice using the `to_string`
method.

```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```

`String`s will coerce into `&str` with an `&`:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(&s);
}
```

This coercion does not happen for functions that accept one of `&str`’s traits
instead of `&str`. For example, [`TcpStream::connect`][connect] takes a
parameter of any type that implements the `ToSocketAddrs` trait. A `&str` is
okay, but a `String` must be explicitly converted using `&*`. (Newer Rust
releases also implement `ToSocketAddrs` for `String`, but the explicit
conversion still shows the general technique.)

```rust,no_run
use std::net::TcpStream;

TcpStream::connect("192.168.0.1:3000"); // &str parameter

let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // convert addr_string to &str
```

Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. No reason to do that unless you have to!
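
To make the cost difference a little more concrete, here is a minimal sketch
that compares buffer addresses. `shares_buffer` is a hypothetical helper
written for this example; it only checks whether two slices start at the same
address.

```rust
// Hypothetical helper: true when both slices start at the same address.
fn shares_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let owned = "Hello".to_string();

    // Borrowing a view only copies a pointer and a length.
    let view: &str = &owned;
    assert!(shares_buffer(&owned, view));

    // Converting back allocates a fresh buffer and copies the bytes into it.
    let copy = view.to_string();
    assert!(!shares_buffer(&owned, &copy));
}
```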

## Indexing

Because strings are valid UTF-8, they do not support indexing:

```rust,ignore
let s = "hello";

println!("The first letter of s is {}", s[0]); // ERROR!!!
```

Usually, access to a vector with `[]` is very fast. But, because each character
in a UTF-8 encoded string can be multiple bytes, you have to walk over the
string to find the nᵗʰ letter of a string. This is a significantly more
expensive operation, and we don’t want to be misleading. Furthermore, ‘letter’
isn’t something defined in Unicode, exactly. We can choose to look at a string as
individual bytes, or as codepoints:

```rust
let hachiko = "忠犬ハチ公";

for b in hachiko.as_bytes() {
    print!("{}, ", b);
}

println!("");

for c in hachiko.chars() {
    print!("{}, ", c);
}

println!("");
```

This prints:

```text
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,
```

As you can see, there are more bytes than `char`s.
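
The same difference can be checked directly instead of by counting the printed
values. This is a small sketch; `byte_and_char_counts` is a hypothetical helper
name chosen for the example.

```rust
// Hypothetical helper: returns (number of bytes, number of `char`s).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // `len` is the stored byte length; `chars().count()` walks the whole string.
    (s.len(), s.chars().count())
}

fn main() {
    let hachiko = "忠犬ハチ公";
    let (bytes, chars) = byte_and_char_counts(hachiko);

    // Each of these five characters takes three bytes in UTF-8.
    assert_eq!(15, bytes);
    assert_eq!(5, chars);
}
```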

You can get something similar to an index like this:

```rust
# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // kinda like hachiko[1]
```

This emphasizes that we have to walk from the beginning of the list of `chars`.
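
Note that `nth` returns an `Option<char>` rather than a bare `char`, because
the string may be shorter than the position asked for. A minimal sketch, with
`nth_char` as a hypothetical wrapper for this example:

```rust
// Hypothetical wrapper: the character at position `n`, counted in `char`s.
fn nth_char(s: &str, n: usize) -> Option<char> {
    // Walks the string from the start, decoding one character at a time.
    s.chars().nth(n)
}

fn main() {
    let hachiko = "忠犬ハチ公";

    assert_eq!(Some('犬'), nth_char(hachiko, 1));
    // Past the end there is nothing to return, so we get `None`, not a panic.
    assert_eq!(None, nth_char(hachiko, 10));
}
```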

## Slicing

You can get a slice of a string with slicing syntax:

```rust
let dog = "hachiko";
let hachi = &dog[0..5];
```

But note that these are _byte_ offsets, not _character_ offsets. So
this will fail at runtime:

```rust,should_panic
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
```

with this error:

```text
thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on
character boundary'
```
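
One way to avoid this panic is to look up a valid byte offset before slicing.
The sketch below uses `char_indices` to do so, and also shows `str::get`, which
returns `None` for an invalid range instead of panicking. The helper
`first_chars` is a hypothetical name introduced for this example.

```rust
// Hypothetical helper: returns the first `n` characters of `s` as a slice.
fn first_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `idx` is the byte offset where character `n` starts,
        // so it always lies on a character boundary.
        Some((idx, _)) => &s[..idx],
        // The string has `n` characters or fewer: return all of it.
        None => s,
    }
}

fn main() {
    let dog = "忠犬ハチ公";
    assert_eq!("忠犬", first_chars(dog, 2));

    // `get` checks the boundaries for us.
    assert_eq!(None, dog.get(0..2));
    assert_eq!(Some("忠"), dog.get(0..3));
}
```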

## Concatenation

If you have a `String`, you can concatenate a `&str` to the end of it:

```rust
let hello = "Hello ".to_string();
let world = "world!";

let hello_world = hello + world;
```

But if you have two `String`s, you need an `&`:

```rust
let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;
```

This is because `&String` can automatically coerce to a `&str`. This is a
feature called ‘[`Deref` coercions][dc]’.
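
One detail the examples above do not show is that `+` takes its left-hand
`String` by value, so `hello` cannot be used afterwards. If you need both
inputs later, `format!` is one alternative. This is a sketch, and
`join_with_format` is a hypothetical helper name:

```rust
// Hypothetical helper: builds a new `String` without consuming either input.
fn join_with_format(a: &str, b: &str) -> String {
    format!("{}{}", a, b)
}

fn main() {
    let hello = "Hello ".to_string();
    let world = "world!".to_string();

    // `&String` coerces to `&str` here, and both bindings stay usable.
    let joined = join_with_format(&hello, &world);
    assert_eq!("Hello world!", joined);
    println!("{}{}", hello, world);

    // `+` moves `hello` into the result; only `world` is still usable afterwards.
    let moved = hello + &world;
    assert_eq!(joined, moved);
}
```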

[ut]: unsized-types.html
[dc]: deref-coercions.html
[connect]: ../std/net/struct.TcpStream.html#method.connect