% Strings

Strings are an important concept for any programmer to master. Rust’s string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
tricky, and strings are a re-sizable data structure. That being said, Rust’s
strings also work differently than in some other systems languages, such as C.

Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values
encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid
encoding of UTF-8 sequences. Additionally, unlike some systems languages,
strings are not null-terminated and can contain null bytes.

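To make the point about null bytes concrete, here is a minimal sketch. The
helper `count_nul_bytes` is a made-up name for this illustration, not something
from the standard library.

```rust
// Hypothetical helper for illustration: counts the null bytes in a string slice.
fn count_nul_bytes(s: &str) -> usize {
    s.bytes().filter(|&b| b == 0).count()
}

fn main() {
    // `\0` is an ordinary byte inside a Rust string, not a terminator.
    let s = "foo\0bar";

    // A string slice carries its own length, so all seven bytes are counted.
    assert_eq!(7, s.len());
    assert_eq!(1, count_nul_bytes(s));
}
```
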
Rust has two main types of strings: `&str` and `String`. Let’s talk about
`&str` first. These are called ‘string slices’. A string slice has a fixed
size, and cannot be mutated. It is a reference to a sequence of UTF-8 bytes.

```rust
let greeting = "Hello there."; // greeting: &'static str
```

`"Hello there."` is a string literal and its type is `&'static str`. A string
literal is a string slice that is statically allocated, meaning that it’s saved
inside our compiled program, and exists for the entire duration it runs. The
`greeting` binding is a reference to this statically allocated string. Any
function expecting a string slice will also accept a string literal.

String literals can span multiple lines. There are two forms. The first will
include the newline and the leading spaces:

```rust
let s = "foo
    bar";

assert_eq!("foo\n    bar", s);
```

The second, with a `\`, trims the spaces and the newline:

```rust
let s = "foo\
    bar";

assert_eq!("foobar", s);
```

Rust has more than just `&str`s, though. A `String` is a heap-allocated string.
This string is growable, and is also guaranteed to be UTF-8. `String`s are
commonly created by converting from a string slice using the `to_string`
method.

```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```

`String`s will coerce into `&str` with an `&`:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(&s);
}
```

This coercion does not happen for functions that accept one of `&str`’s traits
instead of `&str`. For example, [`TcpStream::connect`][connect] has a parameter
whose type must implement `ToSocketAddrs`. A `&str` is okay but a `String` must
be explicitly converted using `&*`.

```rust,no_run
use std::net::TcpStream;

TcpStream::connect("192.168.0.1:3000"); // &str parameter

let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // convert addr_string to &str
```

Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. No reason to do that unless you have to!

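The difference in cost can be observed by comparing buffer addresses. This is
only a sketch: `same_buffer` is a hypothetical helper written for this example,
and it assumes both values are alive at the same time.

```rust
// Hypothetical helper for this sketch: true when both slices start at the same address.
fn same_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let owned = "Hello".to_string();

    // Borrowing the `String` as a `&str` reuses its existing heap buffer.
    let view: &str = &owned;
    assert!(same_buffer(&owned, view));

    // Converting back to a `String` allocates a new buffer and copies the bytes.
    let copy = view.to_string();
    assert!(!same_buffer(&copy, view));
    assert_eq!(owned, copy);
}
```
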
## Indexing

Because strings are valid UTF-8, strings do not support indexing:

```rust,ignore
let s = "hello";

println!("The first letter of s is {}", s[0]); // ERROR!!!
```

Usually, access to a vector with `[]` is very fast. But because each character
in a UTF-8 encoded string can be multiple bytes, you have to walk over the
string to find the nᵗʰ letter of a string. This is a significantly more
expensive operation, and we don’t want to be misleading. Furthermore, ‘letter’
isn’t something defined in Unicode, exactly. We can choose to look at a string as
individual bytes, or as codepoints:

```rust
let hachiko = "忠犬ハチ公";

for b in hachiko.as_bytes() {
    print!("{}, ", b);
}

println!("");

for c in hachiko.chars() {
    print!("{}, ", c);
}

println!("");
```

This prints:

```text
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,
```

As you can see, there are more bytes than `char`s.

You can get something similar to an index like this:

```rust
# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // kinda like hachiko[1]
```

This emphasizes that we have to walk from the beginning of the list of `chars`.

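To see where each `char` starts, `char_indices` pairs every `char` with its
byte offset. The sketch below wraps the `nth` call in a helper, `nth_char`,
which is a name invented for this example.

```rust
// Hypothetical helper: walks the string from the start to find the n-th `char`.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let hachiko = "忠犬ハチ公";

    // Each of these characters takes three bytes, so the offsets are 0, 3, 6, 9, 12.
    for (offset, c) in hachiko.char_indices() {
        println!("{}: {}", offset, c);
    }

    assert_eq!(Some('犬'), nth_char(hachiko, 1));
    // Asking for a position past the end yields `None` instead of panicking.
    assert_eq!(None, nth_char(hachiko, 5));
}
```
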
## Slicing

You can get a slice of a string with slicing syntax:

```rust
let dog = "hachiko";
let hachi = &dog[0..5];
```

But note that these are _byte_ offsets, not _character_ offsets. So
this will fail at runtime:

```rust,should_panic
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
```

with this error:

```text
thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on
character boundary'
```

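One way to avoid the panic is to check the boundary first, or to use the
non-panicking `get` method, which returns `None` for a range that splits a
character. This is a sketch assuming a Rust version recent enough to have
`str::get` and `str::is_char_boundary`; `first_chars` is a hypothetical helper
written for this example.

```rust
// Hypothetical helper: returns the first `n` characters (not bytes) of `s`.
fn first_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // The byte offset of the n-th `char` is always a valid boundary.
        Some((offset, _)) => &s[..offset],
        // Fewer than `n + 1` characters: the whole string qualifies.
        None => s,
    }
}

fn main() {
    let dog = "忠犬ハチ公";

    // Byte 2 falls in the middle of the first three-byte character.
    assert!(!dog.is_char_boundary(2));
    assert_eq!(None, dog.get(0..2));

    // Byte 3 is where the second character starts, so this range is fine.
    assert_eq!(Some("忠"), dog.get(0..3));
    assert_eq!("忠犬", first_chars(dog, 2));
}
```
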
## Concatenation

If you have a `String`, you can concatenate a `&str` to the end of it:

```rust
let hello = "Hello ".to_string();
let world = "world!";

let hello_world = hello + world;
```

But if you have two `String`s, you need an `&`:

```rust
let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;
```

This is because `&String` can automatically coerce to a `&str`. This is a
feature called ‘[`Deref` coercions][dc]’.

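One consequence worth spelling out is that `+` takes ownership of the
left-hand `String`, while the right-hand side is only borrowed. The sketch
below contrasts this with `format!`, which borrows both operands; `join` is a
hypothetical helper name used only for this example.

```rust
// Hypothetical helper: joins two slices into a new `String` without consuming either.
fn join(a: &str, b: &str) -> String {
    format!("{}{}", a, b)
}

fn main() {
    let hello = "Hello ".to_string();
    let world = "world!".to_string();

    // `format!` only borrows its arguments, so both remain usable afterwards.
    let joined = join(&hello, &world);
    assert_eq!("Hello world!", joined);

    // `+` moves `hello` and appends the borrowed right-hand side to it.
    let hello_world = hello + &world;
    assert_eq!("Hello world!", hello_world);

    // `hello` can no longer be used here, but `world` is still valid.
    assert_eq!("world!", world);
}
```
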
[dc]: deref-coercions.html
[connect]: ../std/net/struct.TcpStream.html#method.connect