% Strings

Strings are an important concept for any programmer to master. Rust’s string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
tricky, and strings are a resizable data structure. That being said, Rust’s
strings also work differently than in some other systems languages, such as C.

Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values
encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid
encoding of UTF-8 sequences. Additionally, unlike some systems languages,
strings are not null-terminated and can contain null bytes.

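As a small sketch of these guarantees, the example below embeds a null byte in a literal and checks that it is counted as ordinary content, and that bytes which are not valid UTF-8 are rejected by `std::str::from_utf8`. The helper `count_nuls` is hypothetical, written only for this illustration.

```rust
/// Counts the NUL (`'\0'`) characters in a string slice. Rust strings are not
/// NUL-terminated, so a NUL is just ordinary content. (Hypothetical helper.)
fn count_nuls(s: &str) -> usize {
    s.chars().filter(|&c| c == '\0').count()
}

fn main() {
    // A string literal with a NUL byte in the middle.
    let s = "ab\0cd";

    // `len` is the length in bytes; no terminator is stored or counted.
    assert_eq!(s.len(), 5);
    assert_eq!(count_nuls(s), 1);
    assert_eq!(s.as_bytes(), &[b'a', b'b', 0, b'c', b'd']);

    // Byte sequences that are not valid UTF-8 cannot become a `&str`.
    assert!(std::str::from_utf8(&[0xff, 0xfe]).is_err());
}
```
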
Rust has two main types of strings: `&str` and `String`. Let’s talk about
`&str` first. These are called ‘string slices’. String literals are of the type
`&'static str`:

```rust
let greeting = "Hello there."; // greeting: &'static str
```

This string is statically allocated, meaning that it’s saved inside our
compiled program, and exists for the entire time the program runs. The
`greeting` binding is a reference to this statically allocated string. String
slices have a fixed size and cannot be mutated.

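A minimal sketch of what the `'static` lifetime allows: a function can hand out a reference to a literal because the literal outlives every caller. It also shows that “changing” a slice really produces a new value. The function name `greeting` is just an illustration.

```rust
/// Returns a slice pointing at a literal stored in the compiled program.
/// The literal lives for the whole run, so `'static` is a valid lifetime.
fn greeting() -> &'static str {
    "Hello there."
}

fn main() {
    let g = greeting();

    // The slice has a fixed length (in bytes).
    assert_eq!(g.len(), 12);

    // Methods that "change" a slice return a new, heap-allocated `String`...
    let upper = g.to_uppercase();
    assert_eq!(upper, "HELLO THERE.");

    // ...while the original slice is left untouched.
    assert_eq!(g, "Hello there.");
}
```
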
A `String`, on the other hand, is a heap-allocated string. This string is
growable, and is also guaranteed to be UTF-8. `String`s are commonly created by
converting from a string slice using the `to_string` method.

```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```

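To sketch the “growable, and still UTF-8” point a little further, the hypothetical `build_greeting` below grows a `String` in place with `push_str` (for slices) and `push` (for a single `char`), including non-ASCII text.

```rust
/// Builds a greeting by growing a `String` in place. (Hypothetical helper.)
fn build_greeting(name: &str) -> String {
    let mut s = "Hello".to_string(); // an owned, heap-allocated copy of the literal
    s.push_str(", ");                // append a string slice
    s.push_str(name);
    s.push('!');                     // append a single `char`
    s
}

fn main() {
    assert_eq!(build_greeting("world"), "Hello, world!");

    // Appending non-ASCII text keeps the `String` valid UTF-8.
    assert_eq!(build_greeting("ハチ公"), "Hello, ハチ公!");
}
```
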
`String`s will coerce into `&str` with an `&`:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(&s);
}
```

This coercion does not happen for functions that accept one of `&str`’s traits
instead of `&str` itself. For example, [`TcpStream::connect`][connect] takes a
parameter of a generic type `A: ToSocketAddrs`. A `&str` satisfies that bound
directly; because no deref coercion is inserted for a generic parameter, the
example below converts the `String` to a `&str` explicitly using `&*`.

```rust,no_run
use std::net::TcpStream;

TcpStream::connect("192.168.0.1:3000"); // &str parameter

let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // convert addr_string to &str
```

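The same effect can be reproduced without networking. In this sketch, `Shout` is a hypothetical trait implemented only for `&str`, standing in for a library trait like `ToSocketAddrs`; `takes_shout` is generic over it, so the compiler checks the bound against the argument’s actual type instead of coercing it.

```rust
/// Hypothetical trait, implemented only for `&str`.
trait Shout {
    fn shout(&self) -> String;
}

impl Shout for &str {
    fn shout(&self) -> String {
        self.to_uppercase()
    }
}

/// Generic over `T: Shout`: no deref coercion to `&str` is attempted here.
fn takes_shout<T: Shout>(value: T) -> String {
    value.shout()
}

fn main() {
    let s = "hello".to_string();

    // `takes_shout(&s)` would not compile: `&String` does not implement `Shout`.
    // Reborrowing with `&*` yields a `&str`, which does.
    assert_eq!(takes_shout(&*s), "HELLO");

    // A string literal is already a `&str`.
    assert_eq!(takes_shout("hi"), "HI");
}
```

Calling `s.as_str()` instead of `&*s` would also work; both produce a `&str` borrowed from the `String`.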
Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. There’s no reason to do that unless you
have to!

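One way to see the difference is to compare addresses: a `&str` view starts at the same address as the `String`’s buffer, while `to_string` copies the bytes into a fresh allocation. The helper `same_buffer` is hypothetical, and comparing raw pointers here is only an illustration, not something you normally need to do.

```rust
/// Returns `true` if both slices start at the same address, i.e. one is a view
/// of the other's buffer rather than a copy. (Hypothetical helper.)
fn same_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let s = "Hello".to_string();

    // Viewing: `as_str` (or `&s`) borrows the `String`'s existing heap buffer.
    let view: &str = s.as_str();
    assert!(same_buffer(&s, view));

    // Converting back to an owned `String` copies the bytes into a new allocation.
    let copy: String = view.to_string();
    assert!(!same_buffer(&s, &copy));
    assert_eq!(copy, s); // equal contents, separate memory
}
```
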
## Indexing

Because strings are valid UTF-8, they do not support indexing:

```rust,ignore
let s = "hello";

println!("The first letter of s is {}", s[0]); // ERROR!!!
```

Usually, access to a vector with `[]` is very fast. But because each character
in a UTF-8 encoded string can be multiple bytes, you have to walk over the
string to find the nᵗʰ letter of a string. This is a significantly more
expensive operation, and `[]` syntax would misleadingly suggest a cheap,
constant-time lookup. Furthermore, ‘letter’ isn’t something defined in Unicode,
exactly. We can choose to look at a string as individual bytes, or as `char`s
(Unicode scalar values):

```rust
let hachiko = "忠犬ハチ公";

for b in hachiko.as_bytes() {
    print!("{}, ", b);
}

println!();

for c in hachiko.chars() {
    print!("{}, ", c);
}

println!();
```

This prints:

```text
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,
```

As you can see, there are more bytes than `char`s.

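To put numbers on it, `len` counts bytes while `chars().count()` counts `char`s; each of the five characters in `"忠犬ハチ公"` takes three bytes in UTF-8, matching the fifteen bytes printed above. The helper name `byte_and_char_counts` is hypothetical.

```rust
/// Returns (number of UTF-8 bytes, number of `char`s). (Hypothetical helper.)
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Five characters, three bytes each.
    assert_eq!(byte_and_char_counts("忠犬ハチ公"), (15, 5));

    // For ASCII text the two counts coincide.
    assert_eq!(byte_and_char_counts("hachiko"), (7, 7));
}
```
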
You can get something similar to an index like this:

```rust
# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // kinda like hachiko[1]
```

This emphasizes that we have to walk the `char`s from the beginning of the
string. Note that `nth` returns an `Option<char>`, since the string may have
fewer characters than requested.

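If you need to know where the nᵗʰ `char` sits in the underlying bytes, `char_indices` yields each character together with its starting byte offset. It is still a walk from the beginning, just like `chars().nth`. The helper `nth_char_offset` is a hypothetical sketch.

```rust
/// Returns the byte offset where the `n`th `char` (0-based) starts, if any.
/// Like `chars().nth(n)`, this walks the string from the beginning.
fn nth_char_offset(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    let hachiko = "忠犬ハチ公";
    assert_eq!(hachiko.chars().nth(1), Some('犬'));

    // '犬' begins right after the three bytes of '忠'.
    assert_eq!(nth_char_offset(hachiko, 1), Some(3));

    // There is no sixth character.
    assert_eq!(nth_char_offset(hachiko, 5), None);
}
```
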
## Slicing

You can get a slice of a string with slicing syntax:

```rust
let dog = "hachiko";
let hachi = &dog[0..5];
```

But note that these are _byte_ offsets, not _character_ offsets. So
this will fail at runtime:

```rust,should_panic
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
```

with an error like this (the exact wording varies between Rust versions):

```text
thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on
character boundary'
```

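When the range comes from untrusted input, a non-panicking alternative is `str::get`, which returns `None` for a range that is out of bounds or splits a character; `is_char_boundary` checks a single offset. The wrapper `try_slice` is a hypothetical name for this sketch.

```rust
/// Returns the bytes `start..end` of `s` as a `&str`, or `None` if the range is
/// out of bounds or not on `char` boundaries. (Hypothetical helper.)
fn try_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let dog = "忠犬ハチ公";

    // Byte 2 is inside '忠', so `&dog[0..2]` would panic; `get` returns `None`.
    assert!(!dog.is_char_boundary(2));
    assert_eq!(try_slice(dog, 0, 2), None);

    // Bytes 0..6 cover exactly the first two characters.
    assert_eq!(try_slice(dog, 0, 6), Some("忠犬"));
}
```
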
## Concatenation

If you have a `String`, you can concatenate a `&str` to the end of it:

```rust
let hello = "Hello ".to_string();
let world = "world!";

let hello_world = hello + world;
```

But if you have two `String`s, you need an `&`:

```rust
let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;
```

This is because `&String` can automatically coerce to a `&str`. This is a
feature called ‘[`Deref` coercions][dc]’.

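Note that `+` takes its left-hand `String` by value, so `hello` is moved and cannot be used afterwards, while the right-hand side is only borrowed. The hypothetical `concat` below sketches behavior equivalent to `String + &str` using `push_str`, and its `&str` parameter receives `&world` through the same deref coercion.

```rust
/// Appends `right` to an owned `left` and returns it, which behaves like
/// `left + right` for `String + &str`. (Hypothetical helper.)
fn concat(mut left: String, right: &str) -> String {
    left.push_str(right);
    left
}

fn main() {
    let hello = "Hello ".to_string();
    let world = "world!".to_string();

    // `hello` is moved into `concat`; `&world` coerces from `&String` to `&str`.
    let hello_world = concat(hello, &world);
    assert_eq!(hello_world, "Hello world!");

    // `world` was only borrowed, so it is still usable.
    assert_eq!(world, "world!");

    // `+` gives the same result and likewise consumes its left operand.
    let again = "Hello ".to_string() + &world;
    assert_eq!(again, hello_world);
}
```
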
[dc]: deref-coercions.html
[connect]: ../std/net/struct.TcpStream.html#method.connect