% Strings

Strings are an important concept for any programmer to master. Rust’s string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
tricky, and strings are a re-sizable data structure. That being said, Rust’s
strings also work differently than in some other systems languages, such as C.

Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values
encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid
encoding of UTF-8 sequences. Additionally, unlike some systems languages,
strings are not null-terminated and can contain null bytes.

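As a rough illustration of those two guarantees, the sketch below checks a byte
sequence for UTF-8 validity with `std::str::from_utf8` and counts the null bytes
inside a string. The helper names `is_valid_utf8` and `count_nul_bytes` are made
up for this example.

```rust
use std::str;

/// Returns true when `bytes` is a valid UTF-8 encoding.
/// (Hypothetical helper, for illustration only.)
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

/// Counts the null bytes in a string. They are ordinary content here,
/// not terminators. (Hypothetical helper, for illustration only.)
fn count_nul_bytes(s: &str) -> usize {
    s.bytes().filter(|&b| b == 0).count()
}
```
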
Rust has two main types of strings: `&str` and `String`. Let’s talk about
`&str` first. These are called ‘string slices’. A string slice has a fixed
size, and cannot be mutated. It is a reference to a sequence of UTF-8 bytes.

```rust
let greeting = "Hello there."; // greeting: &'static str
```

`"Hello there."` is a string literal and its type is `&'static str`. A string
literal is a string slice that is statically allocated, meaning that it’s saved
inside our compiled program, and exists for the entire duration it runs. The
`greeting` binding is a reference to this statically allocated string. Any
function expecting a string slice will also accept a string literal.

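A small sketch of both points, using two hypothetical functions: `byte_len`
takes any string slice and so accepts a literal directly, while
`default_greeting` can hand back a literal because it lives for the whole
program.

```rust
/// Accepts any string slice, including a string literal.
/// (Hypothetical helper, for illustration only.)
fn byte_len(s: &str) -> usize {
    s.len()
}

/// A literal has type `&'static str`, so it can be returned
/// without borrowing from any argument.
fn default_greeting() -> &'static str {
    "Hello there."
}
```
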
String literals can span multiple lines. There are two forms. The first will
include the newline and the leading spaces:

```rust
let s = "foo
    bar";

assert_eq!("foo\n    bar", s);
```

The second, with a `\`, trims the spaces and the newline:

```rust
let s = "foo\
    bar";

assert_eq!("foobar", s);
```

Note that you normally cannot access a `str` directly, but only through a `&str`
reference. This is because `str` is an unsized type which requires additional
runtime information to be usable. For more information see the chapter on
[unsized types][ut].

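In practice, the additional runtime information is the length of the slice,
which travels alongside the pointer. The sketch below measures this with
`std::mem::size_of`. The helpers `str_ref_words` and `u8_ref_words` are
hypothetical, and the two-word result reflects how current compilers lay out
`&str` rather than something this chapter promises.

```rust
use std::mem::size_of;

/// Size of a `&str` reference, measured in machine words
/// (pointer plus length on current targets).
fn str_ref_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}

/// Size of a reference to a sized type, for comparison.
fn u8_ref_words() -> usize {
    size_of::<&u8>() / size_of::<usize>()
}
```
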
Rust has more than just `&str`s, though. A `String` is a heap-allocated string.
This string is growable, and is also guaranteed to be UTF-8. `String`s are
commonly created by converting from a string slice using the `to_string`
method.

```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```

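Because a `String` is growable, it can also be built up piece by piece. A
minimal sketch, where `join_words` is a hypothetical helper rather than anything
from the standard library:

```rust
/// Builds one `String` from several slices, separated by single spaces.
/// (Hypothetical helper, for illustration only.)
fn join_words(words: &[&str]) -> String {
    let mut out = String::new(); // starts empty and grows as needed
    for (i, word) in words.iter().enumerate() {
        if i > 0 {
            out.push(' '); // append a single `char`
        }
        out.push_str(word); // append a string slice
    }
    out
}
```
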
`String`s will coerce into `&str` with an `&`:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(&s);
}
```

This coercion does not happen for functions that accept one of `&str`’s traits
instead of `&str`. For example, [`TcpStream::connect`][connect] has a parameter
bounded by the `ToSocketAddrs` trait. A `&str` is okay, but a `String` must be
explicitly converted using `&*`.

```rust,no_run
use std::net::TcpStream;

TcpStream::connect("192.168.0.1:3000"); // &str parameter

let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // convert addr_string to &str
```

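The same situation can be reproduced without any networking. In this sketch the
trait `Label` and the functions `make_label` and `label_owned` are all
hypothetical, standing in for a trait-bounded parameter: the trait is
implemented for `&str` only, so an owned `String` has to be converted by hand.

```rust
/// Hypothetical trait standing in for a bound on a generic parameter.
trait Label {
    fn label(&self) -> String;
}

// Implemented for `&str` only; `&String` does not get it for free.
impl<'a> Label for &'a str {
    fn label(&self) -> String {
        format!("label: {}", self)
    }
}

/// Generic over the trait, so no coercion happens at the call site.
fn make_label<T: Label>(value: T) -> String {
    value.label()
}

/// Converts the owned `String` to `&str` explicitly before the call.
fn label_owned(owned: String) -> String {
    // `make_label(&owned)` would not compile here, because `&String`
    // does not implement `Label`.
    make_label(&*owned)
}
```
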
Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. No reason to do that unless you have to!

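One way to see the difference is to compare data pointers. In the sketch below
(both function names are hypothetical), the `&str` view points at the very same
bytes as the `String`, while `to_string` produces a fresh buffer.

```rust
/// True when the `&str` view of `owned` points at the same bytes,
/// meaning nothing was copied to create the view.
fn shares_buffer(owned: &String) -> bool {
    let view: &str = owned;
    view.as_ptr() == owned.as_ptr() && view.len() == owned.len()
}

/// True when converting `view` to a `String` produced a separate buffer.
fn copy_is_separate(view: &str) -> bool {
    let copy = view.to_string(); // allocates and copies the bytes
    copy.as_ptr() != view.as_ptr()
}
```
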
## Indexing

Because strings are valid UTF-8, they do not support indexing:

```rust,ignore
let s = "hello";

println!("The first letter of s is {}", s[0]); // ERROR!!!
```

Usually, access to a vector with `[]` is very fast. But, because each character
in a UTF-8 encoded string can be multiple bytes, you have to walk over the
string to find its nᵗʰ letter. This is a significantly more expensive
operation, and we don’t want to be misleading. Furthermore, ‘letter’ isn’t
something defined in Unicode, exactly. We can choose to look at a string as
individual bytes, or as codepoints:

```rust
let hachiko = "忠犬ハチ公";

for b in hachiko.as_bytes() {
    print!("{}, ", b);
}

println!("");

for c in hachiko.chars() {
    print!("{}, ", c);
}

println!("");
```

This prints:

```text
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,
```

As you can see, there are more bytes than `char`s.

You can get something similar to an index like this:

```rust
# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // kinda like hachiko[1]
```

This emphasizes that we have to walk from the beginning of the list of `chars`.

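The two views can be wrapped in small helpers, sketched here under the
assumption that counting and positional lookup are all that is needed. The names
`byte_and_char_counts` and `nth_char` are hypothetical.

```rust
/// Returns (number of bytes, number of `char`s) for a string slice.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Walks the string from the start and returns the `char` at
/// position `n`, or `None` when the string is too short.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}
```
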
## Slicing

You can get a slice of a string with slicing syntax:

```rust
let dog = "hachiko";
let hachi = &dog[0..5];
```

But note that these are _byte_ offsets, not _character_ offsets. So
this will fail at runtime:

```rust,should_panic
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
```

with this error:

```text
thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on
character boundary'
```

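One way to avoid the panic is to compute the byte offset from a character
position before slicing. This is only a sketch: `first_n_chars` is a
hypothetical helper built on `char_indices`, which yields the byte offset at
which each `char` starts.

```rust
/// Returns the prefix of `s` that holds at most `n` `char`s.
/// (Hypothetical helper, for illustration only.)
fn first_n_chars(s: &str, n: usize) -> &str {
    // The offset where the `char` at position `n` starts is always
    // a valid place to cut.
    match s.char_indices().nth(n) {
        Some((byte_offset, _)) => &s[..byte_offset],
        None => s, // `n` or fewer chars: return the whole string
    }
}
```
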
## Concatenation

If you have a `String`, you can concatenate a `&str` to the end of it:

```rust
let hello = "Hello ".to_string();
let world = "world!";

let hello_world = hello + world;
```

But if you have two `String`s, you need an `&`:

```rust
let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;
```

This is because `&String` can automatically coerce to a `&str`. This is a
feature called ‘[`Deref` coercions][dc]’.

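A short sketch of what `+` does with its operands: the left-hand `String` is
moved into the result, while the right-hand side is only borrowed. When both
inputs should stay usable, `format!` is one alternative. The function names here
are hypothetical.

```rust
/// Appends `world` to `hello` with `+`. `hello` is moved into the
/// result, and `world` is borrowed, with `&String` coercing to `&str`.
fn concat_with_plus(hello: String, world: &String) -> String {
    hello + world
}

/// Builds a new `String` from two borrowed slices, leaving both
/// inputs available to the caller.
fn concat_with_format(hello: &str, world: &str) -> String {
    format!("{}{}", hello, world)
}
```
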
[ut]: unsized-types.html
[dc]: deref-coercions.html
[connect]: ../std/net/struct.TcpStream.html#method.connect