% Strings

Strings are an important concept for any programmer to master. Rust’s string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
tricky, and strings are a resizable data structure. That being said, Rust’s
strings also work differently than in some other systems languages, such as C.

Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values
encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid
encoding of UTF-8 sequences. Additionally, unlike some systems languages,
strings are not null-terminated and can contain null bytes.

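As a minimal sketch of these two guarantees, the example below checks that a
null byte is just another character in a Rust string, and that a byte sequence
which is not valid UTF-8 is rejected when we try to view it as a string. The
helper `is_valid_utf8` is a hypothetical name introduced only for this
illustration; it wraps the standard `std::str::from_utf8` function.

```rust
// Hypothetical helper for illustration: reports whether a byte buffer
// could be viewed as a string, i.e. whether it is valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // A null byte does not terminate the string; it counts toward the length.
    let with_nul = "ab\0cd";
    assert_eq!(with_nul.len(), 5);

    // Plain ASCII is valid UTF-8.
    assert!(is_valid_utf8(b"hello"));

    // The byte 0xFF never appears in well-formed UTF-8, so this is rejected.
    assert!(!is_valid_utf8(&[0x68, 0x69, 0xFF]));
}
```
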
Rust has two main types of strings: `&str` and `String`. Let’s talk about
`&str` first. These are called ‘string slices’. A string slice has a fixed
size, and cannot be mutated. It is a reference to a sequence of UTF-8 bytes.

```rust
let greeting = "Hello there."; // greeting: &'static str
```

`"Hello there."` is a string literal and its type is `&'static str`. A string
literal is a string slice that is statically allocated, meaning that it’s saved
inside our compiled program, and exists for the entire duration it runs. The
`greeting` binding is a reference to this statically allocated string. Any
function expecting a string slice will also accept a string literal.

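To make that last point concrete, here is a small sketch. The functions
`byte_len` and `keep_forever` are hypothetical names made up for this example:
the first accepts a string slice of any lifetime, the second only accepts one
that lives for the whole program, which a literal does.

```rust
// Hypothetical helper: accepts any string slice, whatever its lifetime.
fn byte_len(s: &str) -> usize {
    s.len()
}

// Hypothetical helper: only accepts slices that live for the whole program.
fn keep_forever(s: &'static str) -> &'static str {
    s
}

fn main() {
    let greeting = "Hello there."; // greeting: &'static str

    // A literal, or a binding to one, can be passed wherever &str is expected.
    assert_eq!(byte_len("Hello there."), 12);
    assert_eq!(byte_len(greeting), 12);

    // Because the literal is statically allocated, it satisfies 'static too.
    assert_eq!(keep_forever(greeting), "Hello there.");
}
```
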
String literals can span multiple lines. There are two forms. The first will
include the newline and the leading spaces:

```rust
let s = "foo
    bar";

assert_eq!("foo\n    bar", s);
```

The second, with a `\`, trims the spaces and the newline:

```rust
let s = "foo\
    bar";

assert_eq!("foobar", s);
```

Rust has more than just `&str`s, though. A `String` is a heap-allocated string.
This string is growable, and is also guaranteed to be UTF-8. `String`s are
commonly created by converting from a string slice using the `to_string`
method.

```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```

`String`s will coerce into `&str` with an `&`:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(&s);
}
```

This coercion does not happen for functions that accept one of `&str`’s traits
instead of `&str`. For example, [`TcpStream::connect`][connect] has a parameter
whose type must implement `ToSocketAddrs`. A `&str` is okay, but a `String`
must be explicitly converted using `&*`.

```rust,no_run
use std::net::TcpStream;

TcpStream::connect("192.168.0.1:3000"); // &str parameter

let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // convert addr_string to &str
```

Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. No reason to do that unless you have to!

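One way to see the difference is to compare where the bytes live. The sketch
below uses a hypothetical helper, `same_buffer`, written just for this example:
it compares the starting addresses of two slices. A `&str` view of a `String`
points at the `String`’s own buffer, while `to_string` copies the bytes into a
freshly allocated one.

```rust
// Hypothetical helper: true when both slices start at the same address.
fn same_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let owned = "Hello".to_string();

    // Borrowing as &str reuses the String's existing heap buffer.
    let view: &str = &owned;
    assert!(same_buffer(&owned, view));

    // to_string allocates a new buffer and copies the bytes into it.
    let copy = view.to_string();
    assert!(!same_buffer(view, &copy));

    // The contents are equal even though the storage is not shared.
    assert_eq!(view, copy);
}
```
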
## Indexing

Because strings are valid UTF-8, they do not support indexing:

```rust,ignore
let s = "hello";

println!("The first letter of s is {}", s[0]); // ERROR!!!
```

Usually, access to a vector with `[]` is very fast. But, because each character
in a UTF-8 encoded string can be multiple bytes, you have to walk over the
string to find the nᵗʰ letter of a string. This is a significantly more
expensive operation, and we don’t want to be misleading. Furthermore, ‘letter’
isn’t something defined in Unicode, exactly. We can choose to look at a string as
individual bytes, or as codepoints:

```rust
let hachiko = "忠犬ハチ公";

for b in hachiko.as_bytes() {
    print!("{}, ", b);
}

println!("");

for c in hachiko.chars() {
    print!("{}, ", c);
}

println!("");
```

This prints:

```text
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,
```

As you can see, there are more bytes than `char`s.

You can get something similar to an index like this:

```rust
# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // kinda like hachiko[1]
```

This emphasizes that we have to walk from the beginning of the list of `chars`.

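Note that `nth` returns an `Option`, since the string may have fewer characters
than we asked for. If we also want to know where a character starts in the
underlying bytes, `char_indices` yields each `char` together with its byte
offset. The helper `byte_offset_of_nth` below is a hypothetical name used only
for this sketch:

```rust
// Hypothetical helper: byte offset at which the n-th char starts, if any.
// Like chars().nth(n), this walks the string from the beginning.
fn byte_offset_of_nth(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    let hachiko = "忠犬ハチ公";

    // nth yields Some(char) when the position exists and None otherwise.
    assert_eq!(hachiko.chars().nth(1), Some('犬'));
    assert_eq!(hachiko.chars().nth(5), None);

    // Each char in this string takes three bytes, so the second starts at 3.
    assert_eq!(byte_offset_of_nth(hachiko, 1), Some(3));
}
```
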
## Slicing

You can get a slice of a string with slicing syntax:

```rust
let dog = "hachiko";
let hachi = &dog[0..5];
```

But note that these are _byte_ offsets, not _character_ offsets. So
this will fail at runtime:

```rust,should_panic
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
```

with this error:

```text
thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on
character boundary'
```

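If the offsets are not known to be safe, one option is to find a character
boundary first and slice there. The sketch below assumes a toolchain newer than
the one this chapter was written for: `is_char_boundary` and the non-panicking
`get` are methods on `str` in later releases of the standard library. The
helper `first_chars` is a hypothetical name introduced for this example.

```rust
// Hypothetical helper: the first `n` chars of `s` as a slice, or all of `s`
// if it has fewer than `n` chars. It slices only at a char boundary.
fn first_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((end, _)) => &s[..end],
        None => s,
    }
}

fn main() {
    let dog = "忠犬ハチ公";

    // Byte 2 falls inside the first char; byte 3 is where the second begins.
    assert!(!dog.is_char_boundary(2));
    assert!(dog.is_char_boundary(3));

    // `get` returns None instead of panicking on a bad range.
    assert_eq!(dog.get(0..2), None);
    assert_eq!(dog.get(0..3), Some("忠"));

    assert_eq!(first_chars(dog, 2), "忠犬");
}
```
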
## Concatenation

If you have a `String`, you can concatenate a `&str` to the end of it:

```rust
let hello = "Hello ".to_string();
let world = "world!";

let hello_world = hello + world;
```

But if you have two `String`s, you need an `&`:

```rust
let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;
```

This is because `&String` can automatically coerce to a `&str`. This is a
feature called ‘[`Deref` coercions][dc]’.

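One consequence worth spelling out: `+` takes the left-hand `String` by value
and only borrows the right-hand side. In the sketch below, `append` is a
hypothetical helper written for this example; it makes the two roles explicit
in its signature. After the call, the right-hand `String` is still usable,
while the left-hand one has been moved into the result.

```rust
// Hypothetical helper: the left side is consumed, the right side is borrowed.
fn append(left: String, right: &str) -> String {
    left + right
}

fn main() {
    let hello = "Hello ".to_string();
    let world = "world!".to_string();

    // &world is a &String, which coerces to the &str that `append` expects.
    let hello_world = append(hello, &world);
    assert_eq!(hello_world, "Hello world!");

    // `world` was only borrowed, so we can keep using it here.
    // `hello` was moved into `append` and can no longer be used.
    assert_eq!(world, "world!");
}
```
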
[dc]: deref-coercions.html
[connect]: ../std/net/struct.TcpStream.html#method.connect