git.proxmox.com Git - rustc.git/blob - src/doc/reference/src/tokens.md
New upstream version 1.28.0~beta.14+dfsg1
1 # Tokens
2
3 Tokens are primitive productions in the grammar defined by regular
4 (non-recursive) languages. "Simple" tokens are given in [string table
5 production] form, and occur in the rest of the
6 grammar in `monospace` font. Other tokens have exact rules given.
7
8 [string table production]: notation.html#string-table-productions
9
10 ## Literals
11
12 A literal is an expression consisting of a single token, rather than a sequence
13 of tokens, that immediately and directly denotes the value it evaluates to,
14 rather than referring to it by name or some other evaluation rule. A literal is
15 a form of [constant expression](expressions.html#constant-expressions), so is
16 evaluated (primarily) at compile time.
17
18 ### Examples
19
20 #### Characters and strings
21
22 | | Example | `#` sets | Characters | Escapes |
23 |----------------------------------------------|-----------------|-------------|-------------|---------------------|
24 | [Character](#character-literals) | `'H'` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
25 | [String](#string-literals) | `"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
26 | [Raw](#raw-string-literals) | `r#"hello"#` | 0 or more\* | All Unicode | `N/A` |
27 | [Byte](#byte-literals) | `b'H'` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
28 | [Byte string](#byte-string-literals) | `b"hello"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
29 | [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | 0 or more\* | All ASCII | `N/A` |
30
31 \* The number of `#`s on each side of the same literal must be equal.
32
33 #### ASCII escapes
34
35 | | Name |
36 |---|------|
37 | `\x41` | 7-bit character code (exactly 2 digits, up to 0x7F) |
38 | `\n` | Newline |
39 | `\r` | Carriage return |
40 | `\t` | Tab |
41 | `\\` | Backslash |
42 | `\0` | Null |
43
44 #### Byte escapes
45
46 | | Name |
47 |---|------|
48 | `\x7F` | 8-bit character code (exactly 2 digits) |
49 | `\n` | Newline |
50 | `\r` | Carriage return |
51 | `\t` | Tab |
52 | `\\` | Backslash |
53 | `\0` | Null |
54
55 #### Unicode escapes
56
57 | | Name |
58 |---|------|
59 | `\u{7FFF}` | 24-bit Unicode character code (up to 6 digits) |
60
61 #### Quote escapes
62
63 | | Name |
64 |---|------|
65 | `\'` | Single quote |
66 | `\"` | Double quote |
67
68 #### Numbers
69
70 | [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
71 |----------------------------------------|---------|----------------|----------|
72 | Decimal integer | `98_222` | `N/A` | Integer suffixes |
73 | Hex integer | `0xff` | `N/A` | Integer suffixes |
74 | Octal integer | `0o77` | `N/A` | Integer suffixes |
75 | Binary integer | `0b1111_0000` | `N/A` | Integer suffixes |
76 | Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes |
77
78 `*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`
79
80 #### Suffixes
81
82 | Integer | Floating-point |
83 |---------|----------------|
84 | `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, `isize` | `f32`, `f64` |
85
86 ### Character and string literals
87
88 #### Character literals
89
90 > **<sup>Lexer</sup>**
91 > CHAR_LITERAL :
92 > &nbsp;&nbsp; `'` ( ~[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`
93 >
94 > QUOTE_ESCAPE :
95 > &nbsp;&nbsp; `\'` | `\"`
96 >
97 > ASCII_ESCAPE :
98 > &nbsp;&nbsp; &nbsp;&nbsp; `\x` OCT_DIGIT HEX_DIGIT
99 > &nbsp;&nbsp; | `\n` | `\r` | `\t` | `\\` | `\0`
100 >
101 > UNICODE_ESCAPE :
102 > &nbsp;&nbsp; `\u{` ( HEX_DIGIT `_`<sup>\*</sup> )<sup>1..6</sup> `}`
103
104 A _character literal_ is a single Unicode character enclosed within two
105 `U+0027` (single-quote) characters, with the exception of `U+0027` itself,
106 which must be _escaped_ by a preceding `U+005C` character (`\`).
107
108 #### String literals
109
110 > **<sup>Lexer</sup>**
111 > STRING_LITERAL :
112 > &nbsp;&nbsp; `"` (
113 > &nbsp;&nbsp; &nbsp;&nbsp; ~[`"` `\` _IsolatedCR_]
114 > &nbsp;&nbsp; &nbsp;&nbsp; | QUOTE_ESCAPE
115 > &nbsp;&nbsp; &nbsp;&nbsp; | ASCII_ESCAPE
116 > &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE
117 > &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE
118 > &nbsp;&nbsp; )<sup>\*</sup> `"`
119 >
120 > STRING_CONTINUE :
121 > &nbsp;&nbsp; `\` _followed by_ \\n
122
123 A _string literal_ is a sequence of any Unicode characters enclosed within two
124 `U+0022` (double-quote) characters, with the exception of `U+0022` itself,
125 which must be _escaped_ by a preceding `U+005C` character (`\`).
126
127 Line-break characters are allowed in string literals. Normally they represent
128 themselves (i.e. no translation), but as a special exception, when an unescaped
129 `U+005C` character (`\`) occurs immediately before the newline (`U+000A`), the
130 `U+005C` character, the newline, and all whitespace at the beginning of the
131 next line are ignored. Thus `a` and `b` are equal:
132
133 ```rust
134 let a = "foobar";
135 let b = "foo\
136 bar";
137
138 assert_eq!(a,b);
139 ```
140
141 #### Character escapes
142
143 Some additional _escapes_ are available in either character or non-raw string
144 literals. An escape starts with a `U+005C` (`\`) and continues with one of the
145 following forms:
146
147 * A _7-bit code point escape_ starts with `U+0078` (`x`) and is
148 followed by exactly two _hex digits_ with value up to `0x7F`. It denotes the
149 ASCII character with value equal to the provided hex value. Higher values are
150 not permitted because it is ambiguous whether they mean Unicode code points or
151 byte values.
152 * A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
153 by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
154 (`}`). It denotes the Unicode code point equal to the provided hex value.
155 * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
156 (`r`), or `U+0074` (`t`), denoting the Unicode values `U+000A` (LF),
157 `U+000D` (CR) or `U+0009` (HT) respectively.
158 * The _null escape_ is the character `U+0030` (`0`) and denotes the Unicode
159 value `U+0000` (NUL).
160 * The _backslash escape_ is the character `U+005C` (`\`) which must be
161 escaped in order to denote itself.
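
The escape forms listed above can be checked with a few assertions. This is a minimal sketch; the function name `character_escape_examples` is a hypothetical helper introduced only for illustration.

```rust
// Each assertion pairs an escaped literal with an equivalent spelling or value.
fn character_escape_examples() {
    // 7-bit code point escape: exactly two hex digits, at most 0x7F.
    assert_eq!('\x41', 'A');
    // 24-bit code point escape: one to six hex digits inside braces.
    assert_eq!('\u{41}', 'A');
    assert_eq!('\u{00E9}', 'é');
    // Whitespace escapes: LF, CR and HT.
    assert_eq!('\n' as u32, 0x0A);
    assert_eq!('\r' as u32, 0x0D);
    assert_eq!('\t' as u32, 0x09);
    // Null and backslash escapes.
    assert_eq!('\0' as u32, 0x00);
    assert_eq!('\\' as u32, 0x5C);
    // The same escapes are processed inside non-raw string literals.
    assert_eq!("\x52\u{52}", "RR");
}
```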
162
163 #### Raw string literals
164
165 > **<sup>Lexer</sup>**
166 > RAW_STRING_LITERAL :
167 > &nbsp;&nbsp; `r` RAW_STRING_CONTENT
168 >
169 > RAW_STRING_CONTENT :
170 > &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`
171 > &nbsp;&nbsp; | `#` RAW_STRING_CONTENT `#`
172
173 Raw string literals do not process any escapes. They start with the character
174 `U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
175 `U+0022` (double-quote) character. The _raw string body_ can contain any sequence
176 of Unicode characters and is terminated only by another `U+0022` (double-quote)
177 character, followed by the same number of `U+0023` (`#`) characters that preceded
178 the opening `U+0022` (double-quote) character.
179
180 All Unicode characters contained in the raw string body represent themselves;
181 the characters `U+0022` (double-quote) (except when followed by at least as
182 many `U+0023` (`#`) characters as were used to start the raw string literal) and
183 `U+005C` (`\`) do not have any special meaning.
184
185 Examples for string literals:
186
187 ```rust
188 "foo"; r"foo"; // foo
189 "\"foo\""; r#""foo""#; // "foo"
190
191 "foo #\"# bar";
192 r##"foo #"# bar"##; // foo #"# bar
193
194 "\x52"; "R"; r"R"; // R
195 "\\x52"; r"\x52"; // \x52
196 ```
197
198 ### Byte and byte string literals
199
200 #### Byte literals
201
202 > **<sup>Lexer</sup>**
203 > BYTE_LITERAL :
204 > &nbsp;&nbsp; `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'`
205 >
206 > ASCII_FOR_CHAR :
207 > &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t
208 >
209 > BYTE_ESCAPE :
210 > &nbsp;&nbsp; &nbsp;&nbsp; `\x` HEX_DIGIT HEX_DIGIT
211 > &nbsp;&nbsp; | `\n` | `\r` | `\t` | `\\` | `\0`
212
213 A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
214 range) or a single _escape_ preceded by the characters `U+0062` (`b`) and
215 `U+0027` (single-quote), and followed by the character `U+0027`. If the character
216 `U+0027` is present within the literal, it must be _escaped_ by a preceding
217 `U+005C` (`\`) character. It is equivalent to a `u8` unsigned 8-bit integer
218 _number literal_.
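
As a small illustration of the equivalence with `u8` number literals, the sketch below compares byte literals against their numeric values. The function name `byte_literal_examples` is hypothetical.

```rust
fn byte_literal_examples() {
    // A byte literal has type `u8`; `H` is 0x48 (72) in ASCII.
    let h: u8 = b'H';
    assert_eq!(h, 72u8);
    // A single quote inside a byte literal must be escaped.
    assert_eq!(b'\'', 0x27u8);
}
```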
219
220 #### Byte string literals
221
222 > **<sup>Lexer</sup>**
223 > BYTE_STRING_LITERAL :
224 > &nbsp;&nbsp; `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"`
225 >
226 > ASCII_FOR_STRING :
227 > &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_
228
229 A non-raw _byte string literal_ is a sequence of ASCII characters and _escapes_,
230 preceded by the characters `U+0062` (`b`) and `U+0022` (double-quote), and
231 followed by the character `U+0022`. If the character `U+0022` is present within
232 the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
233 Alternatively, a byte string literal can be a _raw byte string literal_, defined
234 below. The type of a byte string literal of length `n` is `&'static [u8; n]`.
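
The stated type can be observed by annotating a binding explicitly. The following is a sketch with a hypothetical function name, `byte_string_type_example`; it returns the length of the array to show that `n` counts bytes after escapes are processed.

```rust
fn byte_string_type_example() -> usize {
    // Five ASCII characters give a reference to a five-element `u8` array.
    let bytes: &'static [u8; 5] = b"hello";
    let expected: [u8; 5] = [104, 101, 108, 108, 111];
    assert_eq!(*bytes, expected);

    // Each escaped double quote contributes a single byte (0x22).
    let quoted: &'static [u8; 3] = b"\"a\"";
    let expected_quoted: [u8; 3] = [0x22, 0x61, 0x22];
    assert_eq!(*quoted, expected_quoted);

    bytes.len()
}
```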
235
236 Some additional _escapes_ are available in either byte or non-raw byte string
237 literals. An escape starts with a `U+005C` (`\`) and continues with one of the
238 following forms:
239
240 * A _byte escape_ starts with `U+0078` (`x`) and is
241 followed by exactly two _hex digits_. It denotes the byte
242 equal to the provided hex value.
243 * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
244 (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
245 `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
246 * The _null escape_ is the character `U+0030` (`0`) and denotes the byte
247 value `0x00` (ASCII NUL).
248 * The _backslash escape_ is the character `U+005C` (`\`) which must be
249 escaped in order to denote its ASCII encoding `0x5C`.
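
The byte escape forms above can be exercised in one byte string. This is a minimal sketch with a hypothetical function name; note that, unlike the 7-bit escape in character literals, `\xFF` is accepted here.

```rust
fn byte_escape_examples() {
    // Null, tab, newline, carriage return, backslash, and an 8-bit byte escape.
    let expected: [u8; 6] = [0x00, 0x09, 0x0A, 0x0D, 0x5C, 0xFF];
    assert_eq!(*b"\0\t\n\r\\\xFF", expected);
}
```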
250
251 #### Raw byte string literals
252
253 > **<sup>Lexer</sup>**
254 > RAW_BYTE_STRING_LITERAL :
255 > &nbsp;&nbsp; `br` RAW_BYTE_STRING_CONTENT
256 >
257 > RAW_BYTE_STRING_CONTENT :
258 > &nbsp;&nbsp; &nbsp;&nbsp; `"` ASCII<sup>* (non-greedy)</sup> `"`
259 > &nbsp;&nbsp; | `#` RAW_BYTE_STRING_CONTENT `#`
260 >
261 > ASCII :
262 > &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F)_
263
264 Raw byte string literals do not process any escapes. They start with the
265 character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
266 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
267 _raw string body_ can contain any sequence of ASCII characters and is terminated
268 only by another `U+0022` (double-quote) character, followed by the same number of
269 `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote)
270 character. A raw byte string literal cannot contain any non-ASCII byte.
271
272 All characters contained in the raw string body represent their ASCII encoding;
273 the characters `U+0022` (double-quote) (except when followed by at least as
274 many `U+0023` (`#`) characters as were used to start the raw string literal) and
275 `U+005C` (`\`) do not have any special meaning.
276
277 Examples for byte string literals:
278
279 ```rust
280 b"foo"; br"foo"; // foo
281 b"\"foo\""; br#""foo""#; // "foo"
282
283 b"foo #\"# bar";
284 br##"foo #"# bar"##; // foo #"# bar
285
286 b"\x52"; b"R"; br"R"; // R
287 b"\\x52"; br"\x52"; // \x52
288 ```
289
290 ### Number literals
291
292 A _number literal_ is either an _integer literal_ or a _floating-point
293 literal_. The grammar for recognizing the two kinds of literals is mixed.
294
295 #### Integer literals
296
297 > **<sup>Lexer</sup>**
298 > INTEGER_LITERAL :
299 > &nbsp;&nbsp; ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )
300 > INTEGER_SUFFIX<sup>?</sup>
301 >
302 > DEC_LITERAL :
303 > &nbsp;&nbsp; DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
304 >
305 > TUPLE_INDEX :
306 > &nbsp;&nbsp; &nbsp;&nbsp; `0`
307 > &nbsp;&nbsp; | NON_ZERO_DEC_DIGIT DEC_DIGIT<sup>\*</sup>
308 >
309 > BIN_LITERAL :
310 > &nbsp;&nbsp; `0b` (BIN_DIGIT|`_`)<sup>\*</sup> BIN_DIGIT (BIN_DIGIT|`_`)<sup>\*</sup>
311 >
312 > OCT_LITERAL :
313 > &nbsp;&nbsp; `0o` (OCT_DIGIT|`_`)<sup>\*</sup> OCT_DIGIT (OCT_DIGIT|`_`)<sup>\*</sup>
314 >
315 > HEX_LITERAL :
316 > &nbsp;&nbsp; `0x` (HEX_DIGIT|`_`)<sup>\*</sup> HEX_DIGIT (HEX_DIGIT|`_`)<sup>\*</sup>
317 >
318 > BIN_DIGIT : [`0`-`1`]
319 >
320 > OCT_DIGIT : [`0`-`7`]
321 >
322 > DEC_DIGIT : [`0`-`9`]
323 >
324 > NON_ZERO_DEC_DIGIT : [`1`-`9`]
325 >
326 > HEX_DIGIT : [`0`-`9` `a`-`f` `A`-`F`]
327 >
328 > INTEGER_SUFFIX :
329 > &nbsp;&nbsp; &nbsp;&nbsp; `u8` | `u16` | `u32` | `u64` | `u128` | `usize`
330 > &nbsp;&nbsp; | `i8` | `i16` | `i32` | `i64` | `i128` | `isize`
331
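332 An _integer literal_ has one of four forms (decimal, hex, octal or binary). The list below also describes the closely related _tuple index_ token: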
333
334 * A _decimal literal_ starts with a *decimal digit* and continues with any
335 mixture of *decimal digits* and _underscores_.
336 * A _tuple index_ is either `0`, or starts with a *non-zero decimal digit* and
337 continues with zero or more decimal digits. Tuple indexes are used to refer
338 to the fields of [tuples], [tuple structs] and [tuple variants].
339 * A _hex literal_ starts with the character sequence `U+0030` `U+0078`
340 (`0x`) and continues as any mixture (with at least one digit) of hex digits
341 and underscores.
342 * An _octal literal_ starts with the character sequence `U+0030` `U+006F`
343 (`0o`) and continues as any mixture (with at least one digit) of octal digits
344 and underscores.
345 * A _binary literal_ starts with the character sequence `U+0030` `U+0062`
346 (`0b`) and continues as any mixture (with at least one digit) of binary digits
347 and underscores.
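
To illustrate the tuple index form, the sketch below reads fields of a tuple struct and of a plain tuple. The type `Point` and the function `tuple_index_examples` are hypothetical names introduced for this example. Under the TUPLE_INDEX production above, an index is written without underscores, leading zeros, or a suffix.

```rust
// A tuple struct whose fields are referred to by tuple index.
struct Point(i32, i32);

fn tuple_index_examples() -> (i32, i32, char) {
    let p = Point(3, 4);
    let t = ('a', 2.5, "z");
    // `0` and `1` after the dot are tuple indexes.
    (p.0, p.1, t.0)
}
```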
348
349 Like any literal, an integer literal may be followed (immediately,
350 without any spaces) by an _integer suffix_, which forcibly sets the
351 type of the literal. The integer suffix must be the name of one of the
352 integral types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`,
353 `u128`, `i128`, `usize`, or `isize`.
354
355 The type of an _unsuffixed_ integer literal is determined by type inference:
356
357 * If an integer type can be _uniquely_ determined from the surrounding
358 program context, the unsuffixed integer literal has that type.
359
360 * If the program context under-constrains the type, it defaults to the
361 signed 32-bit integer `i32`.
362
363 * If the program context over-constrains the type, it is considered a
364 static type error.
365
366 Examples of integer literals of various forms:
367
368 ```rust
369 123; // type i32
370 123i32; // type i32
371 123u32; // type u32
372 123_u32; // type u32
373 let a: u64 = 123; // type u64
374
375 0xff; // type i32
376 0xff_u8; // type u8
377
378 0o70; // type i32
379 0o70_i16; // type i16
380
381 0b1111_1111_1001_0000; // type i32
382 0b1111_1111_1001_0000i64; // type i64
383 0b________1; // type i32
384
385 0usize; // type usize
386 ```
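
One way to observe the types listed above is to measure the size of each literal's value with `std::mem::size_of_val`. This is only a sketch: the function name `integer_literal_sizes` is hypothetical, and size is a proxy for the type rather than a direct check of it.

```rust
use std::mem::size_of_val;

// Returns the size in bytes of four integer literals.
fn integer_literal_sizes() -> [usize; 4] {
    [
        size_of_val(&123),             // unconstrained, so it defaults to i32
        size_of_val(&0xff_u8),         // suffix forces u8
        size_of_val(&0o70_i16),        // suffix forces i16
        size_of_val(&0b1111_0000i64),  // suffix forces i64
    ]
}
```

Under these assumptions the function returns `[4, 1, 2, 8]`.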
387
388 Examples of invalid integer literals:
389
390 ```rust,ignore
391 // invalid suffixes
392
393 0invalidSuffix;
394
395 // uses numbers of the wrong base
396
397 123AFB43;
398 0b0102;
399 0o0581;
400
401 // integers too big for their type (they overflow)
402
403 128_i8;
404 256_u8;
405
406 // bin, hex and octal literals must have at least one digit
407
408 0b_;
409 0b____;
410 ```
411
412 Note that the Rust syntax considers `-1i8` as an application of the [unary minus
413 operator] to an integer literal `1i8`, rather than
414 a single integer literal.
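
A visible consequence is that a method call on a literal binds more tightly than the leading minus sign. The sketch below, with a hypothetical function name, contrasts the two groupings.

```rust
fn negation_examples() -> (i32, i32) {
    // Parsed as -(2i32.pow(2)): the minus is applied after the method call.
    let negated_square = -2i32.pow(2);
    // Parentheses make the negation part of the receiver.
    let square_of_negative = (-2i32).pow(2);
    (negated_square, square_of_negative)
}
```

Here `negation_examples()` returns `(-4, 4)`.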
415
416 [unary minus operator]: expressions/operator-expr.html#negation-operators
417
418 #### Floating-point literals
419
420 > **<sup>Lexer</sup>**
421 > FLOAT_LITERAL :
422 > &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL `.`
423 > _(not immediately followed by `.`, `_` or an [identifier]_)
424 > &nbsp;&nbsp; | DEC_LITERAL FLOAT_EXPONENT
425 > &nbsp;&nbsp; | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT<sup>?</sup>
426 > &nbsp;&nbsp; | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup>
427 > FLOAT_EXPONENT<sup>?</sup> FLOAT_SUFFIX
428 >
429 > FLOAT_EXPONENT :
430 > &nbsp;&nbsp; (`e`|`E`) (`+`|`-`)<sup>?</sup>
431 > (DEC_DIGIT|`_`)<sup>\*</sup> DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
432 >
433 > FLOAT_SUFFIX :
434 > &nbsp;&nbsp; `f32` | `f64`
435
436 A _floating-point literal_ has one of two forms:
437
438 * A _decimal literal_ followed by a period character `U+002E` (`.`). This is
439 optionally followed by another decimal literal, with an optional _exponent_.
440 * A single _decimal literal_ followed by an _exponent_.
441
442 Like integer literals, a floating-point literal may be followed by a
443 suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
444 The suffix forcibly sets the type of the literal. There are two valid
445 _floating-point suffixes_: `f32` and `f64`, the 32-bit and 64-bit
446 floating-point types.
447
448 The type of an _unsuffixed_ floating-point literal is determined by
449 type inference:
450
451 * If a floating-point type can be _uniquely_ determined from the
452 surrounding program context, the unsuffixed floating-point literal
453 has that type.
454
455 * If the program context under-constrains the type, it defaults to `f64`.
456
457 * If the program context over-constrains the type, it is considered a
458 static type error.
459
460 Examples of floating-point literals of various forms:
461
462 ```rust
463 123.0f64; // type f64
464 0.1f64; // type f64
465 0.1f32; // type f32
466 12E+99_f64; // type f64
467 let x: f64 = 2.; // type f64
468 ```
469
470 This last example is different because it is not possible to use the suffix
471 syntax with a floating point literal ending in a period. `2.f64` would attempt
472 to call a method named `f64` on `2`.
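
The defaulting and suffix rules can be observed in the same indirect way as for integers, by measuring value sizes. This is a sketch with a hypothetical function name, `float_literal_examples`.

```rust
use std::mem::size_of_val;

fn float_literal_examples() -> (usize, usize, f64) {
    // An exponent alone makes the literal floating-point; unconstrained, it is f64.
    let defaulted = size_of_val(&1e3);
    // The suffix forces f32.
    let single = size_of_val(&0.1f32);
    // A literal ending in a period takes its type from the annotation instead.
    let x: f64 = 2.;
    assert_eq!(x, 2.0f64);
    (defaulted, single, x)
}
```

Under these assumptions the function returns `(8, 4, 2.0)`.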
473
474 The representation semantics of floating-point numbers are described in
475 ["Machine Types"].
476
477 ["Machine Types"]: types.html#machine-types
478
479 ### Boolean literals
480
481 > **<sup>Lexer</sup>**
482 > BOOLEAN_LITERAL :
483 > &nbsp;&nbsp; &nbsp;&nbsp; `true`
484 > &nbsp;&nbsp; | `false`
485
486 The two values of the boolean type are written `true` and `false`.
487
488 ## Lifetimes and loop labels
489
490 > **<sup>Lexer</sup>**
491 > LIFETIME_TOKEN :
492 > &nbsp;&nbsp; &nbsp;&nbsp; `'` [IDENTIFIER_OR_KEYWORD][identifier]
493 > &nbsp;&nbsp; | `'_`
494 >
495 > LIFETIME_OR_LABEL :
496 > &nbsp;&nbsp; &nbsp;&nbsp; `'` [IDENTIFIER][identifier]
497
498 Lifetime parameters and [loop labels] use LIFETIME_OR_LABEL tokens. The
499 lexer accepts any LIFETIME_TOKEN, so such tokens can, for example, be used in
500 macros.
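
Both uses of LIFETIME_OR_LABEL can be sketched briefly. The function names `first_word` and `first_factor_pair` are hypothetical: the first uses `'a` as a lifetime parameter, the second uses `'outer` as a loop label.

```rust
// `'a` is a lifetime parameter tying the result to the input slice.
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split(' ').next().unwrap_or("")
}

// `'outer` is a loop label, so the inner loop can break out of both loops.
fn first_factor_pair(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for i in 2..10 {
        for j in 2..10 {
            if i * j == target {
                found = Some((i, j));
                break 'outer;
            }
        }
    }
    found
}
```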
501
502 [loop labels]: expressions/loop-expr.html
503
504 ## Symbols
505
506 Symbols are a general class of printable [tokens] that play structural
507 roles in a variety of grammar productions. They are the
508 remaining miscellaneous printable tokens that do not
509 otherwise appear as [unary operators], [binary
510 operators], or [keywords].
511 They are catalogued in [the Symbols section][symbols] of the Grammar document.
512
513 [unary operators]: expressions/operator-expr.html#borrow-operators
514 [binary operators]: expressions/operator-expr.html#arithmetic-and-logical-binary-operators
515 [tokens]: #tokens
516 [symbols]: ../grammar.html#symbols
517 [keywords]: keywords.html
518 [identifier]: identifiers.html
519 [tuples]: types.html#tuple-types
520 [tuple structs]: items/structs.html
521 [tuple variants]: items/enumerations.html