git.proxmox.com Git - rustc.git/blob - src/doc/reference/src/tokens.md
New upstream version 1.63.0+dfsg1
1 # Tokens
2
3 Tokens are primitive productions in the grammar defined by regular
4 (non-recursive) languages. Rust source input can be broken down
5 into the following kinds of tokens:
6
7 * [Keywords]
8 * [Identifiers][identifier]
9 * [Literals](#literals)
10 * [Lifetimes](#lifetimes-and-loop-labels)
11 * [Punctuation](#punctuation)
12 * [Delimiters](#delimiters)
13
14 Within this documentation's grammar, "simple" tokens are given in [string
15 table production] form, and appear in `monospace` font.
16
17 [string table production]: notation.md#string-table-productions
18
19 ## Literals
20
21 Literals are tokens used in [literal expressions].
22
23 ### Examples
24
25 #### Characters and strings
26
27 | | Example | `#` sets\* | Characters | Escapes |
28 |----------------------------------------------|-----------------|------------|-------------|---------------------|
29 | [Character](#character-literals) | `'H'` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
30 | [String](#string-literals) | `"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
31 | [Raw string](#raw-string-literals) | `r#"hello"#` | <256 | All Unicode | `N/A` |
32 | [Byte](#byte-literals) | `b'H'` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
33 | [Byte string](#byte-string-literals) | `b"hello"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
34 | [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | <256 | All ASCII | `N/A` |
35
36 \* The number of `#`s on each side of the same literal must be equivalent.
37
38 #### ASCII escapes
39
40 | | Name |
41 |---|------|
42 | `\x41` | 7-bit character code (exactly 2 digits, up to 0x7F) |
43 | `\n` | Newline |
44 | `\r` | Carriage return |
45 | `\t` | Tab |
46 | `\\` | Backslash |
47 | `\0` | Null |
48
49 #### Byte escapes
50
51 | | Name |
52 |---|------|
53 | `\x7F` | 8-bit character code (exactly 2 digits) |
54 | `\n` | Newline |
55 | `\r` | Carriage return |
56 | `\t` | Tab |
57 | `\\` | Backslash |
58 | `\0` | Null |
59
60 #### Unicode escapes
61
62 | | Name |
63 |---|------|
64 | `\u{7FFF}` | 24-bit Unicode character code (up to 6 digits) |
65
66 #### Quote escapes
67
68 | | Name |
69 |---|------|
70 | `\'` | Single quote |
71 | `\"` | Double quote |
72
73 #### Numbers
74
75 | [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
76 |----------------------------------------|---------|----------------|----------|
77 | Decimal integer | `98_222` | `N/A` | Integer suffixes |
78 | Hex integer | `0xff` | `N/A` | Integer suffixes |
79 | Octal integer | `0o77` | `N/A` | Integer suffixes |
80 | Binary integer | `0b1111_0000` | `N/A` | Integer suffixes |
81 | Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes |
82
83 `*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`
84
85 #### Suffixes
86
87 A suffix is a sequence of characters following the primary part of a literal (without intervening whitespace), of the same form as a non-raw identifier or keyword.
88
89 Any kind of literal (string, integer, etc.) with any suffix is valid as a token,
90 and can be passed to a macro without producing an error.
91 The macro itself will decide how to interpret such a token and whether to produce an error or not.
92
93 ```rust
94 macro_rules! blackhole { ($tt:tt) => () }
95
96 blackhole!("string"suffix); // OK
97 ```
98
99 However, suffixes on literal tokens parsed as Rust code are restricted.
100 Any suffixes are rejected on non-numeric literal tokens,
101 and numeric literal tokens are accepted only with suffixes from the list below.
102
103 | Integer | Floating-point |
104 |---------|----------------|
105 | `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, `isize` | `f32`, `f64` |
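As a rough illustration of the accepted suffixes, the sketch below uses a few entries from the table to select the type of a numeric literal and checks the resulting sizes. The function name `suffixed_sizes` is hypothetical and exists only for this example.

```rust
use std::mem::size_of_val;

// Hypothetical helper: sizes in bytes of three literals whose type is
// selected by a suffix from the table above.
fn suffixed_sizes() -> (usize, usize, usize) {
    (size_of_val(&1u8), size_of_val(&1i64), size_of_val(&1.0f32))
}

fn main() {
    // `u8` is 1 byte, `i64` is 8 bytes, `f32` is 4 bytes.
    assert_eq!(suffixed_sizes(), (1, 8, 4));
    // An underscore may separate the digits from the suffix.
    assert_eq!(size_of_val(&123_u32), 4);
}
```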
106
107 ### Character and string literals
108
109 #### Character literals
110
111 > **<sup>Lexer</sup>**\
112 > CHAR_LITERAL :\
113 > &nbsp;&nbsp; `'` ( ~\[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`
114 >
115 > QUOTE_ESCAPE :\
116 > &nbsp;&nbsp; `\'` | `\"`
117 >
118 > ASCII_ESCAPE :\
119 > &nbsp;&nbsp; &nbsp;&nbsp; `\x` OCT_DIGIT HEX_DIGIT\
120 > &nbsp;&nbsp; | `\n` | `\r` | `\t` | `\\` | `\0`
121 >
122 > UNICODE_ESCAPE :\
123 > &nbsp;&nbsp; `\u{` ( HEX_DIGIT `_`<sup>\*</sup> )<sup>1..6</sup> `}`
124
125 A _character literal_ is a single Unicode character enclosed within two
126 `U+0027` (single-quote) characters, with the exception of `U+0027` itself,
127 which must be _escaped_ by a preceding `U+005C` character (`\`).
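A minimal sketch of the character literal forms described above, written as a check of code point values. The helper name `char_codes` is an assumption made for this example.

```rust
// Hypothetical helper: the code point of each character literal as a u32.
fn char_codes() -> [u32; 5] {
    [
        'H' as u32,        // plain character
        '\'' as u32,       // quote escape for U+0027
        '\x41' as u32,     // ASCII escape, same as 'A'
        '\u{7FFF}' as u32, // Unicode escape
        '\n' as u32,       // newline escape
    ]
}

fn main() {
    assert_eq!(char_codes(), [0x48, 0x27, 0x41, 0x7FFF, 0x0A]);
    assert_eq!('\x41', 'A');
}
```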
128
129 #### String literals
130
131 > **<sup>Lexer</sup>**\
132 > STRING_LITERAL :\
133 > &nbsp;&nbsp; `"` (\
134 > &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_]\
135 > &nbsp;&nbsp; &nbsp;&nbsp; | QUOTE_ESCAPE\
136 > &nbsp;&nbsp; &nbsp;&nbsp; | ASCII_ESCAPE\
137 > &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
138 > &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
139 > &nbsp;&nbsp; )<sup>\*</sup> `"`
140 >
141 > STRING_CONTINUE :\
142 > &nbsp;&nbsp; `\` _followed by_ \\n
143
144 A _string literal_ is a sequence of any Unicode characters enclosed within two
145 `U+0022` (double-quote) characters, with the exception of `U+0022` itself,
146 which must be _escaped_ by a preceding `U+005C` character (`\`).
147
148 Line-breaks are allowed in string literals. A line-break is either a newline
149 (`U+000A`) or a pair of carriage return and newline (`U+000D`, `U+000A`). Both
150 byte sequences are normally translated to `U+000A`, but as a special exception,
151 when an unescaped `U+005C` character (`\`) occurs immediately before a line
152 break, then the line break character(s), and all immediately following
153 ` ` (`U+0020`), `\t` (`U+0009`), `\n` (`U+000A`) and `\r` (`U+000D`) characters
154 are ignored. Thus `a`, `b` and `c` are equal:
155
156 ```rust
157 let a = "foobar";
158 let b = "foo\
159 bar";
160 let c = "foo\
161
162 bar";
163
164 assert_eq!(a, b);
165 assert_eq!(b, c);
166 ```
167
168 > Note: Rust skipping additional newlines (like in example `c`) is potentially confusing and
169 > unexpected. This behavior may be adjusted in the future. Until a decision is made, it is
170 > recommended to avoid relying on this, i.e. skipping multiple newlines with line continuations.
171 > See [this issue](https://github.com/rust-lang/reference/pull/1042) for more information.
172
173 #### Character escapes
174
175 Some additional _escapes_ are available in either character or non-raw string
176 literals. An escape starts with a `U+005C` (`\`) and continues with one of the
177 following forms:
178
179 * A _7-bit code point escape_ starts with `U+0078` (`x`) and is
180 followed by exactly two _hex digits_ with value up to `0x7F`. It denotes the
181 ASCII character with value equal to the provided hex value. Higher values are
182 not permitted because it is ambiguous whether they mean Unicode code points or
183 byte values.
184 * A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
185 by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
186 (`}`). It denotes the Unicode code point equal to the provided hex value.
187 * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
188 (`r`), or `U+0074` (`t`), denoting the Unicode values `U+000A` (LF),
189 `U+000D` (CR) or `U+0009` (HT) respectively.
190 * The _null escape_ is the character `U+0030` (`0`) and denotes the Unicode
191 value `U+0000` (NUL).
192 * The _backslash escape_ is the character `U+005C` (`\`) which must be
193 escaped in order to denote itself.
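The escape forms listed above can be sketched as a byte-level check on a string literal. The helper name `escaped_bytes` is hypothetical, and the chosen code points are only examples.

```rust
// Hypothetical helper: a string built from a 7-bit escape, the three
// whitespace escapes, the null escape and the backslash escape.
fn escaped_bytes() -> Vec<u8> {
    "\x41\n\r\t\0\\".as_bytes().to_vec()
}

fn main() {
    assert_eq!(escaped_bytes(), vec![0x41, 0x0A, 0x0D, 0x09, 0x00, 0x5C]);
    // A 24-bit code point escape denotes a single character.
    assert_eq!("\u{41}", "A");
    assert_eq!("\u{1F600}".chars().count(), 1);
    // U+1F600 takes four bytes in UTF-8.
    assert_eq!("\u{1F600}".len(), 4);
}
```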
194
195 #### Raw string literals
196
197 > **<sup>Lexer</sup>**\
198 > RAW_STRING_LITERAL :\
199 > &nbsp;&nbsp; `r` RAW_STRING_CONTENT
200 >
201 > RAW_STRING_CONTENT :\
202 > &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
203 > &nbsp;&nbsp; | `#` RAW_STRING_CONTENT `#`
204
205 Raw string literals do not process any escapes. They start with the character
206 `U+0072` (`r`), followed by fewer than 256 of the character `U+0023` (`#`) and a
207 `U+0022` (double-quote) character. The _raw string body_ can contain any sequence
208 of Unicode characters and is terminated only by another `U+0022` (double-quote)
209 character, followed by the same number of `U+0023` (`#`) characters that preceded
210 the opening `U+0022` (double-quote) character.
211
212 All Unicode characters contained in the raw string body represent themselves;
213 the characters `U+0022` (double-quote) (except when followed by at least as
214 many `U+0023` (`#`) characters as were used to start the raw string literal) and
215 `U+005C` (`\`) do not have any special meaning.
216
217 Examples for string literals:
218
219 ```rust
220 "foo"; r"foo"; // foo
221 "\"foo\""; r#""foo""#; // "foo"
222
223 "foo #\"# bar";
224 r##"foo #"# bar"##; // foo #"# bar
225
226 "\x52"; "R"; r"R"; // R
227 "\\x52"; r"\x52"; // \x52
228 ```
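The equivalences noted in the comments above can also be written as assertions. This is only a sketch; the helper name `raw_forms` is an assumption for the example.

```rust
// Hypothetical helper returning three raw string literals.
fn raw_forms() -> [&'static str; 3] {
    [r#""foo""#, r##"foo #"# bar"##, r"\x52"]
}

fn main() {
    let [quoted, hashes, backslash] = raw_forms();
    // Each raw form equals the non-raw form with explicit escapes.
    assert_eq!(quoted, "\"foo\"");
    assert_eq!(hashes, "foo #\"# bar");
    assert_eq!(backslash, "\\x52");
    // The raw body keeps the backslash, so it is four characters long.
    assert_eq!(backslash.len(), 4);
}
```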
229
230 ### Byte and byte string literals
231
232 #### Byte literals
233
234 > **<sup>Lexer</sup>**\
235 > BYTE_LITERAL :\
236 > &nbsp;&nbsp; `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'`
237 >
238 > ASCII_FOR_CHAR :\
239 > &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t
240 >
241 > BYTE_ESCAPE :\
242 > &nbsp;&nbsp; &nbsp;&nbsp; `\x` HEX_DIGIT HEX_DIGIT\
243 > &nbsp;&nbsp; | `\n` | `\r` | `\t` | `\\` | `\0` | `\'` | `\"`
244
245 A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
246 range) or a single _escape_ preceded by the characters `U+0062` (`b`) and
247 `U+0027` (single-quote), and followed by the character `U+0027`. If the character
248 `U+0027` is present within the literal, it must be _escaped_ by a preceding
249 `U+005C` (`\`) character. It is equivalent to a `u8` unsigned 8-bit integer
250 _number literal_.
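As a small sketch of the equivalence with `u8` values, the following compares a few byte literals with their numeric values. The helper name `byte_values` is hypothetical.

```rust
// Hypothetical helper: byte literals are plain `u8` values.
fn byte_values() -> [u8; 4] {
    [b'H', b'\'', b'\x7F', b'\n']
}

fn main() {
    assert_eq!(byte_values(), [72, 39, 127, 10]);
    let h: u8 = b'H';
    assert_eq!(h, 72u8);
}
```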
251
252 #### Byte string literals
253
254 > **<sup>Lexer</sup>**\
255 > BYTE_STRING_LITERAL :\
256 > &nbsp;&nbsp; `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"`
257 >
258 > ASCII_FOR_STRING :\
259 > &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_
260
261 A non-raw _byte string literal_ is a sequence of ASCII characters and _escapes_,
262 preceded by the characters `U+0062` (`b`) and `U+0022` (double-quote), and
263 followed by the character `U+0022`. If the character `U+0022` is present within
264 the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
265 Alternatively, a byte string literal can be a _raw byte string literal_, defined
266 below. The type of a byte string literal of length `n` is `&'static [u8; n]`.
267
268 Some additional _escapes_ are available in either byte or non-raw byte string
269 literals. An escape starts with a `U+005C` (`\`) and continues with one of the
270 following forms:
271
272 * A _byte escape_ starts with `U+0078` (`x`) and is
273 followed by exactly two _hex digits_. It denotes the byte
274 equal to the provided hex value.
275 * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
276 (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
277 `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
278 * The _null escape_ is the character `U+0030` (`0`) and denotes the byte
279 value `0x00` (ASCII NUL).
280 * The _backslash escape_ is the character `U+005C` (`\`) which must be
281 escaped in order to denote its ASCII encoding `0x5C`.
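A brief sketch of the type and escape rules above: the return types spell out the `&'static [u8; n]` type, and the second literal uses one escape of each kind. The function names `hello` and `escapes` are assumptions for this example.

```rust
// Hypothetical helper: a byte string of length 5 has type `&'static [u8; 5]`.
fn hello() -> &'static [u8; 5] {
    b"hello"
}

// Hypothetical helper: byte escape, whitespace escapes, null escape, backslash escape.
fn escapes() -> &'static [u8; 6] {
    b"\xFF\n\r\t\0\\"
}

fn main() {
    assert_eq!(*hello(), [104, 101, 108, 108, 111]);
    // Unlike `\x` in a string literal, a byte escape may exceed 0x7F.
    assert_eq!(*escapes(), [0xFF, 0x0A, 0x0D, 0x09, 0x00, 0x5C]);
}
```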
282
283 #### Raw byte string literals
284
285 > **<sup>Lexer</sup>**\
286 > RAW_BYTE_STRING_LITERAL :\
287 > &nbsp;&nbsp; `br` RAW_BYTE_STRING_CONTENT
288 >
289 > RAW_BYTE_STRING_CONTENT :\
290 > &nbsp;&nbsp; &nbsp;&nbsp; `"` ASCII<sup>* (non-greedy)</sup> `"`\
291 > &nbsp;&nbsp; | `#` RAW_BYTE_STRING_CONTENT `#`
292 >
293 > ASCII :\
294 > &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F)_
295
296 Raw byte string literals do not process any escapes. They start with the
297 character `U+0062` (`b`), followed by `U+0072` (`r`), followed by fewer than 256
298 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
299 _raw string body_ can contain any sequence of ASCII characters and is terminated
300 only by another `U+0022` (double-quote) character, followed by the same number of
301 `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote)
302 character. A raw byte string literal cannot contain any non-ASCII byte.
303
304 All characters contained in the raw string body represent their ASCII encoding;
305 the characters `U+0022` (double-quote) (except when followed by at least as
306 many `U+0023` (`#`) characters as were used to start the raw string literal) and
307 `U+005C` (`\`) do not have any special meaning.
308
309 Examples for byte string literals:
310
311 ```rust
312 b"foo"; br"foo"; // foo
313 b"\"foo\""; br#""foo""#; // "foo"
314
315 b"foo #\"# bar";
316 br##"foo #"# bar"##; // foo #"# bar
317
318 b"\x52"; b"R"; br"R"; // R
319 b"\\x52"; br"\x52"; // \x52
320 ```
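The comments in the example above can be turned into assertions along these lines. This is a sketch only, and `raw_byte_forms` is a hypothetical helper name.

```rust
// Hypothetical helper returning three raw byte string literals as slices.
fn raw_byte_forms() -> [&'static [u8]; 3] {
    [br#""foo""#, br##"foo #"# bar"##, br"\x52"]
}

fn main() {
    let [quoted, hashes, backslash] = raw_byte_forms();
    // Each raw form equals the non-raw form with explicit escapes.
    assert_eq!(quoted, b"\"foo\"");
    assert_eq!(hashes, b"foo #\"# bar");
    assert_eq!(backslash, b"\\x52");
    // No escape is processed, so `\x52` stays four bytes long.
    assert_eq!(backslash.len(), 4);
}
```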
321
322 ### Number literals
323
324 A _number literal_ is either an _integer literal_ or a _floating-point
325 literal_. The grammar for recognizing the two kinds of literals is mixed.
326
327 #### Integer literals
328
329 > **<sup>Lexer</sup>**\
330 > INTEGER_LITERAL :\
331 > &nbsp;&nbsp; ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )
332 > INTEGER_SUFFIX<sup>?</sup>
333 >
334 > DEC_LITERAL :\
335 > &nbsp;&nbsp; DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
336 >
337 > BIN_LITERAL :\
338 > &nbsp;&nbsp; `0b` (BIN_DIGIT|`_`)<sup>\*</sup> BIN_DIGIT (BIN_DIGIT|`_`)<sup>\*</sup>
339 >
340 > OCT_LITERAL :\
341 > &nbsp;&nbsp; `0o` (OCT_DIGIT|`_`)<sup>\*</sup> OCT_DIGIT (OCT_DIGIT|`_`)<sup>\*</sup>
342 >
343 > HEX_LITERAL :\
344 > &nbsp;&nbsp; `0x` (HEX_DIGIT|`_`)<sup>\*</sup> HEX_DIGIT (HEX_DIGIT|`_`)<sup>\*</sup>
345 >
346 > BIN_DIGIT : \[`0`-`1`]
347 >
348 > OCT_DIGIT : \[`0`-`7`]
349 >
350 > DEC_DIGIT : \[`0`-`9`]
351 >
352 > HEX_DIGIT : \[`0`-`9` `a`-`f` `A`-`F`]
353 >
354 > INTEGER_SUFFIX :\
355 > &nbsp;&nbsp; &nbsp;&nbsp; `u8` | `u16` | `u32` | `u64` | `u128` | `usize`\
356 > &nbsp;&nbsp; | `i8` | `i16` | `i32` | `i64` | `i128` | `isize`
357
358 An _integer literal_ has one of four forms:
359
360 * A _decimal literal_ starts with a *decimal digit* and continues with any
361 mixture of *decimal digits* and _underscores_.
362 * A _hex literal_ starts with the character sequence `U+0030` `U+0078`
363 (`0x`) and continues as any mixture (with at least one digit) of hex digits
364 and underscores.
365 * An _octal literal_ starts with the character sequence `U+0030` `U+006F`
366 (`0o`) and continues as any mixture (with at least one digit) of octal digits
367 and underscores.
368 * A _binary literal_ starts with the character sequence `U+0030` `U+0062`
369 (`0b`) and continues as any mixture (with at least one digit) of binary digits
370 and underscores.
371
372 Like any literal, an integer literal may be followed (immediately, without any spaces) by an _integer suffix_, which must be the name of one of the [primitive integer types][numeric types]:
373 `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`.
374 See [literal expressions] for the effect of these suffixes.
375
376 Examples of integer literals of various forms:
377
378 ```rust
379 # #![allow(overflowing_literals)]
380 123;
381 123i32;
382 123u32;
383 123_u32;
384
385 0xff;
386 0xff_u8;
387 0x01_f32; // integer 7986, not floating-point 1.0
388 0x01_e3; // integer 483, not floating-point 1000.0
389
390 0o70;
391 0o70_i16;
392
393 0b1111_1111_1001_0000;
394 0b1111_1111_1001_0000i64;
395 0b________1;
396
397 0usize;
398
399 // These are too big for their type, but are still valid tokens
400
401 128_i8;
402 256_u8;
403
404 ```
405
406 Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`.
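To make the four radix forms and the two-token reading of `-1i8` concrete, here is a minimal sketch. The `count_tokens` macro and the `radix_values` function are hypothetical helpers written for this example. The macro counts token trees, which for these inputs coincide with tokens.

```rust
// Hypothetical helper: counts the token trees passed to it.
macro_rules! count_tokens {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tokens!($($tail)*) };
}

// Hypothetical helper: one literal per integer form.
fn radix_values() -> [u32; 4] {
    [98_222, 0xff, 0o77, 0b1111_0000]
}

fn main() {
    assert_eq!(radix_values(), [98222, 255, 63, 240]);
    // `f32` here is a run of hex digits, not a floating-point suffix.
    assert_eq!(0x01_f32, 7986);
    // `1i8` is one token; `-1i8` is `-` followed by `1i8`.
    assert_eq!(count_tokens!(1i8), 1);
    assert_eq!(count_tokens!(-1i8), 2);
}
```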
407
408 Examples of invalid integer literals:
409
410 ```rust,compile_fail
411 // uses numbers of the wrong base
412
413 0b0102;
414 0o0581;
415
416 // bin, hex, and octal literals must have at least one digit
417
418 0b_;
419 0b____;
420 ```
421
422 #### Tuple index
423
424 > **<sup>Lexer</sup>**\
425 > TUPLE_INDEX: \
426 > &nbsp;&nbsp; INTEGER_LITERAL
427
428 A tuple index is used to refer to the fields of [tuples], [tuple structs], and
429 [tuple variants].
430
431 Tuple indices are compared with the literal token directly. Tuple indices
432 start with `0` and each successive index increments the value by `1` as a
433 decimal value. Thus, only decimal values will match, and the value must not
434 have any extra `0` prefix characters.
435
436 ```rust,compile_fail
437 let example = ("dog", "cat", "horse");
438 let dog = example.0;
439 let cat = example.1;
440 // The following examples are invalid.
441 let cat = example.01; // ERROR no field named `01`
442 let horse = example.0b10; // ERROR no field named `0b10`
443 ```
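For contrast with the invalid cases above, this sketch uses only plain decimal indices on a tuple and a tuple struct. The names `Point` and `fields` are assumptions made for the example.

```rust
// Hypothetical tuple struct used only for this example.
struct Point(i32, i32);

// Reads fields with plain decimal indices and no leading zeros.
fn fields() -> (&'static str, i32) {
    let example = ("dog", "cat", "horse");
    let p = Point(3, 4);
    (example.2, p.0 + p.1)
}

fn main() {
    assert_eq!(fields(), ("horse", 7));
}
```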
444
445 > **Note**: The tuple index may include an `INTEGER_SUFFIX`, but this is not
446 > intended to be valid, and may be removed in a future version. See
447 > <https://github.com/rust-lang/rust/issues/60210> for more information.
448
449 #### Floating-point literals
450
451 > **<sup>Lexer</sup>**\
452 > FLOAT_LITERAL :\
453 > &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL `.`
454 > _(not immediately followed by `.`, `_` or an XID_Start character)_\
455 > &nbsp;&nbsp; | DEC_LITERAL FLOAT_EXPONENT\
456 > &nbsp;&nbsp; | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT<sup>?</sup>\
457 > &nbsp;&nbsp; | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup>
458 > FLOAT_EXPONENT<sup>?</sup> FLOAT_SUFFIX
459 >
460 > FLOAT_EXPONENT :\
461 > &nbsp;&nbsp; (`e`|`E`) (`+`|`-`)<sup>?</sup>
462 > (DEC_DIGIT|`_`)<sup>\*</sup> DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
463 >
464 > FLOAT_SUFFIX :\
465 > &nbsp;&nbsp; `f32` | `f64`
466
467 A _floating-point literal_ has one of three forms:
468
469 * A _decimal literal_ followed by a period character `U+002E` (`.`). This is
470 optionally followed by another decimal literal, with an optional _exponent_.
471 * A single _decimal literal_ followed by an _exponent_.
472 * A single _decimal literal_ (in which case a suffix is required).
473
474 Like integer literals, a floating-point literal may be followed by a
475 suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
476 There are two valid _floating-point suffixes_: `f32` and `f64` (the names of the 32-bit and 64-bit [primitive floating-point types][floating-point types]).
477 See [literal expressions] for the effect of these suffixes.
478
479 Examples of floating-point literals of various forms:
480
481 ```rust
482 123.0f64;
483 0.1f64;
484 0.1f32;
485 12E+99_f64;
486 5f32;
487 let x: f64 = 2.;
488 ```
489
490 This last example is different because it is not possible to use the suffix
491 syntax with a floating point literal ending in a period. `2.f64` would attempt
492 to call a method named `f64` on `2`.
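A small sketch of the forms described above, checked by value. The helper name `float_values` is hypothetical, and the values were picked so that each is exactly representable.

```rust
// Hypothetical helper: a trailing-period form, an exponent-only form,
// an exponent with a suffix, and a fractional form with a separator.
fn float_values() -> [f64; 4] {
    [2., 1e3, 12E+2_f64, 1_234.5]
}

fn main() {
    assert_eq!(float_values(), [2.0, 1000.0, 1200.0, 1234.5]);
    // A bare decimal literal becomes floating-point only with a suffix.
    assert_eq!(5f32, 5.0_f32);
}
```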
493
494 Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`.
495
496 #### Number pseudoliterals
497
498 > **<sup>Lexer</sup>**\
499 > NUMBER_PSEUDOLITERAL :\
500 > &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL ( . DEC_LITERAL )<sup>?</sup> FLOAT_EXPONENT\
501 > &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX | INTEGER_SUFFIX )\
502 > &nbsp;&nbsp; | DEC_LITERAL . DEC_LITERAL\
503 > &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | INTEGER_SUFFIX )\
504 > &nbsp;&nbsp; | DEC_LITERAL NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\
505 > &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )\
506 > &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | FLOAT_SUFFIX )
507 >
508 > NUMBER_PSEUDOLITERAL_SUFFIX :\
509 > &nbsp;&nbsp; IDENTIFIER_OR_KEYWORD <sub>_not matching INTEGER_SUFFIX or FLOAT_SUFFIX_</sub>
510 >
511 > NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\
512 > &nbsp;&nbsp; NUMBER_PSEUDOLITERAL_SUFFIX <sub>_not beginning with `e` or `E`_</sub>
513
514 Tokenization of numeric literals allows arbitrary suffixes as described in the grammar above.
515 These values generate valid tokens, but are not valid [literal expressions], so are usually an error except as macro arguments.
516
517 Examples of such tokens:
518 ```rust,compile_fail
519 0invalidSuffix;
520 123AFB43;
521 0b010a;
522 0xAB_CD_EF_GH;
523 2.0f80;
524 2e5f80;
525 2e5e6;
526 2.0e5e6;
527 1.3e10u64;
528 0b1111_f32;
529 ```
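By contrast, the same kind of token is accepted when it is only passed to a macro and never parsed as an expression. The sketch below assumes a hypothetical `token_text` macro that turns one token tree back into text with `stringify!`.

```rust
// Hypothetical helper: turns a single token tree back into source text.
macro_rules! token_text {
    ($tt:tt) => {
        stringify!($tt)
    };
}

// Each pseudoliteral is a single token, so it matches one `tt` fragment.
fn pseudoliteral_texts() -> [&'static str; 3] {
    [
        token_text!(0invalidSuffix),
        token_text!(123AFB43),
        token_text!(2.0f80),
    ]
}

fn main() {
    assert_eq!(pseudoliteral_texts(), ["0invalidSuffix", "123AFB43", "2.0f80"]);
}
```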
530
531 #### Reserved forms similar to number literals
532
533 > **<sup>Lexer</sup>**\
534 > RESERVED_NUMBER :\
535 > &nbsp;&nbsp; &nbsp;&nbsp; BIN_LITERAL \[`2`-`9`&ZeroWidthSpace;]\
536 > &nbsp;&nbsp; | OCT_LITERAL \[`8`-`9`&ZeroWidthSpace;]\
537 > &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` \
538 > &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; _(not immediately followed by `.`, `_` or an XID_Start character)_\
539 > &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL ) `e`\
540 > &nbsp;&nbsp; | `0b` `_`<sup>\*</sup> _end of input or not BIN_DIGIT_\
541 > &nbsp;&nbsp; | `0o` `_`<sup>\*</sup> _end of input or not OCT_DIGIT_\
542 > &nbsp;&nbsp; | `0x` `_`<sup>\*</sup> _end of input or not HEX_DIGIT_\
543 > &nbsp;&nbsp; | DEC_LITERAL ( . DEC_LITERAL)<sup>?</sup> (`e`|`E`) (`+`|`-`)<sup>?</sup> _end of input or not DEC_DIGIT_
544
545 The following lexical forms similar to number literals are _reserved forms_.
546 Due to the possible ambiguity these raise, they are rejected by the tokenizer instead of being interpreted as separate tokens.
547
548 * An unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit out of the range for its radix.
549
550 * An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals).
551
552 * An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e`.
553
554 * Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits).
555
556 * Input which has the form of a floating-point literal with no digits in the exponent.
557
558 Examples of reserved forms:
559
560 ```rust,compile_fail
561 0b0102; // this is not `0b010` followed by `2`
562 0o1279; // this is not `0o127` followed by `9`
563 0x80.0; // this is not `0x80` followed by `.` and `0`
564 0b101e; // this is not a pseudoliteral, or `0b101` followed by `e`
565 0b; // this is not a pseudoliteral, or `0` followed by `b`
566 0b_; // this is not a pseudoliteral, or `0` followed by `b_`
567 2e; // this is not a pseudoliteral, or `2` followed by `e`
568 2.0e; // this is not a pseudoliteral, or `2.0` followed by `e`
569 2em; // this is not a pseudoliteral, or `2` followed by `em`
570 2.0em; // this is not a pseudoliteral, or `2.0` followed by `em`
571 ```
572
573 ## Lifetimes and loop labels
574
575 > **<sup>Lexer</sup>**\
576 > LIFETIME_TOKEN :\
577 > &nbsp;&nbsp; &nbsp;&nbsp; `'` [IDENTIFIER_OR_KEYWORD][identifier]\
578 > &nbsp;&nbsp; | `'_`
579 >
580 > LIFETIME_OR_LABEL :\
581 > &nbsp;&nbsp; &nbsp;&nbsp; `'` [NON_KEYWORD_IDENTIFIER][identifier]
582
583 Lifetime parameters and [loop labels] use LIFETIME_OR_LABEL tokens. Any
584 LIFETIME_TOKEN will be accepted by the lexer and can, for example, be used in
585 macros.
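The three uses mentioned above might look as follows. This is a sketch only: `first_word`, `find_pair` and `lifetime_text` are hypothetical names, and `'static` stands in for a LIFETIME_TOKEN built from a keyword.

```rust
// `'a` is a lifetime parameter (a LIFETIME_OR_LABEL token).
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

// `'outer` is a loop label; the function returns the first pair of
// factors in 1..10 whose product equals `target`.
fn find_pair(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for i in 1..10 {
        for j in 1..10 {
            if i * j == target {
                found = Some((i, j));
                break 'outer;
            }
        }
    }
    found
}

// Hypothetical helper: a macro receives a lifetime token as one token tree.
macro_rules! lifetime_text {
    ($lt:tt) => {
        stringify!($lt)
    };
}

fn main() {
    assert_eq!(first_word("hello world"), "hello");
    assert_eq!(find_pair(12), Some((2, 6)));
    assert_eq!(lifetime_text!('static), "'static");
}
```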
586
587 ## Punctuation
588
589 Punctuation symbol tokens are listed here for completeness. Their individual
590 usages and meanings are defined in the linked pages.
591
592 | Symbol | Name | Usage |
593 |--------|-------------|-------|
594 | `+` | Plus | [Addition][arith], [Trait Bounds], [Macro Kleene Matcher][macros]
595 | `-` | Minus | [Subtraction][arith], [Negation]
596 | `*` | Star | [Multiplication][arith], [Dereference], [Raw Pointers], [Macro Kleene Matcher][macros], [Use wildcards]
597 | `/` | Slash | [Division][arith]
598 | `%` | Percent | [Remainder][arith]
599 | `^` | Caret | [Bitwise and Logical XOR][arith]
600 | `!` | Not | [Bitwise and Logical NOT][negation], [Macro Calls][macros], [Inner Attributes][attributes], [Never Type], [Negative impls]
601 | `&` | And | [Bitwise and Logical AND][arith], [Borrow], [References], [Reference patterns]
602 | <code>\|</code> | Or | [Bitwise and Logical OR][arith], [Closures], Patterns in [match], [if let], and [while let]
603 | `&&` | AndAnd | [Lazy AND][lazy-bool], [Borrow], [References], [Reference patterns]
604 | <code>\|\|</code> | OrOr | [Lazy OR][lazy-bool], [Closures]
605 | `<<` | Shl | [Shift Left][arith], [Nested Generics][generics]
606 | `>>` | Shr | [Shift Right][arith], [Nested Generics][generics]
607 | `+=` | PlusEq | [Addition assignment][compound]
608 | `-=` | MinusEq | [Subtraction assignment][compound]
609 | `*=` | StarEq | [Multiplication assignment][compound]
610 | `/=` | SlashEq | [Division assignment][compound]
611 | `%=` | PercentEq | [Remainder assignment][compound]
612 | `^=` | CaretEq | [Bitwise XOR assignment][compound]
613 | `&=` | AndEq | [Bitwise And assignment][compound]
614 | <code>\|=</code> | OrEq | [Bitwise Or assignment][compound]
615 | `<<=` | ShlEq | [Shift Left assignment][compound]
616 | `>>=` | ShrEq | [Shift Right assignment][compound], [Nested Generics][generics]
617 | `=` | Eq | [Assignment], [Attributes], Various type definitions
618 | `==` | EqEq | [Equal][comparison]
619 | `!=` | Ne | [Not Equal][comparison]
620 | `>` | Gt | [Greater than][comparison], [Generics], [Paths]
621 | `<` | Lt | [Less than][comparison], [Generics], [Paths]
622 | `>=` | Ge | [Greater than or equal to][comparison], [Generics]
623 | `<=` | Le | [Less than or equal to][comparison]
624 | `@` | At | [Subpattern binding]
625 | `_` | Underscore | [Wildcard patterns], [Inferred types], Unnamed items in [constants], [extern crates], [use declarations], and [destructuring assignment]
626 | `.` | Dot | [Field access][field], [Tuple index]
627 | `..` | DotDot | [Range][range], [Struct expressions], [Patterns], [Range Patterns][rangepat]
628 | `...` | DotDotDot | [Variadic functions][extern], [Range patterns]
629 | `..=` | DotDotEq | [Inclusive Range][range], [Range patterns]
630 | `,` | Comma | Various separators
631 | `;` | Semi | Terminator for various items and statements, [Array types]
632 | `:` | Colon | Various separators
633 | `::` | PathSep | [Path separator][paths]
634 | `->` | RArrow | [Function return type][functions], [Closure return type][closures], [Function pointer type]
635 | `=>` | FatArrow | [Match arms][match], [Macros]
636 | `#` | Pound | [Attributes]
637 | `$` | Dollar | [Macros]
638 | `?` | Question | [Question mark operator][question], [Questionably sized][sized], [Macro Kleene Matcher][macros]
639 | `~` | Tilde | The tilde operator has been unused since before Rust 1.0, but its token may still be used
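Several symbols in the table have more than one usage. As an informal sketch, the functions below (all hypothetical names) use `>>`, `&&` and `||` in their non-operator roles, and `main` then uses the shift operators for comparison.

```rust
fn nested_len() -> usize {
    // `>>` here closes two generic argument lists; it is not a shift.
    let nested: Vec<Vec<u8>> = vec![vec![1, 2], vec![3]];
    nested.iter().map(|inner| inner.len()).sum()
}

fn double_borrow() -> i32 {
    // `&&` here is two borrows, not a lazy AND.
    let r: &&i32 = &&5;
    **r
}

fn call_closure() -> i32 {
    // `||` here is an empty closure parameter list, not a lazy OR.
    let f = || 1;
    f()
}

fn main() {
    assert_eq!(nested_len(), 3);
    assert_eq!(double_borrow(), 5);
    assert_eq!(call_closure(), 1);
    // The same symbols as operators.
    assert_eq!(1 << 2 >> 1, 2);
    assert!(true && (false || true));
}
```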
640
641 ## Delimiters
642
643 Bracket punctuation is used in various parts of the grammar. An open bracket
644 must always be paired with a close bracket. Brackets and the tokens within
645 them are referred to as "token trees" in [macros]. The three types of brackets are:
646
647 | Bracket | Type |
648 |---------|-----------------|
649 | `{` `}` | Curly braces |
650 | `[` `]` | Square brackets |
651 | `(` `)` | Parentheses |
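To illustrate how a bracketed group forms a single token tree, the sketch below uses a hypothetical `is_single_tree` macro whose first arm matches exactly one `tt` fragment. The helper name `tree_checks` is likewise an assumption.

```rust
// Hypothetical helper: true when the input is exactly one token tree.
macro_rules! is_single_tree {
    ($single:tt) => { true };
    ($($many:tt)*) => { false };
}

fn tree_checks() -> [bool; 4] {
    [
        // One parenthesized group, however much it contains, is one tree.
        is_single_tree!((a, b [c] { d })),
        is_single_tree!([1 2 3]),
        // Two identifiers are two trees.
        is_single_tree!(a b),
        // Two empty groups are two trees.
        is_single_tree!(() ()),
    ]
}

fn main() {
    assert_eq!(tree_checks(), [true, true, false, false]);
}
```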
652
653 ## Reserved prefixes
654
655 > **<sup>Lexer 2021+</sup>**\
656 > RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `r` or `br`_</sub> | `_` ) `"`\
657 > RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b`_</sub> | `_` ) `'`\
658 > RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br`_</sub> | `_` ) `#`
659
660 Some lexical forms known as _reserved prefixes_ are reserved for future use.
661
662 Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix.
663
664 Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix.
665
666 Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes.
667
668 > **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).
669 >
670 > Before the 2021 edition, reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a `#` token).
671 >
672 > Examples accepted in all editions:
673 > ```rust
674 > macro_rules! lexes {($($_:tt)*) => {}}
675 > lexes!{a #foo}
676 > lexes!{continue 'foo}
677 > lexes!{match "..." {}}
678 > lexes!{r#let#foo} // three tokens: r#let # foo
679 > ```
680 >
681 > Examples accepted before the 2021 edition but rejected later:
682 > ```rust,edition2018
683 > macro_rules! lexes {($($_:tt)*) => {}}
684 > lexes!{a#foo}
685 > lexes!{continue'foo}
686 > lexes!{match"..." {}}
687 > ```
688
689 [Inferred types]: types/inferred.md
690 [Range patterns]: patterns.md#range-patterns
691 [Reference patterns]: patterns.md#reference-patterns
692 [Subpattern binding]: patterns.md#identifier-patterns
693 [Wildcard patterns]: patterns.md#wildcard-pattern
694 [arith]: expressions/operator-expr.md#arithmetic-and-logical-binary-operators
695 [array types]: types/array.md
696 [assignment]: expressions/operator-expr.md#assignment-expressions
697 [attributes]: attributes.md
698 [borrow]: expressions/operator-expr.md#borrow-operators
699 [closures]: expressions/closure-expr.md
700 [comparison]: expressions/operator-expr.md#comparison-operators
701 [compound]: expressions/operator-expr.md#compound-assignment-expressions
702 [constants]: items/constant-items.md
703 [dereference]: expressions/operator-expr.md#the-dereference-operator
704 [destructuring assignment]: expressions/underscore-expr.md
705 [extern crates]: items/extern-crates.md
706 [extern]: items/external-blocks.md
707 [field]: expressions/field-expr.md
708 [floating-point types]: types/numeric.md#floating-point-types
709 [function pointer type]: types/function-pointer.md
710 [functions]: items/functions.md
711 [generics]: items/generics.md
712 [identifier]: identifiers.md
713 [if let]: expressions/if-expr.md#if-let-expressions
714 [keywords]: keywords.md
715 [lazy-bool]: expressions/operator-expr.md#lazy-boolean-operators
716 [literal expressions]: expressions/literal-expr.md
717 [loop labels]: expressions/loop-expr.md
718 [macros]: macros-by-example.md
719 [match]: expressions/match-expr.md
720 [negation]: expressions/operator-expr.md#negation-operators
721 [negative impls]: items/implementations.md
722 [never type]: types/never.md
723 [numeric types]: types/numeric.md
724 [paths]: paths.md
725 [patterns]: patterns.md
726 [question]: expressions/operator-expr.md#the-question-mark-operator
727 [range]: expressions/range-expr.md
728 [rangepat]: patterns.md#range-patterns
729 [raw pointers]: types/pointer.md#raw-pointers-const-and-mut
730 [references]: types/pointer.md
731 [sized]: trait-bounds.md#sized
732 [struct expressions]: expressions/struct-expr.md
733 [trait bounds]: trait-bounds.md
734 [tuple index]: expressions/tuple-expr.md#tuple-indexing-expressions
735 [tuple structs]: items/structs.md
736 [tuple variants]: items/enumerations.md
737 [tuples]: types/tuple.md
738 [unary minus operator]: expressions/operator-expr.md#negation-operators
739 [use declarations]: items/use-declarations.md
740 [use wildcards]: items/use-declarations.md
741 [while let]: expressions/loop-expr.md#predicate-pattern-loops