New upstream version 1.23.0+dfsg1
# Tokens

Tokens are primitive productions in the grammar defined by regular
(non-recursive) languages. "Simple" tokens are given in [string table
production] form, and occur in the rest of the
grammar as double-quoted strings. Other tokens have exact rules given.

[string table production]: notation.html#string-table-productions

## Literals

A literal is an expression consisting of a single token, rather than a sequence
of tokens, that immediately and directly denotes the value it evaluates to,
rather than referring to it by name or some other evaluation rule. A literal is
a form of [constant expression](expressions.html#constant-expressions), so is
evaluated (primarily) at compile time.

### Examples

#### Characters and strings

| | Example | `#` sets | Characters | Escapes |
|----------------------------------------------|-----------------|------------|-------------|---------------------|
| [Character](#character-literals) | `'H'` | `N/A` | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
| [String](#string-literals) | `"hello"` | `N/A` | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
| [Raw](#raw-string-literals) | `r#"hello"#` | `0...` | All Unicode | `N/A` |
| [Byte](#byte-literals) | `b'H'` | `N/A` | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Byte string](#byte-string-literals) | `b"hello"` | `N/A` | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | `0...` | All ASCII | `N/A` |

#### ASCII escapes

| | Name |
|---|------|
| `\x41` | 7-bit character code (exactly 2 digits, up to 0x7F) |
| `\n` | Newline |
| `\r` | Carriage return |
| `\t` | Tab |
| `\\` | Backslash |
| `\0` | Null |

#### Byte escapes

| | Name |
|---|------|
| `\x7F` | 8-bit character code (exactly 2 digits) |
| `\n` | Newline |
| `\r` | Carriage return |
| `\t` | Tab |
| `\\` | Backslash |
| `\0` | Null |

#### Unicode escapes

| | Name |
|---|------|
| `\u{7FFF}` | 24-bit Unicode character code (up to 6 digits) |

#### Quote escapes

| | Name |
|---|------|
| `\'` | Single quote |
| `\"` | Double quote |

#### Numbers

| [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
|----------------------------------------|---------|----------------|----------|
| Decimal integer | `98_222` | `N/A` | Integer suffixes |
| Hex integer | `0xff` | `N/A` | Integer suffixes |
| Octal integer | `0o77` | `N/A` | Integer suffixes |
| Binary integer | `0b1111_0000` | `N/A` | Integer suffixes |
| Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes |

`*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`

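
As a quick check of the number table, the following sketch confirms that the base prefixes and the `_` separator are purely notational and do not change the value. The helper name `literal_values` is hypothetical and exists only for this illustration.

```rust
/// Returns the example literals from the number table as plain values.
/// (`literal_values` is an illustrative helper, not part of any library.)
fn literal_values() -> (i32, i32, i32, i32, f64) {
    (98_222, 0xff, 0o77, 0b1111_0000, 1_234.0E+18f64)
}

fn main() {
    let (dec, hex, oct, bin, float) = literal_values();
    // Underscores are ignored; the prefix only selects the base.
    assert_eq!(dec, 98222);
    assert_eq!(hex, 255);
    assert_eq!(oct, 63);
    assert_eq!(bin, 240);
    // The separator and the suffix do not affect the floating-point value.
    assert_eq!(float, 1234.0e18);
}
```
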
#### Suffixes

| Integer | Floating-point |
|---------|----------------|
| `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `isize`, `usize` | `f32`, `f64` |

### Character and string literals

#### Character literals

> **<sup>Lexer</sup>**
> CHAR_LITERAL :
> &nbsp;&nbsp; `'` ( ~[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`
>
> QUOTE_ESCAPE :
> &nbsp;&nbsp; `\'` | `\"`
>
> ASCII_ESCAPE :
> &nbsp;&nbsp; &nbsp;&nbsp; `\x` OCT_DIGIT HEX_DIGIT
> &nbsp;&nbsp; | `\n` | `\r` | `\t` | `\\` | `\0`
>
> UNICODE_ESCAPE :
> &nbsp;&nbsp; `\u{` ( HEX_DIGIT `_`<sup>\*</sup> )<sup>1..6</sup> `}`

A _character literal_ is a single Unicode character enclosed within two
`U+0027` (single-quote) characters, with the exception of `U+0027` itself,
which must be _escaped_ by a preceding `U+005C` character (`\`).

#### String literals

> **<sup>Lexer</sup>**
> STRING_LITERAL :
> &nbsp;&nbsp; `"` (
> &nbsp;&nbsp; &nbsp;&nbsp; ~[`"` `\` _IsolatedCR_]
> &nbsp;&nbsp; &nbsp;&nbsp; | QUOTE_ESCAPE
> &nbsp;&nbsp; &nbsp;&nbsp; | ASCII_ESCAPE
> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE
> &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE
> &nbsp;&nbsp; )<sup>\*</sup> `"`
>
> STRING_CONTINUE :
> &nbsp;&nbsp; `\` _followed by_ \\n

A _string literal_ is a sequence of any Unicode characters enclosed within two
`U+0022` (double-quote) characters, with the exception of `U+0022` itself,
which must be _escaped_ by a preceding `U+005C` character (`\`).

Line-break characters are allowed in string literals. Normally they represent
themselves (i.e. no translation), but as a special exception, when an unescaped
`U+005C` character (`\`) occurs immediately before the newline (`U+000A`), the
`U+005C` character, the newline, and all whitespace at the beginning of the
next line are ignored. Thus `a` and `b` are equal:

```rust
let a = "foobar";
let b = "foo\
         bar";

assert_eq!(a, b);
```

#### Character escapes

Some additional _escapes_ are available in either character or non-raw string
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:

* A _7-bit code point escape_ starts with `U+0078` (`x`) and is
  followed by exactly two _hex digits_ with a value of at most `0x7F`. It
  denotes the Unicode code point equal to the provided hex value.
* A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
  by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
  (`}`). It denotes the Unicode code point equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
  (`r`), or `U+0074` (`t`), denoting the Unicode values `U+000A` (LF),
  `U+000D` (CR) or `U+0009` (HT) respectively.
* The _null escape_ is the character `U+0030` (`0`) and denotes the Unicode
  value `U+0000` (NUL).
* The _backslash escape_ is the character `U+005C` (`\`) which must be
  escaped in order to denote *itself*.

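
The escape forms listed above can be checked by converting escaped character literals to their code points. This is a minimal sketch; the helper name `escaped_code_points` is hypothetical and introduced only for illustration.

```rust
/// Collects the code points of a few escaped character literals.
/// (`escaped_code_points` is a hypothetical helper for illustration.)
fn escaped_code_points() -> [u32; 7] {
    [
        '\x41' as u32,     // code point escape with `x`: U+0041 ('A')
        '\u{7FFF}' as u32, // code point escape with `u`: U+7FFF
        '\n' as u32,       // whitespace escape: U+000A (LF)
        '\r' as u32,       // whitespace escape: U+000D (CR)
        '\t' as u32,       // whitespace escape: U+0009 (HT)
        '\0' as u32,       // null escape: U+0000 (NUL)
        '\\' as u32,       // backslash escape: U+005C
    ]
}

fn main() {
    assert_eq!(
        escaped_code_points(),
        [0x41, 0x7FFF, 0x0A, 0x0D, 0x09, 0x00, 0x5C]
    );
    // The same escapes work inside non-raw string literals.
    assert_eq!("\x41\u{42}\\", "AB\\");
    // Quote escapes: `\'` in a character literal, `\"` in a string literal.
    assert_eq!('\'' as u32, 0x27);
    assert_eq!("\"".len(), 1);
}
```
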
#### Raw string literals

> **<sup>Lexer</sup>**
> RAW_STRING_LITERAL :
> &nbsp;&nbsp; `r` RAW_STRING_CONTENT
>
> RAW_STRING_CONTENT :
> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`
> &nbsp;&nbsp; | `#` RAW_STRING_CONTENT `#`

Raw string literals do not process any escapes. They start with the character
`U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
`U+0022` (double-quote) character. The _raw string body_ can contain any sequence
of Unicode characters and is terminated only by another `U+0022` (double-quote)
character, followed by the same number of `U+0023` (`#`) characters that preceded
the opening `U+0022` (double-quote) character.

All Unicode characters contained in the raw string body represent themselves:
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) do not have any special meaning.

Examples for string literals:

```rust
"foo"; r"foo";                     // foo
"\"foo\""; r#""foo""#;             // "foo"

"foo #\"# bar";
r##"foo #"# bar"##;                // foo #"# bar

"\x52"; "R"; r"R";                 // R
"\\x52"; r"\x52";                  // \x52
```

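
The equivalences claimed in the comments of the examples can also be stated as assertions. The sketch below pairs each escaped spelling with its raw counterpart; the helper name `equivalent_spellings` is hypothetical.

```rust
/// Returns (escaped, raw) spellings that should denote the same string.
/// (`equivalent_spellings` is a hypothetical helper for illustration.)
fn equivalent_spellings() -> Vec<(&'static str, &'static str)> {
    vec![
        ("foo", r"foo"),
        ("\"foo\"", r#""foo""#),
        ("foo #\"# bar", r##"foo #"# bar"##),
        ("\\x52", r"\x52"), // a raw string keeps the backslash literally
    ]
}

fn main() {
    for (escaped, raw) in equivalent_spellings() {
        assert_eq!(escaped, raw);
    }
    // In a non-raw string the escape is processed, so `\x52` becomes `R`.
    assert_eq!("\x52", "R");
    assert_ne!("\x52", r"\x52");
}
```
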
### Byte and byte string literals

#### Byte literals

> **<sup>Lexer</sup>**
> BYTE_LITERAL :
> &nbsp;&nbsp; `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'`
>
> ASCII_FOR_CHAR :
> &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t
>
> BYTE_ESCAPE :
> &nbsp;&nbsp; &nbsp;&nbsp; `\x` HEX_DIGIT HEX_DIGIT
> &nbsp;&nbsp; | `\n` | `\r` | `\t` | `\\` | `\0`

A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
range) or a single _escape_ preceded by the characters `U+0062` (`b`) and
`U+0027` (single-quote), and followed by the character `U+0027`. If the character
`U+0027` is present within the literal, it must be _escaped_ by a preceding
`U+005C` (`\`) character. It is equivalent to a `u8` unsigned 8-bit integer
_number literal_.

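
To illustrate the equivalence with `u8` number literals, the following sketch compares a few byte literals with their numeric values. The helper name `byte_literals` is hypothetical.

```rust
/// Returns a few byte literals; each one has type `u8`.
/// (`byte_literals` is a hypothetical helper for illustration.)
fn byte_literals() -> [u8; 5] {
    [b'H', b'\'', b'\\', b'\n', b'\xFF']
}

fn main() {
    // Each byte literal equals the corresponding `u8` number literal.
    assert_eq!(byte_literals(), [72u8, 0x27, 0x5C, 0x0A, 0xFF]);
    // Unlike `\x` in a character literal, a byte escape may exceed 0x7F.
    assert_eq!(b'\xFF', 255u8);
}
```
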
#### Byte string literals

> **<sup>Lexer</sup>**
> BYTE_STRING_LITERAL :
> &nbsp;&nbsp; `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"`
>
> ASCII_FOR_STRING :
> &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_

A non-raw _byte string literal_ is a sequence of ASCII characters and _escapes_,
preceded by the characters `U+0062` (`b`) and `U+0022` (double-quote), and
followed by the character `U+0022`. If the character `U+0022` is present within
the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
Alternatively, a byte string literal can be a _raw byte string literal_, defined
below. A byte string literal of length `n` is equivalent to a
`&'static [u8; n]` borrowed fixed-sized array of unsigned 8-bit integers.

Some additional _escapes_ are available in either byte or non-raw byte string
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:

* A _byte escape_ starts with `U+0078` (`x`) and is
  followed by exactly two _hex digits_. It denotes the byte
  equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
  (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
  `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
* The _null escape_ is the character `U+0030` (`0`) and denotes the byte
  value `0x00` (ASCII NUL).
* The _backslash escape_ is the character `U+005C` (`\`) which must be
  escaped in order to denote its ASCII encoding `0x5C`.

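
The type of a byte string literal and the effect of the escapes above can be sketched as follows. The helper name `hello_bytes` is hypothetical and used only for illustration.

```rust
/// A byte string literal of length 5 has type `&'static [u8; 5]`.
/// (`hello_bytes` is a hypothetical helper for illustration.)
fn hello_bytes() -> &'static [u8; 5] {
    b"hello"
}

fn main() {
    assert_eq!(hello_bytes(), &[104u8, 101, 108, 108, 111]);
    // Each escape denotes a single byte, so this literal has length 4.
    let escaped: &[u8; 4] = b"\x52\n\0\\";
    assert_eq!(escaped, &[0x52u8, 0x0A, 0x00, 0x5C]);
    // The bytes match the encoding of the equivalent ASCII-only string.
    assert_eq!(&hello_bytes()[..], "hello".as_bytes());
}
```

Because the array length is part of the type, annotating `escaped` as `&[u8; 4]` doubles as a compile-time check that each escape contributes exactly one byte.
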
#### Raw byte string literals

> **<sup>Lexer</sup>**
> RAW_BYTE_STRING_LITERAL :
> &nbsp;&nbsp; `br` RAW_BYTE_STRING_CONTENT
>
> RAW_BYTE_STRING_CONTENT :
> &nbsp;&nbsp; &nbsp;&nbsp; `"` ASCII<sup>* (non-greedy)</sup> `"`
> &nbsp;&nbsp; | `#` RAW_BYTE_STRING_CONTENT `#`
>
> ASCII :
> &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F)_

Raw byte string literals do not process any escapes. They start with the
character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
_raw string body_ can contain any sequence of ASCII characters and is terminated
only by another `U+0022` (double-quote) character, followed by the same number of
`U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote)
character. A raw byte string literal cannot contain any non-ASCII byte.

All characters contained in the raw string body represent their ASCII encoding:
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) do not have any special meaning.

Examples for byte string literals:

```rust
b"foo"; br"foo";                     // foo
b"\"foo\""; br#""foo""#;             // "foo"

b"foo #\"# bar";
br##"foo #"# bar"##;                 // foo #"# bar

b"\x52"; b"R"; br"R";                // R
b"\\x52"; br"\x52";                  // \x52
```

### Number literals

A _number literal_ is either an _integer literal_ or a _floating-point
literal_. The grammar for recognizing the two kinds of literals is mixed.

#### Integer literals

> **<sup>Lexer</sup>**
> INTEGER_LITERAL :
> &nbsp;&nbsp; ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )
> INTEGER_SUFFIX<sup>?</sup>
>
> DEC_LITERAL :
> &nbsp;&nbsp; DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
>
> BIN_LITERAL :
> &nbsp;&nbsp; `0b` (BIN_DIGIT|`_`)<sup>\*</sup> BIN_DIGIT (BIN_DIGIT|`_`)<sup>\*</sup>
>
> OCT_LITERAL :
> &nbsp;&nbsp; `0o` (OCT_DIGIT|`_`)<sup>\*</sup> OCT_DIGIT (OCT_DIGIT|`_`)<sup>\*</sup>
>
> HEX_LITERAL :
> &nbsp;&nbsp; `0x` (HEX_DIGIT|`_`)<sup>\*</sup> HEX_DIGIT (HEX_DIGIT|`_`)<sup>\*</sup>
>
> BIN_DIGIT : [`0`-`1`]
>
> OCT_DIGIT : [`0`-`7`]
>
> DEC_DIGIT : [`0`-`9`]
>
> HEX_DIGIT : [`0`-`9` `a`-`f` `A`-`F`]
>
> INTEGER_SUFFIX :
> &nbsp;&nbsp; &nbsp;&nbsp; `u8` | `u16` | `u32` | `u64` | `usize`
> &nbsp;&nbsp; | `i8` | `i16` | `i32` | `i64` | `isize`

<!-- FIXME: separate the DECIMAL_LITERAL with no prefix or suffix (used on tuple indexing and float_literal) -->
<!-- FIXME: u128 and i128 -->

An _integer literal_ has one of four forms:

* A _decimal literal_ starts with a *decimal digit* and continues with any
  mixture of *decimal digits* and _underscores_.
* A _hex literal_ starts with the character sequence `U+0030` `U+0078`
  (`0x`) and continues as any mixture (with at least one digit) of hex digits
  and underscores.
* An _octal literal_ starts with the character sequence `U+0030` `U+006F`
  (`0o`) and continues as any mixture (with at least one digit) of octal digits
  and underscores.
* A _binary literal_ starts with the character sequence `U+0030` `U+0062`
  (`0b`) and continues as any mixture (with at least one digit) of binary digits
  and underscores.

Like any literal, an integer literal may be followed (immediately,
without any spaces) by an _integer suffix_, which forcibly sets the
type of the literal. The integer suffix must be the name of one of the
integral types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`,
`isize`, or `usize`.

The type of an _unsuffixed_ integer literal is determined by type inference:

* If an integer type can be _uniquely_ determined from the surrounding
  program context, the unsuffixed integer literal has that type.

* If the program context under-constrains the type, it defaults to the
  signed 32-bit integer `i32`.

* If the program context over-constrains the type, it is considered a
  static type error.

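
The inference rules above can be observed indirectly through the size of the resulting value. The sketch below is one way to do that, using `std::mem::size_of_val`; the helper name `literal_sizes` is hypothetical.

```rust
use std::mem::size_of_val;

/// Reports the size in bytes of three differently constrained literals.
/// (`literal_sizes` is a hypothetical helper for illustration.)
fn literal_sizes() -> (usize, usize, usize) {
    let defaulted = 123;      // under-constrained: defaults to i32
    let annotated: u64 = 123; // uniquely determined by the annotation: u64
    let suffixed = 123u8;     // set by the suffix: u8
    (
        size_of_val(&defaulted),
        size_of_val(&annotated),
        size_of_val(&suffixed),
    )
}

fn main() {
    assert_eq!(literal_sizes(), (4, 8, 1));
    // An over-constrained literal is rejected at compile time, for example:
    // let bad: u8 = 123u16; // error: mismatched types
}
```
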
Examples of integer literals of various forms:

```rust
123;                               // type i32
123i32;                            // type i32
123u32;                            // type u32
123_u32;                           // type u32
let a: u64 = 123;                  // type u64

0xff;                              // type i32
0xff_u8;                           // type u8

0o70;                              // type i32
0o70_i16;                          // type i16

0b1111_1111_1001_0000;             // type i32
0b1111_1111_1001_0000i64;          // type i64
0b________1;                       // type i32

0usize;                            // type usize
```

Examples of invalid integer literals:

```rust,ignore
// invalid suffixes

0invalidSuffix;

// uses numbers of the wrong base

123AFB43;
0b0102;
0o0581;

// integers too big for their type (they overflow)

128_i8;
256_u8;

// bin, hex and octal literals must have at least one digit

0b_;
0b____;
```

Note that the Rust syntax considers `-1i8` as an application of the [unary minus
operator] to an integer literal `1i8`, rather than
a single integer literal.

[unary minus operator]: expressions/operator-expr.html#negation-operators

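
One visible consequence of treating `-1i8` as negation applied to the literal `1i8` is that a method call binds to the literal before the minus sign is applied. The sketch below shows this; the helper names `negate_after_call` and `negate_before_call` are hypothetical.

```rust
/// Parsed as `-(1i8.pow(2))`, because the literal is just `1i8`.
/// (`negate_after_call` is a hypothetical helper for illustration.)
fn negate_after_call() -> i8 {
    -1i8.pow(2)
}

/// Parentheses make the negated value the method receiver.
/// (`negate_before_call` is a hypothetical helper for illustration.)
fn negate_before_call() -> i8 {
    (-1i8).pow(2)
}

fn main() {
    assert_eq!(negate_after_call(), -1);
    assert_eq!(negate_before_call(), 1);
}
```
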
#### Floating-point literals

> **<sup>Lexer</sup>**
> FLOAT_LITERAL :
> &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL `.`
> _(not immediately followed by `.`, `_` or an identifier)_
> &nbsp;&nbsp; | DEC_LITERAL FLOAT_EXPONENT
> &nbsp;&nbsp; | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT<sup>?</sup>
> &nbsp;&nbsp; | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup>
> FLOAT_EXPONENT<sup>?</sup> FLOAT_SUFFIX
>
> FLOAT_EXPONENT :
> &nbsp;&nbsp; (`e`|`E`) (`+`|`-`)?
> (DEC_DIGIT|`_`)<sup>\*</sup> DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
>
> FLOAT_SUFFIX :
> &nbsp;&nbsp; `f32` | `f64`

A _floating-point literal_ has one of two forms:

* A _decimal literal_ followed by a period character `U+002E` (`.`). This is
  optionally followed by another decimal literal, with an optional _exponent_.
* A single _decimal literal_ followed by an _exponent_.

Like integer literals, a floating-point literal may be followed by a
suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
The suffix forcibly sets the type of the literal. There are two valid
_floating-point suffixes_, `f32` and `f64` (the 32-bit and 64-bit
floating-point types).

The type of an _unsuffixed_ floating-point literal is determined by
type inference:

* If a floating-point type can be _uniquely_ determined from the
  surrounding program context, the unsuffixed floating-point literal
  has that type.

* If the program context under-constrains the type, it defaults to `f64`.

* If the program context over-constrains the type, it is considered a
  static type error.

Examples of floating-point literals of various forms:

```rust
123.0f64;        // type f64
0.1f64;          // type f64
0.1f32;          // type f32
12E+99_f64;      // type f64
let x: f64 = 2.; // type f64
```

This last example is different because it is not possible to use the suffix
syntax with a floating-point literal ending in a period. `2.f64` would attempt
to call a method named `f64` on `2`.

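
The inference and suffix rules for floating-point literals can be sketched in the same way as for integers, by looking at the size of each value. The helper name `float_sizes` is hypothetical.

```rust
use std::mem::size_of_val;

/// Reports the size in bytes of three float literals.
/// (`float_sizes` is a hypothetical helper for illustration.)
fn float_sizes() -> (usize, usize, usize) {
    let defaulted = 2.0;     // under-constrained: defaults to f64
    let annotated: f32 = 2.; // trailing period, type from the annotation
    let suffixed = 12E+2f32; // exponent form with a suffix
    (
        size_of_val(&defaulted),
        size_of_val(&annotated),
        size_of_val(&suffixed),
    )
}

fn main() {
    assert_eq!(float_sizes(), (8, 4, 4));
    // An exponent scales by a power of ten: 12E+2 is 1200.
    assert_eq!(12E+2, 1200.0);
    // A suffix needs a digit after the period: `2.0f64` is a literal,
    // whereas `2.f64` would be parsed as a method call on `2`.
    assert_eq!(2.0f64, 2.);
}
```
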
The representation semantics of floating-point numbers are described in
["Machine Types"].

["Machine Types"]: types.html#machine-types

### Boolean literals

> **<sup>Lexer</sup>**
> BOOLEAN_LITERAL :
> &nbsp;&nbsp; &nbsp;&nbsp; `true`
> &nbsp;&nbsp; | `false`

The two values of the boolean type are written `true` and `false`.

## Symbols

Symbols are a general class of printable [tokens] that play structural
roles in a variety of grammar productions. They are the
remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators], [binary operators], or [keywords].
They are catalogued in [the Symbols section][symbols] of the Grammar document.

[unary operators]: expressions/operator-expr.html#borrow-operators
[binary operators]: expressions/operator-expr.html#arithmetic-and-logical-binary-operators
[tokens]: #tokens
[symbols]: ../grammar.html#symbols
[keywords]: keywords.html