3 Tokens are primitive productions in the grammar defined by regular
4 (non-recursive) languages. "Simple" tokens are given in [string table
5 production] form, and occur in the rest of the
6 grammar in `monospace` font. Other tokens have exact rules given.
8 [string table production]: notation.html#string-table-productions
12 A literal is an expression consisting of a single token, rather than a sequence
13 of tokens, that immediately and directly denotes the value it evaluates to,
14 rather than referring to it by name or some other evaluation rule. A literal is
15 a form of [constant expression](expressions.html#constant-expressions), so is
16 evaluated (primarily) at compile time.
20 #### Characters and strings
22 | | Example | `#` sets | Characters | Escapes |
23 |----------------------------------------------|-----------------|-------------|-------------|---------------------|
24 | [Character](#character-literals) | `'H'` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
25 | [String](#string-literals) | `"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
26 | [Raw](#raw-string-literals) | `r#"hello"#` | 0 or more\* | All Unicode | `N/A` |
27 | [Byte](#byte-literals) | `b'H'` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
28 | [Byte string](#byte-string-literals) | `b"hello"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
29 | [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | 0 or more\* | All ASCII | `N/A` |
\* The number of `#`s on each side of the same literal must be equal.
37 | `\x41` | 7-bit character code (exactly 2 digits, up to 0x7F) |
39 | `\r` | Carriage return |
48 | `\x7F` | 8-bit character code (exactly 2 digits) |
50 | `\r` | Carriage return |
59 | `\u{7FFF}` | 24-bit Unicode character code (up to 6 digits) |
65 | `\'` | Single quote |
66 | `\"` | Double quote |
70 | [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
71 |----------------------------------------|---------|----------------|----------|
72 | Decimal integer | `98_222` | `N/A` | Integer suffixes |
73 | Hex integer | `0xff` | `N/A` | Integer suffixes |
74 | Octal integer | `0o77` | `N/A` | Integer suffixes |
75 | Binary integer | `0b1111_0000` | `N/A` | Integer suffixes |
76 | Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes |
78 `*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`
82 | Integer | Floating-point |
83 |---------|----------------|
84 | `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, `isize` | `f32`, `f64` |
86 ### Character and string literals
88 #### Character literals
90 > **<sup>Lexer</sup>**
92 > `'` ( ~[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`
95 > `\'` | `\"`
98 > `\x` OCT_DIGIT HEX_DIGIT
99 > | `\n` | `\r` | `\t` | `\\` | `\0`
102 > `\u{` ( HEX_DIGIT `_`<sup>\*</sup> )<sup>1..6</sup> `}`
104 A _character literal_ is a single Unicode character enclosed within two
105 `U+0027` (single-quote) characters, with the exception of `U+0027` itself,
106 which must be _escaped_ by a preceding `U+005C` character (`\`).
110 > **<sup>Lexer</sup>**
113 > ~[`"` `\` _IsolatedCR_]
114 > | QUOTE_ESCAPE
115 > | ASCII_ESCAPE
116 > | UNICODE_ESCAPE
117 > | STRING_CONTINUE
118 > )<sup>\*</sup> `"`
121 > `\` _followed by_ \\n
123 A _string literal_ is a sequence of any Unicode characters enclosed within two
124 `U+0022` (double-quote) characters, with the exception of `U+0022` itself,
125 which must be _escaped_ by a preceding `U+005C` character (`\`).
127 Line-break characters are allowed in string literals. Normally they represent
128 themselves (i.e. no translation), but as a special exception, when an unescaped
129 `U+005C` character (`\`) occurs immediately before the newline (`U+000A`), the
130 `U+005C` character, the newline, and all whitespace at the beginning of the
131 next line are ignored. Thus `a` and `b` are equal:
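A minimal sketch of that equality follows. The function names `continued_pair` and `literal_newline` are illustrative only; the second function shows the ordinary case, where a line break inside the literal represents itself.

```rust
/// Returns the same string written two ways: on one line, and split
/// across two source lines with a backslash-newline continuation.
fn continued_pair() -> (&'static str, &'static str) {
    let a = "foobar";
    // The backslash, the newline, and the leading whitespace of the
    // next line are all dropped from the value.
    let b = "foo\
             bar";
    (a, b)
}

/// Without the backslash, the line break is part of the string value.
fn literal_newline() -> &'static str {
    "foo
bar"
}
```

Calling `continued_pair()` yields two equal strings of length 6, while `literal_newline()` contains a `U+000A` between `foo` and `bar`.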
141 #### Character escapes
143 Some additional _escapes_ are available in either character or non-raw string
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:
147 * A _7-bit code point escape_ starts with `U+0078` (`x`) and is
148 followed by exactly two _hex digits_ with value up to `0x7F`. It denotes the
149 ASCII character with value equal to the provided hex value. Higher values are
  not permitted because it is ambiguous whether they mean Unicode code points or
  byte values.
152 * A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
153 by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
154 (`}`). It denotes the Unicode code point equal to the provided hex value.
155 * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
156 (`r`), or `U+0074` (`t`), denoting the Unicode values `U+000A` (LF),
157 `U+000D` (CR) or `U+0009` (HT) respectively.
158 * The _null escape_ is the character `U+0030` (`0`) and denotes the Unicode
159 value `U+0000` (NUL).
160 * The _backslash escape_ is the character `U+005C` (`\`) which must be
161 escaped in order to denote itself.
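As a hedged illustration of the escape forms listed above, the sketch below collects one example of each and exposes its code point. The helper names `escape_values` and `escaped_string` are hypothetical.

```rust
/// One example of each escape form, converted to its code point.
fn escape_values() -> [u32; 7] {
    [
        '\x41' as u32,   // 7-bit code point escape
        '\u{41}' as u32, // 24-bit code point escape
        '\n' as u32,     // whitespace escape: LF
        '\r' as u32,     // whitespace escape: CR
        '\t' as u32,     // whitespace escape: HT
        '\0' as u32,     // null escape
        '\\' as u32,     // backslash escape
    ]
}

/// The same escapes are available inside non-raw string literals.
fn escaped_string() -> &'static str {
    "\x48\u{69}\t!"
}
```

Here `'\x41'` and `'\u{41}'` both denote `U+0041`, and `escaped_string()` is the four-character string `H`, `i`, tab, `!`.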
163 #### Raw string literals
165 > **<sup>Lexer</sup>**
166 > RAW_STRING_LITERAL :
167 > `r` RAW_STRING_CONTENT
169 > RAW_STRING_CONTENT :
170 > `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`
171 > | `#` RAW_STRING_CONTENT `#`
173 Raw string literals do not process any escapes. They start with the character
174 `U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
175 `U+0022` (double-quote) character. The _raw string body_ can contain any sequence
176 of Unicode characters and is terminated only by another `U+0022` (double-quote)
177 character, followed by the same number of `U+0023` (`#`) characters that preceded
178 the opening `U+0022` (double-quote) character.
All Unicode characters contained in the raw string body represent themselves:
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) do not have any special meaning.
Examples for string literals:

```rust
"foo"; r"foo";                     // foo
"\"foo\""; r#""foo""#;             // "foo"

r##"foo #"# bar"##;                // foo #"# bar

"\x52"; "R"; r"R";                 // R
"\\x52"; r"\x52";                  // \x52
```
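The pairings in these examples can also be written as checks. The sketch below is only an illustration; `raw_pairs` is a hypothetical helper that puts each escaped spelling next to its raw counterpart.

```rust
/// Each tuple holds an escaped string literal and a raw string literal
/// that are expected to denote the same value.
fn raw_pairs() -> [(&'static str, &'static str); 4] {
    [
        ("foo", r"foo"),
        ("\"foo\"", r#""foo""#),
        ("foo #\"# bar", r##"foo #"# bar"##),
        // In the raw form the backslash is an ordinary character.
        ("\\x52", r"\x52"),
    ]
}
```

Every pair compares equal, and `r"\x52"` has length 4 because no escape is processed.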
198 ### Byte and byte string literals
202 > **<sup>Lexer</sup>**
204 > `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'`
207 > _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t
210 > `\x` HEX_DIGIT HEX_DIGIT
211 > | `\n` | `\r` | `\t` | `\\` | `\0`
213 A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
214 range) or a single _escape_ preceded by the characters `U+0062` (`b`) and
215 `U+0027` (single-quote), and followed by the character `U+0027`. If the character
216 `U+0027` is present within the literal, it must be _escaped_ by a preceding
`U+005C` (`\`) character. It is equivalent to a `u8` unsigned 8-bit integer
_number literal_.
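A short sketch of what this means in practice: each byte literal below is a plain `u8` value. The function name `byte_literal_values` is an assumption made for the example.

```rust
/// Byte literals are `u8` values; the array type checks only because
/// every element has type `u8`.
fn byte_literal_values() -> [u8; 4] {
    [
        b'H',    // ASCII character
        b'\'',   // escaped single quote
        b'\x7F', // byte escape
        b'\\',   // escaped backslash
    ]
}
```

The four elements are 72, 39, 127 and 92 respectively.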
220 #### Byte string literals
222 > **<sup>Lexer</sup>**
223 > BYTE_STRING_LITERAL :
224 > `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"`
> _any ASCII (i.e. 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_
229 A non-raw _byte string literal_ is a sequence of ASCII characters and _escapes_,
230 preceded by the characters `U+0062` (`b`) and `U+0022` (double-quote), and
231 followed by the character `U+0022`. If the character `U+0022` is present within
232 the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
233 Alternatively, a byte string literal can be a _raw byte string literal_, defined
234 below. The type of a byte string literal of length `n` is `&'static [u8; n]`.
236 Some additional _escapes_ are available in either byte or non-raw byte string
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:
* A _byte escape_ starts with `U+0078` (`x`) and is
  followed by exactly two _hex digits_. It denotes the byte
  equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
  (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
  `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
246 * The _null escape_ is the character `U+0030` (`0`) and denotes the byte
247 value `0x00` (ASCII NUL).
248 * The _backslash escape_ is the character `U+005C` (`\`) which must be
249 escaped in order to denote its ASCII encoding `0x5C`.
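To make the type and the escape forms concrete, here is a minimal sketch. The function names are hypothetical; the array lengths in the return types are what the compiler infers from each literal.

```rust
/// A byte string literal of length 5 has type `&'static [u8; 5]`.
fn hello_bytes() -> &'static [u8; 5] {
    b"hello"
}

/// One byte per escape: byte escape, LF, CR, HT, NUL, backslash.
/// Unlike `\x` in a string literal, `\xFF` is allowed here.
fn escaped_bytes() -> &'static [u8; 6] {
    b"\xFF\n\r\t\0\\"
}
```

Changing either return type to a different length is rejected at compile time, since the length is part of the literal's type.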
251 #### Raw byte string literals
253 > **<sup>Lexer</sup>**
254 > RAW_BYTE_STRING_LITERAL :
255 > `br` RAW_BYTE_STRING_CONTENT
257 > RAW_BYTE_STRING_CONTENT :
258 > `"` ASCII<sup>* (non-greedy)</sup> `"`
> | `#` RAW_BYTE_STRING_CONTENT `#`
262 > _any ASCII (i.e. 0x00 to 0x7F)_
264 Raw byte string literals do not process any escapes. They start with the
265 character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
266 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
267 _raw string body_ can contain any sequence of ASCII characters and is terminated
268 only by another `U+0022` (double-quote) character, followed by the same number of
269 `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote)
character. A raw byte string literal cannot contain any non-ASCII byte.
All characters contained in the raw string body represent their ASCII encoding:
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) do not have any special meaning.
Examples for byte string literals:

```rust
b"foo"; br"foo";                     // foo
b"\"foo\""; br#""foo""#;             // "foo"

br##"foo #"# bar"##;                 // foo #"# bar

b"\x52"; b"R"; br"R";                // R
b"\\x52"; br"\x52";                  // \x52
```
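The same pairings can be expressed as comparisons. This is a sketch only, and both function names are hypothetical.

```rust
/// Compares each escaped byte string with its raw counterpart.
/// Each comparison type-checks only because both sides have the
/// same length.
fn raw_and_escaped_agree() -> bool {
    b"\"foo\"" == br#""foo""#
        && b"foo #\"# bar" == br##"foo #"# bar"##
        && b"\\x52" == br"\x52"
}

/// In a raw byte string the backslash is an ordinary byte, so this
/// literal has four bytes rather than one.
fn backslash_is_literal() -> &'static [u8; 4] {
    br"\x52"
}
```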
292 A _number literal_ is either an _integer literal_ or a _floating-point
293 literal_. The grammar for recognizing the two kinds of literals is mixed.
295 #### Integer literals
297 > **<sup>Lexer</sup>**
299 > ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )
300 > INTEGER_SUFFIX<sup>?</sup>
303 > DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
306 > `0`
307 > | NON_ZERO_DEC_DIGIT DEC_DIGIT<sup>\*</sup>
310 > `0b` (BIN_DIGIT|`_`)<sup>\*</sup> BIN_DIGIT (BIN_DIGIT|`_`)<sup>\*</sup>
313 > `0o` (OCT_DIGIT|`_`)<sup>\*</sup> OCT_DIGIT (OCT_DIGIT|`_`)<sup>\*</sup>
316 > `0x` (HEX_DIGIT|`_`)<sup>\*</sup> HEX_DIGIT (HEX_DIGIT|`_`)<sup>\*</sup>
318 > BIN_DIGIT : [`0`-`1`]
320 > OCT_DIGIT : [`0`-`7`]
322 > DEC_DIGIT : [`0`-`9`]
324 > NON_ZERO_DEC_DIGIT : [`1`-`9`]
326 > HEX_DIGIT : [`0`-`9` `a`-`f` `A`-`F`]
329 > `u8` | `u16` | `u32` | `u64` | `u128` | `usize`
330 > | `i8` | `i16` | `i32` | `i64` | `i128` | `isize`
332 An _integer literal_ has one of four forms:
334 * A _decimal literal_ starts with a *decimal digit* and continues with any
335 mixture of *decimal digits* and _underscores_.
336 * A _tuple index_ is either `0`, or starts with a *non-zero decimal digit* and
337 continues with zero or more decimal digits. Tuple indexes are used to refer
338 to the fields of [tuples], [tuple structs] and [tuple variants].
* A _hex literal_ starts with the character sequence `U+0030` `U+0078`
  (`0x`) and continues as any mixture (with at least one digit) of hex digits
  and underscores.
* An _octal literal_ starts with the character sequence `U+0030` `U+006F`
  (`0o`) and continues as any mixture (with at least one digit) of octal digits
  and underscores.
* A _binary literal_ starts with the character sequence `U+0030` `U+0062`
  (`0b`) and continues as any mixture (with at least one digit) of binary digits
  and underscores.
349 Like any literal, an integer literal may be followed (immediately,
350 without any spaces) by an _integer suffix_, which forcibly sets the
351 type of the literal. The integer suffix must be the name of one of the
352 integral types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`,
353 `u128`, `i128`, `usize`, or `isize`.
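As a rough illustration of the literal forms and suffixes described above, the sketch below writes one value in all four bases and attaches suffixes with and without a separating underscore. The function names are assumptions for the example.

```rust
/// The value 240 written as a decimal, hex, octal and binary literal.
fn same_value_four_ways() -> [u32; 4] {
    [240, 0xf0, 0o360, 0b1111_0000]
}

/// Suffixes follow the digits immediately; an underscore before the
/// suffix is just a visual separator.
fn underscores_and_suffixes() -> (i64, u8, usize) {
    (98_222_i64, 0xff_u8, 0usize)
}
```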
355 The type of an _unsuffixed_ integer literal is determined by type inference:
357 * If an integer type can be _uniquely_ determined from the surrounding
358 program context, the unsuffixed integer literal has that type.
360 * If the program context under-constrains the type, it defaults to the
361 signed 32-bit integer `i32`.
* If the program context over-constrains the type, it is considered a
  static type error.
Examples of integer literals of various forms:

```rust
let a: u64 = 123;                  // type u64

0o70_i16;                          // type i16

0b1111_1111_1001_0000;             // type i32
0b1111_1111_1001_0000i64;          // type i64
0b________1;                       // type i32

0usize;                            // type usize
```
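The inference rules can be observed at run time with `std::any::TypeId`. This is a sketch under the assumption that comparing type identifiers is an acceptable way to inspect the inferred type; `has_type` and `inference_checks` are hypothetical helpers, not part of the language.

```rust
use std::any::TypeId;

/// Reports whether the referenced value has exactly type `Expected`.
fn has_type<Expected: 'static, Actual: 'static>(_value: &Actual) -> bool {
    TypeId::of::<Expected>() == TypeId::of::<Actual>()
}

fn inference_checks() -> [bool; 3] {
    let a: u64 = 123; // the annotation uniquely determines the type
    [
        // Nothing constrains this literal, so it falls back to i32.
        has_type::<i32, _>(&123),
        has_type::<u64, _>(&a),
        // A suffix sets the type regardless of context.
        has_type::<i16, _>(&0o70_i16),
    ]
}
```

All three checks are expected to report `true`.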
388 Examples of invalid integer literals:
* Literals that use digits of the wrong base.
* Integers too big for their type (they overflow).
* Binary, hex and octal literals without at least one digit.
412 Note that the Rust syntax considers `-1i8` as an application of the [unary minus
413 operator] to an integer literal `1i8`, rather than
414 a single integer literal.
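One way to observe this is to count tokens with a small macro, as in the sketch below. `count_tts` is a hypothetical helper written for this example. The second function shows a practical consequence: a method call binds more tightly than the unary minus.

```rust
/// Counts the token trees it is given.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// `1i8` is a single token; `-1i8` is a `-` token followed by `1i8`.
fn token_counts() -> (usize, usize) {
    (count_tts!(1i8), count_tts!(-1i8))
}

/// `-1i8.abs()` negates the result of `1i8.abs()`, whereas the
/// parenthesized form takes the absolute value of negative one.
fn precedence() -> (i8, i8) {
    (-1i8.abs(), (-1i8).abs())
}
```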
416 [unary minus operator]: expressions/operator-expr.html#negation-operators
418 #### Floating-point literals
420 > **<sup>Lexer</sup>**
422 > DEC_LITERAL `.`
423 > _(not immediately followed by `.`, `_` or an [identifier]_)
424 > | DEC_LITERAL FLOAT_EXPONENT
425 > | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT<sup>?</sup>
426 > | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup>
427 > FLOAT_EXPONENT<sup>?</sup> FLOAT_SUFFIX
430 > (`e`|`E`) (`+`|`-`)?
431 > (DEC_DIGIT|`_`)<sup>\*</sup> DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
434 > `f32` | `f64`
436 A _floating-point literal_ has one of two forms:
438 * A _decimal literal_ followed by a period character `U+002E` (`.`). This is
439 optionally followed by another decimal literal, with an optional _exponent_.
440 * A single _decimal literal_ followed by an _exponent_.
442 Like integer literals, a floating-point literal may be followed by a
443 suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
444 The suffix forcibly sets the type of the literal. There are two valid
445 _floating-point suffixes_, `f32` and `f64` (the 32-bit and 64-bit floating point
446 types), which explicitly determine the type of the literal.
The type of an _unsuffixed_ floating-point literal is determined by
type inference:

* If a floating-point type can be _uniquely_ determined from the
  surrounding program context, the unsuffixed floating-point literal
  has that type.

* If the program context under-constrains the type, it defaults to `f64`.

* If the program context over-constrains the type, it is considered a
  static type error.
Examples of floating-point literals of various forms:

```rust
123.0f64;        // type f64
12E+99_f64;      // type f64
let x: f64 = 2.; // type f64
```
470 This last example is different because it is not possible to use the suffix
471 syntax with a floating point literal ending in a period. `2.f64` would attempt
472 to call a method named `f64` on `2`.
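The floating-point forms and their inferred types can be checked in the same spirit. The sketch below assumes `std::any::TypeId` comparison as the inspection mechanism; `has_type`, `float_checks` and `separators_are_ignored` are hypothetical names.

```rust
use std::any::TypeId;

/// Reports whether the referenced value has exactly type `Expected`.
fn has_type<Expected: 'static, Actual: 'static>(_value: &Actual) -> bool {
    TypeId::of::<Expected>() == TypeId::of::<Actual>()
}

fn float_checks() -> [bool; 4] {
    let x: f32 = 2.; // the annotation determines the type
    [
        has_type::<f64, _>(&2.),     // trailing period, defaults to f64
        has_type::<f64, _>(&12E+99), // an exponent alone makes a float
        has_type::<f32, _>(&0.1f32), // suffix sets the type
        has_type::<f32, _>(&x),
    ]
}

/// Underscores are visual separators and do not change the value.
fn separators_are_ignored() -> bool {
    1_234.0E+18f64 == 1.234e21
}
```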
The representation semantics of floating-point numbers are described in
["Machine Types"].
477 ["Machine Types"]: types.html#machine-types
481 > **<sup>Lexer</sup>**
483 > `true`
484 > | `false`
486 The two values of the boolean type are written `true` and `false`.
488 ## Lifetimes and loop labels
490 > **<sup>Lexer</sup>**
492 > `'` [IDENTIFIER_OR_KEYWORD][identifier]
493 > | `'_`
495 > LIFETIME_OR_LABEL :
496 > `'` [IDENTIFIER][identifier]
498 Lifetime parameters and [loop labels] use LIFETIME_OR_LABEL tokens. Any
LIFETIME_TOKEN will be accepted by the lexer, and for example, can be used in
macros.
502 [loop labels]: expressions/loop-expr.html
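A brief sketch of both uses of LIFETIME_OR_LABEL tokens follows. The function names and the grid example are assumptions chosen for illustration.

```rust
/// `'a` is a lifetime parameter that ties the result to both inputs.
fn longer<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() { left } else { right }
}

/// `'rows` is a loop label; `break 'rows` leaves both loops at once.
fn first_negative(grid: &[Vec<i32>]) -> Option<(usize, usize)> {
    let mut found = None;
    'rows: for (r, row) in grid.iter().enumerate() {
        for (c, &cell) in row.iter().enumerate() {
            if cell < 0 {
                found = Some((r, c));
                break 'rows;
            }
        }
    }
    found
}
```

Without the label, `break` would only leave the inner loop and the search would continue into later rows.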
506 Symbols are a general class of printable [tokens] that play structural
roles in a variety of grammar productions. They are the
remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators], [binary operators], or [keywords].
511 They are catalogued in [the Symbols section][symbols] of the Grammar document.
513 [unary operators]: expressions/operator-expr.html#borrow-operators
514 [binary operators]: expressions/operator-expr.html#arithmetic-and-logical-binary-operators
516 [symbols]: ../grammar.html#symbols
517 [keywords]: keywords.html
518 [identifier]: identifiers.html
519 [tuples]: types.html#tuple-types
520 [tuple structs]: items/structs.html
521 [tuple variants]: items/enumerations.html