# Tokens

Tokens are primitive productions in the grammar defined by regular
(non-recursive) languages. "Simple" tokens are given in [string table
production] form and occur in the rest of the grammar as double-quoted
strings. Other tokens have exact rules given below.

[string table production]: notation.html#string-table-productions

## Literals

A literal is an expression consisting of a single token, rather than a sequence
of tokens, that immediately and directly denotes the value it evaluates to,
rather than referring to it by name or some other evaluation rule. A literal is
a form of [constant expression](expressions.html#constant-expressions), and so
is evaluated (primarily) at compile time.

### Examples

#### Characters and strings

| | Example | `#` sets | Characters | Escapes |
|----------------------------------------------|-----------------|------------|-------------|---------------------|
| [Character](#character-literals) | `'H'` | `N/A` | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
| [String](#string-literals) | `"hello"` | `N/A` | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
| [Raw](#raw-string-literals) | `r#"hello"#` | `0...` | All Unicode | `N/A` |
| [Byte](#byte-literals) | `b'H'` | `N/A` | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Byte string](#byte-string-literals) | `b"hello"` | `N/A` | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | `0...` | All ASCII | `N/A` |

#### ASCII escapes

| | Name |
|---|------|
| `\x41` | 7-bit character code (exactly 2 digits, up to 0x7F) |
| `\n` | Newline |
| `\r` | Carriage return |
| `\t` | Tab |
| `\\` | Backslash |
| `\0` | Null |

#### Byte escapes

| | Name |
|---|------|
| `\x7F` | 8-bit character code (exactly 2 digits) |
| `\n` | Newline |
| `\r` | Carriage return |
| `\t` | Tab |
| `\\` | Backslash |
| `\0` | Null |

#### Unicode escapes

| | Name |
|---|------|
| `\u{7FFF}` | 24-bit Unicode character code (up to 6 digits) |

#### Quote escapes

| | Name |
|---|------|
| `\'` | Single quote |
| `\"` | Double quote |

#### Numbers

| [Number literals](#number-literals)`*` | Example | Exponent | Suffixes |
|----------------------------------------|---------|----------|----------|
| Decimal integer | `98_222` | `N/A` | Integer suffixes |
| Hex integer | `0xff` | `N/A` | Integer suffixes |
| Octal integer | `0o77` | `N/A` | Integer suffixes |
| Binary integer | `0b1111_0000` | `N/A` | Integer suffixes |
| Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes |

`*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`

#### Suffixes

| Integer | Floating-point |
|---------|----------------|
| `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `isize`, `usize` | `f32`, `f64` |

### Character and string literals

#### Character literals

> **<sup>Lexer</sup>**
> CHAR_LITERAL :
> `'` ( ~[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`
>
> QUOTE_ESCAPE :
> `\'` | `\"`
>
> ASCII_ESCAPE :
> `\x` OCT_DIGIT HEX_DIGIT
> | `\n` | `\r` | `\t` | `\\` | `\0`
>
> UNICODE_ESCAPE :
> `\u{` ( HEX_DIGIT `_`<sup>\*</sup> )<sup>1..6</sup> `}`

A _character literal_ is a single Unicode character enclosed within two
`U+0027` (single-quote) characters, with the exception of `U+0027` itself,
which must be _escaped_ by a preceding `U+005C` character (`\`).
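
As a small illustration, the sketch below writes a few character literals, including an escaped single quote, and checks the code points they denote. The helper `code_point` is a hypothetical name introduced here; it simply casts a `char` to `u32`.

```rust
// Hypothetical helper: returns the Unicode scalar value of a `char`.
fn code_point(c: char) -> u32 {
    c as u32
}

fn main() {
    let latin = 'H';
    let greek = 'λ'; // any Unicode scalar value, not only ASCII
    let quote = '\''; // the single quote itself must be escaped

    assert_eq!(code_point(latin), 0x48);
    assert_eq!(code_point(greek), 0x03BB);
    assert_eq!(code_point(quote), 0x27);

    // A `char` is always 4 bytes wide, whichever character it holds.
    assert_eq!(std::mem::size_of_val(&greek), 4);
}
```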

#### String literals

> **<sup>Lexer</sup>**
> STRING_LITERAL :
> `"` (
> ~[`"` `\` _IsolatedCR_]
> | QUOTE_ESCAPE
> | ASCII_ESCAPE
> | UNICODE_ESCAPE
> | STRING_CONTINUE
> )<sup>\*</sup> `"`
>
> STRING_CONTINUE :
> `\` _followed by_ \\n

A _string literal_ is a sequence of any Unicode characters enclosed within two
`U+0022` (double-quote) characters, with the exception of `U+0022` itself,
which must be _escaped_ by a preceding `U+005C` character (`\`).

Line-break characters are allowed in string literals. Normally they represent
themselves (i.e. no translation), but as a special exception, when an unescaped
`U+005C` character (`\`) occurs immediately before the newline (`U+000A`), the
`U+005C` character, the newline, and all whitespace at the beginning of the
next line are ignored. Thus `a` and `b` are equal:

```rust
let a = "foobar";
let b = "foo\
         bar";

assert_eq!(a, b);
```

#### Character escapes

Some additional _escapes_ are available in either character or non-raw string
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:

* A _7-bit code point escape_ starts with `U+0078` (`x`) and is
  followed by exactly two _hex digits_ with a value of at most `0x7F`. It
  denotes the Unicode code point equal to the provided hex value.
* A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
  by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
  (`}`). It denotes the Unicode code point equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
  (`r`), or `U+0074` (`t`), denoting the Unicode values `U+000A` (LF),
  `U+000D` (CR) or `U+0009` (HT) respectively.
* The _null escape_ is the character `U+0030` (`0`) and denotes the Unicode
  value `U+0000` (NUL).
* The _backslash escape_ is the character `U+005C` (`\`), which must be
  escaped in order to denote *itself*.
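
The escape forms listed above can be checked directly. This is a minimal sketch: `code_points` is a hypothetical helper introduced only to compare a string's characters numerically, and `char::MAX` is the standard library constant for `U+10FFFF`.

```rust
// Hypothetical helper: collects the code points of a string.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    // 7-bit code point escape: exactly two hex digits, at most 0x7F.
    assert_eq!('\x41', 'A');

    // 24-bit code point escape: one to six hex digits in braces.
    assert_eq!('\u{48}', 'H');
    assert_eq!('\u{10FFFF}', char::MAX);

    // Whitespace, null and backslash escapes, in that order.
    assert_eq!(code_points("\n\r\t\0\\"), vec![0x0A, 0x0D, 0x09, 0x00, 0x5C]);

    // Note: '\x80' would be rejected, because `\x` escapes in
    // character and string literals stop at 0x7F.
}
```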

#### Raw string literals

> **<sup>Lexer</sup>**
> RAW_STRING_LITERAL :
> `r` RAW_STRING_CONTENT
>
> RAW_STRING_CONTENT :
> `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`
> | `#` RAW_STRING_CONTENT `#`

Raw string literals do not process any escapes. They start with the character
`U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
`U+0022` (double-quote) character. The _raw string body_ can contain any sequence
of Unicode characters and is terminated only by another `U+0022` (double-quote)
character, followed by the same number of `U+0023` (`#`) characters that preceded
the opening `U+0022` (double-quote) character.

All Unicode characters contained in the raw string body represent themselves:
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) have no special meaning.

Examples for string literals:

```rust
"foo"; r"foo"; // foo
"\"foo\""; r#""foo""#; // "foo"

"foo #\"# bar";
r##"foo #"# bar"##; // foo #"# bar

"\x52"; "R"; r"R"; // R
"\\x52"; r"\x52"; // \x52
```

### Byte and byte string literals

#### Byte literals

> **<sup>Lexer</sup>**
> BYTE_LITERAL :
> `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'`
>
> ASCII_FOR_CHAR :
> _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t
>
> BYTE_ESCAPE :
> `\x` HEX_DIGIT HEX_DIGIT
> | `\n` | `\r` | `\t` | `\\` | `\0` | `\'` | `\"`

A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
range) or a single _escape_ preceded by the characters `U+0062` (`b`) and
`U+0027` (single-quote), and followed by the character `U+0027`. If the character
`U+0027` is present within the literal, it must be _escaped_ by a preceding
`U+005C` (`\`) character. It is equivalent to a `u8` unsigned 8-bit integer
_number literal_.
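
Because a byte literal is equivalent to a `u8` number literal, it can be compared with, and used wherever, a `u8` is expected. A minimal sketch follows; `is_ascii_upper` is a hypothetical helper (the standard library already provides `u8::is_ascii_uppercase`), included only to show byte literals as range bounds.

```rust
// Hypothetical helper: byte literals make ASCII range checks on `u8` readable.
fn is_ascii_upper(b: u8) -> bool {
    (b'A'..=b'Z').contains(&b)
}

fn main() {
    // A byte literal is simply a `u8` value.
    let a: u8 = b'A';
    assert_eq!(a, 65);

    // The single quote must be escaped inside a byte literal.
    assert_eq!(b'\'', 0x27);

    // `\x` byte escapes cover the full 8-bit range.
    assert_eq!(b'\xFF', u8::MAX);

    assert!(is_ascii_upper(b'Q'));
    assert!(!is_ascii_upper(b'q'));
}
```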

#### Byte string literals

> **<sup>Lexer</sup>**
> BYTE_STRING_LITERAL :
> `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"`
>
> ASCII_FOR_STRING :
> _any ASCII (i.e. 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_

A non-raw _byte string literal_ is a sequence of ASCII characters and _escapes_,
preceded by the characters `U+0062` (`b`) and `U+0022` (double-quote), and
followed by the character `U+0022`. If the character `U+0022` is present within
the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
Alternatively, a byte string literal can be a _raw byte string literal_, defined
below. A byte string literal of length `n` is equivalent to a `&'static [u8; n]`
borrowed fixed-size array of unsigned 8-bit integers.
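
The `&'static [u8; n]` type can be observed by annotating a binding. This is a minimal sketch; the ELF-signature check in the hypothetical helper `starts_with_elf_magic` is an illustrative use chosen only to combine a byte string with a `\x` escape.

```rust
// Hypothetical helper: checks for the ELF file signature, 0x7F 'E' 'L' 'F'.
fn starts_with_elf_magic(data: &[u8]) -> bool {
    data.starts_with(b"\x7FELF")
}

fn main() {
    // Without coercion, a byte string literal is a `&'static [u8; N]`.
    let hello: &'static [u8; 5] = b"hello";
    assert_eq!(hello.len(), 5);
    assert_eq!(hello[0], b'h');

    // It coerces to a byte slice wherever `&[u8]` is expected.
    let slice: &[u8] = hello;
    assert_eq!(slice, b"hello");

    assert!(starts_with_elf_magic(b"\x7FELF\x02\x01"));
    assert!(!starts_with_elf_magic(b"ELF"));
}
```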

Some additional _escapes_ are available in either byte or non-raw byte string
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:

* A _byte escape_ starts with `U+0078` (`x`) and is
  followed by exactly two _hex digits_. It denotes the byte
  equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
  (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
  `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
* The _null escape_ is the character `U+0030` (`0`) and denotes the byte
  value `0x00` (ASCII NUL).
* The _backslash escape_ is the character `U+005C` (`\`), which must be
  escaped in order to denote its ASCII encoding `0x5C`.
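
A short sketch of these escapes, assuming a hypothetical `to_hex` helper that renders each byte as two uppercase hex digits:

```rust
// Hypothetical helper: renders bytes as space-separated uppercase hex.
fn to_hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Each escape denotes exactly one byte.
    let escaped = b"\x52\n\r\t\0\\";
    assert_eq!(*escaped, [0x52u8, 0x0A, 0x0D, 0x09, 0x00, 0x5C]);
    assert_eq!(to_hex(escaped), "52 0A 0D 09 00 5C");

    // Unlike in character and string literals, `\x` may exceed 0x7F here.
    assert_eq!(to_hex(b"\x80\xFF"), "80 FF");
}
```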

#### Raw byte string literals

> **<sup>Lexer</sup>**
> RAW_BYTE_STRING_LITERAL :
> `br` RAW_BYTE_STRING_CONTENT
>
> RAW_BYTE_STRING_CONTENT :
> `"` ASCII<sup>* (non-greedy)</sup> `"`
> | `#` RAW_BYTE_STRING_CONTENT `#`
>
> ASCII :
> _any ASCII (i.e. 0x00 to 0x7F)_

Raw byte string literals do not process any escapes. They start with the
character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
_raw string body_ can contain any sequence of ASCII characters and is terminated
only by another `U+0022` (double-quote) character, followed by the same number of
`U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote)
character. A raw byte string literal cannot contain any non-ASCII byte.

All characters contained in the raw string body represent their ASCII encoding:
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) have no special meaning.

Examples for byte string literals:

```rust
b"foo"; br"foo"; // foo
b"\"foo\""; br#""foo""#; // "foo"

b"foo #\"# bar";
br##"foo #"# bar"##; // foo #"# bar

b"\x52"; b"R"; br"R"; // R
b"\\x52"; br"\x52"; // \x52
```

### Number literals

A _number literal_ is either an _integer literal_ or a _floating-point
literal_. The grammar for recognizing the two kinds of literals is mixed.

#### Integer literals

> **<sup>Lexer</sup>**
> INTEGER_LITERAL :
> ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )
> INTEGER_SUFFIX<sup>?</sup>
>
> DEC_LITERAL :
> DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
>
> BIN_LITERAL :
> `0b` (BIN_DIGIT|`_`)<sup>\*</sup> BIN_DIGIT (BIN_DIGIT|`_`)<sup>\*</sup>
>
> OCT_LITERAL :
> `0o` (OCT_DIGIT|`_`)<sup>\*</sup> OCT_DIGIT (OCT_DIGIT|`_`)<sup>\*</sup>
>
> HEX_LITERAL :
> `0x` (HEX_DIGIT|`_`)<sup>\*</sup> HEX_DIGIT (HEX_DIGIT|`_`)<sup>\*</sup>
>
> BIN_DIGIT : [`0`-`1`]
>
> OCT_DIGIT : [`0`-`7`]
>
> DEC_DIGIT : [`0`-`9`]
>
> HEX_DIGIT : [`0`-`9` `a`-`f` `A`-`F`]
>
> INTEGER_SUFFIX :
> `u8` | `u16` | `u32` | `u64` | `usize`
> | `i8` | `i16` | `i32` | `i64` | `isize`

<!-- FIXME: separate the DECIMAL_LITERAL with no prefix or suffix (used on tuple indexing and float_literal -->
<!-- FIXME: u128 and i128 -->

An _integer literal_ has one of four forms:

* A _decimal literal_ starts with a *decimal digit* and continues with any
  mixture of *decimal digits* and _underscores_.
* A _hex literal_ starts with the character sequence `U+0030` `U+0078`
  (`0x`) and continues as any mixture (with at least one digit) of hex digits
  and underscores.
* An _octal literal_ starts with the character sequence `U+0030` `U+006F`
  (`0o`) and continues as any mixture (with at least one digit) of octal digits
  and underscores.
* A _binary literal_ starts with the character sequence `U+0030` `U+0062`
  (`0b`) and continues as any mixture (with at least one digit) of binary digits
  and underscores.
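
The four forms can denote the same value. In this sketch, `mode` is a hypothetical helper showing one place where octal literals read naturally (Unix-style permission digits); it is not part of any library.

```rust
// Hypothetical helper: packs three permission digits into one octal-style value.
fn mode(owner: u32, group: u32, other: u32) -> u32 {
    owner * 0o100 + group * 0o10 + other
}

fn main() {
    // The value 255 written in each of the four forms.
    assert_eq!(255, 0xff);
    assert_eq!(255, 0o377);
    assert_eq!(255, 0b1111_1111);

    // Underscores are purely visual separators.
    assert_eq!(1_000_000, 1000000);
    assert_eq!(0x_ff, 0xff); // an underscore may directly follow the prefix

    assert_eq!(mode(7, 5, 5), 0o755);
}
```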

Like any literal, an integer literal may be followed (immediately,
without any spaces) by an _integer suffix_, which forcibly sets the
type of the literal. The integer suffix must be the name of one of the
integral types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`,
`isize`, or `usize`.

The type of an _unsuffixed_ integer literal is determined by type inference:

* If an integer type can be _uniquely_ determined from the surrounding
  program context, the unsuffixed integer literal has that type.

* If the program context under-constrains the type, it defaults to the
  signed 32-bit integer `i32`.

* If the program context over-constrains the type, it is considered a
  static type error.
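
The first two rules can be observed in a running program; the third only shows up as a compile error, so it appears as a comment. In this sketch `takes_u8` is a hypothetical function, and `std::mem::size_of_val` is used only to reveal the width of the inferred type.

```rust
use std::mem::size_of_val;

// Hypothetical function whose parameter type supplies the context.
fn takes_u8(x: u8) -> u8 {
    x
}

fn main() {
    // Uniquely determined by context: `n` is inferred to be `u8`.
    let n = 200;
    assert_eq!(takes_u8(n), 200);

    // Under-constrained: `size_of_val` accepts any type, so the literal
    // falls back to `i32`, which occupies 4 bytes.
    let d = 7;
    assert_eq!(size_of_val(&d), 4);

    // Over-constrained, and therefore rejected at compile time:
    // let m = 1; let _a: u8 = m; let _b: i64 = m;
}
```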

Examples of integer literals of various forms:

```rust
123; // type i32
123i32; // type i32
123u32; // type u32
123_u32; // type u32
let a: u64 = 123; // type u64

0xff; // type i32
0xff_u8; // type u8

0o70; // type i32
0o70_i16; // type i16

0b1111_1111_1001_0000; // type i32
0b1111_1111_1001_0000i64; // type i64
0b________1; // type i32

0usize; // type usize
```

Examples of invalid integer literals:

```rust,ignore
// invalid suffixes

0invalidSuffix;

// uses numbers of the wrong base

123AFB43;
0b0102;
0o0581;

// integers too big for their type (they overflow)

128_i8;
256_u8;

// bin, hex and octal literals must have at least one digit

0b_;
0b____;
```

Note that the Rust syntax considers `-1i8` as an application of the [unary minus
operator] to an integer literal `1i8`, rather than a single integer literal.
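
A short sketch of what treating `-` as an operator, rather than as part of the literal, means in practice:

```rust
fn main() {
    // `-1i8` is the negation operator applied to the literal `1i8`.
    assert_eq!(-1i8, -(1i8));

    // Because `-` is an operator, method calls bind more tightly than it:
    // `-1i8.abs()` parses as `-(1i8.abs())`, so parentheses are needed
    // to take the absolute value of negative one.
    assert_eq!((-1i8).abs(), 1);

    // `128i8` on its own does not fit in an `i8`, but the negated
    // expression is accepted and denotes `i8::MIN`.
    assert_eq!(-128i8, i8::MIN);
}
```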

[unary minus operator]: expressions/operator-expr.html#negation-operators

#### Floating-point literals

> **<sup>Lexer</sup>**
> FLOAT_LITERAL :
> DEC_LITERAL `.`
> _(not immediately followed by `.`, `_` or an identifier)_
> | DEC_LITERAL FLOAT_EXPONENT
> | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT<sup>?</sup>
> | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup>
> FLOAT_EXPONENT<sup>?</sup> FLOAT_SUFFIX
>
> FLOAT_EXPONENT :
> (`e`|`E`) (`+`|`-`)<sup>?</sup>
> (DEC_DIGIT|`_`)<sup>\*</sup> DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
>
> FLOAT_SUFFIX :
> `f32` | `f64`

A _floating-point literal_ has one of two forms:

* A _decimal literal_ followed by a period character `U+002E` (`.`). This is
  optionally followed by another decimal literal, with an optional _exponent_.
* A single _decimal literal_ followed by an _exponent_.

Like integer literals, a floating-point literal may be followed by a
suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
There are two valid _floating-point suffixes_, `f32` and `f64` (the 32-bit and
64-bit floating-point types), and the suffix forcibly sets the type of the
literal.
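
A sketch of each form, with and without suffixes. The exact-equality checks are safe here only because each pair of compared literals denotes the same real number, so both sides round to the same floating-point value.

```rust
fn main() {
    // A decimal literal followed by a period (no suffix allowed directly after it).
    let a: f64 = 1.;
    // A period followed by another decimal literal, with an optional exponent.
    let b = 1.5;
    let c = 2.5E-3;
    // A single decimal literal followed by an exponent.
    let d = 1e3;
    // Suffixes fix the type; they may follow the digits or the exponent.
    let e = 1f32;
    let f = 6.02e23_f64;

    assert_eq!(a, 1.0);
    assert_eq!(b + b, 3.0);
    assert_eq!(c, 0.0025);
    assert_eq!(d, 1000.0);
    assert_eq!(std::mem::size_of_val(&e), 4);
    assert_eq!(std::mem::size_of_val(&f), 8);
}
```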

The type of an _unsuffixed_ floating-point literal is determined by
type inference:

* If a floating-point type can be _uniquely_ determined from the
  surrounding program context, the unsuffixed floating-point literal
  has that type.

* If the program context under-constrains the type, it defaults to `f64`.

* If the program context over-constrains the type, it is considered a
  static type error.
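
As with integers, the first two rules can be checked at run time. In this sketch `halve` is a hypothetical function whose parameter type supplies the context, and `std::mem::size_of_val` reveals the inferred width.

```rust
use std::mem::size_of_val;

// Hypothetical function whose parameter type fixes the literal's type.
fn halve(x: f32) -> f32 {
    x / 2.0
}

fn main() {
    // Uniquely determined by context: `v` is inferred to be `f32`.
    let v = 3.0;
    assert_eq!(halve(v), 1.5);
    assert_eq!(size_of_val(&v), 4);

    // Under-constrained: the literal falls back to `f64` (8 bytes).
    let w = 3.0;
    assert_eq!(size_of_val(&w), 8);
}
```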

Examples of floating-point literals of various forms:

```rust
123.0f64; // type f64
0.1f64; // type f64
0.1f32; // type f32
12E+99_f64; // type f64
let x: f64 = 2.; // type f64
```

This last example is different because it is not possible to use the suffix
syntax with a floating-point literal ending in a period. `2.f64` would attempt
to call a method named `f64` on `2`.
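
For comparison, a sketch of spellings that do attach a suffix to the value two, next to the trailing-period form that needs its type from context:

```rust
fn main() {
    let a = 2f64; // digits followed directly by the suffix
    let b = 2.0f64; // a fractional part, then the suffix
    let c = 2.0_f64; // an underscore may separate the digits from the suffix
    let d: f64 = 2.; // a trailing period: the type must come from context

    assert_eq!(a, b);
    assert_eq!(b, c);
    assert_eq!(c, d);

    // By contrast, `2.f64` is read as the integer `2` followed by a
    // method call `.f64`, which does not compile.
}
```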

The representation semantics of floating-point numbers are described in
["Machine Types"].

["Machine Types"]: types.html#machine-types

### Boolean literals

> **<sup>Lexer</sup>**
> BOOLEAN_LITERAL :
> `true`
> | `false`

The two values of the boolean type are written `true` and `false`.

## Symbols

Symbols are a general class of printable [tokens] that play structural
roles in a variety of grammar productions. They are the remaining
miscellaneous printable tokens that do not otherwise appear as
[unary operators], [binary operators], or [keywords]. They are catalogued
in [the Symbols section][symbols] of the Grammar document.

[unary operators]: expressions/operator-expr.html#borrow-operators
[binary operators]: expressions/operator-expr.html#arithmetic-and-logical-binary-operators
[tokens]: #tokens
[symbols]: ../grammar.html#symbols
[keywords]: keywords.html