% Grammar

# Introduction

This document is the primary reference for the Rust programming language grammar. It
provides only one kind of material:

- Chapters that formally define the language grammar.

This document does not serve as an introduction to the language. Background
familiarity with the language is assumed. A separate [guide] is available to
help acquire such background familiarity.

This document also does not serve as a reference to the [standard] library
included in the language distribution. That library is documented
separately by extracting documentation attributes from its source code. Many
of the features that one might expect to be language features are library
features in Rust, so what you're looking for may be there, not here.

[guide]: guide.html
[standard]: std/index.html

# Notation

Rust's grammar is defined over Unicode codepoints, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
supported by common automated LL(k) parsing tools such as `llgen`, rather than
the dialect given in ISO 14977. The dialect can be defined self-referentially
as follows:

```antlr
grammar : rule + ;
rule : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
```

Where:

- Whitespace in the grammar is ignored.
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
  Unicode codepoint `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeats` forms apply to the adjacent `element`, and are as follows:
  - `?` means zero or one repetition
  - `*` means zero or more repetitions
  - `+` means one or more repetitions
  - `NUMBER` trailing a repeat symbol gives a maximum repetition count
  - `NUMBER` on its own gives an exact repetition count

This EBNF dialect should be familiar to many readers.

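
The `repeats` forms listed above can be modelled as a small data type. The following is a minimal sketch rather than part of any real tool: the `Repeats` enum, its variant names, and the `allows` method are all hypothetical names introduced for illustration. It assumes that an element with no suffix occurs exactly once.

```rust
/// The repetition suffixes of the EBNF dialect described above.
/// The type and variant names are hypothetical, chosen for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Repeats {
    /// No suffix: the element occurs exactly once (an assumption of this sketch).
    Once,
    /// `?`: zero or one repetition.
    Optional,
    /// `*`, optionally followed by a NUMBER giving a maximum count.
    ZeroOrMore(Option<u32>),
    /// `+`, optionally followed by a NUMBER giving a maximum count.
    OneOrMore(Option<u32>),
    /// A NUMBER on its own: an exact repetition count.
    Exactly(u32),
}

impl Repeats {
    /// Returns true if `count` occurrences of the element satisfy this suffix.
    pub fn allows(self, count: u32) -> bool {
        match self {
            Repeats::Once => count == 1,
            Repeats::Optional => count <= 1,
            Repeats::ZeroOrMore(max) => max.map_or(true, |m| count <= m),
            Repeats::OneOrMore(max) => count >= 1 && max.map_or(true, |m| count <= m),
            Repeats::Exactly(n) => count == n,
        }
    }
}
```

Under this reading, `hex_digit+ 6` corresponds to `Repeats::OneOrMore(Some(6))` and `hex_digit 2` to `Repeats::Exactly(2)`.
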
## Unicode productions

A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
range. We define these productions in terms of character properties specified
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.

## String table productions

Some rules in the grammar (notably [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), and [keywords](#keywords)) are
given in a simplified form: as a listing of a table of unquoted, printable
whitespace-separated strings. These cases form a subset of the rules regarding
the [token](#tokens) rule, and are assumed to be the result of a
lexical-analysis phase feeding the parser, driven by a DFA, operating over the
disjunction of all such string table entries.

When such a string enclosed in double-quotes (`"`) occurs inside the grammar,
it is an implicit reference to a single member of such a string table
production. See [tokens](#tokens) for more information.

# Lexical structure

## Input format

Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
codepoints, but a small number are defined in terms of Unicode properties or
explicit codepoint lists. [^inputformat]

[^inputformat]: Substitute definitions for the special Unicode productions are
  provided to the grammar verifier, restricted to ASCII range, when verifying the
  grammar in this document.

## Special Unicode Productions

The following productions in the Rust grammar are defined in terms of Unicode
properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and
`non_double_quote`.

### Identifiers

The `ident` production is any nonempty Unicode[^non_ascii_idents] string of
the following form:

[^non_ascii_idents]: Non-ASCII characters in identifiers are currently feature
  gated. This is expected to improve soon.

- The first character has property `XID_start`
- The remaining characters have property `XID_continue`

that does _not_ occur in the set of [keywords](#keywords).

> **Note**: `XID_start` and `XID_continue` as character properties cover the
> character ranges used to form the more familiar C and Java language-family
> identifiers.

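
As a rough illustration of the `ident` rule, the sketch below checks a string against an ASCII-only approximation of `XID_start` and `XID_continue`, then excludes the keywords listed in the Keywords section of this document. The function names are hypothetical, and the ASCII restriction is an assumption made so the example needs no Unicode tables. Because the sketch follows the production as written, a leading underscore is rejected.

```rust
/// Keywords from the table in this document; none of them is an `ident`.
const KEYWORDS: &[&str] = &[
    "abstract", "alignof", "as", "become", "box", "break", "const", "continue",
    "crate", "do", "else", "enum", "extern", "false", "final", "fn", "for", "if",
    "impl", "in", "let", "loop", "macro", "match", "mod", "move", "mut",
    "offsetof", "override", "priv", "proc", "pub", "pure", "ref", "return",
    "Self", "self", "sizeof", "static", "struct", "super", "trait", "true",
    "type", "typeof", "unsafe", "unsized", "use", "virtual", "where", "while",
    "yield",
];

/// ASCII-only stand-in for `XID_start`: within ASCII, only letters qualify.
fn is_ascii_xid_start(c: char) -> bool {
    c.is_ascii_alphabetic()
}

/// ASCII-only stand-in for `XID_continue`: letters, digits, and underscore.
fn is_ascii_xid_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Returns true if `s` matches the `ident` production under the ASCII
/// approximation above and is not a keyword.
pub fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    let starts_ok = match chars.next() {
        Some(first) => is_ascii_xid_start(first),
        None => false, // the production requires a nonempty string
    };
    starts_ok && chars.all(is_ascii_xid_continue) && !KEYWORDS.iter().any(|k| *k == s)
}
```
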
### Delimiter-restricted productions

Some productions are defined by exclusion of particular Unicode characters:

- `non_null` is any single Unicode character aside from `U+0000` (null)
- `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`)
- `non_single_quote` is `non_null` restricted to exclude `U+0027` (`'`)
- `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`)

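
Each of these productions is a predicate on a single character, so they can be written directly as small functions. This is only a sketch; the function names mirror the production names but are otherwise hypothetical.

```rust
/// `non_null`: any character other than U+0000.
pub fn non_null(c: char) -> bool {
    c != '\u{0000}'
}

/// `non_eol`: `non_null`, additionally excluding U+000A (line feed).
pub fn non_eol(c: char) -> bool {
    non_null(c) && c != '\u{000A}'
}

/// `non_single_quote`: `non_null`, additionally excluding U+0027.
pub fn non_single_quote(c: char) -> bool {
    non_null(c) && c != '\u{0027}'
}

/// `non_double_quote`: `non_null`, additionally excluding U+0022.
pub fn non_double_quote(c: char) -> bool {
    non_null(c) && c != '\u{0022}'
}
```
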
## Comments

```antlr
comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
```

**FIXME:** add doc grammar?

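
Because `block_comment_body` may itself contain a `block_comment`, block comments nest, and a scanner has to track nesting depth rather than stop at the first `*/`. The following is a minimal sketch of that idea; the function name `block_comment_len` is hypothetical.

```rust
/// Returns the byte length of the block comment at the start of `src`, or
/// `None` if `src` does not start with `/*` or the comment is unterminated.
pub fn block_comment_len(src: &str) -> Option<usize> {
    if !src.starts_with("/*") {
        return None;
    }
    // `/` and `*` are ASCII, so scanning bytes is safe on UTF-8 input.
    let bytes = src.as_bytes();
    let mut depth = 0usize;
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'/' && bytes[i + 1] == b'*' {
            // A nested comment opens.
            depth += 1;
            i += 2;
        } else if bytes[i] == b'*' && bytes[i + 1] == b'/' {
            // The innermost open comment closes.
            depth -= 1;
            i += 2;
            if depth == 0 {
                return Some(i);
            }
        } else {
            i += 1;
        }
    }
    None
}
```
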
## Whitespace

```antlr
whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;
```

## Tokens

```antlr
simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;
```

### Keywords

<p id="keyword-table-marker"></p>

|          |          |          |          |         |
|----------|----------|----------|----------|---------|
| abstract | alignof  | as       | become   | box     |
| break    | const    | continue | crate    | do      |
| else     | enum     | extern   | false    | final   |
| fn       | for      | if       | impl     | in      |
| let      | loop     | macro    | match    | mod     |
| move     | mut      | offsetof | override | priv    |
| proc     | pub      | pure     | ref      | return  |
| Self     | self     | sizeof   | static   | struct  |
| super    | trait    | true     | type     | typeof  |
| unsafe   | unsized  | use      | virtual  | where   |
| while    | yield    |          |          |         |

Each of these keywords has special meaning in its grammar, and all of them are
excluded from the `ident` rule.

### Literals

```antlr
lit_suffix : ident;
literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;
```

The optional `lit_suffix` production is only used for certain numeric literals,
but is reserved for future extension. That is, the above gives the lexical
grammar, but a Rust parser will reject everything but the 12 special cases
mentioned in [Number literals](reference.html#number-literals) in the
reference.

#### Character and string literals

```antlr
char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
          | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
            | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
              | 'n' | 'r' | 't' | '0'
              | 'x' hex_digit 2 ;
unicode_escape : 'u' '{' hex_digit+ 6 '}';

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
          | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
          | dec_digit ;
oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
           | '5' | '6' | '7' | '8' | '9' ;
```
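
To make the escape productions concrete, the sketch below decodes the text that follows a backslash (`'\x5c'`) according to `common_escape` and `unicode_escape`, and also accepts the two quote escapes allowed by `char_body` and `string_body`. The function name `decode_escape` and its return shape are hypothetical; this is an illustration under those assumptions, not the compiler's implementation.

```rust
/// Decodes the escape that follows a backslash. `rest` is the input just
/// after the backslash. Returns the decoded character and the number of
/// bytes of `rest` that the escape occupies, or `None` if it is malformed.
pub fn decode_escape(rest: &str) -> Option<(char, usize)> {
    match rest.chars().next()? {
        // common_escape, plus the quote escapes from char_body / string_body
        '\\' => Some(('\\', 1)),
        'n' => Some(('\n', 1)),
        'r' => Some(('\r', 1)),
        't' => Some(('\t', 1)),
        '0' => Some(('\0', 1)),
        '\'' => Some(('\'', 1)),
        '"' => Some(('"', 1)),
        // 'x' hex_digit 2
        'x' => {
            let hex = rest.get(1..3)?;
            if !hex.chars().all(|c| c.is_ascii_hexdigit()) {
                return None;
            }
            let value = u8::from_str_radix(hex, 16).ok()?;
            Some((value as char, 3))
        }
        // 'u' '{' hex_digit+ 6 '}'
        'u' => {
            let inner = rest.strip_prefix("u{")?;
            let end = inner.find('}')?;
            let hex = &inner[..end];
            if hex.is_empty() || hex.len() > 6 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
                return None;
            }
            let value = u32::from_str_radix(hex, 16).ok()?;
            // "u{" (2 bytes) + digits + "}" (1 byte)
            Some((char::from_u32(value)?, end + 3))
        }
        _ => None,
    }
}
```
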

#### Byte and byte string literals

```antlr
byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" byte_string_body * '\x22' | "br" raw_byte_string ;

byte_body : ascii_non_single_quote
          | '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
                 | '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
```

#### Number literals

```antlr
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
        | '0' [       [ dec_digit | '_' ] * float_suffix ?
              | 'b'   [ '1' | '0' | '_' ] +
              | 'o'   [ oct_digit | '_' ] +
              | 'x'   [ hex_digit | '_' ] +  ] ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
```
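
The integer alternatives of `num_lit` can be illustrated with a small evaluator that picks the radix from the `0b`, `0o`, or `0x` prefix and ignores `_` separators. This is a sketch under stated assumptions: the name `parse_int_lit` is hypothetical, floating-point forms and literal suffixes are out of scope, and the function is slightly stricter than the grammar in that it requires at least one digit after a radix prefix.

```rust
/// Evaluates an integer literal in the shape of `num_lit`, without a
/// `float_suffix` or `lit_suffix`. Returns `None` for anything else.
pub fn parse_int_lit(lit: &str) -> Option<u64> {
    // Choose the radix from the prefix, as in the '0' alternative of num_lit.
    let (radix, digits) = if let Some(rest) = lit.strip_prefix("0b") {
        (2, rest)
    } else if let Some(rest) = lit.strip_prefix("0o") {
        (8, rest)
    } else if let Some(rest) = lit.strip_prefix("0x") {
        (16, rest)
    } else {
        (10, lit)
    };
    // A decimal literal must begin with a digit, never with '_'.
    if radix == 10 && !lit.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    // Underscores are separators only; drop them before conversion.
    let cleaned: String = digits.chars().filter(|&c| c != '_').collect();
    if cleaned.is_empty() || !cleaned.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    u64::from_str_radix(&cleaned, radix).ok()
}
```
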

#### Boolean literals

```antlr
bool_lit : [ "true" | "false" ] ;
```

The two values of the boolean type are written `true` and `false`.

### Symbols

```antlr
symbol : "::" | "->"
       | '#' | '[' | ']' | '(' | ')' | '{' | '}'
       | ',' | ';' ;
```

Symbols are a general class of printable [token](#tokens) that play structural
roles in a variety of grammar productions. They are catalogued here for
completeness as the set of remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords](#keywords).

## Paths

```antlr
expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | expr_path ;

type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | "::" type_path ;
```

# Syntax extensions

## Macros

```antlr
expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
        | '{' matcher * '}' | '$' ident ':' ident
        | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
        | non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
            | '{' transcriber * '}' | '$' ident
            | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
            | non_special_token ;
```

# Crates and source files

**FIXME:** grammar? What production covers `#![crate_id = "foo"]`?

# Items and attributes

**FIXME:** grammar?

## Items

```antlr
item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
     | const_item | static_item | trait_item | impl_item | extern_block ;
```

### Type Parameters

**FIXME:** grammar?

### Modules

```antlr
mod_item : "mod" ident [ ';' | '{' mod '}' ] ;
mod : [ view_item | item ] * ;
```

#### View items

```antlr
view_item : extern_crate_decl | use_decl ';' ;
```

##### Extern crate declarations

```antlr
extern_crate_decl : "extern" "crate" crate_name ;
crate_name : ident | [ ident "as" ident ] ;
```

##### Use declarations

```antlr
use_decl : vis ? "use" [ path "as" ident
                       | path_glob ] ;

path_glob : ident [ "::" [ path_glob
                         | '*' ] ] ?
          | '{' path_item [ ',' path_item ] * '}' ;

path_item : ident | "self" ;
```
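
For orientation, the following sketch shows Rust source that exercises the shapes of `use_decl`: a renamed path, a glob, and a braced list containing `self`. The helper function `word_stats` is hypothetical and exists only so that every import is used.

```rust
use std::cmp::*;                       // path_glob ending in '*'
use std::collections::HashMap as Map;  // path "as" ident
use std::io::{self, Read};             // braced path_item list with "self"

/// Reads `input` as UTF-8 text and returns the number of distinct words
/// together with the highest frequency of any single word.
pub fn word_stats(mut input: &[u8]) -> io::Result<(usize, usize)> {
    let mut text = String::new();
    // `read_to_string` is available because `Read` was imported above.
    input.read_to_string(&mut text)?;
    let mut counts: Map<&str, usize> = Map::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    // `max` is brought into scope by the glob import of `std::cmp`.
    let highest = counts.values().fold(0, |acc, &n| max(acc, n));
    Ok((counts.len(), highest))
}
```
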

### Functions

**FIXME:** grammar?

#### Generic functions

**FIXME:** grammar?

#### Unsafety

**FIXME:** grammar?

##### Unsafe functions

**FIXME:** grammar?

##### Unsafe blocks

**FIXME:** grammar?

#### Diverging functions

**FIXME:** grammar?

### Type definitions

**FIXME:** grammar?

### Structures

**FIXME:** grammar?

### Enumerations

**FIXME:** grammar?

### Constant items

```antlr
const_item : "const" ident ':' type '=' expr ';' ;
```

### Static items

```antlr
static_item : "static" ident ':' type '=' expr ';' ;
```

#### Mutable statics

**FIXME:** grammar?

### Traits

**FIXME:** grammar?

### Implementations

**FIXME:** grammar?

### External blocks

```antlr
extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;
```

## Visibility and Privacy

```antlr
vis : "pub" ;
```

### Re-exporting and Visibility

See [Use declarations](#use-declarations).

## Attributes

```antlr
attribute : '#' '!' ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
                  | '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;
```

# Statements and expressions

## Statements

```antlr
stmt : decl_stmt | expr_stmt ;
```

### Declaration statements

```antlr
decl_stmt : item | let_decl ;
```

#### Item declarations

See [Items](#items).

#### Variable declarations

```antlr
let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;
```

### Expression statements

```antlr
expr_stmt : expr ';' ;
```

## Expressions

```antlr
expr : literal | path | tuple_expr | unit_expr | struct_expr
     | block_expr | method_call_expr | field_expr | array_expr
     | idx_expr | range_expr | unop_expr | binop_expr
     | paren_expr | call_expr | lambda_expr | while_expr
     | loop_expr | break_expr | continue_expr | for_expr
     | if_expr | match_expr | if_let_expr | while_let_expr
     | return_expr ;
```

#### Lvalues, rvalues and temporaries

**FIXME:** grammar?

#### Moved and copied types

**FIXME:** Do we want to capture this in the grammar as different productions?

### Literal expressions

See [Literals](#literals).

### Path expressions

See [Paths](#paths).

### Tuple expressions

```antlr
tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
```

### Unit expressions

```antlr
unit_expr : "()" ;
```

### Structure expressions

```antlr
struct_expr : expr_path '{' ident ':' expr
                      [ ',' ident ':' expr ] *
                      [ ".." expr ] '}' |
              expr_path '(' expr
                      [ ',' expr ] * ')' |
              expr_path ;
```
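
The three alternatives of `struct_expr` correspond to record-style, tuple-style, and unit-style construction. The sketch below shows one of each, including the `..` base expression; the types `Config`, `Meters`, and `Marker` and the function `examples` are hypothetical names used only for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub width: u32,
    pub height: u32,
    pub title: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Meters(pub f64);

#[derive(Debug, Clone, PartialEq)]
pub struct Marker;

/// Builds one value with each form of structure expression.
pub fn examples(base: Config) -> (Config, Meters, Marker) {
    // Record form with a `..` base: unlisted fields are taken from `base`.
    let wide = Config { width: base.width * 2, ..base };
    // Tuple form, then the bare-path (unit) form.
    (wide, Meters(1.5), Marker)
}
```
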

### Block expressions

```antlr
block_expr : '{' [ stmt ';' | item ] *
                 [ expr ] '}' ;
```

### Method-call expressions

```antlr
method_call_expr : expr '.' ident paren_expr_list ;
```

### Field expressions

```antlr
field_expr : expr '.' ident ;
```

### Array expressions

```antlr
array_expr : '[' "mut" ? array_elems? ']' ;

array_elems : [expr [',' expr]*] | [expr ';' expr] ;
```

### Index expressions

```antlr
idx_expr : expr '[' expr ']' ;
```

### Range expressions

```antlr
range_expr : expr ".." expr |
             expr ".." |
             ".." expr |
             ".." ;
```
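
As a brief illustration, the sketch below uses both `array_elems` forms and all four `range_expr` alternatives as slice indices. The function name `slice_lengths` is hypothetical.

```rust
/// Returns the lengths of several slices taken with range expressions.
pub fn slice_lengths() -> (usize, usize, usize, usize, usize) {
    let data = [10, 20, 30, 40]; // array_elems: expr [',' expr]*
    let zeros = [0u8; 3];        // array_elems: expr ';' expr
    let middle = &data[1..3];    // expr ".." expr
    let tail = &data[2..];       // expr ".."
    let head = &data[..2];       // ".." expr
    let all = &data[..];         // ".."
    (zeros.len(), middle.len(), tail.len(), head.len(), all.len())
}
```
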

### Unary operator expressions

```antlr
unop_expr : unop expr ;
unop : '-' | '*' | '!' ;
```

### Binary operator expressions

```antlr
binop_expr : expr binop expr | type_cast_expr
           | assignment_expr | compound_assignment_expr ;
binop : arith_op | bitwise_op | lazy_bool_op | comp_op ;
```

#### Arithmetic operators

```antlr
arith_op : '+' | '-' | '*' | '/' | '%' ;
```

#### Bitwise operators

```antlr
bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;
```

#### Lazy boolean operators

```antlr
lazy_bool_op : "&&" | "||" ;
```

#### Comparison operators

```antlr
comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;
```

#### Type cast expressions

```antlr
type_cast_expr : value "as" type ;
```

#### Assignment expressions

```antlr
assignment_expr : expr '=' expr ;
```

#### Compound assignment expressions

```antlr
compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;
```

### Grouped expressions

```antlr
paren_expr : '(' expr ')' ;
```

### Call expressions

```antlr
expr_list : [ expr [ ',' expr ]* ] ? ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;
```

### Lambda expressions

```antlr
ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;
```

### While loops

```antlr
while_expr : [ lifetime ':' ] "while" no_struct_literal_expr '{' block '}' ;
```

### Infinite loops

```antlr
loop_expr : [ lifetime ':' ] "loop" '{' block '}';
```

### Break expressions

```antlr
break_expr : "break" [ lifetime ];
```

### Continue expressions

```antlr
continue_expr : "continue" [ lifetime ];
```

### For expressions

```antlr
for_expr : [ lifetime ':' ] "for" pat "in" no_struct_literal_expr '{' block '}' ;
```
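
The `lifetime ':'` prefix on the loop productions is a loop label, and `break` and `continue` may name such a label. The following is a small sketch of labelled loops; the function `find_pair` is a hypothetical example.

```rust
/// Returns the first pair `(i, j)` with even `i`, both in `1..limit + 1`,
/// such that `i * j == target`.
pub fn find_pair(target: u32, limit: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for i in 1..limit + 1 {
        if i % 2 == 1 {
            // Skip odd `i` and move on to the next outer iteration.
            continue 'outer;
        }
        let mut j = 1;
        'inner: loop {
            if j > limit {
                break 'inner;
            }
            if i * j == target {
                found = Some((i, j));
                // Leave both loops at once.
                break 'outer;
            }
            j += 1;
        }
    }
    found
}
```
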

### If expressions

```antlr
if_expr : "if" no_struct_literal_expr '{' block '}'
          else_tail ? ;

else_tail : "else" [ if_expr | if_let_expr
                   | '{' block '}' ] ;
```

### Match expressions

```antlr
match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;

match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;

match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
```
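
A short sketch of a `match_expr` whose arms use the pieces of `match_pat`: alternatives separated by `|`, an `if` guard, and both the `expr ","` and `'{' block '}'` arm bodies. The function `classify` is a hypothetical example.

```rust
/// Classifies an integer using the arm forms described by `match_arm`.
pub fn classify(n: i32) -> &'static str {
    match n {
        0 => "zero",
        // Several patterns joined by '|'.
        1 | 2 | 3 => "small",
        // A pattern followed by an `if` guard, with a block body.
        x if x < 0 => {
            "negative"
        }
        _ => "large",
    }
}
```
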

### If let expressions

```antlr
if_let_expr : "if" "let" pat '=' expr '{' block '}'
               else_tail ? ;
else_tail : "else" [ if_expr | if_let_expr | '{' block '}' ] ;
```

### While let loops

```antlr
while_let_expr : "while" "let" pat '=' expr '{' block '}' ;
```
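
Both productions test a pattern against an expression and run the block only when it matches. The sketch below combines them; the function `drain_sum` is a hypothetical example.

```rust
/// Pops every entry off `stack` and sums the values that are present.
pub fn drain_sum(mut stack: Vec<Option<u32>>) -> u32 {
    let mut total = 0;
    // The loop runs for as long as `pop` yields `Some(..)`.
    while let Some(top) = stack.pop() {
        if let Some(value) = top {
            total += value;
        } else {
            // A `None` entry contributes nothing to the sum.
        }
    }
    total
}
```
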

### Return expressions

```antlr
return_expr : "return" expr ? ;
```

# Type system

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

## Types

### Primitive types

**FIXME:** grammar?

#### Machine types

**FIXME:** grammar?

#### Machine-dependent integer types

**FIXME:** grammar?

### Textual types

**FIXME:** grammar?

### Tuple types

**FIXME:** grammar?

### Array, and Slice types

**FIXME:** grammar?

### Structure types

**FIXME:** grammar?

### Enumerated types

**FIXME:** grammar?

### Pointer types

**FIXME:** grammar?

### Function types

**FIXME:** grammar?

### Closure types

```antlr
closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
                [ ':' bound-list ] [ '->' type ]
procedure_type := 'proc' [ '<' lifetime-list '>' ] '(' arg-list ')'
                  [ ':' bound-list ] [ '->' type ]
lifetime-list := lifetime | lifetime ',' lifetime-list
arg-list := ident ':' type | ident ':' type ',' arg-list
bound-list := bound | bound '+' bound-list
bound := path | lifetime
```

### Object types

**FIXME:** grammar?

### Type parameters

**FIXME:** grammar?

### Self types

**FIXME:** grammar?

## Type kinds

**FIXME:** this is probably not relevant to the grammar...

# Memory and concurrency models

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

## Memory model

### Memory allocation and lifetime

### Memory ownership

### Variables

### Boxes

## Threads

### Communication between threads

### Thread lifecycle