% Grammar

# Introduction

This document is the primary reference for the Rust programming language grammar. It
provides only one kind of material:

- Chapters that formally define the language grammar.

This document does not serve as an introduction to the language. Background
familiarity with the language is assumed. A separate [guide] is available to
help acquire such background familiarity.

This document also does not serve as a reference to the [standard] library
included in the language distribution. That library is documented
separately by extracting documentation attributes from its source code. Many
of the features that one might expect to be language features are library
features in Rust, so what you're looking for may be there, not here.

[guide]: guide.html
[standard]: std/index.html

# Notation

Rust's grammar is defined over Unicode codepoints, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
supported by common automated LL(k) parsing tools such as `llgen`, rather than
the dialect given in ISO 14977. The dialect can be defined self-referentially
as follows:

```antlr
grammar : rule + ;
rule : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
```

Where:

- Whitespace in the grammar is ignored.
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
  Unicode codepoint `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeats` forms apply to the adjacent `element`, and are as follows:
    - `?` means zero or one repetition
    - `*` means zero or more repetitions
    - `+` means one or more repetitions
    - `NUMBER` trailing a repeat symbol gives a maximum repetition count
    - `NUMBER` on its own gives an exact repetition count

This EBNF dialect should be familiar to many readers.

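
The repetition suffixes listed above can be modelled as inclusive bounds on a repetition count. The sketch below is only an illustration: the `Repeats` type and its methods are hypothetical names, not part of any Rust tooling, and an element with no suffix is treated as `Exactly(1)`.

```rust
/// Hypothetical model of the `repeats` suffix from the notation above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repeats {
    Optional,          // `?`
    Star(Option<u32>), // `*`, optionally followed by a maximum NUMBER
    Plus(Option<u32>), // `+`, optionally followed by a maximum NUMBER
    Exactly(u32),      // a bare NUMBER
}

impl Repeats {
    /// Inclusive (min, max) repetition counts; `None` means unbounded.
    fn bounds(self) -> (u32, Option<u32>) {
        match self {
            Repeats::Optional => (0, Some(1)),
            Repeats::Star(max) => (0, max),
            Repeats::Plus(max) => (1, max),
            Repeats::Exactly(n) => (n, Some(n)),
        }
    }

    /// Whether `count` repetitions satisfy this suffix.
    fn allows(self, count: u32) -> bool {
        let (min, max) = self.bounds();
        count >= min && max.map_or(true, |m| count <= m)
    }
}

fn main() {
    // `hex_digit+ 6` in `unicode_escape`: between one and six hex digits.
    let r = Repeats::Plus(Some(6));
    println!("{:?} allows 7 repetitions: {}", r.bounds(), r.allows(7));
}
```
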
## Unicode productions

A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
range. We define these productions in terms of character properties specified
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.

## String table productions

Some rules in the grammar (notably [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), and [keywords](#keywords)) are
given in a simplified form: as a listing of a table of unquoted, printable
whitespace-separated strings. These cases form a subset of the rules regarding
the [token](#tokens) rule, and are assumed to be the result of a
lexical-analysis phase feeding the parser, driven by a DFA, operating over the
disjunction of all such string table entries.

When such a string, enclosed in double quotes (`"`), occurs inside the grammar,
it is an implicit reference to a single member of such a string table
production. See [tokens](#tokens) for more information.

# Lexical structure

## Input format

Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
codepoints, but a small number are defined in terms of Unicode properties or
explicit codepoint lists. [^inputformat]

[^inputformat]: Substitute definitions for the special Unicode productions are
  provided to the grammar verifier, restricted to ASCII range, when verifying the
  grammar in this document.

## Special Unicode Productions

The following productions in the Rust grammar are defined in terms of Unicode
properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and
`non_double_quote`.

### Identifiers

The `ident` production is any nonempty Unicode[^non_ascii_idents] string of
the following form:

- The first character has property `XID_start`
- The remaining characters have property `XID_continue`

that does _not_ occur in the set of [keywords](#keywords).

[^non_ascii_idents]: Non-ASCII characters in identifiers are currently feature
  gated. This is expected to improve soon.

> **Note**: `XID_start` and `XID_continue` as character properties cover the
> character ranges used to form the more familiar C and Java language-family
> identifiers.

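
As a rough illustration of the `ident` rule, the sketch below checks the ASCII-only subset of it. This is an approximation under stated assumptions: the standard library does not expose the `XID_start` and `XID_continue` properties, so ASCII letters stand in for the first and ASCII letters, digits and `_` for the second; the `KEYWORDS` slice is a small hypothetical sample rather than the full keyword table; and `is_ascii_ident` is a made-up name. It follows the rule as written here, so a leading underscore (which rustc itself accepts) is not covered.

```rust
// Small illustrative subset of the keyword table, not the full list.
const KEYWORDS: &[&str] = &["as", "break", "fn", "let", "match", "self", "while"];

/// ASCII-only approximation of `ident`: first character stands in for
/// `XID_start`, the rest for `XID_continue`, and keywords are excluded.
fn is_ascii_ident(s: &str) -> bool {
    let mut chars = s.chars();
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_alphabetic(),
        None => false, // the production requires a nonempty string
    };
    first_ok
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        && !KEYWORDS.contains(&s)
}

fn main() {
    for candidate in ["foo_bar1", "1abc", "match", ""] {
        println!("{:?}: {}", candidate, is_ascii_ident(candidate));
    }
}
```
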
### Delimiter-restricted productions

Some productions are defined by exclusion of particular Unicode characters:

- `non_null` is any single Unicode character aside from `U+0000` (null)
- `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`)
- `non_single_quote` is `non_null` restricted to exclude `U+0027` (`'`)
- `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`)

## Comments

```antlr
comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
```

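
Because `block_comment_body` refers back to `block_comment`, block comments nest. The following is a minimal sketch of a scanner for that rule, assuming `character` means any character; `block_comment_len` is a hypothetical helper name, not a function from rustc.

```rust
/// Returns the byte length of the block comment at the start of `src`,
/// or `None` if `src` does not begin with a complete block comment.
fn block_comment_len(src: &str) -> Option<usize> {
    if !src.starts_with("/*") {
        return None;
    }
    let bytes = src.as_bytes();
    let mut depth = 0usize; // current nesting level of `/* ... */`
    let mut i = 0;
    while i + 1 < bytes.len() {
        match (bytes[i], bytes[i + 1]) {
            (b'/', b'*') => {
                depth += 1;
                i += 2;
            }
            (b'*', b'/') => {
                depth -= 1;
                i += 2;
                if depth == 0 {
                    return Some(i); // outermost comment just closed
                }
            }
            _ => i += 1,
        }
    }
    None // ran out of input while still inside a comment
}

fn main() {
    let src = "/* a /* b */ c */ rest";
    println!("{:?}", block_comment_len(src));
}
```
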
**FIXME:** add doc grammar?

## Whitespace

```antlr
whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;
```

## Tokens

```antlr
simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;
```

### Keywords

<p id="keyword-table-marker"></p>

|          |          |          |          |         |
|----------|----------|----------|----------|---------|
| abstract | alignof  | as       | become   | box     |
| break    | const    | continue | crate    | do      |
| else     | enum     | extern   | false    | final   |
| fn       | for      | if       | impl     | in      |
| let      | loop     | macro    | match    | mod     |
| move     | mut      | offsetof | override | priv    |
| proc     | pub      | pure     | ref      | return  |
| Self     | self     | sizeof   | static   | struct  |
| super    | trait    | true     | type     | typeof  |
| unsafe   | unsized  | use      | virtual  | where   |
| while    | yield    |          |          |         |

Each of these keywords has special meaning in its grammar, and all of them are
excluded from the `ident` rule.

### Literals

```antlr
lit_suffix : ident;
literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;
```

The optional `lit_suffix` production is only used for certain numeric literals,
but is reserved for future extension. That is, the above gives the lexical
grammar, but a Rust parser will reject everything but the 12 special cases
mentioned in [Number literals](reference.html#number-literals) in the
reference.

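
For illustration, here are three numeric literals that carry a suffix. The function name `suffixed_literals` is made up for this sketch, and only three of the accepted suffixes are shown.

```rust
/// Returns three literals, each written with an explicit `lit_suffix`.
fn suffixed_literals() -> (u8, i64, f32) {
    let small = 42u8;   // integer literal with suffix `u8`
    let big = 1_000i64; // underscores may appear among the digits
    let ratio = 2.5f32; // floating-point literal with suffix `f32`
    (small, big, ratio)
}

fn main() {
    println!("{:?}", suffixed_literals());
}
```
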
#### Character and string literals

```antlr
char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
          | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
            | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
              | 'n' | 'r' | 't' | '0'
              | 'x' hex_digit 2 ;
unicode_escape : 'u' '{' hex_digit+ 6 '}';

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
          | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
          | dec_digit ;
oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
           | '5' | '6' | '7' | '8' | '9' ;
```

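
The escape forms above correspond to ordinary Rust source such as the following. This is a small sketch; `escape_examples` is a hypothetical name chosen for the example.

```rust
/// One example each of a common escape, a hex escape, a Unicode escape
/// and a raw string.
fn escape_examples() -> (char, String, char, String) {
    let newline = '\n';                       // common_escape: 'n'
    let hex = String::from("\x41\x42");       // 'x' followed by two hex digits
    let unicode = '\u{e9}';                   // unicode_escape with two hex digits
    let raw = String::from(r#"say "hi" \n"#); // raw string: no escape processing
    (newline, hex, unicode, raw)
}

fn main() {
    let (newline, hex, unicode, raw) = escape_examples();
    println!("{:?} {:?} {:?} {:?}", newline, hex, unicode, raw);
}
```
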
#### Byte and byte string literals

```antlr
byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" byte_string_body * '\x22' | "br" raw_byte_string ;

byte_body : ascii_non_single_quote
          | '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
                 | '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
```

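
A short sketch of the three byte-oriented literal forms as they appear in Rust source; the function name `byte_literal_examples` is made up for the example.

```rust
/// A byte literal, a byte string literal with escapes, and the length of
/// a raw byte string.
fn byte_literal_examples() -> (u8, [u8; 2], usize) {
    let byte = b'a';              // byte_lit: a `u8` holding the ASCII code
    let bytes = *b"\x41\n";       // byte_string_lit: escapes are processed
    let raw_len = br#"\n"#.len(); // raw byte string: backslash and `n` are kept
    (byte, bytes, raw_len)
}

fn main() {
    println!("{:?}", byte_literal_examples());
}
```
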
#### Number literals

```antlr
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
        | '0' [       [ dec_digit | '_' ] * float_suffix ?
              | 'b'   [ '1' | '0' | '_' ] +
              | 'o'   [ oct_digit | '_' ] +
              | 'x'   [ hex_digit | '_' ] +  ] ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
```

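
Each alternative of `num_lit` has a direct counterpart in Rust source. The sketch below shows one literal per form; `number_literal_examples` is a hypothetical name.

```rust
/// Binary, octal, hexadecimal, underscore-separated decimal, and a float
/// with an exponent.
fn number_literal_examples() -> (u32, u32, u32, u32, f64) {
    (0b1010, 0o17, 0xff, 1_000_000, 1.5e3)
}

fn main() {
    println!("{:?}", number_literal_examples());
}
```
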
#### Boolean literals

```antlr
bool_lit : [ "true" | "false" ] ;
```

The two values of the boolean type are written `true` and `false`.

### Symbols

```antlr
symbol : "::" | "->"
       | '#' | '[' | ']' | '(' | ')' | '{' | '}'
       | ',' | ';' ;
```

Symbols are a general class of printable [token](#tokens) that play structural
roles in a variety of grammar productions. They are catalogued here for
completeness as the set of remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords](#keywords).

## Paths

```antlr
expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | expr_path ;

type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | "::" type_path ;
```

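
As an informal illustration of expression paths, the snippet below uses a path with explicit type arguments after `::` and a path with a leading `::`. The function name `path_examples` is an assumption made for the sketch.

```rust
use std::collections::HashMap;

/// Uses a path with a `::<...>` tail and a path that starts with `::`.
fn path_examples() -> (usize, i32) {
    // Type arguments in an expression path follow a `::` separator.
    let map = HashMap::<String, i32>::new();
    // A leading `::` anchors the path at the crate namespace root.
    let biggest = ::std::cmp::max(3, 7);
    (map.len(), biggest)
}

fn main() {
    println!("{:?}", path_examples());
}
```
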
# Syntax extensions

## Macros

```antlr
expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';'
                 | "macro_rules" '!' ident '{' macro_rule * '}' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
        | '{' matcher * '}' | '$' ident ':' ident
        | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
        | non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
            | '{' transcriber * '}' | '$' ident
            | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
            | non_special_token ;
```

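
A small macro makes the matcher and transcriber shapes concrete. This is a sketch only; the macro name `sum` is invented for the example.

```rust
// One rule: the matcher repeats `$x:expr` one or more times with `,` as the
// separator token, and the transcriber repeats `+ $x` once per match.
macro_rules! sum {
    ( $( $x:expr ),+ ) => ( 0 $( + $x )+ );
}

fn main() {
    println!("{}", sum!(1, 2, 3));
}
```
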
# Crates and source files

**FIXME:** grammar? What production covers `#![crate_id = "foo"]` ?

# Items and attributes

**FIXME:** grammar?

## Items

```antlr
item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
     | const_item | static_item | trait_item | impl_item | extern_block ;
```

### Type Parameters

**FIXME:** grammar?

### Modules

```antlr
mod_item : "mod" ident [ ';' | '{' mod '}' ] ;
mod : [ view_item | item ] * ;
```

#### View items

```antlr
view_item : extern_crate_decl | use_decl ';' ;
```

##### Extern crate declarations

```antlr
extern_crate_decl : "extern" "crate" crate_name ;
crate_name : ident | [ ident "as" ident ] ;
```

##### Use declarations

```antlr
use_decl : vis ? "use" [ path "as" ident
                       | path_glob ] ;

path_glob : ident [ "::" [ path_glob
                         | '*' ] ] ?
          | '{' path_item [ ',' path_item ] * '}' ;

path_item : ident | "self" ;
```

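
The three shapes of use declaration (glob, brace list, and rename with `as`) look like this in practice. The function name `use_examples` and the local alias `FmtWrite` are chosen purely for illustration.

```rust
use std::cmp::*;                            // glob import
use std::collections::{BTreeMap, BTreeSet}; // brace-delimited list of items
use std::fmt::Write as FmtWrite;            // rename with `as`

/// Exercises one name from each of the three imports above.
fn use_examples() -> (i32, usize, String) {
    let mut set = BTreeSet::new();
    set.insert(max(1, 2)); // `max` arrives through the glob import

    let mut map = BTreeMap::new();
    map.insert("k", 1);

    let mut out = String::new();
    // `write!` on a `String` needs the `fmt::Write` trait, imported as `FmtWrite`.
    write!(out, "{}", map["k"]).unwrap();

    (*set.iter().next().unwrap(), map.len(), out)
}

fn main() {
    println!("{:?}", use_examples());
}
```
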
### Functions

**FIXME:** grammar?

#### Generic functions

**FIXME:** grammar?

#### Unsafety

**FIXME:** grammar?

##### Unsafe functions

**FIXME:** grammar?

##### Unsafe blocks

**FIXME:** grammar?

#### Diverging functions

**FIXME:** grammar?

### Type definitions

**FIXME:** grammar?

### Structures

**FIXME:** grammar?

### Enumerations

**FIXME:** grammar?

### Constant items

```antlr
const_item : "const" ident ':' type '=' expr ';' ;
```

### Static items

```antlr
static_item : "static" ident ':' type '=' expr ';' ;
```

#### Mutable statics

**FIXME:** grammar?

### Traits

**FIXME:** grammar?

### Implementations

**FIXME:** grammar?

### External blocks

```antlr
extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;
```

## Visibility and Privacy

```antlr
vis : "pub" ;
```

### Re-exporting and Visibility

See [Use declarations](#use-declarations).

## Attributes

```antlr
attribute : '#' '!' ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
                  | '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;
```

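
The three `meta_item` shapes can be seen on ordinary items. The sketch below uses only outer attributes (`#[...]`); the `#![...]` form is written the same way but applies to the enclosing item. `Point` and `origin` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)] // ident followed by a parenthesised meta_seq
#[doc = "A point in 2D space."]    // ident '=' literal
struct Point {
    x: i32,
    y: i32,
}

#[inline] // bare ident
fn origin() -> Point {
    Point { x: 0, y: 0 }
}

fn main() {
    println!("{:?}", origin());
}
```
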
# Statements and expressions

## Statements

```antlr
stmt : decl_stmt | expr_stmt ;
```

### Declaration statements

```antlr
decl_stmt : item | let_decl ;
```

#### Item declarations

See [Items](#items).

#### Variable declarations

```antlr
let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;
```

### Expression statements

```antlr
expr_stmt : expr ';' ;
```

## Expressions

```antlr
expr : literal | path | tuple_expr | unit_expr | struct_expr
     | block_expr | method_call_expr | field_expr | array_expr
     | idx_expr | range_expr | unop_expr | binop_expr
     | paren_expr | call_expr | lambda_expr | while_expr
     | loop_expr | break_expr | continue_expr | for_expr
     | if_expr | match_expr | if_let_expr | while_let_expr
     | return_expr ;
```

#### Lvalues, rvalues and temporaries

**FIXME:** grammar?

#### Moved and copied types

**FIXME:** Do we want to capture this in the grammar as different productions?

### Literal expressions

See [Literals](#literals).

### Path expressions

See [Paths](#paths).

### Tuple expressions

```antlr
tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
```

### Unit expressions

```antlr
unit_expr : "()" ;
```

### Structure expressions

```antlr
struct_expr : expr_path '{' ident ':' expr
                      [ ',' ident ':' expr ] *
                      [ ".." expr ] ? '}' |
              expr_path '(' expr
                      [ ',' expr ] * ')' |
              expr_path ;
```

### Block expressions

```antlr
block_expr : '{' [ stmt ';' | item ] *
                 [ expr ] ? '}' ;
```

### Method-call expressions

```antlr
method_call_expr : expr '.' ident paren_expr_list ;
```

### Field expressions

```antlr
field_expr : expr '.' ident ;
```

### Array expressions

```antlr
array_expr : '[' "mut" ? array_elems? ']' ;

array_elems : [expr [',' expr]*] | [expr ';' expr] ;
```

### Index expressions

```antlr
idx_expr : expr '[' expr ']' ;
```

### Range expressions

```antlr
range_expr : expr ".." expr |
             expr ".." |
             ".." expr |
             ".." ;
```

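
The four alternatives of `range_expr` all occur in everyday slicing and iteration. A brief sketch, with `range_examples` as a made-up name:

```rust
/// One use of each range form: `a..b`, `a..`, `..b` and `..`.
fn range_examples() -> (Vec<i32>, usize, usize, usize) {
    let v = [10, 20, 30, 40, 50];
    let both: Vec<i32> = (1..4).collect(); // expr ".." expr
    let from = v[2..].len();               // expr ".."
    let to = v[..2].len();                 // ".." expr
    let full = v[..].len();                // ".."
    (both, from, to, full)
}

fn main() {
    println!("{:?}", range_examples());
}
```
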
### Unary operator expressions

```antlr
unop_expr : unop expr ;
unop : '-' | '*' | '!' ;
```

### Binary operator expressions

```antlr
binop_expr : expr binop expr | type_cast_expr
           | assignment_expr | compound_assignment_expr ;
binop : arith_op | bitwise_op | lazy_bool_op | comp_op ;
```

#### Arithmetic operators

```antlr
arith_op : '+' | '-' | '*' | '/' | '%' ;
```

#### Bitwise operators

```antlr
bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;
```

#### Lazy boolean operators

```antlr
lazy_bool_op : "&&" | "||" ;
```

#### Comparison operators

```antlr
comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;
```

#### Type cast expressions

```antlr
type_cast_expr : value "as" type ;
```

#### Assignment expressions

```antlr
assignment_expr : expr '=' expr ;
```

#### Compound assignment expressions

```antlr
compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;
```

### Grouped expressions

```antlr
paren_expr : '(' expr ')' ;
```

### Call expressions

```antlr
expr_list : [ expr [ ',' expr ]* ] ? ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;
```

### Lambda expressions

```antlr
ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;
```

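
Lambda expressions in this form (a bare identifier list between the bars, then an expression) are most often seen as arguments to iterator methods, where the parameter types are inferred from context. An illustrative sketch, with `lambda_examples` as a hypothetical name:

```rust
/// Passes a one-parameter and a two-parameter lambda to iterator adaptors.
fn lambda_examples() -> (Vec<i32>, i32) {
    let values = vec![1, 2, 3];
    // One identifier in the list.
    let doubled: Vec<i32> = values.iter().map(|x| x * 2).collect();
    // Two identifiers in the list.
    let total = values.iter().fold(0, |acc, x| acc + x);
    (doubled, total)
}

fn main() {
    println!("{:?}", lambda_examples());
}
```
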
### While loops

```antlr
while_expr : [ lifetime ':' ] ? "while" no_struct_literal_expr '{' block '}' ;
```

### Infinite loops

```antlr
loop_expr : [ lifetime ':' ] ? "loop" '{' block '}';
```

### Break expressions

```antlr
break_expr : "break" [ lifetime ] ? ;
```

### Continue expressions

```antlr
continue_expr : "continue" [ lifetime ] ? ;
```

### For expressions

```antlr
for_expr : [ lifetime ':' ] ? "for" pat "in" no_struct_literal_expr '{' block '}' ;
```

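
The `lifetime` that may precede a loop acts as a label, and `break` or `continue` can name it to target an outer loop. A minimal sketch; the function name `first_pair_with_sum` and the search bounds are arbitrary choices for the example.

```rust
/// Finds the first pair `(a, b)` with `1 <= a <= b < 10` and `a + b == target`.
fn first_pair_with_sum(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for a in 1..10 {
        for b in a..10 {
            if a + b == target {
                found = Some((a, b));
                break 'outer; // leaves both loops, not just the inner one
            }
        }
    }
    found
}

fn main() {
    println!("{:?}", first_pair_with_sum(5));
}
```
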
### If expressions

```antlr
if_expr : "if" no_struct_literal_expr '{' block '}'
          else_tail ? ;

else_tail : "else" [ if_expr | if_let_expr
                   | '{' block '}' ] ;
```

### Match expressions

```antlr
match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;

match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;

match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
```

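
The pieces of `match_arm` and `match_pat` (alternative patterns separated by `|`, an optional `if` guard, and either an expression or a block on the right) appear together in the sketch below. `classify` is a hypothetical function.

```rust
/// Classifies an integer using alternation, a guard, and a block arm.
fn classify(n: i32) -> &'static str {
    match n {
        0 => "zero",
        1 | 2 | 3 => "small",       // pat '|' pat '|' pat
        x if x < 0 => "negative",   // pattern followed by an `if` guard
        _ => { "large" }            // block form of the arm body
    }
}

fn main() {
    for n in [0, 2, -5, 99] {
        println!("{} is {}", n, classify(n));
    }
}
```
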
### If let expressions

```antlr
if_let_expr : "if" "let" pat '=' expr '{' block '}'
              else_tail ? ;
```

The `else_tail` rule is the one given under [If expressions](#if-expressions).

### While let loops

```antlr
while_let_expr : "while" "let" pat '=' expr '{' block '}' ;
```

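
Both forms bind a pattern against an expression and run the block only when the pattern matches. A brief illustrative sketch; `describe` and `drain_sum` are made-up names.

```rust
/// `if let` with an `else` tail.
fn describe(x: Option<i32>) -> String {
    if let Some(v) = x {
        format!("got {}", v)
    } else {
        String::from("nothing")
    }
}

/// `while let` loops for as long as the pattern keeps matching.
fn drain_sum(mut stack: Vec<i32>) -> i32 {
    let mut total = 0;
    while let Some(top) = stack.pop() {
        total += top;
    }
    total
}

fn main() {
    println!("{} / {}", describe(Some(4)), drain_sum(vec![1, 2, 3]));
}
```
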
### Return expressions

```antlr
return_expr : "return" expr ? ;
```

# Type system

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

## Types

### Primitive types

**FIXME:** grammar?

#### Machine types

**FIXME:** grammar?

#### Machine-dependent integer types

**FIXME:** grammar?

### Textual types

**FIXME:** grammar?

### Tuple types

**FIXME:** grammar?

### Array, and Slice types

**FIXME:** grammar?

### Structure types

**FIXME:** grammar?

### Enumerated types

**FIXME:** grammar?

### Pointer types

**FIXME:** grammar?

### Function types

**FIXME:** grammar?

### Closure types

```antlr
closure_type : "unsafe" ? [ '<' lifetime_list '>' ] ? '|' arg_list '|'
               [ ':' bound_list ] ? [ "->" type ] ? ;
procedure_type : "proc" [ '<' lifetime_list '>' ] ? '(' arg_list ')'
                 [ ':' bound_list ] ? [ "->" type ] ? ;
lifetime_list : lifetime | lifetime ',' lifetime_list ;
arg_list : ident ':' type | ident ':' type ',' arg_list ;
bound_list : bound | bound '+' bound_list ;
bound : path | lifetime ;
```

### Object types

**FIXME:** grammar?

### Type parameters

**FIXME:** grammar?

### Self types

**FIXME:** grammar?

## Type kinds

**FIXME:** this is probably not relevant to the grammar...

# Memory and concurrency models

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

## Memory model

### Memory allocation and lifetime

### Memory ownership

### Variables

### Boxes

## Threads

### Communication between threads

### Thread lifecycle