Imported Upstream version 1.2.0+dfsg1
% Grammar

# Introduction

This document is the primary reference for the Rust programming language grammar. It
provides only one kind of material:

  - Chapters that formally define the language grammar.

This document does not serve as an introduction to the language. Background
familiarity with the language is assumed. A separate [guide] is available to
help acquire such background familiarity.

This document also does not serve as a reference to the [standard] library
included in the language distribution. Those libraries are documented
separately by extracting documentation attributes from their source code. Many
of the features that one might expect to be language features are library
features in Rust, so what you're looking for may be there, not here.

[guide]: guide.html
[standard]: std/index.html

# Notation

Rust's grammar is defined over Unicode codepoints, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
supported by common automated LL(k) parsing tools such as `llgen`, rather than
the dialect given in ISO 14977. The dialect can be defined self-referentially
as follows:

```antlr
grammar : rule + ;
rule : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
```

Where:

- Whitespace in the grammar is ignored.
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
  Unicode codepoint `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeats` forms apply to the adjacent `element`, and are as follows:
  - `?` means zero or one repetition
  - `*` means zero or more repetitions
  - `+` means one or more repetitions
  - NUMBER trailing a repeat symbol gives a maximum repetition count
  - NUMBER on its own gives an exact repetition count

This EBNF dialect should hopefully be familiar to many readers.

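The repetition forms listed above can be modelled as a small data type. The following is only a sketch: the `Repeats` enum and its `allows` method are hypothetical names introduced for illustration and are not part of any Rust tool.

```rust
/// Hypothetical model of the `repeats` suffix in the EBNF dialect above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Repeats {
    /// `?`: zero or one repetition.
    Optional,
    /// `*`, with an optional trailing NUMBER giving a maximum count.
    ZeroOrMore(Option<u32>),
    /// `+`, with an optional trailing NUMBER giving a maximum count.
    OneOrMore(Option<u32>),
    /// A bare NUMBER: exactly that many repetitions.
    Exactly(u32),
}

impl Repeats {
    /// Returns true if `count` repetitions of the adjacent element are allowed.
    pub fn allows(self, count: u32) -> bool {
        match self {
            Repeats::Optional => count <= 1,
            Repeats::ZeroOrMore(max) => max.map_or(true, |m| count <= m),
            Repeats::OneOrMore(max) => count >= 1 && max.map_or(true, |m| count <= m),
            Repeats::Exactly(n) => count == n,
        }
    }
}
```

Under this model, a term such as `hex_digit+ 6` corresponds to `Repeats::OneOrMore(Some(6))`, and `hex_digit 2` to `Repeats::Exactly(2)`.
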
## Unicode productions

A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
range. We define these productions in terms of character properties specified
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.

## String table productions

Some rules in the grammar, notably [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), and [keywords](#keywords), are
given in a simplified form: as a listing of a table of unquoted, printable
whitespace-separated strings. These cases form a subset of the rules regarding
the [token](#tokens) rule, and are assumed to be the result of a
lexical-analysis phase feeding the parser, driven by a DFA, operating over the
disjunction of all such string table entries.

When such a string enclosed in double-quotes (`"`) occurs inside the grammar,
it is an implicit reference to a single member of such a string table
production. See [tokens](#tokens) for more information.

# Lexical structure

## Input format

Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
codepoints, but a small number are defined in terms of Unicode properties or
explicit codepoint lists. [^inputformat]

[^inputformat]: Substitute definitions for the special Unicode productions are
  provided to the grammar verifier, restricted to ASCII range, when verifying the
  grammar in this document.

## Special Unicode Productions

The following productions in the Rust grammar are defined in terms of Unicode
properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and
`non_double_quote`.

### Identifiers

The `ident` production is any nonempty Unicode[^non_ascii_idents] string of
the following form:

[^non_ascii_idents]: Non-ASCII characters in identifiers are currently feature
  gated. This is expected to improve soon.

- The first character has property `XID_start`
- The remaining characters have property `XID_continue`

that does _not_ occur in the set of [keywords](#keywords).

> **Note**: `XID_start` and `XID_continue` as character properties cover the
> character ranges used to form the more familiar C and Java language-family
> identifiers.

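As a rough illustration of the `ident` rule restricted to ASCII, the check below treats ASCII letters as `XID_start` and ASCII letters, digits and `_` as `XID_continue`. The function name `is_ascii_ident` and the `KEYWORDS` constant are hypothetical; the keyword list is copied from the keyword table in this document. Note that the rule as written rejects a leading underscore, which `rustc` itself accepts.

```rust
/// Keywords from the keyword table of this document (hypothetical constant name).
const KEYWORDS: &[&str] = &[
    "abstract", "alignof", "as", "become", "box", "break", "const", "continue",
    "crate", "do", "else", "enum", "extern", "false", "final", "fn", "for", "if",
    "impl", "in", "let", "loop", "macro", "match", "mod", "move", "mut",
    "offsetof", "override", "priv", "proc", "pub", "pure", "ref", "return",
    "Self", "self", "sizeof", "static", "struct", "super", "trait", "true",
    "type", "typeof", "unsafe", "unsized", "use", "virtual", "where", "while",
    "yield",
];

/// ASCII-only approximation of the `ident` production.
pub fn is_ascii_ident(s: &str) -> bool {
    let mut chars = s.chars();
    // First character: ASCII subset of `XID_start` (letters only).
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_alphabetic(),
        None => false, // the production requires a nonempty string
    };
    first_ok
        // Remaining characters: ASCII subset of `XID_continue`.
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        // Keywords are excluded from `ident`.
        && !KEYWORDS.contains(&s)
}
```
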
### Delimiter-restricted productions

Some productions are defined by exclusion of particular Unicode characters:

- `non_null` is any single Unicode character aside from `U+0000` (null)
- `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`)
- `non_single_quote` is `non_null` restricted to exclude `U+0027`  (`'`)
- `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`)

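These four productions translate directly into character predicates. A minimal sketch follows; the function names mirror the production names but are otherwise hypothetical helpers.

```rust
/// Any character except U+0000.
pub fn non_null(c: char) -> bool {
    c != '\u{0000}'
}

/// `non_null`, additionally excluding U+000A (line feed).
pub fn non_eol(c: char) -> bool {
    non_null(c) && c != '\u{000A}'
}

/// `non_null`, additionally excluding U+0027 (single quote).
pub fn non_single_quote(c: char) -> bool {
    non_null(c) && c != '\u{0027}'
}

/// `non_null`, additionally excluding U+0022 (double quote).
pub fn non_double_quote(c: char) -> bool {
    non_null(c) && c != '\u{0022}'
}
```
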
## Comments

```antlr
comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
```

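Because `block_comment_body` refers back to `block_comment`, block comments nest. One way to recognise them is to keep a depth counter, as in the sketch below. The function `block_comment_len` is a hypothetical helper, not the lexer `rustc` uses.

```rust
/// Returns the byte length of the block comment at the start of `src`,
/// or `None` if `src` does not begin with a complete block comment.
pub fn block_comment_len(src: &str) -> Option<usize> {
    if !src.starts_with("/*") {
        return None;
    }
    let bytes = src.as_bytes();
    let mut depth = 0usize; // number of currently open "/*"
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i..].starts_with(b"/*") {
            depth += 1;
            i += 2;
        } else if bytes[i..].starts_with(b"*/") {
            // depth is at least 1 here, because the input starts with "/*"
            depth -= 1;
            i += 2;
            if depth == 0 {
                return Some(i);
            }
        } else {
            i += 1;
        }
    }
    None // ran out of input with an unclosed comment
}
```
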
**FIXME:** add doc grammar?

## Whitespace

```antlr
whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;
```

## Tokens

```antlr
simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;
```

### Keywords

<p id="keyword-table-marker"></p>

|          |          |          |          |         |
|----------|----------|----------|----------|---------|
| abstract | alignof  | as       | become   | box     |
| break    | const    | continue | crate    | do      |
| else     | enum     | extern   | false    | final   |
| fn       | for      | if       | impl     | in      |
| let      | loop     | macro    | match    | mod     |
| move     | mut      | offsetof | override | priv    |
| proc     | pub      | pure     | ref      | return  |
| Self     | self     | sizeof   | static   | struct  |
| super    | trait    | true     | type     | typeof  |
| unsafe   | unsized  | use      | virtual  | where   |
| while    | yield    |          |          |         |


Each of these keywords has special meaning in its grammar, and all of them are
excluded from the `ident` rule.

### Literals

```antlr
lit_suffix : ident;
literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;
```

The optional `lit_suffix` production is only used for certain numeric literals,
but is reserved for future extension. That is, the above gives the lexical
grammar, but a Rust parser will reject everything but the 12 special cases
mentioned in [Number literals](reference.html#number-literals) in the
reference.

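For illustration, a few numeric literals carrying a suffix are shown below. The function name is hypothetical, and only four of the accepted suffixes appear.

```rust
/// Returns a handful of suffixed numeric literals.
pub fn suffixed_literals() -> (u8, i64, usize, f32) {
    // Each literal is `num_lit` followed by a `lit_suffix` naming its type.
    (255u8, 1i64, 10usize, 2.5f32)
}
```
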
#### Character and string literals

```antlr
char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
          | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
            | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
              | 'n' | 'r' | 't' | '0'
              | 'x' hex_digit 2 ;
unicode_escape : 'u' '{' hex_digit+ 6 '}';

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
          | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
          | dec_digit ;
oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
           | '5' | '6' | '7' | '8' | '9' ;
```

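The escape forms and the raw string form can be compared side by side in Rust source. This is a small sketch with hypothetical function names.

```rust
/// One character literal per escape form: common, hex, unicode, quote.
pub fn escaped_chars() -> [char; 4] {
    ['\n', '\x41', '\u{263A}', '\'']
}

/// The same text written as an escaped string and as a raw string.
pub fn escaped_and_raw() -> (&'static str, &'static str) {
    // In the raw form, backslash and double quote need no escaping
    // because the literal is delimited by `r#"` and `"#`.
    ("a\\b\"c", r#"a\b"c"#)
}
```
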
#### Byte and byte string literals

```antlr
byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" byte_string_body * '\x22' | "br" raw_byte_string ;

byte_body : ascii_non_single_quote
          | '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
                 | '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
```

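A short sketch of the three byte-oriented literal forms follows. The function name is hypothetical, and the results are copied into `Vec<u8>` only to give them a uniform type.

```rust
/// A byte literal, a byte string literal and a raw byte string literal.
pub fn byte_literals() -> (u8, Vec<u8>, Vec<u8>) {
    (
        b'a',                 // byte_lit: a single ASCII byte
        b"a\n\x00".to_vec(),  // byte_string_lit: escapes are processed
        br#"\n"#.to_vec(),    // raw form: backslash and `n` stay as two bytes
    )
}
```
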
#### Number literals

```antlr
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
        | '0' [       [ dec_digit | '_' ] * float_suffix ?
              | 'b'   [ '1' | '0' | '_' ] +
              | 'o'   [ oct_digit | '_' ] +
              | 'x'   [ hex_digit | '_' ] +  ] ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
```

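Each alternative of `num_lit` has a direct counterpart in Rust source. The sketch below uses a hypothetical function name and shows one literal per alternative.

```rust
/// Decimal with `_` separators, binary, octal, hexadecimal, and a float
/// with fractional part and exponent.
pub fn number_literals() -> (u32, u32, u32, u32, f64) {
    (1_000, 0b1010, 0o17, 0xff, 1.5e3)
}
```
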
#### Boolean literals

```antlr
bool_lit : [ "true" | "false" ] ;
```

The two values of the boolean type are written `true` and `false`.

### Symbols

```antlr
symbol : "::" | "->"
       | '#' | '[' | ']' | '(' | ')' | '{' | '}'
       | ',' | ';' ;
```

Symbols are a general class of printable [token](#tokens) that play structural
roles in a variety of grammar productions. They are catalogued here for
completeness as the set of remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords](#keywords).

## Paths

```antlr
expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | expr_path ;

type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | "::" type_path ;
```

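As an illustration, the function below uses an expression path with a leading `::` and a type argument list, and a type path in the annotation. The function name is hypothetical.

```rust
/// Builds an empty vector and queries a type size using path expressions.
pub fn path_examples() -> (Vec<i32>, usize) {
    // expr_path with a leading "::" and a `::<...>` tail
    let v: Vec<i32> = ::std::vec::Vec::<i32>::new();
    // expr_path without the leading "::"
    let n = std::mem::size_of::<u16>();
    (v, n)
}
```
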
# Syntax extensions

## Macros

```antlr
expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';'
                 | "macro_rules" '!' ident '{' macro_rule * '}' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
        | '{' matcher * '}' | '$' ident ':' ident
        | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
        | non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
            | '{' transcriber * '}' | '$' ident
            | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
            | non_special_token ;
```

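A small macro shows how the pieces fit together: a matcher with a repeated `$x:expr` fragment separated by `,`, and a transcriber that repeats `+ $x`. The macro `sum` and the function `sum_of_three` are hypothetical examples.

```rust
// One macro_rule: '(' matcher ')' "=>" '(' transcriber ')' ';'
macro_rules! sum {
    ($($x:expr),*) => (0 $(+ $x)*);
}

/// Expands to `0 + 1 + 2 + 3`.
pub fn sum_of_three() -> i32 {
    sum!(1, 2, 3)
}
```
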
# Crates and source files

**FIXME:** grammar? What production covers #![crate_id = "foo"] ?

# Items and attributes

**FIXME:** grammar?

## Items

```antlr
item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
     | const_item | static_item | trait_item | impl_item | extern_block ;
```

### Type Parameters

**FIXME:** grammar?

### Modules

```antlr
mod_item : "mod" ident [ ';' | '{' mod '}' ] ;
mod : [ view_item | item ] * ;
```

#### View items

```antlr
view_item : extern_crate_decl | use_decl ';' ;
```

##### Extern crate declarations

```antlr
extern_crate_decl : "extern" "crate" crate_name ;
crate_name : ident | [ ident "as" ident ] ;
```

##### Use declarations

```antlr
use_decl : vis ? "use" [ path "as" ident
                       | path_glob ] ;

path_glob : ident [ "::" [ path_glob
                         | '*' ] ] ?
          | '{' path_item [ ',' path_item ] * '}' ;

path_item : ident | "self" ;
```

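The two `use_decl` alternatives look as follows in practice: a braced list of path items, and a path renamed with `as`. The function name and the alias `FmtWrite` are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet}; // path_glob with a braced item list
use std::fmt::Write as FmtWrite;            // path "as" ident

/// Uses both imports and returns the number of map entries as text.
pub fn use_decl_demo() -> String {
    let mut map: BTreeMap<&str, BTreeSet<u32>> = BTreeMap::new();
    map.entry("a").or_insert_with(BTreeSet::new).insert(1);
    let mut out = String::new();
    // `write!` on a `String` needs the `fmt::Write` trait in scope,
    // here available under the alias `FmtWrite`.
    write!(out, "{}", map.len()).unwrap();
    out
}
```
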
### Functions

**FIXME:** grammar?

#### Generic functions

**FIXME:** grammar?

#### Unsafety

**FIXME:** grammar?

##### Unsafe functions

**FIXME:** grammar?

##### Unsafe blocks

**FIXME:** grammar?

#### Diverging functions

**FIXME:** grammar?

### Type definitions

**FIXME:** grammar?

### Structures

**FIXME:** grammar?

### Enumerations

**FIXME:** grammar?

### Constant items

```antlr
const_item : "const" ident ':' type '=' expr ';' ;
```

### Static items

```antlr
static_item : "static" ident ':' type '=' expr ';' ;
```

#### Mutable statics

**FIXME:** grammar?

### Traits

**FIXME:** grammar?

### Implementations

**FIXME:** grammar?

### External blocks

```antlr
extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;
```

## Visibility and Privacy

```antlr
vis : "pub" ;
```

### Re-exporting and Visibility

See [Use declarations](#use-declarations).

## Attributes

```antlr
attribute : '#' '!' ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
                  | '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;
```

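The three shapes of `meta_item`, and the optional `!` of an inner attribute, can be seen in the sketch below. `Point` and `origin` are hypothetical names chosen for the example.

```rust
#[derive(Debug, Clone, PartialEq)] // ident '(' meta_seq ')'
#[doc = "A point type used only for illustration."] // ident '=' literal
pub struct Point {
    pub x: i32,
    pub y: i32,
}

#[inline] // bare ident
pub fn origin() -> Point {
    #![allow(unused_variables)] // inner attribute: '#' '!' '[' meta_item ']'
    let unused = 0;
    Point { x: 0, y: 0 }
}
```
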
# Statements and expressions

## Statements

```antlr
stmt : decl_stmt | expr_stmt ;
```

### Declaration statements

```antlr
decl_stmt : item | let_decl ;
```

#### Item declarations

See [Items](#items).

#### Variable declarations

```antlr
let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;
```

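The optional parts of `let_decl` combine as in this sketch; the function name is hypothetical.

```rust
/// Exercises each optional part of a `let` declaration.
pub fn let_decl_demo() -> (i32, u8, (i32, i32)) {
    let a = 1;               // pattern and initializer
    let b: u8 = 2;           // pattern, type annotation and initializer
    let c: i32;              // no initializer; assigned before first use
    c = a + 1;
    let (x, y) = (c, c * 2); // the pattern may destructure a tuple
    (a, b, (x, y))
}
```
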
### Expression statements

```antlr
expr_stmt : expr ';' ;
```

## Expressions

```antlr
expr : literal | path | tuple_expr | unit_expr | struct_expr
     | block_expr | method_call_expr | field_expr | array_expr
     | idx_expr | range_expr | unop_expr | binop_expr
     | paren_expr | call_expr | lambda_expr | while_expr
     | loop_expr | break_expr | continue_expr | for_expr
     | if_expr | match_expr | if_let_expr | while_let_expr
     | return_expr ;
```

#### Lvalues, rvalues and temporaries

**FIXME:** grammar?

#### Moved and copied types

**FIXME:** Do we want to capture this in the grammar as different productions?

### Literal expressions

See [Literals](#literals).

### Path expressions

See [Paths](#paths).

### Tuple expressions

```antlr
tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
```

### Unit expressions

```antlr
unit_expr : "()" ;
```

### Structure expressions

```antlr
struct_expr : expr_path '{' ident ':' expr
                      [ ',' ident ':' expr ] *
                      [ ".." expr ] '}' |
              expr_path '(' expr
                      [ ',' expr ] * ')' |
              expr_path ;
```

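The three `struct_expr` alternatives correspond to structs with named fields (optionally completed from a base expression with `..`), tuple structs and unit structs. The types in this sketch are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub width: u32,
    pub height: u32,
    pub depth: u32,
}
pub struct Meters(pub f64);
pub struct Marker;

/// Builds one value per `struct_expr` alternative, plus a one-element tuple.
pub fn struct_expr_demo() -> (Config, f64, i32) {
    let base = Config { width: 1, height: 2, depth: 3 };
    let wide = Config { width: 10, ..base }; // remaining fields come from `base`
    let m = Meters(1.5);                     // expr_path '(' expr ')'
    let _marker = Marker;                    // bare expr_path
    let single = (7,);                       // tuple_expr of the form `expr ','`
    (wide, m.0, single.0)
}
```
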
### Block expressions

```antlr
block_expr : '{' [ stmt ';' | item ] *
                 [ expr ] '}' ;
```

### Method-call expressions

```antlr
method_call_expr : expr '.' ident paren_expr_list ;
```

### Field expressions

```antlr
field_expr : expr '.' ident ;
```

### Array expressions

```antlr
array_expr : '[' "mut" ? array_elems? ']' ;

array_elems : [expr [',' expr]*] | [expr ';' expr] ;
```

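Both `array_elems` forms, together with a block used as an expression, are shown in the following sketch. The function name is hypothetical.

```rust
/// A listed array, a repeated array, and the value of a block expression.
pub fn array_expr_demo() -> ([i32; 3], [u8; 4], i32) {
    let listed = [1, 2, 3];  // expr [',' expr]*
    let repeated = [0u8; 4]; // expr ';' expr
    // The block's final expression, with no trailing ';', is its value.
    let block_value = {
        let t = listed[0] + listed[2];
        t * 2
    };
    (listed, repeated, block_value)
}
```
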
### Index expressions

```antlr
idx_expr : expr '[' expr ']' ;
```

### Range expressions

```antlr
range_expr : expr ".." expr |
             expr ".." |
             ".." expr |
             ".." ;
```

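The four `range_expr` forms are commonly used inside an index expression to take slices. A brief sketch, with a hypothetical function name:

```rust
/// Slices a vector with each of the four range forms.
pub fn range_expr_demo() -> (Vec<i32>, Vec<i32>, Vec<i32>, usize) {
    let v = vec![10, 20, 30, 40];
    (
        v[1..3].to_vec(), // expr ".." expr
        v[2..].to_vec(),  // expr ".."
        v[..2].to_vec(),  // ".." expr
        v[..].len(),      // ".."
    )
}
```
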
### Unary operator expressions

```antlr
unop_expr : unop expr ;
unop : '-' | '*' | '!' ;
```

### Binary operator expressions

```antlr
binop_expr : expr binop expr | type_cast_expr
           | assignment_expr | compound_assignment_expr ;
binop : arith_op | bitwise_op | lazy_bool_op | comp_op ;
```

#### Arithmetic operators

```antlr
arith_op : '+' | '-' | '*' | '/' | '%' ;
```

#### Bitwise operators

```antlr
bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;
```

#### Lazy boolean operators

```antlr
lazy_bool_op : "&&" | "||" ;
```

#### Comparison operators

```antlr
comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;
```

#### Type cast expressions

```antlr
type_cast_expr : value "as" type ;
```

#### Assignment expressions

```antlr
assignment_expr : expr '=' expr ;
```

#### Compound assignment expressions

```antlr
compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;
```

### Grouped expressions

```antlr
paren_expr : '(' expr ')' ;
```

### Call expressions

```antlr
expr_list : [ expr [ ',' expr ]* ] ? ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;
```

### Lambda expressions

```antlr
ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;
```

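A lambda expression binds an identifier list between `|` characters and is invoked with an ordinary call expression. The function name in this sketch is hypothetical.

```rust
/// Defines two closures and calls them.
pub fn lambda_expr_demo() -> (i32, i32) {
    let add = |a, b| a + b; // ident_list with two identifiers
    let one = || 1;         // empty ident_list
    (add(2, 3), one())      // call_expr: expr paren_expr_list
}
```
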
### While loops

```antlr
while_expr : [ lifetime ':' ] "while" no_struct_literal_expr '{' block '}' ;
```

### Infinite loops

```antlr
loop_expr : [ lifetime ':' ] "loop" '{' block '}';
```

### Break expressions

```antlr
break_expr : "break" [ lifetime ];
```

### Continue expressions

```antlr
continue_expr : "continue" [ lifetime ];
```

### For expressions

```antlr
for_expr : [ lifetime ':' ] "for" pat "in" no_struct_literal_expr '{' block '}' ;
```

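The `lifetime ':'` prefix acts as a loop label that `break` and `continue` can name. The following sketch uses a hypothetical function and counts iterations so the control flow can be followed.

```rust
/// Returns (hits, steps, n) after running labeled and unlabeled loops.
pub fn labeled_loop_demo() -> (u32, u32, u32) {
    let mut hits = 0;
    let mut steps = 0;
    'outer: for i in 0..5 {
        let mut j = 0;
        while j < 5 {
            j += 1;
            steps += 1;
            if j > i {
                continue 'outer; // skip to the next iteration of the `for`
            }
            if i == 3 {
                break 'outer; // leave both loops at once
            }
            hits += 1;
        }
    }
    let mut n = 0;
    loop {
        n += 1;
        if n == 3 {
            break; // unlabeled: leaves the innermost loop
        }
    }
    (hits, steps, n)
}
```
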
### If expressions

```antlr
if_expr : "if" no_struct_literal_expr '{' block '}'
          else_tail ? ;

else_tail : "else" [ if_expr | if_let_expr
                   | '{' block '}' ] ;
```

### Match expressions

```antlr
match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;

match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;

match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
```

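A `match_pat` may join alternatives with `|` and carry an `if` guard, and an arm body is either an expression followed by `,` or a block. The function `classify` is a hypothetical example.

```rust
/// Classifies an integer using alternatives, a guard and a wildcard.
pub fn classify(n: i32) -> &'static str {
    match n {
        0 => "zero",
        1 | 2 => "small",               // pat '|' pat
        x if x < 0 => { "negative" }    // guard, with a block as arm body
        _ => "large",
    }
}
```
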
### If let expressions

```antlr
if_let_expr : "if" "let" pat '=' expr '{' block '}'
               else_tail ? ;
else_tail : "else" [ if_expr | if_let_expr | '{' block '}' ] ;
```

### While let loops

```antlr
while_let_expr : "while" "let" pat '=' expr '{' block '}' ;
```

### Return expressions

```antlr
return_expr : "return" expr ? ;
```

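These three forms often appear together. In the sketch below, `drain_sum` is a hypothetical function that returns early with `return`, tests a pattern with `if let`, and loops with `while let` until the pattern stops matching.

```rust
/// Sums a stack by popping it, with two early exits.
pub fn drain_sum(mut stack: Vec<i32>) -> i32 {
    let mut total = 0;
    if let Some(&first) = stack.first() {
        if first < 0 {
            return -1; // return_expr with a value
        }
    } else {
        return 0; // empty input
    }
    // Runs until `pop` yields `None`.
    while let Some(top) = stack.pop() {
        total += top;
    }
    total
}
```
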
# Type system

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

## Types

### Primitive types

**FIXME:** grammar?

#### Machine types

**FIXME:** grammar?

#### Machine-dependent integer types

**FIXME:** grammar?

### Textual types

**FIXME:** grammar?

### Tuple types

**FIXME:** grammar?

### Array, and Slice types

**FIXME:** grammar?

### Structure types

**FIXME:** grammar?

### Enumerated types

**FIXME:** grammar?

### Pointer types

**FIXME:** grammar?

### Function types

**FIXME:** grammar?

### Closure types

```antlr
closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
                [ ':' bound-list ] [ '->' type ]
procedure_type := 'proc' [ '<' lifetime-list '>' ] '(' arg-list ')'
                  [ ':' bound-list ] [ '->' type ]
lifetime-list := lifetime | lifetime ',' lifetime-list
arg-list := ident ':' type | ident ':' type ',' arg-list
bound-list := bound | bound '+' bound-list
bound := path | lifetime
```

### Object types

**FIXME:** grammar?

### Type parameters

**FIXME:** grammar?

### Self types

**FIXME:** grammar?

## Type kinds

**FIXME:** this is probably not relevant to the grammar...

# Memory and concurrency models

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

## Memory model

### Memory allocation and lifetime

### Memory ownership

### Variables

### Boxes

## Threads

### Communication between threads

### Thread lifecycle