New upstream version 1.32.0~beta.2+dfsg1
# Macro expansion

Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
normal Rust parser, and the macro parser. During the parsing phase, the normal
Rust parser will set aside the contents of macros and their invocations. Later,
before name resolution, macros are expanded using these portions of the code.
The macro parser, in turn, may call the normal Rust parser when it needs to
bind a metavariable (e.g. `$my_expr`) while parsing the contents of a macro
invocation. The code for macro expansion is in
[`src/libsyntax/ext/tt/`][code_dir]. This chapter aims to explain how macro
expansion works.

### Example

It's helpful to have an example to refer to. For the remainder of this chapter,
whenever we refer to the "example _definition_", we mean the following:

```rust,ignore
macro_rules! printer {
    // First arm: print the variable once.
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    // Second arm: print the variable twice.
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}
```

`$mvar` is called a _metavariable_. Unlike normal variables, rather than
binding to a value in a computation, a metavariable binds _at compile time_ to
a tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an
identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other
special tokens, such as `EOF`, which indicates that there are no more tokens.
Token trees result from paired parentheses-like characters (`(`...`)`,
`[`...`]`, and `{`...`}`): each one includes the opening and closing delimiters
and all the tokens in between (we do require that parentheses-like characters
be balanced). Having macro expansion operate on token streams rather than the
raw bytes of a source file abstracts away a lot of complexity. The macro
expander (and much of the rest of the compiler) doesn't really care that much
about the exact line and column of some syntactic construct in the code; it
cares about what constructs are used in the code. Using tokens allows us to
care about _what_ without worrying about _where_. For more information about
tokens, see the [Parsing][parsing] chapter of this book.
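
As a rough illustration of the idea (not `rustc`'s actual data structures), the
sketch below models a token tree as either a single token or a delimited group,
and groups a flat list of tokens while checking that the delimiters are
balanced. The names `TokenTree`, `closer`, and `build_trees` are made up for
this example.

```rust
/// Toy token tree, loosely following the description above.
/// This is an illustrative assumption, not rustc's own definition.
#[derive(Debug, Clone, PartialEq)]
enum TokenTree {
    /// A single token such as an identifier or a piece of punctuation.
    Token(String),
    /// A delimited group: the opening delimiter and the trees inside it.
    Delimited(char, Vec<TokenTree>),
}

/// Returns the closing delimiter for `c` if `c` is an opening delimiter.
fn closer(c: char) -> Option<char> {
    match c {
        '(' => Some(')'),
        '[' => Some(']'),
        '{' => Some('}'),
        _ => None,
    }
}

/// Groups a flat list of tokens into token trees.
/// Returns an error message if the delimiters are not balanced.
fn build_trees(tokens: &[&str]) -> Result<Vec<TokenTree>, String> {
    // Each stack entry holds an open delimiter and the trees of its parent.
    let mut stack: Vec<(char, Vec<TokenTree>)> = Vec::new();
    let mut current: Vec<TokenTree> = Vec::new();
    for &tok in tokens {
        let ch = if tok.len() == 1 { tok.chars().next() } else { None };
        match ch {
            Some(c) if closer(c).is_some() => {
                // Opening delimiter: start collecting a nested group.
                stack.push((c, std::mem::take(&mut current)));
            }
            Some(c) if matches!(c, ')' | ']' | '}') => {
                // Closing delimiter: it must match the most recent opener.
                let (open, mut parent) = stack
                    .pop()
                    .ok_or_else(|| format!("unexpected `{}`", c))?;
                if closer(open) != Some(c) {
                    return Err(format!("`{}` closed by `{}`", open, c));
                }
                parent.push(TokenTree::Delimited(open, std::mem::take(&mut current)));
                current = parent;
            }
            _ => current.push(TokenTree::Token(tok.to_string())),
        }
    }
    if let Some((open, _)) = stack.last() {
        return Err(format!("unclosed `{}`", open));
    }
    Ok(current)
}
```

Under these assumptions, calling `build_trees` on the tokens `printer`, `!`,
`(`, `print`, `foo`, `)`, `;` yields four trees, the third of which is a
`(`-delimited group holding `print` and `foo`.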

Whenever we refer to the "example _invocation_", we mean the following snippet:

```rust,ignore
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
```

The process of expanding the macro invocation into the syntax tree
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
called _macro expansion_, and it is the topic of this chapter.

### The macro parser

There are two parts to macro expansion: parsing the definition and parsing the
invocations. Interestingly, both are done by the macro parser.

Basically, the macro parser is like an NFA-based regex parser. It uses an
algorithm similar in spirit to the [Earley parsing
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
defined in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].

The interface of the macro parser is as follows (this is slightly simplified):

```rust,ignore
fn parse(
    sess: ParserSession,
    tts: TokenStream,
    ms: &[TokenTree]
) -> NamedParseResult
```

In this interface:

- `sess` is a "parsing session", which keeps track of some metadata. Most
  notably, this is used to keep track of errors that are generated so they can
  be reported to the user.
- `tts` is a stream of tokens. The macro parser's job is to consume the raw
  stream of tokens and output a binding of metavariables to corresponding token
  trees.
- `ms` is a _matcher_. This is a sequence of token trees that we want to match
  `tts` against.

In the analogy of a regex parser, `tts` is the input and we are matching it
against the pattern `ms`. Using our examples, `tts` could be the stream of
tokens containing the inside of the example invocation `print foo`, while `ms`
might be the sequence of token (trees) `print $mvar:ident`.

The output of the parser is a `NamedParseResult`, which indicates which of
three cases has occurred:

- Success: `tts` matches the given matcher `ms`, and we have produced a binding
  from metavariables to the corresponding token trees.
- Failure: `tts` does not match `ms`. This results in an error message such as
  "No rule expected token _blah_".
- Error: some fatal error has occurred _in the parser_. For example, this
  happens if there is more than one way to match `tts` against `ms`, since
  that indicates the macro is ambiguous.

The full interface is defined [here][code_parse_int].
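
To make the three outcomes concrete, here is a deliberately tiny model of such
an interface. It is only a sketch: tokens are plain strings, a matcher may
contain only literal tokens and `ident` metavariables, and `Error` is reported
only when a matcher binds the same name twice. The types `Matcher` and
`ParseResult`, the helper `is_ident`, and the error messages are assumptions of
this example; the real parser is considerably more general.

```rust
use std::collections::HashMap;

/// One element of a toy matcher (an assumption of this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Matcher {
    /// A literal token that must appear verbatim, e.g. `print`.
    Literal(&'static str),
    /// A metavariable such as `$mvar:ident`; the payload is its name.
    Ident(&'static str),
}

/// The three possible outcomes described above.
#[derive(Debug, PartialEq)]
enum ParseResult {
    /// The input matched; maps metavariable names to the bound token.
    Success(HashMap<String, String>),
    /// The input did not match this matcher.
    Failure(String),
    /// Something is wrong with the matcher itself.
    Error(String),
}

/// Very rough identifier check, good enough for the sketch.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// Matches the token list `tts` against the matcher `ms`.
fn parse(tts: &[&str], ms: &[Matcher]) -> ParseResult {
    let mut bindings = HashMap::new();
    let mut input = tts.iter();
    for m in ms {
        let tok = match input.next() {
            Some(tok) => *tok,
            None => return ParseResult::Failure("unexpected end of input".to_string()),
        };
        match *m {
            Matcher::Literal(lit) if lit == tok => {}
            Matcher::Ident(name) if is_ident(tok) => {
                // Binding the same metavariable twice is a fault in the matcher.
                if bindings.insert(name.to_string(), tok.to_string()).is_some() {
                    return ParseResult::Error(format!("duplicated bind name: {}", name));
                }
            }
            _ => return ParseResult::Failure(format!("no rule expected token `{}`", tok)),
        }
    }
    // Leftover input means the matcher did not cover the whole invocation.
    match input.next() {
        Some(tok) => ParseResult::Failure(format!("no rule expected token `{}`", tok)),
        None => ParseResult::Success(bindings),
    }
}
```

With the matcher `[Matcher::Literal("print"), Matcher::Ident("mvar")]`, the
input `print foo` succeeds and binds `mvar` to `foo`, while `print twice foo`
fails because `foo` is left over once `twice` has been bound.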

The macro parser works in much the same way as a normal regex parser, with one
exception: in order to parse different types of metavariables, such as
`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
normal Rust parser.

As mentioned above, both definitions and invocations of macros are parsed using
the macro parser. This is extremely non-intuitive and self-referential. The code
to parse macro _definitions_ is in
[`src/libsyntax/ext/tt/macro_rules.rs`][code_mr]. It defines the pattern for
matching a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
a `macro_rules` definition should have in its body at least one occurrence of a
token tree followed by `=>` followed by another token tree. When the compiler
comes to a `macro_rules` definition, it uses this pattern to match the two token
trees per rule in the definition of the macro _using the macro parser itself_.
In our example definition, the metavariable `$lhs` would match the patterns of
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
knowledge around for when it needs to expand a macro invocation.
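
The same pattern can be tried out from ordinary user code. The sketch below
defines a hypothetical macro, `split_rules!`, whose only matcher is the pattern
quoted above, and feeds it the two arms of the example definition. It merely
turns each `$lhs` and `$rhs` into a string with `stringify!`; it is not how the
compiler stores rules. Note that the exact spacing of `stringify!` output is
not guaranteed, so the strings should be inspected loosely.

```rust
/// Hypothetical helper: splits a list of `lhs => rhs` rules, separated by
/// `;`, into (matcher, body) pairs rendered as strings.
macro_rules! split_rules {
    ( $( $lhs:tt => $rhs:tt );+ ) => {
        vec![ $( (stringify!($lhs), stringify!($rhs)) ),+ ]
    };
}

/// Applies `split_rules!` to the two arms of the example definition.
/// Inside the invocation, `$mvar` is just a pair of tokens captured by `tt`.
fn example_rules() -> Vec<(&'static str, &'static str)> {
    split_rules!(
        (print $mvar:ident) => { println!("{}", $mvar); };
        (print twice $mvar:ident) => { println!("{}", $mvar); println!("{}", $mvar); }
    )
}
```

Here `example_rules()` returns two pairs: one per arm, with the parenthesized
matcher in the first position and the braced body in the second.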

When the compiler comes to a macro invocation, it parses that invocation using
the same NFA-based macro parser that is described above. However, the matcher
used is the first token tree (`$lhs`) extracted from the arms of the macro
_definition_. Using our example, we would try to match the token stream `print
foo` from the invocation against the matchers `print $mvar:ident` and `print
twice $mvar:ident` that we previously extracted from the definition. The
algorithm is exactly the same, but when the macro parser comes to a place in the
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
it calls back to the normal Rust parser to get the contents of that
non-terminal. In this case, the Rust parser would look for an `ident` token,
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
proceeds in parsing as normal. Also, note that the matchers from the various
arms are tried in order, and the first one that matches the invocation is used;
if there are no matches at all, there is a syntax error. An ambiguity _within_
a single matcher, on the other hand, is reported as an error.
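
Arm selection can be observed with a small variant of the example definition.
In this sketch the hypothetical macro `render!` has the same two matchers as
`printer!`, but it returns a `String` instead of printing, so that the chosen
arm is visible in the result.

```rust
/// Same matchers as the example definition, but each arm builds a `String`.
/// `render!` is a made-up name for this illustration.
macro_rules! render {
    (print $mvar:ident) => {
        format!("{}", $mvar)
    };
    (print twice $mvar:ident) => {
        format!("{}{}", $mvar, $mvar)
    };
}

/// Matches the first arm: `$mvar` is bound to the identifier `foo`.
fn once() -> String {
    let foo = "ab";
    render!(print foo)
}

/// The first arm fails here (after binding `twice`, `foo` is left over),
/// so the second arm is used.
fn twice() -> String {
    let foo = "ab";
    render!(print twice foo)
}
```

An invocation that matches neither arm, such as `render!(say foo)`, is rejected
at compile time.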

For more information about the macro parser's implementation, see the comments
in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].

### Hygiene

If you have ever used C/C++ preprocessor macros, you know that there are some
annoying and hard-to-debug gotchas! For example, consider the following C code:

```c
#define DEFINE_FOO struct Bar {int x;}; struct Foo {struct Bar bar;};

// Then, somewhere else
struct Bar {
    ...
};

DEFINE_FOO
```

Most people avoid writing C like this, and for good reason: it doesn't
compile. The `struct Bar` defined by the macro clashes with the `struct Bar`
defined in the code. Consider also the following example:

```c
#define DO_FOO(x) {\
    int y = 0;\
    foo(x, y);\
    }

// Then elsewhere
int y = 22;
DO_FOO(y);
```

Do you see the problem? We wanted to generate a call `foo(22, 0)`, but instead
we got `foo(0, 0)` because the macro defined its own `y`!

These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
handle names defined _within a macro_. In particular, a hygienic macro system
prevents errors due to names introduced within a macro. Rust macros are hygienic
in that they do not allow one to write the sorts of bugs above.
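
For comparison, here is a Rust counterpart of the `DO_FOO` example. The function
`foo` and the macro `do_foo!` are stand-ins invented for this sketch; `foo`
simply returns its arguments so that the result can be inspected.

```rust
/// Stand-in for the `foo` of the C example: returns its arguments.
fn foo(x: i32, y: i32) -> (i32, i32) {
    (x, y)
}

/// Rust counterpart of `DO_FOO`: the body introduces its own `y`.
macro_rules! do_foo {
    ($x:expr) => {{
        let y = 0; // local to the macro body
        foo($x, y)
    }};
}

/// The `y` passed in by the caller is not captured by the macro's `y`.
fn call_do_foo() -> (i32, i32) {
    let y = 22;
    do_foo!(y)
}
```

Here `call_do_foo()` evaluates to `(22, 0)`: the `y` written at the call site
and the `y` introduced inside the macro body are treated as different
variables, even though they are spelled the same.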

At a high level, hygiene within the Rust compiler is accomplished by keeping
track of the context where a name is introduced and used. We can then
disambiguate names based on that context. Future iterations of the macro system
will give the macro author greater control over how that context is used. For
example, a macro author may want to introduce a new name to the context where
the macro was called. Alternately, the macro author may be defining a variable
for use only within the macro (i.e. it should not be visible outside the macro).

In rustc, this "context" is tracked via `Span`s.
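
The idea of disambiguating names by context can be modelled in a few lines.
This is a toy model only: the `Context` and `Ident` types below are invented
for the illustration and say nothing about how `Span`s are actually laid out.
It replays the `DO_FOO(y)` situation with two bindings that share a spelling
but differ in where they were introduced.

```rust
use std::collections::HashMap;

/// Hypothetical marker for where a name was introduced.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Context {
    CallSite,
    MacroBody,
}

/// A name paired with the context it came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Ident {
    name: &'static str,
    ctxt: Context,
}

/// Both bindings are spelled `y`, but they are stored under different keys
/// because their contexts differ, so neither shadows the other.
fn bindings() -> HashMap<Ident, i32> {
    let mut scope = HashMap::new();
    scope.insert(Ident { name: "y", ctxt: Context::CallSite }, 22);
    scope.insert(Ident { name: "y", ctxt: Context::MacroBody }, 0);
    scope
}
```

Looking up `y` with the call-site context gives `22`, while looking it up with
the macro-body context gives `0`.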

TODO: what is call-site hygiene? what is def-site hygiene?

TODO

### Procedural Macros

TODO

### Custom Derive

TODO

TODO: maybe something about macros 2.0?


[code_dir]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/ext/tt/macro_parser/
[code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/ext/tt/macro_rules/
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/ext/tt/macro_parser/fn.parse.html
[parsing]: ./the-parser.html