New upstream version 1.67.1+dfsg1
1 # Macro expansion
2
3 <!-- toc -->
4
5 > `rustc_ast`, `rustc_expand`, and `rustc_builtin_macros` are all undergoing
6 > refactoring, so some of the links in this chapter may be broken.
7
8 Rust has a very powerful macro system. In the previous chapter, we saw how the
9 parser sets aside macros to be expanded (it temporarily uses [placeholders]).
10 This chapter is about the process of expanding those macros iteratively until
11 we have a complete AST for our crate with no unexpanded macros (or a compile
12 error).
13
14 [placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html
15
16 First, we will discuss the algorithm that expands and integrates macro output
17 into ASTs. Next, we will take a look at how hygiene data is collected. Finally,
18 we will look at the specifics of expanding different types of macros.
19
20 Many of the algorithms and data structures described below are in [`rustc_expand`],
21 with basic data structures in [`rustc_expand::base`][base].
22
Also of note, `cfg` and `cfg_attr` are treated differently from other macros, and are
handled in [`rustc_expand::config`][cfg].
25
26 [`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
27 [base]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/index.html
28 [cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/config/index.html
29
30 ## Expansion and AST Integration
31
First of all, expansion happens at the crate level. Given the raw source code for
33 a crate, the compiler will produce a massive AST with all macros expanded, all
34 modules inlined, etc. The primary entry point for this process is the
35 [`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we
36 use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
37 below for more detailed discussion of edge case expansion issues).
38
39 [`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
40 [reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html
41
42 At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
43 queue of unresolved macro invocations (that is, macros we haven't found the
44 definition of yet). We repeatedly try to pick a macro from the queue, resolve
45 it, expand it, and integrate it back. If we can't make progress in an
46 iteration, this represents a compile error. Here is the [algorithm][original]:
47
48 [fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
49 [original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
50
1. Initialize a `queue` of unresolved macros.
2. Repeat until `queue` is empty (or we make no progress, which is an error):
    1. [Resolve](./name-resolution.md) imports in our partially built crate as
       much as possible.
    2. Collect as many macro [`Invocation`s][inv] as possible from our
       partially built crate (fn-like, attributes, derives) and add them to the
       queue.
    3. Dequeue the first element, and attempt to resolve it.
    4. If it's resolved:
        1. Run the macro's expander function that consumes a [`TokenStream`] or
           AST and produces a [`TokenStream`] or [`AstFragment`] (depending on
           the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt],
           each of which is a token (punctuation, identifier, or literal) or a
           delimited group (anything inside `()`/`[]`/`{}`)).
            - At this point, we know everything about the macro itself and can
              call `set_expn_data` to fill in its properties in the global data;
              that is the hygiene data associated with `ExpnId`. (See [the
              "Hygiene" section below][hybelow]).
        2. Integrate that piece of AST into the big existing partially built
           AST. This is essentially where the "token-like mass" becomes a
           proper set-in-stone AST with side-tables. It happens as follows:
            - If the macro produces tokens (e.g. a proc macro), we parse into
              an AST, which may produce parse errors.
            - During expansion, we create `SyntaxContext`s (hierarchy 2). (See
              [the "Hygiene" section below][hybelow])
            - These three passes happen one after another on every AST fragment
              freshly expanded from a macro:
                - [`NodeId`]s are assigned by [`InvocationCollector`]. This
                  also collects new macro calls from this new AST piece and
                  adds them to the queue.
                - ["Def paths"][defpath] are created and [`DefId`]s are
                  assigned to them by [`DefCollector`].
                - Names are put into modules (from the resolver's point of
                  view) by [`BuildReducedGraphVisitor`].
        3. After expanding a single macro and integrating its output, continue
           to the next iteration of [`fully_expand_fragment`][fef].
    5. If it's not resolved:
        1. Put the macro back in the queue.
        2. Continue to next iteration...
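
The loop above can be sketched as a toy model. This is only an illustration of the queue-and-retry shape, not the compiler's code: `Output`, `fully_expand`, and the idea of representing macros by name are all assumptions made for the sketch, and name resolution is reduced to a set lookup.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical stand-in for what expanding one macro produces.
struct Output {
    /// Macro definitions revealed by the output (resolvable once integrated).
    defines: Vec<&'static str>,
    /// New macro invocations found in the output.
    invokes: Vec<&'static str>,
}

/// Returns the order in which invocations were expanded, or the invocations
/// that could never be resolved (the "no progress" error case).
fn fully_expand(
    initial_invocations: &[&'static str],
    initially_defined: &[&'static str],
    outputs: &HashMap<&'static str, Output>,
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut defined: HashSet<&'static str> = initially_defined.iter().copied().collect();
    let mut queue: VecDeque<&'static str> = initial_invocations.iter().copied().collect();
    let mut expanded = Vec::new();
    // Number of consecutive dequeues that failed to resolve.
    let mut stalled = 0;

    while let Some(name) = queue.pop_front() {
        if defined.contains(name) {
            // Resolved: "expand" it and integrate the output, which may
            // reveal new definitions and new invocations.
            if let Some(out) = outputs.get(name) {
                defined.extend(out.defines.iter().copied());
                queue.extend(out.invokes.iter().copied());
            }
            expanded.push(name);
            stalled = 0;
        } else {
            // Not resolved yet: put it back and try the next one.
            queue.push_back(name);
            stalled += 1;
            if stalled == queue.len() {
                // Every pending invocation failed in a row: no progress.
                return Err(queue.into_iter().collect());
            }
        }
    }
    Ok(expanded)
}
```

In this model, an invocation of `inner` that is queued before `outer` (whose output defines `inner`) is simply retried after `outer` has been expanded, while an invocation of a macro that nothing defines ends the loop with an error.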
90
91 [defpath]: hir.md#identifiers-in-the-hir
92 [`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
93 [`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
94 [`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
95 [`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
96 [`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
97 [hybelow]: #hygiene-and-hierarchies
98 [tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
99 [`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
100 [inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
101 [`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
102
103 ### Error Recovery
104
105 If we make no progress in an iteration, then we have reached a compilation
106 error (e.g. an undefined macro). We attempt to recover from failures
107 (unresolved macros or imports) for the sake of diagnostics. This allows
108 compilation to continue past the first error, so that we can report more errors
109 at a time. Recovery can't cause compilation to succeed. We know that it will
110 fail at this point. The recovery happens by expanding unresolved macros into
111 [`ExprKind::Err`][err].
112
113 [err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err
114
115 ### Name Resolution
116
117 Notice that name resolution is involved here: we need to resolve imports and
118 macro names in the above algorithm. This is done in
119 [`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
120 those resolutions, and reports various errors (e.g. "not found" or "found, but
121 it's unstable" or "expected x, found y"). However, we don't try to resolve
122 other names yet. This happens later, as we will see in the [next
123 chapter](./name-resolution.md).
124
125 [mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html
126
127 ### Eager Expansion
128
129 _Eager expansion_ means that we expand the arguments of a macro invocation
130 before the macro invocation itself. This is implemented only for a few special
131 built-in macros that expect literals; expanding arguments first for some of
these macros results in a smoother user experience. As an example, consider the
133 following:
134
```rust,ignore
macro bar($i: ident) { $i }
macro foo($i: ident) { $i }

foo!(bar!(baz));
```
141
142 A lazy expansion would expand `foo!` first. An eager expansion would expand
143 `bar!` first.
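
The difference can be observed on stable Rust with built-in macros. The following is a small sketch; the function names `eager` and `lazy` are made up for the example.

```rust
/// `concat!` is one of the built-in macros that expands its arguments first:
/// the inner `stringify!` call is expanded before `concat!` looks at it, so
/// `concat!` sees a string literal.
fn eager() -> &'static str {
    concat!("expanded: ", stringify!(baz))
}

/// `stringify!` does not expand its argument: it receives the unexpanded
/// tokens of the inner `line!()` call and turns those tokens into a string,
/// rather than producing a line number.
fn lazy() -> &'static str {
    stringify!(line!())
}
```

Here `eager()` returns `"expanded: baz"`, whereas `lazy()` returns the text of the inner invocation itself.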
144
145 Eager expansion is not a generally available feature of Rust. Implementing
146 eager expansion more generally would be challenging, but we implement it for a
147 few special built-in macros for the sake of user experience. The built-in
148 macros are implemented in [`rustc_builtin_macros`], along with some other early
149 code generation facilities like injection of standard library imports or
generation of the test harness. There are some additional helpers for building
151 their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally
152 performs a subset of the things that lazy (normal) expansion does. It is done by
153 invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to
154 the whole crate, like we normally do).
155
156 ### Other Data Structures
157
Here are some other notable data structures involved in expansion and integration:
- [`ResolverExpand`] - a trait used to break crate dependencies. This allows the
  resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and
  pretty much everything else depending on [`rustc_ast`].
- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by the
  expansion infrastructure in the process of its work.
- [`Annotatable`] - a piece of AST that can be an attribute target; almost the same
  thing as `AstFragment`, except for types and patterns, which can be produced by
  macros but cannot be annotated with attributes.
- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a
  different `AstFragment` depending on its [`AstFragmentKind`]: item,
  expression, pattern, etc.
170
171 [`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
172 [`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html
173 [`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html
174 [`ExtCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExtCtxt.html
175 [`ExpansionData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExpansionData.html
176 [`Annotatable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.Annotatable.html
177 [`MacResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MacResult.html
178 [`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html
179
180 ## Hygiene and Hierarchies
181
182 If you have ever used C/C++ preprocessor macros, you know that there are some
183 annoying and hard-to-debug gotchas! For example, consider the following C code:
184
```c
#define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;};

// Then, somewhere else
struct Bar {
    ...
};

DEFINE_FOO
```
195
Most people avoid writing C like this – and for good reason: it doesn't
compile. The `struct Bar` defined by the macro clashes by name with the
`struct Bar` defined in the code. Consider also the following example:
199
```c
#define DO_FOO(x) {\
    int y = 0;\
    foo(x, y);\
    }

// Then elsewhere
int y = 22;
DO_FOO(y);
```
210
211 Do you see the problem? We wanted to generate a call `foo(22, 0)`, but instead
212 we got `foo(0, 0)` because the macro defined its own `y`!
213
214 These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
215 handle names defined _within a macro_. In particular, a hygienic macro system
216 prevents errors due to names introduced within a macro. Rust macros are hygienic
217 in that they do not allow one to write the sorts of bugs above.
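
For comparison, here is a sketch of the second C example written with `macro_rules!`. The helper `foo` and the wrapper `call_do_foo` are made up for the illustration.

```rust
/// Hypothetical helper that just returns its arguments.
fn foo(x: i32, y: i32) -> (i32, i32) {
    (x, y)
}

macro_rules! do_foo {
    ($x:expr) => {{
        // This `y` is introduced inside the macro, so it is distinct from
        // any `y` that the caller passes in through `$x`.
        let y = 0;
        foo($x, y)
    }};
}

fn call_do_foo() -> (i32, i32) {
    let y = 22;
    // `$x` is the caller's `y`; the macro's own `y` does not capture it.
    do_foo!(y)
}
```

Unlike the C version, `call_do_foo()` evaluates to `(22, 0)`: the two variables named `y` are kept apart even though they are spelled identically.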
218
219 At a high level, hygiene within the Rust compiler is accomplished by keeping
220 track of the context where a name is introduced and used. We can then
221 disambiguate names based on that context. Future iterations of the macro system
222 will allow greater control to the macro author to use that context. For example,
223 a macro author may want to introduce a new name to the context where the macro
224 was called. Alternately, the macro author may be defining a variable for use
225 only within the macro (i.e. it should not be visible outside the macro).
226
227 [code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
228 [code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
229 [code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules
230 [code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
231 [parsing]: ./the-parser.html
232
233 The context is attached to AST nodes. All AST nodes generated by macros have
234 context attached. Additionally, there may be other nodes that have context
235 attached, such as some desugared syntax (non-macro-expanded nodes are
236 considered to just have the "root" context, as described below).
237 Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
238 This struct also has hygiene information attached to it, as we will see later.
239
240 [span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html
241
Because macro invocations and definitions can be nested, the syntax context of
243 a node must be a hierarchy. For example, if we expand a macro and there is
244 another macro invocation or definition in the generated output, then the syntax
245 context should reflect the nesting.
246
247 However, it turns out that there are actually a few types of context we may
248 want to track for different purposes. Thus, there are not just one but _three_
249 expansion hierarchies that together comprise the hygiene information for a
250 crate.
251
252 All of these hierarchies need some sort of "macro ID" to identify individual
253 elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive
254 an integer ID, assigned continuously starting from 0 as we discover new macro
255 calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own
256 parent.
257
258 [`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms
259 (with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
260 and structures related to hygiene and expansion that are kept in global data.
261
262 The actual hierarchies are stored in [`HygieneData`][hd]. This is a global
263 piece of data containing hygiene and expansion info that can be accessed from
264 any [`Ident`] without any context.
265
266
267 [`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
268 [rootid]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html#method.root
269 [hd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.HygieneData.html
270 [hy]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html
271 [hacks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/struct.Resolver.html#method.resolve_crate_root
272 [`Ident`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Ident.html
273
274 ### The Expansion Order Hierarchy
275
276 The first hierarchy tracks the order of expansions, i.e., when a macro
277 invocation is in the output of another macro.
278
279 Here, the children in the hierarchy will be the "innermost" tokens. The
280 [`ExpnData`] struct itself contains a subset of properties from both macro
281 definition and macro call available through global data.
282 [`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy.
283
284 [`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
285 [edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent
286
287 For example,
288
```rust,ignore
macro_rules! foo { () => { println!(); } }

fn main() { foo!(); }
```
294
295 In this code, the AST nodes that are finally generated would have hierarchy:
296
```
root
    expn_id_foo
        expn_id_println
```
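
As a rough mental model, this hierarchy behaves like a table of records with parent links that can be walked back to the root. The sketch below is a toy: `ToyExpnId`, `ToyExpnData`, and `ToyHygieneData` are invented names that only mirror the child -> parent relationship, not the real structures.

```rust
/// Toy stand-in for `ExpnId`: an index into the table below.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ToyExpnId(usize);

/// Toy stand-in for `ExpnData`, keeping only a label and the parent link.
struct ToyExpnData {
    label: &'static str,
    parent: ToyExpnId,
}

/// Toy stand-in for the global hygiene data.
struct ToyHygieneData {
    expn_data: Vec<ToyExpnData>,
}

impl ToyHygieneData {
    const ROOT: ToyExpnId = ToyExpnId(0);

    fn new() -> Self {
        // The root expansion is its own parent.
        ToyHygieneData {
            expn_data: vec![ToyExpnData { label: "root", parent: Self::ROOT }],
        }
    }

    /// Registers a new expansion; IDs are handed out in discovery order.
    fn fresh_expn(&mut self, label: &'static str, parent: ToyExpnId) -> ToyExpnId {
        self.expn_data.push(ToyExpnData { label, parent });
        ToyExpnId(self.expn_data.len() - 1)
    }

    /// Walks child -> parent links up to the root, returning labels root-first.
    fn ancestry(&self, mut id: ToyExpnId) -> Vec<&'static str> {
        let mut labels = vec![self.expn_data[id.0].label];
        while self.expn_data[id.0].parent != id {
            id = self.expn_data[id.0].parent;
            labels.push(self.expn_data[id.0].label);
        }
        labels.reverse();
        labels
    }
}
```

Registering `expn_id_foo` under the root and `expn_id_println` under `expn_id_foo` reproduces the three-level chain shown above.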
302
303 ### The Macro Definition Hierarchy
304
305 The second hierarchy tracks the order of macro definitions, i.e., when we are
306 expanding one macro another macro definition is revealed in its output. This
307 one is a bit tricky and more complex than the other two hierarchies.
308
309 [`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
310 [`SyntaxContextData`][scd] contains data associated with the given
311 `SyntaxContext`; mostly it is a cache for results of filtering that chain in
312 different ways. [`SyntaxContextData::parent`][scdp] is the child -> parent
link here, and the [`SyntaxContextData::outer_expn`][scdoe] fields are the individual
314 elements in the chain. The "chaining operator" is
315 [`SyntaxContext::apply_mark`][am] in compiler code.
316
317 A [`Span`][span], mentioned above, is actually just a compact representation of
318 a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned
319 [`Symbol`] + `Span` (i.e. an interned string + hygiene data).
320
321 [`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html
322 [scd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html
323 [scdp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.parent
324 [sc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html
325 [scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
326 [am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
327
For built-in macros, we use the context:
`SyntaxContext::root().apply_mark(expn_id)`, and such macros are considered to
be defined at the hierarchy root. We do the same for proc-macros because we
haven't implemented cross-crate hygiene yet.
332
333 If the token had context `X` before being produced by a macro then after being
334 produced by the macro it has context `X -> macro_id`. Here are some examples:
335
336 Example 0:
337
```rust,ignore
macro m() { ident }

m!();
```
343
344 Here `ident` originally has context [`SyntaxContext::root()`][scr]. `ident` has
345 context `ROOT -> id(m)` after it's produced by `m`.
346
347 [scr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root
348
349
350 Example 1:
351
```rust,ignore
macro m() { macro n() { ident } }

m!();
n!();
```
358 In this example the `ident` has context `ROOT` originally, then `ROOT -> id(m)`
359 after the first expansion, then `ROOT -> id(m) -> id(n)`.
360
361 Example 2:
362
363 Note that these chains are not entirely determined by their last element, in
364 other words `ExpnId` is not isomorphic to `SyntaxContext`.
365
```rust,ignore
macro m($i: ident) { macro n() { ($i, bar) } }

m!(foo);
```
371
372 After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
373 `ROOT -> id(m) -> id(n)`.
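
The point of Example 2 can be reproduced with a toy model of interned chains. Everything here (`Ctxt`, `Expn`, `Contexts`) is invented for illustration; in particular, the real `apply_mark` also has to deal with transparency, which this sketch ignores.

```rust
use std::collections::HashMap;

/// Toy stand-in for `SyntaxContext`: an index into `Contexts::data`.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Ctxt(usize);

/// Toy stand-in for `ExpnId`, identified by the macro's name.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Expn(&'static str);

struct Contexts {
    /// Per-context data: (parent context, outermost expansion).
    /// Index 0 is the root, which has no outer expansion.
    data: Vec<(Ctxt, Option<Expn>)>,
    /// Interning table, so that equal chains get equal IDs.
    interned: HashMap<(Ctxt, Expn), Ctxt>,
}

impl Contexts {
    const ROOT: Ctxt = Ctxt(0);

    fn new() -> Self {
        Contexts { data: vec![(Self::ROOT, None)], interned: HashMap::new() }
    }

    /// The "chaining operator": extends the chain `ctxt` with `expn`.
    fn apply_mark(&mut self, ctxt: Ctxt, expn: Expn) -> Ctxt {
        if let Some(&existing) = self.interned.get(&(ctxt, expn)) {
            return existing;
        }
        let new = Ctxt(self.data.len());
        self.data.push((ctxt, Some(expn)));
        self.interned.insert((ctxt, expn), new);
        new
    }

    /// The last element of the chain.
    fn outer_expn(&self, ctxt: Ctxt) -> Option<Expn> {
        self.data[ctxt.0].1
    }

    /// The whole chain after `ROOT`, oldest mark first.
    fn chain(&self, mut ctxt: Ctxt) -> Vec<&'static str> {
        let mut marks = Vec::new();
        while let (parent, Some(expn)) = self.data[ctxt.0] {
            marks.push(expn.0);
            ctxt = parent;
        }
        marks.reverse();
        marks
    }
}
```

Building `ROOT -> id(n)` for `foo` and `ROOT -> id(m) -> id(n)` for `bar` yields two different contexts whose last element is the same, which is why a context cannot be recovered from its outermost expansion alone.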
374
375 Finally, one last thing to mention is that currently, this hierarchy is subject
376 to the ["context transplantation hack"][hack]. Basically, the more modern (and
377 experimental) `macro` macros have stronger hygiene than the older MBE system,
378 but this can result in weird interactions between the two. The hack is intended
379 to make things "just work" for now.
380
381 [hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732
382
383 ### The Call-site Hierarchy
384
385 The third and final hierarchy tracks the location of macro invocations.
386
387 In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link.
388
389 [callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site
390
391 Here is an example:
392
```rust,ignore
macro bar($i: ident) { $i }
macro foo($i: ident) { $i }

foo!(bar!(baz));
```
399
400 For the `baz` AST node in the final output, the first hierarchy is `ROOT ->
401 id(foo) -> id(bar) -> baz`, while the third hierarchy is `ROOT -> baz`.
402
403 ### Macro Backtraces
404
405 Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery
406 in [`rustc_span::hygiene`][hy].
407
408 [`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html
409
410 ## Producing Macro Output
411
412 Above, we saw how the output of a macro is integrated into the AST for a crate,
413 and we also saw how the hygiene data for a crate is generated. But how do we
414 actually produce the output of a macro? It depends on the type of macro.
415
416 There are two types of macros in Rust:
417 `macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros
418 (or "proc macros"; including custom derives). During the parsing phase, the normal
419 Rust parser will set aside the contents of macros and their invocations. Later,
420 macros are expanded using these portions of the code.
421
422 Some important data structures/interfaces here:
423 - [`SyntaxExtension`] - a lowered macro representation, contains its expander
424 function, which transforms a `TokenStream` or AST into another `TokenStream`
425 or AST + some additional data like stability, or a list of unstable features
426 allowed inside the macro.
427 - [`SyntaxExtensionKind`] - expander functions may have several different
428 signatures (take one token stream, or two, or a piece of AST, etc). This is
429 an enum that lists them.
430 - [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
431 traits representing the expander function signatures.
432
433 [`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html
434 [`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html
435 [`BangProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.BangProcMacro.html
436 [`TTMacroExpander`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.TTMacroExpander.html
437 [`AttrProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.AttrProcMacro.html
438 [`MultiItemModifier`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MultiItemModifier.html
439
440 ## Macros By Example
441
442 MBEs have their own parser distinct from the normal Rust parser. When macros
443 are expanded, we may invoke the MBE parser to parse and expand a macro. The
444 MBE parser, in turn, may call the normal Rust parser when it needs to bind a
445 metavariable (e.g. `$my_expr`) while parsing the contents of a macro
446 invocation. The code for macro expansion is in
447 [`compiler/rustc_expand/src/mbe/`][code_dir].
448
449 ### Example
450
451 It's helpful to have an example to refer to. For the remainder of this chapter,
452 whenever we refer to the "example _definition_", we mean the following:
453
```rust,ignore
macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}
```
465
466 `$mvar` is called a _metavariable_. Unlike normal variables, rather than
467 binding to a value in a computation, a metavariable binds _at compile time_ to
468 a tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an
469 identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other
470 special tokens, such as `EOF`, which indicates that there are no more tokens.
Token trees result from paired parentheses-like characters (`(`...`)`,
`[`...`]`, and `{`...`}`); they include the open and close delimiters and all the
tokens in between (we do require that parentheses-like characters be balanced). Having
474 macro expansion operate on token streams rather than the raw bytes of a source
475 file abstracts away a lot of complexity. The macro expander (and much of the
476 rest of the compiler) doesn't really care that much about the exact line and
477 column of some syntactic construct in the code; it cares about what constructs
478 are used in the code. Using tokens allows us to care about _what_ without
479 worrying about _where_. For more information about tokens, see the
480 [Parsing][parsing] chapter of this book.
481
482 Whenever we refer to the "example _invocation_", we mean the following snippet:
483
```rust,ignore
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
```
487
488 The process of expanding the macro invocation into the syntax tree
489 `println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
490 called _macro expansion_, and it is the topic of this chapter.
491
492 ### The MBE parser
493
494 There are two parts to MBE expansion: parsing the definition and parsing the
495 invocations. Interestingly, both are done by the macro parser.
496
497 Basically, the MBE parser is like an NFA-based regex parser. It uses an
498 algorithm similar in spirit to the [Earley parsing
499 algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
500 defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
501
502 The interface of the macro parser is as follows (this is slightly simplified):
503
```rust,ignore
fn parse_tt(
    &mut self,
    parser: &mut Cow<'_, Parser<'_>>,
    matcher: &[MatcherLoc]
) -> ParseResult
```
511
The macro parser uses the following items:
513
514 - `parser` is a reference to the state of a normal Rust parser, including the
515 token stream and parsing session. The token stream is what we are about to
516 ask the MBE parser to parse. We will consume the raw stream of tokens and
517 output a binding of metavariables to corresponding token trees. The parsing
518 session can be used to report parser errors.
519 - `matcher` is a sequence of `MatcherLoc`s that we want to match
520 the token stream against. They're converted from token trees before matching.
521
522 In the analogy of a regex parser, the token stream is the input and we are matching it
523 against the pattern `matcher`. Using our examples, the token stream could be the stream of
524 tokens containing the inside of the example invocation `print foo`, while `matcher`
525 might be the sequence of token (trees) `print $mvar:ident`.
526
527 The output of the parser is a [`ParseResult`], which indicates which of
528 three cases has occurred:
529
530 - Success: the token stream matches the given `matcher`, and we have produced a binding
531 from metavariables to the corresponding token trees.
- Failure: the token stream does not match `matcher`. This results in an error message such as
  "no rules expected the token _blah_".
534 - Error: some fatal error has occurred _in the parser_. For example, this
535 happens if there is more than one pattern match, since that indicates
536 the macro is ambiguous.
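
To make the success and failure cases concrete, here is a deliberately tiny matcher. It is a sketch under heavy assumptions: `Loc`, `ToyParseResult`, and `toy_parse_tt` are invented names, tokens are plain strings, only `ident` metavariables are supported, and the matcher is a straight line with no repetitions (so the NFA machinery and the fatal error case are left out entirely).

```rust
use std::collections::HashMap;

/// Toy matcher element: a literal token or an `$name:ident` metavariable.
enum Loc {
    Token(&'static str),
    MetaVarIdent(&'static str),
}

#[derive(Debug, PartialEq)]
enum ToyParseResult {
    /// Bindings from metavariable names to the tokens they matched.
    Success(HashMap<&'static str, String>),
    /// The input does not match; the string is the diagnostic.
    Failure(String),
}

/// Stands in for "calling back to the normal Rust parser" for an `ident`.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    matches!(chars.next(), Some(c) if c.is_alphabetic() || c == '_')
        && chars.all(|c| c.is_alphanumeric() || c == '_')
}

fn toy_parse_tt(tokens: &[&str], matcher: &[Loc]) -> ToyParseResult {
    let mut bindings = HashMap::new();
    let mut input = tokens.iter();
    for loc in matcher {
        match (loc, input.next()) {
            // A literal token must match exactly.
            (Loc::Token(expected), Some(tok)) if tok == expected => {}
            // A metavariable binds the next token if it is an identifier.
            (Loc::MetaVarIdent(name), Some(tok)) if is_ident(tok) => {
                bindings.insert(*name, tok.to_string());
            }
            (_, Some(tok)) => {
                return ToyParseResult::Failure(format!("unexpected token `{}`", tok));
            }
            (_, None) => {
                return ToyParseResult::Failure("unexpected end of input".to_string());
            }
        }
    }
    // Leftover input means the matcher did not cover the whole invocation.
    match input.next() {
        Some(tok) => ToyParseResult::Failure(format!("unexpected token `{}`", tok)),
        None => ToyParseResult::Success(bindings),
    }
}
```

With the matcher `print $mvar:ident`, the input `print foo` succeeds and binds `mvar` to `foo`, while `print twice foo` fails because of the leftover token.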
537
538 The full interface is defined [here][code_parse_int].
539
540 The macro parser does pretty much exactly the same as a normal regex parser with
541 one exception: in order to parse different types of metavariables, such as
542 `ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
543 normal Rust parser.
544
545 As mentioned above, both definitions and invocations of macros are parsed using
546 the macro parser. This is extremely non-intuitive and self-referential. The code
547 to parse macro _definitions_ is in
548 [`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for
549 matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
550 a `macro_rules` definition should have in its body at least one occurrence of a
551 token tree followed by `=>` followed by another token tree. When the compiler
552 comes to a `macro_rules` definition, it uses this pattern to match the two token
553 trees per rule in the definition of the macro _using the macro parser itself_.
554 In our example definition, the metavariable `$lhs` would match the patterns of
555 both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
556 would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
557 println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
558 knowledge around for when it needs to expand a macro invocation.
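
The same shape of pattern can be tried out in an ordinary `macro_rules!` macro. The sketch below is not how the compiler does it; `split_arms` and `arms` are made-up names, and the arms being split are simplified (no metavariables) to keep the example small.

```rust
/// Hypothetical helper: splits a rule list into (lhs, rhs) pairs using the
/// pattern `$( $lhs:tt => $rhs:tt );+`.
macro_rules! split_arms {
    ($( $lhs:tt => $rhs:tt );+) => {
        vec![$( (stringify!($lhs), stringify!($rhs)) ),+]
    };
}

fn arms() -> Vec<(&'static str, &'static str)> {
    split_arms!(
        (print) => { one };
        (print twice) => { two }
    )
}
```

Each element of the returned vector holds the stringified left-hand and right-hand token tree of one arm, so `arms()` has two entries here.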
559
560 When the compiler comes to a macro invocation, it parses that invocation using
561 the same NFA-based macro parser that is described above. However, the matcher
562 used is the first token tree (`$lhs`) extracted from the arms of the macro
563 _definition_. Using our example, we would try to match the token stream `print
564 foo` from the invocation against the matchers `print $mvar:ident` and `print
565 twice $mvar:ident` that we previously extracted from the definition. The
566 algorithm is exactly the same, but when the macro parser comes to a place in the
567 current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
568 it calls back to the normal Rust parser to get the contents of that
569 non-terminal. In this case, the Rust parser would look for an `ident` token,
570 which it finds (`foo`) and returns to the macro parser. Then, the macro parser
proceeds in parsing as normal. Also, note that the arms are tried in order, and
the first arm whose matcher matches the invocation is the one that is expanded;
if no arm matches at all, there is a syntax error.
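
How an invocation is matched against several arms can be observed on stable Rust: arms are tried from top to bottom, and the first arm whose matcher succeeds is the one that gets transcribed. The macro `which_arm` and the function `demo` below are made up for this sketch.

```rust
macro_rules! which_arm {
    (print $mvar:ident) => { "first arm" };
    (print twice $mvar:ident) => { "second arm" };
    // This arm also matches everything the first two match, but it is
    // only reached when they have failed.
    (print $($rest:tt)*) => { "catch-all arm" };
}

fn demo() -> [&'static str; 3] {
    [
        // Matches the first arm (and would also match the catch-all).
        which_arm!(print foo),
        // The first arm binds `twice` as the ident, then fails on `foo`.
        which_arm!(print twice foo),
        // `1` is not an ident and not `twice`, so only the last arm matches.
        which_arm!(print 1 + 1),
    ]
}
```

`demo()` returns `["first arm", "second arm", "catch-all arm"]`.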
575
576 For more information about the macro parser's implementation, see the comments
577 in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
578
579 ### `macro`s and Macros 2.0
580
581 There is an old and mostly undocumented effort to improve the MBE system, give
582 it more hygiene-related features, better scoping and visibility rules, etc. There
583 hasn't been a lot of work on this recently, unfortunately. Internally, `macro`
584 macros use the same machinery as today's MBEs; they just have additional
585 syntactic sugar and are allowed to be in namespaces.
586
587 ## Procedural Macros
588
Procedural macros are also expanded after parsing, during the expansion process
described above. However, they use a rather different mechanism. Rather than having
a parser in the compiler, procedural macros are implemented as custom, third-party
crates. The compiler compiles the proc macro crate, including the specially annotated
functions in it (i.e. the proc macros themselves), and passes those functions a stream
of tokens.
594
595 The proc macro can then transform the token stream and output a new token
596 stream, which is synthesized into the AST.
597
598 It's worth noting that the token stream type used by proc macros is _stable_,
599 so `rustc` does not use it internally (since our internal data structures are
600 unstable). The compiler's token stream is
[`rustc_ast::tokenstream::TokenStream`][rustcts], as mentioned previously. This is
602 converted into the stable [`proc_macro::TokenStream`][stablets] and back in
603 [`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms].
604 Because the Rust ABI is unstable, we use the C ABI for this conversion.
605
606 [tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html
607 [rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
608 [stablets]: https://doc.rust-lang.org/proc_macro/struct.TokenStream.html
609 [pm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro/index.html
610 [pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html
611 [`ParseResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.ParseResult.html
612
613 TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)
614
615 ### Custom Derive
616
617 Custom derives are a special type of proc macro.
618
619 TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)