> `rustc_ast`, `rustc_expand`, and `rustc_builtin_macros` are all undergoing
> refactoring, so some of the links in this chapter may be broken.

Rust has a very powerful macro system. In the previous chapter, we saw how the
parser sets aside macros to be expanded (it temporarily uses [placeholders]).
This chapter is about the process of expanding those macros iteratively until
we have a complete AST for our crate with no unexpanded macros (or a compile
error).

[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html

First, we will discuss the algorithm that expands and integrates macro output
into ASTs. Next, we will take a look at how hygiene data is collected. Finally,
we will look at the specifics of expanding different types of macros.

Many of the algorithms and data structures described below are in [`rustc_expand`],
with basic data structures in [`rustc_expand::base`][base].

Also of note, `cfg` and `cfg_attr` are treated differently from other macros, and are
handled in [`rustc_expand::config`][cfg].

[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
[base]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/index.html
[cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/config/index.html

## Expansion and AST Integration

First of all, expansion happens at the crate level. Given the raw source code for
a crate, the compiler will produce a massive AST with all macros expanded, all
modules inlined, etc. The primary entry point for this process is the
[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we
use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
below for more detailed discussion of edge case expansion issues).

[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html

At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
queue of unresolved macro invocations (that is, macros we haven't found the
definition of yet). We repeatedly try to pick a macro from the queue, resolve
it, expand it, and integrate it back. If we can't make progress in an
iteration, this represents a compile error. Here is the [algorithm][original]:

[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049

1. Initialize a `queue` of unresolved macros.
2. Repeat until `queue` is empty (or we make no progress, which is an error):
   1. [Resolve](./name-resolution.md) imports in our partially built crate as
      much as possible.
   2. Collect as many macro [`Invocation`s][inv] as possible from our
      partially built crate (fn-like, attributes, derives) and add them to the
      queue.
   3. Dequeue the first element, and attempt to resolve it.
   4. If it's resolved:
      1. Run the macro's expander function that consumes a [`TokenStream`] or
         AST and produces a [`TokenStream`] or [`AstFragment`] (depending on
         the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt],
         each of which is a token (punctuation, identifier, or literal) or a
         delimited group (anything inside `()`/`[]`/`{}`)).
         - At this point, we know everything about the macro itself and can
           call `set_expn_data` to fill in its properties in the global data;
           that is the hygiene data associated with `ExpnId`. (See [the
           "Hygiene" section below][hybelow]).
      2. Integrate that piece of AST into the big existing partially built
         AST. This is essentially where the "token-like mass" becomes a
         proper set-in-stone AST with side-tables. It happens as follows:
         - If the macro produces tokens (e.g. a proc macro), we parse into
           an AST, which may produce parse errors.
         - During expansion, we create `SyntaxContext`s (hierarchy 2). (See
           [the "Hygiene" section below][hybelow])
         - These three passes happen one after another on every AST fragment
           freshly expanded from a macro:
           - [`NodeId`]s are assigned by [`InvocationCollector`]. This
             also collects new macro calls from this new AST piece and
             adds them to the queue.
           - ["Def paths"][defpath] are created and [`DefId`]s are
             assigned to them by [`DefCollector`].
           - Names are put into modules (from the resolver's point of
             view) by [`BuildReducedGraphVisitor`].
      3. After expanding a single macro and integrating its output, continue
         to the next iteration of [`fully_expand_fragment`][fef].
   5. If it's not resolved:
      1. Put the macro back in the queue.
      2. Continue to the next iteration.
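
The control flow of this loop can be sketched with a toy model. This is only an
illustration, not the code in `rustc_expand`: the `Output` type and the
`fully_expand` function are hypothetical names invented here, macros are
identified by plain strings, and "resolving" is just a map lookup. The sketch
shows the two properties the algorithm relies on: an unresolved invocation goes
back into the queue, and a full pass over the queue without progress is an
error.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for what expanding a macro produces:
/// further invocations and possibly new macro definitions.
#[derive(Clone, Default)]
struct Output {
    invocations: Vec<&'static str>,
    definitions: Vec<(&'static str, Output)>,
}

/// Returns the order in which macros were expanded, or the invocations that
/// could never be resolved.
fn fully_expand(
    mut known: HashMap<&'static str, Output>,
    initial: &[&'static str],
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut queue: VecDeque<&'static str> = initial.iter().copied().collect();
    let mut order = Vec::new();
    // Number of consecutive dequeues that failed to resolve.
    let mut stalled = 0;

    while let Some(name) = queue.pop_front() {
        match known.get(name).cloned() {
            // Resolved: "expand" and integrate the output.
            Some(output) => {
                order.push(name);
                for (new_name, def) in output.definitions {
                    known.insert(new_name, def);
                }
                queue.extend(output.invocations);
                stalled = 0;
            }
            // Not resolved: put it back and try the next one.
            None => {
                queue.push_back(name);
                stalled += 1;
                // Every queued invocation failed in a row: no progress.
                if stalled == queue.len() {
                    return Err(queue.into_iter().collect());
                }
            }
        }
    }
    Ok(order)
}

fn main() {
    let mut known = HashMap::new();
    // Expanding `define_late` is what makes `late` resolvable.
    known.insert(
        "define_late",
        Output { invocations: vec![], definitions: vec![("late", Output::default())] },
    );
    println!("{:?}", fully_expand(known.clone(), &["late", "define_late"]));
    println!("{:?}", fully_expand(known, &["missing"]));
}
```

In the first call, `late` is dequeued before its definition exists, so it is
re-queued and only expands after `define_late` has been integrated.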

[defpath]: hir.md#identifiers-in-the-hir
[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
[hybelow]: #hygiene-and-hierarchies
[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html

If we make no progress in an iteration, then we have reached a compilation
error (e.g. an undefined macro). We attempt to recover from failures
(unresolved macros or imports) for the sake of diagnostics. This allows
compilation to continue past the first error, so that we can report more errors
at a time. Recovery cannot make compilation succeed: at this point we already
know that it will fail. The recovery happens by expanding unresolved macros into
[`ExprKind::Err`][err].

[err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err

Notice that name resolution is involved here: we need to resolve imports and
macro names in the above algorithm. This is done in
[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
those resolutions, and reports various errors (e.g. "not found" or "found, but
it's unstable" or "expected x, found y"). However, we don't try to resolve
other names yet. This happens later, as we will see in the [next
chapter](./name-resolution.md).

[mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html

### Eager Expansion

_Eager expansion_ means that we expand the arguments of a macro invocation
before the macro invocation itself. This is implemented only for a few special
built-in macros that expect literals; expanding arguments first for some of
these macros results in a smoother user experience. As an example, consider the
following:

```rust,ignore
macro bar($i: ident) { $i }
macro foo($i: ident) { $i }

foo!(bar!(baz));
```

A lazy expansion would expand `foo!` first. An eager expansion would expand
`bar!` first.

Eager expansion is not a generally available feature of Rust. Implementing
eager expansion more generally would be challenging, but we implement it for a
few special built-in macros for the sake of user experience. The built-in
macros are implemented in [`rustc_builtin_macros`], along with some other early
code generation facilities like injection of standard library imports or
generation of the test harness. There are some additional helpers for building
their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally
performs a subset of the things that lazy (normal) expansion does. It is done by
invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to
the whole crate, like we normally do).
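
The user-visible effect of eager expansion can be seen with `concat!`, one of
the built-in macros that expects literals. The sketch below is a small
illustration written for this chapter (the names `NAME` and `wants_literal` are
made up here): `concat!` accepts a nested `stringify!` invocation because its
arguments are expanded first, while an ordinary `macro_rules!` macro that asks
for a literal only ever sees the unexpanded tokens.

```rust
// `concat!` is built in and expects literals, so the nested `stringify!`
// invocation is expanded before `concat!` itself.
const NAME: &str = concat!(stringify!(foo), "_", "bar");

// A user-defined macro gets no eager expansion of its arguments.
macro_rules! wants_literal {
    ($l:literal) => {
        $l
    };
}

fn main() {
    println!("{NAME}");
    println!("{}", wants_literal!("plain"));

    // The following would not compile: the argument reaches the macro as the
    // tokens `stringify ! (foo)`, which do not match `$l:literal`.
    // println!("{}", wants_literal!(stringify!(foo)));
}
```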

### Other Data Structures

Here are some other notable data structures involved in expansion and integration:
- [`ResolverExpand`] - a trait used to break crate dependencies. This allows the
  resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and
  pretty much everything else depending on [`rustc_ast`].
- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by the
  expansion infrastructure in the process of its work.
- [`Annotatable`] - a piece of AST that can be an attribute target, almost the same
  thing as `AstFragment` except for types and patterns, which can be produced by
  macros but cannot be annotated with attributes.
- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a
  different `AstFragment` depending on its [`AstFragmentKind`] - item,
  or expression, or pattern, etc.

[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
[`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html
[`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html
[`ExtCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExtCtxt.html
[`ExpansionData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExpansionData.html
[`Annotatable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.Annotatable.html
[`MacResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MacResult.html
[`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html

## Hygiene and Hierarchies

If you have ever used C/C++ preprocessor macros, you know that there are some
annoying and hard-to-debug gotchas! For example, consider the following C code:

```c
#define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;};

// Then, somewhere else
```

Most people avoid writing C like this, and for good reason: it doesn't
compile. The `struct Bar` defined by the macro clashes with the `struct
Bar` defined in the code. Consider also the following example:

Do you see the problem? We wanted to generate a call `foo(22, 0)`, but instead
we got `foo(0, 0)` because the macro defined its own `y`!

These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
handle names defined _within a macro_. In particular, a hygienic macro system
prevents errors due to names introduced within a macro. Rust macros are hygienic
in that they do not allow one to write the sorts of bugs above.

At a high level, hygiene within the Rust compiler is accomplished by keeping
track of the context where a name is introduced and used. We can then
disambiguate names based on that context. Future iterations of the macro system
will give the macro author greater control over that context. For example,
a macro author may want to introduce a new name to the context where the macro
was called. Alternately, the macro author may be defining a variable for use
only within the macro (i.e. it should not be visible outside the macro).
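
A small runnable illustration of this behaviour for local variables follows.
The macro name `with_local_a` and the function `double_outer` are made up for
this example. The `a` declared inside the macro body and the `a` written at the
call site have different contexts, so the expression passed in still refers to
the caller's variable.

```rust
macro_rules! with_local_a {
    ($e:expr) => {{
        // This `a` is introduced inside the macro, so it lives in the
        // macro's own context and is invisible to `$e`.
        let a = 42;
        let _ = a;
        $e
    }};
}

fn double_outer() -> i32 {
    let a = 1;
    // `a * 2` was written here, so `a` resolves to the `a` above, not to the
    // one defined inside the macro body.
    with_local_a!(a * 2)
}

fn main() {
    println!("{}", double_outer());
}
```

With an unhygienic, purely textual substitution the inner `let a = 42;` would
shadow the caller's `a` and the result would be `84` instead of `2`.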

[code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
[code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
[parsing]: ./the-parser.html

The context is attached to AST nodes. All AST nodes generated by macros have
context attached. Additionally, there may be other nodes that have context
attached, such as some desugared syntax (non-macro-expanded nodes are
considered to just have the "root" context, as described below).
Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
This struct also has hygiene information attached to it, as we will see later.

[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html

Because macro invocations and definitions can be nested, the syntax context of
a node must be a hierarchy. For example, if we expand a macro and there is
another macro invocation or definition in the generated output, then the syntax
context should reflect the nesting.

However, it turns out that there are actually a few types of context we may
want to track for different purposes. Thus, there are not just one but _three_
expansion hierarchies that together comprise the hygiene information for a
crate.

All of these hierarchies need some sort of "macro ID" to identify individual
elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive
an integer ID, assigned continuously starting from 0 as we discover new macro
calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own
parent.

[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms
(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
and structures related to hygiene and expansion that are kept in global data.

The actual hierarchies are stored in [`HygieneData`][hd]. This is a global
piece of data containing hygiene and expansion info that can be accessed from
any [`Ident`] without any context.

[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
[rootid]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html#method.root
[hd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.HygieneData.html
[hy]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html
[hacks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/struct.Resolver.html#method.resolve_crate_root
[`Ident`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Ident.html

### The Expansion Order Hierarchy

The first hierarchy tracks the order of expansions, i.e., when a macro
invocation is in the output of another macro.

Here, the children in the hierarchy will be the "innermost" tokens. The
[`ExpnData`] struct itself contains a subset of properties from both macro
definition and macro call available through global data.
[`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy.

[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
[edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent

```rust,ignore
macro_rules! foo { () => { println!(); } }

fn main() { foo!(); }
```

In this code, the AST nodes that are finally generated would have hierarchy
`ROOT -> id(foo) -> id(println)`.
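
As a rough illustration of the child -> parent links, the sketch below models
the hierarchy for the `foo!`/`println!` example with a toy table. The types
here are simplified stand-ins invented for this example (the `Hygiene` table,
its `fresh` and `chain` methods, and the `name` field are all hypothetical);
they only borrow the names `ExpnId`, `ExpnData`, and `parent` from the text and
do not reflect the real definitions in `rustc_span::hygiene`.

```rust
/// Toy expansion ID: an index into the table below. ID 0 is the root.
#[derive(Clone, Copy, PartialEq, Debug)]
struct ExpnId(usize);

/// Toy expansion data: only a display name and the child -> parent link.
struct ExpnData {
    name: &'static str,
    parent: ExpnId,
}

/// Toy stand-in for the global hygiene table.
struct Hygiene {
    expns: Vec<ExpnData>,
}

impl Hygiene {
    fn new() -> Self {
        // The root expansion is its own parent.
        Hygiene { expns: vec![ExpnData { name: "ROOT", parent: ExpnId(0) }] }
    }

    /// IDs are handed out consecutively as new macro calls are discovered.
    fn fresh(&mut self, name: &'static str, parent: ExpnId) -> ExpnId {
        self.expns.push(ExpnData { name, parent });
        ExpnId(self.expns.len() - 1)
    }

    /// Follows parent links up to the root and returns the chain, root first.
    fn chain(&self, mut id: ExpnId) -> Vec<&'static str> {
        let mut names = vec![self.expns[id.0].name];
        while id != ExpnId(0) {
            id = self.expns[id.0].parent;
            names.push(self.expns[id.0].name);
        }
        names.reverse();
        names
    }
}

fn main() {
    let mut hygiene = Hygiene::new();
    // `foo!()` is written in the crate itself; `println!()` is in its output.
    let foo = hygiene.fresh("foo", ExpnId(0));
    let println = hygiene.fresh("println", foo);
    println!("{}", hygiene.chain(println).join(" -> "));
}
```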

### The Macro Definition Hierarchy

The second hierarchy tracks the order of macro definitions, i.e., when
expanding one macro reveals another macro definition in its output. This
one is a bit tricky and more complex than the other two hierarchies.

[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
[`SyntaxContextData`][scd] contains data associated with the given
`SyntaxContext`; mostly it is a cache for results of filtering that chain in
different ways. [`SyntaxContextData::parent`][scdp] is the child -> parent
link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual
elements in the chain. The "chaining operator" is
[`SyntaxContext::apply_mark`][am] in compiler code.

A [`Span`][span], mentioned above, is actually just a compact representation of
a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned
[`Symbol`] + `Span` (i.e. an interned string + hygiene data).

[`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html
[scd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html
[scdp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.parent
[sc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html
[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark

For built-in macros, we use the context
`SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to
be defined at the hierarchy root. We do the same for proc-macros because we
haven't implemented cross-crate hygiene yet.

If a token had context `X` before being produced by a macro, then after being
produced by the macro it has context `X -> macro_id`. Here are some examples:

Here `ident` originally has context [`SyntaxContext::root()`][scr]. `ident` has
context `ROOT -> id(m)` after it's produced by `m`.

[scr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root

```rust,ignore
macro m() { macro n() { ident } }
```

In this example the `ident` has context `ROOT` originally, then `ROOT -> id(m)`
after the first expansion, then `ROOT -> id(m) -> id(n)`.

Note that these chains are not entirely determined by their last element; in
other words, `ExpnId` is not isomorphic to `SyntaxContext`.

```rust,ignore
macro m($i: ident) { macro n() { ($i, bar) } }
```

After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
`ROOT -> id(m) -> id(n)`.
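
The chaining can be modelled with a small interning table. This is a toy
written for illustration only: `Ctxt`, `CtxtData`, `Table`, and `describe` are
hypothetical names, expansions are identified by strings rather than `ExpnId`s,
and the real `SyntaxContext::apply_mark` does considerably more. The sketch
reproduces the `foo`/`bar` contexts stated above and shows that two chains can
end in the same expansion and still be different contexts.

```rust
/// Toy syntax context: an index into `Table::data`. Index 0 is `ROOT`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Ctxt(usize);

/// Toy context data: the child -> parent link plus the last mark in the chain.
struct CtxtData {
    parent: Ctxt,
    outer_expn: &'static str,
}

struct Table {
    data: Vec<CtxtData>,
}

impl Table {
    fn new() -> Self {
        Table { data: vec![CtxtData { parent: Ctxt(0), outer_expn: "ROOT" }] }
    }

    /// The "chaining operator": extends `ctxt` with one more expansion,
    /// reusing an existing entry if the same chain was built before.
    fn apply_mark(&mut self, ctxt: Ctxt, expn: &'static str) -> Ctxt {
        let existing = self
            .data
            .iter()
            .enumerate()
            .skip(1) // never hand out the root entry
            .find(|(_, d)| d.parent == ctxt && d.outer_expn == expn);
        if let Some((index, _)) = existing {
            return Ctxt(index);
        }
        self.data.push(CtxtData { parent: ctxt, outer_expn: expn });
        Ctxt(self.data.len() - 1)
    }

    /// Renders a chain in the `ROOT -> id(m) -> id(n)` notation.
    fn describe(&self, mut ctxt: Ctxt) -> String {
        let mut marks = Vec::new();
        while ctxt != Ctxt(0) {
            let data = &self.data[ctxt.0];
            marks.push(format!("id({})", data.outer_expn));
            ctxt = data.parent;
        }
        marks.push("ROOT".to_string());
        marks.reverse();
        marks.join(" -> ")
    }
}

fn main() {
    let root = Ctxt(0);
    let mut table = Table::new();

    // `bar` is written inside `m`, then produced again by `n`.
    let after_m = table.apply_mark(root, "m");
    let bar = table.apply_mark(after_m, "n");
    // `foo` is written at the call site and is only produced by `n`.
    let foo = table.apply_mark(root, "n");

    println!("foo: {}", table.describe(foo));
    println!("bar: {}", table.describe(bar));
}
```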

Finally, one last thing to mention is that currently, this hierarchy is subject
to the ["context transplantation hack"][hack]. Basically, the more modern (and
experimental) `macro` macros have stronger hygiene than the older MBE system,
but this can result in weird interactions between the two. The hack is intended
to make things "just work" for now.

[hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732

### The Call-site Hierarchy

The third and final hierarchy tracks the location of macro invocations.

In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link.

[callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site

```rust,ignore
macro bar($i: ident) { $i }
macro foo($i: ident) { $i }

foo!(bar!(baz));
```

For the `baz` AST node in the final output, the first hierarchy is `ROOT ->
id(foo) -> id(bar) -> baz`, while the third hierarchy is `ROOT -> baz`.

Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery
in [`rustc_span::hygiene`][hy].

[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html

## Producing Macro Output

Above, we saw how the output of a macro is integrated into the AST for a crate,
and we also saw how the hygiene data for a crate is generated. But how do we
actually produce the output of a macro? It depends on the type of macro.

There are two types of macros in Rust:
`macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros
(or "proc macros"; including custom derives). During the parsing phase, the normal
Rust parser will set aside the contents of macros and their invocations. Later,
macros are expanded using these portions of the code.

Some important data structures/interfaces here:
- [`SyntaxExtension`] - a lowered macro representation, contains its expander
  function, which transforms a `TokenStream` or AST into another `TokenStream`
  or AST + some additional data like stability, or a list of unstable features
  allowed inside the macro.
- [`SyntaxExtensionKind`] - expander functions may have several different
  signatures (take one token stream, or two, or a piece of AST, etc). This is
  an enum that lists them.
- [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
  traits representing the expander function signatures.

[`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html
[`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html
[`BangProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.BangProcMacro.html
[`TTMacroExpander`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.TTMacroExpander.html
[`AttrProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.AttrProcMacro.html
[`MultiItemModifier`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MultiItemModifier.html

MBEs have their own parser distinct from the normal Rust parser. When macros
are expanded, we may invoke the MBE parser to parse and expand a macro. The
MBE parser, in turn, may call the normal Rust parser when it needs to bind a
metavariable (e.g. `$my_expr`) while parsing the contents of a macro
invocation. The code for macro expansion is in
[`compiler/rustc_expand/src/mbe/`][code_dir].

It's helpful to have an example to refer to. For the remainder of this chapter,
whenever we refer to the "example _definition_", we mean the following:

```rust,ignore
macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}
```

`$mvar` is called a _metavariable_. Unlike normal variables, rather than
binding to a value in a computation, a metavariable binds _at compile time_ to
a tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an
identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other
special tokens, such as `EOF`, which indicates that there are no more tokens.
Token trees result from paired parentheses-like characters (`(`...`)`,
`[`...`]`, and `{`...`}`); they include the opening and closing delimiters and
all the tokens in between (we do require that parentheses-like characters be
balanced). Having
macro expansion operate on token streams rather than the raw bytes of a source
file abstracts away a lot of complexity. The macro expander (and much of the
rest of the compiler) doesn't really care that much about the exact line and
column of some syntactic construct in the code; it cares about what constructs
are used in the code. Using tokens allows us to care about _what_ without
worrying about _where_. For more information about tokens, see the
[Parsing][parsing] chapter of this book.

Whenever we refer to the "example _invocation_", we mean the following snippet:

```rust,ignore
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
```

The process of expanding the macro invocation into the syntax tree
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
called _macro expansion_, and it is the topic of this chapter.

There are two parts to MBE expansion: parsing the definition and parsing the
invocations. Interestingly, both are done by the macro parser.

Basically, the MBE parser is like an NFA-based regex parser. It uses an
algorithm similar in spirit to the [Earley parsing
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].

The interface of the macro parser is as follows (this is slightly simplified):

```rust,ignore
fn parse_tt(
    &mut self,
    parser: &mut Cow<'_, Parser<'_>>,
    matcher: &[MatcherLoc]
) -> ParseResult
```

We use these items in the macro parser:

- `parser` is a reference to the state of a normal Rust parser, including the
  token stream and parsing session. The token stream is what we are about to
  ask the MBE parser to parse. We will consume the raw stream of tokens and
  output a binding of metavariables to corresponding token trees. The parsing
  session can be used to report parser errors.
- `matcher` is a sequence of `MatcherLoc`s that we want to match
  the token stream against. They're converted from token trees before matching.

In the analogy of a regex parser, the token stream is the input and we are matching it
against the pattern `matcher`. Using our examples, the token stream could be the stream of
tokens containing the inside of the example invocation `print foo`, while `matcher`
might be the sequence of token (trees) `print $mvar:ident`.

The output of the parser is a [`ParseResult`], which indicates which of
three cases has occurred:

- Success: the token stream matches the given `matcher`, and we have produced a binding
  from metavariables to the corresponding token trees.
- Failure: the token stream does not match `matcher`. This results in an error message such as
  "No rule expected token _blah_".
- Error: some fatal error has occurred _in the parser_. For example, this
  happens if there is more than one pattern match, since that indicates
  the macro is ambiguous.
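
To make the matching step concrete, here is a heavily simplified sketch. It is
not the NFA-based parser from `macro_parser.rs`: the `Loc` enum, the helper
`is_ident`, and this `parse_tt` are toy definitions written for this example,
tokens are plain strings, there is no repetition support, and only the Success
and Failure cases are modelled. It matches the example invocation `print foo`
against the matchers of both arms of the example definition.

```rust
use std::collections::HashMap;

/// Toy matcher location: a literal token or an `$name:ident` metavariable.
#[derive(Debug, Clone, PartialEq)]
enum Loc {
    Token(&'static str),
    Ident(&'static str),
}

/// Toy result type with only two of the three cases described above.
#[derive(Debug, PartialEq)]
enum ParseResult {
    Success(HashMap<&'static str, String>),
    Failure(String),
}

/// Stand-in for "calling back into the normal Rust parser" for `ident`.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    matches!(chars.next(), Some(c) if c.is_alphabetic() || c == '_')
        && chars.all(|c| c.is_alphanumeric() || c == '_')
}

/// Matches `tokens` against `matcher`, binding metavariables on the way.
fn parse_tt(tokens: &[&str], matcher: &[Loc]) -> ParseResult {
    let mut bindings = HashMap::new();
    let mut rest = tokens.iter();
    for loc in matcher {
        match (loc, rest.next()) {
            (Loc::Token(expected), Some(tok)) if tok == expected => {}
            (Loc::Ident(name), Some(tok)) if is_ident(tok) => {
                bindings.insert(*name, tok.to_string());
            }
            (_, Some(tok)) => {
                return ParseResult::Failure(format!("no rules expected the token `{tok}`"));
            }
            (_, None) => {
                return ParseResult::Failure("unexpected end of macro invocation".to_string());
            }
        }
    }
    match rest.next() {
        Some(tok) => ParseResult::Failure(format!("no rules expected the token `{tok}`")),
        None => ParseResult::Success(bindings),
    }
}

fn main() {
    let first_arm = [Loc::Token("print"), Loc::Ident("mvar")];
    let second_arm = [Loc::Token("print"), Loc::Token("twice"), Loc::Ident("mvar")];
    let invocation = ["print", "foo"];

    println!("{:?}", parse_tt(&invocation, &first_arm));
    println!("{:?}", parse_tt(&invocation, &second_arm));
}
```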

The full interface is defined [here][code_parse_int].

The macro parser does pretty much exactly the same as a normal regex parser with
one exception: in order to parse different types of metavariables, such as
`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
normal Rust parser.

As mentioned above, both definitions and invocations of macros are parsed using
the macro parser. This is extremely non-intuitive and self-referential. The code
to parse macro _definitions_ is in
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for
matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
a `macro_rules` definition should have in its body at least one occurrence of a
token tree followed by `=>` followed by another token tree. When the compiler
comes to a `macro_rules` definition, it uses this pattern to match the two token
trees per rule in the definition of the macro _using the macro parser itself_.
In our example definition, the metavariable `$lhs` would match the patterns of
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
knowledge around for when it needs to expand a macro invocation.
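
Because the definition pattern is itself an ordinary MBE matcher, it can be
tried out from user code. The following sketch is only an illustration: the
macro `split_arms` and the function `printer_arms` are made up for this
example. It feeds the two arms of the example definition through the pattern
`$( $lhs:tt => $rhs:tt );+` and returns each `$lhs`/`$rhs` pair as strings.

```rust
// Same shape as the pattern used for `macro_rules` bodies: one or more
// `lhs => rhs` pairs of token trees, separated by `;`.
macro_rules! split_arms {
    ($( $lhs:tt => $rhs:tt );+) => {
        vec![$( (stringify!($lhs), stringify!($rhs)) ),+]
    };
}

/// Returns the (matcher, body) pair of each arm of the example definition.
fn printer_arms() -> Vec<(&'static str, &'static str)> {
    split_arms! {
        (print $mvar:ident) => { println!("{}", $mvar); };
        (print twice $mvar:ident) => { println!("{}", $mvar); println!("{}", $mvar); }
    }
}

fn main() {
    for (lhs, rhs) in printer_arms() {
        // The exact spacing produced by `stringify!` is not guaranteed.
        println!("lhs = {lhs}");
        println!("rhs = {rhs}");
    }
}
```

Each parenthesized matcher and each braced body is captured as a single token
tree, which is why `tt` is enough to split a definition into its arms.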

When the compiler comes to a macro invocation, it parses that invocation using
the same NFA-based macro parser that is described above. However, the matcher
used is the first token tree (`$lhs`) extracted from the arms of the macro
_definition_. Using our example, we would try to match the token stream `print
foo` from the invocation against the matchers `print $mvar:ident` and `print
twice $mvar:ident` that we previously extracted from the definition. The
algorithm is exactly the same, but when the macro parser comes to a place in the
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
it calls back to the normal Rust parser to get the contents of that
non-terminal. In this case, the Rust parser would look for an `ident` token,
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
proceeds in parsing as normal. Also, note that the arms are tried in order and
the first one whose matcher succeeds is used; if a single matcher can be
matched in more than one way, the parse is ambiguous, while if no arm matches
at all, there is a syntax error.

For more information about the macro parser's implementation, see the comments
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].

### `macro`s and Macros 2.0

There is an old and mostly undocumented effort to improve the MBE system, give
it more hygiene-related features, better scoping and visibility rules, etc. There
hasn't been a lot of work on this recently, unfortunately. Internally, `macro`
macros use the same machinery as today's MBEs; they just have additional
syntactic sugar and are allowed to be in namespaces.

Procedural macros are also expanded during parsing, as mentioned above.
However, they use a rather different mechanism. Rather than having a parser in
the compiler, procedural macros are implemented as custom, third-party crates.
The compiler will compile the proc macro crate and the specially annotated
functions in it (i.e. the proc macros themselves), passing them a stream of tokens.

The proc macro can then transform the token stream and output a new token
stream, which is synthesized into the AST.

It's worth noting that the token stream type used by proc macros is _stable_,
so `rustc` does not use it internally (since our internal data structures are
unstable). The compiler's token stream is
[`rustc_ast::tokenstream::TokenStream`][rustcts], as before. This is
converted into the stable [`proc_macro::TokenStream`][stablets] and back in
[`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms].
Because the Rust ABI is unstable, we use the C ABI for this conversion.

[tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html
[rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
[stablets]: https://doc.rust-lang.org/proc_macro/struct.TokenStream.html
[pm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro/index.html
[pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html
[`ParseResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.ParseResult.html

TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)

Custom derives are a special type of proc macro.

TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)