src/doc/rustc-dev-guide/src/macro-expansion.md

   1 # Macro expansion
   2
   3 <!-- toc -->
   4
   5 > `rustc_ast`, `rustc_expand`, and `rustc_builtin_macros` are all undergoing
   6 > refactoring, so some of the links in this chapter may be broken.
   7
   8 Rust has a very powerful macro system. In the previous chapter, we saw how the
   9 parser sets aside macros to be expanded (it temporarily uses [placeholders]).
  10 This chapter is about the process of expanding those macros iteratively until
  11 we have a complete AST for our crate with no unexpanded macros (or a compile
  12 error).
  13
  14 [placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html
  15
  16 First, we will discuss the algorithm that expands and integrates macro output
  17 into ASTs. Next, we will take a look at how hygiene data is collected. Finally,
  18 we will look at the specifics of expanding different types of macros.
  19
  20 Many of the algorithms and data structures described below are in [`rustc_expand`],
  21 with basic data structures in [`rustc_expand::base`][base].
  22
  23 Also of note, `cfg` and `cfg_attr` are treated differently from other macros, and are
  24 handled in [`rustc_expand::config`][cfg].
  25
  26 [`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
  27 [base]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/index.html
  28 [cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/config/index.html
  29
  30 ## Expansion and AST Integration
  31
  32 First of all, expansion happens at the crate level. Given the raw source code for
  33 a crate, the compiler will produce a massive AST with all macros expanded, all
  34 modules inlined, etc. The primary entry point for this process is the
  35 [`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we
  36 use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
  37 below for more detailed discussion of edge case expansion issues).
  38
  39 [`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
  40 [reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html
  41
  42 At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
  43 queue of unresolved macro invocations (that is, macros we haven't found the
  44 definition of yet). We repeatedly try to pick a macro from the queue, resolve
  45 it, expand it, and integrate it back. If we can't make progress in an
  46 iteration, this represents a compile error.  Here is the [algorithm][original]:
  47
  48 [fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
  49 [original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
  50
  51 1. Initialize a `queue` of unresolved macros.
  52 2. Repeat until `queue` is empty (or we make no progress, which is an error):
  53    1. [Resolve](./name-resolution.md) imports in our partially built crate as
  54       much as possible.
  55    2. Collect as many macro [`Invocation`s][inv] as possible from our
  56       partially built crate (fn-like, attributes, derives) and add them to the
  57       queue.
  58    3. Dequeue the first element, and attempt to resolve it.
  59    4. If it's resolved:
  60       1. Run the macro's expander function that consumes a [`TokenStream`] or
  61          AST and produces a [`TokenStream`] or [`AstFragment`] (depending on
  62          the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt],
  63          each of which is a token (punctuation, identifier, or literal) or a
  64          delimited group (anything inside `()`/`[]`/`{}`)).
  65          - At this point, we know everything about the macro itself and can
  66            call `set_expn_data` to fill in its properties in the global data;
  67            that is the hygiene data associated with `ExpnId`. (See [the
  68            "Hygiene" section below][hybelow]).
  69       2. Integrate that piece of AST into the big existing partially built
  70          AST. This is essentially where the "token-like mass" becomes a
  71          proper set-in-stone AST with side-tables. It happens as follows:
  72          - If the macro produces tokens (e.g. a proc macro), we parse into
  73            an AST, which may produce parse errors.
  74          - During expansion, we create `SyntaxContext`s (hierarchy 2). (See
  75            [the "Hygiene" section below][hybelow])
  76          - These three passes happen one after another on every AST fragment
  77            freshly expanded from a macro:
  78            - [`NodeId`]s are assigned by [`InvocationCollector`]. This
  79              also collects new macro calls from this new AST piece and
  80              adds them to the queue.
  81            - ["Def paths"][defpath] are created and [`DefId`]s are
  82              assigned to them by [`DefCollector`].
  83            - Names are put into modules (from the resolver's point of
  84              view) by [`BuildReducedGraphVisitor`].
  85       3. After expanding a single macro and integrating its output, continue
  86          to the next iteration of [`fully_expand_fragment`][fef].
  87    5. If it's not resolved:
  88       1. Put the macro back in the queue
  89       2. Continue to next iteration...
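
The loop above can be sketched as a toy model. Everything in this sketch is
hypothetical: `ToyOutput`, `fully_expand`, and the string names stand in for
the real `Invocation`s, `AstFragment`s, and resolver, and the "no progress"
check is just one simple way to detect a stalled queue, not necessarily how
`rustc_expand` does it.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for what one macro produces when expanded.
#[derive(Clone, Default)]
struct ToyOutput {
    /// Macro definitions revealed by the expansion.
    defines: Vec<(&'static str, ToyOutput)>,
    /// Further macro invocations found in the output.
    invokes: Vec<&'static str>,
    /// Plain items that go straight into the "AST".
    items: Vec<&'static str>,
}

/// Expands `initial` invocations until the queue is empty (`Ok`) or a full
/// pass over the queue resolves nothing (`Err` with the stuck invocations).
fn fully_expand(
    initial: Vec<&'static str>,
    mut defs: HashMap<&'static str, ToyOutput>,
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut queue: VecDeque<&'static str> = initial.into();
    let mut items = Vec::new();
    // Number of consecutive invocations we failed to resolve.
    let mut stalled = 0;

    while let Some(name) = queue.pop_front() {
        match defs.get(name).cloned() {
            Some(output) => {
                // Resolved: integrate the output and collect new work.
                items.extend(output.items);
                queue.extend(output.invokes);
                defs.extend(output.defines);
                stalled = 0;
            }
            None => {
                // Not resolved yet: put it back and try the next one.
                queue.push_back(name);
                stalled += 1;
                if stalled == queue.len() {
                    // Every pending invocation was tried without progress.
                    return Err(queue.into_iter().collect());
                }
            }
        }
    }
    Ok(items)
}
```

Calling `fully_expand(vec!["n", "m"], defs)` where only `m` is defined up
front and `m` defines `n` succeeds: `n` is put back once, `m` is expanded, and
`n` resolves on the next attempt.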
  90
  91 [defpath]: hir.md#identifiers-in-the-hir
  92 [`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
  93 [`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
  94 [`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
  95 [`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
  96 [`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
  97 [hybelow]: #hygiene-and-hierarchies
  98 [tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
  99 [`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
 100 [inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
 101 [`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
 102
 103 ### Error Recovery
 104
 105 If we make no progress in an iteration, then we have reached a compilation
 106 error (e.g. an undefined macro). We attempt to recover from failures
 107 (unresolved macros or imports) for the sake of diagnostics. This allows
 108 compilation to continue past the first error, so that we can report more errors
 109 at a time. Recovery can't cause compilation to succeed. We know that it will
 110 fail at this point. The recovery happens by expanding unresolved macros into
 111 [`ExprKind::Err`][err].
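
As a rough illustration of this recovery strategy, here is a minimal sketch
with a hypothetical `ToyExpr` type (the real `ExprKind` is far richer, and the
diagnostic text below is only an example). Unresolved macro calls are replaced
by an error node and a diagnostic is recorded, so that later passes still have
a complete tree to walk.

```rust
/// Hypothetical, heavily simplified expression type.
#[derive(Debug, PartialEq)]
enum ToyExpr {
    Lit(i64),
    MacCall(&'static str),
    Err,
}

/// Replaces every macro call that cannot be resolved with `ToyExpr::Err`
/// and records one diagnostic per failure.
fn recover_unresolved(
    exprs: Vec<ToyExpr>,
    known_macros: &[&str],
) -> (Vec<ToyExpr>, Vec<String>) {
    let mut diagnostics = Vec::new();
    let recovered = exprs
        .into_iter()
        .map(|expr| match expr {
            ToyExpr::MacCall(name) if !known_macros.contains(&name) => {
                diagnostics.push(format!("cannot find macro `{}` in this scope", name));
                ToyExpr::Err
            }
            // Literals and resolvable macro calls are left untouched.
            other => other,
        })
        .collect();
    (recovered, diagnostics)
}
```

A non-empty diagnostics list means compilation still fails; the error nodes
only exist so that more errors can be reported in one run.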
 112
 113 [err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err
 114
 115 ### Name Resolution
 116
 117 Notice that name resolution is involved here: we need to resolve imports and
 118 macro names in the above algorithm. This is done in
 119 [`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
 120 those resolutions, and reports various errors (e.g. "not found" or "found, but
 121 it's unstable" or "expected x, found y"). However, we don't try to resolve
 122 other names yet. This happens later, as we will see in the [next
 123 chapter](./name-resolution.md).
 124
 125 [mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html
 126
 127 ### Eager Expansion
 128
 129 _Eager expansion_ means that we expand the arguments of a macro invocation
 130 before the macro invocation itself. This is implemented only for a few special
 131 built-in macros that expect literals; expanding arguments first for some of
 132 these macros results in a smoother user experience.  As an example, consider the
 133 following:
 134
 135 ```rust,ignore
 136 macro bar($i: ident) { $i }
 137 macro foo($i: ident) { $i }
 138
 139 foo!(bar!(baz));
 140 ```
 141
 142 A lazy expansion would expand `foo!` first. An eager expansion would expand
 143 `bar!` first.
 144
 145 Eager expansion is not a generally available feature of Rust.  Implementing
 146 eager expansion more generally would be challenging, but we implement it for a
 147 few special built-in macros for the sake of user experience.  The built-in
 148 macros are implemented in [`rustc_builtin_macros`], along with some other early
 149 code generation facilities like injection of standard library imports or
 150 generation of the test harness. There are some additional helpers for building
 151 their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally
 152 performs a subset of the things that lazy (normal) expansion does. It is done by
 153 invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to
 154 the whole crate, like we normally do).
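
The difference is observable from user code. In the sketch below, `count_tts!`
is a hypothetical helper written for this illustration; `concat!` and
`stringify!` are real built-in macros. An ordinary `macro_rules!` macro
receives its argument unexpanded, while `concat!` expands the nested
invocations before using them.

```rust
/// Counts the token trees it receives, without expanding them first.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// Lazy (normal) expansion: the outer macro sees the raw tokens
/// `stringify`, `!`, and `(foo)`.
fn lazy_token_count() -> usize {
    count_tts!(stringify!(foo))
}

/// Eager expansion: `concat!` needs literals, so the nested `stringify!`
/// invocations are expanded first.
fn eager_concat() -> &'static str {
    concat!(stringify!(foo), "_", stringify!(bar))
}
```

Here `lazy_token_count()` evaluates to `3` and `eager_concat()` to
`"foo_bar"`.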
 155
 156 ### Other Data Structures
 157
 158 Here are some other notable data structures involved in expansion and integration:
 159 - [`ResolverExpand`] - a trait used to break crate dependencies. This allows the
 160   resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and
 161   pretty much everything else depending on [`rustc_ast`].
 162 - [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion
 163   infrastructure in the process of its work
 164 - [`Annotatable`] - a piece of AST that can be an attribute target, almost the same
 165   thing as `AstFragment` except for types and patterns, which can be produced by
 166   macros but cannot be annotated with attributes
 167 - [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a
 168   different `AstFragment` depending on its [`AstFragmentKind`] - item,
 169   or expression, or pattern etc.
 170
 171 [`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
 172 [`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html
 173 [`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html
 174 [`ExtCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExtCtxt.html
 175 [`ExpansionData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExpansionData.html
 176 [`Annotatable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.Annotatable.html
 177 [`MacResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MacResult.html
 178 [`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html
 179
 180 ## Hygiene and Hierarchies
 181
 182 If you have ever used C/C++ preprocessor macros, you know that there are some
 183 annoying and hard-to-debug gotchas! For example, consider the following C code:
 184
 185 ```c
 186 #define DEFINE_FOO struct Bar {int x;}; struct Foo {struct Bar bar;};
 187
 188 // Then, somewhere else
 189 struct Bar {
 190     ...
 191 };
 192
 193 DEFINE_FOO
 194 ```
 195
 196 Most people avoid writing C like this – and for good reason: it doesn't
 197 compile. The `struct Bar` defined by the macro clashes names with the `struct
 198 Bar` defined in the code. Consider also the following example:
 199
 200 ```c
 201 #define DO_FOO(x) {\
 202     int y = 0;\
 203     foo(x, y);\
 204     }
 205
 206 // Then elsewhere
 207 int y = 22;
 208 DO_FOO(y);
 209 ```
 210
 211 Do you see the problem? We wanted to generate a call `foo(22, 0)`, but instead
 212 we got `foo(0, 0)` because the macro defined its own `y`!
 213
 214 These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
 215 handle names defined _within a macro_. In particular, a hygienic macro system
 216 prevents errors due to names introduced within a macro. Rust macros are hygienic
 217 in that they do not allow one to write the sorts of bugs above.
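
For comparison, here is a Rust rendering of the `DO_FOO` example. This is a
sketch written for this chapter; `do_foo!`, `foo`, and `call_with_local_y` are
made-up names. The `y` introduced inside the macro does not capture the `y`
passed in by the caller.

```rust
/// Stand-in for the `foo` function of the C example; it just returns its
/// arguments so the result can be inspected.
fn foo(a: i32, b: i32) -> (i32, i32) {
    (a, b)
}

macro_rules! do_foo {
    ($x:expr) => {{
        // This `y` lives in the macro's own syntax context.
        let y = 0;
        foo($x, y)
    }};
}

fn call_with_local_y() -> (i32, i32) {
    // This `y` lives in the caller's syntax context.
    let y = 22;
    do_foo!(y)
}
```

Unlike the C version, `call_with_local_y()` yields `(22, 0)`: the argument
still refers to the caller's `y`.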
 218
 219 At a high level, hygiene within the Rust compiler is accomplished by keeping
 220 track of the context where a name is introduced and used. We can then
 221 disambiguate names based on that context. Future iterations of the macro system
 222 will allow greater control to the macro author to use that context. For example,
 223 a macro author may want to introduce a new name to the context where the macro
 224 was called. Alternately, the macro author may be defining a variable for use
 225 only within the macro (i.e. it should not be visible outside the macro).
 226
 227 [code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
 228 [code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
 229 [code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules
 230 [code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
 231 [parsing]: ./the-parser.html
 232
 233 The context is attached to AST nodes. All AST nodes generated by macros have
 234 context attached. Additionally, there may be other nodes that have context
 235 attached, such as some desugared syntax (non-macro-expanded nodes are
 236 considered to just have the "root" context, as described below).
 237 Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
 238 This struct also has hygiene information attached to it, as we will see later.
 239
 240 [span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html
 241
 242 Because macro invocations and definitions can be nested, the syntax context of
 243 a node must be a hierarchy. For example, if we expand a macro and there is
 244 another macro invocation or definition in the generated output, then the syntax
 245 context should reflect the nesting.
 246
 247 However, it turns out that there are actually a few types of context we may
 248 want to track for different purposes. Thus, there are not just one but _three_
 249 expansion hierarchies that together comprise the hygiene information for a
 250 crate.
 251
 252 All of these hierarchies need some sort of "macro ID" to identify individual
 253 elements in the chain of expansions. This ID is [`ExpnId`].  All macros receive
 254 an integer ID, assigned continuously starting from 0 as we discover new macro
 255 calls.  All hierarchies start at [`ExpnId::root()`][rootid], which is its own
 256 parent.
 257
 258 [`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms
 259 (with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
 260 and structures related to hygiene and expansion that are kept in global data.
 261
 262 The actual hierarchies are stored in [`HygieneData`][hd]. This is a global
 263 piece of data containing hygiene and expansion info that can be accessed from
 264 any [`Ident`] without any context.
 265
 266
 267 [`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
 268 [rootid]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html#method.root
 269 [hd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.HygieneData.html
 270 [hy]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html
 271 [hacks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/struct.Resolver.html#method.resolve_crate_root
 272 [`Ident`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Ident.html
 273
 274 ### The Expansion Order Hierarchy
 275
 276 The first hierarchy tracks the order of expansions, i.e., when a macro
 277 invocation is in the output of another macro.
 278
 279 Here, the children in the hierarchy will be the "innermost" tokens.  The
 280 [`ExpnData`] struct itself contains a subset of properties from both macro
 281 definition and macro call available through global data.
 282 [`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy.
 283
 284 [`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
 285 [edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent
 286
 287 For example,
 288
 289 ```rust,ignore
 290 macro_rules! foo { () => { println!(); } }
 291
 292 fn main() { foo!(); }
 293 ```
 294
 295 In this code, the AST nodes that are finally generated would have the hierarchy:
 296
 297 ```
 298 root
 299     expn_id_foo
 300         expn_id_println
 301 ```
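
The parent links can be modeled with a small table. This is a toy sketch:
`ToyExpnData` and `ToyHygieneData` are hypothetical and keep only a name and a
parent index, whereas the real `ExpnData` stores much more.

```rust
/// Hypothetical stand-in for `ExpnData`: a name and the parent link.
struct ToyExpnData {
    name: &'static str,
    /// Index of the parent expansion; the root is its own parent.
    parent: usize,
}

/// Hypothetical stand-in for the global hygiene table.
struct ToyHygieneData {
    expn_data: Vec<ToyExpnData>,
}

impl ToyHygieneData {
    fn new() -> Self {
        // Index 0 is the root expansion.
        Self { expn_data: vec![ToyExpnData { name: "root", parent: 0 }] }
    }

    /// Registers a new expansion and returns its ID (IDs count up from 0).
    fn fresh_expn(&mut self, name: &'static str, parent: usize) -> usize {
        self.expn_data.push(ToyExpnData { name, parent });
        self.expn_data.len() - 1
    }

    /// Follows child -> parent links from `id` up to the root.
    fn ancestors(&self, mut id: usize) -> Vec<&'static str> {
        let mut chain = vec![self.expn_data[id].name];
        while self.expn_data[id].parent != id {
            id = self.expn_data[id].parent;
            chain.push(self.expn_data[id].name);
        }
        chain
    }
}
```

Registering `foo` under the root and `println` under `foo` reproduces the
hierarchy shown above when walking upwards from `println`.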
 302
 303 ### The Macro Definition Hierarchy
 304
 305 The second hierarchy tracks the order of macro definitions, i.e., when
 306 expanding one macro reveals another macro definition in its output.  This
 307 one is a bit tricky and more complex than the other two hierarchies.
 308
 309 [`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
 310 [`SyntaxContextData`][scd] contains data associated with the given
 311 `SyntaxContext`; mostly it is a cache for results of filtering that chain in
 312 different ways.  [`SyntaxContextData::parent`][scdp] is the child -> parent
 313 link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual
 314 elements in the chain.  The "chaining operator" is
 315 [`SyntaxContext::apply_mark`][am] in compiler code.
 316
 317 A [`Span`][span], mentioned above, is actually just a compact representation of
 318 a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned
 319 [`Symbol`] + `Span` (i.e. an interned string + hygiene data).
 320
 321 [`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html
 322 [scd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html
 323 [scdp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.parent
 324 [sc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html
 325 [scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
 326 [am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
 327
 328 For built-in macros, we use the context:
 329 `SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to
 330 be defined at the hierarchy root. We do the same for proc-macros because we
 331 haven't implemented cross-crate hygiene yet.
 332
 333 If the token had context `X` before being produced by a macro then after being
 334 produced by the macro it has context `X -> macro_id`. Here are some examples:
 335
 336 Example 0:
 337
 338 ```rust,ignore
 339 macro m() { ident }
 340
 341 m!();
 342 ```
 343
 344 Here `ident` originally has context [`SyntaxContext::root()`][scr]. `ident` has
 345 context `ROOT -> id(m)` after it's produced by `m`.
 346
 347 [scr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root
 348
 349
 350 Example 1:
 351
 352 ```rust,ignore
 353 macro m() { macro n() { ident } }
 354
 355 m!();
 356 n!();
 357 ```
 358 In this example the `ident` has context `ROOT` originally, then `ROOT -> id(m)`
 359 after the first expansion, then `ROOT -> id(m) -> id(n)`.
 360
 361 Example 2:
 362
 363 Note that these chains are not entirely determined by their last element; in
 364 other words, `ExpnId` is not isomorphic to `SyntaxContext`.
 365
 366 ```rust,ignore
 367 macro m($i: ident) { macro n() { ($i, bar) } }
 368
 369 m!(foo);
 370 ```
 371
 372 After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
 373 `ROOT -> id(m) -> id(n)`.
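
Example 2 can be replayed with a toy model of the chaining operator. The types
below (`ToyContexts`, `ToyCtxt`, `ToyExpnId`) are hypothetical simplifications,
and the interning scheme is an assumption made for the sketch; only the name
`apply_mark` is borrowed from the compiler.

```rust
use std::collections::HashMap;

type ToyExpnId = u32;
type ToyCtxt = u32;

/// The root context, which carries no marks.
const ROOT: ToyCtxt = 0;

/// Hypothetical table of syntax contexts: each entry stores the parent
/// context and the outermost expansion that was applied to it.
struct ToyContexts {
    data: Vec<(ToyCtxt, ToyExpnId)>,
    interned: HashMap<(ToyCtxt, ToyExpnId), ToyCtxt>,
}

impl ToyContexts {
    fn new() -> Self {
        // Entry 0 is the root; its fields are never read.
        Self { data: vec![(ROOT, 0)], interned: HashMap::new() }
    }

    /// The "chaining operator": extends `ctxt` with `expn`, reusing an
    /// existing context if the same chain was built before.
    fn apply_mark(&mut self, ctxt: ToyCtxt, expn: ToyExpnId) -> ToyCtxt {
        if let Some(&existing) = self.interned.get(&(ctxt, expn)) {
            return existing;
        }
        let new = self.data.len() as ToyCtxt;
        self.data.push((ctxt, expn));
        self.interned.insert((ctxt, expn), new);
        new
    }

    /// The last element of the chain.
    fn outer_expn(&self, ctxt: ToyCtxt) -> ToyExpnId {
        self.data[ctxt as usize].1
    }

    /// The whole chain of marks, listed from the root outwards.
    fn chain(&self, mut ctxt: ToyCtxt) -> Vec<ToyExpnId> {
        let mut marks = Vec::new();
        while ctxt != ROOT {
            let (parent, expn) = self.data[ctxt as usize];
            marks.push(expn);
            ctxt = parent;
        }
        marks.reverse();
        marks
    }
}
```

With `m` and `n` as two distinct expansion IDs, `apply_mark(ROOT, n)` (the
context of `foo`) and `apply_mark(apply_mark(ROOT, m), n)` (the context of
`bar`) share the same `outer_expn` but are different contexts.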
 374
 375 Finally, one last thing to mention is that currently, this hierarchy is subject
 376 to the ["context transplantation hack"][hack]. Basically, the more modern (and
 377 experimental) `macro` macros have stronger hygiene than the older MBE system,
 378 but this can result in weird interactions between the two. The hack is intended
 379 to make things "just work" for now.
 380
 381 [hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732
 382
 383 ### The Call-site Hierarchy
 384
 385 The third and final hierarchy tracks the location of macro invocations.
 386
 387 In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link.
 388
 389 [callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site
 390
 391 Here is an example:
 392
 393 ```rust,ignore
 394 macro bar($i: ident) { $i }
 395 macro foo($i: ident) { $i }
 396
 397 foo!(bar!(baz));
 398 ```
 399
 400 For the `baz` AST node in the final output, the first hierarchy is `ROOT ->
 401 id(foo) -> id(bar) -> baz`, while the third hierarchy is `ROOT -> baz`.
 402
 403 ### Macro Backtraces
 404
 405 Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery
 406 in [`rustc_span::hygiene`][hy].
 407
 408 [`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html
 409
 410 ## Producing Macro Output
 411
 412 Above, we saw how the output of a macro is integrated into the AST for a crate,
 413 and we also saw how the hygiene data for a crate is generated. But how do we
 414 actually produce the output of a macro? It depends on the type of macro.
 415
 416 There are two types of macros in Rust:
 417 `macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros
 418 (or "proc macros"; including custom derives). During the parsing phase, the normal
 419 Rust parser will set aside the contents of macros and their invocations. Later,
 420 macros are expanded using these portions of the code.
 421
 422 Some important data structures/interfaces here:
 423 - [`SyntaxExtension`] - a lowered macro representation, contains its expander
 424   function, which transforms a `TokenStream` or AST into another `TokenStream`
 425   or AST + some additional data like stability, or a list of unstable features
 426   allowed inside the macro.
 427 - [`SyntaxExtensionKind`] - expander functions may have several different
 428   signatures (take one token stream, or two, or a piece of AST, etc). This is
 429   an enum that lists them.
 430 - [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
 431   traits representing the expander function signatures.
 432
 433 [`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html
 434 [`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html
 435 [`BangProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.BangProcMacro.html
 436 [`TTMacroExpander`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.TTMacroExpander.html
 437 [`AttrProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.AttrProcMacro.html
 438 [`MultiItemModifier`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MultiItemModifier.html
 439
 440 ## Macros By Example
 441
 442 MBEs have their own parser distinct from the normal Rust parser. When macros
 443 are expanded, we may invoke the MBE parser to parse and expand a macro.  The
 444 MBE parser, in turn, may call the normal Rust parser when it needs to bind a
 445 metavariable (e.g.  `$my_expr`) while parsing the contents of a macro
 446 invocation. The code for macro expansion is in
 447 [`compiler/rustc_expand/src/mbe/`][code_dir].
 448
 449 ### Example
 450
 451 It's helpful to have an example to refer to. For the remainder of this chapter,
 452 whenever we refer to the "example _definition_", we mean the following:
 453
 454 ```rust,ignore
 455 macro_rules! printer {
 456     (print $mvar:ident) => {
 457         println!("{}", $mvar);
 458     };
 459     (print twice $mvar:ident) => {
 460         println!("{}", $mvar);
 461         println!("{}", $mvar);
 462     };
 463 }
 464 ```
 465
 466 `$mvar` is called a _metavariable_. Unlike normal variables, rather than
 467 binding to a value in a computation, a metavariable binds _at compile time_ to
 468 a tree of _tokens_.  A _token_ is a single "unit" of the grammar, such as an
 469 identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other
 470 special tokens, such as `EOF`, which indicates that there are no more tokens.
 471 Token trees result from paired parentheses-like characters (`(`...`)`,
 472 `[`...`]`, and `{`...`}`); they include the open and close delimiters and all the tokens
 473 in between (we do require that parentheses-like characters be balanced). Having
 474 macro expansion operate on token streams rather than the raw bytes of a source
 475 file abstracts away a lot of complexity. The macro expander (and much of the
 476 rest of the compiler) doesn't really care that much about the exact line and
 477 column of some syntactic construct in the code; it cares about what constructs
 478 are used in the code. Using tokens allows us to care about _what_ without
 479 worrying about _where_. For more information about tokens, see the
 480 [Parsing][parsing] chapter of this book.
 481
 482 Whenever we refer to the "example _invocation_", we mean the following snippet:
 483
 484 ```rust,ignore
 485 printer!(print foo); // Assume `foo` is a variable defined somewhere else...
 486 ```
 487
 488 The process of expanding the macro invocation into the syntax tree
 489 `println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
 490 called _macro expansion_, and it is the topic of this chapter.
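
To observe both arms being selected, here is a variant of the example
definition. It is an adaptation made for illustration, not the example
definition itself: `printer_string!` uses `format!` instead of `println!` so
that the result can be inspected, and the two wrapper functions are
hypothetical.

```rust
macro_rules! printer_string {
    (print $mvar:ident) => {
        format!("{}", $mvar)
    };
    (print twice $mvar:ident) => {
        format!("{}{}", $mvar, $mvar)
    };
}

/// Matches the first arm: `$mvar` binds to the identifier `foo`.
fn print_once() -> String {
    let foo = 7;
    printer_string!(print foo)
}

/// The first arm binds `twice` but then hits the extra token `foo`, so the
/// second arm is the one that matches.
fn print_twice() -> String {
    let foo = 7;
    printer_string!(print twice foo)
}
```

`print_once()` produces `"7"` and `print_twice()` produces `"77"`.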
 491
 492 ### The MBE parser
 493
 494 There are two parts to MBE expansion: parsing the definition and parsing the
 495 invocations. Interestingly, both are done by the macro parser.
 496
 497 Basically, the MBE parser is like an NFA-based regex parser. It uses an
 498 algorithm similar in spirit to the [Earley parsing
 499 algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
 500 defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
 501
 502 The interface of the macro parser is as follows (this is slightly simplified):
 503
 504 ```rust,ignore
 505 fn parse_tt(
 506     &mut self,
 507     parser: &mut Cow<'_, Parser<'_>>,
 508     matcher: &[MatcherLoc]
 509 ) -> ParseResult
 510 ```
 511
 512 We use these items in the macro parser:
 513
 514 - `parser` is a reference to the state of a normal Rust parser, including the
 515   token stream and parsing session. The token stream is what we are about to
 516   ask the MBE parser to parse. We will consume the raw stream of tokens and
 517   output a binding of metavariables to corresponding token trees. The parsing
 518   session can be used to report parser errors.
 519 - `matcher` is a sequence of `MatcherLoc`s that we want to match
 520   the token stream against. They're converted from token trees before matching.
 521
 522 In the analogy of a regex parser, the token stream is the input and we are matching it
 523 against the pattern `matcher`. Using our examples, the token stream could be the stream of
 524 tokens containing the inside of the example invocation `print foo`, while `matcher`
 525 might be the sequence of token (trees) `print $mvar:ident`.
 526
 527 The output of the parser is a [`ParseResult`], which indicates which of
 528 three cases has occurred:
 529
 530 - Success: the token stream matches the given `matcher`, and we have produced a binding
 531   from metavariables to the corresponding token trees.
 532 - Failure: the token stream does not match `matcher`. This results in an error message such as
 533   "No rule expected token _blah_".
 534 - Error: some fatal error has occurred _in the parser_. For example, this
 535   happens if there is more than one pattern match, since that indicates
 536   the macro is ambiguous.
 537
 538 The full interface is defined [here][code_parse_int].
 539
 540 The macro parser does pretty much exactly the same as a normal regex parser with
 541 one exception: in order to parse different types of metavariables, such as
 542 `ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
 543 normal Rust parser.
 544
 545 As mentioned above, both definitions and invocations of macros are parsed using
 546 the macro parser. This is extremely non-intuitive and self-referential. The code
 547 to parse macro _definitions_ is in
 548 [`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for
 549 matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
 550 a `macro_rules` definition should have in its body at least one occurrence of a
 551 token tree followed by `=>` followed by another token tree. When the compiler
 552 comes to a `macro_rules` definition, it uses this pattern to match the two token
 553 trees per rule in the definition of the macro _using the macro parser itself_.
 554 In our example definition, the metavariable `$lhs` would match the patterns of
 555 both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`.  And `$rhs`
 556 would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
 557 println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
 558 knowledge around for when it needs to expand a macro invocation.
 559
 560 When the compiler comes to a macro invocation, it parses that invocation using
 561 the same NFA-based macro parser that is described above. However, the matcher
 562 used is the first token tree (`$lhs`) extracted from the arms of the macro
 563 _definition_. Using our example, we would try to match the token stream `print
 564 foo` from the invocation against the matchers `print $mvar:ident` and `print
 565 twice $mvar:ident` that we previously extracted from the definition.  The
 566 algorithm is exactly the same, but when the macro parser comes to a place in the
 567 current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
 568 it calls back to the normal Rust parser to get the contents of that
 569 non-terminal. In this case, the Rust parser would look for an `ident` token,
 570 which it finds (`foo`) and returns to the macro parser. Then, the macro parser
 571 proceeds in parsing as normal. Also, note that the matchers from the various
 572 arms are tried in order, and the first one that matches the invocation is
 573 used; if there are no matches at all, there is a syntax error. An ambiguous
 574 parse within a single matcher is reported as an error, as described above.
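
The matching step can be illustrated with a deliberately simplified matcher.
This sketch is not the NFA-based algorithm used by `macro_parser.rs`: it
handles only literal tokens and `ident` metavariables, checks them strictly
left to right, and takes the first arm that matches. All names in it
(`MatcherElem`, `ToyParseResult`, `toy_parse`, `first_matching_arm`) are
hypothetical.

```rust
use std::collections::HashMap;

/// One element of a toy matcher: a literal token or an `ident` metavariable.
#[derive(Debug, Clone, PartialEq)]
enum MatcherElem {
    Token(&'static str),
    MetaVarIdent(&'static str),
}

/// Toy counterpart of `ParseResult`, without the fatal `Error` case.
#[derive(Debug, PartialEq)]
enum ToyParseResult {
    Success(HashMap<&'static str, String>),
    Failure(String),
}

/// Stand-in for calling back into the normal Rust parser for an `ident`.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// Matches `tokens` against one `matcher`, binding metavariables on the way.
fn toy_parse(tokens: &[&str], matcher: &[MatcherElem]) -> ToyParseResult {
    let mut bindings = HashMap::new();
    let mut input = tokens.iter();
    for elem in matcher {
        let tok = match input.next() {
            Some(tok) => *tok,
            None => return ToyParseResult::Failure("unexpected end of input".to_string()),
        };
        match elem {
            MatcherElem::Token(expected) if *expected == tok => {}
            MatcherElem::MetaVarIdent(name) if is_ident(tok) => {
                bindings.insert(*name, tok.to_string());
            }
            _ => {
                return ToyParseResult::Failure(format!("no rules expected the token `{}`", tok))
            }
        }
    }
    match input.next() {
        None => ToyParseResult::Success(bindings),
        Some(extra) => {
            ToyParseResult::Failure(format!("no rules expected the token `{}`", extra))
        }
    }
}

/// Tries each arm in order and returns the index and bindings of the first
/// arm whose matcher succeeds.
fn first_matching_arm(
    tokens: &[&str],
    arms: &[Vec<MatcherElem>],
) -> Option<(usize, HashMap<&'static str, String>)> {
    arms.iter().enumerate().find_map(|(i, arm)| match toy_parse(tokens, arm) {
        ToyParseResult::Success(bindings) => Some((i, bindings)),
        ToyParseResult::Failure(_) => None,
    })
}
```

With the two matchers of the example definition, the tokens `print foo` select
arm 0 with `mvar` bound to `foo`, and `print twice foo` selects arm 1.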
 575
 576 For more information about the macro parser's implementation, see the comments
 577 in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
 578
 579 ### `macro`s and Macros 2.0
 580
 581 There is an old and mostly undocumented effort to improve the MBE system, give
 582 it more hygiene-related features, better scoping and visibility rules, etc. There
 583 hasn't been a lot of work on this recently, unfortunately. Internally, `macro`
 584 macros use the same machinery as today's MBEs; they just have additional
 585 syntactic sugar and are allowed to be in namespaces.
 586
 587 ## Procedural Macros
 588
 589 Procedural macros are also expanded during parsing, as mentioned above.
 590 However, they use a rather different mechanism. Rather than having a parser in
 591 the compiler, procedural macros are implemented as custom, third-party crates.
 592 The compiler will compile the proc macro crate and call the specially annotated
 593 functions in it (i.e. the proc macros themselves), passing them a stream of tokens.
 594
 595 The proc macro can then transform the token stream and output a new token
 596 stream, which is synthesized into the AST.
 597
 598 It's worth noting that the token stream type used by proc macros is _stable_,
 599 so `rustc` does not use it internally (since our internal data structures are
 600 unstable). The compiler's token stream is
 601 [`rustc_ast::tokenstream::TokenStream`][rustcts], as mentioned previously. This is
 602 converted into the stable [`proc_macro::TokenStream`][stablets] and back in
 603 [`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms].
 604 Because the Rust ABI is unstable, we use the C ABI for this conversion.
 605
 606 [tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html
 607 [rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
 608 [stablets]: https://doc.rust-lang.org/proc_macro/struct.TokenStream.html
 609 [pm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro/index.html
 610 [pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html
 611 [`ParseResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.ParseResult.html
 612
 613 TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)
 614
 615 ### Custom Derive
 616
 617 Custom derives are a special type of proc macro.
 618
 619 TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)