# Lexing and Parsing

As of <!-- date: 2021-01 --> January 2021, the lexer and parser are undergoing
refactoring to allow extracting them into libraries.

The very first thing the compiler does is take the program (in Unicode
characters) and turn it into something the compiler can work with more
conveniently than strings. This happens in two stages: Lexing and Parsing.

Lexing takes strings and turns them into streams of [tokens]. For example,
`a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`.
The lexer lives in [`rustc_lexer`][lexer].

[tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html
[lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html

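To make the `a.b + c` example concrete, here is a minimal hand-written lexer
sketch. It is not the `rustc_lexer` API: the `Token` enum and the `lex`
function are hypothetical names invented for illustration, and the sketch only
understands identifiers, `.`, and `+`.

```rust
/// Token kinds for a tiny expression language (hypothetical, not rustc's).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Ident(String),
    Dot,
    Plus,
}

/// Splits `src` into tokens, skipping whitespace.
/// Returns `Err` with the offending character on unknown input.
pub fn lex(src: &str) -> Result<Vec<Token>, char> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '.' {
            chars.next();
            tokens.push(Token::Dot);
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else if c.is_alphabetic() || c == '_' {
            // Consume the longest run of identifier characters.
            let mut ident = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    ident.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else {
            return Err(c);
        }
    }
    Ok(tokens)
}
```

Calling `lex("a.b + c")` yields the five tokens listed above, in order; the
whitespace around `+` is dropped rather than turned into tokens.
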
Parsing then takes streams of tokens and turns them into a structured
form which is easier for the compiler to work with, usually called an [*Abstract
Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory,
using a `Span` to link a particular AST node back to its source text.

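As a rough illustration of how a span ties a node back to its source text, the
sketch below hand-builds a tree for `a.b + c`. The `Span`, `Expr`, and
`ExprKind` types here are simplified, hypothetical stand-ins; the real
definitions in `rustc_span` and `rustc_ast` are considerably richer.

```rust
/// A half-open byte range `[lo, hi)` into the source text (simplified stand-in).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

impl Span {
    /// Returns the slice of `src` that this span covers.
    pub fn snippet<'a>(&self, src: &'a str) -> &'a str {
        &src[self.lo..self.hi]
    }
}

/// Node kinds for a tiny expression language (hypothetical).
#[derive(Debug, PartialEq)]
pub enum ExprKind {
    Ident(String),
    Field(Box<Expr>, String),
    Add(Box<Expr>, Box<Expr>),
}

/// Every node carries the span of the source text it was parsed from.
#[derive(Debug, PartialEq)]
pub struct Expr {
    pub kind: ExprKind,
    pub span: Span,
}

/// Hand-builds the tree a parser might produce for the source `a.b + c`.
pub fn example_ast() -> Expr {
    let a = Expr { kind: ExprKind::Ident("a".into()), span: Span { lo: 0, hi: 1 } };
    let a_b = Expr {
        kind: ExprKind::Field(Box::new(a), "b".into()),
        span: Span { lo: 0, hi: 3 },
    };
    let c = Expr { kind: ExprKind::Ident("c".into()), span: Span { lo: 6, hi: 7 } };
    Expr {
        kind: ExprKind::Add(Box::new(a_b), Box::new(c)),
        span: Span { lo: 0, hi: 7 },
    }
}
```

Given the source string, `span.snippet(src)` on the left operand of the `Add`
node recovers `a.b`, which is the kind of lookup diagnostics rely on.
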
The AST is defined in [`rustc_ast`][rustc_ast], along with some definitions for
tokens and token streams, data structures/traits for mutating ASTs, and shared
definitions for other AST-related parts of the compiler (like the lexer and
macro-expansion).

The parser is defined in [`rustc_parse`][rustc_parse], along with a
high-level interface to the lexer and some validation routines that run after
macro expansion. In particular, the [`rustc_parse::parser`][parser] module
contains the parser implementation.

The main entry points to the parser are the various `parse_*` functions and
others in the [parser crate][parser_lib]. They let you do things like turn a
[`SourceFile`][sourcefile] (e.g. the source in a single file) into a token
stream, create a parser from the token stream, and then execute the parser to
get a `Crate` (the root AST node).

To minimise the amount of copying that is done, both the `StringReader` and
`Parser` have lifetimes which bind them to the parent `ParseSess`. The
`ParseSess` contains all the information needed while parsing, as well as the
`SourceMap` itself.

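The borrowing pattern described above can be sketched as follows. The types
reuse the names `ParseSess`, `StringReader`, and `Parser` only to mirror the
text; their fields and methods are hypothetical and far simpler than the real
ones. The point is that the reader hands out slices of the session's source
instead of copying it.

```rust
/// Stand-in for the session that owns the source text (simplified).
pub struct ParseSess {
    pub source: String,
}

/// Borrows from the session for `'a` instead of owning a copy of the source.
pub struct StringReader<'a> {
    sess: &'a ParseSess,
    pos: usize,
}

impl<'a> StringReader<'a> {
    pub fn new(sess: &'a ParseSess) -> Self {
        StringReader { sess, pos: 0 }
    }

    /// Returns the next whitespace-separated word as a slice of the
    /// session's source; no characters are copied.
    pub fn next_word(&mut self) -> Option<&'a str> {
        let sess: &'a ParseSess = self.sess;
        let src: &'a str = &sess.source;
        let trimmed = src[self.pos..].trim_start();
        if trimmed.is_empty() {
            return None;
        }
        // `trimmed` is a suffix of `src`, so its start offset is the difference in length.
        let start = src.len() - trimmed.len();
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.pos = start + len;
        Some(&src[start..start + len])
    }
}

/// The parser is tied to the same session lifetime through its reader.
pub struct Parser<'a> {
    reader: StringReader<'a>,
}

impl<'a> Parser<'a> {
    pub fn new(sess: &'a ParseSess) -> Self {
        Parser { reader: StringReader::new(sess) }
    }

    /// Drains the reader; the returned slices still point into the session.
    pub fn parse_words(&mut self) -> Vec<&'a str> {
        let mut words = Vec::new();
        while let Some(w) = self.reader.next_word() {
            words.push(w);
        }
        words
    }
}
```

Because every slice carries the lifetime `'a`, the borrow checker rejects any
attempt to drop the session while a reader, parser, or parsed word is still in
use.
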
Note that while parsing, we may encounter macro definitions or invocations. We
set these aside to be expanded (see [this chapter](./macro-expansion.md)).
Expansion may itself require parsing the output of the macro, which may reveal
more macros to be expanded, and so on.

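The "expand, re-parse, repeat" cycle can be illustrated with a toy loop. This
is not how rustc's expander is implemented: the sketch treats a program as
whitespace-separated words, treats any word of the form `name!` as an
invocation, and uses a hypothetical `expand_all` function with a round limit to
guard against macros that never stop expanding.

```rust
use std::collections::HashMap;

/// Repeatedly replaces `name!` words with the macro's body until no
/// invocation is left, or gives up after `max_rounds` passes.
pub fn expand_all(
    input: &str,
    macros: &HashMap<&str, &str>,
    max_rounds: usize,
) -> Result<Vec<String>, String> {
    let mut words: Vec<String> = input.split_whitespace().map(String::from).collect();
    for _ in 0..max_rounds {
        let mut changed = false;
        let mut next = Vec::new();
        for w in &words {
            match w.strip_suffix('!').and_then(|name| macros.get(name)) {
                Some(body) => {
                    // "Re-parse" the macro output; it may contain further invocations.
                    next.extend(body.split_whitespace().map(String::from));
                    changed = true;
                }
                // Not an invocation of a known macro: keep the word as-is.
                None => next.push(w.clone()),
            }
        }
        words = next;
        if !changed {
            return Ok(words);
        }
    }
    Err("expansion did not terminate".to_string())
}
```

With `outer` defined as `inner! x` and `inner` defined as `y z`, the input
`a outer! b` needs two rounds of expansion before a third round finds nothing
left to expand.
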
## More on Lexical Analysis

Code for lexical analysis is split between two crates:

- The `rustc_lexer` crate is responsible for breaking a `&str` into chunks
  constituting tokens. Although it is popular to implement lexers as generated
  finite state machines, the lexer in `rustc_lexer` is hand-written.

- [`StringReader`] from [`rustc_parse`][rustc_parse] integrates `rustc_lexer`
  with `rustc`-specific data structures. Specifically, it adds `Span`
  information to tokens returned by `rustc_lexer` and interns identifiers.

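The division of labour between the two layers can be sketched like this. The
low-level layer is modelled as a list of `(kind, length)` pairs with no
position information, and a second pass computes spans and interns identifiers.
All names here (`RawKind`, `cook`, `Interner`, and the simplified `Symbol`,
`Span`, and `Token`) are hypothetical stand-ins, not the actual `rustc_lexer`
or `StringReader` definitions.

```rust
use std::collections::HashMap;

/// Index into the interner's table (simplified stand-in for an interned name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Symbol(pub u32);

/// Maps each distinct string to one `Symbol`, so equal names compare by index.
#[derive(Default)]
pub struct Interner {
    map: HashMap<String, Symbol>,
    names: Vec<String>,
}

impl Interner {
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Symbol(self.names.len() as u32);
        self.map.insert(s.to_string(), sym);
        self.names.push(s.to_string());
        sym
    }

    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.names[sym.0 as usize]
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

/// What the low-level layer reports: only a kind, paired with a byte length.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RawKind {
    Ident,
    Punct,
    Whitespace,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Ident(Symbol),
    Punct(char),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
}

/// Turns position-free raw tokens into tokens with spans and interned names.
pub fn cook(src: &str, raw: &[(RawKind, usize)], interner: &mut Interner) -> Vec<Token> {
    let mut pos = 0;
    let mut out = Vec::new();
    for &(kind, len) in raw {
        let span = Span { lo: pos, hi: pos + len };
        let text = &src[pos..pos + len];
        match kind {
            RawKind::Ident => {
                out.push(Token { kind: TokenKind::Ident(interner.intern(text)), span });
            }
            RawKind::Punct => {
                if let Some(c) = text.chars().next() {
                    out.push(Token { kind: TokenKind::Punct(c), span });
                }
            }
            // Whitespace still advances the position but produces no token.
            RawKind::Whitespace => {}
        }
        pos += len;
    }
    out
}
```

For the source `a + a`, both identifier tokens end up with the same `Symbol`
but different spans, so later stages can compare names cheaply while still
pointing at the exact occurrence in the source.
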
[rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
[rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
[`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
[ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
[rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[parser_lib]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html
[sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.SourceFile.html