# Lexing and Parsing

As of <!-- date-check --> January 2021, the lexer and parser are undergoing
refactoring to allow extracting them into libraries.

The very first thing the compiler does is take the program text (a sequence of
Unicode characters) and turn it into something the compiler can work with more
conveniently than strings. This happens in two stages: lexing and parsing.

Lexing takes strings and turns them into streams of [tokens]. For example,
`a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`.
The lexer lives in [`rustc_lexer`][lexer].

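As an illustration only, here is a minimal hand-rolled tokenizer that reproduces the `a.b + c` example. The `Tok` enum and `tokenize` function are hypothetical and far simpler than the real token types in `rustc_ast::token`. Whitespace is skipped, and any character the toy does not recognize causes a panic.

```rust
/// A hypothetical, heavily simplified token type (not rustc's).
#[derive(Debug, PartialEq)]
enum Tok {
    Ident(String),
    Dot,
    Plus,
}

/// Splits `src` into tokens, skipping whitespace.
fn tokenize(src: &str) -> Vec<Tok> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '.' {
            chars.next();
            toks.push(Tok::Dot);
        } else if c == '+' {
            chars.next();
            toks.push(Tok::Plus);
        } else if c.is_alphabetic() || c == '_' {
            // Consume the longest run of identifier characters.
            let mut name = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    name.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            toks.push(Tok::Ident(name));
        } else {
            panic!("unexpected character {:?}", c);
        }
    }
    toks
}
```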
[tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html
[lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html

Parsing then takes streams of tokens and turns them into a structured
form which is easier for the compiler to work with, usually called an [*Abstract
Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory,
using a `Span` to link a particular AST node back to its source text.

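To make the role of `Span` concrete, the sketch below models a tiny expression AST in which every node records the byte range of source text it came from. The `Span`, `Expr`, `ExprKind`, `snippet`, and `example_ast` names are hypothetical stand-ins. rustc's real `Span` (from `rustc_span`) is a compact encoded value that also carries hygiene information, so treat this only as an illustration of the idea.

```rust
/// Hypothetical byte range `lo..hi` into the source string.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    lo: usize,
    hi: usize,
}

/// Hypothetical AST node: a kind plus the span it was parsed from.
#[derive(Debug)]
struct Expr {
    kind: ExprKind,
    span: Span,
}

#[derive(Debug)]
enum ExprKind {
    Path(String),
    Field(Box<Expr>, String),
    Add(Box<Expr>, Box<Expr>),
}

/// Recovers the source text a node was parsed from.
fn snippet(src: &str, span: Span) -> &str {
    &src[span.lo..span.hi]
}

/// Hand-built AST for `a.b + c`; in a compiler, the parser would produce this.
fn example_ast() -> Expr {
    let a = Expr { kind: ExprKind::Path("a".into()), span: Span { lo: 0, hi: 1 } };
    let a_b = Expr { kind: ExprKind::Field(Box::new(a), "b".into()), span: Span { lo: 0, hi: 3 } };
    let c = Expr { kind: ExprKind::Path("c".into()), span: Span { lo: 6, hi: 7 } };
    Expr { kind: ExprKind::Add(Box::new(a_b), Box::new(c)), span: Span { lo: 0, hi: 7 } }
}
```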
The AST is defined in [`rustc_ast`][rustc_ast], along with some definitions for
tokens and token streams, data structures/traits for mutating ASTs, and shared
definitions for other AST-related parts of the compiler (like the lexer and
macro-expansion).

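The traits for mutating ASTs follow a visitor pattern. A trait has one overridable method per node type, and each method defaults to simply recursing into the node's children. The toy below is a self-contained version of that idea. The `Expr` type, `MutVisitor` trait, `walk_expr` helper, and `Rename` pass are hypothetical and only loosely modeled on the `MutVisitor` trait in `rustc_ast::mut_visit`, whose real interface is much larger.

```rust
/// Hypothetical toy expression tree.
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Add(Box<Expr>, Box<Expr>),
}

/// A visitor that may rewrite nodes in place. The default method just
/// recurses, so implementors override only the nodes they care about.
trait MutVisitor {
    fn visit_expr(&mut self, e: &mut Expr) {
        walk_expr(self, e);
    }
}

/// Recurses into the children of `e`.
fn walk_expr<V: MutVisitor + ?Sized>(v: &mut V, e: &mut Expr) {
    match e {
        Expr::Ident(_) => {}
        Expr::Add(lhs, rhs) => {
            v.visit_expr(lhs);
            v.visit_expr(rhs);
        }
    }
}

/// Example pass: renames every identifier `from` to `to`.
struct Rename {
    from: String,
    to: String,
}

impl MutVisitor for Rename {
    fn visit_expr(&mut self, e: &mut Expr) {
        if let Expr::Ident(name) = e {
            if *name == self.from {
                *name = self.to.clone();
            }
        }
        walk_expr(self, e);
    }
}
```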
The parser is defined in [`rustc_parse`][rustc_parse], along with a
high-level interface to the lexer and some validation routines that run after
macro expansion. In particular, the [`rustc_parse::parser`][parser] module contains
the parser implementation.

The main entry points to the parser are the various `parse_*` functions and other items in the
[parser crate][parser_lib]. They let you do things like turn a [`SourceFile`][sourcefile]
(i.e. the source of a single file) into a token stream, create a parser from
the token stream, and then execute the parser to get a `Crate` (the root AST
node).

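The flow of source text to tokens, then tokens to a parser that produces the root node, can be sketched with a toy recursive-descent parser. Everything here is hypothetical. The grammar accepts only `;`-terminated chains of `+`-separated identifiers, and `Crate` is just a list of expressions rather than rustc's `ast::Crate`. The method names only echo the `parse_*` convention; they are not rustc's signatures.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Plus,
    Semi,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical root node: the parsed items of one source file.
#[derive(Debug, PartialEq)]
struct Crate {
    exprs: Vec<Expr>,
}

/// Step 1: source text -> token stream.
fn lex(src: &str) -> Vec<Tok> {
    let mut toks = Vec::new();
    let mut ident = String::new();
    for c in src.chars() {
        if c.is_alphanumeric() || c == '_' {
            ident.push(c);
            continue;
        }
        if !ident.is_empty() {
            toks.push(Tok::Ident(std::mem::take(&mut ident)));
        }
        match c {
            '+' => toks.push(Tok::Plus),
            ';' => toks.push(Tok::Semi),
            c if c.is_whitespace() => {}
            other => panic!("unexpected character {:?}", other),
        }
    }
    if !ident.is_empty() {
        toks.push(Tok::Ident(ident));
    }
    toks
}

/// Step 2: a parser that owns the token stream and a cursor into it.
struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&Tok> {
        self.toks.get(self.pos)
    }

    fn bump(&mut self) -> Option<Tok> {
        let t = self.toks.get(self.pos).cloned();
        self.pos += 1;
        t
    }

    /// expr := ident ('+' ident)*, left-associative.
    fn parse_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_ident()?;
        while self.peek() == Some(&Tok::Plus) {
            self.bump();
            let rhs = self.parse_ident()?;
            lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }

    fn parse_ident(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Some(Tok::Ident(name)) => Ok(Expr::Ident(name)),
            other => Err(format!("expected identifier, found {:?}", other)),
        }
    }

    /// Step 3: crate := (expr ';')*, producing the root node.
    fn parse_crate(&mut self) -> Result<Crate, String> {
        let mut exprs = Vec::new();
        while self.peek().is_some() {
            exprs.push(self.parse_expr()?);
            match self.bump() {
                Some(Tok::Semi) => {}
                other => return Err(format!("expected `;`, found {:?}", other)),
            }
        }
        Ok(Crate { exprs })
    }
}

fn parse_source(src: &str) -> Result<Crate, String> {
    Parser { toks: lex(src), pos: 0 }.parse_crate()
}
```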
To minimize the amount of copying that is done,
both [`StringReader`] and [`Parser`] have lifetimes which bind them to the parent `ParseSess`.
The `ParseSess` contains all the information needed while parsing,
as well as the [`SourceMap`] itself.

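The borrowing arrangement can be illustrated with plain Rust lifetimes. In this hypothetical sketch, `Session` stands in for `ParseSess` and owns the source text plus shared state (here, a list of error messages). `Reader<'a>` and `Parser<'a>` only borrow it, so in this toy each token is a `&'a str` slice of the source rather than a copy. None of these types match rustc's real definitions.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for `ParseSess`: owns the source and shared state.
struct Session {
    source: String,
    errors: RefCell<Vec<String>>,
}

/// Borrows the session; tokens are slices of the source, not copies.
struct Reader<'a> {
    sess: &'a Session,
    pos: usize,
}

impl<'a> Reader<'a> {
    fn new(sess: &'a Session) -> Self {
        Reader { sess, pos: 0 }
    }

    /// Returns the next whitespace-separated word, borrowed from the source.
    fn next_token(&mut self) -> Option<&'a str> {
        let sess: &'a Session = self.sess;
        let src: &'a str = sess.source.as_str();
        let rest = &src[self.pos..];
        let start = self.pos + (rest.len() - rest.trim_start().len());
        if start >= src.len() {
            return None;
        }
        let len = src[start..].find(char::is_whitespace).unwrap_or(src.len() - start);
        self.pos = start + len;
        Some(&src[start..start + len])
    }
}

/// Borrows the same session, with the same lifetime as its reader.
struct Parser<'a> {
    sess: &'a Session,
    reader: Reader<'a>,
}

impl<'a> Parser<'a> {
    /// Keeps alphanumeric words; anything else is reported through the
    /// shared session instead of being stored in the parser.
    fn parse_words(&mut self) -> Vec<&'a str> {
        let mut out = Vec::new();
        while let Some(tok) = self.reader.next_token() {
            if tok.chars().all(char::is_alphanumeric) {
                out.push(tok);
            } else {
                self.sess.errors.borrow_mut().push(format!("unexpected token `{}`", tok));
            }
        }
        out
    }
}
```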
Note that while parsing, we may encounter macro definitions or invocations. We
set these aside to be expanded (see [this chapter](./macro-expansion.md)).
Expansion may itself require parsing the output of the macro, which may reveal
more macros to be expanded, and so on.

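The interleaving of parsing and expansion amounts to a loop that runs until no macro invocations remain. The toy below models it with plain string rewriting. Invocations are written `name!`, each expansion is re-scanned for further invocations, and a round limit guards against macros that expand forever (rustc likewise enforces a recursion limit). The `expand` function, its `name!` handling, and the `limit` parameter are hypothetical simplifications; real expansion works on token streams and AST fragments, not strings.

```rust
use std::collections::HashMap;

/// Repeatedly replaces `name!` words in `src` with their definitions until
/// none remain, re-scanning each expansion for further invocations.
/// Returns an error if expansion has not finished after `limit` rounds.
fn expand(src: &str, macros: &HashMap<&str, &str>, limit: usize) -> Result<String, String> {
    let mut text = src.to_string();
    for _ in 0..limit {
        let mut changed = false;
        let mut out = Vec::new();
        for word in text.split_whitespace() {
            match word.strip_suffix('!').and_then(|name| macros.get(name)) {
                Some(body) => {
                    out.push(body.to_string());
                    changed = true;
                }
                None => out.push(word.to_string()),
            }
        }
        text = out.join(" ");
        if !changed {
            return Ok(text);
        }
    }
    Err(format!("macro expansion did not finish within {} rounds", limit))
}
```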
## More on Lexical Analysis

Code for lexical analysis is split between two crates:

- The `rustc_lexer` crate is responsible for breaking a `&str` into chunks
  constituting tokens. Although it is popular to implement lexers as generated
  finite state machines, the lexer in `rustc_lexer` is hand-written.

- [`StringReader`] integrates `rustc_lexer` with data structures specific to `rustc`.
  Specifically, it adds `Span` information to tokens returned by `rustc_lexer` and interns identifiers.

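The division of labor between the two layers can be sketched as follows. `raw_tokens` plays the role of `rustc_lexer`: a hand-written loop over a `&str` that reports each token only as a kind and a byte length, with no spans and no interning. `cook` plays the role of `StringReader`, turning lengths into absolute byte-offset spans and interning identifier text into small integer ids so each distinct name is stored once. All names here (`RawKind`, `Token`, `Interner`, `raw_tokens`, `cook`) are hypothetical; rustc's real spans and symbol interner live in `rustc_span` and are more elaborate.

```rust
use std::collections::HashMap;

/// Layer 1 (rustc_lexer's role): token kinds and lengths only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RawKind {
    Ident,
    Whitespace,
    Punct,
}

/// Hand-written lexer: yields (kind, byte length) pairs covering all of `src`.
fn raw_tokens(src: &str) -> Vec<(RawKind, usize)> {
    let mut out = Vec::new();
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        let (kind, len) = if c.is_alphabetic() || c == '_' {
            let n = rest
                .find(|c: char| !(c.is_alphanumeric() || c == '_'))
                .unwrap_or(rest.len());
            (RawKind::Ident, n)
        } else if c.is_whitespace() {
            let n = rest.find(|c: char| !c.is_whitespace()).unwrap_or(rest.len());
            (RawKind::Whitespace, n)
        } else {
            (RawKind::Punct, c.len_utf8())
        };
        out.push((kind, len));
        rest = &rest[len..];
    }
    out
}

/// Layer 2 (StringReader's role): tokens with interned identifiers.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(u32), // index into the interner's `names` table
    Punct(char),
}

#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing id for `s`, or assigns the next free one.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.names.len() as u32;
        self.names.push(s.to_string());
        self.ids.insert(s.to_string(), id);
        id
    }
}

/// Returns tokens paired with (lo, hi) byte spans; whitespace is dropped.
fn cook(src: &str, interner: &mut Interner) -> Vec<(Token, (usize, usize))> {
    let mut pos = 0;
    let mut out = Vec::new();
    for (kind, len) in raw_tokens(src) {
        let text = &src[pos..pos + len];
        let span = (pos, pos + len);
        match kind {
            RawKind::Ident => out.push((Token::Ident(interner.intern(text)), span)),
            RawKind::Punct => out.push((Token::Punct(text.chars().next().unwrap()), span)),
            RawKind::Whitespace => {}
        }
        pos += len;
    }
    out
}
```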
[rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
[rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
[`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
[ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
[rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[parser_lib]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html
[sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.SourceFile.html