# Lexing and Parsing

As of <!-- date: 2021-01 --> January 2021, the lexer and parser are undergoing
refactoring to allow extracting them into libraries.

The very first thing the compiler does is take the program (in Unicode
characters) and turn it into something the compiler can work with more
conveniently than strings. This happens in two stages: lexing and parsing.

Lexing takes strings and turns them into streams of [tokens]. For example,
`a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`.
The lexer lives in [`rustc_lexer`][lexer].
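
A minimal sketch of such a lexer follows, assuming only identifiers, `.`, and `+` need to be recognized, which is enough to reproduce the `a.b + c` example. The `Token` enum and `tokenize` function are hypothetical and much simpler than the real API of `rustc_lexer`.

```rust
/// A hypothetical, heavily simplified token type (not `rustc_ast::token::Token`).
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Dot,
    Plus,
}

/// Turn a string into a stream of tokens, skipping whitespace.
fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Consume the longest run of identifier characters.
            let mut ident = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    ident.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else {
            // Everything else is single-character punctuation in this toy.
            chars.next();
            match c {
                '.' => tokens.push(Token::Dot),
                '+' => tokens.push(Token::Plus),
                other => panic!("unexpected character {:?}", other),
            }
        }
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize("a.b + c"));
}
```

Running it prints the five tokens `a`, `.`, `b`, `+`, and `c` in order; the whitespace around `+` produces no tokens.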

[tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html
[lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html

Parsing then takes streams of tokens and turns them into a structured
form which is easier for the compiler to work with, usually called an [*Abstract
Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory,
using a `Span` to link a particular AST node back to its source text.
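
To make the role of `Span` concrete, here is a minimal recursive-descent sketch that parses the `a.b + c` example into a tiny AST. The `Span`, `Expr`, `ExprKind`, `lex`, and `Parser` definitions are hypothetical simplifications, not the real types from `rustc_ast` or `rustc_span`: here a span is just a pair of byte offsets, and each node's span covers the source of all its children.

```rust
/// A byte range into the source text (a simplified stand-in for a real `Span`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    lo: usize,
    hi: usize,
}

#[derive(Debug)]
enum ExprKind {
    Path(String),
    Field(Box<Expr>, String),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug)]
struct Expr {
    kind: ExprKind,
    span: Span,
}

/// ASCII-only toy lexer: returns (text, span) pairs and drops whitespace.
fn lex(src: &str) -> Vec<(&str, Span)> {
    let bytes = src.as_bytes();
    let (mut out, mut i) = (Vec::new(), 0);
    while i < bytes.len() {
        let start = i;
        if bytes[i].is_ascii_whitespace() {
            i += 1;
            continue;
        }
        if bytes[i].is_ascii_alphabetic() {
            while i < bytes.len() && bytes[i].is_ascii_alphanumeric() {
                i += 1;
            }
        } else {
            i += 1; // single-character punctuation such as `.` or `+`
        }
        out.push((&src[start..i], Span { lo: start, hi: i }));
    }
    out
}

struct Parser<'a> {
    tokens: Vec<(&'a str, Span)>,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn bump(&mut self) -> Option<(&'a str, Span)> {
        let tok = self.tokens.get(self.pos).copied();
        self.pos += 1;
        tok
    }

    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).map(|&(text, _)| text)
    }

    /// expr := postfix ('+' postfix)*
    fn parse_expr(&mut self) -> Expr {
        let mut lhs = self.parse_postfix();
        while self.peek() == Some("+") {
            self.bump();
            let rhs = self.parse_postfix();
            // The new node spans from the start of `lhs` to the end of `rhs`.
            let span = Span { lo: lhs.span.lo, hi: rhs.span.hi };
            lhs = Expr { kind: ExprKind::Add(Box::new(lhs), Box::new(rhs)), span };
        }
        lhs
    }

    /// postfix := ident ('.' ident)*
    fn parse_postfix(&mut self) -> Expr {
        let (name, span) = self.bump().expect("expected an identifier");
        let mut expr = Expr { kind: ExprKind::Path(name.to_string()), span };
        while self.peek() == Some(".") {
            self.bump();
            let (field, fspan) = self.bump().expect("expected a field name");
            let span = Span { lo: expr.span.lo, hi: fspan.hi };
            expr = Expr { kind: ExprKind::Field(Box::new(expr), field.to_string()), span };
        }
        expr
    }
}

fn main() {
    let src = "a.b + c";
    let mut parser = Parser { tokens: lex(src), pos: 0 };
    let expr = parser.parse_expr();
    println!("{:?}", expr.span);
    if let ExprKind::Add(lhs, _) = &expr.kind {
        // Slicing the source with a node's span recovers its original text.
        println!("{}", &src[lhs.span.lo..lhs.span.hi]);
    }
}
```

Slicing the source with a node's span recovers exactly the text that node was parsed from, which is what lets later stages point diagnostics back at the original code.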

The AST is defined in [`rustc_ast`][rustc_ast], along with some definitions for
tokens and token streams, data structures/traits for mutating ASTs, and shared
definitions for other AST-related parts of the compiler (like the lexer and
macro expansion).
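
The visitor pattern behind such AST-mutating traits can be sketched as follows. This `MutVisitor` trait and its `walk_expr` helper are hypothetical and far smaller than the real traits in `rustc_ast` (for example in its `visit` and `mut_visit` modules), whose interfaces differ: in this sketch, implementors override only the hooks they care about, and `walk_expr` supplies the default recursion into child nodes.

```rust
/// A tiny expression AST (hypothetical; far simpler than the real one).
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Add(Box<Expr>, Box<Expr>),
}

/// A hypothetical mutating-visitor trait with overridable hooks.
trait MutVisitor {
    fn visit_expr(&mut self, expr: &mut Expr) {
        walk_expr(self, expr);
    }
    fn visit_ident(&mut self, _ident: &mut String) {}
}

/// Recurse into the children of `expr`, calling back into the visitor.
fn walk_expr<V: MutVisitor + ?Sized>(vis: &mut V, expr: &mut Expr) {
    match expr {
        Expr::Ident(name) => vis.visit_ident(name),
        Expr::Add(lhs, rhs) => {
            vis.visit_expr(lhs);
            vis.visit_expr(rhs);
        }
    }
}

/// Renames every identifier equal to `from` to `to`, in place.
struct Rename {
    from: String,
    to: String,
}

impl MutVisitor for Rename {
    fn visit_ident(&mut self, ident: &mut String) {
        if *ident == self.from {
            *ident = self.to.clone();
        }
    }
}

fn main() {
    // `x + y` with `x` renamed to `z` becomes `z + y`.
    let mut expr = Expr::Add(
        Box::new(Expr::Ident("x".into())),
        Box::new(Expr::Ident("y".into())),
    );
    let mut rename = Rename { from: "x".into(), to: "z".into() };
    rename.visit_expr(&mut expr);
    println!("{:?}", expr);
}
```

Keeping the recursion in a separate `walk_*` function lets an overriding `visit_expr` do its own work and still call `walk_expr` to continue into the children.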

The parser is defined in [`rustc_parse`][rustc_parse], along with a
high-level interface to the lexer and some validation routines that run after
macro expansion. In particular, the [`rustc_parse::parser`][parser] module contains
the parser implementation.

The main entrypoints to the parser are the various `parse_*` functions and others in the
[parser crate][parser_lib]. They let you do things like turn a [`SourceFile`][sourcefile]
(i.e. the source in a single file) into a token stream, create a parser from
the token stream, and then execute the parser to get a `Crate` (the root AST
node).
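
The three steps can be sketched end to end on a toy language whose only items are `mod name;` declarations. Every type and function here (`SourceFile`, `source_file_to_stream`, `Parser::new`, `parse_crate`, `Crate`) is a hypothetical stand-in chosen to mirror the prose; the real `rustc_parse` entrypoints have different names and signatures.

```rust
/// Simplified stand-in for a `SourceFile`: one file's name and contents.
struct SourceFile {
    name: String,
    src: String,
}

#[derive(Debug)]
enum Token {
    Ident(String),
    Semi,
}

/// Simplified root node: just the names of the `mod name;` items.
#[derive(Debug, PartialEq)]
struct Crate {
    items: Vec<String>,
}

/// Step 1: turn one file into a token stream.
/// (Toy assumption: `;` only ever appears at the end of a word.)
fn source_file_to_stream(file: &SourceFile) -> Vec<Token> {
    let mut tokens = Vec::new();
    for word in file.src.split_whitespace() {
        match word.strip_suffix(';') {
            Some(ident) => {
                if !ident.is_empty() {
                    tokens.push(Token::Ident(ident.to_string()));
                }
                tokens.push(Token::Semi);
            }
            None => tokens.push(Token::Ident(word.to_string())),
        }
    }
    tokens
}

/// Step 2: a parser that owns the token stream and a cursor into it.
struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Step 3: parse the whole stream into the root `Crate` node.
    /// Toy grammar: crate := ("mod" IDENT ";")*
    fn parse_crate(&mut self) -> Result<Crate, String> {
        let mut items = Vec::new();
        while self.pos < self.tokens.len() {
            match &self.tokens[self.pos..] {
                [Token::Ident(kw), Token::Ident(name), Token::Semi, ..] if kw == "mod" => {
                    items.push(name.clone());
                    self.pos += 3;
                }
                _ => return Err(format!("unexpected token at position {}", self.pos)),
            }
        }
        Ok(Crate { items })
    }
}

fn main() {
    let file = SourceFile {
        name: "lib.rs".to_string(),
        src: "mod lexer;\nmod parser;".to_string(),
    };
    let mut parser = Parser::new(source_file_to_stream(&file));
    match parser.parse_crate() {
        Ok(krate) => println!("{}: {:?}", file.name, krate),
        Err(e) => eprintln!("{}: {}", file.name, e),
    }
}
```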
37 | ||
38 | To minimise the amount of copying that is done, both the `StringReader` and | |
39 | `Parser` have lifetimes which bind them to the parent `ParseSess`. This contains | |
40 | all the information needed while parsing, as well as the `SourceMap` itself. | |
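
The borrowing arrangement can be illustrated with a small sketch, assuming heavily simplified stand-ins: here `ParseSess` just owns the text of each file plus a list of diagnostic messages, and the hypothetical `StringReader` and `Parser` hold `&'a ParseSess` references instead of their own copies. The words the reader returns are slices of the session's buffer, so no source text is duplicated. The `RefCell` around the diagnostics is a choice of this sketch, letting code that holds only a shared reference still record messages.

```rust
use std::cell::RefCell;

/// Simplified session: owns all shared parsing state.
struct ParseSess {
    source_map: Vec<String>, // stand-in for a `SourceMap`: one entry per file
    diagnostics: RefCell<Vec<String>>,
}

struct StringReader<'a> {
    sess: &'a ParseSess,
    file: usize, // index into `sess.source_map`
    pos: usize,  // byte offset of the next unread character
}

impl<'a> StringReader<'a> {
    fn new(sess: &'a ParseSess, file: usize) -> Self {
        StringReader { sess, file, pos: 0 }
    }

    /// Next whitespace-separated word, borrowed from the session's buffer.
    fn next_word(&mut self) -> Option<&'a str> {
        let sess: &'a ParseSess = self.sess;
        let src: &'a str = &sess.source_map[self.file];
        let rest = &src[self.pos..];
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let start = self.pos + (rest.len() - trimmed.len());
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.pos = start + len;
        Some(&src[start..start + len])
    }
}

struct Parser<'a> {
    sess: &'a ParseSess,
    reader: StringReader<'a>,
}

impl<'a> Parser<'a> {
    /// Collect all words; report a toy diagnostic through the shared session.
    fn parse_words(&mut self) -> Vec<&'a str> {
        let mut words = Vec::new();
        while let Some(w) = self.reader.next_word() {
            words.push(w);
        }
        if words.is_empty() {
            self.sess.diagnostics.borrow_mut().push("empty file".to_string());
        }
        words
    }
}

fn main() {
    let sess = ParseSess {
        source_map: vec!["fn main() {}".to_string(), "   ".to_string()],
        diagnostics: RefCell::new(Vec::new()),
    };
    let mut parser = Parser { sess: &sess, reader: StringReader::new(&sess, 0) };
    println!("{:?}", parser.parse_words());
    let mut empty = Parser { sess: &sess, reader: StringReader::new(&sess, 1) };
    empty.parse_words();
    println!("{:?}", sess.diagnostics.borrow());
}
```

Because the reader and parser only borrow the session, the borrow checker guarantees the session outlives them, and every token can point straight into the one copy of the source.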
41 | ||
6a06907d XL |
42 | Note that while parsing, we may encounter macro definitions or invocations. We |
43 | set these aside to be expanded (see [this chapter](./macro-expansion.md)). | |
44 | Expansion may itself require parsing the output of the macro, which may reveal | |
45 | more macros to be expanded, and so on. | |
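
This "expand, then re-scan the output" cycle can be sketched as a fixpoint loop. The sketch rests on loud assumptions: tokens are plain strings, an invocation is any token ending in `!` (e.g. `two!`), a macro body is a fixed list of tokens, and `expand` and `max_depth` are hypothetical names. rustc's real expander is considerably more involved; this only shows why expansion must repeat until no invocations remain.

```rust
use std::collections::HashMap;

/// Expand invocations (written `name!` in this toy) until none are left.
/// Each round re-scans the previous round's output, since expanded code may
/// itself contain invocations. `max_depth` guards against endless expansion.
fn expand(
    input: &[&str],
    macros: &HashMap<&str, Vec<&str>>,
    max_depth: usize,
) -> Result<Vec<String>, String> {
    let mut tokens: Vec<String> = input.iter().map(|s| s.to_string()).collect();
    for _ in 0..max_depth {
        let mut changed = false;
        let mut next = Vec::new();
        for tok in &tokens {
            match tok.strip_suffix('!').and_then(|name| macros.get(name)) {
                Some(body) => {
                    next.extend(body.iter().map(|s| s.to_string()));
                    changed = true;
                }
                // Not an invocation of a known macro: keep the token as-is.
                None => next.push(tok.clone()),
            }
        }
        tokens = next;
        if !changed {
            return Ok(tokens);
        }
    }
    Err(format!("depth limit ({}) reached while expanding", max_depth))
}

fn main() {
    let mut macros = HashMap::new();
    macros.insert("two", vec!["one!", "one!"]);
    macros.insert("one", vec!["1"]);
    macros.insert("forever", vec!["forever!"]);
    println!("{:?}", expand(&["x", "+", "two!"], &macros, 8));
    println!("{:?}", expand(&["forever!"], &macros, 8));
}
```

Here `two!` first expands to two `one!` invocations, which only the next round turns into `1`; `forever!` never stops producing new invocations, so it hits the depth limit.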
46 | ||
416331ca XL |
47 | ## More on Lexical Analysis |
48 | ||
49 | Code for lexical analysis is split between two crates: | |
50 | ||
51 | - `rustc_lexer` crate is responsible for breaking a `&str` into chunks | |
52 | constituting tokens. Although it is popular to implement lexers as generated | |
53 | finite state machines, the lexer in `rustc_lexer` is hand-written. | |
54 | ||
6a06907d | 55 | - [`StringReader`] from [`rustc_ast`][rustc_ast] integrates `rustc_lexer` with `rustc` |
416331ca XL |
56 | specific data structures. Specifically, it adds `Span` information to tokens |
57 | returned by `rustc_lexer` and interns identifiers. | |
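
The division of work between the two crates can be sketched as two stages. In this minimal sketch, stage 1 (in the spirit of `rustc_lexer`) produces raw tokens holding only a kind and a byte length, and stage 2 (in the spirit of `StringReader`) turns those lengths into absolute byte positions and interns identifiers into small integer symbols, so repeated names share one table entry. All names (`RawToken`, `raw_tokens`, `cook`, `Interner`, `Symbol`) are hypothetical simplifications, not the real APIs.

```rust
use std::collections::HashMap;

// Stage 1: raw tokens carry only a kind and a byte length; no spans, no
// interning. (Toy simplification: digits are lumped in with identifiers.)
#[derive(Clone, Copy, PartialEq)]
enum RawKind { Ident, Whitespace, Punct }

struct RawToken { kind: RawKind, len: usize }

fn class(c: char) -> RawKind {
    if c.is_whitespace() {
        RawKind::Whitespace
    } else if c.is_alphanumeric() || c == '_' {
        RawKind::Ident
    } else {
        RawKind::Punct
    }
}

fn raw_tokens(src: &str) -> Vec<RawToken> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        let kind = class(c);
        let mut len = c.len_utf8();
        // Identifiers and whitespace extend over runs of the same class;
        // punctuation is always a single character.
        while kind != RawKind::Punct {
            match chars.peek() {
                Some(&next) if class(next) == kind => {
                    len += next.len_utf8();
                    chars.next();
                }
                _ => break,
            }
        }
        out.push(RawToken { kind, len });
    }
    out
}

// Stage 2: absolute positions, whitespace dropped, identifiers interned.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Symbol(u32);

#[derive(Default)]
struct Interner { map: HashMap<String, Symbol>, strings: Vec<String> }

impl Interner {
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym; // already seen: reuse the existing symbol
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_string());
        self.map.insert(s.to_string(), sym);
        sym
    }

    fn as_str(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

#[derive(Debug, PartialEq)]
enum TokenKind { Ident(Symbol), Punct(char) }

/// `lo..hi` is a byte range standing in for a `Span`.
#[derive(Debug, PartialEq)]
struct Token { kind: TokenKind, lo: usize, hi: usize }

fn cook(src: &str, interner: &mut Interner) -> Vec<Token> {
    let (mut pos, mut out) = (0, Vec::new());
    for raw in raw_tokens(src) {
        let (lo, hi) = (pos, pos + raw.len);
        pos = hi;
        let text = &src[lo..hi];
        let kind = match raw.kind {
            RawKind::Whitespace => continue,
            RawKind::Ident => TokenKind::Ident(interner.intern(text)),
            RawKind::Punct => TokenKind::Punct(text.chars().next().unwrap()),
        };
        out.push(Token { kind, lo, hi });
    }
    out
}

fn main() {
    let src = "a.b + a";
    let mut interner = Interner::default();
    for tok in cook(src, &mut interner) {
        println!("{:?} covers {:?}", tok.kind, &src[tok.lo..tok.hi]);
    }
}
```

Both occurrences of `a` in `a.b + a` map to the same symbol, and each token's `lo..hi` range slices back to its source text.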
58 | ||
6a06907d | 59 | [rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html |
a1dfa0c6 XL |
60 | [rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html |
61 | [ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree | |
dfeec247 | 62 | [`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html |
74b04a01 | 63 | [ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html |
6a06907d XL |
64 | [rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html |
65 | [parser_lib]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html | |
60c5eb7d | 66 | [parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html |
74b04a01 | 67 | [`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/parse/parser/struct.Parser.html |
60c5eb7d | 68 | [`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html |
74b04a01 | 69 | [visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html |
dfeec247 | 70 | [sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.SourceFile.html |