New upstream version 1.44.1+dfsg1
# Lexing and Parsing

> The parser and lexer are currently undergoing a lot of refactoring, so parts
> of this chapter may be out of date.

The very first thing the compiler does is take the program (in Unicode
characters) and turn it into something the compiler can work with more
conveniently than strings. This happens in two stages: Lexing and Parsing.

Lexing takes strings and turns them into streams of tokens. For example,
`a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`.
The lexer lives in [`librustc_lexer`][lexer].

[lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
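As a rough illustration of the `a.b + c` example above, here is a minimal hand-written tokenizer. It is only a sketch: the `Token` enum and the `lex` function are hypothetical names invented for this example, and they do not mirror the actual `rustc_lexer` API, which handles far more token kinds.

```rust
/// A toy token type (hypothetical); the real lexer has many more kinds.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Ident(String),
    Dot,
    Plus,
}

/// Splits `src` into tokens, skipping whitespace.
/// Returns `Err` with the offending character if it is not recognised.
pub fn lex(src: &str) -> Result<Vec<Token>, char> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '.' {
            chars.next();
            tokens.push(Token::Dot);
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else if c.is_alphabetic() || c == '_' {
            // Consume the longest run of identifier characters.
            let mut ident = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    ident.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else {
            return Err(c);
        }
    }
    Ok(tokens)
}
```

Calling `lex("a.b + c")` yields five tokens in source order: the identifiers `a`, `b`, and `c`, with `Dot` and `Plus` between them.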

Parsing then takes streams of tokens and turns them into a structured
form which is easier for the compiler to work with, usually called an [*Abstract
Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory,
using a `Span` to link a particular AST node back to its source text.
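To make the role of spans concrete, the sketch below hand-builds a tiny tree for `a.b + c` in which every node records the byte range it came from. The `Span`, `Expr`, and `ExprKind` types here are simplified stand-ins written for this example; the compiler's own types of the same names are considerably more involved.

```rust
/// A half-open byte range `[lo, hi)` into the source text (simplified stand-in).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

/// The shape of an expression node (toy version with three variants).
#[derive(Debug, PartialEq)]
pub enum ExprKind {
    Ident(String),
    Field(Box<Expr>, String),
    Add(Box<Expr>, Box<Expr>),
}

/// An AST node: what it is, plus where it came from.
#[derive(Debug, PartialEq)]
pub struct Expr {
    pub kind: ExprKind,
    pub span: Span,
}

impl Expr {
    /// Recovers the source text this node was parsed from.
    pub fn snippet<'a>(&self, src: &'a str) -> &'a str {
        &src[self.span.lo..self.span.hi]
    }
}

/// Hand-builds the tree for the source string `a.b + c`.
pub fn example_ast() -> Expr {
    let a = Expr {
        kind: ExprKind::Ident("a".into()),
        span: Span { lo: 0, hi: 1 },
    };
    let a_b = Expr {
        kind: ExprKind::Field(Box::new(a), "b".into()),
        span: Span { lo: 0, hi: 3 },
    };
    let c = Expr {
        kind: ExprKind::Ident("c".into()),
        span: Span { lo: 6, hi: 7 },
    };
    Expr {
        kind: ExprKind::Add(Box::new(a_b), Box::new(c)),
        span: Span { lo: 0, hi: 7 },
    }
}
```

Because each node carries its own span, a diagnostic about the left operand can point at exactly `a.b` rather than at the whole expression.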

The AST is defined in [`librustc_ast`][librustc_ast], along with some definitions for
tokens and token streams, data structures/traits for mutating ASTs, and shared
definitions for other AST-related parts of the compiler (like the lexer and
macro expansion).

The parser is defined in [`librustc_parse`][librustc_parse], along with a
high-level interface to the lexer and some validation routines that run after
macro expansion. In particular, the [`rustc_parse::parser`][parser] module contains
the parser implementation.

The main entry points to the parser are the various `parse_*` functions in the
[parser][parser]. They let you do things like turn a [`SourceFile`][sourcefile]
(e.g. the source in a single file) into a token stream, create a parser from
the token stream, and then execute the parser to get a `Crate` (the root AST
node).

To minimise the amount of copying that is done, both the `StringReader` and
`Parser` have lifetimes which bind them to the parent `ParseSess`. The
`ParseSess` contains all the information needed while parsing, as well as the
`SourceMap` itself.
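The lifetime relationship described above can be sketched with toy types. In this example `Session`, `Reader`, and the local `Parser` are hypothetical stand-ins for `ParseSess`, `StringReader`, and the real `Parser`; the method names are invented for illustration. The point is only that a reader and parser which borrow from a session can hand out slices of the session's source instead of copying it.

```rust
/// Stand-in for `ParseSess`: owns the source text for the whole session.
pub struct Session {
    pub source: String,
}

/// Stand-in for `StringReader`: borrows the session rather than copying it.
pub struct Reader<'a> {
    sess: &'a Session,
    pos: usize,
}

impl<'a> Reader<'a> {
    pub fn new(sess: &'a Session) -> Self {
        Reader { sess, pos: 0 }
    }

    /// Returns the next whitespace-separated word as a slice of the
    /// session's source. The slice lives as long as the session (`'a`),
    /// not merely as long as this reader.
    pub fn next_word(&mut self) -> Option<&'a str> {
        let sess: &'a Session = self.sess;
        let src: &'a str = &sess.source;
        let trimmed = src[self.pos..].trim_start();
        if trimmed.is_empty() {
            self.pos = src.len();
            return None;
        }
        let start = src.len() - trimmed.len();
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.pos = start + len;
        Some(&src[start..start + len])
    }
}

/// Stand-in for `Parser`: also tied to the session through `'a`.
pub struct Parser<'a> {
    reader: Reader<'a>,
}

impl<'a> Parser<'a> {
    pub fn new(sess: &'a Session) -> Self {
        Parser { reader: Reader::new(sess) }
    }

    /// "Parses" the input into a list of words borrowed from the session.
    pub fn parse_words(&mut self) -> Vec<&'a str> {
        let mut words = Vec::new();
        while let Some(word) = self.reader.next_word() {
            words.push(word);
        }
        words
    }
}
```

Because the returned slices borrow from the `Session`, they remain valid after the `Parser` itself has been dropped, and the borrow checker rejects any attempt to drop the session while they are still in use.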

## More on Lexical Analysis

Code for lexical analysis is split between two crates:

- The `rustc_lexer` crate is responsible for breaking a `&str` into chunks
  constituting tokens. Although it is popular to implement lexers as generated
  finite state machines, the lexer in `rustc_lexer` is hand-written.

- [`StringReader`] from [`librustc_parse`][librustc_parse] integrates `rustc_lexer` with
  `rustc`-specific data structures. Specifically, it adds `Span` information to tokens
  returned by `rustc_lexer` and interns identifiers.
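The division of labour between the two layers can be sketched as follows. This is a toy model under stated assumptions, not the real code: `first_token`, `RawToken`, `Interner`, `Symbol`, and `lex_with_spans` are names chosen for this example, and the toy lexer makes no distinction between identifiers and numeric literals. The first layer only reports a kind and a length; the second layer tracks offsets to build spans and interns identifier text.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RawKind {
    Ident,
    Punct,
    Whitespace,
}

/// Layer 1 output: a token kind plus its length in bytes, nothing else.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawToken {
    pub kind: RawKind,
    pub len: usize,
}

/// Layer 1: lexes one token from the front of `src`; `None` if `src` is empty.
pub fn first_token(src: &str) -> Option<RawToken> {
    let first = src.chars().next()?;
    let is_ident = |c: char| c.is_alphanumeric() || c == '_';
    let (kind, len) = if first.is_whitespace() {
        let end = src.find(|c: char| !c.is_whitespace());
        (RawKind::Whitespace, end.unwrap_or(src.len()))
    } else if is_ident(first) {
        let end = src.find(|c: char| !is_ident(c));
        (RawKind::Ident, end.unwrap_or(src.len()))
    } else {
        (RawKind::Punct, first.len_utf8())
    };
    Some(RawToken { kind, len })
}

/// An interned identifier: a small index into the interner's table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Symbol(pub u32);

#[derive(Default)]
pub struct Interner {
    map: HashMap<String, Symbol>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Symbol(self.names.len() as u32);
        self.map.insert(s.to_string(), sym);
        self.names.push(s.to_string());
        sym
    }

    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.names[sym.0 as usize]
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Ident(Symbol),
    Punct(char),
}

/// Layer 2 output: a token with its byte range `[lo, hi)` in the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub lo: usize,
    pub hi: usize,
}

/// Layer 2: walks the raw tokens, tracks the byte offset to form spans,
/// drops whitespace, and interns identifier text.
pub fn lex_with_spans(src: &str, interner: &mut Interner) -> Vec<Token> {
    let mut pos = 0;
    let mut out = Vec::new();
    while let Some(raw) = first_token(&src[pos..]) {
        let (lo, hi) = (pos, pos + raw.len);
        let text = &src[lo..hi];
        match raw.kind {
            RawKind::Whitespace => {}
            RawKind::Ident => {
                let kind = TokenKind::Ident(interner.intern(text));
                out.push(Token { kind, lo, hi });
            }
            RawKind::Punct => {
                let c = text.chars().next().expect("raw tokens are never empty");
                out.push(Token { kind: TokenKind::Punct(c), lo, hi });
            }
        }
        pos = hi;
    }
    out
}
```

Keeping the first layer free of spans and interning means it depends on nothing but the input string, which is one plausible reason for splitting the work across two crates. With interning, the two occurrences of `a` in `a.b + a` compare equal by symbol index without any string comparison.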

[librustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
[rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
[`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
[ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
[librustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html
[sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.SourceFile.html