//! # Debug Info Module
//!
//! This module is responsible for generating debug symbols. We use LLVM's
//! [source level debugging](https://llvm.org/docs/SourceLevelDebugging.html)
//! features for generating the debug information. The general principle is
//! this:
//!
//! Given the right metadata in the LLVM IR, the LLVM code generator is able to
//! create DWARF debug symbols for the given code. The
//! [metadata](https://llvm.org/docs/LangRef.html#metadata-type) is structured
//! much like DWARF *debugging information entries* (DIE), representing type
//! information such as datatype layout, function signatures, block layout,
//! variable location and scope information, etc. It is the purpose of this
//! module to generate correct metadata and insert it into the LLVM IR.
//!
//! As the exact format of metadata trees may change between different LLVM
//! versions, we now use LLVM
//! [DIBuilder](https://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html)
//! to create metadata where possible. This will hopefully ease the adaptation
//! of this module to future LLVM versions.
//!
//! The public API of the module is a set of functions that will insert the
//! correct metadata into the LLVM IR when called with the right parameters.
//! The module is thus driven from an outside client with functions like
//! `debuginfo::create_local_var_metadata(bx: block, local: &ast::local)`.
//!
//! Internally the module will try to reuse already created metadata by
//! utilizing a cache. The way to get a shared metadata node when needed is
//! thus to just call the corresponding function in this module:
//!
//!     let file_metadata = file_metadata(cx, file);
//!
//! The function will take care of probing the cache for an existing node for
//! that exact file path.
//!
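The probe-then-create behavior described above can be sketched as follows. This is only an illustration under stated assumptions: `FileCache` and `FileMetadata` are hypothetical names, and a plain struct stands in for the LLVM metadata node that the real cache would hold.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for an LLVM metadata node describing a source file.
#[derive(Debug)]
struct FileMetadata {
    path: String,
}

/// Hypothetical cache keyed by the exact file path.
#[derive(Default)]
struct FileCache {
    nodes: HashMap<String, Rc<FileMetadata>>,
    /// Counts how many nodes were actually built (cache misses).
    created: usize,
}

impl FileCache {
    /// Returns the shared node for `path`, creating it only on a cache miss.
    fn file_metadata(&mut self, path: &str) -> Rc<FileMetadata> {
        if let Some(node) = self.nodes.get(path) {
            return Rc::clone(node);
        }
        self.created += 1;
        let node = Rc::new(FileMetadata { path: path.to_string() });
        self.nodes.insert(path.to_string(), Rc::clone(&node));
        node
    }
}
```

Calling `file_metadata` twice with the same path returns the same shared node; only a new path triggers node creation.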
//! All private state used by the module is stored within either the
//! `CrateDebugContext` struct (owned by the `CodegenCx`) or the
//! `FunctionDebugContext` (owned by the `FunctionCx`).
//!
//! This file consists of three conceptual sections:
//! 1. The public interface of the module
//! 2. Module-internal metadata creation functions
//! 3. Minor utility functions
//!
//!
//! ## Recursive Types
//!
//! Some kinds of types, such as structs and enums, can be recursive. That
//! means that the type definition of some type X refers to some other type
//! which in turn (transitively) refers to X. This introduces cycles into the
//! type referral graph. A naive algorithm doing an on-demand, depth-first
//! traversal of this graph when describing types can get trapped in an endless
//! loop when it reaches such a cycle.
//!
//! For example, the following simple type for a singly-linked list...
//!
//! ```
//! struct List {
//!     value: i32,
//!     tail: Option<Box<List>>,
//! }
//! ```
//!
//! will generate the following call stack with a naive DFS algorithm:
//!
//! ```
//! describe(t = List)
//!     describe(t = i32)
//!     describe(t = Option<Box<List>>)
//!         describe(t = Box<List>)
//!             describe(t = List) // at the beginning again...
//!             ...
//! ```
//!
//! To break cycles like these, we use "forward declarations". That is, when
//! the algorithm encounters a possibly recursive type (any struct or enum), it
//! immediately creates a type description node and inserts it into the cache
//! *before* describing the members of the type. This type description is just
//! a stub (as type members are not described and added to it yet) but it
//! allows the algorithm to already refer to the type. After the stub is
//! inserted into the cache, the algorithm continues as before. If it now
//! encounters a recursive reference, it will hit the cache and will not try to
//! describe the type anew.
//!
//! This behavior is encapsulated in the `RecursiveTypeDescription` enum,
//! which represents a kind of continuation, storing all state needed to
//! continue traversal at the type members after the type has been registered
//! with the cache. (This implementation approach might be a tad
//! over-engineered and may change in the future.)
//!
//!
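The stub-first idea can be sketched with a toy type graph. This is a simplified model, not the module's `RecursiveTypeDescription` machinery: `Describer`, `TypeNode`, `define`, and the string-keyed maps are all hypothetical names introduced for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified type description: the member types plus a flag
/// telling whether the stub has been filled in yet.
#[derive(Debug, Default)]
struct TypeNode {
    members: Vec<String>,
    complete: bool,
}

/// Hypothetical describer. `defs` maps a type name to the names of the types
/// it refers to; `cache` holds every node created so far.
#[derive(Default)]
struct Describer {
    defs: HashMap<String, Vec<String>>,
    cache: HashMap<String, TypeNode>,
    /// Number of types actually described (cache misses).
    calls: usize,
}

impl Describer {
    fn define(&mut self, name: &str, members: &[&str]) {
        let members = members.iter().map(|m| m.to_string()).collect();
        self.defs.insert(name.to_string(), members);
    }

    fn describe(&mut self, name: &str) {
        // A recursive reference hits the cache (possibly a stub) and stops.
        if self.cache.contains_key(name) {
            return;
        }
        self.calls += 1;
        // Forward declaration: register the stub *before* visiting members.
        self.cache.insert(name.to_string(), TypeNode::default());

        let members = self.defs.get(name).cloned().unwrap_or_default();
        for member in &members {
            self.describe(member);
        }

        // Fill in the stub now that every member has a node to refer to.
        let node = self.cache.get_mut(name).expect("stub was inserted above");
        node.members = members;
        node.complete = true;
    }
}
```

With the `List` example modeled as `List -> [i32, Option<Box<List>>]`, `Option<Box<List>> -> [Box<List>]`, and `Box<List> -> [List]`, a call to `describe("List")` terminates after describing each of the four types exactly once.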
//! ## Source Locations and Line Information
//!
//! In addition to data type descriptions, the debugging information must also
//! make it possible to map machine code locations back to source code
//! locations in order to be useful. This functionality is also handled in this
//! module. The following functions control source mappings:
//!
//! + set_source_location()
//! + clear_source_location()
//! + start_emitting_source_locations()
//!
//! `set_source_location()` sets the current source location. All IR
//! instructions created after a call to this function will be linked to the
//! given source location, until another location is specified with
//! `set_source_location()` or the source location is cleared with
//! `clear_source_location()`. In the latter case, subsequent IR instructions
//! will not be linked to any source location. As you can see, this is a
//! stateful API (mimicking the one in LLVM), so be careful with source
//! locations set by previous calls. It's probably best to not rely on any
//! specific state being present at a given point in code.
//!
//! One topic that deserves some extra attention is *function prologues*. At
//! the beginning of a function's machine code there are typically a few
//! instructions for loading argument values into allocas and checking if
//! there's enough stack space for the function to execute. This *prologue* is
//! not visible in the source code and LLVM puts a special PROLOGUE END marker
//! into the line table at the first non-prologue instruction of the function.
//! In order to find out where the prologue ends, LLVM looks for the first
//! instruction in the function body that is linked to a source location. So,
//! when generating prologue instructions we have to make sure that we don't
//! emit source location information until the 'real' function body begins. For
//! this reason, source location emission is disabled by default for any new
//! function being codegened and is only activated after a call to the third
//! function from the list above, `start_emitting_source_locations()`. This
//! function should be called right before regularly starting to codegen the
//! top-level block of the given function.
//!
//! There is one exception to the above rule: `llvm.dbg.declare` instructions
//! must be linked to the source location of the variable being declared. For
//! function parameters these `llvm.dbg.declare` instructions typically occur
//! in the middle of the prologue; however, they are ignored by LLVM's prologue
//! detection. The `create_argument_metadata()` and related functions take care
//! of linking the `llvm.dbg.declare` instructions to the correct source
//! locations even while source location emission is still disabled, so there
//! is no need to do anything special with source location handling here.
//!
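The stateful API and the prologue rule can be modeled with a toy builder. This is a sketch under stated assumptions, not the module's implementation: `FnBuilder`, `Inst`, `Loc`, `emit`, `declare`, and `prologue_end` are hypothetical, and the assumption that a location set while emission is disabled is simply dropped is part of the model.

```rust
/// Hypothetical source position (line, column).
type Loc = (u32, u32);

/// Hypothetical instruction record.
#[derive(Debug)]
struct Inst {
    name: &'static str,
    loc: Option<Loc>,
    /// True for variable declarations, which prologue detection ignores.
    is_declare: bool,
}

/// Hypothetical per-function builder state mirroring the stateful API.
#[derive(Default)]
struct FnBuilder {
    emitting: bool,       // disabled by default for every new function
    current: Option<Loc>, // applied to subsequently created instructions
    insts: Vec<Inst>,
}

impl FnBuilder {
    fn start_emitting_source_locations(&mut self) {
        self.emitting = true;
    }

    fn set_source_location(&mut self, loc: Loc) {
        // Assumption: while emission is disabled, the location is dropped so
        // that prologue instructions stay unlinked.
        self.current = if self.emitting { Some(loc) } else { None };
    }

    fn clear_source_location(&mut self) {
        self.current = None;
    }

    /// Creates an ordinary instruction carrying the current location, if any.
    fn emit(&mut self, name: &'static str) {
        self.insts.push(Inst { name, loc: self.current, is_declare: false });
    }

    /// Declarations always carry their own location, even in the prologue.
    fn declare(&mut self, name: &'static str, loc: Loc) {
        self.insts.push(Inst { name, loc: Some(loc), is_declare: true });
    }

    /// Models prologue detection: index of the first located instruction
    /// that is not a declaration.
    fn prologue_end(&self) -> Option<usize> {
        self.insts.iter().position(|i| !i.is_declare && i.loc.is_some())
    }
}
```

In this model, instructions emitted before `start_emitting_source_locations()` carry no location, a declaration in the middle of the prologue keeps its own location, and `prologue_end` points at the first located body instruction.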
//! ## Unique Type Identification
//!
//! In order for link-time optimization to work properly, LLVM needs a unique
//! type identifier that tells it across compilation units which types are the
//! same as others. This type identifier is created by
//! `TypeMap::get_unique_type_id_of_type()` using the following algorithm:
//!
//! (1) Primitive types have their name as ID
//! (2) Structs, enums and traits have a multipart identifier
//!
//!     (1) The first part is the SVH (strict version hash) of the crate they
//!         were originally defined in
//!
//!     (2) The second part is the ast::NodeId of the definition in their
//!         original crate
//!
//!     (3) The final part is a concatenation of the type IDs of their concrete
//!         type arguments if they are generic types.
//!
//! (3) Tuple, pointer and function types are structurally identified, which
//!     means that they are equivalent if their component types are equivalent
//!     (i.e., (i32, i32) is the same regardless of which crate it is used in).
//!
//! This algorithm also provides a stable ID for types that are defined in one
//! crate but instantiated from metadata within another crate. We just have to
//! take care to always map crate and `NodeId`s back to the original crate
//! context.
//!
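The three rules above can be sketched on a toy type representation. The `Ty` enum, the `unique_type_id` function, and the textual ID format below are hypothetical and chosen only for illustration; they are not the format produced by `TypeMap::get_unique_type_id_of_type()`, and function types are left out for brevity.

```rust
/// Hypothetical, simplified type representation.
enum Ty {
    Primitive(&'static str),
    /// Nominal type: hash of the defining crate, id of the definition in
    /// that crate, and the concrete generic arguments.
    Adt { crate_hash: u64, def_id: u32, args: Vec<Ty> },
    Tuple(Vec<Ty>),
    Pointer(Box<Ty>),
}

/// Builds an ID that is the same in every crate for the same type.
fn unique_type_id(ty: &Ty) -> String {
    match ty {
        // Rule (1): primitives are identified by name.
        Ty::Primitive(name) => name.to_string(),
        // Rule (2): crate hash, definition id, then the argument IDs.
        Ty::Adt { crate_hash, def_id, args } => {
            let args: Vec<String> = args.iter().map(unique_type_id).collect();
            format!("{:x}/{}<{}>", crate_hash, def_id, args.join(","))
        }
        // Rule (3): structural types are identified by their components.
        Ty::Tuple(elems) => {
            let elems: Vec<String> = elems.iter().map(unique_type_id).collect();
            format!("({})", elems.join(","))
        }
        Ty::Pointer(inner) => format!("*{}", unique_type_id(inner)),
    }
}
```

Because the tuple and pointer cases depend only on their components, `(i32, i32)` gets the same ID wherever it appears, while a nominal type is tied to the crate that originally defined it.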
//! As a side effect, these unique type IDs also help to solve a problem
//! arising from lifetime parameters. Since lifetime parameters are completely
//! omitted in debuginfo, more than one `Ty` instance may map to the same
//! debuginfo type metadata. That is, some struct `Struct<'a>` may have N
//! instantiations with different concrete substitutions for `'a`, and thus
//! there will be N `Ty` instances for the type `Struct<'a>` even though it is
//! not generic otherwise. Unfortunately this means that we cannot use
//! `ty::type_id()` as a cheap identifier for type metadata -- we have done
//! this in the past, but it led to unnecessary metadata duplication in the
//! best case and LLVM assertions in the worst. However, the unique type ID as
//! described above *can* be used as an identifier. Since it is comparatively
//! expensive to construct, though, `ty::type_id()` is still used additionally
//! as an optimization for cases where the exact same type has been seen before
//! (which is most of the time).
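The two-level lookup described above can be sketched as follows. This is a toy model under stated assumptions: `TyInstance`, `TypeMap::metadata_for`, and the counters are hypothetical, `instance_id` plays the role of the cheap `ty::type_id()` key, and `erased_name` stands in for the comparatively expensive unique type ID.

```rust
use std::collections::HashMap;

/// Hypothetical `Ty` instance. `instance_id` differs per lifetime
/// substitution; `erased_name` is what remains once lifetimes are omitted.
struct TyInstance {
    instance_id: u32,
    erased_name: &'static str,
}

/// Hypothetical two-level map from types to metadata node indices.
#[derive(Default)]
struct TypeMap {
    by_instance: HashMap<u32, usize>,     // cheap fast path
    by_unique_id: HashMap<String, usize>, // authoritative, costlier to build
    unique_ids_built: usize,
    nodes_created: usize,
}

impl TypeMap {
    fn metadata_for(&mut self, ty: &TyInstance) -> usize {
        // Fast path: the exact same instance has been seen before.
        if let Some(&node) = self.by_instance.get(&ty.instance_id) {
            return node;
        }
        // Slow path: build the unique ID (a string copy stands in for the
        // expensive construction) and look it up.
        self.unique_ids_built += 1;
        let unique_id = ty.erased_name.to_string();
        let existing = self.by_unique_id.get(&unique_id).copied();
        let node = match existing {
            Some(node) => node,
            None => {
                let node = self.nodes_created;
                self.nodes_created += 1;
                self.by_unique_id.insert(unique_id, node);
                node
            }
        };
        // Remember the instance so the next lookup takes the fast path.
        self.by_instance.insert(ty.instance_id, node);
        node
    }
}
```

Two instances that differ only in their lifetime substitution resolve to the same metadata node, and a repeated lookup of the same instance skips the unique ID construction entirely.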