Commit | Line | Data |
---|---|---|
d9579d0f AL |
1 | //! # Debug Info Module |
2 | //! | |
3 | //! This module serves the purpose of generating debug symbols. We use LLVM's | |
532ac7d7 | 4 | //! [source level debugging](https://llvm.org/docs/SourceLevelDebugging.html) |
d9579d0f AL |
5 | //! features for generating the debug information. The general principle is |
6 | //! this: | |
7 | //! | |
8 | //! Given the right metadata in the LLVM IR, the LLVM code generator is able to | |
9 | //! create DWARF debug symbols for the given code. The | |
532ac7d7 | 10 | //! [metadata](https://llvm.org/docs/LangRef.html#metadata-type) is structured |
d9579d0f AL |
11 | //! much like DWARF *debugging information entries* (DIE), representing type |
12 | //! information such as datatype layout, function signatures, block layout, | |
13 | //! variable location and scope information, etc. It is the purpose of this | |
14 | //! module to generate correct metadata and insert it into the LLVM IR. | |
15 | //! | |
16 | //! As the exact format of metadata trees may change between different LLVM | |
17 | //! versions, we now use LLVM | |
532ac7d7 | 18 | //! [DIBuilder](https://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html) |
d9579d0f AL |
19 | //! to create metadata where possible. This will hopefully ease the adaptation of |
20 | //! this module to future LLVM versions. | |
21 | //! | |
22 | //! The public API of the module is a set of functions that will insert the | |
23 | //! correct metadata into the LLVM IR when called with the right parameters. | |
24 | //! The module is thus driven from an outside client with functions like | |
2c00a5a8 | 25 | //! `debuginfo::create_local_var_metadata(bx: block, local: &ast::local)`. |
d9579d0f AL |
26 | //! |
27 | //! Internally the module will try to reuse already created metadata by | |
28 | //! utilizing a cache. The way to get a shared metadata node when needed is | |
29 | //! thus to just call the corresponding function in this module: | |
30 | //! | |
29967ef6 | 31 | //! let file_metadata = file_metadata(cx, file); |
d9579d0f AL |
32 | //! |
33 | //! The function will take care of probing the cache for an existing node for | |
34 | //! that exact file path. | |
35 | //! | |
36 | //! All private state used by the module is stored within either the | |
2c00a5a8 XL |
37 | //! CrateDebugContext struct (owned by the CodegenCx) or the |
38 | //! FunctionDebugContext (owned by the FunctionCx). | |
d9579d0f AL |
39 | //! |
40 | //! This file consists of three conceptual sections: | |
41 | //! 1. The public interface of the module | |
42 | //! 2. Module-internal metadata creation functions | |
43 | //! 3. Minor utility functions | |
44 | //! | |
45 | //! | |
46 | //! ## Recursive Types | |
47 | //! | |
48 | //! Some kinds of types, such as structs and enums, can be recursive. That means | |
49 | //! that the type definition of some type X refers to some other type which in | |
50 | //! turn (transitively) refers to X. This introduces cycles into the type | |
51 | //! referral graph. A naive algorithm doing an on-demand, depth-first traversal | |
52 | //! of this graph when describing types can get trapped in an endless loop | |
53 | //! when it reaches such a cycle. | |
54 | //! | |
55 | //! For example, the following simple type for a singly-linked list... | |
56 | //! | |
57 | //! ``` | |
58 | //! struct List { | |
92a42be0 | 59 | //! value: i32, |
d9579d0f AL |
60 | //! tail: Option<Box<List>>, |
61 | //! } | |
62 | //! ``` | |
63 | //! | |
64 | //! will generate the following callstack with a naive DFS algorithm: | |
65 | //! | |
66 | //! ``` | |
67 | //! describe(t = List) | |
92a42be0 | 68 | //! describe(t = i32) |
d9579d0f AL |
69 | //! describe(t = Option<Box<List>>) |
70 | //! describe(t = Box<List>) | |
71 | //! describe(t = List) // at the beginning again... | |
72 | //! ... | |
73 | //! ``` | |
74 | //! | |
75 | //! To break cycles like these, we use "forward declarations". That is, when | |
76 | //! the algorithm encounters a possibly recursive type (any struct or enum), it | |
77 | //! immediately creates a type description node and inserts it into the cache | |
78 | //! *before* describing the members of the type. This type description is just | |
79 | //! a stub (as type members are not described and added to it yet) but it | |
80 | //! allows the algorithm to already refer to the type. After the stub is | |
81 | //! inserted into the cache, the algorithm continues as before. If it now | |
82 | //! encounters a recursive reference, it will hit the cache and will not try to | |
83 | //! describe the type anew. | |
84 | //! | |
3b2f2976 | 85 | //! This behavior is encapsulated in the `RecursiveTypeDescription` enum, |
d9579d0f AL |
86 | //! which represents a kind of continuation, storing all state needed to |
87 | //! continue traversal at the type members after the type has been registered | |
88 | //! with the cache. (This implementation approach might be a tad | |
89 | //! over-engineered and may change in the future.) | |
90 | //! | |
91 | //! | |
92 | //! ## Source Locations and Line Information | |
93 | //! | |
94 | //! In addition to data type descriptions, the debugging information must also | |
95 | //! map machine code locations back to source code locations in order | |
96 | //! to be useful. This functionality is also handled in this module. The | |
97 | //! following functions control source mappings: | |
98 | //! | |
99 | //! + set_source_location() | |
100 | //! + clear_source_location() | |
101 | //! + start_emitting_source_locations() | |
102 | //! | |
103 | //! `set_source_location()` sets the current source location. All IR | |
104 | //! instructions created after a call to this function will be linked to the | |
105 | //! given source location, until another location is specified with | |
106 | //! `set_source_location()` or the source location is cleared with | |
107 | //! `clear_source_location()`. In the latter case, subsequent IR instructions | |
108 | //! will not be linked to any source location. As you can see, this is a | |
109 | //! stateful API (mimicking the one in LLVM), so be careful with source | |
110 | //! locations set by previous calls. It's probably best to not rely on any | |
111 | //! specific state being present at a given point in code. | |
112 | //! | |
113 | //! One topic that deserves some extra attention is *function prologues*. At | |
114 | //! the beginning of a function's machine code there are typically a few | |
115 | //! instructions for loading argument values into allocas and checking if | |
116 | //! there's enough stack space for the function to execute. This *prologue* is | |
117 | //! not visible in the source code and LLVM puts a special PROLOGUE END marker | |
118 | //! into the line table at the first non-prologue instruction of the function. | |
119 | //! In order to find out where the prologue ends, LLVM looks for the first | |
120 | //! instruction in the function body that is linked to a source location. So, | |
121 | //! when generating prologue instructions we have to make sure that we don't | |
122 | //! emit source location information until the 'real' function body begins. For | |
123 | //! this reason, source location emission is disabled by default for any new | |
94b46f34 | 124 | //! function being codegened and is only activated after a call to the third |
d9579d0f | 125 | //! function from the list above, `start_emitting_source_locations()`. This |
94b46f34 | 126 | //! function should be called right before regularly starting to codegen the |
d9579d0f AL |
127 | //! top-level block of the given function. |
128 | //! | |
129 | //! There is one exception to the above rule: `llvm.dbg.declare` instructions | |
130 | //! must be linked to the source location of the variable being declared. For | |
131 | //! function parameters these `llvm.dbg.declare` instructions typically occur | |
132 | //! in the middle of the prologue; however, they are ignored by LLVM's prologue | |
133 | //! detection. The `create_argument_metadata()` and related functions take care | |
134 | //! of linking the `llvm.dbg.declare` instructions to the correct source | |
135 | //! locations even while source location emission is still disabled, so there | |
136 | //! is no need to do anything special with source location handling here. | |
137 | //! | |
138 | //! ## Unique Type Identification | |
139 | //! | |
140 | //! In order for link-time optimization to work properly, LLVM needs a unique | |
141 | //! type identifier that tells it across compilation units which types are the | |
142 | //! same as others. This type identifier is created by | |
60c5eb7d | 143 | //! `TypeMap::get_unique_type_id_of_type()` using the following algorithm: |
d9579d0f AL |
144 | //! |
145 | //! (1) Primitive types have their name as ID | |
146 | //! (2) Structs, enums and traits have a multipart identifier | |
147 | //! | |
148 | //! (1) The first part is the SVH (strict version hash) of the crate they | |
3b2f2976 | 149 | //! were originally defined in |
d9579d0f AL |
150 | //! |
151 | //! (2) The second part is the ast::NodeId of the definition in their | |
3b2f2976 | 152 | //! original crate |
d9579d0f AL |
153 | //! |
154 | //! (3) The final part is a concatenation of the type IDs of their concrete | |
3b2f2976 | 155 | //! type arguments if they are generic types. |
d9579d0f AL |
156 | //! |
157 | //! (3) Tuple, pointer, and function types are structurally identified, which | |
158 | //! means that they are equivalent if their component types are equivalent | |
0731742a | 159 | //! (e.g., `(i32, i32)` is the same regardless of which crate it is used in). |
d9579d0f AL |
160 | //! |
161 | //! This algorithm also provides a stable ID for types that are defined in one | |
162 | //! crate but instantiated from metadata within another crate. We just have to | |
9fa01778 | 163 | //! take care to always map crate and `NodeId`s back to the original crate |
d9579d0f AL |
164 | //! context. |
165 | //! | |
166 | //! As a side effect, these unique type IDs also help to solve a problem arising | |
167 | //! from lifetime parameters. Since lifetime parameters are completely omitted | |
168 | //! in debuginfo, more than one `Ty` instance may map to the same debuginfo | |
169 | //! type metadata, that is, some struct `Struct<'a>` may have N instantiations | |
170 | //! with different concrete substitutions for `'a`, and thus there will be N | |
171 | //! `Ty` instances for the type `Struct<'a>` even though it is not generic | |
172 | //! otherwise. Unfortunately this means that we cannot use `ty::type_id()` as | |
9fa01778 | 173 | //! a cheap identifier for type metadata -- we have done this in the past, but it |
d9579d0f AL |
174 | //! led to unnecessary metadata duplication in the best case and LLVM |
175 | //! assertions in the worst. However, the unique type ID as described above | |
176 | //! *can* be used as an identifier. Since it is comparatively expensive to | |
177 | //! construct, though, `ty::type_id()` is still used additionally as an | |
178 | //! optimization for cases where the exact same type has been seen before | |
179 | //! (which is most of the time). |
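
The identification rules and the two-level lookup described in the last section can be illustrated with a small, self-contained sketch. It is not the module's actual code: the `Ty` enum, the `unique_type_id()` free function, the string format of the IDs, and the `TypeMap` fields below are all hypothetical simplifications, and plain `usize` values stand in for LLVM metadata nodes. The sketch only shows why two types that differ solely in a lifetime argument end up sharing one metadata node, and how an exact-type lookup can serve as a fast path in front of the more expensive unique ID.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a compiler type. Lifetime arguments
/// are kept so that two `Ty` values can differ only in a lifetime.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    /// Primitive type, identified by its name.
    Prim(&'static str),
    /// Nominal type (struct, enum, trait): defining crate hash, definition
    /// id within that crate, lifetime arguments, and type arguments.
    Adt {
        crate_hash: u64,
        def_id: u32,
        lifetimes: Vec<&'static str>,
        args: Vec<Ty>,
    },
    /// Structurally identified type.
    Tuple(Vec<Ty>),
}

/// Builds an ID following the three rules above. Lifetimes are deliberately
/// ignored. The string format is an assumption made for this sketch.
fn unique_type_id(ty: &Ty) -> String {
    match ty {
        Ty::Prim(name) => name.to_string(),
        Ty::Adt { crate_hash, def_id, args, .. } => {
            let args: Vec<String> = args.iter().map(unique_type_id).collect();
            format!("{:x}/{}<{}>", crate_hash, def_id, args.join(","))
        }
        Ty::Tuple(elems) => {
            let elems: Vec<String> = elems.iter().map(unique_type_id).collect();
            format!("({})", elems.join(","))
        }
    }
}

/// Hypothetical cache; `usize` stands in for a metadata node handle.
#[derive(Default)]
struct TypeMap {
    /// Fast path: the exact same `Ty` has been seen before.
    by_ty: HashMap<Ty, usize>,
    /// Canonical mapping keyed by the lifetime-erased unique ID.
    by_unique_id: HashMap<String, usize>,
    /// Number of metadata nodes created so far.
    created: usize,
}

impl TypeMap {
    fn metadata_for(&mut self, ty: &Ty) -> usize {
        // Cheap check first; this is expected to hit most of the time.
        if let Some(&md) = self.by_ty.get(ty) {
            return md;
        }
        // Otherwise construct the comparatively expensive unique ID.
        let id = unique_type_id(ty);
        let md = match self.by_unique_id.get(&id) {
            Some(&md) => md,
            None => {
                let md = self.created;
                self.created += 1;
                self.by_unique_id.insert(id, md);
                md
            }
        };
        // Remember the exact type so the next lookup takes the fast path.
        self.by_ty.insert(ty.clone(), md);
        md
    }
}
```

In this sketch, two `Ty::Adt` values with the same crate hash and definition id but different `lifetimes` compare unequal as `Ty` keys, yet produce the same unique ID, so `metadata_for()` returns the same handle for both and `created` stays at one.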