New upstream version 1.52.0~beta.3+dfsg1
# Debugging support in the Rust compiler

<!-- toc -->

This document explains the state of debugging tools support in the Rust compiler (rustc).
It gives an overview of debugging tools such as GDB and LLDB, and of the infrastructure
around the Rust compiler for debugging Rust code. If you want to learn how to debug the Rust
compiler itself, see the [Debugging the Compiler] page.

The material is gathered from the YouTube video [Tom Tromey discusses debugging support in rustc].

## Preliminaries

### Debuggers

According to Wikipedia

> A [debugger or debugging tool] is a computer program that is used to test and debug
> other programs (the "target" program).

Writing a debugger from scratch for a language requires a lot of work, especially if
debuggers have to be supported on various platforms. GDB and LLDB, however, can be
extended to support debugging a language. This is the path that Rust has chosen.
This document's main goal is to document the support for these debuggers in the Rust compiler.

### DWARF

According to the [DWARF] standard website

> DWARF is a debugging file format used by many compilers and debuggers to support source level
> debugging. It addresses the requirements of a number of procedural languages,
> such as C, C++, and Fortran, and is designed to be extensible to other languages.
> DWARF is architecture independent and applicable to any processor or operating system.
> It is widely used on Unix, Linux and other operating systems,
> as well as in stand-alone environments.

A DWARF reader is a program that consumes the DWARF format and creates debugger-compatible output.
This program may live in the compiler itself. DWARF uses a data structure called a
Debugging Information Entry (DIE), which stores information under "tags" that denote functions,
variables, and so on, e.g., `DW_TAG_variable`, `DW_TAG_pointer_type`, `DW_TAG_subprogram`.
You can also invent your own tags and attributes.
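
To make the DIE structure described above more concrete, here is a minimal sketch of a tree of
entries, each with a tag, a list of attributes, and child entries. The `Die` type and its methods
are hypothetical and exist only for illustration. Real DWARF uses numeric codes in a compact
binary encoding, and neither rustc nor the debuggers use this representation.

```rust
// Hypothetical, simplified model of a DWARF Debugging Information Entry (DIE).
// Real DWARF stores tags and attributes as numeric codes in a compact binary
// encoding; plain strings are used here only for readability.
#[derive(Debug)]
struct Die {
    tag: &'static str,
    attributes: Vec<(&'static str, String)>,
    children: Vec<Die>,
}

impl Die {
    fn new(tag: &'static str) -> Self {
        Die { tag, attributes: Vec::new(), children: Vec::new() }
    }

    // Builder-style helper that attaches one attribute to this entry.
    fn attr(mut self, name: &'static str, value: &str) -> Self {
        self.attributes.push((name, value.to_string()));
        self
    }

    // Builder-style helper that nests another entry under this one.
    fn child(mut self, die: Die) -> Self {
        self.children.push(die);
        self
    }

    /// Collects, depth first, the `DW_AT_name` of every DIE with the given tag.
    fn names_with_tag(&self, tag: &str) -> Vec<String> {
        let mut names = Vec::new();
        if self.tag == tag {
            if let Some((_, value)) =
                self.attributes.iter().find(|(name, _)| *name == "DW_AT_name")
            {
                names.push(value.clone());
            }
        }
        for child in &self.children {
            names.extend(child.names_with_tag(tag));
        }
        names
    }
}

// A made-up compilation unit: one function `main` containing one variable `x`.
fn sample_unit() -> Die {
    Die::new("DW_TAG_compile_unit")
        .attr("DW_AT_name", "main.rs")
        .child(
            Die::new("DW_TAG_subprogram")
                .attr("DW_AT_name", "main")
                .child(Die::new("DW_TAG_variable").attr("DW_AT_name", "x")),
        )
}

fn main() {
    let unit = sample_unit();
    println!("variables: {:?}", unit.names_with_tag("DW_TAG_variable"));
    println!("functions: {:?}", unit.names_with_tag("DW_TAG_subprogram"));
}
```

A debugger's DWARF reader does conceptually similar lookups: it walks the entries and uses tags
to decide which of them describe functions, variables, or types.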

## Supported debuggers

### GDB

We have our own fork of GDB: [https://github.com/rust-dev-tools/gdb]

#### Rust expression parser

To be able to show debug output, we need an expression parser.
The GDB expression parser is written in [Bison] and can parse only a subset of Rust expressions.
It was written from scratch and has no relation to any other parser.
For example, it is not related to rustc's parser.

GDB has Rust-like value and type output. It can print values and types in a way
that looks like Rust syntax. Likewise, when you print a type with [ptype] in GDB,
the output looks like Rust source code. Check out the documentation in the [manual for GDB/Rust].

#### Parser extensions

The expression parser has a couple of extensions to facilitate features that you cannot express
in Rust. Some limitations are listed in the [manual for GDB/Rust]. There is some special
code in the DWARF reader in GDB to support the extensions.

A couple of examples of the DWARF reader support needed are as follows:

1. Enum: Needed for support for enum types. rustc writes the information about an enum into
   DWARF, and GDB reads the DWARF to understand where the tag field is, whether there is a tag
   field at all, or whether the tag slot is shared via the non-zero optimization, etc.

2. Dissect trait objects: A DWARF extension where the trait object's description in the DWARF
   also points to a stub description of the corresponding vtable, which in turn points to the
   concrete type for which this trait object exists. This means that you can do a `print *object`
   for that trait object, and GDB will understand how to find the correct type of the payload in
   the trait object.

**TODO**: Figure out if the following should be mentioned in the GDB-Rust document rather than
this guide page so there is no duplication. This is regarding the following comments:

[This comment by Tom](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r284027340)
> gdb's Rust extensions and limitations are documented in the gdb manual:
https://sourceware.org/gdb/onlinedocs/gdb/Rust.html -- however, this neglects to mention that
gdb convenience variables and registers follow the gdb $ convention, and that the Rust parser
implements the gdb @ extension.

[This question by Aman](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r285401353)
> @tromey do you think we should mention this part in the GDB-Rust document rather than this
document so there is no duplication etc.?

#### Developer notes

* This work is now upstream. Bugs can be reported in [GDB Bugzilla].

### LLDB

We use a fork of the LLVM project: [https://github.com/rust-lang/llvm-project]

LLDB currently only works on macOS because of a dependency issue. This issue was easier to
solve for macOS than for Linux. However, Tom has a possible solution which could enable
us to ship LLDB everywhere.

#### Rust expression parser

This expression parser is written in C++. It is a type of [Recursive Descent parser].
It implements slightly less of the Rust language than GDB's parser does.
LLDB has Rust-like value and type output.

#### Parser extensions

There is some special code in the DWARF reader in LLDB to support the extensions.
An example of the DWARF reader support needed is as follows:

1. Enum: Needed for support for enum types. rustc writes the information about an
   enum into DWARF, and LLDB reads the DWARF to understand where the tag field is, whether
   there is a tag field at all, or whether the tag slot is shared via the non-zero optimization, etc.
   In other words, it has enum support as well.
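
The enum item above mentions that an enum may or may not carry a separate tag field. The
following sketch shows why a debugger cannot assume a fixed layout and has to read it from DWARF.
The helper `tag_is_free` is a hypothetical name introduced here; it only compares sizes and says
nothing about how rustc encodes the layout in debug info.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Hypothetical helper: returns true when wrapping `T` in `Option` costs no
// extra space, i.e. `None` is stored in a bit pattern that `T` never uses
// instead of in a separate tag field.
fn tag_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // References are never null, so `None` can be encoded as the null pointer.
    println!("Option<&u8>        shares the slot: {}", tag_is_free::<&u8>());
    // `NonZeroU32` is never zero, so `None` can be encoded as zero.
    println!("Option<NonZeroU32> shares the slot: {}", tag_is_free::<NonZeroU32>());
    // Every bit pattern of `u64` is a valid value, so a separate tag is needed.
    println!("Option<u64>        shares the slot: {}", tag_is_free::<u64>());
}
```

In the first two cases a debugger has to know which payload value stands for `None`; in the
third it has to know where the tag lives. Both pieces of information come from the DWARF that
rustc writes.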

#### Developer notes

* None of the LLDB work is upstream. This [rust-lang/lldb wiki page] explains a few details.
* The reason for forking LLDB is that LLDB recently removed all the other language plugins
  due to lack of maintenance.
* LLDB has a plugin architecture, but that does not work for language support.
* LLDB is available via the Rust build (`rustup`).
* GDB generally works better on Linux.

## DWARF and Rustc

[DWARF] is the standard way compilers generate debugging information that debuggers read.
It is _the_ debugging format on macOS and Linux. It is a multi-language, extensible format
and is mostly good enough for Rust's purposes. Hence, the current implementation reuses DWARF's
concepts. This is true even if some of the concepts in DWARF do not align with Rust
semantically, because generally there can be some kind of mapping between the two.

We have some DWARF extensions that the Rust compiler emits and the debuggers understand that
are _not_ in the DWARF standard.

* The Rust compiler will emit DWARF for a virtual table, and this `vtable` object will have a
  `DW_AT_containing_type` that points to the real type. This lets debuggers dissect a trait object
  pointer to correctly find the payload. E.g., here's such a DIE, from a test case in the gdb
  repository:

```asm
<1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type)
   <1aa>   DW_AT_containing_type: <0x1b4>
   <1ae>   DW_AT_name        : (indirect string, offset: 0x23d): vtable
   <1b2>   DW_AT_byte_size   : 0
   <1b3>   DW_AT_alignment   : 8
```

* The other extension is that the Rust compiler can emit a tagless discriminated union.
  See [DWARF feature request] for this item.
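
The vtable extension in the first item above exists because a trait object pointer does not
name its concrete type in the source. The sketch below illustrates this with made-up types
(`Speak`, `Dog`, `Cat`). The size check reflects how current rustc lays out `&dyn Trait`, as a
data pointer plus a vtable pointer; treat it as an observation, not a language guarantee.

```rust
use std::mem::size_of;

trait Speak {
    fn speak(&self) -> String;
}

struct Dog;

struct Cat {
    lives: u8,
}

impl Speak for Dog {
    fn speak(&self) -> String {
        "Woof".to_string()
    }
}

impl Speak for Cat {
    fn speak(&self) -> String {
        format!("Meow ({} lives left)", self.lives)
    }
}

// The static type of `animal` is only `dyn Speak`. Which `speak` runs is
// decided at run time through the vtable half of the pointer.
fn describe(animal: &dyn Speak) -> String {
    animal.speak()
}

fn main() {
    // With current rustc, a trait object reference is two words wide.
    println!(
        "&dyn Speak is {} bytes, usize is {} bytes",
        size_of::<&dyn Speak>(),
        size_of::<usize>()
    );

    let animals: Vec<Box<dyn Speak>> = vec![Box::new(Dog), Box::new(Cat { lives: 9 })];
    for animal in &animals {
        // `&**animal` turns `&Box<dyn Speak>` into `&dyn Speak`.
        println!("{}", describe(&**animal));
    }
}
```

When a debugger stops inside `describe`, the vtable DIE and its `DW_AT_containing_type` are what
let it work out whether the payload behind `animal` is a `Dog` or a `Cat`.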

### Current limitations of DWARF

* Traits: representing traits in DWARF requires a bigger change to DWARF than usual.
* DWARF provides no way to differentiate between structs and tuples. The Rust compiler emits
  tuple fields with names like `__0`, and debuggers look for a sequence of such names to overcome
  this limitation. Without further help, the debugger would access such a field as `x.__0`
  instead of `x.0`. This is resolved via the Rust parser in the debugger, so now you can write `x.0`.

DWARF relies on debuggers to know some information about the platform ABI.
Rust does not do that all the time.
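
The tuple-field workaround from the list above can be sketched as a small check over field
names. Both functions below are hypothetical helpers written for this guide; they imitate the
idea of "look for a sequence of `__0`, `__1`, ... names" and are not taken from GDB or LLDB.

```rust
// Hypothetical helper: the field name emitted for tuple index `index`,
// following the `__0` naming described in the text.
fn dwarf_field_name(index: usize) -> String {
    format!("__{}", index)
}

// Hypothetical heuristic: a struct is treated as a tuple when its fields are
// named `__0`, `__1`, ... in order. An empty field list is not treated as a
// tuple here, which is an arbitrary choice for this sketch.
fn looks_like_tuple(field_names: &[&str]) -> bool {
    !field_names.is_empty()
        && field_names
            .iter()
            .enumerate()
            .all(|(index, name)| *name == dwarf_field_name(index))
}

fn main() {
    // Field names as a debugger might see them for `struct Pair(i32, i32)`.
    println!("{}", looks_like_tuple(&["__0", "__1"]));
    // Field names for `struct Named { first: i32, second: i32 }`.
    println!("{}", looks_like_tuple(&["first", "second"]));
}
```

Once a type is recognized as a tuple, the debugger's Rust parser can translate the user's `x.0`
into a lookup of the field named `__0`.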

## Developer notes

This section is from the talk about certain aspects of development.

## What is missing

### Shipping GDB in Rustup

Tracking issue: [https://github.com/rust-lang/rust/issues/34457]

Shipping GDB requires a change to the Rustup delivery system. To manage Rustup build size and
times, we need to build GDB separately, on its own, and somehow provide the resulting artifacts
for inclusion in the final build. However, if we can ship GDB with rustup, it will simplify
the development process: the compiler can emit new debug info, and the shipped debugger can
readily consume it.

The main issue in achieving this is setting up dependencies. One such dependency is Python. That
is why we have our own fork of GDB: one of the drivers is patched on Rust's side to
check for the correct version of Python (Python 2.7 in this case. *Note: Python3 is not chosen
for this purpose because Python's stable ABI is limited and is not sufficient for GDB's needs.
See [https://docs.python.org/3/c-api/stable.html]*).

The fork exists to keep updates to the debugger as fast as possible as we make changes to the
debugging symbols, in essence, to ship the debugger as soon as new debugging info is added.
GDB only releases every six months or so. However, changes that are
not related to Rust itself should ideally be merged upstream.

### Code signing for LLDB debug server on macOS

According to Wikipedia, [System Integrity Protection] is

> System Integrity Protection (SIP, sometimes referred to as rootless) is a security feature
> of Apple's macOS operating system introduced in OS X El Capitan. It comprises a number of
> mechanisms that are enforced by the kernel. A centerpiece is the protection of system-owned
> files and directories against modifications by processes without a specific "entitlement",
> even when executed by the root user or a user with root privileges (sudo).

It prevents processes from using the `ptrace` syscall. If a process wants to use `ptrace`, it has
to be code signed. The certificate that signs it has to be trusted on your machine.

See [Apple developer documentation for System Integrity Protection].

We may need to sign up with Apple and get the keys to do this signing. Tom has looked into
whether Mozilla could do this; it apparently cannot, because it is at the maximum number of
keys it is allowed to sign. Tom does not know if Mozilla could get more keys.

Alternatively, Tom suggests that maybe a Rust legal entity is needed to get the keys via Apple.
This problem is not technical in nature. If we had such a key we could sign GDB as well and
ship that.

### DWARF and Traits

Rust traits are not emitted into DWARF at all. The impact of this is that calling a method
`x.method()` does not work as is. The reason is that the method is implemented by a trait, as
opposed to a type. That information is not present, so the debugger cannot find trait methods.

DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this
interface type to represent traits.

DWARF only deals with concrete names, not the reference types. So, a given implementation of a
trait for a type would be one of these interfaces (`DW_TAG_interface_type`). Also, the type for
which it is implemented would describe all the interfaces this type implements. This requires a
DWARF extension.

Issue on GitHub: [https://github.com/rust-lang/rust/issues/33014]
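
To make the distinction in this section concrete, the sketch below puts a method implemented on
a type next to a method implemented through a trait. The names (`Point`, `sum`, `Describe`,
`describe`) are made up for illustration. The comments restate the limitation described above
and do not claim anything about a specific debugger version.

```rust
struct Point {
    x: i32,
    y: i32,
}

// Inherent method: implemented directly on the type.
impl Point {
    fn sum(&self) -> i32 {
        self.x + self.y
    }
}

trait Describe {
    fn describe(&self) -> String;
}

// Trait method: implemented by a trait, as opposed to a type. Since traits
// are not emitted into DWARF, the debug info does not record that `Point`
// implements `Describe`, which is why a call such as `p.describe()` typed
// into a debugger does not work as is.
impl Describe for Point {
    fn describe(&self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // Both calls compile and run fine; the difference only shows up when an
    // expression evaluator has to resolve the method from debug info alone.
    println!("{} {}", p.sum(), p.describe());
}
```

Under the interface-type idea described above, the `impl Describe for Point` block would get its
own DWARF description, and the description of `Point` would list it.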

## Typical process for a Debug Info change (LLVM)

LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into.
This is why we need to change LLVM first: rustc emits LLVM debug info metadata rather than
DWARF directly. This is a kind of metadata that you construct and hand off to LLVM. For the
rustc/LLVM hand-off, some LLVM DI builder methods are called to construct the representation
of a type.

The steps of this process are as follows:

1. LLVM needs changing.

   LLVM does not emit interface types at all, so this needs to be implemented in LLVM first.

   Get sign-off from the LLVM maintainers that this is a good idea.

2. Change the DWARF extension.

3. Update the debuggers.

   Update DWARF readers, expression evaluators.

4. Update Rust compiler.

   Change it to emit this new information.

### Procedural macro stepping

A deep question is: how do you actually debug a procedural macro?
What is the location you emit for a macro expansion? Consider some of the following cases:

* You can emit the location of the invocation of the macro.
* You can emit the location of the definition of the macro.
* You can emit the locations of the content of the macro.

RFC: [https://github.com/rust-lang/rfcs/pull/2117]

The focus is to let macros decide what to do. This can be achieved by having some kind of
attribute that lets the macro tell the compiler where the line marker should be. This affects
where you set breakpoints and what happens when you step through the code.

## Source file checksums in debug info

Both DWARF and CodeView (PDB) support embedding a cryptographic hash of each source file that
contributed to the associated binary.

The cryptographic hash can be used by a debugger to verify that the source file matches the
executable. If the source file does not match, the debugger can provide a warning to the user.

The hash can also be used to prove that a given source file has not been modified since it was
used to compile an executable. Because MD5 and SHA1 both have demonstrated vulnerabilities,
using SHA256 is recommended for this application.

The Rust compiler stores the hash for each source file in the corresponding `SourceFile` in
the `SourceMap`. The hashes of input files to external crates are stored in `rlib` metadata.

A default hashing algorithm is set in the target specification. This allows the target to
specify the best hash available, since not all targets support all hash algorithms.

The hashing algorithm for a target can also be overridden with the `-Z source-file-checksum=`
command-line option.
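
The verification step described above (recompute the hash of the source file, compare it with
the one recorded in debug info, and warn on mismatch) can be sketched as follows. Rust's
standard library has no MD5, SHA1, or SHA256, so this sketch swaps in 64-bit FNV-1a, which is
**not** a cryptographic hash and is used purely as a stand-in. `toy_digest` and `source_matches`
are hypothetical names, not rustc or debugger APIs.

```rust
// Stand-in digest: 64-bit FNV-1a. It is NOT cryptographic and only takes the
// place of MD5/SHA1/SHA256 so that the example needs no external crates.
fn toy_digest(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &byte in bytes {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash
}

// Hypothetical check a debugger could perform: does the source text on disk
// still hash to the value that was recorded when the binary was compiled?
fn source_matches(recorded: u64, current_source: &[u8]) -> bool {
    toy_digest(current_source) == recorded
}

fn main() {
    let compiled_source = b"fn main() {}";
    // At compile time, the digest is stored alongside the file name.
    let recorded = toy_digest(compiled_source);

    // Later, the debugger re-reads the file before showing source lines.
    let edited_source = b"fn main() { println!(\"changed\"); }";
    if !source_matches(recorded, edited_source) {
        println!("warning: source file is newer than the executable");
    }
}
```

A real implementation would use one of the algorithms named in this section, chosen by the
target specification or the command-line option mentioned above.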

#### DWARF 5
DWARF version 5 supports embedding an MD5 hash to validate the source file version in use.
See DWARF 5, Section 6.2.4.1, content type code `DW_LNCT_MD5`.

#### LLVM
LLVM IR supports MD5 and SHA1 (and SHA256 in LLVM 11+) source file checksums in the DIFile node.

[LLVM DIFile documentation](https://llvm.org/docs/LangRef.html#difile)

#### Microsoft Visual C++ Compiler /ZH option
The MSVC compiler supports embedding MD5, SHA1, or SHA256 hashes in the PDB using the `/ZH`
compiler option.

[MSVC /ZH documentation](https://docs.microsoft.com/en-us/cpp/build/reference/zh)

#### Clang
Clang always embeds an MD5 checksum, though this does not appear in documentation.

## Future work

#### Name mangling changes

* New demangler in `libiberty` (gcc source tree).
* New demangler in LLVM or LLDB.

**TODO**: Check the location of the demangler source.
[Question on Github](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r283062536).

#### Reuse Rust compiler for expressions

This is an important idea because debuggers by and large do not try to implement type
inference. You need to be much more explicit when you type into the debugger than in your
actual source code. So, you cannot just copy and paste an expression from your source
code into the debugger and expect the same answer, although this would be nice. Using the
compiler can help here.

It is certainly doable, but it is a large project. You certainly need a bridge to the
debugger because the debugger alone has access to the memory. Both GDB (gcc) and LLDB (clang)
have this feature. LLDB uses Clang to compile code to JIT and GDB can do the same with GCC.

The expression evaluators of both debuggers implement both a superset and a subset of Rust.
They implement just the expression language, but they also add some extensions; for example,
GDB has convenience variables. Therefore, if you take this route, you not only need
to build this bridge but may also have to add some mode that lets the compiler understand
some extensions.

[Tom Tromey discusses debugging support in rustc]: https://www.youtube.com/watch?v=elBxMRSNYr4
[Debugging the Compiler]: compiler-debugging.md
[debugger or debugging tool]: https://en.wikipedia.org/wiki/Debugger
[Bison]: https://www.gnu.org/software/bison/
[ptype]: https://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_109.html
[rust-lang/lldb wiki page]: https://github.com/rust-lang/lldb/wiki
[DWARF]: http://dwarfstd.org
[manual for GDB/Rust]: https://sourceware.org/gdb/onlinedocs/gdb/Rust.html
[GDB Bugzilla]: https://sourceware.org/bugzilla/
[Recursive Descent parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser
[System Integrity Protection]: https://en.wikipedia.org/wiki/System_Integrity_Protection
[https://github.com/rust-dev-tools/gdb]: https://github.com/rust-dev-tools/gdb
[DWARF feature request]: http://dwarfstd.org/ShowIssue.php?issue=180517.2
[https://docs.python.org/3/c-api/stable.html]: https://docs.python.org/3/c-api/stable.html
[https://github.com/rust-lang/rfcs/pull/2117]: https://github.com/rust-lang/rfcs/pull/2117
[https://github.com/rust-lang/rust/issues/33014]: https://github.com/rust-lang/rust/issues/33014
[https://github.com/rust-lang/rust/issues/34457]: https://github.com/rust-lang/rust/issues/34457
[Apple developer documentation for System Integrity Protection]: https://developer.apple.com/library/archive/releasenotes/MacOSX/WhatsNewInOSX/Articles/MacOSX10_11.html#//apple_ref/doc/uid/TP40016227-SW11
[https://github.com/rust-lang/lldb]: https://github.com/rust-lang/lldb
[https://github.com/rust-lang/llvm-project]: https://github.com/rust-lang/llvm-project