## Debugging LLVM

> NOTE: If you are looking for info about code generation, please see [this
> chapter][codegen] instead.

[codegen]: ./codegen.md

This section is about debugging compiler bugs in code generation (e.g. why the
compiler generated some piece of code or crashed in LLVM). LLVM is a big
project on its own that probably needs to have its own debugging document (not
that I could find one). But here are some tips that are important in a rustc
context:

### Minimize the example

As a general rule, compilers generate lots of information from analyzing code.
Thus, a useful first step is usually to find a minimal example. One way to do
this is to:

1. create a new crate that reproduces the issue (e.g. adding whatever crate is
   at fault as a dependency, and using it from there)

2. minimize the crate by removing external dependencies; that is, moving
   everything relevant to the new crate

3. further minimize the issue by making the code shorter (there are tools that
   help with this like `creduce`)

For more discussion on methodology for steps 2 and 3 above, there is an
[epic blog post][mcve-blog] from pnkfelix specifically about Rust program minimization.

[mcve-blog]: https://blog.pnkfx.org/blog/2019/11/18/rust-bug-minimization-patterns/

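For step 3, `creduce` needs an "interestingness test": a script that exits
with status 0 for as long as the file still reproduces the bug. The following
is only a minimal sketch of such a test, not a definitive recipe. The function
name `is_interesting`, the `RUSTC` override variable, and the idea of matching
on a fixed message are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Succeeds (exit status 0) only while compiling the file still prints the
# message we are chasing. RUSTC is a hypothetical override used here for
# illustration; it defaults to the rustc found on PATH.
is_interesting() {
    local source_file=$1 needle=$2
    # -C codegen-units=1 keeps LLVM from being invoked in parallel.
    # 2>&1 folds stderr into the pipe, since crashes are reported there.
    "${RUSTC:-rustc}" -O -C codegen-units=1 "$source_file" 2>&1 |
        grep -q -F -- "$needle"
}
```

Saved as a script (say `interesting.sh`, a hypothetical name) that ends with a
call such as `is_interesting repro.rs 'Assertion failed'`, this could be handed
to `creduce ./interesting.sh repro.rs`. Matching on the exact message helps
avoid reducing towards a different, unrelated failure.
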
### Enable LLVM internal checks

The official compilers (including nightlies) have LLVM assertions disabled,
which means that LLVM assertion failures can show up as compiler crashes (not
ICEs but "real" crashes) and other sorts of weird behavior. If you are
encountering these, it is a good idea to try using a compiler with LLVM
assertions enabled - either an "alt" nightly or a compiler you build yourself
by setting `[llvm] assertions=true` in your config.toml - and see whether
anything turns up.

The rustc build process builds the LLVM tools into
`./build/<host-triple>/llvm/bin`. They can be called directly.
These tools include:
  * [`llc`], which compiles bitcode (`.bc` files) to executable code; this can be used to
    replicate LLVM backend bugs.
  * [`opt`], a bitcode transformer that runs LLVM optimization passes.
  * [`bugpoint`], which reduces large test cases to small, useful ones.
  * and many others, some of which are referenced in the text below.

[`llc`]: https://llvm.org/docs/CommandGuide/llc.html
[`opt`]: https://llvm.org/docs/CommandGuide/opt.html
[`bugpoint`]: https://llvm.org/docs/Bugpoint.html

By default, the Rust build system does not check for changes to the LLVM source code or
its build configuration settings. So, if you need to rebuild the LLVM that is linked
into `rustc`, first delete the file `llvm-finished-building`, which should be located
in `build/<host-triple>/llvm/`.

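Deleting the stamp file can be wrapped in a small helper. This is a sketch
under assumptions: the function name `force_llvm_rebuild` and the `BUILD_DIR`
override are made up for illustration, and only the stamp file path comes from
the text above.

```bash
# Remove the stamp file so that the next build reconfigures and rebuilds LLVM.
# BUILD_DIR is a hypothetical override for illustration; it defaults to the
# ./build directory of a rustc checkout.
force_llvm_rebuild() {
    local host_triple=$1
    local build_dir=${BUILD_DIR:-./build}
    local stamp="$build_dir/$host_triple/llvm/llvm-finished-building"
    if [ -e "$stamp" ]; then
        rm -- "$stamp"
        echo "removed $stamp"
    else
        # Nothing to delete: wrong triple, or LLVM has not been built yet.
        echo "no stamp file at $stamp" >&2
        return 1
    fi
}
```

For example, `force_llvm_rebuild x86_64-unknown-linux-gnu` followed by your
usual build command should trigger a fresh LLVM build.
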
The default rustc compilation pipeline has multiple codegen units, which is
hard to replicate manually and means that LLVM is called multiple times in
parallel. If you can get away with it (i.e. if it doesn't make your bug
disappear), passing `-C codegen-units=1` to rustc will make debugging easier.

### Get your hands on raw LLVM input

For rustc to generate LLVM IR, you need to pass the `--emit=llvm-ir` flag. If
you are building via cargo, use the `RUSTFLAGS` environment variable (e.g.
`RUSTFLAGS='--emit=llvm-ir'`). This causes rustc to spit out LLVM IR into the
target directory.

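As a rough sketch of the cargo route: the build command is shown as a comment,
and a small helper then lists whatever `.ll` files were produced. The helper
name `list_llvm_ir` is hypothetical, and searching the whole target directory
is a deliberate choice so that the sketch does not depend on exactly which
subdirectory cargo places the files in.

```bash
# Build with cargo while asking rustc to also emit textual LLVM IR.
# (Shown as a comment so that sourcing this snippet does not start a build.)
#   RUSTFLAGS='--emit=llvm-ir' cargo build --release

# List every emitted .ll file below a target directory (default: ./target).
list_llvm_ir() {
    find "${1:-./target}" -type f -name '*.ll' | sort
}
```

Running `list_llvm_ir` from the crate root after the build prints one path per
line, which is handy for picking the file that belongs to the crate you care
about.
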
`cargo llvm-ir [options] path` spits out the LLVM IR for a particular function
at `path`. (`cargo install cargo-asm` installs `cargo asm` and
`cargo llvm-ir`). `--build-type=debug` emits code for debug builds. There are also
other useful options. Also, debug info in LLVM IR can clutter the output a lot:
`RUSTFLAGS="-C debuginfo=0"` is really useful.

`RUSTFLAGS="-C save-temps"` outputs LLVM bitcode (not the same as IR) at
different stages during compilation, which is sometimes useful. The output LLVM
bitcode will be in `.bc` files in the compiler's output directory, set via the
`--out-dir DIR` argument to `rustc`.

  * If you are hitting an assertion failure or segmentation fault from the LLVM
    backend when invoking `rustc` itself, it is a good idea to try passing each
    of these `.bc` files to the `llc` command, and see if you get the same
    failure. (LLVM developers often prefer a bug reduced to a `.bc` file over one
    that uses a Rust crate for its minimized reproduction.)

  * To get human-readable versions of the LLVM bitcode, you just need to convert
    the bitcode (`.bc`) files to `.ll` files using `llvm-dis`, which should be in
    the LLVM tools directory of your local rustc build
    (`./build/<host-triple>/llvm/bin`).

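Trying each `.bc` file by hand gets tedious, so the first bullet above can be
sketched as a loop. This is an illustration under assumptions: the function
name `find_failing_bitcode` and the `LLC` override variable are hypothetical,
and the tool's own output is discarded because only its exit status is
inspected here.

```bash
# Run a tool (llc by default) over every .bc file in a directory and report
# which inputs make it fail. LLC is a hypothetical override for illustration,
# e.g. LLC=./build/<host-triple>/llvm/bin/llc.
find_failing_bitcode() {
    local dir=$1 tool=${LLC:-llc} file status=0
    for file in "$dir"/*.bc; do
        [ -e "$file" ] || continue   # the glob matched nothing
        # -o /dev/null: the generated code is not needed, only the exit status.
        if ! "$tool" "$file" -o /dev/null >/dev/null 2>&1; then
            echo "FAIL: $file"
            status=1
        fi
    done
    return "$status"
}
```

Any file reported as `FAIL` can then be rerun through `llc` directly to see the
actual assertion or crash, and turned into readable IR with
`llvm-dis file.bc -o file.ll`.
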
Note that rustc emits different IR depending on whether `-O` is enabled, even
without LLVM's optimizations, so if you want to play with the IR rustc emits,
you should run something like:

```bash
$ rustc +local my-file.rs --emit=llvm-ir -O -C no-prepopulate-passes \
    -C codegen-units=1
$ OPT=./build/$TRIPLE/llvm/bin/opt
$ $OPT -S -O2 < my-file.ll > my
```

If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which
IR causes an optimization-time assertion to fail, or to see when LLVM performs
a particular optimization, you can pass the rustc flag `-C
llvm-args=-print-after-all`, and possibly add `-C
llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME'` (e.g. `-C
llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$\
7replace17hbe10ea2e7c809b0bE'`).

That produces a lot of output on standard error, so you'll want to redirect it
to a file. Also, if you are using neither `-filter-print-funcs` nor `-C
codegen-units=1`, then, because the multiple codegen units run in parallel, the
printouts will mix together and you won't be able to read anything.

  * One caveat to the aforementioned methodology: the `-print` family of options
    to LLVM only prints the IR unit that the pass runs on (e.g., just a
    function), and does not include any referenced declarations, globals,
    metadata, etc. This means you cannot in general feed the output of `-print`
    into `llc` to reproduce a given problem.

  * Within LLVM itself, calling `F.getParent()->dump()` at the beginning of
    `SafeStackLegacyPass::runOnFunction` will dump the whole module, which
    may provide a better basis for reproduction. (However, you
    should be able to get that same dump from the `.bc` files dumped by
    `-C save-temps`.)

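The `-print-after-all` workflow described above can be sketched as a helper
that sends standard error to a file. The function name `dump_ir_after_passes`
and the `RUSTC` override are assumptions for illustration; the flags themselves
are the ones discussed in the text.

```bash
# Compile one file and capture LLVM's per-pass IR dumps, which are written to
# standard error. RUSTC is a hypothetical override; it defaults to rustc.
dump_ir_after_passes() {
    local source_file=$1 dump_file=$2
    # codegen-units=1 stops dumps from parallel codegen units interleaving.
    "${RUSTC:-rustc}" "$source_file" -O \
        -C codegen-units=1 \
        -C llvm-args=-print-after-all \
        2> "$dump_file"
}
```

For example, `dump_ir_after_passes my-file.rs ir-dump.txt`. Ordinary compiler
diagnostics also go to standard error, so they end up in the same file.
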
If you want just the IR for a specific function (say, you want to see why it
causes an assertion or doesn't optimize correctly), you can use `llvm-extract`,
e.g.

```bash
$ ./build/$TRIPLE/llvm/bin/llvm-extract \
    -func='_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE' \
    -S \
    < unextracted.ll \
    > extracted.ll
```

### Investigate LLVM optimization passes

If you are seeing incorrect behavior due to an optimization pass, a very handy
LLVM option is `-opt-bisect-limit`, which takes an integer denoting the index
value of the highest pass to run. Index values for taken passes are stable
from run to run; by coupling this with software that automates bisecting the
search space based on the resulting program, an errant pass can be quickly
determined. When an `-opt-bisect-limit` is specified, all runs are displayed
to standard error, along with their index and output indicating if the
pass was run or skipped. Setting the limit to an index of -1 (e.g.,
`RUSTFLAGS="-C llvm-args=-opt-bisect-limit=-1"`) will show all passes and
their corresponding index values.

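The "software that automates bisecting" can be as small as a binary search. The
sketch below is one possible shape, not an established tool: the function name
`bisect_first_bad` and the `check` callback are hypothetical. It assumes that a
limit of 0 produces a working program, that the given upper bound produces the
broken one, and that the callback succeeds exactly when the program built with
that limit behaves correctly.

```bash
# Binary search for the smallest -opt-bisect-limit value that still shows the
# bug. $1 is an upper bound known to be bad, $2 names a command that takes a
# limit and succeeds when the resulting program is good. A check could look
# like this (hypothetical, adapt to your reproduction):
#   check() { RUSTFLAGS="-C llvm-args=-opt-bisect-limit=$1" cargo test >/dev/null 2>&1; }
bisect_first_bad() {
    local lo=0 hi=$1 check=$2 mid
    # Invariant: limit $lo is good, limit $hi is bad.
    while (( hi - lo > 1 )); do
        mid=$(( (lo + hi) / 2 ))
        if "$check" "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$hi"
}
```

The printed number is the index of the first pass whose execution makes the
problem appear; the listing produced with a limit of -1 tells you which pass
that index corresponds to.
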
If you want to play with the optimization pipeline, you can use the [`opt`] tool
from `./build/<host-triple>/llvm/bin/` with the LLVM IR emitted by rustc.

When investigating the implementation of LLVM itself, you should be
aware of its [internal debug infrastructure][llvm-debug].
This is provided in LLVM Debug builds, which you enable for rustc
LLVM builds by changing these settings in the config.toml:

```toml
[llvm]
# Indicates whether the LLVM assertions are enabled or not
assertions = true

# Indicates whether the LLVM build is a Release or Debug build
optimize = false
```

The quick summary is:
  * Setting `assertions=true` enables coarse-grain debug messaging.
    * Beyond that, setting `optimize=false` enables fine-grain debug messaging.
  * `LLVM_DEBUG(dbgs() << msg)` in LLVM is like `debug!(msg)` in `rustc`.
  * The `-debug` option turns on all messaging; it is like setting the
    environment variable `RUSTC_LOG=debug` in `rustc`.
  * The `-debug-only=<pass1>,<pass2>` variant is more selective; it is like
    setting the environment variable `RUSTC_LOG=path1,path2` in `rustc`.

[llvm-debug]: https://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option

### Getting help and asking questions

If you have some questions, head over to the [rust-lang Zulip] and
specifically the `#t-compiler/wg-llvm` stream.

[rust-lang Zulip]: https://rust-lang.zulipchat.com/

### Compiler options to know and love

The `-C help` and `-Z help` compiler switches will list out a variety
of interesting options you may find useful. Here are a few of the most
common that pertain to LLVM development (some of them are employed in the
tutorial above):

- The `--emit llvm-ir` option emits a `<filename>.ll` file with LLVM IR in textual format
    - The `--emit llvm-bc` option emits in bitcode format (`<filename>.bc`)
- Passing `-C llvm-args=<foo>` allows passing pretty much all the
  options that tools like llc and opt would accept;
  e.g. `-C llvm-args=-print-before-all` to print IR before every LLVM
  pass.
- The `-C no-prepopulate-passes` option avoids pre-populating the LLVM pass
  manager with a list of passes. This will allow you to view the LLVM
  IR that rustc generates, not the LLVM IR after optimizations.
- The `-C passes=val` option allows you to supply a space-separated list of extra LLVM passes to run
- The `-C save-temps` option saves all temporary output files during compilation
- The `-Z print-llvm-passes` option will print out LLVM optimization passes being run
- The `-Z time-llvm-passes` option measures the time of each LLVM pass
- The `-Z verify-llvm-ir` option will verify the LLVM IR for correctness
- The `-Z no-parallel-llvm` option will disable parallel compilation of distinct compilation units
- The `-Z llvm-time-trace` option will output a Chrome profiler compatible JSON file
  which contains details and timings for LLVM passes.
- The `-C llvm-args=-opt-bisect-limit=<index>` option allows for bisecting LLVM
  optimizations.

### Filing LLVM bug reports

When filing an LLVM bug report, you will probably want some sort of minimal
working example that demonstrates the problem. The Godbolt compiler explorer is
really helpful for this.

1. Once you have some LLVM IR for the problematic code (see above), you can
   create a minimal working example with Godbolt. Go to
   [llvm.godbolt.org](https://llvm.godbolt.org).

2. Choose `LLVM-IR` as the programming language.

3. Use `llc` to compile the IR to a particular target as is:
    - There are some useful flags: `-mattr` enables target features, `-march=`
      selects the target, `-mcpu=` selects the CPU, etc.
    - Commands like `llc -march=help` output all architectures available, which
      is useful because sometimes the Rust arch names and the LLVM names do not
      match.
    - If you have compiled rustc yourself somewhere, in the target directory
      you have binaries for `llc`, `opt`, etc.

4. If you want to optimize the LLVM-IR, you can use `opt` to see how the LLVM
   optimizations transform it.

5. Once you have a Godbolt link demonstrating the issue, it is pretty easy to
   file an LLVM bug. Just visit their [GitHub issues page][llvm-issues].

[llvm-issues]: https://github.com/llvm/llvm-project/issues

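Step 3 above can also be tried locally before pasting anything into Godbolt.
The helper below is only a sketch: the function name `compile_ir_for_target`
and the `LLC` override variable are hypothetical, and the architecture and
feature strings in the usage example are placeholders to be replaced with
names reported by `llc -march=help` for your own case.

```bash
# Compile an .ll file for an explicit architecture and feature set, writing
# assembly next to the input. LLC is a hypothetical override for illustration,
# e.g. LLC=./build/<host-triple>/llvm/bin/llc.
compile_ir_for_target() {
    local arch=$1 features=$2 input=$3
    local output=${input%.ll}.s   # bug.ll -> bug.s
    "${LLC:-llc}" -march="$arch" -mattr="$features" "$input" -o "$output"
}
```

For example, `compile_ir_for_target x86-64 +avx2 bug.ll` would ask `llc` for
64-bit x86 code with AVX2 enabled. The same flags can then be entered in the
compiler options box on Godbolt.
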
### Porting bug fixes from LLVM

Once you've identified the bug as an LLVM bug, you will sometimes
find that it has already been reported and fixed in LLVM, but we haven't
gotten the fix yet (or perhaps you are familiar enough with LLVM to fix it yourself).

In that case, we can sometimes opt to port the fix for the bug
directly to our own LLVM fork, so that rustc can use it more easily.
Our fork of LLVM is maintained in [rust-lang/llvm-project]. Once
you've landed the fix there, you'll also need to land a PR modifying
our submodule commits -- ask around on Zulip for help.

[rust-lang/llvm-project]: https://github.com/rust-lang/llvm-project/