//! Propagates assignment destinations backwards in the CFG to eliminate redundant assignments.
//!
//! # Motivation
//!
//! MIR building can insert a lot of redundant copies, and Rust code in general often tends to move
//! values around a lot. The result is a lot of assignments of the form `dest = {move} src;` in MIR.
//! MIR building for constants in particular tends to create additional locals that are only used
//! inside a single block to shuffle a value around unnecessarily.
//!
//! LLVM by itself is not good enough at eliminating these redundant copies (e.g. see
//! <https://github.com/rust-lang/rust/issues/32966>), so this leaves some performance on the table
//! that we can regain by implementing an optimization for removing these assign statements in rustc
//! itself. When this optimization runs fast enough, it can also speed up the constant evaluation
//! and code generation phases of rustc due to the reduced number of statements and locals.
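//!
//! As a hypothetical illustration (not taken from actual MIR output), source code like the
//! following tends to lower to MIR containing a temporary local whose only job is to be moved
//! into the return place:
//!
//! ```ignore (illustrative example)
//! fn id(x: String) -> String {
//!     let tmp = x; // roughly `_2 = move _1;` in MIR
//!     tmp          // roughly `_0 = move _2;` followed by `return`
//! }
//! ```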
//!
//! # The Optimization
//!
//! Conceptually, this optimization is "destination propagation". It is similar to the Named Return
//! Value Optimization, or NRVO, known from the C++ world, except that it isn't limited to return
//! values or the return place `_0`. On a very high level, independent of the actual implementation
//! details, it does the following:
//!
//! 1) Identify `dest = src;` statements with values for `dest` and `src` whose storage can soundly
//!    be merged.
//! 2) Replace all mentions of `src` with `dest` ("unifying" them and propagating the destination
//!    backwards).
//! 3) Delete the `dest = src;` statement (by making it a `nop`).
//!
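//! As a sketch (hypothetical MIR, for illustration only), given
//!
//! ```ignore (syntax-highlighting-only)
//! _1 = const 5_i32;
//! _2 = move _1;      // candidate: `dest = _2`, `src = _1`
//! _0 = move _2;
//! ```
//!
//! step 2) rewrites every `_1` into `_2`, turning the middle statement into the self-assignment
//! `_2 = move _2;`, which step 3) then deletes:
//!
//! ```ignore (syntax-highlighting-only)
//! _2 = const 5_i32;
//! _0 = move _2;
//! ```
//!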
//! Step 1) is by far the hardest, so it is explained in more detail below.
//!
//! ## Soundness
//!
//! We have a pair of places `p` and `q`, whose memory we would like to merge. In order for this to
//! be sound, we need to check a number of conditions:
//!
//! * `p` and `q` must both be *constant* - it does not make much sense to talk about merging them
//!   if they do not consistently refer to the same place in memory. This is satisfied if they do
//!   not contain any indirection through a pointer or any indexing projections.
//!
//! * We need to make sure that the goal of "merging the memory" is actually structurally possible
//!   in MIR. For example, even if all the other conditions are satisfied, there is no way to
//!   "merge" `_5.foo` and `_6.bar`. For now, we ensure this by requiring that both `p` and `q` are
//!   locals with no further projections. Future iterations of this pass should improve on this.
//!
//! * Finally, we want `p` and `q` to use the same memory - however, we still need to make sure that
//!   each of them has enough "ownership" of that memory to continue "doing its job." More
//!   precisely, what we will check is that whenever the program performs a write to `p`, then it
//!   does not currently care about what the value in `q` is (and vice versa). We formalize the
//!   notion of "does not care what the value in `q` is" by checking the *liveness* of `q`.
//!
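//! As a sketch of the liveness condition (hypothetical MIR, for illustration only), `_1` and `_2`
//! conflict here:
//!
//! ```ignore (syntax-highlighting-only)
//! _1 = const 1_i32;
//! _2 = const 2_i32; // a write to `_2` while `_1` is still live: merging them would clobber `_1`
//! _0 = Add(move _1, move _2);
//! ```
//!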
//! Because of the difficulty of computing liveness of places that have their address taken, we do
//! not even attempt to do it. Any places that are in a local that has its address taken are
//! excluded from the optimization.
//!
//! The first two conditions are simple structural requirements on the `Assign` statements that can
//! be trivially checked. The third requirement, however, is more difficult and costly to check.
//!
//! ## Future Improvements
//!
//! There are a number of ways in which this pass could be improved in the future:
//!
//! * Merging storage liveness ranges instead of removing storage statements completely. This may
//!   improve stack usage.
//!
//! * Allow merging locals into places with projections, e.g. `_5` into `_6.foo`.
//!
//! * Liveness analysis with more precision than whole locals at a time. The smaller benefit of this
//!   is that it would allow us to dest prop at "sub-local" levels in some cases. The bigger benefit
//!   of this is that such liveness analysis can report more accurate results about whole locals at
//!   a time. For example, consider:
//!
//!   ```ignore (syntax-highlighting-only)
//!   _1 = u;
//!   // unrelated code
//!   _1.f1 = v;
//!   _2 = _1.f1;
//!   ```
//!
//!   Because the current analysis only thinks in terms of locals, it does not have enough
//!   information to report that `_1` is dead in the "unrelated code" section.
//!
//! * Liveness analysis enabled by alias analysis. This would allow us to not just bail on locals
//!   that ever have their address taken. Of course that requires actually having alias analysis
//!   (and a model to build it on), so this might be a bit of a ways off.
//!
//! * Various perf improvements. There are a bunch of comments in here marked `PERF` with ideas for
//!   how to do things more efficiently. However, the complexity of the pass as a whole should be
//!   kept in mind.
//!
//! ## Previous Work
//!
//! A [previous attempt][attempt 1] at implementing an optimization like this turned out to be a
//! significant regression in compiler performance. Fixing the regressions introduced a lot of
//! undesirable complexity to the implementation.
//!
//! A [subsequent approach][attempt 2] tried to avoid the costly computation by limiting itself to
//! acyclic CFGs, but still turned out to be far too costly to run due to suboptimal performance
//! within individual basic blocks, requiring a walk across the entire block for every assignment
//! found within the block. For the `tuple-stress` benchmark, which has 458745 statements in a
//! single block, this proved to be far too costly.
//!
//! [Another approach after that][attempt 3] was much closer to correct, but had some soundness
//! issues - it was failing to consider stores outside live ranges, and failed to uphold some of the
//! requirements that MIR has for non-overlapping places within statements. However, it also had
//! performance issues caused by `O(l² * s)` runtime, where `l` is the number of locals and `s` is
//! the number of statements and terminators.
//!
//! Since the first attempt at this, the compiler has improved dramatically, and new analysis
//! frameworks have been added that should make this approach viable without requiring a limited
//! approach that only works for some classes of CFGs:
//! - rustc now has a powerful dataflow analysis framework that can handle forwards and backwards
//!   analyses efficiently.
//! - Layout optimizations for generators have been added to improve code generation for
//!   async/await, which are very similar in spirit to what this optimization does.
//!
//! Also, rustc now has a simple NRVO pass (see `nrvo.rs`), which handles a subset of the cases that
//! this destination propagation pass handles, proving that similar optimizations can be performed
//! on MIR.
//!
//! ## Pre/Post Optimization
//!
//! It is recommended to run `SimplifyCfg` and then `SimplifyLocals` some time after this pass, as
//! it replaces the eliminated assign statements with `nop`s and leaves unused locals behind.
//!
//! [liveness]: https://en.wikipedia.org/wiki/Live_variable_analysis
//! [attempt 1]: https://github.com/rust-lang/rust/pull/47954
//! [attempt 2]: https://github.com/rust-lang/rust/pull/71003
//! [attempt 3]: https://github.com/rust-lang/rust/pull/72632

use std::collections::hash_map::{Entry, OccupiedEntry};

use crate::simplify::remove_dead_blocks;
use crate::MirPass;
use rustc_data_structures::fx::FxHashMap;
use rustc_index::bit_set::BitSet;
use rustc_middle::mir::visit::{MutVisitor, PlaceContext, Visitor};
use rustc_middle::mir::{dump_mir, PassWhere};
use rustc_middle::mir::{
    traversal, BasicBlock, Body, InlineAsmOperand, Local, LocalKind, Location, Operand, Place,
    Rvalue, Statement, StatementKind, TerminatorKind,
};
use rustc_middle::ty::TyCtxt;
use rustc_mir_dataflow::impls::MaybeLiveLocals;
use rustc_mir_dataflow::{Analysis, ResultsCursor};

pub struct DestinationPropagation;

impl<'tcx> MirPass<'tcx> for DestinationPropagation {
    fn is_enabled(&self, sess: &rustc_session::Session) -> bool {
        // For now, only run at MIR opt level 3. Two things need to be changed before this can be
        // turned on by default:
        // 1. Because of the overeager removal of storage statements, this can cause stack space
        //    regressions. This opt is not the place to fix this though, it's a more general
        //    problem in MIR.
        // 2. Despite being an overall perf improvement, this still causes a 30% regression in
        //    keccak. We can temporarily fix this by bounding function size, but in the long term
        //    we should fix this by being smarter about invalidating analysis results.
        sess.mir_opt_level() >= 3
    }

    fn run_pass(&self, tcx: TyCtxt<'tcx>, body: &mut Body<'tcx>) {
        let def_id = body.source.def_id();
        let mut allocations = Allocations::default();
        trace!(func = ?tcx.def_path_str(def_id));

        let borrowed = rustc_mir_dataflow::impls::borrowed_locals(body);

        // In order to avoid having to collect data for every single pair of locals in the body, we
        // do not allow doing more than one merge for places that are derived from the same local at
        // once. To avoid missed opportunities, we instead iterate to a fixed point - we'll refer to
        // each of these iterations as a "round."
        //
        // Reaching a fixed point could in theory take up to `min(l, s)` rounds - however, we do not
        // expect to see MIR like that. To verify this, a test was run against `[rust-lang/regex]` -
        // the average MIR body saw 1.32 full iterations of this loop. The most that was hit was 30
        // for a single function. Only 80/2801 (2.9%) of functions saw at least 5.
        //
        // [rust-lang/regex]:
        //     https://github.com/rust-lang/regex/tree/b5372864e2df6a2f5e543a556a62197f50ca3650
        let mut round_count = 0;
        loop {
            // PERF: Can we do something smarter than recalculating the candidates and liveness
            // results?
            let mut candidates = find_candidates(
                body,
                &borrowed,
                &mut allocations.candidates,
                &mut allocations.candidates_reverse,
            );
            trace!(?candidates);
            let mut live = MaybeLiveLocals
                .into_engine(tcx, body)
                .iterate_to_fixpoint()
                .into_results_cursor(body);
            dest_prop_mir_dump(tcx, body, &mut live, round_count);

            FilterInformation::filter_liveness(
                &mut candidates,
                &mut live,
                &mut allocations.write_info,
                body,
            );

            // Because we do not update liveness information, it is unsound to use a local for more
            // than one merge operation within a single round of optimizations. We store here which
            // ones we have already used.
            let mut merged_locals: BitSet<Local> = BitSet::new_empty(body.local_decls.len());

            // This is the set of merges we will apply this round. It is a subset of the candidates.
            let mut merges = FxHashMap::default();

            for (src, candidates) in candidates.c.iter() {
                if merged_locals.contains(*src) {
                    continue;
                }
                let Some(dest) =
                    candidates.iter().find(|dest| !merged_locals.contains(**dest)) else {
                        continue;
                    };
                if !tcx.consider_optimizing(|| {
                    format!("{} round {}", tcx.def_path_str(def_id), round_count)
                }) {
                    break;
                }
                merges.insert(*src, *dest);
                merged_locals.insert(*src);
                merged_locals.insert(*dest);
            }
            trace!(merging = ?merges);

            if merges.is_empty() {
                break;
            }
            round_count += 1;

            apply_merges(body, tcx, &merges, &merged_locals);
        }

        if round_count != 0 {
            // Merging can introduce overlap between moved arguments and/or call destination in
            // unreachable code, which the validator considers to be ill-formed.
            remove_dead_blocks(tcx, body);
        }

        trace!(round_count);
    }
}

/// Container for the various allocations that we need.
///
/// We store these here and hand out `&mut` access to them, instead of dropping and recreating them
/// frequently. Everything with a `&'alloc` lifetime points into here.
#[derive(Default)]
struct Allocations {
    candidates: FxHashMap<Local, Vec<Local>>,
    candidates_reverse: FxHashMap<Local, Vec<Local>>,
    write_info: WriteInfo,
    // PERF: Do this for `MaybeLiveLocals` allocations too.
}

#[derive(Debug)]
struct Candidates<'alloc> {
    /// The set of candidates we are considering in this optimization.
    ///
    /// We will always merge the key into at most one of its values.
    ///
    /// Whether a place ends up in the key or the value does not correspond to whether it appears as
    /// the lhs or rhs of any assignment. As a matter of fact, the places in here might never appear
    /// in an assignment at all. This happens because if we see an assignment like this:
    ///
    /// ```ignore (syntax-highlighting-only)
    /// _1.0 = _2.0
    /// ```
    ///
    /// We will still report that we would like to merge `_1` and `_2` in an attempt to allow us to
    /// remove that assignment.
    c: &'alloc mut FxHashMap<Local, Vec<Local>>,
    /// A reverse index of the `c` set; if the `c` set contains `a => Place { local: b, proj }`,
    /// then this contains `b => a`.
    // PERF: Possibly these should be `SmallVec`s?
    reverse: &'alloc mut FxHashMap<Local, Vec<Local>>,
}

//////////////////////////////////////////////////////////
// Merging
//
// Applies the actual optimization

fn apply_merges<'tcx>(
    body: &mut Body<'tcx>,
    tcx: TyCtxt<'tcx>,
    merges: &FxHashMap<Local, Local>,
    merged_locals: &BitSet<Local>,
) {
    let mut merger = Merger { tcx, merges, merged_locals };
    merger.visit_body_preserves_cfg(body);
}

struct Merger<'a, 'tcx> {
    tcx: TyCtxt<'tcx>,
    merges: &'a FxHashMap<Local, Local>,
    merged_locals: &'a BitSet<Local>,
}

impl<'a, 'tcx> MutVisitor<'tcx> for Merger<'a, 'tcx> {
    fn tcx(&self) -> TyCtxt<'tcx> {
        self.tcx
    }

    fn visit_local(&mut self, local: &mut Local, _: PlaceContext, _location: Location) {
        if let Some(dest) = self.merges.get(local) {
            *local = *dest;
        }
    }

    fn visit_statement(&mut self, statement: &mut Statement<'tcx>, location: Location) {
        match &statement.kind {
            // FIXME: Don't delete storage statements, but "merge" the storage ranges instead.
            StatementKind::StorageDead(local) | StatementKind::StorageLive(local)
                if self.merged_locals.contains(*local) =>
            {
                statement.make_nop();
                return;
            }
            _ => (),
        };
        self.super_statement(statement, location);
        match &statement.kind {
            StatementKind::Assign(box (dest, rvalue)) => {
                match rvalue {
                    Rvalue::Use(Operand::Copy(place) | Operand::Move(place)) => {
                        // These might've been turned into self-assignments by the replacement
                        // (this includes the original statement we wanted to eliminate).
                        if dest == place {
                            debug!("{:?} turned into self-assignment, deleting", location);
                            statement.make_nop();
                        }
                    }
                    _ => {}
                }
            }

            _ => {}
        }
    }
}

//////////////////////////////////////////////////////////
// Liveness filtering
//
// This section enforces the liveness condition (the third bullet point in the soundness section
// above)

struct FilterInformation<'a, 'body, 'alloc, 'tcx> {
    body: &'body Body<'tcx>,
    live: &'a mut ResultsCursor<'body, 'tcx, MaybeLiveLocals>,
    candidates: &'a mut Candidates<'alloc>,
    write_info: &'alloc mut WriteInfo,
    at: Location,
}

// We first implement some utility functions, which we will use to remove candidates according to
// different needs. Throughout the liveness filtering, the `candidates` are only ever accessed
// through these methods, and not directly.
impl<'alloc> Candidates<'alloc> {
    /// Just `Vec::retain`, but the condition is inverted and we add debugging output
    fn vec_filter_candidates(
        src: Local,
        v: &mut Vec<Local>,
        mut f: impl FnMut(Local) -> CandidateFilter,
        at: Location,
    ) {
        v.retain(|dest| {
            let remove = f(*dest);
            if remove == CandidateFilter::Remove {
                trace!("eliminating {:?} => {:?} due to conflict at {:?}", src, dest, at);
            }
            remove == CandidateFilter::Keep
        });
    }

    /// `vec_filter_candidates` but for an `Entry`
    fn entry_filter_candidates(
        mut entry: OccupiedEntry<'_, Local, Vec<Local>>,
        p: Local,
        f: impl FnMut(Local) -> CandidateFilter,
        at: Location,
    ) {
        let candidates = entry.get_mut();
        Self::vec_filter_candidates(p, candidates, f, at);
        if candidates.len() == 0 {
            entry.remove();
        }
    }

    /// For all candidates `(p, q)` or `(q, p)`, removes the candidate if `f(q)` says to do so
    fn filter_candidates_by(
        &mut self,
        p: Local,
        mut f: impl FnMut(Local) -> CandidateFilter,
        at: Location,
    ) {
        // Cover the cases where `p` appears as a `src`
        if let Entry::Occupied(entry) = self.c.entry(p) {
            Self::entry_filter_candidates(entry, p, &mut f, at);
        }
        // And the cases where `p` appears as a `dest`
        let Some(srcs) = self.reverse.get_mut(&p) else {
            return;
        };
        // We use `retain` here to remove the elements from the reverse set if we've removed the
        // matching candidate in the forward set.
        srcs.retain(|src| {
            if f(*src) == CandidateFilter::Keep {
                return true;
            }
            let Entry::Occupied(entry) = self.c.entry(*src) else {
                return false;
            };
            Self::entry_filter_candidates(
                entry,
                *src,
                |dest| {
                    if dest == p { CandidateFilter::Remove } else { CandidateFilter::Keep }
                },
                at,
            );
            false
        });
    }
}

#[derive(Copy, Clone, PartialEq, Eq)]
enum CandidateFilter {
    Keep,
    Remove,
}

impl<'a, 'body, 'alloc, 'tcx> FilterInformation<'a, 'body, 'alloc, 'tcx> {
    /// Filters the set of candidates to remove those that conflict.
    ///
    /// The steps we take are exactly those that are outlined at the top of the file. For each
    /// statement/terminator, we collect the set of locals that are written to in that
    /// statement/terminator, and then we remove all pairs of candidates that contain one such local
    /// and another one that is live.
    ///
    /// We need to be careful about the ordering of operations within each statement/terminator
    /// here. Many statements might write to and read from more than one place, and we need to
    /// consider them all. The strategy for doing this is as follows: We first gather all the places
    /// that are written to within the statement/terminator via `WriteInfo`. Then, we use the
    /// liveness analysis from *before* the statement/terminator (in the control flow sense) to
    /// eliminate candidates - this is because we want to conservatively treat a pair of locals that
    /// is both read and written in the statement/terminator as conflicting, and the liveness
    /// analysis before the statement/terminator will correctly report locals that are read in the
    /// statement/terminator to be live. We are additionally conservative by treating all written-to
    /// locals as also being read from.
    fn filter_liveness<'b>(
        candidates: &mut Candidates<'alloc>,
        live: &mut ResultsCursor<'b, 'tcx, MaybeLiveLocals>,
        write_info_alloc: &'alloc mut WriteInfo,
        body: &'b Body<'tcx>,
    ) {
        let mut this = FilterInformation {
            body,
            live,
            candidates,
            // We don't actually store anything at this scope, we just keep things here to be able
            // to reuse the allocation.
            write_info: write_info_alloc,
            // Doesn't matter what we put here, will be overwritten before being used
            at: Location { block: BasicBlock::from_u32(0), statement_index: 0 },
        };
        this.internal_filter_liveness();
    }

    fn internal_filter_liveness(&mut self) {
        for (block, data) in traversal::preorder(self.body) {
            self.at = Location { block, statement_index: data.statements.len() };
            self.live.seek_after_primary_effect(self.at);
            self.write_info.for_terminator(&data.terminator().kind);
            self.apply_conflicts();

            for (i, statement) in data.statements.iter().enumerate().rev() {
                self.at = Location { block, statement_index: i };
                self.live.seek_after_primary_effect(self.at);
                self.write_info.for_statement(&statement.kind, self.body);
                self.apply_conflicts();
            }
        }
    }

    fn apply_conflicts(&mut self) {
        let writes = &self.write_info.writes;
        for p in writes {
            let other_skip = self.write_info.skip_pair.and_then(|(a, b)| {
                if a == *p {
                    Some(b)
                } else if b == *p {
                    Some(a)
                } else {
                    None
                }
            });
            self.candidates.filter_candidates_by(
                *p,
                |q| {
                    if Some(q) == other_skip {
                        return CandidateFilter::Keep;
                    }
                    // It is possible that a local may be live for less than the duration of a
                    // statement. This happens in the case of function calls or inline asm. Because
                    // of this, we also mark locals as conflicting when both of them are written to
                    // in the same statement.
                    if self.live.contains(q) || writes.contains(&q) {
                        CandidateFilter::Remove
                    } else {
                        CandidateFilter::Keep
                    }
                },
                self.at,
            );
        }
    }
}

/// Describes where a statement/terminator writes to
#[derive(Default, Debug)]
struct WriteInfo {
    writes: Vec<Local>,
    /// If this pair of locals is a candidate pair, completely skip processing it during this
    /// statement. All other candidates are unaffected.
    skip_pair: Option<(Local, Local)>,
}

impl WriteInfo {
    fn for_statement<'tcx>(&mut self, statement: &StatementKind<'tcx>, body: &Body<'tcx>) {
        self.reset();
        match statement {
            StatementKind::Assign(box (lhs, rhs)) => {
                self.add_place(*lhs);
                match rhs {
                    Rvalue::Use(op) => {
                        self.add_operand(op);
                        self.consider_skipping_for_assign_use(*lhs, op, body);
                    }
                    Rvalue::Repeat(op, _) => {
                        self.add_operand(op);
                    }
                    Rvalue::Cast(_, op, _)
                    | Rvalue::UnaryOp(_, op)
                    | Rvalue::ShallowInitBox(op, _) => {
                        self.add_operand(op);
                    }
                    Rvalue::BinaryOp(_, ops) | Rvalue::CheckedBinaryOp(_, ops) => {
                        for op in [&ops.0, &ops.1] {
                            self.add_operand(op);
                        }
                    }
                    Rvalue::Aggregate(_, ops) => {
                        for op in ops {
                            self.add_operand(op);
                        }
                    }
                    Rvalue::ThreadLocalRef(_)
                    | Rvalue::NullaryOp(_, _)
                    | Rvalue::Ref(_, _, _)
                    | Rvalue::AddressOf(_, _)
                    | Rvalue::Len(_)
                    | Rvalue::Discriminant(_)
                    | Rvalue::CopyForDeref(_) => (),
                }
            }
            // Retags are technically also reads, but reporting them as a write suffices
            StatementKind::SetDiscriminant { place, .. }
            | StatementKind::Deinit(place)
            | StatementKind::Retag(_, place) => {
                self.add_place(**place);
            }
            StatementKind::Intrinsic(_)
            | StatementKind::Nop
            | StatementKind::Coverage(_)
            | StatementKind::StorageLive(_)
            | StatementKind::StorageDead(_) => (),
            StatementKind::FakeRead(_) | StatementKind::AscribeUserType(_, _) => {
                bug!("{:?} not found in this MIR phase", statement)
            }
        }
    }

    fn consider_skipping_for_assign_use<'tcx>(
        &mut self,
        lhs: Place<'tcx>,
        rhs: &Operand<'tcx>,
        body: &Body<'tcx>,
    ) {
        let Some(rhs) = rhs.place() else {
            return
        };
        if let Some(pair) = places_to_candidate_pair(lhs, rhs, body) {
            self.skip_pair = Some(pair);
        }
    }

    fn for_terminator<'tcx>(&mut self, terminator: &TerminatorKind<'tcx>) {
        self.reset();
        match terminator {
            TerminatorKind::SwitchInt { discr: op, .. }
            | TerminatorKind::Assert { cond: op, .. } => {
                self.add_operand(op);
            }
            TerminatorKind::Call { destination, func, args, .. } => {
                self.add_place(*destination);
                self.add_operand(func);
                for arg in args {
                    self.add_operand(arg);
                }
            }
            TerminatorKind::InlineAsm { operands, .. } => {
                for asm_operand in operands {
                    match asm_operand {
                        InlineAsmOperand::In { value, .. } => {
                            self.add_operand(value);
                        }
                        InlineAsmOperand::Out { place, .. } => {
                            if let Some(place) = place {
                                self.add_place(*place);
                            }
                        }
                        // Note that the `late` field in `InOut` is about whether the registers used
                        // for these things overlap, and is of absolutely no interest to us.
                        InlineAsmOperand::InOut { in_value, out_place, .. } => {
                            if let Some(place) = out_place {
                                self.add_place(*place);
                            }
                            self.add_operand(in_value);
                        }
                        InlineAsmOperand::Const { .. }
                        | InlineAsmOperand::SymFn { .. }
                        | InlineAsmOperand::SymStatic { .. } => (),
                    }
                }
            }
            TerminatorKind::Goto { .. }
            | TerminatorKind::Resume { .. }
            | TerminatorKind::Abort { .. }
            | TerminatorKind::Return
            | TerminatorKind::Unreachable { .. } => (),
            TerminatorKind::Drop { .. } => {
                // `Drop`s create a `&mut` and so are not considered
            }
            TerminatorKind::DropAndReplace { .. }
            | TerminatorKind::Yield { .. }
            | TerminatorKind::GeneratorDrop
            | TerminatorKind::FalseEdge { .. }
            | TerminatorKind::FalseUnwind { .. } => {
                bug!("{:?} not found in this MIR phase", terminator)
            }
        }
    }

    fn add_place(&mut self, place: Place<'_>) {
        self.writes.push(place.local);
    }

    fn add_operand<'tcx>(&mut self, op: &Operand<'tcx>) {
        match op {
            // FIXME(JakobDegen): In a previous version, the `Move` case was incorrectly treated as
            // being a read only. This was unsound, however we cannot add a regression test because
            // it is not possible to set this off with current MIR. Once we have that ability, a
            // regression test should be added.
            Operand::Move(p) => self.add_place(*p),
            Operand::Copy(_) | Operand::Constant(_) => (),
        }
    }

    fn reset(&mut self) {
        self.writes.clear();
        self.skip_pair = None;
    }
}
1b1a35ee | 681 | |
487cf647 FG |
682 | ///////////////////////////////////////////////////// |
683 | // Candidate accumulation | |
1b1a35ee | 684 | |
/// If the pair of places is being considered for merging, returns the candidate which would be
/// merged in order to accomplish this.
///
/// The contract here is in one direction: there is a guarantee that merging the locals returned
/// by this function would turn an assignment between the inputs into a self-assignment. However,
/// there is no guarantee that the returned pair is actually suitable for merging - candidate
/// collection must still check this independently.
///
/// This output is unique for each unordered pair of input places.
fn places_to_candidate_pair<'tcx>(
    a: Place<'tcx>,
    b: Place<'tcx>,
    body: &Body<'tcx>,
) -> Option<(Local, Local)> {
    let (mut a, mut b) = if a.projection.len() == 0 && b.projection.len() == 0 {
        (a.local, b.local)
    } else {
        return None;
    };

    // By sorting, we make sure we're independent of the input order.
    if a > b {
        std::mem::swap(&mut a, &mut b);
    }

    // We could now return `(a, b)`, but then we would miss some candidates in the case where `a`
    // can't be used as a `src`.
    if is_local_required(a, body) {
        std::mem::swap(&mut a, &mut b);
    }
    // We could check `is_local_required` again here, but there's no need - after all, we make no
    // promise that the candidate pair is actually valid.
    Some((a, b))
}
719 | ||
487cf647 | 720 | /// Collects the candidates for merging |
1b1a35ee | 721 | /// |
487cf647 FG |
722 | /// This is responsible for enforcing the first and third bullet point. |
723 | fn find_candidates<'alloc, 'tcx>( | |
724 | body: &Body<'tcx>, | |
725 | borrowed: &BitSet<Local>, | |
726 | candidates: &'alloc mut FxHashMap<Local, Vec<Local>>, | |
727 | candidates_reverse: &'alloc mut FxHashMap<Local, Vec<Local>>, | |
728 | ) -> Candidates<'alloc> { | |
729 | candidates.clear(); | |
730 | candidates_reverse.clear(); | |
731 | let mut visitor = FindAssignments { body, candidates, borrowed }; | |
1b1a35ee | 732 | visitor.visit_body(body); |
487cf647 FG |
733 | // Deduplicate candidates |
734 | for (_, cands) in candidates.iter_mut() { | |
735 | cands.sort(); | |
736 | cands.dedup(); | |
737 | } | |
738 | // Generate the reverse map | |
739 | for (src, cands) in candidates.iter() { | |
740 | for dest in cands.iter().copied() { | |
741 | candidates_reverse.entry(dest).or_default().push(*src); | |
742 | } | |
743 | } | |
744 | Candidates { c: candidates, reverse: candidates_reverse } | |
1b1a35ee XL |
745 | } |
746 | ||
487cf647 | 747 | struct FindAssignments<'a, 'alloc, 'tcx> { |
1b1a35ee | 748 | body: &'a Body<'tcx>, |
487cf647 FG |
749 | candidates: &'alloc mut FxHashMap<Local, Vec<Local>>, |
750 | borrowed: &'a BitSet<Local>, | |
1b1a35ee XL |
751 | } |
752 | ||
487cf647 FG |
753 | impl<'tcx> Visitor<'tcx> for FindAssignments<'_, '_, 'tcx> { |
754 | fn visit_statement(&mut self, statement: &Statement<'tcx>, _: Location) { | |
1b1a35ee | 755 | if let StatementKind::Assign(box ( |
487cf647 FG |
756 | lhs, |
757 | Rvalue::Use(Operand::Copy(rhs) | Operand::Move(rhs)), | |
1b1a35ee XL |
758 | )) = &statement.kind |
759 | { | |
487cf647 | 760 | let Some((src, dest)) = places_to_candidate_pair(*lhs, *rhs, self.body) else { |
1b1a35ee | 761 | return; |
487cf647 | 762 | }; |
1b1a35ee | 763 | |
487cf647 FG |
764 | // As described at the top of the file, we do not go near things that have their address |
765 | // taken. | |
766 | if self.borrowed.contains(src) || self.borrowed.contains(dest) { | |
1b1a35ee XL |
767 | return; |
768 | } | |
769 | ||
487cf647 FG |
770 | // Also, we need to make sure that MIR actually allows the `src` to be removed |
771 | if is_local_required(src, self.body) { | |
1b1a35ee XL |
772 | return; |
773 | } | |
774 | ||
487cf647 FG |
775 | // We may insert duplicates here, but that's fine |
776 | self.candidates.entry(src).or_default().push(dest); | |
1b1a35ee XL |
777 | } |
778 | } | |
779 | } | |
780 | ||
781 | /// Some locals are part of the function's interface and can not be removed. | |
782 | /// | |
783 | /// Note that these locals *can* still be merged with non-required locals by removing that other | |
784 | /// local. | |
785 | fn is_local_required(local: Local, body: &Body<'_>) -> bool { | |
786 | match body.local_kind(local) { | |
787 | LocalKind::Arg | LocalKind::ReturnPointer => true, | |
788 | LocalKind::Var | LocalKind::Temp => false, | |
789 | } | |
790 | } | |
791 | ||
487cf647 FG |
792 | ///////////////////////////////////////////////////////// |
793 | // MIR Dump | |
1b1a35ee | 794 | |
487cf647 FG |
795 | fn dest_prop_mir_dump<'body, 'tcx>( |
796 | tcx: TyCtxt<'tcx>, | |
797 | body: &'body Body<'tcx>, | |
798 | live: &mut ResultsCursor<'body, 'tcx, MaybeLiveLocals>, | |
799 | round: usize, | |
800 | ) { | |
801 | let mut reachable = None; | |
802 | dump_mir(tcx, false, "DestinationPropagation-dataflow", &round, body, |pass_where, w| { | |
803 | let reachable = reachable.get_or_insert_with(|| traversal::reachable_as_bitset(body)); | |
804 | ||
805 | match pass_where { | |
806 | PassWhere::BeforeLocation(loc) if reachable.contains(loc.block) => { | |
807 | live.seek_after_primary_effect(loc); | |
808 | writeln!(w, " // live: {:?}", live.get())?; | |
809 | } | |
810 | PassWhere::AfterTerminator(bb) if reachable.contains(bb) => { | |
811 | let loc = body.terminator_loc(bb); | |
812 | live.seek_before_primary_effect(loc); | |
813 | writeln!(w, " // live: {:?}", live.get())?; | |
814 | } | |
1b1a35ee | 815 | |
487cf647 FG |
816 | PassWhere::BeforeBlock(bb) if reachable.contains(bb) => { |
817 | live.seek_to_block_start(bb); | |
818 | writeln!(w, " // live: {:?}", live.get())?; | |
819 | } | |
820 | ||
821 | PassWhere::BeforeCFG | PassWhere::AfterCFG | PassWhere::AfterLocation(_) => {} | |
822 | ||
823 | PassWhere::BeforeLocation(_) | PassWhere::AfterTerminator(_) => { | |
824 | writeln!(w, " // live: <unreachable>")?; | |
825 | } | |
826 | ||
827 | PassWhere::BeforeBlock(_) => { | |
828 | writeln!(w, " // live: <unreachable>")?; | |
829 | } | |
1b1a35ee | 830 | } |
487cf647 FG |
831 | |
832 | Ok(()) | |
833 | }); | |
1b1a35ee | 834 | } |