//! Propagates assignment destinations backwards in the CFG to eliminate redundant assignments.
//!
//! # Motivation
//!
//! MIR building can insert a lot of redundant copies, and Rust code in general often tends to move
//! values around a lot. The result is a lot of assignments of the form `dest = {move} src;` in MIR.
//! MIR building for constants in particular tends to create additional locals that are only used
//! inside a single block to shuffle a value around unnecessarily.
//!
//! LLVM by itself is not good enough at eliminating these redundant copies (eg. see
//! <https://github.com/rust-lang/rust/issues/32966>), so this leaves some performance on the table
//! that we can regain by implementing an optimization for removing these assign statements in rustc
//! itself. When this optimization runs fast enough, it can also speed up the constant evaluation
//! and code generation phases of rustc due to the reduced number of statements and locals.
//!
//! # The Optimization
//!
//! Conceptually, this optimization is "destination propagation". It is similar to the Named Return
//! Value Optimization, or NRVO, known from the C++ world, except that it isn't limited to return
//! values or the return place `_0`. On a very high level, independent of the actual implementation
//! details, it does the following:
//!
//! 1) Identify `dest = src;` statements with values for `dest` and `src` whose storage can soundly
//!    be merged.
//! 2) Replace all mentions of `src` with `dest` ("unifying" them and propagating the destination
//!    backwards).
//! 3) Delete the `dest = src;` statement (by making it a `nop`).
//!
//! Step 1) is by far the hardest, so it is explained in more detail below.
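//!
//! As a minimal illustration (hypothetical MIR, not taken from a real test), the three steps turn
//!
//! ```ignore (syntax-highlighting-only)
//! _1 = const 5_i32;
//! _2 = move _1;     // `_1` is never read again after this statement
//! _0 = _2;
//! ```
//!
//! into
//!
//! ```ignore (syntax-highlighting-only)
//! _2 = const 5_i32; // every mention of `_1` has been replaced with `_2`
//! nop;              // the now self-assigning `_2 = _2;` has been deleted
//! _0 = _2;
//! ```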
//!
//! ## Soundness
//!
//! We have a pair of places `p` and `q`, whose memory we would like to merge. In order for this to
//! be sound, we need to check a number of conditions:
//!
//! * `p` and `q` must both be *constant* - it does not make much sense to talk about merging them
//!   if they do not consistently refer to the same place in memory. This is satisfied if they do
//!   not contain any indirection through a pointer or any indexing projections.
//!
//! * `p` and `q` must have the **same type**. If we replace a local with a subtype or supertype,
//!   we may end up with a different vtable for that local. See the `subtyping-impacts-selection`
//!   tests for an example where that causes issues.
//!
//! * We need to make sure that the goal of "merging the memory" is actually structurally possible
//!   in MIR. For example, even if all the other conditions are satisfied, there is no way to
//!   "merge" `_5.foo` and `_6.bar`. For now, we ensure this by requiring that both `p` and `q` are
//!   locals with no further projections. Future iterations of this pass should improve on this.
//!
//! * Finally, we want `p` and `q` to use the same memory - however, we still need to make sure that
//!   each of them has enough "ownership" of that memory to continue "doing its job." More
//!   precisely, what we will check is that whenever the program performs a write to `p`, then it
//!   does not currently care about what the value in `q` is (and vice versa). We formalize the
//!   notion of "does not care what the value in `q` is" by checking the *liveness* of `q`.
//!
//! Because of the difficulty of computing liveness of places that have their address taken, we do
//! not even attempt to do it. Any place that is part of a local that has its address taken is
//! excluded from the optimization.
//!
//! The first three conditions are simple structural requirements on the `Assign` statements that
//! can be trivially checked. The fourth requirement however is more difficult and costly to check.
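//!
//! For instance (hypothetical MIR, for illustration only), neither of the following assignments
//! yields a candidate pair, because at least one side is not a plain local without projections:
//!
//! ```ignore (syntax-highlighting-only)
//! (*_3) = _4;    // indirection through a pointer
//! _5 = _6[_7];   // indexing projection
//! ```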
//!
//! ## Future Improvements
//!
//! There are a number of ways in which this pass could be improved in the future:
//!
//! * Merging storage liveness ranges instead of removing storage statements completely. This may
//!   improve stack usage.
//!
//! * Allow merging locals into places with projections, eg `_5` into `_6.foo`.
//!
//! * Liveness analysis with more precision than whole locals at a time. The smaller benefit of this
//!   is that it would allow us to dest prop at "sub-local" levels in some cases. The bigger benefit
//!   of this is that such liveness analysis can report more accurate results about whole locals at
//!   a time. For example, consider:
//!
//!   ```ignore (syntax-highlighting-only)
//!   _1 = u;
//!   // unrelated code
//!   _1.f1 = v;
//!   _2 = _1.f1;
//!   ```
//!
//!   Because the current analysis only thinks in terms of locals, it does not have enough
//!   information to report that `_1` is dead in the "unrelated code" section.
//!
//! * Liveness analysis enabled by alias analysis. This would allow us to not just bail on locals
//!   that ever have their address taken. Of course that requires actually having alias analysis
//!   (and a model to build it on), so this might be a bit of a ways off.
//!
//! * Various perf improvements. There are a bunch of comments in here marked `PERF` with ideas for
//!   how to do things more efficiently. However, the complexity of the pass as a whole should be
//!   kept in mind.
//!
//! ## Previous Work
//!
//! A [previous attempt][attempt 1] at implementing an optimization like this turned out to be a
//! significant regression in compiler performance. Fixing the regressions introduced a lot of
//! undesirable complexity to the implementation.
//!
//! A [subsequent approach][attempt 2] tried to avoid the costly computation by limiting itself to
//! acyclic CFGs, but still turned out to be far too costly to run due to suboptimal performance
//! within individual basic blocks, requiring a walk across the entire block for every assignment
//! found within the block. For the `tuple-stress` benchmark, which has 458745 statements in a
//! single block, this proved to be far too costly.
//!
//! [Another approach after that][attempt 3] was much closer to correct, but had some soundness
//! issues - it was failing to consider stores outside live ranges, and failed to uphold some of the
//! requirements that MIR has for non-overlapping places within statements. However, it also had
//! performance issues caused by `O(l² * s)` runtime, where `l` is the number of locals and `s` is
//! the number of statements and terminators.
//!
//! Since the first attempt at this, the compiler has improved dramatically, and new analysis
//! frameworks have been added that should make this approach viable without requiring a limited
//! approach that only works for some classes of CFGs:
//! - rustc now has a powerful dataflow analysis framework that can handle forwards and backwards
//!   analyses efficiently.
//! - Layout optimizations for generators have been added to improve code generation for
//!   async/await, which are very similar in spirit to what this optimization does.
//!
//! Also, rustc now has a simple NRVO pass (see `nrvo.rs`), which handles a subset of the cases that
//! this destination propagation pass handles, proving that similar optimizations can be performed
//! on MIR.
//!
//! ## Pre/Post Optimization
//!
//! It is recommended to run `SimplifyCfg` and then `SimplifyLocals` some time after this pass, as
//! this pass replaces the eliminated assign statements with `nop`s and leaves unused locals behind.
//!
//! [liveness]: https://en.wikipedia.org/wiki/Live_variable_analysis
//! [attempt 1]: https://github.com/rust-lang/rust/pull/47954
//! [attempt 2]: https://github.com/rust-lang/rust/pull/71003
//! [attempt 3]: https://github.com/rust-lang/rust/pull/72632

use std::collections::hash_map::{Entry, OccupiedEntry};

use crate::simplify::remove_dead_blocks;
use crate::MirPass;
use rustc_data_structures::fx::FxHashMap;
use rustc_index::bit_set::BitSet;
use rustc_middle::mir::visit::{MutVisitor, PlaceContext, Visitor};
use rustc_middle::mir::HasLocalDecls;
use rustc_middle::mir::{dump_mir, PassWhere};
use rustc_middle::mir::{
    traversal, Body, InlineAsmOperand, Local, LocalKind, Location, Operand, Place, Rvalue,
    Statement, StatementKind, TerminatorKind,
};
use rustc_middle::ty::TyCtxt;
use rustc_mir_dataflow::impls::MaybeLiveLocals;
use rustc_mir_dataflow::{Analysis, ResultsCursor};

pub struct DestinationPropagation;

impl<'tcx> MirPass<'tcx> for DestinationPropagation {
    fn is_enabled(&self, sess: &rustc_session::Session) -> bool {
        // For now, only run at MIR opt level 3. Two things need to be changed before this can be
        // turned on by default:
        // 1. Because of the overeager removal of storage statements, this can cause stack space
        //    regressions. This opt is not the place to fix this though, it's a more general
        //    problem in MIR.
        // 2. Despite being an overall perf improvement, this still causes a 30% regression in
        //    keccak. We can temporarily fix this by bounding function size, but in the long term
        //    we should fix this by being smarter about invalidating analysis results.
        sess.mir_opt_level() >= 3
    }

    fn run_pass(&self, tcx: TyCtxt<'tcx>, body: &mut Body<'tcx>) {
        let def_id = body.source.def_id();
        let mut allocations = Allocations::default();
        trace!(func = ?tcx.def_path_str(def_id));

        let borrowed = rustc_mir_dataflow::impls::borrowed_locals(body);

        // In order to avoid having to collect data for every single pair of locals in the body, we
        // do not allow doing more than one merge for places that are derived from the same local at
        // once. To avoid missed opportunities, we instead iterate to a fixed point - we'll refer to
        // each of these iterations as a "round."
        //
        // Reaching a fixed point could in theory take up to `min(l, s)` rounds - however, we do not
        // expect to see MIR like that. To verify this, a test was run against `[rust-lang/regex]` -
        // the average MIR body saw 1.32 full iterations of this loop. The most that was hit was 30
        // for a single function. Only 80/2801 (2.9%) of functions saw at least 5.
        //
        // [rust-lang/regex]:
        //     https://github.com/rust-lang/regex/tree/b5372864e2df6a2f5e543a556a62197f50ca3650
        let mut round_count = 0;
        loop {
            // PERF: Can we do something smarter than recalculating the candidates and liveness
            // results?
            let mut candidates = find_candidates(
                body,
                &borrowed,
                &mut allocations.candidates,
                &mut allocations.candidates_reverse,
            );
            trace!(?candidates);
            let mut live = MaybeLiveLocals
                .into_engine(tcx, body)
                .iterate_to_fixpoint()
                .into_results_cursor(body);
            dest_prop_mir_dump(tcx, body, &mut live, round_count);

            FilterInformation::filter_liveness(
                &mut candidates,
                &mut live,
                &mut allocations.write_info,
                body,
            );

            // Because we do not update liveness information, it is unsound to use a local for more
            // than one merge operation within a single round of optimizations. We store here which
            // ones we have already used.
            let mut merged_locals: BitSet<Local> = BitSet::new_empty(body.local_decls.len());

            // This is the set of merges we will apply this round. It is a subset of the candidates.
            let mut merges = FxHashMap::default();

            for (src, candidates) in candidates.c.iter() {
                if merged_locals.contains(*src) {
                    continue;
                }
                let Some(dest) = candidates.iter().find(|dest| !merged_locals.contains(**dest))
                else {
                    continue;
                };
                if !tcx.consider_optimizing(|| {
                    format!("{} round {}", tcx.def_path_str(def_id), round_count)
                }) {
                    break;
                }
                merges.insert(*src, *dest);
                merged_locals.insert(*src);
                merged_locals.insert(*dest);
            }
            trace!(merging = ?merges);

            if merges.is_empty() {
                break;
            }
            round_count += 1;

            apply_merges(body, tcx, &merges, &merged_locals);
        }

        if round_count != 0 {
            // Merging can introduce overlap between moved arguments and/or call destination in
            // unreachable code, which the validator considers to be ill-formed.
            remove_dead_blocks(tcx, body);
        }

        trace!(round_count);
    }
}

/// Container for the various allocations that we need.
///
/// We store these here and hand out `&mut` access to them, instead of dropping and recreating them
/// frequently. Everything with a `&'alloc` lifetime points into here.
#[derive(Default)]
struct Allocations {
    candidates: FxHashMap<Local, Vec<Local>>,
    candidates_reverse: FxHashMap<Local, Vec<Local>>,
    write_info: WriteInfo,
    // PERF: Do this for `MaybeLiveLocals` allocations too.
}

#[derive(Debug)]
struct Candidates<'alloc> {
    /// The set of candidates we are considering in this optimization.
    ///
    /// We will always merge the key into at most one of its values.
    ///
    /// Whether a place ends up in the key or the value does not correspond to whether it appears as
    /// the lhs or rhs of any assignment. As a matter of fact, the places in here might never appear
    /// in an assignment at all. This happens because if we see an assignment like this:
    ///
    /// ```ignore (syntax-highlighting-only)
    /// _1.0 = _2.0
    /// ```
    ///
    /// We will still report that we would like to merge `_1` and `_2` in an attempt to allow us to
    /// remove that assignment.
    c: &'alloc mut FxHashMap<Local, Vec<Local>>,
    /// A reverse index of the `c` set; if the `c` set contains `a => [.., b, ..]`, then this
    /// contains `b => [.., a, ..]`.
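    ///
    /// For example (illustrative): if `c` maps `_1 => [_2, _3]`, then this map contains
    /// `_2 => [_1]` and `_3 => [_1]`.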
    // PERF: Possibly these should be `SmallVec`s?
    reverse: &'alloc mut FxHashMap<Local, Vec<Local>>,
}

//////////////////////////////////////////////////////////
// Merging
//
// Applies the actual optimization

fn apply_merges<'tcx>(
    body: &mut Body<'tcx>,
    tcx: TyCtxt<'tcx>,
    merges: &FxHashMap<Local, Local>,
    merged_locals: &BitSet<Local>,
) {
    let mut merger = Merger { tcx, merges, merged_locals };
    merger.visit_body_preserves_cfg(body);
}

struct Merger<'a, 'tcx> {
    tcx: TyCtxt<'tcx>,
    merges: &'a FxHashMap<Local, Local>,
    merged_locals: &'a BitSet<Local>,
}

impl<'a, 'tcx> MutVisitor<'tcx> for Merger<'a, 'tcx> {
    fn tcx(&self) -> TyCtxt<'tcx> {
        self.tcx
    }

    fn visit_local(&mut self, local: &mut Local, _: PlaceContext, _location: Location) {
        if let Some(dest) = self.merges.get(local) {
            *local = *dest;
        }
    }

    fn visit_statement(&mut self, statement: &mut Statement<'tcx>, location: Location) {
        match &statement.kind {
            // FIXME: Don't delete storage statements, but "merge" the storage ranges instead.
            StatementKind::StorageDead(local) | StatementKind::StorageLive(local)
                if self.merged_locals.contains(*local) =>
            {
                statement.make_nop();
                return;
            }
            _ => (),
        };
        self.super_statement(statement, location);
        match &statement.kind {
            StatementKind::Assign(box (dest, rvalue)) => {
                match rvalue {
                    Rvalue::CopyForDeref(place)
                    | Rvalue::Use(Operand::Copy(place) | Operand::Move(place)) => {
                        // These might've been turned into self-assignments by the replacement
                        // (this includes the original statement we wanted to eliminate).
                        if dest == place {
                            debug!("{:?} turned into self-assignment, deleting", location);
                            statement.make_nop();
                        }
                    }
                    _ => {}
                }
            }

            _ => {}
        }
    }
}

//////////////////////////////////////////////////////////
// Liveness filtering
//
// This section enforces the fourth bullet point from the module docs (the liveness condition)

struct FilterInformation<'a, 'body, 'alloc, 'tcx> {
    body: &'body Body<'tcx>,
    live: &'a mut ResultsCursor<'body, 'tcx, MaybeLiveLocals>,
    candidates: &'a mut Candidates<'alloc>,
    write_info: &'alloc mut WriteInfo,
    at: Location,
}

// We first implement some utility functions for removing candidates according to different needs.
// Throughout the liveness filtering, the `candidates` are only ever accessed through these
// methods, and never directly.
impl<'alloc> Candidates<'alloc> {
    /// Just `Vec::retain`, but the condition is inverted and we add debugging output
    fn vec_filter_candidates(
        src: Local,
        v: &mut Vec<Local>,
        mut f: impl FnMut(Local) -> CandidateFilter,
        at: Location,
    ) {
        v.retain(|dest| {
            let remove = f(*dest);
            if remove == CandidateFilter::Remove {
                trace!("eliminating {:?} => {:?} due to conflict at {:?}", src, dest, at);
            }
            remove == CandidateFilter::Keep
        });
    }

    /// `vec_filter_candidates` but for an `Entry`
    fn entry_filter_candidates(
        mut entry: OccupiedEntry<'_, Local, Vec<Local>>,
        p: Local,
        f: impl FnMut(Local) -> CandidateFilter,
        at: Location,
    ) {
        let candidates = entry.get_mut();
        Self::vec_filter_candidates(p, candidates, f, at);
        if candidates.is_empty() {
            entry.remove();
        }
    }

    /// For all candidates `(p, q)` or `(q, p)`, removes the candidate if `f(q)` says to do so
    fn filter_candidates_by(
        &mut self,
        p: Local,
        mut f: impl FnMut(Local) -> CandidateFilter,
        at: Location,
    ) {
        // Cover the cases where `p` appears as a `src`
        if let Entry::Occupied(entry) = self.c.entry(p) {
            Self::entry_filter_candidates(entry, p, &mut f, at);
        }
        // And the cases where `p` appears as a `dest`
        let Some(srcs) = self.reverse.get_mut(&p) else {
            return;
        };
        // We use `retain` here to remove the elements from the reverse set if we've removed the
        // matching candidate in the forward set.
        srcs.retain(|src| {
            if f(*src) == CandidateFilter::Keep {
                return true;
            }
            let Entry::Occupied(entry) = self.c.entry(*src) else {
                return false;
            };
            Self::entry_filter_candidates(
                entry,
                *src,
                |dest| {
                    if dest == p { CandidateFilter::Remove } else { CandidateFilter::Keep }
                },
                at,
            );
            false
        });
    }
}

#[derive(Copy, Clone, PartialEq, Eq)]
enum CandidateFilter {
    Keep,
    Remove,
}

impl<'a, 'body, 'alloc, 'tcx> FilterInformation<'a, 'body, 'alloc, 'tcx> {
    /// Filters the set of candidates to remove those that conflict.
    ///
    /// The steps we take are exactly those that are outlined at the top of the file. For each
    /// statement/terminator, we collect the set of locals that are written to in that
    /// statement/terminator, and then we remove all pairs of candidates that contain one such local
    /// and another one that is live.
    ///
    /// We need to be careful about the ordering of operations within each statement/terminator
    /// here. Many statements might write to and read from more than one place, and we need to
    /// consider them all. The strategy for doing this is as follows: We first gather all the places
    /// that are written to within the statement/terminator via `WriteInfo`. Then, we use the
    /// liveness analysis from *before* the statement/terminator (in the control flow sense) to
    /// eliminate candidates - this is because we want to conservatively treat a pair of locals that
    /// is both read and written in the statement/terminator as conflicting, and the liveness
    /// analysis before the statement/terminator will correctly report locals that are read in the
    /// statement/terminator to be live. We are additionally conservative by treating all
    /// written-to locals as also being read from.
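    ///
    /// For example (illustrative): at a statement `_2 = const 1_i32;` at which `_3` is live, a
    /// candidate pair `(_2, _3)` is removed, since merging the two locals would let the write to
    /// `_2` clobber the still-live value of `_3`.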
    fn filter_liveness<'b>(
        candidates: &mut Candidates<'alloc>,
        live: &mut ResultsCursor<'b, 'tcx, MaybeLiveLocals>,
        write_info_alloc: &'alloc mut WriteInfo,
        body: &'b Body<'tcx>,
    ) {
        let mut this = FilterInformation {
            body,
            live,
            candidates,
            // We don't actually store anything at this scope, we just keep things here to be able
            // to reuse the allocation.
            write_info: write_info_alloc,
            // Doesn't matter what we put here, will be overwritten before being used
            at: Location::START,
        };
        this.internal_filter_liveness();
    }

    fn internal_filter_liveness(&mut self) {
        for (block, data) in traversal::preorder(self.body) {
            self.at = Location { block, statement_index: data.statements.len() };
            self.live.seek_after_primary_effect(self.at);
            self.write_info.for_terminator(&data.terminator().kind);
            self.apply_conflicts();

            for (i, statement) in data.statements.iter().enumerate().rev() {
                self.at = Location { block, statement_index: i };
                self.live.seek_after_primary_effect(self.at);
                self.write_info.for_statement(&statement.kind, self.body);
                self.apply_conflicts();
            }
        }
    }

    fn apply_conflicts(&mut self) {
        let writes = &self.write_info.writes;
        for p in writes {
            let other_skip = self.write_info.skip_pair.and_then(|(a, b)| {
                if a == *p {
                    Some(b)
                } else if b == *p {
                    Some(a)
                } else {
                    None
                }
            });
            self.candidates.filter_candidates_by(
                *p,
                |q| {
                    if Some(q) == other_skip {
                        return CandidateFilter::Keep;
                    }
                    // It is possible that a local may be live for less than the duration of a
                    // statement. This happens in the case of function calls or inline asm. Because
                    // of this, we also mark locals as conflicting when both of them are written to
                    // in the same statement.
                    if self.live.contains(q) || writes.contains(&q) {
                        CandidateFilter::Remove
                    } else {
                        CandidateFilter::Keep
                    }
                },
                self.at,
            );
        }
    }
}

/// Describes where a statement/terminator writes to
#[derive(Default, Debug)]
struct WriteInfo {
    writes: Vec<Local>,
    /// If this pair of locals is a candidate pair, completely skip processing it during this
    /// statement. All other candidates are unaffected.
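    ///
    /// For example (illustrative): for `_1 = move _2;`, both `_1` and `_2` are recorded as written
    /// (a moved-from operand counts as a write), but the candidate pair `(_1, _2)` itself must
    /// survive this statement, since eliminating exactly this kind of assignment is the point of
    /// the optimization.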
    skip_pair: Option<(Local, Local)>,
}

impl WriteInfo {
    fn for_statement<'tcx>(&mut self, statement: &StatementKind<'tcx>, body: &Body<'tcx>) {
        self.reset();
        match statement {
            StatementKind::Assign(box (lhs, rhs)) => {
                self.add_place(*lhs);
                match rhs {
                    Rvalue::Use(op) => {
                        self.add_operand(op);
                        self.consider_skipping_for_assign_use(*lhs, op, body);
                    }
                    Rvalue::Repeat(op, _) => {
                        self.add_operand(op);
                    }
                    Rvalue::Cast(_, op, _)
                    | Rvalue::UnaryOp(_, op)
                    | Rvalue::ShallowInitBox(op, _) => {
                        self.add_operand(op);
                    }
                    Rvalue::BinaryOp(_, ops) | Rvalue::CheckedBinaryOp(_, ops) => {
                        for op in [&ops.0, &ops.1] {
                            self.add_operand(op);
                        }
                    }
                    Rvalue::Aggregate(_, ops) => {
                        for op in ops {
                            self.add_operand(op);
                        }
                    }
                    Rvalue::ThreadLocalRef(_)
                    | Rvalue::NullaryOp(_, _)
                    | Rvalue::Ref(_, _, _)
                    | Rvalue::AddressOf(_, _)
                    | Rvalue::Len(_)
                    | Rvalue::Discriminant(_)
                    | Rvalue::CopyForDeref(_) => (),
                }
            }
            // Retags are technically also reads, but reporting them as a write suffices
            StatementKind::SetDiscriminant { place, .. }
            | StatementKind::Deinit(place)
            | StatementKind::Retag(_, place) => {
                self.add_place(**place);
            }
            StatementKind::Intrinsic(_)
            | StatementKind::ConstEvalCounter
            | StatementKind::Nop
            | StatementKind::Coverage(_)
            | StatementKind::StorageLive(_)
            | StatementKind::StorageDead(_)
            | StatementKind::PlaceMention(_) => (),
            StatementKind::FakeRead(_) | StatementKind::AscribeUserType(_, _) => {
                bug!("{:?} not found in this MIR phase", statement)
            }
        }
    }

    fn consider_skipping_for_assign_use<'tcx>(
        &mut self,
        lhs: Place<'tcx>,
        rhs: &Operand<'tcx>,
        body: &Body<'tcx>,
    ) {
        let Some(rhs) = rhs.place() else { return };
        if let Some(pair) = places_to_candidate_pair(lhs, rhs, body) {
            self.skip_pair = Some(pair);
        }
    }

    fn for_terminator<'tcx>(&mut self, terminator: &TerminatorKind<'tcx>) {
        self.reset();
        match terminator {
            TerminatorKind::SwitchInt { discr: op, .. }
            | TerminatorKind::Assert { cond: op, .. } => {
                self.add_operand(op);
            }
            TerminatorKind::Call { destination, func, args, .. } => {
                self.add_place(*destination);
                self.add_operand(func);
                for arg in args {
                    self.add_operand(arg);
                }
            }
            TerminatorKind::InlineAsm { operands, .. } => {
                for asm_operand in operands {
                    match asm_operand {
                        InlineAsmOperand::In { value, .. } => {
                            self.add_operand(value);
                        }
                        InlineAsmOperand::Out { place, .. } => {
                            if let Some(place) = place {
                                self.add_place(*place);
                            }
                        }
                        // Note that the `late` field in `InOut` is about whether the registers
                        // used for these things overlap, and is of absolutely no interest to us.
                        InlineAsmOperand::InOut { in_value, out_place, .. } => {
                            if let Some(place) = out_place {
                                self.add_place(*place);
                            }
                            self.add_operand(in_value);
                        }
                        InlineAsmOperand::Const { .. }
                        | InlineAsmOperand::SymFn { .. }
                        | InlineAsmOperand::SymStatic { .. } => (),
                    }
                }
            }
            TerminatorKind::Goto { .. }
            | TerminatorKind::UnwindResume
            | TerminatorKind::UnwindTerminate(_)
            | TerminatorKind::Return
            | TerminatorKind::Unreachable { .. } => (),
            TerminatorKind::Drop { .. } => {
                // `Drop`s create a `&mut` and so are not considered
            }
            TerminatorKind::Yield { .. }
            | TerminatorKind::GeneratorDrop
            | TerminatorKind::FalseEdge { .. }
            | TerminatorKind::FalseUnwind { .. } => {
                bug!("{:?} not found in this MIR phase", terminator)
            }
        }
    }

    fn add_place(&mut self, place: Place<'_>) {
        self.writes.push(place.local);
    }

    fn add_operand<'tcx>(&mut self, op: &Operand<'tcx>) {
        match op {
            // FIXME(JakobDegen): In a previous version, the `Move` case was incorrectly treated as
            // being only a read. This was unsound, however we cannot add a regression test because
            // it is not possible to trigger this with current MIR. Once we have that ability, a
            // regression test should be added.
            Operand::Move(p) => self.add_place(*p),
            Operand::Copy(_) | Operand::Constant(_) => (),
        }
    }

    fn reset(&mut self) {
        self.writes.clear();
        self.skip_pair = None;
    }
}

/////////////////////////////////////////////////////
// Candidate accumulation

/// If the pair of places is being considered for merging, returns the candidate which would be
/// merged in order to accomplish this.
///
/// The contract here is in one direction - there is a guarantee that merging the locals returned
/// by this function would result in an assignment between the inputs becoming a self-assignment.
/// However, there is no guarantee that the returned pair is actually suitable for merging -
/// candidate collection must still check this independently.
///
/// This output is unique for each unordered pair of input places.
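///
/// For example (illustrative): for an assignment `_3 = _5;`, both argument orders return
/// `Some((_3, _5))` after sorting, unless `_3` is required (an argument or the return place), in
/// which case the pair is swapped to `Some((_5, _3))` so that the local to be removed comes first.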
fn places_to_candidate_pair<'tcx>(
    a: Place<'tcx>,
    b: Place<'tcx>,
    body: &Body<'tcx>,
) -> Option<(Local, Local)> {
    let (mut a, mut b) = if a.projection.is_empty() && b.projection.is_empty() {
        (a.local, b.local)
    } else {
        return None;
    };

    // By sorting, we make sure the result is independent of the input order
    if a > b {
        std::mem::swap(&mut a, &mut b);
    }

    // We could now return `(a, b)`, but then we miss some candidates in the case where `a` can't be
    // used as a `src`.
    if is_local_required(a, body) {
        std::mem::swap(&mut a, &mut b);
    }
    // We could check `is_local_required` again here, but there's no need - after all, we make no
    // promise that the candidate pair is actually valid
    Some((a, b))
}

/// Collects the candidates for merging
///
/// This is responsible for enforcing the first three bullet points from the module docs.
fn find_candidates<'alloc, 'tcx>(
    body: &Body<'tcx>,
    borrowed: &BitSet<Local>,
    candidates: &'alloc mut FxHashMap<Local, Vec<Local>>,
    candidates_reverse: &'alloc mut FxHashMap<Local, Vec<Local>>,
) -> Candidates<'alloc> {
    candidates.clear();
    candidates_reverse.clear();
    let mut visitor = FindAssignments { body, candidates, borrowed };
    visitor.visit_body(body);
    // Deduplicate candidates
    for (_, cands) in candidates.iter_mut() {
        cands.sort();
        cands.dedup();
    }
    // Generate the reverse map
    for (src, cands) in candidates.iter() {
        for dest in cands.iter().copied() {
            candidates_reverse.entry(dest).or_default().push(*src);
        }
    }
    Candidates { c: candidates, reverse: candidates_reverse }
}

struct FindAssignments<'a, 'alloc, 'tcx> {
    body: &'a Body<'tcx>,
    candidates: &'alloc mut FxHashMap<Local, Vec<Local>>,
    borrowed: &'a BitSet<Local>,
}

impl<'tcx> Visitor<'tcx> for FindAssignments<'_, '_, 'tcx> {
    fn visit_statement(&mut self, statement: &Statement<'tcx>, _: Location) {
        if let StatementKind::Assign(box (
            lhs,
            Rvalue::CopyForDeref(rhs) | Rvalue::Use(Operand::Copy(rhs) | Operand::Move(rhs)),
        )) = &statement.kind
        {
            let Some((src, dest)) = places_to_candidate_pair(*lhs, *rhs, self.body) else {
                return;
            };

            // As described at the top of the file, we do not go near things that have
            // their address taken.
            if self.borrowed.contains(src) || self.borrowed.contains(dest) {
                return;
            }

            // As described at the top of this file, we do not touch locals which have
            // different types.
            let src_ty = self.body.local_decls()[src].ty;
            let dest_ty = self.body.local_decls()[dest].ty;
            if src_ty != dest_ty {
                // FIXME(#112651): This can be removed afterwards. Also update the module description.
                trace!("skipped `{src:?} = {dest:?}` due to subtyping: {src_ty} != {dest_ty}");
                return;
            }

            // Also, we need to make sure that MIR actually allows the `src` to be removed
            if is_local_required(src, self.body) {
                return;
            }

            // We may insert duplicates here, but that's fine
            self.candidates.entry(src).or_default().push(dest);
        }
    }
}

/// Some locals are part of the function's interface and cannot be removed.
///
/// Note that these locals *can* still be merged with non-required locals by removing that other
/// local.
fn is_local_required(local: Local, body: &Body<'_>) -> bool {
    match body.local_kind(local) {
        LocalKind::Arg | LocalKind::ReturnPointer => true,
        LocalKind::Temp => false,
    }
}

/////////////////////////////////////////////////////////
// MIR Dump

fn dest_prop_mir_dump<'body, 'tcx>(
    tcx: TyCtxt<'tcx>,
    body: &'body Body<'tcx>,
    live: &mut ResultsCursor<'body, 'tcx, MaybeLiveLocals>,
    round: usize,
) {
    let mut reachable = None;
    dump_mir(tcx, false, "DestinationPropagation-dataflow", &round, body, |pass_where, w| {
        let reachable = reachable.get_or_insert_with(|| traversal::reachable_as_bitset(body));

        match pass_where {
            PassWhere::BeforeLocation(loc) if reachable.contains(loc.block) => {
                live.seek_after_primary_effect(loc);
                writeln!(w, "        // live: {:?}", live.get())?;
            }
            PassWhere::AfterTerminator(bb) if reachable.contains(bb) => {
                let loc = body.terminator_loc(bb);
                live.seek_before_primary_effect(loc);
                writeln!(w, "        // live: {:?}", live.get())?;
            }

            PassWhere::BeforeBlock(bb) if reachable.contains(bb) => {
                live.seek_to_block_start(bb);
                writeln!(w, "    // live: {:?}", live.get())?;
            }

            PassWhere::BeforeCFG | PassWhere::AfterCFG | PassWhere::AfterLocation(_) => {}

            PassWhere::BeforeLocation(_) | PassWhere::AfterTerminator(_) => {
                writeln!(w, "        // live: <unreachable>")?;
            }

            PassWhere::BeforeBlock(_) => {
                writeln!(w, "    // live: <unreachable>")?;
            }
        }

        Ok(())
    });
}