//! Propagates assignment destinations backwards in the CFG to eliminate redundant assignments.
//!
//! # Motivation
//!
//! MIR building can insert a lot of redundant copies, and Rust code in general often tends to move
//! values around a lot. The result is a lot of assignments of the form `dest = {move} src;` in MIR.
//! MIR building for constants in particular tends to create additional locals that are only used
//! inside a single block to shuffle a value around unnecessarily.
//!
//! LLVM by itself is not good enough at eliminating these redundant copies (eg. see
//! <https://github.com/rust-lang/rust/issues/32966>), so this leaves some performance on the table
//! that we can regain by implementing an optimization for removing these assign statements in rustc
//! itself. When this optimization runs fast enough, it can also speed up the constant evaluation
//! and code generation phases of rustc due to the reduced number of statements and locals.
//!
//! # The Optimization
//!
//! Conceptually, this optimization is "destination propagation". It is similar to the Named Return
//! Value Optimization, or NRVO, known from the C++ world, except that it isn't limited to return
//! values or the return place `_0`. On a very high level, independent of the actual implementation
//! details, it does the following:
//!
//! 1) Identify `dest = src;` statements with values for `dest` and `src` whose storage can soundly
//!    be merged.
//! 2) Replace all mentions of `src` with `dest` ("unifying" them and propagating the destination
//!    backwards).
//! 3) Delete the `dest = src;` statement (by making it a `nop`).
//!
//! Step 1) is by far the hardest, so it is explained in more detail below.
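//!
//! As a minimal illustration (hypothetical MIR, not taken from a real test), the three steps turn
//!
//! ```ignore (syntax-highlighting-only)
//! _1 = const 5_i32;
//! _2 = move _1;     // `_1` is never read again after this statement
//! _0 = _2;
//! ```
//!
//! into
//!
//! ```ignore (syntax-highlighting-only)
//! _2 = const 5_i32; // every mention of `_1` has been replaced with `_2`
//! nop;              // the now self-assigning `_2 = _2;` has been deleted
//! _0 = _2;
//! ```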
//!
//! ## Soundness
//!
//! We have a pair of places `p` and `q`, whose memory we would like to merge. In order for this to
//! be sound, we need to check a number of conditions:
//!
//! * `p` and `q` must both be *constant* - it does not make much sense to talk about merging them
//!   if they do not consistently refer to the same place in memory. This is satisfied if they do
//!   not contain any indirection through a pointer or any indexing projections.
//!
//! * `p` and `q` must have the **same type**. If we replace a local with a subtype or supertype,
//!   we may end up with a different vtable for that local. See the `subtyping-impacts-selection`
//!   tests for an example where that causes issues.
//!
//! * We need to make sure that the goal of "merging the memory" is actually structurally possible
//!   in MIR. For example, even if all the other conditions are satisfied, there is no way to
//!   "merge" `_5.foo` and `_6.bar`. For now, we ensure this by requiring that both `p` and `q` are
//!   locals with no further projections. Future iterations of this pass should improve on this.
//!
//! * Finally, we want `p` and `q` to use the same memory - however, we still need to make sure that
//!   each of them has enough "ownership" of that memory to continue "doing its job." More
//!   precisely, what we will check is that whenever the program performs a write to `p`, then it
//!   does not currently care about what the value in `q` is (and vice versa). We formalize the
//!   notion of "does not care what the value in `q` is" by checking the *liveness* of `q`.
//!
//! Because of the difficulty of computing liveness of places that have their address taken, we do
//! not even attempt to do it. Any place that is part of a local that has its address taken is
//! excluded from the optimization.
//!
//! The first three conditions are simple structural requirements on the `Assign` statements that
//! can be trivially checked. The fourth requirement however is more difficult and costly to check.
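//!
//! For instance (hypothetical MIR, for illustration only), neither of the following assignments
//! yields a candidate pair, because at least one side is not a plain local without projections:
//!
//! ```ignore (syntax-highlighting-only)
//! (*_3) = _4;    // indirection through a pointer
//! _5 = _6[_7];   // indexing projection
//! ```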
//!
//! ## Future Improvements
//!
//! There are a number of ways in which this pass could be improved in the future:
//!
//! * Merging storage liveness ranges instead of removing storage statements completely. This may
//!   improve stack usage.
//!
//! * Allow merging locals into places with projections, eg `_5` into `_6.foo`.
//!
//! * Liveness analysis with more precision than whole locals at a time. The smaller benefit of this
//!   is that it would allow us to dest prop at "sub-local" levels in some cases. The bigger benefit
//!   of this is that such liveness analysis can report more accurate results about whole locals at
//!   a time. For example, consider:
//!
//!   ```ignore (syntax-highlighting-only)
//!   _1 = u;
//!   // unrelated code
//!   _1.f1 = v;
//!   _2 = _1.f1;
//!   ```
//!
//!   Because the current analysis only thinks in terms of locals, it does not have enough
//!   information to report that `_1` is dead in the "unrelated code" section.
//!
//! * Liveness analysis enabled by alias analysis. This would allow us to not just bail on locals
//!   that ever have their address taken. Of course that requires actually having alias analysis
//!   (and a model to build it on), so this might be a bit of a ways off.
//!
//! * Various perf improvements. There are a bunch of comments in here marked `PERF` with ideas for
//!   how to do things more efficiently. However, the complexity of the pass as a whole should be
//!   kept in mind.
//!
//! ## Previous Work
//!
//! A [previous attempt][attempt 1] at implementing an optimization like this turned out to be a
//! significant regression in compiler performance. Fixing the regressions introduced a lot of
//! undesirable complexity to the implementation.
//!
//! A [subsequent approach][attempt 2] tried to avoid the costly computation by limiting itself to
//! acyclic CFGs, but still turned out to be far too costly to run due to suboptimal performance
//! within individual basic blocks, requiring a walk across the entire block for every assignment
//! found within the block. For the `tuple-stress` benchmark, which has 458745 statements in a
//! single block, this proved to be far too costly.
//!
//! [Another approach after that][attempt 3] was much closer to correct, but had some soundness
//! issues - it was failing to consider stores outside live ranges, and failed to uphold some of the
//! requirements that MIR has for non-overlapping places within statements. However, it also had
//! performance issues caused by `O(l² * s)` runtime, where `l` is the number of locals and `s` is
//! the number of statements and terminators.
//!
//! Since the first attempt at this, the compiler has improved dramatically, and new analysis
//! frameworks have been added that should make this approach viable without requiring a limited
//! approach that only works for some classes of CFGs:
//! - rustc now has a powerful dataflow analysis framework that can handle forwards and backwards
//!   analyses efficiently.
//! - Layout optimizations for generators have been added to improve code generation for
//!   async/await, which are very similar in spirit to what this optimization does.
//!
//! Also, rustc now has a simple NRVO pass (see `nrvo.rs`), which handles a subset of the cases that
//! this destination propagation pass handles, proving that similar optimizations can be performed
//! on MIR.
//!
//! ## Pre/Post Optimization
//!
//! It is recommended to run `SimplifyCfg` and then `SimplifyLocals` some time after this pass, as
//! this pass replaces the eliminated assign statements with `nop`s and leaves unused locals behind.
//!
//! [liveness]: https://en.wikipedia.org/wiki/Live_variable_analysis
//! [attempt 1]: https://github.com/rust-lang/rust/pull/47954
//! [attempt 2]: https://github.com/rust-lang/rust/pull/71003
//! [attempt 3]: https://github.com/rust-lang/rust/pull/72632

use std::collections::hash_map::{Entry, OccupiedEntry};

use crate::simplify::remove_dead_blocks;
use crate::MirPass;
use rustc_data_structures::fx::FxHashMap;
use rustc_index::bit_set::BitSet;
use rustc_middle::mir::visit::{MutVisitor, PlaceContext, Visitor};
use rustc_middle::mir::HasLocalDecls;
use rustc_middle::mir::{dump_mir, PassWhere};
use rustc_middle::mir::{
    traversal, Body, InlineAsmOperand, Local, LocalKind, Location, Operand, Place, Rvalue,
    Statement, StatementKind, TerminatorKind,
};
use rustc_middle::ty::TyCtxt;
use rustc_mir_dataflow::impls::MaybeLiveLocals;
use rustc_mir_dataflow::{Analysis, ResultsCursor};

pub struct DestinationPropagation;

impl<'tcx> MirPass<'tcx> for DestinationPropagation {
    fn is_enabled(&self, sess: &rustc_session::Session) -> bool {
        // For now, only run at MIR opt level 3. Two things need to be changed before this can be
        // turned on by default:
        // 1. Because of the overeager removal of storage statements, this can cause stack space
        //    regressions. This opt is not the place to fix this though, it's a more general
        //    problem in MIR.
        // 2. Despite being an overall perf improvement, this still causes a 30% regression in
        //    keccak. We can temporarily fix this by bounding function size, but in the long term
        //    we should fix this by being smarter about invalidating analysis results.
        sess.mir_opt_level() >= 3
    }

    fn run_pass(&self, tcx: TyCtxt<'tcx>, body: &mut Body<'tcx>) {
        let def_id = body.source.def_id();
        let mut allocations = Allocations::default();
        trace!(func = ?tcx.def_path_str(def_id));

        let borrowed = rustc_mir_dataflow::impls::borrowed_locals(body);

        // In order to avoid having to collect data for every single pair of locals in the body, we
        // do not allow doing more than one merge for places that are derived from the same local at
        // once. To avoid missed opportunities, we instead iterate to a fixed point - we'll refer to
        // each of these iterations as a "round."
        //
        // Reaching a fixed point could in theory take up to `min(l, s)` rounds - however, we do not
        // expect to see MIR like that. To verify this, a test was run against `[rust-lang/regex]` -
        // the average MIR body saw 1.32 full iterations of this loop. The most that was hit was 30
        // for a single function. Only 80/2801 (2.9%) of functions saw at least 5.
        //
        // [rust-lang/regex]:
        //     https://github.com/rust-lang/regex/tree/b5372864e2df6a2f5e543a556a62197f50ca3650
        let mut round_count = 0;
        loop {
            // PERF: Can we do something smarter than recalculating the candidates and liveness
            // results?
            let mut candidates = find_candidates(
                body,
                &borrowed,
                &mut allocations.candidates,
                &mut allocations.candidates_reverse,
            );
            trace!(?candidates);
            let mut live = MaybeLiveLocals
                .into_engine(tcx, body)
                .iterate_to_fixpoint()
                .into_results_cursor(body);
            dest_prop_mir_dump(tcx, body, &mut live, round_count);

            FilterInformation::filter_liveness(
                &mut candidates,
                &mut live,
                &mut allocations.write_info,
                body,
            );

            // Because we do not update liveness information, it is unsound to use a local for more
            // than one merge operation within a single round of optimizations. We store here which
            // ones we have already used.
            let mut merged_locals: BitSet<Local> = BitSet::new_empty(body.local_decls.len());

            // This is the set of merges we will apply this round. It is a subset of the candidates.
            let mut merges = FxHashMap::default();

            for (src, candidates) in candidates.c.iter() {
                if merged_locals.contains(*src) {
                    continue;
                }
                let Some(dest) = candidates.iter().find(|dest| !merged_locals.contains(**dest))
                else {
                    continue;
                };
                if !tcx.consider_optimizing(|| {
                    format!("{} round {}", tcx.def_path_str(def_id), round_count)
                }) {
                    break;
                }
                merges.insert(*src, *dest);
                merged_locals.insert(*src);
                merged_locals.insert(*dest);
            }
            trace!(merging = ?merges);

            if merges.is_empty() {
                break;
            }
            round_count += 1;

            apply_merges(body, tcx, &merges, &merged_locals);
        }

        if round_count != 0 {
            // Merging can introduce overlap between moved arguments and/or call destination in
            // unreachable code, which the validator considers to be ill-formed.
            remove_dead_blocks(tcx, body);
        }

        trace!(round_count);
    }
}

/// Container for the various allocations that we need.
///
/// We store these here and hand out `&mut` access to them, instead of dropping and recreating them
/// frequently. Everything with a `&'alloc` lifetime points into here.
#[derive(Default)]
struct Allocations {
    candidates: FxHashMap<Local, Vec<Local>>,
    candidates_reverse: FxHashMap<Local, Vec<Local>>,
    write_info: WriteInfo,
    // PERF: Do this for `MaybeLiveLocals` allocations too.
}

#[derive(Debug)]
struct Candidates<'alloc> {
    /// The set of candidates we are considering in this optimization.
    ///
    /// We will always merge the key into at most one of its values.
    ///
    /// Whether a place ends up in the key or the value does not correspond to whether it appears as
    /// the lhs or rhs of any assignment. As a matter of fact, the places in here might never appear
    /// in an assignment at all. This happens because if we see an assignment like this:
    ///
    /// ```ignore (syntax-highlighting-only)
    /// _1.0 = _2.0
    /// ```
    ///
    /// We will still report that we would like to merge `_1` and `_2` in an attempt to allow us to
    /// remove that assignment.
    c: &'alloc mut FxHashMap<Local, Vec<Local>>,
    /// A reverse index of the `c` set; if the `c` set contains `a => [.., b, ..]`, then this
    /// contains `b => [.., a, ..]`.
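    ///
    /// For example (illustrative): if `c` maps `_1 => [_2, _3]`, then this map contains
    /// `_2 => [_1]` and `_3 => [_1]`.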
    // PERF: Possibly these should be `SmallVec`s?
    reverse: &'alloc mut FxHashMap<Local, Vec<Local>>,
}

//////////////////////////////////////////////////////////
// Merging
//
// Applies the actual optimization

fn apply_merges<'tcx>(
    body: &mut Body<'tcx>,
    tcx: TyCtxt<'tcx>,
    merges: &FxHashMap<Local, Local>,
    merged_locals: &BitSet<Local>,
) {
    let mut merger = Merger { tcx, merges, merged_locals };
    merger.visit_body_preserves_cfg(body);
}

struct Merger<'a, 'tcx> {
    tcx: TyCtxt<'tcx>,
    merges: &'a FxHashMap<Local, Local>,
    merged_locals: &'a BitSet<Local>,
}

impl<'a, 'tcx> MutVisitor<'tcx> for Merger<'a, 'tcx> {
    fn tcx(&self) -> TyCtxt<'tcx> {
        self.tcx
    }

    fn visit_local(&mut self, local: &mut Local, _: PlaceContext, _location: Location) {
        if let Some(dest) = self.merges.get(local) {
            *local = *dest;
        }
    }

    fn visit_statement(&mut self, statement: &mut Statement<'tcx>, location: Location) {
        match &statement.kind {
            // FIXME: Don't delete storage statements, but "merge" the storage ranges instead.
            StatementKind::StorageDead(local) | StatementKind::StorageLive(local)
                if self.merged_locals.contains(*local) =>
            {
                statement.make_nop();
                return;
            }
            _ => (),
        };
        self.super_statement(statement, location);
        match &statement.kind {
            StatementKind::Assign(box (dest, rvalue)) => {
                match rvalue {
                    Rvalue::CopyForDeref(place)
                    | Rvalue::Use(Operand::Copy(place) | Operand::Move(place)) => {
                        // These might've been turned into self-assignments by the replacement
                        // (this includes the original statement we wanted to eliminate).
                        if dest == place {
                            debug!("{:?} turned into self-assignment, deleting", location);
                            statement.make_nop();
                        }
                    }
                    _ => {}
                }
            }

            _ => {}
        }
    }
}

//////////////////////////////////////////////////////////
// Liveness filtering
//
// This section enforces the fourth bullet point from the module docs (the liveness condition)

struct FilterInformation<'a, 'body, 'alloc, 'tcx> {
    body: &'body Body<'tcx>,
    live: &'a mut ResultsCursor<'body, 'tcx, MaybeLiveLocals>,
    candidates: &'a mut Candidates<'alloc>,
    write_info: &'alloc mut WriteInfo,
    at: Location,
}

// We first implement some utility functions for removing candidates according to different needs.
// Throughout the liveness filtering, the `candidates` are only ever accessed through these
// methods, and never directly.
impl<'alloc> Candidates<'alloc> {
    /// Just `Vec::retain`, but the condition is inverted and we add debugging output
    fn vec_filter_candidates(
        src: Local,
        v: &mut Vec<Local>,
        mut f: impl FnMut(Local) -> CandidateFilter,
        at: Location,
    ) {
        v.retain(|dest| {
            let remove = f(*dest);
            if remove == CandidateFilter::Remove {
                trace!("eliminating {:?} => {:?} due to conflict at {:?}", src, dest, at);
            }
            remove == CandidateFilter::Keep
        });
    }

    /// `vec_filter_candidates` but for an `Entry`
    fn entry_filter_candidates(
        mut entry: OccupiedEntry<'_, Local, Vec<Local>>,
        p: Local,
        f: impl FnMut(Local) -> CandidateFilter,
        at: Location,
    ) {
        let candidates = entry.get_mut();
        Self::vec_filter_candidates(p, candidates, f, at);
        if candidates.is_empty() {
            entry.remove();
        }
    }

    /// For all candidates `(p, q)` or `(q, p)`, removes the candidate if `f(q)` says to do so
    fn filter_candidates_by(
        &mut self,
        p: Local,
        mut f: impl FnMut(Local) -> CandidateFilter,
        at: Location,
    ) {
        // Cover the cases where `p` appears as a `src`
        if let Entry::Occupied(entry) = self.c.entry(p) {
            Self::entry_filter_candidates(entry, p, &mut f, at);
        }
        // And the cases where `p` appears as a `dest`
        let Some(srcs) = self.reverse.get_mut(&p) else {
            return;
        };
        // We use `retain` here to remove the elements from the reverse set if we've removed the
        // matching candidate in the forward set.
        srcs.retain(|src| {
            if f(*src) == CandidateFilter::Keep {
                return true;
            }
            let Entry::Occupied(entry) = self.c.entry(*src) else {
                return false;
            };
            Self::entry_filter_candidates(
                entry,
                *src,
                |dest| {
                    if dest == p { CandidateFilter::Remove } else { CandidateFilter::Keep }
                },
                at,
            );
            false
        });
    }
}

#[derive(Copy, Clone, PartialEq, Eq)]
enum CandidateFilter {
    Keep,
    Remove,
}

impl<'a, 'body, 'alloc, 'tcx> FilterInformation<'a, 'body, 'alloc, 'tcx> {
    /// Filters the set of candidates to remove those that conflict.
    ///
    /// The steps we take are exactly those that are outlined at the top of the file. For each
    /// statement/terminator, we collect the set of locals that are written to in that
    /// statement/terminator, and then we remove all pairs of candidates that contain one such local
    /// and another one that is live.
    ///
    /// We need to be careful about the ordering of operations within each statement/terminator
    /// here. Many statements might write to and read from more than one place, and we need to
    /// consider them all. The strategy for doing this is as follows: We first gather all the places
    /// that are written to within the statement/terminator via `WriteInfo`. Then, we use the
    /// liveness analysis from *before* the statement/terminator (in the control flow sense) to
    /// eliminate candidates - this is because we want to conservatively treat a pair of locals that
    /// is both read and written in the statement/terminator as conflicting, and the liveness
    /// analysis before the statement/terminator will correctly report locals that are read in the
    /// statement/terminator to be live. We are additionally conservative by treating all
    /// written-to locals as also being read from.
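    ///
    /// For example (illustrative): at a statement `_2 = const 1_i32;` at which `_3` is live, a
    /// candidate pair `(_2, _3)` is removed, since merging the two locals would let the write to
    /// `_2` clobber the still-live value of `_3`.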
    fn filter_liveness<'b>(
        candidates: &mut Candidates<'alloc>,
        live: &mut ResultsCursor<'b, 'tcx, MaybeLiveLocals>,
        write_info_alloc: &'alloc mut WriteInfo,
        body: &'b Body<'tcx>,
    ) {
        let mut this = FilterInformation {
            body,
            live,
            candidates,
            // We don't actually store anything at this scope, we just keep things here to be able
            // to reuse the allocation.
            write_info: write_info_alloc,
            // Doesn't matter what we put here, will be overwritten before being used
            at: Location::START,
        };
        this.internal_filter_liveness();
    }

    fn internal_filter_liveness(&mut self) {
        for (block, data) in traversal::preorder(self.body) {
            self.at = Location { block, statement_index: data.statements.len() };
            self.live.seek_after_primary_effect(self.at);
            self.write_info.for_terminator(&data.terminator().kind);
            self.apply_conflicts();

            for (i, statement) in data.statements.iter().enumerate().rev() {
                self.at = Location { block, statement_index: i };
                self.live.seek_after_primary_effect(self.at);
                self.write_info.for_statement(&statement.kind, self.body);
                self.apply_conflicts();
            }
        }
    }

    fn apply_conflicts(&mut self) {
        let writes = &self.write_info.writes;
        for p in writes {
            let other_skip = self.write_info.skip_pair.and_then(|(a, b)| {
                if a == *p {
                    Some(b)
                } else if b == *p {
                    Some(a)
                } else {
                    None
                }
            });
            self.candidates.filter_candidates_by(
                *p,
                |q| {
                    if Some(q) == other_skip {
                        return CandidateFilter::Keep;
                    }
                    // It is possible that a local may be live for less than the duration of a
                    // statement. This happens in the case of function calls or inline asm. Because
                    // of this, we also mark locals as conflicting when both of them are written to
                    // in the same statement.
                    if self.live.contains(q) || writes.contains(&q) {
                        CandidateFilter::Remove
                    } else {
                        CandidateFilter::Keep
                    }
                },
                self.at,
            );
        }
    }
}

/// Describes where a statement/terminator writes to
#[derive(Default, Debug)]
struct WriteInfo {
    writes: Vec<Local>,
    /// If this pair of locals is a candidate pair, completely skip processing it during this
    /// statement. All other candidates are unaffected.
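    ///
    /// For example (illustrative): for `_1 = move _2;`, both `_1` and `_2` are recorded as written
    /// (a moved-from operand counts as a write), but the candidate pair `(_1, _2)` itself must
    /// survive this statement, since eliminating exactly this kind of assignment is the point of
    /// the optimization.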
    skip_pair: Option<(Local, Local)>,
}

impl WriteInfo {
    fn for_statement<'tcx>(&mut self, statement: &StatementKind<'tcx>, body: &Body<'tcx>) {
        self.reset();
        match statement {
            StatementKind::Assign(box (lhs, rhs)) => {
                self.add_place(*lhs);
                match rhs {
                    Rvalue::Use(op) => {
                        self.add_operand(op);
                        self.consider_skipping_for_assign_use(*lhs, op, body);
                    }
                    Rvalue::Repeat(op, _) => {
                        self.add_operand(op);
                    }
                    Rvalue::Cast(_, op, _)
                    | Rvalue::UnaryOp(_, op)
                    | Rvalue::ShallowInitBox(op, _) => {
                        self.add_operand(op);
                    }
                    Rvalue::BinaryOp(_, ops) | Rvalue::CheckedBinaryOp(_, ops) => {
                        for op in [&ops.0, &ops.1] {
                            self.add_operand(op);
                        }
                    }
                    Rvalue::Aggregate(_, ops) => {
                        for op in ops {
                            self.add_operand(op);
                        }
                    }
                    Rvalue::ThreadLocalRef(_)
                    | Rvalue::NullaryOp(_, _)
                    | Rvalue::Ref(_, _, _)
                    | Rvalue::AddressOf(_, _)
                    | Rvalue::Len(_)
                    | Rvalue::Discriminant(_)
                    | Rvalue::CopyForDeref(_) => (),
                }
            }
            // Retags are technically also reads, but reporting them as a write suffices
            StatementKind::SetDiscriminant { place, .. }
            | StatementKind::Deinit(place)
            | StatementKind::Retag(_, place) => {
                self.add_place(**place);
            }
            StatementKind::Intrinsic(_)
            | StatementKind::ConstEvalCounter
            | StatementKind::Nop
            | StatementKind::Coverage(_)
            | StatementKind::StorageLive(_)
            | StatementKind::StorageDead(_)
            | StatementKind::PlaceMention(_) => (),
            StatementKind::FakeRead(_) | StatementKind::AscribeUserType(_, _) => {
                bug!("{:?} not found in this MIR phase", statement)
            }
        }
    }

    fn consider_skipping_for_assign_use<'tcx>(
        &mut self,
        lhs: Place<'tcx>,
        rhs: &Operand<'tcx>,
        body: &Body<'tcx>,
    ) {
        let Some(rhs) = rhs.place() else { return };
        if let Some(pair) = places_to_candidate_pair(lhs, rhs, body) {
            self.skip_pair = Some(pair);
        }
    }

    fn for_terminator<'tcx>(&mut self, terminator: &TerminatorKind<'tcx>) {
        self.reset();
        match terminator {
            TerminatorKind::SwitchInt { discr: op, .. }
            | TerminatorKind::Assert { cond: op, .. } => {
                self.add_operand(op);
            }
            TerminatorKind::Call { destination, func, args, .. } => {
                self.add_place(*destination);
                self.add_operand(func);
                for arg in args {
                    self.add_operand(arg);
                }
            }
            TerminatorKind::InlineAsm { operands, .. } => {
                for asm_operand in operands {
                    match asm_operand {
                        InlineAsmOperand::In { value, .. } => {
                            self.add_operand(value);
                        }
                        InlineAsmOperand::Out { place, .. } => {
                            if let Some(place) = place {
                                self.add_place(*place);
                            }
                        }
                        // Note that the `late` field in `InOut` is about whether the registers
                        // used for these things overlap, and is of absolutely no interest to us.
                        InlineAsmOperand::InOut { in_value, out_place, .. } => {
                            if let Some(place) = out_place {
                                self.add_place(*place);
                            }
                            self.add_operand(in_value);
                        }
                        InlineAsmOperand::Const { .. }
                        | InlineAsmOperand::SymFn { .. }
                        | InlineAsmOperand::SymStatic { .. } => (),
                    }
                }
            }
            TerminatorKind::Goto { .. }
            | TerminatorKind::UnwindResume
            | TerminatorKind::UnwindTerminate(_)
            | TerminatorKind::Return
            | TerminatorKind::Unreachable { .. } => (),
            TerminatorKind::Drop { .. } => {
                // `Drop`s create a `&mut` and so are not considered
            }
            TerminatorKind::Yield { .. }
            | TerminatorKind::GeneratorDrop
            | TerminatorKind::FalseEdge { .. }
            | TerminatorKind::FalseUnwind { .. } => {
                bug!("{:?} not found in this MIR phase", terminator)
            }
        }
    }

    fn add_place(&mut self, place: Place<'_>) {
        self.writes.push(place.local);
    }

    fn add_operand<'tcx>(&mut self, op: &Operand<'tcx>) {
        match op {
            // FIXME(JakobDegen): In a previous version, the `Move` case was incorrectly treated as
            // being only a read. This was unsound, however we cannot add a regression test because
            // it is not possible to trigger this with current MIR. Once we have that ability, a
            // regression test should be added.
            Operand::Move(p) => self.add_place(*p),
            Operand::Copy(_) | Operand::Constant(_) => (),
        }
    }

    fn reset(&mut self) {
        self.writes.clear();
        self.skip_pair = None;
    }
}

/////////////////////////////////////////////////////
// Candidate accumulation

/// If the pair of places is being considered for merging, returns the candidate which would be
/// merged in order to accomplish this.
///
/// The contract here is in one direction - there is a guarantee that merging the locals returned
/// by this function would result in an assignment between the inputs becoming a self-assignment.
/// However, there is no guarantee that the returned pair is actually suitable for merging -
/// candidate collection must still check this independently.
///
/// This output is unique for each unordered pair of input places.
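///
/// For example (illustrative): for an assignment `_3 = _5;`, both argument orders return
/// `Some((_3, _5))` after sorting, unless `_3` is required (an argument or the return place), in
/// which case the pair is swapped to `Some((_5, _3))` so that the local to be removed comes first.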
fn places_to_candidate_pair<'tcx>(
    a: Place<'tcx>,
    b: Place<'tcx>,
    body: &Body<'tcx>,
) -> Option<(Local, Local)> {
    let (mut a, mut b) = if a.projection.is_empty() && b.projection.is_empty() {
        (a.local, b.local)
    } else {
        return None;
    };

    // By sorting, we make sure the result is independent of the input order
    if a > b {
        std::mem::swap(&mut a, &mut b);
    }

    // We could now return `(a, b)`, but then we miss some candidates in the case where `a` can't be
    // used as a `src`.
    if is_local_required(a, body) {
        std::mem::swap(&mut a, &mut b);
    }
    // We could check `is_local_required` again here, but there's no need - after all, we make no
    // promise that the candidate pair is actually valid
    Some((a, b))
}

/// Collects the candidates for merging
///
/// This is responsible for enforcing the first three bullet points from the module docs.
fn find_candidates<'alloc, 'tcx>(
    body: &Body<'tcx>,
    borrowed: &BitSet<Local>,
    candidates: &'alloc mut FxHashMap<Local, Vec<Local>>,
    candidates_reverse: &'alloc mut FxHashMap<Local, Vec<Local>>,
) -> Candidates<'alloc> {
    candidates.clear();
    candidates_reverse.clear();
    let mut visitor = FindAssignments { body, candidates, borrowed };
    visitor.visit_body(body);
    // Deduplicate candidates
    for (_, cands) in candidates.iter_mut() {
        cands.sort();
        cands.dedup();
    }
    // Generate the reverse map
    for (src, cands) in candidates.iter() {
        for dest in cands.iter().copied() {
            candidates_reverse.entry(dest).or_default().push(*src);
        }
    }
    Candidates { c: candidates, reverse: candidates_reverse }
}

struct FindAssignments<'a, 'alloc, 'tcx> {
    body: &'a Body<'tcx>,
    candidates: &'alloc mut FxHashMap<Local, Vec<Local>>,
    borrowed: &'a BitSet<Local>,
}

impl<'tcx> Visitor<'tcx> for FindAssignments<'_, '_, 'tcx> {
    fn visit_statement(&mut self, statement: &Statement<'tcx>, _: Location) {
        if let StatementKind::Assign(box (
            lhs,
            Rvalue::CopyForDeref(rhs) | Rvalue::Use(Operand::Copy(rhs) | Operand::Move(rhs)),
        )) = &statement.kind
        {
            let Some((src, dest)) = places_to_candidate_pair(*lhs, *rhs, self.body) else {
                return;
            };

            // As described at the top of the file, we do not go near things that have
            // their address taken.
            if self.borrowed.contains(src) || self.borrowed.contains(dest) {
                return;
            }

            // As described at the top of this file, we do not touch locals which have
            // different types.
            let src_ty = self.body.local_decls()[src].ty;
            let dest_ty = self.body.local_decls()[dest].ty;
            if src_ty != dest_ty {
                // FIXME(#112651): This can be removed afterwards. Also update the module description.
                trace!("skipped `{src:?} = {dest:?}` due to subtyping: {src_ty} != {dest_ty}");
                return;
            }

            // Also, we need to make sure that MIR actually allows the `src` to be removed
            if is_local_required(src, self.body) {
                return;
            }

            // We may insert duplicates here, but that's fine
            self.candidates.entry(src).or_default().push(dest);
        }
    }
}

/// Some locals are part of the function's interface and cannot be removed.
///
/// Note that these locals *can* still be merged with non-required locals by removing that other
/// local.
fn is_local_required(local: Local, body: &Body<'_>) -> bool {
    match body.local_kind(local) {
        LocalKind::Arg | LocalKind::ReturnPointer => true,
        LocalKind::Temp => false,
    }
}

/////////////////////////////////////////////////////////
// MIR Dump

fn dest_prop_mir_dump<'body, 'tcx>(
    tcx: TyCtxt<'tcx>,
    body: &'body Body<'tcx>,
    live: &mut ResultsCursor<'body, 'tcx, MaybeLiveLocals>,
    round: usize,
) {
    let mut reachable = None;
    dump_mir(tcx, false, "DestinationPropagation-dataflow", &round, body, |pass_where, w| {
        let reachable = reachable.get_or_insert_with(|| traversal::reachable_as_bitset(body));

        match pass_where {
            PassWhere::BeforeLocation(loc) if reachable.contains(loc.block) => {
                live.seek_after_primary_effect(loc);
                writeln!(w, "        // live: {:?}", live.get())?;
            }
            PassWhere::AfterTerminator(bb) if reachable.contains(bb) => {
                let loc = body.terminator_loc(bb);
                live.seek_before_primary_effect(loc);
                writeln!(w, "        // live: {:?}", live.get())?;
            }

            PassWhere::BeforeBlock(bb) if reachable.contains(bb) => {
                live.seek_to_block_start(bb);
                writeln!(w, "    // live: {:?}", live.get())?;
            }

            PassWhere::BeforeCFG | PassWhere::AfterCFG | PassWhere::AfterLocation(_) => {}

            PassWhere::BeforeLocation(_) | PassWhere::AfterTerminator(_) => {
                writeln!(w, "        // live: <unreachable>")?;
            }

            PassWhere::BeforeBlock(_) => {
                writeln!(w, "    // live: <unreachable>")?;
            }
        }

        Ok(())
    });
}