vendor/similar/src/lib.rs

//! This crate implements diffing utilities.  It attempts to provide an abstraction
//! interface over different types of diffing algorithms.  The design of the
//! library is inspired by pijul's diff library by Pierre-Étienne Meunier and
//! also inherits the patience diff algorithm from there.
//!
//! The API of the crate is split into high and low level functionality.  Most
//! of what you probably want to use is available at the top level.  Additionally
//! the following submodules exist:
//!
//! * [`algorithms`]: This implements the different types of diffing algorithms.
//!   It provides both low level access to the algorithms with the minimal
//!   trait bounds necessary, as well as a generic interface.
//! * [`udiff`]: Unified diff functionality.
//! * [`utils`]: Utilities for common diff related operations.  This module
//!   provides additional diffing functions for working with text diffs.
//!
//! # Sequence Diffing
//!
//! If you want to diff sequences, or more generally indexable things, you can
//! use the [`capture_diff`] and [`capture_diff_slices`] functions.  They will
//! directly diff an indexable object or slice and return a vector of
//! [`DiffOp`] objects.
//!
//! ```rust
//! use similar::{Algorithm, capture_diff_slices};
//!
//! let a = vec![1, 2, 3, 4, 5];
//! let b = vec![1, 2, 3, 4, 7];
//! let ops = capture_diff_slices(Algorithm::Myers, &a, &b);
//! ```
//!
//! # Text Diffing
//!
//! Similar provides helpful utilities for text (and more specifically line) diff
//! operations.  The main type you want to work with is [`TextDiff`] which
//! uses the underlying diff algorithms to expose a convenient API to work with
//! texts:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! let diff = TextDiff::from_lines(
//!     "Hello World\nThis is the second line.\nThis is the third.",
//!     "Hallo Welt\nThis is the second line.\nThis is life.\nMoar and more",
//! );
//!
//! for change in diff.iter_all_changes() {
//!     let sign = match change.tag() {
//!         ChangeTag::Delete => "-",
//!         ChangeTag::Insert => "+",
//!         ChangeTag::Equal => " ",
//!     };
//!     print!("{}{}", sign, change);
//! }
//! # }
//! ```
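//!
//! The same kind of diff can also be rendered in unified diff format through
//! the [`udiff`] module.  The following is a minimal sketch; the file names
//! passed to the header are placeholders and the context radius is an
//! arbitrary choice:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::TextDiff;
//!
//! let old = "Hello World\nThis is the second line.\nThis is the third.\n";
//! let new = "Hallo Welt\nThis is the second line.\nThis is life.\n";
//! let diff = TextDiff::from_lines(old, new);
//!
//! // `old.txt` and `new.txt` are placeholder names for the header lines.
//! print!(
//!     "{}",
//!     diff.unified_diff()
//!         .context_radius(3)
//!         .header("old.txt", "new.txt")
//! );
//! # }
//! ```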
//!
//! ## Trailing Newlines
//!
//! When working with line diffs (and unified diffs in general) there are two
//! "philosophies" for looking at lines.  One is to diff lines without their
//! newline character, the other is to diff with the newline character.
//! Typically the latter is done because text files do not _have_ to end in a
//! newline character.  As a result there is a difference between `foo\n` and
//! `foo` as far as diffs are concerned.
//!
//! In similar this is handled on the [`Change`] or [`InlineChange`] level.  If
//! a diff was created via [`TextDiff::from_lines`], the text diffing system is
//! instructed to check for missing newlines
//! ([`TextDiff::newline_terminated`] returns `true`).
//!
//! In any case the [`Change`] object has a convenience method called
//! [`Change::missing_newline`] which returns `true` if the change is missing
//! a trailing newline.  Armed with that information the caller can handle
//! this either by rendering a virtual newline at that position or by indicating
//! it in some other way.  For instance the unified diff code will render the
//! special `\ No newline at end of file` marker.
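//!
//! As a rough sketch of how a caller might act on this, the following prints
//! the raw value of each change and adds the marker by hand.  The input texts
//! are made up for illustration:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! // The old text ends in a newline, the new text does not.
//! let diff = TextDiff::from_lines("foo\nbar\n", "foo\nbar");
//! assert!(diff.newline_terminated());
//!
//! for change in diff.iter_all_changes() {
//!     let sign = match change.tag() {
//!         ChangeTag::Delete => "-",
//!         ChangeTag::Insert => "+",
//!         ChangeTag::Equal => " ",
//!     };
//!     // Print the raw value so the missing newline is handled explicitly.
//!     print!("{}{}", sign, change.value());
//!     if change.missing_newline() {
//!         println!();
//!         println!("\\ No newline at end of file");
//!     }
//! }
//! # }
//! ```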
//!
//! ## Bytes vs Unicode
//!
//! Similar concerns itself with a looser definition of "text" than you would
//! normally see in Rust.  While by default it can only operate on [`str`] types,
//! by enabling the `bytes` feature it gains support for byte slices with some
//! caveats.
//!
//! A lot of text diff functionality assumes that what is being diffed constitutes
//! text, but in the real world it can often be challenging to ensure that this is
//! all valid UTF-8.  Because of this the crate is built so that most functionality
//! still works with bytes as long as they are roughly ASCII compatible.
//!
//! This means you will be successful in creating a unified diff from Latin-1
//! encoded bytes, but if you try to do the same with EBCDIC encoded bytes you
//! will only get garbage.
//!
//! # Ops vs Changes
//!
//! Because two compared sequences will very commonly largely match, this crate
//! splits its functionality into two layers:
//!
//! Changes are encoded as [diff operations](crate::DiffOp).  These are
//! ranges of the differences by index in the source sequence.  Because this
//! can be cumbersome to work with, a separate method [`DiffOp::iter_changes`]
//! (and [`TextDiff::iter_changes`] when working with text diffs) is provided
//! which expands an operation into its individual changes, item by item.
//!
//! As the [`TextDiff::grouped_ops`] method can isolate clusters of changes,
//! expanding only the grouped operations keeps this workable even for very
//! long files.
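//!
//! The two layers can be sketched on plain slices as follows.  The input
//! vectors are made up for illustration:
//!
//! ```rust
//! use similar::{capture_diff_slices, Algorithm};
//!
//! let old = vec![1, 2, 3, 4, 5];
//! let new = vec![1, 2, 3, 4, 7];
//! let ops = capture_diff_slices(Algorithm::Myers, &old, &new);
//!
//! for op in &ops {
//!     // Layer one: a compact operation over index ranges.
//!     let (tag, old_range, new_range) = op.as_tag_tuple();
//!     println!("{:?} old={:?} new={:?}", tag, old_range, new_range);
//!
//!     // Layer two: the same operation expanded item by item.
//!     for change in op.iter_changes(&old, &new) {
//!         println!("  {:?} {}", change.tag(), change.value());
//!     }
//! }
//! ```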
//!
//! # Deadlines and Performance
//!
//! For large and very distinct inputs the algorithms as implemented can take
//! a very, very long time to execute.  Too long to make sense in practice.
//! To work around this issue all diffing algorithms also provide a version
//! that accepts a deadline, which is the point in time as defined by an
//! [`Instant`](std::time::Instant) after which the algorithm should give up.
//! What giving up means depends on the algorithm.  For instance, due to the
//! recursive, divide and conquer nature of Myers' diff you will still get a
//! pretty decent diff in many cases when a deadline is reached, whereas
//! the LCS diff is unlikely to give any decent results in such a
//! situation.
//!
//! The [`TextDiff`] type also lets you configure a deadline and/or timeout
//! when performing a text diff.
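//!
//! A minimal sketch of both variants follows.  The 200 millisecond budget is
//! an arbitrary value chosen for illustration, not a recommendation:
//!
//! ```rust
//! use std::time::{Duration, Instant};
//! use similar::{capture_diff_slices_deadline, Algorithm};
//!
//! // Low level: pass an absolute deadline to the diffing function.
//! let a = vec![1, 2, 3, 4, 5];
//! let b = vec![1, 2, 3, 4, 7];
//! let deadline = Instant::now() + Duration::from_millis(200);
//! let ops = capture_diff_slices_deadline(Algorithm::Myers, &a, &b, Some(deadline));
//! println!("{} ops", ops.len());
//!
//! # #[cfg(feature = "text")] {
//! use similar::TextDiff;
//!
//! // Text diffs: configure a relative timeout before diffing.
//! let diff = TextDiff::configure()
//!     .algorithm(Algorithm::Myers)
//!     .timeout(Duration::from_millis(200))
//!     .diff_lines("a\nb\nc\n", "a\nx\nc\n");
//! println!("{} ops", diff.ops().len());
//! # }
//! ```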
//!
//! # Feature Flags
//!
//! By default the crate does not have any dependencies.  For some use
//! cases, however, it's useful to pull in extra functionality.  Likewise you
//! can turn off some functionality.
//!
//! * `text`: this feature is enabled by default and enables the text based
//!   diffing types such as [`TextDiff`].
//!   If the crate is used without default features it's removed.
//! * `unicode`: when this feature is enabled the text diffing functionality
//!   gains the ability to diff on a grapheme instead of character level.  This
//!   is particularly useful when working with text containing emojis.  This
//!   pulls in some relatively complex dependencies for working with the unicode
//!   database.
//! * `bytes`: this feature adds support for working with byte slices in text
//!   APIs in addition to unicode strings.  This pulls in the
//!   [`bstr`] dependency.
//! * `inline`: this feature gives access to additional functionality of the
//!   text diffing to provide inline information about which values changed
//!   in a line diff.  This currently also enables the `unicode` feature.
//! * `serde`: this feature enables serialization for some types in this
//!   crate.  For enums without payload, deserialization is then also supported.
#![warn(missing_docs)]
pub mod algorithms;
pub mod iter;
#[cfg(feature = "text")]
pub mod udiff;
#[cfg(feature = "text")]
pub mod utils;

mod common;
#[cfg(feature = "text")]
mod text;
mod types;

pub use self::common::*;
#[cfg(feature = "text")]
pub use self::text::*;
pub use self::types::*;