//! This crate implements diffing utilities. It attempts to provide an abstraction
//! interface over different types of diffing algorithms. The design of the
//! library is inspired by pijul's diff library by Pierre-Étienne Meunier and
//! also inherits the patience diff algorithm from there.
//!
//! The API of the crate is split into high and low level functionality. Most
//! of what you probably want to use is available at the top level. Additionally,
//! the following submodules exist:
//!
//! * [`algorithms`]: This implements the different types of diffing algorithms.
//!   It provides both low level access to the algorithms with the minimal
//!   trait bounds necessary, as well as a generic interface.
//! * [`udiff`]: Unified diff functionality.
//! * [`utils`]: Utilities for common diff related operations. This module
//!   provides additional diffing functions for working with text diffs.
//!
//! # Sequence Diffing
//!
//! If you want to diff sequences, or indexable things in general, you can use
//! the [`capture_diff`] and [`capture_diff_slices`] functions. They directly
//! diff an indexable object or slice and return a vector of [`DiffOp`] objects.
//!
//! ```rust
//! use similar::{Algorithm, capture_diff_slices};
//!
//! let a = vec![1, 2, 3, 4, 5];
//! let b = vec![1, 2, 3, 4, 7];
//! // `ops` is a vector of `DiffOp` values describing how `a` differs from `b`.
//! let ops = capture_diff_slices(Algorithm::Myers, &a, &b);
//! ```
//!
//! # Text Diffing
//!
//! Similar provides helpful utilities for text (and more specifically line) diff
//! operations. The main type you want to work with is [`TextDiff`] which
//! uses the underlying diff algorithms to expose a convenient API to work with
//! texts:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! let diff = TextDiff::from_lines(
//!     "Hello World\nThis is the second line.\nThis is the third.",
//!     "Hallo Welt\nThis is the second line.\nThis is life.\nMoar and more",
//! );
//!
//! for change in diff.iter_all_changes() {
//!     let sign = match change.tag() {
//!         ChangeTag::Delete => "-",
//!         ChangeTag::Insert => "+",
//!         ChangeTag::Equal => " ",
//!     };
//!     print!("{}{}", sign, change);
//! }
//! # }
//! ```
//!
//! ## Trailing Newlines
//!
//! When working with line diffs (and unified diffs in general) there are two
//! "philosophies" to look at lines. One is to diff lines without their newline
//! character, the other is to diff with the newline character. Typically the
//! latter is done because text files do not _have_ to end in a newline character.
//! As a result there is a difference between `foo\n` and `foo` as far as diffs
//! are concerned.
//!
//! In similar this is handled on the [`Change`] or [`InlineChange`] level. If
//! a diff was created via [`TextDiff::from_lines`], the text diffing system is
//! instructed to check for missing trailing newlines
//! ([`TextDiff::newline_terminated`] returns `true`).
//!
//! In any case the [`Change`] object has a convenience method called
//! [`Change::missing_newline`] which returns `true` if the change is missing
//! a trailing newline. Armed with that information the caller can handle
//! this by either rendering a virtual newline at that position or indicating
//! it in a different way. For instance the unified diff code will render the
//! special `\ No newline at end of file` marker.
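//!
//! To make the second philosophy concrete, here is a minimal sketch that uses
//! only the standard library. It splits text into lines that keep their newline
//! and reports whether a line lacks one. The helper names `lines_with_newline`
//! and `is_missing_newline` are hypothetical and not part of this crate's API,
//! and the crate's own line splitting may differ in details such as `\r\n`
//! handling.
//!
//! ```rust
//! /// Splits `text` into lines, keeping the trailing `\n` on each line.
//! /// (Hypothetical helper for illustration only.)
//! fn lines_with_newline(text: &str) -> Vec<&str> {
//!     text.split_inclusive('\n').collect()
//! }
//!
//! /// Returns `true` if `line` does not end in a newline character.
//! /// (Hypothetical helper for illustration only.)
//! fn is_missing_newline(line: &str) -> bool {
//!     !line.ends_with('\n')
//! }
//! ```
//!
//! With this sketch, `foo\nbar` splits into `foo\n` and `bar`, and only the
//! last line is reported as missing its newline.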
//!
//! ## Bytes vs Unicode
//!
//! Similar concerns itself with a looser definition of "text" than you would
//! normally see in Rust. While by default it can only operate on [`str`] types,
//! enabling the `bytes` feature adds support for byte slices with some
//! caveats.
//!
//! A lot of text diff functionality assumes that what is being diffed constitutes
//! text, but in the real world it can often be challenging to ensure that this is
//! all valid UTF-8. Because of this the crate is built so that most functionality
//! also still works with bytes, as long as they are roughly ASCII compatible.
//!
//! This means you will be successful in creating a unified diff from Latin-1
//! encoded bytes, but if you try to do the same with EBCDIC encoded bytes you
//! will only get garbage.
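//!
//! The following standard-library-only sketch hints at why ASCII compatibility
//! matters for line based diffing. It is not how this crate is implemented, and
//! `byte_lines` is a hypothetical helper: it merely splits on the ASCII newline
//! byte, which is the kind of assumption that holds for Latin-1 but not for
//! EBCDIC.
//!
//! ```rust
//! /// Splits `data` into lines on the ASCII newline byte (`0x0A`), keeping it.
//! /// (Hypothetical helper for illustration only.)
//! fn byte_lines(data: &[u8]) -> Vec<&[u8]> {
//!     data.split_inclusive(|&b| b == b'\n').collect()
//! }
//! ```
//!
//! Latin-1 input such as `b"caf\xE9\nna\xEFve\n"` is not valid UTF-8 but still
//! splits into two lines, because its newline is the ASCII byte `0x0A`. EBCDIC
//! text (assuming a code page such as 037, where the newline is encoded as
//! `0x15`) contains no such byte and would come back as one single line.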
//!
//! # Ops vs Changes
//!
//! Because two compared sequences will very commonly largely match, this crate
//! splits its functionality into two layers:
//!
//! Changes are encoded as [diff operations](crate::DiffOp). These are
//! ranges of the differences by index in the source sequence. Because this
//! can be cumbersome to work with, a separate method [`DiffOp::iter_changes`]
//! (and [`TextDiff::iter_changes`] when working with text diffs) is provided
//! which expands an operation into its individual changes, item by item.
//!
//! Because the [`TextDiff::grouped_ops`] method can isolate clusters of changes,
//! this works even for very long files when the two are combined.
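//!
//! The relationship between the two layers can be sketched with standard
//! library types alone. `SketchOp`, `tagged` and `expand` below are hypothetical
//! stand-ins for illustration; the real [`DiffOp`] stores indexes and lengths
//! rather than ranges and yields [`Change`] values rather than tuples.
//!
//! ```rust
//! use std::ops::Range;
//!
//! /// Hypothetical, simplified operation: index ranges instead of items.
//! #[derive(Debug, Clone, PartialEq)]
//! enum SketchOp {
//!     Equal(Range<usize>),                 // range in the old sequence
//!     Delete(Range<usize>),                // range in the old sequence
//!     Insert(Range<usize>),                // range in the new sequence
//!     Replace(Range<usize>, Range<usize>), // old range, new range
//! }
//!
//! /// Pairs every item of `items` with the given tag character.
//! fn tagged<T: Clone>(tag: char, items: &[T]) -> Vec<(char, T)> {
//!     items.iter().cloned().map(move |item| (tag, item)).collect()
//! }
//!
//! /// Expands one operation into per-item changes tagged ' ', '-' or '+'.
//! fn expand<T: Clone>(op: &SketchOp, old: &[T], new: &[T]) -> Vec<(char, T)> {
//!     match op {
//!         SketchOp::Equal(r) => tagged(' ', &old[r.clone()]),
//!         SketchOp::Delete(r) => tagged('-', &old[r.clone()]),
//!         SketchOp::Insert(r) => tagged('+', &new[r.clone()]),
//!         SketchOp::Replace(o, n) => {
//!             // A replacement expands to its deletions followed by its insertions.
//!             let mut changes = tagged('-', &old[o.clone()]);
//!             changes.extend(tagged('+', &new[n.clone()]));
//!             changes
//!         }
//!     }
//! }
//! ```
//!
//! The design point is that an operation stays small no matter how many items
//! it covers; the per-item view is only materialized when a caller asks for it.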
//!
//! # Deadlines and Performance
//!
//! For large and very distinct inputs the algorithms as implemented can take
//! a very, very long time to execute, too long to make sense in practice.
//! To work around this issue all diffing algorithms also provide a version
//! that accepts a deadline, which is the point in time as defined by an
//! [`Instant`](std::time::Instant) after which the algorithm should give up.
//! What giving up means depends on the algorithm. For instance, due to the
//! recursive, divide and conquer nature of Myers' diff you will still get a
//! pretty decent diff in many cases when a deadline is reached. The LCS diff,
//! on the other hand, is unlikely to give any decent results in such a
//! situation.
//!
//! The [`TextDiff`] type also lets you configure a deadline and/or timeout
//! when performing a text diff.
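//!
//! The general shape of a deadline check can be sketched with the standard
//! library alone. The function `lcs_len_deadline` below is a hypothetical
//! example and not one of this crate's algorithms: it computes only the length
//! of a longest common subsequence and, unlike the behavior described above,
//! returns `None` instead of a partial result once the deadline has passed.
//!
//! ```rust
//! use std::time::Instant;
//!
//! /// Length of the longest common subsequence of `a` and `b`, or `None` if
//! /// `deadline` is reached first. (Hypothetical helper for illustration only.)
//! fn lcs_len_deadline<T: PartialEq>(
//!     a: &[T],
//!     b: &[T],
//!     deadline: Option<Instant>,
//! ) -> Option<usize> {
//!     let mut prev = vec![0usize; b.len() + 1];
//!     for x in a {
//!         // Check the clock once per row to keep the overhead low.
//!         if let Some(deadline) = deadline {
//!             if Instant::now() >= deadline {
//!                 return None;
//!             }
//!         }
//!         let mut cur = vec![0usize; b.len() + 1];
//!         for (j, y) in b.iter().enumerate() {
//!             cur[j + 1] = if x == y {
//!                 prev[j] + 1
//!             } else {
//!                 prev[j + 1].max(cur[j])
//!             };
//!         }
//!         prev = cur;
//!     }
//!     Some(prev[b.len()])
//! }
//! ```
//!
//! Passing `None` as the deadline lets the computation run to completion, while
//! a deadline that has already passed makes the sketch give up on its first row.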
//!
//! # Feature Flags
//!
//! By default the crate does not have any dependencies. For some use cases,
//! however, it's useful to pull in extra functionality. Likewise you can turn
//! off some functionality.
//!
//! * `text`: this feature is enabled by default and enables the text based
//!   diffing types such as [`TextDiff`].
//!   If the crate is used without default features it's removed.
//! * `unicode`: when this feature is enabled the text diffing functionality
//!   gains the ability to diff on a grapheme instead of character level. This
//!   is particularly useful when working with text containing emojis. This
//!   pulls in some relatively complex dependencies for working with the Unicode
//!   database.
//! * `bytes`: this feature adds support for working with byte slices in text
//!   APIs in addition to Unicode strings. This pulls in the
//!   [`bstr`] dependency.
//! * `inline`: this feature gives access to additional functionality of the
//!   text diffing to provide inline information about which values changed
//!   in a line diff. This currently also enables the `unicode` feature.
//! * `serde`: this feature enables serialization for some types in this
//!   crate. For enums without payload, deserialization is then also supported.
#![warn(missing_docs)]
pub mod algorithms;
pub mod iter;
#[cfg(feature = "text")]
pub mod udiff;
#[cfg(feature = "text")]
pub mod utils;

mod common;
#[cfg(feature = "text")]
mod text;
mod types;

pub use self::common::*;
#[cfg(feature = "text")]
pub use self::text::*;
pub use self::types::*;