Diff is broken into four phases:

1. Building a list of things that have changed. These changes are called
   deltas (`git_diff_delta` objects) and are grouped into a `git_diff_list`.
2. Applying file similarity measurement for rename and copy detection (and
   to potentially split files that have changed radically). This step is
   optional.
3. Computing the textual diff for each delta. Not all deltas have a
   meaningful textual diff. For those that do, the textual diff can
   either be generated on the fly and passed to output callbacks or can be
   turned into a `git_diff_patch` object.
4. Formatting the diff and/or patch into standard text formats (such as
   patches, raw lists, etc.).

In the source code, step 1 is implemented in `src/diff.c`, step 2 in
`src/diff_tform.c`, step 3 in `src/diff_patch.c`, and step 4 in
`src/diff_print.c`. Additionally, when it comes to accessing file
content, everything goes through diff drivers that are implemented in
`src/diff_driver.c`.

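To make phase 1 more concrete, here is a minimal sketch of building a list
of deltas. It is a simplified model, not the code in `src/diff.c`: it
assumes both sides are already available as path-sorted arrays, and every
name in it (`sk_entry`, `sk_delta`, `sk_build_deltas`, the `SK_*` statuses)
is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not libgit2's definitions. */
typedef enum { SK_ADDED, SK_DELETED, SK_MODIFIED } sk_status;

typedef struct {
    const char *path;
    unsigned    id;     /* stand-in for a content hash */
} sk_entry;

typedef struct {
    sk_status   status;
    const char *path;
} sk_delta;

/*
 * Walk two path-sorted entry lists in lockstep and record one delta per
 * difference. Unchanged entries produce no delta. Returns the number of
 * deltas written to `out`, which must have room for old_n + new_n items.
 */
size_t sk_build_deltas(const sk_entry *old_e, size_t old_n,
                       const sk_entry *new_e, size_t new_n,
                       sk_delta *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);

    while (i < old_n || j < new_n) {
        int cmp;

        if (i == old_n)
            cmp = 1;    /* only new entries remain */
        else if (j == new_n)
            cmp = -1;   /* only old entries remain */
        else
            cmp = strcmp(old_e[i].path, new_e[j].path);

        if (cmp < 0) {
            /* path exists only on the old side */
            out[n].status = SK_DELETED;
            out[n++].path = old_e[i++].path;
        } else if (cmp > 0) {
            /* path exists only on the new side */
            out[n].status = SK_ADDED;
            out[n++].path = new_e[j++].path;
        } else {
            /* same path on both sides: a delta only if content differs */
            if (old_e[i].id != new_e[j].id) {
                out[n].status = SK_MODIFIED;
                out[n++].path = new_e[j].path;
            }
            i++;
            j++;
        }
    }
    return n;
}
```

Rename and copy detection (phase 2) would then operate on the resulting
list, for example by pairing a deleted entry with a similar added one.
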
External Objects
----------------

* `git_diff_options` represents user choices about how a diff should be
  performed and is passed to most diff generating functions.
* `git_diff_file` represents an item on one side of a possible delta.
* `git_diff_delta` represents a pair of items that have changed in some
  way. It contains two `git_diff_file` items plus a status and other
  metadata.
* `git_diff_list` is a list of deltas along with information about how
  those particular deltas were found.
* `git_diff_patch` represents the actual diff between a pair of items. In
  some cases a delta may not have a corresponding patch, for example if
  the objects are binary. The content of a patch is a set of hunks and
  lines.
* A `hunk` is a range of lines described by a `git_diff_range` (e.g. "lines
  10-20 in the old file became lines 12-23 in the new"). It has a header
  that compactly represents that information, and it has a number of
  lines of context surrounding the added and deleted lines.
* A `line` is simply a line of data along with a `git_diff_line_t` value
  that tells how the data should be interpreted (e.g. context or added).

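The hunk example above ("lines 10-20 in the old file became lines 12-23 in
the new") can be turned into a unified-diff style header with a few lines
of code. The sketch below uses a hypothetical `sk_range` type and helper
names of its own; the field layout is an assumption for illustration and
is not taken from the `git_diff_range` definition. It always prints both
line counts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical range type; field names are assumptions for illustration. */
typedef struct {
    int old_start, old_lines;
    int new_start, new_lines;
} sk_range;

/* Build a range from inclusive first/last line numbers on each side. */
sk_range sk_range_from_lines(int old_first, int old_last,
                             int new_first, int new_last)
{
    sk_range r;

    r.old_start = old_first;
    r.old_lines = old_last - old_first + 1;
    r.new_start = new_first;
    r.new_lines = new_last - new_first + 1;
    return r;
}

/*
 * Format "@@ -old_start,old_lines +new_start,new_lines @@" into buf.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if the buffer is too small.
 */
int sk_format_hunk_header(char *buf, size_t size, const sk_range *r)
{
    int len;

    assert(buf != NULL && r != NULL);

    len = snprintf(buf, size, "@@ -%d,%d +%d,%d @@",
                   r->old_start, r->old_lines,
                   r->new_start, r->new_lines);
    return (len < 0 || (size_t)len >= size) ? -1 : len;
}
```

With the numbers from the example, `sk_range_from_lines(10, 20, 12, 23)`
yields 11 old lines and 12 new lines, and the formatted header reads
`@@ -10,11 +12,12 @@`.
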
Internal Objects
----------------

* `git_diff_file_content` is an internal structure that represents the
  data on one side of an item to be diffed; it is an augmented
  `git_diff_file` with more flags and the actual file data.

    * it is created from a repository plus a) a `git_diff_file`, b) a `git_blob`,
      or c) raw data and size
    * there are three main operations on `git_diff_file_content`:

        * _initialization_ sets up the data structure and does what it can up to,
          but not including, loading and looking at the actual data
        * _loading_ loads the data, preprocesses it (i.e. applies filters) and
          potentially analyzes it (to decide if binary)
        * _free_ releases loaded data and frees any allocated memory

* The internal structure of a `git_diff_patch` stores the actual diff
  between a pair of `git_diff_file_content` items

    * it may be "unset" if the items are not diffable
    * it may be "empty" if the items are the same
    * otherwise it will consist of a set of hunks, each of which covers some
      number of lines of context, additions and deletions
    * a patch is created from two `git_diff_file_content` items
    * a patch is fully instantiated in three phases:

        * initial creation and initialization
        * loading of data and preliminary data examination
        * diffing of data and optional storage of diffs
    * (TBD) if a patch is asked to store the diffs and the size of the diff
      is significantly smaller than the raw data of the two sides, then the
      patch may be flattened using a pool of string data

* `git_diff_output` is an internal structure that represents an output
  target for a `git_diff_patch`
    * It consists of file, hunk, and line callbacks, plus a payload
    * There is a standard flattened output that can be used for plain text output
    * Typically we use a `git_xdiff_output` which drives the callbacks via the
      xdiff code taken from core Git.

* `git_diff_driver` is an internal structure that encapsulates the logic
  for a given type of file
    * a driver is looked up based on the name and mode of a file.
    * the driver can then be used to:
        * determine if a file is binary (by attributes, by `git_diff_options`
          settings, or by examining the content)
        * give you a function pointer that is used to evaluate function context
          for hunk headers
    * At some point, the logic for getting a filtered version of file content
      or calculating the OID of a file may be moved into the driver.
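
The "file, hunk, and line callbacks, plus a payload" shape of an output
target can be sketched as a struct of function pointers. This is an
illustration only: all `sk_*` names are hypothetical, the walk below
replays an already stored patch rather than being driven by xdiff, and
the convention that a non-zero return aborts the walk is an assumption of
this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical patch model; these are not libgit2's definitions. */
typedef struct {
    char        origin;     /* ' ' context, '+' added, '-' deleted */
    const char *text;
} sk_line;

typedef struct {
    const char    *header;
    const sk_line *lines;
    size_t         line_count;
} sk_hunk;

typedef struct {
    const char    *path;
    const sk_hunk *hunks;
    size_t         hunk_count;
} sk_patch;

/* Output target: three optional callbacks plus an opaque payload. */
typedef struct {
    int (*file_cb)(const sk_patch *patch, void *payload);
    int (*hunk_cb)(const sk_hunk *hunk, void *payload);
    int (*line_cb)(const sk_line *line, void *payload);
    void *payload;
} sk_output;

/* Replay a stored patch into an output target; stop on first non-zero. */
int sk_patch_emit(const sk_patch *patch, const sk_output *out)
{
    size_t h, l;
    int err;

    assert(patch != NULL && out != NULL);

    if (out->file_cb && (err = out->file_cb(patch, out->payload)) != 0)
        return err;

    for (h = 0; h < patch->hunk_count; h++) {
        const sk_hunk *hunk = &patch->hunks[h];

        if (out->hunk_cb && (err = out->hunk_cb(hunk, out->payload)) != 0)
            return err;

        for (l = 0; l < hunk->line_count; l++) {
            if (out->line_cb &&
                (err = out->line_cb(&hunk->lines[l], out->payload)) != 0)
                return err;
        }
    }
    return 0;
}

/* Example payload and line callback: count added and deleted lines. */
typedef struct {
    size_t added, deleted;
} sk_stats;

int sk_count_line(const sk_line *line, void *payload)
{
    sk_stats *stats = payload;

    if (line->origin == '+')
        stats->added++;
    else if (line->origin == '-')
        stats->deleted++;
    return 0;
}
```

A caller that only wants statistics sets `line_cb` to `sk_count_line`,
leaves the other callbacks `NULL`, and passes an `sk_stats` as the
payload. A text formatter would instead supply all three callbacks and
use the payload to hold its output buffer.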