Diff is broken into four phases:

1. Building a list of things that have changed. These changes are called
   deltas (git_diff_delta objects) and are grouped into a git_diff_list.
2. Applying file similarity measurement for rename and copy detection (and
   to potentially split files that have changed radically). This step is
   optional.
3. Computing the textual diff for each delta. Not all deltas have a
   meaningful textual diff. For those that do, the textual diff can
   either be generated on the fly and passed to output callbacks or can be
   turned into a git_diff_patch object.
4. Formatting the diff and/or patch into standard text formats (such as
   patches, raw lists, etc.).

In the source code, step 1 is implemented in `src/diff.c`, step 2 in
`src/diff_tform.c`, step 3 in `src/diff_patch.c`, and step 4 in
`src/diff_print.c`. Additionally, when it comes to accessing file
content, everything goes through diff drivers that are implemented in
`src/diff_driver.c`.

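As a rough illustration of phase 1, the sketch below builds a list of deltas by walking two path-sorted item lists in parallel. It is a simplified model under stated assumptions, not the code in `src/diff.c`: the `ex_entry`, `ex_delta`, and `ex_status` types and the `ex_build_deltas` function are hypothetical names invented for this example, and rename detection (phase 2) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; these are not the real libgit2 types. */
typedef enum { EX_ADDED, EX_DELETED, EX_MODIFIED } ex_status;

typedef struct {
    const char *path; /* lists must be sorted by path */
    const char *id;   /* content identifier, e.g. a hex object ID */
} ex_entry;

typedef struct {
    ex_status status;
    const char *old_path; /* NULL when the item was added */
    const char *new_path; /* NULL when the item was deleted */
} ex_delta;

/*
 * Walk two path-sorted lists in parallel and record one delta per
 * difference. Writes at most `cap` deltas to `out` and returns the total
 * number of deltas found (which may exceed `cap`).
 */
size_t ex_build_deltas(const ex_entry *old_items, size_t n_old,
                       const ex_entry *new_items, size_t n_new,
                       ex_delta *out, size_t cap)
{
    size_t i = 0, j = 0, count = 0;

    assert(out != NULL || cap == 0);

    while (i < n_old || j < n_new) {
        ex_delta d;
        int cmp;

        if (i == n_old)
            cmp = 1;  /* only new items remain */
        else if (j == n_new)
            cmp = -1; /* only old items remain */
        else
            cmp = strcmp(old_items[i].path, new_items[j].path);

        if (cmp < 0) {
            /* Path exists only on the old side. */
            d.status = EX_DELETED;
            d.old_path = old_items[i].path;
            d.new_path = NULL;
            i++;
        } else if (cmp > 0) {
            /* Path exists only on the new side. */
            d.status = EX_ADDED;
            d.old_path = NULL;
            d.new_path = new_items[j].path;
            j++;
        } else {
            /* Same path on both sides: a delta only if the content differs. */
            int same = strcmp(old_items[i].id, new_items[j].id) == 0;

            d.status = EX_MODIFIED;
            d.old_path = old_items[i].path;
            d.new_path = new_items[j].path;
            i++;
            j++;
            if (same)
                continue;
        }

        if (count < cap)
            out[count] = d;
        count++;
    }

    return count;
}
```

Comparing an old list of `a.c`, `b.c`, `c.c` against a new list in which `b.c` has a different ID, `c.c` is gone, and `d.c` is new yields three deltas: modified, deleted, and added, in that order.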
External Objects
----------------

* `git_diff_options` represents user choices about how a diff should be
  performed and is passed to most diff generating functions.
* `git_diff_file` represents an item on one side of a possible delta.
* `git_diff_delta` represents a pair of items that have changed in some
  way. It contains two `git_diff_file` items plus a status and other
  metadata.
* `git_diff_list` is a list of deltas along with information about how
  those particular deltas were found.
* `git_diff_patch` represents the actual diff between a pair of items. In
  some cases, a delta may not have a corresponding patch, if the objects
  are binary, for example. The content of a patch will be a set of hunks
  and lines.
* A `hunk` is a range of lines described by a `git_diff_range` (e.g. "lines
  10-20 in the old file became lines 12-23 in the new"). It will have a
  header that compactly represents that information, and it will have a
  number of lines of context surrounding added and deleted lines.
* A `line` is simply a line of data along with a `git_diff_line_t` value
  that tells how the data should be interpreted (e.g. context or added).

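To make the hunk and line descriptions above concrete, here is a minimal sketch of how a range can be rendered as a unified diff hunk header and how a line kind can be rendered as a one-character prefix. The `ex_range` and `ex_line_kind` types and both functions are hypothetical stand-ins written for this example; they are assumed to resemble, but are not, `git_diff_range` and `git_diff_line_t`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the range that a hunk covers. */
typedef struct {
    int old_start, old_lines; /* first line and line count in the old file */
    int new_start, new_lines; /* first line and line count in the new file */
} ex_range;

/* Hypothetical line kinds, valued as the usual unified diff prefixes. */
typedef enum {
    EX_LINE_CONTEXT = ' ',
    EX_LINE_ADDED   = '+',
    EX_LINE_DELETED = '-'
} ex_line_kind;

/* Render the compact header for a hunk; returns what snprintf returns. */
int ex_format_hunk_header(char *buf, size_t size, const ex_range *r)
{
    assert(buf != NULL && r != NULL);
    return snprintf(buf, size, "@@ -%d,%d +%d,%d @@",
                    r->old_start, r->old_lines, r->new_start, r->new_lines);
}

/* Render one line of a hunk, prefixed according to how it is interpreted. */
int ex_format_line(char *buf, size_t size, ex_line_kind kind, const char *text)
{
    assert(buf != NULL && text != NULL);
    return snprintf(buf, size, "%c%s", (char)kind, text);
}
```

For the example in the list, old lines 10-20 span 11 lines and new lines 12-23 span 12 lines, so the range `{10, 11, 12, 12}` formats as `@@ -10,11 +12,12 @@`.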
Internal Objects
----------------

* `git_diff_file_content` is an internal structure that represents the
  data on one side of an item to be diffed; it is an augmented
  `git_diff_file` with more flags and the actual file data.

    * it is created from a repository plus a) a git_diff_file, b) a git_blob,
      or c) raw data and size
    * there are three main operations on git_diff_file_content:

        * _initialization_ sets up the data structure and does what it can up to,
          but not including loading and looking at the actual data
        * _loading_ loads the data, preprocesses it (i.e. applies filters) and
          potentially analyzes it (to decide if binary)
        * _free_ releases loaded data and frees any allocated memory

* The internal structure of a `git_diff_patch` stores the actual diff
  between a pair of `git_diff_file_content` items

    * it may be "unset" if the items are not diffable
    * "empty" if the items are the same
    * otherwise it will consist of a set of hunks each of which covers some
      number of lines of context, additions and deletions
    * a patch is created from two git_diff_file_content items
    * a patch is fully instantiated in three phases:

        * initial creation and initialization
        * loading of data and preliminary data examination
        * diffing of data and optional storage of diffs
    * (TBD) if a patch is asked to store the diffs and the size of the diff
      is significantly smaller than the raw data of the two sides, then the
      patch may be flattened using a pool of string data

* `git_diff_output` is an internal structure that represents an output
  target for a `git_diff_patch`
    * It consists of file, hunk, and line callbacks, plus a payload
    * There is a standard flattened output that can be used for plain text output
    * Typically we use a `git_xdiff_output` which drives the callbacks via the
      xdiff code taken from core Git.

* `git_diff_driver` is an internal structure that encapsulates the logic
  for a given type of file
    * a driver is looked up based on the name and mode of a file.
    * the driver can then be used to:
        * determine if a file is binary (by attributes, by git_diff_options
          settings, or by examining the content)
        * give you a function pointer that is used to evaluate function context
          for hunk headers
    * At some point, the logic for getting a filtered version of file content
      or calculating the OID of a file may be moved into the driver.
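
The three ways a driver can decide whether a file is binary could be layered roughly as in the sketch below. Everything here is an assumption made for illustration: the `ex_attr` type, the `force_text` parameter (standing in for a `git_diff_options` setting), the order in which the three sources are consulted, the 8000-byte scan window, and the NUL-byte heuristic itself. It is not taken from `src/diff_driver.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed scan window for this sketch; real code picks its own limit. */
#define EX_BINARY_SCAN_LIMIT 8000

/* Hypothetical result of looking up attributes for a file. */
typedef enum { EX_ATTR_UNSPECIFIED, EX_ATTR_TEXT, EX_ATTR_BINARY } ex_attr;

/* Content check: treat data as binary if a NUL byte appears early on. */
bool ex_content_looks_binary(const char *data, size_t len)
{
    size_t scan = len < EX_BINARY_SCAN_LIMIT ? len : EX_BINARY_SCAN_LIMIT;

    return scan > 0 && memchr(data, '\0', scan) != NULL;
}

/*
 * Decide binary-ness in three steps (the ordering is an assumption):
 * an explicit attribute wins, then a caller option that forces text,
 * and only then is the content itself examined.
 */
bool ex_is_binary(ex_attr attr, bool force_text, const char *data, size_t len)
{
    assert(data != NULL || len == 0);

    if (attr == EX_ATTR_BINARY)
        return true;
    if (attr == EX_ATTR_TEXT || force_text)
        return false;
    return ex_content_looks_binary(data, len);
}
```

Consulting the content last means the data only needs to be examined when neither the attributes nor the options settle the question.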