]>
Commit | Line | Data |
---|---|---|
ec1ae7e6 TL |
1 | .. _tech_design_overview: |
2 | ||
1531185d DC |
3 | Technical Overview |
4 | ================== | |
5 | ||
1531185d DC |
6 | Datastores |
7 | ---------- | |
8 | ||
3253d8a2 | 9 | A Datastore is the logical place where :ref:`Backup Snapshots |
a98e2287 | 10 | <term_backup_snapshot>` and their chunks are stored. Snapshots consist of a |
7ccbce03 | 11 | manifest, blobs, and dynamic- and fixed-indexes (see :ref:`terms`), and are |
3253d8a2 | 12 | stored in the following directory structure: |
1531185d DC |
13 | |
14 | <datastore-root>/<type>/<id>/<time>/ | |
15 | ||
16 | The deduplication of datastores is based on reusing chunks, which are | |
17 | referenced by the indexes in a backup snapshot. This means that multiple | |
3253d8a2 TL |
18 | indexes can reference the same chunks, reducing the amount of space needed to |
19 | contain the data (even across backup snapshots). | |
1531185d | 20 | |
d4827535 DC |
21 | Snapshots |
22 | --------- | |
23 | ||
24 | A Snapshot is the collection of manifest, blobs and indexes that represent | |
25 | a backup. When a client creates a snapshot, it can upload blobs (single files | |
26 | which are not chunked, e.g. the client log), or one or more indexes | |
27 | (fixed or dynamic). | |
28 | ||
29 | When uploading an index, the client first has to read the source data, chunk it | |
30 | and send the data as chunks with their identifying checksum to the server. | |
c51f0d5e CE |
31 | When using the :ref:`change detection mode <change_detection_mode>` payload |
32 | chunks for unchanged files are reused from the previous snapshot, thereby not | |
33 | reading the source data again. | |
d4827535 DC |
34 | |
35 | If there is a previous Snapshot in the backup group, the client can first | |
36 | download the chunk list of the previous Snapshot. If it detects a chunk that | |
37 | already exists on the server, it can send only the checksum instead of data | |
38 | and checksum. This way the actual upload of Snapshots is incremental while | |
39 | each Snapshot references all chunks and is thus a full backup. | |
40 | ||
41 | After uploading all data, the client has to signal to the server that the | |
42 | backup is finished. If that is not done before the connection closes, the | |
43 | server will remove the unfinished snapshot. | |
44 | ||
1531185d DC |
45 | Chunks |
46 | ------ | |
47 | ||
3253d8a2 TL |
48 | A chunk is some (possibly encrypted) data with a CRC-32 checksum at the end and |
49 | a type marker at the beginning. It is identified by the SHA-256 checksum of its | |
50 | content. | |
1531185d DC |
51 | |
52 | To generate such chunks, backup data is split either into fixed-size or | |
3253d8a2 | 53 | dynamically sized chunks. The same content will be hashed to the same checksum. |
1531185d DC |
54 | |
55 | The chunks of a datastore are found in | |
56 | ||
57 | <datastore-root>/.chunks/ | |
58 | ||
7ccbce03 DW |
59 | This chunk directory is further subdivided by the first four bytes of the |
60 | chunk's checksum, so a chunk with the checksum | |
1531185d DC |
61 | |
62 | a342e8151cbf439ce65f3df696b54c67a114982cc0aa751f2852c2f7acc19a8b | |
63 | ||
64 | lives in | |
65 | ||
66 | <datastore-root>/.chunks/a342/ | |
67 | ||
3253d8a2 TL |
68 | This is done to reduce the number of files per directory, as having many files |
69 | per directory can be bad for file system performance. | |
1531185d | 70 | |
3253d8a2 TL |
71 | These chunk directories ('0000'-'ffff') will be preallocated when a datastore |
72 | is created. | |
1531185d | 73 | |
7ccbce03 | 74 | Fixed-Sized Chunks |
1531185d DC |
75 | ^^^^^^^^^^^^^^^^^^ |
76 | ||
77 | For block based backups (like VMs), fixed-sized chunks are used. The content | |
78 | (disk image), is split into chunks of the same length (typically 4 MiB). | |
79 | ||
3253d8a2 TL |
80 | This works very well for VM images, since the file system on the guest most |
81 | often tries to allocate files in contiguous pieces, so new files get new | |
82 | blocks, and changing existing files changes only their own blocks. | |
1531185d | 83 | |
3253d8a2 | 84 | As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps', which |
7ccbce03 | 85 | can track the changed blocks of an image. Since these bitmaps are also a |
efc09f63 | 86 | representation of the image split into chunks, there is a direct relation |
7ccbce03 DW |
87 | between the dirty blocks of the image and chunks which need to be uploaded. |
88 | Thus, only modified chunks of the disk need to be uploaded to a backup. | |
1531185d | 89 | |
efc09f63 | 90 | Since the image is always split into chunks of the same size, unchanged blocks |
3253d8a2 TL |
91 | will result in identical checksums for those chunks, so such chunks do not need |
92 | to be backed up again. This way storage snapshots are not needed to find the | |
93 | changed blocks. | |
1531185d DC |
94 | |
95 | For consistency, `Proxmox VE`_ uses a QEMU internal snapshot mechanism, that | |
96 | does not rely on storage snapshots either. | |
97 | ||
7ccbce03 | 98 | Dynamically Sized Chunks |
1531185d DC |
99 | ^^^^^^^^^^^^^^^^^^^^^^^^ |
100 | ||
7ccbce03 DW |
101 | When working with file-based systems rather than block-based systems, |
102 | using fixed-sized chunks is not a good idea, since every time a file | |
103 | would change in size, the remaining data would be shifted around, | |
104 | resulting in many chunks changing and the amount of deduplication being reduced. | |
1531185d DC |
105 | |
106 | To improve this, `Proxmox Backup`_ Server uses dynamically sized chunks | |
3253d8a2 TL |
107 | instead. Instead of splitting an image into fixed sizes, it first generates a |
108 | consistent file archive (:ref:`pxar <pxar-format>`) and uses a rolling hash | |
1531185d DC |
109 | over this on-the-fly generated archive to calculate chunk boundaries. |
110 | ||
7ccbce03 | 111 | We use a variant of Buzhash which is a cyclic polynomial algorithm. It works |
3253d8a2 | 112 | by continuously calculating a checksum while iterating over the data, and on |
7ccbce03 | 113 | certain conditions, it triggers a hash boundary. |
1531185d | 114 | |
7ccbce03 | 115 | Assuming that most files on the system that is to be backed up have not |
3253d8a2 TL |
116 | changed, eventually the algorithm triggers the boundary on the same data as a |
117 | previous backup, resulting in chunks that can be reused. | |
1531185d DC |
118 | |
119 | Encrypted Chunks | |
120 | ^^^^^^^^^^^^^^^^ | |
121 | ||
3253d8a2 TL |
122 | Encrypted chunks are a special case. Both fixed- and dynamically sized chunks |
123 | can be encrypted, and they are handled in a slightly different manner than | |
124 | normal chunks. | |
1531185d DC |
125 | |
126 | The hashes of encrypted chunks are calculated not with the actual (encrypted) | |
7ccbce03 DW |
127 | chunk content, but with the plain-text content, concatenated with the encryption |
128 | key. This way, two chunks with the same data but encrypted with different keys | |
3253d8a2 TL |
129 | generate two different checksums and no collisions occur for multiple |
130 | encryption keys. | |
1531185d | 131 | |
3253d8a2 TL |
132 | This is done to speed up the client part of the backup, since it only needs to |
133 | encrypt chunks that are actually getting uploaded. Chunks that exist already in | |
134 | the previous backup, do not need to be encrypted and uploaded. | |
1531185d DC |
135 | |
136 | Caveats and Limitations | |
137 | ----------------------- | |
138 | ||
7ccbce03 | 139 | Notes on Hash Collisions |
1531185d DC |
140 | ^^^^^^^^^^^^^^^^^^^^^^^^ |
141 | ||
3253d8a2 TL |
142 | Every hashing algorithm has a chance to produce collisions, meaning two (or |
143 | more) inputs generate the same checksum. For SHA-256, this chance is | |
7ccbce03 DW |
144 | negligible. To calculate the chances of such a collision, one can use the ideas |
145 | of the 'birthday problem' from probability theory. For big numbers, this is | |
146 | actually unfeasible to calculate with regular computers, but there is a good | |
3253d8a2 | 147 | approximation: |
1531185d DC |
148 | |
149 | .. math:: | |
150 | ||
151 | p(n, d) = 1 - e^{-n^2/(2d)} | |
152 | ||
efc09f63 | 153 | Where `n` is the number of tries, and `d` is the number of possibilities. |
7ccbce03 | 154 | For a concrete example, lets assume a large datastore of 1 PiB and an average |
efc09f63 TL |
155 | chunk size of 4 MiB. That means :math:`n = 268435456` tries, and :math:`d = |
156 | 2^{256}` possibilities. Inserting those values in the formula from earlier you | |
157 | will see that the probability of a collision in that scenario is: | |
1531185d DC |
158 | |
159 | .. math:: | |
160 | ||
161 | 3.1115 * 10^{-61} | |
162 | ||
7ccbce03 DW |
163 | For context, in a lottery game of guessing 6 numbers out of 45, the chance to |
164 | correctly guess all 6 numbers is only :math:`1.2277 * 10^{-7}`. This means the | |
165 | chance of a collision is about the same as winning 13 such lottery games *in a | |
166 | row*. | |
1531185d | 167 | |
efc09f63 TL |
168 | In conclusion, it is extremely unlikely that such a collision would occur by |
169 | accident in a normal datastore. | |
1531185d | 170 | |
3253d8a2 | 171 | Additionally, SHA-256 is prone to length extension attacks, but since there is |
7ccbce03 | 172 | an upper limit for how big the chunks are, this is not a problem, because a |
3253d8a2 TL |
173 | potential attacker cannot arbitrarily add content to the data beyond that |
174 | limit. | |
1531185d | 175 | |
7ccbce03 | 176 | File-Based Backup |
1531185d DC |
177 | ^^^^^^^^^^^^^^^^^ |
178 | ||
179 | Since dynamically sized chunks (for file-based backups) are created on a custom | |
180 | archive format (pxar) and not over the files directly, there is no relation | |
7ccbce03 | 181 | between the files and chunks. This means that the Proxmox Backup Client has to |
efc09f63 | 182 | read all files again for every backup, otherwise it would not be possible to |
7ccbce03 DW |
183 | generate a consistent, independent pxar archive where the original chunks can be |
184 | reused. Note that in spite of this, only new or changed chunks will be uploaded. | |
1531185d | 185 | |
7ccbce03 | 186 | Verification of Encrypted Chunks |
1531185d DC |
187 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
188 | ||
3253d8a2 | 189 | For encrypted chunks, only the checksum of the original (plaintext) data is |
7ccbce03 | 190 | available, making it impossible for the server (without the encryption key) to |
3253d8a2 | 191 | verify its content against it. Instead only the CRC-32 checksum gets checked. |
5a839306 HL |
192 | |
193 | Troubleshooting | |
194 | --------------- | |
195 | ||
7ccbce03 DW |
196 | Index files(*.fidx*, *.didx*) contain information about how to rebuild a file. |
197 | More precisely, they contain an ordered list of references to the chunks that | |
198 | the original file was split into. If there is something wrong with a snapshot, | |
199 | it might be useful to find out which chunks are referenced in it, and check | |
8c83b75a | 200 | whether they are present and intact. The ``proxmox-backup-debug`` command-line |
7ccbce03 DW |
201 | tool can be used to inspect such files and recover their contents. For example, |
202 | to get a list of the referenced chunks of a *.fidx* index: | |
5a839306 HL |
203 | |
204 | .. code-block:: console | |
205 | ||
206 | # proxmox-backup-debug inspect file drive-scsi0.img.fidx | |
207 | ||
7ccbce03 DW |
208 | The same command can be used to inspect *.blob* files. Without the ``--decode`` |
209 | parameter, just the size and the encryption type, if any, are printed. If | |
210 | ``--decode`` is set, the blob file is decoded into the specified file ('-' will | |
211 | decode it directly to stdout). | |
212 | ||
213 | The following example would print the decoded contents of | |
214 | `qemu-server.conf.blob`. If the file you're trying to inspect is encrypted, a | |
215 | path to the key file must be provided using ``--keyfile``. | |
5a839306 HL |
216 | |
217 | .. code-block:: console | |
218 | ||
219 | # proxmox-backup-debug inspect file qemu-server.conf.blob --decode - | |
220 | ||
7ccbce03 | 221 | You can also check in which index files a specific chunk file is referenced |
5a839306 HL |
222 | with: |
223 | ||
224 | .. code-block:: console | |
225 | ||
a2945884 | 226 | # proxmox-backup-debug inspect chunk b531d3ffc9bd7c65748a61198c060678326a431db7eded874c327b7986e595e0 --reference-filter /path/in/a/datastore/directory |
5a839306 | 227 | |
7ccbce03 | 228 | Here ``--reference-filter`` specifies where index files should be searched. This |
a2945884 | 229 | can be an arbitrary path. If, for some reason, the filename of the chunk was |
7ccbce03 DW |
230 | changed, you can explicitly specify the digest using ``--digest``. By default, the |
231 | chunk filename is used as the digest to look for. If no ``--reference-filter`` | |
232 | is specified, it will only print the CRC and encryption status of the chunk. You | |
233 | can also decode chunks, by setting the ``--decode`` flag. If the chunk is | |
234 | encrypted, a ``--keyfile`` must be provided, in order to decode it. | |
5a839306 | 235 | |
7ccbce03 DW |
236 | Restore without a Running Proxmox Backup Server |
237 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
5a839306 | 238 | |
7ccbce03 | 239 | It's possible to restore specific files from a snapshot, without a running |
34407477 AZ |
240 | `Proxmox Backup`_ Server instance, using the ``recover`` subcommand, provided |
241 | you have access to the intact index and chunk files. Note that you also need the | |
7ccbce03 | 242 | corresponding key file if the backup was encrypted. |
5a839306 HL |
243 | |
244 | .. code-block:: console | |
245 | ||
a2945884 | 246 | # proxmox-backup-debug recover index drive-scsi0.img.fidx /path/to/.chunks |
5a839306 | 247 | |
7ccbce03 DW |
248 | In the above example, the `/path/to/.chunks` argument is the path to the |
249 | directory that contains the chunks, and `drive-scsi0.img.fidx` is the index file | |
250 | of the file you'd like to restore. Both paths can be absolute or relative. With | |
251 | ``--skip-crc``, it's possible to disable the CRC checks of the chunks. This | |
252 | will speed up the process slightly and allow for trying to restore (partially) | |
a2945884 TL |
253 | corrupt chunks. It's recommended to always try without the skip-CRC option |
254 | first. | |
5a839306 | 255 |