.. _tech_design_overview:

Technical Overview
==================

Datastores
----------

A Datastore is the logical place where :ref:`Backup Snapshots
<term_backup_snapshot>` and their chunks are stored. Snapshots consist of a
manifest, blobs, dynamic- and fixed-indexes (see :ref:`terms`), and are
stored in the following directory structure:

 <datastore-root>/<type>/<id>/<time>/

The deduplication of datastores is based on reusing chunks, which are
referenced by the indexes in a backup snapshot. This means that multiple
indexes can reference the same chunks, reducing the amount of space needed to
contain the data (even across backup snapshots).

Chunks
------

A chunk is some (possibly encrypted) data with a CRC-32 checksum at the end and
a type marker at the beginning. It is identified by the SHA-256 checksum of its
content.

To generate such chunks, backup data is split either into fixed-size or
dynamically sized chunks. The same content will be hashed to the same checksum.

The chunks of a datastore are found in

 <datastore-root>/.chunks/

This chunk directory is further subdivided by the first four hexadecimal
digits (two bytes) of the chunk's checksum, so the chunk with the checksum

 a342e8151cbf439ce65f3df696b54c67a114982cc0aa751f2852c2f7acc19a8b

lives in

 <datastore-root>/.chunks/a342/

This is done to reduce the number of files per directory, as having many files
per directory can be bad for file system performance.

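The path of a chunk can thus be derived from its digest alone. A minimal
sketch in Python (the helper name ``chunk_path`` is illustrative, not part of
any Proxmox API):

.. code-block:: python

   import hashlib

   def chunk_path(datastore_root: str, data: bytes) -> str:
       # A chunk is identified by the SHA-256 digest of its content ...
       digest = hashlib.sha256(data).hexdigest()
       # ... and the first four hex digits select its subdirectory.
       return f"{datastore_root}/.chunks/{digest[:4]}/{digest}"

   print(chunk_path("/datastore", b"example chunk content"))
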
These chunk directories ('0000'-'ffff') will be preallocated when a datastore
is created.

Fixed-sized Chunks
^^^^^^^^^^^^^^^^^^

For block based backups (like VMs), fixed-sized chunks are used. The content
(disk image) is split into chunks of the same length (typically 4 MiB).

This works very well for VM images, since the file system on the guest most
often tries to allocate files in contiguous pieces, so new files get new
blocks, and changing existing files changes only their own blocks.

As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps', which
can track the changed blocks of an image. Since these bitmaps are also a
representation of the image split into chunks, there is a direct relation
between the dirty blocks of the image and the chunks which need to be uploaded,
so only modified chunks of the disk have to be uploaded for a backup.

Since the image is always split into chunks of the same size, unchanged blocks
will result in identical checksums for those chunks, so such chunks do not need
to be backed up again. This way, storage snapshots are not needed to find the
changed blocks.

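The effect can be illustrated in a few lines of Python: splitting an image
into equal-sized pieces maps unchanged blocks to identical digests, so a
single-byte change touches exactly one chunk (a simplified sketch; names are
illustrative):

.. code-block:: python

   import hashlib

   CHUNK_SIZE = 4 * 1024 * 1024  # typical fixed chunk size: 4 MiB

   def fixed_chunk_digests(image: bytes, chunk_size: int = CHUNK_SIZE):
       # Split the image into equal-sized pieces and hash each piece.
       return [hashlib.sha256(image[i:i + chunk_size]).hexdigest()
               for i in range(0, len(image), chunk_size)]

   old = bytes(8 * 1024 * 1024)  # 8 MiB image -> two 4 MiB chunks
   new = bytearray(old)
   new[0] = 1                    # modify a single byte in the first block

   changed = [i for i, (a, b)
              in enumerate(zip(fixed_chunk_digests(old),
                               fixed_chunk_digests(bytes(new))))
              if a != b]         # only the first chunk's digest differs

Note also that the two all-zero chunks of ``old`` hash to the same digest, so
only one copy of that chunk would be stored.
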
For consistency, `Proxmox VE`_ uses a QEMU internal snapshot mechanism that
does not rely on storage snapshots either.

Dynamically sized Chunks
^^^^^^^^^^^^^^^^^^^^^^^^

If one does not want to back up block-based systems but rather file-based
systems, using fixed-sized chunks is not a good idea, since every time a file
changes in size, the remaining data gets shifted around, and this would result
in many chunks changing, reducing the effect of deduplication.

To improve this, `Proxmox Backup`_ Server uses dynamically sized chunks
instead. Instead of splitting an image into fixed sizes, it first generates a
consistent file archive (:ref:`pxar <pxar-format>`) and uses a rolling hash
over this on-the-fly generated archive to calculate chunk boundaries.

We use a variant of Buzhash, which is a cyclic polynomial algorithm. It works
by continuously calculating a checksum while iterating over the data, and when
certain conditions are met, it triggers a chunk boundary.

Assuming that most files of the system that is to be backed up have not
changed, eventually the algorithm triggers the boundary on the same data as a
previous backup, resulting in chunks that can be reused.

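The idea can be sketched in Python. The following is a toy cyclic-polynomial
(Buzhash-style) chunker, not the actual implementation or parameters used by
Proxmox Backup Server: it keeps a rolling hash over a small sliding window and
cuts a chunk wherever the low bits of the hash are zero, so boundaries depend
only on nearby content:

.. code-block:: python

   import random

   WINDOW = 48           # sliding window size in bytes (illustrative)
   MASK = (1 << 12) - 1  # cut where the low 12 bits are zero (~4 KiB average)

   # Cyclic polynomial hashing needs a fixed table of random per-byte values.
   _rng = random.Random(0)
   TABLE = [_rng.getrandbits(32) for _ in range(256)]

   def rotl(v: int, n: int, bits: int = 32) -> int:
       n %= bits
       return ((v << n) | (v >> (bits - n))) & ((1 << bits) - 1)

   def buzhash(window: bytes) -> int:
       # Direct (non-rolling) hash of one window, for reference.
       h = 0
       for b in window:
           h = rotl(h, 1) ^ TABLE[b]
       return h

   def rolling_hashes(data: bytes):
       # Rolling variant: add the incoming byte, remove the outgoing one.
       h, out = 0, []
       for i, b in enumerate(data):
           h = rotl(h, 1) ^ TABLE[b]
           if i >= WINDOW:
               h ^= rotl(TABLE[data[i - WINDOW]], WINDOW)
           out.append(h)
       return out

   def chunk_boundaries(data: bytes):
       # End offsets of the chunks; real chunkers additionally enforce
       # minimum and maximum chunk sizes.
       cuts = [i + 1 for i, h in enumerate(rolling_hashes(data))
               if h & MASK == 0]
       if not cuts or cuts[-1] != len(data):
           cuts.append(len(data))
       return cuts

Because the hash at any position depends only on the last ``WINDOW`` bytes,
inserting data near the start of the archive shifts only the first few
boundaries; later ones re-align, and the chunks between them can be reused.
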
Encrypted Chunks
^^^^^^^^^^^^^^^^

Encrypted chunks are a special case. Both fixed- and dynamically sized chunks
can be encrypted, and they are handled in a slightly different manner than
normal chunks.

The hashes of encrypted chunks are calculated not with the actual (encrypted)
chunk content, but with the plaintext content concatenated with the encryption
key. This way, two chunks of the same data encrypted with different keys
generate two different checksums, and no collisions occur for multiple
encryption keys.

This is done to speed up the client part of the backup, since it only needs to
encrypt chunks that are actually getting uploaded. Chunks that already exist
in the previous backup do not need to be encrypted and uploaded.

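This property can be modeled in a few lines of Python. This is only an
illustration of "digest over plaintext plus key"; the exact keyed-digest
construction used by Proxmox Backup Server may differ:

.. code-block:: python

   import hashlib

   def encrypted_chunk_id(plaintext: bytes, key: bytes) -> str:
       # The server-visible chunk ID covers plaintext and key, so equal
       # content under different keys yields different, unlinkable IDs.
       return hashlib.sha256(plaintext + key).hexdigest()

   content = b"same chunk content"
   id_a = encrypted_chunk_id(content, b"key-a")
   id_b = encrypted_chunk_id(content, b"key-b")

Because the ID is computable from the plaintext alone, the client can look a
chunk up in the previous backup's index and skip both encryption and upload
when it is already present.
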
Caveats and Limitations
-----------------------

Notes on hash collisions
^^^^^^^^^^^^^^^^^^^^^^^^

Every hashing algorithm has a chance to produce collisions, meaning two (or
more) inputs generate the same checksum. For SHA-256, this chance is
negligible. To estimate the probability of such a collision, one can use the
ideas of the 'birthday problem' from probability theory. For big numbers, the
exact probability is infeasible to calculate with regular computers, but there
is a good approximation:

.. math::

  p(n, d) = 1 - e^{-n^2/(2d)}

Where `n` is the number of tries, and `d` is the number of possibilities.
For a concrete example, let's assume a large datastore of 1 PiB and an average
chunk size of 4 MiB. That means :math:`n = 268435456` tries, and :math:`d =
2^{256}` possibilities. Inserting those values into the formula above shows
that the probability of a collision in that scenario is:

.. math::

  3.1115 * 10^{-61}

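The numbers above can be checked with a few lines of Python (using
``math.expm1``, since for such a tiny exponent :math:`1 - e^{-x}` reduces to
the first-order term :math:`x`):

.. code-block:: python

   import math

   n = (2**50) // (4 * 2**20)  # 1 PiB in 4 MiB chunks -> 268435456 tries
   d = 2**256                  # number of possible SHA-256 digests

   # p(n, d) = 1 - e^(-n^2 / 2d); expm1 keeps precision for tiny exponents.
   p = -math.expm1(-n**2 / (2 * d))  # p = 3.1115e-61, the figure above

   # How many 6-out-of-45 lottery wins in a row are equally unlikely?
   lotto = 1 / math.comb(45, 6)      # 1.2277e-7
   games = math.log(p) / math.log(lotto)

Here ``games`` evaluates to roughly 8.8, i.e. a collision is about as likely
as winning such a lottery almost nine times in a row.
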
For context, in a lottery game of guessing 6 out of 45, the chance to correctly
guess all 6 numbers is only :math:`1.2277 * 10^{-7}`; the chance of such a
collision is therefore about the same as winning almost nine such lottery games
*in a row*.

In conclusion, it is extremely unlikely that such a collision would occur by
accident in a normal datastore.

Additionally, SHA-256 is vulnerable to length extension attacks, but since
there is an upper limit on how big chunks can get, this is not a problem: a
potential attacker cannot arbitrarily add content to the data beyond that
limit.

File-based Backup
^^^^^^^^^^^^^^^^^

Since dynamically sized chunks (for file-based backups) are created over a
custom archive format (pxar) and not over the files directly, there is no
relation between files and chunks. This means that the Proxmox Backup client
has to read all files again for every backup, as otherwise it would not be
possible to generate a consistent, independent pxar archive where the original
chunks can be reused. Note that even then, only new or changed chunks will be
uploaded.

Verification of encrypted chunks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For encrypted chunks, only the checksum of the original (plaintext) data is
available, making it impossible for the server (without the encryption key) to
verify its content against it. Instead, only the CRC-32 checksum gets checked.
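
A sketch of what the server can still check (``zlib.crc32`` stands in for the
actual CRC-32 implementation; the function name is illustrative):

.. code-block:: python

   import zlib

   def verify_chunk_crc(stored_payload: bytes, stored_crc: int) -> bool:
       # Without the encryption key, the server cannot recompute the
       # plaintext digest; it can only validate the CRC-32 of the chunk
       # data as stored, which detects accidental corruption.
       return zlib.crc32(stored_payload) == stored_crc

   payload = b"\x01encrypted chunk bytes"
   crc = zlib.crc32(payload)

CRC-32 only guards against accidental corruption; full verification of an
encrypted chunk's content requires a client in possession of the key.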