//! This module implements the data storage and access layer.
//!
//! # Data formats
//!
//! PBS splits large files into chunks, and stores them deduplicated using
//! a content addressable storage format.
//!
//! Backup snapshots are stored as folders containing a manifest file and
//! potentially one or more index or blob files.
//!
//! The manifest contains hashes of all other files and can be signed by
//! the client.
//!
//! Blob files contain data directly. They are used for config files and
//! the like.
//!
//! Index files are used to reconstruct an original file. They contain a
//! list of SHA256 checksums. The `DynamicIndex*` format is able to deal
//! with dynamic chunk sizes (CT and host backups), whereas the
//! `FixedIndex*` format is an optimization to store a list of equal sized
//! chunks (VMs, whole block devices).
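//!
//! Because all chunks in a fixed index have the same size, the chunk
//! covering a given byte offset can be computed directly. A minimal
//! sketch (the function name is illustrative, not the actual
//! `FixedIndex*` API):
//!
//! ```ignore
//! /// Map a byte offset in the original file to (chunk number, offset
//! /// within that chunk) for an index of equal sized chunks.
//! fn locate_chunk(offset: u64, chunk_size: u64) -> (u64, u64) {
//!     (offset / chunk_size, offset % chunk_size)
//! }
//! ```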
//!
//! A chunk is defined as a binary blob, which is stored inside a
//! [ChunkStore](struct.ChunkStore.html) instead of the backup directory
//! directly, and can be addressed by its SHA256 digest.
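//!
//! As an illustration, the content address of a chunk is simply the
//! SHA256 digest of its data. A sketch using the `openssl` and `hex`
//! crates (the helper name is hypothetical):
//!
//! ```ignore
//! /// Compute the content address (hex encoded SHA256 digest) of a chunk.
//! fn chunk_digest_hex(chunk_data: &[u8]) -> String {
//!     let digest: [u8; 32] = openssl::sha::sha256(chunk_data);
//!     hex::encode(digest)
//! }
//! ```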
//!
//!
//! # Garbage Collection (GC)
//!
//! Deleting backups is as easy as deleting the corresponding .idx files.
//! However, this does not free up any storage, because those files just
//! contain references to chunks.
//!
//! To free up some storage, we run a garbage collection process at
//! regular intervals. The collector uses a mark and sweep approach. In
//! the first phase, it scans all .idx files to mark used chunks. The
//! second phase then removes all unmarked chunks from the store.
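//!
//! In pseudocode, the two phases look roughly like this (a sketch; the
//! `list_index_files`, `mark_chunk` and `sweep_unmarked` helpers are
//! hypothetical stand-ins for the real implementation):
//!
//! ```ignore
//! // Phase 1: mark every chunk referenced by any index file.
//! for index in list_index_files(datastore)? {
//!     for digest in index.chunk_digests() {
//!         mark_chunk(digest)?; // e.g. update the chunk file's atime
//!     }
//! }
//! // Phase 2: remove chunks that were not marked.
//! sweep_unmarked(chunk_store)?;
//! ```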
//!
//! The locking mechanisms mentioned below make sure that we are the only
//! process running GC. We still want to be able to create backups during
//! GC, so there may be multiple backup threads/tasks running, either
//! started before GC, or while GC is running.
//!
//! ## `atime` based GC
//!
//! The idea here is to mark chunks by updating the `atime` (access
//! timestamp) on the chunk file. This is quite simple and does not need
//! additional RAM.
//!
//! One minor problem is that recent Linux versions use the `relatime`
//! mount flag by default for performance reasons (and we want that). When
//! enabled, `atime` data is written to the disk only if the file has been
//! modified since the `atime` data was last updated (`mtime`), or if the
//! file was last accessed more than a certain amount of time ago (by
//! default 24h). So we may only delete chunks with an `atime` older than
//! 24 hours.
//!
//! Another problem arises from running backups. The mark phase does not
//! find any chunks from those backups, because the .idx file is only
//! created after the backup finishes. Chunks created or touched by those
//! backups may have an `atime` as old as the start time of those backups.
//! Please note that the backup start time may predate the GC start time.
//! So we may only delete chunks older than the start time of those
//! running backup jobs, which might be more than 24h back (this is the
//! reason why ProcessLocker exclusive locks only have to be exclusive
//! between processes, since within one process we can determine the age
//! of the oldest shared lock).
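//!
//! Putting both constraints together, the sweep phase may only delete
//! chunks whose `atime` is older than both limits. A sketch (the
//! `oldest_writer_start` parameter is a hypothetical stand-in for asking
//! the ProcessLocker for the start time of the oldest shared lock):
//!
//! ```ignore
//! use std::time::{Duration, SystemTime};
//!
//! /// Chunks accessed after this point in time must not be deleted.
//! fn gc_cutoff(oldest_writer_start: Option<SystemTime>) -> SystemTime {
//!     // relatime only guarantees atime updates after 24 hours ...
//!     let relatime_limit = SystemTime::now() - Duration::from_secs(24 * 3600);
//!     // ... and running backups may have touched chunks any time since
//!     // they started.
//!     match oldest_writer_start {
//!         Some(start) => relatime_limit.min(start),
//!         None => relatime_limit,
//!     }
//! }
//! ```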
//!
//! ## Store `marks` in RAM using a hash table
//!
//! Might be better. Under investigation.
//!
//!
//! # Locking
//!
//! Since PBS allows multiple potentially interfering operations at the
//! same time (e.g. garbage collect, prune, multiple backup creations
//! (only in separate groups), forget, ...), these need to lock against
//! each other in certain scenarios. There is no overarching global lock,
//! though; instead, the finest grained lock possible is always used,
//! because running these operations concurrently is treated as a feature
//! in its own right.
//!
//! ## Inter-process Locking
//!
//! We need to be able to restart the proxmox-backup service daemons, so
//! that we can update the software without rebooting the host. But such
//! restarts must not abort running backup jobs, so we need to keep the
//! old service running until those jobs are finished. This implies that
//! we need some kind of locking for modifying chunks and indices in the
//! ChunkStore.
//!
//! Please note that it is perfectly valid to have multiple parallel
//! ChunkStore writers, even when they write the same chunk (because the
//! chunk would have the same name and the same data, and writes are
//! completed atomically via a rename). The only problem is garbage
//! collection, because we need to avoid deleting chunks which are still
//! referenced.
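//!
//! The atomic-rename pattern can be sketched with plain `std::fs` calls
//! (illustrative only; real code would use a unique per-writer temporary
//! file name):
//!
//! ```ignore
//! use std::io::Write;
//! use std::path::Path;
//!
//! /// Write `data` to `target` atomically: readers either see the old
//! /// file or the complete new one, never a partial write.
//! fn write_chunk_atomic(target: &Path, data: &[u8]) -> std::io::Result<()> {
//!     let tmp = target.with_extension("tmp");
//!     let mut file = std::fs::File::create(&tmp)?;
//!     file.write_all(data)?;
//!     std::fs::rename(&tmp, target)?; // atomic on the same filesystem
//!     Ok(())
//! }
//! ```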
//!
//! To do this we use the
//! [ProcessLocker](../tools/struct.ProcessLocker.html).
//!
//! ### ChunkStore-wide
//!
//! * Create Index Files:
//!
//!   Acquire a shared lock for the ChunkStore.
//!
//!   Note: When creating .idx files, we create a temporary .tmp file,
//!   then do an atomic rename.
//!
//! * Garbage Collect:
//!
//!   Acquire an exclusive lock for the ChunkStore. If we already hold a
//!   shared lock for the ChunkStore, try to upgrade that lock.
//!
//! Exclusive locks only work _between processes_. It is valid to have an
//! exclusive and one or more shared locks held within one process. Writing
//! chunks within one process is synchronized using the `gc_mutex`.
//!
//! On server restart, we stop any running GC in the old process to avoid
//! having the exclusive lock held for too long.
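//!
//! The inter-process semantics are those of `flock(2)`. A sketch of the
//! underlying mechanism (not the `ProcessLocker` API itself) using the
//! `nix` crate, with an illustrative lock file path:
//!
//! ```ignore
//! use nix::fcntl::{flock, FlockArg};
//! use std::os::unix::io::AsRawFd;
//!
//! let file = std::fs::File::open("/path/to/chunkstore/.lock")?;
//! // Backup writers take a shared lock ...
//! flock(file.as_raw_fd(), FlockArg::LockShared)?;
//! // ... while GC needs the lock exclusively, which blocks until all
//! // shared locks held by *other* processes have been released.
//! flock(file.as_raw_fd(), FlockArg::LockExclusive)?;
//! ```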
//!
//! ## Locking table
//!
//! The table below shows all operations that play a role in locking, and
//! which mechanisms are used to make their concurrent usage safe.
//!
//! | starting ><br>v during | read index file | create index file | GC mark | GC sweep | update manifest | forget | prune | create backup | verify | reader api |
//! |-|-|-|-|-|-|-|-|-|-|-|
//! | **read index file** | / | / | / | / | / | mmap stays valid, oldest_shared_lock prevents GC | see forget column | / | / | / |
//! | **create index file** | / | / | / | / | / | / | / | /, happens at the end, after all chunks are touched | /, only happens without a manifest | / |
//! | **GC mark** | / | Datastore process-lock shared | gc_mutex, exclusive ProcessLocker | gc_mutex | /, GC only cares about index files, not manifests | tells GC about removed chunks | see forget column | /, index files don't exist yet | / | / |
//! | **GC sweep** | / | Datastore process-lock shared | gc_mutex, exclusive ProcessLocker | gc_mutex | / | /, chunks already marked | see forget column | chunks get touched; chunk_store.mutex; oldest PL lock | / | / |
//! | **update manifest** | / | / | / | / | update_manifest lock | update_manifest lock, remove dir under lock | see forget column | /, "write manifest" happens at the end | /, can call "write manifest", see that column | / |
//! | **forget** | / | / | removed_during_gc mutex is held during unlink | marking done, doesn't matter if forgotten now | update_manifest lock, forget waits for lock | /, unlink is atomic | causes forget to fail, but that's OK | running backup has snapshot flock | /, potentially detects missing folder | shared snap flock |
//! | **prune** | / | / | see forget row | see forget row | see forget row | causes warn in prune, but no error | see forget column | running and last non-running can't be pruned | see forget row | shared snap flock |
//! | **create backup** | / | only time this happens, thus has snapshot flock | / | chunks get touched; chunk_store.mutex; oldest PL lock | no lock, but cannot exist beforehand | snapshot flock, can't be forgotten | running and last non-running can't be pruned | snapshot group flock, only one running per group | /, won't be verified since manifest missing | / |
//! | **verify** | / | / | / | / | see "update manifest" row | /, potentially detects missing folder | see forget column | / | /, but useless ("update manifest" protects itself) | / |
//! | **reader api** | / | / | / | /, open snap can't be forgotten, so ref must exist | / | prevented by shared snap flock | prevented by shared snap flock | / | / | /, lock is shared |
//!
//! * / = no interaction
//! * shared/exclusive from the POV of the 'starting' process

use anyhow::{bail, Error};

// Note: .pcat1 => Proxmox Catalog Format version 1
pub const CATALOG_NAME: &str = "catalog.pcat1.didx";

#[macro_export]
macro_rules! PROXMOX_BACKUP_PROTOCOL_ID_V1 {
    () => { "proxmox-backup-protocol-v1" }
}

#[macro_export]
macro_rules! PROXMOX_BACKUP_READER_PROTOCOL_ID_V1 {
    () => { "proxmox-backup-reader-protocol-v1" }
}

/// Unix system user used by proxmox-backup-proxy
pub const BACKUP_USER_NAME: &str = "backup";
/// Unix system group used by proxmox-backup-proxy
pub const BACKUP_GROUP_NAME: &str = "backup";

/// Return User info for the 'backup' user (``getpwnam_r(3)``)
pub fn backup_user() -> Result<nix::unistd::User, Error> {
    match nix::unistd::User::from_name(BACKUP_USER_NAME)? {
        Some(user) => Ok(user),
        None => bail!("Unable to lookup backup user."),
    }
}

/// Return Group info for the 'backup' group (``getgrnam(3)``)
pub fn backup_group() -> Result<nix::unistd::Group, Error> {
    match nix::unistd::Group::from_name(BACKUP_GROUP_NAME)? {
        Some(group) => Ok(group),
        None => bail!("Unable to lookup backup group."),
    }
}
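
// Illustrative sketch (a hypothetical helper, not part of the original
// module): combining the two lookups above, e.g. to give a freshly created
// datastore directory backup:backup ownership.
#[allow(dead_code)]
fn set_backup_ownership(path: &std::path::Path) -> Result<(), Error> {
    let user = backup_user()?;
    let group = backup_group()?;
    nix::unistd::chown(path, Some(user.uid), Some(group.gid))?;
    Ok(())
}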

mod file_formats;
pub use file_formats::*;

mod manifest;
pub use manifest::*;

mod crypt_config;
pub use crypt_config::*;

mod key_derivation;
pub use key_derivation::*;

mod crypt_reader;
pub use crypt_reader::*;

mod crypt_writer;
pub use crypt_writer::*;

mod checksum_reader;
pub use checksum_reader::*;

mod checksum_writer;
pub use checksum_writer::*;

mod chunker;
pub use chunker::*;

mod data_blob;
pub use data_blob::*;

mod data_blob_reader;
pub use data_blob_reader::*;

mod data_blob_writer;
pub use data_blob_writer::*;

mod catalog;
pub use catalog::*;

mod chunk_stream;
pub use chunk_stream::*;

mod chunk_stat;
pub use chunk_stat::*;

mod read_chunk;
pub use read_chunk::*;

mod chunk_store;
pub use chunk_store::*;

mod index;
pub use index::*;

mod fixed_index;
pub use fixed_index::*;

mod dynamic_index;
pub use dynamic_index::*;

mod backup_info;
pub use backup_info::*;

mod prune;
pub use prune::*;

mod datastore;
pub use datastore::*;

mod store_progress;
pub use store_progress::*;

mod verify;
pub use verify::*;

mod catalog_shell;
pub use catalog_shell::*;

mod async_index_reader;
pub use async_index_reader::*;