// proxmox-backup.git: src/backup.rs
//! This module implements the proxmox backup chunked data storage
//!
//! A chunk is simply defined as a binary blob. We store chunks inside a
//! `ChunkStore`, addressed by the SHA256 digest of the binary
//! blob. This technology is also known as content-addressable
//! storage.
//!
//! We store larger files by splitting them into chunks. The resulting
//! SHA256 digest list is stored as a separate index file. The
//! `DynamicIndex*` format is able to deal with dynamic chunk sizes,
//! whereas the `FixedIndex*` format is an optimization to store a
//! list of equally sized chunks.
//!
//! # ChunkStore Locking
//!
//! We need to be able to restart the proxmox-backup service daemons,
//! so that we can update the software without rebooting the host. But
//! such restarts must not abort running backup jobs, so we need to
//! keep the old service running until those jobs are finished. This
//! implies that we need some kind of locking for the
//! ChunkStore. Please note that it is perfectly valid to have
//! multiple parallel ChunkStore writers, even when they write the
//! same chunk (because the chunk would have the same name and the
//! same data). The only real problem is garbage collection, because
//! we need to avoid deleting chunks which are still referenced.
//!
//! * Read Index Files:
//!
//!   Acquire shared lock for .idx files.
//!
//!
//! * Delete Index Files:
//!
//!   Acquire exclusive lock for .idx files. This makes sure that we do
//!   not delete index files while they are still in use.
//!
//!
//! * Create Index Files:
//!
//!   Acquire shared lock for ChunkStore (process wide).
//!
//!   Note: When creating .idx files, we create a temporary (.tmp) file,
//!   then do an atomic rename ...
//!
//!
//! * Garbage Collect:
//!
//!   Acquire exclusive lock for ChunkStore (process wide). If we
//!   already hold a shared lock for ChunkStore, try to upgrade that
//!   lock.
//!
//!
//! * Server Restart
//!
//!   Try to abort running garbage collection to release the exclusive
//!   ChunkStore lock as soon as possible. Start the new service with
//!   the existing listening socket.
//!
//!
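The shared/exclusive rules listed above can be illustrated in-process with `std::sync::RwLock`. This is only a minimal sketch under stated assumptions: the type `ChunkStoreLock` and its method names are hypothetical, and how the real `ChunkStore` implements its lock is not shown in this file. A lock that must be honoured by both the old and the new daemon would presumably need an OS-level mechanism rather than an in-memory lock.

```rust
use std::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};

/// Hypothetical stand-in for the ChunkStore lock (name is an assumption).
pub struct ChunkStoreLock {
    inner: RwLock<()>,
}

impl ChunkStoreLock {
    pub fn new() -> Self {
        ChunkStoreLock { inner: RwLock::new(()) }
    }

    /// Index creation (backup writers): shared lock, many holders may coexist.
    /// Returns `None` while an exclusive holder (GC) exists.
    pub fn try_lock_shared(&self) -> Option<RwLockReadGuard<'_, ()>> {
        self.inner.try_read().ok()
    }

    /// Garbage collection: exclusive lock.
    /// Returns `None` instead of blocking while any other holder exists.
    pub fn try_lock_exclusive(&self) -> Option<RwLockWriteGuard<'_, ()>> {
        self.inner.try_write().ok()
    }
}
```

Dropping a guard releases the lock, so in this sketch a garbage collector that fails to get the exclusive lock would simply retry once the running writers have finished.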
//! # Garbage Collection (GC)
//!
//! Deleting backups is as easy as deleting the corresponding .idx
//! files. Unfortunately, this does not free up any storage, because
//! those files just contain references to chunks.
//!
//! To free up some storage, we run a garbage collection process at
//! regular intervals. The collector uses a mark and sweep
//! approach. In the first phase, it scans all .idx files to mark used
//! chunks. The second phase then removes all unmarked chunks from the
//! store.
//!
//! The above locking mechanism makes sure that we are the only
//! process running GC. But we still want to be able to create backups
//! during GC, so there may be multiple backup threads/tasks
//! running, either started before GC started or started while GC is
//! running.
//!
//! ## `atime` based GC
//!
//! The idea here is to mark chunks by updating the `atime` (access
//! timestamp) on the chunk file. This is quite simple and does not
//! need additional RAM.
//!
//! One minor problem is that recent Linux versions use the `relatime`
//! mount flag by default for performance reasons (yes, we want
//! that). When enabled, `atime` data is written to the disk only if
//! the file has been modified since the `atime` data was last updated
//! (`mtime`), or if the file was last accessed more than a certain
//! amount of time ago (by default 24h). So we may only delete chunks
//! with `atime` older than 24 hours.
//!
//! Another problem arises from running backups. The mark phase does
//! not find any chunks from those backups, because there is no .idx
//! file for them (created after the backup). Chunks created or
//! touched by those backups may have an `atime` as old as the start
//! time of those backups. Please note that the backup start time may
//! predate the GC start time. So we may only delete chunks older than
//! the start time of those running backup jobs.
//!
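The two deletion constraints described above (the 24 hour `relatime` window and the start time of the oldest running backup) can be combined into a single cutoff. The following is a minimal in-memory sketch, assuming Unix timestamps in seconds and a `HashMap` from digest to `atime` in place of real chunk files. All function and parameter names are hypothetical, and it follows the two constraints literally without any extra safety margin.

```rust
use std::collections::HashMap;

/// Window implied by `relatime` as described above (24 hours, in seconds).
const RELATIME_GRACE_SECS: u64 = 24 * 3600;

/// Chunks with an `atime` older than the returned value may be deleted.
/// Both constraints must hold, so the smaller of the two limits wins.
pub fn sweep_cutoff(gc_start: u64, running_backup_starts: &[u64]) -> u64 {
    let relatime_limit = gc_start.saturating_sub(RELATIME_GRACE_SECS);
    match running_backup_starts.iter().copied().min() {
        Some(oldest) => relatime_limit.min(oldest),
        None => relatime_limit,
    }
}

/// Mark phase: refresh the `atime` of every chunk referenced by an index.
pub fn mark(chunks: &mut HashMap<String, u64>, referenced: &[&str], now: u64) {
    for digest in referenced {
        if let Some(atime) = chunks.get_mut(*digest) {
            *atime = now;
        }
    }
}

/// Sweep phase: remove chunks whose `atime` is older than `cutoff`.
/// Returns the number of removed chunks.
pub fn sweep(chunks: &mut HashMap<String, u64>, cutoff: u64) -> usize {
    let before = chunks.len();
    chunks.retain(|_, atime| *atime >= cutoff);
    before - chunks.len()
}
```

In this sketch a chunk survives the sweep if it was marked, if its `atime` lies within the last 24 hours, or if it could have been touched by a backup that is still running.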
//!
//! ## Store `marks` in RAM using a HASH
//!
//! Not sure if this is better. TODO

mod chunk_stream;
pub use chunk_stream::*;

mod chunk_stat;
pub use chunk_stat::*;

pub use proxmox_protocol::Chunker;

mod chunk_store;
pub use chunk_store::*;

mod index;
pub use index::*;

mod fixed_index;
pub use fixed_index::*;

mod dynamic_index;
pub use dynamic_index::*;

mod backup_info;
pub use backup_info::*;

mod datastore;
pub use datastore::*;