src/backup.rs

//! This module implements the Proxmox Backup chunked data storage.
//!
//! A chunk is simply defined as a binary blob. We store chunks inside a
//! `ChunkStore`, addressed by the SHA256 digest of the binary
//! blob. This technique is also known as content-addressable
//! storage.
//!
//! We store larger files by splitting them into chunks. The resulting
//! SHA256 digest list is stored as a separate index file. The
//! `DynamicIndex*` format is able to deal with dynamic chunk sizes,
//! whereas the `FixedIndex*` format is an optimization to store a
//! list of equal-sized chunks.
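//!
//! A minimal sketch of the two ideas above: chunks keyed by their
//! digest, and a file represented as a list of digests. Everything
//! here is hypothetical and only for illustration: `MemStore`,
//! `store_file` and `load_file` are not part of this module, the
//! store lives in a `HashMap`, and the 64-bit std `DefaultHasher`
//! stands in for SHA256 so that the example needs no external crate.
//!
//! ```rust
//! use std::collections::hash_map::DefaultHasher;
//! use std::collections::HashMap;
//! use std::hash::{Hash, Hasher};
//!
//! /// Stand-in digest type (the real store uses SHA256 digests).
//! type Digest = u64;
//!
//! /// Hypothetical in-memory content-addressable store.
//! #[derive(Default)]
//! struct MemStore {
//!     chunks: HashMap<Digest, Vec<u8>>,
//! }
//!
//! impl MemStore {
//!     /// Insert a chunk and return its digest. Identical chunks map to
//!     /// the same key, so they are stored only once.
//!     fn insert(&mut self, chunk: &[u8]) -> Digest {
//!         let mut hasher = DefaultHasher::new();
//!         chunk.hash(&mut hasher);
//!         let digest = hasher.finish();
//!         self.chunks.entry(digest).or_insert_with(|| chunk.to_vec());
//!         digest
//!     }
//! }
//!
//! /// Split `data` into fixed-size chunks (`chunk_size` must be > 0),
//! /// store them, and return the digest list (the "index").
//! fn store_file(store: &mut MemStore, data: &[u8], chunk_size: usize) -> Vec<Digest> {
//!     data.chunks(chunk_size).map(|c| store.insert(c)).collect()
//! }
//!
//! /// Rebuild the file contents from an index. Returns `None` if a
//! /// referenced chunk is missing from the store.
//! fn load_file(store: &MemStore, index: &[Digest]) -> Option<Vec<u8>> {
//!     let mut out = Vec::new();
//!     for digest in index {
//!         out.extend_from_slice(store.chunks.get(digest)?);
//!     }
//!     Some(out)
//! }
//! ```
//!
//! Storing a file whose first and last chunks are identical yields an
//! index with a repeated digest, while the store keeps a single copy
//! of that chunk.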
//!
//! # ChunkStore Locking
//!
//! We need to be able to restart the proxmox-backup service daemons,
//! so that we can update the software without rebooting the host. But
//! such restarts must not abort running backup jobs, so we need to
//! keep the old service running until those jobs are finished. This
//! implies that we need some kind of locking for the
//! ChunkStore. Please note that it is perfectly valid to have
//! multiple parallel ChunkStore writers, even when they write the
//! same chunk (because the chunk would have the same name and the
//! same data). The only real problem is garbage collection, because
//! we need to avoid deleting chunks which are still referenced.
//!
//! * Read Index Files:
//!
//!   Acquire a shared lock for .idx files.
//!
//!
//! * Delete Index Files:
//!
//!   Acquire an exclusive lock for .idx files. This makes sure that we do
//!   not delete index files while they are still in use.
//!
//!
//! * Create Index Files:
//!
//!   Acquire a shared lock for the ChunkStore (process wide).
//!
//!   Note: When creating .idx files, we create a temporary (.tmp) file,
//!   then do an atomic rename ...
//!
//!
//! * Garbage Collect:
//!
//!   Acquire an exclusive lock for the ChunkStore (process wide). If we
//!   already hold a shared lock for the ChunkStore, try to upgrade that
//!   lock.
//!
//!
//! * Server Restart:
//!
//!   Try to abort running garbage collection to release the exclusive
//!   ChunkStore lock as soon as possible. Start the new service with the
//!   existing listening socket.
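//!
//! The shared/exclusive rules for the process-wide ChunkStore lock
//! can be sketched with a std `RwLock`. This is only an in-process
//! analogy under stated assumptions: `StoreLock` and its methods are
//! hypothetical names, and lock upgrading is not modelled.
//!
//! ```rust
//! use std::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};
//!
//! /// Hypothetical stand-in for the process-wide ChunkStore lock.
//! struct StoreLock(RwLock<()>);
//!
//! impl StoreLock {
//!     fn new() -> Self {
//!         StoreLock(RwLock::new(()))
//!     }
//!
//!     /// Shared lock, as taken while creating index files. Returns
//!     /// `None` if garbage collection currently holds the lock.
//!     fn try_shared(&self) -> Option<RwLockReadGuard<'_, ()>> {
//!         self.0.try_read().ok()
//!     }
//!
//!     /// Exclusive lock, as taken by garbage collection. Returns
//!     /// `None` while any shared holder is still active.
//!     fn try_exclusive(&self) -> Option<RwLockWriteGuard<'_, ()>> {
//!         self.0.try_write().ok()
//!     }
//! }
//! ```
//!
//! While a guard returned by `try_shared` is alive, `try_exclusive`
//! returns `None`; once the guard is dropped, the exclusive lock can
//! be taken and new shared requests are refused until it is released.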
//!
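//!
//! The "temporary file, then atomic rename" step for index files can
//! be sketched with std only. The function name
//! `write_index_atomically` and the choice of replacing the file
//! extension with `tmp` are assumptions made for this example.
//!
//! ```rust
//! use std::fs;
//! use std::io::{self, Write};
//! use std::path::Path;
//!
//! /// Write `data` to a sibling `.tmp` file first, then rename it onto
//! /// `path`. Readers see either the old file or the complete new one.
//! fn write_index_atomically(path: &Path, data: &[u8]) -> io::Result<()> {
//!     let tmp = path.with_extension("tmp");
//!     {
//!         let mut file = fs::File::create(&tmp)?;
//!         file.write_all(data)?;
//!         // Flush file contents to disk before the rename makes them visible.
//!         file.sync_all()?;
//!     }
//!     // Both paths are in the same directory, hence on the same filesystem.
//!     fs::rename(&tmp, path)
//! }
//! ```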
//!
//! # Garbage Collection (GC)
//!
//! Deleting backups is as easy as deleting the corresponding .idx
//! files. Unfortunately, this does not free up any storage, because
//! those files just contain references to chunks.
//!
//! To free up some storage, we run a garbage collection process at
//! regular intervals. The collector uses a mark-and-sweep
//! approach. In the first phase, it scans all .idx files to mark used
//! chunks. The second phase then removes all unmarked chunks from the
//! store.
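//!
//! The two phases can be sketched as follows. This is only an
//! illustration under stated assumptions: the store is a `HashMap`,
//! each index is a plain list of digests, and the marks are kept in
//! an in-memory `HashSet`. The names `Digest` and `garbage_collect`
//! are hypothetical.
//!
//! ```rust
//! use std::collections::{HashMap, HashSet};
//!
//! /// A SHA256 digest is 32 bytes long.
//! type Digest = [u8; 32];
//!
//! /// Remove every chunk that no index references and return how many
//! /// chunks were removed.
//! fn garbage_collect(store: &mut HashMap<Digest, Vec<u8>>, indexes: &[Vec<Digest>]) -> usize {
//!     // Phase 1 (mark): collect all digests referenced by any index.
//!     let mut marked: HashSet<Digest> = HashSet::new();
//!     for index in indexes {
//!         for digest in index {
//!             marked.insert(*digest);
//!         }
//!     }
//!
//!     // Phase 2 (sweep): drop every chunk that was not marked.
//!     let before = store.len();
//!     store.retain(|digest, _| marked.contains(digest));
//!     before - store.len()
//! }
//! ```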
//!
//! The above locking mechanism makes sure that we are the only
//! process running GC. But we still want to be able to create backups
//! during GC, so there may be multiple backup threads/tasks
//! running, either started before GC started or started while GC is
//! running.
//!
//! ## `atime` based GC
//!
//! The idea here is to mark chunks by updating the `atime` (access
//! timestamp) on the chunk file. This is quite simple and does not
//! need additional RAM.
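//!
//! One way to set such a mark with std alone is `File::set_times`
//! (available since Rust 1.75). This is a sketch, not necessarily how
//! the chunk store does it; `mark_chunk` and `chunk_atime` are
//! hypothetical helper names.
//!
//! ```rust
//! use std::fs::{self, FileTimes, OpenOptions};
//! use std::io;
//! use std::path::Path;
//! use std::time::SystemTime;
//!
//! /// Mark a chunk file as used by setting its access time to `now`.
//! /// The modification time is left untouched.
//! fn mark_chunk(path: &Path, now: SystemTime) -> io::Result<()> {
//!     let file = OpenOptions::new().write(true).open(path)?;
//!     file.set_times(FileTimes::new().set_accessed(now))
//! }
//!
//! /// Read the mark back; the sweep phase compares it to a cutoff time.
//! fn chunk_atime(path: &Path) -> io::Result<SystemTime> {
//!     fs::metadata(path)?.accessed()
//! }
//! ```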
//!
//! One minor problem is that recent Linux versions use the `relatime`
//! mount flag by default for performance reasons (yes, we want
//! that). When enabled, `atime` data is written to the disk only if
//! the file has been modified since the `atime` data was last updated
//! (`mtime`), or if the file was last accessed more than a certain
//! amount of time ago (by default 24h). So we may only delete chunks
//! with `atime` older than 24 hours.
//!
//! Another problem arises from running backups. The mark phase does
//! not find any chunks from those backups, because there is no .idx
//! file for them yet (it is created after the backup). Chunks created
//! or touched by those backups may have an `atime` as old as the start
//! time of those backups. Please note that the backup start time may
//! predate the GC start time. So we may only delete chunks older than
//! the start time of those running backup jobs.
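//!
//! Both constraints can be combined into a single cutoff time for the
//! sweep phase. The sketch below assumes timestamps in seconds since
//! the Unix epoch and the default 24h `relatime` window; the names
//! `sweep_cutoff` and `may_delete` are hypothetical.
//!
//! ```rust
//! /// Default `relatime` window: 24 hours, in seconds.
//! const RELATIME_WINDOW_SECS: u64 = 24 * 60 * 60;
//!
//! /// Chunks with an `atime` before the returned cutoff may be deleted.
//! /// The cutoff is the earlier of "GC start minus 24h" and the start
//! /// time of the oldest backup that is still running (if any).
//! fn sweep_cutoff(gc_start: u64, oldest_running_backup_start: Option<u64>) -> u64 {
//!     let relatime_cutoff = gc_start.saturating_sub(RELATIME_WINDOW_SECS);
//!     match oldest_running_backup_start {
//!         Some(start) => relatime_cutoff.min(start),
//!         None => relatime_cutoff,
//!     }
//! }
//!
//! /// A chunk is only removed if its `atime` is strictly older than the cutoff.
//! fn may_delete(chunk_atime: u64, cutoff: u64) -> bool {
//!     chunk_atime < cutoff
//! }
//! ```
//!
//! A backup that started more than 24 hours before GC moves the cutoff
//! back to its own start time; a more recent backup leaves the 24h
//! cutoff in effect.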
//!
//!
//! ## Store `marks` in RAM using a HASH
//!
//! Not sure if this is better. TODO

#[macro_export]
macro_rules! PROXMOX_BACKUP_PROTOCOL_ID_V1 {
    () => { "proxmox-backup-protocol-v1" }
}

#[macro_export]
macro_rules! PROXMOX_BACKUP_READER_PROTOCOL_ID_V1 {
    () => { "proxmox-backup-reader-protocol-v1" }
}

mod file_formats;
pub use file_formats::*;

mod crypt_config;
pub use crypt_config::*;

mod key_derivation;
pub use key_derivation::*;

mod data_chunk;
pub use data_chunk::*;

mod data_blob;
pub use data_blob::*;

mod chunk_stream;
pub use chunk_stream::*;

mod chunk_stat;
pub use chunk_stat::*;

pub use proxmox_protocol::Chunker;

mod read_chunk;
pub use read_chunk::*;

mod chunk_store;
pub use chunk_store::*;

mod index;
pub use index::*;

mod fixed_index;
pub use fixed_index::*;

mod dynamic_index;
pub use dynamic_index::*;

mod backup_info;
pub use backup_info::*;

mod datastore;
pub use datastore::*;