When doing deduplication, there are different strategies to get
optimal results in terms of performance and/or deduplication rates.
Depending on the type of data, it can be split into *fixed* or *variable*
sized chunks.

Fixed sized chunking requires minimal CPU power, and is used to
back up virtual machine images.

Variable sized chunking needs more CPU power, but is essential to get
good deduplication rates for file archives.

The `Proxmox Backup`_ Server supports both strategies.

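
The effect of fixed-sized chunking on deduplication can be illustrated with a
minimal Python sketch. It is not the server's implementation: the 4096-byte
chunk size, the use of SHA-256 as the chunk key, and the names
``fixed_chunks``, ``store`` and ``chunk_store`` are assumptions made for the
example.

```python
import hashlib

def fixed_chunks(data: bytes, chunk_size: int) -> list:
    """Split data into consecutive chunks of chunk_size bytes (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def store(chunks, chunk_store: dict) -> list:
    """Add chunks to a digest-keyed store and return the list of digests (the index)."""
    index = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # identical chunks are kept only once
        index.append(digest)
    return index

chunk_store = {}
image_v1 = bytes(4096) + b"A" * 4096 + bytes(4096)
# The second image differs only in its middle block, overwritten in place.
image_v2 = bytes(4096) + b"B" * 4096 + bytes(4096)

index_v1 = store(fixed_chunks(image_v1, 4096), chunk_store)
index_v2 = store(fixed_chunks(image_v2, 4096), chunk_store)
print(len(index_v1) + len(index_v2), "chunks referenced,", len(chunk_store), "stored")
```

Because a disk image is modified in place, unchanged blocks stay at the same
offsets, so cutting at fixed offsets is enough to find them again.
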

Image Archives: ``<name>.img``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is used for virtual machine images and other large binary
data. Content is split into fixed-sized chunks.


File Archives: ``<name>.pxar``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. see https://moinakg.wordpress.com/2013/06/22/high-performance-content-defined-chunking/

A file archive stores a full directory tree. Content is stored using
the :ref:`pxar-format`, split into variable-sized chunks. The format
is optimized to achieve good deduplication rates.

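
Why variable-sized chunks help with file archives can be sketched with a toy
content-defined chunker. This is an illustration only: the polynomial rolling
hash, the window size, the cut mask and all function names are assumptions,
not the algorithm or parameters used by the server.

```python
import hashlib
import random

WINDOW = 16            # bytes covered by the rolling hash (hypothetical value)
MASK = (1 << 6) - 1    # cut when the low 6 bits are zero
BASE = 257
MOD = (1 << 31) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window

def cdc_chunks(data: bytes) -> list:
    """Cut data where the rolling hash of the last WINDOW bytes has zero low bits."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD   # remove the oldest byte
        h = (h * BASE + byte) % MOD                  # append the newest byte
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # trailing remainder
    return chunks

def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Cut data at fixed offsets, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def digests(chunks) -> set:
    return {hashlib.sha256(c).digest() for c in chunks}

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"hello" + original   # insert five bytes at the front

shared_cdc = digests(cdc_chunks(original)) & digests(cdc_chunks(edited))
shared_fixed = digests(fixed_chunks(original)) & digests(fixed_chunks(edited))
print("chunks reused with content-defined cuts:", len(shared_cdc))
print("chunks reused with fixed-size cuts:", len(shared_fixed))
```

An insertion shifts every later byte, which moves all fixed-offset boundaries.
Boundaries chosen from the content itself move along with the data, so most
chunks after the edit can still be matched.
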

The BLOB type is used to store smaller (< 16MB) binary data such as
configuration files. Larger files should be stored as image archives.

.. caution:: Please do not store all files as BLOBs. Instead, use the
   file archive to store entire directory trees.


Catalog File: ``catalog.pcat1``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The catalog file is an index for file archives. It contains
the list of included files and is used to speed up search operations.


The Manifest: ``index.json``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The manifest contains a list of all backed up files, and their
sizes and checksums. It is used to verify the consistency of a
backup.

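
The idea of a consistency check against recorded sizes and checksums can be
sketched as follows. The dictionary layout, the field names ``filename``,
``size`` and ``csum``, and the choice of SHA-256 are assumptions for
illustration; they are not a description of the actual ``index.json`` schema.

```python
import hashlib

def build_manifest(files: dict) -> dict:
    """Record size and checksum for each file name (hypothetical layout)."""
    return {
        "files": [
            {
                "filename": name,
                "size": len(data),
                "csum": hashlib.sha256(data).hexdigest(),
            }
            for name, data in sorted(files.items())
        ]
    }

def verify(manifest: dict, files: dict) -> list:
    """Return a list of problems; an empty list means the backup is consistent."""
    problems = []
    for entry in manifest["files"]:
        name = entry["filename"]
        data = files.get(name)
        if data is None:
            problems.append(f"{name}: missing")
        elif len(data) != entry["size"]:
            problems.append(f"{name}: size mismatch")
        elif hashlib.sha256(data).hexdigest() != entry["csum"]:
            problems.append(f"{name}: checksum mismatch")
    return problems

# Hypothetical archive names following the <name>.img / <name>.pxar pattern.
files = {"disk.img": b"\x00" * 1024, "root.pxar": b"directory tree data"}
manifest = build_manifest(files)
print(verify(manifest, files))

damaged = dict(files, **{"root.pxar": b"directory tree dat4"})
print(verify(manifest, damaged))
```

Checking the size first is a cheap way to catch truncated files before
computing a checksum over the full content.
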

Namespaces allow for the reuse of a single chunk store deduplication domain for
multiple sources, while avoiding naming conflicts and enabling more fine-grained
access control.

Essentially, they're implemented as a simple directory structure and don't
require separate configuration.


The backup server groups backups by *type*, where *type* is one of:

``vm``
    This type is used for :term:`virtual machine<Virtual machine>`\ s. It
    typically consists of the virtual machine's configuration file and an image
    archive for each disk.

``ct``
    This type is used for :term:`container<Container>`\ s. It consists of the
    container's configuration and a single file archive for the filesystem's
    content.

``host``
    This type is used for file/directory backups created from within a machine.
    Typically this would be a physical host, but could also be a virtual machine
    or container. Such backups may contain file and image archives; there are no
    restrictions in this regard.


The backup ID is a unique ID for a specific Backup Type and Backup Namespace.
Usually, it is the virtual machine or container ID. ``host`` type backups
normally use the hostname.

The backup time is the time when the backup was made, with second resolution.


The tuple ``<type>/<ID>`` is called a backup group. Such a group may contain
one or more backup snapshots.


.. _term_backup_snapshot:

The triplet ``<type>/<ID>/<time>`` is called a backup snapshot. It
uniquely identifies a specific backup within a namespace.

.. code-block:: console
   :caption: Backup Snapshot Examples

   vm/104/2019-10-09T08:01:06Z
   host/elsa/2019-11-08T09:48:14Z

As you can see, the time format is RFC3339_ with Coordinated
Universal Time (UTC_, identified by the trailing *Z*).

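
To make the relation between snapshots and groups concrete, the following
Python sketch splits a snapshot path into its three parts and derives the
group from the first two. The ``Snapshot`` class and ``parse_snapshot``
function are hypothetical helpers written for this example, not part of any
Proxmox tool or API.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import NamedTuple

class Snapshot(NamedTuple):
    backup_type: str
    backup_id: str
    backup_time: datetime

    @property
    def group(self) -> str:
        """The backup group is the <type>/<ID> part of the snapshot."""
        return f"{self.backup_type}/{self.backup_id}"

def parse_snapshot(path: str) -> Snapshot:
    """Parse '<type>/<ID>/<time>' where <time> is RFC 3339 in UTC with a trailing Z."""
    backup_type, backup_id, time_str = path.split("/")
    backup_time = datetime.strptime(time_str, "%Y-%m-%dT%H:%M:%SZ")
    return Snapshot(backup_type, backup_id, backup_time.replace(tzinfo=timezone.utc))

paths = [
    "vm/104/2019-10-09T08:01:06Z",
    "host/elsa/2019-11-08T09:48:14Z",
    "vm/104/2019-10-10T08:01:06Z",   # made-up second snapshot of the same group
]

groups = defaultdict(list)
for path in paths:
    snapshot = parse_snapshot(path)
    groups[snapshot.group].append(snapshot.backup_time)

for group, times in groups.items():
    print(group, "has", len(times), "snapshot(s), latest at", max(times).isoformat())
```

Storing the time as a timezone-aware value keeps comparisons between
snapshots unambiguous, regardless of the local time zone of the machine
running the script.
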