Efficient VM backup for qemu

=Requirements=

* Backup to a single archive file
* Backup needs to contain all data required to restore the VM (full backup)
* Do not depend on storage type or image format
* Avoid use of temporary storage
* Store sparse images efficiently

=Introduction=

Most VM backup solutions use some kind of snapshot to get a consistent
VM view at a specific point in time. For example, we previously used
LVM to create a snapshot of all used VM images, which are then copied
into a tar file.

This means that any data written during the backup involves
considerable overhead. For LVM we get the following steps:

1.) read original data (VM write)
2.) write original data into snapshot (VM write)
3.) write new data (VM write)
4.) read data from snapshot (backup)
5.) write data from snapshot into tar file (backup)
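To make the step count concrete, here is a minimal Python model of an LVM-style copy-on-write snapshot. It rests on simplifying assumptions. A volume is a list of equally sized chunks. The snapshot keeps the original content of each chunk that is overwritten after the snapshot was taken. `CowSnapshotVolume` and `backup_to_tar` are hypothetical names. The model only counts I/O operations and does not reflect how LVM is implemented.

```python
class CowSnapshotVolume:
    """Hypothetical model of a volume with one copy-on-write snapshot."""

    def __init__(self, chunks):
        self.origin = list(chunks)   # live volume, used by the VM
        self.cow_store = {}          # chunk index -> original data
        self.reads = 0               # I/O counters for the whole model,
        self.writes = 0              # including writes into the tar file

    def vm_write(self, index, data):
        if index not in self.cow_store:
            original = self.origin[index]      # 1.) read original data
            self.reads += 1
            self.cow_store[index] = original   # 2.) write it into snapshot
            self.writes += 1
        self.origin[index] = data              # 3.) write new data
        self.writes += 1

    def read_snapshot(self, index):
        # 4.) snapshot view: the saved copy if the chunk was overwritten,
        #     otherwise the unchanged chunk on the origin volume
        self.reads += 1
        if index in self.cow_store:
            return self.cow_store[index]
        return self.origin[index]


def backup_to_tar(volume):
    """Copy the snapshot view chunk by chunk into a tar-like list."""
    archive = []
    for index in range(len(volume.origin)):
        archive.append(volume.read_snapshot(index))
        volume.writes += 1                     # 5.) write into tar file
    return archive


vol = CowSnapshotVolume([b"old"])
vol.vm_write(0, b"new")          # VM overwrites the only chunk during backup
lvm_archive = backup_to_tar(vol)
print(lvm_archive, vol.reads, vol.writes)   # [b'old'] 2 3
```

For a single overwritten chunk, the model reproduces the 2 reads and 3 writes listed above.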

Another approach to backing up VM images is to create a new qcow2 image
which uses the old image as its base. During backup, writes are redirected
to the new image, so the old image represents a 'snapshot'. After the
backup, data needs to be copied back from the new image into the old
one (commit). So a simple write during backup triggers the following
steps:

1.) write new data to new image (VM write)
2.) read data from old image (backup)
3.) write data from old image into tar file (backup)

4.) read data from new image (commit)
5.) write data to old image (commit)

This is in fact the same overhead as before. Other tools like qemu
livebackup produce similar overhead (2 reads, 3 writes).
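The same kind of model can be built for the qcow2 approach. `OverlayBackup` is a hypothetical class. It redirects guest writes to an overlay that stands in for the new qcow2 image, backs up the frozen base image, and finally commits the overlay back. As before, it only counts operations on a list of chunks and is not qemu's block layer.

```python
class OverlayBackup:
    """Hypothetical model: frozen base image plus a writable overlay."""

    def __init__(self, base_chunks):
        self.base = list(base_chunks)  # old image, serves as the 'snapshot'
        self.overlay = {}              # new image: chunk index -> data
        self.reads = 0
        self.writes = 0

    def vm_write(self, index, data):
        self.overlay[index] = data     # 1.) write new data to new image
        self.writes += 1

    def vm_read(self, index):
        # the guest sees its own new data first, then the base image
        if index in self.overlay:
            return self.overlay[index]
        return self.base[index]

    def backup(self):
        archive = []
        for chunk in self.base:
            self.reads += 1            # 2.) read data from old image
            archive.append(chunk)
            self.writes += 1           # 3.) write it into the tar file
        return archive

    def commit(self):
        for index, data in self.overlay.items():
            self.reads += 1            # 4.) read data from new image
            self.base[index] = data    # 5.) write data to old image
            self.writes += 1
        self.overlay.clear()


ov = OverlayBackup([b"old"])
ov.vm_write(0, b"new")
qcow2_archive = ov.backup()
ov.commit()
print(qcow2_archive, ov.base, ov.reads, ov.writes)  # [b'old'] [b'new'] 2 3
```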

Some storage types/formats support internal snapshots using some kind
of reference counting (rados, sheepdog, dm-thin, qcow2). It would be possible
to use that for backups, but for now we want to be storage-independent.

=Make it more efficient=

To be more efficient, we simply need to avoid unnecessary steps. The
following steps are always required:

1.) read old data before it gets overwritten
2.) write that data into the backup archive
3.) write new data (VM write)

As you can see, this involves only one read and two writes.

To make that work, our backup archive needs to be able to store image
data 'out of order'. It is important to note that this will not work
with traditional archive formats like tar, which store each file's
contents as one contiguous block.

During backup we simply intercept writes, then read the existing data and
store it directly in the archive. After that we can continue the
write.
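Below is a minimal sketch of the interception described above. `CopyBeforeWriteBackup` is a hypothetical class, and the disk is modelled as a list of chunks. A background job archives chunks one by one. A guest write to a chunk that is not yet archived first saves the old data, and only then is it applied. Archive records are `(chunk index, data)` pairs, so they naturally end up out of order.

```python
class CopyBeforeWriteBackup:
    """Sketch: intercept guest writes, save old data to the archive first."""

    def __init__(self, disk):
        self.disk = disk                  # list of chunks, modified in place
        self.done = [False] * len(disk)   # chunks already in the archive
        self.archive = []                 # (chunk index, data) records
        self.reads = 0
        self.writes = 0

    def _save(self, index):
        if not self.done[index]:
            data = self.disk[index]               # 1.) read old data
            self.reads += 1
            self.archive.append((index, data))    # 2.) write it into archive
            self.writes += 1
            self.done[index] = True

    def vm_write(self, index, data):
        self._save(index)                 # intercept: back up old data first
        self.disk[index] = data           # 3.) continue with the VM write
        self.writes += 1

    def backup_step(self):
        """Background job: save the next chunk not yet archived."""
        for index, saved in enumerate(self.done):
            if not saved:
                self._save(index)
                return True
        return False


disk = [b"A", b"B", b"C", b"D"]
cbw = CopyBeforeWriteBackup(disk)
cbw.backup_step()              # background job saves chunk 0
cbw.vm_write(2, b"X")          # old chunk 2 is saved before the write
while cbw.backup_step():       # save the remaining chunks
    pass
print(cbw.archive)  # [(0, b'A'), (2, b'C'), (1, b'B'), (3, b'D')]
print(disk)         # [b'A', b'B', b'X', b'D']
```

Every chunk costs one read and one archive write, whether or not the guest touches it. The intercepted guest write adds only its own write, so it costs one read and two writes in total.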

==Advantages==

* very good performance (1 read, 2 writes)
* works on any storage type and image format
* avoids the use of temporary storage
* we can define a new and simple archive format, which is able to
  store sparse files efficiently

Note: Storing sparse files is a mess with existing archive
formats. For example, tar requires information about the holes up
front, before the file data itself.
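The following minimal sketch shows how a format that stores chunks individually can handle sparse images. All-zero chunks are simply not written. A restore that knows the image size treats the missing chunks as holes. Chunk-level granularity and the function names are assumptions made for this illustration, not the actual layout from vma_spec.txt. Unlike tar, no hole map is needed before the data, so records can be emitted as soon as they are read.

```python
def archive_sparse(chunks):
    """Return (chunk_count, records); all-zero chunks are skipped."""
    records = [(index, chunk) for index, chunk in enumerate(chunks)
               if any(chunk)]          # any() is False for all-zero bytes
    return len(chunks), records


def expand_sparse(chunk_count, chunk_size, records):
    """Rebuild the image; chunks without a record read back as zeros."""
    image = bytearray(chunk_count * chunk_size)   # zero-filled
    for index, data in records:
        # assumes every stored chunk is exactly chunk_size bytes long
        image[index * chunk_size:(index + 1) * chunk_size] = data
    return bytes(image)


chunks = [bytes(4), b"\x01\x02\x03\x04", bytes(4)]
count, sparse_records = archive_sparse(chunks)
print(count, sparse_records)  # 3 [(1, b'\x01\x02\x03\x04')]
print(expand_sparse(count, 4, sparse_records) == b"".join(chunks))  # True
```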

==Disadvantages==

* we need to define a new archive format

Note: Most existing archive formats are optimized to store small files
including file attributes. We simply do not need that for VM archives.

* archive contains data 'out of order'

If you want to access image data in sequential order, you need to
re-order archive data. It would be possible to do that on the fly,
using temporary files.
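On-the-fly re-ordering could look roughly like the sketch below. It assumes every chunk index from 0 to n-1 occurs exactly once and every chunk has the same size. `sequential_chunks` is a hypothetical helper. Chunks that arrive early are parked in a `tempfile.TemporaryFile`, and only their offsets are kept in memory. Each parked chunk is released as soon as all preceding chunks have been emitted.

```python
import os
import tempfile


def sequential_chunks(records, chunk_size):
    """Yield chunk data in index order from out-of-order (index, data) records.

    Assumes every chunk index 0..n-1 occurs exactly once and every chunk is
    chunk_size bytes. Early chunks are parked in a temporary file.
    """
    parked = {}                           # chunk index -> offset in tmp
    next_index = 0
    with tempfile.TemporaryFile() as tmp:
        for index, data in records:
            if index != next_index:
                tmp.seek(0, os.SEEK_END)  # append to the temporary file
                parked[index] = tmp.tell()
                tmp.write(data)
                continue
            yield data
            next_index += 1
            # emit parked chunks that are now next in sequence
            while next_index in parked:
                tmp.seek(parked.pop(next_index))
                yield tmp.read(chunk_size)
                next_index += 1
        if parked:
            raise ValueError(f"chunk {next_index} missing from archive")


ooo_records = [(0, b"AA"), (2, b"CC"), (3, b"DD"), (1, b"BB")]
print(b"".join(sequential_chunks(ooo_records, 2)))  # b'AABBCCDD'
```

In the worst case (chunk 0 arrives last), the temporary file holds almost the whole image. That is the cost mentioned above.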

Fortunately, a normal restore/extract works perfectly with 'out of
order' data, because the target files are seekable.
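The restore path needs no re-ordering at all. This minimal sketch writes each `(index, data)` record at its offset in a seekable target. It assumes fixed-size chunks, and `restore` is a hypothetical helper. `io.BytesIO` stands in for the target here. A regular file opened in `r+b` or `wb` mode supports the same `seek`/`write` calls. On filesystems with sparse-file support, regions that are never written usually remain holes.

```python
import io


def restore(records, target, chunk_size):
    """Write (index, data) records to a seekable target in any order."""
    for index, data in records:
        target.seek(index * chunk_size)   # jump to the chunk's offset
        target.write(data)


image = io.BytesIO()
restore([(2, b"CC"), (0, b"AA"), (3, b"DD"), (1, b"BB")], image, 2)
print(image.getvalue())  # b'AABBCCDD'
```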

* slow backup storage can slow down the VM during backup

It is important to note that we only do sequential writes to the
backup storage. Furthermore, one can compress the backup stream. IMHO,
it is better to slow down the VM a bit, because the other solutions
described above create large amounts of temporary data during backup.
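The archive is written strictly sequentially, so a streaming compressor can sit between the backup job and the storage without any seeking. In this sketch, zlib is only an example choice, and the record encoding (an 8-byte little-endian chunk index followed by the chunk data) is hypothetical.

```python
import io
import zlib


def write_compressed(records, sink):
    """Append records to sink through one streaming zlib compressor.

    Hypothetical record encoding: 8-byte little-endian chunk index
    followed by the chunk data.
    """
    compressor = zlib.compressobj()
    for index, data in records:
        sink.write(compressor.compress(index.to_bytes(8, "little") + data))
    sink.write(compressor.flush())    # emit the remaining compressed bytes


sink = io.BytesIO()
write_compressed([(5, bytes(4096)), (1, bytes(4096))], sink)
raw_size = 2 * (8 + 4096)
print(raw_size, len(sink.getvalue()))  # compressed size is much smaller
```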

=Archive format requirements=

The basic requirement for such a new format is that we can store image
data 'out of order'. It is also very likely that we have fewer than 256
drives/images per VM, and we want to be able to store VM configuration
files.
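Together, these two requirements make a compact per-chunk record possible. With fewer than 256 drives, a device ID fits in one byte, and each record carries its own position, so records can appear in any order. The header below is a hypothetical layout built with Python's `struct` module. It is not the layout defined in vma_spec.txt. Configuration files would need a second record type, which this sketch leaves out.

```python
import struct

# Hypothetical record header (not the vma layout): device id in one
# byte, chunk index in 8 bytes, payload length in 4 bytes, big-endian.
RECORD_HEADER = struct.Struct(">BQI")


def pack_record(dev_id, chunk_index, data):
    """Encode one chunk; struct.error is raised if dev_id > 255."""
    return RECORD_HEADER.pack(dev_id, chunk_index, len(data)) + data


def unpack_records(blob):
    """Yield (dev_id, chunk_index, data) for each record in blob."""
    offset = 0
    while offset < len(blob):
        dev_id, chunk_index, length = RECORD_HEADER.unpack_from(blob, offset)
        offset += RECORD_HEADER.size
        yield dev_id, chunk_index, blob[offset:offset + length]
        offset += length


blob = pack_record(1, 7, b"disk1") + pack_record(0, 3, b"disk0")
print(RECORD_HEADER.size)          # 13
print(list(unpack_records(blob)))  # [(1, 7, b'disk1'), (0, 3, b'disk0')]
```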

We have defined a very simple format with those properties, see:

https://git.proxmox.com/?p=pve-qemu.git;a=blob;f=vma_spec.txt;

Please let us know if you know of an existing format that provides the
same functionality.