git.proxmox.com Git - pve-qemu-kvm.git/blame - debian/patches/0001-add-documenation-for-new-backup-framework.patch
1From 2f0dcd89a0de8b656d33ce6997c09879bd287af7 Mon Sep 17 00:00:00 2001
2From: Dietmar Maurer <dietmar@proxmox.com>
3Date: Tue, 13 Nov 2012 09:24:50 +0100
4Subject: [PATCH v5 1/6] add documenation for new backup framework
5
6
7Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
8---
9 docs/backup.txt | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
10 1 files changed, 116 insertions(+), 0 deletions(-)
11 create mode 100644 docs/backup.txt
12
13diff --git a/docs/backup.txt b/docs/backup.txt
14new file mode 100644
15index 0000000..927d787
16--- /dev/null
17+++ b/docs/backup.txt
18@@ -0,0 +1,116 @@
19+Efficient VM backup for qemu
20+
21+=Requirements=
22+
23+* Backup to a single archive file
24+* Backup needs to contain all data to restore VM (full backup)
25+* Do not depend on storage type or image format
26+* Avoid use of temporary storage
27+* Store sparse images efficiently
28+
29+=Introduction=
30+
31+Most VM backup solutions use some kind of snapshot to get a consistent
32+VM view at a specific point in time. For example, we previously used
33+LVM to create a snapshot of all used VM images, which are then copied
34+into a tar file.
35+
36+That means that any write during backup involves
37+considerable overhead. For LVM we get the following steps:
38+
39+1.) read original data (VM write)
40+2.) write original data into snapshot (VM write)
41+3.) write new data (VM write)
42+4.) read data from snapshot (backup)
43+5.) write data from snapshot into tar file (backup)
44+
45+Another approach to backing up VM images is to create a new qcow2 image
46+which uses the old image as its base. During backup, writes are redirected
47+to the new image, so the old image represents a 'snapshot'. After
48+backup, data needs to be copied back from the new image into the old
49+one (commit). So a simple write during backup triggers the following
50+steps:
51+
52+1.) write new data to new image (VM write)
53+2.) read data from old image (backup)
54+3.) write data from old image into tar file (backup)
55+
56+4.) read data from new image (commit)
57+5.) write data to old image (commit)
58+
59+This is in fact the same overhead as before. Other tools like qemu
60+livebackup produce similar overhead (2 reads, 3 writes).
61+
62+Some storage types/formats support internal snapshots using some kind
63+of reference counting (rados, sheepdog, dm-thin, qcow2). It would be possible
64+to use that for backups, but for now we want to be storage-independent.
65+
66+=Make it more efficient=
67+
68+To be more efficient, we simply need to avoid unnecessary steps. The
69+following steps are always required:
70+
71+1.) read old data before it gets overwritten
72+2.) write that data into the backup archive
73+3.) write new data (VM write)
74+
75+As you can see, this involves only one read and two writes.
76+
77+To make that work, our backup archive needs to be able to store image
78+data 'out of order'. It is important to notice that this will not work
79+with traditional archive formats like tar.
80+
81+During backup we simply intercept writes, then read existing data and
82+store that directly into the archive. After that we can continue the
83+write.
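
The interception scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the QEMU implementation: the `BackupJob` class, the tiny `CLUSTER` size, the in-memory `bytearray` image, and the list used as an "archive" are all hypothetical names chosen for the example. It assumes non-empty writes and an image length that is a multiple of `CLUSTER`.

```python
CLUSTER = 4  # hypothetical cluster size in bytes, kept tiny for illustration


class BackupJob:
    """Copy-before-write backup of one in-memory image (illustrative only)."""

    def __init__(self, image: bytearray):
        self.image = image
        self.saved = set()   # cluster numbers already stored in the archive
        self.archive = []    # (cluster number, data) records in arrival order
        self.reads = 0       # reads from the image
        self.writes = 0      # writes to the archive plus writes to the image

    def _save_cluster(self, n: int) -> None:
        """Store cluster n in the archive unless it was saved already."""
        if n in self.saved:
            return
        start = n * CLUSTER
        data = bytes(self.image[start:start + CLUSTER])  # 1.) read old data
        self.reads += 1
        self.archive.append((n, data))                   # 2.) write to archive
        self.writes += 1
        self.saved.add(n)

    def guest_write(self, offset: int, data: bytes) -> None:
        """Intercept a VM write: save the affected clusters, then write."""
        first = offset // CLUSTER
        last = (offset + len(data) - 1) // CLUSTER
        for n in range(first, last + 1):
            self._save_cluster(n)
        self.image[offset:offset + len(data)] = data     # 3.) write new data
        self.writes += 1

    def background_copy(self) -> None:
        """Sequentially save every cluster that no write has touched yet."""
        for n in range(len(self.image) // CLUSTER):
            self._save_cluster(n)
```

A write that hits one unsaved cluster costs one read and two writes in this model, and the archive ends up holding clusters in the order they were saved rather than in image order, which is why the archive format has to tolerate 'out of order' data.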
84+
85+==Advantages==
86+
87+* very good performance (1 read, 2 writes)
88+* works on any storage type and image format.
89+* avoid usage of temporary storage
90+* we can define a new and simple archive format, which is able to
91+ store sparse files efficiently.
92+
93+Note: Storing sparse files is a mess with existing archive
94+formats. For example, tar requires information about holes at the
95+beginning of the archive.
96+
97+==Disadvantages==
98+
99+* we need to define a new archive format
100+
101+Note: Most existing archive formats are optimized to store small files
102+including file attributes. We simply do not need that for VM archives.
103+
104+* archive contains data 'out of order'
105+
106+If you want to access image data in sequential order, you need to
107+re-order archive data. It would be possible to do that on the fly,
108+using temporary files.
109+
110+Fortunately, a normal restore/extract works perfectly with 'out of
111+order' data, because the target files are seekable.
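
Why seekable targets make restore easy can be shown with a minimal sketch. The record layout used here, a sequence of `(offset, data)` pairs, is an assumption made for the example and is not the VMA on-disk format; the function name `restore` is likewise hypothetical.

```python
import io
from typing import BinaryIO, Iterable, Tuple


def restore(records: Iterable[Tuple[int, bytes]], target: BinaryIO) -> None:
    """Write each record at its own offset, in whatever order it arrives."""
    for offset, data in records:
        target.seek(offset)   # jump to where this piece belongs in the image
        target.write(data)


# Usage: records arrive out of order, the target still ends up in image order.
target = io.BytesIO()
restore([(4, b"BBBB"), (0, b"AAAA"), (8, b"CCCC")], target)
```

No re-ordering pass and no temporary file is needed, because every record carries its position and the target can seek to it.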
112+
113+* slow backup storage can slow down VM during backup
114+
115+It is important to note that we only do sequential writes to the
116+backup storage. Furthermore one can compress the backup stream. IMHO,
117+it is better to slow down the VM a bit. All other solutions create
118+large amounts of temporary data during backup.
119+
120+=Archive format requirements=
121+
122+The basic requirement for such a new format is that we can store image
123+data 'out of order'. It is also very likely that we have less than 256
124+drives/images per VM, and we want to be able to store VM configuration
125+files.
126+
127+We have defined a very simple format with those properties, see:
128+
129+docs/specs/vma_spec.txt
130+
131+Please let us know if you know an existing format which provides the
132+same functionality.
133+
134+
135--
1361.7.2.5
137