[[chapter_pvesr]]
ifdef::manvolnum[]
pvesr(1)
========
:pve-toplevel:

NAME
----

pvesr - Proxmox VE Storage Replication

SYNOPSIS
--------

include::pvesr.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Storage Replication
===================
:pve-toplevel:
endif::manvolnum[]
The `pvesr` command line tool manages the {PVE} storage replication
framework. Storage replication brings redundancy for guests using
local storage and reduces migration time.

It replicates guest volumes to another node so that all data is available
without using shared storage. Replication uses snapshots to minimize traffic
sent over the network. Therefore, new data is sent only incrementally after
the initial full sync. In the case of a node failure, your guest data is
still available on the replicated node.

Replication runs automatically at configurable intervals. The minimum
replication interval is one minute, and the maximum interval is once a week.
The format used to specify those intervals is a subset of `systemd` calendar
events, see the xref:pvesr_schedule_time_format[Schedule Format] section.

It is possible to replicate a guest to multiple target nodes,
but not twice to the same target node.

Each replication job's bandwidth can be limited, to avoid overloading a
storage or server.

Guests with replication enabled can currently only be migrated offline.
Only changes since the last replication (so-called `deltas`) need to be
transferred if the guest is migrated to a node to which it is already
replicated. This reduces the time needed significantly. The replication
direction automatically switches if you migrate a guest to the replication
target node.

For example: VM 100 is currently on `nodeA` and gets replicated to `nodeB`.
You migrate it to `nodeB`, so now it gets automatically replicated back from
`nodeB` to `nodeA`.

If you migrate to a node where the guest is not replicated, the whole disk
data must be sent over. After the migration, the replication job continues to
replicate this guest to the configured nodes.

[IMPORTANT]
====
High-Availability is allowed in combination with storage replication, but there
may be some data loss between the last synced time and the time a node failed.
====

Supported Storage Types
-----------------------

.Storage Types
[width="100%",options="header"]
|============================================
|Description    |PVE type  |Snapshots|Stable
|ZFS (local)    |zfspool   |yes      |yes
|============================================

[[pvesr_schedule_time_format]]
Schedule Format
---------------

Replication uses xref:chapter_calendar_events[calendar events] for
configuring the schedule.

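For illustration, a few schedule strings in this calendar-event subset,
applied via `pvesr update` (the job ID `100-0` is a placeholder; the command
itself is covered in the CLI examples further down):

```shell
# run every 30 minutes instead of the default of every 15 minutes
pvesr update 100-0 --schedule '*/30'

# run every day at 02:30
pvesr update 100-0 --schedule '2:30'

# run Monday to Friday at 15:00
pvesr update 100-0 --schedule 'mon..fri 15:00'
```
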
Error Handling
--------------

If a replication job encounters problems, it is placed in an error state.
In this state, the configured replication intervals get suspended
temporarily. The failed replication is then retried automatically at
30-minute intervals. Once this succeeds, the original schedule gets
activated again.

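The state of the local replication jobs, including any job stuck in the
error state, can be checked from the command line. A brief sketch; the exact
output columns may differ between {PVE} versions:

```shell
# show the state of all replication jobs on the local node
pvesr status

# limit the output to a single guest, here VMID 100
pvesr status --guest 100
```
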
Possible issues
~~~~~~~~~~~~~~~

Some of the most common issues are listed below. Depending on your
setup, there may be other causes.

* Network is not working.

* No free space left on the replication target storage.

* No storage with the same storage ID is available on the target node.

NOTE: You can always use the replication log to find out what is causing the problem.

Migrating a guest in case of Error
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// FIXME: move this to better fitting chapter (sysadmin ?) and only link to
// it here

In the case of a grave error, a virtual guest may get stuck on a failed
node. You then need to move it manually to a working node again.

Example
~~~~~~~

Let's assume that you have two guests (VM 100 and CT 200) running on node A,
which replicate to node B. Node A failed and cannot be brought back online.
Now you have to migrate the guests to node B manually.

- connect to node B over ssh or open its shell via the WebUI

- check that the cluster is quorate
+
----
# pvecm status
----

- If you have no quorum, we strongly advise fixing this first and making the
node operable again. Only if this is not possible at the moment, you may
use the following command to enforce quorum on the current node:
+
----
# pvecm expected 1
----

WARNING: Avoid, at all costs, changes which affect the cluster while
`expected votes` is set (for example, adding/removing nodes, storages, or
virtual guests). Only use it to get vital guests up and running again or to
resolve the quorum issue itself.

- move both guest configuration files from the origin node A to node B:
+
----
# mv /etc/pve/nodes/A/qemu-server/100.conf /etc/pve/nodes/B/qemu-server/100.conf
# mv /etc/pve/nodes/A/lxc/200.conf /etc/pve/nodes/B/lxc/200.conf
----

- Now you can start the guests again:
+
----
# qm start 100
# pct start 200
----

Remember to replace the VMIDs and node names with your respective values.

Managing Jobs
-------------

[thumbnail="screenshot/gui-qemu-add-replication-job.png"]

You can use the web GUI to create, modify, and remove replication jobs
easily. Additionally, the command line interface (CLI) tool `pvesr` can be
used to do this.

You can find the replication panel on all levels (datacenter, node, virtual
guest) in the web GUI. They differ in which jobs get shown:
all, node-specific, or guest-specific jobs.

When adding a new job, you need to specify the guest, if not already
selected, as well as the target node. The replication
xref:pvesr_schedule_time_format[schedule] can be set if the default of `all
15 minutes` is not desired. You may also impose a rate limit on a replication
job; the rate limit can help to keep the load on the storage acceptable.

A replication job is identified by a cluster-wide unique ID. This ID is
composed of the VMID in addition to a job number.
This ID only needs to be specified manually if the CLI tool is used.

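On the CLI, the configured jobs and their IDs can be listed with
`pvesr list`. A short sketch; the output formatting may vary by version:

```shell
# list all replication jobs in the cluster, together with their job IDs
pvesr list
```
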
Command Line Interface Examples
-------------------------------

Create a replication job which runs every 5 minutes, with a limited bandwidth
of 10 MB/s (megabytes per second), for the guest with ID 100.

----
# pvesr create-local-job 100-0 pve1 --schedule "*/5" --rate 10
----

Disable an active job with ID `100-0`.

----
# pvesr disable 100-0
----

Enable a deactivated job with ID `100-0`.

----
# pvesr enable 100-0
----

Change the schedule interval of the job with ID `100-0` to once per hour.

----
# pvesr update 100-0 --schedule '*/00'
----

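Finally, a job that is no longer needed can be removed again, using the job
ID from the examples above (a brief sketch; removal is carried out by the
replication framework itself):

```shell
# remove the replication job with ID 100-0; the job is marked for
# removal and cleaned up by the replication framework
pvesr delete 100-0
```
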
ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]