ifdef::manvolnum[]
pvesr(1)
========
:pve-toplevel:

NAME
----

pvesr - Proxmox VE Storage Replication

SYNOPSIS
--------

include::pvesr.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Storage Replication
===================
:pve-toplevel:
endif::manvolnum[]

The `pvesr` command-line tool manages the {PVE} storage replication
framework. Storage replication brings redundancy for guests using
local storage and reduces migration time.

It replicates guest volumes to another node, so that all data is available
without using shared storage. Replication uses snapshots to minimize the
traffic sent over the network. Therefore, new data is sent only
incrementally after an initial full sync. In the case of a node failure,
your guest data is still available on the replicated node.

Replication is done automatically at configurable intervals. The minimum
replication interval is one minute, and the maximum is once a week. The
format used to specify those intervals is a subset of `systemd` calendar
events; see the xref:pvesr_schedule_time_format[Schedule Format] section.

Every guest can be replicated to multiple target nodes, but a guest cannot
be replicated twice to the same target node.

Each replication job's bandwidth can be limited, in order to avoid
overloading a storage or a server.

Guests with active replication cannot currently use online migration.
Offline migration is supported in general. If you migrate to a node where
the guest's data is already replicated, only the changes since the last
synchronization (the so-called `delta`) must be sent, which reduces the
required time significantly. In this case, the replication direction is
also switched automatically after the migration has finished.

For example: VM100 is currently on `nodeA` and gets replicated to `nodeB`.
You migrate it to `nodeB`, so now it gets automatically replicated back
from `nodeB` to `nodeA`.

If you migrate to a node where the guest is not replicated, the whole disk
data must be sent over. After the migration, the replication job continues
to replicate this guest to the configured nodes.

[IMPORTANT]
====
High-Availability is allowed in combination with storage replication, but it
has the following implications:

* redistributing services after a more preferred node comes online will lead
to errors.

* recovery works, but there may be some data loss between the last synced
time and the time a node failed.
====

Supported Storage Types
-----------------------

.Storage Types
[width="100%",options="header"]
|============================================
|Description |PVE type |Snapshots|Stable
|ZFS (local) |zfspool |yes |yes
|============================================

[[pvesr_schedule_time_format]]
Schedule Format
---------------

{pve} has a very flexible replication scheduler. It is based on the systemd
time calendar event format.footnote:[See `man 7 systemd.time` for more information.]
Calendar events may be used to refer to one or more points in time in a
single expression.

Such a calendar event uses the following format:

----
[day(s)] [[start-time(s)][/repetition-time(s)]]
----

This allows you to configure a set of days on which the job should run.
You can also set one or more start times; these tell the replication
scheduler the moments in time when a job should start. With this
information, we could create a job which runs every workday at 10 PM:
`'mon,tue,wed,thu,fri 22'`, which could be abbreviated to `'mon..fri 22'`.
Most reasonable schedules can be written quite intuitively this way.

NOTE: Hours are set in 24-hour format.

To allow easier and shorter configuration, one or more repetition times can
be set. They indicate that replications are done on the start-time(s)
themselves, and again on the start-time(s) plus all multiples of the
repetition value. If you want to start replication at 8 AM and repeat it
every 15 minutes, you would use: `'8:00/15'`
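
To make the repetition rule concrete, the following is a small illustrative
Python sketch (not the actual {pve} scheduler code), which expands a start
time plus a repetition in minutes, such as `'8:00/15'`, into the resulting
run times. It assumes the repetition simply continues until the end of the
day.

[source,python]
----
# Illustrative sketch only -- not the actual Proxmox VE scheduler.
def expand_start_times(start_hour, start_minute, repetition):
    """Expand a start time plus a repetition (in minutes) into all
    run times within one day, e.g. 8:00/15 -> 8:00, 8:15, 8:30, ..."""
    times = []
    minute_of_day = start_hour * 60 + start_minute
    while minute_of_day < 24 * 60:  # stop at the end of the day (assumption)
        times.append(f"{minute_of_day // 60}:{minute_of_day % 60:02d}")
        minute_of_day += repetition
    return times

print(expand_start_times(8, 0, 15)[:3])  # ['8:00', '8:15', '8:30']
----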

Here you can also see that if no hour separator (`:`) is used, the value is
interpreted as minutes. If such a separator is used, the value on the left
denotes the hour(s), and the value on the right denotes the minute(s).
Further, you can use `*` to match all possible values.

To get additional ideas, look at the
xref:pvesr_schedule_format_examples[examples below].

Detailed Specification
~~~~~~~~~~~~~~~~~~~~~~

days:: Days are specified with an abbreviated English version: `sun, mon,
tue, wed, thu, fri and sat`. You may use multiple days as a comma-separated
list. A range of days can also be set by specifying the start and end day,
separated by ``..'', for example `mon..fri`. These formats can also be
mixed. If omitted, `'*'` is assumed.

time-format:: A time format consists of hours and minutes interval lists.
Hours and minutes are separated by `':'`. Both hours and minutes can be
lists and ranges of values, using the same format as days.
Hours come first, then minutes; hours can be omitted if not needed, in
which case `'*'` is assumed for the value of hours.
The valid range for values is `0-23` for hours and `0-59` for minutes.
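
As an illustration of the day specification above, here is a hypothetical
Python sketch of a parser for comma-separated lists and `..` ranges of days.
It is not the real {pve} parser; in particular, wrapping ranges such as
`sat..sun` are not handled in this simplified version.

[source,python]
----
# Illustrative sketch only -- not the actual Proxmox VE parser.
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]

def parse_days(spec):
    """Parse a day specification such as 'mon..fri' or 'mon,wed,fri'
    (mixed forms are allowed) into a list of day names."""
    if spec == "*":
        return DAYS[:]  # omitted day spec / '*' means all days
    result = []
    for part in spec.split(","):
        if ".." in part:
            start, end = part.split("..")
            i, j = DAYS.index(start), DAYS.index(end)
            result.extend(DAYS[i:j + 1])  # inclusive forward range
        else:
            result.append(part)
    return result

print(parse_days("mon..wed,fri"))  # ['mon', 'tue', 'wed', 'fri']
----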

[[pvesr_schedule_format_examples]]
Examples
~~~~~~~~

.Schedule Examples
[width="100%",options="header"]
|==============================================================================
|Schedule String |Alternative |Meaning
|mon,tue,wed,thu,fri |mon..fri |All working days at 0:00
|sat,sun |sat..sun |Only on weekends at 0:00
|mon,wed,fri |-- |Only on Monday, Wednesday and Friday at 0:00
|12:05 |12:05 |Every day at 12:05 PM
|*/5 |0/5 |Every five minutes
|mon..wed 30/10 |mon,tue,wed 30/10 |Monday, Tuesday and Wednesday, 30, 40 and 50 minutes after every full hour
|mon..fri 8..17,22:0/15 |-- |All working days, every 15 minutes between 8 AM and 5 PM, plus at 10 PM
|fri 12..13:5/20 |fri 12,13:5/20 |Friday at 12:05, 12:25, 12:45, 13:05, 13:25 and 13:45
|12..22:5/2 |12:5/2 |Every day, starting at 12:05 until 22:05, every 2 hours
|* |*/1 |Every minute (minimum interval)
|==============================================================================

Error Handling
--------------

If a replication job encounters problems, it is placed in an error state.
In this state, the configured replication intervals get suspended
temporarily. The failed replication is then retried at 30-minute intervals;
once this succeeds, the original schedule gets activated again.
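
The retry behavior described above can be sketched as follows. This is
illustrative Python only, assuming a fixed 30-minute interval counted from
the time of the failure:

[source,python]
----
# Illustrative sketch of the 30-minute retry interval described above.
from datetime import datetime, timedelta

def retry_times(failed_at, attempts=3):
    """Return the next few retry timestamps after a failed replication."""
    return [failed_at + timedelta(minutes=30 * (i + 1)) for i in range(attempts)]

for t in retry_times(datetime(2024, 1, 1, 12, 0)):
    print(t.strftime("%H:%M"))  # 12:30, 13:00, 13:30
----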

Possible Issues
~~~~~~~~~~~~~~~

This list covers only the most common issues; depending on your setup,
there may be another cause too.

* Network is not working.

* No free space left on the replication target storage.

* Storage with the same storage ID is not available on the target node.

NOTE: You can always use the replication log to get hints about a
problem's cause.

Migrating a Guest in Case of Error
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// FIXME: move this to better fitting chapter (sysadmin ?) and only link to
// it here

In the case of a grave error, a virtual guest may get stuck on a failed
node. You then need to move it manually to a working node again.

Example
~~~~~~~

Let's assume that you have two guests (VM 100 and CT 200) running on node A
and replicating to node B. Node A has failed and cannot get back online.
Now you have to migrate the guests to node B manually.

- connect to node B over ssh or open its shell via the WebUI

- check that the cluster is quorate
+
----
# pvecm status
----

- If you have no quorum, we strongly advise fixing this first and making
the node operable again. Only if this is not possible at the moment should
you use the following command to enforce quorum on the current node:
+
----
# pvecm expected 1
----

WARNING: While expected votes are set, avoid at all costs any changes which
affect the cluster (for example adding/removing nodes, storages or virtual
guests). Only use this to get vital guests up and running again, or to
resolve the quorum issue itself.

- move both guest configuration files from the origin node A to node B:
+
----
# mv /etc/pve/nodes/A/qemu-server/100.conf /etc/pve/nodes/B/qemu-server/100.conf
# mv /etc/pve/nodes/A/lxc/200.conf /etc/pve/nodes/B/lxc/200.conf
----

- Now you can start the guests again:
+
----
# qm start 100
# pct start 200
----

Remember to replace the VMIDs and node names with your respective values.

Managing Jobs
-------------

You can use the web GUI to create, modify, and remove replication jobs
easily. Additionally, the command-line interface (CLI) tool `pvesr` can be
used to do this.

You can find the replication panel on all levels (datacenter, node, virtual
guest) in the web GUI. They differ in which jobs get shown: all jobs, only
node-specific jobs, or only guest-specific jobs.

// TODO insert auto generated images of add web UI dialog

When adding a new job, you need to specify the virtual guest (if not
already selected) and the target node. The replication
xref:pvesr_schedule_time_format[schedule] can be set, if the default of
`all 15 minutes` is not desired. You may also impose a rate limit on a
replication job; this can help to keep the storage load acceptable.

A replication job is identified by a cluster-wide unique ID. This ID is
composed of the VMID plus a job number. The ID only needs to be specified
manually if the CLI tool is used.
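
Assuming the `<vmid>-<job number>` layout described above, such an ID can
also be split programmatically. This is an illustrative Python sketch, not
part of the `pvesr` tooling:

[source,python]
----
# Illustrative sketch: split a replication job ID such as '100-0'
# into its VMID and job-number parts.
def parse_job_id(job_id):
    vmid, jobnum = job_id.rsplit("-", 1)
    return int(vmid), int(jobnum)

print(parse_job_id("100-0"))  # (100, 0)
----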


Command Line Interface Examples
-------------------------------

Create a replication job which runs every 5 minutes, with a limited
bandwidth of 10 MB/s (megabytes per second), for the guest with ID 100:

----
# pvesr create-local-job 100-0 pve1 --schedule "*/5" --rate 10
----

Disable the active job with ID `100-0`:

----
# pvesr disable 100-0
----

Enable the deactivated job with ID `100-0`:

----
# pvesr enable 100-0
----

Change the schedule interval of the job with ID `100-0` to once an hour:

----
# pvesr update 100-0 --schedule '*/00'
----

ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]