[[chapter_pvesr]]
ifdef::manvolnum[]
pvesr(1)
========
:pve-toplevel:

NAME
----

pvesr - Proxmox VE Storage Replication

SYNOPSIS
--------

include::pvesr.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Storage Replication
===================
:pve-toplevel:
endif::manvolnum[]

The `pvesr` command line tool manages the {PVE} storage replication
framework. Storage replication brings redundancy for guests using
local storage and reduces migration time.

It replicates guest volumes to another node so that all data is available
without using shared storage. Replication uses snapshots to minimize traffic
sent over the network. Therefore, new data is sent only incrementally after
the initial full sync. In the case of a node failure, your guest data is
still available on the replicated node.

The replication is done automatically at configurable intervals.
The minimum replication interval is one minute, and the maximum is
once a week. The format used to specify those intervals is a subset of
`systemd` calendar events, see the
xref:pvesr_schedule_time_format[Schedule Format] section.

It is possible to replicate a guest to multiple target nodes,
but not twice to the same target node.

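For instance, replicating one guest to two different targets can be sketched from the CLI as follows (the node names `nodeB` and `nodeC` and the job numbers are illustrative; each job gets its own ID):

----
# pvesr create-local-job 100-0 nodeB --schedule '*/15'
# pvesr create-local-job 100-1 nodeC --schedule '*/15'
----
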
Each replication job's bandwidth can be limited, to avoid overloading a
storage or server.

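As a sketch, such a limit can be set when creating a job or changed later with `pvesr update` (the job ID and value are illustrative; `--rate` is given in MB/s):

----
# pvesr update 100-0 --rate 10
----
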
Guests with replication enabled can currently only be migrated offline.
Only changes since the last replication (so-called `deltas`) need to be
transferred if the guest is migrated to a node to which it is already
replicated. This reduces the time needed significantly. The replication
direction automatically switches if you migrate a guest to the replication
target node.

For example: VM100 is currently on `nodeA` and gets replicated to `nodeB`.
You migrate it to `nodeB`, so now it gets automatically replicated back from
`nodeB` to `nodeA`.

If you migrate to a node where the guest is not replicated, the whole disk
data must be sent over. After the migration, the replication job continues to
replicate this guest to the configured nodes.

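The scenario above maps onto the standard migration commands; a sketch with illustrative IDs and node names (`qm` for VMs, `pct` for containers):

----
# qm migrate 100 nodeB
----
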
[IMPORTANT]
====
High-Availability is allowed in combination with storage replication, but it
has the following implications:

* as live-migrations are currently not possible, redistributing services after
a more preferred node comes online does not work. Keep that in mind when
configuring your HA groups and their priorities for replicated guests.

* recovery works, but there may be some data loss between the last synced
time and the time a node failed.
====

Supported Storage Types
-----------------------

.Storage Types
[width="100%",options="header"]
|============================================
|Description |PVE type |Snapshots|Stable
|ZFS (local) |zfspool |yes |yes
|============================================

[[pvesr_schedule_time_format]]
Schedule Format
---------------

{pve} has a very flexible replication scheduler. It is based on the systemd
time calendar event format.footnote:[see `man 7 systemd.time` for more information]
Calendar events may be used to refer to one or more points in time in a
single expression.

Such a calendar event uses the following format:

----
[day(s)] [[start-time(s)][/repetition-time(s)]]
----

This format allows you to configure a set of days on which the job should run.
You can also set one or more start times; these tell the replication scheduler
the moments in time when a job should start.
With this information, we can create a job which runs every workday at 10
PM: `'mon,tue,wed,thu,fri 22'`, which can be abbreviated to `'mon..fri
22'`; most reasonable schedules can be written quite intuitively this way.

NOTE: Hours are formatted in 24-hour format.

To allow a convenient and shorter configuration, one or more repeat times per
guest can be set. They indicate that replications are done on the start-time(s)
themselves and on the start-time(s) plus all multiples of the repetition value.
If you want to start replication at 8 AM and repeat it every 15 minutes until
9 AM, you would use: `'8:00/15'`

Here you see that if no hour separator (`:`) is used, the value is
interpreted as minutes. If such a separator is used, the value on the left
denotes the hour(s), and the value on the right denotes the minute(s).
Further, you can use `*` to match all possible values.

To get additional ideas, look at
xref:pvesr_schedule_format_examples[more Examples below].

Detailed Specification
~~~~~~~~~~~~~~~~~~~~~~

days:: Days are specified with an abbreviated English version: `sun, mon,
tue, wed, thu, fri and sat`. You may use multiple days as a comma-separated
list. A range of days can also be set by specifying the start and end day
separated by ``..'', for example `mon..fri`. These formats can be mixed.
If omitted, `'*'` is assumed.

time-format:: A time format consists of hours and minutes interval lists.
Hours and minutes are separated by `':'`. Both hours and minutes can be
lists and ranges of values, using the same format as days.
First come hours, then minutes. Hours can be omitted if not needed. In this
case, `'*'` is assumed for the value of hours.
The valid range of values is `0-23` for hours and `0-59` for minutes.

[[pvesr_schedule_format_examples]]
Examples:
~~~~~~~~~

.Schedule Examples
[width="100%",options="header"]
|==============================================================================
|Schedule String |Alternative |Meaning
|mon,tue,wed,thu,fri |mon..fri |Every working day at 0:00
|sat,sun |sat..sun |Only on weekends at 0:00
|mon,wed,fri |-- |Only on Monday, Wednesday and Friday at 0:00
|12:05 |12:05 |Every day at 12:05 PM
|*/5 |0/5 |Every five minutes
|mon..wed 30/10 |mon,tue,wed 30/10 |Monday, Tuesday, Wednesday 30, 40 and 50 minutes after every full hour
|mon..fri 8..17,22:0/15 |-- |Every working day every 15 minutes between 8 AM and 6 PM and between 10 PM and 11 PM
|fri 12..13:5/20 |fri 12,13:5/20 |Friday at 12:05, 12:25, 12:45, 13:05, 13:25 and 13:45
|12,14,16,18,20,22:5 |12/2:5 |Every day starting at 12:05 until 22:05, every 2 hours
|* |*/1 |Every minute (minimum interval)
|==============================================================================

Error Handling
--------------

If a replication job encounters problems, it is placed in an error state.
In this state, the configured replication intervals get suspended
temporarily. The failed replication is then retried at 30-minute intervals.
Once this succeeds, the original schedule gets activated again.

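To check whether a job has entered the error state, the jobs can be inspected from the CLI; a sketch with an illustrative guest ID (the `--guest` filter restricts the output to one guest):

----
# pvesr status --guest 100
----
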
Possible issues
~~~~~~~~~~~~~~~

Some of the most common issues are in the following list. Depending on your
setup, there may be another cause.

* Network is not working.

* No free space left on the replication target storage.

* No storage with the same storage ID is available on the target node.

NOTE: You can always use the replication log to find out what is causing the problem.

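Besides the GUI, the log of a replication job can also be read directly on the source node; a sketch, assuming the default log location and an illustrative job ID:

----
# cat /var/log/pve/replicate/100-0
----
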
Migrating a guest in case of Error
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// FIXME: move this to better fitting chapter (sysadmin ?) and only link to
// it here

In the case of a grave error, a virtual guest may get stuck on a failed
node. You then need to move it manually to a working node again.

Example
~~~~~~~

Let's assume that you have two guests (VM 100 and CT 200) running on node A
and replicated to node B.
Node A failed and cannot get back online. Now you have to migrate the guests
to node B manually.

- connect to node B over ssh or open its shell via the WebUI

- check that the cluster is quorate
+
----
# pvecm status
----

- If you have no quorum, we strongly advise fixing this first and making the
node operable again. Only if this is not possible at the moment may you
use the following command to enforce quorum on the current node:
+
----
# pvecm expected 1
----

WARNING: Avoid changes which affect the cluster if `expected votes` is set
(for example adding/removing nodes, storages, virtual guests) at all costs.
Only use it to get vital guests up and running again or to resolve the quorum
issue itself.

- move both guest configuration files from the origin node A to node B:
+
----
# mv /etc/pve/nodes/A/qemu-server/100.conf /etc/pve/nodes/B/qemu-server/100.conf
# mv /etc/pve/nodes/A/lxc/200.conf /etc/pve/nodes/B/lxc/200.conf
----

- Now you can start the guests again:
+
----
# qm start 100
# pct start 200
----

Remember to replace the VMIDs and node names with your respective values.

Managing Jobs
-------------

[thumbnail="screenshot/gui-qemu-add-replication-job.png"]

You can use the web GUI to create, modify, and remove replication jobs
easily. Additionally, the command line interface (CLI) tool `pvesr` can be
used to do this.

You can find the replication panel on all levels (datacenter, node, virtual
guest) in the web GUI. They differ in which jobs get shown:
all, node-specific, or guest-specific jobs.

When adding a new job, you need to specify the guest, if not already selected,
as well as the target node. The replication
xref:pvesr_schedule_time_format[schedule] can be set if the default of `all
15 minutes` is not desired. You may also impose a rate limit on a replication
job; the rate limit can help to keep the load on the storage acceptable.

A replication job is identified by a cluster-wide unique ID. This ID is
composed of the VMID in addition to a job number.
This ID must only be specified manually if the CLI tool is used.

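From the CLI, the existing jobs and their IDs can be listed with `pvesr list`, and a single job's configuration read back with `pvesr read`; the job ID below is illustrative:

----
# pvesr list
# pvesr read 100-0
----
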
Command Line Interface Examples
-------------------------------

Create a replication job which runs every 5 minutes, with a limited bandwidth
of 10 MBps (megabytes per second), for the guest with ID 100.

----
# pvesr create-local-job 100-0 pve1 --schedule "*/5" --rate 10
----

Disable an active job with ID `100-0`.

----
# pvesr disable 100-0
----

Enable a deactivated job with ID `100-0`.

----
# pvesr enable 100-0
----

Change the schedule interval of the job with ID `100-0` to once per hour.

----
# pvesr update 100-0 --schedule '*/00'
----
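
Remove a job entirely with `pvesr delete` (illustrative job ID; the job is marked for removal and cleaned up afterwards).

----
# pvesr delete 100-0
----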

ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]