X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pvesr.adoc;h=2bcc4d9486705324f3ea13471e29707ccef39dce;hp=e307d0d2b4bcf38a22f45637d69ff404e4dd4dd3;hb=a75eeddebed864bde81358463958ed1d166935e1;hpb=c024f5539e4513e8c2f94f246b1f208f3ffcdc03

diff --git a/pvesr.adoc b/pvesr.adoc
index e307d0d..2bcc4d9 100644
--- a/pvesr.adoc
+++ b/pvesr.adoc
@@ -1,3 +1,4 @@
+[[chapter_pvesr]]
ifdef::manvolnum[]
pvesr(1)
========
@@ -6,7 +7,7 @@ pvesr(1)
NAME
----

-pvesr - Proxmox VE Storage Replication
+pvesr - Proxmox VE Storage Replication

SYNOPSIS
--------
@@ -23,7 +24,266 @@ Storage Replication
:pve-toplevel:
endif::manvolnum[]

-The {PVE} storage replication tool (`pvesr`) ...
+The `pvesr` command line tool manages the {PVE} storage replication
+framework. Storage replication brings redundancy for guests using
+local storage and reduces migration time.
+
+It replicates guest volumes to another node so that all data is available
+without using shared storage. Replication uses snapshots to minimize traffic
+sent over the network. Therefore, new data is sent only incrementally after
+an initial full sync. In the case of a node failure, your guest data is
+still available on the replicated node.
+
+Replication is done automatically at configurable intervals. The minimum
+replication interval is one minute, and the maximum is one week. The format
+used to specify those intervals is a subset of `systemd` calendar events,
+see the xref:pvesr_schedule_time_format[Schedule Format] section.
+
+Every guest can be replicated to multiple target nodes, but a guest cannot
+get replicated twice to the same target node.
+
+The bandwidth of each replication job can be limited, to avoid overloading
+a storage or server.
+
+Virtual guests with active replication cannot currently use online
+migration. Offline migration is supported in general. If you migrate to a
+node where the guest's data is already replicated, only the changes since
+the last synchronisation (the so-called `delta`) must be sent, which
+reduces the required time significantly. In this case, the replication
+direction will also switch nodes automatically after the migration has
+finished.
+
+For example: VM100 is currently on `nodeA` and gets replicated to `nodeB`.
+You migrate it to `nodeB`, so now it gets automatically replicated back from
+`nodeB` to `nodeA`.
+
+If you migrate to a node where the guest is not replicated, the whole disk
+data must be sent over. After the migration, the replication job continues
+to replicate this guest to the configured nodes.
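+
+You can get an overview of the configured jobs and their current state
+directly on the command line. The following is a minimal sketch; both
+commands are read-only:
+
+----
+# pvesr list
+# pvesr status
+----
+
+`pvesr list` prints all replication jobs of the cluster, while `pvesr
+status` shows the jobs relevant for the local node, including the time of
+the last and the next scheduled sync.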
+
+[IMPORTANT]
+====
+High-Availability is allowed in combination with storage replication, but it
+has the following implications:
+
+* redistributing services after a more preferred node comes online will lead
+  to errors.
+
+* recovery works, but there may be some data loss between the last synced
+  time and the time a node failed.
+====
+
+Supported Storage Types
+-----------------------
+
+.Storage Types
+[width="100%",options="header"]
+|============================================
+|Description    |PVE type   |Snapshots|Stable
+|ZFS (local)    |zfspool    |yes      |yes
+|============================================
+
+[[pvesr_schedule_time_format]]
+Schedule Format
+---------------
+
+{pve} has a very flexible replication scheduler. It is based on the systemd
+time calendar event format.footnote:[see `man 7 systemd.time` for more information]
+Calendar events may be used to refer to one or more points in time in a
+single expression.
+
+Such a calendar event uses the following format:
+
+----
+[day(s)] [[start-time(s)][/repetition-time(s)]]
+----
+
+This allows you to configure a set of days on which the job should run.
+You can also set one or more start times; they tell the replication
+scheduler the moments in time when a job should start.
+With this information, we can create a job which runs every workday at
+10 PM: `'mon,tue,wed,thu,fri 22'`, which can be abbreviated to
+`'mon..fri 22'`. Most reasonable schedules can be written quite intuitively
+this way.
+
+NOTE: Hours are set in 24-hour format.
+
+To allow easier and shorter configuration, one or more repetition times can
+be set. They indicate that replications are done on the start-time(s)
+itself and on the start-time(s) plus all multiples of the repetition value.
+If you want to start replication at 8 AM and repeat it every 15 minutes
+until 9 AM, you would use: `'8:00/15'`
+
+Here you also see that if no hour separator (`:`) is used, the value is
+interpreted as minutes. If such a separator is used, the value on the left
+denotes the hour(s) and the value on the right denotes the minute(s).
+Further, you can use `*` to match all possible values.
+
+To get additional ideas, look at the
+xref:pvesr_schedule_format_examples[examples below].
+
+Detailed Specification
+~~~~~~~~~~~~~~~~~~~~~~
+
+days:: Days are specified with an abbreviated English version: `sun, mon,
+tue, wed, thu, fri and sat`. You may use multiple days as a comma-separated
+list. A range of days can also be set by specifying the start and end day
+separated by ``..'', for example `mon..fri`. These formats can also be
+mixed. If omitted, `'*'` is assumed.
+
+time-format:: A time format consists of hours and minutes interval lists.
+Hours and minutes are separated by `':'`. Both hours and minutes can be
+lists and ranges of values, using the same format as days.
+First come hours, then minutes; hours can be omitted if not needed, in
+which case `'*'` is assumed for the value of hours.
+The valid range for values is `0-23` for hours and `0-59` for minutes.
+
+[[pvesr_schedule_format_examples]]
+Examples
+~~~~~~~~
+
+.Schedule Examples
+[width="100%",options="header"]
+|==============================================================================
+|Schedule String |Alternative |Meaning
+|mon,tue,wed,thu,fri |mon..fri |Every working day at 0:00
+|sat,sun |sat..sun |Only on weekends at 0:00
+|mon,wed,fri |-- |Only on Monday, Wednesday and Friday at 0:00
+|12:05 |12:05 |Every day at 12:05 PM
+|*/5 |0/5 |Every five minutes
+|mon..wed 30/10 |mon,tue,wed 30/10 |Monday, Tuesday, Wednesday 30, 40 and 50 minutes after every full hour
+|mon..fri 8..17,22:0/15 |-- |Every working day every 15 minutes between 8 AM and 6 PM and between 10 PM and 11 PM
+|fri 12..13:5/20 |fri 12,13:5/20 |Friday at 12:05, 12:25, 12:45, 13:05, 13:25 and 13:45
+|12,14,16,18,20,22:5 |12/2:5 |Every day starting at 12:05 until 22:05, every 2 hours
+|* |*/1 |Every minute (minimum interval)
+|==============================================================================
+
+Error Handling
+--------------
+
+If a replication job encounters problems, it is placed in an error state.
+In this state, the configured replication intervals get suspended
+temporarily. The failed replication is then retried at 30-minute intervals;
+once this succeeds, the original schedule is activated again.
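+
+Jobs in the error state are visible in the GUI and on the command line. As
+a minimal sketch, assuming a job with the hypothetical ID `100-0` has
+failed, you could check its state and, after resolving the underlying
+problem, trigger a new attempt right away instead of waiting for the next
+retry:
+
+----
+# pvesr status
+# pvesr schedule-now 100-0
+----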
+
+Possible issues
+~~~~~~~~~~~~~~~
+
+These are only the most common issues; depending on your setup, there may
+be other causes as well.
+
+* Network is not working.
+
+* No free space left on the replication target storage.
+
+* No storage with the same storage ID available on the target node.
+
+NOTE: You can always use the replication log to get hints about a problem's
+cause.
+
+Migrating a Guest in Case of Error
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+// FIXME: move this to better fitting chapter (sysadmin ?) and only link to
+// it here
+
+In the case of a grave error, a virtual guest may get stuck on a failed
+node. You then need to move it manually to a working node again.
+
+Example
+~~~~~~~
+
+Let's assume that you have two guests (VM 100 and CT 200) running on node A
+and replicated to node B.
+Node A failed and cannot come back online. Now you have to migrate the
+guests to node B manually.
+
+- connect to node B over ssh or open its shell via the WebUI
+
+- check that the cluster is quorate
++
+----
+# pvecm status
+----
+
+- If you have no quorum, we strongly advise to fix this first and to make
+  the node operable again. Only if this is not possible at the moment, you
+  may use the following command to enforce quorum on the current node:
++
+----
+# pvecm expected 1
+----
+
+WARNING: If expected votes are set, avoid changes which affect the cluster
+(for example adding/removing nodes, storages, virtual guests) at all costs.
+Only use it to get vital guests up and running again or to resolve the
+quorum issue itself.
+
+- move both guest configuration files from the origin node A to node B:
++
+----
+# mv /etc/pve/nodes/A/qemu-server/100.conf /etc/pve/nodes/B/qemu-server/100.conf
+# mv /etc/pve/nodes/A/lxc/200.conf /etc/pve/nodes/B/lxc/200.conf
+----
+
+- Now you can start the guests again:
++
+----
+# qm start 100
+# pct start 200
+----
+
+Remember to replace the VMIDs and node names with your respective values.
+
+Managing Jobs
+-------------
+
+[thumbnail="screenshot/gui-qemu-add-replication-job.png"]
+
+You can use the web GUI to create, modify and remove replication jobs
+easily. Additionally, the command line interface (CLI) tool `pvesr` can be
+used to do this.
+
+You can find the replication panel on all levels (datacenter, node, virtual
+guest) in the web GUI. They differ in which jobs are shown: all, only
+node-specific, or only guest-specific jobs.
+
+When adding a new job, you need to specify the virtual guest (if not
+already selected) and the target node. The replication
+xref:pvesr_schedule_time_format[schedule] can be set if the default of
+every 15 minutes is not desired. You may also impose a rate limit on a
+replication job; this can help to keep the load on the storage acceptable.
+
+A replication job is identified by a cluster-wide unique ID. This ID is
+composed of the VMID and a job number.
+This ID must only be specified manually if the CLI tool is used.
+
+
+Command Line Interface Examples
+-------------------------------
+
+Create a replication job which will run every 5 minutes, with a bandwidth
+limit of 10 MB/s (megabytes per second), for the guest with ID 100:
+
+----
+# pvesr create-local-job 100-0 pve1 --schedule "*/5" --rate 10
+----
+
+Disable an active job with ID `100-0`:
+
+----
+# pvesr disable 100-0
+----
+
+Enable a deactivated job with ID `100-0`:
+
+----
+# pvesr enable 100-0
+----
+
+Change the schedule interval of the job with ID `100-0` to once per hour:
+
+----
+# pvesr update 100-0 --schedule '*/00'
+----

ifdef::manvolnum[]
include::pve-copyright.adoc[]