btrfs: document df weirdness and how to better get usage

[pve-docs.git] / pvesr.adoc
diff --git a/pvesr.adoc b/pvesr.adoc

index e307d0d2b4bcf38a22f45637d69ff404e4dd4dd3..a1a366c7bcbfd58f6f1043091c01d41c9a91ddac 100644 (file)
--- a/pvesr.adoc
+++ b/pvesr.adoc
@@ -1,3 +1,4 @@
+[[chapter_pvesr]]
  ifdef::manvolnum[]
  pvesr(1)
  ========
@@ -6,7 +7,7 @@ pvesr(1)
  NAME
  ----
  
-pvesr - Proxmox VE Storage Replication 
+pvesr - Proxmox VE Storage Replication
  
  SYNOPSIS
  --------
@@ -23,7 +24,267 @@ Storage Replication
  :pve-toplevel:
  endif::manvolnum[]
  
-The {PVE} storage replication tool (`pvesr`) ...
+The `pvesr` command line tool manages the {PVE} storage replication
+framework. Storage replication brings redundancy for guests using
+local storage and reduces migration time.
+
+It replicates guest volumes to another node so that all data is available
+without using shared storage. Replication uses snapshots to minimize traffic
+sent over the network. Therefore, new data is sent only incrementally after
+the initial full sync. In the case of a node failure, your guest data is
+still available on the replicated node.
+
+The replication is done automatically in configurable intervals.
+The minimum replication interval is one minute, and the maximal interval
+once a week. The format used to specify those intervals is a subset of
+`systemd` calendar events, see
+xref:pvesr_schedule_time_format[Schedule Format] section:
+
+It is possible to replicate a guest to multiple target nodes,
+but not twice to the same target node.
+
+Each replications bandwidth can be limited, to avoid overloading a storage
+or server.
+
+Guests with replication enabled can currently only be migrated offline.
+Only changes since the last replication (so-called `deltas`) need to be
+transferred if the guest is migrated to a node to which it already is
+replicated. This reduces the time needed significantly. The replication
+direction automatically switches if you migrate a guest to the replication
+target node.
+
+For example: VM100 is currently on `nodeA` and gets replicated to `nodeB`.
+You migrate it to `nodeB`, so now it gets automatically replicated back from
+`nodeB` to `nodeA`.
+
+If you migrate to a node where the guest is not replicated, the whole disk
+data must send over. After the migration, the replication job continues to
+replicate this guest to the configured nodes.
+
+[IMPORTANT]
+====
+High-Availability is allowed in combination with storage replication, but it
+has the following implications:
+
+* as live-migrations are currently not possible, redistributing services after
+  a more preferred node comes online does not work. Keep that in mind when
+  configuring your HA groups and their priorities for replicated guests.
+
+* recovery works, but there may be some data loss between the last synced
+  time and the time a node failed.
+====
+
+Supported Storage Types
+-----------------------
+
+.Storage Types
+[width="100%",options="header"]
+|============================================
+|Description    |PVE type    |Snapshots|Stable
+|ZFS (local)    |zfspool     |yes      |yes
+|============================================
+
+[[pvesr_schedule_time_format]]
+Schedule Format
+---------------
+
+{pve} has a very flexible replication scheduler. It is based on the systemd
+time calendar event format.footnote:[see `man 7 systemd.time` for more information]
+Calendar events may be used to refer to one or more points in time in a
+single expression.
+
+Such a calendar event uses the following format:
+
+----
+[day(s)] [[start-time(s)][/repetition-time(s)]]
+----
+
+This format allows you to configure a set of days on which the job should run.
+You can also set one or more start times. It tells the replication scheduler
+the moments in time when a job should start.
+With this information we, can create a job which runs every workday at 10
+PM: `'mon,tue,wed,thu,fri 22'` which could be abbreviated to: `'mon..fri
+22'`, most reasonable schedules can be written quite intuitive this way.
+
+NOTE: Hours are formatted in 24-hour format.
+
+To allow a convenient and shorter configuration, one or more repeat times per
+guest can be set. They indicate that replications are done on the start-time(s)
+itself and the start-time(s) plus all multiples of the repetition value. If
+you want to start replication at 8 AM and repeat it every 15 minutes until
+9 AM you would use: `'8:00/15'`
+
+Here you see that if no hour separation (`:`), is used the value gets
+interpreted as minute. If such a separation is used, the value on the left
+denotes the hour(s), and the value on the right denotes the minute(s).
+Further, you can use `*` to match all possible values.
+
+To get additional ideas look at
+xref:pvesr_schedule_format_examples[more Examples below].
+
+Detailed Specification
+~~~~~~~~~~~~~~~~~~~~~~
+
+days:: Days are specified with an abbreviated English version: `sun, mon,
+tue, wed, thu, fri and sat`. You may use multiple days as a comma-separated
+list. A range of days can also be set by specifying the start and end day
+separated by ``..'', for example `mon..fri`. These formats can be mixed.
+If omitted `'*'` is assumed.
+
+time-format:: A time format consists of hours and minutes interval lists.
+Hours and minutes are separated by `':'`. Both hour and minute can be list
+and ranges of values, using the same format as days.
+First are hours, then minutes. Hours can be omitted if not needed. In this
+case `'*'` is assumed for the value of hours.
+The valid range for values is `0-23` for hours and `0-59` for minutes.
+
+[[pvesr_schedule_format_examples]]
+Examples:
+~~~~~~~~~
+
+.Schedule Examples
+[width="100%",options="header"]
+|==============================================================================
+|Schedule String       |Alternative            |Meaning
+|mon,tue,wed,thu,fri   |mon..fri               |Every working day at 0:00
+|sat,sun               |sat..sun               |Only on weekends at 0:00
+|mon,wed,fri           |--                     |Only on Monday, Wednesday and Friday at 0:00
+|12:05                 |12:05                  |Every day at 12:05 PM
+|*/5                   |0/5                    |Every five minutes
+|mon..wed 30/10                |mon,tue,wed 30/10      |Monday, Tuesday, Wednesday 30, 40 and 50 minutes after every full hour
+|mon..fri 8..17,22:0/15        |--                     |Every working day every 15 minutes between 8 AM and 6 PM and between 10 PM and 11 PM
+|fri 12..13:5/20       |fri 12,13:5/20         |Friday at 12:05, 12:25, 12:45, 13:05, 13:25 and 13:45
+|12,14,16,18,20,22:5   |12/2:5                 |Every day starting at 12:05 until 22:05, every 2 hours
+|*                     |*/1                    |Every minute (minimum interval)
+|==============================================================================
+
+Error Handling
+--------------
+
+If a replication job encounters problems, it is placed in an error state.
+In this state, the configured replication intervals get suspended
+temporarily. The failed replication is repeatedly tried again in a
+30 minute interval.
+Once this succeeds, the original schedule gets activated again.
+
+Possible issues
+~~~~~~~~~~~~~~~
+
+Some of the most common issues are in the following list. Depending on your
+setup there may be another cause.
+
+* Network is not working.
+
+* No free space left on the replication target storage.
+
+* Storage with same storage ID available on the target node
+
+NOTE: You can always use the replication log to find out what is causing the problem.
+
+Migrating a guest in case of Error
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+// FIXME: move this to better fitting chapter (sysadmin ?) and only link to
+// it here
+
+In the case of a grave error, a virtual guest may get stuck on a failed
+node. You then need to move it manually to a working node again.
+
+Example
+~~~~~~~
+
+Let's assume that you have two guests (VM 100 and CT 200) running on node A
+and replicate to node B.
+Node A failed and can not get back online. Now you have to migrate the guest
+to Node B manually.
+
+- connect to node B over ssh or open its shell via the WebUI
+
+- check if that the cluster is quorate
++
+----
+# pvecm status
+----
+
+- If you have no quorum, we strongly advise to fix this first and make the
+  node operable again. Only if this is not possible at the moment, you may
+  use the following command to enforce quorum on the current node:
++
+----
+# pvecm expected 1
+----
+
+WARNING: Avoid changes which affect the cluster if `expected votes` are set
+(for example adding/removing nodes, storages, virtual guests) at all costs.
+Only use it to get vital guests up and running again or to resolve the quorum
+issue itself.
+
+- move both guest configuration files form the origin node A to node B:
++
+----
+# mv /etc/pve/nodes/A/qemu-server/100.conf /etc/pve/nodes/B/qemu-server/100.conf
+# mv /etc/pve/nodes/A/lxc/200.conf /etc/pve/nodes/B/lxc/200.conf
+----
+
+- Now you can start the guests again:
++
+----
+# qm start 100
+# pct start 200
+----
+
+Remember to replace the VMIDs and node names with your respective values.
+
+Managing Jobs
+-------------
+
+[thumbnail="screenshot/gui-qemu-add-replication-job.png"]
+
+You can use the web GUI to create, modify, and remove replication jobs
+easily. Additionally, the command line interface (CLI) tool `pvesr` can be
+used to do this.
+
+You can find the replication panel on all levels (datacenter, node, virtual
+guest) in the web GUI. They differ in which jobs get shown:
+all, node- or guest-specific jobs.
+
+When adding a new job, you need to specify the guest if not already selected
+as well as the target node. The replication
+xref:pvesr_schedule_time_format[schedule] can be set if the default of `all
+15 minutes` is not desired. You may impose a rate-limit on a replication
+job. The rate limit can help to keep the load on the storage acceptable.
+
+A replication job is identified by a cluster-wide unique ID. This ID is
+composed of the VMID in addition to a job number.
+This ID must only be specified manually if the CLI tool is used.
+
+
+Command Line Interface Examples
+-------------------------------
+
+Create a replication job which runs every 5 minutes with a limited bandwidth
+of 10 Mbps (megabytes per second) for the guest with ID 100.
+
+----
+# pvesr create-local-job 100-0 pve1 --schedule "*/5" --rate 10
+----
+
+Disable an active job with ID `100-0`.
+
+----
+# pvesr disable 100-0
+----
+
+Enable a deactivated job with ID `100-0`.
+
+----
+# pvesr enable 100-0
+----
+
+Change the schedule interval of the job with ID `100-0` to once per hour.
+
+----
+# pvesr update 100-0 --schedule '*/00'
+----
  
  ifdef::manvolnum[]
  include::pve-copyright.adoc[]