X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pmxcfs.adoc;h=7b9cfac0d4faa0b51506f7d0a5f41389dc2182ef;hp=12e51d19968dfd50ea0f1bad6b27fe20973c4bc1;hb=424214c1174b8bcf304b6afa335bec9dd121dc51;hpb=a69bfc83f6d2b79e94eeb39781d89b720b4482dc

diff --git a/pmxcfs.adoc b/pmxcfs.adoc
index 12e51d1..7b9cfac 100644
--- a/pmxcfs.adoc
+++ b/pmxcfs.adoc
@@ -1,3 +1,4 @@
+[[chapter_pmxcfs]]
 ifdef::manvolnum[]
 pmxcfs(8)
 =========
@@ -99,6 +100,7 @@ Files
 |`datacenter.cfg` | {pve} datacenter wide configuration (keyboard layout, proxy, ...)
 |`user.cfg` | {pve} access control configuration (users/groups/...)
 |`domains.cfg` | {pve} authentication domains
+|`status.cfg` | {pve} external metrics server configuration
 |`authkey.pub` | Public key used by ticket system
 |`pve-root-ca.pem` | Public certificate of cluster CA
 |`priv/shadow.cfg` | Shadow password file
@@ -145,11 +147,11 @@ Enable/Disable debugging
 
 You can enable verbose syslog messages with:
 
- echo "1" >/etc/pve/.debug
+ echo "1" >/etc/pve/.debug
 
 And disable verbose syslog messages with:
 
- echo "0" >/etc/pve/.debug
+ echo "0" >/etc/pve/.debug
 
 
 Recovery
@@ -176,6 +178,45 @@ In some cases, you might prefer to put a node back to local mode without
 reinstall, which is described in <>
 
 
+
+Recovering/Moving Guests from Failed Nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
+`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as
+the owner of the respective guest. This concept enables the use of local
+locks instead of expensive cluster-wide locks for preventing concurrent guest
+configuration changes.
+
+As a consequence, if the owning node of a guest fails (e.g., because of a power
+outage or a fencing event), a regular migration is not possible (even if all
+the disks are located on shared storage), because such a local lock on the
+(dead) owning node is unobtainable. This is not a problem for HA-managed
+guests, as {pve}'s High Availability stack includes the necessary
+(cluster-wide) locking and watchdog functionality to ensure correct and
+automatic recovery of guests from fenced nodes.
+
+If a non-HA-managed guest has only shared disks (and no other local resources
+available only on the failed node are configured), a manual recovery
+is possible by simply moving the guest configuration file from the failed
+node's directory in `/etc/pve/` to an alive node's directory (which changes the
+logical owner of the guest).
+
+For example, recovering the VM with ID `100` from a dead `node1` to another
+node `node2` works with the following command, executed when logged in as root
+on any member node of the cluster:
+
+ mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/
+
+WARNING: Before manually recovering a guest like this, make absolutely sure
+that the failed source node is really powered off/fenced. Otherwise, the `mv`
+command violates {pve}'s locking principles, which can have unexpected
+consequences.
+
+WARNING: Guests with local disks (or other local resources which are only
+available on the dead node) are not recoverable like this. Either wait for the
+failed node to rejoin the cluster, or restore such guests from backups.
+
 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
 endif::manvolnum[]
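The manual recovery the added section describes (moving a guest's configuration file between node directories in pmxcfs) can be sketched on a scratch directory. The snippet below is a simulation only: a temporary path stands in for the real `/etc/pve` mount, and the node names `node1`/`node2` and VMID `100` are taken from the example above.

```shell
#!/bin/sh
# Sketch only: simulates the ownership change performed by the `mv` in the
# section above, using a scratch directory instead of the real /etc/pve mount.
# Node names (node1, node2) and VMID (100) are taken from the example.
set -eu

PVE=$(mktemp -d)  # stands in for /etc/pve

# pmxcfs layout: each node owns the guest configs under its own directory
mkdir -p "$PVE/nodes/node1/qemu-server" "$PVE/nodes/node2/qemu-server"
printf 'name: testvm\n' > "$PVE/nodes/node1/qemu-server/100.conf"

# The recovery step: moving the config file makes node2 the logical owner
mv "$PVE/nodes/node1/qemu-server/100.conf" "$PVE/nodes/node2/qemu-server/"

ls "$PVE/nodes/node2/qemu-server"  # prints: 100.conf
```

On a real cluster the same `mv` is run against `/etc/pve` itself, and only after the failed node is confirmed powered off or fenced, as the warnings above stress.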