X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pmxcfs.adoc;h=5a68598b9621319db66dbcfc2b1f9789f779bcb9;hp=2e3dd514691fcef078703a3423f05f4c203b2a13;hb=34ee078e66d5593492fe64d018696a2541075710;hpb=194d2f296102b7693c5915ff803e225f6d3b6526

diff --git a/pmxcfs.adoc b/pmxcfs.adoc
index 2e3dd51..5a68598 100644
--- a/pmxcfs.adoc
+++ b/pmxcfs.adoc
@@ -1,7 +1,6 @@
 ifdef::manvolnum[]
 pmxcfs(8)
 =========
-include::attributes.txt[]
 :pve-toplevel:
 
 NAME
@@ -12,7 +11,7 @@ pmxcfs - Proxmox Cluster File System
 SYNOPSIS
 --------
 
-include::pmxcfs.8-cli.adoc[]
+include::pmxcfs.8-synopsis.adoc[]
 
 DESCRIPTION
 -----------
@@ -21,7 +20,6 @@ endif::manvolnum[]
 ifndef::manvolnum[]
 Proxmox Cluster File System (pmxcfs)
 ====================================
-include::attributes.txt[]
 :pve-toplevel:
 endif::manvolnum[]
 
@@ -174,34 +172,48 @@ The recommended way is to reinstall the node after you removed it from your
 cluster. This makes sure that all secret cluster/ssh keys and any
 shared configuration data is destroyed.
 
-In some cases, you might prefer to put a node back to local mode
-without reinstall, which is described here:
+In some cases, you might prefer to put a node back to local mode without
+reinstall, which is described in
+<>
 
-* stop the cluster file system in `/etc/pve/`
 
-  # systemctl stop pve-cluster
+Recovering/Moving Guests from Failed Nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-* start it again but forcing local mode
+For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
+`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as
+owner of the respective guest. This concept enables the usage of local locks
+instead of expensive cluster-wide locks for preventing concurrent guest
+configuration changes.
 
-  # pmxcfs -l
+As a consequence, if the owning node of a guest fails (e.g., because of a power
+outage, fencing event, etc.), a regular migration is not possible (even if all
+the disks are located on shared storage) because such a local lock on the
+(dead) owning node is unobtainable. This is not a problem for HA-managed
+guests, as {pve}'s High Availability stack includes the necessary
+(cluster-wide) locking and watchdog functionality to ensure correct and
+automatic recovery of guests from fenced nodes.
 
-* remove the cluster configuration
+If a non-HA-managed guest has only shared disks (and no other local resources
+available only on the failed node are configured), a manual recovery is
+possible by simply moving the guest configuration file from the failed
+node's directory in `/etc/pve/` to an alive node's directory (which changes the
+logical owner or location of the guest).
 
-  # rm /etc/pve/cluster.conf
-  # rm /etc/cluster/cluster.conf
-  # rm /var/lib/pve-cluster/corosync.authkey
+For example, recovering the VM with ID `100` from a dead `node1` to another
+node `node2` works with the following command, executed as root on any member
+node of the cluster:
 
-* stop the cluster file system again
+ mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/
 
-  # systemctl stop pve-cluster
-
-* restart PVE services (or reboot)
-
-  # systemctl start pve-cluster
-  # systemctl restart pvedaemon
-  # systemctl restart pveproxy
-  # systemctl restart pvestatd
+WARNING: Before manually recovering a guest like this, make absolutely sure
+that the failed source node is really powered off/fenced. Otherwise {pve}'s
+locking principles are violated by the `mv` command, which can have unexpected
+consequences.
 
+WARNING: Guests with local disks (or other local resources which are only
+available on the dead node) are not recoverable like this. Either wait for the
+failed node to rejoin the cluster or restore such guests from backups.
 
 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
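
For illustration only (not part of the diff above), the recovery flow described
by the new section could look roughly like the following sketch. It assumes the
example names from the hunk (dead `node1`, surviving `node2`, VM ID `100` with
all disks on shared storage), that `node1` has already been confirmed powered
off/fenced, and uses the standard `pvecm` and `qm` CLI tools; for containers,
the directory would be `lxc/` and the start command `pct start`.

 # on any remaining member node: check that the cluster is quorate
 # and that node1 is no longer part of the membership
 pvecm status
 # the dead node still owns the guest, so its config sits in node1's directory
 ls /etc/pve/nodes/node1/qemu-server/
 # move the config into node2's qemu-server directory, making node2 the owner
 mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/
 # on node2: start the recovered VM
 qm start 100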