X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=ha-manager.adoc;h=5db5b052e44d3f1817298a284ca8b65efc8eadca;hp=026d4a4d41ba65d67465bf21a52eab5d47da8652;hb=51e33128e352041933fd6c7e79ec0fa3a992ed00;hpb=c9aa5d470ea84ae279aeeb70d2f1b495f51cf687

diff --git a/ha-manager.adoc b/ha-manager.adoc
index 026d4a4..5db5b05 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -350,6 +350,24 @@ If you have a hardware watchdog available remove its kernel module from the
 blacklist, load it with insmod and restart the 'watchdog-mux' service or
 reboot the node.
 
+Recover Fenced Services
+~~~~~~~~~~~~~~~~~~~~~~~
+
+After a node failed and its fencing was successful, we start to recover
+services to other available nodes and restart them there so that they can
+provide service again.
+
+The selection of the node a service gets recovered to is influenced by the
+user's group settings, the currently active nodes, and their respective
+active service counts.
+First, we build a set from the intersection of the user-selected nodes and
+the currently available nodes. Then the subset of those nodes with the
+highest priority is chosen as the pool of possible recovery nodes. Finally,
+we select the node with the lowest active service count from that pool as
+the new node for the service.
+This minimizes the chance of an overload, which otherwise could cause an
+unresponsive node and, as a result, a chain reaction of node failures in
+the cluster.
 
 Groups
 ------
@@ -378,10 +396,19 @@ the resource won't automatically fail back when a more
 preferred node (re)joins the cluster.
 
 
-Recovery Policy
----------------
+Start Failure Policy
+--------------------
+
+The start failure policy comes into effect if a service failed to start on a
+node one or more times. It can be used to configure how often a restart
+should be triggered on the same node and how often a service should be
+relocated so that it gets a chance to be started on another node.
+The aim of this policy is to circumvent temporary unavailability of shared
+resources on a specific node. For example, if a shared storage isn't available
+on a quorate node anymore, e.g. because of network problems, but still is on
+other nodes, the relocate policy allows the service to be started nonetheless.
 
-There are two service recover policy settings which can be configured
+There are two service start failure policy settings which can be configured
 specific for each resource.
 
 max_restart::
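
To make the recovery-node selection added above concrete, here is a minimal
sketch of the described steps. This is illustrative only: the actual
ha-manager is implemented in Perl, and the function and parameter names
below are invented for the example.

[source,python]
----
def select_recovery_node(group_nodes, online_nodes, priority, active_count):
    """Pick a recovery node for a service whose old node was fenced.

    group_nodes:  nodes allowed by the user's group settings
    online_nodes: nodes currently active in the cluster
    priority:     dict mapping node -> configured group priority
    active_count: dict mapping node -> number of services running there
    """
    # Step 1: intersect the user-selected nodes with the available ones.
    candidates = set(group_nodes) & set(online_nodes)
    if not candidates:
        return None  # no recovery possible right now

    # Step 2: keep only the subset with the highest group priority.
    top = max(priority.get(n, 0) for n in candidates)
    pool = [n for n in candidates if priority.get(n, 0) == top]

    # Step 3: pick the node with the fewest active services, which
    # minimizes the risk of overloading a single node.
    return min(pool, key=lambda n: active_count.get(n, 0))
----

For example, with `group_nodes={'node1', 'node2', 'node3'}`,
`online_nodes={'node2', 'node3'}`, equal priorities, and
`active_count={'node2': 4, 'node3': 2}`, the sketch returns `node3`,
the least loaded of the eligible nodes.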
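
The hunk ends where the two per-resource settings are defined, starting with
`max_restart` (the second, presumably a relocation counter such as
`max_relocate`, is cut off here). As a hedged illustration, assuming the
usual Proxmox VE section-config layout for HA resources (the file path and
exact syntax are assumptions, not taken from this patch), an entry could
look like:

----
# /etc/pve/ha/resources.cfg (illustrative; syntax assumed)
vm: 100
    max_restart 2
    max_relocate 1
----

With values like these, the manager would retry the start up to two more
times on the same node before attempting at most one relocation to another
node.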