-[[chapter-ha-manager]]
+[[chapter_ha_manager]]
ifdef::manvolnum[]
-PVE({manvolnum})
-================
-include::attributes.txt[]
+ha-manager(1)
+=============
+:pve-toplevel:
NAME
----
ha-manager - Proxmox VE HA Manager
-SYNOPSYS
+SYNOPSIS
--------
include::ha-manager.1-synopsis.adoc[]
DESCRIPTION
-----------
endif::manvolnum[]
-
ifndef::manvolnum[]
High Availability
=================
-include::attributes.txt[]
+:pve-toplevel:
endif::manvolnum[]
-
Our modern society depends heavily on information provided by
computers over the network. Mobile devices amplified that dependency,
because people can access the network any time from anywhere. If you
* optional hardware fencing devices
+[[ha_manager_resources]]
Resources
---------
`pve-ha-lrm`::
-The local resource manager (LRM), it controls the services running on
-the local node.
-It reads the requested states for its services from the current manager
-status file and executes the respective commands.
+The local resource manager (LRM), which controls the services running on
+the local node. It reads the requested states for its services from
+the current manager status file and executes the respective commands.
`pve-ha-crm`::
-The cluster resource manager (CRM), it controls the cluster wide
-actions of the services, processes the LRM results and includes the state
-machine which controls the state of each service.
+The cluster resource manager (CRM), which makes the cluster-wide
+decisions. It sends commands to the LRM, processes the results,
+and moves resources to other nodes if something fails. The CRM also
+handles node fencing.
+
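+Both daemons run as systemd services on every cluster node. As a quick
+sketch (assuming the unit names shipped with Proxmox VE), you can verify
+that they are active with:
+[source,bash]
+ systemctl status pve-ha-crm pve-ha-lrm
+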
.Locks in the LRM & CRM
[NOTE]
a watchdog reset.
+[[ha_manager_fencing]]
Fencing
-------
unresponsive node and as a result a chain reaction of node failures in the
cluster.
+[[ha_manager_groups]]
Groups
------
available. If more nodes are in the highest priority class, the services will
get distributed among those nodes if not already there. The priorities have a
relative meaning only.
+ Example;;
+ You want to run all services from a group on `node1` if possible. If this node
+ is not available, you want them to run split equally on `node2` and `node3`, and
+ if those also fail, the services should run on `node4`.
+ To achieve this you could set the node list to:
+[source,bash]
+ ha-manager groupset mygroup -nodes "node1:2,node2:1,node3:1,node4"
restricted::
Resources bound to this group may only run on nodes defined by the
group. If no group node member is available the resource will be
placed in the stopped state.
+ Example;;
+ Let's say a service uses resources only available on `node1` and `node2`,
+ so we need to make sure that the HA manager does not use other nodes.
+ We need to create a 'restricted' group with said nodes:
+[source,bash]
+ ha-manager groupset mygroup -nodes "node1,node2" -restricted
nofailback::
The resource won't automatically fail back when a more preferred node
(re)joins the cluster.
+ Examples;;
+ * You need to migrate a service to a node which currently does not have the
+ highest priority in the group. To tell the HA manager not to move this
+ service back instantly, set the 'nofailback' option and the service will
+ stay on the current node.
+
+ * A service was fenced and recovered to another node. The admin repaired
+ the node and brought it online again, but does not want the recovered
+ services to move straight back to the repaired node, wishing to first
+ investigate the cause of the failure and check that the node runs stably.
+ The 'nofailback' option can be used to achieve this.
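+
+As a sketch (assuming a group `mygroup` already exists), the option could be
+enabled on that group with:
+[source,bash]
+ ha-manager groupset mygroup -nofailback 1
+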
Start Failure Policy
* *after* you have fixed all errors you may enable the service again
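+
+For example (a hedged sketch; `vm:100` is a placeholder resource ID, and the
+exact state names may vary between versions), a service could be enabled
+again with:
+[source,bash]
+ ha-manager set vm:100 --state started
+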
+[[ha_manager_service_operations]]
Service Operations
------------------