-[[chapter-ha-manager]]
+[[chapter_ha_manager]]
ifdef::manvolnum[]
-PVE({manvolnum})
-================
+ha-manager(1)
+=============
include::attributes.txt[]
+:pve-toplevel:
NAME
----
ha-manager - Proxmox VE HA Manager
-SYNOPSYS
+SYNOPSIS
--------
include::ha-manager.1-synopsis.adoc[]
DESCRIPTION
-----------
endif::manvolnum[]
-
ifndef::manvolnum[]
High Availability
=================
include::attributes.txt[]
+:pve-toplevel:
endif::manvolnum[]
-
Our modern society depends heavily on information provided by
computers over the network. Mobile devices amplified that dependency,
because people can access the network any time from anywhere. If you
* optional hardware fencing devices
+[[ha_manager_resources]]
Resources
---------
It can be in three states:
-*wait for agent lock*::
+wait for agent lock::
The LRM waits for our exclusive lock. This is also used as idle state if no
service is configured.
-*active*::
+active::
The LRM holds its exclusive lock and has services configured.
-*lost agent lock*::
+lost agent lock::
-The LRM lost its lock, this means a failure happened and quorum was lost.
+The LRM lost its lock. This means a failure happened and quorum was lost.
It can be in three states:
-*wait for agent lock*::
+wait for agent lock::
The CRM waits for our exclusive lock. This is also used as idle state if no
-service is configured
+service is configured.
-*active*::
+active::
-The CRM holds its exclusive lock and has services configured
+The CRM holds its exclusive lock and has services configured.
-*lost agent lock*::
+lost agent lock::
-The CRM lost its lock, this means a failure happened and quorum was lost.
+The CRM lost its lock. This means a failure happened and quorum was lost.
a watchdog reset.
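+ The current state of the CRM and the LRM on each node, together with the
+ managed services, can be inspected on the command line (the output depends
+ on your cluster):
+[source,bash]
+ ha-manager status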
+[[ha_manager_fencing]]
Fencing
-------
unresponsive node and as a result a chain reaction of node failures in the
cluster.
+[[ha_manager_groups]]
Groups
------
-available. If more nodes are in the highest priority class the services will
-get distributed to those node if not already there. The priorities have a
+available. If more than one node is in the highest priority class, the services
+will get distributed to those nodes if not already there. The priorities have a
relative meaning only.
+ Example;;
+ You want to run all services from a group on `node1` if possible. If this node
+ is not available, you want them to run equally split on `node2` and `node3`, and
+ if those fail it should use `node4`.
+ To achieve this you could set the node list to:
+[source,bash]
+ ha-manager groupset mygroup -nodes "node1:2,node2:1,node3:1,node4"
restricted::
Resources bound to this group may only run on nodes defined by the
group. If no group node member is available the resource will be
placed in the stopped state.
+ Example;;
+ Let's say a service uses resources only available on `node1` and `node2`,
+ so we need to make sure that the HA manager does not use other nodes.
+ We need to create a 'restricted' group with said nodes:
+[source,bash]
+ ha-manager groupset mygroup -nodes "node1,node2" -restricted
nofailback::
The resource won't automatically fail back when a more preferred node
(re)joins the cluster.
+ Examples;;
+ * You need to migrate a service to a node which currently does not have the
+ highest priority in the group. To tell the HA manager not to move this
+ service back instantly, set the 'nofailback' option and the service will
+ stay on the current node.
+
+ * A service was fenced and got recovered to another node. The admin
+ repaired the node and brought it online again, but does not want the
+ recovered services to move straight back to the repaired node, as he first
+ wants to investigate the failure cause and check that it runs stably. He
+ can use the 'nofailback' option to achieve this.
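+ Example;;
+ Analogous to the 'restricted' example above (assuming the group `mygroup`
+ already exists), the option could be set on the command line with:
+[source,bash]
+ ha-manager groupset mygroup -nofailback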
Start Failure Policy
-* *after* you fixed all errors you may enable the service again
+* *after* you have fixed all errors you may enable the service again
+[[ha_manager_service_operations]]
Service Operations
------------------