* Eliminate single point of failure (redundant components)
** use an uninterruptible power supply (UPS)
-** use redundant power supplies on the main boards
+** use redundant power supplies in your servers
** use ECC-RAM
** use redundant network hardware
** use RAID for local storage
This section provides a short overview of common management tasks. The
first step is to enable HA for a resource. This is done by adding the
resource to the HA resource configuration. You can do this using the
-GUI, or simply use the command line tool, for example:
+GUI, or simply use the command-line tool, for example:
----
# ha-manager add vm:100
lock.
+[[ha_manager_service_states]]
Service States
~~~~~~~~~~~~~~
The CRM uses a service state enumeration to record the current service
state. This state is displayed on the GUI and can be queried using
-the `ha-manager` command line tool:
+the `ha-manager` command-line tool:
----
# ha-manager status
Service is stopped and marked as `disabled`
+[[ha_manager_lrm]]
Local Resource Manager
~~~~~~~~~~~~~~~~~~~~~~
`journalctl -u pve-ha-lrm` on the node(s) where the service is and
the same command for the pve-ha-crm on the node which is the current master.
+
+[[ha_manager_crm]]
Cluster Resource Manager
~~~~~~~~~~~~~~~~~~~~~~~~
-------------
The HA stack is well integrated into the {pve} API. So, for example,
-HA can be configured via the `ha-manager` command line interface, or
+HA can be configured via the `ha-manager` command-line interface, or
the {pve} web interface - both interfaces provide an easy way to
manage HA. Automation tools can use the API directly.
[thumbnail="screenshot/gui-ha-manager-add-resource.png"]
-The above config was generated using the `ha-manager` command line tool:
+The above config was generated using the `ha-manager` command-line tool:
----
# ha-manager add vm:501 --state started --max_relocate 2
case, may result in a reset triggered by the watchdog.
+[[ha_manager_node_maintenance]]
Node Maintenance
----------------
Maintenance Mode
~~~~~~~~~~~~~~~~
-Enabling the manual maintenance mode will mark the node as unavailable for
-operation, this in turn will migrate away all services to other nodes, which
-are selected through the configured cluster resource scheduler (CRS) mode.
-During migration the original node will be recorded, so that the service can be
-moved back to to that node as soon as the maintenance mode is disabled, and it
-becomes online again.
+You can use the manual maintenance mode to mark the node as unavailable for HA
+operation, prompting all services managed by HA to migrate to other nodes.
+
+The target nodes for these migrations are selected from the other currently
+available nodes, and determined by the HA group configuration and the configured
+cluster resource scheduler (CRS) mode.
+During each migration, the original node will be recorded in the HA manager's
+state, so that the service can be moved back again automatically once the
+maintenance mode is disabled and the node is back online.
Currently you can enable or disable the maintenance mode using the ha-manager
CLI tool.
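
As a minimal sketch, the enable and disable calls can be wrapped in a small
shell helper. The `crm-command node-maintenance` subcommand belongs to
`ha-manager`; the function name `set_maintenance`, the `HA_MANAGER` override
variable and the node name `node1` are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch only: helper name, HA_MANAGER override and node names are illustrative.
# HA_MANAGER can be pointed at another binary (e.g. `echo`) for a dry run.
HA_MANAGER="${HA_MANAGER:-ha-manager}"

# set_maintenance <enable|disable> <node>
# Requests or clears maintenance mode for the given node.
set_maintenance() {
    case "$1" in
        enable|disable) ;;
        *) echo "usage: set_maintenance <enable|disable> <node>" >&2; return 2 ;;
    esac
    [ -n "$2" ] || { echo "missing node name" >&2; return 2; }
    "$HA_MANAGER" crm-command node-maintenance "$1" "$2"
}
```

For example, `set_maintenance enable node1` could be run before planned work on
`node1`, and `set_maintenance disable node1` once the node should take services
again.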
shutdown. Currently 'Conditional' is the default due to backward compatibility.
Some users may find that 'Migrate' behaves more as expected.
+The shutdown policy can be configured in the Web UI (`Datacenter` -> `Options`
+-> `HA Settings`), or directly in `datacenter.cfg`:
+
+----
+ha: shutdown_policy=<value>
+----
+
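To illustrate, the sketch below validates a policy name before printing the
matching `datacenter.cfg` line. The helper name `shutdown_policy_line` is
hypothetical, and the accepted values (`freeze`, `failover`, `migrate`,
`conditional`) are assumed to be the policies documented for `datacenter.cfg`.
The helper only prints the line and does not modify any file.

```shell
#!/bin/sh
# Sketch only: the helper name is illustrative; nothing is written to disk.

# shutdown_policy_line <policy>
# Prints the datacenter.cfg line for a known shutdown policy,
# or fails with an error message for an unknown one.
shutdown_policy_line() {
    case "$1" in
        freeze|failover|migrate|conditional)
            printf 'ha: shutdown_policy=%s\n' "$1"
            ;;
        *)
            echo "unknown shutdown policy: $1" >&2
            return 1
            ;;
    esac
}

shutdown_policy_line migrate
```

Rejecting unknown names up front avoids putting a misspelled policy into the
cluster-wide configuration.
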
Migrate
^^^^^^^