X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=ha-manager.adoc;h=d8489cb232652a4e2e0c04c30c8b3162749e4852;hp=a5ffe00325e0b711cac094036ef8a229a1c37af2;hb=a9c77fec9239c1dd979bb0fd025a4d9186ae6449;hpb=49a5e11cd14742d8ad28116fab7fce9fc85321bd

diff --git a/ha-manager.adoc b/ha-manager.adoc
index a5ffe00..d8489cb 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -1,8 +1,8 @@
-[[chapter-ha-manager]]
+[[chapter_ha_manager]]
 ifdef::manvolnum[]
-PVE({manvolnum})
-================
-include::attributes.txt[]
+ha-manager(1)
+=============
+:pve-toplevel:
 
 NAME
 ----
@@ -17,14 +17,12 @@ include::ha-manager.1-synopsis.adoc[]
 DESCRIPTION
 -----------
 endif::manvolnum[]
-
 ifndef::manvolnum[]
 High Availability
 =================
-include::attributes.txt[]
+:pve-toplevel:
 endif::manvolnum[]
-
 Our modern society depends heavily on information provided by
 computers over the network. Mobile devices amplified that dependency,
 because people can access the network any time from anywhere. If you
@@ -58,7 +56,7 @@ yourself. The following solutions work without modifying the
 software:
 
 * Use reliable ``server'' components
-
++
 NOTE: Computer components with the same functionality can have varying
 reliability numbers, depending on the component quality. Most vendors
 sell components with higher reliability as ``server'' components -
@@ -107,21 +105,27 @@ hard and costly. `ha-manager` has typical error detection and failover
 times of about 2 minutes, so you can get no more than 99.999%
 availability.
 
+
 Requirements
 ------------
 
+You must meet the following requirements before you start with HA:
+
 * at least three cluster nodes (to get reliable quorum)
 
 * shared storage for VMs and containers
 
 * hardware redundancy (everywhere)
 
+* use reliable ``server'' components
+
 * hardware watchdog - if not available we fall back to the
   Linux kernel software watchdog (`softdog`)
 
 * optional hardware fencing devices
 
+
+[[ha_manager_resources]]
 Resources
 ---------
 
@@ -148,16 +152,17 @@ To provide High Availability two daemons run on each node:
 
 `pve-ha-lrm`::
 
-The local resource manager (LRM), it controls the services running on
-the local node.
-It reads the requested states for its services from the current manager
-status file and executes the respective commands.
+The local resource manager (LRM), which controls the services running on
+the local node. It reads the requested states for its services from
+the current manager status file and executes the respective commands.
 
 `pve-ha-crm`::
 
-The cluster resource manager (CRM), it controls the cluster wide
-actions of the services, processes the LRM results and includes the state
-machine which controls the state of each service.
+The cluster resource manager (CRM), which makes the cluster-wide
+decisions. It sends commands to the LRM, processes the results,
+and moves resources to other nodes if something fails. The CRM also
+handles node fencing.
+
 
 .Locks in the LRM & CRM
 [NOTE]
@@ -269,17 +274,59 @@ quorum, the LRM waits for a new quorum to form. As long as there is no
 quorum the node cannot reset the watchdog. This will trigger a reboot
 after the watchdog times out, which happens after 60 seconds.
 
+
 Configuration
 -------------
 
-The HA stack is well integrated in the Proxmox VE API2. So, for
-example, HA can be configured via `ha-manager` or the PVE web
-interface, which both provide an easy to use tool.
+The HA stack is well integrated into the {pve} API. So, for example,
+HA can be configured via the `ha-manager` command line interface, or
+the {pve} web interface - both interfaces provide an easy way to
+manage HA. Automation tools can use the API directly.
+
+All HA configuration files are within `/etc/pve/ha/`, so they get
+automatically distributed to the cluster nodes, and all nodes share
+the same HA configuration.
+
+
+Resources
+~~~~~~~~~
+
+The resource configuration file `/etc/pve/ha/resources.cfg` stores
+the list of resources managed by `ha-manager`. A resource configuration
+inside that list looks like this:
+
+----
+<type>: <name>
+	<property> <value>
+	...
+----
+
+It starts with a resource type followed by a resource-specific name,
+separated by a colon. Together this forms the HA resource ID, which is
+used by all `ha-manager` commands to uniquely identify a resource
+(example: `vm:100` or `ct:101`). The next lines contain additional
+properties:
+
+include::ha-resources-opts.adoc[]
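+
+For example, a concrete entry for a virtual machine could look like
+this (a sketch only: it assumes that VM 100 exists and that a group
+named `mygroup` is defined; see the property list above for all
+available options):
+
+----
+vm: 100
+	group mygroup
+	max_relocate 2
+----
+
+This restricts VM 100 to the nodes of `mygroup`, and allows the
+manager to try up to two relocations to other nodes if the service
+fails to start. Instead of editing the file by hand, such an entry
+can also be created with the command line tool, for example with
+`ha-manager add vm:100`.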
+
+
+Groups
+~~~~~~
+
+The HA group configuration file `/etc/pve/ha/groups.cfg` is used to
+define groups of cluster nodes. A resource can be restricted to run
+only on the members of such a group. A group configuration looks like
+this:
+
+----
+group: <group>
+	nodes <node_list>
+	<property> <value>
+	...
+----
+
+include::ha-groups-opts.adoc[]
 
-The resource configuration file can be located at
-`/etc/pve/ha/resources.cfg` and the group configuration file at
-`/etc/pve/ha/groups.cfg`. Use the provided tools to make changes,
-there shouldn't be any need to edit them manually.
 
 Node Power Status
 -----------------
@@ -311,6 +358,7 @@ the update process can take too long, which, in the worst case, may
 result in a watchdog reset.
 
 
+[[ha_manager_fencing]]
 Fencing
 -------
 
@@ -380,6 +428,7 @@ That minimizes the possibility of an overload, which could otherwise cause
 an unresponsive node and, as a result, a chain reaction of node failures
 in the cluster.
 
+[[ha_manager_groups]]
 Groups
 ------
 
@@ -481,6 +530,7 @@ killing its process)
 
 * *after* you have fixed all errors, you may enable the service again
 
+[[ha_manager_service_operations]]
 Service Operations
 ------------------
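+
+For example, a typical session with the command line tool could look
+like the following sketch (`vm:100` and `node2` are placeholders for
+an actual resource ID and target node; see the command reference for
+the full option list):
+
+----
+# put virtual machine 100 under HA management
+ha-manager add vm:100
+
+# show the current status of all HA managed services
+ha-manager status
+
+# move the service to another cluster node
+ha-manager migrate vm:100 node2
+
+# remove the service from HA management again
+ha-manager remove vm:100
+----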