ha-manager.adoc: fix file format description

diff --git a/ha-manager.adoc b/ha-manager.adoc

index 052eefc1fe1e6101cc4bce95b55f3e15638f601c..2b4ffcbec4af0496b6e22594a1057b91e6e7b2d6 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -1,9 +1,7 @@
-[[chapter-ha-manager]]
+[[chapter_ha_manager]]
  ifdef::manvolnum[]
-PVE({manvolnum})
-================
-include::attributes.txt[]
-
+ha-manager(1)
+=============
  :pve-toplevel:
  
  NAME
@@ -19,16 +17,11 @@ include::ha-manager.1-synopsis.adoc[]
  DESCRIPTION
  -----------
  endif::manvolnum[]
-
  ifndef::manvolnum[]
  High Availability
  =================
-include::attributes.txt[]
-endif::manvolnum[]
-
-ifdef::wiki[]
  :pve-toplevel:
-endif::wiki[]
+endif::manvolnum[]
  
  Our modern society depends heavily on information provided by
  computers over the network. Mobile devices amplified that dependency,
@@ -63,7 +56,7 @@ yourself. The following solutions works without modifying the
  software:
  
  * Use reliable ``server'' components
-
++
  NOTE: Computer components with same functionality can have varying
  reliability numbers, depending on the component quality. Most vendors
  sell components with higher reliability as ``server'' components -
@@ -112,21 +105,27 @@ hard and costly. `ha-manager` has typical error detection and failover
  times of about 2 minutes, so you can get no more than 99.999%
  availability.
  
+
  Requirements
  ------------
  
+You must meet the following requirements before you start with HA:
+
  * at least three cluster nodes (to get reliable quorum)
  
  * shared storage for VMs and containers
  
  * hardware redundancy (everywhere)
  
+* use reliable ``server'' components
+
  * hardware watchdog - if not available we fall back to the
    linux kernel software watchdog (`softdog`)
  
  * optional hardware fencing devices
  
  
+[[ha_manager_resources]]
  Resources
  ---------
  
@@ -153,16 +152,17 @@ To provide High Availability two daemons run on each node:
  
  `pve-ha-lrm`::
  
-The local resource manager (LRM), it controls the services running on
-the local node.
-It reads the requested states for its services from the current manager
-status file and executes the respective commands.
+The local resource manager (LRM), which controls the services running on
+the local node. It reads the requested states for its services from
+the current manager status file and executes the respective commands.
  
  `pve-ha-crm`::
  
-The cluster resource manager (CRM), it controls the cluster wide
-actions of the services, processes the LRM results and includes the state
-machine which controls the state of each service.
+The cluster resource manager (CRM), which makes the cluster wide
+decisions. It sends commands to the LRM, processes the results,
+and moves resources to other nodes if something fails. The CRM also
+handles node fencing.
+
  
  .Locks in the LRM & CRM
  [NOTE]
@@ -274,17 +274,61 @@ quorum, the LRM waits for a new quorum to form. As long as there is no
  quorum the node cannot reset the watchdog. This will trigger a reboot
  after the watchdog then times out, this happens after 60 seconds.
  
+
  Configuration
  -------------
  
-The HA stack is well integrated in the Proxmox VE API2. So, for
-example, HA can be configured via `ha-manager` or the PVE web
-interface, which both provide an easy to use tool.
+The HA stack is well integrated into the {pve} API. So, for example,
+HA can be configured via the `ha-manager` command line interface, or
+the {pve} web interface - both interfaces provide an easy way to
+manage HA. Automation tools can use the API directly.
+
+All HA configuration files are within `/etc/pve/ha/`, so they get
+automatically distributed to the cluster nodes, and all nodes share
+the same HA configuration.
+
+
+Resources
+~~~~~~~~~
+
+The resource configuration file `/etc/pve/ha/resources.cfg` stores
+the list of resources managed by `ha-manager`. A resource configuration
+inside that list looks like this:
+
+----
+<type>:<name>
+       <property> <value>
+       ...
+----
+
+It starts with a resource type followed by a resource specific name,
+separated by a colon. Together this forms the HA resource ID, which is
+used by all `ha-manager` commands to uniquely identify a resource
+(example: `vm:100` or `ct:101`).
+
+The lines following the resource ID contain additional
+properties:
+
+include::ha-resources-opts.adoc[]
+
+
+Groups
+~~~~~~
+
+The HA group configuration file `/etc/pve/ha/groups.cfg` is used to
+define groups of cluster nodes. A resource can be restricted to run
+only on the members of such a group. A group configuration looks like
+this:
+
+----
+group: <group>
+       nodes <node_list>
+       <property> <value>
+       ...
+----
+
+include::ha-groups-opts.adoc[]
  
-The resource configuration file can be located at
-`/etc/pve/ha/resources.cfg` and the group configuration file at
-`/etc/pve/ha/groups.cfg`. Use the provided tools to make changes,
-there shouldn't be any need to edit them manually.
  
  Node Power Status
  -----------------
@@ -316,6 +360,7 @@ the update process can be too long which, in the worst case, may result in
  a watchdog reset.
  
  
+[[ha_manager_fencing]]
  Fencing
  -------
  
@@ -385,6 +430,7 @@ That minimizes the possibility of an overload, which else could cause an
  unresponsive node and as a result a chain reaction of node failures in the
  cluster.
  
+[[ha_manager_groups]]
  Groups
  ------
  
@@ -486,6 +532,7 @@ killing its process)
  * *after* you fixed all errors you may enable the service again
  
  
+[[ha_manager_service_operations]]
  Service Operations
  ------------------
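
The `<type>:<name>` layout that this patch documents for `/etc/pve/ha/resources.cfg` can be illustrated with a small reader. The following is a minimal sketch, not the parser used by {pve}: the function name `parse_resources` and the sample property values are hypothetical, and it assumes that every property line is indented and holds one `<property> <value>` pair.

```python
def parse_resources(text):
    """Parse '<type>:<name>' sections followed by indented '<property> <value>' lines.

    Returns a dict that maps each HA resource ID (for example 'vm:100')
    to a dict of its properties. Illustrative only; the real parser may
    accept more syntax than this sketch does.
    """
    resources = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines separate sections
        if not raw[0].isspace():
            # Section header: resource type and name, separated by a colon.
            rtype, sep, name = raw.partition(":")
            if not sep:
                raise ValueError(f"missing colon in header: {raw!r}")
            current = {}
            resources[f"{rtype.strip()}:{name.strip()}"] = current
        else:
            # Indented line: a property of the most recent section.
            if current is None:
                raise ValueError(f"property before any header: {raw!r}")
            key, _, value = raw.strip().partition(" ")
            current[key] = value.strip()
    return resources


# Hypothetical sample input; the property names and values are assumptions.
SAMPLE = (
    "vm:100\n"
    "       state started\n"
    "       group mygroup\n"
    "\n"
    "ct:101\n"
    "       max_restart 2\n"
)

if __name__ == "__main__":
    for sid, props in parse_resources(SAMPLE).items():
        print(sid, props)
```

Keying the result by the full resource ID mirrors how the text says `ha-manager` commands identify a resource. For real changes, the `ha-manager` command line interface or the API remain the supported way to modify the configuration.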