+* shared storage for VMs and containers
+
+* hardware redundancy (everywhere)
+
+* hardware watchdog - if not available we fall back to the
+ Linux kernel software watchdog (`softdog`)
+
+* optional hardware fencing devices
+
+
+Resources
+---------
+
+We call the primary management unit handled by `ha-manager` a
+resource. A resource (also called ``service'') is uniquely
+identified by a service ID (SID), which consists of the resource type
+and a type-specific ID, e.g.: `vm:100`. That example would be a
+resource of type `vm` (virtual machine) with the ID 100.
+
+For now we have two important resource types - virtual machines and
+containers. One basic idea here is that we can bundle related software
+into such a VM or container, so there is no need to compose one big
+service from other services, as was done with `rgmanager`. In
+general, an HA-enabled resource should not depend on other resources.
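+
+For example, putting the virtual machine with ID 100 under HA management
+could look like this (a minimal sketch; see the `ha-manager` command line
+reference for all available options):
+
+----
+# add the virtual machine with ID 100 as an HA resource
+ha-manager add vm:100
+
+# show the current HA status, including all configured resources
+ha-manager status
+----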
+
+
+How It Works
+------------
+
+This section provides a detailed description of the {PVE} HA-manager
+internals. It describes how the CRM and the LRM work together.
+
+To provide High Availability, two daemons run on each node:
+
+`pve-ha-lrm`::
+
+The local resource manager (LRM) controls the services running on
+the local node.
+It reads the requested states for its services from the current manager
+status file and executes the respective commands.
+
+`pve-ha-crm`::
+
+The cluster resource manager (CRM) controls the cluster-wide
+actions of the services, processes the LRM results, and includes the
+state machine which controls the state of each service.
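+
+Both daemons run as systemd services, so a quick way to verify that they
+are present and running on a node is (a brief sketch):
+
+----
+systemctl status pve-ha-lrm.service pve-ha-crm.service
+----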
+
+.Locks in the LRM & CRM
+[NOTE]
+Locks are provided by our distributed configuration file system (pmxcfs).
+They are used to guarantee that each LRM is active only once and working.
+As an LRM only executes actions when it holds its lock, we can mark a
+failed node as fenced if we can acquire its lock. We can then recover any
+failed HA services securely, without interference from the now unknown
+failed node. This all gets supervised by the CRM, which currently holds
+the manager master lock.
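+
+Under the hood, these locks are plain pmxcfs lock entries below
+`/etc/pve/priv/lock/`. The following listing is only an illustration and
+assumes a node named `node1`; the exact lock names are an internal detail
+and may differ between versions:
+
+----
+/etc/pve/priv/lock/ha_manager_lock      # held by the current CRM master
+/etc/pve/priv/lock/ha_agent_node1_lock  # agent lock of the LRM on node1
+----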
+
+Local Resource Manager
+~~~~~~~~~~~~~~~~~~~~~~
+
+The local resource manager (`pve-ha-lrm`) is started as a daemon on
+boot and waits until the HA cluster is quorate and thus cluster-wide
+locks are working.
+
+It can be in three states:
+
+wait for agent lock::
+
+The LRM waits for our exclusive lock. This is also used as the idle state
+if no service is configured.
+
+active::
+
+The LRM holds its exclusive lock and has services configured.
+
+lost agent lock::
+
+The LRM lost its lock. This means a failure happened and quorum was lost.
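+
+Which of these states an LRM is currently in can be seen in the HA status
+output; an illustrative example (node names, timestamps and services will
+of course differ):
+
+----
+# ha-manager status
+quorum OK
+master node1 (active, Mon Nov 21 07:56:53 2016)
+lrm node1 (active, Mon Nov 21 07:58:59 2016)
+service vm:100 (node1, started)
+----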
+
+After the LRM gets in the active state, it reads the manager status
+file in `/etc/pve/ha/manager_status` and determines the commands it
+has to execute for the services it owns.
+For each command a worker gets started; these workers run in
+parallel and are limited to at most 4 by default. This default setting
+may be changed through the datacenter configuration key `max_worker`.
+When finished, the worker process gets collected and its result saved for
+the CRM.
+
+.Maximum Concurrent Worker Adjustment Tips
+[NOTE]
+The default value of at most 4 concurrent workers may be unsuited for
+a specific setup. For example, 4 live migrations may happen at the same
+time, which can lead to network congestion with slower networks and/or
+big (memory-wise) services. Ensure that no congestion happens even in the
+worst case, and lower the `max_worker` value if needed. On the contrary,
+if you have a particularly powerful, high-end setup, you may also want to
+increase it.
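+
+For example, to raise the limit to 8 concurrent workers, the key can be set
+cluster wide in `/etc/pve/datacenter.cfg` (an illustrative sketch; check the
+datacenter configuration documentation of your installation for the exact
+key name and allowed range):
+
+----
+# /etc/pve/datacenter.cfg
+max_worker: 8
+----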
+
+Each command requested by the CRM is uniquely identifiable by a UID. When
+the worker finishes, its result will be processed and written to the LRM
+status file `/etc/pve/nodes/<nodename>/lrm_status`. There the CRM may
+collect it and let its state machine act on the command's output.
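+
+As the status file lives on pmxcfs, it can be inspected directly from any
+node if needed, for example (replace `<nodename>` with the actual node name):
+
+----
+cat /etc/pve/nodes/<nodename>/lrm_status
+----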
+
+The actions on each service between CRM and LRM are normally always synced.
+This means that the CRM requests a state uniquely marked by a UID; the LRM
+then executes this action *one time* and writes back the result, which is
+also identified by the same UID. This is needed so that the LRM does not
+execute an outdated command.
+The exceptions to this are the `stop` and the `error` command; these two
+do not depend on the result produced and are executed always in the case
+of the stopped state and once in the case of the error state.
+
+.Read the Logs
+[NOTE]
+The HA Stack logs every action it makes. This helps to understand what
+and also why something happens in the cluster. Here it is important to see
+what both daemons, the LRM and the CRM, did. You may use
+`journalctl -u pve-ha-lrm` on the node(s) where the service is and
+the same command for `pve-ha-crm` on the node which is the current master.
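+
+For example, to review the recent activity of both daemons (adjust the time
+range as needed):
+
+----
+# on the node that runs (or ran) the service
+journalctl -u pve-ha-lrm --since "1 hour ago"
+
+# on the node which is the current master
+journalctl -u pve-ha-crm --since "1 hour ago"
+----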
+
+Cluster Resource Manager
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The cluster resource manager (`pve-ha-crm`) starts on each node and