update readme to be a bit less confusing/outdated

author Thomas Lamprecht <t.lamprecht@proxmox.com>

Tue, 3 Jan 2023 12:19:16 +0000 (13:19 +0100)

committer Thomas Lamprecht <t.lamprecht@proxmox.com>

Tue, 3 Jan 2023 12:19:18 +0000 (13:19 +0100)
author Thomas Lamprecht <t.lamprecht@proxmox.com>
Tue, 3 Jan 2023 12:19:16 +0000 (13:19 +0100)
committer Thomas Lamprecht <t.lamprecht@proxmox.com>
Tue, 3 Jan 2023 12:19:18 +0000 (13:19 +0100)
diff --git a/README b/README

index 1c5177f16e81fba2df819a4e469cd03323ac89a4..cf6560da85455173299ae4aeb02e56352bed7609 100644 (file)
--- a/README
+++ b/README
@@ -1,41 +1,42 @@
  = Proxmox HA Manager =
  
-== Motivation ==
+Note that this README got written as early development planning/documentation
+in 2015, even though small updates where made in 2023 it might be a bit out of
+date. For usage documentation see the official reference docs shipped with your
+Proxmox VE installation or, for your convenience, hosted at:
+https://pve.proxmox.com/pve-docs/chapter-ha-manager.html
  
-The current HA manager has a bunch of drawbacks:
+== History & Motivation ==
  
-- no more development (redhat moved to pacemaker)
+The `rgmanager` HA stack used in Proxmox VE 3.x has a bunch of drawbacks:
  
+- no more development (redhat moved to pacemaker)
  - highly depend on old version of corosync
-
-- complicated code (cause by compatibility layer with 
-  older cluster stack (cman)
-
+- complicated code (cause by compatibility layer with older cluster stack
+  (cman)
  - no self-fencing
  
-In future, we want to make HA easier for our users, and it should 
-be possible to move to newest corosync, or even a totally different 
-cluster stack. So we want:
-
-- possibility to run with any distributed key/value store which provides
-  some kind of locking with timeouts (zookeeper, consul, etcd, ..) 
+For Proxmox VE 4.0 we thus required a new HA stack and also wanted to make HA
+easier for our users while also making it possible to move to newest corosync,
+or even a totally different cluster stack. So, the following core requirements
+got set out:
  
+- possibility to run with any distributed key/value store which provides some
+  kind of locking with timeouts (zookeeper, consul, etcd, ..) 
  - self fencing using Linux watchdog device
-
  - implemented in Perl, so that we can use PVE framework
-
  - only work with simply resources like VMs
  
-We dropped the idea to assemble complex, dependend services, because we think
-this is already done with the VM abstraction.
+We dropped the idea to assemble complex, dependent services, because we think
+this is already done with the VM/CT abstraction.
  
-= Architecture =
+== Architecture ==
  
-== Cluster requirements ==
+Cluster requirements.
  
  === Cluster wide locks with timeouts ===
  
-The cluster stack must provide cluster wide locks with timeouts.
+The cluster stack must provide cluster-wide locks with timeouts.
  The Proxmox 'pmxcfs' implements this on top of corosync.
  
  === Watchdog ===
@@ -50,7 +51,7 @@ allows one user. We need to protect at least two daemons, so we write
  our own watchdog daemon. This daemon work on /dev/watchdog, but
  provides that service to several other daemons using a local socket.
  
-== Self fencing ==
+=== Self fencing ===
  
  A node needs to acquire a special 'ha_agent_${node}_lock' (one separate
  lock for each node) before starting HA resources, and the node updates
@@ -65,21 +66,25 @@ long as there are running services on that node.
  The HA manger can assume that the watchdog triggered a reboot when he
  is able to acquire the 'ha_agent_${node}_lock' for that node.
  
-=== Problems with "two_node" Clusters ===
+==== Problems with "two_node" Clusters ====
  
  This corosync options depends on a fence race condition, and only
  works using reliable HW fence devices.
  
  Above 'self fencing' algorithm does not work if you use this option!
  
-== Testing requirements ==
+Note that you can use a QDevice, i.e., a external simple (no full corosync
+membership, so relaxed networking) note arbiter process.
+
+=== Testing Requirements ===
  
-We want to be able to simulate HA cluster, using a GUI. This makes it easier
-to learn how the system behaves. We also need a way to run regression tests.
+We want to be able to simulate and test the behavior of a HA cluster, using
+either a GUI or a CLI. This makes it easier to learn how the system behaves. We
+also need a way to run regression tests.
  
-= Implementation details =
+== Implementation details ==
  
-== Cluster Resource Manager (class PVE::HA::CRM) ==
+=== Cluster Resource Manager (class PVE::HA::CRM) ====
  
  The Cluster Resource Manager (CRM) daemon runs one each node, but
  locking makes sure only one CRM daemon act in 'master' role. That
@@ -88,41 +93,51 @@ service states by writing the global 'manager_status'. That data
  structure is read by the Local Resource Manager, which performs the
  real work (start/stop/migrate) services.
  
-=== Service Relocation ===
+==== Service Relocation ====
  
-Some services like Qemu Virtual Machines supports live migration.
-So the LRM can migrate those services without stopping them (CRM 
-service state 'migrate'),
+Some services, like a QEMU Virtual Machine, supports live migration.
+So the LRM can migrate those services without stopping them (CRM service state
+'migrate'),
  
-Most other service types requires the service to be stopped, and then
-restarted at the other node. Stopped services are moved by the CRM
-(usually by simply changing the service configuration).
+Most other service types requires the service to be stopped, and then restarted
+at the other node. Stopped services are moved by the CRM (usually by simply
+changing the service configuration).
  
-=== Service ordering and colocation constarints ===
+==== Service Ordering and Co-location Constraints ====
  
-So far there are no plans to implement this (although it would be possible).
+There are no to implement this for the initial version but although it would be
+possible and probably should be done for later versions.
  
-=== Possible CRM Service States ===
+==== Possible CRM Service States ====
  
  stopped:      Service is stopped (confirmed by LRM)
  
  request_stop: Service should be stopped. Waiting for 
-             confirmation from LRM.
+              confirmation from LRM.
  
  started:      Service is active an LRM should start it asap.
  
  fence:        Wait for node fencing (service node is not inside
-             quorate cluster partition).
+              quorate cluster partition).
+
+recovery:     Service node gets recovered to a new node as it current one was
+              fenced. Note that a service might be stuck here depending on the
+              group/priority configuration
  
  freeze:       Do not touch. We use this state while we reboot a node,
-             or when we restart the LRM daemon.
+              or when we restart the LRM daemon.
  
  migrate:      Migrate (live) service to other node.
  
+relocate:     Migrate (stop. move, start) service to other node.
+
  error:        Service disabled because of LRM errors.
  
  
-== Local Resource Manager (class PVE::HA::LRM) ==
+There's also a `ignored` state which tells the HA stack to ignore a service
+completely, i.e., as it wasn't under HA control at all.
+
+=== Local Resource Manager (class PVE::HA::LRM) ===
  
  The Local Resource Manager (LRM) daemon runs one each node, and
  performs service commands (start/stop/migrate) for services assigned
@@ -131,11 +146,11 @@ cluster wide 'ha_agent_${node}_lock' lock, and the CRM is not allowed
  to assign the service to another node while the LRM holds that lock.
  
  The LRM reads the requested service state from 'manager_status', and
-tries to bring the local service into that state. The actial service
+tries to bring the local service into that state. The actual service
  status is written back to the 'service_${node}_status', and can be
  read by the CRM.
  
-== Pluggable Interface for cluster environment (class PVE::HA::Env) ==
+=== Pluggable Interface for Cluster Environment (class PVE::HA::Env) ===
  
  This class defines an interface to the actual cluster environment:
author	Thomas Lamprecht <t.lamprecht@proxmox.com>
	Tue, 3 Jan 2023 12:19:16 +0000 (13:19 +0100)
committer	Thomas Lamprecht <t.lamprecht@proxmox.com>
	Tue, 3 Jan 2023 12:19:18 +0000 (13:19 +0100)