X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=ha-manager.adoc;h=fadc6b5bb3bfceea1067a235624d78fe98e75a3b;hp=2162d25ea0e64db8561375e9a9fe85d43a0b63d9;hb=HEAD;hpb=049fc55728e69ae80361ebf88c8dabbf068a4417 diff --git a/ha-manager.adoc b/ha-manager.adoc index 2162d25..66a3b8f 100644 --- a/ha-manager.adoc +++ b/ha-manager.adoc @@ -63,7 +63,7 @@ usually at higher price. * Eliminate single point of failure (redundant components) ** use an uninterruptible power supply (UPS) -** use redundant power supplies on the main boards +** use redundant power supplies in your servers ** use ECC-RAM ** use redundant network hardware ** use RAID for local storage @@ -147,7 +147,7 @@ Management Tasks This section provides a short overview of common management tasks. The first step is to enable HA for a resource. This is done by adding the resource to the HA resource configuration. You can do this using the -GUI, or simply use the command line tool, for example: +GUI, or simply use the command-line tool, for example: ---- # ha-manager add vm:100 @@ -243,7 +243,7 @@ the current manager status file and executes the respective commands. `pve-ha-crm`:: -The cluster resource manager (CRM), which makes the cluster wide +The cluster resource manager (CRM), which makes the cluster-wide decisions. It sends commands to the LRM, processes the results, and moves resources to other nodes if something fails. The CRM also handles node fencing. @@ -260,12 +260,13 @@ This all gets supervised by the CRM which currently holds the manager master lock. +[[ha_manager_service_states]] Service States ~~~~~~~~~~~~~~ The CRM uses a service state enumeration to record the current service state. This state is displayed on the GUI and can be queried using -the `ha-manager` command line tool: +the `ha-manager` command-line tool: ---- # ha-manager status @@ -307,10 +308,20 @@ LRM that the service is running. fence:: -Wait for node fencing (service node is not inside quorate cluster -partition). 
As soon as node gets fenced successfully the service will
-be recovered to another node, if possible
-(see xref:ha_manager_fencing[Fencing]).
+Wait for node fencing as the service node is not inside the quorate cluster
+partition (see xref:ha_manager_fencing[Fencing]).
+As soon as the node gets fenced successfully, the service will be placed into
+the recovery state.
+
+recovery::
+
+Wait for recovery of the service. The HA manager tries to find a new node on
+which the service can run. This search depends not only on the list of online
+and quorate nodes, but also on whether the service is a group member and how
+such a group is restricted.
+As soon as an available node is found, the service will be moved there and
+initially placed into the stopped state. If it's configured to run, the new
+node will start it.

freeze::

@@ -321,9 +332,8 @@ node, or when we restart the LRM daemon
ignored::

Act as if the service were not managed by HA at all.
-Useful, when full control over the service is desired temporarily,
-without removing it from the HA configuration.
-
+Useful when full control over the service is desired temporarily, without
+removing it from the HA configuration.

migrate::

@@ -343,11 +353,12 @@ disabled::

Service is stopped and marked as `disabled`

+[[ha_manager_lrm]]
Local Resource Manager
~~~~~~~~~~~~~~~~~~~~~~

The local resource manager (`pve-ha-lrm`) is started as a daemon on
-boot and waits until the HA cluster is quorate and thus cluster wide
+boot and waits until the HA cluster is quorate and thus cluster-wide
locks are working.

It can be in three states:

@@ -407,6 +418,8 @@ what both daemons, the LRM and the CRM, did. You may use
`journalctl -u pve-ha-lrm` on the node(s) where the service is and
the same command for the pve-ha-crm on the node which is the current
master.
+
+[[ha_manager_crm]]
Cluster Resource Manager
~~~~~~~~~~~~~~~~~~~~~~~~

@@ -509,7 +522,7 @@ Configuration
-------------

The HA stack is well integrated into the {pve} API.
So, for example,
-HA can be configured via the `ha-manager` command line interface, or
+HA can be configured via the `ha-manager` command-line interface, or
the {pve} web interface - both interfaces provide an easy way to
manage HA. Automation tools can use the API directly.

@@ -559,7 +572,7 @@ ct: 102

[thumbnail="screenshot/gui-ha-manager-add-resource.png"]

-The above config was generated using the `ha-manager` command line tool:
+The above config was generated using the `ha-manager` command-line tool:

----
# ha-manager add vm:501 --state started --max_relocate 2
----

@@ -825,13 +838,76 @@ this is not the case the update process can take too long which, in the
worst case, may result in a reset triggered by the watchdog.

+[[ha_manager_node_maintenance]]
Node Maintenance
----------------

-It is sometimes necessary to shutdown or reboot a node to do maintenance tasks,
-such as to replace hardware, or simply to install a new kernel image. This is
-also true when using the HA stack. The behaviour of the HA stack during a
-shutdown can be configured.
+Sometimes it is necessary to perform maintenance on a node, such as replacing
+hardware or simply installing a new kernel image. This also applies while the
+HA stack is in use.
+
+The HA stack can support you mainly in two types of maintenance:
+
+* for general shutdowns or reboots, the behavior can be configured; see
+  xref:ha_manager_shutdown_policy[Shutdown Policy].
+* for maintenance that does not require a shutdown or reboot, or that should
+  not be switched off automatically after only one reboot, you can enable the
+  manual maintenance mode.
+
+
+Maintenance Mode
+~~~~~~~~~~~~~~~~
+
+You can use the manual maintenance mode to mark the node as unavailable for HA
+operation, prompting all services managed by HA to migrate to other nodes.
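To see which HA services such a migration would affect, you can inspect the `ha-manager status` output for the node in question. A minimal sketch, assuming a typical status layout; the sample output below is embedded so the filter can be demonstrated without a live cluster, and the node and service names are hypothetical:

```shell
# Embedded sample of `ha-manager status` output (assumed format; on a real
# cluster you would pipe the command's output directly into the filter).
status='quorum OK
master node1 (active, Mon Nov 21 12:00:00 2016)
lrm node1 (active, Mon Nov 21 12:00:00 2016)
lrm node2 (idle, Mon Nov 21 12:00:00 2016)
service vm:100 (node1, started)
service ct:102 (node2, started)'

# Keep only the service lines that are currently placed on node1
printf '%s\n' "$status" | awk '$1 == "service" && $3 == "(node1,"'
```

On a live cluster, piping `ha-manager status` through the same `awk` filter would serve the same purpose.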
+
+The target nodes for these migrations are selected from the other currently
+available nodes, and determined by the HA group configuration and the configured
+cluster resource scheduler (CRS) mode.
+During each migration, the original node will be recorded in the HA manager's
+state, so that the service can be moved back again automatically once the
+maintenance mode is disabled and the node is back online.
+
+Currently you can enable or disable the maintenance mode using the `ha-manager`
+CLI tool.
+
+.Enabling maintenance mode for a node
+----
+# ha-manager crm-command node-maintenance enable NODENAME
+----
+
+This will queue a CRM command; when the manager processes this command, it will
+record the request for maintenance mode in the manager status. This allows you
+to submit the command on any node, not just on the one that you want to place
+into or out of maintenance mode.
+
+Once the LRM on the respective node picks up the command, it will mark itself
+as unavailable, but still process all migration commands. This means that the
+LRM self-fencing watchdog will stay active until all active services have been
+moved away and all running workers have finished.
+
+Note that the LRM status will read `maintenance` as soon as the LRM has picked
+up the requested state, not only once all services have been moved away; this
+user experience is planned to be improved in the future.
+For now, you can check for any active HA services left on the node, or watch
+for a log line like `pve-ha-lrm[PID]: watchdog closed (disabled)` to know
+when the node has finished its transition into maintenance mode.
+
+NOTE: The manual maintenance mode is not automatically deleted on node reboot,
+but only if it is either manually deactivated using the `ha-manager` CLI or if
+the manager-status is manually cleared.
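For example, the check for that log line could be scripted as follows. This is a sketch assuming a syslog-style line format; the timestamp and PID are made up, and on a live node you would read the real journal via `journalctl -u pve-ha-lrm` instead of a captured sample:

```shell
# Hypothetical captured journal line (on a real node, you might obtain it
# with: journalctl -u pve-ha-lrm --since "10 minutes ago")
sample='Nov 21 12:00:01 node1 pve-ha-lrm[2317]: watchdog closed (disabled)'

# The fixed part of the message is what signals the completed transition
if printf '%s\n' "$sample" | grep -q 'watchdog closed (disabled)'; then
    echo "LRM watchdog closed, node transitioned into maintenance mode"
fi
```
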
+
+.Disabling maintenance mode for a node
+----
+# ha-manager crm-command node-maintenance disable NODENAME
+----
+
+The process of disabling the manual maintenance mode is similar to enabling it.
+Using the `ha-manager` CLI command shown above will queue a CRM command that,
+once processed, marks the respective LRM node as available again.
+
+If you deactivate the maintenance mode, all services that were on the node when
+the maintenance mode was activated will be moved back.

[[ha_manager_shutdown_policy]]
Shutdown Policy
@@ -841,6 +917,13 @@ Below you will find a description of the different HA policies for a node
shutdown. Currently 'Conditional' is the default due to backward compatibility.
Some users may find that 'Migrate' behaves more as expected.

+The shutdown policy can be configured in the Web UI (`Datacenter` -> `Options`
+-> `HA Settings`), or directly in `datacenter.cfg`:
+
+----
+ha: shutdown_policy=<value>
+----
+
Migrate
^^^^^^^

@@ -924,6 +1007,87 @@ NOTE: Please do not 'kill' services like `pve-ha-crm`, `pve-ha-lrm` or
immediate node reboot or even reset.

+[[ha_manager_crs]]
+Cluster Resource Scheduling
+---------------------------
+
+The cluster resource scheduler (CRS) mode controls how HA selects nodes for the
+recovery of a service as well as for migrations that are triggered by a
+shutdown policy. The default mode is `basic`; you can change it in the Web UI
+(`Datacenter` -> `Options`), or directly in `datacenter.cfg`:
+
+----
+crs: ha=static
+----
+
+[thumbnail="screenshot/gui-datacenter-options-crs.png"]
+
+The change will take effect starting with the next manager round (after a few
+seconds).
+
+For each service that needs to be recovered or migrated, the scheduler
+iteratively chooses the best node among the nodes with the highest priority in
+the service's group.
+
+NOTE: There are plans to add modes for (static and dynamic) load-balancing in
+the future.
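To illustrate, switching between the scheduler modes only changes that single `crs` line. Below is a sketch operating on a sample configuration string; the other option line is hypothetical, and on a real cluster you would edit `/etc/pve/datacenter.cfg` (or use the Web UI) rather than a shell variable:

```shell
# Hypothetical datacenter.cfg content; only the crs line is relevant here.
cfg='keyboard: en-us
crs: ha=basic'

# Flip the HA scheduler from the basic mode to the static mode
printf '%s\n' "$cfg" | sed 's/^crs: ha=basic$/crs: ha=static/'
```
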
+
+Basic Scheduler
+~~~~~~~~~~~~~~~
+
+The number of active HA services on each node is used to choose a recovery node.
+Non-HA-managed services are currently not counted.
+
+Static-Load Scheduler
+~~~~~~~~~~~~~~~~~~~~~
+
+IMPORTANT: The static mode is still a technology preview.
+
+Static usage information from HA services on each node is used to choose a
+recovery node. Usage of non-HA-managed services is currently not considered.
+
+For this selection, each node in turn is considered as if the service was
+already running on it, using CPU and memory usage from the associated guest
+configuration. Then for each such alternative, CPU and memory usage of all nodes
+are considered, with memory being weighted much more, because it's a truly
+limited resource. For both CPU and memory, the highest usage among nodes
+(weighted more, as ideally no node should be overcommitted) and the average
+usage of all nodes (to still be able to distinguish in case there already is a
+more highly committed node) are considered.
+
+IMPORTANT: The more services there are, the more possible combinations exist,
+so it's currently not recommended to use this mode if you have thousands of
+HA-managed services.
+
+
+CRS Scheduling Points
+~~~~~~~~~~~~~~~~~~~~~
+
+The CRS algorithm is not applied for every service in every round, since this
+would mean a large number of constant migrations. Depending on the workload,
+this could put more strain on the cluster than what it would avoid through
+constant balancing.
+That's why the {pve} HA manager favors keeping services on their current node.
+
+The CRS is currently used at the following scheduling points:
+
+- Service recovery (always active). When a node with active HA services fails,
+  all its services need to be recovered to other nodes. The CRS algorithm will
+  be used here to balance that recovery over the remaining nodes.
+
+- HA group config changes (always active).
If a node is removed from a group,
+  or its priority is reduced, the HA stack will use the CRS algorithm to find a
+  new target node for the HA services in that group, matching the adapted
+  priority constraints.
+
+- HA service stopped -> start transition (opt-in). Requesting that a stopped
+  service should be started is a good opportunity to check for the best suited
+  node as per the CRS algorithm, as moving stopped services is cheaper than
+  moving started ones, especially if their disk volumes reside on shared
+  storage. You can enable this by setting the **`ha-rebalance-on-start`**
+  CRS option in the datacenter config. You can also change that option in the
+  Web UI, under `Datacenter` -> `Options` -> `Cluster Resource Scheduling`.
+
 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
 endif::manvolnum[]