+* *after* you fixed all errors you may request that the service starts again
+
+
+[[ha_manager_package_updates]]
+Package Updates
+---------------
+
+When updating the ha-manager, you should update one node after the other, never
+all at once, for various reasons. First, while we test our software
+thoroughly, a bug affecting your specific setup cannot totally be ruled out.
+Updating one node after the other and checking the functionality of each node
+after finishing the update helps to recover from potential problems, while
+updating all at once could result in a broken cluster and is generally not
+good practice.
+
+Also, the {pve} HA stack uses a request-acknowledge protocol to perform
+actions between the cluster and the local resource manager. For restarting,
+the LRM makes a request to the CRM to freeze all its services. This prevents
+them from being touched by the cluster during the short time the LRM is
+restarting. After that, the LRM may safely close the watchdog during a
+restart. Such a restart normally happens during a package update and, as
+already stated, an active master CRM is needed to acknowledge the requests
+from the LRM. If this is not the case, the update process can take too long,
+which, in the worst case, may result in a reset triggered by the watchdog.
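+
+A sketched per-node update sequence could look like this (the commands below
+assume a standard {pve} installation; wait for the HA stack to settle before
+moving on to the next node):
+
+----
+# ha-manager status
+# apt update && apt full-upgrade
+# systemctl status pve-ha-lrm pve-ha-crm
+----
+
+The first command lets you verify that an active master CRM exists before you
+start, and the last one that the local HA services are running again after
+the update.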
+
+
+[[ha_manager_node_maintenance]]
+Node Maintenance
+----------------
+
+Sometimes it is necessary to perform maintenance on a node, such as replacing
+hardware or simply installing a new kernel image. This also applies while the
+HA stack is in use.
+
+The HA stack mainly supports you in two types of maintenance:
+
+* for general shutdowns or reboots, the behavior can be configured, see
+ xref:ha_manager_shutdown_policy[Shutdown Policy].
+* for maintenance that does not require a shutdown or reboot, or that should
+ not be deactivated automatically after just one reboot, you can enable the
+ manual maintenance mode.
+
+
+Maintenance Mode
+~~~~~~~~~~~~~~~~
+
+You can use the manual maintenance mode to mark the node as unavailable for HA
+operation, prompting all services managed by HA to migrate to other nodes.
+
+The target nodes for these migrations are selected from the other currently
+available nodes, and determined by the HA group configuration and the configured
+cluster resource scheduler (CRS) mode.
+During each migration, the original node will be recorded in the HA manager's
+state, so that the service can be moved back automatically once the
+maintenance mode is disabled and the node is back online.
+
+Currently, you can enable or disable the maintenance mode using the
+`ha-manager` CLI tool.
+
+.Enabling maintenance mode for a node
+----
+# ha-manager crm-command node-maintenance enable NODENAME
+----
+
+This will queue a CRM command; when the manager processes this command, it
+will record the request for maintenance mode in the manager status. This
+allows you to submit the command on any node, not just on the one you want to
+place into, or take out of, maintenance mode.
+
+Once the LRM on the respective node picks up the command, it will mark itself
+as unavailable, but still process all migration commands. This means that the
+LRM self-fencing watchdog will stay active until all active services have
+been moved away and all running workers have finished.
+
+Note that the LRM status will read `maintenance` as soon as the LRM has
+picked up the requested state, not only when all services have been moved
+away; this user experience is planned to be improved in the future.
+For now, you can check for any active HA services left on the node, or watch
+out for a log line like `pve-ha-lrm[PID]: watchdog closed (disabled)`, to
+know when the node has finished its transition into maintenance mode.
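+
+For example, assuming the log line wording shown above, the transition can be
+followed in the journal of the LRM service:
+
+----
+# journalctl -u pve-ha-lrm | grep 'watchdog closed'
+----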
+
+NOTE: The manual maintenance mode is not automatically deactivated on node
+reboot; it stays active until it is either manually disabled using the
+`ha-manager` CLI or the manager-status is manually cleared.
+
+.Disabling maintenance mode for a node
+----
+# ha-manager crm-command node-maintenance disable NODENAME
+----
+
+The process of disabling the manual maintenance mode is similar to enabling it.
+Using the `ha-manager` CLI command shown above will queue a CRM command that,
+once processed, marks the respective LRM node as available again.
+
+If you deactivate the maintenance mode, all services that were on the node when
+the maintenance mode was activated will be moved back.
+
+[[ha_manager_shutdown_policy]]
+Shutdown Policy
+~~~~~~~~~~~~~~~
+
+Below you will find a description of the different HA policies for a node
+shutdown. Currently 'Conditional' is the default due to backward compatibility.
+Some users may find that 'Migrate' behaves more as expected.
+
+The shutdown policy can be configured in the Web UI (`Datacenter` -> `Options`
+-> `HA Settings`), or directly in `datacenter.cfg`:
+
+----
+ha: shutdown_policy=<value>
+----
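+
+For example, to always try to migrate services away on shutdown and reboot,
+set the 'Migrate' policy described below:
+
+----
+ha: shutdown_policy=migrate
+----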
+
+Migrate
+^^^^^^^
+
+Once the Local Resource Manager (LRM) gets a shutdown request and this policy
+is enabled, it will mark itself as unavailable for the current HA manager.
+This triggers a migration of all HA services currently located on this node.
+The LRM will try to delay the shutdown process until all running services
+have been moved away. However, this requires that the running services *can*
+be migrated to another node. In other words, the service must not be locally
+bound, for example by using hardware passthrough.
+
+As non-group member nodes are considered as runnable targets if no group
+member is available, this policy can still be used when making use of HA
+groups with only some nodes selected. However, marking a group as
+'restricted' tells the HA manager that the service cannot run outside of the
+chosen set of nodes. If all of those nodes are unavailable, the shutdown will
+hang until you manually intervene. Once the shut-down node comes back online,
+the previously displaced services will be moved back, if they were not
+already manually migrated in-between.
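+
+As a sketch, such a 'restricted' group could look like this in
+`/etc/pve/ha/groups.cfg` (the group name, node names, and priorities are
+placeholders):
+
+----
+group: limited-nodes
+        nodes node1:2,node2:1
+        restricted 1
+----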
+
+NOTE: The watchdog is still active during the migration process on shutdown.
+If the node loses quorum it will be fenced and the services will be recovered.
+
+If you start a (previously stopped) service on a node which is currently in
+maintenance, the node needs to be fenced to ensure that the service can be
+moved and started on another available node.
+
+Failover
+^^^^^^^^
+
+This mode ensures that all services get stopped, but that they will also be
+recovered if the current node does not come back online soon. It can be
+useful when doing maintenance on a cluster scale, where live-migrating VMs
+may not be possible if too many nodes are powered off at a time, but you
+still want to ensure that HA services get recovered and started again as
+soon as possible.
+
+Freeze
+^^^^^^
+
+This mode ensures that all services get stopped and frozen, so that they won't
+get recovered until the current node is online again.
+
+Conditional
+^^^^^^^^^^^
+
+The 'Conditional' shutdown policy automatically detects if a shutdown or a
+reboot is requested, and changes behaviour accordingly.
+
+.Shutdown
+
+A shutdown ('poweroff') is usually done if it is planned for the node to stay
+down for some time. The LRM stops all managed services in this case. This means
+that other nodes will take over those services afterwards.
+
+NOTE: Recent hardware has large amounts of memory (RAM). So we stop all
+resources, then restart them, to avoid an online migration of all that RAM.
+If you want to use online migration, you need to invoke that manually before
+you shut down the node.
+
+
+.Reboot
+
+Node reboots are initiated with the 'reboot' command. This is usually done
+after installing a new kernel. Please note that this is different from
+``shutdown'', because the node immediately starts again.
+
+The LRM tells the CRM that it wants to restart, and waits until the CRM puts
+all resources into the `freeze` state (the same mechanism is used for
+xref:ha_manager_package_updates[Package Updates]). This prevents those
+resources from being moved to other nodes. Instead, the CRM starts the
+resources on the same node after the reboot.
+
+
+Manual Resource Movement
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Last but not least, you can also manually move resources to other nodes
+before you shut down or restart a node. The advantage is that you have full
+control, and you can decide whether you want to use online migration.
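+
+For instance, an HA-managed VM can be moved away with an online migration
+before the reboot (the service ID and target node are examples):
+
+----
+# ha-manager migrate vm:100 node2
+----
+
+If the service should be stopped and restarted on the target node instead of
+being live-migrated, `ha-manager relocate` can be used in the same way.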
+
+NOTE: Please do not 'kill' services like `pve-ha-crm`, `pve-ha-lrm` or
+`watchdog-mux`. They manage and use the watchdog, so this can result in an
+immediate node reboot or even reset.
+
+
+[[ha_manager_crs]]
+Cluster Resource Scheduling
+---------------------------
+
+The cluster resource scheduler (CRS) mode controls how HA selects nodes for
+the recovery of a service, as well as for migrations that are triggered by a
+shutdown policy. The default mode is `basic`; you can change it in the Web UI
+(`Datacenter` -> `Options`), or directly in `datacenter.cfg`:
+
+----
+crs: ha=static
+----
+
+[thumbnail="screenshot/gui-datacenter-options-crs.png"]
+
+The change will be in effect starting with the next manager round (after a few
+seconds).
+
+For each service that needs to be recovered or migrated, the scheduler
+iteratively chooses the best node among the nodes with the highest priority in
+the service's group.
+
+NOTE: There are plans to add modes for (static and dynamic) load-balancing in
+the future.