From: Dietmar Maurer
Date: Tue, 22 Nov 2016 06:45:48 +0000 (+0100)
Subject: ha-manager.adoc: reorder sections
X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=commitdiff_plain;h=26513daeb341827d3c0672ecb15807370ef7cd7b

ha-manager.adoc: reorder sections
---

diff --git a/ha-manager.adoc b/ha-manager.adoc
index e1b0df8..904347d 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -624,6 +624,29 @@ killing its process)
 * *after* you fixed all errors you may enable the service again
 
 
+[[ha_manager_package_updates]]
+Package Updates
+---------------
+
+When updating the ha-manager you should do one node after the other, never
+all at once for various reasons. First, while we test our software
+thoughtfully, a bug affecting your specific setup cannot totally be ruled out.
+Upgrading one node after the other and checking the functionality of each node
+after finishing the update helps to recover from an eventual problems, while
+updating all could render you in a broken cluster state and is generally not
+good practice.
+
+Also, the {pve} HA stack uses a request acknowledge protocol to perform
+actions between the cluster and the local resource manager. For restarting,
+the LRM makes a request to the CRM to freeze all its services. This prevents
+that they get touched by the Cluster during the short time the LRM is restarting.
+After that the LRM may safely close the watchdog during a restart.
+Such a restart happens on a update and as already stated a active master
+CRM is needed to acknowledge the requests from the LRM, if this is not the case
+the update process can be too long which, in the worst case, may result in
+a watchdog reset.
+
+
 Node Maintenance
 ----------------
 
@@ -654,9 +677,10 @@ done after installing a new kernel. Please note that this is different
 from ``shutdown'', because the node immediately starts again.
 
 The LRM tells the CRM that it wants to restart, and waits until the
-CRM puts all resources into the `freeze` state. This prevents that
-those resources are moved to other nodes. Instead, the CRM start the
-resources after the reboot on the same node.
+CRM puts all resources into the `freeze` state (same mechanism is used
+for xref:ha_manager_package_updates[Pakage Updates]). This prevents
+that those resources are moved to other nodes. Instead, the CRM start
+the resources after the reboot on the same node.
 
 
 Manual Resource Movement
@@ -672,29 +696,6 @@ NOTE: Please do not 'kill' services like `pve-ha-crm`, `pve-ha-lrm` or
 in a node reboot.
 
 
-[[ha_manager_package_updates]]
-Package Updates
----------------
-
-When updating the ha-manager you should do one node after the other, never
-all at once for various reasons. First, while we test our software
-thoughtfully, a bug affecting your specific setup cannot totally be ruled out.
-Upgrading one node after the other and checking the functionality of each node
-after finishing the update helps to recover from an eventual problems, while
-updating all could render you in a broken cluster state and is generally not
-good practice.
-
-Also, the {pve} HA stack uses a request acknowledge protocol to perform
-actions between the cluster and the local resource manager. For restarting,
-the LRM makes a request to the CRM to freeze all its services. This prevents
-that they get touched by the Cluster during the short time the LRM is restarting.
-After that the LRM may safely close the watchdog during a restart.
-Such a restart happens on a update and as already stated a active master
-CRM is needed to acknowledge the requests from the LRM, if this is not the case
-the update process can be too long which, in the worst case, may result in
-a watchdog reset.
-
-
 [[ha_manager_service_operations]]
 Service Operations
 ------------------