X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=ha-manager.adoc;h=04cff283c9de497487dcb91bc534ae37eaef81fd;hp=f9a9a94be9e7f8122a57a25c156e9a679e6a83a9;hb=863a8f3a781d4c7493dcac0175abc0f52c37a046;hpb=4c34defdf7e1707e0164b3b68a9639ca7e7b14f8
diff --git a/ha-manager.adoc b/ha-manager.adoc
index f9a9a94..04cff28 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -177,8 +177,6 @@ lock.
 
 Service States
 ~~~~~~~~~~~~~~
-[thumbnail="gui-ha-manager-status.png"]
-
 The CRM use a service state enumeration to record the current service
 state. We display this state on the GUI and you can query it using
 the `ha-manager` command line tool:
@@ -205,6 +203,10 @@ request_stop::
 Service should be stopped. The CRM waits for confirmation from the
 LRM.
 
+stopping::
+
+Pending stop request, but the CRM has not received the request yet.
+
 started::
 
 Service is active an LRM should start it ASAP if not already running.
@@ -212,6 +214,11 @@ If the Service fails and is detected to be not running the LRM
 restarts it
 (see xref:ha_manager_start_failure_policy[Start Failure Policy]).
 
+starting::
+
+Pending start request, but the CRM has not yet received confirmation
+from the LRM that the service is running.
+
 fence::
 
 Wait for node fencing (service node is not inside quorate cluster
@@ -234,6 +241,14 @@ error::
 Service is disabled because of LRM errors. Needs manual intervention
 (see xref:ha_manager_error_recovery[Error Recovery]).
 
+queued::
+
+Service is newly added, and the CRM has not seen it yet.
+
+disabled::
+
+Service is stopped and marked as `disabled`.
+
 Local Resource Manager
 ~~~~~~~~~~~~~~~~~~~~~~
 
@@ -354,7 +369,8 @@ the same HA configuration.
 
 Resources
 ~~~~~~~~~
-[thumbnail="gui-ha-manager-resources-view.png"]
+[thumbnail="gui-ha-manager-status.png"]
+
 
 The resource configuration file `/etc/pve/ha/resources.cfg` stores
 the list of resources managed by `ha-manager`. A resource configuration
@@ -613,13 +629,20 @@ Error Recovery
 
 If after all tries the service state could not be recovered it gets
 placed in an error state. In this state the service won't get touched
-by the HA stack anymore. To recover from this state you should follow
-these steps:
+by the HA stack anymore. The only way out is to disable the service:
 
-* bring the resource back into a safe and consistent state (e.g.,
-killing its process)
+----
+# ha-manager set vm:100 --state disabled
+----
 
-* disable the ha resource to place it in an stopped state
+This can also be done in the web interface.
+
+To recover from the error state you should do the following:
+
+* bring the resource back into a safe and consistent state (e.g.,
+kill its process if the service could not be stopped)
+
+* disable the resource to remove the error flag
 
 * fix the error which led to this failures
 
@@ -710,6 +733,7 @@ set state::
 
 Request the service state. See xref:ha_manager_resource_config[Resource
 Configuration] for possible request states.
++
 ----
 # ha-manager set SID -state REQUEST_STATE
 ----
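The error-recovery procedure added by this patch can be wrapped in two small shell helpers, one for clearing the error flag and one for handing the resource back to the HA stack afterwards. This is a sketch, not part of `ha-manager`: the function names `ha_clear_error` and `ha_reenable` and the `DRY_RUN` switch are hypothetical. Re-enabling through `--state started` is an assumption about the final recovery step, which is cut off in the hunk above. Only the `ha-manager set SID --state ...` calls are the real CLI.

```shell
#!/bin/sh
# Hypothetical helpers for the HA error-recovery procedure.
# If DRY_RUN is set, the commands are printed instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Disable the resource: the only way out of the error state.
# This removes the error flag and leaves the resource stopped.
ha_clear_error() {
    run ha-manager set "$1" --state disabled
}

# Assumed last step: only after the cause of the failure has been fixed
# by hand, request the started state so the HA stack manages it again.
ha_reenable() {
    run ha-manager set "$1" --state started
}
```

With `DRY_RUN=1 ha_clear_error vm:100` the sketch prints `ha-manager set vm:100 --state disabled` without touching the cluster. The two helpers are deliberately separate, so the manual steps (making the guest consistent and fixing the root cause) happen between them.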