git.proxmox.com Git - pve-ha-manager.git/commit

author	Thomas Lamprecht <t.lamprecht@proxmox.com>
	Wed, 16 Sep 2015 09:25:15 +0000 (11:25 +0200)
committer	Dietmar Maurer <dietmar@proxmox.com>
	Wed, 16 Sep 2015 09:54:29 +0000 (11:54 +0200)
commit	ea4443cc590b68908984ec0851aac524961d7bae
tree	1b3cddc364f3dcf29da356f78fa62f0d313b74cb
parent	bf119a50c271ac3d4a95260ff8efde18b9e5194a

implement recovery policy for services

We implement recovery policies which use settings known from
rgmanager. However, the behaviour is not strictly the same: our
approach is more configurable. For example, rgmanager cannot
combine its restart and relocate policies.

The following policy settings take effect when a service fails to
start:
* max_restart:  maximal number of tries to restart a failed service
                on the current node. The default is 1 restart try.
                This policy gets enforced by the LRM.

* max_relocate: maximal number of tries to relocate the service to
                a different node. A relocate only takes place after
                the max_restart value is exceeded on the current node.
                This policy gets enforced by the CRM.
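The interaction of the two settings can be sketched as a trace of what happens to a service that never starts successfully. This is a hypothetical Python model, not the Perl code in LRM.pm or Manager.pm. It assumes that the restart count is tracked per node (so every node the service lands on gets its own max_restart tries), that enough nodes are available, and that max_relocate also defaults to 1 (only the max_restart default is stated above). `failure_sequence` and the node names are illustrative.

```python
def failure_sequence(max_restart: int = 1, max_relocate: int = 1) -> list[str]:
    """Hypothetical trace of actions for a service whose start always fails."""
    actions: list[str] = []
    relocates = 0
    node = 0
    while True:
        # First start attempt on this node, then up to max_restart restarts (LRM).
        actions.append(f"start on node{node}")
        actions.extend(f"restart on node{node}" for _ in range(max_restart))
        if relocates >= max_relocate:
            # All tries exhausted: the service is put into the 'error' state.
            actions.append("error")
            return actions
        # max_restart exceeded on this node: the CRM relocates the service.
        relocates += 1
        node += 1
        actions.append(f"relocate to node{node}")


print(failure_sequence())
# ['start on node0', 'restart on node0', 'relocate to node1',
#  'start on node1', 'restart on node1', 'error']
```

Under these assumptions, a service that never starts gets (1 + max_restart) * (1 + max_relocate) start attempts before it ends up in 'error'.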

If a service is still not running after all tries are exhausted, its
state is set to 'error'. This means that the service needs to be
checked and disabled manually.

*Note* that the relocate state is only reset once the service has had
at least one successful start. This means that if a service is
re-enabled without fixing the error, only the restart policy is
repeated.
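The reset rule can be illustrated with a small state model. `ServiceState` and its methods are hypothetical names, and the model assumes that the restart counter starts fresh after every relocation and after re-enabling. It only mirrors the behaviour described in this commit message, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Hypothetical model of the recovery counters of a single HA service."""
    max_restart: int = 1
    max_relocate: int = 1  # assumed default
    restarts: int = 0      # restarts tried on the current node (LRM side)
    relocates: int = 0     # relocations tried so far (CRM side)
    state: str = "started"

    def on_start_result(self, ok: bool) -> str:
        """Return the next recovery action after a start attempt."""
        if ok:
            # Only a successful start clears the relocate counter.
            self.restarts = 0
            self.relocates = 0
            return "running"
        if self.restarts < self.max_restart:
            self.restarts += 1
            return "restart"
        if self.relocates < self.max_relocate:
            self.relocates += 1
            self.restarts = 0  # assumption: the new node gets a fresh restart budget
            return "relocate"
        self.state = "error"
        return "error"

    def reenable(self) -> None:
        """Manual re-enable after the service was disabled."""
        self.state = "started"
        self.restarts = 0  # the relocate counter is deliberately kept


svc = ServiceState()
print([svc.on_start_result(False) for _ in range(4)])
# ['restart', 'relocate', 'restart', 'error']
svc.reenable()
print([svc.on_start_result(False) for _ in range(2)])
# ['restart', 'error']
```

Because the relocate counter survives the re-enable, the second round goes straight from the exhausted restart try to 'error' without another relocation.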

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>

src/PVE/HA/Env/PVE2.pm
src/PVE/HA/LRM.pm
src/PVE/HA/Manager.pm
src/PVE/HA/Resources.pm