git.proxmox.com Git - pve-ha-manager.git/log
pve-ha-manager.git
8 years ago bump version to 1.0-6
Dietmar Maurer [Wed, 16 Sep 2015 10:06:37 +0000 (12:06 +0200)]
bump version to 1.0-6

8 years ago fix includes from services
Thomas Lamprecht [Wed, 16 Sep 2015 09:25:18 +0000 (11:25 +0200)]
fix includes from services

The crm and lrm daemon executables need to include SafeSyslog, as
they use syslog in their signal handlers.
It is no longer needed in the Service class of the daemons.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago fixing typos, also whitespace cleanup in PVE2 env class
Thomas Lamprecht [Wed, 16 Sep 2015 09:25:17 +0000 (11:25 +0200)]
fixing typos, also whitespace cleanup in PVE2 env class

Fix typos throughout the whole project; codespell was used to find
most of them.
Also do a big whitespace cleanup in the PVE2 environment class.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago adjust log level on failed start and error to warning
Thomas Lamprecht [Wed, 16 Sep 2015 09:25:16 +0000 (11:25 +0200)]
adjust log level on failed start and error to warning

Use warning instead of info to reflect the significance of the
log message.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago implement recovery policy for services
Thomas Lamprecht [Wed, 16 Sep 2015 09:25:15 +0000 (11:25 +0200)]
implement recovery policy for services

We implement recovery policies which use settings known from
rgmanager. The behaviour is not strictly the same, however, as
our approach is more configurable. For example, rgmanager cannot
combine its restart and relocate policies.

The following policy settings kick in on a failed
service start:
* max_restart:  maximal number of tries to restart a failed service
                on the current node. The default is 1 restart try.
                This policy gets enforced by the LRM.

* max_relocate: maximal number of tries to relocate the service to
                a different node. A relocate only takes place after
                the max_restart value is exceeded on the current node.
                This policy gets enforced by the CRM.

If a service is still not running after all max tries, its state
gets set to 'error'. This means that the service needs to be checked
and disabled manually.

*Note* that the relocate state will only reset when the service had
at least one successful start. That means if a service is re-enabled
without fixing the error, only the restart policy gets repeated.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
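The recovery policy described in this commit message can be sketched roughly as follows. This is an illustrative Python model, not the project's Perl code: the names `RecoveryState`, `on_failed_start`, `on_successful_start` and `on_reenable` are hypothetical, the default of 1 for max_relocate is an assumption (the message only states the max_restart default), and resetting the restart counter after a relocate is likewise assumed.

```python
from dataclasses import dataclass


@dataclass
class RecoveryState:
    """Hypothetical per-service counters; not the project's data model."""
    max_restart: int = 1    # default stated in the commit message
    max_relocate: int = 1   # assumed default, not stated in the message
    restart_tries: int = 0
    relocate_tries: int = 0


def on_failed_start(state: RecoveryState) -> str:
    """Return the next action after a failed start."""
    # Restart on the current node first (the LRM's part of the policy).
    if state.restart_tries < state.max_restart:
        state.restart_tries += 1
        return "restart"
    # Restarts exhausted: relocate to another node (the CRM's part).
    if state.relocate_tries < state.max_relocate:
        state.relocate_tries += 1
        state.restart_tries = 0  # assumption: restarts begin anew on the new node
        return "relocate"
    # All tries used up: the service must be checked and disabled manually.
    return "error"


def on_successful_start(state: RecoveryState) -> None:
    """Only a successful start resets the relocate state."""
    state.restart_tries = 0
    state.relocate_tries = 0


def on_reenable(state: RecoveryState) -> None:
    """Re-enabling without a successful start repeats only the restart policy."""
    state.restart_tries = 0
```

With both limits at 1, four consecutive failed starts yield restart, relocate, restart, error; re-enabling the service afterwards allows one more restart try before it ends up in 'error' again.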
8 years ago improve sid bash completion
Dietmar Maurer [Wed, 16 Sep 2015 06:32:58 +0000 (08:32 +0200)]
improve sid bash completion

8 years ago use helpers to enable advanced auto completion
Thomas Lamprecht [Tue, 15 Sep 2015 07:27:37 +0000 (09:27 +0200)]
use helpers to enable advanced auto completion

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago add auto completion helper for service IDs and HA groups
Thomas Lamprecht [Tue, 15 Sep 2015 07:27:36 +0000 (09:27 +0200)]
add auto completion helper for service IDs and HA groups

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago simulator: fix random output of manager status
Thomas Lamprecht [Fri, 11 Sep 2015 14:57:17 +0000 (16:57 +0200)]
simulator: fix random output of manager status

Tell Data::Dumper to sort the keys before dumping. That fixes
the manager status mess of jumping keys.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
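Data::Dumper offers a Sortkeys setting for this purpose. The same idea, shown as a Python analog rather than the project's Perl: serializing with sorted keys makes the dump independent of the order in which keys were stored. The status fields below are made up for illustration.

```python
import json


def dump_status(status: dict) -> str:
    """Serialize a status structure with keys in sorted order."""
    return json.dumps(status, sort_keys=True, indent=2)


# Two equal structures built in a different key order (hypothetical fields).
a = {"master_node": "node1", "node_status": {"node2": "online", "node1": "online"}}
b = {"node_status": {"node1": "online", "node2": "online"}, "master_node": "node1"}

# With sorted keys both dumps are identical, so the output no longer "jumps".
same_output = dump_status(a) == dump_status(b)
```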
8 years ago remove 'exename' from CLIHandler classes (not required)
Dietmar Maurer [Tue, 15 Sep 2015 06:26:59 +0000 (08:26 +0200)]
remove 'exename' from CLIHandler classes (not required)

8 years ago ha-manager: fix manpage header
Dietmar Maurer [Tue, 15 Sep 2015 05:32:22 +0000 (07:32 +0200)]
ha-manager: fix manpage header

8 years ago convert pve-ha-crm into a PVE::Service class
Thomas Lamprecht [Mon, 14 Sep 2015 15:21:56 +0000 (17:21 +0200)]
convert pve-ha-crm into a PVE::Service class

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago convert pve-ha-lrm into a PVE::Service class
Thomas Lamprecht [Mon, 14 Sep 2015 15:21:55 +0000 (17:21 +0200)]
convert pve-ha-lrm into a PVE::Service class

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago re-add code silently removed by last commit
Dietmar Maurer [Tue, 15 Sep 2015 05:25:10 +0000 (07:25 +0200)]
re-add code silently removed by last commit

8 years ago move ha-manager to separate CLIHandler class
Thomas Lamprecht [Mon, 14 Sep 2015 15:21:54 +0000 (17:21 +0200)]
move ha-manager to separate CLIHandler class

Move ha-manager to separate CLIHandler class and add basic auto
completion support.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago bump version to 1.0-5
Dietmar Maurer [Tue, 8 Sep 2015 06:46:03 +0000 (08:46 +0200)]
bump version to 1.0-5

8 years ago Adding error state behaviour
Thomas Lamprecht [Wed, 2 Sep 2015 15:52:33 +0000 (17:52 +0200)]
Adding error state behaviour

Previously there was no way out of the error state.
Now a 'safe' state can be reached by disabling the service manually.

Disabling and reactivating should only be done if the error cause
was found and fixed.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago Replacing hardcoded qemu commands with plugin calls
Thomas Lamprecht [Wed, 2 Sep 2015 15:52:32 +0000 (17:52 +0200)]
Replacing hardcoded qemu commands with plugin calls

Now a service specific plugin gets loaded and the calls to commands
like 'migrate' or 'stop' will be handled by the plugin.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago Fixed hardcoded type 'vm' in check if vm is ha managed
Thomas Lamprecht [Wed, 2 Sep 2015 15:52:31 +0000 (17:52 +0200)]
Fixed hardcoded type 'vm' in check if vm is ha managed

The new approach checks every registered resource type.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
8 years ago Adding PVECT resource class so that CT can be HA managed
Thomas Lamprecht [Wed, 2 Sep 2015 15:52:30 +0000 (17:52 +0200)]
Adding PVECT resource class so that CT can be HA managed

Extend the PVEVM resource class and add a PVECT resource class so
that service type specific operations (e.g. start, migrate, ...)
can be handled through a plugin and are independent of the service
type.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
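The plugin approach described in the commits above (a service-specific plugin is loaded and handles commands such as 'migrate' or 'stop') can be sketched as a small registry keyed by resource type. This is a Python illustration, not the project's Perl classes: `ResourcePlugin`, `PLUGINS`, `parse_sid` and `run` are hypothetical names, the 'ct:' prefix is an assumption by analogy with 'vm:', and the returned strings only stand in for the real operations.

```python
class ResourcePlugin:
    """Hypothetical base class: one plugin per service type."""
    label = "resource"

    def start(self, name: str) -> str:
        return f"start {self.label} {name}"

    def stop(self, name: str) -> str:
        return f"stop {self.label} {name}"

    def migrate(self, name: str, target: str) -> str:
        return f"migrate {self.label} {name} to {target}"


class VMPlugin(ResourcePlugin):
    label = "VM"


class CTPlugin(ResourcePlugin):
    label = "CT"


# Registry of known resource types; callers never hardcode 'vm'.
PLUGINS = {"vm": VMPlugin(), "ct": CTPlugin()}
COMMANDS = {"start", "stop", "migrate"}


def parse_sid(sid: str) -> tuple:
    """Split a service ID such as 'vm:100' into (type, name)."""
    rtype, sep, name = sid.partition(":")
    if not sep or not name or rtype not in PLUGINS:
        raise ValueError(f"unable to parse service id '{sid}'")
    return rtype, name


def run(sid: str, command: str, *args: str) -> str:
    """Dispatch a command to the plugin registered for the service type."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command '{command}'")
    rtype, name = parse_sid(sid)
    return getattr(PLUGINS[rtype], command)(name, *args)
```

Adding a further service type then only requires registering another plugin; the dispatching code stays unchanged.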
8 years ago HA parse_sid changed to accept CT
Alen Grizonic [Tue, 1 Sep 2015 09:53:59 +0000 (11:53 +0200)]
HA parse_sid changed to accept CT

[PATCH v4] changes:

- fixed VM/CT exist check
- added internal error exception
- fix spelling errors

8 years ago Fix Typo
Wolfgang Link [Mon, 17 Aug 2015 08:52:02 +0000 (10:52 +0200)]
Fix Typo

8 years ago bump version to 1.0-4
Dietmar Maurer [Tue, 16 Jun 2015 07:59:25 +0000 (09:59 +0200)]
bump version to 1.0-4

8 years ago groups: encode nodes as hash (internally)
Dietmar Maurer [Tue, 16 Jun 2015 07:57:09 +0000 (09:57 +0200)]
groups: encode nodes as hash (internally)

8 years ago add trigger for pve-api-updates
Dietmar Maurer [Tue, 16 Jun 2015 07:55:48 +0000 (09:55 +0200)]
add trigger for pve-api-updates

8 years ago crm: simply wait if there is no resource config
Dietmar Maurer [Wed, 10 Jun 2015 05:39:18 +0000 (07:39 +0200)]
crm: simply wait if there is no resource config

8 years ago bump version to 1.0-3
Dietmar Maurer [Tue, 9 Jun 2015 12:35:32 +0000 (14:35 +0200)]
bump version to 1.0-3

8 years ago bump version to 1.0-2
Dietmar Maurer [Tue, 9 Jun 2015 07:33:42 +0000 (09:33 +0200)]
bump version to 1.0-2

8 years ago use Wants instead of Requires inside systemd service definitions
Dietmar Maurer [Tue, 9 Jun 2015 07:32:15 +0000 (09:32 +0200)]
use Wants instead of Requires inside systemd service definitions

To avoid unnecessary restarts of dependent services.

8 years ago bump version to 1.0-1
Dietmar Maurer [Fri, 5 Jun 2015 08:04:45 +0000 (10:04 +0200)]
bump version to 1.0-1

8 years ago delete stale files
Dietmar Maurer [Fri, 5 Jun 2015 08:02:55 +0000 (10:02 +0200)]
delete stale files

8 years ago always start crm and lrm service
Dietmar Maurer [Fri, 5 Jun 2015 08:00:48 +0000 (10:00 +0200)]
always start crm and lrm service

Even if there is no resources.cfg. That makes it easier to
enable HA, because we don't need to start services manually.

9 years ago bump version to 0.9-3
Dietmar Maurer [Fri, 10 Apr 2015 04:54:36 +0000 (06:54 +0200)]
bump version to 0.9-3

9 years ago implement delay command for regression tester
Dietmar Maurer [Fri, 10 Apr 2015 04:51:51 +0000 (06:51 +0200)]
implement delay command for regression tester

9 years ago test of failback
root [Fri, 3 Apr 2015 08:26:29 +0000 (10:26 +0200)]
test of failback

Signed-off-by: Wolfgang Link <w.link@proxmox.com>
9 years ago correctly pass parameters for change_service_location
Dietmar Maurer [Fri, 10 Apr 2015 04:32:46 +0000 (06:32 +0200)]
correctly pass parameters for change_service_location

9 years ago sort output so that we can compare logs
Dietmar Maurer [Fri, 10 Apr 2015 04:31:44 +0000 (06:31 +0200)]
sort output so that we can compare logs

9 years ago bump version to 0.9-2
Dietmar Maurer [Tue, 7 Apr 2015 07:52:14 +0000 (09:52 +0200)]
bump version to 0.9-2

9 years ago add warnings if ha group does not exist
Dietmar Maurer [Tue, 7 Apr 2015 07:50:56 +0000 (09:50 +0200)]
add warnings if ha group does not exist

9 years ago use groups parser
Dietmar Maurer [Tue, 7 Apr 2015 04:55:02 +0000 (06:55 +0200)]
use groups parser

9 years ago avoid perl warning
Dietmar Maurer [Sun, 5 Apr 2015 15:55:09 +0000 (17:55 +0200)]
avoid perl warning

9 years ago update README
Dietmar Maurer [Fri, 3 Apr 2015 17:00:22 +0000 (19:00 +0200)]
update README

9 years ago do not allow deletion of ha group if group is used
Dietmar Maurer [Fri, 3 Apr 2015 14:45:49 +0000 (16:45 +0200)]
do not allow deletion of ha group if group is used

9 years ago use correct class
Dietmar Maurer [Fri, 3 Apr 2015 09:16:19 +0000 (11:16 +0200)]
use correct class

9 years ago complete ha group api
Dietmar Maurer [Fri, 3 Apr 2015 09:08:23 +0000 (11:08 +0200)]
complete ha group api

9 years ago api: allow to use simply VMIDs as resource id
Dietmar Maurer [Fri, 3 Apr 2015 06:33:37 +0000 (08:33 +0200)]
api: allow to use simply VMIDs as resource id

9 years ago improve status API
Dietmar Maurer [Fri, 3 Apr 2015 04:47:07 +0000 (06:47 +0200)]
improve status API

9 years ago remove ipaddr resource type
Dietmar Maurer [Fri, 3 Apr 2015 04:24:47 +0000 (06:24 +0200)]
remove ipaddr resource type

9 years ago bump version to 0.9-1
Dietmar Maurer [Fri, 3 Apr 2015 04:18:23 +0000 (06:18 +0200)]
bump version to 0.9-1

9 years ago rename vm resource prefix: pvevm: => vm:
Dietmar Maurer [Fri, 3 Apr 2015 04:16:40 +0000 (06:16 +0200)]
rename vm resource prefix: pvevm: => vm:

9 years ago add API to query ha status
Dietmar Maurer [Fri, 3 Apr 2015 04:14:04 +0000 (06:14 +0200)]
add API to query ha status

9 years ago bump version to 0.8-2
Dietmar Maurer [Thu, 2 Apr 2015 06:48:37 +0000 (08:48 +0200)]
bump version to 0.8-2

9 years ago lrm: reduce TimeoutStopSec
Dietmar Maurer [Thu, 2 Apr 2015 06:47:01 +0000 (08:47 +0200)]
lrm: reduce TimeoutStopSec

because systemd waits 2*TimeoutStopSec

9 years ago lrm: set systemd killmode to 'process'
Dietmar Maurer [Thu, 2 Apr 2015 06:43:28 +0000 (08:43 +0200)]
lrm: set systemd killmode to 'process'

We do not want to kill running VMs (for example during software update).

9 years ago bump version to 0.8-1
Dietmar Maurer [Thu, 2 Apr 2015 06:21:26 +0000 (08:21 +0200)]
bump version to 0.8-1

9 years ago correctly send cfs lock update request
Dietmar Maurer [Thu, 2 Apr 2015 06:17:15 +0000 (08:17 +0200)]
correctly send cfs lock update request

9 years ago bump version to 0.7-1
Dietmar Maurer [Wed, 1 Apr 2015 09:05:25 +0000 (11:05 +0200)]
bump version to 0.7-1

9 years ago create /etc/pve/ha
Dietmar Maurer [Wed, 1 Apr 2015 07:57:03 +0000 (09:57 +0200)]
create /etc/pve/ha

9 years ago use correct package for lock_ha_config
Dietmar Maurer [Wed, 1 Apr 2015 07:51:48 +0000 (09:51 +0200)]
use correct package for lock_ha_config

9 years ago fit ha-manager status when ha is unconfigured
Dietmar Maurer [Wed, 1 Apr 2015 06:20:05 +0000 (08:20 +0200)]
fit ha-manager status when ha is unconfigured

9 years ago do not unlink watchdog socket when started via systemd
Dietmar Maurer [Wed, 1 Apr 2015 06:19:32 +0000 (08:19 +0200)]
do not unlink watchdog socket when started via systemd

9 years ago depend on systemd (build-depend on dh-systemd)
Dietmar Maurer [Wed, 1 Apr 2015 06:05:01 +0000 (08:05 +0200)]
depend on systemd (build-depend on dh-systemd)

9 years ago fix json_reader
Dietmar Maurer [Wed, 1 Apr 2015 05:53:08 +0000 (07:53 +0200)]
fix json_reader

9 years ago fix dependencies
Dietmar Maurer [Tue, 31 Mar 2015 11:46:33 +0000 (13:46 +0200)]
fix dependencies

9 years ago lrm: use correct rpcenv 'ha'
Dietmar Maurer [Fri, 27 Mar 2015 11:42:20 +0000 (12:42 +0100)]
lrm: use correct rpcenv 'ha'

9 years ago bump version to 0.6-1
Dietmar Maurer [Fri, 27 Mar 2015 11:29:56 +0000 (12:29 +0100)]
bump version to 0.6-1

9 years ago move configuration handling into PVE::HA::Config
Dietmar Maurer [Fri, 27 Mar 2015 11:26:26 +0000 (12:26 +0100)]
move configuration handling into PVE::HA::Config

9 years ago use cfs_read_file and cfs_write_file
Dietmar Maurer [Fri, 27 Mar 2015 10:40:21 +0000 (11:40 +0100)]
use cfs_read_file and cfs_write_file

9 years ago ha-manager status: include service state
Dietmar Maurer [Fri, 27 Mar 2015 08:17:15 +0000 (09:17 +0100)]
ha-manager status: include service state

9 years ago ha-manager status: add --verbose flag
Dietmar Maurer [Fri, 27 Mar 2015 08:00:53 +0000 (09:00 +0100)]
ha-manager status: add --verbose flag

9 years ago restart lrm after upgrade
Dietmar Maurer [Fri, 27 Mar 2015 07:51:41 +0000 (08:51 +0100)]
restart lrm after upgrade

9 years ago ha-manager: improve status output
Dietmar Maurer [Fri, 27 Mar 2015 07:31:41 +0000 (08:31 +0100)]
ha-manager: improve status output

9 years ago add timestamp to manager status
Dietmar Maurer [Fri, 27 Mar 2015 07:31:13 +0000 (08:31 +0100)]
add timestamp to manager status

9 years ago update lrm status on each iteration
Dietmar Maurer [Fri, 27 Mar 2015 05:56:51 +0000 (06:56 +0100)]
update lrm status on each iteration

9 years ago update_lrm_status: add a time stamp
Dietmar Maurer [Fri, 27 Mar 2015 05:50:45 +0000 (06:50 +0100)]
update_lrm_status: add a time stamp

9 years ago cleanup lrm startup code
Dietmar Maurer [Fri, 27 Mar 2015 05:49:19 +0000 (06:49 +0100)]
cleanup lrm startup code

9 years ago depend on qemu-server
Dietmar Maurer [Fri, 27 Mar 2015 05:32:04 +0000 (06:32 +0100)]
depend on qemu-server

9 years ago improve docu
Dietmar Maurer [Fri, 27 Mar 2015 05:28:50 +0000 (06:28 +0100)]
improve docu

9 years ago remove dead code
Dietmar Maurer [Thu, 26 Mar 2015 16:17:49 +0000 (17:17 +0100)]
remove dead code

9 years ago add another test
Dietmar Maurer [Thu, 26 Mar 2015 15:47:18 +0000 (16:47 +0100)]
add another test

9 years ago add another test case
Dietmar Maurer [Thu, 26 Mar 2015 15:39:56 +0000 (16:39 +0100)]
add another test case

9 years ago bump version 0.5-1
Dietmar Maurer [Thu, 26 Mar 2015 12:23:20 +0000 (13:23 +0100)]
bump version 0.5-1

9 years ago implement migrate
Dietmar Maurer [Thu, 26 Mar 2015 12:01:27 +0000 (13:01 +0100)]
implement migrate

9 years ago implement change_service_location
Dietmar Maurer [Thu, 26 Mar 2015 11:50:47 +0000 (12:50 +0100)]
implement change_service_location

9 years ago lrm: fix stop timeout
Dietmar Maurer [Thu, 26 Mar 2015 09:43:06 +0000 (10:43 +0100)]
lrm: fix stop timeout

9 years ago fix service dependencies
Dietmar Maurer [Thu, 26 Mar 2015 09:21:02 +0000 (10:21 +0100)]
fix service dependencies

So that we can shut down without triggering the watchdog. It is also
important to depend on syslog.service (else logs get lost).

9 years ago assume lrm mode 'active' by default
Dietmar Maurer [Thu, 26 Mar 2015 07:08:58 +0000 (08:08 +0100)]
assume lrm mode 'active' by default

9 years ago log errors when writing lrm status
Dietmar Maurer [Thu, 26 Mar 2015 07:01:38 +0000 (08:01 +0100)]
log errors when writing lrm status

And correctly write status once at daemon startup (we need to wait for quorum)

9 years ago write lrm mode into lrm status file
Dietmar Maurer [Thu, 26 Mar 2015 06:26:24 +0000 (07:26 +0100)]
write lrm mode into lrm status file

LRM is normally in 'active' mode, but can be set to 'reboot', 'shutdown' or 'restart'.
We use this to freeze services, so that we can safely reboot a node, or restart
the LRM.
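
The mode handling described here can be pictured with a small Python sketch. It is not the project's Perl code: the JSON layout, the field names and the functions `write_lrm_status` and `services_frozen` are assumptions made for illustration. It treats every mode other than 'active' as freezing the node's services and falls back to 'active' when no mode is recorded.

```python
import json

# Modes named in the commit message.
LRM_MODES = {"active", "reboot", "shutdown", "restart"}


def write_lrm_status(mode: str, timestamp: int) -> str:
    """Build the status content an LRM would write (hypothetical layout)."""
    if mode not in LRM_MODES:
        raise ValueError(f"unknown lrm mode '{mode}'")
    return json.dumps({"mode": mode, "timestamp": timestamp}, sort_keys=True)


def services_frozen(status_json: str) -> bool:
    """Decide whether services on the node should be left untouched."""
    status = json.loads(status_json)
    # A missing mode is treated as 'active'.
    mode = status.get("mode", "active")
    return mode != "active"
```

A manager reading such a status would then skip recovery actions for nodes whose services are frozen, for example while the node reboots.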

9 years ago bump version to 0.4-1
Dietmar Maurer [Wed, 25 Mar 2015 12:59:47 +0000 (13:59 +0100)]
bump version to 0.4-1

9 years ago increase fence_delay to 60 seconds
Dietmar Maurer [Wed, 25 Mar 2015 12:09:28 +0000 (13:09 +0100)]
increase fence_delay to 60 seconds

To match the watchdog timeout.

9 years ago remove dead code
Dietmar Maurer [Wed, 25 Mar 2015 12:04:28 +0000 (13:04 +0100)]
remove dead code

9 years ago fix failover after master crash with pending fence action
Dietmar Maurer [Wed, 25 Mar 2015 12:00:09 +0000 (13:00 +0100)]
fix failover after master crash with pending fence action

Also include a test case for that.

9 years ago add README for regression test
Dietmar Maurer [Wed, 25 Mar 2015 08:06:16 +0000 (09:06 +0100)]
add README for regression test

The idea is to describe each test briefly, so that it is easier
to understand the purpose.

9 years ago re-enable ha-tester (run regression tests)
Dietmar Maurer [Wed, 25 Mar 2015 08:01:59 +0000 (09:01 +0100)]
re-enable ha-tester (run regression tests)

9 years ago remove stale tests
Dietmar Maurer [Wed, 25 Mar 2015 07:58:18 +0000 (08:58 +0100)]
remove stale tests

9 years ago fix regression test environment
Dietmar Maurer [Wed, 25 Mar 2015 07:51:57 +0000 (08:51 +0100)]
fix regression test environment

9 years ago move exec_resource_agent() to PVE::HA::Sim::Env
Dietmar Maurer [Wed, 25 Mar 2015 07:49:48 +0000 (08:49 +0100)]
move exec_resource_agent() to PVE::HA::Sim::Env

so that we can reuse it with regression tests

9 years ago LRM do not use time(), improve logging
Dietmar Maurer [Wed, 25 Mar 2015 07:48:29 +0000 (08:48 +0100)]
LRM do not use time(), improve logging

9 years ago add a hack to support regression tests (can_fork())
Dietmar Maurer [Wed, 25 Mar 2015 07:46:22 +0000 (08:46 +0100)]
add a hack to support regression tests (can_fork())