git.proxmox.com Git - pve-ha-manager.git/log
Dietmar Maurer [Wed, 16 Sep 2015 10:06:37 +0000 (12:06 +0200)]
bump version to 1.0-6
Thomas Lamprecht [Wed, 16 Sep 2015 09:25:18 +0000 (11:25 +0200)]
fix includes from services
The crm and lrm daemon executables need to include SafeSyslog, as
they use syslog in their signal handlers.
It is no longer needed in the Service classes of the daemons, however.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 16 Sep 2015 09:25:17 +0000 (11:25 +0200)]
fixing typos, also whitespace cleanup in PVE2 env class
Fix typos throughout the whole project; most of them were found
with codespell.
Also do a big whitespace cleanup in the PVE2 environment class.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 16 Sep 2015 09:25:16 +0000 (11:25 +0200)]
adjust log level on failed start and error to warning
use warning instead of info to represent the significance of the
log message
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 16 Sep 2015 09:25:15 +0000 (11:25 +0200)]
implement recovery policy for services
We implement recovery policies which use settings known from
rgmanager; however, the behaviour is not strictly the same, as
our approach is more configurable. For example, rgmanager cannot
combine its restart and relocate policies.
The following policy settings kick in on a failed
service start:
* max_restart: maximal number of tries to restart a failed service
  on the current node. The default is 1 restart try.
  This policy is enforced by the LRM.
* max_relocate: maximal number of tries to relocate the service to
  a different node. A relocation only takes place after
  the max_restart value is exceeded on the current node.
  This policy is enforced by the CRM.
If a service is still not running after all tries, its state
is set to 'error'. This means that the service needs to be checked
and disabled manually.
*Note* that the relocate try counter is only reset when the service has
had at least one successful start. That means if a service is re-enabled
without fixing the error, only the restart policy is repeated.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
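A minimal Python sketch of the recovery policy described above. It assumes the restart budget is per node (so it starts over after a relocation) and a default of 1 for max_relocate, which the commit message does not state. The class and method names are hypothetical and not the pve-ha-manager API:

```python
# Hypothetical sketch of the max_restart/max_relocate recovery policy.
# Names and the max_relocate default are illustrative assumptions.


class ServiceRecovery:
    def __init__(self, max_restart: int = 1, max_relocate: int = 1):
        self.max_restart = max_restart    # enforced by the LRM on the current node
        self.max_relocate = max_relocate  # enforced by the CRM across nodes
        self.restart_tries = 0
        self.relocate_tries = 0
        self.state = "started"

    def on_failed_start(self) -> str:
        """Decide what happens after a failed start attempt."""
        if self.restart_tries < self.max_restart:
            self.restart_tries += 1
            return "restart"
        if self.relocate_tries < self.max_relocate:
            self.relocate_tries += 1
            self.restart_tries = 0  # assumed: restart budget is per node
            return "relocate"
        self.state = "error"  # all tries used up
        return "error"

    def on_successful_start(self) -> None:
        # Only a successful start resets the relocate counter.
        self.restart_tries = 0
        self.relocate_tries = 0

    def on_reenable(self) -> None:
        # Disabling and re-enabling without a successful start resets only
        # the restart tries, so only the restart policy is repeated.
        self.restart_tries = 0
        self.state = "started"


r = ServiceRecovery()
print([r.on_failed_start() for _ in range(4)])
# ['restart', 'relocate', 'restart', 'error']
r.on_reenable()
print([r.on_failed_start() for _ in range(2)])
# ['restart', 'error']
```

With both limits at 1, the relocate budget is already spent when the service is re-enabled, so the second run goes straight from one restart to 'error', matching the note in the commit message.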
Dietmar Maurer [Wed, 16 Sep 2015 06:32:58 +0000 (08:32 +0200)]
improve sid bash completion
Thomas Lamprecht [Tue, 15 Sep 2015 07:27:37 +0000 (09:27 +0200)]
use helpers to enable advanced auto completion
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 15 Sep 2015 07:27:36 +0000 (09:27 +0200)]
add auto completion helper for service IDs and HA groups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 11 Sep 2015 14:57:17 +0000 (16:57 +0200)]
simulator: fix random output of manager status
Tell Data::Dumper to sort the keys before dumping. This fixes
the keys jumping around in the manager status output.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
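In Perl the fix corresponds to setting `$Data::Dumper::Sortkeys = 1`, because hash iteration order is not stable, so unsorted dumps of identical data can differ. As an illustrative Python analogue (not the simulator's code), `json.dumps(..., sort_keys=True)` gives the same guarantee: two dicts with equal contents but different insertion order serialize identically. The sample status fields are made up for the example:

```python
import json


def dump_status(status: dict) -> str:
    """Serialize a manager-status-like dict deterministically."""
    # sort_keys=True sorts keys at every nesting level, so the output no
    # longer depends on insertion order (like $Data::Dumper::Sortkeys = 1).
    return json.dumps(status, sort_keys=True, indent=1)


# Same contents, different key order (illustrative data only).
a = {"node_status": {"node1": "online", "node2": "online"}, "master_node": "node1"}
b = {"master_node": "node1", "node_status": {"node2": "online", "node1": "online"}}

print(json.dumps(a) == json.dumps(b))  # False: order leaks into the output
print(dump_status(a) == dump_status(b))  # True
```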
Dietmar Maurer [Tue, 15 Sep 2015 06:26:59 +0000 (08:26 +0200)]
remove 'exename' from CLIHandler classes (not required)
Dietmar Maurer [Tue, 15 Sep 2015 05:32:22 +0000 (07:32 +0200)]
ha-manager: fix manpage header
Thomas Lamprecht [Mon, 14 Sep 2015 15:21:56 +0000 (17:21 +0200)]
convert pve-ha-crm into a PVE::Service class
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Mon, 14 Sep 2015 15:21:55 +0000 (17:21 +0200)]
convert pve-ha-lrm into a PVE::Service class
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dietmar Maurer [Tue, 15 Sep 2015 05:25:10 +0000 (07:25 +0200)]
re-add code silently removed by last commit
Thomas Lamprecht [Mon, 14 Sep 2015 15:21:54 +0000 (17:21 +0200)]
move ha-manager to separate CLIHandler class
Move ha-manager to separate CLIHandler class and add basic auto
completion support.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dietmar Maurer [Tue, 8 Sep 2015 06:46:03 +0000 (08:46 +0200)]
bump version to 1.0-5
Thomas Lamprecht [Wed, 2 Sep 2015 15:52:33 +0000 (17:52 +0200)]
Adding error state behaviour
Previously there was no way out of the error state.
Now a 'safe' state can be reached by disabling the service manually.
The service should only be disabled and re-enabled once the cause
of the error has been found and fixed.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
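A simplified Python sketch of this rule, assuming the requested states are 'enabled'/'disabled' and that disabling moves an errored service to 'stopped'. The function is hypothetical and ignores the CRM's other states (migrate, fence, freeze, ...):

```python
# Hypothetical, simplified state transition: 'error' can only be left
# through a manual disable request.


def next_state(current: str, requested: str) -> str:
    """Return the next service state for a requested state (simplified)."""
    if current == "error":
        # Any request other than 'disabled' leaves the service in 'error'.
        return "stopped" if requested == "disabled" else "error"
    return "started" if requested == "enabled" else "stopped"


state = "error"
state = next_state(state, "enabled")   # still 'error'
state = next_state(state, "disabled")  # 'stopped', the safe state
state = next_state(state, "enabled")   # 'started'
print(state)  # started
```

Re-enabling an errored service therefore takes two steps in this sketch: disable first, then enable.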
Thomas Lamprecht [Wed, 2 Sep 2015 15:52:32 +0000 (17:52 +0200)]
Replacing hardcoded qemu commands with plugin calls
Now a service-specific plugin gets loaded, and calls to commands
like 'migrate' or 'stop' are handled by that plugin.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 2 Sep 2015 15:52:31 +0000 (17:52 +0200)]
Fixed hardcoded type 'vm' in check whether a VM is HA managed
The new approach checks every registered resource type.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 2 Sep 2015 15:52:30 +0000 (17:52 +0200)]
Adding PVECT resource class so that CT can be HA managed
Extend the PVEVM resource class and add a PVECT resource class so
that service type specific operations (e.g. start, migrate, ...)
can be handled through a plugin and are independent of the service
type.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
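The plugin approach from these commits can be sketched in Python as a registry keyed by resource type. This is only loosely modelled on the PVEVM/PVECT classes named above: the registry, `exec_command`, `find_resource_type` and the `configs` mapping are hypothetical, and the commands return messages instead of acting on real guests:

```python
# Hypothetical sketch of per-type resource plugins; not the real
# PVE::HA::Resources API. Commands return messages instead of acting.
from typing import Optional

_registry = {}


def register(cls):
    """Class decorator: register a plugin instance under its type name."""
    _registry[cls.type_name] = cls()
    return cls


class ResourcePlugin:
    type_name = ""
    label = ""

    def start(self, vmid: str) -> str:
        return f"starting {self.label} {vmid}"

    def stop(self, vmid: str) -> str:
        return f"stopping {self.label} {vmid}"

    def migrate(self, vmid: str, target: str) -> str:
        return f"migrating {self.label} {vmid} to {target}"


@register
class PVEVM(ResourcePlugin):
    type_name = "vm"
    label = "VM"


@register
class PVECT(ResourcePlugin):
    type_name = "ct"
    label = "CT"


def exec_command(sid: str, cmd: str, *args: str) -> str:
    """Dispatch e.g. 'start' or 'migrate' to the plugin for the sid's type."""
    type_name, vmid = sid.split(":", 1)
    plugin = _registry[type_name]  # KeyError for unregistered types
    return getattr(plugin, cmd)(vmid, *args)


def find_resource_type(vmid: str, configs: dict) -> Optional[str]:
    """Check every registered type (not just 'vm') for a config with this ID."""
    for type_name in _registry:
        if vmid in configs.get(type_name, ()):
            return type_name
    return None


print(exec_command("ct:101", "migrate", "node2"))  # migrating CT 101 to node2
```

New service types then only need a new registered class; the dispatch code stays unchanged.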
Alen Grizonic [Tue, 1 Sep 2015 09:53:59 +0000 (11:53 +0200)]
HA parse_sid changed to accept CT
[PATCH v4] changes:
- fixed VM/CT exist check
- added internal error exception
- fix spelling errors
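The commit does not show the parser itself. A minimal Python sketch, assuming service IDs of the form `vm:<id>` or `ct:<id>` (the `vm:` prefix follows the older rename commit further down this log) and that a bare numeric ID is treated as a VM, as the older commit allowing plain VMIDs suggests. This is an illustration, not the actual `parse_sid` code, and the error message is made up:

```python
import re

# Assumed format: "<type>:<numeric id>" with type 'vm' or 'ct'.
_SID_RE = re.compile(r"(vm|ct):([0-9]+)")


def parse_sid(sid: str):
    """Split a service ID into (type, id); a bare number is treated as a VM."""
    if re.fullmatch(r"[0-9]+", sid):
        sid = f"vm:{sid}"
    m = _SID_RE.fullmatch(sid)
    if not m:
        raise ValueError(f"unable to parse service id '{sid}'")
    return m.group(1), m.group(2)


print(parse_sid("ct:101"))  # ('ct', '101')
print(parse_sid("100"))     # ('vm', '100')
```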
Wolfgang Link [Mon, 17 Aug 2015 08:52:02 +0000 (10:52 +0200)]
Fix Typo
Dietmar Maurer [Tue, 16 Jun 2015 07:59:25 +0000 (09:59 +0200)]
bump version to 1.0-4
Dietmar Maurer [Tue, 16 Jun 2015 07:57:09 +0000 (09:57 +0200)]
groups: encode nodes as hash (internally)
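A minimal Python sketch of the idea, assuming the group's node list is stored as a comma-separated string. Keying a dict by node name (the analogue of a Perl hash filled with `$nodes->{$node} = 1`) makes membership checks O(1) and drops duplicates. The function name is hypothetical:

```python
# Hypothetical sketch: turn a comma-separated node list into a dict
# keyed by node name, mirroring a Perl hash of node => 1.


def parse_group_nodes(value: str) -> dict:
    """Encode a comma-separated node list as a dict for fast lookups."""
    nodes = {}
    for name in value.split(","):
        name = name.strip()
        if name:  # skip empty entries such as a trailing comma
            nodes[name] = 1
    return nodes


nodes = parse_group_nodes("node1, node2,node1,")
print(nodes)             # {'node1': 1, 'node2': 1}
print("node2" in nodes)  # True
```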
Dietmar Maurer [Tue, 16 Jun 2015 07:55:48 +0000 (09:55 +0200)]
add trigger for pve-api-updates
Dietmar Maurer [Wed, 10 Jun 2015 05:39:18 +0000 (07:39 +0200)]
crm: simply wait if there is no resource config
Dietmar Maurer [Tue, 9 Jun 2015 12:35:32 +0000 (14:35 +0200)]
bump version to 1.0-3
Dietmar Maurer [Tue, 9 Jun 2015 07:33:42 +0000 (09:33 +0200)]
bump version to 1.0-2
Dietmar Maurer [Tue, 9 Jun 2015 07:32:15 +0000 (09:32 +0200)]
use Wants instead of Requires inside systemd service definitions
To avoid unnecessary restarts of dependent services.
Dietmar Maurer [Fri, 5 Jun 2015 08:04:45 +0000 (10:04 +0200)]
bump version to 1.0-1
Dietmar Maurer [Fri, 5 Jun 2015 08:02:55 +0000 (10:02 +0200)]
delete stale files
Dietmar Maurer [Fri, 5 Jun 2015 08:00:48 +0000 (10:00 +0200)]
always start crm and lrm service
Even if there is no resources.cfg. That makes it easier to
enable HA, because we don't need to start services manually.
Dietmar Maurer [Fri, 10 Apr 2015 04:54:36 +0000 (06:54 +0200)]
bump version to 0.9-3
Dietmar Maurer [Fri, 10 Apr 2015 04:51:51 +0000 (06:51 +0200)]
implement delay command for regression tester
root [Fri, 3 Apr 2015 08:26:29 +0000 (10:26 +0200)]
test of failback
Signed-off-by: Wolfgang Link <w.link@proxmox.com>
Dietmar Maurer [Fri, 10 Apr 2015 04:32:46 +0000 (06:32 +0200)]
correctly pass parameters for change_service_location
Dietmar Maurer [Fri, 10 Apr 2015 04:31:44 +0000 (06:31 +0200)]
sort output so that we can compare logs
Dietmar Maurer [Tue, 7 Apr 2015 07:52:14 +0000 (09:52 +0200)]
bump version to 0.9-2
Dietmar Maurer [Tue, 7 Apr 2015 07:50:56 +0000 (09:50 +0200)]
add warnings if ha group does not exist
Dietmar Maurer [Tue, 7 Apr 2015 04:55:02 +0000 (06:55 +0200)]
use groups parser
Dietmar Maurer [Sun, 5 Apr 2015 15:55:09 +0000 (17:55 +0200)]
avoid perl warning
Dietmar Maurer [Fri, 3 Apr 2015 17:00:22 +0000 (19:00 +0200)]
update README
Dietmar Maurer [Fri, 3 Apr 2015 14:45:49 +0000 (16:45 +0200)]
do not allow deletion of ha group if group is used
Dietmar Maurer [Fri, 3 Apr 2015 09:16:19 +0000 (11:16 +0200)]
use correct class
Dietmar Maurer [Fri, 3 Apr 2015 09:08:23 +0000 (11:08 +0200)]
complete ha group api
Dietmar Maurer [Fri, 3 Apr 2015 06:33:37 +0000 (08:33 +0200)]
api: allow using plain VMIDs as resource IDs
Dietmar Maurer [Fri, 3 Apr 2015 04:47:07 +0000 (06:47 +0200)]
improve status API
Dietmar Maurer [Fri, 3 Apr 2015 04:24:47 +0000 (06:24 +0200)]
remove ipaddr resource type
Dietmar Maurer [Fri, 3 Apr 2015 04:18:23 +0000 (06:18 +0200)]
bump version to 0.9-1
Dietmar Maurer [Fri, 3 Apr 2015 04:16:40 +0000 (06:16 +0200)]
rename vm resource prefix: pvevm: => vm:
Dietmar Maurer [Fri, 3 Apr 2015 04:14:04 +0000 (06:14 +0200)]
add API to query ha status
Dietmar Maurer [Thu, 2 Apr 2015 06:48:37 +0000 (08:48 +0200)]
bump version to 0.8-2
Dietmar Maurer [Thu, 2 Apr 2015 06:47:01 +0000 (08:47 +0200)]
lrm: reduce TimeoutStopSec
because systemd waits 2*TimeoutStopSec
Dietmar Maurer [Thu, 2 Apr 2015 06:43:28 +0000 (08:43 +0200)]
lrm: set systemd killmode to 'process'
We do not want to kill running VMs (for example during software update).
Dietmar Maurer [Thu, 2 Apr 2015 06:21:26 +0000 (08:21 +0200)]
bump version to 0.8-1
Dietmar Maurer [Thu, 2 Apr 2015 06:17:15 +0000 (08:17 +0200)]
correctly send cfs lock update request
Dietmar Maurer [Wed, 1 Apr 2015 09:05:25 +0000 (11:05 +0200)]
bump version to 0.7-1
Dietmar Maurer [Wed, 1 Apr 2015 07:57:03 +0000 (09:57 +0200)]
create /etc/pve/ha
Dietmar Maurer [Wed, 1 Apr 2015 07:51:48 +0000 (09:51 +0200)]
use correct package for lock_ha_config
Dietmar Maurer [Wed, 1 Apr 2015 06:20:05 +0000 (08:20 +0200)]
fix ha-manager status when ha is unconfigured
Dietmar Maurer [Wed, 1 Apr 2015 06:19:32 +0000 (08:19 +0200)]
do not unlink watchdog socket when started via systemd
Dietmar Maurer [Wed, 1 Apr 2015 06:05:01 +0000 (08:05 +0200)]
depend on systemd (build-depend on dh-systemd)
Dietmar Maurer [Wed, 1 Apr 2015 05:53:08 +0000 (07:53 +0200)]
fix json_reader
Dietmar Maurer [Tue, 31 Mar 2015 11:46:33 +0000 (13:46 +0200)]
fix dependencies
Dietmar Maurer [Fri, 27 Mar 2015 11:42:20 +0000 (12:42 +0100)]
lrm: use correct rpcenv 'ha'
Dietmar Maurer [Fri, 27 Mar 2015 11:29:56 +0000 (12:29 +0100)]
bump version to 0.6-1
Dietmar Maurer [Fri, 27 Mar 2015 11:26:26 +0000 (12:26 +0100)]
move configuration handling into PVE::HA::Config
Dietmar Maurer [Fri, 27 Mar 2015 10:40:21 +0000 (11:40 +0100)]
use cfs_read_file and cfs_write_file
Dietmar Maurer [Fri, 27 Mar 2015 08:17:15 +0000 (09:17 +0100)]
ha-manager status: include service state
Dietmar Maurer [Fri, 27 Mar 2015 08:00:53 +0000 (09:00 +0100)]
ha-manager status: add --verbose flag
Dietmar Maurer [Fri, 27 Mar 2015 07:51:41 +0000 (08:51 +0100)]
restart lrm after upgrade
Dietmar Maurer [Fri, 27 Mar 2015 07:31:41 +0000 (08:31 +0100)]
ha-manager: improve status output
Dietmar Maurer [Fri, 27 Mar 2015 07:31:13 +0000 (08:31 +0100)]
add timestamp to manager status
Dietmar Maurer [Fri, 27 Mar 2015 05:56:51 +0000 (06:56 +0100)]
update lrm status on each iteration
Dietmar Maurer [Fri, 27 Mar 2015 05:50:45 +0000 (06:50 +0100)]
update_lrm_status: add a time stamp
Dietmar Maurer [Fri, 27 Mar 2015 05:49:19 +0000 (06:49 +0100)]
cleanup lrm startup code
Dietmar Maurer [Fri, 27 Mar 2015 05:32:04 +0000 (06:32 +0100)]
depend on qemu-server
Dietmar Maurer [Fri, 27 Mar 2015 05:28:50 +0000 (06:28 +0100)]
improve documentation
Dietmar Maurer [Thu, 26 Mar 2015 16:17:49 +0000 (17:17 +0100)]
remove dead code
Dietmar Maurer [Thu, 26 Mar 2015 15:47:18 +0000 (16:47 +0100)]
add another test
Dietmar Maurer [Thu, 26 Mar 2015 15:39:56 +0000 (16:39 +0100)]
add another test case
Dietmar Maurer [Thu, 26 Mar 2015 12:23:20 +0000 (13:23 +0100)]
bump version to 0.5-1
Dietmar Maurer [Thu, 26 Mar 2015 12:01:27 +0000 (13:01 +0100)]
implement migrate
Dietmar Maurer [Thu, 26 Mar 2015 11:50:47 +0000 (12:50 +0100)]
implement change_service_location
Dietmar Maurer [Thu, 26 Mar 2015 09:43:06 +0000 (10:43 +0100)]
lrm: fix stop timeout
Dietmar Maurer [Thu, 26 Mar 2015 09:21:02 +0000 (10:21 +0100)]
fix service dependencies
So that we can shut down without triggering the watchdog. It is also
important to depend on syslog.service (otherwise logs get lost).
Dietmar Maurer [Thu, 26 Mar 2015 07:08:58 +0000 (08:08 +0100)]
assume lrm mode 'active' by default
Dietmar Maurer [Thu, 26 Mar 2015 07:01:38 +0000 (08:01 +0100)]
log errors when writing lrm status
And correctly write status once at daemon startup (we need to wait for quorum)
Dietmar Maurer [Thu, 26 Mar 2015 06:26:24 +0000 (07:26 +0100)]
write lrm mode into lrm status file
LRM is normally in 'active' mode, but can be set to 'reboot', 'shutdown' or 'restart'.
We use this to freeze services, so that we can safely reboot a node, or restart
the LRM.
Dietmar Maurer [Wed, 25 Mar 2015 12:59:47 +0000 (13:59 +0100)]
bump version to 0.4-1
Dietmar Maurer [Wed, 25 Mar 2015 12:09:28 +0000 (13:09 +0100)]
increase fence_delay to 60 seconds
To match the watchdog timeout.
Dietmar Maurer [Wed, 25 Mar 2015 12:04:28 +0000 (13:04 +0100)]
remove dead code
Dietmar Maurer [Wed, 25 Mar 2015 12:00:09 +0000 (13:00 +0100)]
fix failover after master crash with pending fence action
Also include a test case for that.
Dietmar Maurer [Wed, 25 Mar 2015 08:06:16 +0000 (09:06 +0100)]
add README for regression tests
The idea is to describe each test briefly, so that it is easier
to understand its purpose.
Dietmar Maurer [Wed, 25 Mar 2015 08:01:59 +0000 (09:01 +0100)]
re-enable ha-tester (run regression tests)
Dietmar Maurer [Wed, 25 Mar 2015 07:58:18 +0000 (08:58 +0100)]
remove stale tests
Dietmar Maurer [Wed, 25 Mar 2015 07:51:57 +0000 (08:51 +0100)]
fix regression test environment
Dietmar Maurer [Wed, 25 Mar 2015 07:49:48 +0000 (08:49 +0100)]
move exec_resource_agent() to PVE::HA::Sim::Env
so that we can reuse it with regression tests
Dietmar Maurer [Wed, 25 Mar 2015 07:48:29 +0000 (08:48 +0100)]
LRM: do not use time(), improve logging
Dietmar Maurer [Wed, 25 Mar 2015 07:46:22 +0000 (08:46 +0100)]
add a hack to support regression tests (can_fork())