Thomas Lamprecht [Wed, 23 Jan 2019 09:34:40 +0000 (10:34 +0100)]
fix #1602: allow to delete 'ignored' services over API
service_is_ha_managed returns false if a service is in the resource
configuration but marked as 'ignored', as for the internal stack it
is as if it wasn't HA managed at all.
But a user should be able to remove it from the configuration easily
even in this state, without first setting the request state to
anything else.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 23 Jan 2019 08:43:14 +0000 (09:43 +0100)]
fix #1842: do not pass forceStop to CT shutdown
The vm_shutdown parameter forceStop differs in behaviour between VMs
and CTs. While for VMs it ensures that a VM only gets stopped after
the timeout has passed if it could not shut down gracefully, the
container stack always ignores any timeout if forceStop is set and
hard stops the CT immediately.
To achieve this behaviour for CTs too, passing the timeout is enough,
as lxc-stop then does the hard stop after the timeout by itself.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sun, 13 Jan 2019 11:39:53 +0000 (12:39 +0100)]
fence config parser: early return on ignored devices
We do not support all of the dlm.conf possibilities, but we also do
not want to die on such "unknown" keys/commands, as an admin should
be able to share this config if it is already used for other
purposes, e.g. lockd, gfs, or such.
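The skip-instead-of-die idea can be illustrated with a toy parser. This is a simplified sketch, not the real dlm.conf format or the actual parser; the supported key names are assumptions for illustration.

```python
# Toy parser: skip unknown keys (e.g. lockd or gfs settings) instead
# of dying, so a dlm.conf shared with other services still parses.
SUPPORTED_KEYS = {"device", "connect"}  # illustrative, not the real set

def parse_fence_config(text):
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, rest = line.partition(" ")
        if key not in SUPPORTED_KEYS:
            continue  # early return for ignored/unknown entries
        config.setdefault(key, []).append(rest)
    return config
```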
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
In this package we provide API functions, thus we want to activate
the pve-api-update trigger so that packages like pve-manager get
notified about it. But we also use API functions directly, so we set
up an interest in the pve-api-update trigger. This results in a
Lintian error (with the lintian version from buster or newer) which
we can override:
> [...]
> This tag is also triggered if the package has an activate trigger
> for something on which it also declares an interest. The only (but
> rather unlikely) reason to do this is if another package also
> declares an interest and this package needs to activate that other
> package. If the package is using it for this exact purpose, then
> please use a Lintian override to state this.
-- https://lintian.debian.org/tags/repeated-trigger-name.html
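Such an override might look like the following; the file name and override context are assumptions, only the tag name comes from the quoted Lintian page.

```
# debian/pve-ha-manager.lintian-overrides (hypothetical example):
# we activate pve-api-update AND declare an interest in it on purpose,
# so that other packages (e.g. pve-manager) get triggered as well
pve-ha-manager: repeated-trigger-name
```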
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
addresses a few nits from Fabian's review at:
https://pve.proxmox.com/pipermail/pve-devel/2018-December/035061.html
https://pve.proxmox.com/pipermail/pve-devel/2018-December/035085.html
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 20 Dec 2018 07:44:42 +0000 (08:44 +0100)]
fix #1378: allow to specify a service shutdown policy
Allow an admin to set a datacenter wide HA policy which can change
the way we handle services on a node shutdown.
There's:
* freeze: always freeze services, independent of the shutdown type
  (reboot, poweroff)
* failover: never freeze services; this means that a service will get
  recovered to another node if possible and if the current node does
  not come back up within the grace period of 1 minute.
* default: this is the current behavior, freeze on reboot but do not
freeze on poweroff
Add two tests: shutdown-policy1, which is based on the reboot1 test
but enforces no freeze with a failover policy, and shutdown-policy2,
which is based on the shutdown1 test but with an explicit freeze
policy. You can compare (diff) each test's log result to the test
it's based on to see what changes.
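The three policies amount to a small decision function, sketched here for illustration (the function name is made up, the behavior is taken from the list above):

```python
def should_freeze(policy, shutdown_type):
    """Decide whether HA services get frozen on this node shutdown."""
    if policy == "freeze":
        return True   # always freeze, regardless of shutdown type
    if policy == "failover":
        return False  # never freeze; allow recovery to another node
    # 'default' keeps the current behavior:
    return shutdown_type == "reboot"  # freeze on reboot, not poweroff
```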
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
use dpkg-buildpackage and debhelper properly, add missing dependencies and
embed used perl modules from libpve-common-perl to make pve-ha-simulator
standalone.
by moving parse_sid to PVE::HA::Env, with the default implementation in
PVE::HA::Config.
the bash completion methods use PVE::HA::Config (and PVE::Cluster), but
the corresponding use statements are only in PVE::CLI::ha_manager, where the
bash completion is actually used.
and use PVE::HA::Groups to parse the config when testing/simulating.
this allows us to drop the dependency on PVE::HA::Config, which would
otherwise pull in a lot of additional dependencies that we don't want
in the simulator.
Thomas Lamprecht [Wed, 22 Nov 2017 10:53:12 +0000 (11:53 +0100)]
do not do active work if cfs update failed
We ignored it if the cluster state update failed and happily worked
with an empty state, resulting in strange actions, e.g., the removal
of all (not so) "stale" services or changing the node state of all
but the master to unknown.
Check the update result and, if it failed, either do not get active
or, if already active, skip the current round, with the knowledge
that we only got here because the update failed but our lock renew
worked => cfs got into a working and quorate state again
(probably just a restart)
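The round decision can be sketched as follows; names are illustrative, not the actual CRM/LRM code:

```python
# Illustrative per-round decision for a manager daemon.
def decide_round(cfs_update_ok, is_active):
    if cfs_update_ok:
        return "do_work"
    if not is_active:
        # never get active on top of an empty, failed state
        return "stay_inactive"
    # Active and the lock renew worked => cfs is quorate again
    # (probably just a pmxcfs restart); skip this round and retry.
    return "skip_round"
```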
Thomas Lamprecht [Wed, 22 Nov 2017 10:53:11 +0000 (11:53 +0100)]
move cfs update to common code
We updated the CRM and LRM view of the cluster state only in the PVE2
environment, outside of all regression testing and simulation scope.
Further, we ignored it if this update failed and happily worked with
an empty state, resulting in strange actions, e.g., the removal of
all (not so) "stale" services or changing the node state of all but
the master to unknown.
This patch tries to improve this by moving the update out into its
own environment method, cluster_update_state, calling it in the LRM
and CRM and saving its result.
As with our introduced functionality to simulate cfs rw or update
errors, we can also simulate failures of this state update with the
RT system.
Thomas Lamprecht [Wed, 22 Nov 2017 10:53:08 +0000 (11:53 +0100)]
CRM: refactor check if state transition to active is ok
Mainly addresses a problem where we read the manager status without
catching any possible exceptions.
This was only done to check whether our node has active fencing jobs,
which tells us that it makes no sense to even try to acquire the
manager lock, as we're going to be fenced soon anyway.
Besides this check, we always checked whether we're quorate and
whether there are services configured, so move both checks into the
new 'can_get_active' method, which replaces the check_pending_fencing
and has_services methods.
Move the quorum check to the front and catch a possible error from
the following manager status read.
As a side effect the state transition code gets a bit shorter without
hiding the intention of the checks.
Thomas Lamprecht [Wed, 22 Nov 2017 10:53:07 +0000 (11:53 +0100)]
lrm: handle an error during service_status update
We may get an error here if the cluster filesystem is (temporarily)
unavailable. This error resulted in stopping the whole CRM service
immediately, which then triggered a node reset (if it happened on the
current master), even if we still had time left to retry and thus,
for example, handle an update of pve-cluster gracefully.
Add a method which wraps the status read in an eval and logs an
eventual error, but does not abort the service. Instead we rely on
our get_protected_ha_agent_lock method to detect a problem and switch
to the lost_agent_lock state.
If the pmxcfs outage was really short, so that the manager status
read failed but the lock update worked again, we also always update
before doing real work when in the 'active' state. If this update
fails we return from the eval and try again next round, as there is
no point in doing anything without a consistent state.
Thomas Lamprecht [Wed, 22 Nov 2017 10:53:06 +0000 (11:53 +0100)]
test/sim: allow to simulate cfs failures
Add simulated hardware commands for the cluster file system.
This allows telling the regression test or simulator system that a
certain node's calls to methods accessing the CFS should fail, i.e.,
die.
With this we can cover a situation which mainly happens during a
cluster file system update.
For now allow to define if the CFS is read-/writeable (state rw) and
if updates of the CFS (state update) should work or fail.
Add 'can read/write' assertions all over the relevant methods.
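A minimal sketch of these simulated failure switches; class, method and state names are illustrative, not the actual test/sim hardware API:

```python
# Toy simulated CFS with per-node failure flags ("rw" and "update").
class SimCfs:
    def __init__(self):
        self._state = {}  # node -> {"rw": bool, "update": bool}

    def set_cfs_state(self, node, rw=True, update=True):
        self._state[node] = {"rw": rw, "update": update}

    def _flags(self, node):
        return self._state.get(node, {"rw": True, "update": True})

    def assert_readable_writable(self, node):
        # The "can read/write" assertion placed in relevant methods.
        if not self._flags(node)["rw"]:
            raise RuntimeError(f"cfs on '{node}' is not read-/writeable")

    def update_cluster_state(self, node):
        # Simulates the (possibly failing) cluster state update.
        if not self._flags(node)["update"]:
            raise RuntimeError(f"cfs state update failed on '{node}'")
        return {"quorate": True}
```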
Thomas Lamprecht [Wed, 24 Jan 2018 10:04:56 +0000 (11:04 +0100)]
postinst: use auto generated postinst
This was introduced for cleaning up a possible leftover systemd
watchdog mux enable link, which is gone for good now.
Then it was extended with trigger targets, as the HA Manager services
now restart when the pve-api-update trigger fires.
As the autogenerated postinst already does the same unconditionally
for the pve-ha-lrm.service and pve-ha-crm.service, we may remove it
too.
The only difference is that the auto-generated script uses
try-restart instead of reload-or-try-restart, but this does not
matter, as the HA services currently have no reload ability.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 24 Jan 2018 10:04:55 +0000 (11:04 +0100)]
postinst: we do not use templates, remove debconf
This was copied in by accident when adding the transitional code for
removing the leftover of the systemd-managed watchdog mux in
commit f8a3fc80af299e613c21c9b67e29aee8cc807018
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
wrap possible problematic cfs_read_file calls in eval
Wrap those calls to the cfs_read_file method, which may now also die
if there was a grave problem reading the file, into an eval in all
methods which are used by the HA services.
The ones only used by API calls or CLI helpers are not wrapped, as
there it can be handled more gracefully (i.e., no watchdog is
running). Further, this is more intended as a temporary workaround
until we handle such an exception explicitly in the services - which
is a bit bigger change, so let's just go back to the old behavior for
now.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 24 Jan 2017 17:37:23 +0000 (18:37 +0100)]
do not show a service as queued if not configured
The check whether a service is configured has precedence over the
check whether a service is already processed by the manager.
This fixes a bug where a service could be shown as queued even though
it was meant to be ignored.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 24 Jan 2017 17:37:22 +0000 (18:37 +0100)]
add ignore state for resources
In this state the resource will not get touched by us; all commands
(like start/stop/migrate) go directly to the VM/CT itself and not
through the HA stack.
The resource will not get recovered if its node fails.
Achieve that by simply removing the respective service from the
manager_status service status hash if it is in the ignored state.
Add the state also to the test and simulator hardware.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 11 Oct 2017 13:10:19 +0000 (15:10 +0200)]
lrm: crm: show interest in pve-api-update trigger
To ensure that the LRM and CRM services get reloaded when the
pve-api-update trigger gets activated.
Important, as we directly use Perl API modules from qemu-server,
pve-container and pve-common, and really want to avoid running
outdated, possibly problematic or deprecated, code.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 11 Oct 2017 13:10:18 +0000 (15:10 +0200)]
lrm.service: do not timeout on stop
We must shut down all services when stopping the LRM for a host
shutdown; this can take longer than 95 seconds and should not get
interrupted, to ensure a graceful poweroff.
The watchdog is still active until all services got stopped, so we
are still safe from a freeze or equivalent failure.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Philip Abernethy [Thu, 14 Sep 2017 12:39:33 +0000 (14:39 +0200)]
fix #1347: let postfix fill in FQDN in fence mails
Using the nodename in $mailto is not correct and can lead to mails
not being forwarded in restrictive mail server configurations.
Also changes $mailfrom to 'root' instead of 'root@localhost', which
results in postfix appending the proper FQDN there, too. As a result
the Delivered-to header reads something like 'root@host.domain.tld'
instead of 'root@localhost', which is much more informative and more
consistent.
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 23 Aug 2017 08:15:49 +0000 (10:15 +0200)]
fix #1073: do not count backup-suspended VMs as running
When a stopped VM managed by HA was backed up, the HA stack
continuously tried to shut it down, as check_running only checks
whether a PID for the VM exists.
As the VM was locked, the shutdown tries were blocked, but still a
lot of annoying messages and task spawns happened during the backup
period.
As querying the VM status through the VM monitor is not cheap, first
check whether the VM is locked with the backup lock; the config is
cached, so this is quite cheap. Only then query the VM's status over
QMP and check whether the VM is in the 'prelaunch' state.
This state only gets set if KVM was started with the `-S` option and
has not yet continued guest operation.
Some performance results; I repeated each check 1000 times, the
first number is the total time spent just with the check, the second
is the time per single check:
old check (vm runs): 87.117 ms/total => 87.117 us/loop
new check (runs, no backup): 107.744 ms/total => 107.744 us/loop
new check (runs, backup): 760.337 ms/total => 760.337 us/loop
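The two-step check can be sketched like this; the config key, lock value and status callback are assumptions for illustration, not the real qemu-server API:

```python
# Treat a backup-suspended VM as not running: cheap config check
# first, expensive QMP status query only when actually needed.
def vm_counts_as_running(pid_exists, vm_config, query_qmp_status):
    if not pid_exists:
        return False
    if vm_config.get("lock") != "backup":
        return True  # cheap path: not locked for backup
    # A VM started with `kvm -S` for a backup sits in 'prelaunch'
    # until guest operation is continued.
    return query_qmp_status() != "prelaunch"
```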
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 23 May 2017 12:35:38 +0000 (14:35 +0200)]
explicitly sync journal when disabling watchdog updates
Without syncing, the journal could lose logs for a small interval
(ca. 10-60 seconds), but these last seconds are really interesting
for analyzing the cause of a triggered watchdog.
Also, without this, the
> "client did not stop watchdog - disable watchdog updates"
message often wasn't flushed to persistent storage, so some users had
a hard time figuring out why the machine reset.
Use the '--sync' switch of journalctl which - to quote its man page -
"guarantees that any log messages written before its invocation are
safely stored on disk at the time it returns."
Use execl to call `journalctl --sync` in a child process; do not
care about any error checks or recovery, as we will be reset anyway.
This is just a hit-or-miss try to log the situation more
consistently; if it fails we cannot really do anything anyhow.
We call the function on two points:
a) if we exit with active connections, here the watchdog will be
triggered soon and we want to ensure that this is logged.
b) if a client closes the connection without sending the magic close
byte, here the watchdog would trigger while we hang in epoll at
the beginning of the loop, so sync the log here also.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 26 May 2017 15:56:11 +0000 (17:56 +0200)]
always queue service stop if node shuts down
Commit 61ae38eb6fc5ab351fb61f2323776819e20538b7, which ensured that
services get frozen on a node reboot, had a side effect where running
services did not get gracefully shut down on node reboot.
This may lead to data loss as the services then get hard-killed, or
they may even prevent a node reboot because a storage cannot get
unmounted while a service still accesses it.
This commit addresses that issue but does not change the behavior of
the freeze logic for now; we should evaluate whether a freeze really
makes sense here, or at least make it configurable.
The changed regression test is a result of the fact that we did not
adopt the correct behavior for the is_node_shutdown command in the
problematic commit. The simulation environment returned true every
time a node shut down (reboot and poweroff), while the real world
environment only returned true if a poweroff happened, but not on a
reboot.
Now the simulation acts the same way as the real environment.
Further, I moved the simulation implementation to the base class so
that both the simulator and the regression test system behave the
same.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 24 Jan 2017 16:54:03 +0000 (17:54 +0100)]
Resource/API: abort early if resource in error state
If a service is in the error state, the single state change command
that can make sense is setting the 'disabled' request state.
Thus abort early on all other commands to enhance the user
experience.
Thomas Lamprecht [Thu, 19 Jan 2017 12:32:47 +0000 (13:32 +0100)]
sim: allow new service request states over gui
Change the old enabled/disabled GTK "Switch" element to a ComboBox
and add all possible service states, so we can better simulate the
real world behaviour with its new states.
As we do not need to map the boolean switch value to our states
anymore, we may drop the set_service_state method from the RTHardware
class and use the one from the Hardware base class instead.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 19 Jan 2017 12:32:46 +0000 (13:32 +0100)]
factor out and unify sim_hardware_cmd
Most things done by sim_hardware_cmd are already abstracted and
available in both the TestHardware and the RTHardware class.
Abstract out the CRM and LRM control to allow unifying
sim_hardware_cmd across both classes.
As the regression test system's TestHardware class saw most of the
new features in the last year, use it as the base.
We now return the current status out of the locked context, which
allows updating the simulator's GUI outside of the locked context.
This change increases the power of the HA Simulator, but the newly
possible actions must still be implemented in its GUI. This will be
done in future patches.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 19 Jan 2017 12:32:45 +0000 (13:32 +0100)]
sim: allocate HA Env only once per service and node
Do not allocate the HA Environment every time we fork a new CRM or
LRM, but once at the start of the Simulator for all nodes.
This can be done as the Env does not save any state and thus can be
reused; we use this in the TestHardware class as well.
Making the behavior of both Hardware classes more similar allows us
to refactor out some common code in following commits.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 12 Jan 2017 14:51:59 +0000 (15:51 +0100)]
Status: factor out new service state calculation
Factor out the new "fast feedback for user" service state calculation
and use it also in the HA Simulator to provide the same feedback as
in the real world.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 12 Jan 2017 14:51:57 +0000 (15:51 +0100)]
sim: improve canceling the migrate dialog
We could only cancel the migrate dialog by pressing ESC or, if the
window manager in use supports it, by pressing the window's "X"
button in the window border.
Pressing ESC caused a warning, because the result of the dialog was
now a string containing the signal name; as we checked for an integer
to see whether the "Ok" button was pressed, perl warned us:
> Argument "closed" isn't numeric in int at ...
Improve this by adding a cancel button and by switching the button
return values from integers to strings, which can be compared in a
more general way.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 12 Jan 2017 14:51:56 +0000 (15:51 +0100)]
sim: set migrate dialog transient to parent
This allows window managers to e.g. keep the dialog on top of the
main window, or to center the dialog over the main window.
This also fixes a warning that the dialog had no transient parent.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 12 Jan 2017 14:51:54 +0000 (15:51 +0100)]
ensure test/sim.pl always use the currently developed code
sim.pl suggested it was a perl script, but it was a bash script
which called ../pve-ha-simulator.
It set the include directory to '..', as it is intended to use the
currently developed code, not the pve-ha-simulator installed on the
system.
This did not work correctly as pve-ha-simulator has a
use lib '/global/path/to/HA/Simulator'
directive.
Create a small perl script which runs the RTHardware.
Changes relative to the pve-ha-simulator script include that we fall
back to the 'simtest' directory if none is given.
Also, the 'batch' option does not exist here; use the ha-tester
script if you want this behavior.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 20 Dec 2016 08:33:44 +0000 (09:33 +0100)]
is_node_shutdown: check for correct systemd targets
shutdown.target is active every time the node shuts down, be it
reboot, poweroff, halt or kexec.
As we want to return true only when the node powers down without a
restart afterwards, this was wrong.
Match only poweroff.target and halt.target, those two systemd targets
which cause a node shutdown without a reboot.
Enhance also the regular expression so that we do not falsely match
when a target includes poweroff.target in its name, e.g.
not-a-poweroff.target.
Also pass the 'full' flag to systemctl to ensure that target names
do not get ellipsized or cut off.
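The stricter matching can be sketched with a regex over a single unit line; this is an illustration of the idea only, the real check parses systemctl output in Perl:

```python
import re

# Match only the two targets that power the node down without a
# reboot; require word boundaries (start/whitespace) around the
# target name so e.g. "not-a-poweroff.target" does not match.
SHUTDOWN_RE = re.compile(r"(?:^|\s)(?:poweroff|halt)\.target(?:\s|$)")

def node_powers_down(unit_line):
    """True only for targets meaning shutdown without a restart."""
    return bool(SHUTDOWN_RE.search(unit_line))
```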
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>