This fixes point 2 of commit
3addeeb by ensuring that an LRM does not go
active as long as the CRM still has its node in (pending) `fence`
state, which can happen after a watchdog reset followed by a fast
boot. This prevents the LRM from interfering with the CRM acquiring
the node's agent lock, which becomes all the more important once a
future commit ensures that a node isn't stuck in `fence` state if no
services are configured (anymore), e.g., because the admin removed
them manually during fencing.
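
To illustrate (a minimal sketch, not the actual work-loop code): the
LRM should only try to become active when no fencing is pending for
its node. The sub and helper names update_local_state() and
is_fence_requested() below are placeholders chosen for this sketch;
only get_protected_ha_agent_lock(), set_local_status() and the
environment's log() method refer to existing code.

sub update_local_state {
    my ($self) = @_;

    my $haenv = $self->{haenv};

    # true if a service on this node still awaits fencing, or - with
    # this change - if the CRM still reports the node itself as 'fence'
    if ($self->is_fence_requested()) {
        # stay inactive so the CRM can acquire our agent lock and
        # finish (or clean up) the fencing process
        $haenv->log('notice', "node is pending fence, not going active");
        return;
    }

    # only now it is safe to (try to) acquire the agent lock and go active
    if ($self->get_protected_ha_agent_lock()) {
        $self->set_local_status({ state => 'active' });
    }
}
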
We explicitly fix the startup behavior first to better show how it
works in the test framework. As the test/sim hardware can now delay
the CRM while keeping the LRM running, the second test (i.e.,
test-service-command9) will still trigger after the next commit,
should this fix ever get reverted or broken.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
        return undef;
    } else {
        $self->{service_status} = $ms->{service_status} || {};
+        my $nodename = $haenv->nodename();
+        $self->{node_status} = $ms->{node_status}->{$nodename} || 'unknown';
        return 1;
    }
}
    my ($self) = @_;
    my $haenv = $self->{haenv};
+
    my $nodename = $haenv->nodename();
    my $ss = $self->{service_status};
    my $fenced_services = PVE::HA::Tools::count_fenced_services($ss, $nodename);
-    return $fenced_services;
+    return $fenced_services || $self->{node_status} eq 'fence';
}
sub active_service_count {
info 24 node3/crm: status change wait_for_quorum => slave
info 120 cmdlist: execute service vm:103 add node3 stopped
info 120 node1/crm: adding new service 'vm:103' on node 'node3'
-info 125 node3/lrm: got lock 'ha_agent_node3_lock'
-info 125 node3/lrm: status change wait_for_agent_lock => active
-info 140 node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
info 220 cmdlist: execute service vm:103 started
-info 220 node1/crm: service 'vm:103': state changed from 'stopped' to 'started' (node = node3)
-info 225 node3/lrm: starting service vm:103
-info 225 node3/lrm: service status vm:103 started
info 820 hardware: exit simulation - done
info 20 node1/lrm: status change wait_for_agent_lock => active
info 20 node1/lrm: starting service vm:101
info 20 node1/lrm: service status vm:101 started
-info 22 node3/lrm: got lock 'ha_agent_node3_lock'
-info 22 node3/lrm: status change wait_for_agent_lock => active
-info 22 node3/lrm: starting service vm:103
-info 22 node3/lrm: service status vm:103 started
info 40 run-loop: skipping CRM round
info 60 node1/crm: got lock 'ha_manager_lock'
info 60 node1/crm: status change wait_for_quorum => master