= Proxmox HA Manager =

== Motivation ==

The current HA manager has a number of drawbacks:

- no more development (Red Hat moved to pacemaker)

- highly dependent on an old corosync version

- complicated code (caused by the compatibility layer with the
  older cluster stack, cman)

- no self-fencing

In the future, we want to make HA easier for our users, and it should
be possible to move to the newest corosync, or even a totally different
cluster stack. So we want:

- possible to run with any distributed key/value store which provides
  some kind of locking (with timeouts).

- self fencing using the Linux watchdog device

- implemented in Perl, so that we can use the PVE framework

- only works with simple resources like VMs

= Architecture =

== Cluster requirements ==

=== Cluster wide locks with timeouts ===

The cluster stack must provide cluster wide locks with timeouts.
The Proxmox 'pmxcfs' implements this on top of corosync.
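
To illustrate what "locks with timeouts" means here, the following is a
minimal in-memory sketch in Python. It does not show how 'pmxcfs' works
internally; the class name `TimedLockTable`, its methods and the timeout
values are hypothetical and chosen only for this example. The property
that matters is that a lock which its owner stops refreshing expires,
and can then be taken by another node.

```python
import time


class TimedLockTable:
    """In-memory stand-in for a cluster wide lock service (hypothetical)."""

    def __init__(self, timeout=120.0, clock=time.monotonic):
        self.timeout = timeout  # assumed: seconds until an unrefreshed lock expires
        self.clock = clock      # injectable clock, so the sketch can be tested
        self._locks = {}        # lock name -> (owner, expiry time)

    def acquire(self, name, owner):
        """Take or refresh a lock; return True if `owner` holds it afterwards."""
        now = self.clock()
        holder = self._locks.get(name)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False  # held by someone else and not yet expired
        self._locks[name] = (owner, now + self.timeout)
        return True

    def release(self, name, owner):
        """Drop the lock, but only if `owner` is the current holder."""
        holder = self._locks.get(name)
        if holder is not None and holder[0] == owner:
            del self._locks[name]
```

Calling `acquire` again as the current owner acts as a refresh. A second
caller is refused until the owner either calls `release` or stops
refreshing for the full timeout.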

== Self fencing ==

A node needs to acquire a special 'agent_lock' (one separate lock for
each node) before starting HA resources, and the node updates the
watchdog device once it gets that lock. If the node loses quorum, or is
unable to get the 'agent_lock', the watchdog is no longer updated. The
node can release the lock if there are no running HA resources.

This makes sure that the node holds the 'agent_lock' as long as there
are running services on that node.
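
The node-side rules above can be sketched as one iteration of a
periodic loop. This is an illustrative Python sketch, not the actual
implementation: the class `NodeAgent` and the callbacks `try_lock`,
`release_lock` and `pet_watchdog` are hypothetical names standing in
for the cluster lock and the watchdog device.

```python
class NodeAgent:
    """Hypothetical per-node agent implementing the self-fencing rules."""

    def __init__(self, name, try_lock, release_lock, pet_watchdog):
        self.name = name
        self.try_lock = try_lock          # callable(node) -> bool, acquires or refreshes 'agent_lock'
        self.release_lock = release_lock  # callable(node), gives 'agent_lock' up
        self.pet_watchdog = pet_watchdog  # callable(), updates the watchdog device
        self.holds_lock = False

    def tick(self, has_quorum, services):
        """Run one loop iteration; return True if the watchdog was updated.

        `services` is the number of HA resources running or about to start.
        """
        if services == 0:
            # No HA resources: the node may release its lock.
            # Assumption: a real implementation would also have to
            # disarm the watchdog at this point.
            if self.holds_lock:
                self.release_lock(self.name)
                self.holds_lock = False
            return False
        # Without quorum the lock is not even attempted.
        self.holds_lock = bool(has_quorum and self.try_lock(self.name))
        if self.holds_lock:
            self.pet_watchdog()
        # If the lock is not held, the watchdog is not updated and will
        # eventually reset the node.
        return self.holds_lock
```

The design choice to note is that the watchdog update is the last step
and depends on both conditions, so any failure earlier in the iteration
leads to a reset rather than to services running without the lock.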

The HA manager can assume that the watchdog triggered a reboot once it
is able to acquire the 'agent_lock' for that node.
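
This conclusion relies on timing: the manager can only get the lock
after it has expired, so the lock timeout must be longer than the
watchdog timeout. The Python sketch below works through that argument.
The two constants and both function names are assumptions made for the
example, and it further assumes that a node refreshes its lock and
updates its watchdog at the same moment.

```python
WATCHDOG_TIMEOUT = 60  # assumed: seconds without an update before the node is reset
LOCK_TIMEOUT = 120     # assumed: seconds without a refresh before 'agent_lock' expires


def node_was_reset(last_update, now):
    """True if the watchdog must have fired by `now`."""
    return now - last_update >= WATCHDOG_TIMEOUT


def manager_can_take_lock(last_update, now):
    """True if the node's 'agent_lock' has expired by `now`."""
    return now - last_update >= LOCK_TIMEOUT


# The inference is only safe if the lock outlives the watchdog.
assert LOCK_TIMEOUT > WATCHDOG_TIMEOUT
```

With these values, a node that stops updating at time 0 is reset at
second 60, while the manager cannot obtain its lock before second 120.
Whenever the manager does get the lock, the services on that node have
therefore already been stopped by the reset.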