= Proxmox HA Manager =

== Motivation ==

The current HA manager has a number of drawbacks:

- no more development (Red Hat moved to pacemaker)

- highly depends on an old version of corosync

- complicated code (caused by the compatibility layer with the
  older cluster stack (cman))

- no self-fencing

In the future, we want to make HA easier for our users, and it should
be possible to move to the newest corosync, or even a totally different
cluster stack. So we want:

- possible to run with any distributed key/value store which provides
  some kind of locking with timeouts

- self fencing using the Linux watchdog device

- implemented in Perl, so that we can use the PVE framework

- only works with simple resources like VMs

= Architecture =

== Cluster requirements ==

=== Cluster wide locks with timeouts ===

The cluster stack must provide cluster wide locks with timeouts.
The Proxmox 'pmxcfs' implements this on top of corosync.

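Conceptually, not much is needed from the store. The following is a
minimal in-memory sketch of lock-with-timeout semantics, for illustration
only (this is not the pmxcfs API):

  # Minimal sketch of cluster wide lock semantics with a timeout.
  # In-memory illustration only - the real implementation lives in
  # pmxcfs and uses corosync underneath.
  use strict;
  use warnings;

  my %locks; # lockname => { owner => ..., expire => ... }

  sub try_acquire_lock {
      my ($name, $node, $now, $timeout) = @_;
      my $lock = $locks{$name};
      # grant the lock if it is free, expired, or already owned by us
      if (!$lock || $lock->{expire} <= $now || $lock->{owner} eq $node) {
          $locks{$name} = { owner => $node, expire => $now + $timeout };
          return 1;
      }
      return 0; # somebody else still holds a valid lock
  }

The important property is the timeout: if the owner stops renewing the
lock, another node can take it over once the timeout has expired.
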
== Self fencing ==

A node needs to acquire a special 'ha_agent_${node}_lock' (one separate
lock for each node) before starting HA resources, and the node updates
the watchdog device once it gets that lock. If the node loses quorum,
or is unable to get the 'ha_agent_${node}_lock', the watchdog is no
longer updated. The node can release the lock if there are no running
HA resources.

This makes sure that the node holds the 'ha_agent_${node}_lock' as
long as there are running services on that node.

The HA manager can assume that the watchdog triggered a reboot when it
is able to acquire the 'ha_agent_${node}_lock' for that node.

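The core of this mechanism is a simple loop. Here is a hedged sketch;
the lock, quorum and watchdog helpers are stubs standing in for pmxcfs
and /dev/watchdog, not the real PVE::HA code:

  # Sketch of the self fencing loop. All helpers are stubs; the real
  # code talks to pmxcfs (locks, quorum) and the watchdog device.
  use strict;
  use warnings;

  sub have_quorum        { return 1 }                    # stub: ask the cluster stack
  sub acquire_agent_lock { my ($name) = @_; return 1 }   # stub: cluster wide lock
  sub watchdog_update    { print "watchdog updated\n" }  # stub: pet /dev/watchdog

  my $node = 'node1';
  for (1 .. 3) { # the real daemon loops forever
      if (have_quorum() && acquire_agent_lock("ha_agent_${node}_lock")) {
          # we own the lock, so keep the node alive
          watchdog_update();
      } else {
          # lost quorum or the lock: stop updating the watchdog, the
          # node will self-fence once the watchdog timeout expires
      }
      sleep(1);
  }
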
== Testing requirements ==

We want to be able to simulate a whole HA cluster using a GUI. This
makes it easier to learn how the system behaves. We also need a way to
run regression tests.

= Implementation details =

== Cluster Resource Manager (class PVE::HA::CRM) ==

The Cluster Resource Manager (CRM) daemon runs on each node, but
locking makes sure that only one CRM daemon acts in the 'master' role.
That 'master' daemon reads the service configuration file and requests
new service states by writing the global 'manager_status'. That data
structure is read by the Local Resource Manager, which performs the
real work (start/stop/migrate services).

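The exact layout of 'manager_status' is an implementation detail. As a
hedged illustration, the per-service part could look roughly like this
(the field names are examples, not the authoritative format):

  # Illustrative shape of the data the CRM master writes - the field
  # names are examples only, not the definitive on-disk format.
  my $manager_status = {
      master_node    => 'node1',
      service_status => {
          'vm:100' => { state => 'started',      node => 'node2' },
          'vm:101' => { state => 'request_stop', node => 'node3' },
      },
  };
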
=== Possible CRM Service States ===

stopped: Service is stopped (confirmed by LRM)

request_stop: Service should be stopped. Waiting for
        confirmation from LRM.

started: Service is active and the LRM should start it as soon as
        possible.

fence: Wait for node fencing (service node is not inside the
        quorate cluster partition).

migrate: Migrate VM to another node

error: Service disabled because of LRM errors.

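To make the transitions a bit more concrete, here is a hedged sketch of
how the CRM could advance a service from one state to the next; the
$sd->{...} field names are illustrative, and the real logic handles
many more cases:

  # Hedged sketch of a few CRM state transitions; $sd holds the
  # per-service data, and all field names are illustrative only.
  sub advance_state {
      my ($sd) = @_;

      if ($sd->{state} eq 'request_stop') {
          # wait until the LRM confirms that the service is down
          $sd->{state} = 'stopped' if $sd->{lrm_status} eq 'stopped';
      } elsif ($sd->{state} eq 'started' && !$sd->{node_online}) {
          # the service node left the quorate partition
          $sd->{state} = 'fence';
      } elsif ($sd->{state} eq 'fence' && $sd->{node_fenced}) {
          # fencing confirmed (the agent lock of that node was acquired),
          # so the service can be started on another node
          $sd->{node}  = $sd->{recovery_node};
          $sd->{state} = 'started';
      }

      return $sd->{state};
  }
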
== Local Resource Manager (class PVE::HA::LRM) ==

The Local Resource Manager (LRM) daemon runs on each node, and
performs service commands (start/stop/migrate) for services assigned
to the local node. It should be mentioned that each LRM holds a
cluster wide 'ha_agent_${node}_lock' lock, and the CRM is not allowed
to assign the service to another node while the LRM holds that lock.

The LRM reads the requested service state from 'manager_status', and
tries to bring the local service into that state. The actual service
status is written back to 'service_${node}_status', and can be
read by the CRM.

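A hedged sketch of one LRM work cycle under these assumptions follows;
the start/stop/status helpers are stubs, not the real PVE service
plumbing:

  # Sketch of one LRM work cycle; the service helpers are stubs that
  # stand in for the real start/stop/status plumbing.
  use strict;
  use warnings;

  my %running; # stub service state
  sub service_is_running { my ($sid) = @_; return $running{$sid} ? 1 : 0 }
  sub start_service      { my ($sid) = @_; $running{$sid} = 1 }
  sub stop_service       { my ($sid) = @_; $running{$sid} = 0 }

  sub lrm_work_cycle {
      my ($node, $manager_status) = @_;

      my $local_status = {}; # gets written to 'service_${node}_status'
      my $services = $manager_status->{service_status} // {};

      for my $sid (sort keys %$services) {
          my $sd = $services->{$sid};
          next if $sd->{node} ne $node; # not assigned to this node

          if ($sd->{state} eq 'started' && !service_is_running($sid)) {
              start_service($sid);
          } elsif ($sd->{state} eq 'request_stop' && service_is_running($sid)) {
              stop_service($sid);
          }
          # report the actual state back, so the CRM can confirm it
          $local_status->{$sid} = service_is_running($sid) ? 'started' : 'stopped';
      }

      return $local_status;
  }
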
== Pluggable Interface for cluster environment (class PVE::HA::Env) ==

This class defines an interface to the actual cluster environment:

* get node membership and quorum information

* get/release cluster wide locks

* get system time

* watchdog interface

* read/write cluster wide status files

We have plugins for several different environments:

* PVE::HA::Sim::TestEnv: the regression test environment

* PVE::HA::Sim::RTEnv: the graphical simulator

* PVE::HA::Env::PVE2: the real Proxmox VE cluster

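All plugins implement the same set of methods, so the CRM and LRM code
does not need to care which environment it runs in. As a rough, hedged
illustration (the method names below are examples and do not claim to
match the real PVE::HA::Env interface exactly), a minimal plugin could
look like this:

  # Illustrative environment plugin skeleton - method names are
  # examples only, not the authoritative PVE::HA::Env interface.
  package My::DummyEnv;

  use strict;
  use warnings;

  sub new {
      my ($class, $nodename) = @_;
      return bless { nodename => $nodename, time => 0 }, $class;
  }

  # node membership and quorum information
  sub quorate { my ($self) = @_; return 1 }

  # cluster wide locks
  sub get_ha_agent_lock { my ($self) = @_; return 1 }

  # system time (a test environment can fake this to speed up tests)
  sub get_time { my ($self) = @_; return $self->{time}++ }

  # watchdog interface
  sub watchdog_update { my ($self) = @_; return 1 }

  # cluster wide status files
  sub read_manager_status  { my ($self) = @_; return {} }
  sub write_manager_status { my ($self, $status) = @_; return }

  1;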