ifdef::manvolnum[]
PVE({manvolnum})
================
include::attributes.txt[]

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
include::attributes.txt[]
endif::manvolnum[]

The {PVE} cluster manager 'pvecm' is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such a cluster can consist of up to 32 physical
nodes (probably more, depending on network latency).

'pvecm' can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information and do various other
cluster-related tasks. The Proxmox Cluster file system (pmxcfs) is
used to transparently distribute the cluster configuration to all
cluster nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* Proxmox Cluster file system (pmxcfs): database-driven file system
  for storing configuration files, replicated in real-time on all
  nodes using corosync

* Easy migration of virtual machines and containers between physical
  hosts

* Fast deployment

* Cluster-wide services like firewall and HA


Requirements
------------

* All nodes must be in the same network, as corosync uses IP multicast
  to communicate between nodes (also see
  http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
  ports 5404 and 5405 for cluster communication. A quick way to verify
  these requirements from the shell is sketched after this list.
+
NOTE: Some switches do not support IP multicast by default and must be
manually enabled first.

* Date and time have to be synchronized.

* An SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need at least three
  nodes for reliable quorum. All nodes should have the same {pve}
  version.

* We recommend a dedicated NIC for the cluster traffic, especially if
  you use shared storage.

NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
Proxmox VE 4.0 cluster nodes.
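
The following is a quick, non-authoritative sketch of how you might
verify these requirements from the shell. 'omping' is usually available
as a separate package of the same name and has to be started on all
listed nodes at roughly the same time; the IP addresses are the example
addresses used later in this chapter:

----
# clocks should be in sync on all nodes (NTP, for example via
# systemd-timesyncd or ntpd)
timedatectl status

# SSH on TCP port 22 must work between the nodes
ssh root@192.168.15.92 true

# multicast connectivity between the nodes can be tested with 'omping'
# (start the same command on every listed node at about the same time)
omping 192.168.15.91 192.168.15.92 192.168.15.93

# UDP ports 5404 and 5405 must not be blocked by any firewall between
# the nodes
----
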


Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

Currently, cluster creation has to be done on the console, so you need
to log in via 'ssh'.
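
The hostname and IP settings live in the usual Debian places. The
following is only a quick sketch of what you might double-check on each
node before creating or joining a cluster; the paths assume a default
{pve} installation:

----
# hostname and name resolution
cat /etc/hostname
cat /etc/hosts

# network configuration
cat /etc/network/interfaces

# should print the IP address the other cluster nodes will use
hostname --ip-address
----
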

Create the Cluster
------------------

Log in via 'ssh' to the first Proxmox VE node. Use a unique name for
your cluster. This name cannot be changed later.

 hp1# pvecm create YOUR-CLUSTER-NAME

CAUTION: The cluster name is used to compute the default multicast
address. Please use unique cluster names if you run more than one
cluster inside your network.

To check the state of your cluster, use:

 hp1# pvecm status
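
'pvecm create' generates a corosync configuration that contains the
chosen cluster name. If you want to inspect it, the following sketch
assumes a default {pve} 4.x installation:

----
# cluster-wide configuration, distributed via pmxcfs
cat /etc/pve/corosync.conf

# node-local copy that corosync itself reads at startup
cat /etc/corosync/corosync.conf
----
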


Adding Nodes to the Cluster
---------------------------

Log in via 'ssh' to the node you want to add.

 hp2# pvecm add IP-ADDRESS-CLUSTER

For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node.

CAUTION: A new node cannot hold any VMs, because you would get
conflicts about identical VM IDs. Also, all existing configuration in
'/etc/pve' is overwritten when you join a new node to the cluster. As a
workaround, use 'vzdump' to back up each guest and restore it under a
different VMID after adding the node to the cluster.
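
A sketch of that workaround, using a hypothetical VM with VMID 100 on
the joining node and a free cluster-wide VMID 200 (adjust IDs, paths
and archive names to your setup):

----
# before joining: back up the guest, since its configuration in
# /etc/pve will be overwritten by the join
vzdump 100 --dumpdir /var/lib/vz/dump

# after joining: restore the backup under a VMID that is free in the
# whole cluster (the suffix may differ depending on compression)
qmrestore /var/lib/vz/dump/vzdump-qemu-100-*.vma 200

# for containers, use 'pct restore' instead, e.g.:
# pct restore 200 /var/lib/vz/dump/vzdump-lxc-100-*.tar
----
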

To check the state of the cluster:

 # pvecm status

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want a list of all nodes, use:

 # pvecm nodes

.List Nodes in a Cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----


Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may not
be what you want or need.

Move all virtual machines off the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.

Log in to one remaining node via 'ssh'. Check the cluster state with
'pvecm status', and use 'pvecm nodes' to identify the node you want to
remove:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91 (local)
0x00000002          1 192.168.15.92
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

IMPORTANT: At this point you must power off the node to be removed and
make sure that it will not power on again (in the network) as it is.

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----

Now issue the delete command (here deleting node hp4):

 hp1# pvecm delnode hp4

If the operation succeeds, no output is returned. Check the node list
again with 'pvecm nodes' or 'pvecm status'. You should see something
like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

IMPORTANT: As mentioned above, it is critical to power off the node
*before* removal, and to make sure that it will *never* power on again
(in the existing cluster network) as it is.

If you power on the node as it is, the cluster will end up in a broken
state, and it can be difficult to restore a clean cluster state.

If, for whatever reason, you want this server to join the same cluster
again, you have to:

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.
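
To summarize, using the example names from above ('hp1' as a remaining
cluster node and 'hp4' as the node being removed), the procedure boils
down to the following sketch:

----
# 1. migrate or back up all guests still located on hp4, then power
#    hp4 off and make sure it cannot come back online in this network
#    as it is

# 2. on a remaining node, check the membership and remove the node
hp1# pvecm nodes
hp1# pvecm delnode hp4

# 3. verify the new cluster state
hp1# pvecm status
----
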


Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a majority
of nodes is online. The cluster switches to read-only mode if it loses
quorum.

NOTE: {pve} assigns a single vote to each node by default.
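
In practice, this read-only mode is visible on the cluster file system:
a node without quorum cannot write below '/etc/pve'. One simple way to
check this from the shell (a sketch; the test file name is arbitrary):

----
# is this node part of a quorate partition?
pvecm status | grep -i quorate

# without quorum, pmxcfs mounts /etc/pve read-only, so a write attempt
# like this will fail
touch /etc/pve/.quorum-write-test && rm /etc/pve/.quorum-write-test
----
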


Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
('UPS', also called 'battery backup') to avoid this state, especially
if you want HA.

On node startup, the 'pve-manager' service waits up to 60 seconds for
quorum and then starts all guests. If it fails to get quorum, the
service simply aborts, and you need to start your guests manually once
you have quorum.

If you start all nodes at the same time (for example, when power comes
back), it is likely that you reach quorum within the above timeout. But
startup can fail if some nodes start much faster than others; in that
case you need to start your guests manually after reaching quorum. You
can do that in the GUI, or on the command line with:

 systemctl start pve-manager

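
Alternatively, you can start individual guests by hand once the node is
quorate again. The following sketch uses hypothetical guest IDs 100
(virtual machine) and 101 (container):

----
# check that the node is part of a quorate partition again
pvecm status

# retry the automatic guest startup as a whole ...
systemctl start pve-manager

# ... or start individual guests manually
qm start 100     # virtual machine with VMID 100
pct start 101    # container with VMID 101
----
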

ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]