ifdef::manvolnum[]
PVE({manvolnum})
================
include::attributes.txt[]

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
include::attributes.txt[]
endif::manvolnum[]

The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such clusters can consist of up to 32 physical nodes
(probably more, dependent on network latency).

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information and do various other cluster
related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web based management

* Multi-master clusters: each node can do all management tasks

* `pmxcfs`: database-driven file system for storing configuration files,
  replicated in real-time on all nodes using `corosync`.

* Easy migration of virtual machines and containers between physical
  hosts

* Fast deployment

* Cluster-wide services like firewall and HA


Requirements
------------

* All nodes must be in the same network, as `corosync` uses IP Multicast
  to communicate between nodes (also see
  http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
  ports 5404 and 5405 for cluster communication (see the multicast test
  sketch after this list).
+
NOTE: Some switches do not support IP multicast by default and must be
manually enabled first.

* Date and time have to be synchronized.

* SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need to have at
  least three nodes for reliable quorum. All nodes should have the
  same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
  you use shared storage.

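If you are unsure whether IP multicast works in your network, you can test it
with `omping` (a minimal sketch; the `omping` package may need to be installed
first, and `hp1 hp2 hp3` are placeholders for your own node names). Run the
command on all nodes at the same time and check that no packets are reported
as lost:
[source,bash]
apt-get install omping
omping -c 600 -i 1 -q hp1 hp2 hp3
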
NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
Proxmox VE 4.0 cluster nodes.


Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

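For example, you can verify on each node that the final hostname resolves to
the address you intend to use for cluster communication (a minimal check; the
command should print the node's network address rather than a loopback
address such as 127.0.1.1):
[source,bash]
hostname --ip-address
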
Currently the cluster creation has to be done on the console, so you
need to log in via `ssh`.

Create the Cluster
------------------

Log in via `ssh` to the first {pve} node. Use a unique name for your cluster.
This name cannot be changed later.

 hp1# pvecm create YOUR-CLUSTER-NAME

CAUTION: The cluster name is used to compute the default multicast
address. Please use unique cluster names if you run more than one
cluster inside your network.

To check the state of your cluster use:

 hp1# pvecm status


Adding Nodes to the Cluster
---------------------------

Log in via `ssh` to the node you want to add.

 hp2# pvecm add IP-ADDRESS-CLUSTER

For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node.

CAUTION: A new node cannot hold any VMs, because you would get
conflicts about identical VM IDs. Also, all existing configuration in
`/etc/pve` is overwritten when you join a new node to the cluster. As a
workaround, use `vzdump` to back up and restore to a different VMID after
adding the node to the cluster.

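For example, a minimal sketch of such a backup and restore (the VMIDs `100`
and `200`, the directory `/mnt/backup` and the archive name are placeholders;
containers would be restored with `pct restore` instead of `qmrestore`):
[source,bash]
# on the node, before joining the cluster: back up the guest
vzdump 100 --dumpdir /mnt/backup
# after joining the cluster: restore it under a new, unused VMID
qmrestore /mnt/backup/vzdump-qemu-100-<timestamp>.vma 200
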
To check the state of the cluster:

 # pvecm status

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want the list of all nodes, use:

 # pvecm nodes

.List nodes in a cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----


Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may not be
what you want or need.

Move all virtual machines from the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.

Log in to one remaining node via `ssh`. Issue a `pvecm status` command to
identify the node ID of the node you want to remove:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91 (local)
0x00000002          1 192.168.15.92
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

IMPORTANT: At this point you must power off the node to be removed and
make sure that it will not power on again (in the network) as it is.

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----

Log in to one remaining node via `ssh`. Issue the delete command (here
deleting node `hp4`):

 hp1# pvecm delnode hp4

If the operation succeeds, no output is returned; just check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

IMPORTANT: As said above, it is very important to power off the node
*before* removal, and make sure that it will *never* power on again
(in the existing cluster network) as it is.

If you power on the node as it is, your cluster will be screwed up and
it could be difficult to restore a clean cluster state.

If, for whatever reason, you want this server to join the same
cluster again, you have to

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.

Separate A Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CAUTION: This is *not* the recommended method, proceed with caution. Use the
above-mentioned method if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster it will still have
access to the shared storages! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as it leads to VMID conflicts.

Move the guests which you want to keep on this node now; after the removal,
you can only do this via backup and restore. It is suggested that you create
a new storage to which only the node that you want to separate has access.
This can be a new export on your NFS or a new Ceph pool, to name a few
examples. It is just important that the exact same storage does not get
accessed by multiple clusters. After setting up this storage, move all data
from the node and its VMs to it. Then you are ready to separate the node from
the cluster.

WARNING: Ensure all shared resources are cleanly separated! Otherwise you will
run into conflicts and problems.

First stop the corosync and the pve-cluster services on the node:
[source,bash]
systemctl stop pve-cluster
systemctl stop corosync

Start the cluster filesystem again in local mode:
[source,bash]
pmxcfs -l

Delete the corosync configuration files:
[source,bash]
rm /etc/pve/corosync.conf
rm /etc/corosync/*

You can now start the filesystem again as a normal service:
[source,bash]
killall pmxcfs
systemctl start pve-cluster

The node is now separated from the cluster. You can delete it from a remaining
node of the cluster with:
[source,bash]
pvecm delnode oldnode

If the command fails because the remaining node in the cluster lost quorum
when the now separated node exited, you may set the expected votes to 1 as a
workaround:
[source,bash]
pvecm expected 1

And then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all remaining files left
over from the old cluster there. This ensures that the node can be added to
another cluster again without problems.

[source,bash]
rm /var/lib/corosync/*

As the configuration files from the other nodes are still in the cluster
filesystem, you may want to clean those up too. Simply remove the whole
directory '/etc/pve/nodes/NODENAME' recursively, but check three times that
you used the correct one before deleting it.

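For example (replace `NODENAME` with the name of the separated node and
double-check the path before running it):
[source,bash]
rm -rf /etc/pve/nodes/NODENAME
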
CAUTION: The node's SSH keys are still in the 'authorized_keys' file. This
means that the nodes can still connect to each other with public key
authentication. This should be fixed by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.
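
One way to find the entries to remove (just a sketch; it assumes the key
comments contain the old node's name, which is usually the case for keys
generated with `ssh-keygen` defaults, and `oldnode` is a placeholder):
[source,bash]
grep -n oldnode /etc/pve/priv/authorized_keys

Then delete the matching lines with an editor of your choice.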

Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum. For example, in a five-node cluster at least three
nodes must be online for the cluster to remain quorate.

NOTE: {pve} assigns a single vote to each node by default.


Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, service `pve-manager` is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.

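You can enable this flag per guest, for example (a small sketch; `100` is a
placeholder VMID, and containers would use `pct set` instead):
[source,bash]
qm set 100 --onboot 1
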
When you turn on nodes, or when power comes back after power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]