ifdef::manvolnum[]
PVE({manvolnum})
================
include::attributes.txt[]

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
include::attributes.txt[]
endif::manvolnum[]

The {PVE} cluster manager 'pvecm' is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such a cluster can consist of up to 32 physical
nodes (probably more, depending on network latency).

'pvecm' can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information and do various other cluster
related tasks. The Proxmox Cluster file system (pmxcfs) is used to
transparently distribute the cluster configuration to all cluster
nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: Each node can do all management tasks

* Proxmox Cluster file system (pmxcfs): Database-driven file system
for storing configuration files, replicated in real-time on all
nodes using corosync.

* Easy migration of Virtual Machines and Containers between physical
hosts

* Fast deployment

* Cluster-wide services like firewall and HA


Requirements
------------

* All nodes must be in the same network, as corosync uses IP Multicast
to communicate between nodes (also see
http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
ports 5404 and 5405 for cluster communication.
+
NOTE: Some switches do not support IP multicast by default and must be
manually enabled first. A simple way to test multicast connectivity is
shown after this list.

* Date and time have to be synchronized.

* An SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need at least three
nodes for reliable quorum. All nodes should have the same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
you use shared storage.

NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
Proxmox VE 4.0 cluster nodes.
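
If you are unsure whether IP multicast works on your network, you can
test it before creating the cluster, for example with the 'omping'
tool (a separate package, not installed by default; the node names
below are placeholders for your own hosts). Run the same command on
all nodes at roughly the same time:

----
# install the test tool and probe multicast between the nodes
apt-get install omping
omping -c 600 -i 1 -q hp1 hp2 hp3
----

A reported packet loss close to 0% indicates that multicast traffic
reaches all nodes.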


Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

Currently the cluster creation has to be done on the console, so you
need to log in via 'ssh'.
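
A quick sanity check (host names and addresses here are only examples)
is to verify on each node that its hostname resolves to the IP address
you intend to keep:

----
hp1# hostname
hp1
hp1# getent hosts $(hostname)
192.168.15.91   hp1.example.com hp1
----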

Create the Cluster
------------------

Log in via 'ssh' to the first Proxmox VE node. Use a unique name for
your cluster. This name cannot be changed later.

 hp1# pvecm create YOUR-CLUSTER-NAME

CAUTION: The cluster name is used to compute the default multicast
address. Please use unique cluster names if you run more than one
cluster inside your network.

To check the state of your cluster use:

 hp1# pvecm status


Adding Nodes to the Cluster
---------------------------

Log in via 'ssh' to the node you want to add.

 hp2# pvecm add IP-ADDRESS-CLUSTER

For `IP-ADDRESS-CLUSTER` use the IP of an existing cluster node.

CAUTION: A new node cannot hold any VMs, because you would get
conflicts due to identical VM IDs. Also, all existing configuration in
'/etc/pve' is overwritten when you join a new node to the cluster. As
a workaround, use vzdump to back up the guests and restore them under
different VMIDs after adding the node to the cluster.
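
A rough sketch of that workaround for a virtual machine (VMID, path
and archive name below are only examples; for containers you would use
'pct restore' instead of 'qmrestore'):

----
# on the joining node, before the join: back up VM 100
vzdump 100 --dumpdir /root/backup

# after the node has joined the cluster: restore under a free VMID
qmrestore /root/backup/vzdump-qemu-100-<timestamp>.vma 200
----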

To check the state of the cluster:

 # pvecm status

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want the list of all nodes, use:

 # pvecm nodes

.List Nodes in a Cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----


Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may not
be what you want or need.

Move all virtual machines off the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.

Log in to one remaining node via 'ssh'. Check the cluster state with
'pvecm status' and use 'pvecm nodes' to identify the node ID of the
node to remove:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91 (local)
0x00000002          1 192.168.15.92
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

IMPORTANT: At this point you must power off the node to be removed and
make sure that it does not power on again (in the network) as it is.

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----

Log in to one remaining node via 'ssh'. Issue the delete command (here
deleting node hp4):

 hp1# pvecm delnode hp4

If the operation succeeds, no output is returned. Check the node list
again with 'pvecm nodes' or 'pvecm status'. You should see something
like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

IMPORTANT: As mentioned above, it is very important to power off the
node *before* removal, and make sure that it will *never* power on
again (in the existing cluster network) as it is.

If you power it on as it is, the cluster will be damaged, and it can
be difficult to restore a clean cluster state.

If, for whatever reason, you want this server to join the same cluster
again, you have to:

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.


Quorum
------

{pve} uses a quorum-based technique to provide a consistent state
among all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a majority
of nodes are online. The cluster switches to read-only mode if it
loses quorum.

NOTE: {pve} assigns a single vote to each node by default.
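
For example, with the default of one vote per node, a three node
cluster stays quorate as long as at least two nodes are online; if a
network partition splits it into groups of two and one, only the side
with two nodes can still perform state changes.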

Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
('UPS', also called 'battery backup') to avoid this state, especially
if you want HA.

On node startup, service 'pve-manager' is started and waits for
quorum. Once quorate, it starts all guests which have the 'onboot'
flag set.

When you turn on nodes, or when power comes back after a power
failure, it is likely that some nodes boot faster than others. Please
keep in mind that guest startup is delayed until you reach quorum.
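
If you need to start an important guest on a single node while the
other nodes are still down, you can temporarily tell corosync to
expect fewer votes. Treat the following as a sketch only and use it
with great care, because it deliberately overrides the quorum
protection:

----
hp1# pvecm expected 1
----

Once the other nodes are back online, check 'pvecm status' to confirm
that the cluster is quorate again.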


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]