ifdef::manvolnum[]
PVE({manvolnum})
================
include::attributes.txt[]

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
include::attributes.txt[]
endif::manvolnum[]

The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such clusters can consist of up to 32 physical nodes
(probably more, depending on network latency).

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information and do various other
cluster-related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all
cluster nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* `pmxcfs`: database-driven file system for storing configuration files,
 replicated in real-time on all nodes using `corosync`.

* Easy migration of virtual machines and containers between physical
 hosts

* Fast deployment

* Cluster-wide services like firewall and HA


Requirements
------------

* All nodes must be in the same network, as `corosync` uses IP multicast
 to communicate between nodes (also see
 http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
 ports 5404 and 5405 for cluster communication.
+
NOTE: Some switches have IP multicast disabled by default, so it must be
enabled manually first. A way to verify multicast connectivity is shown
in the example at the end of this section.

* Date and time have to be synchronized across all nodes.

* An SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need to have at
 least three nodes for reliable quorum. All nodes should run the
 same {pve} version.

* We recommend a dedicated NIC for the cluster traffic, especially if
 you use shared storage.

NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
Proxmox VE 4.0 cluster nodes.
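
If you are unsure whether IP multicast works on your network, a quick
test can be done with the `omping` tool before creating the cluster.
This is only a sketch: it assumes `omping` is installed on all nodes and
that the hostnames `hp1`, `hp2` and `hp3` resolve to the addresses of
the cluster network.

----
# run this command on all nodes at (roughly) the same time
omping -c 10000 -i 0.001 -F -q hp1 hp2 hp3
----

If multicast works, the reported multicast packet loss should be close
to zero for every node.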


Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

Currently, cluster creation has to be done on the console, so you
need to log in via `ssh`.
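
A quick way to double-check the hostname and address setup on each node
before creating the cluster (a sketch, assuming the hostname is
resolvable via `/etc/hosts` or DNS):

----
# the hostname should resolve to the node's final cluster IP address
hostname
getent hosts $(hostname)
----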

Create the Cluster
------------------

Log in via `ssh` to the first {pve} node and use a unique name for your
cluster. This name cannot be changed later.

 hp1# pvecm create YOUR-CLUSTER-NAME

CAUTION: The cluster name is used to compute the default multicast
address. Please use unique cluster names if you run more than one
cluster inside your network.
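
After creating the cluster, you can have a look at the generated
corosync configuration to verify the cluster name and node list. This is
just a sketch; `/etc/pve/corosync.conf` is the cluster-wide copy of the
configuration maintained by {pve}:

----
# shows the totem settings (including the cluster name) and the node list
cat /etc/pve/corosync.conf
----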

To check the state of your cluster, use:

 hp1# pvecm status


Adding Nodes to the Cluster
---------------------------

Log in via `ssh` to the node you want to add.

 hp2# pvecm add IP-ADDRESS-CLUSTER

For `IP-ADDRESS-CLUSTER`, use the IP address of an existing cluster node.

CAUTION: A new node cannot hold any VMs, because you would get
conflicts caused by identical VM IDs. Also, all existing configuration
in `/etc/pve` is overwritten when you join a new node to the cluster. As
a workaround, use `vzdump` to back up the VMs and restore them under
different VM IDs after the node has joined the cluster.
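
The following is a rough sketch of that workaround for a single QEMU VM.
The VM ID `100`, the new ID `120` and the dump directory are
placeholders:

----
# on the joining node, before running pvecm add
vzdump 100 --dumpdir /var/lib/vz/dump

# after the node has joined the cluster, restore under an unused VM ID
qmrestore /var/lib/vz/dump/vzdump-qemu-100-*.vma 120
----

For containers, the corresponding restore command is `pct restore`.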

To check the state of the cluster:

 # pvecm status

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want a list of all nodes, use:

 # pvecm nodes

.List nodes in a cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----


Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may not
be what you want or need.

Move all virtual machines off the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.
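
A minimal sketch of how guests could be moved away from the node that
will be removed (here `hp4`); the VM ID `100`, the container ID `101`
and the target node `hp1` are placeholders:

----
# on the node to be removed: migrate a VM online to hp1
hp4# qm migrate 100 hp1 --online

# migrate a container to hp1 (the container has to be shut down first)
hp4# pct migrate 101 hp1
----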

Log in to one of the remaining nodes via ssh and check the current
cluster state:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91 (local)
0x00000002          1 192.168.15.92
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

IMPORTANT: At this point you must power off the node to be removed and
make sure that it does not power on again (in the existing cluster
network) as it is.

Use `pvecm nodes` to identify the node ID of the node you want to
remove:

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----

Log in to one of the remaining nodes via ssh and issue the delete
command (here we delete node `hp4`):

 hp1# pvecm delnode hp4

If the operation succeeds, no output is returned. Check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

IMPORTANT: As said above, it is very important to power off the node
*before* removal, and make sure that it will *never* power on again
(in the existing cluster network) as it is.

If you power on the node as it is, your cluster will end up in an
inconsistent state, and it can be difficult to restore a clean cluster
configuration.

If, for whatever reason, you want this server to join the same
cluster again, you have to

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.


Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.

NOTE: {pve} assigns a single vote to each node by default.
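
If the cluster has permanently lost too many nodes, the remaining nodes
stay read-only. As a last resort, the expected vote count can be lowered
so that a surviving node becomes quorate again. This is only a sketch
and should be used with great care, and only if the missing nodes will
never come back:

----
# check the current quorum state
pvecm status

# tell corosync to expect a single vote, making this node quorate again
pvecm expected 1
----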


Cluster Cold Start
------------------

A cluster is obviously not quorate when all nodes are offline. This is
a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, the `pve-manager` service is started and waits for
quorum. Once the cluster is quorate, it starts all guests which have
the `onboot` flag set.
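
As an example, the `onboot` flag can be set per guest with the usual
management tools; the VM ID `100` and container ID `101` below are
placeholders:

----
# start these guests automatically once the node has quorum
qm set 100 --onboot 1
pct set 101 --onboot 1
----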

When you turn on nodes, or when power comes back after a power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]