include::attributes.txt[]
endif::manvolnum[]
-'pvecm' is a program to manage the cluster configuration. It can be
-used to create a new cluster, join nodes to a cluster, leave the
-cluster, get status information and do various other cluster related
-tasks.
+The {PVE} cluster manager `pvecm` is a tool to create a group of
+physical servers. Such a group is called a *cluster*. We use the
+http://www.corosync.org[Corosync Cluster Engine] for reliable group
+communication, and such clusters can consist of up to 32 physical nodes
+(probably more, dependent on network latency).
+
+`pvecm` can be used to create a new cluster, join nodes to a cluster,
+leave the cluster, get status information and do various other cluster
+related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
+is used to transparently distribute the cluster configuration to all cluster
+nodes.
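+
+On a running node, `pmxcfs` is visible as a FUSE file system mounted
+at `/etc/pve`. You can verify this, for example, with:
+
+[source,bash]
+----
+# show the cluster file system mount point
+mount | grep /etc/pve
+----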
+
+Grouping nodes into a cluster has the following advantages:
+
+* Centralized, web based management
+
+* Multi-master clusters: each node can do all management tasks
+
+* `pmxcfs`: database-driven file system for storing configuration files,
+ replicated in real-time on all nodes using `corosync`.
+
+* Easy migration of virtual machines and containers between physical
+ hosts
+
+* Fast deployment
+
+* Cluster-wide services like firewall and HA
+
+
+Requirements
+------------
+
+* All nodes must be in the same network, as `corosync` uses IP
+ multicast to communicate between nodes (also see
+ http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
+ ports 5404 and 5405 for cluster communication.
++
+NOTE: Some switches do not support IP multicast by default, and it
+must first be enabled manually.
+
+* Date and time have to be synchronized.
+
+* An SSH tunnel on TCP port 22 between nodes is used.
+
+* If you are interested in High Availability, you need to have at
+ least three nodes for reliable quorum. All nodes should have the
+ same version.
+
+* We recommend a dedicated NIC for the cluster traffic, especially if
+ you use shared storage.
+
+NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
+Proxmox VE 4.0 cluster nodes.
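+
+You can test whether IP multicast works between the nodes with the
+`omping` tool (package `omping`). This is a sketch assuming three
+nodes named `hp1`, `hp2` and `hp3`; run the command on all nodes in
+parallel and check for 0% packet loss:
+
+[source,bash]
+----
+# exchange multicast probes between all listed nodes for ~10 minutes
+omping -c 600 -i 1 -q hp1 hp2 hp3
+----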
+
+
+Preparing Nodes
+---------------
+
+First, install {PVE} on all nodes. Make sure that each node is
+installed with the final hostname and IP configuration. Changing the
+hostname and IP is not possible after cluster creation.
+
+Currently, cluster creation has to be done on the console, so you
+need to log in via `ssh`.
+
+Create the Cluster
+------------------
+
+Log in via `ssh` to the first {pve} node. Use a unique name for your cluster.
+This name cannot be changed later.
+
+ hp1# pvecm create YOUR-CLUSTER-NAME
+
+CAUTION: The cluster name is used to compute the default multicast
+address. Please use unique cluster names if you run more than one
+cluster inside your network.
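+
+The generated configuration is stored in '/etc/pve/corosync.conf'. The
+following is a rough sketch of its `totem` section (field names per
+`corosync.conf(5)`; the values shown are examples and will differ on
+your setup):
+
+----
+totem {
+  cluster_name: YOUR-CLUSTER-NAME
+  config_version: 1
+  version: 2
+  interface {
+    bindnetaddr: 192.168.15.91
+    ringnumber: 0
+  }
+}
+----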
+
+To check the state of your cluster use:
+
+ hp1# pvecm status
+
+
+Adding Nodes to the Cluster
+---------------------------
+
+Log in via `ssh` to the node you want to add.
+
+ hp2# pvecm add IP-ADDRESS-CLUSTER
+
+For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node.
+
+CAUTION: A node joining the cluster cannot hold any VMs, because you
+would get conflicts with identical VM IDs. Also, all existing
+configuration in `/etc/pve` is overwritten when you join a new node to
+the cluster. As a workaround, use `vzdump` to back the guests up and
+restore them under a different VMID after adding the node to the
+cluster.
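+
+A minimal sketch of this workaround, assuming a VM with VMID 100 on
+the joining node and a backup directory '/mnt/backup' (both example
+values; use `pct restore` instead of `qmrestore` for containers):
+
+[source,bash]
+----
+# before joining: back up the guest on the new node
+vzdump 100 -dumpdir /mnt/backup
+# after joining: restore it under a free VMID, e.g. 120
+qmrestore /mnt/backup/vzdump-qemu-100-*.vma 120
+----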
+
+To check the state of the cluster:
+
+ # pvecm status
+
+.Cluster status after adding 4 nodes
+----
+hp2# pvecm status
+Quorum information
+~~~~~~~~~~~~~~~~~~
+Date: Mon Apr 20 12:30:13 2015
+Quorum provider: corosync_votequorum
+Nodes: 4
+Node ID: 0x00000001
+Ring ID: 1928
+Quorate: Yes
+
+Votequorum information
+~~~~~~~~~~~~~~~~~~~~~~
+Expected votes: 4
+Highest expected: 4
+Total votes: 4
+Quorum: 3
+Flags: Quorate
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+ Nodeid Votes Name
+0x00000001 1 192.168.15.91
+0x00000002 1 192.168.15.92 (local)
+0x00000003 1 192.168.15.93
+0x00000004 1 192.168.15.94
+----
+
+If you only want the list of all nodes use:
+
+ # pvecm nodes
+
+.List nodes in a cluster
+----
+hp2# pvecm nodes
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+ Nodeid Votes Name
+ 1 1 hp1
+ 2 1 hp2 (local)
+ 3 1 hp3
+ 4 1 hp4
+----
+
+
+Remove a Cluster Node
+---------------------
+
+CAUTION: Read the procedure carefully before proceeding, as it may
+not be what you want or need.
+
+Move all virtual machines from the node. Make sure you have no local
+data or backups you want to keep, or save them accordingly.
+
+Log in to one remaining node via ssh. Check the cluster state with
+`pvecm status` and issue a `pvecm nodes` command to identify the node
+ID to remove:
+
+----
+hp1# pvecm status
+
+Quorum information
+~~~~~~~~~~~~~~~~~~
+Date: Mon Apr 20 12:30:13 2015
+Quorum provider: corosync_votequorum
+Nodes: 4
+Node ID: 0x00000001
+Ring ID: 1928
+Quorate: Yes
+
+Votequorum information
+~~~~~~~~~~~~~~~~~~~~~~
+Expected votes: 4
+Highest expected: 4
+Total votes: 4
+Quorum: 3
+Flags: Quorate
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+ Nodeid Votes Name
+0x00000001 1 192.168.15.91 (local)
+0x00000002 1 192.168.15.92
+0x00000003 1 192.168.15.93
+0x00000004 1 192.168.15.94
+----
+
+IMPORTANT: At this point you must power off the node to be removed and
+make sure that it does not power on again (in the same network) as it
+is.
+
+----
+hp1# pvecm nodes
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+ Nodeid Votes Name
+ 1 1 hp1 (local)
+ 2 1 hp2
+ 3 1 hp3
+ 4 1 hp4
+----
+
+Then issue the delete command (here deleting node `hp4`):
+
+ hp1# pvecm delnode hp4
+
+If the operation succeeds, no output is returned. Check the node
+list again with `pvecm nodes` or `pvecm status`. You should see
+something like:
+
+----
+hp1# pvecm status
+
+Quorum information
+~~~~~~~~~~~~~~~~~~
+Date: Mon Apr 20 12:44:28 2015
+Quorum provider: corosync_votequorum
+Nodes: 3
+Node ID: 0x00000001
+Ring ID: 1992
+Quorate: Yes
+
+Votequorum information
+~~~~~~~~~~~~~~~~~~~~~~
+Expected votes: 3
+Highest expected: 3
+Total votes: 3
+Quorum: 2
+Flags: Quorate
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+ Nodeid Votes Name
+0x00000001 1 192.168.15.90 (local)
+0x00000002 1 192.168.15.91
+0x00000003 1 192.168.15.92
+----
+
+IMPORTANT: As mentioned above, it is very important to power off the
+node *before* removal, and make sure that it will *never* power on
+again (in the existing cluster network) as it is.
+
+If you power on the node as it is, your cluster will end up in an
+inconsistent state, and it could be difficult to restore a clean
+cluster state.
+
+If, for whatever reason, you want this server to join the same
+cluster again, you have to
+
+* reinstall {pve} on it from scratch
+
+* then join it, as explained in the previous section.
+
+Separate A Node Without Reinstalling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+CAUTION: This is *not* the recommended method, proceed with caution. Use the
+above mentioned method if you're unsure.
+
+You can also separate a node from a cluster without reinstalling it from
+scratch. But after removing the node from the cluster it will still have
+access to the shared storages! This must be resolved before you start removing
+the node from the cluster. A {pve} cluster cannot share the exact same
+storage with another cluster, as it leads to VMID conflicts.
+
+Move the guests which you want to keep onto this node now; after the
+removal you can do this only via backup and restore. It is suggested
+that you create a new storage to which only the node to be separated
+has access. This can be a new export on your NFS server or a new Ceph
+pool, to name a few examples. It is just important that the exact same
+storage is not accessed by multiple clusters. After setting up this
+storage, move all data from the node and its VMs to it. Then you are
+ready to separate the node from the cluster.
+
+WARNING: Ensure all shared resources are cleanly separated! Otherwise
+you will run into conflicts and problems.
+
+First stop the corosync and the pve-cluster services on the node:
+[source,bash]
+systemctl stop pve-cluster
+systemctl stop corosync
+
+Start the cluster filesystem again in local mode:
+[source,bash]
+pmxcfs -l
+
+Delete the corosync configuration files:
+[source,bash]
+rm /etc/pve/corosync.conf
+rm /etc/corosync/*
+
+You can now start the filesystem again as a normal service:
+[source,bash]
+killall pmxcfs
+systemctl start pve-cluster
+
+The node is now separated from the cluster. You can delete it from a
+remaining node of the cluster with:
+[source,bash]
+pvecm delnode oldnode
+
+If the command fails because the remaining node in the cluster lost
+quorum when the now separated node exited, you may set the expected
+votes to 1 as a workaround:
+[source,bash]
+pvecm expected 1
+
+Then repeat the 'pvecm delnode' command.
+
+Now switch back to the separated node and delete all remaining files
+left over from the old cluster. This ensures that the node can be added
+to another cluster again without problems.
+
+[source,bash]
+rm /var/lib/corosync/*
+
+As the configuration files from the other nodes are still in the
+cluster filesystem, you may want to clean those up too. Simply remove
+the whole directory '/etc/pve/nodes/NODENAME' recursively, but check
+three times that you picked the correct one before deleting it.
+
+CAUTION: The node's SSH keys are still in the 'authorized_keys' file;
+this means the nodes can still connect to each other with public key
+authentication. This should be fixed by removing the respective keys
+from the '/etc/pve/priv/authorized_keys' file.
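+
+For example, assuming the separated node was called `hp4` and its key
+comment ends with that hostname (the usual `root@hp4` default), you
+could remove its line on a cluster node with:
+
+[source,bash]
+----
+# keep a backup copy, then drop the old node's key line
+cp /etc/pve/priv/authorized_keys /root/authorized_keys.bak
+sed -i '/hp4$/d' /etc/pve/priv/authorized_keys
+----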
+
+Quorum
+------
+
+{pve} uses a quorum-based technique to provide a consistent state
+among all cluster nodes.
+
+[quote, from Wikipedia, Quorum (distributed computing)]
+____
+A quorum is the minimum number of votes that a distributed transaction
+has to obtain in order to be allowed to perform an operation in a
+distributed system.
+____
+
+In case of network partitioning, state changes require that a
+majority of nodes are online. The cluster switches to read-only mode
+if it loses quorum.
+
+NOTE: {pve} assigns a single vote to each node by default.
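+
+With the default of one vote per node, the number of votes required
+for quorum is the majority, `floor(votes / 2) + 1`. A quick
+illustration in shell:
+
+[source,bash]
+----
+# votes needed for quorum in clusters of 2 to 5 nodes (one vote each)
+for nodes in 2 3 4 5; do
+    echo "$nodes nodes -> quorum $(( nodes / 2 + 1 ))"
+done
+----
+
+Note that a two-node cluster therefore needs both nodes online to
+stay quorate.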
+
+
+Cluster Cold Start
+------------------
+
+It is obvious that a cluster is not quorate when all nodes are
+offline. This is a common case after a power failure.
+
+NOTE: It is always a good idea to use an uninterruptible power supply
+(``UPS'', also called ``battery backup'') to avoid this state, especially if
+you want HA.
+
+On node startup, service `pve-manager` is started and waits for
+quorum. Once quorate, it starts all guests which have the `onboot`
+flag set.
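+
+The `onboot` flag can be set per guest, for example (assuming VMID 100
+for a virtual machine and CTID 101 for a container):
+
+[source,bash]
+----
+qm set 100 -onboot 1    # virtual machine
+pct set 101 -onboot 1   # container
+----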
+
+When you turn on nodes, or when power comes back after a power
+failure, it is likely that some nodes boot faster than others. Please
+keep in mind that guest startup is delayed until quorum is reached.
ifdef::manvolnum[]