The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
-communication, and such clusters can consist of up to 32 physical nodes
-(probably more, dependent on network latency).
+communication. There's no explicit limit for the number of nodes in a cluster.
+In practice, the actual possible node count may be limited by the host and
+network performance. Currently (2021), there are reports of clusters (using
+high-end enterprise hardware) with over 50 nodes in production.
`pvecm` can be used to create a new cluster, join nodes to a cluster,
-leave the cluster, get status information and do various other cluster
-related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
+leave the cluster, get status information and do various other cluster-related
+tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.
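
For illustration, a minimal workflow might look like the following (the cluster
name, host names and IP address are just examples):

----
# create a new cluster on the first node
node1# pvecm create CLUSTERNAME

# join a further node, using the address of an existing cluster member
node2# pvecm add 192.168.10.1

# check membership and quorum state
node2# pvecm status
----
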
* Date and time have to be synchronized.
-* SSH tunnel on TCP port 22 between nodes is used.
+* SSH tunnel on TCP port 22 between nodes is used.
* If you are interested in High Availability, you need to have at
least three nodes for reliable quorum. All nodes should have the
----
hp1# pvecm delnode hp4
+ Killing node 4
----
-If the operation succeeds no output is returned, just check the node
-list again with `pvecm nodes` or `pvecm status`. You should see
-something like:
+Use `pvecm nodes` or `pvecm status` to check the node list again. It should
+look something like:
----
hp1# pvecm status
scratch. But after removing the node from the cluster it will still have
access to the shared storages! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
-storage with another cluster, as storage locking doesn't work over cluster
+storage with another cluster, as storage locking doesn't work over the cluster
boundary. Further, it may also lead to VMID conflicts.
It's suggested that you create a new storage where only the node which you want
WARNING: Ensure all shared resources are cleanly separated! Otherwise you will
run into conflicts and problems.
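
One way to enforce such a separation is the 'nodes' property of a storage,
which restricts on which nodes a storage can be used. For example (the storage
ID and node names are just placeholders):

----
# only the nodes staying in the cluster may use this shared storage
hp1# pvesm set sharedstore --nodes hp1,hp2,hp3
----
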
-First stop the corosync and the pve-cluster services on the node:
+First, stop the corosync and the pve-cluster services on the node:
[source,bash]
----
systemctl stop pve-cluster
[source,bash]
----
rm /etc/pve/corosync.conf
-rm /etc/corosync/*
+rm -r /etc/corosync/*
----
You can now start the filesystem again as a normal service:
Setting Up A New Network
^^^^^^^^^^^^^^^^^^^^^^^^
-First you have to set up a new network interface. It should be on a physically
+First, you have to set up a new network interface. It should be on a physically
separate network. Ensure that your network fulfills the
xref:pvecm_cluster_network_requirements[cluster network requirements].
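
As an illustration, a dedicated interface on such a separate network could be
configured in `/etc/network/interfaces` roughly like this (the interface name
and addresses are just examples):

----
auto eno2
iface eno2 inet static
        address 10.10.10.1
        netmask 255.255.255.0
----
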
which may lead to a situation where an address is changed without thinking
about implications for corosync.
-A seperate, static hostname specifically for corosync is recommended, if
+A separate, static hostname specifically for corosync is recommended, if
hostnames are preferred. Also, make sure that every node in the cluster can
resolve all hostnames correctly.
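
For example, such dedicated hostnames could be maintained through `/etc/hosts`
entries on every node (the names and addresses below are just examples):

----
10.10.10.1 corosync1.local corosync1
10.10.10.2 corosync2.local corosync2
10.10.10.3 corosync3.local corosync3
----
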
Nodes that joined the cluster on earlier versions likely still use their
unresolved hostname in `corosync.conf`. It might be a good idea to replace
-them with IPs or a seperate hostname, as mentioned above.
+them with IPs or a separate hostname, as mentioned above.
[[pvecm_redundancy]]
Links are used according to a priority setting. You can configure this priority
by setting 'knet_link_priority' in the corresponding interface section in
-`corosync.conf`, or, preferrably, using the 'priority' parameter when creating
+`corosync.conf`, or, preferably, using the 'priority' parameter when creating
your cluster with `pvecm`:
----
- # pvecm create CLUSTERNAME --link0 10.10.10.1,priority=20 --link1 10.20.20.1,priority=15
+ # pvecm create CLUSTERNAME --link0 10.10.10.1,priority=15 --link1 10.20.20.1,priority=20
----
-This would cause 'link1' to be used first, since it has the lower priority.
+This would cause 'link1' to be used first, since it has the higher priority.
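
For reference, the same priorities set directly in `corosync.conf` would
roughly correspond to the following 'interface' sub-sections inside the
'totem' section:

----
  interface {
    linknumber: 0
    knet_link_priority: 15
  }
  interface {
    linknumber: 1
    knet_link_priority: 20
  }
----
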
If no priorities are configured manually (or two links have the same priority),
links will be used in order of their number, with the lower number having higher
If you see a healthy cluster state, it means that your new link is being used.
+Role of SSH in {PVE} Clusters
+-----------------------------
+
+{PVE} utilizes SSH tunnels for various features.
+
+* Proxying console/shell sessions (node and guests)
++
+When using the shell for node B while being connected to node A, the session
+connects to a terminal proxy on node A, which is in turn connected to the login
+shell on node B via a non-interactive SSH tunnel.
+
+* VM and CT memory and local-storage migration in 'secure' mode.
++
+During the migration, one or more SSH tunnels are established between the
+source and target nodes, in order to exchange migration information and to
+transfer memory and disk contents.
+
+* Storage replication
+
+.Pitfalls due to automatic execution of `.bashrc` and siblings
+[IMPORTANT]
+====
+If you have a custom `.bashrc`, or similar files that get executed on login by
+the configured shell, `ssh` will automatically run them once the session is
+established successfully. This can cause unexpected behavior, as those commands
+may be executed with root permissions during any of the operations described
+above. This can have problematic side effects!
+
+In order to avoid such complications, it's recommended to add a check in
+`/root/.bashrc` to make sure the session is interactive, and only then run
+`.bashrc` commands.
+
+You can add this snippet at the beginning of your `.bashrc` file:
+
+----
+# Early exit if not running interactively to avoid side-effects!
+case $- in
+ *i*) ;;
+ *) return;;
+esac
+----
+====
+
+
Corosync External Vote Support
------------------------------
QDevice Technical Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~
-The Corosync Quroum Device (QDevice) is a daemon which runs on each cluster
+The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster
node. It provides a configured number of votes to the cluster's quorum
subsystem, based on the decision of an externally running third-party
arbitrator.
Its primary use is to allow a cluster to sustain more node failures than
~~~~~~~~~~~~~~~~~
We recommend running any daemon which provides votes to corosync-qdevice as an
-unprivileged user. {pve} and Debian provides a package which is already
+unprivileged user. {pve} and Debian provide a package which is already
configured to do so.
The traffic between the daemon and the cluster must be encrypted to ensure a
safe and secure QDevice integration in {pve}.
-First install the 'corosync-qnetd' package on your external server and
-the 'corosync-qdevice' package on all cluster nodes.
+First, install the 'corosync-qnetd' package on your external server
+
+----
+external# apt install corosync-qnetd
+----
+
+and the 'corosync-qdevice' package on all cluster nodes
+
+----
+pve# apt install corosync-qdevice
+----
After that, ensure that all the nodes in the cluster are online.
pve# pvecm qdevice setup <QDEVICE-IP>
----
-The SSH key from the cluster will be automatically copied to the QDevice. You
-might need to enter an SSH password during this step.
+The SSH key from the cluster will be automatically copied to the QDevice.
+
+NOTE: Make sure that the SSH configuration on your external server allows root
+login via password, if you are asked for a password during this step.
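+
+For example, password-based root login can usually be enabled by setting the
+following in `/etc/ssh/sshd_config` on the external server, and reloading the
+SSH daemon afterwards:
+
+----
+# /etc/ssh/sshd_config (on the external server)
+PermitRootLogin yes
+----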
Once all the steps have completed successfully, you will see "Done". You can
check the status now:
address 192.X.Y.57
netmask 255.255.255.0
gateway 192.X.Y.1
- bridge_ports eno1
- bridge_stp off
- bridge_fd 0
+ bridge-ports eno1
+ bridge-stp off
+ bridge-fd 0
# cluster network
auto eno2