Requirements
------------
* All nodes must be able to connect to each other via UDP ports 5405-5412
for corosync to work.
* Date and time must be synchronized.
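
For example, on systemd-based systems you can quickly check on each node
whether the clock is synchronized (`timedatectl` is just one option; any NTP
setup works, and the exact output may vary by version):

----
# timedatectl | grep 'System clock synchronized'
System clock synchronized: yes
----
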
Adding Nodes to the Cluster
---------------------------
CAUTION: All existing configuration in `/etc/pve` is overwritten when joining a
cluster. In particular, a joining node cannot hold any guests, since guest IDs
could otherwise conflict, and the node will inherit the cluster's storage
configuration. To join a node with existing guests, as a workaround, you can
create a backup of each guest (using `vzdump`) and restore it under a different
ID after joining. If the node's storage layout differs, you will need to re-add
the node's storages, and adapt each storage's node restriction to reflect on
which nodes the storage is actually available.
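
A minimal sketch of this workaround for a VM (the VMIDs, the storage, and the
archive name are examples; for containers, use `pct restore` accordingly):

----
# vzdump 100 --storage local --mode stop
# qmrestore /var/lib/vz/dump/vzdump-qemu-100-2024_01_01-12_00_00.vma.zst 200
----
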
Join Node to Cluster via GUI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[source,bash]
----
# pvecm add IP-ADDRESS-CLUSTER --link0 LOCAL-IP-ADDRESS-LINK0
----
If you want to use the built-in xref:pvecm_redundancy[redundancy] of the
* then join it, as explained in the previous section.
The configuration files for the removed node will still reside in
'/etc/pve/nodes/hp4'. Recover any configuration you still need and remove the
directory afterwards.

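For example (the node name `hp4` follows the example above; the backup
destination is arbitrary):

----
# mkdir /root/hp4-config-backup
# cp /etc/pve/nodes/hp4/qemu-server/*.conf /root/hp4-config-backup/
# rm -r /etc/pve/nodes/hp4
----
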
NOTE: After removal of the node, its SSH fingerprint will still reside in the
'known_hosts' of the other nodes. If you receive an SSH error after rejoining
a node with the same IP or hostname, run `pvecm updatecerts` once on the
[[pvecm_cluster_network_requirements]]
Network Requirements
~~~~~~~~~~~~~~~~~~~~

The {pve} cluster stack requires a reliable network with latencies under 5
milliseconds (LAN performance) between all nodes to operate stably. While on
setups with a small node count a network with higher latencies _may_ work, this
is not guaranteed and gets rather unlikely with more than three nodes and
latencies above around 10 ms.

The network should not be used heavily by other members, as while corosync does
not use much bandwidth, it is sensitive to latency jitter; ideally, corosync
runs on its own physically separated network. Especially do not use a shared
network for corosync and storage (except as a potential low-priority fallback
in a xref:pvecm_redundancy[redundant] configuration).

Before setting up a cluster, it is good practice to check if the network is fit
for that purpose. To ensure that the nodes can connect to each other on the
xref:pvecm_corosync_addresses[Link Address Types]).
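
For example, to get a rough idea of the latency between two nodes (the address
is an example):

----
# ping -c 10 -q 10.10.10.2
----
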
In this example, we want to switch cluster communication to the
10.10.10.0/25 network, so we change the 'ring0_addr' of each node respectively.
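
A node entry in `corosync.conf` might then look like this (the name, ID, and
address are illustrative):

----
  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
----
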
NOTE: The exact same procedure can be used to change other 'ringX_addr' values
as well. However, we recommend only changing one link address at a time, so
* Storage replication
SSH setup
~~~~~~~~~

On {pve} systems, the following changes are made to the SSH configuration/setup:

* the `root` user's SSH client config gets set up to prefer `AES` over
  `ChaCha20`

* the `root` user's `authorized_keys` file gets linked to
  `/etc/pve/priv/authorized_keys`, merging all authorized keys within a cluster

* `sshd` is configured to allow logging in as root with a password

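For example, the merged `authorized_keys` link can be inspected on any node
(the output assumes the default setup):

----
# readlink /root/.ssh/authorized_keys
/etc/pve/priv/authorized_keys
----
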
NOTE: Older systems might also have `/etc/ssh/ssh_known_hosts` set up as a
symlink pointing to `/etc/pve/priv/known_hosts`, containing a merged version of
all node host keys. This system was replaced with explicit host key pinning in
`pve-cluster <<INSERT VERSION>>`; the symlink can be deconfigured if still in
place by running `pvecm updatecerts --unmerge-known-hosts`.

Pitfalls due to automatic execution of `.bashrc` and siblings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In case you have a custom `.bashrc`, or similar files that get executed on
login by the configured shell, `ssh` will automatically run it once the session
is established successfully. This can cause some unexpected behavior, as those
*) return;;
esac
----
Corosync External Vote Support
------------------------------
for Debian based hosts, and other Linux distributions should also have a package
available through their respective package manager.
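
For example, on Debian-based systems, install `corosync-qnetd` on the external
server and `corosync-qdevice` on all cluster nodes:

----
external# apt install corosync-qnetd
pve# apt install corosync-qdevice
----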

NOTE: Unlike corosync itself, a QDevice connects to the cluster over TCP/IP.
The daemon can also run outside the LAN of the cluster and isn't limited to the
low latency requirements of corosync.
Supported Setups
~~~~~~~~~~~~~~~~
The SSH key from the cluster will be automatically copied to the QDevice.
NOTE: Make sure to set up key-based access for the root user on your external
server, or temporarily allow root login with password during the setup phase.
If you receive an error such as 'Host key verification failed.' at this
stage, running `pvecm updatecerts` could fix the issue.
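
The setup itself is started from one of the cluster nodes (the address of the
external server is an example):

----
pve# pvecm qdevice setup 192.168.22.90
----
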
After all the steps have successfully completed, you will see "Done". You can
verify that the QDevice has been set up with:
----
pve# pvecm status
Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.22.180 (local)
0x00000002          1    A,V,NMW 192.168.22.181
0x00000000          1            Qdevice
----

[[pvecm_qdevice_status_flags]]
QDevice Status Flags
^^^^^^^^^^^^^^^^^^^^

The status output of the QDevice, as seen above, will usually contain three
columns:

* `A` / `NA`: Alive or Not Alive. Indicates if the communication to the
  external `corosync-qnetd` daemon works.
* `V` / `NV`: If the QDevice will cast a vote for the node. In a split-brain
  situation, where the corosync connection between the nodes is down, but they
  both can still communicate with the external `corosync-qnetd` daemon,
  only one node will get the vote.
* `MW` / `NMW`: Master wins (`MW`) or not (`NMW`). Default is `NMW`, see
  footnote:[`votequorum_qdevice_master_wins` manual page
  https://manpages.debian.org/bookworm/libvotequorum-dev/votequorum_qdevice_master_wins.3.en.html].
* `NR`: QDevice is not registered.

NOTE: If your QDevice is listed as `Not Alive` (`NA` in the output above),
ensure that port `5403` (the default port of the qnetd server) of your external
server is reachable via TCP/IP!
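
A quick way to check TCP reachability from a cluster node (the hostname is an
example):

----
# nc -zv qnetd.example.com 5403
----
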
Frequently Asked Questions
~~~~~~~~~~~~~~~~~~~~~~~~~~
mind that guest startup is delayed until you reach quorum.
[[pvecm_next_id_range]]
Guest VMID Auto-Selection
-------------------------

When creating new guests, the web interface will ask the backend for a free
VMID automatically. The default range for searching is `100` to `1000000`
(lower than the maximal allowed VMID enforced by the schema).

Sometimes admins either want to allocate new VMIDs in a separate range, for
example to easily separate temporary VMs from ones that choose a VMID manually.
Other times it's just desired to provide VMIDs of a stable length, for which
setting the lower boundary to, for example, `100000` gives much more room.

To accommodate this use case, one can set either the lower, upper, or both
boundaries via the `datacenter.cfg` configuration file, which can be edited in
the web interface under 'Datacenter' -> 'Options'.
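
For example, to only auto-select VMIDs with six digits, one could set (the
bounds are examples for the `next-id` option):

----
next-id: lower=100000,upper=999999
----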

NOTE: The range is only used for the next-id API call, so it isn't a hard
limit.

Guest Migration
---------------
Migrating virtual guests to other nodes is a useful feature in a
cluster. There are settings to control the behavior of such
migrations. This can be done via the configuration file
`datacenter.cfg` or for a specific migration via API or command-line
parameters.
It makes a difference if a guest is online or offline, or if it has
Here, we will use the network 10.1.2.0/24 as a migration network. For
a single migration, you can do this using the `migration_network`
parameter of the command-line tool:
----
# qm migrate 106 tre --online --migration_network 10.1.2.0/24
----
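
To allow this for all migrations in the cluster, the network can instead be set
in the `datacenter.cfg` file:

----
# use dedicated migration network
migration: secure,network=10.1.2.0/24
----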