Requirements
------------
* All nodes must be able to connect to each other via UDP ports 5405-5412
for corosync to work.
* Date and time must be synchronized.
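
For example, on systemd-based systems you can quickly check on each node
whether the clock is synchronized (`timedatectl` is just one option; any NTP
setup works, and the exact output may vary by version):

----
# timedatectl | grep 'System clock synchronized'
System clock synchronized: yes
----
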
Adding Nodes to the Cluster
---------------------------
CAUTION: All existing configuration in `/etc/pve` is overwritten when joining a
cluster. In particular, a joining node cannot hold any guests, since guest IDs
could otherwise conflict, and the node will inherit the cluster's storage
configuration. To join a node with existing guests, as a workaround, you can
create a backup of each guest (using `vzdump`) and restore it under a different
ID after joining. If the node's storage layout differs, you will need to re-add
the node's storages, and adapt each storage's node restriction to reflect on
which nodes the storage is actually available.
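
A minimal sketch of this workaround for a VM (the VMIDs, the storage, and the
archive name are examples; for containers, use `pct restore` accordingly):

----
# vzdump 100 --storage local --mode stop
# qmrestore /var/lib/vz/dump/vzdump-qemu-100-2024_01_01-12_00_00.vma.zst 200
----
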
Join Node to Cluster via GUI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[source,bash]
----
# pvecm add IP-ADDRESS-CLUSTER --link0 LOCAL-IP-ADDRESS-LINK0
----
If you want to use the built-in xref:pvecm_redundancy[redundancy] of the
* then join it, as explained in the previous section.
The configuration files for the removed node will still reside in
'/etc/pve/nodes/hp4'. Recover any configuration you still need and remove the
directory afterwards.

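For example (the node name `hp4` follows the example above; the backup
destination is arbitrary):

----
# mkdir /root/hp4-config-backup
# cp /etc/pve/nodes/hp4/qemu-server/*.conf /root/hp4-config-backup/
# rm -r /etc/pve/nodes/hp4
----
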
NOTE: After removal of the node, its SSH fingerprint will still reside in the
'known_hosts' of the other nodes. If you receive an SSH error after rejoining
a node with the same IP or hostname, run `pvecm updatecerts` once on the
[[pvecm_cluster_network_requirements]]
Network Requirements
~~~~~~~~~~~~~~~~~~~~

The {pve} cluster stack requires a reliable network with latencies under 5
milliseconds (LAN performance) between all nodes to operate stably. While on
setups with a small node count a network with higher latencies _may_ work, this
is not guaranteed and gets rather unlikely with more than three nodes and
latencies above around 10 ms.

The network should not be used heavily by other members, as while corosync does
not use much bandwidth, it is sensitive to latency jitter; ideally, corosync
runs on its own physically separated network. Especially do not use a shared
network for corosync and storage (except as a potential low-priority fallback
in a xref:pvecm_redundancy[redundant] configuration).

Before setting up a cluster, it is good practice to check if the network is fit
for that purpose. To ensure that the nodes can connect to each other on the
xref:pvecm_corosync_addresses[Link Address Types]).
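
For example, to get a rough idea of the latency between two nodes (the address
is an example):

----
# ping -c 10 -q 10.10.10.2
----
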
In this example, we want to switch cluster communication to the
10.10.10.0/25 network, so we change the 'ring0_addr' of each node respectively.
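
A node entry in `corosync.conf` might then look like this (the name, ID, and
address are illustrative):

----
  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
----
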
NOTE: The exact same procedure can be used to change other 'ringX_addr' values
as well. However, we recommend only changing one link address at a time, so
* Storage replication
SSH setup
~~~~~~~~~

On {pve} systems, the following changes are made to the SSH configuration/setup:

* the `root` user's SSH client config gets set up to prefer `AES` over
  `ChaCha20`

* the `root` user's `authorized_keys` file gets linked to
  `/etc/pve/priv/authorized_keys`, merging all authorized keys within a cluster

* `sshd` is configured to allow logging in as root with a password

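For example, the merged `authorized_keys` link can be inspected on any node
(the output assumes the default setup):

----
# readlink /root/.ssh/authorized_keys
/etc/pve/priv/authorized_keys
----
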
NOTE: Older systems might also have `/etc/ssh/ssh_known_hosts` set up as a
symlink pointing to `/etc/pve/priv/known_hosts`, containing a merged version of
all node host keys. This system was replaced with explicit host key pinning in
`pve-cluster <<INSERT VERSION>>`; the symlink can be deconfigured if still in
place by running `pvecm updatecerts --unmerge-known-hosts`.

Pitfalls due to automatic execution of `.bashrc` and siblings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In case you have a custom `.bashrc`, or similar files that get executed on
login by the configured shell, `ssh` will automatically run it once the session
is established successfully. This can cause some unexpected behavior, as those
*) return;;
esac
----
Corosync External Vote Support
------------------------------
for Debian based hosts, and other Linux distributions should also have a package
available through their respective package manager.
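
For example, on Debian-based systems, install `corosync-qnetd` on the external
server and `corosync-qdevice` on all cluster nodes:

----
external# apt install corosync-qnetd
pve# apt install corosync-qdevice
----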

NOTE: Unlike corosync itself, a QDevice connects to the cluster over TCP/IP.
The daemon can also run outside the LAN of the cluster and isn't limited to the
low latency requirements of corosync.
Supported Setups
~~~~~~~~~~~~~~~~
The SSH key from the cluster will be automatically copied to the QDevice.
NOTE: Make sure to set up key-based access for the root user on your external
server, or temporarily allow root login with password during the setup phase.
If you receive an error such as 'Host key verification failed.' at this
stage, running `pvecm updatecerts` could fix the issue.
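
The setup itself is started from one of the cluster nodes (the address of the
external server is an example):

----
pve# pvecm qdevice setup 192.168.22.90
----
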
After all the steps have successfully completed, you will see "Done". You can
verify that the QDevice has been set up with:
----
pve# pvecm status
Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.22.180 (local)
0x00000002          1    A,V,NMW 192.168.22.181
0x00000000          1            Qdevice
----

[[pvecm_qdevice_status_flags]]
QDevice Status Flags
^^^^^^^^^^^^^^^^^^^^

The status output of the QDevice, as seen above, will usually contain three
columns:

* `A` / `NA`: Alive or Not Alive. Indicates if the communication to the
  external `corosync-qnetd` daemon works.
* `V` / `NV`: If the QDevice will cast a vote for the node. In a split-brain
  situation, where the corosync connection between the nodes is down, but they
  both can still communicate with the external `corosync-qnetd` daemon,
  only one node will get the vote.
* `MW` / `NMW`: Master wins (`MW`) or not (`NMW`). Default is `NMW`, see
  footnote:[`votequorum_qdevice_master_wins` manual page
  https://manpages.debian.org/bookworm/libvotequorum-dev/votequorum_qdevice_master_wins.3.en.html].
* `NR`: QDevice is not registered.

NOTE: If your QDevice is listed as `Not Alive` (`NA` in the output above),
ensure that port `5403` (the default port of the qnetd server) of your external
server is reachable via TCP/IP!
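
A quick way to check TCP reachability from a cluster node (the hostname is an
example):

----
# nc -zv qnetd.example.com 5403
----
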
Frequently Asked Questions
~~~~~~~~~~~~~~~~~~~~~~~~~~
mind that guest startup is delayed until you reach quorum.
[[pvecm_next_id_range]]
Guest VMID Auto-Selection
-------------------------

When creating new guests, the web interface will ask the backend for a free
VMID automatically. The default range for searching is `100` to `1000000`
(lower than the maximal allowed VMID enforced by the schema).

Sometimes admins either want to allocate new VMIDs in a separate range, for
example to easily separate temporary VMs from ones that choose a VMID manually.
Other times it's just desired to provide VMIDs of a stable length, for which
setting the lower boundary to, for example, `100000` gives much more room.

To accommodate this use case, one can set either the lower, upper, or both
boundaries via the `datacenter.cfg` configuration file, which can be edited in
the web interface under 'Datacenter' -> 'Options'.
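
For example, to only auto-select VMIDs with six digits, one could set (the
bounds are examples for the `next-id` option):

----
next-id: lower=100000,upper=999999
----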

NOTE: The range is only used for the next-id API call, so it isn't a hard
limit.

Guest Migration
---------------
Migrating virtual guests to other nodes is a useful feature in a
cluster. There are settings to control the behavior of such
migrations. This can be done via the configuration file
`datacenter.cfg` or for a specific migration via API or command-line
parameters.
It makes a difference if a guest is online or offline, or if it has
Here, we will use the network 10.1.2.0/24 as a migration network. For
a single migration, you can do this using the `migration_network`
parameter of the command-line tool:
----
# qm migrate 106 tre --online --migration_network 10.1.2.0/24
----
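
To allow this for all migrations in the cluster, the network can instead be set
in the `datacenter.cfg` file:

----
# use dedicated migration network
migration: secure,network=10.1.2.0/24
----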