* All nodes must be in the same network as corosync uses IP Multicast
to communicate between nodes (also see
http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
- ports 5404and 5405 for cluster communication.
+ ports 5404 and 5405 for cluster communication.
+
NOTE: Some switches do not support IP multicast by default and must be
manually enabled first.
Currently the cluster creation has to be done on the console, so you
need to log in via 'ssh'.
-
Create the Cluster
------------------
hp1# pvecm create YOUR-CLUSTER-NAME
+CAUTION: The cluster name is used to compute the default multicast
+address. Please use unique cluster names if you run more than one
+cluster inside your network.
+
To check the state of your cluster use:
hp1# pvecm status
For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node.
CAUTION: A new node cannot hold any VMs, because you would get
-conflicts about identical VM IDs. Also, all existing configuration is
-overwritten when you join a new node to the cluster. To workaround,
-use vzdump to backup and restore to a different VMID after adding
-the node to the cluster.
+conflicts about identical VM IDs. Also, all existing configuration in
+'/etc/pve' is overwritten when you join a new node to the cluster. As
+a workaround, use vzdump to back up the guests and restore them under a
+different VMID after adding the node to the cluster.
To check the state of cluster:
data or backups you want to keep, or save them accordingly.
Log in to one remaining node via ssh. Issue a 'pvecm nodes' command to
-identify the nodeID:
+identify the node ID:
----
hp1# pvecm status
* then join it, as explained in the previous section.
+Quorum
+------
+
+{pve} uses a quorum-based technique to provide a consistent state among
+all cluster nodes.
+
+[quote, from Wikipedia, Quorum (distributed computing)]
+____
+A quorum is the minimum number of votes that a distributed transaction
+has to obtain in order to be allowed to perform an operation in a
+distributed system.
+____
+
+In case of network partitioning, state changes require that a
+majority of nodes are online. The cluster switches to read-only mode
+if it loses quorum.
+
+NOTE: {pve} assigns a single vote to each node by default.
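
The majority rule can be illustrated with a short Python sketch. It
assumes the default of one vote per node; the function names are
hypothetical and are not part of {pve} or corosync.

```python
def votes_needed(total_votes: int) -> int:
    """Return the smallest number of votes that forms a strict majority."""
    if total_votes < 1:
        raise ValueError("total_votes must be at least 1")
    return total_votes // 2 + 1


def is_quorate(online_votes: int, total_votes: int) -> bool:
    """Return True if the online nodes hold a strict majority of all votes.

    Assumption: every node contributes exactly one vote (the default).
    """
    return online_votes >= votes_needed(total_votes)
```

For example, a three-node cluster stays quorate with two nodes online,
while a four-node cluster split into two halves leaves neither half
quorate, because each half holds only two of the three votes needed.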
+
+
+Cluster Cold Start
+------------------
+
+It is obvious that a cluster is not quorate when all nodes are
+offline. This is a common case after a power failure.
+
+NOTE: It is always a good idea to use an uninterruptible power supply
+('UPS', also called 'battery backup') to avoid this state, especially
+if you want HA.
+
+On node startup, service 'pve-manager' waits up to 60 seconds to reach
+quorum, and then starts all guests. If it fails to get quorum, that
+service simply aborts, and you need to start your guests manually once
+you have quorum.
+
+If you start all nodes at the same time (for example when power comes
+back), it is likely that you reach quorum within the above timeout. But
+startup can fail if some nodes start much faster than others. In that
+case you need to start your guests manually after reaching quorum. You
+can do that on the GUI, or on the command line with:
+
+ systemctl start pve-manager
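
The manual step above could be automated along the following lines. This
is only a sketch: it assumes that the output of 'pvecm status' contains a
line of the form `Quorate: Yes`, and the helper names, the timeout, and
the polling interval are hypothetical choices, not {pve} defaults.

```python
import re
import subprocess
import time
from typing import Callable


def parse_quorate(status_text: str) -> bool:
    """Return True if the status text reports a quorate cluster.

    Assumption: the text contains a line such as 'Quorate:  Yes'.
    """
    for line in status_text.splitlines():
        match = re.match(r"\s*Quorate:\s*(\S+)", line)
        if match:
            return match.group(1).lower() == "yes"
    return False


def read_status() -> str:
    """Run 'pvecm status' and return its standard output."""
    result = subprocess.run(
        ["pvecm", "status"], capture_output=True, text=True, check=False
    )
    return result.stdout


def wait_for_quorum(
    timeout: float = 300.0,
    interval: float = 5.0,
    get_status: Callable[[], str] = read_status,
) -> bool:
    """Poll the cluster status until it is quorate or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if parse_quorate(get_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def start_guests_when_quorate() -> None:
    """Start 'pve-manager' (the command shown above) once quorum is reached."""
    if not wait_for_quorum():
        raise SystemExit("no quorum within the timeout; guests not started")
    subprocess.run(["systemctl", "start", "pve-manager"], check=True)
```

Calling `start_guests_when_quorate()` on a node waits for quorum and then
runs the same 'systemctl' command; if quorum is not reached in time it
exits without starting anything.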
+
+
ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]