X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pvecm.adoc;h=08f38e569b72472aa4249223d347047185d8e2b1;hp=3b2a75da18eb8d3436a6b1982773db9b2617db09;hb=e4ec415409536b12477442c713ab217a183d8bed;hpb=d8742b0c9cbfbc0a2bac4b342657dc94db079a81

diff --git a/pvecm.adoc b/pvecm.adoc
index 3b2a75d..08f38e5 100644
--- a/pvecm.adoc
+++ b/pvecm.adoc
@@ -23,10 +23,819 @@ Cluster Manager
 include::attributes.txt[]
 endif::manvolnum[]
 
-'pvecm' is a program to manage the cluster configuration. It can be
-used to create a new cluster, join nodes to a cluster, leave the
-cluster, get status information and do various other cluster related
-tasks.
+The {PVE} cluster manager `pvecm` is a tool to create a group of
+physical servers. Such a group is called a *cluster*. We use the
+http://www.corosync.org[Corosync Cluster Engine] for reliable group
+communication, and such clusters can consist of up to 32 physical nodes
+(possibly more, depending on network latency).
+
+`pvecm` can be used to create a new cluster, join nodes to a cluster,
+leave the cluster, get status information and do various other cluster
+related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
+is used to transparently distribute the cluster configuration to all cluster
+nodes.
+
+Grouping nodes into a cluster has the following advantages:
+
+* Centralized, web-based management
+
+* Multi-master clusters: each node can do all management tasks
+
+* `pmxcfs`: database-driven file system for storing configuration files,
+ replicated in real-time on all nodes using `corosync`.
+
+* Easy migration of virtual machines and containers between physical
+  hosts
+
+* Fast deployment
+
+* Cluster-wide services like firewall and HA
+
+
+Requirements
+------------
+
+* All nodes must be in the same network, as `corosync` uses IP multicast
+ to communicate between nodes (also see
+ http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
+ ports 5404 and 5405 for cluster communication.
++
+NOTE: Some switches have IP multicast disabled by default; it must be
+enabled manually first.
+
+* Date and time have to be synchronized.
+
+* An SSH tunnel on TCP port 22 between nodes is used.
+
+* If you are interested in High Availability, you need to have at
+  least three nodes for reliable quorum. All nodes should have the
+  same version.
+
+* We recommend a dedicated NIC for the cluster traffic, especially if
+  you use shared storage.
+
+NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
+Proxmox VE 4.0 cluster nodes.
+
+
+Preparing Nodes
+---------------
+
+First, install {PVE} on all nodes. Make sure that each node is
+installed with the final hostname and IP configuration. Changing the
+hostname and IP is not possible after cluster creation.
+
+Currently the cluster creation has to be done on the console, so you
+need to log in via `ssh`.
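+
+A quick way to confirm the hostname and its address before creating the
+cluster is sketched below. These are standard Linux commands rather than
+part of `pvecm`, and the exact output depends on your setup:
+
+[source,bash]
+----
+# print the hostname this node will use in the cluster
+hostname
+# print the address(es) the hostname resolves to; this should be the
+# node's final IP, not a loopback address
+hostname --ip-address
+----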
+
+Create the Cluster
+------------------
+
+Log in via `ssh` to the first {pve} node. Use a unique name for your cluster.
+This name cannot be changed later.
+
+ hp1# pvecm create YOUR-CLUSTER-NAME
+
+CAUTION: The cluster name is used to compute the default multicast
+address. Please use unique cluster names if you run more than one
+cluster inside your network.
+
+To check the state of your cluster use:
+
+ hp1# pvecm status
+
+
+Adding Nodes to the Cluster
+---------------------------
+
+Log in via `ssh` to the node you want to add.
+
+ hp2# pvecm add IP-ADDRESS-CLUSTER
+
+For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node.
+
+CAUTION: A new node cannot hold any VMs, because you would get
+conflicts about identical VM IDs. Also, all existing configuration in
+`/etc/pve` is overwritten when you join a new node to the cluster. As a
+workaround, use `vzdump` to back up the guests and restore them under a
+different VMID after adding the node to the cluster.
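+
+A minimal sketch of this workaround for a VM, assuming a hypothetical
+source VMID `100`, an unused target VMID `200` and a hypothetical backup
+directory `/mnt/backup`. `BACKUP-ARCHIVE` is a placeholder for whatever
+file `vzdump` actually creates:
+
+[source,bash]
+----
+# on the node to be added, before joining: back up the VM
+vzdump 100 --dumpdir /mnt/backup
+# after joining the cluster: restore the archive under a free VMID
+qmrestore /mnt/backup/BACKUP-ARCHIVE 200
+----
+
+For containers, `pct restore` takes the place of `qmrestore`.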
+
+To check the state of the cluster:
+
+ # pvecm status
+
+.Cluster status after adding 4 nodes
+----
+hp2# pvecm status
+Quorum information
+~~~~~~~~~~~~~~~~~~
+Date:             Mon Apr 20 12:30:13 2015
+Quorum provider:  corosync_votequorum
+Nodes:            4
+Node ID:          0x00000001
+Ring ID:          1928
+Quorate:          Yes
+
+Votequorum information
+~~~~~~~~~~~~~~~~~~~~~~
+Expected votes:   4
+Highest expected: 4
+Total votes:      4
+Quorum:           2
+Flags:            Quorate
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+0x00000001          1 192.168.15.91
+0x00000002          1 192.168.15.92 (local)
+0x00000003          1 192.168.15.93
+0x00000004          1 192.168.15.94
+----
+
+If you only want the list of all nodes, use:
+
+ # pvecm nodes
+
+.List nodes in a cluster
+----
+hp2# pvecm nodes
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+         1          1 hp1
+         2          1 hp2 (local)
+         3          1 hp3
+         4          1 hp4
+----
+
+Adding Nodes With Separated Cluster Network
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When adding a node to a cluster with a separated cluster network, you need to
+use the 'ringX_addr' parameters to set the node's address on those networks:
+
+[source,bash]
+pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
+
+If you want to use the Redundant Ring Protocol you will also want to pass the
+'ring1_addr' parameter.
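+
+Assuming `IP-ADDRESS-RING1` stands for the node's address on the second
+ring network (a placeholder in the same style as above), such a call
+could look like:
+
+[source,bash]
+pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0 -ring1_addr IP-ADDRESS-RING1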
+
+
+Remove a Cluster Node
+---------------------
+
+CAUTION: Read the procedure carefully before proceeding, as it may
+not be what you want or need.
+
+Move all virtual machines from the node. Make sure you have no local
+data or backups you want to keep, or save them accordingly.
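+
+As a sketch, guests can be moved from the command line on the node to be
+removed. The VMIDs `100` and `101` and the target node `hp1` are
+assumptions for illustration only:
+
+[source,bash]
+----
+# migrate a running VM to another cluster node
+qm migrate 100 hp1 --online
+# migrate a stopped container to another cluster node
+pct migrate 101 hp1
+----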
+
+Log in to one remaining node via ssh. Issue a `pvecm status` or `pvecm nodes`
+command to identify the node ID:
+
+----
+hp1# pvecm status
+
+Quorum information
+~~~~~~~~~~~~~~~~~~
+Date:             Mon Apr 20 12:30:13 2015
+Quorum provider:  corosync_votequorum
+Nodes:            4
+Node ID:          0x00000001
+Ring ID:          1928
+Quorate:          Yes
+
+Votequorum information
+~~~~~~~~~~~~~~~~~~~~~~
+Expected votes:   4
+Highest expected: 4
+Total votes:      4
+Quorum:           2
+Flags:            Quorate
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+0x00000001          1 192.168.15.91 (local)
+0x00000002          1 192.168.15.92
+0x00000003          1 192.168.15.93
+0x00000004          1 192.168.15.94
+----
+
+IMPORTANT: At this point you must power off the node to be removed and
+make sure that it will not power on again (in the network) as it
+is.
+
+----
+hp1# pvecm nodes
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+         1          1 hp1 (local)
+         2          1 hp2
+         3          1 hp3
+         4          1 hp4
+----
+
+Log in to one remaining node via ssh. Issue the delete command (here
+deleting node `hp4`):
+
+ hp1# pvecm delnode hp4
+
+If the operation succeeds, no output is returned. Check the node
+list again with `pvecm nodes` or `pvecm status`. You should see
+something like:
+
+----
+hp1# pvecm status
+
+Quorum information
+~~~~~~~~~~~~~~~~~~
+Date:             Mon Apr 20 12:44:28 2015
+Quorum provider:  corosync_votequorum
+Nodes:            3
+Node ID:          0x00000001
+Ring ID:          1992
+Quorate:          Yes
+
+Votequorum information
+~~~~~~~~~~~~~~~~~~~~~~
+Expected votes:   3
+Highest expected: 3
+Total votes:      3
+Quorum:           3
+Flags:            Quorate
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+0x00000001          1 192.168.15.90 (local)
+0x00000002          1 192.168.15.91
+0x00000003          1 192.168.15.92
+----
+
+IMPORTANT: As mentioned above, it is very important to power off the node
+*before* removal, and make sure that it will *never* power on again
+(in the existing cluster network) as it is.
+
+If you power on the node as it is, your cluster could end up in a broken
+state, and it could be difficult to restore a clean cluster state.
+
+If, for whatever reason, you want this server to join the same
+cluster again, you have to:
+
+* reinstall {pve} on it from scratch
+
+* then join it, as explained in the previous section.
+
+Separate A Node Without Reinstalling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+CAUTION: This is *not* the recommended method. Proceed with caution, and use the
+above mentioned method if you're unsure.
+
+You can also separate a node from a cluster without reinstalling it from
+scratch. But after removing the node from the cluster, it will still have
+access to the shared storages! This must be resolved before you start removing
+the node from the cluster. A {pve} cluster cannot share the exact same
+storage with another cluster, as it leads to VMID conflicts.
+
+Move the guests which you want to keep on this node now; after the removal you
+can do this only via backup and restore. It's suggested that you create a new
+storage to which only the node you want to separate has access. This can be
+a new export on your NFS or a new Ceph pool, to name a few examples. It's just
+important that the exact same storage is not accessed by multiple
+clusters. After setting this storage up, move all data from the node and its VMs
+to it. Then you are ready to separate the node from the cluster.
+
+WARNING: Ensure all shared resources are cleanly separated! Otherwise you will
+run into conflicts and problems.
+
+First stop the corosync and the pve-cluster services on the node:
+[source,bash]
+systemctl stop pve-cluster
+systemctl stop corosync
+
+Start the cluster filesystem again in local mode:
+[source,bash]
+pmxcfs -l
+
+Delete the corosync configuration files:
+[source,bash]
+rm /etc/pve/corosync.conf
+rm /etc/corosync/*
+
+You can now start the filesystem again as normal service:
+[source,bash]
+killall pmxcfs
+systemctl start pve-cluster
+
+The node is now separated from the cluster. You can delete it from a remaining
+node of the cluster with:
+[source,bash]
+pvecm delnode oldnode
+
+If the command failed because the remaining node in the cluster lost quorum
+when the now separate node exited, you may set the expected votes to 1 as a workaround:
+[source,bash]
+pvecm expected 1
+
+And then repeat the 'pvecm delnode' command.
+
+Now switch back to the separated node and delete all remaining files left
+from the old cluster. This ensures that the node can be added to another
+cluster again without problems.
+
+[source,bash]
+rm /var/lib/corosync/*
+
+As the configuration files from the other nodes are still in the cluster
+filesystem, you may want to clean those up too. Simply remove the whole
+directory '/etc/pve/nodes/NODENAME' recursively, but check three times that
+you used the correct one before deleting it.
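+
+For example, with `NODENAME` again as a placeholder for the node whose
+configuration should be removed (a sketch, not a fixed procedure):
+
+[source,bash]
+----
+# list the node directories first and verify the name
+ls /etc/pve/nodes/
+# then remove the directory of the old node recursively
+rm -r /etc/pve/nodes/NODENAME
+----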
+
+CAUTION: The node's SSH keys are still in the 'authorized_keys' file. This means
+the nodes can still connect to each other with public key authentication. This
+should be fixed by removing the respective keys from the
+'/etc/pve/priv/authorized_keys' file.
+
+Quorum
+------
+
+{pve} uses a quorum-based technique to provide a consistent state among
+all cluster nodes.
+
+[quote, from Wikipedia, Quorum (distributed computing)]
+____
+A quorum is the minimum number of votes that a distributed transaction
+has to obtain in order to be allowed to perform an operation in a
+distributed system.
+____
+
+In case of network partitioning, state changes require that a
+majority of nodes are online. The cluster switches to read-only mode
+if it loses quorum.
+
+NOTE: {pve} assigns a single vote to each node by default.
+
+Cluster Network
+---------------
+
+The cluster network is the core of a cluster. All messages sent over it have to
+be delivered reliably to all nodes in their respective order. In {pve} this
+part is done by corosync, an implementation of a high-performance, low-overhead
+high availability development toolkit. It serves our decentralized
+configuration file system (`pmxcfs`).
+
+[[cluster-network-requirements]]
+Network Requirements
+~~~~~~~~~~~~~~~~~~~~
+This needs a reliable network with latencies under 2 milliseconds (LAN
+performance) to work properly. While corosync can also use unicast for
+communication between nodes, it's **highly recommended** to have a multicast
+capable network. The network should not be used heavily by other members;
+ideally corosync runs on its own network.
+*Never* share it with a network that also carries storage traffic.
+
+Before setting up a cluster it is good practice to check if the network is fit
+for that purpose.
+
+* Ensure that all nodes are in the same subnet. This only has to be true for the
+  network interfaces used for cluster communication (corosync).
+
+* Ensure all nodes can reach each other over those interfaces. Using `ping` is
+  enough for a basic test.
+
+* Ensure that multicast works in general and at high packet rates. This can be
+  done with the `omping` tool. The final "%loss" number should be < 1%.
+[source,bash]
+----
+omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
+----
+
+* Ensure that multicast communication works over an extended period of time.
+  This uncovers problems where IGMP snooping is activated on the network but
+  no multicast querier is active. This test has a duration of around 10
+  minutes.
+[source,bash]
+omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
+
+Your network is not ready for clustering if any of these tests fails. Recheck
+your network configuration. Switches in particular are notorious for having
+multicast disabled by default or IGMP snooping enabled with no IGMP querier
+active.
+
+In smaller clusters it's also an option to use unicast if you really cannot get
+multicast to work.
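+
+As a hedged sketch, corosync 2.x selects unicast through the `transport`
+option in the `totem` section of the configuration file. The fragment below
+only shows the added line and is an assumption about your setup; check the
+corosync.conf man page before using it:
+
+----
+totem {
+  [...] # existing totem options here
+  transport: udpu
+}
+----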
+
+Separate Cluster Network
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+When creating a cluster without any parameters, the cluster network is generally
+shared with the Web UI and the VMs and their traffic. Depending on your setup,
+even storage traffic may get sent over the same network. It's recommended to
+change that, as corosync is a time-critical, real-time application.
+
+Setting Up A New Network
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+First you have to set up a new network interface. It should be on a physically
+separate network. Ensure that your network fulfills the
+<<cluster-network-requirements,cluster network requirements>>.
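+
+On a Debian-based system like {pve}, such an interface is typically
+configured in `/etc/network/interfaces`. The following is a sketch only;
+the interface name `eth1` and the address are assumptions:
+
+----
+auto eth1
+iface eth1 inet static
+    address 10.10.10.1
+    netmask 255.255.255.128
+----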
+
+Separate On Cluster Creation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is possible through the 'ring0_addr' and 'bindnet0_addr' parameters of
+the 'pvecm create' command used for creating a new cluster.
+
+If you have set up an additional NIC with a static address on 10.10.10.1/25
+and want to send and receive all cluster communication over this interface,
+you would execute:
+
+[source,bash]
+pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
+
+To check if everything is working properly execute:
+[source,bash]
+systemctl status corosync
+
+[[separate-cluster-net-after-creation]]
+Separate After Cluster Creation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can do this also if you have already created a cluster and want to switch
+its communication to another network, without rebuilding the whole cluster.
+This change may lead to short durations of quorum loss in the cluster, as nodes
+have to restart corosync and come up one after the other on the new network.
+
+Check how to <<edit-corosync-conf,edit the corosync.conf file>> first.
+Then open it and you should see a file similar to:
+
+----
+logging {
+  debug: off
+  to_syslog: yes
+}
+
+nodelist {
+
+  node {
+    name: due
+    nodeid: 2
+    quorum_votes: 1
+    ring0_addr: due
+  }
+
+  node {
+    name: tre
+    nodeid: 3
+    quorum_votes: 1
+    ring0_addr: tre
+  }
+
+  node {
+    name: uno
+    nodeid: 1
+    quorum_votes: 1
+    ring0_addr: uno
+  }
+
+}
+
+quorum {
+  provider: corosync_votequorum
+}
+
+totem {
+  cluster_name: thomas-testcluster
+  config_version: 3
+  ip_version: ipv4
+  secauth: on
+  version: 2
+  interface {
+    bindnetaddr: 192.168.30.50
+    ringnumber: 0
+  }
+
+}
+----
+
+The first thing you want to do is add the 'name' properties in the node entries if
+you do not see them already. Those *must* match the node name.
+
+Then replace the address from the 'ring0_addr' properties with the new
+addresses. You may use plain IP addresses or also hostnames here. If you use
+hostnames, ensure that they are resolvable from all nodes.
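+
+One way to check this is to look the names up on every node, here with the
+node names from the example file above (`getent` is a standard glibc tool,
+not part of {pve}):
+
+[source,bash]
+getent hosts due tre uno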
+
+In this example, the cluster communication is switched to the 10.10.10.1/25
+network, so all 'ring0_addr' properties are replaced accordingly. The 'bindnetaddr'
+in the totem section of the config is also set to an address of the new network. It can be
+any address from the subnet configured on the new network interface.
+
+After you have increased the 'config_version' property, the new configuration file
+should look like:
+
+----
+
+logging {
+  debug: off
+  to_syslog: yes
+}
+
+nodelist {
+
+  node {
+    name: due
+    nodeid: 2
+    quorum_votes: 1
+    ring0_addr: 10.10.10.2
+  }
+
+  node {
+    name: tre
+    nodeid: 3
+    quorum_votes: 1
+    ring0_addr: 10.10.10.3
+  }
+
+  node {
+    name: uno
+    nodeid: 1
+    quorum_votes: 1
+    ring0_addr: 10.10.10.1
+  }
+
+}
+
+quorum {
+  provider: corosync_votequorum
+}
+
+totem {
+  cluster_name: thomas-testcluster
+  config_version: 4
+  ip_version: ipv4
+  secauth: on
+  version: 2
+  interface {
+    bindnetaddr: 10.10.10.1
+    ringnumber: 0
+  }
+
+}
+----
+
+Now, after a final check that all changed information is correct, save it
+and see the <<edit-corosync-conf,edit corosync.conf file>> section again to
+learn how to bring it into effect.
+
+As this change cannot be applied live by corosync, a restart is required.
+
+On a single node execute:
+[source,bash]
+systemctl restart corosync
+
+Now check if everything is fine:
+
+[source,bash]
+systemctl status corosync
+
+If corosync runs correctly again, restart corosync on all other nodes as well.
+They will then join the cluster membership one by one on the new network.
+
+Redundant Ring Protocol
+~~~~~~~~~~~~~~~~~~~~~~~
+To avoid a single point of failure, you should implement countermeasures.
+This can be on the hardware and operating system level through network bonding.
+
+Corosync itself also offers a possibility to add redundancy through the
+so-called 'Redundant Ring Protocol'. This protocol allows running a second totem
+ring on another network. This network should be physically separated from the
+other ring's network to actually increase availability.
+
+RRP On Cluster Creation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The 'pvecm create' command provides the additional parameters 'bindnetX_addr',
+'ringX_addr' and 'rrp_mode', which can be used for RRP configuration.
+
+NOTE: See the <<corosync-conf-glossary,glossary>> if you do not know what each parameter means.
+
+So if you have two networks, one on the 10.10.10.1/24 and the other on the
+10.10.20.1/24 subnet you would execute:
+
+[source,bash]
+pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
+-bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
+
+RRP On A Created Cluster
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+When enabling an already running cluster to use RRP, you will take similar steps
+as described in <<separate-cluster-net-after-creation,separating the cluster
+network>>. You just do it on another ring.
+
+First add a new `interface` subsection in the `totem` section and set its
+`ringnumber` property to `1`. Set the interface's `bindnetaddr` property to an
+address of the subnet you have configured for your new ring.
+Further set the `rrp_mode` to `passive`; this is the only stable mode.
+
+Then add to each node entry in the `nodelist` section its new `ring1_addr`
+property with the node's additional ring address.
+
+So if you have two networks, one on the 10.10.10.1/24 and the other on the
+10.10.20.1/24 subnet, the final configuration file should look like:
+
+----
+totem {
+  cluster_name: tweak
+  config_version: 9
+  ip_version: ipv4
+  rrp_mode: passive
+  secauth: on
+  version: 2
+  interface {
+    bindnetaddr: 10.10.10.1
+    ringnumber: 0
+  }
+  interface {
+    bindnetaddr: 10.10.20.1
+    ringnumber: 1
+  }
+}
+
+nodelist {
+  node {
+    name: pvecm1
+    nodeid: 1
+    quorum_votes: 1
+    ring0_addr: 10.10.10.1
+    ring1_addr: 10.10.20.1
+  }
+
+  node {
+    name: pvecm2
+    nodeid: 2
+    quorum_votes: 1
+    ring0_addr: 10.10.10.2
+    ring1_addr: 10.10.20.2
+  }
+
+  [...] # other cluster nodes here
+}
+
+[...] # other remaining config sections here
+
+----
+
+Bring it into effect as described in the <<edit-corosync-conf,edit the
+corosync.conf file>> section.
+
+This is a change which cannot take effect live and needs at least a restart
+of corosync. A restart of the whole cluster is recommended.
+
+If you cannot reboot the whole cluster, ensure no High Availability services are
+configured and then stop the corosync service on all nodes. After corosync is
+stopped on all nodes, start it one after the other again.
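+
+Sketched as commands, this sequence would be:
+
+[source,bash]
+----
+# first, on every node:
+systemctl stop corosync
+# then, once corosync is stopped everywhere, on one node after the other:
+systemctl start corosync
+# and check that the node came up before continuing with the next one:
+systemctl status corosync
+----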
+
+Corosync Configuration
+----------------------
+
+The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
+controls the cluster membership and its network.
+For reading more about it, check the corosync.conf man page:
+[source,bash]
+man corosync.conf
+
+For node membership you should always use the `pvecm` tool provided by {pve}.
+You may have to edit the configuration file manually for other changes.
+Here are a few best practice tips for doing this.
+
+[[edit-corosync-conf]]
+Edit corosync.conf
+~~~~~~~~~~~~~~~~~~
+
+Editing the corosync.conf file is not always straightforward. There are
+two on each cluster node, one in `/etc/pve/corosync.conf` and the other in
+`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
+propagate the changes to the local one, but not vice versa.
+
+The configuration will get updated automatically as soon as the file changes.
+This means changes which can be integrated in a running corosync will take
+effect instantly. So you should always make a copy and edit that instead, to
+avoid triggering some unwanted changes by an in-between save.
+
+[source,bash]
+cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
+
+Then open the configuration file with your favorite editor; `nano` and `vim.tiny` are
+preinstalled on {pve}, for example.
+
+NOTE: Always increment the 'config_version' number on configuration changes;
+omitting this can lead to problems.
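+
+Before applying the edited copy, it can help to review what actually
+changed. A small sketch using the standard `diff` and `grep` tools:
+
+[source,bash]
+----
+# show the differences between the active and the edited configuration
+diff -u /etc/pve/corosync.conf /etc/pve/corosync.conf.new
+# confirm that the version number was raised in the edited copy
+grep config_version /etc/pve/corosync.conf /etc/pve/corosync.conf.new
+----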
+
+After making the necessary changes, create another copy of the current working
+configuration file. This serves as a backup if the new configuration fails to
+apply or causes problems in other ways.
+
+[source,bash]
+cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
+
+Then move the new configuration file over the old one:
+[source,bash]
+mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
+
+You can check whether the change was applied automatically with the commands:
+[source,bash]
+systemctl status corosync
+journalctl -b -u corosync
+
+If it was not, you may have to restart the
+corosync service via:
+[source,bash]
+systemctl restart corosync
+
+If errors occur, check the troubleshooting section below.
+
+Troubleshooting
+~~~~~~~~~~~~~~~
+
+Issue: 'quorum.expected_votes must be configured'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When corosync starts to fail and you get the following message in the system log:
+
+----
+[...]
+corosync[1647]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
+corosync[1647]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason
+    'configuration error: nodelist or quorum.expected_votes must be configured!'
+[...]
+----
+
+It means that the hostname you set for corosync 'ringX_addr' in the
+configuration could not be resolved.
+
+
+Write Configuration When Not Quorate
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and you
+know what you are doing, use:
+[source,bash]
+pvecm expected 1
+
+This sets the expected vote count to 1 and makes the cluster quorate. You can
+now fix your configuration, or revert it back to the last working backup.
+
+This is not enough if corosync cannot start anymore. Here it's best to edit the
+local copy of the corosync configuration in '/etc/corosync/corosync.conf' so
+that corosync can start again. Ensure that on all nodes this configuration has
+the same content to avoid split brains. If you are not sure what went wrong
+it's best to ask the Proxmox Community to help you.
+
+
+[[corosync-conf-glossary]]
+Corosync Configuration Glossary
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ringX_addr::
+This names the different ring addresses for the corosync totem rings used for
+the cluster communication.
+
+bindnetaddr::
+Defines the interface the ring should bind to. It may be any address of
+the subnet configured on the interface we want to use. In general, it's
+recommended to just use an address a node uses on this interface.
+
+rrp_mode::
+Specifies the mode of the redundant ring protocol and may be passive, active or
+none. Note that use of active is highly experimental and not officially
+supported. Passive is the preferred mode; it may double the cluster
+communication throughput and increases availability.
+
+
+Cluster Cold Start
+------------------
+
+It is obvious that a cluster is not quorate when all nodes are
+offline. This is a common case after a power failure.
+
+NOTE: It is always a good idea to use an uninterruptible power supply
+(``UPS'', also called ``battery backup'') to avoid this state, especially if
+you want HA.
+
+On node startup, service `pve-manager` is started and waits for
+quorum. Once quorate, it starts all guests which have the `onboot`
+flag set.
+
+When you turn on nodes, or when power comes back after power failure,
+it is likely that some nodes boot faster than others. Please keep in
+mind that guest startup is delayed until you reach quorum.
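+
+Whether a guest is started automatically can be checked and set with `qm`;
+the VMID `100` below is an assumption for illustration:
+
+[source,bash]
+----
+# show the onboot setting of VM 100 (no output means it is not set)
+qm config 100 | grep onboot
+# start VM 100 automatically once the node is quorate after boot
+qm set 100 --onboot 1
+----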
 
 
 ifdef::manvolnum[]