vrf : remove net.ipv4.tcp_l3mdev_accept=1 sysctl tuning

diff --git a/pvecm.adoc b/pvecm.adoc

index a8f017c7b9a9035d1bf7e9e96dfc3f1b3eeef3c3..7c786bc99f7f158f442c05becf10458623be6ab6 100644
--- a/pvecm.adoc
+++ b/pvecm.adoc
@@ -1,3 +1,4 @@
+[[chapter_pvecm]]
  ifdef::manvolnum[]
  pvecm(1)
  ========
@@ -74,6 +75,8 @@ manually enabled first.
  * We recommend a dedicated NIC for the cluster traffic, especially if
    you use shared storage.
  
+* The root password of a cluster node is required for adding nodes.
+
  NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
  Proxmox VE 4.0 cluster nodes.
  
@@ -85,14 +88,14 @@ First, install {PVE} on all nodes. Make sure that each node is
  installed with the final hostname and IP configuration. Changing the
  hostname and IP is not possible after cluster creation.
  
-Currently the cluster creation has to be done on the console, so you
-need to login via `ssh`.
+Currently the cluster creation can either be done on the console (login via `ssh`) or the GUI.
  
+[[pvecm_create_cluster]]
  Create the Cluster
  ------------------
  
  Login via `ssh` to the first {pve} node. Use a unique name for your cluster.
-This name cannot be changed later.
+This name cannot be changed later. The cluster name follows the same rules as node names.
  
   hp1# pvecm create YOUR-CLUSTER-NAME
  
@@ -104,7 +107,22 @@ To check the state of your cluster use:
  
   hp1# pvecm status
  
+Multiple Clusters In Same Network
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to create multiple clusters in the same physical or logical
+network. Each cluster must have a unique name, which is used to generate the
+cluster's multicast group address. As long as no duplicate cluster names are
+configured in one network segment, the different clusters won't interfere with
+each other.
+
+If multiple clusters operate in a single network it may be beneficial to set up
+an IGMP querier and enable IGMP Snooping in said network. This may reduce the
+load of the network significantly because multicast packets are only delivered
+to endpoints of the respective member nodes.
+
  
+[[pvecm_join_node_to_cluster]]
  Adding Nodes to the Cluster
  ---------------------------
  
@@ -170,6 +188,7 @@ Membership information
           4          1 hp4
  ----
  
+[[adding-nodes-with-separated-cluster-network]]
  Adding Nodes With Separated Cluster Network
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
@@ -193,42 +212,10 @@ not be what you want or need.
  
  Move all virtual machines from the node. Make sure you have no local
  data or backups you want to keep, or save them accordingly.
+In the following example we will remove the node hp4 from the cluster.
  
-Log in to one remaining node via ssh. Issue a `pvecm nodes` command to
-identify the node ID:
-
-----
-hp1# pvecm status
-
-Quorum information
-~~~~~~~~~~~~~~~~~~
-Date:             Mon Apr 20 12:30:13 2015
-Quorum provider:  corosync_votequorum
-Nodes:            4
-Node ID:          0x00000001
-Ring ID:          1928
-Quorate:          Yes
-
-Votequorum information
-~~~~~~~~~~~~~~~~~~~~~~
-Expected votes:   4
-Highest expected: 4
-Total votes:      4
-Quorum:           2
-Flags:            Quorate
-
-Membership information
-~~~~~~~~~~~~~~~~~~~~~~
-    Nodeid      Votes Name
-0x00000001          1 192.168.15.91 (local)
-0x00000002          1 192.168.15.92
-0x00000003          1 192.168.15.93
-0x00000004          1 192.168.15.94
-----
-
-IMPORTANT: at this point you must power off the node to be removed and
-make sure that it will not power on again (in the network) as it
-is.
+Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
+command to identify the node ID to remove:
  
  ----
  hp1# pvecm nodes
@@ -242,8 +229,18 @@ Membership information
           4          1 hp4
  ----
  
-Log in to one remaining node via ssh. Issue the delete command (here
-deleting node `hp4`):
+
+At this point you must power off hp4 and
+make sure that it will not power on again (in the network) as it
+is.
+
+IMPORTANT: As said above, it is critical to power off the node
+*before* removal, and make sure that it will *never* power on again
+(in the existing cluster network) as it is.
+If you power on the node as it is, the cluster could break, and
+it could be difficult to restore a clean cluster state.
+
+After powering off the node hp4, we can safely remove it from the cluster.
  
   hp1# pvecm delnode hp4
  
@@ -279,13 +276,6 @@ Membership information
  0x00000003          1 192.168.15.92
  ----
  
-IMPORTANT: as said above, it is very important to power off the node
-*before* removal, and make sure that it will *never* power on again
-(in the existing cluster network) as it is.
-
-If you power on the node as it is, your cluster will be screwed up and
-it could be difficult to restore a clean cluster state.
-
  If, for whatever reason, you want that this server joins the same
  cluster again, you have to
  
@@ -304,7 +294,8 @@ You can also separate a node from a cluster without reinstalling it from
  scratch.  But after removing the node from the cluster it will still have
  access to the shared storages! This must be resolved before you start removing
  the node from the cluster. A {pve} cluster cannot share the exact same
-storage with another cluster, as it leads to VMID conflicts.
+storage with another cluster, as storage locking doesn't work across cluster
+boundaries. Further, it may also lead to VMID conflicts.
  
  Its suggested that you create a new storage where only the node which you want
  to separate has access. This can be an new export on your NFS or a new Ceph
@@ -427,15 +418,17 @@ for that purpose.
  
  * Ensure that multicast works in general and a high package rates. This can be
    done with the `omping` tool. The final "%loss" number should be < 1%.
++
  [source,bash]
  ----
  omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
  ----
  
  * Ensure that multicast communication works over an extended period of time.
-  This covers up problems where IGMP snooping is activated on the network but
+  This uncovers problems where IGMP snooping is activated on the network but
    no multicast querier is active. This test has a duration of around 10
    minutes.
++
  [source,bash]
  ----
  omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
@@ -470,7 +463,7 @@ Separate On Cluster Creation
  This is possible through the 'ring0_addr' and 'bindnet0_addr' parameter of
  the 'pvecm create' command used for creating a new cluster.
  
-If you have setup a additional NIC with a static address on 10.10.10.1/25
+If you have set up an additional NIC with a static address on 10.10.10.1/25
  and want to send and receive all cluster communication over this interface
  you would execute:
  
@@ -485,6 +478,9 @@ To check if everything is working properly execute:
  systemctl status corosync
  ----
  
+Afterwards, proceed as described in the section to
+<<adding-nodes-with-separated-cluster-network,add nodes with a separated cluster network>>.
+
  [[separate-cluster-net-after-creation]]
  Separate After Cluster Creation
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -554,7 +550,7 @@ addresses.  You may use plain IP addresses or also hostnames here. If you use
  hostnames ensure that they are resolvable from all nodes.
  
  In my example I want to switch my cluster communication to the 10.10.10.1/25
-network. So I replace all 'ring0_addr' respectively. I also set the bindetaddr
+network. So I replace all 'ring0_addr' respectively. I also set the bindnetaddr
  in the totem section of the config to an address of the new network. It can be
  any address from the subnet configured on the new network interface.
  
@@ -633,6 +629,7 @@ systemctl status corosync
  If corosync runs again correct restart corosync also on all other nodes.
  They will then join the cluster membership one by one on the new network.
  
+[[pvecm_rrp]]
  Redundant Ring Protocol
  ~~~~~~~~~~~~~~~~~~~~~~~
  To avoid a single point of failure you should implement counter measurements.
@@ -660,13 +657,13 @@ pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
  -bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
  ----
  
-RRP On A Created Cluster
+RRP On Existing Clusters
  ~~~~~~~~~~~~~~~~~~~~~~~~
  
-When enabling an already running cluster to use RRP you will take similar steps
-as describe in
-<<separate-cluster-net-after-creation,separating the cluster network>>. You
-just do it on another ring.
+You will take similar steps as described in
+<<separate-cluster-net-after-creation,separating the cluster network>> to
+enable RRP on an already running cluster. The only difference is that you
+will add `ring1` and use it instead of `ring0`.
  
  First add a new `interface` subsection in the `totem` section, set its
  `ringnumber` property to `1`. Set the interfaces `bindnetaddr` property to an
@@ -734,7 +731,7 @@ stopped on all nodes start it one after the other again.
  Corosync Configuration
  ----------------------
  
-The `/ect/pve/corosync.conf` file plays a central role in {pve} cluster. It
+The `/etc/pve/corosync.conf` file plays a central role in {pve} cluster. It
  controls the cluster member ship and its network.
  For reading more about it check the corosync.conf man page:
  [source,bash]
@@ -872,7 +869,7 @@ NOTE: It is always a good idea to use an uninterruptible power supply
  (``UPS'', also called ``battery backup'') to avoid this state, especially if
  you want HA.
  
-On node startup, service `pve-manager` is started and waits for
+On node startup, the `pve-guests` service is started and waits for
  quorum. Once quorate, it starts all guests which have the `onboot`
  flag set.
  
@@ -890,14 +887,22 @@ migrations. This can be done via the configuration file
  `datacenter.cfg` or for a specific migration via API or command line
  parameters.
  
+It makes a difference if a guest is online or offline, or if it has
+local resources (like a local disk).
+
+For details about virtual machine migration, see the
+xref:qm_migration[QEMU/KVM Migration Chapter].
+
+For details about container migration, see the
+xref:pct_migration[Container Migration Chapter].
  
  Migration Type
  ~~~~~~~~~~~~~~
  
-The migration type defines if the migration data should be sent over a
+The migration type defines if the migration data should be sent over an
  encrypted (`secure`) channel or an unencrypted (`insecure`) one.
  Setting the migration type to insecure means that the RAM content of a
-virtual guest gets also transfered unencrypted, which can lead to
+virtual guest is also transferred unencrypted, which can lead to
  information disclosure of critical data from inside the guest (for
  example passwords or encryption keys).
  
@@ -946,7 +951,7 @@ dedicated network for migration.
  A network configuration for such a setup might look as follows:
  
  ----
-iface eth0 inet manual
+iface eno1 inet manual
  
  # public network
  auto vmbr0
@@ -954,19 +959,19 @@ iface vmbr0 inet static
      address 192.X.Y.57
      netmask 255.255.250.0
      gateway 192.X.Y.1
-    bridge_ports eth0
+    bridge_ports eno1
      bridge_stp off
      bridge_fd 0
  
  # cluster network
-auto eth1
-iface eth1 inet static
+auto eno2
+iface eno2 inet static
      address  10.1.1.1
      netmask  255.255.255.0
  
  # fast network
-auto eth2
-iface eth2 inet static
+auto eno3
+iface eno3 inet static
      address  10.1.2.1
      netmask  255.255.255.0
  ----