include::attributes.txt[]

pvecm - Proxmox VE Cluster Manager

include::pvecm.1-synopsis.adoc[]

include::attributes.txt[]
The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such clusters can consist of up to 32 physical nodes
(probably more, depending on network latency).
`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information, and do various other cluster-related
tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.
Grouping nodes into a cluster has the following advantages:
* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* `pmxcfs`: database-driven file system for storing configuration files,
  replicated in real-time on all nodes using `corosync`.

* Easy migration of virtual machines and containers between physical
  hosts

* Cluster-wide services like firewall and HA
Requirements
------------

* All nodes must be in the same network, as `corosync` uses IP multicast
  to communicate between nodes (also see
  http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
  ports 5404 and 5405 for cluster communication. A quick way to test
  multicast connectivity is shown at the end of this section.
NOTE: Some switches do not enable IP multicast by default, so it must be
enabled manually first.
* Date and time have to be synchronized.

* An SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need to have at
  least three nodes for reliable quorum. All nodes should have the
  same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
  you use shared storage.
NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
Proxmox VE 4.0 cluster nodes.
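If you are unsure whether IP multicast works in your network, you can test it
with `omping` before creating the cluster (the package is available via
`apt-get install omping`). This is only a sketch; `hp1`, `hp2` and `hp3` are
example hostnames, and the command has to be started on all nodes at roughly
the same time:

 # omping -c 600 -i 1 -q hp1 hp2 hp3

If multicast works, every node should report a multicast packet loss close to 0%.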
Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.
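A quick way to verify this on each node is to check that the hostname resolves
to the address you intend to use for cluster communication. The hostname `hp1`
and the address below are just examples:

----
hp1# getent hosts $(hostname)
192.168.15.91   hp1
----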
Currently the cluster creation has to be done on the console, so you
need to log in via `ssh`.
Create the Cluster
------------------

Log in via `ssh` to the first {pve} node. Use a unique name for your cluster.
This name cannot be changed later.
 hp1# pvecm create YOUR-CLUSTER-NAME
CAUTION: The cluster name is used to compute the default multicast
address. Please use unique cluster names if you run more than one
cluster inside your network.
To check the state of your cluster use:

 hp1# pvecm status
Adding Nodes to the Cluster
---------------------------
Log in via `ssh` to the node you want to add.
 hp2# pvecm add IP-ADDRESS-CLUSTER
For `IP-ADDRESS-CLUSTER`, use the IP address of an existing cluster node.
CAUTION: A new node cannot hold any VMs, because you would get
conflicts about identical VM IDs. Also, all existing configuration in
`/etc/pve` is overwritten when you join a new node to the cluster. As a
workaround, use `vzdump` to back up the guests beforehand and restore them
to different VMIDs after adding the node to the cluster.
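As a sketch, the workaround for a single VM could look like this; the VMIDs
`100` and `105` and the dump directory are just examples, and for containers
you would use `pct restore` instead of `qmrestore`:

----
hp2# vzdump 100 --dumpdir /var/lib/vz/dump
hp2# pvecm add IP-ADDRESS-CLUSTER
hp2# qmrestore /var/lib/vz/dump/vzdump-qemu-100-<timestamp>.vma 105
----

The backup archive lives outside of `/etc/pve`, so it survives the join and
can be restored under a VMID that is still free in the cluster.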
To check the state of the cluster:

 # pvecm status
.Cluster status after adding 4 nodes
----
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----
If you only want the list of all nodes, use:

 # pvecm nodes
.List nodes in a cluster
----
Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2
         3          1 hp3
         4          1 hp4
----
Remove a Cluster Node
---------------------
CAUTION: Read the procedure carefully before proceeding, as it may not
be what you want or need.
Move all virtual machines off the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.
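Migration can be done with the web interface or on the command line; a minimal
sketch, where `100` is an example VMID and `hp1` the target node:

 hp4# qm migrate 100 hp1 --online

For containers, `pct migrate` works analogously.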
Log in to one remaining node via ssh. Issue a `pvecm nodes` or `pvecm status`
command to identify the node ID:
----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91 (local)
0x00000002          1 192.168.15.92
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----
IMPORTANT: At this point you must power off the node to be removed and
make sure that it will not power on again (in the network) as it
is.
----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----
Log in to one remaining node via ssh. Issue the delete command (here
deleting node `hp4`):
 hp1# pvecm delnode hp4
If the operation succeeds, no output is returned; just check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:
----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----
IMPORTANT: As said above, it is very important to power off the node
*before* removal, and make sure that it will *never* power on again
(in the existing cluster network) as it is.
If you power on the node as it is, your cluster will be damaged and
it could be difficult to restore a clean cluster state.
If, for whatever reason, you want this server to join the same
cluster again, you have to:

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.
Separate a Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CAUTION: This is *not* the recommended method, proceed with caution. Use the
method described above if you're unsure.
You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster, it will still have
access to the shared storage! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as it leads to VMID conflicts.
Move the guests which you want to keep on this node now; after the removal,
you can only do this via backup and restore. It is suggested that you create
a new storage to which only the node you want to separate has access. This can
be a new export on your NFS server or a new Ceph pool, to name a few examples.
It is just important that the exact same storage is not accessed by multiple
clusters. After setting up this storage, move all data from the node and its
VMs to it. Then you are ready to separate the node from the cluster.
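As a sketch, a VM disk can be moved to such a storage with `qm move_disk`; the
VMID `100`, the disk name `scsi0` and the storage name `separate-nfs` are just
examples:

 hp4# qm move_disk 100 scsi0 separate-nfs --delete

The `--delete` option removes the old disk image from the previously shared
storage after a successful move.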
WARNING: Ensure all shared resources are cleanly separated! Otherwise you
will run into conflicts and problems.
First stop the corosync and the pve-cluster services on the node:

----
systemctl stop pve-cluster
systemctl stop corosync
----
Start the cluster filesystem again in local mode:

----
pmxcfs -l
----
Delete the corosync configuration files:

----
rm /etc/pve/corosync.conf
rm /etc/corosync/*
----
You can now start the filesystem again as a normal service:

----
killall pmxcfs
systemctl start pve-cluster
----
The node is now separated from the cluster. You can delete it from a remaining
node of the cluster with:

----
pvecm delnode oldnode
----
If the command fails because the remaining node in the cluster lost quorum
when the now separated node exited, you may set the expected votes to 1 as a workaround:

----
pvecm expected 1
----
And then repeat the `pvecm delnode` command.
Now switch back to the separated node and delete all remaining files left
over from the old cluster. This ensures that the node can be added to another
cluster again without problems.

----
rm /var/lib/corosync/*
----
As the configuration files from the other nodes are still in the cluster
filesystem, you may want to clean those up too. Simply remove the whole
directory '/etc/pve/nodes/NODENAME' recursively, but check three times that
you picked the correct one before deleting it.
CAUTION: The node's SSH keys are still in the 'authorized_keys' file; this means
the nodes can still connect to each other with public key authentication. This
should be fixed by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.
Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.
[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____
In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.
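For example, in a cluster of five nodes where each node has one vote, quorum is
three votes; the cluster therefore stays writable as long as at least three
nodes can communicate with each other, while a partition containing only two
nodes becomes read-only.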
NOTE: {pve} assigns a single vote to each node by default.
Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.
NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.
On node startup, service `pve-manager` is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.
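The `onboot` flag can be set per guest, for example (the VMID `100` and CT ID
`101` are just placeholders):

 # qm set 100 --onboot 1
 # pct set 101 --onboot 1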
When you turn on nodes, or when power comes back after power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.
include::pve-copyright.adoc[]