   1 [[chapter_pvecm]]
   2 ifdef::manvolnum[]
   3 pvecm(1)
   4 ========
   5 :pve-toplevel:
   6
   7 NAME
   8 ----
   9
  10 pvecm - Proxmox VE Cluster Manager
  11
  12 SYNOPSIS
  13 --------
  14
  15 include::pvecm.1-synopsis.adoc[]
  16
  17 DESCRIPTION
  18 -----------
  19 endif::manvolnum[]
  20
  21 ifndef::manvolnum[]
  22 Cluster Manager
  23 ===============
  24 :pve-toplevel:
  25 endif::manvolnum[]
  26
The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such clusters can consist of up to 32 physical nodes
(probably more, depending on network latency).

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information and do various other
cluster-related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.
  38
  39 Grouping nodes into a cluster has the following advantages:
  40
* Centralized, web-based management
  42
* Multi-master clusters: each node can do all management tasks
  44
  45 * `pmxcfs`: database-driven file system for storing configuration files,
  46  replicated in real-time on all nodes using `corosync`.
  47
  48 * Easy migration of virtual machines and containers between physical
  49   hosts
  50
  51 * Fast deployment
  52
  53 * Cluster-wide services like firewall and HA
  54
  55
  56 Requirements
  57 ------------
  58
* All nodes must be in the same network, as `corosync` uses IP multicast
 to communicate between nodes (also see
 http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
 ports 5404 and 5405 for cluster communication.
+
NOTE: Some switches do not support IP multicast by default; it must be
manually enabled first.
  66
  67 * Date and time have to be synchronized.
  68
* An SSH tunnel on TCP port 22 is used between nodes.
  70
  71 * If you are interested in High Availability, you need to have at
  72   least three nodes for reliable quorum. All nodes should have the
  73   same version.
  74
  75 * We recommend a dedicated NIC for the cluster traffic, especially if
  76   you use shared storage.
  77
  78 NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
  79 Proxmox VE 4.0 cluster nodes.
  80
  81
  82 Preparing Nodes
  83 ---------------
  84
  85 First, install {PVE} on all nodes. Make sure that each node is
  86 installed with the final hostname and IP configuration. Changing the
  87 hostname and IP is not possible after cluster creation.
  88
Currently, cluster creation has to be done on the console, so you
need to log in via `ssh`.
  91
  92 Create the Cluster
  93 ------------------
  94
Log in via `ssh` to the first {pve} node. Use a unique name for your cluster.
This name cannot be changed later.
  97
  98  hp1# pvecm create YOUR-CLUSTER-NAME
  99
 100 CAUTION: The cluster name is used to compute the default multicast
 101 address. Please use unique cluster names if you run more than one
 102 cluster inside your network.
 103
 104 To check the state of your cluster use:
 105
 106  hp1# pvecm status
 107
 108 Multiple Clusters In Same Network
 109 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 110
 111 It is possible to create multiple clusters in the same physical or logical
 112 network. Each cluster must have a unique name, which is used to generate the
 113 cluster's multicast group address. As long as no duplicate cluster names are
 114 configured in one network segment, the different clusters won't interfere with
 115 each other.
 116
If multiple clusters operate in a single network, it may be beneficial to set up
an IGMP querier and enable IGMP snooping in said network. This may reduce the
load on the network significantly, because multicast packets are only delivered
to the endpoints of the respective member nodes.
 121
 122
 123 Adding Nodes to the Cluster
 124 ---------------------------
 125
Log in via `ssh` to the node you want to add.
 127
 128  hp2# pvecm add IP-ADDRESS-CLUSTER
 129
 130 For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node.
 131
CAUTION: A new node cannot hold any VMs, because you would get
conflicts about identical VM IDs. Also, all existing configuration in
`/etc/pve` is overwritten when you join a new node to the cluster. As a
workaround, use `vzdump` to back up the guests and restore them under a
different VMID after adding the node to the cluster.

To check the state of the cluster:
 139
 140  # pvecm status
 141
 142 .Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----
 170
 171 If you only want the list of all nodes use:
 172
 173  # pvecm nodes
 174
 175 .List nodes in a cluster
 176 ----
 177 hp2# pvecm nodes
 178
 179 Membership information
 180 ~~~~~~~~~~~~~~~~~~~~~~
 181     Nodeid      Votes Name
 182          1          1 hp1
 183          2          1 hp2 (local)
 184          3          1 hp3
 185          4          1 hp4
 186 ----
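
For scripting, the node names can be pulled out of this listing. The following
is a minimal sketch, assuming the column layout shown above stays as it is;
`node_names` is a hypothetical helper and not part of `pvecm`.

[source,bash]
----
# Hypothetical helper: print the "Name" column of `pvecm nodes` output
# read from standard input. Assumes data rows start with a numeric node ID.
node_names() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 3 { print $3 }'
}

# Possible use on a cluster node:
#   pvecm nodes | node_names
----

Header and separator lines are skipped because their first field is not a
plain number, and the `(local)` marker is ignored because only the third
field is printed.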
 187
 188 [[adding-nodes-with-separated-cluster-network]]
 189 Adding Nodes With Separated Cluster Network
 190 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 191
When adding a node to a cluster with a separated cluster network, you need to
use the 'ringX_addr' parameters to set the node's address on those networks:
 194
 195 [source,bash]
 196 ----
 197 pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
 198 ----
 199
 200 If you want to use the Redundant Ring Protocol you will also want to pass the
 201 'ring1_addr' parameter.
 202
 203
 204 Remove a Cluster Node
 205 ---------------------
 206
CAUTION: Read the procedure carefully before proceeding, as it may
not be what you want or need.
 209
 210 Move all virtual machines from the node. Make sure you have no local
 211 data or backups you want to keep, or save them accordingly.
 212 In the following example we will remove the node hp4 from the cluster.
 213
 214 Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
 215 command to identify the node ID to remove:
 216
 217 ----
 218 hp1# pvecm nodes
 219
 220 Membership information
 221 ~~~~~~~~~~~~~~~~~~~~~~
 222     Nodeid      Votes Name
 223          1          1 hp1 (local)
 224          2          1 hp2
 225          3          1 hp3
 226          4          1 hp4
 227 ----
 228
 229
At this point, you must power off hp4 and make sure that it will not
power on again (in the existing cluster network) with its current
configuration.

IMPORTANT: As said above, it is critical to power off the node
*before* removal, and to make sure that it will *never* power on again
(in the existing cluster network) with its current configuration.
If you power on the node as it is, the cluster could end up broken,
and it could be difficult to restore it to a clean state.
 239
 240 After powering off the node hp4, we can safely remove it from the cluster.
 241
 242  hp1# pvecm delnode hp4
 243
If the operation succeeds, no output is returned. Check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:
 247
----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----
 275
If, for whatever reason, you want this server to join the same
cluster again, you have to:
 278
 279 * reinstall {pve} on it from scratch
 280
 281 * then join it, as explained in the previous section.
 282
 283 [[pvecm_separate_node_without_reinstall]]
 284 Separate A Node Without Reinstalling
 285 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 286
CAUTION: This is *not* the recommended method, proceed with caution. Use the
above mentioned method if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster, it will still have
access to the shared storages! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as storage locking doesn't work across cluster
boundaries. Further, it may also lead to VMID conflicts.

It is suggested that you create a new storage to which only the node you want
to separate has access. This can be a new export on your NFS or a new Ceph
pool, to name a few examples. It is just important that the exact same storage
does not get accessed by multiple clusters. After setting up this storage, move
all data from the node and its VMs to it. Then you are ready to separate the
node from the cluster.

WARNING: Ensure all shared resources are cleanly separated! Otherwise you will
run into conflicts and problems.
 306
 307 First stop the corosync and the pve-cluster services on the node:
 308 [source,bash]
 309 ----
 310 systemctl stop pve-cluster
 311 systemctl stop corosync
 312 ----
 313
 314 Start the cluster filesystem again in local mode:
 315 [source,bash]
 316 ----
 317 pmxcfs -l
 318 ----
 319
 320 Delete the corosync configuration files:
 321 [source,bash]
 322 ----
 323 rm /etc/pve/corosync.conf
 324 rm /etc/corosync/*
 325 ----
 326
You can now start the filesystem again as a normal service:
 328 [source,bash]
 329 ----
 330 killall pmxcfs
 331 systemctl start pve-cluster
 332 ----
 333
The node is now separated from the cluster. You can delete it from a remaining
node of the cluster with:
 336 [source,bash]
 337 ----
 338 pvecm delnode oldnode
 339 ----
 340
If the command fails because the remaining node in the cluster lost quorum
when the now separated node exited, you may set the expected votes to 1 as a workaround:
 343 [source,bash]
 344 ----
 345 pvecm expected 1
 346 ----
 347
Then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all remaining files left
from the old cluster. This ensures that the node can be added to another
cluster again without problems.
 353
 354 [source,bash]
 355 ----
 356 rm /var/lib/corosync/*
 357 ----
 358
As the configuration files from the other nodes are still in the cluster
filesystem, you may want to clean those up too. Simply remove the whole
directory '/etc/pve/nodes/NODENAME' recursively, but check three times that
you used the correct one before deleting it.

CAUTION: The node's SSH keys are still in the 'authorized_keys' file. This means
the nodes can still connect to each other with public key authentication. This
should be fixed by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.
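
One way to do this is to filter the file by key comment. The following is a
sketch only: it assumes that each node's key ends with a comment of the form
`root@NODENAME`, which you should verify in your own file first, and
`drop_node_keys` is a hypothetical helper name.

[source,bash]
----
# Hypothetical helper: copy authorized_keys lines from stdin to stdout,
# dropping those whose last field (the key comment) is root@<node>.
# Assumption: node keys carry the comment "root@NODENAME".
drop_node_keys() {
    awk -v comment="root@$1" '$NF != comment'
}

# Possible use, writing to a separate file for review before replacing anything:
#   drop_node_keys oldnode < /etc/pve/priv/authorized_keys > /root/authorized_keys.new
----

Writing the result to a separate file first leaves the original untouched
until the filtered list has been checked by hand.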
 368
 369 Quorum
 370 ------
 371
{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.
 374
 375 [quote, from Wikipedia, Quorum (distributed computing)]
 376 ____
 377 A quorum is the minimum number of votes that a distributed transaction
 378 has to obtain in order to be allowed to perform an operation in a
 379 distributed system.
 380 ____
 381
In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.
 385
 386 NOTE: {pve} assigns a single vote to each node by default.
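
The number of votes needed follows from simple majority arithmetic. The sketch
below assumes the usual rule of a strict majority of the expected votes and one
vote per node; `quorum_threshold` is a hypothetical helper, not a `pvecm`
subcommand.

[source,bash]
----
# Hypothetical helper: smallest number of votes that forms a strict
# majority of the expected votes given as first argument.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

quorum_threshold 3   # prints 2: one of three nodes may fail
quorum_threshold 4   # prints 3: still only one of four nodes may fail
----

With two nodes the threshold is 2, so a single failure already costs quorum.
This is why at least three nodes are advised for reliable quorum.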
 387
 388 Cluster Network
 389 ---------------
 390
The cluster network is the core of a cluster. All messages sent over it have to
be delivered reliably to all nodes in their respective order. In {pve} this
part is done by corosync, an implementation of a high-performance, low-overhead,
high-availability development toolkit. It serves our decentralized
configuration file system (`pmxcfs`).
 396
 397 [[cluster-network-requirements]]
 398 Network Requirements
 399 ~~~~~~~~~~~~~~~~~~~~
This needs a reliable network with latencies under 2 milliseconds (LAN
performance) to work properly. While corosync can also use unicast for
communication between nodes, it is **highly recommended** to have a
multicast-capable network. The network should not be used heavily by other
members; ideally corosync runs on its own network.
*Never* share it with a network that also carries storage traffic.
 406
 407 Before setting up a cluster it is good practice to check if the network is fit
 408 for that purpose.
 409
 410 * Ensure that all nodes are in the same subnet. This must only be true for the
 411   network interfaces used for cluster communication (corosync).
 412
 413 * Ensure all nodes can reach each other over those interfaces, using `ping` is
 414   enough for a basic test.
 415
* Ensure that multicast works in general and at high packet rates. This can be
  done with the `omping` tool. The final "%loss" number should be < 1%.
 418 +
 419 [source,bash]
 420 ----
 421 omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
 422 ----
 423
 424 * Ensure that multicast communication works over an extended period of time.
 425   This uncovers problems where IGMP snooping is activated on the network but
 426   no multicast querier is active. This test has a duration of around 10
 427   minutes.
 428 +
 429 [source,bash]
 430 ----
 431 omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
 432 ----
 433
Your network is not ready for clustering if any of these tests fails. Recheck
your network configuration. Switches in particular are notorious for having
multicast disabled by default or IGMP snooping enabled with no IGMP querier
active.
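
The loss check from the `omping` tests can be automated. The sketch below
assumes that each `omping` summary line contains a
`xmt/rcv/%loss = SENT/RECEIVED/LOSS%` triple; compare this with the output of
your `omping` version before relying on it. `loss_ok` is a hypothetical helper.

[source,bash]
----
# Hypothetical helper: read omping output from stdin and succeed only if
# at least one summary line was seen and every reported loss is below 1%.
# Assumption: summary lines contain "xmt/rcv/%loss = SENT/RECEIVED/LOSS%".
loss_ok() {
    awk '
        /%loss = / {
            sub(/.*%loss = /, "")     # keep "SENT/RECEIVED/LOSS%..." only
            split($0, part, "/")
            loss = part[3] + 0        # leading number of "LOSS%..."
            seen = 1
            if (loss >= 1) bad = 1
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}

# Possible use:
#   omping -c 600 -i 1 -q NODE1-IP NODE2-IP ... | loss_ok && echo "loss below 1%"
----

Treating missing summary lines as a failure keeps an aborted test run from
being reported as a pass.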
 438
In smaller clusters, it is also an option to use unicast if you really cannot get
multicast to work.
 441
 442 Separate Cluster Network
 443 ~~~~~~~~~~~~~~~~~~~~~~~~
 444
When creating a cluster without any parameters, the cluster network is generally
shared with the Web UI and the VMs and their traffic. Depending on your setup,
even storage traffic may get sent over the same network. It is recommended to
change that, as corosync is a time-critical, real-time application.
 449
 450 Setting Up A New Network
 451 ^^^^^^^^^^^^^^^^^^^^^^^^
 452
First you have to set up a new network interface. It should be on a physically
separate network. Ensure that your network fulfills the
<<cluster-network-requirements,cluster network requirements>>.
 456
 457 Separate On Cluster Creation
 458 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 459
This is possible through the 'ring0_addr' and 'bindnet0_addr' parameters of
the 'pvecm create' command used for creating a new cluster.

If you have set up an additional NIC with a static address on 10.10.10.1/25
and want to send and receive all cluster communication over this interface,
you would execute:
 466
 467 [source,bash]
 468 ----
 469 pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
 470 ----
 471
 472 To check if everything is working properly execute:
 473 [source,bash]
 474 ----
 475 systemctl status corosync
 476 ----
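
The `--bindnet0_addr` value used in the `pvecm create` example above is the
network address of the /25 subnet. A small sketch of that arithmetic follows;
the helper `network_addr` is hypothetical and handles IPv4 only.

[source,bash]
----
# Hypothetical helper: print the network address for an IPv4 address and
# prefix length, for example "network_addr 10.10.10.1 25".
network_addr() {
    ip=$1 prefix=$2
    old_ifs=$IFS; IFS=.
    set -- $ip                      # split the address into its four octets
    IFS=$old_ifs
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

network_addr 10.10.10.1 25   # prints 10.10.10.0
----

Note that a /25 network only covers 10.10.10.0 to 10.10.10.127, so all ring
addresses have to fall into that range.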
 477
Afterwards, proceed as described in the section on how to
<<adding-nodes-with-separated-cluster-network,add nodes with a separated cluster network>>.
 480
 481 [[separate-cluster-net-after-creation]]
 482 Separate After Cluster Creation
 483 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 484
You can also do this if you have already created a cluster and want to switch
its communication to another network without rebuilding the whole cluster.
This change may lead to short periods of quorum loss in the cluster, as nodes
have to restart corosync and come up one after the other on the new network.

Check how to <<edit-corosync-conf,edit the corosync.conf file>> first.
Then open it; you should see a file similar to:
 492
 493 ----
 494 logging {
 495   debug: off
 496   to_syslog: yes
 497 }
 498
 499 nodelist {
 500
 501   node {
 502     name: due
 503     nodeid: 2
 504     quorum_votes: 1
 505     ring0_addr: due
 506   }
 507
 508   node {
 509     name: tre
 510     nodeid: 3
 511     quorum_votes: 1
 512     ring0_addr: tre
 513   }
 514
 515   node {
 516     name: uno
 517     nodeid: 1
 518     quorum_votes: 1
 519     ring0_addr: uno
 520   }
 521
 522 }
 523
 524 quorum {
 525   provider: corosync_votequorum
 526 }
 527
 528 totem {
 529   cluster_name: thomas-testcluster
 530   config_version: 3
 531   ip_version: ipv4
 532   secauth: on
 533   version: 2
 534   interface {
 535     bindnetaddr: 192.168.30.50
 536     ringnumber: 0
 537   }
 538
 539 }
 540 ----
 541
The first thing you want to do is add the 'name' properties in the node entries
if you do not see them already. Those *must* match the node name.

Then replace the addresses in the 'ring0_addr' properties with the new
addresses. You may use plain IP addresses or hostnames here. If you use
hostnames, ensure that they are resolvable from all nodes.

In my example, I want to switch my cluster communication to the 10.10.10.1/25
network, so I replace all 'ring0_addr' values accordingly. I also set the
'bindnetaddr' in the totem section of the config to an address of the new
network. It can be any address from the subnet configured on the new network
interface.

After you have increased the 'config_version' property, the new configuration
file should look like:
 556
 557 ----
 558
 559 logging {
 560   debug: off
 561   to_syslog: yes
 562 }
 563
 564 nodelist {
 565
 566   node {
 567     name: due
 568     nodeid: 2
 569     quorum_votes: 1
 570     ring0_addr: 10.10.10.2
 571   }
 572
 573   node {
 574     name: tre
 575     nodeid: 3
 576     quorum_votes: 1
 577     ring0_addr: 10.10.10.3
 578   }
 579
 580   node {
 581     name: uno
 582     nodeid: 1
 583     quorum_votes: 1
 584     ring0_addr: 10.10.10.1
 585   }
 586
 587 }
 588
 589 quorum {
 590   provider: corosync_votequorum
 591 }
 592
 593 totem {
 594   cluster_name: thomas-testcluster
 595   config_version: 4
 596   ip_version: ipv4
 597   secauth: on
 598   version: 2
 599   interface {
 600     bindnetaddr: 10.10.10.1
 601     ringnumber: 0
 602   }
 603
 604 }
 605 ----
 606
Now, after a final check that all changed information is correct, we save it
and refer again to the <<edit-corosync-conf,edit corosync.conf file>> section to
learn how to bring it into effect.

As our change cannot be applied live by corosync, we have to do a restart.
 612
 613 On a single node execute:
 614 [source,bash]
 615 ----
 616 systemctl restart corosync
 617 ----
 618
 619 Now check if everything is fine:
 620
 621 [source,bash]
 622 ----
 623 systemctl status corosync
 624 ----
 625
If corosync runs correctly again, restart it on all other nodes as well.
They will then join the cluster membership one by one on the new network.
 628
 629 Redundant Ring Protocol
 630 ~~~~~~~~~~~~~~~~~~~~~~~
To avoid a single point of failure, you should implement countermeasures.
This can be done on the hardware and operating system level through network
bonding.

Corosync itself also offers a way to add redundancy through the so-called
'Redundant Ring Protocol'. This protocol allows running a second totem
ring on another network. This network should be physically separated from the
other ring's network to actually increase availability.
 638
 639 RRP On Cluster Creation
 640 ~~~~~~~~~~~~~~~~~~~~~~~
 641
The 'pvecm create' command provides the additional parameters 'bindnetX_addr',
'ringX_addr' and 'rrp_mode', which can be used for RRP configuration.
 644
 645 NOTE: See the <<corosync-conf-glossary,glossary>> if you do not know what each parameter means.
 646
 647 So if you have two networks, one on the 10.10.10.1/24 and the other on the
 648 10.10.20.1/24 subnet you would execute:
 649
 650 [source,bash]
 651 ----
 652 pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
 653 -bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
 654 ----
 655
 656 RRP On Existing Clusters
 657 ~~~~~~~~~~~~~~~~~~~~~~~~
 658
You will take similar steps as described in
<<separate-cluster-net-after-creation,separating the cluster network>> to
enable RRP on an already running cluster. The only difference is that you
will add `ring1` and use it instead of `ring0`.

First add a new `interface` subsection in the `totem` section and set its
`ringnumber` property to `1`. Set the interface's `bindnetaddr` property to an
address of the subnet you have configured for your new ring.
Further, set the `rrp_mode` to `passive`; this is the only stable mode.

Then add to each node entry in the `nodelist` section its new `ring1_addr`
property with the node's additional ring address.
 671
 672 So if you have two networks, one on the 10.10.10.1/24 and the other on the
 673 10.10.20.1/24 subnet, the final configuration file should look like:
 674
 675 ----
 676 totem {
 677   cluster_name: tweak
 678   config_version: 9
 679   ip_version: ipv4
 680   rrp_mode: passive
 681   secauth: on
 682   version: 2
 683   interface {
 684     bindnetaddr: 10.10.10.1
 685     ringnumber: 0
 686   }
 687   interface {
 688     bindnetaddr: 10.10.20.1
 689     ringnumber: 1
 690   }
 691 }
 692
 693 nodelist {
 694   node {
 695     name: pvecm1
 696     nodeid: 1
 697     quorum_votes: 1
 698     ring0_addr: 10.10.10.1
 699     ring1_addr: 10.10.20.1
 700   }
 701
 702  node {
 703     name: pvecm2
 704     nodeid: 2
 705     quorum_votes: 1
 706     ring0_addr: 10.10.10.2
 707     ring1_addr: 10.10.20.2
 708   }
 709
 710   [...] # other cluster nodes here
 711 }
 712
 713 [...] # other remaining config sections here
 714
 715 ----
 716
Bring it into effect as described in the
<<edit-corosync-conf,edit the corosync.conf file>> section.

This is a change which cannot take effect live and needs at least a restart
of corosync. A restart of the whole cluster is recommended.

If you cannot reboot the whole cluster, ensure no High Availability services are
configured and then stop the corosync service on all nodes. After corosync is
stopped on all nodes, start it again on one node after the other.
 726
 727 Corosync Configuration
 728 ----------------------
 729
The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
controls the cluster membership and its network.
To read more about it, check the corosync.conf man page:
 733 [source,bash]
 734 ----
 735 man corosync.conf
 736 ----
 737
 738 For node membership you should always use the `pvecm` tool provided by {pve}.
 739 You may have to edit the configuration file manually for other changes.
 740 Here are a few best practice tips for doing this.
 741
 742 [[edit-corosync-conf]]
 743 Edit corosync.conf
 744 ~~~~~~~~~~~~~~~~~~
 745
Editing the corosync.conf file is not always straightforward. There are
two on each cluster node, one in `/etc/pve/corosync.conf` and the other in
`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
propagate the changes to the local one, but not vice versa.

The configuration will get updated automatically as soon as the file changes.
This means that changes which can be integrated in a running corosync will take
effect immediately. So you should always make a copy and edit that instead, to
avoid triggering unwanted changes by an intermediate save.
 755
 756 [source,bash]
 757 ----
 758 cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
 759 ----
 760
Then open the configuration file with your favorite editor; `nano` and
`vim.tiny`, for example, are preinstalled on {pve}.
 763
 764 NOTE: Always increment the 'config_version' number on configuration changes,
 765 omitting this can lead to problems.
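
The increment can also be scripted when preparing the copy. The following is a
minimal sketch, assuming the `config_version: N` line format shown in the
examples of this chapter; `bump_config_version` is a hypothetical helper.

[source,bash]
----
# Hypothetical helper: read a corosync.conf from stdin and write it to
# stdout with the number on the "config_version:" line increased by one.
bump_config_version() {
    awk '/^[ \t]*config_version:/ { sub(/[0-9]+/, $2 + 1) } { print }'
}

# Possible use, producing the copy that is then edited further:
#   bump_config_version < /etc/pve/corosync.conf > /etc/pve/corosync.conf.new
----

All other lines pass through unchanged, so the result can be compared with the
original using `diff` before it is moved into place.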
 766
After making the necessary changes, create another copy of the current working
configuration file. This serves as a backup if the new configuration fails to
apply or causes problems in other ways.
 770
 771 [source,bash]
 772 ----
 773 cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
 774 ----
 775
 776 Then move the new configuration file over the old one:
 777 [source,bash]
 778 ----
 779 mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
 780 ----
 781
You can check whether the change was applied automatically with the commands:

[source,bash]
----
systemctl status corosync
journalctl -b -u corosync
----

If it was not, you may have to restart the corosync service via:
 791 [source,bash]
 792 ----
 793 systemctl restart corosync
 794 ----
 795
If errors occur, check the troubleshooting section below.
 797
 798 Troubleshooting
 799 ~~~~~~~~~~~~~~~
 800
 801 Issue: 'quorum.expected_votes must be configured'
 802 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 803
 804 When corosync starts to fail and you get the following message in the system log:
 805
 806 ----
 807 [...]
 808 corosync[1647]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
 809 corosync[1647]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason
 810     'configuration error: nodelist or quorum.expected_votes must be configured!'
 811 [...]
 812 ----
 813
 814 It means that the hostname you set for corosync 'ringX_addr' in the
 815 configuration could not be resolved.
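
To find the offending entry, each configured ring address can be resolved in
turn. The following is a minimal sketch, assuming `ringX_addr` lines formatted
as in the configuration examples of this chapter. `ring_addrs` is a
hypothetical helper, while `getent ahosts` is the standard glibc resolver
lookup.

[source,bash]
----
# Hypothetical helper: print every ringX_addr value found in a
# corosync.conf read from stdin.
ring_addrs() {
    awk '/^[ \t]*ring[0-9]+_addr:/ { print $2 }'
}

# Possible use: report addresses that do not resolve on this node.
#   ring_addrs < /etc/corosync/corosync.conf | while read -r addr; do
#       getent ahosts "$addr" > /dev/null || echo "cannot resolve: $addr"
#   done
----

Run the check on every node, since a hostname may resolve on one node but not
on another.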
 816
 817
 818 Write Configuration When Not Quorate
 819 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 820
If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and you
know what you are doing, use:
 823 [source,bash]
 824 ----
 825 pvecm expected 1
 826 ----
 827
 828 This sets the expected vote count to 1 and makes the cluster quorate. You can
 829 now fix your configuration, or revert it back to the last working backup.
 830
This is not enough if corosync cannot start anymore. In that case, it is best to
edit the local copy of the corosync configuration in '/etc/corosync/corosync.conf'
so that corosync can start again. Ensure that this configuration has the same
content on all nodes to avoid split-brain situations. If you are not sure what
went wrong, it's best to ask the Proxmox Community to help you.
 836
 837
 838 [[corosync-conf-glossary]]
 839 Corosync Configuration Glossary
 840 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 841
 842 ringX_addr::
 843 This names the different ring addresses for the corosync totem rings used for
 844 the cluster communication.
 845
 846 bindnetaddr::
Defines the interface the ring should bind to. It may be any address of
the subnet configured on the interface we want to use. In general, it is
recommended to just use an address that a node uses on this interface.
 850
 851 rrp_mode::
Specifies the mode of the redundant ring protocol and may be passive, active or
none. Note that use of active is highly experimental and not officially
supported. Passive is the preferred mode; it may double the cluster
communication throughput and increases availability.
 856
 857
 858 Cluster Cold Start
 859 ------------------
 860
 861 It is obvious that a cluster is not quorate when all nodes are
 862 offline. This is a common case after a power failure.
 863
 864 NOTE: It is always a good idea to use an uninterruptible power supply
 865 (``UPS'', also called ``battery backup'') to avoid this state, especially if
 866 you want HA.
 867
 868 On node startup, the `pve-guests` service is started and waits for
 869 quorum. Once quorate, it starts all guests which have the `onboot`
 870 flag set.
 871
When you turn on nodes, or when power comes back after a power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.
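
Scripts that depend on a quorate cluster can wait in a similar way. The sketch
below assumes the `Quorate:` line format of the `pvecm status` output shown
earlier in this chapter; `is_quorate` is a hypothetical helper.

[source,bash]
----
# Hypothetical helper: succeed if the status text on stdin reports quorum.
is_quorate() {
    grep -q '^Quorate:[[:space:]]*Yes'
}

# Possible use: poll every 5 seconds until the cluster is quorate.
#   until pvecm status 2>/dev/null | is_quorate; do sleep 5; done
----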
 875
 876
 877 Guest Migration
 878 ---------------
 879
 880 Migrating virtual guests to other nodes is a useful feature in a
 881 cluster. There are settings to control the behavior of such
 882 migrations. This can be done via the configuration file
 883 `datacenter.cfg` or for a specific migration via API or command line
 884 parameters.
 885
It makes a difference if a guest is online or offline, or if it has
local resources (like a local disk).

For details about virtual machine migration, see the
xref:qm_migration[QEMU/KVM Migration Chapter].

For details about container migration, see the
xref:pct_migration[Container Migration Chapter].
 894
 895 Migration Type
 896 ~~~~~~~~~~~~~~
 897
 898 The migration type defines if the migration data should be sent over an
 899 encrypted (`secure`) channel or an unencrypted (`insecure`) one.
 900 Setting the migration type to insecure means that the RAM content of a
 901 virtual guest gets also transferred unencrypted, which can lead to
 902 information disclosure of critical data from inside the guest (for
 903 example passwords or encryption keys).
 904
Therefore, we strongly recommend using the secure channel if you do
not have full control over the network and cannot guarantee that no
one is eavesdropping on it.
 908
 909 NOTE: Storage migration does not follow this setting. Currently, it
 910 always sends the storage content over a secure channel.
 911
Encryption requires a lot of computing power, so this setting is often
changed to `insecure` to achieve better performance. The impact on
modern systems is lower because they implement AES encryption in
hardware. The performance impact is particularly evident in fast
networks where you can transfer 10 Gbps or more.
 917
 918
 919 Migration Network
 920 ~~~~~~~~~~~~~~~~~
 921
 922 By default, {pve} uses the network in which cluster communication
 923 takes place to send the migration traffic. This is not optimal because
 924 sensitive cluster traffic can be disrupted and this network may not
 925 have the best bandwidth available on the node.
 926
 927 Setting the migration network parameter allows the use of a dedicated
 928 network for the entire migration traffic. In addition to the memory,
 929 this also affects the storage traffic for offline migrations.
 930
 931 The migration network is set as a network in the CIDR notation. This
 932 has the advantage that you do not have to set individual IP addresses
 933 for each node.  {pve} can determine the real address on the
 934 destination node from the network specified in the CIDR form.  To
 935 enable this, the network must be specified so that each node has one,
 936 but only one IP in the respective network.
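
The rule of exactly one address per node can be checked ahead of time. The
following is a sketch for IPv4 with hypothetical helper names (`ip_to_int`,
`count_in_network`); it takes a network in CIDR notation followed by the
addresses configured on one node.

[source,bash]
----
# Hypothetical helper: dotted IPv4 address -> integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1                       # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: count how many of the given addresses lie in the
# CIDR network passed as first argument.
count_in_network() {
    net=${1%/*} prefix=${1#*/}
    shift
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    base=$(( $(ip_to_int "$net") & mask ))
    count=0
    for ip in "$@"; do
        if [ $(( $(ip_to_int "$ip") & mask )) -eq "$base" ]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

# A node with one address in 10.1.1.0/24 and one in 10.1.2.0/24:
count_in_network 10.1.2.0/24 10.1.1.1 10.1.2.1   # prints 1
----

A result other than 1 for any node means the network is not suitable as a
migration network in this form.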
 937
 938
 939 Example
 940 ^^^^^^^
 941
 942 We assume that we have a three-node setup with three separate
 943 networks. One for public communication with the Internet, one for
 944 cluster communication and a very fast one, which we want to use as a
 945 dedicated network for migration.
 946
 947 A network configuration for such a setup might look as follows:
 948
 949 ----
 950 iface eno1 inet manual
 951
 952 # public network
 953 auto vmbr0
 954 iface vmbr0 inet static
 955     address 192.X.Y.57
 956     netmask 255.255.250.0
 957     gateway 192.X.Y.1
 958     bridge_ports eno1
 959     bridge_stp off
 960     bridge_fd 0
 961
 962 # cluster network
 963 auto eno2
 964 iface eno2 inet static
 965     address  10.1.1.1
 966     netmask  255.255.255.0
 967
 968 # fast network
 969 auto eno3
 970 iface eno3 inet static
 971     address  10.1.2.1
 972     netmask  255.255.255.0
 973 ----
 974
 975 Here, we will use the network 10.1.2.0/24 as a migration network. For
 976 a single migration, you can do this using the `migration_network`
 977 parameter of the command line tool:
 978
 979 ----
 980 # qm migrate 106 tre --online --migration_network 10.1.2.0/24
 981 ----
 982
 983 To configure this as the default network for all migrations in the
 984 cluster, set the `migration` property of the `/etc/pve/datacenter.cfg`
 985 file:
 986
 987 ----
 988 # use dedicated migration network
 989 migration: secure,network=10.1.2.0/24
 990 ----
 991
 992 NOTE: The migration type must always be set when the migration network
 993 gets set in `/etc/pve/datacenter.cfg`.
 994
 995
 996 ifdef::manvolnum[]
 997 include::pve-copyright.adoc[]
 998 endif::manvolnum[]