1 ifdef::manvolnum[]
2 pvecm(1)
3 ========
4 :pve-toplevel:
5
6 NAME
7 ----
8
9 pvecm - Proxmox VE Cluster Manager
10
11 SYNOPSIS
12 --------
13
14 include::pvecm.1-synopsis.adoc[]
15
16 DESCRIPTION
17 -----------
18 endif::manvolnum[]
19
20 ifndef::manvolnum[]
21 Cluster Manager
22 ===============
23 :pve-toplevel:
24 endif::manvolnum[]
25
26 The {PVE} cluster manager `pvecm` is a tool to create a group of
27 physical servers. Such a group is called a *cluster*. We use the
28 http://www.corosync.org[Corosync Cluster Engine] for reliable group
29 communication, and such clusters can consist of up to 32 physical nodes
30 (probably more, dependent on network latency).
31
32 `pvecm` can be used to create a new cluster, join nodes to a cluster,
33 leave the cluster, get status information and do various other cluster
34 related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
35 is used to transparently distribute the cluster configuration to all cluster
36 nodes.
37
38 Grouping nodes into a cluster has the following advantages:
39
40 * Centralized, web based management
41
* Multi-master clusters: each node can do all management tasks
43
44 * `pmxcfs`: database-driven file system for storing configuration files,
45 replicated in real-time on all nodes using `corosync`.
46
47 * Easy migration of virtual machines and containers between physical
48 hosts
49
50 * Fast deployment
51
52 * Cluster-wide services like firewall and HA
53
54
55 Requirements
56 ------------
57
58 * All nodes must be in the same network as `corosync` uses IP Multicast
59 to communicate between nodes (also see
60 http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
61 ports 5404 and 5405 for cluster communication.
62 +
63 NOTE: Some switches do not support IP multicast by default and must be
64 manually enabled first.
65
66 * Date and time have to be synchronized.
67
* An SSH tunnel on TCP port 22 between nodes is used.
69
70 * If you are interested in High Availability, you need to have at
71 least three nodes for reliable quorum. All nodes should have the
72 same version.
73
74 * We recommend a dedicated NIC for the cluster traffic, especially if
75 you use shared storage.
76
77 NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
78 Proxmox VE 4.0 cluster nodes.
79
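Before creating a cluster, you can spot-check some of the requirements above
with standard tools. The following is a minimal sketch, assuming a prospective
peer node named `hp2` (the hostname is a placeholder):

[source,bash]
----
# basic connectivity to the prospective peer node
ping -c 3 hp2

# verify that the system clock is synchronized via NTP on every node
timedatectl status | grep -i ntp

# confirm that SSH on TCP port 22 is reachable
ssh root@hp2 true
----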
80
81 Preparing Nodes
82 ---------------
83
84 First, install {PVE} on all nodes. Make sure that each node is
85 installed with the final hostname and IP configuration. Changing the
86 hostname and IP is not possible after cluster creation.
87
Currently, cluster creation has to be done on the console, so you
need to log in via `ssh`.
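To verify that a node's hostname resolves to its final IP address (and not,
for example, to a loopback entry in '/etc/hosts'), a quick check could look
like this:

[source,bash]
----
hostname
getent hosts $(hostname)
----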
90
91 Create the Cluster
92 ------------------
93
94 Login via `ssh` to the first {pve} node. Use a unique name for your cluster.
95 This name cannot be changed later.
96
97 hp1# pvecm create YOUR-CLUSTER-NAME
98
99 CAUTION: The cluster name is used to compute the default multicast
100 address. Please use unique cluster names if you run more than one
101 cluster inside your network.
102
103 To check the state of your cluster use:
104
105 hp1# pvecm status
106
107
108 Adding Nodes to the Cluster
109 ---------------------------
110
111 Login via `ssh` to the node you want to add.
112
113 hp2# pvecm add IP-ADDRESS-CLUSTER
114
115 For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node.
116
CAUTION: A new node cannot hold any VMs, because you would get
conflicts about identical VM IDs. Also, all existing configuration in
`/etc/pve` is overwritten when you join a new node to the cluster. As a
workaround, use `vzdump` to back up such guests and restore them under a
different VMID after adding the node to the cluster.
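For example, a guest could be backed up and restored under a free VMID
roughly as follows. This is only a minimal sketch; the VMIDs, dump directory
and archive name are placeholders:

[source,bash]
----
# on the node that still holds the guest: back up VM 100
vzdump 100 --dumpdir /mnt/backup --mode stop

# after the node has joined the cluster: restore under the unused VMID 200
qmrestore /mnt/backup/vzdump-qemu-100.vma 200
----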
122
To check the state of the cluster:
124
125 # pvecm status
126
127 .Cluster status after adding 4 nodes
128 ----
129 hp2# pvecm status
130 Quorum information
131 ~~~~~~~~~~~~~~~~~~
132 Date: Mon Apr 20 12:30:13 2015
133 Quorum provider: corosync_votequorum
134 Nodes: 4
135 Node ID: 0x00000001
136 Ring ID: 1928
137 Quorate: Yes
138
139 Votequorum information
140 ~~~~~~~~~~~~~~~~~~~~~~
141 Expected votes: 4
142 Highest expected: 4
143 Total votes: 4
144 Quorum: 2
145 Flags: Quorate
146
147 Membership information
148 ~~~~~~~~~~~~~~~~~~~~~~
149 Nodeid Votes Name
150 0x00000001 1 192.168.15.91
151 0x00000002 1 192.168.15.92 (local)
152 0x00000003 1 192.168.15.93
153 0x00000004 1 192.168.15.94
154 ----
155
156 If you only want the list of all nodes use:
157
158 # pvecm nodes
159
160 .List nodes in a cluster
161 ----
162 hp2# pvecm nodes
163
164 Membership information
165 ~~~~~~~~~~~~~~~~~~~~~~
166 Nodeid Votes Name
167 1 1 hp1
168 2 1 hp2 (local)
169 3 1 hp3
170 4 1 hp4
171 ----
172
173 Adding Nodes With Separated Cluster Network
174 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
175
When adding a node to a cluster with a separated cluster network, you need to
use the 'ringX_addr' parameters to set the node's address on those networks:
178
179 [source,bash]
180 ----
181 pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
182 ----
183
If you want to use the Redundant Ring Protocol, you will also want to pass the
'ring1_addr' parameter.
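For example, assuming the node also has an address on the second ring
network (all addresses are placeholders):

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0 -ring1_addr IP-ADDRESS-RING1
----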
186
187
188 Remove a Cluster Node
189 ---------------------
190
CAUTION: Read the procedure carefully before proceeding, as it may not
be what you want or need.
193
194 Move all virtual machines from the node. Make sure you have no local
195 data or backups you want to keep, or save them accordingly.
196
Log in to one remaining node via ssh. Check the cluster state and issue a
`pvecm nodes` command to identify the node ID to remove:
199
200 ----
201 hp1# pvecm status
202
203 Quorum information
204 ~~~~~~~~~~~~~~~~~~
205 Date: Mon Apr 20 12:30:13 2015
206 Quorum provider: corosync_votequorum
207 Nodes: 4
208 Node ID: 0x00000001
209 Ring ID: 1928
210 Quorate: Yes
211
212 Votequorum information
213 ~~~~~~~~~~~~~~~~~~~~~~
214 Expected votes: 4
215 Highest expected: 4
216 Total votes: 4
217 Quorum: 2
218 Flags: Quorate
219
220 Membership information
221 ~~~~~~~~~~~~~~~~~~~~~~
222 Nodeid Votes Name
223 0x00000001 1 192.168.15.91 (local)
224 0x00000002 1 192.168.15.92
225 0x00000003 1 192.168.15.93
226 0x00000004 1 192.168.15.94
227 ----
228
IMPORTANT: At this point, you must power off the node to be removed and
make sure that it does not power on again (in the existing cluster
network) as it is.
232
233 ----
234 hp1# pvecm nodes
235
236 Membership information
237 ~~~~~~~~~~~~~~~~~~~~~~
238 Nodeid Votes Name
239 1 1 hp1 (local)
240 2 1 hp2
241 3 1 hp3
242 4 1 hp4
243 ----
244
245 Log in to one remaining node via ssh. Issue the delete command (here
246 deleting node `hp4`):
247
248 hp1# pvecm delnode hp4
249
If the operation succeeds, no output is returned. Check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:
253
254 ----
255 hp1# pvecm status
256
257 Quorum information
258 ~~~~~~~~~~~~~~~~~~
259 Date: Mon Apr 20 12:44:28 2015
260 Quorum provider: corosync_votequorum
261 Nodes: 3
262 Node ID: 0x00000001
263 Ring ID: 1992
264 Quorate: Yes
265
266 Votequorum information
267 ~~~~~~~~~~~~~~~~~~~~~~
268 Expected votes: 3
269 Highest expected: 3
270 Total votes: 3
271 Quorum: 3
272 Flags: Quorate
273
274 Membership information
275 ~~~~~~~~~~~~~~~~~~~~~~
276 Nodeid Votes Name
277 0x00000001 1 192.168.15.90 (local)
278 0x00000002 1 192.168.15.91
279 0x00000003 1 192.168.15.92
280 ----
281
282 IMPORTANT: as said above, it is very important to power off the node
283 *before* removal, and make sure that it will *never* power on again
284 (in the existing cluster network) as it is.
285
If you power on the node as it is, your cluster will be damaged, and
it could be difficult to restore a clean cluster state.
288
If, for whatever reason, you want this server to join the same
cluster again, you have to
291
292 * reinstall {pve} on it from scratch
293
294 * then join it, as explained in the previous section.
295
296 [[pvecm_separate_node_without_reinstall]]
297 Separate A Node Without Reinstalling
298 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
299
300 CAUTION: This is *not* the recommended method, proceed with caution. Use the
301 above mentioned method if you're unsure.
302
303 You can also separate a node from a cluster without reinstalling it from
304 scratch. But after removing the node from the cluster it will still have
305 access to the shared storages! This must be resolved before you start removing
306 the node from the cluster. A {pve} cluster cannot share the exact same
307 storage with another cluster, as it leads to VMID conflicts.
308
It is suggested that you create a new storage to which only the node that you
want to separate has access. This can be a new export on your NFS or a new Ceph
pool, to name a few examples. It is just important that the exact same storage
does not get accessed by multiple clusters. After setting up this storage, move
all data from the node and its VMs to it. Then you are ready to separate the
node from the cluster.
315
WARNING: Ensure all shared resources are cleanly separated! Otherwise you
will run into conflicts and problems.
318
319 First stop the corosync and the pve-cluster services on the node:
320 [source,bash]
321 ----
322 systemctl stop pve-cluster
323 systemctl stop corosync
324 ----
325
326 Start the cluster filesystem again in local mode:
327 [source,bash]
328 ----
329 pmxcfs -l
330 ----
331
332 Delete the corosync configuration files:
333 [source,bash]
334 ----
335 rm /etc/pve/corosync.conf
336 rm /etc/corosync/*
337 ----
338
You can now start the file system again as a normal service:
340 [source,bash]
341 ----
342 killall pmxcfs
343 systemctl start pve-cluster
344 ----
345
The node is now separated from the cluster. You can delete it from a remaining
node of the cluster with:
348 [source,bash]
349 ----
350 pvecm delnode oldnode
351 ----
352
If the command fails because the remaining nodes in the cluster lost quorum
when the now separated node exited, you may set the expected votes to 1 as a workaround:
355 [source,bash]
356 ----
357 pvecm expected 1
358 ----
359
And then repeat the 'pvecm delnode' command.
361
Now switch back to the separated node and delete all remaining files left
over from the old cluster. This ensures that the node can be added to another
cluster again without problems.
365
366 [source,bash]
367 ----
368 rm /var/lib/corosync/*
369 ----
370
As the configuration files from the other nodes are still in the cluster
file system, you may want to clean those up too. Simply remove the whole
directory '/etc/pve/nodes/NODENAME' recursively, as shown below, but check
three times that you used the correct one before deleting it.
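A minimal example, assuming the separated node was called `oldnode` as above
(triple-check the name, this cannot be undone):

[source,bash]
----
# remove the stale node directory from the cluster file system
rm -rf /etc/pve/nodes/oldnode
----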
375
CAUTION: The node's SSH keys are still in the 'authorized_keys' file; this means
the nodes can still connect to each other with public key authentication. Fix
this by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.
380
381 Quorum
382 ------
383
{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.
386
387 [quote, from Wikipedia, Quorum (distributed computing)]
388 ____
389 A quorum is the minimum number of votes that a distributed transaction
390 has to obtain in order to be allowed to perform an operation in a
391 distributed system.
392 ____
393
In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.
397
398 NOTE: {pve} assigns a single vote to each node by default.
399
400 Cluster Network
401 ---------------
402
The cluster network is the core of a cluster. All messages sent over it have to
be delivered reliably to all nodes in their respective order. In {pve} this
part is done by corosync, an implementation of a high-performance, low-overhead,
high-availability development toolkit. It serves our decentralized
configuration file system (`pmxcfs`).
408
409 [[cluster-network-requirements]]
410 Network Requirements
411 ~~~~~~~~~~~~~~~~~~~~
This needs a reliable network with latencies under 2 milliseconds (LAN
performance) to work properly. While corosync can also use unicast for
communication between nodes, it is **highly recommended** to have a
multicast-capable network. The network should not be used heavily by other
members; ideally corosync runs on its own network.
*Never* share it with a network over which storage communicates too.
418
419 Before setting up a cluster it is good practice to check if the network is fit
420 for that purpose.
421
422 * Ensure that all nodes are in the same subnet. This must only be true for the
423 network interfaces used for cluster communication (corosync).
424
425 * Ensure all nodes can reach each other over those interfaces, using `ping` is
426 enough for a basic test.
427
* Ensure that multicast works in general and at high packet rates. This can be
done with the `omping` tool. The final "%loss" number should be < 1%.
430 [source,bash]
431 ----
432 omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
433 ----
434
* Ensure that multicast communication works over an extended period of time.
This uncovers problems where IGMP snooping is activated on the network but
no multicast querier is active. This test has a duration of around 10
minutes.
439 [source,bash]
440 ----
441 omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
442 ----
443
Your network is not ready for clustering if any of these tests fails. Recheck
your network configuration. Switches in particular are notorious for having
multicast disabled by default or IGMP snooping enabled with no active IGMP
querier.
448
In smaller clusters, it is also an option to use unicast if you really cannot
get multicast to work.
451
452 Separate Cluster Network
453 ~~~~~~~~~~~~~~~~~~~~~~~~
454
When creating a cluster without any parameters, the cluster network is generally
shared with the Web UI, the VMs, and their traffic. Depending on your setup,
even storage traffic may get sent over the same network. It is recommended to
change that, as corosync is a time-critical, real-time application.
459
460 Setting Up A New Network
461 ^^^^^^^^^^^^^^^^^^^^^^^^
462
First, you have to set up a new network interface. It should be on a physically
separate network. Ensure that your network fulfills the
<<cluster-network-requirements,cluster network requirements>>.
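A minimal sketch of such an interface definition in '/etc/network/interfaces'
(the interface name is a placeholder; the address matches the 10.10.10.1/25
example used in the next subsection):

----
# dedicated cluster network interface
auto eth1
iface eth1 inet static
        address 10.10.10.1
        netmask 255.255.255.128
----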
466
467 Separate On Cluster Creation
468 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
469
This is possible via the 'ring0_addr' and 'bindnet0_addr' parameters of
the 'pvecm create' command used for creating a new cluster.
472
If you have set up an additional NIC with a static address of 10.10.10.1/25
and want to send and receive all cluster communication over this interface,
you would execute:
476
477 [source,bash]
478 ----
479 pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
480 ----
481
482 To check if everything is working properly execute:
483 [source,bash]
484 ----
485 systemctl status corosync
486 ----
487
488 [[separate-cluster-net-after-creation]]
489 Separate After Cluster Creation
490 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
491
492 You can do this also if you have already created a cluster and want to switch
493 its communication to another network, without rebuilding the whole cluster.
494 This change may lead to short durations of quorum loss in the cluster, as nodes
495 have to restart corosync and come up one after the other on the new network.
496
Check how to <<edit-corosync-conf,edit the corosync.conf file>> first.
Then open it and you should see a file similar to:
499
500 ----
501 logging {
502 debug: off
503 to_syslog: yes
504 }
505
506 nodelist {
507
508 node {
509 name: due
510 nodeid: 2
511 quorum_votes: 1
512 ring0_addr: due
513 }
514
515 node {
516 name: tre
517 nodeid: 3
518 quorum_votes: 1
519 ring0_addr: tre
520 }
521
522 node {
523 name: uno
524 nodeid: 1
525 quorum_votes: 1
526 ring0_addr: uno
527 }
528
529 }
530
531 quorum {
532 provider: corosync_votequorum
533 }
534
535 totem {
536 cluster_name: thomas-testcluster
537 config_version: 3
538 ip_version: ipv4
539 secauth: on
540 version: 2
541 interface {
542 bindnetaddr: 192.168.30.50
543 ringnumber: 0
544 }
545
546 }
547 ----
548
The first thing you want to do is add the 'name' properties to the node entries
if you do not see them already. Those *must* match the node name.
551
552 Then replace the address from the 'ring0_addr' properties with the new
553 addresses. You may use plain IP addresses or also hostnames here. If you use
554 hostnames ensure that they are resolvable from all nodes.
555
In this example, we want to switch the cluster communication to the 10.10.10.1/25
network, so we replace all 'ring0_addr' properties respectively. We also set the
'bindnetaddr' in the totem section of the config to an address of the new
network. It can be any address from the subnet configured on the new network
interface.
560
After you have increased the 'config_version' property, the new configuration
file should look like this:
563
564 ----
565
566 logging {
567 debug: off
568 to_syslog: yes
569 }
570
571 nodelist {
572
573 node {
574 name: due
575 nodeid: 2
576 quorum_votes: 1
577 ring0_addr: 10.10.10.2
578 }
579
580 node {
581 name: tre
582 nodeid: 3
583 quorum_votes: 1
584 ring0_addr: 10.10.10.3
585 }
586
587 node {
588 name: uno
589 nodeid: 1
590 quorum_votes: 1
591 ring0_addr: 10.10.10.1
592 }
593
594 }
595
596 quorum {
597 provider: corosync_votequorum
598 }
599
600 totem {
601 cluster_name: thomas-testcluster
602 config_version: 4
603 ip_version: ipv4
604 secauth: on
605 version: 2
606 interface {
607 bindnetaddr: 10.10.10.1
608 ringnumber: 0
609 }
610
611 }
612 ----
613
Now, after a final check that all changed information is correct, save the file
and see the <<edit-corosync-conf,edit corosync.conf file>> section again to
learn how to bring it into effect.

As this change cannot be applied live by corosync, we have to do a restart.
619
620 On a single node execute:
621 [source,bash]
622 ----
623 systemctl restart corosync
624 ----
625
626 Now check if everything is fine:
627
628 [source,bash]
629 ----
630 systemctl status corosync
631 ----
632
If corosync runs correctly again, restart it on all other nodes too.
They will then join the cluster membership one by one on the new network.
635
636 Redundant Ring Protocol
637 ~~~~~~~~~~~~~~~~~~~~~~~
To avoid a single point of failure you should implement countermeasures.
This can be done on the hardware and operating system level through network
bonding, as sketched below.
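For example, a bond in active-backup mode over two NICs could be defined in
'/etc/network/interfaces' roughly like this. This is only a sketch using the
standard Debian bonding options; interface names and the address are
placeholders:

----
auto bond0
iface bond0 inet static
        address 10.10.10.1
        netmask 255.255.255.0
        bond-slaves eth1 eth2
        bond-mode active-backup
        bond-miimon 100
----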
640
Corosync itself also offers the possibility to add redundancy through the
so-called 'Redundant Ring Protocol'. This protocol allows running a second totem
ring on another network; this network should be physically separated from the
other ring's network to actually increase availability.
645
646 RRP On Cluster Creation
647 ~~~~~~~~~~~~~~~~~~~~~~~
648
The 'pvecm create' command provides the additional parameters 'bindnetX_addr',
'ringX_addr' and 'rrp_mode', which can be used for RRP configuration.
651
652 NOTE: See the <<corosync-conf-glossary,glossary>> if you do not know what each parameter means.
653
So if you have two networks, one on the 10.10.10.1/24 and the other on the
10.10.20.1/24 subnet, you would execute:
656
657 [source,bash]
658 ----
659 pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
660 -bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
661 ----
662
663 RRP On A Created Cluster
664 ~~~~~~~~~~~~~~~~~~~~~~~~
665
When enabling an already running cluster to use RRP, you will take similar steps
as described in
<<separate-cluster-net-after-creation,separating the cluster network>>. You
just do it on another ring.
670
First, add a new `interface` subsection in the `totem` section and set its
`ringnumber` property to `1`. Set the interface's `bindnetaddr` property to an
address of the subnet you have configured for your new ring.
Further, set the `rrp_mode` to `passive`; this is the only stable mode.
675
Then add to each node entry in the `nodelist` section its new `ring1_addr`
property with the node's additional ring address.
678
679 So if you have two networks, one on the 10.10.10.1/24 and the other on the
680 10.10.20.1/24 subnet, the final configuration file should look like:
681
682 ----
683 totem {
684 cluster_name: tweak
685 config_version: 9
686 ip_version: ipv4
687 rrp_mode: passive
688 secauth: on
689 version: 2
690 interface {
691 bindnetaddr: 10.10.10.1
692 ringnumber: 0
693 }
694 interface {
695 bindnetaddr: 10.10.20.1
696 ringnumber: 1
697 }
698 }
699
700 nodelist {
701 node {
702 name: pvecm1
703 nodeid: 1
704 quorum_votes: 1
705 ring0_addr: 10.10.10.1
706 ring1_addr: 10.10.20.1
707 }
708
709 node {
710 name: pvecm2
711 nodeid: 2
712 quorum_votes: 1
713 ring0_addr: 10.10.10.2
714 ring1_addr: 10.10.20.2
715 }
716
717 [...] # other cluster nodes here
718 }
719
720 [...] # other remaining config sections here
721
722 ----
723
Bring it into effect as described in the
<<edit-corosync-conf,edit the corosync.conf file>> section.

This is a change which cannot take effect live and needs at least a restart
of corosync. A restart of the whole cluster is recommended.

If you cannot reboot the whole cluster, ensure that no High Availability
services are configured and then stop the corosync service on all nodes. After
corosync is stopped on all nodes, start it again one node after the other.
733
734 Corosync Configuration
735 ----------------------
736
The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
controls the cluster membership and its network.
To read more about it, check the corosync.conf man page:
740 [source,bash]
741 ----
742 man corosync.conf
743 ----
744
745 For node membership you should always use the `pvecm` tool provided by {pve}.
746 You may have to edit the configuration file manually for other changes.
747 Here are a few best practice tips for doing this.
748
749 [[edit-corosync-conf]]
750 Edit corosync.conf
751 ~~~~~~~~~~~~~~~~~~
752
Editing the corosync.conf file is not always straightforward. There are
two copies on each cluster node, one in `/etc/pve/corosync.conf` and the other
in `/etc/corosync/corosync.conf`. Editing the one in our cluster file system
will propagate the changes to the local one, but not vice versa.
757
The configuration will get updated automatically as soon as the file changes.
This means that changes which can be integrated in a running corosync will take
effect instantly. So you should always make a copy and edit that instead, to
avoid triggering unwanted changes through an intermediate save.
762
763 [source,bash]
764 ----
765 cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
766 ----
767
Then open the config file with your favorite editor; `nano` and `vim.tiny` are
preinstalled on {pve}, for example.
770
771 NOTE: Always increment the 'config_version' number on configuration changes,
772 omitting this can lead to problems.
773
After making the necessary changes, create another copy of the current working
configuration file. This serves as a backup if the new configuration fails to
apply or causes problems in other ways.
777
778 [source,bash]
779 ----
780 cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
781 ----
782
783 Then move the new configuration file over the old one:
784 [source,bash]
785 ----
786 mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
787 ----
788
You can check with the following commands
790 [source,bash]
791 ----
792 systemctl status corosync
793 journalctl -b -u corosync
794 ----
795
whether the change could be applied automatically. If not, you may have to
restart the corosync service via:
798 [source,bash]
799 ----
800 systemctl restart corosync
801 ----
802
803 On errors check the troubleshooting section below.
804
805 Troubleshooting
806 ~~~~~~~~~~~~~~~
807
808 Issue: 'quorum.expected_votes must be configured'
809 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
810
If corosync fails to start and you get the following message in the system log:
812
813 ----
814 [...]
815 corosync[1647]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
816 corosync[1647]: [SERV ] Service engine 'corosync_quorum' failed to load for reason
817 'configuration error: nodelist or quorum.expected_votes must be configured!'
818 [...]
819 ----
820
821 It means that the hostname you set for corosync 'ringX_addr' in the
822 configuration could not be resolved.
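A quick way to check whether a configured hostname is resolvable on a node
(the node name is a placeholder):

[source,bash]
----
getent hosts NODENAME
----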
823
824
825 Write Configuration When Not Quorate
826 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
827
If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and you
know what you are doing, use:
830 [source,bash]
831 ----
832 pvecm expected 1
833 ----
834
835 This sets the expected vote count to 1 and makes the cluster quorate. You can
836 now fix your configuration, or revert it back to the last working backup.
837
This is not enough if corosync cannot start anymore. In that case, it is best to
edit the local copy of the corosync configuration in '/etc/corosync/corosync.conf'
so that corosync can start again. Ensure that this configuration has the same
content on all nodes to avoid split-brain situations. If you are not sure what
went wrong, it's best to ask the Proxmox community for help.
843
844
845 [[corosync-conf-glossary]]
846 Corosync Configuration Glossary
847 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
848
849 ringX_addr::
850 This names the different ring addresses for the corosync totem rings used for
851 the cluster communication.
852
bindnetaddr::
Defines which interface the ring should bind to. It may be any address of
the subnet configured on the interface we want to use. In general, it is
recommended to just use an address the node uses on this interface.
857
rrp_mode::
Specifies the mode of the redundant ring protocol and may be passive, active or
none. Note that the use of active is highly experimental and not officially
supported. Passive is the preferred mode; it may double the cluster
communication throughput and increases availability.
863
864
865 Cluster Cold Start
866 ------------------
867
868 It is obvious that a cluster is not quorate when all nodes are
869 offline. This is a common case after a power failure.
870
871 NOTE: It is always a good idea to use an uninterruptible power supply
872 (``UPS'', also called ``battery backup'') to avoid this state, especially if
873 you want HA.
874
875 On node startup, service `pve-manager` is started and waits for
876 quorum. Once quorate, it starts all guests which have the `onboot`
877 flag set.
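For example, to mark a VM for automatic start once the node is quorate (the
VMID is a placeholder):

[source,bash]
----
qm set 101 --onboot 1
----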
878
When you turn on nodes, or when power comes back after a power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.
882
883
884 Guest Migration
885 ---------------
886
887 Migrating virtual guests to other nodes is a useful feature in a
888 cluster. There are settings to control the behavior of such
889 migrations. This can be done via the configuration file
890 `datacenter.cfg` or for a specific migration via API or command line
891 parameters.
892
893
894 Migration Type
895 ~~~~~~~~~~~~~~
896
The migration type defines if the migration data should be sent over an
encrypted (`secure`) channel or an unencrypted (`insecure`) one.
Setting the migration type to insecure means that the RAM content of a
virtual guest is also transferred unencrypted, which can lead to
information disclosure of critical data from inside the guest (for
example, passwords or encryption keys).
903
Therefore, we strongly recommend using the secure channel if you do
not have full control over the network and cannot guarantee that no
one is eavesdropping on it.
907
908 NOTE: Storage migration does not follow this setting. Currently, it
909 always sends the storage content over a secure channel.
910
Encryption requires a lot of computing power, so this setting is often
changed to `insecure` to achieve better performance. The impact on
modern systems is lower because they implement AES encryption in
hardware. The performance impact is particularly evident in fast
networks where you can transfer 10 Gbps or more.
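For example, to enforce the encrypted channel for all migrations cluster-wide,
the 'migration' property in '/etc/pve/datacenter.cfg' could be set to the
following minimal value; combining it with a dedicated network is shown in the
next section:

----
migration: secure
----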
916
917
918 Migration Network
919 ~~~~~~~~~~~~~~~~~
920
By default, {pve} uses the network where the cluster communication happens
for sending the migration traffic. This may be suboptimal: for one, the
sensitive cluster traffic can be disturbed, and on the other hand, it may not
have the best bandwidth available of all the network interfaces on the node.
925
926 Setting the migration network parameter allows using a dedicated network for
927 sending all the migration traffic when migrating a guest system. This
928 includes the traffic for offline storage migrations.
929
The migration network is represented as a network in 'CIDR' notation. This
has the advantage that you do not need to set an IP for each node; {pve} is
able to figure out the real address from the given CIDR-denoted network and
the networks configured on the target node.
For this to work, the network must be specific enough, i.e. each node must
have exactly one IP configured in the given network.
936
937 Example
938 ^^^^^^^
939
Let's assume that we have a three-node setup with three networks: one for
public communication with the Internet, one for cluster communication,
and a very fast one, which we want to use as a dedicated migration
network. A network configuration for such a setup could look like this:
944
945 ----
946 iface eth0 inet manual
947
948 # public network
949 auto vmbr0
950 iface vmbr0 inet static
951 address 192.X.Y.57
        netmask 255.255.255.0
953 gateway 192.X.Y.1
954 bridge_ports eth0
955 bridge_stp off
956 bridge_fd 0
957
958 # cluster network
959 auto eth1
960 iface eth1 inet static
961 address 10.1.1.1
962 netmask 255.255.255.0
963
964 # fast network
965 auto eth2
966 iface eth2 inet static
967 address 10.1.2.1
968 netmask 255.255.255.0
969
970 # [...]
971 ----
972
973 Here we want to use the 10.1.2.0/24 network as migration network.
974 For a single migration you can achieve this by using the 'migration_network'
975 parameter:
976 ----
977 # qm migrate 106 tre --online --migration_network 10.1.2.0/24
978 ----
979
To set this up as the default network for all migrations cluster-wide, you can
use the 'migration' property in '/etc/pve/datacenter.cfg':
982 ----
983 # [...]
984 migration: secure,network=10.1.2.0/24
985 ----
986
Note that the migration type must always be set if the migration network is set.
988
989 ifdef::manvolnum[]
990 include::pve-copyright.adoc[]
991 endif::manvolnum[]