pvecm.adoc

   1 [[chapter_pvecm]]
   2 ifdef::manvolnum[]
   3 pvecm(1)
   4 ========
   5 :pve-toplevel:
   6
   7 NAME
   8 ----
   9
  10 pvecm - Proxmox VE Cluster Manager
  11
  12 SYNOPSIS
  13 --------
  14
  15 include::pvecm.1-synopsis.adoc[]
  16
  17 DESCRIPTION
  18 -----------
  19 endif::manvolnum[]
  20
  21 ifndef::manvolnum[]
  22 Cluster Manager
  23 ===============
  24 :pve-toplevel:
  25 endif::manvolnum[]
  26
  27 The {PVE} cluster manager `pvecm` is a tool to create a group of
  28 physical servers. Such a group is called a *cluster*. We use the
  29 http://www.corosync.org[Corosync Cluster Engine] for reliable group
  30 communication, and such clusters can consist of up to 32 physical nodes
  31 (possibly more, depending on network latency).
  32
  33 `pvecm` can be used to create a new cluster, join nodes to a cluster,
  34 leave the cluster, get status information and do various other cluster
  35 related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
  36 is used to transparently distribute the cluster configuration to all cluster
  37 nodes.
  38
  39 Grouping nodes into a cluster has the following advantages:
  40
  41 * Centralized, web-based management
  42
  43 * Multi-master clusters: each node can do all management tasks
  44
  45 * `pmxcfs`: database-driven file system for storing configuration files,
  46  replicated in real-time on all nodes using `corosync`.
  47
  48 * Easy migration of virtual machines and containers between physical
  49   hosts
  50
  51 * Fast deployment
  52
  53 * Cluster-wide services like firewall and HA
  54
  55
  56 Requirements
  57 ------------
  58
  59 * All nodes must be in the same network, as `corosync` uses IP multicast
  60  to communicate between nodes (also see
  61  http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
  62  ports 5404 and 5405 for cluster communication.
  63 +
  64 NOTE: Some switches do not support IP multicast by default; it must be
  65 enabled manually first.
  66
  67 * Date and time have to be synchronized.
  68
  69 * An SSH tunnel on TCP port 22 is used between nodes.
  70
  71 * If you are interested in High Availability, you need to have at
  72   least three nodes for reliable quorum. All nodes should have the
  73   same version.
  74
  75 * We recommend a dedicated NIC for the cluster traffic, especially if
  76   you use shared storage.
  77
  78 * The root password of a cluster node is required for adding nodes.
  79
  80 NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
  81 Proxmox VE 4.0 cluster nodes.
  82
  83
  84 Preparing Nodes
  85 ---------------
  86
  87 First, install {PVE} on all nodes. Make sure that each node is
  88 installed with the final hostname and IP configuration. Changing the
  89 hostname and IP is not possible after cluster creation.
  90
  91 Currently, the cluster can be created either on the console (login via
  92 `ssh`) or through the API, for which we have a GUI implementation (__Datacenter ->
  93 Cluster__).
  94
  95 [[pvecm_create_cluster]]
  96 Create the Cluster
  97 ------------------
  98
  99 Log in via `ssh` to the first {pve} node. Use a unique name for your cluster.
 100 This name cannot be changed later. The cluster name follows the same rules as node names.
 101
 102  hp1# pvecm create YOUR-CLUSTER-NAME
 103
 104 CAUTION: The cluster name is used to compute the default multicast
 105 address. Please use unique cluster names if you run more than one
 106 cluster inside your network.
 107
 108 To check the state of your cluster use:
 109
 110  hp1# pvecm status
 111
 112 Multiple Clusters In Same Network
 113 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 114
 115 It is possible to create multiple clusters in the same physical or logical
 116 network. Each cluster must have a unique name, which is used to generate the
 117 cluster's multicast group address. As long as no duplicate cluster names are
 118 configured in one network segment, the different clusters won't interfere with
 119 each other.
 120
 121 If multiple clusters operate in a single network it may be beneficial to setup
 122 an IGMP querier and enable IGMP Snooping in said network. This may reduce the
 123 load of the network significantly because multicast packets are only delivered
 124 to endpoints of the respective member nodes.
 125
 126
 127 [[pvecm_join_node_to_cluster]]
 128 Adding Nodes to the Cluster
 129 ---------------------------
 130
 131 Log in via `ssh` to the node you want to add.
 132
 133  hp2# pvecm add IP-ADDRESS-CLUSTER
 134
 135 For `IP-ADDRESS-CLUSTER` use the IP address of an existing cluster node.
 136
 137 CAUTION: A new node cannot hold any VMs, because you would get
 138 conflicts about identical VM IDs. Also, all existing configuration in
 139 `/etc/pve` is overwritten when you join a new node to the cluster. As a
 140 workaround, use `vzdump` to back up the guests and restore them under a
 141 different VMID after adding the node to the cluster.
 142
 143 To check the state of the cluster:
 144
 145  # pvecm status
 146
 147 .Cluster status after adding 4 nodes
 148 ----
 149 hp2# pvecm status
 150 Quorum information
 151 ~~~~~~~~~~~~~~~~~~
 152 Date:             Mon Apr 20 12:30:13 2015
 153 Quorum provider:  corosync_votequorum
 154 Nodes:            4
 155 Node ID:          0x00000001
 156 Ring ID:          1928
 157 Quorate:          Yes
 158
 159 Votequorum information
 160 ~~~~~~~~~~~~~~~~~~~~~~
 161 Expected votes:   4
 162 Highest expected: 4
 163 Total votes:      4
 164 Quorum:           3
 165 Flags:            Quorate
 166
 167 Membership information
 168 ~~~~~~~~~~~~~~~~~~~~~~
 169     Nodeid      Votes Name
 170 0x00000001          1 192.168.15.91
 171 0x00000002          1 192.168.15.92 (local)
 172 0x00000003          1 192.168.15.93
 173 0x00000004          1 192.168.15.94
 174 ----
 175
 176 If you only want the list of all nodes use:
 177
 178  # pvecm nodes
 179
 180 .List nodes in a cluster
 181 ----
 182 hp2# pvecm nodes
 183
 184 Membership information
 185 ~~~~~~~~~~~~~~~~~~~~~~
 186     Nodeid      Votes Name
 187          1          1 hp1
 188          2          1 hp2 (local)
 189          3          1 hp3
 190          4          1 hp4
 191 ----
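
For scripting, the node names can be pulled out of this listing. The following
is a minimal sketch, assuming the output keeps the column layout shown above;
the function name `node_names` is only an illustration and not part of `pvecm`.

```shell
# Print the node names from `pvecm nodes` output read on stdin.
# Assumes the "Nodeid Votes Name" header shown above precedes the node rows.
node_names() {
    awk 'header && NF >= 3 { print $3 }
         $1 == "Nodeid"    { header = 1 }'
}
```

Used as `pvecm nodes | node_names`, this prints one node name per line and
drops the `(local)` marker.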
 192
 193 [[adding-nodes-with-separated-cluster-network]]
 194 Adding Nodes With Separated Cluster Network
 195 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 196
 197 When adding a node to a cluster with a separated cluster network you need to
 198 use the 'ringX_addr' parameters to set the node's address on those networks:
 199
 200 [source,bash]
 201 ----
 202 pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
 203 ----
 204
 205 If you want to use the Redundant Ring Protocol you will also want to pass the
 206 'ring1_addr' parameter.
 207
 208
 209 Remove a Cluster Node
 210 ---------------------
 211
 212 CAUTION: Read the procedure carefully before proceeding, as it may
 213 not be what you want or need.
 214
 215 Move all virtual machines from the node. Make sure you have no local
 216 data or backups you want to keep, or save them accordingly.
 217 In the following example we will remove the node hp4 from the cluster.
 218
 219 Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
 220 command to identify the node ID to remove:
 221
 222 ----
 223 hp1# pvecm nodes
 224
 225 Membership information
 226 ~~~~~~~~~~~~~~~~~~~~~~
 227     Nodeid      Votes Name
 228          1          1 hp1 (local)
 229          2          1 hp2
 230          3          1 hp3
 231          4          1 hp4
 232 ----
 233
 234
 235 At this point you must power off hp4 and
 236 make sure that it will not power on again (in the network) as it
 237 is.
 238
 239 IMPORTANT: As said above, it is critical to power off the node
 240 *before* removal, and make sure that it will *never* power on again
 241 (in the existing cluster network) as it is.
 242 If you power on the node as it is, your cluster can end up in a broken
 243 state, and it could be difficult to restore a clean cluster state.
 244
 245 After powering off the node hp4, we can safely remove it from the cluster.
 246
 247  hp1# pvecm delnode hp4
 248
 249 If the operation succeeds, no output is returned. Check the node
 250 list again with `pvecm nodes` or `pvecm status`. You should see
 251 something like:
 252
 253 ----
 254 hp1# pvecm status
 255
 256 Quorum information
 257 ~~~~~~~~~~~~~~~~~~
 258 Date:             Mon Apr 20 12:44:28 2015
 259 Quorum provider:  corosync_votequorum
 260 Nodes:            3
 261 Node ID:          0x00000001
 262 Ring ID:          1992
 263 Quorate:          Yes
 264
 265 Votequorum information
 266 ~~~~~~~~~~~~~~~~~~~~~~
 267 Expected votes:   3
 268 Highest expected: 3
 269 Total votes:      3
 270 Quorum:           2
 271 Flags:            Quorate
 272
 273 Membership information
 274 ~~~~~~~~~~~~~~~~~~~~~~
 275     Nodeid      Votes Name
 276 0x00000001          1 192.168.15.90 (local)
 277 0x00000002          1 192.168.15.91
 278 0x00000003          1 192.168.15.92
 279 ----
 280
 281 If, for whatever reason, you want this server to join the same
 282 cluster again, you have to
 283
 284 * reinstall {pve} on it from scratch
 285
 286 * then join it, as explained in the previous section.
 287
 288 [[pvecm_separate_node_without_reinstall]]
 289 Separate A Node Without Reinstalling
 290 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 291
 292 CAUTION: This is *not* the recommended method, proceed with caution. Use the
 293 above mentioned method if you're unsure.
 294
 295 You can also separate a node from a cluster without reinstalling it from
 296 scratch. But after removing the node from the cluster it will still have
 297 access to the shared storages! This must be resolved before you start removing
 298 the node from the cluster. A {pve} cluster cannot share the exact same
 299 storage with another cluster, as storage locking doesn't work across cluster
 300 boundaries. Further, it may also lead to VMID conflicts.
 301
 302 It's suggested that you create a new storage to which only the node you want
 303 to separate has access. This can be a new export on your NFS or a new Ceph
 304 pool, to name a few examples. It's just important that the exact same storage
 305 does not get accessed by multiple clusters. After setting this storage up, move
 306 all data from the node and its VMs to it. Then you are ready to separate the
 307 node from the cluster.
 308
 309 WARNING: Ensure all shared resources are cleanly separated! Otherwise you will
 310 run into conflicts and problems.
 311
 312 First stop the corosync and the pve-cluster services on the node:
 313 [source,bash]
 314 ----
 315 systemctl stop pve-cluster
 316 systemctl stop corosync
 317 ----
 318
 319 Start the cluster filesystem again in local mode:
 320 [source,bash]
 321 ----
 322 pmxcfs -l
 323 ----
 324
 325 Delete the corosync configuration files:
 326 [source,bash]
 327 ----
 328 rm /etc/pve/corosync.conf
 329 rm /etc/corosync/*
 330 ----
 331
 332 You can now start the filesystem again as a normal service:
 333 [source,bash]
 334 ----
 335 killall pmxcfs
 336 systemctl start pve-cluster
 337 ----
 338
 339 The node is now separated from the cluster. You can delete it from a remaining
 340 node of the cluster with:
 341 [source,bash]
 342 ----
 343 pvecm delnode oldnode
 344 ----
 345
 346 If the command failed because the remaining node in the cluster lost quorum
 347 when the now separate node exited, you may set the expected votes to 1 as a workaround:
 348 [source,bash]
 349 ----
 350 pvecm expected 1
 351 ----
 352
 353 Then repeat the 'pvecm delnode' command.
 354
 355 Now switch back to the separated node and delete all remaining files left
 356 from the old cluster. This ensures that the node can be added to another
 357 cluster again without problems.
 358
 359 [source,bash]
 360 ----
 361 rm /var/lib/corosync/*
 362 ----
 363
 364 As the configuration files from the other nodes are still in the cluster
 365 filesystem you may want to clean those up too. Simply remove the whole
 366 directory '/etc/pve/nodes/NODENAME' recursively, but check three times that
 367 you used the correct one before deleting it.
 368
 369 CAUTION: The node's SSH keys are still in the 'authorized_keys' file. This means
 370 the nodes can still connect to each other with public key authentication. This
 371 should be fixed by removing the respective keys from the
 372 '/etc/pve/priv/authorized_keys' file.
 373
 374 Quorum
 375 ------
 376
 377 {pve} uses a quorum-based technique to provide a consistent state among
 378 all cluster nodes.
 379
 380 [quote, from Wikipedia, Quorum (distributed computing)]
 381 ____
 382 A quorum is the minimum number of votes that a distributed transaction
 383 has to obtain in order to be allowed to perform an operation in a
 384 distributed system.
 385 ____
 386
 387 In case of network partitioning, state changes require that a
 388 majority of nodes are online. The cluster switches to read-only mode
 389 if it loses quorum.
 390
 391 NOTE: {pve} assigns a single vote to each node by default.
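
With one vote per node, the majority needed for quorum is half of the expected
votes, rounded down, plus one. The small helper below works through that
arithmetic; the name `quorum_threshold` is hypothetical, and the sketch
assumes a plain majority rule without any special votequorum options.

```shell
# Smallest number of votes that forms a strict majority of the
# expected votes passed as $1 (integer division rounds down).
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

quorum_threshold 3   # prints 2
quorum_threshold 4   # prints 3
```

Under this rule a four-node cluster tolerates the loss of one node, the same
as a three-node cluster.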
 392
 393 Cluster Network
 394 ---------------
 395
 396 The cluster network is the core of a cluster. All messages sent over it have to
 397 be delivered reliably to all nodes in their respective order. In {pve} this
 398 part is done by corosync, an implementation of a high performance low overhead
 399 high availability development toolkit. It serves our decentralized
 400 configuration file system (`pmxcfs`).
 401
 402 [[cluster-network-requirements]]
 403 Network Requirements
 404 ~~~~~~~~~~~~~~~~~~~~
 405 This needs a reliable network with latencies under 2 milliseconds (LAN
 406 performance) to work properly. While corosync can also use unicast for
 407 communication between nodes, it's **highly recommended** to have a multicast
 408 capable network. The network should not be used heavily by other members;
 409 ideally corosync runs on its own network.
 410 *Never* share it with a network that also carries storage traffic.
 411
 412 Before setting up a cluster it is good practice to check if the network is fit
 413 for that purpose.
 414
 415 * Ensure that all nodes are in the same subnet. This must only be true for the
 416   network interfaces used for cluster communication (corosync).
 417
 418 * Ensure all nodes can reach each other over those interfaces, using `ping` is
 419   enough for a basic test.
 420
 421 * Ensure that multicast works in general and at high packet rates. This can be
 422   done with the `omping` tool. The final "%loss" number should be < 1%.
 423 +
 424 [source,bash]
 425 ----
 426 omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
 427 ----
 428
 429 * Ensure that multicast communication works over an extended period of time.
 430   This uncovers problems where IGMP snooping is activated on the network but
 431   no multicast querier is active. This test has a duration of around 10
 432   minutes.
 433 +
 434 [source,bash]
 435 ----
 436 omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
 437 ----
 438
 439 Your network is not ready for clustering if any of these tests fails. Recheck
 440 your network configuration. Switches especially are notorious for having
 441 multicast disabled by default or IGMP snooping enabled with no IGMP querier
 442 active.
 443
 444 In smaller clusters it's also an option to use unicast if you really cannot get
 445 multicast to work.
 446
 447 Separate Cluster Network
 448 ~~~~~~~~~~~~~~~~~~~~~~~~
 449
 450 When creating a cluster without any parameters the cluster network is generally
 451 shared with the Web UI and the VMs and their traffic. Depending on your setup
 452 even storage traffic may get sent over the same network. It's recommended to
 453 change that, as corosync is a time critical real time application.
 454
 455 Setting Up A New Network
 456 ^^^^^^^^^^^^^^^^^^^^^^^^
 457
 458 First you have to set up a new network interface. It should be on a physically
 459 separate network. Ensure that your network fulfills the
 460 <<cluster-network-requirements,cluster network requirements>>.
 461
 462 Separate On Cluster Creation
 463 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 464
 465 This is possible through the 'ring0_addr' and 'bindnet0_addr' parameters of
 466 the 'pvecm create' command used for creating a new cluster.
 467
 468 If you have set up an additional NIC with a static address on 10.10.10.1/25
 469 and want to send and receive all cluster communication over this interface
 470 you would execute:
 471
 472 [source,bash]
 473 ----
 474 pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
 475 ----
 476
 477 To check if everything is working properly execute:
 478 [source,bash]
 479 ----
 480 systemctl status corosync
 481 ----
 482
 483 Afterwards, proceed as described in the section to
 484 <<adding-nodes-with-separated-cluster-network,add nodes with a separated cluster network>>.
 485
 486 [[separate-cluster-net-after-creation]]
 487 Separate After Cluster Creation
 488 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 489
 490 You can do this also if you have already created a cluster and want to switch
 491 its communication to another network, without rebuilding the whole cluster.
 492 This change may lead to short durations of quorum loss in the cluster, as nodes
 493 have to restart corosync and come up one after the other on the new network.
 494
 495 Check how to <<edit-corosync-conf,edit the corosync.conf file>> first.
 496 Then open it and you should see a file similar to:
 497
 498 ----
 499 logging {
 500   debug: off
 501   to_syslog: yes
 502 }
 503
 504 nodelist {
 505
 506   node {
 507     name: due
 508     nodeid: 2
 509     quorum_votes: 1
 510     ring0_addr: due
 511   }
 512
 513   node {
 514     name: tre
 515     nodeid: 3
 516     quorum_votes: 1
 517     ring0_addr: tre
 518   }
 519
 520   node {
 521     name: uno
 522     nodeid: 1
 523     quorum_votes: 1
 524     ring0_addr: uno
 525   }
 526
 527 }
 528
 529 quorum {
 530   provider: corosync_votequorum
 531 }
 532
 533 totem {
 534   cluster_name: thomas-testcluster
 535   config_version: 3
 536   ip_version: ipv4
 537   secauth: on
 538   version: 2
 539   interface {
 540     bindnetaddr: 192.168.30.50
 541     ringnumber: 0
 542   }
 543
 544 }
 545 ----
 546
 547 The first thing you want to do is add the 'name' properties in the node entries if
 548 you do not see them already. Those *must* match the node name.
 549
 550 Then replace the address from the 'ring0_addr' properties with the new
 551 addresses.  You may use plain IP addresses or also hostnames here. If you use
 552 hostnames ensure that they are resolvable from all nodes.
 553
 554 In my example I want to switch my cluster communication to the 10.10.10.1/25
 555 network. So I replace all 'ring0_addr' accordingly. I also set the bindnetaddr
 556 in the totem section of the config to an address of the new network. It can be
 557 any address from the subnet configured on the new network interface.
 558
 559 After you increased the 'config_version' property the new configuration file
 560 should look like:
 561
 562 ----
 563
 564 logging {
 565   debug: off
 566   to_syslog: yes
 567 }
 568
 569 nodelist {
 570
 571   node {
 572     name: due
 573     nodeid: 2
 574     quorum_votes: 1
 575     ring0_addr: 10.10.10.2
 576   }
 577
 578   node {
 579     name: tre
 580     nodeid: 3
 581     quorum_votes: 1
 582     ring0_addr: 10.10.10.3
 583   }
 584
 585   node {
 586     name: uno
 587     nodeid: 1
 588     quorum_votes: 1
 589     ring0_addr: 10.10.10.1
 590   }
 591
 592 }
 593
 594 quorum {
 595   provider: corosync_votequorum
 596 }
 597
 598 totem {
 599   cluster_name: thomas-testcluster
 600   config_version: 4
 601   ip_version: ipv4
 602   secauth: on
 603   version: 2
 604   interface {
 605     bindnetaddr: 10.10.10.1
 606     ringnumber: 0
 607   }
 608
 609 }
 610 ----
 611
 612 Now after a final check whether all changed information is correct we save it
 613 and see again the <<edit-corosync-conf,edit corosync.conf file>> section to
 614 learn how to bring it into effect.
 615
 616 As our change cannot be enforced live from corosync we have to do a restart.
 617
 618 On a single node execute:
 619 [source,bash]
 620 ----
 621 systemctl restart corosync
 622 ----
 623
 624 Now check if everything is fine:
 625
 626 [source,bash]
 627 ----
 628 systemctl status corosync
 629 ----
 630
 631 If corosync runs correctly again, restart it on all other nodes as well.
 632 They will then join the cluster membership one by one on the new network.
 633
 634 [[pvecm_rrp]]
 635 Redundant Ring Protocol
 636 ~~~~~~~~~~~~~~~~~~~~~~~
 637 To avoid a single point of failure you should implement countermeasures.
 638 This can be on the hardware and operating system level through network bonding.
 639
 640 Corosync itself also offers a possibility to add redundancy through the so
 641 called 'Redundant Ring Protocol'. This protocol allows running a second totem
 642 ring on another network. This network should be physically separated from the
 643 other ring's network to actually increase availability.
 644
 645 RRP On Cluster Creation
 646 ~~~~~~~~~~~~~~~~~~~~~~~
 647
 648 The 'pvecm create' command provides the additional parameters 'bindnetX_addr',
 649 'ringX_addr' and 'rrp_mode', which can be used for RRP configuration.
 650
 651 NOTE: See the <<corosync-conf-glossary,glossary>> if you do not know what each parameter means.
 652
 653 So if you have two networks, one on the 10.10.10.1/24 and the other on the
 654 10.10.20.1/24 subnet you would execute:
 655
 656 [source,bash]
 657 ----
 658 pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
 659 -bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
 660 ----
 661
 662 RRP On Existing Clusters
 663 ~~~~~~~~~~~~~~~~~~~~~~~~
 664
 665 You will take similar steps as described in
 666 <<separate-cluster-net-after-creation,separating the cluster network>> to
 667 enable RRP on an already running cluster. The single difference is that you
 668 will add `ring1` and use it instead of `ring0`.
 669
 670 First add a new `interface` subsection in the `totem` section, set its
 671 `ringnumber` property to `1`. Set the interface's `bindnetaddr` property to an
 672 address of the subnet you have configured for your new ring.
 673 Further set the `rrp_mode` to `passive`, as this is the only stable mode.
 674
 675 Then add to each node entry in the `nodelist` section its new `ring1_addr`
 676 property with the node's additional ring address.
 677
 678 So if you have two networks, one on the 10.10.10.1/24 and the other on the
 679 10.10.20.1/24 subnet, the final configuration file should look like:
 680
 681 ----
 682 totem {
 683   cluster_name: tweak
 684   config_version: 9
 685   ip_version: ipv4
 686   rrp_mode: passive
 687   secauth: on
 688   version: 2
 689   interface {
 690     bindnetaddr: 10.10.10.1
 691     ringnumber: 0
 692   }
 693   interface {
 694     bindnetaddr: 10.10.20.1
 695     ringnumber: 1
 696   }
 697 }
 698
 699 nodelist {
 700   node {
 701     name: pvecm1
 702     nodeid: 1
 703     quorum_votes: 1
 704     ring0_addr: 10.10.10.1
 705     ring1_addr: 10.10.20.1
 706   }
 707
 708   node {
 709     name: pvecm2
 710     nodeid: 2
 711     quorum_votes: 1
 712     ring0_addr: 10.10.10.2
 713     ring1_addr: 10.10.20.2
 714   }
 715
 716   [...] # other cluster nodes here
 717 }
 718
 719 [...] # other remaining config sections here
 720
 721 ----
 722
 723 Bring it into effect as described in the
 724 <<edit-corosync-conf,edit the corosync.conf file>> section.
 725
 726 This is a change which cannot take effect live and needs at least a restart
 727 of corosync. Recommended is a restart of the whole cluster.
 728
 729 If you cannot reboot the whole cluster ensure no High Availability services are
 730 configured and then stop the corosync service on all nodes. After corosync is
 731 stopped on all nodes start it one after the other again.
 732
 733 Corosync Configuration
 734 ----------------------
 735
 736 The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
 737 controls the cluster membership and its network.
 738 For reading more about it check the corosync.conf man page:
 739 [source,bash]
 740 ----
 741 man corosync.conf
 742 ----
 743
 744 For node membership you should always use the `pvecm` tool provided by {pve}.
 745 You may have to edit the configuration file manually for other changes.
 746 Here are a few best practice tips for doing this.
 747
 748 [[edit-corosync-conf]]
 749 Edit corosync.conf
 750 ~~~~~~~~~~~~~~~~~~
 751
 752 Editing the corosync.conf file is not always straightforward. There are
 753 two on each cluster node, one in `/etc/pve/corosync.conf` and the other in
 754 `/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
 755 propagate the changes to the local one, but not vice versa.
 756
 757 The configuration will get updated automatically as soon as the file changes.
 758 This means changes which can be integrated in a running corosync will take
 759 effect instantly. So you should always make a copy and edit that instead, to
 760 avoid triggering some unwanted changes by an in-between save.
 761
 762 [source,bash]
 763 ----
 764 cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
 765 ----
 766
 767 Then open the configuration file with your favorite editor; `nano` and `vim.tiny` are
 768 preinstalled on {pve}, for example.
 769
 770 NOTE: Always increment the 'config_version' number on configuration changes,
 771 omitting this can lead to problems.
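
The increment can also be scripted. This is a minimal sketch, assuming it is
run on the `.new` working copy created above and that the file contains a
single `config_version:` line as in the examples in this chapter. The helper
name `bump_config_version` is hypothetical.

```shell
# Increase the config_version value in the corosync configuration file $1 by one.
# The result is written to a temporary file first and only then copied back.
bump_config_version() {
    tmp=$(mktemp) || return 1
    awk '$1 == "config_version:" { sub(/[0-9]+/, $2 + 1) } { print }' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `bump_config_version /etc/pve/corosync.conf.new` turns
`config_version: 3` into `config_version: 4` and leaves all other lines
untouched.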
 772
 773 After making the necessary changes create another copy of the current working
 774 configuration file. This serves as a backup if the new configuration fails to
 775 apply or causes problems in other ways.
 776
 777 [source,bash]
 778 ----
 779 cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
 780 ----
 781
 782 Then move the new configuration file over the old one:
 783 [source,bash]
 784 ----
 785 mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
 786 ----
 787
 788 You can check whether the change was applied automatically with the commands:
 789 [source,bash]
 790 ----
 791 systemctl status corosync
 792 journalctl -b -u corosync
 793 ----
 794
 795 If it was not, you may have to restart the
 796 corosync service via:
 797 [source,bash]
 798 ----
 799 systemctl restart corosync
 800 ----
 801
 802 On errors check the troubleshooting section below.
 803
 804 Troubleshooting
 805 ~~~~~~~~~~~~~~~
 806
 807 Issue: 'quorum.expected_votes must be configured'
 808 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 809
 810 When corosync starts to fail and you get the following message in the system log:
 811
 812 ----
 813 [...]
 814 corosync[1647]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
 815 corosync[1647]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason
 816     'configuration error: nodelist or quorum.expected_votes must be configured!'
 817 [...]
 818 ----
 819
 820 It means that the hostname you set for corosync 'ringX_addr' in the
 821 configuration could not be resolved.
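
To find the offending entry, one could list every 'ringX_addr' value and try
to resolve it. The sketch below assumes `getent` from glibc is available; both
function names are made up for illustration.

```shell
# Print every ringX_addr value found in the corosync configuration file $1.
ring_addrs() {
    awk '$1 ~ /^ring[0-9]+_addr:$/ { print $2 }' "$1"
}

# Report each ring address that the local resolver cannot resolve.
# `getent ahosts` accepts both hostnames and plain IP addresses.
check_ring_addrs() {
    for addr in $(ring_addrs "$1"); do
        getent ahosts "$addr" >/dev/null || echo "cannot resolve: $addr"
    done
}
```

Running `check_ring_addrs /etc/corosync/corosync.conf` on each node should
print nothing when all addresses resolve there.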
 822
 823
 824 Write Configuration When Not Quorate
 825 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 826
 827 If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and you
 828 know what you are doing, use:
 829 [source,bash]
 830 ----
 831 pvecm expected 1
 832 ----
 833
 834 This sets the expected vote count to 1 and makes the cluster quorate. You can
 835 now fix your configuration, or revert it back to the last working backup.
 836
 837 This is not enough if corosync cannot start anymore. Here it's best to edit the
 838 local copy of the corosync configuration in '/etc/corosync/corosync.conf' so
 839 that corosync can start again. Ensure that on all nodes this configuration has
 840 the same content to avoid split brains. If you are not sure what went wrong
 841 it's best to ask the Proxmox Community to help you.
 842
 843
 844 [[corosync-conf-glossary]]
 845 Corosync Configuration Glossary
 846 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 847
 848 ringX_addr::
 849 This names the different ring addresses for the corosync totem rings used for
 850 the cluster communication.
 851
 852 bindnetaddr::
 853 Defines the interface the ring should bind to. It may be any address of
 854 the subnet configured on the interface we want to use. In general, it's
 855 recommended to just use an address a node uses on this interface.
 856
 857 rrp_mode::
 858 Specifies the mode of the redundant ring protocol and may be passive, active or
 859 none. Note that use of active is highly experimental and not officially
 860 supported. Passive is the preferred mode, it may double the cluster
 861 communication throughput and increases availability.
 862
 863
 864 Cluster Cold Start
 865 ------------------
 866
 867 It is obvious that a cluster is not quorate when all nodes are
 868 offline. This is a common case after a power failure.
 869
 870 NOTE: It is always a good idea to use an uninterruptible power supply
 871 (``UPS'', also called ``battery backup'') to avoid this state, especially if
 872 you want HA.
 873
 874 On node startup, the `pve-guests` service is started and waits for
 875 quorum. Once quorate, it starts all guests which have the `onboot`
 876 flag set.
 877
 878 When you turn on nodes, or when power comes back after power failure,
 879 it is likely that some nodes boot faster than others. Please keep in
 880 mind that guest startup is delayed until you reach quorum.
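
A custom script that must not act before quorum is reached could poll the
`Quorate:` line of `pvecm status`, which appears in the status listings earlier
in this chapter. This is only a sketch: both function names are hypothetical,
and `pve-guests` already performs this wait for guest startup.

```shell
# Succeed if the `pvecm status` output read on stdin reports "Quorate: Yes".
is_quorate() {
    awk '$1 == "Quorate:" { found = ($2 == "Yes") } END { exit !found }'
}

# Poll until the cluster is quorate; the 5 second interval is arbitrary.
wait_for_quorum() {
    until pvecm status 2>/dev/null | is_quorate; do
        sleep 5
    done
}
```

Calling `wait_for_quorum` at the start of such a script blocks until enough
nodes have come up.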
 881
 882
 883 Guest Migration
 884 ---------------
 885
 886 Migrating virtual guests to other nodes is a useful feature in a
 887 cluster. There are settings to control the behavior of such
 888 migrations. This can be done via the configuration file
 889 `datacenter.cfg` or for a specific migration via API or command line
 890 parameters.
 891
 892 It makes a difference if a guest is online or offline, or if it has
 893 local resources (like a local disk).
 894
 895 For details about virtual machine migration see the
 896 xref:qm_migration[QEMU/KVM Migration Chapter].
 897
 898 For details about container migration see the
 899 xref:pct_migration[Container Migration Chapter].
 900
 901 Migration Type
 902 ~~~~~~~~~~~~~~
 903
 904 The migration type defines if the migration data should be sent over an
 905 encrypted (`secure`) channel or an unencrypted (`insecure`) one.
 906 Setting the migration type to insecure means that the RAM content of a
 907 virtual guest gets also transferred unencrypted, which can lead to
 908 information disclosure of critical data from inside the guest (for
 909 example passwords or encryption keys).
 910
 911 Therefore, we strongly recommend using the secure channel if you do
 912 not have full control over the network and cannot guarantee that no
 913 one is eavesdropping on it.
 914
 915 NOTE: Storage migration does not follow this setting. Currently, it
 916 always sends the storage content over a secure channel.
 917
 918 Encryption requires a lot of computing power, so this setting is often
 919 changed to `insecure` to achieve better performance. The impact on
 920 modern systems is lower because they implement AES encryption in
 921 hardware. The performance impact is particularly evident in fast
 922 networks where you can transfer 10 Gbps or more.
 923
 924
 925 Migration Network
 926 ~~~~~~~~~~~~~~~~~
 927
 928 By default, {pve} uses the network in which cluster communication
 929 takes place to send the migration traffic. This is not optimal because
 930 sensitive cluster traffic can be disrupted and this network may not
 931 have the best bandwidth available on the node.
 932
 933 Setting the migration network parameter allows the use of a dedicated
 934 network for the entire migration traffic. In addition to the memory,
 935 this also affects the storage traffic for offline migrations.
 936
 937 The migration network is set as a network in CIDR notation. This
 938 has the advantage that you do not have to set individual IP addresses
 939 for each node.  {pve} can determine the real address on the
 940 destination node from the network specified in the CIDR form.  To
 941 enable this, the network must be specified so that each node has one,
 942 but only one IP in the respective network.
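
The "one, but only one" rule can be checked by hand with a little integer
arithmetic. The following sketch (IPv4 only, helper names hypothetical) tests
whether an address belongs to a network given in CIDR form and counts how many
of a node's addresses do. It only illustrates the rule and is not how {pve}
itself implements the lookup; the addresses in the last line are sample values.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo $(( ($o1 << 24) + ($o2 << 16) + ($o3 << 8) + $o4 ))
}

# Succeed if address $1 lies inside the network $2 given in CIDR form.
in_cidr() {
    addr_int=$(ip_to_int "$1")
    net_int=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( addr_int & mask )) -eq $(( net_int & mask )) ]
}

# Count how many of the addresses after the first argument fall into CIDR $1.
count_in_cidr() {
    cidr=$1
    shift
    n=0
    for a in "$@"; do
        if in_cidr "$a" "$cidr"; then n=$((n + 1)); fi
    done
    echo "$n"
}

count_in_cidr 10.1.2.0/24 10.1.1.1 10.1.2.1   # prints 1
```

A result of exactly 1 on every node means the network is usable as a migration
network under the rule above; 0 or more than 1 means it is not.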
 943
 944
 945 Example
 946 ^^^^^^^
 947
 948 We assume that we have a three-node setup with three separate
 949 networks. One for public communication with the Internet, one for
 950 cluster communication and a very fast one, which we want to use as a
 951 dedicated network for migration.
 952
 953 A network configuration for such a setup might look as follows:
 954
 955 ----
 956 iface eno1 inet manual
 957
 958 # public network
 959 auto vmbr0
 960 iface vmbr0 inet static
 961     address 192.X.Y.57
 962     netmask 255.255.250.0
 963     gateway 192.X.Y.1
 964     bridge_ports eno1
 965     bridge_stp off
 966     bridge_fd 0
 967
 968 # cluster network
 969 auto eno2
 970 iface eno2 inet static
 971     address  10.1.1.1
 972     netmask  255.255.255.0
 973
 974 # fast network
 975 auto eno3
 976 iface eno3 inet static
 977     address  10.1.2.1
 978     netmask  255.255.255.0
 979 ----
 980
 981 Here, we will use the network 10.1.2.0/24 as a migration network. For
 982 a single migration, you can do this using the `migration_network`
 983 parameter of the command line tool:
 984
 985 ----
 986 # qm migrate 106 tre --online --migration_network 10.1.2.0/24
 987 ----
 988
 989 To configure this as the default network for all migrations in the
 990 cluster, set the `migration` property of the `/etc/pve/datacenter.cfg`
 991 file:
 992
 993 ----
 994 # use dedicated migration network
 995 migration: secure,network=10.1.2.0/24
 996 ----
 997
 998 NOTE: The migration type must always be set when the migration network
 999 gets set in `/etc/pve/datacenter.cfg`.
1000
1001
1002 ifdef::manvolnum[]
1003 include::pve-copyright.adoc[]
1004 endif::manvolnum[]