X-Git-Url: https://git.proxmox.com/?a=blobdiff_plain;f=pvecm.adoc;h=867c6583d768faf8e89c3a9db7fd69b6fd4ffc0d;hb=8c1189b640ae7d10119ff1c046580f48749d38bd;hp=3b2a75da18eb8d3436a6b1982773db9b2617db09;hpb=d8742b0c9cbfbc0a2bac4b342657dc94db079a81;p=pve-docs.git

diff --git a/pvecm.adoc b/pvecm.adoc
index 3b2a75d..867c658 100644
--- a/pvecm.adoc
+++ b/pvecm.adoc
@@ -23,10 +23,300 @@ Cluster Manager
 include::attributes.txt[]
 endif::manvolnum[]

-'pvecm' is a program to manage the cluster configuration. It can be
-used to create a new cluster, join nodes to a cluster, leave the
-cluster, get status information and do various other cluster related
-tasks.
+The {PVE} cluster manager `pvecm` is a tool to create a group of
+physical servers. Such a group is called a *cluster*. We use the
+http://www.corosync.org[Corosync Cluster Engine] for reliable group
+communication, and such a cluster can consist of up to 32 physical
+nodes (probably more, depending on network latency).
+
+`pvecm` can be used to create a new cluster, join nodes to a cluster,
+leave the cluster, get status information and do various other cluster
+related tasks. The Proxmox Cluster file system (pmxcfs) is used to
+transparently distribute the cluster configuration to all cluster
+nodes.
+
+Grouping nodes into a cluster has the following advantages:
+
+* Centralized, web based management
+
+* Multi-master clusters: each node can do all management tasks
+
+* `pmxcfs`: database-driven file system for storing configuration files,
+  replicated in real-time on all nodes using `corosync`.
+
+* Easy migration of Virtual Machines and Containers between physical
+  hosts
+
+* Fast deployment
+
+* Cluster-wide services like firewall and HA
+
+
+Requirements
+------------
+
+* All nodes must be in the same network, as `corosync` uses IP Multicast
+  to communicate between nodes (also see
+  http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
+  ports 5404 and 5405 for cluster communication.
++
+NOTE: Some switches do not support IP multicast by default; it must be
+enabled manually first.
+
+* Date and time have to be synchronized.
+
+* An SSH tunnel on TCP port 22 between nodes is used.
+
+* If you are interested in High Availability, you need to have at
+  least three nodes for reliable quorum. All nodes should have the
+  same version.
+
+* We recommend a dedicated NIC for the cluster traffic, especially if
+  you use shared storage.
+
+NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
+Proxmox VE 4.0 cluster nodes.
+
+
+Preparing Nodes
+---------------
+
+First, install {PVE} on all nodes. Make sure that each node is
+installed with the final hostname and IP configuration. Changing the
+hostname and IP is not possible after cluster creation.
+
+Currently the cluster creation has to be done on the console, so you
+need to log in via `ssh`.
+
+Create the Cluster
+------------------
+
+Log in via `ssh` to the first {pve} node. Use a unique name for your
+cluster. This name cannot be changed later.
+
+ hp1# pvecm create YOUR-CLUSTER-NAME
+
+CAUTION: The cluster name is used to compute the default multicast
+address. Please use unique cluster names if you run more than one
+cluster inside your network.
+
+To check the state of your cluster, use:
+
+ hp1# pvecm status
+
+
+Adding Nodes to the Cluster
+---------------------------
+
+Log in via `ssh` to the node you want to add.
+
+ hp2# pvecm add IP-ADDRESS-CLUSTER
+
+For `IP-ADDRESS-CLUSTER`, use the IP address of an existing cluster node.
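+
+As a concrete sketch, the whole create-and-join workflow could look
+like this (the host names `hp1`/`hp2` and the address `192.168.15.91`
+are taken from the examples below; the cluster name `demo` is just a
+placeholder):
+
+----
+# on the first node: create the cluster and check that it is quorate
+hp1# pvecm create demo
+hp1# pvecm status
+
+# on each node to add: join using the IP of an existing member (here hp1)
+hp2# pvecm add 192.168.15.91
+hp2# pvecm status
+----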
+
+CAUTION: A new node cannot hold any VMs, because you would get
+conflicts with identical VM IDs. Also, all existing configuration in
+`/etc/pve` is overwritten when you join a new node to the cluster. As a
+workaround, use `vzdump` to back up each VM and restore it to a
+different VMID after adding the node to the cluster.
+
+To check the state of the cluster:
+
+ # pvecm status
+
+.Cluster status after adding 4 nodes
+----
+hp2# pvecm status
+Quorum information
+~~~~~~~~~~~~~~~~~~
+Date:             Mon Apr 20 12:30:13 2015
+Quorum provider:  corosync_votequorum
+Nodes:            4
+Node ID:          0x00000001
+Ring ID:          1928
+Quorate:          Yes
+
+Votequorum information
+~~~~~~~~~~~~~~~~~~~~~~
+Expected votes:   4
+Highest expected: 4
+Total votes:      4
+Quorum:           2
+Flags:            Quorate
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+0x00000001          1 192.168.15.91
+0x00000002          1 192.168.15.92 (local)
+0x00000003          1 192.168.15.93
+0x00000004          1 192.168.15.94
+----
+
+If you only want the list of all nodes, use:
+
+ # pvecm nodes
+
+.List Nodes in a Cluster
+----
+hp2# pvecm nodes
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+         1          1 hp1
+         2          1 hp2 (local)
+         3          1 hp3
+         4          1 hp4
+----
+
+
+Remove a Cluster Node
+---------------------
+
+CAUTION: Read the procedure carefully before proceeding, as it may not
+be what you want or need.
+
+Move all virtual machines off the node. Make sure you have no local
+data or backups you want to keep, or save them accordingly.
+
+Log in to one of the remaining nodes via ssh. Check the cluster state
+with `pvecm status` and identify the node to remove with `pvecm nodes`:
+
+----
+hp1# pvecm status
+
+Quorum information
+~~~~~~~~~~~~~~~~~~
+Date:             Mon Apr 20 12:30:13 2015
+Quorum provider:  corosync_votequorum
+Nodes:            4
+Node ID:          0x00000001
+Ring ID:          1928
+Quorate:          Yes
+
+Votequorum information
+~~~~~~~~~~~~~~~~~~~~~~
+Expected votes:   4
+Highest expected: 4
+Total votes:      4
+Quorum:           2
+Flags:            Quorate
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+0x00000001          1 192.168.15.91 (local)
+0x00000002          1 192.168.15.92
+0x00000003          1 192.168.15.93
+0x00000004          1 192.168.15.94
+----
+
+IMPORTANT: At this point you must power off the node to be removed and
+make sure that it does not power on again (in the network) as it is.
+
+----
+hp1# pvecm nodes
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+         1          1 hp1 (local)
+         2          1 hp2
+         3          1 hp3
+         4          1 hp4
+----
+
+Now issue the delete command (here deleting node `hp4`):
+
+ hp1# pvecm delnode hp4
+
+If the operation succeeds, no output is returned. Check the node list
+again with `pvecm nodes` or `pvecm status`; you should see something
+like:
+
+----
+hp1# pvecm status
+
+Quorum information
+~~~~~~~~~~~~~~~~~~
+Date:             Mon Apr 20 12:44:28 2015
+Quorum provider:  corosync_votequorum
+Nodes:            3
+Node ID:          0x00000001
+Ring ID:          1992
+Quorate:          Yes
+
+Votequorum information
+~~~~~~~~~~~~~~~~~~~~~~
+Expected votes:   3
+Highest expected: 3
+Total votes:      3
+Quorum:           3
+Flags:            Quorate
+
+Membership information
+~~~~~~~~~~~~~~~~~~~~~~
+    Nodeid      Votes Name
+0x00000001          1 192.168.15.90 (local)
+0x00000002          1 192.168.15.91
+0x00000003          1 192.168.15.92
+----
+
+IMPORTANT: As said above, it is very important to power off the node
+*before* removal, and make sure that it will *never* power on again
+(in the existing cluster network) as it is.
+
+If you power on the node as it is, the cluster will be damaged, and it
+could be difficult to restore a clean cluster state.
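+
+To summarize, the removal procedure described above (here using the
+example node `hp4`) boils down to the following sketch:
+
+----
+hp1# pvecm nodes          # identify the node to remove
+                          # power off hp4 and keep it powered off
+hp1# pvecm delnode hp4    # remove it from the remaining cluster
+hp1# pvecm status         # verify the node list and quorum
+----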
+
+If, for whatever reason, you want this server to join the same
+cluster again, you have to
+
+* reinstall {pve} on it from scratch
+
+* then join it, as explained in the previous section.
+
+
+Quorum
+------
+
+{pve} uses a quorum-based technique to provide a consistent state among
+all cluster nodes.
+
+[quote, from Wikipedia, Quorum (distributed computing)]
+____
+A quorum is the minimum number of votes that a distributed transaction
+has to obtain in order to be allowed to perform an operation in a
+distributed system.
+____
+
+In case of network partitioning, state changes require that a
+majority of nodes are online. The cluster switches to read-only mode
+if it loses quorum.
+
+NOTE: {pve} assigns a single vote to each node by default.
+
+
+Cluster Cold Start
+------------------
+
+It is obvious that a cluster is not quorate when all nodes are
+offline. This is a common case after a power failure.
+
+NOTE: It is always a good idea to use an uninterruptible power supply
+(``UPS'', also called ``battery backup'') to avoid this state, especially if
+you want HA.
+
+On node startup, the `pve-manager` service is started and waits for
+quorum. Once quorate, it starts all guests which have the `onboot`
+flag set.
+
+When you turn on nodes, or when power comes back after a power failure,
+it is likely that some nodes boot faster than others. Please keep in
+mind that guest startup is delayed until you reach quorum.

 ifdef::manvolnum[]