ifdef::manvolnum[]
PVE({manvolnum})
================
include::attributes.txt[]

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
include::attributes.txt[]
endif::manvolnum[]

The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such clusters can consist of up to 32 physical nodes
(probably more, depending on network latency).

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information, and do various other
cluster-related tasks. The **P**rox**m**o**x** **C**luster **F**ile
**S**ystem (``pmxcfs'') is used to transparently distribute the cluster
configuration to all cluster nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* `pmxcfs`: database-driven file system for storing configuration files,
replicated in real-time on all nodes using `corosync`.

* Easy migration of virtual machines and containers between physical
hosts

* Fast deployment

* Cluster-wide services like firewall and HA


Requirements
------------

* All nodes must be in the same network, as `corosync` uses IP Multicast
to communicate between nodes (also see
http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
ports 5404 and 5405 for cluster communication. A multicast test sketch
is shown below this list.
+
NOTE: Some switches do not support IP multicast by default and must be
manually enabled first.

* Date and time have to be synchronized.

* An SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need to have at
least three nodes for reliable quorum. All nodes should have the
same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
you use shared storage.

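The following is a minimal sketch of how multicast connectivity could be
tested with the `omping` tool, assuming the `omping` package is installed
on all nodes (the host names `hp1`, `hp2` and `hp3` are examples):

[source,bash]
# run this on every node at roughly the same time and check that the
# reported multicast loss stays near 0%
omping -c 600 -i 1 -q hp1 hp2 hp3
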
NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
Proxmox VE 4.0 cluster nodes.


Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

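As a quick sanity check before you create the cluster, you can, for example,
verify that the node resolves its own (final) hostname to the address you
intend to use:

[source,bash]
# both the name and the resolved address should already be final
hostname
hostname --ip-address
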
Currently, the cluster creation has to be done on the console, so you
need to log in via `ssh`.

8a865621 | 91 | Create the Cluster |
ceabe189 | 92 | ------------------ |
8a865621 | 93 | |
8c1189b6 FG |
94 | Login via `ssh` to the first {pve} node. Use a unique name for your cluster. |
95 | This name cannot be changed later. | |
8a865621 DM |
96 | |
97 | hp1# pvecm create YOUR-CLUSTER-NAME | |
98 | ||
CAUTION: The cluster name is used to compute the default multicast
address. Please use unique cluster names if you run more than one
cluster inside your network.

To check the state of your cluster use:

 hp1# pvecm status


Adding Nodes to the Cluster
---------------------------

Log in via `ssh` to the node you want to add.

 hp2# pvecm add IP-ADDRESS-CLUSTER

For `IP-ADDRESS-CLUSTER`, use the IP from an existing cluster node.

5eba0743 | 117 | CAUTION: A new node cannot hold any VMs, because you would get |
7980581f | 118 | conflicts about identical VM IDs. Also, all existing configuration in |
8c1189b6 FG |
119 | `/etc/pve` is overwritten when you join a new node to the cluster. To |
120 | workaround, use `vzdump` to backup and restore to a different VMID after | |
7980581f | 121 | adding the node to the cluster. |
8a865621 DM |
122 | |
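The following is a minimal sketch of that workaround for a QEMU guest; the
VMID `100`, the target VMID `120`, the storage and the archive name are only
examples (containers can be handled analogously with `vzdump` and
`pct restore`):

[source,bash]
# on the joining node, back up the guest *before* joining the cluster
vzdump 100 -storage local
# after the join, restore the archive under a VMID that is free cluster-wide
qmrestore /var/lib/vz/dump/vzdump-qemu-100-*.vma 120
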
To check the state of the cluster:

 # pvecm status

ceabe189 | 127 | .Cluster status after adding 4 nodes |
8a865621 DM |
128 | ---- |
129 | hp2# pvecm status | |
130 | Quorum information | |
131 | ~~~~~~~~~~~~~~~~~~ | |
132 | Date: Mon Apr 20 12:30:13 2015 | |
133 | Quorum provider: corosync_votequorum | |
134 | Nodes: 4 | |
135 | Node ID: 0x00000001 | |
136 | Ring ID: 1928 | |
137 | Quorate: Yes | |
138 | ||
139 | Votequorum information | |
140 | ~~~~~~~~~~~~~~~~~~~~~~ | |
141 | Expected votes: 4 | |
142 | Highest expected: 4 | |
143 | Total votes: 4 | |
144 | Quorum: 2 | |
145 | Flags: Quorate | |
146 | ||
147 | Membership information | |
148 | ~~~~~~~~~~~~~~~~~~~~~~ | |
149 | Nodeid Votes Name | |
150 | 0x00000001 1 192.168.15.91 | |
151 | 0x00000002 1 192.168.15.92 (local) | |
152 | 0x00000003 1 192.168.15.93 | |
153 | 0x00000004 1 192.168.15.94 | |
154 | ---- | |
155 | ||
If you only want the list of all nodes, use:

 # pvecm nodes

.List nodes in a cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----


Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may not
be what you want or need.

Move all virtual machines from the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.

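Virtual machines can, for example, be moved away with a migration command
like the following (VMID `100` and target node `hp1` are examples; use
`pct migrate` for containers):

[source,bash]
# migrate VM 100 from the node to be removed to another cluster node
qm migrate 100 hp1 -online
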
8c1189b6 | 183 | Log in to one remaining node via ssh. Issue a `pvecm nodes` command to |
7980581f | 184 | identify the node ID: |
8a865621 DM |
185 | |
186 | ---- | |
187 | hp1# pvecm status | |
188 | ||
189 | Quorum information | |
190 | ~~~~~~~~~~~~~~~~~~ | |
191 | Date: Mon Apr 20 12:30:13 2015 | |
192 | Quorum provider: corosync_votequorum | |
193 | Nodes: 4 | |
194 | Node ID: 0x00000001 | |
195 | Ring ID: 1928 | |
196 | Quorate: Yes | |
197 | ||
198 | Votequorum information | |
199 | ~~~~~~~~~~~~~~~~~~~~~~ | |
200 | Expected votes: 4 | |
201 | Highest expected: 4 | |
202 | Total votes: 4 | |
203 | Quorum: 2 | |
204 | Flags: Quorate | |
205 | ||
206 | Membership information | |
207 | ~~~~~~~~~~~~~~~~~~~~~~ | |
208 | Nodeid Votes Name | |
209 | 0x00000001 1 192.168.15.91 (local) | |
210 | 0x00000002 1 192.168.15.92 | |
211 | 0x00000003 1 192.168.15.93 | |
212 | 0x00000004 1 192.168.15.94 | |
213 | ---- | |
214 | ||
215 | IMPORTANT: at this point you must power off the node to be removed and | |
216 | make sure that it will not power on again (in the network) as it | |
217 | is. | |
218 | ||
Use the `pvecm nodes` command to identify the ID of the node to remove:

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----

Log in to one remaining node via ssh. Issue the delete command (here
deleting node `hp4`):

 hp1# pvecm delnode hp4

If the operation succeeds, no output is returned. Just check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

IMPORTANT: As mentioned above, it is very important to power off the node
*before* removal, and to make sure that it will *never* power on again
(in the existing cluster network) as it is.

If you power on the node as it is, your cluster will be damaged, and it
could be difficult to restore a clean cluster state.

If, for whatever reason, you want this server to join the same
cluster again, you have to

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.

Separate a Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CAUTION: This is *not* the recommended method, proceed with caution. Use the
method described above if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster, it will still have
access to the shared storage! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as this leads to VMID conflicts.

Move the guests which you want to keep on this node now; after the removal,
you can only do this via backup and restore. It is suggested that you create
a new storage to which only the node to be separated has access. This can be
a new export on your NFS server or a new Ceph pool, to name a few examples.
It is just important that the exact same storage is not accessed by multiple
clusters. After setting up this storage, move all data from the node and its
VMs to it. Then you are ready to separate the node from the cluster.

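As a sketch of such a setup, a dedicated NFS storage could be added and
restricted to the node that will be separated (the storage ID, server
address, export path and node name are placeholders):

[source,bash]
# only the node to be separated is allowed to use this storage
pvesm add nfs separate-nfs -server 192.168.15.200 \
    -export /export/separate -content images,rootdir -nodes hp4
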
WARNING: Ensure all shared resources are cleanly separated! Otherwise you
will run into conflicts and problems.

First, stop the corosync and pve-cluster services on the node:
[source,bash]
systemctl stop pve-cluster
systemctl stop corosync

Start the cluster filesystem again in local mode:
[source,bash]
pmxcfs -l

Delete the corosync configuration files:
[source,bash]
rm /etc/pve/corosync.conf
rm /etc/corosync/*

You can now start the filesystem again as a normal service:
[source,bash]
killall pmxcfs
systemctl start pve-cluster

The node is now separated from the cluster. You can delete it from any
remaining node of the cluster with:
[source,bash]
pvecm delnode oldnode

If the command fails because the remaining node in the cluster lost quorum
when the now separated node exited, you may set the expected votes to 1 as
a workaround:
[source,bash]
pvecm expected 1

Then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all remaining files left
over from the old cluster. This ensures that the node can be added to another
cluster again without problems.

[source,bash]
rm /var/lib/corosync/*

As the configuration files from the other nodes are still in the cluster
filesystem, you may want to clean those up too. Simply remove the whole
directory recursively under '/etc/pve/nodes/NODENAME', but check three times
that you used the correct one before deleting it.

CAUTION: The node's SSH keys are still in the 'authorized_keys' file. This
means the nodes can still connect to each other with public key
authentication. Fix this by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file, as sketched below.

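A minimal sketch of that cleanup, assuming the removed node was called
`oldnode` and its key entries carry the usual `root@oldnode` comment
(review the matches before deleting anything):

[source,bash]
# show the key entries that belong to the removed node
grep oldnode /etc/pve/priv/authorized_keys
# then remove them, e.g. with an editor or with sed
sed -i '/oldnode/d' /etc/pve/priv/authorized_keys
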
Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.

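To quickly check whether the node you are logged in to is currently part of
a quorate partition, you can, for example, look at the `Quorate` flag in the
status output:

[source,bash]
# prints "Quorate: Yes" while this node is part of the quorate partition
pvecm status | grep -i quorate
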
NOTE: {pve} assigns a single vote to each node by default.


Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, service `pve-manager` is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.

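The `onboot` flag is set per guest, for example for a virtual machine like
this (VMID `100` is just an example; use `pct set` for containers):

[source,bash]
# start VM 100 automatically once the node has booted and is quorate
qm set 100 -onboot 1
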
When you turn on nodes, or when power comes back after a power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]