[[chapter_pvecm]]
ifdef::manvolnum[]
pvecm(1)
========
:pve-toplevel:

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
:pve-toplevel:
endif::manvolnum[]

The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication. There's no explicit limit for the number of nodes in a cluster.
In practice, the actual possible node count may be limited by the host and
network performance. Currently (2021), there are reports of clusters (using
high-end enterprise hardware) with over 50 nodes in production.

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information, and do various other cluster-related
tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* `pmxcfs`: database-driven file system for storing configuration files,
 replicated in real-time on all nodes using `corosync`

* Easy migration of virtual machines and containers between physical
 hosts

* Fast deployment

* Cluster-wide services like firewall and HA


Requirements
------------

* All nodes must be able to connect to each other via UDP ports 5404 and 5405
 for corosync to work.

* Date and time have to be synchronized.

* An SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need to have at
 least three nodes for reliable quorum. All nodes should have the
 same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
 you use shared storage.

* The root password of a cluster node is required for adding nodes.

NOTE: It is not possible to mix {pve} 3.x and earlier with {pve} 4.x cluster
nodes.

NOTE: While it's possible to mix {pve} 4.4 and {pve} 5.0 nodes, doing so is
not supported as a production configuration and should only be done temporarily,
while upgrading the whole cluster from one major version to another.

NOTE: Running a cluster of {pve} 6.x with earlier versions is not possible. The
cluster protocol (corosync) between {pve} 6.x and earlier versions changed
fundamentally. The corosync 3 packages for {pve} 5.4 are only intended for the
upgrade procedure to {pve} 6.0.

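Before creating or joining a cluster, it can help to verify some of these
requirements from each node. The following is only a minimal sketch, assuming a
second node at the placeholder address 10.10.10.2; `timedatectl` and `ssh` are
part of a standard {pve} installation.

[source,bash]
----
# check that the system clock is NTP-synchronized on this node
timedatectl status | grep -i 'synchronized'

# verify that SSH on TCP port 22 is reachable on another node
ssh root@10.10.10.2 true
----
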
Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

While it's common to reference all node names and their IPs in `/etc/hosts` (or
make their names resolvable through other means), this is not necessary for a
cluster to work. It may be useful however, as you can then connect from one node
to the other via SSH, using the easier-to-remember node name (see also
xref:pvecm_corosync_addresses[Link Address Types]). Note that we always
recommend referencing nodes by their IP addresses in the cluster configuration.

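As an illustration, such `/etc/hosts` entries could look like the following
sketch; the names and addresses are placeholders, not defaults:

----
10.10.10.1 hp1.example.com hp1
10.10.10.2 hp2.example.com hp2
10.10.10.3 hp3.example.com hp3
----
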
[[pvecm_create_cluster]]
Create a Cluster
----------------

You can either create a cluster on the console (login via `ssh`), or through
the API using the {pve} web interface (__Datacenter -> Cluster__).

NOTE: Use a unique name for your cluster. This name cannot be changed later.
The cluster name follows the same rules as node names.

[[pvecm_cluster_create_via_gui]]
Create via Web GUI
~~~~~~~~~~~~~~~~~~

[thumbnail="screenshot/gui-cluster-create.png"]

Under __Datacenter -> Cluster__, click on *Create Cluster*. Enter the cluster
name and select a network connection from the dropdown to serve as the main
cluster network (Link 0). It defaults to the IP resolved via the node's
hostname.

To add a second link as fallback, you can select the 'Advanced' checkbox and
choose an additional network interface (Link 1, see also
xref:pvecm_redundancy[Corosync Redundancy]).

NOTE: Ensure that the network selected for cluster communication is not used for
any high-traffic loads, like those of (network) storage or live migration.
While the cluster network itself produces small amounts of data, it is very
sensitive to latency. Check out the full
xref:pvecm_cluster_network_requirements[cluster network requirements].

[[pvecm_cluster_create_via_cli]]
Create via Command Line
~~~~~~~~~~~~~~~~~~~~~~~

Log in via `ssh` to the first {pve} node and run the following command:

----
hp1# pvecm create CLUSTERNAME
----

To check the state of the new cluster, use:

----
hp1# pvecm status
----

Multiple Clusters In Same Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to create multiple clusters in the same physical or logical
network. Each such cluster must have a unique name to avoid possible clashes in
the cluster communication stack. This also helps avoid human confusion by making
clusters clearly distinguishable.

While the bandwidth requirement of a corosync cluster is relatively low, the
latency of packets and the packets per second (PPS) rate are the limiting
factors. Different clusters in the same network can compete with each other for
these resources, so it may still make sense to use separate physical network
infrastructure for bigger clusters.

[[pvecm_join_node_to_cluster]]
Adding Nodes to the Cluster
---------------------------

CAUTION: A node that is about to be added to the cluster cannot hold any guests.
All existing configuration in `/etc/pve` is overwritten when joining a cluster,
since guest IDs could conflict. As a workaround, create a backup of the
guest (`vzdump`) and restore it under a different ID after the node has been
added to the cluster.

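As a rough sketch of that workaround (the VMIDs 100 and 120, the node name, and
the dump directory are placeholders; for containers, `pct restore` would be the
counterpart of `qmrestore`):

----
# on the node that is about to join, before joining the cluster
hp2# vzdump 100 --dumpdir /mnt/backup --mode stop

# after hp2 has joined, restore the guest under a new, unused VMID
hp2# qmrestore /mnt/backup/vzdump-qemu-100-*.vma 120
----
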
Join Node to Cluster via GUI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[thumbnail="screenshot/gui-cluster-join-information.png"]

Log in to the web interface on an existing cluster node. Under __Datacenter ->
Cluster__, click the *Join Information* button at the top. Then, click on the
button *Copy Information*. Alternatively, copy the string from the 'Information'
field manually.

[thumbnail="screenshot/gui-cluster-join.png"]

Next, log in to the web interface on the node you want to add.
Under __Datacenter -> Cluster__, click on *Join Cluster*. Fill in the
'Information' field with the 'Join Information' text you copied earlier.
Most settings required for joining the cluster will be filled out
automatically. For security reasons, the cluster password has to be entered
manually.

NOTE: To enter all required data manually, you can disable the 'Assisted Join'
checkbox.

After clicking the *Join* button, the cluster join process will start
immediately. After the node has joined the cluster, its current node certificate
will be replaced by one signed by the cluster certificate authority (CA). This
means that the current session will stop working after a few seconds. You then
might need to force-reload the web interface and log in again with the cluster
credentials.

Now your node should be visible under __Datacenter -> Cluster__.

Join Node to Cluster via Command Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Log in via `ssh` to the node you want to join into an existing cluster.

----
hp2# pvecm add IP-ADDRESS-CLUSTER
----

For `IP-ADDRESS-CLUSTER`, use the IP or hostname of an existing cluster node.
An IP address is recommended (see xref:pvecm_corosync_addresses[Link Address Types]).


To check the state of the cluster use:

----
# pvecm status
----

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1/8
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want a list of all nodes, use:

----
# pvecm nodes
----

.List nodes in a cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----

[[pvecm_adding_nodes_with_separated_cluster_network]]
Adding Nodes With Separated Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When adding a node to a cluster with a separated cluster network, you need to
use the 'link0' parameter to set the node's address on that network:

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -link0 LOCAL-IP-ADDRESS-LINK0
----

If you want to use the built-in xref:pvecm_redundancy[redundancy] of the
kronosnet transport layer, also use the 'link1' parameter.

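A combined call, using both link parameters, could then look like this sketch
(same placeholder style as above):

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -link0 LOCAL-IP-ADDRESS-LINK0 -link1 LOCAL-IP-ADDRESS-LINK1
----
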
Using the GUI, you can select the correct interface from the corresponding
'Link 0' and 'Link 1' fields in the *Cluster Join* dialog.

Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may not be
what you want or need.

Move all virtual machines off the node. Make sure you have no local
data or backups that you want to keep, or save them accordingly.
In the following example, we will remove the node hp4 from the cluster.

Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
command to identify the node ID to remove:

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----


At this point, you must power off hp4 and make sure that it will not power on
again (in the network) as it is.

IMPORTANT: As mentioned above, it is critical to power off the node
*before* removal, and make sure that it will *never* power on again
(in the existing cluster network) as it is.
If you power on the node as it is, the cluster could end up in a broken state,
and it could be difficult to restore a clean cluster state.

After powering off the node hp4, we can safely remove it from the cluster.

----
hp1# pvecm delnode hp4
Killing node 4
----

Use `pvecm nodes` or `pvecm status` to check the node list again. It should
look something like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/8
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

If, for whatever reason, you want this server to join the same cluster again,
you have to:

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.

NOTE: After removal of the node, its SSH fingerprint will still reside in the
'known_hosts' of the other nodes. If you receive an SSH error after rejoining
a node with the same IP or hostname, run `pvecm updatecerts` once on the
re-added node to update its fingerprint cluster-wide.

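For example, on the freshly re-added node:

----
hp4# pvecm updatecerts
----
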
[[pvecm_separate_node_without_reinstall]]
Separate A Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CAUTION: This is *not* the recommended method, proceed with caution. Use the
above-mentioned method if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster, it will still have
access to any shared storage! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as storage locking doesn't work over the cluster
boundary. Furthermore, it may also lead to VMID conflicts.

It's suggested that you create a new storage, where only the node which you want
to separate has access. This can be a new export on your NFS or a new Ceph
pool, to name a few examples. It's just important that the exact same storage
does not get accessed by multiple clusters. After setting up this storage, move
all data and VMs from the node to it. Then you are ready to separate the
node from the cluster.

WARNING: Ensure that all shared resources are cleanly separated! Otherwise you
will run into conflicts and problems.

First, stop the corosync and pve-cluster services on the node:
[source,bash]
----
systemctl stop pve-cluster
systemctl stop corosync
----

Start the cluster file system again in local mode:
[source,bash]
----
pmxcfs -l
----

Delete the corosync configuration files:
[source,bash]
----
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
----

You can now start the file system again as a normal service:
[source,bash]
----
killall pmxcfs
systemctl start pve-cluster
----

The node is now separated from the cluster. You can delete it from a remaining
node of the cluster with:
[source,bash]
----
pvecm delnode oldnode
----

If the command fails because the remaining node in the cluster lost quorum
when the now separated node exited, you may set the expected votes to 1 as a workaround:
[source,bash]
----
pvecm expected 1
----

And then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all the remaining files left
over from the old cluster. This ensures that the node can be added to another
cluster again without problems.

[source,bash]
----
rm /var/lib/corosync/*
----

As the configuration files from the other nodes are still in the cluster
file system, you may want to clean those up too. Simply remove the whole
directory recursively from '/etc/pve/nodes/NODENAME', but check three times
that you used the correct one before deleting it.

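As a sketch, with 'hp4' standing in for the separated node's name:

[source,bash]
----
rm -rf /etc/pve/nodes/hp4
----
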
CAUTION: The node's SSH keys will remain in the 'authorized_keys' file. This
means that the nodes can still connect to each other with public key
authentication. You should fix this by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.


Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.

NOTE: {pve} assigns a single vote to each node by default.


Cluster Network
---------------

The cluster network is the core of a cluster. All messages sent over it have to
be delivered reliably to all nodes in their respective order. In {pve} this
part is done by corosync, an implementation of a high-performance, low-overhead,
high-availability development toolkit. It serves our decentralized
configuration file system (`pmxcfs`).

[[pvecm_cluster_network_requirements]]
Network Requirements
~~~~~~~~~~~~~~~~~~~~
This needs a reliable network with latencies under 2 milliseconds (LAN
performance) to work properly. The network should not be used heavily by other
members; ideally corosync runs on its own network. Do not use a shared network
for corosync and storage (except as a potential low-priority fallback in a
xref:pvecm_redundancy[redundant] configuration).

Before setting up a cluster, it is good practice to check if the network is fit
for that purpose. To make sure the nodes can connect to each other on the
cluster network, you can test the connectivity between them with the `ping`
tool.
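
For example, a quick latency check from one node to another could look like the
following sketch (the address is a placeholder); the reported round-trip times
should stay well below the 2 ms requirement:

----
# ping -c 10 -q 10.10.10.2
----
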
e4ec4154 | 507 | |
a9e7c3aa SR |
If the {pve} firewall is enabled, ACCEPT rules for corosync will automatically
be generated - no manual action is required.

NOTE: Corosync used Multicast before version 3.0 (introduced in {pve} 6.0).
Modern versions rely on https://kronosnet.org/[Kronosnet] for cluster
communication, which, for now, only supports regular UDP unicast.

CAUTION: You can still enable Multicast or legacy unicast by setting your
transport to `udp` or `udpu` in your xref:pvecm_edit_corosync_conf[corosync.conf],
but keep in mind that this will disable all cryptography and redundancy support.
This is therefore not recommended.

Separate Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~

When creating a cluster without any parameters, the corosync cluster network is
generally shared with the web interface and the VMs and their traffic. Depending
on your setup, even storage traffic may get sent over the same network. It's
recommended to change that, as corosync is a time-critical, real-time
application.

Setting Up A New Network
^^^^^^^^^^^^^^^^^^^^^^^^

First, you have to set up a new network interface. It should be on a physically
separate network. Ensure that your network fulfills the
xref:pvecm_cluster_network_requirements[cluster network requirements].

Separate On Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is possible via the 'linkX' parameters of the 'pvecm create'
command, used for creating a new cluster.

If you have set up an additional NIC with a static address on 10.10.10.1/25,
and want to send and receive all cluster communication over this interface,
you would execute:

[source,bash]
----
pvecm create test --link0 10.10.10.1
----

To check if everything is working properly, execute:
[source,bash]
----
systemctl status corosync
----

Afterwards, proceed as described above to
xref:pvecm_adding_nodes_with_separated_cluster_network[add nodes with a separated cluster network].

[[pvecm_separate_cluster_net_after_creation]]
562 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
563 | ||
a9e7c3aa | 564 | You can do this if you have already created a cluster and want to switch |
e4ec4154 TL |
565 | its communication to another network, without rebuilding the whole cluster. |
566 | This change may lead to short durations of quorum loss in the cluster, as nodes | |
567 | have to restart corosync and come up one after the other on the new network. | |
568 | ||
3254bfdd | 569 | Check how to xref:pvecm_edit_corosync_conf[edit the corosync.conf file] first. |
a9e7c3aa | 570 | Then, open it and you should see a file similar to: |
e4ec4154 TL |
571 | |
572 | ---- | |
573 | logging { | |
574 | debug: off | |
575 | to_syslog: yes | |
576 | } | |
577 | ||
578 | nodelist { | |
579 | ||
580 | node { | |
581 | name: due | |
582 | nodeid: 2 | |
583 | quorum_votes: 1 | |
584 | ring0_addr: due | |
585 | } | |
586 | ||
587 | node { | |
588 | name: tre | |
589 | nodeid: 3 | |
590 | quorum_votes: 1 | |
591 | ring0_addr: tre | |
592 | } | |
593 | ||
594 | node { | |
595 | name: uno | |
596 | nodeid: 1 | |
597 | quorum_votes: 1 | |
598 | ring0_addr: uno | |
599 | } | |
600 | ||
601 | } | |
602 | ||
603 | quorum { | |
604 | provider: corosync_votequorum | |
605 | } | |
606 | ||
607 | totem { | |
a9e7c3aa | 608 | cluster_name: testcluster |
e4ec4154 | 609 | config_version: 3 |
a9e7c3aa | 610 | ip_version: ipv4-6 |
e4ec4154 TL |
611 | secauth: on |
612 | version: 2 | |
613 | interface { | |
a9e7c3aa | 614 | linknumber: 0 |
e4ec4154 TL |
615 | } |
616 | ||
617 | } | |
618 | ---- | |
619 | ||
a9e7c3aa SR |
620 | NOTE: `ringX_addr` actually specifies a corosync *link address*, the name "ring" |
621 | is a remnant of older corosync versions that is kept for backwards | |
622 | compatibility. | |
623 | ||
624 | The first thing you want to do is add the 'name' properties in the node entries | |
625 | if you do not see them already. Those *must* match the node name. | |
e4ec4154 | 626 | |
a9e7c3aa SR |
627 | Then replace all addresses from the 'ring0_addr' properties of all nodes with |
628 | the new addresses. You may use plain IP addresses or hostnames here. If you use | |
270757a1 | 629 | hostnames ensure that they are resolvable from all nodes. (see also |
a9e7c3aa | 630 | xref:pvecm_corosync_addresses[Link Address Types]) |
e4ec4154 | 631 | |
a9e7c3aa SR |
632 | In this example, we want to switch the cluster communication to the |
633 | 10.10.10.1/25 network. So we replace all 'ring0_addr' respectively. | |
e4ec4154 | 634 | |
a9e7c3aa SR |
635 | NOTE: The exact same procedure can be used to change other 'ringX_addr' values |
636 | as well, although we recommend to not change multiple addresses at once, to make | |
637 | it easier to recover if something goes wrong. | |
638 | ||
639 | After we increase the 'config_version' property, the new configuration file | |
e4ec4154 TL |
640 | should look like: |
641 | ||
642 | ---- | |
e4ec4154 TL |
643 | logging { |
644 | debug: off | |
645 | to_syslog: yes | |
646 | } | |
647 | ||
648 | nodelist { | |
649 | ||
650 | node { | |
651 | name: due | |
652 | nodeid: 2 | |
653 | quorum_votes: 1 | |
654 | ring0_addr: 10.10.10.2 | |
655 | } | |
656 | ||
657 | node { | |
658 | name: tre | |
659 | nodeid: 3 | |
660 | quorum_votes: 1 | |
661 | ring0_addr: 10.10.10.3 | |
662 | } | |
663 | ||
664 | node { | |
665 | name: uno | |
666 | nodeid: 1 | |
667 | quorum_votes: 1 | |
668 | ring0_addr: 10.10.10.1 | |
669 | } | |
670 | ||
671 | } | |
672 | ||
673 | quorum { | |
674 | provider: corosync_votequorum | |
675 | } | |
676 | ||
677 | totem { | |
a9e7c3aa | 678 | cluster_name: testcluster |
e4ec4154 | 679 | config_version: 4 |
a9e7c3aa | 680 | ip_version: ipv4-6 |
e4ec4154 TL |
681 | secauth: on |
682 | version: 2 | |
683 | interface { | |
a9e7c3aa | 684 | linknumber: 0 |
e4ec4154 TL |
685 | } |
686 | ||
687 | } | |
688 | ---- | |
689 | ||
a9e7c3aa SR |
Then, after a final check to see that all changed information is correct, we
save it and once again follow the
xref:pvecm_edit_corosync_conf[edit corosync.conf file] section to bring the new
configuration into effect.

The changes will be applied live, so restarting corosync is not strictly
necessary. If you changed other settings as well, or notice corosync
complaining, you can optionally trigger a restart.

On a single node, execute:

[source,bash]
----
systemctl restart corosync
----

Now check if everything is fine:

[source,bash]
----
systemctl status corosync
----

If corosync runs correctly again, restart it on all other nodes too.
They will then join the cluster membership one by one on the new network.

[[pvecm_corosync_addresses]]
Corosync Addresses
~~~~~~~~~~~~~~~~~~

A corosync link address (for backwards compatibility denoted by 'ringX_addr' in
`corosync.conf`) can be specified in two ways:

* **IPv4/v6 addresses** will be used directly. They are recommended, since they
are static and usually not changed carelessly.

* **Hostnames** will be resolved using `getaddrinfo`, which means that by
default, IPv6 addresses will be used first, if available (see also
`man gai.conf`). Keep this in mind, especially when upgrading an existing
cluster to IPv6.

CAUTION: Hostnames should be used with care, since the address they
resolve to can be changed without touching corosync or the node it runs on -
which may lead to a situation where an address is changed without thinking
about implications for corosync.

A separate, static hostname specifically for corosync is recommended, if
hostnames are preferred. Also, make sure that every node in the cluster can
resolve all hostnames correctly.

Since {pve} 5.1, while supported, hostnames will be resolved at the time of
entry. Only the resolved IP is then saved to the configuration.

Nodes that joined the cluster on earlier versions likely still use their
unresolved hostname in `corosync.conf`. It might be a good idea to replace
them with IPs or a separate hostname, as mentioned above.

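If you prefer hostnames, such dedicated corosync names could be maintained in
`/etc/hosts` on every node, for example like the following sketch (the
'-corosync' suffix and the 10.10.10.x addresses are purely illustrative):

----
10.10.10.1 uno-corosync
10.10.10.2 due-corosync
10.10.10.3 tre-corosync
----
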
[[pvecm_redundancy]]
Corosync Redundancy
-------------------

Corosync supports redundant networking via its integrated kronosnet layer by
default (it is not supported on the legacy udp/udpu transports). It can be
enabled by specifying more than one link address, either via the '--linkX'
parameters of `pvecm`, in the GUI as **Link 1** (while creating a cluster or
adding a new node), or by specifying more than one 'ringX_addr' in
`corosync.conf`.

NOTE: To provide useful failover, every link should be on its own
physical network connection.

Links are used according to a priority setting. You can configure this priority
by setting 'knet_link_priority' in the corresponding interface section in
`corosync.conf`, or, preferably, using the 'priority' parameter when creating
your cluster with `pvecm`:

----
# pvecm create CLUSTERNAME --link0 10.10.10.1,priority=15 --link1 10.20.20.1,priority=20
----

This would cause 'link1' to be used first, since it has the higher priority.

If no priorities are configured manually (or two links have the same priority),
links will be used in order of their number, with the lower number having higher
priority.

Even if all links are working, only the one with the highest priority will see
corosync traffic. Link priorities cannot be mixed, meaning that links with
different priorities will not be able to communicate with each other.

Since lower priority links will not see traffic unless all higher priorities
have failed, it becomes a useful strategy to specify even networks used for
other tasks (VMs, storage, etc.) as low-priority links. If worst comes to
worst, a higher latency or more congested connection might be better than no
connection at all.

Adding Redundant Links To An Existing Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To add a new link to a running configuration, first check how to
xref:pvecm_edit_corosync_conf[edit the corosync.conf file].

Then, add a new 'ringX_addr' to every node in the `nodelist` section. Make
sure that your 'X' is the same for every node you add it to, and that it is
unique for each node.

Lastly, add a new 'interface', as shown below, to your `totem`
section, replacing 'X' with the link number chosen above.

Assuming you added a link with number 1, the new configuration file could look
like this:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
    ring1_addr: 10.20.20.2
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
    ring1_addr: 10.20.20.3
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
    ring1_addr: 10.20.20.1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: testcluster
  config_version: 4
  ip_version: ipv4-6
  secauth: on
  version: 2
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}
----

The new link will be enabled as soon as you follow the last steps to
xref:pvecm_edit_corosync_conf[edit the corosync.conf file]. A restart should not
be necessary. You can check that corosync loaded the new link using:

----
journalctl -b -u corosync
----

It might be a good idea to test the new link by temporarily disconnecting the
old link on one node and making sure that its status remains online while
disconnected:

----
pvecm status
----

If you see a healthy cluster state, it means that your new link is being used.


Role of SSH in {PVE} Clusters
-----------------------------

{PVE} utilizes SSH tunnels for various features.

* Proxying console/shell sessions (node and guests)
+
When using the shell for node B while being connected to node A, the connection
goes to a terminal proxy on node A, which is in turn connected to the login
shell on node B via a non-interactive SSH tunnel.

* VM and CT memory and local-storage migration in 'secure' mode.
+
During the migration, one or more SSH tunnel(s) are established between the
source and target nodes, in order to exchange migration information and
transfer memory and disk contents.

* Storage replication

.Pitfalls due to automatic execution of `.bashrc` and siblings
[IMPORTANT]
====
In case you have a custom `.bashrc`, or similar files that get executed on
login by the configured shell, `ssh` will automatically run it once the session
is established successfully. This can cause some unexpected behavior, as those
commands may be executed with root permissions on any of the operations
described above. This can cause possible problematic side-effects!

In order to avoid such complications, it's recommended to add a check in
`/root/.bashrc` to make sure the session is interactive, and only then run
`.bashrc` commands.

You can add this snippet at the beginning of your `.bashrc` file:

----
# Early exit if not running interactively to avoid side-effects!
case $- in
    *i*) ;;
      *) return;;
esac
----
====


Corosync External Vote Support
------------------------------

This section describes a way to deploy an external voter in a {pve} cluster.
When configured, the cluster can sustain more node failures without
violating safety properties of the cluster communication.

For this to work, there are two services involved:

* a so-called QDevice daemon which runs on each {pve} node

* an external vote daemon which runs on an independent server

As a result, you can achieve higher availability, even in smaller setups (for
example 2+1 nodes).

QDevice Technical Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~

The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster
node. It provides a configured number of votes to the cluster's quorum
subsystem, based on an externally running third-party arbitrator's decision.
Its primary use is to allow a cluster to sustain more node failures than
standard quorum rules allow. This can be done safely as the external device
can see all nodes and thus choose only one set of nodes to give its vote.
This will only be done if said set of nodes can have quorum (again) after
receiving the third-party vote.

Currently, only 'QDevice Net' is supported as a third-party arbitrator. It is
a daemon which provides a vote to a cluster partition, if it can reach the
partition members over the network. It will only give votes to one partition
of a cluster at any time.
It's designed to support multiple clusters and is almost configuration and
state free. New clusters are handled dynamically and no configuration file
is needed on the host running a QDevice.

The only requirements for the external host are that it needs network access to
the cluster and has a corosync-qnetd package available. We provide such a
package for Debian based hosts; other Linux distributions should also have a
package available through their respective package manager.

NOTE: In contrast to corosync itself, a QDevice connects to the cluster over
TCP/IP. The daemon may even run outside of the cluster's LAN and can have longer
latencies than 2 ms.

Supported Setups
~~~~~~~~~~~~~~~~

We support QDevices for clusters with an even number of nodes and recommend
it for 2 node clusters, if they should provide higher availability.
For clusters with an odd node count, we currently discourage the use of
QDevices. The reason for this is the difference in the votes which the QDevice
provides for each cluster type. Even numbered clusters get a single additional
vote, which only increases availability, because if the QDevice
itself fails, you are in the same position as with no QDevice at all.

On the other hand, with an odd numbered cluster size, the QDevice provides
'(N-1)' votes -- where 'N' corresponds to the cluster node count. This
difference makes sense; if it provided only one additional vote, the cluster
could get into a split-brain situation.
This algorithm allows for all nodes but one (and naturally the
QDevice itself) to fail. There are two drawbacks to this:

* If the QNet daemon itself fails, no other node may fail or the cluster
 immediately loses quorum. For example, in a cluster with 15 nodes, 7
 could fail before the cluster becomes inquorate. But, if a QDevice is
 configured here and it fails itself, **no single node** of
 the 15 may fail. The QDevice acts almost as a single point of failure in
 this case.

* The fact that all but one node plus the QDevice may fail sounds promising at
 first, but this may result in a mass recovery of HA services, which could
 overload the single remaining node. Furthermore, a Ceph server will stop
 providing services if only '((N-1)/2)' nodes or less remain online.

If you understand the drawbacks and implications, you can decide yourself
whether you want to use this technology in an odd numbered cluster setup.

QDevice-Net Setup
~~~~~~~~~~~~~~~~~

We recommend running any daemon which provides votes to corosync-qdevice as an
unprivileged user. {pve} and Debian provide a package which is already
configured to do so.
The traffic between the daemon and the cluster must be encrypted to ensure a
safe and secure integration of the QDevice in {pve}.

First, install the 'corosync-qnetd' package on your external server

----
external# apt install corosync-qnetd
----

and the 'corosync-qdevice' package on all cluster nodes

----
pve# apt install corosync-qdevice
----

After doing this, ensure that all the nodes in the cluster are online.

You can now easily set up your QDevice by running the following command on one
of the {pve} nodes:

----
pve# pvecm qdevice setup <QDEVICE-IP>
----

The SSH key from the cluster will be automatically copied to the QDevice.

NOTE: Make sure that the SSH configuration on your external server allows root
login via password, if you are asked for a password during this step.

After you enter the password and all the steps have successfully completed, you
will see "Done". You can check the status now:

----
pve# pvecm status

...

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.22.180 (local)
0x00000002          1    A,V,NMW 192.168.22.181
0x00000000          1            Qdevice

----

which means the QDevice is set up.

Frequently Asked Questions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Tie Breaking
^^^^^^^^^^^^

In case of a tie, where two same-sized cluster partitions cannot see each other
but can see the QDevice, the QDevice chooses one of those partitions randomly
and provides a vote to it.

Possible Negative Implications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For clusters with an even node count, there are no negative implications when
setting up a QDevice. If it fails to work, you are in the same position as
without a QDevice at all.

Adding/Deleting Nodes After QDevice Setup
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to add a new node or remove an existing one from a cluster with a
QDevice setup, you need to remove the QDevice first. After that, you can add or
remove nodes normally. Once you have a cluster with an even node count again,
you can set up the QDevice again, as described above.

Removing the QDevice
^^^^^^^^^^^^^^^^^^^^

If you used the official `pvecm` tool to add the QDevice, you can remove it
by running:

----
pve# pvecm qdevice remove
----

//Still TODO
//^^^^^^^^^^
//There is still stuff to add here


Corosync Configuration
----------------------

The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
controls the cluster membership and its network.
For further information about it, check the corosync.conf man page:
[source,bash]
----
man corosync.conf
----

For node membership, you should always use the `pvecm` tool provided by {pve}.
You may have to edit the configuration file manually for other changes.
Here are a few best practice tips for doing this.

[[pvecm_edit_corosync_conf]]
Edit corosync.conf
~~~~~~~~~~~~~~~~~~

Editing the corosync.conf file is not always very straightforward. There are
two on each cluster node, one in `/etc/pve/corosync.conf` and the other in
`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
propagate the changes to the local one, but not vice versa.

The configuration will get updated automatically, as soon as the file changes.
This means that changes which can be integrated into a running corosync will
take effect immediately. Thus, you should always make a copy and edit that
instead, to avoid triggering unintended changes when saving the file while it is
being edited.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
----

Then, open the config file with your favorite editor, such as `nano` or
`vim.tiny`, which come pre-installed on every {pve} node.

NOTE: Always increment the 'config_version' number after configuration changes;
omitting this can lead to problems.

After making the necessary changes, create another copy of the current working
configuration file. This serves as a backup if the new configuration fails to
apply or causes other issues.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
----

Then move the new configuration file over the old one:
[source,bash]
----
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
----

You can check whether the changes could be applied automatically, using the
following commands:
[source,bash]
----
systemctl status corosync
journalctl -b -u corosync
----

If the changes could not be applied automatically, you may have to restart the
corosync service via:
[source,bash]
----
systemctl restart corosync
----

On errors, check the troubleshooting section below.

Troubleshooting
~~~~~~~~~~~~~~~

Issue: 'quorum.expected_votes must be configured'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When corosync starts to fail and you get the following message in the system log:

----
[...]
corosync[1647]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason
    'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
----

It means that the hostname you set for a corosync 'ringX_addr' in the
configuration could not be resolved.

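You can verify name resolution on each node; a minimal sketch, with 'due'
standing in for the hostname configured as 'ringX_addr':

----
getent hosts due
----
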
Write Configuration When Not Quorate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and you
understand what you are doing, use:
[source,bash]
----
pvecm expected 1
----

This sets the expected vote count to 1 and makes the cluster quorate. You can
then fix your configuration, or revert it back to the last working backup.

This is not enough if corosync cannot start anymore. In that case, it is best to
edit the local copy of the corosync configuration in
'/etc/corosync/corosync.conf', so that corosync can start again. Ensure that on
all nodes, this configuration has the same content to avoid split-brain
situations. If you are not sure what went wrong, it's best to ask the Proxmox
Community to help you.


[[pvecm_corosync_conf_glossary]]
Corosync Configuration Glossary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ringX_addr::
This names the different link addresses for the kronosnet connections between
nodes.

Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, the `pve-guests` service is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.

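For example, to mark a VM or container for automatic start once the node boots
and quorum is reached, you could run the following; the IDs 100 and 101 are
placeholders:

----
# qm set 100 --onboot 1
# pct set 101 --onboot 1
----
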
When you turn on nodes, or when power comes back after power failure,
it is likely that some nodes will boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.


Guest Migration
---------------

Migrating virtual guests to other nodes is a useful feature in a
cluster. There are settings to control the behavior of such
migrations. This can be done via the configuration file
`datacenter.cfg` or for a specific migration via API or command line
parameters.

It makes a difference if a guest is online or offline, or if it has
local resources (like a local disk).

For details about virtual machine migration, see the
xref:qm_migration[QEMU/KVM Migration Chapter].

For details about container migration, see the
xref:pct_migration[Container Migration Chapter].

Migration Type
~~~~~~~~~~~~~~

The migration type defines if the migration data should be sent over an
encrypted (`secure`) channel or an unencrypted (`insecure`) one.
Setting the migration type to `insecure` means that the RAM content of a
virtual guest is also transferred unencrypted, which can lead to
information disclosure of critical data from inside the guest (for
example, passwords or encryption keys).

Therefore, we strongly recommend using the secure channel if you do
not have full control over the network and can not guarantee that no
one is eavesdropping on it.

NOTE: Storage migration does not follow this setting. Currently, it
always sends the storage content over a secure channel.

Encryption requires a lot of computing power, so this setting is often
changed to `insecure` to achieve better performance. The impact on
modern systems is lower because they implement AES encryption in
hardware. The performance impact is particularly evident in fast
networks, where you can transfer 10 Gbps or more.

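For example, to force the unencrypted channel for a single migration only, the
type can be passed on the command line; this is just a sketch, with VMID 106 and
target node 'tre' as placeholders, assuming your `qm` version supports the
`--migration_type` option:

----
# qm migrate 106 tre --online --migration_type insecure
----
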
Migration Network
~~~~~~~~~~~~~~~~~

By default, {pve} uses the network in which cluster communication
takes place to send the migration traffic. This is not optimal, because
sensitive cluster traffic can be disrupted and this network may not
have the best bandwidth available on the node.

Setting the migration network parameter allows the use of a dedicated
network for all migration traffic. In addition to the memory,
this also affects the storage traffic for offline migrations.

The migration network is set as a network using CIDR notation. This
has the advantage that you don't have to set individual IP addresses
for each node. {pve} can determine the real address on the
destination node from the network specified in the CIDR form. To
enable this, the network must be specified so that each node has exactly
one IP in the respective network.

Example
^^^^^^^

We assume that we have a three-node setup, with three separate
networks. One for public communication with the Internet, one for
cluster communication, and a very fast one, which we want to use as a
dedicated network for migration.

A network configuration for such a setup might look as follows:

----
iface eno1 inet manual

# public network
auto vmbr0
iface vmbr0 inet static
    address 192.X.Y.57
    netmask 255.255.250.0
    gateway 192.X.Y.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

# cluster network
auto eno2
iface eno2 inet static
    address  10.1.1.1
    netmask  255.255.255.0

# fast network
auto eno3
iface eno3 inet static
    address  10.1.2.1
    netmask  255.255.255.0
----

Here, we will use the network 10.1.2.0/24 as a migration network. For
a single migration, you can do this using the `migration_network`
parameter of the command line tool:

----
# qm migrate 106 tre --online --migration_network 10.1.2.0/24
----

To configure this as the default network for all migrations in the
cluster, set the `migration` property of the `/etc/pve/datacenter.cfg`
file:

----
# use dedicated migration network
migration: secure,network=10.1.2.0/24
----

NOTE: The migration type must always be set when the migration network
is set in `/etc/pve/datacenter.cfg`.

ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]