1 .. _adding-and-removing-monitors:
2
3 ==========================
4 Adding/Removing Monitors
5 ==========================
6
7 When you have a cluster up and running, you may add or remove monitors
8 from the cluster at runtime. To bootstrap a monitor, see `Manual Deployment`_
9 or `Monitor Bootstrap`_.
10
11 .. _adding-monitors:
12
13 Adding Monitors
14 ===============
15
16 Ceph monitors are lightweight processes that are the single source of truth
17 for the cluster map. You can run a cluster with 1 monitor but we recommend at least 3
18 for a production cluster. Ceph monitors use a variation of the
19 `Paxos`_ algorithm to establish consensus about maps and other critical
20 information across the cluster. Due to the nature of Paxos, Ceph requires
21 a majority of monitors to be active to establish a quorum (thus establishing
22 consensus).
23
It is advisable to run an odd number of monitors, because adding one monitor
to an odd-sized set does not increase the number of failures that can be
tolerated. For instance, a two-monitor deployment cannot tolerate any failure
and still maintain a quorum; with three monitors, one failure can be
tolerated; with four monitors, still only one failure can be tolerated; with
five monitors, two failures can be tolerated. Requiring a majority also avoids
the dreaded *split brain* phenomenon, because two disjoint groups of monitors
cannot both hold a majority. In short, Ceph needs a majority of monitors to be
active (and able to communicate with each other), and that majority can be
achieved using a single monitor, 2 out of 2 monitors, 2 out of 3, 3 out of 4,
and so on.
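
The majority rule described above can be checked with a little arithmetic.
The following is a minimal sketch; the function names ``quorum_size`` and
``tolerated_failures`` are illustrative only and are not part of Ceph:

.. code-block:: bash

   # Smallest number of monitors that forms a majority of $1 monitors.
   quorum_size() {
       echo $(( $1 / 2 + 1 ))
   }

   # Number of monitors that can fail while a majority still remains.
   tolerated_failures() {
       echo $(( $1 - ($1 / 2 + 1) ))
   }

   for n in 1 2 3 4 5; do
       echo "$n monitors: quorum $(quorum_size "$n"), tolerates $(tolerated_failures "$n") failure(s)"
   done

For one to five monitors this reports quorum sizes of 1, 2, 2, 3, and 3, and
tolerated failures of 0, 0, 1, 1, and 2, which matches the figures given above.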
35
36 For small or non-critical deployments of multi-node Ceph clusters, it is
37 advisable to deploy three monitors, and to increase the number of monitors
38 to five for larger clusters or to survive a double failure. There is rarely
39 justification for seven or more.
40
Since monitors are lightweight, it is possible to run them on the same
host as OSDs; however, we recommend running them on separate hosts,
because ``fsync`` issues with the kernel may impair performance.
Dedicated monitor nodes also minimize disruption, because monitor and OSD
daemons do not go down at the same time when a node crashes, is rebooted, or
is taken down for maintenance.
51
52 .. note:: A *majority* of monitors in your cluster must be able to
53 reach each other in order to establish a quorum.
54
55 Deploy your Hardware
56 --------------------
57
58 If you are adding a new host when adding a new monitor, see `Hardware
59 Recommendations`_ for details on minimum recommendations for monitor hardware.
60 To add a monitor host to your cluster, first make sure you have an up-to-date
61 version of Linux installed (typically Ubuntu 16.04 or RHEL 7).
62
63 Add your monitor host to a rack in your cluster, connect it to the network
64 and ensure that it has network connectivity.
65
66 .. _Hardware Recommendations: ../../../start/hardware-recommendations
67
68 Install the Required Software
69 -----------------------------
70
71 For manually deployed clusters, you must install Ceph packages
72 manually. See `Installing Packages`_ for details.
73 You should configure SSH to a user with password-less authentication
74 and root permissions.
75
76 .. _Installing Packages: ../../../install/install-storage-cluster
77
78
79 .. _Adding a Monitor (Manual):
80
81 Adding a Monitor (Manual)
82 -------------------------
83
84 This procedure creates a ``ceph-mon`` data directory, retrieves the monitor map
85 and monitor keyring, and adds a ``ceph-mon`` daemon to your cluster. If
86 this results in only two monitor daemons, you may add more monitors by
87 repeating this procedure until you have a sufficient number of ``ceph-mon``
88 daemons to achieve a quorum.
89
90 At this point you should define your monitor's id. Traditionally, monitors
91 have been named with single letters (``a``, ``b``, ``c``, ...), but you are
92 free to define the id as you see fit. For the purpose of this document,
please take into account that ``{mon-id}`` should be the id you chose,
without the ``mon.`` prefix (for example, ``{mon-id}`` is ``a`` for
``mon.a``).
96
97 #. Create the default directory on the machine that will host your
98 new monitor:
99
100 .. prompt:: bash $
101
102 ssh {new-mon-host}
103 sudo mkdir /var/lib/ceph/mon/ceph-{mon-id}
104
105 #. Create a temporary directory ``{tmp}`` to keep the files needed during
106 this process. This directory should be different from the monitor's default
107 directory created in the previous step, and can be removed after all the
108 steps are executed:
109
110 .. prompt:: bash $
111
112 mkdir {tmp}
113
114 #. Retrieve the keyring for your monitors, where ``{tmp}`` is the path to
115 the retrieved keyring, and ``{key-filename}`` is the name of the file
116 containing the retrieved monitor key:
117
118 .. prompt:: bash $
119
120 ceph auth get mon. -o {tmp}/{key-filename}
121
122 #. Retrieve the monitor map, where ``{tmp}`` is the path to
123 the retrieved monitor map, and ``{map-filename}`` is the name of the file
124 containing the retrieved monitor map:
125
126 .. prompt:: bash $
127
128 ceph mon getmap -o {tmp}/{map-filename}
129
130 #. Prepare the monitor's data directory created in the first step. You must
131 specify the path to the monitor map so that you can retrieve the
132 information about a quorum of monitors and their ``fsid``. You must also
133 specify a path to the monitor keyring:
134
135 .. prompt:: bash $
136
137 sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}
138
139
#. Start the new monitor, which joins the cluster automatically.
   The daemon needs to know which address to bind to, via either the
   ``--public-addr {ip}`` or ``--public-network {network}`` argument.
   For example:
144
145 .. prompt:: bash $
146
147 ceph-mon -i {mon-id} --public-addr {ip:port}
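
The steps above can be collected into a small dry-run helper that prints the
commands for review instead of running them. This is only a sketch: the
function name ``print_add_mon_commands`` and the file names ``keyring`` and
``monmap`` (standing in for ``{key-filename}`` and ``{map-filename}``) are
assumptions made for illustration, and the printed commands are meant to be
run on the new monitor host:

.. code-block:: bash

   # Hypothetical helper: print (do not execute) the commands for adding a
   # monitor. Arguments: monitor id, temporary directory, address to bind to.
   print_add_mon_commands() {
       mon_id=$1
       tmp=$2
       addr=$3
       printf '%s\n' \
           "sudo mkdir /var/lib/ceph/mon/ceph-${mon_id}" \
           "mkdir ${tmp}" \
           "ceph auth get mon. -o ${tmp}/keyring" \
           "ceph mon getmap -o ${tmp}/monmap" \
           "sudo ceph-mon -i ${mon_id} --mkfs --monmap ${tmp}/monmap --keyring ${tmp}/keyring" \
           "ceph-mon -i ${mon_id} --public-addr ${addr}"
   }

   # Example values only: monitor "d" bound to 10.0.0.4:6789.
   print_add_mon_commands d /tmp/add-mon 10.0.0.4:6789

Printing the commands first makes it easier to confirm that the same monitor
id and the same temporary paths are used in every step.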
148
149 .. _removing-monitors:
150
151 Removing Monitors
152 =================
153
154 When you remove monitors from a cluster, consider that Ceph monitors use
155 Paxos to establish consensus about the master cluster map. You must have
156 a sufficient number of monitors to establish a quorum for consensus about
157 the cluster map.
158
159 .. _Removing a Monitor (Manual):
160
161 Removing a Monitor (Manual)
162 ---------------------------
163
164 This procedure removes a ``ceph-mon`` daemon from your cluster. If this
165 procedure results in only two monitor daemons, you may add or remove another
166 monitor until you have a number of ``ceph-mon`` daemons that can achieve a
167 quorum.
168
169 #. Stop the monitor:
170
171 .. prompt:: bash $
172
173 service ceph -a stop mon.{mon-id}
174
175 #. Remove the monitor from the cluster:
176
177 .. prompt:: bash $
178
179 ceph mon remove {mon-id}
180
181 #. Remove the monitor entry from ``ceph.conf``.
182
183 .. _rados-mon-remove-from-unhealthy:
184
185 Removing Monitors from an Unhealthy Cluster
186 -------------------------------------------
187
188 This procedure removes a ``ceph-mon`` daemon from an unhealthy
189 cluster, for example a cluster where the monitors cannot form a
190 quorum.
191
192
193 #. Stop all ``ceph-mon`` daemons on all monitor hosts:
194
195 .. prompt:: bash $
196
197 ssh {mon-host}
198 systemctl stop ceph-mon.target
199
200 Repeat for all monitor hosts.
201
202 #. Identify a surviving monitor and log in to that host:
203
204 .. prompt:: bash $
205
206 ssh {mon-host}
207
208 #. Extract a copy of the monmap file:
209
210 .. prompt:: bash $
211
212 ceph-mon -i {mon-id} --extract-monmap {map-path}
213
214 In most cases, this command will be:
215
216 .. prompt:: bash $
217
218 ceph-mon -i `hostname` --extract-monmap /tmp/monmap
219
220 #. Remove the non-surviving or problematic monitors. For example, if
221 you have three monitors, ``mon.a``, ``mon.b``, and ``mon.c``, where
222 only ``mon.a`` will survive, follow the example below:
223
224 .. prompt:: bash $
225
226 monmaptool {map-path} --rm {mon-id}
227
228 For example,
229
230 .. prompt:: bash $
231
232 monmaptool /tmp/monmap --rm b
233 monmaptool /tmp/monmap --rm c
234
#. Inject the modified map, from which the failed monitors have been
   removed, into the surviving monitor(s). For example, to inject a map into
   monitor ``mon.a``, follow the example below:
238
239 .. prompt:: bash $
240
241 ceph-mon -i {mon-id} --inject-monmap {map-path}
242
243 For example:
244
245 .. prompt:: bash $
246
247 ceph-mon -i a --inject-monmap /tmp/monmap
248
249 #. Start only the surviving monitors.
250
251 #. Verify the monitors form a quorum (``ceph -s``).
252
253 #. You may wish to archive the removed monitors' data directory in
254 ``/var/lib/ceph/mon`` in a safe location, or delete it if you are
255 confident the remaining monitors are healthy and are sufficiently
256 redundant.
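
The extract, remove, and inject steps above follow a fixed pattern, which can
be sketched as a dry-run helper that only prints the commands. The function
name ``print_monmap_repair_commands`` is an assumption made for illustration;
the printed commands are the ones shown in the procedure and must be run while
the monitors are stopped:

.. code-block:: bash

   # Hypothetical helper: print (do not execute) the monmap repair commands.
   # Arguments: surviving monitor id, monmap path, then the ids to remove.
   print_monmap_repair_commands() {
       survivor=$1
       map_path=$2
       shift 2
       echo "ceph-mon -i ${survivor} --extract-monmap ${map_path}"
       for failed in "$@"; do
           echo "monmaptool ${map_path} --rm ${failed}"
       done
       echo "ceph-mon -i ${survivor} --inject-monmap ${map_path}"
   }

   # Example from the procedure: mon.a survives, mon.b and mon.c are removed.
   print_monmap_repair_commands a /tmp/monmap b c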
257
258 .. _Changing a Monitor's IP address:
259
260 Changing a Monitor's IP Address
261 ===============================
262
263 .. important:: Existing monitors are not supposed to change their IP addresses.
264
265 Monitors are critical components of a Ceph cluster, and they need to maintain a
266 quorum for the whole system to work properly. To establish a quorum, the
267 monitors need to discover each other. Ceph has strict requirements for
268 discovering monitors.
269
270 Ceph clients and other Ceph daemons use ``ceph.conf`` to discover monitors.
271 However, monitors discover each other using the monitor map, not ``ceph.conf``.
272 For example, if you refer to `Adding a Monitor (Manual)`_ you will see that you
273 need to obtain the current monmap for the cluster when creating a new monitor,
274 as it is one of the required arguments of ``ceph-mon -i {mon-id} --mkfs``. The
275 following sections explain the consistency requirements for Ceph monitors, and a
276 few safe ways to change a monitor's IP address.
277
278
279 Consistency Requirements
280 ------------------------
281
282 A monitor always refers to the local copy of the monmap when discovering other
283 monitors in the cluster. Using the monmap instead of ``ceph.conf`` avoids
284 errors that could break the cluster (e.g., typos in ``ceph.conf`` when
285 specifying a monitor address or port). Since monitors use monmaps for discovery
286 and they share monmaps with clients and other Ceph daemons, the monmap provides
287 monitors with a strict guarantee that their consensus is valid.
288
289 Strict consistency also applies to updates to the monmap. As with any other
290 updates on the monitor, changes to the monmap always run through a distributed
291 consensus algorithm called `Paxos`_. The monitors must agree on each update to
292 the monmap, such as adding or removing a monitor, to ensure that each monitor in
293 the quorum has the same version of the monmap. Updates to the monmap are
294 incremental so that monitors have the latest agreed upon version, and a set of
295 previous versions, allowing a monitor that has an older version of the monmap to
296 catch up with the current state of the cluster.
297
298 If monitors discovered each other through the Ceph configuration file instead of
299 through the monmap, it would introduce additional risks because the Ceph
300 configuration files are not updated and distributed automatically. Monitors
301 might inadvertently use an older ``ceph.conf`` file, fail to recognize a
302 monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able
303 to determine the current state of the system accurately. Consequently, making
304 changes to an existing monitor's IP address must be done with great care.
305
306
307 Changing a Monitor's IP address (The Right Way)
308 -----------------------------------------------
309
310 Changing a monitor's IP address in ``ceph.conf`` only is not sufficient to
311 ensure that other monitors in the cluster will receive the update. To change a
312 monitor's IP address, you must add a new monitor with the IP address you want
313 to use (as described in `Adding a Monitor (Manual)`_), ensure that the new
314 monitor successfully joins the quorum; then, remove the monitor that uses the
315 old IP address. Then, update the ``ceph.conf`` file to ensure that clients and
316 other daemons know the IP address of the new monitor.
317
For example, let's assume there are three monitors in place, such as ::
319
320 [mon.a]
321 host = host01
322 addr = 10.0.0.1:6789
323 [mon.b]
324 host = host02
325 addr = 10.0.0.2:6789
326 [mon.c]
327 host = host03
328 addr = 10.0.0.3:6789
329
To change ``mon.c`` to ``host04`` with the IP address ``10.0.0.4``, follow the
steps in `Adding a Monitor (Manual)`_ by adding a new monitor ``mon.d``. Ensure
that ``mon.d`` is running before removing ``mon.c``, or it will break the
quorum. Remove ``mon.c`` as described in `Removing a Monitor (Manual)`_. Moving
all three monitors would thus require repeating this process once for each
monitor.
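
Once ``mon.d`` has joined the quorum and ``mon.c`` has been removed, the
monitor entries in ``ceph.conf`` from the example above might look like the
following sketch. The port ``6789`` for ``mon.d`` is an assumption, chosen to
match the other monitors ::

    [mon.a]
    host = host01
    addr = 10.0.0.1:6789
    [mon.b]
    host = host02
    addr = 10.0.0.2:6789
    # mon.c has been removed; mon.d replaces it (port assumed).
    [mon.d]
    host = host04
    addr = 10.0.0.4:6789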
336
337
338 Changing a Monitor's IP address (The Messy Way)
339 -----------------------------------------------
340
341 There may come a time when the monitors must be moved to a different network, a
342 different part of the datacenter or a different datacenter altogether. While it
343 is possible to do it, the process becomes a bit more hazardous.
344
345 In such a case, the solution is to generate a new monmap with updated IP
346 addresses for all the monitors in the cluster, and inject the new map on each
347 individual monitor. This is not the most user-friendly approach, but we do not
expect this to be something that needs to be done every other week. As stated
at the top of this section, monitors are not supposed to change IP addresses.
351
352 Using the previous monitor configuration as an example, assume you want to move
353 all the monitors from the ``10.0.0.x`` range to ``10.1.0.x``, and these
354 networks are unable to communicate. Use the following procedure:
355
356 #. Retrieve the monitor map, where ``{tmp}`` is the path to
357 the retrieved monitor map, and ``{filename}`` is the name of the file
358 containing the retrieved monitor map:
359
360 .. prompt:: bash $
361
362 ceph mon getmap -o {tmp}/{filename}
363
364 #. The following example demonstrates the contents of the monmap:
365
366 .. prompt:: bash $
367
368 monmaptool --print {tmp}/{filename}
369
370 ::
371
372 monmaptool: monmap file {tmp}/{filename}
373 epoch 1
374 fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
375 last_changed 2012-12-17 02:46:41.591248
376 created 2012-12-17 02:46:41.591248
377 0: 10.0.0.1:6789/0 mon.a
378 1: 10.0.0.2:6789/0 mon.b
379 2: 10.0.0.3:6789/0 mon.c
380
381 #. Remove the existing monitors:
382
383 .. prompt:: bash $
384
385 monmaptool --rm a --rm b --rm c {tmp}/{filename}
386
387
388 ::
389
390 monmaptool: monmap file {tmp}/{filename}
391 monmaptool: removing a
392 monmaptool: removing b
393 monmaptool: removing c
394 monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors)
395
396 #. Add the new monitor locations:
397
398 .. prompt:: bash $
399
400 monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename}
401
402
403 ::
404
405 monmaptool: monmap file {tmp}/{filename}
406 monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors)
407
408 #. Check new contents:
409
410 .. prompt:: bash $
411
412 monmaptool --print {tmp}/{filename}
413
414 ::
415
416 monmaptool: monmap file {tmp}/{filename}
417 epoch 1
418 fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
419 last_changed 2012-12-17 02:46:41.591248
420 created 2012-12-17 02:46:41.591248
421 0: 10.1.0.1:6789/0 mon.a
422 1: 10.1.0.2:6789/0 mon.b
423 2: 10.1.0.3:6789/0 mon.c
424
425 At this point, we assume the monitors (and stores) are installed at the new
426 location. The next step is to propagate the modified monmap to the new
427 monitors, and inject the modified monmap into each new monitor.
428
429 #. First, make sure to stop all your monitors. Injection must be done while
430 the daemon is not running.
431
432 #. Inject the monmap:
433
434 .. prompt:: bash $
435
436 ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename}
437
438 #. Restart the monitors.
439
440 After this step, migration to the new location is complete and
441 the monitors should operate successfully.
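
When many monitors are moved at once, the ``--add`` arguments used in the
procedure above can be generated rather than typed by hand. The following is a
minimal sketch for the ``10.0.0.x`` to ``10.1.0.x`` example; the function name
``remap_addr`` and the ``MONMAP`` path are assumptions made for illustration,
and the resulting ``monmaptool`` command is printed for review, not executed:

.. code-block:: bash

   # Replace the network prefix $2 with $3 in address $1, keeping the rest
   # (last octet and port). Addresses outside the old network are unchanged.
   remap_addr() {
       case $1 in
           "$2"*) echo "${3}${1#"$2"}" ;;
           *)     echo "$1" ;;
       esac
   }

   MONMAP=/tmp/monmap   # hypothetical path standing in for {tmp}/{filename}
   add_args=""
   for entry in a=10.0.0.1:6789 b=10.0.0.2:6789 c=10.0.0.3:6789; do
       name=${entry%%=*}    # monitor name before the "="
       addr=${entry#*=}     # current address after the "="
       add_args="$add_args --add $name $(remap_addr "$addr" 10.0.0. 10.1.0.)"
   done

   # Print the command so it can be checked before it is run by hand.
   echo "monmaptool$add_args $MONMAP"

The printed command has the same form as the ``monmaptool --add`` invocation
in the procedure, with each monitor keeping its name and port.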
442
443
444 .. _Manual Deployment: ../../../install/manual-deployment
445 .. _Monitor Bootstrap: ../../../dev/mon-bootstrap
446 .. _Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science)