.. _adding-and-removing-monitors:

==========================
Adding/Removing Monitors
==========================

When you have a cluster up and running, you may add or remove monitors
from the cluster at runtime. To bootstrap a monitor, see `Manual Deployment`_
or `Monitor Bootstrap`_.

.. _adding-monitors:

Adding Monitors
===============

Ceph monitors are lightweight processes that are the single source of truth
for the cluster map. You can run a cluster with one monitor, but we recommend
at least three for a production cluster. Ceph monitors use a variation of the
`Paxos`_ algorithm to establish consensus about maps and other critical
information across the cluster. Due to the nature of Paxos, Ceph requires
a majority of monitors to be active to establish a quorum (thus establishing
consensus).

It is advisable to run an odd number of monitors, because adding one monitor
to an odd-sized set does not increase the number of failures it can survive.
For instance, a two-monitor deployment cannot tolerate any failure and still
maintain a quorum; with three monitors, one failure can be tolerated; with
four monitors, still only one failure can be tolerated; with five monitors,
two failures can be tolerated. Requiring a strict majority is what avoids
the dreaded *split brain* phenomenon, and is why an odd number is best.
In short, Ceph needs a majority of monitors to be active (and able to
communicate with each other), and that majority can be a single monitor,
2 out of 2 monitors, 2 out of 3, 3 out of 4, and so on.
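
The figures above follow from simple integer arithmetic. The sketch below
assumes only that a quorum is a strict majority of the monitors in the monitor
map; the function names ``quorum_size`` and ``tolerated_failures`` are
illustrative and are not part of Ceph.

```shell
# Smallest strict majority of N monitors.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of monitors that can fail while a majority remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the values for the deployment sizes discussed above.
for n in 1 2 3 4 5; do
    printf '%s monitors: quorum %s, tolerated failures %s\n' \
        "$n" "$(quorum_size "$n")" "$(tolerated_failures "$n")"
done
```

Going from three to four monitors raises the quorum size from two to three
without raising the number of tolerated failures, which is the reason an even
count buys nothing.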

For small or non-critical deployments of multi-node Ceph clusters, it is
advisable to deploy three monitors, and to increase the number of monitors
to five for larger clusters or to survive a double failure. There is rarely
justification for seven or more.

Since monitors are lightweight, it is possible to run them on the same
host as OSDs; however, we recommend running them on separate hosts,
because ``fsync`` issues with the kernel may impair performance.
Dedicated monitor nodes also minimize disruption, because monitor and OSD
daemons do not go down together when a node crashes, is rebooted, or is
taken down for maintenance.

.. note:: A *majority* of monitors in your cluster must be able to
   reach each other in order to establish a quorum.

Deploy your Hardware
--------------------

If you are adding a new host when adding a new monitor, see `Hardware
Recommendations`_ for details on minimum recommendations for monitor hardware.
To add a monitor host to your cluster, first make sure you have an up-to-date
version of Linux installed (typically Ubuntu 16.04 or RHEL 7).

Add your monitor host to a rack in your cluster, connect it to the network
and ensure that it has network connectivity.

.. _Hardware Recommendations: ../../../start/hardware-recommendations

Install the Required Software
-----------------------------

For manually deployed clusters, you must install Ceph packages
manually. See `Installing Packages`_ for details.
You should configure SSH to a user with password-less authentication
and root permissions.

.. _Installing Packages: ../../../install/install-storage-cluster


.. _Adding a Monitor (Manual):

Adding a Monitor (Manual)
-------------------------

This procedure creates a ``ceph-mon`` data directory, retrieves the monitor map
and monitor keyring, and adds a ``ceph-mon`` daemon to your cluster. If
this results in only two monitor daemons, you may add more monitors by
repeating this procedure until you have a sufficient number of ``ceph-mon``
daemons to achieve a quorum.

At this point you should define your monitor's id. Traditionally, monitors
have been named with single letters (``a``, ``b``, ``c``, ...), but you are
free to define the id as you see fit. Throughout this document, ``{mon-id}``
is the id you chose, without the ``mon.`` prefix (for example, ``{mon-id}``
is ``a`` for ``mon.a``).

#. Create the default directory on the machine that will host your
   new monitor. ::

      ssh {new-mon-host}
      sudo mkdir /var/lib/ceph/mon/ceph-{mon-id}

#. Create a temporary directory ``{tmp}`` to keep the files needed during
   this process. This directory should be different from the monitor's default
   directory created in the previous step, and can be removed after all the
   steps are executed. ::

      mkdir {tmp}

#. Retrieve the keyring for your monitors, where ``{tmp}`` is the path to
   the retrieved keyring, and ``{key-filename}`` is the name of the file
   containing the retrieved monitor key. ::

      ceph auth get mon. -o {tmp}/{key-filename}

#. Retrieve the monitor map, where ``{tmp}`` is the path to
   the retrieved monitor map, and ``{map-filename}`` is the name of the file
   containing the retrieved monitor map. ::

      ceph mon getmap -o {tmp}/{map-filename}

#. Prepare the monitor's data directory created in the first step. You must
   specify the path to the monitor map so that you can retrieve the
   information about a quorum of monitors and their ``fsid``. You must also
   specify a path to the monitor keyring::

      sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}


#. Start the new monitor and it will automatically join the cluster.
   The daemon needs to know which address to bind to, via either the
   ``--public-addr {ip}`` or ``--public-network {network}`` argument.
   For example::

      ceph-mon -i {mon-id} --public-addr {ip:port}
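
The steps above can be collected into a small script, shown here as a sketch
rather than a supported tool. It is meant to be run on the new monitor host.
The variables ``MON_ID``, ``MON_ADDR``, ``TMP_DIR`` and ``DRY_RUN``, the file
names ``keyring`` and ``monmap``, and the helper ``run`` are assumptions made
for this example. With ``DRY_RUN=1`` (the default) each command is only
printed, so the plan can be reviewed before anything is executed.

```shell
#!/bin/sh
# Placeholder values; replace them with the id and address of your monitor.
MON_ID="${MON_ID:-d}"
MON_ADDR="${MON_ADDR:-10.0.0.4:6789}"
TMP_DIR="${TMP_DIR:-/tmp/add-mon}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Same order as the manual procedure; stops at the first failing step.
add_monitor() {
    run sudo mkdir "/var/lib/ceph/mon/ceph-$MON_ID" &&
    run mkdir "$TMP_DIR" &&
    run ceph auth get mon. -o "$TMP_DIR/keyring" &&
    run ceph mon getmap -o "$TMP_DIR/monmap" &&
    run sudo ceph-mon -i "$MON_ID" --mkfs \
        --monmap "$TMP_DIR/monmap" --keyring "$TMP_DIR/keyring" &&
    run ceph-mon -i "$MON_ID" --public-addr "$MON_ADDR"
}

add_monitor
```

Once the daemon is running, ``ceph mon stat`` should list the new monitor
among the members of the quorum.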

.. _removing-monitors:

Removing Monitors
=================

When you remove monitors from a cluster, consider that Ceph monitors use
Paxos to establish consensus about the master cluster map. You must have
a sufficient number of monitors to establish a quorum for consensus about
the cluster map.

.. _Removing a Monitor (Manual):

Removing a Monitor (Manual)
---------------------------

This procedure removes a ``ceph-mon`` daemon from your cluster. If this
procedure results in only two monitor daemons, you may add or remove another
monitor until you have a number of ``ceph-mon`` daemons that can achieve a
quorum.

#. Stop the monitor. ::

      service ceph -a stop mon.{mon-id}

#. Remove the monitor from the cluster. ::

      ceph mon remove {mon-id}

#. Remove the monitor entry from ``ceph.conf``.
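
Because the monitor is stopped before ``ceph mon remove`` is issued, the
monitors that keep running must still form a majority of the current monitor
map for the removal to be agreed on. The check below is a sketch of that
reasoning; the function name ``has_quorum_after_stop`` is illustrative, and
the monitor counts are supplied by hand rather than queried from the cluster.

```shell
# has_quorum_after_stop TOTAL UP
#   TOTAL: number of monitors in the current monitor map
#   UP:    number of monitors currently running
# Succeeds if a majority of TOTAL is still running after one is stopped.
has_quorum_after_stop() {
    total=$1
    up_after=$(( $2 - 1 ))
    [ "$up_after" -ge $(( total / 2 + 1 )) ]
}

if has_quorum_after_stop 3 3; then
    echo "3 monitors, 3 up: stopping one keeps the quorum"
fi
if ! has_quorum_after_stop 2 2; then
    echo "2 monitors, 2 up: stopping one loses the quorum"
fi
```

When the check fails, the remaining monitors cannot form a quorum on their
own, and the procedure for an unhealthy cluster applies instead.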

.. _rados-mon-remove-from-unhealthy:

Removing Monitors from an Unhealthy Cluster
-------------------------------------------

This procedure removes a ``ceph-mon`` daemon from an unhealthy
cluster, for example a cluster where the monitors cannot form a
quorum.


#. Stop all ``ceph-mon`` daemons on all monitor hosts. ::

      ssh {mon-host}
      systemctl stop ceph-mon.target
      # and repeat for all mons

#. Identify a surviving monitor and log in to that host. ::

      ssh {mon-host}

#. Extract a copy of the monmap file. ::

      ceph-mon -i {mon-id} --extract-monmap {map-path}
      # in most cases, that's
      ceph-mon -i `hostname` --extract-monmap /tmp/monmap

#. Remove the non-surviving or problematic monitors. For example, if
   you have three monitors, ``mon.a``, ``mon.b``, and ``mon.c``, where
   only ``mon.a`` will survive, follow the example below::

      monmaptool {map-path} --rm {mon-id}
      # for example,
      monmaptool /tmp/monmap --rm b
      monmaptool /tmp/monmap --rm c

#. Inject the modified map, from which the failed monitors have been
   removed, into the surviving monitor(s). For example, to inject a map
   into monitor ``mon.a``, follow the example below::

      ceph-mon -i {mon-id} --inject-monmap {map-path}
      # for example,
      ceph-mon -i a --inject-monmap /tmp/monmap

#. Start only the surviving monitors.

#. Verify the monitors form a quorum (``ceph -s``).

#. You may wish to archive the removed monitors' data directory in
   ``/var/lib/ceph/mon`` in a safe location, or delete it if you are
   confident the remaining monitors are healthy and are sufficiently
   redundant.
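
When several monitors have to be dropped from the extracted map, the
``monmaptool --rm`` commands can be generated instead of typed one by one.
The helper below is a sketch: ``plan_removals`` is a name invented for this
example, and it only prints the commands so that they can be reviewed before
being run.

```shell
# plan_removals MAP_PATH SURVIVOR ID...
# Print one "monmaptool --rm" command for every listed monitor id
# other than the survivor.
plan_removals() {
    map_path=$1
    survivor=$2
    shift 2
    for id in "$@"; do
        [ "$id" = "$survivor" ] || echo "monmaptool $map_path --rm $id"
    done
}

# Three monitors a, b and c, of which only a survives.
plan_removals /tmp/monmap a a b c
```

For the three-monitor example above, this prints the same two ``--rm``
commands as shown in the procedure.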

.. _Changing a Monitor's IP address:

Changing a Monitor's IP Address
===============================

.. important:: Existing monitors are not supposed to change their IP addresses.

Monitors are critical components of a Ceph cluster, and they need to maintain a
quorum for the whole system to work properly. To establish a quorum, the
monitors need to discover each other. Ceph has strict requirements for
discovering monitors.

Ceph clients and other Ceph daemons use ``ceph.conf`` to discover monitors.
However, monitors discover each other using the monitor map, not ``ceph.conf``.
For example, if you refer to `Adding a Monitor (Manual)`_ you will see that you
need to obtain the current monmap for the cluster when creating a new monitor,
as it is one of the required arguments of ``ceph-mon -i {mon-id} --mkfs``. The
following sections explain the consistency requirements for Ceph monitors, and a
few safe ways to change a monitor's IP address.


Consistency Requirements
------------------------

A monitor always refers to the local copy of the monmap when discovering other
monitors in the cluster. Using the monmap instead of ``ceph.conf`` avoids
errors that could break the cluster (e.g., typos in ``ceph.conf`` when
specifying a monitor address or port). Since monitors use monmaps for discovery
and they share monmaps with clients and other Ceph daemons, the monmap provides
monitors with a strict guarantee that their consensus is valid.

Strict consistency also applies to updates to the monmap. As with any other
updates on the monitor, changes to the monmap always run through a distributed
consensus algorithm called `Paxos`_. The monitors must agree on each update to
the monmap, such as adding or removing a monitor, to ensure that each monitor in
the quorum has the same version of the monmap. Updates to the monmap are
incremental so that monitors have the latest agreed upon version, and a set of
previous versions, allowing a monitor that has an older version of the monmap to
catch up with the current state of the cluster.

If monitors discovered each other through the Ceph configuration file instead of
through the monmap, it would introduce additional risks because the Ceph
configuration files are not updated and distributed automatically. Monitors
might inadvertently use an older ``ceph.conf`` file, fail to recognize a
monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able
to determine the current state of the system accurately. Consequently, making
changes to an existing monitor's IP address must be done with great care.


Changing a Monitor's IP address (The Right Way)
-----------------------------------------------

Changing a monitor's IP address in ``ceph.conf`` only is not sufficient to
ensure that other monitors in the cluster will receive the update. To change a
monitor's IP address, you must add a new monitor with the IP address you want
to use (as described in `Adding a Monitor (Manual)`_), ensure that the new
monitor successfully joins the quorum, and then remove the monitor that uses
the old IP address. Finally, update the ``ceph.conf`` file to ensure that
clients and other daemons know the IP address of the new monitor.

For example, let's assume there are three monitors in place, such as ::

   [mon.a]
      host = host01
      addr = 10.0.0.1:6789
   [mon.b]
      host = host02
      addr = 10.0.0.2:6789
   [mon.c]
      host = host03
      addr = 10.0.0.3:6789

To change ``mon.c`` to ``host04`` with the IP address ``10.0.0.4``, follow the
steps in `Adding a Monitor (Manual)`_ by adding a new monitor ``mon.d``. Ensure
that ``mon.d`` is running before removing ``mon.c``, or it will break the
quorum. Remove ``mon.c`` as described in `Removing a Monitor (Manual)`_. Moving
all three monitors would thus require repeating this process once for each
monitor.


Changing a Monitor's IP address (The Messy Way)
-----------------------------------------------

There may come a time when the monitors must be moved to a different network, a
different part of the datacenter or a different datacenter altogether. While
this is possible, the process is more hazardous.

In such a case, the solution is to generate a new monmap with updated IP
addresses for all the monitors in the cluster, and inject the new map on each
individual monitor. This is not the most user-friendly approach, but we do not
expect this to be something that needs to be done every other week. As stated
at the top of this section, monitors are not supposed to change IP addresses.

Using the previous monitor configuration as an example, assume you want to move
all the monitors from the ``10.0.0.x`` range to ``10.1.0.x``, and these
networks are unable to communicate. Use the following procedure:

#. Retrieve the monitor map, where ``{tmp}`` is the path to
   the retrieved monitor map, and ``{filename}`` is the name of the file
   containing the retrieved monitor map. ::

      ceph mon getmap -o {tmp}/{filename}

#. The following example demonstrates the contents of the monmap. ::

      $ monmaptool --print {tmp}/{filename}

      monmaptool: monmap file {tmp}/{filename}
      epoch 1
      fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
      last_changed 2012-12-17 02:46:41.591248
      created 2012-12-17 02:46:41.591248
      0: 10.0.0.1:6789/0 mon.a
      1: 10.0.0.2:6789/0 mon.b
      2: 10.0.0.3:6789/0 mon.c

#. Remove the existing monitors. ::

      $ monmaptool --rm a --rm b --rm c {tmp}/{filename}

      monmaptool: monmap file {tmp}/{filename}
      monmaptool: removing a
      monmaptool: removing b
      monmaptool: removing c
      monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors)

#. Add the new monitor locations. ::

      $ monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename}

      monmaptool: monmap file {tmp}/{filename}
      monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors)

#. Check new contents. ::

      $ monmaptool --print {tmp}/{filename}

      monmaptool: monmap file {tmp}/{filename}
      epoch 1
      fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
      last_changed 2012-12-17 02:46:41.591248
      created 2012-12-17 02:46:41.591248
      0: 10.1.0.1:6789/0 mon.a
      1: 10.1.0.2:6789/0 mon.b
      2: 10.1.0.3:6789/0 mon.c

At this point, we assume the monitors (and stores) are installed at the new
location. The next step is to propagate the modified monmap to the new
monitors, and inject the modified monmap into each new monitor.

#. First, make sure to stop all your monitors. Injection must be done while
   the daemon is not running.

#. Inject the monmap. ::

      ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename}

#. Restart the monitors.

After this step, migration to the new location is complete and
the monitors should operate successfully.
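
For a map with many monitors, the ``--rm`` and ``--add`` commands of this
procedure can be derived from the ``monmaptool --print`` listing. The sketch
below assumes the listing has exactly the ``rank: address/nonce mon.id`` layout
shown in the example output above; other releases may print addresses in a
different form, in which case the pattern needs adjusting. The function name
``remap_commands`` and its arguments are invented for this example, and the
function only prints commands for review.

```shell
# remap_commands MAP_PATH OLD_PREFIX NEW_PREFIX
# Read "monmaptool --print" output on stdin and print the commands that
# re-register every monitor with OLD_PREFIX replaced by NEW_PREFIX.
remap_commands() {
    map_path=$1 old_prefix=$2 new_prefix=$3
    awk -v map="$map_path" -v oldp="$old_prefix" -v newp="$new_prefix" '
        $1 ~ /^[0-9]+:$/ && $3 ~ /^mon\./ {
            id = substr($3, 5)             # drop the "mon." prefix
            addr = $2
            sub(/\/[0-9]+$/, "", addr)     # drop the trailing "/0" nonce
            if (index(addr, oldp) == 1)    # rewrite only a leading match
                addr = newp substr(addr, length(oldp) + 1)
            printf "monmaptool --rm %s %s\n", id, map
            printf "monmaptool --add %s %s %s\n", id, addr, map
        }'
}

# Feed it the monitor lines from the example listing.
remap_commands /tmp/monmap 10.0.0. 10.1.0. <<'EOF'
epoch 1
0: 10.0.0.1:6789/0 mon.a
1: 10.0.0.2:6789/0 mon.b
2: 10.0.0.3:6789/0 mon.c
EOF
```

Lines that do not describe a monitor, such as ``epoch`` or ``fsid``, are
ignored, so the full output of ``monmaptool --print`` can be piped in directly.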


.. _Manual Deployment: ../../../install/manual-deployment
.. _Monitor Bootstrap: ../../../dev/mon-bootstrap
.. _Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science)