======================
 Adding/Removing OSDs
======================

When a cluster is up and running, it is possible to add or remove OSDs.

Adding OSDs
===========

OSDs can be added to a cluster in order to expand the cluster's capacity and
resilience. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on one
storage drive within a host machine. But if your host machine has multiple
storage drives, you may map one ``ceph-osd`` daemon for each drive on the
machine.

It's a good idea to check the capacity of your cluster so that you know when it
approaches its capacity limits. If your cluster has reached its ``near full``
ratio, then you should add OSDs to expand your cluster's capacity.
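
One way to check current utilization and the configured ratios is shown below;
``ceph df`` and ``ceph osd dump`` are standard commands, and the ``grep``
filter is only illustrative:

   .. prompt:: bash $

      ceph df
      ceph osd dump | grep -E 'full_ratio|nearfull_ratio'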

.. warning:: Do not add an OSD after your cluster has reached its ``full
   ratio``. OSD failures that occur after the cluster reaches its ``near full
   ratio`` might cause the cluster to exceed its ``full ratio``.


Deploying your Hardware
-----------------------

If you are also adding a new host when adding a new OSD, see `Hardware
Recommendations`_ for details on minimum recommendations for OSD hardware. To
add an OSD host to your cluster, begin by making sure that an appropriate
version of Linux has been installed on the host machine and that all initial
preparations for your storage drives have been carried out. For details, see
`Filesystem Recommendations`_.

Next, add your OSD host to a rack in your cluster, connect the host to the
network, and ensure that the host has network connectivity. For details, see
`Network Configuration Reference`_.


.. _Hardware Recommendations: ../../../start/hardware-recommendations
.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
.. _Network Configuration Reference: ../../configuration/network-config-ref

Installing the Required Software
--------------------------------

If your cluster has been manually deployed, you will need to install Ceph
software packages manually. For details, see `Installing Ceph (Manual)`_.
Configure SSH for the appropriate user to have both passwordless authentication
and root permissions.

.. _Installing Ceph (Manual): ../../../install


Adding an OSD (Manual)
----------------------

The following procedure sets up a ``ceph-osd`` daemon, configures this OSD to
use one drive, and configures the cluster to distribute data to the OSD. If
your host machine has multiple drives, you may add an OSD for each drive on the
host by repeating this procedure.

As the following procedure will demonstrate, adding an OSD involves creating a
metadata directory for it, configuring a data storage drive, adding the OSD to
the cluster, and then adding it to the CRUSH map. A consolidated example with
sample values follows the procedure.

When you add the OSD to the CRUSH map, you will need to consider the weight you
assign to the new OSD. Because storage drive capacities increase over time,
newer OSD hosts are likely to have larger drives than the older hosts in the
cluster and therefore might warrant greater weight as well.

.. tip:: Ceph works best with uniform hardware across pools. It is possible to
   add drives of dissimilar size and then adjust their weights accordingly.
   However, for best performance, consider a CRUSH hierarchy that has drives of
   the same type and size. It is better to add larger drives uniformly to
   existing hosts. This can be done incrementally, replacing smaller drives
   each time the new drives are added.

#. Create the new OSD by running a command of the following form. If you opt
   not to specify a UUID in this command, the UUID will be set automatically
   when the OSD starts up. The OSD number, which is needed for subsequent
   steps, is found in the command's output:

   .. prompt:: bash $

      ceph osd create [{uuid} [{id}]]

   If the optional parameter ``{id}`` is specified, it will be used as the OSD
   ID. However, if that ID number is already in use, the command will fail.

   .. warning:: Explicitly specifying the ``{id}`` parameter is not
      recommended. IDs are allocated as an array, and any skipping of entries
      consumes extra memory. This memory consumption can become significant if
      there are large gaps or if clusters are large. By leaving the ``{id}``
      parameter unspecified, we ensure that Ceph uses the smallest ID number
      available and that these problems are avoided.

#. Create the default directory for your new OSD by running commands of the
   following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}

#. If the OSD will be created on a drive other than the OS drive, prepare it
   for use with Ceph. Run commands of the following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      sudo mkfs -t {fstype} /dev/{drive}
      sudo mount -o user_xattr /dev/{drive} /var/lib/ceph/osd/ceph-{osd-number}

#. Initialize the OSD data directory by running commands of the following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      ceph-osd -i {osd-num} --mkfs --mkkey

   Make sure that the directory is empty before running ``ceph-osd``.

#. Register the OSD authentication key by running a command of the following
   form:

   .. prompt:: bash $

      ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring

   This presentation of the command has ``ceph-{osd-num}`` in the listed path
   because many clusters have the name ``ceph``. However, if your cluster name
   is not ``ceph``, then the string ``ceph`` in ``ceph-{osd-num}`` needs to be
   replaced with your cluster name. For example, if your cluster name is
   ``cluster1``, then the path in the command should be
   ``/var/lib/ceph/osd/cluster1-{osd-num}/keyring``.

#. Add the OSD to the CRUSH map by running the following command. This allows
   the OSD to begin receiving data. The ``ceph osd crush add`` command can add
   OSDs to the CRUSH hierarchy wherever you want. If you specify one or more
   buckets, the command places the OSD in the most specific of those buckets,
   and it moves that bucket underneath any other buckets that you have
   specified. **Important:** If you specify only the root bucket, the command
   will attach the OSD directly to the root, but CRUSH rules expect OSDs to be
   inside of hosts. If the OSDs are not inside hosts, they will likely not
   receive any data.

   .. prompt:: bash $

      ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

   Note that there is another way to add a new OSD to the CRUSH map: decompile
   the CRUSH map, add the OSD to the device list, add the host as a bucket (if
   it is not already in the CRUSH map), add the device as an item in the host,
   assign the device a weight, recompile the CRUSH map, and set the CRUSH map.
   For details, see `Add/Move an OSD`_. This is rarely necessary with recent
   releases (as of Reef).
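
Putting the preceding steps together, a hypothetical session might look like
the following. The host name ``node1``, the data device ``/dev/sdb``, the
filesystem type ``xfs``, the CRUSH weight ``1.0``, and the assumption that
``ceph osd create`` returned the ID ``4`` are all placeholder values;
substitute the values that apply to your cluster. The ``user_xattr`` mount
option is omitted here because XFS enables extended attributes by default:

   .. prompt:: bash $

      ceph osd create
      ssh node1
      sudo mkdir /var/lib/ceph/osd/ceph-4
      sudo mkfs -t xfs /dev/sdb
      sudo mount /dev/sdb /var/lib/ceph/osd/ceph-4
      ceph-osd -i 4 --mkfs --mkkey
      ceph auth add osd.4 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-4/keyring
      ceph osd crush add osd.4 1.0 host=node1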


.. _rados-replacing-an-osd:

Replacing an OSD
----------------

.. note:: If the procedure in this section does not work for you, try the
   instructions in the ``cephadm`` documentation:
   :ref:`cephadm-replacing-an-osd`.

Sometimes OSDs need to be replaced: for example, when a disk fails, or when an
administrator wants to reprovision OSDs with a new back end (perhaps when
switching from Filestore to BlueStore). Replacing an OSD differs from `Removing
the OSD`_ in that the replaced OSD's ID and CRUSH map entry must be kept intact
after the OSD is destroyed for replacement. A consolidated example with sample
values appears at the end of this procedure.


#. Make sure that it is safe to destroy the OSD:

   .. prompt:: bash $

      while ! ceph osd safe-to-destroy osd.{id} ; do sleep 10 ; done

#. Destroy the OSD:

   .. prompt:: bash $

      ceph osd destroy {id} --yes-i-really-mean-it

#. *Optional*: If the disk that you plan to use is not a new disk and has been
   used before for other purposes, zap the disk:

   .. prompt:: bash $

      ceph-volume lvm zap /dev/sdX

#. Prepare the disk for replacement by using the ID of the OSD that was
   destroyed in previous steps:

   .. prompt:: bash $

      ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

#. Finally, activate the OSD:

   .. prompt:: bash $

      ceph-volume lvm activate {id} {fsid}

Alternatively, instead of carrying out the final two steps (preparing the disk
and activating the OSD), you can re-create the OSD by running a single command
of the following form:

   .. prompt:: bash $

      ceph-volume lvm create --osd-id {id} --data /dev/sdX
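
For example, a hypothetical replacement of ``osd.7`` with a new device at
``/dev/sdf`` (both values are placeholders) might look like this:

   .. prompt:: bash $

      while ! ceph osd safe-to-destroy osd.7 ; do sleep 10 ; done
      ceph osd destroy 7 --yes-i-really-mean-it
      ceph-volume lvm zap /dev/sdf
      ceph-volume lvm create --osd-id 7 --data /dev/sdf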

Starting the OSD
----------------

After an OSD is added to Ceph, the OSD is in the cluster. However, until it is
started, the OSD is considered ``down`` and ``in``. The OSD is not running and
will be unable to receive data. To start an OSD, either run ``service ceph``
from your admin host or run a command of the following form to start the OSD
from its host machine:

   .. prompt:: bash $

      sudo systemctl start ceph-osd@{osd-num}

After the OSD is started, it is considered ``up`` and ``in``.
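
To confirm that the new OSD is ``up`` and ``in``, one quick check (shown here
only for illustration) is to list the OSD tree and look at the new OSD's status:

   .. prompt:: bash $

      ceph osd tree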

Observing the Data Migration
----------------------------

After the new OSD has been added to the CRUSH map, Ceph begins rebalancing the
cluster by migrating placement groups (PGs) to the new OSD. To observe this
process by using the `ceph`_ tool, run the following command:

   .. prompt:: bash $

      ceph -w

Or:

   .. prompt:: bash $

      watch ceph status

The PG states will first change from ``active+clean`` to ``active, some
degraded objects`` and then return to ``active+clean`` when migration
completes. When you are finished observing, press Ctrl-C to exit.

.. _Add/Move an OSD: ../crush-map#addosd
.. _ceph: ../monitoring


Removing OSDs (Manual)
======================

It is possible to remove an OSD manually while the cluster is running: you
might want to do this in order to reduce the size of the cluster or when
replacing hardware. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on
one storage drive within a host machine. Alternatively, if your host machine
has multiple storage drives, you might need to remove multiple ``ceph-osd``
daemons: one daemon for each drive on the machine.

.. warning:: Before you begin the process of removing an OSD, make sure that
   your cluster is not near its ``full ratio``. Otherwise the act of removing
   OSDs might cause the cluster to reach or exceed its ``full ratio``.


Taking the OSD ``out`` of the Cluster
-------------------------------------

OSDs are typically ``up`` and ``in`` before they are removed from the cluster.
Before the OSD can be removed from the cluster, the OSD must be taken ``out``
of the cluster so that Ceph can begin rebalancing and copying its data to other
OSDs. To take an OSD ``out`` of the cluster, run a command of the following
form:

   .. prompt:: bash $

      ceph osd out {osd-num}


Observing the Data Migration
----------------------------

After the OSD has been taken ``out`` of the cluster, Ceph begins rebalancing
the cluster by migrating placement groups out of the OSD that was removed. To
observe this process by using the `ceph`_ tool, run the following command:

   .. prompt:: bash $

      ceph -w

The PG states will change from ``active+clean`` to ``active, some degraded
objects`` and will then return to ``active+clean`` when migration completes.
When you are finished observing, press Ctrl-C to exit.

.. note:: Under certain conditions, the action of taking ``out`` an OSD
   might lead CRUSH to encounter a corner case in which some PGs remain stuck
   in the ``active+remapped`` state. This problem sometimes occurs in small
   clusters with few hosts (for example, in a small testing cluster). To
   address this problem, mark the OSD ``in`` by running a command of the
   following form:

   .. prompt:: bash $

      ceph osd in {osd-num}

   After the OSD has come back to its initial state, do not mark the OSD
   ``out`` again. Instead, set the OSD's weight to ``0`` by running a command
   of the following form:

   .. prompt:: bash $

      ceph osd crush reweight osd.{osd-num} 0

   After the OSD has been reweighted, observe the data migration and confirm
   that it has completed successfully. The difference between marking an OSD
   ``out`` and reweighting the OSD to ``0`` has to do with the bucket that
   contains the OSD. When an OSD is marked ``out``, the weight of the bucket is
   not changed. But when an OSD is reweighted to ``0``, the weight of the
   bucket is updated (namely, the weight of the OSD is subtracted from the
   overall weight of the bucket). When operating small clusters, it can
   sometimes be preferable to use the above reweight command.
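
   To see the effect of the two approaches on bucket weights, you can inspect
   the CRUSH hierarchy before and after the change (an illustrative check, not
   a required step):

   .. prompt:: bash $

      ceph osd crush tree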


Stopping the OSD
----------------

After you take an OSD ``out`` of the cluster, the OSD might still be running.
In such a case, the OSD is ``up`` and ``out``. Before it is removed from the
cluster, the OSD must be stopped by running commands of the following form:

   .. prompt:: bash $

      ssh {osd-host}
      sudo systemctl stop ceph-osd@{osd-num}

After the OSD has been stopped, it is ``down``.


Removing the OSD
----------------

The following procedure removes an OSD from the cluster map, removes the OSD's
authentication key, removes the OSD from the OSD map, and removes the OSD from
the ``ceph.conf`` file. If your host has multiple drives, it might be necessary
to remove an OSD from each drive by repeating this procedure.

#. Begin by having the cluster forget the OSD. This step removes the OSD from
   the CRUSH map, removes the OSD's authentication key, and removes the OSD
   from the OSD map. (The :ref:`purge subcommand <ceph-admin-osd>` was
   introduced in Luminous. For older releases, see :ref:`the procedure linked
   here <ceph_osd_purge_procedure_pre_luminous>`.):

   .. prompt:: bash $

      ceph osd purge {id} --yes-i-really-mean-it

#. Navigate to the host where the master copy of the cluster's
   ``ceph.conf`` file is kept:

   .. prompt:: bash $

      ssh {admin-host}
      cd /etc/ceph
      vim ceph.conf

#. Remove the OSD entry from your ``ceph.conf`` file (if such an entry
   exists)::

      [osd.1]
      host = {hostname}

#. Copy the updated ``ceph.conf`` file from the location on the host where the
   master copy of the cluster's ``ceph.conf`` is kept to the ``/etc/ceph``
   directory of the other hosts in your cluster.
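
   For example, assuming SSH access and the default ``/etc/ceph`` path on each
   host, one hypothetical way to distribute the updated file is:

   .. prompt:: bash $

      scp /etc/ceph/ceph.conf {other-host}:/etc/ceph/ceph.conf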

.. _ceph_osd_purge_procedure_pre_luminous:

If your Ceph cluster is older than Luminous, you will be unable to use the
``ceph osd purge`` command. Instead, carry out the following procedure:

#. Remove the OSD from the CRUSH map so that it no longer receives data (for
   more details, see `Remove an OSD`_):

   .. prompt:: bash $

      ceph osd crush remove {name}

   Instead of removing the OSD from the CRUSH map, you might opt for one of two
   alternatives: (1) decompile the CRUSH map, remove the OSD from the device
   list, and remove the device from the host bucket; or (2) remove the host
   bucket from the CRUSH map (provided that it is in the CRUSH map and that you
   intend to remove the host), recompile the map, and set it.

#. Remove the OSD authentication key:

   .. prompt:: bash $

      ceph auth del osd.{osd-num}

#. Remove the OSD:

   .. prompt:: bash $

      ceph osd rm {osd-num}

   For example:

   .. prompt:: bash $

      ceph osd rm 1

.. _Remove an OSD: ../crush-map#removeosd