======================
 Adding/Removing OSDs
======================

When a cluster is up and running, it is possible to add or remove OSDs.

Adding OSDs
===========

OSDs can be added to a cluster in order to expand the cluster's capacity and
resilience. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on one
storage drive within a host machine. But if your host machine has multiple
storage drives, you may map one ``ceph-osd`` daemon for each drive on the
machine.

It's a good idea to check the capacity of your cluster so that you know when it
approaches its capacity limits. If your cluster has reached its ``near full``
ratio, then you should add OSDs to expand your cluster's capacity.
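
For example, a quick way to check current utilization and the cluster's
configured ratios (assuming a Luminous or later release, in which the ratios
are stored in the OSD map) is to run the following commands:

.. prompt:: bash $

   ceph df
   ceph osd dump | grep ratio

Here ``ceph df`` reports overall and per-pool usage, and the ``grep`` surfaces
the ``full_ratio``, ``backfillfull_ratio``, and ``nearfull_ratio`` lines.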

.. warning:: Do not add an OSD after your cluster has reached its ``full
   ratio``. OSD failures that occur after the cluster reaches its ``near full
   ratio`` might cause the cluster to exceed its ``full ratio``.


Deploying your Hardware
-----------------------

If you are also adding a new host when adding a new OSD, see `Hardware
Recommendations`_ for details on minimum recommendations for OSD hardware. To
add an OSD host to your cluster, begin by making sure that an appropriate
version of Linux has been installed on the host machine and that all initial
preparations for your storage drives have been carried out. For details, see
`Filesystem Recommendations`_.

Next, add your OSD host to a rack in your cluster, connect the host to the
network, and ensure that the host has network connectivity. For details, see
`Network Configuration Reference`_.

.. _Hardware Recommendations: ../../../start/hardware-recommendations
.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
.. _Network Configuration Reference: ../../configuration/network-config-ref

Installing the Required Software
--------------------------------

If your cluster has been manually deployed, you will need to install the Ceph
software packages manually. For details, see `Installing Ceph (Manual)`_.
Configure SSH so that the appropriate user has both passwordless authentication
and root permissions.

.. _Installing Ceph (Manual): ../../../install


Adding an OSD (Manual)
----------------------

The following procedure sets up a ``ceph-osd`` daemon, configures this OSD to
use one drive, and configures the cluster to distribute data to the OSD. If
your host machine has multiple drives, you may add an OSD for each drive on the
host by repeating this procedure.

As the following procedure will demonstrate, adding an OSD involves creating a
metadata directory for it, configuring a data storage drive, adding the OSD to
the cluster, and then adding it to the CRUSH map.

When you add the OSD to the CRUSH map, you will need to consider the weight you
assign to the new OSD. Since storage drive capacities increase over time, newer
OSD hosts are likely to have larger drives than the older hosts in the cluster
and might therefore have greater weight as well.
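
A common convention, though not a requirement, is to set an OSD's CRUSH weight
to the capacity of its drive in TiB. Under that convention, a new 4 TB drive
would be weighted at about ``3.64`` (4 × 10^12 bytes ÷ 2^40 bytes per TiB),
while an older 1 TB drive would be weighted at about ``0.91``.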

.. tip:: Ceph works best with uniform hardware across pools. It is possible to
   add drives of dissimilar size and then adjust their weights accordingly.
   However, for best performance, consider a CRUSH hierarchy that has drives of
   the same type and size. It is better to add larger drives uniformly to
   existing hosts. This can be done incrementally, replacing smaller drives
   each time the new drives are added.

#. Create the new OSD by running a command of the following form. If you opt
   not to specify a UUID in this command, the UUID will be set automatically
   when the OSD starts up. The OSD number, which is needed for subsequent
   steps, is found in the command's output:

   .. prompt:: bash $

      ceph osd create [{uuid} [{id}]]

   If the optional parameter ``{id}`` is specified, it will be used as the OSD
   ID. However, if the ID number is already in use, the command will fail.

   .. warning:: Explicitly specifying the ``{id}`` parameter is not
      recommended. IDs are allocated as an array, and any skipping of entries
      consumes extra memory. This memory consumption can become significant if
      there are large gaps or if clusters are large. By leaving the ``{id}``
      parameter unspecified, we ensure that Ceph uses the smallest ID number
      available and that these problems are avoided.

#. Create the default directory for your new OSD by running commands of the
   following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}

#. If the OSD will be created on a drive other than the OS drive, prepare it
   for use with Ceph. Run commands of the following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      sudo mkfs -t {fstype} /dev/{drive}
      sudo mount -o user_xattr /dev/{drive} /var/lib/ceph/osd/ceph-{osd-number}

#. Initialize the OSD data directory by running commands of the following
   form:

   .. prompt:: bash $

      ssh {new-osd-host}
      ceph-osd -i {osd-num} --mkfs --mkkey

   Make sure that the directory is empty before running ``ceph-osd``.

#. Register the OSD authentication key by running a command of the following
   form:

   .. prompt:: bash $

      ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring

   This presentation of the command has ``ceph-{osd-num}`` in the listed path
   because many clusters have the name ``ceph``. However, if your cluster name
   is not ``ceph``, then the string ``ceph`` in ``ceph-{osd-num}`` needs to be
   replaced with your cluster name. For example, if your cluster name is
   ``cluster1``, then the path in the command should be
   ``/var/lib/ceph/osd/cluster1-{osd-num}/keyring``.

#. Add the OSD to the CRUSH map by running the following command. This allows
   the OSD to begin receiving data. The ``ceph osd crush add`` command can add
   OSDs to the CRUSH hierarchy wherever you want. If you specify one or more
   buckets, the command places the OSD in the most specific of those buckets,
   and it moves that bucket underneath any other buckets that you have
   specified. **Important:** If you specify only the root bucket, the command
   will attach the OSD directly to the root, but CRUSH rules expect OSDs to be
   inside of hosts. If the OSDs are not inside hosts, the OSDs will likely not
   receive any data.

   .. prompt:: bash $

      ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

   Note that there is another way to add a new OSD to the CRUSH map: decompile
   the CRUSH map, add the OSD to the device list, add the host as a bucket (if
   it is not already in the CRUSH map), add the device as an item in the host,
   assign the device a weight, recompile the CRUSH map, and set the CRUSH map.
   For details, see `Add/Move an OSD`_. This is rarely necessary with recent
   releases (as of the Reef release).
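
   As a concrete illustration, suppose ``ceph osd create`` returned ``4`` and
   the new drive lives in host ``node1`` (the OSD ID, weight, and host name
   here are hypothetical; substitute your own values and CRUSH bucket names):

   .. prompt:: bash $

      # osd.4, 1.0, and node1 are example values
      ceph osd crush add osd.4 1.0 host=node1

   Specifying ``host=node1`` places the OSD inside the ``node1`` host bucket,
   so that CRUSH rules that replicate across hosts can select it.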

.. _rados-replacing-an-osd:

Replacing an OSD
----------------

.. note:: If the procedure in this section does not work for you, try the
   instructions in the ``cephadm`` documentation:
   :ref:`cephadm-replacing-an-osd`.

Sometimes OSDs need to be replaced: for example, when a disk fails, or when an
administrator wants to reprovision OSDs with a new back end (perhaps when
switching from Filestore to BlueStore). Replacing an OSD differs from `Removing
the OSD`_ in that the replaced OSD's ID and CRUSH map entry must be kept intact
after the OSD is destroyed for replacement.

#. Make sure that it is safe to destroy the OSD:

   .. prompt:: bash $

      while ! ceph osd safe-to-destroy osd.{id} ; do sleep 10 ; done

#. Destroy the OSD:

   .. prompt:: bash $

      ceph osd destroy {id} --yes-i-really-mean-it

#. *Optional*: If the disk that you plan to use is not a new disk and has been
   used before for other purposes, zap the disk:

   .. prompt:: bash $

      ceph-volume lvm zap /dev/sdX

#. Prepare the disk for replacement by using the ID of the OSD that was
   destroyed in previous steps:

   .. prompt:: bash $

      ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

#. Finally, activate the OSD:

   .. prompt:: bash $

      ceph-volume lvm activate {id} {fsid}

Alternatively, instead of carrying out the final two steps (preparing the disk
and activating the OSD), you can re-create the OSD by running a single command
of the following form:

.. prompt:: bash $

   ceph-volume lvm create --osd-id {id} --data /dev/sdX
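
The ``{fsid}`` in the activation command is the OSD's own fsid (the OSD's
UUID), not the cluster's fsid. If you do not have it at hand, it appears as
``osd fsid`` in the output of ``ceph-volume lvm list`` on the OSD's host:

.. prompt:: bash $

   ceph-volume lvm list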

Starting the OSD
----------------

After an OSD is added to Ceph, the OSD is in the cluster. However, until it is
started, the OSD is considered ``down`` and ``in``. The OSD is not running and
will be unable to receive data. To start an OSD, either run ``service ceph``
from your admin host or run a command of the following form to start the OSD
from its host machine:

.. prompt:: bash $

   sudo systemctl start ceph-osd@{osd-num}

After the OSD is started, it is considered ``up`` and ``in``.
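
To confirm that the new OSD is ``up``, inspect the OSD tree; the OSD should
appear under its host with the status ``up``:

.. prompt:: bash $

   ceph osd tree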

Observing the Data Migration
----------------------------

After the new OSD has been added to the CRUSH map, Ceph begins rebalancing the
cluster by migrating placement groups (PGs) to the new OSD. To observe this
process by using the `ceph`_ tool, run the following command:

.. prompt:: bash $

   ceph -w

Or:

.. prompt:: bash $

   watch ceph status

The PG states will first change from ``active+clean`` to ``active, some
degraded objects`` and then return to ``active+clean`` when migration
completes. When you are finished observing, press Ctrl-C to exit.

.. _Add/Move an OSD: ../crush-map#addosd
.. _ceph: ../monitoring

Removing OSDs (Manual)
======================

It is possible to remove an OSD manually while the cluster is running: you
might want to do this in order to reduce the size of the cluster or when
replacing hardware. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on
one storage drive within a host machine. Alternatively, if your host machine
has multiple storage drives, you might need to remove multiple ``ceph-osd``
daemons: one daemon for each drive on the machine.

.. warning:: Before you begin the process of removing an OSD, make sure that
   your cluster is not near its ``full ratio``. Otherwise the act of removing
   OSDs might cause the cluster to reach or exceed its ``full ratio``.


Taking the OSD ``out`` of the Cluster
-------------------------------------

OSDs are typically ``up`` and ``in`` before they are removed from the cluster.
Before the OSD can be removed from the cluster, the OSD must be taken ``out``
of the cluster so that Ceph can begin rebalancing and copying its data to other
OSDs. To take an OSD ``out`` of the cluster, run a command of the following
form:

.. prompt:: bash $

   ceph osd out {osd-num}


Observing the Data Migration
----------------------------

After the OSD has been taken ``out`` of the cluster, Ceph begins rebalancing
the cluster by migrating placement groups out of the OSD that was removed. To
observe this process by using the `ceph`_ tool, run the following command:

.. prompt:: bash $

   ceph -w

The PG states will change from ``active+clean`` to ``active, some degraded
objects`` and will then return to ``active+clean`` when migration completes.
When you are finished observing, press Ctrl-C to exit.

.. note:: Under certain conditions, the action of taking ``out`` an OSD
   might lead CRUSH to encounter a corner case in which some PGs remain stuck
   in the ``active+remapped`` state. This problem sometimes occurs in small
   clusters with few hosts (for example, in a small testing cluster). To
   address this problem, mark the OSD ``in`` by running a command of the
   following form:

   .. prompt:: bash $

      ceph osd in {osd-num}

   After the OSD has come back to its initial state, do not mark the OSD
   ``out`` again. Instead, set the OSD's weight to ``0`` by running a command
   of the following form:

   .. prompt:: bash $

      ceph osd crush reweight osd.{osd-num} 0

   After the OSD has been reweighted, observe the data migration and confirm
   that it has completed successfully. The difference between marking an OSD
   ``out`` and reweighting the OSD to ``0`` has to do with the bucket that
   contains the OSD. When an OSD is marked ``out``, the weight of the bucket
   is not changed. But when an OSD is reweighted to ``0``, the weight of the
   bucket is updated (namely, the weight of the OSD is subtracted from the
   overall weight of the bucket). When operating small clusters, it can
   sometimes be preferable to use the above reweight command.


Stopping the OSD
----------------

After you take an OSD ``out`` of the cluster, the OSD might still be running.
In such a case, the OSD is ``up`` and ``out``. Before it is removed from the
cluster, the OSD must be stopped by running commands of the following form:

.. prompt:: bash $

   ssh {osd-host}
   sudo systemctl stop ceph-osd@{osd-num}

After the OSD has been stopped, it is ``down``.


Removing the OSD
----------------

The following procedure removes an OSD from the cluster map, removes the OSD's
authentication key, removes the OSD from the OSD map, and removes the OSD from
the ``ceph.conf`` file. If your host has multiple drives, it might be necessary
to remove an OSD from each drive by repeating this procedure.

#. Begin by having the cluster forget the OSD. This step removes the OSD from
   the CRUSH map, removes the OSD's authentication key, and removes the OSD
   from the OSD map. (The :ref:`purge subcommand <ceph-admin-osd>` was
   introduced in Luminous. For older releases, see :ref:`the procedure linked
   here <ceph_osd_purge_procedure_pre_luminous>`.)

   .. prompt:: bash $

      ceph osd purge {id} --yes-i-really-mean-it

#. Navigate to the host where the master copy of the cluster's ``ceph.conf``
   file is kept:

   .. prompt:: bash $

      ssh {admin-host}
      cd /etc/ceph
      vim ceph.conf

#. Remove the OSD entry from your ``ceph.conf`` file (if such an entry
   exists)::

       [osd.1]
       host = {hostname}

#. Copy the updated ``ceph.conf`` file from the location on the host where the
   master copy of the cluster's ``ceph.conf`` is kept to the ``/etc/ceph``
   directory of the other hosts in your cluster.
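
   One way to do this, assuming SSH access to each host (``{other-host}`` here
   stands for the name of each remaining host in the cluster):

   .. prompt:: bash $

      scp /etc/ceph/ceph.conf {other-host}:/etc/ceph/ceph.conf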

.. _ceph_osd_purge_procedure_pre_luminous:

If your Ceph cluster is older than Luminous, you will be unable to use the
``ceph osd purge`` command. Instead, carry out the following procedure:

#. Remove the OSD from the CRUSH map so that it no longer receives data (for
   more details, see `Remove an OSD`_):

   .. prompt:: bash $

      ceph osd crush remove {name}

   Instead of removing the OSD from the CRUSH map, you might opt for one of
   two alternatives: (1) decompile the CRUSH map, remove the OSD from the
   device list, and remove the device from the host bucket; (2) remove the
   host bucket from the CRUSH map (provided that it is in the CRUSH map and
   that you intend to remove the host), recompile the map, and set it.

#. Remove the OSD authentication key:

   .. prompt:: bash $

      ceph auth del osd.{osd-num}

#. Remove the OSD:

   .. prompt:: bash $

      ceph osd rm {osd-num}

   For example:

   .. prompt:: bash $

      ceph osd rm 1

.. _Remove an OSD: ../crush-map#removeosd