======================
 Adding/Removing OSDs
======================

When you have a cluster up and running, you may add OSDs or remove OSDs
from the cluster at runtime.

Adding OSDs
===========

When you want to expand a cluster, you may add an OSD at runtime. With Ceph, an
OSD is generally one ``ceph-osd`` daemon for one storage drive within a host
machine. If your host has multiple storage drives, you may map one ``ceph-osd``
daemon to each drive.

Generally, it's a good idea to check your cluster's capacity so that you know
when it is approaching its upper limit. As your cluster reaches its ``near
full`` ratio, you should add one or more OSDs to expand your cluster's capacity.

.. warning:: Do not let your cluster reach its ``full ratio`` before
   adding an OSD. OSD failures that occur after the cluster reaches
   its ``near full`` ratio may cause the cluster to exceed its
   ``full ratio``.

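You can check overall and per-OSD utilization with the standard status
commands; a quick sketch (the exact output format varies by release)::

    ceph df         # overall cluster capacity and per-pool usage
    ceph osd df     # per-OSD utilization, including %USE and variance
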
Deploy your Hardware
--------------------

If you are adding a new host when adding a new OSD, see `Hardware
Recommendations`_ for details on minimum recommendations for OSD hardware. To
add an OSD host to your cluster, first make sure you have an up-to-date version
of Linux installed, and that you have made some initial preparations for your
storage drives. See `Filesystem Recommendations`_ for details.

Add your OSD host to a rack in your cluster, connect it to the network
and ensure that it has network connectivity. See the `Network Configuration
Reference`_ for details.

.. _Hardware Recommendations: ../../../start/hardware-recommendations
.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
.. _Network Configuration Reference: ../../configuration/network-config-ref

Install the Required Software
-----------------------------

For manually deployed clusters, you must install the Ceph packages
manually. See `Installing Ceph (Manual)`_ for details.
You should configure SSH for a user with password-less authentication
and root permissions.

.. _Installing Ceph (Manual): ../../../install


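A minimal sketch of setting up password-less SSH for such a user (the user
name ``cephadm`` is only an example)::

    ssh-keygen -t ed25519                   # generate a key pair; accept the defaults
    ssh-copy-id cephadm@{new-osd-host}      # install the public key on the new host
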
Adding an OSD (Manual)
----------------------

This procedure sets up a ``ceph-osd`` daemon, configures it to use one drive,
and configures the cluster to distribute data to the OSD. If your host has
multiple drives, you may add an OSD for each drive by repeating this procedure.
A consolidated example with concrete values follows the steps below.

To add an OSD, create a data directory for it, mount a drive to that directory,
add the OSD to the cluster, and then add it to the CRUSH map.

When you add the OSD to the CRUSH map, consider the weight you give to the new
OSD. Hard drive capacity tends to increase over time, so newer OSD hosts may
have larger hard drives than older hosts in the cluster (i.e., they may have
greater weight).

.. tip:: Ceph prefers uniform hardware across pools. If you are adding drives
   of dissimilar size, you can adjust their weights. However, for best
   performance, consider a CRUSH hierarchy with drives of the same type/size.

#. Create the OSD. If no UUID is given, it will be set automatically when the
   OSD starts up. The following command will output the OSD number, which you
   will need for subsequent steps. ::

        ceph osd create [{uuid} [{id}]]

   If the optional parameter ``{id}`` is given, it will be used as the OSD ID.
   Note that in this case the command may fail if that ID is already in use.

   .. warning:: In general, explicitly specifying ``{id}`` is not recommended.
      IDs are allocated as an array, and skipping entries consumes some extra
      memory. This can become significant if there are large gaps and/or
      clusters are large. If ``{id}`` is not specified, the smallest available
      ID is used.

#. Create the default directory on your new OSD. ::

        ssh {new-osd-host}
        sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}


#. If the OSD is for a drive other than the OS drive, prepare it
   for use with Ceph, and mount it to the directory you just created::

        ssh {new-osd-host}
        sudo mkfs -t {fstype} /dev/{drive}
        sudo mount -o user_xattr /dev/{drive} /var/lib/ceph/osd/ceph-{osd-number}


#. Initialize the OSD data directory. ::

        ssh {new-osd-host}
        ceph-osd -i {osd-num} --mkfs --mkkey

   The directory must be empty before you can run ``ceph-osd``.

#. Register the OSD authentication key. The value of ``ceph`` for
   ``ceph-{osd-num}`` in the path is the ``$cluster-$id``. If your
   cluster name differs from ``ceph``, use your cluster name instead. ::

        ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring


#. Add the OSD to the CRUSH map so that the OSD can begin receiving data. The
   ``ceph osd crush add`` command allows you to add OSDs to the CRUSH hierarchy
   wherever you wish. If you specify at least one bucket, the command
   will place the OSD into the most specific bucket you specify, *and* it will
   move that bucket underneath any other buckets you specify. **Important:** If
   you specify only the root bucket, the command will attach the OSD directly
   to the root, but CRUSH rules expect OSDs to be inside of hosts.

   Execute the following::

        ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

   You may also decompile the CRUSH map, add the OSD to the device list, add the
   host as a bucket (if it's not already in the CRUSH map), add the device as an
   item in the host, assign it a weight, recompile the map and set it. See
   `Add/Move an OSD`_ for details.


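Putting the steps together, a hypothetical run on host ``node1`` with a spare
drive ``/dev/sdb`` might look like the following. The OSD number ``12``, the
CRUSH weight ``1.0``, and all host and device names are examples only::

    ceph osd create                                  # suppose this prints 12
    ssh node1                                        # the remaining commands run on the OSD host
    sudo mkdir /var/lib/ceph/osd/ceph-12
    sudo mkfs -t xfs /dev/sdb                        # xfs is only an example filesystem
    sudo mount /dev/sdb /var/lib/ceph/osd/ceph-12
    ceph-osd -i 12 --mkfs --mkkey
    ceph auth add osd.12 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-12/keyring
    ceph osd crush add osd.12 1.0 host=node1 root=default
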
.. _rados-replacing-an-osd:

Replacing an OSD
----------------

When disks fail, or if an administrator wants to reprovision OSDs with a new
backend (for instance, when switching from FileStore to BlueStore), OSDs need to
be replaced. Unlike `Removing the OSD`_, the replaced OSD's ID and CRUSH map
entry need to be kept intact after the OSD is destroyed for replacement.

#. Make sure it is safe to destroy the OSD::

        while ! ceph osd safe-to-destroy osd.{id} ; do sleep 10 ; done

#. Destroy the OSD first::

        ceph osd destroy {id} --yes-i-really-mean-it

#. Zap the disk for the new OSD if the disk was previously used for other
   purposes. This is not necessary for a brand-new disk::

        ceph-volume lvm zap /dev/sdX

#. Prepare the disk for replacement by using the previously destroyed OSD id::

        ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

#. Activate the OSD::

        ceph-volume lvm activate {id} {fsid}

Alternatively, instead of preparing and activating, the device can be recreated
in one call, like::

        ceph-volume lvm create --osd-id {id} --data /dev/sdX


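If you do not know the ``{fsid}`` for the activation step, ``ceph-volume`` can
report it; a brief sketch of looking it up::

    ceph-volume lvm list        # shows the "osd id" and "osd fsid" of each prepared device
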
Starting the OSD
----------------

After you add an OSD to Ceph, the OSD is in your configuration. However,
it is not yet running. The OSD is ``down`` and ``in``. You must start
your new OSD before it can begin receiving data. You may use
``service ceph`` from your admin host or start the OSD from its host
machine.

For Ubuntu Trusty use Upstart. ::

    sudo start ceph-osd id={osd-num}

For all other distros use systemd. ::

    sudo systemctl start ceph-osd@{osd-num}


Once you start your OSD, it is ``up`` and ``in``.

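You can confirm that the new OSD is reported ``up`` by checking the OSD tree::

    ceph osd tree
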
Observe the Data Migration
--------------------------

Once you have added your new OSD to the CRUSH map, Ceph will begin rebalancing
the cluster by migrating placement groups to your new OSD. You can observe this
process with the `ceph`_ tool. ::

    ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)

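If you prefer a point-in-time summary to a continuous stream, the status
command shows the current PG states and recovery progress::

    ceph -s
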

.. _Add/Move an OSD: ../crush-map#addosd
.. _ceph: ../monitoring


Removing OSDs (Manual)
======================

When you want to reduce the size of a cluster or replace hardware, you may
remove an OSD at runtime. With Ceph, an OSD is generally one ``ceph-osd``
daemon for one storage drive within a host machine. If your host has multiple
storage drives, you may need to remove one ``ceph-osd`` daemon for each drive.
Generally, it's a good idea to check the capacity of your cluster to see if you
are reaching the upper end of its capacity. Ensure that when you remove an OSD
your cluster is not at its ``near full`` ratio.

.. warning:: Do not let your cluster reach its ``full ratio`` when
   removing an OSD. Removing OSDs could cause the cluster to reach
   or exceed its ``full ratio``.


Take the OSD out of the Cluster
-------------------------------

Before you remove an OSD, it is usually ``up`` and ``in``. You need to take it
out of the cluster so that Ceph can begin rebalancing and copying its data to
other OSDs. ::

    ceph osd out {osd-num}


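If you are not sure which host the OSD lives on (you will need to reach that
host later to stop the daemon), you can look it up before taking the OSD out;
a brief sketch::

    ceph osd find {osd-num}     # reports the OSD's CRUSH location, including its host
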
Observe the Data Migration
--------------------------

Once you have taken your OSD ``out`` of the cluster, Ceph will begin
rebalancing the cluster by migrating placement groups out of the OSD you
removed. You can observe this process with the `ceph`_ tool. ::

    ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)

.. note:: Sometimes, typically in a "small" cluster with few hosts (for
   instance with a small testing cluster), taking the OSD ``out`` can trigger
   a CRUSH corner case where some PGs remain stuck in the
   ``active+remapped`` state. If this happens, you should mark
   the OSD ``in`` with:

      ``ceph osd in {osd-num}``

   to return to the initial state and then, instead of marking the OSD
   ``out``, set its weight to 0 with:

      ``ceph osd crush reweight osd.{osd-num} 0``

   After that, you can observe the data migration, which should come to its
   end. The difference between marking the OSD ``out`` and reweighting it
   to 0 is that in the first case the weight of the bucket which contains
   the OSD is not changed, whereas in the second case the weight of the bucket
   is updated (decreased by the OSD weight). The reweight command may
   sometimes be preferable in the case of a "small" cluster.



Stopping the OSD
----------------

After you take an OSD out of the cluster, it may still be running.
That is, the OSD may be ``up`` and ``out``. You must stop
your OSD before you remove it from the configuration. ::

    ssh {osd-host}
    sudo systemctl stop ceph-osd@{osd-num}

Once you stop your OSD, it is ``down``.

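If you want an extra check that stopping the daemon will not make any placement
groups unavailable (it should not, once the OSD is ``out`` and the data has
migrated), you can ask the cluster first; a brief sketch::

    ceph osd ok-to-stop {osd-num}
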

Removing the OSD
----------------

This procedure removes an OSD from the cluster map, removes its authentication
key, removes the OSD from the OSD map, and removes the OSD from the
``ceph.conf`` file. If your host has multiple drives, you may need to remove an
OSD for each drive by repeating this procedure.

#. Let the cluster forget the OSD first. This step removes the OSD from the CRUSH
   map, removes its authentication key, and removes it from the OSD map as
   well. Please note that the :ref:`purge subcommand <ceph-admin-osd>` was introduced in Luminous; for older
   versions, please see below. ::

        ceph osd purge {id} --yes-i-really-mean-it

#. Navigate to the host where you keep the master copy of the cluster's
   ``ceph.conf`` file. ::

        ssh {admin-host}
        cd /etc/ceph
        vim ceph.conf

#. Remove the OSD entry from your ``ceph.conf`` file (if it exists). ::

        [osd.1]
                host = {hostname}

#. From the host where you keep the master copy of the cluster's ``ceph.conf`` file,
   copy the updated ``ceph.conf`` file to the ``/etc/ceph`` directory of other
   hosts in your cluster.

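Copying the updated file can be scripted; a rough sketch using ``scp`` (the
host names are examples, and your user must be able to write to ``/etc/ceph``
on the remote hosts)::

    for host in osd-host-01 osd-host-02 osd-host-03; do
        scp /etc/ceph/ceph.conf ${host}:/etc/ceph/ceph.conf
    done
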
If your Ceph cluster is older than Luminous, instead of using ``ceph osd purge``,
you need to perform these steps manually:


#. Remove the OSD from the CRUSH map so that it no longer receives data. You may
   also decompile the CRUSH map, remove the OSD from the device list, remove the
   device as an item in the host bucket or remove the host bucket (if it's in the
   CRUSH map and you intend to remove the host), recompile the map and set it.
   See `Remove an OSD`_ for details. ::

        ceph osd crush remove {name}

#. Remove the OSD authentication key. ::

        ceph auth del osd.{osd-num}

   The value of ``ceph`` for ``ceph-{osd-num}`` in the path is the ``$cluster-$id``.
   If your cluster name differs from ``ceph``, use your cluster name instead.

#. Remove the OSD. ::

        ceph osd rm {osd-num}

   For example::

        ceph osd rm 1


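For example, to remove ``osd.1`` on a pre-Luminous cluster (note that the CRUSH
and auth commands take the ``osd.1`` name, while the final command is shown
with the numeric ID, as in the step above)::

    ceph osd crush remove osd.1
    ceph auth del osd.1
    ceph osd rm 1
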
.. _Remove an OSD: ../crush-map#removeosd