.. _device management: ../rados/operations/devices
.. _libstoragemgmt: https://github.com/libstorage/libstoragemgmt

``ceph-volume`` scans each host in the cluster from time to time in order
to determine which devices are present and whether they are eligible to be
used as OSDs.

To print a list of devices discovered by ``cephadm``, run this command:

.. prompt:: bash #

   ceph orch device ls [--hostname=...] [--wide] [--refresh]

Example::

  Hostname  Path      Type  Serial        Size   Health   Ident  Fault  Available
  srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G   Unknown  N/A    N/A    No
  srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G   Unknown  N/A    N/A    No
  srv-01    /dev/sdd  hdd   15R0A07DFRD6  300G   Unknown  N/A    N/A    No
  srv-01    /dev/sde  hdd   15P0A0QDFRD6  300G   Unknown  N/A    N/A    No
  srv-02    /dev/sdb  hdd   15R0A033FRD6  300G   Unknown  N/A    N/A    No
  srv-02    /dev/sdc  hdd   15R0A05XFRD6  300G   Unknown  N/A    N/A    No
  srv-02    /dev/sde  hdd   15R0A0ANFRD6  300G   Unknown  N/A    N/A    No
  srv-02    /dev/sdf  hdd   15R0A06EFRD6  300G   Unknown  N/A    N/A    No
  srv-03    /dev/sdb  hdd   15R0A0OGFRD6  300G   Unknown  N/A    N/A    No
  srv-03    /dev/sdc  hdd   15R0A0P7FRD6  300G   Unknown  N/A    N/A    No
  srv-03    /dev/sdd  hdd   15R0A0O7FRD6  300G   Unknown  N/A    N/A    No

Using the ``--wide`` option provides all details relating to the device,
including any reasons that the device might not be eligible for use as an OSD.

In the above example you can see fields named "Health", "Ident", and "Fault".
This information is provided by integration with `libstoragemgmt`_. By default,
this integration is disabled (because `libstoragemgmt`_ may not be 100%
compatible with your hardware). To make ``cephadm`` include these fields,
enable cephadm's "enhanced device scan" option as follows:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/device_enhanced_scan true

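To confirm that the option has been set, you can read the value back (a quick
sanity check, not a required step):

.. prompt:: bash #

   ceph config get mgr mgr/cephadm/device_enhanced_scan
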
Although the libstoragemgmt library performs standard SCSI inquiry calls,
there is no guarantee that your firmware fully implements these standards.
This can lead to erratic behaviour and even bus resets on some older
hardware. It is therefore recommended that, before enabling this feature,
you test your hardware's compatibility with libstoragemgmt in order to avoid
unplanned interruptions to services.

There are a number of ways to test compatibility, but the simplest may be
to use the cephadm shell to call libstoragemgmt directly: ``cephadm shell
lsmcli ldl``. If your hardware is supported, you should see something like
this::

  Path     | SCSI VPD 0x83    | Link Type | Serial Number | Health Status
  -------------------------------------------------------------------------
  /dev/sda | 50000396082ba631 | SAS       | 15P0A0R0FRD6  | Good
  /dev/sdb | 50000396082bbbf9 | SAS       | 15P0A0YFFRD6  | Good

After you have enabled libstoragemgmt support, the output will look something
like this::

  Hostname  Path      Type  Serial        Size   Health   Ident  Fault  Available
  srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G   Good     Off    Off    No
  srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G   Good     Off    Off    No

In this example, libstoragemgmt has confirmed the health of the drives and the ability to
interact with the Identification and Fault LEDs on the drive enclosures. For further
information about interacting with these LEDs, refer to `device management`_.

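For example, assuming your enclosure is supported, a drive's identification LED
can be switched on with the ``ceph device light`` command described on that
page (``<devid>`` is the device id reported by ``ceph device ls``):

.. prompt:: bash #

   ceph device light on <devid> ident
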
The current release of `libstoragemgmt`_ (1.8.8) supports SCSI, SAS, and SATA based
local disks only. There is no official support for NVMe devices (PCIe).

.. _cephadm-deploy-osds:

Listing Storage Devices
-----------------------

In order to deploy an OSD, there must be a storage device that is *available* on
which the OSD will be deployed.

Run this command to display an inventory of storage devices on all cluster hosts:

.. prompt:: bash #

   ceph orch device ls

A storage device is considered *available* if all of the following
conditions are met:

* The device must have no partitions.
* The device must not have any LVM state.
* The device must not be mounted.
* The device must not contain a file system.
* The device must not contain a Ceph BlueStore OSD.
* The device must be larger than 5 GB.

Ceph will not provision an OSD on a device that is not available.

There are a few ways to create new OSDs:

* Tell Ceph to consume any available and unused storage device:

  .. prompt:: bash #

     ceph orch apply osd --all-available-devices

* Create an OSD from a specific device on a specific host:

  .. prompt:: bash #

     ceph orch daemon add osd *<host>*:*<device-path>*

  For example:

  .. prompt:: bash #

     ceph orch daemon add osd host1:/dev/sdb

* You can use :ref:`drivegroups` to categorize device(s) based on their
  properties. This might be useful in forming a clearer picture of which
  devices are available to consume. Properties include device type (SSD or
  HDD), device model names, size, and the hosts on which the devices exist
  (see the sketch below):

  .. prompt:: bash #

     ceph orch apply -i spec.yml

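As an illustration, a minimal ``spec.yml`` of this kind might look like the
following sketch; the service id and the ``rotational`` filter are only
examples, see :ref:`drivegroups` for the full syntax:

.. code-block:: yaml

    service_type: osd
    service_id: example_hdd_osds   # illustrative name
    placement:
      host_pattern: '*'            # apply to every registered host
    data_devices:
      rotational: 1                # use only spinning disks as data devices
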
The ``--dry-run`` flag causes the orchestrator to present a preview of what
will happen without actually creating the OSDs. For example:

.. prompt:: bash #

   ceph orch apply osd --all-available-devices --dry-run

Expected output::

  NAME                   HOST   DATA      DB   WAL
  all-available-devices  node1  /dev/vdb  -    -
  all-available-devices  node2  /dev/vdc  -    -
  all-available-devices  node3  /dev/vdd  -    -

.. _cephadm-osd-declarative:

Declarative State
-----------------

Note that the effect of ``ceph orch apply`` is persistent; that is, drives that
are added to the system or become available (say, by zapping) after the command
completes will be automatically found and added to the cluster.

That is, after using::

  ceph orch apply osd --all-available-devices

* If you add new disks to the cluster, they will automatically be used to create new OSDs.
* A new OSD will be created automatically if you remove an OSD and clean the LVM physical volume.

If you want to avoid this behavior (that is, disable the automatic creation of OSDs on available devices), use the ``unmanaged`` parameter:

.. prompt:: bash #

   ceph orch apply osd --all-available-devices --unmanaged=true

* For cephadm, see also :ref:`cephadm-spec-unmanaged`.

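If you deploy OSDs through a spec file, the same effect can be achieved by
setting ``unmanaged`` in the spec itself; a minimal sketch (the service id is
illustrative):

.. code-block:: yaml

    service_type: osd
    service_id: example_unmanaged_osds   # illustrative name
    unmanaged: true                      # cephadm will not act on this spec automatically
    placement:
      host_pattern: '*'
    data_devices:
      all: true
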
Removing an OSD from a cluster involves two steps:

#. evacuating all placement groups (PGs) from the OSD
#. removing the PG-free OSD from the cluster

The following command performs these two steps:

.. prompt:: bash #

   ceph orch osd rm <osd_id(s)> [--replace] [--force]

Expected output::

  Scheduled OSD(s) for removal

OSDs that are not safe to destroy will be rejected.

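If you want to check beforehand whether a particular OSD can be removed safely,
you can ask the cluster directly (the OSD id ``4`` is illustrative):

.. prompt:: bash #

   ceph osd safe-to-destroy 4
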
You can query the state of the OSD removal operation with the following command:

.. prompt:: bash #

   ceph orch osd rm status

Expected output::

  OSD_ID  HOST         STATE                     PG_COUNT  REPLACE  FORCE  STARTED_AT
  2       cephadm-dev  done, waiting for purge   0         True     False  2020-07-17 13:01:43.147684
  3       cephadm-dev  draining                  17        False    True   2020-07-17 13:01:45.162158
  4       cephadm-dev  started                   42        False    True   2020-07-17 13:01:45.162158

When no PGs are left on the OSD, it will be decommissioned and removed from the cluster.

After removing an OSD, if you wipe the LVM physical volume on the device used by the removed OSD, a new OSD will be created on that device.
For more information on this, read about the ``unmanaged`` parameter in :ref:`cephadm-osd-declarative`.

It is possible to stop queued OSD removals by using the following command:

.. prompt:: bash #

   ceph orch osd rm stop <osd_id(s)>

Example:

.. prompt:: bash #

   ceph orch osd rm stop 4

Expected output::

  Stopped OSD(s) removal

This resets the OSD to its initial state and takes it off the removal queue.

To replace an OSD rather than remove it permanently, pass the ``--replace``
flag to the removal command:

.. prompt:: bash #

   ceph orch osd rm <osd_id(s)> --replace [--force]

Example:

.. prompt:: bash #

   ceph orch osd rm 4 --replace

Expected output::

  Scheduled OSD(s) for replacement

This follows the same procedure as in the "Remove OSD" section, with one
exception: the OSD is not permanently removed from the CRUSH hierarchy, but
is instead assigned a 'destroyed' flag.

**Preserving the OSD ID**

The 'destroyed' flag is used to determine which OSD ids will be reused in the
next OSD deployment.

If you use OSDSpecs for OSD deployment, your newly added disks will be assigned
the OSD ids of their replaced counterparts. This assumes that the new disks
still match the OSDSpecs.

Use the ``--dry-run`` flag to make certain that the ``ceph orch apply osd``
command does what you want it to. The ``--dry-run`` flag shows you what the
outcome of the command will be without making the changes you specify. When
you are satisfied that the command will do what you want, run the command
without the ``--dry-run`` flag.

The name of your OSDSpec can be retrieved with the command ``ceph orch ls``.

Alternatively, you can use your OSDSpec file:

.. prompt:: bash #

   ceph orch apply osd -i <osd_spec_file> --dry-run

Expected output::

  NAME                 HOST   DATA      DB   WAL
  <name_of_osd_spec>   node1  /dev/vdb  -    -

When this output reflects your intention, omit the ``--dry-run`` flag to
execute the deployment.

Erasing Devices (Zapping Devices)
---------------------------------

Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume
zap`` on the remote host.

.. prompt:: bash #

   ceph orch device zap <hostname> <path>

Example command:

.. prompt:: bash #

   ceph orch device zap my_hostname /dev/sdx

If the unmanaged flag is unset, cephadm automatically deploys drives that
match the DriveGroup in your OSDSpec. For example, if you use the
``all-available-devices`` option when creating OSDs, when you ``zap`` a
device the cephadm orchestrator automatically creates a new OSD on that
device. To disable this behavior, see :ref:`cephadm-osd-declarative`.

Advanced OSD Service Specifications
===================================

:ref:`orchestrator-cli-service-spec` of type ``osd`` are a way to describe a
cluster layout using the properties of disks. They give the user an abstract
way to tell Ceph which disks should turn into OSDs with which configurations,
without knowing the specifics of device names and paths.

Instead of doing this

.. prompt:: bash [monitor.1]#

   ceph orch daemon add osd *<host>*:*<path-to-device>*

for each device and each host, we can define a YAML or JSON file that allows us
to describe the layout. Here's the most basic example.

Create a file called (for example) ``osd_spec.yml``:

.. code-block:: yaml

    service_type: osd
    service_id: default_drive_group  <- name of the drive_group (name can be custom)
    placement:
      host_pattern: '*'              <- which hosts to target, currently only supports globs
    data_devices:                    <- the type of devices you are applying specs to
      all: true                      <- a filter, check below for a full list

This would translate to:

Turn any available device (ceph-volume decides what 'available' is) into an
OSD on all hosts that match the glob pattern '*'. (The glob pattern matches
against the registered hosts from `host ls`.) There will be a more detailed
section on host_pattern down below.

And pass it to `osd create` like so:

.. prompt:: bash [monitor.1]#

   ceph orch apply osd -i /path/to/osd_spec.yml

This will go out on all the matching hosts and deploy these OSDs.

Since we may want to have more complex setups, there are more filters than just the 'all' filter.

Also, there is a ``--dry-run`` flag that can be passed to the ``apply osd`` command, which gives you a synopsis
of the proposed layout.

Example:

.. prompt:: bash [monitor.1]#

   ceph orch apply osd -i /path/to/osd_spec.yml --dry-run

Filters are applied using an `AND` gate by default. This means that a drive
must fulfill all filter criteria in order to get selected. If you wish to
change this behavior, you can do so by setting

``filter_logic: OR``  # valid arguments are `AND`, `OR`

in the OSD Specification.

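For example, a sketch of a spec that selects a device when it matches *either*
of two filters (the service id and filter values are illustrative):

.. code-block:: yaml

    service_type: osd
    service_id: example_filter_logic   # illustrative name
    placement:
      host_pattern: '*'
    filter_logic: OR                   # a device matching any data_devices filter is selected
    data_devices:
      rotational: 1
      size: '2TB:'
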
You can assign disks to certain groups by their attributes using filters.

The attributes are based on ceph-volume's disk query. You can retrieve this
information with the following command:

.. prompt:: bash #

   ceph-volume inventory </path/to/disk>

You can target specific disks by their Vendor or by their Model:

.. code-block:: yaml

    model: disk_model_name

or

.. code-block:: yaml

    vendor: disk_vendor_name

You can also match by disk `Size`. A size specification can take any of the
following forms:

* an exact size: ``size: '10G'`` includes disks of exactly 10G
* a range ``LOW:HIGH``: ``size: '10G:40G'`` includes disks whose size is within the range
* an upper bound ``:HIGH``: ``size: ':10G'`` includes disks less than or equal to 10G in size
* a lower bound ``LOW:``: ``size: '40G:'`` includes disks equal to or greater than 40G in size

Sizes don't have to be specified exclusively in gigabytes (G). Supported units
are megabytes (M), gigabytes (G) and terabytes (T). Appending the (B) for bytes
is also supported: ``MB``, ``GB``, ``TB``.

This operates on the ``rotational`` attribute of the disk:

* ``rotational: 1`` matches all disks that are rotational
* ``rotational: 0`` matches all disks that are non-rotational (SSD, NVMe, etc.)

This will take all disks that are 'available'.

Note: This filter is valid only for the ``data_devices`` section.

When you have specified valid filters but want to limit the number of matching
disks, you can use the ``limit`` directive.

For example, if you used `vendor` to match all disks that are from `VendorA`
but want to use only the first two, you could use `limit`, as sketched below.

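A minimal sketch of this (``VendorA`` being the vendor string the disks report):

.. code-block:: yaml

    data_devices:
      vendor: VendorA
      limit: 2        # use at most two of the matching disks
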
Note: Be aware that `limit` is really just a last resort and shouldn't be used if it can be avoided.

There are multiple optional settings you can use to change the way OSDs are deployed.
You can add these options to the base level of a DriveGroup for them to take effect.

This example would deploy all OSDs with encryption enabled:

.. code-block:: yaml

    service_type: osd
    service_id: example_osd_spec
    placement:
      host_pattern: '*'
    data_devices:
      all: true
    encrypted: true

See a full list of options in the DriveGroupSpec:

.. py:currentmodule:: ceph.deployment.drive_group

.. autoclass:: DriveGroupSpec
   :members:
   :exclude-members: from_json

All nodes have the same setup.

This is a common setup and can be described quite easily:

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_default
    placement:
      host_pattern: '*'
    data_devices:
      model: HDD-123-foo  <- note that HDD-123 would also be valid
    db_devices:
      model: MC-55-44-XZ  <- same here, MC-55-44 is valid

However, we can improve it by reducing the filters to core properties of the drives:

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_default
    placement:
      host_pattern: '*'
    data_devices:
      rotational: 1
    db_devices:
      rotational: 0

Now we enforce that all rotating devices are declared as 'data devices', and
all non-rotating devices will be used as shared devices (wal, db).

If you know that drives larger than 2 TB will always be the slower data devices, you can also filter by size:

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_default
    placement:
      host_pattern: '*'
    data_devices:
      size: '2TB:'
    db_devices:
      size: ':2TB'

Note: All of the above DriveGroups are equally valid. Which of them you want to use depends on taste and on how much you expect your node layout to change.

Here we have two distinct setups:

* 20 HDDs should share 2 SSDs
* 10 SSDs should share 2 NVMes

This can be described with two layouts:

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_hdd
    placement:
      host_pattern: '*'
    data_devices:
      rotational: 1       <- the HDDs
    db_devices:
      model: MC-55-44-XZ  <- the model of the SSDs
      limit: 2            <- db_slots is actually to be favoured here, but it's not implemented yet
    ---
    service_type: osd
    service_id: osd_spec_ssd
    placement:
      host_pattern: '*'
    data_devices:
      model: MC-55-44-XZ  <- the remaining SSDs
    db_devices:
      vendor: VendorC     <- the NVMe devices

This would create the desired layout by using all HDDs as data_devices, with two SSDs assigned as dedicated db/wal devices.
The remaining SSDs (8) will be data_devices that have the 'VendorC' NVMEs assigned as dedicated db/wal devices.

The advanced case (with non-uniform nodes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The examples above assumed that all nodes have the same drives. However, that's not always the case.

You can use the 'host_pattern' key in the layout to target certain nodes; glob patterns help to keep things easy.

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_node_one_to_five
    placement:
      host_pattern: 'node[1-5]'
    data_devices:
      rotational: 1
    db_devices:
      rotational: 0
    ---
    service_type: osd
    service_id: osd_spec_six_to_ten
    placement:
      host_pattern: 'node[6-10]'
    data_devices:
      model: MC-55-44-XZ
    db_devices:
      model: SSD-123-foo

This applies different OSD specs to different hosts, depending on the `host_pattern` key.

All previous cases co-located the WALs with the DBs.
However, it is possible to deploy the WAL on a dedicated device as well, if it makes sense.

The OSD spec for this case would look like the following (using the `model` filter):

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_default
    placement:
      host_pattern: '*'
    data_devices:
      model: MC-55-44-XZ
    db_devices:
      model: SSD-123-foo
    wal_devices:
      model: NVME-QQQQ-987

It is also possible to specify device paths on specific hosts directly, like this:

.. code-block:: yaml

    service_type: osd
    service_id: osd_using_paths
    placement:
      hosts:
        - Node01
        - Node02
    data_devices:
      paths:
        - /dev/sdb
    db_devices:
      paths:
        - /dev/sdc
    wal_devices:
      paths:
        - /dev/sdd

This can easily be done with other filters, like `size` or `vendor`, as well.

Activate existing OSDs
======================

If the OS of a host has been reinstalled, the existing OSDs on it need to be
activated again. For this use case, cephadm provides a wrapper for
:ref:`ceph-volume-lvm-activate` that activates all existing OSDs on a host:

.. prompt:: bash #

   ceph cephadm osd activate <host>...

This will scan all existing disks for OSDs and deploy corresponding daemons.

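For example, to activate the OSDs on a reinstalled host named ``host1`` (the
hostname here is illustrative):

.. prompt:: bash #

   ceph cephadm osd activate host1
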