***********
OSD Service
***********
.. _device management: ../rados/operations/devices
.. _libstoragemgmt: https://github.com/libstorage/libstoragemgmt

List Devices
============

``ceph-volume`` scans each host in the cluster from time to time in order
to determine which devices are present and whether they are eligible to be
used as OSDs.

To print a list of devices discovered by ``cephadm``, run this command:

.. prompt:: bash #

   ceph orch device ls [--hostname=...] [--wide] [--refresh]

Example
::

  Hostname  Path      Type  Serial        Size   Health   Ident  Fault  Available
  srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G   Unknown  N/A    N/A    No
  srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G   Unknown  N/A    N/A    No
  srv-01    /dev/sdd  hdd   15R0A07DFRD6  300G   Unknown  N/A    N/A    No
  srv-01    /dev/sde  hdd   15P0A0QDFRD6  300G   Unknown  N/A    N/A    No
  srv-02    /dev/sdb  hdd   15R0A033FRD6  300G   Unknown  N/A    N/A    No
  srv-02    /dev/sdc  hdd   15R0A05XFRD6  300G   Unknown  N/A    N/A    No
  srv-02    /dev/sde  hdd   15R0A0ANFRD6  300G   Unknown  N/A    N/A    No
  srv-02    /dev/sdf  hdd   15R0A06EFRD6  300G   Unknown  N/A    N/A    No
  srv-03    /dev/sdb  hdd   15R0A0OGFRD6  300G   Unknown  N/A    N/A    No
  srv-03    /dev/sdc  hdd   15R0A0P7FRD6  300G   Unknown  N/A    N/A    No
  srv-03    /dev/sdd  hdd   15R0A0O7FRD6  300G   Unknown  N/A    N/A    No

Using the ``--wide`` option provides all details relating to the device,
including any reasons that the device might not be eligible for use as an OSD.

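To limit the listing to a single host and to ask cephadm to refresh its cached
device information, the options shown above can be combined; for example (the
hostname is taken from the example output above):

.. prompt:: bash #

   ceph orch device ls --hostname=srv-01 --wide --refresh
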
In the above example you can see fields named "Health", "Ident", and "Fault".
This information is provided by integration with `libstoragemgmt`_. By default,
this integration is disabled (because `libstoragemgmt`_ may not be 100%
compatible with your hardware). To make ``cephadm`` include these fields,
enable cephadm's "enhanced device scan" option as follows:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/device_enhanced_scan true

.. warning::
    Although the libstoragemgmt library performs standard SCSI inquiry calls,
    there is no guarantee that your firmware fully implements these standards.
    This can lead to erratic behaviour and even bus resets on some older
    hardware. It is therefore recommended that, before enabling this feature,
    you test your hardware's compatibility with libstoragemgmt first to avoid
    unplanned interruptions to services.

    There are a number of ways to test compatibility, but the simplest may be
    to use the cephadm shell to call libstoragemgmt directly - ``cephadm shell
    lsmcli ldl``. If your hardware is supported you should see something like
    this:

    ::

      Path     | SCSI VPD 0x83    | Link Type | Serial Number | Health Status
      ----------------------------------------------------------------------------
      /dev/sda | 50000396082ba631 | SAS       | 15P0A0R0FRD6  | Good
      /dev/sdb | 50000396082bbbf9 | SAS       | 15P0A0YFFRD6  | Good


After you have enabled libstoragemgmt support, the output will look something
like this:

::

  # ceph orch device ls
  Hostname  Path      Type  Serial        Size   Health   Ident  Fault  Available
  srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G   Good     Off    Off    No
  srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G   Good     Off    Off    No
  :

In this example, libstoragemgmt has confirmed the health of the drives and the ability to
interact with the Identification and Fault LEDs on the drive enclosures. For further
information about interacting with these LEDs, refer to `device management`_.

.. note::
    The current release of `libstoragemgmt`_ (1.8.8) supports SCSI, SAS, and SATA based
    local disks only. There is no official support for NVMe devices (PCIe).

.. _cephadm-deploy-osds:

Deploy OSDs
===========

Listing Storage Devices
-----------------------

In order to deploy an OSD, there must be a storage device that is *available* on
which the OSD will be deployed.

Run this command to display an inventory of storage devices on all cluster hosts:

.. prompt:: bash #

   ceph orch device ls

A storage device is considered *available* if all of the following
conditions are met:

* The device must have no partitions.
* The device must not have any LVM state.
* The device must not be mounted.
* The device must not contain a file system.
* The device must not contain a Ceph BlueStore OSD.
* The device must be larger than 5 GB.

Ceph will not provision an OSD on a device that is not available.

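A device that fails these checks only because of leftover LVM state or an old
file system can be made available again by erasing (zapping) it, as described
in `Erasing Devices (Zapping Devices)`_ below. For example (hostname and device
path are illustrative):

.. prompt:: bash #

   ceph orch device zap my_hostname /dev/sdx
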
Creating New OSDs
-----------------

There are a few ways to create new OSDs:

* Tell Ceph to consume any available and unused storage device:

  .. prompt:: bash #

     ceph orch apply osd --all-available-devices

* Create an OSD from a specific device on a specific host:

  .. prompt:: bash #

     ceph orch daemon add osd *<host>*:*<device-path>*

  For example:

  .. prompt:: bash #

     ceph orch daemon add osd host1:/dev/sdb

  Advanced OSD creation from specific devices on a specific host:

  .. prompt:: bash #

     ceph orch daemon add osd host1:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2

* You can use :ref:`drivegroups` to categorize device(s) based on their
  properties. This might be useful in forming a clearer picture of which
  devices are available to consume. Properties include device type (SSD or
  HDD), device model names, size, and the hosts on which the devices exist.
  A minimal example spec file is sketched below; pass it to the orchestrator
  with:

  .. prompt:: bash #

     ceph orch apply -i spec.yml

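For illustration only (the service id is arbitrary and the full syntax is
described in :ref:`drivegroups`), such a ``spec.yml`` might categorize
rotational devices as data devices and non-rotational devices as DB devices:

.. code-block:: yaml

   service_type: osd
   service_id: example_drive_group   # arbitrary name for this spec
   placement:
     host_pattern: '*'               # apply to all registered hosts
   spec:
     data_devices:
       rotational: 1                 # HDDs become data devices
     db_devices:
       rotational: 0                 # SSDs/NVMe hold DB/WAL
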
Dry Run
-------

The ``--dry-run`` flag causes the orchestrator to present a preview of what
will happen without actually creating the OSDs.

For example:

  .. prompt:: bash #

    ceph orch apply osd --all-available-devices --dry-run

  ::

    NAME                  HOST  DATA      DB  WAL
    all-available-devices node1 /dev/vdb  -   -
    all-available-devices node2 /dev/vdc  -   -
    all-available-devices node3 /dev/vdd  -   -

.. _cephadm-osd-declarative:

Declarative State
-----------------

The effect of ``ceph orch apply`` is persistent. This means that drives that
are added to the system after the ``ceph orch apply`` command completes will be
automatically found and added to the cluster. It also means that drives that
become available (by zapping, for example) after the ``ceph orch apply``
command completes will be automatically found and added to the cluster.

We will examine the effects of the following command:

  .. prompt:: bash #

    ceph orch apply osd --all-available-devices

After running the above command:

* If you add new disks to the cluster, they will automatically be used to
  create new OSDs.
* If you remove an OSD and clean the LVM physical volume, a new OSD will be
  created automatically.

If you want to avoid this behavior (that is, disable the automatic creation of
OSDs on available devices), use the ``unmanaged`` parameter:

.. prompt:: bash #

   ceph orch apply osd --all-available-devices --unmanaged=true

.. note::

    Keep these three facts in mind:

    - The default behavior of ``ceph orch apply`` causes cephadm to reconcile
      continuously. This means that cephadm creates OSDs as soon as new drives
      are detected.

    - Setting ``unmanaged: True`` disables the creation of OSDs. If
      ``unmanaged: True`` is set, nothing will happen even if you apply a new
      OSD service.

    - ``ceph orch daemon add`` creates OSDs, but does not add an OSD service.

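The same effect can also be expressed declaratively in a service specification.
A sketch (the service id matches the ``all-available-devices`` service that the
command above creates, and ``unmanaged`` is the spec-level equivalent of the
``--unmanaged`` flag):

.. code-block:: yaml

   service_type: osd
   service_id: all-available-devices
   unmanaged: true        # stop cephadm from creating new OSDs from this spec
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       all: true
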
* For cephadm, see also :ref:`cephadm-spec-unmanaged`.

.. _cephadm-osd-removal:

Remove an OSD
=============

Removing an OSD from a cluster involves two steps:

#. evacuating all placement groups (PGs) from the OSD
#. removing the PG-free OSD from the cluster

The following command performs these two steps:

.. prompt:: bash #

   ceph orch osd rm <osd_id(s)> [--replace] [--force]

Example:

.. prompt:: bash #

   ceph orch osd rm 0

Expected output::

   Scheduled OSD(s) for removal

OSDs that are not safe to destroy will be rejected.

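If you want to check in advance whether a particular OSD can be removed without
reducing data availability, you can ask the cluster directly (shown here as an
illustrative pre-check; the exact output wording varies by release):

.. prompt:: bash #

   ceph osd safe-to-destroy 0
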
.. note::
    After removing OSDs, if the drives the OSDs were deployed on once again
    become available, cephadm may automatically try to deploy more OSDs
    on these drives if they match an existing drivegroup spec. If you deployed
    the OSDs you are removing with a spec and don't want any new OSDs deployed on
    the drives after removal, it's best to modify the drivegroup spec before removal.
    Either set ``unmanaged: true`` to stop it from picking up new drives at all,
    or modify it in some way that it no longer matches the drives used for the
    OSDs you wish to remove. Then re-apply the spec. For more info on drivegroup
    specs see :ref:`drivegroups`. For more info on the declarative nature of
    cephadm in reference to deploying OSDs, see :ref:`cephadm-osd-declarative`.

Monitoring OSD State
--------------------

You can query the state of OSD operation with the following command:

.. prompt:: bash #

   ceph orch osd rm status

Expected output::

   OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
   2       cephadm-dev  done, waiting for purge  0         True     False  2020-07-17 13:01:43.147684
   3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:45.162158
   4       cephadm-dev  started                  42        False    True   2020-07-17 13:01:45.162158


When no PGs are left on the OSD, it will be decommissioned and removed from the cluster.

.. note::
    After removing an OSD, if you wipe the LVM physical volume in the device used by the removed OSD, a new OSD will be created.
    For more information on this, read about the ``unmanaged`` parameter in :ref:`cephadm-osd-declarative`.

Stopping OSD Removal
--------------------

It is possible to stop queued OSD removals by using the following command:

.. prompt:: bash #

   ceph orch osd rm stop <osd_id(s)>

Example:

.. prompt:: bash #

   ceph orch osd rm stop 4

Expected output::

   Stopped OSD(s) removal

This resets the initial state of the OSD and takes it off the removal queue.

.. _cephadm-replacing-an-osd:

Replacing an OSD
----------------

.. prompt:: bash #

   ceph orch osd rm <osd_id(s)> --replace [--force]

Example:

.. prompt:: bash #

   ceph orch osd rm 4 --replace

Expected output::

   Scheduled OSD(s) for replacement

This follows the same procedure as the one in the "Remove an OSD" section, with
one exception: the OSD is not permanently removed from the CRUSH hierarchy, but
is instead assigned a 'destroyed' flag.

.. note::
    The new OSD that will replace the removed OSD must be created on the same host
    as the OSD that was removed.

**Preserving the OSD ID**

The 'destroyed' flag is used to determine which OSD ids will be reused in the
next OSD deployment.

If you use OSDSpecs for OSD deployment, your newly added disks will be assigned
the OSD ids of their replaced counterparts. This assumes that the new disks
still match the OSDSpecs.

Use the ``--dry-run`` flag to make certain that the ``ceph orch apply osd``
command does what you want it to. The ``--dry-run`` flag shows you what the
outcome of the command will be without making the changes you specify. When
you are satisfied that the command will do what you want, run the command
without the ``--dry-run`` flag.

.. tip::

    The name of your OSDSpec can be retrieved with the command ``ceph orch ls``.

Alternatively, you can use your OSDSpec file:

.. prompt:: bash #

   ceph orch apply -i <osd_spec_file> --dry-run

Expected output::

   NAME                HOST   DATA      DB  WAL
   <name_of_osd_spec>  node1  /dev/vdb  -   -


When this output reflects your intention, omit the ``--dry-run`` flag to
execute the deployment.

Erasing Devices (Zapping Devices)
---------------------------------

Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume
zap`` on the remote host.

.. prompt:: bash #

   ceph orch device zap <hostname> <path>

Example command:

.. prompt:: bash #

   ceph orch device zap my_hostname /dev/sdx

.. note::
    If the unmanaged flag is unset, cephadm automatically deploys drives that
    match the OSDSpec. For example, if you use the
    ``all-available-devices`` option when creating OSDs, when you ``zap`` a
    device the cephadm orchestrator automatically creates a new OSD in the
    device. To disable this behavior, see :ref:`cephadm-osd-declarative`.


.. _osd_autotune:

Automatically tuning OSD memory
===============================

OSD daemons will adjust their memory consumption based on the
``osd_memory_target`` config option (several gigabytes, by
default). If Ceph is deployed on dedicated nodes that are not sharing
memory with other services, cephadm can automatically adjust the per-OSD
memory consumption based on the total amount of RAM and the number of deployed
OSDs.

.. warning:: Cephadm sets ``osd_memory_target_autotune`` to ``true`` by default,
   which is unsuitable for hyperconverged infrastructures.

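On hyperconverged hosts, one option is to lower the fraction of RAM that the
autotuner hands to OSDs via the ``mgr/cephadm/autotune_memory_target_ratio``
option described below (the value ``0.2`` here is purely illustrative; pick a
ratio that matches how much memory your other workloads need):

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2
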
Cephadm will start with a fraction
(``mgr/cephadm/autotune_memory_target_ratio``, which defaults to
``.7``) of the total RAM in the system, subtract off any memory
consumed by non-autotuned daemons (non-OSDs, and OSDs for which
``osd_memory_target_autotune`` is false), and then divide the remaining memory
by the number of remaining OSDs.

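For example, on a hypothetical dedicated OSD host with 128 GiB of RAM, roughly
2 GiB consumed by non-autotuned daemons, and 12 autotuned OSDs (all numbers
illustrative), each OSD would receive approximately::

   (128 GiB * 0.7 - 2 GiB) / 12 ≈ 7.3 GiB per OSD
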
The final targets are reflected in the config database with options like::

   WHO   MASK      LEVEL  OPTION             VALUE
   osd   host:foo  basic  osd_memory_target  126092301926
   osd   host:bar  basic  osd_memory_target  6442450944

Both the limits and the current memory consumed by each daemon are visible from
the ``ceph orch ps`` output in the ``MEM LIMIT`` column::

   NAME   HOST  PORTS  STATUS         REFRESHED  AGE  MEM USED  MEM LIMIT  VERSION                IMAGE ID      CONTAINER ID
   osd.1  dael         running (3h)   10s ago    3h   72857k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  9e183363d39c
   osd.2  dael         running (81m)  10s ago    81m  63989k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  1f0cc479b051
   osd.3  dael         running (62m)  10s ago    62m  64071k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  ac5537492f27

To exclude an OSD from memory autotuning, disable the autotune option
for that OSD and also set a specific memory target. For example,

  .. prompt:: bash #

    ceph config set osd.123 osd_memory_target_autotune false
    ceph config set osd.123 osd_memory_target 16G

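To confirm the value that will be applied, the effective setting can be read
back from the config database (``osd.123`` is the illustrative ID from the
example above):

.. prompt:: bash #

   ceph config get osd.123 osd_memory_target
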
.. _drivegroups:

Advanced OSD Service Specifications
===================================

:ref:`orchestrator-cli-service-spec`\s of type ``osd`` are a way to describe a
cluster layout, using the properties of disks. Service specifications give the
user an abstract way to tell Ceph which disks should turn into OSDs with which
configurations, without knowing the specifics of device names and paths.

Service specifications make it possible to define a yaml or json file that can
be used to reduce the amount of manual work involved in creating OSDs.

For example, instead of running the following command:

.. prompt:: bash [monitor.1]#

   ceph orch daemon add osd *<host>*:*<path-to-device>*

for each device and each host, we can define a yaml or json file that allows us
to describe the layout. Here's the most basic example.

Create a file called (for example) ``osd_spec.yml``:

.. code-block:: yaml

   service_type: osd
   service_id: default_drive_group  # custom name of the osd spec
   placement:
     host_pattern: '*'              # which hosts to target
   spec:
     data_devices:                  # the type of devices you are applying specs to
       all: true                    # a filter, check below for a full list

This means:

#. Turn any available device (ceph-volume decides what 'available' is) into an
   OSD on all hosts that match the glob pattern '*'. (The glob pattern matches
   against the registered hosts from ``ceph orch host ls``.) A more detailed
   section on host_pattern is available below.

#. Then pass it to ``ceph orch apply`` like this:

   .. prompt:: bash [monitor.1]#

     ceph orch apply -i /path/to/osd_spec.yml

   This instruction will be issued to all the matching hosts, and will deploy
   these OSDs.

   Setups more complex than the one specified by the ``all`` filter are
   possible. See :ref:`osd_filters` for details.

   A ``--dry-run`` flag can be passed to the ``apply osd`` command to display a
   synopsis of the proposed layout.

Example:

.. prompt:: bash [monitor.1]#

   ceph orch apply -i /path/to/osd_spec.yml --dry-run


.. _osd_filters:

Filters
-------

.. note::
    Filters are applied using an `AND` gate by default. This means that a drive
    must fulfill all filter criteria in order to get selected. This behavior can
    be adjusted by setting ``filter_logic: OR`` in the OSD specification.

Filters are used to assign disks to groups, based on their attributes.

The attributes are based on ceph-volume's disk query. You can retrieve
information about the attributes with this command:

.. code-block:: bash

   ceph-volume inventory </path/to/disk>

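On a cephadm-managed host, ``ceph-volume`` lives inside the container image, so
the inventory can also be taken through the cephadm shell, in the same way that
``lsmcli`` was called earlier in this document (the device path is
illustrative):

.. code-block:: bash

   cephadm shell ceph-volume inventory /dev/sdb
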
Vendor or Model
^^^^^^^^^^^^^^^

Specific disks can be targeted by vendor or model:

.. code-block:: yaml

   model: disk_model_name

or

.. code-block:: yaml

   vendor: disk_vendor_name


Size
^^^^

Specific disks can be targeted by `Size`:

.. code-block:: yaml

   size: size_spec

Size specs
__________

Size specifications can be of the following forms:

* LOW:HIGH
* :HIGH
* LOW:
* EXACT

Concrete examples:

To include disks of an exact size:

.. code-block:: yaml

   size: '10G'

To include disks within a given range of size:

.. code-block:: yaml

   size: '10G:40G'

To include disks that are less than or equal to 10G in size:

.. code-block:: yaml

   size: ':10G'

To include disks equal to or greater than 40G in size:

.. code-block:: yaml

   size: '40G:'

Sizes don't have to be specified exclusively in Gigabytes(G).

Other units of size are supported: Megabyte(M), Gigabyte(G) and Terabyte(T).
Appending the (B) for byte is also supported: ``MB``, ``GB``, ``TB``.


Rotational
^^^^^^^^^^

This operates on the 'rotational' attribute of the disk.

.. code-block:: yaml

   rotational: 0 | 1

`1` to match all disks that are rotational.

`0` to match all disks that are non-rotational (SSD, NVME, etc.).


All
^^^

This will take all disks that are 'available'.

.. note:: This is exclusive to the data_devices section.

.. code-block:: yaml

   all: true


Limiter
^^^^^^^

If you have specified some valid filters but want to limit the number of disks
that they match, use the ``limit`` directive:

.. code-block:: yaml

   limit: 2

For example, if you used `vendor` to match all disks that are from `VendorA`
but want to use only the first two, you could use `limit`:

.. code-block:: yaml

   data_devices:
     vendor: VendorA
     limit: 2

.. note:: `limit` is a last resort and shouldn't be used if it can be avoided.


Additional Options
------------------

There are multiple optional settings you can use to change the way OSDs are deployed.
You can add these options to the base level of an OSD spec for them to take effect.

This example would deploy all OSDs with encryption enabled.

.. code-block:: yaml

   service_type: osd
   service_id: example_osd_spec
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       all: true
     encrypted: true

See the full list of supported options in ``DriveGroupSpec``:

.. py:currentmodule:: ceph.deployment.drive_group

.. autoclass:: DriveGroupSpec
   :members:
   :exclude-members: from_json

Examples
========

The simple case
---------------

All nodes with the same setup:

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: HDD-123-foo
   Size: 4TB

   2 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

This is a common setup and can be described quite easily:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       model: HDD-123-foo  # Note, HDD-123 would also be valid
     db_devices:
       model: MC-55-44-ZX  # Same here, MC-55-44 is valid

However, we can improve it by reducing the filters on core properties of the drives:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       rotational: 1
     db_devices:
       rotational: 0

Now all rotational devices are declared 'data devices', and all non-rotational
devices will be used as shared devices (wal, db).

If you know that drives larger than 2 TB will always be the slower data devices,
you can also filter by size:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       size: '2TB:'
     db_devices:
       size: ':2TB'

.. note:: All of the above OSD specs are equally valid. Which of those you want to use depends on taste and on how much you expect your node layout to change.


Multiple OSD specs for a single host
------------------------------------

Here we have two distinct setups:

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: HDD-123-foo
   Size: 4TB

   12 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

   2 NVMEs
   Vendor: VendorC
   Model: NVME-QQQQ-987
   Size: 256GB


* 20 HDDs should share 2 SSDs
* 10 SSDs should share 2 NVMes

This can be described with two layouts.

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_hdd
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       rotational: 1
     db_devices:
       model: MC-55-44-ZX
       limit: 2  # db_slots is actually to be favoured here, but it's not implemented yet
   ---
   service_type: osd
   service_id: osd_spec_ssd
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       model: MC-55-44-ZX
     db_devices:
       vendor: VendorC

This would create the desired layout by using all HDDs as data_devices with two SSDs assigned as dedicated db/wal devices.
The remaining 10 SSDs will be data_devices that have the 'VendorC' NVMEs assigned as dedicated db/wal devices.

Multiple hosts with the same disk layout
----------------------------------------

Assuming the cluster has different kinds of hosts, each with a similar disk
layout, it is recommended to apply different OSD specs, each matching only one
set of hosts. Typically you will have a spec for multiple hosts with the
same layout.

The service id acts as the unique key: if a new OSD spec with an already-applied
service id is applied, the existing OSD spec will be superseded.
cephadm will then create new OSD daemons based on the new spec
definition. Existing OSD daemons will not be affected. See :ref:`cephadm-osd-declarative`.

Node1-5

.. code-block:: none

   20 HDDs
   Vendor: Intel
   Model: SSD-123-foo
   Size: 4TB
   2 SSDs
   Vendor: VendorA
   Model: MC-55-44-ZX
   Size: 512GB

Node6-10

.. code-block:: none

   5 NVMEs
   Vendor: Intel
   Model: SSD-123-foo
   Size: 4TB
   20 SSDs
   Vendor: VendorA
   Model: MC-55-44-ZX
   Size: 512GB

You can use the 'placement' key in the layout to target certain nodes.

.. code-block:: yaml

   service_type: osd
   service_id: disk_layout_a
   placement:
     label: disk_layout_a
   spec:
     data_devices:
       rotational: 1
     db_devices:
       rotational: 0
   ---
   service_type: osd
   service_id: disk_layout_b
   placement:
     label: disk_layout_b
   spec:
     data_devices:
       model: MC-55-44-ZX
     db_devices:
       model: SSD-123-foo

This applies different OSD specs to different hosts depending on the `placement` key.
See :ref:`orchestrator-cli-placement-spec`.

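For label-based placement to select any hosts, the label must first be assigned
to the hosts that carry the corresponding layout, for example (hostname and
label name are illustrative):

.. prompt:: bash #

   ceph orch host label add node1 disk_layout_a
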
.. note::

    Assuming each host has a unique disk layout, each OSD
    spec needs to have a different service id.


Dedicated wal + db
------------------

All previous cases co-located the WALs with the DBs.
It is, however, possible to deploy the WAL on a dedicated device as well, if it makes sense.

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: SSD-123-foo
   Size: 4TB

   2 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

   2 NVMEs
   Vendor: VendorC
   Model: NVME-QQQQ-987
   Size: 256GB


The OSD spec for this case would look like the following (using the `model` filter):

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       model: SSD-123-foo
     db_devices:
       model: MC-55-44-ZX
     wal_devices:
       model: NVME-QQQQ-987


It is also possible to specify device paths directly on specific hosts, as in
the following example:

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_paths
   placement:
     hosts:
       - Node01
       - Node02
   spec:
     data_devices:
       paths:
         - /dev/sdb
     db_devices:
       paths:
         - /dev/sdc
     wal_devices:
       paths:
         - /dev/sdd


This can easily be done with other filters, like `size` or `vendor`, as well.

It's possible to specify the `crush_device_class` parameter within the
DriveGroup spec, and it's applied to all the devices defined by the `paths`
keyword:

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_paths
   placement:
     hosts:
       - Node01
       - Node02
   crush_device_class: ssd
   spec:
     data_devices:
       paths:
         - /dev/sdb
         - /dev/sdc
     db_devices:
       paths:
         - /dev/sdd
     wal_devices:
       paths:
         - /dev/sde

Alternatively, the `crush_device_class` parameter can be defined for each
individual OSD passed via the `paths` keyword, with the following syntax:

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_paths
   placement:
     hosts:
       - Node01
       - Node02
   crush_device_class: ssd
   spec:
     data_devices:
       paths:
         - path: /dev/sdb
           crush_device_class: ssd
         - path: /dev/sdc
           crush_device_class: nvme
     db_devices:
       paths:
         - /dev/sdd
     wal_devices:
       paths:
         - /dev/sde

.. _cephadm-osd-activate:

Activate existing OSDs
======================

In case the OS of a host was reinstalled, existing OSDs need to be activated
again. For this use case, cephadm provides a wrapper for
:ref:`ceph-volume-lvm-activate` that activates all existing OSDs on a host.

.. prompt:: bash #

   ceph cephadm osd activate <host>...

This will scan all existing disks for OSDs and deploy corresponding daemons.
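For example, to reactivate the OSDs of two freshly reinstalled hosts (hostnames
illustrative):

.. prompt:: bash #

   ceph cephadm osd activate host1 host2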

Further Reading
===============

* :ref:`ceph-volume`
* :ref:`rados-index`