***********
OSD Service
***********
.. _device management: ../rados/operations/devices
.. _libstoragemgmt: https://github.com/libstorage/libstoragemgmt

List Devices
============

``ceph-volume`` scans each host in the cluster from time to time in order
to determine which devices are present and whether they are eligible to be
used as OSDs.

To print a list of devices discovered by ``cephadm``, run this command:

.. prompt:: bash #

   ceph orch device ls [--hostname=...] [--wide] [--refresh]

Example::

   Hostname  Path      Type  Serial        Size  Health   Ident  Fault  Available
   srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G  Unknown  N/A    N/A    No
   srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G  Unknown  N/A    N/A    No
   srv-01    /dev/sdd  hdd   15R0A07DFRD6  300G  Unknown  N/A    N/A    No
   srv-01    /dev/sde  hdd   15P0A0QDFRD6  300G  Unknown  N/A    N/A    No
   srv-02    /dev/sdb  hdd   15R0A033FRD6  300G  Unknown  N/A    N/A    No
   srv-02    /dev/sdc  hdd   15R0A05XFRD6  300G  Unknown  N/A    N/A    No
   srv-02    /dev/sde  hdd   15R0A0ANFRD6  300G  Unknown  N/A    N/A    No
   srv-02    /dev/sdf  hdd   15R0A06EFRD6  300G  Unknown  N/A    N/A    No
   srv-03    /dev/sdb  hdd   15R0A0OGFRD6  300G  Unknown  N/A    N/A    No
   srv-03    /dev/sdc  hdd   15R0A0P7FRD6  300G  Unknown  N/A    N/A    No
   srv-03    /dev/sdd  hdd   15R0A0O7FRD6  300G  Unknown  N/A    N/A    No

Using the ``--wide`` option provides all details relating to the device,
including any reasons that the device might not be eligible for use as an OSD.

In the above example you can see fields named "Health", "Ident", and "Fault".
This information is provided by integration with `libstoragemgmt`_. By default,
this integration is disabled (because `libstoragemgmt`_ may not be 100%
compatible with your hardware). To make ``cephadm`` include these fields,
enable cephadm's "enhanced device scan" option as follows:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/device_enhanced_scan true

.. warning::
   Although the libstoragemgmt library performs standard SCSI inquiry calls,
   there is no guarantee that your firmware fully implements these standards.
   This can lead to erratic behaviour and even bus resets on some older
   hardware. It is therefore recommended that, before enabling this feature,
   you test your hardware's compatibility with libstoragemgmt first to avoid
   unplanned interruptions to services.

   There are a number of ways to test compatibility, but the simplest may be
   to use the cephadm shell to call libstoragemgmt directly: ``cephadm shell
   lsmcli ldl``. If your hardware is supported you should see something like
   this:

   ::

      Path     | SCSI VPD 0x83    | Link Type | Serial Number | Health Status
      --------------------------------------------------------------------------
      /dev/sda | 50000396082ba631 | SAS       | 15P0A0R0FRD6  | Good
      /dev/sdb | 50000396082bbbf9 | SAS       | 15P0A0YFFRD6  | Good


After you have enabled libstoragemgmt support, the output will look something
like this:

::

   # ceph orch device ls
   Hostname  Path      Type  Serial        Size  Health  Ident  Fault  Available
   srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G  Good    Off    Off    No
   srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G  Good    Off    Off    No
   :

In this example, libstoragemgmt has confirmed the health of the drives and the
ability to interact with the Identification and Fault LEDs on the drive
enclosures. For further information about interacting with these LEDs, refer
to `device management`_.

.. note::
   The current release of `libstoragemgmt`_ (1.8.8) supports SCSI, SAS, and
   SATA based local disks only. There is no official support for NVMe devices
   (PCIe).

.. _cephadm-deploy-osds:

Deploy OSDs
===========

Listing Storage Devices
-----------------------

In order to deploy an OSD, there must be a storage device that is *available*
on which the OSD will be deployed.

Run this command to display an inventory of storage devices on all cluster
hosts:

.. prompt:: bash #

   ceph orch device ls

A storage device is considered *available* if all of the following
conditions are met:

* The device must have no partitions.
* The device must not have any LVM state.
* The device must not be mounted.
* The device must not contain a file system.
* The device must not contain a Ceph BlueStore OSD.
* The device must be larger than 5 GB.

Ceph will not provision an OSD on a device that is not available.

Creating New OSDs
-----------------

There are a few ways to create new OSDs:

* Tell Ceph to consume any available and unused storage device:

  .. prompt:: bash #

     ceph orch apply osd --all-available-devices

* Create an OSD from a specific device on a specific host:

  .. prompt:: bash #

     ceph orch daemon add osd *<host>*:*<device-path>*

  For example:

  .. prompt:: bash #

     ceph orch daemon add osd host1:/dev/sdb

  Advanced OSD creation from specific devices on a specific host:

  .. prompt:: bash #

     ceph orch daemon add osd host1:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2

* Create an OSD on a specific LVM logical volume on a specific host:

  .. prompt:: bash #

     ceph orch daemon add osd *<host>*:*<lvm-path>*

  For example:

  .. prompt:: bash #

     ceph orch daemon add osd host1:/dev/vg_osd/lvm_osd1701

* You can use :ref:`drivegroups` to categorize device(s) based on their
  properties. This might be useful in forming a clearer picture of which
  devices are available to consume. Properties include device type (SSD or
  HDD), device model names, size, and the hosts on which the devices exist.
  A minimal example spec is sketched just after this list:

  .. prompt:: bash #

     ceph orch apply -i spec.yml

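For instance, a hypothetical ``spec.yml`` along these lines would restrict OSD
creation to large non-rotational devices on labelled hosts. This is only a
sketch: the service id, the ``osd-hosts`` label, and the size threshold are
example values, not required names:

.. code-block:: yaml

   service_type: osd
   service_id: example_ssd_osds    # example name
   placement:
     label: osd-hosts              # assumes the hosts carry this label
   spec:
     data_devices:
       rotational: 0               # only non-rotational (SSD) devices
       size: '1TB:'                # only devices of at least 1 TB
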
Dry Run
-------

The ``--dry-run`` flag causes the orchestrator to present a preview of what
will happen without actually creating the OSDs.

For example:

.. prompt:: bash #

   ceph orch apply osd --all-available-devices --dry-run

::

   NAME                   HOST   DATA      DB  WAL
   all-available-devices  node1  /dev/vdb  -   -
   all-available-devices  node2  /dev/vdc  -   -
   all-available-devices  node3  /dev/vdd  -   -

.. _cephadm-osd-declarative:

Declarative State
-----------------

The effect of ``ceph orch apply`` is persistent. This means that drives that
are added to the system after the ``ceph orch apply`` command completes will be
automatically found and added to the cluster. It also means that drives that
become available (by zapping, for example) after the ``ceph orch apply``
command completes will be automatically found and added to the cluster.

We will examine the effects of the following command:

.. prompt:: bash #

   ceph orch apply osd --all-available-devices

After running the above command:

* If you add new disks to the cluster, they will automatically be used to
  create new OSDs.
* If you remove an OSD and clean the LVM physical volume, a new OSD will be
  created automatically.

If you want to avoid this behavior (that is, disable the automatic creation of
OSDs on available devices), use the ``unmanaged`` parameter:

.. prompt:: bash #

   ceph orch apply osd --all-available-devices --unmanaged=true

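If your OSDs are managed by a service specification instead, the same effect
can be achieved by adding ``unmanaged: true`` to that spec and re-applying it.
A minimal sketch (the service id shown is just an example name; use the id of
your existing spec):

.. code-block:: yaml

   service_type: osd
   service_id: example_osd_spec   # example name
   unmanaged: true                # cephadm stops creating OSDs for this spec
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       all: true
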
.. note::

   Keep these three facts in mind:

   - The default behavior of ``ceph orch apply`` causes cephadm constantly to
     reconcile. This means that cephadm creates OSDs as soon as new drives are
     detected.

   - Setting ``unmanaged: True`` disables the creation of OSDs. If
     ``unmanaged: True`` is set, nothing will happen even if you apply a new
     OSD service.

   - ``ceph orch daemon add`` creates OSDs, but does not add an OSD service.

* For cephadm, see also :ref:`cephadm-spec-unmanaged`.

.. _cephadm-osd-removal:

Remove an OSD
=============

Removing an OSD from a cluster involves two steps:

#. evacuating all placement groups (PGs) from the OSD
#. removing the PG-free OSD from the cluster

The following command performs these two steps:

.. prompt:: bash #

   ceph orch osd rm <osd_id(s)> [--replace] [--force]

Example:

.. prompt:: bash #

   ceph orch osd rm 0

Expected output::

   Scheduled OSD(s) for removal

OSDs that are not safe to destroy will be rejected.

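If you want to check ahead of time whether a particular OSD can be destroyed
safely, you can ask the cluster directly. A quick sketch, assuming ``osd.0``
is the OSD in question:

.. prompt:: bash #

   ceph osd safe-to-destroy osd.0
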
.. note::
   After removing OSDs, if the drives the OSDs were deployed on once again
   become available, cephadm may automatically try to deploy more OSDs
   on these drives if they match an existing drivegroup spec. If you deployed
   the OSDs you are removing with a spec and don't want any new OSDs deployed
   on the drives after removal, it's best to modify the drivegroup spec before
   removal. Either set ``unmanaged: true`` to stop it from picking up new
   drives at all, or modify it in some way that it no longer matches the drives
   used for the OSDs you wish to remove. Then re-apply the spec. For more info
   on drivegroup specs see :ref:`drivegroups`. For more info on the declarative
   nature of cephadm in reference to deploying OSDs, see
   :ref:`cephadm-osd-declarative`.

Monitoring OSD State
--------------------

You can query the state of OSD removal operations with the following command:

.. prompt:: bash #

   ceph orch osd rm status

Expected output::

   OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
   2       cephadm-dev  done, waiting for purge  0         True     False  2020-07-17 13:01:43.147684
   3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:45.162158
   4       cephadm-dev  started                  42        False    True   2020-07-17 13:01:45.162158


When no PGs are left on the OSD, it will be decommissioned and removed from
the cluster.

.. note::
   After removing an OSD, if you wipe the LVM physical volume in the device
   used by the removed OSD, a new OSD will be created.
   For more information on this, read about the ``unmanaged`` parameter in
   :ref:`cephadm-osd-declarative`.

Stopping OSD Removal
--------------------

It is possible to stop queued OSD removals by using the following command:

.. prompt:: bash #

   ceph orch osd rm stop <osd_id(s)>

Example:

.. prompt:: bash #

   ceph orch osd rm stop 4

Expected output::

   Stopped OSD(s) removal

This resets the initial state of the OSD and takes it off the removal queue.

.. _cephadm-replacing-an-osd:

Replacing an OSD
----------------

.. prompt:: bash #

   ceph orch osd rm <osd_id(s)> --replace [--force]

Example:

.. prompt:: bash #

   ceph orch osd rm 4 --replace

Expected output::

   Scheduled OSD(s) for replacement

This follows the same procedure as in the "Remove an OSD" section, with one
exception: the OSD is not permanently removed from the CRUSH hierarchy, but is
instead assigned a 'destroyed' flag.

.. note::
   The new OSD that will replace the removed OSD must be created on the same
   host as the OSD that was removed.

**Preserving the OSD ID**

The 'destroyed' flag is used to determine which OSD ids will be reused in the
next OSD deployment.

If you use OSDSpecs for OSD deployment, your newly added disks will be assigned
the OSD ids of their replaced counterparts. This assumes that the new disks
still match the OSDSpecs.

Use the ``--dry-run`` flag to make certain that the ``ceph orch apply osd``
command does what you want it to. The ``--dry-run`` flag shows you what the
outcome of the command will be without making the changes you specify. When
you are satisfied that the command will do what you want, run the command
without the ``--dry-run`` flag.

.. tip::

   The name of your OSDSpec can be retrieved with the command ``ceph orch ls``.

Alternatively, you can use your OSDSpec file:

.. prompt:: bash #

   ceph orch apply -i <osd_spec_file> --dry-run

Expected output::

   NAME                HOST   DATA      DB  WAL
   <name_of_osd_spec>  node1  /dev/vdb  -   -


When this output reflects your intention, omit the ``--dry-run`` flag to
execute the deployment.


Erasing Devices (Zapping Devices)
---------------------------------

Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume
zap`` on the remote host.

.. prompt:: bash #

   ceph orch device zap <hostname> <path>

Example command:

.. prompt:: bash #

   ceph orch device zap my_hostname /dev/sdx

.. note::
   If the unmanaged flag is unset, cephadm automatically deploys drives that
   match the OSDSpec. For example, if you use the
   ``all-available-devices`` option when creating OSDs, when you ``zap`` a
   device the cephadm orchestrator automatically creates a new OSD in the
   device. To disable this behavior, see :ref:`cephadm-osd-declarative`.


.. _osd_autotune:

Automatically tuning OSD memory
===============================

OSD daemons will adjust their memory consumption based on the
``osd_memory_target`` config option (several gigabytes by default). If Ceph is
deployed on dedicated nodes that are not sharing memory with other services,
cephadm can automatically adjust the per-OSD memory consumption based on the
total amount of RAM and the number of deployed OSDs.

.. warning:: Cephadm sets ``osd_memory_target_autotune`` to ``true`` by default, which is unsuitable for hyperconverged infrastructures.

Cephadm will start with a fraction
(``mgr/cephadm/autotune_memory_target_ratio``, which defaults to ``.7``) of
the total RAM in the system, subtract off any memory consumed by non-autotuned
daemons (non-OSDs and OSDs for which ``osd_memory_target_autotune`` is false),
and then divide the remainder by the number of autotuned OSDs.

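As a worked example (hypothetical numbers): on a dedicated host with 128 GiB
of RAM, 10 autotuned OSDs, and roughly 9.6 GiB consumed by other daemons, each
OSD would receive a target of about 8 GiB::

   128 GiB * 0.7    = 89.6 GiB   (autotune_memory_target_ratio share)
   89.6 GiB - 9.6 GiB = 80 GiB   (subtract non-autotuned daemons)
   80 GiB / 10 OSDs   = 8 GiB    (osd_memory_target per OSD)
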
The final targets are reflected in the config database with options like::

   WHO  MASK      LEVEL  OPTION             VALUE
   osd  host:foo  basic  osd_memory_target  126092301926
   osd  host:bar  basic  osd_memory_target  6442450944

Both the limits and the current memory consumed by each daemon are visible from
the ``ceph orch ps`` output in the ``MEM LIMIT`` column::

   NAME   HOST  PORTS  STATUS         REFRESHED  AGE  MEM USED  MEM LIMIT  VERSION                IMAGE ID      CONTAINER ID
   osd.1  dael         running (3h)   10s ago    3h   72857k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  9e183363d39c
   osd.2  dael         running (81m)  10s ago    81m  63989k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  1f0cc479b051
   osd.3  dael         running (62m)  10s ago    62m  64071k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  ac5537492f27

To exclude an OSD from memory autotuning, disable the autotune option
for that OSD and also set a specific memory target. For example:

.. prompt:: bash #

   ceph config set osd.123 osd_memory_target_autotune false
   ceph config set osd.123 osd_memory_target 16G


.. _drivegroups:

Advanced OSD Service Specifications
===================================

:ref:`orchestrator-cli-service-spec`\s of type ``osd`` are a way to describe a
cluster layout, using the properties of disks. Service specifications give the
user an abstract way to tell Ceph which disks should turn into OSDs with which
configurations, without knowing the specifics of device names and paths.

Service specifications make it possible to define a yaml or json file that can
be used to reduce the amount of manual work involved in creating OSDs.

For example, instead of running the following command:

.. prompt:: bash [monitor.1]#

   ceph orch daemon add osd *<host>*:*<path-to-device>*

for each device and each host, we can define a yaml or json file that allows us
to describe the layout. Here's the most basic example.

Create a file called (for example) ``osd_spec.yml``:

.. code-block:: yaml

   service_type: osd
   service_id: default_drive_group  # custom name of the osd spec
   placement:
     host_pattern: '*'              # which hosts to target
   spec:
     data_devices:                  # the type of devices you are applying specs to
       all: true                    # a filter, check below for a full list

This means:

#. Turn any available device (ceph-volume decides what 'available' is) into an
   OSD on all hosts that match the glob pattern '*'. (The glob pattern matches
   against the registered hosts from `host ls`.) A more detailed section on
   host_pattern is available below.

#. Then pass it to `osd create` like this:

   .. prompt:: bash [monitor.1]#

      ceph orch apply -i /path/to/osd_spec.yml

This instruction will be issued to all the matching hosts, and will deploy
these OSDs.

Setups more complex than the one specified by the ``all`` filter are
possible. See :ref:`osd_filters` for details.

A ``--dry-run`` flag can be passed to the ``apply osd`` command to display a
synopsis of the proposed layout.

Example:

.. prompt:: bash [monitor.1]#

   ceph orch apply -i /path/to/osd_spec.yml --dry-run


.. _osd_filters:

Filters
-------

.. note::
   Filters are applied using an `AND` gate by default. This means that a drive
   must fulfill all filter criteria in order to get selected. This behavior can
   be adjusted by setting ``filter_logic: OR`` in the OSD specification.

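For example, a sketch of a spec that selects drives matching *either* of two
filters (the service id, model name, and size threshold are placeholders):

.. code-block:: yaml

   service_type: osd
   service_id: example_or_filter
   placement:
     host_pattern: '*'
   spec:
     filter_logic: OR          # a drive matching any one filter is selected
     data_devices:
       model: EXAMPLE-MODEL-A  # drives of this model...
       size: '4TB:'            # ...or any drive of at least 4 TB
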
Filters are used to assign disks to groups, using their attributes to group
them.

The attributes are based on ceph-volume's disk query. You can retrieve
information about the attributes with this command:

.. code-block:: bash

   ceph-volume inventory </path/to/disk>

Vendor or Model
^^^^^^^^^^^^^^^

Specific disks can be targeted by vendor or model:

.. code-block:: yaml

   model: disk_model_name

or

.. code-block:: yaml

   vendor: disk_vendor_name


Size
^^^^

Specific disks can be targeted by `Size`:

.. code-block:: yaml

   size: size_spec

Size specs
__________

Size specifications can be of the following forms:

* LOW:HIGH
* :HIGH
* LOW:
* EXACT

Concrete examples:

To include disks of an exact size:

.. code-block:: yaml

   size: '10G'

To include disks within a given range of size:

.. code-block:: yaml

   size: '10G:40G'

To include disks that are less than or equal to 10G in size:

.. code-block:: yaml

   size: ':10G'

To include disks equal to or greater than 40G in size:

.. code-block:: yaml

   size: '40G:'

Sizes don't have to be specified exclusively in Gigabytes (G).

Other units of size are supported: Megabyte (M), Gigabyte (G) and Terabyte (T).
Appending (B) for byte is also supported: ``MB``, ``GB``, ``TB``.


Rotational
^^^^^^^^^^

This operates on the 'rotational' attribute of the disk.

.. code-block:: yaml

   rotational: 0 | 1

`1` to match all disks that are rotational

`0` to match all disks that are non-rotational (SSD, NVME etc)


All
^^^

This will take all disks that are 'available'.

.. note:: This is exclusive for the data_devices section.

.. code-block:: yaml

   all: true


Limiter
^^^^^^^

If you have specified some valid filters but want to limit the number of disks
that they match, use the ``limit`` directive:

.. code-block:: yaml

   limit: 2

For example, if you used `vendor` to match all disks that are from `VendorA`
but want to use only the first two, you could use `limit`:

.. code-block:: yaml

   data_devices:
     vendor: VendorA
     limit: 2

.. note:: `limit` is a last resort and shouldn't be used if it can be avoided.


Additional Options
------------------

There are multiple optional settings you can use to change the way OSDs are
deployed. You can add these options to the base level of an OSD spec for them
to take effect.

This example would deploy all OSDs with encryption enabled.

.. code-block:: yaml

   service_type: osd
   service_id: example_osd_spec
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       all: true
     encrypted: true

The full list of options is documented in ``DriveGroupSpec``:

.. py:currentmodule:: ceph.deployment.drive_group

.. autoclass:: DriveGroupSpec
   :members:
   :exclude-members: from_json


Examples
========

The simple case
---------------

All nodes with the same setup:

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: HDD-123-foo
   Size: 4TB

   2 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

This is a common setup and can be described quite easily:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       model: HDD-123-foo  # Note, HDD-123 would also be valid
     db_devices:
       model: MC-55-44-ZX  # Same here, MC-55-44 is valid

However, we can improve it by reducing the filters on core properties of the
drives:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       rotational: 1
     db_devices:
       rotational: 0

Now all rotational devices are declared as 'data devices' and all
non-rotational devices will be used as shared devices (wal, db).

If you know that drives larger than 2 TB will always be the slower data
devices, you can also filter by size:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       size: '2TB:'
     db_devices:
       size: ':2TB'

.. note:: All of the above OSD specs are equally valid. Which of those you want to use depends on taste and on how much you expect your node layout to change.


Multiple OSD specs for a single host
------------------------------------

Here we have two distinct setups:

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: HDD-123-foo
   Size: 4TB

   12 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

   2 NVMEs
   Vendor: VendorC
   Model: NVME-QQQQ-987
   Size: 256GB


* 20 HDDs should share 2 SSDs
* 10 SSDs should share 2 NVMes

This can be described with two layouts.

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_hdd
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       rotational: 1
     db_devices:
       model: MC-55-44-ZX
       limit: 2  # db_slots is actually to be favoured here, but it's not implemented yet
   ---
   service_type: osd
   service_id: osd_spec_ssd
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       model: MC-55-44-ZX
     db_devices:
       vendor: VendorC

This would create the desired layout by using all HDDs as data_devices with two
SSDs assigned as dedicated db/wal devices. The remaining SSDs (10) will be
data_devices that have the 'VendorC' NVMEs assigned as dedicated db/wal
devices.

Multiple hosts with the same disk layout
----------------------------------------

Assuming the cluster has different kinds of hosts, each with a similar disk
layout, it is recommended to apply different OSD specs, each matching only one
set of hosts. Typically you will have a spec for multiple hosts with the same
layout.

The service id acts as the unique key: if a new OSD spec with an already
applied service id is applied, the existing OSD spec will be superseded.
cephadm will then create new OSD daemons based on the new spec definition.
Existing OSD daemons will not be affected. See :ref:`cephadm-osd-declarative`.

Node1-5

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: SSD-123-foo
   Size: 4TB
   2 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

Node6-10

.. code-block:: none

   5 NVMEs
   Vendor: VendorA
   Model: SSD-123-foo
   Size: 4TB
   20 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

You can use the 'placement' key in the layout to target certain nodes.

.. code-block:: yaml

   service_type: osd
   service_id: disk_layout_a
   placement:
     label: disk_layout_a
   spec:
     data_devices:
       rotational: 1
     db_devices:
       rotational: 0
   ---
   service_type: osd
   service_id: disk_layout_b
   placement:
     label: disk_layout_b
   spec:
     data_devices:
       model: MC-55-44-ZX
     db_devices:
       model: SSD-123-foo


This applies different OSD specs to different hosts, depending on the
`placement` key. See :ref:`orchestrator-cli-placement-spec`.

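The labels referenced by the ``placement`` sections above must already be
assigned to the hosts. A quick sketch, assuming a host named ``node1`` should
receive the ``disk_layout_a`` spec:

.. prompt:: bash #

   ceph orch host label add node1 disk_layout_a
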
.. note::

   Assuming each host has a unique disk layout, each OSD
   spec needs to have a different service id.


Dedicated wal + db
------------------

All previous cases co-located the WALs with the DBs. It is, however, possible
to deploy the WAL on a dedicated device as well, if it makes sense.

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: SSD-123-foo
   Size: 4TB

   2 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

   2 NVMEs
   Vendor: VendorC
   Model: NVME-QQQQ-987
   Size: 256GB


The OSD spec for this case would look like the following (using the `model`
filter to put data on the HDDs, the DBs on the SSDs, and the WALs on the
dedicated NVMe devices):

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       model: SSD-123-foo
     db_devices:
       model: MC-55-44-ZX
     wal_devices:
       model: NVME-QQQQ-987


It is also possible to specify device paths directly on specific hosts, as in
the following:

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_paths
   placement:
     hosts:
       - Node01
       - Node02
   spec:
     data_devices:
       paths:
         - /dev/sdb
     db_devices:
       paths:
         - /dev/sdc
     wal_devices:
       paths:
         - /dev/sdd


This can easily be done with other filters, like `size` or `vendor`, as well.

It's possible to specify the `crush_device_class` parameter within the
DriveGroup spec, and it's applied to all the devices defined by the `paths`
keyword:

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_paths
   placement:
     hosts:
       - Node01
       - Node02
   crush_device_class: ssd
   spec:
     data_devices:
       paths:
         - /dev/sdb
         - /dev/sdc
     db_devices:
       paths:
         - /dev/sdd
     wal_devices:
       paths:
         - /dev/sde

The `crush_device_class` parameter, however, can be defined for each OSD passed
using the `paths` keyword with the following syntax:

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_paths
   placement:
     hosts:
       - Node01
       - Node02
   crush_device_class: ssd
   spec:
     data_devices:
       paths:
         - path: /dev/sdb
           crush_device_class: ssd
         - path: /dev/sdc
           crush_device_class: nvme
     db_devices:
       paths:
         - /dev/sdd
     wal_devices:
       paths:
         - /dev/sde

.. _cephadm-osd-activate:

Activate existing OSDs
======================

In case the OS of a host was reinstalled, existing OSDs need to be activated
again. For this use case, cephadm provides a wrapper for
:ref:`ceph-volume-lvm-activate` that activates all existing OSDs on a host.

.. prompt:: bash #

   ceph cephadm osd activate <host>...

This will scan all existing disks for OSDs and deploy corresponding daemons.
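
For example, assuming a freshly reinstalled host named ``host1`` whose OSD
drives are still intact:

.. prompt:: bash #

   ceph cephadm osd activate host1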

Further Reading
===============

* :ref:`ceph-volume`
* :ref:`rados-index`