***********
OSD Service
***********
.. _device management: ../rados/operations/devices
.. _libstoragemgmt: https://github.com/libstorage/libstoragemgmt

List Devices
============

``ceph-volume`` scans each host in the cluster from time to time in order
to determine which devices are present and whether they are eligible to be
used as OSDs.

To print a list of devices discovered by ``cephadm``, run this command:

.. prompt:: bash #

  ceph orch device ls [--hostname=...] [--wide] [--refresh]

Example
::

  Hostname  Path      Type  Serial        Size  Health   Ident  Fault  Available
  srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G  Unknown  N/A    N/A    No
  srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G  Unknown  N/A    N/A    No
  srv-01    /dev/sdd  hdd   15R0A07DFRD6  300G  Unknown  N/A    N/A    No
  srv-01    /dev/sde  hdd   15P0A0QDFRD6  300G  Unknown  N/A    N/A    No
  srv-02    /dev/sdb  hdd   15R0A033FRD6  300G  Unknown  N/A    N/A    No
  srv-02    /dev/sdc  hdd   15R0A05XFRD6  300G  Unknown  N/A    N/A    No
  srv-02    /dev/sde  hdd   15R0A0ANFRD6  300G  Unknown  N/A    N/A    No
  srv-02    /dev/sdf  hdd   15R0A06EFRD6  300G  Unknown  N/A    N/A    No
  srv-03    /dev/sdb  hdd   15R0A0OGFRD6  300G  Unknown  N/A    N/A    No
  srv-03    /dev/sdc  hdd   15R0A0P7FRD6  300G  Unknown  N/A    N/A    No
  srv-03    /dev/sdd  hdd   15R0A0O7FRD6  300G  Unknown  N/A    N/A    No

Using the ``--wide`` option provides all details relating to the device,
including any reasons that the device might not be eligible for use as an OSD.

In the above example you can see fields named "Health", "Ident", and "Fault".
This information is provided by integration with `libstoragemgmt`_. By default,
this integration is disabled (because `libstoragemgmt`_ may not be 100%
compatible with your hardware). To make ``cephadm`` include these fields,
enable cephadm's "enhanced device scan" option as follows:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/device_enhanced_scan true

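To confirm the current value of this option afterwards, you can read it back
from the configuration database (shown here only as an optional check):

.. prompt:: bash #

  ceph config get mgr mgr/cephadm/device_enhanced_scan
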
.. warning::
    Although the libstoragemgmt library performs standard SCSI inquiry calls,
    there is no guarantee that your firmware fully implements these standards.
    This can lead to erratic behaviour and even bus resets on some older
    hardware. It is therefore recommended that, before enabling this feature,
    you test your hardware's compatibility with libstoragemgmt first to avoid
    unplanned interruptions to services.

    There are a number of ways to test compatibility, but the simplest may be
    to use the cephadm shell to call libstoragemgmt directly - ``cephadm shell
    lsmcli ldl``. If your hardware is supported you should see something like
    this:

    ::

      Path     | SCSI VPD 0x83    | Link Type | Serial Number | Health Status
      --------------------------------------------------------------------------
      /dev/sda | 50000396082ba631 | SAS       | 15P0A0R0FRD6  | Good
      /dev/sdb | 50000396082bbbf9 | SAS       | 15P0A0YFFRD6  | Good


After you have enabled libstoragemgmt support, the output will look something
like this:

::

  # ceph orch device ls
  Hostname  Path      Type  Serial        Size  Health  Ident  Fault  Available
  srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G  Good    Off    Off    No
  srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G  Good    Off    Off    No
  :

In this example, libstoragemgmt has confirmed the health of the drives and the ability to
interact with the Identification and Fault LEDs on the drive enclosures. For further
information about interacting with these LEDs, refer to `device management`_.

.. note::
    The current release of `libstoragemgmt`_ (1.8.8) supports SCSI, SAS, and SATA based
    local disks only. There is no official support for NVMe devices (PCIe).

.. _cephadm-deploy-osds:

Deploy OSDs
===========

Listing Storage Devices
-----------------------

In order to deploy an OSD, there must be a storage device that is *available* on
which the OSD will be deployed.

Run this command to display an inventory of storage devices on all cluster hosts:

.. prompt:: bash #

  ceph orch device ls

A storage device is considered *available* if all of the following
conditions are met:

* The device must have no partitions.
* The device must not have any LVM state.
* The device must not be mounted.
* The device must not contain a file system.
* The device must not contain a Ceph BlueStore OSD.
* The device must be larger than 5 GB.

Ceph will not provision an OSD on a device that is not available.

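A device that fails these checks only because of leftover partitions, LVM
state, or an old file system can usually be made available again by erasing
it with the ``zap`` command described later in this document. The hostname and
device path below are placeholders:

.. prompt:: bash #

  ceph orch device zap host1 /dev/sdb
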
Creating New OSDs
-----------------

There are a few ways to create new OSDs:

* Tell Ceph to consume any available and unused storage device:

  .. prompt:: bash #

    ceph orch apply osd --all-available-devices

* Create an OSD from a specific device on a specific host:

  .. prompt:: bash #

    ceph orch daemon add osd *<host>*:*<device-path>*

  For example:

  .. prompt:: bash #

    ceph orch daemon add osd host1:/dev/sdb

* You can use :ref:`drivegroups` to categorize device(s) based on their
  properties. This might be useful in forming a clearer picture of which
  devices are available to consume. Properties include device type (SSD or
  HDD), device model names, size, and the hosts on which the devices exist
  (a minimal example spec is sketched just after this list):

  .. prompt:: bash #

    ceph orch apply -i spec.yml

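As a rough sketch of what such a ``spec.yml`` might contain, the following
example turns every available rotational device on every host into an OSD.
The service id and filter values here are illustrative; the full format is
described in :ref:`drivegroups` below.

.. code-block:: yaml

  service_type: osd
  service_id: example_hdd_osds   # example name, choose your own
  placement:
    host_pattern: '*'            # apply to every managed host
  data_devices:
    rotational: 1                # use only rotational (spinning) disks for data
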
Dry Run
-------

The ``--dry-run`` flag causes the orchestrator to present a preview of what
will happen without actually creating the OSDs.

For example:

.. prompt:: bash #

  ceph orch apply osd --all-available-devices --dry-run

::

  NAME                   HOST   DATA      DB  WAL
  all-available-devices  node1  /dev/vdb  -   -
  all-available-devices  node2  /dev/vdc  -   -
  all-available-devices  node3  /dev/vdd  -   -

.. _cephadm-osd-declarative:

Declarative State
-----------------

The effect of ``ceph orch apply`` is persistent. This means that drives that
are added to the system after the ``ceph orch apply`` command completes will be
automatically found and added to the cluster. It also means that drives that
become available (by zapping, for example) after the ``ceph orch apply``
command completes will be automatically found and added to the cluster.

We will examine the effects of the following command:

.. prompt:: bash #

  ceph orch apply osd --all-available-devices

After running the above command:

* If you add new disks to the cluster, they will automatically be used to
  create new OSDs.
* If you remove an OSD and clean the LVM physical volume, a new OSD will be
  created automatically.

To disable the automatic creation of OSDs on available devices, use the
``unmanaged`` parameter:

.. prompt:: bash #

  ceph orch apply osd --all-available-devices --unmanaged=true

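If you later want cephadm to resume creating OSDs on available devices, the
same flag can be switched back. This assumes, as shown above for ``true``,
that the flag accepts a boolean value:

.. prompt:: bash #

  ceph orch apply osd --all-available-devices --unmanaged=false
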
.. note::

  Keep these three facts in mind:

  - The default behavior of ``ceph orch apply`` causes cephadm constantly to
    reconcile. This means that cephadm creates OSDs as soon as new drives are
    detected.

  - Setting ``unmanaged: True`` disables the creation of OSDs. If
    ``unmanaged: True`` is set, nothing will happen even if you apply a new
    OSD service.

  - ``ceph orch daemon add`` creates OSDs, but does not add an OSD service.

* For cephadm, see also :ref:`cephadm-spec-unmanaged`.

.. _cephadm-osd-removal:

Remove an OSD
=============

Removing an OSD from a cluster involves two steps:

#. evacuating all placement groups (PGs) from the OSD
#. removing the PG-free OSD from the cluster

The following command performs these two steps:

.. prompt:: bash #

  ceph orch osd rm <osd_id(s)> [--replace] [--force]

Example:

.. prompt:: bash #

  ceph orch osd rm 0

Expected output::

  Scheduled OSD(s) for removal

OSDs that are not safe to destroy will be rejected.

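Before scheduling a removal, you can ask the cluster whether destroying a
given OSD would put data at risk. This supplementary check is not part of the
cephadm workflow above; ``osd.0`` is just an example id:

.. prompt:: bash #

  ceph osd safe-to-destroy osd.0
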
Monitoring OSD State
--------------------

You can query the state of OSD removal operations with the following command:

.. prompt:: bash #

  ceph orch osd rm status

Expected output::

  OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
  2       cephadm-dev  done, waiting for purge  0         True     False  2020-07-17 13:01:43.147684
  3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:45.162158
  4       cephadm-dev  started                  42        False    True   2020-07-17 13:01:45.162158


When no PGs are left on the OSD, it will be decommissioned and removed from the cluster.

.. note::
    After removing an OSD, if you wipe the LVM physical volume in the device used by the removed OSD, a new OSD will be created.
    For more information on this, read about the ``unmanaged`` parameter in :ref:`cephadm-osd-declarative`.

Stopping OSD Removal
--------------------

It is possible to stop queued OSD removals by using the following command:

.. prompt:: bash #

  ceph orch osd rm stop <osd_id(s)>

Example:

.. prompt:: bash #

  ceph orch osd rm stop 4

Expected output::

  Stopped OSD(s) removal

This resets the state of the OSD and removes it from the removal queue.


Replacing an OSD
----------------

.. prompt:: bash #

  ceph orch osd rm <osd_id(s)> --replace [--force]

Example:

.. prompt:: bash #

  ceph orch osd rm 4 --replace

Expected output::

  Scheduled OSD(s) for replacement

This follows the same procedure as in the "Remove an OSD" section, with
one exception: the OSD is not permanently removed from the CRUSH hierarchy, but is
instead assigned a 'destroyed' flag.

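Because a replaced OSD keeps its place in the CRUSH hierarchy, you can confirm
the flag before the replacement drive is installed. The check below is only
illustrative; the replaced OSD should appear with the status ``destroyed`` in
the output:

.. prompt:: bash #

  ceph osd tree
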
**Preserving the OSD ID**

The 'destroyed' flag is used to determine which OSD ids will be reused in the
next OSD deployment.

If you use OSDSpecs for OSD deployment, your newly added disks will be assigned
the OSD ids of their replaced counterparts. This assumes that the new disks
still match the OSDSpecs.

Use the ``--dry-run`` flag to make certain that the ``ceph orch apply osd``
command does what you want it to. The ``--dry-run`` flag shows you what the
outcome of the command will be without making the changes you specify. When
you are satisfied that the command will do what you want, run the command
without the ``--dry-run`` flag.

.. tip::

  The name of your OSDSpec can be retrieved with the command ``ceph orch ls``.

Alternatively, you can use your OSDSpec file:

.. prompt:: bash #

  ceph orch apply osd -i <osd_spec_file> --dry-run

Expected output::

  NAME                HOST   DATA      DB  WAL
  <name_of_osd_spec>  node1  /dev/vdb  -   -


When this output reflects your intention, omit the ``--dry-run`` flag to
execute the deployment.


Erasing Devices (Zapping Devices)
---------------------------------

Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume
zap`` on the remote host.

.. prompt:: bash #

  ceph orch device zap <hostname> <path>

Example command:

.. prompt:: bash #

  ceph orch device zap my_hostname /dev/sdx

.. note::
    If the unmanaged flag is unset, cephadm automatically deploys drives that
    match the DriveGroup in your OSDSpec. For example, if you use the
    ``all-available-devices`` option when creating OSDs, when you ``zap`` a
    device the cephadm orchestrator automatically creates a new OSD in the
    device. To disable this behavior, see :ref:`cephadm-osd-declarative`.


.. _osd_autotune:

Automatically tuning OSD memory
===============================

OSD daemons will adjust their memory consumption based on the
``osd_memory_target`` config option (several gigabytes, by
default). If Ceph is deployed on dedicated nodes that are not sharing
memory with other services, cephadm can automatically adjust the per-OSD
memory consumption based on the total amount of RAM and the number of deployed
OSDs.

This option is enabled globally with::

  ceph config set osd osd_memory_target_autotune true

Cephadm will start with a fraction
(``mgr/cephadm/autotune_memory_target_ratio``, which defaults to
``.7``) of the total RAM in the system, subtract off any memory
consumed by non-autotuned daemons (non-OSDs, and OSDs for which
``osd_memory_target_autotune`` is false), and then divide the remainder by the
number of autotuned OSDs.

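As a worked example of that calculation (the numbers here are assumed, not
taken from a real cluster): on a host with 128 GiB of RAM, no other
non-autotuned daemons, and 10 OSDs, each OSD would receive roughly
0.7 x 128 GiB / 10, or about 9 GiB, as its ``osd_memory_target``. On hosts
that also run other workloads, you may prefer to lower the ratio instead of
disabling autotuning entirely; ``0.2`` below is only an example value:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2
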
The final targets are reflected in the config database with options like::

  WHO  MASK      LEVEL  OPTION             VALUE
  osd  host:foo  basic  osd_memory_target  126092301926
  osd  host:bar  basic  osd_memory_target  6442450944

Both the limits and the current memory consumed by each daemon are visible from
the ``ceph orch ps`` output in the ``MEM LIMIT`` column::

  NAME   HOST  PORTS  STATUS         REFRESHED  AGE  MEM USED  MEM LIMIT  VERSION                IMAGE ID      CONTAINER ID
  osd.1  dael         running (3h)   10s ago    3h   72857k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  9e183363d39c
  osd.2  dael         running (81m)  10s ago    81m  63989k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  1f0cc479b051
  osd.3  dael         running (62m)  10s ago    62m  64071k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  ac5537492f27

To exclude an OSD from memory autotuning, disable the autotune option
for that OSD and also set a specific memory target. For example:

.. prompt:: bash #

  ceph config set osd.123 osd_memory_target_autotune false
  ceph config set osd.123 osd_memory_target 16G


.. _drivegroups:

Advanced OSD Service Specifications
===================================

:ref:`orchestrator-cli-service-spec`\s of type ``osd`` are a way to describe a
cluster layout, using the properties of disks. Service specifications give the
user an abstract way to tell Ceph which disks should turn into OSDs with which
configurations, without knowing the specifics of device names and paths.

Service specifications make it possible to define a yaml or json file that can
be used to reduce the amount of manual work involved in creating OSDs.

For example, instead of running the following command:

.. prompt:: bash [monitor.1]#

  ceph orch daemon add osd *<host>*:*<path-to-device>*

for each device and each host, we can define a yaml or json file that allows us
to describe the layout. Here is the most basic example.

Create a file called (for example) ``osd_spec.yml``:

.. code-block:: yaml

  service_type: osd
  service_id: default_drive_group  # name of the drive_group (name can be custom)
  placement:
    host_pattern: '*'              # which hosts to target, currently only supports globs
  data_devices:                    # the type of devices you are applying specs to
    all: true                      # a filter, check below for a full list

This means:

#. Turn any available device (ceph-volume decides what 'available' is) into an
   OSD on all hosts that match the glob pattern '*'. (The glob pattern matches
   against the registered hosts from `host ls`.) A more detailed section on
   host_pattern is available below.

#. Then pass the file to the orchestrator like this:

   .. prompt:: bash [monitor.1]#

     ceph orch apply osd -i /path/to/osd_spec.yml

This instruction will be issued to all the matching hosts, and will deploy
these OSDs.

Setups more complex than the one specified by the ``all`` filter are
possible. See :ref:`osd_filters` for details.

A ``--dry-run`` flag can be passed to the ``apply osd`` command to display a
synopsis of the proposed layout.

Example:

.. prompt:: bash [monitor.1]#

  ceph orch apply osd -i /path/to/osd_spec.yml --dry-run



.. _osd_filters:

Filters
-------

.. note::
    Filters are applied using an `AND` gate by default. This means that a drive
    must fulfill all filter criteria in order to get selected. This behavior can
    be adjusted by setting ``filter_logic: OR`` in the OSD specification.

Filters are used to assign disks to groups, using their attributes to group
them.

The attributes are based off of ceph-volume's disk query. You can retrieve
information about the attributes with this command:

.. code-block:: bash

  ceph-volume inventory </path/to/disk>

Vendor or Model
^^^^^^^^^^^^^^^

Specific disks can be targeted by vendor or model:

.. code-block:: yaml

  model: disk_model_name

or

.. code-block:: yaml

  vendor: disk_vendor_name


Size
^^^^

Specific disks can be targeted by `Size`:

.. code-block:: yaml

  size: size_spec

Size specs
__________

Size specifications can be of the following forms:

* LOW:HIGH
* :HIGH
* LOW:
* EXACT

Concrete examples:

To include disks of an exact size:

.. code-block:: yaml

  size: '10G'

To include disks within a given range of sizes:

.. code-block:: yaml

  size: '10G:40G'

To include disks that are less than or equal to 10G in size:

.. code-block:: yaml

  size: ':10G'

To include disks equal to or greater than 40G in size:

.. code-block:: yaml

  size: '40G:'

Sizes don't have to be specified exclusively in gigabytes (G).

Other units of size are supported: Megabyte (M), Gigabyte (G), and Terabyte (T).
Appending the (B) for byte is also supported: ``MB``, ``GB``, ``TB``.


Rotational
^^^^^^^^^^

This operates on the 'rotational' attribute of the disk.

.. code-block:: yaml

  rotational: 0 | 1

`1` to match all disks that are rotational

`0` to match all disks that are non-rotational (SSDs, NVMe devices, etc.)


All
^^^

This will take all disks that are 'available'.

Note: This is exclusive to the data_devices section.

.. code-block:: yaml

  all: true


Limiter
^^^^^^^

If you have specified some valid filters but want to limit the number of disks that they match, use the ``limit`` directive:

.. code-block:: yaml

  limit: 2

For example, if you used `vendor` to match all disks that are from `VendorA`
but want to use only the first two, you could use `limit`:

.. code-block:: yaml

  data_devices:
    vendor: VendorA
    limit: 2

Note: `limit` is a last resort and shouldn't be used if it can be avoided.


Additional Options
------------------

There are multiple optional settings you can use to change the way OSDs are deployed.
You can add these options to the base level of a DriveGroup for them to take effect.

This example would deploy all OSDs with encryption enabled.

.. code-block:: yaml

  service_type: osd
  service_id: example_osd_spec
  placement:
    host_pattern: '*'
  data_devices:
    all: true
  encrypted: true

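Another optional setting mentioned in the Filters note above is
``filter_logic``, which switches the way multiple filters within a device
section are combined from AND to OR. The following sketch is illustrative;
the service id and filter values are example placeholders:

.. code-block:: yaml

  service_type: osd
  service_id: example_osd_spec_or   # example name
  placement:
    host_pattern: '*'
  data_devices:
    size: '2TB:'                    # match drives 2TB or larger ...
    rotational: 1                   # ... OR rotational drives
  filter_logic: OR
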
See a full list of options in the ``DriveGroupSpec`` reference below.

.. py:currentmodule:: ceph.deployment.drive_group

.. autoclass:: DriveGroupSpec
   :members:
   :exclude-members: from_json

Examples
--------

The simple case
^^^^^^^^^^^^^^^

All nodes with the same setup:

.. code-block:: none

  20 HDDs
  Vendor: VendorA
  Model: HDD-123-foo
  Size: 4TB

  2 SSDs
  Vendor: VendorB
  Model: MC-55-44-ZX
  Size: 512GB

This is a common setup and can be described quite easily:

.. code-block:: yaml

  service_type: osd
  service_id: osd_spec_default
  placement:
    host_pattern: '*'
  data_devices:
    model: HDD-123-foo   # note that HDD-123 would also be valid
  db_devices:
    model: MC-55-44-ZX   # same here, MC-55-44 is valid

However, we can improve it by reducing the filters on core properties of the drives:

.. code-block:: yaml

  service_type: osd
  service_id: osd_spec_default
  placement:
    host_pattern: '*'
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

Now all rotational devices are declared as 'data devices', and all
non-rotational devices will be used as shared devices (wal, db).

If you know that drives larger than 2TB will always be the slower data
devices, you can also filter by size:

.. code-block:: yaml

  service_type: osd
  service_id: osd_spec_default
  placement:
    host_pattern: '*'
  data_devices:
    size: '2TB:'
  db_devices:
    size: ':2TB'

Note: All of the above DriveGroups are equally valid. Which of those you want to use depends on taste and on how much you expect your node layout to change.


The advanced case
^^^^^^^^^^^^^^^^^

Here we have two distinct setups:

.. code-block:: none

  20 HDDs
  Vendor: VendorA
  Model: HDD-123-foo
  Size: 4TB

  12 SSDs
  Vendor: VendorB
  Model: MC-55-44-ZX
  Size: 512GB

  2 NVMEs
  Vendor: VendorC
  Model: NVME-QQQQ-987
  Size: 256GB


* 20 HDDs should share 2 SSDs
* 10 SSDs should share 2 NVMes

This can be described with two layouts.

.. code-block:: yaml

  service_type: osd
  service_id: osd_spec_hdd
  placement:
    host_pattern: '*'
  data_devices:
    rotational: 1
  db_devices:
    model: MC-55-44-ZX
    limit: 2     # db_slots is actually to be favoured here, but it's not implemented yet
  ---
  service_type: osd
  service_id: osd_spec_ssd
  placement:
    host_pattern: '*'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    vendor: VendorC

This would create the desired layout by using all HDDs as data_devices with two SSDs assigned as dedicated db/wal devices.
The remaining 10 SSDs will be data_devices that have the 'VendorC' NVMEs assigned as dedicated db/wal devices.

The advanced case (with non-uniform nodes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The examples above assumed that all nodes have the same drives. That, however,
is not always the case.

Node1-5

.. code-block:: none

  20 HDDs
  Vendor: Intel
  Model: SSD-123-foo
  Size: 4TB
  2 SSDs
  Vendor: VendorA
  Model: MC-55-44-ZX
  Size: 512GB

Node6-10

.. code-block:: none

  5 NVMEs
  Vendor: Intel
  Model: SSD-123-foo
  Size: 4TB
  20 SSDs
  Vendor: VendorA
  Model: MC-55-44-ZX
  Size: 512GB

You can use the 'host_pattern' key in the layout to target certain nodes;
glob patterns keep this easy.


.. code-block:: yaml

  service_type: osd
  service_id: osd_spec_node_one_to_five
  placement:
    host_pattern: 'node[1-5]'
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  ---
  service_type: osd
  service_id: osd_spec_six_to_ten
  placement:
    host_pattern: 'node[6-10]'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    model: SSD-123-foo

This applies different OSD specs to different hosts depending on the `host_pattern` key.

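After applying both specifications (for example with ``ceph orch apply -i`` as
shown earlier), each spec appears as its own OSD service. A quick way to
confirm that both services were created is to list them; this check is
illustrative rather than part of the procedure above:

.. prompt:: bash #

  ceph orch ls osd
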
Dedicated wal + db
^^^^^^^^^^^^^^^^^^

All previous cases co-located the WALs with the DBs.
It is, however, possible to deploy the WAL on a dedicated device as well, if it makes sense.

.. code-block:: none

  20 HDDs
  Vendor: VendorA
  Model: SSD-123-foo
  Size: 4TB

  2 SSDs
  Vendor: VendorB
  Model: MC-55-44-ZX
  Size: 512GB

  2 NVMEs
  Vendor: VendorC
  Model: NVME-QQQQ-987
  Size: 256GB


The OSD spec for this case would look like the following (using the `model` filter):

.. code-block:: yaml

  service_type: osd
  service_id: osd_spec_default
  placement:
    host_pattern: '*'
  data_devices:
    model: SSD-123-foo
  db_devices:
    model: MC-55-44-ZX
  wal_devices:
    model: NVME-QQQQ-987

It is also possible to specify device paths directly for specific hosts, as in the following example:

.. code-block:: yaml

  service_type: osd
  service_id: osd_using_paths
  placement:
    hosts:
      - Node01
      - Node02
  data_devices:
    paths:
      - /dev/sdb
  db_devices:
    paths:
      - /dev/sdc
  wal_devices:
    paths:
      - /dev/sdd


This can easily be done with other filters, like `size` or `vendor`, as well.

Activate existing OSDs
======================

If the operating system of a host has been reinstalled, the existing OSDs on
that host need to be activated again. For this use case, cephadm provides a
wrapper for :ref:`ceph-volume-lvm-activate` that activates all existing OSDs
on a host.

.. prompt:: bash #

  ceph cephadm osd activate <host>...

This will scan all existing disks for OSDs and deploy corresponding daemons.
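
For example, to reactivate the OSDs on two freshly reinstalled hosts (the
hostnames here are placeholders):

.. prompt:: bash #

  ceph cephadm osd activate host1 host2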