========================
mClock Config Reference
========================

.. index:: mclock; configuration

QoS support in Ceph is implemented using a queuing scheduler based on `the
dmClock algorithm`_. See the :ref:`dmclock-qos` section for more details.

.. note:: The *mclock_scheduler* is supported for BlueStore OSDs. For Filestore
          OSDs, *osd_op_queue* is set to *wpq* and this setting is enforced
          even if you attempt to change it.

To make the usage of mclock more user-friendly and intuitive, mclock config
profiles have been introduced. The mclock profiles mask the low-level details
from users, making it easier to configure and use mclock.

The following input parameters are required for an mclock profile to configure
the QoS-related parameters:

* the total capacity (IOPS) of each OSD (determined automatically -
  see `OSD Capacity Determination (Automated)`_)

* an mclock profile type to enable

Using the settings in the specified profile, an OSD determines and applies the
lower-level mclock and Ceph parameters. The parameters applied by the mclock
profile make it possible to tune the QoS between client I/O and background
operations in the OSD.


.. index:: mclock; mclock clients

mClock Client Types
===================

The mclock scheduler handles requests from different types of Ceph services.
Each service can be considered a type of client from mclock's perspective.
Depending on the type of requests handled, mclock clients are classified into
the buckets shown in the table below:

+------------------------+----------------------------------------------------+
| Client Type            | Request Types                                      |
+========================+====================================================+
| Client                 | I/O requests issued by external clients of Ceph    |
+------------------------+----------------------------------------------------+
| Background recovery    | Internal recovery/backfill requests                |
+------------------------+----------------------------------------------------+
| Background best-effort | Internal scrub, snap trim and PG deletion requests |
+------------------------+----------------------------------------------------+

The mclock profiles allocate parameters like reservation, weight and limit
(see :ref:`dmclock-qos`) differently for each client type. The next sections
describe the mclock profiles in greater detail.


.. index:: mclock; profile definition

mClock Profiles - Definition and Purpose
========================================

An mclock profile is *“a configuration setting that, when applied on a running
Ceph cluster, enables the throttling of the operations (IOPS) belonging to
different client classes (background recovery, scrub, snaptrim, client op,
osd subop)”*.

The mclock profile uses the capacity limits and the mclock profile type selected
by the user to determine the low-level mclock resource control configuration
parameters and apply them transparently. Additionally, other Ceph configuration
parameters are applied. Please see the sections below for more information.

The low-level mclock resource control parameters are the *reservation*,
*limit*, and *weight* that provide control of the resource shares, as
described in the :ref:`dmclock-qos` section.


.. index:: mclock; profile types

mClock Profile Types
====================

mclock profiles can be broadly classified into *built-in* and *custom* profiles.

Built-in Profiles
-----------------
Users can choose between the following built-in profile types:

.. note:: The values mentioned in the tables below represent the percentage
          of the total IOPS capacity of the OSD allocated for the service type.

By default, the *high_client_ops* profile is enabled to ensure that a larger
chunk of the bandwidth allocation goes to client ops. Background recovery ops
are given a lower allocation (and therefore take a longer time to complete).
But there might be instances that necessitate giving higher allocations to
either client ops or recovery ops. To deal with such a situation, one of the
alternate built-in profiles may be enabled by following the steps mentioned
in the next sections.

high_client_ops (*default*)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
This profile optimizes client performance over background activities by
allocating more reservation and limit to client operations as compared to
background operations in the OSD. This profile is enabled by default. The table
shows the resource control parameters set by the profile:

+------------------------+-------------+--------+-------+
| Service Type           | Reservation | Weight | Limit |
+========================+=============+========+=======+
| client                 | 50%         | 2      | MAX   |
+------------------------+-------------+--------+-------+
| background recovery    | 25%         | 1      | 100%  |
+------------------------+-------------+--------+-------+
| background best-effort | 25%         | 2      | MAX   |
+------------------------+-------------+--------+-------+

high_recovery_ops
^^^^^^^^^^^^^^^^^
This profile optimizes background recovery performance as compared to external
clients and other background operations within the OSD. This profile, for
example, may be enabled by an administrator temporarily to speed up background
recoveries during non-peak hours. The table shows the resource control
parameters set by the profile:

+------------------------+-------------+--------+-------+
| Service Type           | Reservation | Weight | Limit |
+========================+=============+========+=======+
| client                 | 30%         | 1      | 80%   |
+------------------------+-------------+--------+-------+
| background recovery    | 60%         | 2      | 200%  |
+------------------------+-------------+--------+-------+
| background best-effort | 1 (MIN)     | 2      | MAX   |
+------------------------+-------------+--------+-------+

balanced
^^^^^^^^
This profile allocates equal reservation to client I/O operations and background
recovery operations. This means that equal I/O resources are allocated to both
external clients and background recovery operations. This profile, for example,
may be enabled by an administrator when the external client performance
requirement is not critical and there are other background operations that
still need attention within the OSD.

+------------------------+-------------+--------+-------+
| Service Type           | Reservation | Weight | Limit |
+========================+=============+========+=======+
| client                 | 40%         | 1      | 100%  |
+------------------------+-------------+--------+-------+
| background recovery    | 40%         | 1      | 150%  |
+------------------------+-------------+--------+-------+
| background best-effort | 20%         | 2      | MAX   |
+------------------------+-------------+--------+-------+

.. note:: Across the built-in profiles, internal background best-effort clients
          of mclock include "scrub", "snap trim", and "pg deletion" operations.


Custom Profile
--------------
This profile gives users complete control over all the mclock configuration
parameters. This profile should be used with caution and is meant for advanced
users who understand mclock and Ceph-related configuration options.


.. index:: mclock; built-in profiles

mClock Built-in Profiles - Locked Config Options
=================================================
The sections below describe the config options that are locked to certain values
in order to ensure that the mClock scheduler is able to provide predictable QoS.

mClock Config Options
---------------------
When a built-in profile is enabled, the mClock scheduler calculates the
low-level mclock parameters [*reservation*, *weight*, *limit*] based on the
profile enabled for each client type. The mclock parameters are calculated
based on the max OSD capacity provided beforehand. As a result, the following
mclock config parameters cannot be modified when using any of the built-in
profiles:

- :confval:`osd_mclock_scheduler_client_res`
- :confval:`osd_mclock_scheduler_client_wgt`
- :confval:`osd_mclock_scheduler_client_lim`
- :confval:`osd_mclock_scheduler_background_recovery_res`
- :confval:`osd_mclock_scheduler_background_recovery_wgt`
- :confval:`osd_mclock_scheduler_background_recovery_lim`
- :confval:`osd_mclock_scheduler_background_best_effort_res`
- :confval:`osd_mclock_scheduler_background_best_effort_wgt`
- :confval:`osd_mclock_scheduler_background_best_effort_lim`
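
These options can only be set directly after switching to the *custom* profile,
which is covered in detail later in this document. As a brief and purely
illustrative sketch, assuming osd.0 should receive a client reservation of
3000 IOPS, the switch and the subsequent override would look like this:

.. prompt:: bash #

   ceph config set osd.0 osd_mclock_profile custom

.. prompt:: bash #

   ceph config set osd.0 osd_mclock_scheduler_client_res 3000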

Recovery/Backfill Options
-------------------------
The following recovery and backfill related Ceph options are set to new defaults
for mClock:

- :confval:`osd_max_backfills`
- :confval:`osd_recovery_max_active`
- :confval:`osd_recovery_max_active_hdd`
- :confval:`osd_recovery_max_active_ssd`

The following table shows the new mClock defaults. The defaults are changed in
order to maximize the impact of the built-in profiles:

+----------------------------------------+------------------+----------------+
| Config Option                          | Original Default | mClock Default |
+========================================+==================+================+
| :confval:`osd_max_backfills`           | 1                | 10             |
+----------------------------------------+------------------+----------------+
| :confval:`osd_recovery_max_active`     | 0                | 0              |
+----------------------------------------+------------------+----------------+
| :confval:`osd_recovery_max_active_hdd` | 3                | 10             |
+----------------------------------------+------------------+----------------+
| :confval:`osd_recovery_max_active_ssd` | 10               | 20             |
+----------------------------------------+------------------+----------------+

The above mClock defaults can be modified if necessary by enabling
:confval:`osd_mclock_override_recovery_settings` (default: *false*). The
steps for this are discussed in the
`Steps to Modify mClock Max Backfills/Recovery Limits`_ section.

Sleep Options
-------------
If any mClock profile (including "custom") is active, the following Ceph config
sleep options are disabled (set to 0):

- :confval:`osd_recovery_sleep`
- :confval:`osd_recovery_sleep_hdd`
- :confval:`osd_recovery_sleep_ssd`
- :confval:`osd_recovery_sleep_hybrid`
- :confval:`osd_scrub_sleep`
- :confval:`osd_delete_sleep`
- :confval:`osd_delete_sleep_hdd`
- :confval:`osd_delete_sleep_ssd`
- :confval:`osd_delete_sleep_hybrid`
- :confval:`osd_snap_trim_sleep`
- :confval:`osd_snap_trim_sleep_hdd`
- :confval:`osd_snap_trim_sleep_ssd`
- :confval:`osd_snap_trim_sleep_hybrid`

The above sleep options are disabled to ensure that the mclock scheduler is able
to determine when to pick the next op from its operation queue and transfer it
to the operation sequencer. This results in the desired QoS being provided
across all its clients.


.. index:: mclock; enable built-in profile

Steps to Enable mClock Profile
==============================

As already mentioned, the default mclock profile is set to *high_client_ops*.
The other values for the built-in profiles include *balanced* and
*high_recovery_ops*.

If there is a requirement to change the default profile, then the option
:confval:`osd_mclock_profile` may be set at runtime by using the following
command:

.. prompt:: bash #

   ceph config set osd.N osd_mclock_profile <value>

For example, to change the profile to allow faster recoveries on "osd.0", the
following command can be used to switch to the *high_recovery_ops* profile:

.. prompt:: bash #

   ceph config set osd.0 osd_mclock_profile high_recovery_ops
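
To confirm that the new profile is active, the running configuration of the OSD
can be queried (an illustrative check, using the same "osd.0" as above):

.. prompt:: bash #

   ceph config show osd.0 osd_mclock_profile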

.. note:: The *custom* profile is not recommended unless you are an advanced
          user.

And that's it! You are ready to run workloads on the cluster and check if the
QoS requirements are being met.


Switching Between Built-in and Custom Profiles
==============================================

There may be situations requiring a switch from a built-in profile to the
*custom* profile and vice-versa. The following sections outline the steps to
accomplish this.

Steps to Switch From a Built-in to the Custom Profile
-----------------------------------------------------

The *custom* profile is enabled by setting :confval:`osd_mclock_profile` to
*custom*. For example, to change the profile to *custom* on all OSDs, the
following command can be used:

.. prompt:: bash #

   ceph config set osd osd_mclock_profile custom

After switching to the *custom* profile, the desired mClock configuration
option may be modified. For example, to change the client reservation IOPS
allocation for a specific OSD (say osd.0), the following command can be used:

.. prompt:: bash #

   ceph config set osd.0 osd_mclock_scheduler_client_res 3000

.. important:: Care must be taken to change the reservations of other services
               like recovery and background best-effort accordingly, to ensure
               that the sum of the reservations does not exceed the maximum
               IOPS capacity of the OSD.
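
For instance, if the client reservation is raised as shown above, the recovery
reservation could be lowered at the same time so that the combined reservations
stay within the OSD's IOPS capacity (an illustrative value only; choose numbers
that match your OSD's measured capacity):

.. prompt:: bash #

   ceph config set osd.0 osd_mclock_scheduler_background_recovery_res 1000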

.. tip:: The reservation and limit parameter allocations are per-shard, based on
         the type of backing device (HDD/SSD) under the OSD. See
         :confval:`osd_op_num_shards_hdd` and :confval:`osd_op_num_shards_ssd`
         for more details.

Steps to Switch From the Custom Profile to a Built-in Profile
-------------------------------------------------------------

Switching from the *custom* profile to a built-in profile requires an
intermediate step of removing the custom settings from the central config
database for the changes to take effect.

The following sequence of commands can be used to switch to a built-in profile:

#. Set the desired built-in profile using:

   .. prompt:: bash #

      ceph config set osd osd_mclock_profile <built-in profile>

   For example, to set the built-in profile to ``high_client_ops`` on all
   OSDs, run the following command:

   .. prompt:: bash #

      ceph config set osd osd_mclock_profile high_client_ops

#. Determine the existing custom mClock configuration settings in the central
   config database using the following command:

   .. prompt:: bash #

      ceph config dump

#. Remove the custom mClock configuration settings determined in the previous
   step from the central config database:

   .. prompt:: bash #

      ceph config rm osd <mClock Configuration Option>

   For example, to remove the configuration option
   :confval:`osd_mclock_scheduler_client_res` that was set on all OSDs, run the
   following command:

   .. prompt:: bash #

      ceph config rm osd osd_mclock_scheduler_client_res

#. After all existing custom mClock configuration settings have been removed
   from the central config database, the configuration settings pertaining to
   ``high_client_ops`` will come into effect. For example, to verify the
   settings on osd.0 use:

   .. prompt:: bash #

      ceph config show osd.0

Switch Temporarily Between mClock Profiles
------------------------------------------

To switch between mClock profiles on a temporary basis, the following commands
may be used to override the settings:

.. warning:: This section is for advanced users or for experimental testing. The
             recommendation is to not use the below commands on a running
             cluster as they could have unexpected outcomes.

.. note:: The configuration changes on an OSD using the below commands are
          ephemeral and are lost when the OSD restarts. It is also important to
          note that the config options overridden using the below commands
          cannot be modified further using the *ceph config set osd.N ...*
          command. Such changes will not take effect until the OSD is
          restarted. This is intentional, as per the config subsystem design.
          However, any further modification can still be made ephemerally using
          the commands mentioned below.

#. Run the *injectargs* command as shown to override the mclock settings:

   .. prompt:: bash #

      ceph tell osd.N injectargs '--<mClock Configuration Option>=<value>'

   For example, the following command overrides the
   :confval:`osd_mclock_profile` option on osd.0:

   .. prompt:: bash #

      ceph tell osd.0 injectargs '--osd_mclock_profile=high_recovery_ops'

#. An alternate command that can be used is:

   .. prompt:: bash #

      ceph daemon osd.N config set <mClock Configuration Option> <value>

   For example, the following command overrides the
   :confval:`osd_mclock_profile` option on osd.0:

   .. prompt:: bash #

      ceph daemon osd.0 config set osd_mclock_profile high_recovery_ops

The individual QoS-related config options for the *custom* profile can also be
modified ephemerally using the above commands.
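
For example, with the *custom* profile active, the client reservation could be
overridden temporarily on osd.0 (an illustrative value; the change does not
survive an OSD restart):

.. prompt:: bash #

   ceph tell osd.0 injectargs '--osd_mclock_scheduler_client_res=3000'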


Steps to Modify mClock Max Backfills/Recovery Limits
====================================================

This section describes the steps to modify the default max backfills or recovery
limits if the need arises.

.. warning:: This section is for advanced users or for experimental testing. The
             recommendation is to retain the defaults as-is on a running cluster,
             as modifying them could have unexpected performance outcomes. The
             values may be modified only if the cluster is unable to cope with,
             or shows poor performance under, the default settings, or for
             performing experiments on a test cluster.

.. important:: The max backfill/recovery options that can be modified are listed
               in section `Recovery/Backfill Options`_. The modification of the
               mClock default backfills/recovery limit is gated by the
               :confval:`osd_mclock_override_recovery_settings` option, which is
               set to *false* by default. Attempting to modify any default
               recovery/backfill limits without setting the gating option will
               reset that option back to the mClock defaults along with a warning
               message logged in the cluster log. Note that it may take a few
               seconds for the default value to come back into effect. Verify
               the limit using the *config show* command as shown below.

#. Set the :confval:`osd_mclock_override_recovery_settings` config option on all
   OSDs to *true* using:

   .. prompt:: bash #

      ceph config set osd osd_mclock_override_recovery_settings true

#. Set the desired max backfill/recovery option using:

   .. prompt:: bash #

      ceph config set osd osd_max_backfills <value>

   For example, the following command modifies the :confval:`osd_max_backfills`
   option on all OSDs to 5:

   .. prompt:: bash #

      ceph config set osd osd_max_backfills 5

#. Wait for a few seconds and verify the running configuration for a specific
   OSD using:

   .. prompt:: bash #

      ceph config show osd.N | grep osd_max_backfills

   For example, the following command shows the running configuration of
   :confval:`osd_max_backfills` on osd.0:

   .. prompt:: bash #

      ceph config show osd.0 | grep osd_max_backfills

#. Reset the :confval:`osd_mclock_override_recovery_settings` config option on
   all OSDs to *false* using:

   .. prompt:: bash #

      ceph config set osd osd_mclock_override_recovery_settings false


OSD Capacity Determination (Automated)
======================================

The OSD capacity in terms of total IOPS is determined automatically during OSD
initialization. This is achieved by running the OSD bench tool and overriding
the default value of the ``osd_mclock_max_capacity_iops_[hdd, ssd]`` option
depending on the device type. No other action/input is expected from the user
to set the OSD capacity.

.. note:: If you wish to manually benchmark OSD(s) or manually tune the
          Bluestore throttle parameters, see section
          `Steps to Manually Benchmark an OSD (Optional)`_.

You may verify the capacity of an OSD after the cluster is brought up by using
the following command:

.. prompt:: bash #

   ceph config show osd.N osd_mclock_max_capacity_iops_[hdd, ssd]

For example, the following command shows the max capacity for "osd.0" on a Ceph
node whose underlying device type is SSD:

.. prompt:: bash #

   ceph config show osd.0 osd_mclock_max_capacity_iops_ssd

Mitigation of Unrealistic OSD Capacity From Automated Test
----------------------------------------------------------
In certain conditions, the OSD bench tool may show unrealistic or inflated
results depending on the drive configuration and other environment related
conditions. To mitigate the performance impact due to this unrealistic capacity,
a couple of threshold config options depending on the OSD's device type are
defined and used:

- :confval:`osd_mclock_iops_capacity_threshold_hdd` = 500
- :confval:`osd_mclock_iops_capacity_threshold_ssd` = 80000

The following automated step is performed:

Fallback to using default OSD capacity (automated)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If OSD bench reports a measurement that exceeds the above threshold value for
the underlying device type, the fallback mechanism reverts to the default value
of :confval:`osd_mclock_max_capacity_iops_hdd` or
:confval:`osd_mclock_max_capacity_iops_ssd`. The threshold config options
can be reconfigured based on the type of drive used. Additionally, a cluster
warning is logged in case the measurement exceeds the threshold. For example::

   2022-10-27T15:30:23.270+0000 7f9b5dbe95c0 0 log_channel(cluster) log [WRN]
   : OSD bench result of 39546.479392 IOPS exceeded the threshold limit of
   25000.000000 IOPS for osd.1. IOPS capacity is unchanged at 21500.000000
   IOPS. The recommendation is to establish the osd's IOPS capacity using other
   benchmark tools (e.g. Fio) and then override
   osd_mclock_max_capacity_iops_[hdd|ssd].
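
If the drive is genuinely capable of higher IOPS than the default threshold
allows, the threshold itself may be raised so that subsequent benchmark results
are accepted. This is an illustrative example with a hypothetical value; pick a
threshold that matches your hardware:

.. prompt:: bash #

   ceph config set osd osd_mclock_iops_capacity_threshold_ssd 100000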

If the default capacity doesn't accurately represent the OSD's capacity, the
following additional step is recommended to address this:

Run custom drive benchmark if defaults are not accurate (manual)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the default OSD capacity is not accurate, the recommendation is to run a
custom benchmark using your preferred tool (e.g. Fio) on the drive and then
override the ``osd_mclock_max_capacity_iops_[hdd, ssd]`` option as described
in the `Specifying Max OSD Capacity`_ section.

This step is highly recommended until an alternate mechanism is in place.

Steps to Manually Benchmark an OSD (Optional)
=============================================

.. note:: These steps are only necessary if you want to override the OSD
          capacity already determined automatically during OSD initialization.
          Otherwise, you may skip this section entirely.

.. tip:: If you have already determined the benchmark data and wish to manually
         override the max OSD capacity for an OSD, you may skip to section
         `Specifying Max OSD Capacity`_.


Any existing benchmarking tool (e.g. Fio) can be used for this purpose. In this
case, the steps use the *Ceph OSD Bench* command described in the next section.
Regardless of the tool/command used, the steps outlined further below remain the
same.

As already described in the :ref:`dmclock-qos` section, the number of
shards and the BlueStore throttle parameters have an impact on the mclock op
queues. Therefore, it is critical to set these values carefully in order to
maximize the impact of the mclock scheduler.

:Number of Operational Shards:
  We recommend using the default number of shards as defined by the
  configuration options ``osd_op_num_shards``, ``osd_op_num_shards_hdd``, and
  ``osd_op_num_shards_ssd``. In general, a lower number of shards will increase
  the impact of the mclock queues.

:Bluestore Throttle Parameters:
  We recommend using the default values as defined by
  :confval:`bluestore_throttle_bytes` and
  :confval:`bluestore_throttle_deferred_bytes`. But these parameters may also be
  determined during the benchmarking phase as described below (see the example
  right after this list for a possible starting point).
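
If you decide to experiment with the throttle values, the 32 KiB (32768 bytes)
starting point mentioned in the benchmarking steps below can be applied to the
OSD under test as follows (an illustrative sketch, assuming the OSD being
benchmarked is osd.0):

.. prompt:: bash #

   ceph config set osd.0 bluestore_throttle_bytes 32768

.. prompt:: bash #

   ceph config set osd.0 bluestore_throttle_deferred_bytes 32768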

OSD Bench Command Syntax
------------------------

The :ref:`osd-subsystem` section describes the OSD bench command. The syntax
used for benchmarking is shown below:

.. prompt:: bash #

   ceph tell osd.N bench [TOTAL_BYTES] [BYTES_PER_WRITE] [OBJ_SIZE] [NUM_OBJS]

where:

* ``TOTAL_BYTES``: Total number of bytes to write
* ``BYTES_PER_WRITE``: Block size per write
* ``OBJ_SIZE``: Bytes per object
* ``NUM_OBJS``: Number of objects to write
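
For instance, the invocation used in the test steps below writes a total of
12288000 bytes in 4096-byte (4 KiB) writes, spread across 100 objects of
4194304 bytes (4 MiB) each, i.e. 3000 write operations in total:

.. prompt:: bash #

   ceph tell osd.0 bench 12288000 4096 4194304 100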

Benchmarking Test Steps Using OSD Bench
---------------------------------------

The steps below use the default shards and detail the procedure used to
determine the correct bluestore throttle values (optional).

#. Bring up your Ceph cluster and log in to the Ceph node hosting the OSDs that
   you wish to benchmark.
#. Run a simple 4 KiB random write workload on an OSD using the following
   commands:

   .. note:: Before running the test, caches must be cleared to get an
             accurate measurement.

   For example, if you are running the benchmark test on osd.0, run the
   following commands:

   .. prompt:: bash #

      ceph tell osd.0 cache drop

   .. prompt:: bash #

      ceph tell osd.0 bench 12288000 4096 4194304 100

#. Note the overall throughput (IOPS) obtained from the output of the osd bench
   command. This value is the baseline throughput (IOPS) when the default
   bluestore throttle options are in effect.
#. If the intent is to determine the bluestore throttle values for your
   environment, then set the two options, :confval:`bluestore_throttle_bytes`
   and :confval:`bluestore_throttle_deferred_bytes`, to 32 KiB (32768 bytes)
   each to begin with. Otherwise, you may skip to the next section.
#. Run the 4 KiB random write test as before using OSD bench.
#. Note the overall throughput from the output and compare the value
   against the baseline throughput recorded in step 3.
#. If the throughput doesn't match the baseline, double the bluestore
   throttle options and repeat steps 5 through 7 until the obtained
   throughput is very close to the baseline value.

For example, during benchmarking on a machine with NVMe SSDs, a value of 256 KiB
for both bluestore throttle and deferred bytes was determined to maximize the
impact of mclock. For HDDs, the corresponding value was 40 MiB, where the
overall throughput was roughly equal to the baseline throughput. Note that in
general for HDDs, the bluestore throttle values are expected to be higher when
compared to SSDs.
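
Once a suitable pair of values has been found for your environment, they can be
made persistent for all OSDs using the same config pattern shown earlier (an
illustrative sketch, using the 256 KiB value from the NVMe SSD example above):

.. prompt:: bash #

   ceph config set osd bluestore_throttle_bytes 262144

.. prompt:: bash #

   ceph config set osd bluestore_throttle_deferred_bytes 262144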


Specifying Max OSD Capacity
----------------------------

The steps in this section may be performed only if you want to override the
max OSD capacity automatically set during OSD initialization. The option
``osd_mclock_max_capacity_iops_[hdd, ssd]`` for an OSD can be set by running the
following command:

.. prompt:: bash #

   ceph config set osd.N osd_mclock_max_capacity_iops_[hdd,ssd] <value>

For example, the following command sets the max capacity for a specific OSD
(say "osd.0") whose underlying device type is HDD to 350 IOPS:

.. prompt:: bash #

   ceph config set osd.0 osd_mclock_max_capacity_iops_hdd 350

Alternatively, you may specify the max capacity for OSDs within the Ceph
configuration file under the respective [osd.N] section. See
:ref:`ceph-conf-settings` for more details.
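
For example, a minimal snippet that pins the capacity of osd.0 to the value used
above (an illustrative ``ceph.conf`` fragment, assuming osd.0 is backed by an
HDD)::

   [osd.0]
   osd_mclock_max_capacity_iops_hdd = 350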


.. index:: mclock; config settings

mClock Config Options
=====================

.. confval:: osd_mclock_profile
.. confval:: osd_mclock_max_capacity_iops_hdd
.. confval:: osd_mclock_max_capacity_iops_ssd
.. confval:: osd_mclock_cost_per_io_usec
.. confval:: osd_mclock_cost_per_io_usec_hdd
.. confval:: osd_mclock_cost_per_io_usec_ssd
.. confval:: osd_mclock_cost_per_byte_usec
.. confval:: osd_mclock_cost_per_byte_usec_hdd
.. confval:: osd_mclock_cost_per_byte_usec_ssd
.. confval:: osd_mclock_force_run_benchmark_on_init
.. confval:: osd_mclock_skip_benchmark
.. confval:: osd_mclock_override_recovery_settings
.. confval:: osd_mclock_iops_capacity_threshold_hdd
.. confval:: osd_mclock_iops_capacity_threshold_ssd

.. _the dmClock algorithm: https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf