.. index:: mclock; configuration
-Mclock profiles mask the low level details from users, making it
-easier for them to configure mclock.
+QoS support in Ceph is implemented using a queuing scheduler based on `the
+dmClock algorithm`_. See the :ref:`dmclock-qos` section for more details.
+
+.. note:: The *mclock_scheduler* is supported for BlueStore OSDs. For Filestore
+          OSDs, *osd_op_queue* is set to *wpq*, and this setting is enforced
+          even if you attempt to change it.
+
+To make mclock more user-friendly and intuitive, mclock config profiles are
+introduced. The mclock profiles mask the low-level details from users, making
+it easier to configure and use mclock.
The following input parameters are required for a mclock profile to configure
the QoS related parameters:
-* total capacity (IOPS) of each OSD (determined automatically)
+* total capacity (IOPS) of each OSD (determined automatically; see
+  `OSD Capacity Determination (Automated)`_)
* an mclock profile type to enable
-Using the settings in the specified profile, the OSD determines and applies the
+Using the settings in the specified profile, an OSD determines and applies the
lower-level mclock and Ceph parameters. The parameters applied by the mclock
-profile make it possible to tune the QoS between client I/O, recovery/backfill
-operations, and other background operations (for example, scrub, snap trim, and
-PG deletion). These background activities are considered best-effort internal
-clients of Ceph.
+profile make it possible to tune the QoS between client I/O and background
+operations in the OSD.
+
+
+.. index:: mclock; mclock clients
+
+mClock Client Types
+===================
+
+The mclock scheduler handles requests from different types of Ceph services.
+Each service can be considered a type of client from mclock's perspective.
+Depending on the type of requests handled, mclock clients are classified into
+the buckets shown in the table below:
+
++------------------------+----------------------------------------------------+
+| Client Type            | Request Types                                      |
++========================+====================================================+
+| Client                 | I/O requests issued by external clients of Ceph    |
++------------------------+----------------------------------------------------+
+| Background recovery    | Internal recovery/backfill requests                |
++------------------------+----------------------------------------------------+
+| Background best-effort | Internal scrub, snap trim and PG deletion requests |
++------------------------+----------------------------------------------------+
+
+The mclock profiles allocate parameters like reservation, weight and limit
+(see :ref:`dmclock-qos`) differently for each client type. The next sections
+describe the mclock profiles in greater detail.
.. index:: mclock; profile definition
osd subop)”*.
The mclock profile uses the capacity limits and the mclock profile type selected
-by the user to determine the low-level mclock resource control parameters.
-
-Depending on the profile type, lower-level mclock resource-control parameters
-and some Ceph-configuration parameters are transparently applied.
+by the user to determine the low-level mclock resource control configuration
+parameters and apply them transparently. Other Ceph configuration parameters
+are also applied. See the sections below for more information.
The low-level mclock resource control parameters are the *reservation*,
*limit*, and *weight* that provide control of the resource shares, as
mClock Profile Types
====================
-mclock profiles can be broadly classified into two types,
-
-- **Built-in**: Users can choose between the following built-in profile types:
-
- - **high_client_ops** (*default*):
- This profile allocates more reservation and limit to external-client ops
- as compared to background recoveries and other internal clients within
- Ceph. This profile is enabled by default.
- - **high_recovery_ops**:
- This profile allocates more reservation to background recoveries as
- compared to external clients and other internal clients within Ceph. For
- example, an admin may enable this profile temporarily to speed-up background
- recoveries during non-peak hours.
- - **balanced**:
- This profile allocates equal reservation to client ops and background
- recovery ops.
-
-- **Custom**: This profile gives users complete control over all the mclock
- configuration parameters. Using this profile is not recommended without
- a deep understanding of mclock and related Ceph-configuration options.
-
-.. note:: Across the built-in profiles, internal clients of mclock (for example
- "scrub", "snap trim", and "pg deletion") are given slightly lower
- reservations, but higher weight and no limit. This ensures that
- these operations are able to complete quickly if there are no other
- competing services.
+mclock profiles can be broadly classified into *built-in* and *custom* profiles.
+
+Built-in Profiles
+-----------------
+Users can choose between the following built-in profile types:
+
+.. note:: The reservation and limit values in the tables below represent the
+          percentage of the total IOPS capacity of the OSD allocated for the
+          service type.
+
+high_client_ops (*default*)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+This profile optimizes client performance over background activities by
+allocating more reservation and limit to client operations as compared to
+background operations in the OSD. This profile is enabled by default. The table
+shows the resource control parameters set by the profile:
+
++------------------------+-------------+--------+-------+
+| Service Type           | Reservation | Weight | Limit |
++========================+=============+========+=======+
+| client                 | 50%         | 2      | MAX   |
++------------------------+-------------+--------+-------+
+| background recovery    | 25%         | 1      | 100%  |
++------------------------+-------------+--------+-------+
+| background best-effort | 25%         | 1      | MAX   |
++------------------------+-------------+--------+-------+
+
+high_recovery_ops
+^^^^^^^^^^^^^^^^^
+This profile optimizes background recovery performance as compared to external
+clients and other background operations within the OSD. For example, an
+administrator may enable this profile temporarily to speed up background
+recoveries during non-peak hours. The table shows the resource control
+parameters set by the profile:
+
++------------------------+-------------+--------+-------+
+| Service Type           | Reservation | Weight | Limit |
++========================+=============+========+=======+
+| client                 | 30%         | 1      | 80%   |
++------------------------+-------------+--------+-------+
+| background recovery    | 60%         | 2      | 200%  |
++------------------------+-------------+--------+-------+
+| background best-effort | 1 (MIN)     | 1      | MAX   |
++------------------------+-------------+--------+-------+
+
+balanced
+^^^^^^^^
+This profile allocates equal reservation to client I/O operations and background
+recovery operations. This means that equal I/O resources are allocated to both
+external and background recovery operations. For example, an administrator may
+enable this profile when the external client performance requirement is not
+critical and there are other background operations that still need attention
+within the OSD. The table shows the resource control parameters set by the
+profile:
+
++------------------------+-------------+--------+-------+
+| Service Type           | Reservation | Weight | Limit |
++========================+=============+========+=======+
+| client                 | 40%         | 1      | 100%  |
++------------------------+-------------+--------+-------+
+| background recovery    | 40%         | 1      | 150%  |
++------------------------+-------------+--------+-------+
+| background best-effort | 20%         | 1      | MAX   |
++------------------------+-------------+--------+-------+
+
+.. note:: Across the built-in profiles, internal background best-effort clients
+          of mclock ("scrub", "snap trim", and "pg deletion") are given lower
+          reservations but no limits (MAX). This ensures that requests from such
+          clients are able to complete quickly if there are no other competing
+          operations.
+
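As a rough illustration of how the percentages in the built-in profile tables
above translate into IOPS, the following Python sketch applies them to a given
OSD capacity. It is not Ceph code: the ``PROFILES`` dictionary and the
``allocations`` helper are hypothetical names used only for this example, and
treating ``MAX`` as "no limit" (``None``) and ``1 (MIN)`` as a reservation of
1 IOPS are assumptions about how to read the tables.

```python
# Values copied from the built-in profile tables: (reservation %, weight, limit %).
# None stands for MAX (no limit); "MIN" stands for the "1 (MIN)" reservation.
# Both conventions are assumptions made for this sketch.
PROFILES = {
    "high_client_ops": {
        "client": (50, 2, None),
        "background_recovery": (25, 1, 100),
        "background_best_effort": (25, 1, None),
    },
    "high_recovery_ops": {
        "client": (30, 1, 80),
        "background_recovery": (60, 2, 200),
        "background_best_effort": ("MIN", 1, None),
    },
    "balanced": {
        "client": (40, 1, 100),
        "background_recovery": (40, 1, 150),
        "background_best_effort": (20, 1, None),
    },
}


def allocations(profile, capacity_iops):
    """Return {service: (reservation_iops, weight, limit_iops)} for a profile."""
    result = {}
    for service, (res, weight, lim) in PROFILES[profile].items():
        # Reservation: a share of the OSD capacity, or 1 IOPS for "MIN".
        res_iops = 1.0 if res == "MIN" else capacity_iops * res / 100
        # Limit: a share of the OSD capacity, or None when the table says MAX.
        lim_iops = None if lim is None else capacity_iops * lim / 100
        result[service] = (res_iops, weight, lim_iops)
    return result


if __name__ == "__main__":
    # Example: an OSD whose measured capacity is assumed to be 1000 IOPS.
    for service, values in allocations("balanced", 1000).items():
        print(service, values)
```

For an OSD with an assumed capacity of 1000 IOPS, the ``balanced`` profile in
this sketch gives the client a reservation of 400 IOPS and a limit of 1000
IOPS. The weight is passed through unchanged because it is a relative share,
not a percentage of capacity.
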
+
+Custom Profile
+--------------
+This profile gives users complete control over all the mclock configuration
+parameters. This profile should be used with caution and is meant for advanced
+users who understand mclock and the related Ceph configuration options.
.. index:: mclock; built-in profiles
chunk of the bandwidth allocation goes to client ops. Background recovery ops
are given lower allocation (and therefore take a longer time to complete). But
there might be instances that necessitate giving higher allocations to either
-client ops or recovery ops. In order to deal with such a situation, you can
-enable one of the alternate built-in profiles by following the steps mentioned
+client ops or recovery ops. In order to deal with such a situation, the
+alternate built-in profiles may be enabled by following the steps mentioned
in the next section.
If any mClock profile (including "custom") is active, the following Ceph config
initialization. This is achieved by running the OSD bench tool and overriding
the default value of ``osd_mclock_max_capacity_iops_[hdd, ssd]`` option
depending on the device type. No other action/input is expected from the user
-to set the OSD capacity. You may verify the capacity of an OSD after the
-cluster is brought up by using the following command:
+to set the OSD capacity.
+
+.. note:: If you wish to manually benchmark OSD(s) or manually tune the
+          BlueStore throttle parameters, see section
+          `Steps to Manually Benchmark an OSD (Optional)`_.
+
+You may verify the capacity of an OSD after the cluster is brought up by using
+the following command:
.. prompt:: bash #
determined during the benchmarking phase as described below.
OSD Bench Command Syntax
-````````````````````````
+------------------------
The :ref:`osd-subsystem` section describes the OSD bench command. The syntax
used for benchmarking is shown below :
* ``NUM_OBJS``: Number of objects to write
Benchmarking Test Steps Using OSD Bench
-```````````````````````````````````````
+---------------------------------------
The steps below use the default shards and detail the steps used to determine
the correct bluestore throttle values (optional).
Specifying Max OSD Capacity
-````````````````````````````
+----------------------------
The steps in this section may be performed only if you want to override the
max osd capacity automatically set during OSD initialization. The option
.. confval:: osd_mclock_cost_per_byte_usec
.. confval:: osd_mclock_cost_per_byte_usec_hdd
.. confval:: osd_mclock_cost_per_byte_usec_ssd
+.. confval:: osd_mclock_force_run_benchmark_on_init
+.. confval:: osd_mclock_skip_benchmark
+
+.. _the dmClock algorithm: https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf