method of storing and retrieving data, Ceph avoids a single point of failure, a
performance bottleneck, and a physical limit to its scalability.
-CRUSH requires a map of your cluster, and uses the CRUSH map to pseudo-randomly
-store and retrieve data in OSDs with a uniform distribution of data across the
-cluster. For a detailed discussion of CRUSH, see
+CRUSH requires a map of your cluster, and uses the CRUSH map to pseudo-randomly
+store and retrieve data in OSDs with a uniform distribution of data across the
+cluster. For a detailed discussion of CRUSH, see
`CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_.
CRUSH maps contain a list of :abbr:`OSDs (Object Storage Devices)`, a list of
#. Note that the order of the keys does not matter.
#. The key name (left of ``=``) must be a valid CRUSH ``type``. By default
- these include root, datacenter, room, row, pod, pdu, rack, chassis and host,
- but those types can be customized to be anything appropriate by modifying
+ these include root, datacenter, room, row, pod, pdu, rack, chassis and host,
+ but those types can be customized to be anything appropriate by modifying
the CRUSH map.
#. Not all keys need to be specified. For example, by default, Ceph
automatically sets a ``ceph-osd`` daemon's location to be
The crush location for an OSD is normally expressed via the ``crush location``
config option being set in the ``ceph.conf`` file. Each time the OSD starts,
it verifies it is in the correct location in the CRUSH map and, if it is not,
-it moved itself. To disable this automatic CRUSH map management, add the
+it moves itself. To disable this automatic CRUSH map management, add the
following to your configuration file in the ``[osd]`` section::
osd crush update on start = false
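+
+The ``crush location`` option itself takes a list of key/value pairs.
+As an illustration (the rack and host names here are hypothetical), an
+OSD mounted in rack ``a2`` might be configured with::
+
+    [osd]
+    # the names below are illustrative only
+    crush location = root=default rack=a2 host=a2a1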
Custom location hooks
---------------------
A customized location hook can be used to generate a more complete
-crush location on startup. The sample ``ceph-crush-location`` utility
-will generate a CRUSH location string for a given daemon. The
-location is based on, in order of preference:
+crush location on startup. The crush location is based on, in order
+of preference:
#. A ``crush location`` option in ceph.conf.
#. A default of ``root=default host=HOSTNAME`` where the hostname is
generated with the ``hostname -s`` command.
This is not useful by itself, as the OSD itself has the exact same
-behavior. However, the script can be modified to provide additional
+behavior. However, a script can be written to provide additional
location fields (for example, the rack or datacenter), and then the
hook enabled via the config option::
This hook is passed several arguments (below) and should output a single line
to stdout with the CRUSH location description::
- $ ceph-crush-location --cluster CLUSTER --id ID --type TYPE
+ --cluster CLUSTER --id ID --type TYPE
where the cluster name is typically 'ceph', the id is the daemon
-identifier (the OSD number), and the daemon type is typically ``osd``.
+identifier (e.g., the OSD number), and the daemon
+type is ``osd``, ``mds``, or similar.
+
+For example, a simple hook that additionally specified a rack location
+based on a hypothetical file ``/etc/rack`` might be::
+
+ #!/bin/sh
+ echo "host=$(hostname -s) rack=$(cat /etc/rack) root=default"
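+
+For the hook to take effect it must be executable and referenced from
+the ``[osd]`` section of the configuration; the path below is only a
+placeholder::
+
+    osd crush location hook = /path/to/customized-ceph-crush-location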
CRUSH structure
a name, normally ``osd.N`` where ``N`` is the device id.
Devices may also have a *device class* associated with them (e.g.,
-``hdd`` or ``ssd``), allowing them to be conveniently targetted by a
+``hdd`` or ``ssd``), allowing them to be conveniently targeted by a
crush rule.
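+
+For example, a device class can be changed at runtime. A minimal
+sketch, using an arbitrary ``osd.2``; note that an existing class must
+be removed before a new one can be set::
+
+    # osd.2 is an arbitrary example OSD
+    ceph osd crush rm-device-class osd.2
+    ceph osd crush set-device-class ssd osd.2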
Types and Buckets
.. ditaa::

                        +-----------------+
-                       | {o}root default |
+                       |{o}root default  |
                        +--------+--------+
                                 |
-                +---------------+---------------+
+                +---------------+---------------+
+                |                               |
+         +------+------+                 +------+------+
+         |{o}host foo  |                 |{o}host bar  |
+         +------+------+                 +------+------+
                 |                               |
-         +-------+-------+                 +-----+-------+
-         | {o}host foo |                   | {o}host bar |
-         +-------+-------+                 +-----+-------+
-                |                               |
         +-------+-------+               +-------+-------+
         |               |               |               |
   +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+
-   | osd.0 |       | osd.1 |       | osd.2 |       | osd.3 |
+   | osd.0     |   | osd.1     |   | osd.2     |   | osd.3     |
   +-----------+   +-----------+   +-----------+   +-----------+
Each node (device or bucket) in the hierarchy has a *weight*
erasure coded), the *failure domain*, and optionally a *device class*.
In rare cases rules must be written by hand, by editing the
CRUSH map directly.
-
+
You can see what rules are defined for your cluster with::
ceph osd crush rule ls
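+
+An individual rule can then be inspected with, for example::
+
+    # substitute any rule name reported by "ceph osd crush rule ls"
+    ceph osd crush rule dump replicated_rule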
#. A **per-pool** weight set is more flexible in that it allows
placement to be optimized for each data pool. Additionally,
weights can be adjusted for each position of placement, allowing
- the optimizer to correct for a suble skew of data toward devices
+ the optimizer to correct for a subtle skew of data toward devices
with small weights relative to their peers (an effect that is
usually only apparent in very large clusters but which can cause
balancing problems); see the example below.
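+
+As a sketch of the associated commands (``mypool`` is a hypothetical
+pool name), a per-pool, positional weight set could be created and
+adjusted with::
+
+    # "mypool" is a hypothetical pool name
+    ceph osd crush weight-set create mypool positional
+    ceph osd crush weight-set reweight mypool osd.0 1.1 1.05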
``name``
-:Description: The full name of the OSD.
+:Description: The full name of the OSD.
:Type: String
:Required: Yes
:Example: ``osd.0``
``bucket-type``
-:Description: You may specify the OSD's location in the CRUSH hierarchy.
+:Description: You may specify the OSD's location in the CRUSH hierarchy.
:Type: Key/value pairs.
:Required: No
:Example: ``datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1``
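+
+The enclosing command is elided here, but as an illustration, arguments
+of this shape are accepted by, e.g., ``ceph osd crush set``::
+
+    # names as in the examples above; 1.0 is the command's weight argument
+    ceph osd crush set osd.0 1.0 datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1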
``name``
-:Description: The full name of the OSD.
+:Description: The full name of the OSD.
:Type: String
:Required: Yes
:Example: ``osd.0``
``weight``
-:Description: The CRUSH weight for the OSD.
+:Description: The CRUSH weight for the OSD.
:Type: Double
:Required: Yes
:Example: ``2.0``
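+
+These arguments correspond to, e.g., ``ceph osd crush reweight``; with
+the values above::
+
+    ceph osd crush reweight osd.0 2.0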
``name``
-:Description: The full name of the OSD.
+:Description: The full name of the OSD.
:Type: String
:Required: Yes
:Example: ``osd.0``
``bucket-type``
-:Description: You may specify the bucket's location in the CRUSH hierarchy.
+:Description: You may specify the bucket's location in the CRUSH hierarchy.
:Type: Key/value pairs.
:Required: No
:Example: ``datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1``
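+
+For example, reusing the hypothetical names above, a host bucket could
+be moved under a rack with::
+
+    # move the host bucket "foo-bar-1" under rack "bar"
+    ceph osd crush move foo-bar-1 datacenter=dc1 room=room1 row=foo rack=bar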
ceph osd crush rule rm {rule-name}
+.. _crush-map-tunables:
+
Tunables
========
CRUSH is sometimes unable to find a mapping. The optimal value (in
terms of computational cost and correctness) is 1.
-Migration impact:
+Migration impact:
* For existing clusters that have lots of existing data, changing
from 0 to 1 will cause a lot of data to move; a value of 4 or 5
For the change to take effect, you will need to restart the monitors, or
apply the option to running monitors with::
- ceph tell mon.\* injectargs --no-mon-warn-on-legacy-crush-tunables
+ ceph tell mon.\* config set mon_warn_on_legacy_crush_tunables false
A few important points
effectively grandfathered in, and will misbehave if they do not
support the new feature.
* If the CRUSH tunables are set to non-legacy values and then later
- changed back to the defult values, ``ceph-osd`` daemons will not be
+ changed back to the default values, ``ceph-osd`` daemons will not be
required to support the feature. However, the OSD peering process
requires examining and understanding old maps. Therefore, you
should not run old versions of the ``ceph-osd`` daemon