1 Manually editing a CRUSH Map
2 ============================
3
4 .. note:: Manually editing the CRUSH map is considered an advanced
5 administrator operation. All CRUSH changes that are
6 necessary for the overwhelming majority of installations are
7 possible via the standard ceph CLI and do not require manual
8 CRUSH map edits. If you have identified a use case where
9 manual edits *are* necessary, consider contacting the Ceph
10 developers so that future versions of Ceph can make this
11 unnecessary.
12
13 To edit an existing CRUSH map:
14
15 #. `Get the CRUSH map`_.
16 #. `Decompile`_ the CRUSH map.
17 #. Edit at least one of `Devices`_, `Buckets`_ and `Rules`_.
18 #. `Recompile`_ the CRUSH map.
19 #. `Set the CRUSH map`_.
20
21 For details on setting the CRUSH map rule for a specific pool, see `Set
22 Pool Values`_.
23
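As a quick orientation, the full editing round trip looks roughly like the
following sketch. The filenames used here are arbitrary working names chosen
only for illustration::

        # extract the compiled map from the cluster
        ceph osd getcrushmap -o crushmap.bin

        # decompile it into an editable text form
        crushtool -d crushmap.bin -o crushmap.txt

        # ... edit devices, buckets and/or rules in crushmap.txt ...

        # recompile and inject the modified map
        crushtool -c crushmap.txt -o crushmap.new
        ceph osd setcrushmap -i crushmap.new
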
24 .. _Get the CRUSH map: #getcrushmap
25 .. _Decompile: #decompilecrushmap
26 .. _Devices: #crushmapdevices
27 .. _Buckets: #crushmapbuckets
28 .. _Rules: #crushmaprules
29 .. _Recompile: #compilecrushmap
30 .. _Set the CRUSH map: #setcrushmap
31 .. _Set Pool Values: ../pools#setpoolvalues
32
33 .. _getcrushmap:
34
35 Get a CRUSH Map
36 ---------------
37
38 To get the CRUSH map for your cluster, execute the following::
39
40 ceph osd getcrushmap -o {compiled-crushmap-filename}
41
42 Ceph will output (-o) a compiled CRUSH map to the filename you specified. Since
43 the CRUSH map is in a compiled form, you must decompile it first before you can
44 edit it.
45
46 .. _decompilecrushmap:
47
48 Decompile a CRUSH Map
49 ---------------------
50
51 To decompile a CRUSH map, execute the following::
52
53 crushtool -d {compiled-crushmap-filename} -o {decompiled-crushmap-filename}
54
55
56 Sections
57 --------
58
59 There are six main sections to a CRUSH Map.
60
#. **tunables:** The preamble at the top of the map describes any *tunables*
62 for CRUSH behavior that vary from the historical/legacy CRUSH behavior. These
63 correct for old bugs, optimizations, or other changes in behavior that have
64 been made over the years to improve CRUSH's behavior.
65
66 #. **devices:** Devices are individual ``ceph-osd`` daemons that can
67 store data.
68
69 #. **types**: Bucket ``types`` define the types of buckets used in
70 your CRUSH hierarchy. Buckets consist of a hierarchical aggregation
71 of storage locations (e.g., rows, racks, chassis, hosts, etc.) and
72 their assigned weights.
73
74 #. **buckets:** Once you define bucket types, you must define each node
75 in the hierarchy, its type, and which devices or other nodes it
   contains.
77
78 #. **rules:** Rules define policy about how data is distributed across
79 devices in the hierarchy.
80
81 #. **choose_args:** Choose_args are alternative weights associated with
82 the hierarchy that have been adjusted to optimize data placement. A single
83 choose_args map can be used for the entire cluster, or one can be
84 created for each individual pool.
85
86
87 .. _crushmapdevices:
88
89 CRUSH Map Devices
90 -----------------
91
92 Devices are individual ``ceph-osd`` daemons that can store data. You
93 will normally have one defined here for each OSD daemon in your
94 cluster. Devices are identified by an id (a non-negative integer) and
95 a name, normally ``osd.N`` where ``N`` is the device id.
96
97 Devices may also have a *device class* associated with them (e.g.,
``hdd`` or ``ssd``), allowing them to be conveniently targeted by a
CRUSH rule.
100
101 ::
102
103 # devices
104 device {num} {osd.name} [class {class}]
105
106 For example::
107
108 # devices
109 device 0 osd.0 class ssd
110 device 1 osd.1 class hdd
111 device 2 osd.2
112 device 3 osd.3
113
114 In most cases, each device maps to a single ``ceph-osd`` daemon. This
115 is normally a single storage device, a pair of devices (for example,
116 one for data and one for a journal or metadata), or in some cases a
117 small RAID device.
118
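On a running cluster, device classes are more commonly managed with the
CLI than by editing the map by hand. A brief sketch, in which the class
name and OSD id are only examples::

        # list the device classes currently in use
        ceph osd crush class ls

        # clear and then set the class of a particular OSD
        ceph osd crush rm-device-class osd.2
        ceph osd crush set-device-class ssd osd.2
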
119
120
121
122
123 CRUSH Map Bucket Types
124 ----------------------
125
126 The second list in the CRUSH map defines 'bucket' types. Buckets facilitate
127 a hierarchy of nodes and leaves. Node (or non-leaf) buckets typically represent
128 physical locations in a hierarchy. Nodes aggregate other nodes or leaves.
129 Leaf buckets represent ``ceph-osd`` daemons and their corresponding storage
130 media.
131
132 .. tip:: The term "bucket" used in the context of CRUSH means a node in
133 the hierarchy, i.e. a location or a piece of physical hardware. It
134 is a different concept from the term "bucket" when used in the
135 context of RADOS Gateway APIs.
136
137 To add a bucket type to the CRUSH map, create a new line under your list of
138 bucket types. Enter ``type`` followed by a unique numeric ID and a bucket name.
139 By convention, there is one leaf bucket and it is ``type 0``; however, you may
140 give it any name you like (e.g., osd, disk, drive, storage, etc.)::
141
142 #types
143 type {num} {bucket-name}
144
145 For example::
146
147 # types
148 type 0 osd
149 type 1 host
150 type 2 chassis
151 type 3 rack
152 type 4 row
153 type 5 pdu
154 type 6 pod
155 type 7 room
156 type 8 datacenter
157 type 9 region
158 type 10 root
159
160
161
162 .. _crushmapbuckets:
163
164 CRUSH Map Bucket Hierarchy
165 --------------------------
166
167 The CRUSH algorithm distributes data objects among storage devices according
168 to a per-device weight value, approximating a uniform probability distribution.
169 CRUSH distributes objects and their replicas according to the hierarchical
170 cluster map you define. Your CRUSH map represents the available storage
171 devices and the logical elements that contain them.
172
173 To map placement groups to OSDs across failure domains, a CRUSH map defines a
174 hierarchical list of bucket types (i.e., under ``#types`` in the generated CRUSH
175 map). The purpose of creating a bucket hierarchy is to segregate the
176 leaf nodes by their failure domains, such as hosts, chassis, racks, power
177 distribution units, pods, rows, rooms, and data centers. With the exception of
178 the leaf nodes representing OSDs, the rest of the hierarchy is arbitrary, and
179 you may define it according to your own needs.
180
We recommend adapting your CRUSH map to your firm's hardware naming conventions
and using instance names that reflect the physical hardware. Good naming
practice can make it easier to administer the cluster and troubleshoot
problems when an OSD and/or other hardware malfunctions and the administrator
needs access to the physical hardware.
186
187 In the following example, the bucket hierarchy has a leaf bucket named ``osd``,
188 and two node buckets named ``host`` and ``rack`` respectively.
189
190 .. ditaa::
191 +-----------+
192 | {o}rack |
193 | Bucket |
194 +-----+-----+
195 |
196 +---------------+---------------+
197 | |
198 +-----+-----+ +-----+-----+
199 | {o}host | | {o}host |
200 | Bucket | | Bucket |
201 +-----+-----+ +-----+-----+
202 | |
203 +-------+-------+ +-------+-------+
204 | | | |
205 +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
206 | osd | | osd | | osd | | osd |
207 | Bucket | | Bucket | | Bucket | | Bucket |
208 +-----------+ +-----------+ +-----------+ +-----------+
209
210 .. note:: The higher numbered ``rack`` bucket type aggregates the lower
211 numbered ``host`` bucket type.
212
213 Since leaf nodes reflect storage devices declared under the ``#devices`` list
214 at the beginning of the CRUSH map, you do not need to declare them as bucket
215 instances. The second lowest bucket type in your hierarchy usually aggregates
216 the devices (i.e., it's usually the computer containing the storage media, and
uses whatever term you prefer to describe it, such as "node", "computer",
"server", "host", "machine", etc.). In high-density environments, it is
219 increasingly common to see multiple hosts/nodes per chassis. You should account
220 for chassis failure too--e.g., the need to pull a chassis if a node fails may
221 result in bringing down numerous hosts/nodes and their OSDs.
222
223 When declaring a bucket instance, you must specify its type, give it a unique
224 name (string), assign it a unique ID expressed as a negative integer (optional),
225 specify a weight relative to the total capacity/capability of its item(s),
226 specify the bucket algorithm (usually ``straw``), and the hash (usually ``0``,
227 reflecting hash algorithm ``rjenkins1``). A bucket may have one or more items.
228 The items may consist of node buckets or leaves. Items may have a weight that
229 reflects the relative weight of the item.
230
231 You may declare a node bucket with the following syntax::
232
        [bucket-type] [bucket-name] {
                id [a unique negative numeric ID]
                weight [the relative capacity/capability of the item(s)]
                alg [the bucket algorithm: uniform | list | tree | straw ]
                hash [the hash type: 0 by default]
                item [item-name] weight [weight]
        }
240
241 For example, using the diagram above, we would define two host buckets
242 and one rack bucket. The OSDs are declared as items within the host buckets::
243
244 host node1 {
245 id -1
246 alg straw
247 hash 0
248 item osd.0 weight 1.00
249 item osd.1 weight 1.00
250 }
251
252 host node2 {
253 id -2
254 alg straw
255 hash 0
256 item osd.2 weight 1.00
257 item osd.3 weight 1.00
258 }
259
260 rack rack1 {
261 id -3
262 alg straw
263 hash 0
264 item node1 weight 2.00
265 item node2 weight 2.00
266 }
267
.. note:: In the foregoing example, the rack bucket does not contain
   any OSDs. Rather, it contains lower-level host buckets and includes the
   sum total of their weights in its item entries.
271
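Once a map containing such bucket definitions has been recompiled and
injected, a convenient sanity check is to print the resulting hierarchy
as a tree; for example (output omitted)::

        # view the hierarchy known to the cluster
        ceph osd crush tree

        # or inspect a compiled map file directly
        crushtool -i {compiled-crushmap-filename} --tree
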
272 .. topic:: Bucket Types
273
274 Ceph supports four bucket types, each representing a tradeoff between
275 performance and reorganization efficiency. If you are unsure of which bucket
276 type to use, we recommend using a ``straw`` bucket. For a detailed
277 discussion of bucket types, refer to
278 `CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_,
279 and more specifically to **Section 3.4**. The bucket types are:
280
281 #. **Uniform:** Uniform buckets aggregate devices with **exactly** the same
282 weight. For example, when firms commission or decommission hardware, they
283 typically do so with many machines that have exactly the same physical
284 configuration (e.g., bulk purchases). When storage devices have exactly
285 the same weight, you may use the ``uniform`` bucket type, which allows
286 CRUSH to map replicas into uniform buckets in constant time. With
287 non-uniform weights, you should use another bucket algorithm.
288
289 #. **List**: List buckets aggregate their content as linked lists. Based on
290 the :abbr:`RUSH (Replication Under Scalable Hashing)` :sub:`P` algorithm,
291 a list is a natural and intuitive choice for an **expanding cluster**:
292 either an object is relocated to the newest device with some appropriate
293 probability, or it remains on the older devices as before. The result is
294 optimal data migration when items are added to the bucket. Items removed
295 from the middle or tail of the list, however, can result in a significant
296 amount of unnecessary movement, making list buckets most suitable for
297 circumstances in which they **never (or very rarely) shrink**.
298
299 #. **Tree**: Tree buckets use a binary search tree. They are more efficient
300 than list buckets when a bucket contains a larger set of items. Based on
301 the :abbr:`RUSH (Replication Under Scalable Hashing)` :sub:`R` algorithm,
302 tree buckets reduce the placement time to O(log :sub:`n`), making them
303 suitable for managing much larger sets of devices or nested buckets.
304
305 #. **Straw:** List and Tree buckets use a divide and conquer strategy
306 in a way that either gives certain items precedence (e.g., those
307 at the beginning of a list) or obviates the need to consider entire
308 subtrees of items at all. That improves the performance of the replica
309 placement process, but can also introduce suboptimal reorganization
   behavior when the contents of a bucket change due to an addition, removal,
311 or re-weighting of an item. The straw bucket type allows all items to
312 fairly “compete” against each other for replica placement through a
313 process analogous to a draw of straws.
314
315 .. topic:: Hash
316
317 Each bucket uses a hash algorithm. Currently, Ceph supports ``rjenkins1``.
318 Enter ``0`` as your hash setting to select ``rjenkins1``.
319
320
321 .. _weightingbucketitems:
322
323 .. topic:: Weighting Bucket Items
324
325 Ceph expresses bucket weights as doubles, which allows for fine
   weighting. A weight is a relative measure of device capacity. We
327 recommend using ``1.00`` as the relative weight for a 1TB storage device.
328 In such a scenario, a weight of ``0.5`` would represent approximately 500GB,
329 and a weight of ``3.00`` would represent approximately 3TB. Higher level
330 buckets have a weight that is the sum total of the leaf items aggregated by
331 the bucket.
332
333 A bucket item weight is one dimensional, but you may also calculate your
334 item weights to reflect the performance of the storage drive. For example,
335 if you have many 1TB drives where some have relatively low data transfer
336 rate and the others have a relatively high data transfer rate, you may
337 weight them differently, even though they have the same capacity (e.g.,
338 a weight of 0.80 for the first set of drives with lower total throughput,
339 and 1.20 for the second set of drives with higher total throughput).
340
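To make the weight arithmetic concrete, here is a small illustration
using the 1TB = ``1.00`` convention described above; the OSD names are
hypothetical::

        # four 4TB drives in one host, weighted at 1.00 per TB
        item osd.10 weight 4.00
        item osd.11 weight 4.00
        item osd.12 weight 4.00
        item osd.13 weight 4.00
        # the host bucket's weight is the sum: 4 x 4.00 = 16.00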
341
342 .. _crushmaprules:
343
344 CRUSH Map Rules
345 ---------------
346
CRUSH maps support the notion of 'CRUSH rules', which determine data
placement for a pool. The default CRUSH map has a rule for each pool. For
large clusters, you will likely create many pools, each of which may have
its own non-default CRUSH rule.
351
352 .. note:: In most cases, you will not need to modify the default rule. When
353 you create a new pool, by default the rule will be set to ``0``.
354
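If you do create additional rules, they are assigned to pools with the
standard CLI rather than by editing the pool definition in the map; a
brief sketch, in which the pool and rule names are placeholders::

        # list the rules known to the cluster
        ceph osd crush rule ls

        # point an existing pool at a different rule
        ceph osd pool set {pool-name} crush_rule {rule-name}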
355
356 CRUSH rules define placement and replication strategies or distribution policies
357 that allow you to specify exactly how CRUSH places object replicas. For
358 example, you might create a rule selecting a pair of targets for 2-way
359 mirroring, another rule for selecting three targets in two different data
360 centers for 3-way mirroring, and yet another rule for erasure coding over six
361 storage devices. For a detailed discussion of CRUSH rules, refer to
362 `CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_,
363 and more specifically to **Section 3.2**.
364
365 A rule takes the following form::
366
367 rule <rulename> {
368
369 ruleset <ruleset>
370 type [ replicated | erasure ]
371 min_size <min-size>
372 max_size <max-size>
373 step take <bucket-name> [class <device-class>]
374 step [choose|chooseleaf] [firstn|indep] <N> <bucket-type>
375 step emit
376 }
377
378
379 ``ruleset``
380
381 :Description: A unique whole number for identifying the rule. The name ``ruleset``
382 is a carry-over from the past, when it was possible to have multiple
383 CRUSH rules per pool.
384
385 :Purpose: A component of the rule mask.
386 :Type: Integer
387 :Required: Yes
388 :Default: 0
389
390
391 ``type``
392
:Description: Describes whether the rule is for a replicated pool or an
              erasure-coded pool.
395
396 :Purpose: A component of the rule mask.
397 :Type: String
398 :Required: Yes
399 :Default: ``replicated``
400 :Valid Values: Currently only ``replicated`` and ``erasure``
401
402 ``min_size``
403
404 :Description: If a pool makes fewer replicas than this number, CRUSH will
405 **NOT** select this rule.
406
407 :Type: Integer
408 :Purpose: A component of the rule mask.
409 :Required: Yes
410 :Default: ``1``
411
412 ``max_size``
413
414 :Description: If a pool makes more replicas than this number, CRUSH will
415 **NOT** select this rule.
416
417 :Type: Integer
418 :Purpose: A component of the rule mask.
419 :Required: Yes
420 :Default: 10
421
422
423 ``step take <bucket-name> [class <device-class>]``
424
425 :Description: Takes a bucket name, and begins iterating down the tree.
426 If the ``device-class`` is specified, it must match
427 a class previously used when defining a device. All
428 devices that do not belong to the class are excluded.
429 :Purpose: A component of the rule.
430 :Required: Yes
431 :Example: ``step take data``
432
433
434 ``step choose firstn {num} type {bucket-type}``
435
436 :Description: Selects the number of buckets of the given type. The number is
437 usually the number of replicas in the pool (i.e., pool size).
438
439 - If ``{num} == 0``, choose ``pool-num-replicas`` buckets (all available).
440 - If ``{num} > 0 && < pool-num-replicas``, choose that many buckets.
              - If ``{num} < 0``, it means ``pool-num-replicas - |{num}|``.
442
443 :Purpose: A component of the rule.
444 :Prerequisite: Follows ``step take`` or ``step choose``.
445 :Example: ``step choose firstn 1 type row``
446
447
448 ``step chooseleaf firstn {num} type {bucket-type}``
449
450 :Description: Selects a set of buckets of ``{bucket-type}`` and chooses a leaf
451 node from the subtree of each bucket in the set of buckets. The
452 number of buckets in the set is usually the number of replicas in
453 the pool (i.e., pool size).
454
455 - If ``{num} == 0``, choose ``pool-num-replicas`` buckets (all available).
456 - If ``{num} > 0 && < pool-num-replicas``, choose that many buckets.
              - If ``{num} < 0``, it means ``pool-num-replicas - |{num}|``.
458
459 :Purpose: A component of the rule. Usage removes the need to select a device using two steps.
460 :Prerequisite: Follows ``step take`` or ``step choose``.
461 :Example: ``step chooseleaf firstn 0 type row``
462
463
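As a worked illustration of the ``{num}`` arithmetic, assume a
replicated pool of size 3 (``pool-num-replicas = 3``)::

        step chooseleaf firstn 0 type host    # 3 - 0 -> choose 3 hosts
        step chooseleaf firstn 2 type host    # 2     -> choose 2 hosts
        step chooseleaf firstn -1 type host   # 3 - 1 -> choose 2 hosts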
464
465 ``step emit``
466
467 :Description: Outputs the current value and empties the stack. Typically used
468 at the end of a rule, but may also be used to pick from different
469 trees in the same rule.
470
471 :Purpose: A component of the rule.
472 :Prerequisite: Follows ``step choose``.
473 :Example: ``step emit``
474
475 .. important:: A given CRUSH rule may be assigned to multiple pools, but it
476 is not possible for a single pool to have multiple CRUSH rules.
477
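Putting the pieces together, a rule that restricts placement to ``ssd``
devices and puts each replica on a different host might look like the
following sketch. The rule name, ruleset number, and the root ``default``
are illustrative and assume the hierarchy and device classes shown
earlier::

        rule ssd_replicated {
                ruleset 1
                type replicated
                min_size 1
                max_size 10
                step take default class ssd
                step chooseleaf firstn 0 type host
                step emit
        }

After recompiling the map, the mappings produced by a rule can be
spot-checked with ``crushtool`` before injecting the map, for example::

        crushtool -c {decompiled-crushmap-filename} -o {compiled-crushmap-filename}
        crushtool -i {compiled-crushmap-filename} --test --rule 1 --num-rep 3 --show-mappings
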
478 .. _crush-reclassify:
479
480 Migrating from a legacy SSD rule to device classes
481 --------------------------------------------------
482
483 It used to be necessary to manually edit your CRUSH map and maintain a
484 parallel hierarchy for each specialized device type (e.g., SSD) in order to
485 write rules that apply to those devices. Since the Luminous release,
the *device class* feature has provided this capability transparently.
487
488 However, migrating from an existing, manually customized per-device map to
489 the new device class rules in the trivial way will cause all data in the
490 system to be reshuffled.
491
492 The ``crushtool`` has a few commands that can transform a legacy rule
493 and hierarchy so that you can start using the new class-based rules.
494 There are three types of transformations possible:
495
496 #. ``--reclassify-root <root-name> <device-class>``
497
498 This will take everything in the hierarchy beneath root-name and
499 adjust any rules that reference that root via a ``take
500 <root-name>`` to instead ``take <root-name> class <device-class>``.
501 It renumbers the buckets in such a way that the old IDs are instead
502 used for the specified class's "shadow tree" so that no data
503 movement takes place.
504
505 For example, imagine you have an existing rule like::
506
507 rule replicated_ruleset {
508 id 0
509 type replicated
510 min_size 1
511 max_size 10
512 step take default
513 step chooseleaf firstn 0 type rack
514 step emit
515 }
516
517 If you reclassify the root `default` as class `hdd`, the rule will
518 become::
519
520 rule replicated_ruleset {
521 id 0
522 type replicated
523 min_size 1
524 max_size 10
525 step take default class hdd
526 step chooseleaf firstn 0 type rack
527 step emit
528 }
529
530 #. ``--set-subtree-class <bucket-name> <device-class>``
531
532 This will mark every device in the subtree rooted at *bucket-name*
533 with the specified device class.
534
   This is normally used in conjunction with the ``--reclassify-root``
   option to ensure that all devices in that root are labeled with the
   correct class. In some situations, however, some of those devices
   (correctly) have a different class and we do not want to relabel
   them. In such cases, the ``--set-subtree-class`` option can be
   omitted. The remapping will then not be perfect, since the previous
   rule distributed data across devices of multiple classes while the
   adjusted rule maps only to devices of the specified *device-class*;
   however, that is often an acceptable level of data movement when the
   number of outlier devices is small.
545
546 #. ``--reclassify-bucket <match-pattern> <device-class> <default-parent>``
547
   This will allow you to merge a parallel type-specific hierarchy with the normal hierarchy. For example, many users have maps like::
549
550 host node1 {
551 id -2 # do not change unnecessarily
552 # weight 109.152
553 alg straw
554 hash 0 # rjenkins1
555 item osd.0 weight 9.096
556 item osd.1 weight 9.096
557 item osd.2 weight 9.096
558 item osd.3 weight 9.096
559 item osd.4 weight 9.096
560 item osd.5 weight 9.096
561 ...
562 }
563
564 host node1-ssd {
565 id -10 # do not change unnecessarily
566 # weight 2.000
567 alg straw
568 hash 0 # rjenkins1
569 item osd.80 weight 2.000
570 ...
571 }
572
573 root default {
574 id -1 # do not change unnecessarily
575 alg straw
576 hash 0 # rjenkins1
577 item node1 weight 110.967
578 ...
579 }
580
581 root ssd {
582 id -18 # do not change unnecessarily
583 # weight 16.000
584 alg straw
585 hash 0 # rjenkins1
586 item node1-ssd weight 2.000
587 ...
588 }
589
590 This function will reclassify each bucket that matches a
591 pattern. The pattern can look like ``%suffix`` or ``prefix%``.
592 For example, in the above example, we would use the pattern
593 ``%-ssd``. For each matched bucket, the remaining portion of the
594 name (that matches the ``%`` wildcard) specifies the *base bucket*.
595 All devices in the matched bucket are labeled with the specified
596 device class and then moved to the base bucket. If the base bucket
597 does not exist (e.g., ``node12-ssd`` exists but ``node12`` does
598 not), then it is created and linked underneath the specified
599 *default parent* bucket. In each case, we are careful to preserve
600 the old bucket IDs for the new shadow buckets to prevent data
601 movement. Any rules with ``take`` steps referencing the old
602 buckets are adjusted.
603
604 #. ``--reclassify-bucket <bucket-name> <device-class> <base-bucket>``
605
606 The same command can also be used without a wildcard to map a
   single bucket. In the example above, for instance, we want the
   ``ssd`` bucket to be mapped to the ``default`` bucket.
609
The final command to convert the map made up of the above fragments would be something like::
611
612 $ ceph osd getcrushmap -o original
613 $ crushtool -i original --reclassify \
614 --set-subtree-class default hdd \
615 --reclassify-root default hdd \
616 --reclassify-bucket %-ssd ssd default \
617 --reclassify-bucket ssd ssd default \
618 -o adjusted
619
In order to ensure that the conversion is correct, there is a ``--compare`` command that will test a large sample of inputs against the CRUSH map and verify that the same results come back out. These inputs are controlled by the same options that apply to the ``--test`` command. For the above example::
621
622 $ crushtool -i original --compare adjusted
623 rule 0 had 0/10240 mismatched mappings (0)
624 rule 1 had 0/10240 mismatched mappings (0)
625 maps appear equivalent
626
If there were differences, you would see what ratio of inputs would be
remapped in the parentheses.
629
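The sample of inputs used for the comparison can be adjusted with the
usual ``--test`` input options; for example (values chosen arbitrarily)::

        crushtool -i original --compare adjusted --min-x 0 --max-x 49999 --num-rep 3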
630 If you are satisfied with the adjusted map, you can apply it to the cluster with something like::
631
632 ceph osd setcrushmap -i adjusted
633
634 Tuning CRUSH, the hard way
635 --------------------------
636
637 If you can ensure that all clients are running recent code, you can
638 adjust the tunables by extracting the CRUSH map, modifying the values,
639 and reinjecting it into the cluster.
640
641 * Extract the latest CRUSH map::
642
643 ceph osd getcrushmap -o /tmp/crush
644
645 * Adjust tunables. These values appear to offer the best behavior
646 for both large and small clusters we tested with. You will need to
647 additionally specify the ``--enable-unsafe-tunables`` argument to
648 ``crushtool`` for this to work. Please use this option with
  extreme care::
650
651 crushtool -i /tmp/crush --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 --set-choose-total-tries 50 -o /tmp/crush.new
652
653 * Reinject modified map::
654
655 ceph osd setcrushmap -i /tmp/crush.new
656
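* Optionally, confirm the tunables now in effect by querying the
  cluster::

        ceph osd crush show-tunables
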
657 Legacy values
658 -------------
659
660 For reference, the legacy values for the CRUSH tunables can be set
661 with::
662
663 crushtool -i /tmp/crush --set-choose-local-tries 2 --set-choose-local-fallback-tries 5 --set-choose-total-tries 19 --set-chooseleaf-descend-once 0 --set-chooseleaf-vary-r 0 -o /tmp/crush.legacy
664
665 Again, the special ``--enable-unsafe-tunables`` option is required.
666 Further, as noted above, be careful running old versions of the
667 ``ceph-osd`` daemon after reverting to legacy values as the feature
668 bit is not perfectly enforced.