1 Manually editing a CRUSH Map
2 ============================
4 .. note:: Manually editing the CRUSH map is an advanced
5 administrator operation. All CRUSH changes that are
6 necessary for the overwhelming majority of installations are
7 possible via the standard ceph CLI and do not require manual
8 CRUSH map edits. If you have identified a use case where
9 manual edits *are* necessary with recent Ceph releases, consider
10 contacting the Ceph developers so that future versions of Ceph
11 can obviate your corner case.
13 To edit an existing CRUSH map:
15 #. `Get the CRUSH map`_.
16 #. `Decompile`_ the CRUSH map.
17 #. Edit at least one of `Devices`_, `Buckets`_ and `Rules`_.
18 #. `Recompile`_ the CRUSH map.
19 #. `Set the CRUSH map`_.
For details on setting the CRUSH map rule for a specific pool, see `Set Pool Values`_.
24 .. _Get the CRUSH map: #getcrushmap
25 .. _Decompile: #decompilecrushmap
26 .. _Devices: #crushmapdevices
27 .. _Buckets: #crushmapbuckets
28 .. _Rules: #crushmaprules
29 .. _Recompile: #compilecrushmap
30 .. _Set the CRUSH map: #setcrushmap
31 .. _Set Pool Values: ../pools#setpoolvalues
.. _getcrushmap:

Get a CRUSH Map
---------------

To get the CRUSH map for your cluster, execute the following::
40 ceph osd getcrushmap -o {compiled-crushmap-filename}
42 Ceph will output (-o) a compiled CRUSH map to the filename you specified. Since
the CRUSH map is in a compiled form, you must decompile it first before you can
edit it.
46 .. _decompilecrushmap:
Decompile a CRUSH Map
---------------------

To decompile a CRUSH map, execute the following::
53 crushtool -d {compiled-crushmap-filename} -o {decompiled-crushmap-filename}
.. _compilecrushmap:

Recompile a CRUSH Map
---------------------

To compile a CRUSH map, execute the following::
62 crushtool -c {decompiled-crushmap-filename} -o {compiled-crushmap-filename}
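Before injecting a recompiled map into the cluster, you may want to sanity-check
it with ``crushtool``'s test mode; for example (the rule id and replica count
below are illustrative)::

   crushtool -i {compiled-crushmap-filename} --test --show-statistics --rule 0 --num-rep 3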
.. _setcrushmap:

Set the CRUSH Map
-----------------

To set the CRUSH map for your cluster, execute the following::
71 ceph osd setcrushmap -i {compiled-crushmap-filename}
73 Ceph will load (-i) a compiled CRUSH map from the filename you specified.
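Putting these commands together, a typical round trip (using hypothetical
filenames) looks like this::

   ceph osd getcrushmap -o crushmap.bin
   crushtool -d crushmap.bin -o crushmap.txt
   # edit crushmap.txt with your editor of choice
   crushtool -c crushmap.txt -o crushmap.new.bin
   ceph osd setcrushmap -i crushmap.new.bin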
Sections
--------

There are six main sections to a CRUSH Map.
80 #. **tunables:** The preamble at the top of the map describes any *tunables*
81 that differ from the historical / legacy CRUSH behavior. These
82 correct for old bugs, optimizations, or other changes that have
83 been made over the years to improve CRUSH's behavior.
85 #. **devices:** Devices are individual OSDs that store data.
87 #. **types**: Bucket ``types`` define the types of buckets used in
88 your CRUSH hierarchy. Buckets consist of a hierarchical aggregation
89 of storage locations (e.g., rows, racks, chassis, hosts, etc.) and
90 their assigned weights.
92 #. **buckets:** Once you define bucket types, you must define each node
in the hierarchy, its type, and which devices or other nodes it contains.
96 #. **rules:** Rules define policy about how data is distributed across
97 devices in the hierarchy.
99 #. **choose_args:** Choose_args are alternative weights associated with
100 the hierarchy that have been adjusted to optimize data placement. A single
101 choose_args map can be used for the entire cluster, or one can be
102 created for each individual pool.
.. _crushmapdevices:

CRUSH Map Devices
-----------------

Devices are individual OSDs that store data. Usually one is defined here for
each OSD daemon in your cluster. Devices are identified by an ``id`` (a
non-negative integer) and a ``name``, normally ``osd.N`` where ``N`` is the
device id.
115 .. _crush-map-device-class:
Devices may also have a *device class* associated with them (e.g.,
``hdd`` or ``ssd``), allowing them to be conveniently targeted by a
CRUSH rule::

   device {num} {osd.name} [class {class}]

For example::

   device 0 osd.0 class ssd
   device 1 osd.1 class hdd
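Device classes are normally detected and set automatically when an OSD is
created, and they can be inspected or changed with the standard CLI instead of
manual map edits; for example::

   ceph osd crush class ls
   ceph osd crush rm-device-class osd.2      # clear the existing class first
   ceph osd crush set-device-class ssd osd.2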
In most cases, each device maps to a single ``ceph-osd`` daemon. This
is normally a single storage device, a pair of devices (for example,
one for data and one for a journal or metadata), or in some cases a
small RAID device.

.. _crushmapbuckets:
143 CRUSH Map Bucket Types
144 ----------------------
146 The second list in the CRUSH map defines 'bucket' types. Buckets facilitate
147 a hierarchy of nodes and leaves. Node (or non-leaf) buckets typically represent
148 physical locations in a hierarchy. Nodes aggregate other nodes or leaves.
149 Leaf buckets represent ``ceph-osd`` daemons and their corresponding storage
152 .. tip:: The term "bucket" used in the context of CRUSH means a node in
153 the hierarchy, i.e. a location or a piece of physical hardware. It
154 is a different concept from the term "bucket" when used in the
155 context of RADOS Gateway APIs.
157 To add a bucket type to the CRUSH map, create a new line under your list of
158 bucket types. Enter ``type`` followed by a unique numeric ID and a bucket name.
159 By convention, there is one leaf bucket and it is ``type 0``; however, you may
160 give it any name you like (e.g., osd, disk, drive, storage, etc.)::
163 type {num} {bucket-name}
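For example, the ``#types`` section of a default CRUSH map looks something like
this (the exact list can vary between Ceph releases)::

   type 0 osd
   type 1 host
   type 2 chassis
   type 3 rack
   type 4 row
   type 5 pdu
   type 6 pod
   type 7 room
   type 8 datacenter
   type 9 zone
   type 10 region
   type 11 root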
185 CRUSH Map Bucket Hierarchy
186 --------------------------
188 The CRUSH algorithm distributes data objects among storage devices according
189 to a per-device weight value, approximating a uniform probability distribution.
190 CRUSH distributes objects and their replicas according to the hierarchical
191 cluster map you define. Your CRUSH map represents the available storage
192 devices and the logical elements that contain them.
194 To map placement groups to OSDs across failure domains, a CRUSH map defines a
195 hierarchical list of bucket types (i.e., under ``#types`` in the generated CRUSH
196 map). The purpose of creating a bucket hierarchy is to segregate the
197 leaf nodes by their failure domains, such as hosts, chassis, racks, power
198 distribution units, pods, rows, rooms, and data centers. With the exception of
199 the leaf nodes representing OSDs, the rest of the hierarchy is arbitrary, and
200 you may define it according to your own needs.
We recommend adapting your CRUSH map to your firm's hardware naming conventions
and using instance names that reflect the physical hardware. Your naming
practice can make it easier to administer the cluster and troubleshoot
problems when an OSD and/or other hardware malfunctions and the administrator
needs access to physical hardware.
208 In the following example, the bucket hierarchy has a leaf bucket named ``osd``,
and two node buckets named ``host`` and ``rack`` respectively::

                                 +-----------+
                                 |  {o}rack  |
                                 |   Bucket  |
                                 +-----+-----+
                                       |
                       +---------------+---------------+
                       |                               |
                 +-----+-----+                   +-----+-----+
                 |  {o}host  |                   |  {o}host  |
                 |   Bucket  |                   |   Bucket  |
                 +-----+-----+                   +-----+-----+
                       |                               |
               +-------+-------+               +-------+-------+
               |               |               |               |
         +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+
         |    osd    |   |    osd    |   |    osd    |   |    osd    |
         |   Bucket  |   |   Bucket  |   |   Bucket  |   |   Bucket  |
         +-----------+   +-----------+   +-----------+   +-----------+
231 .. note:: The higher numbered ``rack`` bucket type aggregates the lower
232 numbered ``host`` bucket type.
234 Since leaf nodes reflect storage devices declared under the ``#devices`` list
235 at the beginning of the CRUSH map, you do not need to declare them as bucket
236 instances. The second lowest bucket type in your hierarchy usually aggregates
237 the devices (i.e., it's usually the computer containing the storage media, and
238 uses whatever term you prefer to describe it, such as "node", "computer",
239 "server," "host", "machine", etc.). In high density environments, it is
240 increasingly common to see multiple hosts/nodes per chassis. You should account
241 for chassis failure too--e.g., the need to pull a chassis if a node fails may
242 result in bringing down numerous hosts/nodes and their OSDs.
244 When declaring a bucket instance, you must specify its type, give it a unique
245 name (string), assign it a unique ID expressed as a negative integer (optional),
246 specify a weight relative to the total capacity/capability of its item(s),
247 specify the bucket algorithm (usually ``straw2``), and the hash (usually ``0``,
248 reflecting hash algorithm ``rjenkins1``). A bucket may have one or more items.
249 The items may consist of node buckets or leaves. Items may have a weight that
250 reflects the relative weight of the item.
252 You may declare a node bucket with the following syntax::
   [bucket-type] [bucket-name] {
           id [a unique negative numeric ID]
           weight [the relative capacity/capability of the item(s)]
           alg [the bucket algorithm: uniform | list | tree | straw | straw2 ]
           hash [the hash type: 0 by default]
           item [item-name] weight [weight]
   }
262 For example, using the diagram above, we would define two host buckets
263 and one rack bucket. The OSDs are declared as items within the host buckets::
   host node1 {
           id -1
           alg straw2
           hash 0
           item osd.0 weight 1.00
           item osd.1 weight 1.00
   }

   host node2 {
           id -2
           alg straw2
           hash 0
           item osd.2 weight 1.00
           item osd.3 weight 1.00
   }

   rack rack1 {
           id -3
           alg straw2
           hash 0
           item node1 weight 2.00
           item node2 weight 2.00
   }
289 .. note:: In the foregoing example, note that the rack bucket does not contain
290 any OSDs. Rather it contains lower level host buckets, and includes the
291 sum total of their weight in the item entry.
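Note that a hierarchy like the one above can usually be built with the standard
CLI rather than by manual edits; for example (using the bucket names from the
example above)::

   ceph osd crush add-bucket rack1 rack
   ceph osd crush add-bucket node1 host
   ceph osd crush move node1 rack=rack1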
293 .. topic:: Bucket Types
295 Ceph supports five bucket types, each representing a tradeoff between
296 performance and reorganization efficiency. If you are unsure of which bucket
297 type to use, we recommend using a ``straw2`` bucket. For a detailed
298 discussion of bucket types, refer to
299 `CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_,
300 and more specifically to **Section 3.4**. The bucket types are:
302 #. **uniform**: Uniform buckets aggregate devices with **exactly** the same
303 weight. For example, when firms commission or decommission hardware, they
304 typically do so with many machines that have exactly the same physical
305 configuration (e.g., bulk purchases). When storage devices have exactly
306 the same weight, you may use the ``uniform`` bucket type, which allows
307 CRUSH to map replicas into uniform buckets in constant time. With
308 non-uniform weights, you should use another bucket algorithm.
310 #. **list**: List buckets aggregate their content as linked lists. Based on
311 the :abbr:`RUSH (Replication Under Scalable Hashing)` :sub:`P` algorithm,
312 a list is a natural and intuitive choice for an **expanding cluster**:
313 either an object is relocated to the newest device with some appropriate
314 probability, or it remains on the older devices as before. The result is
315 optimal data migration when items are added to the bucket. Items removed
316 from the middle or tail of the list, however, can result in a significant
317 amount of unnecessary movement, making list buckets most suitable for
318 circumstances in which they **never (or very rarely) shrink**.
320 #. **tree**: Tree buckets use a binary search tree. They are more efficient
321 than list buckets when a bucket contains a larger set of items. Based on
322 the :abbr:`RUSH (Replication Under Scalable Hashing)` :sub:`R` algorithm,
323 tree buckets reduce the placement time to O(log :sub:`n`), making them
324 suitable for managing much larger sets of devices or nested buckets.
326 #. **straw**: List and Tree buckets use a divide and conquer strategy
327 in a way that either gives certain items precedence (e.g., those
328 at the beginning of a list) or obviates the need to consider entire
329 subtrees of items at all. That improves the performance of the replica
330 placement process, but can also introduce suboptimal reorganization
behavior when the contents of a bucket change due to an addition, removal,
332 or re-weighting of an item. The straw bucket type allows all items to
333 fairly “compete” against each other for replica placement through a
334 process analogous to a draw of straws.
#. **straw2**: Straw2 buckets improve on Straw by correctly avoiding any data
   movement between items when neighbor weights change.

   For example, if the weight of item A changes (including the cases where it is
   added anew or removed completely), there will be data movement only to or
   from item A.
344 Each bucket uses a hash algorithm. Currently, Ceph supports ``rjenkins1``.
345 Enter ``0`` as your hash setting to select ``rjenkins1``.
348 .. _weightingbucketitems:
350 .. topic:: Weighting Bucket Items
352 Ceph expresses bucket weights as doubles, which allows for fine
353 weighting. A weight is the relative difference between device capacities. We
354 recommend using ``1.00`` as the relative weight for a 1TB storage device.
355 In such a scenario, a weight of ``0.5`` would represent approximately 500GB,
356 and a weight of ``3.00`` would represent approximately 3TB. Higher level
buckets have a weight that is the sum total of the leaf items aggregated by
the bucket.
360 A bucket item weight is one dimensional, but you may also calculate your
361 item weights to reflect the performance of the storage drive. For example,
362 if you have many 1TB drives where some have relatively low data transfer
363 rate and the others have a relatively high data transfer rate, you may
364 weight them differently, even though they have the same capacity (e.g.,
365 a weight of 0.80 for the first set of drives with lower total throughput,
366 and 1.20 for the second set of drives with higher total throughput).
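For instance, following the 1TB = ``1.00`` convention described above, a mixed
set of devices might be weighted as follows (hypothetical OSD names)::

   item osd.10 weight 1.00   # 1TB drive
   item osd.11 weight 0.50   # 500GB drive
   item osd.12 weight 3.00   # 3TB drive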
.. _crushmaprules:

CRUSH Map Rules
---------------

CRUSH maps support the notion of 'CRUSH rules', which are the rules that
375 determine data placement for a pool. The default CRUSH map has a rule for each
376 pool. For large clusters, you will likely create many pools where each pool may
377 have its own non-default CRUSH rule.
379 .. note:: In most cases, you will not need to modify the default rule. When
380 you create a new pool, by default the rule will be set to ``0``.
383 CRUSH rules define placement and replication strategies or distribution policies
384 that allow you to specify exactly how CRUSH places object replicas. For
385 example, you might create a rule selecting a pair of targets for 2-way
386 mirroring, another rule for selecting three targets in two different data
387 centers for 3-way mirroring, and yet another rule for erasure coding over six
388 storage devices. For a detailed discussion of CRUSH rules, refer to
389 `CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_,
390 and more specifically to **Section 3.2**.
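In many cases a rule of this kind does not need to be written by hand; the
standard CLI can create it. A hedged sketch (the rule name, root, failure
domain, and device class below are examples)::

   ceph osd crush rule create-replicated fast-ssd default host ssd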
392 A rule takes the following form::
   rule <rulename> {
           id [a unique whole numeric ID]
           type [ replicated | erasure ]
           min_size <min-size>
           max_size <max-size>
           step take <bucket-name> [class <device-class>]
           step [choose|chooseleaf] [firstn|indep] <N> type <bucket-type>
           step emit
   }
``id``

:Description: A unique whole number for identifying the rule.

:Purpose: A component of the rule mask.
``type``

:Description: Describes a rule for either a storage drive (replicated)
              or a RAID.

:Purpose: A component of the rule mask.
:Default: ``replicated``
:Valid Values: Currently only ``replicated`` and ``erasure``
``min_size``

:Description: If a pool makes fewer replicas than this number, CRUSH will
              **NOT** select this rule.

:Purpose: A component of the rule mask.
``max_size``

:Description: If a pool makes more replicas than this number, CRUSH will
              **NOT** select this rule.

:Purpose: A component of the rule mask.
448 ``step take <bucket-name> [class <device-class>]``
450 :Description: Takes a bucket name, and begins iterating down the tree.
451 If the ``device-class`` is specified, it must match
452 a class previously used when defining a device. All
453 devices that do not belong to the class are excluded.
454 :Purpose: A component of the rule.
456 :Example: ``step take data``
459 ``step choose firstn {num} type {bucket-type}``
461 :Description: Selects the number of buckets of the given type from within the
462 current bucket. The number is usually the number of replicas in
463 the pool (i.e., pool size).
465 - If ``{num} == 0``, choose ``pool-num-replicas`` buckets (all available).
466 - If ``{num} > 0 && < pool-num-replicas``, choose that many buckets.
467 - If ``{num} < 0``, it means ``pool-num-replicas - {num}``.
469 :Purpose: A component of the rule.
470 :Prerequisite: Follows ``step take`` or ``step choose``.
471 :Example: ``step choose firstn 1 type row``
474 ``step chooseleaf firstn {num} type {bucket-type}``
476 :Description: Selects a set of buckets of ``{bucket-type}`` and chooses a leaf
477 node (that is, an OSD) from the subtree of each bucket in the set of buckets.
478 The number of buckets in the set is usually the number of replicas in
479 the pool (i.e., pool size).
481 - If ``{num} == 0``, choose ``pool-num-replicas`` buckets (all available).
482 - If ``{num} > 0 && < pool-num-replicas``, choose that many buckets.
483 - If ``{num} < 0``, it means ``pool-num-replicas - {num}``.
485 :Purpose: A component of the rule. Usage removes the need to select a device using two steps.
486 :Prerequisite: Follows ``step take`` or ``step choose``.
487 :Example: ``step chooseleaf firstn 0 type row``
``step emit``

:Description: Outputs the current value and empties the stack. Typically used
              at the end of a rule, but may also be used to pick from different
              trees in the same rule.
496 :Purpose: A component of the rule.
497 :Prerequisite: Follows ``step choose``.
498 :Example: ``step emit``
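Putting these steps together, a complete rule that confines all replicas to a
single row and spreads them across racks within that row might look like the
following sketch (assuming a root bucket named ``default``; the rule name and
id are illustrative)::

   rule replicated_row {
           id 1
           type replicated
           step take default
           step choose firstn 1 type row
           step chooseleaf firstn 0 type rack
           step emit
   }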
500 .. important:: A given CRUSH rule may be assigned to multiple pools, but it
501 is not possible for a single pool to have multiple CRUSH rules.
503 ``firstn`` versus ``indep``
505 :Description: Controls the replacement strategy CRUSH uses when items (OSDs)
506 are marked down in the CRUSH map. If this rule is to be used with
507 replicated pools it should be ``firstn`` and if it's for
508 erasure-coded pools it should be ``indep``.
510 The reason has to do with how they behave when a
511 previously-selected device fails. Let's say you have a PG stored
512 on OSDs 1, 2, 3, 4, 5. Then 3 goes down.
514 With the "firstn" mode, CRUSH simply adjusts its calculation to
515 select 1 and 2, then selects 3 but discovers it's down, so it
516 retries and selects 4 and 5, and then goes on to select a new
517 OSD 6. So the final CRUSH mapping change is
518 1, 2, 3, 4, 5 -> 1, 2, 4, 5, 6.
520 But if you're storing an EC pool, that means you just changed the
521 data mapped to OSDs 4, 5, and 6! So the "indep" mode attempts to
522 not do that. You can instead expect it, when it selects the failed
523 OSD 3, to try again and pick out 6, for a final transformation of:
524 1, 2, 3, 4, 5 -> 1, 2, 6, 4, 5
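For example, an erasure-coded pool spread across hosts would typically use a
rule like the following sketch (assuming a root bucket named ``default``; the
rule name and id are illustrative)::

   rule ecpool {
           id 2
           type erasure
           step take default
           step chooseleaf indep 0 type host
           step emit
   }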
526 .. _crush-reclassify:
528 Migrating from a legacy SSD rule to device classes
529 --------------------------------------------------
531 It used to be necessary to manually edit your CRUSH map and maintain a
532 parallel hierarchy for each specialized device type (e.g., SSD) in order to
533 write rules that apply to those devices. Since the Luminous release,
534 the *device class* feature has enabled this transparently.
536 However, migrating from an existing, manually customized per-device map to
537 the new device class rules in the trivial way will cause all data in the
538 system to be reshuffled.
540 The ``crushtool`` has a few commands that can transform a legacy rule
541 and hierarchy so that you can start using the new class-based rules.
542 There are three types of transformations possible:
544 #. ``--reclassify-root <root-name> <device-class>``
546 This will take everything in the hierarchy beneath root-name and
547 adjust any rules that reference that root via a ``take
548 <root-name>`` to instead ``take <root-name> class <device-class>``.
549 It renumbers the buckets in such a way that the old IDs are instead
550 used for the specified class's "shadow tree" so that no data
551 movement takes place.
553 For example, imagine you have an existing rule like::
   rule replicated_rule {
           id 0
           type replicated
           min_size 1
           max_size 10
           step take default
           step chooseleaf firstn 0 type rack
           step emit
   }

If you reclassify the root `default` as class `hdd`, the rule will
become::

   rule replicated_rule {
           id 0
           type replicated
           min_size 1
           max_size 10
           step take default class hdd
           step chooseleaf firstn 0 type rack
           step emit
   }
574 #. ``--set-subtree-class <bucket-name> <device-class>``
576 This will mark every device in the subtree rooted at *bucket-name*
577 with the specified device class.
579 This is normally used in conjunction with the ``--reclassify-root``
580 option to ensure that all devices in that root are labeled with the
581 correct class. In some situations, however, some of those devices
582 (correctly) have a different class and we do not want to relabel
583 them. In such cases, one can exclude the ``--set-subtree-class``
option. This means that the remapping process will not be perfect, since
the previous rule distributed across devices of multiple classes while the
adjusted rules will map only to devices of the specified *device-class*;
however, that is often an acceptable level of data movement when the number
of outlier devices is small.
590 #. ``--reclassify-bucket <match-pattern> <device-class> <default-parent>``
592 This will allow you to merge a parallel type-specific hierarchy with the normal hierarchy. For example, many users have maps like::
   host node1 {
      id -2           # do not change unnecessarily
      item osd.0 weight 9.096
      item osd.1 weight 9.096
      item osd.2 weight 9.096
      item osd.3 weight 9.096
      item osd.4 weight 9.096
      item osd.5 weight 9.096
      ...
   }

   host node1-ssd {
      id -10          # do not change unnecessarily
      item osd.80 weight 2.000
      ...
   }

   root default {
      id -1           # do not change unnecessarily
      item node1 weight 110.967
      ...
   }

   root ssd {
      id -18          # do not change unnecessarily
      item node1-ssd weight 2.000
      ...
   }
634 This function will reclassify each bucket that matches a
635 pattern. The pattern can look like ``%suffix`` or ``prefix%``.
In the example above, we would use the pattern
``%-ssd``. For each matched bucket, the remaining portion of the
638 name (that matches the ``%`` wildcard) specifies the *base bucket*.
639 All devices in the matched bucket are labeled with the specified
640 device class and then moved to the base bucket. If the base bucket
641 does not exist (e.g., ``node12-ssd`` exists but ``node12`` does
642 not), then it is created and linked underneath the specified
643 *default parent* bucket. In each case, we are careful to preserve
644 the old bucket IDs for the new shadow buckets to prevent data
645 movement. Any rules with ``take`` steps referencing the old
646 buckets are adjusted.
648 #. ``--reclassify-bucket <bucket-name> <device-class> <base-bucket>``
650 The same command can also be used without a wildcard to map a
single bucket. In the previous example, we want the
``ssd`` bucket to be mapped to the ``default`` bucket.
The final command to convert the map comprising the above fragments would be something like::
656 $ ceph osd getcrushmap -o original
657 $ crushtool -i original --reclassify \
658 --set-subtree-class default hdd \
659 --reclassify-root default hdd \
660 --reclassify-bucket %-ssd ssd default \
--reclassify-bucket ssd ssd default \
-o adjusted
In order to ensure that the conversion is correct, there is a ``--compare`` command that will test a large sample of inputs to the CRUSH map and ensure that the same result comes back out. These inputs are controlled by the same options that apply to the ``--test`` command. For the above example::
666 $ crushtool -i original --compare adjusted
667 rule 0 had 0/10240 mismatched mappings (0)
668 rule 1 had 0/10240 mismatched mappings (0)
669 maps appear equivalent
If there were differences, you would see what ratio of inputs are remapped
in the new map.
674 If you are satisfied with the adjusted map, you can apply it to the cluster with something like::
676 ceph osd setcrushmap -i adjusted
678 Tuning CRUSH, the hard way
679 --------------------------
681 If you can ensure that all clients are running recent code, you can
682 adjust the tunables by extracting the CRUSH map, modifying the values,
683 and reinjecting it into the cluster.
685 * Extract the latest CRUSH map::
687 ceph osd getcrushmap -o /tmp/crush
689 * Adjust tunables. These values appear to offer the best behavior
690 for both large and small clusters we tested with. You will need to
691 additionally specify the ``--enable-unsafe-tunables`` argument to
``crushtool`` for this to work. Please use this option with
extreme care::
695 crushtool -i /tmp/crush --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 --set-choose-total-tries 50 -o /tmp/crush.new
697 * Reinject modified map::
699 ceph osd setcrushmap -i /tmp/crush.new
For reference, the legacy values for the CRUSH tunables can be set with::
707 crushtool -i /tmp/crush --set-choose-local-tries 2 --set-choose-local-fallback-tries 5 --set-choose-total-tries 19 --set-chooseleaf-descend-once 0 --set-chooseleaf-vary-r 0 -o /tmp/crush.legacy
709 Again, the special ``--enable-unsafe-tunables`` option is required.
710 Further, as noted above, be careful running old versions of the
711 ``ceph-osd`` daemon after reverting to legacy values as the feature
712 bit is not perfectly enforced.
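Note that in most cases these tunables are more easily managed by selecting a
tunables profile with the CLI instead of editing the map by hand, for example::

   ceph osd crush tunables optimal
   ceph osd crush tunables legacy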
714 .. _CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data: https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf