:orphan:

==========================================
 crushtool -- CRUSH map manipulation tool
==========================================

.. program:: crushtool

Synopsis
========

| **crushtool** ( -d *map* | -c *map.txt* | --build --num_osds *numosds*
  *layer1* *...* | --test ) [ -o *outfile* ]


Description
===========

**crushtool** is a utility that lets you create, compile, decompile
and test CRUSH map files.

CRUSH is a pseudo-random data distribution algorithm that efficiently
maps input values (which, in the context of Ceph, correspond to Placement
Groups) across a heterogeneous, hierarchically structured device map.
The algorithm was originally described in detail in the following paper
(although it has evolved some since then)::

    http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

The tool has four modes of operation.

.. option:: --compile|-c map.txt

   will compile a plaintext map.txt into a binary map file.

.. option:: --decompile|-d map

   will take the compiled map and decompile it into a plaintext source
   file, suitable for editing.

.. option:: --build --num_osds {num-osds} layer1 ...

   will create a map with the given layer structure. See below for a
   detailed explanation.

.. option:: --test

   will perform a dry run of a CRUSH mapping for a range of input
   values ``[--min-x,--max-x]`` (default ``[0,1023]``) which can be
   thought of as simulated Placement Groups. See below for a more
   detailed explanation.

Unlike other Ceph tools, **crushtool** does not accept generic options
such as **--debug-crush** from the command line. They can, however, be
provided via the CEPH_ARGS environment variable. For instance, to
silence all output from the CRUSH subsystem::

    CEPH_ARGS="--debug-crush 0" crushtool ...

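The same mechanism can be used to make the CRUSH subsystem more verbose
instead; the debug level and file names in this sketch are only
illustrative::

    CEPH_ARGS="--debug-crush 5" crushtool -d crushmap -o map.txt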

Running tests with --test
=========================

The test mode will use the input crush map (as specified with **-i
map**) and perform a dry run of CRUSH mapping or random placement
(if **--simulate** is set). On completion, two kinds of reports can be
created:

1) The **--show-...** options output human-readable information
   on stderr.
2) The **--output-csv** option creates CSV files that are
   documented by the **--help-output** option.

Note: Each Placement Group (PG) has an integer ID which can be obtained
from ``ceph pg dump`` (for example PG 2.2f means pool id 2, PG id 0x2f).
The pool and PG IDs are combined by a function to get a value which is
given to CRUSH to map it to OSDs. crushtool does not know about PGs or
pools; it only runs simulations by mapping values in the range
``[--min-x,--max-x]``.

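For example, a bounded dry run over ten simulated values might look like
the following sketch, where ``crushmap`` and the rule number are
placeholders::

    crushtool -i crushmap --test --rule 0 --num-rep 3 \
        --min-x 0 --max-x 9 --show-mappings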

.. option:: --show-statistics

   Displays a summary of the distribution. For instance::

       rule 1 (metadata) num_rep 5 result size == 5:   1024/1024

   shows that rule **1**, which is named **metadata**, successfully
   mapped **1024** values to **result size == 5** devices when trying
   to map them to **num_rep 5** replicas. When it fails to provide the
   required mapping, presumably because the number of **tries** must
   be increased, a breakdown of the failures is displayed. For instance::

       rule 1 (metadata) num_rep 10 result size == 8:   4/1024
       rule 1 (metadata) num_rep 10 result size == 9:   93/1024
       rule 1 (metadata) num_rep 10 result size == 10:  927/1024

   shows that although **num_rep 10** replicas were required, **4**
   out of **1024** values (**4/1024**) were mapped to **result size
   == 8** devices only.

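   As a sketch, a summary like the one above could be produced with a
   command of this form (the map file name and rule number are
   placeholders)::

       crushtool -i crushmap --test --rule 1 --num-rep 5 --show-statistics
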
.. option:: --show-mappings

   Displays the mapping of each value in the range ``[--min-x,--max-x]``.
   For instance::

       CRUSH rule 1 x 24 [11,6]

   shows that value **24** is mapped to devices **[11,6]** by rule
   **1**.

   One of the following is required when using the ``--show-mappings`` option:

   (a) ``--num-rep``
   (b) both ``--min-rep`` and ``--max-rep``

   ``--num-rep`` stands for "number of replicas"; it indicates the number of
   replicas in a pool and is used to specify an exact number of replicas (for
   example ``--num-rep 5``). ``--min-rep`` and ``--max-rep`` are used together
   to specify a range of replicas (for example, ``--min-rep 1 --max-rep 10``).

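   For instance, a sweep over replica counts 1 through 3 might be run
   with a command along these lines (the map name and rule number are
   placeholders)::

       crushtool -i crushmap --test --rule 0 --min-rep 1 --max-rep 3 \
           --show-mappings
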
.. option:: --show-bad-mappings

   Displays which value failed to be mapped to the required number of
   devices. For instance::

       bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

   shows that when rule **1** was required to map **7** devices, it
   could map only six: **[8,10,2,11,6,9]**.

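   Output of this kind could come from an invocation such as the
   following sketch (map name and rule number assumed)::

       crushtool -i crushmap --test --rule 1 --num-rep 7 --show-bad-mappings
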
.. option:: --show-utilization

   Displays the expected and actual utilization for each device, for
   each number of replicas. For instance::

       device 0: stored : 951 expected : 853.333
       device 1: stored : 963 expected : 853.333
       ...

   shows that device **0** stored **951** values and was expected to store **853**.
   Implies **--show-statistics**.

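   A sketch of such a utilization run (the names and replica count are
   placeholders)::

       crushtool -i crushmap --test --rule 0 --num-rep 3 --show-utilization
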
.. option:: --show-utilization-all

   Displays the same as **--show-utilization** but does not suppress
   output when the weight of a device is zero.
   Implies **--show-statistics**.

.. option:: --show-choose-tries

   Displays how many attempts were needed to find a device mapping.
   For instance::

       0:      95224
       1:       3745
       2:       2225
       ..

   shows that **95224** mappings succeeded without retries, **3745**
   mappings succeeded after one retry, etc. There are as many rows as
   the value of the **--set-choose-total-tries** option.

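   The retry histogram above could be collected with a command of this
   form (map name and rule number are placeholders)::

       crushtool -i crushmap --test --rule 0 --num-rep 3 --show-choose-tries
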
.. option:: --output-csv

   Creates CSV files (in the current directory) containing information
   documented by **--help-output**. The files are named after the rule
   used when collecting the statistics. For instance, if the rule
   'metadata' is used, the CSV files will be::

       metadata-absolute_weights.csv
       metadata-device_utilization.csv
       ...

   The first line of each file briefly explains the column layout. For
   instance::

       metadata-absolute_weights.csv
       Device ID, Absolute Weight
       0,1
       ...

.. option:: --output-name NAME

   Prepend **NAME** to the file names generated when **--output-csv**
   is specified. For instance **--output-name FOO** will create
   files::

       FOO-metadata-absolute_weights.csv
       FOO-metadata-device_utilization.csv
       ...

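   A complete invocation could resemble the following sketch (the map
   name, rule number and replica count are assumptions)::

       crushtool -i crushmap --test --rule 1 --num-rep 3 \
           --output-csv --output-name FOO
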
The **--set-...** options can be used to modify the tunables of the
input crush map. The input crush map is modified in
memory. For example::

    $ crushtool -i mymap --test --show-bad-mappings
    bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

could be fixed by increasing the **choose-total-tries** as follows::

    $ crushtool -i mymap --test \
        --show-bad-mappings \
        --set-choose-total-tries 500

Building a map with --build
===========================

The build mode will generate hierarchical maps. The first argument
specifies the number of devices (leaves) in the CRUSH hierarchy. Each
layer describes how the layer (or devices) preceding it should be
grouped.

Each layer consists of::

    bucket ( uniform | list | tree | straw | straw2 ) size

The **bucket** component is the CRUSH type name for the buckets in the
layer (e.g. "rack"). Each bucket name will be built by appending a unique
number to the **bucket** string (e.g. "rack0", "rack1"...).

The second component is the bucket algorithm: **straw** should be used
most of the time.

The third component is the maximum size of the bucket. A size of zero
means a bucket of infinite capacity.


Example
=======

Suppose we have two rows with two racks each and 20 nodes per rack. Suppose
each node contains 4 storage devices for Ceph OSD Daemons. This configuration
allows us to deploy 320 Ceph OSD Daemons. Let's assume a 42U rack with 2U nodes,
leaving an extra 2U for a rack switch.

To reflect our hierarchy of devices, nodes, racks and rows, we would execute
the following::

    $ crushtool -o crushmap --build --num_osds 320 \
          node straw 4 \
          rack straw 20 \
          row straw 2 \
          root straw 0
    # id    weight  type name      reweight
    -87     320     root root
    -85     160     row row0
    -81     80      rack rack0
    -1      4       node node0
    0       1       osd.0   1
    1       1       osd.1   1
    2       1       osd.2   1
    3       1       osd.3   1
    -2      4       node node1
    4       1       osd.4   1
    5       1       osd.5   1
    ...

CRUSH rules are created so the generated crushmap can be
tested. They are the same rules as the ones created by default when
creating a new Ceph cluster. They can be further edited with::

    # decompile
    crushtool -d crushmap -o map.txt

    # edit
    emacs map.txt

    # recompile
    crushtool -c map.txt -o crushmap

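The edited and recompiled map can then be dry-run again before being
injected into a cluster; the rule number and replica count in this
sketch are placeholders::

    crushtool -i crushmap --test --rule 0 --num-rep 2 \
        --show-statistics --show-bad-mappings
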
Reclassify
==========

The *reclassify* function allows users to transition from older maps that
maintain parallel hierarchies for OSDs of different types to a modern CRUSH
map that makes use of the *device class* feature. For more information,
see https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.

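As a rough sketch adapted from that guide, reclassifying a hypothetical
legacy map (the file names, bucket names and device classes below are
assumptions about such a layout) might look like::

    crushtool -i original --reclassify \
        --set-subtree-class default hdd \
        --reclassify-root default hdd \
        --reclassify-bucket %-ssd ssd default \
        --reclassify-bucket ssd ssd default \
        -o adjusted

The adjusted map can then be checked against the original with
``crushtool -i original --compare adjusted``.
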
Example output from --test
==========================

See https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
for sample ``crushtool --test`` commands and the output they produce.

Availability
============

**crushtool** is part of Ceph, a massively scalable, open-source, distributed storage system. Please
refer to the Ceph documentation at https://docs.ceph.com for more
information.


See also
========

:doc:`ceph <ceph>`\(8),
:doc:`osdmaptool <osdmaptool>`\(8)

Authors
=======

John Wilkins, Sage Weil, Loic Dachary