:orphan:

==========================================
 crushtool -- CRUSH map manipulation tool
==========================================

.. program:: crushtool

Synopsis
========

| **crushtool** ( -d *map* | -c *map.txt* | --build --num_osds *numosds*
  *layer1* *...* | --test ) [ -o *outfile* ]


Description
===========

**crushtool** is a utility that lets you create, compile, decompile
and test CRUSH map files.

CRUSH is a pseudo-random data distribution algorithm that efficiently
maps input values (which, in the context of Ceph, correspond to Placement
Groups) across a heterogeneous, hierarchically structured device map.
The algorithm was originally described in detail in the following paper
(although it has evolved somewhat since then)::

    http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

The tool has four modes of operation.

.. option:: --compile|-c map.txt

   will compile a plaintext map.txt into a binary map file.

.. option:: --decompile|-d map

   will take the compiled map and decompile it into a plaintext source
   file, suitable for editing.

.. option:: --build --num_osds {num-osds} layer1 ...

   will create a map with the given layer structure. See below for a
   detailed explanation.

.. option:: --test

   will perform a dry run of a CRUSH mapping for a range of input
   values ``[--min-x,--max-x]`` (default ``[0,1023]``), which can be
   thought of as simulated Placement Groups. See below for a more
   detailed explanation.

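For example, compiling a source map and then decompiling the result is
a quick way to check that a hand-edited source file is valid (the file
names here are illustrative)::

    crushtool -c map.txt -o crushmap
    crushtool -d crushmap -o map-roundtrip.txt
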
Unlike other Ceph tools, **crushtool** does not accept generic options
such as **--debug-crush** from the command line. They can, however, be
provided via the CEPH_ARGS environment variable. For instance, to
silence all output from the CRUSH subsystem::

    CEPH_ARGS="--debug-crush 0" crushtool ...


Running tests with --test
=========================

The test mode will use the input crush map (as specified with **-i
map**) and perform a dry run of CRUSH mapping or random placement
(if **--simulate** is set). On completion, two kinds of reports can be
created.

1) The **--show-...** options output human-readable information
   on stderr.
2) The **--output-csv** option creates CSV files that are
   documented by the **--help-output** option.

Note: Each Placement Group (PG) has an integer ID which can be obtained
from ``ceph pg dump`` (for example PG 2.2f means pool id 2, PG id 0x2f = 47).
The pool and PG IDs are combined by a function to get a value which is
given to CRUSH to map it to OSDs. crushtool does not know about PGs or
pools; it only runs simulations by mapping values in the range
``[--min-x,--max-x]``.
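
For example, the following simulates mapping 100 values with three
replicas each and prints every mapping (the map file name and rule
number are assumptions)::

    crushtool -i crushmap --test --min-x 0 --max-x 99 \
        --rule 0 --num-rep 3 --show-mappings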


.. option:: --show-statistics

   Displays a summary of the distribution. For instance::

      rule 1 (metadata) num_rep 5 result size == 5: 1024/1024

   shows that rule **1**, which is named **metadata**, successfully
   mapped **1024** values to **result size == 5** devices when trying
   to map them to **num_rep 5** replicas. When it fails to provide the
   required mapping, presumably because the number of **tries** must
   be increased, a breakdown of the failures is displayed. For instance::

      rule 1 (metadata) num_rep 10 result size == 8: 4/1024
      rule 1 (metadata) num_rep 10 result size == 9: 93/1024
      rule 1 (metadata) num_rep 10 result size == 10: 927/1024

   shows that although **num_rep 10** replicas were required, **4**
   out of **1024** values (**4/1024**) were mapped to only **result
   size == 8** devices.

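   A minimal invocation that produces such a summary might look like
   this (the map file name and rule number are assumptions)::

      crushtool -i crushmap --test --rule 1 --num-rep 5 --show-statistics
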
.. option:: --show-mappings

   Displays the mapping of each value in the range ``[--min-x,--max-x]``.
   For instance::

      CRUSH rule 1 x 24 [11,6]

   shows that value **24** is mapped to devices **[11,6]** by rule
   **1**.

.. option:: --show-bad-mappings

   Displays which values failed to be mapped to the required number of
   devices. For instance::

      bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

   shows that when rule **1** was required to map **7** devices, it
   could map only six: **[8,10,2,11,6,9]**.

.. option:: --show-utilization

   Displays the expected and actual utilization for each device, for
   each number of replicas. For instance::

      device 0: stored : 951 expected : 853.333
      device 1: stored : 963 expected : 853.333
      ...

   shows that device **0** stored **951** values and was expected to
   store approximately **853**. Implies **--show-statistics**.
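
   For example (the map file name and rule number are assumptions)::

      crushtool -i crushmap --test --rule 1 --num-rep 3 --show-utilization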

.. option:: --show-utilization-all

   Displays the same as **--show-utilization** but does not suppress
   output when the weight of a device is zero.
   Implies **--show-statistics**.

.. option:: --show-choose-tries

   Displays how many attempts were needed to find a device mapping.
   For instance::

      0:    95224
      1:     3745
      2:     2225
      ..

   shows that **95224** mappings succeeded without retries, **3745**
   mappings succeeded with one retry, etc. There are as many rows as
   the value of the **--set-choose-total-tries** option.
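
   For example (the map file name and rule number are assumptions)::

      crushtool -i crushmap --test --rule 0 --num-rep 3 --show-choose-tries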

.. option:: --output-csv

   Creates CSV files (in the current directory) containing information
   documented by **--help-output**. The files are named after the rule
   used when collecting the statistics. For instance, if the rule
   'metadata' is used, the CSV files will be::

      metadata-absolute_weights.csv
      metadata-device_utilization.csv
      ...

   The first line of each file briefly explains the column layout. For
   instance::

      metadata-absolute_weights.csv
      Device ID, Absolute Weight
      0,1
      ...

.. option:: --output-name NAME

   Prepend **NAME** to the file names generated when **--output-csv**
   is specified. For instance **--output-name FOO** will create
   files::

      FOO-metadata-absolute_weights.csv
      FOO-metadata-device_utilization.csv
      ...
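
   For example, one plausible invocation combining these options (the
   map file name and rule number are assumptions)::

      crushtool -i crushmap --test --rule 1 --num-rep 3 \
          --output-csv --output-name FOO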

The **--set-...** options can be used to modify the tunables of the
input crush map. The input crush map is modified in
memory. For example::

    $ crushtool -i mymap --test --show-bad-mappings
    bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

could be fixed by increasing the **choose-total-tries** as follows::

    $ crushtool -i mymap --test \
        --show-bad-mappings \
        --set-choose-total-tries 500

Building a map with --build
===========================

The build mode will generate hierarchical maps. The first argument
specifies the number of devices (leaves) in the CRUSH hierarchy. Each
layer describes how the layer (or devices) preceding it should be
grouped.

Each layer consists of::

    bucket ( uniform | list | tree | straw | straw2 ) size

The **bucket** is the type of the buckets in the layer
(e.g. "rack"). Each bucket name will be built by appending a unique
number to the **bucket** string (e.g. "rack0", "rack1"...).

The second component is the bucket algorithm: **straw** should be used
most of the time.

The third component is the maximum size of the bucket. A size of zero
means a bucket of infinite capacity.
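
For example, a small two-layer map that groups 16 OSDs into four "host"
buckets under a single root (the names here are illustrative)::

    crushtool -o onelayer.map --build --num_osds 16 \
        host uniform 4 root straw2 0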


Example
=======

Suppose we have two rows with two racks each and 20 nodes per rack. Suppose
each node contains 4 storage devices for Ceph OSD Daemons. This configuration
allows us to deploy 320 Ceph OSD Daemons. Let's assume a 42U rack with 2U nodes,
leaving an extra 2U for a rack switch.

To reflect our hierarchy of devices, nodes, racks and rows, we would execute
the following::

    $ crushtool -o crushmap --build --num_osds 320 \
        node straw 4 \
        rack straw 20 \
        row straw 2 \
        root straw 0
    # id    weight  type name    reweight
    -87     320     root root
    -85     160     row row0
    -81     80      rack rack0
    -1      4       node node0
    0       1       osd.0        1
    1       1       osd.1        1
    2       1       osd.2        1
    3       1       osd.3        1
    -2      4       node node1
    4       1       osd.4        1
    5       1       osd.5        1
    ...

CRUSH rules are created so the generated crushmap can be
tested. They are the same rules as the ones created by default when
creating a new Ceph cluster. They can be further edited with::

    # decompile
    crushtool -d crushmap -o map.txt

    # edit
    emacs map.txt

    # recompile
    crushtool -c map.txt -o crushmap
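
The result of such an edit can be sanity-checked by decompiling again
and comparing against the edited source (file names as above)::

    crushtool -d crushmap -o map2.txt
    diff map.txt map2.txt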

Reclassify
==========

The *reclassify* function allows users to transition from older maps that
maintain parallel hierarchies for OSDs of different types to a modern CRUSH
map that makes use of the *device class* feature. For more information,
see https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.
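
A sketch of the kind of invocation described on that page (the bucket
and class names are assumptions; see the linked procedure for the full
set of options)::

    crushtool -i original.map --reclassify \
        --reclassify-root default hdd \
        --reclassify-bucket %-ssd ssd default \
        -o adjusted.map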

Example output from --test
==========================

See https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
for sample ``crushtool --test`` commands and the output they produce.

Availability
============

**crushtool** is part of Ceph, a massively scalable, open-source, distributed
storage system. Please refer to the Ceph documentation at
https://docs.ceph.com for more information.


See also
========

:doc:`ceph <ceph>`\(8),
:doc:`osdmaptool <osdmaptool>`\(8)

Authors
=======

John Wilkins, Sage Weil, Loic Dachary