==========================================
 crushtool -- CRUSH map manipulation tool
==========================================
Synopsis
========

| **crushtool** ( -d *map* | -c *map.txt* | --build --num_osds *numosds*
  *layer1* *...* | --test ) [ -o *outfile* ]
Description
===========

**crushtool** is a utility that lets you create, compile, decompile
and test CRUSH map files.
CRUSH is a pseudo-random data distribution algorithm that efficiently
maps input values (which, in the context of Ceph, correspond to Placement
Groups) across a heterogeneous, hierarchically structured device map.
The algorithm was originally described in detail in the following paper
(although it has evolved somewhat since then)::

   http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf
The tool has four modes of operation.
.. option:: --compile|-c map.txt

   will compile a plaintext map.txt into a binary map file.
.. option:: --decompile|-d map

   will take the compiled map and decompile it into a plaintext source
   file, suitable for editing.
.. option:: --build --num_osds {num-osds} layer1 ...

   will create a map with the given layer structure. See below for a
   detailed explanation.

.. option:: --test

   will perform a dry run of a CRUSH mapping for a range of input
   values ``[--min-x,--max-x]`` (default ``[0,1023]``) which can be
   thought of as simulated Placement Groups. See below for a more
   detailed explanation.
Unlike other Ceph tools, **crushtool** does not accept generic options
such as **--debug-crush** from the command line. They can, however, be
provided via the CEPH_ARGS environment variable. For instance, to
silence all output from the CRUSH subsystem::

   CEPH_ARGS="--debug-crush 0" crushtool ...
Running tests with --test
=========================
The test mode will use the input crush map (as specified with **-i
map**) and perform a dry run of CRUSH mapping or random placement
(if **--simulate** is set). On completion, two kinds of reports can be
created.

1) The **--show-...** options output human readable information.

2) The **--output-csv** option creates CSV files that are
   documented by the **--help-output** option.
Note: Each Placement Group (PG) has an integer ID which can be obtained
from ``ceph pg dump`` (for example PG 2.2f means pool id 2, PG id 0x2f).
The pool and PG IDs are combined by a function to get a value which is
given to CRUSH to map it to OSDs. crushtool does not know about PGs or
pools; it only runs simulations by mapping values in the range
``[--min-x,--max-x]``.
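As an illustration of the note above, the two parts of a PG ID such as
``2.2f`` can be pulled apart as follows. This is a minimal Python sketch;
Ceph's actual pool/PG combining function is not reproduced here, and
crushtool itself only ever sees the resulting integer input values.

```python
# Minimal sketch: decompose a PG ID such as "2.2f" into its parts.
# The part after the dot is hexadecimal.
pgid = "2.2f"
pool_part, pg_part = pgid.split(".")
pool_id = int(pool_part)      # pool id 2
pg_id = int(pg_part, 16)      # 0x2f, i.e. 47 in decimal
print(pool_id, pg_id)
```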
.. option:: --show-statistics

   Displays a summary of the distribution. For instance::

       rule 1 (metadata) num_rep 5 result size == 5:    1024/1024

   shows that rule **1** which is named **metadata** successfully
   mapped **1024** values to **result size == 5** devices when trying
   to map them to **num_rep 5** replicas. When it fails to provide the
   required mapping, presumably because the number of **tries** must
   be increased, a breakdown of the failures is displayed. For instance::

       rule 1 (metadata) num_rep 10 result size == 8:   4/1024
       rule 1 (metadata) num_rep 10 result size == 9:   93/1024
       rule 1 (metadata) num_rep 10 result size == 10:  927/1024

   shows that although **num_rep 10** replicas were required, **4**
   out of **1024** values ( **4/1024** ) were mapped to **result size
   == 8** devices only.
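The breakdown above is a histogram of result sizes, so an overall success
rate can be derived from it. A small Python sketch, using the sample
numbers quoted in the output above:

```python
# Result-size histogram taken from the sample --show-statistics output:
# num_rep 10 was requested, but some values were mapped to fewer devices.
breakdown = {8: 4, 9: 93, 10: 927}   # result size -> number of input values
num_rep = 10
total = sum(breakdown.values())              # 1024 values tested
fully_mapped = breakdown.get(num_rep, 0)     # 927 got all 10 replicas
success_rate = fully_mapped / total
print(total, fully_mapped, success_rate)
```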
.. option:: --show-mappings

   Displays the mapping of each value in the range ``[--min-x,--max-x]``.
   For instance::

      CRUSH rule 1 x 24 [11,6]

   shows that value **24** is mapped to devices **[11,6]** by rule
   **1**.
.. option:: --show-bad-mappings

   Displays which values failed to be mapped to the required number of
   devices. For instance::

      bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

   shows that when rule **1** was required to map **7** devices, it
   could map only six: **[8,10,2,11,6,9]**.
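A bad-mapping line like the one above can also be checked mechanically.
The following Python sketch extracts the shortfall; the regular
expression is an assumption about the line format shown here, not a
documented interface:

```python
import re

# Sample line from the --show-bad-mappings output above.
line = "bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]"
m = re.search(r"num_rep (\d+) result \[([0-9,]*)\]", line)
num_rep = int(m.group(1))
devices = [int(d) for d in m.group(2).split(",") if d]
shortfall = num_rep - len(devices)   # 7 requested, 6 mapped -> 1 short
print(devices, shortfall)
```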
.. option:: --show-utilization

   Displays the expected and actual utilization for each device, for
   each number of replicas. For instance::

      device 0: stored : 951 expected : 853.333
      device 1: stored : 963 expected : 853.333

   shows that device **0** stored **951** values and was expected to store **853.333**.
   Implies **--show-statistics**.
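The expected figure is just the total number of placements divided by the
number of devices, assuming equal weights. A sketch, assuming six equally
weighted devices, which is consistent with the 853.333 figure shown above
for 1024 values and num_rep 5:

```python
# Assumption: 1024 input values, 5 replicas each, spread over six
# equally weighted devices -- consistent with the sample output above.
num_values = 1024
num_rep = 5
num_devices = 6
expected = num_values * num_rep / num_devices
print(round(expected, 3))   # 853.333
```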
.. option:: --show-utilization-all

   Displays the same as **--show-utilization** but does not suppress
   output when the weight of a device is zero.
   Implies **--show-statistics**.
.. option:: --show-choose-tries

   Displays how many attempts were needed to find a device mapping.
   For instance::

      0:     95224
      1:      3745
      ...

   shows that **95224** mappings succeeded without retries, **3745**
   mappings succeeded with one retry, etc. There are as many rows
   as the value of the **--set-choose-total-tries** option.
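These rows form a histogram of retry counts, so they can be turned into,
for example, the share of mappings that needed no retry at all. A sketch
using only the two rows quoted above; a real run would have one row per
allowed try:

```python
# Retry histogram: number of retries -> number of successful mappings.
# Only the two rows quoted in the text are used here.
tries = {0: 95224, 1: 3745}
total = sum(tries.values())
no_retry_share = tries[0] / total
print(total, round(no_retry_share, 4))
```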
.. option:: --output-csv

   Creates CSV files (in the current directory) containing information
   documented by **--help-output**. The files are named after the rule
   used when collecting the statistics. For instance, if the rule
   'metadata' is used, the CSV files will be::

      metadata-absolute_weights.csv
      metadata-device_utilization.csv

   The first line of each file briefly explains the column layout. For
   instance::

      metadata-absolute_weights.csv
      Device ID, Absolute Weight
.. option:: --output-name NAME

   Prepend **NAME** to the file names generated when **--output-csv**
   is specified. For instance **--output-name FOO** will create
   the files::

      FOO-metadata-absolute_weights.csv
      FOO-metadata-device_utilization.csv
The **--set-...** options can be used to modify the tunables of the
input crush map. The input crush map is modified in
memory. For example::

   $ crushtool -i mymap --test --show-bad-mappings
   bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

could be fixed by increasing the **choose-total-tries** as follows::

   $ crushtool -i mymap --test \
       --show-bad-mappings \
       --set-choose-total-tries 500
Building a map with --build
===========================
The build mode will generate hierarchical maps. The first argument
specifies the number of devices (leaves) in the CRUSH hierarchy. Each
layer describes how the layer (or devices) preceding it should be
aggregated.
Each layer consists of::

   bucket ( uniform | list | tree | straw ) size
The **bucket** is the type of the buckets in the layer
(e.g. "rack"). Each bucket name will be built by appending a unique
number to the **bucket** string (e.g. "rack0", "rack1"...).
The second component is the type of the bucket: **straw** should be used
most of the time.
The third component is the maximum size of the bucket. A size of zero
means a bucket of infinite capacity.
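The way layers aggregate can be checked with a little arithmetic. A
Python sketch, assuming the layer sizes of the 320-OSD example on this
page (4 devices per node, 20 nodes per rack, 2 racks per row, and an
unlimited root bucket on top):

```python
# Assumed layer structure, matching the 320-OSD example on this page.
# A size of 0 means "infinite capacity", i.e. one bucket holding everything.
num_osds = 320
layers = [("node", 4), ("rack", 20), ("row", 2), ("root", 0)]

count = num_osds
for name, size in layers:
    count = 1 if size == 0 else count // size
    print(name, count)   # node 80, rack 4, row 2, root 1
```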
Example
=======

Suppose we have two rows with two racks each and 20 nodes per rack. Suppose
each node contains 4 storage devices for Ceph OSD Daemons. This configuration
allows us to deploy 320 Ceph OSD Daemons. Let's assume a 42U rack with 2U nodes,
leaving an extra 2U for a rack switch.
To reflect our hierarchy of devices, nodes, racks and rows, we would execute
the following::

   $ crushtool -o crushmap --build --num_osds 320 \
          node straw 4 \
          rack straw 20 \
          row straw 2 \
          root straw 0
   # id   weight  type name      reweight
CRUSH rules are created so the generated crushmap can be
tested. They are the same rules as the ones created by default when
creating a new Ceph cluster. They can be further edited with::

   # decompile
   crushtool -d crushmap -o map.txt

   # edit map.txt, then recompile
   crushtool -c map.txt -o crushmap
Example output from --test
==========================

See https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
for sample ``crushtool --test`` commands and output produced thereby.
Availability
============

**crushtool** is part of Ceph, a massively scalable, open-source, distributed storage system. Please
refer to the Ceph documentation at http://ceph.com/docs for more
information.
See also
========

:doc:`ceph <ceph>`\(8),
:doc:`osdmaptool <osdmaptool>`\(8)
Author
======

John Wilkins, Sage Weil, Loic Dachary