:orphan:

==========================================
 crushtool -- CRUSH map manipulation tool
==========================================

.. program:: crushtool

Synopsis
========

| **crushtool** ( -d *map* | -c *map.txt* | --build --num_osds *numosds*
  *layer1* *...* | --test ) [ -o *outfile* ]


Description
===========

**crushtool** is a utility that lets you create, compile, decompile
and test CRUSH map files.

CRUSH is a pseudo-random data distribution algorithm that efficiently
maps input values (which, in the context of Ceph, correspond to Placement
Groups) across a heterogeneous, hierarchically structured device map.
The algorithm was originally described in detail in the following paper
(although it has evolved some since then)::

    http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

The tool has four modes of operation.

.. option:: --compile|-c map.txt

   will compile a plaintext map.txt into a binary map file.

.. option:: --decompile|-d map

   will take the compiled map and decompile it into a plaintext source
   file, suitable for editing.

.. option:: --build --num_osds {num-osds} layer1 ...

   will create a map with the given layer structure. See below for a
   detailed explanation.

.. option:: --test

   will perform a dry run of a CRUSH mapping for a range of input
   values ``[--min-x,--max-x]`` (default ``[0,1023]``) which can be
   thought of as simulated Placement Groups. See below for a more
   detailed explanation.

Unlike other Ceph tools, **crushtool** does not accept generic options
such as **--debug-crush** from the command line. They can, however, be
provided via the CEPH_ARGS environment variable. For instance, to
silence all output from the CRUSH subsystem::

    CEPH_ARGS="--debug-crush 0" crushtool ...

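The same mechanism can be used to make the CRUSH subsystem more verbose
instead; the debug level and file names in this sketch are only
illustrative::

    CEPH_ARGS="--debug-crush 5" crushtool -d crushmap -o map.txt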

Running tests with --test
=========================

The test mode will use the input crush map (as specified with **-i
map**) and perform a dry run of CRUSH mapping or random placement
(if **--simulate** is set). On completion, two kinds of reports can be
created:

1) The **--show-...** options output human-readable information
   on stderr.
2) The **--output-csv** option creates CSV files that are
   documented by the **--help-output** option.

Note: Each Placement Group (PG) has an integer ID which can be obtained
from ``ceph pg dump`` (for example PG 2.2f means pool id 2, PG id 0x2f).
The pool and PG IDs are combined by a function to get a value which is
given to CRUSH to map it to OSDs. crushtool does not know about PGs or
pools; it only runs simulations by mapping values in the range
``[--min-x,--max-x]``.

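For example, a bounded dry run over ten simulated values might look like
the following sketch, where ``crushmap`` and the rule number are
placeholders::

    crushtool -i crushmap --test --rule 0 --num-rep 3 \
        --min-x 0 --max-x 9 --show-mappings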

.. option:: --show-statistics

   Displays a summary of the distribution. For instance::

       rule 1 (metadata) num_rep 5 result size == 5:   1024/1024

   shows that rule **1**, which is named **metadata**, successfully
   mapped **1024** values to **result size == 5** devices when trying
   to map them to **num_rep 5** replicas. When it fails to provide the
   required mapping, presumably because the number of **tries** must
   be increased, a breakdown of the failures is displayed. For instance::

       rule 1 (metadata) num_rep 10 result size == 8:   4/1024
       rule 1 (metadata) num_rep 10 result size == 9:   93/1024
       rule 1 (metadata) num_rep 10 result size == 10:  927/1024

   shows that although **num_rep 10** replicas were required, **4**
   out of **1024** values (**4/1024**) were mapped to **result size
   == 8** devices only.

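   As a sketch, a summary like the one above could be produced with a
   command of this form (the map file name and rule number are
   placeholders)::

       crushtool -i crushmap --test --rule 1 --num-rep 5 --show-statistics
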
.. option:: --show-mappings

   Displays the mapping of each value in the range ``[--min-x,--max-x]``.
   For instance::

       CRUSH rule 1 x 24 [11,6]

   shows that value **24** is mapped to devices **[11,6]** by rule
   **1**.

   One of the following is required when using the ``--show-mappings`` option:

   (a) ``--num-rep``
   (b) both ``--min-rep`` and ``--max-rep``

   ``--num-rep`` stands for "number of replicas"; it indicates the number of
   replicas in a pool and is used to specify an exact number of replicas (for
   example ``--num-rep 5``). ``--min-rep`` and ``--max-rep`` are used together
   to specify a range of replicas (for example, ``--min-rep 1 --max-rep 10``).

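   For instance, a sweep over replica counts 1 through 3 might be run
   with a command along these lines (the map name and rule number are
   placeholders)::

       crushtool -i crushmap --test --rule 0 --min-rep 1 --max-rep 3 \
           --show-mappings
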
.. option:: --show-bad-mappings

   Displays which value failed to be mapped to the required number of
   devices. For instance::

       bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

   shows that when rule **1** was required to map **7** devices, it
   could map only six: **[8,10,2,11,6,9]**.

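   Output of this kind could come from an invocation such as the
   following sketch (map name and rule number assumed)::

       crushtool -i crushmap --test --rule 1 --num-rep 7 --show-bad-mappings
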
.. option:: --show-utilization

   Displays the expected and actual utilization for each device, for
   each number of replicas. For instance::

       device 0: stored : 951 expected : 853.333
       device 1: stored : 963 expected : 853.333
       ...

   shows that device **0** stored **951** values and was expected to store **853**.
   Implies **--show-statistics**.

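   A sketch of such a utilization run (the names and replica count are
   placeholders)::

       crushtool -i crushmap --test --rule 0 --num-rep 3 --show-utilization
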
.. option:: --show-utilization-all

   Displays the same as **--show-utilization** but does not suppress
   output when the weight of a device is zero.
   Implies **--show-statistics**.

.. option:: --show-choose-tries

   Displays how many attempts were needed to find a device mapping.
   For instance::

       0:      95224
       1:       3745
       2:       2225
       ..

   shows that **95224** mappings succeeded without retries, **3745**
   mappings succeeded after one retry, etc. There are as many rows as
   the value of the **--set-choose-total-tries** option.

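   The retry histogram above could be collected with a command of this
   form (map name and rule number are placeholders)::

       crushtool -i crushmap --test --rule 0 --num-rep 3 --show-choose-tries
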
.. option:: --output-csv

   Creates CSV files (in the current directory) containing information
   documented by **--help-output**. The files are named after the rule
   used when collecting the statistics. For instance, if the rule
   'metadata' is used, the CSV files will be::

       metadata-absolute_weights.csv
       metadata-device_utilization.csv
       ...

   The first line of each file briefly explains the column layout. For
   instance::

       metadata-absolute_weights.csv
       Device ID, Absolute Weight
       0,1
       ...

.. option:: --output-name NAME

   Prepend **NAME** to the file names generated when **--output-csv**
   is specified. For instance **--output-name FOO** will create
   files::

       FOO-metadata-absolute_weights.csv
       FOO-metadata-device_utilization.csv
       ...

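   A complete invocation could resemble the following sketch (the map
   name, rule number and replica count are assumptions)::

       crushtool -i crushmap --test --rule 1 --num-rep 3 \
           --output-csv --output-name FOO
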
The **--set-...** options can be used to modify the tunables of the
input crush map. The input crush map is modified in
memory. For example::

    $ crushtool -i mymap --test --show-bad-mappings
    bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

could be fixed by increasing the **choose-total-tries** as follows::

    $ crushtool -i mymap --test \
        --show-bad-mappings \
        --set-choose-total-tries 500

Building a map with --build
===========================

The build mode will generate hierarchical maps. The first argument
specifies the number of devices (leaves) in the CRUSH hierarchy. Each
layer describes how the layer (or devices) preceding it should be
grouped.

Each layer consists of::

    bucket ( uniform | list | tree | straw | straw2 ) size

The **bucket** component is the CRUSH type name for the buckets in the
layer (e.g. "rack"). Each bucket name will be built by appending a unique
number to the **bucket** string (e.g. "rack0", "rack1"...).

The second component is the bucket algorithm: **straw** should be used
most of the time.

The third component is the maximum size of the bucket. A size of zero
means a bucket of infinite capacity.


Example
=======

Suppose we have two rows with two racks each and 20 nodes per rack. Suppose
each node contains 4 storage devices for Ceph OSD Daemons. This configuration
allows us to deploy 320 Ceph OSD Daemons. Let's assume a 42U rack with 2U nodes,
leaving an extra 2U for a rack switch.

To reflect our hierarchy of devices, nodes, racks and rows, we would execute
the following::

    $ crushtool -o crushmap --build --num_osds 320 \
          node straw 4 \
          rack straw 20 \
          row straw 2 \
          root straw 0
    # id    weight  type name      reweight
    -87     320     root root
    -85     160     row row0
    -81     80      rack rack0
    -1      4       node node0
    0       1       osd.0   1
    1       1       osd.1   1
    2       1       osd.2   1
    3       1       osd.3   1
    -2      4       node node1
    4       1       osd.4   1
    5       1       osd.5   1
    ...

CRUSH rules are created so the generated crushmap can be
tested. They are the same rules as the ones created by default when
creating a new Ceph cluster. They can be further edited with::

    # decompile
    crushtool -d crushmap -o map.txt

    # edit
    emacs map.txt

    # recompile
    crushtool -c map.txt -o crushmap

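The edited and recompiled map can then be dry-run again before being
injected into a cluster; the rule number and replica count in this
sketch are placeholders::

    crushtool -i crushmap --test --rule 0 --num-rep 2 \
        --show-statistics --show-bad-mappings
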
Reclassify
==========

The *reclassify* function allows users to transition from older maps that
maintain parallel hierarchies for OSDs of different types to a modern CRUSH
map that makes use of the *device class* feature. For more information,
see https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.

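As a rough sketch adapted from that guide, reclassifying a hypothetical
legacy map (the file names, bucket names and device classes below are
assumptions about such a layout) might look like::

    crushtool -i original --reclassify \
        --set-subtree-class default hdd \
        --reclassify-root default hdd \
        --reclassify-bucket %-ssd ssd default \
        --reclassify-bucket ssd ssd default \
        -o adjusted

The adjusted map can then be checked against the original with
``crushtool -i original --compare adjusted``.
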
Example output from --test
==========================

See https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
for sample ``crushtool --test`` commands and the output they produce.

Availability
============

**crushtool** is part of Ceph, a massively scalable, open-source, distributed storage system. Please
refer to the Ceph documentation at https://docs.ceph.com for more
information.


See also
========

:doc:`ceph <ceph>`\(8),
:doc:`osdmaptool <osdmaptool>`\(8)

Authors
=======

John Wilkins, Sage Weil, Loic Dachary