:orphan:

==========================================
 crushtool -- CRUSH map manipulation tool
==========================================

.. program:: crushtool

Synopsis
========

| **crushtool** ( -d *map* | -c *map.txt* | --build --num_osds *numosds*
  *layer1* *...* | --test ) [ -o *outfile* ]


Description
===========

**crushtool** is a utility that lets you create, compile, decompile
and test CRUSH map files.

CRUSH is a pseudo-random data distribution algorithm that efficiently
maps input values (which, in the context of Ceph, correspond to Placement
Groups) across a heterogeneous, hierarchically structured device map.
The algorithm was originally described in detail in the following paper
(although it has evolved somewhat since then)::

    http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

The tool has four modes of operation.

.. option:: --compile|-c map.txt

   will compile a plaintext map.txt into a binary map file.

.. option:: --decompile|-d map

   will take the compiled map and decompile it into a plaintext source
   file, suitable for editing.

.. option:: --build --num_osds {num-osds} layer1 ...

   will create a map with the given layer structure. See below for a
   detailed explanation.

.. option:: --test

   will perform a dry run of a CRUSH mapping for a range of input
   values ``[--min-x,--max-x]`` (default ``[0,1023]``), which can be
   thought of as simulated Placement Groups. See below for a more
   detailed explanation.

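For example, compiling a source map and then decompiling the result is
a quick way to check that a hand-edited source file is valid (the file
names here are illustrative)::

    crushtool -c map.txt -o crushmap
    crushtool -d crushmap -o map-roundtrip.txt
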
Unlike other Ceph tools, **crushtool** does not accept generic options
such as **--debug-crush** from the command line. They can, however, be
provided via the CEPH_ARGS environment variable. For instance, to
silence all output from the CRUSH subsystem::

    CEPH_ARGS="--debug-crush 0" crushtool ...


Running tests with --test
=========================

The test mode will use the input crush map (as specified with **-i
map**) and perform a dry run of CRUSH mapping or random placement
(if **--simulate** is set). On completion, two kinds of reports can be
created.

1) The **--show-...** options output human-readable information
   on stderr.
2) The **--output-csv** option creates CSV files that are
   documented by the **--help-output** option.

Note: Each Placement Group (PG) has an integer ID which can be obtained
from ``ceph pg dump`` (for example PG 2.2f means pool id 2, PG id 0x2f = 47).
The pool and PG IDs are combined by a function to get a value which is
given to CRUSH to map it to OSDs. crushtool does not know about PGs or
pools; it only runs simulations by mapping values in the range
``[--min-x,--max-x]``.
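
For example, the following simulates mapping 100 values with three
replicas each and prints every mapping (the map file name and rule
number are assumptions)::

    crushtool -i crushmap --test --min-x 0 --max-x 99 \
        --rule 0 --num-rep 3 --show-mappings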


.. option:: --show-statistics

   Displays a summary of the distribution. For instance::

      rule 1 (metadata) num_rep 5 result size == 5: 1024/1024

   shows that rule **1**, which is named **metadata**, successfully
   mapped **1024** values to **result size == 5** devices when trying
   to map them to **num_rep 5** replicas. When it fails to provide the
   required mapping, presumably because the number of **tries** must
   be increased, a breakdown of the failures is displayed. For instance::

      rule 1 (metadata) num_rep 10 result size == 8: 4/1024
      rule 1 (metadata) num_rep 10 result size == 9: 93/1024
      rule 1 (metadata) num_rep 10 result size == 10: 927/1024

   shows that although **num_rep 10** replicas were required, **4**
   out of **1024** values (**4/1024**) were mapped to only **result
   size == 8** devices.

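   A minimal invocation that produces such a summary might look like
   this (the map file name and rule number are assumptions)::

      crushtool -i crushmap --test --rule 1 --num-rep 5 --show-statistics
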
.. option:: --show-mappings

   Displays the mapping of each value in the range ``[--min-x,--max-x]``.
   For instance::

      CRUSH rule 1 x 24 [11,6]

   shows that value **24** is mapped to devices **[11,6]** by rule
   **1**.

.. option:: --show-bad-mappings

   Displays which values failed to be mapped to the required number of
   devices. For instance::

      bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

   shows that when rule **1** was required to map **7** devices, it
   could map only six: **[8,10,2,11,6,9]**.

.. option:: --show-utilization

   Displays the expected and actual utilization for each device, for
   each number of replicas. For instance::

      device 0: stored : 951 expected : 853.333
      device 1: stored : 963 expected : 853.333
      ...

   shows that device **0** stored **951** values and was expected to
   store approximately **853**. Implies **--show-statistics**.
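
   For example (the map file name and rule number are assumptions)::

      crushtool -i crushmap --test --rule 1 --num-rep 3 --show-utilization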

.. option:: --show-utilization-all

   Displays the same as **--show-utilization** but does not suppress
   output when the weight of a device is zero.
   Implies **--show-statistics**.

.. option:: --show-choose-tries

   Displays how many attempts were needed to find a device mapping.
   For instance::

      0:    95224
      1:     3745
      2:     2225
      ..

   shows that **95224** mappings succeeded without retries, **3745**
   mappings succeeded with one retry, etc. There are as many rows as
   the value of the **--set-choose-total-tries** option.
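
   For example (the map file name and rule number are assumptions)::

      crushtool -i crushmap --test --rule 0 --num-rep 3 --show-choose-tries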

.. option:: --output-csv

   Creates CSV files (in the current directory) containing information
   documented by **--help-output**. The files are named after the rule
   used when collecting the statistics. For instance, if the rule
   'metadata' is used, the CSV files will be::

      metadata-absolute_weights.csv
      metadata-device_utilization.csv
      ...

   The first line of each file briefly explains the column layout. For
   instance::

      metadata-absolute_weights.csv
      Device ID, Absolute Weight
      0,1
      ...

.. option:: --output-name NAME

   Prepend **NAME** to the file names generated when **--output-csv**
   is specified. For instance **--output-name FOO** will create
   files::

      FOO-metadata-absolute_weights.csv
      FOO-metadata-device_utilization.csv
      ...
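
   For example, one plausible invocation combining these options (the
   map file name and rule number are assumptions)::

      crushtool -i crushmap --test --rule 1 --num-rep 3 \
          --output-csv --output-name FOO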

The **--set-...** options can be used to modify the tunables of the
input crush map. The input crush map is modified in
memory. For example::

    $ crushtool -i mymap --test --show-bad-mappings
    bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

could be fixed by increasing the **choose-total-tries** as follows::

    $ crushtool -i mymap --test \
        --show-bad-mappings \
        --set-choose-total-tries 500

Building a map with --build
===========================

The build mode will generate hierarchical maps. The first argument
specifies the number of devices (leaves) in the CRUSH hierarchy. Each
layer describes how the layer (or devices) preceding it should be
grouped.

Each layer consists of::

    bucket ( uniform | list | tree | straw | straw2 ) size

The **bucket** is the type of the buckets in the layer
(e.g. "rack"). Each bucket name will be built by appending a unique
number to the **bucket** string (e.g. "rack0", "rack1"...).

The second component is the bucket algorithm: **straw** should be used
most of the time.

The third component is the maximum size of the bucket. A size of zero
means a bucket of infinite capacity.
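
For example, a small two-layer map that groups 16 OSDs into four "host"
buckets under a single root (the names here are illustrative)::

    crushtool -o onelayer.map --build --num_osds 16 \
        host uniform 4 root straw2 0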


Example
=======

Suppose we have two rows with two racks each and 20 nodes per rack. Suppose
each node contains 4 storage devices for Ceph OSD Daemons. This configuration
allows us to deploy 320 Ceph OSD Daemons. Let's assume a 42U rack with 2U nodes,
leaving an extra 2U for a rack switch.

To reflect our hierarchy of devices, nodes, racks and rows, we would execute
the following::

    $ crushtool -o crushmap --build --num_osds 320 \
        node straw 4 \
        rack straw 20 \
        row straw 2 \
        root straw 0
    # id    weight  type name    reweight
    -87     320     root root
    -85     160     row row0
    -81     80      rack rack0
    -1      4       node node0
    0       1       osd.0        1
    1       1       osd.1        1
    2       1       osd.2        1
    3       1       osd.3        1
    -2      4       node node1
    4       1       osd.4        1
    5       1       osd.5        1
    ...

CRUSH rules are created so the generated crushmap can be
tested. They are the same rules as the ones created by default when
creating a new Ceph cluster. They can be further edited with::

    # decompile
    crushtool -d crushmap -o map.txt

    # edit
    emacs map.txt

    # recompile
    crushtool -c map.txt -o crushmap
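
The result of such an edit can be sanity-checked by decompiling again
and comparing against the edited source (file names as above)::

    crushtool -d crushmap -o map2.txt
    diff map.txt map2.txt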

Reclassify
==========

The *reclassify* function allows users to transition from older maps that
maintain parallel hierarchies for OSDs of different types to a modern CRUSH
map that makes use of the *device class* feature. For more information,
see https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.
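
A sketch of the kind of invocation described on that page (the bucket
and class names are assumptions; see the linked procedure for the full
set of options)::

    crushtool -i original.map --reclassify \
        --reclassify-root default hdd \
        --reclassify-bucket %-ssd ssd default \
        -o adjusted.map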

Example output from --test
==========================

See https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
for sample ``crushtool --test`` commands and the output they produce.

Availability
============

**crushtool** is part of Ceph, a massively scalable, open-source, distributed
storage system. Please refer to the Ceph documentation at
https://docs.ceph.com for more information.


See also
========

:doc:`ceph <ceph>`\(8),
:doc:`osdmaptool <osdmaptool>`\(8)

Authors
=======

John Wilkins, Sage Weil, Loic Dachary