:orphan:

==========================================
 crushtool -- CRUSH map manipulation tool
==========================================

.. program:: crushtool

Synopsis
========

| **crushtool** ( -d *map* | -c *map.txt* | --build --num_osds *numosds*
  *layer1* *...* | --test ) [ -o *outfile* ]


Description
===========

**crushtool** is a utility that lets you create, compile, decompile
and test CRUSH map files.

CRUSH is a pseudo-random data distribution algorithm that efficiently
maps input values (which, in the context of Ceph, correspond to Placement
Groups) across a heterogeneous, hierarchically structured device map.
The algorithm was originally described in detail in the following paper
(although it has evolved somewhat since then)::

    http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

The tool has four modes of operation.

.. option:: --compile|-c map.txt

   will compile a plaintext map.txt into a binary map file.

.. option:: --decompile|-d map

   will take the compiled map and decompile it into a plaintext source
   file, suitable for editing.

.. option:: --build --num_osds {num-osds} layer1 ...

   will create a map with the given layer structure. See below for a
   detailed explanation.

.. option:: --test

   will perform a dry run of a CRUSH mapping for a range of input
   values ``[--min-x,--max-x]`` (default ``[0,1023]``) which can be
   thought of as simulated Placement Groups. See below for a more
   detailed explanation.
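
For instance, a plaintext map can be compiled and then exercised with
a dry run (a sketch; ``map.txt`` and ``crushmap`` are placeholder file
names)::

    crushtool -c map.txt -o crushmap
    crushtool -i crushmap --test --show-statistics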
52 | ||
53 | Unlike other Ceph tools, **crushtool** does not accept generic options | |
54 | such as **--debug-crush** from the command line. They can, however, be | |
55 | provided via the CEPH_ARGS environment variable. For instance, to | |
56 | silence all output from the CRUSH subsystem:: | |
57 | ||
58 | CEPH_ARGS="--debug-crush 0" crushtool ... | |
59 | ||
60 | ||
61 | Running tests with --test | |
62 | ========================= | |
63 | ||
64 | The test mode will use the input crush map ( as specified with **-i | |
65 | map** ) and perform a dry run of CRUSH mapping or random placement | |
66 | (if **--simulate** is set ). On completion, two kinds of reports can be | |
67 | created. | |
68 | 1) The **--show-...** option outputs human readable information | |
69 | on stderr. | |
70 | 2) The **--output-csv** option creates CSV files that are | |
71 | documented by the **--help-output** option. | |
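
A typical invocation combines both kinds of reports (illustrative;
``crushmap`` stands for an existing compiled map)::

    crushtool -i crushmap --test --show-statistics --output-csv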
72 | ||
73 | Note: Each Placement Group (PG) has an integer ID which can be obtained | |
74 | from ``ceph pg dump`` (for example PG 2.2f means pool id 2, PG id 32). | |
75 | The pool and PG IDs are combined by a function to get a value which is | |
76 | given to CRUSH to map it to OSDs. crushtool does not know about PGs or | |
77 | pools; it only runs simulations by mapping values in the range | |
78 | ``[--min-x,--max-x]``. | |
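
For example, to simulate only the first ten input values (a sketch,
using the placeholder map name ``crushmap``; **--show-mappings** is
documented below)::

    crushtool -i crushmap --test --show-mappings --min-x 0 --max-x 9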
79 | ||
80 | ||
81 | .. option:: --show-statistics | |
82 | ||
83 | Displays a summary of the distribution. For instance:: | |
84 | ||
85 | rule 1 (metadata) num_rep 5 result size == 5: 1024/1024 | |
86 | ||
87 | shows that rule **1** which is named **metadata** successfully | |
88 | mapped **1024** values to **result size == 5** devices when trying | |
89 | to map them to **num_rep 5** replicas. When it fails to provide the | |
90 | required mapping, presumably because the number of **tries** must | |
91 | be increased, a breakdown of the failures is displayed. For instance:: | |
92 | ||
93 | rule 1 (metadata) num_rep 10 result size == 8: 4/1024 | |
94 | rule 1 (metadata) num_rep 10 result size == 9: 93/1024 | |
95 | rule 1 (metadata) num_rep 10 result size == 10: 927/1024 | |
96 | ||
97 | shows that although **num_rep 10** replicas were required, **4** | |
98 | out of **1024** values ( **4/1024** ) were mapped to **result size | |
99 | == 8** devices only. | |
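
   Such a summary can be produced with, for instance (illustrative
   rule number and replica count)::

       crushtool -i crushmap --test --rule 1 --num-rep 5 --show-statistics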
100 | ||
101 | .. option:: --show-mappings | |
102 | ||
103 | Displays the mapping of each value in the range ``[--min-x,--max-x]``. | |
104 | For instance:: | |
105 | ||
106 | CRUSH rule 1 x 24 [11,6] | |
107 | ||
108 | shows that value **24** is mapped to devices **[11,6]** by rule | |
109 | **1**. | |
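
   For example, to display the mapping of a single value (here ``x``
   = **24**; ``crushmap`` is a placeholder)::

       crushtool -i crushmap --test --show-mappings --min-x 24 --max-x 24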
110 | ||
111 | .. option:: --show-bad-mappings | |
112 | ||
113 | Displays which value failed to be mapped to the required number of | |
114 | devices. For instance:: | |
115 | ||
116 | bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9] | |
117 | ||
118 | shows that when rule **1** was required to map **7** devices, it | |
119 | could map only six : **[8,10,2,11,6,9]**. | |
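
   An illustrative invocation that surfaces such failures (the replica
   count is deliberately set higher than the example map can satisfy)::

       crushtool -i crushmap --test --rule 1 --num-rep 7 --show-bad-mappings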
120 | ||
121 | .. option:: --show-utilization | |
122 | ||
123 | Displays the expected and actual utilisation for each device, for | |
124 | each number of replicas. For instance:: | |
125 | ||
126 | device 0: stored : 951 expected : 853.333 | |
127 | device 1: stored : 963 expected : 853.333 | |
128 | ... | |
129 | ||
130 | shows that device **0** stored **951** values and was expected to store **853**. | |
131 | Implies **--show-statistics**. | |
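
   For instance (illustrative; the replica count follows **--num-rep**)::

       crushtool -i crushmap --test --rule 1 --num-rep 3 --show-utilization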
132 | ||
133 | .. option:: --show-utilization-all | |
134 | ||
135 | Displays the same as **--show-utilization** but does not suppress | |
136 | output when the weight of a device is zero. | |
137 | Implies **--show-statistics**. | |
138 | ||
139 | .. option:: --show-choose-tries | |
140 | ||
141 | Displays how many attempts were needed to find a device mapping. | |
142 | For instance:: | |
143 | ||
144 | 0: 95224 | |
145 | 1: 3745 | |
146 | 2: 2225 | |
147 | .. | |
148 | ||
149 | shows that **95224** mappings succeeded without retries, **3745** | |
150 | mappings succeeded with one attempts, etc. There are as many rows | |
151 | as the value of the **--set-choose-total-tries** option. | |
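
   For example, together with a custom retry budget (illustrative; per
   the above, the resulting histogram will have 100 rows)::

       crushtool -i crushmap --test --show-choose-tries --set-choose-total-tries 100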
152 | ||
153 | .. option:: --output-csv | |
154 | ||
155 | Creates CSV files (in the current directory) containing information | |
156 | documented by **--help-output**. The files are named after the rule | |
157 | used when collecting the statistics. For instance, if the rule | |
158 | : 'metadata' is used, the CSV files will be:: | |
159 | ||
160 | metadata-absolute_weights.csv | |
161 | metadata-device_utilization.csv | |
162 | ... | |
163 | ||
164 | The first line of the file shortly explains the column layout. For | |
165 | instance:: | |
166 | ||
167 | metadata-absolute_weights.csv | |
168 | Device ID, Absolute Weight | |
169 | 0,1 | |
170 | ... | |
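
   For instance (a sketch; run from the directory where the CSV files
   should be written)::

       crushtool -i crushmap --test --rule 1 --num-rep 5 --output-csv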
171 | ||
172 | .. option:: --output-name NAME | |
173 | ||
174 | Prepend **NAME** to the file names generated when **--output-csv** | |
175 | is specified. For instance **--output-name FOO** will create | |
176 | files:: | |
177 | ||
178 | FOO-metadata-absolute_weights.csv | |
179 | FOO-metadata-device_utilization.csv | |
180 | ... | |
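
   For instance (illustrative)::

       crushtool -i crushmap --test --output-csv --output-name FOO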
181 | ||
182 | The **--set-...** options can be used to modify the tunables of the | |
183 | input crush map. The input crush map is modified in | |
184 | memory. For example:: | |
185 | ||
186 | $ crushtool -i mymap --test --show-bad-mappings | |
187 | bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9] | |
188 | ||
189 | could be fixed by increasing the **choose-total-tries** as follows: | |
190 | ||
191 | $ crushtool -i mymap --test \ | |
192 | --show-bad-mappings \ | |
193 | --set-choose-total-tries 500 | |
194 | ||
195 | Building a map with --build | |
196 | =========================== | |
197 | ||
198 | The build mode will generate hierarchical maps. The first argument | |
199 | specifies the number of devices (leaves) in the CRUSH hierarchy. Each | |
200 | layer describes how the layer (or devices) preceding it should be | |
201 | grouped. | |
202 | ||
203 | Each layer consists of:: | |
204 | ||
205 | bucket ( uniform | list | tree | straw ) size | |
206 | ||
207 | The **bucket** is the type of the buckets in the layer | |
208 | (e.g. "rack"). Each bucket name will be built by appending a unique | |
209 | number to the **bucket** string (e.g. "rack0", "rack1"...). | |
210 | ||
211 | The second component is the type of bucket: **straw** should be used | |
212 | most of the time. | |
213 | ||
214 | The third component is the maximum size of the bucket. A size of zero | |
215 | means a bucket of infinite capacity. | |
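
A minimal sketch: group four devices into hosts of two devices each,
under a single root of unbounded size (the layer names are
placeholders)::

    crushtool -o crushmap --build --num_osds 4 \
        host straw 2 \
        root straw 0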
216 | ||
217 | ||
218 | Example | |
219 | ======= | |
220 | ||
221 | Suppose we have two rows with two racks each and 20 nodes per rack. Suppose | |
222 | each node contains 4 storage devices for Ceph OSD Daemons. This configuration | |
223 | allows us to deploy 320 Ceph OSD Daemons. Lets assume a 42U rack with 2U nodes, | |
224 | leaving an extra 2U for a rack switch. | |
225 | ||
226 | To reflect our hierarchy of devices, nodes, racks and rows, we would execute | |
227 | the following:: | |
228 | ||
229 | $ crushtool -o crushmap --build --num_osds 320 \ | |
230 | node straw 4 \ | |
231 | rack straw 20 \ | |
232 | row straw 2 \ | |
233 | root straw 0 | |
234 | # id weight type name reweight | |
235 | -87 320 root root | |
236 | -85 160 row row0 | |
237 | -81 80 rack rack0 | |
238 | -1 4 node node0 | |
239 | 0 1 osd.0 1 | |
240 | 1 1 osd.1 1 | |
241 | 2 1 osd.2 1 | |
242 | 3 1 osd.3 1 | |
243 | -2 4 node node1 | |
244 | 4 1 osd.4 1 | |
245 | 5 1 osd.5 1 | |
246 | ... | |
247 | ||
248 | CRUSH rulesets are created so the generated crushmap can be | |
249 | tested. They are the same rulesets as the one created by default when | |
250 | creating a new Ceph cluster. They can be further edited with:: | |
251 | ||
252 | # decompile | |
253 | crushtool -d crushmap -o map.txt | |
254 | ||
255 | # edit | |
256 | emacs map.txt | |
257 | ||
258 | # recompile | |
259 | crushtool -c map.txt -o crushmap | |
260 | ||
261 | Example output from --test | |
262 | ========================== | |
263 | ||
264 | See https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t | |
265 | for sample ``crushtool --test`` commands and output produced thereby. | |
266 | ||
267 | Availability | |
268 | ============ | |
269 | ||
270 | **crushtool** is part of Ceph, a massively scalable, open-source, distributed storage system. Please | |
271 | refer to the Ceph documentation at http://ceph.com/docs for more | |
272 | information. | |
273 | ||
274 | ||
275 | See also | |
276 | ======== | |
277 | ||
278 | :doc:`ceph <ceph>`\(8), | |
279 | :doc:`osdmaptool <osdmaptool>`\(8), | |
280 | ||
281 | Authors | |
282 | ======= | |
283 | ||
284 | John Wilkins, Sage Weil, Loic Dachary |