:orphan:

.. _osdmaptool:

======================================================
 osdmaptool -- ceph osd cluster map manipulation tool
======================================================

.. program:: osdmaptool

Synopsis
========

| **osdmaptool** *mapfilename* [--print] [--createsimple *numosd*
  [--pgbits *bitsperosd* ] ] [--clobber]
| **osdmaptool** *mapfilename* [--import-crush *crushmap*]
| **osdmaptool** *mapfilename* [--export-crush *crushmap*]
| **osdmaptool** *mapfilename* [--upmap *file*] [--upmap-max *max-optimizations*]
  [--upmap-deviation *max-deviation*] [--upmap-pool *poolname*]
  [--save] [--upmap-active]
| **osdmaptool** *mapfilename* [--upmap-cleanup] [--upmap *file*]


Description
===========

**osdmaptool** is a utility that lets you create, view, and manipulate
OSD cluster maps from the Ceph distributed storage system. Notably, it
lets you extract the embedded CRUSH map or import a new CRUSH map.
It can also simulate the upmap balancer mode so you can get a sense of
what is needed to balance your PGs.


Options
=======

.. option:: --print

   will simply make the tool print a plaintext dump of the map, after
   any modifications are made.

.. option:: --dump <format>

   displays the map in plain text when <format> is 'plain'; if the
   specified format is not supported, 'json' is used. This is an
   alternative to the print option.
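
   For example, assuming an OSD map file named ``osdmap`` exists in the
   current directory, the map can be dumped as JSON with::

       osdmaptool osdmap --dump json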

.. option:: --clobber

   will allow osdmaptool to overwrite mapfilename if changes are made.

.. option:: --import-crush mapfile

   will load the CRUSH map from mapfile and embed it in the OSD map.

.. option:: --export-crush mapfile

   will extract the CRUSH map from the OSD map and write it to
   mapfile.
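
   A typical round trip (file names here are illustrative) extracts the
   CRUSH map, decompiles and edits it with :doc:`crushtool
   <crushtool>`\(8), then recompiles and imports it back::

       osdmaptool osdmap --export-crush crush.bin
       crushtool -d crush.bin -o crush.txt
       # edit crush.txt as needed, then:
       crushtool -c crush.txt -o crush.new
       osdmaptool osdmap --import-crush crush.new --clobber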

.. option:: --createsimple numosd [--pg-bits bitsperosd] [--pgp-bits bits]

   will create a relatively generic OSD map with the numosd devices.
   If --pg-bits is specified, the initial placement group counts will
   be set with bitsperosd bits per OSD. That is, the pg_num map
   attribute will be set to numosd shifted by bitsperosd.
   If --pgp-bits is specified, then the pgp_num map attribute will
   be set to numosd shifted by bits.
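
   As a worked example of the arithmetic (a sketch, using the flag
   spellings above): with 16 OSDs and 6 bits per OSD, pg_num is set to
   16 << 6 = 1024::

       osdmaptool --createsimple 16 --pg-bits 6 osdmap --clobber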

.. option:: --create-from-conf

   creates an osd map with default configurations.

.. option:: --test-map-pgs [--pool poolid] [--range-first <first> --range-last <last>]

   will print out the mappings from placement groups to OSDs.
   If a range is specified, it iterates from first to last in the
   directory specified by the argument to osdmaptool.
   Eg: **osdmaptool --test-map-pgs --range-first 0 --range-last 2 osdmap_dir**.
   This will iterate through the files named 0,1,2 in osdmap_dir.

.. option:: --test-map-pgs-dump [--pool poolid] [--range-first <first> --range-last <last>]

   will print out the summary of all placement groups and the mappings
   from them to the mapped OSDs.
   If a range is specified, it iterates from first to last in the
   directory specified by the argument to osdmaptool.
   Eg: **osdmaptool --test-map-pgs-dump --range-first 0 --range-last 2 osdmap_dir**.
   This will iterate through the files named 0,1,2 in osdmap_dir.

.. option:: --test-map-pgs-dump-all [--pool poolid] [--range-first <first> --range-last <last>]

   will print out the summary of all placement groups and the mappings
   from them to all the OSDs.
   If a range is specified, it iterates from first to last in the
   directory specified by the argument to osdmaptool.
   Eg: **osdmaptool --test-map-pgs-dump-all --range-first 0 --range-last 2 osdmap_dir**.
   This will iterate through the files named 0,1,2 in osdmap_dir.

.. option:: --test-random

   does a random mapping of placement groups to the OSDs.

.. option:: --test-map-pg <pgid>

   maps a particular placement group (specified by pgid) to the OSDs.

.. option:: --test-map-object <objectname> [--pool <poolid>]

   maps a particular object (specified by objectname) to the OSDs.

.. option:: --test-crush [--range-first <first> --range-last <last>]

   maps placement groups to acting OSDs.
   If a range is specified, it iterates from first to last in the
   directory specified by the argument to osdmaptool.
   Eg: **osdmaptool --test-crush --range-first 0 --range-last 2 osdmap_dir**.
   This will iterate through the files named 0,1,2 in osdmap_dir.

.. option:: --mark-up-in

   mark osds up and in (but do not persist).

.. option:: --mark-out

   mark an osd as out (but do not persist)

.. option:: --mark-up <osdid>

   mark an osd as up (but do not persist)

.. option:: --mark-in <osdid>

   mark an osd as in (but do not persist)
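
   The mark options are handy for offline experiments, since a freshly
   created map has no OSDs up or in. For example (a sketch, assuming an
   ``osdmap`` file and a pool 1 exist)::

       osdmaptool osdmap --mark-up-in --test-map-pgs-dump --pool 1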

.. option:: --tree

   Displays a hierarchical tree of the map.

.. option:: --clear-temp

   clears pg_temp and primary_temp variables.

.. option:: --clean-temps

   clean pg_temps.

.. option:: --health

   dump health checks

.. option:: --with-default-pool

   include default pool when creating map

.. option:: --upmap-cleanup <file>

   clean up pg_upmap[_items] entries, writing commands to <file> [default: - for stdout]

.. option:: --upmap <file>

   calculate pg upmap entries to balance pg layout, writing commands to <file> [default: - for stdout]

.. option:: --upmap-max <max-optimizations>

   set max upmap entries to calculate [default: 10]

.. option:: --upmap-deviation <max-deviation>

   max deviation from target [default: 5]

.. option:: --upmap-pool <poolname>

   restrict upmap balancing to 1 pool; the option can be repeated for multiple pools

.. option:: --upmap-active

   Act like an active balancer, keep applying changes until balanced

.. option:: --adjust-crush-weight <osdid:weight>[,<osdid:weight>,<...>]

   Change CRUSH weight of <osdid>
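
   For example (a sketch; the weight value is illustrative), to set the
   CRUSH weight of osd.0 to 2.0 and persist the change::

       osdmaptool osdmap --adjust-crush-weight 0:2.0 --save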

.. option:: --save

   write modified osdmap with upmap or crush-adjust changes

.. option:: --read <file>

   calculate pg upmap entries to balance pg primaries

.. option:: --read-pool <poolname>

   specify which pool the read balancer should adjust

.. option:: --vstart

   prefix upmap and read output with './bin/'

Example
=======

To create a simple map with 16 devices::

  osdmaptool --createsimple 16 osdmap --clobber

To view the result::

  osdmaptool --print osdmap

To view the mappings of placement groups for pool 1::

  osdmaptool osdmap --test-map-pgs-dump --pool 1

  pool 1 pg_num 8
  1.0     [0,2,1]   0
  1.1     [2,0,1]   2
  1.2     [0,1,2]   0
  1.3     [2,0,1]   2
  1.4     [0,2,1]   0
  1.5     [0,2,1]   0
  1.6     [0,1,2]   0
  1.7     [1,0,2]   1
  #osd    count   first   primary   c wt   wt
  osd.0   8       5       5         1      1
  osd.1   8       1       1         1      1
  osd.2   8       2       2         1      1
   in 3
   avg 8 stddev 0 (0x) (expected 2.3094 0.288675x))
   min osd.0 8
   max osd.0 8
  size 0  0
  size 1  0
  size 2  0
  size 3  8

In this output,

#. pool 1 has 8 placement groups, and two tables follow:
#. A table for placement groups. Each row presents a placement group, with columns of:

   * placement group id,
   * acting set, and
   * primary OSD.
#. A table for all OSDs. Each row presents an OSD, with columns of:

   * count of placement groups being mapped to this OSD,
   * count of placement groups where this OSD is the first one in their acting sets,
   * count of placement groups where this OSD is the primary of them,
   * the CRUSH weight of this OSD, and
   * the weight of this OSD.
#. Statistics for the number of placement groups held by the 3 OSDs:

   * average, stddev, stddev/average, expected stddev, expected stddev / average
   * min and max
#. The number of placement groups mapping to n OSDs. In this case, all 8 placement
   groups are mapping to 3 different OSDs.

In a less-balanced cluster, we could have the following output for the statistics of
placement group distribution, whose standard deviation is 1.41421::

  #osd    count   first   primary   c wt        wt
  osd.0   33      9       9         0.0145874   1
  osd.1   34      14      14        0.0145874   1
  osd.2   31      7       7         0.0145874   1
  osd.3   31      13      13        0.0145874   1
  osd.4   30      14      14        0.0145874   1
  osd.5   33      7       7         0.0145874   1
   in 6
   avg 32 stddev 1.41421 (0.0441942x) (expected 5.16398 0.161374x))
   min osd.4 30
   max osd.1 34
  size 0  0
  size 1  0
  size 2  0
  size 3  64

To simulate the active balancer in upmap mode::

  osdmaptool --upmap upmaps.out --upmap-active --upmap-deviation 6 --upmap-max 11 osdmap

  osdmaptool: osdmap file 'osdmap'
  writing upmap command output to: upmaps.out
  checking for upmap cleanups
  upmap, max-count 11, max deviation 6
  pools movies photos metadata data
  prepared 11/11 changes
  Time elapsed 0.00310404 secs
  pools movies photos metadata data
  prepared 11/11 changes
  Time elapsed 0.00283402 secs
  pools data metadata movies photos
  prepared 11/11 changes
  Time elapsed 0.003122 secs
  pools photos metadata data movies
  prepared 11/11 changes
  Time elapsed 0.00324372 secs
  pools movies metadata data photos
  prepared 1/11 changes
  Time elapsed 0.00222609 secs
  pools data movies photos metadata
  prepared 0/11 changes
  Time elapsed 0.00209916 secs
  Unable to find further optimization, or distribution is already perfect
  osd.0 pgs 41
  osd.1 pgs 42
  osd.2 pgs 42
  osd.3 pgs 41
  osd.4 pgs 46
  osd.5 pgs 39
  osd.6 pgs 39
  osd.7 pgs 43
  osd.8 pgs 41
  osd.9 pgs 46
  osd.10 pgs 46
  osd.11 pgs 46
  osd.12 pgs 46
  osd.13 pgs 41
  osd.14 pgs 40
  osd.15 pgs 40
  osd.16 pgs 39
  osd.17 pgs 46
  osd.18 pgs 46
  osd.19 pgs 39
  osd.20 pgs 42
  Total time elapsed 0.0167765 secs, 5 rounds

To simulate the active balancer in read mode, first make sure capacity is balanced
by running the balancer in upmap mode. Then, balance the reads on a replicated pool
with::

  osdmaptool osdmap --read read.out --read-pool <pool name>

  osdmaptool: osdmap file 'osdmap'
  writing upmap command output to: read.out

  ---------- BEFORE ------------
  osd.0 | primary affinity: 1 | number of prims: 3
  osd.1 | primary affinity: 1 | number of prims: 10
  osd.2 | primary affinity: 1 | number of prims: 3

  read_balance_score of 'cephfs.a.meta': 1.88


  ---------- AFTER ------------
  osd.0 | primary affinity: 1 | number of prims: 5
  osd.1 | primary affinity: 1 | number of prims: 5
  osd.2 | primary affinity: 1 | number of prims: 6

  read_balance_score of 'cephfs.a.meta': 1.13


  num changes: 5

Availability
============

**osdmaptool** is part of Ceph, a massively scalable, open-source, distributed storage system. Please
refer to the Ceph documentation at https://docs.ceph.com for more
information.


See also
========

:doc:`ceph <ceph>`\(8),
:doc:`crushtool <crushtool>`\(8)