:orphan:

.. _osdmaptool:

======================================================
 osdmaptool -- ceph osd cluster map manipulation tool
======================================================

.. program:: osdmaptool

Synopsis
========

| **osdmaptool** *mapfilename* [--print] [--createsimple *numosd*
  [--pg-bits *bitsperosd* ] ] [--clobber]
| **osdmaptool** *mapfilename* [--import-crush *crushmap*]
| **osdmaptool** *mapfilename* [--export-crush *crushmap*]
| **osdmaptool** *mapfilename* [--upmap *file*] [--upmap-max *max-optimizations*]
  [--upmap-deviation *max-deviation*] [--upmap-pool *poolname*]
  [--save] [--upmap-active]
| **osdmaptool** *mapfilename* [--upmap-cleanup] [--upmap *file*]


Description
===========

**osdmaptool** is a utility that lets you create, view, and manipulate
OSD cluster maps from the Ceph distributed storage system. Notably, it
lets you extract the embedded CRUSH map or import a new CRUSH map.
It can also simulate the upmap balancer mode so you can get a sense of
what is needed to balance your PGs.


Options
=======

.. option:: --print

   will print a plaintext dump of the map, after
   any modifications are made.

.. option:: --dump <format>

   displays the map in plain text when <format> is 'plain', and falls back to
   'json' if the specified format is not supported. This is an alternative to
   the print option.

.. option:: --clobber

   will allow osdmaptool to overwrite mapfilename if changes are made.

.. option:: --import-crush mapfile

   will load the CRUSH map from mapfile and embed it in the OSD map.

.. option:: --export-crush mapfile

   will extract the CRUSH map from the OSD map and write it to
   mapfile.

.. option:: --createsimple numosd [--pg-bits bitsperosd] [--pgp-bits bits]

   will create a relatively generic OSD map with numosd devices.
   If --pg-bits is specified, the initial placement group counts will
   be set with bitsperosd bits per OSD. That is, the pg_num map
   attribute will be set to numosd shifted left by bitsperosd.
   If --pgp-bits is specified, then the pgp_num map attribute will
   be set to numosd shifted left by bits.

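The shift arithmetic described for ``--pg-bits`` and ``--pgp-bits`` can be sketched as follows. This is only an illustration: it assumes "shifted" means a left bit shift, and the helper name ``simple_map_pg_count`` is hypothetical, not part of osdmaptool.

```python
def simple_map_pg_count(numosd: int, bits: int) -> int:
    """Return numosd shifted left by bits (assumed meaning of "shifted")."""
    return numosd << bits


# Hypothetical inputs: 16 OSDs with 6 bits per OSD.
pg_num = simple_map_pg_count(16, 6)
pgp_num = simple_map_pg_count(16, 6)
print(pg_num, pgp_num)
```

Under this assumption each additional bit doubles the placement group count, so a small change in bitsperosd has a large effect on pg_num.
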
.. option:: --create-from-conf

   creates an osd map with default configurations.

.. option:: --test-map-pgs [--pool poolid] [--range-first <first> --range-last <last>]

   will print out the mappings from placement groups to OSDs.
   If a range is specified, osdmaptool iterates from first to last over the
   files in the directory given as its argument.
   For example: **osdmaptool --test-map-pgs --range-first 0 --range-last 2 osdmap_dir**.
   This will iterate through the files named 0,1,2 in osdmap_dir.

.. option:: --test-map-pgs-dump [--pool poolid] [--range-first <first> --range-last <last>]

   will print out the summary of all placement groups and the mappings from them to the mapped OSDs.
   If a range is specified, osdmaptool iterates from first to last over the
   files in the directory given as its argument.
   For example: **osdmaptool --test-map-pgs-dump --range-first 0 --range-last 2 osdmap_dir**.
   This will iterate through the files named 0,1,2 in osdmap_dir.

.. option:: --test-map-pgs-dump-all [--pool poolid] [--range-first <first> --range-last <last>]

   will print out the summary of all placement groups and the mappings
   from them to all the OSDs.
   If a range is specified, osdmaptool iterates from first to last over the
   files in the directory given as its argument.
   For example: **osdmaptool --test-map-pgs-dump-all --range-first 0 --range-last 2 osdmap_dir**.
   This will iterate through the files named 0,1,2 in osdmap_dir.

.. option:: --test-random

   does a random mapping of placement groups to the OSDs.

.. option:: --test-map-pg <pgid>

   map a particular placement group (specified by pgid) to the OSDs.

.. option:: --test-map-object <objectname> [--pool <poolid>]

   map a particular placement group (specified by objectname) to the OSDs.

.. option:: --test-crush [--range-first <first> --range-last <last>]

   map placement groups to acting OSDs.
   If a range is specified, osdmaptool iterates from first to last over the
   files in the directory given as its argument.
   For example: **osdmaptool --test-crush --range-first 0 --range-last 2 osdmap_dir**.
   This will iterate through the files named 0,1,2 in osdmap_dir.

.. option:: --mark-up-in

   mark osds up and in (but do not persist).

.. option:: --mark-out <osdid>

   mark an osd as out (but do not persist)

.. option:: --mark-up <osdid>

   mark an osd as up (but do not persist)

.. option:: --mark-in <osdid>

   mark an osd as in (but do not persist)

.. option:: --tree

   Displays a hierarchical tree of the map.

.. option:: --clear-temp

   clears pg_temp and primary_temp variables.

.. option:: --clean-temps

   clean pg_temps.

.. option:: --health

   dump health checks

.. option:: --with-default-pool

   include default pool when creating map

.. option:: --upmap-cleanup <file>

   clean up pg_upmap[_items] entries, writing commands to <file> [default: - for stdout]

.. option:: --upmap <file>

   calculate pg upmap entries to balance pg layout, writing commands to <file> [default: - for stdout]

.. option:: --upmap-max <max-optimizations>

   set max upmap entries to calculate [default: 10]

.. option:: --upmap-deviation <max-deviation>

   max deviation from target [default: 5]

.. option:: --upmap-pool <poolname>

   restrict upmap balancing to one pool; the option can be repeated for multiple pools

.. option:: --upmap-active

   Act like an active balancer, keep applying changes until balanced

.. option:: --adjust-crush-weight <osdid:weight>[,<osdid:weight>,<...>]

   Change CRUSH weight of <osdid>

.. option:: --save

   write modified osdmap with upmap or crush-adjust changes

.. option:: --read <file>

   calculate pg upmap entries to balance pg primaries

.. option:: --read-pool <poolname>

   specify which pool the read balancer should adjust

.. option:: --vstart

   prefix upmap and read output with './bin/'

Example
=======

To create a simple map with 16 devices::

        osdmaptool --createsimple 16 osdmap --clobber

To view the result::

        osdmaptool --print osdmap

To view the mappings of placement groups for pool 1::

        osdmaptool osdmap --test-map-pgs-dump --pool 1

        pool 1 pg_num 8
        1.0     [0,2,1]       0
        1.1     [2,0,1]       2
        1.2     [0,1,2]       0
        1.3     [2,0,1]       2
        1.4     [0,2,1]       0
        1.5     [0,2,1]       0
        1.6     [0,1,2]       0
        1.7     [1,0,2]       1
        #osd    count   first   primary c wt    wt
        osd.0   8       5       5       1       1
        osd.1   8       1       1       1       1
        osd.2   8       2       2       1       1
         in 3
         avg 8 stddev 0 (0x) (expected 2.3094 0.288675x))
         min osd.0 8
         max osd.0 8
        size 0  0
        size 1  0
        size 2  0
        size 3  8

In this output:

#. Pool 1 has 8 placement groups, and two tables follow.
#. The first table lists placement groups, one per row, with these columns:

   * placement group id,
   * acting set, and
   * primary OSD.

#. The second table lists OSDs, one per row, with these columns:

   * count of placement groups mapped to this OSD,
   * count of placement groups where this OSD is the first one in the acting set,
   * count of placement groups where this OSD is the primary,
   * the CRUSH weight of this OSD, and
   * the weight of this OSD.

#. Statistics on the number of placement groups held by the 3 OSDs follow:

   * average, stddev, stddev/average, expected stddev, expected stddev / average
   * min and max

#. The final lines give the number of placement groups mapped to n OSDs. In this case,
   all 8 placement groups are mapped to 3 different OSDs.

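To make the relationship between the two tables concrete, the per-OSD counts can be rebuilt from the placement group rows. The sketch below is an illustration in Python rather than osdmaptool's own code; the name ``summarize`` is hypothetical, and the input is the placement group table copied from the sample output for pool 1.

```python
from collections import Counter

# Placement group rows from the sample output: pgid, acting set, primary OSD.
PG_TABLE = """\
1.0 [0,2,1] 0
1.1 [2,0,1] 2
1.2 [0,1,2] 0
1.3 [2,0,1] 2
1.4 [0,2,1] 0
1.5 [0,2,1] 0
1.6 [0,1,2] 0
1.7 [1,0,2] 1
"""


def summarize(pg_table: str):
    """Count, per OSD, the PGs mapped to it, led by it, and primaried by it."""
    count, first, primary = Counter(), Counter(), Counter()
    for line in pg_table.splitlines():
        if not line.strip():
            continue
        _pgid, acting_text, primary_text = line.split()
        acting = [int(osd) for osd in acting_text.strip("[]").split(",")]
        for osd in acting:
            count[osd] += 1
        first[acting[0]] += 1
        primary[int(primary_text)] += 1
    return count, first, primary


count, first, primary = summarize(PG_TABLE)
for osd in sorted(count):
    print(f"osd.{osd}\t{count[osd]}\t{first[osd]}\t{primary[osd]}")
```

The counts produced this way agree with the ``count``, ``first`` and ``primary`` columns of the OSD table in the sample.
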
In a less-balanced cluster, we could have the following output for the statistics of
placement group distribution, whose standard deviation is 1.41421::

        #osd    count   first   primary c wt            wt
        osd.0   33      9       9       0.0145874       1
        osd.1   34      14      14      0.0145874       1
        osd.2   31      7       7       0.0145874       1
        osd.3   31      13      13      0.0145874       1
        osd.4   30      14      14      0.0145874       1
        osd.5   33      7       7       0.0145874       1
         in 6
         avg 32 stddev 1.41421 (0.0441942x) (expected 5.16398 0.161374x))
         min osd.4 30
         max osd.1 34
        size 0  0
        size 1  0
        size 2  0
        size 3  64

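The statistics line can be checked by hand from the ``count`` column. The following sketch recomputes the figures for the six-OSD sample. The population standard deviation follows directly from the counts. The "expected" formula, ``sqrt(avg * (1 - 1/n))``, is an assumption chosen because it reproduces the printed values in both samples; it is not taken from the osdmaptool source. The function name is hypothetical.

```python
import math


def pg_distribution_stats(counts):
    """Return (avg, stddev, expected stddev) for a list of per-OSD PG counts."""
    n = len(counts)
    avg = sum(counts) / n
    # Population standard deviation of the per-OSD counts.
    stddev = math.sqrt(sum((c - avg) ** 2 for c in counts) / n)
    # Assumed model: spread expected if PGs landed on OSDs uniformly at random.
    expected = math.sqrt(avg * (1.0 - 1.0 / n))
    return avg, stddev, expected


avg, stddev, expected = pg_distribution_stats([33, 34, 31, 31, 30, 33])
print(f"avg {avg:g} stddev {stddev:.6g} ({stddev / avg:.6g}x) "
      f"(expected {expected:.6g} {expected / avg:.6g}x)")
```

A stddev well below the expected value, as here, suggests that the distribution is tighter than a purely random placement would give.
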
To simulate the active balancer in upmap mode::

        osdmaptool --upmap upmaps.out --upmap-active --upmap-deviation 6 --upmap-max 11 osdmap

        osdmaptool: osdmap file 'osdmap'
        writing upmap command output to: upmaps.out
        checking for upmap cleanups
        upmap, max-count 11, max deviation 6
        pools movies photos metadata data
        prepared 11/11 changes
        Time elapsed 0.00310404 secs
        pools movies photos metadata data
        prepared 11/11 changes
        Time elapsed 0.00283402 secs
        pools data metadata movies photos
        prepared 11/11 changes
        Time elapsed 0.003122 secs
        pools photos metadata data movies
        prepared 11/11 changes
        Time elapsed 0.00324372 secs
        pools movies metadata data photos
        prepared 1/11 changes
        Time elapsed 0.00222609 secs
        pools data movies photos metadata
        prepared 0/11 changes
        Time elapsed 0.00209916 secs
        Unable to find further optimization, or distribution is already perfect
        osd.0 pgs 41
        osd.1 pgs 42
        osd.2 pgs 42
        osd.3 pgs 41
        osd.4 pgs 46
        osd.5 pgs 39
        osd.6 pgs 39
        osd.7 pgs 43
        osd.8 pgs 41
        osd.9 pgs 46
        osd.10 pgs 46
        osd.11 pgs 46
        osd.12 pgs 46
        osd.13 pgs 41
        osd.14 pgs 40
        osd.15 pgs 40
        osd.16 pgs 39
        osd.17 pgs 46
        osd.18 pgs 46
        osd.19 pgs 39
        osd.20 pgs 42
        Total time elapsed 0.0167765 secs, 5 rounds

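As a rough sanity check on the final per-OSD counts, one can compare each OSD with the mean. This sketch assumes all OSDs have equal CRUSH weight, so that the mean serves as every OSD's target; the balancer's own targets may be weighted and may therefore differ. The names used are hypothetical.

```python
# "osd.N pgs M" values from the sample output, in OSD order (osd.0 .. osd.20).
PGS_PER_OSD = [41, 42, 42, 41, 46, 39, 39, 43, 41, 46, 46,
               46, 46, 41, 40, 40, 39, 46, 46, 39, 42]


def max_deviation(pgs_per_osd):
    """Largest absolute distance of any OSD's PG count from the mean."""
    target = sum(pgs_per_osd) / len(pgs_per_osd)
    return max(abs(pgs - target) for pgs in pgs_per_osd)


deviation = max_deviation(PGS_PER_OSD)
print(f"largest deviation from mean: {deviation:.2f} PGs")
print("within --upmap-deviation 6:", deviation <= 6)
```

Under the equal-weight assumption, every OSD in the sample ends within the requested deviation of 6.
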
To simulate the active balancer in read mode, first make sure capacity is balanced
by running the balancer in upmap mode. Then, balance the reads on a replicated pool with::

        osdmaptool osdmap --read read.out --read-pool <pool name>

        ./bin/osdmaptool: osdmap file 'om'
        writing upmap command output to: read.out

        ---------- BEFORE ------------
        osd.0 | primary affinity: 1 | number of prims: 3
        osd.1 | primary affinity: 1 | number of prims: 10
        osd.2 | primary affinity: 1 | number of prims: 3

        read_balance_score of 'cephfs.a.meta': 1.88


        ---------- AFTER ------------
        osd.0 | primary affinity: 1 | number of prims: 5
        osd.1 | primary affinity: 1 | number of prims: 5
        osd.2 | primary affinity: 1 | number of prims: 6

        read_balance_score of 'cephfs.a.meta': 1.13


        num changes: 5

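The scores in this sample are consistent with a simple ratio: the number of primaries on the busiest OSD divided by the average number of primaries per OSD. The sketch below illustrates that observation for the case where all primary affinities are 1. It is not presented as the formula Ceph uses, and ``read_balance_ratio`` is a hypothetical name.

```python
def read_balance_ratio(prims_per_osd):
    """Primaries on the busiest OSD divided by the average per OSD."""
    avg = sum(prims_per_osd) / len(prims_per_osd)
    return max(prims_per_osd) / avg


before = read_balance_ratio([3, 10, 3])  # 1.875; the sample shows 1.88
after = read_balance_ratio([5, 5, 6])    # 1.125; the sample shows 1.13
print(before, after)
```

A ratio of 1 would mean that every OSD holds the same number of primaries, so values closer to 1 indicate better read balance under this reading.
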
Availability
============

**osdmaptool** is part of Ceph, a massively scalable, open-source, distributed storage system. Please
refer to the Ceph documentation at https://docs.ceph.com for more
information.


See also
========

:doc:`ceph <ceph>`\(8),
:doc:`crushtool <crushtool>`\(8)