.. index:: control, commands

==================
 Control Commands
==================


Monitor Commands
================

Monitor commands are issued using the ``ceph`` utility::

    ceph [-m monhost] {command}

The command is usually (though not always) of the form::

    ceph {subsystem} {command}

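For example (an illustrative sampling; the exact set of subsystems and commands
depends on your release), the ``osd``, ``mon``, and ``pg`` subsystems each
provide a ``stat`` command::

    ceph osd stat
    ceph mon stat
    ceph pg stat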

System Commands
===============

Execute the following to display the current cluster status. ::

    ceph -s
    ceph status

Execute the following to display a running summary of cluster status
and major events. ::

    ceph -w

Execute the following to show the monitor quorum, including which monitors are
participating and which one is the leader. ::

    ceph mon stat
    ceph quorum_status

Execute the following to query the status of a single monitor, including whether
or not it is in the quorum. ::

    ceph tell mon.[id] mon_status

where the value of ``[id]`` can be determined, e.g., from ``ceph -s``.


Authentication Subsystem
========================

To add a keyring for an OSD, execute the following::

    ceph auth add {osd} {--in-file|-i} {path-to-osd-keyring}

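For example, to add the key for a hypothetical ``osd.0`` from a keyring at a
conventional location (the path is illustrative, and capabilities may also need
to be supplied on the command line or within the keyring file)::

    ceph auth add osd.0 -i /var/lib/ceph/osd/ceph-0/keyring
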
To list the cluster's keys and their capabilities, execute the following::

    ceph auth ls


Placement Group Subsystem
=========================

To display the statistics for all placement groups (PGs), execute the following::

    ceph pg dump [--format {format}]

The valid formats are ``plain`` (default), ``json``, ``json-pretty``, ``xml``, and ``xml-pretty``.
When implementing monitoring and other tools, it is best to use ``json`` format.
JSON parsing is more deterministic than the human-oriented ``plain``, and the layout is much
less variable from release to release. The ``jq`` utility can be invaluable when extracting
data from JSON output.

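As a sketch of such extraction, the following prints each PG's ID and state,
assuming a recent release where the JSON output nests PG statistics under
``pg_map`` (the exact layout can vary between releases)::

    ceph pg dump --format json | jq -r '.pg_map.pg_stats[] | "\(.pgid) \(.state)"'
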
To display the statistics for all placement groups stuck in a specified state,
execute the following::

    ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format {format}] [-t|--threshold {seconds}]


``--format`` may be ``plain`` (default), ``json``, ``json-pretty``, ``xml``, or ``xml-pretty``.

``--threshold`` defines how many seconds "stuck" is (default: 300).

**Inactive** Placement groups cannot process reads or writes because they are waiting for an OSD
with the most up-to-date data to come back.

**Unclean** Placement groups contain objects that are not replicated the desired number
of times. They should be recovering.

**Stale** Placement groups are in an unknown state - the OSDs that host them have not
reported to the monitor cluster in a while (configured by
``mon_osd_report_timeout``).

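For example, to list in JSON the placement groups that have been stuck ``stale``
for at least ten minutes (the threshold value here is illustrative)::

    ceph pg dump_stuck stale --format json-pretty --threshold 600
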
Delete "lost" objects, or revert them to their prior state: either a previous
version, or deletion if they were just created. ::

    ceph pg {pgid} mark_unfound_lost revert|delete


.. _osd-subsystem:

OSD Subsystem
=============

Query OSD subsystem status. ::

    ceph osd stat

Write a copy of the most recent OSD map to a file. See
:ref:`osdmaptool <osdmaptool>`. ::

    ceph osd getmap -o file

Write a copy of the crush map from the most recent OSD map to
file. ::

    ceph osd getcrushmap -o file

The foregoing is functionally equivalent to ::

    ceph osd getmap -o /tmp/osdmap
    osdmaptool /tmp/osdmap --export-crush file

Dump the OSD map. Valid formats for ``-f`` are ``plain``, ``json``, ``json-pretty``,
``xml``, and ``xml-pretty``. If no ``--format`` option is given, the OSD map is
dumped as plain text. As above, JSON format is best for tools, scripting, and other automation. ::

    ceph osd dump [--format {format}]

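As a sketch of scripting against the JSON form (the field names assumed here
are those of recent releases), the following lists each OSD's ID together with
its ``up`` flag and override weight::

    ceph osd dump --format json | jq '.osds[] | {id: .osd, up: .up, weight: .weight}'
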
Dump the OSD map as a tree with one line per OSD containing weight
and state. ::

    ceph osd tree [--format {format}]

Find out where a specific object is or would be stored in the system::

    ceph osd map <pool-name> <object-name>

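For example, with a hypothetical pool named ``rbd`` and an object named
``myobject`` (the object does not need to exist for the mapping to be
computed)::

    ceph osd map rbd myobject
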
Add or move a new item (OSD) with the given id/name/weight at the specified
location. ::

    ceph osd crush set {id} {weight} [{loc1} [{loc2} ...]]

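For example, to place a hypothetical ``osd.0`` with a CRUSH weight of 1.0 under
``host=node1`` in the ``default`` root (the bucket names are illustrative)::

    ceph osd crush set osd.0 1.0 root=default host=node1
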
Remove an existing item (OSD) from the CRUSH map. ::

    ceph osd crush remove {name}

Remove an existing bucket from the CRUSH map. ::

    ceph osd crush remove {bucket-name}

Move an existing bucket from one position in the hierarchy to another. ::

    ceph osd crush move {id} {loc1} [{loc2} ...]

Set the weight of the item given by ``{name}`` to ``{weight}``. ::

    ceph osd crush reweight {name} {weight}

Mark an OSD as ``lost``. This may result in permanent data loss. Use with caution. ::

    ceph osd lost {id} [--yes-i-really-mean-it]

Create a new OSD. If no UUID is given, it will be set automatically when the OSD
starts up. ::

    ceph osd create [{uuid}]

Remove the given OSD(s). ::

    ceph osd rm [{id}...]

Query the current ``max_osd`` parameter in the OSD map. ::

    ceph osd getmaxosd

Import the given crush map. ::

    ceph osd setcrushmap -i file

Set the ``max_osd`` parameter in the OSD map. This defaults to 10000 now so
most admins will never need to adjust this. ::

    ceph osd setmaxosd {num}

Mark OSD ``{osd-num}`` down. ::

    ceph osd down {osd-num}

Mark OSD ``{osd-num}`` out of the distribution (i.e. allocated no data). ::

    ceph osd out {osd-num}

Mark ``{osd-num}`` in the distribution (i.e. allocated data). ::

    ceph osd in {osd-num}

Set or clear the pause flags in the OSD map. If set, no IO requests
will be sent to any OSD. Clearing the flags via unpause results in
resending pending requests. ::

    ceph osd pause
    ceph osd unpause

Set the override weight (reweight) of ``{osd-num}`` to ``{weight}``. Two OSDs with the
same weight will receive roughly the same number of I/O requests and
store approximately the same amount of data. ``ceph osd reweight``
sets an override weight on the OSD. This value is in the range 0 to 1,
and forces CRUSH to re-place (1-weight) of the data that would
otherwise live on this drive. It does not change weights assigned
to the buckets above the OSD in the crush map, and is a corrective
measure in case the normal CRUSH distribution is not working out quite
right. For instance, if one of your OSDs is at 90% and the others are
at 50%, you could reduce this weight to compensate. ::

    ceph osd reweight {osd-num} {weight}

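For example, if a hypothetical ``osd.123`` is markedly fuller than its peers,
the following would direct roughly 20% of the data that would otherwise land on
it elsewhere::

    ceph osd reweight 123 0.8
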
Balance OSD fullness by reducing the override weight of OSDs which are
overly utilized. Note that these override aka ``reweight`` values
default to 1.00000 and are relative only to each other; they are not absolute.
It is crucial to distinguish them from CRUSH weights, which reflect the
absolute capacity of a bucket in TiB. By default this command adjusts
override weight on OSDs which have + or - 20% of the average utilization,
but if you include a ``threshold`` that percentage will be used instead. ::

    ceph osd reweight-by-utilization [threshold [max_change [max_osds]]] [--no-increasing]

To limit the step by which any OSD's reweight will be changed, specify
``max_change`` which defaults to 0.05. To limit the number of OSDs that will
be adjusted, specify ``max_osds`` as well; the default is 4. Increasing these
parameters can speed leveling of OSD utilization, at the potential cost of
greater impact on client operations due to more data moving at once.

To determine which and how many PGs and OSDs will be affected by a given invocation
you can test before executing. ::

    ceph osd test-reweight-by-utilization [threshold [max_change max_osds]] [--no-increasing]

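For example, a dry run using a 110% utilization threshold, a maximum per-OSD
change of 0.03, and at most 16 OSDs adjusted (all values here are illustrative)
might look like this::

    ceph osd test-reweight-by-utilization 110 0.03 16 --no-increasing
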
Adding ``--no-increasing`` to either command prevents increasing any
override weights that are currently < 1.00000. This can be useful when
you are balancing in a hurry to remedy ``full`` or ``nearfull`` OSDs or
when some OSDs are being evacuated or slowly brought into service.

Deployments utilizing Nautilus (or later revisions of Luminous and Mimic)
that have no pre-Luminous clients may wish instead to enable the
``balancer`` module for ``ceph-mgr``.

Add/remove an IP address to/from the blocklist. When adding an address,
you can specify how long it should be blocklisted in seconds; otherwise,
it will default to 1 hour. A blocklisted address is prevented from
connecting to any OSD. Blocklisting is most often used to prevent a
lagging metadata server from making bad changes to data on the OSDs.

These commands are mostly only useful for failure testing, as
blocklists are normally maintained automatically and shouldn't need
manual intervention. ::

    ceph osd blocklist add ADDRESS[:source_port] [TIME]
    ceph osd blocklist rm ADDRESS[:source_port]

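For example, to blocklist a hypothetical client address for 20 minutes and then
remove the entry again::

    ceph osd blocklist add 192.0.2.15 1200
    ceph osd blocklist rm 192.0.2.15
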
Creates/deletes a snapshot of a pool. ::

    ceph osd pool mksnap {pool-name} {snap-name}
    ceph osd pool rmsnap {pool-name} {snap-name}

Creates/deletes/renames a storage pool. ::

    ceph osd pool create {pool-name} [pg_num [pgp_num]]
    ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
    ceph osd pool rename {old-name} {new-name}

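For example, to create a hypothetical pool named ``mypool`` with 64 placement
groups and later rename it (the names and PG count are illustrative)::

    ceph osd pool create mypool 64 64
    ceph osd pool rename mypool mynewpool
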
Changes a pool setting. ::

    ceph osd pool set {pool-name} {field} {value}

Valid fields are:

    * ``size``: Sets the number of copies of data in the pool.
    * ``pg_num``: The placement group number.
    * ``pgp_num``: Effective number when calculating pg placement.
    * ``crush_rule``: rule number for mapping placement.

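For example, to keep three copies of each object in a hypothetical pool named
``mypool``::

    ceph osd pool set mypool size 3
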
Get the value of a pool setting. ::

    ceph osd pool get {pool-name} {field}

Valid fields are:

    * ``pg_num``: The placement group number.
    * ``pgp_num``: Effective number of placement groups when calculating placement.


Sends a scrub command to OSD ``{osd-num}``. To send the command to all OSDs, use ``*``. ::

    ceph osd scrub {osd-num}

Sends a repair command to OSD.N. To send the command to all OSDs, use ``*``. ::

    ceph osd repair N

Runs a simple throughput benchmark against OSD.N, writing ``TOTAL_DATA_BYTES``
in write requests of ``BYTES_PER_WRITE`` each. By default, the test
writes 1 GB in total in 4-MB increments.
The benchmark is non-destructive and will not overwrite existing live
OSD data, but might temporarily affect the performance of clients
concurrently accessing the OSD. ::

    ceph tell osd.N bench [TOTAL_DATA_BYTES] [BYTES_PER_WRITE]

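For example, to write roughly 100 MB to a hypothetical ``osd.0`` in 1 MB
requests (both sizes are given explicitly in bytes and are illustrative)::

    ceph tell osd.0 bench 104857600 1048576
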
To clear an OSD's caches between benchmark runs, use the 'cache drop' command ::

    ceph tell osd.N cache drop

To get the cache statistics of an OSD, use the 'cache status' command ::

    ceph tell osd.N cache status

MDS Subsystem
=============

Change configuration parameters on a running mds. ::

    ceph tell mds.{mds-id} config set {setting} {value}

For example, to enable debug messages::

    ceph tell mds.0 config set debug_ms 1

Display the status of all metadata servers::

    ceph mds stat

Mark the active MDS as failed, triggering failover to a standby if present::

    ceph mds fail 0

.. todo:: ``ceph mds`` subcommands missing docs: set, dump, getmap, stop, setmap


Mon Subsystem
=============

Show monitor stats::

    ceph mon stat

    e2: 3 mons at {a=127.0.0.1:40000/0,b=127.0.0.1:40001/0,c=127.0.0.1:40002/0}, election epoch 6, quorum 0,1,2 a,b,c


The ``quorum`` list at the end lists monitor nodes that are part of the current quorum.

This is also available more directly::

    ceph quorum_status -f json-pretty

.. code-block:: javascript

    {
      "election_epoch": 6,
      "quorum": [
        0,
        1,
        2
      ],
      "quorum_names": [
        "a",
        "b",
        "c"
      ],
      "quorum_leader_name": "a",
      "monmap": {
        "epoch": 2,
        "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
        "modified": "2016-12-26 14:42:09.288066",
        "created": "2016-12-26 14:42:03.573585",
        "features": {
          "persistent": [
            "kraken"
          ],
          "optional": []
        },
        "mons": [
          {
            "rank": 0,
            "name": "a",
            "addr": "127.0.0.1:40000\/0",
            "public_addr": "127.0.0.1:40000\/0"
          },
          {
            "rank": 1,
            "name": "b",
            "addr": "127.0.0.1:40001\/0",
            "public_addr": "127.0.0.1:40001\/0"
          },
          {
            "rank": 2,
            "name": "c",
            "addr": "127.0.0.1:40002\/0",
            "public_addr": "127.0.0.1:40002\/0"
          }
        ]
      }
    }


The above will block until a quorum is reached.

For a status of just a single monitor::

    ceph tell mon.[name] mon_status

where the value of ``[name]`` can be taken from ``ceph quorum_status``. Sample
output::

    {
      "name": "b",
      "rank": 1,
      "state": "peon",
      "election_epoch": 6,
      "quorum": [
        0,
        1,
        2
      ],
      "features": {
        "required_con": "9025616074522624",
        "required_mon": [
          "kraken"
        ],
        "quorum_con": "1152921504336314367",
        "quorum_mon": [
          "kraken"
        ]
      },
      "outside_quorum": [],
      "extra_probe_peers": [],
      "sync_provider": [],
      "monmap": {
        "epoch": 2,
        "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
        "modified": "2016-12-26 14:42:09.288066",
        "created": "2016-12-26 14:42:03.573585",
        "features": {
          "persistent": [
            "kraken"
          ],
          "optional": []
        },
        "mons": [
          {
            "rank": 0,
            "name": "a",
            "addr": "127.0.0.1:40000\/0",
            "public_addr": "127.0.0.1:40000\/0"
          },
          {
            "rank": 1,
            "name": "b",
            "addr": "127.0.0.1:40001\/0",
            "public_addr": "127.0.0.1:40001\/0"
          },
          {
            "rank": 2,
            "name": "c",
            "addr": "127.0.0.1:40002\/0",
            "public_addr": "127.0.0.1:40002\/0"
          }
        ]
      }
    }

A dump of the monitor state::

    ceph mon dump

    dumped monmap epoch 2
    epoch 2
    fsid ba807e74-b64f-4b72-b43f-597dfe60ddbc
    last_changed 2016-12-26 14:42:09.288066
    created 2016-12-26 14:42:03.573585
    0: 127.0.0.1:40000/0 mon.a
    1: 127.0.0.1:40001/0 mon.b
    2: 127.0.0.1:40002/0 mon.c
