======================
 Monitoring a Cluster
======================

After you have a running cluster, you can use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status, and metadata server status.

Using the command line
======================

Interactive mode
----------------

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example:

.. prompt:: bash $

   ceph

.. prompt:: ceph>
   :prompts: ceph>

   health
   status
   quorum_status
   mon stat

Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring when
you installed the cluster, you can specify their locations to the ``ceph`` tool
by running the following command:

.. prompt:: bash $

   ceph -c /path/to/conf -k /path/to/keyring health

Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading or writing data,
you should check your cluster's status.

To check a cluster's status, run the following command:

.. prompt:: bash $

   ceph status

Alternatively, you can run the following command:

.. prompt:: bash $

   ceph -s

In interactive mode, this operation is performed by typing ``status`` and
pressing **Enter**:

.. prompt:: ceph>
   :prompts: ceph>

   status

Ceph will print the cluster status. For example, a tiny Ceph "demonstration
cluster" that is running one instance of each service (monitor, manager, and
OSD) might print the following:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


How Ceph Calculates Data Usage
------------------------------

The ``usage`` value reflects the *actual* amount of raw storage used. The ``xxx
GB / xxx GB`` value means the amount available (the lesser number) out of the
overall storage capacity of the cluster. The notional number reflects the size
of the stored data before it is replicated, cloned, or snapshotted. Therefore,
the amount of data actually stored typically exceeds the notional amount
stored, because Ceph creates replicas of the data and may also use storage
capacity for cloning and snapshotting.


Watching a Cluster
==================

Each daemon in the Ceph cluster maintains a log of events, and the Ceph cluster
itself maintains a *cluster log* that records high-level events about the
entire Ceph cluster. These events are logged to disk on monitor servers (in
the default location ``/var/log/ceph/ceph.log``), and they can be monitored via
the command line.

To follow the cluster log, run the following command:

.. prompt:: bash $

   ceph -w

Ceph will print the status of the system, followed by each log message as it is
added. For example:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


  2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
  2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
  2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available

Instead of printing log lines as they are added, you might want to print only
the most recent lines. Run ``ceph log last [n]`` to see the most recent ``n``
lines from the cluster log.

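For example, to print only the ten most recent cluster log lines (the count
used here is arbitrary), you might run:

.. prompt:: bash $

   ceph log last 10
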

Monitoring Health Checks
========================

Ceph continuously runs various *health checks*. When a health check fails,
this failure is reflected in the output of ``ceph status`` and ``ceph
health``. The cluster log receives messages that indicate when a check has
failed and when the cluster has recovered.

For example, when an OSD goes down, the ``health`` section of the status
output is updated as follows:

::

  health: HEALTH_WARN
          1 osds down
          Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

1e59de90 164At the same time, cluster log messages are emitted to record the failure of the
c07f9fc5
FG
165health checks:
166
167::
168
169 2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
170 2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)
171
172When the OSD comes back online, the cluster log records the cluster's return
1e59de90 173to a healthy state:
c07f9fc5
FG
174
175::
176
177 2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
178 2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
179 2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy
180
Network Performance Checks
--------------------------

Ceph OSDs send heartbeat ping messages to each other in order to monitor daemon
availability and network performance. If a single delayed response is detected,
this might indicate nothing more than a busy OSD. But if multiple delays
between distinct pairs of OSDs are detected, this might indicate a failed
network switch, a NIC failure, or a layer 1 failure.

By default, a heartbeat time that exceeds 1 second (1000 milliseconds) raises a
health check (a ``HEALTH_WARN``). For example:

::

  HEALTH_WARN Slow OSD heartbeats on back (longest 1118.001ms)

In the output of the ``ceph health detail`` command, you can see which OSDs are
experiencing delays and how long the delays are. The output of ``ceph health
detail`` is limited to ten lines. Here is an example of the output you can
expect from the ``ceph health detail`` command::

  [WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1118.001ms)
      Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.1 [dc1,rack1] 1118.001 msec possibly improving
      Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.2 [dc1,rack2] 1030.123 msec
      Slow OSD heartbeats on back from osd.2 [dc1,rack2] to osd.1 [dc1,rack1] 1015.321 msec
      Slow OSD heartbeats on back from osd.1 [dc1,rack1] to osd.0 [dc1,rack1] 1010.456 msec

To see more detail and to collect a complete dump of network performance
information, use the ``dump_osd_network`` command. This command is usually sent
to a Ceph Manager Daemon, but it can be used to collect information about a
specific OSD's interactions by sending it to that OSD (see the example after
the output below). The default threshold for a slow heartbeat is 1 second
(1000 milliseconds), but this can be overridden by providing a number of
milliseconds as an argument.

To show all network performance data with a specified threshold of 0, send the
following command to the mgr:

.. prompt:: bash $

   ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0

::

    {
        "threshold": 0,
        "entries": [
            {
                "last update": "Wed Sep 4 17:04:49 2019",
                "stale": false,
                "from osd": 2,
                "to osd": 0,
                "interface": "front",
                "average": {
                    "1min": 1.023,
                    "5min": 0.860,
                    "15min": 0.883
                },
                "min": {
                    "1min": 0.818,
                    "5min": 0.607,
                    "15min": 0.607
                },
                "max": {
                    "1min": 1.164,
                    "5min": 1.173,
                    "15min": 1.544
                },
                "last": 0.924
            },
            {
                "last update": "Wed Sep 4 17:04:49 2019",
                "stale": false,
                "from osd": 2,
                "to osd": 0,
                "interface": "back",
                "average": {
                    "1min": 0.968,
                    "5min": 0.897,
                    "15min": 0.830
                },
                "min": {
                    "1min": 0.860,
                    "5min": 0.563,
                    "15min": 0.502
                },
                "max": {
                    "1min": 1.171,
                    "5min": 1.216,
                    "15min": 1.456
                },
                "last": 0.845
            },
            {
                "last update": "Wed Sep 4 17:04:48 2019",
                "stale": false,
                "from osd": 0,
                "to osd": 1,
                "interface": "front",
                "average": {
                    "1min": 0.965,
                    "5min": 0.811,
                    "15min": 0.850
                },
                "min": {
                    "1min": 0.650,
                    "5min": 0.488,
                    "15min": 0.466
                },
                "max": {
                    "1min": 1.252,
                    "5min": 1.252,
                    "15min": 1.362
                },
                "last": 0.791
            },
            ...

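As noted above, the same ``dump_osd_network`` command can be sent to an
individual OSD's admin socket in order to examine only that OSD's heartbeat
peers. A minimal sketch, assuming an OSD with id 0 is running on the local
host and using the same threshold of 0:

.. prompt:: bash $

   ceph daemon osd.0 dump_osd_network 0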


Muting Health Checks
--------------------

Health checks can be muted so that they have no effect on the overall
reported status of the cluster. For example, if the cluster has raised a
single health check and you then mute that health check, the cluster will
report a status of ``HEALTH_OK``. To mute a specific health check, use the
health check code that corresponds to that health check (see
:ref:`health-checks`) and run the following command:

.. prompt:: bash $

   ceph health mute <code>

For example, to mute an ``OSD_DOWN`` health check, run the following command:

.. prompt:: bash $

   ceph health mute OSD_DOWN

Mutes are reported as part of the short and long forms of the ``ceph health``
command's output. For example, in the above scenario, the cluster would report:

.. prompt:: bash $

   ceph health

::

  HEALTH_OK (muted: OSD_DOWN)

.. prompt:: bash $

   ceph health detail

::

  HEALTH_OK (muted: OSD_DOWN)
  (MUTED) OSD_DOWN 1 osds down
      osd.1 is down

A mute can be removed by running the following command:

.. prompt:: bash $

   ceph health unmute <code>

For example:

.. prompt:: bash $

   ceph health unmute OSD_DOWN

A "health mute" can have a TTL (**T**\ime **T**\o **L**\ive)
associated with it: this means that the mute will automatically expire
after a specified period of time. The TTL is specified as an optional
duration argument, as seen in the following examples:

.. prompt:: bash $

   ceph health mute OSD_DOWN 4h    # mute for 4 hours
   ceph health mute MON_DOWN 15m   # mute for 15 minutes

Normally, if a muted health check is resolved (for example, if the OSD that
raised the ``OSD_DOWN`` health check in the example above has come back up),
the mute goes away. If the health check comes back later, it will be reported
in the usual way.

It is possible to make a health mute "sticky": this means that the mute will
remain even if the health check clears. For example, to make a health mute
"sticky", you might run the following command:

.. prompt:: bash $

   ceph health mute OSD_DOWN 1h --sticky   # ignore any/all down OSDs for next hour

Most health mutes disappear if the unhealthy condition that triggered the
health check gets worse. For example, suppose that there is one OSD down and
the health check is muted. In that case, if one or more additional OSDs go
down, then the health mute disappears. This behavior occurs in any health
check with a threshold value.


Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, use the
``df`` command. This command is similar to Linux's ``df`` command. Run the
following command:

.. prompt:: bash $

   ceph df

The output of ``ceph df`` resembles the following::

  --- RAW STORAGE ---
  CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
  ssd    202 GiB  200 GiB  2.0 GiB   2.0 GiB       1.00
  TOTAL  202 GiB  200 GiB  2.0 GiB   2.0 GiB       1.00

  --- POOLS ---
  POOL                   ID  PGS   STORED   (DATA)   (OMAP)  OBJECTS     USED   (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
  device_health_metrics   1    1  242 KiB   15 KiB  227 KiB        4  251 KiB   24 KiB  227 KiB      0    297 GiB            N/A          N/A      4         0 B          0 B
  cephfs.a.meta           2   32  6.8 KiB  6.8 KiB      0 B       22   96 KiB   96 KiB      0 B      0    297 GiB            N/A          N/A     22         0 B          0 B
  cephfs.a.data           3   32      0 B      0 B      0 B        0      0 B      0 B      0 B      0     99 GiB            N/A          N/A      0         0 B          0 B
  test                    4   32   22 MiB   22 MiB   50 KiB      248   19 MiB   19 MiB   50 KiB      0    297 GiB            N/A          N/A    248         0 B          0 B

- **CLASS:** The class of the OSDs' devices (for example, "ssd" or "hdd").
- **SIZE:** The amount of storage capacity managed by the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **USED:** The amount of raw storage consumed by user data (excluding
  BlueStore's database).
- **RAW USED:** The amount of raw storage consumed by user data, internal
  overhead, and reserved capacity.
- **%RAW USED:** The percentage of raw storage used. Watch this number in
  conjunction with ``full ratio`` and ``near full ratio`` to be forewarned when
  your cluster approaches the fullness thresholds. See `Storage Capacity`_.


**POOLS:**

The POOLS section of the output provides a list of pools and the *notional*
usage of each pool. This section of the output **DOES NOT** reflect replicas,
clones, or snapshots. For example, if you store an object with 1 MB of data,
then the notional usage will be 1 MB, but the actual usage might be 2 MB or
more depending on the number of replicas, clones, and snapshots.

- **ID:** The unique identifier of the pool.
- **STORED:** The actual amount of data that the user has stored in a pool.
  This is similar to the USED column in earlier versions of Ceph, but the
  calculations (for BlueStore!) are more precise (in that gaps are properly
  handled).

  - **(DATA):** Usage for RBD (RADOS Block Device), CephFS file data, and RGW
    (RADOS Gateway) object data.
  - **(OMAP):** Key-value pairs. Used primarily by CephFS and RGW (RADOS
    Gateway) for metadata storage.

- **OBJECTS:** The notional number of objects stored per pool (that is, the
  number of objects other than replicas, clones, or snapshots).
- **USED:** The space allocated for a pool over all OSDs. This includes space
  for replication, space for allocation granularity, and space for the overhead
  associated with erasure-coding. Compression savings and object-content gaps
  are also taken into account. However, BlueStore's database is not included in
  the amount reported under USED.

  - **(DATA):** Object usage for RBD (RADOS Block Device), CephFS file data,
    and RGW (RADOS Gateway) object data.
  - **(OMAP):** Object key-value pairs. Used primarily by CephFS and RGW (RADOS
    Gateway) for metadata storage.

- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **QUOTA OBJECTS:** The quota, if any, on the number of objects in the pool.
- **QUOTA BYTES:** The quota, if any, on the number of bytes stored in the pool.
- **DIRTY:** The number of objects in the cache pool that have been written to
  the cache pool but have not yet been flushed to the base pool. This field is
  available only when cache tiering is in use.
- **USED COMPR:** The amount of space allocated for compressed data. This
  includes compressed data in addition to all of the space required for
  replication, allocation granularity, and erasure-coding overhead.
- **UNDER COMPR:** The amount of data that has passed through compression
  (summed over all replicas) and that is worth storing in a compressed form.


.. note:: The numbers in the POOLS section are notional. They do not include
   the number of replicas, clones, or snapshots. As a result, the sum of the
   USED and %USED amounts in the POOLS section of the output will not be equal
   to the sum of the USED and %USED amounts in the RAW section of the output.

.. note:: The MAX AVAIL value is a complicated function of the replication
   factor or the kind of erasure coding used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio`` setting.
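
If you feed these statistics into scripts or external monitoring, note that
``ceph df``, like most ``ceph`` commands, accepts the ``--format`` option. For
example, the following command is expected to emit the same information as
pretty-printed JSON:

.. prompt:: bash $

   ceph df --format json-pretty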


Checking OSD Status
===================

To check if OSDs are ``up`` and ``in``, run the following command:

.. prompt:: bash #

   ceph osd stat

Alternatively, you can run the following command:

.. prompt:: bash #

   ceph osd dump

To view OSDs according to their position in the CRUSH map, run the following
command:

.. prompt:: bash #

   ceph osd tree

The ``ceph osd tree`` command prints a CRUSH tree that displays each host, its
OSDs, whether the OSDs are ``up``, and the weight of the OSDs. Its output
resembles the following:

.. code-block:: bash

   #ID CLASS WEIGHT  TYPE NAME    STATUS REWEIGHT PRI-AFF
   -1        3.00000 pool default
   -3        3.00000 rack mainrack
   -2        3.00000 host osd-host
    0   ssd  1.00000      osd.0   up     1.00000  1.00000
    1   ssd  1.00000      osd.1   up     1.00000  1.00000
    2   ssd  1.00000      osd.2   up     1.00000  1.00000

See `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors, then you need to perform certain
"monitor status" checks. After starting the cluster and before reading or
writing data, you should check quorum status. A quorum must be present when
multiple monitors are running to ensure proper functioning of your Ceph
cluster. Check monitor status regularly in order to ensure that all of the
monitors are running.

To display the monitor map, run the following command:

.. prompt:: bash $

   ceph mon stat

Alternatively, you can run the following command:

.. prompt:: bash $

   ceph mon dump

To check the quorum status for the monitor cluster, run the following command:

.. prompt:: bash $

   ceph quorum_status

Ceph returns the quorum status. For example, a Ceph cluster that consists of
three monitors might return the following:

.. code-block:: javascript

   { "election_epoch": 10,
     "quorum": [
           0,
           1,
           2],
     "quorum_names": [
           "a",
           "b",
           "c"],
     "quorum_leader_name": "a",
     "monmap": { "epoch": 1,
         "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
         "modified": "2011-12-12 13:28:27.505520",
         "created": "2011-12-12 13:28:27.505520",
         "features": {"persistent": [
                           "kraken",
                           "luminous",
                           "mimic"],
                      "optional": []
                     },
         "mons": [
               { "rank": 0,
                 "name": "a",
                 "addr": "127.0.0.1:6789/0",
                 "public_addr": "127.0.0.1:6789/0"},
               { "rank": 1,
                 "name": "b",
                 "addr": "127.0.0.1:6790/0",
                 "public_addr": "127.0.0.1:6790/0"},
               { "rank": 2,
                 "name": "c",
                 "addr": "127.0.0.1:6791/0",
                 "public_addr": "127.0.0.1:6791/0"}
              ]
       }
   }

Checking MDS Status
===================

Metadata servers provide metadata services for CephFS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To check if your
metadata servers are ``up`` and ``active``, run the following command:

.. prompt:: bash $

   ceph mds stat

To display details of the metadata servers, run the following command:

.. prompt:: bash $

   ceph fs dump

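For a more compact, human-readable summary of file system, MDS, and pool
state, you can also try the ``ceph fs status`` command:

.. prompt:: bash $

   ceph fs status
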

Checking Placement Group States
===============================

Placement groups (PGs) map objects to OSDs. PGs are monitored in order to
ensure that they are ``active`` and ``clean``. See `Monitoring OSDs and
Placement Groups`_.

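For a quick, one-line summary of PG states (a minimal check, not a substitute
for the document linked above), you can run:

.. prompt:: bash $

   ceph pg stat
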
.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg

.. _rados-monitoring-using-admin-socket:

Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface. By
default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon via
the admin socket, log in to the host that is running the daemon and run one of
the following two commands:

.. prompt:: bash $

   ceph daemon {daemon-name}
   ceph daemon {path-to-socket-file}

1e59de90 622For example, the following commands are equivalent to each other:
7c673cae 623
39ae355f 624.. prompt:: bash $
7c673cae 625
39ae355f
TL
626 ceph daemon osd.0 foo
627 ceph daemon /var/run/ceph/ceph-osd.0.asok foo
7c673cae 628
1e59de90 629To view the available admin-socket commands, run the following command:
7c673cae 630
39ae355f 631.. prompt:: bash $
7c673cae 632
39ae355f 633 ceph daemon {daemon-name} help
7c673cae 634

Admin-socket commands enable you to view and set your configuration at runtime.
For more on viewing your configuration, see `Viewing a Configuration at
Runtime`_. There are two methods of setting a configuration value at runtime:
(1) using the admin socket, which bypasses the monitor and requires a direct
login to the host in question, and (2) using the ``ceph tell
{daemon-type}.{id} config set`` command, which relies on the monitor and does
not require a direct login.
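
For example, the following two commands are expected to set the same option on
``osd.0``. The first must be run on the host where ``osd.0`` is running, while
the second can be run from any host with monitor access (``debug_osd`` is used
here only as an illustrative option):

.. prompt:: bash $

   ceph daemon osd.0 config set debug_osd 20   # via the admin socket
   ceph tell osd.0 config set debug_osd 20     # via the monitor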

.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#viewing-a-configuration-at-runtime
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity