======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Using the command line
======================

Interactive mode
----------------

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

   ceph
   ceph> health
   ceph> status
   ceph> quorum_status
   ceph> mon stat

Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

   ceph -c /path/to/conf -k /path/to/keyring health

Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's status first.

To check a cluster's status, execute the following::

   ceph status

Or::

   ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

   ceph> status

Ceph will print the cluster status. For example, a tiny Ceph demonstration
cluster with one of each service may print the following:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   of the overall storage capacity of the cluster. The notional number reflects
   the size of the stored data before it is replicated, cloned or snapshotted.
   Therefore, the amount of data actually stored typically exceeds the notional
   amount stored, because Ceph creates replicas of the data and may also use
   storage capacity for cloning and snapshotting.
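   For example, in a pool with three-way replication, storing 100 MB of data
   consumes roughly 300 MB of raw capacity, plus a small amount of internal
   overhead.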

Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command

::

   ceph -w

Ceph will print the status of the system, followed by each log message as it
is emitted. For example:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


  2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
  2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
  2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available

In addition to using ``ceph -w`` to print log lines as they are emitted,
use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
log.

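For example, to display the ten most recent cluster log entries::

   ceph log last 10
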
Monitoring Health Checks
========================

Ceph continuously runs various *health checks* against its own status. When
a health check fails, this is reflected in the output of ``ceph status`` (or
``ceph health``). In addition, messages are sent to the cluster log to
indicate when a check fails, and when the cluster recovers.

For example, when an OSD goes down, the ``health`` section of the status
output may be updated as follows:

::

  health: HEALTH_WARN
          1 osds down
          Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

At this time, cluster log messages are also emitted to record the failure of the
health checks:

::

  2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
  2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster's return
to a healthy state:

::

  2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
  2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
  2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy

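To list every health check that is currently raised, along with any per-item
detail, the ``ceph health detail`` command can be used::

   ceph health detail
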
Network Performance Checks
--------------------------

Ceph OSDs send heartbeat ping messages amongst themselves to monitor daemon availability. We
also use the response times to monitor network performance.
While it is possible that a busy OSD could delay a ping response, we can assume
that if a network switch fails, multiple delays will be detected between distinct pairs of OSDs.

By default we will warn about ping times which exceed 1 second (1000 milliseconds).

::

   HEALTH_WARN Slow OSD heartbeats on back (longest 1118.001ms)

The health detail will show which pairs of OSDs are seeing the delays and by how much. There is a limit of 10
detail line items.

::

   [WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1118.001ms)
       Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.1 [dc1,rack1] 1118.001 msec possibly improving
       Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.2 [dc1,rack2] 1030.123 msec
       Slow OSD heartbeats on back from osd.2 [dc1,rack2] to osd.1 [dc1,rack1] 1015.321 msec
       Slow OSD heartbeats on back from osd.1 [dc1,rack1] to osd.0 [dc1,rack1] 1010.456 msec

To see even more detail and a complete dump of network performance information, the ``dump_osd_network`` command can be used. Typically, this would be
sent to a mgr, but it can be limited to a particular OSD's interactions by issuing it to any OSD. The current threshold, which defaults to 1 second
(1000 milliseconds), can be overridden as an argument in milliseconds.

The following command will show all gathered network performance data by specifying a threshold of 0 and sending the command to the mgr.

::

  $ ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0
  {
      "threshold": 0,
      "entries": [
          {
              "last update": "Wed Sep 4 17:04:49 2019",
              "stale": false,
              "from osd": 2,
              "to osd": 0,
              "interface": "front",
              "average": {
                  "1min": 1.023,
                  "5min": 0.860,
                  "15min": 0.883
              },
              "min": {
                  "1min": 0.818,
                  "5min": 0.607,
                  "15min": 0.607
              },
              "max": {
                  "1min": 1.164,
                  "5min": 1.173,
                  "15min": 1.544
              },
              "last": 0.924
          },
          {
              "last update": "Wed Sep 4 17:04:49 2019",
              "stale": false,
              "from osd": 2,
              "to osd": 0,
              "interface": "back",
              "average": {
                  "1min": 0.968,
                  "5min": 0.897,
                  "15min": 0.830
              },
              "min": {
                  "1min": 0.860,
                  "5min": 0.563,
                  "15min": 0.502
              },
              "max": {
                  "1min": 1.171,
                  "5min": 1.216,
                  "15min": 1.456
              },
              "last": 0.845
          },
          {
              "last update": "Wed Sep 4 17:04:48 2019",
              "stale": false,
              "from osd": 0,
              "to osd": 1,
              "interface": "front",
              "average": {
                  "1min": 0.965,
                  "5min": 0.811,
                  "15min": 0.850
              },
              "min": {
                  "1min": 0.650,
                  "5min": 0.488,
                  "15min": 0.466
              },
              "max": {
                  "1min": 1.252,
                  "5min": 1.252,
                  "15min": 1.362
              },
              "last": 0.791
          },
  ...

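To limit the output to a single OSD's peer connections, the same command can be
issued to that OSD's admin socket instead; for example (assuming the default
socket path for ``osd.2``), with a threshold of 500 milliseconds::

  $ ceph daemon /var/run/ceph/ceph-osd.2.asok dump_osd_network 500
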

Muting health checks
--------------------

Health checks can be muted so that they do not affect the overall
reported status of the cluster. Alerts are specified using the health
check code (see :ref:`health-checks`)::

   ceph health mute <code>

For example, if there is a health warning, muting it will make the
cluster report an overall status of ``HEALTH_OK``. To mute an
``OSD_DOWN`` alert, run::

   ceph health mute OSD_DOWN

Mutes are reported as part of the short and long form of the ``ceph health`` command.
For example, in the above scenario, the cluster would report::

   $ ceph health
   HEALTH_OK (muted: OSD_DOWN)
   $ ceph health detail
   HEALTH_OK (muted: OSD_DOWN)
   (MUTED) OSD_DOWN 1 osds down
       osd.1 is down

A mute can be explicitly removed with::

   ceph health unmute <code>

For example::

   ceph health unmute OSD_DOWN

A health check mute may optionally have a TTL (time to live)
associated with it, such that the mute will automatically expire
after the specified period of time has elapsed. The TTL is specified as an optional
duration argument, e.g.::

   ceph health mute OSD_DOWN 4h    # mute for 4 hours
   ceph health mute MON_DOWN 15m   # mute for 15 minutes

Normally, if a muted health alert is resolved (e.g., in the example
above, the OSD comes back up), the mute goes away. If the alert comes
back later, it will be reported in the usual way.

It is possible to make a mute "sticky" such that the mute will remain even if the
alert clears. For example::

   ceph health mute OSD_DOWN 1h --sticky   # ignore any/all down OSDs for next hour

Most health mutes also disappear if the extent of an alert gets worse. For example,
if there is one OSD down, and the alert is muted, the mute will disappear if one
or more additional OSDs go down. This is true for any health alert that involves
a count indicating how much or how many of something is triggering the warning or
error.


Detecting configuration issues
==============================

In addition to the health checks that Ceph continuously runs on its
own status, there are some configuration issues that may only be detected
by an external tool.

Use the `ceph-medic`_ tool to run these additional checks on your Ceph
cluster's configuration.

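A typical invocation (assuming ``ceph-medic`` is installed on a host that can
reach the cluster nodes) runs its full suite of checks::

   ceph-medic check
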
Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

   ceph df

The output of ``ceph df`` looks like this::

  CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
  ssd    202 GiB  200 GiB  2.0 GiB  2.0 GiB        1.00
  TOTAL  202 GiB  200 GiB  2.0 GiB  2.0 GiB        1.00

  --- POOLS ---
  POOL                   ID  PGS  STORED   (DATA)   (OMAP)   OBJECTS  USED     (DATA)   (OMAP)   %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
  device_health_metrics   1    1  242 KiB   15 KiB  227 KiB        4  251 KiB   24 KiB  227 KiB      0    297 GiB  N/A            N/A              4         0 B          0 B
  cephfs.a.meta           2   32  6.8 KiB  6.8 KiB      0 B       22   96 KiB   96 KiB      0 B      0    297 GiB  N/A            N/A             22         0 B          0 B
  cephfs.a.data           3   32      0 B      0 B      0 B        0      0 B      0 B      0 B      0     99 GiB  N/A            N/A              0         0 B          0 B
  test                    4   32   22 MiB   22 MiB   50 KiB      248   19 MiB   19 MiB   50 KiB      0    297 GiB  N/A            N/A            248         0 B          0 B

- **CLASS:** for example, "ssd" or "hdd"
- **SIZE:** The amount of storage capacity managed by the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **USED:** The amount of raw storage consumed by user data (excluding
  BlueStore's database).
- **RAW USED:** The amount of raw storage consumed by user data, internal
  overhead, or reserved capacity.
- **%RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.


**POOLS:**

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **ID:** The pool's unique ID number.
- **STORED:** actual amount of data user/Ceph has stored in a pool. This is
  similar to the USED column in earlier versions of Ceph but the calculations
  (for BlueStore!) are more precise (gaps are properly handled).

  - **(DATA):** usage for RBD (RADOS Block Device), CephFS file data, and RGW
    (RADOS Gateway) object data.
  - **(OMAP):** key-value pairs. Used primarily by CephFS and RGW (RADOS
    Gateway) for metadata storage.

- **OBJECTS:** The notional number of objects stored per pool. "Notional" is
  defined above in the paragraph immediately under "POOLS".
- **USED:** The space allocated for a pool over all OSDs. This includes
  replication, allocation granularity, and erasure-coding overhead. Compression
  savings and object content gaps are also taken into account. BlueStore's
  database is not included in this amount.

  - **(DATA):** object usage for RBD (RADOS Block Device), CephFS file data, and RGW
    (RADOS Gateway) object data.
  - **(OMAP):** object key-value pairs. Used primarily by CephFS and RGW (RADOS
    Gateway) for metadata storage.

- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **QUOTA OBJECTS:** The number of quota objects.
- **QUOTA BYTES:** The number of bytes in the quota objects.
- **DIRTY:** "DIRTY" is meaningful only when cache tiering is in use. If cache
  tiering is in use, the "DIRTY" column lists the number of objects in the
  cache pool that have been written to the cache pool but have not yet been
  flushed to the base pool.
- **USED COMPR:** amount of space allocated for compressed data (i.e. this
  includes compressed data plus all the allocation, replication and erasure
  coding overhead).
- **UNDER COMPR:** amount of data passed through compression (summed over all
  replicas) and beneficial enough to be stored in a compressed form.

7c673cae 423
f67539c2
TL
424.. note:: The numbers in the POOLS section are notional. They are not
425 inclusive of the number of replicas, snapshots or clones. As a result, the
426 sum of the USED and %USED amounts will not add up to the USED and %USED
427 amounts in the RAW section of the output.
7c673cae 428
f67539c2
TL
429.. note:: The MAX AVAIL value is a complicated function of the replication
430 or erasure code used, the CRUSH rule that maps storage to devices, the
431 utilization of those devices, and the configured ``mon_osd_full_ratio``.
7c673cae 432
7c673cae
FG
433
Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing the
following command:

.. prompt:: bash #

   ceph osd stat

Or:

.. prompt:: bash #

   ceph osd dump

You can also view OSDs according to their position in the CRUSH map by
using the following command:

.. prompt:: bash #

   ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up
and their weight:

.. code-block:: bash

   #ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
    -1       3.00000 pool default
    -3       3.00000 rack mainrack
    -2       3.00000 host osd-host
     0   ssd 1.00000         osd.0        up  1.00000 1.00000
     1   ssd 1.00000         osd.1        up  1.00000 1.00000
     2   ssd 1.00000         osd.2        up  1.00000 1.00000

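To list only the OSDs that are currently ``down``, a state filter can be
appended to the command (supported in recent Ceph releases):

.. prompt:: bash #

   ceph osd tree down
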
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing data. A
quorum must be present when multiple monitors are running. You should also check
monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

   ceph mon stat

Or::

   ceph mon dump

To check the quorum status for the monitor cluster, execute the following::

   ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

   { "election_epoch": 10,
     "quorum": [
           0,
           1,
           2],
     "quorum_names": [
           "a",
           "b",
           "c"],
     "quorum_leader_name": "a",
     "monmap": { "epoch": 1,
         "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
         "modified": "2011-12-12 13:28:27.505520",
         "created": "2011-12-12 13:28:27.505520",
         "features": {"persistent": [
                       "kraken",
                       "luminous",
                       "mimic"],
                      "optional": []
                     },
         "mons": [
               { "rank": 0,
                 "name": "a",
                 "addr": "127.0.0.1:6789/0",
                 "public_addr": "127.0.0.1:6789/0"},
               { "rank": 1,
                 "name": "b",
                 "addr": "127.0.0.1:6790/0",
                 "public_addr": "127.0.0.1:6790/0"},
               { "rank": 2,
                 "name": "c",
                 "addr": "127.0.0.1:6791/0",
                 "public_addr": "127.0.0.1:6791/0"}
              ]
     }
   }

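The output format can be selected with the global ``--format`` option; for
example, ``json-pretty`` produces indented JSON::

   ceph quorum_status --format json-pretty
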
Checking MDS Status
===================

Metadata servers provide metadata services for CephFS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

   ceph mds stat

To display details of the metadata cluster, execute the following::

   ceph fs dump

Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

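For a quick summary of placement group states and counts, the ``ceph pg stat``
command can be used; ``ceph pg dump`` prints the full placement group map::

   ceph pg stat
   ceph pg dump
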
.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg

.. _rados-monitoring-using-admin-socket:

Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command::

   ceph daemon {daemon-name}
   ceph daemon {path-to-socket-file}

For example, the following are equivalent::

   ceph daemon osd.0 foo
   ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

   ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

Additionally, you can set configuration values at runtime directly: the
admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
config set``, which relies on the monitor but does not require you to log in
directly to the host in question.

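For example (a sketch; ``osd_max_backfills`` is only an illustrative option), a
configuration value can be read and changed through a daemon's admin socket, or
through the monitors with ``ceph tell``::

   ceph daemon osd.0 config get osd_max_backfills
   ceph daemon osd.0 config set osd_max_backfills 2
   ceph tell osd.0 config set osd_max_backfills 2
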
.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#viewing-a-configuration-at-runtime
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/