======================
Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Using the command line
======================

Interactive mode
----------------

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example:

.. prompt:: bash $

   ceph

.. prompt:: ceph>
   :prompts: ceph>

   health
   status
   quorum_status
   mon stat

Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring,
you may specify their locations:

.. prompt:: bash $

   ceph -c /path/to/conf -k /path/to/keyring health
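
Similarly, if you authenticate as a CephX user other than ``client.admin``,
you can name that user and its keyring explicitly with the ``-n``/``--name``
option (a brief illustration; ``client.ops`` is a hypothetical user):

.. prompt:: bash $

   ceph -n client.ops -k /path/to/keyring health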

Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's status first.

To check a cluster's status, execute the following:

.. prompt:: bash $

   ceph status

Or:

.. prompt:: bash $

   ceph -s

In interactive mode, type ``status`` and press **Enter**:

.. prompt:: ceph>
   :prompts: ceph>

   status

Ceph will print the cluster status. For example, a tiny Ceph demonstration
cluster with one of each service may print the following:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean

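If you are scripting against the cluster, the same information can be emitted
in a machine-readable form with the global ``--format`` option (a brief
illustration; the exact JSON layout varies between releases):

.. prompt:: bash $

   ceph status --format json-pretty
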
.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.
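
   For example, if the demonstration cluster above keeps its data in 3×
   replicated pools, its 546 GB of raw usage corresponds to roughly 182 GB of
   notional (pre-replication) data, because 182 GB × 3 = 546 GB. (This is an
   illustrative approximation; it ignores metadata and allocation overhead.)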


Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command:

.. prompt:: bash $

   ceph -w

Ceph will print the status of the system, followed by each log message as it
is emitted. For example:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


  2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
  2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
  2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available


In addition to using ``ceph -w`` to print log lines as they are emitted,
use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
log.
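
To review recent history without leaving a terminal attached, you can ask for
a fixed number of entries. For example, to show the last ten cluster log
messages:

.. prompt:: bash $

   ceph log last 10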

Monitoring Health Checks
========================

Ceph continuously runs various *health checks* against its own status. When
a health check fails, this is reflected in the output of ``ceph status`` (or
``ceph health``). In addition, messages are sent to the cluster log to
indicate when a check fails, and when the cluster recovers.
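
To list the checks that are currently failing, together with any detail lines
they provide, you can run:

.. prompt:: bash $

   ceph health detail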

For example, when an OSD goes down, the ``health`` section of the status
output may be updated as follows:

::

    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

At this time, cluster log messages are also emitted to record the failure of
the health checks:

::

    2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
    2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster's return
to a healthy state:

::

    2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
    2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
    2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy

Network Performance Checks
--------------------------

Ceph OSDs send heartbeat ping messages amongst themselves to monitor daemon
availability. We also use the response times to monitor network performance.
While it is possible that a busy OSD could delay a ping response, we can
assume that if a network switch fails, multiple delays will be detected
between distinct pairs of OSDs.

By default, we will warn about ping times that exceed 1 second (1000
milliseconds).

::

    HEALTH_WARN Slow OSD heartbeats on back (longest 1118.001ms)

The health detail will list which combinations of OSDs are seeing the delays
and by how much. There is a limit of 10 detail line items.

::

    [WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1118.001ms)
        Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.1 [dc1,rack1] 1118.001 msec possibly improving
        Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.2 [dc1,rack2] 1030.123 msec
        Slow OSD heartbeats on back from osd.2 [dc1,rack2] to osd.1 [dc1,rack1] 1015.321 msec
        Slow OSD heartbeats on back from osd.1 [dc1,rack1] to osd.0 [dc1,rack1] 1010.456 msec

To see even more detail and a complete dump of network performance
information, the ``dump_osd_network`` command can be used. Typically, this
would be sent to a mgr, but it can be limited to a particular OSD's
interactions by issuing it to any OSD. The current threshold, which defaults
to 1 second (1000 milliseconds), can be overridden as an argument in
milliseconds.

The following command will show all gathered network performance data by
specifying a threshold of 0 and sending it to the mgr.

.. prompt:: bash $

   ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0

::

    {
        "threshold": 0,
        "entries": [
            {
                "last update": "Wed Sep 4 17:04:49 2019",
                "stale": false,
                "from osd": 2,
                "to osd": 0,
                "interface": "front",
                "average": {
                    "1min": 1.023,
                    "5min": 0.860,
                    "15min": 0.883
                },
                "min": {
                    "1min": 0.818,
                    "5min": 0.607,
                    "15min": 0.607
                },
                "max": {
                    "1min": 1.164,
                    "5min": 1.173,
                    "15min": 1.544
                },
                "last": 0.924
            },
            {
                "last update": "Wed Sep 4 17:04:49 2019",
                "stale": false,
                "from osd": 2,
                "to osd": 0,
                "interface": "back",
                "average": {
                    "1min": 0.968,
                    "5min": 0.897,
                    "15min": 0.830
                },
                "min": {
                    "1min": 0.860,
                    "5min": 0.563,
                    "15min": 0.502
                },
                "max": {
                    "1min": 1.171,
                    "5min": 1.216,
                    "15min": 1.456
                },
                "last": 0.845
            },
            {
                "last update": "Wed Sep 4 17:04:48 2019",
                "stale": false,
                "from osd": 0,
                "to osd": 1,
                "interface": "front",
                "average": {
                    "1min": 0.965,
                    "5min": 0.811,
                    "15min": 0.850
                },
                "min": {
                    "1min": 0.650,
                    "5min": 0.488,
                    "15min": 0.466
                },
                "max": {
                    "1min": 1.252,
                    "5min": 1.252,
                    "15min": 1.362
                },
                "last": 0.791
            },
            ...


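To inspect only the interactions of a single OSD, the same command can be sent
to that OSD's admin socket instead of the mgr's (a brief sketch; ``osd.0`` is
just an example ID, and the optional trailing argument overrides the threshold
in milliseconds):

.. prompt:: bash $

   ceph daemon osd.0 dump_osd_network 0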

Muting health checks
--------------------

Health checks can be muted so that they do not affect the overall
reported status of the cluster. Alerts are specified using the health
check code (see :ref:`health-checks`):

.. prompt:: bash $

   ceph health mute <code>

For example, if there is a health warning, muting it will make the
cluster report an overall status of ``HEALTH_OK``. To mute an
``OSD_DOWN`` alert, for instance, run:

.. prompt:: bash $

   ceph health mute OSD_DOWN

Mutes are reported as part of the short and long form of the ``ceph health`` command.
For example, in the above scenario, the cluster would report:

.. prompt:: bash $

   ceph health

::

    HEALTH_OK (muted: OSD_DOWN)

.. prompt:: bash $

   ceph health detail

::

    HEALTH_OK (muted: OSD_DOWN)
    (MUTED) OSD_DOWN 1 osds down
        osd.1 is down

A mute can be explicitly removed with:

.. prompt:: bash $

   ceph health unmute <code>

For example:

.. prompt:: bash $

   ceph health unmute OSD_DOWN

A health check mute may optionally have a TTL (time to live)
associated with it, such that the mute will automatically expire
after the specified period of time has elapsed. The TTL is specified as an
optional duration argument, e.g.:

.. prompt:: bash $

   ceph health mute OSD_DOWN 4h    # mute for 4 hours
   ceph health mute MON_DOWN 15m   # mute for 15 minutes

Normally, if a muted health alert is resolved (e.g., in the example
above, the OSD comes back up), the mute goes away. If the alert comes
back later, it will be reported in the usual way.

It is possible to make a mute "sticky" such that the mute will remain even if the
alert clears. For example:

.. prompt:: bash $

   ceph health mute OSD_DOWN 1h --sticky   # ignore any/all down OSDs for next hour

Most health mutes also disappear if the extent of an alert gets worse. For example,
if there is one OSD down, and the alert is muted, the mute will disappear if one
or more additional OSDs go down. This is true for any health alert that involves
a count indicating how much or how many of something is triggering the warning or
error.


Detecting configuration issues
==============================

In addition to the health checks that Ceph continuously runs on its
own status, there are some configuration issues that may only be detected
by an external tool.

Use the `ceph-medic`_ tool to run these additional checks on your Ceph
cluster's configuration.

Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to the Linux ``df`` command. Execute
the following:

.. prompt:: bash $

   ceph df

The output of ``ceph df`` looks like this::

  CLASS    SIZE     AVAIL    USED     RAW USED  %RAW USED
  ssd      202 GiB  200 GiB  2.0 GiB  2.0 GiB        1.00
  TOTAL    202 GiB  200 GiB  2.0 GiB  2.0 GiB        1.00

  --- POOLS ---
  POOL                   ID  PGS   STORED   (DATA)   (OMAP)  OBJECTS     USED   (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
  device_health_metrics   1    1  242 KiB   15 KiB  227 KiB        4  251 KiB   24 KiB  227 KiB      0    297 GiB            N/A          N/A      4         0 B          0 B
  cephfs.a.meta           2   32  6.8 KiB  6.8 KiB      0 B       22   96 KiB   96 KiB      0 B      0    297 GiB            N/A          N/A     22         0 B          0 B
  cephfs.a.data           3   32      0 B      0 B      0 B        0      0 B      0 B      0 B      0     99 GiB            N/A          N/A      0         0 B          0 B
  test                    4   32   22 MiB   22 MiB   50 KiB      248   19 MiB   19 MiB   50 KiB      0    297 GiB            N/A          N/A    248         0 B          0 B

- **CLASS:** for example, "ssd" or "hdd"
- **SIZE:** The amount of storage capacity managed by the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **USED:** The amount of raw storage consumed by user data (excluding
  BlueStore's database).
- **RAW USED:** The amount of raw storage consumed by user data, internal
  overhead, or reserved capacity.
- **%RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

**POOLS:**

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **ID:** The pool's unique numeric identifier.
- **STORED:** The actual amount of data that the user/Ceph has stored in a
  pool. This is similar to the USED column in earlier versions of Ceph, but
  the calculations (for BlueStore!) are more precise (gaps are properly
  handled).

  - **(DATA):** usage for RBD (RADOS Block Device), CephFS file data, and RGW
    (RADOS Gateway) object data.
  - **(OMAP):** key-value pairs. Used primarily by CephFS and RGW (RADOS
    Gateway) for metadata storage.

- **OBJECTS:** The notional number of objects stored per pool. "Notional" is
  defined above in the paragraph immediately under "POOLS".
- **USED:** The space allocated for a pool over all OSDs. This includes
  replication, allocation granularity, and erasure-coding overhead. Compression
  savings and object content gaps are also taken into account. BlueStore's
  database is not included in this amount.

  - **(DATA):** object usage for RBD (RADOS Block Device), CephFS file data,
    and RGW (RADOS Gateway) object data.
  - **(OMAP):** object key-value pairs. Used primarily by CephFS and RGW
    (RADOS Gateway) for metadata storage.

- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **QUOTA OBJECTS:** The number of quota objects.
- **QUOTA BYTES:** The number of bytes in the quota objects.
- **DIRTY:** The number of objects in the cache pool that have been written to
  the cache pool but have not been flushed yet to the base pool. This field is
  only available when cache tiering is in use.
- **USED COMPR:** amount of space allocated for compressed data (i.e. this
  includes compressed data plus all the allocation, replication and erasure
  coding overhead).
- **UNDER COMPR:** amount of data passed through compression (summed over all
  replicas) and beneficial enough to be stored in a compressed form.

.. note:: The numbers in the POOLS section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result, the
   sum of the USED and %USED amounts will not add up to the USED and %USED
   amounts in the RAW section of the output.

.. note:: The MAX AVAIL value is a complicated function of the replication
   or erasure code used, the CRUSH rule that maps storage to devices, the
   utilization of those devices, and the configured ``mon_osd_full_ratio``.

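The same statistics are available in a machine-readable form via the global
``--format`` option. A minimal sketch, assuming the ``jq`` utility is
installed and that your release exposes per-pool usage under
``pools[].stats`` (field names can differ between releases):

.. prompt:: bash $

   ceph df --format json-pretty
   ceph df --format json | jq '.pools[] | {name: .name, percent_used: .stats.percent_used}'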

Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing the
following command:

.. prompt:: bash #

   ceph osd stat

Or:

.. prompt:: bash #

   ceph osd dump

You can also view OSDs according to their position in the CRUSH map by
using the following command:

.. prompt:: bash #

   ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up,
and their weight:

.. code-block:: bash

   #ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
    -1       3.00000 pool default
    -3       3.00000 rack mainrack
    -2       3.00000 host osd-host
     0   ssd 1.00000         osd.0             up  1.00000 1.00000
     1   ssd 1.00000         osd.1             up  1.00000 1.00000
     2   ssd 1.00000         osd.2             up  1.00000 1.00000

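On larger clusters it is often useful to narrow the tree to OSDs in a given
state. Recent Ceph releases accept a state filter on ``ceph osd tree`` (a
brief illustration; check ``ceph osd tree --help`` on your release):

.. prompt:: bash #

   ceph osd tree down
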
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing
data. A quorum must be present when multiple monitors are running. You should
also check monitor status periodically to ensure that they are running.

To display the monitor map, execute the following:

.. prompt:: bash $

   ceph mon stat

Or:

.. prompt:: bash $

   ceph mon dump

To check the quorum status for the monitor cluster, execute the following:

.. prompt:: bash $

   ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

   { "election_epoch": 10,
     "quorum": [
           0,
           1,
           2],
     "quorum_names": [
           "a",
           "b",
           "c"],
     "quorum_leader_name": "a",
     "monmap": { "epoch": 1,
         "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
         "modified": "2011-12-12 13:28:27.505520",
         "created": "2011-12-12 13:28:27.505520",
         "features": {"persistent": [
                        "kraken",
                        "luminous",
                        "mimic"],
                      "optional": []
                     },
         "mons": [
           { "rank": 0,
             "name": "a",
             "addr": "127.0.0.1:6789/0",
             "public_addr": "127.0.0.1:6789/0"},
           { "rank": 1,
             "name": "b",
             "addr": "127.0.0.1:6790/0",
             "public_addr": "127.0.0.1:6790/0"},
           { "rank": 2,
             "name": "c",
             "addr": "127.0.0.1:6791/0",
             "public_addr": "127.0.0.1:6791/0"}
         ]
       }
   }

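Because ``ceph quorum_status`` already emits JSON, a script can pull out a
single field directly. For example, to print just the current leader (a brief
sketch, assuming the ``jq`` utility is installed):

.. prompt:: bash $

   ceph quorum_status | jq -r .quorum_leader_name
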
Checking MDS Status
===================

Metadata servers provide metadata services for CephFS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following:

.. prompt:: bash $

   ceph mds stat

To display details of the metadata cluster, execute the following:

.. prompt:: bash $

   ceph fs dump

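For a per-file-system summary that shows each rank, its state, and the
available standby daemons, ``ceph fs status`` is also available on recent
releases (a brief illustration; the output columns vary by release):

.. prompt:: bash $

   ceph fs status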

Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.
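
For a quick, one-line summary of placement group states (for example, how many
PGs are ``active+clean``), run:

.. prompt:: bash $

   ceph pg stat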

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg

.. _rados-monitoring-using-admin-socket:

Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command:

.. prompt:: bash $

   ceph daemon {daemon-name}
   ceph daemon {path-to-socket-file}

For example, the following are equivalent:

.. prompt:: bash $

   ceph daemon osd.0 foo
   ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command:

.. prompt:: bash $

   ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

Additionally, you can set configuration values at runtime directly (i.e., the
admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
config set``, which relies on the monitor but doesn't require you to log in
directly to the host in question).
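
For example, to inspect and then change a daemon's debug setting through its
admin socket (a brief sketch; ``osd.0`` and ``debug_osd`` are just examples,
and the equivalent ``ceph tell`` form is shown for comparison):

.. prompt:: bash $

   ceph daemon osd.0 config get debug_osd
   ceph daemon osd.0 config set debug_osd 0/5
   ceph tell osd.0 config set debug_osd 0/5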

.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#viewing-a-configuration-at-runtime
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/