======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Using the command line
======================

Interactive mode
----------------

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon stat

Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

    ceph -c /path/to/conf -k /path/to/keyring health

Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading or writing data,
check your cluster's status.

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph demonstration
cluster with one of each service may print the following:

::

    cluster:
      id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
      health: HEALTH_OK

    services:
      mon: 3 daemons, quorum a,b,c
      mgr: x(active)
      mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
      osd: 3 osds: 3 up, 3 in

    data:
      pools:   2 pools, 16 pgs
      objects: 21 objects, 2.19K
      usage:   546 GB used, 384 GB / 931 GB avail
      pgs:     16 active+clean


.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value shows the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting. For
   example, with three-way replication, each 1 GB of notional data consumes
   roughly 3 GB of raw storage.


Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command

::

    ceph -w

Ceph will print the status of the system, followed by each log message as it
is emitted. For example:

::

    cluster:
      id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
      health: HEALTH_OK

    services:
      mon: 3 daemons, quorum a,b,c
      mgr: x(active)
      mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
      osd: 3 osds: 3 up, 3 in

    data:
      pools:   2 pools, 16 pgs
      objects: 21 objects, 2.19K
      usage:   546 GB used, 384 GB / 931 GB avail
      pgs:     16 active+clean


    2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
    2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
    2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available


In addition to using ``ceph -w`` to print log lines as they are emitted,
use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
log.
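
For example, the following displays the 25 most recent cluster log entries
(the count here is arbitrary; any positive number can be used)::

    ceph log last 25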

Monitoring Health Checks
========================

Ceph continuously runs various *health checks* against its own status. When
a health check fails, this is reflected in the output of ``ceph status`` (or
``ceph health``). In addition, messages are sent to the cluster log to
indicate when a check fails, and when the cluster recovers.
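
To list the currently failing checks together with a brief description of
each, the standard ``ceph health detail`` command (also shown in the muting
example below) can be used::

    ceph health detail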

For example, when an OSD goes down, the ``health`` section of the status
output may be updated as follows:

::

    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

At this time, cluster log messages are also emitted to record the failure of
the health checks:

::

    2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
    2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster's return
to a healthy state:

::

    2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
    2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
    2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy

Network Performance Checks
--------------------------

Ceph OSDs send heartbeat ping messages to each other to monitor daemon
availability. Ceph also uses the response times to monitor network
performance. While it is possible that a busy OSD could delay a ping
response, we can assume that if a network switch fails, multiple delays will
be detected between distinct pairs of OSDs.

By default, Ceph will warn about ping times that exceed 1 second (1000
milliseconds).

::

    HEALTH_WARN Slow OSD heartbeats on back (longest 1118.001ms)

The health detail output shows which combinations of OSDs are seeing the
delays and by how much. The output is limited to 10 detail line items.

::

    [WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1118.001ms)
        Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.1 [dc1,rack1] 1118.001 msec possibly improving
        Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.2 [dc1,rack2] 1030.123 msec
        Slow OSD heartbeats on back from osd.2 [dc1,rack2] to osd.1 [dc1,rack1] 1015.321 msec
        Slow OSD heartbeats on back from osd.1 [dc1,rack1] to osd.0 [dc1,rack1] 1010.456 msec

For even more detail, and a complete dump of network performance information,
the ``dump_osd_network`` command can be used. Typically, this command is sent
to a mgr daemon, but the output can be limited to the interactions of a
particular OSD by issuing the command to that OSD. The current threshold,
which defaults to 1 second (1000 milliseconds), can be overridden with an
argument in milliseconds.
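
For example, to restrict the output to the interactions of a single OSD and
lower the reporting threshold to 500 milliseconds (the OSD id and socket path
below are illustrative), a command of the following form can be used::

    ceph daemon /var/run/ceph/ceph-osd.2.asok dump_osd_network 500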

The following command will show all gathered network performance data by
specifying a threshold of 0 and sending the command to the mgr.

::

    $ ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0
    {
        "threshold": 0,
        "entries": [
            {
                "last update": "Wed Sep 4 17:04:49 2019",
                "stale": false,
                "from osd": 2,
                "to osd": 0,
                "interface": "front",
                "average": {
                    "1min": 1.023,
                    "5min": 0.860,
                    "15min": 0.883
                },
                "min": {
                    "1min": 0.818,
                    "5min": 0.607,
                    "15min": 0.607
                },
                "max": {
                    "1min": 1.164,
                    "5min": 1.173,
                    "15min": 1.544
                },
                "last": 0.924
            },
            {
                "last update": "Wed Sep 4 17:04:49 2019",
                "stale": false,
                "from osd": 2,
                "to osd": 0,
                "interface": "back",
                "average": {
                    "1min": 0.968,
                    "5min": 0.897,
                    "15min": 0.830
                },
                "min": {
                    "1min": 0.860,
                    "5min": 0.563,
                    "15min": 0.502
                },
                "max": {
                    "1min": 1.171,
                    "5min": 1.216,
                    "15min": 1.456
                },
                "last": 0.845
            },
            {
                "last update": "Wed Sep 4 17:04:48 2019",
                "stale": false,
                "from osd": 0,
                "to osd": 1,
                "interface": "front",
                "average": {
                    "1min": 0.965,
                    "5min": 0.811,
                    "15min": 0.850
                },
                "min": {
                    "1min": 0.650,
                    "5min": 0.488,
                    "15min": 0.466
                },
                "max": {
                    "1min": 1.252,
                    "5min": 1.252,
                    "15min": 1.362
                },
                "last": 0.791
            },
            ...


Muting health checks
--------------------

Health checks can be muted so that they do not affect the overall
reported status of the cluster. Alerts are specified using the health
check code (see :ref:`health-checks`)::

    ceph health mute <code>

For example, if there is a health warning, muting it will make the
cluster report an overall status of ``HEALTH_OK``. To mute an
``OSD_DOWN`` alert::

    ceph health mute OSD_DOWN

Mutes are reported as part of the short and long form of the ``ceph health`` command.
For example, in the above scenario, the cluster would report::

    $ ceph health
    HEALTH_OK (muted: OSD_DOWN)
    $ ceph health detail
    HEALTH_OK (muted: OSD_DOWN)
    (MUTED) OSD_DOWN 1 osds down
        osd.1 is down

A mute can be explicitly removed with::

    ceph health unmute <code>

For example::

    ceph health unmute OSD_DOWN

A health check mute may optionally have a TTL (time to live)
associated with it, such that the mute will automatically expire
after the specified period of time has elapsed. The TTL is specified as an
optional duration argument, e.g.::

    ceph health mute OSD_DOWN 4h    # mute for 4 hours
    ceph health mute MON_DOWN 15m   # mute for 15 minutes

Normally, if a muted health alert is resolved (e.g., in the example
above, the OSD comes back up), the mute goes away. If the alert comes
back later, it will be reported in the usual way.

It is possible to make a mute "sticky" such that the mute will remain even if
the alert clears. For example::

    ceph health mute OSD_DOWN 1h --sticky   # ignore any/all down OSDs for next hour

Most health mutes also disappear if the extent of an alert gets worse. For
example, if there is one OSD down, and the alert is muted, the mute will
disappear if one or more additional OSDs go down. This is true for any health
alert that involves a count indicating how much or how many of something is
triggering the warning or error.

Detecting configuration issues
==============================

In addition to the health checks that Ceph continuously runs on its
own status, there are some configuration issues that may only be detected
by an external tool.

Use the `ceph-medic`_ tool to run these additional checks on your Ceph
cluster's configuration.
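
A typical invocation (assuming ``ceph-medic`` is installed and configured to
reach your cluster; see its documentation for connection options) looks
like::

    ceph-medic check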

Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` command, which is similar to the Linux ``df`` command. Execute
the following::

    ceph df

The **RAW STORAGE** section of the output provides an overview of the
amount of storage that is managed by your cluster.

- **CLASS:** The class of OSD device (or the total for the cluster).
- **SIZE:** The amount of storage capacity managed by the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **USED:** The amount of raw storage consumed by user data.
- **RAW USED:** The amount of raw storage consumed by user data, internal overhead, or reserved capacity.
- **%RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **OBJECTS:** The notional number of objects stored per pool.

.. note:: The numbers in the **POOLS** section are notional. They do not
   include the space consumed by replicas, snapshots or clones. As a result,
   the sum of the **USED** and **%USED** amounts will not add up to the
   **USED** and **%USED** amounts in the **RAW STORAGE** section of the
   output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.


Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump

You can also view OSDs according to their position in the CRUSH map. ::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up
and their weight. ::

    #ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
     -1       3.00000 pool default
     -3       3.00000 rack mainrack
     -2       3.00000 host osd-host
      0   ssd 1.00000     osd.0             up     1.00000  1.00000
      1   ssd 1.00000     osd.1             up     1.00000  1.00000
      2   ssd 1.00000     osd.2             up     1.00000  1.00000

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading or writing data.
A quorum must be present when multiple monitors are running. You should also
check monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

    { "election_epoch": 10,
      "quorum": [
            0,
            1,
            2],
      "quorum_names": [
            "a",
            "b",
            "c"],
      "quorum_leader_name": "a",
      "monmap": { "epoch": 1,
          "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
          "modified": "2011-12-12 13:28:27.505520",
          "created": "2011-12-12 13:28:27.505520",
          "features": {"persistent": [
                        "kraken",
                        "luminous",
                        "mimic"],
                       "optional": []
          },
          "mons": [
                { "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:6789/0",
                  "public_addr": "127.0.0.1:6789/0"},
                { "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:6790/0",
                  "public_addr": "127.0.0.1:6790/0"},
                { "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:6791/0",
                  "public_addr": "127.0.0.1:6791/0"}
               ]
      }
    }

Checking MDS Status
===================

Metadata servers provide metadata services for CephFS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat

To display details of the metadata cluster, execute the following::

    ceph fs dump


Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.
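
For a quick summary of placement group states, the standard ``ceph pg stat``
command can be used; ``ceph pg dump`` prints detailed per-PG information::

    ceph pg stat
    ceph pg dump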

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg


Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

Additionally, you can set configuration values at runtime directly via the
admin socket, which bypasses the monitor. This is unlike ``ceph tell
{daemon-type}.{id} config set``, which relies on the monitor but does not
require you to log in directly to the host in question.
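
For example, the following two commands set the same option on ``osd.0`` (the
option and value here are illustrative); the first goes through the admin
socket on the OSD's host, while the second is relayed via the monitors and
does not require logging in to that host::

    ceph daemon osd.0 config set debug_osd 20
    ceph tell osd.0 config set debug_osd 20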

.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#viewing-a-configuration-at-runtime
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/