======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status, and metadata server status.

Using the command line
======================

Interactive mode
----------------

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status

Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

    ceph -c /path/to/conf -k /path/to/keyring health

Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's status first.

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph demonstration
cluster with one of each service may print the following:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean

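
For scripting, the same status is available in machine-readable form. As a
minimal sketch (not part of the Ceph documentation), the snippet below parses
the JSON that ``ceph status --format json`` emits; the ``health`` -> ``status``
key layout is an assumption based on Luminous-era output, so verify it against
your own cluster before relying on it.

```python
import json

def is_healthy(status_json: str) -> bool:
    """Return True when the parsed status document reports HEALTH_OK."""
    # Assumed key layout: {"health": {"status": "HEALTH_OK", ...}, ...}
    status = json.loads(status_json)
    return status["health"]["status"] == "HEALTH_OK"

# Abbreviated example document, mirroring the sample cluster above:
sample = ('{"fsid": "477e46f1-ae41-4e43-9c8f-72c918ab0a20",'
          ' "health": {"status": "HEALTH_OK"}}')
print(is_healthy(sample))
```

A monitoring script could feed the output of ``ceph status --format json``
straight into such a check instead of scraping the human-readable text.
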

.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.

Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command::

    ceph -w

Ceph will print the status of the system, followed by each log message as it
is emitted. For example:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


  2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
  2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
  2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available


In addition to using ``ceph -w`` to print log lines as they are emitted,
use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
log.
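
Log lines like the samples above follow a regular shape, which makes them easy
to dissect in a script. The field breakdown below is inferred from the sample
output on this page, not from a formal grammar, so treat it as a sketch:

```python
import re

# Assumed layout: timestamp, daemon name, rank, address, sequence number,
# a colon, the log channel, a severity tag, then the message.
LOG_LINE = re.compile(
    r'(?P<stamp>\S+ \S+) '           # date and time
    r'(?P<name>\S+) (?P<rank>\S+) '  # daemon name and rank (e.g. mon.a mon.0)
    r'(?P<addr>\S+) '                # network address
    r'(?P<seq>\d+) : '               # sequence number
    r'(?P<channel>\w+) '             # log channel (e.g. cluster)
    r'\[(?P<level>\w+)\] '           # severity: INF, WRN, ERR
    r'(?P<message>.*)'               # the event itself
)

line = ('2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : '
        'cluster [INF] Activating manager daemon x')
m = LOG_LINE.match(line)
print(m.group('level'), m.group('message'))
```

Filtering on the ``level`` group is a quick way to surface only ``[WRN]`` and
``[ERR]`` events from a tail of ``/var/log/ceph/ceph.log``.
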

Monitoring Health Checks
========================

Ceph continuously runs various *health checks* against its own status. When
a health check fails, this is reflected in the output of ``ceph status`` (or
``ceph health``). In addition, messages are sent to the cluster log to
indicate when a check fails, and when the cluster recovers.

For example, when an OSD goes down, the ``health`` section of the status
output may be updated as follows:

::

    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

At this time, cluster log messages are also emitted to record the failure of
the health checks:

::

    2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
    2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster's return
to a healthy state:

::

    2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
    2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
    2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy

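
The failed/cleared message pairs above suggest a simple way to track which
check codes are currently firing. The patterns below are inferred from the
sample log lines on this page (the code in parentheses after "Health check
failed", the leading code after "Health check cleared"); they are a sketch,
not a parser for every message the monitors can emit:

```python
import re

# Patterns inferred from the sample messages above (an assumption, not a spec).
FAILED = re.compile(r'Health check failed: .*\((?P<code>[A-Z_]+)\)$')
CLEARED = re.compile(r'Health check cleared: (?P<code>[A-Z_]+)')

def active_checks(messages):
    """Replay log messages and return the set of still-failing check codes."""
    active = set()
    for msg in messages:
        if (m := FAILED.search(msg)):
            active.add(m.group('code'))
        elif (m := CLEARED.search(msg)):
            active.discard(m.group('code'))
    return active

msgs = [
    'Health check failed: 1 osds down (OSD_DOWN)',
    'Health check failed: Degraded data redundancy: 21/63 objects degraded '
    '(33.333%), 16 pgs degraded (PG_DEGRADED)',
    'Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs degraded)',
]
print(sorted(active_checks(msgs)))
```

In practice ``ceph health detail`` already reports the active checks directly;
the replay above is only useful when working from archived cluster logs.
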
Network Performance Checks
--------------------------

Ceph OSDs send heartbeat ping messages amongst themselves to monitor daemon
availability. Ceph also uses the response times to monitor network
performance. While it is possible that a busy OSD could delay a ping
response, we can assume that if a network switch fails, multiple delays will
be detected between distinct pairs of OSDs.

By default we will warn about ping times which exceed 1 second (1000
milliseconds)::

    HEALTH_WARN Long heartbeat ping times on back interface seen, longest is 1118.001 msec

The health detail will show which combinations of OSDs are seeing the delays
and by how much. There is a limit of 10 detail line items. ::

    [WRN] OSD_SLOW_PING_TIME_BACK: Long heartbeat ping times on back interface seen, longest is 1118.001 msec
        Slow heartbeat ping on back interface from osd.0 to osd.1 1118.001 msec
        Slow heartbeat ping on back interface from osd.0 to osd.2 1030.123 msec
        Slow heartbeat ping on back interface from osd.2 to osd.1 1015.321 msec
        Slow heartbeat ping on back interface from osd.1 to osd.0 1010.456 msec

To see even more detail and a complete dump of network performance
information, use the ``dump_osd_network`` command. Typically, this would be
sent to a mgr, but it can be limited to a particular OSD's interactions by
issuing it to that OSD. The current threshold, which defaults to 1 second
(1000 milliseconds), can be overridden as an argument in milliseconds.

The following command will show all gathered network performance data by
specifying a threshold of 0 and sending to the mgr.

192 | ||
193 | :: | |
194 | ||
195 | $ ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0 | |
196 | { | |
197 | "threshold": 0, | |
198 | "entries": [ | |
199 | { | |
200 | "last update": "Wed Sep 4 17:04:49 2019", | |
201 | "stale": false, | |
202 | "from osd": 2, | |
203 | "to osd": 0, | |
204 | "interface": "front", | |
205 | "average": { | |
206 | "1min": 1.023, | |
207 | "5min": 0.860, | |
208 | "15min": 0.883 | |
209 | }, | |
210 | "min": { | |
211 | "1min": 0.818, | |
212 | "5min": 0.607, | |
213 | "15min": 0.607 | |
214 | }, | |
215 | "max": { | |
216 | "1min": 1.164, | |
217 | "5min": 1.173, | |
218 | "15min": 1.544 | |
219 | }, | |
220 | "last": 0.924 | |
221 | }, | |
222 | { | |
223 | "last update": "Wed Sep 4 17:04:49 2019", | |
224 | "stale": false, | |
225 | "from osd": 2, | |
226 | "to osd": 0, | |
227 | "interface": "back", | |
228 | "average": { | |
229 | "1min": 0.968, | |
230 | "5min": 0.897, | |
231 | "15min": 0.830 | |
232 | }, | |
233 | "min": { | |
234 | "1min": 0.860, | |
235 | "5min": 0.563, | |
236 | "15min": 0.502 | |
237 | }, | |
238 | "max": { | |
239 | "1min": 1.171, | |
240 | "5min": 1.216, | |
241 | "15min": 1.456 | |
242 | }, | |
243 | "last": 0.845 | |
244 | }, | |
245 | { | |
246 | "last update": "Wed Sep 4 17:04:48 2019", | |
247 | "stale": false, | |
248 | "from osd": 0, | |
249 | "to osd": 1, | |
250 | "interface": "front", | |
251 | "average": { | |
252 | "1min": 0.965, | |
253 | "5min": 0.811, | |
254 | "15min": 0.850 | |
255 | }, | |
256 | "min": { | |
257 | "1min": 0.650, | |
258 | "5min": 0.488, | |
259 | "15min": 0.466 | |
260 | }, | |
261 | "max": { | |
262 | "1min": 1.252, | |
263 | "5min": 1.252, | |
264 | "15min": 1.362 | |
265 | }, | |
266 | "last": 0.791 | |
267 | }, | |
268 | ... | |
269 | ||
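
A dump like the one above can be filtered programmatically. As a sketch, the
helper below keeps only the non-stale OSD pairs whose 1-minute average exceeds
a threshold; the field names (``from osd``, ``to osd``, ``average``/``1min``)
are taken from the sample output, not from an API specification:

```python
def slow_pairs(entries, threshold_ms=1000.0):
    """Return (from, to, interface, 1min avg) for entries over the threshold."""
    return [
        (e["from osd"], e["to osd"], e["interface"], e["average"]["1min"])
        for e in entries
        if not e["stale"] and e["average"]["1min"] > threshold_ms
    ]

# Two abbreviated entries modeled on the dump above (values in milliseconds):
entries = [
    {"from osd": 2, "to osd": 0, "interface": "front", "stale": False,
     "average": {"1min": 1.023, "5min": 0.860, "15min": 0.883}},
    {"from osd": 0, "to osd": 1, "interface": "front", "stale": False,
     "average": {"1min": 0.965, "5min": 0.811, "15min": 0.850}},
]
print(slow_pairs(entries, threshold_ms=1.0))
```

With the default 1000 ms threshold neither sample entry would be flagged;
lowering the threshold, as in the call above, surfaces the slower pair.
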

Detecting configuration issues
==============================

In addition to the health checks that Ceph continuously runs on its
own status, there are some configuration issues that may only be detected
by an external tool.

Use the `ceph-medic`_ tool to run these additional checks on your Ceph
cluster's configuration.

Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

    ceph df

The **RAW STORAGE** section of the output provides an overview of the
amount of storage that is managed by your cluster.

- **CLASS:** The class of OSD device (or the total for the cluster).
- **SIZE:** The amount of storage capacity managed by the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **USED:** The amount of raw storage consumed by user data.
- **RAW USED:** The amount of raw storage consumed by user data, internal
  overhead, or reserved capacity.
- **%RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **OBJECTS:** The notional number of objects stored per pool.

.. note:: The numbers in the **POOLS** section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result,
   the sum of the **USED** and **%USED** amounts will not add up to the
   **USED** and **%USED** amounts in the **RAW** section of the
   output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.

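
The notional-vs-raw relationship described above can be made concrete with a
little arithmetic. This is an illustrative sketch only (it ignores journals,
metadata, and allocation overhead, so real raw usage will be somewhat higher):

```python
def raw_used_estimate(notional_bytes, replicas=3):
    """Lower bound on raw usage for a replicated pool: each byte is stored
    once per replica."""
    return notional_bytes * replicas

def pct_raw_used(raw_used, raw_total):
    """The %RAW USED figure: raw consumption over total raw capacity."""
    return 100.0 * raw_used / raw_total

one_mb = 1 << 20
# With 3x replication, 1 MB of user data occupies at least 3 MB of raw storage.
print(raw_used_estimate(one_mb))
# The sample cluster earlier on this page: 546 GB used of 931 GB total.
print(round(pct_raw_used(546, 931), 1))
```

This is why the pool-level **USED** figures never sum to the raw **USED**
figure: the raw number multiplies each pool's notional usage by its
replication (or erasure-coding) overhead.
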

Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump

You can also view OSDs according to their position in the CRUSH map. ::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up
and their weight. ::

    #ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
     -1       3.00000 pool default
     -3       3.00000 rack mainrack
     -2       3.00000 host osd-host
      0   ssd 1.00000         osd.0             up  1.00000 1.00000
      1   ssd 1.00000         osd.1             up  1.00000 1.00000
      2   ssd 1.00000         osd.2             up  1.00000 1.00000

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

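For automation, ``ceph osd dump --format json`` emits the same information in
JSON. The sketch below assumes the dump contains an ``osds`` array with 0/1
``up`` and ``in`` flags per OSD; that layout is based on Luminous-era output
and should be verified against your release:

```python
import json

def osd_counts(dump_json: str):
    """Return (total, up, in) counts from an osd dump JSON document."""
    # Assumed layout: {"osds": [{"osd": 0, "up": 1, "in": 1}, ...], ...}
    osds = json.loads(dump_json)["osds"]
    return (len(osds),
            sum(o["up"] for o in osds),
            sum(o["in"] for o in osds))

# Abbreviated example with one OSD down:
sample = ('{"osds": [{"osd": 0, "up": 1, "in": 1},'
          ' {"osd": 1, "up": 0, "in": 1},'
          ' {"osd": 2, "up": 1, "in": 1}]}')
print(osd_counts(sample))
```

Comparing the three counts is the scripted equivalent of eyeballing the
``osd: 3 osds: 3 up, 3 in`` line in ``ceph status``.
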
Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing
data. A quorum must be present when multiple monitors are running. You should
also check monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

    { "election_epoch": 10,
      "quorum": [
            0,
            1,
            2],
      "quorum_names": [
            "a",
            "b",
            "c"],
      "quorum_leader_name": "a",
      "monmap": { "epoch": 1,
          "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
          "modified": "2011-12-12 13:28:27.505520",
          "created": "2011-12-12 13:28:27.505520",
          "features": {"persistent": [
                "kraken",
                "luminous",
                "mimic"],
            "optional": []
          },
          "mons": [
                { "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:6789/0",
                  "public_addr": "127.0.0.1:6789/0"},
                { "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:6790/0",
                  "public_addr": "127.0.0.1:6790/0"},
                { "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:6791/0",
                  "public_addr": "127.0.0.1:6791/0"}
               ]
        }
    }

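Because a quorum requires a strict majority of the monitors in the monmap, the
output above can be checked mechanically. This sketch uses only the field
names visible in the sample document (``quorum`` and ``monmap``/``mons``):

```python
def has_majority(quorum_status: dict) -> bool:
    """True when more than half of the monmap's monitors are in quorum."""
    in_quorum = len(quorum_status["quorum"])
    total = len(quorum_status["monmap"]["mons"])
    return in_quorum > total // 2

# Abbreviated form of the sample output above: three monitors, all in quorum.
sample = {
    "quorum": [0, 1, 2],
    "quorum_names": ["a", "b", "c"],
    "quorum_leader_name": "a",
    "monmap": {"mons": [{"rank": 0, "name": "a"},
                        {"rank": 1, "name": "b"},
                        {"rank": 2, "name": "c"}]},
}
print(has_majority(sample))
```

With three monitors, losing one still leaves a 2-of-3 majority; losing two
drops the check to false, which is exactly when the cluster stops serving.
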
Checking MDS Status
===================

Metadata servers provide metadata services for CephFS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat

To display details of the metadata cluster, execute the following::

    ceph fs dump


Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg


Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

Additionally, you can set configuration values at runtime directly. (That is,
the admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
config set``, which relies on the monitor but doesn't require you to log in
directly to the host in question.)

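Since admin socket commands print JSON, they are easy to wrap in a script run
on the daemon's host. The helper below is a hypothetical sketch (not part of
Ceph); it simply shells out to ``ceph daemon`` and parses the result, with the
argv construction split out so it can be inspected without a live cluster:

```python
import json
import subprocess

def build_cmd(daemon: str, *args: str) -> list:
    """Build the argv for ``ceph daemon {daemon} {command...}``."""
    return ["ceph", "daemon", daemon, *args]

def admin_socket(daemon: str, *args: str):
    """Run an admin socket command on this host and parse its JSON output.

    Requires the ``ceph`` CLI and a running daemon; must be executed on the
    host where the daemon's socket lives.
    """
    out = subprocess.check_output(build_cmd(daemon, *args))
    return json.loads(out)

# For example, fetching one setting (a real command on recent releases,
# though verify it exists on yours): admin_socket("osd.0", "config", "get",
# "debug_osd"). The argv it would run:
print(build_cmd("osd.0", "config", "get", "debug_osd"))
```
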
.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#viewing-a-configuration-at-runtime
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/