======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Using the command line
======================

Interactive mode
----------------

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status

Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

    ceph -c /path/to/conf -k /path/to/keyring health

Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's status first.

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph demonstration
cluster with one of each service may print the following:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.

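For scripting, ``ceph status`` also accepts ``--format json``. The sketch
below summarizes such a payload; note that the exact field layout varies
between Ceph releases, so the schema embedded here is an assumption to
verify against your own cluster's output.

```python
import json

# Abridged sample in the style of `ceph status --format json`
# (field layout is an assumption; verify against your release).
sample = '''
{
  "fsid": "477e46f1-ae41-4e43-9c8f-72c918ab0a20",
  "health": {"status": "HEALTH_OK"},
  "osdmap": {"osdmap": {"num_osds": 3, "num_up_osds": 3, "num_in_osds": 3}},
  "pgmap": {"num_pgs": 16}
}
'''

def summarize_status(raw):
    """Return a one-line summary from a `ceph status` JSON payload."""
    status = json.loads(raw)
    osdmap = status["osdmap"]["osdmap"]
    return "{}: {}/{} OSDs up, {} PGs".format(
        status["health"]["status"],
        osdmap["num_up_osds"],
        osdmap["num_osds"],
        status["pgmap"]["num_pgs"],
    )

print(summarize_status(sample))  # HEALTH_OK: 3/3 OSDs up, 16 PGs
```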
Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command::

    ceph -w

Ceph will print the status of the system, followed by each log message as it
is emitted. For example:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


  2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
  2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
  2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available


In addition to using ``ceph -w`` to print log lines as they are emitted,
use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
log.

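Cluster log lines like the ones above follow a fixed prefix, so they are easy
to split into fields. A minimal parser is sketched below; the pattern matches
the Luminous-era line format shown in this section and may need adjusting for
other releases.

```python
import re

# Parse cluster log lines of the shape shown above:
#   <date> <time> <entity> <name> <addr> <seq> : cluster [LEVEL] <message>
LINE_RE = re.compile(
    r"^(?P<stamp>\S+ \S+) "      # date and time
    r"(?P<entity>\S+) \S+ \S+ "  # logging entity, name, address
    r"\d+ : cluster "            # sequence number
    r"\[(?P<level>\w+)\] "       # severity: INF, WRN, ERR
    r"(?P<message>.*)$"
)

def parse_cluster_log(lines):
    """Yield (timestamp, entity, level, message) for each parseable line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield m.group("stamp", "entity", "level", "message")

sample = [
    "2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot",
    "2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)",
]

for stamp, entity, level, message in parse_cluster_log(sample):
    print(level, message)
```

The same parser works on ``/var/log/ceph/ceph.log`` itself, or on the output
of ``ceph log last``.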
Monitoring Health Checks
========================

Ceph continuously runs various *health checks* against its own status. When
a health check fails, this is reflected in the output of ``ceph status`` (or
``ceph health``). In addition, messages are sent to the cluster log to
indicate when a check fails, and when the cluster recovers.

For example, when an OSD goes down, the ``health`` section of the status
output may be updated as follows:

::

  health: HEALTH_WARN
          1 osds down
          Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

At this time, cluster log messages are also emitted to record the failure of the
health checks:

::

  2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
  2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster's return
to a healthy state:

::

  2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
  2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
  2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy

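Each failed health check has a stable code (``OSD_DOWN``, ``PG_DEGRADED``)
that is easier to act on in scripts than the free-form summary text. The
sketch below extracts the failing checks from a ``ceph health --format json``
style payload; the embedded schema reflects a Luminous-era cluster and is an
assumption to verify against your release.

```python
import json

# Abridged sample in the style of `ceph health --format json`
# (field names are an assumption; verify against your release).
sample = '''
{
  "status": "HEALTH_WARN",
  "checks": {
    "OSD_DOWN": {
      "severity": "HEALTH_WARN",
      "summary": {"message": "1 osds down"}
    },
    "PG_DEGRADED": {
      "severity": "HEALTH_WARN",
      "summary": {"message": "Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded"}
    }
  }
}
'''

def failed_checks(raw):
    """Map each failing health check code to its summary message."""
    health = json.loads(raw)
    return {code: check["summary"]["message"]
            for code, check in health.get("checks", {}).items()}

for code, message in sorted(failed_checks(sample).items()):
    print("{}: {}".format(code, message))
```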
Detecting configuration issues
==============================

In addition to the health checks that Ceph continuously runs on its
own status, there are some configuration issues that may only be detected
by an external tool.

Use the `ceph-medic`_ tool to run these additional checks on your Ceph
cluster's configuration.

Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

    ceph df

The **RAW STORAGE** section of the output provides an overview of the
amount of storage that is managed by your cluster.

- **CLASS:** The class of OSD device (or the total for the cluster).
- **SIZE:** The amount of storage capacity managed by the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **USED:** The amount of raw storage consumed by user data.
- **RAW USED:** The amount of raw storage consumed by user data, internal
  overhead, or reserved capacity.
- **%RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **OBJECTS:** The notional number of objects stored per pool.

.. note:: The numbers in the **POOLS** section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result,
   the sum of the **USED** and **%USED** amounts will not add up to the
   **USED** and **%USED** amounts in the **RAW** section of the
   output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.

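The notional-vs-raw relationship described above can be illustrated with a
little arithmetic. This is a simplified sketch for a size-3 replicated pool:
it ignores internal overhead, cloning and snapshots.

```python
def raw_used(notional_bytes, replica_count):
    """Raw storage consumed when each object is stored replica_count times."""
    return notional_bytes * replica_count

def pct_raw_used(used, total):
    """%RAW USED as reported by `ceph df`, as a percentage."""
    return 100.0 * used / total

notional = 1 * 1024 * 1024          # a single 1MB object, as in the text
print(raw_used(notional, 3))        # 3145728: 3MB of raw storage consumed

# The %RAW USED for the example cluster shown earlier (546 GB of 931 GB);
# compare against the default near-full (0.85) and full (0.95) ratios.
print(round(pct_raw_used(546, 931), 1))  # 58.6
```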

Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump

You can also view OSDs according to their position in the CRUSH map. ::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up
and their weight. ::

  #ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
   -1       3.00000 pool default
   -3       3.00000 rack mainrack
   -2       3.00000 host osd-host
    0   ssd 1.00000         osd.0             up  1.00000 1.00000
    1   ssd 1.00000         osd.1             up  1.00000 1.00000
    2   ssd 1.00000         osd.2             up  1.00000 1.00000

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

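The same tree is available as JSON via ``ceph osd tree --format json``, which
is more robust to parse than the plain-text columns above. The node schema
embedded below is an assumption to verify against your release.

```python
import json

# Abridged sample in the style of `ceph osd tree --format json`
# (node schema is an assumption; verify against your release).
sample = '''
{"nodes": [
  {"id": -2, "name": "osd-host", "type": "host", "children": [0, 1, 2]},
  {"id": 0, "name": "osd.0", "type": "osd", "status": "up", "reweight": 1.0},
  {"id": 1, "name": "osd.1", "type": "osd", "status": "down", "reweight": 1.0},
  {"id": 2, "name": "osd.2", "type": "osd", "status": "up", "reweight": 1.0}
]}
'''

def down_osds(raw):
    """Names of OSDs in the CRUSH tree that are not up."""
    tree = json.loads(raw)
    return [node["name"] for node in tree["nodes"]
            if node.get("type") == "osd" and node.get("status") != "up"]

print(down_osds(sample))  # ['osd.1']
```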
Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing
data. A quorum must be present when multiple monitors are running. You should
also check monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

	{ "election_epoch": 10,
	  "quorum": [
	        0,
	        1,
	        2],
	  "quorum_names": [
	        "a",
	        "b",
	        "c"],
	  "quorum_leader_name": "a",
	  "monmap": { "epoch": 1,
	      "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
	      "modified": "2011-12-12 13:28:27.505520",
	      "created": "2011-12-12 13:28:27.505520",
	      "features": {"persistent": [
	                        "kraken",
	                        "luminous",
	                        "mimic"],
	            "optional": []
	      },
	      "mons": [
	            { "rank": 0,
	              "name": "a",
	              "addr": "127.0.0.1:6789/0",
	              "public_addr": "127.0.0.1:6789/0"},
	            { "rank": 1,
	              "name": "b",
	              "addr": "127.0.0.1:6790/0",
	              "public_addr": "127.0.0.1:6790/0"},
	            { "rank": 2,
	              "name": "c",
	              "addr": "127.0.0.1:6791/0",
	              "public_addr": "127.0.0.1:6791/0"}
	      ]
	  }
	}

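Because ``quorum_status`` lists both the monitors in the monmap and the
monitors currently in quorum, comparing the two sets reveals any monitor that
has dropped out. A minimal sketch, using an abridged payload with monitor
``c`` out of quorum:

```python
import json

def missing_from_quorum(raw):
    """Names of monitors known to the monmap but not currently in quorum."""
    status = json.loads(raw)
    in_quorum = set(status["quorum_names"])
    known = {mon["name"] for mon in status["monmap"]["mons"]}
    return sorted(known - in_quorum)

# Abridged `ceph quorum_status` payload: monitor c is out of quorum.
sample = '''
{"election_epoch": 10,
 "quorum_names": ["a", "b"],
 "quorum_leader_name": "a",
 "monmap": {"mons": [{"rank": 0, "name": "a"},
                     {"rank": 1, "name": "b"},
                     {"rank": 2, "name": "c"}]}}
'''

print(missing_from_quorum(sample))  # ['c']
```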
Checking MDS Status
===================

Metadata servers provide metadata services for CephFS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat

To display details of the metadata cluster, execute the following::

    ceph fs dump

Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg

Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

Additionally, you can set configuration values at runtime directly (i.e., the
admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
config set``, which relies on the monitor but doesn't require you to log in
directly to the host in question).

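Admin socket commands return JSON, so a thin wrapper around ``ceph daemon``
is often all a monitoring script needs. In this sketch the command runner is
injectable, so the parsing can be exercised with a canned response instead of
a live daemon; the exact reply shape shown for ``config get`` is an
assumption.

```python
import json
import subprocess

def daemon_command(daemon, args, run=None):
    """Run `ceph daemon <daemon> <args...>` and decode its JSON response.

    `run` may be overridden (e.g. in tests) with any callable that takes
    the argv list and returns the command's stdout as bytes.
    """
    if run is None:
        def run(argv):
            return subprocess.check_output(argv)
    out = run(["ceph", "daemon", daemon] + list(args))
    return json.loads(out)

# Exercise the parsing with a canned response standing in for what a
# live `ceph daemon osd.0 config get debug_osd` might print (the reply
# shape is an assumption; verify against your release).
canned = lambda argv: b'{"debug_osd": "1/5"}'
print(daemon_command("osd.0", ["config", "get", "debug_osd"], run=canned))
```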
.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#viewing-a-configuration-at-runtime
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/