======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status, and metadata server status.
8 | ||
c07f9fc5 FG |
9 | Using the command line |
10 | ====================== | |
11 | ||
12 | Interactive mode | |
13 | ---------------- | |
7c673cae FG |
14 | |
15 | To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line | |
16 | with no arguments. For example:: | |
17 | ||
18 | ceph | |
19 | ceph> health | |
20 | ceph> status | |
21 | ceph> quorum_status | |
22 | ceph> mon_status | |

Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

    ceph -c /path/to/conf -k /path/to/keyring health
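
As an alternative, the ``ceph`` tool also honors the ``CEPH_CONF``
environment variable, so you can point at a non-default configuration once
per shell session (a small convenience sketch, assuming a Bourne-style
shell)::

    export CEPH_CONF=/path/to/conf
    ceph -k /path/to/keyring health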
31 | ||
c07f9fc5 FG |
32 | Checking a Cluster's Status |
33 | =========================== | |
34 | ||
35 | After you start your cluster, and before you start reading and/or | |
36 | writing data, check your cluster's status first. | |
7c673cae | 37 | |
c07f9fc5 | 38 | To check a cluster's status, execute the following:: |
7c673cae | 39 | |
c07f9fc5 FG |
40 | ceph status |
41 | ||
42 | Or:: | |
7c673cae | 43 | |
c07f9fc5 FG |
44 | ceph -s |
45 | ||
46 | In interactive mode, type ``status`` and press **Enter**. :: | |
47 | ||
48 | ceph> status | |
49 | ||
50 | Ceph will print the cluster status. For example, a tiny Ceph demonstration | |
51 | cluster with one of each service may print the following: | |
52 | ||
53 | :: | |
54 | ||
55 | cluster: | |
56 | id: 477e46f1-ae41-4e43-9c8f-72c918ab0a20 | |
57 | health: HEALTH_OK | |
58 | ||
59 | services: | |
60 | mon: 1 daemons, quorum a | |
61 | mgr: x(active) | |
62 | mds: 1/1/1 up {0=a=up:active} | |
63 | osd: 1 osds: 1 up, 1 in | |
64 | ||
65 | data: | |
66 | pools: 2 pools, 16 pgs | |
67 | objects: 21 objects, 2246 bytes | |
68 | usage: 546 GB used, 384 GB / 931 GB avail | |
69 | pgs: 16 active+clean | |

.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` values show the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of raw storage used typically exceeds
   the notional amount stored, because Ceph creates replicas of the data and
   may also use storage capacity for cloning and snapshotting. For example,
   with three-way replication, each gigabyte of notional data occupies
   roughly three gigabytes of raw capacity.

Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high-level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command::

    ceph -w
96 | ||
97 | Ceph will print the status of the system, followed by each log message as it | |
98 | is emitted. For example: | |
99 | ||
100 | :: | |
101 | ||
102 | cluster: | |
103 | id: 477e46f1-ae41-4e43-9c8f-72c918ab0a20 | |
104 | health: HEALTH_OK | |
105 | ||
106 | services: | |
107 | mon: 1 daemons, quorum a | |
108 | mgr: x(active) | |
109 | mds: 1/1/1 up {0=a=up:active} | |
110 | osd: 1 osds: 1 up, 1 in | |
111 | ||
112 | data: | |
113 | pools: 2 pools, 16 pgs | |
114 | objects: 21 objects, 2246 bytes | |
115 | usage: 546 GB used, 384 GB / 931 GB avail | |
116 | pgs: 16 active+clean | |
117 | ||
118 | ||
119 | 2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot | |
120 | 2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x | |
121 | 2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available | |
122 | ||
123 | ||
124 | In addition to using ``ceph -w`` to print log lines as they are emitted, | |
125 | use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster | |
126 | log. | |
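
For example, to show the ten most recent cluster log entries::

    ceph log last 10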
127 | ||
128 | Monitoring Health Checks | |
129 | ======================== | |
130 | ||
131 | Ceph continously runs various *health checks* against its own status. When | |
132 | a health check fails, this is reflected in the output of ``ceph status`` (or | |
133 | ``ceph health``). In addition, messages are sent to the cluster log to | |
134 | indicate when a check fails, and when the cluster recovers. | |
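
To see the current health summary, or to list each failing check on its own
line, execute one of the following::

    ceph health
    ceph health detail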
135 | ||
136 | For example, when an OSD goes down, the ``health`` section of the status | |
137 | output may be updated as follows: | |
138 | ||
139 | :: | |
140 | ||
141 | health: HEALTH_WARN | |
142 | 1 osds down | |
143 | Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded | |
144 | ||
145 | At this time, cluster log messages are also emitted to record the failure of the | |
146 | health checks: | |
147 | ||
148 | :: | |
149 | ||
150 | 2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN) | |
151 | 2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED) | |
152 | ||
153 | When the OSD comes back online, the cluster log records the cluster's return | |
154 | to a health state: | |
155 | ||
156 | :: | |
157 | ||
158 | 2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED) | |
159 | 2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized) | |
160 | 2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy | |
161 | ||
162 | ||
163 | Detecting configuration issues | |
164 | ============================== | |
165 | ||
166 | In addition to the health checks that Ceph continuously runs on its | |
167 | own status, there are some configuration issues that may only be detected | |
168 | by an external tool. | |
169 | ||
170 | Use the `ceph-medic`_ tool to run these additional checks on your Ceph | |
171 | cluster's configuration. | |
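
A minimal invocation sketch, assuming ``ceph-medic`` is installed on a node
with SSH access to the cluster hosts (``check`` is its primary subcommand)::

    ceph-medic check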
172 | ||
7c673cae FG |
173 | Checking a Cluster's Usage Stats |
174 | ================================ | |
175 | ||
176 | To check a cluster's data usage and data distribution among pools, you can | |
177 | use the ``df`` option. It is similar to Linux ``df``. Execute | |
178 | the following:: | |
179 | ||
180 | ceph df | |
181 | ||
182 | The **GLOBAL** section of the output provides an overview of the amount of | |
183 | storage your cluster uses for your data. | |
184 | ||
185 | - **SIZE:** The overall storage capacity of the cluster. | |
186 | - **AVAIL:** The amount of free space available in the cluster. | |
187 | - **RAW USED:** The amount of raw storage used. | |
188 | - **% RAW USED:** The percentage of raw storage used. Use this number in | |
189 | conjunction with the ``full ratio`` and ``near full ratio`` to ensure that | |
190 | you are not reaching your cluster's capacity. See `Storage Capacity`_ for | |
191 | additional details. | |
192 | ||
193 | The **POOLS** section of the output provides a list of pools and the notional | |
194 | usage of each pool. The output from this section **DOES NOT** reflect replicas, | |
195 | clones or snapshots. For example, if you store an object with 1MB of data, the | |
196 | notional usage will be 1MB, but the actual usage may be 2MB or more depending | |
197 | on the number of replicas, clones and snapshots. | |
198 | ||
199 | - **NAME:** The name of the pool. | |
200 | - **ID:** The pool ID. | |
201 | - **USED:** The notional amount of data stored in kilobytes, unless the number | |
202 | appends **M** for megabytes or **G** for gigabytes. | |
203 | - **%USED:** The notional percentage of storage used per pool. | |
204 | - **MAX AVAIL:** An estimate of the notional amount of data that can be written | |
205 | to this pool. | |
206 | - **Objects:** The notional number of objects stored per pool. | |
207 | ||
208 | .. note:: The numbers in the **POOLS** section are notional. They are not | |
209 | inclusive of the number of replicas, shapshots or clones. As a result, | |
210 | the sum of the **USED** and **%USED** amounts will not add up to the | |
211 | **RAW USED** and **%RAW USED** amounts in the **GLOBAL** section of the | |
212 | output. | |
213 | ||
214 | .. note:: The **MAX AVAIL** value is a complicated function of the | |
215 | replication or erasure code used, the CRUSH rule that maps storage | |
216 | to devices, the utilization of those devices, and the configured | |
217 | mon_osd_full_ratio. | |
218 | ||
219 | ||
7c673cae FG |
220 | |
221 | Checking OSD Status | |
222 | =================== | |
223 | ||
224 | You can check OSDs to ensure they are ``up`` and ``in`` by executing:: | |
225 | ||
226 | ceph osd stat | |
227 | ||
228 | Or:: | |
229 | ||
230 | ceph osd dump | |
231 | ||
232 | You can also check view OSDs according to their position in the CRUSH map. :: | |
233 | ||
234 | ceph osd tree | |
235 | ||
236 | Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up | |
237 | and their weight. :: | |
238 | ||
239 | # id weight type name up/down reweight | |
240 | -1 3 pool default | |
241 | -3 3 rack mainrack | |
242 | -2 3 host osd-host | |
243 | 0 1 osd.0 up 1 | |
244 | 1 1 osd.1 up 1 | |
245 | 2 1 osd.2 up 1 | |
246 | ||

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing
data. A quorum must be present when multiple monitors are running. You should
also check monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status
268 | ||
269 | Ceph will return the quorum status. For example, a Ceph cluster consisting of | |
270 | three monitors may return the following: | |
271 | ||
272 | .. code-block:: javascript | |
273 | ||
274 | { "election_epoch": 10, | |
275 | "quorum": [ | |
276 | 0, | |
277 | 1, | |
278 | 2], | |
279 | "monmap": { "epoch": 1, | |
280 | "fsid": "444b489c-4f16-4b75-83f0-cb8097468898", | |
281 | "modified": "2011-12-12 13:28:27.505520", | |
282 | "created": "2011-12-12 13:28:27.505520", | |
283 | "mons": [ | |
284 | { "rank": 0, | |
285 | "name": "a", | |
286 | "addr": "127.0.0.1:6789\/0"}, | |
287 | { "rank": 1, | |
288 | "name": "b", | |
289 | "addr": "127.0.0.1:6790\/0"}, | |
290 | { "rank": 2, | |
291 | "name": "c", | |
292 | "addr": "127.0.0.1:6791\/0"} | |
293 | ] | |
294 | } | |
295 | } | |
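
Because ``ceph quorum_status`` emits JSON, single fields are easy to extract
with a tool such as ``jq`` (a sketch, assuming ``jq`` is installed)::

    ceph quorum_status | jq '.quorum'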
296 | ||
297 | Checking MDS Status | |
298 | =================== | |
299 | ||
91327a77 | 300 | Metadata servers provide metadata services for CephFS. Metadata servers have |
7c673cae FG |
301 | two sets of states: ``up | down`` and ``active | inactive``. To ensure your |
302 | metadata servers are ``up`` and ``active``, execute the following:: | |
303 | ||
304 | ceph mds stat | |
305 | ||
306 | To display details of the metadata cluster, execute the following:: | |
307 | ||
308 | ceph fs dump | |
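
Recent releases also provide a compact per-file-system summary via the
manager; if it is available in your release, it is often a quicker read than
the full dump::

    ceph fs status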
309 | ||
310 | ||
311 | Checking Placement Group States | |
312 | =============================== | |
313 | ||
314 | Placement groups map objects to OSDs. When you monitor your | |
315 | placement groups, you will want them to be ``active`` and ``clean``. | |
316 | For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_. | |
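
For a quick summary of placement group states, or a full per-PG listing,
execute one of the following::

    ceph pg stat
    ceph pg dump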
317 | ||
318 | .. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg | |
319 | ||
320 | ||
321 | Using the Admin Socket | |
322 | ====================== | |
323 | ||
324 | The Ceph admin socket allows you to query a daemon via a socket interface. | |
325 | By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon | |
326 | via the admin socket, login to the host running the daemon and use the | |
327 | following command:: | |
328 | ||
329 | ceph daemon {daemon-name} | |
330 | ceph daemon {path-to-socket-file} | |
331 | ||
332 | For example, the following are equivalent:: | |
333 | ||
334 | ceph daemon osd.0 foo | |
335 | ceph daemon /var/run/ceph/ceph-osd.0.asok foo | |
336 | ||
337 | To view the available admin socket commands, execute the following command:: | |
338 | ||
339 | ceph daemon {daemon-name} help | |
340 | ||
341 | The admin socket command enables you to show and set your configuration at | |
342 | runtime. See `Viewing a Configuration at Runtime`_ for details. | |
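
For example, to read and then change one debug option on a single OSD via
its admin socket (a sketch; substitute the daemon and option you care
about)::

    ceph daemon osd.0 config get debug_osd
    ceph daemon osd.0 config set debug_osd 0/5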
343 | ||
344 | Additionally, you can set configuration values at runtime directly (i.e., the | |
345 | admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id} | |
346 | injectargs``, which relies on the monitor but doesn't require you to login | |
347 | directly to the host in question ). | |
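
A minimal sketch of the monitor-mediated alternative, runnable from any node
with an admin keyring::

    ceph tell osd.0 injectargs '--debug-osd 0/5'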
348 | ||
349 | .. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#ceph-runtime-config | |
350 | .. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity | |
c07f9fc5 | 351 | .. _ceph-medic: http://docs.ceph.com/ceph-medic/master/ |