=============
Health checks
=============

Overview
========

There is a finite set of possible health messages that a Ceph cluster can
raise -- these are defined as *health checks* which have unique identifiers.

The identifier is a terse pseudo-human-readable (i.e. like a variable name)
string. It is intended to enable tools (such as UIs) to make sense of
health checks, and present them in a way that reflects their meaning.

This page lists the health checks that are raised by the monitor and manager
daemons. In addition to these, you may also see health checks that originate
from MDS daemons (see :doc:`/cephfs/health-messages`), and health checks
that are defined by ceph-mgr python modules.

Definitions
===========


OSDs
----

OSD_DOWN
________

One or more OSDs are marked down. The ceph-osd daemon may have been
stopped, or peer OSDs may be unable to reach the OSD over the network.
Common causes include a stopped or crashed daemon, a down host, or a
network outage.

Verify the host is healthy, the daemon is started, and the network is
functioning. If the daemon has crashed, the daemon log file
(``/var/log/ceph/ceph-osd.*``) may contain debugging information.

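For example, after identifying the down OSDs with ``ceph health detail``,
a stopped or crashed daemon can be restarted on its host (assuming a
systemd-based deployment; the OSD id ``5`` is illustrative)::

    systemctl restart ceph-osd@5
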
OSD_<crush type>_DOWN
_____________________

(e.g. OSD_HOST_DOWN, OSD_ROOT_DOWN)

All the OSDs within a particular CRUSH subtree are marked down, for example
all OSDs on a host.

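The affected subtree, and the up/down state of the OSDs within it, can be
inspected with::

    ceph osd tree
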
OSD_ORPHAN
__________

An OSD is referenced in the CRUSH map hierarchy but does not exist.

The OSD can be removed from the CRUSH hierarchy with::

    ceph osd crush rm osd.<id>

OSD_OUT_OF_ORDER_FULL
_____________________

The utilization thresholds for `nearfull`, `backfillfull`, `full`,
and/or `failsafe_full` are not ascending. In particular, we expect
`nearfull < backfillfull`, `backfillfull < full`, and `full <
failsafe_full`.

The thresholds can be adjusted with::

    ceph osd set-nearfull-ratio <ratio>
    ceph osd set-backfillfull-ratio <ratio>
    ceph osd set-full-ratio <ratio>

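For example, an ascending ordering can be restored with values close to
the defaults (the exact ratios here are illustrative)::

    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.90
    ceph osd set-full-ratio 0.95
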
OSD_FULL
________

One or more OSDs has exceeded the `full` threshold and is preventing
the cluster from servicing writes.

Utilization by pool can be checked with::

    ceph df

The currently defined `full` ratio can be seen with::

    ceph osd dump | grep full_ratio

A short-term workaround to restore write availability is to raise the full
threshold by a small amount::

    ceph osd set-full-ratio <ratio>

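For example, the threshold could be raised temporarily from the default of
0.95 to 0.96 (illustrative; keep the value below the `failsafe_full`
ratio)::

    ceph osd set-full-ratio 0.96
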
New storage should be added to the cluster by deploying more OSDs, or
existing data should be deleted in order to free up space.

OSD_BACKFILLFULL
________________

One or more OSDs has exceeded the `backfillfull` threshold, which will
prevent data from being rebalanced to this device. This is
an early warning that rebalancing may not be able to complete and that
the cluster is approaching full.

Utilization by pool can be checked with::

    ceph df

OSD_NEARFULL
____________

One or more OSDs has exceeded the `nearfull` threshold. This is an early
warning that the cluster is approaching full.

Utilization by pool can be checked with::

    ceph df

OSDMAP_FLAGS
____________

One or more cluster flags of interest has been set. These flags include:

* *full* - the cluster is flagged as full and cannot service writes
* *pauserd*, *pausewr* - paused reads or writes
* *noup* - OSDs are not allowed to start
* *nodown* - OSD failure reports are being ignored, such that the
  monitors will not mark OSDs `down`
* *noin* - OSDs that were previously marked `out` will not be marked
  back `in` when they start
* *noout* - down OSDs will not automatically be marked out after the
  configured interval
* *nobackfill*, *norecover*, *norebalance* - recovery or data
  rebalancing is suspended
* *noscrub*, *nodeep_scrub* - scrubbing is disabled
* *notieragent* - cache tiering activity is suspended

With the exception of *full*, these flags can be set or cleared with::

    ceph osd set <flag>
    ceph osd unset <flag>

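For example, to prevent down OSDs from being marked `out` while a host is
rebooted for maintenance, and to restore normal behavior afterwards::

    ceph osd set noout
    ceph osd unset noout
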
OSD_FLAGS
_________

One or more OSDs has a per-OSD flag of interest set. These flags include:

* *noup*: OSD is not allowed to start
* *nodown*: failure reports for this OSD will be ignored
* *noin*: if this OSD was previously marked `out` automatically
  after a failure, it will not be marked in when it starts
* *noout*: if this OSD is down it will not automatically be marked
  `out` after the configured interval

Per-OSD flags can be set and cleared with::

    ceph osd add-<flag> <osd-id>
    ceph osd rm-<flag> <osd-id>

For example, ::

    ceph osd rm-nodown osd.123

OLD_CRUSH_TUNABLES
__________________

The CRUSH map is using very old settings and should be updated. The
oldest tunables that can be used (i.e., the oldest client version that
can connect to the cluster) without triggering this health warning is
determined by the ``mon_crush_min_required_version`` config option.
See :doc:`/rados/operations/crush-map/#tunables` for more information.

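For example, the tunables can be updated to the current optimal profile
with the command below; note that this may trigger significant data
movement and may prevent older clients from connecting::

    ceph osd crush tunables optimal
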
OLD_CRUSH_STRAW_CALC_VERSION
____________________________

The CRUSH map is using an older, non-optimal method for calculating
intermediate weight values for ``straw`` buckets.

The CRUSH map should be updated to use the newer method
(``straw_calc_version=1``). See
:doc:`/rados/operations/crush-map/#tunables` for more information.

CACHE_POOL_NO_HIT_SET
_____________________

One or more cache pools is not configured with a *hit set* to track
utilization, which will prevent the tiering agent from identifying
cold objects to flush and evict from the cache.

Hit sets can be configured on the cache pool with::

    ceph osd pool set <poolname> hit_set_type <type>
    ceph osd pool set <poolname> hit_set_period <period-in-seconds>
    ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
    ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>

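For example, to configure a bloom-filter hit set on a hypothetical cache
pool named ``hot-pool`` (the period and count values are illustrative)::

    ceph osd pool set hot-pool hit_set_type bloom
    ceph osd pool set hot-pool hit_set_period 3600
    ceph osd pool set hot-pool hit_set_count 4
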
OSD_NO_SORTBITWISE
__________________

No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has not
been set.

The ``sortbitwise`` flag must be set before luminous v12.y.z or newer
OSDs can start. You can safely set the flag with::

    ceph osd set sortbitwise

POOL_FULL
_________

One or more pools has reached its quota and is no longer allowing writes.

Pool quotas and utilization can be seen with::

    ceph df detail

You can either raise the pool quota with::

    ceph osd pool set-quota <poolname> max_objects <num-objects>
    ceph osd pool set-quota <poolname> max_bytes <num-bytes>

or delete some existing data to reduce utilization.
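
For example, to raise the object quota on a hypothetical pool ``mypool``
to one million objects (pool name and value are illustrative)::

    ceph osd pool set-quota mypool max_objects 1000000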

Data health (pools & placement groups)
--------------------------------------

PG_AVAILABILITY
_______________

Data availability is reduced, meaning that the cluster is unable to
service potential read or write requests for some data in the cluster.
Specifically, one or more PGs is in a state that does not allow IO
requests to be serviced. Problematic PG states include *peering*,
*stale*, *incomplete*, and the lack of *active* (if those conditions do not
clear quickly).

Detailed information about which PGs are affected is available from::

    ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

    ceph tell <pgid> query

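For example, to query a hypothetical placement group ``1.4``::

    ceph tell 1.4 query
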
PG_DEGRADED
___________

Data redundancy is reduced for some data, meaning the cluster does not
have the desired number of replicas for all data (for replicated
pools) or erasure code fragments (for erasure coded pools).
Specifically, one or more PGs:

* has the *degraded* or *undersized* flag set, meaning there are not
  enough instances of that placement group in the cluster;
* has not had the *clean* flag set for some time.

Detailed information about which PGs are affected is available from::

    ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

    ceph tell <pgid> query


PG_DEGRADED_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*backfill_toofull* or *recovery_toofull* flag set, meaning that the
cluster is unable to migrate or recover data because one or more OSDs
is above the *backfillfull* threshold.

See the discussion for *OSD_BACKFILLFULL* or *OSD_FULL* above for
steps to resolve this condition.

PG_DAMAGED
__________

Data scrubbing has discovered some problems with data consistency in
the cluster. Specifically, one or more PGs has the *inconsistent* or
*snaptrim_error* flag set, indicating an earlier scrub operation
found a problem, or has the *repair* flag set, meaning a repair
for such an inconsistency is currently in progress.

See :doc:`pg-repair` for more information.

OSD_SCRUB_ERRORS
________________

Recent OSD scrubs have uncovered inconsistencies. This error is generally
paired with *PG_DAMAGED* (see above).

See :doc:`pg-repair` for more information.

CACHE_POOL_NEAR_FULL
____________________

A cache tier pool is nearly full. Full in this context is determined
by the ``target_max_bytes`` and ``target_max_objects`` properties on
the cache pool. Once the pool reaches the target threshold, write
requests to the pool may block while data is flushed and evicted
from the cache, a state that normally leads to very high latencies and
poor performance.

The cache pool target size can be adjusted with::

    ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
    ceph osd pool set <cache-pool-name> target_max_objects <objects>

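For example, to grow a hypothetical cache pool ``hot-pool`` to 1 TiB and
one million objects (both values illustrative)::

    ceph osd pool set hot-pool target_max_bytes 1099511627776
    ceph osd pool set hot-pool target_max_objects 1000000
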
Normal cache flush and evict activity may also be throttled due to reduced
availability or performance of the base tier, or overall cluster load.

TOO_FEW_PGS
___________

The number of PGs in use in the cluster is below the configurable
threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead
to suboptimal distribution and balance of data across the OSDs in
the cluster, and similarly reduce overall performance.

This may be an expected condition if data pools have not yet been
created.

The PG count for existing pools can be increased or new pools can be
created. Please refer to
:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for
more information.

TOO_MANY_PGS
____________

The number of PGs in use in the cluster is above the configurable
threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold is
exceeded the cluster will not allow new pools to be created, pool `pg_num` to
be increased, or pool replication to be increased (any of which would lead to
more PGs in the cluster). A large number of PGs can lead
to higher memory utilization for OSD daemons, slower peering after
cluster state changes (like OSD restarts, additions, or removals), and
higher load on the Manager and Monitor daemons.

The simplest way to mitigate the problem is to increase the number of
OSDs in the cluster by adding more hardware. Note that the OSD count
used for the purposes of this health check is the number of "in" OSDs,
so marking "out" OSDs "in" (if there are any) can also help::

    ceph osd in <osd id(s)>

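For example, to mark two previously "out" OSDs back "in" (the ids are
illustrative)::

    ceph osd in 5 7
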
Please refer to
:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for
more information.

SMALLER_PGP_NUM
_______________

One or more pools has a ``pgp_num`` value less than ``pg_num``. This
is normally an indication that the PG count was increased without
also increasing the placement count (``pgp_num``).

This is sometimes done deliberately to separate out the `split` step
when the PG count is adjusted from the data migration that is needed
when ``pgp_num`` is changed.

This is normally resolved by setting ``pgp_num`` to match ``pg_num``,
triggering the data migration, with::

    ceph osd pool set <pool> pgp_num <pg-num-value>

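For example, if a hypothetical pool ``mypool`` has a ``pg_num`` of 128
but a ``pgp_num`` of 64::

    ceph osd pool set mypool pgp_num 128
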
MANY_OBJECTS_PER_PG
___________________

One or more pools has an average number of objects per PG that is
significantly higher than the overall cluster average. The specific
threshold is controlled by the ``mon_pg_warn_max_object_skew``
configuration value.

This is usually an indication that the pool(s) containing most of the
data in the cluster have too few PGs, and/or that other pools that do
not contain as much data have too many PGs. See the discussion of
*TOO_MANY_PGS* above.

The threshold can be raised to silence the health warning by adjusting
the ``mon_pg_warn_max_object_skew`` config option on the monitors.

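For example, on releases with a centralized configuration database, the
threshold could be raised to 20 (the value is illustrative; on older
releases, set the option in ``ceph.conf`` on the monitors instead)::

    ceph config set mon mon_pg_warn_max_object_skew 20
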
POOL_APP_NOT_ENABLED
____________________

A pool exists that contains one or more objects but has not been
tagged for use by a particular application.

Resolve this warning by labeling the pool for use by an application. For
example, if the pool is used by RBD::

    rbd pool init <poolname>

If the pool is being used by a custom application 'foo', you can also label
it via the low-level command::

    ceph osd pool application enable <poolname> foo

For more information, see :doc:`pools#associate-pool-to-application`.

POOL_FULL
_________

One or more pools has reached (or is very close to reaching) its
quota. The threshold to trigger this error condition is controlled by
the ``mon_pool_quota_crit_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) with::

    ceph osd pool set-quota <pool> max_bytes <bytes>
    ceph osd pool set-quota <pool> max_objects <objects>

Setting the quota value to 0 will disable the quota.
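
For example, to remove the byte quota from a hypothetical pool
``mypool``::

    ceph osd pool set-quota mypool max_bytes 0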

POOL_NEAR_FULL
______________

One or more pools is approaching its quota. The threshold to trigger
this warning condition is controlled by the
``mon_pool_quota_warn_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) with::

    ceph osd pool set-quota <pool> max_bytes <bytes>
    ceph osd pool set-quota <pool> max_objects <objects>

Setting the quota value to 0 will disable the quota.

OBJECT_MISPLACED
________________

One or more objects in the cluster is not stored on the node the
cluster would like it to be stored on. This is an indication that
data migration due to some recent cluster change has not yet completed.

Misplaced data is not a dangerous condition in and of itself; data
consistency is never at risk, and old copies of objects are never
removed until the desired number of new copies (in the desired
locations) are present.

OBJECT_UNFOUND
______________

One or more objects in the cluster cannot be found. Specifically, the
OSDs know that a new or updated copy of an object should exist, but a
copy of that version of the object has not been found on OSDs that are
currently online.

Read or write requests to unfound objects will block.

Ideally, a down OSD that has the more recent copy of the unfound object
can be brought back online. Candidate OSDs can be identified from the
peering state for the PG(s) responsible for the unfound object::

    ceph tell <pgid> query

If the latest copy of the object is not available, the cluster can be
told to roll back to a previous version of the object. See
:doc:`troubleshooting-pg#Unfound-objects` for more information.

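For example, as a last resort, unfound objects in a hypothetical PG
``2.4`` can be reverted to a prior version (only do this once you are
certain the most recent copies cannot be recovered)::

    ceph pg 2.4 mark_unfound_lost revert
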
REQUEST_SLOW
____________

One or more OSD requests is taking a long time to process. This can
be an indication of extreme load, a slow storage device, or a software
bug.

The request queue on the OSD(s) in question can be queried with the
following command, executed from the OSD host::

    ceph daemon osd.<id> ops

A summary of the slowest recent requests can be seen with::

    ceph daemon osd.<id> dump_historic_ops

The location of an OSD can be found with::

    ceph osd find osd.<id>

REQUEST_STUCK
_____________

One or more OSD requests has been blocked for an extremely long time.
This is an indication that either the cluster has been unhealthy for
an extended period of time (e.g., not enough running OSDs) or there is
some internal problem with the OSD. See the discussion of
*REQUEST_SLOW* above.

PG_NOT_SCRUBBED
_______________

One or more PGs has not been scrubbed recently. PGs are normally
scrubbed every ``mon_scrub_interval`` seconds, and this warning
triggers when ``mon_warn_not_scrubbed`` such intervals have elapsed
without a scrub.

PGs will not scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a scrub of a clean PG with::

    ceph pg scrub <pgid>

PG_NOT_DEEP_SCRUBBED
____________________

One or more PGs has not been deep scrubbed recently. PGs are normally
scrubbed every ``osd_deep_mon_scrub_interval`` seconds, and this warning
triggers when ``mon_warn_not_deep_scrubbed`` such intervals have elapsed
without a scrub.

PGs will not (deep) scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a scrub of a clean PG with::

    ceph pg deep-scrub <pgid>