=============
Health checks
=============

Overview
========

There is a finite set of possible health messages that a Ceph cluster can
raise -- these are defined as *health checks* which have unique identifiers.

The identifier is a terse pseudo-human-readable (i.e. like a variable name)
string. It is intended to enable tools (such as UIs) to make sense of
health checks, and present them in a way that reflects their meaning.

This page lists the health checks that are raised by the monitor and manager
daemons. In addition to these, you may also see health checks that originate
from MDS daemons (see :doc:`/cephfs/health-messages`), and health checks
that are defined by ceph-mgr python modules.

Definitions
===========

OSDs
----

OSD_DOWN
________

One or more OSDs are marked down. The ceph-osd daemon may have been
stopped, or peer OSDs may be unable to reach the OSD over the network.
Common causes include a stopped or crashed daemon, a down host, or a
network outage.

Verify the host is healthy, the daemon is started, and the network is
functioning. If the daemon has crashed, the daemon log file
(``/var/log/ceph/ceph-osd.*``) may contain debugging information.

OSD_<crush type>_DOWN
_____________________

(e.g. OSD_HOST_DOWN, OSD_ROOT_DOWN)

All the OSDs within a particular CRUSH subtree are marked down, for example
all OSDs on a host.

OSD_ORPHAN
__________

An OSD is referenced in the CRUSH map hierarchy but does not exist.

The OSD can be removed from the CRUSH hierarchy with::

  ceph osd crush rm osd.<id>

OSD_OUT_OF_ORDER_FULL
_____________________

The utilization thresholds for `backfillfull`, `nearfull`, `full`,
and/or `failsafe_full` are not ascending. In particular, we expect
`nearfull < backfillfull`, `backfillfull < full`, and `full <
failsafe_full`.

The thresholds can be adjusted with::

  ceph osd set-backfillfull-ratio <ratio>
  ceph osd set-nearfull-ratio <ratio>
  ceph osd set-full-ratio <ratio>

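The rule behind this check is simply that the thresholds, taken in their
intended order, must be strictly ascending. The following is an
illustrative sketch of that rule, not part of Ceph itself:

```python
# Illustrative sketch (not part of Ceph): the fullness thresholds,
# taken in their intended order, must be strictly ascending, or the
# OSD_OUT_OF_ORDER_FULL warning is raised.
def thresholds_ascending(thresholds):
    """Return True when each threshold is strictly below the next."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

# Correctly ordered ratios pass:
print(thresholds_ascending([0.85, 0.90, 0.95, 0.97]))  # True
# Raising one threshold above its successor triggers the warning:
print(thresholds_ascending([0.85, 0.96, 0.95, 0.97]))  # False
```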

OSD_FULL
________

One or more OSDs has exceeded the `full` threshold and is preventing
the cluster from servicing writes.

Utilization by pool can be checked with::

  ceph df

The currently defined `full` ratio can be seen with::

  ceph osd dump | grep full_ratio

A short-term workaround to restore write availability is to raise the full
threshold by a small amount::

  ceph osd set-full-ratio <ratio>

New storage should be added to the cluster by deploying more OSDs or
existing data should be deleted in order to free up space.

OSD_BACKFILLFULL
________________

One or more OSDs has exceeded the `backfillfull` threshold, which will
prevent data from being rebalanced to this device. This is
an early warning that rebalancing may not be able to complete and that
the cluster is approaching full.

Utilization by pool can be checked with::

  ceph df

OSD_NEARFULL
____________

One or more OSDs has exceeded the `nearfull` threshold. This is an early
warning that the cluster is approaching full.

Utilization by pool can be checked with::

  ceph df

OSDMAP_FLAGS
____________

One or more cluster flags of interest has been set. These flags include:

* *full* - the cluster is flagged as full and cannot service writes
* *pauserd*, *pausewr* - paused reads or writes
* *noup* - OSDs are not allowed to start
* *nodown* - OSD failure reports are being ignored, such that the
  monitors will not mark OSDs `down`
* *noin* - OSDs that were previously marked `out` will not be marked
  back `in` when they start
* *noout* - down OSDs will not automatically be marked out after the
  configured interval
* *nobackfill*, *norecover*, *norebalance* - recovery or data
  rebalancing is suspended
* *noscrub*, *nodeep-scrub* - scrubbing is disabled
* *notieragent* - cache tiering activity is suspended

With the exception of *full*, these flags can be set or cleared with::

  ceph osd set <flag>
  ceph osd unset <flag>

OSD_FLAGS
_________

One or more OSDs has a per-OSD flag of interest set. These flags include:

* *noup*: OSD is not allowed to start
* *nodown*: failure reports for this OSD will be ignored
* *noin*: if this OSD was previously marked `out` automatically
  after a failure, it will not be marked in when it starts
* *noout*: if this OSD is down it will not automatically be marked
  `out` after the configured interval

Per-OSD flags can be set and cleared with::

  ceph osd add-<flag> <osd-id>
  ceph osd rm-<flag> <osd-id>

For example::

  ceph osd rm-nodown osd.123

OLD_CRUSH_TUNABLES
__________________

The CRUSH map is using very old settings and should be updated. The
oldest tunables that can be used (i.e., the oldest client version that
can connect to the cluster) without triggering this health warning is
determined by the ``mon_crush_min_required_version`` config option.
See :doc:`/rados/operations/crush-map/#tunables` for more information.

OLD_CRUSH_STRAW_CALC_VERSION
____________________________

The CRUSH map is using an older, non-optimal method for calculating
intermediate weight values for ``straw`` buckets.

The CRUSH map should be updated to use the newer method
(``straw_calc_version=1``). See
:doc:`/rados/operations/crush-map/#tunables` for more information.

CACHE_POOL_NO_HIT_SET
_____________________

One or more cache pools is not configured with a *hit set* to track
utilization, which will prevent the tiering agent from identifying
cold objects to flush and evict from the cache.

Hit sets can be configured on the cache pool with::

  ceph osd pool set <poolname> hit_set_type <type>
  ceph osd pool set <poolname> hit_set_period <period-in-seconds>
  ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
  ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>
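
For the common ``bloom`` hit set type, ``hit_set_fpp`` trades memory for
accuracy. The sketch below is standard Bloom filter sizing math, shown
only to illustrate what a false-positive target implies for hit set size;
it is not Ceph's implementation:

```python
import math

# Illustrative sketch: textbook Bloom filter sizing, showing what a
# target false-positive rate (hit_set_fpp) implies for the memory cost
# of a hit set. Not Ceph's exact implementation.
def bloom_bits_per_entry(fpp):
    """Bits per inserted object for a target false-positive probability."""
    return -math.log(fpp) / (math.log(2) ** 2)

# A 5% false-positive target needs roughly 6.2 bits per tracked object;
# tightening the target to 1% raises that to roughly 9.6 bits.
print(round(bloom_bits_per_entry(0.05), 1))  # 6.2
print(round(bloom_bits_per_entry(0.01), 1))  # 9.6
```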

OSD_NO_SORTBITWISE
__________________

No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has not
been set.

The ``sortbitwise`` flag must be set before luminous v12.y.z or newer
OSDs can start. You can safely set the flag with::

  ceph osd set sortbitwise

POOL_FULL
_________

One or more pools has reached its quota and is no longer allowing writes.

Pool quotas and utilization can be seen with::

  ceph df detail

You can either raise the pool quota with::

  ceph osd pool set-quota <poolname> max_objects <num-objects>
  ceph osd pool set-quota <poolname> max_bytes <num-bytes>

or delete some existing data to reduce utilization.


Data health (pools & placement groups)
--------------------------------------

PG_AVAILABILITY
_______________

Data availability is reduced, meaning that the cluster is unable to
service potential read or write requests for some data in the cluster.
Specifically, one or more PGs is in a state that does not allow IO
requests to be serviced. Problematic PG states include *peering*,
*stale*, *incomplete*, and the lack of *active* (if those conditions do not clear
quickly).

Detailed information about which PGs are affected is available from::

  ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

  ceph tell <pgid> query

PG_DEGRADED
___________

Data redundancy is reduced for some data, meaning the cluster does not
have the desired number of replicas for all data (for replicated
pools) or erasure code fragments (for erasure coded pools).
Specifically, one or more PGs:

* has the *degraded* or *undersized* flag set, meaning there are not
  enough instances of that placement group in the cluster;
* has not had the *clean* flag set for some time.

Detailed information about which PGs are affected is available from::

  ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

  ceph tell <pgid> query

PG_DEGRADED_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*backfill_toofull* or *recovery_toofull* flag set, meaning that the
cluster is unable to migrate or recover data because one or more OSDs
is above the *backfillfull* threshold.

See the discussion for *OSD_BACKFILLFULL* or *OSD_FULL* above for
steps to resolve this condition.

PG_DAMAGED
__________

Data scrubbing has discovered some problems with data consistency in
the cluster. Specifically, one or more PGs has the *inconsistent* or
*snaptrim_error* flag set, indicating an earlier scrub operation
found a problem, or the *repair* flag set, meaning a repair
for such an inconsistency is currently in progress.

See :doc:`pg-repair` for more information.

OSD_SCRUB_ERRORS
________________

Recent OSD scrubs have uncovered inconsistencies. This error is generally
paired with *PG_DAMAGED* (see above).

See :doc:`pg-repair` for more information.

CACHE_POOL_NEAR_FULL
____________________

A cache tier pool is nearly full. Full in this context is determined
by the ``target_max_bytes`` and ``target_max_objects`` properties on
the cache pool. Once the pool reaches the target threshold, write
requests to the pool may block while data is flushed and evicted
from the cache, a state that normally leads to very high latencies and
poor performance.

The cache pool target size can be adjusted with::

  ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
  ceph osd pool set <cache-pool-name> target_max_objects <objects>

Normal cache flush and evict activity may also be throttled due to reduced
availability or performance of the base tier, or overall cluster load.

TOO_FEW_PGS
___________

The number of PGs in use in the cluster is below the configurable
threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead
to suboptimal distribution and balance of data across the OSDs in
the cluster, and similarly reduce overall performance.

This may be an expected condition if data pools have not yet been
created.

The PG count for existing pools can be increased or new pools can be
created. Please refer to
:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for
more information.

TOO_MANY_PGS
____________

The number of PGs in use in the cluster is above the configurable
threshold of ``mon_pg_warn_max_per_osd`` PGs per OSD. This can lead
to higher memory utilization for OSD daemons, slower peering after
cluster state changes (like OSD restarts, additions, or removals), and
higher load on the Manager and Monitor daemons.

The ``pg_num`` value for existing pools cannot currently be reduced,
but the ``pgp_num`` value can. Reducing it effectively collocates some
PGs on the same sets of OSDs, mitigating some of the negative impacts
described above. The ``pgp_num`` value can be adjusted with::

  ceph osd pool set <pool> pgp_num <value>

Please refer to
:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for
more information.
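
The quantity compared against these per-OSD thresholds is roughly the
total number of PG replicas divided by the number of OSDs. The following
is a simplified illustration of that arithmetic, not Ceph's exact
accounting:

```python
# Simplified illustration of the PGs-per-OSD figure behind TOO_MANY_PGS
# and TOO_FEW_PGS (not Ceph's exact accounting): each PG counts once
# per replica, and the total is divided across the OSDs.
def pgs_per_osd(pools, num_osds):
    """pools: iterable of (pg_num, replica_count) pairs."""
    total_pg_instances = sum(pg_num * size for pg_num, size in pools)
    return total_pg_instances / num_osds

# Three pools of 512 PGs each at 3x replication, spread over 16 OSDs:
print(pgs_per_osd([(512, 3)] * 3, 16))  # 288.0
```

With a typical warning threshold in the low hundreds of PGs per OSD, a
cluster like this example would be flagged.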

SMALLER_PGP_NUM
_______________

One or more pools has a ``pgp_num`` value less than ``pg_num``. This
is normally an indication that the PG count was increased without
``pgp_num``, which controls data placement, also being increased.

This is sometimes done deliberately to separate out the `split` step
when the PG count is adjusted from the data migration that is needed
when ``pgp_num`` is changed.

This is normally resolved by setting ``pgp_num`` to match ``pg_num``,
triggering the data migration, with::

  ceph osd pool set <pool> pgp_num <pg-num-value>


MANY_OBJECTS_PER_PG
___________________

One or more pools has an average number of objects per PG that is
significantly higher than the overall cluster average. The specific
threshold is controlled by the ``mon_pg_warn_max_object_skew``
configuration value.

This is usually an indication that the pool(s) containing most of the
data in the cluster have too few PGs, and/or that other pools that do
not contain as much data have too many PGs. See the discussion of
*TOO_MANY_PGS* above.

The threshold can be raised to silence the health warning by adjusting
the ``mon_pg_warn_max_object_skew`` config option on the monitors.
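
The skew test can be pictured as a ratio: a pool's objects-per-PG
compared against the cluster-wide objects-per-PG. This is an
illustrative sketch of that comparison, not Ceph's exact code, and the
default skew factor used here is only a placeholder:

```python
# Illustrative sketch (not Ceph's exact code) of the object-skew test:
# a pool is flagged when its objects-per-PG exceeds the cluster-wide
# average by more than mon_pg_warn_max_object_skew times.
def pool_is_skewed(pool_objects, pool_pgs, total_objects, total_pgs,
                   max_object_skew=10.0):  # placeholder skew factor
    cluster_avg = total_objects / total_pgs
    pool_avg = pool_objects / pool_pgs
    return pool_avg > max_object_skew * cluster_avg

# A pool holding 900k of 1M objects in only 8 of 512 PGs is heavily skewed:
print(pool_is_skewed(900_000, 8, 1_000_000, 512))  # True
# A pool whose share of objects matches its share of PGs is not:
print(pool_is_skewed(100_000, 64, 1_000_000, 512))  # False
```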

POOL_APP_NOT_ENABLED
____________________

A pool exists that contains one or more objects but has not been
tagged for use by a particular application.

Resolve this warning by labeling the pool for use by an application. For
example, if the pool is used by RBD::

  rbd pool init <poolname>

If the pool is being used by a custom application 'foo', you can also label
it via the low-level command::

  ceph osd pool application enable <poolname> foo

For more information, see :doc:`pools.rst#associate-pool-to-application`.

POOL_FULL
_________

One or more pools has reached (or is very close to reaching) its
quota. The threshold to trigger this error condition is controlled by
the ``mon_pool_quota_crit_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) with::

  ceph osd pool set-quota <pool> max_bytes <bytes>
  ceph osd pool set-quota <pool> max_objects <objects>

Setting the quota value to 0 will disable the quota.

POOL_NEAR_FULL
______________

One or more pools is approaching its quota. The threshold to trigger
this warning condition is controlled by the
``mon_pool_quota_warn_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) with::

  ceph osd pool set-quota <pool> max_bytes <bytes>
  ceph osd pool set-quota <pool> max_objects <objects>

Setting the quota value to 0 will disable the quota.
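
Conceptually, both quota checks compare a pool's utilization against a
fraction of its quota. The sketch below illustrates that comparison; the
threshold values are placeholders, not Ceph's defaults, and the logic is
a simplification rather than Ceph's exact code:

```python
# Illustrative sketch (not Ceph's exact code) of how quota utilization
# maps to POOL_NEAR_FULL and POOL_FULL. The warn/crit thresholds are
# fractions of the quota; the values below are placeholders.
def pool_quota_health(used, quota, warn_threshold=0.67, crit_threshold=0.90):
    if quota == 0:  # a quota of 0 means "no quota configured"
        return "HEALTH_OK"
    ratio = used / quota
    if ratio >= crit_threshold:
        return "POOL_FULL"
    if ratio >= warn_threshold:
        return "POOL_NEAR_FULL"
    return "HEALTH_OK"

print(pool_quota_health(50, 100))  # HEALTH_OK
print(pool_quota_health(70, 100))  # POOL_NEAR_FULL
print(pool_quota_health(95, 100))  # POOL_FULL
```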

OBJECT_MISPLACED
________________

One or more objects in the cluster is not stored on the node the
cluster would like it to be stored on. This is an indication that
data migration due to some recent cluster change has not yet completed.

Misplaced data is not a dangerous condition in and of itself; data
consistency is never at risk, and old copies of objects are never
removed until the desired number of new copies (in the desired
locations) are present.

OBJECT_UNFOUND
______________

One or more objects in the cluster cannot be found. Specifically, the
OSDs know that a new or updated copy of an object should exist, but a
copy of that version of the object has not been found on OSDs that are
currently online.

Read or write requests to unfound objects will block.

Ideally, a down OSD that has the more recent copy of the unfound
object can be brought back online. Candidate OSDs can be identified from the
peering state for the PG(s) responsible for the unfound object::

  ceph tell <pgid> query

If the latest copy of the object is not available, the cluster can be
told to roll back to a previous version of the object. See
:doc:`troubleshooting-pg#Unfound-objects` for more information.

REQUEST_SLOW
____________

One or more OSD requests is taking a long time to process. This can
be an indication of extreme load, a slow storage device, or a software
bug.

The request queue on the OSD(s) in question can be queried with the
following command, executed from the OSD host::

  ceph daemon osd.<id> ops

A summary of the slowest recent requests can be seen with::

  ceph daemon osd.<id> dump_historic_ops

The location of an OSD can be found with::

  ceph osd find osd.<id>

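The ``dump_historic_ops`` command emits JSON, which can be post-processed
to rank the slowest entries. In the sketch below the ``ops``,
``duration``, and ``description`` field names are assumptions about the
JSON layout, so verify them against your cluster's actual output:

```python
import json

# Sketch: rank the slowest recent ops from the JSON produced by
# `ceph daemon osd.<id> dump_historic_ops`. The "ops", "duration", and
# "description" field names are assumptions; check them against real output.
def slowest_ops(historic_ops_json, top_n=3):
    data = json.loads(historic_ops_json)
    ops = sorted(data.get("ops", []),
                 key=lambda op: op.get("duration", 0.0),
                 reverse=True)
    return [(op["duration"], op["description"]) for op in ops[:top_n]]

# A hand-made sample standing in for real daemon output:
sample = json.dumps({"ops": [
    {"duration": 0.8, "description": "osd_op(client.1 write)"},
    {"duration": 31.5, "description": "osd_op(client.2 read)"},
]})
print(slowest_ops(sample, top_n=1))  # [(31.5, 'osd_op(client.2 read)')]
```
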
REQUEST_STUCK
_____________

One or more OSD requests has been blocked for an extremely long time.
This is an indication that either the cluster has been unhealthy for
an extended period of time (e.g., not enough running OSDs) or there is
some internal problem with the OSD. See the discussion of
*REQUEST_SLOW* above.

PG_NOT_SCRUBBED
_______________

One or more PGs has not been scrubbed recently. PGs are normally
scrubbed every ``mon_scrub_interval`` seconds, and this warning
triggers when ``mon_warn_not_scrubbed`` such intervals have elapsed
without a scrub.

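The trigger condition amounts to simple interval arithmetic. The
following is an illustrative sketch of the rule described above, not
Ceph's exact code:

```python
import time

# Illustrative sketch (not Ceph's exact code) of the rule above: warn
# when more than `warn_intervals` scrub intervals have elapsed since
# the PG was last scrubbed.
def scrub_overdue(last_scrub_stamp, scrub_interval, warn_intervals,
                  now=None):
    now = time.time() if now is None else now
    return (now - last_scrub_stamp) > warn_intervals * scrub_interval

# Last scrubbed 3 days ago, daily interval, warn after 2 missed intervals:
day = 86400
print(scrub_overdue(0, day, 2, now=3 * day))  # True
# One day since the last scrub is still within bounds:
print(scrub_overdue(0, day, 2, now=day))  # False
```
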
PGs will not scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a scrub of a clean PG with::

  ceph pg scrub <pgid>

PG_NOT_DEEP_SCRUBBED
____________________

One or more PGs has not been deep scrubbed recently. PGs are normally
deep scrubbed every ``osd_deep_scrub_interval`` seconds, and this warning
triggers when ``mon_warn_not_deep_scrubbed`` such intervals have elapsed
without a scrub.

PGs will not (deep) scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a deep scrub of a clean PG with::

  ceph pg deep-scrub <pgid>