.. _health-checks:

=============
Health checks
=============

Overview
========

There is a finite set of possible health messages that a Ceph cluster can
raise -- these are defined as *health checks* which have unique identifiers.

The identifier is a terse pseudo-human-readable (i.e. like a variable name)
string. It is intended to enable tools (such as UIs) to make sense of
health checks, and present them in a way that reflects their meaning.

This page lists the health checks that are raised by the monitor and manager
daemons. In addition to these, you may also see health checks that originate
from MDS daemons (see :ref:`cephfs-health-messages`), and health checks
that are defined by ceph-mgr python modules.

Definitions
===========

Monitor
-------

MON_DOWN
________

One or more monitor daemons is currently down. The cluster requires a
majority (more than 1/2) of the monitors in order to function. When
one or more monitors are down, clients may have a harder time forming
their initial connection to the cluster, as they may need to try more
addresses before they reach an operating monitor.

The down monitor daemon should generally be restarted as soon as
possible to reduce the risk of a subsequent monitor failure leading to
a service outage.
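
Which monitors are down, and the current quorum, can be checked with::

    ceph health detail
    ceph quorum_status

On systemd-based deployments, a stopped monitor daemon can be restarted
with ``systemctl restart ceph-mon@<hostname>``, where ``<hostname>`` is
the id of the affected monitor.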

MON_CLOCK_SKEW
______________

The clocks on the hosts running the ceph-mon monitor daemons are not
sufficiently well synchronized. This health alert is raised if the
cluster detects a clock skew greater than ``mon_clock_drift_allowed``.

This is best resolved by synchronizing the clocks using a tool like
``ntpd`` or ``chrony``.

If it is impractical to keep the clocks closely synchronized, the
``mon_clock_drift_allowed`` threshold can also be increased, but this
value must stay significantly below the ``mon_lease`` interval in
order for the monitor cluster to function properly.
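
The monitors' view of the current time offsets can be inspected with::

    ceph time-sync-status

The threshold is expressed in seconds (default: 0.05) and can be raised
with, for example::

    ceph config set mon mon_clock_drift_allowed 0.1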

MON_MSGR2_NOT_ENABLED
_____________________

The ``ms_bind_msgr2`` option is enabled but one or more monitors is
not configured to bind to a v2 port in the cluster's monmap. This
means that features specific to the msgr2 protocol (e.g., encryption)
are not available on some or all connections.

In most cases this can be corrected by issuing the command::

    ceph mon enable-msgr2

That command will change any monitor configured for the old default
port 6789 to continue to listen for v1 connections on 6789 and also
listen for v2 connections on the new default port 3300.

If a monitor is configured to listen for v1 connections on a
non-standard port (not 6789), the monmap will need to be modified
manually.

MON_DISK_LOW
____________

One or more monitors is low on disk space. This alert triggers if the
available space on the file system storing the monitor database
(normally ``/var/lib/ceph/mon``), as a percentage, drops below
``mon_data_avail_warn`` (default: 30%).

This may indicate that some other process or user on the system is
filling up the same file system used by the monitor. It may also
indicate that the monitor's database is large (see ``MON_DISK_BIG``
below).

If space cannot be freed, the monitor's data directory may need to be
moved to another storage device or file system (while the monitor
daemon is not running, of course).

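
Free space on the monitor's file system can be checked with standard
tools, for example::

    df -h /var/lib/ceph/mon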

MON_DISK_CRIT
_____________

One or more monitors is critically low on disk space. This alert
triggers if the available space on the file system storing the monitor
database (normally ``/var/lib/ceph/mon``), as a percentage, drops
below ``mon_data_avail_crit`` (default: 5%). See ``MON_DISK_LOW``, above.

MON_DISK_BIG
____________

The database size for one or more monitors is very large. This alert
triggers if the size of the monitor's database is larger than
``mon_data_size_warn`` (default: 15 GiB).

A large database is unusual, but does not necessarily indicate a
problem. Monitor databases may grow in size when there are placement
groups that have not reached an ``active+clean`` state in a long time.

This may also indicate that the monitor's database is not properly
compacting, which has been observed with some older versions of
leveldb and rocksdb. Forcing a compaction with ``ceph daemon mon.<id>
compact`` may shrink the on-disk size.

This warning may also indicate that the monitor has a bug that is
preventing it from pruning the cluster metadata it stores. If the
problem persists, please report a bug.

The warning threshold may be adjusted with::

    ceph config set global mon_data_size_warn <size>


Manager
-------

MGR_DOWN
________

All manager daemons are currently down. The cluster should normally
have at least one running manager (``ceph-mgr``) daemon. If no
manager daemon is running, the cluster's ability to monitor itself will
be compromised, and parts of the management API will become
unavailable (for example, the dashboard will not work, and most CLI
commands that report metrics or runtime state will block). However,
the cluster will still be able to perform all IO operations and
recover from failures.

The down manager daemon should generally be restarted as soon as
possible to ensure that the cluster can be monitored (e.g., so that
the ``ceph -s`` information is up to date, and/or metrics can be
scraped by Prometheus).

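
On systemd-based deployments, the manager daemon can be restarted with,
for example::

    systemctl restart ceph-mgr@<hostname>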

MGR_MODULE_DEPENDENCY
_____________________

An enabled manager module is failing its dependency check. This health check
should come with an explanatory message from the module about the problem.

For example, a module might report that a required package is not installed:
install the required package and restart your manager daemons.

This health check is only applied to enabled modules. If a module is
not enabled, you can see whether it is reporting dependency issues in
the output of ``ceph mgr module ls``.

MGR_MODULE_ERROR
________________

A manager module has experienced an unexpected error. Typically,
this means an unhandled exception was raised from the module's ``serve``
function. The human-readable description of the error may be obscurely
worded if the exception did not provide a useful description of itself.

This health check may indicate a bug: please open a Ceph bug report if you
think you have encountered a bug.

If you believe the error is transient, you may restart your manager
daemon(s), or use ``ceph mgr fail`` on the active daemon to prompt
a failover to another daemon.

OSDs
----

OSD_DOWN
________

One or more OSDs are marked down. The ceph-osd daemon may have been
stopped, or peer OSDs may be unable to reach the OSD over the network.
Common causes include a stopped or crashed daemon, a down host, or a
network outage.

Verify that the host is healthy, the daemon is started, and the network
is functioning. If the daemon has crashed, the daemon log file
(``/var/log/ceph/ceph-osd.*``) may contain debugging information.
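
Down OSDs and their location in the CRUSH hierarchy can be listed with::

    ceph health detail
    ceph osd tree down

On systemd-based deployments, a stopped daemon can be restarted with
``systemctl restart ceph-osd@<id>``.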

OSD_<crush type>_DOWN
_____________________

(e.g. OSD_HOST_DOWN, OSD_ROOT_DOWN)

All the OSDs within a particular CRUSH subtree are marked down, for example
all OSDs on a host.

OSD_ORPHAN
__________

An OSD is referenced in the CRUSH map hierarchy but does not exist.

The OSD can be removed from the CRUSH hierarchy with::

    ceph osd crush rm osd.<id>

OSD_OUT_OF_ORDER_FULL
_____________________

The utilization thresholds for `nearfull`, `backfillfull`, `full`,
and/or `failsafe_full` are not ascending. In particular, we expect
`nearfull < backfillfull`, `backfillfull < full`, and `full <
failsafe_full`.

The thresholds can be adjusted with::

    ceph osd set-nearfull-ratio <ratio>
    ceph osd set-backfillfull-ratio <ratio>
    ceph osd set-full-ratio <ratio>

OSD_FULL
________

One or more OSDs has exceeded the `full` threshold and is preventing
the cluster from servicing writes.

Utilization by pool can be checked with::

    ceph df

The currently defined `full` ratio can be seen with::

    ceph osd dump | grep full_ratio

A short-term workaround to restore write availability is to raise the full
threshold by a small amount::

    ceph osd set-full-ratio <ratio>

New storage should be added to the cluster by deploying more OSDs, or
existing data should be deleted in order to free up space.

OSD_BACKFILLFULL
________________

One or more OSDs has exceeded the `backfillfull` threshold, which will
prevent data from being rebalanced to this device. This is
an early warning that rebalancing may not be able to complete and that
the cluster is approaching full.

Utilization by pool can be checked with::

    ceph df

OSD_NEARFULL
____________

One or more OSDs has exceeded the `nearfull` threshold. This is an early
warning that the cluster is approaching full.

Utilization by pool can be checked with::

    ceph df

OSDMAP_FLAGS
____________

One or more cluster flags of interest has been set. These flags include:

* *full* - the cluster is flagged as full and cannot serve writes
* *pauserd*, *pausewr* - paused reads or writes
* *noup* - OSDs are not allowed to start
* *nodown* - OSD failure reports are being ignored, such that the
  monitors will not mark OSDs `down`
* *noin* - OSDs that were previously marked `out` will not be marked
  back `in` when they start
* *noout* - down OSDs will not automatically be marked out after the
  configured interval
* *nobackfill*, *norecover*, *norebalance* - recovery or data
  rebalancing is suspended
* *noscrub*, *nodeep-scrub* - scrubbing is disabled
* *notieragent* - cache tiering activity is suspended

With the exception of *full*, these flags can be set or cleared with::

    ceph osd set <flag>
    ceph osd unset <flag>

OSD_FLAGS
_________

One or more OSDs, CRUSH nodes, or device classes has a flag of interest set.
These flags include:

* *noup*: these OSDs are not allowed to start
* *nodown*: failure reports for these OSDs will be ignored
* *noin*: if these OSDs were previously marked `out` automatically
  after a failure, they will not be marked in when they start
* *noout*: if these OSDs are down they will not automatically be marked
  `out` after the configured interval

These flags can be set and cleared in batch with::

    ceph osd set-group <flags> <who>
    ceph osd unset-group <flags> <who>

For example::

    ceph osd set-group noup,noout osd.0 osd.1
    ceph osd unset-group noup,noout osd.0 osd.1
    ceph osd set-group noup,noout host-foo
    ceph osd unset-group noup,noout host-foo
    ceph osd set-group noup,noout class-hdd
    ceph osd unset-group noup,noout class-hdd

OLD_CRUSH_TUNABLES
__________________

The CRUSH map is using very old settings and should be updated. The
oldest tunables that can be used (i.e., the oldest client version that
can connect to the cluster) without triggering this health warning are
determined by the ``mon_crush_min_required_version`` config option.
See :ref:`crush-map-tunables` for more information.

OLD_CRUSH_STRAW_CALC_VERSION
____________________________

The CRUSH map is using an older, non-optimal method for calculating
intermediate weight values for ``straw`` buckets.

The CRUSH map should be updated to use the newer method
(``straw_calc_version=1``). See
:ref:`crush-map-tunables` for more information.

CACHE_POOL_NO_HIT_SET
_____________________

One or more cache pools is not configured with a *hit set* to track
utilization, which will prevent the tiering agent from identifying
cold objects to flush and evict from the cache.

Hit sets can be configured on the cache pool with::

    ceph osd pool set <poolname> hit_set_type <type>
    ceph osd pool set <poolname> hit_set_period <period-in-seconds>
    ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
    ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>

OSD_NO_SORTBITWISE
__________________

No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has not
been set.

The ``sortbitwise`` flag must be set before luminous v12.y.z or newer
OSDs can start. You can safely set the flag with::

    ceph osd set sortbitwise

POOL_FULL
_________

One or more pools has reached its quota and is no longer allowing writes.

Pool quotas and utilization can be seen with::

    ceph df detail

You can either raise the pool quota with::

    ceph osd pool set-quota <poolname> max_objects <num-objects>
    ceph osd pool set-quota <poolname> max_bytes <num-bytes>

or delete some existing data to reduce utilization.

BLUEFS_SPILLOVER
________________

One or more OSDs that use the BlueStore backend have been allocated
`db` partitions (storage space for metadata, normally on a faster
device) but that space has filled, such that metadata has "spilled
over" onto the normal slow device. This isn't necessarily an error
condition or even unexpected, but if the administrator's expectation
was that all metadata would fit on the faster device, it indicates
that not enough space was provided.

This warning can be disabled on all OSDs with::

    ceph config set osd bluestore_warn_on_bluefs_spillover false

Alternatively, it can be disabled on a specific OSD with::

    ceph config set osd.123 bluestore_warn_on_bluefs_spillover false

To provide more metadata space, the OSD in question could be destroyed and
reprovisioned. This will involve data migration and recovery.

It may also be possible to expand the LVM logical volume backing the
`db` storage. If the underlying LV has been expanded, the OSD daemon
needs to be stopped and BlueFS informed of the device size change with::

    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID

BLUEFS_AVAILABLE_SPACE
______________________

To check how much space is free for BlueFS do::

    ceph daemon osd.123 bluestore bluefs available

This will output up to 3 values: `BDEV_DB free`, `BDEV_SLOW free` and
`available_from_bluestore`. `BDEV_DB free` and `BDEV_SLOW free` report the
amount of space that has been acquired by BlueFS and is considered free.
The `available_from_bluestore` value denotes the ability of BlueStore to
relinquish more space to BlueFS. It is normal for this value to differ from
the amount of BlueStore free space, as the BlueFS allocation unit is
typically larger than the BlueStore allocation unit. This means that only
part of the BlueStore free space will be acceptable for BlueFS.

BLUEFS_LOW_SPACE
________________

If BlueFS is running low on available free space and there is little
`available_from_bluestore`, one can consider reducing the BlueFS allocation
unit size. To simulate available space when the allocation unit is
different do::

    ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>

BLUESTORE_FRAGMENTATION
_______________________

As BlueStore operates, free space on the underlying storage will become
fragmented. This is normal and unavoidable, but excessive fragmentation
will cause slowdown. To inspect BlueStore fragmentation do::

    ceph daemon osd.123 bluestore allocator score block

The score is given in the [0-1] range:

* [0.0 .. 0.4] tiny fragmentation
* [0.4 .. 0.7] small, acceptable fragmentation
* [0.7 .. 0.9] considerable, but safe fragmentation
* [0.9 .. 1.0] severe fragmentation, may impact BlueFS ability to get space from BlueStore

If a detailed report of free fragments is required do::

    ceph daemon osd.123 bluestore allocator dump block

If the OSD process is not running, fragmentation can instead be inspected
with `ceph-bluestore-tool`. Get the fragmentation score::

    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score

And dump detailed free chunks::

    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump

BLUESTORE_LEGACY_STATFS
_______________________

In the Nautilus release, BlueStore tracks its internal usage
statistics on a per-pool basis, and one or more OSDs have
BlueStore volumes that were created prior to Nautilus. If *all* OSDs
are older than Nautilus, this just means that the per-pool metrics are
not available. However, if there is a mix of pre-Nautilus and
post-Nautilus OSDs, the cluster usage statistics reported by ``ceph
df`` will not be accurate.

The old OSDs can be updated to use the new usage tracking scheme by
stopping each OSD, running a repair operation, and then restarting it.
For example, if ``osd.123`` needed to be updated::

    systemctl stop ceph-osd@123
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
    systemctl start ceph-osd@123

This warning can be disabled with::

    ceph config set global bluestore_warn_on_legacy_statfs false

BLUESTORE_NO_PER_POOL_OMAP
__________________________

Starting with the Octopus release, BlueStore tracks omap space utilization
by pool, and one or more OSDs have volumes that were created prior to
Octopus. If not all OSDs are running BlueStore with the new tracking
enabled, the cluster will report an approximate value for per-pool omap
usage based on the most recent deep-scrub.

The old OSDs can be updated to track by pool by stopping each OSD,
running a repair operation, and then restarting it. For example, if
``osd.123`` needed to be updated::

    systemctl stop ceph-osd@123
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
    systemctl start ceph-osd@123

This warning can be disabled with::

    ceph config set global bluestore_warn_on_no_per_pool_omap false


BLUESTORE_DISK_SIZE_MISMATCH
____________________________

One or more OSDs using BlueStore has an internal inconsistency between the size
of the physical device and the metadata tracking its size. This can lead to
the OSD crashing in the future.

The OSDs in question should be destroyed and reprovisioned. Care should be
taken to do this one OSD at a time, and in a way that doesn't put any data at
risk. For example, if OSD ``$N`` has the error::

    ceph osd out osd.$N
    while ! ceph osd safe-to-destroy osd.$N ; do sleep 1m ; done
    ceph osd destroy osd.$N
    ceph-volume lvm zap /path/to/device
    ceph-volume lvm create --osd-id $N --data /path/to/device

BLUESTORE_NO_COMPRESSION
________________________

One or more OSDs is unable to load a BlueStore compression plugin.
This can be caused by a broken installation, in which the ``ceph-osd``
binary does not match the compression plugins, or a recent upgrade
that did not include a restart of the ``ceph-osd`` daemon.

Verify that the package(s) on the host running the OSD(s) in question
are correctly installed and that the OSD daemon(s) have been
restarted. If the problem persists, check the OSD log for any clues
as to the source of the problem.

Device health
-------------

DEVICE_HEALTH
_____________

One or more devices is expected to fail soon, where the warning
threshold is controlled by the ``mgr/devicehealth/warn_threshold``
config option.

This warning only applies to OSDs that are currently marked "in", so
the expected response to this failure is to mark the device "out" so
that data is migrated off of the device, and then to remove the
hardware from the system. Note that the marking out is normally done
automatically if ``mgr/devicehealth/self_heal`` is enabled based on
the ``mgr/devicehealth/mark_out_threshold``.

Device health can be checked with::

    ceph device info <device-id>

Device life expectancy is set by a prediction model run by
the mgr or by an external tool via the command::

    ceph device set-life-expectancy <device-id> <from> <to>

You can change the stored life expectancy manually, but that usually
doesn't accomplish anything as whatever tool originally set it will
probably set it again, and changing the stored value does not affect
the actual health of the hardware device.
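
The devices known to the cluster, along with the daemons consuming them,
can be listed with::

    ceph device ls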

DEVICE_HEALTH_IN_USE
____________________

One or more devices is expected to fail soon and has been marked "out"
of the cluster based on ``mgr/devicehealth/mark_out_threshold``, but it
is still participating in one or more PGs. This may be because it was
only recently marked "out" and data is still migrating, or because data
cannot be migrated off for some reason (e.g., the cluster is nearly
full, or the CRUSH hierarchy is such that there isn't another suitable
OSD to migrate the data to).

This message can be silenced by disabling the self heal behavior
(setting ``mgr/devicehealth/self_heal`` to false), by adjusting the
``mgr/devicehealth/mark_out_threshold``, or by addressing what is
preventing data from being migrated off of the ailing device.
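
These are mgr options and can be adjusted with, for example (the
threshold is in seconds)::

    ceph config set mgr mgr/devicehealth/self_heal false
    ceph config set mgr mgr/devicehealth/mark_out_threshold 2419200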

DEVICE_HEALTH_TOOMANY
_____________________

Too many devices are expected to fail soon and the
``mgr/devicehealth/self_heal`` behavior is enabled, such that marking
out all of the ailing devices would exceed the cluster's
``mon_osd_min_in_ratio`` ratio that prevents too many OSDs from being
automatically marked "out".

This generally indicates that too many devices in your cluster are
expected to fail soon and you should take action to add newer
(healthier) devices before too many devices fail and data is lost.

The health message can also be silenced by adjusting parameters like
``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``,
but be warned that this will increase the likelihood of unrecoverable
data loss in the cluster.

Data health (pools & placement groups)
--------------------------------------

PG_AVAILABILITY
_______________

Data availability is reduced, meaning that the cluster is unable to
service potential read or write requests for some data in the cluster.
Specifically, one or more PGs is in a state that does not allow IO
requests to be serviced. Problematic PG states include *peering*,
*stale*, *incomplete*, and the lack of *active* (if those conditions
do not clear quickly).

Detailed information about which PGs are affected is available from::

    ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

    ceph tell <pgid> query

PG_DEGRADED
___________

Data redundancy is reduced for some data, meaning the cluster does not
have the desired number of replicas for all data (for replicated
pools) or erasure code fragments (for erasure coded pools).
Specifically, one or more PGs:

* has the *degraded* or *undersized* flag set, meaning there are not
  enough instances of that placement group in the cluster;
* has not had the *clean* flag set for some time.

Detailed information about which PGs are affected is available from::

    ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

    ceph tell <pgid> query

PG_RECOVERY_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*recovery_toofull* flag set, meaning that the cluster is unable to
migrate or recover data because one or more OSDs is above the *full*
threshold.

See the discussion for *OSD_FULL* above for steps to resolve this condition.

PG_BACKFILL_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*backfill_toofull* flag set, meaning that the cluster is unable to
migrate or recover data because one or more OSDs is above the
*backfillfull* threshold.

See the discussion for *OSD_BACKFILLFULL* above for steps to resolve
this condition.

PG_DAMAGED
__________

Data scrubbing has discovered some problems with data consistency in
the cluster. Specifically, one or more PGs has the *inconsistent* or
*snaptrim_error* flag set, indicating an earlier scrub operation
found a problem, or the *repair* flag is set, meaning a repair
for such an inconsistency is currently in progress.

See :doc:`pg-repair` for more information.

OSD_SCRUB_ERRORS
________________

Recent OSD scrubs have uncovered inconsistencies. This error is generally
paired with *PG_DAMAGED* (see above).

See :doc:`pg-repair` for more information.

LARGE_OMAP_OBJECTS
__________________

One or more pools contain large omap objects as determined by
``osd_deep_scrub_large_omap_object_key_threshold`` (threshold for number of keys
to determine a large omap object) or
``osd_deep_scrub_large_omap_object_value_sum_threshold`` (the threshold for
summed size (bytes) of all key values to determine a large omap object) or both.
More information on the object name, key count, and size in bytes can be found
by searching the cluster log for 'Large omap object found'. Large omap objects
can be caused by RGW bucket index objects that do not have automatic resharding
enabled. Please see :ref:`RGW Dynamic Bucket Index Resharding
<rgw_dynamic_bucket_index_resharding>` for more information on resharding.

The thresholds can be adjusted with::

    ceph config set osd osd_deep_scrub_large_omap_object_key_threshold <keys>
    ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold <bytes>

CACHE_POOL_NEAR_FULL
____________________

A cache tier pool is nearly full. Full in this context is determined
by the ``target_max_bytes`` and ``target_max_objects`` properties on
the cache pool. Once the pool reaches the target threshold, write
requests to the pool may block while data is flushed and evicted
from the cache, a state that normally leads to very high latencies and
poor performance.

The cache pool target size can be adjusted with::

    ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
    ceph osd pool set <cache-pool-name> target_max_objects <objects>

Normal cache flush and evict activity may also be throttled due to reduced
availability or performance of the base tier, or overall cluster load.

TOO_FEW_PGS
___________

The number of PGs in use in the cluster is below the configurable
threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead
to suboptimal distribution and balance of data across the OSDs in
the cluster, and similarly reduce overall performance.

This may be an expected condition if data pools have not yet been
created.

The PG count for existing pools can be increased or new pools can be created.
Please refer to :ref:`choosing-number-of-placement-groups` for more
information.

POOL_PG_NUM_NOT_POWER_OF_TWO
____________________________

One or more pools has a ``pg_num`` value that is not a power of two.
Although this is not strictly incorrect, it does lead to a less
balanced distribution of data because some PGs have roughly twice as
much data as others.

This is easily corrected by setting the ``pg_num`` value for the
affected pool(s) to a nearby power of two::

    ceph osd pool set <pool-name> pg_num <value>

This health warning can be disabled with::

    ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

POOL_TOO_FEW_PGS
________________

One or more pools should probably have more PGs, based on the amount
of data that is currently stored in the pool. This can lead to
suboptimal distribution and balance of data across the OSDs in the
cluster, and similarly reduce overall performance. This warning is
generated if the ``pg_autoscale_mode`` property on the pool is set to
``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

    ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs::

    ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

    ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.

TOO_MANY_PGS
____________

The number of PGs in use in the cluster is above the configurable
threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold is
exceeded the cluster will not allow new pools to be created, pool `pg_num` to
be increased, or pool replication to be increased (any of which would lead to
more PGs in the cluster). A large number of PGs can lead
to higher memory utilization for OSD daemons, slower peering after
cluster state changes (like OSD restarts, additions, or removals), and
higher load on the Manager and Monitor daemons.

The simplest way to mitigate the problem is to increase the number of
OSDs in the cluster by adding more hardware. Note that the OSD count
used for the purposes of this health check is the number of "in" OSDs,
so marking "out" OSDs "in" (if there are any) can also help::

    ceph osd in <osd id(s)>

Please refer to :ref:`choosing-number-of-placement-groups` for more
information.

POOL_TOO_MANY_PGS
_________________

One or more pools should probably have fewer PGs, based on the amount
of data that is currently stored in the pool. This can lead to higher
memory utilization for OSD daemons, slower peering after cluster state
changes (like OSD restarts, additions, or removals), and higher load
on the Manager and Monitor daemons. This warning is generated if the
``pg_autoscale_mode`` property on the pool is set to ``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

    ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs::

    ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

    ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.

POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_bytes`` property set to
estimate the expected size of the pool, but the value(s) exceed the
total available storage (either by themselves or in combination with
other pools' actual usage).

This is usually an indication that the ``target_size_bytes`` value for
the pool is too large and should be reduced or set to zero with::

    ceph osd pool set <pool-name> target_size_bytes 0

For more information, see :ref:`specifying_pool_target_size`.

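The check's logic amounts to a comparison of the pools' combined expected sizes against total capacity. A hedged sketch of that comparison (the ``overcommitted`` helper is invented here for illustration; it is not a ceph command):

```shell
# Illustrative helper: report whether pools' expected sizes exceed capacity.
# Usage: overcommitted <total-capacity> <pool-expected-size> [...]
overcommitted() {
  local capacity=$1 sum=0 b
  shift
  for b in "$@"; do
    sum=$(( sum + b ))
  done
  if [ "$sum" -gt "$capacity" ]; then echo overcommitted; else echo ok; fi
}

# 1000 GiB of capacity vs. pools expecting 400, 300, and 500 GiB
# (units are arbitrary as long as they are consistent):
overcommitted 1000 400 300 500
```

The same helper reports ``ok`` once the expected sizes fit, e.g. ``overcommitted 1000 400 300 200``.
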
POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO
____________________________________

One or more pools have both ``target_size_bytes`` and
``target_size_ratio`` set to estimate the expected size of the pool.
Only one of these properties should be non-zero. If both are set,
``target_size_ratio`` takes precedence and ``target_size_bytes`` is
ignored.

To reset ``target_size_bytes`` to zero::

    ceph osd pool set <pool-name> target_size_bytes 0

For more information, see :ref:`specifying_pool_target_size`.

TOO_FEW_OSDS
____________

The number of OSDs in the cluster is below the configurable
threshold of ``osd_pool_default_size``.

SMALLER_PGP_NUM
_______________

One or more pools has a ``pgp_num`` value less than ``pg_num``. This
is normally an indication that the PG count was increased without
also increasing ``pgp_num``, which controls data placement.

This is sometimes done deliberately to separate out the `split` step
when the PG count is adjusted from the data migration that is needed
when ``pgp_num`` is changed.

This is normally resolved by setting ``pgp_num`` to match ``pg_num``,
triggering the data migration, with::

    ceph osd pool set <pool> pgp_num <pg-num-value>

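Affected pools can be spotted by comparing the two values side by side. A sketch, run here on hard-coded sample rows rather than live output (in practice the ``pg_num``/``pgp_num`` pairs would be extracted from ``ceph osd pool ls detail``):

```shell
# Print pools whose pgp_num (column 3) lags behind pg_num (column 2).
# The sample rows stand in for values taken from `ceph osd pool ls detail`.
printf '%s\n' 'rbd 128 128' 'cephfs_data 256 128' 'cephfs_metadata 32 32' |
awk '$3 < $2 { print $1 }'
```

Here only ``cephfs_data`` is printed, since its ``pgp_num`` (128) is below its ``pg_num`` (256).
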
MANY_OBJECTS_PER_PG
___________________

One or more pools has an average number of objects per PG that is
significantly higher than the overall cluster average. The specific
threshold is controlled by the ``mon_pg_warn_max_object_skew``
configuration value.

This is usually an indication that the pool(s) containing most of the
data in the cluster have too few PGs, and/or that other pools that do
not contain as much data have too many PGs. See the discussion of
*TOO_MANY_PGS* above.

The threshold can be raised to silence the health warning by adjusting
the ``mon_pg_warn_max_object_skew`` config option on the managers.

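The skew itself is the pool's objects-per-PG figure divided by the cluster-wide average. A small sketch of that arithmetic (the ``skew`` helper is invented here for illustration):

```shell
# Illustrative: a pool's objects-per-PG skew relative to the cluster average,
# the ratio compared against mon_pg_warn_max_object_skew.
# Usage: skew <pool-objects> <pool-pgs> <cluster-objects> <cluster-pgs>
skew() {
  awk -v po="$1" -v pp="$2" -v co="$3" -v cp="$4" \
    'BEGIN { printf "%.1f\n", (po / pp) / (co / cp) }'
}

# A pool holding 1M objects in 64 PGs, in a cluster of 2M objects in 1024 PGs:
skew 1000000 64 2000000 1024
```

The example pool's skew of 8.0 would not yet warn with the default ``mon_pg_warn_max_object_skew`` of 10, but it is close; giving that pool more PGs brings the ratio down.
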
POOL_APP_NOT_ENABLED
____________________

A pool exists that contains one or more objects but has not been
tagged for use by a particular application.

Resolve this warning by labeling the pool for use by an application. For
example, if the pool is used by RBD::

    rbd pool init <poolname>

If the pool is being used by a custom application 'foo', you can also
label it via the low-level command::

    ceph osd pool application enable <poolname> foo

For more information, see :ref:`associate-pool-to-application`.

922
923POOL_FULL
924_________
925
926One or more pools has reached (or is very close to reaching) its
927quota. The threshold to trigger this error condition is controlled by
928the ``mon_pool_quota_crit_threshold`` configuration option.
929
930Pool quotas can be adjusted up or down (or removed) with::
931
932 ceph osd pool set-quota <pool> max_bytes <bytes>
933 ceph osd pool set-quota <pool> max_objects <objects>
934
11fdf7f2 935Setting the quota value to 0 will disable the quota.
c07f9fc5
FG
936
937POOL_NEAR_FULL
938______________
939
940One or more pools is approaching is quota. The threshold to trigger
941this warning condition is controlled by the
942``mon_pool_quota_warn_threshold`` configuration option.
943
944Pool quotas can be adjusted up or down (or removed) with::
945
946 ceph osd pool set-quota <pool> max_bytes <bytes>
947 ceph osd pool set-quota <pool> max_objects <objects>
948
949Setting the quota value to 0 will disable the quota.
950
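Both *POOL_FULL* and *POOL_NEAR_FULL* come down to the ratio of usage to quota compared against the two thresholds. A sketch of that decision (the ``quota_state`` helper and the threshold values below are illustrative; consult your cluster's actual ``mon_pool_quota_warn_threshold`` and ``mon_pool_quota_crit_threshold`` settings):

```shell
# Illustrative: classify pool usage against its quota.
# Usage: quota_state <used> <quota> <warn-ratio> <crit-ratio>
quota_state() {
  awk -v u="$1" -v q="$2" -v w="$3" -v c="$4" 'BEGIN {
    r = u / q
    if (r >= c)      print "POOL_FULL"
    else if (r >= w) print "POOL_NEAR_FULL"
    else             print "ok"
  }'
}

# 95 GiB used of a 100 GiB quota, with warn at 0.75 and crit at 0.90:
quota_state 95 100 0.75 0.90
```
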
OBJECT_MISPLACED
________________

One or more objects in the cluster is not stored on the node the
cluster would like it to be stored on. This is an indication that
data migration due to some recent cluster change has not yet completed.

Misplaced data is not a dangerous condition in and of itself; data
consistency is never at risk, and old copies of objects are never
removed until the desired number of new copies (in the desired
locations) are present.

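``ceph status`` reports this condition as a percentage: misplaced object copies over total object copies. A minimal sketch of the calculation with made-up numbers:

```shell
# Percentage of misplaced object copies, as shown in `ceph status`:
# 12,000 misplaced out of 600,000 total copies (sample figures).
awk -v misplaced=12000 -v total=600000 \
  'BEGIN { printf "%.3f%% misplaced\n", 100 * misplaced / total }'
```

This percentage shrinks toward zero as backfill and recovery move the data to its intended locations.
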
OBJECT_UNFOUND
______________

One or more objects in the cluster cannot be found. Specifically, the
OSDs know that a new or updated copy of an object should exist, but a
copy of that version of the object has not been found on OSDs that are
currently online.

Read or write requests to unfound objects will block.

Ideally, a down OSD can be brought back online that has the more
recent copy of the unfound object. Candidate OSDs can be identified from the
peering state for the PG(s) responsible for the unfound object::

    ceph tell <pgid> query

If the latest copy of the object is not available, the cluster can be
told to roll back to a previous version of the object. See
:ref:`failures-osd-unfound` for more information.

SLOW_OPS
________

One or more OSD requests is taking a long time to process. This can
be an indication of extreme load, a slow storage device, or a software
bug.

The request queue on the OSD(s) in question can be queried with the
following command, executed from the OSD host::

    ceph daemon osd.<id> ops

A summary of the slowest recent requests can be seen with::

    ceph daemon osd.<id> dump_historic_ops

The location of an OSD can be found with::

    ceph osd find osd.<id>

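When many historic ops are returned, sorting them by duration quickly surfaces the worst offenders. A sketch over hard-coded ``<op> <duration>`` pairs (in practice these fields would be extracted from the ``dump_historic_ops`` JSON output):

```shell
# Sample "op duration-in-seconds" pairs standing in for fields extracted
# from `ceph daemon osd.<id> dump_historic_ops`; print the slowest op first.
printf '%s\n' 'client.4123:osd_op 0.41' 'client.991:osd_op 12.70' 'client.73:osd_op 3.05' |
sort -k2 -rn | head -n 1
```
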
PG_NOT_SCRUBBED
_______________

One or more PGs has not been scrubbed recently. PGs are normally
scrubbed every ``osd_scrub_max_interval`` seconds, and this warning
triggers when more than ``mon_warn_pg_not_scrubbed_ratio`` of that
interval has elapsed past the point a scrub was due without one
occurring.

PGs will not scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a scrub of a clean PG with::

    ceph pg scrub <pgid>

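The trigger condition can be sketched as plain arithmetic: warn once the time since the last scrub exceeds the interval by more than the configured ratio. The ``scrub_state`` helper and the interval/ratio values below are illustrative only; check your cluster's actual settings:

```shell
# Illustrative: a PG is flagged once elapsed > interval * (1 + ratio).
# Usage: scrub_state <seconds-since-last-scrub> <interval-seconds> <warn-ratio>
scrub_state() {
  awk -v e="$1" -v i="$2" -v r="$3" 'BEGIN {
    if (e > i * (1 + r)) print "PG_NOT_SCRUBBED"
    else                 print "ok"
  }'
}

# Last scrubbed 12 days ago, with a weekly interval and a warn ratio of 0.5:
scrub_state $(( 12 * 86400 )) $(( 7 * 86400 )) 0.5
```

The same arithmetic applies to *PG_NOT_DEEP_SCRUBBED* below, substituting the deep-scrub interval and ratio.
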
PG_NOT_DEEP_SCRUBBED
____________________

One or more PGs has not been deep scrubbed recently. PGs are normally
deep scrubbed every ``osd_deep_scrub_interval`` seconds, and this warning
triggers when more than ``mon_warn_pg_not_deep_scrubbed_ratio`` of that
interval has elapsed past the point a deep scrub was due without one
occurring.

PGs will not (deep) scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a deep scrub of a clean PG with::

    ceph pg deep-scrub <pgid>

PG_SLOW_SNAP_TRIMMING
_____________________

The snapshot trim queue for one or more PGs has exceeded the
configured warning threshold. This indicates that either an extremely
large number of snapshots were recently deleted, or that the OSDs are
unable to trim snapshots quickly enough to keep up with the rate of
new snapshot deletions.

The warning threshold is controlled by the
``mon_osd_snap_trim_queue_warn_on`` option (default: 32768).

This warning may trigger if OSDs are under excessive load and unable
to keep up with their background work, or if the OSDs' internal
metadata database is heavily fragmented and unable to perform. It may
also indicate some other performance issue with the OSDs.

The exact size of the snapshot trim queue is reported by the
``snaptrimq_len`` field of ``ceph pg ls -f json-detail``.

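PGs over the threshold can be picked out of that listing. A sketch over hard-coded ``pgid snaptrimq_len`` pairs (in a real cluster these values would come from the ``ceph pg ls -f json-detail`` output):

```shell
# Sample "pgid snaptrimq_len" pairs; print PGs above the default
# mon_osd_snap_trim_queue_warn_on threshold of 32768.
printf '%s\n' '1.0 120' '1.1a 40000' '2.3f 5' |
awk -v threshold=32768 '$2 > threshold { print $1 }'
```
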
Miscellaneous
-------------

RECENT_CRASH
____________

One or more Ceph daemons has crashed recently, and the crash has not
yet been archived (acknowledged) by the administrator. This may
indicate a software bug, a hardware problem (e.g., a failing disk), or
some other problem.

New crashes can be listed with::

    ceph crash ls-new

Information about a specific crash can be examined with::

    ceph crash info <crash-id>

This warning can be silenced by "archiving" the crash (perhaps after
being examined by an administrator) so that it does not generate this
warning::

    ceph crash archive <crash-id>

Similarly, all new crashes can be archived with::

    ceph crash archive-all

Archived crashes will still be visible via ``ceph crash ls`` but not
``ceph crash ls-new``.

The time period for what "recent" means is controlled by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).

These warnings can be disabled entirely with::

    ceph config set mgr mgr/crash/warn_recent_interval 0

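What counts as "recent" is just a timestamp comparison against ``mgr/crash/warn_recent_interval``. A sketch of that check (the ``crash_state`` helper is illustrative; the default interval is two weeks, i.e. 1209600 seconds):

```shell
# Illustrative: a crash is "recent" if it happened within the warn interval.
# Usage: crash_state <crash-epoch> <now-epoch> <interval-seconds>
crash_state() {
  if [ $(( $2 - $1 )) -lt "$3" ]; then echo RECENT_CRASH; else echo ok; fi
}

# A crash 3 days before "now", checked against the default two-week interval:
crash_state 1700000000 $(( 1700000000 + 3 * 86400 )) $(( 14 * 86400 ))
```

Archiving the crash removes it from consideration immediately; otherwise the warning clears on its own once the crash ages past the interval.
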
TELEMETRY_CHANGED
_________________

Telemetry has been enabled, but the contents of the telemetry report
have changed since it was enabled, so telemetry reports will not be sent.

The Ceph developers periodically revise the telemetry feature to
include new and useful information, or to remove information found to
be useless or sensitive. If any new information is included in the
report, Ceph will require the administrator to re-enable telemetry to
ensure they have an opportunity to (re)review what information will be
shared.

To review the contents of the telemetry report::

    ceph telemetry show

Note that the telemetry report consists of several optional channels
that may be independently enabled or disabled. For more information, see
:ref:`telemetry`.

To re-enable telemetry (and make this warning go away)::

    ceph telemetry on

To disable telemetry (and make this warning go away)::

    ceph telemetry off

AUTH_BAD_CAPS
_____________

One or more auth users has capabilities that cannot be parsed by the
monitor. This generally indicates that the user will not be
authorized to perform any action with one or more daemon types.

This error is most likely to occur after an upgrade if the
capabilities were set with an older version of Ceph that did not
properly validate their syntax, or if the syntax of the capabilities
has changed.

The user in question can be removed with::

    ceph auth rm <entity-name>

(This will resolve the health alert, but obviously clients will not be
able to authenticate as that user.)

Alternatively, the capabilities for the user can be updated with::

    ceph auth caps <entity-name> <daemon-type> <caps> [<daemon-type> <caps> ...]

For more information about auth capabilities, see :ref:`user-management`.

OSD_NO_DOWN_OUT_INTERVAL
________________________

The ``mon_osd_down_out_interval`` option is set to zero, which means
that the system will not automatically perform any repair or healing
operations after an OSD fails. Instead, an administrator (or some
other external entity) will need to manually mark down OSDs as 'out'
(i.e., via ``ceph osd out <osd-id>``) in order to trigger recovery.

This option is normally set to five or ten minutes--enough time for a
host to power-cycle or reboot.

This warning can be silenced by setting
``mon_warn_on_osd_down_out_interval_zero`` to false::

    ceph config set global mon_warn_on_osd_down_out_interval_zero false