=============
Health checks
=============

Overview
========

There is a finite set of possible health messages that a Ceph cluster can
raise -- these are defined as *health checks* which have unique identifiers.

The identifier is a terse pseudo-human-readable (i.e., like a variable name)
string. It is intended to enable tools (such as UIs) to make sense of
health checks and present them in a way that reflects their meaning.

This page lists the health checks that are raised by the monitor and manager
daemons. In addition to these, you may also see health checks that originate
from MDS daemons (see :ref:`cephfs-health-messages`) and health checks
that are defined by ceph-mgr python modules.

Definitions
===========

Monitor
-------

MON_DOWN
________

One or more monitor daemons is currently down. The cluster requires a
majority (more than 1/2) of the monitors in order to function. When
one or more monitors are down, clients may have a harder time forming
their initial connection to the cluster, as they may need to try more
addresses before they reach an operating monitor.

The down monitor daemon should generally be restarted as soon as
possible to reduce the risk of a subsequent monitor failure leading to
a service outage.

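The majority requirement is simple arithmetic. A minimal sketch, using an
illustrative five-monitor cluster:

```shell
# with n monitors, quorum requires a strict majority (floor(n/2) + 1),
# so at most n - quorum monitors may be down at once
n=5
quorum=$(( n / 2 + 1 ))
max_down=$(( n - quorum ))
echo "quorum=$quorum max_down=$max_down"
```

So a five-monitor cluster keeps quorum with up to two monitors down.
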
MON_CLOCK_SKEW
______________

The clocks on the hosts running the ceph-mon monitor daemons are not
sufficiently well synchronized. This health alert is raised if the
cluster detects a clock skew greater than ``mon_clock_drift_allowed``.

This is best resolved by synchronizing the clocks using a tool like
``ntpd`` or ``chrony``.

If it is impractical to keep the clocks closely synchronized, the
``mon_clock_drift_allowed`` threshold can also be increased, but this
value must stay significantly below the ``mon_lease`` interval in
order for the monitor cluster to function properly.

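The skew currently observed by the monitors can be inspected, and the
threshold raised if needed (the 0.1-second value here is only
illustrative)::

    ceph time-sync-status
    ceph config set mon mon_clock_drift_allowed 0.1
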
MON_MSGR2_NOT_ENABLED
_____________________

The ``ms_bind_msgr2`` option is enabled but one or more monitors is
not configured to bind to a v2 port in the cluster's monmap. This
means that features specific to the msgr2 protocol (e.g., encryption)
are not available on some or all connections.

In most cases this can be corrected by issuing the command::

    ceph mon enable-msgr2

That command will change any monitor configured for the old default
port 6789 to continue to listen for v1 connections on 6789 and also
listen for v2 connections on the new default port 3300.

If a monitor is configured to listen for v1 connections on a
non-standard port (not 6789), the monmap will need to be modified
manually.

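Whether each monitor is bound to a v2 address can be verified by inspecting
the monmap (look for ``v2:`` entries in each monitor's address list)::

    ceph mon dump
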
Manager
-------

MGR_MODULE_DEPENDENCY
_____________________

An enabled manager module is failing its dependency check. This health check
should come with an explanatory message from the module about the problem.

For example, a module might report that a required package is not installed:
install the required package and restart your manager daemons.

This health check is only applied to enabled modules. If a module is
not enabled, you can see whether it is reporting dependency issues in
the output of ``ceph mgr module ls``.

MGR_MODULE_ERROR
________________

A manager module has experienced an unexpected error. Typically,
this means an unhandled exception was raised from the module's ``serve``
function. The human-readable description of the error may be obscurely
worded if the exception did not provide a useful description of itself.

This health check may indicate a bug: please open a Ceph bug report if you
think you have encountered a bug.

If you believe the error is transient, you may restart your manager
daemon(s), or use ``ceph mgr fail`` on the active daemon to prompt
a failover to another daemon.

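If a single module keeps failing, it can also be disabled and re-enabled to
restart it in place (the ``dashboard`` module name here is only an
example)::

    ceph mgr module disable dashboard
    ceph mgr module enable dashboard
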
OSDs
----

OSD_DOWN
________

One or more OSDs are marked down. The ceph-osd daemon may have been
stopped, or peer OSDs may be unable to reach the OSD over the network.
Common causes include a stopped or crashed daemon, a down host, or a
network outage.

Verify that the host is healthy, the daemon is started, and the network is
functioning. If the daemon has crashed, the daemon log file
(``/var/log/ceph/ceph-osd.*``) may contain debugging information.

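The down OSDs and their position in the CRUSH tree can be listed, and a
stopped daemon restarted; the systemd unit shown assumes a systemd-managed
deployment and an example OSD id of 123::

    ceph osd tree down
    systemctl status ceph-osd@123
    systemctl start ceph-osd@123
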
OSD_<crush type>_DOWN
_____________________

(e.g., OSD_HOST_DOWN, OSD_ROOT_DOWN)

All the OSDs within a particular CRUSH subtree are marked down, for example
all OSDs on a host.

OSD_ORPHAN
__________

An OSD is referenced in the CRUSH map hierarchy but does not exist.

The OSD can be removed from the CRUSH hierarchy with::

    ceph osd crush rm osd.<id>

OSD_OUT_OF_ORDER_FULL
_____________________

The utilization thresholds for `nearfull`, `backfillfull`, `full`,
and/or `failsafe_full` are not ascending. In particular, we expect
`nearfull < backfillfull`, `backfillfull < full`, and `full <
failsafe_full`.

The thresholds can be adjusted with::

    ceph osd set-backfillfull-ratio <ratio>
    ceph osd set-nearfull-ratio <ratio>
    ceph osd set-full-ratio <ratio>

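The currently configured values can be inspected in the OSD map::

    ceph osd dump | grep ratio
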

OSD_FULL
________

One or more OSDs has exceeded the `full` threshold and is preventing
the cluster from servicing writes.

Utilization by pool can be checked with::

    ceph df

The currently defined `full` ratio can be seen with::

    ceph osd dump | grep full_ratio

A short-term workaround to restore write availability is to raise the full
threshold by a small amount::

    ceph osd set-full-ratio <ratio>

New storage should be added to the cluster by deploying more OSDs, or
existing data should be deleted in order to free up space.

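
For example, assuming the common default `full` ratio of 0.95, a small
temporary bump might look like the following; remember to restore the
original value once space has been freed or added::

    ceph osd set-full-ratio 0.96
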
OSD_BACKFILLFULL
________________

One or more OSDs has exceeded the `backfillfull` threshold, which will
prevent data from being allowed to rebalance to this device. This is
an early warning that rebalancing may not be able to complete and that
the cluster is approaching full.

Utilization by pool can be checked with::

    ceph df

OSD_NEARFULL
____________

One or more OSDs has exceeded the `nearfull` threshold. This is an early
warning that the cluster is approaching full.

Utilization by pool can be checked with::

    ceph df

OSDMAP_FLAGS
____________

One or more cluster flags of interest has been set. These flags include:

* *full* - the cluster is flagged as full and cannot serve writes
* *pauserd*, *pausewr* - paused reads or writes
* *noup* - OSDs are not allowed to start
* *nodown* - OSD failure reports are being ignored, such that the
  monitors will not mark OSDs `down`
* *noin* - OSDs that were previously marked `out` will not be marked
  back `in` when they start
* *noout* - down OSDs will not automatically be marked out after the
  configured interval
* *nobackfill*, *norecover*, *norebalance* - recovery or data
  rebalancing is suspended
* *noscrub*, *nodeep-scrub* - scrubbing is disabled
* *notieragent* - cache tiering activity is suspended

With the exception of *full*, these flags can be set or cleared with::

    ceph osd set <flag>
    ceph osd unset <flag>

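
A common use is setting *noout* before planned maintenance, so that OSDs
that go down while a host reboots are not marked out and rebalanced::

    ceph osd set noout
    # ... perform the maintenance, then:
    ceph osd unset noout
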
OSD_FLAGS
_________

One or more OSDs or CRUSH {nodes, device classes} has a flag of interest set.
These flags include:

* *noup*: these OSDs are not allowed to start
* *nodown*: failure reports for these OSDs will be ignored
* *noin*: if these OSDs were previously marked `out` automatically
  after a failure, they will not be marked in when they start
* *noout*: if these OSDs are down they will not automatically be marked
  `out` after the configured interval

These flags can be set and cleared in batch with::

    ceph osd set-group <flags> <who>
    ceph osd unset-group <flags> <who>

For example::

    ceph osd set-group noup,noout osd.0 osd.1
    ceph osd unset-group noup,noout osd.0 osd.1
    ceph osd set-group noup,noout host-foo
    ceph osd unset-group noup,noout host-foo
    ceph osd set-group noup,noout class-hdd
    ceph osd unset-group noup,noout class-hdd

OLD_CRUSH_TUNABLES
__________________

The CRUSH map is using very old settings and should be updated. The
oldest set of tunables that can be used (i.e., the oldest client version
that can connect to the cluster) without triggering this health warning
is determined by the ``mon_crush_min_required_version`` config option.
See :ref:`crush-map-tunables` for more information.

OLD_CRUSH_STRAW_CALC_VERSION
____________________________

The CRUSH map is using an older, non-optimal method for calculating
intermediate weight values for ``straw`` buckets.

The CRUSH map should be updated to use the newer method
(``straw_calc_version=1``). See
:ref:`crush-map-tunables` for more information.
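
On clusters where all clients are recent, the tunables can usually be
brought up to date with the command below. Note that this may trigger
significant data movement and can prevent older clients from connecting::

    ceph osd crush tunables optimal
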

CACHE_POOL_NO_HIT_SET
_____________________

One or more cache pools is not configured with a *hit set* to track
utilization, which will prevent the tiering agent from identifying
cold objects to flush and evict from the cache.

Hit sets can be configured on the cache pool with::

    ceph osd pool set <poolname> hit_set_type <type>
    ceph osd pool set <poolname> hit_set_period <period-in-seconds>
    ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
    ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>


OSD_NO_SORTBITWISE
__________________

No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has
not been set.

The ``sortbitwise`` flag must be set before luminous v12.y.z or newer
OSDs can start. You can safely set the flag with::

    ceph osd set sortbitwise

POOL_FULL
_________

One or more pools has reached its quota and is no longer allowing writes.

Pool quotas and utilization can be seen with::

    ceph df detail

You can either raise the pool quota with::

    ceph osd pool set-quota <poolname> max_objects <num-objects>
    ceph osd pool set-quota <poolname> max_bytes <num-bytes>

or delete some existing data to reduce utilization.

BLUEFS_SPILLOVER
________________

One or more OSDs that use the BlueStore backend have been allocated
`db` partitions (storage space for metadata, normally on a faster
device), but that space has filled, such that metadata has "spilled
over" onto the normal slow device. This isn't necessarily an error
condition or even unexpected, but if the administrator's expectation
was that all metadata would fit on the faster device, it indicates
that not enough space was provided.

This warning can be disabled on all OSDs with::

    ceph config set osd bluestore_warn_on_bluefs_spillover false

Alternatively, it can be disabled on a specific OSD with::

    ceph config set osd.123 bluestore_warn_on_bluefs_spillover false

To provide more metadata space, the OSD in question could be destroyed and
reprovisioned. This will involve data migration and recovery.

It may also be possible to expand the LVM logical volume backing the
`db` storage. If the underlying LV has been expanded, the OSD daemon
needs to be stopped and BlueFS informed of the device size change with::

    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID

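A sketch of the whole expansion procedure, assuming a hypothetical
volume-group/LV name (``ceph-db/db-123``), an example OSD id of 123, and a
systemd-managed deployment::

    systemctl stop ceph-osd@123
    lvextend -L +20G ceph-db/db-123
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-123
    systemctl start ceph-osd@123
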
BLUEFS_AVAILABLE_SPACE
______________________

To check how much space is free for BlueFS, do::

    ceph daemon osd.123 bluestore bluefs available

This will output up to three values: `BDEV_DB free`, `BDEV_SLOW free`, and
`available_from_bluestore`. `BDEV_DB free` and `BDEV_SLOW free` report the
amount of space that has been acquired by BlueFS and is considered free.
The value `available_from_bluestore` denotes the ability of BlueStore to
relinquish more space to BlueFS. It is normal for this value to differ from
the amount of BlueStore free space, because the BlueFS allocation unit is
typically larger than the BlueStore allocation unit. This means that only
part of the BlueStore free space will be acceptable for BlueFS.

BLUEFS_LOW_SPACE
________________

If BlueFS is running low on available free space and there is little
`available_from_bluestore`, one can consider reducing the BlueFS allocation
unit size. To simulate the available space with a different allocation
unit, do::

    ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>

BLUESTORE_FRAGMENTATION
_______________________

As BlueStore operates, free space on the underlying storage will become
fragmented. This is normal and unavoidable, but excessive fragmentation
will cause slowdown. To inspect BlueStore fragmentation, one can do::

    ceph daemon osd.123 bluestore allocator score block

The score is given in the range [0, 1]:

* [0.0 .. 0.4] tiny fragmentation
* [0.4 .. 0.7] small, acceptable fragmentation
* [0.7 .. 0.9] considerable, but safe fragmentation
* [0.9 .. 1.0] severe fragmentation, may impact BlueFS's ability to get
  space from BlueStore

If a detailed report of free fragments is required, do::

    ceph daemon osd.123 bluestore allocator dump block

When the OSD process is not running, fragmentation can instead be
inspected with `ceph-bluestore-tool`. Get the fragmentation score::

    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score

And dump detailed free chunks::

    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump

BLUESTORE_LEGACY_STATFS
_______________________

In the Nautilus release, BlueStore tracks its internal usage
statistics on a per-pool basis, and one or more OSDs have
BlueStore volumes that were created prior to Nautilus. If *all* OSDs
are older than Nautilus, this just means that the per-pool metrics are
not available. However, if there is a mix of pre-Nautilus and
post-Nautilus OSDs, the cluster usage statistics reported by ``ceph
df`` will not be accurate.

The old OSDs can be updated to use the new usage tracking scheme by
stopping each OSD, running a repair operation, and then restarting it.
For example, if ``osd.123`` needed to be updated::

    systemctl stop ceph-osd@123
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
    systemctl start ceph-osd@123

This warning can be disabled with::

    ceph config set global bluestore_warn_on_legacy_statfs false


BLUESTORE_DISK_SIZE_MISMATCH
____________________________

One or more OSDs using BlueStore has an internal inconsistency between the
size of the physical device and the metadata tracking its size. This can
lead to the OSD crashing in the future.

The OSDs in question should be destroyed and reprovisioned. Care should be
taken to do this one OSD at a time, and in a way that doesn't put any data
at risk. For example, if osd ``$N`` has the error::

    ceph osd out osd.$N
    while ! ceph osd safe-to-destroy osd.$N ; do sleep 1m ; done
    ceph osd destroy osd.$N
    ceph-volume lvm zap /path/to/device
    ceph-volume lvm create --osd-id $N --data /path/to/device

Device health
-------------

DEVICE_HEALTH
_____________

One or more devices is expected to fail soon, where the warning
threshold is controlled by the ``mgr/devicehealth/warn_threshold``
config option.

This warning only applies to OSDs that are currently marked "in", so
the expected response to this failure is to mark the device "out" so
that data is migrated off of the device, and then to remove the
hardware from the system. Note that the marking out is normally done
automatically if ``mgr/devicehealth/self_heal`` is enabled based on
the ``mgr/devicehealth/mark_out_threshold``.

Device health can be checked with::

    ceph device info <device-id>

Device life expectancy is set by a prediction model run by
the mgr or by an external tool via the command::

    ceph device set-life-expectancy <device-id> <from> <to>

You can change the stored life expectancy manually, but that usually
doesn't accomplish anything as whatever tool originally set it will
probably set it again, and changing the stored value does not affect
the actual health of the hardware device.

DEVICE_HEALTH_IN_USE
____________________

One or more devices is expected to fail soon and has been marked "out"
of the cluster based on ``mgr/devicehealth/mark_out_threshold``, but it
is still participating in one or more PGs. This may be because it was
only recently marked "out" and data is still migrating, or because data
cannot be migrated off for some reason (e.g., the cluster is nearly
full, or the CRUSH hierarchy is such that there isn't another suitable
OSD to migrate the data to).

This message can be silenced by disabling the self-heal behavior
(setting ``mgr/devicehealth/self_heal`` to false), by adjusting the
``mgr/devicehealth/mark_out_threshold``, or by addressing what is
preventing data from being migrated off of the ailing device.

DEVICE_HEALTH_TOOMANY
_____________________

Too many devices are expected to fail soon and the
``mgr/devicehealth/self_heal`` behavior is enabled, such that marking
out all of the ailing devices would exceed the cluster's
``mon_osd_min_in_ratio`` ratio that prevents too many OSDs from being
automatically marked "out".

This generally indicates that too many devices in your cluster are
expected to fail soon, and you should take action to add newer
(healthier) devices before too many devices fail and data is lost.

The health message can also be silenced by adjusting parameters like
``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``,
but be warned that this will increase the likelihood of unrecoverable
data loss in the cluster.

Data health (pools & placement groups)
--------------------------------------

PG_AVAILABILITY
_______________

Data availability is reduced, meaning that the cluster is unable to
service potential read or write requests for some data in the cluster.
Specifically, one or more PGs is in a state that does not allow IO
requests to be serviced. Problematic PG states include *peering*,
*stale*, *incomplete*, and the lack of *active* (if those conditions do
not clear quickly).

Detailed information about which PGs are affected is available from::

    ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

    ceph tell <pgid> query

PG_DEGRADED
___________

Data redundancy is reduced for some data, meaning the cluster does not
have the desired number of replicas for all data (for replicated
pools) or erasure code fragments (for erasure coded pools).
Specifically, one or more PGs:

* has the *degraded* or *undersized* flag set, meaning there are not
  enough instances of that placement group in the cluster;
* has not had the *clean* flag set for some time.

Detailed information about which PGs are affected is available from::

    ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

    ceph tell <pgid> query

PG_RECOVERY_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*recovery_toofull* flag set, meaning that the
cluster is unable to migrate or recover data because one or more OSDs
is above the *full* threshold.

See the discussion for *OSD_FULL* above for steps to resolve this condition.

PG_BACKFILL_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*backfill_toofull* flag set, meaning that the
cluster is unable to migrate or recover data because one or more OSDs
is above the *backfillfull* threshold.

See the discussion for *OSD_BACKFILLFULL* above for
steps to resolve this condition.

PG_DAMAGED
__________

Data scrubbing has discovered some problems with data consistency in
the cluster. Specifically, one or more PGs has the *inconsistent* or
*snaptrim_error* flag set, indicating an earlier scrub operation
found a problem, or the *repair* flag is set, meaning a repair
for such an inconsistency is currently in progress.

See :doc:`pg-repair` for more information.

OSD_SCRUB_ERRORS
________________

Recent OSD scrubs have uncovered inconsistencies. This error is generally
paired with *PG_DAMAGED* (see above).

See :doc:`pg-repair` for more information.

LARGE_OMAP_OBJECTS
__________________

One or more pools contain large omap objects, as determined by
``osd_deep_scrub_large_omap_object_key_threshold`` (the threshold on the
number of keys that defines a large omap object) or
``osd_deep_scrub_large_omap_object_value_sum_threshold`` (the threshold on
the summed size, in bytes, of all key values that defines a large omap
object), or both. More information on the object name, key count, and size
in bytes can be found by searching the cluster log for 'Large omap object
found'. Large omap objects can be caused by RGW bucket index objects that
do not have automatic resharding enabled. Please see :ref:`RGW Dynamic
Bucket Index Resharding <rgw_dynamic_bucket_index_resharding>` for more
information on resharding.

The thresholds can be adjusted with::

    ceph config set osd osd_deep_scrub_large_omap_object_key_threshold <keys>
    ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold <bytes>

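
The cluster log can be searched for the offending objects; the log path
shown is the common default and may differ per deployment::

    grep 'Large omap object found' /var/log/ceph/ceph.log
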
CACHE_POOL_NEAR_FULL
____________________

A cache tier pool is nearly full. Full in this context is determined
by the ``target_max_bytes`` and ``target_max_objects`` properties on
the cache pool. Once the pool reaches the target threshold, write
requests to the pool may block while data is flushed and evicted
from the cache, a state that normally leads to very high latencies and
poor performance.

The cache pool target size can be adjusted with::

    ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
    ceph osd pool set <cache-pool-name> target_max_objects <objects>

Normal cache flush and evict activity may also be throttled due to reduced
availability or performance of the base tier, or overall cluster load.

TOO_FEW_PGS
___________

The number of PGs in use in the cluster is below the configurable
threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead
to suboptimal distribution and balance of data across the OSDs in
the cluster, and similarly reduce overall performance.

This may be an expected condition if data pools have not yet been
created.

The PG count for existing pools can be increased or new pools can be
created. Please refer to :ref:`choosing-number-of-placement-groups` for
more information.

POOL_PG_NUM_NOT_POWER_OF_TWO
____________________________

One or more pools has a ``pg_num`` value that is not a power of two.
Although this is not strictly incorrect, it does lead to a less
balanced distribution of data because some PGs have roughly twice as
much data as others.

This is easily corrected by setting the ``pg_num`` value for the
affected pool(s) to a nearby power of two::

    ceph osd pool set <pool-name> pg_num <value>

This health warning can be disabled with::

    ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

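A nearby power of two can be worked out with shell arithmetic. A minimal
sketch, assuming a hypothetical current ``pg_num`` of 300:

```shell
# round an illustrative pg_num of 300 up to the next power of two
pg_num=300
p=1
while [ "$p" -lt "$pg_num" ]; do p=$((p * 2)); done
echo "$p"
```

The loop picks the next power of two at or above the current value (here,
512); rounding down to 256 would be equally valid.
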
POOL_TOO_FEW_PGS
________________

One or more pools should probably have more PGs, based on the amount
of data that is currently stored in the pool. This can lead to
suboptimal distribution and balance of data across the OSDs in the
cluster, and similarly reduce overall performance. This warning is
generated if the ``pg_autoscale_mode`` property on the pool is set to
``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

    ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs::

    ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

    ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.

TOO_MANY_PGS
____________

The number of PGs in use in the cluster is above the configurable
threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold is
exceeded, the cluster will not allow new pools to be created, pool
`pg_num` to be increased, or pool replication to be increased (any of
which would lead to more PGs in the cluster). A large number of PGs can
lead to higher memory utilization for OSD daemons, slower peering after
cluster state changes (like OSD restarts, additions, or removals), and
higher load on the Manager and Monitor daemons.

The simplest way to mitigate the problem is to increase the number of
OSDs in the cluster by adding more hardware. Note that the OSD count
used for the purposes of this health check is the number of "in" OSDs,
so marking "out" OSDs "in" (if there are any) can also help::

    ceph osd in <osd id(s)>

Please refer to :ref:`choosing-number-of-placement-groups` for more
information.

POOL_TOO_MANY_PGS
_________________

One or more pools should probably have fewer PGs, based on the amount
of data that is currently stored in the pool. This can lead to higher
memory utilization for OSD daemons, slower peering after cluster state
changes (like OSD restarts, additions, or removals), and higher load
on the Manager and Monitor daemons. This warning is generated if the
``pg_autoscale_mode`` property on the pool is set to ``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

    ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs::

    ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

    ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.

POOL_TARGET_SIZE_RATIO_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_ratio`` property set to
estimate the expected size of the pool as a fraction of total storage,
but the value(s) exceed the total available storage (either by
themselves or in combination with other pools' actual usage).

This is usually an indication that the ``target_size_ratio`` value for
the pool is too large and should be reduced or set to zero with::

    ceph osd pool set <pool-name> target_size_ratio 0

For more information, see :ref:`specifying_pool_target_size`.

POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_bytes`` property set to
estimate the expected size of the pool,
but the value(s) exceed the total available storage (either by
themselves or in combination with other pools' actual usage).

This is usually an indication that the ``target_size_bytes`` value for
the pool is too large and should be reduced or set to zero with::

    ceph osd pool set <pool-name> target_size_bytes 0

For more information, see :ref:`specifying_pool_target_size`.

TOO_FEW_OSDS
____________

The number of OSDs in the cluster is below the configurable
threshold of ``osd_pool_default_size``.

SMALLER_PGP_NUM
_______________

One or more pools has a ``pgp_num`` value less than ``pg_num``. This
is normally an indication that the PG count was increased without
also increasing ``pgp_num``, which controls placement.

This is sometimes done deliberately to separate out the `split` step
when the PG count is adjusted from the data migration that is needed
when ``pgp_num`` is changed.

This is normally resolved by setting ``pgp_num`` to match ``pg_num``,
triggering the data migration, with::

    ceph osd pool set <pool> pgp_num <pg-num-value>

MANY_OBJECTS_PER_PG
___________________

One or more pools has an average number of objects per PG that is
significantly higher than the overall cluster average. The specific
threshold is controlled by the ``mon_pg_warn_max_object_skew``
configuration value.

This is usually an indication that the pool(s) containing most of the
data in the cluster have too few PGs, and/or that other pools that do
not contain as much data have too many PGs. See the discussion of
*TOO_MANY_PGS* above.

The threshold can be raised to silence the health warning by adjusting
the ``mon_pg_warn_max_object_skew`` config option on the monitors.

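
The skew this check evaluates is, roughly, a pool's objects-per-PG divided
by the cluster-wide average objects-per-PG. A minimal sketch with purely
illustrative numbers:

```shell
# illustrative numbers: one pool holds 1,000,000 objects in 64 PGs,
# while the cluster as a whole holds 1,200,000 objects in 512 PGs
skew=$(awk 'BEGIN {
    pool_avg    = 1000000 / 64      # objects per PG in the pool
    cluster_avg = 1200000 / 512     # objects per PG cluster-wide
    printf "%.1f", pool_avg / cluster_avg
}')
echo "$skew"
```

A skew above ``mon_pg_warn_max_object_skew`` raises the warning.
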
POOL_APP_NOT_ENABLED
____________________

A pool exists that contains one or more objects but has not been
tagged for use by a particular application.

Resolve this warning by labeling the pool for use by an application. For
example, if the pool is used by RBD::

    rbd pool init <poolname>

If the pool is being used by a custom application 'foo', you can also label
it via the low-level command::

    ceph osd pool application enable <poolname> foo

For more information, see :ref:`associate-pool-to-application`.

819POOL_FULL
820_________
821
822One or more pools has reached (or is very close to reaching) its
823quota. The threshold to trigger this error condition is controlled by
824the ``mon_pool_quota_crit_threshold`` configuration option.
825
826Pool quotas can be adjusted up or down (or removed) with::
827
828 ceph osd pool set-quota <pool> max_bytes <bytes>
829 ceph osd pool set-quota <pool> max_objects <objects>
830
11fdf7f2 831Setting the quota value to 0 will disable the quota.

POOL_NEAR_FULL
______________

One or more pools is approaching its quota. The threshold to trigger
this warning condition is controlled by the
``mon_pool_quota_warn_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) with::

    ceph osd pool set-quota <pool> max_bytes <bytes>
    ceph osd pool set-quota <pool> max_objects <objects>

Setting the quota value to 0 will disable the quota.
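
The near-full and full conditions can be pictured as a single check against
the pool's quota consumption. This is an illustrative model only; the actual
monitor logic is governed by ``mon_pool_quota_warn_threshold`` and
``mon_pool_quota_crit_threshold``, and the percentage values used here are
hypothetical.

```python
# Illustrative sketch of the POOL_NEAR_FULL / POOL_FULL quota checks.
# The warn/crit percentages are hypothetical, not Ceph's real defaults.

def pool_quota_health(used, quota, warn_pct=75, crit_pct=90):
    """Return a health state for one pool based on quota consumption.
    A quota of 0 means 'no quota', so no warning is ever raised."""
    if quota == 0:
        return "OK"
    pct = 100.0 * used / quota
    if pct >= crit_pct:
        return "POOL_FULL"
    if pct >= warn_pct:
        return "POOL_NEAR_FULL"
    return "OK"

print(pool_quota_health(80, 100))  # crosses the warn threshold
print(pool_quota_health(95, 100))  # crosses the crit threshold
print(pool_quota_health(50, 0))    # quota disabled, always OK
```
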

OBJECT_MISPLACED
________________

One or more objects in the cluster is not stored on the node the
cluster would like it to be stored on. This is an indication that
data migration due to some recent cluster change has not yet completed.

Misplaced data is not a dangerous condition in and of itself; data
consistency is never at risk, and old copies of objects are never
removed until the desired number of new copies (in the desired
locations) are present.

OBJECT_UNFOUND
______________

One or more objects in the cluster cannot be found. Specifically, the
OSDs know that a new or updated copy of an object should exist, but a
copy of that version of the object has not been found on OSDs that are
currently online.

Read or write requests to unfound objects will block.

Ideally, a down OSD that has the more recent copy of the unfound
object can be brought back online. Candidate OSDs can be identified from the
peering state for the PG(s) responsible for the unfound object::

    ceph tell <pgid> query

If the latest copy of the object is not available, the cluster can be
told to roll back to a previous version of the object. See
:ref:`failures-osd-unfound` for more information.

SLOW_OPS
________

One or more OSD requests is taking a long time to process. This can
be an indication of extreme load, a slow storage device, or a software
bug.

The request queue on the OSD(s) in question can be queried with the
following command, executed from the OSD host::

    ceph daemon osd.<id> ops

A summary of the slowest recent requests can be seen with::

    ceph daemon osd.<id> dump_historic_ops

The location of an OSD can be found with::

    ceph osd find osd.<id>

PG_NOT_SCRUBBED
_______________

One or more PGs has not been scrubbed recently. PGs are normally
scrubbed every ``osd_scrub_max_interval`` seconds, and this warning
triggers when ``mon_warn_pg_not_scrubbed_ratio`` of that interval has
elapsed past the point the scrub was due without one having occurred.

PGs will not scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a scrub of a clean PG with::

    ceph pg scrub <pgid>
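
The timing rule can be sketched as follows. This is a simplified model, not
the monitor's code; the 0.5 grace ratio is an assumed value for
``mon_warn_pg_not_scrubbed_ratio``, and the same shape applies to the
deep-scrub variant below.

```python
# Simplified model of the PG_NOT_SCRUBBED timing rule: a PG is due for a
# scrub every `interval` seconds, and the warning fires only after an
# extra grace period of `warn_ratio` * interval beyond the due time.

def scrub_warning_due(now, last_scrub, interval, warn_ratio=0.5):
    """Return True when `now` is past the due time plus the grace period."""
    due = last_scrub + interval
    grace = warn_ratio * interval
    return now > due + grace

day = 86400
# Scrubbed 10 days ago with a 7-day interval and a 0.5 ratio: the warning
# fires once 7 + 3.5 = 10.5 days have passed, so not quite yet.
print(scrub_warning_due(now=10 * day, last_scrub=0, interval=7 * day))
```
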

PG_NOT_DEEP_SCRUBBED
____________________

One or more PGs has not been deep scrubbed recently. PGs are normally
deep scrubbed every ``osd_deep_scrub_interval`` seconds, and this warning
triggers when ``mon_warn_pg_not_deep_scrubbed_ratio`` of that interval has
elapsed past the point the deep scrub was due without one having occurred.

PGs will not (deep) scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a deep scrub of a clean PG with::

    ceph pg deep-scrub <pgid>


Miscellaneous
-------------

RECENT_CRASH
____________

One or more Ceph daemons has crashed recently, and the crash has not
yet been archived (acknowledged) by the administrator. This may
indicate a software bug, a hardware problem (e.g., a failing disk), or
some other problem.

New crashes can be listed with::

    ceph crash ls-new

Information about a specific crash can be examined with::

    ceph crash info <crash-id>

This warning can be silenced by "archiving" the crash (perhaps after
it has been examined by an administrator) so that it no longer
generates this warning::

    ceph crash archive <crash-id>

Similarly, all new crashes can be archived with::

    ceph crash archive-all

Archived crashes will still be visible via ``ceph crash ls`` but not
``ceph crash ls-new``.

The time period for what "recent" means is controlled by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).

These warnings can be disabled entirely with::

    ceph config set mgr mgr/crash/warn_recent_interval 0
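
The selection of crashes that raise the warning can be sketched like so. This
is illustrative only (the mgr crash module keeps its own metadata store), and
the timestamps are invented for the example.

```python
# Sketch of the RECENT_CRASH selection: a crash counts as "recent" when
# it is newer than the warn interval and has not been archived.
from datetime import datetime, timedelta

TWO_WEEKS = timedelta(days=14)  # matches the documented default window

def recent_unarchived(crashes, now, warn_interval=TWO_WEEKS):
    """crashes: list of (timestamp, archived) tuples.
    Returns the crashes that would still trigger the health warning."""
    return [(ts, archived) for ts, archived in crashes
            if not archived and now - ts < warn_interval]

now = datetime(2020, 3, 1)
crashes = [
    (datetime(2020, 2, 25), False),  # recent, not archived -> warns
    (datetime(2020, 2, 25), True),   # recent but archived  -> silent
    (datetime(2020, 1, 1), False),   # too old              -> silent
]
print(len(recent_unarchived(crashes, now)))  # -> 1
```
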

TELEMETRY_CHANGED
_________________

Telemetry has been enabled, but the contents of the telemetry report
have changed since that time, so telemetry reports will not be sent.

The Ceph developers periodically revise the telemetry feature to
include new and useful information, or to remove information found to
be useless or sensitive. If any new information is included in the
report, Ceph will require the administrator to re-enable telemetry to
ensure they have an opportunity to (re)review what information will be
shared.

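One way to picture the mechanism: conceptually, each report format has a
revision, and the warning fires when the current revision is newer than the
one the administrator opted in to. The names used here (``report_revision``,
``last_opt_revision``) are illustrative, not the telemetry module's actual
internals.

```python
# Conceptual sketch of the TELEMETRY_CHANGED condition (names are
# illustrative, not the mgr telemetry module's real state variables).

def telemetry_changed(enabled: bool, report_revision: int,
                      last_opt_revision: int) -> bool:
    """Warn when telemetry is on but the report format was revised
    after the administrator last opted in."""
    return enabled and report_revision > last_opt_revision

print(telemetry_changed(True, 3, 2))   # revised since opt-in -> warns
print(telemetry_changed(True, 2, 2))   # unchanged            -> silent
```
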
To review the contents of the telemetry report::

    ceph telemetry show

Note that the telemetry report consists of several optional channels
that may be independently enabled or disabled. For more information, see
:ref:`telemetry`.

To re-enable telemetry (and make this warning go away)::

    ceph telemetry on

To disable telemetry (and make this warning go away)::

    ceph telemetry off