.. _health-checks:

===============
 Health checks
===============

Overview
========

There is a finite set of health messages that a Ceph cluster can raise. These
messages are known as *health checks*. Each health check has a unique
identifier.

The identifier is a terse human-readable string -- that is, the identifier is
readable in much the same way as a typical variable name. It is intended to
enable tools (for example, UIs) to make sense of health checks and present them
in a way that reflects their meaning.

This page lists the health checks that are raised by the monitor and manager
daemons. In addition to these, you might see health checks that originate
from MDS daemons (see :ref:`cephfs-health-messages`), and health checks
that are defined by ``ceph-mgr`` python modules.

Definitions
===========

Monitor
-------

DAEMON_OLD_VERSION
__________________

One or more daemons in the cluster are running an old version of Ceph. A health
check is raised if multiple versions are detected. This condition must exist
for a period of time greater than ``mon_warn_older_version_delay`` (set to one
week by default) in order for the health check to be raised. This allows most
upgrades to proceed without the occurrence of a false warning. If the upgrade
is paused for an extended time period, ``health mute`` can be used by running
``ceph health mute DAEMON_OLD_VERSION --sticky``. Be sure, however, to run
``ceph health unmute DAEMON_OLD_VERSION`` after the upgrade has finished.
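
To see which Ceph versions the running daemons report, and therefore which
daemons still need to be upgraded, run the following command:

.. prompt:: bash $

   ceph versions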

MON_DOWN
________

One or more monitor daemons are currently down. The cluster requires a majority
(more than one-half) of the monitors to be available. When one or more monitors
are down, clients might have a harder time forming their initial connection to
the cluster, as they might need to try more addresses before they reach an
operating monitor.

The down monitor daemon should be restarted as soon as possible to reduce the
risk of a subsequent monitor failure leading to a service outage.
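
To identify which monitor is down, check the output of ``ceph health detail``.
How the daemon is restarted depends on how it was deployed; as a rough sketch,
on hosts that run the monitor under systemd the unit is typically named
``ceph-mon@<hostname>``:

.. prompt:: bash $

   ceph health detail
   systemctl restart ceph-mon@<hostname>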

MON_CLOCK_SKEW
______________

The clocks on the hosts running the ceph-mon monitor daemons are not
well-synchronized. This health check is raised if the cluster detects a clock
skew greater than ``mon_clock_drift_allowed``.

This issue is best resolved by synchronizing the clocks by using a tool like
``ntpd`` or ``chrony``.
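
To see the skew and latency that the monitors currently report, you can run,
for example:

.. prompt:: bash $

   ceph health detail
   ceph time-sync-status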

If it is impractical to keep the clocks closely synchronized, the
``mon_clock_drift_allowed`` threshold can also be increased. However, this
value must stay significantly below the ``mon_lease`` interval in order for the
monitor cluster to function properly.
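
For example, to raise the allowed drift to 0.1 seconds (an illustrative value,
not a recommendation), run:

.. prompt:: bash $

   ceph config set mon mon_clock_drift_allowed 0.1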

MON_MSGR2_NOT_ENABLED
_____________________

The :confval:`ms_bind_msgr2` option is enabled but one or more monitors are
not configured to bind to a v2 port in the cluster's monmap. This
means that features specific to the msgr2 protocol (for example, encryption)
are unavailable on some or all connections.

In most cases this can be corrected by running the following command:

.. prompt:: bash $

   ceph mon enable-msgr2

After this command is run, any monitor configured to listen on the old default
port (6789) will continue to listen for v1 connections on 6789 and begin to
listen for v2 connections on the new default port 3300.

If a monitor is configured to listen for v1 connections on a non-standard port
(that is, a port other than 6789), then the monmap will need to be modified
manually.


MON_DISK_LOW
____________

One or more monitors are low on disk space. This health check is raised if the
percentage of available space on the file system used by the monitor database
(normally ``/var/lib/ceph/mon``) drops below the percentage value
``mon_data_avail_warn`` (default: 30%).

This alert might indicate that some other process or user on the system is
filling up the file system used by the monitor. It might also indicate that the
monitor database is too large (see ``MON_DISK_BIG`` below).

If space cannot be freed, the monitor's data directory might need to be moved
to another storage device or file system (this relocation process must be
carried out while the monitor daemon is not running).
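
To see how much space remains on the monitor's file system and what is
consuming it, standard tools can be used on the monitor host (the path below is
the default and might differ in your deployment):

.. prompt:: bash $

   df -h /var/lib/ceph/mon
   du -sch /var/lib/ceph/mon/*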

MON_DISK_CRIT
_____________

One or more monitors are critically low on disk space. This health check is
raised if the percentage of available space on the file system used by the
monitor database (normally ``/var/lib/ceph/mon``) drops below the percentage
value ``mon_data_avail_crit`` (default: 5%). See ``MON_DISK_LOW``, above.

MON_DISK_BIG
____________

The database size for one or more monitors is very large. This health check is
raised if the size of the monitor database is larger than
``mon_data_size_warn`` (default: 15 GiB).

A large database is unusual, but does not necessarily indicate a problem.
Monitor databases might grow in size when there are placement groups that have
not reached an ``active+clean`` state in a long time.

This alert might also indicate that the monitor's database is not properly
compacting, an issue that has been observed with some older versions of leveldb
and rocksdb. Forcing a compaction with ``ceph daemon mon.<id> compact`` might
shrink the database's on-disk size.
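
For example, to compact the database of monitor ``mon.<id>``, run the following
command on that monitor's host:

.. prompt:: bash $

   ceph daemon mon.<id> compact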

This alert might also indicate that the monitor has a bug that prevents it from
pruning the cluster metadata that it stores. If the problem persists, please
report a bug.

To adjust the warning threshold, run the following command:

.. prompt:: bash $

   ceph config set global mon_data_size_warn <size>

AUTH_INSECURE_GLOBAL_ID_RECLAIM
_______________________________

One or more clients or daemons that are connected to the cluster are not
securely reclaiming their ``global_id`` (a unique number that identifies each
entity in the cluster) when reconnecting to a monitor. The client is being
permitted to connect anyway because the
``auth_allow_insecure_global_id_reclaim`` option is set to ``true`` (which may
be necessary until all Ceph clients have been upgraded) and because the
``auth_expose_insecure_global_id_reclaim`` option is set to ``true`` (which
allows monitors to detect clients with "insecure reclaim" sooner by forcing
those clients to reconnect immediately after their initial authentication).

To identify which clients are using unpatched Ceph client code, run the
following command:

.. prompt:: bash $

   ceph health detail

If you collect a dump of the clients that are connected to an individual
monitor and examine the ``global_id_status`` field in the output of the dump,
you can see the ``global_id`` reclaim behavior of those clients. Here
``reclaim_insecure`` means that a client is unpatched and is contributing to
this health check. To obtain such a dump, run the following command:

.. prompt:: bash $

   ceph tell mon.\* sessions

We strongly recommend that all clients in the system be upgraded to a newer
version of Ceph that correctly reclaims ``global_id`` values. After all clients
have been updated, run the following command to stop allowing insecure
reconnections:

.. prompt:: bash $

   ceph config set mon auth_allow_insecure_global_id_reclaim false

If it is impractical to upgrade all clients immediately, you can temporarily
silence this alert by running the following command:

.. prompt:: bash $

   ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w   # 1 week

Although we do NOT recommend doing so, you can also disable this alert
indefinitely by running the following command:

.. prompt:: bash $

   ceph config set mon mon_warn_on_insecure_global_id_reclaim false

AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED
_______________________________________

Ceph is currently configured to allow clients that reconnect to monitors using
an insecure process to reclaim their previous ``global_id``. Such reclaiming is
allowed because, by default, ``auth_allow_insecure_global_id_reclaim`` is set
to ``true``. It might be necessary to leave this setting enabled while existing
Ceph clients are upgraded to newer versions of Ceph that correctly and securely
reclaim their ``global_id``.

If the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` health check has not also been
raised and if the ``auth_expose_insecure_global_id_reclaim`` setting has not
been disabled (it is enabled by default), then there are currently no clients
connected that need to be upgraded. In that case, it is safe to disable
``insecure global_id reclaim`` by running the following command:

.. prompt:: bash $

   ceph config set mon auth_allow_insecure_global_id_reclaim false

On the other hand, if there are still clients that need to be upgraded, then
this alert can be temporarily silenced by running the following command:

.. prompt:: bash $

   ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w   # 1 week

Although we do NOT recommend doing so, you can also disable this alert
indefinitely by running the following command:

.. prompt:: bash $

   ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false

Manager
-------

MGR_DOWN
________

All manager daemons are currently down. The cluster should normally have at
least one running manager (``ceph-mgr``) daemon. If no manager daemon is
running, the cluster's ability to monitor itself will be compromised, and parts
of the management API will become unavailable (for example, the dashboard will
not work, and most CLI commands that report metrics or runtime state will
block). However, the cluster will still be able to perform all I/O operations
and to recover from failures.

The "down" manager daemon should be restarted as soon as possible to ensure
that the cluster can be monitored (for example, so that the ``ceph -s``
information is up to date, or so that metrics can be scraped by Prometheus).
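
How the daemon is restarted depends on how it was deployed; as a rough sketch,
on hosts that run the manager under systemd the unit is typically named
``ceph-mgr@<hostname>``:

.. prompt:: bash $

   systemctl restart ceph-mgr@<hostname>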

MGR_MODULE_DEPENDENCY
_____________________

An enabled manager module is failing its dependency check. This health check
typically comes with an explanatory message from the module about the problem.

For example, a module might report that a required package is not installed: in
this case, you should install the required package and restart your manager
daemons.

This health check is applied only to enabled modules. If a module is not
enabled, you can see whether it is reporting dependency issues in the output of
``ceph mgr module ls``.

MGR_MODULE_ERROR
________________

A manager module has experienced an unexpected error. Typically, this means
that an unhandled exception was raised from the module's ``serve`` function.
The human-readable description of the error might be obscurely worded if the
exception did not provide a useful description of itself.

This health check might indicate a bug: please open a Ceph bug report if you
think you have encountered a bug.

However, if you believe the error is transient, you may restart your manager
daemon(s) or use ``ceph mgr fail`` on the active daemon in order to force
failover to another daemon.
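
For example, to force a failover away from the currently active manager (its
name is shown in the ``ceph -s`` output), run:

.. prompt:: bash $

   ceph mgr fail <mgr-name>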

OSDs
----

OSD_DOWN
________

One or more OSDs are marked "down". The ceph-osd daemon might have been
stopped, or peer OSDs might be unable to reach the OSD over the network.
Common causes include a stopped or crashed daemon, a "down" host, or a network
outage.

Verify that the host is healthy, the daemon is started, and the network is
functioning. If the daemon has crashed, the daemon log file
(``/var/log/ceph/ceph-osd.*``) might contain debugging information.
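
To see which OSDs are down and where they sit in the CRUSH hierarchy, you can
run, for example:

.. prompt:: bash $

   ceph health detail
   ceph osd tree down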

OSD_<crush type>_DOWN
_____________________

(for example, OSD_HOST_DOWN, OSD_ROOT_DOWN)

All of the OSDs within a particular CRUSH subtree are marked "down" (for
example, all OSDs on a host).

OSD_ORPHAN
__________

An OSD is referenced in the CRUSH map hierarchy, but does not exist.

To remove the OSD from the CRUSH map hierarchy, run the following command:

.. prompt:: bash $

   ceph osd crush rm osd.<id>

OSD_OUT_OF_ORDER_FULL
_____________________

The utilization thresholds for `nearfull`, `backfillfull`, `full`, and/or
`failsafe_full` are not ascending. In particular, the following pattern is
expected: `nearfull < backfillfull`, `backfillfull < full`, and `full <
failsafe_full`.

To adjust these utilization thresholds, run the following commands:

.. prompt:: bash $

   ceph osd set-nearfull-ratio <ratio>
   ceph osd set-backfillfull-ratio <ratio>
   ceph osd set-full-ratio <ratio>
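
To review the currently configured ratios before changing them, you can, for
example, filter the output of ``ceph osd dump``:

.. prompt:: bash $

   ceph osd dump | grep ratio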

OSD_FULL
________

One or more OSDs have exceeded the `full` threshold and are preventing the
cluster from servicing writes.

To check utilization by pool, run the following command:

.. prompt:: bash $

   ceph df

To see the currently defined `full` ratio, run the following command:

.. prompt:: bash $

   ceph osd dump | grep full_ratio

A short-term workaround to restore write availability is to raise the full
threshold by a small amount. To do so, run the following command:

.. prompt:: bash $

   ceph osd set-full-ratio <ratio>

Additional OSDs should be deployed in order to add new storage to the cluster,
or existing data should be deleted in order to free up space in the cluster.

OSD_BACKFILLFULL
________________

One or more OSDs have exceeded the `backfillfull` threshold or *would* exceed
it if the currently-mapped backfills were to finish, which will prevent data
from rebalancing to this OSD. This alert is an early warning that rebalancing
might be unable to complete and that the cluster is approaching full.

To check utilization by pool, run the following command:

.. prompt:: bash $

   ceph df

OSD_NEARFULL
____________

One or more OSDs have exceeded the `nearfull` threshold. This alert is an early
warning that the cluster is approaching full.

To check utilization by pool, run the following command:

.. prompt:: bash $

   ceph df
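
Because these thresholds apply to individual OSDs, it can also help to check
per-OSD utilization; ``ceph osd df`` reports usage and variance for each OSD:

.. prompt:: bash $

   ceph osd df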

OSDMAP_FLAGS
____________

One or more cluster flags of interest have been set. These flags include:

* *full* - the cluster is flagged as full and cannot serve writes
* *pauserd*, *pausewr* - there are paused reads or writes
* *noup* - OSDs are not allowed to start
* *nodown* - OSD failure reports are being ignored, and that means that the
  monitors will not mark OSDs "down"
* *noin* - OSDs that were previously marked ``out`` are not being marked
  back ``in`` when they start
* *noout* - "down" OSDs are not automatically being marked ``out`` after the
  configured interval
* *nobackfill*, *norecover*, *norebalance* - recovery or data
  rebalancing is suspended
* *noscrub*, *nodeep-scrub* - scrubbing is disabled
* *notieragent* - cache-tiering activity is suspended

With the exception of *full*, these flags can be set or cleared by running the
following commands:

.. prompt:: bash $

   ceph osd set <flag>
   ceph osd unset <flag>
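
For example, to prevent OSDs from being marked ``out`` during a planned
maintenance window and to restore the normal behavior afterward, run:

.. prompt:: bash $

   ceph osd set noout
   ceph osd unset noout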

OSD_FLAGS
_________

One or more OSDs, CRUSH nodes, or CRUSH device classes have a flag of interest
set. These flags include:

* *noup*: these OSDs are not allowed to start
* *nodown*: failure reports for these OSDs will be ignored
* *noin*: if these OSDs were previously marked ``out`` automatically
  after a failure, they will not be marked ``in`` when they start
* *noout*: if these OSDs are "down" they will not automatically be marked
  ``out`` after the configured interval

To set and clear these flags in batch, run the following commands:

.. prompt:: bash $

   ceph osd set-group <flags> <who>
   ceph osd unset-group <flags> <who>

For example:

.. prompt:: bash $

   ceph osd set-group noup,noout osd.0 osd.1
   ceph osd unset-group noup,noout osd.0 osd.1
   ceph osd set-group noup,noout host-foo
   ceph osd unset-group noup,noout host-foo
   ceph osd set-group noup,noout class-hdd
   ceph osd unset-group noup,noout class-hdd

OLD_CRUSH_TUNABLES
__________________

The CRUSH map is using very old settings and should be updated. The oldest set
of tunables that can be used (that is, the oldest client version that can
connect to the cluster) without raising this health check is determined by the
``mon_crush_min_required_version`` config option. For more information, see
:ref:`crush-map-tunables`.

OLD_CRUSH_STRAW_CALC_VERSION
____________________________

The CRUSH map is using an older, non-optimal method of calculating intermediate
weight values for ``straw`` buckets.

The CRUSH map should be updated to use the newer method (that is:
``straw_calc_version=1``). For more information, see :ref:`crush-map-tunables`.
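
If you prefer to change only this setting rather than adopting a full tunables
profile, one possible way to do so (assuming your release supports the
``set-tunable`` subcommand; note that this can trigger some data movement) is:

.. prompt:: bash $

   ceph osd crush set-tunable straw_calc_version 1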

CACHE_POOL_NO_HIT_SET
_____________________

One or more cache pools are not configured with a *hit set* to track
utilization. This issue prevents the tiering agent from identifying cold
objects that are to be flushed and evicted from the cache.

To configure hit sets on the cache pool, run the following commands:

.. prompt:: bash $

   ceph osd pool set <poolname> hit_set_type <type>
   ceph osd pool set <poolname> hit_set_period <period-in-seconds>
   ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
   ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>

OSD_NO_SORTBITWISE
__________________

No pre-Luminous v12.y.z OSDs are running, but the ``sortbitwise`` flag has not
been set.

The ``sortbitwise`` flag must be set in order for OSDs running Luminous v12.y.z
or newer to start. To safely set the flag, run the following command:

.. prompt:: bash $

   ceph osd set sortbitwise

OSD_FILESTORE
_____________

One or more OSDs are running Filestore. The Filestore OSD back end has been
deprecated; the BlueStore back end has been the default object store since the
Ceph Luminous release.

The ``mclock_scheduler`` is not supported for Filestore OSDs. For this reason,
the default ``osd_op_queue`` is set to ``wpq`` for Filestore OSDs and is
enforced even if the user attempts to change it.

To identify which OSDs are still running Filestore, run the following command:

.. prompt:: bash $

   ceph report | jq -c '."osd_metadata" | .[] | select(.osd_objectstore | contains("filestore")) | {id, osd_objectstore}'

**In order to upgrade to Reef or a later release, you must first migrate any
Filestore OSDs to BlueStore.**

If you are upgrading a pre-Reef release to Reef or later, but it is not
feasible to migrate Filestore OSDs to BlueStore immediately, you can
temporarily silence this alert by running the following command:

.. prompt:: bash $

   ceph health mute OSD_FILESTORE

Since this migration can take a considerable amount of time to complete, we
recommend that you begin the process well in advance of any update to Reef or
to later releases.

POOL_FULL
_________

One or more pools have reached their quota and are no longer allowing writes.

To see pool quotas and utilization, run the following command:

.. prompt:: bash $

   ceph df detail

If you opt to raise the pool quota, run the following commands:

.. prompt:: bash $

   ceph osd pool set-quota <poolname> max_objects <num-objects>
   ceph osd pool set-quota <poolname> max_bytes <num-bytes>

If not, delete some existing data to reduce utilization.

BLUEFS_SPILLOVER
________________

One or more OSDs that use the BlueStore back end have been allocated `db`
partitions (that is, storage space for metadata, normally on a faster device),
but because that space has been filled, metadata has "spilled over" onto the
slow device. This is not necessarily an error condition or even unexpected
behavior, but may result in degraded performance. If the administrator had
expected that all metadata would fit on the faster device, this alert indicates
that not enough space was provided.

To disable this alert on all OSDs, run the following command:

.. prompt:: bash $

   ceph config set osd bluestore_warn_on_bluefs_spillover false

Alternatively, to disable the alert on a specific OSD, run the following
command:

.. prompt:: bash $

   ceph config set osd.123 bluestore_warn_on_bluefs_spillover false

To secure more metadata space, you can destroy and reprovision the OSD in
question. This process involves data migration and recovery.

It might also be possible to expand the LVM logical volume that backs the `db`
storage. If the underlying LV has been expanded, you must stop the OSD daemon
and inform BlueFS of the device-size change by running the following command:

.. prompt:: bash $

   ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID

BLUEFS_AVAILABLE_SPACE
______________________

To see how much space is free for BlueFS, run the following command:

.. prompt:: bash $

   ceph daemon osd.123 bluestore bluefs available

This will output up to three values: ``BDEV_DB free``, ``BDEV_SLOW free``, and
``available_from_bluestore``. ``BDEV_DB`` and ``BDEV_SLOW`` report the amount
of space that has been acquired by BlueFS and is now considered free. The value
``available_from_bluestore`` indicates the ability of BlueStore to relinquish
more space to BlueFS. It is normal for this value to differ from the amount of
BlueStore free space, because the BlueFS allocation unit is typically larger
than the BlueStore allocation unit. This means that only part of the BlueStore
free space will be available for BlueFS.

BLUEFS_LOW_SPACE
________________

If BlueFS is running low on available free space and there is not much free
space available from BlueStore (in other words, `available_from_bluestore` has
a low value), consider reducing the BlueFS allocation unit size. To simulate
available space when the allocation unit is different, run the following
command:

.. prompt:: bash $

   ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>

BLUESTORE_FRAGMENTATION
_______________________

As BlueStore operates, the free space on the underlying storage will become
fragmented. This is normal and unavoidable, but excessive fragmentation causes
slowdown. To inspect BlueStore fragmentation, run the following command:

.. prompt:: bash $

   ceph daemon osd.123 bluestore allocator score block

The fragmentation score is given in a [0, 1] range:

* [0.0 .. 0.4] - tiny fragmentation
* [0.4 .. 0.7] - small, acceptable fragmentation
* [0.7 .. 0.9] - considerable, but safe fragmentation
* [0.9 .. 1.0] - severe fragmentation, might impact BlueFS's ability to get
  space from BlueStore

To see a detailed report of free fragments, run the following command:

.. prompt:: bash $

   ceph daemon osd.123 bluestore allocator dump block

For OSD processes that are not currently running, fragmentation can be
inspected with `ceph-bluestore-tool`. To see the fragmentation score, run the
following command:

.. prompt:: bash $

   ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score

To dump detailed free chunks, run the following command:

.. prompt:: bash $

   ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump
eafe8130 647
81eedcae
TL
648BLUESTORE_LEGACY_STATFS
649_______________________
650
1e59de90
TL
651One or more OSDs have BlueStore volumes that were created prior to the
652Nautilus release. (In Nautilus, BlueStore tracks its internal usage
653statistics on a granular, per-pool basis.)
654
655If *all* OSDs
656are older than Nautilus, this means that the per-pool metrics are
657simply unavailable. But if there is a mixture of pre-Nautilus and
81eedcae 658post-Nautilus OSDs, the cluster usage statistics reported by ``ceph
1e59de90 659df`` will be inaccurate.
81eedcae 660
1e59de90
TL
661The old OSDs can be updated to use the new usage-tracking scheme by stopping
662each OSD, running a repair operation, and then restarting the OSD. For example,
663to update ``osd.123``, run the following commands:
39ae355f
TL
664
665.. prompt:: bash $
81eedcae 666
39ae355f
TL
667 systemctl stop ceph-osd@123
668 ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
669 systemctl start ceph-osd@123
81eedcae 670
1e59de90 671To disable this alert, run the following command:
81eedcae 672
39ae355f
TL
673.. prompt:: bash $
674
675 ceph config set global bluestore_warn_on_legacy_statfs false
81eedcae 676

BLUESTORE_NO_PER_POOL_OMAP
__________________________

One or more OSDs have volumes that were created prior to the Octopus release.
(In Octopus and later releases, BlueStore tracks omap space utilization by
pool.)

If there are any BlueStore OSDs that do not have the new tracking enabled, the
cluster will report an approximate value for per-pool omap usage based on the
most recent deep scrub.

The OSDs can be updated to track by pool by stopping each OSD, running a repair
operation, and then restarting the OSD. For example, to update ``osd.123``, run
the following commands:

.. prompt:: bash $

   systemctl stop ceph-osd@123
   ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
   systemctl start ceph-osd@123

To disable this alert, run the following command:

.. prompt:: bash $

   ceph config set global bluestore_warn_on_no_per_pool_omap false

BLUESTORE_NO_PER_PG_OMAP
________________________

One or more OSDs have volumes that were created prior to Pacific. (In Pacific
and later releases, BlueStore tracks omap space utilization by Placement Group
(PG).)

Per-PG omap allows faster PG removal when PGs migrate.

The older OSDs can be updated to track by PG by stopping each OSD, running a
repair operation, and then restarting the OSD. For example, to update
``osd.123``, run the following commands:

.. prompt:: bash $

   systemctl stop ceph-osd@123
   ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
   systemctl start ceph-osd@123

To disable this alert, run the following command:

.. prompt:: bash $

   ceph config set global bluestore_warn_on_no_per_pg_omap false

BLUESTORE_DISK_SIZE_MISMATCH
____________________________

One or more BlueStore OSDs have an internal inconsistency between the size of
the physical device and the metadata that tracks its size. This inconsistency
can lead to the OSD(s) crashing in the future.

The OSDs that have this inconsistency should be destroyed and reprovisioned. Be
very careful to execute this procedure on only one OSD at a time, so as to
minimize the risk of losing any data. To execute this procedure, where ``$N``
is the OSD that has the inconsistency, run the following commands:

.. prompt:: bash $

   ceph osd out osd.$N
   while ! ceph osd safe-to-destroy osd.$N ; do sleep 1m ; done
   ceph osd destroy osd.$N
   ceph-volume lvm zap /path/to/device
   ceph-volume lvm create --osd-id $N --data /path/to/device

.. note::

   Wait for this recovery procedure to complete on one OSD before running it
   on the next.

BLUESTORE_NO_COMPRESSION
________________________

One or more OSDs are unable to load a BlueStore compression plugin. This issue
might be caused by a broken installation, in which the ``ceph-osd`` binary does
not match the compression plugins. Or it might be caused by a recent upgrade in
which the ``ceph-osd`` daemon was not restarted.

To resolve this issue, verify that all of the packages on the host that is
running the affected OSD(s) are correctly installed and that the OSD daemon(s)
have been restarted. If the problem persists, check the OSD log for information
about the source of the problem.

BLUESTORE_SPURIOUS_READ_ERRORS
______________________________

One or more BlueStore OSDs have detected spurious read errors on the main
device. BlueStore has recovered from these errors by retrying disk reads. This
alert might indicate issues with underlying hardware, issues with the I/O
subsystem, or something similar. In theory, such issues can cause permanent
data corruption. Some observations on the root cause of spurious read errors
can be found here: https://tracker.ceph.com/issues/22464

This alert does not require an immediate response, but the affected host might
need additional attention: for example, upgrading the host to the latest
OS/kernel versions and implementing hardware-resource-utilization monitoring.

To disable this alert on all OSDs, run the following command:

.. prompt:: bash $

   ceph config set osd bluestore_warn_on_spurious_read_errors false

Or, to disable this alert on a specific OSD, run the following command:

.. prompt:: bash $

   ceph config set osd.123 bluestore_warn_on_spurious_read_errors false

Device health
-------------

DEVICE_HEALTH
_____________

One or more OSD devices are expected to fail soon, where the warning threshold
is determined by the ``mgr/devicehealth/warn_threshold`` config option.

Because this alert applies only to OSDs that are currently marked ``in``, the
appropriate response to this expected failure is (1) to mark the OSD ``out`` so
that data is migrated off of the OSD, and then (2) to remove the hardware from
the system. Note that this marking ``out`` is normally done automatically if
``mgr/devicehealth/self_heal`` is enabled (as determined by
``mgr/devicehealth/mark_out_threshold``).

To check device health, run the following command:

.. prompt:: bash $

   ceph device info <device-id>

Device life expectancy is set either by a prediction model that the mgr runs or
by an external tool that is activated by running the following command:

.. prompt:: bash $

   ceph device set-life-expectancy <device-id> <from> <to>

You can change the stored life expectancy manually, but such a change usually
doesn't accomplish anything. The reason for this is that whichever tool
originally set the stored life expectancy will probably undo your change by
setting it again, and a change to the stored value does not affect the actual
health of the hardware device.

DEVICE_HEALTH_IN_USE
____________________

One or more devices (that is, OSDs) are expected to fail soon and have been
marked ``out`` of the cluster (as controlled by
``mgr/devicehealth/mark_out_threshold``), but they are still participating in
one or more Placement Groups. This might be because the OSD(s) were marked
``out`` only recently and data is still migrating, or because data cannot be
migrated off of the OSD(s) for some reason (for example, the cluster is nearly
full, or the CRUSH hierarchy is structured so that there isn't another suitable
OSD to migrate the data to).

This message can be silenced by disabling self-heal behavior (that is, setting
``mgr/devicehealth/self_heal`` to ``false``), by adjusting
``mgr/devicehealth/mark_out_threshold``, or by addressing whichever condition
is preventing data from being migrated off of the ailing OSD(s).

.. _rados_health_checks_device_health_toomany:

DEVICE_HEALTH_TOOMANY
_____________________

Too many devices (that is, OSDs) are expected to fail soon, and because
``mgr/devicehealth/self_heal`` behavior is enabled, marking ``out`` all of the
ailing OSDs would exceed the cluster's ``mon_osd_min_in_ratio`` ratio. This
ratio prevents a cascade of too many OSDs from being automatically marked
``out``.

You should promptly add new OSDs to the cluster to prevent data loss, or
incrementally replace the failing OSDs.

Alternatively, you can silence this health check by adjusting options including
``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``. Be
warned, however, that this will increase the likelihood of unrecoverable data
loss.

Data health (pools & placement groups)
----------------------------------------

PG_AVAILABILITY
_______________

Data availability is reduced. In other words, the cluster is unable to service
potential read or write requests for at least some data in the cluster. More
precisely, one or more Placement Groups (PGs) are in a state that does not
allow I/O requests to be serviced. Any of the following PG states are
problematic if they do not clear quickly: *peering*, *stale*, *incomplete*, and
the lack of *active*.

For detailed information about which PGs are affected, run the following
command:

.. prompt:: bash $

   ceph health detail

In most cases, the root cause of this issue is that one or more OSDs are
currently ``down``: see ``OSD_DOWN`` above.

To see the state of a specific problematic PG, run the following command:

.. prompt:: bash $

   ceph tell <pgid> query

PG_DEGRADED
___________

Data redundancy is reduced for some data: in other words, the cluster does not
have the desired number of replicas for all data (in the case of replicated
pools) or erasure code fragments (in the case of erasure-coded pools). More
precisely, one or more Placement Groups (PGs):

* have the *degraded* or *undersized* flag set, which means that there are not
  enough instances of that PG in the cluster; or
* have not had the *clean* state set for a long time.

For detailed information about which PGs are affected, run the following
command:

.. prompt:: bash $

   ceph health detail

In most cases, the root cause of this issue is that one or more OSDs are
currently "down": see ``OSD_DOWN`` above.

To see the state of a specific problematic PG, run the following command:

.. prompt:: bash $

   ceph tell <pgid> query

PG_RECOVERY_FULL
________________

Data redundancy might be reduced or even put at risk for some data due to a
lack of free space in the cluster. More precisely, one or more Placement Groups
have the *recovery_toofull* flag set, which means that the cluster is unable to
migrate or recover data because one or more OSDs are above the ``full``
threshold.

For steps to resolve this condition, see *OSD_FULL* above.

PG_BACKFILL_FULL
________________

Data redundancy might be reduced or even put at risk for some data due to a
lack of free space in the cluster. More precisely, one or more Placement Groups
have the *backfill_toofull* flag set, which means that the cluster is unable to
migrate or recover data because one or more OSDs are above the ``backfillfull``
threshold.

For steps to resolve this condition, see *OSD_BACKFILLFULL* above.

PG_DAMAGED
__________

Data scrubbing has discovered problems with data consistency in the cluster.
More precisely, one or more Placement Groups either (1) have the *inconsistent*
or ``snaptrim_error`` flag set, which indicates that an earlier data scrub
operation found a problem, or (2) have the *repair* flag set, which means that
a repair for such an inconsistency is currently in progress.

For more information, see :doc:`pg-repair`.

OSD_SCRUB_ERRORS
________________

Recent OSD scrubs have discovered inconsistencies. This alert is generally
paired with *PG_DAMAGED* (see above).

For more information, see :doc:`pg-repair`.

OSD_TOO_MANY_REPAIRS
____________________

The count of read repairs has exceeded the config value threshold
``mon_osd_warn_num_repaired`` (default: ``10``). Because scrub handles errors
only for data at rest, and because any read error that occurs when another
replica is available will be repaired immediately so that the client can get
the object data, there might exist failing disks that are not registering any
scrub errors. This repair count is maintained as a way of identifying any such
failing disks.
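
To see which OSD(s) have exceeded the repair threshold, and therefore which
drives to inspect, run the following command:

.. prompt:: bash $

   ceph health detail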

LARGE_OMAP_OBJECTS
__________________

One or more pools contain large omap objects, as determined by
``osd_deep_scrub_large_omap_object_key_threshold`` (threshold for the number of
keys to determine what is considered a large omap object) or
``osd_deep_scrub_large_omap_object_value_sum_threshold`` (the threshold for the
summed size in bytes of all key values to determine what is considered a large
omap object) or both. To find more information on object name, key count, and
size in bytes, search the cluster log for 'Large omap object found'. This issue
can be caused by RGW-bucket index objects that do not have automatic resharding
enabled. For more information on resharding, see :ref:`RGW Dynamic Bucket Index
Resharding <rgw_dynamic_bucket_index_resharding>`.
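
For example, assuming the default cluster log location on a monitor host
(``/var/log/ceph/ceph.log``; the path can differ by deployment), the affected
objects can be located with a simple search:

.. prompt:: bash $

   grep 'Large omap object found' /var/log/ceph/ceph.log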

To adjust the thresholds mentioned above, run the following commands:

.. prompt:: bash $

   ceph config set osd osd_deep_scrub_large_omap_object_key_threshold <keys>
   ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold <bytes>

CACHE_POOL_NEAR_FULL
____________________

A cache-tier pool is nearly full, as determined by the ``target_max_bytes`` and
``target_max_objects`` properties of the cache pool. Once the pool reaches the
target threshold, write requests to the pool might block while data is flushed
and evicted from the cache. This state normally leads to very high latencies
and poor performance.

To adjust the cache pool's target size, run the following commands:

.. prompt:: bash $

   ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
   ceph osd pool set <cache-pool-name> target_max_objects <objects>

There might be other reasons that normal cache flush and evict activity are
throttled: for example, reduced availability of the base tier, reduced
performance of the base tier, or overall cluster load.

TOO_FEW_PGS
___________

The number of Placement Groups (PGs) that are in use in the cluster is below
the configurable threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can
lead to suboptimal distribution and suboptimal balance of data across the OSDs
in the cluster, and a reduction of overall performance.

If data pools have not yet been created, this condition is expected.

To address this issue, you can increase the PG count for existing pools or
create new pools. For more information, see
:ref:`choosing-number-of-placement-groups`.

POOL_PG_NUM_NOT_POWER_OF_TWO
____________________________

One or more pools have a ``pg_num`` value that is not a power of two. Although
this is not strictly incorrect, it does lead to a less balanced distribution of
data because some Placement Groups will have roughly twice as much data as
others have.

This is easily corrected by setting the ``pg_num`` value for the affected
pool(s) to a nearby power of two. To do so, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_num <value>

To disable this health check, run the following command:

.. prompt:: bash $

   ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

POOL_TOO_FEW_PGS
________________

One or more pools should probably have more Placement Groups (PGs), given the
amount of data that is currently stored in the pool. This issue can lead to
suboptimal distribution and suboptimal balance of data across the OSDs in the
cluster, and a reduction of overall performance. This alert is raised only if
the ``pg_autoscale_mode`` property on the pool is set to ``warn``.

To disable the alert, entirely disable auto-scaling of PGs for the pool by
running the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs for the pool,
run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_autoscale_mode on

Alternatively, to manually set the number of PGs for the pool to the
recommended amount, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_num <new-pg-num>

For more information, see :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler`.
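
To see the autoscaler's view of each pool, including its recommended number of
PGs, before making any change, you can run:

.. prompt:: bash $

   ceph osd pool autoscale-status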

TOO_MANY_PGS
____________

The number of Placement Groups (PGs) in use in the cluster is above the
configurable threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold
is exceeded, the cluster will not allow new pools to be created, pool `pg_num`
to be increased, or pool replication to be increased (any of which, if allowed,
would lead to more PGs in the cluster). A large number of PGs can lead to
higher memory utilization for OSD daemons, slower peering after cluster state
changes (for example, OSD restarts, additions, or removals), and higher load on
the Manager and Monitor daemons.

The simplest way to mitigate the problem is to increase the number of OSDs in
the cluster by adding more hardware. Note that, because the OSD count that is
used for the purposes of this health check is the number of ``in`` OSDs,
marking ``out`` OSDs ``in`` (if there are any ``out`` OSDs available) can also
help. To do so, run the following command:

.. prompt:: bash $

   ceph osd in <osd id(s)>

For more information, see :ref:`choosing-number-of-placement-groups`.

POOL_TOO_MANY_PGS
_________________

One or more pools should probably have fewer Placement Groups (PGs), given the
amount of data that is currently stored in the pool. This issue can lead to
higher memory utilization for OSD daemons, slower peering after cluster state
changes (for example, OSD restarts, additions, or removals), and higher load on
the Manager and Monitor daemons. This alert is raised only if the
``pg_autoscale_mode`` property on the pool is set to ``warn``.

To disable the alert, entirely disable auto-scaling of PGs for the pool by
running the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs for the pool,
run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_autoscale_mode on

Alternatively, to manually set the number of PGs for the pool to the
recommended amount, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_num <new-pg-num>

For more information, see :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler`.

POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_bytes`` property that is set in order to
estimate the expected size of the pool, but the value(s) of this property are
greater than the total available storage (either by themselves or in
combination with other pools).

This alert is usually an indication that the ``target_size_bytes`` value for
the pool is too large and should be reduced or set to zero. To reduce the
``target_size_bytes`` value or set it to zero, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> target_size_bytes 0

The above command sets the value of ``target_size_bytes`` to zero. To set the
value of ``target_size_bytes`` to a non-zero value, replace the ``0`` with that
non-zero value.

For more information, see :ref:`specifying_pool_target_size`.

POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO
____________________________________

One or more pools have both ``target_size_bytes`` and ``target_size_ratio`` set
in order to estimate the expected size of the pool. Only one of these
properties should be non-zero. If both are set to a non-zero value, then
``target_size_ratio`` takes precedence and ``target_size_bytes`` is ignored.

To reset ``target_size_bytes`` to zero, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> target_size_bytes 0

For more information, see :ref:`specifying_pool_target_size`.

TOO_FEW_OSDS
____________

The number of OSDs in the cluster is below the configurable threshold of
``osd_pool_default_size``. This means that some or all data may not be able to
satisfy the data protection policy specified in CRUSH rules and pool settings.

SMALLER_PGP_NUM
_______________

One or more pools have a ``pgp_num`` value less than ``pg_num``. This alert is
normally an indication that the Placement Group (PG) count was increased
without any increase in the placement behavior.

This disparity is sometimes brought about deliberately, in order to separate
out the `split` step when the PG count is adjusted from the data migration that
is needed when ``pgp_num`` is changed.

This issue is normally resolved by setting ``pgp_num`` to match ``pg_num``, so
as to trigger the data migration, by running the following command:

.. prompt:: bash $

   ceph osd pool set <pool> pgp_num <pg-num-value>

MANY_OBJECTS_PER_PG
___________________

One or more pools have an average number of objects per Placement Group (PG)
that is significantly higher than the overall cluster average. The specific
threshold is determined by the ``mon_pg_warn_max_object_skew`` configuration
value.

This alert is usually an indication that the pool(s) that contain most of the
data in the cluster have too few PGs, or that other pools that contain less
data have too many PGs. See *TOO_MANY_PGS* above.

To silence the health check, raise the threshold by adjusting the
``mon_pg_warn_max_object_skew`` config option on the managers.

The health check will be silenced for a specific pool only if
``pg_autoscale_mode`` is set to ``on``.

POOL_APP_NOT_ENABLED
____________________

A pool exists but the pool has not been tagged for use by a particular
application.

To resolve this issue, tag the pool for use by an application. For
example, if the pool is used by RBD, run the following command:

.. prompt:: bash $

   rbd pool init <poolname>

Alternatively, if the pool is being used by a custom application (here 'foo'),
you can label the pool by running the following low-level command:

.. prompt:: bash $

   ceph osd pool application enable <poolname> foo

For more information, see :ref:`associate-pool-to-application`.

POOL_FULL
_________

One or more pools have reached (or are very close to reaching) their quota. The
threshold to raise this health check is determined by the
``mon_pool_quota_crit_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) by running the following
commands:

.. prompt:: bash $

   ceph osd pool set-quota <pool> max_bytes <bytes>
   ceph osd pool set-quota <pool> max_objects <objects>

To disable a quota, set the quota value to 0.

POOL_NEAR_FULL
______________

One or more pools are approaching a configured fullness threshold.

One of the several thresholds that can raise this health check is determined by
the ``mon_pool_quota_warn_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) by running the following
commands:

.. prompt:: bash $

   ceph osd pool set-quota <pool> max_bytes <bytes>
   ceph osd pool set-quota <pool> max_objects <objects>

To disable a quota, set the quota value to 0.

Other thresholds that can raise the two health checks above are
``mon_osd_nearfull_ratio`` and ``mon_osd_full_ratio``. For details and
resolution, see :ref:`storage-capacity` and :ref:`no-free-drive-space`.

OBJECT_MISPLACED
________________

One or more objects in the cluster are not stored on the node that CRUSH would
prefer that they be stored on. This alert is an indication that data migration
due to a recent cluster change has not yet completed.

Misplaced data is not a dangerous condition in and of itself; data consistency
is never at risk, and old copies of objects will not be removed until the
desired number of new copies (in the desired locations) has been created.
1297
OBJECT_UNFOUND
______________

One or more objects in the cluster cannot be found. More precisely, the OSDs
know that a new or updated copy of an object should exist, but no such copy has
been found on OSDs that are currently online.

Read or write requests to unfound objects will block.

Ideally, a "down" OSD that has a more recent copy of the unfound object can be
brought back online. To identify candidate OSDs, check the peering state of the
PG(s) responsible for the unfound object. To see the peering state, run the
following command:

.. prompt:: bash $

   ceph tell <pgid> query

On the other hand, if the latest copy of the object is not available, the
cluster can be told to roll back to a previous version of the object. For more
information, see :ref:`failures-osd-unfound`.

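
To identify the affected PG(s) and list the unfound objects themselves, you
might run commands like the following:

.. prompt:: bash $

   ceph health detail
   ceph pg <pgid> list_unfound
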
SLOW_OPS
________

One or more OSD requests or monitor requests are taking a long time to process.
This alert might be an indication of extreme load, a slow storage device, or a
software bug.

To query the request queue for the daemon that is causing the slowdown, run the
following command from the daemon's host:

.. prompt:: bash $

   ceph daemon osd.<id> ops

To see a summary of the slowest recent requests, run the following command:

.. prompt:: bash $

   ceph daemon osd.<id> dump_historic_ops

To see the location of a specific OSD, run the following command:

.. prompt:: bash $

   ceph osd find osd.<id>

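
If you are not sure which daemons are affected, the detailed health output
lists them. In recent releases, the same per-daemon queries can usually also be
issued remotely with ``ceph tell`` instead of ``ceph daemon``, for example:

.. prompt:: bash $

   ceph health detail
   ceph tell osd.<id> dump_historic_ops
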
PG_NOT_SCRUBBED
_______________

One or more Placement Groups (PGs) have not been scrubbed recently. PGs are
normally scrubbed within an interval determined globally by
:confval:`osd_scrub_max_interval`. This interval can be overridden on a
per-pool basis by changing the value of the variable
:confval:`scrub_max_interval`. This health check is raised if a certain
percentage (determined by ``mon_warn_pg_not_scrubbed_ratio``) of the interval
has elapsed after the time the scrub was scheduled and no scrub has been
performed.

PGs will be scrubbed only if they are flagged as ``clean`` (which means that
they are to be cleaned, and not that they have been examined and found to be
clean). Misplaced or degraded PGs will not be flagged as ``clean`` (see
*PG_AVAILABILITY* and *PG_DEGRADED* above).

To manually initiate a scrub of a clean PG, run the following command:

.. prompt:: bash $

   ceph pg scrub <pgid>

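
To see when each PG was last scrubbed, you can inspect the PG statistics. The
exact columns vary by release, but the scrub and deep-scrub timestamps are
included in the output of ``ceph pg dump``, for example:

.. prompt:: bash $

   ceph pg dump pgs | less -S
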
PG_NOT_DEEP_SCRUBBED
____________________

One or more Placement Groups (PGs) have not been deep scrubbed recently. PGs
are normally deep scrubbed within an interval of at most
:confval:`osd_deep_scrub_interval` seconds. This health check is raised if a
certain percentage (determined by ``mon_warn_pg_not_deep_scrubbed_ratio``) of
the interval has elapsed since the deep scrub was scheduled and no deep scrub
has been performed.

PGs will receive a deep scrub only if they are flagged as ``clean`` (which
means that they are to be cleaned, and not that they have been examined and
found to be clean). Misplaced or degraded PGs might not be flagged as ``clean``
(see *PG_AVAILABILITY* and *PG_DEGRADED* above).

To manually initiate a deep scrub of a clean PG, run the following command:

.. prompt:: bash $

   ceph pg deep-scrub <pgid>

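
In some releases, all of the PGs in a pool can also be scrubbed or deep
scrubbed in a single step; if your release supports it, the commands look like
this:

.. prompt:: bash $

   ceph osd pool scrub <pool>
   ceph osd pool deep-scrub <pool>
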
PG_SLOW_SNAP_TRIMMING
_____________________

The snapshot trim queue for one or more PGs has exceeded the configured warning
threshold. This alert indicates either that an extremely large number of
snapshots was recently deleted, or that OSDs are unable to trim snapshots
quickly enough to keep up with the rate of new snapshot deletions.

The warning threshold is determined by the ``mon_osd_snap_trim_queue_warn_on``
option (default: 32768).

This alert might be raised if OSDs are under excessive load and unable to keep
up with their background work, or if the OSDs' internal metadata database is
heavily fragmented and unable to perform. The alert might also indicate some
other performance issue with the OSDs.

The exact size of the snapshot trim queue is reported by the ``snaptrimq_len``
field of ``ceph pg ls -f json-detail``.

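
To see which PGs have the longest trim queues, you can filter that JSON output
with a tool such as ``jq`` (assuming it is installed; the exact JSON layout can
vary by release, and in recent releases the per-PG entries appear under
``pg_stats``):

.. prompt:: bash $

   ceph pg ls -f json-detail | jq '.pg_stats[] | {pgid, snaptrimq_len}'
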
Stretch Mode
------------

INCORRECT_NUM_BUCKETS_STRETCH_MODE
__________________________________

Stretch mode currently supports only 2 dividing buckets that contain OSDs. This
warning indicates that the number of dividing buckets is not equal to 2 after
stretch mode has been enabled. You can expect unpredictable failures and MON
assertions until the condition is fixed.

We encourage you to fix this by removing the additional dividing buckets or by
bringing the number of dividing buckets up to 2.

UNEVEN_WEIGHTS_STRETCH_MODE
___________________________

The 2 dividing buckets must have equal weights when stretch mode is enabled.
This warning indicates that the 2 dividing buckets have uneven weights after
stretch mode has been enabled. This is not immediately fatal; however, you can
expect Ceph to be confused when trying to process transitions between dividing
buckets.

We encourage you to fix this by making the weights even on both dividing
buckets. This can be done by making sure that the combined weight of the OSDs
in each dividing bucket is the same.

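
One way to compare the two dividing buckets (for example, two datacenter
buckets) is to inspect the CRUSH hierarchy and their aggregate weights:

.. prompt:: bash $

   ceph osd crush tree
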
Miscellaneous
-------------

RECENT_CRASH
____________

One or more Ceph daemons have crashed recently, and the crash(es) have not yet
been acknowledged and archived by the administrator. This alert might indicate
a software bug, a hardware problem (for example, a failing disk), or some other
problem.

To list recent crashes, run the following command:

.. prompt:: bash $

   ceph crash ls-new

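
To see a compact summary of how many crashes have occurred and when (a quick
way to gauge whether they are clustered in time), you can also run:

.. prompt:: bash $

   ceph crash stat
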
To examine information about a specific crash, run the following command:

.. prompt:: bash $

   ceph crash info <crash-id>

To silence this alert, you can archive the crash (perhaps after the crash
has been examined by an administrator) by running the following command:

.. prompt:: bash $

   ceph crash archive <crash-id>

Similarly, to archive all recent crashes, run the following command:

.. prompt:: bash $

   ceph crash archive-all

Archived crashes will still be visible by running the command ``ceph crash
ls``, but not by running the command ``ceph crash ls-new``.

The time period that is considered recent is determined by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).

To entirely disable this alert, run the following command:

.. prompt:: bash $

   ceph config set mgr mgr/crash/warn_recent_interval 0

RECENT_MGR_MODULE_CRASH
_______________________

One or more ``ceph-mgr`` modules have crashed recently, and the crash(es) have
not yet been acknowledged and archived by the administrator. This alert usually
indicates a software bug in one of the modules running inside the ``ceph-mgr``
daemon. The module that experienced the problem might be disabled as a result,
but other modules are unaffected and continue to function as expected.

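
To check which manager modules are currently enabled (and therefore whether the
affected module has been turned off), you can list them:

.. prompt:: bash $

   ceph mgr module ls
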
As with the *RECENT_CRASH* health check, a specific crash can be inspected by
running the following command:

.. prompt:: bash $

   ceph crash info <crash-id>

To silence this alert, you can archive the crash (perhaps after the crash has
been examined by an administrator) by running the following command:

.. prompt:: bash $

   ceph crash archive <crash-id>

Similarly, to archive all recent crashes, run the following command:

.. prompt:: bash $

   ceph crash archive-all

Archived crashes will still be visible by running the command ``ceph crash
ls``, but not by running the command ``ceph crash ls-new``.

The time period that is considered recent is determined by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).

To entirely disable this alert, run the following command:

.. prompt:: bash $

   ceph config set mgr mgr/crash/warn_recent_interval 0

TELEMETRY_CHANGED
_________________

Telemetry has been enabled, but because the contents of the telemetry report
have changed in the meantime, telemetry reports will not be sent.

Ceph developers occasionally revise the telemetry feature to include new and
useful information, or to remove information found to be useless or sensitive.
If any new information is included in the report, Ceph requires the
administrator to re-enable telemetry. This requirement ensures that the
administrator has an opportunity to (re)review the information that will be
shared.

To review the contents of the telemetry report, run the following command:

.. prompt:: bash $

   ceph telemetry show

Note that the telemetry report consists of several channels that may be
independently enabled or disabled. For more information, see :ref:`telemetry`.

To re-enable telemetry (and silence the alert), run the following command:

.. prompt:: bash $

   ceph telemetry on

To disable telemetry (and silence the alert), run the following command:

.. prompt:: bash $

   ceph telemetry off

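
To confirm whether telemetry is currently enabled and review its current
settings, recent releases also provide a status command:

.. prompt:: bash $

   ceph telemetry status
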
AUTH_BAD_CAPS
_____________

One or more auth users have capabilities that cannot be parsed by the monitors.
As a general rule, this alert indicates that there are one or more daemon types
that the user is not authorized to use to perform any action.

This alert is most likely to be raised after an upgrade if (1) the capabilities
were set with an older version of Ceph that did not properly validate the
syntax of those capabilities, or if (2) the syntax of the capabilities has
changed.

To remove the user(s) in question, run the following command:

.. prompt:: bash $

   ceph auth rm <entity-name>

(This resolves the health check, but it prevents clients from being able to
authenticate as the removed user.)

Alternatively, to update the capabilities for the user(s), run the following
command:

.. prompt:: bash $

   ceph auth caps <entity-name> <daemon-type> <caps> [<daemon-type> <caps> ...]

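
To find the problematic entities and review their current capabilities before
changing them, you can list or inspect them. For example, for a hypothetical
user ``client.foo`` that should have read-only monitor access and read/write
access to a pool named ``foo``:

.. prompt:: bash $

   ceph auth ls
   ceph auth get client.foo
   ceph auth caps client.foo mon 'allow r' osd 'allow rw pool=foo'
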
For more information about auth capabilities, see :ref:`user-management`.

OSD_NO_DOWN_OUT_INTERVAL
________________________

The ``mon_osd_down_out_interval`` option is set to zero, which means that the
system does not automatically perform any repair or healing operations when an
OSD fails. Instead, an administrator or an external orchestrator must manually
mark "down" OSDs as ``out`` (by running ``ceph osd out <osd-id>``) in order to
trigger recovery.

This option is normally set to five or ten minutes, which should be enough time
for a host to power-cycle or reboot.

To silence this alert, set ``mon_warn_on_osd_down_out_interval_zero`` to
``false`` by running the following command:

.. prompt:: bash $

   ceph config set global mon_warn_on_osd_down_out_interval_zero false

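
To check the current value of the interval, or to restore it to its default of
ten minutes (600 seconds), you can run, for example:

.. prompt:: bash $

   ceph config get mon mon_osd_down_out_interval
   ceph config set global mon_osd_down_out_interval 600
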
DASHBOARD_DEBUG
_______________

The Dashboard debug mode is enabled. This means that if there is an error while
processing a REST API request, the HTTP error response will contain a Python
traceback. This mode should be disabled in production environments because such
a traceback might contain and expose sensitive information.

To disable the debug mode, run the following command:

.. prompt:: bash $

   ceph dashboard debug disable
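
To confirm the current state of the debug mode, recent releases also provide a
status subcommand:

.. prompt:: bash $

   ceph dashboard debug status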