.. _health-checks:

=============
Health checks
=============

Overview
========

There is a finite set of possible health messages that a Ceph cluster can
raise -- these are defined as *health checks* which have unique identifiers.

The identifier is a terse pseudo-human-readable (i.e. like a variable name)
string. It is intended to enable tools (such as UIs) to make sense of
health checks, and present them in a way that reflects their meaning.

This page lists the health checks that are raised by the monitor and manager
daemons. In addition to these, you may also see health checks that originate
from MDS daemons (see :ref:`cephfs-health-messages`), and health checks
that are defined by ceph-mgr python modules.

Definitions
===========

Monitor
-------

DAEMON_OLD_VERSION
__________________

One or more old versions of Ceph are running on daemons in the cluster. A
health error is raised if multiple versions are detected. This condition must
persist for longer than ``mon_warn_older_version_delay`` (one week by
default) before the health check is triggered; this allows most upgrades to
proceed without falsely raising the warning. If an upgrade is paused for an
extended period, the check can be muted with::

    ceph health mute DAEMON_OLD_VERSION --sticky

After the upgrade has finished, unmute the check with::

    ceph health unmute DAEMON_OLD_VERSION

MON_DOWN
________

One or more monitor daemons is currently down. The cluster requires a
majority (more than 1/2) of the monitors in order to function. When
one or more monitors are down, clients may have a harder time forming
their initial connection to the cluster as they may need to try more
addresses before they reach an operating monitor.

The down monitor daemon should generally be restarted as soon as
possible to reduce the risk of a subsequent monitor failure leading to
a service outage.

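Which monitor is down can be determined from the quorum status. A quick
sketch (``mon.c`` and the systemd unit name are placeholders that vary by
deployment):

```shell
ceph health detail                       # names the monitor(s) out of quorum
ceph quorum_status --format json-pretty  # quorum membership in detail
ceph mon stat                            # one-line monitor summary
# On the affected host (systemd-managed deployments):
systemctl status ceph-mon@c
systemctl restart ceph-mon@c
```

If the daemon will not stay up after a restart, its log (normally under
``/var/log/ceph/``) is the next place to look.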
MON_CLOCK_SKEW
______________

The clocks on the hosts running the ceph-mon monitor daemons are not
sufficiently well synchronized. This health alert is raised if the
cluster detects a clock skew greater than ``mon_clock_drift_allowed``.

This is best resolved by synchronizing the clocks using a tool like
``ntpd`` or ``chrony``.

If it is impractical to keep the clocks closely synchronized, the
``mon_clock_drift_allowed`` threshold can also be increased, but this
value must stay significantly below the ``mon_lease`` interval in
order for the monitor cluster to function properly.

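The skew each monitor currently observes, and the state of the host's time
synchronization, can be checked as follows (the ``chronyc`` line assumes
chrony is the time daemon in use; substitute your NTP client's equivalent):

```shell
ceph time-sync-status   # per-monitor skew as seen by the cluster
timedatectl status      # host-level clock and NTP synchronization state
chronyc tracking        # offset and drift reported by chrony, if installed
```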
MON_MSGR2_NOT_ENABLED
_____________________

The :confval:`ms_bind_msgr2` option is enabled but one or more monitors is
not configured to bind to a v2 port in the cluster's monmap. This
means that features specific to the msgr2 protocol (e.g., encryption)
are not available on some or all connections.

In most cases this can be corrected by issuing the command::

    ceph mon enable-msgr2

That command will change any monitor configured for the old default
port 6789 to continue to listen for v1 connections on 6789 and also
listen for v2 connections on the new default port 3300.

If a monitor is configured to listen for v1 connections on a non-standard
port (not 6789), the monmap will need to be modified manually.

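Monitors that are still v1-only can be spotted by inspecting the monmap; a
sketch (the exact address formatting may vary between releases):

```shell
# Each monitor's bound addresses appear in the monmap dump; v2-enabled
# monitors list both protocols, e.g. [v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0]
ceph mon dump | grep 'mon\.'
```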
MON_DISK_LOW
____________

One or more monitors is low on disk space. This alert triggers if the
available space on the file system storing the monitor database
(normally ``/var/lib/ceph/mon``), as a percentage, drops below
``mon_data_avail_warn`` (default: 30%).

This may indicate that some other process or user on the system is
filling up the same file system used by the monitor. It may also
indicate that the monitor's database is large (see ``MON_DISK_BIG``
below).

If space cannot be freed, the monitor's data directory may need to be
moved to another storage device or file system (while the monitor
daemon is not running, of course).

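It is worth checking whether the file system is being filled by something
other than the monitor store itself. A sketch using standard tools (the
paths are the defaults; adjust for your deployment):

```shell
# Free space on the file system holding the monitor database
df -h /var/lib/ceph/mon 2>/dev/null || df -h /var/lib
# Size of the monitor store itself, to distinguish a full file system
# from a large monitor database
du -sh /var/lib/ceph/mon/ceph-*/store.db 2>/dev/null || true
```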
MON_DISK_CRIT
_____________

One or more monitors is critically low on disk space. This alert
triggers if the available space on the file system storing the monitor
database (normally ``/var/lib/ceph/mon``), as a percentage, drops
below ``mon_data_avail_crit`` (default: 5%). See ``MON_DISK_LOW``, above.

MON_DISK_BIG
____________

The database size for one or more monitors is very large. This alert
triggers if the size of the monitor's database is larger than
``mon_data_size_warn`` (default: 15 GiB).

A large database is unusual, but may not necessarily indicate a
problem. Monitor databases may grow in size when there are placement
groups that have not reached an ``active+clean`` state in a long time.

This may also indicate that the monitor's database is not properly
compacting, which has been observed with some older versions of
leveldb and rocksdb. Forcing a compaction with ``ceph daemon mon.<id>
compact`` may shrink the on-disk size.

This warning may also indicate that the monitor has a bug that is
preventing it from pruning the cluster metadata it stores. If the
problem persists, please report a bug.

The warning threshold may be adjusted with::

    ceph config set global mon_data_size_warn <size>

AUTH_INSECURE_GLOBAL_ID_RECLAIM
_______________________________

One or more clients or daemons are connected to the cluster that are
not securely reclaiming their global_id (a unique number identifying
each entity in the cluster) when reconnecting to a monitor. The
client is being permitted to connect anyway because the
``auth_allow_insecure_global_id_reclaim`` option is set to ``true`` (which
may be necessary until all Ceph clients have been upgraded), and the
``auth_expose_insecure_global_id_reclaim`` option is set to ``true`` (which
allows monitors to detect clients with insecure reclaim early by forcing
them to reconnect right after they first authenticate).

You can identify which client(s) are using unpatched Ceph client code with::

    ceph health detail

Clients' global_id reclaim behavior can also be seen in the
``global_id_status`` field in the dump of clients connected to an
individual monitor (``reclaim_insecure`` means the client is
unpatched and is contributing to this health alert)::

    ceph tell mon.\* sessions

We strongly recommend that all clients in the system are upgraded to a
newer version of Ceph that correctly reclaims global_id values. Once
all clients have been updated, you can stop allowing insecure reconnections
with::

    ceph config set mon auth_allow_insecure_global_id_reclaim false

If it is impractical to upgrade all clients immediately, you can silence
this warning temporarily with::

    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w   # 1 week

Although we do NOT recommend doing so, you can also disable this warning
indefinitely with::

    ceph config set mon mon_warn_on_insecure_global_id_reclaim false

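Because the session dump is JSON, it can be filtered down to only the
offending sessions. A sketch that assumes ``jq`` is installed (``mon.a`` is
a placeholder; only the ``global_id_status`` field name is taken from the
description above, and each matching session record is printed unmodified):

```shell
ceph tell mon.a sessions | \
    jq '.[] | select(.global_id_status == "reclaim_insecure")'
```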
AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED
_______________________________________

Ceph is currently configured to allow clients to reconnect to monitors using
an insecure process to reclaim their previous global_id because the setting
``auth_allow_insecure_global_id_reclaim`` is set to ``true``. It may be
necessary to leave this setting enabled while existing Ceph clients are
upgraded to newer versions of Ceph that correctly and securely reclaim their
global_id.

If the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` health alert has not also been
raised and the ``auth_expose_insecure_global_id_reclaim`` setting has not
been disabled (it is on by default), then there are currently no clients
connected that need to be upgraded, and it is safe to disallow insecure
global_id reclaim with::

    ceph config set mon auth_allow_insecure_global_id_reclaim false

If there are still clients that need to be upgraded, then this alert can be
silenced temporarily with::

    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w   # 1 week

Although we do NOT recommend doing so, you can also disable this warning
indefinitely with::

    ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false

Manager
-------

MGR_DOWN
________

All manager daemons are currently down. The cluster should normally
have at least one running manager (``ceph-mgr``) daemon. If no
manager daemon is running, the cluster's ability to monitor itself will
be compromised, and parts of the management API will become
unavailable (for example, the dashboard will not work, and most CLI
commands that report metrics or runtime state will block). However,
the cluster will still be able to perform all IO operations and
recover from failures.

The down manager daemon should generally be restarted as soon as
possible to ensure that the cluster can be monitored (e.g., so that
the ``ceph -s`` information is up to date, and/or metrics can be
scraped by Prometheus).

MGR_MODULE_DEPENDENCY
_____________________

An enabled manager module is failing its dependency check. This health check
should come with an explanatory message from the module about the problem.

For example, a module might report that a required package is not installed:
install the required package and restart your manager daemons.

This health check is only applied to enabled modules. If a module is
not enabled, you can see whether it is reporting dependency issues in
the output of ``ceph mgr module ls``.

MGR_MODULE_ERROR
________________

A manager module has experienced an unexpected error. Typically,
this means an unhandled exception was raised from the module's ``serve``
function. The human-readable description of the error may be obscurely
worded if the exception did not provide a useful description of itself.

This health check may indicate a bug: please open a Ceph bug report if you
think you have encountered a bug.

If you believe the error is transient, you may restart your manager
daemon(s), or use ``ceph mgr fail`` on the active daemon to prompt
a failover to another daemon.

OSDs
----

OSD_DOWN
________

One or more OSDs are marked down. The ceph-osd daemon may have been
stopped, or peer OSDs may be unable to reach the OSD over the network.
Common causes include a stopped or crashed daemon, a down host, or a
network outage.

Verify the host is healthy, the daemon is started, and the network is
functioning. If the daemon has crashed, the daemon log file
(``/var/log/ceph/ceph-osd.*``) may contain debugging information.

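The affected OSDs and their position in the CRUSH tree can be listed, and
the daemon then inspected on its host. For example (``osd.12`` is a
placeholder, and the systemd unit name varies by deployment tool):

```shell
ceph osd tree down                        # show only down OSDs and their hosts
# On the OSD's host (systemd-managed deployments):
systemctl status ceph-osd@12
journalctl -u ceph-osd@12 --since=-1h     # recent daemon log messages
```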
OSD_<crush type>_DOWN
_____________________

(e.g. OSD_HOST_DOWN, OSD_ROOT_DOWN)

All the OSDs within a particular CRUSH subtree are marked down, for example
all OSDs on a host.

OSD_ORPHAN
__________

An OSD is referenced in the CRUSH map hierarchy but does not exist.

The OSD can be removed from the CRUSH hierarchy with::

    ceph osd crush rm osd.<id>

OSD_OUT_OF_ORDER_FULL
_____________________

The utilization thresholds for `nearfull`, `backfillfull`, `full`,
and/or `failsafe_full` are not ascending. In particular, we expect
`nearfull < backfillfull`, `backfillfull < full`, and `full <
failsafe_full`.

The thresholds can be adjusted with::

    ceph osd set-nearfull-ratio <ratio>
    ceph osd set-backfillfull-ratio <ratio>
    ceph osd set-full-ratio <ratio>

OSD_FULL
________

One or more OSDs has exceeded the `full` threshold and is preventing
the cluster from servicing writes.

Utilization by pool can be checked with::

    ceph df

The currently defined `full` ratio can be seen with::

    ceph osd dump | grep full_ratio

A short-term workaround to restore write availability is to raise the full
threshold by a small amount::

    ceph osd set-full-ratio <ratio>

New storage should be added to the cluster by deploying more OSDs, or
existing data should be deleted in order to free up space.

OSD_BACKFILLFULL
________________

One or more OSDs has exceeded the `backfillfull` threshold, which will
prevent data from being allowed to rebalance to this device. This is
an early warning that rebalancing may not be able to complete and that
the cluster is approaching full.

Utilization by pool can be checked with::

    ceph df

OSD_NEARFULL
____________

One or more OSDs has exceeded the `nearfull` threshold. This is an early
warning that the cluster is approaching full.

Utilization by pool can be checked with::

    ceph df

OSDMAP_FLAGS
____________

One or more cluster flags of interest has been set. These flags include:

* *full* - the cluster is flagged as full and cannot serve writes
* *pauserd*, *pausewr* - paused reads or writes
* *noup* - OSDs are not allowed to start
* *nodown* - OSD failure reports are being ignored, such that the
  monitors will not mark OSDs `down`
* *noin* - OSDs that were previously marked `out` will not be marked
  back `in` when they start
* *noout* - down OSDs will not automatically be marked out after the
  configured interval
* *nobackfill*, *norecover*, *norebalance* - recovery or data
  rebalancing is suspended
* *noscrub*, *nodeep_scrub* - scrubbing is disabled
* *notieragent* - cache tiering activity is suspended

With the exception of *full*, these flags can be set or cleared with::

    ceph osd set <flag>
    ceph osd unset <flag>

OSD_FLAGS
_________

One or more OSDs or CRUSH {nodes, device classes} has a flag of interest set.
These flags include:

* *noup*: these OSDs are not allowed to start
* *nodown*: failure reports for these OSDs will be ignored
* *noin*: if these OSDs were previously marked `out` automatically
  after a failure, they will not be marked in when they start
* *noout*: if these OSDs are down they will not automatically be marked
  `out` after the configured interval

These flags can be set and cleared in batch with::

    ceph osd set-group <flags> <who>
    ceph osd unset-group <flags> <who>

For example::

    ceph osd set-group noup,noout osd.0 osd.1
    ceph osd unset-group noup,noout osd.0 osd.1
    ceph osd set-group noup,noout host-foo
    ceph osd unset-group noup,noout host-foo
    ceph osd set-group noup,noout class-hdd
    ceph osd unset-group noup,noout class-hdd

OLD_CRUSH_TUNABLES
__________________

The CRUSH map is using very old settings and should be updated. The
oldest tunables that can be used (i.e., the oldest client version that
can connect to the cluster) without triggering this health warning are
determined by the ``mon_crush_min_required_version`` config option.
See :ref:`crush-map-tunables` for more information.

OLD_CRUSH_STRAW_CALC_VERSION
____________________________

The CRUSH map is using an older, non-optimal method for calculating
intermediate weight values for ``straw`` buckets.

The CRUSH map should be updated to use the newer method
(``straw_calc_version=1``). See
:ref:`crush-map-tunables` for more information.

CACHE_POOL_NO_HIT_SET
_____________________

One or more cache pools is not configured with a *hit set* to track
utilization, which will prevent the tiering agent from identifying
cold objects to flush and evict from the cache.

Hit sets can be configured on the cache pool with::

    ceph osd pool set <poolname> hit_set_type <type>
    ceph osd pool set <poolname> hit_set_period <period-in-seconds>
    ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
    ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>

OSD_NO_SORTBITWISE
__________________

No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has not
been set.

The ``sortbitwise`` flag must be set before luminous v12.y.z or newer
OSDs can start. You can safely set the flag with::

    ceph osd set sortbitwise

OSD_FILESTORE
_____________

Filestore has been deprecated, given that BlueStore has been the default
object store for quite some time. This warning is raised if any OSDs are
still running Filestore.

The ``mclock_scheduler`` is not supported for Filestore OSDs. Therefore,
the default ``osd_op_queue`` is set to ``wpq`` for Filestore OSDs and is
enforced even if the user attempts to change it.

Filestore OSDs can be listed with::

    ceph report | jq -c '."osd_metadata" | .[] | select(.osd_objectstore | contains("filestore")) | {id, osd_objectstore}'

If it is not feasible to migrate Filestore OSDs to BlueStore immediately,
you can silence this warning temporarily with::

    ceph health mute OSD_FILESTORE

POOL_FULL
_________

One or more pools has reached its quota and is no longer allowing writes.

Pool quotas and utilization can be seen with::

    ceph df detail

You can either raise the pool quota with::

    ceph osd pool set-quota <poolname> max_objects <num-objects>
    ceph osd pool set-quota <poolname> max_bytes <num-bytes>

or delete some existing data to reduce utilization.

BLUEFS_SPILLOVER
________________

One or more OSDs that use the BlueStore backend have been allocated
`db` partitions (storage space for metadata, normally on a faster
device) but that space has filled, such that metadata has "spilled
over" onto the normal slow device. This isn't necessarily an error
condition or even unexpected, but if the administrator's expectation
was that all metadata would fit on the faster device, it indicates
that not enough space was provided.

This warning can be disabled on all OSDs with::

    ceph config set osd bluestore_warn_on_bluefs_spillover false

Alternatively, it can be disabled on a specific OSD with::

    ceph config set osd.123 bluestore_warn_on_bluefs_spillover false

To provide more metadata space, the OSD in question could be destroyed and
reprovisioned. This will involve data migration and recovery.

It may also be possible to expand the LVM logical volume backing the
`db` storage. If the underlying LV has been expanded, the OSD daemon
needs to be stopped and BlueFS informed of the device size change with::

    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID

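If the ``db`` volume is LVM-backed, the full expansion sequence looks like
the following sketch (the VG/LV names and the OSD id are placeholders for
your own layout, and the size increment is only an example):

```shell
ID=123                                         # placeholder OSD id
systemctl stop ceph-osd@$ID                    # stop the OSD first
lvextend -L +20G /dev/ceph-db-vg/osd-$ID-db    # grow the LV backing the db device
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID
systemctl start ceph-osd@$ID
```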
BLUEFS_AVAILABLE_SPACE
______________________

To check how much space is free for BlueFS, do::

    ceph daemon osd.123 bluestore bluefs available

This will output up to 3 values: `BDEV_DB free`, `BDEV_SLOW free` and
`available_from_bluestore`. `BDEV_DB` and `BDEV_SLOW` report the amount of
space that has been acquired by BlueFS and is considered free. The value
`available_from_bluestore` denotes the ability of BlueStore to relinquish
more space to BlueFS. It is normal for this value to differ from the amount
of BlueStore free space, because the BlueFS allocation unit is typically
larger than the BlueStore allocation unit. This means that only part of the
BlueStore free space will be acceptable for BlueFS.

BLUEFS_LOW_SPACE
________________

If BlueFS is running low on available free space and there is little
`available_from_bluestore`, one can consider reducing the BlueFS allocation
unit size. To simulate available space when the allocation unit is
different, do::

    ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>

BLUESTORE_FRAGMENTATION
_______________________

As BlueStore operates, free space on the underlying storage will become
fragmented. This is normal and unavoidable, but excessive fragmentation
will cause slowdown. To inspect BlueStore fragmentation, one can do::

    ceph daemon osd.123 bluestore allocator score block

The score is given in the range [0, 1]:

* [0.0 .. 0.4] tiny fragmentation
* [0.4 .. 0.7] small, acceptable fragmentation
* [0.7 .. 0.9] considerable, but safe fragmentation
* [0.9 .. 1.0] severe fragmentation, may impact BlueFS's ability to get
  space from BlueStore

If a detailed report of free fragments is required, do::

    ceph daemon osd.123 bluestore allocator dump block

When handling an OSD process that is not running, fragmentation can be
inspected with `ceph-bluestore-tool`.
Get the fragmentation score::

    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score

And dump detailed free chunks::

    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump

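When checking many OSDs in a loop, the score bands above can be turned into
a small helper. A minimal sketch using only ``awk`` (the function name is
ours, not part of Ceph):

```shell
# Map an allocator fragmentation score in [0, 1] to the documented bands.
classify_fragmentation() {
    awk -v s="$1" 'BEGIN {
        if (s < 0.4)      print "tiny"
        else if (s < 0.7) print "small, acceptable"
        else if (s < 0.9) print "considerable, but safe"
        else              print "severe"
    }'
}

classify_fragmentation 0.85   # -> considerable, but safe
```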
BLUESTORE_LEGACY_STATFS
_______________________

In the Nautilus release, BlueStore tracks its internal usage
statistics on a per-pool granular basis, and one or more OSDs have
BlueStore volumes that were created prior to Nautilus. If *all* OSDs
are older than Nautilus, this just means that the per-pool metrics are
not available. However, if there is a mix of pre-Nautilus and
post-Nautilus OSDs, the cluster usage statistics reported by ``ceph
df`` will not be accurate.

The old OSDs can be updated to use the new usage tracking scheme by
stopping each OSD, running a repair operation, and then restarting it. For
example, if ``osd.123`` needed to be updated::

    systemctl stop ceph-osd@123
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
    systemctl start ceph-osd@123

This warning can be disabled with::

    ceph config set global bluestore_warn_on_legacy_statfs false

BLUESTORE_NO_PER_POOL_OMAP
__________________________

Starting with the Octopus release, BlueStore tracks omap space utilization
by pool, and one or more OSDs have volumes that were created prior to
Octopus. If not all OSDs are running BlueStore with the new tracking
enabled, the cluster will report an approximate value for per-pool omap
usage based on the most recent deep scrub.

The old OSDs can be updated to track by pool by stopping each OSD,
running a repair operation, and then restarting it. For example, if
``osd.123`` needed to be updated::

    systemctl stop ceph-osd@123
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
    systemctl start ceph-osd@123

This warning can be disabled with::

    ceph config set global bluestore_warn_on_no_per_pool_omap false

BLUESTORE_NO_PER_PG_OMAP
________________________

Starting with the Pacific release, BlueStore tracks omap space utilization
by PG, and one or more OSDs have volumes that were created prior to
Pacific. Per-PG omap enables faster PG removal when PGs migrate.

The older OSDs can be updated to track by PG by stopping each OSD,
running a repair operation, and then restarting it. For example, if
``osd.123`` needed to be updated::

    systemctl stop ceph-osd@123
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
    systemctl start ceph-osd@123

This warning can be disabled with::

    ceph config set global bluestore_warn_on_no_per_pg_omap false

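When several OSDs on a host need the same stop/repair/start cycle, the
steps can be looped. A sketch (the IDs are examples; proceed host by host
and let the cluster return to a healthy state in between):

```shell
# Example OSD ids on this host that still need the repair
for id in 123 124 125; do
    systemctl stop ceph-osd@$id
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$id
    systemctl start ceph-osd@$id
done
```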
BLUESTORE_DISK_SIZE_MISMATCH
____________________________

One or more OSDs using BlueStore has an internal inconsistency between the
size of the physical device and the metadata tracking its size. This can
lead to the OSD crashing in the future.

The OSDs in question should be destroyed and reprovisioned. Care should be
taken to do this one OSD at a time, and in a way that doesn't put any data
at risk. For example, if osd ``$N`` has the error::

    ceph osd out osd.$N
    while ! ceph osd safe-to-destroy osd.$N ; do sleep 1m ; done
    ceph osd destroy osd.$N
    ceph-volume lvm zap /path/to/device
    ceph-volume lvm create --osd-id $N --data /path/to/device

BLUESTORE_NO_COMPRESSION
________________________

One or more OSDs is unable to load a BlueStore compression plugin.
This can be caused by a broken installation, in which the ``ceph-osd``
binary does not match the compression plugins, or a recent upgrade
that did not include a restart of the ``ceph-osd`` daemon.

Verify that the package(s) on the host running the OSD(s) in question
are correctly installed and that the OSD daemon(s) have been
restarted. If the problem persists, check the OSD log for any clues
as to the source of the problem.

BLUESTORE_SPURIOUS_READ_ERRORS
______________________________

One or more OSDs using BlueStore has detected spurious read errors on the
main device. BlueStore has recovered from these errors by retrying the disk
reads. This may nonetheless indicate issues with the underlying hardware or
I/O subsystem, which could in theory cause permanent data corruption. Some
observations on the root cause can be found at
https://tracker.ceph.com/issues/22464.

This alert doesn't require an immediate response, but the corresponding host
might need additional attention, e.g. upgrading to the latest OS/kernel
versions and monitoring hardware resource utilization.

This warning can be disabled on all OSDs with::

    ceph config set osd bluestore_warn_on_spurious_read_errors false

Alternatively, it can be disabled on a specific OSD with::

    ceph config set osd.123 bluestore_warn_on_spurious_read_errors false

Device health
-------------

DEVICE_HEALTH
_____________

One or more devices is expected to fail soon, where the warning
threshold is controlled by the ``mgr/devicehealth/warn_threshold``
config option.

This warning only applies to OSDs that are currently marked "in", so
the expected response to this failure is to mark the device "out" so
that data is migrated off of the device, and then to remove the
hardware from the system. Note that the marking out is normally done
automatically if ``mgr/devicehealth/self_heal`` is enabled based on
the ``mgr/devicehealth/mark_out_threshold``.

Device health can be checked with::

    ceph device info <device-id>

Device life expectancy is set by a prediction model run by
the mgr or by an external tool via the command::

    ceph device set-life-expectancy <device-id> <from> <to>

You can change the stored life expectancy manually, but that usually
doesn't accomplish anything as whatever tool originally set it will
probably set it again, and changing the stored value does not affect
the actual health of the hardware device.

DEVICE_HEALTH_IN_USE
____________________

One or more devices is expected to fail soon and has been marked "out"
of the cluster based on ``mgr/devicehealth/mark_out_threshold``, but it
is still participating in one or more PGs. This may be because it was
only recently marked "out" and data is still migrating, or because data
cannot be migrated off for some reason (e.g., the cluster is nearly
full, or the CRUSH hierarchy is such that there isn't another suitable
OSD to migrate the data to).

This message can be silenced by disabling the self heal behavior
(setting ``mgr/devicehealth/self_heal`` to false), by adjusting the
``mgr/devicehealth/mark_out_threshold``, or by addressing what is
preventing data from being migrated off of the ailing device.

DEVICE_HEALTH_TOOMANY
_____________________

Too many devices are expected to fail soon and the
``mgr/devicehealth/self_heal`` behavior is enabled, such that marking
out all of the ailing devices would exceed the cluster's
``mon_osd_min_in_ratio`` ratio that prevents too many OSDs from being
automatically marked "out".

This generally indicates that too many devices in your cluster are
expected to fail soon and you should take action to add newer
(healthier) devices before too many devices fail and data is lost.

The health message can also be silenced by adjusting parameters like
``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``,
but be warned that this will increase the likelihood of unrecoverable
data loss in the cluster.


Data health (pools & placement groups)
--------------------------------------

PG_AVAILABILITY
_______________

Data availability is reduced, meaning that the cluster is unable to
service potential read or write requests for some data in the cluster.
Specifically, one or more PGs is in a state that does not allow IO
requests to be serviced. Problematic PG states include *peering*,
*stale*, *incomplete*, and the lack of *active* (if those conditions do not clear
quickly).

Detailed information about which PGs are affected is available from::

    ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

    ceph tell <pgid> query

PG_DEGRADED
___________

Data redundancy is reduced for some data, meaning the cluster does not
have the desired number of replicas for all data (for replicated
pools) or erasure code fragments (for erasure coded pools).
Specifically, one or more PGs:

* has the *degraded* or *undersized* flag set, meaning there are not
  enough instances of that placement group in the cluster;
* has not had the *clean* flag set for some time.

Detailed information about which PGs are affected is available from::

    ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

    ceph tell <pgid> query


PG_RECOVERY_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*recovery_toofull* flag set, meaning that the
cluster is unable to migrate or recover data because one or more OSDs
is above the *full* threshold.

See the discussion for *OSD_FULL* above for steps to resolve this condition.

PG_BACKFILL_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*backfill_toofull* flag set, meaning that the
cluster is unable to migrate or recover data because one or more OSDs
is above the *backfillfull* threshold.

See the discussion for *OSD_BACKFILLFULL* above for
steps to resolve this condition.

PG_DAMAGED
__________

Data scrubbing has discovered some problems with data consistency in
the cluster. Specifically, one or more PGs has the *inconsistent* or
*snaptrim_error* flag set, indicating that an earlier scrub operation
found a problem, or has the *repair* flag set, meaning a repair
for such an inconsistency is currently in progress.

See :doc:`pg-repair` for more information.

OSD_SCRUB_ERRORS
________________

Recent OSD scrubs have uncovered inconsistencies. This error is generally
paired with *PG_DAMAGED* (see above).

See :doc:`pg-repair` for more information.


OSD_TOO_MANY_REPAIRS
____________________

When a read error occurs and another replica is available, it is used to repair
the error immediately, so that the client can get the object data. Scrub
handles errors for data at rest. In order to identify possible failing disks
that aren't seeing scrub errors, a count of read repairs is maintained. If
it exceeds the configured threshold ``mon_osd_warn_num_repaired`` (default: 10),
this health warning is generated.


LARGE_OMAP_OBJECTS
__________________

One or more pools contain large omap objects as determined by
``osd_deep_scrub_large_omap_object_key_threshold`` (threshold for number of keys
to determine a large omap object) or
``osd_deep_scrub_large_omap_object_value_sum_threshold`` (the threshold for
summed size (bytes) of all key values to determine a large omap object) or both.
More information on the object name, key count, and size in bytes can be found
by searching the cluster log for 'Large omap object found'. Large omap objects
can be caused by RGW bucket index objects that do not have automatic resharding
enabled. Please see :ref:`RGW Dynamic Bucket Index Resharding
<rgw_dynamic_bucket_index_resharding>` for more information on resharding.

The thresholds can be adjusted with::

    ceph config set osd osd_deep_scrub_large_omap_object_key_threshold <keys>
    ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold <bytes>

CACHE_POOL_NEAR_FULL
____________________

A cache tier pool is nearly full. Full in this context is determined
by the ``target_max_bytes`` and ``target_max_objects`` properties on
the cache pool. Once the pool reaches the target threshold, write
requests to the pool may block while data is flushed and evicted
from the cache, a state that normally leads to very high latencies and
poor performance.

The cache pool target size can be adjusted with::

    ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
    ceph osd pool set <cache-pool-name> target_max_objects <objects>

Normal cache flush and evict activity may also be throttled due to reduced
availability or performance of the base tier, or overall cluster load.

TOO_FEW_PGS
___________

The number of PGs in use in the cluster is below the configurable
threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead
to suboptimal distribution and balance of data across the OSDs in
the cluster, and similarly reduce overall performance.

This may be an expected condition if data pools have not yet been
created.

The PG count for existing pools can be increased or new pools can be created.
Please refer to :ref:`choosing-number-of-placement-groups` for more
information.

POOL_PG_NUM_NOT_POWER_OF_TWO
____________________________

One or more pools has a ``pg_num`` value that is not a power of two.
Although this is not strictly incorrect, it does lead to a less
balanced distribution of data because some PGs have roughly twice as
much data as others.

This is easily corrected by setting the ``pg_num`` value for the
affected pool(s) to a nearby power of two::

    ceph osd pool set <pool-name> pg_num <value>

This health warning can be disabled with::

    ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

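Picking the replacement value amounts to rounding ``pg_num`` to the
nearest power of two. A small illustrative sketch (the helper below is
not a Ceph API):

```python
def nearest_power_of_two(pg_num: int) -> int:
    """Round a pg_num to the nearest power of two (ties round up)."""
    if pg_num < 1:
        raise ValueError("pg_num must be positive")
    lower = 1 << (pg_num.bit_length() - 1)   # largest power of two <= pg_num
    upper = lower << 1                       # smallest power of two > pg_num
    return lower if (pg_num - lower) < (upper - pg_num) else upper

print(nearest_power_of_two(100))   # 128
print(nearest_power_of_two(48))    # 64 (tie rounds up)
print(nearest_power_of_two(40))    # 32
```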
POOL_TOO_FEW_PGS
________________

One or more pools should probably have more PGs, based on the amount
of data that is currently stored in the pool. This can lead to
suboptimal distribution and balance of data across the OSDs in the
cluster, and similarly reduce overall performance. This warning is
generated if the ``pg_autoscale_mode`` property on the pool is set to
``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

    ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs::

    ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

    ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.

TOO_MANY_PGS
____________

The number of PGs in use in the cluster is above the configurable
threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold is
exceeded the cluster will not allow new pools to be created, pool `pg_num` to
be increased, or pool replication to be increased (any of which would lead to
more PGs in the cluster). A large number of PGs can lead
to higher memory utilization for OSD daemons, slower peering after
cluster state changes (like OSD restarts, additions, or removals), and
higher load on the Manager and Monitor daemons.

The simplest way to mitigate the problem is to increase the number of
OSDs in the cluster by adding more hardware. Note that the OSD count
used for the purposes of this health check is the number of "in" OSDs,
so marking "out" OSDs "in" (if there are any) can also help::

    ceph osd in <osd id(s)>

Please refer to :ref:`choosing-number-of-placement-groups` for more
information.

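The quantity being compared against the threshold is roughly PGs per
OSD, counting each PG once per replica or erasure-code shard. A hedged
back-of-the-envelope sketch (not Ceph code; 250 is assumed as the
commonly documented ``mon_max_pg_per_osd`` default):

```python
def pgs_per_osd(pools: list, num_in_osds: int) -> float:
    """pools: list of (pg_num, size) pairs, where size is the replica
    count (or k+m for erasure coding). Each PG places one instance per
    replica, so total PG instances = sum(pg_num * size)."""
    total_pg_instances = sum(pg_num * size for pg_num, size in pools)
    return total_pg_instances / num_in_osds

# Two pools of 512 PGs at size 3 across 12 "in" OSDs:
ratio = pgs_per_osd([(512, 3), (512, 3)], 12)
print(ratio)         # 256.0
print(ratio > 250)   # True -> would trip an assumed 250 PG-per-OSD limit
```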
POOL_TOO_MANY_PGS
_________________

One or more pools should probably have fewer PGs, based on the amount
of data that is currently stored in the pool. This can lead to higher
memory utilization for OSD daemons, slower peering after cluster state
changes (like OSD restarts, additions, or removals), and higher load
on the Manager and Monitor daemons. This warning is generated if the
``pg_autoscale_mode`` property on the pool is set to ``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

    ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs::

    ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

    ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.

POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_bytes`` property set to
estimate the expected size of the pool,
but the value(s) exceed the total available storage (either by
themselves or in combination with other pools' actual usage).

This is usually an indication that the ``target_size_bytes`` value for
the pool is too large and should be reduced or set to zero with::

    ceph osd pool set <pool-name> target_size_bytes 0

For more information, see :ref:`specifying_pool_target_size`.

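The overcommit condition boils down to a simple capacity check. An
illustrative sketch only; the function and its arguments are
assumptions for the example, not Ceph's internals:

```python
def is_overcommitted(target_bytes_by_pool: dict,
                     other_pools_actual_bytes: int,
                     total_capacity_bytes: int) -> bool:
    """True if the sum of target_size_bytes values, plus the actual
    usage of pools without a target, exceeds the total capacity."""
    committed = sum(target_bytes_by_pool.values()) + other_pools_actual_bytes
    return committed > total_capacity_bytes

# 60 TiB of targets plus 50 TiB of actual usage on a 100 TiB cluster:
print(is_overcommitted({"rbd": 40 * 2**40, "cephfs_data": 20 * 2**40},
                       50 * 2**40, 100 * 2**40))   # True
```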
POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO
____________________________________

One or more pools have both ``target_size_bytes`` and
``target_size_ratio`` set to estimate the expected size of the pool.
Only one of these properties should be non-zero. If both are set,
``target_size_ratio`` takes precedence and ``target_size_bytes`` is
ignored.

To reset ``target_size_bytes`` to zero::

    ceph osd pool set <pool-name> target_size_bytes 0

For more information, see :ref:`specifying_pool_target_size`.

TOO_FEW_OSDS
____________

The number of OSDs in the cluster is below the configurable
threshold of ``osd_pool_default_size``.

SMALLER_PGP_NUM
_______________

One or more pools has a ``pgp_num`` value less than ``pg_num``. This
is normally an indication that the PG count was increased without
also increasing the placement count ``pgp_num``.

This is sometimes done deliberately to separate out the `split` step
when the PG count is adjusted from the data migration that is needed
when ``pgp_num`` is changed.

This is normally resolved by setting ``pgp_num`` to match ``pg_num``,
triggering the data migration, with::

    ceph osd pool set <pool> pgp_num <pg-num-value>

MANY_OBJECTS_PER_PG
___________________

One or more pools has an average number of objects per PG that is
significantly higher than the overall cluster average. The specific
threshold is controlled by the ``mon_pg_warn_max_object_skew``
configuration value.

This is usually an indication that the pool(s) containing most of the
data in the cluster have too few PGs, and/or that other pools that do
not contain as much data have too many PGs. See the discussion of
*TOO_MANY_PGS* above.

The threshold can be raised to silence the health warning by adjusting
the ``mon_pg_warn_max_object_skew`` config option on the managers.

The health warning will be silenced for a particular pool if
``pg_autoscale_mode`` is set to ``on``.

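The skew being measured can be sketched as a ratio of the pool's
objects-per-PG to the cluster-wide average. This is an illustrative
calculation, not Ceph's exact implementation, and the threshold of 10
is assumed as a typical ``mon_pg_warn_max_object_skew`` default:

```python
def object_skew(pool_objects: int, pool_pgs: int,
                cluster_objects: int, cluster_pgs: int) -> float:
    """Ratio of a pool's objects-per-PG to the cluster-wide average."""
    pool_avg = pool_objects / pool_pgs
    cluster_avg = cluster_objects / cluster_pgs
    return pool_avg / cluster_avg

# A pool holding 9M of the cluster's 10M objects in only 32 of 512 PGs:
skew = object_skew(9_000_000, 32, 10_000_000, 512)
print(round(skew, 1))   # 14.4
print(skew > 10)        # True -> skew exceeds an assumed threshold of 10
```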
POOL_APP_NOT_ENABLED
____________________

A pool exists that contains one or more objects but has not been
tagged for use by a particular application.

Resolve this warning by labeling the pool for use by an application. For
example, if the pool is used by RBD::

    rbd pool init <poolname>

If the pool is being used by a custom application 'foo', you can also label
it via the low-level command::

    ceph osd pool application enable <poolname> foo

For more information, see :ref:`associate-pool-to-application`.

POOL_FULL
_________

One or more pools has reached (or is very close to reaching) its
quota. The threshold to trigger this error condition is controlled by
the ``mon_pool_quota_crit_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) with::

    ceph osd pool set-quota <pool> max_bytes <bytes>
    ceph osd pool set-quota <pool> max_objects <objects>

Setting the quota value to 0 will disable the quota.

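The trigger condition is a simple fraction of the quota. A hedged
sketch: the helper is illustrative, and the defaults of 0.67 and 0.90
are assumed to mirror ``mon_pool_quota_warn_threshold`` and
``mon_pool_quota_crit_threshold``:

```python
def quota_state(used_bytes: int, max_bytes: int,
                warn_ratio: float = 0.67, crit_ratio: float = 0.90) -> str:
    """Classify pool quota usage against warn/crit ratios."""
    if max_bytes == 0:          # a quota of 0 means "no quota"
        return "ok"
    usage = used_bytes / max_bytes
    if usage >= crit_ratio:
        return "POOL_FULL"
    if usage >= warn_ratio:
        return "POOL_NEAR_FULL"
    return "ok"

print(quota_state(95, 100))   # POOL_FULL
print(quota_state(70, 100))   # POOL_NEAR_FULL
print(quota_state(10, 0))     # ok (no quota set)
```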

POOL_NEAR_FULL
______________

One or more pools is approaching a configured fullness threshold.

One threshold that can trigger this warning condition is the
``mon_pool_quota_warn_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) with::

    ceph osd pool set-quota <pool> max_bytes <bytes>
    ceph osd pool set-quota <pool> max_objects <objects>

Setting the quota value to 0 will disable the quota.

Other thresholds that can trigger the above two warning conditions are
``mon_osd_nearfull_ratio`` and ``mon_osd_full_ratio``. Visit the
:ref:`storage-capacity` and :ref:`no-free-drive-space` documents for details
and resolution.

OBJECT_MISPLACED
________________

One or more objects in the cluster is not stored on the node the
cluster would like it to be stored on. This is an indication that
data migration due to some recent cluster change has not yet completed.

Misplaced data is not a dangerous condition in and of itself; data
consistency is never at risk, and old copies of objects are never
removed until the desired number of new copies (in the desired
locations) are present.

OBJECT_UNFOUND
______________

One or more objects in the cluster cannot be found. Specifically, the
OSDs know that a new or updated copy of an object should exist, but a
copy of that version of the object has not been found on OSDs that are
currently online.

Read or write requests to unfound objects will block.

Ideally, a down OSD that has the more recent copy of the unfound
object can be brought back online. Candidate OSDs can be identified from the
peering state for the PG(s) responsible for the unfound object::

    ceph tell <pgid> query

If the latest copy of the object is not available, the cluster can be
told to roll back to a previous version of the object. See
:ref:`failures-osd-unfound` for more information.

SLOW_OPS
________

One or more OSD or monitor requests is taking a long time to process. This can
be an indication of extreme load, a slow storage device, or a software
bug.

The request queue for the daemon in question can be queried with the
following command, executed from the daemon's host::

    ceph daemon osd.<id> ops

A summary of the slowest recent requests can be seen with::

    ceph daemon osd.<id> dump_historic_ops

The location of an OSD can be found with::

    ceph osd find osd.<id>

1160 | PG_NOT_SCRUBBED |
1161 | _______________ | |
1162 | ||
f67539c2 TL |
1163 | One or more PGs has not been scrubbed recently. PGs are normally scrubbed |
1164 | within every configured interval specified by | |
20effc67 TL |
1165 | :confval:`osd_scrub_max_interval` globally. This |
1166 | interval can be overridden on per-pool basis with | |
1167 | :confval:`scrub_max_interval`. The warning triggers when | |
f67539c2 TL |
1168 | ``mon_warn_pg_not_scrubbed_ratio`` percentage of interval has elapsed without a |
1169 | scrub since it was due. | |
c07f9fc5 FG |
1170 | |
1171 | PGs will not scrub if they are not flagged as *clean*, which may | |
1172 | happen if they are misplaced or degraded (see *PG_AVAILABILITY* and | |
1173 | *PG_DEGRADED* above). | |
1174 | ||
1175 | You can manually initiate a scrub of a clean PG with:: | |
1176 | ||
1177 | ceph pg scrub <pgid> | |
1178 | ||
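The point at which the warning fires can be sketched as a deadline
calculation. Illustrative only; 0.5 is assumed as the usual
``mon_warn_pg_not_scrubbed_ratio`` default:

```python
def scrub_warn_deadline(last_scrub: float, max_interval: float,
                        warn_ratio: float = 0.5) -> float:
    """Time (seconds since epoch) after which a PG_NOT_SCRUBBED-style
    warning would fire: the scrub is due at last_scrub + max_interval,
    and the warning triggers warn_ratio * max_interval after that."""
    return last_scrub + max_interval * (1 + warn_ratio)

# With a one-week max interval and ratio 0.5, the warning fires
# 10.5 days after the last scrub:
week = 7 * 24 * 3600
print(scrub_warn_deadline(0.0, week) / 86400)   # 10.5
```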
PG_NOT_DEEP_SCRUBBED
____________________

One or more PGs has not been deep scrubbed recently. PGs are normally
scrubbed every :confval:`osd_deep_scrub_interval` seconds, and this warning
triggers when ``mon_warn_pg_not_deep_scrubbed_ratio`` percentage of the interval
has elapsed after the scrub was due without a scrub having occurred.

PGs will not (deep) scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a scrub of a clean PG with::

    ceph pg deep-scrub <pgid>

PG_SLOW_SNAP_TRIMMING
_____________________

The snapshot trim queue for one or more PGs has exceeded the
configured warning threshold. This indicates that either an extremely
large number of snapshots were recently deleted, or that the OSDs are
unable to trim snapshots quickly enough to keep up with the rate of
new snapshot deletions.

The warning threshold is controlled by the
``mon_osd_snap_trim_queue_warn_on`` option (default: 32768).

This warning may trigger if OSDs are under excessive load and unable
to keep up with their background work, or if the OSDs' internal
metadata database is heavily fragmented and unable to perform. It may
also indicate some other performance issue with the OSDs.

The exact size of the snapshot trim queue is reported by the
``snaptrimq_len`` field of ``ceph pg ls -f json-detail``.

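To spot which PGs are over the threshold, the JSON output can be
post-processed. A hedged sketch: it relies only on the ``pgid`` and
``snaptrimq_len`` fields named above, and runs here on an inline
sample rather than live ``ceph`` output:

```python
import json

def pgs_over_snaptrim_threshold(pg_stats, threshold: int = 32768):
    """Return (pgid, snaptrimq_len) pairs for PGs whose snap trim queue
    exceeds the threshold (default mirrors mon_osd_snap_trim_queue_warn_on)."""
    return [(pg["pgid"], pg["snaptrimq_len"])
            for pg in pg_stats
            if pg.get("snaptrimq_len", 0) > threshold]

# Inline sample standing in for parsed `ceph pg ls -f json-detail` output:
sample = json.loads('[{"pgid": "1.0", "snaptrimq_len": 40000},'
                    ' {"pgid": "1.1", "snaptrimq_len": 12}]')
print(pgs_over_snaptrim_threshold(sample))   # [('1.0', 40000)]
```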

Miscellaneous
-------------

RECENT_CRASH
____________

One or more Ceph daemons has crashed recently, and the crash has not
yet been archived (acknowledged) by the administrator. This may
indicate a software bug, a hardware problem (e.g., a failing disk), or
some other problem.

New crashes can be listed with::

    ceph crash ls-new

Information about a specific crash can be examined with::

    ceph crash info <crash-id>

This warning can be silenced by "archiving" the crash (perhaps after
being examined by an administrator) so that it does not generate this
warning::

    ceph crash archive <crash-id>

Similarly, all new crashes can be archived with::

    ceph crash archive-all

Archived crashes will still be visible via ``ceph crash ls`` but not
``ceph crash ls-new``.

The time period for what "recent" means is controlled by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).

These warnings can be disabled entirely with::

    ceph config set mgr mgr/crash/warn_recent_interval 0

RECENT_MGR_MODULE_CRASH
_______________________

One or more ceph-mgr modules has crashed recently, and the crash has
not yet been archived (acknowledged) by the administrator. This
generally indicates a software bug in one of the software modules run
inside the ceph-mgr daemon. Although the module that experienced the
problem may be disabled as a result, the function of other modules
is normally unaffected.

As with the *RECENT_CRASH* health alert, the crash can be inspected with::

    ceph crash info <crash-id>

This warning can be silenced by "archiving" the crash (perhaps after
being examined by an administrator) so that it does not generate this
warning::

    ceph crash archive <crash-id>

Similarly, all new crashes can be archived with::

    ceph crash archive-all

Archived crashes will still be visible via ``ceph crash ls`` but not
``ceph crash ls-new``.

The time period for what "recent" means is controlled by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).

These warnings can be disabled entirely with::

    ceph config set mgr mgr/crash/warn_recent_interval 0

TELEMETRY_CHANGED
_________________

Telemetry has been enabled, but the contents of the telemetry report
have changed since that time, so telemetry reports will not be sent.

The Ceph developers periodically revise the telemetry feature to
include new and useful information, or to remove information found to
be useless or sensitive. If any new information is included in the
report, Ceph will require the administrator to re-enable telemetry to
ensure they have an opportunity to (re)review what information will be
shared.

To review the contents of the telemetry report::

    ceph telemetry show

Note that the telemetry report consists of several optional channels
that may be independently enabled or disabled. For more information, see
:ref:`telemetry`.

To re-enable telemetry (and make this warning go away)::

    ceph telemetry on

To disable telemetry (and make this warning go away)::

    ceph telemetry off

AUTH_BAD_CAPS
_____________

One or more auth users has capabilities that cannot be parsed by the
monitor. This generally indicates that the user will not be
authorized to perform any action with one or more daemon types.

This error is most likely to occur after an upgrade if the
capabilities were set with an older version of Ceph that did not
properly validate their syntax, or if the syntax of the capabilities
has changed.

The user in question can be removed with::

    ceph auth rm <entity-name>

(This will resolve the health alert, but obviously clients will not be
able to authenticate as that user.)

Alternatively, the capabilities for the user can be updated with::

    ceph auth caps <entity-name> <daemon-type> <caps> [<daemon-type> <caps> ...]

For more information about auth capabilities, see :ref:`user-management`.

OSD_NO_DOWN_OUT_INTERVAL
________________________

The ``mon_osd_down_out_interval`` option is set to zero, which means
that the system will not automatically perform any repair or healing
operations after an OSD fails. Instead, an administrator (or some
other external entity) will need to manually mark down OSDs as 'out'
(i.e., via ``ceph osd out <osd-id>``) in order to trigger recovery.

This option is normally set to five or ten minutes, which is enough time for a
host to power-cycle or reboot.

This warning can be silenced by setting
``mon_warn_on_osd_down_out_interval_zero`` to false::

    ceph config set global mon_warn_on_osd_down_out_interval_zero false

DASHBOARD_DEBUG
_______________

The Dashboard debug mode is enabled. This means that if an error occurs
while processing a REST API request, the HTTP error response contains
a Python traceback. This behaviour should be disabled in production
environments because such a traceback might contain and expose sensitive
information.

The debug mode can be disabled with::

    ceph dashboard debug disable