=============
Health checks
=============

Overview
========

There is a finite set of possible health messages that a Ceph cluster can
raise -- these are defined as *health checks* which have unique identifiers.

The identifier is a terse pseudo-human-readable (i.e. like a variable name)
string. It is intended to enable tools (such as UIs) to make sense of
health checks, and present them in a way that reflects their meaning.

This page lists the health checks that are raised by the monitor and manager
daemons. In addition to these, you may also see health checks that originate
from MDS daemons (see :ref:`cephfs-health-messages`), and health checks
that are defined by ceph-mgr python modules.

Definitions
===========

Monitor
-------

MON_DOWN
________

One or more monitor daemons is currently down. The cluster requires a
majority (more than 1/2) of the monitors in order to function. When
one or more monitors are down, clients may have a harder time forming
their initial connection to the cluster as they may need to try more
addresses before they reach an operating monitor.

The down monitor daemon should generally be restarted as soon as
possible to reduce the risk of a subsequent monitor failure leading to
a service outage.
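
As a minimal sketch (assuming a systemd-managed deployment; the exact
unit name depends on how the cluster was deployed), the down monitor
can be identified and restarted with::

  ceph health detail                      # identify which monitor is down
  systemctl restart ceph-mon@<hostname>   # run on the affected host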

MON_CLOCK_SKEW
______________

The clocks on the hosts running the ceph-mon monitor daemons are not
sufficiently well synchronized. This health alert is raised if the
cluster detects a clock skew greater than ``mon_clock_drift_allowed``.

This is best resolved by synchronizing the clocks using a tool like
``ntpd`` or ``chrony``.

If it is impractical to keep the clocks closely synchronized, the
``mon_clock_drift_allowed`` threshold can also be increased, but this
value must stay significantly below the ``mon_lease`` interval in
order for the monitor cluster to function properly.
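
The skew as seen by the monitors can be inspected with::

  ceph time-sync-status

and, as a hedged example, the threshold could be raised (the default is
0.05 seconds; the value shown here is only illustrative) with::

  ceph config set mon mon_clock_drift_allowed 0.1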

MON_MSGR2_NOT_ENABLED
_____________________

The ``ms_bind_msgr2`` option is enabled but one or more monitors is
not configured to bind to a v2 port in the cluster's monmap. This
means that features specific to the msgr2 protocol (e.g., encryption)
are not available on some or all connections.

In most cases this can be corrected by issuing the command::

  ceph mon enable-msgr2

That command will change any monitor configured for the old default
port 6789 to continue to listen for v1 connections on 6789 and also
listen for v2 connections on the new default 3300 port.

If a monitor is configured to listen for v1 connections on a
non-standard port (not 6789), then the monmap will need to be modified
manually.


Manager
-------

MGR_MODULE_DEPENDENCY
_____________________

An enabled manager module is failing its dependency check. This health check
should come with an explanatory message from the module about the problem.

For example, a module might report that a required package is not installed:
install the required package and restart your manager daemons.

This health check is only applied to enabled modules. If a module is
not enabled, you can see whether it is reporting dependency issues in
the output of `ceph mgr module ls`.


MGR_MODULE_ERROR
________________

A manager module has experienced an unexpected error. Typically,
this means an unhandled exception was raised from the module's `serve`
function. The human-readable description of the error may be obscurely
worded if the exception did not provide a useful description of itself.

This health check may indicate a bug: please open a Ceph bug report if you
think you have encountered a bug.

If you believe the error is transient, you may restart your manager
daemon(s), or use `ceph mgr fail` on the active daemon to prompt
a failover to another daemon.
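
For example (a sketch; the daemon name is a placeholder), the active
manager can be failed over with::

  ceph mgr fail <mgr-name>   # e.g. the active mgr shown by "ceph status"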


OSDs
----

OSD_DOWN
________

One or more OSDs are marked down. The ceph-osd daemon may have been
stopped, or peer OSDs may be unable to reach the OSD over the network.
Common causes include a stopped or crashed daemon, a down host, or a
network outage.

Verify the host is healthy, the daemon is started, and the network is
functioning. If the daemon has crashed, the daemon log file
(``/var/log/ceph/ceph-osd.*``) may contain debugging information.
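
As a minimal sketch (assuming a systemd-managed deployment), the down
OSDs can be identified and restarted on their host with::

  ceph health detail                # lists which OSDs are down
  systemctl restart ceph-osd@<id>   # run on the host of the down OSD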

OSD_<crush type>_DOWN
_____________________

(e.g. OSD_HOST_DOWN, OSD_ROOT_DOWN)

All the OSDs within a particular CRUSH subtree are marked down, for example
all OSDs on a host.

OSD_ORPHAN
__________

An OSD is referenced in the CRUSH map hierarchy but does not exist.

The OSD can be removed from the CRUSH hierarchy with::

  ceph osd crush rm osd.<id>

OSD_OUT_OF_ORDER_FULL
_____________________

The utilization thresholds for `backfillfull`, `nearfull`, `full`,
and/or `failsafe_full` are not ascending. In particular, we expect
`backfillfull < nearfull`, `nearfull < full`, and `full <
failsafe_full`.

The thresholds can be adjusted with::

  ceph osd set-backfillfull-ratio <ratio>
  ceph osd set-nearfull-ratio <ratio>
  ceph osd set-full-ratio <ratio>


OSD_FULL
________

One or more OSDs has exceeded the `full` threshold and is preventing
the cluster from servicing writes.

Utilization by pool can be checked with::

  ceph df

The currently defined `full` ratio can be seen with::

  ceph osd dump | grep full_ratio

A short-term workaround to restore write availability is to raise the full
threshold by a small amount::

  ceph osd set-full-ratio <ratio>
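
For example, if the current ratio is the default 0.95, it might be
raised slightly (an illustrative value; lower it back once space has
been freed)::

  ceph osd set-full-ratio 0.97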

New storage should be added to the cluster by deploying more OSDs, or
existing data should be deleted in order to free up space.

OSD_BACKFILLFULL
________________

One or more OSDs has exceeded the `backfillfull` threshold, which will
prevent data from being rebalanced to this device. This is
an early warning that rebalancing may not be able to complete and that
the cluster is approaching full.

Utilization by pool can be checked with::

  ceph df

OSD_NEARFULL
____________

One or more OSDs has exceeded the `nearfull` threshold. This is an early
warning that the cluster is approaching full.

Utilization by pool can be checked with::

  ceph df

OSDMAP_FLAGS
____________

One or more cluster flags of interest has been set. These flags include:

* *full* - the cluster is flagged as full and cannot serve writes
* *pauserd*, *pausewr* - paused reads or writes
* *noup* - OSDs are not allowed to start
* *nodown* - OSD failure reports are being ignored, such that the
  monitors will not mark OSDs `down`
* *noin* - OSDs that were previously marked `out` will not be marked
  back `in` when they start
* *noout* - down OSDs will not automatically be marked out after the
  configured interval
* *nobackfill*, *norecover*, *norebalance* - recovery or data
  rebalancing is suspended
* *noscrub*, *nodeep-scrub* - scrubbing is disabled
* *notieragent* - cache tiering activity is suspended

With the exception of *full*, these flags can be set or cleared with::

  ceph osd set <flag>
  ceph osd unset <flag>
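
For example, to prevent down OSDs from being marked out during a
planned maintenance window, and to restore the normal behavior
afterwards::

  ceph osd set noout
  ceph osd unset noout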

OSD_FLAGS
_________

One or more OSDs or CRUSH {nodes,device classes} has a flag of interest set.
These flags include:

* *noup*: these OSDs are not allowed to start
* *nodown*: failure reports for these OSDs will be ignored
* *noin*: if these OSDs were previously marked `out` automatically
  after a failure, they will not be marked in when they start
* *noout*: if these OSDs are down they will not automatically be marked
  `out` after the configured interval

These flags can be set and cleared in batch with::

  ceph osd set-group <flags> <who>
  ceph osd unset-group <flags> <who>

For example::

  ceph osd set-group noup,noout osd.0 osd.1
  ceph osd unset-group noup,noout osd.0 osd.1
  ceph osd set-group noup,noout host-foo
  ceph osd unset-group noup,noout host-foo
  ceph osd set-group noup,noout class-hdd
  ceph osd unset-group noup,noout class-hdd

OLD_CRUSH_TUNABLES
__________________

The CRUSH map is using very old settings and should be updated. The
oldest tunables that can be used (i.e., the oldest client version that
can connect to the cluster) without triggering this health warning are
determined by the ``mon_crush_min_required_version`` config option.
See :ref:`crush-map-tunables` for more information.

OLD_CRUSH_STRAW_CALC_VERSION
____________________________

The CRUSH map is using an older, non-optimal method for calculating
intermediate weight values for ``straw`` buckets.

The CRUSH map should be updated to use the newer method
(``straw_calc_version=1``). See
:ref:`crush-map-tunables` for more information.

CACHE_POOL_NO_HIT_SET
_____________________

One or more cache pools is not configured with a *hit set* to track
utilization, which will prevent the tiering agent from identifying
cold objects to flush and evict from the cache.

Hit sets can be configured on the cache pool with::

  ceph osd pool set <poolname> hit_set_type <type>
  ceph osd pool set <poolname> hit_set_period <period-in-seconds>
  ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
  ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>
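
For example (a sketch using the ``bloom`` hit set type; the values are
illustrative, not tuned recommendations)::

  ceph osd pool set <poolname> hit_set_type bloom
  ceph osd pool set <poolname> hit_set_period 3600
  ceph osd pool set <poolname> hit_set_count 12
  ceph osd pool set <poolname> hit_set_fpp 0.05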

OSD_NO_SORTBITWISE
__________________

No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has not
been set.

The ``sortbitwise`` flag must be set before luminous v12.y.z or newer
OSDs can start. You can safely set the flag with::

  ceph osd set sortbitwise

POOL_FULL
_________

One or more pools has reached its quota and is no longer allowing writes.

Pool quotas and utilization can be seen with::

  ceph df detail

You can either raise the pool quota with::

  ceph osd pool set-quota <poolname> max_objects <num-objects>
  ceph osd pool set-quota <poolname> max_bytes <num-bytes>

or delete some existing data to reduce utilization.

BLUEFS_SPILLOVER
________________

One or more OSDs that use the BlueStore backend have been allocated
`db` partitions (storage space for metadata, normally on a faster
device) but that space has filled, such that metadata has "spilled
over" onto the normal slow device. This isn't necessarily an error
condition or even unexpected, but if the administrator's expectation
was that all metadata would fit on the faster device, it indicates
that not enough space was provided.

This warning can be disabled on all OSDs with::

  ceph config set osd bluestore_warn_on_bluefs_spillover false

Alternatively, it can be disabled on a specific OSD with::

  ceph config set osd.123 bluestore_warn_on_bluefs_spillover false

To provide more metadata space, the OSD in question could be destroyed and
reprovisioned. This will involve data migration and recovery.

It may also be possible to expand the LVM logical volume backing the
`db` storage. If the underlying LV has been expanded, the OSD daemon
needs to be stopped and BlueFS informed of the device size change with::

  ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID
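
As an end-to-end sketch (assuming a systemd-managed deployment; the
volume group and logical volume names here are hypothetical)::

  systemctl stop ceph-osd@$ID                # stop the OSD first
  lvextend -L +64G /dev/ceph-db-vg/db-lv     # grow the underlying db LV
  ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID
  systemctl start ceph-osd@$ID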

BLUEFS_AVAILABLE_SPACE
______________________

To check how much space is free for BlueFS, run::

  ceph daemon osd.123 bluestore bluefs available

This will output up to 3 values: `BDEV_DB free`, `BDEV_SLOW free` and
`available_from_bluestore`. `BDEV_DB free` and `BDEV_SLOW free` report the
amount of space that has been acquired by BlueFS and is considered free.
The `available_from_bluestore` value denotes the ability of BlueStore to
relinquish more space to BlueFS. It is normal for this value to differ
from the amount of BlueStore free space, as the BlueFS allocation unit is
typically larger than the BlueStore allocation unit. This means that only
part of the BlueStore free space will be acceptable for BlueFS.

BLUEFS_LOW_SPACE
________________

If BlueFS is running low on available free space and there is little
`available_from_bluestore`, one can consider reducing the BlueFS
allocation unit size. To simulate the available space with a different
allocation unit, run::

  ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>

BLUESTORE_FRAGMENTATION
_______________________

As BlueStore operates, free space on the underlying storage will become
fragmented. This is normal and unavoidable, but excessive fragmentation
will cause slowdown. To inspect BlueStore fragmentation, run::

  ceph daemon osd.123 bluestore allocator score block

The score is given in the range [0, 1]:

* [0.0 .. 0.4] tiny fragmentation
* [0.4 .. 0.7] small, acceptable fragmentation
* [0.7 .. 0.9] considerable, but safe fragmentation
* [0.9 .. 1.0] severe fragmentation, may impact BlueFS's ability to get
  space from BlueStore

If a detailed report of free fragments is required, run::

  ceph daemon osd.123 bluestore allocator dump block

If the OSD process is not running, fragmentation can also be inspected
with `ceph-bluestore-tool`. To get the fragmentation score::

  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score

And to dump detailed free chunks::

  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump

BLUESTORE_LEGACY_STATFS
_______________________

In the Nautilus release, BlueStore tracks its internal usage
statistics at per-pool granularity, and one or more OSDs have
BlueStore volumes that were created prior to Nautilus. If *all* OSDs
are older than Nautilus, this just means that the per-pool metrics are
not available. However, if there is a mix of pre-Nautilus and
post-Nautilus OSDs, the cluster usage statistics reported by ``ceph
df`` will not be accurate.

The old OSDs can be updated to use the new usage tracking scheme by
stopping each OSD, running a repair operation, and then restarting it.
For example, if ``osd.123`` needs to be updated::

  systemctl stop ceph-osd@123
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
  systemctl start ceph-osd@123

This warning can be disabled with::

  ceph config set global bluestore_warn_on_legacy_statfs false


BLUESTORE_DISK_SIZE_MISMATCH
____________________________

One or more OSDs using BlueStore has an internal inconsistency between the size
of the physical device and the metadata tracking its size. This can lead to
the OSD crashing in the future.

The OSDs in question should be destroyed and reprovisioned. Care should be
taken to do this one OSD at a time, and in a way that doesn't put any data at
risk. For example, if osd ``$N`` has the error::

  ceph osd out osd.$N
  while ! ceph osd safe-to-destroy osd.$N ; do sleep 1m ; done
  ceph osd destroy osd.$N
  ceph-volume lvm zap /path/to/device
  ceph-volume lvm create --osd-id $N --data /path/to/device


Device health
-------------

DEVICE_HEALTH
_____________

One or more devices is expected to fail soon, where the warning
threshold is controlled by the ``mgr/devicehealth/warn_threshold``
config option.

This warning only applies to OSDs that are currently marked "in", so
the expected response to this failure is to mark the device "out" so
that data is migrated off of the device, and then to remove the
hardware from the system. Note that the marking out is normally done
automatically if ``mgr/devicehealth/self_heal`` is enabled based on
the ``mgr/devicehealth/mark_out_threshold``.
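
As a minimal sketch, the devices in use and the daemons consuming them
can be listed, and the affected OSD marked "out", with::

  ceph device ls          # lists devices and the daemons using them
  ceph osd out osd.<id>   # migrate data off the ailing device's OSD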

Device health can be checked with::

  ceph device info <device-id>

Device life expectancy is set by a prediction model run by
the mgr or by an external tool via the command::

  ceph device set-life-expectancy <device-id> <from> <to>

You can change the stored life expectancy manually, but that usually
doesn't accomplish anything as whatever tool originally set it will
probably set it again, and changing the stored value does not affect
the actual health of the hardware device.

DEVICE_HEALTH_IN_USE
____________________

One or more devices is expected to fail soon and has been marked "out"
of the cluster based on ``mgr/devicehealth/mark_out_threshold``, but it
is still participating in one or more PGs. This may be because it was
only recently marked "out" and data is still migrating, or because data
cannot be migrated off for some reason (e.g., the cluster is nearly
full, or the CRUSH hierarchy is such that there isn't another suitable
OSD to migrate the data to).

This message can be silenced by disabling the self heal behavior
(setting ``mgr/devicehealth/self_heal`` to false), by adjusting the
``mgr/devicehealth/mark_out_threshold``, or by addressing what is
preventing data from being migrated off of the ailing device.

DEVICE_HEALTH_TOOMANY
_____________________

Too many devices are expected to fail soon and the
``mgr/devicehealth/self_heal`` behavior is enabled, such that marking
out all of the ailing devices would exceed the cluster's
``mon_osd_min_in_ratio`` ratio that prevents too many OSDs from being
automatically marked "out".

This generally indicates that too many devices in your cluster are
expected to fail soon and you should take action to add newer
(healthier) devices before too many devices fail and data is lost.

The health message can also be silenced by adjusting parameters like
``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``,
but be warned that this will increase the likelihood of unrecoverable
data loss in the cluster.


Data health (pools & placement groups)
--------------------------------------

PG_AVAILABILITY
_______________

Data availability is reduced, meaning that the cluster is unable to
service potential read or write requests for some data in the cluster.
Specifically, one or more PGs is in a state that does not allow IO
requests to be serviced. Problematic PG states include *peering*,
*stale*, *incomplete*, and the lack of *active* (if those conditions do
not clear quickly).

Detailed information about which PGs are affected is available from::

  ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

  ceph tell <pgid> query

PG_DEGRADED
___________

Data redundancy is reduced for some data, meaning the cluster does not
have the desired number of replicas for all data (for replicated
pools) or erasure code fragments (for erasure coded pools).
Specifically, one or more PGs:

* has the *degraded* or *undersized* flag set, meaning there are not
  enough instances of that placement group in the cluster;
* has not had the *clean* flag set for some time.

Detailed information about which PGs are affected is available from::

  ceph health detail

In most cases the root cause is that one or more OSDs is currently
down; see the discussion for ``OSD_DOWN`` above.

The state of specific problematic PGs can be queried with::

  ceph tell <pgid> query


PG_RECOVERY_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*recovery_toofull* flag set, meaning that the cluster is unable to
migrate or recover data because one or more OSDs is above the *full*
threshold.

See the discussion for *OSD_FULL* above for steps to resolve this condition.

PG_BACKFILL_FULL
________________

Data redundancy may be reduced or at risk for some data due to a lack
of free space in the cluster. Specifically, one or more PGs has the
*backfill_toofull* flag set, meaning that the cluster is unable to
migrate or recover data because one or more OSDs is above the
*backfillfull* threshold.

See the discussion for *OSD_BACKFILLFULL* above for
steps to resolve this condition.

PG_DAMAGED
__________

Data scrubbing has discovered some problems with data consistency in
the cluster. Specifically, one or more PGs has the *inconsistent* or
*snaptrim_error* flag set, indicating that an earlier scrub operation
found a problem, or has the *repair* flag set, meaning a repair
for such an inconsistency is currently in progress.

See :doc:`pg-repair` for more information.

OSD_SCRUB_ERRORS
________________

Recent OSD scrubs have uncovered inconsistencies. This error is generally
paired with *PG_DAMAGED* (see above).

See :doc:`pg-repair` for more information.

LARGE_OMAP_OBJECTS
__________________

One or more pools contain large omap objects as determined by
``osd_deep_scrub_large_omap_object_key_threshold`` (threshold for the number
of keys to determine a large omap object) or
``osd_deep_scrub_large_omap_object_value_sum_threshold`` (the threshold for
the summed size (bytes) of all key values to determine a large omap object)
or both. More information on the object name, key count, and size in bytes
can be found by searching the cluster log for 'Large omap object found'.
Large omap objects can be caused by RGW bucket index objects that do not
have automatic resharding enabled. Please see :ref:`RGW Dynamic Bucket Index
Resharding <rgw_dynamic_bucket_index_resharding>` for more information on
resharding.
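
As a hedged example (assuming the default cluster log location on a
monitor host), the relevant log entries can be found with::

  grep 'Large omap object found' /var/log/ceph/ceph.log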

The thresholds can be adjusted with::

  ceph config set osd osd_deep_scrub_large_omap_object_key_threshold <keys>
  ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold <bytes>

CACHE_POOL_NEAR_FULL
____________________

A cache tier pool is nearly full. Full in this context is determined
by the ``target_max_bytes`` and ``target_max_objects`` properties on
the cache pool. Once the pool reaches the target threshold, write
requests to the pool may block while data is flushed and evicted
from the cache, a state that normally leads to very high latencies and
poor performance.

The cache pool target size can be adjusted with::

  ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
  ceph osd pool set <cache-pool-name> target_max_objects <objects>

Normal cache flush and evict activity may also be throttled due to reduced
availability or performance of the base tier, or overall cluster load.

TOO_FEW_PGS
___________

The number of PGs in use in the cluster is below the configurable
threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead
to suboptimal distribution and balance of data across the OSDs in
the cluster, and similarly reduce overall performance.

This may be an expected condition if data pools have not yet been
created.

The PG count for existing pools can be increased or new pools can be created.
Please refer to :ref:`choosing-number-of-placement-groups` for more
information.

POOL_PG_NUM_NOT_POWER_OF_TWO
____________________________

One or more pools has a ``pg_num`` value that is not a power of two.
Although this is not strictly incorrect, it does lead to a less
balanced distribution of data because some PGs have roughly twice as
much data as others.

This is easily corrected by setting the ``pg_num`` value for the
affected pool(s) to a nearby power of two::

  ceph osd pool set <pool-name> pg_num <value>

This health warning can be disabled with::

  ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

POOL_TOO_FEW_PGS
________________

One or more pools should probably have more PGs, based on the amount
of data that is currently stored in the pool. This can lead to
suboptimal distribution and balance of data across the OSDs in the
cluster, and similarly reduce overall performance. This warning is
generated if the ``pg_autoscale_mode`` property on the pool is set to
``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

  ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs::

  ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

  ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.

TOO_MANY_PGS
____________

The number of PGs in use in the cluster is above the configurable
threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold is
exceeded, the cluster will not allow new pools to be created, pool
`pg_num` to be increased, or pool replication to be increased (any of
which would lead to more PGs in the cluster). A large number of PGs can
lead to higher memory utilization for OSD daemons, slower peering after
cluster state changes (like OSD restarts, additions, or removals), and
higher load on the Manager and Monitor daemons.

The simplest way to mitigate the problem is to increase the number of
OSDs in the cluster by adding more hardware. Note that the OSD count
used for the purposes of this health check is the number of "in" OSDs,
so marking "out" OSDs "in" (if there are any) can also help::

  ceph osd in <osd id(s)>

Please refer to :ref:`choosing-number-of-placement-groups` for more
information.

POOL_TOO_MANY_PGS
_________________

One or more pools should probably have fewer PGs, based on the amount
of data that is currently stored in the pool. This can lead to higher
memory utilization for OSD daemons, slower peering after cluster state
changes (like OSD restarts, additions, or removals), and higher load
on the Manager and Monitor daemons. This warning is generated if the
``pg_autoscale_mode`` property on the pool is set to ``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

  ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs::

  ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

  ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.

POOL_TARGET_SIZE_RATIO_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_ratio`` property set to
estimate the expected size of the pool as a fraction of total storage,
but the value(s) exceed the total available storage (either by
themselves or in combination with other pools' actual usage).

This is usually an indication that the ``target_size_ratio`` value for
the pool is too large and should be reduced or set to zero with::

  ceph osd pool set <pool-name> target_size_ratio 0

For more information, see :ref:`specifying_pool_target_size`.

POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_bytes`` property set to
estimate the expected size of the pool,
but the value(s) exceed the total available storage (either by
themselves or in combination with other pools' actual usage).

This is usually an indication that the ``target_size_bytes`` value for
the pool is too large and should be reduced or set to zero with::

  ceph osd pool set <pool-name> target_size_bytes 0

For more information, see :ref:`specifying_pool_target_size`.

TOO_FEW_OSDS
____________

The number of OSDs in the cluster is below the configurable
threshold of ``osd_pool_default_size``.
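
As a minimal check, the current OSD count can be seen with::

  ceph osd stat

and compared against ``osd_pool_default_size`` (3 by default).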

SMALLER_PGP_NUM
_______________

One or more pools has a ``pgp_num`` value less than ``pg_num``. This
is normally an indication that the PG count was increased without
also increasing the placement behavior.

This is sometimes done deliberately to separate out the `split` step
when the PG count is adjusted from the data migration that is needed
when ``pgp_num`` is changed.

This is normally resolved by setting ``pgp_num`` to match ``pg_num``,
triggering the data migration, with::

  ceph osd pool set <pool> pgp_num <pg-num-value>

MANY_OBJECTS_PER_PG
___________________

One or more pools has an average number of objects per PG that is
significantly higher than the overall cluster average. The specific
threshold is controlled by the ``mon_pg_warn_max_object_skew``
configuration value.

This is usually an indication that the pool(s) containing most of the
data in the cluster have too few PGs, and/or that other pools that do
not contain as much data have too many PGs. See the discussion of
*TOO_MANY_PGS* above.

The threshold can be raised to silence the health warning by adjusting
the ``mon_pg_warn_max_object_skew`` config option on the monitors.
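
For example (an illustrative value; the default is 10)::

  ceph config set mon mon_pg_warn_max_object_skew 20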


POOL_APP_NOT_ENABLED
____________________

A pool exists that contains one or more objects but has not been
tagged for use by a particular application.

Resolve this warning by labeling the pool for use by an application. For
example, if the pool is used by RBD::

  rbd pool init <poolname>

If the pool is being used by a custom application 'foo', you can also label
via the low-level command::

  ceph osd pool application enable <poolname> foo

For more information, see :ref:`associate-pool-to-application`.

POOL_FULL
_________

One or more pools has reached (or is very close to reaching) its
quota. The threshold to trigger this error condition is controlled by
the ``mon_pool_quota_crit_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) with::

  ceph osd pool set-quota <pool> max_bytes <bytes>
  ceph osd pool set-quota <pool> max_objects <objects>

Setting the quota value to 0 will disable the quota.

POOL_NEAR_FULL
______________

One or more pools is approaching its quota. The threshold to trigger
this warning condition is controlled by the
``mon_pool_quota_warn_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) with::

  ceph osd pool set-quota <pool> max_bytes <bytes>
  ceph osd pool set-quota <pool> max_objects <objects>

Setting the quota value to 0 will disable the quota.

OBJECT_MISPLACED
________________

One or more objects in the cluster is not stored on the node the
cluster would like it to be stored on. This is an indication that
data migration due to some recent cluster change has not yet completed.

Misplaced data is not a dangerous condition in and of itself; data
consistency is never at risk, and old copies of objects are never
removed until the desired number of new copies (in the desired
locations) are present.

OBJECT_UNFOUND
______________

One or more objects in the cluster cannot be found. Specifically, the
OSDs know that a new or updated copy of an object should exist, but a
copy of that version of the object has not been found on OSDs that are
currently online.

Read or write requests to unfound objects will block.

Ideally, a down OSD that has the more recent copy of the unfound object
can be brought back online. Candidate OSDs can be identified from the
peering state for the PG(s) responsible for the unfound object::

  ceph tell <pgid> query

If the latest copy of the object is not available, the cluster can be
told to roll back to a previous version of the object. See
:ref:`failures-osd-unfound` for more information.
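
As a last-resort sketch (this can discard the newest writes, so review
the reference above first), the unfound objects in a PG can be reverted
to a prior version or deleted with::

  ceph pg <pgid> mark_unfound_lost revert
  ceph pg <pgid> mark_unfound_lost delete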

SLOW_OPS
________

One or more OSD requests is taking a long time to process. This can
be an indication of extreme load, a slow storage device, or a software
bug.

The request queue on the OSD(s) in question can be queried with the
following command, executed from the OSD host::

  ceph daemon osd.<id> ops

A summary of the slowest recent requests can be seen with::

  ceph daemon osd.<id> dump_historic_ops

The location of an OSD can be found with::

  ceph osd find osd.<id>

PG_NOT_SCRUBBED
_______________

One or more PGs has not been scrubbed recently. PGs are normally
scrubbed every ``mon_scrub_interval`` seconds, and this warning
triggers when ``mon_warn_pg_not_scrubbed_ratio`` of the interval has
elapsed after the scrub was due without one having occurred.

PGs will not scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a scrub of a clean PG with::

  ceph pg scrub <pgid>

PG_NOT_DEEP_SCRUBBED
____________________

One or more PGs has not been deep scrubbed recently. PGs are normally
deep scrubbed every ``osd_deep_scrub_interval`` seconds, and this warning
triggers when ``mon_warn_pg_not_deep_scrubbed_ratio`` of the interval has
elapsed after the deep scrub was due without one having occurred.

PGs will not (deep) scrub if they are not flagged as *clean*, which may
happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
*PG_DEGRADED* above).

You can manually initiate a deep scrub of a clean PG with::

  ceph pg deep-scrub <pgid>


Miscellaneous
-------------

RECENT_CRASH
____________

One or more Ceph daemons has crashed recently, and the crash has not
yet been archived (acknowledged) by the administrator. This may
indicate a software bug, a hardware problem (e.g., a failing disk), or
some other problem.

New crashes can be listed with::

  ceph crash ls-new

Information about a specific crash can be examined with::

  ceph crash info <crash-id>

This warning can be silenced by "archiving" the crash (perhaps after
being examined by an administrator) so that it does not generate this
warning::

  ceph crash archive <crash-id>

Similarly, all new crashes can be archived with::

  ceph crash archive-all

Archived crashes will still be visible via ``ceph crash ls`` but not
``ceph crash ls-new``.

The time period for what "recent" means is controlled by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).

These warnings can be disabled entirely with::

  ceph config set mgr mgr/crash/warn_recent_interval 0

TELEMETRY_CHANGED
_________________

Telemetry has been enabled, but the contents of the telemetry report
have changed since that time, so telemetry reports will not be sent.

The Ceph developers periodically revise the telemetry feature to
include new and useful information, or to remove information found to
be useless or sensitive. If any new information is included in the
report, Ceph will require the administrator to re-enable telemetry to
ensure they have an opportunity to (re)review what information will be
shared.

To review the contents of the telemetry report::

  ceph telemetry show

Note that the telemetry report consists of several optional channels
that may be independently enabled or disabled. For more information, see
:ref:`telemetry`.

To re-enable telemetry (and make this warning go away)::

  ceph telemetry on

To disable telemetry (and make this warning go away)::

  ceph telemetry off