   1 ======================
   2  Troubleshooting OSDs
   3 ======================
   4
   5 Before troubleshooting your OSDs, first check your monitors and network. If
   6 you execute ``ceph health`` or ``ceph -s`` on the command line and Ceph shows
   7 ``HEALTH_OK``, it means that the monitors have a quorum.
   8 If you don't have a monitor quorum or if there are errors with the monitor
   9 status, `address the monitor issues first <../troubleshooting-mon>`_.
  10 Check your networks to ensure they
  11 are running properly, because networks may have a significant impact on OSD
  12 operation and performance. Look for dropped packets on the host side
  13 and CRC errors on the switch side.
  14
  15 Obtaining Data About OSDs
  16 =========================
  17
  18 A good first step in troubleshooting your OSDs is to obtain topology information in
  19 addition to the information you collected while `monitoring your OSDs`_
  20 (e.g., ``ceph osd tree``).
  21
  22
  23 Ceph Logs
  24 ---------
  25
  26 If you haven't changed the default path, you can find Ceph log files at
  27 ``/var/log/ceph``::
  28
  29         ls /var/log/ceph
  30
  31 If you don't see enough log detail you can change your logging level.  See
  32 `Logging and Debugging`_ for details to ensure that Ceph performs adequately
  33 under high logging volume.
  34
  35
  36 Admin Socket
  37 ------------
  38
  39 Use the admin socket tool to retrieve runtime information. For details, list
  40 the sockets for your Ceph daemons::
  41
  42         ls /var/run/ceph
  43
Then, execute the following, replacing ``{daemon-name}`` with an actual
daemon (e.g., ``osd.0``)::

  ceph daemon {daemon-name} help
  48
  49 Alternatively, you can specify a ``{socket-file}`` (e.g., something in ``/var/run/ceph``)::
  50
  51   ceph daemon {socket-file} help
  52
  53 The admin socket, among other things, allows you to:
  54
  55 - List your configuration at runtime
  56 - Dump historic operations
  57 - Dump the operation priority queue state
  58 - Dump operations in flight
  59 - Dump perfcounters
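
The capabilities listed above map onto admin socket commands. The following
is a minimal sketch, not an official tool: the function name and the output
file names are hypothetical, and it must run on the host where the daemon's
socket lives.

```shell
# Sketch: save several admin-socket dumps from one daemon into a directory.
# The function name and the file names are illustrative assumptions.
osd_admin_snapshot() {
    daemon="$1"     # daemon name, e.g. osd.0
    outdir="$2"     # directory that receives the dumps
    mkdir -p "$outdir" || return 1
    ceph daemon "$daemon" config show        > "$outdir/config.json"
    ceph daemon "$daemon" dump_historic_ops  > "$outdir/historic_ops.json"
    ceph daemon "$daemon" dump_op_pq_state   > "$outdir/op_pq_state.json"
    ceph daemon "$daemon" dump_ops_in_flight > "$outdir/ops_in_flight.json"
    ceph daemon "$daemon" perf dump          > "$outdir/perf.json"
}
```

For example, ``osd_admin_snapshot osd.0 /tmp/osd.0-snapshot`` leaves five
files that can be compared before and after a problem appears.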
  60
Display Free Space
------------------
  63
  64 Filesystem issues may arise. To display your file system's free space, execute
  65 ``df``. ::
  66
  67         df -h
  68
  69 Execute ``df --help`` for additional usage.
  70
  71 I/O Statistics
  72 --------------
  73
  74 Use `iostat`_ to identify I/O-related issues. ::
  75
  76         iostat -x
  77
  78 Diagnostic Messages
  79 -------------------
  80
  81 To retrieve diagnostic messages from the kernel, use ``dmesg`` with ``less``, ``more``, ``grep``
  82 or ``tail``.  For example::
  83
  84         dmesg | grep scsi
  85
Stopping without Rebalancing
============================
  88
  89 Periodically, you may need to perform maintenance on a subset of your cluster,
  90 or resolve a problem that affects a failure domain (e.g., a rack). If you do not
  91 want CRUSH to automatically rebalance the cluster as you stop OSDs for
  92 maintenance, set the cluster to ``noout`` first::
  93
  94         ceph osd set noout
  95
  96 On Luminous or newer releases it is safer to set the flag only on affected OSDs.
  97 You can do this individually ::
  98
  99         ceph osd add-noout osd.0
 100         ceph osd rm-noout  osd.0
 101
 102 Or an entire CRUSH bucket at a time.  Say you're going to take down
 103 ``prod-ceph-data1701`` to add RAM ::
 104
 105         ceph osd set-group noout prod-ceph-data1701
 106
 107 Once the flag is set you can stop the OSDs and any other colocated Ceph
 108 services within the failure domain that requires maintenance work. ::
 109
 110         systemctl stop ceph\*.service ceph\*.target
 111
.. note:: Placement groups within the OSDs you stop will become ``degraded``
   while you are addressing issues within the failure domain.
 114
 115 Once you have completed your maintenance, restart the OSDs and any other
 116 daemons.  If you rebooted the host as part of the maintenance, these should
 117 come back on their own without intervention. ::
 118
 119         sudo systemctl start ceph.target
 120
Finally, unset the ``noout`` flag, whether you set it cluster-wide or on a CRUSH bucket::
 122
 123         ceph osd unset noout
 124         ceph osd unset-group noout prod-ceph-data1701
 125
 126 Note that most Linux distributions that Ceph supports today employ ``systemd``
 127 for service management.  For other or older operating systems you may need
 128 to issue equivalent ``service`` or ``start``/``stop`` commands.
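
The flag handling around a maintenance window can be wrapped in two small
helpers. This is a sketch under assumptions: the function names are
hypothetical, and it relies on ``ceph health detail`` mentioning the flag
once it is set, which you should verify on your release.

```shell
# Sketch: bracket host maintenance with a per-bucket noout flag.
# Function names are illustrative; the bucket is a CRUSH host or rack name.
begin_maintenance() {
    bucket="$1"
    ceph osd set-group noout "$bucket" || return 1
    # Review cluster health (and the new flag) before stopping any daemons.
    ceph health detail
}

end_maintenance() {
    bucket="$1"
    ceph osd unset-group noout "$bucket" || return 1
    # Watch health return to normal as the OSDs rejoin.
    ceph health detail
}
```

Call ``begin_maintenance prod-ceph-data1701`` before stopping services on
that host and ``end_maintenance prod-ceph-data1701`` after they are back.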
 129
 130 .. _osd-not-running:
 131
 132 OSD Not Running
 133 ===============
 134
 135 Under normal circumstances, simply restarting the ``ceph-osd`` daemon will
 136 allow it to rejoin the cluster and recover.
 137
 138 An OSD Won't Start
 139 ------------------
 140
 141 If you start your cluster and an OSD won't start, check the following:
 142
- **Configuration File:** If you were not able to get OSDs running from
  a new installation, check your configuration file to ensure it uses valid
  syntax and option names (e.g., ``host`` not ``hostname``).

- **Check Paths:** Check the paths in your configuration, and the actual
  paths themselves for data and metadata (journals, WAL, DB). If you separate the OSD data from
  the metadata and there are errors in your configuration file or in the
  actual mounts, you may have trouble starting OSDs. If you want to store the
  metadata on a separate block device, partition the drive or divide it into
  LVM logical volumes, and assign one partition or logical volume per OSD.
 153
- **Check Max Threadcount:** If you have a node with a lot of OSDs, you may be
  hitting the default maximum number of threads (usually 32k), especially
  during recovery. Use ``sysctl`` to raise the limit to the maximum allowed
  value (4194303) and see whether that helps. For example::
 159
 160         sysctl -w kernel.pid_max=4194303
 161
 162   If increasing the maximum thread count resolves the issue, you can make it
 163   permanent by including a ``kernel.pid_max`` setting in a file under ``/etc/sysctl.d`` or
 164   within the master ``/etc/sysctl.conf`` file. For example::
 165
 166         kernel.pid_max = 4194303
 167
- **Check nf_conntrack:** This connection tracking and limiting system
  is the bane of many production Ceph clusters, and can be insidious in that
  everything is fine at first. As cluster topology and client workload
  grow, mysterious and intermittent connection failures and performance
  glitches manifest, becoming worse over time and at certain times of day.
  Check ``syslog`` history for "table full" events.  You can mitigate this
  problem by raising ``nf_conntrack_max`` to a much higher value via ``sysctl``.
  Be sure to raise ``nf_conntrack_buckets`` accordingly to
  ``nf_conntrack_max / 4``, which may require action outside of ``sysctl``, e.g.
  ``echo 131072 > /sys/module/nf_conntrack/parameters/hashsize``.
  A more drastic but fussier approach is to blacklist the associated kernel modules
  to disable processing altogether.  This is fragile in that the modules
  vary among kernel versions, as does the order in which they must be listed.
  Even when blacklisted there are situations in which ``iptables`` or ``docker``
  may activate connection tracking anyway, so a "set and forget" strategy for
  the tunables is advised.  On modern systems this will not consume appreciable
  resources.
 185
 186 - **Kernel Version:** Identify the kernel version and distribution you
 187   are using. Ceph uses some third party tools by default, which may be
 188   buggy or may conflict with certain distributions and/or kernel
 189   versions (e.g., Google ``gperftools`` and ``TCMalloc``). Check the
 190   `OS recommendations`_ and the release notes for each Ceph version
 191   to ensure you have addressed any issues related to your kernel.
 192
- **Segmentation Fault:** If there is a segmentation fault, increase log levels
  and start the problematic daemon(s) again. If segmentation faults recur,
  search the Ceph bug tracker `https://tracker.ceph.com/projects/ceph <https://tracker.ceph.com/projects/ceph/>`_
  and the ``dev`` and ``ceph-users`` mailing list archives `https://ceph.io/resources <https://ceph.io/resources>`_.
  If this is truly a new and unique
  failure, post to the ``dev`` email list and provide the specific Ceph
  release being run, ``ceph.conf`` (with secrets XXX'd out),
  your monitor status output and excerpts from your log file(s).
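
The thread-count and connection-tracking checks from the list above can be
turned into a quick headroom report. This is only a sketch: the function
names are hypothetical, and the conntrack files exist only when the
``nf_conntrack`` module is loaded.

```shell
# Print "<name> <current> <max> <percent>%" for a counter and its limit.
usage_line() {
    name="$1"; current="$2"; max="$3"
    echo "$name $current $max $(( current * 100 / max ))%"
}

# Sketch: report how close this host is to kernel.pid_max and
# nf_conntrack_max. Function names are illustrative assumptions.
check_host_limits() {
    # Every thread consumes an id from the kernel.pid_max space.
    threads=$(ps -eL -o lwp= | wc -l)
    usage_line threads "$threads" "$(sysctl -n kernel.pid_max)"
    # Tracked connections, only readable when nf_conntrack is loaded.
    if [ -r /proc/sys/net/netfilter/nf_conntrack_count ]; then
        usage_line conntrack \
            "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
            "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
    fi
}
```

Running ``check_host_limits`` during recovery, when thread and connection
counts peak, gives the most useful reading.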
 201
 202 An OSD Failed
 203 -------------
 204
 205 When a ``ceph-osd`` process dies, surviving ``ceph-osd`` daemons will report
 206 to the mons that it appears down, which will in turn surface the new status
 207 via the ``ceph health`` command::
 208
 209         ceph health
 210         HEALTH_WARN 1/3 in osds are down
 211
Specifically, you will get a warning whenever there are OSDs marked ``in``
and ``down``.  You can identify which OSDs are ``down`` with::
 214
 215         ceph health detail
 216         HEALTH_WARN 1/3 in osds are down
 217         osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080
 218
 219 or ::
 220
 221         ceph osd tree down
 222
 223 If there is a drive
 224 failure or other fault preventing ``ceph-osd`` from functioning or
 225 restarting, an error message should be present in its log file under
 226 ``/var/log/ceph``.
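
To go from the health report to the relevant log files, something like the
following sketch may help. The function names are hypothetical, the parser
assumes ``osd.N ... is down`` lines like those shown above, and the log path
assumes the default ``/var/log/ceph/ceph-osd.N.log`` naming.

```shell
# Read `ceph health detail` text on stdin; print the ids of OSDs reported down.
down_osd_ids() {
    sed -n 's/^[[:space:]]*osd\.\([0-9][0-9]*\) .*is down.*/\1/p'
}

# Sketch: show the tail of each down OSD's log. Only logs that exist on
# the current host are readable, so run this on the affected OSD host.
show_down_osd_logs() {
    for id in $(ceph health detail | down_osd_ids); do
        log="/var/log/ceph/ceph-osd.$id.log"
        [ -r "$log" ] && { echo "== $log"; tail -n 50 "$log"; }
    done
    return 0
}
```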
 227
If the daemon stopped because of a heartbeat failure or ``suicide timeout``,
the underlying drive or filesystem may be unresponsive. Check ``dmesg``
and ``syslog`` output for drive or other kernel errors.  You may need to
specify something like ``dmesg -T`` to get timestamps, otherwise it's
easy to mistake old errors for new.
 233
 234 If the problem is a software error (failed assertion or other
 235 unexpected error), search the archives and tracker as above, and
 236 report it to the `ceph-devel`_ email list if there's no clear fix or
 237 existing bug.
 238
 239 .. _no-free-drive-space:
 240
 241 No Free Drive Space
 242 -------------------
 243
Ceph prevents you from writing to a full OSD so that you don't lose data.
In an operational cluster, you should receive a warning when your cluster's OSDs
and pools approach the full ratio. The ``mon_osd_full_ratio`` defaults to
``0.95``, or 95% of capacity, above which Ceph stops clients from writing data.
The ``mon_osd_backfillfull_ratio`` defaults to ``0.90``, or 90% of
capacity, above which backfills will not start. The
``mon_osd_nearfull_ratio`` defaults to ``0.85``, or 85% of capacity,
above which Ceph generates a health warning.
 252
 253 Note that individual OSDs within a cluster will vary in how much data Ceph
 254 allocates to them.  This utilization can be displayed for each OSD with ::
 255
 256         ceph osd df
 257
 258 Overall cluster / pool fullness can be checked with ::
 259
 260         ceph df
 261
 262 Pay close attention to the **most full** OSDs, not the percentage of raw space
 263 used as reported by ``ceph df``.  It only takes one outlier OSD filling up to
 264 fail writes to its pool.  The space available to each pool as reported by
 265 ``ceph df`` considers the ratio settings relative to the *most full* OSD that
is part of a given pool.  The distribution can be flattened by progressively
moving data from overfull to underfull OSDs using the ``reweight-by-utilization``
command.  With Ceph releases beginning with later revisions of Luminous one can also
exploit the ``ceph-mgr`` ``balancer`` module to perform this task automatically
and rather effectively.
 271
 272 The ratios can be adjusted:
 273
 274 ::
 275
 276     ceph osd set-nearfull-ratio <float[0.0-1.0]>
 277     ceph osd set-full-ratio <float[0.0-1.0]>
 278     ceph osd set-backfillfull-ratio <float[0.0-1.0]>
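
Before changing a ratio it helps to see the values currently in the OSD map.
The sketch below assumes that ``ceph osd dump`` prints ``full_ratio``,
``backfillfull_ratio`` and ``nearfull_ratio`` lines; the function names are
hypothetical.

```shell
# Sketch: print the three ratios recorded in the OSD map.
show_full_ratios() {
    ceph osd dump | grep -E '^(full|backfillfull|nearfull)_ratio'
}

# Read those lines on stdin; succeed only if
# nearfull < backfillfull < full, the ordering the thresholds rely on.
ratios_sane() {
    awk '$1 == "nearfull_ratio"     { n = $2 }
         $1 == "backfillfull_ratio" { b = $2 }
         $1 == "full_ratio"         { f = $2 }
         END { exit !(n < b && b < f) }'
}
```

A check such as ``show_full_ratios | ratios_sane || echo "ratios out of order"``
can be run after any adjustment.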
 279
Full cluster issues can arise when an OSD fails, either as a test or organically,
within a small, very full, or unbalanced cluster. When an OSD or node
holds an outsize percentage of the cluster's data, the ``nearfull`` and ``full``
ratios may be exceeded as a result of component failures or even natural growth.
If you are testing how Ceph reacts to OSD failures on a small
cluster, you should leave ample free disk space and consider temporarily
lowering the OSD ``full ratio``, OSD ``backfillfull ratio`` and
OSD ``nearfull ratio``.
 288
 289 Full ``ceph-osds`` will be reported by ``ceph health``::
 290
 291         ceph health
 292         HEALTH_WARN 1 nearfull osd(s)
 293
 294 Or::
 295
 296         ceph health detail
 297         HEALTH_ERR 1 full osd(s); 1 backfillfull osd(s); 1 nearfull osd(s)
 298         osd.3 is full at 97%
 299         osd.4 is backfill full at 91%
 300         osd.2 is near full at 87%
 301
 302 The best way to deal with a full cluster is to add capacity via new OSDs, enabling
 303 the cluster to redistribute data to newly available storage.
 304
If you cannot start a legacy Filestore OSD because it is full, you may reclaim
some space by deleting a few placement group directories in the full OSD.
 307
 308 .. important:: If you choose to delete a placement group directory on a full OSD,
 309    **DO NOT** delete the same placement group directory on another full OSD, or
 310    **YOU WILL LOSE DATA**. You **MUST** maintain at least one copy of your data on
 311    at least one OSD.  This is a rare and extreme intervention, and is not to be
 312    undertaken lightly.
 313
 314 See `Monitor Config Reference`_ for additional details.
 315
 316 OSDs are Slow/Unresponsive
 317 ==========================
 318
A common issue involves slow or unresponsive OSDs. Ensure that you
have eliminated other troubleshooting possibilities before delving into OSD
performance issues. For example, ensure that your networks are working properly
and your OSDs are running. Check to see if OSDs are throttling recovery traffic.

.. tip:: Newer versions of Ceph provide better recovery handling by preventing
   recovering OSDs from using up system resources to the point that ``up`` and
   ``in`` OSDs become unavailable or slow.
 327
 328 Networking Issues
 329 -----------------
 330
 331 Ceph is a distributed storage system, so it relies upon networks for OSD peering
 332 and replication, recovery from faults, and periodic heartbeats. Networking
 333 issues can cause OSD latency and flapping OSDs. See `Flapping OSDs`_ for
 334 details.
 335
 336 Ensure that Ceph processes and Ceph-dependent processes are connected and/or
 337 listening. ::
 338
 339         netstat -a | grep ceph
 340         netstat -l | grep ceph
 341         sudo netstat -p | grep ceph
 342
 343 Check network statistics. ::
 344
 345         netstat -s
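
Per-interface drop counters are also exposed by the kernel under
``/sys/class/net``. The helper below is a minimal sketch with a hypothetical
name; the optional argument exists only so that the function can be pointed
at a different directory tree.

```shell
# Sketch: print rx/tx drop counters for each network interface.
# $1 optionally overrides the sysfs root (defaults to /sys/class/net).
iface_drops() {
    root="${1:-/sys/class/net}"
    for dir in "$root"/*; do
        # Skip entries that do not expose interface statistics.
        [ -r "$dir/statistics/rx_dropped" ] || continue
        echo "$(basename "$dir") rx_dropped=$(cat "$dir/statistics/rx_dropped") tx_dropped=$(cat "$dir/statistics/tx_dropped")"
    done
}
```

Run ``iface_drops`` twice a few minutes apart: counters that keep rising
point to an ongoing problem rather than an old one.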
 346
 347 Drive Configuration
 348 -------------------
 349
A SAS or SATA storage drive should only house one OSD; NVMe drives readily
handle two or more. Read and write throughput can bottleneck if other processes
share the drive, including journals / metadata, operating systems, Ceph monitors,
``syslog`` logs, other OSDs, and non-Ceph processes.
 354
 355 Ceph acknowledges writes *after* journaling, so fast SSDs are an
 356 attractive option to accelerate the response time--particularly when
 357 using the ``XFS`` or ``ext4`` file systems for legacy Filestore OSDs.
 358 By contrast, the ``Btrfs``
 359 file system can write and journal simultaneously.  (Note, however, that
 360 we recommend against using ``Btrfs`` for production deployments.)
 361
 362 .. note:: Partitioning a drive does not change its total throughput or
 363    sequential read/write limits. Running a journal in a separate partition
 364    may help, but you should prefer a separate physical drive.
 365
 366 Bad Sectors / Fragmented Disk
 367 -----------------------------
 368
 369 Check your drives for bad blocks, fragmentation, and other errors that can cause
 370 performance to drop substantially.  Invaluable tools include ``dmesg``, ``syslog``
 371 logs, and ``smartctl`` (from the ``smartmontools`` package).
 372
 373 Co-resident Monitors/OSDs
 374 -------------------------
 375
 376 Monitors are relatively lightweight processes, but they issue lots of
 377 ``fsync()`` calls,
 378 which can interfere with other workloads, particularly if monitors run on the
 379 same drive as an OSD. Additionally, if you run monitors on the same host as
 380 OSDs, you may incur performance issues related to:
 381
 382 - Running an older kernel (pre-3.0)
 383 - Running a kernel with no ``syncfs(2)`` syscall.
 384
In these cases, multiple OSDs running on the same host can drag each other down
by doing lots of commits. That often leads to bursty writes.
 387
 388 Co-resident Processes
 389 ---------------------
 390
 391 Spinning up co-resident processes (convergence) such as a cloud-based solution, virtual
 392 machines and other applications that write data to Ceph while operating on the
 393 same hardware as OSDs can introduce significant OSD latency. Generally, we
 394 recommend optimizing hosts for use with Ceph and using other hosts for other
 395 processes. The practice of separating Ceph operations from other applications
 396 may help improve performance and may streamline troubleshooting and maintenance.
 397
 398 Logging Levels
 399 --------------
 400
If you turned logging levels up to track an issue and then forgot to turn
logging levels back down, the OSD may be putting a lot of logs onto the disk. If
you intend to keep logging levels high, you may consider mounting a dedicated drive
at the default log directory (``/var/log/ceph``, where log files are named
``$cluster-$name.log`` by default).
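
To find out whether debug levels were left high, you can query the running
daemon and, if needed, push a lower level to all OSDs. This is a sketch with
hypothetical function names; the list of subsystems is an arbitrary
selection, and ``1/5`` in the usage note is assumed to be the usual default
for ``debug_osd``.

```shell
# Sketch: print the current debug levels of a few subsystems for one daemon.
show_debug_levels() {
    daemon="$1"     # e.g. osd.0, queried through its admin socket
    for opt in debug_osd debug_ms debug_bluestore debug_filestore; do
        ceph daemon "$daemon" config get "$opt"
    done
}

# Sketch: inject a debug_osd level into all running OSDs at runtime.
set_osd_debug_level() {
    level="$1"      # e.g. 1/5
    ceph tell 'osd.*' injectargs "--debug-osd $level"
}
```

For example, ``show_debug_levels osd.0`` followed by
``set_osd_debug_level 1/5`` if the reported level is higher than intended.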
 405
 406 Recovery Throttling
 407 -------------------
 408
 409 Depending upon your configuration, Ceph may reduce recovery rates to maintain
 410 performance or it may increase recovery rates to the point that recovery
 411 impacts OSD performance. Check to see if the OSD is recovering.
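
One way to check both halves of this, whether recovery is under way and how
it is throttled, is sketched below. The function names are hypothetical, and
the three options shown are only a subset of the recovery-related settings.

```shell
# Read `ceph pg stat` (or `ceph -s`) text on stdin; succeed if any
# placement group state mentions recovery or backfill.
is_recovering() {
    grep -q -E 'recover|backfill'
}

# Sketch: print the main recovery throttles of one running OSD.
recovery_settings() {
    daemon="$1"     # e.g. osd.0, queried through its admin socket
    ceph daemon "$daemon" config get osd_max_backfills
    ceph daemon "$daemon" config get osd_recovery_max_active
    ceph daemon "$daemon" config get osd_recovery_op_priority
}
```

For example: ``ceph pg stat | is_recovering && recovery_settings osd.0``.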
 412
 413 Kernel Version
 414 --------------
 415
 416 Check the kernel version you are running. Older kernels may not receive
 417 new backports that Ceph depends upon for better performance.
 418
 419 Kernel Issues with SyncFS
 420 -------------------------
 421
Try running one OSD per host to see if performance improves. Old kernels
might lack the ``syncfs(2)`` syscall, or the accompanying ``glibc`` might be
too old to expose it.
 424
 425 Filesystem Issues
 426 -----------------
 427
 428 Currently, we recommend deploying clusters with the BlueStore back end.
 429 When running a pre-Luminous release or if you have a specific reason to deploy
 430 OSDs with the previous Filestore backend, we recommend ``XFS``.
 431
 432 We recommend against using ``Btrfs`` or ``ext4``.  The ``Btrfs`` filesystem has
 433 many attractive features, but bugs may lead to
 434 performance issues and spurious ENOSPC errors.  We do not recommend
 435 ``ext4`` for Filestore OSDs because ``xattr`` limitations break support for long
 436 object names, which are needed for RGW.
 437
 438 For more information, see `Filesystem Recommendations`_.
 439
 440 .. _Filesystem Recommendations: ../configuration/filesystem-recommendations
 441
 442 Insufficient RAM
 443 ----------------
 444
 445 We recommend a *minimum* of 4GB of RAM per OSD daemon and suggest rounding up
 446 from 6-8GB.  You may notice that during normal operations, ``ceph-osd``
 447 processes only use a fraction of that amount.
 448 Unused RAM makes it tempting to use the excess RAM for co-resident
 449 applications or to skimp on each node's memory capacity.  However,
 450 when OSDs experience recovery their memory utilization spikes. If
 451 there is insufficient RAM available, OSD performance will slow considerably
 452 and the daemons may even crash or be killed by the Linux ``OOM Killer``.
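
To confirm whether the OOM killer has already terminated OSDs on a host, you
can filter the kernel log. The filter below is a sketch with a hypothetical
name; the exact wording of kernel messages varies between kernel versions,
so the pattern is an assumption.

```shell
# Read kernel log text on stdin; print OOM-killer lines that name ceph-osd.
oom_kills() {
    grep -i -E 'out of memory|oom-kill|oom_kill' | grep -i 'ceph-osd'
}
```

For example: ``dmesg -T | oom_kills``.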
 453
 454 Blocked Requests or Slow Requests
 455 ---------------------------------
 456
If a ``ceph-osd`` daemon is slow to respond to a request, it logs messages
to the cluster log noting ops that are taking too long.  The warning threshold
defaults to 30 seconds and is configurable via the ``osd_op_complaint_time``
setting.
 461
 462 Legacy versions of Ceph complain about ``old requests``::
 463
 464         osd.0 192.168.106.220:6800/18813 312 : [WRN] old request osd_op(client.5099.0:790 fatty_26485_object789 [write 0~4096] 2.5e54f643) v4 received at 2012-03-06 15:42:56.054801 currently waiting for sub ops
 465
 466 New versions of Ceph complain about ``slow requests``::
 467
 468         {date} {osd.num} [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.005692 secs
 469         {date} {osd.num}  [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]
 470
 471 Possible causes include:
 472
 473 - A failing drive (check ``dmesg`` output)
 474 - A bug in the kernel file system (check ``dmesg`` output)
 475 - An overloaded cluster (check system load, iostat, etc.)
 476 - A bug in the ``ceph-osd`` daemon.
 477
 478 Possible solutions:
 479
 480 - Remove VMs from Ceph hosts
 481 - Upgrade kernel
 482 - Upgrade Ceph
 483 - Restart OSDs
 484 - Replace failed or failing components
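
To gauge how severe slow requests are, you can extract their ages from the
cluster log. This sketch assumes the ``slow request N seconds old`` wording
shown above and the default cluster log location ``/var/log/ceph/ceph.log``;
the function name is hypothetical.

```shell
# Read cluster log text on stdin; print the age in seconds of every
# "slow request" entry, largest first.
slow_request_ages() {
    sed -n 's/.*slow request \([0-9.][0-9.]*\) seconds old.*/\1/p' | sort -rn
}
```

For example: ``slow_request_ages < /var/log/ceph/ceph.log | head``.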
 485
 486 Debugging Slow Requests
 487 -----------------------
 488
 489 If you run ``ceph daemon osd.<id> dump_historic_ops`` or ``ceph daemon osd.<id> dump_ops_in_flight``,
 490 you will see a set of operations and a list of events each operation went
 491 through. These are briefly described below.
 492
 493 Events from the Messenger layer:
 494
 495 - ``header_read``: When the messenger first started reading the message off the wire.
 496 - ``throttled``: When the messenger tried to acquire memory throttle space to read
 497   the message into memory.
 498 - ``all_read``: When the messenger finished reading the message off the wire.
 499 - ``dispatched``: When the messenger gave the message to the OSD.
 500 - ``initiated``: This is identical to ``header_read``. The existence of both is a
 501   historical oddity.
 502
 503 Events from the OSD as it processes ops:
 504
 505 - ``queued_for_pg``: The op has been put into the queue for processing by its PG.
 506 - ``reached_pg``: The PG has started doing the op.
- ``waiting for *``: The op is waiting for some other work to complete before it
  can proceed (e.g. a new OSDMap; for its object target to scrub; for the PG to
  finish peering; all as specified in the message).
 510 - ``started``: The op has been accepted as something the OSD should do and
 511   is now being performed.
 512 - ``waiting for subops from``: The op has been sent to replica OSDs.
 513
Events from ``Filestore``:
 515
 516 - ``commit_queued_for_journal_write``: The op has been given to the FileStore.
 517 - ``write_thread_in_journal_buffer``: The op is in the journal's buffer and waiting
 518   to be persisted (as the next disk write).
 519 - ``journaled_completion_queued``: The op was journaled to disk and its callback
 520   queued for invocation.
 521
 522 Events from the OSD after data has been given to underlying storage:
 523
 524 - ``op_commit``: The op has been committed (i.e. written to journal) by the
 525   primary OSD.
- ``op_applied``: The op has been `write()'en <https://www.freebsd.org/cgi/man.cgi?write(2)>`_
  to the backing FS (i.e. applied in memory but not flushed out to disk) on the primary.
 527 - ``sub_op_applied``: ``op_applied``, but for a replica's "subop".
 528 - ``sub_op_committed``: ``op_commit``, but for a replica's subop (only for EC pools).
 529 - ``sub_op_commit_rec/sub_op_apply_rec from <X>``: The primary marks this when it
 530   hears about the above, but for a particular replica (i.e. ``<X>``).
 531 - ``commit_sent``: We sent a reply back to the client (or primary OSD, for sub ops).
 532
 533 Many of these events are seemingly redundant, but cross important boundaries in
 534 the internal code (such as passing data across locks into new threads).
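
A quick way to see which of these events dominate a dump is to tally them.
The sketch below assumes the JSON output contains ``"event": "<name>"``
pairs, which may differ between releases; the function name is hypothetical.

```shell
# Read dump_historic_ops or dump_ops_in_flight JSON on stdin; print
# "<count> <event>" lines, most frequent event first.
event_counts() {
    grep -o '"event": *"[^"]*"' |
        sed 's/^"event": *"//; s/"$//' |
        awk '{ c[$0]++ } END { for (e in c) print c[e], e }' |
        sort -rn
}
```

For example: ``ceph daemon osd.0 dump_historic_ops | event_counts``.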
 535
 536 Flapping OSDs
 537 =============
 538
 539 When OSDs peer and check heartbeats, they use the cluster (back-end)
 540 network when it's available. See `Monitor/OSD Interaction`_ for details.
 541
We have traditionally recommended separate *public* (front-end) and *private*
(cluster / back-end / replication) networks, for two reasons:
 544
 545 #. Segregation of heartbeat and replication / recovery traffic (private)
 546    from client and OSD <-> mon traffic (public).  This helps keep one
 547    from DoS-ing the other, which could in turn result in a cascading failure.
 548
 549 #. Additional throughput for both public and private traffic.
 550
 551 When common networking technologies were 100Mb/s and 1Gb/s, this separation
 552 was often critical.  With today's 10Gb/s, 40Gb/s, and 25/50/100Gb/s
 553 networks, the above capacity concerns are often diminished or even obviated.
 554 For example, if your OSD nodes have two network ports, dedicating one to
 555 the public and the other to the private network means no path redundancy.
 556 This degrades your ability to weather network maintenance and failures without
 557 significant cluster or client impact.  Consider instead using both links
 558 for just a public network:  with bonding (LACP) or equal-cost routing (e.g. FRR)
 559 you reap the benefits of increased throughput headroom, fault tolerance, and
 560 reduced OSD flapping.
 561
When a private network (or even a single host link) fails or degrades while the
public network operates normally, OSDs may not handle this situation well. What
happens is that OSDs use the public network to report each other ``down`` to
the monitors, while marking themselves ``up``. The monitors then send out,
again on the public network, an updated cluster map with affected OSDs marked
``down``. These OSDs reply to the monitors "I'm not dead yet!", and the cycle
repeats.  We call this scenario 'flapping', and it can be difficult to isolate
and remediate.  With no private network, this irksome dynamic is avoided:
OSDs are generally either ``up`` or ``down`` without flapping.
 571
 572 If something does cause OSDs to 'flap' (repeatedly getting marked ``down`` and
 573 then ``up`` again), you can force the monitors to halt the flapping by
 574 temporarily freezing their states::
 575
 576         ceph osd set noup      # prevent OSDs from getting marked up
 577         ceph osd set nodown    # prevent OSDs from getting marked down
 578
 579 These flags are recorded in the osdmap::
 580
 581         ceph osd dump | grep flags
 582         flags no-up,no-down
 583
 584 You can clear the flags with::
 585
 586         ceph osd unset noup
 587         ceph osd unset nodown
 588
 589 Two other flags are supported, ``noin`` and ``noout``, which prevent
 590 booting OSDs from being marked ``in`` (allocated data) or protect OSDs
 591 from eventually being marked ``out`` (regardless of what the current value for
 592 ``mon_osd_down_out_interval`` is).
 593
 594 .. note:: ``noup``, ``noout``, and ``nodown`` are temporary in the
 595    sense that once the flags are cleared, the action they were blocking
 596    should occur shortly after.  The ``noin`` flag, on the other hand,
 597    prevents OSDs from being marked ``in`` on boot, and any daemons that
 598    started while the flag was set will remain that way.
 599
.. note:: The causes and effects of flapping can be somewhat mitigated through
   careful adjustments to the ``mon_osd_down_out_subtree_limit``,
   ``mon_osd_reporter_subtree_level``, and ``mon_osd_min_down_reporters`` settings.
   Derivation of optimal settings depends on cluster size, topology, and the
   Ceph release in use. Their interactions are subtle and beyond the scope of
   this document.
 606
 607
 608 .. _iostat: https://en.wikipedia.org/wiki/Iostat
 609 .. _Ceph Logging and Debugging: ../../configuration/ceph-conf#ceph-logging-and-debugging
 610 .. _Logging and Debugging: ../log-and-debug
 611 .. _Debugging and Logging: ../debug
 612 .. _Monitor/OSD Interaction: ../../configuration/mon-osd-interaction
 613 .. _Monitor Config Reference: ../../configuration/mon-config-ref
 614 .. _monitoring your OSDs: ../../operations/monitoring-osd-pg
 615 .. _subscribe to the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=subscribe+ceph-devel
 616 .. _unsubscribe from the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=unsubscribe+ceph-devel
 617 .. _subscribe to the ceph-users email list: mailto:ceph-users-join@lists.ceph.com
 618 .. _unsubscribe from the ceph-users email list: mailto:ceph-users-leave@lists.ceph.com
 619 .. _OS recommendations: ../../../start/os-recommendations
.. _ceph-devel: mailto:ceph-devel@vger.kernel.org