=====================================
 Configuring Monitor/OSD Interaction
=====================================

.. index:: heartbeat

After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
Monitor doesn't receive reports, or if it receives reports of changes in the
Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph
Cluster Map`.

Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.

.. index:: heartbeat interval

OSDs Check Heartbeats
=====================

Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6
seconds. You can change the heartbeat interval by adding an
``osd heartbeat interval`` setting under the ``[osd]`` section of your Ceph
configuration file, or by setting the value at runtime. If a neighboring Ceph
OSD Daemon doesn't show a heartbeat within a 20-second grace period, the Ceph
OSD Daemon may consider the neighboring Ceph OSD Daemon ``down`` and report it
back to a Ceph Monitor, which will update the Ceph Cluster Map. You may change
this grace period by adding an ``osd heartbeat grace`` setting under both the
``[mon]`` and ``[osd]`` sections, or under the ``[global]`` section, of your
Ceph configuration file, or by setting the value at runtime.


.. ditaa:: +---------+          +---------+
           |  OSD 1  |          |  OSD 2  |
           +---------+          +---------+
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |<-------------------|
                |   Heart Beating    |
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |----+ Grace         |
                |    | Period        |
                |<---+ Exceeded      |
                |                    |
                |----+ Mark          |
                |    | OSD 2         |
                |<---+ Down          |

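The heartbeat exchange above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not Ceph's implementation: ``HeartbeatTracker`` and ``simulate`` are hypothetical names, time is measured in whole seconds, and the model checks the grace period only when the OSD sends its next round of pings.

```python
from dataclasses import dataclass, field

# Hypothetical model; the names below are illustrative, not Ceph internals.
HEARTBEAT_INTERVAL = 6   # osd heartbeat interval (seconds)
HEARTBEAT_GRACE = 20     # osd heartbeat grace (seconds)


@dataclass
class HeartbeatTracker:
    # Maps a peer name to the time (seconds) of its most recent reply.
    last_reply: dict = field(default_factory=dict)

    def record_reply(self, peer: str, now: int) -> None:
        self.last_reply[peer] = now

    def peers_to_report_down(self, now: int) -> list:
        """Return peers whose last reply is older than the grace period."""
        return sorted(
            peer for peer, seen in self.last_reply.items()
            if now - seen > HEARTBEAT_GRACE
        )


def simulate(reply_until: dict, horizon: int) -> dict:
    """Ping every peer each HEARTBEAT_INTERVAL seconds. A peer answers only
    while now <= reply_until[peer]. Return the first time at which each
    silent peer would be reported to a monitor as down."""
    tracker = HeartbeatTracker()
    reported = {}
    for now in range(0, horizon + 1, HEARTBEAT_INTERVAL):
        for peer, last_ok in reply_until.items():
            if now <= last_ok:
                tracker.record_reply(peer, now)
        for peer in tracker.peers_to_report_down(now):
            reported.setdefault(peer, now)
    return reported


print(simulate({"osd.2": 6, "osd.3": 100}, horizon=60))  # {'osd.2': 30}
```

With the default values, a peer that last replied at t=6 is reported at t=30 rather than t=26, because in this model the 20-second grace period is only evaluated on 6-second ping ticks.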

.. index:: OSD down report

OSDs Report Down OSDs
=====================

By default, two Ceph OSD Daemons from different hosts must report to the Ceph
Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
acknowledge that the reported Ceph OSD Daemon is ``down``. However, all of the
OSDs that report the failure might be hosted in a rack with a bad switch that
has trouble connecting to another OSD. To avoid this sort of false alarm, Ceph
treats the peers that report a failure as a proxy for a potential "subcluster"
within the overall cluster that is similarly laggy. This assumption does not
hold in all cases, but it sometimes helps to localize the grace correction to
the subset of the system that is unhappy. The ``mon osd reporter subtree level``
setting groups the reporting peers into a "subcluster" by their common ancestor
type in the CRUSH map. By default, reports from two different subtrees are
required to report another Ceph OSD Daemon ``down``. You can change the number
of reporters from unique subtrees, and the common ancestor type used to group
them, by adding the ``mon osd min down reporters`` and
``mon osd reporter subtree level`` settings under the ``[mon]`` section of your
Ceph configuration file, or by setting the values at runtime.


.. ditaa:: +---------+     +---------+      +---------+
           |  OSD 1  |     |  OSD 2  |      | Monitor |
           +---------+     +---------+      +---------+
                |               |                |
                | OSD 3 Is Down |                |
                |---------------+--------------->|
                |               |                |
                |               |                |
                |               | OSD 3 Is Down  |
                |               |--------------->|
                |               |                |
                |               |                |
                |               |                |---------+ Mark
                |               |                |         | OSD 3
                |               |                |<--------+ Down

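The reporter rule can be sketched as a count of distinct CRUSH ancestors. In this minimal Python sketch, ``should_mark_down`` and the ``CRUSH_LOCATION`` layout are hypothetical, and the grace-period timing described in the previous section is deliberately left out; the sketch only shows how grouping reporters at ``mon osd reporter subtree level`` changes what counts as independent reports.

```python
# Hypothetical model of the monitor-side reporter check; not Ceph's code.
MIN_DOWN_REPORTERS = 2            # mon osd min down reporters
REPORTER_SUBTREE_LEVEL = "host"   # mon osd reporter subtree level

# Assumed CRUSH layout: OSD name -> {CRUSH type: bucket name}.
CRUSH_LOCATION = {
    "osd.1": {"host": "node1", "rack": "rack1"},
    "osd.4": {"host": "node1", "rack": "rack1"},
    "osd.5": {"host": "node2", "rack": "rack1"},
    "osd.7": {"host": "node3", "rack": "rack2"},
}


def should_mark_down(reporters, crush_location=CRUSH_LOCATION,
                     min_reporters=MIN_DOWN_REPORTERS,
                     level=REPORTER_SUBTREE_LEVEL):
    """Count reporters by their distinct ancestor bucket at `level`."""
    subtrees = {crush_location[osd][level] for osd in reporters}
    return len(subtrees) >= min_reporters


# Two reporters on the same host count as a single subtree.
print(should_mark_down({"osd.1", "osd.4"}))                # False
print(should_mark_down({"osd.1", "osd.5"}))                # True
# Grouping by rack turns that same pair into a single subtree.
print(should_mark_down({"osd.1", "osd.5"}, level="rack"))  # False
print(should_mark_down({"osd.1", "osd.7"}, level="rack"))  # True
```

Raising the subtree level trades sensitivity for robustness: a failing switch that isolates one rack can no longer produce enough independent reports on its own.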

.. index:: peering failure

OSDs Report Peering Failure
===========================

If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.

.. ditaa:: +---------+     +---------+     +-------+     +---------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |     | Monitor |
           +---------+     +---------+     +-------+     +---------+
                |               |              |              |
                |  Request To   |              |              |
                |     Peer      |              |              |
                |-------------->|              |              |
                |<--------------|              |              |
                |    Peering                   |              |
                |                              |              |
                |  Request To                  |              |
                |     Peer                     |              |
                |----------------------------->|              |
                |                                             |
                |----+ OSD Monitor                            |
                |    | Heartbeat                              |
                |<---+ Interval Exceeded                      |
                |                                             |
                |         Failed to Peer with OSD 3           |
                |-------------------------------------------->|
                |<--------------------------------------------|
                |          Receive New Cluster Map            |

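A minimal sketch of this fallback, assuming one-second ticks and an OSD that last contacted a monitor at t=0. The functions ``should_request_map`` and ``map_request_times`` are hypothetical; the sketch only shows that an OSD with peers does not poll, while a peerless OSD asks for a fresh map once per ``osd mon heartbeat interval``.

```python
# Hypothetical model of a peerless OSD falling back to a monitor.
OSD_MON_HEARTBEAT_INTERVAL = 30  # osd mon heartbeat interval (seconds)


def should_request_map(num_peers, now, last_request,
                       interval=OSD_MON_HEARTBEAT_INTERVAL):
    # OSDs that can peer with someone do not need this fallback.
    if num_peers > 0:
        return False
    return now - last_request >= interval


def map_request_times(peer_counts, interval=OSD_MON_HEARTBEAT_INTERVAL):
    """peer_counts[t] is the number of reachable peers at second t.
    Return the seconds at which the OSD asks a monitor for a new map."""
    last_request = 0
    requests = []
    for now, peers in enumerate(peer_counts):
        if should_request_map(peers, now, last_request, interval):
            requests.append(now)
            last_request = now
    return requests


print(map_request_times([0] * 95))             # [30, 60, 90]
print(map_request_times([0] * 45 + [2] * 50))  # [30]
```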

.. index:: OSD status

OSDs Report Their Status
========================

If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor when a reportable
event occurs, such as a failure, a change in placement group stats, a change in
``up_thru``, or a boot. By default, the minimum interval for such reports is 5
seconds. You can change the Ceph OSD Daemon minimum report interval by adding
an ``osd mon report interval min`` setting under the ``[osd]`` section of your
Ceph configuration file, or by setting the value at runtime. A Ceph OSD Daemon
also sends a report to a Ceph Monitor every 120 seconds, whether or not any
notable changes have occurred. You can change this maximum report interval by
adding an ``osd mon report interval max`` setting under the ``[osd]`` section
of your Ceph configuration file, or by setting the value at runtime.


.. ditaa:: +---------+          +---------+
           |  OSD 1  |          | Monitor |
           +---------+          +---------+
                |                    |
                |----+ Report Min    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |----+ Reportable    |
                |    | Event         |
                |<---+ Occurs        |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Report Max    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Monitor       |
                |    | Fails         |
                |<---+               |
                                     +----+ Monitor OSD
                                     |    | Report Timeout
                                     |<---+ Exceeded
                                     |
                                     +----+ Mark
                                     |    | OSD 1
                                     |<---+ Down

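The two OSD-side intervals and the monitor-side timeout can be sketched as follows. This is a simplified Python model, not Ceph's implementation: it treats the minimum interval as a throttle measured from the previous report, which is one reading of ``osd mon report interval min``, and the names ``osd_should_report``, ``mon_marks_down`` and ``report_times`` are hypothetical.

```python
# Simplified model of OSD status reporting; names are hypothetical.
REPORT_INTERVAL_MIN = 5     # osd mon report interval min (seconds)
REPORT_INTERVAL_MAX = 120   # osd mon report interval max (seconds)
MON_REPORT_TIMEOUT = 900    # mon osd report timeout (seconds)


def osd_should_report(now, last_report, event_pending,
                      min_interval=REPORT_INTERVAL_MIN,
                      max_interval=REPORT_INTERVAL_MAX):
    since = now - last_report
    if since >= max_interval:
        return True  # periodic report, whether or not anything changed
    return event_pending and since >= min_interval


def mon_marks_down(now, last_report_seen, timeout=MON_REPORT_TIMEOUT):
    # The monitor gives up on an OSD that has been silent too long.
    return now - last_report_seen > timeout


def report_times(event_times, horizon):
    """Seconds (1..horizon) at which the OSD sends a report, given the
    seconds at which reportable events occur. Assumes a report at t=0."""
    events = set(event_times)
    last_report, pending, reports = 0, False, []
    for now in range(1, horizon + 1):
        pending = pending or now in events
        if osd_should_report(now, last_report, pending):
            reports.append(now)
            last_report, pending = now, False
    return reports


print(report_times([2, 50], horizon=200))             # [5, 50, 170]
print(mon_marks_down(now=1000, last_report_seen=50))  # True
```

In this model the event at t=2 is held back until t=5 by the minimum interval, the event at t=50 is sent immediately, and the report at t=170 is the periodic one triggered by the maximum interval.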


Configuration Settings
======================

When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.

.. index:: monitor heartbeat

Monitor Settings
----------------

``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.

:Type: Double
:Default: ``.3``


``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.

:Type: Double
:Default: ``.75``


``mon osd laggy halflife``

:Description: The half-life, in seconds, of the decay applied to laggy
              estimates.
:Type: Integer
:Default: ``60*60``


``mon osd laggy weight``

:Description: The weight for new samples in laggy estimation decay.
:Type: Double
:Default: ``0.3``


``mon osd laggy max interval``

:Description: The maximum value of ``laggy_interval`` in laggy estimations (in
              seconds). The monitor uses an adaptive approach to evaluate the
              ``laggy_interval`` of a given OSD, and uses this value to
              calculate the grace time for that OSD.
:Type: Integer
:Default: ``300``


``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.
:Type: Boolean
:Default: ``true``

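To make the interplay of the laggy settings concrete, here is a simplified Python model. It is an assumption made for illustration, not Ceph's monitor code: laggy estimates are updated as exponentially weighted averages using ``mon osd laggy weight``, the interval estimate is capped at ``mon osd laggy max interval``, and the extra grace added when ``mon osd adjust heartbeat grace`` is enabled halves every ``mon osd laggy halflife`` seconds that a failure has been outstanding. ``update_laggy`` and ``adjusted_grace`` are hypothetical names.

```python
import math

# Simplified model for illustration only; not Ceph's monitor source.
OSD_HEARTBEAT_GRACE = 20     # osd heartbeat grace (seconds)
LAGGY_HALFLIFE = 60 * 60     # mon osd laggy halflife (seconds)
LAGGY_WEIGHT = 0.3           # mon osd laggy weight
LAGGY_MAX_INTERVAL = 300     # mon osd laggy max interval (seconds)


def update_laggy(probability, interval, was_laggy, observed_interval=0.0):
    """Blend one observation into the laggy estimates with an exponentially
    weighted moving average that gives new samples weight LAGGY_WEIGHT."""
    w = LAGGY_WEIGHT
    if was_laggy:
        probability = w + (1.0 - w) * probability
        interval = w * observed_interval + (1.0 - w) * interval
    else:
        probability = (1.0 - w) * probability
    # Keep the interval estimate bounded by the configured maximum.
    return probability, min(interval, LAGGY_MAX_INTERVAL)


def adjusted_grace(laggy_probability, laggy_interval, failed_for):
    """Base grace plus a laggy bonus that halves every LAGGY_HALFLIFE
    seconds the failure has been outstanding."""
    decay = math.exp(failed_for * math.log(0.5) / LAGGY_HALFLIFE)
    bonus = decay * min(laggy_interval, LAGGY_MAX_INTERVAL) * laggy_probability
    return OSD_HEARTBEAT_GRACE + bonus


p, i = update_laggy(0.0, 0.0, was_laggy=True, observed_interval=100.0)
print(adjusted_grace(p, i, failed_for=0))               # 29.0
print(round(adjusted_grace(p, i, failed_for=3600), 6))  # 24.5
```

The effect of the model is that an OSD with a history of brief, self-resolving outages gets a longer grace period, and that extra allowance fades as the current failure persists.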

``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale
              ``mon osd down out interval`` based on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.

:Type: Boolean
:Default: ``false``


``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons that were automatically
              marked ``out`` of the Ceph Storage Cluster as ``in`` the cluster.

:Type: Boolean
:Default: ``true``


``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.

:Type: Boolean
:Default: ``true``


``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
              ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer
:Default: ``600``


``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark out. For instance, if set to ``host`` and if
              all OSDs of a host are down, Ceph will not automatically mark out
              these OSDs.

:Type: String
:Default: ``rack``

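The conditions for automatically marking a ``down`` OSD ``out`` can be sketched as a single predicate. This Python sketch is a simplified model, not Ceph's implementation: ``may_mark_out`` is hypothetical, the CRUSH hierarchy is reduced to an assumed ``osd``/``host``/``rack``/``row``/``root`` ordering, the ``in`` ratio is checked on the current state, and laggy scaling of the interval (``mon osd adjust down out interval``) is ignored.

```python
# Simplified, hypothetical model of the monitor's automatic "out" decision.
DOWN_OUT_INTERVAL = 600          # mon osd down out interval (seconds)
MIN_IN_RATIO = 0.75              # mon osd min in ratio
DOWN_OUT_SUBTREE_LIMIT = "rack"  # mon osd down out subtree limit

# Assumed CRUSH type ordering, smallest to largest.
HIERARCHY = ["osd", "host", "rack", "row", "root"]


def may_mark_out(down_for, num_in, num_osds, fully_down_types,
                 interval=DOWN_OUT_INTERVAL, min_in_ratio=MIN_IN_RATIO,
                 subtree_limit=DOWN_OUT_SUBTREE_LIMIT):
    """fully_down_types: CRUSH types of this OSD's ancestors whose OSDs
    are all down (for example {"host"} when its whole host is down)."""
    if down_for < interval:
        return False  # still within the down-out interval
    if num_in / num_osds < min_in_ratio:
        return False  # too few OSDs are currently "in"
    limit = HIERARCHY.index(subtree_limit)
    # A whole subtree at or above the limit type is down: leave it "in".
    if any(HIERARCHY.index(t) >= limit for t in fully_down_types):
        return False
    return True


print(may_mark_out(700, 10, 10, set()))                           # True
print(may_mark_out(700, 10, 10, {"host"}))                        # True
print(may_mark_out(700, 10, 10, {"host"}, subtree_limit="host"))  # False
print(may_mark_out(700, 7, 10, set()))                            # False
```

The third call mirrors the ``host`` example in the ``mon osd down out subtree limit`` description: when an entire host is down and the limit is ``host``, its OSDs are not marked ``out`` automatically.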

``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer
:Default: ``900``


``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer
:Default: ``2``


``mon osd reporter subtree level``

:Description: The level of parent bucket (CRUSH type) at which reporters are
              counted. OSDs send failure reports to a Ceph Monitor if they find
              that a peer is not responsive, and the monitor marks the reported
              OSD ``down`` after a grace period and, later, ``out``.
:Type: String
:Default: ``host``


.. index:: OSD heartbeat

OSD Settings
------------

``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.
:Type: Address
:Default: The host address.


``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).
:Type: 32-bit Integer
:Default: ``6``


``osd heartbeat grace``

:Description: The elapsed time, in seconds, without a heartbeat from a Ceph OSD
              Daemon after which the Ceph Storage Cluster considers it
              ``down``. This setting must be set under both the ``[mon]`` and
              ``[osd]`` sections, or under the ``[global]`` section, so that it
              is read by both the monitor and OSD daemons.
:Type: 32-bit Integer
:Default: ``20``


``osd mon heartbeat interval``

:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
              Ceph OSD Daemon peers.

:Type: 32-bit Integer
:Default: ``30``


``osd mon report interval max``

:Description: The maximum time in seconds that a Ceph OSD Daemon can wait before
              it must report to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``120``


``osd mon report interval min``

:Description: The minimum number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``5``
:Valid Range: Should be less than ``osd mon report interval max``


``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
              request for statistics.

:Type: 32-bit Integer
:Default: ``30``
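Because ``osd heartbeat grace`` must be visible to both the monitors and the OSDs, and ``osd mon report interval min`` should be less than ``osd mon report interval max``, a configuration file can be sanity-checked before it is deployed. The following Python sketch uses the standard library's ``configparser``. ``check_heartbeat_config`` is a hypothetical helper, not a Ceph tool, and it only approximates how Ceph resolves options: it treats underscores and spaces in option names as equivalent, reads only the ``[global]``, ``[mon]`` and ``[osd]`` sections, and ignores per-daemon sections such as ``[osd.0]``.

```python
import configparser

GRACE = "osd heartbeat grace"


def check_heartbeat_config(text):
    """Hypothetical helper: return a list of problems in a ceph.conf-style
    string. It only approximates how Ceph resolves options."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, inline_comment_prefixes=("#", ";"))
    # Treat "osd_heartbeat_grace" and "osd heartbeat grace" as the same key.
    parser.optionxform = lambda key: key.strip().lower().replace("_", " ")
    # Drop indentation so indented keys are not read as continuation lines.
    parser.read_string("\n".join(line.strip() for line in text.splitlines()))

    def has(section, key):
        return parser.has_option(section, key)

    problems = []
    in_mon, in_osd = has("mon", GRACE), has("osd", GRACE)
    if (in_mon or in_osd) and not has("global", GRACE) and not (in_mon and in_osd):
        problems.append(f"{GRACE} must be in [global] or in both [mon] and [osd]")

    def osd_value(key, default):
        # An [osd] value overrides [global]; otherwise use the default.
        for section in ("osd", "global"):
            if has(section, key):
                return float(parser.get(section, key))
        return default

    low = osd_value("osd mon report interval min", 5)
    high = osd_value("osd mon report interval max", 120)
    if not low < high:
        problems.append("osd mon report interval min should be less than max")
    return problems


sample = """
[global]
osd heartbeat interval = 6
[osd]
    osd_heartbeat_grace = 30  # only the OSDs will see this value
"""
print(check_heartbeat_config(sample))
```

Moving ``osd heartbeat grace`` into ``[global]``, or repeating it under ``[mon]``, makes the first check pass.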