=====================================
 Configuring Monitor/OSD Interaction
=====================================
After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
Monitor doesn't receive reports, or if it receives reports of changes in the
Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph
Cluster Map`.
Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.
.. index:: heartbeat interval

OSDs Check Heartbeats
=====================
Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons at random
intervals of less than 6 seconds. If a neighboring Ceph OSD Daemon doesn't show
a heartbeat within a 20 second grace period, the Ceph OSD Daemon may consider
the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph Monitor,
which will update the Ceph Cluster Map. You may change this grace period by
adding an ``osd heartbeat grace`` setting under the ``[mon]`` and ``[osd]`` or
``[global]`` sections of your Ceph configuration file, or by setting the value
at runtime.
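For example, a cluster whose network is prone to brief stalls might relax the
grace period in the configuration file (the value below is illustrative, not a
recommended default)::

    [global]
            osd heartbeat grace = 30

On releases that support the centralized configuration database, the same value
can also be set at runtime with ``ceph config set global osd_heartbeat_grace 30``.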
.. ditaa:: +---------+          +---------+
           |  OSD 1  |          |  OSD 2  |
           +---------+          +---------+
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |<-------------------|
                |   Heart Beating    |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |----+ Grace         |
                |    | Period        |
                |<---+ Exceeded      |
                |                    |
                |----+ Mark          |
                |    | OSD 2         |
                |<---+ Down          |
.. index:: OSD down report

OSDs Report Down OSDs
=====================
By default, two Ceph OSD Daemons from different hosts must report to the Ceph
Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
acknowledge that the reported Ceph OSD Daemon is ``down``. However, there is a
chance that all the OSDs reporting the failure are hosted in a rack with a bad
switch that has trouble connecting to another OSD. To avoid this sort of false
alarm, we consider the peers reporting a failure a proxy for a potential
"subcluster" of the overall cluster that is similarly laggy. This is clearly
not true in all cases, but will sometimes help us localize the grace correction
to a subset of the system that is unhappy. ``mon osd reporter subtree level``
is used to group the peers into the "subcluster" by their common ancestor type
in the CRUSH map. By default, only two reports from different subtrees are
required to report another Ceph OSD Daemon ``down``. You can change the number
of reporters from unique subtrees and the common ancestor type required to
report a Ceph OSD Daemon ``down`` to a Ceph Monitor by adding ``mon osd min
down reporters`` and ``mon osd reporter subtree level`` settings under the
``[mon]`` section of your Ceph configuration file, or by setting the value at
runtime.
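For example, to require failure reports from three OSDs whose nearest common
ancestor in the CRUSH map is a ``rack`` (illustrative values)::

    [mon]
            mon osd min down reporters = 3
            mon osd reporter subtree level = rack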
.. ditaa:: +---------+     +---------+      +---------+
           |  OSD 1  |     |  OSD 2  |      | Monitor |
           +---------+     +---------+      +---------+
                |               |                |
                | OSD 3 Is Down |                |
                |---------------+--------------->|
                |               |                |
                |               | OSD 3 Is Down  |
                |               |--------------->|
                |               |                |
                |               |                |----+ Mark
                |               |                |    | OSD 3
                |               |                |<---+ Down
.. index:: peering failure
OSDs Report Peering Failure
===========================
If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.
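For example, to have an OSD that has lost its peers poll the monitors for a new
cluster map more frequently (the value is illustrative)::

    [osd]
            osd mon heartbeat interval = 15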
.. ditaa:: +---------+     +---------+     +-------+     +---------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |     | Monitor |
           +---------+     +---------+     +-------+     +---------+
                |               |              |              |
                |  Request To   |              |              |
                |     Peer      |              |              |
                |-------------->|              |              |
                |<--------------|              |              |
                |    Peering    |              |              |
                |               |              |              |
                |        Request To Peer       |              |
                |----------------------------->|              |
                |                              |              |
                |----+ OSD Mon Heartbeat       |              |
                |    | Interval                |              |
                |<---+ Exceeded                |              |
                |                              |              |
                |          Failed to Peer with OSD 3          |
                |-------------------------------------------->|
                |<--------------------------------------------|
                |           Receive New Cluster Map           |
.. index:: OSD status
OSDs Report Their Status
========================
If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds of
a reportable event such as a failure, a change in placement group stats, a
change in ``up_thru``, or booting. You can change the Ceph OSD Daemon minimum
report interval by adding an ``osd mon report interval`` setting under the
``[osd]`` section of your Ceph configuration file, or by setting the value at
runtime. A Ceph OSD Daemon also sends a report to a Ceph Monitor every 120
seconds irrespective of whether any notable changes occur. You can change this
maximum report interval by adding an ``osd mon report interval max`` setting
under the ``[osd]`` section of your Ceph configuration file, or by setting the
value at runtime.
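For example, to shorten the maximum interval between routine status reports
(the values are illustrative)::

    [osd]
            osd mon report interval = 5
            osd mon report interval max = 90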
.. ditaa:: +---------+          +---------+
           |  OSD 1  |          | Monitor |
           +---------+          +---------+
                |                    |
                |----+ Reportable    |
                |    | Event         |
                |<---+ Occurs        |
                |                    |
                |   Report Status    |
                |------------------->|
                |                    |
                |----+ Max Report    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |   Report Status    |
                |------------------->|
Configuration Settings
======================
When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.
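For example, a minimal ``[global]`` stanza touching the heartbeat settings
described below might look like this (the values are illustrative)::

    [global]
            osd heartbeat grace = 25
            osd heartbeat interval = 6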
.. index:: monitor heartbeat

Monitor Settings
----------------
``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.


``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.


``mon osd laggy halflife``

:Description: The number of seconds over which laggy estimates decay.


``mon osd laggy weight``

:Description: The weight for new samples in the laggy estimation decay.


``mon osd laggy max interval``

:Description: Maximum value of ``laggy_interval`` in laggy estimations (in
              seconds). The monitor uses an adaptive approach to evaluate the
              ``laggy_interval`` of a certain OSD. This value will be used to
              calculate the grace time for that OSD.
``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.


``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale the down/out interval based
              on laggy estimations.


``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.


``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons that were automatically
              marked ``out`` of the Ceph Storage Cluster as ``in`` the cluster.


``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.
``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
              ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer


``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark out. For instance, if set to ``host`` and if
              all OSDs of a host are down, Ceph will not automatically mark out
              those OSDs.


``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer
``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer


``mon osd reporter subtree level``

:Description: In which level of the parent CRUSH bucket the reporters are
              counted. The OSDs send failure reports to the monitor if they
              find that a peer is not responsive, and the monitor marks the
              reported OSD ``down`` and then, after a grace period, ``out``.
.. index:: OSD heartbeat

OSD Settings
------------
``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.

:Default: The host address.


``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).

:Type: 32-bit Integer


``osd heartbeat grace``

:Description: The elapsed time after which a Ceph OSD Daemon that hasn't shown
              a heartbeat is considered ``down`` by the Ceph Storage Cluster.
              This setting has to be set in both the ``[mon]`` and ``[osd]``
              (or ``[global]``) sections so that it is read by both the MON
              and OSD daemons.

:Type: 32-bit Integer
``osd mon heartbeat interval``

:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
              Ceph OSD Daemon peers.

:Type: 32-bit Integer


``osd mon heartbeat stat stale``

:Description: Stop reporting on heartbeat ping times which haven't been updated
              for this many seconds. Set to zero to disable this action.

:Type: 32-bit Integer
``osd mon report interval``

:Description: The number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer


``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
              request for statistics.

:Type: 32-bit Integer