=====================================
 Configuring Monitor/OSD Interaction
=====================================
After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
Monitor doesn't receive reports, or if it receives reports of changes in the
Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph
Storage Cluster` map.

Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.
.. index:: heartbeat interval

OSDs Check Heartbeats
=====================

Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6
seconds. You can change the heartbeat interval by adding an ``osd heartbeat
interval`` setting under the ``[osd]`` section of your Ceph configuration file,
or by setting the value at runtime. If a neighboring Ceph OSD Daemon doesn't
show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may
consider the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph
Monitor, which will update the Ceph Cluster Map. You can change this grace
period by adding an ``osd heartbeat grace`` setting under the ``[mon]``
and ``[osd]`` or ``[global]`` sections of your Ceph configuration file,
or by setting the value at runtime.
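For example, a cluster on a congested network might lengthen both values in
``ceph.conf``. The numbers below are illustrative, not tuning recommendations:

.. code-block:: ini

    [osd]
    # check peer heartbeats every 6 seconds (the default)
    osd heartbeat interval = 6

    [global]
    # allow 30 seconds (default 20) without a heartbeat before a peer may be
    # reported down; [global] ensures both monitors and OSDs read the value
    osd heartbeat grace = 30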
.. ditaa::

           +---------+          +---------+
           |  OSD 1  |          |  OSD 2  |
           +---------+          +---------+
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |      Heartbeat     |
                |------------------->|
                |                    |
                |<-------------------|
                |   Heart Beating    |
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |      Heartbeat     |
                |------------------->|
                |                    |
                |----+ Grace         |
                |    | Period        |
                |<---+ Exceeded      |
                |                    |
                |----+ Mark          |
                |    | OSD 2         |
                |<---+ Down          |
.. index:: OSD down report

OSDs Report Down OSDs
=====================

By default, two Ceph OSD Daemons from different hosts must report to the Ceph
Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
acknowledge that the reported Ceph OSD Daemon is ``down``. However, there is a
chance that all the OSDs reporting the failure are hosted in a rack with a bad
switch that has trouble connecting to another OSD. To avoid this sort of false
alarm, we consider the peers reporting a failure to be a proxy for a potential
"subcluster" of the overall cluster that is similarly laggy. This is clearly
not true in all cases, but will sometimes help us localize the grace correction
to a subset of the system that is unhappy. ``mon osd reporter subtree level``
is used to group the peers into the "subcluster" by their common ancestor type
in the CRUSH map. By default, only two reports from different subtrees are
required to report another Ceph OSD Daemon ``down``. You can change the number
of reporters from unique subtrees and the common ancestor type required to
report a Ceph OSD Daemon ``down`` to a Ceph Monitor by adding ``mon osd min
down reporters`` and ``mon osd reporter subtree level`` settings under the
``[mon]`` section of your Ceph configuration file, or by setting the values at
runtime.
.. ditaa::

           +---------+     +---------+      +---------+
           |  OSD 1  |     |  OSD 2  |      | Monitor |
           +---------+     +---------+      +---------+
                |               |                |
                | OSD 3 Is Down |                |
                |---------------+--------------->|
                |               |                |
                |               |                |
                |               | OSD 3 Is Down  |
                |               |--------------->|
                |               |                |
                |               |                |
                |               |                |---------+ Mark
                |               |                |         | OSD 3
                |               |                |<--------+ Down
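For example, to require failure reports from three distinct racks before an OSD
is marked ``down``, you might configure the monitors as follows (illustrative
values):

.. code-block:: ini

    [mon]
    # require reporters from 3 distinct subtrees instead of the default 2
    mon osd min down reporters = 3
    # count reporters by rack rather than by host
    mon osd reporter subtree level = rack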
.. index:: peering failure

OSDs Report Peering Failure
===========================
If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.
.. ditaa::

           +---------+     +---------+     +-------+     +---------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |     | Monitor |
           +---------+     +---------+     +-------+     +---------+
                |               |              |              |
                |  Request To   |              |              |
                |     Peer      |              |              |
                |-------------->|              |              |
                |<--------------|              |              |
                |    Peering    |              |              |
                |               |              |              |
                |  Request To   |              |              |
                |     Peer      |              |              |
                |----------------------------->|              |
                |                              |              |
                |----+ OSD Monitor             |              |
                |    | Heartbeat               |              |
                |<---+ Interval Exceeded       |              |
                |                              |              |
                |         Failed to Peer with OSD 3           |
                |-------------------------------------------->|
                |<--------------------------------------------|
                |          Receive New Cluster Map            |
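For example, to have an OSD that has lost all of its peers ask a monitor for a
fresh cluster map more frequently, you could shorten the interval (illustrative
value):

.. code-block:: ini

    [osd]
    # ping a monitor for a new cluster map every 15 seconds (default 30)
    # when no OSD peers are reachable
    osd mon heartbeat interval = 15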
.. index:: OSD status

OSDs Report Their Status
========================
If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds of
a reportable event, such as a failure, a change in placement group stats, a
change in ``up_thru``, or a boot. You can change the Ceph OSD Daemon minimum
report interval by adding an ``osd mon report interval min`` setting under the
``[osd]`` section of your Ceph configuration file, or by setting the value at
runtime. A Ceph OSD Daemon also sends a report to a Ceph Monitor every 120
seconds irrespective of whether any notable changes occur. You can change this
maximum report interval by adding an ``osd mon report interval max`` setting
under the ``[osd]`` section of your Ceph configuration file, or by setting the
value at runtime.
.. ditaa::

           +---------+          +---------+
           |  OSD 1  |          | Monitor |
           +---------+          +---------+
                |                    |
                |----+ Reportable    |
                |    | Event         |
                |<---+ Occurs        |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Report Max    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
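For example, the reporting intervals described above could be tightened as
follows (illustrative values, not tuning recommendations):

.. code-block:: ini

    [osd]
    # report to a monitor within 5 seconds of a reportable event (the default)
    osd mon report interval min = 5
    # always report at least every 60 seconds (default 120)
    osd mon report interval max = 60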
Configuration Settings
======================

When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.
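As a sketch, a minimal ``[global]`` block combining two of the heartbeat-related
settings documented in this section might look like this (the values shown are
illustrative):

.. code-block:: ini

    [global]
    # how long a monitor waits for an OSD report before marking the OSD down
    mon osd report timeout = 900
    # heartbeat grace placed in [global] so both monitors and OSDs read it
    osd heartbeat grace = 20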
.. index:: monitor heartbeat

``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.

``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.
``mon osd laggy halflife``

:Description: The number of seconds laggy estimates will decay.

``mon osd laggy weight``

:Description: The weight for new samples in laggy estimation decay.
``mon osd laggy max interval``

:Description: Maximum value of ``laggy_interval`` in laggy estimations (in seconds).
              The monitor uses an adaptive approach to evaluate the ``laggy_interval``
              of a certain OSD. This value is used to calculate the grace time for
              that OSD.
``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.

``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale the ``mon osd down out
              interval`` based on laggy estimations.
``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.

``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons that were automatically
              marked ``out`` of the Ceph Storage Cluster as ``in`` the cluster.

``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.
``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
              ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer
``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark ``out``. For instance, if set to ``host`` and
              if all OSDs of a host are down, Ceph will not automatically mark
              those OSDs ``out``.
``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer
``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer
:Default: ``2``
``mon osd reporter subtree level``

:Description: The level of the parent CRUSH bucket under which reporters are
              counted as unique. OSDs send failure reports to a monitor if they
              find that a peer is not responsive, and the monitor marks the
              reported OSD ``down``, and eventually ``out``, after a grace
              period.
.. index:: OSD heartbeat

``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.

:Default: The host address.
``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).

:Type: 32-bit Integer
:Default: ``6``
``osd heartbeat grace``

:Description: The elapsed time after which, if a Ceph OSD Daemon hasn't shown
              a heartbeat, the Ceph Storage Cluster considers it ``down``.
              This setting must be set in both the ``[mon]`` and ``[osd]``
              sections, or in the ``[global]`` section, so that it is read by
              both the monitor and OSD daemons.

:Type: 32-bit Integer
:Default: ``20``
``osd mon heartbeat interval``

:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
              Ceph OSD Daemon peers.

:Type: 32-bit Integer
:Default: ``30``
``osd mon report interval max``

:Description: The maximum time in seconds that a Ceph OSD Daemon can wait before
              it must report to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``120``
``osd mon report interval min``

:Description: The minimum number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``5``
:Valid Range: Should be less than ``osd mon report interval max``
``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
              request for statistics.

:Type: 32-bit Integer