=====================================
 Configuring Monitor/OSD Interaction
=====================================
After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
Monitor doesn't receive reports, or if it receives reports of changes in the
Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph
Cluster Map`.
Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.
.. index:: heartbeat interval

OSDs Check Heartbeats
=====================
Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons at random
intervals of less than 6 seconds. If a neighboring Ceph OSD Daemon doesn't show
a heartbeat within a 20 second grace period, the Ceph OSD Daemon may consider
the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph Monitor,
which will update the Ceph Cluster Map. You may change this grace period by
adding an ``osd heartbeat grace`` setting under the ``[mon]`` and ``[osd]`` or
``[global]`` sections of your Ceph configuration file, or by setting the value
at runtime.
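For example, a cluster whose network is prone to brief stalls might relax the
grace period in the configuration file (the value below is illustrative, not a
recommended default)::

    [global]
            osd heartbeat grace = 30

On releases that support the centralized configuration database, the same value
can also be set at runtime with ``ceph config set global osd_heartbeat_grace 30``.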
.. ditaa:: +---------+          +---------+
           |  OSD 1  |          |  OSD 2  |
           +---------+          +---------+
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |<-------------------|
                |   Heart Beating    |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |----+ Grace         |
                |    | Period        |
                |<---+ Exceeded      |
                |                    |
                |----+ Mark          |
                |    | OSD 2         |
                |<---+ Down          |
.. index:: OSD down report

OSDs Report Down OSDs
=====================
By default, two Ceph OSD Daemons from different hosts must report to the Ceph
Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
acknowledge that the reported Ceph OSD Daemon is ``down``. However, there is a
chance that all the OSDs reporting the failure are hosted in a rack with a bad
switch that has trouble connecting to another OSD. To avoid this sort of false
alarm, we consider the peers reporting a failure a proxy for a potential
"subcluster" of the overall cluster that is similarly laggy. This is clearly
not true in all cases, but will sometimes help us localize the grace correction
to a subset of the system that is unhappy. ``mon osd reporter subtree level``
is used to group the peers into the "subcluster" by their common ancestor type
in the CRUSH map. By default, only two reports from different subtrees are
required to report another Ceph OSD Daemon ``down``. You can change the number
of reporters from unique subtrees and the common ancestor type required to
report a Ceph OSD Daemon ``down`` to a Ceph Monitor by adding ``mon osd min
down reporters`` and ``mon osd reporter subtree level`` settings under the
``[mon]`` section of your Ceph configuration file, or by setting the value at
runtime.
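For example, to require failure reports from three OSDs whose nearest common
ancestor in the CRUSH map is a ``rack`` (illustrative values)::

    [mon]
            mon osd min down reporters = 3
            mon osd reporter subtree level = rack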
.. ditaa:: +---------+     +---------+      +---------+
           |  OSD 1  |     |  OSD 2  |      | Monitor |
           +---------+     +---------+      +---------+
                |               |                |
                | OSD 3 Is Down |                |
                |---------------+--------------->|
                |               |                |
                |               | OSD 3 Is Down  |
                |               |--------------->|
                |               |                |
                |               |                |----+ Mark
                |               |                |    | OSD 3
                |               |                |<---+ Down
.. index:: peering failure
OSDs Report Peering Failure
===========================
If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.
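For example, to have an OSD that has lost its peers poll the monitors for a new
cluster map more frequently (the value is illustrative)::

    [osd]
            osd mon heartbeat interval = 15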
.. ditaa:: +---------+     +---------+     +-------+     +---------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |     | Monitor |
           +---------+     +---------+     +-------+     +---------+
                |               |              |              |
                |  Request To   |              |              |
                |     Peer      |              |              |
                |-------------->|              |              |
                |<--------------|              |              |
                |    Peering    |              |              |
                |               |              |              |
                |        Request To Peer       |              |
                |----------------------------->|              |
                |                              |              |
                |----+ OSD Mon Heartbeat       |              |
                |    | Interval                |              |
                |<---+ Exceeded                |              |
                |                              |              |
                |          Failed to Peer with OSD 3          |
                |-------------------------------------------->|
                |<--------------------------------------------|
                |           Receive New Cluster Map           |
.. index:: OSD status
OSDs Report Their Status
========================
If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds of
a reportable event such as a failure, a change in placement group stats, a
change in ``up_thru``, or booting. You can change the Ceph OSD Daemon minimum
report interval by adding an ``osd mon report interval`` setting under the
``[osd]`` section of your Ceph configuration file, or by setting the value at
runtime. A Ceph OSD Daemon also sends a report to a Ceph Monitor every 120
seconds irrespective of whether any notable changes occur. You can change this
maximum report interval by adding an ``osd mon report interval max`` setting
under the ``[osd]`` section of your Ceph configuration file, or by setting the
value at runtime.
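For example, to shorten the maximum interval between routine status reports
(the values are illustrative)::

    [osd]
            osd mon report interval = 5
            osd mon report interval max = 90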
.. ditaa:: +---------+          +---------+
           |  OSD 1  |          | Monitor |
           +---------+          +---------+
                |                    |
                |----+ Reportable    |
                |    | Event         |
                |<---+ Occurs        |
                |                    |
                |   Report Status    |
                |------------------->|
                |                    |
                |----+ Max Report    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |   Report Status    |
                |------------------->|
Configuration Settings
======================
When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.
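For example, a minimal ``[global]`` stanza touching the heartbeat settings
described below might look like this (the values are illustrative)::

    [global]
            osd heartbeat grace = 25
            osd heartbeat interval = 6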
.. index:: monitor heartbeat

Monitor Settings
----------------
``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.


``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.


``mon osd laggy halflife``

:Description: The number of seconds over which laggy estimates decay.


``mon osd laggy weight``

:Description: The weight for new samples in the laggy estimation decay.


``mon osd laggy max interval``

:Description: Maximum value of ``laggy_interval`` in laggy estimations (in
              seconds). The monitor uses an adaptive approach to evaluate the
              ``laggy_interval`` of a certain OSD. This value will be used to
              calculate the grace time for that OSD.
``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.


``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale the down/out interval based
              on laggy estimations.


``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.


``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons that were automatically
              marked ``out`` of the Ceph Storage Cluster as ``in`` the cluster.


``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.
``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
              ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer


``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark out. For instance, if set to ``host`` and if
              all OSDs of a host are down, Ceph will not automatically mark out
              those OSDs.


``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer
``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer


``mon osd reporter subtree level``

:Description: In which level of the parent CRUSH bucket the reporters are
              counted. The OSDs send failure reports to the monitor if they
              find that a peer is not responsive, and the monitor marks the
              reported OSD ``down`` and then, after a grace period, ``out``.
.. index:: OSD heartbeat

OSD Settings
------------
``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.

:Default: The host address.


``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).

:Type: 32-bit Integer


``osd heartbeat grace``

:Description: The elapsed time after which a Ceph OSD Daemon that hasn't shown
              a heartbeat is considered ``down`` by the Ceph Storage Cluster.
              This setting has to be set in both the ``[mon]`` and ``[osd]``
              (or ``[global]``) sections so that it is read by both the MON
              and OSD daemons.

:Type: 32-bit Integer
``osd mon heartbeat interval``

:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
              Ceph OSD Daemon peers.

:Type: 32-bit Integer


``osd mon heartbeat stat stale``

:Description: Stop reporting on heartbeat ping times which haven't been updated
              for this many seconds. Set to zero to disable this action.

:Type: 32-bit Integer
``osd mon report interval``

:Description: The number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer


``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
              request for statistics.

:Type: 32-bit Integer