=====================================
 Configuring Monitor/OSD Interaction
=====================================

.. index:: heartbeat

After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
Monitor doesn't receive reports, or if it receives reports of changes in the
Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph
Cluster Map`.

Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.

.. index:: heartbeat interval

OSDs Check Heartbeats
=====================

Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6
seconds. You can change the heartbeat interval by adding an ``osd heartbeat
interval`` setting under the ``[osd]`` section of your Ceph configuration file,
or by setting the value at runtime. If a neighboring Ceph OSD Daemon doesn't
show a heartbeat within a 20-second grace period, the Ceph OSD Daemon may
consider the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph
Monitor, which will update the Ceph Cluster Map. You can change this grace
period by adding an ``osd heartbeat grace`` setting under both the ``[mon]``
and ``[osd]`` sections, or under the ``[global]`` section, of your Ceph
configuration file, or by setting the value at runtime.

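As a minimal sketch of the two settings described above, the following
fragment overrides both heartbeat values in the ``[global]`` section so that
Ceph Monitors and Ceph OSD Daemons read the same grace period. The values
``10`` and ``30`` are illustrative assumptions, not recommendations; the
defaults are ``6`` and ``20``.

.. code-block:: ini

    [global]
    # Ping peer OSDs every 10 seconds (example value; the default is 6).
    osd heartbeat interval = 10
    # Allow 30 seconds without a heartbeat before a peer may be reported
    # down (example value; the default is 20).
    osd heartbeat grace = 30
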

.. ditaa:: +---------+          +---------+
           |  OSD 1  |          |  OSD 2  |
           +---------+          +---------+
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |<-------------------|
                |   Heart Beating    |
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |----+ Grace         |
                |    | Period        |
                |<---+ Exceeded      |
                |                    |
                |----+ Mark          |
                |    | OSD 2         |
                |<---+ Down          |


.. index:: OSD down report

OSDs Report Down OSDs
=====================

By default, two Ceph OSD Daemons from different hosts must report to the Ceph
Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
acknowledge that the reported Ceph OSD Daemon is ``down``. However, there is a
chance that all the OSDs reporting the failure are hosted in a rack with a bad
switch that has trouble connecting to another OSD. To avoid this sort of false
alarm, Ceph treats the peers reporting a failure as a proxy for a potential
"subcluster" of the overall cluster that is similarly laggy. This is not true
in all cases, but it sometimes helps localize the grace correction to the
subset of the system that is unhappy. The ``mon osd reporter subtree level``
setting groups the peers into a "subcluster" by their common ancestor type in
the CRUSH map. By default, only two reports from different subtrees are
required to report another Ceph OSD Daemon ``down``. You can change the number
of reporters from unique subtrees and the common ancestor type required to
report a Ceph OSD Daemon ``down`` to a Ceph Monitor by adding the ``mon osd min
down reporters`` and ``mon osd reporter subtree level`` settings under the
``[mon]`` section of your Ceph configuration file, or by setting the values at
runtime.

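The following fragment is a sketch of how the two reporter settings might be
combined. It assumes a cluster whose CRUSH map contains ``rack`` buckets; the
values ``3`` and ``rack`` are illustrative assumptions (the defaults are ``2``
and ``host``).

.. code-block:: ini

    [mon]
    # Require failure reports from 3 distinct subtrees (example value;
    # the default is 2).
    mon osd min down reporters = 3
    # Count reporters per rack rather than per host, so that OSDs behind
    # a single bad switch count as one reporter (assumes "rack" buckets
    # exist in the CRUSH map).
    mon osd reporter subtree level = rack
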

.. ditaa:: +---------+     +---------+      +---------+
           |  OSD 1  |     |  OSD 2  |      | Monitor |
           +---------+     +---------+      +---------+
                |               |                |
                | OSD 3 Is Down |                |
                |---------------+--------------->|
                |               |                |
                |               |                |
                |               | OSD 3 Is Down  |
                |               |--------------->|
                |               |                |
                |               |                |
                |               |                |---------+ Mark
                |               |                |         | OSD 3
                |               |                |<--------+ Down


.. index:: peering failure

OSDs Report Peering Failure
===========================

If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.

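A minimal sketch of this override, with ``60`` chosen only as an example value
(the default is ``30``):

.. code-block:: ini

    [osd]
    # Ping a Ceph Monitor for a new cluster map every 60 seconds when
    # the OSD cannot peer (example value; the default is 30).
    osd mon heartbeat interval = 60
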
.. ditaa:: +---------+     +---------+     +-------+     +---------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |     | Monitor |
           +---------+     +---------+     +-------+     +---------+
                |               |              |              |
                |  Request To   |              |              |
                |     Peer      |              |              |
                |-------------->|              |              |
                |<--------------|              |              |
                |    Peering                   |              |
                |                              |              |
                |  Request To                  |              |
                |     Peer                     |              |
                |----------------------------->|              |
                |                                             |
                |----+ OSD Monitor                            |
                |    | Heartbeat                              |
                |<---+ Interval Exceeded                      |
                |                                             |
                |         Failed to Peer with OSD 3           |
                |-------------------------------------------->|
                |<--------------------------------------------|
                |          Receive New Cluster Map            |


.. index:: OSD status

OSDs Report Their Status
========================

If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds of
a reportable event, such as a failure, a change in placement group stats, a
change in ``up_thru``, or a boot. You can change the Ceph OSD Daemon minimum
report interval by adding an ``osd mon report interval min`` setting under the
``[osd]`` section of your Ceph configuration file, or by setting the value at
runtime. A Ceph OSD Daemon also sends a report to a Ceph Monitor every 120
seconds, irrespective of whether any notable changes occur. You can change this
maximum report interval by adding an ``osd mon report interval max`` setting
under the ``[osd]`` section of your Ceph configuration file, or by setting the
value at runtime.

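The two report intervals can be sketched together as follows. The values
``10`` and ``180`` are illustrative assumptions (the defaults are ``5`` and
``120``); the minimum should stay below the maximum.

.. code-block:: ini

    [osd]
    # Wait at least 10 seconds after a reportable event before reporting
    # (example value; the default is 5).
    osd mon report interval min = 10
    # Report at least every 180 seconds even if nothing has changed
    # (example value; the default is 120).
    osd mon report interval max = 180
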

.. ditaa:: +---------+          +---------+
           |  OSD 1  |          | Monitor |
           +---------+          +---------+
                |                    |
                |----+ Report Min    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |----+ Reportable    |
                |    | Event         |
                |<---+ Occurs        |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Report Max    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Monitor       |
                |    | Fails         |
                |<---+               |
                                     +----+ Monitor OSD
                                     |    | Report Timeout
                                     |<---+ Exceeded
                                     |
                                     +----+ Mark
                                     |    | OSD 1
                                     |<---+ Down



Configuration Settings
======================

When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.

.. index:: monitor heartbeat

Monitor Settings
----------------

``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.

:Type: Double
:Default: ``.3``


``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.

:Type: Double
:Default: ``.75``


``mon osd laggy halflife``

:Description: The number of seconds over which laggy estimates decay.
:Type: Integer
:Default: ``60*60``


``mon osd laggy weight``

:Description: The weight for new samples in laggy estimation decay.
:Type: Double
:Default: ``0.3``


``mon osd laggy max interval``

:Description: Maximum value of ``laggy_interval`` in laggy estimations (in seconds).
              The monitor uses an adaptive approach to evaluate the
              ``laggy_interval`` of a given OSD. This value is used to
              calculate the grace time for that OSD.
:Type: Integer
:Default: ``300``


``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale the down/out interval based
              on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.

:Type: Boolean
:Default: ``false``


``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons that were automatically
              marked ``out`` of the Ceph Storage Cluster as ``in`` the cluster.

:Type: Boolean
:Default: ``true``


``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.

:Type: Boolean
:Default: ``true``


``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
              ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer
:Default: ``600``


``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark out. For instance, if set to ``host`` and if
              all OSDs of a host are down, Ceph will not automatically mark out
              these OSDs.

:Type: String
:Default: ``rack``


``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer
:Default: ``900``

``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer
:Default: ``2``


``mon osd reporter subtree level``

:Description: The level of parent bucket at which reporters are counted. OSDs
              send failure reports to the monitor if they find that a peer is
              not responsive. The monitor marks the reported OSD ``down``, and
              then ``out`` after a grace period.
:Type: String
:Default: ``host``


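As a sketch of how the out-marking settings above might be used together, the
following fragment lengthens the down/out interval and stops Ceph from
automatically marking out the OSDs of an entire failed host. The values ``900``
and ``host`` are illustrative assumptions, not recommendations (the defaults
are ``600`` and ``rack``).

.. code-block:: ini

    [mon]
    # Wait 15 minutes before marking an unresponsive OSD out
    # (example value; the default is 600).
    mon osd down out interval = 900
    # Do not automatically mark out OSDs when a whole host is down
    # (example value; the default is rack).
    mon osd down out subtree limit = host
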
.. index:: OSD heartbeat

OSD Settings
------------

``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.
:Type: Address
:Default: The host address.


``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).
:Type: 32-bit Integer
:Default: ``6``


``osd heartbeat grace``

:Description: The elapsed time (in seconds) without a heartbeat after which
              the Ceph Storage Cluster considers a Ceph OSD Daemon ``down``.
              This setting must be set in both the ``[mon]`` and ``[osd]``
              sections, or in the ``[global]`` section, so that it is read by
              both the MON and OSD daemons.
:Type: 32-bit Integer
:Default: ``20``


``osd mon heartbeat interval``

:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
              Ceph OSD Daemon peers.

:Type: 32-bit Integer
:Default: ``30``


``osd mon report interval max``

:Description: The maximum time in seconds that a Ceph OSD Daemon can wait before
              it must report to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``120``


``osd mon report interval min``

:Description: The minimum number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``5``
:Valid Range: Should be less than ``osd mon report interval max``


``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
              request for statistics.

:Type: 32-bit Integer
:Default: ``30``