=====================================
 Configuring Monitor/OSD Interaction
=====================================

.. index:: heartbeat

After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor learns the state of the Ceph Storage Cluster by
requiring reports from each :term:`Ceph OSD Daemon`, and by receiving reports
from Ceph OSD Daemons about the status of their neighboring Ceph OSD Daemons.
If the Ceph Monitor doesn't receive reports, or if it receives reports of
changes in the Ceph Storage Cluster, the Ceph Monitor updates the status of the
:term:`Ceph Cluster Map`.

Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.

.. index:: heartbeat interval

OSDs Check Heartbeats
=====================

Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6
seconds. You can change the heartbeat interval by adding an ``osd heartbeat
interval`` setting under the ``[osd]`` section of your Ceph configuration file,
or by setting the value at runtime. If a neighboring Ceph OSD Daemon doesn't
show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may
consider the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph
Monitor, which will update the Ceph Cluster Map. You may change this grace
period by adding an ``osd heartbeat grace`` setting under both the ``[mon]``
and ``[osd]`` sections, or under the ``[global]`` section, of your Ceph
configuration file, or by setting the value at runtime.


.. ditaa:: +---------+          +---------+
           |  OSD 1  |          |  OSD 2  |
           +---------+          +---------+
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |<-------------------|
                |   Heart Beating    |
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |----+ Grace         |
                |    | Period        |
                |<---+ Exceeded      |
                |                    |
                |----+ Mark          |
                |    | OSD 2         |
                |<---+ Down          |
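
The grace-period check described above can be sketched in Python. This is a
minimal illustration and not Ceph's implementation: the function name
``peers_to_report`` and the ``last_reply`` mapping are hypothetical, and only
the 20 second default is taken from the text.

```python
HEARTBEAT_GRACE = 20  # seconds, the documented default of ``osd heartbeat grace``


def peers_to_report(last_reply, now, grace=HEARTBEAT_GRACE):
    """Return the peers whose last heartbeat reply is older than the grace period.

    ``last_reply`` maps a (hypothetical) peer name to the time, in seconds,
    at which that peer last answered a heartbeat.
    """
    return sorted(peer for peer, seen in last_reply.items() if now - seen > grace)


# osd.2 has been silent for 21 seconds, osd.3 for only 3 seconds.
last_reply = {"osd.2": 100.0, "osd.3": 118.0}
print(peers_to_report(last_reply, now=121.0))
```

In this sketch only ``osd.2`` would be reported to a Ceph Monitor, because it
is the only peer that has been silent for longer than the grace period.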


.. index:: OSD down report

OSDs Report Down OSDs
=====================

By default, two Ceph OSD Daemons from different hosts must report to the Ceph
Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
acknowledge that the reported Ceph OSD Daemon is ``down``. However, there is a
chance that all the OSDs reporting the failure are hosted in a rack with a bad
switch that has trouble connecting to another OSD. To avoid this sort of false
alarm, we consider the peers reporting a failure a proxy for a potential
"subcluster" of the overall cluster that is similarly laggy. This is clearly
not true in all cases, but will sometimes help us localize the grace correction
to a subset of the system that is unhappy. ``mon osd reporter subtree level``
is used to group the peers into the "subcluster" by their common ancestor type
in the CRUSH map. By default, only two reports from different subtrees are
required to report another Ceph OSD Daemon ``down``. You can change the number
of reporters from unique subtrees and the common ancestor type required to
report a Ceph OSD Daemon ``down`` to a Ceph Monitor by adding the ``mon osd min
down reporters`` and ``mon osd reporter subtree level`` settings under the
``[mon]`` section of your Ceph configuration file, or by setting the values at
runtime.


.. ditaa:: +---------+     +---------+      +---------+
           |  OSD 1  |     |  OSD 2  |      | Monitor |
           +---------+     +---------+      +---------+
                |               |                |
                | OSD 3 Is Down |                |
                |---------------+--------------->|
                |               |                |
                |               |                |
                |               | OSD 3 Is Down  |
                |               |--------------->|
                |               |                |
                |               |                |
                |               |                |---------+ Mark
                |               |                |         | OSD 3
                |               |                |<--------+ Down
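
The rule that reports must come from distinct subtrees can be sketched as
follows. This is an illustration under stated assumptions, not the monitor's
code: ``should_mark_down`` and the ``ancestor_of`` mapping (OSD name to its
ancestor at the configured subtree level) are hypothetical names.

```python
def should_mark_down(reporters, ancestor_of, min_down_reporters=2):
    """Decide whether enough distinct subtrees have reported a failure.

    ``reporters`` lists the OSDs that reported the failure; ``ancestor_of``
    maps each of them to its ancestor at ``mon osd reporter subtree level``
    (a host name when that level is ``host``).
    """
    subtrees = {ancestor_of[osd] for osd in reporters}
    return len(subtrees) >= min_down_reporters


# Hypothetical layout: osd.1 and osd.2 share a host, osd.4 is on another one.
host_of = {"osd.1": "host-a", "osd.2": "host-a", "osd.4": "host-b"}
print(should_mark_down(["osd.1", "osd.2"], host_of))  # both reports from host-a
print(should_mark_down(["osd.1", "osd.4"], host_of))  # two different hosts
```

Two reports from the same host count as one reporter here, which is why the
first call does not reach the default threshold of two.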


.. index:: peering failure

OSDs Report Peering Failure
===========================

If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.

.. ditaa:: +---------+     +---------+     +-------+     +---------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |     | Monitor |
           +---------+     +---------+     +-------+     +---------+
                |               |              |              |
                |  Request To   |              |              |
                |     Peer      |              |              |
                |-------------->|              |              |
                |<--------------|              |              |
                |    Peering                   |              |
                |                              |              |
                |  Request To                  |              |
                |     Peer                     |              |
                |----------------------------->|              |
                |                                             |
                |----+ OSD Monitor                            |
                |    | Heartbeat                              |
                |<---+ Interval Exceeded                      |
                |                                             |
                |          Failed to Peer with OSD 3          |
                |-------------------------------------------->|
                |<--------------------------------------------|
                |           Receive New Cluster Map           |


.. index:: OSD status

OSDs Report Their Status
========================

If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds of
a reportable event, such as a failure, a change in placement group stats, a
change in ``up_thru``, or when it boots. You can change the Ceph OSD Daemon
minimum report interval by adding an ``osd mon report interval min`` setting
under the ``[osd]`` section of your Ceph configuration file, or by setting the
value at runtime. A Ceph OSD Daemon sends a report to a Ceph Monitor every 120
seconds irrespective of whether any notable changes occur. You can change the
Ceph Monitor report interval by adding an ``osd mon report interval max``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.


.. ditaa:: +---------+          +---------+
           |  OSD 1  |          | Monitor |
           +---------+          +---------+
                |                    |
                |----+ Report Min    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |----+ Reportable    |
                |    | Event         |
                |<---+ Occurs        |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Report Max    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Monitor       |
                |    | Fails         |
                |<---+               |
                                     +----+ Monitor OSD
                                     |    | Report Timeout
                                     |<---+ Exceeded
                                     |
                                     +----+ Mark
                                     |    | OSD 1
                                     |<---+ Down
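
One way to read the interplay of the two report intervals and the monitor-side
timeout is sketched below. The function names are hypothetical, and the exact
decision logic is an assumption based on the description above rather than
Ceph's actual code; only the default values come from this document.

```python
def osd_should_report(now, last_report, event_pending,
                      interval_min=5, interval_max=120):
    """Return True when an OSD would send a report (assumed logic).

    A pending reportable event is sent once ``interval_min`` seconds have
    passed since the last report; without any event, a report is still
    sent once ``interval_max`` seconds have passed.
    """
    elapsed = now - last_report
    if elapsed >= interval_max:
        return True
    return event_pending and elapsed >= interval_min


def monitor_considers_down(now, last_report, report_timeout=900):
    """Return True when the monitor has waited past ``mon osd report timeout``."""
    return now - last_report > report_timeout


print(osd_should_report(now=3, last_report=0, event_pending=True))     # too soon
print(osd_should_report(now=6, last_report=0, event_pending=True))     # event, min elapsed
print(osd_should_report(now=120, last_report=0, event_pending=False))  # max elapsed
print(monitor_considers_down(now=901, last_report=0))                  # timeout exceeded
```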


Configuration Settings
======================

When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.
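
As a rough illustration, the fragment below places two heartbeat settings in a
``[global]`` section and reads them back. Parsing with Python's
``configparser`` is only an approximation of how Ceph reads its configuration
file and is used here solely to make the example runnable; the fragment itself
is hypothetical and uses the documented default values.

```python
import configparser

# Hypothetical ceph.conf fragment using the documented defaults.
CEPH_CONF = """
[global]
osd heartbeat interval = 6
osd heartbeat grace = 20
"""


def read_heartbeat_settings(text):
    """Parse an INI-style fragment and return the two heartbeat values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["global"]
    return {
        "interval": section.getint("osd heartbeat interval"),
        "grace": section.getint("osd heartbeat grace"),
    }


print(read_heartbeat_settings(CEPH_CONF))
```

Keeping both values in ``[global]`` means that every daemon type reading the
file sees the same heartbeat settings.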

.. index:: monitor heartbeat

Monitor Settings
----------------

``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.

:Type: Double
:Default: ``.3``


``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.

:Type: Double
:Default: ``.75``


``mon osd laggy halflife``

:Description: The half-life, in seconds, over which laggy estimates decay.
:Type: Integer
:Default: ``60*60``


``mon osd laggy weight``

:Description: The weight for new samples in laggy estimation decay.
:Type: Double
:Default: ``0.3``


``mon osd laggy max interval``

:Description: Maximum value of ``laggy_interval`` in laggy estimations (in
              seconds). The monitor uses an adaptive approach to evaluate the
              ``laggy_interval`` of a given OSD. This value is used to
              calculate the grace time for that OSD.
:Type: Integer
:Default: ``300``


``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale the down/out interval based
              on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.

:Type: Boolean
:Default: ``false``


``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons auto marked ``out``
              of the Ceph Storage Cluster as ``in`` the cluster.

:Type: Boolean
:Default: ``true``


``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.

:Type: Boolean
:Default: ``true``


``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
              ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer
:Default: ``600``


``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark out. For instance, if set to ``host`` and if
              all OSDs of a host are down, Ceph will not automatically mark out
              these OSDs.

:Type: String
:Default: ``rack``


``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer
:Default: ``900``


``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer
:Default: ``2``


``mon osd reporter subtree level``

:Description: The level of parent bucket at which reporters are counted. OSDs
              send failure reports to the monitor if they find that a peer is
              not responsive. The monitor marks the reported OSD ``down``, and
              then ``out``, after a grace period.
:Type: String
:Default: ``host``
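
The ``mon osd laggy halflife`` and ``mon osd laggy weight`` settings suggest a
half-life decay combined with a weighted running estimate. The sketch below
shows that general technique only. How Ceph combines these values internally
is not described in this document, so both formulas and both function names
are assumptions made for illustration.

```python
import math


def decay_factor(elapsed, halflife=3600.0):
    """Fraction of an estimate that remains after ``elapsed`` seconds.

    The factor halves every ``halflife`` seconds (default ``60*60``).
    """
    return math.exp(math.log(0.5) * elapsed / halflife)


def blend(old_estimate, new_sample, weight=0.3):
    """Weighted update in which ``weight`` is the share given to the new sample."""
    return weight * new_sample + (1.0 - weight) * old_estimate


print(decay_factor(3600))  # one half-life has elapsed
print(blend(10.0, 20.0))   # 30% new sample, 70% old estimate
```

With the default half-life, an estimate loses half of its influence after one
hour; a weight of ``0.3`` lets a new sample move the estimate only part of the
way toward the new value.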


.. index:: OSD heartbeat

OSD Settings
------------

``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.
:Type: Address
:Default: The host address.


``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).
:Type: 32-bit Integer
:Default: ``6``


``osd heartbeat grace``

:Description: The elapsed time, in seconds, without a heartbeat after which the
              Ceph Storage Cluster considers a Ceph OSD Daemon ``down``.
              This setting must be set in both the ``[mon]`` and ``[osd]``
              sections, or in the ``[global]`` section, so that it is read by
              both the MON and OSD daemons.
:Type: 32-bit Integer
:Default: ``20``


``osd mon heartbeat interval``

:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
              Ceph OSD Daemon peers.

:Type: 32-bit Integer
:Default: ``30``


``osd mon report interval max``

:Description: The maximum time in seconds that a Ceph OSD Daemon can wait before
              it must report to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``120``


``osd mon report interval min``

:Description: The minimum number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``5``
:Valid Range: Should be less than ``osd mon report interval max``


``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
              request for statistics.

:Type: 32-bit Integer
:Default: ``30``