=====================================
 Configuring Monitor/OSD Interaction
=====================================

.. index:: heartbeat

After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
Monitor doesn't receive reports, or if it receives reports of changes in the
Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph
Cluster Map`.

Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.

.. index:: heartbeat interval

OSDs Check Heartbeats
=====================

Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons at random
intervals of less than 6 seconds. If a neighboring Ceph OSD Daemon doesn't
show a heartbeat within a 20-second grace period, the Ceph OSD Daemon may
consider the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph
Monitor, which will update the Ceph Cluster Map. You may change this grace
period by adding an ``osd heartbeat grace`` setting under both the ``[mon]``
and ``[osd]`` sections, or under the ``[global]`` section, of your Ceph
configuration file, or by setting the value at runtime.


.. ditaa::

    +---------+          +---------+
    |  OSD 1  |          |  OSD 2  |
    +---------+          +---------+
         |                    |
         |----+ Heartbeat     |
         |    | Interval      |
         |<---+ Exceeded      |
         |                    |
         |       Check        |
         |     Heartbeat      |
         |------------------->|
         |                    |
         |<-------------------|
         |   Heart Beating    |
         |                    |
         |----+ Heartbeat     |
         |    | Interval      |
         |<---+ Exceeded      |
         |                    |
         |       Check        |
         |     Heartbeat      |
         |------------------->|
         |                    |
         |----+ Grace         |
         |    | Period        |
         |<---+ Exceeded      |
         |                    |
         |----+ Mark          |
         |    | OSD 2         |
         |<---+ Down          |

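The heartbeat check described above can be summarized as a small model. The following Python sketch is a simplified illustration, not Ceph's implementation: the helper names ``next_check_delay`` and ``peers_past_grace`` are hypothetical, timestamps are plain numbers of seconds, and the constants are the documented defaults for ``osd heartbeat interval`` and ``osd heartbeat grace``.

```python
import random

# Documented defaults (seconds). This is a simplified model, not Ceph's
# heartbeat code; the helper names below are hypothetical.
OSD_HEARTBEAT_INTERVAL = 6   # osd heartbeat interval
OSD_HEARTBEAT_GRACE = 20     # osd heartbeat grace


def next_check_delay(interval=OSD_HEARTBEAT_INTERVAL):
    """Pick a random delay in [0, interval) before the next heartbeat check."""
    return random.random() * interval


def peers_past_grace(last_heartbeat, now, grace=OSD_HEARTBEAT_GRACE):
    """Return the peers whose most recent heartbeat is older than the grace period.

    last_heartbeat maps a peer name to the time its last heartbeat was seen.
    """
    return sorted(peer for peer, seen in last_heartbeat.items()
                  if now - seen > grace)


if __name__ == "__main__":
    seen = {"osd.2": 100.0, "osd.3": 118.0}
    # osd.2 was last heard from 25 s ago (> 20 s grace); osd.3 only 7 s ago.
    print(peers_past_grace(seen, now=125.0))  # ['osd.2']
```

Treating the grace period as a strict "older than" comparison is an assumption of this sketch; the monitor may also lengthen the grace period based on laggy estimations (see ``mon osd adjust heartbeat grace``).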

.. index:: OSD down report

OSDs Report Down OSDs
=====================

By default, two Ceph OSD Daemons from different hosts must report to the Ceph
Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
acknowledge that the reported Ceph OSD Daemon is ``down``. However, there is a
chance that all of the OSDs reporting the failure are hosted in a rack with a
bad switch that has trouble connecting to another OSD. To avoid this sort of
false alarm, we treat the peers reporting a failure as a proxy for a potential
"subcluster" of the overall cluster that is similarly laggy. This is clearly
not true in all cases, but it will sometimes help us localize the grace
correction to a subset of the system that is unhappy. The
``mon osd reporter subtree level`` setting is used to group the peers into the
"subcluster" by their common ancestor type in the CRUSH map. By default, only
two reports from different subtrees are required to report another Ceph OSD
Daemon ``down``. You can change the number of reporters from unique subtrees
and the common ancestor type required to report a Ceph OSD Daemon ``down`` to a
Ceph Monitor by adding the ``mon osd min down reporters`` and
``mon osd reporter subtree level`` settings under the ``[mon]`` section of your
Ceph configuration file, or by setting the values at runtime.


.. ditaa::

    +---------+     +---------+      +---------+
    |  OSD 1  |     |  OSD 2  |      | Monitor |
    +---------+     +---------+      +---------+
         |               |                |
         | OSD 3 Is Down |                |
         |---------------+--------------->|
         |               |                |
         |               |                |
         |               | OSD 3 Is Down  |
         |               |--------------->|
         |               |                |
         |               |                |
         |               |                |---------+ Mark
         |               |                |         | OSD 3
         |               |                |<--------+ Down

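The subtree rule above can be sketched in Python as follows. This is a hypothetical, simplified model rather than the monitor's code: it assumes each reporting OSD has already been mapped to its ancestor bucket at the ``mon osd reporter subtree level`` (``host`` by default), and the helper ``enough_down_reports`` and the host names are illustrative. The point it shows is that reports are counted per distinct subtree, not per OSD.

```python
# Hypothetical, simplified model of counting failure reports per CRUSH
# subtree. The real monitor check also considers report timing and laggy
# estimates; this sketch does not.
MON_OSD_MIN_DOWN_REPORTERS = 2  # mon osd min down reporters (default)


def enough_down_reports(reporters, subtree_of,
                        min_reporters=MON_OSD_MIN_DOWN_REPORTERS):
    """Return True if reports come from at least min_reporters distinct subtrees.

    reporters: names of the OSDs that reported the target OSD as down.
    subtree_of: maps each OSD to its ancestor bucket at the level chosen by
        ``mon osd reporter subtree level`` (by default, its host).
    """
    subtrees = {subtree_of[osd] for osd in reporters}
    return len(subtrees) >= min_reporters


if __name__ == "__main__":
    subtree_of = {"osd.1": "host-a", "osd.2": "host-a", "osd.4": "host-b"}
    # Two reporters on the same host count as a single subtree.
    print(enough_down_reports(["osd.1", "osd.2"], subtree_of))  # False
    # Reporters on two different hosts satisfy the default of 2.
    print(enough_down_reports(["osd.1", "osd.4"], subtree_of))  # True
```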

.. index:: peering failure

OSDs Report Peering Failure
===========================

If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.

.. ditaa::

    +---------+     +---------+     +-------+      +---------+
    |  OSD 1  |     |  OSD 2  |     | OSD 3 |      | Monitor |
    +---------+     +---------+     +-------+      +---------+
         |               |              |               |
         |  Request To   |              |               |
         |     Peer      |              |               |
         |-------------->|              |               |
         |<--------------|              |               |
         |    Peering                   |               |
         |                              |               |
         |  Request To                  |               |
         |     Peer                     |               |
         |----------------------------->|               |
         |                                              |
         |----+ OSD Monitor                             |
         |    | Heartbeat                               |
         |<---+ Interval Exceeded                       |
         |                                              |
         |         Failed to Peer with OSD 3            |
         |--------------------------------------------->|
         |<---------------------------------------------|
         |          Receive New Cluster Map             |

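A minimal Python sketch of this fallback timing follows. It assumes an OSD is "without peers" when it cannot peer with any other OSD, and that a ping is due once a full ``osd mon heartbeat interval`` has elapsed since the previous one. The function ``should_ping_monitor`` is hypothetical and models only the timing, not the cluster map exchange itself.

```python
# Hypothetical model: an OSD that cannot peer with any other OSD asks a
# monitor for a fresh cluster map once per ``osd mon heartbeat interval``.
OSD_MON_HEARTBEAT_INTERVAL = 30  # seconds (default)


def should_ping_monitor(num_peers, last_ping, now,
                        interval=OSD_MON_HEARTBEAT_INTERVAL):
    """Return True if an OSD with no peers is due to ping a monitor."""
    return num_peers == 0 and now - last_ping >= interval


if __name__ == "__main__":
    print(should_ping_monitor(num_peers=0, last_ping=0, now=30))  # True
    print(should_ping_monitor(num_peers=0, last_ping=0, now=29))  # False
    # An OSD that has peers relies on peer heartbeats instead.
    print(should_ping_monitor(num_peers=3, last_ping=0, now=60))  # False
```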

.. index:: OSD status

OSDs Report Their Status
========================

If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds of
a reportable event, such as a failure, a change in placement group stats, a
change in ``up_thru``, or booting. You can change this minimum report interval
by adding an ``osd mon report interval`` setting under the ``[osd]`` section of
your Ceph configuration file, or by setting the value at runtime. A Ceph OSD
Daemon also sends a report to a Ceph Monitor every 120 seconds, irrespective of
whether any notable changes occur. You can change this maximum report interval
by adding an ``osd mon report interval max`` setting under the ``[osd]``
section of your Ceph configuration file, or by setting the value at runtime.


.. ditaa::

    +---------+          +---------+
    |  OSD 1  |          | Monitor |
    +---------+          +---------+
         |                    |
         |----+ Report Min    |
         |    | Interval      |
         |<---+ Exceeded      |
         |                    |
         |----+ Reportable    |
         |    | Event         |
         |<---+ Occurs        |
         |                    |
         |     Report To      |
         |      Monitor       |
         |------------------->|
         |                    |
         |----+ Report Max    |
         |    | Interval      |
         |<---+ Exceeded      |
         |                    |
         |     Report To      |
         |      Monitor       |
         |------------------->|
         |                    |
         |----+ Monitor       |
         |    | Fails         |
         |<---+               |
                              +----+ Monitor OSD
                              |    | Report Timeout
                              |<---+ Exceeded
                              |
                              +----+ Mark
                              |    | OSD 1
                              |<---+ Down

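The reporting rules in this section can be expressed as two small predicates, one evaluated by the OSD and one by the monitor. The following Python sketch is a simplified model under stated assumptions: the function names are hypothetical, times are plain seconds, the minimum interval is measured from the previous report, and the constants are the documented defaults for ``osd mon report interval``, ``osd mon report interval max``, and ``mon osd report timeout``.

```python
# Simplified, hypothetical model of the reporting rules in this section.
OSD_MON_REPORT_INTERVAL = 5        # osd mon report interval (minimum)
OSD_MON_REPORT_INTERVAL_MAX = 120  # osd mon report interval max
MON_OSD_REPORT_TIMEOUT = 900       # mon osd report timeout


def osd_should_report(now, last_report, event_pending,
                      min_interval=OSD_MON_REPORT_INTERVAL,
                      max_interval=OSD_MON_REPORT_INTERVAL_MAX):
    """OSD side: decide whether to send a report to the monitor now."""
    elapsed = now - last_report
    if event_pending and elapsed >= min_interval:
        return True  # reportable event, and the minimum interval has passed
    return elapsed >= max_interval  # periodic report, even without changes


def monitor_marks_down(now, last_report_received,
                       timeout=MON_OSD_REPORT_TIMEOUT):
    """Monitor side: consider the OSD down once the report timeout elapses."""
    return now - last_report_received > timeout


if __name__ == "__main__":
    print(osd_should_report(10, 0, event_pending=True))    # True
    print(osd_should_report(3, 0, event_pending=True))     # False
    print(osd_should_report(120, 0, event_pending=False))  # True
    print(monitor_marks_down(901, 0))                      # True
```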



Configuration Settings
======================

When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.

.. index:: monitor heartbeat

Monitor Settings
----------------

``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.

:Type: Double
:Default: ``.3``


``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.

:Type: Double
:Default: ``.75``

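One way to read these two ratios is as guards that stop the monitor from marking more OSDs ``down`` or ``out`` once too few remain ``up`` or ``in``. The Python sketch below illustrates that reading with the documented defaults; the helper ``ratio_allows`` is hypothetical, and whether the monitor compares the ratio before or after the change is an assumption of this sketch.

```python
# Hypothetical sketch of the ratio guards; the helper name and the exact
# comparison are assumptions, not the monitor's code.
MON_OSD_MIN_UP_RATIO = 0.3   # mon osd min up ratio
MON_OSD_MIN_IN_RATIO = 0.75  # mon osd min in ratio


def ratio_allows(count, total, min_ratio):
    """Return True if count/total is at least min_ratio (False if total is 0)."""
    if total == 0:
        return False
    return count / total >= min_ratio


if __name__ == "__main__":
    # 2 of 10 OSDs up: 0.2 < 0.3, so this model stops marking OSDs down.
    print(ratio_allows(2, 10, MON_OSD_MIN_UP_RATIO))  # False
    # 8 of 10 OSDs in: 0.8 >= 0.75, so marking out is still allowed.
    print(ratio_allows(8, 10, MON_OSD_MIN_IN_RATIO))  # True
```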

``mon osd laggy halflife``

:Description: The number of seconds it takes for laggy estimates to decay
              by half.
:Type: Integer
:Default: ``60*60``


``mon osd laggy weight``

:Description: The weight for new samples in laggy estimation decay.
:Type: Double
:Default: ``0.3``


``mon osd laggy max interval``

:Description: Maximum value of ``laggy_interval`` in laggy estimations (in
              seconds). The monitor uses an adaptive approach to evaluate the
              ``laggy_interval`` of a certain OSD. This value will be used to
              calculate the grace time for that OSD.
:Type: Integer
:Default: ``300``


``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.
:Type: Boolean
:Default: ``true``

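The laggy settings above interact, and a worked model can help. The following Python sketch is a hypothetical reading of how they could combine, assuming an exponentially weighted moving average for ``laggy_interval`` (weighted by ``mon osd laggy weight`` and capped at ``mon osd laggy max interval``) and a half-life decay (``mon osd laggy halflife``) applied to the extra grace added on top of ``osd heartbeat grace``. The function names, the ``laggy_probability`` input, and the choice of elapsed time used for the decay are assumptions; consult the monitor source for the actual computation.

```python
import math

# Documented defaults; everything else here is a hypothetical model.
MON_OSD_LAGGY_HALFLIFE = 60 * 60  # seconds
MON_OSD_LAGGY_WEIGHT = 0.3
MON_OSD_LAGGY_MAX_INTERVAL = 300  # seconds
OSD_HEARTBEAT_GRACE = 20          # seconds


def update_laggy_interval(old_interval, sample,
                          weight=MON_OSD_LAGGY_WEIGHT,
                          cap=MON_OSD_LAGGY_MAX_INTERVAL):
    """Blend a new laggy-interval sample into the running estimate, then cap it."""
    blended = weight * sample + (1 - weight) * old_interval
    return min(blended, cap)


def adjusted_grace(laggy_interval, laggy_probability, elapsed,
                   halflife=MON_OSD_LAGGY_HALFLIFE,
                   base_grace=OSD_HEARTBEAT_GRACE):
    """Add a laggy-based allowance to the base grace, halving it every halflife."""
    decay = math.pow(0.5, elapsed / halflife)
    return base_grace + decay * laggy_interval * laggy_probability


if __name__ == "__main__":
    print(update_laggy_interval(100, 200))         # about 130 (0.3*200 + 0.7*100)
    print(update_laggy_interval(100, 1000))        # 300, the cap
    print(adjusted_grace(100, 0.5, elapsed=0))     # 70.0
    print(adjusted_grace(100, 0.5, elapsed=3600))  # 45.0
```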

``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale the down out interval based
              on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.

:Type: Boolean
:Default: ``false``


``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons that were automatically
              marked ``out`` of the Ceph Storage Cluster as ``in`` the cluster.

:Type: Boolean
:Default: ``true``


``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.

:Type: Boolean
:Default: ``true``

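The three ``mon osd auto mark`` settings combine when an OSD boots. The following Python sketch is a hypothetical model of that decision using the documented defaults; the function name ``mark_in_on_boot``, its parameters, and the order in which the flags are checked are assumptions, not the monitor's code.

```python
# Hypothetical sketch: how the "auto mark" flags could decide whether a
# booting OSD is marked in. The precedence shown is an assumption.
def mark_in_on_boot(is_new, was_auto_marked_out,
                    auto_mark_in=False,          # mon osd auto mark in
                    auto_mark_auto_out_in=True,  # mon osd auto mark auto out in
                    auto_mark_new_in=True):      # mon osd auto mark new in
    """Return True if a booting OSD would be marked ``in`` under this model."""
    if auto_mark_in:
        return True  # mark every booting OSD in
    if is_new:
        return auto_mark_new_in
    if was_auto_marked_out:
        return auto_mark_auto_out_in
    return False  # e.g. an OSD an operator marked out by hand stays out


if __name__ == "__main__":
    print(mark_in_on_boot(is_new=True, was_auto_marked_out=False))   # True
    print(mark_in_on_boot(is_new=False, was_auto_marked_out=True))   # True
    print(mark_in_on_boot(is_new=False, was_auto_marked_out=False))  # False
```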

``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
              ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer
:Default: ``600``


``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark out. For instance, if set to ``host`` and if
              all OSDs of a host are down, Ceph will not automatically mark out
              these OSDs.

:Type: String
:Default: ``rack``

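The following Python sketch models how ``mon osd down out interval`` and ``mon osd down out subtree limit`` could work together. It is a hypothetical simplification: the function ``should_auto_mark_out`` is illustrative, it assumes the caller already knows whether every OSD under the relevant ``rack`` (or other limit-type bucket) is ``down``, and it ignores the laggy-based scaling controlled by ``mon osd adjust down out interval``.

```python
# Hypothetical, simplified model; not the monitor's code.
MON_OSD_DOWN_OUT_INTERVAL = 600  # seconds (default)


def should_auto_mark_out(down_for, whole_subtree_down,
                         down_out_interval=MON_OSD_DOWN_OUT_INTERVAL):
    """Decide whether a down OSD would be automatically marked out.

    down_for: seconds the OSD has been down.
    whole_subtree_down: True if every OSD under the OSD's ancestor bucket of
        the ``mon osd down out subtree limit`` type (``rack`` by default) is down.
    """
    if whole_subtree_down:
        return False  # a whole rack (or larger unit) failing is not auto-marked out
    return down_for >= down_out_interval


if __name__ == "__main__":
    print(should_auto_mark_out(700, whole_subtree_down=False))  # True
    print(should_auto_mark_out(700, whole_subtree_down=True))   # False
```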

``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer
:Default: ``900``

``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer
:Default: ``2``


``mon osd reporter subtree level``

:Description: The level of parent bucket at which the reporters are counted.
              OSDs send failure reports to the monitors if they find that a
              peer is not responsive. The monitors mark the reported OSD
              ``down`` after a grace period, and then ``out`` after
              ``mon osd down out interval`` elapses.
:Type: String
:Default: ``host``


.. index:: OSD heartbeat

OSD Settings
------------

``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.
:Type: Address
:Default: The host address.


``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).
:Type: 32-bit Integer
:Default: ``6``


``osd heartbeat grace``

:Description: The elapsed time without a heartbeat from a Ceph OSD Daemon
              after which the Ceph Storage Cluster considers it ``down``.
              This setting must be set in both the ``[mon]`` and ``[osd]``
              sections, or in the ``[global]`` section, so that it is read by
              both the MON and OSD daemons.
:Type: 32-bit Integer
:Default: ``20``

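Because ``osd heartbeat grace`` must be read by both monitors and OSDs, it has to appear in a section that both daemon types load. The following Python sketch uses only the standard-library ``configparser`` module to render ``ceph.conf``-style fragments for the two placements described in this entry; the helper ``render_grace`` and the value ``30`` are illustrative assumptions, and the sketch assumes your configuration file uses plain INI syntax.

```python
import configparser
import io


def render_grace(value, sections=("global",)):
    """Render a ceph.conf-style fragment that sets osd heartbeat grace.

    render_grace is a hypothetical helper; it only formats text and does not
    talk to a cluster.
    """
    conf = configparser.ConfigParser()
    for section in sections:
        conf[section] = {"osd heartbeat grace": str(value)}
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Either a single [global] entry ...
    print(render_grace(30))
    # ... or matching [mon] and [osd] entries.
    print(render_grace(30, sections=("mon", "osd")))
```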

``osd mon heartbeat interval``

:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
              Ceph OSD Daemon peers.

:Type: 32-bit Integer
:Default: ``30``


``osd mon heartbeat stat stale``

:Description: Stop reporting on heartbeat ping times which haven't been updated for
              this many seconds. Set to zero to disable this action.

:Type: 32-bit Integer
:Default: ``3600``


``osd mon report interval``

:Description: The number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``5``


``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
              request for statistics.

:Type: 32-bit Integer
:Default: ``30``