=====================================
 Configuring Monitor/OSD Interaction
=====================================

.. index:: heartbeat

After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
Monitor doesn't receive reports, or if it receives reports of changes in the
Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph
Cluster Map`.

Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.

.. index:: heartbeat interval

OSDs Check Heartbeats
=====================

Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6
seconds. You can change the heartbeat interval by adding an ``osd heartbeat
interval`` setting under the ``[osd]`` section of your Ceph configuration file,
or by setting the value at runtime. If a neighboring Ceph OSD Daemon doesn't
show a heartbeat within a 20-second grace period, the Ceph OSD Daemon may
consider the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph
Monitor, which will update the Ceph Cluster Map. You may change this grace
period by adding an ``osd heartbeat grace`` setting under both the ``[mon]``
and ``[osd]`` sections, or under the ``[global]`` section, of your Ceph
configuration file, or by setting the value at runtime.
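
The timing described above can be illustrated with a minimal sketch. It is not
Ceph's implementation: ``PeerState``, ``should_report_down`` and
``check_times`` are hypothetical names, and only the two default values (6 and
20 seconds) come from this section.

```python
from dataclasses import dataclass

# Defaults quoted in this section; all names here are illustrative assumptions.
HEARTBEAT_INTERVAL = 6   # seconds between heartbeat checks
HEARTBEAT_GRACE = 20     # seconds of silence tolerated before reporting a peer


@dataclass
class PeerState:
    """Hypothetical record one OSD keeps for a neighboring OSD."""
    last_reply: float  # time, in seconds, of the last heartbeat reply


def should_report_down(peer: PeerState, now: float,
                       grace: float = HEARTBEAT_GRACE) -> bool:
    """Return True once the peer has been silent for longer than the grace period."""
    return (now - peer.last_reply) > grace


def check_times(start: float, end: float,
                interval: float = HEARTBEAT_INTERVAL) -> list:
    """List the times at which heartbeat checks happen between start and end."""
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += interval
    return times
```

With the defaults, a peer that last replied at time 0 is still within its grace
period at the checks made at 6, 12 and 18 seconds; the check at 24 seconds is
the first one at which this sketch would report the peer ``down``.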


.. ditaa:: +---------+          +---------+
           |  OSD 1  |          |  OSD 2  |
           +---------+          +---------+
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |<-------------------|
                |   Heart Beating    |
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |----+ Grace         |
                |    | Period        |
                |<---+ Exceeded      |
                |                    |
                |----+ Mark          |
                |    | OSD 2         |
                |<---+ Down          |


.. index:: OSD down report

OSDs Report Down OSDs
=====================

By default, a Ceph OSD Daemon must report to the Ceph Monitors that another Ceph
OSD Daemon is ``down`` three times before the Ceph Monitors acknowledge that the
reported Ceph OSD Daemon is ``down``. By default, only one
Ceph OSD Daemon is required to report another Ceph OSD Daemon ``down``. You can
change the number of Ceph OSD Daemons required to report a Ceph OSD Daemon
``down`` to a Ceph Monitor by adding a ``mon osd min down reporters`` setting
(``osd min down reporters`` prior to v0.62) under the ``[mon]`` section of your
Ceph configuration file, or by setting the value at runtime.
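
As a rough illustration of the two thresholds described above, the following
sketch tallies reports on the monitor side. It is an assumption-laden model, not
Ceph code: ``DownReportTracker`` and the ``min_reports`` parameter are
hypothetical names (``min_reports`` stands for the "three times" mentioned in
the text), while ``min_down_reporters`` mirrors the ``mon osd min down
reporters`` setting.

```python
from collections import defaultdict


class DownReportTracker:
    """Hypothetical monitor-side tally of 'OSD X is down' reports."""

    def __init__(self, min_down_reporters: int = 1, min_reports: int = 3):
        self.min_down_reporters = min_down_reporters  # distinct reporting OSDs
        self.min_reports = min_reports                # total reports received
        self._reporters = defaultdict(set)            # target id -> reporter ids
        self._reports = defaultdict(int)              # target id -> report count

    def report(self, reporter: int, target: int) -> bool:
        """Record one report; return True when the target should be marked down."""
        self._reporters[target].add(reporter)
        self._reports[target] += 1
        enough_reporters = len(self._reporters[target]) >= self.min_down_reporters
        enough_reports = self._reports[target] >= self.min_reports
        return enough_reporters and enough_reports
```

With the defaults, the third report from a single OSD satisfies both thresholds.
Raising ``min_down_reporters`` to 2 means that reports from one OSD alone are
never sufficient in this model.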


.. ditaa:: +---------+     +---------+
           |  OSD 1  |     | Monitor |
           +---------+     +---------+
                |               |
                | OSD 2 Is Down |
                |-------------->|
                |               |
                | OSD 2 Is Down |
                |-------------->|
                |               |
                | OSD 2 Is Down |
                |-------------->|
                |               |
                |               |----------+ Mark
                |               |          | OSD 2
                |               |<---------+ Down


.. index:: peering failure

OSDs Report Peering Failure
===========================

If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.

.. ditaa:: +---------+     +---------+     +-------+     +---------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |     | Monitor |
           +---------+     +---------+     +-------+     +---------+
                |               |              |              |
                |  Request To   |              |              |
                |     Peer      |              |              |
                |-------------->|              |              |
                |<--------------|              |              |
                |    Peering                   |              |
                |                              |              |
                |  Request To                  |              |
                |     Peer                     |              |
                |----------------------------->|              |
                |                                             |
                |----+ OSD Monitor                            |
                |    | Heartbeat                              |
                |<---+ Interval Exceeded                      |
                |                                             |
                |          Failed to Peer with OSD 3          |
                |-------------------------------------------->|
                |<--------------------------------------------|
                |           Receive New Cluster Map           |


.. index:: OSD status

OSDs Report Their Status
========================

If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds of
a reportable event, such as a failure, a change in placement group stats, a
change in ``up_thru``, or a boot. You can change the Ceph OSD
Daemon minimum report interval by adding an ``osd mon report interval min``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime. A Ceph OSD Daemon sends a report to a Ceph
Monitor every 120 seconds irrespective of whether any notable changes occur.
You can change the Ceph Monitor report interval by adding an ``osd mon report
interval max`` setting under the ``[osd]`` section of your Ceph configuration
file, or by setting the value at runtime.
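
The interplay of the minimum interval, the maximum interval and the monitor's
timeout can be sketched as follows. This is a simplified reading of the
paragraph above rather than Ceph's actual logic: both function names are
hypothetical, and the defaults (5, 120 and 900 seconds) are the ones listed in
this document.

```python
# Defaults listed in this document; the function names are illustrative only.
REPORT_INTERVAL_MIN = 5     # osd mon report interval min
REPORT_INTERVAL_MAX = 120   # osd mon report interval max
REPORT_TIMEOUT = 900        # mon osd report timeout


def should_send_report(now: float, last_report: float, event_pending: bool,
                       interval_min: float = REPORT_INTERVAL_MIN,
                       interval_max: float = REPORT_INTERVAL_MAX) -> bool:
    """OSD side: decide whether to report to a monitor at time `now`."""
    elapsed = now - last_report
    if elapsed >= interval_max:
        # Report regardless of whether anything notable happened.
        return True
    # Otherwise report only for a pending event, once the minimum has passed.
    return event_pending and elapsed >= interval_min


def monitor_marks_down(now: float, last_report: float,
                       timeout: float = REPORT_TIMEOUT) -> bool:
    """Monitor side: True once the OSD has been silent longer than the timeout."""
    return (now - last_report) > timeout
```

In this model an OSD with a pending event reports once 5 seconds have passed
since its last report, an idle OSD reports at 120 seconds, and a monitor that
has heard nothing for more than 900 seconds considers the OSD ``down``.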


.. ditaa:: +---------+          +---------+
           |  OSD 1  |          | Monitor |
           +---------+          +---------+
                |                    |
                |----+ Report Min    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |----+ Reportable    |
                |    | Event         |
                |<---+ Occurs        |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Report Max    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Monitor       |
                |    | Fails         |
                |<---+               |
                                     +----+ Monitor OSD
                                     |    | Report Timeout
                                     |<---+ Exceeded
                                     |
                                     +----+ Mark
                                     |    | OSD 1
                                     |<---+ Down


Configuration Settings
======================

When modifying heartbeat settings, include them in the ``[global]`` section of
your Ceph configuration file.
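
As an illustration, the sketch below embeds a sample ``[global]`` section and
reads it back. The sample values are simply the defaults listed in this
document. Python's ``configparser`` is used here only as a convenient stand-in
for an INI-style reader; it is an assumption that it accepts this fragment the
same way Ceph's own parser does.

```python
import configparser

# Sample fragment; the values are the defaults given in this document.
SAMPLE_CONF = """
[global]
osd heartbeat interval = 6
osd heartbeat grace = 20
mon osd min down reporters = 1
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)

# Collect every setting in [global] as an integer.
heartbeat = {key: parser.getint("global", key)
             for key in parser.options("global")}
print(heartbeat)
```

Placing the settings under ``[global]`` makes one copy of each value visible to
both monitors and OSDs, which matters for settings such as ``osd heartbeat
grace`` that both daemon types read.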

.. index:: monitor heartbeat

Monitor Settings
----------------

``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.

:Type: Double
:Default: ``.3``


``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.

:Type: Double
:Default: ``.75``


``mon osd laggy halflife``

:Description: The half-life, in seconds, with which laggy estimates decay.
:Type: Integer
:Default: ``60*60``


``mon osd laggy weight``

:Description: The weight for new samples in laggy estimation decay.
:Type: Double
:Default: ``0.3``


``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale the down-out interval based
              on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.

:Type: Boolean
:Default: ``false``


``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons auto marked ``out``
              of the Ceph Storage Cluster as ``in`` the cluster.

:Type: Boolean
:Default: ``true``


``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.

:Type: Boolean
:Default: ``true``


``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
              ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer
:Default: ``600``


``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark out. For instance, if set to ``host`` and if
              all OSDs of a host are down, Ceph will not automatically mark out
              these OSDs.

:Type: String
:Default: ``rack``


``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer
:Default: ``900``

``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer
:Default: ``1``


.. index:: OSD heartbeat

OSD Settings
------------

``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.
:Type: Address
:Default: The host address.


``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).
:Type: 32-bit Integer
:Default: ``6``


``osd heartbeat grace``

:Description: The elapsed time, in seconds, without a heartbeat after which the
              Ceph Storage Cluster considers a Ceph OSD Daemon ``down``.
              This setting must be set in both the ``[mon]`` and ``[osd]``
              sections, or in the ``[global]`` section, so that it is read by
              both the MON and OSD daemons.

:Type: 32-bit Integer
:Default: ``20``


``osd mon heartbeat interval``

:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
              Ceph OSD Daemon peers.

:Type: 32-bit Integer
:Default: ``30``


``osd mon report interval max``

:Description: The maximum time in seconds that a Ceph OSD Daemon can wait before
              it must report to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``120``


``osd mon report interval min``

:Description: The minimum number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``5``
:Valid Range: Should be less than ``osd mon report interval max``


``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
              request for statistics.

:Type: 32-bit Integer
:Default: ``30``
381 | ||
382 | :Type: 32-bit Integer | |
383 | :Default: ``30`` | |
384 |