======================
OSD Config Reference
======================

.. index:: OSD; configuration

You can configure Ceph OSD Daemons in the Ceph configuration file (or, in
recent releases, in the central config store), but Ceph OSD Daemons can use
default values with a very minimal configuration. A minimal Ceph OSD Daemon
configuration sets ``osd_journal_size`` (for Filestore) and ``host``, and
uses default values for nearly everything else.

Ceph OSD Daemons are numerically identified in incremental fashion, beginning
with ``0``, using the following convention::

    osd.0
    osd.1
    osd.2

In a configuration file, you may specify settings for all Ceph OSD Daemons in
the cluster by adding configuration settings to the ``[osd]`` section of your
configuration file. To add settings directly to a specific Ceph OSD Daemon
(e.g., ``host``), enter them in an OSD-specific section of your configuration
file. For example:

.. code-block:: ini

    [osd]
    osd_journal_size = 5120

    [osd.0]
    host = osd-host-a

    [osd.1]
    host = osd-host-b


.. index:: OSD; config settings

General Settings
================

The following settings provide a Ceph OSD Daemon's ID, and determine paths to
data and journals. Ceph deployment scripts typically generate the UUID
automatically.

.. warning:: **DO NOT** change the default paths for data or journals, as
             doing so makes it more problematic to troubleshoot Ceph later.

When using Filestore, the journal size should be at least twice the product of
the expected drive speed and ``filestore_max_sync_interval``. However, the
most common practice is to partition the journal drive (often an SSD), and
mount it such that Ceph uses the entire partition for the journal.
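
As a back-of-the-envelope check, the sizing rule above can be computed
directly (a sketch; the drive speed and sync interval values are purely
illustrative):

```python
def min_journal_size_mb(drive_speed_mb_s, filestore_max_sync_interval_s):
    """Minimum Filestore journal size: twice the expected drive
    throughput multiplied by the maximum sync interval."""
    return 2 * drive_speed_mb_s * filestore_max_sync_interval_s

# e.g. a 100 MB/s spinning drive with a 5 s sync interval
print(min_journal_size_mb(100, 5))  # 1000 (MB)
```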


``osd_uuid``

:Description: The universally unique identifier (UUID) for the Ceph OSD Daemon.
:Type: UUID
:Default: The UUID.
:Note: The ``osd_uuid`` applies to a single Ceph OSD Daemon. The ``fsid``
       applies to the entire cluster.


``osd_data``

:Description: The path to the OSD's data. You must create the directory when
              deploying Ceph. You should mount a drive for OSD data at this
              mount point. We do not recommend changing the default.

:Type: String
:Default: ``/var/lib/ceph/osd/$cluster-$id``


``osd_max_write_size``

:Description: The maximum size of a write in megabytes.
:Type: 32-bit Integer
:Default: ``90``


``osd_max_object_size``

:Description: The maximum size of a RADOS object in bytes.
:Type: 32-bit Unsigned Integer
:Default: 128MB


``osd_client_message_size_cap``

:Description: The largest client data message allowed in memory.
:Type: 64-bit Unsigned Integer
:Default: 500MB. ``500*1024L*1024L``


``osd_class_dir``

:Description: The class path for RADOS class plug-ins.
:Type: String
:Default: ``$libdir/rados-classes``


.. index:: OSD; file system

File System Settings
====================

Ceph builds and mounts file systems which are used for Ceph OSDs.

``osd_mkfs_options {fs-type}``

:Description: Options used when creating a new Ceph Filestore OSD of type
              {fs-type}.

:Type: String
:Default for xfs: ``-f -i 2048``
:Default for other file systems: {empty string}

For example::

    osd_mkfs_options_xfs = -f -d agcount=24

``osd_mount_options {fs-type}``

:Description: Options used when mounting a Ceph Filestore OSD of type
              {fs-type}.

:Type: String
:Default for xfs: ``rw,noatime,inode64``
:Default for other file systems: ``rw,noatime``

For example::

    osd_mount_options_xfs = rw,noatime,inode64,logbufs=8

.. index:: OSD; journal settings

Journal Settings
================

This section applies only to the older Filestore OSD back end. Since Luminous,
BlueStore has been the default and preferred back end.

By default, Ceph expects that you will provision a Ceph OSD Daemon's journal
at the following path, which is usually a symlink to a device or partition::

    /var/lib/ceph/osd/$cluster-$id/journal

When using a single device type (for example, spinning drives), the journals
should be *colocated*: the logical volume (or partition) should be in the same
device as the ``data`` logical volume.

When using a mix of fast devices (SSDs, NVMe) and slower ones (like spinning
drives), it makes sense to place the journal on the faster device, while
``data`` fully occupies the slower device.

The default ``osd_journal_size`` value is 5120 (5 gigabytes), but it can be
larger, in which case it will need to be set in the ``ceph.conf`` file.
A value of 10 gigabytes is common in practice::

    osd_journal_size = 10240


``osd_journal``

:Description: The path to the OSD's journal. This may be a path to a file or a
              block device (such as a partition of an SSD). If it is a file,
              you must create the directory to contain it. We recommend using
              a separate fast device when the ``osd_data`` drive is an HDD.

:Type: String
:Default: ``/var/lib/ceph/osd/$cluster-$id/journal``


``osd_journal_size``

:Description: The size of the journal in megabytes.

:Type: 32-bit Integer
:Default: ``5120``


See `Journal Config Reference`_ for additional details.


Monitor OSD Interaction
=======================

Ceph OSD Daemons check each other's heartbeats and report to monitors
periodically. Ceph can use default values in many cases. However, if your
network has latency issues, you may need to adopt longer intervals. See
`Configuring Monitor/OSD Interaction`_ for a detailed discussion of heartbeats.


Data Placement
==============

See `Pool & PG Config Reference`_ for details.


.. index:: OSD; scrubbing

Scrubbing
=========

In addition to making multiple copies of objects, Ceph ensures data integrity
by scrubbing placement groups. Ceph scrubbing is analogous to ``fsck`` on the
object storage layer. For each placement group, Ceph generates a catalog of
all objects and compares each primary object and its replicas to ensure that
no objects are missing or mismatched. Light scrubbing (daily) checks the
object size and attributes. Deep scrubbing (weekly) reads the data and uses
checksums to ensure data integrity.

Scrubbing is important for maintaining data integrity, but it can reduce
performance. You can adjust the following settings to increase or decrease
scrubbing operations.


``osd_max_scrubs``

:Description: The maximum number of simultaneous scrub operations for
              a Ceph OSD Daemon.

:Type: 32-bit Int
:Default: ``1``


``osd_scrub_begin_hour``

:Description: This restricts scrubbing to this hour of the day or later.
              Use ``osd_scrub_begin_hour = 0`` and ``osd_scrub_end_hour = 0``
              to allow scrubbing the entire day. Along with
              ``osd_scrub_end_hour``, they define a time window in which
              scrubs can happen. But a scrub will be performed no matter
              whether the time window allows or not, as long as the placement
              group's scrub interval exceeds ``osd_scrub_max_interval``.
:Type: Integer in the range of 0 to 23
:Default: ``0``


``osd_scrub_end_hour``

:Description: This restricts scrubbing to hours earlier than this.
              Use ``osd_scrub_begin_hour = 0`` and ``osd_scrub_end_hour = 0``
              to allow scrubbing for the entire day. Along with
              ``osd_scrub_begin_hour``, they define a time window in which
              scrubs can happen. But a scrub will be performed no matter
              whether the time window allows or not, as long as the placement
              group's scrub interval exceeds ``osd_scrub_max_interval``.
:Type: Integer in the range of 0 to 23
:Default: ``0``
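
For example, the two options can be combined to confine scheduled scrubs to an
overnight window (the hours here are illustrative, not a recommendation):

.. code-block:: ini

    [osd]
    osd_scrub_begin_hour = 23
    osd_scrub_end_hour = 6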


``osd_scrub_begin_week_day``

:Description: This restricts scrubbing to this day of the week or later.
              0 = Sunday, 1 = Monday, etc. Use ``osd_scrub_begin_week_day = 0``
              and ``osd_scrub_end_week_day = 0`` to allow scrubbing for the
              entire week. Along with ``osd_scrub_end_week_day``, they define
              a time window in which scrubs can happen. But a scrub will be
              performed no matter whether the time window allows or not, as
              long as the placement group's scrub interval exceeds
              ``osd_scrub_max_interval``.
:Type: Integer in the range of 0 to 6
:Default: ``0``


``osd_scrub_end_week_day``

:Description: This restricts scrubbing to days of the week earlier than this.
              0 = Sunday, 1 = Monday, etc. Use ``osd_scrub_begin_week_day = 0``
              and ``osd_scrub_end_week_day = 0`` to allow scrubbing for the
              entire week. Along with ``osd_scrub_begin_week_day``, they
              define a time window in which scrubs can happen. But a scrub
              will be performed no matter whether the time window allows or
              not, as long as the placement group's scrub interval exceeds
              ``osd_scrub_max_interval``.
:Type: Integer in the range of 0 to 6
:Default: ``0``


``osd_scrub_during_recovery``

:Description: Allow scrubs during recovery. Setting this to ``false`` will
              disable scheduling new scrubs (and deep scrubs) while there is
              active recovery. Already running scrubs will continue. This
              might be useful to reduce load on busy clusters.
:Type: Boolean
:Default: ``false``


``osd_scrub_thread_timeout``

:Description: The maximum time in seconds before timing out a scrub thread.
:Type: 32-bit Integer
:Default: ``60``


``osd_scrub_finalize_thread_timeout``

:Description: The maximum time in seconds before timing out a scrub finalize
              thread.

:Type: 32-bit Integer
:Default: ``10*60``


``osd_scrub_load_threshold``

:Description: The normalized maximum load. Ceph will not scrub when the
              system load (as defined by ``getloadavg() / number of online
              CPUs``) is higher than this number.

:Type: Float
:Default: ``0.5``
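
The load check described above amounts to the following comparison (a
simplified sketch, not the daemon's actual implementation):

```python
import os

def scrub_allowed(load_threshold=0.5):
    """Scrubs are skipped when the 1-minute load average, normalized
    by the number of online CPUs, exceeds the threshold."""
    load1, _, _ = os.getloadavg()
    return load1 / os.cpu_count() <= load_threshold

print(scrub_allowed())  # True on a mostly idle machine
```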


``osd_scrub_min_interval``

:Description: The minimal interval in seconds for scrubbing the Ceph OSD
              Daemon when the Ceph Storage Cluster load is low.

:Type: Float
:Default: Once per day. ``24*60*60``

.. _osd_scrub_max_interval:

``osd_scrub_max_interval``

:Description: The maximum interval in seconds for scrubbing the Ceph OSD
              Daemon irrespective of cluster load.

:Type: Float
:Default: Once per week. ``7*24*60*60``


``osd_scrub_chunk_min``

:Description: The minimal number of object store chunks to scrub during a
              single operation. Ceph blocks writes to a single chunk during
              a scrub.

:Type: 32-bit Integer
:Default: 5


``osd_scrub_chunk_max``

:Description: The maximum number of object store chunks to scrub during a
              single operation.

:Type: 32-bit Integer
:Default: 25


``osd_scrub_sleep``

:Description: Time to sleep before scrubbing the next group of chunks.
              Increasing this value will slow down the overall rate of
              scrubbing so that client operations will be less impacted.

:Type: Float
:Default: 0


``osd_deep_scrub_interval``

:Description: The interval for "deep" scrubbing (fully reading all data).
              The ``osd_scrub_load_threshold`` does not affect this setting.

:Type: Float
:Default: Once per week. ``7*24*60*60``


``osd_scrub_interval_randomize_ratio``

:Description: Add a random delay to ``osd_scrub_min_interval`` when
              scheduling the next scrub job for a PG. The delay is a random
              value less than ``osd_scrub_min_interval`` \*
              ``osd_scrub_interval_randomize_ratio``. The default setting
              spreads scrubs throughout the allowed time window of
              ``[1, 1.5]`` \* ``osd_scrub_min_interval``.
:Type: Float
:Default: ``0.5``
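
The resulting scheduling window can be sketched in a few lines (an
illustration of the rule above, not the actual scheduler code):

```python
import random

def next_scrub_delay(min_interval, randomize_ratio=0.5):
    """Delay before the next scheduled scrub: the base interval plus a
    random offset of up to randomize_ratio * min_interval."""
    return min_interval + random.uniform(0, randomize_ratio * min_interval)

# With the defaults (24 h min interval, ratio 0.5), the next scrub lands
# somewhere in the [1.0, 1.5] * min_interval window.
delay = next_scrub_delay(24 * 60 * 60)
assert 24 * 60 * 60 <= delay <= 1.5 * 24 * 60 * 60
```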

``osd_deep_scrub_stride``

:Description: Read size when doing a deep scrub.
:Type: 32-bit Integer
:Default: 512 KB. ``524288``


``osd_scrub_auto_repair``

:Description: Setting this to ``true`` will enable automatic PG repair when
              errors are found by scrubs or deep scrubs. However, if more
              than ``osd_scrub_auto_repair_num_errors`` errors are found, a
              repair is NOT performed.
:Type: Boolean
:Default: ``false``


``osd_scrub_auto_repair_num_errors``

:Description: Auto repair will not occur if more than this many errors are
              found.
:Type: 32-bit Integer
:Default: ``5``

.. index:: OSD; operations settings

Operations
==========

``osd_op_queue``

:Description: This sets the type of queue to be used for prioritizing ops
              within each OSD. Both queues feature a strict sub-queue which
              is dequeued before the normal queue. The normal queue is
              different between implementations. The WeightedPriorityQueue
              (``wpq``) dequeues operations in relation to their priorities
              to prevent starvation of any queue. WPQ should help in cases
              where a few OSDs are more overloaded than others. The new
              mClockQueue (``mclock_scheduler``) prioritizes operations
              based on which class they belong to (recovery, scrub,
              snaptrim, client op, osd subop). See `QoS Based on mClock`_.
              Requires a restart.

:Type: String
:Valid Choices: wpq, mclock_scheduler
:Default: ``wpq``


``osd_op_queue_cut_off``

:Description: This selects which priority ops will be sent to the strict
              queue versus the normal queue. The ``low`` setting sends all
              replication ops and higher to the strict queue, while the
              ``high`` option sends only replication acknowledgment ops and
              higher to the strict queue. Setting this to ``high`` should
              help when a few OSDs in the cluster are very busy, especially
              when combined with ``wpq`` in the ``osd_op_queue`` setting.
              OSDs that are very busy handling replication traffic could
              starve primary client traffic on these OSDs without these
              settings. Requires a restart.

:Type: String
:Valid Choices: low, high
:Default: ``high``


``osd_client_op_priority``

:Description: The priority set for client operations. This value is relative
              to that of ``osd_recovery_op_priority`` below. The default
              strongly favors client ops over recovery.

:Type: 32-bit Integer
:Default: ``63``
:Valid Range: 1-63


``osd_recovery_op_priority``

:Description: The priority of recovery operations vs client operations, if
              not specified by the pool's ``recovery_op_priority``. The
              default value prioritizes client ops (see above) over recovery
              ops. You may adjust the tradeoff of client impact against the
              time to restore cluster health by lowering this value for
              increased prioritization of client ops, or by increasing it to
              favor recovery.

:Type: 32-bit Integer
:Default: ``3``
:Valid Range: 1-63


``osd_scrub_priority``

:Description: The default work queue priority for scheduled scrubs when the
              pool doesn't specify a value of ``scrub_priority``. This can
              be boosted to the value of ``osd_client_op_priority`` when
              scrubs are blocking client operations.

:Type: 32-bit Integer
:Default: ``5``
:Valid Range: 1-63


``osd_requested_scrub_priority``

:Description: The priority set for user-requested scrubs on the work queue.
              If this value were to be smaller than
              ``osd_client_op_priority``, it can be boosted to the value of
              ``osd_client_op_priority`` when scrub is blocking client
              operations.

:Type: 32-bit Integer
:Default: ``120``


``osd_snap_trim_priority``

:Description: The priority set for the snap trim work queue.

:Type: 32-bit Integer
:Default: ``5``
:Valid Range: 1-63

``osd_snap_trim_sleep``

:Description: Time in seconds to sleep before the next snap trim op.
              Increasing this value will slow down snap trimming.
              This option overrides backend-specific variants.

:Type: Float
:Default: ``0``


``osd_snap_trim_sleep_hdd``

:Description: Time in seconds to sleep before the next snap trim op
              for HDDs.

:Type: Float
:Default: ``5``


``osd_snap_trim_sleep_ssd``

:Description: Time in seconds to sleep before the next snap trim op
              for SSD OSDs (including NVMe).

:Type: Float
:Default: ``0``


``osd_snap_trim_sleep_hybrid``

:Description: Time in seconds to sleep before the next snap trim op when OSD
              data is on an HDD and the OSD journal or WAL+DB is on an SSD.

:Type: Float
:Default: ``2``

``osd_op_thread_timeout``

:Description: The Ceph OSD Daemon operation thread timeout in seconds.
:Type: 32-bit Integer
:Default: ``15``


``osd_op_complaint_time``

:Description: An operation becomes complaint-worthy after the specified
              number of seconds have elapsed.

:Type: Float
:Default: ``30``


``osd_op_history_size``

:Description: The maximum number of completed operations to track.
:Type: 32-bit Unsigned Integer
:Default: ``20``


``osd_op_history_duration``

:Description: The oldest completed operation to track.
:Type: 32-bit Unsigned Integer
:Default: ``600``


``osd_op_log_threshold``

:Description: How many operations logs to display at once.
:Type: 32-bit Integer
:Default: ``5``

.. _dmclock-qos:

QoS Based on mClock
-------------------

Ceph's use of mClock is now more refined and can be used by following the
steps described in `mClock Config Reference`_.

Core Concepts
`````````````

Ceph's QoS support is implemented using a queueing scheduler based on `the
dmClock algorithm`_. This algorithm allocates the I/O resources of the Ceph
cluster in proportion to weights, and enforces the constraints of minimum
reservation and maximum limitation, so that the services can compete for the
resources fairly. Currently the *mclock_scheduler* operation queue divides
Ceph services involving I/O resources into the following buckets:

- client op: the IOPS issued by clients
- osd subop: the IOPS issued by a primary OSD
- snap trim: the snap trimming related requests
- pg recovery: the recovery related requests
- pg scrub: the scrub related requests

And the resources are partitioned using the following three sets of tags. In
other words, the share of each type of service is controlled by three tags:

#. reservation: the minimum IOPS allocated for the service.
#. limitation: the maximum IOPS allocated for the service.
#. weight: the proportional share of capacity if extra capacity is available
   or the system is oversubscribed.

In Ceph, operations are graded with "cost". And the resources allocated for
serving various services are consumed by these "costs". So, for example, the
more reservation a service has, the more resource it is guaranteed to
possess, as long as it requires it. Assume there are 2 services: recovery and
client ops:

- recovery: (r:1, l:5, w:1)
- client ops: (r:2, l:0, w:9)

The settings above ensure that recovery won't get more than 5 requests per
second serviced, even if it requires so (see CURRENT IMPLEMENTATION NOTE
below), and no other services are competing with it. But if the clients start
to issue a large amount of I/O requests, neither will they exhaust all the
I/O resources. 1 request per second is always allocated for recovery jobs as
long as there are any such requests. So the recovery jobs won't be starved
even in a cluster with high load. And in the meantime, the client ops can
enjoy a larger portion of the I/O resource, because their weight is "9",
while their competitor's is "1". In the case of client ops, they are not
clamped by the limit setting, so they can make use of all the resources if
there is no recovery ongoing.

CURRENT IMPLEMENTATION NOTE: the current implementation enforces the limit
values. Therefore, if a service crosses the enforced limit, the op remains
in the operation queue until the limit is restored.
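
The recovery/client example above could be expressed with the
``osd_mclock_scheduler_*`` options documented at the end of this section (a
sketch; the values are illustrative, and ``l:0`` is rendered here as the
effectively unlimited default of ``999999``):

.. code-block:: ini

    [osd]
    osd_op_queue = mclock_scheduler
    # recovery: (r:1, l:5, w:1)
    osd_mclock_scheduler_background_recovery_res = 1
    osd_mclock_scheduler_background_recovery_lim = 5
    osd_mclock_scheduler_background_recovery_wgt = 1
    # client ops: (r:2, l:0, w:9)
    osd_mclock_scheduler_client_res = 2
    osd_mclock_scheduler_client_lim = 999999
    osd_mclock_scheduler_client_wgt = 9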

Subtleties of mClock
````````````````````

The reservation and limit values have a unit of requests per second. The
weight, however, does not technically have a unit and the weights are
relative to one another. So if one class of requests has a weight of 1 and
another a weight of 9, then the latter class should have its requests
executed at a 9 to 1 ratio relative to the first class. However, that will
only happen once the reservations are met, and those values include the
operations executed under the reservation phase.

Even though the weights do not have units, one must be careful in choosing
their values due to how the algorithm assigns weight tags to requests. If the
weight is *W*, then for a given class of requests, the next one that comes in
will have a weight tag of *1/W* plus the previous weight tag or the current
time, whichever is larger. That means if *W* is sufficiently large and
therefore *1/W* is sufficiently small, the calculated tag may never be
assigned as it will get a value of the current time. The ultimate lesson is
that values for weight should not be too large. They should be under the
number of requests one expects to be serviced each second.
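
The tag-assignment rule can be illustrated with a short sketch (a
simplification that tracks only the weight tag, not the real dmClock
implementation):

```python
def next_weight_tag(prev_tag, now, weight):
    """Weight tag for the next request in a class: 1/W past the previous
    tag, unless that point already lies in the past relative to `now`."""
    return max(prev_tag + 1.0 / weight, now)

# A modest weight advances the tag ahead of the clock...
assert next_weight_tag(prev_tag=10.0, now=10.0, weight=4) == 10.25
# ...but with an overly large weight the tag collapses to the current
# time, so the weight no longer differentiates requests.
assert next_weight_tag(prev_tag=10.0, now=11.0, weight=1_000_000) == 11.0
```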

Caveats
```````

There are some factors that can reduce the impact of the mClock op queues
within Ceph. First, requests to an OSD are sharded by their placement group
identifier. Each shard has its own mClock queue and these queues neither
interact nor share information among themselves. The number of shards can be
controlled with the configuration options ``osd_op_num_shards``,
``osd_op_num_shards_hdd``, and ``osd_op_num_shards_ssd``. A lower number of
shards will increase the impact of the mClock queues, but may have other
deleterious effects.

Second, requests are transferred from the operation queue to the operation
sequencer, in which they go through the phases of execution. The operation
queue is where mClock resides and mClock determines the next op to transfer
to the operation sequencer. The number of operations allowed in the operation
sequencer is a complex issue. In general we want to keep enough operations in
the sequencer so it's always getting work done on some operations while it's
waiting for disk and network access to complete on other operations. On the
other hand, once an operation is transferred to the operation sequencer,
mClock no longer has control over it. Therefore, to maximize the impact of
mClock, we want to keep as few operations in the operation sequencer as
possible. So we have an inherent tension.

The configuration options that influence the number of operations in the
operation sequencer are ``bluestore_throttle_bytes``,
``bluestore_throttle_deferred_bytes``, ``bluestore_throttle_cost_per_io``,
``bluestore_throttle_cost_per_io_hdd``, and
``bluestore_throttle_cost_per_io_ssd``.

A third factor that affects the impact of the mClock algorithm is that we're
using a distributed system, where requests are made to multiple OSDs and each
OSD has (can have) multiple shards. Yet we're currently using the mClock
algorithm, which is not distributed (note: dmClock is the distributed version
of mClock).

Various organizations and individuals are currently experimenting with mClock
as it exists in this code base along with their modifications to the code
base. We hope you'll share your experiences with your mClock and dmClock
experiments on the ``ceph-devel`` mailing list.


``osd_push_per_object_cost``

:Description: The overhead for serving a push op.

:Type: Unsigned Integer
:Default: 1000


``osd_recovery_max_chunk``

:Description: The maximum total size of data chunks a recovery op can carry.

:Type: Unsigned Integer
:Default: 8 MiB


``osd_mclock_scheduler_client_res``

:Description: IO proportion reserved for each client (default).

:Type: Unsigned Integer
:Default: 1


``osd_mclock_scheduler_client_wgt``

:Description: IO share for each client (default) over reservation.

:Type: Unsigned Integer
:Default: 1


``osd_mclock_scheduler_client_lim``

:Description: IO limit for each client (default) over reservation.

:Type: Unsigned Integer
:Default: 999999


``osd_mclock_scheduler_background_recovery_res``

:Description: IO proportion reserved for background recovery (default).

:Type: Unsigned Integer
:Default: 1


``osd_mclock_scheduler_background_recovery_wgt``

:Description: IO share for each background recovery over reservation.

:Type: Unsigned Integer
:Default: 1


``osd_mclock_scheduler_background_recovery_lim``

:Description: IO limit for background recovery over reservation.

:Type: Unsigned Integer
:Default: 999999


``osd_mclock_scheduler_background_best_effort_res``

:Description: IO proportion reserved for background best_effort (default).

:Type: Unsigned Integer
:Default: 1


``osd_mclock_scheduler_background_best_effort_wgt``

:Description: IO share for each background best_effort over reservation.

:Type: Unsigned Integer
:Default: 1


``osd_mclock_scheduler_background_best_effort_lim``

:Description: IO limit for background best_effort over reservation.
f67539c2 TL |
776 | :Type: Unsigned Integer |
777 | :Default: 999999 | |
c07f9fc5 FG |
778 | |
779 | .. _the dmClock algorithm: https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf | |
780 | ||
781 | ||
7c673cae FG |
782 | .. index:: OSD; backfilling |
783 | ||
784 | Backfilling | |
785 | =========== | |
786 | ||
f67539c2 TL |
When you add Ceph OSD Daemons to a cluster or remove them from it,
CRUSH rebalances the cluster by moving placement groups to or from
Ceph OSDs to restore balanced utilization. The process of migrating
placement groups and the objects they contain can reduce the cluster's
operational performance considerably. To maintain operational
performance, Ceph performs this migration with 'backfilling', which
allows Ceph to set backfill operations to a lower priority than
requests to read or write data.
7c673cae FG |
794 | |
795 | ||
f67539c2 | 796 | ``osd_max_backfills`` |
7c673cae FG |
797 | |
798 | :Description: The maximum number of backfills allowed to or from a single OSD. | |
f67539c2 | 799 | Note that this is applied separately for read and write operations. |
7c673cae FG |
800 | :Type: 64-bit Unsigned Integer |
801 | :Default: ``1`` | |
802 | ||
803 | ||
f67539c2 | 804 | ``osd_backfill_scan_min`` |
7c673cae FG |
805 | |
806 | :Description: The minimum number of objects per backfill scan. | |
807 | ||
808 | :Type: 32-bit Integer | |
1adf2230 | 809 | :Default: ``64`` |
7c673cae FG |
810 | |
811 | ||
f67539c2 | 812 | ``osd_backfill_scan_max`` |
7c673cae FG |
813 | |
814 | :Description: The maximum number of objects per backfill scan. | |
815 | ||
816 | :Type: 32-bit Integer | |
1adf2230 | 817 | :Default: ``512`` |
7c673cae FG |
818 | |
819 | ||
f67539c2 | 820 | ``osd_backfill_retry_interval`` |
7c673cae FG |
821 | |
822 | :Description: The number of seconds to wait before retrying backfill requests. | |
823 | :Type: Double | |
824 | :Default: ``10.0`` | |
825 | ||
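For example, a cluster that can tolerate more rebalancing traffic
might raise these limits in the ``[osd]`` section. The values below
are illustrative only; higher values speed rebalancing at the cost of
client I/O performance:

.. code-block:: ini

   [osd]
   # Example values only: allow more concurrent backfills per OSD
   # and scan more objects per backfill pass.
   osd_max_backfills = 2
   osd_backfill_scan_min = 64
   osd_backfill_scan_max = 1024
   osd_backfill_retry_interval = 30.0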
826 | .. index:: OSD; osdmap | |
827 | ||
828 | OSD Map | |
829 | ======= | |
830 | ||
1adf2230 | 831 | OSD maps reflect the OSD daemons operating in the cluster. Over time, the |
7c673cae FG |
832 | number of map epochs increases. Ceph provides some settings to ensure that |
833 | Ceph performs well as the OSD map grows larger. | |
834 | ||
835 | ||
f67539c2 | 836 | ``osd_map_dedup`` |
7c673cae | 837 | |
1adf2230 | 838 | :Description: Enable removing duplicates in the OSD map. |
7c673cae FG |
839 | :Type: Boolean |
840 | :Default: ``true`` | |
841 | ||
842 | ||
f67539c2 | 843 | ``osd_map_cache_size`` |
7c673cae FG |
844 | |
845 | :Description: The number of OSD maps to keep cached. | |
846 | :Type: 32-bit Integer | |
7c673cae FG |
847 | :Default: ``50`` |
848 | ||
849 | ||
f67539c2 | 850 | ``osd_map_message_max`` |
7c673cae FG |
851 | |
:Description: The maximum number of map entries allowed per MOSDMap message.
853 | :Type: 32-bit Integer | |
a8e16298 | 854 | :Default: ``40`` |
7c673cae FG |
855 | |
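Put together, these map settings might appear in the ``[osd]`` section
as follows, here simply restating the defaults shown above:

.. code-block:: ini

   [osd]
   osd_map_dedup = true      # remove duplicates in the OSD map
   osd_map_cache_size = 50   # number of OSD maps kept cached
   osd_map_message_max = 40  # max map entries per MOSDMap message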
856 | ||
857 | ||
858 | .. index:: OSD; recovery | |
859 | ||
860 | Recovery | |
861 | ======== | |
862 | ||
863 | When the cluster starts or when a Ceph OSD Daemon crashes and restarts, the OSD | |
864 | begins peering with other Ceph OSD Daemons before writes can occur. See | |
865 | `Monitoring OSDs and PGs`_ for details. | |
866 | ||
867 | If a Ceph OSD Daemon crashes and comes back online, usually it will be out of | |
868 | sync with other Ceph OSD Daemons containing more recent versions of objects in | |
869 | the placement groups. When this happens, the Ceph OSD Daemon goes into recovery | |
870 | mode and seeks to get the latest copy of the data and bring its map back up to | |
871 | date. Depending upon how long the Ceph OSD Daemon was down, the OSD's objects | |
872 | and placement groups may be significantly out of date. Also, if a failure domain | |
873 | went down (e.g., a rack), more than one Ceph OSD Daemon may come back online at | |
874 | the same time. This can make the recovery process time consuming and resource | |
875 | intensive. | |
876 | ||
To maintain operational performance, Ceph performs recovery with
limitations on the number of recovery requests, threads, and object
chunk sizes, which allows Ceph to perform well in a degraded state.
7c673cae FG |
880 | |
881 | ||
f67539c2 | 882 | ``osd_recovery_delay_start`` |
7c673cae | 883 | |
1adf2230 | 884 | :Description: After peering completes, Ceph will delay for the specified number |
f67539c2 | 885 | of seconds before starting to recover RADOS objects. |
7c673cae FG |
886 | |
887 | :Type: Float | |
1adf2230 | 888 | :Default: ``0`` |
7c673cae FG |
889 | |
890 | ||
f67539c2 | 891 | ``osd_recovery_max_active`` |
7c673cae | 892 | |
1adf2230 AA |
:Description: The number of active recovery requests per OSD at one time. More
              requests will accelerate recovery, but the requests place an
              increased load on the cluster.
896 | ||
9f95a23c TL |
897 | This value is only used if it is non-zero. Normally it |
898 | is ``0``, which means that the ``hdd`` or ``ssd`` values | |
899 | (below) are used, depending on the type of the primary | |
900 | device backing the OSD. | |
901 | ||
902 | :Type: 32-bit Integer | |
903 | :Default: ``0`` | |
904 | ||
f67539c2 | 905 | ``osd_recovery_max_active_hdd`` |
9f95a23c TL |
906 | |
907 | :Description: The number of active recovery requests per OSD at one time, if the | |
908 | primary device is rotational. | |
909 | ||
7c673cae | 910 | :Type: 32-bit Integer |
31f18b77 | 911 | :Default: ``3`` |
7c673cae | 912 | |
f67539c2 | 913 | ``osd_recovery_max_active_ssd`` |
9f95a23c TL |
914 | |
915 | :Description: The number of active recovery requests per OSD at one time, if the | |
916 | primary device is non-rotational (i.e., an SSD). | |
917 | ||
918 | :Type: 32-bit Integer | |
919 | :Default: ``10`` | |
920 | ||
7c673cae | 921 | |
f67539c2 | 922 | ``osd_recovery_max_chunk`` |
7c673cae | 923 | |
1adf2230 | 924 | :Description: The maximum size of a recovered chunk of data to push. |
c07f9fc5 | 925 | :Type: 64-bit Unsigned Integer |
1adf2230 | 926 | :Default: ``8 << 20`` |
7c673cae FG |
927 | |
928 | ||
f67539c2 | 929 | ``osd_recovery_max_single_start`` |
31f18b77 FG |
930 | |
931 | :Description: The maximum number of recovery operations per OSD that will be | |
932 | newly started when an OSD is recovering. | |
c07f9fc5 | 933 | :Type: 64-bit Unsigned Integer |
31f18b77 FG |
934 | :Default: ``1`` |
935 | ||
936 | ||
f67539c2 | 937 | ``osd_recovery_thread_timeout`` |
7c673cae FG |
938 | |
939 | :Description: The maximum time in seconds before timing out a recovery thread. | |
940 | :Type: 32-bit Integer | |
941 | :Default: ``30`` | |
942 | ||
943 | ||
f67539c2 | 944 | ``osd_recover_clone_overlap`` |
7c673cae | 945 | |
1adf2230 | 946 | :Description: Preserves clone overlap during recovery. Should always be set |
7c673cae FG |
947 | to ``true``. |
948 | ||
949 | :Type: Boolean | |
950 | :Default: ``true`` | |
951 | ||
31f18b77 | 952 | |
f67539c2 | 953 | ``osd_recovery_sleep`` |
31f18b77 | 954 | |
f67539c2 | 955 | :Description: Time in seconds to sleep before the next recovery or backfill op. |
c07f9fc5 FG |
              Increasing this value will slow down recovery operations, while
              client operations will be less impacted.
31f18b77 FG |
958 | |
959 | :Type: Float | |
c07f9fc5 FG |
960 | :Default: ``0`` |
961 | ||
962 | ||
f67539c2 | 963 | ``osd_recovery_sleep_hdd`` |
c07f9fc5 FG |
964 | |
965 | :Description: Time in seconds to sleep before next recovery or backfill op | |
966 | for HDDs. | |
967 | ||
968 | :Type: Float | |
969 | :Default: ``0.1`` | |
970 | ||
971 | ||
f67539c2 | 972 | ``osd_recovery_sleep_ssd`` |
c07f9fc5 | 973 | |
f67539c2 | 974 | :Description: Time in seconds to sleep before the next recovery or backfill op |
c07f9fc5 FG |
975 | for SSDs. |
976 | ||
977 | :Type: Float | |
978 | :Default: ``0`` | |
31f18b77 | 979 | |
d2e6a577 | 980 | |
f67539c2 | 981 | ``osd_recovery_sleep_hybrid`` |
d2e6a577 | 982 | |
f67539c2 TL |
983 | :Description: Time in seconds to sleep before the next recovery or backfill op |
984 | when OSD data is on HDD and OSD journal / WAL+DB is on SSD. | |
d2e6a577 FG |
985 | |
986 | :Type: Float | |
987 | :Default: ``0.025`` | |
988 | ||
11fdf7f2 | 989 | |
f67539c2 | 990 | ``osd_recovery_priority`` |
11fdf7f2 TL |
991 | |
992 | :Description: The default priority set for recovery work queue. Not | |
993 | related to a pool's ``recovery_priority``. | |
994 | ||
995 | :Type: 32-bit Integer | |
996 | :Default: ``5`` | |
997 | ||
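To illustrate, an operator who wants recovery to yield more
aggressively to client I/O might combine the concurrency and sleep
settings above as follows. These are example values, not
recommendations:

.. code-block:: ini

   [osd]
   # Example values only: fewer concurrent recovery requests and a
   # longer pause between recovery/backfill ops.
   osd_recovery_max_active_hdd = 1
   osd_recovery_max_active_ssd = 5
   osd_recovery_sleep_hdd = 0.2
   osd_recovery_sleep_ssd = 0.05
   osd_recovery_max_chunk = 4194304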
998 | ||
7c673cae FG |
999 | Tiering |
1000 | ======= | |
1001 | ||
f67539c2 | 1002 | ``osd_agent_max_ops`` |
7c673cae FG |
1003 | |
1004 | :Description: The maximum number of simultaneous flushing ops per tiering agent | |
1005 | in the high speed mode. | |
1006 | :Type: 32-bit Integer | |
1007 | :Default: ``4`` | |
1008 | ||
1009 | ||
f67539c2 | 1010 | ``osd_agent_max_low_ops`` |
7c673cae FG |
1011 | |
1012 | :Description: The maximum number of simultaneous flushing ops per tiering agent | |
1013 | in the low speed mode. | |
1014 | :Type: 32-bit Integer | |
1015 | :Default: ``2`` | |
1016 | ||
1017 | See `cache target dirty high ratio`_ for when the tiering agent flushes dirty | |
1018 | objects within the high speed mode. | |
1019 | ||
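For instance, a cache tier that needs to flush dirty objects faster
could raise the agent op limits; the values here are hypothetical:

.. code-block:: ini

   [osd]
   # Example values only: more simultaneous flush ops per tiering
   # agent in both high and low speed modes.
   osd_agent_max_ops = 8
   osd_agent_max_low_ops = 4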
1020 | Miscellaneous | |
1021 | ============= | |
1022 | ||
1023 | ||
f67539c2 | 1024 | ``osd_snap_trim_thread_timeout`` |
7c673cae FG |
1025 | |
1026 | :Description: The maximum time in seconds before timing out a snap trim thread. | |
1027 | :Type: 32-bit Integer | |
f67539c2 | 1028 | :Default: ``1*60*60`` |
7c673cae FG |
1029 | |
1030 | ||
f67539c2 | 1031 | ``osd_backlog_thread_timeout`` |
7c673cae FG |
1032 | |
1033 | :Description: The maximum time in seconds before timing out a backlog thread. | |
1034 | :Type: 32-bit Integer | |
f67539c2 | 1035 | :Default: ``1*60*60`` |
7c673cae FG |
1036 | |
1037 | ||
f67539c2 | 1038 | ``osd_default_notify_timeout`` |
7c673cae FG |
1039 | |
1040 | :Description: The OSD default notification timeout (in seconds). | |
c07f9fc5 | 1041 | :Type: 32-bit Unsigned Integer |
1adf2230 | 1042 | :Default: ``30`` |
7c673cae FG |
1043 | |
1044 | ||
f67539c2 | 1045 | ``osd_check_for_log_corruption`` |
7c673cae FG |
1046 | |
1047 | :Description: Check log files for corruption. Can be computationally expensive. | |
1048 | :Type: Boolean | |
1adf2230 | 1049 | :Default: ``false`` |
7c673cae FG |
1050 | |
1051 | ||
f67539c2 | 1052 | ``osd_remove_thread_timeout`` |
7c673cae FG |
1053 | |
1054 | :Description: The maximum time in seconds before timing out a remove OSD thread. | |
1055 | :Type: 32-bit Integer | |
1056 | :Default: ``60*60`` | |
1057 | ||
1058 | ||
f67539c2 | 1059 | ``osd_command_thread_timeout`` |
7c673cae FG |
1060 | |
1061 | :Description: The maximum time in seconds before timing out a command thread. | |
1062 | :Type: 32-bit Integer | |
1adf2230 | 1063 | :Default: ``10*60`` |
7c673cae FG |
1064 | |
1065 | ||
f67539c2 | 1066 | ``osd_delete_sleep`` |
9f95a23c | 1067 | |
f67539c2 TL |
1068 | :Description: Time in seconds to sleep before the next removal transaction. This |
1069 | throttles the PG deletion process. | |
9f95a23c TL |
1070 | |
1071 | :Type: Float | |
1072 | :Default: ``0`` | |
1073 | ||
1074 | ||
f67539c2 | 1075 | ``osd_delete_sleep_hdd`` |
9f95a23c | 1076 | |
f67539c2 | 1077 | :Description: Time in seconds to sleep before the next removal transaction |
9f95a23c TL |
1078 | for HDDs. |
1079 | ||
1080 | :Type: Float | |
1081 | :Default: ``5`` | |
1082 | ||
1083 | ||
f67539c2 | 1084 | ``osd_delete_sleep_ssd`` |
9f95a23c | 1085 | |
f67539c2 | 1086 | :Description: Time in seconds to sleep before the next removal transaction |
9f95a23c TL |
1087 | for SSDs. |
1088 | ||
1089 | :Type: Float | |
1090 | :Default: ``0`` | |
1091 | ||
1092 | ||
f67539c2 | 1093 | ``osd_delete_sleep_hybrid`` |
9f95a23c | 1094 | |
f67539c2 TL |
1095 | :Description: Time in seconds to sleep before the next removal transaction |
1096 | when OSD data is on HDD and OSD journal or WAL+DB is on SSD. | |
9f95a23c TL |
1097 | |
1098 | :Type: Float | |
adb31ebb | 1099 | :Default: ``1`` |
9f95a23c TL |
1100 | |
1101 | ||
f67539c2 | 1102 | ``osd_command_max_records`` |
7c673cae | 1103 | |
1adf2230 | 1104 | :Description: Limits the number of lost objects to return. |
7c673cae | 1105 | :Type: 32-bit Integer |
1adf2230 | 1106 | :Default: ``256`` |
7c673cae FG |
1107 | |
1108 | ||
f67539c2 | 1109 | ``osd_fast_fail_on_connection_refused`` |
7c673cae FG |
1110 | |
1111 | :Description: If this option is enabled, crashed OSDs are marked down | |
1112 | immediately by connected peers and MONs (assuming that the | |
1113 | crashed OSD host survives). Disable it to restore old | |
1114 | behavior, at the expense of possible long I/O stalls when | |
1115 | OSDs crash in the middle of I/O operations. | |
1116 | :Type: Boolean | |
1117 | :Default: ``true`` | |
1118 | ||
1119 | ||
1120 | ||
1121 | .. _pool: ../../operations/pools | |
1122 | .. _Configuring Monitor/OSD Interaction: ../mon-osd-interaction | |
1123 | .. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg#peering | |
1124 | .. _Pool & PG Config Reference: ../pool-pg-config-ref | |
1125 | .. _Journal Config Reference: ../journal-ref | |
1126 | .. _cache target dirty high ratio: ../../operations/pools#cache-target-dirty-high-ratio | |
b3b6e05e | 1127 | .. _mClock Config Reference: ../mclock-config-ref |