=======================
Config Settings
=======================

See `Block Device`_ for additional details.

Generic IO Settings
===================

``rbd_compression_hint``

:Description: Hint to send to the OSDs on write operations. If set to
              ``compressible`` and the OSD ``bluestore_compression_mode``
              setting is ``passive``, the OSD will attempt to compress data.
              If set to ``incompressible`` and the OSD compression setting
              is ``aggressive``, the OSD will not attempt to compress data.
:Type: Enum
:Required: No
:Default: ``none``
:Values: ``none``, ``compressible``, ``incompressible``


``rbd_read_from_replica_policy``

:Description: Policy for determining which OSD will receive read operations.
              If set to ``default``, each PG's primary OSD will always be used
              for read operations. If set to ``balance``, read operations will
              be sent to a randomly selected OSD within the replica set. If set
              to ``localize``, read operations will be sent to the closest OSD
              as determined by the CRUSH map. Note: this feature requires the
              cluster to be configured with a minimum compatible OSD release of
              Octopus.
:Type: Enum
:Required: No
:Default: ``default``
:Values: ``default``, ``balance``, ``localize``

Cache Settings
=======================

.. sidebar:: Kernel Caching

    The kernel driver for Ceph block devices can use the Linux page cache to
    improve performance.

The user space implementation of the Ceph block device (i.e., ``librbd``) cannot
take advantage of the Linux page cache, so it includes its own in-memory
caching, called "RBD caching." RBD caching behaves just like well-behaved hard
disk caching. When the OS sends a barrier or a flush request, all dirty data is
written to the OSDs. This means that using write-back caching is just as safe as
using a well-behaved physical hard disk with a VM that properly sends flushes
(i.e., Linux kernel 2.6.32 or later). The cache uses a Least Recently Used (LRU)
algorithm, and in write-back mode it can coalesce contiguous requests for
better throughput.

The librbd cache is enabled by default and supports three different cache
policies: write-around, write-back, and write-through. Writes return
immediately under both the write-around and write-back policies, unless more
than ``rbd_cache_max_dirty`` bytes have not yet been written to the storage
cluster. Unlike the write-back policy, the write-around policy does not attempt
to service read requests from the cache, and is therefore faster for high
performance write workloads. Under the write-through policy, writes return only
when the data is on disk on all replicas, but reads may come from the cache.

Until it receives its first flush request, the cache behaves like a
write-through cache. This keeps operation safe for older operating systems that
do not send flushes and therefore cannot ensure crash-consistent behavior.

If the librbd cache is disabled, writes and
reads go directly to the storage cluster, and writes return only when the data
is on disk on all replicas.

.. note::
   The cache is in memory on the client, and each RBD image has
   its own. Since the cache is local to the client, there's no coherency
   if there are others accessing the image. Running GFS or OCFS on top of
   RBD will not work with caching enabled.


Option settings for RBD should be set in the ``[client]``
section of your configuration file or the central config store. These settings
include:

``rbd_cache``

:Description: Enable caching for RADOS Block Device (RBD).
:Type: Boolean
:Required: No
:Default: ``true``


``rbd_cache_policy``

:Description: Select the caching policy for librbd.
:Type: Enum
:Required: No
:Default: ``writearound``
:Values: ``writearound``, ``writeback``, ``writethrough``


``rbd_cache_writethrough_until_flush``

:Description: Start out in ``writethrough`` mode, and switch to ``writeback``
              after the first flush request is received. Enabling is a
              conservative but safe strategy in case VMs running on RBD volumes
              are too old to send flushes, like the ``virtio`` driver in Linux
              kernels older than 2.6.32.
:Type: Boolean
:Required: No
:Default: ``true``


``rbd_cache_size``

:Description: The per-volume RBD client cache size in bytes.
:Type: 64-bit Integer
:Required: No
:Default: ``32 MiB``
:Policies: write-back and write-through


``rbd_cache_max_dirty``

:Description: The ``dirty`` limit in bytes at which the cache triggers write-back. If ``0``, uses write-through caching.
:Type: 64-bit Integer
:Required: No
:Constraint: Must be less than ``rbd_cache_size``.
:Default: ``24 MiB``
:Policies: write-around and write-back


``rbd_cache_target_dirty``

:Description: The ``dirty target`` before the cache begins writing data to the data storage. Does not block writes to the cache.
:Type: 64-bit Integer
:Required: No
:Constraint: Must be less than ``rbd_cache_max_dirty``.
:Default: ``16 MiB``
:Policies: write-back


``rbd_cache_max_dirty_age``

:Description: The number of seconds dirty data is in the cache before writeback starts.
:Type: Float
:Required: No
:Default: ``1.0``
:Policies: write-back

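The two ``:Constraint:`` entries above imply an ordering between the three
cache sizes. The following is a minimal sketch, not part of Ceph, that checks
this ordering and renders a ``[client]`` section. The helper names
``check_cache_sizes`` and ``render_client_section`` are hypothetical, and
skipping the target check when ``rbd_cache_max_dirty`` is ``0`` is an
assumption based on the statement that ``0`` selects write-through caching.

```python
import configparser
import io

MIB = 1024 * 1024


def check_cache_sizes(cache_size, max_dirty, target_dirty):
    """Return a list of violated constraints (empty if the sizes are consistent)."""
    problems = []
    if not max_dirty < cache_size:
        problems.append("rbd_cache_max_dirty must be less than rbd_cache_size")
    # Assumption: with max_dirty == 0 (write-through) the dirty target is unused.
    if max_dirty != 0 and not target_dirty < max_dirty:
        problems.append("rbd_cache_target_dirty must be less than rbd_cache_max_dirty")
    return problems


def render_client_section(settings):
    """Render a dict of option names and values as an INI-style [client] section."""
    parser = configparser.ConfigParser()
    parser["client"] = {name: str(value) for name, value in settings.items()}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    sizes = {
        "rbd_cache_size": 64 * MIB,
        "rbd_cache_max_dirty": 48 * MIB,
        "rbd_cache_target_dirty": 32 * MIB,
    }
    problems = check_cache_sizes(
        sizes["rbd_cache_size"],
        sizes["rbd_cache_max_dirty"],
        sizes["rbd_cache_target_dirty"],
    )
    if problems:
        print("\n".join(problems))
    else:
        print(render_client_section({"rbd_cache": "true", **sizes}))
```

The sizes are written as plain byte counts so that the rendered fragment does
not depend on any particular unit-suffix syntax.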

.. _Block Device: ../../rbd


Read-ahead Settings
=======================

librbd supports read-ahead/prefetching to optimize small, sequential reads.
This should normally be handled by the guest OS in the case of a VM,
but boot loaders may not issue efficient reads. Read-ahead is automatically
disabled if caching is disabled or if the policy is write-around.


``rbd_readahead_trigger_requests``

:Description: Number of sequential read requests necessary to trigger read-ahead.
:Type: Integer
:Required: No
:Default: ``10``


``rbd_readahead_max_bytes``

:Description: Maximum size of a read-ahead request. If zero, read-ahead is disabled.
:Type: 64-bit Integer
:Required: No
:Default: ``512 KiB``


``rbd_readahead_disable_after_bytes``

:Description: After this many bytes have been read from an RBD image, read-ahead
              is disabled for that image until it is closed. This allows the
              guest OS to take over read-ahead once it is booted. If zero,
              read-ahead stays enabled.
:Type: 64-bit Integer
:Required: No
:Default: ``50 MiB``

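How the three read-ahead settings interact can be pictured with a small state
tracker. This is an illustrative sketch only: the class ``ReadAheadState`` and
its ``on_read`` method are hypothetical, and always prefetching the full
``max_bytes`` once triggered is a simplification. It does not describe the
actual librbd implementation.

```python
class ReadAheadState:
    """Toy model of the read-ahead settings for a single open image."""

    def __init__(self, trigger_requests=10, max_bytes=512 * 1024,
                 disable_after_bytes=50 * 1024 * 1024):
        self.trigger_requests = trigger_requests
        self.max_bytes = max_bytes
        self.disable_after_bytes = disable_after_bytes
        self.sequential = 0      # length of the current run of sequential reads
        self.next_offset = None  # offset that would continue the current run
        self.total_read = 0      # bytes read since the image was opened

    def on_read(self, offset, length):
        """Record a read and return how many bytes to prefetch (0 means none)."""
        self.total_read += length
        if offset == self.next_offset:
            self.sequential += 1
        else:
            self.sequential = 1  # a non-sequential read starts a new run
        self.next_offset = offset + length

        if self.max_bytes == 0:
            return 0  # zero disables read-ahead entirely
        if self.disable_after_bytes and self.total_read >= self.disable_after_bytes:
            return 0  # threshold reached: read-ahead stays off until close
        if self.sequential >= self.trigger_requests:
            return self.max_bytes
        return 0


if __name__ == "__main__":
    state = ReadAheadState(trigger_requests=3, max_bytes=8192,
                           disable_after_bytes=20000)
    for i in range(5):
        print(i, state.on_read(i * 4096, 4096))
```

In this model a boot loader issuing many small sequential reads would trigger
prefetching after the configured number of requests, and prefetching would
stop once the byte threshold is crossed.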

Image Features
==============

RBD supports advanced features, which can be specified on the command line when
creating images. The default features can be configured via
``rbd_default_features = <sum of feature numeric values>`` or
``rbd_default_features = <comma-delimited list of CLI values>``.

``Layering``

:Description: Layering enables cloning.
:Internal value: 1
:CLI value: layering
:Added in: v0.52 (Bobtail)
:KRBD support: since v3.10
:Default: yes

``Striping v2``

:Description: Striping spreads data across multiple objects. Striping helps with
              parallelism for sequential read/write workloads.
:Internal value: 2
:CLI value: striping
:Added in: v0.55 (Bobtail)
:KRBD support: since v3.10 (default striping only, "fancy" striping added in v4.17)
:Default: yes

``Exclusive locking``

:Description: When enabled, it requires a client to acquire a lock on an object
              before making a write. Exclusive lock should only be enabled when
              a single client is accessing an image at any given time.
:Internal value: 4
:CLI value: exclusive-lock
:Added in: v0.92 (Hammer)
:KRBD support: since v4.9
:Default: yes

``Object map``

:Description: Object map support depends on exclusive lock support. Block
              devices are thin provisioned, which means that they only store
              data that actually has been written, i.e., they are *sparse*.
              Object map support helps track which objects actually exist (have
              data stored on a device). Enabling object map support speeds up
              I/O operations for cloning, importing and exporting a sparsely
              populated image, and deleting.
:Internal value: 8
:CLI value: object-map
:Added in: v0.93 (Hammer)
:KRBD support: since v5.3
:Default: yes


``Fast-diff``

:Description: Fast-diff support depends on object map support and exclusive lock
              support. It adds another property to the object map, which makes
              it much faster to generate diffs between snapshots of an image.
              It is also much faster to calculate the actual data usage of a
              snapshot or volume (``rbd du``).
:Internal value: 16
:CLI value: fast-diff
:Added in: v9.0.1 (Infernalis)
:KRBD support: since v5.3
:Default: yes


``Deep-flatten``

:Description: Deep-flatten enables ``rbd flatten`` to work on all snapshots of
              an image, in addition to the image itself. Without it, snapshots
              of an image will still rely on the parent, so the parent cannot be
              deleted until the snapshots are first deleted. Deep-flatten makes
              a parent independent of its clones, even if they have snapshots,
              at the expense of using additional OSD device space.
:Internal value: 32
:CLI value: deep-flatten
:Added in: v9.0.2 (Infernalis)
:KRBD support: since v5.1
:Default: yes


``Journaling``

:Description: Journaling support depends on exclusive lock support. Journaling
              records all modifications to an image in the order they occur. RBD
              mirroring can utilize the journal to replicate a crash-consistent
              image to a remote cluster. It is best to let ``rbd-mirror``
              manage this feature only as needed, as enabling it long term may
              result in substantial additional OSD space consumption.
:Internal value: 64
:CLI value: journaling
:Added in: v10.0.1 (Jewel)
:KRBD support: no
:Default: no


``Data pool``

:Description: On erasure-coded pools, the image data block objects need to be stored on a separate pool from the image metadata.
:Internal value: 128
:Added in: v11.1.0 (Kraken)
:KRBD support: since v4.11
:Default: no


``Operations``

:Description: Used to restrict older clients from performing certain maintenance operations against an image (e.g. clone, snap create).
:Internal value: 256
:Added in: v13.0.2 (Mimic)
:KRBD support: since v4.16


``Migrating``

:Description: Used to restrict older clients from opening an image when it is in migration state.
:Internal value: 512
:Added in: v14.0.1 (Nautilus)
:KRBD support: no

``Non-primary``

:Description: Used to restrict changes to non-primary images using snapshot-based mirroring.
:Internal value: 1024
:Added in: v15.2.0 (Octopus)
:KRBD support: no

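Because ``rbd_default_features`` accepts either a sum of internal values or a
comma-delimited list of CLI values, it can help to convert between the two
forms. The following is a minimal sketch using only the internal values, CLI
values, and dependencies listed above. The names ``FEATURE_BITS``,
``DEPENDS``, ``features_to_mask``, ``mask_to_features``, and
``missing_dependencies`` are hypothetical helpers for this example, not part
of any Ceph API.

```python
# CLI value -> internal value, for the features that list a CLI value above.
FEATURE_BITS = {
    "layering": 1,
    "striping": 2,
    "exclusive-lock": 4,
    "object-map": 8,
    "fast-diff": 16,
    "deep-flatten": 32,
    "journaling": 64,
}

# Dependencies as stated in the feature descriptions above.
DEPENDS = {
    "object-map": {"exclusive-lock"},
    "fast-diff": {"object-map", "exclusive-lock"},
    "journaling": {"exclusive-lock"},
}


def features_to_mask(names):
    """Sum the internal values of the given CLI feature names."""
    mask = 0
    for name in names:
        mask |= FEATURE_BITS[name.strip()]
    return mask


def mask_to_features(mask):
    """List the CLI feature names whose internal value is set in the mask."""
    return [name for name, bit in FEATURE_BITS.items() if mask & bit]


def missing_dependencies(names):
    """Return the features required by the selection but not included in it."""
    chosen = {name.strip() for name in names}
    missing = set()
    for name in chosen:
        missing |= DEPENDS.get(name, set()) - chosen
    return sorted(missing)


if __name__ == "__main__":
    wanted = "layering,exclusive-lock,object-map,fast-diff,deep-flatten".split(",")
    print(features_to_mask(wanted))
    print(mask_to_features(5))
    print(missing_dependencies(["fast-diff"]))
```

Checking dependencies before computing the sum catches selections such as
``fast-diff`` without ``object-map`` and ``exclusive-lock``.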

QoS Settings
============

librbd supports limiting per-image IO, controlled by the following
settings.

``rbd_qos_iops_limit``

:Description: The desired limit of IO operations per second.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_bps_limit``

:Description: The desired limit of IO bytes per second.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_read_iops_limit``

:Description: The desired limit of read operations per second.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_write_iops_limit``

:Description: The desired limit of write operations per second.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_read_bps_limit``

:Description: The desired limit of read bytes per second.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_write_bps_limit``

:Description: The desired limit of write bytes per second.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_iops_burst``

:Description: The desired burst limit of IO operations.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_bps_burst``

:Description: The desired burst limit of IO bytes.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_read_iops_burst``

:Description: The desired burst limit of read operations.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_write_iops_burst``

:Description: The desired burst limit of write operations.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_read_bps_burst``

:Description: The desired burst limit of read bytes per second.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_write_bps_burst``

:Description: The desired burst limit of write bytes per second.
:Type: Unsigned Integer
:Required: No
:Default: ``0``


``rbd_qos_iops_burst_seconds``

:Description: The desired burst duration in seconds of IO operations.
:Type: Unsigned Integer
:Required: No
:Default: ``1``


``rbd_qos_bps_burst_seconds``

:Description: The desired burst duration in seconds of IO bytes.
:Type: Unsigned Integer
:Required: No
:Default: ``1``


``rbd_qos_read_iops_burst_seconds``

:Description: The desired burst duration in seconds of read operations.
:Type: Unsigned Integer
:Required: No
:Default: ``1``


``rbd_qos_write_iops_burst_seconds``

:Description: The desired burst duration in seconds of write operations.
:Type: Unsigned Integer
:Required: No
:Default: ``1``


``rbd_qos_read_bps_burst_seconds``

:Description: The desired burst duration in seconds of read bytes.
:Type: Unsigned Integer
:Required: No
:Default: ``1``


``rbd_qos_write_bps_burst_seconds``

:Description: The desired burst duration in seconds of write bytes.
:Type: Unsigned Integer
:Required: No
:Default: ``1``


``rbd_qos_schedule_tick_min``

:Description: The minimum schedule tick (in milliseconds) for QoS.
:Type: Unsigned Integer
:Required: No
:Default: ``50``
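
The relationship between a sustained limit and a burst allowance can be
pictured with a token bucket. The sketch below is illustrative only: the class
``TokenBucket``, its ``try_consume`` method, and the mapping of ``rate`` to a
setting such as ``rbd_qos_iops_limit`` and ``capacity`` to a setting such as
``rbd_qos_iops_burst`` are assumptions made for this example. It is not a
description of how librbd implements throttling, and it does not model the
``*_burst_seconds`` or ``rbd_qos_schedule_tick_min`` settings.

```python
class TokenBucket:
    """Toy token bucket: refills at `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = 0.0         # timestamp of the previous call, in seconds

    def try_consume(self, now, amount=1):
        """Refill for the time elapsed since the last call, then try to take tokens."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


if __name__ == "__main__":
    # Hypothetical values: sustained 100 ops/s with room for 150 ops in a burst.
    bucket = TokenBucket(rate=100, capacity=150)
    admitted = sum(bucket.try_consume(now=0.0) for _ in range(200))
    print("admitted at t=0.0:", admitted)
    admitted = sum(bucket.try_consume(now=0.5) for _ in range(200))
    print("admitted at t=0.5:", admitted)
```

Timestamps are passed in explicitly rather than read from a clock so that the
behavior is deterministic and easy to reason about.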