=======================
 Config Settings
=======================

See `Block Device`_ for additional details.

Generic IO Settings
===================

.. confval:: rbd_compression_hint
.. confval:: rbd_read_from_replica_policy
.. confval:: rbd_default_order
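
These options can be set in the central config store for all clients or
overridden for a single image. A minimal sketch, assuming a pool named ``rbd``
and an image named ``image1``:

.. code-block:: console

    $ ceph config set client rbd_read_from_replica_policy localize
    $ rbd config image set rbd/image1 rbd_compression_hint incompressible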

Cache Settings
=======================

.. sidebar:: Kernel Caching

   The kernel driver for Ceph block devices can use the Linux page cache to
   improve performance.

The user space implementation of the Ceph block device (i.e., ``librbd``) cannot
take advantage of the Linux page cache, so it includes its own in-memory
caching, called "RBD caching." RBD caching behaves just like well-behaved hard
disk caching. When the OS sends a barrier or a flush request, all dirty data is
written to the OSDs. This means that using write-back caching is just as safe as
using a well-behaved physical hard disk with a VM that properly sends flushes
(i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU)
algorithm, and in write-back mode it can coalesce contiguous requests for
better throughput.

The librbd cache is enabled by default and supports three different cache
policies: write-around, write-back, and write-through. Writes return
immediately under both the write-around and write-back policies, unless there
are more than ``rbd_cache_max_dirty`` unwritten bytes to the storage cluster.
The write-around policy differs from the write-back policy in that it does not
attempt to service read requests from the cache, which makes it faster for
write-heavy workloads. Under the write-through policy, writes return only when
the data is on disk on all replicas, but reads may come from the cache.

Prior to receiving a flush request, the cache behaves like a write-through
cache. This ensures safe operation for older operating systems that do not
send the flushes needed for crash-consistent behavior.

If the librbd cache is disabled, writes and
reads go directly to the storage cluster, and writes return only when the data
is on disk on all replicas.

.. note::
   The cache is in memory on the client, and each RBD image has
   its own. Since the cache is local to the client, there is no coherency
   if other clients access the same image. Running GFS or OCFS on top of
   RBD will not work with caching enabled.


Option settings for RBD should be set in the ``[client]``
section of your configuration file or the central config store. These settings
include:

.. confval:: rbd_cache
.. confval:: rbd_cache_policy
.. confval:: rbd_cache_writethrough_until_flush
.. confval:: rbd_cache_size
.. confval:: rbd_cache_max_dirty
.. confval:: rbd_cache_target_dirty
.. confval:: rbd_cache_max_dirty_age
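
For example, a ``ceph.conf`` fragment that enables write-back caching for all
librbd clients might look like the following sketch. The sizes shown are
purely illustrative, not tuning recommendations:

.. code-block:: ini

    [client]
    rbd_cache = true
    rbd_cache_policy = writeback
    rbd_cache_writethrough_until_flush = true
    # total per-image cache size, in bytes (32 MiB here)
    rbd_cache_size = 33554432
    # writes block once this much dirty data accumulates (24 MiB here)
    rbd_cache_max_dirty = 25165824
    # begin flushing when dirty data reaches this amount (16 MiB here)
    rbd_cache_target_dirty = 16777216
    # flush dirty data older than this many seconds
    rbd_cache_max_dirty_age = 1.0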

.. _Block Device: ../../rbd


Read-ahead Settings
=======================

librbd supports read-ahead/prefetching to optimize small, sequential reads.
This should normally be handled by the guest OS in the case of a VM,
but boot loaders may not issue efficient reads. Read-ahead is automatically
disabled if caching is disabled or if the policy is write-around.


.. confval:: rbd_readahead_trigger_requests
.. confval:: rbd_readahead_max_bytes
.. confval:: rbd_readahead_disable_after_bytes
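
A sketch of tuning these in the ``[client]`` section; the values are
illustrative only (setting ``rbd_readahead_max_bytes`` to ``0`` disables
read-ahead entirely):

.. code-block:: ini

    [client]
    # start read-ahead after this many sequential read requests
    rbd_readahead_trigger_requests = 10
    # read ahead at most this many bytes per request (512 KiB here)
    rbd_readahead_max_bytes = 524288
    # stop read-ahead after this many bytes have been read from an image,
    # on the assumption that the guest OS has taken over by then (50 MiB here)
    rbd_readahead_disable_after_bytes = 52428800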

Image Features
==============

RBD supports advanced features which can be specified via the command line
when creating images. The default features can be configured via
``rbd_default_features = <sum of feature numeric values>`` or
``rbd_default_features = <comma-delimited list of CLI values>``.
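
For example, the two settings below are equivalent: ``61`` is the sum
``1 + 4 + 8 + 16 + 32`` of the internal values listed below (layering,
exclusive locking, object map, fast-diff and deep-flatten). This is a sketch
only; configure whichever features you actually need:

.. code-block:: ini

    [client]
    rbd_default_features = 61
    # equivalently, as a comma-delimited list of CLI values:
    # rbd_default_features = layering,exclusive-lock,object-map,fast-diff,deep-flatten

Features can also be selected per image at creation time, e.g. with
``rbd create --image-feature layering,exclusive-lock ...``.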

``Layering``

:Description: Layering enables cloning.
:Internal value: 1
:CLI value: layering
:Added in: v0.52 (Bobtail)
:KRBD support: since v3.10
:Default: yes

``Striping v2``

:Description: Striping spreads data across multiple objects. Striping helps with
              parallelism for sequential read/write workloads.
:Internal value: 2
:CLI value: striping
:Added in: v0.55 (Bobtail)
:KRBD support: since v3.10 (default striping only, "fancy" striping added in v4.17)
:Default: yes

``Exclusive locking``

:Description: When enabled, it requires a client to acquire a lock on an object
              before making a write. Exclusive lock should only be enabled when
              a single client is accessing an image at any given time.
:Internal value: 4
:CLI value: exclusive-lock
:Added in: v0.92 (Hammer)
:KRBD support: since v4.9
:Default: yes

``Object map``

:Description: Object map support depends on exclusive lock support. Block
              devices are thin provisioned, which means that they only store
              data that actually has been written, i.e. they are *sparse*.
              Object map support helps track which objects actually exist
              (have data stored on a device). Enabling object map support
              speeds up I/O operations for cloning, importing and exporting a
              sparsely populated image, and deleting.
:Internal value: 8
:CLI value: object-map
:Added in: v0.93 (Hammer)
:KRBD support: since v5.3
:Default: yes


``Fast-diff``

:Description: Fast-diff support depends on object map support and exclusive lock
              support. It adds another property to the object map, which makes
              it much faster to generate diffs between snapshots of an image.
              It is also much faster to calculate the actual data usage of a
              snapshot or volume (``rbd du``).
:Internal value: 16
:CLI value: fast-diff
:Added in: v9.0.1 (Infernalis)
:KRBD support: since v5.3
:Default: yes
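
These two features can be toggled on an existing image. A sketch, assuming an
image named ``rbd/image1`` that already has exclusive locking enabled; after
enabling the object map on an image that already contains data, the map should
be rebuilt before it is relied upon:

.. code-block:: console

    $ rbd feature enable rbd/image1 object-map fast-diff
    $ rbd object-map rebuild rbd/image1
    $ rbd du rbd/image1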


``Deep-flatten``

:Description: Deep-flatten enables ``rbd flatten`` to work on all snapshots of
              an image, in addition to the image itself. Without it, snapshots
              of an image will still rely on the parent, so the parent cannot be
              deleted until the snapshots are first deleted. Deep-flatten makes
              a parent independent of its clones, even if they have snapshots,
              at the expense of using additional OSD device space.
:Internal value: 32
:CLI value: deep-flatten
:Added in: v9.0.2 (Infernalis)
:KRBD support: since v5.1
:Default: yes
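
As a sketch, flattening a clone (here a hypothetical ``rbd/clone1``) copies the
parent's data into the clone; with deep-flatten enabled, the clone's snapshots
are detached from the parent as well, so the parent no longer needs to be kept
for their sake:

.. code-block:: console

    $ rbd flatten rbd/clone1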


``Journaling``

:Description: Journaling support depends on exclusive lock support. Journaling
              records all modifications to an image in the order they occur. RBD
              mirroring can utilize the journal to replicate a crash-consistent
              image to a remote cluster. It is best to let ``rbd-mirror``
              manage this feature only as needed, as enabling it long term may
              result in substantial additional OSD space consumption.
:Internal value: 64
:CLI value: journaling
:Added in: v10.0.1 (Jewel)
:KRBD support: no
:Default: no


``Data pool``

:Description: On erasure-coded pools, the image data block objects need to be
              stored on a separate pool from the image metadata.
:Internal value: 128
:Added in: v11.1.0 (Kraken)
:KRBD support: since v4.11
:Default: no
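
A sketch of creating an image whose data objects are stored in an erasure-coded
pool while the image metadata lives in a replicated pool (both pool names are
placeholders):

.. code-block:: console

    $ rbd create --size 10G --data-pool ec_data_pool replicated_pool/image1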


``Operations``

:Description: Used to restrict older clients from performing certain maintenance
              operations against an image (e.g. clone, snap create).
:Internal value: 256
:Added in: v13.0.2 (Mimic)
:KRBD support: since v4.16


``Migrating``

:Description: Used to restrict older clients from opening an image when it is
              in migration state.
:Internal value: 512
:Added in: v14.0.1 (Nautilus)
:KRBD support: no

``Non-primary``

:Description: Used to restrict changes to non-primary images using
              snapshot-based mirroring.
:Internal value: 1024
:Added in: v15.2.0 (Octopus)
:KRBD support: no


QoS Settings
============

librbd supports limiting per-image IO in several ways. These limits all apply
to a given image within a given process: the same image used in multiple
places, e.g. two separate VMs, would have independent limits.

* **IOPS:** number of I/Os per second (any type of I/O)
* **read IOPS:** number of read I/Os per second
* **write IOPS:** number of write I/Os per second
* **bps:** bytes per second (any type of I/O)
* **read bps:** bytes per second read
* **write bps:** bytes per second written

Each of these limits operates independently of the others. They are
all off by default. Every type of limit throttles I/O using a token
bucket algorithm, with the ability to configure the limit (average
speed over time) and the potential for a higher rate (a burst) for a short
period of time (``burst_seconds``). When any of these limits is reached,
and there is no burst capacity left, librbd reduces the rate of that
type of I/O to the limit.

For example, if a read bps limit of 100 MB/s was configured, but writes
were not limited, writes could proceed as quickly as possible, while
reads would be throttled to 100 MB/s on average. If a read bps burst of
150 MB/s was set, and read burst seconds was set to five seconds, reads
could proceed at 150 MB/s for up to five seconds before dropping back
to the 100 MB/s limit.
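
A sketch of applying the example above to a single image, assuming an image
named ``rbd/image1`` (the values are bytes per second: ``104857600`` is
100 MiB/s and ``157286400`` is 150 MiB/s):

.. code-block:: console

    $ rbd config image set rbd/image1 rbd_qos_read_bps_limit 104857600
    $ rbd config image set rbd/image1 rbd_qos_read_bps_burst 157286400
    $ rbd config image set rbd/image1 rbd_qos_read_bps_burst_seconds 5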

The following options configure these throttles:

.. confval:: rbd_qos_iops_limit
.. confval:: rbd_qos_iops_burst
.. confval:: rbd_qos_iops_burst_seconds
.. confval:: rbd_qos_read_iops_limit
.. confval:: rbd_qos_read_iops_burst
.. confval:: rbd_qos_read_iops_burst_seconds
.. confval:: rbd_qos_write_iops_limit
.. confval:: rbd_qos_write_iops_burst
.. confval:: rbd_qos_write_iops_burst_seconds
.. confval:: rbd_qos_bps_limit
.. confval:: rbd_qos_bps_burst
.. confval:: rbd_qos_bps_burst_seconds
.. confval:: rbd_qos_read_bps_limit
.. confval:: rbd_qos_read_bps_burst
.. confval:: rbd_qos_read_bps_burst_seconds
.. confval:: rbd_qos_write_bps_limit
.. confval:: rbd_qos_write_bps_burst
.. confval:: rbd_qos_write_bps_burst_seconds
.. confval:: rbd_qos_schedule_tick_min
.. confval:: rbd_qos_exclude_ops