=======================
Config Settings
=======================

See `Block Device`_ for additional details.

Generic IO Settings
===================

.. confval:: rbd_compression_hint
.. confval:: rbd_read_from_replica_policy
.. confval:: rbd_default_order

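``rbd_default_order`` is expressed as a power of two: an image's objects are
``2 ** order`` bytes in size, so the commonly used order of 22 gives 4 MiB
objects. The sketch below assumes that relationship and converts an order into
an object size and a worst-case object count. The helper names are
hypothetical and are not part of any Ceph API.

.. code-block:: python

   def object_size_for_order(order: int) -> int:
       """Object size in bytes for an RBD image order (assumes size = 2 ** order)."""
       if order < 0:
           raise ValueError("order must be non-negative")
       return 1 << order


   def objects_for_image(image_size: int, order: int) -> int:
       """Number of objects a fully written image would need (images are sparse,
       so a partially written image uses fewer)."""
       obj = object_size_for_order(order)
       return -(-image_size // obj)  # ceiling division


   if __name__ == "__main__":
       size = object_size_for_order(22)
       print(size, size // (1024 * 1024), "MiB")    # 4194304 4 MiB
       print(objects_for_image(10 * 1024**3, 22))  # 10 GiB image -> 2560 objects
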
Cache Settings
=======================

.. sidebar:: Kernel Caching

   The kernel driver for Ceph block devices can use the Linux page cache to
   improve performance.

The user-space implementation of the Ceph block device (i.e., ``librbd``) cannot
take advantage of the Linux page cache, so it includes its own in-memory
caching, called "RBD caching." RBD caching behaves like a well-behaved hard
disk cache: when the OS sends a barrier or a flush request, all dirty data is
written to the OSDs. This means that write-back caching is just as safe as
using a well-behaved physical hard disk with a VM that properly sends flushes
(e.g., a guest running Linux kernel 2.6.32 or later). The cache uses a Least
Recently Used (LRU) algorithm, and in write-back mode it can coalesce contiguous
requests for better throughput.

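To illustrate what coalescing contiguous requests means, the following minimal
sketch merges adjacent or overlapping dirty extents, given as hypothetical
``(offset, length)`` pairs, into fewer and larger writes. It is a simplified
model for illustration, not librbd's own cache code.

.. code-block:: python

   from typing import List, Tuple

   Extent = Tuple[int, int]  # hypothetical (offset, length) in bytes


   def coalesce(extents: List[Extent]) -> List[Extent]:
       """Merge overlapping or contiguous extents into the fewest runs."""
       merged: List[Extent] = []
       for off, length in sorted(extents):
           if merged and off <= merged[-1][0] + merged[-1][1]:
               # Contiguous with (or overlapping) the previous run: extend it.
               prev_off, prev_len = merged[-1]
               end = max(prev_off + prev_len, off + length)
               merged[-1] = (prev_off, end - prev_off)
           else:
               merged.append((off, length))
       return merged


   if __name__ == "__main__":
       dirty = [(8192, 4096), (0, 4096), (4096, 4096), (65536, 4096)]
       print(coalesce(dirty))  # [(0, 12288), (65536, 4096)]

Four 4 KiB writes become two requests: three contiguous extents collapse into
one 12 KiB write, while the distant extent stays separate.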
The librbd cache is enabled by default and supports three cache policies:
write-around, write-back, and write-through. Under both the write-around and
write-back policies, writes return immediately unless more than
``rbd_cache_max_dirty`` bytes are waiting to be written to the storage cluster.
The write-around policy differs from the write-back policy in that it does not
service read requests from the cache, which makes it faster for
high-performance write workloads. Under the write-through policy, writes return
only when the data is on disk on all replicas, but reads may be served from the
cache.

Until it receives the first flush request, the cache behaves like a
write-through cache (see ``rbd_cache_writethrough_until_flush``). This ensures
safe operation with older operating systems that do not send the flushes
required for crash-consistent behavior.

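The following toy model summarizes the policy rules described above: when a
write is acknowledged, whether reads may be served from the cache, and the
write-through behavior that applies until the first flush. It is a sketch for
illustration only; the ``CacheModel`` class and its fields are hypothetical
and only loosely mirror ``rbd_cache_policy``, ``rbd_cache_max_dirty`` and
``rbd_cache_writethrough_until_flush``.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class CacheModel:
       """Toy model of the librbd cache policies (illustrative, not librbd code)."""

       policy: str                    # "writearound", "writeback" or "writethrough"
       max_dirty: int                 # unwritten bytes allowed before writes must wait
       writethrough_until_flush: bool = True
       dirty: int = 0                 # bytes buffered but not yet on the OSDs
       flushed_once: bool = False

       def effective_policy(self) -> str:
           # Until the first flush arrives, behave like a write-through cache.
           if self.writethrough_until_flush and not self.flushed_once:
               return "writethrough"
           return self.policy

       def write_returns_immediately(self, nbytes: int) -> bool:
           """True if a write is acknowledged before reaching all replicas."""
           if self.effective_policy() == "writethrough":
               return False
           if self.dirty + nbytes > self.max_dirty:
               return False  # too much unwritten data: this write must wait
           self.dirty += nbytes
           return True

       def reads_may_hit_cache(self) -> bool:
           # Write-around does not service reads from the cache.
           return self.effective_policy() != "writearound"

       def flush(self) -> None:
           # Model a flush: all dirty data is written out.
           self.dirty = 0
           self.flushed_once = True


   if __name__ == "__main__":
       cache = CacheModel(policy="writeback", max_dirty=24 * 1024 * 1024)
       print(cache.write_returns_immediately(4096))  # False: no flush seen yet
       cache.flush()
       print(cache.write_returns_immediately(4096))  # True: now in write-back mode
       print(cache.reads_may_hit_cache())            # True
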
If the librbd cache is disabled, writes and reads go directly to the storage
cluster, and writes return only when the data is on disk on all replicas.

.. note::
   The cache resides in memory on the client, and each RBD image has its own
   cache. Because the cache is local to the client, there is no coherency if
   other clients are accessing the same image. Running GFS or OCFS on top of
   RBD will not work with caching enabled.


RBD options should be set in the ``[client]`` section of your configuration
file or in the central config store. The cache settings include:

.. confval:: rbd_cache
.. confval:: rbd_cache_policy
.. confval:: rbd_cache_writethrough_until_flush
.. confval:: rbd_cache_size
.. confval:: rbd_cache_max_dirty
.. confval:: rbd_cache_target_dirty
.. confval:: rbd_cache_max_dirty_age

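As an illustration of placing these options in the ``[client]`` section, the
sketch below uses Python's standard ``configparser`` module to render such a
section. It also checks an assumed ordering between the dirty-data thresholds
(``rbd_cache_target_dirty`` below ``rbd_cache_max_dirty``, which is below
``rbd_cache_size``) and treats ``rbd_cache_max_dirty = 0`` as write-through.
The byte values are examples, not recommendations, and the helper
``client_cache_section`` is hypothetical.

.. code-block:: python

   import configparser
   import io

   MiB = 1024 * 1024


   def client_cache_section(cache_size: int, max_dirty: int, target_dirty: int,
                            max_dirty_age: float, policy: str = "writeback") -> str:
       """Render a [client] section with RBD cache options (example values only)."""
       # Assumed constraints: target_dirty < max_dirty < cache_size.
       # max_dirty == 0 is treated as write-through, so target_dirty is not checked.
       if not max_dirty < cache_size:
           raise ValueError("rbd_cache_max_dirty must be less than rbd_cache_size")
       if max_dirty and not target_dirty < max_dirty:
           raise ValueError("rbd_cache_target_dirty must be less than rbd_cache_max_dirty")

       config = configparser.ConfigParser()
       config["client"] = {
           "rbd_cache": "true",
           "rbd_cache_policy": policy,
           "rbd_cache_size": str(cache_size),
           "rbd_cache_max_dirty": str(max_dirty),
           "rbd_cache_target_dirty": str(target_dirty),
           "rbd_cache_max_dirty_age": str(max_dirty_age),
       }
       out = io.StringIO()
       config.write(out)
       return out.getvalue()


   if __name__ == "__main__":
       print(client_cache_section(32 * MiB, 24 * MiB, 16 * MiB, 1.0))
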
d2e6a577 | 69 | .. _Block Device: ../../rbd |
7c673cae FG |
70 | |
71 | ||
72 | Read-ahead Settings | |
73 | ======================= | |
74 | ||
librbd supports read-ahead (prefetching) to optimize small, sequential reads.
For a VM, this should normally be handled by the guest OS, but boot loaders
may not issue efficient reads. Read-ahead is automatically disabled if caching
is disabled or if the cache policy is write-around.

.. confval:: rbd_readahead_trigger_requests
.. confval:: rbd_readahead_max_bytes
.. confval:: rbd_readahead_disable_after_bytes

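Roughly, the three read-ahead options interact as follows: once
``rbd_readahead_trigger_requests`` sequential reads have been seen, librbd
prefetches up to ``rbd_readahead_max_bytes``, and it stops doing so after
``rbd_readahead_disable_after_bytes`` have been read, so that a booted guest
OS can take over. The sketch below is a simplified, hypothetical model of that
behavior; the class name and the example values are assumptions for
illustration.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional


   @dataclass
   class ReadaheadModel:
       """Simplified model of read-ahead triggering (not librbd code)."""

       trigger_requests: int      # sequential reads needed before prefetching
       max_bytes: int             # size of each prefetch; 0 disables read-ahead
       disable_after_bytes: int   # stop after this many bytes read; 0 = never
       _next_offset: Optional[int] = None
       _sequential: int = 0
       _total_read: int = 0

       def on_read(self, offset: int, length: int) -> int:
           """Record a read; return how many bytes would be prefetched after it."""
           self._total_read += length
           if self._next_offset == offset:
               self._sequential += 1
           else:
               self._sequential = 1  # a non-sequential read restarts the count
           self._next_offset = offset + length

           if self.max_bytes == 0:
               return 0
           if self.disable_after_bytes and self._total_read > self.disable_after_bytes:
               return 0  # assume the guest OS handles read-ahead from here on
           if self._sequential >= self.trigger_requests:
               return self.max_bytes
           return 0


   if __name__ == "__main__":
       ra = ReadaheadModel(trigger_requests=10, max_bytes=512 * 1024,
                           disable_after_bytes=50 * 1024 * 1024)
       prefetch = [ra.on_read(i * 4096, 4096) for i in range(12)]
       print(prefetch)  # first nine reads prefetch nothing; reads 10-12 prefetch 524288
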
Image Features
==============

RBD supports advanced features that can be specified on the command line when
creating images. Alternatively, the default features can be configured via
``rbd_default_features = <sum of feature numeric values>`` or
``rbd_default_features = <comma-delimited list of CLI values>``.

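Each feature's internal value (listed below) is a distinct power of two, so
the numeric form of ``rbd_default_features`` is simply the sum, or
equivalently the bitwise OR, of the chosen features. The following sketch
converts between the comma-delimited CLI form and the numeric form. The table
covers only the features that have a CLI value in this section, and the
function names are hypothetical.

.. code-block:: python

   from typing import Dict, List

   # Internal values of the RBD image features that have a CLI value.
   FEATURES: Dict[str, int] = {
       "layering": 1,
       "striping": 2,
       "exclusive-lock": 4,
       "object-map": 8,
       "fast-diff": 16,
       "deep-flatten": 32,
       "journaling": 64,
   }


   def features_to_int(cli_list: str) -> int:
       """Convert 'layering,exclusive-lock,...' to the numeric feature value."""
       mask = 0
       for name in filter(None, (part.strip() for part in cli_list.split(","))):
           if name not in FEATURES:
               raise ValueError(f"unknown feature: {name}")
           mask |= FEATURES[name]
       return mask


   def int_to_features(mask: int) -> List[str]:
       """Convert a numeric feature value back to CLI names, lowest bit first."""
       unknown = mask & ~sum(FEATURES.values())
       if unknown:
           raise ValueError(f"value contains bits without a CLI name: {unknown}")
       return [name for name, bit in FEATURES.items() if mask & bit]


   if __name__ == "__main__":
       value = features_to_int("layering,exclusive-lock,object-map,fast-diff,deep-flatten")
       print(value)                   # 61
       print(int_to_features(value))  # the same five names

For example, ``layering`` (1) + ``exclusive-lock`` (4) + ``object-map`` (8) +
``fast-diff`` (16) + ``deep-flatten`` (32) gives 61.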
``Layering``

:Description: Layering enables cloning.
:Internal value: 1
:CLI value: layering
:Added in: v0.52 (Bobtail)
:KRBD support: since v3.10
:Default: yes

``Striping v2``

:Description: Striping spreads data across multiple objects, which improves
              parallelism for sequential read/write workloads.
:Internal value: 2
:CLI value: striping
:Added in: v0.55 (Bobtail)
:KRBD support: since v3.10 (default striping only, "fancy" striping added in v4.17)
:Default: yes

``Exclusive locking``

:Description: When enabled, a client must acquire a lock on an object before
              writing to it. Exclusive lock should only be enabled when a
              single client accesses an image at any given time.
:Internal value: 4
:CLI value: exclusive-lock
:Added in: v0.92 (Hammer)
:KRBD support: since v4.9
:Default: yes

``Object map``

:Description: Object map support depends on exclusive lock support. Block
              devices are thin provisioned, which means that they store only
              data that has actually been written; that is, they are
              *sparse*. Object map support tracks which objects actually
              exist (have data stored on a device). Enabling object map
              support speeds up I/O operations for cloning, importing and
              exporting a sparsely populated image, and deleting an image.
:Internal value: 8
:CLI value: object-map
:Added in: v0.93 (Hammer)
:KRBD support: since v5.3
:Default: yes


``Fast-diff``

:Description: Fast-diff support depends on object map support and exclusive
              lock support. It adds another property to the object map, which
              makes it much faster to generate diffs between snapshots of an
              image and to calculate the actual data usage of a snapshot or
              volume (``rbd du``).
:Internal value: 16
:CLI value: fast-diff
:Added in: v9.0.1 (Infernalis)
:KRBD support: since v5.3
:Default: yes


``Deep-flatten``

:Description: Deep-flatten enables ``rbd flatten`` to work on all snapshots
              of an image, in addition to the image itself. Without it,
              snapshots of an image still rely on the parent, so the parent
              cannot be deleted until those snapshots are deleted.
              Deep-flatten makes a clone independent of its parent, even if
              the clone has snapshots, at the expense of additional OSD
              device space.
:Internal value: 32
:CLI value: deep-flatten
:Added in: v9.0.2 (Infernalis)
:KRBD support: since v5.1
:Default: yes


``Journaling``

:Description: Journaling support depends on exclusive lock support.
              Journaling records all modifications to an image in the order
              they occur. RBD mirroring can use the journal to replicate a
              crash-consistent image to a remote cluster. It is best to let
              ``rbd-mirror`` manage this feature only as needed, because
              enabling it long term may result in substantial additional OSD
              space consumption.
:Internal value: 64
:CLI value: journaling
:Added in: v10.0.1 (Jewel)
:KRBD support: no
:Default: no


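The feature descriptions above state several dependencies: object map
requires exclusive lock, fast-diff requires object map and exclusive lock,
and journaling requires exclusive lock. The following minimal sketch checks a
list of CLI feature names against those rules before it is used; the helper
``missing_dependencies`` is hypothetical and not part of the ``rbd`` tool.

.. code-block:: python

   from typing import Dict, Iterable, List, Set

   # Feature -> features it depends on, as stated in the descriptions above.
   DEPENDS_ON: Dict[str, Set[str]] = {
       "object-map": {"exclusive-lock"},
       "fast-diff": {"object-map", "exclusive-lock"},
       "journaling": {"exclusive-lock"},
   }


   def missing_dependencies(features: Iterable[str]) -> List[str]:
       """Describe each enabled feature whose prerequisites are not enabled."""
       enabled = set(features)
       problems = []
       for feature in sorted(enabled):
           for required in sorted(DEPENDS_ON.get(feature, set()) - enabled):
               problems.append(f"{feature} requires {required}")
       return problems


   if __name__ == "__main__":
       print(missing_dependencies(["layering", "fast-diff"]))
       # ['fast-diff requires exclusive-lock', 'fast-diff requires object-map']
       print(missing_dependencies(["layering", "exclusive-lock", "object-map"]))  # []
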
``Data pool``

:Description: On erasure-coded pools, the image data block objects must be
              stored in a separate pool from the image metadata.
:Internal value: 128
:Added in: v11.1.0 (Kraken)
:KRBD support: since v4.11
:Default: no


``Operations``

:Description: Used to restrict older clients from performing certain
              maintenance operations against an image (e.g., clone, snap
              create).
:Internal value: 256
:Added in: v13.0.2 (Mimic)
:KRBD support: since v4.16


``Migrating``

:Description: Used to restrict older clients from opening an image when it
              is in a migration state.
:Internal value: 512
:Added in: v14.0.1 (Nautilus)
:KRBD support: no

``Non-primary``

:Description: Used to restrict changes to non-primary images that use
              snapshot-based mirroring.
:Internal value: 1024
:Added in: v15.2.0 (Octopus)
:KRBD support: no


QoS Settings
============

librbd supports limiting per-image I/O in several ways. These limits apply to
a given image within a given process: the same image used in multiple places
(e.g., by two separate VMs) has independent limits.

* **IOPS:** number of I/Os per second (any type of I/O)
* **read IOPS:** number of read I/Os per second
* **write IOPS:** number of write I/Os per second
* **bps:** bytes per second (any type of I/O)
* **read bps:** bytes per second read
* **write bps:** bytes per second written

Each of these limits operates independently of the others, and all of them
are off by default. Every type of limit throttles I/O using a token bucket
algorithm, which lets you configure both the limit (the average rate over
time) and a higher rate (a burst) that may be sustained for a short period of
time (the burst seconds). When any of these limits is reached and no burst
capacity is left, librbd reduces the rate of that type of I/O to the limit.

For example, if a read bps limit (``rbd_qos_read_bps_limit``) of 100 MB/s was
configured but writes were not limited, writes could proceed as quickly as
possible, while reads would be throttled to 100 MB/s on average. If a read bps
burst (``rbd_qos_read_bps_burst``) of 150 MB/s was set and the read burst
seconds (``rbd_qos_read_bps_burst_seconds``) were set to five seconds, reads
could proceed at 150 MB/s for up to five seconds before dropping back to the
100 MB/s limit.

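The burst example above can be reproduced with a simple token bucket. In the
minimal sketch below, each one-second step adds ``limit`` tokens to any saved
ones, at most ``burst`` tokens can be spent per step, and at most
``(burst - limit) * burst_seconds`` unspent tokens are carried over. This is
one way to obtain the described behavior and is an assumption for
illustration. It is not a transcription of librbd's throttle implementation,
and ``simulate_throttle`` is a hypothetical helper.

.. code-block:: python

   from typing import List


   def simulate_throttle(demand: List[float], limit: float, burst: float,
                         burst_seconds: float) -> List[float]:
       """Return the amount allowed in each one-second step (simplified model)."""
       cap = max(burst - limit, 0) * burst_seconds  # largest saved burst allowance
       saved = cap                                  # start with a full allowance
       allowed = []
       for wanted in demand:
           available = saved + limit
           # Never exceed the burst rate (or the limit, if no burst is set).
           used = min(wanted, max(burst, limit), available)
           saved = min(cap, available - used)
           allowed.append(used)
       return allowed


   if __name__ == "__main__":
       # Rates in MB/s; reads want far more than the limit for 8 seconds.
       rates = simulate_throttle([1000] * 8, limit=100, burst=150, burst_seconds=5)
       print(rates)  # [150, 150, 150, 150, 150, 100, 100, 100]

Reads run at 150 MB/s for five seconds and then settle at the 100 MB/s limit;
idle periods refill the allowance, up to the same five-second burst.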
The following options configure these throttles:

.. confval:: rbd_qos_iops_limit
.. confval:: rbd_qos_iops_burst
.. confval:: rbd_qos_iops_burst_seconds
.. confval:: rbd_qos_read_iops_limit
.. confval:: rbd_qos_read_iops_burst
.. confval:: rbd_qos_read_iops_burst_seconds
.. confval:: rbd_qos_write_iops_limit
.. confval:: rbd_qos_write_iops_burst
.. confval:: rbd_qos_write_iops_burst_seconds
.. confval:: rbd_qos_bps_limit
.. confval:: rbd_qos_bps_burst
.. confval:: rbd_qos_bps_burst_seconds
.. confval:: rbd_qos_read_bps_limit
.. confval:: rbd_qos_read_bps_burst
.. confval:: rbd_qos_read_bps_burst_seconds
.. confval:: rbd_qos_write_bps_limit
.. confval:: rbd_qos_write_bps_burst
.. confval:: rbd_qos_write_bps_burst_seconds
.. confval:: rbd_qos_schedule_tick_min
.. confval:: rbd_qos_exclude_ops