===============
 Cache Tiering
===============

A cache tier provides Ceph Clients with better I/O performance for a subset of
the data stored in a backing storage tier. Cache tiering involves creating a
pool of relatively fast/expensive storage devices (e.g., solid state drives)
configured to act as a cache tier, and a backing pool of either erasure-coded
or relatively slower/cheaper devices configured to act as an economical storage
tier. The Ceph objecter handles where to place the objects and the tiering
agent determines when to flush objects from the cache to the backing storage
tier. So the cache tier and the backing storage tier are completely transparent
to Ceph clients.

.. ditaa::
           +-------------+
           | Ceph Client |
           +------+------+
                  ^
    Tiering is    |
   Transparent    |              Faster I/O
       to Ceph    |           +---------------+
    Client Ops    |           |               |
                  |    +----->+   Cache Tier  |
                  |    |      |               |
                  |    |      +-----+---+-----+
                  |    |            |   ^
                  v    v            |   |   Active Data in Cache Tier
           +------+----+--+         |   |
           |   Objecter   |         |   |
           +-----------+--+         |   |
                       ^            |   |   Inactive Data in Storage Tier
                       |            v   |
                       |      +-----+---+-----+
                       |      |               |
                       +----->|  Storage Tier |
                              |               |
                              +---------------+
                                 Slower I/O

The cache tiering agent handles the migration of data between the cache tier
and the backing storage tier automatically. However, admins have the ability to
configure how this migration takes place. There are two main scenarios:

- **Writeback Mode:** When admins configure tiers with ``writeback`` mode, Ceph
  clients write data to the cache tier and receive an ACK from the cache tier.
  In time, the data written to the cache tier migrates to the storage tier
  and gets flushed from the cache tier. Conceptually, the cache tier is
  overlaid "in front" of the backing storage tier. When a Ceph client needs
  data that resides in the storage tier, the cache tiering agent migrates the
  data to the cache tier on read, then it is sent to the Ceph client.
  Thereafter, the Ceph client can perform I/O using the cache tier, until the
  data becomes inactive. This is ideal for mutable data (e.g., photo/video
  editing, transactional data, etc.).

- **Read-proxy Mode:** This mode uses any objects that already exist in the
  cache tier, but proxies requests for objects that are not present in the
  cache to the base tier. This is useful for transitioning from ``writeback``
  mode to a disabled cache, as it allows the workload to function properly
  while the cache is drained, without adding any new objects to the cache
  (see the example after this list).

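When a cache is later drained in this way, the mode is switched with the same
``ceph osd tier cache-mode`` command described under `Creating a Cache Tier`_;
as a minimal illustration, using the pool name from the examples below::

   ceph osd tier cache-mode hot-storage readproxy
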
A word of caution
=================

Cache tiering will *degrade* performance for most workloads. Users should use
extreme caution before using this feature.

* *Workload dependent*: Whether a cache will improve performance is
  highly dependent on the workload. Because there is a cost
  associated with moving objects into or out of the cache, it can only
  be effective when there is a *large skew* in the access pattern in
  the data set, such that most of the requests touch a small number of
  objects. The cache pool should be large enough to capture the
  working set for your workload to avoid thrashing.

* *Difficult to benchmark*: Most benchmarks that users run to measure
  performance will show terrible performance with cache tiering, in
  part because very few of them skew requests toward a small set of
  objects, because it can take a long time for the cache to "warm up,"
  and because the warm-up cost can be high.

* *Usually slower*: For workloads that are not cache tiering-friendly,
  performance is often slower than a normal RADOS pool without cache
  tiering enabled.

* *librados object enumeration*: The librados-level object enumeration
  API is not meant to be coherent in the presence of a cache. If
  your application is using librados directly and relies on object
  enumeration, cache tiering will probably not work as expected.
  (This is not a problem for RGW, RBD, or CephFS.)

* *Complexity*: Enabling cache tiering means that a lot of additional
  machinery and complexity within the RADOS cluster is being used.
  This increases the probability that you will encounter a bug in the system
  that other users have not yet encountered and will put your deployment at a
  higher level of risk.

Known Good Workloads
--------------------

* *RGW time-skewed*: If the RGW workload is such that almost all read
  operations are directed at recently written objects, a simple cache
  tiering configuration that destages recently written objects from
  the cache to the base tier after a configurable period can work
  well. A minimal sketch of such a configuration appears below.

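The following is a minimal sketch of that idea rather than a tuned
recommendation; the pool names match the examples used later in this document,
and every value shown is an assumption to be adjusted for the actual workload::

   # Assumed pools: hot-storage (cache tier) backed by cold-storage (base tier).
   ceph osd tier add cold-storage hot-storage
   ceph osd tier cache-mode hot-storage writeback
   ceph osd tier set-overlay cold-storage hot-storage

   # Track recency with bloom-filter HitSets (here, 12 four-hour periods).
   ceph osd pool set hot-storage hit_set_type bloom
   ceph osd pool set hot-storage hit_set_count 12
   ceph osd pool set hot-storage hit_set_period 14400

   # Destage recently written objects after roughly 30 minutes, and cap the
   # cache size so that flushing and eviction can actually trigger.
   ceph osd pool set hot-storage cache_min_flush_age 1800
   ceph osd pool set hot-storage target_max_bytes 1099511627776
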
Known Bad Workloads
-------------------

The following configurations are *known to work poorly* with cache
tiering.

* *RBD with replicated cache and erasure-coded base*: This is a common
  request, but usually does not perform well. Even reasonably skewed
  workloads still send some small writes to cold objects, and because
  small writes are not yet supported by the erasure-coded pool, entire
  (usually 4 MB) objects must be migrated into the cache in order to
  satisfy a small (often 4 KB) write. Only a handful of users have
  successfully deployed this configuration, and it only works for them
  because their data is extremely cold (backups) and they are not in
  any way sensitive to performance.

* *RBD with replicated cache and base*: RBD with a replicated base
  tier does better than when the base is erasure coded, but it is
  still highly dependent on the amount of skew in the workload, and
  very difficult to validate. The user will need to have a good
  understanding of their workload and will need to tune the cache
  tiering parameters carefully.

Setting Up Pools
================

To set up cache tiering, you must have two pools. One will act as the
backing storage and the other will act as the cache.


Setting Up a Backing Storage Pool
---------------------------------

Setting up a backing storage pool typically involves one of two scenarios:

- **Standard Storage**: In this scenario, the pool stores multiple copies
  of an object in the Ceph Storage Cluster.

- **Erasure Coding:** In this scenario, the pool uses erasure coding to
  store data much more efficiently with a small performance tradeoff.

In the standard storage scenario, you can set up a CRUSH rule to establish
the failure domain (e.g., osd, host, chassis, rack, row, etc.). Ceph OSD
Daemons perform optimally when all storage drives in the rule are of the
same size, speed (both RPMs and throughput) and type. See `CRUSH Maps`_
for details on creating a rule. Once you have created a rule, create
a backing storage pool.

In the erasure coding scenario, the pool creation arguments will generate the
appropriate rule automatically. See `Create a Pool`_ for details.

In subsequent examples, we will refer to the backing storage pool
as ``cold-storage``.

166 Setting Up a Cache Pool
167 -----------------------
168
169 Setting up a cache pool follows the same procedure as the standard storage
170 scenario, but with this difference: the drives for the cache tier are typically
171 high performance drives that reside in their own servers and have their own
172 CRUSH rule. When setting up such a rule, it should take account of the hosts
173 that have the high performance drives while omitting the hosts that don't. See
174 `Placing Different Pools on Different OSDs`_ for details.
175
176
177 In subsequent examples, we will refer to the cache pool as ``hot-storage`` and
178 the backing pool as ``cold-storage``.
179
180 For cache tier configuration and default values, see
181 `Pools - Set Pool Values`_.
182
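As an illustration only, a dedicated rule and cache pool might be created as
follows; the rule name, the ``ssd`` device class, and the placement-group
counts are assumptions for this sketch::

   # Replicated CRUSH rule that chooses one host per replica, restricted to
   # OSDs registered with the "ssd" device class
   ceph osd crush rule create-replicated ssd-rule default host ssd

   # Cache pool placed on that rule
   ceph osd pool create hot-storage 128 128 replicated ssd-rule
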

Creating a Cache Tier
=====================

Setting up a cache tier involves associating a backing storage pool with
a cache pool ::

   ceph osd tier add {storagepool} {cachepool}

For example ::

   ceph osd tier add cold-storage hot-storage

To set the cache mode, execute the following::

   ceph osd tier cache-mode {cachepool} {cache-mode}

For example::

   ceph osd tier cache-mode hot-storage writeback

Because the cache tier overlays the backing storage tier, one additional step
is required: you must direct all client traffic from the storage pool to the
cache pool. To direct client traffic to the cache pool, execute the following::

   ceph osd tier set-overlay {storagepool} {cachepool}

For example::

   ceph osd tier set-overlay cold-storage hot-storage

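To sanity-check the association, one option is to inspect the pool entries in
the OSD map. The exact output format varies by release, but the cache pool's
entry should reference the backing pool (e.g., via a ``tier_of`` field and the
configured cache mode), and the backing pool's entry should list the cache pool
as its tier. Pool names here match the examples above::

   ceph osd dump | grep -E 'hot-storage|cold-storage'
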

Configuring a Cache Tier
========================

Cache tiers have several configuration options. You may set
cache tier configuration options with the following usage::

   ceph osd pool set {cachepool} {key} {value}

See `Pools - Set Pool Values`_ for details.

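Settings applied with ``ceph osd pool set`` can be read back with the
corresponding ``get`` form, which is a convenient way to verify a cache tier's
configuration; this is a general-purpose pool command rather than something
specific to cache tiering::

   ceph osd pool get {cachepool} {key}

For example, to check one of the values discussed below::

   ceph osd pool get hot-storage target_max_bytes
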

Target Size and Type
--------------------

Ceph's production cache tiers use a `Bloom Filter`_ for the ``hit_set_type``::

   ceph osd pool set {cachepool} hit_set_type bloom

For example::

   ceph osd pool set hot-storage hit_set_type bloom

The ``hit_set_count`` and ``hit_set_period`` define how much time each HitSet
should cover, and how many such HitSets to store. ::

   ceph osd pool set {cachepool} hit_set_count 12
   ceph osd pool set {cachepool} hit_set_period 14400
   ceph osd pool set {cachepool} target_max_bytes 1000000000000

With the values above, each HitSet covers 14400 seconds (4 hours) and 12
HitSets are retained, so roughly the most recent 48 hours of access history
are tracked.

.. note:: A larger ``hit_set_count`` results in more RAM consumed by
   the ``ceph-osd`` process.

Binning accesses over time allows Ceph to determine whether a Ceph client
accessed an object at least once, or more than once over a time period
("age" vs "temperature").

The ``min_read_recency_for_promote`` parameter defines how many HitSets to
check for the existence of an object when handling a read operation. The result
is used to decide whether to promote the object asynchronously. Its value should
be between 0 and ``hit_set_count``. If it is set to 0, the object is always
promoted. If it is set to 1, only the current HitSet is checked, and the object
is promoted only if it is found there. For higher values, that exact number of
HitSets is checked, and the object is promoted if it is found in any of the most
recent ``min_read_recency_for_promote`` HitSets.

A similar parameter, ``min_write_recency_for_promote``, can be set for write
operations. ::

   ceph osd pool set {cachepool} min_read_recency_for_promote 2
   ceph osd pool set {cachepool} min_write_recency_for_promote 2

.. note:: The longer the period and the higher the
   ``min_read_recency_for_promote`` and ``min_write_recency_for_promote``
   values, the more RAM the ``ceph-osd`` daemon consumes. In particular,
   when the agent is active to flush or evict cache objects, all
   ``hit_set_count`` HitSets are loaded into RAM.


Cache Sizing
------------

The cache tiering agent performs two main functions:

- **Flushing:** The agent identifies modified (or dirty) objects and forwards
  them to the storage pool for long-term storage.

- **Evicting:** The agent identifies objects that haven't been modified
  (clean objects) and evicts the least recently used among them from the cache.


Absolute Sizing
~~~~~~~~~~~~~~~

The cache tiering agent can flush or evict objects based upon the total number
of bytes or the total number of objects. To specify a maximum number of bytes,
execute the following::

   ceph osd pool set {cachepool} target_max_bytes {#bytes}

For example, to flush or evict at 1 TB, execute the following::

   ceph osd pool set hot-storage target_max_bytes 1099511627776

To specify the maximum number of objects, execute the following::

   ceph osd pool set {cachepool} target_max_objects {#objects}

For example, to flush or evict at 1M objects, execute the following::

   ceph osd pool set hot-storage target_max_objects 1000000

.. note:: Ceph is not able to determine the size of a cache pool automatically,
   so an absolute size must be configured here; otherwise, flushing and eviction
   will not work. If you specify both limits, the cache tiering agent will begin
   flushing or evicting when either threshold is triggered.

.. note:: All client requests will be blocked only when the ``target_max_bytes``
   or ``target_max_objects`` limit is reached.

Relative Sizing
~~~~~~~~~~~~~~~

The cache tiering agent can flush or evict objects relative to the size of the
cache pool (specified by ``target_max_bytes`` / ``target_max_objects`` in
`Absolute sizing`_). When the cache pool consists of a certain percentage of
modified (or dirty) objects, the cache tiering agent will flush them to the
storage pool. To set the ``cache_target_dirty_ratio``, execute the following::

   ceph osd pool set {cachepool} cache_target_dirty_ratio {0.0..1.0}

For example, setting the value to ``0.4`` will begin flushing modified
(dirty) objects when they reach 40% of the cache pool's capacity::

   ceph osd pool set hot-storage cache_target_dirty_ratio 0.4

When the dirty objects reach a certain higher percentage of the cache pool's
capacity, the agent flushes them more aggressively. To set the
``cache_target_dirty_high_ratio``, execute the following::

   ceph osd pool set {cachepool} cache_target_dirty_high_ratio {0.0..1.0}

For example, setting the value to ``0.6`` will begin aggressively flushing
dirty objects when they reach 60% of the cache pool's capacity. This value
should lie between ``cache_target_dirty_ratio`` and ``cache_target_full_ratio``::

   ceph osd pool set hot-storage cache_target_dirty_high_ratio 0.6

When the cache pool reaches a certain percentage of its capacity, the cache
tiering agent will evict objects to maintain free capacity. To set the
``cache_target_full_ratio``, execute the following::

   ceph osd pool set {cachepool} cache_target_full_ratio {0.0..1.0}

For example, setting the value to ``0.8`` will begin evicting unmodified
(clean) objects when they reach 80% of the cache pool's capacity::

   ceph osd pool set hot-storage cache_target_full_ratio 0.8

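Taken together, the three thresholds should remain ordered so that
``cache_target_dirty_ratio`` < ``cache_target_dirty_high_ratio`` <
``cache_target_full_ratio``. The values below simply restate the examples above
as one illustrative combination::

   ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
   ceph osd pool set hot-storage cache_target_dirty_high_ratio 0.6
   ceph osd pool set hot-storage cache_target_full_ratio 0.8
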

Cache Age
---------

You can specify the minimum age of an object before the cache tiering agent
flushes a recently modified (or dirty) object to the backing storage pool::

   ceph osd pool set {cachepool} cache_min_flush_age {#seconds}

For example, to flush modified (or dirty) objects after 10 minutes, execute
the following::

   ceph osd pool set hot-storage cache_min_flush_age 600

You can specify the minimum age of an object before it will be evicted from
the cache tier::

   ceph osd pool set {cachepool} cache_min_evict_age {#seconds}

For example, to evict objects after 30 minutes, execute the following::

   ceph osd pool set hot-storage cache_min_evict_age 1800

Removing a Cache Tier
=====================

Removing a cache tier differs depending on whether it is a writeback
cache or a read-only cache.


Removing a Read-Only Cache
--------------------------

Since a read-only cache does not have modified data, you can disable
and remove it without losing any recent changes to objects in the cache.

#. Change the cache-mode to ``none`` to disable it. ::

      ceph osd tier cache-mode {cachepool} none

   For example::

      ceph osd tier cache-mode hot-storage none

#. Remove the cache pool from the backing pool. ::

      ceph osd tier remove {storagepool} {cachepool}

   For example::

      ceph osd tier remove cold-storage hot-storage


Removing a Writeback Cache
--------------------------

Since a writeback cache may have modified data, you must take steps to ensure
that you do not lose any recent changes to objects in the cache before you
disable and remove it.


#. Change the cache mode to ``forward`` so that new and modified objects will
   flush to the backing storage pool. ::

      ceph osd tier cache-mode {cachepool} forward

   For example::

      ceph osd tier cache-mode hot-storage forward


#. Ensure that the cache pool has been flushed. This may take a few minutes::

      rados -p {cachepool} ls

   If the cache pool still has objects, you can flush them manually.
   For example::

      rados -p {cachepool} cache-flush-evict-all


#. Remove the overlay so that clients will not direct traffic to the cache. ::

      ceph osd tier remove-overlay {storagepool}

   For example::

      ceph osd tier remove-overlay cold-storage


#. Finally, remove the cache tier pool from the backing storage pool. ::

      ceph osd tier remove {storagepool} {cachepool}

   For example::

      ceph osd tier remove cold-storage hot-storage


.. _Create a Pool: ../pools#create-a-pool
.. _Pools - Set Pool Values: ../pools#set-pool-values
.. _Placing Different Pools on Different OSDs: ../crush-map/#placing-different-pools-on-different-osds
.. _Bloom Filter: http://en.wikipedia.org/wiki/Bloom_filter
.. _CRUSH Maps: ../crush-map
.. _Absolute Sizing: #absolute-sizing