===============
 Cache Tiering
===============

.. warning:: Cache tiering has been deprecated in the Reef release as it
   has lacked a maintainer for a very long time. This does not mean
   that it will certainly be removed, but we may choose to remove it
   without much further notice.

A cache tier provides Ceph Clients with better I/O performance for a subset of
the data stored in a backing storage tier. Cache tiering involves creating a
pool of relatively fast/expensive storage devices (e.g., solid state drives)
configured to act as a cache tier, and a backing pool of either erasure-coded
or relatively slower/cheaper devices configured to act as an economical storage
tier. The Ceph objecter handles where to place the objects and the tiering
agent determines when to flush objects from the cache to the backing storage
tier. The cache tier and the backing storage tier are therefore completely
transparent to Ceph clients.

.. ditaa::
           +-------------+
           | Ceph Client |
           +------+------+
                  ^
   Tiering is     |
  Transparent     |              Faster I/O
   to Ceph        |           +---------------+
  Client Ops      |           |               |
                  |    +----->+   Cache Tier  |
                  |    |      |               |
                  |    |      +-----+---+-----+
                  |    |            |   ^
                  v    v            |   |   Active Data in Cache Tier
           +------+----+--+         |   |
           |   Objecter   |         |   |
           +-----------+--+         |   |
                       ^            |   |   Inactive Data in Storage Tier
                       |            v   |
                       |      +-----+---+-----+
                       |      |               |
                       +----->|  Storage Tier |
                              |               |
                              +---------------+
                                   Slower I/O


The cache tiering agent handles the migration of data between the cache tier
and the backing storage tier automatically. However, administrators can
configure how this migration takes place by setting the ``cache-mode``. There
are two main scenarios:

- **writeback** mode: If the base tier and the cache tier are configured in
  ``writeback`` mode, Ceph clients receive an ACK from the base tier every time
  they write data to it. Then the cache tiering agent determines whether
  ``osd_tier_default_cache_min_write_recency_for_promote`` has been set. If it
  has been set and the data has been written more than a specified number of
  times per interval, the data is promoted to the cache tier.

  When Ceph clients need access to data stored in the base tier, the cache
  tiering agent reads the data from the base tier and returns it to the client.
  While data is being read from the base tier, the cache tiering agent consults
  the value of ``osd_tier_default_cache_min_read_recency_for_promote`` and
  decides whether to promote that data from the base tier to the cache tier.
  When data has been promoted from the base tier to the cache tier, the Ceph
  client is able to perform I/O operations on it using the cache tier. This is
  well-suited for mutable data (for example, photo/video editing, transactional
  data).

- **readproxy** mode: This mode will use any objects that already
  exist in the cache tier, but if an object is not present in the
  cache the request will be proxied to the base tier. This is useful
  for transitioning from ``writeback`` mode to a disabled cache as it
  allows the workload to function properly while the cache is drained,
  without adding any new objects to the cache.

Other cache modes are:

- **readonly** promotes objects to the cache on read operations only; write
  operations are forwarded to the base tier. This mode is intended for
  read-only workloads that do not require consistency to be enforced by the
  storage system. (**Warning**: when objects are updated in the base tier,
  Ceph makes **no** attempt to sync these updates to the corresponding objects
  in the cache. Since this mode is considered experimental, a
  ``--yes-i-really-mean-it`` option must be passed in order to enable it; see
  the example after this list.)

- **none** is used to completely disable caching.

89
90 A word of caution
91 =================
92
93 Cache tiering will *degrade* performance for most workloads. Users should use
94 extreme caution before using this feature.
95
96 * *Workload dependent*: Whether a cache will improve performance is
97 highly dependent on the workload. Because there is a cost
98 associated with moving objects into or out of the cache, it can only
99 be effective when there is a *large skew* in the access pattern in
100 the data set, such that most of the requests touch a small number of
101 objects. The cache pool should be large enough to capture the
102 working set for your workload to avoid thrashing.
103
104 * *Difficult to benchmark*: Most benchmarks that users run to measure
105 performance will show terrible performance with cache tiering, in
106 part because very few of them skew requests toward a small set of
107 objects, it can take a long time for the cache to "warm up," and
108 because the warm-up cost can be high.
109
110 * *Usually slower*: For workloads that are not cache tiering-friendly,
111 performance is often slower than a normal RADOS pool without cache
112 tiering enabled.
113
114 * *librados object enumeration*: The librados-level object enumeration
115 API is not meant to be coherent in the presence of the case. If
116 your application is using librados directly and relies on object
117 enumeration, cache tiering will probably not work as expected.
118 (This is not a problem for RGW, RBD, or CephFS.)
119
120 * *Complexity*: Enabling cache tiering means that a lot of additional
121 machinery and complexity within the RADOS cluster is being used.
122 This increases the probability that you will encounter a bug in the system
123 that other users have not yet encountered and will put your deployment at a
124 higher level of risk.
125
Known Good Workloads
--------------------

* *RGW time-skewed*: If the RGW workload is such that almost all read
  operations are directed at recently written objects, a simple cache
  tiering configuration that destages recently written objects from
  the cache to the base tier after a configurable period can work
  well.

Known Bad Workloads
-------------------

The following configurations are *known to work poorly* with cache
tiering.

* *RBD with replicated cache and erasure-coded base*: This is a common
  request, but usually does not perform well. Even reasonably skewed
  workloads still send some small writes to cold objects, and because
  small writes are not yet supported by the erasure-coded pool, entire
  (usually 4 MB) objects must be migrated into the cache in order to
  satisfy a small (often 4 KB) write. Only a handful of users have
  successfully deployed this configuration, and it only works for them
  because their data is extremely cold (backups) and they are not in
  any way sensitive to performance.

* *RBD with replicated cache and base*: RBD with a replicated base
  tier does better than when the base is erasure coded, but it is
  still highly dependent on the amount of skew in the workload, and
  very difficult to validate. The user will need to have a good
  understanding of their workload and will need to tune the cache
  tiering parameters carefully.


Setting Up Pools
================

To set up cache tiering, you must have two pools. One will act as the
backing storage and the other will act as the cache.


Setting Up a Backing Storage Pool
---------------------------------

Setting up a backing storage pool typically involves one of two scenarios:

- **Standard Storage**: In this scenario, the pool stores multiple copies
  of an object in the Ceph Storage Cluster.

- **Erasure Coding:** In this scenario, the pool uses erasure coding to
  store data much more efficiently with a small performance tradeoff.

In the standard storage scenario, you can set up a CRUSH rule to establish
the failure domain (e.g., osd, host, chassis, rack, row, etc.). Ceph OSD
Daemons perform optimally when all storage drives in the rule are of the
same size, speed (both RPMs and throughput) and type. See `CRUSH Maps`_
for details on creating a rule. Once you have created a rule, create
a backing storage pool.

In the erasure coding scenario, the pool creation arguments will generate the
appropriate rule automatically. See `Create a Pool`_ for details.

In subsequent examples, we will refer to the backing storage pool
as ``cold-storage``.
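
As a minimal sketch of the standard storage scenario (the PG counts below are
illustrative example values, not sizing recommendations), the backing pool
might be created as follows; for the erasure coding scenario, ``erasure``
would be used in place of ``replicated``:

.. prompt:: bash $

   ceph osd pool create cold-storage 128 128 replicated
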


Setting Up a Cache Pool
-----------------------

Setting up a cache pool follows the same procedure as the standard storage
scenario, but with this difference: the drives for the cache tier are typically
high performance drives that reside in their own servers and have their own
CRUSH rule. When setting up such a rule, it should take account of the hosts
that have the high performance drives while omitting the hosts that don't. See
:ref:`CRUSH Device Class<crush-map-device-class>` for details.

In subsequent examples, we will refer to the cache pool as ``hot-storage`` and
the backing pool as ``cold-storage``.
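
One possible way to do this, sketched here under the assumption that the high
performance drives are SSDs registered under the ``ssd`` device class (the
rule name ``hot-storage-rule`` and the PG counts are illustrative values, not
recommendations):

.. prompt:: bash $

   ceph osd crush rule create-replicated hot-storage-rule default host ssd
   ceph osd pool create hot-storage 128 128 replicated hot-storage-rule
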

For cache tier configuration and default values, see
`Pools - Set Pool Values`_.


Creating a Cache Tier
=====================

Setting up a cache tier involves associating a backing storage pool with
a cache pool:

.. prompt:: bash $

   ceph osd tier add {storagepool} {cachepool}

For example:

.. prompt:: bash $

   ceph osd tier add cold-storage hot-storage

To set the cache mode, execute the following:

.. prompt:: bash $

   ceph osd tier cache-mode {cachepool} {cache-mode}

For example:

.. prompt:: bash $

   ceph osd tier cache-mode hot-storage writeback

The cache tiers overlay the backing storage tier, so they require one
additional step: you must direct all client traffic from the storage pool to
the cache pool. To direct client traffic to the cache pool, execute the
following:

.. prompt:: bash $

   ceph osd tier set-overlay {storagepool} {cachepool}

For example:

.. prompt:: bash $

   ceph osd tier set-overlay cold-storage hot-storage
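
To verify that the tiering relationship is in place, you can inspect the pool
details (a rough sketch; the exact output format varies by release). The
backing pool should list the cache pool as its read/write tier overlay, and
the cache pool should report that it is a tier of the backing pool:

.. prompt:: bash $

   ceph osd pool ls detail
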


Configuring a Cache Tier
========================

Cache tiers have several configuration options. You may set
cache tier configuration options with the following usage:

.. prompt:: bash $

   ceph osd pool set {cachepool} {key} {value}

See `Pools - Set Pool Values`_ for details.
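
To read back a value that has been set (a convenient way to confirm the
configuration shown in the rest of this section), the corresponding ``get``
form can be used:

.. prompt:: bash $

   ceph osd pool get {cachepool} {key}
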


Target Size and Type
--------------------

Ceph's production cache tiers use a `Bloom Filter`_ for the ``hit_set_type``:

.. prompt:: bash $

   ceph osd pool set {cachepool} hit_set_type bloom

For example:

.. prompt:: bash $

   ceph osd pool set hot-storage hit_set_type bloom

The ``hit_set_count`` and ``hit_set_period`` define how many such HitSets to
store, and how much time each HitSet should cover:

.. prompt:: bash $

   ceph osd pool set {cachepool} hit_set_count 12
   ceph osd pool set {cachepool} hit_set_period 14400
   ceph osd pool set {cachepool} target_max_bytes 1000000000000

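
With these example values, the OSDs keep 12 HitSets of 14400 seconds (4 hours)
each, so roughly the last 48 hours of access history is available for
promotion decisions.
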

.. note:: A larger ``hit_set_count`` results in more RAM consumed by
   the ``ceph-osd`` process.

Binning accesses over time allows Ceph to determine whether a Ceph client
accessed an object at least once, or more than once over a time period
("age" vs "temperature").

The ``min_read_recency_for_promote`` defines how many HitSets to check for the
existence of an object when handling a read operation. The checking result is
used to decide whether to promote the object asynchronously. Its value should
be between 0 and ``hit_set_count``. If it's set to 0, the object is always
promoted. If it's set to 1, the current HitSet is checked, and the object is
promoted only if it is present in that HitSet. For higher values, the
corresponding number of archive HitSets is checked: the object is promoted if
it is found in any of the most recent ``min_read_recency_for_promote``
HitSets.

A similar parameter can be set for the write operation, which is
``min_write_recency_for_promote``:

.. prompt:: bash $

   ceph osd pool set {cachepool} min_read_recency_for_promote 2
   ceph osd pool set {cachepool} min_write_recency_for_promote 2

.. note:: The longer the period and the higher the
   ``min_read_recency_for_promote`` and
   ``min_write_recency_for_promote`` values, the more RAM the ``ceph-osd``
   daemon consumes. In particular, when the agent is actively flushing
   or evicting cache objects, all ``hit_set_count`` HitSets are loaded
   into RAM.


Cache Sizing
------------

The cache tiering agent performs two main functions:

- **Flushing:** The agent identifies modified (or dirty) objects and forwards
  them to the storage pool for long-term storage.

- **Evicting:** The agent identifies objects that haven't been modified
  (i.e., clean objects) and evicts the least recently used among them from
  the cache.

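
The cache pool's current usage, which the agent compares against the targets
described below, can be observed with the standard pool statistics commands,
for example (a general-purpose command, not specific to cache tiering):

.. prompt:: bash $

   ceph df detail
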

Absolute Sizing
~~~~~~~~~~~~~~~

The cache tiering agent can flush or evict objects based upon the total number
of bytes or the total number of objects. To specify a maximum number of bytes,
execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} target_max_bytes {#bytes}

For example, to flush or evict at 1 TB, execute the following:

.. prompt:: bash $

   ceph osd pool set hot-storage target_max_bytes 1099511627776

To specify the maximum number of objects, execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} target_max_objects {#objects}

For example, to flush or evict at 1M objects, execute the following:

.. prompt:: bash $

   ceph osd pool set hot-storage target_max_objects 1000000

.. note:: Ceph is not able to determine the size of a cache pool automatically,
   so configuration of an absolute size is required here; otherwise, flushing
   and eviction will not work. If you specify both limits, the cache tiering
   agent will begin flushing or evicting when either threshold is triggered.

.. note:: All client requests will be blocked only when the ``target_max_bytes``
   or ``target_max_objects`` limit has been reached.

Relative Sizing
~~~~~~~~~~~~~~~

The cache tiering agent can flush or evict objects relative to the size of the
cache pool (specified by ``target_max_bytes`` / ``target_max_objects`` in
`Absolute Sizing`_). When the cache pool consists of a certain percentage of
modified (or dirty) objects, the cache tiering agent will flush them to the
storage pool. To set the ``cache_target_dirty_ratio``, execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_target_dirty_ratio {0.0..1.0}

For example, setting the value to ``0.4`` will begin flushing modified
(dirty) objects when they reach 40% of the cache pool's capacity:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_target_dirty_ratio 0.4

When the dirty objects reach a certain higher percentage of the cache pool's
capacity, the cache tiering agent flushes them at a higher speed. To set the
``cache_target_dirty_high_ratio``, execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_target_dirty_high_ratio {0.0..1.0}

For example, setting the value to ``0.6`` will begin aggressively flushing
dirty objects when they reach 60% of the cache pool's capacity. This value
should be set between ``cache_target_dirty_ratio`` and
``cache_target_full_ratio``:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_target_dirty_high_ratio 0.6

When the cache pool reaches a certain percentage of its capacity, the cache
tiering agent will evict objects to maintain free capacity. To set the
``cache_target_full_ratio``, execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_target_full_ratio {0.0..1.0}

For example, setting the value to ``0.8`` will begin evicting unmodified
(clean) objects when they reach 80% of the cache pool's capacity:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_target_full_ratio 0.8

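
As a concrete illustration of how these ratios combine with the absolute
limits, if ``target_max_bytes`` is set to 1 TiB (1099511627776 bytes) as in
the Absolute Sizing example, then dirty objects start being flushed at about
0.4 TiB of dirty data, flushing speeds up at about 0.6 TiB, and clean objects
start being evicted once total usage reaches about 0.8 TiB.
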


Cache Age
---------

You can specify the minimum age of an object before the cache tiering agent
flushes a recently modified (or dirty) object to the backing storage pool:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_min_flush_age {#seconds}

For example, to flush modified (or dirty) objects after 10 minutes, execute the
following:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_min_flush_age 600

You can specify the minimum age of an object before it will be evicted from the
cache tier:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_min_evict_age {#seconds}

For example, to evict objects after 30 minutes, execute the following:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_min_evict_age 1800


Removing a Cache Tier
=====================

Removing a cache tier differs depending on whether it is a writeback
cache or a read-only cache.


Removing a Read-Only Cache
--------------------------

Since a read-only cache does not have modified data, you can disable
and remove it without losing any recent changes to objects in the cache.

#. Change the cache-mode to ``none`` to disable it:

   .. prompt:: bash $

      ceph osd tier cache-mode {cachepool} none

   For example:

   .. prompt:: bash $

      ceph osd tier cache-mode hot-storage none

#. Remove the cache pool from the backing pool:

   .. prompt:: bash $

      ceph osd tier remove {storagepool} {cachepool}

   For example:

   .. prompt:: bash $

      ceph osd tier remove cold-storage hot-storage


Removing a Writeback Cache
--------------------------

Since a writeback cache may have modified data, you must take steps to ensure
that you do not lose any recent changes to objects in the cache before you
disable and remove it.

#. Change the cache mode to ``proxy`` so that new and modified objects will
   be flushed to the backing storage pool:

   .. prompt:: bash $

      ceph osd tier cache-mode {cachepool} proxy

   For example:

   .. prompt:: bash $

      ceph osd tier cache-mode hot-storage proxy

#. Ensure that the cache pool has been flushed. This may take a few minutes:

   .. prompt:: bash $

      rados -p {cachepool} ls

   If the cache pool still has objects, you can flush them manually.
   For example:

   .. prompt:: bash $

      rados -p {cachepool} cache-flush-evict-all

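
   After the flush completes, listing the cache pool again should show that it
   has drained. One quick way to check the remaining object count, for
   example:

   .. prompt:: bash $

      rados -p hot-storage ls | wc -l
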

#. Remove the overlay so that clients will not direct traffic to the cache:

   .. prompt:: bash $

      ceph osd tier remove-overlay {storagepool}

   For example:

   .. prompt:: bash $

      ceph osd tier remove-overlay cold-storage

#. Finally, remove the cache tier pool from the backing storage pool:

   .. prompt:: bash $

      ceph osd tier remove {storagepool} {cachepool}

   For example:

   .. prompt:: bash $

      ceph osd tier remove cold-storage hot-storage


.. _Create a Pool: ../pools#create-a-pool
.. _Pools - Set Pool Values: ../pools#set-pool-values
.. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
.. _CRUSH Maps: ../crush-map
.. _Absolute Sizing: #absolute-sizing