===============
 Cache Tiering
===============

.. warning:: Cache tiering has been deprecated in the Reef release as it
   has lacked a maintainer for a very long time. This does not mean
   that it will certainly be removed, but we may choose to remove it
   without much further notice.

A cache tier provides Ceph Clients with better I/O performance for a subset of
the data stored in a backing storage tier. Cache tiering involves creating a
pool of relatively fast/expensive storage devices (e.g., solid state drives)
configured to act as a cache tier, and a backing pool of either erasure-coded
or relatively slower/cheaper devices configured to act as an economical storage
tier. The Ceph objecter handles where to place the objects and the tiering
agent determines when to flush objects from the cache to the backing storage
tier. The cache tier and the backing storage tier are therefore completely
transparent to Ceph clients.

.. ditaa::
           +-------------+
           | Ceph Client |
           +------+------+
                  ^
   Tiering is     |
  Transparent     |              Faster I/O
   to Ceph        |           +---------------+
  Client Ops      |           |               |
                  |    +----->+   Cache Tier  |
                  |    |      |               |
                  |    |      +-----+---+-----+
                  |    |            |   ^
                  v    v            |   |   Active Data in Cache Tier
           +------+----+--+         |   |
           |   Objecter   |         |   |
           +-----------+--+         |   |
                       ^            |   |   Inactive Data in Storage Tier
                       |            v   |
                       |      +-----+---+-----+
                       |      |               |
                       +----->|  Storage Tier |
                              |               |
                              +---------------+
                                   Slower I/O


The cache tiering agent handles the migration of data between the cache tier
and the backing storage tier automatically. However, administrators can
configure how this migration takes place by setting the ``cache-mode``. There
are two main scenarios:

- **writeback** mode: If the base tier and the cache tier are configured in
  ``writeback`` mode, Ceph clients receive an ACK from the base tier every time
  they write data to it. Then the cache tiering agent determines whether
  ``osd_tier_default_cache_min_write_recency_for_promote`` has been set. If it
  has been set and the data has been written more than a specified number of
  times per interval, the data is promoted to the cache tier.

  When Ceph clients need access to data stored in the base tier, the cache
  tiering agent reads the data from the base tier and returns it to the client.
  While data is being read from the base tier, the cache tiering agent consults
  the value of ``osd_tier_default_cache_min_read_recency_for_promote`` and
  decides whether to promote that data from the base tier to the cache tier.
  When data has been promoted from the base tier to the cache tier, the Ceph
  client is able to perform I/O operations on it using the cache tier. This is
  well-suited for mutable data (for example, photo/video editing, transactional
  data).

- **readproxy** mode: This mode will use any objects that already
  exist in the cache tier, but if an object is not present in the
  cache the request will be proxied to the base tier. This is useful
  for transitioning from ``writeback`` mode to a disabled cache as it
  allows the workload to function properly while the cache is drained,
  without adding any new objects to the cache.

Other cache modes are:

- **readonly** promotes objects to the cache on read operations only; write
  operations are forwarded to the base tier. This mode is intended for
  read-only workloads that do not require consistency to be enforced by the
  storage system. (**Warning**: when objects are updated in the base tier,
  Ceph makes **no** attempt to sync these updates to the corresponding objects
  in the cache. Since this mode is considered experimental, a
  ``--yes-i-really-mean-it`` option must be passed in order to enable it; see
  the example after this list.)

- **none** is used to completely disable caching.

89
90 A word of caution
91 =================
92
93 Cache tiering will *degrade* performance for most workloads. Users should use
94 extreme caution before using this feature.
95
96 * *Workload dependent*: Whether a cache will improve performance is
97 highly dependent on the workload. Because there is a cost
98 associated with moving objects into or out of the cache, it can only
99 be effective when there is a *large skew* in the access pattern in
100 the data set, such that most of the requests touch a small number of
101 objects. The cache pool should be large enough to capture the
102 working set for your workload to avoid thrashing.
103
104 * *Difficult to benchmark*: Most benchmarks that users run to measure
105 performance will show terrible performance with cache tiering, in
106 part because very few of them skew requests toward a small set of
107 objects, it can take a long time for the cache to "warm up," and
108 because the warm-up cost can be high.
109
110 * *Usually slower*: For workloads that are not cache tiering-friendly,
111 performance is often slower than a normal RADOS pool without cache
112 tiering enabled.
113
114 * *librados object enumeration*: The librados-level object enumeration
115 API is not meant to be coherent in the presence of the case. If
116 your application is using librados directly and relies on object
117 enumeration, cache tiering will probably not work as expected.
118 (This is not a problem for RGW, RBD, or CephFS.)
119
120 * *Complexity*: Enabling cache tiering means that a lot of additional
121 machinery and complexity within the RADOS cluster is being used.
122 This increases the probability that you will encounter a bug in the system
123 that other users have not yet encountered and will put your deployment at a
124 higher level of risk.
125
Known Good Workloads
--------------------

* *RGW time-skewed*: If the RGW workload is such that almost all read
  operations are directed at recently written objects, a simple cache
  tiering configuration that destages recently written objects from
  the cache to the base tier after a configurable period can work
  well.

Known Bad Workloads
-------------------

The following configurations are *known to work poorly* with cache
tiering.

* *RBD with replicated cache and erasure-coded base*: This is a common
  request, but usually does not perform well. Even reasonably skewed
  workloads still send some small writes to cold objects, and because
  small writes are not yet supported by the erasure-coded pool, entire
  (usually 4 MB) objects must be migrated into the cache in order to
  satisfy a small (often 4 KB) write. Only a handful of users have
  successfully deployed this configuration, and it only works for them
  because their data is extremely cold (backups) and they are not in
  any way sensitive to performance.

* *RBD with replicated cache and base*: RBD with a replicated base
  tier does better than when the base is erasure coded, but it is
  still highly dependent on the amount of skew in the workload, and
  very difficult to validate. The user will need to have a good
  understanding of their workload and will need to tune the cache
  tiering parameters carefully.


Setting Up Pools
================

To set up cache tiering, you must have two pools. One will act as the
backing storage and the other will act as the cache.


Setting Up a Backing Storage Pool
---------------------------------

Setting up a backing storage pool typically involves one of two scenarios:

- **Standard Storage**: In this scenario, the pool stores multiple copies
  of an object in the Ceph Storage Cluster.

- **Erasure Coding:** In this scenario, the pool uses erasure coding to
  store data much more efficiently with a small performance tradeoff.

In the standard storage scenario, you can set up a CRUSH rule to establish
the failure domain (e.g., osd, host, chassis, rack, row, etc.). Ceph OSD
Daemons perform optimally when all storage drives in the rule are of the
same size, speed (both RPMs and throughput) and type. See `CRUSH Maps`_
for details on creating a rule. Once you have created a rule, create
a backing storage pool.

In the erasure coding scenario, the pool creation arguments will generate the
appropriate rule automatically. See `Create a Pool`_ for details.

In subsequent examples, we will refer to the backing storage pool
as ``cold-storage``.
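
As a minimal sketch of the standard storage scenario (the PG counts below are
illustrative example values, not sizing recommendations), the backing pool
might be created as follows; for the erasure coding scenario, ``erasure``
would be used in place of ``replicated``:

.. prompt:: bash $

   ceph osd pool create cold-storage 128 128 replicated
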


Setting Up a Cache Pool
-----------------------

Setting up a cache pool follows the same procedure as the standard storage
scenario, but with this difference: the drives for the cache tier are typically
high performance drives that reside in their own servers and have their own
CRUSH rule. When setting up such a rule, it should take account of the hosts
that have the high performance drives while omitting the hosts that don't. See
:ref:`CRUSH Device Class<crush-map-device-class>` for details.

In subsequent examples, we will refer to the cache pool as ``hot-storage`` and
the backing pool as ``cold-storage``.
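
One possible way to do this, sketched here under the assumption that the high
performance drives are SSDs registered under the ``ssd`` device class (the
rule name ``hot-storage-rule`` and the PG counts are illustrative values, not
recommendations):

.. prompt:: bash $

   ceph osd crush rule create-replicated hot-storage-rule default host ssd
   ceph osd pool create hot-storage 128 128 replicated hot-storage-rule
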

For cache tier configuration and default values, see
`Pools - Set Pool Values`_.


Creating a Cache Tier
=====================

Setting up a cache tier involves associating a backing storage pool with
a cache pool:

.. prompt:: bash $

   ceph osd tier add {storagepool} {cachepool}

For example:

.. prompt:: bash $

   ceph osd tier add cold-storage hot-storage

To set the cache mode, execute the following:

.. prompt:: bash $

   ceph osd tier cache-mode {cachepool} {cache-mode}

For example:

.. prompt:: bash $

   ceph osd tier cache-mode hot-storage writeback

The cache tiers overlay the backing storage tier, so they require one
additional step: you must direct all client traffic from the storage pool to
the cache pool. To direct client traffic to the cache pool, execute the
following:

.. prompt:: bash $

   ceph osd tier set-overlay {storagepool} {cachepool}

For example:

.. prompt:: bash $

   ceph osd tier set-overlay cold-storage hot-storage
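
To verify that the tiering relationship is in place, you can inspect the pool
details (a rough sketch; the exact output format varies by release). The
backing pool should list the cache pool as its read/write tier overlay, and
the cache pool should report that it is a tier of the backing pool:

.. prompt:: bash $

   ceph osd pool ls detail
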


Configuring a Cache Tier
========================

Cache tiers have several configuration options. You may set
cache tier configuration options with the following usage:

.. prompt:: bash $

   ceph osd pool set {cachepool} {key} {value}

See `Pools - Set Pool Values`_ for details.
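
To read back a value that has been set (a convenient way to confirm the
configuration shown in the rest of this section), the corresponding ``get``
form can be used:

.. prompt:: bash $

   ceph osd pool get {cachepool} {key}
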


Target Size and Type
--------------------

Ceph's production cache tiers use a `Bloom Filter`_ for the ``hit_set_type``:

.. prompt:: bash $

   ceph osd pool set {cachepool} hit_set_type bloom

For example:

.. prompt:: bash $

   ceph osd pool set hot-storage hit_set_type bloom

The ``hit_set_count`` and ``hit_set_period`` define how many such HitSets to
store, and how much time each HitSet should cover:

.. prompt:: bash $

   ceph osd pool set {cachepool} hit_set_count 12
   ceph osd pool set {cachepool} hit_set_period 14400
   ceph osd pool set {cachepool} target_max_bytes 1000000000000

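
With these example values, the OSDs keep 12 HitSets of 14400 seconds (4 hours)
each, so roughly the last 48 hours of access history is available for
promotion decisions.
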

.. note:: A larger ``hit_set_count`` results in more RAM consumed by
   the ``ceph-osd`` process.

Binning accesses over time allows Ceph to determine whether a Ceph client
accessed an object at least once, or more than once over a time period
("age" vs "temperature").

The ``min_read_recency_for_promote`` defines how many HitSets to check for the
existence of an object when handling a read operation. The checking result is
used to decide whether to promote the object asynchronously. Its value should
be between 0 and ``hit_set_count``. If it's set to 0, the object is always
promoted. If it's set to 1, the current HitSet is checked, and the object is
promoted only if it is present in that HitSet. For higher values, the
corresponding number of archive HitSets is checked: the object is promoted if
it is found in any of the most recent ``min_read_recency_for_promote``
HitSets.

A similar parameter can be set for the write operation, which is
``min_write_recency_for_promote``:

.. prompt:: bash $

   ceph osd pool set {cachepool} min_read_recency_for_promote 2
   ceph osd pool set {cachepool} min_write_recency_for_promote 2

.. note:: The longer the period and the higher the
   ``min_read_recency_for_promote`` and
   ``min_write_recency_for_promote`` values, the more RAM the ``ceph-osd``
   daemon consumes. In particular, when the agent is actively flushing
   or evicting cache objects, all ``hit_set_count`` HitSets are loaded
   into RAM.


Cache Sizing
------------

The cache tiering agent performs two main functions:

- **Flushing:** The agent identifies modified (or dirty) objects and forwards
  them to the storage pool for long-term storage.

- **Evicting:** The agent identifies objects that haven't been modified
  (i.e., clean objects) and evicts the least recently used among them from
  the cache.

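
The cache pool's current usage, which the agent compares against the targets
described below, can be observed with the standard pool statistics commands,
for example (a general-purpose command, not specific to cache tiering):

.. prompt:: bash $

   ceph df detail
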

Absolute Sizing
~~~~~~~~~~~~~~~

The cache tiering agent can flush or evict objects based upon the total number
of bytes or the total number of objects. To specify a maximum number of bytes,
execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} target_max_bytes {#bytes}

For example, to flush or evict at 1 TB, execute the following:

.. prompt:: bash $

   ceph osd pool set hot-storage target_max_bytes 1099511627776

To specify the maximum number of objects, execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} target_max_objects {#objects}

For example, to flush or evict at 1M objects, execute the following:

.. prompt:: bash $

   ceph osd pool set hot-storage target_max_objects 1000000

.. note:: Ceph is not able to determine the size of a cache pool automatically,
   so configuration of an absolute size is required here; otherwise, flushing
   and eviction will not work. If you specify both limits, the cache tiering
   agent will begin flushing or evicting when either threshold is triggered.

.. note:: All client requests will be blocked only when the ``target_max_bytes``
   or ``target_max_objects`` limit has been reached.

Relative Sizing
~~~~~~~~~~~~~~~

The cache tiering agent can flush or evict objects relative to the size of the
cache pool (specified by ``target_max_bytes`` / ``target_max_objects`` in
`Absolute Sizing`_). When the cache pool consists of a certain percentage of
modified (or dirty) objects, the cache tiering agent will flush them to the
storage pool. To set the ``cache_target_dirty_ratio``, execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_target_dirty_ratio {0.0..1.0}

For example, setting the value to ``0.4`` will begin flushing modified
(dirty) objects when they reach 40% of the cache pool's capacity:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_target_dirty_ratio 0.4

When the dirty objects reach a certain higher percentage of the cache pool's
capacity, the cache tiering agent flushes them at a higher speed. To set the
``cache_target_dirty_high_ratio``, execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_target_dirty_high_ratio {0.0..1.0}

For example, setting the value to ``0.6`` will begin aggressively flushing
dirty objects when they reach 60% of the cache pool's capacity. This value
should be set between ``cache_target_dirty_ratio`` and
``cache_target_full_ratio``:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_target_dirty_high_ratio 0.6

When the cache pool reaches a certain percentage of its capacity, the cache
tiering agent will evict objects to maintain free capacity. To set the
``cache_target_full_ratio``, execute the following:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_target_full_ratio {0.0..1.0}

For example, setting the value to ``0.8`` will begin evicting unmodified
(clean) objects when they reach 80% of the cache pool's capacity:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_target_full_ratio 0.8

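
As a concrete illustration of how these ratios combine with the absolute
limits, if ``target_max_bytes`` is set to 1 TiB (1099511627776 bytes) as in
the Absolute Sizing example, then dirty objects start being flushed at about
0.4 TiB of dirty data, flushing speeds up at about 0.6 TiB, and clean objects
start being evicted once total usage reaches about 0.8 TiB.
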


Cache Age
---------

You can specify the minimum age of an object before the cache tiering agent
flushes a recently modified (or dirty) object to the backing storage pool:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_min_flush_age {#seconds}

For example, to flush modified (or dirty) objects after 10 minutes, execute the
following:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_min_flush_age 600

You can specify the minimum age of an object before it will be evicted from the
cache tier:

.. prompt:: bash $

   ceph osd pool set {cachepool} cache_min_evict_age {#seconds}

For example, to evict objects after 30 minutes, execute the following:

.. prompt:: bash $

   ceph osd pool set hot-storage cache_min_evict_age 1800


Removing a Cache Tier
=====================

Removing a cache tier differs depending on whether it is a writeback
cache or a read-only cache.


Removing a Read-Only Cache
--------------------------

Since a read-only cache does not have modified data, you can disable
and remove it without losing any recent changes to objects in the cache.

#. Change the cache-mode to ``none`` to disable it:

   .. prompt:: bash $

      ceph osd tier cache-mode {cachepool} none

   For example:

   .. prompt:: bash $

      ceph osd tier cache-mode hot-storage none

#. Remove the cache pool from the backing pool:

   .. prompt:: bash $

      ceph osd tier remove {storagepool} {cachepool}

   For example:

   .. prompt:: bash $

      ceph osd tier remove cold-storage hot-storage


Removing a Writeback Cache
--------------------------

Since a writeback cache may have modified data, you must take steps to ensure
that you do not lose any recent changes to objects in the cache before you
disable and remove it.

#. Change the cache mode to ``proxy`` so that new and modified objects will
   be flushed to the backing storage pool:

   .. prompt:: bash $

      ceph osd tier cache-mode {cachepool} proxy

   For example:

   .. prompt:: bash $

      ceph osd tier cache-mode hot-storage proxy

#. Ensure that the cache pool has been flushed. This may take a few minutes:

   .. prompt:: bash $

      rados -p {cachepool} ls

   If the cache pool still has objects, you can flush them manually.
   For example:

   .. prompt:: bash $

      rados -p {cachepool} cache-flush-evict-all

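
   After the flush completes, listing the cache pool again should show that it
   has drained. One quick way to check the remaining object count, for
   example:

   .. prompt:: bash $

      rados -p hot-storage ls | wc -l
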

#. Remove the overlay so that clients will not direct traffic to the cache:

   .. prompt:: bash $

      ceph osd tier remove-overlay {storagepool}

   For example:

   .. prompt:: bash $

      ceph osd tier remove-overlay cold-storage

#. Finally, remove the cache tier pool from the backing storage pool:

   .. prompt:: bash $

      ceph osd tier remove {storagepool} {cachepool}

   For example:

   .. prompt:: bash $

      ceph osd tier remove cold-storage hot-storage


.. _Create a Pool: ../pools#create-a-pool
.. _Pools - Set Pool Values: ../pools#set-pool-values
.. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
.. _CRUSH Maps: ../crush-map
.. _Absolute Sizing: #absolute-sizing