=======================
- librbd Settings
+ Config Settings
=======================
See `Block Device`_ for additional details.
+Generic IO Settings
+===================
+
+.. confval:: rbd_compression_hint
+.. confval:: rbd_read_from_replica_policy
+.. confval:: rbd_default_order
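+
+For example, the read replica policy and compression hint can be set for all
+clients through the central config store (the values shown are illustrative):
+
+.. code-block:: console
+
+   $ ceph config set client rbd_read_from_replica_policy localize
+   $ ceph config set client rbd_compression_hint compressible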
+
Cache Settings
=======================
.. sidebar:: Kernel Caching
- The kernel driver for Ceph block devices can use the Linux page cache to
+ The kernel driver for Ceph block devices can use the Linux page cache to
improve performance.
The user space implementation of the Ceph block device (i.e., ``librbd``)
cannot take advantage of the Linux page cache, so it includes its own in-memory
caching. When the OS sends a barrier or a flush request, all dirty data is
written to the OSDs. This means that using write-back caching is just as safe as
using a well-behaved physical hard disk with a VM that properly sends flushes
(i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU)
-algorithm, and in write-back mode it can coalesce contiguous requests for
+algorithm, and in write-back mode it can coalesce contiguous requests for
better throughput.
-.. versionadded:: 0.46
-
-Ceph supports write-back caching for RBD. To enable it, add ``rbd cache =
-true`` to the ``[client]`` section of your ``ceph.conf`` file. By default
-``librbd`` does not perform any caching. Writes and reads go directly to the
-storage cluster, and writes return only when the data is on disk on all
-replicas. With caching enabled, writes return immediately, unless there are more
-than ``rbd cache max dirty`` unflushed bytes. In this case, the write triggers
-writeback and blocks until enough bytes are flushed.
-
-.. versionadded:: 0.47
-
-Ceph supports write-through caching for RBD. You can set the size of
-the cache, and you can set targets and limits to switch from
-write-back caching to write through caching. To enable write-through
-mode, set ``rbd cache max dirty`` to 0. This means writes return only
-when the data is on disk on all replicas, but reads may come from the
-cache. The cache is in memory on the client, and each RBD image has
-its own. Since the cache is local to the client, there's no coherency
-if there are others accessing the image. Running GFS or OCFS on top of
-RBD will not work with caching enabled.
-
-The ``ceph.conf`` file settings for RBD should be set in the ``[client]``
-section of your configuration file. The settings include:
-
-
-``rbd cache``
-
-:Description: Enable caching for RADOS Block Device (RBD).
-:Type: Boolean
-:Required: No
-:Default: ``true``
-
-
-``rbd cache size``
-
-:Description: The RBD cache size in bytes.
-:Type: 64-bit Integer
-:Required: No
-:Default: ``32 MiB``
-
-
-``rbd cache max dirty``
-
-:Description: The ``dirty`` limit in bytes at which the cache triggers write-back. If ``0``, uses write-through caching.
-:Type: 64-bit Integer
-:Required: No
-:Constraint: Must be less than ``rbd cache size``.
-:Default: ``24 MiB``
-
-
-``rbd cache target dirty``
-
-:Description: The ``dirty target`` before the cache begins writing data to the data storage. Does not block writes to the cache.
-:Type: 64-bit Integer
-:Required: No
-:Constraint: Must be less than ``rbd cache max dirty``.
-:Default: ``16 MiB``
-
-
-``rbd cache max dirty age``
-
-:Description: The number of seconds dirty data is in the cache before writeback starts.
-:Type: Float
-:Required: No
-:Default: ``1.0``
-
-.. versionadded:: 0.60
-
-``rbd cache writethrough until flush``
-
-:Description: Start out in write-through mode, and switch to write-back after the first flush request is received. Enabling this is a conservative but safe setting in case VMs running on rbd are too old to send flushes, like the virtio driver in Linux before 2.6.32.
-:Type: Boolean
-:Required: No
-:Default: ``true``
+The librbd cache is enabled by default and supports three different cache
+policies: write-around, write-back, and write-through. Writes return
+immediately under both the write-around and write-back policies, unless there
+are more than ``rbd_cache_max_dirty`` bytes not yet written to the storage
+cluster. The write-around policy differs from the write-back policy in that it
+does not attempt to service read requests from the cache, and is therefore
+faster for high-performance write workloads. Under the
+write-through policy, writes return only when the data is on disk on all
+replicas, but reads may come from the cache.
+
+Prior to receiving a flush request, the cache behaves like a write-through
+cache. This ensures safe operation with older operating systems that do not
+send the flushes required for crash-consistent behavior.
+
+If the librbd cache is disabled, writes and
+reads go directly to the storage cluster, and writes return only when the data
+is on disk on all replicas.
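+
+For instance, the librbd cache can be disabled entirely via the central config
+store (illustrative):
+
+.. code-block:: console
+
+   $ ceph config set client rbd_cache false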
+
+.. note::
+   The cache is in memory on the client, and each RBD image has its own.
+   Because the cache is local to the client, there is no coherency if other
+   clients are accessing the image. Running GFS or OCFS on top of RBD will
+   not work with caching enabled.
+
+
+Option settings for RBD should be set in the ``[client]`` section of your
+configuration file or in the central config store. These settings
+include:
+
+.. confval:: rbd_cache
+.. confval:: rbd_cache_policy
+.. confval:: rbd_cache_writethrough_until_flush
+.. confval:: rbd_cache_size
+.. confval:: rbd_cache_max_dirty
+.. confval:: rbd_cache_target_dirty
+.. confval:: rbd_cache_max_dirty_age
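+
+For example, a minimal sketch of switching all clients to the write-around
+policy with a larger cache, using the central config store (sizes are
+illustrative):
+
+.. code-block:: console
+
+   $ ceph config set client rbd_cache_policy writearound
+   $ ceph config set client rbd_cache_size 67108864   # 64 MiB
+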
.. _Block Device: ../../rbd
Read-ahead Settings
=======================
-.. versionadded:: 0.86
-
-RBD supports read-ahead/prefetching to optimize small, sequential reads.
+librbd supports read-ahead/prefetching to optimize small, sequential reads.
This should normally be handled by the guest OS in the case of a VM,
-but boot loaders may not issue efficient reads.
-Read-ahead is automatically disabled if caching is disabled.
-
-
-``rbd readahead trigger requests``
-
-:Description: Number of sequential read requests necessary to trigger read-ahead.
-:Type: Integer
-:Required: No
-:Default: ``10``
-
-
-``rbd readahead max bytes``
-
-:Description: Maximum size of a read-ahead request. If zero, read-ahead is disabled.
-:Type: 64-bit Integer
-:Required: No
-:Default: ``512 KiB``
-
-
-``rbd readahead disable after bytes``
+but boot loaders may not issue efficient reads. Read-ahead is automatically
+disabled if caching is disabled or if the policy is write-around.
+
+
+.. confval:: rbd_readahead_trigger_requests
+.. confval:: rbd_readahead_max_bytes
+.. confval:: rbd_readahead_disable_after_bytes
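+
+For example, read-ahead can be tuned or disabled entirely (values are
+illustrative):
+
+.. code-block:: console
+
+   $ ceph config set client rbd_readahead_trigger_requests 4
+   $ ceph config set client rbd_readahead_max_bytes 4194304   # 4 MiB
+   $ ceph config set client rbd_readahead_max_bytes 0         # disable read-ahead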
+
+Image Features
+==============
+
+RBD supports advanced features, which can be specified via the command line
+when creating images. Alternatively, the default features can be configured via
+``rbd_default_features = <sum of feature numeric values>`` or
+``rbd_default_features = <comma-delimited list of CLI values>``.
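+
+For example, the following two forms are equivalent; both select layering (1),
+exclusive-lock (4), object-map (8), fast-diff (16) and deep-flatten (32),
+i.e. 1 + 4 + 8 + 16 + 32 = 61:
+
+.. code-block:: console
+
+   $ ceph config set client rbd_default_features 61
+   $ ceph config set client rbd_default_features layering,exclusive-lock,object-map,fast-diff,deep-flatten
+
+Features can also be selected per image at creation time (the pool and image
+names here are placeholders):
+
+.. code-block:: console
+
+   $ rbd create --size 1G --image-feature layering,exclusive-lock rbd/test-image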
+
+``Layering``
+
+:Description: Layering enables cloning.
+:Internal value: 1
+:CLI value: layering
+:Added in: v0.52 (Bobtail)
+:KRBD support: since v3.10
+:Default: yes
+
+``Striping v2``
+
+:Description: Striping spreads data across multiple objects. Striping helps with
+ parallelism for sequential read/write workloads.
+:Internal value: 2
+:CLI value: striping
+:Added in: v0.55 (Bobtail)
+:KRBD support: since v3.10 (default striping only, "fancy" striping added in v4.17)
+:Default: yes
+
+``Exclusive locking``
+
+:Description: When enabled, it requires a client to acquire a lock on an object
+ before making a write. Exclusive lock should only be enabled when
+ a single client is accessing an image at any given time.
+:Internal value: 4
+:CLI value: exclusive-lock
+:Added in: v0.92 (Hammer)
+:KRBD support: since v4.9
+:Default: yes
-:Description: After this many bytes have been read from an RBD image, read-ahead is disabled for that image until it is closed. This allows the guest OS to take over read-ahead once it is booted. If zero, read-ahead stays enabled.
-:Type: 64-bit Integer
-:Required: No
-:Default: ``50 MiB``
+``Object map``
+
+:Description: Object map support depends on exclusive lock support. Block
+              devices are thin provisioned, which means that they only store
+              data that actually has been written, i.e. they are *sparse*.
+              Object map support helps track which objects actually exist
+              (have data stored on a device). Enabling object map support
+              speeds up I/O operations for cloning, for importing and
+              exporting a sparsely populated image, and for deleting.
+:Internal value: 8
+:CLI value: object-map
+:Added in: v0.93 (Hammer)
+:KRBD support: since v5.3
+:Default: yes
+
+
+``Fast-diff``
+
+:Description: Fast-diff support depends on object map support and exclusive lock
+ support. It adds another property to the object map, which makes
+ it much faster to generate diffs between snapshots of an image.
+ It is also much faster to calculate the actual data usage of a
+ snapshot or volume (``rbd du``).
+:Internal value: 16
+:CLI value: fast-diff
+:Added in: v9.0.1 (Infernalis)
+:KRBD support: since v5.3
+:Default: yes
+
+
+``Deep-flatten``
+
+:Description: Deep-flatten enables ``rbd flatten`` to work on all snapshots of
+ an image, in addition to the image itself. Without it, snapshots
+ of an image will still rely on the parent, so the parent cannot be
+ deleted until the snapshots are first deleted. Deep-flatten makes
+ a parent independent of its clones, even if they have snapshots,
+ at the expense of using additional OSD device space.
+:Internal value: 32
+:CLI value: deep-flatten
+:Added in: v9.0.2 (Infernalis)
+:KRBD support: since v5.1
+:Default: yes
+
+
+``Journaling``
+
+:Description: Journaling support depends on exclusive lock support. Journaling
+ records all modifications to an image in the order they occur. RBD
+ mirroring can utilize the journal to replicate a crash-consistent
+ image to a remote cluster. It is best to let ``rbd-mirror``
+ manage this feature only as needed, as enabling it long term may
+ result in substantial additional OSD space consumption.
+:Internal value: 64
+:CLI value: journaling
+:Added in: v10.0.1 (Jewel)
+:KRBD support: no
+:Default: no
+
+
+``Data pool``
+
+:Description: On erasure-coded pools, the image data block objects need to be stored on a separate pool from the image metadata.
+:Internal value: 128
+:Added in: v11.1.0 (Kraken)
+:KRBD support: since v4.11
+:Default: no
+
+
+``Operations``
+
+:Description: Used to restrict older clients from performing certain maintenance operations against an image (e.g. clone, snap create).
+:Internal value: 256
+:Added in: v13.0.2 (Mimic)
+:KRBD support: since v4.16
+
+
+``Migrating``
+
+:Description: Used to restrict older clients from opening an image when it is in migration state.
+:Internal value: 512
+:Added in: v14.0.1 (Nautilus)
+:KRBD support: no
+
+``Non-primary``
+
+:Description: Used to restrict changes to non-primary images using snapshot-based mirroring.
+:Internal value: 1024
+:Added in: v15.2.0 (Octopus)
+:KRBD support: no
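+
+Some features (exclusive-lock, object-map, fast-diff, journaling) can be
+toggled on existing images, while others (e.g. layering, striping,
+deep-flatten, data-pool) can only be selected at creation time. A sketch, with
+placeholder pool and image names:
+
+.. code-block:: console
+
+   $ rbd feature enable rbd/test-image object-map fast-diff
+   $ rbd feature disable rbd/test-image journaling
+   $ rbd create --size 1G --data-pool ec-pool rbd/ec-backed-image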
+
+
+QoS Settings
+============
+
+librbd supports limiting per-image IO in several ways. These limits all apply
+to a given image within a given process: the same image used in multiple
+places (e.g. two separate VMs) would have independent limits.
+
+* **IOPS:** number of I/Os per second (any type of I/O)
+* **read IOPS:** number of read I/Os per second
+* **write IOPS:** number of write I/Os per second
+* **bps:** bytes per second (any type of I/O)
+* **read bps:** bytes per second read
+* **write bps:** bytes per second written
+
+Each of these limits operates independently of the others, and all are off by
+default. Every type of limit throttles I/O using a token bucket algorithm, with
+the ability to configure the limit (average speed over time) and the potential
+for a higher rate (a burst) for a short period of time (burst_seconds). When
+any of these limits is reached, and there is no burst capacity left, librbd
+reduces the rate of that type of I/O to the limit.
+
+For example, if a read bps limit of 100 MB/s is configured but writes are not
+limited, writes can proceed as quickly as possible, while reads are throttled
+to 100 MB/s on average. If a read bps burst of 150 MB/s is set, and the read
+burst seconds are set to five, reads can proceed at 150 MB/s for up to five
+seconds before dropping back to the 100 MB/s limit.
+
+The following options configure these throttles:
+
+.. confval:: rbd_qos_iops_limit
+.. confval:: rbd_qos_iops_burst
+.. confval:: rbd_qos_iops_burst_seconds
+.. confval:: rbd_qos_read_iops_limit
+.. confval:: rbd_qos_read_iops_burst
+.. confval:: rbd_qos_read_iops_burst_seconds
+.. confval:: rbd_qos_write_iops_limit
+.. confval:: rbd_qos_write_iops_burst
+.. confval:: rbd_qos_write_iops_burst_seconds
+.. confval:: rbd_qos_bps_limit
+.. confval:: rbd_qos_bps_burst
+.. confval:: rbd_qos_bps_burst_seconds
+.. confval:: rbd_qos_read_bps_limit
+.. confval:: rbd_qos_read_bps_burst
+.. confval:: rbd_qos_read_bps_burst_seconds
+.. confval:: rbd_qos_write_bps_limit
+.. confval:: rbd_qos_write_bps_burst
+.. confval:: rbd_qos_write_bps_burst_seconds
+.. confval:: rbd_qos_schedule_tick_min
+.. confval:: rbd_qos_exclude_ops
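+
+Mirroring the example above, a sketch of a per-image override with a 100 MB/s
+read limit and a 150 MB/s burst for five seconds (pool and image names are
+placeholders; values are in bytes per second):
+
+.. code-block:: console
+
+   $ rbd config image set rbd/test-image rbd_qos_read_bps_limit 100000000
+   $ rbd config image set rbd/test-image rbd_qos_read_bps_burst 150000000
+   $ rbd config image set rbd/test-image rbd_qos_read_bps_burst_seconds 5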