X-Git-Url: https://git.proxmox.com/?a=blobdiff_plain;f=ceph%2Fdoc%2Frbd%2Frbd-config-ref.rst;h=c21731adca8a925154da272ba77a7af03737dc1b;hb=20effc670b57271cb089376d6d0800990e5218d5;hp=db942f88c786b251884e275847679de73faded08;hpb=d2e6a577eb19928d58b31d1b6e096ca0f03c4052;p=ceph.git

diff --git a/ceph/doc/rbd/rbd-config-ref.rst b/ceph/doc/rbd/rbd-config-ref.rst
index db942f88c..c21731adc 100644
--- a/ceph/doc/rbd/rbd-config-ref.rst
+++ b/ceph/doc/rbd/rbd-config-ref.rst
@@ -1,15 +1,22 @@
 =======================
- librbd Settings
+ Config Settings
 =======================

 See `Block Device`_ for additional details.

+Generic IO Settings
+===================
+
+.. confval:: rbd_compression_hint
+.. confval:: rbd_read_from_replica_policy
+.. confval:: rbd_default_order
+
 Cache Settings
 =======================

 .. sidebar:: Kernel Caching

-	The kernel driver for Ceph block devices can use the Linux page cache to 
+	The kernel driver for Ceph block devices can use the Linux page cache to
 	improve performance.

 The user space implementation of the Ceph block device (i.e., ``librbd``) cannot
@@ -19,84 +26,45 @@ disk caching. When the OS sends a barrier or a flush request, all dirty data is
 written to the OSDs. This means that using write-back caching is just as safe as
 using a well-behaved physical hard disk with a VM that properly sends flushes
 (i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU)
-algorithm, and in write-back mode it can coalesce contiguous requests for 
+algorithm, and in write-back mode it can coalesce contiguous requests for
 better throughput.

-.. versionadded:: 0.46
-
-Ceph supports write-back caching for RBD. To enable it, add ``rbd cache =
-true`` to the ``[client]`` section of your ``ceph.conf`` file. By default
-``librbd`` does not perform any caching. Writes and reads go directly to the
-storage cluster, and writes return only when the data is on disk on all
-replicas. With caching enabled, writes return immediately, unless there are more
-than ``rbd cache max dirty`` unflushed bytes. In this case, the write triggers
-writeback and blocks until enough bytes are flushed.
-
-.. versionadded:: 0.47
-
-Ceph supports write-through caching for RBD. You can set the size of
-the cache, and you can set targets and limits to switch from
-write-back caching to write through caching. To enable write-through
-mode, set ``rbd cache max dirty`` to 0. This means writes return only
-when the data is on disk on all replicas, but reads may come from the
-cache. The cache is in memory on the client, and each RBD image has
-its own. Since the cache is local to the client, there's no coherency
-if there are others accessing the image. Running GFS or OCFS on top of
-RBD will not work with caching enabled.
-
-The ``ceph.conf`` file settings for RBD should be set in the ``[client]``
-section of your configuration file. The settings include:
-
-
-``rbd cache``
-
-:Description: Enable caching for RADOS Block Device (RBD).
-:Type: Boolean
-:Required: No
-:Default: ``true``
-
-
-``rbd cache size``
-
-:Description: The RBD cache size in bytes.
-:Type: 64-bit Integer
-:Required: No
-:Default: ``32 MiB``
-
-
-``rbd cache max dirty``
-
-:Description: The ``dirty`` limit in bytes at which the cache triggers write-back. If ``0``, uses write-through caching.
-:Type: 64-bit Integer
-:Required: No
-:Constraint: Must be less than ``rbd cache size``.
-:Default: ``24 MiB``
-
-
-``rbd cache target dirty``
-
-:Description: The ``dirty target`` before the cache begins writing data to the data storage. Does not block writes to the cache.
-:Type: 64-bit Integer
-:Required: No
-:Constraint: Must be less than ``rbd cache max dirty``.
-:Default: ``16 MiB``
-
-
-``rbd cache max dirty age``
-
-:Description: The number of seconds dirty data is in the cache before writeback starts.
-:Type: Float
-:Required: No
-:Default: ``1.0``
-
-.. versionadded:: 0.60
-
-``rbd cache writethrough until flush``
-
-:Description: Start out in write-through mode, and switch to write-back after the first flush request is received. Enabling this is a conservative but safe setting in case VMs running on rbd are too old to send flushes, like the virtio driver in Linux before 2.6.32.
-:Type: Boolean
-:Required: No
-:Default: ``true``
+The librbd cache is enabled by default and supports three different cache
+policies: write-around, write-back, and write-through. Under both the
+write-around and write-back policies, writes return immediately unless more
+than ``rbd_cache_max_dirty`` bytes are waiting to be written to the storage
+cluster. The write-around policy differs from the write-back policy in that it
+does not attempt to service read requests from the cache, and it is therefore
+faster for write-heavy, high-performance workloads. Under the write-through
+policy, writes return only when the data is on disk on all replicas, but reads
+may come from the cache.
+
+Until the first flush request is received, the cache behaves like a
+write-through cache. This ensures safe operation for older operating systems
+that do not send flushes and therefore cannot guarantee crash-consistent
+behavior.
+
+If the librbd cache is disabled, writes and reads go directly to the storage
+cluster, and writes return only when the data is on disk on all replicas.
+
+.. note::
+   The cache is in memory on the client, and each RBD image has its own.
+   Since the cache is local to the client, there is no coherency if other
+   clients are accessing the same image. Running GFS or OCFS on top of RBD
+   will not work with caching enabled.
+
+
+Option settings for RBD should be set in the ``[client]``
+section of your configuration file or the central config store. These settings
+include:
+
+.. confval:: rbd_cache
+.. confval:: rbd_cache_policy
+.. confval:: rbd_cache_writethrough_until_flush
+.. confval:: rbd_cache_size
+.. confval:: rbd_cache_max_dirty
+.. confval:: rbd_cache_target_dirty
+.. confval:: rbd_cache_max_dirty_age
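+
+As an illustration only, a ``[client]`` snippet that makes the write-back
+policy and an example cache sizing explicit might look like the following
+(all values are examples, not tuning recommendations):
+
+.. code-block:: ini
+
+    [client]
+    rbd_cache = true
+    rbd_cache_policy = writeback
+    # stay in write-through mode until the guest sends its first flush
+    rbd_cache_writethrough_until_flush = true
+    # 32 MiB cache; start writeback at 16 MiB dirty, throttle at 24 MiB dirty
+    rbd_cache_size = 33554432
+    rbd_cache_target_dirty = 16777216
+    rbd_cache_max_dirty = 25165824
+    rbd_cache_max_dirty_age = 1.0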

 .. _Block Device: ../../rbd

@@ -104,33 +72,194 @@ section of your configuration file. The settings include:
 Read-ahead Settings
 =======================

-.. versionadded:: 0.86
-
-RBD supports read-ahead/prefetching to optimize small, sequential reads.
+librbd supports read-ahead/prefetching to optimize small, sequential reads.
 This should normally be handled by the guest OS in the case of a VM,
-but boot loaders may not issue efficient reads.
-Read-ahead is automatically disabled if caching is disabled.
-
-
-``rbd readahead trigger requests``
-
-:Description: Number of sequential read requests necessary to trigger read-ahead.
-:Type: Integer
-:Required: No
-:Default: ``10``
-
-
-``rbd readahead max bytes``
-
-:Description: Maximum size of a read-ahead request. If zero, read-ahead is disabled.
-:Type: 64-bit Integer
-:Required: No
-:Default: ``512 KiB``
-
-
-``rbd readahead disable after bytes``
+but boot loaders may not issue efficient reads. Read-ahead is automatically
+disabled if caching is disabled or if the policy is write-around.
+
+
+.. confval:: rbd_readahead_trigger_requests
+.. confval:: rbd_readahead_max_bytes
+.. confval:: rbd_readahead_disable_after_bytes
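+
+As a sketch, the read-ahead behavior could be tuned with a ``[client]``
+snippet such as the following (illustrative values: 524288 bytes is 512 KiB
+and 52428800 bytes is 50 MiB):
+
+.. code-block:: ini
+
+    [client]
+    # trigger read-ahead after 10 sequential read requests
+    rbd_readahead_trigger_requests = 10
+    # prefetch at most 512 KiB per read-ahead request
+    rbd_readahead_max_bytes = 524288
+    # hand read-ahead back to the guest OS after the first 50 MiB read
+    rbd_readahead_disable_after_bytes = 52428800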
+
+Image Features
+==============
+
+RBD supports advanced features, which can be specified via the command line
+when creating images; the default feature set can be configured via
+``rbd_default_features = <sum of numeric values>`` or
+``rbd_default_features = <array of CLI values>``. An example follows the
+feature descriptions below.
+
+``Layering``
+
+:Description: Layering enables cloning.
+:Internal value: 1
+:CLI value: layering
+:Added in: v0.52 (Bobtail)
+:KRBD support: since v3.10
+:Default: yes
+
+``Striping v2``
+
+:Description: Striping spreads data across multiple objects. Striping helps with
+              parallelism for sequential read/write workloads.
+:Internal value: 2
+:CLI value: striping
+:Added in: v0.55 (Bobtail)
+:KRBD support: since v3.10 (default striping only, "fancy" striping added in v4.17)
+:Default: yes
+
+``Exclusive locking``
+
+:Description: When enabled, it requires a client to acquire a lock on an object
+              before making a write. Exclusive lock should only be enabled when
+              a single client is accessing an image at any given time.
+:Internal value: 4
+:CLI value: exclusive-lock
+:Added in: v0.92 (Hammer)
+:KRBD support: since v4.9
+:Default: yes

-:Description: After this many bytes have been read from an RBD image, read-ahead is disabled for that image until it is closed. This allows the guest OS to take over read-ahead once it is booted. If zero, read-ahead stays enabled.
-:Type: 64-bit Integer
-:Required: No
-:Default: ``50 MiB``
+``Object map``
+
+:Description: Object map support depends on exclusive lock support. Block
+              devices are thin provisioned, which means that they only store
+              data that actually has been written, i.e. they are *sparse*.
+              Object map support helps track which objects actually exist
+              (have data stored on a device). Enabling object map support
+              speeds up I/O operations for cloning, for importing and
+              exporting a sparsely populated image, and for deleting an image.
+:Internal value: 8
+:CLI value: object-map
+:Added in: v0.93 (Hammer)
+:KRBD support: since v5.3
+:Default: yes
+
+
+``Fast-diff``
+
+:Description: Fast-diff support depends on object map support and exclusive lock
+              support. It adds another property to the object map, which makes
+              it much faster to generate diffs between snapshots of an image.
+              It is also much faster to calculate the actual data usage of a
+              snapshot or volume (``rbd du``).
+:Internal value: 16
+:CLI value: fast-diff
+:Added in: v9.0.1 (Infernalis)
+:KRBD support: since v5.3
+:Default: yes
+
+
+``Deep-flatten``
+
+:Description: Deep-flatten enables ``rbd flatten`` to work on all snapshots of
+              an image, in addition to the image itself. Without it, snapshots
+              of an image will still rely on the parent, so the parent cannot be
+              deleted until the snapshots are first deleted. Deep-flatten makes
+              a parent independent of its clones, even if they have snapshots,
+              at the expense of using additional OSD device space.
+:Internal value: 32
+:CLI value: deep-flatten
+:Added in: v9.0.2 (Infernalis)
+:KRBD support: since v5.1
+:Default: yes
+
+
+``Journaling``
+
+:Description: Journaling support depends on exclusive lock support. Journaling
+              records all modifications to an image in the order they occur. RBD
+              mirroring can utilize the journal to replicate a crash-consistent
+              image to a remote cluster. It is best to let ``rbd-mirror``
+              manage this feature only as needed, as enabling it long term may
+              result in substantial additional OSD space consumption.
+:Internal value: 64
+:CLI value: journaling
+:Added in: v10.0.1 (Jewel)
+:KRBD support: no
+:Default: no
+
+
+``Data pool``
+
+:Description: On erasure-coded pools, the image data block objects need to be stored on a separate pool from the image metadata.
+:Internal value: 128
+:Added in: v11.1.0 (Kraken)
+:KRBD support: since v4.11
+:Default: no
+
+
+``Operations``
+
+:Description: Used to restrict older clients from performing certain maintenance operations against an image (e.g. clone, snap create).
+:Internal value: 256
+:Added in: v13.0.2 (Mimic)
+:KRBD support: since v4.16
+
+
+``Migrating``
+
+:Description: Used to restrict older clients from opening an image when it is in migration state.
+:Internal value: 512
+:Added in: v14.0.1 (Nautilus)
+:KRBD support: no
+
+``Non-primary``
+
+:Description: Used to restrict changes to non-primary images using snapshot-based mirroring.
+:Internal value: 1024
+:Added in: v15.2.0 (Octopus)
+:KRBD support: no
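+
+As an illustration of the two equivalent forms, enabling layering, exclusive
+locking, object map, fast-diff and deep-flatten by default could be written
+either as the sum of the internal values (1 + 4 + 8 + 16 + 32 = 61) or as an
+array of CLI values:
+
+.. code-block:: ini
+
+    [client]
+    # numeric form: 1 + 4 + 8 + 16 + 32 = 61
+    rbd_default_features = 61
+
+    # equivalent named form (keep only one of the two assignments):
+    # rbd_default_features = layering,exclusive-lock,object-map,fast-diff,deep-flatten
+
+Features can also be toggled on an existing image with ``rbd feature enable``
+and ``rbd feature disable``.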
+
+
+QoS Settings
+============
+
+librbd supports limiting per-image IO in several ways. These limits all apply
+to a given image within a given process - the same image used in multiple
+places, for example by two separate VMs, would have independent limits.
+
+* **IOPS:** number of I/Os per second (any type of I/O)
+* **read IOPS:** number of read I/Os per second
+* **write IOPS:** number of write I/Os per second
+* **bps:** bytes per second (any type of I/O)
+* **read bps:** bytes per second read
+* **write bps:** bytes per second written
+
+These limits operate independently of one another. They are all off by
+default. Every type of limit throttles I/O using a token bucket algorithm,
+with the ability to configure the limit (average speed over time) and the
+potential for a higher rate (a burst) for a short period of time
+(burst_seconds). When any of these limits is reached, and there is no burst
+capacity left, librbd reduces the rate of that type of I/O to the limit.
+
+For example, if a read bps limit of 100 MB/s were configured but writes were
+not limited, writes could proceed as quickly as possible, while reads would be
+throttled to 100 MB/s on average. If a read bps burst of 150 MB/s were set,
+with read burst seconds set to five seconds, reads could proceed at 150 MB/s
+for up to five seconds before dropping back to the 100 MB/s limit.
+
+The following options configure these throttles:
+
+.. confval:: rbd_qos_iops_limit
+.. confval:: rbd_qos_iops_burst
+.. confval:: rbd_qos_iops_burst_seconds
+.. confval:: rbd_qos_read_iops_limit
+.. confval:: rbd_qos_read_iops_burst
+.. confval:: rbd_qos_read_iops_burst_seconds
+.. confval:: rbd_qos_write_iops_limit
+.. confval:: rbd_qos_write_iops_burst
+.. confval:: rbd_qos_write_iops_burst_seconds
+.. confval:: rbd_qos_bps_limit
+.. confval:: rbd_qos_bps_burst
+.. confval:: rbd_qos_bps_burst_seconds
+.. confval:: rbd_qos_read_bps_limit
+.. confval:: rbd_qos_read_bps_burst
+.. confval:: rbd_qos_read_bps_burst_seconds
+.. confval:: rbd_qos_write_bps_limit
+.. confval:: rbd_qos_write_bps_burst
+.. confval:: rbd_qos_write_bps_burst_seconds
+.. confval:: rbd_qos_schedule_tick_min
+.. confval:: rbd_qos_exclude_ops
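+
+As a sketch of the example above, a read limit of roughly 100 MB/s with a
+150 MB/s burst for up to five seconds could be expressed as follows (the byte
+values are illustrative; 104857600 bytes is 100 MiB):
+
+.. code-block:: ini
+
+    [client]
+    # throttle reads to about 100 MiB/s on average...
+    rbd_qos_read_bps_limit = 104857600
+    # ...while allowing bursts of about 150 MiB/s for up to five seconds
+    rbd_qos_read_bps_burst = 157286400
+    rbd_qos_read_bps_burst_seconds = 5
+
+The same options can also be applied per pool or per image at runtime with
+``rbd config pool set`` and ``rbd config image set``.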