6 Pools are logical partitions that are used to store objects.
10 - **Resilience**: It is possible to set the number of OSDs that are allowed to
11 fail without any data being lost. If your cluster uses replicated pools, the
number of OSDs that can fail without data loss is equal to the number of
replicas.
15 For example: a typical configuration stores an object and two replicas
16 (copies) of each RADOS object (that is: ``size = 3``), but you can configure
17 the number of replicas on a per-pool basis. For `erasure-coded pools
18 <../erasure-code>`_, resilience is defined as the number of coding chunks
19 (for example, ``m = 2`` in the default **erasure code profile**).
21 - **Placement Groups**: You can set the number of placement groups (PGs) for
22 the pool. In a typical configuration, the target number of PGs is
23 approximately one hundred PGs per OSD. This provides reasonable balancing
24 without consuming excessive computing resources. When setting up multiple
25 pools, be careful to set an appropriate number of PGs for each pool and for
26 the cluster as a whole. Each PG belongs to a specific pool: when multiple
27 pools use the same OSDs, make sure that the **sum** of PG replicas per OSD is
in the desired PG-per-OSD target range (a brief worked example follows this
list). To calculate an appropriate number of PGs for your pools, use the
`pgcalc`_ tool.
31 - **CRUSH Rules**: When data is stored in a pool, the placement of the object
32 and its replicas (or chunks, in the case of erasure-coded pools) in your
33 cluster is governed by CRUSH rules. Custom CRUSH rules can be created for a
34 pool if the default rule does not fit your use case.
- **Snapshots**: The command ``ceph osd pool mksnap`` creates a snapshot of a
  pool.
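For example (a rough illustration rather than a sizing recommendation): a
cluster with 10 OSDs and a target of 100 PGs per OSD has a budget of roughly
1000 PG replicas. A single replicated pool with ``size = 3`` would therefore
be created with approximately 1000 / 3 ≈ 333 PGs (usually rounded to a nearby
power of two, such as ``256`` or ``512``). If several pools share the same
OSDs, this budget is divided among them.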
42 Pool names beginning with ``.`` are reserved for use by Ceph's internal
43 operations. Do not create or manipulate pools with these names.
49 To list your cluster's pools, run the following command:
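.. prompt:: bash $

   ceph osd lspools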
Before creating a pool, consult `Pool, PG and CRUSH Config Reference`_. Your
Ceph configuration file contains a setting (namely,
``osd_pool_default_pg_num``) that determines the default number of PGs for new
pools. However, this setting's default value is NOT appropriate for most
systems. In most cases, you should override this default value when creating
your pool. For details on PG numbers, see `setting the number of placement groups`_.
For example:
71 osd_pool_default_pg_num = 128
72 osd_pool_default_pgp_num = 128
74 .. note:: In Luminous and later releases, each pool must be associated with the
75 application that will be using the pool. For more information, see
76 `Associating a Pool with an Application`_ below.
78 To create a pool, run one of the following commands:
82 ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] [replicated] \
83 [crush-rule-name] [expected-num-objects]
89 ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] erasure \
90 [erasure-code-profile] [crush-rule-name] [expected_num_objects] [--autoscale-mode=<on,off,warn>]
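For example (``mypool`` and ``ecpool`` are hypothetical pool names):

.. prompt:: bash $

   ceph osd pool create mypool 128 128 replicated
   ceph osd pool create ecpool 32 32 erasure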
For a brief description of the elements of the above commands, consult the
following:
95 .. describe:: {pool-name}
97 The name of the pool. It must be unique.
102 .. describe:: {pg-num}
104 The total number of PGs in the pool. For details on calculating an
105 appropriate number, see :ref:`placement groups`. The default value ``8`` is
106 NOT suitable for most systems.
112 .. describe:: {pgp-num}
114 The total number of PGs for placement purposes. This **should be equal to
115 the total number of PGs**, except briefly while ``pg_num`` is being
116 increased or decreased.
119 :Required: Yes. If no value has been specified in the command, then the default value is used (unless a different value has been set in Ceph configuration).
122 .. describe:: {replicated|erasure}
124 The pool type. This can be either **replicated** (to recover from lost OSDs
125 by keeping multiple copies of the objects) or **erasure** (to achieve a kind
126 of `generalized parity RAID <../erasure-code>`_ capability). The
127 **replicated** pools require more raw storage but can implement all Ceph
128 operations. The **erasure** pools require less raw storage but can perform
129 only some Ceph tasks and may provide decreased performance.
135 .. describe:: [crush-rule-name]
137 The name of the CRUSH rule to use for this pool. The specified rule must
138 exist; otherwise the command will fail.
142 :Default: For **replicated** pools, it is the rule specified by the :confval:`osd_pool_default_crush_rule` configuration variable. This rule must exist. For **erasure** pools, it is the ``erasure-code`` rule if the ``default`` `erasure code profile`_ is used or the ``{pool-name}`` rule if not. This rule will be created implicitly if it doesn't already exist.
144 .. describe:: [erasure-code-profile=profile]
146 For **erasure** pools only. Instructs Ceph to use the specified `erasure
147 code profile`_. This profile must be an existing profile as defined by **osd
148 erasure-code-profile set**.
153 .. _erasure code profile: ../erasure-code-profile
155 .. describe:: --autoscale-mode=<on,off,warn>
- ``on``: the Ceph cluster will automatically adjust the number of PGs in your pool based on actual usage.
- ``warn``: the Ceph cluster will only recommend changes to the number of PGs in your pool (by raising a health warning) and will not adjust it automatically.
- ``off``: PG autoscaling is disabled for this pool. Refer to :ref:`placement groups` for more information.
163 :Default: The default behavior is determined by the :confval:`osd_pool_default_pg_autoscale_mode` option.
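For example (``mypool`` is a hypothetical pool name), the autoscaler mode can
be chosen when the pool is created, or changed later through the
``pg_autoscale_mode`` pool key:

.. prompt:: bash $

   ceph osd pool create mypool --autoscale-mode=warn
   ceph osd pool set mypool pg_autoscale_mode on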
165 .. describe:: [expected-num-objects]
167 The expected number of RADOS objects for this pool. By setting this value and
168 assigning a negative value to **filestore merge threshold**, you arrange
169 for the PG folder splitting to occur at the time of pool creation and
170 avoid the latency impact that accompanies runtime folder splitting.
174 :Default: 0, no splitting at the time of pool creation.
176 .. _associate-pool-to-application:
178 Associating a Pool with an Application
179 ======================================
181 Pools need to be associated with an application before they can be used. Pools
182 that are intended for use with CephFS and pools that are created automatically
183 by RGW are associated automatically. Pools that are intended for use with RBD
should be initialized with the ``rbd`` tool (see `Block Device Commands`_ for
more information).
For other cases, you can manually associate a free-form application name with
a pool by running the following command:
192 ceph osd pool application enable {pool-name} {application-name}
194 .. note:: CephFS uses the application name ``cephfs``, RBD uses the
195 application name ``rbd``, and RGW uses the application name ``rgw``.
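For example (``mypool`` and ``myapp`` are hypothetical names), you might
enable a free-form application and then verify the association with
``ceph osd pool application get``:

.. prompt:: bash $

   ceph osd pool application enable mypool myapp
   ceph osd pool application get mypool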
200 To set pool quotas for the maximum number of bytes and/or the maximum number of
201 RADOS objects per pool, run the following command:
205 ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]
211 ceph osd pool set-quota data max_objects 10000
213 To remove a quota, set its value to ``0``.
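For example, to remove the object quota set above:

.. prompt:: bash $

   ceph osd pool set-quota data max_objects 0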
219 To delete a pool, run a command of the following form:
223 ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
225 To remove a pool, you must set the ``mon_allow_pool_delete`` flag to ``true``
in the monitor's configuration. Otherwise, monitors will refuse to remove
pools.
229 For more information, see `Monitor Configuration`_.
231 .. _Monitor Configuration: ../../configuration/mon-config-ref
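For example (``mypool`` is a hypothetical pool name), one way to allow and
then perform the deletion on a cluster that uses centralized configuration is:

.. prompt:: bash $

   ceph config set mon mon_allow_pool_delete true
   ceph osd pool delete mypool mypool --yes-i-really-really-mean-it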
If there are custom rules for a pool that is no longer needed, consider
deleting those rules. To see which CRUSH rule a pool uses, run a command of
the following form:
238 ceph osd pool get {pool-name} crush_rule
240 For example, if the custom rule is "123", check all pools to see whether they
241 use the rule by running the following command:
245 ceph osd dump | grep "^pool" | grep "crush_rule 123"
If no pools use this custom rule, then it is safe to delete the rule from the
cluster.
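A minimal sketch of removing an unused rule (``ceph osd crush rule ls`` lists
rule names, and ``{rule-name}`` stands for the rule to be removed):

.. prompt:: bash $

   ceph osd crush rule ls
   ceph osd crush rule rm {rule-name}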
250 Similarly, if there are users with permissions restricted to a pool that no
longer exists, consider deleting those users. To find such users, run a
command of the following form:
256 ceph auth ls | grep -C 5 {pool-name}
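Once an affected user has been identified, it can be removed with
``ceph auth del``:

.. prompt:: bash $

   ceph auth del {user}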
263 To rename a pool, run a command of the following form:
267 ceph osd pool rename {current-pool-name} {new-pool-name}
269 If you rename a pool for which an authenticated user has per-pool capabilities,
270 you must update the user's capabilities ("caps") to refer to the new pool name.
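For example (the capability strings shown are only illustrative; note that
``ceph auth caps`` replaces **all** of a user's caps, so re-specify every
capability the user needs):

.. prompt:: bash $

   ceph auth caps client.{user-name} mon 'allow r' osd 'allow rwx pool={new-pool-name}'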
273 Showing Pool Statistics
274 =======================
276 To show a pool's utilization statistics, run the following command:
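.. prompt:: bash $

   rados df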
282 To obtain I/O information for a specific pool or for all pools, run a command
283 of the following form:
287 ceph osd pool stats [{pool-name}]
290 Making a Snapshot of a Pool
291 ===========================
293 To make a snapshot of a pool, run a command of the following form:
297 ceph osd pool mksnap {pool-name} {snap-name}
299 Removing a Snapshot of a Pool
300 =============================
302 To remove a snapshot of a pool, run a command of the following form:
306 ceph osd pool rmsnap {pool-name} {snap-name}
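A pool's existing snapshots can be listed with the ``rados`` tool:

.. prompt:: bash $

   rados -p {pool-name} lssnap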
To assign values to a pool's configuration keys, run a command of the
following form:
318 ceph osd pool set {pool-name} {key} {value}
320 You may set values for the following keys:
322 .. _compression_algorithm:
324 .. describe:: compression_algorithm
326 :Description: Sets the inline compression algorithm used in storing data on the underlying BlueStore back end. This key's setting overrides the global setting :confval:`bluestore_compression_algorithm`.
328 :Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``
330 .. describe:: compression_mode
332 :Description: Sets the policy for the inline compression algorithm used in storing data on the underlying BlueStore back end. This key's setting overrides the global setting :confval:`bluestore_compression_mode`.
334 :Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``
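For example (``mypool`` is a hypothetical pool name), per-pool compression
might be configured as follows:

.. prompt:: bash $

   ceph osd pool set mypool compression_algorithm zstd
   ceph osd pool set mypool compression_mode aggressive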
336 .. describe:: compression_min_blob_size
339 :Description: Sets the minimum size for the compression of chunks: that is, chunks smaller than this are not compressed. This key's setting overrides the following global settings:
341 * :confval:`bluestore_compression_min_blob_size`
342 * :confval:`bluestore_compression_min_blob_size_hdd`
343 * :confval:`bluestore_compression_min_blob_size_ssd`
345 :Type: Unsigned Integer
348 .. describe:: compression_max_blob_size
350 :Description: Sets the maximum size for chunks: that is, chunks larger than this are broken into smaller blobs of this size before compression is performed.
351 :Type: Unsigned Integer
.. describe:: size

:Description: Sets the number of replicas for objects in the pool. For further details, see `Setting the Number of RADOS Object Replicas`_. Replicated pools only.
362 .. describe:: min_size
:Description: Sets the minimum number of replicas required for I/O. For further details, see `Setting the Number of RADOS Object Replicas`_. For erasure-coded pools, this should be set to a value greater than ``k``. If I/O is allowed at the value ``k``, then there is no redundancy and data will be lost in the event of a permanent OSD failure. For more information, see `Erasure Code <../erasure-code>`_.
366 :Version: ``0.54`` and above
.. describe:: pg_num

:Description: Sets the number of PGs for the pool.
374 :Valid Range: ``0`` to ``mon_max_pool_pg_num``. If set to ``0``, the value of ``osd_pool_default_pg_num`` will be used.
378 .. describe:: pgp_num
380 :Description: Sets the effective number of PGs to use when calculating data placement.
382 :Valid Range: Between ``1`` and the current value of ``pg_num``.
386 .. describe:: crush_rule
388 :Description: Sets the CRUSH rule that Ceph uses to map object placement within the pool.
391 .. _allow_ec_overwrites:
393 .. describe:: allow_ec_overwrites
395 :Description: Determines whether writes to an erasure-coded pool are allowed to update only part of a RADOS object. This allows CephFS and RBD to use an EC (erasure-coded) pool for user data (but not for metadata). For more details, see `Erasure Coding with Overwrites`_.
398 .. versionadded:: 12.2.0
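For example (``ec_pool`` is a hypothetical erasure-coded pool name):

.. prompt:: bash $

   ceph osd pool set ec_pool allow_ec_overwrites true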
400 .. describe:: hashpspool
402 :Description: Sets and unsets the HASHPSPOOL flag on a given pool.
:Valid Range: ``1`` sets flag, ``0`` unsets flag
408 .. describe:: nodelete
410 :Description: Sets and unsets the NODELETE flag on a given pool.
:Valid Range: ``1`` sets flag, ``0`` unsets flag
413 :Version: Version ``FIXME``
417 .. describe:: nopgchange
419 :Description: Sets and unsets the NOPGCHANGE flag on a given pool.
:Valid Range: ``1`` sets flag, ``0`` unsets flag
422 :Version: Version ``FIXME``
426 .. describe:: nosizechange
428 :Description: Sets and unsets the NOSIZECHANGE flag on a given pool.
:Valid Range: ``1`` sets flag, ``0`` unsets flag
431 :Version: Version ``FIXME``
.. describe:: bulk

:Description: Sets and unsets the bulk flag on a given pool.
439 :Valid Range: ``true``/``1`` sets flag, ``false``/``0`` unsets flag
441 .. _write_fadvise_dontneed:
443 .. describe:: write_fadvise_dontneed
445 :Description: Sets and unsets the WRITE_FADVISE_DONTNEED flag on a given pool.
447 :Valid Range: ``1`` sets flag, ``0`` unsets flag
451 .. describe:: noscrub
453 :Description: Sets and unsets the NOSCRUB flag on a given pool.
455 :Valid Range: ``1`` sets flag, ``0`` unsets flag
459 .. describe:: nodeep-scrub
461 :Description: Sets and unsets the NODEEP_SCRUB flag on a given pool.
463 :Valid Range: ``1`` sets flag, ``0`` unsets flag
467 .. describe:: hit_set_type
469 :Description: Enables HitSet tracking for cache pools.
470 For additional information, see `Bloom Filter`_.
472 :Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
473 :Default: ``bloom``. Other values are for testing.
477 .. describe:: hit_set_count
479 :Description: Determines the number of HitSets to store for cache pools. The
higher the value, the more RAM is consumed by the ``ceph-osd`` daemon.
483 :Valid Range: ``1``. Agent doesn't handle > ``1`` yet.
487 .. describe:: hit_set_period
489 :Description: Determines the duration of a HitSet period (in seconds) for
490 cache pools. The higher the value, the more RAM is consumed
491 by the ``ceph-osd`` daemon.
493 :Example: ``3600`` (3600 seconds: one hour)
497 .. describe:: hit_set_fpp
499 :Description: Determines the probability of false positives for the
``bloom`` HitSet type. For additional information, see `Bloom Filter`_.
503 :Valid Range: ``0.0`` - ``1.0``
506 .. _cache_target_dirty_ratio:
508 .. describe:: cache_target_dirty_ratio
510 :Description: Sets a flush threshold for the percentage of the cache pool
511 containing modified (dirty) objects. When this threshold is
512 reached, the cache-tiering agent will flush these objects to
513 the backing storage pool.
517 .. _cache_target_dirty_high_ratio:
519 .. describe:: cache_target_dirty_high_ratio
521 :Description: Sets a flush threshold for the percentage of the cache pool
522 containing modified (dirty) objects. When this threshold is
523 reached, the cache-tiering agent will flush these objects to
524 the backing storage pool with a higher speed (as compared with
525 ``cache_target_dirty_ratio``).
529 .. _cache_target_full_ratio:
531 .. describe:: cache_target_full_ratio
533 :Description: Sets an eviction threshold for the percentage of the cache
534 pool containing unmodified (clean) objects. When this
535 threshold is reached, the cache-tiering agent will evict
536 these objects from the cache pool.
541 .. _target_max_bytes:
543 .. describe:: target_max_bytes
545 :Description: Ceph will begin flushing or evicting objects when the
546 ``max_bytes`` threshold is triggered.
:Example: ``1000000000000`` (1 TB)
550 .. _target_max_objects:
552 .. describe:: target_max_objects
554 :Description: Ceph will begin flushing or evicting objects when the
555 ``max_objects`` threshold is triggered.
:Example: ``1000000`` (1 million objects)
560 .. describe:: hit_set_grade_decay_rate
:Description: Sets the temperature decay rate between two successive HitSets.

:Valid Range: ``0`` - ``100``
568 .. describe:: hit_set_search_last_n
:Description: Count at most N appearances in HitSets. Used for temperature calculation.

:Valid Range: ``0`` - ``hit_set_count``
576 .. _cache_min_flush_age:
578 .. describe:: cache_min_flush_age
580 :Description: Sets the time (in seconds) before the cache-tiering agent
581 flushes an object from the cache pool to the storage pool.
583 :Example: ``600`` (600 seconds: ten minutes)
585 .. _cache_min_evict_age:
587 .. describe:: cache_min_evict_age
589 :Description: Sets the time (in seconds) before the cache-tiering agent
590 evicts an object from the cache pool.
592 :Example: ``1800`` (1800 seconds: thirty minutes)
596 .. describe:: fast_read
598 :Description: For erasure-coded pools, if this flag is turned ``on``, the
599 read request issues "sub reads" to all shards, and then waits
600 until it receives enough shards to decode before it serves
601 the client. If *jerasure* or *isa* erasure plugins are in
602 use, then after the first *K* replies have returned, the
603 client's request is served immediately using the data decoded
604 from these replies. This approach sacrifices resources in
605 exchange for better performance. This flag is supported only
606 for erasure-coded pools.
610 .. _scrub_min_interval:
612 .. describe:: scrub_min_interval
614 :Description: Sets the minimum interval (in seconds) for successive scrubs of the pool's PGs when the load is low. If the default value of ``0`` is in effect, then the value of ``osd_scrub_min_interval`` from central config is used.
619 .. _scrub_max_interval:
621 .. describe:: scrub_max_interval
623 :Description: Sets the maximum interval (in seconds) for scrubs of the pool's PGs regardless of cluster load. If the value of ``scrub_max_interval`` is ``0``, then the value ``osd_scrub_max_interval`` from central config is used.
628 .. _deep_scrub_interval:
630 .. describe:: deep_scrub_interval
632 :Description: Sets the interval (in seconds) for pool “deep” scrubs of the pool's PGs. If the value of ``deep_scrub_interval`` is ``0``, the value ``osd_deep_scrub_interval`` from central config is used.
637 .. _recovery_priority:
639 .. describe:: recovery_priority
641 :Description: Setting this value adjusts a pool's computed reservation priority. This value must be in the range ``-10`` to ``10``. Any pool assigned a negative value will be given a lower priority than any new pools, so users are directed to assign negative values to low-priority pools.
647 .. _recovery_op_priority:
649 .. describe:: recovery_op_priority
651 :Description: Sets the recovery operation priority for a specific pool's PGs. This overrides the general priority determined by :confval:`osd_recovery_op_priority`.
660 To get a value from a pool's key, run a command of the following form:
664 ceph osd pool get {pool-name} {key}
667 You may get values from the following keys:
``size``

:Description: See size_.
``min_size``

:Description: See min_size_.
682 :Version: ``0.54`` and above
``pg_num``

:Description: See pg_num_.
``pgp_num``

:Description: See pgp_num_.
697 :Valid Range: Equal to or less than ``pg_num``.
``crush_rule``

:Description: See crush_rule_.
``hit_set_type``

:Description: See hit_set_type_.
710 :Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
``hit_set_count``

:Description: See hit_set_count_.
``hit_set_period``

:Description: See hit_set_period_.
``hit_set_fpp``

:Description: See hit_set_fpp_.
734 ``cache_target_dirty_ratio``
736 :Description: See cache_target_dirty_ratio_.
741 ``cache_target_dirty_high_ratio``
743 :Description: See cache_target_dirty_high_ratio_.
748 ``cache_target_full_ratio``
750 :Description: See cache_target_full_ratio_.
``target_max_bytes``

:Description: See target_max_bytes_.
762 ``target_max_objects``
764 :Description: See target_max_objects_.
769 ``cache_min_flush_age``
771 :Description: See cache_min_flush_age_.
776 ``cache_min_evict_age``
778 :Description: See cache_min_evict_age_.
``fast_read``

:Description: See fast_read_.
790 ``scrub_min_interval``
792 :Description: See scrub_min_interval_.
797 ``scrub_max_interval``
799 :Description: See scrub_max_interval_.
804 ``deep_scrub_interval``
806 :Description: See deep_scrub_interval_.
811 ``allow_ec_overwrites``
813 :Description: See allow_ec_overwrites_.
818 ``recovery_priority``
820 :Description: See recovery_priority_.
825 ``recovery_op_priority``
827 :Description: See recovery_op_priority_.
832 Setting the Number of RADOS Object Replicas
833 ===========================================
To set the number of data replicas on a replicated pool, run a command of the
following form:
840 ceph osd pool set {poolname} size {num-replicas}
842 .. important:: The ``{num-replicas}`` argument includes the primary object
843 itself. For example, if you want there to be two replicas of the object in
844 addition to the original object (for a total of three instances of the
object), specify ``3`` by running the following command:
849 ceph osd pool set data size 3
851 You may run the above command for each pool.
853 .. Note:: An object might accept I/Os in degraded mode with fewer than ``pool
854 size`` replicas. To set a minimum number of replicas required for I/O, you
should use the ``min_size`` setting. For example, you might run the following
command:
860 ceph osd pool set data min_size 2
862 This command ensures that no object in the data pool will receive I/O if it has
863 fewer than ``min_size`` (in this case, two) replicas.
866 Getting the Number of Object Replicas
867 =====================================
869 To get the number of object replicas, run the following command:
873 ceph osd dump | grep 'replicated size'
875 Ceph will list pools and highlight the ``replicated size`` attribute. By
default, Ceph creates two replicas of an object (a total of three copies, for
a size of ``3``).
880 .. _pgcalc: https://old.ceph.com/pgcalc/
881 .. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
882 .. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
883 .. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
884 .. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
885 .. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool