6 Pools are logical partitions that are used to store objects.
Pools provide:

- **Resilience**: It is possible to set the number of OSDs that are allowed to
11 fail without any data being lost. If your cluster uses replicated pools, the
number of OSDs that can fail without data loss is the number of additional
copies (that is, ``size - 1``). For example, a typical configuration stores
an object and two additional copies (that is: ``size = 3``), so two OSDs can
fail without data being lost. You can configure the number of replicas
15 on a per-pool basis. For `erasure coded pools <../erasure-code>`_, resilience
16 is defined as the number of coding chunks (for example, ``m = 2`` in the
17 **erasure code profile**).
19 - **Placement Groups**: You can set the number of placement groups for the
20 pool. A typical configuration targets approximately 100 placement groups per
21 OSD, providing optimal balancing without consuming many computing resources.
22 When setting up multiple pools, be careful to set a reasonable number of
23 placement groups for each pool and for the cluster as a whole. Note that each
24 PG belongs to a specific pool: when multiple pools use the same OSDs, make
25 sure that the **sum** of PG replicas per OSD is in the desired PG per OSD
26 target range. Use the `pgcalc`_ tool to calculate the number of placement
27 groups to set for your pool.
29 - **CRUSH Rules**: When data is stored in a pool, the placement of the object
30 and its replicas (or chunks, in the case of erasure-coded pools) in your
31 cluster is governed by CRUSH rules. Custom CRUSH rules can be created for a
32 pool if the default rule does not fit your use case.
- **Snapshots**: The command ``ceph osd pool mksnap`` creates a snapshot of a
pool.
40 Pool names beginning with ``.`` are reserved for use by Ceph's internal
41 operations. Please do not create or manipulate pools with these names.
List Pools
==========

To list your cluster's pools, execute:
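   ceph osd lspools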
Create a Pool
=============

Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_.
You should override the default value for the number of placement groups in
your Ceph configuration file, because the default is NOT suitable for most
systems. For details on placement group numbers, refer to
`setting the number of placement groups`_.
.. note:: Starting with Luminous, all pools need to be associated with the
   application that will be using the pool. See `Associate Pool to Application`_
   below for more information.
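For example, you can override the defaults in your Ceph configuration file::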
71 osd_pool_default_pg_num = 128
72 osd_pool_default_pgp_num = 128
74 To create a pool, execute:
78 ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] [replicated] \
79 [crush-rule-name] [expected-num-objects]
80 ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] erasure \
[erasure-code-profile] [crush-rule-name] [expected-num-objects] [--autoscale-mode=<on,off,warn>]
85 .. describe:: {pool-name}
87 The name of the pool. It must be unique.
92 .. describe:: {pg-num}
94 The total number of placement groups for the pool. See :ref:`placement groups`
95 for details on calculating a suitable number. The
96 default value ``8`` is NOT suitable for most systems.
102 .. describe:: {pgp-num}
104 The total number of placement groups for placement purposes. This
105 **should be equal to the total number of placement groups**, except
106 for placement group splitting scenarios.
:Required: Yes. The default or the Ceph configuration value is used if not specified.
112 .. describe:: {replicated|erasure}
The pool type, which can be either **replicated** (to recover
from lost OSDs by keeping multiple copies of each object) or
**erasure** (to get a kind of
`generalized RAID5 <../erasure-code>`_ capability).
Replicated pools require more raw storage but implement all
Ceph operations. Erasure-coded pools require less raw storage
but implement only a subset of the available operations.
127 .. describe:: [crush-rule-name]
The name of a CRUSH rule to use for this pool. The specified
rule must exist.
134 :Default: For **replicated** pools it is the rule specified by the
135 :confval:`osd_pool_default_crush_rule` config variable. This rule must exist.
136 For **erasure** pools it is ``erasure-code`` if the ``default``
137 `erasure code profile`_ is used or ``{pool-name}`` otherwise. This
138 rule will be created implicitly if it doesn't exist already.
141 .. describe:: [erasure-code-profile=profile]
143 For **erasure** pools only. Use the `erasure code profile`_. It
144 must be an existing profile as defined by
145 **osd erasure-code-profile set**.
150 .. _erasure code profile: ../erasure-code-profile
152 .. describe:: --autoscale-mode=<on,off,warn>
154 If you set the autoscale mode to ``on`` or ``warn``, you can let the system
155 autotune or recommend changes to the number of placement groups in your pool
156 based on actual usage. If you leave it off, then you should refer to
157 :ref:`placement groups` for more information.
161 :Default: The default behavior is controlled by the :confval:`osd_pool_default_pg_autoscale_mode` option.
163 .. describe:: [expected-num-objects]
The expected number of objects for this pool. If this value is set
(together with a negative **filestore merge threshold**), PG folder
splitting happens at pool creation time, which avoids the latency
impact of splitting folders at runtime.

:Default: ``0``, no splitting at pool creation time.
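As an example of the syntax above, a replicated pool and an erasure-coded pool
might be created as follows (``mypool``, ``ecpool``, and ``myprofile`` are
hypothetical names)::

   ceph osd pool create mypool 128 128 replicated
   ceph osd erasure-code-profile set myprofile k=4 m=2 crush-failure-domain=host
   ceph osd pool create ecpool 32 32 erasure myprofile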
174 .. _associate-pool-to-application:
176 Associate Pool to Application
177 =============================
179 Pools need to be associated with an application before use. Pools that will be
180 used with CephFS or pools that are automatically created by RGW are
181 automatically associated. Pools that are intended for use with RBD should be
initialized using the ``rbd`` tool (see `Block Device Commands`_ for more
details).

For other cases, you can manually associate a free-form application name to
a pool:
190 ceph osd pool application enable {pool-name} {application-name}
192 .. note:: CephFS uses the application name ``cephfs``, RBD uses the
193 application name ``rbd``, and RGW uses the application name ``rgw``.
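For example, to associate a hypothetical pool named ``mypool`` with RBD and
then check the association::

   ceph osd pool application enable mypool rbd
   ceph osd pool application get mypool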
Set Pool Quotas
===============

You can set pool quotas for the maximum number of bytes and/or the maximum
number of objects per pool:
203 ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]
209 ceph osd pool set-quota data max_objects 10000
211 To remove a quota, set its value to ``0``.
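For example, to limit the ``data`` pool to roughly 100 GB and then remove that
limit again::

   ceph osd pool set-quota data max_bytes 107374182400
   ceph osd pool set-quota data max_bytes 0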
Delete a Pool
=============

To delete a pool, execute:
221 ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
To remove a pool, the ``mon_allow_pool_delete`` flag must be set to ``true``
in the monitors' configuration; otherwise, the monitors will refuse to remove
the pool.
227 See `Monitor Configuration`_ for more information.
229 .. _Monitor Configuration: ../../configuration/mon-config-ref
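As a sketch, on releases with the centralized configuration database you can
set the flag at runtime and then delete a hypothetical pool named ``mypool``::

   ceph config set mon mon_allow_pool_delete true
   ceph osd pool delete mypool mypool --yes-i-really-really-mean-it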
If you created custom CRUSH rules for a pool, you should consider removing
them when you no longer need the pool:
236 ceph osd pool get {pool-name} crush_rule
If the rule is "123", for example, you can check whether any other pools use it like so:
242 ceph osd dump | grep "^pool" | grep "crush_rule 123"
244 If no other pools use that custom rule, then it's safe to delete that
245 rule from the cluster.
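The rule can then be removed by name (``ceph osd crush rule ls`` lists rule
names; ``{rule-name}`` is a placeholder)::

   ceph osd crush rule ls
   ceph osd crush rule rm {rule-name}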
247 If you created users with permissions strictly for a pool that no longer
248 exists, you should consider deleting those users too:
253 ceph auth ls | grep -C 5 {pool-name}
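If such a user exists and is no longer needed, it can be deleted with
``ceph auth del`` (``client.example`` is a hypothetical user name)::

   ceph auth del client.example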
Rename a Pool
=============

To rename a pool, execute:
264 ceph osd pool rename {current-pool-name} {new-pool-name}
If you rename a pool and you have per-pool capabilities for an authenticated
user, you must update the user's capabilities (i.e., caps) to reference the
new pool name.
Show Pool Statistics
====================

To show a pool's utilization statistics, execute:
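   rados df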
Additionally, to obtain I/O information for a specific pool, or for all pools, execute:
283 ceph osd pool stats [{pool-name}]
286 Make a Snapshot of a Pool
287 =========================
289 To make a snapshot of a pool, execute:
293 ceph osd pool mksnap {pool-name} {snap-name}
295 Remove a Snapshot of a Pool
296 ===========================
298 To remove a snapshot of a pool, execute:
302 ceph osd pool rmsnap {pool-name} {snap-name}
Set Pool Values
===============

To set a value for a pool, execute the following:
314 ceph osd pool set {pool-name} {key} {value}
316 You may set values for the following keys:
318 .. _compression_algorithm:
320 .. describe:: compression_algorithm
Sets the inline compression algorithm to use for the underlying BlueStore
backend. This setting overrides the global setting
:confval:`bluestore_compression_algorithm`.
326 :Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``
328 .. describe:: compression_mode
330 Sets the policy for the inline compression algorithm for underlying BlueStore. This setting overrides the
331 global setting :confval:`bluestore_compression_mode`.
334 :Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``
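For example, to enable aggressive ``zstd`` compression on a hypothetical pool
named ``mypool``::

   ceph osd pool set mypool compression_algorithm zstd
   ceph osd pool set mypool compression_mode aggressive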
336 .. describe:: compression_min_blob_size
338 Chunks smaller than this are never compressed. This setting overrides the global settings of
339 :confval:`bluestore_compression_min_blob_size`, :confval:`bluestore_compression_min_blob_size_hdd` and
:confval:`bluestore_compression_min_blob_size_ssd`.
342 :Type: Unsigned Integer
344 .. describe:: compression_max_blob_size
Chunks larger than this are broken into smaller blobs of at most
``compression_max_blob_size`` bytes before being compressed.
349 :Type: Unsigned Integer
.. describe:: size

Sets the number of replicas for objects in the pool.
See `Set the Number of Object Replicas`_ for further details.
Replicated pools only.
363 .. describe:: min_size
365 Sets the minimum number of replicas required for I/O.
366 See `Set the Number of Object Replicas`_ for further details.
For erasure-coded pools, this should be set to a value greater than ``k``:
if I/O is allowed with only ``k`` shards available, there is no redundancy,
and data will be lost in the event of a permanent OSD failure. For more
information, see `Erasure Code <../erasure-code>`_.
373 :Version: ``0.54`` and above
.. describe:: pg_num

The effective number of placement groups to use when calculating
data placement.

:Valid Range: Greater than the current ``pg_num`` value.
387 .. describe:: pgp_num
The effective number of placement groups to use for placement purposes
when calculating data placement.
393 :Valid Range: Equal to or less than ``pg_num``.
397 .. describe:: crush_rule
399 The rule to use for mapping object placement in the cluster.
403 .. _allow_ec_overwrites:
405 .. describe:: allow_ec_overwrites
Whether writes to an erasure-coded pool can update part of an object,
which is required for CephFS and RBD to use the pool. See
`Erasure Coding with Overwrites`_ for more details.
414 .. versionadded:: 12.2.0
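For example, assuming an erasure-coded pool named ``ec_pool``::

   ceph osd pool set ec_pool allow_ec_overwrites true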
418 .. describe:: hashpspool
420 Set/Unset HASHPSPOOL flag on a given pool.
423 :Valid Range: 1 sets flag, 0 unsets flag
427 .. describe:: nodelete
429 Set/Unset NODELETE flag on a given pool.
432 :Valid Range: 1 sets flag, 0 unsets flag
433 :Version: Version ``FIXME``
437 .. describe:: nopgchange
Set/Unset NOPGCHANGE flag on a given pool.
441 :Valid Range: 1 sets flag, 0 unsets flag
442 :Version: Version ``FIXME``
446 .. describe:: nosizechange
448 Set/Unset NOSIZECHANGE flag on a given pool.
451 :Valid Range: 1 sets flag, 0 unsets flag
452 :Version: Version ``FIXME``
.. describe:: bulk

Set/Unset the bulk flag on a given pool.
461 :Valid Range: true/1 sets flag, false/0 unsets flag
463 .. _write_fadvise_dontneed:
465 .. describe:: write_fadvise_dontneed
467 Set/Unset WRITE_FADVISE_DONTNEED flag on a given pool.
470 :Valid Range: 1 sets flag, 0 unsets flag
474 .. describe:: noscrub
476 Set/Unset NOSCRUB flag on a given pool.
479 :Valid Range: 1 sets flag, 0 unsets flag
483 .. describe:: nodeep-scrub
485 Set/Unset NODEEP_SCRUB flag on a given pool.
488 :Valid Range: 1 sets flag, 0 unsets flag
492 .. describe:: hit_set_type
494 Enables hit set tracking for cache pools.
495 See `Bloom Filter`_ for additional information.
498 :Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
499 :Default: ``bloom``. Other values are for testing.
503 .. describe:: hit_set_count
505 The number of hit sets to store for cache pools. The higher
506 the number, the more RAM consumed by the ``ceph-osd`` daemon.
509 :Valid Range: ``1``. Agent doesn't handle > 1 yet.
513 .. describe:: hit_set_period
515 The duration of a hit set period in seconds for cache pools.
The higher the number, the more RAM is consumed by the
``ceph-osd`` daemon.

:Example: ``3600`` (1 hour)
524 .. describe:: hit_set_fpp
526 The false positive probability for the ``bloom`` hit set type.
527 See `Bloom Filter`_ for additional information.
530 :Valid Range: 0.0 - 1.0
533 .. _cache_target_dirty_ratio:
535 .. describe:: cache_target_dirty_ratio
537 The percentage of the cache pool containing modified (dirty)
538 objects before the cache tiering agent will flush them to the
539 backing storage pool.
544 .. _cache_target_dirty_high_ratio:
546 .. describe:: cache_target_dirty_high_ratio
548 The percentage of the cache pool containing modified (dirty)
objects before the cache tiering agent will flush them to the
backing storage pool at a higher speed.
555 .. _cache_target_full_ratio:
557 .. describe:: cache_target_full_ratio
559 The percentage of the cache pool containing unmodified (clean)
objects before the cache tiering agent will evict them from the
cache pool.
566 .. _target_max_bytes:
568 .. describe:: target_max_bytes
570 Ceph will begin flushing or evicting objects when the
571 ``max_bytes`` threshold is triggered.
:Example: ``1000000000000`` (1 TB)
576 .. _target_max_objects:
578 .. describe:: target_max_objects
580 Ceph will begin flushing or evicting objects when the
581 ``max_objects`` threshold is triggered.
:Example: ``1000000`` (1M objects)
587 .. describe:: hit_set_grade_decay_rate
The temperature decay rate between two successive hit sets.
592 :Valid Range: 0 - 100
595 .. describe:: hit_set_search_last_n
Count at most N appearances in hit sets when calculating temperature.
600 :Valid Range: 0 - hit_set_count
603 .. _cache_min_flush_age:
605 .. describe:: cache_min_flush_age
607 The time (in seconds) before the cache tiering agent will flush
608 an object from the cache pool to the storage pool.
:Example: ``600`` (10 minutes)
613 .. _cache_min_evict_age:
615 .. describe:: cache_min_evict_age
617 The time (in seconds) before the cache tiering agent will evict
618 an object from the cache pool.
:Example: ``1800`` (30 minutes)
625 .. describe:: fast_read
On an erasure-coded pool, if this flag is enabled, read requests issue
sub-reads to all shards and wait until enough shards have been received
to decode and serve the client. With the jerasure and isa erasure
plugins, the request is served as soon as the first K replies return,
using the data decoded from those replies. This trades some additional
resources for better performance. Currently this flag is supported only
for erasure-coded pools.
638 .. _scrub_min_interval:
640 .. describe:: scrub_min_interval
The minimum interval in seconds for pool scrubbing when the
cluster load is low. If it is ``0``, the value
``osd_scrub_min_interval`` from the configuration is used.
649 .. _scrub_max_interval:
651 .. describe:: scrub_max_interval
653 The maximum interval in seconds for pool scrubbing
irrespective of cluster load. If it is ``0``, the value
``osd_scrub_max_interval`` from the configuration is used.
660 .. _deep_scrub_interval:
662 .. describe:: deep_scrub_interval
The interval in seconds for pool "deep" scrubbing. If it is
``0``, the value ``osd_deep_scrub_interval`` from the
configuration is used.
670 .. _recovery_priority:
672 .. describe:: recovery_priority
674 When a value is set it will increase or decrease the computed
675 reservation priority. This value must be in the range -10 to
676 10. Use a negative priority for less important pools so they
677 have lower priority than any new pools.
683 .. _recovery_op_priority:
685 .. describe:: recovery_op_priority
687 Specify the recovery operation priority for this pool instead of :confval:`osd_recovery_op_priority`.
Get Pool Values
===============

To get a value from a pool, execute the following:
700 ceph osd pool get {pool-name} {key}
702 You may get values for the following keys:
``size``

:Description: see size_

``min_size``

:Description: see min_size_
:Version: ``0.54`` and above
``pg_num``

:Description: see pg_num_

``pgp_num``

:Description: see pgp_num_
:Valid Range: Equal to or less than ``pg_num``.
``crush_rule``

:Description: see crush_rule_

``hit_set_type``

:Description: see hit_set_type_
:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``

``hit_set_count``

:Description: see hit_set_count_

``hit_set_period``

:Description: see hit_set_period_

``hit_set_fpp``

:Description: see hit_set_fpp_
765 ``cache_target_dirty_ratio``
767 :Description: see cache_target_dirty_ratio_
772 ``cache_target_dirty_high_ratio``
774 :Description: see cache_target_dirty_high_ratio_
779 ``cache_target_full_ratio``
781 :Description: see cache_target_full_ratio_
``target_max_bytes``

:Description: see target_max_bytes_
793 ``target_max_objects``
795 :Description: see target_max_objects_
800 ``cache_min_flush_age``
802 :Description: see cache_min_flush_age_
807 ``cache_min_evict_age``
809 :Description: see cache_min_evict_age_
``fast_read``

:Description: see fast_read_
821 ``scrub_min_interval``
823 :Description: see scrub_min_interval_
828 ``scrub_max_interval``
830 :Description: see scrub_max_interval_
835 ``deep_scrub_interval``
837 :Description: see deep_scrub_interval_
842 ``allow_ec_overwrites``
844 :Description: see allow_ec_overwrites_
849 ``recovery_priority``
851 :Description: see recovery_priority_
856 ``recovery_op_priority``
858 :Description: see recovery_op_priority_
863 Set the Number of Object Replicas
864 =================================
866 To set the number of object replicas on a replicated pool, execute the following:
870 ceph osd pool set {poolname} size {num-replicas}
872 .. important:: The ``{num-replicas}`` includes the object itself.
873 If you want the object and two copies of the object for a total of
874 three instances of the object, specify ``3``.
880 ceph osd pool set data size 3
882 You may execute this command for each pool. **Note:** An object might accept
883 I/Os in degraded mode with fewer than ``pool size`` replicas. To set a minimum
884 number of required replicas for I/O, you should use the ``min_size`` setting.
889 ceph osd pool set data min_size 2
891 This ensures that no object in the data pool will receive I/O with fewer than
892 ``min_size`` replicas.
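For example, you can read back both values for the ``data`` pool with::

   ceph osd pool get data size
   ceph osd pool get data min_size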
895 Get the Number of Object Replicas
896 =================================
898 To get the number of object replicas, execute the following:
902 ceph osd dump | grep 'replicated size'
Ceph will list the pools, with the ``replicated size`` attribute highlighted.
By default, Ceph creates two replicas of an object (a total of three copies,
or a size of ``3``).
909 .. _pgcalc: https://old.ceph.com/pgcalc/
910 .. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
911 .. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
912 .. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
913 .. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
914 .. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool