=====
Pools
=====

Pools are logical partitions for storing objects.

When you first deploy a cluster without creating a pool, Ceph uses the default
pools for storing data. A pool provides you with:

- **Resilience**: You can set how many OSDs are allowed to fail without losing
  data. For replicated pools, it is the desired number of copies/replicas of
  an object. A typical configuration stores an object and two additional
  copies (i.e., ``size = 3``), but you can configure the number of
  copies/replicas at pool granularity. For `erasure coded pools
  <../erasure-code>`_, it is the number of coding chunks (i.e. ``m=2`` in the
  **erasure code profile**).

- **Placement Groups**: You can set the number of placement groups for the
  pool. A typical configuration targets approximately 100 placement groups per
  OSD to provide optimal balancing without using up too many computing
  resources. When setting up multiple pools, be careful to set a reasonable
  number of placement groups for each pool and for the cluster as a whole.
  Note that each PG belongs to a specific pool, so when multiple pools use the
  same OSDs, you must take care that the **sum** of PG replicas per OSD is in
  the desired PG-per-OSD target range.

- **CRUSH Rules**: When you store data in a pool, placement of the object and
  its replicas (or chunks for erasure coded pools) in your cluster is governed
  by CRUSH rules. You can create a custom CRUSH rule for your pool if the
  default rule is not appropriate for your use case.

- **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``, you
  effectively take a snapshot of a particular pool.

To organize data into pools, you can list, create, and remove pools. You can
also view the utilization statistics for each pool.

Pool names beginning with ``.`` are reserved for use by Ceph's internal
operations. Please do not create or manipulate pools with these names.

List Pools
==========

To list your cluster's pools, execute::

    ceph osd lspools

Create a Pool
=============

Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_.
Ideally, you should override the default value for the number of placement
groups in your Ceph configuration file, as the default is NOT ideal. For
details on placement group numbers, refer to `setting the number of placement groups`_.

.. note:: Starting with Luminous, all pools need to be associated to the
   application using the pool. See `Associate Pool to Application`_ below for
   more information.

For example::

    osd_pool_default_pg_num = 128
    osd_pool_default_pgp_num = 128

To create a pool, execute::

    ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] [replicated] \
         [crush-rule-name] [expected-num-objects]
    ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] erasure \
         [erasure-code-profile] [crush-rule-name] [expected_num_objects] [--autoscale-mode=<on,off,warn>]

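For example, here is a minimal replicated pool with 128 placement groups (the
pool name ``mypool`` is illustrative; the parameters are described below)::

    ceph osd pool create mypool 128 128 replicated
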
Where:

.. describe:: {pool-name}

   The name of the pool. It must be unique.

.. describe:: {pg-num}

   The total number of placement groups for the pool. See :ref:`placement groups`
   for details on calculating a suitable number. The default value ``8`` is
   NOT suitable for most systems.

.. describe:: {pgp-num}

   The total number of placement groups for placement purposes. This
   **should be equal to the total number of placement groups**, except
   for placement group splitting scenarios.

   :Required: Yes. Picks up default or Ceph configuration value if not specified.

.. describe:: {replicated|erasure}

   The pool type, which may be either **replicated** to
   recover from lost OSDs by keeping multiple copies of the
   objects, or **erasure** to get a kind of
   `generalized RAID5 <../erasure-code>`_ capability.
   The **replicated** pools require more
   raw storage but implement all Ceph operations. The
   **erasure** pools require less raw storage but only
   implement a subset of the available operations.

.. describe:: [crush-rule-name]

   The name of a CRUSH rule to use for this pool. The specified
   rule must exist.

   :Default: For **replicated** pools it is the rule specified by the
             :confval:`osd_pool_default_crush_rule` config variable. This rule must exist.
             For **erasure** pools it is ``erasure-code`` if the ``default``
             `erasure code profile`_ is used or ``{pool-name}`` otherwise. This
             rule will be created implicitly if it doesn't exist already.

.. describe:: [erasure-code-profile=profile]

   For **erasure** pools only. Use the `erasure code profile`_. It
   must be an existing profile as defined by
   **osd erasure-code-profile set**.

.. _erasure code profile: ../erasure-code-profile

.. describe:: --autoscale-mode=<on,off,warn>

   If you set the autoscale mode to ``on`` or ``warn``, you can let the system
   autotune or recommend changes to the number of placement groups in your pool
   based on actual usage. If you leave it off, then you should refer to
   :ref:`placement groups` for more information.

   :Default: The default behavior is controlled by the :confval:`osd_pool_default_pg_autoscale_mode` option.

.. describe:: [expected-num-objects]

   The expected number of objects for this pool. By setting this value
   (together with a negative **filestore merge threshold**), PG folder
   splitting happens at pool creation time, avoiding the latency impact
   of runtime folder splitting.

   :Default: 0, no splitting at the pool creation time.

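Putting these options together, here is a sketch that creates an erasure coded
pool from a custom profile with the autoscaler enabled (the profile name
``myprofile`` and pool name ``ecpool`` are illustrative)::

    ceph osd erasure-code-profile set myprofile k=4 m=2
    ceph osd pool create ecpool 32 32 erasure myprofile --autoscale-mode=on
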
.. _associate-pool-to-application:

Associate Pool to Application
=============================

Pools need to be associated with an application before use. Pools that will be
used with CephFS, and pools that are automatically created by RGW, are
associated automatically. Pools that are intended for use with RBD should be
initialized using the ``rbd`` tool (see `Block Device Commands`_ for more
information).

For other cases, you can manually associate a free-form application name to
a pool::

    ceph osd pool application enable {pool-name} {application-name}

.. note:: CephFS uses the application name ``cephfs``, RBD uses the
   application name ``rbd``, and RGW uses the application name ``rgw``.

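For example, to tag a hypothetical pool named ``telemetry-data`` with the
free-form application name ``telemetry``::

    ceph osd pool application enable telemetry-data telemetry
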
Set Pool Quotas
===============

You can set pool quotas for the maximum number of bytes and/or the maximum
number of objects per pool::

    ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]

For example::

    ceph osd pool set-quota data max_objects 10000

To remove a quota, set its value to ``0``.

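Continuing the example above, the following removes the object quota from the
``data`` pool::

    ceph osd pool set-quota data max_objects 0
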
Delete a Pool
=============

To delete a pool, execute::

    ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]

To remove a pool, the ``mon_allow_pool_delete`` flag must be set to true in the
Monitor's configuration; otherwise, the monitors will refuse to remove the pool.

See `Monitor Configuration`_ for more information.

.. _Monitor Configuration: ../../configuration/mon-config-ref

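As a sketch (the pool name ``mypool`` is illustrative), you could enable the
flag and then delete the pool::

    ceph config set mon mon_allow_pool_delete true
    ceph osd pool delete mypool mypool --yes-i-really-really-mean-it
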
If you created your own rules for a pool, you should consider removing them
when you no longer need the pool::

    ceph osd pool get {pool-name} crush_rule

If the rule was "123", for example, you can check the other pools like so::

    ceph osd dump | grep "^pool" | grep "crush_rule 123"

If no other pools use that custom rule, then it's safe to delete that
rule from the cluster.

If you created users with permissions strictly for a pool that no longer
exists, you should consider deleting those users too::

    ceph auth ls | grep -C 5 {pool-name}

Rename a Pool
=============

To rename a pool, execute::

    ceph osd pool rename {current-pool-name} {new-pool-name}

If you rename a pool and you have per-pool capabilities for an authenticated
user, you must update the user's capabilities (i.e., caps) with the new pool
name.

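As a sketch, assuming a hypothetical user ``client.app`` whose caps referenced
the old pool name, you could reissue its caps against the new name. Note that
``ceph auth caps`` replaces the user's entire capability set, so restate all
of its caps::

    ceph auth caps client.app mon 'allow r' osd 'allow rw pool={new-pool-name}'
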
Show Pool Statistics
====================

To show a pool's utilization statistics, execute::

    rados df

Additionally, to obtain I/O information for a specific pool or for all pools, execute::

    ceph osd pool stats [{pool-name}]

Make a Snapshot of a Pool
=========================

To make a snapshot of a pool, execute::

    ceph osd pool mksnap {pool-name} {snap-name}

Remove a Snapshot of a Pool
===========================

To remove a snapshot of a pool, execute::

    ceph osd pool rmsnap {pool-name} {snap-name}

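For example, with an illustrative pool named ``mypool``, you can create a
snapshot, list the pool's snapshots with the ``rados`` tool, and remove the
snapshot again::

    ceph osd pool mksnap mypool mypool-snap
    rados -p mypool lssnap
    ceph osd pool rmsnap mypool mypool-snap
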
Set Pool Values
===============

To set a value for a pool, execute the following::

    ceph osd pool set {pool-name} {key} {value}

You may set values for the following keys:

.. _compression_algorithm:

.. describe:: compression_algorithm

   Sets the inline compression algorithm to use for underlying BlueStore.
   This setting overrides the global setting
   :confval:`bluestore_compression_algorithm`.

   :Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``

.. describe:: compression_mode

   Sets the policy for the inline compression algorithm for underlying
   BlueStore. This setting overrides the global setting
   :confval:`bluestore_compression_mode`.

   :Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``

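For example, to enable aggressive ``zstd`` compression on a hypothetical pool
named ``mypool``::

    ceph osd pool set mypool compression_algorithm zstd
    ceph osd pool set mypool compression_mode aggressive
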
.. describe:: compression_min_blob_size

   Chunks smaller than this are never compressed. This setting overrides the
   global settings of :confval:`bluestore_compression_min_blob_size`,
   :confval:`bluestore_compression_min_blob_size_hdd` and
   :confval:`bluestore_compression_min_blob_size_ssd`.

   :Type: Unsigned Integer

.. describe:: compression_max_blob_size

   Chunks larger than this are broken into smaller blobs of at most
   ``compression_max_blob_size`` before being compressed.

   :Type: Unsigned Integer

.. _size:

.. describe:: size

   Sets the number of replicas for objects in the pool.
   See `Set the Number of Object Replicas`_ for further details.
   Replicated pools only.

.. _min_size:

.. describe:: min_size

   Sets the minimum number of replicas required for I/O.
   See `Set the Number of Object Replicas`_ for further details.
   For erasure coded pools this should be set to a value
   greater than ``k``: if I/O is allowed with only ``k`` shards there is no
   redundancy, and data will be lost in the event of a permanent OSD
   failure. For more information see `Erasure Code <../erasure-code>`_.

   :Version: ``0.54`` and above

.. _pg_num:

.. describe:: pg_num

   The effective number of placement groups to use when calculating
   data placement.

   :Valid Range: Greater than the current ``pg_num`` value.

.. _pgp_num:

.. describe:: pgp_num

   The effective number of placement groups for placement purposes, used
   when calculating data placement.

   :Valid Range: Equal to or less than ``pg_num``.

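For example, to grow a hypothetical pool named ``mypool`` to 256 placement
groups, increase ``pg_num`` first and then ``pgp_num`` so that data is
actually rebalanced (recent releases may adjust ``pgp_num`` automatically)::

    ceph osd pool set mypool pg_num 256
    ceph osd pool set mypool pgp_num 256
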
.. _crush_rule:

.. describe:: crush_rule

   The rule to use for mapping object placement in the cluster.

.. _allow_ec_overwrites:

.. describe:: allow_ec_overwrites

   Whether writes to an erasure coded pool can update part
   of an object, so that CephFS and RBD can use it. See
   `Erasure Coding with Overwrites`_ for more details.

   .. versionadded:: 12.2.0

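For example, on a hypothetical erasure coded pool named ``ec_pool``::

    ceph osd pool set ec_pool allow_ec_overwrites true
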
.. describe:: hashpspool

   Set/Unset HASHPSPOOL flag on a given pool.

   :Valid Range: 1 sets flag, 0 unsets flag

.. describe:: nodelete

   Set/Unset NODELETE flag on a given pool.

   :Valid Range: 1 sets flag, 0 unsets flag
   :Version: ``FIXME``

.. describe:: nopgchange

   :Description: Set/Unset NOPGCHANGE flag on a given pool.
   :Valid Range: 1 sets flag, 0 unsets flag
   :Version: ``FIXME``

.. describe:: nosizechange

   Set/Unset NOSIZECHANGE flag on a given pool.

   :Valid Range: 1 sets flag, 0 unsets flag
   :Version: ``FIXME``

.. describe:: bulk

   Set/Unset bulk flag on a given pool.

   :Valid Range: ``true``/``1`` sets flag, ``false``/``0`` unsets flag

.. _write_fadvise_dontneed:

.. describe:: write_fadvise_dontneed

   Set/Unset WRITE_FADVISE_DONTNEED flag on a given pool.

   :Valid Range: 1 sets flag, 0 unsets flag

.. describe:: noscrub

   Set/Unset NOSCRUB flag on a given pool.

   :Valid Range: 1 sets flag, 0 unsets flag

.. describe:: nodeep-scrub

   Set/Unset NODEEP_SCRUB flag on a given pool.

   :Valid Range: 1 sets flag, 0 unsets flag

.. _hit_set_type:

.. describe:: hit_set_type

   Enables hit set tracking for cache pools.
   See `Bloom Filter`_ for additional information.

   :Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
   :Default: ``bloom``. Other values are for testing.

.. _hit_set_count:

.. describe:: hit_set_count

   The number of hit sets to store for cache pools. The higher
   the number, the more RAM consumed by the ``ceph-osd`` daemon.

   :Valid Range: ``1``. Agent doesn't handle > 1 yet.

.. _hit_set_period:

.. describe:: hit_set_period

   The duration of a hit set period in seconds for cache pools.
   The higher the number, the more RAM consumed by the
   ``ceph-osd`` daemon.

   :Example: ``3600`` (1 hour)

.. _hit_set_fpp:

.. describe:: hit_set_fpp

   The false positive probability for the ``bloom`` hit set type.
   See `Bloom Filter`_ for additional information.

   :Valid Range: 0.0 - 1.0

.. _cache_target_dirty_ratio:

.. describe:: cache_target_dirty_ratio

   The percentage of the cache pool containing modified (dirty)
   objects before the cache tiering agent will flush them to the
   backing storage pool.

.. _cache_target_dirty_high_ratio:

.. describe:: cache_target_dirty_high_ratio

   The percentage of the cache pool containing modified (dirty)
   objects before the cache tiering agent will flush them to the
   backing storage pool with a higher speed.

.. _cache_target_full_ratio:

.. describe:: cache_target_full_ratio

   The percentage of the cache pool containing unmodified (clean)
   objects before the cache tiering agent will evict them from the
   cache pool.

.. _target_max_bytes:

.. describe:: target_max_bytes

   Ceph will begin flushing or evicting objects when the
   ``max_bytes`` threshold is triggered.

   :Example: ``1000000000000`` (1 TB)

.. _target_max_objects:

.. describe:: target_max_objects

   Ceph will begin flushing or evicting objects when the
   ``max_objects`` threshold is triggered.

   :Example: ``1000000`` (1M objects)

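For example, to cap a hypothetical cache pool named ``hot-storage`` at 1 TB
and one million objects::

    ceph osd pool set hot-storage target_max_bytes 1000000000000
    ceph osd pool set hot-storage target_max_objects 1000000
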
.. describe:: hit_set_grade_decay_rate

   Temperature decay rate between two successive hit sets.

   :Valid Range: 0 - 100

.. describe:: hit_set_search_last_n

   Count at most N appearances in hit sets for temperature calculation.

   :Valid Range: 0 - hit_set_count

.. _cache_min_flush_age:

.. describe:: cache_min_flush_age

   The time (in seconds) before the cache tiering agent will flush
   an object from the cache pool to the storage pool.

   :Example: ``600`` (10 minutes)

.. _cache_min_evict_age:

.. describe:: cache_min_evict_age

   The time (in seconds) before the cache tiering agent will evict
   an object from the cache pool.

   :Example: ``1800`` (30 minutes)

.. _fast_read:

.. describe:: fast_read

   On an erasure coded pool, if this flag is turned on, the read request
   issues sub-reads to all shards and waits until it receives enough
   shards to decode and serve the client. With the jerasure and isa
   erasure plugins, once the first K replies return, the client's request
   is served immediately using the data decoded from those replies. This
   trades some resources for better performance. Currently this flag is
   supported only for erasure coded pools.

.. _scrub_min_interval:

.. describe:: scrub_min_interval

   The minimum interval in seconds for pool scrubbing when
   the load is low. If it is 0, the value ``osd_scrub_min_interval``
   from config is used.

.. _scrub_max_interval:

.. describe:: scrub_max_interval

   The maximum interval in seconds for pool scrubbing
   irrespective of cluster load. If it is 0, the value
   ``osd_scrub_max_interval`` from config is used.

.. _deep_scrub_interval:

.. describe:: deep_scrub_interval

   The interval in seconds for pool "deep" scrubbing. If it
   is 0, the value ``osd_deep_scrub_interval`` from config is used.

.. _recovery_priority:

.. describe:: recovery_priority

   When a value is set it will increase or decrease the computed
   reservation priority. This value must be in the range -10 to
   10. Use a negative priority for less important pools so they
   have lower priority than any new pools.

.. _recovery_op_priority:

.. describe:: recovery_op_priority

   Specify the recovery operation priority for this pool instead of
   :confval:`osd_recovery_op_priority`.

Get Pool Values
===============

To get a value from a pool, execute the following::

    ceph osd pool get {pool-name} {key}

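For example, to read the replica count of a hypothetical pool named
``mypool``::

    ceph osd pool get mypool size
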
You may get values for the following keys:

``size``

:Description: see size_

``min_size``

:Description: see min_size_
:Version: ``0.54`` and above

``pg_num``

:Description: see pg_num_

``pgp_num``

:Description: see pgp_num_
:Valid Range: Equal to or less than ``pg_num``.

``crush_rule``

:Description: see crush_rule_

``hit_set_type``

:Description: see hit_set_type_
:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``

``hit_set_count``

:Description: see hit_set_count_

``hit_set_period``

:Description: see hit_set_period_

``hit_set_fpp``

:Description: see hit_set_fpp_

``cache_target_dirty_ratio``

:Description: see cache_target_dirty_ratio_

``cache_target_dirty_high_ratio``

:Description: see cache_target_dirty_high_ratio_

``cache_target_full_ratio``

:Description: see cache_target_full_ratio_

``target_max_bytes``

:Description: see target_max_bytes_

``target_max_objects``

:Description: see target_max_objects_

``cache_min_flush_age``

:Description: see cache_min_flush_age_

``cache_min_evict_age``

:Description: see cache_min_evict_age_

``fast_read``

:Description: see fast_read_

``scrub_min_interval``

:Description: see scrub_min_interval_

``scrub_max_interval``

:Description: see scrub_max_interval_

``deep_scrub_interval``

:Description: see deep_scrub_interval_

``allow_ec_overwrites``

:Description: see allow_ec_overwrites_

``recovery_priority``

:Description: see recovery_priority_

``recovery_op_priority``

:Description: see recovery_op_priority_

Set the Number of Object Replicas
=================================

To set the number of object replicas on a replicated pool, execute the
following::

    ceph osd pool set {poolname} size {num-replicas}

.. important:: The ``{num-replicas}`` includes the object itself.
   If you want the object and two copies of the object for a total of
   three instances of the object, specify ``3``.

For example::

    ceph osd pool set data size 3

You may execute this command for each pool. **Note:** An object might accept
I/Os in degraded mode with fewer than ``pool size`` replicas. To set a minimum
number of required replicas for I/O, you should use the ``min_size`` setting.
For example::

    ceph osd pool set data min_size 2

This ensures that no object in the data pool will receive I/O with fewer than
``min_size`` replicas.

Get the Number of Object Replicas
=================================

To get the number of object replicas, execute the following::

    ceph osd dump | grep 'replicated size'

Ceph will list the pools, with the ``replicated size`` attribute highlighted.
By default, Ceph creates two replicas of an object (a total of three copies,
or a size of ``3``).

.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
.. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
.. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
.. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool