=======
Pools
=======

Pools are logical partitions for storing objects.

When you first deploy a cluster without creating a pool, Ceph uses the default
pools for storing data. A pool provides you with:

- **Resilience**: You can set how many OSDs are allowed to fail without losing data.
  For replicated pools, this is the desired number of copies/replicas of an object.
  A typical configuration stores an object and two additional copies
  (i.e., ``size = 3``), but you can configure the number of copies/replicas at
  pool granularity.
  For `erasure coded pools <../erasure-code>`_, it is the number of coding chunks
  (i.e., ``m=2`` in the **erasure code profile**).

- **Placement Groups**: You can set the number of placement groups for the pool.
  A typical configuration targets approximately 100 placement groups per OSD to
  provide optimal balancing without consuming excessive computing resources. When
  setting up multiple pools, be careful to set a reasonable number of
  placement groups for each pool and for the cluster as a whole. Note that each PG
  belongs to a specific pool, so when multiple pools use the same OSDs, you must
  take care that the **sum** of PG replicas per OSD is in the desired PG-per-OSD
  target range (see the worked example after this list).

- **CRUSH Rules**: When you store data in a pool, placement of the object
  and its replicas (or chunks for erasure coded pools) in your cluster is governed
  by CRUSH rules. You can create a custom CRUSH rule for your pool if the default
  rule is not appropriate for your use case.

- **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``,
  you effectively take a snapshot of a particular pool.

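
As a rough sketch of the PG budget arithmetic (the cluster size and pool layout
here are hypothetical): with 10 OSDs, a target of 100 PGs per OSD, and
replicated pools of ``size = 3``, the cluster can accommodate roughly
``10 * 100 / 3``, or about 333, PGs in total across all pools. Dividing that
budget among the pools keeps the per-OSD PG count near the target.
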

To organize data into pools, you can list, create, and remove pools.
You can also view the utilization statistics for each pool.

Pool Names
==========

Pool names beginning with ``.`` are reserved for use by Ceph's internal
operations. Please do not create or manipulate pools with these names.


List Pools
==========

To list your cluster's pools, execute::

    ceph osd lspools


.. _createpool:


Create a Pool
=============

Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_.
Ideally, you should override the default value for the number of placement
groups in your Ceph configuration file, as the default is NOT ideal.
For details on placement group numbers, refer to `setting the number of placement groups`_.

.. note:: Starting with Luminous, all pools need to be associated with the
   application that will use the pool. See `Associate Pool to Application`_
   below for more information.

For example::

    osd_pool_default_pg_num = 128
    osd_pool_default_pgp_num = 128

To create a pool, execute::

    ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] [replicated] \
         [crush-rule-name] [expected-num-objects]
    ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] erasure \
         [erasure-code-profile] [crush-rule-name] [expected-num-objects] [--autoscale-mode=<on,off,warn>]


Where:

.. describe:: {pool-name}

   The name of the pool. It must be unique.

   :Type: String
   :Required: Yes.

.. describe:: {pg-num}

   The total number of placement groups for the pool. See :ref:`placement groups`
   for details on calculating a suitable number. The
   default value ``8`` is NOT suitable for most systems.

   :Type: Integer
   :Required: Yes.
   :Default: 8

.. describe:: {pgp-num}

   The total number of placement groups for placement purposes. This
   **should be equal to the total number of placement groups**, except
   for placement group splitting scenarios.

   :Type: Integer
   :Required: Yes. Picks up the default or Ceph configuration value if not specified.
   :Default: 8

.. describe:: {replicated|erasure}

   The pool type, which may be either **replicated** (to
   recover from lost OSDs by keeping multiple copies of the
   objects) or **erasure** (to get a kind of
   `generalized RAID5 <../erasure-code>`_ capability).
   Replicated pools require more
   raw storage but implement all Ceph operations. Erasure-coded
   pools require less raw storage but only
   implement a subset of the available operations.

   :Type: String
   :Required: No.
   :Default: replicated

.. describe:: [crush-rule-name]

   The name of a CRUSH rule to use for this pool. The specified
   rule must exist.

   :Type: String
   :Required: No.
   :Default: For **replicated** pools it is the rule specified by the
             :confval:`osd_pool_default_crush_rule` config variable. This rule must exist.
             For **erasure** pools it is ``erasure-code`` if the ``default``
             `erasure code profile`_ is used or ``{pool-name}`` otherwise. This
             rule will be created implicitly if it doesn't exist already.

.. describe:: [erasure-code-profile=profile]

   For **erasure** pools only. Use the `erasure code profile`_. It
   must be an existing profile as defined by
   **osd erasure-code-profile set**.

   :Type: String
   :Required: No.

.. _erasure code profile: ../erasure-code-profile

.. describe:: --autoscale-mode=<on,off,warn>

   If you set the autoscale mode to ``on`` or ``warn``, you can let the system
   autotune or recommend changes to the number of placement groups in your pool
   based on actual usage. If you leave it ``off``, then you should refer to
   :ref:`placement groups` for more information.

   :Type: String
   :Required: No.
   :Default: The default behavior is controlled by the :confval:`osd_pool_default_pg_autoscale_mode` option.

.. describe:: [expected-num-objects]

   The expected number of objects for this pool. By setting this value
   (together with a negative **filestore merge threshold**), PG folder
   splitting happens at pool creation time, avoiding the latency impact
   of runtime folder splitting.

   :Type: Integer
   :Required: No.
   :Default: 0, no splitting at pool creation time.

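
As a concrete illustration (the pool names, PG counts, and ``k``/``m`` values
below are arbitrary examples, not recommendations), you might create a
replicated pool, and an erasure-coded pool backed by a custom profile, like
so::

    ceph osd pool create rep-pool 128 128 replicated

    ceph osd erasure-code-profile set my-ec-profile k=4 m=2
    ceph osd pool create ec-pool 32 32 erasure my-ec-profile
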

.. _associate-pool-to-application:

Associate Pool to Application
=============================

Pools need to be associated with an application before use. Pools that will be
used with CephFS and pools that are automatically created by RGW are
automatically associated. Pools that are intended for use with RBD should be
initialized using the ``rbd`` tool (see `Block Device Commands`_ for more
information).

For other cases, you can manually associate a free-form application name with
a pool::

    ceph osd pool application enable {pool-name} {application-name}

.. note:: CephFS uses the application name ``cephfs``, RBD uses the
   application name ``rbd``, and RGW uses the application name ``rgw``.

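
For example, to tag a hypothetical pool named ``telemetry`` with a custom
application name and then show the applications enabled on that pool::

    ceph osd pool application enable telemetry my-app
    ceph osd pool application get telemetry
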

Set Pool Quotas
===============

You can set pool quotas for the maximum number of bytes and/or the maximum
number of objects per pool::

    ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]

For example::

    ceph osd pool set-quota data max_objects 10000

To remove a quota, set its value to ``0``.

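
For instance, to cap the same hypothetical ``data`` pool at roughly 10 GiB and
then lift the object-count quota set above::

    ceph osd pool set-quota data max_bytes 10737418240
    ceph osd pool set-quota data max_objects 0
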

Delete a Pool
=============

To delete a pool, execute::

    ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]

To remove a pool, the ``mon_allow_pool_delete`` flag must be set to ``true`` in the
Monitors' configuration. Otherwise the Monitors will refuse to remove the pool.

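
One way to enable deletion, assuming the cluster uses the centralized
configuration database (``ceph config``), is to set the flag on the Monitors
before deleting the pool and to clear it again afterwards::

    ceph config set mon mon_allow_pool_delete true
    ceph osd pool delete test-pool test-pool --yes-i-really-really-mean-it
    ceph config set mon mon_allow_pool_delete false

The pool name ``test-pool`` here is only an example.
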

See `Monitor Configuration`_ for more information.

.. _Monitor Configuration: ../../configuration/mon-config-ref

If you created your own rules for a pool, you should consider removing them
when you no longer need the pool::

    ceph osd pool get {pool-name} crush_rule

If the rule was ``123``, for example, you can check whether other pools use it
like so::

    ceph osd dump | grep "^pool" | grep "crush_rule 123"

If no other pools use that custom rule, then it's safe to delete that
rule from the cluster.

If you created users with permissions strictly for a pool that no longer
exists, you should consider deleting those users too::

    ceph auth ls | grep -C 5 {pool-name}
    ceph auth del {user}


Rename a Pool
=============

To rename a pool, execute::

    ceph osd pool rename {current-pool-name} {new-pool-name}

If you rename a pool and you have per-pool capabilities for an authenticated
user, you must update the user's capabilities (i.e., caps) with the new pool
name.

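
A minimal sketch of such an update, using a hypothetical user ``client.app``
whose OSD caps were restricted to the old pool name (note that ``ceph auth caps``
replaces the user's caps wholesale, so restate any caps you want to keep)::

    ceph auth get client.app
    ceph auth caps client.app mon 'allow r' osd 'allow rw pool={new-pool-name}'
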

Show Pool Statistics
====================

To show a pool's utilization statistics, execute::

    rados df

Additionally, to obtain I/O information for a specific pool or for all pools, execute::

    ceph osd pool stats [{pool-name}]


Make a Snapshot of a Pool
=========================

To make a snapshot of a pool, execute::

    ceph osd pool mksnap {pool-name} {snap-name}

Remove a Snapshot of a Pool
===========================

To remove a snapshot of a pool, execute::

    ceph osd pool rmsnap {pool-name} {snap-name}

.. _setpoolvalues:

Set Pool Values
===============

To set a value on a pool, execute the following::

    ceph osd pool set {pool-name} {key} {value}

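
For example, to enable ``zstd`` inline compression on a hypothetical pool
named ``mypool`` (the key names and valid settings are listed below)::

    ceph osd pool set mypool compression_algorithm zstd
    ceph osd pool set mypool compression_mode aggressive
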

You may set values for the following keys:

.. _compression_algorithm:

.. describe:: compression_algorithm

   Sets the inline compression algorithm used by the underlying BlueStore. This
   setting overrides the global setting :confval:`bluestore_compression_algorithm`.

   :Type: String
   :Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``

.. describe:: compression_mode

   Sets the policy for the inline compression algorithm of the underlying
   BlueStore. This setting overrides the global setting
   :confval:`bluestore_compression_mode`.

   :Type: String
   :Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``

.. describe:: compression_min_blob_size

   Chunks smaller than this are never compressed. This setting overrides the
   global settings :confval:`bluestore_compression_min_blob_size`,
   :confval:`bluestore_compression_min_blob_size_hdd` and
   :confval:`bluestore_compression_min_blob_size_ssd`.

   :Type: Unsigned Integer

.. describe:: compression_max_blob_size

   Chunks larger than this are broken into smaller blobs of at most
   ``compression_max_blob_size`` bytes before being compressed.

   :Type: Unsigned Integer

.. _size:

.. describe:: size

   Sets the number of replicas for objects in the pool.
   See `Set the Number of Object Replicas`_ for further details.
   Replicated pools only.

   :Type: Integer

.. _min_size:

.. describe:: min_size

   Sets the minimum number of replicas required for I/O.
   See `Set the Number of Object Replicas`_ for further details.
   In the case of erasure-coded pools this should be set to a value
   greater than ``k``, because allowing I/O with only ``k`` shards leaves
   no redundancy and data will be lost in the event of a permanent OSD
   failure. For more information see `Erasure Code <../erasure-code>`_.

   :Type: Integer
   :Version: ``0.54`` and above

.. _pg_num:

.. describe:: pg_num

   The effective number of placement groups to use when calculating
   data placement.

   :Type: Integer
   :Valid Range: Greater than ``pg_num``'s current value.

.. _pgp_num:

.. describe:: pgp_num

   The effective number of placement groups for placement to use
   when calculating data placement.

   :Type: Integer
   :Valid Range: Equal to or less than ``pg_num``.

.. _crush_rule:

.. describe:: crush_rule

   The rule to use for mapping object placement in the cluster.

   :Type: String

.. _allow_ec_overwrites:

.. describe:: allow_ec_overwrites

   Whether writes to an erasure coded pool can update part
   of an object, so CephFS and RBD can use it. See
   `Erasure Coding with Overwrites`_ for more details.

   :Type: Boolean

   .. versionadded:: 12.2.0


.. _hashpspool:

.. describe:: hashpspool

   Set/Unset the HASHPSPOOL flag on a given pool.

   :Type: Integer
   :Valid Range: 1 sets flag, 0 unsets flag

.. _nodelete:

.. describe:: nodelete

   Set/Unset the NODELETE flag on a given pool.

   :Type: Integer
   :Valid Range: 1 sets flag, 0 unsets flag
   :Version: Version ``FIXME``

.. _nopgchange:

.. describe:: nopgchange

   Set/Unset the NOPGCHANGE flag on a given pool.

   :Type: Integer
   :Valid Range: 1 sets flag, 0 unsets flag
   :Version: Version ``FIXME``

.. _nosizechange:

.. describe:: nosizechange

   Set/Unset the NOSIZECHANGE flag on a given pool.

   :Type: Integer
   :Valid Range: 1 sets flag, 0 unsets flag
   :Version: Version ``FIXME``

.. _bulk:

.. describe:: bulk

   Set/Unset the bulk flag on a given pool.

   :Type: Boolean
   :Valid Range: true/1 sets flag, false/0 unsets flag

.. _write_fadvise_dontneed:

.. describe:: write_fadvise_dontneed

   Set/Unset the WRITE_FADVISE_DONTNEED flag on a given pool.

   :Type: Integer
   :Valid Range: 1 sets flag, 0 unsets flag

.. _noscrub:

.. describe:: noscrub

   Set/Unset the NOSCRUB flag on a given pool.

   :Type: Integer
   :Valid Range: 1 sets flag, 0 unsets flag

.. _nodeep-scrub:

.. describe:: nodeep-scrub

   Set/Unset the NODEEP_SCRUB flag on a given pool.

   :Type: Integer
   :Valid Range: 1 sets flag, 0 unsets flag

.. _hit_set_type:

.. describe:: hit_set_type

   Enables hit set tracking for cache pools.
   See `Bloom Filter`_ for additional information.

   :Type: String
   :Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
   :Default: ``bloom``. Other values are for testing.

.. _hit_set_count:

.. describe:: hit_set_count

   The number of hit sets to store for cache pools. The higher
   the number, the more RAM consumed by the ``ceph-osd`` daemon.

   :Type: Integer
   :Valid Range: ``1``. Agent doesn't handle > 1 yet.

.. _hit_set_period:

.. describe:: hit_set_period

   The duration of a hit set period in seconds for cache pools.
   The higher the number, the more RAM consumed by the
   ``ceph-osd`` daemon.

   :Type: Integer
   :Example: ``3600`` (1 hour)

.. _hit_set_fpp:

.. describe:: hit_set_fpp

   The false positive probability for the ``bloom`` hit set type.
   See `Bloom Filter`_ for additional information.

   :Type: Double
   :Valid Range: 0.0 - 1.0
   :Default: ``0.05``


.. _cache_target_dirty_ratio:

.. describe:: cache_target_dirty_ratio

   The percentage of the cache pool containing modified (dirty)
   objects before the cache tiering agent will flush them to the
   backing storage pool.

   :Type: Double
   :Default: ``.4``

.. _cache_target_dirty_high_ratio:

.. describe:: cache_target_dirty_high_ratio

   The percentage of the cache pool containing modified (dirty)
   objects before the cache tiering agent will flush them to the
   backing storage pool at a higher speed.

   :Type: Double
   :Default: ``.6``

.. _cache_target_full_ratio:

.. describe:: cache_target_full_ratio

   The percentage of the cache pool containing unmodified (clean)
   objects before the cache tiering agent will evict them from the
   cache pool.

   :Type: Double
   :Default: ``.8``

.. _target_max_bytes:

.. describe:: target_max_bytes

   Ceph will begin flushing or evicting objects when the
   ``max_bytes`` threshold is triggered.

   :Type: Integer
   :Example: ``1000000000000`` (1 TB)

.. _target_max_objects:

.. describe:: target_max_objects

   Ceph will begin flushing or evicting objects when the
   ``max_objects`` threshold is triggered.

   :Type: Integer
   :Example: ``1000000`` (1M objects)


.. describe:: hit_set_grade_decay_rate

   The temperature decay rate between two successive hit_sets.

   :Type: Integer
   :Valid Range: 0 - 100
   :Default: ``20``

.. describe:: hit_set_search_last_n

   Count at most N appearances in hit_sets for the temperature calculation.

   :Type: Integer
   :Valid Range: 0 - hit_set_count
   :Default: ``1``

.. _cache_min_flush_age:

.. describe:: cache_min_flush_age

   The time (in seconds) before the cache tiering agent will flush
   an object from the cache pool to the storage pool.

   :Type: Integer
   :Example: ``600`` (10 minutes)

.. _cache_min_evict_age:

.. describe:: cache_min_evict_age

   The time (in seconds) before the cache tiering agent will evict
   an object from the cache pool.

   :Type: Integer
   :Example: ``1800`` (30 minutes)

.. _fast_read:

.. describe:: fast_read

   On an erasure-coded pool, if this flag is turned on, the read request
   issues sub-reads to all shards and waits until it receives enough
   shards to decode and serve the client. With the jerasure and isa
   erasure plugins, once the first K replies return, the client's request
   is served immediately using the data decoded from those replies. This
   trades some resources for better performance. Currently this flag is
   only supported for erasure-coded pools.

   :Type: Boolean
   :Defaults: ``0``


.. _scrub_min_interval:

.. describe:: scrub_min_interval

   The minimum interval in seconds for pool scrubbing when
   load is low. If it is 0, the value ``osd_scrub_min_interval``
   from config is used.

   :Type: Double
   :Default: ``0``

.. _scrub_max_interval:

.. describe:: scrub_max_interval

   The maximum interval in seconds for pool scrubbing
   irrespective of cluster load. If it is 0, the value
   ``osd_scrub_max_interval`` from config is used.

   :Type: Double
   :Default: ``0``

.. _deep_scrub_interval:

.. describe:: deep_scrub_interval

   The interval in seconds for pool “deep” scrubbing. If it
   is 0, the value ``osd_deep_scrub_interval`` from config is used.

   :Type: Double
   :Default: ``0``

.. _recovery_priority:

.. describe:: recovery_priority

   When a value is set, it will increase or decrease the computed
   reservation priority. This value must be in the range -10 to
   10. Use a negative priority for less important pools so they
   have lower priority than any new pools.

   :Type: Integer
   :Default: ``0``

.. _recovery_op_priority:

.. describe:: recovery_op_priority

   Specify the recovery operation priority for this pool instead of :confval:`osd_recovery_op_priority`.

   :Type: Integer
   :Default: ``0``


Get Pool Values
===============

To get a value from a pool, execute the following::

    ceph osd pool get {pool-name} {key}

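
For example, to read back the replica count of a hypothetical pool named
``mypool``::

    ceph osd pool get mypool size
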

You may get values for the following keys:

``size``

:Description: see size_

:Type: Integer

``min_size``

:Description: see min_size_

:Type: Integer
:Version: ``0.54`` and above

``pg_num``

:Description: see pg_num_

:Type: Integer


``pgp_num``

:Description: see pgp_num_

:Type: Integer
:Valid Range: Equal to or less than ``pg_num``.


``crush_rule``

:Description: see crush_rule_


``hit_set_type``

:Description: see hit_set_type_

:Type: String
:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``

``hit_set_count``

:Description: see hit_set_count_

:Type: Integer


``hit_set_period``

:Description: see hit_set_period_

:Type: Integer


``hit_set_fpp``

:Description: see hit_set_fpp_

:Type: Double


``cache_target_dirty_ratio``

:Description: see cache_target_dirty_ratio_

:Type: Double


``cache_target_dirty_high_ratio``

:Description: see cache_target_dirty_high_ratio_

:Type: Double


``cache_target_full_ratio``

:Description: see cache_target_full_ratio_

:Type: Double


``target_max_bytes``

:Description: see target_max_bytes_

:Type: Integer


``target_max_objects``

:Description: see target_max_objects_

:Type: Integer


``cache_min_flush_age``

:Description: see cache_min_flush_age_

:Type: Integer


``cache_min_evict_age``

:Description: see cache_min_evict_age_

:Type: Integer


``fast_read``

:Description: see fast_read_

:Type: Boolean


``scrub_min_interval``

:Description: see scrub_min_interval_

:Type: Double


``scrub_max_interval``

:Description: see scrub_max_interval_

:Type: Double


``deep_scrub_interval``

:Description: see deep_scrub_interval_

:Type: Double


``allow_ec_overwrites``

:Description: see allow_ec_overwrites_

:Type: Boolean


``recovery_priority``

:Description: see recovery_priority_

:Type: Integer


``recovery_op_priority``

:Description: see recovery_op_priority_

:Type: Integer


Set the Number of Object Replicas
=================================

To set the number of object replicas on a replicated pool, execute the following::

    ceph osd pool set {poolname} size {num-replicas}

.. important:: The ``{num-replicas}`` includes the object itself.
   If you want the object and two copies of the object for a total of
   three instances of the object, specify ``3``.

For example::

    ceph osd pool set data size 3

You may execute this command for each pool. **Note:** An object might accept
I/Os in degraded mode with fewer than ``pool size`` replicas. To set a minimum
number of required replicas for I/O, you should use the ``min_size`` setting.
For example::

    ceph osd pool set data min_size 2

This ensures that no object in the data pool will receive I/O with fewer than
``min_size`` replicas.


Get the Number of Object Replicas
=================================

To get the number of object replicas, execute the following::

    ceph osd dump | grep 'replicated size'

Ceph will list the pools, with the ``replicated size`` attribute highlighted.
By default, Ceph creates two replicas of an object (a total of three copies, or
a size of 3).


.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
.. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
.. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
.. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool