=======
 Pools
=======

When you first deploy a cluster without creating a pool, Ceph uses the default
pools for storing data. A pool provides you with:

- **Resilience**: You can set how many OSDs are allowed to fail without losing data.
  For replicated pools, it is the desired number of copies/replicas of an object.
  A typical configuration stores an object and one additional copy
  (i.e., ``size = 2``), but you can configure the number of copies/replicas.
  For `erasure coded pools <../erasure-code>`_, it is the number of coding chunks
  (i.e. ``m=2`` in the **erasure code profile**).

- **Placement Groups**: You can set the number of placement groups for the pool.
  A typical configuration uses approximately 100 placement groups per OSD to
  provide optimal balancing without using up too many computing resources. When
  setting up multiple pools, be careful to ensure you set a reasonable number of
  placement groups for both the pool and the cluster as a whole.

- **CRUSH Rules**: When you store data in a pool, placement of the object
  and its replicas (or chunks for erasure coded pools) in your cluster is governed
  by CRUSH rules. You can create a custom CRUSH rule for your pool if the default
  rule is not appropriate for your use case.

- **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``,
  you effectively take a snapshot of a particular pool.

To organize data into pools, you can list, create, and remove pools.
You can also view the utilization statistics for each pool.

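For a quick look at how these attributes are set on the pools that already
exist in your cluster, you can dump the per-pool details (the exact fields in
the output vary by release)::

    ceph osd pool ls detail
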
List Pools
==========

To list your cluster's pools, execute::

    ceph osd lspools

.. _createpool:

Create a Pool
=============

Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_.
Ideally, you should override the default value for the number of placement
groups in your Ceph configuration file, as the default is NOT ideal.
For details on placement group numbers refer to `setting the number of placement groups`_.

.. note:: Starting with Luminous, all pools need to be associated with the
   application that will use the pool. See `Associate Pool to Application`_
   below for more information.

For example::

    osd pool default pg num = 100
    osd pool default pgp num = 100

To create a pool, execute::

    ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] [replicated] \
         [crush-rule-name] [expected-num-objects]
    ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] erasure \
         [erasure-code-profile] [crush-rule-name] [expected_num_objects] [--autoscale-mode=<on,off,warn>]

Where:

``{pool-name}``

:Description: The name of the pool. It must be unique.
:Type: String
:Required: Yes.

``{pg-num}``

:Description: The total number of placement groups for the pool. See `Placement
              Groups`_ for details on calculating a suitable number. The
              default value ``8`` is NOT suitable for most systems.

:Type: Integer
:Required: Yes.
:Default: 8

``{pgp-num}``

:Description: The total number of placement groups for placement purposes. This
              **should be equal to the total number of placement groups**, except
              for placement group splitting scenarios.

:Type: Integer
:Required: Yes. If not specified, the default or the Ceph configuration value is used.
:Default: 8

``{replicated|erasure}``

:Description: The pool type, which may be either **replicated** to
              recover from lost OSDs by keeping multiple copies of the
              objects, or **erasure** to get a kind of
              `generalized RAID5 <../erasure-code>`_ capability.
              The **replicated** pools require more
              raw storage but implement all Ceph operations. The
              **erasure** pools require less raw storage but only
              implement a subset of the available operations.

:Type: String
:Required: No.
:Default: replicated

``[crush-rule-name]``

:Description: The name of a CRUSH rule to use for this pool. The specified
              rule must exist.

:Type: String
:Required: No.
:Default: For **replicated** pools it is the rule specified by the ``osd
          pool default crush rule`` config variable. This rule must exist.
          For **erasure** pools it is ``erasure-code`` if the ``default``
          `erasure code profile`_ is used or ``{pool-name}`` otherwise. This
          rule will be created implicitly if it doesn't exist already.


``[erasure-code-profile=profile]``

.. _erasure code profile: ../erasure-code-profile

:Description: For **erasure** pools only. Use the `erasure code profile`_. It
              must be an existing profile as defined by
              **osd erasure-code-profile set**.

:Type: String
:Required: No.

``--autoscale-mode=<on,off,warn>``

:Description: The placement group autoscaling mode for the pool.

:Type: String
:Required: No.
:Default: The default behavior is controlled by the ``osd pool default pg autoscale mode`` option.

If you set the autoscale mode to ``on`` or ``warn``, you can let the system autotune
or recommend changes to the number of placement groups in your pool based on actual
usage. If you leave it ``off``, then you should refer to `Placement Groups`_ for more
information.

.. _Placement Groups: ../placement-groups

``[expected-num-objects]``

:Description: The expected number of objects for this pool. If you set this value
              (together with a negative **filestore merge threshold**), PG folders
              are split at pool creation time, which avoids the latency impact of
              splitting them at runtime.

:Type: Integer
:Required: No.
:Default: 0, no splitting at the pool creation time.

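Putting the options above together, a replicated pool and an erasure coded
pool might be created as follows; the pool names, PG counts, and profile name
here are only illustrative, so adjust them for your cluster::

    ceph osd pool create mypool 128 128 replicated
    ceph osd erasure-code-profile set myprofile k=4 m=2
    ceph osd pool create ecpool 128 128 erasure myprofile

The autoscaler can also be toggled on an existing pool with
``ceph osd pool set {pool-name} pg_autoscale_mode on``.
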
.. _associate-pool-to-application:

Associate Pool to Application
=============================

Pools need to be associated with an application before use. Pools that will be
used with CephFS or pools that are automatically created by RGW are
automatically associated. Pools that are intended for use with RBD should be
initialized using the ``rbd`` tool (see `Block Device Commands`_ for more
information).

For other cases, you can manually associate a free-form application name with
a pool::

    ceph osd pool application enable {pool-name} {application-name}

.. note:: CephFS uses the application name ``cephfs``, RBD uses the
   application name ``rbd``, and RGW uses the application name ``rgw``.

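For example, a pool meant to hold RGW bucket data could be tagged like this
(the pool name is only an illustration)::

    ceph osd pool application enable mybucketpool rgw

For an RBD pool, ``rbd pool init {pool-name}`` both initializes the pool and
tags it with the ``rbd`` application.
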
Set Pool Quotas
===============

You can set pool quotas for the maximum number of bytes and/or the maximum
number of objects per pool::

    ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]

For example::

    ceph osd pool set-quota data max_objects 10000

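A byte quota works the same way; for instance, to cap the same pool at roughly
10 GiB (the figure is only an example)::

    ceph osd pool set-quota data max_bytes 10737418240
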
To remove a quota, set its value to ``0``.


Delete a Pool
=============

To delete a pool, execute::

    ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]


To remove a pool, the ``mon_allow_pool_delete`` flag must be set to ``true`` in the
monitors' configuration; otherwise, the monitors will refuse to remove the pool.

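One way to do this at runtime, for example, is through the centralized
configuration database (a sketch; you may prefer to set the option in
``ceph.conf`` instead)::

    ceph config set mon mon_allow_pool_delete true
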
See `Monitor Configuration`_ for more information.

.. _Monitor Configuration: ../../configuration/mon-config-ref

If you created your own rules for a pool, you should consider removing them
when you no longer need the pool. Start by checking which rule the pool uses::

    ceph osd pool get {pool-name} crush_rule

If the rule was "123", for example, you can check whether other pools use it like so::

    ceph osd dump | grep "^pool" | grep "crush_rule 123"

If no other pools use that custom rule, then it's safe to delete that
rule from the cluster.

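Removing the now-unused rule might then look like this (the rule name is
whatever ``ceph osd crush rule ls`` reports for it)::

    ceph osd crush rule rm {rule-name}
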
If you created users with permissions strictly for a pool that no longer
exists, you should consider deleting those users too::

    ceph auth ls | grep -C 5 {pool-name}
    ceph auth del {user}


Rename a Pool
=============

To rename a pool, execute::

    ceph osd pool rename {current-pool-name} {new-pool-name}

If you rename a pool and you have per-pool capabilities for an authenticated
user, you must update the user's capabilities (i.e., caps) with the new pool
name.

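As a sketch, for a hypothetical ``client.app`` user whose OSD caps were scoped
to the old pool name, the caps could be rewritten against the new name like
this (adjust the capability string to whatever the user actually needs)::

    ceph auth caps client.app mon 'allow r' osd 'allow rwx pool={new-pool-name}'
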
Show Pool Statistics
====================

To show a pool's utilization statistics, execute::

    rados df

Additionally, to obtain I/O information for a specific pool, or for all pools, execute::

    ceph osd pool stats [{pool-name}]


Make a Snapshot of a Pool
=========================

To make a snapshot of a pool, execute::

    ceph osd pool mksnap {pool-name} {snap-name}

Remove a Snapshot of a Pool
===========================

To remove a snapshot of a pool, execute::

    ceph osd pool rmsnap {pool-name} {snap-name}

.. _setpoolvalues:


Set Pool Values
===============

To set a value for a pool, execute the following::

    ceph osd pool set {pool-name} {key} {value}

You may set values for the following keys:

.. _compression_algorithm:

``compression_algorithm``

:Description: Sets the inline compression algorithm to use for the underlying BlueStore. This setting overrides the `global setting <http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression algorithm``.

:Type: String
:Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``

``compression_mode``

:Description: Sets the policy for the inline compression algorithm for the underlying BlueStore. This setting overrides the `global setting <http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression mode``.

:Type: String
:Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``

``compression_min_blob_size``

:Description: Chunks smaller than this are never compressed. This setting overrides the `global setting <http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression min blob *``.

:Type: Unsigned Integer

``compression_max_blob_size``

:Description: Chunks larger than this are broken into smaller blobs of at most
              ``compression_max_blob_size`` bytes before being compressed.

:Type: Unsigned Integer

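For example, compression could be enabled on a pool like this; the pool name
and the choice of ``zstd``/``aggressive`` are only illustrative::

    ceph osd pool set mypool compression_algorithm zstd
    ceph osd pool set mypool compression_mode aggressive
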
.. _size:

``size``

:Description: Sets the number of replicas for objects in the pool.
              See `Set the Number of Object Replicas`_ for further details.
              Replicated pools only.

:Type: Integer

.. _min_size:

``min_size``

:Description: Sets the minimum number of replicas required for I/O.
              See `Set the Number of Object Replicas`_ for further details.
              For erasure coded pools this should be set to a value greater
              than ``k``: if I/O were allowed with only ``k`` shards available,
              there would be no redundancy and data would be lost in the event
              of a permanent OSD failure. For more information see
              `Erasure Code <../erasure-code>`_.

:Type: Integer
:Version: ``0.54`` and above

.. _pg_num:

``pg_num``

:Description: The effective number of placement groups to use when calculating
              data placement.
:Type: Integer
:Valid Range: Greater than the current value of ``pg_num``.

.. _pgp_num:

``pgp_num``

:Description: The effective number of placement groups for placement purposes
              to use when calculating data placement.

:Type: Integer
:Valid Range: Equal to or less than ``pg_num``.

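For instance, to grow a pool's placement group count when the autoscaler is
not managing it (the numbers are illustrative; ``pgp_num`` should follow
``pg_num``)::

    ceph osd pool set mypool pg_num 256
    ceph osd pool set mypool pgp_num 256
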
.. _crush_rule:

``crush_rule``

:Description: The rule to use for mapping object placement in the cluster.
:Type: String

.. _allow_ec_overwrites:

``allow_ec_overwrites``

:Description: Whether writes to an erasure coded pool can update part
              of an object, so that CephFS and RBD can use it. See
              `Erasure Coding with Overwrites`_ for more details.
:Type: Boolean
:Version: ``12.2.0`` and above

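For example, enabling overwrites on an erasure coded pool intended for RBD
data might look like this (the pool name is illustrative)::

    ceph osd pool set ec_pool allow_ec_overwrites true
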
.. _hashpspool:

``hashpspool``

:Description: Set/Unset the HASHPSPOOL flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _nodelete:

``nodelete``

:Description: Set/Unset the NODELETE flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: Version ``FIXME``

.. _nopgchange:

``nopgchange``

:Description: Set/Unset the NOPGCHANGE flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: Version ``FIXME``

.. _nosizechange:

``nosizechange``

:Description: Set/Unset the NOSIZECHANGE flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: Version ``FIXME``

.. _write_fadvise_dontneed:

``write_fadvise_dontneed``

:Description: Set/Unset the WRITE_FADVISE_DONTNEED flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _noscrub:

``noscrub``

:Description: Set/Unset the NOSCRUB flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _nodeep-scrub:

``nodeep-scrub``

:Description: Set/Unset the NODEEP_SCRUB flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

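For example, scrubbing can be temporarily disabled on a single pool during a
maintenance window and re-enabled afterwards (the pool name is illustrative)::

    ceph osd pool set mypool noscrub 1
    ceph osd pool set mypool nodeep-scrub 1
    ceph osd pool set mypool noscrub 0
    ceph osd pool set mypool nodeep-scrub 0
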
.. _hit_set_type:

``hit_set_type``

:Description: Enables hit set tracking for cache pools.
              See `Bloom Filter`_ for additional information.

:Type: String
:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
:Default: ``bloom``. Other values are for testing.

.. _hit_set_count:

``hit_set_count``

:Description: The number of hit sets to store for cache pools. The higher
              the number, the more RAM consumed by the ``ceph-osd`` daemon.

:Type: Integer
:Valid Range: ``1``. The agent doesn't handle values greater than 1 yet.

.. _hit_set_period:

``hit_set_period``

:Description: The duration of a hit set period in seconds for cache pools.
              The higher the number, the more RAM consumed by the
              ``ceph-osd`` daemon.

:Type: Integer
:Example: ``3600`` (1 hour)

.. _hit_set_fpp:

``hit_set_fpp``

:Description: The false positive probability for the ``bloom`` hit set type.
              See `Bloom Filter`_ for additional information.

:Type: Double
:Valid Range: 0.0 - 1.0
:Default: ``0.05``

.. _cache_target_dirty_ratio:

``cache_target_dirty_ratio``

:Description: The percentage of the cache pool containing modified (dirty)
              objects before the cache tiering agent will flush them to the
              backing storage pool.

:Type: Double
:Default: ``.4``

.. _cache_target_dirty_high_ratio:

``cache_target_dirty_high_ratio``

:Description: The percentage of the cache pool containing modified (dirty)
              objects before the cache tiering agent will flush them to the
              backing storage pool at a higher speed.

:Type: Double
:Default: ``.6``

.. _cache_target_full_ratio:

``cache_target_full_ratio``

:Description: The percentage of the cache pool containing unmodified (clean)
              objects before the cache tiering agent will evict them from the
              cache pool.

:Type: Double
:Default: ``.8``

.. _target_max_bytes:

``target_max_bytes``

:Description: Ceph will begin flushing or evicting objects when the
              ``max_bytes`` threshold is triggered.

:Type: Integer
:Example: ``1000000000000`` (1 TB)

.. _target_max_objects:

``target_max_objects``

:Description: Ceph will begin flushing or evicting objects when the
              ``max_objects`` threshold is triggered.

:Type: Integer
:Example: ``1000000`` (1M objects)

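For example, a cache tier pool might be told to start flushing and evicting at
about 1 TB or one million objects, whichever comes first (the pool name and
figures are illustrative)::

    ceph osd pool set cachepool target_max_bytes 1000000000000
    ceph osd pool set cachepool target_max_objects 1000000
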
``hit_set_grade_decay_rate``

:Description: The temperature decay rate between two successive hit sets.
:Type: Integer
:Valid Range: 0 - 100
:Default: ``20``


``hit_set_search_last_n``

:Description: Count at most N appearances in hit sets when calculating the temperature.
:Type: Integer
:Valid Range: 0 - hit_set_count
:Default: ``1``


.. _cache_min_flush_age:

``cache_min_flush_age``

:Description: The time (in seconds) before the cache tiering agent will flush
              an object from the cache pool to the storage pool.

:Type: Integer
:Example: ``600`` (10 minutes)

.. _cache_min_evict_age:

``cache_min_evict_age``

:Description: The time (in seconds) before the cache tiering agent will evict
              an object from the cache pool.

:Type: Integer
:Example: ``1800`` (30 minutes)

.. _fast_read:

``fast_read``

:Description: On an erasure coded pool, if this flag is turned on, read requests
              issue sub-reads to all shards and wait until enough shards are
              received to decode and serve the client. With the jerasure and isa
              erasure plugins, once the first K replies return, the client's
              request is served immediately using the data decoded from those
              replies. This trades some extra resource usage for better
              performance. Currently this flag is only supported for erasure
              coded pools.

:Type: Boolean
:Default: ``0``

.. _scrub_min_interval:

``scrub_min_interval``

:Description: The minimum interval in seconds for pool scrubbing when
              the load is low. If it is 0, the value of ``osd_scrub_min_interval``
              from the configuration is used.

:Type: Double
:Default: ``0``

.. _scrub_max_interval:

``scrub_max_interval``

:Description: The maximum interval in seconds for pool scrubbing,
              irrespective of cluster load. If it is 0, the value of
              ``osd_scrub_max_interval`` from the configuration is used.

:Type: Double
:Default: ``0``

.. _deep_scrub_interval:

``deep_scrub_interval``

:Description: The interval in seconds for pool "deep" scrubbing. If it
              is 0, the value of ``osd_deep_scrub_interval`` from the
              configuration is used.

:Type: Double
:Default: ``0``


.. _recovery_priority:

``recovery_priority``

:Description: When a value is set, it will increase or decrease the computed
              reservation priority. This value must be in the range -10 to
              10. Use a negative priority for less important pools so they
              have lower priority than any new pools.

:Type: Integer
:Default: ``0``


.. _recovery_op_priority:

``recovery_op_priority``

:Description: Specify the recovery operation priority for this pool instead of ``osd_recovery_op_priority``.

:Type: Integer
:Default: ``0``


Get Pool Values
===============

To get a value from a pool, execute the following::

    ceph osd pool get {pool-name} {key}

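For example, you can read a single key, or pass ``all`` as the key to dump
every gettable value at once (the pool name is illustrative)::

    ceph osd pool get mypool size
    ceph osd pool get mypool all
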
You may get values for the following keys:

``size``

:Description: see size_

:Type: Integer

``min_size``

:Description: see min_size_

:Type: Integer
:Version: ``0.54`` and above

``pg_num``

:Description: see pg_num_

:Type: Integer


``pgp_num``

:Description: see pgp_num_

:Type: Integer
:Valid Range: Equal to or less than ``pg_num``.


``crush_rule``

:Description: see crush_rule_


``hit_set_type``

:Description: see hit_set_type_

:Type: String
:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``

``hit_set_count``

:Description: see hit_set_count_

:Type: Integer


``hit_set_period``

:Description: see hit_set_period_

:Type: Integer


``hit_set_fpp``

:Description: see hit_set_fpp_

:Type: Double


``cache_target_dirty_ratio``

:Description: see cache_target_dirty_ratio_

:Type: Double


``cache_target_dirty_high_ratio``

:Description: see cache_target_dirty_high_ratio_

:Type: Double


``cache_target_full_ratio``

:Description: see cache_target_full_ratio_

:Type: Double


``target_max_bytes``

:Description: see target_max_bytes_

:Type: Integer


``target_max_objects``

:Description: see target_max_objects_

:Type: Integer


``cache_min_flush_age``

:Description: see cache_min_flush_age_

:Type: Integer


``cache_min_evict_age``

:Description: see cache_min_evict_age_

:Type: Integer


``fast_read``

:Description: see fast_read_

:Type: Boolean


``scrub_min_interval``

:Description: see scrub_min_interval_

:Type: Double


``scrub_max_interval``

:Description: see scrub_max_interval_

:Type: Double


``deep_scrub_interval``

:Description: see deep_scrub_interval_

:Type: Double


``allow_ec_overwrites``

:Description: see allow_ec_overwrites_

:Type: Boolean


``recovery_priority``

:Description: see recovery_priority_

:Type: Integer


``recovery_op_priority``

:Description: see recovery_op_priority_

:Type: Integer

Set the Number of Object Replicas
=================================

To set the number of object replicas on a replicated pool, execute the following::

    ceph osd pool set {poolname} size {num-replicas}

.. important:: The ``{num-replicas}`` includes the object itself.
   If you want the object and two copies of the object for a total of
   three instances of the object, specify ``3``.

For example::

    ceph osd pool set data size 3

You may execute this command for each pool. **Note:** An object might accept
I/O in degraded mode with fewer than ``pool size`` replicas. To set a minimum
number of required replicas for I/O, you should use the ``min_size`` setting.
For example::

    ceph osd pool set data min_size 2

This ensures that no object in the data pool will receive I/O with fewer than
``min_size`` replicas.


Get the Number of Object Replicas
=================================

To get the number of object replicas, execute the following::

    ceph osd dump | grep 'replicated size'

Ceph will list the pools, with the ``replicated size`` attribute highlighted.
By default, Ceph creates two replicas of an object (a total of three copies, or
a size of 3).

.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
.. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
.. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
.. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool