=======
 Pools
=======

When you first deploy a cluster without creating a pool, Ceph uses the default
pools for storing data. A pool provides you with:

- **Resilience**: You can set how many OSDs are allowed to fail without losing data.
  For replicated pools, this is the desired number of copies/replicas of an object.
  A typical configuration stores an object and one additional copy
  (i.e., ``size = 2``), but you can determine the number of copies/replicas.
  For `erasure coded pools <../erasure-code>`_, it is the number of coding chunks
  (i.e. ``m=2`` in the **erasure code profile**).

- **Placement Groups**: You can set the number of placement groups for the pool.
  A typical configuration uses approximately 100 placement groups per OSD to
  provide optimal balancing without using up too many computing resources. When
  setting up multiple pools, be careful to ensure you set a reasonable number of
  placement groups for both the pool and the cluster as a whole.

- **CRUSH Rules**: When you store data in a pool, placement of the object
  and its replicas (or chunks for erasure coded pools) in your cluster is governed
  by CRUSH rules. You can create a custom CRUSH rule for your pool if the default
  rule is not appropriate for your use case.

- **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``,
  you effectively take a snapshot of a particular pool.

To organize data into pools, you can list, create, and remove pools.
You can also view the utilization statistics for each pool.

List Pools
==========

To list your cluster's pools, execute::

        ceph osd lspools


.. _createpool:

Create a Pool
=============

Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_.
Ideally, you should override the default value for the number of placement
groups in your Ceph configuration file, as the default is NOT ideal.
For details on placement group numbers refer to `setting the number of placement groups`_.

.. note:: Starting with Luminous, all pools need to be associated with the
   application that will be using them. See `Associate Pool to Application`_
   below for more information.

For example::

        osd pool default pg num = 100
        osd pool default pgp num = 100

To create a pool, execute::

        ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
             [crush-rule-name] [expected-num-objects]
        ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
             [erasure-code-profile] [crush-rule-name] [expected_num_objects]

Where:

``{pool-name}``

:Description: The name of the pool. It must be unique.
:Type: String
:Required: Yes.

``{pg-num}``

:Description: The total number of placement groups for the pool. See `Placement
              Groups`_ for details on calculating a suitable number. The
              default value ``8`` is NOT suitable for most systems.

:Type: Integer
:Required: Yes.
:Default: 8

``{pgp-num}``

:Description: The total number of placement groups for placement purposes. This
              **should be equal to the total number of placement groups**, except
              for placement group splitting scenarios.

:Type: Integer
:Required: Yes. Picks up the default or the Ceph configuration value if not specified.
:Default: 8

``{replicated|erasure}``

:Description: The pool type, which may be either **replicated** to
              recover from lost OSDs by keeping multiple copies of the
              objects, or **erasure** to get a kind of
              `generalized RAID5 <../erasure-code>`_ capability.
              **Replicated** pools require more
              raw storage but implement all Ceph operations.
              **Erasure** pools require less raw storage but only
              implement a subset of the available operations.

:Type: String
:Required: No.
:Default: replicated

``[crush-rule-name]``

:Description: The name of a CRUSH rule to use for this pool. The specified
              rule must exist.

:Type: String
:Required: No.
:Default: For **replicated** pools it is the rule specified by the ``osd
          pool default crush rule`` config variable. This rule must exist.
          For **erasure** pools it is ``erasure-code`` if the ``default``
          `erasure code profile`_ is used or ``{pool-name}`` otherwise. This
          rule will be created implicitly if it doesn't exist already.


``[erasure-code-profile=profile]``

.. _erasure code profile: ../erasure-code-profile

:Description: For **erasure** pools only. Use the `erasure code profile`_. It
              must be an existing profile as defined by
              **osd erasure-code-profile set**.

:Type: String
:Required: No.

When you create a pool, set the number of placement groups to a reasonable value
(e.g., ``100``). Consider the total number of placement groups per OSD too.
Placement groups are computationally expensive, so performance will degrade when
you have many pools with many placement groups (e.g., 50 pools with 100
placement groups each). The point of diminishing returns depends upon the power
of the OSD host.

See `Placement Groups`_ for details on calculating an appropriate number of
placement groups for your pool.

.. _Placement Groups: ../placement-groups

``[expected-num-objects]``

:Description: The expected number of objects for this pool. By setting this value
              (together with a negative **filestore merge threshold**), PG folder
              splitting happens at pool creation time, avoiding the latency impact
              of splitting folders at runtime.

:Type: Integer
:Required: No.
:Default: 0, no splitting at the pool creation time.
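
For example, the following commands create one replicated pool and one erasure
coded pool that uses the default erasure code profile. The pool names and PG
counts below are illustrative only; see `Placement Groups`_ for guidance on
choosing ``pg_num``::

        ceph osd pool create mypool 128 128 replicated
        ceph osd pool create ecpool 32 32 erasure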

.. _associate-pool-to-application:

Associate Pool to Application
=============================

Pools need to be associated with an application before use. Pools that will be
used with CephFS or pools that are automatically created by RGW are
automatically associated. Pools that are intended for use with RBD should be
initialized using the ``rbd`` tool (see `Block Device Commands`_ for more
information).

For other cases, you can manually associate a free-form application name with
a pool::

        ceph osd pool application enable {pool-name} {application-name}
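
For example, to tag a hypothetical pool named ``mypool`` for use by RGW::

        ceph osd pool application enable mypool rgw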

.. note:: CephFS uses the application name ``cephfs``, RBD uses the
   application name ``rbd``, and RGW uses the application name ``rgw``.

Set Pool Quotas
===============

You can set pool quotas for the maximum number of bytes and/or the maximum
number of objects per pool::

        ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]

For example::

        ceph osd pool set-quota data max_objects 10000

To remove a quota, set its value to ``0``.
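
For example, to clear the object quota set in the example above::

        ceph osd pool set-quota data max_objects 0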


Delete a Pool
=============

To delete a pool, execute::

        ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]


To remove a pool, the ``mon_allow_pool_delete`` flag must be set to ``true`` in the
monitors' configuration. Otherwise, the monitors will refuse to remove the pool.
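
One way to set this flag at runtime, assuming you are using the centralized
configuration database (available since Mimic), is::

        ceph config set mon mon_allow_pool_delete true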

See `Monitor Configuration`_ for more information.

.. _Monitor Configuration: ../../configuration/mon-config-ref

If you created your own rules for a pool, you should consider removing them
when you no longer need the pool::

        ceph osd pool get {pool-name} crush_rule

If the rule was "123", for example, you can check the other pools like so::

        ceph osd dump | grep "^pool" | grep "crush_rule 123"

If no other pools use that custom rule, then it's safe to delete that
rule from the cluster.

If you created users with permissions strictly for a pool that no longer
exists, you should consider deleting those users too::

        ceph auth ls | grep -C 5 {pool-name}
        ceph auth del {user}


Rename a Pool
=============

To rename a pool, execute::

        ceph osd pool rename {current-pool-name} {new-pool-name}

If you rename a pool and you have per-pool capabilities for an authenticated
user, you must update the user's capabilities (i.e., caps) with the new pool
name.
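
For example, a hypothetical user ``client.app`` whose capabilities reference
the old pool name could be updated like so (adjust the capability string to
match the user's existing caps)::

        ceph auth caps client.app mon 'allow r' osd 'allow rwx pool={new-pool-name}'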

Show Pool Statistics
====================

To show a pool's utilization statistics, execute::

        rados df

Additionally, to obtain I/O information for a specific pool or for all pools, execute::

        ceph osd pool stats [{pool-name}]


Make a Snapshot of a Pool
=========================

To make a snapshot of a pool, execute::

        ceph osd pool mksnap {pool-name} {snap-name}
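
For example, assuming a pool named ``data`` (the snapshot name is illustrative)::

        ceph osd pool mksnap data data-snapshot-1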

Remove a Snapshot of a Pool
===========================

To remove a snapshot of a pool, execute::

        ceph osd pool rmsnap {pool-name} {snap-name}

.. _setpoolvalues:


Set Pool Values
===============

To set a value for a pool, execute the following::

        ceph osd pool set {pool-name} {key} {value}
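
For example, to enable ``zstd`` inline compression on a hypothetical
BlueStore-backed pool named ``mypool``::

        ceph osd pool set mypool compression_algorithm zstd
        ceph osd pool set mypool compression_mode aggressive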

You may set values for the following keys:

.. _compression_algorithm:

``compression_algorithm``

:Description: Sets the inline compression algorithm to use for underlying BlueStore. This setting overrides the `global setting <http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression algorithm``.

:Type: String
:Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``

``compression_mode``

:Description: Sets the policy for the inline compression algorithm for underlying BlueStore. This setting overrides the `global setting <http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression mode``.

:Type: String
:Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``

``compression_min_blob_size``

:Description: Chunks smaller than this are never compressed. This setting overrides the `global setting <http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression min blob *``.

:Type: Unsigned Integer

``compression_max_blob_size``

:Description: Chunks larger than this are broken into smaller blobs of at most
              ``compression_max_blob_size`` before being compressed.

:Type: Unsigned Integer

.. _size:

``size``

:Description: Sets the number of replicas for objects in the pool.
              See `Set the Number of Object Replicas`_ for further details.
              Replicated pools only.

:Type: Integer

.. _min_size:

``min_size``

:Description: Sets the minimum number of replicas required for I/O.
              See `Set the Number of Object Replicas`_ for further details.
              Replicated pools only.

:Type: Integer
:Version: ``0.54`` and above

.. _pg_num:

``pg_num``

:Description: The effective number of placement groups to use when calculating
              data placement.
:Type: Integer
:Valid Range: Greater than the current ``pg_num`` value.

.. _pgp_num:

``pgp_num``

:Description: The effective number of placement groups for placement to use
              when calculating data placement.

:Type: Integer
:Valid Range: Equal to or less than ``pg_num``.
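
For example, to increase the number of placement groups for an existing pool
(pool name and PG count are illustrative; increase ``pgp_num`` after ``pg_num``
so that data actually rebalances onto the new placement groups)::

        ceph osd pool set mypool pg_num 256
        ceph osd pool set mypool pgp_num 256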

.. _crush_rule:

``crush_rule``

:Description: The rule to use for mapping object placement in the cluster.
:Type: Integer

.. _allow_ec_overwrites:

``allow_ec_overwrites``

:Description: Whether writes to an erasure coded pool can update part
              of an object, so that CephFS and RBD can use it. See
              `Erasure Coding with Overwrites`_ for more details.
:Type: Boolean
:Version: ``12.2.0`` and above
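
For example, to enable partial overwrites on a hypothetical erasure coded pool
named ``ec_pool`` (the OSDs backing the pool must use BlueStore)::

        ceph osd pool set ec_pool allow_ec_overwrites true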

.. _hashpspool:

``hashpspool``

:Description: Set/Unset HASHPSPOOL flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _nodelete:

``nodelete``

:Description: Set/Unset NODELETE flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: ``FIXME``

.. _nopgchange:

``nopgchange``

:Description: Set/Unset NOPGCHANGE flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: ``FIXME``

.. _nosizechange:

``nosizechange``

:Description: Set/Unset NOSIZECHANGE flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: ``FIXME``

.. _write_fadvise_dontneed:

``write_fadvise_dontneed``

:Description: Set/Unset WRITE_FADVISE_DONTNEED flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _noscrub:

``noscrub``

:Description: Set/Unset NOSCRUB flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _nodeep-scrub:

``nodeep-scrub``

:Description: Set/Unset NODEEP_SCRUB flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _hit_set_type:

``hit_set_type``

:Description: Enables hit set tracking for cache pools.
              See `Bloom Filter`_ for additional information.

:Type: String
:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
:Default: ``bloom``. Other values are for testing.

.. _hit_set_count:

``hit_set_count``

:Description: The number of hit sets to store for cache pools. The higher
              the number, the more RAM consumed by the ``ceph-osd`` daemon.

:Type: Integer
:Valid Range: ``1``. The agent doesn't handle values greater than 1 yet.

.. _hit_set_period:

``hit_set_period``

:Description: The duration of a hit set period in seconds for cache pools.
              The higher the number, the more RAM consumed by the
              ``ceph-osd`` daemon.

:Type: Integer
:Example: ``3600`` (1 hour)

.. _hit_set_fpp:

``hit_set_fpp``

:Description: The false positive probability for the ``bloom`` hit set type.
              See `Bloom Filter`_ for additional information.

:Type: Double
:Valid Range: 0.0 - 1.0
:Default: ``0.05``
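
For example, to configure hit set tracking on a hypothetical cache tier pool
named ``cachepool``::

        ceph osd pool set cachepool hit_set_type bloom
        ceph osd pool set cachepool hit_set_count 1
        ceph osd pool set cachepool hit_set_period 3600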

.. _cache_target_dirty_ratio:

``cache_target_dirty_ratio``

:Description: The percentage of the cache pool containing modified (dirty)
              objects before the cache tiering agent will flush them to the
              backing storage pool.

:Type: Double
:Default: ``.4``

.. _cache_target_dirty_high_ratio:

``cache_target_dirty_high_ratio``

:Description: The percentage of the cache pool containing modified (dirty)
              objects before the cache tiering agent will flush them to the
              backing storage pool with a higher speed.

:Type: Double
:Default: ``.6``

.. _cache_target_full_ratio:

``cache_target_full_ratio``

:Description: The percentage of the cache pool containing unmodified (clean)
              objects before the cache tiering agent will evict them from the
              cache pool.

:Type: Double
:Default: ``.8``

.. _target_max_bytes:

``target_max_bytes``

:Description: Ceph will begin flushing or evicting objects when the
              ``max_bytes`` threshold is triggered.

:Type: Integer
:Example: ``1000000000000`` (1 TB)

.. _target_max_objects:

``target_max_objects``

:Description: Ceph will begin flushing or evicting objects when the
              ``max_objects`` threshold is triggered.

:Type: Integer
:Example: ``1000000`` (1M objects)


``hit_set_grade_decay_rate``

:Description: Temperature decay rate between two successive hit sets.
:Type: Integer
:Valid Range: 0 - 100
:Default: ``20``


``hit_set_search_last_n``

:Description: Count at most N appearances in hit sets for temperature calculation.
:Type: Integer
:Valid Range: 0 - hit_set_count
:Default: ``1``


.. _cache_min_flush_age:

``cache_min_flush_age``

:Description: The time (in seconds) before the cache tiering agent will flush
              an object from the cache pool to the storage pool.

:Type: Integer
:Example: ``600`` (10 minutes)

.. _cache_min_evict_age:

``cache_min_evict_age``

:Description: The time (in seconds) before the cache tiering agent will evict
              an object from the cache pool.

:Type: Integer
:Example: ``1800`` (30 minutes)

.. _fast_read:

``fast_read``

:Description: On an erasure coded pool, if this flag is turned on, read requests
              issue sub-reads to all shards and wait until enough shards have
              been received to decode the object and serve the client. With the
              jerasure and isa erasure plugins, once the first K replies return,
              the client's request is served immediately using the data decoded
              from these replies. This trades some additional resources for
              better performance. Currently this flag is only supported for
              erasure coded pools.

:Type: Boolean
:Default: ``0``

.. _scrub_min_interval:

``scrub_min_interval``

:Description: The minimum interval in seconds for pool scrubbing when
              load is low. If it is 0, the value ``osd_scrub_min_interval``
              from config is used.

:Type: Double
:Default: ``0``

.. _scrub_max_interval:

``scrub_max_interval``

:Description: The maximum interval in seconds for pool scrubbing
              irrespective of cluster load. If it is 0, the value
              ``osd_scrub_max_interval`` from config is used.

:Type: Double
:Default: ``0``

.. _deep_scrub_interval:

``deep_scrub_interval``

:Description: The interval in seconds for pool "deep" scrubbing. If it
              is 0, the value ``osd_deep_scrub_interval`` from config is used.

:Type: Double
:Default: ``0``


.. _recovery_priority:

``recovery_priority``

:Description: When a value is set it will increase or decrease the computed
              reservation priority. This value must be in the range -10 to
              10. Use a negative priority for less important pools so they
              have lower priority than any new pools.

:Type: Integer
:Default: ``0``


.. _recovery_op_priority:

``recovery_op_priority``

:Description: Specify the recovery operation priority for this pool instead of ``osd_recovery_op_priority``.

:Type: Integer
:Default: ``0``


Get Pool Values
===============

To get a value from a pool, execute the following::

        ceph osd pool get {pool-name} {key}
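
For example, to check the current number of replicas of a hypothetical pool
named ``mypool``::

        ceph osd pool get mypool size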

You may get values for the following keys:

``size``

:Description: see size_

:Type: Integer

``min_size``

:Description: see min_size_

:Type: Integer
:Version: ``0.54`` and above

``pg_num``

:Description: see pg_num_

:Type: Integer


``pgp_num``

:Description: see pgp_num_

:Type: Integer
:Valid Range: Equal to or less than ``pg_num``.


``crush_rule``

:Description: see crush_rule_


``hit_set_type``

:Description: see hit_set_type_

:Type: String
:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``

``hit_set_count``

:Description: see hit_set_count_

:Type: Integer


``hit_set_period``

:Description: see hit_set_period_

:Type: Integer


``hit_set_fpp``

:Description: see hit_set_fpp_

:Type: Double


``cache_target_dirty_ratio``

:Description: see cache_target_dirty_ratio_

:Type: Double


``cache_target_dirty_high_ratio``

:Description: see cache_target_dirty_high_ratio_

:Type: Double


``cache_target_full_ratio``

:Description: see cache_target_full_ratio_

:Type: Double


``target_max_bytes``

:Description: see target_max_bytes_

:Type: Integer


``target_max_objects``

:Description: see target_max_objects_

:Type: Integer


``cache_min_flush_age``

:Description: see cache_min_flush_age_

:Type: Integer


``cache_min_evict_age``

:Description: see cache_min_evict_age_

:Type: Integer


``fast_read``

:Description: see fast_read_

:Type: Boolean


``scrub_min_interval``

:Description: see scrub_min_interval_

:Type: Double


``scrub_max_interval``

:Description: see scrub_max_interval_

:Type: Double


``deep_scrub_interval``

:Description: see deep_scrub_interval_

:Type: Double


``allow_ec_overwrites``

:Description: see allow_ec_overwrites_

:Type: Boolean


``recovery_priority``

:Description: see recovery_priority_

:Type: Integer


``recovery_op_priority``

:Description: see recovery_op_priority_

:Type: Integer


Set the Number of Object Replicas
=================================

To set the number of object replicas on a replicated pool, execute the following::

        ceph osd pool set {poolname} size {num-replicas}

.. important:: The ``{num-replicas}`` includes the object itself.
   If you want the object and two copies of the object for a total of
   three instances of the object, specify ``3``.

For example::

        ceph osd pool set data size 3

You may execute this command for each pool. **Note:** An object might accept
I/Os in degraded mode with fewer than ``pool size`` replicas. To set a minimum
number of required replicas for I/O, you should use the ``min_size`` setting.
For example::

        ceph osd pool set data min_size 2

This ensures that no object in the data pool will receive I/O with fewer than
``min_size`` replicas.


Get the Number of Object Replicas
=================================

To get the number of object replicas, execute the following::

        ceph osd dump | grep 'replicated size'

Ceph will list the pools, with the ``replicated size`` attribute highlighted.
By default, Ceph creates two replicas of an object in addition to the original,
for a total of three copies (a size of ``3``).



.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
.. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
.. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
.. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool