1 =======
2 Pools
3 =======
4
5 When you first deploy a cluster without creating a pool, Ceph uses the default
6 pools for storing data. A pool provides you with:
7
- **Resilience**: You can set how many OSDs are allowed to fail without losing data.
  For replicated pools, this is the desired number of copies/replicas of an object.
  A typical configuration stores an object and one additional copy
  (i.e., ``size = 2``), but you can configure the number of copies/replicas.
  For `erasure coded pools <../erasure-code>`_, it is the number of coding chunks
  (i.e., ``m=2`` in the **erasure code profile**; see the example below this list).
14
15 - **Placement Groups**: You can set the number of placement groups for the pool.
16 A typical configuration uses approximately 100 placement groups per OSD to
17 provide optimal balancing without using up too many computing resources. When
18 setting up multiple pools, be careful to ensure you set a reasonable number of
19 placement groups for both the pool and the cluster as a whole.
20
21 - **CRUSH Rules**: When you store data in a pool, placement of the object
22 and its replicas (or chunks for erasure coded pools) in your cluster is governed
23 by CRUSH rules. You can create a custom CRUSH rule for your pool if the default
24 rule is not appropriate for your use case.
25
26 - **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``,
27 you effectively take a snapshot of a particular pool.
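
For example, the erasure-code resilience described in the first bullet (``m=2``
coding chunks) could be expressed with the following commands; the profile and
pool names here are placeholders, not defaults::

    ceph osd erasure-code-profile set myprofile k=4 m=2
    ceph osd pool create ecpool 128 128 erasure myprofile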
28
29 To organize data into pools, you can list, create, and remove pools.
30 You can also view the utilization statistics for each pool.
31
32 List Pools
33 ==========
34
35 To list your cluster's pools, execute::
36
37 ceph osd lspools
38
39 On a freshly installed cluster, only the ``rbd`` pool exists.
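
If you want more detail for each pool (replica count, CRUSH rule, flags, and so
on), the following may also be useful::

    ceph osd pool ls detail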
40
41
42 .. _createpool:
43
44 Create a Pool
45 =============
46
47 Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_.
48 Ideally, you should override the default value for the number of placement
49 groups in your Ceph configuration file, as the default is NOT ideal.
For details on placement group numbers, refer to `setting the number of placement groups`_.
51
.. note:: Starting with Luminous, all pools need to be associated with the
   application that will be using the pool. See `Associate Pool to Application`_
   below for more information.
55
56 For example::
57
58 osd pool default pg num = 100
59 osd pool default pgp num = 100
60
61 To create a pool, execute::
62
63 ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
64 [crush-rule-name] [expected-num-objects]
65 ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
66 [erasure-code-profile] [crush-rule-name] [expected_num_objects]
67
68 Where:
69
70 ``{pool-name}``
71
72 :Description: The name of the pool. It must be unique.
73 :Type: String
74 :Required: Yes.
75
76 ``{pg-num}``
77
78 :Description: The total number of placement groups for the pool. See `Placement
79 Groups`_ for details on calculating a suitable number. The
80 default value ``8`` is NOT suitable for most systems.
81
82 :Type: Integer
83 :Required: Yes.
84 :Default: 8
85
86 ``{pgp-num}``
87
88 :Description: The total number of placement groups for placement purposes. This
89 **should be equal to the total number of placement groups**, except
90 for placement group splitting scenarios.
91
92 :Type: Integer
93 :Required: Yes. Picks up default or Ceph configuration value if not specified.
94 :Default: 8
95
96 ``{replicated|erasure}``
97
:Description: The pool type, which may be either **replicated** (to recover
              from lost OSDs by keeping multiple copies of the objects) or
              **erasure** (to get a kind of
              `generalized RAID5 <../erasure-code>`_ capability).
              The **replicated** pools require more
              raw storage but implement all Ceph operations. The
              **erasure** pools require less raw storage but only
              implement a subset of the available operations.
106
107 :Type: String
108 :Required: No.
109 :Default: replicated
110
111 ``[crush-rule-name]``
112
113 :Description: The name of a CRUSH rule to use for this pool. The specified
114 rule must exist.
115
116 :Type: String
117 :Required: No.
118 :Default: For **replicated** pools it is the rule specified by the ``osd
119 pool default crush rule`` config variable. This rule must exist.
120 For **erasure** pools it is ``erasure-code`` if the ``default``
121 `erasure code profile`_ is used or ``{pool-name}`` otherwise. This
122 rule will be created implicitly if it doesn't exist already.
123
124
125 ``[erasure-code-profile=profile]``
126
127 .. _erasure code profile: ../erasure-code-profile
128
129 :Description: For **erasure** pools only. Use the `erasure code profile`_. It
130 must be an existing profile as defined by
131 **osd erasure-code-profile set**.
132
133 :Type: String
134 :Required: No.
135
136 When you create a pool, set the number of placement groups to a reasonable value
137 (e.g., ``100``). Consider the total number of placement groups per OSD too.
138 Placement groups are computationally expensive, so performance will degrade when
139 you have many pools with many placement groups (e.g., 50 pools with 100
140 placement groups each). The point of diminishing returns depends upon the power
141 of the OSD host.
142
143 See `Placement Groups`_ for details on calculating an appropriate number of
144 placement groups for your pool.
145
146 .. _Placement Groups: ../placement-groups
147
148 ``[expected-num-objects]``
149
:Description: The expected number of objects for this pool. By setting this value
              (together with a negative **filestore merge threshold**), the PG folder
              splitting happens at pool creation time, avoiding the latency impact
              of doing a runtime folder splitting.
154
155 :Type: Integer
156 :Required: No.
157 :Default: 0, no splitting at the pool creation time.
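
For example, a replicated pool might be created like this (the pool name and
placement group counts below are placeholders; pick values that suit your
cluster)::

    ceph osd pool create mypool 128 128 replicated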
158
159 Associate Pool to Application
160 =============================
161
162 Pools need to be associated with an application before use. Pools that will be
163 used with CephFS or pools that are automatically created by RGW are
164 automatically associated. Pools that are intended for use with RBD should be
165 initialized using the ``rbd`` tool (see `Block Device Commands`_ for more
166 information).
167
For other cases, you can manually associate a free-form application name with
a pool::
170
171 ceph osd pool application enable {pool-name} {application-name}
172
173 .. note:: CephFS uses the application name ``cephfs``, RBD uses the
174 application name ``rbd``, and RGW uses the application name ``rgw``.
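
For example, a hypothetical pool named ``mypool`` could be tagged for a custom
application like this::

    ceph osd pool application enable mypool myapp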
175
176 Set Pool Quotas
177 ===============
178
179 You can set pool quotas for the maximum number of bytes and/or the maximum
180 number of objects per pool. ::
181
182 ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]
183
184 For example::
185
186 ceph osd pool set-quota data max_objects 10000
187
188 To remove a quota, set its value to ``0``.
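
For instance, you could also cap the pool at roughly 10 GB, and later clear the
object quota set above (the byte value is illustrative)::

    ceph osd pool set-quota data max_bytes 10737418240
    ceph osd pool set-quota data max_objects 0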
189
190
191 Delete a Pool
192 =============
193
194 To delete a pool, execute::
195
196 ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
197
198
To remove a pool, the ``mon_allow_pool_delete`` flag must be set to ``true`` in the
Monitor's configuration. Otherwise the Monitors will refuse to remove the pool.
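
One way to do this at runtime, assuming you prefer not to edit the configuration
file and restart the Monitors, is to inject the option into the running Monitors
(the change reverts when they restart)::

    ceph tell mon.* injectargs --mon-allow-pool-delete=true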
201
202 See `Monitor Configuration`_ for more information.
203
204 .. _Monitor Configuration: ../../configuration/mon-config-ref
205
If you created your own rules for a pool, you should consider removing them when
you no longer need the pool::
208
209 ceph osd pool get {pool-name} crush_rule
210
211 If the rule was "123", for example, you can check the other pools like so::
212
213 ceph osd dump | grep "^pool" | grep "crush_rule 123"
214
215 If no other pools use that custom rule, then it's safe to delete that
216 rule from the cluster.
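
In that case you can list the rules to find its name and remove it (the rule name
below is a placeholder)::

    ceph osd crush rule ls
    ceph osd crush rule rm {rule-name}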
217
218 If you created users with permissions strictly for a pool that no longer
219 exists, you should consider deleting those users too::
220
221 ceph auth ls | grep -C 5 {pool-name}
222 ceph auth del {user}
223
224
225 Rename a Pool
226 =============
227
228 To rename a pool, execute::
229
230 ceph osd pool rename {current-pool-name} {new-pool-name}
231
232 If you rename a pool and you have per-pool capabilities for an authenticated
233 user, you must update the user's capabilities (i.e., caps) with the new pool
234 name.
235
236 .. note:: Version ``0.48`` Argonaut and above.
237
238 Show Pool Statistics
239 ====================
240
241 To show a pool's utilization statistics, execute::
242
243 rados df
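
For a cluster-wide summary that also breaks usage down per pool, ``ceph df`` (or
``ceph df detail``) may be helpful as well::

    ceph df detail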
244
245
246 Make a Snapshot of a Pool
247 =========================
248
249 To make a snapshot of a pool, execute::
250
251 ceph osd pool mksnap {pool-name} {snap-name}
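
For example, with placeholder names::

    ceph osd pool mksnap data data-snapshot-1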
252
253 .. note:: Version ``0.48`` Argonaut and above.
254
255
256 Remove a Snapshot of a Pool
257 ===========================
258
259 To remove a snapshot of a pool, execute::
260
261 ceph osd pool rmsnap {pool-name} {snap-name}
262
263 .. note:: Version ``0.48`` Argonaut and above.
264
265 .. _setpoolvalues:
266
267
268 Set Pool Values
269 ===============
270
To set a value for a pool, execute the following::
272
273 ceph osd pool set {pool-name} {key} {value}
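
For example, to increase the placement group count of a hypothetical pool named
``data`` (remember to raise ``pgp_num`` to match afterwards)::

    ceph osd pool set data pg_num 128
    ceph osd pool set data pgp_num 128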
274
275 You may set values for the following keys:
276
277 .. _compression_algorithm:
278
``compression_algorithm``

:Description: Sets the inline compression algorithm to use for the underlying BlueStore backend.
              This setting overrides the `global setting <rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression algorithm``.
282
283 :Type: String
284 :Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``
285
286 ``compression_mode``
287
288 :Description: Sets the policy for the inline compression algorithm for underlying BlueStore.
289 This setting overrides the `global setting <rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression mode``.
290
291 :Type: String
292 :Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``
293
294 ``compression_min_blob_size``
295
296 :Description: Chunks smaller than this are never compressed.
297 This setting overrides the `global setting <rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression min blob *``.
298
299 :Type: Unsigned Integer
300
301 ``compression_max_blob_size``
302
:Description: Chunks larger than this are broken into smaller blobs of at most
              ``compression_max_blob_size`` bytes before being compressed.
305
306 :Type: Unsigned Integer
307
308 .. _size:
309
310 ``size``
311
312 :Description: Sets the number of replicas for objects in the pool.
313 See `Set the Number of Object Replicas`_ for further details.
314 Replicated pools only.
315
316 :Type: Integer
317
318 .. _min_size:
319
320 ``min_size``
321
322 :Description: Sets the minimum number of replicas required for I/O.
323 See `Set the Number of Object Replicas`_ for further details.
324 Replicated pools only.
325
326 :Type: Integer
327 :Version: ``0.54`` and above
328
329 .. _pg_num:
330
331 ``pg_num``
332
333 :Description: The effective number of placement groups to use when calculating
334 data placement.
335 :Type: Integer
:Valid Range: Greater than the current ``pg_num`` value.
337
338 .. _pgp_num:
339
340 ``pgp_num``
341
342 :Description: The effective number of placement groups for placement to use
343 when calculating data placement.
344
345 :Type: Integer
346 :Valid Range: Equal to or less than ``pg_num``.
347
348 .. _crush_rule:
349
350 ``crush_rule``
351
352 :Description: The rule to use for mapping object placement in the cluster.
353 :Type: Integer
354
355 .. _allow_ec_overwrites:
356
357 ``allow_ec_overwrites``
358
359 :Description: Whether writes to an erasure coded pool can update part
360 of an object, so cephfs and rbd can use it. See
361 `Erasure Coding with Overwrites`_ for more details.
362 :Type: Boolean
363 :Version: ``12.2.0`` and above
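
For example, partial overwrites could be enabled on a hypothetical erasure coded
pool like this (per `Erasure Coding with Overwrites`_, this also requires the
pool's OSDs to use BlueStore)::

    ceph osd pool set ec_pool allow_ec_overwrites true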
364
365 .. _hashpspool:
366
367 ``hashpspool``
368
369 :Description: Set/Unset HASHPSPOOL flag on a given pool.
370 :Type: Integer
371 :Valid Range: 1 sets flag, 0 unsets flag
:Version: ``0.48`` Argonaut and above.
373
374 .. _nodelete:
375
376 ``nodelete``
377
378 :Description: Set/Unset NODELETE flag on a given pool.
379 :Type: Integer
380 :Valid Range: 1 sets flag, 0 unsets flag
381 :Version: Version ``FIXME``
382
383 .. _nopgchange:
384
385 ``nopgchange``
386
387 :Description: Set/Unset NOPGCHANGE flag on a given pool.
388 :Type: Integer
389 :Valid Range: 1 sets flag, 0 unsets flag
390 :Version: Version ``FIXME``
391
392 .. _nosizechange:
393
394 ``nosizechange``
395
396 :Description: Set/Unset NOSIZECHANGE flag on a given pool.
397 :Type: Integer
398 :Valid Range: 1 sets flag, 0 unsets flag
399 :Version: Version ``FIXME``
400
401 .. _write_fadvise_dontneed:
402
403 ``write_fadvise_dontneed``
404
405 :Description: Set/Unset WRITE_FADVISE_DONTNEED flag on a given pool.
406 :Type: Integer
407 :Valid Range: 1 sets flag, 0 unsets flag
408
409 .. _noscrub:
410
411 ``noscrub``
412
413 :Description: Set/Unset NOSCRUB flag on a given pool.
414 :Type: Integer
415 :Valid Range: 1 sets flag, 0 unsets flag
416
417 .. _nodeep-scrub:
418
419 ``nodeep-scrub``
420
421 :Description: Set/Unset NODEEP_SCRUB flag on a given pool.
422 :Type: Integer
423 :Valid Range: 1 sets flag, 0 unsets flag
424
425 .. _hit_set_type:
426
427 ``hit_set_type``
428
429 :Description: Enables hit set tracking for cache pools.
430 See `Bloom Filter`_ for additional information.
431
432 :Type: String
433 :Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
434 :Default: ``bloom``. Other values are for testing.
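
For example, bloom-filter hit set tracking could be enabled on a hypothetical
cache pool like this::

    ceph osd pool set cachepool hit_set_type bloom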
435
436 .. _hit_set_count:
437
438 ``hit_set_count``
439
440 :Description: The number of hit sets to store for cache pools. The higher
441 the number, the more RAM consumed by the ``ceph-osd`` daemon.
442
443 :Type: Integer
444 :Valid Range: ``1``. Agent doesn't handle > 1 yet.
445
446 .. _hit_set_period:
447
448 ``hit_set_period``
449
450 :Description: The duration of a hit set period in seconds for cache pools.
451 The higher the number, the more RAM consumed by the
452 ``ceph-osd`` daemon.
453
454 :Type: Integer
455 :Example: ``3600`` 1hr
456
457 .. _hit_set_fpp:
458
459 ``hit_set_fpp``
460
461 :Description: The false positive probability for the ``bloom`` hit set type.
462 See `Bloom Filter`_ for additional information.
463
464 :Type: Double
465 :Valid Range: 0.0 - 1.0
466 :Default: ``0.05``
467
468 .. _cache_target_dirty_ratio:
469
470 ``cache_target_dirty_ratio``
471
472 :Description: The percentage of the cache pool containing modified (dirty)
473 objects before the cache tiering agent will flush them to the
474 backing storage pool.
475
476 :Type: Double
477 :Default: ``.4``
478
479 .. _cache_target_dirty_high_ratio:
480
481 ``cache_target_dirty_high_ratio``
482
:Description: The percentage of the cache pool containing modified (dirty)
              objects before the cache tiering agent will flush them to the
              backing storage pool at a higher speed.
486
487 :Type: Double
488 :Default: ``.6``
489
490 .. _cache_target_full_ratio:
491
492 ``cache_target_full_ratio``
493
494 :Description: The percentage of the cache pool containing unmodified (clean)
495 objects before the cache tiering agent will evict them from the
496 cache pool.
497
498 :Type: Double
499 :Default: ``.8``
500
501 .. _target_max_bytes:
502
503 ``target_max_bytes``
504
505 :Description: Ceph will begin flushing or evicting objects when the
506 ``max_bytes`` threshold is triggered.
507
508 :Type: Integer
509 :Example: ``1000000000000`` #1-TB
510
511 .. _target_max_objects:
512
513 ``target_max_objects``
514
515 :Description: Ceph will begin flushing or evicting objects when the
516 ``max_objects`` threshold is triggered.
517
518 :Type: Integer
519 :Example: ``1000000`` #1M objects
520
521
522 ``hit_set_grade_decay_rate``
523
524 :Description: Temperature decay rate between two successive hit_sets
525 :Type: Integer
526 :Valid Range: 0 - 100
527 :Default: ``20``
528
529
530 ``hit_set_search_last_n``
531
:Description: Count at most N appearances in hit_sets for the temperature calculation
533 :Type: Integer
534 :Valid Range: 0 - hit_set_count
535 :Default: ``1``
536
537
538 .. _cache_min_flush_age:
539
540 ``cache_min_flush_age``
541
542 :Description: The time (in seconds) before the cache tiering agent will flush
543 an object from the cache pool to the storage pool.
544
545 :Type: Integer
546 :Example: ``600`` 10min
547
548 .. _cache_min_evict_age:
549
550 ``cache_min_evict_age``
551
552 :Description: The time (in seconds) before the cache tiering agent will evict
553 an object from the cache pool.
554
555 :Type: Integer
556 :Example: ``1800`` 30min
557
558 .. _fast_read:
559
560 ``fast_read``
561
:Description: On an erasure coded pool, if this flag is turned on, the read request
              issues sub-reads to all shards and waits until it receives enough
              shards to decode and serve the client. In the case of the jerasure and
              isa erasure plugins, once the first K replies return, the client's
              request is served immediately using the data decoded from these
              replies. This trades some extra resources for better performance.
              Currently this flag is only supported for erasure coded pools.

:Type: Boolean
:Default: ``0``
572
573 .. _scrub_min_interval:
574
575 ``scrub_min_interval``
576
:Description: The minimum interval in seconds for pool scrubbing when
              load is low. If it is 0, the value ``osd_scrub_min_interval``
              from config is used.
580
581 :Type: Double
582 :Default: ``0``
583
584 .. _scrub_max_interval:
585
586 ``scrub_max_interval``
587
:Description: The maximum interval in seconds for pool scrubbing
              irrespective of cluster load. If it is 0, the value
              ``osd_scrub_max_interval`` from config is used.
591
592 :Type: Double
593 :Default: ``0``
594
595 .. _deep_scrub_interval:
596
597 ``deep_scrub_interval``
598
:Description: The interval in seconds for pool "deep" scrubbing. If it
              is 0, the value ``osd_deep_scrub_interval`` from config is used.
601
602 :Type: Double
603 :Default: ``0``
604
605
606 Get Pool Values
607 ===============
608
609 To get a value from a pool, execute the following::
610
611 ceph osd pool get {pool-name} {key}
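
For example, to check the replica count of a hypothetical pool named ``data``::

    ceph osd pool get data size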
612
613 You may get values for the following keys:
614
615 ``size``
616
617 :Description: see size_
618
619 :Type: Integer
620
621 ``min_size``
622
623 :Description: see min_size_
624
625 :Type: Integer
626 :Version: ``0.54`` and above
627
628 ``pg_num``
629
630 :Description: see pg_num_
631
632 :Type: Integer
633
634
635 ``pgp_num``
636
637 :Description: see pgp_num_
638
639 :Type: Integer
640 :Valid Range: Equal to or less than ``pg_num``.
641
642
643 ``crush_rule``
644
645 :Description: see crush_rule_
646
647
648 ``hit_set_type``
649
650 :Description: see hit_set_type_
651
652 :Type: String
653 :Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
654
655 ``hit_set_count``
656
657 :Description: see hit_set_count_
658
659 :Type: Integer
660
661
662 ``hit_set_period``
663
664 :Description: see hit_set_period_
665
666 :Type: Integer
667
668
669 ``hit_set_fpp``
670
671 :Description: see hit_set_fpp_
672
673 :Type: Double
674
675
676 ``cache_target_dirty_ratio``
677
678 :Description: see cache_target_dirty_ratio_
679
680 :Type: Double
681
682
683 ``cache_target_dirty_high_ratio``
684
685 :Description: see cache_target_dirty_high_ratio_
686
687 :Type: Double
688
689
690 ``cache_target_full_ratio``
691
692 :Description: see cache_target_full_ratio_
693
694 :Type: Double
695
696
697 ``target_max_bytes``
698
699 :Description: see target_max_bytes_
700
701 :Type: Integer
702
703
704 ``target_max_objects``
705
706 :Description: see target_max_objects_
707
708 :Type: Integer
709
710
711 ``cache_min_flush_age``
712
713 :Description: see cache_min_flush_age_
714
715 :Type: Integer
716
717
718 ``cache_min_evict_age``
719
720 :Description: see cache_min_evict_age_
721
722 :Type: Integer
723
724
725 ``fast_read``
726
727 :Description: see fast_read_
728
729 :Type: Boolean
730
731
732 ``scrub_min_interval``
733
734 :Description: see scrub_min_interval_
735
736 :Type: Double
737
738
739 ``scrub_max_interval``
740
741 :Description: see scrub_max_interval_
742
743 :Type: Double
744
745
746 ``deep_scrub_interval``
747
748 :Description: see deep_scrub_interval_
749
750 :Type: Double
751
752
753 ``allow_ec_overwrites``
754
755 :Description: see allow_ec_overwrites_
756
757 :Type: Boolean
758
759
760 Set the Number of Object Replicas
761 =================================
762
763 To set the number of object replicas on a replicated pool, execute the following::
764
765 ceph osd pool set {poolname} size {num-replicas}
766
767 .. important:: The ``{num-replicas}`` includes the object itself.
768 If you want the object and two copies of the object for a total of
769 three instances of the object, specify ``3``.
770
771 For example::
772
773 ceph osd pool set data size 3
774
775 You may execute this command for each pool. **Note:** An object might accept
776 I/Os in degraded mode with fewer than ``pool size`` replicas. To set a minimum
777 number of required replicas for I/O, you should use the ``min_size`` setting.
778 For example::
779
780 ceph osd pool set data min_size 2
781
782 This ensures that no object in the data pool will receive I/O with fewer than
783 ``min_size`` replicas.
784
785
786 Get the Number of Object Replicas
787 =================================
788
789 To get the number of object replicas, execute the following::
790
791 ceph osd dump | grep 'replicated size'
792
Ceph will list the pools, with the ``replicated size`` attribute highlighted.
By default, Ceph creates two replicas of an object (a total of three copies, or
a size of ``3``).
796
797
798
799 .. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
800 .. _Bloom Filter: http://en.wikipedia.org/wiki/Bloom_filter
801 .. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
802 .. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
803 .. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool
804