=======
 Pools
=======

When you first deploy a cluster without creating a pool, Ceph uses the default
pools for storing data. A pool provides you with:

- **Resilience**: You can set how many OSDs are allowed to fail without losing data.
  For replicated pools, it is the desired number of copies/replicas of an object.
  A typical configuration stores an object and one additional copy
  (i.e., ``size = 2``), but you can determine the number of copies/replicas.
  For `erasure coded pools <../erasure-code>`_, it is the number of coding chunks
  (i.e. ``m=2`` in the **erasure code profile**).

- **Placement Groups**: You can set the number of placement groups for the pool.
  A typical configuration uses approximately 100 placement groups per OSD to
  provide optimal balancing without using up too many computing resources. When
  setting up multiple pools, be careful to ensure you set a reasonable number of
  placement groups for both the pool and the cluster as a whole.

- **CRUSH Rules**: When you store data in a pool, a CRUSH ruleset mapped to the
  pool enables CRUSH to identify a rule for the placement of the object
  and its replicas (or chunks for erasure coded pools) in your cluster.
  You can create a custom CRUSH rule for your pool.

- **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``,
  you effectively take a snapshot of a particular pool.

To organize data into pools, you can list, create, and remove pools.
You can also view the utilization statistics for each pool.

List Pools
==========

To list your cluster's pools, execute::

    ceph osd lspools

On a freshly installed cluster, only the ``rbd`` pool exists.


.. _createpool:

Create a Pool
=============

Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_.
Ideally, you should override the default value for the number of placement
groups in your Ceph configuration file, as the default is NOT ideal.
For details on placement group numbers, refer to `setting the number of placement groups`_.

For example::

    osd pool default pg num = 100
    osd pool default pgp num = 100

To create a pool, execute::

    ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
         [crush-rule-name] [expected-num-objects]
    ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
         [erasure-code-profile] [crush-rule-name] [expected_num_objects]

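For example, a replicated pool and an erasure coded pool might be created as
follows (the pool names and PG counts here are purely illustrative; choose
``pg-num`` as described in `Placement Groups`_)::

    ceph osd pool create my-replicated-pool 128
    ceph osd pool create my-ec-pool 128 128 erasure
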
Where:

``{pool-name}``

:Description: The name of the pool. It must be unique.
:Type: String
:Required: Yes.

``{pg-num}``

:Description: The total number of placement groups for the pool. See `Placement
              Groups`_ for details on calculating a suitable number. The
              default value ``8`` is NOT suitable for most systems.

:Type: Integer
:Required: Yes.
:Default: 8

``{pgp-num}``

:Description: The total number of placement groups for placement purposes. This
              **should be equal to the total number of placement groups**, except
              for placement group splitting scenarios.

:Type: Integer
:Required: Yes. Picks up default or Ceph configuration value if not specified.
:Default: 8

``{replicated|erasure}``

:Description: The pool type which may either be **replicated** to
              recover from lost OSDs by keeping multiple copies of the
              objects or **erasure** to get a kind of
              `generalized RAID5 <../erasure-code>`_ capability.
              The **replicated** pools require more
              raw storage but implement all Ceph operations. The
              **erasure** pools require less raw storage but only
              implement a subset of the available operations.

:Type: String
:Required: No.
:Default: replicated

``[crush-rule-name]``

:Description: The name of a CRUSH rule to use for this pool. The specified
              rule must exist.

:Type: String
:Required: No.
:Default: For **replicated** pools it is the ruleset specified by the ``osd
          pool default crush replicated ruleset`` config variable. This
          ruleset must exist.
          For **erasure** pools it is ``erasure-code`` if the ``default``
          `erasure code profile`_ is used or ``{pool-name}`` otherwise. This
          ruleset will be created implicitly if it doesn't exist already.


``[erasure-code-profile=profile]``

.. _erasure code profile: ../erasure-code-profile

:Description: For **erasure** pools only. Use the `erasure code profile`_. It
              must be an existing profile as defined by
              **osd erasure-code-profile set**.

:Type: String
:Required: No.

When you create a pool, set the number of placement groups to a reasonable value
(e.g., ``100``). Consider the total number of placement groups per OSD too.
Placement groups are computationally expensive, so performance will degrade when
you have many pools with many placement groups (e.g., 50 pools with 100
placement groups each). The point of diminishing returns depends upon the power
of the OSD host.

See `Placement Groups`_ for details on calculating an appropriate number of
placement groups for your pool.

.. _Placement Groups: ../placement-groups

``[expected-num-objects]``

:Description: The expected number of objects for this pool. By setting this value
              (together with a negative **filestore merge threshold**), the PG folder
              splitting happens at pool creation time, avoiding the latency
              impact of runtime folder splitting.

:Type: Integer
:Required: No.
:Default: 0, no splitting at the pool creation time.

Set Pool Quotas
===============

You can set pool quotas for the maximum number of bytes and/or the maximum
number of objects per pool. ::

    ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]

For example::

    ceph osd pool set-quota data max_objects 10000

To remove a quota, set its value to ``0``.
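
For example, to clear the object quota set above (assuming the same ``data``
pool), set it back to ``0``::

    ceph osd pool set-quota data max_objects 0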


Delete a Pool
=============

To delete a pool, execute::

    ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]

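Note that the pool name must be given twice, followed by the confirmation flag.
For example, for a hypothetical pool named ``my-pool``::

    ceph osd pool delete my-pool my-pool --yes-i-really-really-mean-it
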

To remove a pool, the ``mon_allow_pool_delete`` flag must be set to ``true`` in the
Monitor's configuration. Otherwise, the Monitors will refuse to remove the pool.
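
As a sketch (assuming the Monitors are up and reachable), the flag can be injected
at runtime, or set persistently as ``mon allow pool delete = true`` in the
``[mon]`` section of ``ceph.conf``::

    ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'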

See `Monitor Configuration`_ for more information.

.. _Monitor Configuration: ../../configuration/mon-config-ref

If you created your own rulesets and rules for a pool that you created, you should
consider removing them when you no longer need the pool::

    ceph osd pool get {pool-name} crush_ruleset

If the ruleset was ``123``, for example, you can check the other pools like so::

    ceph osd dump | grep "^pool" | grep "crush_ruleset 123"

If no other pools use that custom ruleset, then it's safe to delete that
ruleset from the cluster.
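
A CRUSH rule can be removed by name. As a sketch, assuming you have confirmed
that no pool still references the rule, look up its name with
``ceph osd crush rule ls`` and then remove it::

    ceph osd crush rule rm {rule-name}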

If you created users with permissions strictly for a pool that no longer
exists, you should consider deleting those users too::

    ceph auth list | grep -C 5 {pool-name}
    ceph auth del {user}


Rename a Pool
=============

To rename a pool, execute::

    ceph osd pool rename {current-pool-name} {new-pool-name}

If you rename a pool and you have per-pool capabilities for an authenticated
user, you must update the user's capabilities (i.e., caps) with the new pool
name.

.. note:: Version ``0.48`` Argonaut and above.

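For example, the caps of a hypothetical ``client.foo`` user that should keep
read/write access to the renamed pool could be reset with ``ceph auth caps``
(a sketch; adjust the caps to your needs)::

    ceph auth caps client.foo mon 'allow r' osd 'allow rwx pool={new-pool-name}'
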
Show Pool Statistics
====================

To show a pool's utilization statistics, execute::

    rados df


Make a Snapshot of a Pool
=========================

To make a snapshot of a pool, execute::

    ceph osd pool mksnap {pool-name} {snap-name}

.. note:: Version ``0.48`` Argonaut and above.

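For example, to snapshot a hypothetical pool named ``data``::

    ceph osd pool mksnap data data-snapshot-1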

Remove a Snapshot of a Pool
===========================

To remove a snapshot of a pool, execute::

    ceph osd pool rmsnap {pool-name} {snap-name}

.. note:: Version ``0.48`` Argonaut and above.

.. _setpoolvalues:


Set Pool Values
===============

To set a value for a pool, execute the following::

    ceph osd pool set {pool-name} {key} {value}

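For example, to change the number of placement groups of a hypothetical pool
named ``data`` (a sketch; pick a value that suits your cluster)::

    ceph osd pool set data pg_num 128
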
You may set values for the following keys:

.. _size:

``size``

:Description: Sets the number of replicas for objects in the pool.
              See `Set the Number of Object Replicas`_ for further details.
              Replicated pools only.

:Type: Integer

.. _min_size:

``min_size``

:Description: Sets the minimum number of replicas required for I/O.
              See `Set the Number of Object Replicas`_ for further details.
              Replicated pools only.

:Type: Integer
:Version: ``0.54`` and above

.. _pg_num:

``pg_num``

:Description: The effective number of placement groups to use when calculating
              data placement.
:Type: Integer
:Valid Range: Greater than the current ``pg_num`` value.

.. _pgp_num:

``pgp_num``

:Description: The effective number of placement groups for placement to use
              when calculating data placement.

:Type: Integer
:Valid Range: Equal to or less than ``pg_num``.

.. _crush_ruleset:

``crush_ruleset``

:Description: The ruleset to use for mapping object placement in the cluster.
:Type: Integer

.. _allow_ec_overwrites:

``allow_ec_overwrites``

:Description: Whether writes to an erasure coded pool can update part
              of an object, so cephfs and rbd can use it. See
              `Erasure Coding with Overwrites`_ for more details.
:Type: Boolean
:Version: ``12.2.0`` and above

.. _hashpspool:

``hashpspool``

:Description: Set/Unset HASHPSPOOL flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: Version ``0.48`` Argonaut and above.

.. _nodelete:

``nodelete``

:Description: Set/Unset NODELETE flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: Version ``FIXME``

.. _nopgchange:

``nopgchange``

:Description: Set/Unset NOPGCHANGE flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: Version ``FIXME``

.. _nosizechange:

``nosizechange``

:Description: Set/Unset NOSIZECHANGE flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag
:Version: Version ``FIXME``

.. _write_fadvise_dontneed:

``write_fadvise_dontneed``

:Description: Set/Unset WRITE_FADVISE_DONTNEED flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _noscrub:

``noscrub``

:Description: Set/Unset NOSCRUB flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _nodeep-scrub:

``nodeep-scrub``

:Description: Set/Unset NODEEP_SCRUB flag on a given pool.
:Type: Integer
:Valid Range: 1 sets flag, 0 unsets flag

.. _hit_set_type:

``hit_set_type``

:Description: Enables hit set tracking for cache pools.
              See `Bloom Filter`_ for additional information.

:Type: String
:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
:Default: ``bloom``. Other values are for testing.

.. _hit_set_count:

``hit_set_count``

:Description: The number of hit sets to store for cache pools. The higher
              the number, the more RAM consumed by the ``ceph-osd`` daemon.

:Type: Integer
:Valid Range: ``1``. Agent doesn't handle > 1 yet.

.. _hit_set_period:

``hit_set_period``

:Description: The duration of a hit set period in seconds for cache pools.
              The higher the number, the more RAM consumed by the
              ``ceph-osd`` daemon.

:Type: Integer
:Example: ``3600`` (1 hour)

.. _hit_set_fpp:

``hit_set_fpp``

:Description: The false positive probability for the ``bloom`` hit set type.
              See `Bloom Filter`_ for additional information.

:Type: Double
:Valid Range: 0.0 - 1.0
:Default: ``0.05``

.. _cache_target_dirty_ratio:

``cache_target_dirty_ratio``

:Description: The percentage of the cache pool containing modified (dirty)
              objects before the cache tiering agent will flush them to the
              backing storage pool.

:Type: Double
:Default: ``.4``

.. _cache_target_dirty_high_ratio:

``cache_target_dirty_high_ratio``

:Description: The percentage of the cache pool containing modified (dirty)
              objects before the cache tiering agent will flush them to the
              backing storage pool at a higher speed.

:Type: Double
:Default: ``.6``

.. _cache_target_full_ratio:

``cache_target_full_ratio``

:Description: The percentage of the cache pool containing unmodified (clean)
              objects before the cache tiering agent will evict them from the
              cache pool.

:Type: Double
:Default: ``.8``

.. _target_max_bytes:

``target_max_bytes``

:Description: Ceph will begin flushing or evicting objects when the
              ``max_bytes`` threshold is triggered.

:Type: Integer
:Example: ``1000000000000`` (1 TB)

.. _target_max_objects:

``target_max_objects``

:Description: Ceph will begin flushing or evicting objects when the
              ``max_objects`` threshold is triggered.

:Type: Integer
:Example: ``1000000`` (1M objects)


``hit_set_grade_decay_rate``

:Description: Temperature decay rate between two successive hit_sets.
:Type: Integer
:Valid Range: 0 - 100
:Default: ``20``


``hit_set_search_last_n``

:Description: Count at most N appearances in hit_sets for temperature calculation.
:Type: Integer
:Valid Range: 0 - hit_set_count
:Default: ``1``


.. _cache_min_flush_age:

``cache_min_flush_age``

:Description: The time (in seconds) before the cache tiering agent will flush
              an object from the cache pool to the storage pool.

:Type: Integer
:Example: ``600`` (10 minutes)

.. _cache_min_evict_age:

``cache_min_evict_age``

:Description: The time (in seconds) before the cache tiering agent will evict
              an object from the cache pool.

:Type: Integer
:Example: ``1800`` (30 minutes)

.. _fast_read:

``fast_read``

:Description: On erasure coded pools, if this flag is turned on, the read request
              issues sub reads to all shards, and waits until it receives enough
              shards to decode before serving the client. In the case of the
              jerasure and isa erasure plugins, once the first K replies return,
              the client's request is served immediately using the data decoded
              from these replies. This trades some extra resources for better
              performance. Currently this flag is only supported for erasure
              coded pools.

:Type: Boolean
:Default: ``0``

.. _scrub_min_interval:

``scrub_min_interval``

:Description: The minimum interval in seconds for pool scrubbing when
              load is low. If it is 0, the value ``osd_scrub_min_interval``
              from config is used.

:Type: Double
:Default: ``0``

.. _scrub_max_interval:

``scrub_max_interval``

:Description: The maximum interval in seconds for pool scrubbing
              irrespective of cluster load. If it is 0, the value
              ``osd_scrub_max_interval`` from config is used.

:Type: Double
:Default: ``0``

.. _deep_scrub_interval:

``deep_scrub_interval``

:Description: The interval in seconds for pool "deep" scrubbing. If it
              is 0, the value ``osd_deep_scrub_interval`` from config is used.

:Type: Double
:Default: ``0``


Get Pool Values
===============

To get a value from a pool, execute the following::

    ceph osd pool get {pool-name} {key}

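For example, to read back the replica count of a hypothetical pool named ``data``::

    ceph osd pool get data size
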
You may get values for the following keys:

``size``

:Description: see size_

:Type: Integer

``min_size``

:Description: see min_size_

:Type: Integer
:Version: ``0.54`` and above

``pg_num``

:Description: see pg_num_

:Type: Integer

``pgp_num``

:Description: see pgp_num_

:Type: Integer
:Valid Range: Equal to or less than ``pg_num``.

``crush_ruleset``

:Description: see crush_ruleset_

``hit_set_type``

:Description: see hit_set_type_

:Type: String
:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``

``hit_set_count``

:Description: see hit_set_count_

:Type: Integer

``hit_set_period``

:Description: see hit_set_period_

:Type: Integer

``hit_set_fpp``

:Description: see hit_set_fpp_

:Type: Double

``cache_target_dirty_ratio``

:Description: see cache_target_dirty_ratio_

:Type: Double

``cache_target_dirty_high_ratio``

:Description: see cache_target_dirty_high_ratio_

:Type: Double

``cache_target_full_ratio``

:Description: see cache_target_full_ratio_

:Type: Double

``target_max_bytes``

:Description: see target_max_bytes_

:Type: Integer

``target_max_objects``

:Description: see target_max_objects_

:Type: Integer

``cache_min_flush_age``

:Description: see cache_min_flush_age_

:Type: Integer

``cache_min_evict_age``

:Description: see cache_min_evict_age_

:Type: Integer

``fast_read``

:Description: see fast_read_

:Type: Boolean

``scrub_min_interval``

:Description: see scrub_min_interval_

:Type: Double

``scrub_max_interval``

:Description: see scrub_max_interval_

:Type: Double

``deep_scrub_interval``

:Description: see deep_scrub_interval_

:Type: Double


Set the Number of Object Replicas
=================================

To set the number of object replicas on a replicated pool, execute the following::

    ceph osd pool set {poolname} size {num-replicas}

.. important:: The ``{num-replicas}`` includes the object itself.
   If you want the object and two copies of the object for a total of
   three instances of the object, specify ``3``.

For example::

    ceph osd pool set data size 3

You may execute this command for each pool. **Note:** An object might accept
I/Os in degraded mode with fewer than ``pool size`` replicas. To set a minimum
number of required replicas for I/O, you should use the ``min_size`` setting.
For example::

    ceph osd pool set data min_size 2

This ensures that no object in the data pool will receive I/O with fewer than
``min_size`` replicas.


Get the Number of Object Replicas
=================================

To get the number of object replicas, execute the following::

    ceph osd dump | grep 'replicated size'

Ceph will list the pools, with the ``replicated size`` attribute highlighted.
By default, Ceph creates two replicas of an object (a total of three copies, or
a size of ``3``).



.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
.. _Bloom Filter: http://en.wikipedia.org/wiki/Bloom_filter
.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
.. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites