.. _placement groups:

==================
 Placement Groups
==================

.. _pg-autoscaler:

Autoscaling placement groups
============================

Placement groups (PGs) are an internal implementation detail of how Ceph
distributes data. Autoscaling provides a way to manage PGs, and especially to
manage the number of PGs present in different pools. When *pg-autoscaling* is
enabled, the cluster is allowed to make recommendations or automatic
adjustments with respect to the number of PGs for each pool (``pgp_num``) in
accordance with expected cluster utilization and expected pool utilization.

Each pool has a ``pg_autoscale_mode`` property that can be set to ``off``,
``on``, or ``warn``:

* ``off``: Disable autoscaling for this pool. It is up to the administrator to
  choose an appropriate ``pgp_num`` for each pool. For more information, see
  :ref:`choosing-number-of-placement-groups`.
* ``on``: Enable automated adjustments of the PG count for the given pool.
* ``warn``: Raise health checks when the PG count is in need of adjustment.

To set the autoscaling mode for an existing pool, run a command of the
following form:

.. prompt:: bash #

   ceph osd pool set <pool-name> pg_autoscale_mode <mode>

For example, to enable autoscaling on pool ``foo``, run the following command:

.. prompt:: bash #

   ceph osd pool set foo pg_autoscale_mode on

There is also a default ``pg_autoscale_mode`` setting that applies to any
pools created after the initial setup of the cluster. To change this setting,
run a command of the following form:

.. prompt:: bash #

   ceph config set global osd_pool_default_pg_autoscale_mode <mode>

You can disable or enable the autoscaler for all pools with the
``noautoscale`` flag. By default, this flag is set to ``off``, but you can set
it to ``on`` by running the following command:

.. prompt:: bash #

   ceph osd pool set noautoscale

To set the ``noautoscale`` flag to ``off``, run the following command:

.. prompt:: bash #

   ceph osd pool unset noautoscale

To get the value of the flag, run the following command:

.. prompt:: bash #

   ceph osd pool get noautoscale

Viewing PG scaling recommendations
----------------------------------

To view each pool, its relative utilization, and any recommended changes to
the PG count, run the following command:

.. prompt:: bash #

   ceph osd pool autoscale-status

The output will resemble the following::

   POOL  SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
   a     12900M               3.0   82431M        0.4695                                 8     128                 warn       True
   c     0                    3.0   82431M        0.0000  0.2000        0.9884           1.0   1       64          warn       True
   b     0       953.6M       3.0   82431M        0.0347                                 8                         warn       False

- **POOL** is the name of the pool.

- **SIZE** is the amount of data stored in the pool.

- **TARGET SIZE** (if present) is the amount of data that is expected to be
  stored in the pool, as specified by the administrator. The system uses the
  greater of the two values for its calculation.

- **RATE** is the multiplier for the pool that determines how much raw storage
  capacity is consumed. For example, a three-replica pool will have a ratio of
  3.0, and a ``k=4 m=2`` erasure-coded pool will have a ratio of 1.5.

- **RAW CAPACITY** is the total amount of raw storage capacity on the specific
  OSDs that are responsible for storing the data of the pool (and perhaps the
  data of other pools).

- **RATIO** is the ratio of (1) the storage consumed by the pool to (2) the
  total raw storage capacity. In other words, RATIO is defined as
  (SIZE * RATE) / RAW CAPACITY. For example, pool ``a`` in the sample output
  above has RATIO = (12900M * 3.0) / 82431M ≈ 0.4695.

- **TARGET RATIO** (if present) is the ratio of the expected storage of this
  pool (that is, the amount of storage that this pool is expected to consume,
  as specified by the administrator) to the expected storage of all other
  pools that have target ratios set. If both ``target_size_bytes`` and
  ``target_size_ratio`` are specified, then ``target_size_ratio`` takes
  precedence.

- **EFFECTIVE RATIO** is the result of making two adjustments to the target
  ratio:

  #. Subtracting any capacity expected to be used by pools that have target
     size set.

  #. Normalizing the target ratios among pools that have target ratio set so
     that collectively they target cluster capacity. For example, four pools
     with target_ratio 1.0 would have an effective ratio of 0.25.

  The system's calculations use whichever of these two ratios (that is, the
  target ratio and the effective ratio) is greater.

- **BIAS** is used as a multiplier to manually adjust a pool's PG count in
  accordance with prior information about how many PGs a specific pool is
  expected to have.

- **PG_NUM** is either the current number of PGs associated with the pool or,
  if a ``pg_num`` change is in progress, the current number of PGs that the
  pool is working towards.

- **NEW PG_NUM** (if present) is the value to which the system recommends that
  the ``pg_num`` of the pool be changed. It is always a power of 2, and it is
  present only if the recommended value varies from the current value by more
  than the default factor of ``3``. To adjust this factor (in the following
  example, it is changed to ``2``), run the following command:

  .. prompt:: bash #

     ceph osd pool set threshold 2.0

- **AUTOSCALE** is the pool's ``pg_autoscale_mode`` and is set to ``on``,
  ``off``, or ``warn``.

- **BULK** determines whether the pool is ``bulk``. It has a value of ``True``
  or ``False``. A ``bulk`` pool is expected to be large and should initially
  have a large number of PGs so that performance does not suffer. On the other
  hand, a pool that is not ``bulk`` is expected to be small (for example, a
  ``.mgr`` pool or a meta pool).
152 | ||
153 | .. note:: | |
20effc67 | 154 | |
1e59de90 TL |
155 | If the ``ceph osd pool autoscale-status`` command returns no output at all, |
156 | there is probably at least one pool that spans multiple CRUSH roots. This | |
157 | 'spanning pool' issue can happen in scenarios like the following: | |
158 | when a new deployment auto-creates the ``.mgr`` pool on the ``default`` | |
159 | CRUSH root, subsequent pools are created with rules that constrain them to a | |
160 | specific shadow CRUSH tree. For example, if you create an RBD metadata pool | |
161 | that is constrained to ``deviceclass = ssd`` and an RBD data pool that is | |
162 | constrained to ``deviceclass = hdd``, you will encounter this issue. To | |
163 | remedy this issue, constrain the spanning pool to only one device class. In | |
164 | the above scenario, there is likely to be a ``replicated-ssd`` CRUSH rule in | |
165 | effect, and the ``.mgr`` pool can be constrained to ``ssd`` devices by | |
166 | running the following commands: | |
11fdf7f2 | 167 | |
1e59de90 | 168 | .. prompt:: bash # |
11fdf7f2 | 169 | |
1e59de90 TL |
170 | ceph osd pool set .mgr crush_rule replicated-ssd |
171 | ceph osd pool set pool 1 crush_rule to replicated-ssd | |
172 | ||
173 | This intervention will result in a small amount of backfill, but | |
174 | typically this traffic completes quickly. | |

Automated scaling
-----------------

In the simplest approach to automated scaling, the cluster is allowed to
automatically scale ``pgp_num`` in accordance with usage. Ceph considers the
total available storage and the target number of PGs for the whole system,
considers how much data is stored in each pool, and apportions PGs
accordingly. The system is conservative with its approach, making changes to a
pool only when the current number of PGs (``pg_num``) varies by more than a
factor of 3 from the recommended number.

The target number of PGs per OSD is determined by the
``mon_target_pg_per_osd`` parameter (default: 100), which can be adjusted by
running the following command:

.. prompt:: bash #

   ceph config set global mon_target_pg_per_osd 100

The autoscaler analyzes pools and adjusts on a per-subtree basis. Because each
pool might map to a different CRUSH rule, and each rule might distribute data
across different devices, Ceph will consider the utilization of each subtree
of the hierarchy independently. For example, a pool that maps to OSDs of class
``ssd`` and a pool that maps to OSDs of class ``hdd`` will each have optimal
PG counts that are determined by how many of these two different device types
there are.
203 | ||
204 | If a pool uses OSDs under two or more CRUSH roots (for example, shadow trees | |
205 | with both ``ssd`` and ``hdd`` devices), the autoscaler issues a warning to the | |
206 | user in the manager log. The warning states the name of the pool and the set of | |
207 | roots that overlap each other. The autoscaler does not scale any pools with | |
208 | overlapping roots because this condition can cause problems with the scaling | |
209 | process. We recommend constraining each pool so that it belongs to only one | |
210 | root (that is, one OSD class) to silence the warning and ensure a successful | |
2a845540 TL |
211 | scaling process. |
212 | ||
aee94f69 TL |
213 | .. _managing_bulk_flagged_pools: |
214 | ||
215 | Managing pools that are flagged with ``bulk`` | |
216 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
217 | ||
1e59de90 TL |
218 | If a pool is flagged ``bulk``, then the autoscaler starts the pool with a full |
219 | complement of PGs and then scales down the number of PGs only if the usage | |
220 | ratio across the pool is uneven. However, if a pool is not flagged ``bulk``, | |
221 | then the autoscaler starts the pool with minimal PGs and creates additional PGs | |
222 | only if there is more usage in the pool. | |
522d829b | 223 | |
1e59de90 | 224 | To create a pool that will be flagged ``bulk``, run the following command: |
39ae355f TL |
225 | |
226 | .. prompt:: bash # | |
227 | ||
228 | ceph osd pool create <pool-name> --bulk | |
229 | ||
1e59de90 TL |
230 | To set or unset the ``bulk`` flag of an existing pool, run the following |
231 | command: | |
20effc67 | 232 | |
39ae355f | 233 | .. prompt:: bash # |
522d829b | 234 | |
39ae355f | 235 | ceph osd pool set <pool-name> bulk <true/false/1/0> |
522d829b | 236 | |
1e59de90 | 237 | To get the ``bulk`` flag of an existing pool, run the following command: |
522d829b | 238 | |
39ae355f | 239 | .. prompt:: bash # |
522d829b | 240 | |
39ae355f | 241 | ceph osd pool get <pool-name> bulk |

.. _specifying_pool_target_size:

Specifying expected pool size
-----------------------------

When a cluster or pool is first created, it consumes only a small fraction of
the total cluster capacity and appears to the system as if it should need only
a small number of PGs. However, in some cases, cluster administrators know
which pools are likely to consume most of the system capacity in the long run.
When Ceph is provided with this information, a more appropriate number of PGs
can be used from the beginning, obviating subsequent changes in ``pg_num`` and
the associated overhead cost of relocating data.

The *target size* of a pool can be specified in two ways: either in relation
to the absolute size (in bytes) of the pool, or as a weight relative to all
other pools that have ``target_size_ratio`` set.

For example, to tell the system that ``mypool`` is expected to consume 100 TB,
run the following command:

.. prompt:: bash #

   ceph osd pool set mypool target_size_bytes 100T

Alternatively, to tell the system that ``mypool`` is expected to consume a
ratio of 1.0 relative to other pools that have ``target_size_ratio`` set,
adjust the ``target_size_ratio`` setting of ``mypool`` by running the
following command:

.. prompt:: bash #

   ceph osd pool set mypool target_size_ratio 1.0

If ``mypool`` is the only pool in the cluster, then it is expected to use 100%
of the total cluster capacity. However, if the cluster contains a second pool
that has ``target_size_ratio`` set to 1.0, then both pools are expected to use
50% of the total cluster capacity.

The ``ceph osd pool create`` command has two command-line options that can be
used to set the target size of a pool at creation time:
``--target-size-bytes <bytes>`` and ``--target-size-ratio <ratio>``.
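
For example, the expected size can be declared when the pool is created (the
pool name here is hypothetical):

.. prompt:: bash #

   ceph osd pool create mynewpool --target-size-bytes 100T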

Note that if the target-size values that have been specified are impossible
(for example, a target size larger than the total cluster capacity), then a
health check (``POOL_TARGET_SIZE_BYTES_OVERCOMMITTED``) will be raised.

If both ``target_size_ratio`` and ``target_size_bytes`` are specified for a
pool, then the latter will be ignored, the former will be used in system
calculations, and a health check (``POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO``)
will be raised.

Specifying bounds on a pool's PGs
---------------------------------

It is possible to specify both the minimum number and the maximum number of
PGs for a pool.

Setting a Minimum Number of PGs and a Maximum Number of PGs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a minimum is set, then Ceph will not itself reduce (nor recommend that you
reduce) the number of PGs to a value below the configured value. Setting a
minimum serves to establish a lower bound on the amount of parallelism enjoyed
by a client during I/O, even if a pool is mostly empty.

If a maximum is set, then Ceph will not itself increase (or recommend that you
increase) the number of PGs to a value above the configured value.

To set the minimum number of PGs for a pool, run a command of the following
form:

.. prompt:: bash #

   ceph osd pool set <pool-name> pg_num_min <num>

To set the maximum number of PGs for a pool, run a command of the following
form:

.. prompt:: bash #

   ceph osd pool set <pool-name> pg_num_max <num>

In addition, the ``ceph osd pool create`` command has two command-line options
that can be used to specify the minimum or maximum PG count of a pool at
creation time: ``--pg-num-min <num>`` and ``--pg-num-max <num>``.
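
For example, both bounds can be specified when the pool is created (the pool
name and values here are hypothetical):

.. prompt:: bash #

   ceph osd pool create mynewpool --pg-num-min 32 --pg-num-max 256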

.. _preselection:

Preselecting pg_num
===================

When creating a pool with the following command, you have the option to
preselect the value of the ``pg_num`` parameter:

.. prompt:: bash #

   ceph osd pool create {pool-name} [pg_num]

If you opt not to specify ``pg_num`` in this command, the cluster uses the PG
autoscaler to automatically configure the parameter in accordance with the
amount of data that is stored in the pool (see :ref:`pg-autoscaler` above).

However, your decision of whether or not to specify ``pg_num`` at creation
time has no effect on whether the parameter will be automatically tuned by the
cluster afterwards. As seen above, autoscaling of PGs is enabled or disabled
by running a command of the following form:

.. prompt:: bash #

   ceph osd pool set {pool-name} pg_autoscale_mode (on|off|warn)

Without the balancer, the suggested target is approximately 100 PG replicas on
each OSD. With the balancer, an initial target of 50 PG replicas on each OSD
is reasonable.

The autoscaler attempts to satisfy the following conditions:

- the number of PGs per OSD should be proportional to the amount of data in
  the pool
- there should be 50-100 PGs per OSD, taking into account the replication
  overhead or erasure-coding fan-out of each PG's replicas across OSDs

Use of Placement Groups
=======================

A placement group aggregates objects within a pool. The tracking of RADOS
object placement and object metadata on a per-object basis is computationally
expensive. It would be infeasible for a system with millions of RADOS objects
to efficiently track placement on a per-object basis.

.. ditaa::

      /-----\  /-----\  /-----\  /-----\  /-----\
      | obj |  | obj |  | obj |  | obj |  | obj |
      \-----/  \-----/  \-----/  \-----/  \-----/
         |        |        |        |        |
         +--------+--------+        +---+----+
                  |                     |
                  v                     v
   +-----------------------+   +-----------------------+
   |  Placement Group #1   |   |  Placement Group #2   |
   |                       |   |                       |
   +-----------------------+   +-----------------------+
               |                           |
               +-------------+-------------+
                             |
                             v
                 +-----------------------+
                 |         Pool          |
                 |                       |
                 +-----------------------+

The Ceph client calculates which PG a RADOS object should be in. As part of
this calculation, the client hashes the object ID and performs an operation
involving both the number of PGs in the specified pool and the pool ID. For
details, see `Mapping PGs to OSDs`_.
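
To see this mapping in practice, you can ask the cluster to compute the PG for
a given object name by running ``ceph osd map`` (the pool and object names
here are hypothetical; the object does not even have to exist yet):

.. prompt:: bash #

   ceph osd map mypool myobject

The output shows the PG to which the object name hashes and the OSDs that are
currently responsible for it.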

The contents of a RADOS object belonging to a PG are stored in a set of OSDs.
For example, in a replicated pool of size two, each PG will store objects on
two OSDs, as shown below:

.. ditaa::

   +-----------------------+   +-----------------------+
   |  Placement Group #1   |   |  Placement Group #2   |
   |                       |   |                       |
   +-----------------------+   +-----------------------+
        |             |             |             |
        v             v             v             v
   /----------\  /----------\  /----------\  /----------\
   |          |  |          |  |          |  |          |
   |  OSD #1  |  |  OSD #2  |  |  OSD #2  |  |  OSD #3  |
   |          |  |          |  |          |  |          |
   \----------/  \----------/  \----------/  \----------/

If OSD #2 fails, another OSD will be assigned to Placement Group #1 and then
filled with copies of all objects in OSD #1. If the pool size is changed from
two to three, an additional OSD will be assigned to the PG and will receive
copies of all objects in the PG.

An OSD assigned to a PG is not owned exclusively by that PG; rather, the OSD
is shared with other PGs either from the same pool or from other pools. In our
example, OSD #2 is shared by Placement Group #1 and Placement Group #2. If OSD
#2 fails, then Placement Group #2 must restore copies of objects (by making
use of OSD #3).

When the number of PGs increases, several consequences ensue. The new PGs are
assigned OSDs. The result of the CRUSH function changes, which means that some
objects from the already-existing PGs are copied to the new PGs and removed
from the old ones.

Factors Relevant To Specifying pg_num
=====================================

On the one hand, the criteria of data durability and even distribution across
OSDs weigh in favor of a high number of PGs. On the other hand, the criteria
of saving CPU resources and minimizing memory usage weigh in favor of a low
number of PGs.

.. _data durability:

Data durability
---------------

When an OSD fails, the risk of data loss is increased until replication of the
data it hosted is restored to the configured level. To illustrate this point,
let's imagine a scenario that results in permanent data loss in a single PG:

#. The OSD fails and all copies of the objects that it contains are lost. For
   each object within the PG, the number of its replicas suddenly drops from
   three to two.

#. Ceph starts recovery for this PG by choosing a new OSD on which to
   re-create the third copy of each object.

#. Another OSD within the same PG fails before the new OSD is fully populated
   with the third copy. Some objects will then only have one surviving copy.

#. Ceph selects yet another OSD and continues copying objects in order to
   restore the desired number of copies.

#. A third OSD within the same PG fails before recovery is complete. If this
   OSD happened to contain the only remaining copy of an object, the object
   is permanently lost.
466 | ||
467 | In a cluster containing 10 OSDs with 512 PGs in a three-replica pool, CRUSH | |
468 | will give each PG three OSDs. Ultimately, each OSD hosts :math:`\frac{(512 * | |
469 | 3)}{10} = ~150` PGs. So when the first OSD fails in the above scenario, | |
470 | recovery will begin for all 150 PGs at the same time. | |
471 | ||
472 | The 150 PGs that are being recovered are likely to be homogeneously distributed | |
473 | across the 9 remaining OSDs. Each remaining OSD is therefore likely to send | |
474 | copies of objects to all other OSDs and also likely to receive some new objects | |
475 | to be stored because it has become part of a new PG. | |
476 | ||
477 | The amount of time it takes for this recovery to complete depends on the | |
478 | architecture of the Ceph cluster. Compare two setups: (1) Each OSD is hosted by | |
479 | a 1 TB SSD on a single machine, all of the OSDs are connected to a 10 Gb/s | |
480 | switch, and the recovery of a single OSD completes within a certain number of | |
481 | minutes. (2) There are two OSDs per machine using HDDs with no SSD WAL+DB and | |
482 | a 1 Gb/s switch. In the second setup, recovery will be at least one order of | |
7c673cae FG |
483 | magnitude slower. |
484 | ||
1e59de90 TL |
485 | In such a cluster, the number of PGs has almost no effect on data durability. |
486 | Whether there are 128 PGs per OSD or 8192 PGs per OSD, the recovery will be no | |
487 | slower or faster. | |
488 | ||
489 | However, an increase in the number of OSDs can increase the speed of recovery. | |
490 | Suppose our Ceph cluster is expanded from 10 OSDs to 20 OSDs. Each OSD now | |
491 | participates in only ~75 PGs rather than ~150 PGs. All 19 remaining OSDs will | |
492 | still be required to replicate the same number of objects in order to recover. | |
493 | But instead of there being only 10 OSDs that have to copy ~100 GB each, there | |
494 | are now 20 OSDs that have to copy only 50 GB each. If the network had | |
495 | previously been a bottleneck, recovery now happens twice as fast. | |
496 | ||
497 | Similarly, suppose that our cluster grows to 40 OSDs. Each OSD will host only | |
498 | ~38 PGs. And if an OSD dies, recovery will take place faster than before unless | |
499 | it is blocked by another bottleneck. Now, however, suppose that our cluster | |
500 | grows to 200 OSDs. Each OSD will host only ~7 PGs. And if an OSD dies, recovery | |
501 | will happen across at most :math:`\approx 21 = (7 \times 3)` OSDs | |
502 | associated with these PGs. This means that recovery will take longer than when | |
503 | there were only 40 OSDs. For this reason, the number of PGs should be | |
7c673cae FG |
504 | increased. |
505 | ||
1e59de90 TL |
506 | No matter how brief the recovery time is, there is always a chance that an |
507 | additional OSD will fail while recovery is in progress. Consider the cluster | |
508 | with 10 OSDs described above: if any of the OSDs fail, then :math:`\approx 17` | |
509 | (approximately 150 divided by 9) PGs will have only one remaining copy. And if | |
510 | any of the 8 remaining OSDs fail, then 2 (approximately 17 divided by 8) PGs | |
511 | are likely to lose their remaining objects. This is one reason why setting | |
512 | ``size=2`` is risky. | |
513 | ||
514 | When the number of OSDs in the cluster increases to 20, the number of PGs that | |
515 | would be damaged by the loss of three OSDs significantly decreases. The loss of | |
516 | a second OSD degrades only approximately :math:`4` or (:math:`\frac{75}{19}`) | |
517 | PGs rather than :math:`\approx 17` PGs, and the loss of a third OSD results in | |
518 | data loss only if it is one of the 4 OSDs that contains the remaining copy. | |
519 | This means -- assuming that the probability of losing one OSD during recovery | |
520 | is 0.0001% -- that the probability of data loss when three OSDs are lost is | |
521 | :math:`\approx 17 \times 10 \times 0.0001%` in the cluster with 10 OSDs, and | |
522 | only :math:`\approx 4 \times 20 \times 0.0001%` in the cluster with 20 OSDs. | |
523 | ||
524 | In summary, the greater the number of OSDs, the faster the recovery and the | |
525 | lower the risk of permanently losing a PG due to cascading failures. As far as | |
526 | data durability is concerned, in a cluster with fewer than 50 OSDs, it doesn't | |
527 | much matter whether there are 512 or 4096 PGs. | |
528 | ||
529 | .. note:: It can take a long time for an OSD that has been recently added to | |
530 | the cluster to be populated with the PGs assigned to it. However, no object | |
531 | degradation or impact on data durability will result from the slowness of | |
532 | this process since Ceph populates data into the new PGs before removing it | |
533 | from the old PGs. | |

.. _object distribution:

Object distribution within a pool
---------------------------------

Under ideal conditions, objects are evenly distributed across PGs. Because
CRUSH computes the PG for each object but does not know how much data is
stored in each OSD associated with the PG, the ratio between the number of PGs
and the number of OSDs can have a significant influence on data distribution.

For example, suppose that there is only a single PG for ten OSDs in a
three-replica pool. In that case, only three OSDs would be used because CRUSH
would have no other option. However, if more PGs are available, RADOS objects
are more likely to be evenly distributed across OSDs. CRUSH makes every effort
to distribute OSDs evenly across all existing PGs.

As long as there are one or two orders of magnitude more PGs than OSDs, the
distribution is likely to be even. For example: 256 PGs for 3 OSDs, 512 PGs
for 10 OSDs, or 1024 PGs for 10 OSDs.

However, uneven data distribution can emerge due to factors other than the
ratio of PGs to OSDs. For example, since CRUSH does not take into account the
size of the RADOS objects, the presence of a few very large RADOS objects can
create an imbalance. Suppose that one million 4 KB RADOS objects totaling 4 GB
are evenly distributed among 1024 PGs on 10 OSDs. These RADOS objects will
consume 4 GB / 10 = 400 MB on each OSD. If a single 400 MB RADOS object is
then added to the pool, the three OSDs supporting the PG in which the RADOS
object has been placed will each be filled with 400 MB + 400 MB = 800 MB but
the seven other OSDs will still contain only 400 MB.

.. _resource usage:

Memory, CPU and network usage
-----------------------------

Every PG in the cluster imposes memory, network, and CPU demands upon OSDs and
MONs. These needs must be met at all times and are increased during recovery.
Indeed, one of the main reasons PGs were developed was to share this overhead
by clustering objects together.

For this reason, minimizing the number of PGs saves significant resources.

.. _choosing-number-of-placement-groups:

Choosing the Number of PGs
==========================

.. note:: It is rarely necessary to do the math in this section by hand.
   Instead, use the ``ceph osd pool autoscale-status`` command in combination
   with the ``target_size_bytes`` or ``target_size_ratio`` pool properties.
   For more information, see :ref:`pg-autoscaler`.

If you have more than 50 OSDs, we recommend approximately 50-100 PGs per OSD
in order to balance resource usage, data durability, and data distribution. If
you have fewer than 50 OSDs, follow the guidance in the `preselection`_
section. For a single pool, use the following formula to get a baseline value:

   Total PGs = :math:`\frac{OSDs \times 100}{pool \: size}`

Here **pool size** is either the number of replicas for replicated pools or
the K+M sum for erasure-coded pools. To retrieve this sum, run the command
``ceph osd erasure-code-profile get``.

Next, check whether the resulting baseline value is consistent with the way
you designed your Ceph cluster to maximize `data durability`_ and `object
distribution`_ and to minimize `resource usage`_.

This value should be **rounded up to the nearest power of two**.

Each pool's ``pg_num`` should be a power of two. Other values are likely to
result in uneven distribution of data across OSDs. It is best to increase
``pg_num`` for a pool only when it is feasible and desirable to set the next
highest power of two. Note that this power-of-two rule is per-pool; it is
neither necessary nor easy to align the sum of all pools' ``pg_num`` to a
power of two.

For example, if you have a cluster with 200 OSDs and a single pool with a size
of 3 replicas, estimate the number of PGs as follows:

:math:`\frac{200 \times 100}{3} = 6667`. Rounded up to the nearest power of 2: 8192.

When using multiple data pools to store objects, make sure that you balance
the number of PGs per pool against the number of PGs per OSD so that you
arrive at a reasonable total number of PGs. It is important to find a number
that provides reasonably low variance per OSD without taxing system resources
or making the peering process too slow.

For example, suppose you have a cluster with 10 pools, each with 512 PGs on 10
OSDs. That amounts to 5,120 PGs distributed across 10 OSDs, or 512 PGs per
OSD. This cluster will not use too many resources. However, in a cluster of
1,000 pools, each with 512 PGs on 10 OSDs, the OSDs will have to handle
~50,000 PGs each. This cluster will require significantly more resources and
significantly more time for peering.

For determining the optimal number of PGs per OSD, we recommend the `PGCalc`_
tool.


.. _setting the number of placement groups:

Setting the Number of PGs
=========================

Setting the initial number of PGs in a pool must be done at the time you
create the pool. See `Create a Pool`_ for details.

However, even after a pool is created, if the ``pg_autoscaler`` is not being
used to manage ``pg_num`` values, you can change the number of PGs by running
a command of the following form:

.. prompt:: bash #

   ceph osd pool set {pool-name} pg_num {pg_num}

If you increase the number of PGs, your cluster will not rebalance until you
increase the number of PGs for placement (``pgp_num``). The ``pgp_num``
parameter specifies the number of PGs that are to be considered for placement
by the CRUSH algorithm. Increasing ``pg_num`` splits the PGs in your cluster,
but data will not be migrated to the newer PGs until ``pgp_num`` is increased.
The ``pgp_num`` parameter should be equal to the ``pg_num`` parameter. To
increase the number of PGs for placement, run a command of the following form:

.. prompt:: bash #

   ceph osd pool set {pool-name} pgp_num {pgp_num}
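
For example, to increase the PG count of a hypothetical pool ``mypool`` to 128
and then make the new PGs eligible for placement:

.. prompt:: bash #

   ceph osd pool set mypool pg_num 128
   ceph osd pool set mypool pgp_num 128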

If you decrease the number of PGs, then ``pgp_num`` is adjusted automatically.
In Nautilus and later releases, when the ``pg_autoscaler`` is not used,
``pgp_num`` is automatically stepped to match ``pg_num``. This process
manifests as periods of PG remapping and backfill, which is expected behavior
and normal.

.. _rados_ops_pgs_get_pg_num:

Get the Number of PGs
=====================

To get the number of PGs in a pool, run a command of the following form:

.. prompt:: bash #

   ceph osd pool get {pool-name} pg_num


Get a Cluster's PG Statistics
=============================

To see the details of the PGs in your cluster, run a command of the following
form:

.. prompt:: bash #

   ceph pg dump [--format {format}]

Valid formats are ``plain`` (default) and ``json``.


Get Statistics for Stuck PGs
============================

To see the statistics for all PGs that are stuck in a specified state, run a
command of the following form:

.. prompt:: bash #

   ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format <format>] [-t|--threshold <seconds>]

- **Inactive** PGs cannot process reads or writes because they are waiting for
  enough OSDs with the most up-to-date data to come ``up`` and ``in``.

- **Undersized** PGs contain objects that have not been replicated the desired
  number of times. Under normal conditions, it can be assumed that these PGs
  are recovering.

- **Stale** PGs are in an unknown state -- the OSDs that host them have not
  reported to the monitor cluster for a certain period of time (determined by
  ``mon_osd_report_timeout``).

Valid formats are ``plain`` (default) and ``json``. The threshold defines the
minimum number of seconds the PG is stuck before it is included in the
returned statistics (default: 300).
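
For example, to list the PGs that have been stuck ``inactive`` for at least
ten minutes, run the following command:

.. prompt:: bash #

   ceph pg dump_stuck inactive -t 600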


Get a PG Map
============

To get the PG map for a particular PG, run a command of the following form:

.. prompt:: bash #

   ceph pg map {pg-id}

For example:

.. prompt:: bash #

   ceph pg map 1.6c

Ceph will return the PG map, the PG, and the OSD status. The output resembles
the following::

   osdmap e13 pg 1.6c (1.6c) -> up [1,0] acting [1,0]


Get a PG's Statistics
=====================

To see statistics for a particular PG, run a command of the following form:

.. prompt:: bash #

   ceph pg {pg-id} query


Scrub a PG
==========

To scrub a PG, run a command of the following form:

.. prompt:: bash #

   ceph pg scrub {pg-id}

Ceph checks the primary and replica OSDs, generates a catalog of all objects
in the PG, and compares the objects against each other in order to ensure that
no objects are missing or mismatched and that their contents are consistent.
If the replicas all match, then a final semantic sweep takes place to ensure
that all snapshot-related object metadata is consistent. Errors are reported
in logs.

To scrub all PGs from a specific pool, run a command of the following form:

.. prompt:: bash #

   ceph osd pool scrub {pool-name}


Prioritize backfill/recovery of PG(s)
=====================================

You might encounter a situation in which multiple PGs require recovery or
backfill, but the data in some PGs is more important than the data in others
(for example, some PGs hold data for images used by running machines, while
other PGs hold less relevant data for inactive machines). In that case, you
might want to prioritize recovery or backfill of the PGs with especially
important data so that the performance of the cluster and the availability of
their data are restored sooner. To designate specific PG(s) as prioritized
during recovery, run a command of the following form:

.. prompt:: bash #

   ceph pg force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...]

To mark specific PG(s) as prioritized during backfill, run a command of the
following form:

.. prompt:: bash #

   ceph pg force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...]
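
For example, to prioritize backfill of two specific PGs (the PG IDs here are
hypothetical):

.. prompt:: bash #

   ceph pg force-backfill 2.14 2.1f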

These commands instruct Ceph to perform recovery or backfill on the specified
PGs before processing the other PGs. Prioritization does not interrupt current
backfills or recovery, but places the specified PGs at the top of the queue so
that they will be acted upon next. If you change your mind or realize that you
have prioritized the wrong PGs, run one or both of the following commands:

.. prompt:: bash #

   ceph pg cancel-force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...]
   ceph pg cancel-force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...]

These commands remove the ``force`` flag from the specified PGs, so that the
PGs will be processed in their usual order. As in the case of adding the
``force`` flag, this affects only those PGs that are still queued but does not
affect PGs currently undergoing recovery.

The ``force`` flag is cleared automatically after recovery or backfill of the
PGs is complete.

Similarly, to instruct Ceph to prioritize all PGs from a specified pool (that
is, to perform recovery or backfill on those PGs first), run one or both of
the following commands:

.. prompt:: bash #

   ceph osd pool force-recovery {pool-name}
   ceph osd pool force-backfill {pool-name}

These commands can also be cancelled. To revert to the default order, run one
or both of the following commands:

.. prompt:: bash #

   ceph osd pool cancel-force-recovery {pool-name}
   ceph osd pool cancel-force-backfill {pool-name}

.. warning:: These commands can break the order of Ceph's internal priority
   computations, so use them with caution! If you have multiple pools that are
   currently sharing the same underlying OSDs, and if the data held by certain
   pools is more important than the data held by other pools, then we
   recommend that you run a command of the following form to arrange a custom
   recovery/backfill priority for all pools:

   .. prompt:: bash #

      ceph osd pool set {pool-name} recovery_priority {value}

   For example, if you have twenty pools, you could make the most important
   pool priority ``20``, and the next most important pool priority ``19``, and
   so on.

   Another option is to set the recovery/backfill priority for only a proper
   subset of pools. In such a scenario, three important pools might (all) be
   assigned priority ``1`` and all other pools would be left without an
   assigned recovery/backfill priority. Another possibility is to select three
   important pools and set their recovery/backfill priorities to ``3``, ``2``,
   and ``1`` respectively.
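
   As a minimal sketch of that last arrangement, assuming three hypothetical
   pools named ``pool-a``, ``pool-b``, and ``pool-c`` in descending order of
   importance:

   .. prompt:: bash #

      ceph osd pool set pool-a recovery_priority 3
      ceph osd pool set pool-b recovery_priority 2
      ceph osd pool set pool-c recovery_priority 1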

.. important:: Numbers of greater value have higher priority than numbers of
   lesser value when using ``ceph osd pool set {pool-name} recovery_priority
   {value}`` to set their recovery/backfill priority. For example, a pool with
   the recovery/backfill priority ``30`` has a higher priority than a pool
   with the recovery/backfill priority ``15``.

Reverting Lost RADOS Objects
============================

If the cluster has lost one or more RADOS objects and you have decided to
abandon the search for the lost data, you must mark the unfound objects
``lost``.

If every possible location has been queried and all OSDs are ``up`` and
``in``, but certain RADOS objects are still lost, you might have to give up on
those objects. This situation can arise when rare and unusual combinations of
failures allow the cluster to learn about writes that were performed before
the writes themselves were recovered.

The command to mark a RADOS object ``lost`` supports two options: ``revert``
and ``delete``. The ``revert`` option will either roll back to a previous
version of the RADOS object (if it is old enough to have a previous version)
or forget about it entirely (if it is too new to have a previous version). The
``delete`` option forgets about the object entirely. To mark the "unfound"
objects ``lost``, run a command of the following form:

.. prompt:: bash #

   ceph pg {pg-id} mark_unfound_lost revert|delete

.. important:: Use this feature with caution. It might confuse applications
   that expect the object(s) to exist.


.. toctree::
   :hidden:

   pg-states
   pg-concepts


.. _Create a Pool: ../pools#createpool
.. _Mapping PGs to OSDs: ../../../architecture#mapping-pgs-to-osds
.. _pgcalc: https://old.ceph.com/pgcalc/