.. _upmap:

Using pg-upmap
==============

In Luminous v12.2.z and later releases, there is a *pg-upmap* exception table
in the OSDMap that allows the cluster to explicitly map specific PGs to
specific OSDs. This lets the cluster fine-tune the data distribution so that,
in most cases, PGs are distributed uniformly across OSDs.

There is an important caveat, however: this feature requires that all clients
understand the *pg-upmap* structure in the OSDMap.

Enabling
--------

To use ``pg-upmap``, the cluster cannot have any pre-Luminous clients.
By default, new clusters enable the *balancer module*, which makes use of
``pg-upmap``. If you want to use a different balancer or you want to make your
own custom ``pg-upmap`` entries, you might want to turn off the balancer to
avoid conflicts:

.. prompt:: bash $

   ceph balancer off

To allow use of the new feature on an existing cluster, you must restrict the
cluster to supporting only Luminous (and newer) clients. To do so, run the
following command:

.. prompt:: bash $

   ceph osd set-require-min-compat-client luminous

This command will fail if any pre-Luminous clients or daemons are connected to
the monitors. To see which client versions are in use, run the following
command:

.. prompt:: bash $

   ceph features
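
The check above can be scripted. The following is a minimal sketch, not part of
Ceph: it assumes that ``ceph features`` prints JSON in which each group of
connected entities carries a ``"release"`` field, and it relies on Ceph release
names sorting alphabetically, so that any name sorting before ``luminous`` is
pre-Luminous. The function names and the trimmed sample data are hypothetical.

```shell
# Print the distinct "release" values found in JSON read from stdin.
list_releases() {
    grep -o '"release": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/' | sort -u
}

# Keep only release names that sort before "luminous".
# Assumption: release names are lowercase and alphabetically ordered.
pre_luminous_releases() {
    list_releases | awk '$0 < "luminous"'
}

# Hypothetical, trimmed sample standing in for the output of: ceph features
sample='{
  "mon": [ { "release": "luminous", "num": 3 } ],
  "client": [
    { "release": "jewel", "num": 2 },
    { "release": "luminous", "num": 5 }
  ]
}'

printf '%s\n' "$sample" | pre_luminous_releases
```

On a live cluster the same filter could be fed with
``ceph features | pre_luminous_releases``. Empty output would suggest that no
pre-Luminous entities are connected.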

Balancer module
---------------

The ``balancer`` module for ``ceph-mgr`` automatically balances the number of
PGs per OSD. See :ref:`balancer`.

Offline optimization
--------------------

Upmap entries are updated with an offline optimizer that is built into
``osdmaptool``.

#. Grab the latest copy of your osdmap:

   .. prompt:: bash $

      ceph osd getmap -o om

#. Run the optimizer:

   .. prompt:: bash $

      osdmaptool om --upmap out.txt [--upmap-pool <pool>] \
         [--upmap-max <max-optimizations>] \
         [--upmap-deviation <max-deviation>] \
         [--upmap-active]

   It is highly recommended that optimization be done for each pool
   individually, or for sets of similarly utilized pools. You can specify the
   ``--upmap-pool`` option multiple times. "Similarly utilized pools" means
   pools that are mapped to the same devices and that store the same kind of
   data (for example, RBD image pools are considered to be similarly utilized;
   an RGW index pool and an RGW data pool are not considered to be similarly
   utilized).

   The ``max-optimizations`` value determines the maximum number of upmap
   entries to identify. The default is ``10`` (as is the case with the
   ``ceph-mgr`` balancer module), but you should use a larger number if you are
   doing offline optimization. If the optimizer cannot find any additional
   changes to make (that is, if the pool distribution is perfect), it stops
   early.

   The ``max-deviation`` value defaults to ``5``. If an OSD's PG count differs
   from the computed target number by no more than this amount, the OSD is
   considered perfectly balanced.

   The ``--upmap-active`` option simulates the behavior of the active balancer
   in upmap mode. It keeps cycling until the OSDs are balanced and reports how
   many rounds have occurred and how long each round takes. The elapsed time
   for rounds indicates the CPU load that ``ceph-mgr`` consumes when it computes
   the next optimization plan.

#. Apply the changes:

   .. prompt:: bash $

      source out.txt

   In the above example, the proposed changes are written to the output file
   ``out.txt``. The file contains ordinary Ceph CLI commands; running them
   applies the changes to the cluster.
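
Before applying the file, it can be useful to see which OSDs gain or lose PGs.
The sketch below is an illustration only: it assumes that ``out.txt`` contains
lines of the form ``ceph osd pg-upmap-items <pgid> <from-osd> <to-osd> ...``
and ``ceph osd rm-pg-upmap-items <pgid>``, and the helper name
``upmap_delta_per_osd`` is hypothetical. It tallies only the
``pg-upmap-items`` pairs and ignores removals and any entries that already
exist, so treat the result as a rough estimate.

```shell
# Rough net PG change per OSD implied by pg-upmap-items lines read from stdin.
# Assumed format: ceph osd pg-upmap-items <pgid> <from> <to> [<from> <to> ...]
upmap_delta_per_osd() {
    awk '$1 == "ceph" && $3 == "pg-upmap-items" {
             for (i = 5; i < NF; i += 2) {
                 delta[$i]--        # one PG moves away from this OSD
                 delta[$(i + 1)]++  # one PG moves onto this OSD
             }
         }
         END { for (osd in delta) printf "osd.%s %+d\n", osd, delta[osd] }' |
        sort
}

# Hypothetical sample standing in for the contents of a real out.txt.
sample='ceph osd rm-pg-upmap-items 1.3
ceph osd pg-upmap-items 1.7 0 2
ceph osd pg-upmap-items 1.a 0 1 3 2'

printf '%s\n' "$sample" | upmap_delta_per_osd
```

With a real file the call would be ``upmap_delta_per_osd < out.txt``.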

The above steps can be repeated as many times as necessary to achieve a perfect
distribution of PGs for each set of pools.
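
What counts as "perfect" depends on the ``--upmap-deviation`` value. As a
simplified illustration (an assumption, not the optimizer's actual algorithm),
the sketch below uses the mean PG count as the target for every OSD, which
only makes sense when all OSDs have equal weight, and lists the OSDs whose
count is more than the given deviation away from that target. The function
name and the PG counts are made up.

```shell
# List OSDs whose PG count differs from the mean by more than max_deviation.
# Usage: outside_deviation <max_deviation>   (stdin lines: "<osd-id> <pg-count>")
outside_deviation() {
    awk -v dev="$1" '
        { count[$1] = $2; total += $2; n++ }
        END {
            if (n == 0) exit
            target = total / n   # stand-in for the computed per-OSD target
            for (osd in count) {
                diff = count[osd] - target
                if (diff < 0) diff = -diff
                if (diff > dev) printf "osd.%s %d\n", osd, count[osd]
            }
        }' | sort
}

# Hypothetical PG counts for five equally weighted OSDs (the mean is 100).
counts='0 100
1 104
2 96
3 112
4 88'

printf '%s\n' "$counts" | outside_deviation 5
```

With a deviation of 5, only the two OSDs that are 12 PGs away from the mean
are reported; the OSDs that are 4 PGs away already count as balanced.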

To see some (gory) details about what the tool is doing, you can pass
``--debug-osd 10`` to ``osdmaptool``. To see even more details, pass
``--debug-crush 10`` to ``osdmaptool``.
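
To follow the recommendation of optimizing pools individually, the
``osdmaptool`` invocations can be generated in a loop. The following sketch
only prints the commands rather than running them. The helper name, the pool
names and the per-pool output file names are hypothetical, and only options
described in the procedure above are used.

```shell
# Print one osdmaptool command per pool, each writing its own output file.
# Usage: print_upmap_cmds <osdmap-file> <max-optimizations> <pool>...
print_upmap_cmds() {
    map=$1
    max=$2
    shift 2
    for pool in "$@"; do
        printf 'osdmaptool %s --upmap out-%s.txt --upmap-pool %s --upmap-max %s\n' \
            "$map" "$pool" "$pool" "$max"
    done
}

# Hypothetical pool names; "om" is the map file fetched in step 1.
print_upmap_cmds om 100 rbd images
```

Each printed command starts from the same saved map, so it is probably safest
to fetch a fresh copy of the osdmap after applying a batch of changes and
before optimizing again.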