.. _upmap:

=======================================
Using pg-upmap
=======================================

In Luminous v12.2.z and later releases, there is a *pg-upmap* exception table
in the OSDMap that allows the cluster to explicitly map specific PGs to
specific OSDs. This allows the cluster to fine-tune the data distribution so
that, in most cases, PGs are distributed uniformly across OSDs.

However, there is an important caveat when it comes to this new feature: it
requires all clients to understand the new *pg-upmap* structure in the OSDMap.

Online Optimization
===================

Enabling
--------

In order to use ``pg-upmap``, the cluster cannot have any pre-Luminous clients.
By default, new clusters enable the *balancer module*, which makes use of
``pg-upmap``. If you want to use a different balancer or you want to make your
own custom ``pg-upmap`` entries, you might want to turn off the balancer in
order to avoid conflict:

.. prompt:: bash $

   ceph balancer off

To allow use of the new feature on an existing cluster, you must restrict the
cluster to supporting only Luminous (and newer) clients. To do so, run the
following command:

.. prompt:: bash $

   ceph osd set-require-min-compat-client luminous

This command will fail if any pre-Luminous clients or daemons are connected to
the monitors. To see which client versions are in use, run the following
command:

.. prompt:: bash $

   ceph features

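
As a rough sketch, the JSON printed by ``ceph features`` can be scanned for
client groups older than Luminous. The example assumes a layout in which the
``client`` key holds a list of entries with ``release`` and ``num`` fields. The
layout has differed between releases, so treat the field names as assumptions.
The helper name ``pre_luminous_clients`` and the sample data are hypothetical.

.. code-block:: python

   import json

   # Release names that predate Luminous (assumed values of the "release" field).
   PRE_LUMINOUS = {"argonaut", "bobtail", "cuttlefish", "dumpling", "emperor",
                   "firefly", "giant", "hammer", "infernalis", "jewel", "kraken"}


   def pre_luminous_clients(features_json):
       """Return (release, count) pairs for client groups older than Luminous."""
       report = json.loads(features_json)
       groups = report.get("client", [])
       return [(g["release"], g["num"]) for g in groups
               if g.get("release") in PRE_LUMINOUS]


   # Hand-written sample in the assumed layout, not real cluster output.
   # The "features" values are placeholders.
   SAMPLE = """
   {
     "mon": [{"features": "0x1", "release": "luminous", "num": 3}],
     "client": [
       {"features": "0x2", "release": "jewel", "num": 2},
       {"features": "0x1", "release": "luminous", "num": 5}
     ]
   }
   """

   print(pre_luminous_clients(SAMPLE))

An empty result suggests that no pre-Luminous clients are connected. Any entry
that is returned identifies a client group that would need to be upgraded or
disconnected first.
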
Balancer Module
---------------

The ``balancer`` module for ``ceph-mgr`` will automatically balance the number of
PGs per OSD. See :ref:`balancer`.

Offline Optimization
====================

Upmap entries are updated with an offline optimizer that is built into the
:ref:`osdmaptool`.

#. Grab the latest copy of your osdmap:

   .. prompt:: bash $

      ceph osd getmap -o om

#. Run the optimizer:

   .. prompt:: bash $

      osdmaptool om --upmap out.txt [--upmap-pool <pool>] \
      [--upmap-max <max-optimizations>] \
      [--upmap-deviation <max-deviation>] \
      [--upmap-active]

   It is highly recommended that optimization be done for each pool
   individually, or for sets of similarly utilized pools. You can specify the
   ``--upmap-pool`` option multiple times. "Similarly utilized pools" means
   pools that are mapped to the same devices and that store the same kind of
   data (for example, RBD image pools are considered to be similarly utilized;
   an RGW index pool and an RGW data pool are not considered to be similarly
   utilized).

   The ``max-optimizations`` value determines the maximum number of upmap
   entries to identify. The default is ``10`` (as is the case with the
   ``ceph-mgr`` balancer module), but you should use a larger number if you are
   doing offline optimization. If the optimizer cannot find any additional
   changes to make (that is, if the pool distribution is perfect), it will stop
   early.

   The ``max-deviation`` value defaults to ``5``. If an OSD's PG count varies
   from the computed target number by no more than this amount, it will be
   considered perfect.

   The ``--upmap-active`` option simulates the behavior of the active balancer
   in upmap mode. It keeps cycling until the OSDs are balanced and reports how
   many rounds have occurred and how long each round takes. The elapsed time
   for rounds indicates the CPU load that ``ceph-mgr`` consumes when it computes
   the next optimization plan.

#. Apply the changes:

   .. prompt:: bash $

      source out.txt

   In the above example, the proposed changes are written to the output file
   ``out.txt``. The commands in that file are normal Ceph CLI commands that
   can be run in order to apply the changes to the cluster.

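
Before sourcing the output file, it can be worth checking what it proposes.
The following sketch counts the commands in the file and lists the affected
PGs. It assumes that each line has the form
``ceph osd pg-upmap-items <pgid> <from-osd> <to-osd> ...`` or
``ceph osd rm-pg-upmap-items <pgid>``. That format is an assumption and is not
specified on this page. The helper name ``summarize_upmap_commands`` and the
sample lines are hypothetical.

.. code-block:: python

   import shlex
   from collections import Counter


   def summarize_upmap_commands(text):
       """Count proposed commands by subcommand and collect the affected PG IDs."""
       counts = Counter()
       pgids = []
       for line in text.splitlines():
           words = shlex.split(line, comments=True)
           # Expected shape: ceph osd <subcommand> <pgid> [<from-osd> <to-osd> ...]
           if len(words) < 4 or words[:2] != ["ceph", "osd"]:
               continue
           counts[words[2]] += 1
           pgids.append(words[3])
       return counts, pgids


   # Hand-written sample, not output from a real cluster.
   SAMPLE = "\n".join([
       "ceph osd pg-upmap-items 1.7 0 2",
       "ceph osd pg-upmap-items 1.1f 3 5 4 6",
       "ceph osd rm-pg-upmap-items 2.a",
   ])

   counts, pgids = summarize_upmap_commands(SAMPLE)
   print(dict(counts))
   print(pgids)

To inspect a real file, read it with ``open("out.txt").read()`` and pass the
text to the helper. Lines that do not match the assumed shape are skipped
rather than reported, so a count lower than the number of lines in the file is
a hint that the assumption does not hold.
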
The steps of this procedure can be repeated as many times as necessary to
achieve a perfect distribution of PGs for each set of pools.

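
Whether a distribution counts as "perfect" depends on the ``max-deviation``
rule described in the optimizer step. A minimal sketch of that rule follows.
The optimizer computes its own per-OSD targets, so this sketch simply takes
targets as input. The function name and all numbers are hypothetical.

.. code-block:: python

   def osds_outside_deviation(pg_counts, targets, max_deviation=5):
       """Return OSD IDs whose PG count is more than max_deviation from target."""
       return sorted(
           osd for osd, count in pg_counts.items()
           if abs(count - targets[osd]) > max_deviation
       )


   # Made-up numbers for illustration only.
   pg_counts = {0: 104, 1: 93, 2: 100, 3: 107}
   targets = {0: 100.0, 1: 100.0, 2: 100.0, 3: 100.0}

   print(osds_outside_deviation(pg_counts, targets))

With these numbers, OSDs 1 and 3 are each 7 PGs away from their target and
fall outside the default deviation of ``5``, while OSDs 0 and 2 are treated as
perfect.
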
To see some (gory) details about what the tool is doing, you can pass
``--debug-osd 10`` to ``osdmaptool``. To see even more details, pass
``--debug-crush 10`` to ``osdmaptool``.
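
When the optimizer is run repeatedly (for example, once per set of pools),
assembling the command line in a script can reduce typing errors. The sketch
below only builds and prints an argument list from the options described on
this page. It does not run anything. The function name
``build_osdmaptool_args``, its parameter names, and the pool names are
hypothetical.

.. code-block:: python

   import shlex


   def build_osdmaptool_args(osdmap, out_file, pools=(), max_optimizations=None,
                             max_deviation=None, active=False, debug_osd=None):
       """Build an osdmaptool argument list from the options described above."""
       args = ["osdmaptool", osdmap, "--upmap", out_file]
       for pool in pools:
           # --upmap-pool may be given more than once.
           args += ["--upmap-pool", pool]
       if max_optimizations is not None:
           args += ["--upmap-max", str(max_optimizations)]
       if max_deviation is not None:
           args += ["--upmap-deviation", str(max_deviation)]
       if active:
           args.append("--upmap-active")
       if debug_osd is not None:
           args += ["--debug-osd", str(debug_osd)]
       return args


   args = build_osdmaptool_args("om", "out.txt", pools=["rbd-a", "rbd-b"],
                                max_optimizations=100, debug_osd=10)
   print(shlex.join(args))

The resulting list can be passed to ``subprocess.run`` on a host where
``osdmaptool`` is installed, or the printed line can be pasted into a shell.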