Using the pg-upmap
==================

Starting in Luminous v12.2.z there is a new *pg-upmap* exception table
in the OSDMap that allows the cluster to explicitly map specific PGs to
specific OSDs. This allows the cluster to fine-tune the data
distribution to, in most cases, perfectly distribute PGs across OSDs.

The key caveat to this new mechanism is that it requires that all
clients understand the new *pg-upmap* structure in the OSDMap.
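
As a sketch of what such an exception looks like, an entry can be set
or cleared by hand (the PG id and OSD ids here are hypothetical; in
practice the offline optimizer described below generates these entries
for you)::

   ceph osd pg-upmap-items 1.7 3 12    # remap PG 1.7 from osd.3 to osd.12
   ceph osd rm-pg-upmap-items 1.7      # drop the exception again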

Enabling
--------

To allow use of the feature, you must tell the cluster that it only
needs to support luminous (and newer) clients with::

   ceph osd set-require-min-compat-client luminous

This command will fail if any pre-luminous clients or daemons are
connected to the monitors. You can see what client versions are in
use with::

   ceph features
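
If the check is blocked only by clients that you know are in fact
upmap-capable, the setting can typically be forced (a hedged example;
confirm the connected client versions with ``ceph features`` first)::

   ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it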

A word of caution
-----------------

This is a new feature and not very user friendly. At the time of this
writing we are working on a new ``balancer`` module for ceph-mgr that
will eventually do all of this automatically.

Until then, you will need to generate and apply upmap entries yourself,
as described below.

Offline optimization
--------------------

Upmap entries are updated with an offline optimizer built into ``osdmaptool``.

#. Grab the latest copy of your osdmap::

     ceph osd getmap -o om

#. Run the optimizer::

     osdmaptool om --upmap out.txt [--upmap-pool <pool>] [--upmap-max <max-count>] [--upmap-deviation <max-deviation>]

   It is highly recommended that optimization be done for each pool
   individually, or for sets of similarly-utilized pools. You can
   specify the ``--upmap-pool`` option multiple times. "Similar pools"
   means pools that are mapped to the same devices and store the same
   kind of data (e.g., RBD image pools, yes; RGW index pool and RGW
   data pool, no).

   The ``max-count`` value is the maximum number of upmap entries to
   identify in the run. The default is 100, but you may want to make
   this a smaller number so that the tool completes more quickly (but
   does less work). If it cannot find any additional changes to make,
   it will stop early (i.e., when the pool distribution is perfect).

   The ``max-deviation`` value defaults to ``.01`` (i.e., 1%). If an
   OSD's utilization varies from the average by less than this amount,
   it will be considered perfect.

#. The proposed changes are written to the output file ``out.txt`` in
   the example above. These are normal ceph CLI commands that can be
   run to apply the changes to the cluster. This can be done with::

     source out.txt

The above steps can be repeated as many times as necessary to achieve
a perfect distribution of PGs for each set of pools.
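
Putting the steps together, a single per-pool run might look like the
following (the pool name ``rbd`` and the ``--upmap-max`` value are
only illustrative)::

   ceph osd getmap -o om
   osdmaptool om --upmap out.txt --upmap-pool rbd --upmap-max 10
   source out.txt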

You can see some (gory) details about what the tool is doing by
passing ``--debug-osd 10`` to ``osdmaptool``.
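
For example, to re-run the optimizer from the steps above with more
verbose output::

   osdmaptool om --upmap out.txt --debug-osd 10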