Using the pg-upmap
==================

Starting in Luminous v12.2.z there is a new *pg-upmap* exception table
in the OSDMap that allows the cluster to explicitly map specific PGs to
specific OSDs.  This allows the cluster to fine-tune the data
distribution to, in most cases, perfectly distribute PGs across OSDs.

The key caveat to this new mechanism is that it requires that all
clients understand the new *pg-upmap* structure in the OSDMap.

Enabling
--------

To allow use of the feature, you must tell the cluster that it only
needs to support luminous (and newer) clients with::

  ceph osd set-require-min-compat-client luminous

This command will fail if any pre-luminous clients or daemons are
connected to the monitors.  You can see what client versions are in
use with::

  ceph features
Balancer module
---------------

The new `balancer` module for ceph-mgr will automatically balance
the number of PGs per OSD.  See the ``balancer`` module documentation
for details.
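As a minimal sketch of driving the balancer from the CLI (standard
ceph-mgr commands; verify availability on your release)::

  ceph mgr module enable balancer
  ceph balancer mode upmap
  ceph balancer on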
Offline optimization
--------------------

Upmap entries are updated with an offline optimizer built into ``osdmaptool``.

#. Grab the latest copy of your osdmap::

     ceph osd getmap -o om

#. Run the optimizer::

     osdmaptool om --upmap out.txt [--upmap-pool <pool>]
       [--upmap-max <max-optimizations>] [--upmap-deviation <max-deviation>]
       [--upmap-active]

   It is highly recommended that optimization be done for each pool
   individually, or for sets of similarly-utilized pools.  You can
   specify the ``--upmap-pool`` option multiple times.  "Similar pools"
   means pools that are mapped to the same devices and store the same
   kind of data (e.g., RBD image pools, yes; RGW index pool and RGW
   data pool, no).
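   For example, to optimize two similarly-utilized RBD pools together
   (the pool names ``rbd1`` and ``rbd2`` are placeholders)::

     osdmaptool om --upmap out.txt --upmap-pool rbd1 --upmap-pool rbd2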
   The ``max-optimizations`` value is the maximum number of upmap
   entries to identify in the run.  The default is ``10``, as in the
   ceph-mgr balancer module, but you should use a larger number if you
   are doing offline optimization.  If the tool cannot find any
   additional changes to make (i.e., when the pool distribution is
   already perfect) it will stop early.

   The ``max-deviation`` value defaults to ``5``.  If an OSD's PG count
   varies from the computed target number by no more than this amount
   it will be considered perfect.

   The ``--upmap-active`` option simulates the behavior of the active
   balancer in upmap mode.  It keeps cycling until the OSDs are
   balanced and reports how many rounds it takes and how long each
   round lasts.  The elapsed time per round indicates the CPU load
   ceph-mgr will consume when it computes the next optimization plan.

#. Apply the changes::

     source out.txt

   The proposed changes are written to the output file ``out.txt`` in
   the example above.  These are normal ceph CLI commands that can be
   run to apply the changes to the cluster.
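   The commands in the output file are ``ceph osd pg-upmap-items``
   calls; an illustrative (not literal) excerpt might look like::

     ceph osd pg-upmap-items 1.7 6 0
     ceph osd pg-upmap-items 1.12 2 5

   Each line remaps the given PG, replacing the first OSD in each pair
   with the second.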
The above steps can be repeated as many times as necessary to achieve
a perfect distribution of PGs for each set of pools.

You can see some (gory) details about what the tool is doing by
passing ``--debug-osd 10`` to ``osdmaptool``, and even more with
``--debug-crush 10``.