The *balancer* can optimize the placement of PGs across OSDs in
order to achieve a balanced distribution, either automatically or in a
supervised fashion.

The current status of the balancer can be checked at any time with::

  ceph balancer status

Automatic balancing can be enabled, using the default settings, with::

  ceph balancer on

The balancer can be turned back off again with::

  ceph balancer off

This will use the ``crush-compat`` mode, which is backward compatible
with older clients and makes small changes to the data
distribution over time to ensure that OSDs are equally utilized.

No adjustments will be made to the PG distribution if the cluster is
degraded (e.g., because an OSD has failed and the system has not yet
healed itself).

When the cluster is healthy, the balancer will throttle its changes
such that the percentage of PGs that are misplaced (i.e., that need to
be moved) is below a threshold of (by default) 5%. The
``target_max_misplaced_ratio`` threshold can be adjusted with::

  ceph config set mgr target_max_misplaced_ratio .07   # 7%

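The two rules above can be modeled as a simple gate: no changes while the cluster is degraded, and a cap on the misplaced fraction. This is a minimal sketch under stated assumptions, not the balancer module's code. ``may_optimize`` and ``move_budget`` are hypothetical helpers, and the cluster state is passed in as plain numbers. Treating the threshold as a budget of additional PG moves is an illustrative simplification.

.. code-block:: python

   import math


   # Hypothetical helpers for illustration; not part of Ceph.
   def may_optimize(misplaced_pgs: int, total_pgs: int, degraded: bool,
                    max_misplaced_ratio: float = 0.05) -> bool:
       """Return True if a new balancing round may start.

       No round starts while the cluster is degraded, or while the
       misplaced fraction is already at or above the threshold.
       """
       if degraded or total_pgs == 0:
           return False
       return misplaced_pgs / total_pgs < max_misplaced_ratio


   def move_budget(misplaced_pgs: int, total_pgs: int,
                   max_misplaced_ratio: float = 0.05) -> int:
       """Number of additional PGs that may become misplaced."""
       limit = math.floor(max_misplaced_ratio * total_pgs)
       return max(0, limit - misplaced_pgs)


   if __name__ == "__main__":
       print(may_optimize(20, 1000, degraded=False))  # True
       print(move_budget(20, 1000))                   # 30 (default 5%)
       print(move_budget(20, 1000, 0.07))             # 50 (raised to 7%)
       print(may_optimize(20, 1000, degraded=True))   # False

Consider 1000 PGs with 20 already misplaced. Under the default 5% threshold, the toy budget allows 30 more PGs to move; at 7% it allows 50. A higher ratio lets more data move concurrently.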
There are currently two supported balancer modes:

#. **crush-compat**. The CRUSH compat mode uses the compat weight-set
   feature (introduced in Luminous) to manage an alternative set of
   weights for devices in the CRUSH hierarchy. The normal weights
   should remain set to the size of the device to reflect the target
   amount of data that we want to store on the device. The balancer
   then optimizes the weight-set values, adjusting them up or down in
   small increments, in order to achieve a distribution that matches
   the target distribution as closely as possible. (Because PG
   placement is a pseudorandom process, there is a natural amount of
   variation in the placement; by optimizing the weights we
   counteract that natural variation.)

   Notably, this mode is *fully backwards compatible* with older
   clients: when an OSDMap and CRUSH map are shared with older clients,
   we present the optimized weights as the "real" weights.

   The primary restriction of this mode is that the balancer cannot
   handle multiple CRUSH hierarchies with different placement rules if
   the subtrees of the hierarchy share any OSDs. (This is normally
   not the case, and is generally not a recommended configuration
   because it is hard to manage the space utilization on the shared
   OSDs.)

#. **upmap**. Starting with Luminous, the OSDMap can store explicit
   mappings of individual PGs to OSDs as exceptions to the normal CRUSH
   placement calculation. These ``upmap`` entries provide fine-grained
   control over the PG mapping. This balancer mode will optimize the
   placement of individual PGs in order to achieve a balanced
   distribution. In most cases, this distribution is "perfect," which
   means an equal number of PGs on each OSD (+/-1 PG, since they might
   not divide evenly).

   Note that using upmap requires that all clients be Luminous or newer.

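The "+/-1 PG" outcome of ``upmap`` mode can be illustrated with a toy greedy loop. The loop repeatedly remaps one PG from the most-loaded OSD to the least-loaded one. This is a simplified sketch under stated assumptions, not Ceph's upmap optimizer. It ignores CRUSH rules, failure domains, replicas, and device sizes, and the names ``balance_upmap`` and ``placement`` are hypothetical.

.. code-block:: python

   from typing import Dict, List, Tuple


   # Hypothetical toy model; not Ceph's upmap implementation.
   def balance_upmap(placement: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
       """Greedily even out PG counts, mutating ``placement`` in place.

       ``placement`` maps an OSD name to the PGs it holds (one copy per
       PG in this toy). Returns the (pg, from_osd, to_osd) overrides,
       analogous in spirit to upmap entries.
       """
       moves = []
       while True:
           fullest = max(placement, key=lambda osd: len(placement[osd]))
           emptiest = min(placement, key=lambda osd: len(placement[osd]))
           # Stop once counts differ by at most one PG: the best possible
           # result when PGs do not divide evenly across OSDs.
           if len(placement[fullest]) - len(placement[emptiest]) <= 1:
               return moves
           pg = placement[fullest].pop()
           placement[emptiest].append(pg)
           moves.append((pg, fullest, emptiest))


   if __name__ == "__main__":
       osds = {
           "osd.0": [f"1.{i}" for i in range(0, 7)],
           "osd.1": [f"1.{i}" for i in range(7, 9)],
           "osd.2": [f"1.{i}" for i in range(9, 10)],
       }
       moves = balance_upmap(osds)
       print({osd: len(pgs) for osd, pgs in osds.items()})
       # {'osd.0': 4, 'osd.1': 3, 'osd.2': 3}
       print(moves)

Each returned tuple overrides where one PG lives instead of recomputing its placement. Ten PGs cannot divide evenly over three OSDs, so the loop stops at counts of 4, 3 and 3.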
The default mode is ``crush-compat``. The mode can be adjusted with::

  ceph balancer mode upmap

or::

  ceph balancer mode crush-compat

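The weight-set idea behind ``crush-compat`` mode can be illustrated with a toy model. This sketch is an approximation under stated assumptions. In place of full CRUSH, it uses a straw2-style weighted draw: each PG goes to the OSD with the largest ``ln(u)/w``, where ``u`` is a per-(PG, OSD) hash. Each PG is treated as a single copy, and the proportional update rule is hypothetical. None of the function names are part of Ceph.

.. code-block:: python

   import hashlib
   import math
   from typing import Dict, Tuple


   def _unit_hash(pg: int, osd: str) -> float:
       """Deterministic pseudorandom value in (0, 1] for a (PG, OSD) pair."""
       digest = hashlib.sha256(f"{pg}:{osd}".encode()).digest()
       return (int.from_bytes(digest[:8], "big") + 1) / 2**64


   def place(num_pgs: int, weights: Dict[str, float]) -> Dict[str, int]:
       """Straw2-style draw: each PG goes to the OSD with the largest ln(u)/w."""
       counts = {osd: 0 for osd in weights}
       for pg in range(num_pgs):
           winner = max(weights,
                        key=lambda osd: math.log(_unit_hash(pg, osd)) / weights[osd])
           counts[winner] += 1
       return counts


   def deviation(counts: Dict[str, int], sizes: Dict[str, float]) -> float:
       """Sum of absolute differences between actual and target PG counts."""
       total_pgs = sum(counts.values())
       total_size = sum(sizes.values())
       return sum(abs(counts[o] - total_pgs * sizes[o] / total_size) for o in sizes)


   def optimize_weight_set(num_pgs: int, sizes: Dict[str, float],
                           step: float = 0.05,
                           rounds: int = 20) -> Tuple[Dict[str, float], float]:
       """Nudge a separate weight set; the real weights (``sizes``) never change."""
       total_size = sum(sizes.values())
       weight_set = dict(sizes)  # start from the real weights
       counts = place(num_pgs, weight_set)
       best, best_dev = dict(weight_set), deviation(counts, sizes)
       for _ in range(rounds):
           for osd, size in sizes.items():
               target = num_pgs * size / total_size
               # Over-full OSDs get a slightly smaller weight, under-full
               # ones a slightly larger one; each change is capped at +/- step.
               error = (target - counts[osd]) / target
               weight_set[osd] *= 1 + max(-step, min(step, error))
           counts = place(num_pgs, weight_set)
           dev = deviation(counts, sizes)
           if dev < best_dev:  # keep the best weight set seen so far
               best, best_dev = dict(weight_set), dev
       return best, best_dev


   if __name__ == "__main__":
       sizes = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 4.0}
       before = deviation(place(512, sizes), sizes)
       weights, after = optimize_weight_set(512, sizes)
       print(f"deviation before: {before:.1f}, after: {after:.1f}")
       print({osd: round(w, 3) for osd, w in weights.items()})

``sizes`` stays untouched while only ``weight_set`` changes. This mirrors the split described for ``crush-compat``: the normal weights keep describing device capacity, and the alternative weights absorb the pseudorandom variation. Because the sketch keeps the best iteration, it never returns a weight set worse than the starting point.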
Supervised optimization
-----------------------

The balancer operation is broken into a few distinct phases:

#. building a *plan*
#. evaluating the quality of the data distribution, either for the current PG distribution, or the PG distribution that would result after executing a *plan*
#. executing the *plan*

To evaluate and score the current distribution::

  ceph balancer eval

You can also evaluate the distribution for a single pool with::

  ceph balancer eval <pool-name>

More detail about the evaluation can be seen with::

  ceph balancer eval-verbose ...

The balancer can generate a plan, using the currently configured mode, with::

  ceph balancer optimize <plan-name>

The name is provided by the user and can be any useful identifying string. The contents of a plan can be seen with::

  ceph balancer show <plan-name>

All plans can be shown with::

  ceph balancer ls

Old plans can be discarded with::

  ceph balancer rm <plan-name>

Currently recorded plans are shown as part of the status command::

  ceph balancer status

The quality of the distribution that would result after executing a plan can be calculated with::

  ceph balancer eval <plan-name>

Assuming the plan is expected to improve the distribution (i.e., it has a lower score than the current cluster state), the user can execute that plan with::

  ceph balancer execute <plan-name>

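The supervised workflow above can be sketched as a toy model: build a plan, score the distribution it would produce, and execute it only if that score is lower than the current one. Everything here is hypothetical. The score is a normalized standard deviation of per-OSD PG counts chosen for illustration, not the score Ceph computes. A plan is modeled as a list of PG moves applied to a copy of the mapping, so evaluating a plan never changes the toy "cluster".

.. code-block:: python

   import copy
   import statistics
   from typing import Dict, List, Tuple

   # Hypothetical toy types; not Ceph data structures.
   Placement = Dict[str, List[str]]        # OSD -> PGs
   Plan = List[Tuple[str, str, str]]       # (pg, from_osd, to_osd)


   def score(placement: Placement) -> float:
       """Illustrative score: lower is better, 0.0 means perfectly even."""
       counts = [len(pgs) for pgs in placement.values()]
       mean = statistics.mean(counts)
       return statistics.pstdev(counts) / mean if mean else 0.0


   def apply_plan(placement: Placement, plan: Plan) -> Placement:
       """Return the placement a plan would produce; the input is untouched."""
       result = copy.deepcopy(placement)
       for pg, src, dst in plan:
           result[src].remove(pg)
           result[dst].append(pg)
       return result


   def maybe_execute(placement: Placement, plan: Plan) -> Tuple[Placement, bool]:
       """Evaluate the plan first; execute it only if the score improves."""
       proposed = apply_plan(placement, plan)
       if score(proposed) < score(placement):
           return proposed, True
       return placement, False


   if __name__ == "__main__":
       cluster = {"osd.0": ["1.0", "1.1", "1.2"], "osd.1": ["1.3"]}
       print(score(cluster))                                   # 0.5
       cluster, ok = maybe_execute(cluster, [("1.3", "osd.1", "osd.0")])
       print(ok)                                               # False
       cluster, ok = maybe_execute(cluster, [("1.2", "osd.0", "osd.1")])
       print(ok, score(cluster))                               # True 0.0

Separating evaluation (``apply_plan`` on a copy) from execution matches the phases listed above. A plan can be inspected and scored as often as needed before anything is changed.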