ceph/doc/rados/operations/balancer.rst

.. _balancer:

Balancer
========

The *balancer* can optimize the placement of PGs across OSDs in
order to achieve a balanced distribution, either automatically or in a
supervised fashion.

Status
------

The current status of the balancer can be checked at any time with::

  ceph balancer status


Automatic balancing
-------------------

Automatic balancing can be enabled, using the default settings, with::

  ceph balancer on

The balancer can be turned off again with::

  ceph balancer off

By default, automatic balancing uses the ``crush-compat`` mode, which is
backward compatible with older clients.  It makes small changes to the
data distribution over time to ensure that OSDs are equally utilized.


Throttling
----------

No adjustments will be made to the PG distribution if the cluster is
degraded (e.g., because an OSD has failed and the system has not yet
healed itself).

When the cluster is healthy, the balancer throttles its changes so
that the percentage of PGs that are misplaced (i.e., that need to be
moved) stays below a threshold of 5% by default.  The
``target_max_misplaced_ratio`` threshold is expressed as a ratio
rather than a percentage, and can be adjusted with::

  ceph config set mgr target_max_misplaced_ratio .07   # 7%
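
Because the threshold is a ratio rather than a percentage, it is easy
to set it a factor of 100 off.  The following is a minimal sketch, not
part of Ceph, that converts a percentage and then reads the value back.
The function names ``pct_to_ratio`` and ``set_max_misplaced_pct`` are
hypothetical, and the read-back assumes a release whose ``ceph config``
interface provides ``ceph config get``:

```shell
# Hypothetical helper (not a Ceph command): convert a percentage such as 7
# into the ratio form that target_max_misplaced_ratio expects (0.07).
pct_to_ratio() {
    awk -v pct="$1" 'BEGIN { printf "%.4g\n", pct / 100 }'
}

# Hypothetical wrapper: set the threshold from a percentage, then read the
# stored value back. Assumes `ceph config get` is available on this release.
set_max_misplaced_pct() {
    ratio=$(pct_to_ratio "$1")
    ceph config set mgr target_max_misplaced_ratio "$ratio" &&
        ceph config get mgr target_max_misplaced_ratio
}
```

With these definitions loaded, ``set_max_misplaced_pct 7`` would issue
the same ``ceph config set`` command as shown above, with ``0.07`` as
the value.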


Modes
-----

There are currently two supported balancer modes:

#. **crush-compat**.  The CRUSH compat mode uses the compat weight-set
   feature (introduced in Luminous) to manage an alternative set of
   weights for devices in the CRUSH hierarchy.  The normal weights
   should remain set to the size of the device to reflect the target
   amount of data that we want to store on the device.  The balancer
   then optimizes the weight-set values, adjusting them up or down in
   small increments, in order to achieve a distribution that matches
   the target distribution as closely as possible.  (Because PG
   placement is a pseudorandom process, there is a natural amount of
   variation in the placement; by optimizing the weights we
   counteract that natural variation.)

   Notably, this mode is *fully backwards compatible* with older
   clients: when an OSDMap and CRUSH map are shared with older clients,
   we present the optimized weights as the "real" weights.

   The primary restriction of this mode is that the balancer cannot
   handle multiple CRUSH hierarchies with different placement rules if
   the subtrees of the hierarchy share any OSDs.  (This is normally
   not the case, and is generally not a recommended configuration
   because it is hard to manage the space utilization on the shared
   OSDs.)

#. **upmap**.  Starting with Luminous, the OSDMap can store explicit
   mappings for individual OSDs as exceptions to the normal CRUSH
   placement calculation.  These ``upmap`` entries provide fine-grained
   control over the PG mapping.  This balancer mode will optimize the
   placement of individual PGs in order to achieve a balanced
   distribution.  In most cases, this distribution is "perfect," with
   an equal number of PGs on each OSD (+/-1 PG, since they might not
   divide evenly).

   Note that using upmap requires that all clients be Luminous or newer.
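
The "+/-1 PG" remark for upmap mode is plain integer division.  As a
worked illustration only, the hypothetical helper ``even_pg_spread``
below prints how many OSDs end up with the higher and the lower count.
It assumes equally weighted OSDs, and the first argument is the number
of PG placements to spread (PGs multiplied by the pool's replica count):

```shell
# Hypothetical helper (not a Ceph command): show the most even split of
# pg_total PG placements across osd_total equally weighted OSDs.
even_pg_spread() {
    pg_total="$1"
    osd_total="$2"
    low=$(( pg_total / osd_total ))      # PGs every OSD receives
    extra=$(( pg_total % osd_total ))    # OSDs that receive one more
    if [ "$extra" -eq 0 ]; then
        echo "$osd_total OSDs with $low PGs"
    else
        echo "$extra OSDs with $(( low + 1 )) PGs, $(( osd_total - extra )) OSDs with $low PGs"
    fi
}
```

For example, ``even_pg_spread 100 8`` prints ``4 OSDs with 13 PGs, 4
OSDs with 12 PGs``, while ``even_pg_spread 96 8`` prints ``8 OSDs with
12 PGs``.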

The default mode is ``crush-compat``.  The mode can be adjusted with::

  ceph balancer mode upmap

or::

  ceph balancer mode crush-compat
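
Since upmap requires Luminous or newer clients, it can be useful to
make the mode switch conditional on that requirement.  The sketch below
is one possible approach, not a prescribed procedure.  The function
name ``balancer_use_upmap`` is hypothetical.  The command ``ceph osd
set-require-min-compat-client luminous`` does not appear in this
section; it is used here on the assumption that it fails when
pre-Luminous clients are connected:

```shell
# Hypothetical wrapper: switch the balancer to upmap mode only if the
# cluster accepts Luminous as the minimum client release.
balancer_use_upmap() {
    if ceph osd set-require-min-compat-client luminous; then
        ceph balancer mode upmap
    else
        # The requirement could not be set; leave the balancer mode alone.
        echo "could not require Luminous clients; balancer mode unchanged" >&2
        return 1
    fi
}
```

If the requirement cannot be set, the function returns a non-zero
status and the balancer stays in its current mode.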

Supervised optimization
-----------------------

The balancer operation is broken into a few distinct phases:

#. building a *plan*
#. evaluating the quality of the data distribution, either for the
   current PG distribution, or the PG distribution that would result
   after executing a *plan*
#. executing the *plan*

To evaluate and score the current distribution::

  ceph balancer eval

You can also evaluate the distribution for a single pool with::

  ceph balancer eval <pool-name>

Greater detail for the evaluation can be seen with::

  ceph balancer eval-verbose ...

The balancer can generate a plan, using the currently configured mode, with::

  ceph balancer optimize <plan-name>

The name is provided by the user and can be any useful identifying
string.  The contents of a plan can be seen with::

  ceph balancer show <plan-name>

All plans can be shown with::

  ceph balancer ls

Old plans can be discarded with::

  ceph balancer rm <plan-name>

Currently recorded plans are shown as part of the status command::

  ceph balancer status

The quality of the distribution that would result after executing a
plan can be calculated with::

  ceph balancer eval <plan-name>

Assuming the plan is expected to improve the distribution (i.e., it has
a lower score than the current cluster state), the user can execute
that plan with::

  ceph balancer execute <plan-name>
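
The supervised steps above can be chained into a single guarded run.
The following is a rough sketch rather than an official tool.  The
function names ``balancer_score`` and ``balancer_try_plan`` are
hypothetical.  The parsing assumes that ``ceph balancer eval`` prints a
line containing the word ``score`` followed by a number, which may not
hold on every release:

```shell
# Hypothetical helper: print the number that follows the word "score".
# ASSUMPTION: eval output resembles "... score 0.012345 (lower is better)".
balancer_score() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "score") { print $(i + 1); exit } }'
}

# Hypothetical wrapper: build a plan, compare its score with the current
# score, and execute the plan only if its score is strictly lower.
balancer_try_plan() {
    plan="$1"
    current=$(ceph balancer eval | balancer_score)
    ceph balancer optimize "$plan" || return 1
    planned=$(ceph balancer eval "$plan" | balancer_score)
    if [ -n "$current" ] && [ -n "$planned" ] &&
       awk -v p="$planned" -v c="$current" 'BEGIN { exit !(p < c) }'; then
        ceph balancer execute "$plan"
    else
        # No measurable improvement (or the scores could not be parsed).
        echo "plan $plan does not improve the score; discarding it" >&2
        ceph balancer rm "$plan"
        return 1
    fi
}
```

A call such as ``balancer_try_plan myplan`` would then run the
optimize, eval, and execute commands in order, and would discard the
plan with ``ceph balancer rm`` when the comparison fails.  Inspecting
the plan with ``ceph balancer show`` before executing it remains a
sensible manual step.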