The *balancer* can optimize the placement of PGs across OSDs in
order to achieve a balanced distribution, either automatically or in a
supervised fashion.

Status
------

The current status of the balancer can be checked at any time with::

  ceph balancer status

Automatic balancing
-------------------

The automatic balancing feature is enabled by default in ``upmap``
mode. Please refer to :ref:`upmap` for more details. The balancer can be
turned off with::

  ceph balancer off

The balancer mode can be changed to ``crush-compat`` mode, which is
backward compatible with older clients, and will make small changes to
the data distribution over time to ensure that OSDs are equally utilized.

No adjustments will be made to the PG distribution if the cluster is
degraded (e.g., because an OSD has failed and the system has not yet
healed itself).

When the cluster is healthy, the balancer will throttle its changes
such that the percentage of PGs that are misplaced (i.e., that need to
be moved) is below a threshold of (by default) 5%. The
``target_max_misplaced_ratio`` threshold can be adjusted with::

  ceph config set mgr target_max_misplaced_ratio .07   # 7%

Set the number of seconds to sleep in between runs of the automatic balancer::

  ceph config set mgr mgr/balancer/sleep_interval 60

Set the time of day to begin automatic balancing in HHMM format::

  ceph config set mgr mgr/balancer/begin_time 0000

Set the time of day to finish automatic balancing in HHMM format::

  ceph config set mgr mgr/balancer/end_time 2400

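
For example, to confine automatic balancing to a quiet overnight window (the
specific times here are illustrative, not a recommendation)::

  ceph config set mgr mgr/balancer/begin_time 0100
  ceph config set mgr mgr/balancer/end_time 0500
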
Restrict automatic balancing to this day of the week or later.
Uses the same conventions as crontab, 0 or 7 is Sunday, 1 is Monday, and so on::

  ceph config set mgr mgr/balancer/begin_weekday 0

Restrict automatic balancing to this day of the week or earlier.
Uses the same conventions as crontab, 0 or 7 is Sunday, 1 is Monday, and so on::

  ceph config set mgr mgr/balancer/end_weekday 7

Pool IDs to which the automatic balancing will be limited.
The default for this is an empty string, meaning all pools will be balanced.
The numeric pool IDs can be obtained with the :command:`ceph osd pool ls detail` command::

  ceph config set mgr mgr/balancer/pool_ids 1,2,3

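
For example, to limit balancing to two specific pools, first look up their
numeric IDs and then set the option (the pool names and IDs below are
hypothetical)::

  ceph osd pool ls detail
  # e.g. the output might include lines such as:
  #   pool 1 'rbd' replicated size 3 ...
  #   pool 5 'cephfs_data' replicated size 3 ...
  ceph config set mgr mgr/balancer/pool_ids 1,5
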
Modes
-----

There are currently two supported balancer modes:

#. **crush-compat**. The CRUSH compat mode uses the compat weight-set
   feature (introduced in Luminous) to manage an alternative set of
   weights for devices in the CRUSH hierarchy. The normal weights
   should remain set to the size of the device to reflect the target
   amount of data that we want to store on the device. The balancer
   then optimizes the weight-set values, adjusting them up or down in
   small increments, in order to achieve a distribution that matches
   the target distribution as closely as possible. (Because PG
   placement is a pseudorandom process, there is a natural amount of
   variation in the placement; by optimizing the weights we
   counteract that natural variation.)

   Notably, this mode is *fully backwards compatible* with older
   clients: when an OSDMap and CRUSH map are shared with older clients,
   we present the optimized weights as the "real" weights.

   The primary restriction of this mode is that the balancer cannot
   handle multiple CRUSH hierarchies with different placement rules if
   the subtrees of the hierarchy share any OSDs. (This is normally
   not the case, and is generally not a recommended configuration
   because it is hard to manage the space utilization on the shared
   OSDs.)

#. **upmap**. Starting with Luminous, the OSDMap can store explicit
   mappings for individual OSDs as exceptions to the normal CRUSH
   placement calculation. These ``upmap`` entries provide fine-grained
   control over the PG mapping. This mode will optimize the
   placement of individual PGs in order to achieve a balanced
   distribution. In most cases, this distribution is "perfect", with
   an equal number of PGs on each OSD (+/-1 PG, since they might not
   divide evenly).

   Note that using upmap requires that all clients be Luminous or newer.

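
Before relying on ``upmap`` mode, you can check which client releases are
currently connected and, optionally, have the cluster refuse connections from
pre-Luminous clients (both commands are part of the standard Ceph CLI)::

  ceph features
  ceph osd set-require-min-compat-client luminous
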
The default mode is ``upmap``. The mode can be adjusted with::

  ceph balancer mode crush-compat

Supervised optimization
-----------------------

The balancer operation is broken into a few distinct phases:

#. building a *plan*
#. evaluating the quality of the data distribution, either for the
   current PG distribution, or the PG distribution that would result
   after executing a *plan*
#. executing the *plan*

To evaluate and score the current distribution::

  ceph balancer eval

You can also evaluate the distribution for a single pool with::

  ceph balancer eval <pool-name>

Greater detail for the evaluation can be seen with::

  ceph balancer eval-verbose ...

The balancer can generate a plan, using the currently configured mode, with::

  ceph balancer optimize <plan-name>

The name is provided by the user and can be any useful identifying string. The contents of a plan can be seen with::

  ceph balancer show <plan-name>

All plans can be shown with::

  ceph balancer ls

Old plans can be discarded with::

  ceph balancer rm <plan-name>

Currently recorded plans are shown as part of the status command::

  ceph balancer status

The quality of the distribution that would result after executing a plan can be calculated with::

  ceph balancer eval <plan-name>

Assuming the plan is expected to improve the distribution (i.e., it has a lower score than the current cluster state), the user can execute that plan with::

  ceph balancer execute <plan-name>
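
Putting the phases together, a typical supervised session might look like the
following (the plan name ``myplan`` is illustrative; lower scores indicate a
better distribution)::

  ceph balancer eval                # score the current distribution
  ceph balancer optimize myplan     # build a plan using the current mode
  ceph balancer show myplan         # inspect the proposed changes
  ceph balancer eval myplan         # score the distribution the plan would produce
  ceph balancer execute myplan      # apply the plan if the score improved
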