.. _upmap:

=======================================
Using pg-upmap
=======================================

In Luminous v12.2.z and later releases, there is a *pg-upmap* exception table
in the OSDMap that allows the cluster to explicitly map specific PGs to
specific OSDs. This allows the cluster to fine-tune the data distribution to,
in most cases, uniformly distribute PGs across OSDs.

However, there is an important caveat: this feature requires all clients to
understand the new *pg-upmap* structure in the OSDMap.

Online Optimization
===================

Enabling
--------

In order to use ``pg-upmap``, the cluster cannot have any pre-Luminous clients.
By default, new clusters enable the *balancer module*, which makes use of
``pg-upmap``. If you want to use a different balancer or you want to make your
own custom ``pg-upmap`` entries, you might want to turn off the balancer in
order to avoid conflict:

.. prompt:: bash $

   ceph balancer off
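
If you choose to manage entries by hand, exception-table entries can be
created and removed directly with ``ceph osd pg-upmap-items`` and
``ceph osd rm-pg-upmap-items``. The following is only an illustrative sketch:
the PG id ``1.7`` and the OSD ids ``3`` and ``12`` are placeholders, not
values from a real cluster. Each ``<from-osd> <to-osd>`` pair remaps that PG
from the first OSD to the second, and removing the entry lets CRUSH decide
placement again:

.. prompt:: bash $

   ceph osd pg-upmap-items 1.7 3 12
   ceph osd rm-pg-upmap-items 1.7

Existing exception-table entries appear in the output of ``ceph osd dump``.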

To allow use of the new feature on an existing cluster, you must restrict the
cluster to supporting only Luminous (and newer) clients. To do so, run the
following command:

.. prompt:: bash $

   ceph osd set-require-min-compat-client luminous

This command will fail if any pre-Luminous clients or daemons are connected to
the monitors. To see which client versions are in use, run the following
command:

.. prompt:: bash $

   ceph features
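
If ``ceph features`` shows only connections that you are certain can be
disregarded, the failing check can be overridden. This sketch assumes that
your release accepts the ``--yes-i-really-mean-it`` override for this command
(confirm with ``ceph osd set-require-min-compat-client -h``); be aware that
any genuinely pre-Luminous client will then be unable to connect:

.. prompt:: bash $

   ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it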

Balancer Module
---------------

The `balancer` module for ``ceph-mgr`` will automatically balance the number of
PGs per OSD. See :ref:`balancer`.
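
For reference, a minimal way to hand balancing over to the module in ``upmap``
mode looks like the following; this is a sketch of the common case, and the
:ref:`balancer` page remains the authoritative reference for the module's
modes and options:

.. prompt:: bash $

   ceph balancer mode upmap
   ceph balancer on
   ceph balancer status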

Offline Optimization
====================

Upmap entries are updated with an offline optimizer that is built into the
:ref:`osdmaptool`.

#. Grab the latest copy of your osdmap:

   .. prompt:: bash $

      ceph osd getmap -o om

#. Run the optimizer:

   .. prompt:: bash $

      osdmaptool om --upmap out.txt [--upmap-pool <pool>] \
      [--upmap-max <max-optimizations>] \
      [--upmap-deviation <max-deviation>] \
      [--upmap-active]

   It is highly recommended that optimization be done for each pool
   individually, or for sets of similarly utilized pools. You can specify the
   ``--upmap-pool`` option multiple times. "Similarly utilized pools" means
   pools that are mapped to the same devices and that store the same kind of
   data (for example, RBD image pools are considered to be similarly utilized;
   an RGW index pool and an RGW data pool are not considered to be similarly
   utilized).

   The ``max-optimizations`` value determines the maximum number of upmap
   entries to identify. The default is `10` (as is the case with the
   ``ceph-mgr`` balancer module), but you should use a larger number if you are
   doing offline optimization. If the tool cannot find any additional changes
   to make (that is, if the pool distribution is perfect), it will stop early.

   The ``max-deviation`` value defaults to `5`. If an OSD's PG count varies
   from the computed target number by no more than this amount, the OSD will
   be considered perfectly balanced.

   The ``--upmap-active`` option simulates the behavior of the active balancer
   in upmap mode. It keeps cycling until the OSDs are balanced and reports how
   many rounds have occurred and how long each round takes. The elapsed time
   per round indicates the CPU load that ``ceph-mgr`` will incur when it
   computes the next optimization plan.

#. Apply the changes:

   .. prompt:: bash $

      source out.txt

   In the above example, the proposed changes are written to the output file
   ``out.txt``. The commands in this procedure are normal Ceph CLI commands
   that can be run to apply the changes to the cluster.

The above steps can be repeated as many times as necessary to achieve a perfect
distribution of PGs for each set of pools; a complete worked example is shown
below.
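
As an illustration of the full workflow, the following sketch balances two
hypothetical pools named ``rbd-ssd`` and ``rbd-hdd``; the pool names and the
option values are examples only and should be adapted to your cluster:

.. prompt:: bash $

   ceph osd getmap -o om
   osdmaptool om --upmap out.txt --upmap-pool rbd-ssd --upmap-pool rbd-hdd \
       --upmap-max 100 --upmap-deviation 1
   source out.txt

Repeating this sequence with a freshly fetched osdmap until ``out.txt`` comes
back empty corresponds to the "repeat as many times as necessary" advice
above.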

To see some (gory) details about what the tool is doing, you can pass
``--debug-osd 10`` to ``osdmaptool``. To see even more details, pass
``--debug-crush 10`` to ``osdmaptool``.