.. _upmap:

Using the pg-upmap
==================

Starting in Luminous v12.2.z there is a new *pg-upmap* exception table
in the OSDMap that allows the cluster to explicitly map specific PGs to
specific OSDs. This allows the cluster to fine-tune the data
distribution to, in most cases, perfectly distribute PGs across OSDs.

The key caveat to this new mechanism is that it requires that all
clients understand the new *pg-upmap* structure in the OSDMap.

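The exception table is normally populated for you by the balancer
module or by the offline optimizer described below, both of which emit
ordinary CLI commands. As a rough illustration, a command that remaps
one replica of a hypothetical PG ``1.7`` from OSD 3 to OSD 5 would look
like::

  ceph osd pg-upmap-items 1.7 3 5
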
Enabling
--------

New clusters will have the ``balancer`` module on by default. The
cluster must only have luminous (and newer) clients. You can turn the
balancer off with::

  ceph balancer off

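If you want to confirm whether the automatic balancer is currently
active, before or after toggling it, you can query its status with::

  ceph balancer status
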
To allow use of the feature on existing clusters, you must tell the
cluster that it only needs to support luminous (and newer) clients with::

  ceph osd set-require-min-compat-client luminous

This command will fail if any pre-luminous clients or daemons are
connected to the monitors. You can see what client versions are in
use with::

  ceph features

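If the ``ceph features`` output shows only stale pre-luminous entries
(for example, clients that have long since disconnected), the setting
can be forced; recent releases accept an override flag for this, which
should be used with care::

  ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
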
Balancer module
---------------

The `balancer` module for ceph-mgr will automatically balance
the number of PGs per OSD. See :ref:`balancer`.

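For example, a minimal way to hand balancing over to the module in
upmap mode is to select that mode and then enable the balancer (both
commands are described in the balancer documentation)::

  ceph balancer mode upmap
  ceph balancer on
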

Offline optimization
--------------------

Upmap entries are updated with an offline optimizer built into
``osdmaptool``; a complete example pass is shown after the steps below.

#. Grab the latest copy of your osdmap::

     ceph osd getmap -o om

#. Run the optimizer::

     osdmaptool om --upmap out.txt [--upmap-pool <pool>]
       [--upmap-max <max-optimizations>] [--upmap-deviation <max-deviation>]
       [--upmap-active]

   It is highly recommended that optimization be done for each pool
   individually, or for sets of similarly-utilized pools. You can
   specify the ``--upmap-pool`` option multiple times. "Similar pools"
   means pools that are mapped to the same devices and store the same
   kind of data (e.g., RBD image pools, yes; RGW index pool and RGW
   data pool, no).

   The ``max-optimizations`` value is the maximum number of upmap entries
   to identify in the run. The default is `10`, the same as the ceph-mgr
   balancer module, but you should use a larger number if you are doing
   offline optimization. If it cannot find any additional changes to make
   (i.e., the pool distribution is already perfect), it will stop early.

   The ``max-deviation`` value defaults to `5`. If an OSD's PG count
   varies from the computed target number by no more than this amount,
   it is considered perfect.

   The ``--upmap-active`` option simulates the behavior of the active
   balancer in upmap mode. It keeps cycling until the OSDs are balanced
   and reports how many rounds it takes and how long each round lasts.
   The elapsed time per round indicates the CPU load that ceph-mgr will
   incur when it computes the next optimization plan.

#. Apply the changes::

     source out.txt

   The proposed changes are written to the output file ``out.txt`` in
   the example above. These are normal ceph CLI commands that can be
   run to apply the changes to the cluster.

The above steps can be repeated as many times as necessary to achieve
a perfect distribution of PGs for each set of pools.

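For example, a single pass over one hypothetical pool named ``rbd``,
using a larger optimization budget and a tighter deviation target than
the defaults, might look like this (adjust the pool name and values for
your cluster)::

  ceph osd getmap -o om
  osdmaptool om --upmap out.txt --upmap-pool rbd \
      --upmap-max 100 --upmap-deviation 1
  source out.txt
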
You can see some (gory) details about what the tool is doing by
passing ``--debug-osd 10`` and even more with ``--debug-crush 10``
to ``osdmaptool``.
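
For instance, to repeat the optimizer run from the example above with
verbose output (reusing the ``om`` and ``out.txt`` names), you might
run::

  osdmaptool om --upmap out.txt --debug-osd 10 --debug-crush 10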