.. _upmap:

Using pg-upmap
==============

In Luminous v12.2.z and later releases, there is a *pg-upmap* exception table
in the OSDMap that allows the cluster to explicitly map specific PGs to
specific OSDs. This allows the cluster to fine-tune the data distribution to,
in most cases, uniformly distribute PGs across OSDs.
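
Conceptually, each entry in the exception table is a list of ``(from, to)``
OSD pairs that are substituted into the PG's CRUSH-computed OSD set before it
is handed to clients. The following Python sketch (hypothetical helper name,
not Ceph source code) illustrates the idea:

```python
# Minimal sketch (not Ceph code) of how a pg-upmap-items exception
# overrides the mapping that CRUSH computed for a PG: each (from, to)
# pair replaces one OSD in the PG's OSD set, leaving the rest alone.

def apply_upmap_items(crush_mapping, upmap_items):
    """Apply (from_osd, to_osd) substitutions to a CRUSH mapping."""
    remap = dict(upmap_items)
    return [remap.get(osd, osd) for osd in crush_mapping]

# CRUSH wants this PG on OSDs [3, 1, 7]; the exception moves 7 -> 4.
print(apply_upmap_items([3, 1, 7], [(7, 4)]))  # [3, 1, 4]
```

Because clients apply these substitutions themselves when computing object
locations, every client must understand the table's format.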

However, there is an important caveat: this feature requires all clients to
understand the new *pg-upmap* structure in the OSDMap.

Enabling
--------

In order to use ``pg-upmap``, the cluster cannot have any pre-Luminous clients.
By default, new clusters enable the *balancer module*, which makes use of
``pg-upmap``. If you want to use a different balancer or you want to make your
own custom ``pg-upmap`` entries, you might want to turn off the balancer in
order to avoid conflict:

.. prompt:: bash $

   ceph balancer off

To allow use of the new feature on an existing cluster, you must restrict the
cluster to supporting only Luminous (and newer) clients. To do so, run the
following command:

.. prompt:: bash $

   ceph osd set-require-min-compat-client luminous

This command will fail if any pre-Luminous clients or daemons are connected to
the monitors. To see which client versions are in use, run the following
command:

.. prompt:: bash $

   ceph features

Balancer module
---------------

The ``balancer`` module for ``ceph-mgr`` will automatically balance the number
of PGs per OSD. See :ref:`balancer`.

Offline optimization
--------------------

Upmap entries are updated with an offline optimizer that is built into
``osdmaptool``.

#. Grab the latest copy of your osdmap:

   .. prompt:: bash $

      ceph osd getmap -o om

#. Run the optimizer:

   .. prompt:: bash $

      osdmaptool om --upmap out.txt [--upmap-pool <pool>] \
      [--upmap-max <max-optimizations>] \
      [--upmap-deviation <max-deviation>] \
      [--upmap-active]

   It is highly recommended that optimization be done for each pool
   individually, or for sets of similarly utilized pools. You can specify the
   ``--upmap-pool`` option multiple times. "Similarly utilized pools" means
   pools that are mapped to the same devices and that store the same kind of
   data (for example, RBD image pools are considered to be similarly utilized;
   an RGW index pool and an RGW data pool are not considered to be similarly
   utilized).

   The ``max-optimizations`` value determines the maximum number of upmap
   entries to identify. The default is `10` (as is the case with the
   ``ceph-mgr`` balancer module), but you should use a larger number if you
   are doing offline optimization. If the tool cannot find any additional
   changes to make (that is, if the pool distribution is perfect), it will
   stop early.

   The ``max-deviation`` value defaults to `5`. If an OSD's PG count varies
   from the computed target number by no more than this amount, it will be
   considered perfect.

   The ``--upmap-active`` option simulates the behavior of the active
   balancer in upmap mode. It keeps cycling until the OSDs are balanced and
   reports how many rounds have occurred and how long each round takes. The
   elapsed time for rounds indicates the CPU load that ``ceph-mgr`` consumes
   when it computes the next optimization plan.

#. Apply the changes:

   .. prompt:: bash $

      source out.txt

   In the above example, the proposed changes are written to the output file
   ``out.txt``. The commands in this procedure are normal Ceph CLI commands
   that can be run in order to apply the changes to the cluster.

The above steps can be repeated as many times as necessary to achieve a
perfect distribution of PGs for each set of pools.

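As a rough illustration of the ``max-deviation`` criterion described above,
the following Python sketch (a hypothetical helper, not Ceph source; the real
optimizer also weights targets by CRUSH weight and device size) checks
whether every OSD's PG count is within the allowed deviation of a uniform
target:

```python
# Sketch (not Ceph code) of the stopping criterion: an OSD is "perfect"
# when its PG count is within max-deviation of the computed target, and
# the optimizer stops early once every OSD qualifies.

def is_balanced(pg_counts, max_deviation=5):
    """pg_counts maps osd id -> number of PGs currently on that OSD."""
    target = sum(pg_counts.values()) / len(pg_counts)
    return all(abs(n - target) <= max_deviation
               for n in pg_counts.values())

# 300 PGs over 3 OSDs: the uniform target is 100 PGs each.
print(is_balanced({0: 103, 1: 99, 2: 98}))  # True: all within 5 of 100
print(is_balanced({0: 140, 1: 90, 2: 70}))  # False: OSD 0 is 40 over
```

A smaller ``max-deviation`` demands a tighter distribution but may generate
many more upmap entries before the optimizer declares the cluster perfect.
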
To see some (gory) details about what the tool is doing, you can pass
``--debug-osd 10`` to ``osdmaptool``. To see even more details, pass
``--debug-crush 10`` to ``osdmaptool``.