.. Source: ceph/doc/rados/operations/upmap.rst in ceph.git at git.proxmox.com (commit "update ceph source to reef 18.1.2")

.. _upmap:

Using pg-upmap
==============

In Luminous v12.2.z and later releases, there is a *pg-upmap* exception table
in the OSDMap that allows the cluster to explicitly map specific PGs to
specific OSDs. This allows the cluster to fine-tune the data distribution to,
in most cases, uniformly distribute PGs across OSDs.

However, there is an important caveat when it comes to this new feature: it
requires all clients to understand the new *pg-upmap* structure in the OSDMap.

Enabling
--------

In order to use ``pg-upmap``, the cluster cannot have any pre-Luminous clients.
By default, new clusters enable the *balancer module*, which makes use of
``pg-upmap``. If you want to use a different balancer or you want to make your
own custom ``pg-upmap`` entries, you might want to turn off the balancer in
order to avoid conflict:

.. prompt:: bash $

   ceph balancer off

To allow use of the new feature on an existing cluster, you must restrict the
cluster to supporting only Luminous (and newer) clients. To do so, run the
following command:

.. prompt:: bash $

   ceph osd set-require-min-compat-client luminous

This command will fail if any pre-Luminous clients or daemons are connected to
the monitors. To see which client versions are in use, run the following
command:

.. prompt:: bash $

   ceph features
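
To confirm that the minimum-client setting took effect, the release name that
the cluster reports can be compared against ``luminous``. The following is a
minimal sketch, not an official tool: the helper name ``is_luminous_or_newer``
is hypothetical, and the sketch assumes that
``ceph osd get-require-min-compat-client`` prints a bare release name. It
relies on Ceph release names being in alphabetical order (jewel, kraken,
luminous, mimic, and so on).

.. code-block:: bash

   # Hypothetical helper (not part of Ceph): succeed if the given release
   # name sorts at or after "luminous". Ceph release names are alphabetical,
   # so a plain string comparison is enough for this rough check.
   is_luminous_or_newer() {
       [[ ! "$1" < "luminous" ]]
   }

   # Offline demonstration with literal release names.
   for release in jewel luminous reef; do
       if is_luminous_or_newer "$release"; then
           echo "$release: ok"
       else
           echo "$release: too old"
       fi
   done

On a live cluster, the argument could be supplied as
``"$(ceph osd get-require-min-compat-client)"``.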

Balancer module
---------------

The ``balancer`` module for ``ceph-mgr`` will automatically balance the number
of PGs per OSD. See :ref:`balancer`.


Offline optimization
--------------------

Upmap entries are updated with an offline optimizer that is built into
``osdmaptool``.

#. Grab the latest copy of your osdmap:

   .. prompt:: bash $

      ceph osd getmap -o om

#. Run the optimizer:

   .. prompt:: bash $

      osdmaptool om --upmap out.txt [--upmap-pool <pool>] \
      [--upmap-max <max-optimizations>] \
      [--upmap-deviation <max-deviation>] \
      [--upmap-active]

   It is highly recommended that optimization be done for each pool
   individually, or for sets of similarly utilized pools. You can specify the
   ``--upmap-pool`` option multiple times. "Similarly utilized pools" means
   pools that are mapped to the same devices and that store the same kind of
   data (for example, RBD image pools are considered to be similarly utilized;
   an RGW index pool and an RGW data pool are not considered to be similarly
   utilized).

   The ``max-optimizations`` value determines the maximum number of upmap
   entries to identify. The default is ``10`` (as is the case with the
   ``ceph-mgr`` balancer module), but you should use a larger number if you are
   doing offline optimization. If the optimizer cannot find any additional
   changes to make (that is, if the pool distribution is perfect), it will
   stop early.

   The ``max-deviation`` value defaults to ``5``. If an OSD's PG count varies
   from the computed target number by no more than this amount, that OSD is
   considered perfect.

   The ``--upmap-active`` option simulates the behavior of the active balancer
   in upmap mode. It keeps cycling until the OSDs are balanced and reports how
   many rounds have occurred and how long each round takes. The elapsed time
   for rounds indicates the CPU load that ``ceph-mgr`` consumes when it computes
   the next optimization plan.

#. Apply the changes:

   .. prompt:: bash $

      source out.txt

   In the above example, the proposed changes are written to the output file
   ``out.txt``. The commands in this file are normal Ceph CLI commands
   that can be run in order to apply the changes to the cluster.
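
Before applying the changes, it can be useful to see how many entries the
optimizer proposes. The following sketch counts the commands in a file of the
kind that ``osdmaptool --upmap`` writes. The helper name ``summarize_upmap``,
the file name ``sample-out.txt``, and the PG and OSD IDs are made up for
illustration. The sketch assumes that the file contains only
``ceph osd pg-upmap-items`` and ``ceph osd rm-pg-upmap-items`` commands, one
per line.

.. code-block:: bash

   # Sample file in the assumed output format (PG and OSD IDs are invented).
   cat > sample-out.txt <<'EOF'
   ceph osd pg-upmap-items 1.7 3 5
   ceph osd pg-upmap-items 1.1a 0 4 2 6
   ceph osd rm-pg-upmap-items 1.3
   EOF

   # Hypothetical helper: count the upmap entries that the file would add
   # and the entries that it would remove.
   summarize_upmap() {
       local added removed
       added=$(grep -c '^ceph osd pg-upmap-items ' "$1")
       removed=$(grep -c '^ceph osd rm-pg-upmap-items ' "$1")
       echo "add=${added} remove=${removed}"
   }

   summarize_upmap sample-out.txt   # prints: add=2 remove=1

Running the same helper against a real ``out.txt`` gives a quick sense of how
much data movement a ``source out.txt`` is likely to trigger.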

The above steps can be repeated as many times as necessary to achieve a perfect
distribution of PGs for each set of pools.
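
What counts as a "perfect" distribution depends on the ``--upmap-deviation``
value. The following sketch illustrates the rule with invented numbers. The
helper ``is_balanced`` is hypothetical, and it assumes one integer target for
all OSDs. That is a simplification: the optimizer computes its own target for
the OSDs that it examines.

.. code-block:: bash

   # Hypothetical helper: is_balanced TARGET MAX_DEVIATION COUNT...
   # Succeeds only if every PG count is within MAX_DEVIATION of TARGET.
   is_balanced() {
       local target=$1 max_dev=$2 count diff
       shift 2
       for count in "$@"; do
           diff=$(( count - target ))
           if (( diff < 0 )); then
               diff=$(( -diff ))
           fi
           if (( diff > max_dev )); then
               return 1
           fi
       done
       return 0
   }

   # Target of 100 PGs per OSD with the default deviation of 5.
   is_balanced 100 5 97 104 105 && echo "within deviation 5"

   # A stricter deviation of 1 rejects the same PG counts.
   is_balanced 100 1 97 104 105 || echo "outside deviation 1"

With the default deviation, an OSD that is five PGs away from its target is
still treated as balanced, which is why a smaller value can be worth passing
for offline runs.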

To see some (gory) details about what the tool is doing, you can pass
``--debug-osd 10`` to ``osdmaptool``. To see even more details, pass
``--debug-crush 10`` to ``osdmaptool``.
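
The options described in this section can be combined in one invocation. The
following sketch only assembles and prints a command line so that it can be
inspected; it does not run ``osdmaptool``. The helper ``build_upmap_cmd`` and
the pool names ``rbd-a`` and ``rbd-b`` are hypothetical, and the values ``100``
and ``1`` are arbitrary examples.

.. code-block:: bash

   # Hypothetical helper:
   #   build_upmap_cmd OSDMAP OUTFILE MAX_OPTIMIZATIONS MAX_DEVIATION POOL...
   # Prints an osdmaptool command line with one --upmap-pool per pool and
   # with OSD debug output enabled.
   build_upmap_cmd() {
       local osdmap=$1 out=$2 max=$3 dev=$4 pool
       shift 4
       local cmd="osdmaptool ${osdmap} --upmap ${out}"
       cmd+=" --upmap-max ${max} --upmap-deviation ${dev}"
       for pool in "$@"; do
           cmd+=" --upmap-pool ${pool}"
       done
       echo "${cmd} --debug-osd 10"
   }

   build_upmap_cmd om out.txt 100 1 rbd-a rbd-b

Printing the command first makes it easy to check that ``--upmap-pool`` is
repeated once for each pool in a set of similarly utilized pools.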