====================
Backfill Reservation
====================

When a new OSD joins a cluster all PGs containing it must eventually
backfill to it. If all of these backfills happened simultaneously they
would put excessive load on the new OSD. osd_max_backfills limits the
number of outgoing and incoming backfills on a single OSD: at most
osd_max_backfills outgoing and at most osd_max_backfills incoming, so
there can be at most osd_max_backfills * 2 simultaneous backfills on
any one OSD.

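For example, with the default ``osd_max_backfills`` of 1, an OSD can be
the source of one backfill and the target of another at the same time,
but no more.
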
Each OSDService now has two AsyncReserver instances: one for backfills
going from the OSD (local_reserver) and one for backfills going to the
OSD (remote_reserver). An AsyncReserver (common/AsyncReserver.h)
manages a set of current reservation holders and, per priority, a queue
of waiting items. When a slot frees up, the AsyncReserver takes the
next item from the highest-priority queue and schedules its Context*
on the finisher provided to the constructor.

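For illustration, here is a minimal sketch of how the two reservers are
created and used. The AsyncReserver signatures follow
common/AsyncReserver.h, but the surrounding pieces (``cct``, ``pgid``,
``priority``, the Context body) are schematic stand-ins rather than the
actual OSD code:

.. code-block:: cpp

    #include "common/AsyncReserver.h"
    #include "common/Finisher.h"

    // Both reservers are capped by osd_max_backfills; OSDService also
    // passes osd_min_recovery_priority as the minimum priority
    // (omitted here).
    Finisher reserver_finisher(cct);
    AsyncReserver<spg_t, Finisher> local_reserver(
      cct, &reserver_finisher, cct->_conf->osd_max_backfills);
    AsyncReserver<spg_t, Finisher> remote_reserver(
      cct, &reserver_finisher, cct->_conf->osd_max_backfills);

    // A PG queues for a slot; the Context fires on the finisher once
    // the reservation is granted. An optional fourth argument is
    // invoked instead if a higher-priority request preempts this one.
    local_reserver.request_reservation(
      pgid,                       // reservation key (one slot per PG)
      new LambdaContext([](int) { /* e.g. queue LocalBackfillReserved */ }),
      priority);

    // Dropped on Backfilled or on leaving Active/ReplicaActive:
    local_reserver.cancel_reservation(pgid);
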
For a primary to initiate a backfill, it must first obtain a
reservation from its own local_reserver. Then, it must obtain a
reservation from the backfill target's remote_reserver via an
MBackfillReserve message. This process is managed by substates of
Active and ReplicaActive (see the substates of Active in PG.h). The
reservations are dropped either on the Backfilled event (which is sent
on the primary before calling recovery_complete, and on the replica on
receipt of the BackfillComplete progress message), or upon leaving
Active or ReplicaActive.

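Schematically, the handshake and teardown look like this (a sketch;
the exact events and transitions live in the PG state machine)::

    primary                                backfill target
    -------                                ---------------
    local_reserver.request_reservation()
      <local reservation granted>
    MBackfillReserve(REQUEST)   -------->  remote_reserver.request_reservation()
                                             <remote reservation granted>
                                <--------  MBackfillReserve(GRANT)
    ...backfill proceeds...
    Backfilled event            -------->  BackfillComplete progress message
    (drops the local                       (drops the remote reservation)
    reservation)
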
It's important that we always grab the local reservation before the
remote reservation in order to prevent a circular dependency: because
every OSD acquires the two reservations in the same order, two OSDs
can never each be holding a reservation that the other is waiting for.

We want to minimize the risk of data loss by prioritizing the order in
which PGs are recovered. A user can override the default order with the
``ceph pg force-recovery`` or ``ceph pg force-backfill`` commands. A
force-recovery at priority 255 will start before a force-backfill at
priority 254.

If a recovery is needed because a PG is below min_size, a base priority
of 220 is used. The number of OSDs below the pool's min_size is added,
as is a value relative to the pool's recovery_priority; the total is
capped at 253. Under ordinary circumstances a recovery is prioritized
at a base of 180 plus a value relative to the pool's recovery_priority,
capped at 219.

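For example, an ordinary recovery on a pool whose recovery_priority
contributes +5 would run at 180 + 5 = 185, comfortably below the 219
cap.
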
If a backfill is needed because the number of acting OSDs is less than
min_size, a base priority of 220 is used. The number of OSDs below the
pool's min_size is added, as is a value relative to the pool's
recovery_priority; the total is capped at 253. If a backfill is needed
because a PG is undersized, a base priority of 140 is used. The number
of OSDs below the pool's size is added, as is a value relative to the
pool's recovery_priority; the total is capped at 179. If a backfill is
needed because a PG is degraded, a base priority of 140 is used, plus a
value relative to the pool's recovery_priority; the total is capped at
179. Under ordinary circumstances a base priority of 100 is used, plus
a value relative to the pool's recovery_priority; the total is capped
at 139.

=================  =============  ================
Description        Base priority  Maximum priority
=================  =============  ================
Backfill           100            139
Degraded Backfill  140            179
Recovery           180            219
Inactive Recovery  220            253
Inactive Backfill  220            253
force-backfill     254
force-recovery     255
=================  =============  ================
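
As a rough sketch of how these bands combine (the real logic lives in
PG::get_recovery_priority() and PG::get_backfill_priority(), with the
base and cap values defined in src/osd/osd_types.h; ``priority_for``
and its arguments are hypothetical):

.. code-block:: cpp

    #include <algorithm>

    // Base priorities and caps for each band, per the table above.
    constexpr int BACKFILL_BASE          = 100;  // capped at 139
    constexpr int DEGRADED_BACKFILL_BASE = 140;  // capped at 179
    constexpr int RECOVERY_BASE          = 180;  // capped at 219
    constexpr int INACTIVE_BASE          = 220;  // capped at 253
    constexpr int FORCED_BACKFILL        = 254;  // force-backfill
    constexpr int FORCED_RECOVERY        = 255;  // force-recovery

    // Hypothetical helper: add the count of missing OSDs (zero for the
    // bands that do not use it) and the pool's recovery_priority
    // adjustment to the band's base, then cap at the band's maximum.
    int priority_for(int base, int cap, int pool_adjust,
                     int osds_missing = 0)
    {
      return std::min(base + osds_missing + pool_adjust, cap);
    }

    // Example: inactive recovery with two OSDs below min_size on a
    // pool whose recovery_priority contributes +5:
    //   priority_for(INACTIVE_BASE, 253, 5, 2) == 227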