.. _rgw_dynamic_bucket_index_resharding:

===================================
RGW Dynamic Bucket Index Resharding
===================================

.. versionadded:: Luminous

A large bucket index can lead to performance problems, which can
be addressed by sharding bucket indexes.
Until Luminous, changing the number of bucket shards (resharding)
needed to be done offline, with RGW services disabled.
Since the Luminous release Ceph has supported online bucket resharding.

Each bucket index shard can handle its entries efficiently up until
reaching a certain threshold. If this threshold is
exceeded, the system can suffer from performance issues. The dynamic
resharding feature detects this situation and automatically increases
the number of shards used by a bucket's index, resulting in a
reduction of the number of entries in each shard. This
process is transparent to the user. Writes to the target bucket
are blocked (but reads are not) briefly during the resharding process.

By default dynamic bucket index resharding can only increase the
number of bucket index shards to 1999, although this upper bound is a
configuration parameter (see Configuration below). When
possible, the process chooses a prime number of shards in order to
spread the number of entries across the bucket index
shards more evenly.

Detection of resharding opportunities runs as a background process
that periodically
scans all buckets. A bucket that requires resharding is added to
a queue. A thread runs in the background and processes the queued
resharding tasks, one at a time and in order.

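To see which buckets are approaching the per-shard threshold, the
``radosgw-admin bucket limit check`` subcommand reports per-bucket shard
fill status relative to the configured objects-per-shard limit. A quick
check looks like this (the exact output fields vary by release):

::

  # radosgw-admin bucket limit check
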
37Multisite
38=========
81eedcae 39
aee94f69
TL
40With Ceph releases Prior to Reef, the Ceph Object Gateway (RGW) does not support
41dynamic resharding in a
1e59de90
TL
42multisite environment. For information on dynamic resharding, see
43:ref:`Resharding <feature_resharding>` in the RGW multisite documentation.
11fdf7f2
TL
44
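As a sketch of the Reef-era workflow (the multisite documentation linked
above is authoritative), dynamic resharding in a multisite configuration is
gated behind the ``resharding`` zone feature, which can be enabled on a
zonegroup roughly as follows:

::

  # radosgw-admin zonegroup modify --rgw-zonegroup=<zonegroup_name> --enable-feature=resharding
  # radosgw-admin period update --commit
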
Configuration
=============

Enable/Disable dynamic bucket index resharding:

- ``rgw_dynamic_resharding``: true/false, default: true

Configuration options that control the resharding process:

- ``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard before resharding is triggered, default: 100000

- ``rgw_max_dynamic_shards``: maximum number of bucket index shards that dynamic resharding can increase to, default: 1999

- ``rgw_reshard_bucket_lock_duration``: duration, in seconds, that writes to the bucket are locked during resharding, default: 360 (i.e., 6 minutes)

- ``rgw_reshard_thread_interval``: maximum time, in seconds, between rounds of resharding queue processing, default: 600 (i.e., 10 minutes)

- ``rgw_reshard_num_logs``: number of shards for the resharding queue, default: 16

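For example, these options can be adjusted cluster-wide through the monitor
configuration database; the values below are illustrative, not
recommendations:

::

  # ceph config set client.rgw rgw_dynamic_resharding false
  # ceph config set client.rgw rgw_max_objs_per_shard 200000
  # ceph config set client.rgw rgw_max_dynamic_shards 3999
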
Admin commands
==============

Add a bucket to the resharding queue
------------------------------------

::

  # radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>

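To choose a sensible ``--num-shards`` value, it can help to inspect the
bucket's current shard count and object count first; ``radosgw-admin bucket
stats`` reports these (recent releases include a ``num_shards`` field in the
output):

::

  # radosgw-admin bucket stats --bucket <bucket_name>
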
List resharding queue
---------------------

::

  # radosgw-admin reshard list

Process tasks on the resharding queue
-------------------------------------

::

  # radosgw-admin reshard process

Bucket resharding status
------------------------

::

  # radosgw-admin reshard status --bucket <bucket_name>

The output is a JSON array with one entry per bucket index shard, each
containing three fields: ``reshard_status``, ``new_bucket_instance_id``, and
``num_shards``.

For example, the output at each dynamic resharding stage is shown below:

``1. Before resharding occurred:``
::

  [
    {
      "reshard_status": "not-resharding",
      "new_bucket_instance_id": "",
      "num_shards": -1
    }
  ]

``2. During resharding:``
::

  [
    {
      "reshard_status": "in-progress",
      "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
      "num_shards": 2
    },
    {
      "reshard_status": "in-progress",
      "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
      "num_shards": 2
    }
  ]

``3. After resharding completed:``
::

  [
    {
      "reshard_status": "not-resharding",
      "new_bucket_instance_id": "",
      "num_shards": -1
    },
    {
      "reshard_status": "not-resharding",
      "new_bucket_instance_id": "",
      "num_shards": -1
    }
  ]


Cancel pending bucket resharding
--------------------------------

Note: Bucket resharding operations cannot be cancelled while executing. ::

  # radosgw-admin reshard cancel --bucket <bucket_name>

Manual immediate bucket resharding
----------------------------------

::

  # radosgw-admin bucket reshard --bucket <bucket_name> --num-shards <new number of shards>

When choosing a number of shards, the administrator must anticipate each
bucket's peak number of objects. Ideally one should aim for no
more than 100000 entries per shard at any given time.

Additionally, a prime number of bucket index shards is more effective
at evenly distributing bucket index entries across the shards.
For example, 7001 bucket index shards is better than 7000
since the former is prime. A variety of web sites have lists of prime
numbers; search for "list of prime numbers" with your favorite
search engine to locate some web sites.

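As a hypothetical sizing example: a bucket expected to peak at roughly
5 million objects needs at least 5,000,000 / 100,000 = 50 shards, and
rounding up to the next prime gives 53:

::

  # radosgw-admin bucket reshard --bucket <bucket_name> --num-shards 53
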
Troubleshooting
===============

Clusters prior to Luminous 12.2.11 and Mimic 13.2.5 left behind stale bucket
instance entries, which were not automatically cleaned up. This issue also affected
LifeCycle policies, which were no longer applied to resharded buckets. Both of
these issues could be worked around by running ``radosgw-admin`` commands.

Stale instance management
-------------------------

List the stale instances in a cluster that are ready to be cleaned up.

::

  # radosgw-admin reshard stale-instances list

Clean up the stale instances in a cluster. Note: cleanup of these
instances should only be done on a single-site cluster.

::

  # radosgw-admin reshard stale-instances rm

Lifecycle fixes
---------------

For clusters with resharded instances, it is highly likely that the old
lifecycle process flagged and then dropped lifecycle processing for the
bucket when the bucket instance changed during a reshard. While this is fixed
for buckets deployed on newer Ceph releases (from Mimic 13.2.6 and Luminous
12.2.12), older buckets that had lifecycle policies and that have undergone
resharding must be fixed manually.

The command to do so is:

::

  # radosgw-admin lc reshard fix --bucket {bucketname}


If the ``--bucket`` argument is not provided, this
command will try to fix lifecycle policies for all the buckets in the cluster.
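
After running the fix, ``radosgw-admin lc list`` can be used to confirm that
the bucket appears for lifecycle processing again (the exact status fields in
the output vary by release):

::

  # radosgw-admin lc list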

Object Expirer fixes
--------------------

Objects subject to Swift object expiration on older clusters may have
been dropped from the log pool and never deleted after the bucket was
resharded. This would happen if their expiration time was before the
cluster was upgraded, but if their expiration was after the upgrade
the objects would be correctly handled. To manage these expire-stale
objects, ``radosgw-admin`` provides two subcommands.

Listing:

::

  # radosgw-admin objects expire-stale list --bucket {bucketname}

Displays a list of object names and expiration times in JSON format.

Deleting:

::

  # radosgw-admin objects expire-stale rm --bucket {bucketname}


Initiates deletion of such objects, displaying a list of object names, expiration times, and deletion status in JSON format.