.. _rgw_dynamic_bucket_index_resharding:

===================================
RGW Dynamic Bucket Index Resharding
===================================

.. versionadded:: Luminous

A large bucket index can lead to performance problems, which can be
addressed by sharding the bucket index. Until Luminous, changing the
number of bucket index shards (resharding) could be done only offline,
with RGW services disabled. Since the Luminous release, Ceph has
supported online bucket resharding.

Each bucket index shard can handle its entries efficiently up to a
certain threshold. If this threshold is exceeded, the system can
suffer performance problems. The dynamic resharding feature detects
this situation and automatically increases the number of shards used
by a bucket's index, reducing the number of entries in each shard.
This process is transparent to the user. Writes to the target bucket
are briefly blocked (reads are not) during the resharding process.

By default, dynamic bucket index resharding can increase the number of
bucket index shards only up to 1999, although this upper bound is a
configuration parameter (see Configuration below). When possible, the
process chooses a prime number of shards in order to spread the
entries across the bucket index shards more evenly.

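As a rough illustration, assume the default threshold of 100000
objects per shard (``rgw_max_objs_per_shard``): a bucket holding about
1500000 objects averages more than 100000 entries per shard whenever
it has fewer than 15 shards, so dynamic resharding would raise its
shard count to at least 15 and, when possible, choose a nearby prime
such as 17.
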
Detection of resharding opportunities runs as a background process
that periodically scans all buckets. A bucket that requires resharding
is added to a queue. A background thread processes the queued
resharding tasks, one at a time and in order.

Multisite
=========

In Ceph releases prior to Reef, the Ceph Object Gateway (RGW) does not
support dynamic resharding in a multisite environment. For information
on resharding in multisite deployments, see
:ref:`Resharding <feature_resharding>` in the RGW multisite documentation.

Configuration
=============

Enable/Disable dynamic bucket index resharding:

- ``rgw_dynamic_resharding``: true/false, default: true

Configuration options that control the resharding process (see the
example after this list for one way to set them):

- ``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard before resharding is triggered, default: 100000

- ``rgw_max_dynamic_shards``: maximum number of bucket index shards that dynamic resharding can increase to, default: 1999

- ``rgw_reshard_bucket_lock_duration``: duration, in seconds, that writes to the bucket are locked during resharding, default: 360 (i.e., 6 minutes)

- ``rgw_reshard_thread_interval``: maximum time, in seconds, between rounds of resharding queue processing, default: 600 (i.e., 10 minutes)

- ``rgw_reshard_num_logs``: number of shards for the resharding queue, default: 16
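
For example, on a cluster that uses the centralized configuration
database, the per-shard threshold and the write-lock duration could be
adjusted with commands like the following (a sketch; the ``client.rgw``
target is a common choice, adjust it to match your gateway daemons):

::

  # ceph config set client.rgw rgw_max_objs_per_shard 50000
  # ceph config set client.rgw rgw_reshard_bucket_lock_duration 600

Alternatively, the same options can be set in ``ceph.conf`` on the RGW
hosts, after which the gateways must be restarted.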

Admin commands
==============

Add a bucket to the resharding queue
------------------------------------

::

  # radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>

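For example, to queue a hypothetical bucket named ``mybucket`` to be
resharded into 23 shards:

::

  # radosgw-admin reshard add --bucket mybucket --num-shards 23
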
List resharding queue
---------------------

::

  # radosgw-admin reshard list

Process tasks on the resharding queue
-------------------------------------

::

  # radosgw-admin reshard process

Bucket resharding status
------------------------

::

  # radosgw-admin reshard status --bucket <bucket_name>

The output is a JSON array with one entry per bucket index shard; each
entry reports three fields: ``reshard_status``, ``new_bucket_instance_id``,
and ``num_shards``.

For example, the output at each stage of dynamic resharding is shown below:

``1. Before resharding occurred:``
::

  [
      {
          "reshard_status": "not-resharding",
          "new_bucket_instance_id": "",
          "num_shards": -1
      }
  ]

``2. During resharding:``
::

  [
      {
          "reshard_status": "in-progress",
          "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
          "num_shards": 2
      },
      {
          "reshard_status": "in-progress",
          "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
          "num_shards": 2
      }
  ]

``3. After resharding completed:``
::

  [
      {
          "reshard_status": "not-resharding",
          "new_bucket_instance_id": "",
          "num_shards": -1
      },
      {
          "reshard_status": "not-resharding",
          "new_bucket_instance_id": "",
          "num_shards": -1
      }
  ]

Cancel pending bucket resharding
--------------------------------

Note: a bucket resharding operation cannot be cancelled once it has
begun executing. ::

  # radosgw-admin reshard cancel --bucket <bucket_name>

Manual immediate bucket resharding
----------------------------------

::

  # radosgw-admin bucket reshard --bucket <bucket_name> --num-shards <new number of shards>

When choosing a number of shards, the administrator should anticipate
each bucket's peak number of objects. Ideally, aim for no more than
100000 entries per shard at any given time.

Additionally, a prime number of bucket index shards spreads bucket
index entries across the shards more evenly. For example, 7001 bucket
index shards is better than 7000 because the former is prime. Lists of
prime numbers are readily available online; search for "list of prime
numbers" with your favorite search engine.
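
As an illustration (the bucket name here is hypothetical), a bucket
expected to peak at about 2000000 objects needs at least
2000000 / 100000 = 20 shards; rounding up to the nearby prime 23
leaves some headroom:

::

  # radosgw-admin bucket reshard --bucket mybucket --num-shards 23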

Troubleshooting
===============

Clusters prior to Luminous 12.2.11 and Mimic 13.2.5 left behind stale
bucket instance entries, which were not automatically cleaned up. This
issue also affected lifecycle policies, which were no longer applied to
resharded buckets. Both of these issues can be worked around by running
the ``radosgw-admin`` commands described below.

Stale instance management
-------------------------

List the stale bucket instances in a cluster that are ready to be
cleaned up:

::

  # radosgw-admin reshard stale-instances list

Clean up the stale bucket instances in a cluster. Note: cleanup of
these instances should be done only on a single-site cluster:

::

  # radosgw-admin reshard stale-instances rm

Lifecycle fixes
---------------

For clusters with resharded bucket instances, it is highly likely that
the old lifecycle process flagged and deleted the lifecycle processing
entries when the bucket instance changed during a reshard. While this
is fixed in newer Ceph releases (Mimic 13.2.6 and Luminous 12.2.12
onward), older buckets that had lifecycle policies and have undergone
resharding must be fixed manually with the following command:

::

  # radosgw-admin lc reshard fix --bucket {bucketname}

If the ``--bucket`` argument is not provided, this command will try to
fix lifecycle policies for all the buckets in the cluster.
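
For example, to run the fix across every bucket in the cluster in one
pass (as described above), omit the ``--bucket`` argument:

::

  # radosgw-admin lc reshard fix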

Object Expirer fixes
--------------------

Objects subject to Swift object expiration on older clusters may have
been dropped from the log pool and never deleted after their bucket was
resharded. This would happen if their expiration time was before the
cluster was upgraded; objects whose expiration time fell after the
upgrade are handled correctly. To manage these expire-stale objects,
``radosgw-admin`` provides two subcommands.

222 | |
223 | Listing: | |
224 | ||
225 | :: | |
226 | ||
227 | # radosgw-admin objects expire-stale list --bucket {bucketname} | |
228 | ||
229 | Displays a list of object names and expiration times in JSON format. | |
230 | ||
231 | Deleting: | |
232 | ||
233 | :: | |
234 | ||
235 | # radosgw-admin objects expire-stale rm --bucket {bucketname} | |
236 | ||
237 | ||
238 | Initiates deletion of such objects, displaying a list of object names, expiration times, and deletion status in JSON format. |