.. _rgw_dynamic_bucket_index_resharding:

===================================
RGW Dynamic Bucket Index Resharding
===================================

.. versionadded:: Luminous

A large bucket index can lead to performance problems. In order
to address this problem we introduced bucket index sharding.
Until Luminous, changing the number of bucket shards (resharding)
needed to be done offline. Starting with Luminous we support
online bucket resharding.

Each bucket index shard can handle its entries efficiently up until
reaching a certain threshold number of entries. If this threshold is
exceeded, the system can suffer from performance issues. The dynamic
resharding feature detects this situation and automatically increases
the number of shards used by the bucket index, resulting in a
reduction of the number of entries in each bucket index shard. This
process is transparent to the user. During the resharding process,
writes to the target bucket are blocked, but reads are not.

By default, dynamic bucket index resharding can only increase the
number of bucket index shards to 1999, although this upper bound is a
configuration parameter (see Configuration below). When
possible, the process chooses a prime number of bucket index shards to
spread the number of bucket index entries across the bucket index
shards more evenly.

Detection runs in a background process that periodically
scans all the buckets. A bucket that requires resharding is added to
the resharding queue and will be scheduled to be resharded later. A
reshard thread runs in the background and executes the scheduled
resharding tasks, one at a time.

Multisite
=========

Prior to the Reef release, RGW did not support dynamic resharding in a
multisite environment. For information on resharding in multisite
deployments, see :ref:`Resharding <feature_resharding>` in the RGW
multisite documentation.

Configuration
=============

Enable/Disable dynamic bucket index resharding:

- ``rgw_dynamic_resharding``: true/false, default: true

Configuration options that control the resharding process (an example of
adjusting these options at runtime follows the list):

- ``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard before resharding is triggered, default: 100000 objects

- ``rgw_max_dynamic_shards``: maximum number of shards that dynamic bucket index resharding can increase to, default: 1999

- ``rgw_reshard_bucket_lock_duration``: duration, in seconds, of the lock on the bucket object during resharding, default: 360 seconds (i.e., 6 minutes)

- ``rgw_reshard_thread_interval``: maximum time, in seconds, between rounds of resharding queue processing, default: 600 seconds (i.e., 10 minutes)

- ``rgw_reshard_num_logs``: number of shards for the resharding queue, default: 16
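
As an illustration, these options can be adjusted at runtime with the
``ceph config`` facility. This is a minimal sketch, and the values shown
are examples rather than recommendations::

  # ceph config set client.rgw rgw_dynamic_resharding false
  # ceph config set client.rgw rgw_max_objs_per_shard 50000
  # ceph config set client.rgw rgw_max_dynamic_shards 3999

The first command disables dynamic resharding entirely; the second and
third lower the per-shard object threshold and raise the shard-count
upper bound, respectively.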

Admin commands
==============

Add a bucket to the resharding queue
------------------------------------

::

  # radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>
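
For example, to queue a hypothetical bucket named ``data1`` to be
resharded to 23 shards::

  # radosgw-admin reshard add --bucket data1 --num-shards 23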

List resharding queue
---------------------

::

  # radosgw-admin reshard list

Process tasks on the resharding queue
-------------------------------------

::

  # radosgw-admin reshard process

Bucket resharding status
------------------------

::

  # radosgw-admin reshard status --bucket <bucket_name>

The output is a JSON array with one entry per bucket index shard; each
entry contains three fields: ``reshard_status``, ``new_bucket_instance_id``,
and ``num_shards``.

For example, the output at different dynamic resharding stages is shown below:

``1. Before resharding occurred:``
::

  [
    {
      "reshard_status": "not-resharding",
      "new_bucket_instance_id": "",
      "num_shards": -1
    }
  ]

``2. During resharding:``
::

  [
    {
      "reshard_status": "in-progress",
      "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
      "num_shards": 2
    },
    {
      "reshard_status": "in-progress",
      "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
      "num_shards": 2
    }
  ]

``3. After resharding completed:``
::

  [
    {
      "reshard_status": "not-resharding",
      "new_bucket_instance_id": "",
      "num_shards": -1
    },
    {
      "reshard_status": "not-resharding",
      "new_bucket_instance_id": "",
      "num_shards": -1
    }
  ]


Cancel pending bucket resharding
--------------------------------

Note: Ongoing bucket resharding operations cannot be cancelled. ::

  # radosgw-admin reshard cancel --bucket <bucket_name>

Manual immediate bucket resharding
----------------------------------

::

  # radosgw-admin bucket reshard --bucket <bucket_name> --num-shards <new number of shards>

When choosing a number of shards, the administrator should keep a
few considerations in mind. Ideally the administrator should aim for no
more than 100000 entries per shard, both now and through some future
point in time.

Additionally, numbers of bucket index shards that are prime tend to
distribute bucket index entries more evenly across the shards. For
example, 7001 bucket index shards is better than 7000 since the former
is prime. A variety of web sites have lists of prime numbers; search
for "list of prime numbers" with your favorite web search engine to
locate some.
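
As a hypothetical sizing exercise: a bucket expected to grow to roughly
5,000,000 objects, at no more than 100000 entries per shard, needs at
least 5,000,000 / 100,000 = 50 shards; rounding up to the next prime
gives 53. The bucket name below is again hypothetical::

  # radosgw-admin bucket reshard --bucket data1 --num-shards 53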

Troubleshooting
===============

Clusters prior to Luminous 12.2.11 and Mimic 13.2.5 left behind stale bucket
instance entries, which were not automatically cleaned up. The issue also
affected lifecycle policies, which were no longer applied to resharded
buckets. Both of these issues can be worked around using a couple of
radosgw-admin commands.

Stale instance management
-------------------------

List the stale instances in a cluster that are ready to be cleaned up.

::

  # radosgw-admin reshard stale-instances list

Clean up the stale instances in a cluster. Note: cleanup of these
instances should only be done on a single-site cluster.

::

  # radosgw-admin reshard stale-instances rm


Lifecycle fixes
---------------

For clusters that had resharded instances, it is highly likely that the old
lifecycle process flagged and deleted lifecycle processing for a bucket when
its bucket instance changed during a reshard. While this is fixed for newer
clusters (from Mimic 13.2.6 and Luminous 12.2.12), older buckets that had
lifecycle policies and that have undergone resharding must be fixed manually.

The command to do so is:

::

  # radosgw-admin lc reshard fix --bucket {bucketname}

As a convenience wrapper, if the ``--bucket`` argument is dropped then this
command will try to fix lifecycle policies for all the buckets in the cluster.
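
For example, to attempt the fix across every bucket in the cluster, run
the zero-argument form::

  # radosgw-admin lc reshard fix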

Object Expirer fixes
--------------------

Objects subject to Swift object expiration on older clusters may have
been dropped from the log pool and never deleted after the bucket was
resharded. This would happen if an object's expiration time came before
the cluster was upgraded; objects whose expiration fell after the
upgrade are handled correctly. To manage these expire-stale objects,
radosgw-admin provides two subcommands.

Listing:

::

  # radosgw-admin objects expire-stale list --bucket {bucketname}

Displays a list of object names and expiration times in JSON format.

Deleting:

::

  # radosgw-admin objects expire-stale rm --bucket {bucketname}

Initiates deletion of such objects, displaying a list of object names,
expiration times, and deletion status in JSON format.