.. _rgw_dynamic_bucket_index_resharding:

===================================
RGW Dynamic Bucket Index Resharding
===================================

.. versionadded:: Luminous

A large bucket index can lead to performance problems. In order
to address this problem we introduced bucket index sharding.
Until Luminous, changing the number of bucket shards (resharding)
needed to be done offline. Starting with Luminous we support
online bucket resharding.

Each bucket index shard can handle its entries efficiently up until
reaching a certain threshold number of entries. If this threshold is
exceeded the system can suffer from performance issues. The dynamic
resharding feature detects this situation and automatically increases
the number of shards used by the bucket index, resulting in a
reduction of the number of entries in each bucket index shard. This
process is transparent to the user. During the resharding process,
write I/Os to the target bucket are blocked, but read I/Os are not.

By default dynamic bucket index resharding can only increase the
number of bucket index shards to 1999, although this upper bound is a
configuration parameter (see Configuration below). When
possible, the process chooses a prime number of bucket index shards to
spread the number of bucket index entries across the bucket index
shards more evenly.
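
The benefit of a prime shard count can be seen with a small illustrative
sketch (this is a simplified model using plain modulo placement, not RGW's
actual index-placement code): when key hashes happen to share a common
factor with the shard count, a composite count leaves shards empty, while a
prime count uses all of them.

```python
# Illustrative only: compare how evenly entries spread across a composite
# vs. a prime number of shards when the hashes share a common factor.
from collections import Counter

def shard_distribution(hashes, num_shards):
    """Count how many entries land in each shard under modulo placement."""
    return Counter(h % num_shards for h in hashes)

# Pathological-but-possible workload: every hash is a multiple of 4.
hashes = [4 * i for i in range(10000)]

composite = shard_distribution(hashes, 8)  # 8 shares the factor 4
prime = shard_distribution(hashes, 7)      # 7 is prime

print(len(composite), "of 8 shards used")  # only 2 shards receive entries
print(len(prime), "of 7 shards used")      # all 7 shards receive entries
```

With 8 shards, every entry lands on shard 0 or 4; with 7 shards, the same
workload covers every shard.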

The detection process runs in a background process that periodically
scans all the buckets. A bucket that requires resharding is added to
the resharding queue and will be scheduled to be resharded later. The
reshard thread runs in the background and executes the scheduled
resharding tasks, one at a time.
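
The detection step amounts to a simple threshold comparison. The sketch
below is illustrative only (the function name and structure are not RGW
internals), assuming the default ``rgw_max_objs_per_shard`` of 100000:

```python
# Illustrative sketch of the resharding trigger condition; names here are
# hypothetical and do not correspond to RGW internals.
RGW_MAX_OBJS_PER_SHARD = 100000  # default threshold per bucket index shard

def needs_reshard(num_objects: int, num_shards: int) -> bool:
    """A bucket qualifies for resharding once the average number of
    entries per bucket index shard exceeds the configured threshold."""
    return num_objects > num_shards * RGW_MAX_OBJS_PER_SHARD

# A bucket with 250000 objects on a single shard exceeds the threshold:
print(needs_reshard(250_000, 1))  # True
print(needs_reshard(250_000, 3))  # False
```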

Multisite
=========

Prior to the Reef release, RGW did not support dynamic resharding in a
multisite environment. For information on dynamic resharding, see
:ref:`Resharding <feature_resharding>` in the RGW multisite documentation.

Configuration
=============

Enable/Disable dynamic bucket index resharding:

- ``rgw_dynamic_resharding``: true/false, default: true

Configuration options that control the resharding process:

- ``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard before resharding is triggered, default: 100000 objects

- ``rgw_max_dynamic_shards``: maximum number of shards that dynamic bucket index resharding can increase to, default: 1999

- ``rgw_reshard_bucket_lock_duration``: duration, in seconds, of the lock on the bucket object during resharding, default: 360 seconds (i.e., 6 minutes)

- ``rgw_reshard_thread_interval``: maximum time, in seconds, between rounds of resharding queue processing, default: 600 seconds (i.e., 10 minutes)

- ``rgw_reshard_num_logs``: number of shards for the resharding queue, default: 16

Admin commands
==============

Add a bucket to the resharding queue
------------------------------------

::

  # radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>

List resharding queue
---------------------

::

  # radosgw-admin reshard list

Process tasks on the resharding queue
-------------------------------------

::

  # radosgw-admin reshard process

Bucket resharding status
------------------------

::

  # radosgw-admin reshard status --bucket <bucket_name>

The output is a JSON array with one entry per shard; each entry contains
three fields (reshard_status, new_bucket_instance_id, num_shards).

For example, the output at different Dynamic Resharding stages is shown below:

``1. Before resharding occurred:``
::

  [
    {
        "reshard_status": "not-resharding",
        "new_bucket_instance_id": "",
        "num_shards": -1
    }
  ]

``2. During resharding:``
::

  [
    {
        "reshard_status": "in-progress",
        "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
        "num_shards": 2
    },
    {
        "reshard_status": "in-progress",
        "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
        "num_shards": 2
    }
  ]

``3. After resharding completed:``
::

  [
    {
        "reshard_status": "not-resharding",
        "new_bucket_instance_id": "",
        "num_shards": -1
    },
    {
        "reshard_status": "not-resharding",
        "new_bucket_instance_id": "",
        "num_shards": -1
    }
  ]

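Because the status output is plain JSON, it is easy to consume from scripts.
The snippet below is a sketch (the function name is illustrative) that checks
whether any shard of a bucket is still mid-reshard, given the captured output
of ``radosgw-admin reshard status``:

```python
import json

def reshard_in_progress(status_json: str) -> bool:
    """Return True if any shard in the status output reports an
    in-progress reshard."""
    return any(entry["reshard_status"] == "in-progress"
               for entry in json.loads(status_json))

# Sample captured from `radosgw-admin reshard status --bucket <bucket_name>`:
sample = '''[
  {"reshard_status": "in-progress",
   "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
   "num_shards": 2}
]'''
print(reshard_in_progress(sample))  # True
```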
Cancel pending bucket resharding
--------------------------------

Note: Ongoing bucket resharding operations cannot be cancelled. ::

  # radosgw-admin reshard cancel --bucket <bucket_name>

Manual immediate bucket resharding
----------------------------------

::

  # radosgw-admin bucket reshard --bucket <bucket_name> --num-shards <new number of shards>

When choosing a number of shards, the administrator should keep a
number of items in mind. Ideally the administrator should aim for no
more than 100000 entries per shard, both now and through some future
point in time.

Additionally, a prime number of bucket index shards tends to
distribute bucket index entries more evenly across the shards. For
example, 7001 bucket index shards is better than 7000 since the former
is prime. A variety of web sites have lists of prime numbers; search
for "list of prime numbers" with your favorite web search engine to
locate some.
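
The two rules above (at most 100000 entries per shard, prefer a prime
count) can be combined into a small helper. This is an administrator-side
convenience sketch, not code that RGW itself runs:

```python
import math

def next_prime(n: int) -> int:
    """Smallest prime >= n; trial division is fast enough at these sizes."""
    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        return all(k % d for d in range(2, math.isqrt(k) + 1))
    while not is_prime(n):
        n += 1
    return n

def suggest_shard_count(expected_objects: int,
                        objs_per_shard: int = 100000) -> int:
    """Enough shards to stay under objs_per_shard entries each,
    rounded up to the next prime."""
    return next_prime(math.ceil(expected_objects / objs_per_shard))

print(suggest_shard_count(700_000_000))  # 7001
```

For 700 million expected objects this yields 7001 shards, the prime just
above the 7000 that the per-shard target alone would require.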

Troubleshooting
===============

Clusters prior to Luminous 12.2.11 and Mimic 13.2.5 left behind stale bucket
instance entries, which were not automatically cleaned up. The issue also
affected lifecycle policies, which were no longer applied to resharded
buckets. Both of these issues can be worked around using a couple of
radosgw-admin commands.

Stale instance management
-------------------------

List the stale instances in a cluster that are ready to be cleaned up:

::

  # radosgw-admin reshard stale-instances list

Clean up the stale instances in a cluster. Note: cleanup of these
instances should only be done on a single-site cluster.

::

  # radosgw-admin reshard stale-instances rm

192 | ||
193 | Lifecycle fixes | |
194 | --------------- | |
195 | ||
81eedcae TL |
196 | For clusters that had resharded instances, it is highly likely that the old |
197 | lifecycle processes would have flagged and deleted lifecycle processing as the | |
11fdf7f2 | 198 | bucket instance changed during a reshard. While this is fixed for newer clusters |
81eedcae TL |
199 | (from Mimic 13.2.6 and Luminous 12.2.12), older buckets that had lifecycle policies and |
200 | that have undergone resharding will have to be manually fixed. | |
201 | ||
202 | The command to do so is: | |
11fdf7f2 TL |
203 | |
204 | :: | |
205 | ||
206 | # radosgw-admin lc reshard fix --bucket {bucketname} | |
207 | ||
208 | ||
209 | As a convenience wrapper, if the ``--bucket`` argument is dropped then this | |
81eedcae TL |
210 | command will try and fix lifecycle policies for all the buckets in the cluster. |
Object Expirer fixes
--------------------

Objects subject to Swift object expiration on older clusters may have
been dropped from the log pool and never deleted after the bucket was
resharded. This affected objects whose expiration time came before the
cluster was upgraded; objects expiring after the upgrade were handled
correctly. To manage these expire-stale objects, radosgw-admin
provides two subcommands.

Listing:

::

  # radosgw-admin objects expire-stale list --bucket {bucketname}

Displays a list of object names and expiration times in JSON format.

Deleting:

::

  # radosgw-admin objects expire-stale rm --bucket {bucketname}

Initiates deletion of such objects, displaying a list of object names,
expiration times, and deletion status in JSON format.