.. _rgw_dynamic_bucket_index_resharding:

===================================
RGW Dynamic Bucket Index Resharding
===================================

.. versionadded:: Luminous

A large bucket index can lead to performance problems, which can be
addressed by sharding the bucket index. Until Luminous, changing the
number of bucket index shards (resharding) could be done only offline,
with RGW services disabled. Since the Luminous release, Ceph has
supported online bucket resharding.

Each bucket index shard can handle its entries efficiently up to a
certain threshold. If this threshold is exceeded, the system can
suffer performance problems. The dynamic resharding feature detects
this situation and automatically increases the number of shards used
by a bucket's index, reducing the number of entries in each shard.
This process is transparent to the user. Writes to the target bucket
are briefly blocked (reads are not) during the resharding process.

By default, dynamic bucket index resharding can increase the number of
bucket index shards only up to 1999, although this upper bound is a
configuration parameter (see Configuration below). When possible, the
process chooses a prime number of shards in order to spread the
entries across the bucket index shards more evenly.

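As a rough illustration, assume the default threshold of 100000
objects per shard (``rgw_max_objs_per_shard``): a bucket holding about
1500000 objects averages more than 100000 entries per shard whenever
it has fewer than 15 shards, so dynamic resharding would raise its
shard count to at least 15 and, when possible, choose a nearby prime
such as 17.
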
Detection of resharding opportunities runs as a background process
that periodically scans all buckets. A bucket that requires resharding
is added to a queue. A background thread processes the queued
resharding tasks, one at a time and in order.

Multisite
=========

In Ceph releases prior to Reef, the Ceph Object Gateway (RGW) does not
support dynamic resharding in a multisite environment. For information
on resharding in multisite deployments, see
:ref:`Resharding <feature_resharding>` in the RGW multisite documentation.

Configuration
=============

Enable/Disable dynamic bucket index resharding:

- ``rgw_dynamic_resharding``: true/false, default: true

Configuration options that control the resharding process (see the
example after this list for one way to set them):

- ``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard before resharding is triggered, default: 100000

- ``rgw_max_dynamic_shards``: maximum number of bucket index shards that dynamic resharding can increase to, default: 1999

- ``rgw_reshard_bucket_lock_duration``: duration, in seconds, that writes to the bucket are locked during resharding, default: 360 (i.e., 6 minutes)

- ``rgw_reshard_thread_interval``: maximum time, in seconds, between rounds of resharding queue processing, default: 600 (i.e., 10 minutes)

- ``rgw_reshard_num_logs``: number of shards for the resharding queue, default: 16
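
For example, on a cluster that uses the centralized configuration
database, the per-shard threshold and the write-lock duration could be
adjusted with commands like the following (a sketch; the ``client.rgw``
target is a common choice, adjust it to match your gateway daemons):

::

  # ceph config set client.rgw rgw_max_objs_per_shard 50000
  # ceph config set client.rgw rgw_reshard_bucket_lock_duration 600

Alternatively, the same options can be set in ``ceph.conf`` on the RGW
hosts, after which the gateways must be restarted.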

Admin commands
==============

Add a bucket to the resharding queue
------------------------------------

::

  # radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>

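For example, to queue a hypothetical bucket named ``mybucket`` to be
resharded into 23 shards:

::

  # radosgw-admin reshard add --bucket mybucket --num-shards 23
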
List resharding queue
---------------------

::

  # radosgw-admin reshard list

Process tasks on the resharding queue
-------------------------------------

::

  # radosgw-admin reshard process

Bucket resharding status
------------------------

::

  # radosgw-admin reshard status --bucket <bucket_name>

The output is a JSON array with one entry per bucket index shard; each
entry reports three fields: ``reshard_status``, ``new_bucket_instance_id``,
and ``num_shards``.

For example, the output at each stage of dynamic resharding is shown below:

``1. Before resharding occurred:``
::

  [
      {
          "reshard_status": "not-resharding",
          "new_bucket_instance_id": "",
          "num_shards": -1
      }
  ]

``2. During resharding:``
::

  [
      {
          "reshard_status": "in-progress",
          "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
          "num_shards": 2
      },
      {
          "reshard_status": "in-progress",
          "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
          "num_shards": 2
      }
  ]

``3. After resharding completed:``
::

  [
      {
          "reshard_status": "not-resharding",
          "new_bucket_instance_id": "",
          "num_shards": -1
      },
      {
          "reshard_status": "not-resharding",
          "new_bucket_instance_id": "",
          "num_shards": -1
      }
  ]

Cancel pending bucket resharding
--------------------------------

Note: a bucket resharding operation cannot be cancelled once it has
begun executing. ::

  # radosgw-admin reshard cancel --bucket <bucket_name>

Manual immediate bucket resharding
----------------------------------

::

  # radosgw-admin bucket reshard --bucket <bucket_name> --num-shards <new number of shards>

When choosing a number of shards, the administrator should anticipate
each bucket's peak number of objects. Ideally, aim for no more than
100000 entries per shard at any given time.

Additionally, a prime number of bucket index shards spreads bucket
index entries across the shards more evenly. For example, 7001 bucket
index shards is better than 7000 because the former is prime. Lists of
prime numbers are readily available online; search for "list of prime
numbers" with your favorite search engine.
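
As an illustration (the bucket name here is hypothetical), a bucket
expected to peak at about 2000000 objects needs at least
2000000 / 100000 = 20 shards; rounding up to the nearby prime 23
leaves some headroom:

::

  # radosgw-admin bucket reshard --bucket mybucket --num-shards 23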

Troubleshooting
===============

Clusters prior to Luminous 12.2.11 and Mimic 13.2.5 left behind stale
bucket instance entries, which were not automatically cleaned up. This
issue also affected lifecycle policies, which were no longer applied to
resharded buckets. Both of these issues can be worked around by running
the ``radosgw-admin`` commands described below.

Stale instance management
-------------------------

List the stale bucket instances in a cluster that are ready to be
cleaned up:

::

  # radosgw-admin reshard stale-instances list

Clean up the stale bucket instances in a cluster. Note: cleanup of
these instances should be done only on a single-site cluster:

::

  # radosgw-admin reshard stale-instances rm

Lifecycle fixes
---------------

For clusters with resharded bucket instances, it is highly likely that
the old lifecycle process flagged and deleted the lifecycle processing
entries when the bucket instance changed during a reshard. While this
is fixed in newer Ceph releases (Mimic 13.2.6 and Luminous 12.2.12
onward), older buckets that had lifecycle policies and have undergone
resharding must be fixed manually with the following command:

::

  # radosgw-admin lc reshard fix --bucket {bucketname}

If the ``--bucket`` argument is not provided, this command will try to
fix lifecycle policies for all the buckets in the cluster.
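
For example, to run the fix across every bucket in the cluster in one
pass (as described above), omit the ``--bucket`` argument:

::

  # radosgw-admin lc reshard fix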

Object Expirer fixes
--------------------

Objects subject to Swift object expiration on older clusters may have
been dropped from the log pool and never deleted after their bucket was
resharded. This would happen if their expiration time was before the
cluster was upgraded; objects whose expiration time fell after the
upgrade are handled correctly. To manage these expire-stale objects,
``radosgw-admin`` provides two subcommands.

222 | |
223 | Listing: | |
224 | ||
225 | :: | |
226 | ||
227 | # radosgw-admin objects expire-stale list --bucket {bucketname} | |
228 | ||
229 | Displays a list of object names and expiration times in JSON format. | |
230 | ||
231 | Deleting: | |
232 | ||
233 | :: | |
234 | ||
235 | # radosgw-admin objects expire-stale rm --bucket {bucketname} | |
236 | ||
237 | ||
238 | Initiates deletion of such objects, displaying a list of object names, expiration times, and deletion status in JSON format. |