.. _rgw_dynamic_bucket_index_resharding:

===================================
RGW Dynamic Bucket Index Resharding
===================================

.. versionadded:: Luminous

A large bucket index can lead to performance problems. In order
to address this problem we introduced bucket index sharding.
Until Luminous, changing the number of bucket shards (resharding)
needed to be done offline. Starting with Luminous we support
online bucket resharding.

Each bucket index shard can handle its entries efficiently up until
reaching a certain threshold number of entries. If this threshold is
exceeded the system can encounter performance issues. The dynamic
resharding feature detects this situation and automatically increases
the number of shards used by the bucket index, resulting in the
reduction of the number of entries in each bucket index shard. This
process is transparent to the user.

By default dynamic bucket index resharding can only increase the
number of bucket index shards to 1999, although this upper bound is a
configuration parameter (see Configuration below). Furthermore, when
possible, the process chooses a prime number of bucket index shards to
help spread the number of bucket index entries across the bucket index
shards more evenly.

The detection process runs in the background and periodically
scans all the buckets. A bucket that requires resharding is added to
the resharding queue and will be scheduled to be resharded later. The
reshard thread runs in the background and executes the scheduled
resharding tasks, one at a time.

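For example, with the default ``rgw_max_objs_per_shard`` of 100000, a
bucket with 7 index shards would be queued for resharding once it
holds roughly more than 700000 entries (7 x 100000). To get a
per-bucket view of how full the index shards are relative to this
threshold, the ``radosgw-admin bucket limit check`` subcommand can be
consulted (exact output fields may vary by release)::

  # radosgw-admin bucket limit check
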
Multisite
=========

Dynamic resharding is not supported in a multisite environment.


Configuration
=============

Enable/Disable dynamic bucket index resharding:

- ``rgw_dynamic_resharding``: true/false, default: true

Configuration options that control the resharding process:

- ``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard before resharding is triggered, default: 100000 objects

- ``rgw_max_dynamic_shards``: maximum number of shards that dynamic bucket index resharding can increase to, default: 1999

- ``rgw_reshard_bucket_lock_duration``: duration, in seconds, of the lock on the bucket object during resharding, default: 360 seconds (i.e., 6 minutes)

- ``rgw_reshard_thread_interval``: maximum time, in seconds, between rounds of resharding queue processing, default: 600 seconds (i.e., 10 minutes)

- ``rgw_reshard_num_logs``: number of shards for the resharding queue, default: 16

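These options can be set in ``ceph.conf``. The following is a minimal
sketch; the section name ``client.rgw.gateway-node1`` is a
hypothetical example, and the values shown are simply the defaults
made explicit::

  [client.rgw.gateway-node1]
  rgw_dynamic_resharding = true
  rgw_max_objs_per_shard = 100000
  rgw_max_dynamic_shards = 1999
  rgw_reshard_thread_interval = 600
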
Admin commands
==============

Add a bucket to the resharding queue
------------------------------------

::

  # radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>

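For example, to queue a hypothetical bucket named ``data-bucket`` to
be resharded to 23 shards::

  # radosgw-admin reshard add --bucket data-bucket --num-shards 23
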
List resharding queue
---------------------

::

  # radosgw-admin reshard list

Process tasks on the resharding queue
-------------------------------------

::

  # radosgw-admin reshard process

Bucket resharding status
------------------------

::

  # radosgw-admin reshard status --bucket <bucket_name>

The output is a JSON array with one entry per shard; each entry
contains three fields (``reshard_status``, ``new_bucket_instance_id``,
and ``num_shards``).

For example, the output at different Dynamic Resharding stages is shown below:

``1. Before resharding occurred:``
::

  [
    {
        "reshard_status": "not-resharding",
        "new_bucket_instance_id": "",
        "num_shards": -1
    }
  ]

``2. During resharding:``
::

  [
    {
        "reshard_status": "in-progress",
        "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
        "num_shards": 2
    },
    {
        "reshard_status": "in-progress",
        "new_bucket_instance_id": "1179f470-2ebf-4630-8ec3-c9922da887fd.8652.1",
        "num_shards": 2
    }
  ]

``3. After resharding completed:``
::

  [
    {
        "reshard_status": "not-resharding",
        "new_bucket_instance_id": "",
        "num_shards": -1
    },
    {
        "reshard_status": "not-resharding",
        "new_bucket_instance_id": "",
        "num_shards": -1
    }
  ]


Cancel pending bucket resharding
--------------------------------

Note: Ongoing bucket resharding operations cannot be cancelled. ::

  # radosgw-admin reshard cancel --bucket <bucket_name>

Manual immediate bucket resharding
----------------------------------

::

  # radosgw-admin bucket reshard --bucket <bucket_name> --num-shards <new number of shards>

When choosing a number of shards, the administrator should keep a
few considerations in mind. Ideally the administrator should aim for
no more than 100000 entries per shard, both now and for some time
into the future.

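For example, suppose a hypothetical bucket named ``data-bucket`` is
expected to grow to about 2,000,000 objects. Keeping to at most
100000 entries per shard requires at least 20 shards; rounding up to
the prime number 23 provides some headroom::

  # radosgw-admin bucket reshard --bucket data-bucket --num-shards 23
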
Additionally, a prime number of bucket index shards tends to spread
bucket index entries more evenly across the shards. For example, 7001
bucket index shards is better than 7000 since the former is prime. A
variety of web sites have lists of prime numbers; search for "list of
prime numbers" with your favorite web search engine to locate some
web sites.

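If no list is handy, the next prime at or above a target shard count
can be found with standard shell tools. The following is a minimal
sketch that assumes the GNU coreutils ``factor`` command is available
(``factor`` prints a number followed by its prime factors, so a prime
has exactly one factor)::

  # n=7000
  # while [ "$(factor "$n" | awk '{print NF-1}')" -ne 1 ]; do n=$((n+1)); done
  # echo "$n"
  7001
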
Troubleshooting
===============

Clusters prior to Luminous 12.2.11 and Mimic 13.2.5 left behind stale bucket
instance entries, which were not automatically cleaned up. The issue also affected
lifecycle policies, which were not applied to resharded buckets anymore. Both of
these issues can be worked around using a couple of radosgw-admin commands.

Stale instance management
-------------------------

List the stale instances in a cluster that are ready to be cleaned up.

::

  # radosgw-admin reshard stale-instances list

Clean up the stale instances in a cluster. Note: cleanup of these
instances should only be done on a single-site cluster.

::

  # radosgw-admin reshard stale-instances rm


Lifecycle fixes
---------------

For clusters that had resharded instances, it is highly likely that the old
lifecycle processes would have flagged and deleted lifecycle processing, as the
bucket instance changed during a reshard. While this is fixed in newer releases
(from Mimic 13.2.6 and Luminous 12.2.12), older buckets that had lifecycle policies and
that have undergone resharding must be fixed manually.

The command to do so is:

::

  # radosgw-admin lc reshard fix --bucket {bucketname}


As a convenience wrapper, if the ``--bucket`` argument is dropped then this
command will try to fix lifecycle policies for all the buckets in the cluster.

Object Expirer fixes
--------------------

Objects subject to Swift object expiration on older clusters may have
been dropped from the log pool and never deleted after the bucket was
resharded. This would happen if their expiration time was before the
cluster was upgraded, but if their expiration was after the upgrade
the objects would be correctly handled. To manage these expire-stale
objects, radosgw-admin provides two subcommands.

Listing:

::

  # radosgw-admin objects expire-stale list --bucket {bucketname}

Displays a list of object names and expiration times in JSON format.

Deleting:

::

  # radosgw-admin objects expire-stale rm --bucket {bucketname}


Initiates deletion of such objects, displaying a list of object names, expiration times, and deletion status in JSON format.