>=19.0.0

* RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in
  multi-site. Previously, the replicas of such objects were corrupted on decryption.
  A new tool, ``radosgw-admin bucket resync encrypted multipart``, can be used to
  identify these original multipart uploads. The ``LastModified`` timestamp of any
  identified object is incremented by 1ns to cause peer zones to replicate it again.
  For multi-site deployments that make any use of Server-Side Encryption, we
  recommend running this command against every bucket in every zone after all
  zones have upgraded.
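  A minimal sketch of running the resync across every bucket in a zone
  (assuming the standard ``--bucket`` selector and ``jq`` for parsing the
  bucket list; adapt to your deployment)::

    for b in $(radosgw-admin bucket list | jq -r '.[]'); do
      radosgw-admin bucket resync encrypted multipart --bucket="$b"
    done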
* CEPHFS: The MDS now evicts clients that fail to advance their request tids,
  since such clients cause a large buildup of session metadata, which can drive
  the MDS read-only when the resulting RADOS operation exceeds the size
  threshold. The `mds_session_metadata_threshold` config option controls the
  maximum size to which the (encoded) session metadata can grow.
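  For example, to adjust the threshold (the value shown is illustrative, not a
  recommended setting)::

    ceph config set mds mds_session_metadata_threshold 16777216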
* RGW: New tools have been added to radosgw-admin for identifying and
  correcting issues with versioned bucket indexes. Historical bugs in the
  versioned bucket index transaction workflow made it possible for the index
  to accumulate extraneous "book-keeping" olh entries and plain placeholder
  entries. In some specific scenarios where clients made concurrent requests
  referencing the same object key, it was likely that many extra index
  entries would accumulate. When a significant number of these entries are
  present in a single bucket index shard, they can cause high bucket listing
  latencies and lifecycle processing failures. To check whether a versioned
  bucket has unnecessary olh entries, users can now run ``radosgw-admin
  bucket check olh``. If the ``--fix`` flag is used, the extra entries will
  be safely removed. Separately from the issue described above, it is also
  possible for some versioned buckets to maintain extra unlinked objects
  that are not listable from the S3/Swift APIs. These extra objects are
  typically the result of PUT requests that exited abnormally in the middle
  of a bucket index transaction, so the client would not have received a
  successful response. Bugs in prior releases made these unlinked objects easy
  to reproduce with any PUT request made on a bucket that was actively
  resharding. Besides the extra space that these hidden, unlinked objects
  consume, there can be another side effect in certain scenarios, caused by
  the nature of the failure mode that produced them: a client of an affected
  bucket may find the object associated with the key to be in an inconsistent
  state. To check whether a versioned bucket has unlinked entries, users can
  now run ``radosgw-admin bucket check unlinked``. If the ``--fix`` flag is
  used, the unlinked objects will be safely removed. Finally, a third issue
  made it possible for versioned bucket index stats to be accounted
  inaccurately. The tooling for recalculating versioned bucket stats also had
  a bug and was previously incapable of fixing these inaccuracies. This
  release resolves those issues, and users can now expect the existing
  ``radosgw-admin bucket check`` command to produce correct results. We
  recommend that users with versioned buckets, especially those that existed
  on prior releases, use these new tools to check whether their buckets are
  affected and to clean them up accordingly, as summarized below.
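  A sketch of the per-bucket workflow (assuming the standard ``--bucket``
  selector; run the read-only checks first, then repeat with ``--fix``)::

    radosgw-admin bucket check olh --bucket="$BUCKET"             # report extra olh entries
    radosgw-admin bucket check olh --bucket="$BUCKET" --fix       # remove them
    radosgw-admin bucket check unlinked --bucket="$BUCKET" --fix  # remove unlinked objects
    radosgw-admin bucket check --bucket="$BUCKET"                 # verify index stats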
* mgr/snap-schedule: For clusters with multiple CephFS file systems, all the
  snap-schedule commands now expect the '--fs' argument.

>=18.0.0

* The RGW policy parser now rejects unknown principals by default. If you are
  mirroring policies between RGW and AWS, you may wish to set
  "rgw policy reject invalid principals" to "false". This affects only newly set
  policies, not policies that are already in place.
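  For example (assuming the option maps to the usual underscore-separated
  config key)::

    ceph config set client.rgw rgw_policy_reject_invalid_principals false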
* RGW's default backend for `rgw_enable_ops_log` changed from RADOS to file.
  The default value of `rgw_ops_log_rados` is now false, and `rgw_ops_log_file_path`
  defaults to "/var/log/ceph/ops-log-$cluster-$name.log".
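  To restore the previous RADOS backend, for example (a sketch; adjust the
  config target to your RGW instances)::

    ceph config set client.rgw rgw_ops_log_rados true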
* The SPDK backend for BlueStore is now able to connect to an NVMeoF target.
  Please note that this is not an officially supported feature.
* RGW's pubsub interface now returns boolean fields using bool. Before this change,
  `/topics/<topic-name>` returned "stored_secret" and "persistent" as the strings
  "true" or "false", with quotes around them. After this change, these fields
  are returned without quotes so they can be decoded as boolean values in JSON.
  The same applies to the `is_truncated` field returned by `/subscriptions/<sub-name>`.
* RGW's response to the `Action=GetTopicAttributes&TopicArn=<topic-arn>` REST API now
  returns `HasStoredSecret` and `Persistent` as booleans in the JSON string
  encoded in `Attributes/EndPoint`.
* All boolean fields previously rendered as strings by the `radosgw-admin`
  command when the JSON format is used are now rendered as booleans. If your
  scripts/tools rely on this behavior, please update them accordingly (a quick
  check is shown after this list). The impacted field names are:
  * absolute
  * add
  * admin
  * appendable
  * bucket_key_enabled
  * delete_marker
  * exists
  * has_bucket_info
  * high_precision_time
  * index
  * is_master
  * is_prefix
  * is_truncated
  * linked
  * log_meta
  * log_op
  * pending_removal
  * read_only
  * retain_head_object
  * rule_exist
  * start_with_full_sync
  * sync_from_all
  * syncstopped
  * system
  * truncated
  * user_stats_sync
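  For instance, a field that used to render as ``"system": "false"`` now
  renders as ``"system": false`` (the user ID below is hypothetical)::

    radosgw-admin user info --uid=myuser | jq '.system'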
* RGW: The beast frontend's HTTP access log line uses a new debug_rgw_access
  configurable. This has the same defaults as debug_rgw, but can now be controlled
  independently.
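  For example, to raise only the access log verbosity (a sketch; the level is
  illustrative)::

    ceph config set client.rgw debug_rgw_access 20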
* RBD: The semantics of the compare-and-write C++ API (`Image::compare_and_write`
  and `Image::aio_compare_and_write` methods) now match those of the C API. Both
  the compare and write steps operate only on `len` bytes even if the respective
  buffers are larger. The previous behavior of comparing up to the size of
  the compare buffer was prone to subtle breakage upon straddling a stripe
  unit boundary.
* RBD: The compare-and-write operation is no longer limited to 512-byte sectors.
  Assuming proper alignment, it now allows operating on stripe units (4M by
  default).
* RBD: New `rbd_aio_compare_and_writev` API method to support scatter/gather
  on both compare and write buffers. This complements the existing `rbd_aio_readv`
  and `rbd_aio_writev` methods.
* The 'AT_NO_ATTR_SYNC' macro is deprecated; please use the standard
  'AT_STATX_DONT_SYNC' macro instead. The 'AT_NO_ATTR_SYNC' macro will be
  removed in the future.
* Trimming of PGLog dups is now controlled by size instead of version.
  This fixes the PGLog inflation issue that occurred when the online
  (in-OSD) trimming got jammed after a PG split operation. Additionally, a new
  offline mechanism has been added: `ceph-objectstore-tool` gained a
  `trim-pg-log-dups` op that targets situations where an OSD is unable to boot
  because of those inflated dups. In such cases, the "You can be hit by THE
  DUPS BUG" warning will be visible in the OSD logs.
  Relevant tracker: https://tracker.ceph.com/issues/53729
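  A sketch of the offline trim (assuming the OSD is stopped and that the op
  takes the usual ``--data-path`` and ``--pgid`` arguments; the path and pgid
  are illustrative)::

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
      --op trim-pg-log-dups --pgid 1.0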
* RBD: The `rbd device unmap` command gained a `--namespace` option. Support for
  namespaces was added to RBD in Nautilus 14.2.0, and it has been possible to
  map and unmap images in namespaces using the `image-spec` syntax since then,
  but the corresponding option available in most other commands was missing.
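  For example (the pool, namespace, and image names are illustrative)::

    rbd device unmap --pool mypool --namespace myns myimage
    rbd device unmap mypool/myns/myimage   # equivalent image-spec syntax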
* RGW: Compression is now supported for objects uploaded with Server-Side
  Encryption. When both are enabled, compression is applied before encryption.
  Earlier releases of multisite do not replicate such objects correctly, so all
  zones must upgrade to Reef before enabling the `compress-encrypted` zonegroup
  feature: see https://docs.ceph.com/en/reef/radosgw/multisite/#zone-features
  and note the security considerations.
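  Once all zones run Reef, the feature can be enabled along these lines (a
  sketch; the zonegroup name is illustrative)::

    radosgw-admin zonegroup modify --rgw-zonegroup=default --enable-feature=compress-encrypted
    radosgw-admin period update --commit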
* RGW: The "pubsub" functionality for storing bucket notifications inside Ceph
  has been removed, and with it the "pubsub" zone should no longer be used.
  The REST operations and radosgw-admin commands for manipulating
  subscriptions, as well as those for fetching and acking notifications, have
  been removed as well.
  If the endpoint to which notifications are sent may be down or disconnected,
  it is recommended to use persistent notifications to guarantee delivery. If
  the system that consumes the notifications needs to pull them (rather than
  have them pushed to it), an external message bus (e.g. RabbitMQ, Kafka)
  should be used for that purpose.
* RGW: The serialized format of notifications and topics has changed, so that
  new/updated topics will be unreadable by old RGWs. We recommend completing
  the RGW upgrades before creating or modifying any notification topics.
* RBD: Trailing newline in passphrase files (`<passphrase-file>` argument in
  `rbd encryption format` command and `--encryption-passphrase-file` option
  in other commands) is no longer stripped.
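  For example, to format an image with a passphrase that deliberately has no
  trailing newline (names are illustrative)::

    printf '%s' 'my-secret' > passphrase.txt
    rbd encryption format mypool/myimage luks2 passphrase.txt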
* RBD: Support for layered client-side encryption is added. Cloned images
  can now be encrypted each with its own encryption format and passphrase,
  potentially different from that of the parent image. The efficient
  copy-on-write semantics intrinsic to unformatted (regular) cloned images
  are retained.
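  A sketch of encrypting a clone with its own passphrase (names are
  illustrative; clone v2 is assumed, so the snapshot needs no protection)::

    rbd snap create mypool/parent@snap
    rbd clone mypool/parent@snap mypool/child
    rbd encryption format mypool/child luks2 child-passphrase.txt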
* CEPHFS: Rename the `mds_max_retries_on_remount_failure` option to
  `client_max_retries_on_remount_failure` and move it from mds.yaml.in to
  mds-client.yaml.in, because this option has only ever been used by the MDS
  client.
* The `perf dump` and `perf schema` commands are deprecated in favor of the new
  `counter dump` and `counter schema` commands. These new commands add support
  for labeled perf counters and also emit existing unlabeled perf counters. Some
  unlabeled perf counters became labeled in this release, with more to follow in
  future releases; such converted perf counters are no longer emitted by the
  `perf dump` and `perf schema` commands.
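  For example, against a daemon's admin socket (the daemon name is
  illustrative)::

    ceph daemon osd.0 counter dump
    ceph daemon osd.0 counter schema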
* `ceph mgr dump` command now outputs `last_failure_osd_epoch` and
  `active_clients` fields at the top level. Previously, these fields were
  output under the `always_on_modules` field.
* `ceph mgr dump` command now displays the name of the mgr module that
  registered a RADOS client in the `name` field added to elements of the
  `active_clients` array. Previously, only the address of a module's RADOS
  client was shown in the `active_clients` array.
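  For example, a quick look at both fields (a sketch using ``jq``)::

    ceph mgr dump | jq '.last_failure_osd_epoch, .active_clients[].name'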
* RBD: All rbd-mirror daemon perf counters became labeled and as such are now
  emitted only by the new `counter dump` and `counter schema` commands. As part
  of the conversion, many also got renamed to better disambiguate journal-based
  and snapshot-based mirroring.
* RBD: list-watchers C++ API (`Image::list_watchers`) now clears the passed
  `std::list` before potentially appending to it, aligning with the semantics
  of the corresponding C API (`rbd_watchers_list`).
* The rados python binding is now able to process (opt-in) omap keys as bytes
  objects. This enables interacting with RADOS omap keys that are not decodable
  as UTF-8 strings.
* Telemetry: Users who are opted in to telemetry can also opt in to
  participating in a leaderboard in the telemetry public
  dashboards (https://telemetry-public.ceph.com/). Users can now also add a
  description of the cluster that will appear publicly in the leaderboard.
  For more details, see:
  https://docs.ceph.com/en/latest/mgr/telemetry/#leaderboard
  See a sample report with `ceph telemetry preview`.
  Opt in to telemetry with `ceph telemetry on`.
  Opt in to the leaderboard with
  `ceph config set mgr mgr/telemetry/leaderboard true`.
  Add a leaderboard description with:
  `ceph config set mgr mgr/telemetry/leaderboard_description 'Cluster description'`.
* CEPHFS: After recovering a Ceph File System by following the disaster recovery
  procedure, the recovered files under the `lost+found` directory can now be deleted.
* core: cache-tiering is now deprecated.
* mClock Scheduler: The mClock scheduler (the default scheduler in Quincy) has
  undergone significant usability and design improvements to address the slow
  backfill issue. Some important changes are:
  * The 'balanced' profile is set as the default mClock profile because it
    represents a compromise between prioritizing client IO and recovery IO. Users
    can then choose either the 'high_client_ops' profile to prioritize client IO
    or the 'high_recovery_ops' profile to prioritize recovery IO.
  * QoS parameters like reservation and limit are now specified in terms of a
    fraction (range: 0.0 to 1.0) of the OSD's IOPS capacity.
  * The cost parameters (osd_mclock_cost_per_io_usec_* and
    osd_mclock_cost_per_byte_usec_*) have been removed. The cost of an operation
    is now determined using the random IOPS and maximum sequential bandwidth
    capability of the OSD's underlying device.
  * Degraded object recovery is given higher priority than misplaced object
    recovery because degraded objects present a data safety issue not
    present with objects that are merely misplaced. Therefore, backfilling
    operations with the 'balanced' and 'high_client_ops' mClock profiles may
    progress more slowly than they did with the 'WeightedPriorityQueue' (WPQ)
    scheduler.
  * The QoS allocations in all the mClock profiles are optimized based on the above
    fixes and enhancements.
  * For more detailed information see:
    https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/
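  For example, to prioritize recovery IO on all OSDs (the profile names are
  those listed above)::

    ceph config set osd osd_mclock_profile high_recovery_ops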
* mgr/snap_schedule: The snap-schedule mgr module now retains one snapshot
  fewer than the number specified by the config tunable `mds_max_snaps_per_dir`,
  so that a new snapshot can be created and retained during the next schedule
  run.

>=17.2.1

* The "BlueStore zero block detection" feature (first introduced to Quincy in
  https://github.com/ceph/ceph/pull/43337) has been turned off by default with a
  new global configuration called `bluestore_zero_block_detection`. This feature,
  intended for large-scale synthetic testing, does not interact well with some RBD
  and CephFS features. Any side effects experienced in previous Quincy versions
  will no longer occur, provided that the configuration remains set to false.
  Relevant tracker: https://tracker.ceph.com/issues/55521
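  The feature can be re-enabled for synthetic testing (not recommended for
  production) with::

    ceph config set global bluestore_zero_block_detection true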

* telemetry: Added new Rook metrics to the 'basic' channel to report Rook's
  version, Kubernetes version, node metrics, etc.
  See a sample report with `ceph telemetry preview`.
  Opt-in with `ceph telemetry on`.

  For more details, see:

  https://docs.ceph.com/en/latest/mgr/telemetry/
* OSD: The issue of high CPU utilization during recovery/backfill operations
  has been fixed. For more details, see: https://tracker.ceph.com/issues/56530.

>=15.2.17

* OSD: Octopus modified the SnapMapper key format from
  <LEGACY_MAPPING_PREFIX><snapid>_<shardid>_<hobject_t::to_str()>
  to
  <MAPPING_PREFIX><pool>_<snapid>_<shardid>_<hobject_t::to_str()>
  When this change was introduced, 94ebe0e also introduced a conversion
  with a crucial bug which essentially destroyed legacy keys by mapping them
  to
  <MAPPING_PREFIX><poolid>_<snapid>_
  without the object-unique suffix. The conversion is fixed in this release.
  Relevant tracker: https://tracker.ceph.com/issues/56147

* Cephadm may now be configured to carry out CephFS MDS upgrades without
  reducing ``max_mds`` to 1. Previously, Cephadm would reduce ``max_mds`` to 1 to
  avoid having two active MDS modifying on-disk structures with new versions,
  communicating cross-version-incompatible messages, or other potential
  incompatibilities. This could be disruptive for large-scale CephFS deployments
  because the cluster cannot easily reduce active MDS daemons to 1.
  NOTE: A staggered upgrade of the mons/mgrs may be necessary to take advantage
  of this feature; refer to this link on how to perform it:
  https://docs.ceph.com/en/quincy/cephadm/upgrade/#staggered-upgrade
  Relevant tracker: https://tracker.ceph.com/issues/55715
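  A sketch of such a staggered upgrade, updating the mgrs first (the image tag
  is illustrative, and the ``fail_fs`` option name is an assumption about how
  this behavior is enabled)::

    ceph config set mgr mgr/orchestrator/fail_fs true
    ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.1 --daemon-types mgr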

* Introduced a new file system flag `refuse_client_session` that can be set using the
  `fs set` command. This flag allows blocking any incoming session
  request from client(s). This can be useful during some recovery situations
  where it's desirable to bring the MDS up but have no client workload.
  Relevant tracker: https://tracker.ceph.com/issues/57090
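  For example (the file system name is illustrative)::

    ceph fs set cephfs refuse_client_session true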