14.2.4
------

* In the Zabbix Mgr Module there was a typo in the key being sent
  to Zabbix for PGs in the backfill_wait state. The key that was sent
  was 'wait_backfill' and the correct name is 'backfill_wait'.
  Update your Zabbix template accordingly so that it accepts the
  new key being sent to Zabbix.
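As a sketch, you can confirm that a renamed template item accepts data by pushing a test value with ``zabbix_sender``. The item key, host name, and server name below are hypothetical; use whatever your template actually defines:

```shell
# Hypothetical item key and host/server names -- substitute your own.
# A response of "processed: 1" means the renamed item accepts the value.
zabbix_sender -z zabbix.example.com -s ceph-cluster \
    -k ceph.num_pg_backfill_wait -o 0
```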
14.2.3
------

* Nautilus-based librbd clients can now open images on Jewel clusters.
* The RGW ``num_rados_handles`` option has been removed.
  If you were using a value of ``num_rados_handles`` greater than 1,
  multiply your current ``objecter_inflight_ops`` and
  ``objecter_inflight_op_bytes`` parameters by the old
  ``num_rados_handles`` to get the same throttle behavior.
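As a minimal sketch of the arithmetic, suppose the old ``num_rados_handles`` was 4 and the throttles were at their previous values of 1024 ops and 104857600 bytes (all values here are assumptions; substitute your own settings):

```shell
# Assumed previous settings -- substitute your own.
OLD_NUM_RADOS_HANDLES=4
OLD_INFLIGHT_OPS=1024            # previous objecter_inflight_ops
OLD_INFLIGHT_OP_BYTES=104857600  # previous objecter_inflight_op_bytes

# Multiply both throttles by the old handle count to keep the same
# aggregate throttle behavior.
NEW_INFLIGHT_OPS=$((OLD_INFLIGHT_OPS * OLD_NUM_RADOS_HANDLES))
NEW_INFLIGHT_OP_BYTES=$((OLD_INFLIGHT_OP_BYTES * OLD_NUM_RADOS_HANDLES))
echo "objecter_inflight_ops=${NEW_INFLIGHT_OPS}"            # 4096
echo "objecter_inflight_op_bytes=${NEW_INFLIGHT_OP_BYTES}"  # 419430400

# Then apply the new values, e.g.:
#   ceph config set client.rgw objecter_inflight_ops "${NEW_INFLIGHT_OPS}"
#   ceph config set client.rgw objecter_inflight_op_bytes "${NEW_INFLIGHT_OP_BYTES}"
```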
* The ``bluestore_no_per_pool_stats_tolerance`` config option has been
  replaced with ``bluestore_fsck_error_on_no_per_pool_stats``
  (default: false). The overall default behavior has not changed:
  fsck will warn but not fail on legacy stores, and repair will
  convert to per-pool stats.
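As a sketch, the check and the conversion can be run offline with ``ceph-bluestore-tool`` against a stopped OSD (the OSD data path below is illustrative):

```shell
# Run only against a *stopped* OSD; the path is illustrative.
# fsck warns (but does not fail) on a legacy store without per-pool stats
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0

# repair converts the store to per-pool stats
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
```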
14.2.2
------

* The no{up,down,in,out} related commands have been revamped.
  There are now two ways to set the no{up,down,in,out} flags:
  the old ``ceph osd [un]set <flag>`` command, which sets cluster-wide flags,
  and the new ``ceph osd [un]set-group <flags> <who>`` command,
  which sets flags in batch at the granularity of any CRUSH node
  or device class.
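For example (the CRUSH bucket name ``rack1`` is illustrative):

```shell
# Cluster-wide flag, old style
ceph osd set noout
ceph osd unset noout

# Batched flags at CRUSH-node or device-class granularity, new style;
# "rack1" is an illustrative CRUSH bucket name
ceph osd set-group noout,noin rack1
ceph osd unset-group noout,noin rack1
ceph osd set-group noout ssd        # by device class
ceph osd unset-group noout ssd
```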
* RGW: radosgw-admin introduces two subcommands that allow managing
  expire-stale objects that might be left behind after a
  bucket reshard in earlier versions of RGW. One subcommand lists such
  objects and the other deletes them. Read the troubleshooting section
  of the dynamic resharding docs for details.
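A sketch of the workflow; the exact subcommand spelling below is an assumption, so verify it with ``radosgw-admin help`` and the resharding troubleshooting docs:

```shell
# Assumed subcommand names and bucket -- verify with `radosgw-admin help`.
radosgw-admin objects expire-stale list --bucket mybucket   # list leftover objects
radosgw-admin objects expire-stale rm --bucket mybucket     # delete them
```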
14.2.5
------

* The telemetry module now has a 'device' channel, enabled by default, that
  will report anonymized hard disk and SSD health metrics to telemetry.ceph.com
  in order to build and improve device failure prediction algorithms. Because
  the content of telemetry reports has changed, you will need to re-opt-in
  with::

    ceph telemetry on

  You can view exactly what information will be reported first with::

    ceph telemetry show
    ceph telemetry show device   # specifically show the device channel

  If you are not comfortable sharing device metrics, you can disable that
  channel first before re-opting-in::

    ceph config set mgr mgr/telemetry/channel_device false
    ceph telemetry on
* The telemetry module now reports more information about CephFS file systems,
  including:

  - how many MDS daemons (in total and per file system)
  - which features are (or have been) enabled
  - how many data pools
  - approximate file system age (year + month of creation)
  - how many files, bytes, and snapshots
  - how much metadata is being cached

  We have also added:

  - which Ceph release the monitors are running
  - whether msgr v1 or v2 addresses are used for the monitors
  - whether IPv4 or IPv6 addresses are used for the monitors
  - whether RADOS cache tiering is enabled (and which mode)
  - whether pools are replicated or erasure coded, and
    which erasure code profile plugin and parameters are in use
  - how many hosts are in the cluster, and how many hosts have each type of daemon
  - whether a separate OSD cluster network is being used
  - how many RBD pools and images are in the cluster, and how many pools have
    RBD mirroring enabled
  - how many RGW daemons, zones, and zonegroups are present; which RGW
    frontends are in use
  - aggregate stats about the CRUSH map, like which algorithms are used, how
    big buckets are, how many rules are defined, and what tunables are in use

  If you had telemetry enabled, you will need to re-opt-in with::

    ceph telemetry on

  You can view exactly what information will be reported first with::

    ceph telemetry show         # see everything
    ceph telemetry show basic   # basic cluster info (including all of the new info)
* A health warning is now generated if the average OSD heartbeat ping
  time exceeds a configurable threshold for any of the intervals
  computed. The OSD computes 1 minute, 5 minute and 15 minute
  intervals with average, minimum and maximum values. The new configuration
  option ``mon_warn_on_slow_ping_ratio`` specifies a percentage of
  ``osd_heartbeat_grace`` to determine the threshold. A value of zero
  disables the warning. The new configuration option
  ``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides the
  computed value and causes a warning when OSD heartbeat pings take longer
  than the specified amount. The new admin command
  ``ceph daemon mgr.# dump_osd_network [threshold]`` will
  list all connections with a ping time longer than the specified threshold or
  the value determined by the config options, for the average of any of the
  three intervals. The new admin command
  ``ceph daemon osd.# dump_osd_network [threshold]`` will
  do the same, but only include heartbeats initiated by the specified OSD.
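For example (the daemon IDs and millisecond thresholds below are illustrative):

```shell
# Warn once average heartbeat pings exceed a fixed 500 ms, overriding the
# osd_heartbeat_grace-derived threshold (0 disables the warning)
ceph config set global mon_warn_on_slow_ping_time 500

# Dump connections whose average ping exceeds 1000 ms: cluster-wide via
# the mgr, or only heartbeats initiated by one OSD
ceph daemon mgr.a dump_osd_network 1000
ceph daemon osd.0 dump_osd_network 1000
```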
* The new OSD daemon command ``dump_recovery_reservations`` reveals the
  recovery locks held (in_progress) and waiting in priority queues.

* The new OSD daemon command ``dump_scrub_reservations`` reveals the
  scrub reservations that are held for local (primary) and remote (replica) PGs.
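Both are issued through the daemon admin socket; for example (the OSD id is illustrative):

```shell
ceph daemon osd.0 dump_recovery_reservations   # in_progress and queued recovery locks
ceph daemon osd.0 dump_scrub_reservations      # local (primary) and remote (replica) reservations
```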