>=15.0.0
--------

* The RGW "num_rados_handles" option has been removed.
* If you were using a "num_rados_handles" value greater than 1,
  multiply your current "objecter_inflight_ops" and
  "objecter_inflight_op_bytes" parameters by the old
  "num_rados_handles" to get the same throttle behavior.

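  For example, assuming a hypothetical deployment that previously used
  "num_rados_handles" = 4 with "objecter_inflight_ops" = 1024 and
  "objecter_inflight_op_bytes" = 104857600, the equivalent settings would be
  four times those values; the entity name below is only illustrative::

    ceph config set client.rgw objecter_inflight_ops 4096
    ceph config set client.rgw objecter_inflight_op_bytes 419430400
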
* Ceph now packages python bindings for python3.6 instead of
  python3.4, because python3 in EL7/EL8 is now using python3.6
  as the native python3. See the `announcement
  <https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO/>`_
  for more details on the background of this change.

* librbd now uses a write-around cache policy by default,
  replacing the previous write-back cache policy default.
  This cache policy allows librbd to immediately complete
  write IOs while they are still in-flight to the OSDs.
  Subsequent flush requests will ensure all in-flight
  write IOs are completed prior to the flush completing. The
  librbd cache policy can be controlled via a new
  "rbd_cache_policy" configuration option.

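  For example, a client that depends on the previous behaviour can be
  switched back to write-back caching (the config section shown is only
  illustrative)::

    ceph config set client rbd_cache_policy writeback
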
* librbd now includes a simple IO scheduler which attempts to
  batch together multiple IOs against the same backing RBD
  data block object. The librbd IO scheduler policy can be
  controlled via a new "rbd_io_scheduler" configuration
  option.

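  For example, assuming the ``none`` policy is desired, the scheduler can be
  disabled with a command like the following (the config section is only
  illustrative)::

    ceph config set client rbd_io_scheduler none
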
* RGW: radosgw-admin introduces two subcommands for managing
  expire-stale objects that might be left behind after a
  bucket reshard in earlier versions of RGW. One subcommand lists such
  objects and the other deletes them. Read the troubleshooting section
  of the dynamic resharding docs for details.

* RGW: Bucket naming restrictions have changed and are likely to cause
  InvalidBucketName errors. We recommend setting the
  ``rgw_relaxed_s3_bucket_names`` option to true as a workaround.

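  For example (the entity name below is only illustrative; use whichever
  section your radosgw daemons read their configuration from)::

    ceph config set client.rgw rgw_relaxed_s3_bucket_names true
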
* In the Zabbix Mgr Module there was a typo in the key being sent
  to Zabbix for PGs in backfill_wait state. The key that was sent
  was 'wait_backfill' and the correct name is 'backfill_wait'.
  Update your Zabbix template accordingly so that it accepts the
  new key being sent to Zabbix.

* The zabbix plugin for the ceph manager now includes OSD and pool
  discovery. An update of zabbix_template.xml is needed
  to receive per-pool (read/write throughput, diskspace usage)
  and per-OSD (latency, status, PGs) statistics.

* The format of all date + time stamps has been modified to fully
  conform to ISO 8601. The old format (``YYYY-MM-DD
  HH:MM:SS.ssssss``) excluded the ``T`` separator between the date and
  time and was rendered using the local time zone without any explicit
  indication. The new format includes the separator as well as a
  ``+nnnn`` or ``-nnnn`` suffix to indicate the time zone, or a ``Z``
  suffix if the time is UTC. For example,
  ``2019-04-26T18:40:06.225953+0100``.

  Any code or scripts that were previously parsing date and/or time
  values from the JSON or XML structured CLI output should be checked
  to ensure they can handle ISO 8601 conformant values. Any code
  parsing date or time values from the unstructured human-readable
  output should be modified to parse the structured output instead, as
  the human-readable output may change without notice.

* The ``bluestore_no_per_pool_stats_tolerance`` config option has been
  replaced with ``bluestore_fsck_error_on_no_per_pool_stats``
  (default: false). The overall default behavior has not changed:
  fsck will warn but not fail on legacy stores, and repair will
  convert to per-pool stats.

* The disaster-recovery related 'ceph mon sync force' command has been
  replaced with 'ceph daemon <...> sync_force'.

* The ``osd_recovery_max_active`` option now has
  ``osd_recovery_max_active_hdd`` and ``osd_recovery_max_active_ssd``
  variants, each with different default values for HDD and SSD-backed
  OSDs, respectively. ``osd_recovery_max_active`` itself now defaults
  to zero, which means that the OSD will conditionally use
  the HDD or SSD option values. Administrators who have customized
  this value may want to consider whether they have set this to a
  value similar to the new defaults (3 for HDDs and 10 for SSDs) and,
  if so, remove the option from their configuration entirely.

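  A customized value can be checked for and removed with commands like the
  following (a cluster-wide OSD setting is assumed here)::

    ceph config get osd osd_recovery_max_active
    ceph config rm osd osd_recovery_max_active
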
* Monitors now have a `ceph osd info` command that will provide information
  on all OSDs, or on the specified OSDs, removing the need to
  parse `osd dump` for the same information.

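  For example (the OSD id shown is only a placeholder)::

    ceph osd info          # all OSDs
    ceph osd info osd.0    # a single OSD
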
* The structured output of ``ceph status`` or ``ceph -s`` is now more
  concise, particularly the `mgrmap` and `monmap` sections, and the
  structure of the `osdmap` section has been cleaned up.

* A health warning is now generated if the average OSD heartbeat ping
  time exceeds a configurable threshold for any of the computed
  intervals. The OSD computes 1 minute, 5 minute and 15 minute
  intervals with average, minimum and maximum values. The new
  configuration option ``mon_warn_on_slow_ping_ratio`` specifies a
  percentage of ``osd_heartbeat_grace`` to determine the threshold; a
  value of zero disables the warning. The new configuration option
  ``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides
  the computed value and causes a warning when OSD heartbeat pings take
  longer than the specified amount.
  The new admin command ``ceph daemon mgr.# dump_osd_network [threshold]``
  lists all connections whose average ping time over any of the 3 intervals
  exceeds the specified threshold or the value determined by the config options.
  The new admin command ``ceph daemon osd.# dump_osd_network [threshold]``
  does the same, but includes only heartbeats initiated by the specified OSD.

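  For example, to list connections with an average ping time above one second
  (the mgr and OSD ids below are placeholders, and the threshold argument is
  assumed to be in milliseconds)::

    ceph daemon mgr.x dump_osd_network 1000
    ceph daemon osd.0 dump_osd_network 1000
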
* Inline data support for CephFS has been deprecated. When setting the flag,
  users will see a warning to that effect, and enabling it now requires the
  ``--yes-i-really-really-mean-it`` flag. If the MDS is started on a
  filesystem that has it enabled, a health warning is generated. Support for
  this feature will be removed in a future release.

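  File systems that still have the feature enabled can turn it off with a
  command like the following (the file system name is only a placeholder)::

    ceph fs set cephfs inline_data no
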
* ``ceph {set,unset} full`` is no longer supported. The ``full`` and
  ``nearfull`` flags in the OSD map have been used to track the fullness
  status of a cluster since the Hammer release; if the OSD map is marked
  ``full``, all write operations are blocked until this flag is removed. In
  the Infernalis release and the Linux kernel 4.7 client, per-pool
  full/nearfull flags were introduced for finer-grained control, so
  clients hold write operations if either the cluster-wide ``full``
  flag or the per-pool ``full`` flag is set. This was a compromise, since
  clusters both with and without per-pool ``full`` flag support had to be
  handled, but it practically defeated the purpose of introducing the
  per-pool flags. In the Mimic release, the new flags finally took the
  place of their cluster-wide counterparts, as the monitor started removing
  the two cluster-wide flags from the OSD map. Clients from Infernalis
  onward benefit from this change, as they are no longer blocked by full
  pools that they are not writing to. In this release, ``ceph {set,unset}
  full`` is treated as an invalid command, and clients continue to honor
  both the cluster-wide and per-pool flags in order to remain backward
  compatible with pre-Infernalis clusters.

* The telemetry module now reports more information.

  First, there is a new 'device' channel, enabled by default, that
  will report anonymized hard disk and SSD health metrics to
  telemetry.ceph.com in order to build and improve device failure
  prediction algorithms. If you are not comfortable sharing device
  metrics, you can disable that channel first before re-opting-in::

    ceph config set mgr mgr/telemetry/channel_device false

  Second, we now report more information about CephFS file systems,
  including:

  - how many MDS daemons (in total and per file system)
  - which features are (or have been) enabled
  - how many data pools
  - approximate file system age (year + month of creation)
  - how many files, bytes, and snapshots
  - how much metadata is being cached

  We have also added:

  - which Ceph release the monitors are running
  - whether msgr v1 or v2 addresses are used for the monitors
  - whether IPv4 or IPv6 addresses are used for the monitors
  - whether RADOS cache tiering is enabled (and which mode)
  - whether pools are replicated or erasure coded, and
    which erasure code profile plugin and parameters are in use
  - how many hosts are in the cluster, and how many hosts have each type of daemon
  - whether a separate OSD cluster network is being used
  - how many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled
  - how many RGW daemons, zones, and zonegroups are present; which RGW frontends are in use
  - aggregate stats about the CRUSH map, like which algorithms are used, how
    big buckets are, how many rules are defined, and what tunables are in
    use

  If you had telemetry enabled, you will need to re-opt-in with::

    ceph telemetry on

  You can first view exactly what information will be reported with::

    ceph telemetry show        # see everything
    ceph telemetry show basic  # basic cluster info (including all of the new info)

* The following invalid settings are no longer tolerated by the
  `ceph osd erasure-code-profile set xxx` command:

  * an invalid `m` for the "reed_sol_r6_op" erasure technique
  * an invalid `m` and an invalid `w` for the "liber8tion" erasure technique

* The new OSD daemon command dump_recovery_reservations reveals the
  recovery locks held (in_progress) and waiting in priority queues.

* The new OSD daemon command dump_scrub_reservations reveals the
  scrub reservations that are held for local (primary) and remote (replica) PGs.

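  These are admin socket commands; for example (the OSD id below is only a
  placeholder)::

    ceph daemon osd.0 dump_recovery_reservations
    ceph daemon osd.0 dump_scrub_reservations
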
* Previously, ``ceph tell mgr ...`` could be used to call commands
  implemented by mgr modules. This is no longer supported. Since
  luminous, using ``tell`` has not been necessary: those same commands
  are also accessible without the ``tell mgr`` portion (e.g., ``ceph
  tell mgr influx foo`` is the same as ``ceph influx foo``). ``ceph
  tell mgr ...`` will now call admin commands--the same set of
  commands accessible via ``ceph daemon ...`` when you are logged into
  the appropriate host.

* The ``ceph tell`` and ``ceph daemon`` commands have been unified,
  such that all such commands are accessible via either interface.
  Note that ceph-mgr tell commands are accessible via either ``ceph
  tell mgr ...`` or ``ceph tell mgr.<id> ...``, and it is only
  possible to send tell commands to the active daemon (the standbys do
  not accept incoming connections over the network).

* Ceph will now issue a health warning if a RADOS pool has a ``pg_num``
  value that is not a power of two. This can be fixed by adjusting
  the pool to a nearby power of two::

    ceph osd pool set <pool-name> pg_num <new-pg-num>

  Alternatively, the warning can be silenced with::

    ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

* The format of MDSs in `ceph fs dump` has changed.

* The ``mds_cache_size`` config option is completely removed. Since luminous,
  the ``mds_cache_memory_limit`` config option has been preferred to configure
  the MDS's cache limits.

* The ``pg_autoscale_mode`` is now set to ``on`` by default for newly
  created pools, which means that Ceph will automatically manage the
  number of PGs. To change this behavior, or to learn more about PG
  autoscaling, see :ref:`pg-autoscaler`. Note that existing pools in
  upgraded clusters will still be set to ``warn`` by default.

* The pool parameter ``target_size_ratio``, used by the pg autoscaler,
  has changed meaning. It is now normalized across pools, rather than
  specifying an absolute ratio. For details, see :ref:`pg-autoscaler`.
  If you have set target size ratios on any pools, you may want to set
  these pools to autoscale ``warn`` mode to avoid data movement during
  the upgrade::

    ceph osd pool set <pool-name> pg_autoscale_mode warn

* The ``upmap_max_iterations`` config option of mgr/balancer has been
  renamed to ``upmap_max_optimizations`` to better match its behaviour.

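  Any custom value should be carried over to the new option name, for example
  (the value shown is only illustrative)::

    ceph config set mgr mgr/balancer/upmap_max_optimizations 20
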
* ``mClockClientQueue`` and ``mClockClassQueue`` OpQueue
  implementations have been removed in favor of a single
  ``mClockScheduler`` implementation of a simpler OSD interface.
  Accordingly, the ``osd_op_queue_mclock*`` family of config options
  has been removed in favor of the ``osd_mclock_scheduler*`` family
  of options.

* The config subsystem now searches dot ('.') delineated prefixes for
  options. That means for an entity like ``client.foo.bar``, its
  overall configuration will be a combination of the global options,
  ``client``, ``client.foo``, and ``client.foo.bar``. Previously,
  only global, ``client``, and ``client.foo.bar`` options would apply.
  This change may affect the configuration for clients that include a
  ``.`` in their name.

  Note that this only applies to configuration options in the
  monitor's database--config file parsing is not affected.

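  For example, with this change a setting stored under the ``client.foo``
  prefix now also applies to ``client.foo.bar`` (the option used below is
  only illustrative)::

    ceph config set client.foo debug_ms 1
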
* RGW: bucket listing performance on sharded bucket indexes has been
  notably improved by heuristically -- and significantly, in many
  cases -- reducing the number of entries requested from each bucket
  index shard.

* The MDS default cache memory limit is now 4 GB.

* The behaviour of the ``-o`` argument to the rados tool has been reverted to
  its original behaviour of indicating an output file. This reverts it to a more
  consistent behaviour when compared to other tools. Specifying object size is now
  accomplished by using an upper-case O ``-O``.

* In certain rare cases, OSDs would classify themselves as type
  'nvme' instead of 'hdd' or 'ssd'. This appears to be limited to
  cases where BlueStore was deployed with older versions of ceph-disk,
  or manually without ceph-volume and LVM. Going forward, the OSD
  will limit itself to only 'hdd' and 'ssd' (or whatever device class the user
  manually specifies).

* RGW: a mismatch between the bucket notification documentation and the actual
  message format was fixed. This means that any endpoint receiving bucket
  notifications will now receive them inside a JSON array
  named 'Records'. Note that this does not affect pulling bucket notifications
  from a subscription in a 'pubsub' zone, as these are already wrapped inside
  that array.

* The configuration value ``osd_calc_pg_upmaps_max_stddev`` used for upmap
  balancing has been removed. Instead use the mgr balancer config
  ``upmap_max_deviation``, which is now an integer number of PGs of deviation
  from the target PGs per OSD. This can be set with a command like
  ``ceph config set mgr mgr/balancer/upmap_max_deviation 2``. The default
  ``upmap_max_deviation`` is 1. There are situations where crush rules
  would never allow a pool to have completely balanced PGs; for example, if
  crush requires 1 replica on each of 3 racks, but there are fewer OSDs in 1 of
  the racks, the configuration value can be increased.

* MDS daemons can now be assigned to manage a particular file system via the
  new ``mds_join_fs`` option. The monitors will prefer to use an MDS for a file
  system only when its mds_join_fs is equal to that file system's name (strong
  affinity). Monitors may also deliberately fail over an active MDS to a standby
  when the cluster is otherwise healthy if the standby has stronger affinity.

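  For example, to give one MDS a strong affinity for one file system (the MDS
  and file system names below are only placeholders)::

    ceph config set mds.a mds_join_fs cephfs
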
* RGW Multisite: A new fine-grained, bucket-granularity sync policy
  configuration system has been introduced, and it supersedes the previous
  coarse zone sync configuration (specifically the ``sync_from`` and
  ``sync_from_all`` fields in the zonegroup configuration). The new
  configuration should only be applied after all relevant zones in the
  zonegroup have been upgraded.

* RGW S3: Support has been added for the BlockPublicAccess set of APIs at the
  bucket level; currently, blocking/ignoring public ACLs and policies is
  supported. User/account-level APIs are planned to be added in the future.

* RGW: The default number of bucket index shards for new buckets was raised
  from 1 to 11 to increase the amount of write throughput for small buckets
  and delay the onset of dynamic resharding. This change only affects new
  deployments/zones. To change this default value on existing deployments,
  use 'radosgw-admin zonegroup modify --bucket-index-max-shards=11'.
  If the zonegroup is part of a realm, the change must be committed with
  'radosgw-admin period update --commit' - otherwise the change will take
  effect after radosgws are restarted.