update source to Ceph Pacific 16.2.2

diff --git a/ceph/PendingReleaseNotes b/ceph/PendingReleaseNotes

index cf62599debc08f5ec2b3bf00264da438447d34a9..1963b699d743eb0533119df943f42c4ca6e89db4 100644
--- a/ceph/PendingReleaseNotes
+++ b/ceph/PendingReleaseNotes
@@ -1,34 +1,313 @@
-15.2.9
-------
-* MGR: progress module can now be turned on/off, using the commands:
-  ``ceph progress on`` and ``ceph progress off``.
+>=17.0.0
+
+* A new library is available, libcephsqlite. It provides a SQLite Virtual File
+  System (VFS) on top of RADOS. The database and journals are striped over
+  RADOS across multiple objects for virtually unlimited scaling and throughput
+  only limited by the SQLite client. Applications using SQLite may change to
+  the Ceph VFS with minimal changes, usually just by specifying the alternate
+  VFS. We expect the library to be most impactful and useful for applications
+  that were storing state in RADOS omap, especially without striping which
+  limits scalability.
+
+>=16.0.0
+--------
+* CephFS: Disabling allow_standby_replay on a file system will also stop all
+  standby-replay daemons for that file system.
  
  * New bluestore_rocksdb_options_annex config parameter. Complements
    bluestore_rocksdb_options and allows setting rocksdb options without repeating
    the existing defaults.
+* CephFS adds two new CDentry tags, 'I' --> 'i' and 'L' --> 'l', and
+  on-RADOS metadata is no longer backwards compatible after upgrading to
+  Pacific or a later release.
  
-15.2.8
-------
  * $pid expansion in config paths like `admin_socket` will now properly expand
    to the daemon pid for commands like `ceph-mds` or `ceph-osd`. Previously only
    `ceph-fuse`/`rbd-nbd` expanded `$pid` with the actual daemon pid.
  
+* The allowable options for some "radosgw-admin" commands have been changed.
+
+  * "mdlog-list", "datalog-list", "sync-error-list" no longer accept
+    start and end dates, but do accept a single optional start marker.
+  * "mdlog-trim", "datalog-trim", "sync-error-trim" only accept a
+    single marker giving the end of the trimmed range.
+  * Similarly the date ranges and marker ranges have been removed on
+    the RESTful DATALog and MDLog list and trim operations.
+
  * ceph-volume: The ``lvm batch`` subcommand received a major rewrite. This closes
    a number of bugs and improves usability in terms of size specification and
    calculation, as well as idempotency behaviour and disk replacement process.
    Please refer to https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/ for
    more detailed information.
  
+* Configuration variables for permitted scrub times have changed.  The legal
+  values for ``osd_scrub_begin_hour`` and ``osd_scrub_end_hour`` are 0 - 23.
+  The use of 24 is now illegal.  Specifying ``0`` for both values causes every
+  hour to be allowed.  The legal values for ``osd_scrub_begin_week_day`` and
+  ``osd_scrub_end_week_day`` are 0 - 6.  The use of 7 is now illegal.
+  Specifying ``0`` for both values causes every day of the week to be allowed.
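The scrub-window rules above can be illustrated with a minimal Python sketch. It is not Ceph's implementation: the function names are hypothetical, and the half-open window and wrap-around behaviour for windows that cross midnight are assumptions made for illustration. Only the legal ranges (0 - 23 for hours, 0 - 6 for week days) and the meaning of ``0`` for both values come from the note.

```python
def validate_scrub_window(begin: int, end: int, upper: int) -> None:
    """Reject values outside 0..upper (23 for hours, 6 for week days)."""
    for value in (begin, end):
        if not 0 <= value <= upper:
            raise ValueError(f"{value} is outside the legal range 0 - {upper}")


def scrub_permitted(now: int, begin: int, end: int, upper: int = 23) -> bool:
    """Return True if `now` falls inside the configured scrub window."""
    validate_scrub_window(begin, end, upper)
    if begin == 0 and end == 0:
        # From the release note: 0 for both values allows every hour/day.
        return True
    if begin < end:
        # Assumption: the window is half-open, [begin, end).
        return begin <= now < end
    # Assumption: a window with begin >= end wraps around (e.g. 22 -> 6).
    return now >= begin or now < end
```

For week days, the same check could be called with ``upper=6``; a value of 24 for an hour or 7 for a week day raises ``ValueError`` in this sketch.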
+
+* Support for multiple file systems in a single Ceph cluster is now stable. New
+  Ceph clusters enable support for multiple file systems by default. Existing clusters
+  must still set the "enable_multiple" flag on the fs. Please see the CephFS
+  documentation for more information.
+
+* volume/nfs: The "ganesha-" prefix was recently removed from the cluster id
+  and the nfs-ganesha common config object to ensure a consistent namespace
+  across different orchestrator backends. Please delete any existing
+  nfs-ganesha clusters prior to upgrading and redeploy new clusters after
+  upgrading to Pacific.
+
+* A new health check, DAEMON_OLD_VERSION, will warn if different versions of Ceph are running
+  on daemons. It will generate a health error if multiple versions are detected.
+  This condition must exist for over mon_warn_older_version_delay (set to 1 week by default) in order for the
+  health condition to be triggered.  This allows most upgrades to proceed
+  without falsely seeing the warning.  If upgrade is paused for an extended
+  time period, health mute can be used like this
+  "ceph health mute DAEMON_OLD_VERSION --sticky".  In this case after
+  upgrade has finished use "ceph health unmute DAEMON_OLD_VERSION".
+
+* MGR: progress module can now be turned on/off, using the commands:
+  ``ceph progress on`` and ``ceph progress off``.
+
+* The ceph_volume_client.py library used for manipulating legacy "volumes" in
+  CephFS is removed. All remaining users should use the "fs volume" interface
+  exposed by the ceph-mgr:
+  https://docs.ceph.com/en/latest/cephfs/fs-volumes/
+
+* An AWS-compliant API: "GetTopicAttributes" was added to replace the existing 
+  "GetTopic" API. The new API should be used to fetch information about topics 
+  used for bucket notifications.
+
+* librbd: The shared, read-only parent cache's config option 
+  ``immutable_object_cache_watermark`` has now been updated to properly reflect 
+  the upper cache utilization before space is reclaimed. The default 
+  ``immutable_object_cache_watermark`` is now ``0.9``. If the capacity reaches 
+  90% the daemon will delete cold cache.
+
+* OSD: the option ``osd_fast_shutdown_notify_mon`` has been introduced to allow
+  the OSD to notify the monitor it is shutting down even if ``osd_fast_shutdown``
+  is enabled. This helps with the monitor logs on larger clusters, that may get
+  many 'osd.X reported immediately failed by osd.Y' messages, and confuse tools.
+* rgw/kms/vault: the transit logic has been revamped to better use
+  the transit engine in vault.  To take advantage of this new
+  functionality configuration changes are required.  See the current
+  documentation (radosgw/vault) for more details.
+
+* Scrubs are now more aggressive in trying to find PGs that can be scrubbed simultaneously within the osd_max_scrubs limit.
+  It is possible that increasing osd_scrub_sleep may be necessary to maintain client responsiveness.
+
+* The mclock scheduler has been refined. A set of built-in profiles is now available that
+  provide QoS between the internal and external clients of Ceph. To enable the mclock
+  scheduler, set the config option "osd_op_queue" to "mclock_scheduler". The
+  "high_client_ops" profile is enabled by default, and allocates more OSD bandwidth to
+  external client operations than to internal client operations (such as background recovery
+  and scrubs). Other built-in profiles include "high_recovery_ops" and "balanced". These
+  built-in profiles optimize the QoS provided to clients of mclock scheduler.
+
+* Version 2 of the cephx authentication protocol (``CEPHX_V2`` feature bit) is
+  now required by default.  It was introduced in 2018, adding replay attack
+  protection for authorizers and making msgr v1 message signatures stronger
+  (CVE-2018-1128 and CVE-2018-1129).  Support is present in Jewel 10.2.11,
+  Luminous 12.2.6, Mimic 13.2.1, Nautilus 14.2.0 and later; upstream kernels
+  4.9.150, 4.14.86, 4.19 and later; various distribution kernels, in particular
+  CentOS 7.6 and later.  To enable older clients, set ``cephx_require_version``
+  and ``cephx_service_require_version`` config options to 1.
+
+>=15.0.0
+--------
+
  * MON: The cluster log now logs health detail every ``mon_health_to_clog_interval``,
    which has been changed from 1hr to 10min. Logging of health detail will be
    skipped if there is no change in health summary since last known.
  
  * The ``ceph df`` command now lists the number of pgs in each pool.
  
-* The ``bluefs_preextend_wal_files`` option has been removed.
+* Monitors now have the config option ``mon_allow_pool_size_one``, which is disabled
+  by default. Even when it is enabled, users must pass the
+  ``--yes-i-really-mean-it`` flag to ``osd pool set size 1`` to confirm
+  that they really want to configure a pool size of 1.
+
+* librbd now inherits the stripe unit and count from its parent image upon creation.
+  This can be overridden by specifying different stripe settings during clone creation.
+
+* The balancer is now on by default in upmap mode. Since upmap mode requires
+  ``require_min_compat_client`` luminous, new clusters will only support luminous
+  and newer clients by default. Existing clusters can enable upmap support by running
+  ``ceph osd set-require-min-compat-client luminous``. It is still possible to turn
+  the balancer off using the ``ceph balancer off`` command. In earlier versions,
+  the balancer was included in the ``always_on_modules`` list, but needed to be
+  turned on explicitly using the ``ceph balancer on`` command.
+
+* MGR: the "cloud" mode of the diskprediction module is not supported anymore
+  and the ``ceph-mgr-diskprediction-cloud`` manager module has been removed. This
+  is because the external cloud service run by ProphetStor is no longer accessible
+  and there is no immediate replacement for it at this time. The "local" prediction
+  mode will continue to be supported.
+
+* Cephadm: There were a lot of small usability improvements and bug fixes:
+
+  * Grafana when deployed by Cephadm now binds to all network interfaces.
+  * ``cephadm check-host`` now prints all detected problems at once.
+  * Cephadm now calls ``ceph dashboard set-grafana-api-ssl-verify false``
+    when generating an SSL certificate for Grafana.
+  * The Alertmanager is now correctly pointed to the Ceph Dashboard
+  * ``cephadm adopt`` now supports adopting an Alertmanager
+  * ``ceph orch ps`` now supports filtering by service name
+  * ``ceph orch host ls`` now marks hosts as offline, if they are not
+    accessible.
+
+* Cephadm can now deploy NFS Ganesha services. For example, to deploy NFS with
+  a service id of mynfs, that will use the RADOS pool nfs-ganesha and namespace
+  nfs-ns::
+
+    ceph orch apply nfs mynfs nfs-ganesha nfs-ns
+
+* Cephadm: ``ceph orch ls --export`` now returns all service specifications in
+  yaml representation that is consumable by ``ceph orch apply``. In addition,
+  the commands ``orch ps`` and ``orch ls`` now support ``--format yaml`` and
+  ``--format json-pretty``.
+
+* CephFS: Automatic static subtree partitioning policies may now be configured
+  using the new distributed and random ephemeral pinning extended attributes on
+  directories. See the documentation for more information:
+  https://docs.ceph.com/docs/master/cephfs/multimds/
+
+* Cephadm: ``ceph orch apply osd`` supports a ``--preview`` flag that prints a preview of
+  the OSD specification before deploying OSDs. This makes it possible to
+  verify that the specification is correct, before applying it.
+
+* RGW: The ``radosgw-admin`` sub-commands dealing with orphans --
+  ``radosgw-admin orphans find``, ``radosgw-admin orphans finish``, and
+  ``radosgw-admin orphans list-jobs`` -- have been deprecated. They have
+  not been actively maintained and they store intermediate results on
+  the cluster, which could fill a nearly-full cluster.  They have been
+  replaced by a tool, currently considered experimental,
+  ``rgw-orphan-list``.
+
+* RBD: The name of the rbd pool object that is used to store
+  rbd trash purge schedule is changed from "rbd_trash_trash_purge_schedule"
+  to "rbd_trash_purge_schedule". Users that have already started using
+  ``rbd trash purge schedule`` functionality and have per pool or namespace
+  schedules configured should copy "rbd_trash_trash_purge_schedule"
+  object to "rbd_trash_purge_schedule" before the upgrade and remove
+  "rbd_trash_trash_purge_schedule" using the following commands in every RBD
+  pool and namespace where a trash purge schedule was previously
+  configured::
+
+    rados -p <pool-name> [-N namespace] cp rbd_trash_trash_purge_schedule rbd_trash_purge_schedule
+    rados -p <pool-name> [-N namespace] rm rbd_trash_trash_purge_schedule
+
+  or use any other convenient way to restore the schedule after the
+  upgrade.
+
+* librbd: The shared, read-only parent cache has been moved to a separate librbd
+  plugin. If the parent cache was previously in-use, you must also instruct
+  librbd to load the plugin by adding the following to your configuration::
+
+    rbd_plugins = parent_cache
+
+* Monitors now have a config option ``mon_osd_warn_num_repaired``, 10 by default.
+  If any OSD has repaired more than this many I/O errors in stored data, an
+  ``OSD_TOO_MANY_REPAIRS`` health warning is generated.
+
+* Introduce commands that manipulate required client features of a file system::
+
+    ceph fs required_client_features <fs name> add <feature>
+    ceph fs required_client_features <fs name> rm <feature>
+    ceph fs feature ls
+
+* OSD: A new configuration option ``osd_compact_on_start`` has been added which triggers
+  an OSD compaction on start. Setting this option to ``true`` and restarting an OSD
+  will result in an offline compaction of the OSD prior to booting.
+
+* OSD: the option named ``bdev_nvme_retry_count`` has been removed because
+  SPDK v20.07 provides no easy access to bdev_nvme options and this
+  option was hardly used.
+
+* Now when noscrub and/or nodeep-scrub flags are set globally or per pool,
+  scheduled scrubs of the type disabled will be aborted. All user initiated
+  scrubs are NOT interrupted.
+
+* The Alpine build-related script, documentation and test have been removed since
+  the most up-to-date APKBUILD script for Ceph is already included in Alpine Linux's
+  aports repository.
+
+* fs: Names of new FSs, volumes, subvolumes and subvolume groups can only
+  contain alphanumeric and ``-``, ``_`` and ``.`` characters. Some commands
+  or CephX credentials may not work with old FSs with non-conformant names.
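The naming rule above can be checked ahead of time with a short Python sketch. The helper name ``is_conformant_name`` is hypothetical, and the exact character class (ASCII letters and digits plus ``-``, ``_`` and ``.``, with empty names rejected) is an assumption based on the wording of the note; the check Ceph performs may differ in detail.

```python
import re

# Assumption: a conformant name is non-empty and consists only of ASCII
# letters, digits, '-', '_' and '.'.
_VALID_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_conformant_name(name: str) -> bool:
    """Return True if `name` only uses the characters allowed for new FS names."""
    return _VALID_NAME.fullmatch(name) is not None
```

A name such as ``cephfs_a.1-b`` passes this check, while names containing spaces or slashes do not.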
  
  * It is now possible to specify the initial monitor to contact for Ceph tools
    and daemons using the ``mon_host_override`` config option or
    ``--mon-host-override <ip>`` command-line switch. This generally should only
    be used for debugging and only affects initial communication with Ceph's
    monitor cluster.
+
+* `blacklist` has been replaced with `blocklist` throughout.  The following commands have changed:
+
+  - ``ceph osd blacklist ...`` is now ``ceph osd blocklist ...``
+  - ``ceph <tell|daemon> osd.<NNN> dump_blacklist`` is now ``ceph <tell|daemon> osd.<NNN> dump_blocklist``
+
+* The following config options have changed:
+
+  - ``mon osd blacklist default expire`` is now ``mon osd blocklist default expire``
+  - ``mon mds blacklist interval`` is now ``mon mds blocklist interval``
+  - ``mon mgr blacklist interval`` is now ``mon mgr blocklist interval``
+  - ``rbd blacklist on break lock`` is now ``rbd blocklist on break lock``
+  - ``rbd blacklist expire seconds`` is now ``rbd blocklist expire seconds``
+  - ``mds session blacklist on timeout`` is now ``mds session blocklist on timeout``
+  - ``mds session blacklist on evict`` is now ``mds session blocklist on evict``
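When updating stored configuration for the renames listed above, a small helper along these lines might be used. This is only a sketch: ``migrate_option_name`` is a hypothetical name, and treating spaces and underscores in option names as interchangeable is an assumption of the sketch. Only the seven options listed in the note are renamed; anything else is returned unchanged.

```python
# The seven old option names from the release note, written with spaces.
OLD_OPTIONS = {
    "mon osd blacklist default expire",
    "mon mds blacklist interval",
    "mon mgr blacklist interval",
    "rbd blacklist on break lock",
    "rbd blacklist expire seconds",
    "mds session blacklist on timeout",
    "mds session blacklist on evict",
}


def migrate_option_name(name: str) -> str:
    """Map an old 'blacklist' option name to its 'blocklist' replacement."""
    # Assumption: underscores and spaces are equivalent separators.
    normalized = " ".join(name.replace("_", " ").split())
    if normalized in OLD_OPTIONS:
        # Keep the caller's separator style; only the one word changes.
        return name.replace("blacklist", "blocklist")
    return name
```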
+
+* CephFS: Compatibility code for the old on-disk snapshot format has been removed.
+  The current on-disk snapshot format was introduced in the Mimic release. If there
+  are any snapshots created by a Ceph release older than Mimic, either delete
+  them all or scrub the whole file system before upgrading::
+
+    ceph daemon <mds of rank 0> scrub_path / force recursive repair
+    ceph daemon <mds of rank 0> scrub_path '~mdsdir' force recursive repair
+
+* CephFS: Scrub is supported in setups with multiple active MDS daemons. MDS rank 0
+  handles scrub commands and forwards them to other MDS daemons if necessary.
+
+* The following librados API calls have changed:
+
+  - ``rados_blacklist_add`` is now ``rados_blocklist_add``; the former will issue a deprecation warning and be removed in a future release.
+  - ``rados.blacklist_add`` is now ``rados.blocklist_add`` in the C++ API.
+
+* The JSON output for the following commands now shows ``blocklist`` instead of ``blacklist``:
+
+  - ``ceph osd dump``
+  - ``ceph <tell|daemon> osd.<N> dump_blocklist``
+
+* caps: MON and MDS caps can now be used to restrict a client's ability to view
+  and operate on specific Ceph file systems. The FS can be specified using
+  ``fsname`` in caps. This also affects the subcommand ``fs authorize``; the caps
+  produced by it will be specific to the FS name passed in its arguments.
+
+* fs: root_squash flag can be set in MDS caps. It disallows file system
+  operations that need write access for clients with uid=0 or gid=0. This
+  feature should prevent accidents such as an inadvertent `sudo rm -rf /<path>`.
+
+* fs: "fs authorize" now sets MON cap to "allow <perm> fsname=<fsname>"
+  instead of setting it to "allow r" all the time.
+
+* ``ceph pg #.# list_unfound`` output has been enhanced to provide
+  might_have_unfound information which indicates which OSDs may
+  contain the unfound objects.
+
+* The ``ceph orch apply rgw`` syntax and behavior have changed.  RGW
+  services can now be arbitrarily named (it is no longer forced to be
+  `realm.zone`).  The ``--rgw-realm=...`` and ``--rgw-zone=...``
+  arguments are now optional, which means that if they are omitted, a
+  vanilla single-cluster RGW will be deployed.  When the realm and
+  zone are provided, the user is now responsible for setting up the
+  multisite configuration beforehand--cephadm no longer attempts to
+  create missing realms or zones.