>=17.0.0
--------

* A new library, libcephsqlite, is available. It provides a SQLite Virtual File
  System (VFS) on top of RADOS. The database and journals are striped across
  multiple RADOS objects, giving virtually unlimited scaling, with throughput
  limited only by the SQLite client. Applications using SQLite can switch to
  the Ceph VFS with minimal changes, usually just by specifying the alternate
  VFS. We expect the library to be most impactful and useful for applications
  that were storing state in RADOS omap, especially without striping, which
  limits scalability.

>=16.0.0
--------

* CephFS: Disabling ``allow_standby_replay`` on a file system will also stop
  all standby-replay daemons for that file system.

* New ``bluestore_rocksdb_options_annex`` config parameter. It complements
  ``bluestore_rocksdb_options`` and allows setting RocksDB options without
  repeating the existing defaults.

* CephFS adds two new CDentry tags, 'I' --> 'i' and 'L' --> 'l', and on-RADOS
  metadata is no longer backwards compatible after upgrading to Pacific or a
  later release.

* ``$pid`` expansion in config paths like ``admin_socket`` will now properly
  expand to the daemon pid for commands like ``ceph-mds`` or ``ceph-osd``.
  Previously only ``ceph-fuse``/``rbd-nbd`` expanded ``$pid`` with the actual
  daemon pid.

* The allowable options for some ``radosgw-admin`` commands have been changed.

  * "mdlog-list", "datalog-list" and "sync-error-list" no longer accept start
    and end dates, but do accept a single optional start marker.
  * "mdlog-trim", "datalog-trim" and "sync-error-trim" only accept a single
    marker giving the end of the trimmed range.
  * Similarly, the date ranges and marker ranges have been removed on the
    RESTful DataLog and MDLog list and trim operations.

* ceph-volume: The ``lvm batch`` subcommand received a major rewrite. This
  closed a number of bugs and improves usability in terms of size
  specification and calculation, as well as idempotency behaviour and the disk
  replacement process. Please refer to
  https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/ for more detailed
  information.

* Configuration variables for permitted scrub times have changed. The legal
  values for ``osd_scrub_begin_hour`` and ``osd_scrub_end_hour`` are 0 - 23.
  The use of 24 is now illegal. Specifying ``0`` for both values causes every
  hour to be allowed. The legal values for ``osd_scrub_begin_week_day`` and
  ``osd_scrub_end_week_day`` are 0 - 6. The use of 7 is now illegal.
  Specifying ``0`` for both values causes every day of the week to be allowed.

* Multiple file systems in a single Ceph cluster are now stable. New Ceph
  clusters enable support for multiple file systems by default. Existing
  clusters must still set the "enable_multiple" flag on the FS. Please see the
  CephFS documentation for more information.

* volume/nfs: The "ganesha-" prefix was recently removed from the cluster id
  and from the nfs-ganesha common config object, to ensure a consistent
  namespace across different orchestrator backends. Please delete any existing
  nfs-ganesha clusters prior to upgrading and redeploy new clusters after
  upgrading to Pacific.

* A new health check, DAEMON_OLD_VERSION, will warn if different versions of
  Ceph are running on daemons. It will generate a health error if multiple
  versions are detected. This condition must exist for longer than
  ``mon_warn_older_version_delay`` (1 week by default) for the health
  condition to be triggered. This allows most upgrades to proceed without
  falsely seeing the warning. If an upgrade is paused for an extended time
  period, the health check can be muted like this::

    ceph health mute DAEMON_OLD_VERSION --sticky

  In this case, after the upgrade has finished, use::

    ceph health unmute DAEMON_OLD_VERSION

* MGR: The progress module can now be turned on/off, using the commands
  ``ceph progress on`` and ``ceph progress off``.

* An AWS-compliant API, "GetTopicAttributes", was added to replace the
  existing "GetTopic" API. The new API should be used to fetch information
  about topics used for bucket notifications.

* librbd: The shared, read-only parent cache's config option
  ``immutable_object_cache_watermark`` has now been updated to properly
  reflect the upper cache utilization before space is reclaimed. The default
  ``immutable_object_cache_watermark`` is now ``0.9``. If the capacity reaches
  90%, the daemon will delete cold cache.

* The ceph_volume_client.py library used for manipulating legacy "volumes" in
  CephFS has been removed. All remaining users should use the "fs volume"
  interface exposed by the ceph-mgr:
  https://docs.ceph.com/en/latest/cephfs/fs-volumes/

* rgw/kms/vault: The transit logic has been revamped to better use the transit
  engine in Vault. To take advantage of this new functionality, configuration
  changes are required. See the current documentation (radosgw/vault) for
  more details.

* Scrubs are more aggressive in trying to find PGs that can be scrubbed
  simultaneously within the ``osd_max_scrubs`` limit. Increasing
  ``osd_scrub_sleep`` may be necessary to maintain client responsiveness.
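The scrub-time rules described above for ``osd_scrub_begin_hour``/``osd_scrub_end_hour`` and ``osd_scrub_begin_week_day``/``osd_scrub_end_week_day`` can be illustrated with a small Python sketch. It is not Ceph's implementation: it assumes the permitted window is half-open ``[begin, end)``, that it wraps around midnight (or the end of the week) when ``begin > end``, and that equal values, such as ``0`` and ``0``, mean no restriction. The helper names are hypothetical.

```python
"""Sketch of the permitted-scrub-time rules (assumptions noted inline)."""

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def _check(name: str, value: int, upper: int) -> None:
    # Legal values are 0 .. upper-1, so 24 (hours) and 7 (week days) are rejected.
    if not 0 <= value < upper:
        raise ValueError(f"{name}={value} is illegal; expected 0 - {upper - 1}")


def in_window(begin: int, end: int, current: int) -> bool:
    if begin == end:
        # Assumption: equal bounds (e.g. 0 and 0) allow every hour or day.
        return True
    if begin < end:
        # Assumption: half-open window [begin, end).
        return begin <= current < end
    # begin > end: assume the window wraps around, e.g. 22 -> 6.
    return current >= begin or current < end


def scrub_hour_permitted(begin_hour: int, end_hour: int, hour: int) -> bool:
    _check("osd_scrub_begin_hour", begin_hour, HOURS_PER_DAY)
    _check("osd_scrub_end_hour", end_hour, HOURS_PER_DAY)
    return in_window(begin_hour, end_hour, hour)


def scrub_week_day_permitted(begin_day: int, end_day: int, day: int) -> bool:
    _check("osd_scrub_begin_week_day", begin_day, DAYS_PER_WEEK)
    _check("osd_scrub_end_week_day", end_day, DAYS_PER_WEEK)
    return in_window(begin_day, end_day, day)


if __name__ == "__main__":
    print(all(scrub_hour_permitted(0, 0, h) for h in range(24)))  # True
    print(scrub_hour_permitted(22, 6, 23), scrub_hour_permitted(22, 6, 12))  # True False
    try:
        scrub_hour_permitted(0, 24, 3)
    except ValueError as err:
        print(err)  # osd_scrub_end_hour=24 is illegal; expected 0 - 23
```

Keeping validation separate from the window check makes the "24 and 7 are now illegal" rule explicit, and lets the same window logic serve both hours and week days.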
* OSD: The option ``osd_fast_shutdown_notify_mon`` has been introduced to
  allow the OSD to notify the monitor it is shutting down even if
  ``osd_fast_shutdown`` is enabled. This helps with the monitor logs on larger
  clusters, which may otherwise get many 'osd.X reported immediately failed by
  osd.Y' messages that confuse tools.

* The mclock scheduler has been refined. A set of built-in profiles is now
  available that provides QoS between the internal and external clients of
  Ceph. To enable the mclock scheduler, set the config option "osd_op_queue"
  to "mclock_scheduler". The "high_client_ops" profile is enabled by default,
  and allocates more OSD bandwidth to external client operations than to
  internal client operations (such as background recovery and scrubs). Other
  built-in profiles include "high_recovery_ops" and "balanced". These built-in
  profiles optimize the QoS provided to clients of the mclock scheduler.

* Version 2 of the cephx authentication protocol (``CEPHX_V2`` feature bit) is
  now required by default. It was introduced in 2018, adding replay attack
  protection for authorizers and making msgr v1 message signatures stronger
  (CVE-2018-1128 and CVE-2018-1129). Support is present in Jewel 10.2.11,
  Luminous 12.2.6, Mimic 13.2.1, Nautilus 14.2.0 and later; upstream kernels
  4.9.150, 4.14.86, 4.19 and later; and various distribution kernels, in
  particular CentOS 7.6 and later. To enable older clients, set
  ``cephx_require_version`` and ``cephx_service_require_version`` config
  options to 1.

>=15.0.0
--------

* MON: The cluster log now logs health detail every
  ``mon_health_to_clog_interval``, which has been changed from 1hr to 10min.
  Logging of health detail will be skipped if there is no change in the health
  summary since it was last logged.

* The ``ceph df`` command now lists the number of pgs in each pool.

* Monitors now have a config option ``mon_allow_pool_size_one``, which is
  disabled by default. Even when it is enabled, users must now also pass the
  ``--yes-i-really-mean-it`` flag to ``osd pool set size 1`` if they are
  really sure of configuring pool size 1.

* librbd now inherits the stripe unit and count from its parent image upon
  creation. This can be overridden by specifying different stripe settings
  during clone creation.

* The balancer is now on by default in upmap mode. Since upmap mode requires
  ``require_min_compat_client`` luminous, new clusters will only support
  luminous and newer clients by default. Existing clusters can enable upmap
  support by running ``ceph osd set-require-min-compat-client luminous``. It
  is still possible to turn the balancer off using the ``ceph balancer off``
  command. In earlier versions, the balancer was included in the
  ``always_on_modules`` list, but needed to be turned on explicitly using the
  ``ceph balancer on`` command.

* MGR: The "cloud" mode of the diskprediction module is not supported anymore
  and the ``ceph-mgr-diskprediction-cloud`` manager module has been removed.
  This is because the external cloud service run by ProphetStor is no longer
  accessible and there is no immediate replacement for it at this time. The
  "local" prediction mode will continue to be supported.

* Cephadm: There were a lot of small usability improvements and bug fixes:

  * Grafana when deployed by Cephadm now binds to all network interfaces.
  * ``cephadm check-host`` now prints all detected problems at once.
  * Cephadm now calls ``ceph dashboard set-grafana-api-ssl-verify false``
    when generating an SSL certificate for Grafana.
  * The Alertmanager is now correctly pointed to the Ceph Dashboard.
  * ``cephadm adopt`` now supports adopting an Alertmanager.
  * ``ceph orch ps`` now supports filtering by service name.
  * ``ceph orch host ls`` now marks hosts as offline if they are not
    accessible.

* Cephadm can now deploy NFS Ganesha services. For example, to deploy NFS with
  a service id of mynfs, that will use the RADOS pool nfs-ganesha and
  namespace nfs-ns::

    ceph orch apply nfs mynfs nfs-ganesha nfs-ns

* Cephadm: ``ceph orch ls --export`` now returns all service specifications in
  yaml representation that is consumable by ``ceph orch apply``. In addition,
  the commands ``orch ps`` and ``orch ls`` now support ``--format yaml`` and
  ``--format json-pretty``.

* CephFS: Automatic static subtree partitioning policies may now be configured
  using the new distributed and random ephemeral pinning extended attributes
  on directories. See the documentation for more information:
  https://docs.ceph.com/docs/master/cephfs/multimds/

* Cephadm: ``ceph orch apply osd`` supports a ``--preview`` flag that prints a
  preview of the OSD specification before deploying OSDs. This makes it
  possible to verify that the specification is correct before applying it.

* RGW: The ``radosgw-admin`` sub-commands dealing with orphans
  (``radosgw-admin orphans find``, ``radosgw-admin orphans finish``, and
  ``radosgw-admin orphans list-jobs``) have been deprecated. They have not
  been actively maintained and they store intermediate results on the
  cluster, which could fill a nearly-full cluster. They have been replaced by
  a tool, currently considered experimental, ``rgw-orphan-list``.

* RBD: The name of the rbd pool object that is used to store the rbd trash
  purge schedule has changed from "rbd_trash_trash_purge_schedule" to
  "rbd_trash_purge_schedule". Users that have already started using the
  ``rbd trash purge schedule`` functionality and have per pool or namespace
  schedules configured should copy the "rbd_trash_trash_purge_schedule" object
  to "rbd_trash_purge_schedule" before the upgrade and remove
  "rbd_trash_trash_purge_schedule" using the following commands in every RBD
  pool and namespace where a trash purge schedule was previously configured::

    rados -p [-N namespace] cp rbd_trash_trash_purge_schedule rbd_trash_purge_schedule
    rados -p [-N namespace] rm rbd_trash_trash_purge_schedule

  or use any other convenient way to restore the schedule after the upgrade.

* librbd: The shared, read-only parent cache has been moved to a separate
  librbd plugin. If the parent cache was previously in use, you must also
  instruct librbd to load the plugin by adding the following to your
  configuration::

    rbd_plugins = parent_cache

* Monitors now have a config option ``mon_osd_warn_num_repaired``, 10 by
  default. If any OSD has repaired more than this many I/O errors in stored
  data, an ``OSD_TOO_MANY_REPAIRS`` health warning is generated.

* Introduce commands that manipulate required client features of a file
  system::

    ceph fs required_client_features add
    ceph fs required_client_features rm
    ceph fs feature ls

* OSD: A new configuration option ``osd_compact_on_start`` has been added
  which triggers an OSD compaction on start. Setting this option to ``true``
  and restarting an OSD will result in an offline compaction of the OSD prior
  to booting.

* OSD: The option named ``bdev_nvme_retry_count`` has been removed, because
  SPDK v20.07 provides no easy access to bdev_nvme options and this option was
  hardly used.

* Now when the noscrub and/or nodeep-scrub flags are set globally or per pool,
  scheduled scrubs of the disabled type will be aborted. User-initiated scrubs
  are NOT interrupted.
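The distributed and random ephemeral pinning policies mentioned above are set through the ``ceph.dir.pin.distributed`` and ``ceph.dir.pin.random`` extended attributes on a directory in a mounted CephFS. The Python sketch below shows one way to set them with ``os.setxattr`` (Linux only). It assumes that distributed pinning takes ``0`` or ``1`` and random pinning takes a probability between ``0.0`` and ``1.0``; the helper ``set_ephemeral_pin`` and the mount paths are hypothetical, and the xattr setter is injectable so the logic can be tried without a CephFS mount.

```python
"""Sketch: setting CephFS ephemeral pinning xattrs on a directory."""
import os
from typing import Callable

SetXattr = Callable[[str, str, bytes], None]


def _default_setxattr(path: str, name: str, value: bytes) -> None:
    # os.setxattr exists only on Linux; look it up lazily at call time.
    os.setxattr(path, name, value)


def set_ephemeral_pin(path: str, policy: str, value: float,
                      setxattr: SetXattr = _default_setxattr) -> None:
    if policy == "distributed":
        # Assumption: distributed pinning is a switch, 1 enables it and 0 disables it.
        if value not in (0, 1):
            raise ValueError("distributed pin must be 0 or 1")
        payload = str(int(value))
    elif policy == "random":
        # Assumption: random pinning takes a probability in [0.0, 1.0].
        if not 0.0 <= value <= 1.0:
            raise ValueError("random pin must be between 0.0 and 1.0")
        payload = repr(float(value))
    else:
        raise ValueError(f"unknown ephemeral pin policy {policy!r}")
    setxattr(path, f"ceph.dir.pin.{policy}", payload.encode())


if __name__ == "__main__":
    # Record calls instead of touching a real file system.
    calls = []
    fake = lambda p, n, v: calls.append((p, n, v))
    set_ephemeral_pin("/mnt/cephfs/home", "distributed", 1, setxattr=fake)
    set_ephemeral_pin("/mnt/cephfs/tmp", "random", 0.01, setxattr=fake)
    print(calls)
```

On a real mount, calling ``set_ephemeral_pin("/mnt/cephfs/home", "distributed", 1)`` without the ``setxattr`` argument would write the attribute directly; the MDS may apply its own limits to the values it accepts.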
* The Alpine build related script, documentation and test have been removed,
  since the most up-to-date APKBUILD script of Ceph is already included in
  Alpine Linux's aports repository.

* fs: Names of new FSs, volumes, subvolumes and subvolume groups can only
  contain alphanumeric and ``-``, ``_`` and ``.`` characters. Some commands or
  CephX credentials may not work with old FSs with non-conformant names.

* It is now possible to specify the initial monitor to contact for Ceph tools
  and daemons using the ``mon_host_override`` config option or the
  ``--mon-host-override`` command-line switch. This generally should only be
  used for debugging and only affects initial communication with Ceph's
  monitor cluster.

* ``blacklist`` has been replaced with ``blocklist`` throughout. The following
  commands have changed:

  - ``ceph osd blacklist ...`` are now ``ceph osd blocklist ...``
  - ``ceph osd. dump_blacklist`` is now ``ceph osd. dump_blocklist``

* The following config options have changed:

  - ``mon osd blacklist default expire`` is now ``mon osd blocklist default expire``
  - ``mon mds blacklist interval`` is now ``mon mds blocklist interval``
  - ``mon mgr blacklist interval`` is now ``mon mgr blocklist interval``
  - ``rbd blacklist on break lock`` is now ``rbd blocklist on break lock``
  - ``rbd blacklist expire seconds`` is now ``rbd blocklist expire seconds``
  - ``mds session blacklist on timeout`` is now ``mds session blocklist on timeout``
  - ``mds session blacklist on evict`` is now ``mds session blocklist on evict``

* CephFS: Compatibility code for the old on-disk format of snapshots has been
  removed. The current on-disk format of snapshots was introduced by the Mimic
  release. If there are any snapshots created by a Ceph release older than
  Mimic, then before upgrading, either delete them all or scrub the whole
  filesystem::

    ceph daemon scrub_path / force recursive repair
    ceph daemon scrub_path '~mdsdir' force recursive repair

* CephFS: Scrub is supported in multiple active MDS setups. MDS rank 0 handles
  scrub commands, and forwards scrub to other MDSs if necessary.

* The following librados API calls have changed:

  - ``rados_blacklist_add`` is now ``rados_blocklist_add``; the former will
    issue a deprecation warning and be removed in a future release.
  - ``rados.blacklist_add`` is now ``rados.blocklist_add`` in the C++ API.

* The JSON output for the following commands now shows ``blocklist`` instead
  of ``blacklist``:

  - ``ceph osd dump``
  - ``ceph osd. dump_blocklist``

* caps: MON and MDS caps can now be used to restrict a client's ability to
  view and operate on specific Ceph file systems. The FS can be specified
  using ``fsname`` in caps. This also affects the subcommand ``fs authorize``:
  the caps produced by it will be specific to the FS name passed in its
  arguments.

* fs: The root_squash flag can be set in MDS caps. It disallows file system
  operations that need write access for clients with uid=0 or gid=0. This
  feature should prevent accidents such as an inadvertent ``sudo rm -rf /``.

* fs: "fs authorize" now sets the MON cap to "allow fsname=" instead of
  setting it to "allow r" all the time.

* The ``ceph pg #.# list_unfound`` output has been enhanced to provide
  might_have_unfound information, which indicates which OSDs may contain the
  unfound objects.

* The ``ceph orch apply rgw`` syntax and behavior have changed. RGW services
  can now be arbitrarily named (they are no longer forced to be
  ``realm.zone``). The ``--rgw-realm=...`` and ``--rgw-zone=...`` arguments
  are now optional, which means that if they are omitted, a vanilla
  single-cluster RGW will be deployed. When the realm and zone are provided,
  the user is now responsible for setting up the multisite configuration
  beforehand; cephadm no longer attempts to create missing realms or zones.
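The naming rule above for new FSs, volumes, subvolumes and subvolume groups can be checked ahead of time, for example in provisioning scripts. This Python sketch is an illustration, not Ceph's validator: it assumes "alphanumeric" means ASCII letters and digits and that empty names are rejected, and the helper ``is_valid_fs_name`` is hypothetical. Ceph may apply further checks that are not modelled here.

```python
"""Sketch: checking a new FS, volume, subvolume or subvolume group name."""
import re

# Assumption: ASCII letters, digits, '.', '_' and '-' only; at least one character.
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_fs_name(name: str) -> bool:
    # fullmatch requires every character of the name to be in the allowed set.
    return _ALLOWED.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["cephfs", "vol_1.backup-2", "my fs", "a/b"]:
        print(candidate, is_valid_fs_name(candidate))
```

Using ``fullmatch`` rather than ``match`` ensures that a valid prefix followed by a forbidden character (such as a space or ``/``) is still rejected.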
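For clusters that still carry the old ``blacklist`` option names in a ``ceph.conf`` file, the renames listed above can be applied mechanically. The Python sketch below rewrites ``key = value`` lines using that list. It assumes option names are written with spaces or underscores (either spelling is normalized before lookup), leaves section headers, comments and unrelated options untouched, and does not cover options stored in the monitors' central config database. The helper names ``migrate_conf_line`` and ``migrate_conf`` are hypothetical.

```python
"""Sketch: renaming deprecated 'blacklist' options in ceph.conf text."""
import re

# Old -> new option names from the list above, in underscore form.
RENAMED = {
    "mon_osd_blacklist_default_expire": "mon_osd_blocklist_default_expire",
    "mon_mds_blacklist_interval": "mon_mds_blocklist_interval",
    "mon_mgr_blacklist_interval": "mon_mgr_blocklist_interval",
    "rbd_blacklist_on_break_lock": "rbd_blocklist_on_break_lock",
    "rbd_blacklist_expire_seconds": "rbd_blocklist_expire_seconds",
    "mds_session_blacklist_on_timeout": "mds_session_blocklist_on_timeout",
    "mds_session_blacklist_on_evict": "mds_session_blocklist_on_evict",
}

# indent, option name (letters, digits, '_' or spaces), then "= value".
_OPTION_LINE = re.compile(r"^(\s*)([A-Za-z0-9_ ]+?)(\s*=.*)$")


def migrate_conf_line(line: str) -> str:
    match = _OPTION_LINE.match(line)
    if match is None:
        return line  # section headers, comments, blank lines
    indent, key, rest = match.groups()
    new = RENAMED.get("_".join(key.split()))
    if new is None:
        return line  # not one of the renamed options
    if " " in key:
        # Keep the original spelling style (spaces instead of underscores).
        new = new.replace("_", " ")
    return f"{indent}{new}{rest}"


def migrate_conf(text: str) -> str:
    return "\n".join(migrate_conf_line(line) for line in text.splitlines())


if __name__ == "__main__":
    print(migrate_conf("[mds]\n\tmds session blacklist on evict = false"))
```

Because the value part of each line is carried over unchanged, existing settings keep their values under the new names.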