Quincy is the 17th stable release of Ceph. It is named after Squidward
Quincy Tentacles from SpongeBob SquarePants.
This is the first stable release of Ceph Quincy.
Major Changes from Pacific
--------------------------
* Filestore has been deprecated in Quincy. BlueStore is Ceph's default object
  store.
* The `ceph-mgr-modules-core` debian package no longer recommends
  `ceph-mgr-rook`. `ceph-mgr-rook` depends on `python3-numpy`, which
  cannot be imported in different Python sub-interpreters multiple times
  when the version of `python3-numpy` is older than 1.19. Because
  `apt-get` installs the `Recommends` packages by default, `ceph-mgr-rook`
  was always installed along with the `ceph-mgr` debian package as an
  indirect dependency. If your workflow depends on this behavior, you
  might want to install `ceph-mgr-rook` separately.
* The ``device_health_metrics`` pool has been renamed ``.mgr``. It is now
  used as a common store for all ``ceph-mgr`` modules. After upgrading to
  Quincy, the ``device_health_metrics`` pool will be renamed to ``.mgr``
  on existing clusters.
* The ``ceph pg dump`` command now prints three additional columns:
  `LAST_SCRUB_DURATION` shows the duration (in seconds) of the last completed
  scrub;
  `SCRUB_SCHEDULING` conveys whether a PG is scheduled to be scrubbed at a
  specified time, whether it is queued for scrubbing, or whether it is being
  scrubbed;
  `OBJECTS_SCRUBBED` shows the number of objects scrubbed in a PG after a
  scrub is initiated.
* A health warning is now reported if the ``require-osd-release`` flag
  is not set to the appropriate release after a cluster upgrade.
* LevelDB support has been removed. ``WITH_LEVELDB`` is no longer a supported
  build option. Users *should* migrate their monitors and OSDs to RocksDB
  before upgrading to Quincy.
* Cephadm: ``osd_memory_target_autotune`` is enabled by default, which sets
  ``mgr/cephadm/autotune_memory_target_ratio`` to ``0.7`` of total RAM. This
  is unsuitable for hyperconverged infrastructures. For hyperconverged Ceph,
  please refer to the documentation or set
  ``mgr/cephadm/autotune_memory_target_ratio`` to ``0.2``.
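
  For example, on a hyperconverged deployment the ratio can be lowered with a
  single config command (the value ``0.2`` follows the recommendation above)::

    ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2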
* telemetry: Improved the opt-in flow so that users can keep sharing the same
  data, even when new data collections are available. A new 'perf' channel that
  collects various performance metrics is now available to opt into with:
  `ceph telemetry enable channel perf`.
  See a sample report with `ceph telemetry preview`.
  Note that generating a telemetry report with 'perf' channel data might
  take a few moments in big clusters.
  For more details, see:
  https://docs.ceph.com/en/quincy/mgr/telemetry/
* MGR: The progress module disables the pg recovery event by default since the
  event is expensive and has interrupted other services when OSDs are
  being marked in/out of the cluster. However, the user can still enable
  this event anytime. For more details, see:
  https://docs.ceph.com/en/quincy/mgr/progress/
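
  A sketch of re-enabling the event, assuming the progress module exposes an
  ``allow_pg_recovery_event`` option (verify the option name on your cluster
  with ``ceph config ls | grep progress`` first)::

    # Option name assumed from the progress module docs; check before use
    ceph config set mgr mgr/progress/allow_pg_recovery_event true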
* https://tracker.ceph.com/issues/55383 is a known issue: when
  ``mon_cluster_log_to_file`` is set to true, ``mon_cluster_log_to_journald``
  must be set to false so that cluster log messages continue to be logged
  to file after log rotation.
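
  A minimal sketch of that workaround::

    ceph config set mon mon_cluster_log_to_file true
    ceph config set mon mon_cluster_log_to_journald false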
* Colocation of Daemons (mgr, mds, rgw)
* osd memory autotuning
* Integration with new NFS mgr module
* Ability to zap osds as they are removed
* cephadm agent for increased performance/scalability
* Day 1: the new "Cluster Expansion Wizard" will guide users through post-install steps:
  adding new hosts, storage devices or services.
* NFS: the Dashboard now allows users to fully manage all NFS exports from a single place.
* New mgr module (feedback): users can quickly report Ceph tracker issues
  or suggestions directly from the Dashboard or the CLI.
* New "Message of the Day": cluster admins can publish a custom message in a banner.
* Cephadm integration improvements:

  * Host management: maintenance, specs and labelling,
  * Service management: edit and display logs,
  * Daemon management (start, stop, restart, reload),
  * New services supported: ingress (HAProxy) and SNMP-gateway.
* Monitoring and alerting:

  * 43 new alerts have been added (totalling 68) improving observability of events affecting:
    cluster health, monitors, storage devices, PGs and CephFS.
  * Alerts can now be sent externally as SNMP traps via the new SNMP gateway service
    (the MIB is provided).
  * Improved integrated full/nearfull event notifications.
* Grafana Dashboards now use grafonnet format (though they're still available
  in JSON format).
* Stack update: images for monitoring containers have been updated.
  Grafana 8.3.5, Prometheus 2.33.4, Alertmanager 0.23.0 and Node Exporter 1.3.1.
  This reduced exposure to several Grafana vulnerabilities (CVE-2021-43798,
  CVE-2021-39226, CVE-2020-29510, CVE-2020-29511).
* OSD: Ceph now uses `mclock_scheduler` for BlueStore OSDs as its default
  `osd_op_queue` to provide QoS. The `mclock_scheduler` is not supported
  for Filestore OSDs. Therefore, the default `osd_op_queue` is set to `wpq`
  for Filestore OSDs and is enforced even if the user attempts to change it.
  For more details on configuring mclock, see:
  https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/
  An outstanding issue exists during runtime where the mclock config options
  related to reservation, weight and limit cannot be modified after switching
  to the `custom` mclock profile using the `ceph config set ...` command.
  This is tracked by https://tracker.ceph.com/issues/55153. Until the issue
  is fixed, users are advised to avoid using the `custom` profile or use the
  workaround mentioned in the tracker.
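
  Until then, one of the built-in profiles can be selected instead; the
  ``osd_mclock_profile`` option and the ``high_client_ops`` value below are
  taken from the mclock configuration reference::

    # Select a built-in mclock profile rather than `custom`
    ceph config set osd osd_mclock_profile high_client_ops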
* MGR: The pg_autoscaler can now be turned `on` and `off` globally
  with the `noautoscale` flag. By default, it is set to `on`, but this flag
  can come in handy to prevent rebalancing triggered by autoscaling during
  cluster upgrade and maintenance. Pools can now be created with the `--bulk`
  flag, which allows the autoscaler to allocate more PGs to such pools. This
  can be useful to get better out-of-the-box performance for data-heavy pools.
  For more details about autoscaling, see:
  https://docs.ceph.com/en/quincy/rados/operations/placement-groups/
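
  A short sketch of both features (the pool name is a placeholder)::

    ceph osd pool set noautoscale          # pause autoscaling globally
    ceph osd pool create mybigpool --bulk  # hint that this pool is data-heavy
    ceph osd pool unset noautoscale        # resume autoscaling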
* OSD: Support for on-wire compression for osd-osd communication, `off` by
  default.
  For more details about compression modes, see:
  https://docs.ceph.com/en/quincy/rados/configuration/msgr2/#compression-modes
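
  For example, compression between OSDs can be forced on; the option name
  below comes from the msgr2 compression documentation, so confirm it with
  ``ceph config help ms_osd_compress_mode`` before relying on it::

    # Assumed option name per the msgr2 compression-modes docs
    ceph config set osd ms_osd_compress_mode force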
* OSD: Concise reporting of slow operations in the cluster log. The old
  and more verbose logging behavior can be regained by setting
  `osd_aggregated_slow_ops_logging` to false.
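
  For example::

    ceph config set osd osd_aggregated_slow_ops_logging false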
* The "kvs" Ceph object class is not packaged anymore. The "kvs" Ceph
  object class offers a distributed flat b-tree key-value store that
  is implemented on top of the librados object omap. Because there
  are no existing internal users of this object class, it is no longer
  packaged.
* rbd-nbd: `rbd device attach` and `rbd device detach` commands added,
  these allow for safe reattach after `rbd-nbd` daemon is restarted since
  Linux kernel >= 5.14.
* rbd-nbd: `notrim` map option added to support thick-provisioned images,
  preventing the discarding of image extents.
* Large stabilization effort for client-side persistent caching on SSD
  devices, also available in 16.2.8. For details on usage, see:
  https://docs.ceph.com/en/quincy/rbd/rbd-persistent-write-log-cache/
* Several bug fixes in diff calculation when using the fast-diff image
  feature + whole object (inexact) mode. In some rare cases these
  long-standing issues could cause an incorrect `rbd export`. Also
  fixed in 15.2.16 and 16.2.8.
* Fix for a potential performance degradation when running Windows VMs
  on krbd. For details, see the `rxbounce` map option description:
  https://docs.ceph.com/en/quincy/man/8/rbd/#kernel-rbd-krbd-options
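
  For example, an image backing a Windows VM might be mapped with the option
  (pool and image names are placeholders)::

    rbd device map -o rxbounce mypool/winvm-disk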
* RGW now supports rate limiting by user and/or by bucket. With this
  feature it is possible to limit, per user and/or per bucket, the total
  number of operations and/or bytes per minute. The feature also allows the
  admin to limit only READ operations and/or only WRITE operations. The
  rate-limiting configuration can additionally be applied to all users and
  all buckets via the global configuration.
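
  A hypothetical example using ``radosgw-admin ratelimit`` (the user ID and
  the limit value are placeholders; check ``radosgw-admin ratelimit --help``
  for the full set of flags)::

    # Limit a user to 1024 read ops per minute, then enable the limit
    radosgw-admin ratelimit set --ratelimit-scope=user --uid=johndoe --max-read-ops=1024
    radosgw-admin ratelimit enable --ratelimit-scope=user --uid=johndoe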
* `radosgw-admin realm delete` has been renamed to `radosgw-admin realm
  rm`. This is consistent with the help message.
* S3 bucket notification events now contain an `eTag` key instead of
  `etag`, and eventName values no longer carry the `s3:` prefix, fixing
  deviations from the message format observed on AWS.
* It is now possible to specify SSL options and ciphers for the beast
  frontend. The default SSL options setting is
  "no_sslv2:no_sslv3:no_tlsv1:no_tlsv1_1". If you want to return to the old
  behavior, add 'ssl_options=' (empty) to the ``rgw frontends`` configuration.
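
  A sketch of a ``ceph.conf`` fragment (the section name and certificate path
  are placeholders)::

    [client.rgw.gateway1]
    rgw_frontends = beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.pem ssl_options=no_sslv2:no_sslv3:no_tlsv1:no_tlsv1_1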
* The behavior for Multipart Upload was modified so that only a
  CompleteMultipartUpload notification is sent at the end of the multipart
  upload. The POST notification at the beginning of the upload and the PUT
  notifications that were sent on each part are no longer sent.
CephFS distributed file system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* fs: A file system can be created with a specific ID ("fscid"). This is
  useful in certain recovery scenarios (for example, when a monitor
  database has been lost and rebuilt, and the restored file system is
  expected to have the same ID as before).
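
  A hypothetical invocation (pool names and the ID are placeholders, and the
  exact flags should be verified against the ``fs new`` documentation)::

    # Recreate a file system with a specific fscid
    ceph fs new myfs myfs_metadata myfs_data --fscid 17 --force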
* fs: A file system can be renamed using the `fs rename` command. Any cephx
  credentials authorized for the old file system name will need to be
  reauthorized to the new file system name. Since the operations of the clients
  using these re-authorized IDs may be disrupted, this command requires the
  "--yes-i-really-mean-it" flag. Also, mirroring is expected to be disabled
  on the file system.
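
  For example (the file system names are placeholders)::

    ceph fs rename myfs ourfs --yes-i-really-mean-it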
* MDS upgrades no longer require all standby MDS daemons to be stopped before
  upgrading a file system's sole active MDS.
* CephFS: Failure to replay the journal by a standby-replay daemon now
  causes the rank to be marked "damaged".
Upgrading from Octopus or Pacific
---------------------------------
Quincy does not support LevelDB. Please migrate your OSDs and monitors
to RocksDB before upgrading to Quincy.
Before starting, make sure your cluster is stable and healthy (no down or
recovering OSDs). Optionally (but recommended), disable the autoscaler for
all pools during the upgrade using the ``noautoscale`` flag.
You can monitor the progress of your upgrade at each stage with the
``ceph versions`` command, which will tell you which Ceph version(s) are
running for each type of daemon.
Upgrading cephadm clusters
~~~~~~~~~~~~~~~~~~~~~~~~~~
If your cluster is deployed with cephadm (first introduced in Octopus), then
the upgrade process is entirely automated. To initiate the upgrade, run::

  ceph orch upgrade start --ceph-version 17.2.0
The same process is used to upgrade to future minor releases.
Upgrade progress can be monitored with ``ceph -s`` (which provides a simple
progress bar) or more verbosely with::

  ceph -W cephadm
The upgrade can be paused or resumed with::

  ceph orch upgrade pause   # to pause
  ceph orch upgrade resume  # to resume
or canceled with::

  ceph orch upgrade stop
Note that canceling the upgrade simply stops the process; there is no ability to
downgrade back to Octopus or Pacific.
Upgrading non-cephadm clusters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If your cluster is running Octopus (15.2.x) or later, you might choose
to first convert it to use cephadm so that the upgrade to Quincy
is automated (see above). For more information, see
:ref:`cephadm-adoption`.
#. Set the ``noout`` flag for the duration of the upgrade. (Optional,
   but recommended.)::

     # ceph osd set noout
#. Upgrade monitors by installing the new packages and restarting the
   monitor daemons. For example, on each monitor host::

     # systemctl restart ceph-mon.target
   Once all monitors are up, verify that the monitor upgrade is
   complete by looking for the ``quincy`` string in the mon
   map. The command::

     # ceph mon dump | grep min_mon_release

   should report::

     min_mon_release 17 (quincy)
   If it does not, that implies that one or more monitors have not been
   upgraded and restarted and/or the quorum does not include all monitors.
#. Upgrade ``ceph-mgr`` daemons by installing the new packages and
   restarting all manager daemons. For example, on each manager host::

     # systemctl restart ceph-mgr.target
   Verify the ``ceph-mgr`` daemons are running by checking ``ceph -s``::

     # ceph -s

     ...
       services:
        mon: 3 daemons, quorum foo,bar,baz
        mgr: foo(active), standbys: bar, baz
     ...
#. Upgrade all OSDs by installing the new packages and restarting the
   ceph-osd daemons on all OSD hosts::

     # systemctl restart ceph-osd.target
#. Upgrade all CephFS MDS daemons. For each CephFS file system,
   #. Disable standby_replay::

        # ceph fs set <fs_name> allow_standby_replay false
   #. Reduce the number of ranks to 1. (Make note of the original
      number of MDS daemons first if you plan to restore it later.)::

        # ceph fs set <fs_name> max_mds 1
   #. Wait for the cluster to deactivate any non-zero ranks by
      periodically checking the status::

        # ceph status
   #. Take all standby MDS daemons offline on the appropriate hosts with::

        # systemctl stop ceph-mds@<daemon_name>
   #. Confirm that only one MDS is online and is rank 0 for your FS::

        # ceph status
   #. Upgrade the last remaining MDS daemon by installing the new
      packages and restarting the daemon::

        # systemctl restart ceph-mds.target
   #. Restart all standby MDS daemons that were taken offline::

        # systemctl start ceph-mds.target
   #. Restore the original value of ``max_mds`` for the volume::

        # ceph fs set <fs_name> max_mds <original_max_mds>
#. Upgrade all radosgw daemons by upgrading packages and restarting
   daemons on all hosts::

     # systemctl restart ceph-radosgw.target
#. Complete the upgrade by disallowing pre-Quincy OSDs and enabling
   all new Quincy-only functionality::

     # ceph osd require-osd-release quincy
#. If you set ``noout`` at the beginning, be sure to clear it with::

     # ceph osd unset noout
#. Consider transitioning your cluster to use the cephadm deployment
   and orchestration framework to simplify cluster management and
   future upgrades. For more information on converting an existing
   cluster to cephadm, see :ref:`cephadm-adoption`.
#. Verify the cluster is healthy with ``ceph health``. If your cluster is
   running Filestore, a deprecation warning is expected. This warning can
   be temporarily muted using the following command::

     ceph health mute OSD_FILESTORE
#. If you are upgrading from Mimic, or did not already do so when you
   upgraded to Nautilus, we recommend enabling the new :ref:`v2
   network protocol <msgr2>` by issuing the following command::

     ceph mon enable-msgr2
   This will instruct all monitors that bind to the old default port
   6789 for the legacy v1 protocol to also bind to the new 3300 v2
   protocol port. To see if all monitors have been updated, run::

     ceph mon dump

   and verify that each monitor has both a ``v2:`` and ``v1:`` address
   listed.
#. Consider enabling the :ref:`telemetry module <telemetry>` to send
   anonymized usage statistics and crash information to the Ceph
   upstream developers. To see what would be reported (without actually
   sending any information to anyone)::

     ceph telemetry preview-all
   If you are comfortable with the data that is reported, you can opt in to
   automatically report the high-level cluster metadata with::

     ceph telemetry on
   The public dashboard that aggregates Ceph telemetry can be found at
   `https://telemetry-public.ceph.com/ <https://telemetry-public.ceph.com/>`_.
   For more information about the telemetry module, see :ref:`the
   documentation <telemetry>`.
Upgrading from pre-Octopus releases (like Nautilus)
---------------------------------------------------
You *must* first upgrade to Octopus (15.2.z) or Pacific (16.2.z) before
upgrading to Quincy.