>=19.0.0

* RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in
  multi-site. Previously, the replicas of such objects were corrupted on decryption.
  A new tool, ``radosgw-admin bucket resync encrypted multipart``, can be used to
  identify these original multipart uploads. The ``LastModified`` timestamp of any
  identified object is incremented by 1ns to cause peer zones to replicate it again.
  For multi-site deployments that make any use of Server-Side Encryption, we
  recommend running this command against every bucket in every zone after all
  zones have upgraded.
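  A minimal sketch of running the resync across every bucket in a zone
  (assuming the standard ``--bucket`` selector and ``jq`` for parsing the
  bucket list; adapt to your deployment)::

    for b in $(radosgw-admin bucket list | jq -r '.[]'); do
      radosgw-admin bucket resync encrypted multipart --bucket="$b"
    done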
* CEPHFS: The MDS now evicts clients that fail to advance their request tids,
  since such clients cause a large buildup of session metadata, which can drive
  the MDS read-only when the resulting RADOS operation exceeds the size
  threshold. The `mds_session_metadata_threshold` config option controls the
  maximum size to which the (encoded) session metadata can grow.
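  For example, to adjust the threshold (the value shown is illustrative, not a
  recommended setting)::

    ceph config set mds mds_session_metadata_threshold 16777216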
* RGW: New tools have been added to radosgw-admin for identifying and
  correcting issues with versioned bucket indexes. Historical bugs in the
  versioned bucket index transaction workflow made it possible for the index
  to accumulate extraneous "book-keeping" olh entries and plain placeholder
  entries. In some specific scenarios where clients made concurrent requests
  referencing the same object key, it was likely that many extra index
  entries would accumulate. When a significant number of these entries are
  present in a single bucket index shard, they can cause high bucket listing
  latencies and lifecycle processing failures. To check whether a versioned
  bucket has unnecessary olh entries, users can now run ``radosgw-admin
  bucket check olh``. If the ``--fix`` flag is used, the extra entries will
  be safely removed. Separately from the issue described above, it is also
  possible for some versioned buckets to maintain extra unlinked objects
  that are not listable from the S3/Swift APIs. These extra objects are
  typically the result of PUT requests that exited abnormally in the middle
  of a bucket index transaction, so the client would not have received a
  successful response. Bugs in prior releases made these unlinked objects easy
  to reproduce with any PUT request made on a bucket that was actively
  resharding. Besides the extra space that these hidden, unlinked objects
  consume, there can be another side effect in certain scenarios, caused by
  the nature of the failure mode that produced them: a client of an affected
  bucket may find the object associated with the key to be in an inconsistent
  state. To check whether a versioned bucket has unlinked entries, users can
  now run ``radosgw-admin bucket check unlinked``. If the ``--fix`` flag is
  used, the unlinked objects will be safely removed. Finally, a third issue
  made it possible for versioned bucket index stats to be accounted
  inaccurately. The tooling for recalculating versioned bucket stats also had
  a bug and was previously incapable of fixing these inaccuracies. This
  release resolves those issues, and users can now expect the existing
  ``radosgw-admin bucket check`` command to produce correct results. We
  recommend that users with versioned buckets, especially those that existed
  on prior releases, use these new tools to check whether their buckets are
  affected and to clean them up accordingly, as summarized below.
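  A sketch of the per-bucket workflow (assuming the standard ``--bucket``
  selector; run the read-only checks first, then repeat with ``--fix``)::

    radosgw-admin bucket check olh --bucket="$BUCKET"             # report extra olh entries
    radosgw-admin bucket check olh --bucket="$BUCKET" --fix       # remove them
    radosgw-admin bucket check unlinked --bucket="$BUCKET" --fix  # remove unlinked objects
    radosgw-admin bucket check --bucket="$BUCKET"                 # verify index stats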
* mgr/snap-schedule: For clusters with multiple CephFS file systems, all the
  snap-schedule commands now expect the '--fs' argument.

>=18.0.0

* The RGW policy parser now rejects unknown principals by default. If you are
  mirroring policies between RGW and AWS, you may wish to set
  "rgw policy reject invalid principals" to "false". This affects only newly set
  policies, not policies that are already in place.
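  For example (assuming the option maps to the usual underscore-separated
  config key)::

    ceph config set client.rgw rgw_policy_reject_invalid_principals false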
* RGW's default backend for `rgw_enable_ops_log` changed from RADOS to file.
  The default value of `rgw_ops_log_rados` is now false, and `rgw_ops_log_file_path`
  defaults to "/var/log/ceph/ops-log-$cluster-$name.log".
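  To restore the previous RADOS backend, for example (a sketch; adjust the
  config target to your RGW instances)::

    ceph config set client.rgw rgw_ops_log_rados true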
* The SPDK backend for BlueStore is now able to connect to an NVMeoF target.
  Please note that this is not an officially supported feature.
* RGW's pubsub interface now returns boolean fields using bool. Before this change,
  `/topics/<topic-name>` returned "stored_secret" and "persistent" as the strings
  "true" or "false", with quotes around them. After this change, these fields
  are returned without quotes so they can be decoded as boolean values in JSON.
  The same applies to the `is_truncated` field returned by `/subscriptions/<sub-name>`.
* RGW's response to the `Action=GetTopicAttributes&TopicArn=<topic-arn>` REST API now
  returns `HasStoredSecret` and `Persistent` as booleans in the JSON string
  encoded in `Attributes/EndPoint`.
* All boolean fields previously rendered as strings by the `radosgw-admin`
  command when the JSON format is used are now rendered as booleans. If your
  scripts/tools rely on this behavior, please update them accordingly (a quick
  check is shown after this list). The impacted field names are:
  * absolute
  * add
  * admin
  * appendable
  * bucket_key_enabled
  * delete_marker
  * exists
  * has_bucket_info
  * high_precision_time
  * index
  * is_master
  * is_prefix
  * is_truncated
  * linked
  * log_meta
  * log_op
  * pending_removal
  * read_only
  * retain_head_object
  * rule_exist
  * start_with_full_sync
  * sync_from_all
  * syncstopped
  * system
  * truncated
  * user_stats_sync
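  For instance, a field that used to render as ``"system": "false"`` now
  renders as ``"system": false`` (the user ID below is hypothetical)::

    radosgw-admin user info --uid=myuser | jq '.system'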
* RGW: The beast frontend's HTTP access log line uses a new debug_rgw_access
  configurable. This has the same defaults as debug_rgw, but can now be controlled
  independently.
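  For example, to raise only the access log verbosity (a sketch; the level is
  illustrative)::

    ceph config set client.rgw debug_rgw_access 20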
* RBD: The semantics of the compare-and-write C++ API (`Image::compare_and_write`
  and `Image::aio_compare_and_write` methods) now match those of the C API. Both
  the compare and write steps operate only on `len` bytes even if the respective
  buffers are larger. The previous behavior of comparing up to the size of
  the compare buffer was prone to subtle breakage upon straddling a stripe
  unit boundary.
* RBD: The compare-and-write operation is no longer limited to 512-byte sectors.
  Assuming proper alignment, it now allows operating on stripe units (4M by
  default).
* RBD: New `rbd_aio_compare_and_writev` API method to support scatter/gather
  on both compare and write buffers. This complements the existing `rbd_aio_readv`
  and `rbd_aio_writev` methods.
* The 'AT_NO_ATTR_SYNC' macro is deprecated; please use the standard
  'AT_STATX_DONT_SYNC' macro instead. The 'AT_NO_ATTR_SYNC' macro will be
  removed in the future.
* Trimming of PGLog dups is now controlled by size instead of version.
  This fixes the PGLog inflation issue that occurred when the online
  (in-OSD) trimming got jammed after a PG split operation. Additionally, a new
  offline mechanism has been added: `ceph-objectstore-tool` gained a
  `trim-pg-log-dups` op that targets situations where an OSD is unable to boot
  because of those inflated dups. In such cases, the "You can be hit by THE
  DUPS BUG" warning will be visible in the OSD logs.
  Relevant tracker: https://tracker.ceph.com/issues/53729
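  A sketch of the offline trim (assuming the OSD is stopped and that the op
  takes the usual ``--data-path`` and ``--pgid`` arguments; the path and pgid
  are illustrative)::

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
      --op trim-pg-log-dups --pgid 1.0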
* RBD: The `rbd device unmap` command gained a `--namespace` option. Support for
  namespaces was added to RBD in Nautilus 14.2.0, and it has been possible to
  map and unmap images in namespaces using the `image-spec` syntax since then,
  but the corresponding option available in most other commands was missing.
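  For example (the pool, namespace, and image names are illustrative)::

    rbd device unmap --pool mypool --namespace myns myimage
    rbd device unmap mypool/myns/myimage   # equivalent image-spec syntax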
* RGW: Compression is now supported for objects uploaded with Server-Side
  Encryption. When both are enabled, compression is applied before encryption.
  Earlier releases of multisite do not replicate such objects correctly, so all
  zones must upgrade to Reef before enabling the `compress-encrypted` zonegroup
  feature: see https://docs.ceph.com/en/reef/radosgw/multisite/#zone-features
  and note the security considerations.
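  Once all zones run Reef, the feature can be enabled along these lines (a
  sketch; the zonegroup name is illustrative)::

    radosgw-admin zonegroup modify --rgw-zonegroup=default --enable-feature=compress-encrypted
    radosgw-admin period update --commit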
* RGW: The "pubsub" functionality for storing bucket notifications inside Ceph
  has been removed, and with it the "pubsub" zone should no longer be used.
  The REST operations and radosgw-admin commands for manipulating
  subscriptions, as well as those for fetching and acking notifications, have
  been removed as well.
  If the endpoint to which notifications are sent may be down or disconnected,
  it is recommended to use persistent notifications to guarantee delivery. If
  the system that consumes the notifications needs to pull them (rather than
  have them pushed to it), an external message bus (e.g. RabbitMQ, Kafka)
  should be used for that purpose.
* RGW: The serialized format of notifications and topics has changed, so that
  new/updated topics will be unreadable by old RGWs. We recommend completing
  the RGW upgrades before creating or modifying any notification topics.
* RBD: Trailing newline in passphrase files (`<passphrase-file>` argument in
  `rbd encryption format` command and `--encryption-passphrase-file` option
  in other commands) is no longer stripped.
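  For example, to format an image with a passphrase that deliberately has no
  trailing newline (names are illustrative)::

    printf '%s' 'my-secret' > passphrase.txt
    rbd encryption format mypool/myimage luks2 passphrase.txt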
* RBD: Support for layered client-side encryption is added. Cloned images
  can now be encrypted each with its own encryption format and passphrase,
  potentially different from that of the parent image. The efficient
  copy-on-write semantics intrinsic to unformatted (regular) cloned images
  are retained.
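  A sketch of encrypting a clone with its own passphrase (names are
  illustrative; clone v2 is assumed, so the snapshot needs no protection)::

    rbd snap create mypool/parent@snap
    rbd clone mypool/parent@snap mypool/child
    rbd encryption format mypool/child luks2 child-passphrase.txt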
* CEPHFS: Rename the `mds_max_retries_on_remount_failure` option to
  `client_max_retries_on_remount_failure` and move it from mds.yaml.in to
  mds-client.yaml.in, because this option has only ever been used by the MDS
  client.
* The `perf dump` and `perf schema` commands are deprecated in favor of the new
  `counter dump` and `counter schema` commands. These new commands add support
  for labeled perf counters and also emit existing unlabeled perf counters. Some
  unlabeled perf counters became labeled in this release, with more to follow in
  future releases; such converted perf counters are no longer emitted by the
  `perf dump` and `perf schema` commands.
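  For example, against a daemon's admin socket (the daemon name is
  illustrative)::

    ceph daemon osd.0 counter dump
    ceph daemon osd.0 counter schema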
* `ceph mgr dump` command now outputs `last_failure_osd_epoch` and
  `active_clients` fields at the top level. Previously, these fields were
  output under the `always_on_modules` field.
* `ceph mgr dump` command now displays the name of the mgr module that
  registered a RADOS client in the `name` field added to elements of the
  `active_clients` array. Previously, only the address of a module's RADOS
  client was shown in the `active_clients` array.
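  For example, a quick look at both fields (a sketch using ``jq``)::

    ceph mgr dump | jq '.last_failure_osd_epoch, .active_clients[].name'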
* RBD: All rbd-mirror daemon perf counters became labeled and as such are now
  emitted only by the new `counter dump` and `counter schema` commands. As part
  of the conversion, many also got renamed to better disambiguate journal-based
  and snapshot-based mirroring.
* RBD: list-watchers C++ API (`Image::list_watchers`) now clears the passed
  `std::list` before potentially appending to it, aligning with the semantics
  of the corresponding C API (`rbd_watchers_list`).
* The rados python binding is now able to process (opt-in) omap keys as bytes
  objects. This enables interacting with RADOS omap keys that are not decodable
  as UTF-8 strings.
* Telemetry: Users who are opted in to telemetry can also opt in to
  participating in a leaderboard in the telemetry public
  dashboards (https://telemetry-public.ceph.com/). Users can now also add a
  description of the cluster that will appear publicly in the leaderboard.
  For more details, see:
  https://docs.ceph.com/en/latest/mgr/telemetry/#leaderboard
  See a sample report with `ceph telemetry preview`.
  Opt in to telemetry with `ceph telemetry on`.
  Opt in to the leaderboard with
  `ceph config set mgr mgr/telemetry/leaderboard true`.
  Add a leaderboard description with:
  `ceph config set mgr mgr/telemetry/leaderboard_description 'Cluster description'`.
* CEPHFS: After recovering a Ceph File System by following the disaster recovery
  procedure, the recovered files under the `lost+found` directory can now be deleted.
* core: cache-tiering is now deprecated.
* mClock Scheduler: The mClock scheduler (the default scheduler in Quincy) has
  undergone significant usability and design improvements to address the slow
  backfill issue. Some important changes are:
  * The 'balanced' profile is set as the default mClock profile because it
    represents a compromise between prioritizing client IO and recovery IO. Users
    can then choose either the 'high_client_ops' profile to prioritize client IO
    or the 'high_recovery_ops' profile to prioritize recovery IO.
  * QoS parameters like reservation and limit are now specified in terms of a
    fraction (range: 0.0 to 1.0) of the OSD's IOPS capacity.
  * The cost parameters (osd_mclock_cost_per_io_usec_* and
    osd_mclock_cost_per_byte_usec_*) have been removed. The cost of an operation
    is now determined using the random IOPS and maximum sequential bandwidth
    capability of the OSD's underlying device.
  * Degraded object recovery is given higher priority than misplaced object
    recovery because degraded objects present a data safety issue not
    present with objects that are merely misplaced. Therefore, backfilling
    operations with the 'balanced' and 'high_client_ops' mClock profiles may
    progress more slowly than they did with the 'WeightedPriorityQueue' (WPQ)
    scheduler.
  * The QoS allocations in all the mClock profiles are optimized based on the above
    fixes and enhancements.
  * For more detailed information see:
    https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/
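  For example, to prioritize recovery IO on all OSDs (the profile names are
  those listed above)::

    ceph config set osd osd_mclock_profile high_recovery_ops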
* mgr/snap_schedule: The snap-schedule mgr module now retains one snapshot
  fewer than the number specified by the config tunable `mds_max_snaps_per_dir`,
  so that a new snapshot can be created and retained during the next schedule
  run.

>=17.2.1

* The "BlueStore zero block detection" feature (first introduced to Quincy in
  https://github.com/ceph/ceph/pull/43337) has been turned off by default with a
  new global configuration called `bluestore_zero_block_detection`. This feature,
  intended for large-scale synthetic testing, does not interact well with some RBD
  and CephFS features. Any side effects experienced in previous Quincy versions
  will no longer occur, provided that the configuration remains set to false.
  Relevant tracker: https://tracker.ceph.com/issues/55521
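  The feature can be re-enabled for synthetic testing (not recommended for
  production) with::

    ceph config set global bluestore_zero_block_detection true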

* telemetry: Added new Rook metrics to the 'basic' channel to report Rook's
  version, Kubernetes version, node metrics, etc.
  See a sample report with `ceph telemetry preview`.
  Opt-in with `ceph telemetry on`.

  For more details, see:

  https://docs.ceph.com/en/latest/mgr/telemetry/
* OSD: The issue of high CPU utilization during recovery/backfill operations
  has been fixed. For more details, see: https://tracker.ceph.com/issues/56530.

>=15.2.17

* OSD: Octopus modified the SnapMapper key format from
  <LEGACY_MAPPING_PREFIX><snapid>_<shardid>_<hobject_t::to_str()>
  to
  <MAPPING_PREFIX><pool>_<snapid>_<shardid>_<hobject_t::to_str()>
  When this change was introduced, 94ebe0e also introduced a conversion
  with a crucial bug which essentially destroyed legacy keys by mapping them
  to
  <MAPPING_PREFIX><poolid>_<snapid>_
  without the object-unique suffix. The conversion is fixed in this release.
  Relevant tracker: https://tracker.ceph.com/issues/56147

* Cephadm may now be configured to carry out CephFS MDS upgrades without
  reducing ``max_mds`` to 1. Previously, Cephadm would reduce ``max_mds`` to 1 to
  avoid having two active MDS modifying on-disk structures with new versions,
  communicating cross-version-incompatible messages, or other potential
  incompatibilities. This could be disruptive for large-scale CephFS deployments
  because the cluster cannot easily reduce active MDS daemons to 1.
  NOTE: A staggered upgrade of the mons/mgrs may be necessary to take advantage
  of this feature; refer to this link on how to perform it:
  https://docs.ceph.com/en/quincy/cephadm/upgrade/#staggered-upgrade
  Relevant tracker: https://tracker.ceph.com/issues/55715
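  A sketch of such a staggered upgrade, updating the mgrs first (the image tag
  is illustrative, and the ``fail_fs`` option name is an assumption about how
  this behavior is enabled)::

    ceph config set mgr mgr/orchestrator/fail_fs true
    ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.1 --daemon-types mgr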

* Introduced a new file system flag `refuse_client_session` that can be set using the
  `fs set` command. This flag allows blocking any incoming session
  request from client(s). This can be useful during some recovery situations
  where it's desirable to bring the MDS up but have no client workload.
  Relevant tracker: https://tracker.ceph.com/issues/57090
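  For example (the file system name is illustrative)::

    ceph fs set cephfs refuse_client_session true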