v14.1.0 Nautilus (release candidate 1)
======================================

.. note:: We expect to make a msgr2 protocol revision after this first
   release candidate. If you upgrade to v14.1.0 *and* enable msgr2,
   you will need to restart all daemons after upgrading to v14.1.1 or
   any other later nautilus release.

.. note:: These are draft notes for the first Nautilus release.

Major Changes from Mimic
------------------------

- *Dashboard*:

  The Ceph Dashboard has gained a lot of new functionality:

  * Support for multiple users / roles
  * SSO (SAMLv2) for user authentication
  * Auditing support
  * New landing page, showing more metrics and health info
  * I18N support
  * REST API documentation with Swagger API

  New Ceph management features include:

  * OSD management (mark as down/out, change OSD settings, recovery profiles)
  * Cluster config settings editor
  * Ceph Pool management (create/modify/delete)
  * Erasure code profile (ECP) management
  * RBD mirroring configuration
  * Embedded Grafana Dashboards (derived from Ceph Metrics)
  * CRUSH map viewer
  * NFS Ganesha management
  * iSCSI target management (via ceph-iscsi)
  * RBD QoS configuration
  * Ceph Manager (ceph-mgr) module management
  * Prometheus alert management

  Also, the Ceph Dashboard is now split into its own package named
  ``ceph-mgr-dashboard``. You may need to install it separately if your
  package management software fails to pull it in automatically when it
  installs ``ceph-mgr``.

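  For example, a quick sketch of installing the package by hand on
  Debian/Ubuntu or on RPM-based distributions, respectively (the package
  manager invocations below are illustrative)::

    # apt install ceph-mgr-dashboard
    # dnf install ceph-mgr-dashboard
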
- *RADOS*:

  * The number of placement groups (PGs) per pool can now be decreased
    at any time, and the cluster can automatically tune the PG count
    based on cluster utilization or administrator hints (see the example
    after this list).
  * The new :ref:`v2 wire protocol <msgr2>` brings support for encryption on the wire.
  * Physical storage devices consumed by OSD and Monitor daemons are
    now tracked by the cluster along with health metrics (i.e.,
    SMART), and the cluster can apply a pre-trained prediction model
    or a cloud-based prediction service to warn about expected
    HDD or SSD failures.
  * The NUMA node for OSD daemons can easily be monitored via the
    ``ceph osd numa-status`` command, and configured via the
    ``osd_numa_node`` config option.
  * When BlueStore OSDs are used, space utilization is now broken down
    by object data, omap data, and internal metadata, by pool, and by
    pre- and post- compression sizes.
  * OSDs more effectively prioritize the most important PGs and
    objects when performing recovery and backfill.
  * Progress for long-running background processes--like recovery
    after a device failure--is now reported as part of ``ceph
    status``.
  * An experimental `Coupled-Layer "Clay" erasure code
    <https://www.usenix.org/conference/fast18/presentation/vajha>`_
    plugin has been added that reduces network bandwidth and IO needed
    for most recovery operations.

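  For example, a minimal sketch of turning on the PG autoscaler for an
  existing pool (``foo`` is an illustrative pool name)::

    # ceph mgr module enable pg_autoscaler
    # ceph osd pool set foo pg_autoscale_mode on
    # ceph osd pool autoscale-status
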
- *RGW*:

  * S3 lifecycle transition for tiering between storage classes.
  * A new web frontend (Beast) has replaced civetweb as the default,
    improving overall performance (see the example after this list).
  * A new publish/subscribe infrastructure allows RGW to feed events
    to serverless frameworks like knative or data pipelines like Kafka.
  * A range of authentication features, including STS federation using
    OAuth2 and OpenID Connect and an OPA (Open Policy Agent)
    authentication delegation prototype.
  * The new archive zone federation feature enables full preservation
    of all objects (including history) in a separate zone.

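  For example, a rough sketch of moving the Beast frontend to a
  non-default port via the existing ``rgw frontends`` option (the section
  name and port are illustrative)::

    [client.rgw.gateway-node1]
            rgw frontends = beast port=8080
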
- *CephFS*:

  * MDS stability has been greatly improved for large caches and
    long-running clients with a lot of RAM. Cache trimming and client
    capability recall is now throttled to prevent overloading the MDS.
  * CephFS may now be exported via NFS-Ganesha clusters in environments managed
    by Rook. Ceph manages the clusters and ensures high-availability and
    scalability. An `introductory demo
    <https://ceph.com/community/deploying-a-cephnfs-server-cluster-with-rook/>`_
    is available. More automation of this feature is expected to be forthcoming
    in future minor releases of Nautilus.
  * The MDS ``mds_standby_for_*``, ``mon_force_standby_active``, and
    ``mds_standby_replay`` configuration options have been obsoleted. Instead,
    the operator :ref:`may now set <mds-standby-replay>` the new
    ``allow_standby_replay`` flag on the CephFS file system. This setting
    causes standbys to become standby-replay for any available rank in the file
    system.
  * The MDS now supports dropping its cache, which concurrently asks clients
    to trim their caches. This is done using the MDS admin socket ``cache drop``
    command.
  * It is now possible to check the progress of an on-going scrub in the MDS.
    Additionally, a scrub may be paused or aborted. See :ref:`the scrub
    documentation <mds-scrub>` for more information.
  * A new interface for creating volumes is provided via the ``ceph volume``
    command-line interface.
  * A new cephfs-shell tool is available for manipulating a CephFS file
    system without mounting it.
  * CephFS-related output from ``ceph status`` has been reformatted for brevity,
    clarity, and usefulness.
  * Lazy IO has been revamped. It can be turned on by the client using the new
    CEPH_O_LAZY flag to the ``ceph_open`` C/C++ API or via the config option
    ``client_force_lazyio``.
  * A CephFS file system can now be brought down rapidly via the ``ceph fs fail``
    command. See :ref:`the administration page <cephfs-administration>` for
    more information.

- *RBD*:

  * Images can be live-migrated with minimal downtime to assist with moving
    images between pools or to new layouts (see the example after this list).
  * New ``rbd perf image iotop`` and ``rbd perf image iostat`` commands provide
    an iotop- and iostat-like IO monitor for all RBD images.
  * The *ceph-mgr* Prometheus exporter now optionally includes an IO monitor
    for all RBD images.
  * Support for separate image namespaces within a pool for tenant isolation.

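  For example, a minimal sketch of live-migrating an image to another
  pool (pool and image names are illustrative)::

    # rbd migration prepare pool1/image1 pool2/image1
    # rbd migration execute pool2/image1
    # rbd migration commit pool2/image1
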
- *Misc*:

  * Ceph has a new set of :ref:`orchestrator modules
    <orchestrator-cli-module>` to directly interact with external
    orchestrators like ceph-ansible, DeepSea, Rook, or simply ssh via
    a consistent CLI (and, eventually, Dashboard) interface.


Upgrading from Mimic or Luminous
--------------------------------

Notes
~~~~~

* During the upgrade from Luminous to nautilus, it will not be
  possible to create a new OSD using a Luminous ceph-osd daemon after
  the monitors have been upgraded to Nautilus. We recommend you avoid adding
  or replacing any OSDs while the upgrade is in process.

* We recommend you avoid creating any RADOS pools while the upgrade is
  in process.

* You can monitor the progress of your upgrade at each stage with the
  ``ceph versions`` command, which will tell you what ceph version(s) are
  running for each type of daemon.

Instructions
~~~~~~~~~~~~

#. If your cluster was originally installed with a version prior to
   Luminous, ensure that it has completed at least one full scrub of
   all PGs while running Luminous. Failure to do so will cause your
   monitor daemons to refuse to join the quorum on start, leaving them
   non-functional.

   If you are unsure whether or not your Luminous cluster has
   completed a full scrub of all PGs, you can check your cluster's
   state by running::

     # ceph osd dump | grep ^flags

   In order to be able to proceed to Nautilus, your OSD map must include
   the ``recovery_deletes`` and ``purged_snapdirs`` flags.

   If your OSD map does not contain both these flags, you can simply
   wait for approximately 24-48 hours, which in a standard cluster
   configuration should be ample time for all your placement groups to
   be scrubbed at least once, and then repeat the above process to
   recheck.

   However, if you have just completed an upgrade to Luminous and want
   to proceed to Nautilus in short order, you can force a scrub on all
   placement groups with a one-line shell command, like::

     # ceph pg dump pgs_brief | cut -d " " -f 1 | xargs -n1 ceph pg scrub

   Note that this forced scrub may have a negative impact on the
   performance of your Ceph clients.

#. Make sure your cluster is stable and healthy (no down or
   recovering OSDs). (Optional, but recommended.)

#. Set the ``noout`` flag for the duration of the upgrade. (Optional,
   but recommended.)::

     # ceph osd set noout

#. Upgrade monitors by installing the new packages and restarting the
   monitor daemons. For example::

     # systemctl restart ceph-mon.target

   Once all monitors are up, verify that the monitor upgrade is
   complete by looking for the ``nautilus`` string in the mon
   map. For example::

     # ceph mon dump | grep min_mon_release

   should report::

     min_mon_release 14 (nautilus)

   If it doesn't, that implies that one or more monitors haven't been
   upgraded and restarted and the quorum is not complete.

#. Upgrade ``ceph-mgr`` daemons by installing the new packages and
   restarting all manager daemons. For example::

     # systemctl restart ceph-mgr.target

   Please note, if you are using the Ceph Dashboard, you will probably need to
   install ``ceph-mgr-dashboard`` separately after upgrading the ``ceph-mgr``
   package. The install script of ``ceph-mgr-dashboard`` will restart the
   manager daemons automatically for you, so in that case you can skip
   restarting the daemons manually.

   Verify the ``ceph-mgr`` daemons are running by checking ``ceph
   -s``::

     # ceph -s

     ...
       services:
         mon: 3 daemons, quorum foo,bar,baz
         mgr: foo(active), standbys: bar, baz
     ...

#. Upgrade all OSDs by installing the new packages and restarting the
   ceph-osd daemons on all hosts::

     # systemctl restart ceph-osd.target

   You can monitor the progress of the OSD upgrades with the
   ``ceph versions`` or ``ceph osd versions`` commands::

     # ceph osd versions
     {
        "ceph version 13.2.5 (...) mimic (stable)": 12,
        "ceph version 14.2.0 (...) nautilus (stable)": 22,
     }

#. If there are any OSDs in the cluster deployed with ceph-disk (e.g.,
   almost any OSDs that were created before the Mimic release), you
   need to tell ceph-volume to adopt responsibility for starting the
   daemons. On each host containing OSDs, ensure the OSDs are
   currently running, and then::

     # ceph-volume simple scan
     # ceph-volume simple activate --all

   We recommend that each OSD host be rebooted following this step to
   verify that the OSDs start up automatically.

   Note that ceph-volume doesn't have the same hot-plug capability
   that ceph-disk did, where a newly attached disk is automatically
   detected via udev events. If the OSD isn't currently running when the
   above ``scan`` command is run, or a ceph-disk-based OSD is moved to
   a new host, or the host OS is reinstalled, or the
   ``/etc/ceph/osd`` directory is lost, you will need to scan the main
   data partition for each ceph-disk OSD explicitly. For example::

     # ceph-volume simple scan /dev/sdb1

   The output will include the appropriate ``ceph-volume simple
   activate`` command to enable the OSD.

#. Upgrade all CephFS MDS daemons. For each CephFS file system,

   #. Reduce the number of ranks to 1. (Make note of the original
      number of MDS daemons first if you plan to restore it later.)::

        # ceph status
        # ceph fs set <fs_name> max_mds 1

   #. Wait for the cluster to deactivate any non-zero ranks by
      periodically checking the status::

        # ceph status

   #. Take all standby MDS daemons offline on the appropriate hosts with::

        # systemctl stop ceph-mds@<daemon_name>

   #. Confirm that only one MDS is online and is rank 0 for your FS::

        # ceph status

   #. Upgrade the last remaining MDS daemon by installing the new
      packages and restarting the daemon::

        # systemctl restart ceph-mds.target

   #. Restart all standby MDS daemons that were taken offline::

        # systemctl start ceph-mds.target

   #. Restore the original value of ``max_mds`` for the volume::

        # ceph fs set <fs_name> max_mds <original_max_mds>

#. Upgrade all radosgw daemons by upgrading packages and restarting
   daemons on all hosts::

     # systemctl restart radosgw.target

#. Complete the upgrade by disallowing pre-Nautilus OSDs and enabling
   all new Nautilus-only functionality::

     # ceph osd require-osd-release nautilus

#. If you set ``noout`` at the beginning, be sure to clear it with::

     # ceph osd unset noout

#. Verify the cluster is healthy with ``ceph health``.

#. To enable the new :ref:`v2 network protocol <msgr2>`, issue the
   following command::

     ceph mon enable-msgr2

   This will instruct all monitors that bind to the old default port
   6789 for the legacy v1 protocol to also bind to the new 3300 v2
   protocol port. To see if all monitors have been updated, run::

     ceph mon dump

   and verify that each monitor has both a ``v2:`` and ``v1:`` address
   listed.

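   For example, the monitor entries in the dump should look something
   like this (the addresses shown are illustrative)::

     0: [v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0] mon.foo
     1: [v2:10.0.0.2:3300/0,v1:10.0.0.2:6789/0] mon.bar
     2: [v2:10.0.0.3:3300/0,v1:10.0.0.3:6789/0] mon.baz
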
#. For each host that has been upgraded, you should update your
   ``ceph.conf`` file so that it references both the v2 and v1
   addresses. Things will still work if only the v1 IP and port are
   listed, but each CLI instantiation or daemon will need to reconnect
   after learning the monitors' real IPs, slowing things down a bit and
   preventing a full transition to the v2 protocol.

   This is also a good time to fully transition any config options in
   ceph.conf into the cluster's configuration database. On each host,
   you can import any options into the monitors with::

     ceph config assimilate-conf -i /etc/ceph/ceph.conf

   To create a minimal but sufficient ceph.conf for each host::

     ceph config generate-minimal-conf > /etc/ceph/ceph.conf.new
     mv /etc/ceph/ceph.conf.new /etc/ceph/ceph.conf

   Be sure to use this new config--and, specifically, the new syntax
   for the ``mon_host`` option that lists both ``v2:`` and ``v1:``
   addresses in brackets--on hosts that have been upgraded to
   Nautilus, since pre-Nautilus versions of Ceph do not understand the
   syntax.

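   For example, the generated minimal ``ceph.conf`` will look something
   like this (the fsid and addresses are illustrative)::

     [global]
             fsid = 6e6c5bd9-7a6b-4d7e-9e1d-2b1a3f6c4a5e
             mon_host = [v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0]
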
#. Consider enabling the :ref:`telemetry module <telemetry>` to send
   anonymized usage statistics and crash information to the Ceph
   upstream developers. To see what would be reported (without actually
   sending any information to anyone)::

     ceph mgr module enable telemetry
     ceph telemetry show

   If you are comfortable with the data that is reported, you can opt in to
   automatically report the high-level cluster metadata with::

     ceph telemetry on

Upgrading from pre-Luminous releases (like Jewel)
-------------------------------------------------

You *must* first upgrade to Luminous (12.2.z) before attempting an
upgrade to Nautilus. In addition, your cluster must have completed at
least one scrub of all PGs while running Luminous, setting the
``recovery_deletes`` and ``purged_snapdirs`` flags in the OSD map.


Upgrade compatibility notes
---------------------------

These changes occurred between the Mimic and Nautilus releases.

* ``ceph pg stat`` output has been modified in json format to match
  ``ceph df`` output:

  - "raw_bytes" field renamed to "total_bytes"
  - "raw_bytes_avail" field renamed to "total_bytes_avail"
398 | - "raw_bytes_avail" field renamed to "total_bytes_avail" | |
399 | - "raw_bytes_used" field renamed to "total_bytes_raw_used" | |
400 | - "total_bytes_used" field added to represent the space (accumulated over | |
401 | all OSDs) allocated purely for data objects kept at block(slow) device | |
402 | ||
403 | * ``ceph df [detail]`` output (GLOBAL section) has been modified in plain | |
404 | format: | |
405 | ||
406 | - new 'USED' column shows the space (accumulated over all OSDs) allocated | |
407 | purely for data objects kept at block(slow) device. | |
408 | - 'RAW USED' is now a sum of 'USED' space and space allocated/reserved at | |
409 | block device for Ceph purposes, e.g. BlueFS part for BlueStore. | |
410 | ||
411 | * ``ceph df [detail]`` output (GLOBAL section) has been modified in json | |
412 | format: | |
413 | ||
414 | - 'total_used_bytes' column now shows the space (accumulated over all OSDs) | |
415 | allocated purely for data objects kept at block(slow) device | |
416 | - new 'total_used_raw_bytes' column shows a sum of 'USED' space and space | |
417 | allocated/reserved at block device for Ceph purposes, e.g. BlueFS part for | |
418 | BlueStore. | |
419 | ||
420 | * ``ceph df [detail]`` output (POOLS section) has been modified in plain | |
421 | format: | |
422 | ||
423 | - 'BYTES USED' column renamed to 'STORED'. Represents amount of data | |
424 | stored by the user. | |
425 | - 'USED' column now represent amount of space allocated purely for data | |
426 | by all OSD nodes in KB. | |
427 | - 'QUOTA BYTES', 'QUOTA OBJECTS' aren't showed anymore in non-detailed mode. | |
428 | - new column 'USED COMPR' - amount of space allocated for compressed | |
429 | data. i.e., compressed data plus all the allocation, replication and erasure | |
430 | coding overhead. | |
431 | - new column 'UNDER COMPR' - amount of data passed through compression | |
432 | (summed over all replicas) and beneficial enough to be stored in a | |
433 | compressed form. | |
434 | - Some columns reordering | |

* ``ceph df [detail]`` output (POOLS section) has been modified in json
  format:

  - 'bytes used' column renamed to 'stored'. Represents the amount of data
    stored by the user.
  - 'raw bytes used' column renamed to 'stored_raw'. Totals of user data
    over all OSDs, excluding degraded.
  - new 'bytes_used' column now represents the amount of space allocated by
    all OSD nodes.
  - 'kb_used' column - the same as 'bytes_used' but in KB.
  - new column 'compress_bytes_used' - amount of space allocated for compressed
    data, i.e., compressed data plus all the allocation, replication and erasure
    coding overhead.
  - new column 'compress_under_bytes' - amount of data passed through compression
    (summed over all replicas) and beneficial enough to be stored in a
    compressed form.

* ``rados df [detail]`` output (POOLS section) has been modified in plain
  format:

  - 'USED' column now shows the space (accumulated over all OSDs) allocated
    purely for data objects kept at the block (slow) device.
  - new column 'USED COMPR' - amount of space allocated for compressed
    data, i.e., compressed data plus all the allocation, replication and erasure
    coding overhead.
  - new column 'UNDER COMPR' - amount of data passed through compression
    (summed over all replicas) and beneficial enough to be stored in a
    compressed form.

* ``rados df [detail]`` output (POOLS section) has been modified in json
  format:

  - 'size_bytes' and 'size_kb' columns now show the space (accumulated
    over all OSDs) allocated purely for data objects kept at the block
    device.
  - new column 'compress_bytes_used' - amount of space allocated for compressed
    data, i.e., compressed data plus all the allocation, replication and erasure
    coding overhead.
  - new column 'compress_under_bytes' - amount of data passed through compression
    (summed over all replicas) and beneficial enough to be stored in a
    compressed form.

* ``ceph pg dump`` output (totals section) has been modified in json
  format:

  - new 'USED' column shows the space (accumulated over all OSDs) allocated
    purely for data objects kept at the block (slow) device.
  - 'USED_RAW' is now the sum of 'USED' space and space allocated/reserved at
    the block device for Ceph purposes, e.g. the BlueFS part for BlueStore.

* The ``ceph osd rm`` command has been deprecated. Users should use
  ``ceph osd destroy`` or ``ceph osd purge`` (but after first confirming it is
  safe to do so via the ``ceph osd safe-to-destroy`` command).

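  For example, a minimal sketch of retiring an OSD with the new commands
  (the OSD id is illustrative)::

    ceph osd safe-to-destroy osd.7
    ceph osd purge osd.7 --yes-i-really-mean-it
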
* The MDS now supports dropping its cache for the purposes of benchmarking::

    ceph tell mds.* cache drop <timeout>

  Note that the MDS cache is cooperatively managed by the clients. It is
  necessary for clients to give up capabilities in order for the MDS to fully
  drop its cache. This is accomplished by asking all clients to trim as many
  caps as possible. The timeout argument to the ``cache drop`` command controls
  how long the MDS waits for clients to complete trimming caps. This is optional
  and is 0 by default (no timeout). Keep in mind that clients may still retain
  caps for open files, which will prevent the metadata for those files from being
  dropped by both the client and the MDS. (This is an equivalent scenario to
  dropping the Linux page/buffer/inode/dentry caches with some processes pinning
  some inodes/dentries/pages in cache.)

* The ``mon_health_preluminous_compat`` and
  ``mon_health_preluminous_compat_warning`` config options have been
  removed, as the related functionality is more than two versions old.
  Any legacy monitoring system expecting Jewel-style health output
  will need to be updated to work with Nautilus.

* Nautilus is not supported on any distros still running upstart, so
  upstart-specific files and references have been removed.

* The ``ceph pg <pgid> list_missing`` command has been renamed to
  ``ceph pg <pgid> list_unfound`` to better match its behaviour.

* The *rbd-mirror* daemon can now retrieve remote peer cluster configuration
  secrets from the monitor. To use this feature, the rbd-mirror daemon
  CephX user for the local cluster must use the ``profile rbd-mirror`` mon cap.
  The secrets can be set using the ``rbd mirror pool peer add`` and
  ``rbd mirror pool peer set`` actions.

* The *rbd-mirror* daemon will now run in active/active mode by default, where
  mirrored images are evenly distributed between all active *rbd-mirror*
  daemons. To revert to active/passive mode, override the
  ``rbd_mirror_image_policy_type`` config key to ``none``.

* The ``ceph mds deactivate`` command is fully obsolete and references to it in
  the docs have been removed or clarified.

* The libcephfs bindings added the ``ceph_select_filesystem`` function
  for use with multiple filesystems.

* The cephfs python bindings now include ``mount_root`` and ``filesystem_name``
  options in the mount() function.

* erasure-code: added experimental *Coupled LAYer (CLAY)* erasure codes
  support. It features less network traffic and disk I/O when performing
  recovery.

* The ``cache drop`` OSD command has been added to drop an OSD's caches:

  - ``ceph tell osd.x cache drop``

* The ``cache status`` OSD command has been added to get the cache stats of an
  OSD:

  - ``ceph tell osd.x cache status``

* libcephfs added several functions that allow a restarted client to destroy
  or reclaim state held by a previous incarnation. These functions are intended
  for NFS servers.

* The ``ceph`` command line tool now accepts keyword arguments in
  the format ``--arg=value`` or ``--arg value``.

* ``librados::IoCtx::nobjects_begin()`` and
  ``librados::NObjectIterator`` now communicate errors by throwing a
  ``std::system_error`` exception instead of ``std::runtime_error``.

* The callback function passed to ``LibRGWFS.readdir()`` now accepts a ``flags``
  parameter. It will be the last parameter passed to the ``readdir()`` method.

* The ``cephfs-data-scan scan_links`` command now automatically repairs
  inotables and the snaptable.

* The configuration values ``mon_warn_not_scrubbed`` and
  ``mon_warn_not_deep_scrubbed`` have been renamed. They are now
  ``mon_warn_pg_not_scrubbed_ratio`` and ``mon_warn_pg_not_deep_scrubbed_ratio``
  respectively. This is to clarify that these warnings are related to
  pg scrubbing and are a ratio of the related interval. These options
  are now enabled by default.

* MDS cache trimming is now throttled. Dropping the MDS cache
  via the ``ceph tell mds.<foo> cache drop`` command or large reductions in the
  cache size will no longer cause service unavailability.

* The CephFS MDS behavior with recalling caps has been significantly improved
  so as not to attempt recalling too many caps at once, which previously led to
  instability. MDS with a large cache (64GB+) should be more stable.

* MDS now provides a config option ``mds_max_caps_per_client`` (default: 1M) to
  limit the number of caps a client session may hold. Long running client
  sessions with a large number of caps have been a source of instability in the
  MDS when all of these caps need to be processed during certain session
  events. It is recommended to not unnecessarily increase this value.

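  For example, the limit can be adjusted cluster-wide via the centralized
  config (the value shown is just the 1M default, expressed in full)::

    ceph config set mds mds_max_caps_per_client 1048576
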
* The MDS config ``mds_recall_state_timeout`` has been removed. Late
  client recall warnings are now generated based on the number of caps
  the MDS has recalled which have not been released. The new configs
  ``mds_recall_warning_threshold`` (default: 32K) and
  ``mds_recall_warning_decay_rate`` (default: 60s) set the threshold
  for this warning.

* The Telegraf module for the Manager allows for sending statistics to
  a Telegraf agent over TCP, UDP or a UNIX socket. Telegraf can then
  send the statistics to databases like InfluxDB, ElasticSearch, Graphite
  and many more.

* The graylog fields naming the originator of a log event have
  changed: the string-form name is now included (e.g., ``"name":
  "mgr.foo"``), and the rank-form name is now in a nested section
  (e.g., ``"rank": {"type": "mgr", "num": 43243}``).

* If the cluster log is directed at syslog, the entries are now
  prefixed by both the string-form name and the rank-form name (e.g.,
  ``mgr.x mgr.12345 ...`` instead of just ``mgr.12345 ...``).

* The JSON output of the ``ceph osd find`` command has replaced the ``ip``
  field with an ``addrs`` section to reflect that OSDs may bind to
  multiple addresses.

* CephFS clients without the 's' flag in their authentication capability
  string will no longer be able to create/delete snapshots. To allow
  ``client.foo`` to create/delete snapshots in the ``bar`` directory of
  filesystem ``cephfs_a``, use the command:

  - ``ceph auth caps client.foo mon 'allow r' osd 'allow rw tag cephfs data=cephfs_a' mds 'allow rw, allow rws path=/bar'``

* The ``osd_heartbeat_addr`` option has been removed as it served no
  (good) purpose: the OSD should always check heartbeats on both the
  public and cluster networks.

* The ``rados`` tool's ``mkpool`` and ``rmpool`` commands have been
  removed because they are redundant; please use the ``ceph osd pool
  create`` and ``ceph osd pool rm`` commands instead.

* The ``auid`` property for cephx users and RADOS pools has been
  removed. This was an undocumented and partially implemented
  capability that allowed cephx users to map capabilities to RADOS
  pools that they "owned". Because there are no users, we have removed
  this support. If any cephx capabilities exist in the cluster that
  restrict based on auid then they will no longer parse, and the
  cluster will report a health warning like::

    AUTH_BAD_CAPS 1 auth entities have invalid capabilities
        client.bad osd capability parse failed, stopped at 'allow rwx auid 123' of 'allow rwx auid 123'

  The capability can be adjusted with the ``ceph auth caps``
  command. For example::

    ceph auth caps client.bad osd 'allow rwx pool foo'

* The ``ceph-kvstore-tool`` ``repair`` command has been renamed
  ``destructive-repair`` since we have discovered it can corrupt an
  otherwise healthy rocksdb database. It should be used only as a last-ditch
  attempt to recover data from an otherwise corrupted store.


* The default memory utilization for the mons has been increased
  somewhat. Rocksdb now uses 512 MB of RAM by default, which should
  be sufficient for small to medium-sized clusters; large clusters
  should tune this up. Also, the ``mon_osd_cache_size`` has been
  increased from 10 OSDMaps to 500, which will translate to an
  additional 500 MB to 1 GB of RAM for large clusters, and much less
  for small clusters.

* The ``mgr/balancer/max_misplaced`` option has been replaced by a new
  global ``target_max_misplaced_ratio`` option that throttles both
  balancer activity and automated adjustments to ``pgp_num`` (normally as a
  result of ``pg_num`` changes). If you have customized the balancer module
  option, you will need to adjust your config to set the new global option
  or revert to the default of .05 (5%).

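  For example, assuming the new option is adjusted through the centralized
  config like other global options, raising the threshold to 7% would look
  something like::

    ceph config set global target_max_misplaced_ratio 0.07
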
* By default, Ceph no longer issues a health warning when there are
  misplaced objects (objects that are fully replicated but not stored
  on the intended OSDs). You can re-enable the old warning by setting
  ``mon_warn_on_misplaced`` to ``true``.

* The ``ceph-create-keys`` tool is now obsolete. The monitors
  automatically create these keys on their own. For now the script
  prints a warning message and exits, but it will be removed in the
  next release. Note that ``ceph-create-keys`` would also write the
  admin and bootstrap keys to /etc/ceph and /var/lib/ceph, but this
  script no longer does that. Any deployment tools that relied on
  this behavior should instead make use of the ``ceph auth export
  <entity-name>`` command for whichever key(s) they need.

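  For example, to extract the admin key to its usual keyring location
  (the output path is illustrative)::

    ceph auth export client.admin -o /etc/ceph/ceph.client.admin.keyring
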
* The ``mon_osd_pool_ec_fast_read`` option has been renamed
  ``osd_pool_default_ec_fast_read`` to be more consistent with other
  ``osd_pool_default_*`` options that affect default values for newly
  created RADOS pools.

* The ``mon addr`` configuration option is now deprecated. It can
  still be used to specify an address for each monitor in the
  ``ceph.conf`` file, but it only affects cluster creation and
  bootstrapping, and it does not support listing multiple addresses
  (e.g., both a v2 and v1 protocol address). We strongly recommend
  the option be removed and instead a single ``mon host`` option be
  specified in the ``[global]`` section to allow daemons and clients
  to discover the monitors.

* A new command ``ceph fs fail`` has been added to quickly bring down a file
  system. This is a single command that unsets the joinable flag on the file
  system and brings down all of its ranks.

* The ``cache drop`` admin socket command has been removed. The ``ceph
  tell mds.X cache drop`` command remains.


Detailed Changelog
------------------