==============
Upgrading Ceph
==============

.. DANGER:: DATE: 01 NOV 2021.

   DO NOT UPGRADE TO CEPH PACIFIC FROM AN OLDER VERSION.

   A recently-discovered bug (https://tracker.ceph.com/issues/53062) can cause
   data corruption. This bug occurs during OMAP format conversion for
   clusters that are updated to Pacific. New clusters are not affected by this
   bug.

   The trigger for this bug is BlueStore's repair/quick-fix functionality. This
   bug can be triggered in two known ways:

   (1) manually via the ceph-bluestore-tool, or
   (2) automatically, by OSD if ``bluestore_fsck_quick_fix_on_mount`` is set
       to true.

   The fix for this bug is expected to be available in Ceph v16.2.7.

   DO NOT set ``bluestore_fsck_quick_fix_on_mount`` to true. If it is
   currently set to true in your configuration, immediately set it to false.

   DO NOT run ``ceph-bluestore-tool``'s repair/quick-fix commands.

Cephadm can safely upgrade Ceph from one bugfix release to the next. For
example, you can upgrade from v15.2.0 (the first Octopus release) to the next
point release, v15.2.1.

The automated upgrade process follows Ceph best practices. For example:

* The upgrade order starts with managers, monitors, then other daemons.
* Each daemon is restarted only after Ceph indicates that the cluster
  will remain available.

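During (or after) an upgrade, you can see which version each daemon type is
running with the ``ceph versions`` command. The output below is illustrative
only; your daemon counts, versions, and release names will differ:

.. prompt:: bash #

   ceph versions

.. code-block:: console

   {
       "mon": {
           "ceph version 15.2.1 (...) octopus (stable)": 3
       },
       "mgr": {
           "ceph version 15.2.1 (...) octopus (stable)": 2
       },
       "osd": {
           "ceph version 15.2.0 (...) octopus (stable)": 8
       },
       "overall": {
           "ceph version 15.2.0 (...) octopus (stable)": 8,
           "ceph version 15.2.1 (...) octopus (stable)": 5
       }
   }
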
.. note::

   The Ceph cluster health status is likely to switch to
   ``HEALTH_WARNING`` during the upgrade.

.. note::

   If a host in the cluster is offline, the upgrade is paused.


Starting the upgrade
====================

Before you use cephadm to upgrade Ceph, verify that all hosts are currently online and that your cluster is healthy by running the following command:

.. prompt:: bash #

   ceph -s

To upgrade (or downgrade) to a specific release, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version <version>

For example, to upgrade to v16.2.6, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version 16.2.6

.. note::

   Starting with v16.2.6, the Docker Hub registry is no longer used. If you
   use Docker, you must point it to the image in the quay.io registry:

   .. prompt:: bash #

      ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6


Monitoring the upgrade
======================

Determine (1) whether an upgrade is in progress and (2) which version the
cluster is upgrading to by running the following command:

.. prompt:: bash #

   ceph orch upgrade status

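While an upgrade is running, the status output includes the target image. The
output below is illustrative only; the exact fields and values depend on your
Ceph release:

.. code-block:: console

   {
       "target_image": "docker.io/ceph/ceph:v15.2.1",
       "in_progress": true,
       "services_complete": [],
       "message": ""
   }
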
Watching the progress bar during a Ceph upgrade
-----------------------------------------------

During the upgrade, a progress bar is visible in the ceph status output. It
looks like this:

.. code-block:: console

   # ceph -s

   [...]
     progress:
       Upgrade to docker.io/ceph/ceph:v15.2.1 (00h 20m 12s)
         [=======.....................] (time remaining: 01h 43m 31s)

Watching the cephadm log during an upgrade
------------------------------------------

Watch the cephadm log by running the following command:

.. prompt:: bash #

   ceph -W cephadm


Canceling an upgrade
====================

You can stop the upgrade process at any time by running the following command:

.. prompt:: bash #

   ceph orch upgrade stop

Post-upgrade actions
====================

Once the upgrade is complete, update the ``cephadm`` package (or the
``ceph-common`` package, if you do not use ``cephadm shell``) to a version
compatible with the new Ceph release.
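
The exact command depends on your distribution and on how ``cephadm`` was
installed; the package names below assume the distribution packages and are
given as examples only:

.. prompt:: bash #

   # RPM-based distributions
   dnf update cephadm

   # DEB-based distributions
   apt update
   apt install --only-upgrade cephadm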

Potential problems
==================

There are a few health alerts that can arise during the upgrade process.

UPGRADE_NO_STANDBY_MGR
----------------------

This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an
active standby manager daemon. In order to proceed with the upgrade, Ceph
requires an active standby manager daemon (which you can think of in this
context as "a second manager").

You can ensure that Cephadm is configured to run 2 (or more) managers by
running the following command:

.. prompt:: bash #

   ceph orch apply mgr 2  # or more

You can check the status of existing mgr daemons by running the following
command:

.. prompt:: bash #

   ceph orch ps --daemon-type mgr

If an existing mgr daemon has stopped, you can try to restart it by running the
following command:

.. prompt:: bash #

   ceph orch daemon restart <name>

UPGRADE_FAILED_PULL
-------------------

This alert (``UPGRADE_FAILED_PULL``) means that Ceph was unable to pull the
container image for the target version. This can happen if you specify a
version or container image that does not exist (e.g. "1.2.3"), or if the
container registry cannot be reached by one or more hosts in the cluster.

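One way to narrow down the cause is to try pulling the target image manually on
an affected host, using whichever container engine the cluster uses (``podman``
or ``docker``); the image name below is an example:

.. prompt:: bash #

   podman pull quay.io/ceph/ceph:v16.2.6
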
To cancel the existing upgrade and to specify a different target version, run
the following commands:

.. prompt:: bash #

   ceph orch upgrade stop
   ceph orch upgrade start --ceph-version <version>


Using customized container images
=================================

For most users, upgrading requires nothing more complicated than specifying the
Ceph version number to upgrade to. In such cases, cephadm locates the specific
Ceph container image to use by combining the ``container_image_base``
configuration option (default: ``docker.io/ceph/ceph``) with a tag of
``vX.Y.Z``.

But it is possible to upgrade to an arbitrary container image, if that's what
you need. For example, the following command upgrades to a development build:

.. prompt:: bash #

   ceph orch upgrade start --image quay.io/ceph-ci/ceph:recent-git-branch-name

For more information about available container images, see :ref:`containers`.
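
Depending on your Ceph version, you may also be able to preview which daemons
would be affected by an upgrade to a given image, without actually starting the
upgrade; the image name below is an example:

.. prompt:: bash #

   ceph orch upgrade check --image quay.io/ceph/ceph:v16.2.6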

Staggered Upgrade
=================

Some users may prefer to upgrade components in phases rather than all at once.
Beginning in 16.2.10 and 17.2.1, the upgrade command accepts parameters that
limit which daemons a single upgrade command touches:

* ``daemon_types`` takes a comma-separated list of daemon types and upgrades
  only daemons of those types.
* ``services`` is mutually exclusive with ``daemon_types``, takes services of
  only one type at a time (e.g. you can't provide an OSD and an RGW service
  at the same time), and upgrades only daemons belonging to those services.
* ``hosts`` can be combined with ``daemon_types`` or ``services``, or provided
  on its own. The ``hosts`` parameter follows the same format as the command
  line options for :ref:`orchestrator-cli-placement-spec`.
* ``limit`` takes an integer > 0 and places a numerical limit on the number of
  daemons cephadm will upgrade. ``limit`` can be combined with any of the
  other parameters. For example, if you specify upgrading daemons of type osd
  on host Host1 with ``limit`` set to 3, cephadm will upgrade at most 3 osd
  daemons on Host1.

Example: specifying daemon types and hosts:

.. prompt:: bash #

   ceph orch upgrade start --image <image-name> --daemon-types mgr,mon --hosts host1,host2

Example: specifying services and using limit:

.. prompt:: bash #

   ceph orch upgrade start --image <image-name> --services rgw.example1,rgw.example2 --limit 2

.. note::

   Cephadm strictly enforces an order to the upgrade of daemons that is still
   present in staggered upgrade scenarios. The current upgrade ordering is
   ``mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs``.
   If you specify parameters that would upgrade daemons out of order, the
   upgrade command will block and note which daemons will be missed if you
   proceed.

.. note::

   Upgrade commands with limiting parameters will validate the options before
   beginning the upgrade, which may require pulling the new container image.
   Do not be surprised if the upgrade start command takes a while to return
   when limiting parameters are provided.

.. note::

   In staggered upgrade scenarios (when a limiting parameter is provided),
   monitoring stack daemons including Prometheus and node-exporter are
   refreshed after the Manager daemons have been upgraded. Do not be surprised
   if Manager upgrades thus take longer than expected. Note that the versions
   of monitoring stack daemons may not change between Ceph releases, in which
   case they are only redeployed.

Upgrading to a version that supports staggered upgrade from one that doesn't
----------------------------------------------------------------------------

When upgrading from a version that already supports staggered upgrades, the
process simply requires providing the necessary arguments. However, if you wish
to upgrade to a version that supports staggered upgrade from one that does not,
there is a workaround: first manually upgrade the Manager daemons, then pass
the limiting parameters as usual.

.. warning::

   Make sure you have multiple running mgr daemons before attempting this
   procedure.

To start with, determine which Manager daemon is the active one and which are
standby. This can be done in a variety of ways, such as by looking at the
``ceph -s`` output. Then, manually upgrade each standby mgr daemon with:

.. prompt:: bash #

   ceph orch daemon redeploy mgr.example1.abcdef --image <new-image-name>

.. note::

   If you are on a very early version of cephadm (early Octopus), the
   ``orch daemon redeploy`` command may not have the ``--image`` flag. In that
   case, you must manually set the Manager container image with
   ``ceph config set mgr container_image <new-image-name>`` and then redeploy
   the Manager with ``ceph orch daemon redeploy mgr.example1.abcdef``.

At this point, failing over the Manager should make the active Manager one
that is running the new version:

.. prompt:: bash #

   ceph mgr fail

Verify that the active Manager is now one running the new version. To complete
the upgrade of the Manager daemons:

.. prompt:: bash #

   ceph orch upgrade start --image <new-image-name> --daemon-types mgr

You should now have all your Manager daemons on the new version and be able to
specify the limiting parameters for the rest of the upgrade.