==============
Upgrading Ceph
==============

.. DANGER:: DATE: 01 NOV 2021.

   DO NOT UPGRADE TO CEPH PACIFIC FROM AN OLDER VERSION.

   A recently-discovered bug (https://tracker.ceph.com/issues/53062) can cause
   data corruption. This bug occurs during OMAP format conversion for
   clusters that are updated to Pacific. New clusters are not affected by this
   bug.

   The trigger for this bug is BlueStore's repair/quick-fix functionality. This
   bug can be triggered in two known ways:

   (1) manually via the ceph-bluestore-tool, or
   (2) automatically by the OSD if ``bluestore_fsck_quick_fix_on_mount`` is set
       to true.

   The fix for this bug is expected to be available in Ceph v16.2.7.

   DO NOT set ``bluestore_fsck_quick_fix_on_mount`` to true. If it is currently
   set to true in your configuration, immediately set it to false.

   DO NOT run ``ceph-bluestore-tool``'s repair/quick-fix commands.

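A quick way to confirm the dangerous option is off is to read it back with
``ceph config get``. The sketch below is hedged: the ``current`` variable
stands in for that command's output on a live cluster, since the check cannot
run without one.

```shell
# Hedged sketch: "$current" stands in for the output of
#   ceph config get osd bluestore_fsck_quick_fix_on_mount
# run against a live cluster; "false" is the value you want to see.
current="false"
if [ "$current" != "false" ]; then
    # the remediation named in the warning above
    echo "run: ceph config set osd bluestore_fsck_quick_fix_on_mount false"
else
    echo "quick-fix-on-mount is disabled"
fi
```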
Cephadm can safely upgrade Ceph from one bugfix release to the next. For
example, you can upgrade from v15.2.0 (the first Octopus release) to the next
point release, v15.2.1.

The automated upgrade process follows Ceph best practices. For example:

* The upgrade order starts with managers, monitors, then other daemons.
* Each daemon is restarted only after Ceph indicates that the cluster
  will remain available.

.. note::

   The Ceph cluster health status is likely to switch to
   ``HEALTH_WARNING`` during the upgrade.

.. note::

   If a host in the cluster is offline, the upgrade is paused.

Starting the upgrade
====================

Before you use cephadm to upgrade Ceph, verify that all hosts are currently
online and that your cluster is healthy by running the following command:

.. prompt:: bash #

   ceph -s

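This health check can also be scripted. The sketch below is hedged: the JSON
fragment is an illustrative stand-in for real ``ceph -s --format json``
output (the overall status lives in the ``health.status`` field), and the
extraction uses plain ``sed`` rather than a full JSON parser.

```shell
# Hedged sketch: $sample stands in for the output of `ceph -s --format json`
# on a live cluster; only the "health":{"status":...} field is consulted.
sample='{"health":{"status":"HEALTH_OK"},"fsid":"00000000-0000-0000-0000-000000000000"}'
status=$(printf '%s' "$sample" | sed -n 's/.*"status":"\([A-Z_]*\)".*/\1/p')
if [ "$status" = "HEALTH_OK" ]; then
    echo "cluster healthy; safe to start the upgrade"
else
    echo "cluster status is $status; investigate before upgrading"
fi
```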
To upgrade (or downgrade) to a specific release, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version <version>

For example, to upgrade to v16.2.6, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version 16.2.6

.. note::

   Starting with v16.2.6, the Docker Hub registry is no longer used. If you
   use Docker, you must point it at the image in the quay.io registry:

   .. prompt:: bash #

      ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6


Monitoring the upgrade
======================

Determine (1) whether an upgrade is in progress and (2) which version the
cluster is upgrading to by running the following command:

.. prompt:: bash #

   ceph orch upgrade status

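The status output can also be consumed by scripts. Below is a hedged sketch:
the JSON is an illustrative stand-in for ``ceph orch upgrade status`` output
(the ``in_progress`` and ``target_image`` fields appear in real output, but
the values here are made up), extracted with plain ``sed``.

```shell
# Hedged sketch: $sample stands in for `ceph orch upgrade status` output
# from a live cluster; the field values below are illustrative.
sample='{"target_image":"quay.io/ceph/ceph:v16.2.6","in_progress":true,"services_complete":[],"message":""}'
in_progress=$(printf '%s' "$sample" | sed -n 's/.*"in_progress":\([a-z]*\).*/\1/p')
target=$(printf '%s' "$sample" | sed -n 's/.*"target_image":"\([^"]*\)".*/\1/p')
echo "upgrade in progress: $in_progress (target: $target)"
```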
Watching the progress bar during a Ceph upgrade
-----------------------------------------------

During the upgrade, a progress bar is visible in the ceph status output. It
looks like this:

.. code-block:: console

   # ceph -s

   [...]
     progress:
       Upgrade to docker.io/ceph/ceph:v15.2.1 (00h 20m 12s)
         [=======.....................] (time remaining: 01h 43m 31s)

Watching the cephadm log during an upgrade
------------------------------------------

Watch the cephadm log by running the following command:

.. prompt:: bash #

   ceph -W cephadm


Canceling an upgrade
====================

You can stop the upgrade process at any time by running the following command:

.. prompt:: bash #

   ceph orch upgrade stop

Post upgrade actions
====================

If the new version is based on ``cephadm``, then once the upgrade is complete
update the ``cephadm`` package (or the ``ceph-common`` package, if you do not
use ``cephadm shell``) to a version compatible with the new Ceph release.
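One way to confirm the packages are in sync is to compare the release the
cluster is running with the release of the locally installed package. In this
hedged sketch the two variables stand in for the version numbers reported by
``ceph version`` (cluster) and ``cephadm version`` (local package); the values
are illustrative.

```shell
# Hedged sketch: the variables stand in for the release numbers reported
# by `ceph version` (cluster) and `cephadm version` (local package).
cluster_ver="16.2.6"   # illustrative
local_ver="16.2.6"     # illustrative
if [ "$local_ver" = "$cluster_ver" ]; then
    echo "local cephadm package matches the cluster release"
else
    echo "update the cephadm (or ceph-common) package to $cluster_ver"
fi
```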

Potential problems
==================

There are a few health alerts that can arise during the upgrade process.

UPGRADE_NO_STANDBY_MGR
----------------------

This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an
active standby manager daemon. In order to proceed with the upgrade, Ceph
requires an active standby manager daemon (which you can think of in this
context as "a second manager").

You can ensure that Cephadm is configured to run 2 (or more) managers by
running the following command:

.. prompt:: bash #

   ceph orch apply mgr 2  # or more

You can check the status of existing mgr daemons by running the following
command:

.. prompt:: bash #

   ceph orch ps --daemon-type mgr

If an existing mgr daemon has stopped, you can try to restart it by running the
following command:

.. prompt:: bash #

   ceph orch daemon restart <name>

UPGRADE_FAILED_PULL
-------------------

This alert (``UPGRADE_FAILED_PULL``) means that Ceph was unable to pull the
container image for the target version. This can happen if you specify a
version or container image that does not exist (e.g. "1.2.3"), or if the
container registry cannot be reached by one or more hosts in the cluster.

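When diagnosing a failed pull, it can help to split the image reference into
registry and tag and inspect each part (for example, a bare "1.2.3" lacks the
``v`` prefix that published Ceph release tags carry). This sketch uses only
shell parameter expansion; the image name is illustrative.

```shell
# Hedged sketch: split a container image reference into its parts so a
# typo (wrong registry, missing "v" prefix on the tag) is easy to spot.
image="quay.io/ceph/ceph:v16.2.6"   # illustrative
registry=${image%%/*}               # text before the first "/"
tag=${image##*:}                    # text after the last ":"
echo "registry=$registry tag=$tag"
case $tag in
    v*) echo "tag has the expected v prefix" ;;
    *)  echo "warning: Ceph release tags normally start with 'v'" ;;
esac
```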
To cancel the existing upgrade and to specify a different target version, run
the following commands:

.. prompt:: bash #

   ceph orch upgrade stop
   ceph orch upgrade start --ceph-version <version>


Using customized container images
=================================

For most users, upgrading requires nothing more complicated than specifying the
Ceph version number to upgrade to. In such cases, cephadm locates the specific
Ceph container image to use by combining the ``container_image_base``
configuration option (default: ``docker.io/ceph/ceph``) with a tag of
``vX.Y.Z``.

But it is possible to upgrade to an arbitrary container image, if that's what
you need. For example, the following command upgrades to a development build:

.. prompt:: bash #

   ceph orch upgrade start --image quay.io/ceph-ci/ceph:recent-git-branch-name

For more information about available container images, see :ref:`containers`.

Staggered Upgrade
=================

Some users may prefer to upgrade components in phases rather than all at once.
Starting in 16.2.10 and 17.2.1, the upgrade command accepts parameters that
limit which daemons are upgraded by a single upgrade command:

* ``daemon_types`` takes a comma-separated list of daemon types and upgrades
  only daemons of those types.
* ``services`` is mutually exclusive with ``daemon_types``, takes services of
  only one type at a time (e.g. you cannot provide an OSD and an RGW service
  at the same time), and upgrades only daemons belonging to those services.
* ``hosts`` can be combined with ``daemon_types`` or ``services``, or provided
  on its own. The ``hosts`` parameter follows the same format as the command
  line options for :ref:`orchestrator-cli-placement-spec`.
* ``limit`` takes an integer greater than 0 and caps the number of daemons
  cephadm will upgrade. ``limit`` can be combined with any of the other
  parameters.

For example, if you specify upgrading daemons of type osd on host Host1 with
``limit`` set to 3, cephadm will upgrade (up to) 3 osd daemons on Host1.

Example: specifying daemon types and hosts:

.. prompt:: bash #

   ceph orch upgrade start --image <image-name> --daemon-types mgr,mon --hosts host1,host2

Example: specifying services and using limit:

.. prompt:: bash #

   ceph orch upgrade start --image <image-name> --services rgw.example1,rgw.example2 --limit 2

.. note::

   Cephadm strictly enforces an order to the upgrade of daemons that is still
   present in staggered upgrade scenarios. The current upgrade ordering is
   ``mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs``.
   If you specify parameters that would upgrade daemons out of order, the
   upgrade command will block and note which daemons will be missed if you
   proceed.

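The ordering constraint can be checked before issuing a command. The sketch
below encodes the documented upgrade ordering and verifies that a candidate
``--daemon-types`` list appears in increasing order with respect to it; the
variable names and the sample list are ours, not cephadm's.

```shell
# Hedged sketch: check a candidate --daemon-types list against the
# documented upgrade ordering (the "requested" value is illustrative).
order="mgr mon crash osd mds rgw rbd-mirror cephfs-mirror iscsi nfs"
requested="mgr mon osd"
ok=true
prev=-1
for d in $requested; do
    # find the position of $d within the documented ordering
    pos=-1; i=0
    for o in $order; do
        [ "$o" = "$d" ] && pos=$i
        i=$((i + 1))
    done
    # positions must strictly increase, or the list is out of order
    if [ "$pos" -le "$prev" ]; then ok=false; fi
    prev=$pos
done
echo "respects upgrade order: $ok"
```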
.. note::

   Upgrade commands with limiting parameters will validate the options before
   beginning the upgrade, which may require pulling the new container image.
   Do not be surprised if the upgrade start command takes a while to return
   when limiting parameters are provided.

.. note::

   In staggered upgrade scenarios (when a limiting parameter is provided),
   monitoring stack daemons such as Prometheus and node-exporter are refreshed
   after the Manager daemons have been upgraded. Do not be surprised if
   Manager upgrades therefore take longer than expected. Note that the
   versions of monitoring stack daemons may not change between Ceph releases,
   in which case they are only redeployed.

Upgrading to a version that supports staggered upgrade from one that doesn't
----------------------------------------------------------------------------

When you upgrade from a version that already supports staggered upgrades, the
process simply requires providing the necessary arguments. However, if you wish
to upgrade to a version that supports staggered upgrade from one that does not,
there is a workaround: first manually upgrade the Manager daemons, then pass
the limiting parameters as usual.

.. warning::
   Make sure you have multiple running mgr daemons before attempting this
   procedure.

To start, determine which Manager is active and which are standby. This can be
done in a variety of ways, such as looking at the ``ceph -s`` output. Then
manually upgrade each standby mgr daemon with:

.. prompt:: bash #

   ceph orch daemon redeploy mgr.example1.abcdef --image <new-image-name>

.. note::

   If you are on a very early version of cephadm (early Octopus), the ``orch
   daemon redeploy`` command may not have the ``--image`` flag. In that case,
   you must manually set the Manager container image (``ceph config set mgr
   container_image <new-image-name>``) and then redeploy the Manager (``ceph
   orch daemon redeploy mgr.example1.abcdef``).

277 | ||
278 | At this point, a Manager fail over should allow us to have the active Manager be one | |
279 | running the new version. | |
280 | ||
281 | .. prompt:: bash # | |
282 | ||
283 | ceph mgr fail | |
284 | ||
285 | Verify the active Manager is now one running the new version. To complete the Manager | |
286 | upgrading: | |
287 | ||
288 | .. prompt:: bash # | |
289 | ||
290 | ceph orch upgrade start --image <new-image-name> --daemon-types mgr | |
291 | ||
292 | You should now have all your Manager daemons on the new version and be able to | |
293 | specify the limiting parameters for the rest of the upgrade. |