ceph/doc/cephadm/upgrade.rst

==============
Upgrading Ceph
==============

.. DANGER:: DATE: 01 NOV 2021.

   DO NOT UPGRADE TO CEPH PACIFIC FROM AN OLDER VERSION.

   A recently-discovered bug (https://tracker.ceph.com/issues/53062) can cause
   data corruption. This bug occurs during OMAP format conversion for
   clusters that are updated to Pacific. New clusters are not affected by this
   bug.

   The trigger for this bug is BlueStore's repair/quick-fix functionality. This
   bug can be triggered in two known ways:

    (1) manually, via ``ceph-bluestore-tool``, or
    (2) automatically, by the OSD if ``bluestore_fsck_quick_fix_on_mount`` is
        set to true.

   The fix for this bug is expected to be available in Ceph v16.2.7.

   DO NOT set ``bluestore_fsck_quick_fix_on_mount`` to true. If it is currently
   set to true in your configuration, immediately set it to false.

   DO NOT run ``ceph-bluestore-tool``'s repair/quick-fix commands.
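
The check described in the warning above can be scripted. The following is a
minimal sketch, not an official procedure: the function name
``ensure_quick_fix_disabled`` is hypothetical, and the sketch assumes that the
option is managed through the centralized configuration database at the ``osd``
level. Values set in local ``ceph.conf`` files or on individual OSDs are not
covered and must be checked separately.

```shell
# Hypothetical helper: turn off BlueStore quick-fix on mount if it is enabled.
# ASSUMPTION: the option is stored in the monitor configuration database
# under the "osd" section; per-daemon overrides are not inspected here.
ensure_quick_fix_disabled() {
    current=$(ceph config get osd bluestore_fsck_quick_fix_on_mount)
    if [ "$current" = "true" ]; then
        ceph config set osd bluestore_fsck_quick_fix_on_mount false
    fi
}
```

Calling ``ensure_quick_fix_disabled`` changes nothing when the option is
already false.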

Cephadm can safely upgrade Ceph from one bugfix release to the next.  For
example, you can upgrade from v15.2.0 (the first Octopus release) to the next
point release, v15.2.1.

The automated upgrade process follows Ceph best practices.  For example:

* The upgrade order starts with Managers, then Monitors, then other daemons.
* Each daemon is restarted only after Ceph indicates that the cluster
  will remain available.

.. note::

   The Ceph cluster health status is likely to switch to
   ``HEALTH_WARN`` during the upgrade.

.. note::

   If a cluster host is offline, the upgrade is paused.


Starting the upgrade
====================

Before you use cephadm to upgrade Ceph, verify that all hosts are currently
online and that your cluster is healthy by running the following command:

.. prompt:: bash #

   ceph -s
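
If you want to gate a script on cluster health rather than read the status by
eye, something like the following sketch may help. The function name
``cluster_is_healthy`` is hypothetical. The sketch relies on ``ceph health``
printing the status word (for example ``HEALTH_OK`` or ``HEALTH_WARN``) as the
first field of its output, and it does not check whether every host is online.

```shell
# Hypothetical helper: succeed (exit status 0) only if the cluster reports
# HEALTH_OK. The first space-separated field of `ceph health` is the status.
cluster_is_healthy() {
    status=$(ceph health | cut -d' ' -f1)
    [ "$status" = "HEALTH_OK" ]
}
```

A script could then run ``cluster_is_healthy || exit 1`` before it starts an
upgrade.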

To upgrade (or downgrade) to a specific release, run the following command:

.. prompt:: bash #

  ceph orch upgrade start --ceph-version <version>

For example, to upgrade to v16.2.6, run the following command:

.. prompt:: bash #

  ceph orch upgrade start --ceph-version 16.2.6

.. note::

    From v16.2.6 onward, the Docker Hub registry is no longer used. If you use
    Docker, you must point it to the image in the quay.io registry:

.. prompt:: bash #

  ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6


Monitoring the upgrade
======================

Determine (1) whether an upgrade is in progress and (2) which version the
cluster is upgrading to by running the following command:

.. prompt:: bash #

  ceph orch upgrade status
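
For unattended use, the status command can be polled. The sketch below is
illustrative only: the function name ``wait_for_upgrade`` is hypothetical, and
it assumes that the status output is JSON containing ``"in_progress": true``
while an upgrade is running. Check the output format of your release before
relying on it. Note that a paused or failed upgrade may still be reported as
in progress.

```shell
# Hypothetical helper: block until `ceph orch upgrade status` no longer
# reports an upgrade in progress.
# ASSUMPTION: the command prints JSON with an "in_progress" boolean field.
wait_for_upgrade() {
    interval=${1:-60}   # seconds between polls (default: 60)
    while ceph orch upgrade status | grep -q '"in_progress": *true'; do
        sleep "$interval"
    done
}
```

For example, ``wait_for_upgrade 30`` polls every 30 seconds.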

Watching the progress bar during a Ceph upgrade
-----------------------------------------------

During the upgrade, a progress bar is visible in the ``ceph status`` output. It
looks like this:

.. code-block:: console

  # ceph -s

  [...]
    progress:
      Upgrade to docker.io/ceph/ceph:v15.2.1 (00h 20m 12s)
        [=======.....................] (time remaining: 01h 43m 31s)

Watching the cephadm log during an upgrade
------------------------------------------

Watch the cephadm log by running the following command:

.. prompt:: bash #

  ceph -W cephadm


Canceling an upgrade
====================

You can stop the upgrade process at any time by running the following command:

.. prompt:: bash #

  ceph orch upgrade stop

Post upgrade actions
====================

If the new version is based on ``cephadm``, then after the upgrade is complete
you must update the ``cephadm`` package (or the ``ceph-common`` package, if you
do not use ``cephadm shell``) to a version compatible with the new version.

Potential problems
==================

There are a few health alerts that can arise during the upgrade process.

UPGRADE_NO_STANDBY_MGR
----------------------

This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an
active standby manager daemon. In order to proceed with the upgrade, Ceph
requires an active standby manager daemon (which you can think of in this
context as "a second manager").

You can ensure that Cephadm is configured to run 2 (or more) managers by
running the following command:

.. prompt:: bash #

  ceph orch apply mgr 2  # or more

You can check the status of existing mgr daemons by running the following
command:

.. prompt:: bash #

  ceph orch ps --daemon-type mgr

If an existing mgr daemon has stopped, you can try to restart it by running the
following command:

.. prompt:: bash #

  ceph orch daemon restart <name>
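
To check for a standby manager from a script, you could count the mgr daemons
that are reported as running. This is a rough sketch: the function names are
hypothetical, and it assumes that the status column of the ``ceph orch ps``
table contains the word ``running`` for healthy daemons. The exact table layout
varies between releases.

```shell
# Hypothetical helper: count mgr daemons that `ceph orch ps` lists as running.
# ASSUMPTION: healthy daemons show the word "running" in their table row.
running_mgr_count() {
    ceph orch ps --daemon-type mgr | grep -c 'running'
}

# Succeeds when at least two managers are running (one active, one standby).
has_standby_mgr() {
    [ "$(running_mgr_count)" -ge 2 ]
}
```

If ``has_standby_mgr`` fails, apply the remedies described above before you
retry the upgrade.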

UPGRADE_FAILED_PULL
-------------------

This alert (``UPGRADE_FAILED_PULL``) means that Ceph was unable to pull the
container image for the target version. This can happen if you specify a
version or container image that does not exist (e.g. "1.2.3"), or if the
container registry cannot be reached by one or more hosts in the cluster.

To cancel the existing upgrade and to specify a different target version, run
the following commands:

.. prompt:: bash #

  ceph orch upgrade stop
  ceph orch upgrade start --ceph-version <version>


Using customized container images
=================================

For most users, upgrading requires nothing more complicated than specifying the
Ceph version number to upgrade to.  In such cases, cephadm locates the specific
Ceph container image to use by combining the ``container_image_base``
configuration option (default: ``docker.io/ceph/ceph``) with a tag of
``vX.Y.Z``.
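
As an illustration of that naming rule, the sketch below composes an image
name from a base and a version. The function name ``ceph_image_for_version`` is
hypothetical. Cephadm performs this step internally, so the helper is only
useful for predicting which image a given ``--ceph-version`` value resolves to.

```shell
# Hypothetical helper: print "<base>:v<version>", following the rule stated
# in the text (container_image_base combined with a vX.Y.Z tag).
ceph_image_for_version() {
    version=$1
    base=${2:-docker.io/ceph/ceph}   # default base as documented above
    # Strip a leading "v" so that both "16.2.6" and "v16.2.6" are accepted.
    printf '%s:v%s\n' "$base" "${version#v}"
}
```

For example, ``ceph_image_for_version 16.2.6 quay.io/ceph/ceph`` prints
``quay.io/ceph/ceph:v16.2.6``.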

It is also possible to upgrade to an arbitrary container image, if that is what
you need. For example, the following command upgrades to a development build:

.. prompt:: bash #

  ceph orch upgrade start --image quay.io/ceph-ci/ceph:recent-git-branch-name

For more information about available container images, see :ref:`containers`.

Staggered Upgrade
=================

Some users may prefer to upgrade components in phases rather than all at once.
Starting in 16.2.10 and 17.2.1, the upgrade command accepts parameters that
limit which daemons are upgraded by a single upgrade command. The options are
``daemon_types``, ``services``, ``hosts``, and ``limit``:

* ``daemon_types`` takes a comma-separated list of daemon types and upgrades
  only daemons of those types.
* ``services`` is mutually exclusive with ``daemon_types``, accepts services of
  only one type at a time (for example, you cannot provide an OSD service and
  an RGW service in the same command), and upgrades only daemons that belong
  to those services.
* ``hosts`` can be combined with ``daemon_types`` or ``services``, or provided
  on its own. The ``hosts`` parameter follows the same format as the command
  line options for :ref:`orchestrator-cli-placement-spec`.
* ``limit`` takes an integer greater than 0 and caps the number of daemons that
  cephadm will upgrade. ``limit`` can be combined with any of the other
  parameters.

For example, if you specify that daemons of type osd on host Host1 are to be
upgraded with ``limit`` set to 3, cephadm will upgrade (up to) 3 osd daemons on
Host1.

Example: specifying daemon types and hosts:

.. prompt:: bash #

  ceph orch upgrade start --image <image-name> --daemon-types mgr,mon --hosts host1,host2

Example: specifying services and using limit:

.. prompt:: bash #

  ceph orch upgrade start --image <image-name> --services rgw.example1,rgw.example2 --limit 2
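
The constraints on these parameters can be checked before the command is
issued. The following sketch uses a hypothetical function,
``staggered_upgrade_cmd``, that only prints the command line and does not run
it. It checks that daemon types and services are not combined and that the
limit is a positive integer. It does not check that all services are of one
type, which cephadm validates itself.

```shell
# Hypothetical helper: print a staggered upgrade command after basic checks.
# Arguments: image, daemon types, services, hosts, limit ('' omits an option).
staggered_upgrade_cmd() {
    image=$1 daemon_types=$2 services=$3 hosts=$4 limit=$5
    if [ -n "$daemon_types" ] && [ -n "$services" ]; then
        echo "error: daemon types and services are mutually exclusive" >&2
        return 1
    fi
    case $limit in
        '') ;;   # no limit requested
        *[!0-9]*|0) echo "error: limit must be an integer > 0" >&2; return 1 ;;
    esac
    cmd="ceph orch upgrade start --image $image"
    [ -n "$daemon_types" ] && cmd="$cmd --daemon-types $daemon_types"
    [ -n "$services" ] && cmd="$cmd --services $services"
    [ -n "$hosts" ] && cmd="$cmd --hosts $hosts"
    [ -n "$limit" ] && cmd="$cmd --limit $limit"
    echo "$cmd"
}
```

Printing the command instead of running it leaves the final decision to the
operator, which seems the safer choice for an upgrade.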

.. note::

   Cephadm strictly enforces an order for the upgrade of daemons, and this order
   still applies in staggered upgrade scenarios. The current upgrade ordering is
   ``mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs``.
   If you specify parameters that would upgrade daemons out of order, the upgrade
   command will block and note which daemons will be missed if you proceed.
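
When planning phases, it can help to list the daemon types that come earlier
in this ordering than the type you intend to upgrade. The sketch below
hard-codes the ordering from the note above. The names ``UPGRADE_ORDER`` and
``earlier_daemon_types`` are hypothetical, and cephadm remains the authority on
the ordering in your release.

```shell
# Upgrade order copied from the note above (may differ in other releases).
UPGRADE_ORDER="mgr mon crash osd mds rgw rbd-mirror cephfs-mirror iscsi nfs"

# Hypothetical helper: print, comma-separated, the daemon types that precede
# the given type in UPGRADE_ORDER. An unknown type prints the whole list.
earlier_daemon_types() {
    target=$1
    earlier=""
    for t in $UPGRADE_ORDER; do
        [ "$t" = "$target" ] && break
        earlier="${earlier:+$earlier,}$t"
    done
    echo "$earlier"
}
```

For example, ``earlier_daemon_types osd`` prints ``mgr,mon,crash``.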

.. note::

  Upgrade commands with limiting parameters validate the options before
  beginning the upgrade, which may require pulling the new container image. Do
  not be surprised if the upgrade start command takes a while to return when
  limiting parameters are provided.

.. note::

   In staggered upgrade scenarios (when a limiting parameter is provided), monitoring
   stack daemons, including Prometheus and node-exporter, are refreshed after the Manager
   daemons have been upgraded. Do not be surprised if Manager upgrades thus take longer
   than expected. Note that the versions of monitoring stack daemons may not change between
   Ceph releases, in which case they are only redeployed.

Upgrading to a version that supports staggered upgrade from one that doesn't
----------------------------------------------------------------------------

When you upgrade from a version that already supports staggered upgrades, the
process simply requires providing the necessary arguments. However, if you wish
to upgrade to a version that supports staggered upgrade from one that does not,
there is a workaround. It requires first manually upgrading the Manager daemons
and then passing the limiting parameters as usual.

.. warning::
  Make sure you have multiple running mgr daemons before attempting this procedure.

To start with, determine which Manager is your active one and which are standby. This
can be done in a variety of ways, such as looking at the ``ceph -s`` output. Then,
manually upgrade each standby mgr daemon with:

.. prompt:: bash #

  ceph orch daemon redeploy mgr.example1.abcdef --image <new-image-name>

.. note::

   If you are on a very early version of cephadm (early Octopus), the ``orch daemon redeploy``
   command may not have the ``--image`` flag. In that case, you must manually set the
   Manager container image with ``ceph config set mgr container_image <new-image-name>`` and then
   redeploy the Manager with ``ceph orch daemon redeploy mgr.example1.abcdef``.

At this point, a Manager failover should make the active Manager one that is
running the new version:

.. prompt:: bash #

  ceph mgr fail

Verify that the active Manager is now one running the new version. To complete
the Manager upgrade, run the following command:

.. prompt:: bash #

  ceph orch upgrade start --image <new-image-name> --daemon-types mgr

You should now have all your Manager daemons on the new version and be able to
specify the limiting parameters for the rest of the upgrade.
 292 You should now have all your Manager daemons on the new version and be able to
 293 specify the limiting parameters for the rest of the upgrade.