==============
Upgrading Ceph
==============

Cephadm can safely upgrade Ceph from one bugfix release to the next. For
example, you can upgrade from v15.2.0 (the first Octopus release) to the next
point release, v15.2.1.

The automated upgrade process follows Ceph best practices. For example:

* The upgrade order starts with managers, monitors, then other daemons.
* Each daemon is restarted only after Ceph indicates that the cluster
  will remain available.

.. note::

   The Ceph cluster health status is likely to switch to
   ``HEALTH_WARNING`` during the upgrade.

.. note::

   In case a host of the cluster is offline, the upgrade is paused.

Starting the upgrade
====================

.. note::

   A `Staggered Upgrade`_ of the mons/mgrs may be necessary to gain access
   to the ``fail_fs`` feature described below.

Cephadm by default reduces ``max_mds`` to ``1``. This can be disruptive for
large-scale CephFS deployments because the cluster cannot quickly reduce the
number of active MDS daemons to ``1``, and a single active MDS cannot easily
handle the load of all clients even for a short time. Therefore, to upgrade
MDS daemons without reducing ``max_mds``, set the ``fail_fs`` option to
``true`` (the default value is ``false``) prior to initiating the upgrade:

.. prompt:: bash #

   ceph config set mgr mgr/orchestrator/fail_fs true

This will:

#. Fail the CephFS filesystems, bringing the active MDS daemon(s) to the
   ``up:standby`` state.

#. Upgrade the MDS daemons safely.

#. Bring the CephFS filesystems back up, returning the active MDS daemon(s)
   from ``up:standby`` to ``up:active``.

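To confirm that the option took effect, you can read the value back:

.. prompt:: bash #

   ceph config get mgr mgr/orchestrator/fail_fs
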
Before you use cephadm to upgrade Ceph, verify that all hosts are currently
online and that your cluster is healthy by running the following command:

.. prompt:: bash #

   ceph -s

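A healthy cluster reports ``HEALTH_OK``. An abbreviated, illustrative
excerpt (your cluster ID, daemon counts, and versions will differ):

.. code-block:: console

   # ceph -s
     cluster:
       id:     <fsid>
       health: HEALTH_OK
   [...]
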
To upgrade (or downgrade) to a specific release, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version <version>

For example, to upgrade to v16.2.6, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version 16.2.6

.. note::

   As of v16.2.6 the Docker Hub registry is no longer used, so if you use
   Docker you have to point it to the image in the quay.io registry:

   .. prompt:: bash #

      ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6

Monitoring the upgrade
======================

Determine (1) whether an upgrade is in progress and (2) which version the
cluster is upgrading to by running the following command:

.. prompt:: bash #

   ceph orch upgrade status

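This prints JSON-formatted output similar to the following (the field
names and values shown here are illustrative and vary by release):

.. code-block:: console

   # ceph orch upgrade status
   {
       "target_image": "quay.io/ceph/ceph:v16.2.6",
       "in_progress": true,
       "services_complete": ["mgr", "mon"],
       "message": ""
   }
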
Watching the progress bar during a Ceph upgrade
-----------------------------------------------

During the upgrade, a progress bar is visible in the ceph status output. It
looks like this:

.. code-block:: console

   # ceph -s

   [...]
     progress:
       Upgrade to docker.io/ceph/ceph:v15.2.1 (00h 20m 12s)
         [=======.....................] (time remaining: 01h 43m 31s)

Watching the cephadm log during an upgrade
------------------------------------------

Watch the cephadm log by running the following command:

.. prompt:: bash #

   ceph -W cephadm

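During an upgrade, the log shows a progress message as each daemon is
updated. The lines below are abbreviated and illustrative; the exact
wording varies by release:

.. code-block:: console

   [INF] Upgrade: Target is quay.io/ceph/ceph:v16.2.6
   [INF] Upgrade: Updating mgr.host1.abcxyz
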
Canceling an upgrade
====================

You can stop the upgrade process at any time by running the following command:

.. prompt:: bash #

   ceph orch upgrade stop

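Stopping an upgrade does not roll anything back: daemons that have already
been upgraded remain on the new version. To confirm that no upgrade is in
progress, check the status again:

.. prompt:: bash #

   ceph orch upgrade status
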
Post upgrade actions
====================

If the new version is managed by ``cephadm``, then once the upgrade is
complete you must update the ``cephadm`` package (or the ``ceph-common``
package, if you do not use ``cephadm shell``) to a version compatible with
the new Ceph version.

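For example, on a Debian- or Ubuntu-based admin host (an illustrative
sketch; use your distribution's package manager and the repositories for
the new release):

.. prompt:: bash #

   apt update
   apt install --only-upgrade cephadm
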
Potential problems
==================

There are a few health alerts that can arise during the upgrade process.

UPGRADE_NO_STANDBY_MGR
----------------------

This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an
active standby manager daemon. In order to proceed with the upgrade, Ceph
requires an active standby manager daemon (which you can think of in this
context as "a second manager").

You can ensure that Cephadm is configured to run 2 (or more) managers by
running the following command:

.. prompt:: bash #

   ceph orch apply mgr 2  # or more

You can check the status of existing mgr daemons by running the following
command:

.. prompt:: bash #

   ceph orch ps --daemon-type mgr

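The output lists one row per mgr daemon. An abbreviated, illustrative
example (the exact columns vary by release):

.. code-block:: console

   NAME              HOST   STATUS   VERSION
   mgr.host1.abcxyz  host1  running  16.2.5
   mgr.host2.defuvw  host2  running  16.2.5
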
If an existing mgr daemon has stopped, you can try to restart it by running
the following command:

.. prompt:: bash #

   ceph orch daemon restart <name>

UPGRADE_FAILED_PULL
-------------------

This alert (``UPGRADE_FAILED_PULL``) means that Ceph was unable to pull the
container image for the target version. This can happen if you specify a
version or container image that does not exist (e.g. "1.2.3"), or if the
container registry cannot be reached by one or more hosts in the cluster.

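One way to check registry reachability is to pull the image manually on an
affected host (this assumes ``podman`` as the container engine; substitute
``docker`` if that is what your hosts use):

.. prompt:: bash #

   podman pull quay.io/ceph/ceph:v16.2.6
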
To cancel the existing upgrade and to specify a different target version, run
the following commands:

.. prompt:: bash #

   ceph orch upgrade stop
   ceph orch upgrade start --ceph-version <version>

Using customized container images
=================================

For most users, upgrading requires nothing more complicated than specifying the
Ceph version number to upgrade to. In such cases, cephadm locates the specific
Ceph container image to use by combining the ``container_image_base``
configuration option (default: ``docker.io/ceph/ceph``) with a tag of
``vX.Y.Z``.

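A minimal sketch of inspecting and overriding that option, assuming it is
exposed through the cephadm mgr module as ``mgr/cephadm/container_image_base``:

.. prompt:: bash #

   ceph config get mgr mgr/cephadm/container_image_base
   ceph config set mgr mgr/cephadm/container_image_base quay.io/ceph/ceph
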
But it is possible to upgrade to an arbitrary container image, if that's what
you need. For example, the following command upgrades to a development build:

.. prompt:: bash #

   ceph orch upgrade start --image quay.io/ceph-ci/ceph:recent-git-branch-name

For more information about available container images, see :ref:`containers`.

Staggered Upgrade
=================

Some users may prefer to upgrade components in phases rather than all at once.
Starting in 16.2.11 and 17.2.1, the upgrade command accepts parameters that
limit which daemons are upgraded by a single upgrade command. The options
include ``daemon_types``, ``services``, ``hosts`` and ``limit``. ``daemon_types``
takes a comma-separated list of daemon types and will only upgrade daemons of those
types. ``services`` is mutually exclusive with ``daemon_types``, only takes services
of one type at a time (e.g. it cannot take an OSD and an RGW service at the same
time), and will only upgrade daemons belonging to those services. ``hosts`` can be
combined with ``daemon_types`` or ``services``, or provided on its own. The
``hosts`` parameter follows the same format as the command line options for
:ref:`orchestrator-cli-placement-spec`. ``limit`` takes an integer > 0 and caps
the number of daemons cephadm will upgrade. ``limit`` can be combined with any of
the other parameters. For example, if you specify to upgrade daemons of type osd
on host Host1 with ``limit`` set to 3, cephadm will upgrade (up to) 3 osd daemons
on Host1.

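In releases that provide it, ``ceph orch upgrade check`` can be used to
preview which daemons would need to be upgraded to a given image before
starting a staggered upgrade:

.. prompt:: bash #

   ceph orch upgrade check --image <image-name>
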
Example: specifying daemon types and hosts:

.. prompt:: bash #

   ceph orch upgrade start --image <image-name> --daemon-types mgr,mon --hosts host1,host2

Example: specifying services and using limit:

.. prompt:: bash #

   ceph orch upgrade start --image <image-name> --services rgw.example1,rgw.example2 --limit 2

.. note::

   Cephadm strictly enforces an order to the upgrade of daemons that is still present
   in staggered upgrade scenarios. The current upgrade ordering is
   ``mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs``.
   If you specify parameters that would upgrade daemons out of order, the upgrade
   command will block and note which daemons will be missed if you proceed.

.. note::

   Upgrade commands with limiting parameters will validate the options before beginning the
   upgrade, which may require pulling the new container image. Do not be surprised
   if the upgrade start command takes a while to return when limiting parameters are provided.

.. note::

   In staggered upgrade scenarios (when a limiting parameter is provided), monitoring
   stack daemons including Prometheus and node-exporter are refreshed after the Manager
   daemons have been upgraded. Do not be surprised if Manager upgrades thus take longer
   than expected. Note that the versions of monitoring stack daemons may not change between
   Ceph releases, in which case they are only redeployed.

Upgrading to a version that supports staggered upgrade from one that doesn't
----------------------------------------------------------------------------

When upgrading from a version that already supports staggered upgrades, the
process simply requires providing the necessary arguments. However, if you wish
to upgrade to a version that supports staggered upgrade from one that does not,
there is a workaround. It requires first manually upgrading the Manager daemons
and then passing the limiting parameters as usual.

.. warning::

   Make sure you have multiple running mgr daemons before attempting this
   procedure.

To start, determine which Manager is active and which are standby. This can be
done in a variety of ways, such as looking at the ``ceph -s`` output. Then,
manually upgrade each standby mgr daemon with:

.. prompt:: bash #

   ceph orch daemon redeploy mgr.example1.abcdef --image <new-image-name>

.. note::

   If you are on a very early version of cephadm (early Octopus), the ``orch daemon
   redeploy`` command may not have the ``--image`` flag. In that case, you must
   manually set the Manager container image (``ceph config set mgr container_image
   <new-image-name>``) and then redeploy the Manager
   (``ceph orch daemon redeploy mgr.example1.abcdef``).

At this point, a Manager failover should make the active Manager one that is
running the new version:

.. prompt:: bash #

   ceph mgr fail

Verify that the active Manager is now running the new version.

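One way to check is with ``ceph versions``, whose output maps each running
version to daemon counts (illustrative; your output will differ):

.. prompt:: bash #

   ceph versions

To complete the upgrade of the Manager daemons:
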
.. prompt:: bash #

   ceph orch upgrade start --image <new-image-name> --daemon-types mgr

You should now have all your Manager daemons on the new version and be able to
specify the limiting parameters for the rest of the upgrade.