==============
Upgrading Ceph
==============

.. DANGER:: DATE: 01 NOV 2021.

   DO NOT UPGRADE TO CEPH PACIFIC FROM AN OLDER VERSION.

   A recently-discovered bug (https://tracker.ceph.com/issues/53062) can cause
   data corruption. This bug occurs during OMAP format conversion for
   clusters that are updated to Pacific. New clusters are not affected by this
   bug.

   The trigger for this bug is BlueStore's repair/quick-fix functionality. This
   bug can be triggered in two known ways:

   (1) manually via the ceph-bluestore-tool, or
   (2) automatically by the OSD if ``bluestore_fsck_quick_fix_on_mount`` is set
       to true.

   The fix for this bug is expected to be available in Ceph v16.2.7.

   DO NOT set ``bluestore_fsck_quick_fix_on_mount`` to true. If it is currently
   set to true in your configuration, immediately set it to false.

   DO NOT run ``ceph-bluestore-tool``'s repair/quick-fix commands.

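A quick way to confirm the dangerous option is off is to read it back with
``ceph config get``. The sketch below is hedged: the ``current`` variable
stands in for that command's output on a live cluster, since the check cannot
run without one.

```shell
# Hedged sketch: "$current" stands in for the output of
#   ceph config get osd bluestore_fsck_quick_fix_on_mount
# run against a live cluster; "false" is the value you want to see.
current="false"
if [ "$current" != "false" ]; then
    # the remediation named in the warning above
    echo "run: ceph config set osd bluestore_fsck_quick_fix_on_mount false"
else
    echo "quick-fix-on-mount is disabled"
fi
```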
Cephadm can safely upgrade Ceph from one bugfix release to the next. For
example, you can upgrade from v15.2.0 (the first Octopus release) to the next
point release, v15.2.1.

The automated upgrade process follows Ceph best practices. For example:

* The upgrade order starts with managers, monitors, then other daemons.
* Each daemon is restarted only after Ceph indicates that the cluster
  will remain available.

.. note::

   The Ceph cluster health status is likely to switch to
   ``HEALTH_WARNING`` during the upgrade.

.. note::

   If a host in the cluster is offline, the upgrade is paused.

Starting the upgrade
====================

Before you use cephadm to upgrade Ceph, verify that all hosts are currently
online and that your cluster is healthy by running the following command:

.. prompt:: bash #

   ceph -s

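This health check can also be scripted. The sketch below is hedged: the JSON
fragment is an illustrative stand-in for real ``ceph -s --format json``
output (the overall status lives in the ``health.status`` field), and the
extraction uses plain ``sed`` rather than a full JSON parser.

```shell
# Hedged sketch: $sample stands in for the output of `ceph -s --format json`
# on a live cluster; only the "health":{"status":...} field is consulted.
sample='{"health":{"status":"HEALTH_OK"},"fsid":"00000000-0000-0000-0000-000000000000"}'
status=$(printf '%s' "$sample" | sed -n 's/.*"status":"\([A-Z_]*\)".*/\1/p')
if [ "$status" = "HEALTH_OK" ]; then
    echo "cluster healthy; safe to start the upgrade"
else
    echo "cluster status is $status; investigate before upgrading"
fi
```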
To upgrade (or downgrade) to a specific release, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version <version>

For example, to upgrade to v16.2.6, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version 16.2.6

.. note::

   Starting with v16.2.6, the Docker Hub registry is no longer used. If you
   use Docker, you must point it at the image in the quay.io registry:

   .. prompt:: bash #

      ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6


Monitoring the upgrade
======================

Determine (1) whether an upgrade is in progress and (2) which version the
cluster is upgrading to by running the following command:

.. prompt:: bash #

   ceph orch upgrade status

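The status output can also be consumed by scripts. Below is a hedged sketch:
the JSON is an illustrative stand-in for ``ceph orch upgrade status`` output
(the ``in_progress`` and ``target_image`` fields appear in real output, but
the values here are made up), extracted with plain ``sed``.

```shell
# Hedged sketch: $sample stands in for `ceph orch upgrade status` output
# from a live cluster; the field values below are illustrative.
sample='{"target_image":"quay.io/ceph/ceph:v16.2.6","in_progress":true,"services_complete":[],"message":""}'
in_progress=$(printf '%s' "$sample" | sed -n 's/.*"in_progress":\([a-z]*\).*/\1/p')
target=$(printf '%s' "$sample" | sed -n 's/.*"target_image":"\([^"]*\)".*/\1/p')
echo "upgrade in progress: $in_progress (target: $target)"
```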
Watching the progress bar during a Ceph upgrade
-----------------------------------------------

During the upgrade, a progress bar is visible in the ceph status output. It
looks like this:

.. code-block:: console

   # ceph -s

   [...]
     progress:
       Upgrade to docker.io/ceph/ceph:v15.2.1 (00h 20m 12s)
         [=======.....................] (time remaining: 01h 43m 31s)

Watching the cephadm log during an upgrade
------------------------------------------

Watch the cephadm log by running the following command:

.. prompt:: bash #

   ceph -W cephadm


Canceling an upgrade
====================

You can stop the upgrade process at any time by running the following command:

.. prompt:: bash #

   ceph orch upgrade stop

Post upgrade actions
====================

If the new version is based on ``cephadm``, then once the upgrade is complete
update the ``cephadm`` package (or the ``ceph-common`` package, if you do not
use ``cephadm shell``) to a version compatible with the new Ceph release.
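One way to confirm the packages are in sync is to compare the release the
cluster is running with the release of the locally installed package. In this
hedged sketch the two variables stand in for the version numbers reported by
``ceph version`` (cluster) and ``cephadm version`` (local package); the values
are illustrative.

```shell
# Hedged sketch: the variables stand in for the release numbers reported
# by `ceph version` (cluster) and `cephadm version` (local package).
cluster_ver="16.2.6"   # illustrative
local_ver="16.2.6"     # illustrative
if [ "$local_ver" = "$cluster_ver" ]; then
    echo "local cephadm package matches the cluster release"
else
    echo "update the cephadm (or ceph-common) package to $cluster_ver"
fi
```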

Potential problems
==================

There are a few health alerts that can arise during the upgrade process.

UPGRADE_NO_STANDBY_MGR
----------------------

This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an
active standby manager daemon. In order to proceed with the upgrade, Ceph
requires an active standby manager daemon (which you can think of in this
context as "a second manager").

You can ensure that Cephadm is configured to run 2 (or more) managers by
running the following command:

.. prompt:: bash #

   ceph orch apply mgr 2  # or more

You can check the status of existing mgr daemons by running the following
command:

.. prompt:: bash #

   ceph orch ps --daemon-type mgr

If an existing mgr daemon has stopped, you can try to restart it by running the
following command:

.. prompt:: bash #

   ceph orch daemon restart <name>

UPGRADE_FAILED_PULL
-------------------

This alert (``UPGRADE_FAILED_PULL``) means that Ceph was unable to pull the
container image for the target version. This can happen if you specify a
version or container image that does not exist (e.g. "1.2.3"), or if the
container registry cannot be reached by one or more hosts in the cluster.

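When diagnosing a failed pull, it can help to split the image reference into
registry and tag and inspect each part (for example, a bare "1.2.3" lacks the
``v`` prefix that published Ceph release tags carry). This sketch uses only
shell parameter expansion; the image name is illustrative.

```shell
# Hedged sketch: split a container image reference into its parts so a
# typo (wrong registry, missing "v" prefix on the tag) is easy to spot.
image="quay.io/ceph/ceph:v16.2.6"   # illustrative
registry=${image%%/*}               # text before the first "/"
tag=${image##*:}                    # text after the last ":"
echo "registry=$registry tag=$tag"
case $tag in
    v*) echo "tag has the expected v prefix" ;;
    *)  echo "warning: Ceph release tags normally start with 'v'" ;;
esac
```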
To cancel the existing upgrade and to specify a different target version, run
the following commands:

.. prompt:: bash #

   ceph orch upgrade stop
   ceph orch upgrade start --ceph-version <version>


Using customized container images
=================================

For most users, upgrading requires nothing more complicated than specifying the
Ceph version number to upgrade to. In such cases, cephadm locates the specific
Ceph container image to use by combining the ``container_image_base``
configuration option (default: ``docker.io/ceph/ceph``) with a tag of
``vX.Y.Z``.

But it is possible to upgrade to an arbitrary container image, if that's what
you need. For example, the following command upgrades to a development build:

.. prompt:: bash #

   ceph orch upgrade start --image quay.io/ceph-ci/ceph:recent-git-branch-name

For more information about available container images, see :ref:`containers`.

Staggered Upgrade
=================

Some users may prefer to upgrade components in phases rather than all at once.
Starting in 16.2.10 and 17.2.1, the upgrade command accepts parameters that
limit which daemons are upgraded by a single upgrade command:

* ``daemon_types`` takes a comma-separated list of daemon types and upgrades
  only daemons of those types.
* ``services`` is mutually exclusive with ``daemon_types``, takes services of
  only one type at a time (e.g. you cannot provide an OSD and an RGW service
  at the same time), and upgrades only daemons belonging to those services.
* ``hosts`` can be combined with ``daemon_types`` or ``services``, or provided
  on its own. The ``hosts`` parameter follows the same format as the command
  line options for :ref:`orchestrator-cli-placement-spec`.
* ``limit`` takes an integer greater than 0 and caps the number of daemons
  cephadm will upgrade. ``limit`` can be combined with any of the other
  parameters.

For example, if you specify upgrading daemons of type osd on host Host1 with
``limit`` set to 3, cephadm will upgrade (up to) 3 osd daemons on Host1.

Example: specifying daemon types and hosts:

.. prompt:: bash #

   ceph orch upgrade start --image <image-name> --daemon-types mgr,mon --hosts host1,host2

Example: specifying services and using limit:

.. prompt:: bash #

   ceph orch upgrade start --image <image-name> --services rgw.example1,rgw.example2 --limit 2

.. note::

   Cephadm strictly enforces an order to the upgrade of daemons that is still
   present in staggered upgrade scenarios. The current upgrade ordering is
   ``mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs``.
   If you specify parameters that would upgrade daemons out of order, the
   upgrade command will block and note which daemons will be missed if you
   proceed.

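The ordering constraint can be checked before issuing a command. The sketch
below encodes the documented upgrade ordering and verifies that a candidate
``--daemon-types`` list appears in increasing order with respect to it; the
variable names and the sample list are ours, not cephadm's.

```shell
# Hedged sketch: check a candidate --daemon-types list against the
# documented upgrade ordering (the "requested" value is illustrative).
order="mgr mon crash osd mds rgw rbd-mirror cephfs-mirror iscsi nfs"
requested="mgr mon osd"
ok=true
prev=-1
for d in $requested; do
    # find the position of $d within the documented ordering
    pos=-1; i=0
    for o in $order; do
        [ "$o" = "$d" ] && pos=$i
        i=$((i + 1))
    done
    # positions must strictly increase, or the list is out of order
    if [ "$pos" -le "$prev" ]; then ok=false; fi
    prev=$pos
done
echo "respects upgrade order: $ok"
```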
.. note::

   Upgrade commands with limiting parameters will validate the options before
   beginning the upgrade, which may require pulling the new container image.
   Do not be surprised if the upgrade start command takes a while to return
   when limiting parameters are provided.

.. note::

   In staggered upgrade scenarios (when a limiting parameter is provided),
   monitoring stack daemons such as Prometheus and node-exporter are refreshed
   after the Manager daemons have been upgraded. Do not be surprised if
   Manager upgrades therefore take longer than expected. Note that the
   versions of monitoring stack daemons may not change between Ceph releases,
   in which case they are only redeployed.

Upgrading to a version that supports staggered upgrade from one that doesn't
----------------------------------------------------------------------------

When you upgrade from a version that already supports staggered upgrades, the
process simply requires providing the necessary arguments. However, if you wish
to upgrade to a version that supports staggered upgrade from one that does not,
there is a workaround: first manually upgrade the Manager daemons, then pass
the limiting parameters as usual.

.. warning::
   Make sure you have multiple running mgr daemons before attempting this
   procedure.

To start, determine which Manager is active and which are standby. This can be
done in a variety of ways, such as looking at the ``ceph -s`` output. Then
manually upgrade each standby mgr daemon with:

.. prompt:: bash #

   ceph orch daemon redeploy mgr.example1.abcdef --image <new-image-name>

.. note::

   If you are on a very early version of cephadm (early Octopus), the ``orch
   daemon redeploy`` command may not have the ``--image`` flag. In that case,
   you must manually set the Manager container image (``ceph config set mgr
   container_image <new-image-name>``) and then redeploy the Manager (``ceph
   orch daemon redeploy mgr.example1.abcdef``).

277 | ||
278 | At this point, a Manager fail over should allow us to have the active Manager be one | |
279 | running the new version. | |
280 | ||
281 | .. prompt:: bash # | |
282 | ||
283 | ceph mgr fail | |
284 | ||
285 | Verify the active Manager is now one running the new version. To complete the Manager | |
286 | upgrading: | |
287 | ||
288 | .. prompt:: bash # | |
289 | ||
290 | ceph orch upgrade start --image <new-image-name> --daemon-types mgr | |
291 | ||
292 | You should now have all your Manager daemons on the new version and be able to | |
293 | specify the limiting parameters for the rest of the upgrade. |