.. _mds-standby:

Terminology
-----------

A Ceph cluster may have zero or more CephFS *filesystems*.  CephFS
filesystems have a human readable name (set in ``fs new``)
and an integer ID.  The ID is called the filesystem cluster ID,
or *FSCID*.

Each CephFS filesystem has a number of *ranks*, one by default,
which start at zero.  A rank may be thought of as a metadata shard.
Controlling the number of ranks in a filesystem is described
in :doc:`/cephfs/multimds`.

Each CephFS ceph-mds process (a *daemon*) initially starts up
without a rank.  It may be assigned one by the monitor cluster.
A daemon may only hold one rank at a time.  Daemons only give up
a rank when the ceph-mds process stops.

If a rank is not associated with a daemon, the rank is
considered *failed*.  Once a rank is assigned to a daemon,
the rank is considered *up*.

A daemon has a *name* that is set statically by the administrator
when the daemon is first configured.  Typical configurations
use the hostname where the daemon runs as the daemon name.

Each time a daemon starts up, it is also assigned a *GID*, which
is unique to this particular process lifetime of the daemon.  The
GID is an integer.

Referring to MDS daemons
------------------------

Most of the administrative commands that refer to an MDS daemon
accept a flexible argument format that may contain a rank, a GID
or a name.

Where a rank is used, this may optionally be qualified with
a leading filesystem name or ID.  If a daemon is a standby (i.e.
it is not currently assigned a rank), then it may only be
referred to by GID or name.

For example, suppose an MDS daemon named ``myhost`` has GID 5446 and is
assigned rank 0 in the filesystem ``myfs``, which has FSCID 3.  Any of the
following forms of the ``fail`` command would then refer to that daemon:

::

    ceph mds fail 5446     # GID
    ceph mds fail myhost   # Daemon name
    ceph mds fail 0        # Unqualified rank
    ceph mds fail 3:0      # FSCID and rank
    ceph mds fail myfs:0   # Filesystem name and rank
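
The argument forms above can be modelled with a small lookup function. The
following Python sketch is a toy model only: the ``DAEMONS`` records, the
``resolve_mds_spec`` helper and the order in which the forms are tried are
assumptions made for illustration, not Ceph's actual resolution code.

```python
# Toy model only: these records and this lookup order are assumptions made
# for illustration, not Ceph's actual resolution code.
DAEMONS = [
    {"name": "myhost", "gid": 5446, "fs_name": "myfs", "fscid": 3, "rank": 0},
    # A standby holds no rank, so it can only be found by GID or name.
    {"name": "spare", "gid": 5501, "fs_name": None, "fscid": None, "rank": None},
]


def resolve_mds_spec(spec, daemons=DAEMONS):
    """Return the daemon record that `spec` refers to, or None."""
    # Qualified rank: "<filesystem name or FSCID>:<rank>".
    if ":" in spec:
        fs, _, rank = spec.partition(":")
        if not rank.isdigit():
            return None
        for d in daemons:
            if d["rank"] == int(rank) and fs in (d["fs_name"], str(d["fscid"])):
                return d
        return None
    if spec.isdigit():
        number = int(spec)
        # Unqualified rank: accepted only when exactly one daemon holds it.
        holders = [d for d in daemons if d["rank"] == number]
        if len(holders) == 1:
            return holders[0]
        # Otherwise try the number as a GID.
        for d in daemons:
            if d["gid"] == number:
                return d
        return None
    # Anything else is treated as a daemon name.
    for d in daemons:
        if d["name"] == spec:
            return d
    return None
```

With these sample records, ``"5446"``, ``"myhost"``, ``"0"``, ``"3:0"`` and
``"myfs:0"`` all resolve to the same daemon, while the standby ``spare`` can
only be reached through its name or its GID.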

Managing failover
-----------------

If an MDS daemon stops communicating with the monitor, the monitor will wait
``mds_beacon_grace`` seconds (default 15 seconds) before marking the daemon as
*laggy*. If a standby is available, the monitor will immediately replace the
laggy daemon.
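
The grace-period rule can be illustrated with a few lines of Python. This is
a simplified sketch, not the monitor's implementation: the ``laggy_daemons``
helper and its ``last_beacon`` timestamp map are hypothetical, and only the
15 second default comes from the text above.

```python
MDS_BEACON_GRACE = 15.0  # seconds; the default quoted above


def laggy_daemons(last_beacon, now, grace=MDS_BEACON_GRACE):
    """Return the names of daemons silent for more than `grace` seconds.

    `last_beacon` maps a daemon name to the time (in seconds) at which the
    monitor last heard from it. Both it and `now` are hypothetical inputs
    for this sketch.
    """
    return sorted(name for name, seen in last_beacon.items() if now - seen > grace)
```

For instance, calling ``laggy_daemons({"a": 100.0, "b": 84.0}, now=100.0)``
reports only ``b``, which has been silent for 16 seconds.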

Each file system may specify the number of standby daemons it needs in order
to be considered healthy. This number includes daemons in standby-replay
waiting for a rank to fail (remember that a standby-replay daemon will not be
assigned to take over a failure for another rank or a failure in another
CephFS file system). Standby daemons that are not in replay are shared, so
they count towards the total of any file system.
Each file system may set the number of standby daemons wanted using:

::

    ceph fs set <fs name> standby_count_wanted <count>

Setting ``count`` to 0 will disable the health check.
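
The counting rule described above can be written out as a short Python
sketch. It only restates the prose: the ``standby_shortfall`` helper and its
parameter names are hypothetical, and the real health check may take further
cluster state into account.

```python
def standby_shortfall(wanted, plain_standbys, standby_replay_for_fs):
    """Return how many standbys one file system is missing (0 means healthy).

    wanted                -- the file system's standby_count_wanted value
    plain_standbys        -- standbys not in replay (shared by all file systems)
    standby_replay_for_fs -- standby-replay daemons following this file system
    """
    if wanted == 0:
        return 0  # a count of 0 disables the check
    available = plain_standbys + standby_replay_for_fs
    return max(0, wanted - available)
```

Under this model, a file system that wants two standbys is satisfied by one
plain standby plus one of its own standby-replay daemons, but is one short if
the standby-replay daemon belongs to a different file system.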


.. _mds-standby-replay:

Configuring standby-replay
--------------------------

Each CephFS file system may be configured to add standby-replay daemons.  These
standby daemons follow the active MDS's metadata journal to reduce failover
time in the event the active MDS becomes unavailable. Each active MDS may have
only one standby-replay daemon following it.

Configuring standby-replay on a file system is done using:

::

    ceph fs set <fs name> allow_standby_replay <bool>

Once set, the monitors will assign available standby daemons to follow the
active MDSs in that file system.

Once an MDS has entered the standby-replay state, it will only be used as a
standby for the rank that it is following. If another rank fails, this
standby-replay daemon will not be used as a replacement, even if no other
standbys are available. For this reason, it is advised that if standby-replay
is used then every active MDS should have a standby-replay daemon.
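
The restriction on standby-replay daemons can be made concrete with a small
Python sketch for a single file system. The ``pick_replacement`` helper and
the record layout are hypothetical. Preferring the standby-replay daemon over
a plain standby is an assumption based on its shorter failover time, not a
statement about how the monitors actually choose.

```python
def pick_replacement(failed_rank, daemons):
    """Return the name of the daemon that takes over `failed_rank`, or None.

    Each entry of `daemons` is a dict with a "name", a "state" of either
    "standby" or "standby-replay", and, for standby-replay daemons, the
    rank it is "following". This layout is hypothetical.
    """
    # Assumed preference: the daemon already following this rank's journal.
    for d in daemons:
        if d["state"] == "standby-replay" and d.get("following") == failed_rank:
            return d["name"]
    # Otherwise fall back to a plain standby.
    for d in daemons:
        if d["state"] == "standby":
            return d["name"]
    # Standby-replay daemons following other ranks are never considered.
    return None
```

If rank 1 fails and the only spare daemon is in standby-replay for rank 0,
the function returns ``None``. This is why every active MDS should have its
own standby-replay daemon when the feature is enabled.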