.. _cephfs_add_remote_mds:

.. note::
   It is highly recommended to use :doc:`/cephadm/index` or another Ceph
   orchestrator for setting up the Ceph cluster. Use this approach only if you
   are setting up the Ceph cluster manually. If you still intend to deploy MDS
   daemons manually, :doc:`/cephadm/services/mds/` can also be used.

==========================
Deploying Metadata Servers
==========================

Each CephFS file system requires at least one MDS. The cluster operator will
generally use their automated deployment tool to launch required MDS servers
as needed. Rook and Ansible (via the ceph-ansible playbooks) are recommended
tools for doing this. For clarity, we also show the systemd commands here,
which may be run by the deployment technology if executed on bare metal.

See `MDS Config Reference`_ for details on configuring metadata servers.

Provisioning Hardware for an MDS
================================

The present version of the MDS is single-threaded and CPU-bound for most
activities, including responding to client requests. An MDS under the most
aggressive client loads uses about 2 to 3 CPU cores. This is due to the other
miscellaneous upkeep threads working in tandem.

Even so, it is recommended that an MDS server be well provisioned with an
advanced CPU with sufficient cores. Development is ongoing to make better use
of available CPU cores in the MDS; it is expected that in future versions of
Ceph the MDS server will improve performance by taking advantage of more
cores.

The other dimension to MDS performance is the available RAM for caching. The
MDS necessarily manages a distributed and cooperative metadata cache among all
clients and other active MDSs. Therefore it is essential to provide the MDS
with sufficient RAM to enable faster metadata access and mutation. The default
MDS cache size (see also :doc:`/cephfs/cache-configuration`) is 4GB. It is
recommended to provision at least 8GB of RAM for the MDS to support this cache
size.
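
The cache size is controlled by the ``mds_cache_memory_limit`` option. As an
illustration, it could be raised to 8GB like so (the value is in bytes and is
only an example, not a sizing recommendation): ::

    $ ceph config set mds mds_cache_memory_limit 8589934592
    $ ceph config get mds mds_cache_memory_limit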

Generally, an MDS serving a large cluster of clients (1000 or more) will use
at least 64GB of cache. An MDS with a larger cache is not well explored in the
largest known community clusters; there may be diminishing returns where
management of such a large cache negatively impacts performance in surprising
ways. It would be best to do analysis with expected workloads to determine if
provisioning more RAM is worthwhile.

In a bare-metal cluster, the best practice is to over-provision hardware for
the MDS server. Even if a single MDS daemon is unable to fully utilize the
hardware, it may be desirable later on to start more active MDS daemons on the
same node to fully utilize the available cores and memory. Additionally, it
may become clear with workloads on the cluster that performance improves with
multiple active MDS daemons on the same node rather than over-provisioning a
single MDS.

Finally, be aware that CephFS is a highly-available file system: it supports
standby MDS daemons (see also :ref:`mds-standby`) for rapid failover. To get a
real benefit from deploying standbys, it is usually necessary to distribute
MDS daemons across at least two nodes in the cluster. Otherwise, a hardware
failure on a single node may result in the file system becoming unavailable.
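
Whether standby daemons are actually available can be checked at any time from
the cluster status; for example: ::

    $ ceph fs status
    $ ceph status

Both commands report the active ranks and the number of ``up:standby`` daemons
in the MDS map.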

Co-locating the MDS with other Ceph daemons (hyperconverged) is an effective
and recommended way to accomplish this, so long as all daemons are configured
to use available hardware within certain limits. For the MDS, this generally
means limiting its cache size.

Adding an MDS
=============

#. Create an mds directory ``/var/lib/ceph/mds/ceph-${id}``. The daemon only
   uses this directory to store its keyring.

#. Create the authentication key, if you use CephX: ::

       $ sudo ceph auth get-or-create mds.${id} mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-${id}/keyring

#. Start the service: ::

       $ sudo systemctl start ceph-mds@${id}

#. The status of the cluster should show: ::

       mds: ${id}:1 {0=${id}=up:active} 2 up:standby

#. Optionally, configure the file system the MDS should join
   (:ref:`mds-join-fs`): ::

       $ ceph config set mds.${id} mds_join_fs ${fs}


Removing an MDS
===============

If you have a metadata server in your cluster that you'd like to remove, you
may use the following method.

#. (Optionally:) Create a new replacement Metadata Server. If there is no
   replacement MDS to take over once the MDS is removed, the file system will
   become unavailable to clients. If that is not desirable, consider adding a
   metadata server before tearing down the metadata server you would like to
   take offline.

#. Stop the MDS to be removed: ::

       $ sudo systemctl stop ceph-mds@${id}

   The MDS will automatically notify the Ceph monitors that it is going down.
   This enables the monitors to perform instantaneous failover to an available
   standby, if one exists. It is unnecessary to use administrative commands to
   effect this failover, e.g. through the use of ``ceph mds fail mds.${id}``.

#. Remove the ``/var/lib/ceph/mds/ceph-${id}`` directory on the MDS: ::

       $ sudo rm -rf /var/lib/ceph/mds/ceph-${id}
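
Afterwards, confirm that a standby (if one was available) has taken over the
active rank and that the removed daemon is gone from the MDS map: ::

    $ ceph fs status

If the daemon will not be redeployed under the same name, its CephX key can
also be removed: ::

    $ ceph auth rm mds.${id}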

.. _MDS Config Reference: ../mds-config-ref