============================
 Deploying Metadata Servers
============================

Each CephFS file system requires at least one MDS. The cluster operator will
generally use their automated deployment tool to launch required MDS servers as
needed. Rook and ansible (via the ceph-ansible playbooks) are recommended
tools for doing this. For clarity, we also show the systemd commands here,
which may be run by the deployment technology if executed on bare metal.

See `MDS Config Reference`_ for details on configuring metadata servers.


Provisioning Hardware for an MDS
================================

The present version of the MDS is single-threaded and CPU-bound for most
activities, including responding to client requests. Even under the most
aggressive client loads, an MDS still uses only about 2 to 3 CPU cores; this
is due to the other miscellaneous upkeep threads working in tandem.

Even so, it is recommended that an MDS server be well provisioned with an
advanced CPU with sufficient cores. Development is ongoing to make better use
of available CPU cores in the MDS; it is expected that future versions of Ceph
will improve MDS performance by taking advantage of more cores.

The other dimension to MDS performance is the available RAM for caching. The
MDS necessarily manages a distributed and cooperative metadata cache among all
clients and other active MDSs. Therefore it is essential to provide the MDS
with sufficient RAM to enable faster metadata access and mutation.

Generally, an MDS serving a large cluster of clients (1000 or more) will use at
least 64GB of cache (see also :doc:`/cephfs/cache-size-limits`). Running an MDS
with a larger cache is not well explored in the largest known community
clusters; there may be diminishing returns where management of such a large
cache negatively impacts performance in surprising ways. It would be best to do
analysis with expected workloads to determine if provisioning more RAM is
worthwhile.

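The MDS cache size is controlled by the ``mds_cache_memory_limit`` option. As a
rough sketch (the 64GB figure above expressed in bytes; pick a value based on
your own workload analysis), the limit can be raised for all MDS daemons with a
command along these lines::

        $ sudo ceph config set mds mds_cache_memory_limit 68719476736
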
In a bare-metal cluster, the best practice is to over-provision hardware for
the MDS server. Even if a single MDS daemon is unable to fully utilize the
hardware, it may be desirable later on to start more active MDS daemons on the
same node to fully utilize the available cores and memory. Additionally, it may
become clear with workloads on the cluster that performance improves with
multiple active MDS daemons on the same node rather than over-provisioning a
single MDS.

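The number of active MDS daemons is a per-file-system setting. As an
illustrative sketch (assuming a file system named ``cephfs``), a second active
rank could later be enabled with::

        $ sudo ceph fs set cephfs max_mds 2
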
Finally, be aware that CephFS is a highly-available file system: it supports
standby MDS daemons (see also :ref:`mds-standby`) for rapid failover. To get a
real benefit from deploying standbys, it is usually necessary to distribute MDS
daemons across at least two nodes in the cluster. Otherwise, a hardware failure
on a single node may result in the file system becoming unavailable.

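The expected number of standby daemons can also be recorded per file system, so
that the cluster raises a health warning if too few standbys are available. A
minimal example (again assuming a file system named ``cephfs``)::

        $ sudo ceph fs set cephfs standby_count_wanted 1
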
Co-locating the MDS with other Ceph daemons (hyperconverged) is an effective
and recommended way to accomplish this so long as all daemons are configured to
use available hardware within certain limits. For the MDS, this generally
means limiting its cache size.


Adding an MDS
=============

#. Create an mds data directory ``/var/lib/ceph/mds/ceph-${id}``. The daemon
   only uses this directory to store its keyring.

#. Edit ``ceph.conf`` and add an MDS section. ::

        [mds.${id}]
        host = {hostname}

#. Create the authentication key, if you use CephX. ::

        $ sudo ceph auth get-or-create mds.${id} mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-${id}/keyring

#. Start the service. ::

        $ sudo systemctl start ceph-mds@${id}

#. The status of the cluster should show: ::

        mds: ${id}:1 {0=${id}=up:active} 2 up:standby

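After the daemon has started and authenticated, it should appear in the MDS
map. A quick way to check the MDS state on its own, without the full cluster
status, is for example::

        $ sudo ceph mds stat
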
Removing an MDS
===============

If you have a metadata server in your cluster that you'd like to remove, you
may use the following method.

#. (Optional:) Create a new replacement Metadata Server. If there is no
   replacement MDS to take over once the MDS is removed, the file system will
   become unavailable to clients. If that is not desirable, consider adding a
   metadata server before tearing down the metadata server you would like to
   take offline.

#. Stop the MDS to be removed. ::

        $ sudo systemctl stop ceph-mds@${id}

   The MDS will automatically notify the Ceph monitors that it is going down.
   This enables the monitors to perform instantaneous failover to an available
   standby, if one exists. It is unnecessary to use administrative commands to
   effect this failover, e.g. through the use of ``ceph mds fail mds.${id}``.

#. Remove the ``/var/lib/ceph/mds/ceph-${id}`` directory on the MDS. ::

        $ sudo rm -rf /var/lib/ceph/mds/ceph-${id}

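If a standby was available, it should have taken over the vacated rank. The
state of the file system and its remaining daemons can be confirmed, for
example, with::

        $ sudo ceph fs status
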
.. _MDS Config Reference: ../mds-config-ref