==================
Cephadm Operations
==================

Watching cephadm log messages
=============================

Cephadm logs to the ``cephadm`` cluster log channel, meaning you can
monitor progress in realtime with::

  # ceph -W cephadm

By default it will show info-level events and above. To see
debug-level messages too::

  # ceph config set mgr mgr/cephadm/log_to_cluster_level debug
  # ceph -W cephadm --watch-debug

Be careful: the debug messages are very verbose!

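When you are finished debugging, you can return the cluster log channel
to info-level messages with::

  # ceph config set mgr mgr/cephadm/log_to_cluster_level info
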
You can see recent events with::

  # ceph log last cephadm

These events are also logged to the ``ceph.cephadm.log`` file on
monitor hosts and to the monitor daemons' stderr.


.. _cephadm-logs:

Ceph daemon logs
================

Logging to stdout
-----------------

Traditionally, Ceph daemons have logged to ``/var/log/ceph``. By
default, cephadm daemons log to stderr and the logs are
captured by the container runtime environment. For most systems, by
default, these logs are sent to journald and accessible via
``journalctl``.

For example, to view the logs for the daemon ``mon.foo`` for a cluster
with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
something like::

  journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo

This works well for normal operations when logging levels are low.
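
The usual ``journalctl`` options also apply; for example, to follow the
same daemon's log in real time, or to restrict output to a recent
window::

  journalctl -f -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo
  journalctl --since "1 hour ago" -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo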

To disable logging to stderr::

  ceph config set global log_to_stderr false
  ceph config set global mon_cluster_log_to_stderr false

Logging to files
----------------

You can also configure Ceph daemons to log to files instead of stderr,
just like they have in the past. When logging to files, Ceph logs appear
in ``/var/log/ceph/<cluster-fsid>``.

To enable logging to files::

  ceph config set global log_to_file true
  ceph config set global mon_cluster_log_to_file true

We recommend disabling logging to stderr (see above) or else everything
will be logged twice::

  ceph config set global log_to_stderr false
  ceph config set global mon_cluster_log_to_stderr false

By default, cephadm sets up log rotation on each host to rotate these
files. You can configure the logging retention schedule by modifying
``/etc/logrotate.d/ceph.<cluster-fsid>``.
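
The generated file is an ordinary logrotate policy. An edited version
might look something like this (the rotation values below are
illustrative, not necessarily cephadm's defaults)::

  /var/log/ceph/<cluster-fsid>/*.log {
      rotate 7
      daily
      compress
      missingok
  }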


Data location
=============

Cephadm stores daemon data and logs in slightly different locations than
older versions of ceph:

* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. Note
  that by default cephadm logs via stderr and the container runtime,
  so these logs are normally not present.
* ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
  (besides logs).
* ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
  an individual daemon.
* ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
  the cluster.
* ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
  data directories for stateful daemons (e.g., monitor, prometheus)
  that have been removed by cephadm.

Disk usage
----------

Because a few Ceph daemons may store a significant amount of data in
``/var/lib/ceph`` (notably, the monitors and prometheus), we recommend
moving this directory to its own disk, partition, or logical volume so
that it does not fill up the root file system.
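
For example, to place ``/var/lib/ceph`` on a dedicated logical volume
using standard LVM tools (the volume group name ``vg0`` and the size
are hypothetical)::

  lvcreate -n ceph -L 100G vg0
  mkfs.xfs /dev/vg0/ceph
  mount /dev/vg0/ceph /var/lib/ceph
  echo '/dev/vg0/ceph /var/lib/ceph xfs defaults 0 0' >> /etc/fstab

If the directory already contains data, stop the cluster's daemons on
the host and copy the existing contents onto the new volume before
mounting it in place.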


SSH Configuration
=================

Cephadm uses SSH to connect to remote hosts. SSH uses a key to authenticate
with those hosts in a secure way.


Default behavior
----------------

Cephadm stores an SSH key in the monitor that is used to
connect to remote hosts. When the cluster is bootstrapped, this SSH
key is generated automatically and no additional configuration
is necessary.

A *new* SSH key can be generated with::

  ceph cephadm generate-key

The public portion of the SSH key can be retrieved with::

  ceph cephadm get-pub-key

The currently stored SSH key can be deleted with::

  ceph cephadm clear-key

You can make use of an existing key by directly importing it with::

  ceph config-key set mgr/cephadm/ssh_identity_key -i <key>
  ceph config-key set mgr/cephadm/ssh_identity_pub -i <pub>

You will then need to restart the mgr daemon to reload the configuration with::

  ceph mgr fail

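Putting the steps above together, replacing the cluster key with a
pre-existing keypair might look like this (the file name
``/root/cephadm_key`` is hypothetical)::

  ceph config-key set mgr/cephadm/ssh_identity_key -i /root/cephadm_key
  ceph config-key set mgr/cephadm/ssh_identity_pub -i /root/cephadm_key.pub
  ceph mgr fail

The public key must also be present in the ``authorized_keys`` file of
the SSH user on every managed host (e.g., via ``ssh-copy-id``).
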
Configuring a different SSH user
--------------------------------

Cephadm must be able to log into all the Ceph cluster nodes as a user
that has enough privileges to download container images, start containers,
and execute commands without prompting for a password. If you do not want
to use the "root" user (the default option in cephadm), you must provide
cephadm the name of the user that is going to be used to perform all
cephadm operations. Use the command::

  ceph cephadm set-user <user>

Prior to running this, the cluster SSH key needs to be added to this user's
``authorized_keys`` file, and non-root users must have passwordless sudo access.
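
A sketch of preparing such a user (the user name ``cephadm-user`` is
hypothetical)::

  # on each host: create the user and grant passwordless sudo
  useradd -m cephadm-user
  echo 'cephadm-user ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/cephadm-user

  # from a node with the admin keyring: install the key, then switch users
  ceph cephadm get-pub-key > ceph.pub
  ssh-copy-id -f -i ceph.pub cephadm-user@<host>
  ceph cephadm set-user cephadm-user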


Customizing the SSH configuration
---------------------------------

Cephadm generates an appropriate ``ssh_config`` file that is
used for connecting to remote hosts. This configuration looks
something like this::

  Host *
  User root
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null

There are two ways to customize this configuration for your environment:

#. Import a customized configuration file that will be stored
   by the monitor with::

     ceph cephadm set-ssh-config -i <ssh_config_file>

   To remove a customized SSH config and revert to the default behavior::

     ceph cephadm clear-ssh-config

#. You can configure a file location for the SSH configuration file with::

     ceph config set mgr mgr/cephadm/ssh_config_file <path>

   We do *not recommend* this approach. The path name must be
   visible to *any* mgr daemon, and cephadm runs all daemons as
   containers. That means that the file either needs to be placed
   inside a customized container image for your deployment, or
   manually distributed to the mgr data directory
   (``/var/lib/ceph/<cluster-fsid>/mgr.<id>`` on the host, visible at
   ``/var/lib/ceph/mgr/ceph-<id>`` from inside the container).
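
As an example of the first approach, a customized configuration that
keeps strict host key checking enabled might look like this (an
illustrative sketch; the known-hosts file must then be maintained on
the mgr hosts)::

  Host *
  User root
  StrictHostKeyChecking yes
  UserKnownHostsFile /etc/ssh/ssh_known_hosts

Import it with ``ceph cephadm set-ssh-config -i <file>`` as described
above.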


Health checks
=============

CEPHADM_PAUSED
--------------

Cephadm background work has been paused with ``ceph orch pause``. Cephadm
continues to perform passive monitoring activities (like checking
host and daemon status), but it will not make any changes (like deploying
or removing daemons).

Resume cephadm work with::

  ceph orch resume

.. _cephadm-stray-host:

CEPHADM_STRAY_HOST
------------------

One or more hosts have running Ceph daemons but are not registered as
hosts managed by *cephadm*. This means that those services cannot
currently be managed by cephadm (e.g., restarted, upgraded, or included
in `ceph orch ps`).

You can manage the host(s) with::

  ceph orch host add *<hostname>*

Note that you may need to configure SSH access to the remote host
before this will work.
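
For example, with the default root user, you can install the cluster's
public SSH key and then register the host (a sketch)::

  ceph cephadm get-pub-key > ceph.pub
  ssh-copy-id -f -i ceph.pub root@<hostname>
  ceph orch host add <hostname>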

Alternatively, you can manually connect to the host and ensure that
services on that host are removed or migrated to a host that is
managed by *cephadm*.

You can also disable this warning entirely with::

  ceph config set mgr mgr/cephadm/warn_on_stray_hosts false

See :ref:`cephadm-fqdn` for more information about host names and
domain names.

CEPHADM_STRAY_DAEMON
--------------------

One or more Ceph daemons are running but are not managed by
*cephadm*. This may be because they were deployed using a different
tool, or because they were started manually. Those
services cannot currently be managed by cephadm (e.g., restarted,
upgraded, or included in `ceph orch ps`).

If the daemon is a stateful one (monitor or OSD), it should be adopted
by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
usually easiest to provision a new daemon with the ``ceph orch apply``
command and then stop the unmanaged daemon.
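
For example, to supersede a stray standalone MDS with a managed one (a
sketch; the file system name ``myfs`` and the legacy unit name are
hypothetical)::

  ceph orch apply mds myfs --placement=2
  # then, on the host running the unmanaged daemon:
  systemctl stop ceph-mds@<id>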

This warning can be disabled entirely with::

  ceph config set mgr mgr/cephadm/warn_on_stray_daemons false

CEPHADM_HOST_CHECK_FAILED
-------------------------

One or more hosts have failed the basic cephadm host check, which verifies
that (1) the host is reachable and cephadm can be executed there, and (2)
that the host satisfies basic prerequisites, like a working container
runtime (podman or docker) and working time synchronization.
If this test fails, cephadm will not be able to manage services on that host.

You can manually run this check with::

  ceph cephadm check-host *<hostname>*

You can remove a broken host from management with::

  ceph orch host rm *<hostname>*

You can disable this health warning with::

  ceph config set mgr mgr/cephadm/warn_on_failed_host_check false

/etc/ceph/ceph.conf
===================

Cephadm uses a minimized ``ceph.conf`` that contains only the
information needed to connect to the Ceph cluster.

To update the configuration settings, use::

  ceph config set ...


To set up an initial configuration before calling
`bootstrap`, create an initial ``ceph.conf`` file. For example::

  cat <<EOF > /etc/ceph/ceph.conf
  [global]
  osd crush chooseleaf type = 0
  EOF
  cephadm bootstrap -c /etc/ceph/ceph.conf ...