Troubleshooting
===============

Sometimes you need to investigate why a cephadm command failed or why
a specific service no longer runs properly.

Because cephadm deploys daemons as containers, troubleshooting daemons is
slightly different. Here are a few tools and commands to help investigate
issues.

Pausing or disabling cephadm
----------------------------

If something goes wrong and cephadm is behaving in a way you do
not like, you can pause most background activity with::

  ceph orch pause

This will stop any changes, but cephadm will still periodically check hosts to
refresh its inventory of daemons and devices. You can disable cephadm
completely with::

  ceph orch set backend ''
  ceph mgr module disable cephadm

This will disable all of the ``ceph orch ...`` CLI commands, but the previously
deployed daemon containers will continue to exist and start as they
did before.

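To undo either step later, the inverse commands can be wrapped in small shell
helpers. The following is a minimal sketch and not part of cephadm: the
function names and the ``CEPH_CMD`` override (handy for a dry run with
``echo``) are assumptions made for illustration, while ``ceph orch resume``,
``ceph mgr module enable cephadm`` and ``ceph orch set backend cephadm`` are
the counterparts of the commands above.

```shell
# Hypothetical helpers; CEPH_CMD is an illustrative override, not a ceph setting.

# Resume the background activity stopped by "ceph orch pause".
resume_cephadm() {
  "${CEPH_CMD:-ceph}" orch resume
}

# Re-enable cephadm after it was disabled completely.
reenable_cephadm() {
  "${CEPH_CMD:-ceph}" mgr module enable cephadm &&
    "${CEPH_CMD:-ceph}" orch set backend cephadm
}
```

For a dry run that only prints the arguments that would be passed to ``ceph``,
call ``CEPH_CMD=echo reenable_cephadm``.
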
Checking cephadm logs
---------------------

You can monitor the cephadm log in real time with::

  ceph -W cephadm

You can see the last few messages with::

  ceph log last cephadm

If you have enabled logging to files, you can see a cephadm log file called
``ceph.cephadm.log`` on monitor hosts (see :ref:`cephadm-logs`).

Gathering log files
-------------------

Use ``journalctl`` to gather the log files of all daemons.

.. note:: By default, cephadm now stores logs in journald. This means
   that you will no longer find daemon logs in ``/var/log/ceph/``.

To read the log file of one specific daemon, run::

  cephadm logs --name <name-of-daemon>

Note: this only works when run on the same host where the daemon is running. To
get logs of a daemon running on a different host, give the ``--fsid`` option::

  cephadm logs --fsid <fsid> --name <name-of-daemon>

where ``<fsid>`` corresponds to the cluster ID printed by ``ceph status``.

To fetch all log files of all daemons on a given host, run::

  for name in $(cephadm ls | jq -r '.[].name') ; do
    cephadm logs --fsid <fsid> --name "$name" > "$name";
  done

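The loop above writes one file per daemon into the current directory. A
slightly more defensive variant is sketched below under some assumptions:
``collect_logs`` and the ``CEPHADM`` override are hypothetical names
introduced here, daemon names are read from standard input (one per line),
and each log is written to ``<outdir>/<name>.log``.

```shell
# Hypothetical helper; CEPHADM is an illustrative override for dry runs.
# Usage: cephadm ls | jq -r '.[].name' | collect_logs <fsid> <outdir>
collect_logs() {
  fsid=$1
  outdir=$2
  mkdir -p "$outdir" || return 1
  while IFS= read -r name; do
    # Skip empty lines in the daemon list.
    [ -n "$name" ] || continue
    # </dev/null keeps the command from consuming the daemon list on stdin.
    "${CEPHADM:-cephadm}" logs --fsid "$fsid" --name "$name" \
      > "$outdir/$name.log" 2>&1 </dev/null
  done
}
```

Keeping the logs in a dedicated directory makes it easier to archive them and
attach them to a bug report.
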
Collecting systemd status
-------------------------

To print the state of a systemd unit, run::

  systemctl status "ceph-$(cephadm shell ceph fsid)@<service name>.service";

To fetch the state of all daemons on a given host, run::

  fsid="$(cephadm shell ceph fsid)"
  for name in $(cephadm ls | jq -r '.[].name') ; do
    systemctl status "ceph-$fsid@$name.service" > "$name";
  done

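When only the problem daemons are of interest, the same unit-name pattern can
be combined with ``systemctl is-active``. The sketch below assumes
hypothetical helper names (``ceph_unit``, ``inactive_units``) and a
``SYSTEMCTL`` override that exists only to make the example testable; daemon
names are read from standard input.

```shell
# Hypothetical helpers; SYSTEMCTL is an illustrative override for testing.

# Build the systemd unit name for a daemon: ceph-<fsid>@<name>.service
ceph_unit() {
  printf 'ceph-%s@%s.service\n' "$1" "$2"
}

# Print the units (for daemon names read from stdin) that are not active.
# Usage: cephadm ls | jq -r '.[].name' | inactive_units "$fsid"
inactive_units() {
  fsid=$1
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    unit=$(ceph_unit "$fsid" "$name")
    "${SYSTEMCTL:-systemctl}" is-active --quiet "$unit" </dev/null ||
      printf '%s\n' "$unit"
  done
}
```

Each printed unit name can then be passed to ``systemctl status`` for details.
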
List all downloaded container images
------------------------------------

To list all container images that are downloaded on a host:

.. note:: ``Image`` might also be called ``ImageID``

::

  podman ps -a --format json | jq '.[].Image'
  "docker.io/library/centos:8"
  "registry.opensuse.org/opensuse/leap:15.2"

Manually running containers
---------------------------

Cephadm writes small wrappers that run a container. Refer to
``/var/lib/ceph/<cluster-fsid>/<service-name>/unit.run`` for the
container execution command.

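To inspect the wrapper for one daemon, the path can be assembled from the
cluster fsid and the service name. A minimal sketch, with hypothetical helper
names and a ``CEPH_DATA_DIR`` override that exists only for this example:

```shell
# Hypothetical helpers; CEPH_DATA_DIR is an illustrative override.

# Print the path of the unit.run wrapper for <cluster-fsid> <service-name>.
unit_run_path() {
  printf '%s/%s/%s/unit.run\n' "${CEPH_DATA_DIR:-/var/lib/ceph}" "$1" "$2"
}

# Show the container execution command stored in the wrapper.
show_unit_run() {
  path=$(unit_run_path "$1" "$2")
  if [ ! -r "$path" ]; then
    echo "cannot read $path" >&2
    return 1
  fi
  cat "$path"
}
```

Reading the file first, rather than executing it, shows which container
command would be run before anything is started by hand.
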
.. _cephadm-ssh-errors:

SSH errors
----------

Error message::

  xxxxxx.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-kbqvkrkw root@10.10.1.2
  raise OrchestratorError('Failed to connect to %s (%s). Check that the host is reachable and accepts connections using the cephadm SSH key' % (host, addr)) from
  orchestrator._interface.OrchestratorError: Failed to connect to 10.10.1.2 (10.10.1.2). Check that the host is reachable and accepts connections using the cephadm SSH key

Things users can do:

1. Ensure cephadm has an SSH identity key::

     [root@mon1 ~]# cephadm shell -- ceph config-key get mgr/cephadm/ssh_identity_key > key
     INFO:cephadm:Inferring fsid f8edc08a-7f17-11ea-8707-000c2915dd98
     INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15 obtained 'mgr/cephadm/ssh_identity_key'
     [root@mon1 ~]# chmod 0600 key

   If this fails, cephadm doesn't have a key. Fix this by running the following command::

     [root@mon1 ~]# cephadm shell -- ceph cephadm generate-ssh-key

   or::

     [root@mon1 ~]# cat key | cephadm shell -- ceph cephadm set-ssk-key -i -

2. Ensure that the SSH config is correct::

     [root@mon1 ~]# cephadm shell -- ceph cephadm get-ssh-config > config

3. Verify that you can connect to the host::

     [root@mon1 ~]# ssh -F config -i key root@mon1

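The three steps can be chained into one check. This is only a sketch under
stated assumptions: ``check_cephadm_ssh`` and the ``CEPHADM`` and ``SSH``
overrides are hypothetical names, the key and config are written to a
caller-supplied directory, and the remote command ``true`` merely proves that
the login works.

```shell
# Hypothetical helper; CEPHADM and SSH are illustrative overrides for dry runs.
# Usage: check_cephadm_ssh <host> <workdir>
check_cephadm_ssh() {
  host=$1
  workdir=$2
  (
    # Subshell so the restrictive umask does not leak to the caller.
    umask 077
    # Step 1: fetch the private key cephadm uses.
    "${CEPHADM:-cephadm}" shell -- ceph config-key get \
      mgr/cephadm/ssh_identity_key > "$workdir/key" || exit 1
    # Step 2: fetch the SSH config cephadm uses.
    "${CEPHADM:-cephadm}" shell -- ceph cephadm get-ssh-config \
      > "$workdir/config" || exit 1
    # Step 3: try to log in with exactly that key and config.
    "${SSH:-ssh}" -F "$workdir/config" -i "$workdir/key" "root@$host" true
  )
}
```

A non-zero exit status tells you that one of the steps failed; rerun the
individual commands above to see which one.
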
Verifying that the Public Key is Listed in the authorized_keys file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To verify that the public key is in the authorized_keys file, run the following commands::

  [root@mon1 ~]# cephadm shell -- ceph config-key get mgr/cephadm/ssh_identity_pub > key.pub
  [root@mon1 ~]# grep "$(cat key.pub)" /root/.ssh/authorized_keys

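A key line may contain characters that ``grep`` would interpret as
regular-expression syntax (for example ``.`` in the comment field), so
matching it as a fixed string is slightly more robust. A minimal sketch;
``key_is_authorized`` is a hypothetical helper name, and the default path
simply mirrors the command above:

```shell
# Hypothetical helper. Returns 0 if the key in <pubfile> is listed in <authfile>.
# Usage: key_is_authorized key.pub [/root/.ssh/authorized_keys]
key_is_authorized() {
  pubfile=$1
  authfile=${2:-/root/.ssh/authorized_keys}
  key=$(cat "$pubfile" 2>/dev/null)
  # Refuse an empty or missing key: an empty pattern would match every line.
  [ -n "$key" ] || return 2
  # -F: fixed-string match, -q: report through the exit status only.
  grep -qF -- "$key" "$authfile"
}
```

The function returns 0 when the key is present, 1 when it is absent, and 2
when the public key file is empty or unreadable.
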
Failed to infer CIDR network error
----------------------------------

If you see this error::

  ERROR: Failed to infer CIDR network for mon ip ***; pass --skip-mon-network to configure it later

Or this error::

  Must set public_network config option or specify a CIDR network, ceph addrvec, or plain IP

This means that you must run a command of this form::

  ceph config set mon public_network <mon_network>

For more detail on operations of this kind, see :ref:`deploy_additional_monitors`.

Accessing the admin socket
--------------------------

Each Ceph daemon provides an admin socket that bypasses the
MONs (see :ref:`rados-monitoring-using-admin-socket`).

To access the admin socket, first enter the daemon container on the host::

  [root@mon1 ~]# cephadm enter --name <daemon-name>
  [ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-<daemon-name>.asok config show