Troubleshooting
===============

Sometimes there is a need to investigate why a cephadm command failed or why
a specific service no longer runs properly.

As cephadm deploys daemons as containers, troubleshooting daemons is slightly
different. Here are a few tools and commands to help investigate issues.

Pausing or disabling cephadm
----------------------------

If something goes wrong and cephadm is behaving in a way you do
not like, you can pause most background activity with::

  ceph orch pause

This will stop any changes, but cephadm will still periodically check hosts to
refresh its inventory of daemons and devices. You can disable cephadm
completely with::

  ceph orch set backend ''
  ceph mgr module disable cephadm

This will disable all of the ``ceph orch ...`` CLI commands but the previously
deployed daemon containers will still continue to exist and start as they
did before.

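Reversing the steps above can be sketched as follows. This is a minimal
sketch, not part of the original procedure: the function name
``resume_cephadm`` is hypothetical, and it assumes that cephadm was disabled
exactly as shown above and that ``ceph mgr module enable``,
``ceph orch set backend`` and ``ceph orch resume`` are available in your
release.

```shell
# Hypothetical helper (not shipped with Ceph): undo "pause" and "disable".
resume_cephadm() {
    # Load the cephadm manager module again.
    ceph mgr module enable cephadm || return 1
    # Point the "ceph orch ..." CLI back at the cephadm backend.
    ceph orch set backend cephadm || return 1
    # Resume the background activity stopped by "ceph orch pause".
    ceph orch resume
}
```

If cephadm was only paused and never disabled, ``ceph orch resume`` on its own
should be sufficient.
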
Checking cephadm logs
---------------------

You can monitor the cephadm log in real time with::

  ceph -W cephadm

You can see the last few messages with::

  ceph log last cephadm

If you have enabled logging to files, you can see a cephadm log file called
``ceph.cephadm.log`` on monitor hosts (see :ref:`cephadm-logs`).

Gathering log files
-------------------

Use journalctl to gather the log files of all daemons:

.. note:: By default cephadm now stores logs in journald. This means
   that you will no longer find daemon logs in ``/var/log/ceph/``.

To read the log file of one specific daemon, run::

  cephadm logs --name <name-of-daemon>

Note: this only works when run on the same host where the daemon is running. To
get logs of a daemon running on a different host, give the ``--fsid`` option::

  cephadm logs --fsid <fsid> --name <name-of-daemon>

where the ``<fsid>`` corresponds to the cluster ID printed by ``ceph status``.

To fetch all log files of all daemons on a given host, run::

  for name in $(cephadm ls | jq -r '.[].name') ; do
    cephadm logs --fsid <fsid> --name "$name" > "$name";
  done

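The loop above writes one file per daemon into the current directory. A
slightly more defensive variant might look like the sketch below. The function
name ``collect_daemon_logs``, the ``.log`` suffix and the default output
directory ``cephadm-logs`` are all hypothetical choices made for illustration;
only ``cephadm logs --fsid ... --name ...`` comes from the text above.

```shell
# Hypothetical helper: read daemon names from stdin (one per line) and save
# each daemon's log into its own file below an output directory.
collect_daemon_logs() {
    fsid=$1
    outdir=${2:-cephadm-logs}
    mkdir -p "$outdir" || return 1
    while IFS= read -r name; do
        # Skip empty lines in the input.
        [ -n "$name" ] || continue
        # Redirect stdin so the command cannot consume the list of names.
        cephadm logs --fsid "$fsid" --name "$name" \
            > "$outdir/$name.log" 2>&1 < /dev/null
    done
}
```

It could be fed from the same pipeline as before, for example
``cephadm ls | jq -r '.[].name' | collect_daemon_logs <fsid> ./logs``.
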
Collecting systemd status
-------------------------

To print the state of a systemd unit, run::

  systemctl status "ceph-$(cephadm shell ceph fsid)@<service name>.service"


To fetch the state of all daemons on a given host, run::

  fsid="$(cephadm shell ceph fsid)"
  for name in $(cephadm ls | jq -r '.[].name') ; do
    systemctl status "ceph-$fsid@$name.service" > "$name";
  done

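When only a quick overview is needed, the unit naming scheme used above
(``ceph-<fsid>@<name>.service``) can be combined with
``systemctl is-active``. The following is a minimal sketch; both function
names are hypothetical, and ``is-active`` is used here instead of the
``status`` subcommand shown above.

```shell
# Hypothetical helper: build the systemd unit name for a cephadm daemon.
ceph_unit_name() {
    printf 'ceph-%s@%s.service\n' "$1" "$2"
}

# Hypothetical helper: print one "<state> <unit>" line per daemon name
# read from stdin.
daemon_states() {
    fsid=$1
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        unit=$(ceph_unit_name "$fsid" "$name")
        # "systemctl is-active" prints the state and exits non-zero for
        # anything other than "active", so the exit code is ignored here.
        state=$(systemctl is-active "$unit" < /dev/null 2>/dev/null || true)
        printf '%s %s\n' "${state:-unknown}" "$unit"
    done
}
```

A possible invocation is
``cephadm ls | jq -r '.[].name' | daemon_states "$fsid"``.
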
List all downloaded container images
------------------------------------

To list all container images that are downloaded on a host:

.. note:: ``Image`` might also be called ``ImageID``

::

  podman ps -a --format json | jq '.[].Image'
  "docker.io/library/centos:8"
  "registry.opensuse.org/opensuse/leap:15.2"


Manually running containers
---------------------------

Cephadm writes small wrappers that run a container. Refer to
``/var/lib/ceph/<cluster-fsid>/<service-name>/unit.run`` for the
container execution command.

.. _cephadm-ssh-errors:

ssh errors
----------

Error message::

  execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-73z09u6g -i /tmp/cephadm-identity-ky7ahp_5 root@10.10.1.2
  ...
  raise OrchestratorError(msg) from e
  orchestrator._interface.OrchestratorError: Failed to connect to 10.10.1.2 (10.10.1.2).
  Please make sure that the host is reachable and accepts connections using the cephadm SSH key
  ...

Things users can do:

1. Ensure cephadm has an SSH identity key::

     [root@mon1 ~]# cephadm shell -- ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
     INFO:cephadm:Inferring fsid f8edc08a-7f17-11ea-8707-000c2915dd98
     INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15 obtained 'mgr/cephadm/ssh_identity_key'
     [root@mon1 ~]# chmod 0600 ~/cephadm_private_key

   If this fails, cephadm doesn't have a key. Fix this by running the following command::

     [root@mon1 ~]# cephadm shell -- ceph cephadm generate-key

   or::

     [root@mon1 ~]# cat ~/cephadm_private_key | cephadm shell -- ceph cephadm set-priv-key -i -

2. Ensure that the ssh config is correct::

     [root@mon1 ~]# cephadm shell -- ceph cephadm get-ssh-config > config

3. Verify that we can connect to the host::

     [root@mon1 ~]# ssh -F config -i ~/cephadm_private_key root@mon1

Verifying that the Public Key is Listed in the authorized_keys file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To verify that the public key is in the authorized_keys file, run the following commands::

  [root@mon1 ~]# cephadm shell -- ceph cephadm get-pub-key > ~/ceph.pub
  [root@mon1 ~]# grep "`cat ~/ceph.pub`" /root/.ssh/authorized_keys

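The ``grep`` call above interprets the key as a regular expression. For use in
scripts, a fixed-string comparison that only reports an exit status may be
more convenient. The sketch below assumes that ``~/ceph.pub`` holds the key on
its first line; the function name ``pubkey_is_authorized`` is hypothetical.

```shell
# Hypothetical helper: succeed if the public key stored in file $1 appears
# in the authorized_keys file $2 (default: root's authorized_keys).
pubkey_is_authorized() {
    pubkey_file=$1
    auth_file=${2:-/root/.ssh/authorized_keys}
    # Use only the first line of the key file and refuse an empty key,
    # because an empty pattern would match every line.
    key=$(head -n 1 "$pubkey_file") || return 2
    [ -n "$key" ] || return 2
    # -F: treat the key as a fixed string, -q: report via exit status only.
    grep -qF -- "$key" "$auth_file"
}
```

For example, ``pubkey_is_authorized ~/ceph.pub && echo "key is present"``
prints the message only when the key is listed.
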
Failed to infer CIDR network error
----------------------------------

If you see this error::

  ERROR: Failed to infer CIDR network for mon ip ***; pass --skip-mon-network to configure it later

Or this error::

  Must set public_network config option or specify a CIDR network, ceph addrvec, or plain IP

This means that you must run a command of this form::

  ceph config set mon public_network <mon_network>

For more detail on operations of this kind, see :ref:`deploy_additional_monitors`.

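The ``<mon_network>`` value is a network in CIDR notation. If only the
monitor's IPv4 address and prefix length are known, the network can be derived
with shell arithmetic. The sketch below is an illustration only: the function
name ``cidr_network`` is hypothetical, it handles IPv4 addresses without
leading zeros, and the address ``10.10.1.2`` with prefix length ``24`` is an
assumed example.

```shell
# Hypothetical helper: print the IPv4 network in CIDR form for an address
# and a prefix length, e.g. "10.10.1.2 24" -> "10.10.1.0/24".
cidr_network() {
    ip=$1
    prefix=$2
    # Split the dotted quad into its four octets.
    IFS=. read -r o1 o2 o3 o4 <<EOF
$ip
EOF
    addr=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
    # Build a mask that keeps the top $prefix bits of the address.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    printf '%d.%d.%d.%d/%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 )) "$prefix"
}
```

The result could then be passed on, for example
``ceph config set mon public_network "$(cidr_network 10.10.1.2 24)"``.
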
Accessing the admin socket
--------------------------

Each Ceph daemon provides an admin socket that bypasses the
MONs (see :ref:`rados-monitoring-using-admin-socket`).

To access the admin socket, first enter the daemon container on the host::

  [root@mon1 ~]# cephadm enter --name <daemon-name>
  [ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-<daemon-name>.asok config show