Troubleshooting
===============

Sometimes you need to investigate why a cephadm command failed or why
a specific service no longer runs properly.

Because cephadm deploys daemons as containers, troubleshooting daemons works
slightly differently. Here are a few tools and commands to help you
investigate issues.

Pausing or disabling cephadm
----------------------------

If something goes wrong and cephadm is behaving in a way you do
not like, you can pause most background activity with::

  ceph orch pause

This stops any changes, but cephadm will still periodically check hosts to
refresh its inventory of daemons and devices. You can disable cephadm
completely with::

  ceph orch set backend ''
  ceph mgr module disable cephadm

This disables all of the ``ceph orch ...`` CLI commands, but the previously
deployed daemon containers will continue to exist and start as they
did before.
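
To undo either step later, the reverse commands can be wrapped in two small
shell functions. This is only a sketch: the function names are hypothetical,
and ``ceph orch resume``, ``ceph mgr module enable cephadm`` and
``ceph orch set backend cephadm`` are assumed to be the counterparts of the
commands above in your Ceph release, so check them before relying on them.

```shell
# Hypothetical helpers; the function names are illustrative and not part of cephadm.

# Resume background activity after `ceph orch pause`.
resume_cephadm() {
    ceph orch resume
}

# Re-enable cephadm after it was disabled completely:
# load the mgr module first, then select it as the orchestrator backend.
enable_cephadm() {
    ceph mgr module enable cephadm &&
        ceph orch set backend cephadm
}
```

Run ``resume_cephadm`` after a pause, or ``enable_cephadm`` after a complete
disable, from a host that has a working ``ceph`` CLI.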

Checking cephadm logs
---------------------

You can monitor the cephadm log in real time with::

  ceph -W cephadm

You can see the last few messages with::

  ceph log last cephadm

If you have enabled logging to files, you can see a cephadm log file called
``ceph.cephadm.log`` on monitor hosts (see :ref:`cephadm-logs`).

Gathering log files
-------------------

Use journalctl to gather the log files of all daemons:

.. note:: By default cephadm now stores logs in journald. This means
   that you will no longer find daemon logs in ``/var/log/ceph/``.

To read the log file of one specific daemon, run::

  cephadm logs --name <name-of-daemon>

Note: this only works when run on the same host where the daemon is running. To
get logs of a daemon running on a different host, give the ``--fsid`` option::

  cephadm logs --fsid <fsid> --name <name-of-daemon>

where ``<fsid>`` corresponds to the cluster ID printed by ``ceph status``.

To fetch the log files of all daemons on a given host, run::

  for name in $(cephadm ls | jq -r '.[].name') ; do
    cephadm logs --fsid <fsid> --name "$name" > "$name";
  done
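
A variation on this loop keeps the collected logs together in one directory
and carries on when a single daemon's logs cannot be read. The sketch below
assumes daemon names arrive on standard input, one per line; the function
name ``collect_daemon_logs`` and the ``.log`` suffix are illustrative choices,
not part of cephadm.

```shell
# Hypothetical helper: read daemon names from stdin (one per line) and save
# each daemon's log output to <outdir>/<name>.log.
collect_daemon_logs() {
    local fsid="$1" outdir="$2" name
    mkdir -p "$outdir" || return 1
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        # </dev/null keeps the command from consuming the list of names.
        cephadm logs --fsid "$fsid" --name "$name" \
            > "$outdir/$name.log" 2>&1 </dev/null ||
            echo "warning: could not read logs for $name" >&2
    done
}

# Usage, on the host whose daemons you want to inspect:
#   cephadm ls | jq -r '.[].name' | collect_daemon_logs <fsid> ./ceph-logs
```

Writing into a dedicated directory avoids scattering one file per daemon
across the current working directory.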

Collecting systemd status
-------------------------

To print the state of a systemd unit, run::

  systemctl status "ceph-$(cephadm shell ceph fsid)@<service name>.service"


To fetch the state of all daemons on a given host, run::

  fsid="$(cephadm shell ceph fsid)"
  for name in $(cephadm ls | jq -r '.[].name') ; do
    systemctl status "ceph-$fsid@$name.service" > "$name";
  done
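
When a one-line summary per daemon is enough, ``systemctl is-active`` can be
used instead of the full status output. The following sketch relies on the
unit naming pattern shown above (``ceph-<fsid>@<name>.service``); the function
names are hypothetical, and daemon names are assumed to arrive on standard
input, one per line.

```shell
# Build the systemd unit name that matches the pattern used above.
ceph_unit_name() {
    printf 'ceph-%s@%s.service\n' "$1" "$2"
}

# Hypothetical helper: read daemon names from stdin and print
# "<name> <state>" for each one (for example "mon.a active").
daemon_states() {
    local fsid="$1" name state
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        # is-active exits non-zero for units that are not active, so ignore
        # the exit status and keep only the printed state.
        state=$(systemctl is-active "$(ceph_unit_name "$fsid" "$name")" </dev/null) || true
        printf '%s %s\n' "$name" "${state:-unknown}"
    done
}

# Usage:
#   cephadm ls | jq -r '.[].name' | daemon_states "$(cephadm shell ceph fsid)"
```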


List all downloaded container images
------------------------------------

To list all container images that are downloaded on a host:

.. note:: ``Image`` might also be called ``ImageID``

::

  podman ps -a --format json | jq '.[].Image'
  "docker.io/library/centos:8"
  "registry.opensuse.org/opensuse/leap:15.2"


Manually running containers
---------------------------

Cephadm writes small wrappers that run a container. Refer to
``/var/lib/ceph/<cluster-fsid>/<service-name>/unit.run`` for the
container execution command.


SSH errors
----------

Error message::

  xxxxxx.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-kbqvkrkw root@10.10.1.2
  raise OrchestratorError('Failed to connect to %s (%s). Check that the host is reachable and accepts connections using the cephadm SSH key' % (host, addr)) from
  orchestrator._interface.OrchestratorError: Failed to connect to 10.10.1.2 (10.10.1.2). Check that the host is reachable and accepts connections using the cephadm SSH key

Things users can do:

1. Ensure cephadm has an SSH identity key::

     [root@mon1 ~]# cephadm shell -- ceph config-key get mgr/cephadm/ssh_identity_key > key
     INFO:cephadm:Inferring fsid f8edc08a-7f17-11ea-8707-000c2915dd98
     INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15 obtained 'mgr/cephadm/ssh_identity_key'
     [root@mon1 ~]# chmod 0600 key

   If this fails, cephadm doesn't have a key. Fix this by running the following command::

     [root@mon1 ~]# cephadm shell -- ceph cephadm generate-ssh-key

   or::

     [root@mon1 ~]# cat key | cephadm shell -- ceph cephadm set-ssk-key -i -

2. Ensure that the SSH config is correct::

     [root@mon1 ~]# cephadm shell -- ceph cephadm get-ssh-config > config

3. Verify that you can connect to the host::

     [root@mon1 ~]# ssh -F config -i key root@mon1

4. There is a limitation right now: the SSH user is always ``root``.



Verifying that the Public Key is Listed in the authorized_keys file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To verify that the public key is in the ``authorized_keys`` file, run the following commands::

  [root@mon1 ~]# cephadm shell -- ceph config-key get mgr/cephadm/ssh_identity_pub > key.pub
  [root@mon1 ~]# grep "`cat key.pub`" /root/.ssh/authorized_keys
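
The same check can be scripted so that it returns an exit status instead of
printing the matching line. This is a minimal sketch; the function name
``key_is_authorized`` is hypothetical. It uses ``grep -F`` so that the key is
matched as a fixed string rather than interpreted as a regular expression.

```shell
# Hypothetical helper: return 0 if the public key stored in file $1 appears
# in the authorized_keys file $2, 1 if it does not, 2 if the key is unreadable.
key_is_authorized() {
    local pubkey
    pubkey=$(cat "$1") || return 2
    [ -n "$pubkey" ] || return 2
    # -q: no output, -F: fixed-string match, --: end of options.
    grep -qF -- "$pubkey" "$2"
}

# Usage:
#   key_is_authorized key.pub /root/.ssh/authorized_keys && echo "key is present"
```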

Failed to infer CIDR network error
----------------------------------

If you see this error::

  ERROR: Failed to infer CIDR network for mon ip ***; pass --skip-mon-network to configure it later

Or this error::

  Must set public_network config option or specify a CIDR network, ceph addrvec, or plain IP

This means that you must run a command of this form::

  ceph config set mon public_network <mon_network>

For more detail on operations of this kind, see :ref:`deploy_additional_monitors`.
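
The ``<mon_network>`` value is a network in CIDR notation. As an illustration
only, the sketch below derives that notation from an IPv4 address and a prefix
length. The function name ``cidr_network`` is hypothetical, the example prefix
length of 24 is an assumption, and the shell is assumed to use 64-bit
arithmetic (as bash and dash do).

```shell
# Hypothetical helper: print the CIDR network that contains an IPv4 address.
#   $1: dotted-quad address, e.g. 10.10.1.2
#   $2: prefix length (0-32), e.g. 24
cidr_network() {
    local ip="$1" prefix="$2" addr mask net
    # Split the dotted quad into the positional parameters $1..$4.
    set -- $(printf '%s' "$ip" | tr '.' ' ')
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Keep the top <prefix> bits of the 32-bit address.
    mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    net=$(( addr & mask ))
    printf '%d.%d.%d.%d/%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 )) "$prefix"
}

# Usage (the /24 prefix length is an assumed example):
#   ceph config set mon public_network "$(cidr_network 10.10.1.2 24)"
```

With the address ``10.10.1.2`` and a prefix length of 24, the helper prints
``10.10.1.0/24``.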

Accessing the admin socket
--------------------------

Each Ceph daemon provides an admin socket that bypasses the
MONs (see :ref:`rados-monitoring-using-admin-socket`).

To access the admin socket, first enter the daemon container on the host::

  [root@mon1 ~]# cephadm enter --name <daemon-name>
  [ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-<daemon-name>.asok config show