Commit | Line | Data |
---|---|---|
9f95a23c TL |
1 | |
2 | Troubleshooting | |
3 | =============== | |
4 | ||
5 | Sometimes there is a need to investigate why a cephadm command failed or why | |
6 | a specific service no longer runs properly. | |
7 | ||
8 | Because cephadm deploys daemons as containers, troubleshooting them is slightly | |
9 | different. Here are a few tools and commands to help investigate issues. | |
10 | ||
801d1391 TL |
11 | Pausing or disabling cephadm |
12 | ---------------------------- | |
13 | ||
14 | If something goes wrong and cephadm is behaving in a way you do | |
15 | not like, you can pause most background activity with:: | |
16 | ||
17 | ceph orch pause | |
18 | ||
19 | This will stop any changes, but cephadm will still periodically check hosts to | |
20 | refresh its inventory of daemons and devices. You can disable cephadm | |
21 | completely with:: | |
22 | ||
23 | ceph orch set backend '' | |
24 | ceph mgr module disable cephadm | |
25 | ||
26 | This will disable all of the ``ceph orch ...`` CLI commands, but the previously | |
27 | deployed daemon containers will continue to exist and start as they | |
28 | did before. | |
29 | ||
30 | Checking cephadm logs | |
31 | --------------------- | |
32 | ||
33 | You can monitor the cephadm log in real time with:: | |
34 | ||
35 | ceph -W cephadm | |
36 | ||
37 | You can see the last few messages with:: | |
38 | ||
39 | ceph log last cephadm | |
40 | ||
41 | If you have enabled logging to files, you can see a cephadm log file called | |
42 | ``ceph.cephadm.log`` on monitor hosts (see :ref:`cephadm-logs`). | |
43 | ||
9f95a23c TL |
44 | Gathering log files |
45 | ------------------- | |
46 | ||
47 | Use journalctl to gather the log files of all daemons: | |
48 | ||
49 | .. note:: By default cephadm now stores logs in journald. This means | |
50 | that you will no longer find daemon logs in ``/var/log/ceph/``. | |
51 | ||
52 | To read the log file of one specific daemon, run:: | |
53 | ||
54 | cephadm logs --name <name-of-daemon> | |
55 | ||
56 | Note: this only works when run on the same host where the daemon is running. To | |
57 | get logs of a daemon running on a different host, give the ``--fsid`` option:: | |
58 | ||
59 | cephadm logs --fsid <fsid> --name <name-of-daemon> | |
60 | ||
61 | where the ``<fsid>`` corresponds to the cluster ID printed by ``ceph status``. | |
62 | ||
63 | To fetch all log files of all daemons on a given host, run:: | |
64 | ||
65 | for name in $(cephadm ls | jq -r '.[].name') ; do | |
66 | cephadm logs --fsid <fsid> --name "$name" > "$name"; | |
67 | done | |
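The loop above can also be wrapped in a small helper that writes each log into a separate directory. This is a minimal bash sketch, not part of cephadm: the function name ``collect_logs``, the ``.log`` suffix, and the ``CEPHADM`` override variable are hypothetical names introduced here for illustration. It assumes the daemon names arrive on standard input, one per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with cephadm): read daemon names from
# stdin and write one log file per daemon into an output directory.
# CEPHADM is an assumed override for the cephadm binary, useful for dry runs.
collect_logs() {
    local fsid="$1" outdir="$2" name
    mkdir -p "$outdir" || return 1
    while IFS= read -r name; do
        # Skip empty lines in the input.
        [ -n "$name" ] || continue
        # Redirect stdin so the command cannot consume the list of names.
        "${CEPHADM:-cephadm}" logs --fsid "$fsid" --name "$name" \
            > "$outdir/$name.log" 2>&1 < /dev/null
    done
}

# Example invocation (requires cephadm and jq on the host):
#   cephadm ls | jq -r '.[].name' | collect_logs "<fsid>" ./cephadm-logs
```

Keeping the logs in their own directory avoids cluttering the current working directory and makes the result easier to archive.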
68 | ||
69 | Collecting systemd status | |
70 | ------------------------- | |
71 | ||
72 | To print the state of a systemd unit, run:: | |
73 | ||
74 | systemctl status "ceph-$(cephadm shell ceph fsid)@<service name>.service"; | |
75 | ||
76 | ||
77 | To fetch the state of all daemons on a given host, run:: | |
78 | ||
79 | fsid="$(cephadm shell ceph fsid)" | |
80 | for name in $(cephadm ls | jq -r '.[].name') ; do | |
81 | systemctl status "ceph-$fsid@$name.service" > "$name"; | |
82 | done | |
83 | ||
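The unit names used above follow the pattern ``ceph-<fsid>@<daemon-name>.service``. As a small illustration, the following bash sketch builds that name from its two parts. The function name ``unit_name`` is a hypothetical helper introduced here, not a cephadm command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the systemd unit name for a cephadm-managed
# daemon from the cluster fsid and the daemon name, using the
# "ceph-<fsid>@<daemon-name>.service" pattern shown in this section.
unit_name() {
    local fsid="$1" name="$2"
    printf 'ceph-%s@%s.service\n' "$fsid" "$name"
}

# Example invocation (requires a running cluster):
#   systemctl status "$(unit_name "$(cephadm shell ceph fsid)" mon.mon1)"
```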
84 | ||
85 | List all downloaded container images | |
86 | ------------------------------------ | |
87 | ||
88 | To list all container images that are downloaded on a host: | |
89 | ||
90 | .. note:: ``Image`` might also be called ``ImageID`` | |
91 | ||
92 | :: | |
93 | ||
94 | podman ps -a --format json | jq '.[].Image' | |
95 | "docker.io/library/centos:8" | |
96 | "registry.opensuse.org/opensuse/leap:15.2" | |
97 | ||
98 | ||
99 | Manually running containers | |
100 | --------------------------- | |
101 | ||
102 | Cephadm writes small wrappers that run containers. Refer to | |
103 | ``/var/lib/ceph/<cluster-fsid>/<service-name>/unit.run`` for the | |
104 | container execution command. | |
1911f103 TL |
105 | |
106 | ||
107 | ssh errors | |
108 | ---------- | |
109 | ||
110 | Error message:: | |
111 | ||
112 | xxxxxx.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-kbqvkrkw root@10.10.1.2 | |
113 | raise OrchestratorError('Failed to connect to %s (%s). Check that the host is reachable and accepts connections using the cephadm SSH key' % (host, addr)) from | |
114 | orchestrator._interface.OrchestratorError: Failed to connect to 10.10.1.2 (10.10.1.2). Check that the host is reachable and accepts connections using the cephadm SSH key | |
115 | ||
116 | Things users can do: | |
117 | ||
118 | 1. Ensure cephadm has an SSH identity key:: | |
119 | ||
120 | [root@mon1 ~]# cephadm shell -- ceph config-key get mgr/cephadm/ssh_identity_key > key | |
121 | INFO:cephadm:Inferring fsid f8edc08a-7f17-11ea-8707-000c2915dd98 | |
122 | INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15 obtained 'mgr/cephadm/ssh_identity_key' | |
123 | [root@mon1 ~]# chmod 0600 key | |
124 | ||
125 | If this fails, cephadm doesn't have a key. Fix this by running the following command:: | |
126 | ||
127 | [root@mon1 ~]# cephadm shell -- ceph cephadm generate-key | |
128 | ||
129 | or:: | |
130 | ||
131 | [root@mon1 ~]# cat key | cephadm shell -- ceph cephadm set-priv-key -i - | |
132 | ||
133 | 2. Ensure that the ssh config is correct:: | |
134 | ||
135 | [root@mon1 ~]# cephadm shell -- ceph cephadm get-ssh-config > config | |
136 | ||
137 | 3. Verify that we can connect to the host:: | |
138 | ||
139 | [root@mon1 ~]# ssh -F config -i key root@mon1 | |
140 | ||
141 | 4. There is a limitation right now: the ssh user is always ``root``. | |
142 | ||
143 | ||
144 | ||
145 | Verifying that the Public Key is Listed in the authorized_keys file | |
146 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
147 | To verify that the public key is in the authorized_keys file, run the following commands:: | |
148 | ||
149 | [root@mon1 ~]# cephadm shell -- ceph config-key get mgr/cephadm/ssh_identity_pub > key.pub | |
150 | [root@mon1 ~]# grep "`cat key.pub`" /root/.ssh/authorized_keys | |
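The check above can be made a little more robust by matching the key as a fixed string rather than as a regular expression, since key material and comments may contain characters such as ``.`` that a regular expression would interpret. This is a minimal bash sketch; the function name ``key_is_authorized`` is a hypothetical helper introduced here, and the file paths are passed in by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the public key stored in the
# first file appears in the authorized_keys file given as second argument.
key_is_authorized() {
    local pub="$1" auth="$2"
    # Refuse to run with a missing or empty key file; an empty pattern
    # would otherwise match every line.
    [ -s "$pub" ] || return 2
    # -F: treat the key as a literal string; -q: report via exit status only.
    grep -qF -- "$(cat "$pub")" "$auth"
}

# Example invocation:
#   key_is_authorized key.pub /root/.ssh/authorized_keys && echo "key present"
```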
e306af50 TL |
151 | |
152 | Failed to infer CIDR network error | |
153 | ---------------------------------- | |
154 | ||
155 | If you see this error:: | |
156 | ||
157 | ERROR: Failed to infer CIDR network for mon ip ***; pass --skip-mon-network to configure it later | |
158 | ||
159 | Or this error:: | |
160 | ||
161 | Must set public_network config option or specify a CIDR network, ceph addrvec, or plain IP | |
162 | ||
163 | This means that you must run a command of this form:: | |
164 | ||
165 | ceph config set mon public_network <mon_network> | |
166 | ||
167 | For more detail on operations of this kind, see :ref:`deploy_additional_monitors`. | |
168 | ||
169 | Accessing the admin socket | |
170 | -------------------------- | |
171 | ||
172 | Each Ceph daemon provides an admin socket that bypasses the | |
173 | MONs (see :ref:`rados-monitoring-using-admin-socket`). | |
174 | ||
175 | To access the admin socket, first enter the daemon container on the host:: | |
176 | ||
177 | [root@mon1 ~]# cephadm enter --name <daemon-name> | |
178 | [ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-<daemon-name>.asok config show |