==================
Cephadm Operations
==================

Watching cephadm log messages
=============================

Cephadm logs to the ``cephadm`` cluster log channel, meaning you can
monitor progress in realtime with::

  # ceph -W cephadm

By default it will show info-level events and above. To see
debug-level messages too::

  # ceph config set mgr mgr/cephadm/log_to_cluster_level debug
  # ceph -W cephadm --watch-debug

Be careful: the debug messages are very verbose!

You can see recent events with::

  # ceph log last cephadm

These events are also logged to the ``ceph.cephadm.log`` file on
monitor hosts and to the monitor daemons' stderr.


.. _cephadm-logs:

Ceph daemon logs
================

Logging to stdout
-----------------

Traditionally, Ceph daemons have logged to ``/var/log/ceph``. By
default, cephadm daemons log to stderr and the logs are
captured by the container runtime environment. For most systems, by
default, these logs are sent to journald and accessible via
``journalctl``.

For example, to view the logs for the daemon ``mon.foo`` for a cluster
with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
something like::

  journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo

This works well for normal operations when logging levels are low.
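Since the unit name is simply ``ceph-<cluster-fsid>@<daemon-name>``, the ``journalctl`` invocation can be composed from the cluster fsid. A minimal sketch — the fsid and daemon name below are placeholders; on a live host they come from ``ceph fsid`` and ``ceph orch ps``:

```shell
# Compose the systemd/journald unit name for a cephadm-managed daemon.
# The fsid and daemon name are placeholders: on a real host, obtain them
# with `ceph fsid` and `ceph orch ps` respectively.
fsid=5c5a50ae-272a-455d-99e9-32c6a013e694
daemon=mon.foo
unit="ceph-${fsid}@${daemon}"
echo "journalctl -u ${unit}"
```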

To disable logging to stderr::

  ceph config set global log_to_stderr false
  ceph config set global mon_cluster_log_to_stderr false

Logging to files
----------------

You can also configure Ceph daemons to log to files instead of stderr,
just like they did in the past. When logging to files, Ceph logs appear
in ``/var/log/ceph/<cluster-fsid>``.

To enable logging to files::

  ceph config set global log_to_file true
  ceph config set global mon_cluster_log_to_file true

We recommend disabling logging to stderr (see above), or else everything
will be logged twice::

  ceph config set global log_to_stderr false
  ceph config set global mon_cluster_log_to_stderr false

By default, cephadm sets up log rotation on each host to rotate these
files. You can configure the logging retention schedule by modifying
``/etc/logrotate.d/ceph.<cluster-fsid>``.
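
The file cephadm installs there is a standard logrotate policy. The exact
contents vary by release, so the following is only a rough sketch of what
such a policy may look like, not necessarily what cephadm writes:

```
/var/log/ceph/<cluster-fsid>/*.log {
    rotate 7
    daily
    compress
    sharedscripts
    missingok
    notifempty
}
```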


Data location
=============

Cephadm stores daemon data and logs in slightly different locations than
older versions of Ceph:

* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. Note
  that by default cephadm logs via stderr and the container runtime,
  so these logs are normally not present.
* ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
  (besides logs).
* ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
  an individual daemon.
* ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
  the cluster.
* ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
  data directories for stateful daemons (e.g., monitor, prometheus)
  that have been removed by cephadm.
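
The per-daemon paths in the list above follow directly from the cluster
fsid and the daemon name; a small sketch (placeholder values, as before —
use ``ceph fsid`` and ``ceph orch ps`` on a real host):

```shell
# Compose cephadm data paths for one daemon (placeholder fsid/daemon).
fsid=5c5a50ae-272a-455d-99e9-32c6a013e694
daemon=mon.foo
daemon_dir="/var/lib/ceph/${fsid}/${daemon}"
crash_dir="/var/lib/ceph/${fsid}/crash"
echo "${daemon_dir}"
echo "${crash_dir}"
```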

Disk usage
----------

Because a few Ceph daemons may store a significant amount of data in
``/var/lib/ceph`` (notably, the monitors and prometheus), we recommend
moving this directory to its own disk, partition, or logical volume so
that it does not fill up the root file system.


Health checks
=============

The cephadm module provides additional healthchecks to supplement the
default healthchecks provided by the cluster. These additional
healthchecks fall into two categories:

- **cephadm operations**: Healthchecks in this category are always
  executed when the cephadm module is active.
- **cluster configuration**: These healthchecks are *optional*, and focus
  on the configuration of the hosts in the cluster.

CEPHADM Operations
------------------

CEPHADM_PAUSED
^^^^^^^^^^^^^^

Cephadm background work has been paused with ``ceph orch pause``. Cephadm
continues to perform passive monitoring activities (like checking
host and daemon status), but it will not make any changes (like deploying
or removing daemons).

Resume cephadm work with::

  ceph orch resume

.. _cephadm-stray-host:

CEPHADM_STRAY_HOST
^^^^^^^^^^^^^^^^^^

One or more hosts have running Ceph daemons but are not registered as
hosts managed by *cephadm*. This means that those services cannot
currently be managed by cephadm (e.g., restarted, upgraded, included
in ``ceph orch ps``).

You can manage the host(s) with::

  ceph orch host add *<hostname>*

Note that you may need to configure SSH access to the remote host
before this will work.

Alternatively, you can manually connect to the host and ensure that
services on that host are removed or migrated to a host that is
managed by *cephadm*.

You can also disable this warning entirely with::

  ceph config set mgr mgr/cephadm/warn_on_stray_hosts false

See :ref:`cephadm-fqdn` for more information about host names and
domain names.

CEPHADM_STRAY_DAEMON
^^^^^^^^^^^^^^^^^^^^

One or more Ceph daemons are running but are not managed by
*cephadm*. This may be because they were deployed using a different
tool, or because they were started manually. Those
services cannot currently be managed by cephadm (e.g., restarted,
upgraded, or included in ``ceph orch ps``).

If the daemon is a stateful one (monitor or OSD), it should be adopted
by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
usually easiest to provision a new daemon with the ``ceph orch apply``
command and then stop the unmanaged daemon.

This warning can be disabled entirely with::

  ceph config set mgr mgr/cephadm/warn_on_stray_daemons false

CEPHADM_HOST_CHECK_FAILED
^^^^^^^^^^^^^^^^^^^^^^^^^

One or more hosts have failed the basic cephadm host check, which verifies
that (1) the host is reachable and cephadm can be executed there, and (2)
that the host satisfies basic prerequisites, like a working container
runtime (podman or docker) and working time synchronization.
If this test fails, cephadm will not be able to manage services on that host.

You can manually run this check with::

  ceph cephadm check-host *<hostname>*

You can remove a broken host from management with::

  ceph orch host rm *<hostname>*

You can disable this health warning with::

  ceph config set mgr mgr/cephadm/warn_on_failed_host_check false

Cluster Configuration Checks
----------------------------

Cephadm periodically scans each of the hosts in the cluster to understand
the state of the OS, disks, NICs, etc. These facts can then be analysed
for consistency across the hosts in the cluster to identify any
configuration anomalies.

The configuration checks are an **optional** feature, enabled by the
following command::

  ceph config set mgr mgr/cephadm/config_checks_enabled true

The configuration checks are triggered after each host scan (1m). The
cephadm log entries will show the current state and outcome of the
configuration checks as follows.

Disabled state (config_checks_enabled false)::

  ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable

Enabled state (config_checks_enabled true)::

  CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected

The configuration checks themselves are managed through several cephadm
sub-commands.

To determine whether the configuration checks are enabled, run::

  ceph cephadm config-check status

This command will return the status of the configuration checker as
either "Enabled" or "Disabled".

To list all the configuration checks and their current state::

  ceph cephadm config-check ls

For example::

  NAME             HEALTHCHECK                      STATUS   DESCRIPTION
  kernel_security  CEPHADM_CHECK_KERNEL_LSM         enabled  checks SELINUX/Apparmor profiles are consistent across cluster hosts
  os_subscription  CEPHADM_CHECK_SUBSCRIPTION       enabled  checks subscription states are consistent for all cluster hosts
  public_network   CEPHADM_CHECK_PUBLIC_MEMBERSHIP  enabled  check that all hosts have a NIC on the Ceph public_network
  osd_mtu_size     CEPHADM_CHECK_MTU                enabled  check that OSD hosts share a common MTU setting
  osd_linkspeed    CEPHADM_CHECK_LINKSPEED          enabled  check that OSD hosts share a common linkspeed
  network_missing  CEPHADM_CHECK_NETWORK_MISSING    enabled  checks that the cluster/public networks defined exist on the Ceph hosts
  ceph_release     CEPHADM_CHECK_CEPH_RELEASE       enabled  check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
  kernel_version   CEPHADM_CHECK_KERNEL_VERSION     enabled  checks that the MAJ.MIN of the kernel on Ceph hosts is consistent

The name of each configuration check can then be used to enable or
disable a specific check::

  ceph cephadm config-check disable <name>

For example::

  ceph cephadm config-check disable kernel_security

CEPHADM_CHECK_KERNEL_LSM
^^^^^^^^^^^^^^^^^^^^^^^^
Each host within the cluster is expected to operate within the same Linux
Security Module (LSM) state. For example, if the majority of the hosts are
running with SELINUX in enforcing mode, any host not running in this mode
would be flagged as an anomaly and a healthcheck (WARNING) state raised.

CEPHADM_CHECK_SUBSCRIPTION
^^^^^^^^^^^^^^^^^^^^^^^^^^
This check relates to the status of vendor subscription. This check is
only performed for hosts using RHEL, but helps to confirm that all your
hosts are covered by an active subscription so patches and updates are
available.

CEPHADM_CHECK_PUBLIC_MEMBERSHIP
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
All members of the cluster should have NICs configured on at least one of
the public network subnets. Hosts that are not on the public network will
rely on routing, which may affect performance.

CEPHADM_CHECK_MTU
^^^^^^^^^^^^^^^^^
The MTU of the NICs on OSDs can be a key factor in consistent performance.
This check examines hosts that are running OSD services to ensure that the
MTU is configured consistently within the cluster. This is determined by
establishing the MTU setting that the majority of hosts are using, with
any anomalies resulting in a Ceph healthcheck.

CEPHADM_CHECK_LINKSPEED
^^^^^^^^^^^^^^^^^^^^^^^
Similar to the MTU check, linkspeed consistency is also a factor in
consistent cluster performance. This check determines the linkspeed shared
by the majority of OSD hosts, resulting in a healthcheck for any hosts
that are set at a lower linkspeed rate.

CEPHADM_CHECK_NETWORK_MISSING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The public_network and cluster_network settings support subnet definitions
for IPv4 and IPv6. If these settings are not found on any host in the
cluster, a healthcheck is raised.

CEPHADM_CHECK_CEPH_RELEASE
^^^^^^^^^^^^^^^^^^^^^^^^^^
Under normal operations, the Ceph cluster should be running daemons under
the same Ceph release (i.e., all pacific). This check looks at the active
release for each daemon, and reports any anomalies as a healthcheck.
*This check is bypassed if an upgrade process is active within the
cluster.*

CEPHADM_CHECK_KERNEL_VERSION
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The OS kernel version (maj.min) is checked for consistency across the
hosts. Once again, the majority of the hosts is used as the basis of
identifying anomalies.

Client keyrings and configs
===========================

Cephadm can distribute copies of the ``ceph.conf`` and client keyring
files to hosts. For example, it is usually a good idea to store a
copy of the config and ``client.admin`` keyring on any hosts that will
be used to administer the cluster via the CLI. By default, cephadm will do
this for any nodes with the ``_admin`` label (which normally includes the
bootstrap host).

When a client keyring is placed under management, cephadm will:

- build a list of target hosts based on the specified placement spec (see
  :ref:`orchestrator-cli-placement-spec`)
- store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
- store a copy of the keyring file on the specified host(s)
- update the ``ceph.conf`` file as needed (e.g., due to a change in the
  cluster monitors)
- update the keyring file if the entity's key is changed (e.g., via
  ``ceph auth ...`` commands)
- ensure the keyring file has the specified ownership and mode
- remove the keyring file when client keyring management is disabled
- remove the keyring file from old hosts if the keyring placement spec is
  updated (as needed)

To view which client keyrings are currently under management::

  ceph orch client-keyring ls

To place a keyring under management::

  ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>.<gid>] [--path=<path>]

- By default, the *path* will be ``/etc/ceph/client.{entity}.keyring``,
  which is where Ceph looks by default. Be careful when specifying
  alternate locations, as existing files may be overwritten.
- A placement of ``*`` (all hosts) is common.
- The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root).

For example, to create a ``client.rbd`` key, deploy it to hosts with the
``rbd-client`` label, and make it group-readable by uid/gid 107 (qemu)::

  ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
  ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640

The resulting keyring file is::

  -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring

To disable management of a keyring file::

  ceph orch client-keyring rm <entity>

Note that this will delete any keyring files for this entity that were
previously written to cluster nodes.


/etc/ceph/ceph.conf
===================

It may also be useful to distribute ``ceph.conf`` files to hosts without
an associated client keyring file. By default, cephadm only deploys a
``ceph.conf`` file to hosts where a client keyring is also distributed
(see above). To write config files to hosts without client keyrings::

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true

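The distributed ``ceph.conf`` is typically minimal, containing little more
than the cluster fsid and the monitor addresses. A rough sketch of such a
file, with all values hypothetical:

```
[global]
    fsid = 5c5a50ae-272a-455d-99e9-32c6a013e694
    mon_host = [v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0]
```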
By default, the configs are written to all hosts (i.e., those listed
by ``ceph orch host ls``). To specify which hosts get a ``ceph.conf``::

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>

For example, to distribute configs to hosts with the ``bare_config`` label::

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config

(See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)