Crash Module
============
The crash module collects information about daemon crashdumps and stores
it in the Ceph cluster for later analysis.

Enabling
--------

The *crash* module is enabled with::

  ceph mgr module enable crash

The *crash* upload key is generated with::

  ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash'

On each node, you should store this key in
``/etc/ceph/ceph.client.crash.keyring``.


Automated collection
--------------------

Daemon crashdumps are written to ``/var/lib/ceph/crash`` by default; this can
be configured with the option ``crash dir``. Each crash directory is named
with the time and date and a randomly generated UUID, and contains a metadata
file ``meta`` and a recent log file that share the same ``crash_id``.

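As an illustration of the layout described above, the following is a minimal sketch that walks a crash directory and collects the ``crash_id`` recorded in each ``meta`` file. The helper name ``list_crash_ids`` is hypothetical, and the sketch assumes that ``meta`` is a JSON object with a top-level ``crash_id`` key; it is not part of Ceph.

```python
import json
from pathlib import Path


def list_crash_ids(crash_dir="/var/lib/ceph/crash"):
    """Return the crash_id found in each <crash_dir>/<name>/meta file, sorted.

    Hypothetical helper: assumes meta is a JSON object with a "crash_id" key.
    """
    ids = []
    for meta_path in Path(crash_dir).glob("*/meta"):
        try:
            meta = json.loads(meta_path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or partially written metadata
        if isinstance(meta, dict) and "crash_id" in meta:
            ids.append(meta["crash_id"])
    return sorted(ids)
```

Directories whose ``meta`` file is missing or malformed are skipped rather than treated as errors, since a crash dump may still be in the process of being written.
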
These crashes can be automatically submitted and persisted in the monitors'
storage by using ``ceph-crash.service``.
It watches the crashdump directory and uploads new crash dumps with
``ceph crash post``.

``ceph-crash`` tries several authentication names: ``client.crash.$hostname``,
``client.crash``, and ``client.admin``.
To upload successfully with ``ceph crash post``, the name used needs
suitable permissions (``mon profile crash`` and ``mgr profile crash``),
and its keyring must be present in ``/etc/ceph``.


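The authentication names above can be sketched as a small helper. Both function names are hypothetical, and the keyring path pattern is an assumption generalized from the ``ceph.client.crash.keyring`` example in the Enabling section (it assumes the default cluster name ``ceph``).

```python
def crash_auth_names(hostname):
    """Candidate auth names, in the order the text lists them."""
    return [f"client.crash.{hostname}", "client.crash", "client.admin"]


def keyring_path(auth_name, conf_dir="/etc/ceph"):
    """Assumed keyring location for an auth name (cluster name 'ceph')."""
    return f"{conf_dir}/ceph.{auth_name}.keyring"
```

For example, ``keyring_path("client.crash")`` yields ``/etc/ceph/ceph.client.crash.keyring``, which matches the path given earlier for the upload key.
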
Commands
--------
::

  ceph crash post -i <metafile>

Save a crash dump. The metadata file is a JSON blob stored in the crash
dir as ``meta``. As usual, the ceph command can be invoked with ``-i -``,
and will read from stdin.

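A minimal sketch of posting metadata through stdin from a script, assuming the ``ceph`` binary is on ``PATH`` and a suitable keyring is available. The function names are hypothetical; only the ``ceph crash post -i`` invocation comes from the text above.

```python
import subprocess


def crash_post_argv(metafile="-"):
    """Build the argv for `ceph crash post -i <metafile>`; "-" means stdin."""
    return ["ceph", "crash", "post", "-i", metafile]


def post_crash(meta_json, runner=subprocess.run):
    """Feed a metadata JSON string to `ceph crash post -i -` on stdin.

    `runner` is injectable so the call can be exercised without a cluster.
    """
    return runner(crash_post_argv("-"), input=meta_json, text=True, check=True)
```

Passing ``check=True`` makes a failed upload raise ``subprocess.CalledProcessError`` instead of being silently ignored.
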
::

  ceph crash rm <crashid>

Remove a specific crash dump.

::

  ceph crash ls

List the timestamp/uuid crashids for all new and archived crash info.

::

  ceph crash ls-new

List the timestamp/uuid crashids for all new crash info.

::

  ceph crash stat

Show a summary of saved crash info grouped by age.

::

  ceph crash info <crashid>

Show all details of a saved crash.

::

  ceph crash prune <keep>

Remove saved crashes older than ``<keep>`` days. ``<keep>`` must be an integer.

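To make the ``prune`` semantics concrete, here is a sketch of how a cutoff could be computed from ``<keep>``. It illustrates the arithmetic only and is not the module's implementation; the function names, the use of UTC, and the strict "older than" comparison are assumptions.

```python
from datetime import datetime, timedelta, timezone


def prune_cutoff(keep_days, now=None):
    """Return the timestamp before which crashes count as older than keep_days."""
    if not isinstance(keep_days, int) or isinstance(keep_days, bool):
        raise TypeError("keep must be an integer number of days")
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=keep_days)


def crashes_to_prune(crashes, keep_days, now=None):
    """Select crash ids to remove.

    crashes: mapping of crash id -> timezone-aware datetime of the crash.
    """
    cutoff = prune_cutoff(keep_days, now)
    return sorted(cid for cid, ts in crashes.items() if ts < cutoff)
```

With ``keep_days=30``, a crash recorded 45 days ago would be selected, while one recorded 10 days ago would be kept.
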
::

  ceph crash archive <crashid>

Archive a crash report so that it is no longer considered for the ``RECENT_CRASH`` health check and does not appear in the ``crash ls-new`` output (it will still appear in the ``crash ls`` output).

::

  ceph crash archive-all

Archive all new crash reports.


Options
-------

* ``mgr/crash/warn_recent_interval`` [default: 2 weeks] controls what constitutes "recent" for the purposes of raising the ``RECENT_CRASH`` health warning.
* ``mgr/crash/retain_interval`` [default: 1 year] controls how long crash reports are retained by the cluster before they are automatically purged.
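
The following sketch shows how the two intervals might be expressed when changing them from a script. It rests on two assumptions that are not stated on this page: that the options are given in seconds, and that they are set with ``ceph config set mgr <option> <value>``. The helper name ``config_set_argv`` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Defaults from the list above, converted to seconds (assumed unit).
DEFAULTS = {
    "mgr/crash/warn_recent_interval": 14 * SECONDS_PER_DAY,  # 2 weeks
    "mgr/crash/retain_interval": 365 * SECONDS_PER_DAY,      # 1 year
}


def config_set_argv(option, days):
    """Build an assumed `ceph config set mgr` argv for a crash module option."""
    if option not in DEFAULTS:
        raise ValueError(f"unknown crash module option: {option}")
    return ["ceph", "config", "set", "mgr", option, str(days * SECONDS_PER_DAY)]
```

For instance, ``config_set_argv("mgr/crash/warn_recent_interval", 7)`` builds a command that would shorten the "recent" window to one week, under the assumptions above.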