Crash Module
============

The crash module collects information about daemon crashdumps and stores
it in the Ceph cluster for later analysis.

Daemon crashdumps are written to ``/var/lib/ceph/crash`` by default; this
can be configured with the option ``crash dir``. Crash directories are named
by time and date and a randomly generated UUID, and contain a metadata file
``meta`` and a recent log file. The ``crash_id`` recorded in the metadata is
the same as the directory name.
This module allows the metadata about those dumps to be persisted in
the monitors' storage.

Enabling
--------

The *crash* module is enabled with::

  ceph mgr module enable crash

Commands
--------
::

  ceph crash post -i <metafile>

Save a crash dump. The metadata file is a JSON blob stored in the crash
dir as ``meta``. As usual, the ceph command can be invoked with ``-i -``,
and will read from stdin.
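
As a rough illustration of both invocation styles, the sketch below posts
the ``meta`` files found under the crash directory, either by file name or
through stdin. The helper names ``post_all_crashes`` and ``post_from_stdin``
and the ``CRASH_DIR`` variable are made up for this example and are not part
of Ceph; the default path assumes the ``crash dir`` option has not been
changed::

  #!/bin/sh
  # Hypothetical helpers, not part of Ceph.
  # CRASH_DIR is an assumed variable; it defaults to the documented location.
  CRASH_DIR="${CRASH_DIR:-/var/lib/ceph/crash}"

  # Post each crash's metadata file by name.
  post_all_crashes() {
      for meta in "$CRASH_DIR"/*/meta; do
          # Skip the unexpanded glob when no crash directories exist.
          [ -f "$meta" ] || continue
          ceph crash post -i "$meta"
      done
  }

  # Post a single metadata file through stdin using "-i -".
  post_from_stdin() {
      ceph crash post -i - < "$1"
  }

Both helpers only define functions; call ``post_all_crashes`` (or
``post_from_stdin <metafile>``) from a host where the ``ceph`` command can
reach the cluster.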

::

  ceph crash rm <crashid>

Remove a specific crash dump.

::

  ceph crash ls

List the timestamp/uuid crashids for all new and archived crash info.

::

  ceph crash ls-new

List the timestamp/uuid crashids for all new crash info.

::

  ceph crash stat

Show a summary of saved crash info grouped by age.

::

  ceph crash info <crashid>

Show all details of a saved crash.

::

  ceph crash prune <keep>

Remove saved crashes older than ``<keep>`` days. ``<keep>`` must be an
integer.
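
Because ``<keep>`` must be an integer, a wrapper script may want to validate
its input before calling the command. The following is a minimal sketch; the
function name ``prune_crashes`` is hypothetical and not part of Ceph::

  #!/bin/sh
  # Hypothetical helper, not part of Ceph.
  prune_crashes() {
      keep="$1"
      # Reject empty input and anything containing a non-digit character.
      case "$keep" in
          ''|*[!0-9]*)
              echo "keep must be a non-negative integer" >&2
              return 1
              ;;
      esac
      ceph crash prune "$keep"
  }

With this wrapper, ``prune_crashes 30`` would run ``ceph crash prune 30``,
while ``prune_crashes 2w`` is rejected before ``ceph`` is invoked.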

::

  ceph crash archive <crashid>

Archive a crash report so that it is no longer considered for the
``RECENT_CRASH`` health check and does not appear in the ``crash ls-new``
output (it will still appear in the ``crash ls`` output).
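
One possible way to combine these commands is to review a new crash and then
archive it. The sketch below assumes that workflow; ``review_and_archive`` is
a hypothetical helper name, and its argument stands for an ID taken from
``ceph crash ls-new``::

  #!/bin/sh
  # Hypothetical helper, not part of Ceph: show a crash, then archive it.
  review_and_archive() {
      crashid="$1"
      # Print the details first; only archive if that succeeded.
      ceph crash info "$crashid" && ceph crash archive "$crashid"
  }

After ``review_and_archive <crashid>``, the crash should no longer be listed
by ``ceph crash ls-new``, but it remains visible in ``ceph crash ls``.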

::

  ceph crash archive-all

Archive all new crash reports.


Options
-------

* ``mgr/crash/warn_recent_interval`` [default: 2 weeks] controls what
  constitutes "recent" for the purposes of raising the ``RECENT_CRASH``
  health warning.
* ``mgr/crash/retain_interval`` [default: 1 year] controls how long crash
  reports are retained by the cluster before they are automatically purged.
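
Module options such as these are typically changed through the cluster
configuration database. The sketch below assumes that
``ceph config set mgr <option> <value>`` is the appropriate interface for
this release and that both intervals are expressed in seconds; verify both
assumptions against your version before relying on it. The function name
``set_crash_intervals`` is hypothetical::

  #!/bin/sh
  # Hypothetical helper, not part of Ceph.
  # Assumption: intervals are given in seconds and set via "ceph config set mgr".
  DAY=$((24 * 60 * 60))

  set_crash_intervals() {
      warn_days="$1"      # days during which a crash counts as "recent"
      retain_days="$2"    # days before stored reports are purged
      ceph config set mgr mgr/crash/warn_recent_interval $((warn_days * DAY))
      ceph config set mgr mgr/crash/retain_interval $((retain_days * DAY))
  }

For example, ``set_crash_intervals 7 180`` would request a one-week warning
window and roughly six months of retention.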