Disaster recovery
=================

.. danger::

    The notes in this section are aimed at experts, making a best effort
    to recover what they can from damaged filesystems. These steps
    have the potential to make things worse as well as better. If you
    are unsure, do not proceed.


Journal export
--------------

Before attempting dangerous operations, make a copy of the journal like so:

::

    cephfs-journal-tool journal export backup.bin

Note that this command may not always work if the journal is badly corrupted,
in which case a RADOS-level copy should be made (http://tracker.ceph.com/issues/9902).
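One way to make such a copy is to fetch the journal objects directly with the
``rados`` tool. The sketch below is only illustrative: it assumes the metadata
pool is named ``cephfs_metadata`` and that rank 0's journal objects carry the
usual ``200.`` prefix, so adjust both for your cluster:

::

    # List rank 0's journal objects (the pool name and the 200. prefix are
    # assumptions; other ranks use a different prefix)
    rados -p cephfs_metadata ls | grep '^200\.' > journal_objects.txt

    # Copy each object out of RADOS into a local file
    while read obj; do
        rados -p cephfs_metadata get "$obj" "./$obj"
    done < journal_objects.txt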


Dentry recovery from journal
----------------------------

If a journal is damaged or for any reason an MDS is incapable of replaying it,
attempt to recover what file metadata you can like so:

::

    cephfs-journal-tool event recover_dentries summary

By default this command acts on MDS rank 0; pass ``--rank=<n>`` to operate on other ranks.
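For example, a purely illustrative invocation against rank 1 would be:

::

    cephfs-journal-tool --rank=1 event recover_dentries summary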

This command will write any inodes/dentries recoverable from the journal
into the backing store, if these inodes/dentries are higher-versioned
than the previous contents of the backing store. If any regions of the journal
are missing/damaged, they will be skipped.

Note that in addition to writing out dentries and inodes, this command will update
the InoTables of each 'in' MDS rank, to indicate that any written inodes' numbers
are now in use. In simple cases, this will result in an entirely valid backing
store state.

.. warning::

    The resulting state of the backing store is not guaranteed to be self-consistent,
    and an online MDS scrub will be required afterwards. The journal contents
    will not be modified by this command; you should truncate the journal
    separately after recovering what you can.

Journal truncation
------------------

If the journal is corrupt or MDSs cannot replay it for any reason, you can
truncate it like so:

::

    cephfs-journal-tool journal reset

.. warning::

    Resetting the journal *will* lose metadata unless you have extracted
    it by other means such as ``recover_dentries``. It is likely to leave
    some orphaned objects in the data pool. It may result in re-allocation
    of already-written inodes, such that permissions rules could be violated.

MDS table wipes
---------------

After the journal has been reset, it may no longer be consistent with respect
to the contents of the MDS tables (InoTable, SessionMap, SnapServer).

To reset the SessionMap (erase all sessions), use:

::

    cephfs-table-tool all reset session

This command acts on the tables of all 'in' MDS ranks. Replace 'all' with an MDS
rank to operate on that rank only.

The session table is the table most likely to need resetting, but if you know you
also need to reset the other tables then replace 'session' with 'snap' or 'inode'.
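For example, the following illustrative invocations reset only rank 0's session
table, and then the snap table on all 'in' ranks:

::

    # Session table of rank 0 only
    cephfs-table-tool 0 reset session
    # Snap table on all 'in' ranks
    cephfs-table-tool all reset snap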

MDS map reset
-------------

Once the in-RADOS state of the filesystem (i.e. contents of the metadata pool)
is somewhat recovered, it may be necessary to update the MDS map to reflect
the contents of the metadata pool. Use the following command to reset the MDS
map to a single MDS:

::

    ceph fs reset <fs name> --yes-i-really-mean-it

Once this is run, any in-RADOS state for MDS ranks other than 0 will be ignored;
as a result, running it can cause data loss.

One might wonder what the difference is between 'fs reset' and 'fs remove; fs new'. The
key distinction is that doing a remove/new will leave rank 0 in 'creating' state, such
that it would overwrite any existing root inode on disk and orphan any existing files. In
contrast, the 'reset' command will leave rank 0 in 'active' state such that the next MDS
daemon to claim the rank will go ahead and use the existing in-RADOS metadata.

Recovery from missing metadata objects
--------------------------------------

Depending on what objects are missing or corrupt, you may need to
run various commands to regenerate default versions of the
objects.

::

    # Session table
    cephfs-table-tool 0 reset session
    # SnapServer
    cephfs-table-tool 0 reset snap
    # InoTable
    cephfs-table-tool 0 reset inode
    # Journal
    cephfs-journal-tool --rank=0 journal reset
    # Root inodes ("/" and MDS directory)
    cephfs-data-scan init

Finally, you can regenerate metadata objects for missing files
and directories based on the contents of a data pool. This is
a three-phase process: first, scanning *all* objects to calculate
size and mtime metadata for inodes; second, scanning the first
object from every file to collect this metadata and inject it into
the metadata pool; third, checking inode linkages and fixing any
errors found.

::

    cephfs-data-scan scan_extents <data pool>
    cephfs-data-scan scan_inodes <data pool>
    cephfs-data-scan scan_links

The 'scan_extents' and 'scan_inodes' commands may take a *very long* time
if there are many files or very large files in the data pool.

To accelerate the process, run multiple instances of the tool.

Decide on a number of workers, and pass each worker a number within
the range 0-(worker_m - 1).

The example below shows how to run 4 workers simultaneously:

::

    # Worker 0
    cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 <data pool>
    # Worker 1
    cephfs-data-scan scan_extents --worker_n 1 --worker_m 4 <data pool>
    # Worker 2
    cephfs-data-scan scan_extents --worker_n 2 --worker_m 4 <data pool>
    # Worker 3
    cephfs-data-scan scan_extents --worker_n 3 --worker_m 4 <data pool>

    # Worker 0
    cephfs-data-scan scan_inodes --worker_n 0 --worker_m 4 <data pool>
    # Worker 1
    cephfs-data-scan scan_inodes --worker_n 1 --worker_m 4 <data pool>
    # Worker 2
    cephfs-data-scan scan_inodes --worker_n 2 --worker_m 4 <data pool>
    # Worker 3
    cephfs-data-scan scan_inodes --worker_n 3 --worker_m 4 <data pool>

It is **important** to ensure that all workers have completed the
scan_extents phase before any workers enter the scan_inodes phase.
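One way to enforce this ordering is to start each phase's workers as background
shell jobs and wait for all of them to exit before beginning the next phase; a
minimal sketch for the same four workers (``<data pool>`` is still a placeholder):

::

    # Launch all scan_extents workers, then block until every one has exited
    for n in 0 1 2 3; do
        cephfs-data-scan scan_extents --worker_n $n --worker_m 4 <data pool> &
    done
    wait

    # Only then launch the scan_inodes workers
    for n in 0 1 2 3; do
        cephfs-data-scan scan_inodes --worker_n $n --worker_m 4 <data pool> &
    done
    wait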

After completing the metadata recovery, you may want to run a cleanup
operation to delete the ancillary data generated during recovery.

::

    cephfs-data-scan cleanup <data pool>

Finding files affected by lost data PGs
---------------------------------------

Losing a data PG may affect many files. Files are split into many objects,
so identifying which files are affected by loss of particular PGs requires
a full scan over all object IDs that may exist within the size of a file.
This type of scan may be useful for identifying which files require
restoring from a backup.

.. danger::

    This command does not repair any metadata, so when restoring files in
    this case you must *remove* the damaged file, and replace it in order
    to have a fresh inode. Do not overwrite damaged files in place.

If you know that objects have been lost from PGs, use the ``pg_files``
subcommand to scan for files that may have been damaged as a result:

::

    cephfs-data-scan pg_files <path> <pg id> [<pg id>...]

For example, if you have lost data from PGs 1.4 and 4.5, and you would like
to know which files under /home/bob might have been damaged:

::

    cephfs-data-scan pg_files /home/bob 1.4 4.5

The output will be a list of paths to potentially damaged files, one
per line.
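When restoring an affected file from a backup, remember to remove the damaged
copy first rather than overwriting it in place (see the danger note above); for
example, with purely hypothetical mount and backup paths:

::

    # Remove the damaged file so the restore allocates a fresh inode
    rm /mnt/cephfs/home/bob/damaged_file
    # Then copy the file back in from the backup (paths are hypothetical)
    cp /backup/home/bob/damaged_file /mnt/cephfs/home/bob/damaged_file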

Note that this command acts as a normal CephFS client to find all the
files in the filesystem and read their layouts, so the MDS must be
up and running.

Using an alternate metadata pool for recovery
---------------------------------------------

.. warning::

    There has not been extensive testing of this procedure. It should be
    undertaken with great care.

If an existing filesystem is damaged and inoperative, it is possible to create
a fresh metadata pool and attempt to reconstruct the filesystem metadata
into this new pool, leaving the old metadata in place. This could be used to
make a safer attempt at recovery since the existing metadata pool would not be
overwritten.

.. caution::

    During this process, multiple metadata pools will contain data referring to
    the same data pool. Extreme caution must be exercised to avoid changing the
    data pool contents while this is the case. Once recovery is complete, the
    damaged metadata pool should be deleted.

To begin this process, first create the fresh metadata pool and initialize
it with empty file system data structures:

::

    ceph fs flag set enable_multiple true --yes-i-really-mean-it
    ceph osd pool create recovery <pg-num> replicated <crush-ruleset-name>
    ceph fs new recovery-fs recovery <data pool> --allow-dangerous-metadata-overlay
    cephfs-data-scan init --force-init --filesystem recovery-fs --alternate-pool recovery
    ceph fs reset recovery-fs --yes-i-really-mean-it
    cephfs-table-tool recovery-fs:all reset session
    cephfs-table-tool recovery-fs:all reset snap
    cephfs-table-tool recovery-fs:all reset inode

Next, run the recovery toolset using the ``--alternate-pool`` argument to output
results to the alternate pool:

::

    cephfs-data-scan scan_extents --alternate-pool recovery --filesystem <original filesystem name> <original data pool name>
    cephfs-data-scan scan_inodes --alternate-pool recovery --filesystem <original filesystem name> --force-corrupt --force-init <original data pool name>
    cephfs-data-scan scan_links --filesystem recovery-fs

If the damaged filesystem contains dirty journal data, it may be recovered next
with:

::

    cephfs-journal-tool --rank=<original filesystem name>:0 event recover_dentries list --alternate-pool recovery
    cephfs-journal-tool --rank recovery-fs:0 journal reset --force

After recovery, some recovered directories will have incorrect statistics.
Ensure the parameters ``mds_verify_scatter`` and ``mds_debug_scatterstat`` are set
to false (the default) to prevent the MDS from checking the statistics, then
run a forward scrub to repair them. Ensure you have an MDS running and issue:

::

    ceph daemon mds.a scrub_path / recursive repair
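
    # To double-check the settings mentioned above, the running MDS can also
    # be queried over its admin socket (mds.a is reused here only as an
    # example daemon name; both options should be at their default of false):
    ceph daemon mds.a config get mds_verify_scatter
    ceph daemon mds.a config get mds_debug_scatterstat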