.. _mds-scrub:

======================
Ceph File System Scrub
======================

CephFS provides a set of scrub commands that let the cluster administrator
(operator) check the consistency of a file system. Scrub comes in two forms:

#. Forward Scrub: The scrub operation starts at the root of the file system
   (or a subdirectory) and looks at everything that can be touched in the
   hierarchy to ensure consistency.

#. Backward Scrub: The scrub operation looks at every RADOS object in the
   file system pools and maps it back to the file system hierarchy.

This document details the commands used to initiate and control forward scrub
(referred to as scrub hereafter).

.. warning::

   CephFS forward scrubs are started and manipulated on rank 0. All scrub
   commands must be directed at rank 0.

Initiate File System Scrub
==========================

To start a scrub operation for a directory tree, use the following command::

   ceph tell mds.<fsname>:0 scrub start <path> [scrubopts] [tag]

where ``scrubopts`` is a comma-delimited list of ``recursive``, ``force``, or
``repair``, and ``tag`` is an optional custom string tag (the default is a
generated UUID). An example command is::

   ceph tell mds.cephfs:0 scrub start / recursive
   {
       "return_code": 0,
       "scrub_tag": "6f0d204c-6cfd-4300-9e02-73f382fd23c1",
       "mode": "asynchronous"
   }

Recursive scrub is asynchronous (as hinted by ``mode`` in the output above).
Asynchronous scrubs must be polled using ``scrub status`` to determine their
status.

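The ``scrub start`` invocation described above can be assembled
programmatically. The following is a minimal sketch, not part of Ceph: the
helper name ``build_scrub_start`` is hypothetical, and it assumes the
arguments are positional, so it refuses a tag that is given without any
options.

```python
import shlex

# Options named in the text above; anything else is rejected.
VALID_SCRUBOPTS = ("recursive", "force", "repair")


def build_scrub_start(fsname, path, scrubopts=(), tag=None):
    """Return the argv list for ``ceph tell mds.<fsname>:0 scrub start``.

    Hypothetical helper for illustration only.
    """
    unknown = [opt for opt in scrubopts if opt not in VALID_SCRUBOPTS]
    if unknown:
        raise ValueError(f"unknown scrub option(s): {', '.join(unknown)}")
    # Assumption: arguments are positional, so a tag without options would
    # land in the scrubopts slot. This sketch refuses that case.
    if tag is not None and not scrubopts:
        raise ValueError("a tag requires at least one scrub option")
    # Scrub commands must be directed at rank 0.
    argv = ["ceph", "tell", f"mds.{fsname}:0", "scrub", "start", path]
    if scrubopts:
        argv.append(",".join(scrubopts))  # comma-delimited list
    if tag is not None:
        argv.append(tag)
    return argv


print(shlex.join(build_scrub_start("cephfs", "/", ["recursive"])))
```

The printed line matches the example command shown earlier. The returned list
can be passed to ``subprocess.run`` on a host that has the ``ceph`` CLI and
suitable credentials.
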
The scrub tag is used to differentiate scrubs and also to mark each inode's
first data object in the default data pool (where the backtrace information is
stored) with a ``scrub_tag`` extended attribute with the value of the tag. You
can verify an inode was scrubbed by looking at the extended attribute using the
RADOS utilities.

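As a rough illustration of that check, the sketch below builds a ``rados``
command that reads the ``scrub_tag`` attribute. It rests on assumptions not
stated in this document: that an inode's first data object is named with the
inode number in hexadecimal followed by ``.00000000``, and that the default
data pool is called ``cephfs_data`` (a placeholder). Both helper names are
hypothetical.

```python
def first_data_object(ino):
    """Name of an inode's first data object.

    Assumption: objects are named <inode number in hex>.<8-hex-digit index>,
    with index 0 for the first object.
    """
    return f"{ino:x}.{0:08x}"


def build_getxattr(pool, ino):
    """Return the argv list that reads the scrub_tag xattr with the rados tool."""
    return ["rados", "-p", pool, "getxattr", first_data_object(ino), "scrub_tag"]


# 1099511627776 is 0x10000000000; the pool name is a placeholder.
print(" ".join(build_getxattr("cephfs_data", 1099511627776)))
```

If the assumptions hold, the command prints the tag of the most recent scrub
that visited the inode.
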
Scrubs work for multiple active MDS (multiple ranks). The scrub is managed by
rank 0 and distributed across MDS as appropriate.


Monitor (ongoing) File System Scrubs
====================================

The status of ongoing scrubs can be monitored and polled using the
``scrub status`` command. This command lists ongoing scrubs (identified by
tag) along with the path and options used to initiate each scrub::

   ceph tell mds.cephfs:0 scrub status
   {
       "status": "scrub active (85 inodes in the stack)",
       "scrubs": {
           "6f0d204c-6cfd-4300-9e02-73f382fd23c1": {
               "path": "/",
               "options": "recursive"
           }
       }
   }

The ``status`` field shows the number of inodes scheduled to be scrubbed at
that moment, so it can change between ``scrub status`` invocations. A
high-level summary of the scrub operation (which includes the operation state
and the paths on which scrub was triggered) is also displayed in
``ceph status``::

   ceph status
   [...]

   task status:
     scrub status:
         mds.0: active [paths:/]

   [...]

A scrub is complete when it no longer shows up in the ``scrub status`` list
(although that may change in future releases). Any damage will be reported via
cluster health warnings.

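The completion rule above can be checked against the JSON printed by
``scrub status``. This is a minimal sketch: the function name
``scrub_in_progress`` is hypothetical, the first sample is the output shown
earlier in this section, and the empty ``scrubs`` object is an assumed shape
for an idle MDS.

```python
import json


def scrub_in_progress(status_json, tag):
    """True while `tag` is still listed under "scrubs" in scrub status output."""
    status = json.loads(status_json)
    return tag in status.get("scrubs", {})


sample = """
{
    "status": "scrub active (85 inodes in the stack)",
    "scrubs": {
        "6f0d204c-6cfd-4300-9e02-73f382fd23c1": {
            "path": "/",
            "options": "recursive"
        }
    }
}
"""
tag = "6f0d204c-6cfd-4300-9e02-73f382fd23c1"
print(scrub_in_progress(sample, tag))            # tag is listed
print(scrub_in_progress('{"scrubs": {}}', tag))  # hypothetical idle output
```

A polling loop would run ``ceph tell mds.<fsname>:0 scrub status``
periodically and stop once this function returns ``False`` for the tag
returned by ``scrub start``.
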
Control (ongoing) File System Scrubs
====================================

- Pause: Pausing ongoing scrub operations means that no new or pending inodes
  are scrubbed once the in-flight RADOS ops (for the inodes currently being
  scrubbed) finish::

     ceph tell mds.cephfs:0 scrub pause
     {
         "return_code": 0
     }

  The ``scrub status`` after pausing reflects the paused state. At this point,
  initiating new scrub operations (via ``scrub start``) would just queue the
  inode for scrub::

     ceph tell mds.cephfs:0 scrub status
     {
         "status": "PAUSED (66 inodes in the stack)",
         "scrubs": {
             "6f0d204c-6cfd-4300-9e02-73f382fd23c1": {
                 "path": "/",
                 "options": "recursive"
             }
         }
     }

- Resume: Resuming restarts a paused scrub operation::

     ceph tell mds.cephfs:0 scrub resume
     {
         "return_code": 0
     }

- Abort: Aborting ongoing scrub operations removes pending inodes from the
  scrub queue (thereby aborting the scrub) after in-flight RADOS ops (for the
  inodes that are currently being scrubbed) finish::

     ceph tell mds.cephfs:0 scrub abort
     {
         "return_code": 0
     }

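To tell an active scrub from a paused one in a script, the ``status`` string
can be split into its state and pending inode count. The sketch below is
based only on the two example strings in this document. The exact wording may
differ between releases, and ``parse_scrub_status`` is a hypothetical helper.

```python
import re

# Matches strings such as "PAUSED (66 inodes in the stack)".
STACK_RE = re.compile(r"^(?P<state>.+?) \((?P<count>\d+) inodes in the stack\)$")


def parse_scrub_status(status):
    """Split a scrub status string into (state, pending inode count).

    Returns (status, None) when the string does not follow the pattern
    seen in the examples.
    """
    m = STACK_RE.match(status)
    if m is None:
        return status, None
    return m.group("state"), int(m.group("count"))


print(parse_scrub_status("scrub active (85 inodes in the stack)"))
print(parse_scrub_status("PAUSED (66 inodes in the stack)"))
```

The two calls print ``('scrub active', 85)`` and ``('PAUSED', 66)``.
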
Damages
=======

The types of damage that can be reported and repaired by File System Scrub are:

* DENTRY : Inode's dentry is missing.

* DIR_FRAG : Inode's directory fragment(s) are missing.

* BACKTRACE : Inode's backtrace in the data pool is corrupted.