Tools that manage md devices can be found at
   http://www.<country>.kernel.org/pub/linux/utils/raid/....


Boot time assembly of RAID arrays
---------------------------------

You can boot with your md device using the following kernel command
lines:

for old raid arrays without persistent superblocks:
  md=<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,dev1,...,devn

for raid arrays with persistent superblocks:
  md=<md device no.>,dev0,dev1,...,devn
or, to assemble a partitionable array:
  md=d<md device no.>,dev0,dev1,...,devn

md device no. = the number of the md device ...
              0 means md0,
              1 md1,
              2 md2,
              3 md3,
              4 md4

raid level = -1 linear mode
              0 striped mode
              other modes are only supported with persistent super blocks

chunk size factor = (raid-0 and raid-1 only)
              Set the chunk size as 4k << n.

fault level = totally ignored

dev0-devn: e.g. /dev/hda1,/dev/hdc1,/dev/sda1,/dev/sdb1

A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this:

e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
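To make the two command-line forms concrete, here is a minimal Python sketch that splits an md= value into its fields. The names `parse_md_param` and `MdSetup` are hypothetical and this is not the kernel's own parser; in particular, treating a numeric second field as the marker of the old, superblock-less form is an assumption made for illustration. The chunk size follows the rule above, 4k << n, i.e. 4096 << n bytes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MdSetup:
    """One parsed md= boot parameter (hypothetical structure)."""
    minor: int                          # md device no.: 0 -> md0, 1 -> md1, ...
    partitionable: bool                 # True for the md=d<no.>,... form
    level: Optional[int] = None         # -1 linear, 0 striped; None = use superblocks
    chunk_bytes: Optional[int] = None   # only set for the old form
    devices: List[str] = field(default_factory=list)


def parse_md_param(value: str) -> MdSetup:
    """Parse the text after 'md=' (illustrative, not the kernel's parser)."""
    fields = value.split(",")
    head = fields[0]
    partitionable = head.startswith("d")
    minor = int(head[1:] if partitionable else head)
    rest = fields[1:]

    # Assumption: a numeric second field selects the old, superblock-less form
    # md=<no.>,<raid level>,<chunk size factor>,<fault level>,dev0,...,devn
    if rest and rest[0].lstrip("-").isdigit():
        if len(rest) < 3:
            raise ValueError("old form needs level, chunk factor and fault level")
        level = int(rest[0])
        if level not in (-1, 0):
            raise ValueError("only linear (-1) and striped (0) work without superblocks")
        chunk_bytes = 4096 << int(rest[1])   # "4k << n"
        # rest[2] is the fault level, which md ignores.
        return MdSetup(minor, partitionable, level, chunk_bytes, rest[3:])

    # Persistent-superblock form: md=<no.>,dev0,... or md=d<no.>,dev0,...
    return MdSetup(minor, partitionable, devices=rest)


loadlin = parse_md_param("0,0,4,0,/dev/hdb2,/dev/hdc3")
print(loadlin.minor, loadlin.level, loadlin.chunk_bytes, loadlin.devices)
```

For the loadlin line above, md=0,0,4,0,/dev/hdb2,/dev/hdc3 describes md0 in striped mode with a 64 KiB (4096 << 4) chunk over two devices.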


Boot time autodetection of RAID arrays
--------------------------------------

When md is compiled into the kernel (not as a module), partitions of
type 0xfd are scanned and automatically assembled into RAID arrays.
This autodetection may be suppressed with the kernel parameter
"raid=noautodetect". As of kernel 2.6.9, only drives with a type 0
superblock can be autodetected and run at boot time.
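The partition-type test itself is simple. As an illustration only, the sketch below reads the four primary entries of a classic MBR partition table (a 512-byte first sector ending in the bytes 0x55 0xAA, with 16-byte entries starting at offset 446 and the type byte at offset 4 of each entry) and reports those of type 0xfd. The function name is hypothetical, this is not how the kernel's partition code is structured, and the sketch ignores extended and logical partitions entirely.

```python
import struct
from typing import List

MBR_SIGNATURE = b"\x55\xaa"      # last two bytes of a valid MBR sector
PARTITION_TABLE_OFFSET = 446     # four 16-byte primary entries start here
RAID_AUTODETECT_TYPE = 0xFD      # partition type that md autodetection looks for


def raid_autodetect_partitions(sector0: bytes) -> List[int]:
    """Return the 1-based numbers of primary MBR entries with type 0xfd."""
    if len(sector0) < 512 or sector0[510:512] != MBR_SIGNATURE:
        return []
    found = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * i
        entry = sector0[start:start + 16]
        part_type = entry[4]                                   # type (system ID) byte
        (num_sectors,) = struct.unpack_from("<I", entry, 12)   # little-endian size
        if part_type == RAID_AUTODETECT_TYPE and num_sectors > 0:
            found.append(i + 1)
    return found


# Demo: an otherwise empty MBR whose second entry is marked 0xfd.
demo = bytearray(512)
demo[510:512] = MBR_SIGNATURE
second = PARTITION_TABLE_OFFSET + 16
demo[second + 4] = RAID_AUTODETECT_TYPE
struct.pack_into("<I", demo, second + 12, 2048)
print(raid_autodetect_partitions(bytes(demo)))
```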

The kernel parameter "raid=partitionable" (or "raid=part") means
that all auto-detected arrays are assembled as partitionable.

Boot time assembly of degraded/dirty arrays
-------------------------------------------

If a raid5 or raid6 array is both dirty and degraded, it could have
undetectable data corruption. Because the array is 'dirty', its parity
cannot be trusted; because it is degraded, some data blocks are missing
and could only be rebuilt from that untrusted parity, so they cannot
be reconstructed reliably.
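A tiny worked example shows why the combination is dangerous. The sketch models one single-parity stripe (as in raid5) using XOR parity over small integers instead of real blocks: a write updates one data block but the system crashes before the parity is rewritten (dirty), and then the disk holding another block fails (degraded). Reconstruction silently produces the wrong value, and nothing left in the stripe can reveal it.

```python
from functools import reduce
from operator import xor

# One stripe: three data blocks plus XOR parity (small ints stand in for blocks).
data = [0x11, 0x22, 0x44]
parity = reduce(xor, data)            # 0x77, consistent with the data

# A write reaches the disk holding block 1, then the system crashes before
# the matching parity update is written: the stripe is now 'dirty'.
data[1] = 0x99

# Later the disk holding block 0 fails: the array is also 'degraded'.
lost = data[0]
rebuilt = parity ^ data[1] ^ data[2]  # the usual single-parity reconstruction

print(hex(lost), hex(rebuilt))        # 0x11 0xaa
```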

For this reason, md will normally refuse to start such an array. The
sysadmin must then take explicit action to start the array despite
the possible corruption. This is normally done with
   mdadm --assemble --force ....

This option is not really available if the array holds the root
filesystem. To support booting from such an array, md provides a
module parameter "start_dirty_degraded" which, when set to 1, bypasses
these checks and allows dirty degraded arrays to be started.

So, to boot with a root filesystem on a dirty degraded raid[56], use

   md-mod.start_dirty_degraded=1


Superblock formats
------------------

The md driver can support a variety of different superblock formats.
Currently, it supports superblock formats "0.90.0" and the "md-1" format
introduced in the 2.5 development series.

The kernel will autodetect which superblock format is being used.

Superblock format '0' is treated differently from the others for
legacy reasons: it is the original superblock format.


General Rules - apply for all superblock formats
------------------------------------------------

An array is 'created' by writing appropriate superblocks to all
devices.

It is 'assembled' by associating each of these devices with a
particular md virtual device. Once it is completely assembled, it can
be accessed.

An array should be created by a user-space tool. This will write
superblocks to all devices. It will usually mark the array as
'unclean', or with some devices missing, so that the kernel md driver
can create appropriate redundancy (copying in raid1, parity
calculation in raid4/5).

When an array is assembled, it is first initialized with the
SET_ARRAY_INFO ioctl. This contains, in particular, a major and minor
version number. The major version number selects which superblock
format is to be used. The minor number might be used to tune handling
of the format, such as suggesting where on each device to look for the
superblock.

Then each device is added using the ADD_NEW_DISK ioctl. This
provides, in particular, a major and minor number identifying the
device to add.

The array is started with the RUN_ARRAY ioctl.

Once started, new devices can be added. They should have an
appropriate superblock written to them, and then be passed in with
ADD_NEW_DISK.

Devices that have failed or are not yet active can be detached from an
array using HOT_REMOVE_DISK.
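The ordering rules above can be modelled as a small state machine. The class below is purely hypothetical: its method names echo the ioctls, but it makes no system calls and does not reproduce md's real checks. Devices are identified by (major, minor) pairs, as ADD_NEW_DISK does; the device numbers in the usage lines are arbitrary.

```python
from typing import Optional, Set, Tuple

Device = Tuple[int, int]   # (major, minor) of a component block device


class ToyMdArray:
    """Hypothetical model of the ioctl ordering rules; makes no system calls."""

    def __init__(self) -> None:
        self.major_version: Optional[int] = None  # selects the superblock format
        self.minor_version: Optional[int] = None  # may tune handling of that format
        self.active: Set[Device] = set()
        self.inactive: Set[Device] = set()        # failed, or added after start
        self.running = False

    def set_array_info(self, major_version: int, minor_version: int = 0) -> None:
        """SET_ARRAY_INFO: must be the first step."""
        if self.major_version is not None:
            raise RuntimeError("SET_ARRAY_INFO was already issued")
        self.major_version = major_version
        self.minor_version = minor_version

    def add_new_disk(self, dev: Device) -> None:
        """ADD_NEW_DISK: before RUN_ARRAY it assembles, afterwards it adds a new device."""
        if self.major_version is None:
            raise RuntimeError("SET_ARRAY_INFO must come before ADD_NEW_DISK")
        (self.inactive if self.running else self.active).add(dev)

    def run_array(self) -> None:
        """RUN_ARRAY: start the assembled array."""
        if self.major_version is None or not self.active:
            raise RuntimeError("nothing has been assembled")
        self.running = True

    def mark_failed(self, dev: Device) -> None:
        """Stand-in for md noticing a fault on an active device."""
        self.active.remove(dev)
        self.inactive.add(dev)

    def hot_remove_disk(self, dev: Device) -> None:
        """HOT_REMOVE_DISK: only failed or not-yet-active devices may go."""
        if dev in self.active:
            raise RuntimeError("cannot detach an active device")
        self.inactive.discard(dev)


arr = ToyMdArray()
arr.set_array_info(major_version=0)        # format-0 ("0.90.0") superblocks
for dev in [(8, 1), (8, 17), (8, 33)]:
    arr.add_new_disk(dev)
arr.run_array()
arr.mark_failed((8, 17))
arr.hot_remove_disk((8, 17))
arr.add_new_disk((8, 49))                  # new device, not yet active
print(sorted(arr.active), sorted(arr.inactive))
```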


Specific Rules that apply to format-0 super block arrays, and
arrays with no superblock (non-persistent).
-------------------------------------------------------------

An array can be 'created' by describing the array (level, chunk size,
etc.) in a SET_ARRAY_INFO ioctl. This must have major_version==0 and
raid_disks != 0.

Then uninitialized devices can be added with ADD_NEW_DISK. The
structure passed to ADD_NEW_DISK must specify the state of the device
and its role in the array.

Once started with RUN_ARRAY, uninitialized spares can be added with
HOT_ADD_DISK.


MD devices in sysfs
-------------------
md devices appear in sysfs (/sys) as regular block devices,
e.g.
   /sys/block/md0

Each 'md' device will contain a subdirectory called 'md' which
contains further md-specific information about the device.

All md devices contain:
     level
       a text file indicating the 'raid level'. This may be a standard
       numerical level prefixed by "RAID-" - e.g. "RAID-5", or some
       other name such as "linear" or "multipath".
       If no raid level has been set yet (array is still being
       assembled), this file will be empty.

     raid_disks
       a text file with a simple number indicating the number of devices
       in a fully functional array. If this is not yet known, the file
       will be empty. If an array is being resized (not currently
       possible) this will contain the larger of the old and new sizes.
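Reading these two attributes mostly comes down to treating an empty file as "not known yet". A minimal sketch, assuming the file layout and the "RAID-" prefix described here; the function names are hypothetical, and a temporary directory stands in for /sys/block/md0/md so the example runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def read_level_and_disks(md_dir: Path) -> Tuple[Optional[str], Optional[int]]:
    """Return (level, raid_disks); an empty file (still assembling) gives None."""
    level = (md_dir / "level").read_text().strip() or None
    disks = (md_dir / "raid_disks").read_text().strip()
    return level, (int(disks) if disks else None)


def numeric_level(level: Optional[str]) -> Optional[int]:
    """'RAID-5' -> 5; names such as 'linear' or 'multipath' -> None."""
    if level and level.startswith("RAID-"):
        return int(level[len("RAID-"):])
    return None


with tempfile.TemporaryDirectory() as tmp:
    md = Path(tmp)                        # stands in for /sys/block/md0/md
    (md / "level").write_text("RAID-5\n")
    (md / "raid_disks").write_text("3\n")
    assembled = read_level_and_disks(md)
    (md / "level").write_text("")         # as if the array were still being assembled
    (md / "raid_disks").write_text("")
    in_progress = read_level_and_disks(md)

print(assembled, numeric_level(assembled[0]), in_progress)
```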

As component devices are added to an md array, they appear in the 'md'
directory as new directories named
      dev-XXX
where XXX is a name that the kernel knows for the device, e.g. hdb1.
Each directory contains:

      block
        a symlink to the block device in /sys/block, e.g.
             /sys/block/md0/md/dev-hdb1/block -> ../../../../block/hdb/hdb1

      super
        A file containing an image of the superblock read from, or
        written to, that device.

      state
        A file recording the current state of the device in the array,
        which can be a comma-separated list of
              faulty   - device has been kicked from active use due to
                         a detected fault
              in_sync  - device is a fully in-sync member of the array
              spare    - device is working, but not a full member.
                         This includes spares that are in the process
                         of being recovered to.
        This list may grow in future.
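Because state is a comma-separated list that may grow, a reader should split it into flags and tolerate words it does not recognise. A short sketch with hypothetical helper names; the multi-flag value in the demo is made up purely to exercise the parser.

```python
def parse_state(text: str) -> frozenset:
    """Split a dev-XXX/state value such as 'in_sync' into a set of flags.

    Unknown words are kept rather than rejected, since the list may grow.
    """
    return frozenset(word for word in text.strip().split(",") if word)


def is_full_member(state_text: str) -> bool:
    """True for a working, fully in-sync member of the array."""
    flags = parse_state(state_text)
    return "in_sync" in flags and "faulty" not in flags


print(sorted(parse_state("faulty,in_sync\n")))   # hypothetical multi-flag value
print(is_full_member("in_sync\n"), is_full_member("spare"))
```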


An active md device will also contain an entry for each active device
in the array. These are named

    rdNN

where 'NN' is the position in the array, starting from 0.
So for a 3 drive array there will be rd0, rd1, rd2.
These are symbolic links to the appropriate 'dev-XXX' entry.
Thus, for example,
       cat /sys/block/md*/md/rd*/state
will show 'in_sync' on every line.
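The same check can be done from a program by following the rdNN links. The sketch below builds a fake md directory (dev-XXX directories plus rdNN symlinks) in a temporary location so that it runs without a real array; `rd_states` and `all_in_sync` are hypothetical helpers.

```python
import tempfile
from pathlib import Path
from typing import Dict


def rd_states(md_dir: Path) -> Dict[int, str]:
    """Map each rdNN slot to the state of the dev-XXX entry it links to."""
    states = {}
    for link in md_dir.glob("rd*"):
        suffix = link.name[2:]
        if suffix.isdigit():                        # skip any non-rdNN names
            states[int(suffix)] = (link / "state").read_text().strip()
    return dict(sorted(states.items()))


def all_in_sync(md_dir: Path) -> bool:
    states = rd_states(md_dir)
    return bool(states) and all(s == "in_sync" for s in states.values())


# Fake md directory for a 3-drive array: dev-XXX dirs plus rdNN symlinks.
with tempfile.TemporaryDirectory() as tmp:
    md = Path(tmp)
    for slot, name in enumerate(["hdb1", "hdc1", "hdd1"]):
        dev = md / f"dev-{name}"
        dev.mkdir()
        (dev / "state").write_text("in_sync\n")
        (md / f"rd{slot}").symlink_to(dev.name)     # relative symlink
    demo_states = rd_states(md)
    demo_ok = all_in_sync(md)

print(demo_states, demo_ok)
```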


Active md devices for levels that support data redundancy (1,4,5,6)
also have

   sync_action
     a text file that can be used to monitor and control the rebuild
     process. It contains one word which can be one of:
       resync        - redundancy is being recalculated after unclean
                       shutdown or creation
       recover       - a hot spare is being built to replace a
                       failed/missing device
       idle          - nothing is happening
       check         - A full check of redundancy was requested and is
                       happening. This reads all blocks and checks
                       them. A repair may also happen for some raid
                       levels.
       repair        - A full check and repair is happening. This is
                       similar to 'resync', but was requested by the
                       user, and the write-intent bitmap is NOT used to
                       optimise the process.

      This file is writable, and each of the strings that could be
      read is meaningful for writing.

       'idle' will stop an active resync/recovery etc. There is no
           guarantee that another resync/recovery will not be
           automatically started again, though some event will be
           needed to trigger this.
       'resync' or 'recover' can be used to restart the
           corresponding operation if it was stopped with 'idle'.
       'check' and 'repair' will start the appropriate process
           provided the current state is 'idle'.

   mismatch_cnt
      When performing 'check' and 'repair', and possibly when
      performing 'resync', md will count the number of errors that are
      found. The count in 'mismatch_cnt' is the number of sectors
      that were re-written, or (for 'check') would have been
      re-written. As most raid levels work in units of pages rather
      than sectors, this may be larger than the number of actual errors
      by a factor of the number of sectors in a page.
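Putting sync_action and mismatch_cnt together gives the usual scrub workflow: request a 'check' while the array is idle, then read mismatch_cnt once it finishes. The sketch below assumes a 4 KiB page and 512-byte sectors (8 sectors per page) when converting the sector count into page-sized units, per the caveat above. Helper names are hypothetical, a temporary directory stands in for the sysfs 'md' directory, and the idle test is advisory only, since the state can change between the read and the write.

```python
import tempfile
from pathlib import Path

SECTOR_SIZE = 512   # md counts mismatches in 512-byte sectors
PAGE_SIZE = 4096    # assumption: a 4 KiB page, typical on x86


def start_check(md_dir: Path, action: str = "check") -> bool:
    """Request 'check' or 'repair' via sync_action, but only from 'idle'."""
    if action not in ("check", "repair"):
        raise ValueError("only 'check' or 'repair' are handled here")
    sync_action = md_dir / "sync_action"
    if sync_action.read_text().strip() != "idle":
        return False                     # something else is running; do nothing
    sync_action.write_text(action + "\n")
    return True


def pages_with_mismatches(md_dir: Path, page_size: int = PAGE_SIZE) -> int:
    """Convert mismatch_cnt (sectors) into page-sized units, rounding up."""
    sectors = int((md_dir / "mismatch_cnt").read_text().strip())
    sectors_per_page = page_size // SECTOR_SIZE     # 8 for 4 KiB pages
    return -(-sectors // sectors_per_page)          # ceiling division


with tempfile.TemporaryDirectory() as tmp:
    md = Path(tmp)                       # stands in for /sys/block/md0/md
    (md / "sync_action").write_text("idle\n")
    (md / "mismatch_cnt").write_text("24\n")
    first = start_check(md)              # idle, so 'check' is written
    second = start_check(md, "repair")   # state is now 'check', so refused
    pages = pages_with_mismatches(md)    # 24 sectors / 8 per page
    print(first, second, pages)          # True False 3
```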

Each active md device may also have attributes specific to the
personality module that manages it.
These are specific to the implementation of the module and could
change substantially if the implementation changes.

These currently include

  stripe_cache_size  (currently raid5 only)
      number of entries in the stripe cache. This is writable, but
      there are upper and lower limits (32768, 16). Default is 128.
  stripe_cache_active (currently raid5 only)
      number of active entries in the stripe cache
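Since stripe_cache_size has fixed bounds, a tool that sets it may want to keep requests inside them. This sketch clamps a requested value to the documented range (treating both ends as inclusive, which is an assumption; the kernel may reject a value sitting exactly on a limit) and computes how full the cache is from stripe_cache_active. All names are hypothetical, and a temporary directory stands in for the sysfs 'md' directory.

```python
import tempfile
from pathlib import Path

# Documented limits for raid5's stripe_cache_size (inclusiveness assumed).
STRIPE_CACHE_MIN = 16
STRIPE_CACHE_MAX = 32768


def clamp_stripe_cache_size(requested: int) -> int:
    """Pull a requested size into the [16, 32768] range."""
    return max(STRIPE_CACHE_MIN, min(STRIPE_CACHE_MAX, requested))


def set_stripe_cache_size(md_dir: Path, requested: int) -> int:
    """Write a clamped value to md_dir/stripe_cache_size and return it."""
    value = clamp_stripe_cache_size(requested)
    (md_dir / "stripe_cache_size").write_text(f"{value}\n")
    return value


def stripe_cache_usage(md_dir: Path) -> float:
    """Fraction of stripe cache entries currently active."""
    size = int((md_dir / "stripe_cache_size").read_text())
    active = int((md_dir / "stripe_cache_active").read_text())
    return active / size


with tempfile.TemporaryDirectory() as tmp:
    md = Path(tmp)                                   # stands in for /sys/block/md0/md
    written = set_stripe_cache_size(md, 100000)      # clamped to 32768
    (md / "stripe_cache_active").write_text("8192\n")
    usage = stripe_cache_usage(md)

print(written, usage)   # 32768 0.25
```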