.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or https://opensource.org/licenses/CDDL-1.0.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd June 2, 2021
.Dt ZPOOLCONCEPTS 7
.Os
.
.Sh NAME
.Nm zpoolconcepts
.Nd overview of ZFS storage pools
.
.Sh DESCRIPTION
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width "special"
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
It is designed primarily for experimental purposes, as the fault tolerance of a
file is only as good as the file system on which it resides.
A file must be specified by a full path.
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with
.Em N No disks of size Em X No can hold Em X No bytes and can withstand Em N-1
devices failing without losing data.
.It Sy raidz , raidz1 , raidz2 , raidz3
A variation on RAID-5 that allows for better distribution of parity and
eliminates the RAID-5
.Qq write hole
.Pq in which data and parity become inconsistent after a power loss .
Data and parity are striped across all disks within a raidz group.
.Pp
A raidz group can have single, double, or triple parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with
.Em N No disks of size Em X No and Em P No parity disks can hold approximately
.Em (N-P)*X No bytes and can withstand Em P No devices failing without losing data .
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
.It Sy draid , draid1 , draid2 , draid3
A variant of raidz that provides integrated distributed hot spares, allowing
for faster resilvering while retaining the benefits of raidz.
A dRAID vdev is constructed from multiple internal raidz groups, each with
.Em D No data devices and Em P No parity devices .
These groups are distributed over all of the children in order to fully
utilize the available disk performance.
.Pp
Unlike raidz, dRAID uses a fixed stripe width (padding as necessary with
zeros) to allow fully sequential resilvering.
This fixed stripe width significantly affects both usable capacity and IOPS.
For example, with the default
.Em D=8 No and Em 4 KiB No disk sectors the minimum allocation size is Em 32 KiB .
If using compression, this relatively large allocation size can reduce the
effective compression ratio.
When using ZFS volumes and dRAID, the default of the
.Sy volblocksize
property is increased to account for the allocation size.
If a dRAID pool will hold a significant number of small blocks, it is
recommended to also add a mirrored
.Sy special
vdev to store those blocks.
.Pp
In regard to I/O, performance is similar to raidz, since for any read all
.Em D No data disks must be accessed .
Delivered random IOPS can be reasonably approximated as
.Sy floor((N-S)/(D+P))*single_drive_IOPS .
.Pp
Like raidz, a dRAID can have single-, double-, or triple-parity.
The
.Sy draid1 ,
.Sy draid2 ,
and
.Sy draid3
types can be used to specify the parity level.
The
.Sy draid
vdev type is an alias for
.Sy draid1 .
.Pp
A dRAID with
.Em N No disks of size Em X , D No data disks per redundancy group , Em P
.No parity level, and Em S No distributed hot spares can hold approximately
.Em (N-S)*(D/(D+P))*X No bytes and can withstand Em P
devices failing without losing data.
.It Sy draid Ns Oo Ar parity Oc Ns Oo Sy \&: Ns Ar data Ns Sy d Oc Ns Oo Sy \&: Ns Ar children Ns Sy c Oc Ns Oo Sy \&: Ns Ar spares Ns Sy s Oc
A non-default dRAID configuration can be specified by appending one or more
of the following optional arguments to the
.Sy draid
keyword:
.Bl -tag -compact -width "children"
.It Ar parity
The parity level (1-3).
.It Ar data
The number of data devices per redundancy group.
In general, a smaller value of
.Em D No will increase IOPS, improve the compression ratio ,
and speed up resilvering at the expense of total usable capacity.
Defaults to
.Em 8 , No unless Em N-P-S No is less than Em 8 .
.It Ar children
The expected number of children.
Useful as a cross-check when listing a large number of devices.
An error is returned when the provided number of children differs.
.It Ar spares
The number of distributed hot spares.
Defaults to zero.
.El
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely for deduplication tables.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between those devices.
.It Sy special
A device dedicated solely for allocating various kinds of internal metadata,
and optionally small file blocks.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested, so a mirror or raidz virtual device can only
contain files or disks.
Mirrors of mirrors
.Pq or other combinations
are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices to balance data
among devices.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
.Pp
Virtual devices are specified one at a time on the command line,
separated by whitespace.
Keywords like
.Sy mirror No and Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates a pool with two root vdevs,
each a mirror of two disks:
.Dl # Nm zpool Cm create Ar mypool Sy mirror Ar sda sdb Sy mirror Ar sdc sdd
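.Pp
Raidz and dRAID vdevs are specified the same way, using their keywords; for
example, the following would create a pool with a single double-parity raidz
vdev, or one with a double-parity dRAID vdev using four data disks per
redundancy group, seven children, and one distributed spare
(all device names are illustrative):
.Dl # Nm zpool Cm create Ar pool Sy raidz2 Ar sda sdb sdc sdd sde sdf
.Dl # Nm zpool Cm create Ar pool Sy draid2:4d:7c:1s Ar sda sdb sdc sdd sde sdf sdg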
.
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data is checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states:
.Sy online , degraded , No or Sy faulted .
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as a mirror or raidz device,
is potentially impacted by the state of its associated vdevs,
or component devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs is in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs is in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported when a device was unavailable, then the device will be
identified by a unique identifier instead of its path since the path was never
correct in the first place.
.El
.Pp
Checksum errors represent events where a disk returned data that was expected
to be correct, but was not.
In other words, these are instances of silent data corruption.
The checksum errors are reported in
.Nm zpool Cm status
and
.Nm zpool Cm events .
When a block is stored redundantly, a damaged block may be reconstructed
(e.g. from raidz parity or a mirrored copy).
In this case, ZFS reports the checksum error against the disks that contained
damaged data.
If a block cannot be reconstructed (e.g. due to 3 disks being damaged
in a raidz2 group), it is not possible to determine which disks were silently
corrupted.
In this case, checksum errors are reported for all disks on which the block
is stored.
.Pp
If a device is removed and later re-attached to the system,
ZFS attempts to online the device automatically.
Device attachment detection is hardware-dependent
and might not be supported on all platforms.
.
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool, but when an active device
fails, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy spare Ar sdc sdd
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
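.Pp
For example, a hot spare could later be added to, or removed from, an existing
pool as follows (the device name is illustrative):
.Dl # Nm zpool Cm add Ar pool Sy spare Ar sde
.Dl # Nm zpool Cm remove Ar pool sde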
.Pp
If a pool has a shared spare that is currently being used, the pool cannot be
exported since other pools may use this shared spare, which may lead to
potential data corruption.
.Pp
Shared spares add some risk.
If the pools are imported on different hosts,
and both pools suffer a device failure at the same time,
both could attempt to use the spare at the same time.
This may not be detected, resulting in data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
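.Pp
For example, an in-progress replacement by an illustrative hot spare
.Ar sde
could be cancelled with:
.Dl # Nm zpool Cm detach Ar pool sde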
.Pp
The
.Sy draid
vdev type provides distributed hot spares.
These hot spares are named after the dRAID vdev they're a part of
.Po Sy draid1 Ns - Ns Ar 2 Ns - Ns Ar 3 No specifies spare Ar 3 No of vdev Ar 2 ,
.No which is a single parity dRAID Pc
and may only be used by that dRAID vdev.
Otherwise, they behave the same as normal hot spares.
.Pp
Spares cannot replace log devices.
.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log Ar sdc
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
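.Pp
For example, a mirrored log could be specified at pool creation time
(device names are illustrative):
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log mirror Ar sdc sdd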
.Pp
Log devices can be added, replaced, attached, detached and removed.
In addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low latency media.
Using cache devices provides the greatest performance improvement for random
read-workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy cache Ar sdc sdd
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is persistent across reboots and restored
asynchronously when importing the pool in L2ARC (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled Ns = Ns Sy 0 .
For cache devices smaller than
.Em 1 GiB ,
we do not write the metadata structures
required for rebuilding the L2ARC in order not to waste space.
This can be changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header
.Pq Em 512 B
is updated even if no metadata structures are written.
Setting
.Sy l2arc_headroom Ns = Ns Sy 0
will result in scanning the full-length ARC lists for cacheable content to be
written in L2ARC (persistent L2ARC).
If a cache device is added with
.Nm zpool Cm add
its label and header will be overwritten and its contents will not be
restored in L2ARC, even if the device was previously part of the pool.
If a cache device is onlined with
.Nm zpool Cm online
its contents will be restored in L2ARC.
This is useful in cases of memory pressure,
where the contents of the cache device are not fully restored in L2ARC.
The user can offline and online the cache device when there is less memory pressure
77f6826b | 447 | in order to fully restore its contents to L2ARC. |
.
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions
.Pq like Nm zfs Cm destroy ,
an administrator can checkpoint the pool's state and, in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, certain operations are not allowed while a pool has a checkpoint.
Specifically, vdev removal/attach/detach, mirror splitting, and
changing the pool's GUID.
Adding a new vdev is supported, but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Dl # Nm zpool Cm checkpoint Ar pool
.Pp
To later rewind to its checkpointed state, you need to first export it and
then rewind it during import:
.Dl # Nm zpool Cm export Ar pool
.Dl # Nm zpool Cm import Fl -rewind-to-checkpoint Ar pool
.Pp
To discard the checkpoint from a pool:
.Dl # Nm zpool Cm checkpoint Fl d Ar pool
.Pp
Dataset reservations (controlled by the
.Sy reservation No and Sy refreservation
properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.
.Ss Special Allocation Class
Allocations in the special class are dedicated to specific block types.
By default this includes all metadata, the indirect blocks of user data, and
any deduplication tables.
The class can also be provisioned to accept small file blocks.
.Pp
A pool must always have at least one normal
.Pq non- Ns Sy dedup Ns /- Ns Sy special
vdev before
other devices can be assigned to the special class.
If the
.Sy special
class becomes full, then allocations intended for it
will spill back into the normal class.
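.Pp
For example, a pool with a mirrored special vdev could be created as follows
(device names are illustrative):
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy special mirror Ar sdc sdd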
.Pp
Deduplication tables can be excluded from the special class by unsetting the
.Sy zfs_ddt_data_is_special
ZFS module parameter.
.Pp
Inclusion of small file blocks in the special class is opt-in.
Each dataset can control the size of small file blocks allowed
in the special class by setting the
.Sy special_small_blocks
property to nonzero.
See
.Xr zfsprops 7
for more info on this property.