1 '\" te
2 .\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
3 .\" Copyright (c) 2019, 2020 by Delphix. All rights reserved.
4 .\" Copyright (c) 2019 Datto Inc.
5 .\" The contents of this file are subject to the terms of the Common Development
6 .\" and Distribution License (the "License"). You may not use this file except
7 .\" in compliance with the License. You can obtain a copy of the license at
8 .\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
9 .\"
10 .\" See the License for the specific language governing permissions and
11 .\" limitations under the License. When distributing Covered Code, include this
12 .\" CDDL HEADER in each file and include the License file at
13 .\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
14 .\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
15 .\" own identifying information:
16 .\" Portions Copyright [yyyy] [name of copyright owner]
17 .TH ZFS-MODULE-PARAMETERS 5 "Aug 24, 2020" OpenZFS
18 .SH NAME
19 zfs\-module\-parameters \- ZFS module parameters
20 .SH DESCRIPTION
21 .sp
22 .LP
23 Description of the different parameters to the ZFS module.
24
25 .SS "Module parameters"
26 .sp
27 .LP
28
29 .sp
30 .ne 2
31 .na
32 \fBdbuf_cache_max_bytes\fR (ulong)
33 .ad
34 .RS 12n
35 Maximum size in bytes of the dbuf cache. The target size is determined by the
36 MIN versus \fB1/2^dbuf_cache_shift\fR (1/32) of the target ARC size. The
37 behavior of the dbuf cache and its associated settings can be observed via the
38 \fB/proc/spl/kstat/zfs/dbufstats\fR kstat.
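.sp
For example, assuming a target ARC size of 4GB and the default
\fBdbuf_cache_shift\fR of 5, the dbuf cache target works out to
MIN(\fBdbuf_cache_max_bytes\fR, 4GB / 2^5) = 128MB.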
39 .sp
40 Default value: \fBULONG_MAX\fR.
41 .RE
42
43 .sp
44 .ne 2
45 .na
46 \fBdbuf_metadata_cache_max_bytes\fR (ulong)
47 .ad
48 .RS 12n
49 Maximum size in bytes of the metadata dbuf cache. The target size is
50 determined by the MIN versus \fB1/2^dbuf_metadata_cache_shift\fR (1/64) of the
51 target ARC size. The behavior of the metadata dbuf cache and its associated
52 settings can be observed via the \fB/proc/spl/kstat/zfs/dbufstats\fR kstat.
53 .sp
54 Default value: \fBULONG_MAX\fR.
55 .RE
56
57 .sp
58 .ne 2
59 .na
60 \fBdbuf_cache_hiwater_pct\fR (uint)
61 .ad
62 .RS 12n
63 The percentage over \fBdbuf_cache_max_bytes\fR when dbufs must be evicted
64 directly.
65 .sp
66 Default value: \fB10\fR%.
67 .RE
68
69 .sp
70 .ne 2
71 .na
72 \fBdbuf_cache_lowater_pct\fR (uint)
73 .ad
74 .RS 12n
75 The percentage below \fBdbuf_cache_max_bytes\fR when the evict thread stops
76 evicting dbufs.
77 .sp
78 Default value: \fB10\fR%.
79 .RE
80
81 .sp
82 .ne 2
83 .na
84 \fBdbuf_cache_shift\fR (int)
85 .ad
86 .RS 12n
87 Set the size of the dbuf cache, \fBdbuf_cache_max_bytes\fR, to a log2 fraction
88 of the target ARC size.
89 .sp
90 Default value: \fB5\fR.
91 .RE
92
93 .sp
94 .ne 2
95 .na
96 \fBdbuf_metadata_cache_shift\fR (int)
97 .ad
98 .RS 12n
99 Set the size of the dbuf metadata cache, \fBdbuf_metadata_cache_max_bytes\fR,
100 to a log2 fraction of the target ARC size.
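.sp
For example, with a target ARC size of 4GB, the default value of 6 yields a
metadata dbuf cache target of 4GB / 2^6 = 64MB.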
101 .sp
102 Default value: \fB6\fR.
103 .RE
104
105 .sp
106 .ne 2
107 .na
108 \fBdmu_object_alloc_chunk_shift\fR (int)
109 .ad
110 .RS 12n
111 dnode slots allocated in a single operation as a power of 2. The default value
112 minimizes lock contention for the bulk operation performed.
113 .sp
114 Default value: \fB7\fR (128).
115 .RE
116
117 .sp
118 .ne 2
119 .na
120 \fBdmu_prefetch_max\fR (int)
121 .ad
122 .RS 12n
123 Limit the amount of data (in bytes) that can be prefetched with one call.
124 This helps to limit the amount of memory that can be used by prefetching.
125 .sp
126 Default value: \fB134,217,728\fR (128MB).
127 .RE
128
129 .sp
130 .ne 2
131 .na
132 \fBignore_hole_birth\fR (int)
133 .ad
134 .RS 12n
135 This is an alias for \fBsend_holes_without_birth_time\fR.
136 .RE
137
138 .sp
139 .ne 2
140 .na
141 \fBl2arc_feed_again\fR (int)
142 .ad
143 .RS 12n
144 Turbo L2ARC warm-up. When the L2ARC is cold the fill interval will be set as
145 fast as possible.
146 .sp
147 Use \fB1\fR for yes (default) and \fB0\fR to disable.
148 .RE
149
150 .sp
151 .ne 2
152 .na
153 \fBl2arc_feed_min_ms\fR (ulong)
154 .ad
155 .RS 12n
156 Min feed interval in milliseconds. Requires \fBl2arc_feed_again=1\fR and only
157 applies while the L2ARC is warming up (turbo warm-up).
158 .sp
159 Default value: \fB200\fR.
160 .RE
161
162 .sp
163 .ne 2
164 .na
165 \fBl2arc_feed_secs\fR (ulong)
166 .ad
167 .RS 12n
168 Seconds between L2ARC writing
169 .sp
170 Default value: \fB1\fR.
171 .RE
172
173 .sp
174 .ne 2
175 .na
176 \fBl2arc_headroom\fR (ulong)
177 .ad
178 .RS 12n
179 How far through the ARC lists to search for L2ARC cacheable content, expressed
180 as a multiplier of \fBl2arc_write_max\fR.
181 ARC persistence across reboots can be achieved with persistent L2ARC by setting
182 this parameter to \fB0\fR, allowing the full length of ARC lists to be searched
183 for cacheable content.
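.sp
For example, with the default \fBl2arc_write_max\fR of 8MB and the default
headroom of 2, roughly 16MB worth of buffers at the tail of the ARC lists are
scanned on each feed interval.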
184 .sp
185 Default value: \fB2\fR.
186 .RE
187
188 .sp
189 .ne 2
190 .na
191 \fBl2arc_headroom_boost\fR (ulong)
192 .ad
193 .RS 12n
194 Scales \fBl2arc_headroom\fR by this percentage when L2ARC contents are being
195 successfully compressed before writing. A value of \fB100\fR disables this
196 feature.
197 .sp
198 Default value: \fB200\fR%.
199 .RE
200
201 .sp
202 .ne 2
203 .na
204 \fBl2arc_mfuonly\fR (int)
205 .ad
206 .RS 12n
207 Controls whether only MFU metadata and data are cached from ARC into L2ARC.
208 This may be desired to avoid wasting space on L2ARC when reading/writing large
209 amounts of data that are not expected to be accessed more than once. The
210 default is \fB0\fR, meaning both MRU and MFU data and metadata are cached.
211 When turning off (\fB0\fR) this feature some MRU buffers will still be present
212 in ARC and eventually cached on L2ARC. If \fBl2arc_noprefetch\fR is set to 0,
213 some prefetched buffers will be cached to L2ARC, and those might later
214 transition to MRU, in which case the \fBl2arc_mru_asize\fR arcstat will not
215 be 0. Regardless of \fBl2arc_noprefetch\fR, some MFU buffers might be evicted
216 from ARC, accessed later on as prefetches and transition to MRU as prefetches.
217 If accessed again they are counted as MRU and the \fBl2arc_mru_asize\fR arcstat
218 will not be 0. The ARC status of L2ARC buffers when they were first cached in
219 L2ARC can be seen in the \fBl2arc_mru_asize\fR, \fBl2arc_mfu_asize\fR and
220 \fBl2arc_prefetch_asize\fR arcstats when importing the pool or onlining a cache
221 device if persistent L2ARC is enabled.
222 .sp
223 Use \fB0\fR for no (default) and \fB1\fR for yes.
224 .RE
225
226 .sp
227 .ne 2
228 .na
229 \fBl2arc_meta_percent\fR (int)
230 .ad
231 .RS 12n
232 Percent of ARC size allowed for L2ARC-only headers.
233 Since L2ARC buffers are not evicted on memory pressure, too large an amount of
234 headers on a system with an irrationally large L2ARC can render it slow or unusable.
235 This parameter limits L2ARC writes and rebuilds in order to enforce this limit.
236 .sp
237 Default value: \fB33\fR%.
238 .RE
239
240 .sp
241 .ne 2
242 .na
243 \fBl2arc_trim_ahead\fR (ulong)
244 .ad
245 .RS 12n
246 Trims ahead of the current write size (\fBl2arc_write_max\fR) on L2ARC devices
247 by this percentage of write size if we have filled the device. If set to
248 \fB100\fR we TRIM twice the space required to accommodate upcoming writes. A
249 minimum of 64MB will be trimmed. It also enables TRIM of the whole L2ARC device
250 upon creation or addition to an existing pool or if the header of the device is
251 invalid upon importing a pool or onlining a cache device. A value of \fB0\fR
252 disables TRIM on L2ARC altogether and is the default as it can put significant
253 stress on the underlying storage devices. This will vary depending on how well
254 the specific device handles these commands.
255 .sp
256 Default value: \fB0\fR%.
257 .RE
258
259 .sp
260 .ne 2
261 .na
262 \fBl2arc_noprefetch\fR (int)
263 .ad
264 .RS 12n
265 Do not write buffers to L2ARC if they were prefetched but not used by
266 applications. In case there are prefetched buffers in L2ARC and this option
267 is later set to \fB1\fR, we do not read the prefetched buffers from L2ARC.
268 Setting this option to \fB0\fR is useful for caching sequential reads from the
269 disks to L2ARC and serving those reads from L2ARC later on. This may be beneficial
270 in case the L2ARC device is significantly faster in sequential reads than the
271 disks of the pool.
272 .sp
273 Use \fB1\fR to disable (default) and \fB0\fR to enable caching/reading
274 prefetches to/from L2ARC.
275 .RE
276
277 .sp
278 .ne 2
279 .na
280 \fBl2arc_norw\fR (int)
281 .ad
282 .RS 12n
283 No reads during writes.
284 .sp
285 Use \fB1\fR for yes and \fB0\fR for no (default).
286 .RE
287
288 .sp
289 .ne 2
290 .na
291 \fBl2arc_write_boost\fR (ulong)
292 .ad
293 .RS 12n
294 Cold L2ARC devices will have \fBl2arc_write_max\fR increased by this amount
295 while they remain cold.
296 .sp
297 Default value: \fB8,388,608\fR.
298 .RE
299
300 .sp
301 .ne 2
302 .na
303 \fBl2arc_write_max\fR (ulong)
304 .ad
305 .RS 12n
306 Max write bytes per interval.
307 .sp
308 Default value: \fB8,388,608\fR.
309 .RE
310
311 .sp
312 .ne 2
313 .na
314 \fBl2arc_rebuild_enabled\fR (int)
315 .ad
316 .RS 12n
317 Rebuild the L2ARC when importing a pool (persistent L2ARC). This can be
318 disabled if there are problems importing a pool or attaching an L2ARC device
319 (e.g. the L2ARC device is slow in reading stored log metadata, or the metadata
320 has become somehow fragmented/unusable).
321 .sp
322 Use \fB1\fR for yes (default) and \fB0\fR for no.
323 .RE
324
325 .sp
326 .ne 2
327 .na
328 \fBl2arc_rebuild_blocks_min_l2size\fR (ulong)
329 .ad
330 .RS 12n
331 Min size (in bytes) of an L2ARC device required in order to write log blocks
332 in it. The log blocks are used upon importing the pool to rebuild
333 the L2ARC (persistent L2ARC). Rationale: for L2ARC devices less than 1GB, the
334 amount of data l2arc_evict() evicts is significant compared to the amount of
335 restored L2ARC data. In this case do not write log blocks in L2ARC in order not
336 to waste space.
337 .sp
338 Default value: \fB1,073,741,824\fR (1GB).
339 .RE
340
341 .sp
342 .ne 2
343 .na
344 \fBmetaslab_aliquot\fR (ulong)
345 .ad
346 .RS 12n
347 Metaslab granularity, in bytes. This is roughly similar to what would be
348 referred to as the "stripe size" in traditional RAID arrays. In normal
349 operation, ZFS will try to write this amount of data to a top-level vdev
350 before moving on to the next one.
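.sp
For example, in a pool with four top-level vdevs and the default value of
512KB, a large sequential write is spread across the vdevs in roughly 512KB
portions written to each top-level vdev in turn.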
351 .sp
352 Default value: \fB524,288\fR.
353 .RE
354
355 .sp
356 .ne 2
357 .na
358 \fBmetaslab_bias_enabled\fR (int)
359 .ad
360 .RS 12n
361 Enable metaslab group biasing based on its vdev's over- or under-utilization
362 relative to the pool.
363 .sp
364 Use \fB1\fR for yes (default) and \fB0\fR for no.
365 .RE
366
367 .sp
368 .ne 2
369 .na
370 \fBmetaslab_force_ganging\fR (ulong)
371 .ad
372 .RS 12n
373 Make some blocks above a certain size be gang blocks. This option is used
374 by the test suite to facilitate testing.
375 .sp
376 Default value: \fB16,777,217\fR.
377 .RE
378
379 .sp
380 .ne 2
381 .na
382 \fBzfs_keep_log_spacemaps_at_export\fR (int)
383 .ad
384 .RS 12n
385 Prevent log spacemaps from being destroyed during pool exports and destroys.
386 .sp
387 Use \fB1\fR for yes and \fB0\fR for no (default).
388 .RE
389
390 .sp
391 .ne 2
392 .na
393 \fBzfs_metaslab_segment_weight_enabled\fR (int)
394 .ad
395 .RS 12n
396 Enable/disable segment-based metaslab selection.
397 .sp
398 Use \fB1\fR for yes (default) and \fB0\fR for no.
399 .RE
400
401 .sp
402 .ne 2
403 .na
404 \fBzfs_metaslab_switch_threshold\fR (int)
405 .ad
406 .RS 12n
407 When using segment-based metaslab selection, continue allocating
408 from the active metaslab until \fBzfs_metaslab_switch_threshold\fR
409 worth of buckets have been exhausted.
410 .sp
411 Default value: \fB2\fR.
412 .RE
413
414 .sp
415 .ne 2
416 .na
417 \fBmetaslab_debug_load\fR (int)
418 .ad
419 .RS 12n
420 Load all metaslabs during pool import.
421 .sp
422 Use \fB1\fR for yes and \fB0\fR for no (default).
423 .RE
424
425 .sp
426 .ne 2
427 .na
428 \fBmetaslab_debug_unload\fR (int)
429 .ad
430 .RS 12n
431 Prevent metaslabs from being unloaded.
432 .sp
433 Use \fB1\fR for yes and \fB0\fR for no (default).
434 .RE
435
436 .sp
437 .ne 2
438 .na
439 \fBmetaslab_fragmentation_factor_enabled\fR (int)
440 .ad
441 .RS 12n
442 Enable use of the fragmentation metric in computing metaslab weights.
443 .sp
444 Use \fB1\fR for yes (default) and \fB0\fR for no.
445 .RE
446
447 .sp
448 .ne 2
449 .na
450 \fBmetaslab_df_max_search\fR (int)
451 .ad
452 .RS 12n
453 Maximum distance to search forward from the last offset. Without this limit,
454 fragmented pools can see >100,000 iterations and metaslab_block_picker()
455 becomes the performance limiting factor on high-performance storage.
456
457 With the default setting of 16MB, we typically see less than 500 iterations,
458 even with very fragmented, ashift=9 pools. The maximum number of iterations
459 possible is: \fBmetaslab_df_max_search / (2 * (1<<ashift))\fR.
460 With the default setting of 16MB this is 16*1024 (with ashift=9) or 2048
461 (with ashift=12).
462 .sp
463 Default value: \fB16,777,216\fR (16MB)
464 .RE
465
466 .sp
467 .ne 2
468 .na
469 \fBmetaslab_df_use_largest_segment\fR (int)
470 .ad
471 .RS 12n
472 If we are not searching forward (due to metaslab_df_max_search,
473 metaslab_df_free_pct, or metaslab_df_alloc_threshold), this tunable controls
474 what segment is used. If it is set, we will use the largest free segment.
475 If it is not set, we will use a segment of exactly the requested size (or
476 larger).
477 .sp
478 Use \fB1\fR for yes and \fB0\fR for no (default).
479 .RE
480
481 .sp
482 .ne 2
483 .na
484 \fBzfs_metaslab_max_size_cache_sec\fR (ulong)
485 .ad
486 .RS 12n
487 When we unload a metaslab, we cache the size of the largest free chunk. We use
488 that cached size to determine whether or not to load a metaslab for a given
489 allocation. As more frees accumulate in that metaslab while it's unloaded, the
490 cached max size becomes less and less accurate. After a number of seconds
491 controlled by this tunable, we stop considering the cached max size and start
492 considering only the histogram instead.
493 .sp
494 Default value: \fB3600 seconds\fR (one hour)
495 .RE
496
497 .sp
498 .ne 2
499 .na
500 \fBzfs_metaslab_mem_limit\fR (int)
501 .ad
502 .RS 12n
503 When we are loading a new metaslab, we check the amount of memory being used
504 to store metaslab range trees. If it is over a threshold, we attempt to unload
505 the least recently used metaslab to prevent the system from clogging all of
506 its memory with range trees. This tunable sets the percentage of total system
507 memory that is the threshold.
508 .sp
509 Default value: \fB25 percent\fR
510 .RE
511
512 .sp
513 .ne 2
514 .na
515 \fBzfs_vdev_default_ms_count\fR (int)
516 .ad
517 .RS 12n
518 When a vdev is added target this number of metaslabs per top-level vdev.
519 .sp
520 Default value: \fB200\fR.
521 .RE
522
523 .sp
524 .ne 2
525 .na
526 \fBzfs_vdev_default_ms_shift\fR (int)
527 .ad
528 .RS 12n
529 Default limit for metaslab size.
530 .sp
531 Default value: \fB29\fR [meaning (1 << 29) = 512MB].
532 .RE
533
534 .sp
535 .ne 2
536 .na
537 \fBzfs_vdev_max_auto_ashift\fR (ulong)
538 .ad
539 .RS 12n
540 Maximum ashift used when optimizing for logical -> physical sector size on new
541 top-level vdevs.
542 .sp
543 Default value: \fBASHIFT_MAX\fR (16).
544 .RE
545
546 .sp
547 .ne 2
548 .na
549 \fBzfs_vdev_min_auto_ashift\fR (ulong)
550 .ad
551 .RS 12n
552 Minimum ashift used when creating new top-level vdevs.
553 .sp
554 Default value: \fBASHIFT_MIN\fR (9).
555 .RE
556
557 .sp
558 .ne 2
559 .na
560 \fBzfs_vdev_min_ms_count\fR (int)
561 .ad
562 .RS 12n
563 Minimum number of metaslabs to create in a top-level vdev.
564 .sp
565 Default value: \fB16\fR.
566 .RE
567
568 .sp
569 .ne 2
570 .na
571 \fBvdev_validate_skip\fR (int)
572 .ad
573 .RS 12n
574 Skip label validation steps during pool import. Changing is not recommended
575 unless you know what you are doing and are recovering a damaged label.
576 .sp
577 Default value: \fB0\fR.
578 .RE
579
580 .sp
581 .ne 2
582 .na
583 \fBzfs_vdev_ms_count_limit\fR (int)
584 .ad
585 .RS 12n
586 Practical upper limit of total metaslabs per top-level vdev.
587 .sp
588 Default value: \fB131,072\fR.
589 .RE
590
591 .sp
592 .ne 2
593 .na
594 \fBmetaslab_preload_enabled\fR (int)
595 .ad
596 .RS 12n
597 Enable metaslab group preloading.
598 .sp
599 Use \fB1\fR for yes (default) and \fB0\fR for no.
600 .RE
601
602 .sp
603 .ne 2
604 .na
605 \fBmetaslab_lba_weighting_enabled\fR (int)
606 .ad
607 .RS 12n
608 Give more weight to metaslabs with lower LBAs, assuming they have
609 greater bandwidth as is typically the case on a modern constant
610 angular velocity disk drive.
611 .sp
612 Use \fB1\fR for yes (default) and \fB0\fR for no.
613 .RE
614
615 .sp
616 .ne 2
617 .na
618 \fBmetaslab_unload_delay\fR (int)
619 .ad
620 .RS 12n
621 After a metaslab is used, we keep it loaded for this many txgs, to attempt to
622 reduce unnecessary reloading. Note that both this many txgs and
623 \fBmetaslab_unload_delay_ms\fR milliseconds must pass before unloading will
624 occur.
625 .sp
626 Default value: \fB32\fR.
627 .RE
628
629 .sp
630 .ne 2
631 .na
632 \fBmetaslab_unload_delay_ms\fR (int)
633 .ad
634 .RS 12n
635 After a metaslab is used, we keep it loaded for this many milliseconds, to
636 attempt to reduce unnecessary reloading. Note that both this many
637 milliseconds and \fBmetaslab_unload_delay\fR txgs must pass before unloading
638 will occur.
639 .sp
640 Default value: \fB600000\fR (ten minutes).
641 .RE
642
643 .sp
644 .ne 2
645 .na
646 \fBsend_holes_without_birth_time\fR (int)
647 .ad
648 .RS 12n
649 When set, the hole_birth optimization will not be used, and all holes will
650 always be sent on zfs send. This is useful if you suspect your datasets are
651 affected by a bug in hole_birth.
652 .sp
653 Use \fB1\fR for on (default) and \fB0\fR for off.
654 .RE
655
656 .sp
657 .ne 2
658 .na
659 \fBspa_config_path\fR (charp)
660 .ad
661 .RS 12n
662 SPA config file
663 .sp
664 Default value: \fB/etc/zfs/zpool.cache\fR.
665 .RE
666
667 .sp
668 .ne 2
669 .na
670 \fBspa_asize_inflation\fR (int)
671 .ad
672 .RS 12n
673 Multiplication factor used to estimate actual disk consumption from the
674 size of data being written. The default value is a worst case estimate,
675 but lower values may be valid for a given pool depending on its
676 configuration. Pool administrators who understand the factors involved
677 may wish to specify a more realistic inflation factor, particularly if
678 they operate close to quota or capacity limits.
679 .sp
680 Default value: \fB24\fR.
681 .RE
682
683 .sp
684 .ne 2
685 .na
686 \fBspa_load_print_vdev_tree\fR (int)
687 .ad
688 .RS 12n
689 Whether to print the vdev tree in the debugging message buffer during pool import.
690 Use 0 to disable and 1 to enable.
691 .sp
692 Default value: \fB0\fR.
693 .RE
694
695 .sp
696 .ne 2
697 .na
698 \fBspa_load_verify_data\fR (int)
699 .ad
700 .RS 12n
701 Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR)
702 import. Use 0 to disable and 1 to enable.
703
704 An extreme rewind import normally performs a full traversal of all
705 blocks in the pool for verification. If this parameter is set to 0,
706 the traversal skips non-metadata blocks. It can be toggled once the
707 import has started to stop or start the traversal of non-metadata blocks.
708 .sp
709 Default value: \fB1\fR.
710 .RE
711
712 .sp
713 .ne 2
714 .na
715 \fBspa_load_verify_metadata\fR (int)
716 .ad
717 .RS 12n
718 Whether to traverse blocks during an "extreme rewind" (\fB-X\fR)
719 pool import. Use 0 to disable and 1 to enable.
720
721 An extreme rewind import normally performs a full traversal of all
722 blocks in the pool for verification. If this parameter is set to 0,
723 the traversal is not performed. It can be toggled once the import has
724 started to stop or start the traversal.
725 .sp
726 Default value: \fB1\fR.
727 .RE
728
729 .sp
730 .ne 2
731 .na
732 \fBspa_load_verify_shift\fR (int)
733 .ad
734 .RS 12n
735 Sets the maximum number of bytes to consume during pool import to a log2
736 fraction of the target ARC size.
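.sp
For example, with a target ARC size of 4GB and the default shift of 4, at most
4GB / 2^4 = 256MB may be consumed by the verification traversal during import.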
737 .sp
738 Default value: \fB4\fR.
739 .RE
740
741 .sp
742 .ne 2
743 .na
744 \fBspa_slop_shift\fR (int)
745 .ad
746 .RS 12n
747 Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space
748 in the pool to be consumed. This ensures that we don't run the pool
749 completely out of space, due to unaccounted changes (e.g. to the MOS).
750 It also limits the worst-case time to allocate space. If we have
751 less than this amount of free space, most ZPL operations (e.g. write,
752 create) will return ENOSPC.
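.sp
For example, with the default value of 5, 1/2^5 = 1/32 of the pool is
reserved; on a 10TB pool this amounts to roughly 320GB that cannot be consumed
by normal ZPL operations.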
753 .sp
754 Default value: \fB5\fR.
755 .RE
756
757 .sp
758 .ne 2
759 .na
760 \fBvdev_removal_max_span\fR (int)
761 .ad
762 .RS 12n
763 During top-level vdev removal, chunks of data are copied from the vdev
764 which may include free space in order to trade bandwidth for IOPS.
765 This parameter determines the maximum span of free space (in bytes)
766 which will be included as "unnecessary" data in a chunk of copied data.
767
768 The default value here was chosen to align with
769 \fBzfs_vdev_read_gap_limit\fR, which is a similar concept when doing
770 regular reads (but there's no reason it has to be the same).
771 .sp
772 Default value: \fB32,768\fR.
773 .RE
774
775 .sp
776 .ne 2
777 .na
778 \fBvdev_file_logical_ashift\fR (ulong)
779 .ad
780 .RS 12n
781 Logical ashift for file-based devices.
782 .sp
783 Default value: \fB9\fR.
784 .RE
785
786 .sp
787 .ne 2
788 .na
789 \fBvdev_file_physical_ashift\fR (ulong)
790 .ad
791 .RS 12n
792 Physical ashift for file-based devices.
793 .sp
794 Default value: \fB9\fR.
795 .RE
796
797 .sp
798 .ne 2
799 .na
800 \fBzap_iterate_prefetch\fR (int)
801 .ad
802 .RS 12n
803 If this is set, when we start iterating over a ZAP object, zfs will prefetch
804 the entire object (all leaf blocks). However, this is limited by
805 \fBdmu_prefetch_max\fR.
806 .sp
807 Use \fB1\fR for on (default) and \fB0\fR for off.
808 .RE
809
810 .sp
811 .ne 2
812 .na
813 \fBzfetch_array_rd_sz\fR (ulong)
814 .ad
815 .RS 12n
816 If prefetching is enabled, disable prefetching for reads larger than this size.
817 .sp
818 Default value: \fB1,048,576\fR.
819 .RE
820
821 .sp
822 .ne 2
823 .na
824 \fBzfetch_max_distance\fR (uint)
825 .ad
826 .RS 12n
827 Max bytes to prefetch per stream (default 8MB).
828 .sp
829 Default value: \fB8,388,608\fR.
830 .RE
831
832 .sp
833 .ne 2
834 .na
835 \fBzfetch_max_streams\fR (uint)
836 .ad
837 .RS 12n
838 Max number of streams per zfetch (prefetch streams per file).
839 .sp
840 Default value: \fB8\fR.
841 .RE
842
843 .sp
844 .ne 2
845 .na
846 \fBzfetch_min_sec_reap\fR (uint)
847 .ad
848 .RS 12n
849 Min time before an active prefetch stream can be reclaimed
850 .sp
851 Default value: \fB2\fR.
852 .RE
853
854 .sp
855 .ne 2
856 .na
857 \fBzfs_abd_scatter_enabled\fR (int)
858 .ad
859 .RS 12n
860 Enables the use of scatter/gather lists for ARC buffers. When disabled, all
861 allocations are made linearly in kernel memory. Disabling can improve
862 performance in some code paths at the expense of fragmented kernel memory.
863 .sp
864 Default value: \fB1\fR.
865 .RE
866
867 .sp
868 .ne 2
869 .na
870 \fBzfs_abd_scatter_max_order\fR (uint)
871 .ad
872 .RS 12n
873 Maximum number of consecutive memory pages allocated in a single block for
874 scatter/gather lists. Default value is specified by the kernel itself.
875 .sp
876 Default value: \fB10\fR at the time of this writing.
877 .RE
878
879 .sp
880 .ne 2
881 .na
882 \fBzfs_abd_scatter_min_size\fR (uint)
883 .ad
884 .RS 12n
885 This is the minimum allocation size that will use scatter (page-based)
886 ABD's. Smaller allocations will use linear ABD's.
887 .sp
888 Default value: \fB1536\fR (512B and 1KB allocations will be linear).
889 .RE
890
891 .sp
892 .ne 2
893 .na
894 \fBzfs_arc_dnode_limit\fR (ulong)
895 .ad
896 .RS 12n
897 When the number of bytes consumed by dnodes in the ARC exceeds this number of
898 bytes, try to unpin some of it in response to demand for non-metadata. This
899 value acts as a ceiling on the amount of dnode metadata, and defaults to 0,
900 which indicates that a percentage of the ARC meta buffers, based on
901 \fBzfs_arc_dnode_limit_percent\fR, may be used for dnodes.
902
903 See also \fBzfs_arc_meta_prune\fR which serves a similar purpose but is used
904 when the amount of metadata in the ARC exceeds \fBzfs_arc_meta_limit\fR rather
905 than in response to overall demand for non-metadata.
906
907 .sp
908 Default value: \fB0\fR.
909 .RE
910
911 .sp
912 .ne 2
913 .na
914 \fBzfs_arc_dnode_limit_percent\fR (ulong)
915 .ad
916 .RS 12n
917 Percentage of ARC meta buffers that can be consumed by dnodes.
918 .sp
919 See also \fBzfs_arc_dnode_limit\fR which serves a similar purpose but has a
920 higher priority if set to nonzero value.
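.sp
As an illustration, assuming a 16GB ARC and the default
\fBzfs_arc_meta_limit_percent\fR of 75%, the ARC meta limit would be 12GB and
the default of 10% here would then allow roughly 1.2GB of that to be consumed
by dnodes.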
921 .sp
922 Default value: \fB10\fR%.
923 .RE
924
925 .sp
926 .ne 2
927 .na
928 \fBzfs_arc_dnode_reduce_percent\fR (ulong)
929 .ad
930 .RS 12n
931 Percentage of ARC dnodes to try to scan in response to demand for non-metadata
932 when the number of bytes consumed by dnodes exceeds \fBzfs_arc_dnode_limit\fR.
933
934 .sp
935 Default value: \fB10\fR% of the number of dnodes in the ARC.
936 .RE
937
938 .sp
939 .ne 2
940 .na
941 \fBzfs_arc_average_blocksize\fR (int)
942 .ad
943 .RS 12n
944 The ARC's buffer hash table is sized based on the assumption of an average
945 block size of \fBzfs_arc_average_blocksize\fR (default 8K). This works out
946 to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers.
947 For configurations with a known larger average block size this value can be
948 increased to reduce the memory footprint.
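.sp
For example, each 1GB of physical memory is assumed to hold 1GB / 8K = 131,072
blocks; at 8 bytes per hash table pointer this works out to the roughly 1MB of
hash table per 1GB of physical memory noted above.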
949
950 .sp
951 Default value: \fB8192\fR.
952 .RE
953
954 .sp
955 .ne 2
956 .na
957 \fBzfs_arc_eviction_pct\fR (int)
958 .ad
959 .RS 12n
960 When \fBarc_is_overflowing()\fR, \fBarc_get_data_impl()\fR waits for this
961 percent of the requested amount of data to be evicted. For example, by
962 default for every 2KB that's evicted, 1KB of it may be "reused" by a new
963 allocation. Since this is above 100%, it ensures that progress is made
964 towards getting \fBarc_size\fR under \fBarc_c\fR. Since this is finite, it
965 ensures that allocations can still happen, even during the potentially long
966 time that \fBarc_size\fR is more than \fBarc_c\fR.
967 .sp
968 Default value: \fB200\fR.
969 .RE
970
971 .sp
972 .ne 2
973 .na
974 \fBzfs_arc_evict_batch_limit\fR (int)
975 .ad
976 .RS 12n
977 Number of ARC headers to evict per sub-list before proceeding to another sub-list.
978 This batch-style operation prevents entire sub-lists from being evicted at once
979 but comes at a cost of additional unlocking and locking.
980 .sp
981 Default value: \fB10\fR.
982 .RE
983
984 .sp
985 .ne 2
986 .na
987 \fBzfs_arc_grow_retry\fR (int)
988 .ad
989 .RS 12n
990 If set to a non zero value, it will replace the arc_grow_retry value with this value.
991 The arc_grow_retry value (default 5) is the number of seconds the ARC will wait before
992 trying to resume growth after a memory pressure event.
993 .sp
994 Default value: \fB0\fR.
995 .RE
996
997 .sp
998 .ne 2
999 .na
1000 \fBzfs_arc_lotsfree_percent\fR (int)
1001 .ad
1002 .RS 12n
1003 Throttle I/O when free system memory drops below this percentage of total
1004 system memory. Setting this value to 0 will disable the throttle.
1005 .sp
1006 Default value: \fB10\fR%.
1007 .RE
1008
1009 .sp
1010 .ne 2
1011 .na
1012 \fBzfs_arc_max\fR (ulong)
1013 .ad
1014 .RS 12n
1015 Max size of ARC in bytes. If set to 0 then the max size of ARC is determined
1016 by the amount of system memory installed. For Linux, 1/2 of system memory will
1017 be used as the limit. For FreeBSD, the larger of all system memory - 1GB or
1018 5/8 of system memory will be used as the limit. This value must be at least
1019 67108864 (64 megabytes).
1020 .sp
1021 This value can be changed dynamically with some caveats. It cannot be set back
1022 to 0 while running and reducing it below the current ARC size will not cause
1023 the ARC to shrink without memory pressure to induce shrinking.
1024 .sp
1025 Default value: \fB0\fR.
1026 .RE
1027
1028 .sp
1029 .ne 2
1030 .na
1031 \fBzfs_arc_meta_adjust_restarts\fR (ulong)
1032 .ad
1033 .RS 12n
1034 The number of restart passes to make while scanning the ARC attempting
1035 to free buffers in order to stay below the \fBzfs_arc_meta_limit\fR.
1036 This value should not need to be tuned but is available to facilitate
1037 performance analysis.
1038 .sp
1039 Default value: \fB4096\fR.
1040 .RE
1041
1042 .sp
1043 .ne 2
1044 .na
1045 \fBzfs_arc_meta_limit\fR (ulong)
1046 .ad
1047 .RS 12n
1048 The maximum allowed size in bytes that meta data buffers are allowed to
1049 consume in the ARC. When this limit is reached meta data buffers will
1050 be reclaimed even if the overall arc_c_max has not been reached. This
1051 value defaults to 0, which indicates that a percentage of the ARC, based on
1052 \fBzfs_arc_meta_limit_percent\fR, may be used for meta data.
1053 .sp
1054 This value may be changed dynamically, except that it cannot be set back to 0
1055 for a specific percent of the ARC; it must be set to an explicit value.
1056 .sp
1057 Default value: \fB0\fR.
1058 .RE
1059
1060 .sp
1061 .ne 2
1062 .na
1063 \fBzfs_arc_meta_limit_percent\fR (ulong)
1064 .ad
1065 .RS 12n
1066 Percentage of ARC buffers that can be used for meta data.
1067
1068 See also \fBzfs_arc_meta_limit\fR which serves a similar purpose but has a
1069 higher priority if set to nonzero value.
1070
1071 .sp
1072 Default value: \fB75\fR%.
1073 .RE
1074
1075 .sp
1076 .ne 2
1077 .na
1078 \fBzfs_arc_meta_min\fR (ulong)
1079 .ad
1080 .RS 12n
1081 The minimum allowed size in bytes that meta data buffers may consume in
1082 the ARC. This value defaults to 0 which disables a floor on the amount
1083 of the ARC devoted to meta data.
1084 .sp
1085 Default value: \fB0\fR.
1086 .RE
1087
1088 .sp
1089 .ne 2
1090 .na
1091 \fBzfs_arc_meta_prune\fR (int)
1092 .ad
1093 .RS 12n
1094 The number of dentries and inodes to be scanned looking for entries
1095 which can be dropped. This may be required when the ARC reaches the
1096 \fBzfs_arc_meta_limit\fR because dentries and inodes can pin buffers
1097 in the ARC. Increasing this value will cause the dentry and inode caches
1098 to be pruned more aggressively. Setting this value to 0 will disable
1099 pruning the inode and dentry caches.
1100 .sp
1101 Default value: \fB10,000\fR.
1102 .RE
1103
1104 .sp
1105 .ne 2
1106 .na
1107 \fBzfs_arc_meta_strategy\fR (int)
1108 .ad
1109 .RS 12n
1110 Define the strategy for ARC meta data buffer eviction (meta reclaim strategy).
1111 A value of 0 (META_ONLY) will evict only the ARC meta data buffers.
1112 A value of 1 (BALANCED) indicates that additional data buffers may be evicted if
1113 that is required in order to evict the required number of meta data buffers.
1114 .sp
1115 Default value: \fB1\fR.
1116 .RE
1117
1118 .sp
1119 .ne 2
1120 .na
1121 \fBzfs_arc_min\fR (ulong)
1122 .ad
1123 .RS 12n
1124 Min size of ARC in bytes. If set to 0 then arc_c_min will default to
1125 consuming the larger of 32M or 1/32 of total system memory.
1126 .sp
1127 Default value: \fB0\fR.
1128 .RE
1129
1130 .sp
1131 .ne 2
1132 .na
1133 \fBzfs_arc_min_prefetch_ms\fR (int)
1134 .ad
1135 .RS 12n
1136 Minimum time prefetched blocks are locked in the ARC, specified in ms.
1137 A value of \fB0\fR will default to 1000 ms.
1138 .sp
1139 Default value: \fB0\fR.
1140 .RE
1141
1142 .sp
1143 .ne 2
1144 .na
1145 \fBzfs_arc_min_prescient_prefetch_ms\fR (int)
1146 .ad
1147 .RS 12n
1148 Minimum time "prescient prefetched" blocks are locked in the ARC, specified
1149 in ms. These blocks are meant to be prefetched fairly aggressively ahead of
1150 the code that may use them. A value of \fB0\fR will default to 6000 ms.
1151 .sp
1152 Default value: \fB0\fR.
1153 .RE
1154
1155 .sp
1156 .ne 2
1157 .na
1158 \fBzfs_max_missing_tvds\fR (int)
1159 .ad
1160 .RS 12n
1161 Number of missing top-level vdevs which will be allowed during
1162 pool import (only in read-only mode).
1163 .sp
1164 Default value: \fB0\fR
1165 .RE
1166
1167 .sp
1168 .ne 2
1169 .na
1170 \fBzfs_max_nvlist_src_size\fR (ulong)
1171 .ad
1172 .RS 12n
1173 Maximum size in bytes allowed to be passed as zc_nvlist_src_size for ioctls on
1174 /dev/zfs. This prevents a user from causing the kernel to allocate an excessive
1175 amount of memory. When the limit is exceeded, the ioctl fails with EINVAL and a
1176 description of the error is sent to the zfs-dbgmsg log. This parameter should
1177 not need to be touched under normal circumstances. On FreeBSD, the default is
1178 based on the system limit on user wired memory. On Linux, the default is
1179 \fBKMALLOC_MAX_SIZE\fR.
1180 .sp
1181 Default value: \fB0\fR (kernel decides)
1182 .RE
1183
1184 .sp
1185 .ne 2
1186 .na
1187 \fBzfs_multilist_num_sublists\fR (int)
1188 .ad
1189 .RS 12n
1190 To allow more fine-grained locking, each ARC state contains a series
1191 of lists for both data and meta data objects. Locking is performed at
1192 the level of these "sub-lists". This parameter controls the number of
1193 sub-lists per ARC state, and also applies to other uses of the
1194 multilist data structure.
1195 .sp
1196 Default value: \fB4\fR or the number of online CPUs, whichever is greater
1197 .RE
1198
1199 .sp
1200 .ne 2
1201 .na
1202 \fBzfs_arc_overflow_shift\fR (int)
1203 .ad
1204 .RS 12n
1205 The ARC size is considered to be overflowing if it exceeds the current
1206 ARC target size (arc_c) by a threshold determined by this parameter.
1207 The threshold is calculated as a fraction of arc_c using the formula
1208 "arc_c >> \fBzfs_arc_overflow_shift\fR".
1209
1210 The default value of 8 causes the ARC to be considered to be overflowing
1211 if it exceeds the target size by 1/256th (about 0.4%) of the target size.
1212
1213 When the ARC is overflowing, new buffer allocations are stalled until
1214 the reclaim thread catches up and the overflow condition no longer exists.
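.sp
For example, with an ARC target size (arc_c) of 8GB and the default shift of
8, the ARC is considered to be overflowing once its size exceeds the target by
more than 8GB / 256 = 32MB.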
1215 .sp
1216 Default value: \fB8\fR.
1217 .RE
1218
1219 .sp
1220 .ne 2
1221 .na
1223 \fBzfs_arc_p_min_shift\fR (int)
1224 .ad
1225 .RS 12n
1226 If set to a non-zero value, this will update arc_p_min_shift (default 4)
1227 with the new value.
1228 arc_p_min_shift is used as a shift of arc_c when calculating both the minimum
1229 and maximum arc_p.
1230 .sp
1231 Default value: \fB0\fR.
1232 .RE
1233
1234 .sp
1235 .ne 2
1236 .na
1237 \fBzfs_arc_p_dampener_disable\fR (int)
1238 .ad
1239 .RS 12n
1240 Disable arc_p adapt dampener
1241 .sp
1242 Use \fB1\fR for yes (default) and \fB0\fR to disable.
1243 .RE
1244
1245 .sp
1246 .ne 2
1247 .na
1248 \fBzfs_arc_shrink_shift\fR (int)
1249 .ad
1250 .RS 12n
1251 If set to a non zero value, this will update arc_shrink_shift (default 7)
1252 with the new value.
1253 .sp
1254 Default value: \fB0\fR.
1255 .RE
1256
1257 .sp
1258 .ne 2
1259 .na
1260 \fBzfs_arc_pc_percent\fR (uint)
1261 .ad
1262 .RS 12n
1263 Percent of pagecache to reclaim the ARC to.
1264
1265 This tunable allows ZFS arc to play more nicely with the kernel's LRU
1266 pagecache. It can guarantee that the ARC size won't collapse under scanning
1267 pressure on the pagecache, yet still allows arc to be reclaimed down to
1268 zfs_arc_min if necessary. This value is specified as percent of pagecache
1269 size (as measured by NR_FILE_PAGES) where that percent may exceed 100. This
1270 only operates during memory pressure/reclaim.
1271 .sp
1272 Default value: \fB0\fR% (disabled).
1273 .RE
1274
1275 .sp
1276 .ne 2
1277 .na
1278 \fBzfs_arc_shrinker_limit\fR (int)
1279 .ad
1280 .RS 12n
1281 This is a limit on how many pages the ARC shrinker makes available for
1282 eviction in response to one page allocation attempt. Note that in
1283 practice, the kernel's shrinker can ask us to evict up to about 4x this
1284 for one allocation attempt.
1285 .sp
1286 The default limit of 10,000 (in practice, 160MB per allocation attempt with
1287 4K pages) limits the amount of time spent attempting to reclaim ARC memory to
1288 less than 100ms per allocation attempt, even with a small average compressed
1289 block size of ~8KB.
1290 .sp
1291 The parameter can be set to 0 (zero) to disable the limit.
1292 .sp
1293 This parameter only applies on Linux.
1294 .sp
1295 Default value: \fB10,000\fR.
1296 .RE
1297
1298 .sp
1299 .ne 2
1300 .na
1301 \fBzfs_arc_sys_free\fR (ulong)
1302 .ad
1303 .RS 12n
1304 The target number of bytes the ARC should leave as free memory on the system.
1305 Defaults to the larger of 1/64 of physical memory or 512K. Setting this
1306 option to a non-zero value will override the default.
1307 .sp
1308 Default value: \fB0\fR.
1309 .RE
1310
1311 .sp
1312 .ne 2
1313 .na
1314 \fBzfs_autoimport_disable\fR (int)
1315 .ad
1316 .RS 12n
1317 Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR).
1318 .sp
1319 Use \fB1\fR for yes (default) and \fB0\fR for no.
1320 .RE
1321
1322 .sp
1323 .ne 2
1324 .na
1325 \fBzfs_checksum_events_per_second\fR (uint)
1326 .ad
1327 .RS 12n
1328 Rate limit checksum events to this many per second. Note that this should
1329 not be set below the zed thresholds (currently 10 checksums over 10 sec)
1330 or else zed may not trigger any action.
1331 .sp
1332 Default value: \fB20\fR.
1333 .RE
1334
1335 .sp
1336 .ne 2
1337 .na
1338 \fBzfs_commit_timeout_pct\fR (int)
1339 .ad
1340 .RS 12n
1341 This controls the amount of time that a ZIL block (lwb) will remain "open"
1342 when it isn't "full", and it has a thread waiting for it to be committed to
1343 stable storage. The timeout is scaled based on a percentage of the last lwb
1344 latency to avoid significantly impacting the latency of each individual
1345 transaction record (itx).
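.sp
For example, assuming the previous lwb took 10ms to reach stable storage, the
default of 5% allows an open lwb to wait roughly 0.5ms for additional records
before being committed.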
1346 .sp
1347 Default value: \fB5\fR%.
1348 .RE
1349
1350 .sp
1351 .ne 2
1352 .na
1353 \fBzfs_condense_indirect_commit_entry_delay_ms\fR (int)
1354 .ad
1355 .RS 12n
1356 Vdev indirection layer (used for device removal) sleeps for this many
1357 milliseconds during mapping generation. Intended for use with the test suite
1358 to throttle vdev removal speed.
1359 .sp
1360 Default value: \fB0\fR (no throttle).
1361 .RE
1362
1363 .sp
1364 .ne 2
1365 .na
1366 \fBzfs_condense_indirect_vdevs_enable\fR (int)
1367 .ad
1368 .RS 12n
1369 Enable condensing indirect vdev mappings. When set to a non-zero value,
1370 attempt to condense indirect vdev mappings if the mapping uses more than
1371 \fBzfs_condense_min_mapping_bytes\fR bytes of memory and if the obsolete
1372 space map object uses more than \fBzfs_condense_max_obsolete_bytes\fR
1373 bytes on-disk. The condensing process is an attempt to save memory by
1374 removing obsolete mappings.
1375 .sp
1376 Default value: \fB1\fR.
1377 .RE
1378
1379 .sp
1380 .ne 2
1381 .na
1382 \fBzfs_condense_max_obsolete_bytes\fR (ulong)
1383 .ad
1384 .RS 12n
1385 Only attempt to condense indirect vdev mappings if the on-disk size
1386 of the obsolete space map object is greater than this number of bytes
1387 (see \fBzfs_condense_indirect_vdevs_enable\fR).
1388 .sp
1389 Default value: \fB1,073,741,824\fR.
1390 .RE
1391
1392 .sp
1393 .ne 2
1394 .na
1395 \fBzfs_condense_min_mapping_bytes\fR (ulong)
1396 .ad
1397 .RS 12n
1398 Minimum size vdev mapping to attempt to condense (see
1399 \fBzfs_condense_indirect_vdevs_enable\fR).
1400 .sp
1401 Default value: \fB131,072\fR.
1402 .RE
1403
1404 .sp
1405 .ne 2
1406 .na
1407 \fBzfs_dbgmsg_enable\fR (int)
1408 .ad
1409 .RS 12n
1410 Internally ZFS keeps a small log to facilitate debugging. By default the log
1411 is disabled; to enable it, set this option to 1. The contents of the log can
1412 be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to
1413 this proc file clears the log.
1414 .sp
1415 Default value: \fB0\fR.
1416 .RE
1417
1418 .sp
1419 .ne 2
1420 .na
1421 \fBzfs_dbgmsg_maxsize\fR (int)
1422 .ad
1423 .RS 12n
1424 The maximum size in bytes of the internal ZFS debug log.
1425 .sp
1426 Default value: \fB4M\fR.
1427 .RE
1428
1429 .sp
1430 .ne 2
1431 .na
1432 \fBzfs_dbuf_state_index\fR (int)
1433 .ad
1434 .RS 12n
1435 This feature is currently unused. It is normally used for controlling what
1436 reporting is available under /proc/spl/kstat/zfs.
1437 .sp
1438 Default value: \fB0\fR.
1439 .RE
1440
1441 .sp
1442 .ne 2
1443 .na
1444 \fBzfs_deadman_enabled\fR (int)
1445 .ad
1446 .RS 12n
1447 When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR
1448 milliseconds, or when an individual I/O takes longer than
1449 \fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to
1450 be "hung". If \fBzfs_deadman_enabled\fR is set then the deadman behavior is
1451 invoked as described by the \fBzfs_deadman_failmode\fR module option.
1452 By default the deadman is enabled and configured to \fBwait\fR which results
1453 in "hung" I/Os only being logged. The deadman is automatically disabled
1454 when a pool gets suspended.
1455 .sp
1456 Default value: \fB1\fR.
1457 .RE
1458
1459 .sp
1460 .ne 2
1461 .na
1462 \fBzfs_deadman_failmode\fR (charp)
1463 .ad
1464 .RS 12n
1465 Controls the failure behavior when the deadman detects a "hung" I/O. Valid
1466 values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR.
1467 .sp
1468 \fBwait\fR - Wait for a "hung" I/O to complete. For each "hung" I/O a
1469 "deadman" event will be posted describing that I/O.
1470 .sp
1471 \fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it
1472 to the I/O pipeline if possible.
1473 .sp
1474 \fBpanic\fR - Panic the system. This can be used to facilitate an automatic
1475 fail-over to a properly configured fail-over partner.
1476 .sp
1477 Default value: \fBwait\fR.
1478 .RE
1479
1480 .sp
1481 .ne 2
1482 .na
1483 \fBzfs_deadman_checktime_ms\fR (int)
1484 .ad
1485 .RS 12n
1486 Check time in milliseconds. This defines the frequency at which we check
1487 for hung I/O and potentially invoke the \fBzfs_deadman_failmode\fR behavior.
1488 .sp
1489 Default value: \fB60,000\fR.
1490 .RE
1491
1492 .sp
1493 .ne 2
1494 .na
1495 \fBzfs_deadman_synctime_ms\fR (ulong)
1496 .ad
1497 .RS 12n
1498 Interval in milliseconds after which the deadman is triggered and also
1499 the interval after which a pool sync operation is considered to be "hung".
1500 Once this limit is exceeded the deadman will be invoked every
1501 \fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes.
1502 .sp
1503 Default value: \fB600,000\fR.
1504 .RE
1505
1506 .sp
1507 .ne 2
1508 .na
1509 \fBzfs_deadman_ziotime_ms\fR (ulong)
1510 .ad
1511 .RS 12n
1512 Interval in milliseconds after which the deadman is triggered and an
1513 individual I/O operation is considered to be "hung". As long as the I/O
1514 remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR
1515 milliseconds until the I/O completes.
1516 .sp
1517 Default value: \fB300,000\fR.
1518 .RE
1519
1520 .sp
1521 .ne 2
1522 .na
1523 \fBzfs_dedup_prefetch\fR (int)
1524 .ad
1525 .RS 12n
1526 Enable prefetching of deduplicated blocks.
1527 .sp
1528 Use \fB1\fR for yes and \fB0\fR to disable (default).
1529 .RE
1530
1531 .sp
1532 .ne 2
1533 .na
1534 \fBzfs_delay_min_dirty_percent\fR (int)
1535 .ad
1536 .RS 12n
1537 Start to delay each transaction once there is this amount of dirty data,
1538 expressed as a percentage of \fBzfs_dirty_data_max\fR.
1539 This value should be >= zfs_vdev_async_write_active_max_dirty_percent.
1540 See the section "ZFS TRANSACTION DELAY".
1541 .sp
1542 Default value: \fB60\fR%.
1543 .RE
1544
1545 .sp
1546 .ne 2
1547 .na
1548 \fBzfs_delay_scale\fR (int)
1549 .ad
1550 .RS 12n
1551 This controls how quickly the transaction delay approaches infinity.
1552 Larger values cause longer delays for a given amount of dirty data.
1553 .sp
1554 For the smoothest delay, this value should be about 1 billion divided
1555 by the maximum number of operations per second. This will smoothly
1556 handle between 10x and 1/10th this number.
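.sp
For example, if the pool can sustain roughly 2,000 write operations per
second, a value of 1,000,000,000 / 2,000 = 500,000 (the default) gives the
smoothest delay behavior.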
1557 .sp
1558 See the section "ZFS TRANSACTION DELAY".
1559 .sp
1560 Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
1561 .sp
1562 Default value: \fB500,000\fR.
1563 .RE
1564
1565 .sp
1566 .ne 2
1567 .na
1568 \fBzfs_disable_ivset_guid_check\fR (int)
1569 .ad
1570 .RS 12n
1571 Disables requirement for IVset guids to be present and match when doing a raw
1572 receive of encrypted datasets. Intended for users whose pools were created with
1573 ZFS on Linux pre-release versions and now have compatibility issues.
1574 .sp
1575 Default value: \fB0\fR.
1576 .RE
1577
1578 .sp
1579 .ne 2
1580 .na
1581 \fBzfs_key_max_salt_uses\fR (ulong)
1582 .ad
1583 .RS 12n
1584 Maximum number of uses of a single salt value before generating a new one for
1585 encrypted datasets. The default value is also the maximum that will be
1586 accepted.
1587 .sp
1588 Default value: \fB400,000,000\fR.
1589 .RE
1590
1591 .sp
1592 .ne 2
1593 .na
1594 \fBzfs_object_mutex_size\fR (uint)
1595 .ad
1596 .RS 12n
1597 Size of the znode hashtable used for holds.
1598
1599 Due to the need to hold locks on objects that may not exist yet, kernel mutexes
1600 are not created per-object and instead a hashtable is used where collisions
1601 will result in objects waiting when there is not actually contention on the
1602 same object.
1603 .sp
1604 Default value: \fB64\fR.
1605 .RE
1606
1607 .sp
1608 .ne 2
1609 .na
1610 \fBzfs_slow_io_events_per_second\fR (int)
1611 .ad
1612 .RS 12n
1613 Rate limit delay zevents (which report slow I/Os) to this many per second.
1614 .sp
1615 Default value: \fB20\fR.
1616 .RE
1617
1618 .sp
1619 .ne 2
1620 .na
1621 \fBzfs_unflushed_max_mem_amt\fR (ulong)
1622 .ad
1623 .RS 12n
1624 Upper-bound limit for unflushed metadata changes to be held by the
1625 log spacemap in memory (in bytes).
1626 .sp
1627 Default value: \fB1,073,741,824\fR (1GB).
1628 .RE
1629
1630 .sp
1631 .ne 2
1632 .na
1633 \fBzfs_unflushed_max_mem_ppm\fR (ulong)
1634 .ad
1635 .RS 12n
1636 Percentage of the overall system memory that ZFS allows to be used
1637 for unflushed metadata changes by the log spacemap.
1638 (value is calculated over 1000000 for finer granularity).
1639 .sp
1640 Default value: \fB1000\fR (which is divided by 1000000, resulting in
1641 a limit of \fB0.1\fR% of memory).
1642 .RE
1643
1644 .sp
1645 .ne 2
1646 .na
1647 \fBzfs_unflushed_log_block_max\fR (ulong)
1648 .ad
1649 .RS 12n
1650 Describes the maximum number of log spacemap blocks allowed for each pool.
1651 The default value of 262144 means that the space in all the log spacemaps
1652 can add up to no more than 262144 blocks (which means 32GB of logical
1653 space before compression and ditto blocks, assuming that blocksize is
1654 128k).
1655 .sp
1656 This tunable is important because it involves a trade-off between import
1657 time after an unclean export and the frequency of flushing metaslabs.
1658 The higher this number is, the more log blocks we allow when the pool is
1659 active which means that we flush metaslabs less often and thus decrease
1660 the number of I/Os for spacemap updates per TXG.
1661 At the same time though, that means that in the event of an unclean export,
1662 there will be more log spacemap blocks for us to read, inducing overhead
1663 in the import time of the pool.
1664 The lower the number, the more flushing occurs, destroying log blocks
1665 sooner as they become obsolete faster, which leaves fewer blocks to be
1666 read during import time after a crash.
1667 .sp
1668 Each log spacemap block existing during pool import leads to approximately
1669 one extra logical I/O issued.
1670 This is the reason why this tunable is exposed in terms of blocks rather
1671 than space used.
1672 .sp
1673 Default value: \fB262144\fR (256K).
1674 .RE
1675
1676 .sp
1677 .ne 2
1678 .na
1679 \fBzfs_unflushed_log_block_min\fR (ulong)
1680 .ad
1681 .RS 12n
1682 If the number of metaslabs is small and our incoming rate is high, we
1683 could get into a situation that we are flushing all our metaslabs every
1684 TXG.
1685 Thus we always allow at least this many log blocks.
1686 .sp
1687 Default value: \fB1000\fR.
1688 .RE
1689
1690 .sp
1691 .ne 2
1692 .na
1693 \fBzfs_unflushed_log_block_pct\fR (ulong)
1694 .ad
1695 .RS 12n
1696 Tunable used to determine the number of blocks that can be used for
1697 the spacemap log, expressed as a percentage of the total number of
1698 metaslabs in the pool.
1699 .sp
1700 Default value: \fB400\fR (read as \fB400\fR% - meaning that the number
1701 of log spacemap blocks is capped at 4 times the number of
1702 metaslabs in the pool).
1703 .RE
1704
1705 .sp
1706 .ne 2
1707 .na
1708 \fBzfs_unlink_suspend_progress\fR (uint)
1709 .ad
1710 .RS 12n
1711 When enabled, files will not be asynchronously removed from the list of pending
1712 unlinks and the space they consume will be leaked. Once this option has been
1713 disabled and the dataset is remounted, the pending unlinks will be processed
1714 and the freed space returned to the pool.
1715 This option is used by the test suite to facilitate testing.
1716 .sp
1717 Use \fB0\fR (default) to allow progress and \fB1\fR to pause progress.
1718 .RE
1719
1720 .sp
1721 .ne 2
1722 .na
1723 \fBzfs_delete_blocks\fR (ulong)
1724 .ad
1725 .RS 12n
1726 This is used to define a large file for the purposes of delete. Files
1727 containing more than \fBzfs_delete_blocks\fR blocks will be deleted asynchronously
1728 while smaller files are deleted synchronously. Decreasing this value will
1729 reduce the time spent in an unlink(2) system call at the expense of a longer
1730 delay before the freed space is available.
1731 .sp
1732 Default value: \fB20,480\fR.
1733 .RE
1734
1735 .sp
1736 .ne 2
1737 .na
1738 \fBzfs_dirty_data_max\fR (int)
1739 .ad
1740 .RS 12n
1741 Determines the dirty space limit in bytes. Once this limit is exceeded, new
1742 writes are halted until space frees up. This parameter takes precedence
1743 over \fBzfs_dirty_data_max_percent\fR.
1744 See the section "ZFS TRANSACTION DELAY".
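.sp
For example, on a system with 32GB of RAM the default works out to roughly
3.2GB of dirty data, well below the default \fBzfs_dirty_data_max_max\fR cap
of 25% (8GB).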
1745 .sp
1746 Default value: \fB10\fR% of physical RAM, capped at \fBzfs_dirty_data_max_max\fR.
1747 .RE
1748
1749 .sp
1750 .ne 2
1751 .na
1752 \fBzfs_dirty_data_max_max\fR (int)
1753 .ad
1754 .RS 12n
1755 Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
1756 This limit is only enforced at module load time, and will be ignored if
1757 \fBzfs_dirty_data_max\fR is later changed. This parameter takes
1758 precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
1759 "ZFS TRANSACTION DELAY".
1760 .sp
1761 Default value: \fB25\fR% of physical RAM.
1762 .RE
1763
1764 .sp
1765 .ne 2
1766 .na
1767 \fBzfs_dirty_data_max_max_percent\fR (int)
1768 .ad
1769 .RS 12n
1770 Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
1771 percentage of physical RAM. This limit is only enforced at module load
1772 time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
1773 The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
1774 one. See the section "ZFS TRANSACTION DELAY".
1775 .sp
1776 Default value: \fB25\fR%.
1777 .RE
1778
1779 .sp
1780 .ne 2
1781 .na
1782 \fBzfs_dirty_data_max_percent\fR (int)
1783 .ad
1784 .RS 12n
1785 Determines the dirty space limit, expressed as a percentage of all
1786 memory. Once this limit is exceeded, new writes are halted until space frees
1787 up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
1788 one. See the section "ZFS TRANSACTION DELAY".
1789 .sp
1790 Default value: \fB10\fR%, subject to \fBzfs_dirty_data_max_max\fR.
1791 .RE
1792
1793 .sp
1794 .ne 2
1795 .na
1796 \fBzfs_dirty_data_sync_percent\fR (int)
1797 .ad
1798 .RS 12n
1799 Start syncing out a transaction group if there's at least this much dirty data
1800 as a percentage of \fBzfs_dirty_data_max\fR. This should be less than
1801 \fBzfs_vdev_async_write_active_min_dirty_percent\fR.
1802 .sp
1803 Default value: \fB20\fR% of \fBzfs_dirty_data_max\fR.
1804 .RE
1805
1806 .sp
1807 .ne 2
1808 .na
1809 \fBzfs_fallocate_reserve_percent\fR (uint)
1810 .ad
1811 .RS 12n
1812 Since ZFS is a copy-on-write filesystem with snapshots, blocks cannot be
1813 preallocated for a file in order to guarantee that later writes will not
1814 run out of space. Instead, fallocate() space preallocation only checks
1815 that sufficient space is currently available in the pool or the user's
1816 project quota allocation, and then creates a sparse file of the requested
1817 size. The requested space is multiplied by \fBzfs_fallocate_reserve_percent\fR
1818 to allow additional space for indirect blocks and other internal metadata.
1819 Setting this value to 0 disables support for fallocate(2) and causes
1820 fallocate() space preallocation to return EOPNOTSUPP.
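.sp
For example, with the default of 110%, an fallocate() request for 10GB will
only succeed if roughly 11GB of pool space is currently available, leaving
headroom for indirect blocks and other metadata.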
1821 .sp
1822 Default value: \fB110\fR%
1823 .RE
1824
1825 .sp
1826 .ne 2
1827 .na
1828 \fBzfs_fletcher_4_impl\fR (string)
1829 .ad
1830 .RS 12n
1831 Select a fletcher 4 implementation.
1832 .sp
1833 Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
1834 \fBavx2\fR, \fBavx512f\fR, \fBavx512bw\fR, and \fBaarch64_neon\fR.
1835 All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
1836 set extensions to be available and will only appear if ZFS detects that they are
1837 present at runtime. If multiple implementations of fletcher 4 are available,
1838 the \fBfastest\fR will be chosen using a micro benchmark. Selecting \fBscalar\fR
1839 results in the original, CPU based calculation being used. Selecting any option
1840 other than \fBfastest\fR and \fBscalar\fR results in vector instructions from
1841 the respective CPU instruction set being used.
1842 .sp
1843 Default value: \fBfastest\fR.
1844 .RE
1845
1846 .sp
1847 .ne 2
1848 .na
1849 \fBzfs_free_bpobj_enabled\fR (int)
1850 .ad
1851 .RS 12n
1852 Enable/disable the processing of the free_bpobj object.
1853 .sp
1854 Default value: \fB1\fR.
1855 .RE
1856
1857 .sp
1858 .ne 2
1859 .na
1860 \fBzfs_async_block_max_blocks\fR (ulong)
1861 .ad
1862 .RS 12n
1863 Maximum number of blocks freed in a single txg.
1864 .sp
1865 Default value: \fBULONG_MAX\fR (unlimited).
1866 .RE
1867
1868 .sp
1869 .ne 2
1870 .na
1871 \fBzfs_max_async_dedup_frees\fR (ulong)
1872 .ad
1873 .RS 12n
1874 Maximum number of dedup blocks freed in a single txg.
1875 .sp
1876 Default value: \fB100,000\fR.
1877 .RE
1878
1879 .sp
1880 .ne 2
1881 .na
1882 \fBzfs_override_estimate_recordsize\fR (ulong)
1883 .ad
1884 .RS 12n
1885 Record size calculation override for zfs send estimates.
1886 .sp
1887 Default value: \fB0\fR.
1888 .RE
1889
1890 .sp
1891 .ne 2
1892 .na
1893 \fBzfs_vdev_async_read_max_active\fR (int)
1894 .ad
1895 .RS 12n
1896 Maximum asynchronous read I/Os active to each device.
1897 See the section "ZFS I/O SCHEDULER".
1898 .sp
1899 Default value: \fB3\fR.
1900 .RE
1901
1902 .sp
1903 .ne 2
1904 .na
1905 \fBzfs_vdev_async_read_min_active\fR (int)
1906 .ad
1907 .RS 12n
1908 Minimum asynchronous read I/Os active to each device.
1909 See the section "ZFS I/O SCHEDULER".
1910 .sp
1911 Default value: \fB1\fR.
1912 .RE
1913
1914 .sp
1915 .ne 2
1916 .na
1917 \fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
1918 .ad
1919 .RS 12n
1920 When the pool has more than
1921 \fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
1922 \fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
1923 the dirty data is between min and max, the active I/O limit is linearly
1924 interpolated. See the section "ZFS I/O SCHEDULER".
1925 .sp
1926 Default value: \fB60\fR%.
1927 .RE
1928
1929 .sp
1930 .ne 2
1931 .na
1932 \fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
1933 .ad
1934 .RS 12n
1935 When the pool has less than
1936 \fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
1937 \fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
1938 the dirty data is between min and max, the active I/O limit is linearly
1939 interpolated. See the section "ZFS I/O SCHEDULER".
1940 .sp
1941 Default value: \fB30\fR%.
1942 .RE
1943
1944 .sp
1945 .ne 2
1946 .na
1947 \fBzfs_vdev_async_write_max_active\fR (int)
1948 .ad
1949 .RS 12n
1950 Maximum asynchronous write I/Os active to each device.
1951 See the section "ZFS I/O SCHEDULER".
1952 .sp
1953 Default value: \fB10\fR.
1954 .RE
1955
1956 .sp
1957 .ne 2
1958 .na
1959 \fBzfs_vdev_async_write_min_active\fR (int)
1960 .ad
1961 .RS 12n
1962 Minimum asynchronous write I/Os active to each device.
1963 See the section "ZFS I/O SCHEDULER".
1964 .sp
1965 Lower values are associated with better latency on rotational media but poorer
1966 resilver performance. The default value of 2 was chosen as a compromise. A
1967 value of 3 has been shown to improve resilver performance further at a cost of
1968 further increasing latency.
1969 .sp
1970 Default value: \fB2\fR.
1971 .RE
1972
1973 .sp
1974 .ne 2
1975 .na
1976 \fBzfs_vdev_initializing_max_active\fR (int)
1977 .ad
1978 .RS 12n
1979 Maximum initializing I/Os active to each device.
1980 See the section "ZFS I/O SCHEDULER".
1981 .sp
1982 Default value: \fB1\fR.
1983 .RE
1984
1985 .sp
1986 .ne 2
1987 .na
1988 \fBzfs_vdev_initializing_min_active\fR (int)
1989 .ad
1990 .RS 12n
1991 Minimum initializing I/Os active to each device.
1992 See the section "ZFS I/O SCHEDULER".
1993 .sp
1994 Default value: \fB1\fR.
1995 .RE
1996
1997 .sp
1998 .ne 2
1999 .na
2000 \fBzfs_vdev_max_active\fR (int)
2001 .ad
2002 .RS 12n
The maximum number of I/Os active to each device. Ideally, this will be >=
the sum of each queue's max_active. It must be at least the sum of each
queue's min_active; a sketch for checking this is shown below. See the
section "ZFS I/O SCHEDULER".
2006 .sp
2007 Default value: \fB1,000\fR.
2008 .RE
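.sp
.RS 12n
A minimal sketch, assuming the module is loaded and exposes its parameters
under \fB/sys/module/zfs/parameters\fR, for verifying that the sum of the
per-queue minimums does not exceed \fBzfs_vdev_max_active\fR:
.sp
.nf
cd /sys/module/zfs/parameters
total=0
for f in zfs_vdev_*_min_active; do
    total=$(( total + $(cat "$f") ))
done
echo "sum of per-queue minimums: ${total}"
cat zfs_vdev_max_active
.fi
.RE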
2009
2010 .sp
2011 .ne 2
2012 .na
2013 \fBzfs_vdev_rebuild_max_active\fR (int)
2014 .ad
2015 .RS 12n
2016 Maximum sequential resilver I/Os active to each device.
2017 See the section "ZFS I/O SCHEDULER".
2018 .sp
2019 Default value: \fB3\fR.
2020 .RE
2021
2022 .sp
2023 .ne 2
2024 .na
2025 \fBzfs_vdev_rebuild_min_active\fR (int)
2026 .ad
2027 .RS 12n
2028 Minimum sequential resilver I/Os active to each device.
2029 See the section "ZFS I/O SCHEDULER".
2030 .sp
2031 Default value: \fB1\fR.
2032 .RE
2033
2034 .sp
2035 .ne 2
2036 .na
2037 \fBzfs_vdev_removal_max_active\fR (int)
2038 .ad
2039 .RS 12n
2040 Maximum removal I/Os active to each device.
2041 See the section "ZFS I/O SCHEDULER".
2042 .sp
2043 Default value: \fB2\fR.
2044 .RE
2045
2046 .sp
2047 .ne 2
2048 .na
2049 \fBzfs_vdev_removal_min_active\fR (int)
2050 .ad
2051 .RS 12n
2052 Minimum removal I/Os active to each device.
2053 See the section "ZFS I/O SCHEDULER".
2054 .sp
2055 Default value: \fB1\fR.
2056 .RE
2057
2058 .sp
2059 .ne 2
2060 .na
2061 \fBzfs_vdev_scrub_max_active\fR (int)
2062 .ad
2063 .RS 12n
2064 Maximum scrub I/Os active to each device.
2065 See the section "ZFS I/O SCHEDULER".
2066 .sp
2067 Default value: \fB2\fR.
2068 .RE
2069
2070 .sp
2071 .ne 2
2072 .na
2073 \fBzfs_vdev_scrub_min_active\fR (int)
2074 .ad
2075 .RS 12n
2076 Minimum scrub I/Os active to each device.
2077 See the section "ZFS I/O SCHEDULER".
2078 .sp
2079 Default value: \fB1\fR.
2080 .RE
2081
2082 .sp
2083 .ne 2
2084 .na
2085 \fBzfs_vdev_sync_read_max_active\fR (int)
2086 .ad
2087 .RS 12n
2088 Maximum synchronous read I/Os active to each device.
2089 See the section "ZFS I/O SCHEDULER".
2090 .sp
2091 Default value: \fB10\fR.
2092 .RE
2093
2094 .sp
2095 .ne 2
2096 .na
2097 \fBzfs_vdev_sync_read_min_active\fR (int)
2098 .ad
2099 .RS 12n
2100 Minimum synchronous read I/Os active to each device.
2101 See the section "ZFS I/O SCHEDULER".
2102 .sp
2103 Default value: \fB10\fR.
2104 .RE
2105
2106 .sp
2107 .ne 2
2108 .na
2109 \fBzfs_vdev_sync_write_max_active\fR (int)
2110 .ad
2111 .RS 12n
2112 Maximum synchronous write I/Os active to each device.
2113 See the section "ZFS I/O SCHEDULER".
2114 .sp
2115 Default value: \fB10\fR.
2116 .RE
2117
2118 .sp
2119 .ne 2
2120 .na
2121 \fBzfs_vdev_sync_write_min_active\fR (int)
2122 .ad
2123 .RS 12n
2124 Minimum synchronous write I/Os active to each device.
2125 See the section "ZFS I/O SCHEDULER".
2126 .sp
2127 Default value: \fB10\fR.
2128 .RE
2129
2130 .sp
2131 .ne 2
2132 .na
2133 \fBzfs_vdev_trim_max_active\fR (int)
2134 .ad
2135 .RS 12n
2136 Maximum trim/discard I/Os active to each device.
2137 See the section "ZFS I/O SCHEDULER".
2138 .sp
2139 Default value: \fB2\fR.
2140 .RE
2141
2142 .sp
2143 .ne 2
2144 .na
2145 \fBzfs_vdev_trim_min_active\fR (int)
2146 .ad
2147 .RS 12n
2148 Minimum trim/discard I/Os active to each device.
2149 See the section "ZFS I/O SCHEDULER".
2150 .sp
2151 Default value: \fB1\fR.
2152 .RE
2153
2154 .sp
2155 .ne 2
2156 .na
2157 \fBzfs_vdev_queue_depth_pct\fR (int)
2158 .ad
2159 .RS 12n
Maximum number of queued allocations per top-level vdev, expressed as
a percentage of \fBzfs_vdev_async_write_max_active\fR. This allows the
system to detect devices that are more capable of handling allocations
and to allocate more blocks to those devices. It allows for dynamic
allocation distribution when devices are imbalanced, as fuller devices
will tend to be slower than empty devices; see the worked example below.
2166
2167 See also \fBzio_dva_throttle_enabled\fR.
2168 .sp
2169 Default value: \fB1000\fR%.
2170 .RE
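.sp
.RS 12n
As a rough worked example using the defaults (a sketch of the arithmetic,
not the exact in-kernel expression), the implied allocation queue depth per
top-level vdev is:
.sp
.nf
max_active=10          # zfs_vdev_async_write_max_active default
queue_depth_pct=1000   # zfs_vdev_queue_depth_pct default
echo $(( max_active * queue_depth_pct / 100 ))   # prints 100 queued allocations
.fi
.RE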
2171
2172 .sp
2173 .ne 2
2174 .na
2175 \fBzfs_expire_snapshot\fR (int)
2176 .ad
2177 .RS 12n
Seconds before automounted snapshots under .zfs/snapshot expire and are unmounted.
2179 .sp
2180 Default value: \fB300\fR.
2181 .RE
2182
2183 .sp
2184 .ne 2
2185 .na
2186 \fBzfs_admin_snapshot\fR (int)
2187 .ad
2188 .RS 12n
2189 Allow the creation, removal, or renaming of entries in the .zfs/snapshot
2190 directory to cause the creation, destruction, or renaming of snapshots.
2191 When enabled this functionality works both locally and over NFS exports
2192 which have the 'no_root_squash' option set. This functionality is disabled
2193 by default.
2194 .sp
2195 Use \fB1\fR for yes and \fB0\fR for no (default).
2196 .RE
2197
2198 .sp
2199 .ne 2
2200 .na
2201 \fBzfs_flags\fR (int)
2202 .ad
2203 .RS 12n
Set additional debugging flags. The following flags may be bitwise-or'd
together; an example of doing so follows the table.
2206 .sp
2207 .TS
2208 box;
2209 rB lB
2210 lB lB
2211 r l.
2212 Value Symbolic Name
2213 Description
2214 _
2215 1 ZFS_DEBUG_DPRINTF
2216 Enable dprintf entries in the debug log.
2217 _
2218 2 ZFS_DEBUG_DBUF_VERIFY *
2219 Enable extra dbuf verifications.
2220 _
2221 4 ZFS_DEBUG_DNODE_VERIFY *
2222 Enable extra dnode verifications.
2223 _
2224 8 ZFS_DEBUG_SNAPNAMES
2225 Enable snapshot name verification.
2226 _
2227 16 ZFS_DEBUG_MODIFY
2228 Check for illegally modified ARC buffers.
2229 _
2230 64 ZFS_DEBUG_ZIO_FREE
2231 Enable verification of block frees.
2232 _
2233 128 ZFS_DEBUG_HISTOGRAM_VERIFY
2234 Enable extra spacemap histogram verifications.
2235 _
2236 256 ZFS_DEBUG_METASLAB_VERIFY
2237 Verify space accounting on disk matches in-core range_trees.
2238 _
2239 512 ZFS_DEBUG_SET_ERROR
2240 Enable SET_ERROR and dprintf entries in the debug log.
2241 _
2242 1024 ZFS_DEBUG_INDIRECT_REMAP
2243 Verify split blocks created by device removal.
2244 _
2245 2048 ZFS_DEBUG_TRIM
2246 Verify TRIM ranges are always within the allocatable range tree.
2247 _
2248 4096 ZFS_DEBUG_LOG_SPACEMAP
2249 Verify that the log summary is consistent with the spacemap log
2250 and enable zfs_dbgmsgs for metaslab loading and flushing.
2251 .TE
2252 .sp
2253 * Requires debug build.
2254 .sp
2255 Default value: \fB0\fR.
2256 .RE
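.sp
.RS 12n
For example, to enable both \fBZFS_DEBUG_DPRINTF\fR (1) and
\fBZFS_DEBUG_SET_ERROR\fR (512) on a running system, the values can be
bitwise-or'd together and written to the module parameter (a sketch; pick
whichever flags apply):
.sp
.nf
echo $(( 1 | 512 )) > /sys/module/zfs/parameters/zfs_flags   # writes 513
cat /sys/module/zfs/parameters/zfs_flags
.fi
.RE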
2257
2258 .sp
2259 .ne 2
2260 .na
2261 \fBzfs_free_leak_on_eio\fR (int)
2262 .ad
2263 .RS 12n
2264 If destroy encounters an EIO while reading metadata (e.g. indirect
2265 blocks), space referenced by the missing metadata can not be freed.
2266 Normally this causes the background destroy to become "stalled", as
2267 it is unable to make forward progress. While in this stalled state,
2268 all remaining space to free from the error-encountering filesystem is
2269 "temporarily leaked". Set this flag to cause it to ignore the EIO,
2270 permanently leak the space from indirect blocks that can not be read,
2271 and continue to free everything else that it can.
2272
2273 The default, "stalling" behavior is useful if the storage partially
2274 fails (i.e. some but not all i/os fail), and then later recovers. In
2275 this case, we will be able to continue pool operations while it is
2276 partially failed, and when it recovers, we can continue to free the
2277 space, with no leaks. However, note that this case is actually
2278 fairly rare.
2279
2280 Typically pools either (a) fail completely (but perhaps temporarily,
2281 e.g. a top-level vdev going offline), or (b) have localized,
2282 permanent errors (e.g. disk returns the wrong data due to bit flip or
2283 firmware bug). In case (a), this setting does not matter because the
2284 pool will be suspended and the sync thread will not be able to make
2285 forward progress regardless. In case (b), because the error is
2286 permanent, the best we can do is leak the minimum amount of space,
2287 which is what setting this flag will do. Therefore, it is reasonable
2288 for this flag to normally be set, but we chose the more conservative
2289 approach of not setting it, so that there is no possibility of
2290 leaking space in the "partial temporary" failure case.
2291 .sp
2292 Default value: \fB0\fR.
2293 .RE
2294
2295 .sp
2296 .ne 2
2297 .na
2298 \fBzfs_free_min_time_ms\fR (int)
2299 .ad
2300 .RS 12n
2301 During a \fBzfs destroy\fR operation using \fBfeature@async_destroy\fR a minimum
2302 of this much time will be spent working on freeing blocks per txg.
2303 .sp
2304 Default value: \fB1,000\fR.
2305 .RE
2306
2307 .sp
2308 .ne 2
2309 .na
2310 \fBzfs_obsolete_min_time_ms\fR (int)
2311 .ad
2312 .RS 12n
2313 Similar to \fBzfs_free_min_time_ms\fR but for cleanup of old indirection records
2314 for removed vdevs.
2315 .sp
2316 Default value: \fB500\fR.
2317 .RE
2318
2319 .sp
2320 .ne 2
2321 .na
2322 \fBzfs_immediate_write_sz\fR (long)
2323 .ad
2324 .RS 12n
Largest data block to write to the ZIL. Larger blocks will be treated as if the
2326 dataset being written to had the property setting \fBlogbias=throughput\fR.
2327 .sp
2328 Default value: \fB32,768\fR.
2329 .RE
2330
2331 .sp
2332 .ne 2
2333 .na
2334 \fBzfs_initialize_value\fR (ulong)
2335 .ad
2336 .RS 12n
2337 Pattern written to vdev free space by \fBzpool initialize\fR.
2338 .sp
2339 Default value: \fB16,045,690,984,833,335,022\fR (0xdeadbeefdeadbeee).
2340 .RE
2341
2342 .sp
2343 .ne 2
2344 .na
2345 \fBzfs_initialize_chunk_size\fR (ulong)
2346 .ad
2347 .RS 12n
2348 Size of writes used by \fBzpool initialize\fR.
2349 This option is used by the test suite to facilitate testing.
2350 .sp
2351 Default value: \fB1,048,576\fR
2352 .RE
2353
2354 .sp
2355 .ne 2
2356 .na
2357 \fBzfs_livelist_max_entries\fR (ulong)
2358 .ad
2359 .RS 12n
2360 The threshold size (in block pointers) at which we create a new sub-livelist.
2361 Larger sublists are more costly from a memory perspective but the fewer
2362 sublists there are, the lower the cost of insertion.
2363 .sp
2364 Default value: \fB500,000\fR.
2365 .RE
2366
2367 .sp
2368 .ne 2
2369 .na
2370 \fBzfs_livelist_min_percent_shared\fR (int)
2371 .ad
2372 .RS 12n
2373 If the amount of shared space between a snapshot and its clone drops below
2374 this threshold, the clone turns off the livelist and reverts to the old deletion
method. This is in place because once a clone has been overwritten enough,
livelists no longer give us a benefit.
2377 .sp
2378 Default value: \fB75\fR.
2379 .RE
2380
2381 .sp
2382 .ne 2
2383 .na
2384 \fBzfs_livelist_condense_new_alloc\fR (int)
2385 .ad
2386 .RS 12n
2387 Incremented each time an extra ALLOC blkptr is added to a livelist entry while
2388 it is being condensed.
2389 This option is used by the test suite to track race conditions.
2390 .sp
2391 Default value: \fB0\fR.
2392 .RE
2393
2394 .sp
2395 .ne 2
2396 .na
2397 \fBzfs_livelist_condense_sync_cancel\fR (int)
2398 .ad
2399 .RS 12n
2400 Incremented each time livelist condensing is canceled while in
2401 spa_livelist_condense_sync.
2402 This option is used by the test suite to track race conditions.
2403 .sp
2404 Default value: \fB0\fR.
2405 .RE
2406
2407 .sp
2408 .ne 2
2409 .na
2410 \fBzfs_livelist_condense_sync_pause\fR (int)
2411 .ad
2412 .RS 12n
2413 When set, the livelist condense process pauses indefinitely before
2414 executing the synctask - spa_livelist_condense_sync.
2415 This option is used by the test suite to trigger race conditions.
2416 .sp
2417 Default value: \fB0\fR.
2418 .RE
2419
2420 .sp
2421 .ne 2
2422 .na
2423 \fBzfs_livelist_condense_zthr_cancel\fR (int)
2424 .ad
2425 .RS 12n
2426 Incremented each time livelist condensing is canceled while in
2427 spa_livelist_condense_cb.
2428 This option is used by the test suite to track race conditions.
2429 .sp
2430 Default value: \fB0\fR.
2431 .RE
2432
2433 .sp
2434 .ne 2
2435 .na
2436 \fBzfs_livelist_condense_zthr_pause\fR (int)
2437 .ad
2438 .RS 12n
2439 When set, the livelist condense process pauses indefinitely before
2440 executing the open context condensing work in spa_livelist_condense_cb.
2441 This option is used by the test suite to trigger race conditions.
2442 .sp
2443 Default value: \fB0\fR.
2444 .RE
2445
2446 .sp
2447 .ne 2
2448 .na
2449 \fBzfs_lua_max_instrlimit\fR (ulong)
2450 .ad
2451 .RS 12n
2452 The maximum execution time limit that can be set for a ZFS channel program,
2453 specified as a number of Lua instructions.
2454 .sp
2455 Default value: \fB100,000,000\fR.
2456 .RE
2457
2458 .sp
2459 .ne 2
2460 .na
2461 \fBzfs_lua_max_memlimit\fR (ulong)
2462 .ad
2463 .RS 12n
2464 The maximum memory limit that can be set for a ZFS channel program, specified
2465 in bytes.
2466 .sp
2467 Default value: \fB104,857,600\fR.
2468 .RE
2469
2470 .sp
2471 .ne 2
2472 .na
2473 \fBzfs_max_dataset_nesting\fR (int)
2474 .ad
2475 .RS 12n
2476 The maximum depth of nested datasets. This value can be tuned temporarily to
2477 fix existing datasets that exceed the predefined limit.
2478 .sp
2479 Default value: \fB50\fR.
2480 .RE
2481
2482 .sp
2483 .ne 2
2484 .na
2485 \fBzfs_max_log_walking\fR (ulong)
2486 .ad
2487 .RS 12n
2488 The number of past TXGs that the flushing algorithm of the log spacemap
2489 feature uses to estimate incoming log blocks.
2490 .sp
2491 Default value: \fB5\fR.
2492 .RE
2493
2494 .sp
2495 .ne 2
2496 .na
2497 \fBzfs_max_logsm_summary_length\fR (ulong)
2498 .ad
2499 .RS 12n
2500 Maximum number of rows allowed in the summary of the spacemap log.
2501 .sp
2502 Default value: \fB10\fR.
2503 .RE
2504
2505 .sp
2506 .ne 2
2507 .na
2508 \fBzfs_max_recordsize\fR (int)
2509 .ad
2510 .RS 12n
2511 We currently support block sizes from 512 bytes to 16MB. The benefits of
2512 larger blocks, and thus larger I/O, need to be weighed against the cost of
2513 COWing a giant block to modify one byte. Additionally, very large blocks
2514 can have an impact on i/o latency, and also potentially on the memory
2515 allocator. Therefore, we do not allow the recordsize to be set larger than
2516 zfs_max_recordsize (default 1MB). Larger blocks can be created by changing
2517 this tunable, and pools with larger blocks can always be imported and used,
2518 regardless of this setting.
2519 .sp
2520 Default value: \fB1,048,576\fR.
2521 .RE
2522
2523 .sp
2524 .ne 2
2525 .na
2526 \fBzfs_allow_redacted_dataset_mount\fR (int)
2527 .ad
2528 .RS 12n
2529 Allow datasets received with redacted send/receive to be mounted. Normally
2530 disabled because these datasets may be missing key data.
2531 .sp
2532 Default value: \fB0\fR.
2533 .RE
2534
2535 .sp
2536 .ne 2
2537 .na
2538 \fBzfs_min_metaslabs_to_flush\fR (ulong)
2539 .ad
2540 .RS 12n
2541 Minimum number of metaslabs to flush per dirty TXG
2542 .sp
2543 Default value: \fB1\fR.
2544 .RE
2545
2546 .sp
2547 .ne 2
2548 .na
2549 \fBzfs_metaslab_fragmentation_threshold\fR (int)
2550 .ad
2551 .RS 12n
2552 Allow metaslabs to keep their active state as long as their fragmentation
2553 percentage is less than or equal to this value. An active metaslab that
2554 exceeds this threshold will no longer keep its active status allowing
2555 better metaslabs to be selected.
2556 .sp
2557 Default value: \fB70\fR.
2558 .RE
2559
2560 .sp
2561 .ne 2
2562 .na
2563 \fBzfs_mg_fragmentation_threshold\fR (int)
2564 .ad
2565 .RS 12n
2566 Metaslab groups are considered eligible for allocations if their
2567 fragmentation metric (measured as a percentage) is less than or equal to
2568 this value. If a metaslab group exceeds this threshold then it will be
2569 skipped unless all metaslab groups within the metaslab class have also
2570 crossed this threshold.
2571 .sp
2572 Default value: \fB95\fR.
2573 .RE
2574
2575 .sp
2576 .ne 2
2577 .na
2578 \fBzfs_mg_noalloc_threshold\fR (int)
2579 .ad
2580 .RS 12n
2581 Defines a threshold at which metaslab groups should be eligible for
2582 allocations. The value is expressed as a percentage of free space
2583 beyond which a metaslab group is always eligible for allocations.
2584 If a metaslab group's free space is less than or equal to the
2585 threshold, the allocator will avoid allocating to that group
2586 unless all groups in the pool have reached the threshold. Once all
2587 groups have reached the threshold, all groups are allowed to accept
2588 allocations. The default value of 0 disables the feature and causes
2589 all metaslab groups to be eligible for allocations.
2590
2591 This parameter allows one to deal with pools having heavily imbalanced
2592 vdevs such as would be the case when a new vdev has been added.
2593 Setting the threshold to a non-zero percentage will stop allocations
2594 from being made to vdevs that aren't filled to the specified percentage
2595 and allow lesser filled vdevs to acquire more allocations than they
2596 otherwise would under the old \fBzfs_mg_alloc_failures\fR facility.
2597 .sp
2598 Default value: \fB0\fR.
2599 .RE
2600
2601 .sp
2602 .ne 2
2603 .na
2604 \fBzfs_ddt_data_is_special\fR (int)
2605 .ad
2606 .RS 12n
2607 If enabled, ZFS will place DDT data into the special allocation class.
2608 .sp
2609 Default value: \fB1\fR.
2610 .RE
2611
2612 .sp
2613 .ne 2
2614 .na
2615 \fBzfs_user_indirect_is_special\fR (int)
2616 .ad
2617 .RS 12n
2618 If enabled, ZFS will place user data (both file and zvol) indirect blocks
2619 into the special allocation class.
2620 .sp
2621 Default value: \fB1\fR.
2622 .RE
2623
2624 .sp
2625 .ne 2
2626 .na
2627 \fBzfs_multihost_history\fR (int)
2628 .ad
2629 .RS 12n
2630 Historical statistics for the last N multihost updates will be available in
2631 \fB/proc/spl/kstat/zfs/<pool>/multihost\fR
2632 .sp
2633 Default value: \fB0\fR.
2634 .RE
2635
2636 .sp
2637 .ne 2
2638 .na
2639 \fBzfs_multihost_interval\fR (ulong)
2640 .ad
2641 .RS 12n
2642 Used to control the frequency of multihost writes which are performed when the
2643 \fBmultihost\fR pool property is on. This is one factor used to determine the
2644 length of the activity check during import.
2645 .sp
The multihost write period is \fBzfs_multihost_interval / leaf-vdevs\fR
milliseconds. On average a multihost write will be issued for each leaf vdev
every \fBzfs_multihost_interval\fR milliseconds; see the worked example below.
In practice, the observed period can vary with the I/O load, and it is this
observed value that is stored in the uberblock as the mmp delay.
2651 .sp
2652 Default value: \fB1000\fR.
2653 .RE
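.sp
.RS 12n
As a worked example (a sketch assuming the default interval and a
hypothetical pool with 8 leaf vdevs), the expected multihost write period is:
.sp
.nf
interval_ms=1000   # zfs_multihost_interval default
leaf_vdevs=8       # assumed pool layout
echo "$(( interval_ms / leaf_vdevs )) ms between writes, one per leaf every ${interval_ms} ms"
.fi
.RE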
2654
2655 .sp
2656 .ne 2
2657 .na
2658 \fBzfs_multihost_import_intervals\fR (uint)
2659 .ad
2660 .RS 12n
2661 Used to control the duration of the activity test on import. Smaller values of
2662 \fBzfs_multihost_import_intervals\fR will reduce the import time but increase
2663 the risk of failing to detect an active pool. The total activity check time is
2664 never allowed to drop below one second.
2665 .sp
2666 On import the activity check waits a minimum amount of time determined by
2667 \fBzfs_multihost_interval * zfs_multihost_import_intervals\fR, or the same
2668 product computed on the host which last had the pool imported (whichever is
2669 greater). The activity check time may be further extended if the value of mmp
2670 delay found in the best uberblock indicates actual multihost updates happened
2671 at longer intervals than \fBzfs_multihost_interval\fR. A minimum value of
2672 \fB100ms\fR is enforced.
2673 .sp
2674 A value of 0 is ignored and treated as if it was set to 1.
2675 .sp
2676 Default value: \fB20\fR.
2677 .RE
2678
2679 .sp
2680 .ne 2
2681 .na
2682 \fBzfs_multihost_fail_intervals\fR (uint)
2683 .ad
2684 .RS 12n
2685 Controls the behavior of the pool when multihost write failures or delays are
2686 detected.
2687 .sp
2688 When \fBzfs_multihost_fail_intervals = 0\fR, multihost write failures or delays
are ignored. The failures will still be reported to the ZED which, depending on
its configuration, may take action such as suspending the pool or offlining a
device.
2692
2693 .sp
When \fBzfs_multihost_fail_intervals > 0\fR, the pool will be suspended if
\fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR milliseconds pass
without a successful mmp write; see the worked example below. This guarantees
the activity test will see mmp writes if the pool is imported. A value of 1 is
ignored and treated as if it was set to 2. This is necessary to prevent the
pool from being suspended due to normal, small I/O latency variations.
2700
2701 .sp
2702 Default value: \fB10\fR.
2703 .RE
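.sp
.RS 12n
As a worked example with the defaults (a sketch of the arithmetic only), the
pool would be suspended after:
.sp
.nf
fail_intervals=10   # zfs_multihost_fail_intervals default
interval_ms=1000    # zfs_multihost_interval default
echo "$(( fail_intervals * interval_ms )) ms without a successful mmp write"   # 10000 ms
.fi
.RE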
2704
2705 .sp
2706 .ne 2
2707 .na
2708 \fBzfs_no_scrub_io\fR (int)
2709 .ad
2710 .RS 12n
2711 Set for no scrub I/O. This results in scrubs not actually scrubbing data and
2712 simply doing a metadata crawl of the pool instead.
2713 .sp
2714 Use \fB1\fR for yes and \fB0\fR for no (default).
2715 .RE
2716
2717 .sp
2718 .ne 2
2719 .na
2720 \fBzfs_no_scrub_prefetch\fR (int)
2721 .ad
2722 .RS 12n
2723 Set to disable block prefetching for scrubs.
2724 .sp
2725 Use \fB1\fR for yes and \fB0\fR for no (default).
2726 .RE
2727
2728 .sp
2729 .ne 2
2730 .na
2731 \fBzfs_nocacheflush\fR (int)
2732 .ad
2733 .RS 12n
2734 Disable cache flush operations on disks when writing. Setting this will
2735 cause pool corruption on power loss if a volatile out-of-order write cache
2736 is enabled.
2737 .sp
2738 Use \fB1\fR for yes and \fB0\fR for no (default).
2739 .RE
2740
2741 .sp
2742 .ne 2
2743 .na
2744 \fBzfs_nopwrite_enabled\fR (int)
2745 .ad
2746 .RS 12n
2747 Enable NOP writes
2748 .sp
2749 Use \fB1\fR for yes (default) and \fB0\fR to disable.
2750 .RE
2751
2752 .sp
2753 .ne 2
2754 .na
2755 \fBzfs_dmu_offset_next_sync\fR (int)
2756 .ad
2757 .RS 12n
Enable forcing txg sync to find holes. When enabled, ZFS behaves as in prior
versions when the SEEK_HOLE or SEEK_DATA flags are used: if a dnode is dirty,
txgs are synced so that the hole/data information can be found.
2762 .sp
2763 Use \fB1\fR for yes and \fB0\fR to disable (default).
2764 .RE
2765
2766 .sp
2767 .ne 2
2768 .na
2769 \fBzfs_pd_bytes_max\fR (int)
2770 .ad
2771 .RS 12n
2772 The number of bytes which should be prefetched during a pool traversal
(e.g. \fBzfs send\fR or other data crawling operations).
2774 .sp
2775 Default value: \fB52,428,800\fR.
2776 .RE
2777
2778 .sp
2779 .ne 2
2780 .na
2781 \fBzfs_per_txg_dirty_frees_percent \fR (ulong)
2782 .ad
2783 .RS 12n
2784 Tunable to control percentage of dirtied indirect blocks from frees allowed
2785 into one TXG. After this threshold is crossed, additional frees will wait until
2786 the next TXG.
2787 A value of zero will disable this throttle.
2788 .sp
2789 Default value: \fB5\fR, set to \fB0\fR to disable.
2790 .RE
2791
2792 .sp
2793 .ne 2
2794 .na
2795 \fBzfs_prefetch_disable\fR (int)
2796 .ad
2797 .RS 12n
2798 This tunable disables predictive prefetch. Note that it leaves "prescient"
2799 prefetch (e.g. prefetch for zfs send) intact. Unlike predictive prefetch,
2800 prescient prefetch never issues i/os that end up not being needed, so it
2801 can't hurt performance.
2802 .sp
2803 Use \fB1\fR for yes and \fB0\fR for no (default).
2804 .RE
2805
2806 .sp
2807 .ne 2
2808 .na
2809 \fBzfs_qat_checksum_disable\fR (int)
2810 .ad
2811 .RS 12n
2812 This tunable disables qat hardware acceleration for sha256 checksums. It
2813 may be set after the zfs modules have been loaded to initialize the qat
2814 hardware as long as support is compiled in and the qat driver is present.
2815 .sp
2816 Use \fB1\fR for yes and \fB0\fR for no (default).
2817 .RE
2818
2819 .sp
2820 .ne 2
2821 .na
2822 \fBzfs_qat_compress_disable\fR (int)
2823 .ad
2824 .RS 12n
2825 This tunable disables qat hardware acceleration for gzip compression. It
2826 may be set after the zfs modules have been loaded to initialize the qat
2827 hardware as long as support is compiled in and the qat driver is present.
2828 .sp
2829 Use \fB1\fR for yes and \fB0\fR for no (default).
2830 .RE
2831
2832 .sp
2833 .ne 2
2834 .na
2835 \fBzfs_qat_encrypt_disable\fR (int)
2836 .ad
2837 .RS 12n
2838 This tunable disables qat hardware acceleration for AES-GCM encryption. It
2839 may be set after the zfs modules have been loaded to initialize the qat
2840 hardware as long as support is compiled in and the qat driver is present.
2841 .sp
2842 Use \fB1\fR for yes and \fB0\fR for no (default).
2843 .RE
2844
2845 .sp
2846 .ne 2
2847 .na
2848 \fBzfs_read_chunk_size\fR (long)
2849 .ad
2850 .RS 12n
2851 Bytes to read per chunk
2852 .sp
2853 Default value: \fB1,048,576\fR.
2854 .RE
2855
2856 .sp
2857 .ne 2
2858 .na
2859 \fBzfs_read_history\fR (int)
2860 .ad
2861 .RS 12n
2862 Historical statistics for the last N reads will be available in
2863 \fB/proc/spl/kstat/zfs/<pool>/reads\fR
2864 .sp
2865 Default value: \fB0\fR (no data is kept).
2866 .RE
2867
2868 .sp
2869 .ne 2
2870 .na
2871 \fBzfs_read_history_hits\fR (int)
2872 .ad
2873 .RS 12n
2874 Include cache hits in read history
2875 .sp
2876 Use \fB1\fR for yes and \fB0\fR for no (default).
2877 .RE
2878
2879 .sp
2880 .ne 2
2881 .na
2882 \fBzfs_rebuild_max_segment\fR (ulong)
2883 .ad
2884 .RS 12n
2885 Maximum read segment size to issue when sequentially resilvering a
2886 top-level vdev.
2887 .sp
2888 Default value: \fB1,048,576\fR.
2889 .RE
2890
2891 .sp
2892 .ne 2
2893 .na
2894 \fBzfs_reconstruct_indirect_combinations_max\fR (int)
2895 .ad
.RS 12n
2897 If an indirect split block contains more than this many possible unique
2898 combinations when being reconstructed, consider it too computationally
2899 expensive to check them all. Instead, try at most
2900 \fBzfs_reconstruct_indirect_combinations_max\fR randomly-selected
2901 combinations each time the block is accessed. This allows all segment
2902 copies to participate fairly in the reconstruction when all combinations
2903 cannot be checked and prevents repeated use of one bad copy.
2904 .sp
2905 Default value: \fB4096\fR.
2906 .RE
2907
2908 .sp
2909 .ne 2
2910 .na
2911 \fBzfs_recover\fR (int)
2912 .ad
2913 .RS 12n
2914 Set to attempt to recover from fatal errors. This should only be used as a
2915 last resort, as it typically results in leaked space, or worse.
2916 .sp
2917 Use \fB1\fR for yes and \fB0\fR for no (default).
2918 .RE
2919
2920 .sp
2921 .ne 2
2922 .na
2923 \fBzfs_removal_ignore_errors\fR (int)
2924 .ad
2925 .RS 12n
2926 .sp
2927 Ignore hard IO errors during device removal. When set, if a device encounters
2928 a hard IO error during the removal process the removal will not be cancelled.
2929 This can result in a normally recoverable block becoming permanently damaged
2930 and is not recommended. This should only be used as a last resort when the
2931 pool cannot be returned to a healthy state prior to removing the device.
2932 .sp
2933 Default value: \fB0\fR.
2934 .RE
2935
2936 .sp
2937 .ne 2
2938 .na
2939 \fBzfs_removal_suspend_progress\fR (int)
2940 .ad
2941 .RS 12n
2942 .sp
2943 This is used by the test suite so that it can ensure that certain actions
2944 happen while in the middle of a removal.
2945 .sp
2946 Default value: \fB0\fR.
2947 .RE
2948
2949 .sp
2950 .ne 2
2951 .na
2952 \fBzfs_remove_max_segment\fR (int)
2953 .ad
2954 .RS 12n
2955 .sp
2956 The largest contiguous segment that we will attempt to allocate when removing
2957 a device. This can be no larger than 16MB. If there is a performance
2958 problem with attempting to allocate large blocks, consider decreasing this.
2959 .sp
2960 Default value: \fB16,777,216\fR (16MB).
2961 .RE
2962
2963 .sp
2964 .ne 2
2965 .na
2966 \fBzfs_resilver_disable_defer\fR (int)
2967 .ad
2968 .RS 12n
2969 Disables the \fBresilver_defer\fR feature, causing an operation that would
2970 start a resilver to restart one in progress immediately.
2971 .sp
2972 Default value: \fB0\fR (feature enabled).
2973 .RE
2974
2975 .sp
2976 .ne 2
2977 .na
2978 \fBzfs_resilver_min_time_ms\fR (int)
2979 .ad
2980 .RS 12n
2981 Resilvers are processed by the sync thread. While resilvering it will spend
2982 at least this much time working on a resilver between txg flushes.
2983 .sp
2984 Default value: \fB3,000\fR.
2985 .RE
2986
2987 .sp
2988 .ne 2
2989 .na
2990 \fBzfs_scan_ignore_errors\fR (int)
2991 .ad
2992 .RS 12n
2993 If set to a nonzero value, remove the DTL (dirty time list) upon
2994 completion of a pool scan (scrub) even if there were unrepairable
2995 errors. It is intended to be used during pool repair or recovery to
2996 stop resilvering when the pool is next imported.
2997 .sp
2998 Default value: \fB0\fR.
2999 .RE
3000
3001 .sp
3002 .ne 2
3003 .na
3004 \fBzfs_scrub_min_time_ms\fR (int)
3005 .ad
3006 .RS 12n
3007 Scrubs are processed by the sync thread. While scrubbing it will spend
3008 at least this much time working on a scrub between txg flushes.
3009 .sp
3010 Default value: \fB1,000\fR.
3011 .RE
3012
3013 .sp
3014 .ne 2
3015 .na
3016 \fBzfs_scan_checkpoint_intval\fR (int)
3017 .ad
3018 .RS 12n
To preserve progress across reboots, the sequential scan algorithm periodically
needs to stop metadata scanning and issue all the verification I/Os to disk.
3021 The frequency of this flushing is determined by the
3022 \fBzfs_scan_checkpoint_intval\fR tunable.
3023 .sp
3024 Default value: \fB7200\fR seconds (every 2 hours).
3025 .RE
3026
3027 .sp
3028 .ne 2
3029 .na
3030 \fBzfs_scan_fill_weight\fR (int)
3031 .ad
3032 .RS 12n
3033 This tunable affects how scrub and resilver I/O segments are ordered. A higher
3034 number indicates that we care more about how filled in a segment is, while a
3035 lower number indicates we care more about the size of the extent without
3036 considering the gaps within a segment. This value is only tunable upon module
insertion. Changing the value afterwards will have no effect on scrub or
3038 resilver performance.
3039 .sp
3040 Default value: \fB3\fR.
3041 .RE
3042
3043 .sp
3044 .ne 2
3045 .na
3046 \fBzfs_scan_issue_strategy\fR (int)
3047 .ad
3048 .RS 12n
3049 Determines the order that data will be verified while scrubbing or resilvering.
3050 If set to \fB1\fR, data will be verified as sequentially as possible, given the
3051 amount of memory reserved for scrubbing (see \fBzfs_scan_mem_lim_fact\fR). This
3052 may improve scrub performance if the pool's data is very fragmented. If set to
3053 \fB2\fR, the largest mostly-contiguous chunk of found data will be verified
3054 first. By deferring scrubbing of small segments, we may later find adjacent data
3055 to coalesce and increase the segment size. If set to \fB0\fR, zfs will use
3056 strategy \fB1\fR during normal verification and strategy \fB2\fR while taking a
3057 checkpoint.
3058 .sp
3059 Default value: \fB0\fR.
3060 .RE
3061
3062 .sp
3063 .ne 2
3064 .na
3065 \fBzfs_scan_legacy\fR (int)
3066 .ad
3067 .RS 12n
3068 A value of 0 indicates that scrubs and resilvers will gather metadata in
3069 memory before issuing sequential I/O. A value of 1 indicates that the legacy
3070 algorithm will be used where I/O is initiated as soon as it is discovered.
3071 Changing this value to 0 will not affect scrubs or resilvers that are already
3072 in progress.
3073 .sp
3074 Default value: \fB0\fR.
3075 .RE
3076
3077 .sp
3078 .ne 2
3079 .na
3080 \fBzfs_scan_max_ext_gap\fR (int)
3081 .ad
3082 .RS 12n
3083 Indicates the largest gap in bytes between scrub / resilver I/Os that will still
3084 be considered sequential for sorting purposes. Changing this value will not
3085 affect scrubs or resilvers that are already in progress.
3086 .sp
3087 Default value: \fB2097152 (2 MB)\fR.
3088 .RE
3089
3090 .sp
3091 .ne 2
3092 .na
3093 \fBzfs_scan_mem_lim_fact\fR (int)
3094 .ad
3095 .RS 12n
3096 Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
3097 This tunable determines the hard limit for I/O sorting memory usage.
3098 When the hard limit is reached we stop scanning metadata and start issuing
3099 data verification I/O. This is done until we get below the soft limit.
3100 .sp
3101 Default value: \fB20\fR which is 5% of RAM (1/20).
3102 .RE
3103
3104 .sp
3105 .ne 2
3106 .na
3107 \fBzfs_scan_mem_lim_soft_fact\fR (int)
3108 .ad
3109 .RS 12n
The fraction of the hard limit used to determine the soft limit for I/O
sorting by the sequential scan algorithm. When we cross this limit from below
no action is taken. When we cross this limit from above it is because we are
issuing verification I/O. In this case (unless the metadata scan is done) we
stop issuing verification I/O and start scanning metadata again until we get
to the hard limit. A worked example of both limits follows this entry.
3116 .sp
3117 Default value: \fB20\fR which is 5% of the hard limit (1/20).
3118 .RE
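.sp
.RS 12n
A short sketch of the resulting limits, assuming a machine with 32 GiB of RAM
and the default value (20) for both \fBzfs_scan_mem_lim_fact\fR and
\fBzfs_scan_mem_lim_soft_fact\fR:
.sp
.nf
ram_bytes=$(( 32 * 1024 * 1024 * 1024 ))   # assumed system memory
hard=$(( ram_bytes / 20 ))                 # zfs_scan_mem_lim_fact: 1/20 of RAM
soft=$(( hard / 20 ))                      # zfs_scan_mem_lim_soft_fact: 1/20 of the hard limit
echo "hard limit: ${hard} bytes, soft limit: ${soft} bytes"
.fi
.RE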
3119
3120 .sp
3121 .ne 2
3122 .na
3123 \fBzfs_scan_strict_mem_lim\fR (int)
3124 .ad
3125 .RS 12n
3126 Enforces tight memory limits on pool scans when a sequential scan is in
3127 progress. When disabled the memory limit may be exceeded by fast disks.
3128 .sp
3129 Default value: \fB0\fR.
3130 .RE
3131
3132 .sp
3133 .ne 2
3134 .na
3135 \fBzfs_scan_suspend_progress\fR (int)
3136 .ad
3137 .RS 12n
3138 Freezes a scrub/resilver in progress without actually pausing it. Intended for
3139 testing/debugging.
3140 .sp
3141 Default value: \fB0\fR.
3142 .RE
3143
3144
3145 .sp
3146 .ne 2
3147 .na
3148 \fBzfs_scan_vdev_limit\fR (int)
3149 .ad
3150 .RS 12n
Maximum amount of data that can be concurrently issued for scrubs and
resilvers per leaf device, given in bytes.
3153 .sp
3154 Default value: \fB41943040\fR.
3155 .RE
3156
3157 .sp
3158 .ne 2
3159 .na
3160 \fBzfs_send_corrupt_data\fR (int)
3161 .ad
3162 .RS 12n
3163 Allow sending of corrupt data (ignore read/checksum errors when sending data)
3164 .sp
3165 Use \fB1\fR for yes and \fB0\fR for no (default).
3166 .RE
3167
3168 .sp
3169 .ne 2
3170 .na
3171 \fBzfs_send_unmodified_spill_blocks\fR (int)
3172 .ad
3173 .RS 12n
3174 Include unmodified spill blocks in the send stream. Under certain circumstances
3175 previous versions of ZFS could incorrectly remove the spill block from an
3176 existing object. Including unmodified copies of the spill blocks creates a
3177 backwards compatible stream which will recreate a spill block if it was
3178 incorrectly removed.
3179 .sp
3180 Use \fB1\fR for yes (default) and \fB0\fR for no.
3181 .RE
3182
3183 .sp
3184 .ne 2
3185 .na
3186 \fBzfs_send_no_prefetch_queue_ff\fR (int)
3187 .ad
3188 .RS 12n
3189 The fill fraction of the \fBzfs send\fR internal queues. The fill fraction
3190 controls the timing with which internal threads are woken up.
3191 .sp
3192 Default value: \fB20\fR.
3193 .RE
3194
3195 .sp
3196 .ne 2
3197 .na
3198 \fBzfs_send_no_prefetch_queue_length\fR (int)
3199 .ad
3200 .RS 12n
3201 The maximum number of bytes allowed in \fBzfs send\fR's internal queues.
3202 .sp
3203 Default value: \fB1,048,576\fR.
3204 .RE
3205
3206 .sp
3207 .ne 2
3208 .na
3209 \fBzfs_send_queue_ff\fR (int)
3210 .ad
3211 .RS 12n
3212 The fill fraction of the \fBzfs send\fR prefetch queue. The fill fraction
3213 controls the timing with which internal threads are woken up.
3214 .sp
3215 Default value: \fB20\fR.
3216 .RE
3217
3218 .sp
3219 .ne 2
3220 .na
3221 \fBzfs_send_queue_length\fR (int)
3222 .ad
3223 .RS 12n
3224 The maximum number of bytes allowed that will be prefetched by \fBzfs send\fR.
3225 This value must be at least twice the maximum block size in use.
3226 .sp
3227 Default value: \fB16,777,216\fR.
3228 .RE
3229
3230 .sp
3231 .ne 2
3232 .na
3233 \fBzfs_recv_queue_ff\fR (int)
3234 .ad
3235 .RS 12n
3236 The fill fraction of the \fBzfs receive\fR queue. The fill fraction
3237 controls the timing with which internal threads are woken up.
3238 .sp
3239 Default value: \fB20\fR.
3240 .RE
3241
3242 .sp
3243 .ne 2
3244 .na
3245 \fBzfs_recv_queue_length\fR (int)
3246 .ad
3247 .RS 12n
3248 The maximum number of bytes allowed in the \fBzfs receive\fR queue. This value
3249 must be at least twice the maximum block size in use.
3250 .sp
3251 Default value: \fB16,777,216\fR.
3252 .RE
3253
3254 .sp
3255 .ne 2
3256 .na
3257 \fBzfs_recv_write_batch_size\fR (int)
3258 .ad
3259 .RS 12n
3260 The maximum amount of data (in bytes) that \fBzfs receive\fR will write in
3261 one DMU transaction. This is the uncompressed size, even when receiving a
3262 compressed send stream. This setting will not reduce the write size below
a single block. Capped at a maximum of 32MB.
3264 .sp
3265 Default value: \fB1MB\fR.
3266 .RE
3267
3268 .sp
3269 .ne 2
3270 .na
3271 \fBzfs_override_estimate_recordsize\fR (ulong)
3272 .ad
3273 .RS 12n
3274 Setting this variable overrides the default logic for estimating block
3275 sizes when doing a zfs send. The default heuristic is that the average
3276 block size will be the current recordsize. Override this value if most data
3277 in your dataset is not of that size and you require accurate zfs send size
3278 estimates.
3279 .sp
3280 Default value: \fB0\fR.
3281 .RE
3282
3283 .sp
3284 .ne 2
3285 .na
3286 \fBzfs_sync_pass_deferred_free\fR (int)
3287 .ad
3288 .RS 12n
3289 Flushing of data to disk is done in passes. Defer frees starting in this pass
3290 .sp
3291 Default value: \fB2\fR.
3292 .RE
3293
3294 .sp
3295 .ne 2
3296 .na
3297 \fBzfs_spa_discard_memory_limit\fR (int)
3298 .ad
3299 .RS 12n
3300 Maximum memory used for prefetching a checkpoint's space map on each
3301 vdev while discarding the checkpoint.
3302 .sp
3303 Default value: \fB16,777,216\fR.
3304 .RE
3305
3306 .sp
3307 .ne 2
3308 .na
3309 \fBzfs_special_class_metadata_reserve_pct\fR (int)
3310 .ad
3311 .RS 12n
3312 Only allow small data blocks to be allocated on the special and dedup vdev
3313 types when the available free space percentage on these vdevs exceeds this
3314 value. This ensures reserved space is available for pool meta data as the
3315 special vdevs approach capacity.
3316 .sp
3317 Default value: \fB25\fR.
3318 .RE
3319
3320 .sp
3321 .ne 2
3322 .na
3323 \fBzfs_sync_pass_dont_compress\fR (int)
3324 .ad
3325 .RS 12n
3326 Starting in this sync pass, we disable compression (including of metadata).
3327 With the default setting, in practice, we don't have this many sync passes,
3328 so this has no effect.
3329 .sp
3330 The original intent was that disabling compression would help the sync passes
3331 to converge. However, in practice disabling compression increases the average
number of sync passes, because when we turn compression off, many blocks
will change size and thus we have to re-allocate (not overwrite) them. It
3334 also increases the number of 128KB allocations (e.g. for indirect blocks and
3335 spacemaps) because these will not be compressed. The 128K allocations are
3336 especially detrimental to performance on highly fragmented systems, which may
3337 have very few free segments of this size, and may need to load new metaslabs
3338 to satisfy 128K allocations.
3339 .sp
3340 Default value: \fB8\fR.
3341 .RE
3342
3343 .sp
3344 .ne 2
3345 .na
3346 \fBzfs_sync_pass_rewrite\fR (int)
3347 .ad
3348 .RS 12n
3349 Rewrite new block pointers starting in this pass
3350 .sp
3351 Default value: \fB2\fR.
3352 .RE
3353
3354 .sp
3355 .ne 2
3356 .na
3357 \fBzfs_sync_taskq_batch_pct\fR (int)
3358 .ad
3359 .RS 12n
3360 This controls the number of threads used by the dp_sync_taskq. The default
3361 value of 75% will create a maximum of one thread per cpu.
3362 .sp
3363 Default value: \fB75\fR%.
3364 .RE
3365
3366 .sp
3367 .ne 2
3368 .na
3369 \fBzfs_trim_extent_bytes_max\fR (uint)
3370 .ad
3371 .RS 12n
Maximum size of a TRIM command. Ranges larger than this will be split into
3373 chunks no larger than \fBzfs_trim_extent_bytes_max\fR bytes before being
3374 issued to the device.
3375 .sp
3376 Default value: \fB134,217,728\fR.
3377 .RE
3378
3379 .sp
3380 .ne 2
3381 .na
3382 \fBzfs_trim_extent_bytes_min\fR (uint)
3383 .ad
3384 .RS 12n
3385 Minimum size of TRIM commands. TRIM ranges smaller than this will be skipped
unless they're part of a larger range which was broken into chunks. This is
3387 done because it's common for these small TRIMs to negatively impact overall
3388 performance. This value can be set to 0 to TRIM all unallocated space.
3389 .sp
3390 Default value: \fB32,768\fR.
3391 .RE
3392
3393 .sp
3394 .ne 2
3395 .na
3396 \fBzfs_trim_metaslab_skip\fR (uint)
3397 .ad
3398 .RS 12n
3399 Skip uninitialized metaslabs during the TRIM process. This option is useful
3400 for pools constructed from large thinly-provisioned devices where TRIM
operations are slow. As a pool ages, an increasing fraction of the pool's
metaslabs will be initialized, progressively degrading the usefulness of
3403 this option. This setting is stored when starting a manual TRIM and will
3404 persist for the duration of the requested TRIM.
3405 .sp
3406 Default value: \fB0\fR.
3407 .RE
3408
3409 .sp
3410 .ne 2
3411 .na
3412 \fBzfs_trim_queue_limit\fR (uint)
3413 .ad
3414 .RS 12n
3415 Maximum number of queued TRIMs outstanding per leaf vdev. The number of
3416 concurrent TRIM commands issued to the device is controlled by the
3417 \fBzfs_vdev_trim_min_active\fR and \fBzfs_vdev_trim_max_active\fR module
3418 options.
3419 .sp
3420 Default value: \fB10\fR.
3421 .RE
3422
3423 .sp
3424 .ne 2
3425 .na
3426 \fBzfs_trim_txg_batch\fR (uint)
3427 .ad
3428 .RS 12n
3429 The number of transaction groups worth of frees which should be aggregated
3430 before TRIM operations are issued to the device. This setting represents a
3431 trade-off between issuing larger, more efficient TRIM operations and the
3432 delay before the recently trimmed space is available for use by the device.
3433 .sp
3434 Increasing this value will allow frees to be aggregated for a longer time.
This will result in larger TRIM operations and potentially increased memory
3436 usage. Decreasing this value will have the opposite effect. The default
3437 value of 32 was determined to be a reasonable compromise.
3438 .sp
3439 Default value: \fB32\fR.
3440 .RE
3441
3442 .sp
3443 .ne 2
3444 .na
3445 \fBzfs_txg_history\fR (int)
3446 .ad
3447 .RS 12n
3448 Historical statistics for the last N txgs will be available in
3449 \fB/proc/spl/kstat/zfs/<pool>/txgs\fR
3450 .sp
3451 Default value: \fB0\fR.
3452 .RE
3453
3454 .sp
3455 .ne 2
3456 .na
3457 \fBzfs_txg_timeout\fR (int)
3458 .ad
3459 .RS 12n
3460 Flush dirty data to disk at least every N seconds (maximum txg duration)
3461 .sp
3462 Default value: \fB5\fR.
3463 .RE
3464
3465 .sp
3466 .ne 2
3467 .na
3468 \fBzfs_vdev_aggregate_trim\fR (int)
3469 .ad
3470 .RS 12n
3471 Allow TRIM I/Os to be aggregated. This is normally not helpful because
the extents to be trimmed will have already been aggregated by the
3473 metaslab. This option is provided for debugging and performance analysis.
3474 .sp
3475 Default value: \fB0\fR.
3476 .RE
3477
3478 .sp
3479 .ne 2
3480 .na
3481 \fBzfs_vdev_aggregation_limit\fR (int)
3482 .ad
3483 .RS 12n
3484 Max vdev I/O aggregation size
3485 .sp
3486 Default value: \fB1,048,576\fR.
3487 .RE
3488
3489 .sp
3490 .ne 2
3491 .na
3492 \fBzfs_vdev_aggregation_limit_non_rotating\fR (int)
3493 .ad
3494 .RS 12n
3495 Max vdev I/O aggregation size for non-rotating media
3496 .sp
3497 Default value: \fB131,072\fR.
3498 .RE
3499
3500 .sp
3501 .ne 2
3502 .na
3503 \fBzfs_vdev_cache_bshift\fR (int)
3504 .ad
3505 .RS 12n
Shift size to inflate reads to.
3507 .sp
3508 Default value: \fB16\fR (effectively 65536).
3509 .RE
3510
3511 .sp
3512 .ne 2
3513 .na
3514 \fBzfs_vdev_cache_max\fR (int)
3515 .ad
3516 .RS 12n
3517 Inflate reads smaller than this value to meet the \fBzfs_vdev_cache_bshift\fR
3518 size (default 64k).
3519 .sp
3520 Default value: \fB16384\fR.
3521 .RE
3522
3523 .sp
3524 .ne 2
3525 .na
3526 \fBzfs_vdev_cache_size\fR (int)
3527 .ad
3528 .RS 12n
3529 Total size of the per-disk cache in bytes.
3530 .sp
3531 Currently this feature is disabled as it has been found to not be helpful
3532 for performance and in some cases harmful.
3533 .sp
3534 Default value: \fB0\fR.
3535 .RE
3536
3537 .sp
3538 .ne 2
3539 .na
3540 \fBzfs_vdev_mirror_rotating_inc\fR (int)
3541 .ad
3542 .RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O immediately
follows its predecessor on rotational vdevs.
3547 .sp
3548 Default value: \fB0\fR.
3549 .RE
3550
3551 .sp
3552 .ne 2
3553 .na
3554 \fBzfs_vdev_mirror_rotating_seek_inc\fR (int)
3555 .ad
3556 .RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
this window that do not immediately follow the previous I/O are incremented by
half this value.
3562 .sp
3563 Default value: \fB5\fR.
3564 .RE
3565
3566 .sp
3567 .ne 2
3568 .na
3569 \fBzfs_vdev_mirror_rotating_seek_offset\fR (int)
3570 .ad
3571 .RS 12n
The maximum distance from the last queued I/O within which the balancing
algorithm considers an I/O to have locality.
3574 See the section "ZFS I/O SCHEDULER".
3575 .sp
3576 Default value: \fB1048576\fR.
3577 .RE
3578
3579 .sp
3580 .ne 2
3581 .na
3582 \fBzfs_vdev_mirror_non_rotating_inc\fR (int)
3583 .ad
3584 .RS 12n
3585 A number by which the balancing algorithm increments the load calculation for
3586 the purpose of selecting the least busy mirror member on non-rotational vdevs
3587 when I/Os do not immediately follow one another.
3588 .sp
3589 Default value: \fB0\fR.
3590 .RE
3591
3592 .sp
3593 .ne 2
3594 .na
3595 \fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int)
3596 .ad
3597 .RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
this window that do not immediately follow the previous I/O are incremented by
half this value.
3603 .sp
3604 Default value: \fB1\fR.
3605 .RE
3606
3607 .sp
3608 .ne 2
3609 .na
3610 \fBzfs_vdev_read_gap_limit\fR (int)
3611 .ad
3612 .RS 12n
3613 Aggregate read I/O operations if the gap on-disk between them is within this
3614 threshold.
3615 .sp
3616 Default value: \fB32,768\fR.
3617 .RE
3618
3619 .sp
3620 .ne 2
3621 .na
3622 \fBzfs_vdev_write_gap_limit\fR (int)
3623 .ad
3624 .RS 12n
3625 Aggregate write I/O over gap
3626 .sp
3627 Default value: \fB4,096\fR.
3628 .RE
3629
3630 .sp
3631 .ne 2
3632 .na
3633 \fBzfs_vdev_raidz_impl\fR (string)
3634 .ad
3635 .RS 12n
3636 Parameter for selecting raidz parity implementation to use.
3637
3638 Options marked (always) below may be selected on module load as they are
3639 supported on all systems.
3640 The remaining options may only be set after the module is loaded, as they
3641 are available only if the implementations are compiled in and supported
3642 on the running system.
3643
3644 Once the module is loaded, the content of
3645 /sys/module/zfs/parameters/zfs_vdev_raidz_impl will show available options
3646 with the currently selected one enclosed in [].
Possible options are:
.sp
.nf
fastest         - (always) implementation selected using built-in benchmark
original        - (always) original raidz implementation
scalar          - (always) scalar raidz implementation
sse2            - implementation using SSE2 instruction set (64bit x86 only)
ssse3           - implementation using SSSE3 instruction set (64bit x86 only)
avx2            - implementation using AVX2 instruction set (64bit x86 only)
avx512f         - implementation using AVX512F instruction set (64bit x86 only)
avx512bw        - implementation using AVX512F & AVX512BW instruction sets (64bit x86 only)
aarch64_neon    - implementation using NEON (Aarch64/64 bit ARMv8 only)
aarch64_neonx2  - implementation using NEON with more unrolling (Aarch64/64 bit ARMv8 only)
powerpc_altivec - implementation using Altivec (PowerPC only)
.fi
.sp
An example of querying and changing the selection at runtime follows this entry.
3659 .sp
3660 Default value: \fBfastest\fR.
3661 .RE
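.sp
.RS 12n
A minimal sketch for inspecting and changing the selection at runtime
(assuming the module is loaded; an implementation can only be selected if it
is listed as available):
.sp
.nf
cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl    # current choice shown in []
echo fastest > /sys/module/zfs/parameters/zfs_vdev_raidz_impl
.fi
.RE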
3662
3663 .sp
3664 .ne 2
3665 .na
3666 \fBzfs_vdev_scheduler\fR (charp)
3667 .ad
3668 .RS 12n
3669 \fBDEPRECATED\fR: This option exists for compatibility with older user
3670 configurations. It does nothing except print a warning to the kernel log if
3671 set.
3672 .sp
3673 .RE
3674
3675 .sp
3676 .ne 2
3677 .na
3678 \fBzfs_zevent_cols\fR (int)
3679 .ad
3680 .RS 12n
When zevents are logged to the console, use this as the word wrap width.
3682 .sp
3683 Default value: \fB80\fR.
3684 .RE
3685
3686 .sp
3687 .ne 2
3688 .na
3689 \fBzfs_zevent_console\fR (int)
3690 .ad
3691 .RS 12n
3692 Log events to the console
3693 .sp
3694 Use \fB1\fR for yes and \fB0\fR for no (default).
3695 .RE
3696
3697 .sp
3698 .ne 2
3699 .na
3700 \fBzfs_zevent_len_max\fR (int)
3701 .ad
3702 .RS 12n
3703 Max event queue length. A value of 0 will result in a calculated value which
3704 increases with the number of CPUs in the system (minimum 64 events). Events
3705 in the queue can be viewed with the \fBzpool events\fR command.
3706 .sp
3707 Default value: \fB0\fR.
3708 .RE
3709
3710 .sp
3711 .ne 2
3712 .na
3713 \fBzfs_zevent_retain_max\fR (int)
3714 .ad
3715 .RS 12n
3716 Maximum recent zevent records to retain for duplicate checking. Setting
3717 this value to zero disables duplicate detection.
3718 .sp
3719 Default value: \fB2000\fR.
3720 .RE
3721
3722 .sp
3723 .ne 2
3724 .na
3725 \fBzfs_zevent_retain_expire_secs\fR (int)
3726 .ad
3727 .RS 12n
3728 Lifespan for a recent ereport that was retained for duplicate checking.
3729 .sp
3730 Default value: \fB900\fR.
3731 .RE
3732
.sp
.ne 2
.na
3734 \fBzfs_zil_clean_taskq_maxalloc\fR (int)
3735 .ad
3736 .RS 12n
3737 The maximum number of taskq entries that are allowed to be cached. When this
3738 limit is exceeded transaction records (itxs) will be cleaned synchronously.
3739 .sp
3740 Default value: \fB1048576\fR.
3741 .RE
3742
3743 .sp
3744 .ne 2
3745 .na
3746 \fBzfs_zil_clean_taskq_minalloc\fR (int)
3747 .ad
3748 .RS 12n
3749 The number of taskq entries that are pre-populated when the taskq is first
3750 created and are immediately available for use.
3751 .sp
3752 Default value: \fB1024\fR.
3753 .RE
3754
3755 .sp
3756 .ne 2
3757 .na
3758 \fBzfs_zil_clean_taskq_nthr_pct\fR (int)
3759 .ad
3760 .RS 12n
3761 This controls the number of threads used by the dp_zil_clean_taskq. The default
3762 value of 100% will create a maximum of one thread per cpu.
3763 .sp
3764 Default value: \fB100\fR%.
3765 .RE
3766
3767 .sp
3768 .ne 2
3769 .na
3770 \fBzil_maxblocksize\fR (int)
3771 .ad
3772 .RS 12n
3773 This sets the maximum block size used by the ZIL. On very fragmented pools,
3774 lowering this (typically to 36KB) can improve performance.
3775 .sp
3776 Default value: \fB131072\fR (128KB).
3777 .RE
3778
3779 .sp
3780 .ne 2
3781 .na
3782 \fBzil_nocacheflush\fR (int)
3783 .ad
3784 .RS 12n
3785 Disable the cache flush commands that are normally sent to the disk(s) by
3786 the ZIL after an LWB write has completed. Setting this will cause ZIL
3787 corruption on power loss if a volatile out-of-order write cache is enabled.
3788 .sp
3789 Use \fB1\fR for yes and \fB0\fR for no (default).
3790 .RE
3791
3792 .sp
3793 .ne 2
3794 .na
3795 \fBzil_replay_disable\fR (int)
3796 .ad
3797 .RS 12n
Disable intent logging replay. This can be used to recover from a corrupted
ZIL.
3800 .sp
3801 Use \fB1\fR for yes and \fB0\fR for no (default).
3802 .RE
3803
3804 .sp
3805 .ne 2
3806 .na
3807 \fBzil_slog_bulk\fR (ulong)
3808 .ad
3809 .RS 12n
3810 Limit SLOG write size per commit executed with synchronous priority.
3811 Any writes above that will be executed with lower (asynchronous) priority
to limit potential SLOG device abuse by a single active ZIL writer.
3813 .sp
3814 Default value: \fB786,432\fR.
3815 .RE
3816
3817 .sp
3818 .ne 2
3819 .na
3820 \fBzio_deadman_log_all\fR (int)
3821 .ad
3822 .RS 12n
3823 If non-zero, the zio deadman will produce debugging messages (see
3824 \fBzfs_dbgmsg_enable\fR) for all zios, rather than only for leaf
3825 zios possessing a vdev. This is meant to be used by developers to gain
3826 diagnostic information for hang conditions which don't involve a mutex
3827 or other locking primitive; typically conditions in which a thread in
3828 the zio pipeline is looping indefinitely.
3829 .sp
3830 Default value: \fB0\fR.
3831 .RE
3832
3833 .sp
3834 .ne 2
3835 .na
3836 \fBzio_decompress_fail_fraction\fR (int)
3837 .ad
3838 .RS 12n
3839 If non-zero, this value represents the denominator of the probability that zfs
3840 should induce a decompression failure. For instance, for a 5% decompression
3841 failure rate, this value should be set to 20.
3842 .sp
3843 Default value: \fB0\fR.
3844 .RE
3845
3846 .sp
3847 .ne 2
3848 .na
3849 \fBzio_slow_io_ms\fR (int)
3850 .ad
3851 .RS 12n
When an I/O operation takes more than \fBzio_slow_io_ms\fR milliseconds to
complete, it is marked as a slow I/O. Each slow I/O causes a delay zevent.
Slow I/O counters can be seen with "zpool status -s".
3855
3856 .sp
3857 Default value: \fB30,000\fR.
3858 .RE
3859
3860 .sp
3861 .ne 2
3862 .na
3863 \fBzio_dva_throttle_enabled\fR (int)
3864 .ad
3865 .RS 12n
3866 Throttle block allocations in the I/O pipeline. This allows for
3867 dynamic allocation distribution when devices are imbalanced.
3868 When enabled, the maximum number of pending allocations per top-level vdev
3869 is limited by \fBzfs_vdev_queue_depth_pct\fR.
3870 .sp
3871 Default value: \fB1\fR.
3872 .RE
3873
3874 .sp
3875 .ne 2
3876 .na
3877 \fBzio_requeue_io_start_cut_in_line\fR (int)
3878 .ad
3879 .RS 12n
3880 Prioritize requeued I/O
3881 .sp
3882 Default value: \fB0\fR.
3883 .RE
3884
3885 .sp
3886 .ne 2
3887 .na
3888 \fBzio_taskq_batch_pct\fR (uint)
3889 .ad
3890 .RS 12n
Percentage of online CPUs (or CPU cores, etc.) which will run a worker thread
for I/O. These workers are responsible for I/O work such as compression and
checksum calculations. A fractional number of CPUs will be rounded down; see
the example below.
3894 .sp
3895 The default value of 75 was chosen to avoid using all CPUs which can result in
3896 latency issues and inconsistent application performance, especially when high
3897 compression is enabled.
3898 .sp
3899 Default value: \fB75\fR.
3900 .RE
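.sp
.RS 12n
A rough sketch of the resulting worker count, assuming a 16-CPU system and
the default value of 75 (the kernel computes this per taskq; the arithmetic
below only illustrates the rounding):
.sp
.nf
ncpus=16       # assumed online CPUs
batch_pct=75   # zio_taskq_batch_pct default
echo $(( ncpus * batch_pct / 100 ))   # prints 12 worker threads
.fi
.RE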
3901
3902 .sp
3903 .ne 2
3904 .na
3905 \fBzvol_inhibit_dev\fR (uint)
3906 .ad
3907 .RS 12n
3908 Do not create zvol device nodes. This may slightly improve startup time on
3909 systems with a very large number of zvols.
3910 .sp
3911 Use \fB1\fR for yes and \fB0\fR for no (default).
3912 .RE
3913
3914 .sp
3915 .ne 2
3916 .na
3917 \fBzvol_major\fR (uint)
3918 .ad
3919 .RS 12n
3920 Major number for zvol block devices
3921 .sp
3922 Default value: \fB230\fR.
3923 .RE
3924
3925 .sp
3926 .ne 2
3927 .na
3928 \fBzvol_max_discard_blocks\fR (ulong)
3929 .ad
3930 .RS 12n
3931 Discard (aka TRIM) operations done on zvols will be done in batches of this
3932 many blocks, where block size is determined by the \fBvolblocksize\fR property
3933 of a zvol.
3934 .sp
3935 Default value: \fB16,384\fR.
3936 .RE
3937
3938 .sp
3939 .ne 2
3940 .na
3941 \fBzvol_prefetch_bytes\fR (uint)
3942 .ad
3943 .RS 12n
3944 When adding a zvol to the system prefetch \fBzvol_prefetch_bytes\fR
3945 from the start and end of the volume. Prefetching these regions
3946 of the volume is desirable because they are likely to be accessed
3947 immediately by \fBblkid(8)\fR or by the kernel scanning for a partition
3948 table.
3949 .sp
3950 Default value: \fB131,072\fR.
3951 .RE
3952
3953 .sp
3954 .ne 2
3955 .na
3956 \fBzvol_request_sync\fR (uint)
3957 .ad
3958 .RS 12n
When processing I/O requests for a zvol, submit them synchronously. This
effectively limits the queue depth to 1 for each I/O submitter. When set
to 0 requests are handled asynchronously by a thread pool. The number of
requests which can be handled concurrently is controlled by \fBzvol_threads\fR.
3963 .sp
3964 Default value: \fB0\fR.
3965 .RE
3966
3967 .sp
3968 .ne 2
3969 .na
3970 \fBzvol_threads\fR (uint)
3971 .ad
3972 .RS 12n
3973 Max number of threads which can handle zvol I/O requests concurrently.
3974 .sp
3975 Default value: \fB32\fR.
3976 .RE
3977
3978 .sp
3979 .ne 2
3980 .na
3981 \fBzvol_volmode\fR (uint)
3982 .ad
3983 .RS 12n
Defines zvol block device behaviour when \fBvolmode\fR is set to \fBdefault\fR.
3985 Valid values are \fB1\fR (full), \fB2\fR (dev) and \fB3\fR (none).
3986 .sp
3987 Default value: \fB1\fR.
3988 .RE
3989
3990 .SH ZFS I/O SCHEDULER
3991 ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
3992 The I/O scheduler determines when and in what order those operations are
3993 issued. The I/O scheduler divides operations into five I/O classes
3994 prioritized in the following order: sync read, sync write, async read,
3995 async write, and scrub/resilver. Each queue defines the minimum and
3996 maximum number of concurrent operations that may be issued to the
3997 device. In addition, the device has an aggregate maximum,
3998 \fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
3999 must not exceed the aggregate maximum. If the sum of the per-queue
4000 maximums exceeds the aggregate maximum, then the number of active I/Os
4001 may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
4002 be issued regardless of whether all per-queue minimums have been met.
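.sp
For example, with hypothetical per-queue minimums of 10, 10, 1, 2 and 1
(summing to 24) and an aggregate maximum of 1,000, the minimums are easily
honored; if the per-queue maximums sum to more than 1,000, it is the
aggregate maximum that ultimately caps the number of concurrent I/Os.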
4003 .sp
4004 For many physical devices, throughput increases with the number of
4005 concurrent operations, but latency typically suffers. Further, physical
4006 devices typically have a limit at which more concurrent operations have no
4007 effect on throughput or can actually cause it to decrease.
4008 .sp
4009 The scheduler selects the next operation to issue by first looking for an
4010 I/O class whose minimum has not been satisfied. Once all are satisfied and
4011 the aggregate maximum has not been hit, the scheduler looks for classes
4012 whose maximum has not been satisfied. Iteration through the I/O classes is
4013 done in the order specified above. No further operations are issued if the
4014 aggregate maximum number of concurrent operations has been hit or if there
4015 are no operations queued for an I/O class that has not hit its maximum.
4016 Every time an I/O is queued or an operation completes, the I/O scheduler
4017 looks for new operations to issue.
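.sp
The selection logic is sketched below in simplified C; the function and
structure names are illustrative only and do not correspond to the actual
OpenZFS implementation.
.nf

typedef struct io_class {
        int min_active;   /* per-class minimum */
        int max_active;   /* per-class maximum */
        int active;       /* operations currently issued to the device */
        int queued;       /* operations waiting to be issued */
} io_class_t;

/* Classes are indexed in priority order: sync read, sync write, ... */
static int
pick_next_class(io_class_t *c, int nclasses, int total_active, int agg_max)
{
        int i;

        if (total_active >= agg_max)
                return (-1);            /* aggregate maximum reached */

        /* First, any class with queued work still below its minimum. */
        for (i = 0; i < nclasses; i++)
                if (c[i].queued > 0 && c[i].active < c[i].min_active)
                        return (i);

        /* Then, any class with queued work still below its maximum. */
        for (i = 0; i < nclasses; i++)
                if (c[i].queued > 0 && c[i].active < c[i].max_active)
                        return (i);

        return (-1);                    /* nothing eligible to issue */
}

.fi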
4018 .sp
4019 In general, smaller values of max_active will lead to lower latency of
4020 synchronous operations. Larger values of max_active may lead to higher
4021 overall throughput, depending on underlying storage.
4022 .sp
4023 The ratio of the queues' max_active values determines the balance of performance
4024 between reads, writes, and scrubs. E.g., increasing \fBzfs_vdev_scrub_max_active\fR
4025 will cause the scrub or resilver to complete more quickly, but will also cause
4026 reads and writes to have higher latency and lower throughput.
4027 .sp
4028 All I/O classes have a fixed maximum number of outstanding operations
4029 except for the async write class. Asynchronous writes represent the data
4030 that is committed to stable storage during the syncing stage for
4031 transaction groups. Transaction groups enter the syncing state
4032 periodically so the number of queued async writes will quickly burst up
4033 and then bleed down to zero. Rather than servicing them as quickly as
4034 possible, the I/O scheduler changes the maximum number of active async
4035 write I/Os according to the amount of dirty data in the pool. Since
4036 both throughput and latency typically increase with the number of
4037 concurrent operations issued to physical devices, reducing the
4038 burstiness in the number of concurrent operations also stabilizes the
4039 response time of operations from other -- and in particular synchronous
4040 -- queues. In broad strokes, the I/O scheduler will issue more
4041 concurrent operations from the async write queue as there's more dirty
4042 data in the pool.
4043 .sp
4044 Async Writes
4045 .sp
4046 The number of concurrent operations issued for the async write I/O class
4047 follows a piece-wise linear function defined by a few adjustable points.
4048 .nf
4049
4050 | o---------| <-- zfs_vdev_async_write_max_active
4051 ^ | /^ |
4052 | | / | |
4053 active | / | |
4054 I/O | / | |
4055 count | / | |
4056 | / | |
4057 |-------o | | <-- zfs_vdev_async_write_min_active
4058 0|_______^______|_________|
4059 0% | | 100% of zfs_dirty_data_max
4060 | |
4061 | `-- zfs_vdev_async_write_active_max_dirty_percent
4062 `--------- zfs_vdev_async_write_active_min_dirty_percent
4063
4064 .fi
4065 Until the amount of dirty data exceeds a minimum percentage of the dirty
4066 data allowed in the pool, the I/O scheduler will limit the number of
4067 concurrent operations to the minimum. As that threshold is crossed, the
4068 number of concurrent operations issued increases linearly to the maximum at
4069 the specified maximum percentage of the dirty data allowed in the pool.
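.sp
The ramp can be written as the following simplified C sketch; it illustrates
the piece-wise linear function above and is not the actual OpenZFS code (the
function name and signature are made up for this example).
.nf

#include <stdint.h>

static int
async_write_max_active(uint64_t dirty, uint64_t dirty_max,
    int min_pct, int max_pct, int min_active, int max_active)
{
        int pct = (int)(dirty * 100 / dirty_max);

        if (pct < min_pct)
                return (min_active);    /* flat section at the minimum */
        if (pct >= max_pct)
                return (max_active);    /* flat section at the maximum */

        /* Linear ramp between the two dirty-data thresholds. */
        return (min_active + (max_active - min_active) *
            (pct - min_pct) / (max_pct - min_pct));
}

.fi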
4070 .sp
4071 Ideally, the amount of dirty data on a busy pool will stay in the sloped
4072 part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
4073 and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
4074 maximum percentage, this indicates that the rate of incoming data is
4075 greater than the rate that the backend storage can handle. In this case, we
4076 must further throttle incoming writes, as described in the next section.
4077
4078 .SH ZFS TRANSACTION DELAY
4079 We delay transactions when we've determined that the backend storage
4080 isn't able to accommodate the rate of incoming writes.
4081 .sp
4082 If there is already a transaction waiting, we delay relative to when
4083 that transaction will finish waiting. This way the calculated delay time
4084 is independent of the number of threads concurrently executing
4085 transactions.
4086 .sp
4087 If we are the only waiter, wait relative to when the transaction
4088 started, rather than the current time. This credits the transaction for
4089 "time already served", e.g. reading indirect blocks.
4090 .sp
4091 The minimum time for a transaction to take is calculated as:
4092 .nf
4093 min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
4094 min_time is then capped at 100 milliseconds.
4095 .fi
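.sp
A simplified C sketch of this calculation, for illustration only (it is not
the actual OpenZFS code; the arguments correspond to the \fBdirty\fR,
\fBmin\fR and \fBmax\fR terms of the formula, in bytes, and the result is a
delay in nanoseconds):
.nf

#include <stdint.h>

static uint64_t
tx_delay_ns(uint64_t dirty, uint64_t delay_min, uint64_t dirty_max,
    uint64_t zfs_delay_scale)
{
        const uint64_t cap = 100ULL * 1000 * 1000;      /* 100ms in ns */
        uint64_t min_time;

        if (dirty <= delay_min)
                return (0);             /* below the delay threshold */
        if (dirty >= dirty_max)
                return (cap);           /* saturate at the cap */

        min_time = zfs_delay_scale * (dirty - delay_min) /
            (dirty_max - dirty);

        return (min_time < cap ? min_time : cap);
}

.fi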
4096 .sp
4097 The delay has two degrees of freedom that can be adjusted via tunables. The
4098 percentage of dirty data at which we start to delay is defined by
4099 \fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
4100 \fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
4101 delay after writing at full speed has failed to keep up with the incoming write
4102 rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
4103 this variable determines the amount of delay at the midpoint of the curve.
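.sp
For example, with a hypothetical \fBzfs_dirty_data_max\fR of 4 GiB and
\fBzfs_delay_min_dirty_percent\fR of 60, transactions only begin to be
delayed once more than about 2.4 GiB of dirty data has accumulated in the
pool.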
4104 .sp
4105 .nf
4106 delay
4107 10ms +-------------------------------------------------------------*+
4108 | *|
4109 9ms + *+
4110 | *|
4111 8ms + *+
4112 | * |
4113 7ms + * +
4114 | * |
4115 6ms + * +
4116 | * |
4117 5ms + * +
4118 | * |
4119 4ms + * +
4120 | * |
4121 3ms + * +
4122 | * |
4123 2ms + (midpoint) * +
4124 | | ** |
4125 1ms + v *** +
4126 | zfs_delay_scale ----------> ******** |
4127 0 +-------------------------------------*********----------------+
4128 0% <- zfs_dirty_data_max -> 100%
4129 .fi
4130 .sp
4131 Note that since the delay is added to the outstanding time remaining on the
4132 most recent transaction, the delay is effectively the inverse of IOPS.
4133 Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
4134 was chosen such that small changes in the amount of accumulated dirty data
4135 in the first 3/4 of the curve yield relatively small differences in the
4136 amount of delay.
4137 .sp
4138 The effects can be easier to understand when the amount of delay is
4139 represented on a log scale:
4140 .sp
4141 .nf
4142 delay
4143 100ms +-------------------------------------------------------------++
4144 + +
4145 | |
4146 + *+
4147 10ms + *+
4148 + ** +
4149 | (midpoint) ** |
4150 + | ** +
4151 1ms + v **** +
4152 + zfs_delay_scale ----------> ***** +
4153 | **** |
4154 + **** +
4155 100us + ** +
4156 + * +
4157 | * |
4158 + * +
4159 10us + * +
4160 + +
4161 | |
4162 + +
4163 +--------------------------------------------------------------+
4164 0% <- zfs_dirty_data_max -> 100%
4165 .fi
4166 .sp
4167 Note here that only as the amount of dirty data approaches its limit does
4168 the delay start to increase rapidly. The goal of a properly tuned system
4169 should be to keep the amount of dirty data out of that range by first
4170 ensuring that the appropriate limits are set for the I/O scheduler to reach
4171 optimal throughput on the backend storage, and then by changing the value
4172 of \fBzfs_delay_scale\fR to increase the steepness of the curve.