'\" te
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.TH ZFS-MODULE-PARAMETERS 5 "Oct 28, 2017"
.SH NAME
zfs\-module\-parameters \- ZFS module parameters
.SH DESCRIPTION
.sp
.LP
Description of the different parameters to the ZFS module.

.SS "Module parameters"
.sp
.LP

.sp
.ne 2
.na
\fBignore_hole_birth\fR (int)
.ad
.RS 12n
When set, the hole_birth optimization will not be used, and all holes will
always be sent during a zfs send. Useful if you suspect your datasets are
affected by a bug in hole_birth.
.sp
Use \fB1\fR for on (default) and \fB0\fR for off.
.RE

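.sp
.LP
As a general sketch (paths assume a Linux system with the zfs module loaded),
each parameter in this page can be inspected and, when writable, changed at
runtime through sysfs, using the entry above as an example:
.sp
.nf
# cat /sys/module/zfs/parameters/ignore_hole_birth
1
# echo 0 > /sys/module/zfs/parameters/ignore_hole_birth
.fi
.sp
To make a setting persistent across reboots, pass it as a module option, for
example via a modprobe configuration file such as /etc/modprobe.d/zfs.conf:
.sp
.nf
options zfs ignore_hole_birth=0
.fi
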
.sp
.ne 2
.na
\fBl2arc_feed_again\fR (int)
.ad
.RS 12n
Turbo L2ARC warm-up. When the L2ARC is cold the fill interval will be set as
fast as possible.
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBl2arc_feed_min_ms\fR (ulong)
.ad
.RS 12n
Min feed interval in milliseconds. Requires \fBl2arc_feed_again=1\fR and only
applicable in related situations.
.sp
Default value: \fB200\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_feed_secs\fR (ulong)
.ad
.RS 12n
Seconds between L2ARC writing.
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_headroom\fR (ulong)
.ad
.RS 12n
How far through the ARC lists to search for L2ARC cacheable content, expressed
as a multiplier of \fBl2arc_write_max\fR.
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_headroom_boost\fR (ulong)
.ad
.RS 12n
Scales \fBl2arc_headroom\fR by this percentage when L2ARC contents are being
successfully compressed before writing. A value of 100 disables this feature.
.sp
Default value: \fB200\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_nocompress\fR (int)
.ad
.RS 12n
Skip compressing L2ARC buffers.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBl2arc_noprefetch\fR (int)
.ad
.RS 12n
Do not write buffers to L2ARC if they were prefetched but not used by
applications.
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBl2arc_norw\fR (int)
.ad
.RS 12n
No reads during writes.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBl2arc_write_boost\fR (ulong)
.ad
.RS 12n
Cold L2ARC devices will have \fBl2arc_write_max\fR increased by this amount
while they remain cold.
.sp
Default value: \fB8,388,608\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_write_max\fR (ulong)
.ad
.RS 12n
Max write bytes per interval.
.sp
Default value: \fB8,388,608\fR.
.RE

.sp
.ne 2
.na
\fBmetaslab_aliquot\fR (ulong)
.ad
.RS 12n
Metaslab granularity, in bytes. This is roughly similar to what would be
referred to as the "stripe size" in traditional RAID arrays. In normal
operation, ZFS will try to write this amount of data to a top-level vdev
before moving on to the next one.
.sp
Default value: \fB524,288\fR.
.RE

.sp
.ne 2
.na
\fBmetaslab_bias_enabled\fR (int)
.ad
.RS 12n
Enable metaslab group biasing based on its vdev's over- or under-utilization
relative to the pool.
.sp
Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE

.sp
.ne 2
.na
\fBzfs_metaslab_segment_weight_enabled\fR (int)
.ad
.RS 12n
Enable/disable segment-based metaslab selection.
.sp
Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE

.sp
.ne 2
.na
\fBzfs_metaslab_switch_threshold\fR (int)
.ad
.RS 12n
When using segment-based metaslab selection, continue allocating
from the active metaslab until \fBzfs_metaslab_switch_threshold\fR
worth of buckets have been exhausted.
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBmetaslab_debug_load\fR (int)
.ad
.RS 12n
Load all metaslabs during pool import.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBmetaslab_debug_unload\fR (int)
.ad
.RS 12n
Prevent metaslabs from being unloaded.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBmetaslab_fragmentation_factor_enabled\fR (int)
.ad
.RS 12n
Enable use of the fragmentation metric in computing metaslab weights.
.sp
Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE

.sp
.ne 2
.na
\fBmetaslabs_per_vdev\fR (int)
.ad
.RS 12n
When a vdev is added, it will be divided into approximately (but no more than)
this number of metaslabs.
.sp
Default value: \fB200\fR.
.RE

.sp
.ne 2
.na
\fBmetaslab_preload_enabled\fR (int)
.ad
.RS 12n
Enable metaslab group preloading.
.sp
Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE

.sp
.ne 2
.na
\fBmetaslab_lba_weighting_enabled\fR (int)
.ad
.RS 12n
Give more weight to metaslabs with lower LBAs, assuming they have
greater bandwidth, as is typically the case on a modern constant
angular velocity disk drive.
.sp
Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE

.sp
.ne 2
.na
\fBspa_config_path\fR (charp)
.ad
.RS 12n
SPA config file.
.sp
Default value: \fB/etc/zfs/zpool.cache\fR.
.RE

.sp
.ne 2
.na
\fBspa_asize_inflation\fR (int)
.ad
.RS 12n
Multiplication factor used to estimate actual disk consumption from the
size of data being written. The default value is a worst case estimate,
but lower values may be valid for a given pool depending on its
configuration. Pool administrators who understand the factors involved
may wish to specify a more realistic inflation factor, particularly if
they operate close to quota or capacity limits.
.sp
Default value: \fB24\fR.
.RE

.sp
.ne 2
.na
\fBspa_load_verify_data\fR (int)
.ad
.RS 12n
Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR)
import. Use 0 to disable and 1 to enable.

An extreme rewind import normally performs a full traversal of all
blocks in the pool for verification. If this parameter is set to 0,
the traversal skips non-metadata blocks. It can be toggled once the
import has started to stop or start the traversal of non-metadata blocks.
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBspa_load_verify_metadata\fR (int)
.ad
.RS 12n
Whether to traverse blocks during an "extreme rewind" (\fB-X\fR)
pool import. Use 0 to disable and 1 to enable.

An extreme rewind import normally performs a full traversal of all
blocks in the pool for verification. If this parameter is set to 0,
the traversal is not performed. It can be toggled once the import has
started to stop or start the traversal.
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBspa_load_verify_maxinflight\fR (int)
.ad
.RS 12n
Maximum concurrent I/Os during the traversal performed during an "extreme
rewind" (\fB-X\fR) pool import.
.sp
Default value: \fB10000\fR.
.RE

.sp
.ne 2
.na
\fBspa_slop_shift\fR (int)
.ad
.RS 12n
Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space
in the pool to be consumed. This ensures that we don't run the pool
completely out of space, due to unaccounted changes (e.g. to the MOS).
It also limits the worst-case time to allocate space. If we have
less than this amount of free space, most ZPL operations (e.g. write,
create) will return ENOSPC.
.sp
Default value: \fB5\fR.
.RE

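.sp
.LP
As a worked example (the pool size is only an illustration): on a 1 TiB pool
the default shift of 5 reserves roughly 1 TiB / 2^5 = 32 GiB of slop space,
and raising the shift by one halves the reservation:
.sp
.nf
# echo 6 > /sys/module/zfs/parameters/spa_slop_shift
.fi
.sp
After this, about 1 TiB / 2^6 = 16 GiB is held back before ZPL operations
begin returning ENOSPC.
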
.sp
.ne 2
.na
\fBzfetch_array_rd_sz\fR (ulong)
.ad
.RS 12n
If prefetching is enabled, disable prefetching for reads larger than this size.
.sp
Default value: \fB1,048,576\fR.
.RE

.sp
.ne 2
.na
\fBzfetch_max_distance\fR (uint)
.ad
.RS 12n
Max bytes to prefetch per stream (default 8MB).
.sp
Default value: \fB8,388,608\fR.
.RE

.sp
.ne 2
.na
\fBzfetch_max_streams\fR (uint)
.ad
.RS 12n
Max number of streams per zfetch (prefetch streams per file).
.sp
Default value: \fB8\fR.
.RE

.sp
.ne 2
.na
\fBzfetch_min_sec_reap\fR (uint)
.ad
.RS 12n
Min time before an active prefetch stream can be reclaimed.
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_dnode_limit\fR (ulong)
.ad
.RS 12n
When the number of bytes consumed by dnodes in the ARC exceeds this number of
bytes, try to unpin some of it in response to demand for non-metadata. This
value acts as a ceiling to the amount of dnode metadata, and defaults to 0,
which indicates that a percentage based on \fBzfs_arc_dnode_limit_percent\fR
of the ARC meta buffers may be used for dnodes.

See also \fBzfs_arc_meta_prune\fR which serves a similar purpose but is used
when the amount of metadata in the ARC exceeds \fBzfs_arc_meta_limit\fR rather
than in response to overall demand for non-metadata.

.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_dnode_limit_percent\fR (ulong)
.ad
.RS 12n
Percentage of ARC meta buffers that can be consumed by dnodes.
.sp
See also \fBzfs_arc_dnode_limit\fR which serves a similar purpose but has a
higher priority if set to a nonzero value.
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_dnode_reduce_percent\fR (ulong)
.ad
.RS 12n
Percentage of ARC dnodes to try to scan in response to demand for non-metadata
when the number of bytes consumed by dnodes exceeds \fBzfs_arc_dnode_limit\fR.

.sp
Default value: \fB10% of the number of dnodes in the ARC\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_average_blocksize\fR (int)
.ad
.RS 12n
The ARC's buffer hash table is sized based on the assumption of an average
block size of \fBzfs_arc_average_blocksize\fR (default 8K). This works out
to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers.
For configurations with a known larger average block size this value can be
increased to reduce the memory footprint.

.sp
Default value: \fB8192\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_evict_batch_limit\fR (int)
.ad
.RS 12n
Number of ARC headers to evict per sub-list before proceeding to another
sub-list. This batch-style operation prevents entire sub-lists from being
evicted at once but comes at a cost of additional unlocking and locking.
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_grow_retry\fR (int)
.ad
.RS 12n
If set to a non-zero value, it will replace the arc_grow_retry value with this
value. The arc_grow_retry value (default 5) is the number of seconds the ARC
will wait before trying to resume growth after a memory pressure event.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_lotsfree_percent\fR (int)
.ad
.RS 12n
Throttle I/O when free system memory drops below this percentage of total
system memory. Setting this value to 0 will disable the throttle.
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_max\fR (ulong)
.ad
.RS 12n
Max size of the ARC in bytes. If set to 0 then it will consume 1/2 of system
RAM. This value must be at least 67108864 (64 megabytes).
.sp
This value can be changed dynamically with some caveats. It cannot be set back
to 0 while running, and reducing it below the current ARC size will not cause
the ARC to shrink without memory pressure to induce shrinking.
.sp
Default value: \fB0\fR.
.RE

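.sp
.LP
As a sketch (the 4 GiB figure is only an illustration), the ARC ceiling can be
lowered at runtime:
.sp
.nf
# echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
.fi
.sp
or pinned at module load via a modprobe configuration file such as
/etc/modprobe.d/zfs.conf:
.sp
.nf
options zfs zfs_arc_max=4294967296
.fi
.sp
Per the caveats above, lowering the value below the current ARC size will not
shrink the ARC until memory pressure induces shrinking.
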
.sp
.ne 2
.na
\fBzfs_arc_meta_adjust_restarts\fR (ulong)
.ad
.RS 12n
The number of restart passes to make while scanning the ARC attempting
to free buffers in order to stay below the \fBzfs_arc_meta_limit\fR.
This value should not need to be tuned but is available to facilitate
performance analysis.
.sp
Default value: \fB4096\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_meta_limit\fR (ulong)
.ad
.RS 12n
The maximum allowed size in bytes that meta data buffers are allowed to
consume in the ARC. When this limit is reached meta data buffers will
be reclaimed even if the overall arc_c_max has not been reached. This
value defaults to 0, which indicates that a percentage based on
\fBzfs_arc_meta_limit_percent\fR of the ARC may be used for meta data.
.sp
This value may be changed dynamically, except that it cannot be set back to 0
to request a percentage of the ARC; it must be set to an explicit value.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_meta_limit_percent\fR (ulong)
.ad
.RS 12n
Percentage of ARC buffers that can be used for meta data.

See also \fBzfs_arc_meta_limit\fR which serves a similar purpose but has a
higher priority if set to a nonzero value.

.sp
Default value: \fB75\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_meta_min\fR (ulong)
.ad
.RS 12n
The minimum allowed size in bytes that meta data buffers may consume in
the ARC. This value defaults to 0 which disables a floor on the amount
of the ARC devoted to meta data.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_meta_prune\fR (int)
.ad
.RS 12n
The number of dentries and inodes to be scanned looking for entries
which can be dropped. This may be required when the ARC reaches the
\fBzfs_arc_meta_limit\fR because dentries and inodes can pin buffers
in the ARC. Increasing this value will cause the dentry and inode caches
to be pruned more aggressively. Setting this value to 0 will disable
pruning the inode and dentry caches.
.sp
Default value: \fB10,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_meta_strategy\fR (int)
.ad
.RS 12n
Define the strategy for ARC meta data buffer eviction (meta reclaim strategy).
A value of 0 (META_ONLY) will evict only the ARC meta data buffers.
A value of 1 (BALANCED) indicates that additional data buffers may be evicted
if that is required in order to evict the required number of meta data buffers.
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_min\fR (ulong)
.ad
.RS 12n
Min size of the ARC in bytes. If set to 0 then arc_c_min will default to
consuming the larger of 32M or 1/32 of total system memory.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_min_prefetch_ms\fR (int)
.ad
.RS 12n
Minimum time prefetched blocks are locked in the ARC, specified in ms.
A value of \fB0\fR will default to 1 second.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_min_prescient_prefetch_ms\fR (int)
.ad
.RS 12n
Minimum time "prescient prefetched" blocks are locked in the ARC, specified
in ms. These blocks are meant to be prefetched fairly aggressively ahead of
the code that may use them. A value of \fB0\fR will default to 6 seconds.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_multilist_num_sublists\fR (int)
.ad
.RS 12n
To allow more fine-grained locking, each ARC state contains a series
of lists for both data and meta data objects. Locking is performed at
the level of these "sub-lists". This parameter controls the number of
sub-lists per ARC state, and also applies to other uses of the
multilist data structure.
.sp
Default value: \fB4\fR or the number of online CPUs, whichever is greater.
.RE

.sp
.ne 2
.na
\fBzfs_arc_overflow_shift\fR (int)
.ad
.RS 12n
The ARC size is considered to be overflowing if it exceeds the current
ARC target size (arc_c) by a threshold determined by this parameter.
The threshold is calculated as a fraction of arc_c using the formula
"arc_c >> \fBzfs_arc_overflow_shift\fR".

The default value of 8 causes the ARC to be considered to be overflowing
if it exceeds the target size by 1/256th (about 0.4%) of the target size.

When the ARC is overflowing, new buffer allocations are stalled until
the reclaim thread catches up and the overflow condition no longer exists.
.sp
Default value: \fB8\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_p_min_shift\fR (int)
.ad
.RS 12n
If set to a non-zero value, this will update arc_p_min_shift (default 4)
with the new value.
arc_p_min_shift is used as a shift of arc_c when calculating both the
minimum and maximum arc_p.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_p_aggressive_disable\fR (int)
.ad
.RS 12n
Disable aggressive arc_p growth.
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBzfs_arc_p_dampener_disable\fR (int)
.ad
.RS 12n
Disable arc_p adapt dampener.
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBzfs_arc_shrink_shift\fR (int)
.ad
.RS 12n
If set to a non-zero value, this will update arc_shrink_shift (default 7)
with the new value.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_pc_percent\fR (uint)
.ad
.RS 12n
Percent of pagecache to reclaim arc to.

This tunable allows ZFS arc to play more nicely with the kernel's LRU
pagecache. It can guarantee that the arc size won't collapse under scanning
pressure on the pagecache, yet still allows arc to be reclaimed down to
zfs_arc_min if necessary. This value is specified as percent of pagecache
size (as measured by NR_FILE_PAGES) where that percent may exceed 100. This
only operates during memory pressure/reclaim.
.sp
Default value: \fB0\fR (disabled).
.RE

.sp
.ne 2
.na
\fBzfs_arc_sys_free\fR (ulong)
.ad
.RS 12n
The target number of bytes the ARC should leave as free memory on the system.
Defaults to the larger of 1/64 of physical memory or 512K. Setting this
option to a non-zero value will override the default.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_autoimport_disable\fR (int)
.ad
.RS 12n
Disable pool import at module load by ignoring the cache file (typically
\fB/etc/zfs/zpool.cache\fR).
.sp
Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE

.sp
.ne 2
.na
\fBzfs_commit_timeout_pct\fR (int)
.ad
.RS 12n
This controls the amount of time that a ZIL block (lwb) will remain "open"
when it isn't "full", and it has a thread waiting for it to be committed to
stable storage. The timeout is scaled based on a percentage of the last lwb
latency to avoid significantly impacting the latency of each individual
transaction record (itx).
.sp
Default value: \fB5\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dbgmsg_enable\fR (int)
.ad
.RS 12n
Internally ZFS keeps a small log to facilitate debugging. By default the log
is disabled; to enable it set this option to 1. The contents of the log can
be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to
this proc file clears the log.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dbgmsg_maxsize\fR (int)
.ad
.RS 12n
The maximum size in bytes of the internal ZFS debug log.
.sp
Default value: \fB4M\fR.
.RE

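.sp
.LP
For example, assuming the module is loaded, the debug log can be enabled,
read, and cleared through the interfaces described above:
.sp
.nf
# echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
# cat /proc/spl/kstat/zfs/dbgmsg
# echo 0 > /proc/spl/kstat/zfs/dbgmsg
.fi
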
.sp
.ne 2
.na
\fBzfs_dbuf_state_index\fR (int)
.ad
.RS 12n
This feature is currently unused. It is normally used for controlling what
reporting is available under /proc/spl/kstat/zfs.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_deadman_enabled\fR (int)
.ad
.RS 12n
When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR
milliseconds, a "slow spa_sync" message is logged to the debug log
(see \fBzfs_dbgmsg_enable\fR). If \fBzfs_deadman_enabled\fR is set,
all pending IO operations are also checked and if any haven't completed
within \fBzfs_deadman_synctime_ms\fR milliseconds, a "SLOW IO" message
is logged to the debug log and a "delay" system event with the details of
the hung IO is posted.
.sp
Use \fB1\fR (default) to enable the slow IO check and \fB0\fR to disable.
.RE

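.sp
.LP
The posted "delay" events can be observed with the \fBzpool events\fR
command; for example, to follow events as they arrive:
.sp
.nf
# zpool events -f
.fi
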
.sp
.ne 2
.na
\fBzfs_deadman_checktime_ms\fR (int)
.ad
.RS 12n
Once a pool sync operation has taken longer than
\fBzfs_deadman_synctime_ms\fR milliseconds, continue to check for slow
operations every \fBzfs_deadman_checktime_ms\fR milliseconds.
.sp
Default value: \fB5,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_deadman_synctime_ms\fR (ulong)
.ad
.RS 12n
Interval in milliseconds after which the deadman is triggered and also
the interval after which an IO operation is considered to be "hung"
if \fBzfs_deadman_enabled\fR is set.

See \fBzfs_deadman_enabled\fR.
.sp
Default value: \fB1,000,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dedup_prefetch\fR (int)
.ad
.RS 12n
Enable prefetching dedup-ed blocks.
.sp
Use \fB1\fR for yes and \fB0\fR to disable (default).
.RE

.sp
.ne 2
.na
\fBzfs_delay_min_dirty_percent\fR (int)
.ad
.RS 12n
Start to delay each transaction once there is this amount of dirty data,
expressed as a percentage of \fBzfs_dirty_data_max\fR.
This value should be >= zfs_vdev_async_write_active_max_dirty_percent.
See the section "ZFS TRANSACTION DELAY".
.sp
Default value: \fB60\fR.
.RE

.sp
.ne 2
.na
\fBzfs_delay_scale\fR (int)
.ad
.RS 12n
This controls how quickly the transaction delay approaches infinity.
Larger values cause longer delays for a given amount of dirty data.
.sp
For the smoothest delay, this value should be about 1 billion divided
by the maximum number of operations per second. This will smoothly
handle between 10x and 1/10th this number.
.sp
See the section "ZFS TRANSACTION DELAY".
.sp
Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
.sp
Default value: \fB500,000\fR.
.RE

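.sp
.LP
As a worked example of the guidance above: for a pool expected to sustain
about 2,000 write operations per second, a smooth delay curve suggests
1,000,000,000 / 2,000 = 500,000, which is the default, and that setting
handles roughly 200 to 20,000 operations per second smoothly.
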
.sp
.ne 2
.na
\fBzfs_delete_blocks\fR (ulong)
.ad
.RS 12n
This is used to define a large file for the purposes of delete. Files
containing more than \fBzfs_delete_blocks\fR blocks will be deleted
asynchronously, while smaller files are deleted synchronously. Decreasing
this value will reduce the time spent in an unlink(2) system call at the
expense of a longer delay before the freed space is available.
.sp
Default value: \fB20,480\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dirty_data_max\fR (int)
.ad
.RS 12n
Determines the dirty space limit in bytes. Once this limit is exceeded, new
writes are halted until space frees up. This parameter takes precedence
over \fBzfs_dirty_data_max_percent\fR.
See the section "ZFS TRANSACTION DELAY".
.sp
Default value: 10 percent of all memory, capped at \fBzfs_dirty_data_max_max\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dirty_data_max_max\fR (int)
.ad
.RS 12n
Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
This limit is only enforced at module load time, and will be ignored if
\fBzfs_dirty_data_max\fR is later changed. This parameter takes
precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
"ZFS TRANSACTION DELAY".
.sp
Default value: 25% of physical RAM.
.RE

.sp
.ne 2
.na
\fBzfs_dirty_data_max_max_percent\fR (int)
.ad
.RS 12n
Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
percentage of physical RAM. This limit is only enforced at module load
time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
one. See the section "ZFS TRANSACTION DELAY".
.sp
Default value: \fB25\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dirty_data_max_percent\fR (int)
.ad
.RS 12n
Determines the dirty space limit, expressed as a percentage of all
memory. Once this limit is exceeded, new writes are halted until space frees
up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
one. See the section "ZFS TRANSACTION DELAY".
.sp
Default value: 10%, subject to \fBzfs_dirty_data_max_max\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dirty_data_sync\fR (int)
.ad
.RS 12n
Start syncing out a transaction group if there is at least this much dirty
data.
.sp
Default value: \fB67,108,864\fR.
.RE

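.sp
.LP
As a worked example of how these limits interact (the memory size is only an
illustration): on a system with 32 GiB of RAM, \fBzfs_dirty_data_max_max\fR
defaults to 25% of RAM (8 GiB), and \fBzfs_dirty_data_max\fR defaults to 10%
of RAM (about 3.2 GiB), which falls under that cap. Syncing of a transaction
group starts once dirty data reaches \fBzfs_dirty_data_sync\fR (64 MiB by
default), well before the delay and hard limits are approached.
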
.sp
.ne 2
.na
\fBzfs_fletcher_4_impl\fR (string)
.ad
.RS 12n
Select a fletcher 4 implementation.
.sp
Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
\fBavx2\fR, \fBavx512f\fR, and \fBaarch64_neon\fR.
All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
set extensions to be available and will only appear if ZFS detects that they
are present at runtime. If multiple implementations of fletcher 4 are
available, the \fBfastest\fR will be chosen using a micro benchmark.
Selecting \fBscalar\fR results in the original, CPU-based calculation being
used. Selecting any option other than \fBfastest\fR and \fBscalar\fR results
in vector instructions from the respective CPU instruction set being used.
.sp
Default value: \fBfastest\fR.
.RE

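.sp
.LP
A sketch of selecting an implementation at runtime; the listing shown is only
illustrative, as the selectors that appear depend on the instruction set
extensions detected on the running system:
.sp
.nf
# cat /sys/module/zfs/parameters/zfs_fletcher_4_impl
[fastest] scalar sse2 ssse3 avx2
# echo scalar > /sys/module/zfs/parameters/zfs_fletcher_4_impl
.fi
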
.sp
.ne 2
.na
\fBzfs_free_bpobj_enabled\fR (int)
.ad
.RS 12n
Enable/disable the processing of the free_bpobj object.
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_free_max_blocks\fR (ulong)
.ad
.RS 12n
Maximum number of blocks freed in a single txg.
.sp
Default value: \fB100,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_read_max_active\fR (int)
.ad
.RS 12n
Maximum asynchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB3\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_read_min_active\fR (int)
.ad
.RS 12n
Minimum asynchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
.ad
.RS 12n
When the pool has more than
\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
the dirty data is between min and max, the active I/O limit is linearly
interpolated. See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB60\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
.ad
.RS 12n
When the pool has less than
\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
the dirty data is between min and max, the active I/O limit is linearly
interpolated. See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB30\fR.
.RE

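.sp
.LP
As a worked example of the linear interpolation described above, with the
default settings (\fBzfs_vdev_async_write_min_active\fR=2,
\fBzfs_vdev_async_write_max_active\fR=10, min dirty percent 30, max dirty
percent 60): a pool holding 45% dirty data sits halfway between the two
thresholds, so the active async write limit is
2 + (45-30)/(60-30) * (10-2) = 6.
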
.sp
.ne 2
.na
\fBzfs_vdev_async_write_max_active\fR (int)
.ad
.RS 12n
Maximum asynchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_write_min_active\fR (int)
.ad
.RS 12n
Minimum asynchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Lower values are associated with better latency on rotational media but poorer
resilver performance. The default value of 2 was chosen as a compromise. A
value of 3 has been shown to improve resilver performance further at a cost of
further increasing latency.
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_max_active\fR (int)
.ad
.RS 12n
The maximum number of I/Os active to each device. Ideally, this will be >=
the sum of each queue's max_active. It must be at least the sum of each
queue's min_active. See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_scrub_max_active\fR (int)
.ad
.RS 12n
Maximum scrub I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_scrub_min_active\fR (int)
.ad
.RS 12n
Minimum scrub I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_sync_read_max_active\fR (int)
.ad
.RS 12n
Maximum synchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_sync_read_min_active\fR (int)
.ad
.RS 12n
Minimum synchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_sync_write_max_active\fR (int)
.ad
.RS 12n
Maximum synchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_sync_write_min_active\fR (int)
.ad
.RS 12n
Minimum synchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_queue_depth_pct\fR (int)
.ad
.RS 12n
Maximum number of queued allocations per top-level vdev expressed as
a percentage of \fBzfs_vdev_async_write_max_active\fR, which allows the
system to detect devices that are more capable of handling allocations
and to allocate more blocks to those devices. It allows for dynamic
allocation distribution when devices are imbalanced, as fuller devices
will tend to be slower than empty devices.

See also \fBzio_dva_throttle_enabled\fR.
.sp
Default value: \fB1000\fR.
.RE

.sp
.ne 2
.na
1236\fBzfs_disable_dup_eviction\fR (int)
1237.ad
1238.RS 12n
1239Disable duplicate buffer eviction
1240.sp
1241Use \fB1\fR for yes and \fB0\fR for no (default).
1242.RE
1243
1244.sp
1245.ne 2
1246.na
1247\fBzfs_expire_snapshot\fR (int)
1248.ad
1249.RS 12n
1250Seconds to expire .zfs/snapshot
1251.sp
1252Default value: \fB300\fR.
1253.RE
1254
0500e835
BB
1255.sp
1256.ne 2
1257.na
1258\fBzfs_admin_snapshot\fR (int)
1259.ad
1260.RS 12n
1261Allow the creation, removal, or renaming of entries in the .zfs/snapshot
1262directory to cause the creation, destruction, or renaming of snapshots.
1263When enabled this functionality works both locally and over NFS exports
1264which have the 'no_root_squash' option set. This functionality is disabled
1265by default.
1266.sp
1267Use \fB1\fR for yes and \fB0\fR for no (default).
1268.RE
1269
29714574
TF
1270.sp
1271.ne 2
1272.na
1273\fBzfs_flags\fR (int)
1274.ad
1275.RS 12n
33b6dbbc
NB
1276Set additional debugging flags. The following flags may be bitwise-or'd
1277together.
1278.sp
1279.TS
1280box;
1281rB lB
1282lB lB
1283r l.
1284Value Symbolic Name
1285 Description
1286_
12871 ZFS_DEBUG_DPRINTF
1288 Enable dprintf entries in the debug log.
1289_
12902 ZFS_DEBUG_DBUF_VERIFY *
1291 Enable extra dbuf verifications.
1292_
12934 ZFS_DEBUG_DNODE_VERIFY *
1294 Enable extra dnode verifications.
1295_
12968 ZFS_DEBUG_SNAPNAMES
1297 Enable snapshot name verification.
1298_
129916 ZFS_DEBUG_MODIFY
1300 Check for illegally modified ARC buffers.
1301_
130232 ZFS_DEBUG_SPA
1303 Enable spa_dbgmsg entries in the debug log.
1304_
130564 ZFS_DEBUG_ZIO_FREE
1306 Enable verification of block frees.
1307_
1308128 ZFS_DEBUG_HISTOGRAM_VERIFY
1309 Enable extra spacemap histogram verifications.
8740cf4a
NB
1310_
1311256 ZFS_DEBUG_METASLAB_VERIFY
1312 Verify space accounting on disk matches in-core range_trees.
1313_
1314512 ZFS_DEBUG_SET_ERROR
1315 Enable SET_ERROR and dprintf entries in the debug log.
33b6dbbc
NB
1316.TE
1317.sp
1318* Requires debug build.
29714574 1319.sp
33b6dbbc 1320Default value: \fB0\fR.
29714574
TF
1321.RE
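Since \fBzfs_flags\fR is a bitmask, several debug flags are enabled at once by
OR-ing their values together. A minimal sketch, assuming a Linux host with the
zfs module loaded (the commented-out sysfs write requires root):

```shell
# Combine ZFS_DEBUG_MODIFY (16) and ZFS_DEBUG_ZIO_FREE (64) into a
# single zfs_flags value.
flags=$(( 16 | 64 ))
echo "$flags"    # prints 80
# echo "$flags" > /sys/module/zfs/parameters/zfs_flags
```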

.sp
.ne 2
.na
\fBzfs_free_leak_on_eio\fR (int)
.ad
.RS 12n
If destroy encounters an EIO while reading metadata (e.g. indirect
blocks), space referenced by the missing metadata can not be freed.
Normally this causes the background destroy to become "stalled", as
it is unable to make forward progress. While in this stalled state,
all remaining space to free from the error-encountering filesystem is
"temporarily leaked". Set this flag to cause it to ignore the EIO,
permanently leak the space from indirect blocks that can not be read,
and continue to free everything else that it can.

The default, "stalling" behavior is useful if the storage partially
fails (i.e. some but not all i/os fail), and then later recovers. In
this case, we will be able to continue pool operations while it is
partially failed, and when it recovers, we can continue to free the
space, with no leaks. However, note that this case is actually
fairly rare.

Typically pools either (a) fail completely (but perhaps temporarily,
e.g. a top-level vdev going offline), or (b) have localized,
permanent errors (e.g. disk returns the wrong data due to bit flip or
firmware bug). In case (a), this setting does not matter because the
pool will be suspended and the sync thread will not be able to make
forward progress regardless. In case (b), because the error is
permanent, the best we can do is leak the minimum amount of space,
which is what setting this flag will do. Therefore, it is reasonable
for this flag to normally be set, but we chose the more conservative
approach of not setting it, so that there is no possibility of
leaking space in the "partial temporary" failure case.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_free_min_time_ms\fR (int)
.ad
.RS 12n
During a \fBzfs destroy\fR operation using \fBfeature@async_destroy\fR a minimum
of this much time will be spent working on freeing blocks per txg.
.sp
Default value: \fB1,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_immediate_write_sz\fR (long)
.ad
.RS 12n
Largest data block to write to the ZIL. Larger blocks will be treated as if the
dataset being written to had the property setting \fBlogbias=throughput\fR.
.sp
Default value: \fB32,768\fR.
.RE

.sp
.ne 2
.na
\fBzfs_max_recordsize\fR (int)
.ad
.RS 12n
We currently support block sizes from 512 bytes to 16MB. The benefits of
larger blocks, and thus larger IO, need to be weighed against the cost of
COWing a giant block to modify one byte. Additionally, very large blocks
can have an impact on i/o latency, and also potentially on the memory
allocator. Therefore, we do not allow the recordsize to be set larger than
zfs_max_recordsize (default 1MB). Larger blocks can be created by changing
this tunable, and pools with larger blocks can always be imported and used,
regardless of this setting.
.sp
Default value: \fB1,048,576\fR.
.RE

.sp
.ne 2
.na
\fBzfs_mdcomp_disable\fR (int)
.ad
.RS 12n
Disable metadata compression
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_metaslab_fragmentation_threshold\fR (int)
.ad
.RS 12n
Allow metaslabs to keep their active state as long as their fragmentation
percentage is less than or equal to this value. An active metaslab that
exceeds this threshold will no longer keep its active status allowing
better metaslabs to be selected.
.sp
Default value: \fB70\fR.
.RE

.sp
.ne 2
.na
\fBzfs_mg_fragmentation_threshold\fR (int)
.ad
.RS 12n
Metaslab groups are considered eligible for allocations if their
fragmentation metric (measured as a percentage) is less than or equal to
this value. If a metaslab group exceeds this threshold then it will be
skipped unless all metaslab groups within the metaslab class have also
crossed this threshold.
.sp
Default value: \fB85\fR.
.RE

.sp
.ne 2
.na
\fBzfs_mg_noalloc_threshold\fR (int)
.ad
.RS 12n
Defines a threshold at which metaslab groups should be eligible for
allocations. The value is expressed as a percentage of free space
beyond which a metaslab group is always eligible for allocations.
If a metaslab group's free space is less than or equal to the
threshold, the allocator will avoid allocating to that group
unless all groups in the pool have reached the threshold. Once all
groups have reached the threshold, all groups are allowed to accept
allocations. The default value of 0 disables the feature and causes
all metaslab groups to be eligible for allocations.

This parameter allows one to deal with pools having heavily imbalanced
vdevs such as would be the case when a new vdev has been added.
Setting the threshold to a non-zero percentage will stop allocations
from being made to vdevs that aren't filled to the specified percentage
and allow lesser filled vdevs to acquire more allocations than they
otherwise would under the old \fBzfs_mg_alloc_failures\fR facility.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_multihost_history\fR (int)
.ad
.RS 12n
Historical statistics for the last N multihost updates will be available in
\fB/proc/spl/kstat/zfs/<pool>/multihost\fR
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_multihost_interval\fR (ulong)
.ad
.RS 12n
Used to control the frequency of multihost writes which are performed when the
\fBmultihost\fR pool property is on. This is one factor used to determine
the length of the activity check during import.
.sp
The multihost write period is \fBzfs_multihost_interval / leaf-vdevs\fR milliseconds.
This means that on average a multihost write will be issued for each leaf vdev every
\fBzfs_multihost_interval\fR milliseconds. In practice, the observed period can
vary with the I/O load and this observed value is the delay which is stored in
the uberblock.
.sp
On import the activity check waits a minimum amount of time determined by
\fBzfs_multihost_interval * zfs_multihost_import_intervals\fR. The activity
check time may be further extended if the value of mmp delay found in the best
uberblock indicates actual multihost updates happened at longer intervals than
\fBzfs_multihost_interval\fR. A minimum value of \fB100ms\fR is enforced.
.sp
Default value: \fB1000\fR.
.RE
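The relationships above can be sketched numerically. The vdev count below is a
hypothetical example, and the activity-check factor assumes the default
\fBzfs_multihost_import_intervals\fR of 10:

```shell
# Expected multihost write period and minimum activity-check time,
# assuming the default zfs_multihost_interval (1000 ms) and a
# hypothetical pool with 8 leaf vdevs.
interval_ms=1000
leaf_vdevs=8
period_ms=$(( interval_ms / leaf_vdevs ))   # per-pool write period
check_ms=$(( interval_ms * 10 ))            # interval * import_intervals
echo "write period: ${period_ms} ms, minimum activity check: ${check_ms} ms"
```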

.sp
.ne 2
.na
\fBzfs_multihost_import_intervals\fR (uint)
.ad
.RS 12n
Used to control the duration of the activity test on import. Smaller values of
\fBzfs_multihost_import_intervals\fR will reduce the import time but increase
the risk of failing to detect an active pool. The total activity check time is
never allowed to drop below one second. A value of 0 is ignored and treated as
if it were set to 1.
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_multihost_fail_intervals\fR (uint)
.ad
.RS 12n
Controls the behavior of the pool when multihost write failures are detected.
.sp
When \fBzfs_multihost_fail_intervals = 0\fR then multihost write failures are ignored.
The failures will still be reported to the ZED which depending on its
configuration may take action such as suspending the pool or offlining a device.
.sp
When \fBzfs_multihost_fail_intervals > 0\fR then sequential multihost write failures
will cause the pool to be suspended. This occurs when
\fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR milliseconds have
passed since the last successful multihost write. This guarantees the activity test
will see multihost writes if the pool is imported.
.sp
Default value: \fB5\fR.
.RE

.sp
.ne 2
.na
\fBzfs_no_scrub_io\fR (int)
.ad
.RS 12n
Set for no scrub I/O. This results in scrubs not actually scrubbing data and
simply doing a metadata crawl of the pool instead.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_no_scrub_prefetch\fR (int)
.ad
.RS 12n
Set to disable block prefetching for scrubs.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_nocacheflush\fR (int)
.ad
.RS 12n
Disable cache flush operations on disks when writing. Beware, this may cause
corruption if disks re-order writes.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_nopwrite_enabled\fR (int)
.ad
.RS 12n
Enable NOP writes
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBzfs_dmu_offset_next_sync\fR (int)
.ad
.RS 12n
Enable forcing txg sync to find holes. When enabled forces ZFS to act
like prior versions when SEEK_HOLE or SEEK_DATA flags are used, which
when a dnode is dirty causes txgs to be synced so that this data can be
found.
.sp
Use \fB1\fR for yes and \fB0\fR to disable (default).
.RE

.sp
.ne 2
.na
\fBzfs_pd_bytes_max\fR (int)
.ad
.RS 12n
The number of bytes which should be prefetched during a pool traversal
(eg: \fBzfs send\fR or other data crawling operations)
.sp
Default value: \fB52,428,800\fR.
.RE

.sp
.ne 2
.na
\fBzfs_per_txg_dirty_frees_percent\fR (ulong)
.ad
.RS 12n
Tunable to control percentage of dirtied blocks from frees in one TXG.
After this threshold is crossed, additional dirty blocks from frees
wait until the next TXG.
A value of zero will disable this throttle.
.sp
Default value: \fB30\fR and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBzfs_prefetch_disable\fR (int)
.ad
.RS 12n
This tunable disables predictive prefetch. Note that it leaves "prescient"
prefetch (e.g. prefetch for zfs send) intact. Unlike predictive prefetch,
prescient prefetch never issues i/os that end up not being needed, so it
can't hurt performance.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_read_chunk_size\fR (long)
.ad
.RS 12n
Bytes to read per chunk
.sp
Default value: \fB1,048,576\fR.
.RE

.sp
.ne 2
.na
\fBzfs_read_history\fR (int)
.ad
.RS 12n
Historical statistics for the last N reads will be available in
\fB/proc/spl/kstat/zfs/<pool>/reads\fR
.sp
Default value: \fB0\fR (no data is kept).
.RE

.sp
.ne 2
.na
\fBzfs_read_history_hits\fR (int)
.ad
.RS 12n
Include cache hits in read history
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_recover\fR (int)
.ad
.RS 12n
Set to attempt to recover from fatal errors. This should only be used as a
last resort, as it typically results in leaked space, or worse.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_resilver_min_time_ms\fR (int)
.ad
.RS 12n
Resilvers are processed by the sync thread. While resilvering it will spend
at least this much time working on a resilver between txg flushes.
.sp
Default value: \fB3,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_scrub_min_time_ms\fR (int)
.ad
.RS 12n
Scrubs are processed by the sync thread. While scrubbing it will spend
at least this much time working on a scrub between txg flushes.
.sp
Default value: \fB1,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_scan_checkpoint_intval\fR (int)
.ad
.RS 12n
To preserve progress across reboots the sequential scan algorithm periodically
needs to stop metadata scanning and issue all the verification I/Os to disk.
The frequency of this flushing is determined by the
\fBzfs_scan_checkpoint_intval\fR tunable.
.sp
Default value: \fB7200\fR seconds (every 2 hours).
.RE

.sp
.ne 2
.na
\fBzfs_scan_fill_weight\fR (int)
.ad
.RS 12n
This tunable affects how scrub and resilver I/O segments are ordered. A higher
number indicates that we care more about how filled in a segment is, while a
lower number indicates we care more about the size of the extent without
considering the gaps within a segment. This value is only tunable upon module
insertion. Changing the value afterwards will have no effect on scrub or
resilver performance.
.sp
Default value: \fB3\fR.
.RE

.sp
.ne 2
.na
\fBzfs_scan_issue_strategy\fR (int)
.ad
.RS 12n
Determines the order that data will be verified while scrubbing or resilvering.
If set to \fB1\fR, data will be verified as sequentially as possible, given the
amount of memory reserved for scrubbing (see \fBzfs_scan_mem_lim_fact\fR). This
may improve scrub performance if the pool's data is very fragmented. If set to
\fB2\fR, the largest mostly-contiguous chunk of found data will be verified
first. By deferring scrubbing of small segments, we may later find adjacent data
to coalesce and increase the segment size. If set to \fB0\fR, zfs will use
strategy \fB1\fR during normal verification and strategy \fB2\fR while taking a
checkpoint.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_scan_legacy\fR (int)
.ad
.RS 12n
A value of 0 indicates that scrubs and resilvers will gather metadata in
memory before issuing sequential I/O. A value of 1 indicates that the legacy
algorithm will be used where I/O is initiated as soon as it is discovered.
Changing this value to 0 will not affect scrubs or resilvers that are already
in progress.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_scan_max_ext_gap\fR (int)
.ad
.RS 12n
Indicates the largest gap in bytes between scrub / resilver I/Os that will still
be considered sequential for sorting purposes. Changing this value will not
affect scrubs or resilvers that are already in progress.
.sp
Default value: \fB2097152 (2 MB)\fR.
.RE

.sp
.ne 2
.na
\fBzfs_scan_mem_lim_fact\fR (int)
.ad
.RS 12n
Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
This tunable determines the hard limit for I/O sorting memory usage.
When the hard limit is reached we stop scanning metadata and start issuing
data verification I/O. This is done until we get below the soft limit.
.sp
Default value: \fB20\fR which is 5% of RAM (1/20).
.RE

.sp
.ne 2
.na
\fBzfs_scan_mem_lim_soft_fact\fR (int)
.ad
.RS 12n
The fraction of the hard limit used to determine the soft limit for I/O sorting
by the sequential scan algorithm. When we cross this limit from below no action
is taken. When we cross this limit from above it is because we are issuing
verification I/O. In this case (unless the metadata scan is done) we stop
issuing verification I/O and start scanning metadata again until we get to the
hard limit.
.sp
Default value: \fB20\fR which is 5% of the hard limit (1/20).
.RE
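As a worked example of the two scan memory factors above (hard limit =
RAM / \fBzfs_scan_mem_lim_fact\fR, soft limit = hard limit /
\fBzfs_scan_mem_lim_soft_fact\fR), assuming both defaults and a hypothetical
machine with 16 GiB of RAM:

```shell
# Effective scan-sorting memory limits with both factors at their
# default of 20, for a hypothetical 16 GiB machine.
ram_mib=16384
hard_mib=$(( ram_mib / 20 ))    # 5% of RAM
soft_mib=$(( hard_mib / 20 ))   # 5% of the hard limit
echo "hard=${hard_mib} MiB soft=${soft_mib} MiB"
```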

.sp
.ne 2
.na
\fBzfs_scan_vdev_limit\fR (int)
.ad
.RS 12n
Maximum amount of data that can be concurrently issued at once for scrubs and
resilvers per leaf device, given in bytes.
.sp
Default value: \fB41943040\fR.
.RE

.sp
.ne 2
.na
\fBzfs_send_corrupt_data\fR (int)
.ad
.RS 12n
Allow sending of corrupt data (ignore read/checksum errors when sending data)
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_sync_pass_deferred_free\fR (int)
.ad
.RS 12n
Flushing of data to disk is done in passes. Defer frees starting in this pass
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_sync_pass_dont_compress\fR (int)
.ad
.RS 12n
Don't compress starting in this pass
.sp
Default value: \fB5\fR.
.RE

.sp
.ne 2
.na
\fBzfs_sync_pass_rewrite\fR (int)
.ad
.RS 12n
Rewrite new block pointers starting in this pass
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_sync_taskq_batch_pct\fR (int)
.ad
.RS 12n
This controls the number of threads used by the dp_sync_taskq. The default
value of 75% will create a maximum of one thread per cpu.
.sp
Default value: \fB75\fR.
.RE

.sp
.ne 2
.na
\fBzfs_txg_history\fR (int)
.ad
.RS 12n
Historical statistics for the last N txgs will be available in
\fB/proc/spl/kstat/zfs/<pool>/txgs\fR
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_txg_timeout\fR (int)
.ad
.RS 12n
Flush dirty data to disk at least every N seconds (maximum txg duration)
.sp
Default value: \fB5\fR.
.RE
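Like most tunables in this page, \fBzfs_txg_timeout\fR can be inspected and
changed at runtime through sysfs. A minimal sketch, assuming a Linux system
with the zfs module loaded (writing requires root):

```shell
# Read the current txg timeout and (commented out) raise it; the path
# follows the standard /sys/module layout for zfs module parameters.
p=/sys/module/zfs/parameters/zfs_txg_timeout
if [ -r "$p" ]; then cat "$p"; fi    # current value in seconds, default 5
# echo 10 > "$p"    # flush dirty data at least every 10 seconds
```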

.sp
.ne 2
.na
\fBzfs_vdev_aggregation_limit\fR (int)
.ad
.RS 12n
Max vdev I/O aggregation size
.sp
Default value: \fB131,072\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_cache_bshift\fR (int)
.ad
.RS 12n
Shift size to inflate reads to
.sp
Default value: \fB16\fR (effectively 65536).
.RE

.sp
.ne 2
.na
\fBzfs_vdev_cache_max\fR (int)
.ad
.RS 12n
Inflate reads smaller than this value to meet the \fBzfs_vdev_cache_bshift\fR
size (default 64k).
.sp
Default value: \fB16384\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_cache_size\fR (int)
.ad
.RS 12n
Total size of the per-disk cache in bytes.
.sp
Currently this feature is disabled as it has been found to not be helpful
for performance and in some cases harmful.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_rotating_inc\fR (int)
.ad
.RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O immediately
follows its predecessor on rotational vdevs for the purpose of making decisions
based on load.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_rotating_seek_inc\fR (int)
.ad
.RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
this window that do not immediately follow the previous I/O are incremented by
half of this value.
.sp
Default value: \fB5\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_rotating_seek_offset\fR (int)
.ad
.RS 12n
The maximum distance for the last queued I/O in which the balancing algorithm
considers an I/O to have locality.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1048576\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_non_rotating_inc\fR (int)
.ad
.RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member on non-rotational vdevs
when I/Os do not immediately follow one another.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int)
.ad
.RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
this window that do not immediately follow the previous I/O are incremented by
half of this value.
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_read_gap_limit\fR (int)
.ad
.RS 12n
Aggregate read I/O operations if the gap on-disk between them is within this
threshold.
.sp
Default value: \fB32,768\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_scheduler\fR (charp)
.ad
.RS 12n
Set the Linux I/O scheduler on whole disk vdevs to this scheduler. Valid options
are noop, cfq, bfq & deadline.
.sp
Default value: \fBnoop\fR.
.RE
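Module parameters such as \fBzfs_vdev_scheduler\fR are typically made
persistent across reboots with a modprobe options file. The file path below is
the conventional location and is an assumption; adjust for your distribution:

```shell
# Build the modprobe options line; writing it to /etc/modprobe.d is
# left commented out since it requires root.
conf='options zfs zfs_vdev_scheduler=deadline'
echo "$conf"
# echo "$conf" > /etc/modprobe.d/zfs.conf
```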

.sp
.ne 2
.na
\fBzfs_vdev_write_gap_limit\fR (int)
.ad
.RS 12n
Aggregate write I/O over gap
.sp
Default value: \fB4,096\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_raidz_impl\fR (string)
.ad
.RS 12n
Parameter for selecting raidz parity implementation to use.

Options marked (always) below may be selected on module load as they are
supported on all systems.
The remaining options may only be set after the module is loaded, as they
are available only if the implementations are compiled in and supported
on the running system.

Once the module is loaded, the content of
/sys/module/zfs/parameters/zfs_vdev_raidz_impl will show available options
with the currently selected one enclosed in [].
Possible options are:
 fastest - (always) implementation selected using built-in benchmark
 original - (always) original raidz implementation
 scalar - (always) scalar raidz implementation
 sse2 - implementation using SSE2 instruction set (64bit x86 only)
 ssse3 - implementation using SSSE3 instruction set (64bit x86 only)
 avx2 - implementation using AVX2 instruction set (64bit x86 only)
 avx512f - implementation using AVX512F instruction set (64bit x86 only)
 avx512bw - implementation using AVX512F & AVX512BW instruction sets (64bit x86 only)
 aarch64_neon - implementation using NEON (Aarch64/64 bit ARMv8 only)
 aarch64_neonx2 - implementation using NEON with more unrolling (Aarch64/64 bit ARMv8 only)
.sp
Default value: \fBfastest\fR.
.RE
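A quick way to see which raidz implementations the running system supports is
to read the parameter back; a sketch assuming the zfs module is loaded
(selecting another implementation requires root):

```shell
# List raidz implementations; the selected one appears in brackets,
# e.g. "[fastest] original scalar ...".
p=/sys/module/zfs/parameters/zfs_vdev_raidz_impl
if [ -r "$p" ]; then cat "$p"; fi
# echo avx2 > "$p"    # only if avx2 is listed as available
```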
2095
29714574
TF
2096.sp
2097.ne 2
2098.na
2099\fBzfs_zevent_cols\fR (int)
2100.ad
2101.RS 12n
83426735 2102When zevents are logged to the console use this as the word wrap width.
29714574
TF
2103.sp
2104Default value: \fB80\fR.
2105.RE
2106
2107.sp
2108.ne 2
2109.na
2110\fBzfs_zevent_console\fR (int)
2111.ad
2112.RS 12n
2113Log events to the console
2114.sp
2115Use \fB1\fR for yes and \fB0\fR for no (default).
2116.RE
2117
2118.sp
2119.ne 2
2120.na
2121\fBzfs_zevent_len_max\fR (int)
2122.ad
2123.RS 12n
83426735
D
2124Max event queue length. A value of 0 will result in a calculated value which
2125increases with the number of CPUs in the system (minimum 64 events). Events
2126in the queue can be viewed with the \fBzpool events\fR command.
29714574
TF
2127.sp
2128Default value: \fB0\fR.
2129.RE
2130
a032ac4b
BB
2131.sp
2132.ne 2
2133.na
2134\fBzfs_zil_clean_taskq_maxalloc\fR (int)
2135.ad
2136.RS 12n
2137The maximum number of taskq entries that are allowed to be cached. When this
2fe61a7e 2138limit is exceeded transaction records (itxs) will be cleaned synchronously.
a032ac4b
BB
2139.sp
2140Default value: \fB1048576\fR.
2141.RE
2142
2143.sp
2144.ne 2
2145.na
2146\fBzfs_zil_clean_taskq_minalloc\fR (int)
2147.ad
2148.RS 12n
2149The number of taskq entries that are pre-populated when the taskq is first
2150created and are immediately available for use.
2151.sp
2152Default value: \fB1024\fR.
2153.RE
2154
2155.sp
2156.ne 2
2157.na
2158\fBzfs_zil_clean_taskq_nthr_pct\fR (int)
2159.ad
2160.RS 12n
2161This controls the number of threads used by the dp_zil_clean_taskq. The default
2162value of 100% will create a maximum of one thread per cpu.
2163.sp
2164Default value: \fB100\fR.
2165.RE
2166
29714574
TF
2167.sp
2168.ne 2
2169.na
2170\fBzil_replay_disable\fR (int)
2171.ad
2172.RS 12n
83426735
D
2173Disable intent logging replay. Can be disabled for recovery from corrupted
2174ZIL
29714574
TF
2175.sp
2176Use \fB1\fR for yes and \fB0\fR for no (default).
2177.RE
2178
2179.sp
2180.ne 2
2181.na
1b7c1e5c 2182\fBzil_slog_bulk\fR (ulong)
29714574
TF
2183.ad
2184.RS 12n
1b7c1e5c
GDN
2185Limit SLOG write size per commit executed with synchronous priority.
2186Any writes above that will be executed with lower (asynchronous) priority
2187to limit potential SLOG device abuse by single active ZIL writer.
29714574 2188.sp
1b7c1e5c 2189Default value: \fB786,432\fR.
29714574
TF
2190.RE
2191
29714574
TF
2192.sp
2193.ne 2
2194.na
2195\fBzio_delay_max\fR (int)
2196.ad
2197.RS 12n
83426735 2198A zevent will be logged if a ZIO operation takes more than N milliseconds to
ab9f4b0b 2199complete. Note that this is only a logging facility, not a timeout on
83426735 2200operations.
29714574
TF
2201.sp
2202Default value: \fB30,000\fR.
2203.RE
2204
3dfb57a3
DB
2205.sp
2206.ne 2
2207.na
2208\fBzio_dva_throttle_enabled\fR (int)
2209.ad
2210.RS 12n
Throttle block allocations in the ZIO pipeline. This allows for
dynamic allocation distribution when devices are imbalanced.
When enabled, the maximum number of pending allocations per top-level vdev
is limited by \fBzfs_vdev_queue_depth_pct\fR.
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzio_requeue_io_start_cut_in_line\fR (int)
.ad
.RS 12n
Prioritize requeued I/O.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzio_taskq_batch_pct\fR (uint)
.ad
.RS 12n
Percentage of online CPUs (or CPU cores, etc.) which will run a worker thread
for I/O. These workers are responsible for I/O work such as compression and
checksum calculations. A fractional number of CPUs will be rounded down.
.sp
The default value of 75 was chosen to avoid using all CPUs, which can result in
latency issues and inconsistent application performance, especially when high
compression is enabled.
.sp
Default value: \fB75\fR.
.RE
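The worker count implied by this percentage can be sketched as follows (a
minimal illustration only; the function name and the at-least-one-worker floor
are assumptions, not the module's actual code):

```python
import os

def zio_taskq_workers(pct=75, ncpus=None):
    """Number of I/O worker threads for a given zio_taskq_batch_pct.

    A fractional number of CPUs is rounded down; at least one worker
    is assumed to run.
    """
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    return max(1, ncpus * pct // 100)

# With the default of 75 on an 8-CPU system: 8 * 75 / 100 = 6 workers.
```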

.sp
.ne 2
.na
\fBzvol_inhibit_dev\fR (uint)
.ad
.RS 12n
Do not create zvol device nodes. This may slightly improve startup time on
systems with a very large number of zvols.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzvol_major\fR (uint)
.ad
.RS 12n
Major number for zvol block devices.
.sp
Default value: \fB230\fR.
.RE

.sp
.ne 2
.na
\fBzvol_max_discard_blocks\fR (ulong)
.ad
.RS 12n
Discard (aka TRIM) operations done on zvols will be done in batches of this
many blocks, where block size is determined by the \fBvolblocksize\fR property
of a zvol.
.sp
Default value: \fB16,384\fR.
.RE
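The resulting batch size in bytes is simply this tunable multiplied by the
zvol's block size; a small sketch (the function name is illustrative):

```python
def discard_batch_bytes(volblocksize, max_blocks=16_384):
    """Bytes covered by one discard batch: zvol_max_discard_blocks
    blocks of the zvol's volblocksize."""
    return max_blocks * volblocksize

# With an 8K volblocksize and the default of 16,384 blocks,
# each batch covers 128 MiB.
```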

.sp
.ne 2
.na
\fBzvol_prefetch_bytes\fR (uint)
.ad
.RS 12n
When adding a zvol to the system, prefetch \fBzvol_prefetch_bytes\fR
from the start and end of the volume. Prefetching these regions
of the volume is desirable because they are likely to be accessed
immediately by \fBblkid(8)\fR or by the kernel scanning for a partition
table.
.sp
Default value: \fB131,072\fR.
.RE

.sp
.ne 2
.na
\fBzvol_request_sync\fR (uint)
.ad
.RS 12n
When processing I/O requests for a zvol, submit them synchronously. This
effectively limits the queue depth to 1 for each I/O submitter. When set
to 0, requests are handled asynchronously by a thread pool. The number of
requests which can be handled concurrently is controlled by \fBzvol_threads\fR.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzvol_threads\fR (uint)
.ad
.RS 12n
Max number of threads which can handle zvol I/O requests concurrently.
.sp
Default value: \fB32\fR.
.RE

.sp
.ne 2
.na
\fBzvol_volmode\fR (uint)
.ad
.RS 12n
Defines the behaviour of zvol block devices when \fBvolmode\fR is set to
\fBdefault\fR.
Valid values are \fB1\fR (full), \fB2\fR (dev) and \fB3\fR (none).
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_qat_disable\fR (int)
.ad
.RS 12n
This tunable disables qat hardware acceleration for gzip compression.
It is available only if qat acceleration is compiled in and the qat driver
is present.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.SH ZFS I/O SCHEDULER
ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
The I/O scheduler determines when and in what order those operations are
issued. The I/O scheduler divides operations into five I/O classes
prioritized in the following order: sync read, sync write, async read,
async write, and scrub/resilver. Each queue defines the minimum and
maximum number of concurrent operations that may be issued to the
device. In addition, the device has an aggregate maximum,
\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
must not exceed the aggregate maximum. If the sum of the per-queue
maximums exceeds the aggregate maximum, then the number of active I/Os
may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
be issued regardless of whether all per-queue minimums have been met.
.sp
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers. Further, physical
devices typically have a limit at which more concurrent operations have no
effect on throughput or can actually cause it to decrease.
.sp
The scheduler selects the next operation to issue by first looking for an
I/O class whose minimum has not been satisfied. Once all are satisfied and
the aggregate maximum has not been hit, the scheduler looks for classes
whose maximum has not been satisfied. Iteration through the I/O classes is
done in the order specified above. No further operations are issued if the
aggregate maximum number of concurrent operations has been hit or if there
are no operations queued for an I/O class that has not hit its maximum.
Every time an I/O is queued or an operation completes, the I/O scheduler
looks for new operations to issue.
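The two-pass selection described above can be sketched as follows (a
simplified illustration, not the actual vdev queue code; the tuple layout and
function name are assumptions):

```python
def next_class_to_issue(queues, aggregate_max, total_active):
    """Pick the I/O class to issue from next.

    queues is a priority-ordered list of tuples
    (name, pending, active, min_active, max_active),
    e.g. sync read first, scrub last.
    Returns the class name, or None if nothing may be issued.
    """
    # Stop once the aggregate maximum (zfs_vdev_max_active) is reached.
    if total_active >= aggregate_max:
        return None
    # First pass: classes whose minimum has not yet been satisfied.
    for name, pending, active, lo, hi in queues:
        if pending > 0 and active < lo:
            return name
    # Second pass: classes with queued work still below their maximum.
    for name, pending, active, lo, hi in queues:
        if pending > 0 and active < hi:
            return name
    return None
```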
.sp
In general, smaller values of max_active will lead to lower latency of
synchronous operations. Larger values of max_active may lead to higher
overall throughput, depending on underlying storage.
.sp
The ratio of the queues' max_active values determines the balance of
performance between reads, writes, and scrubs. E.g., increasing
\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
more quickly, but reads and writes to have higher latency and lower throughput.
.sp
All I/O classes have a fixed maximum number of outstanding operations
except for the async write class. Asynchronous writes represent the data
that is committed to stable storage during the syncing stage for
transaction groups. Transaction groups enter the syncing state
periodically so the number of queued async writes will quickly burst up
and then bleed down to zero. Rather than servicing them as quickly as
possible, the I/O scheduler changes the maximum number of active async
write I/Os according to the amount of dirty data in the pool. Since
both throughput and latency typically increase with the number of
concurrent operations issued to physical devices, reducing the
burstiness in the number of concurrent operations also stabilizes the
response time of operations from other -- and in particular synchronous
-- queues. In broad strokes, the I/O scheduler will issue more
concurrent operations from the async write queue as there's more dirty
data in the pool.
.sp
Async Writes
.sp
The number of concurrent operations issued for the async write I/O class
follows a piece-wise linear function defined by a few adjustable points.
.nf

       |              o---------| <-- zfs_vdev_async_write_max_active
  ^    |             /^         |
  |    |            / |         |
active |           /  |         |
 I/O   |          /   |         |
count  |         /    |         |
       |        /     |         |
       |-------o      |         | <-- zfs_vdev_async_write_min_active
      0|_______^______|_________|
       0%      |      |       100% of zfs_dirty_data_max
               |      |
               |      `-- zfs_vdev_async_write_active_max_dirty_percent
               `--------- zfs_vdev_async_write_active_min_dirty_percent

.fi
Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum. As that threshold is crossed, the
number of concurrent operations issued increases linearly to the maximum at
the specified maximum percentage of the dirty data allowed in the pool.
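This piece-wise linear function can be sketched as follows (an illustrative
Python sketch; the default values shown are assumptions for the example, not
the module's authoritative defaults):

```python
def async_write_max_active(dirty_pct, min_active=1, max_active=10,
                           min_dirty_pct=30, max_dirty_pct=60):
    """Maximum active async write I/Os as a function of pool dirty data.

    Below min_dirty_pct the minimum applies; above max_dirty_pct the
    maximum applies; in between, the value is interpolated linearly.
    """
    if dirty_pct <= min_dirty_pct:
        return min_active
    if dirty_pct >= max_dirty_pct:
        return max_active
    span = max_dirty_pct - min_dirty_pct
    step = (max_active - min_active) * (dirty_pct - min_dirty_pct) // span
    return min_active + step
```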
.sp
Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
maximum percentage, this indicates that the rate of incoming data is
greater than the rate that the backend storage can handle. In this case, we
must further throttle incoming writes, as described in the next section.

.SH ZFS TRANSACTION DELAY
We delay transactions when we've determined that the backend storage
isn't able to accommodate the rate of incoming writes.
.sp
If there is already a transaction waiting, we delay relative to when
that transaction will finish waiting. This way the calculated delay time
is independent of the number of threads concurrently executing
transactions.
.sp
If we are the only waiter, wait relative to when the transaction
started, rather than the current time. This credits the transaction for
"time already served", e.g. reading indirect blocks.
.sp
The minimum time for a transaction to take is calculated as:
.nf
    min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
    min_time is then capped at 100 milliseconds.
.fi
.sp
The delay has two degrees of freedom that can be adjusted via tunables. The
percentage of dirty data at which we start to delay is defined by
\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
delay after writing at full speed has failed to keep up with the incoming write
rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
this variable determines the amount of delay at the midpoint of the curve.
.sp
.nf
delay
 10ms +-------------------------------------------------------------*+
      |                                                             *|
  9ms +                                                             *+
      |                                                             *|
  8ms +                                                             *+
      |                                                            * |
  7ms +                                                            * +
      |                                                            * |
  6ms +                                                            * +
      |                                                            * |
  5ms +                                                            * +
      |                                                            * |
  4ms +                                                            * +
      |                                                            * |
  3ms +                                                            * +
      |                                                            * |
  2ms +                                                (midpoint)  * +
      |                                                    |    **   |
  1ms +                                                    v ***     +
      |             zfs_delay_scale ---------->     ********         |
    0 +-------------------------------------*********----------------+
      0%                    <- zfs_dirty_data_max ->               100%
.fi
.sp
Note that since the delay is added to the outstanding time remaining on the
most recent transaction, the delay is effectively the inverse of IOPS.
Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
was chosen such that small changes in the amount of accumulated dirty data
in the first 3/4 of the curve yield relatively small differences in the
amount of delay.
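The formula above is easy to verify numerically. In the sketch below
(illustrative only; the 60% start-of-delay threshold and the 500,000 ns
scale are assumptions chosen for the example):

```python
def tx_delay_ns(dirty_pct, scale_ns=500_000, min_pct=60):
    """min_time = zfs_delay_scale * (dirty - min) / (max - dirty),
    capped at 100 milliseconds.

    dirty_pct and min_pct are percentages of zfs_dirty_data_max.
    """
    if dirty_pct <= min_pct:
        return 0
    if dirty_pct >= 100:
        return 100_000_000  # the 100 ms cap
    delay = scale_ns * (dirty_pct - min_pct) / (100 - dirty_pct)
    return min(delay, 100_000_000)

# At the midpoint of the curve (80% dirty, halfway between 60% and 100%)
# the two factors cancel and the delay equals zfs_delay_scale:
# 500 us per transaction, i.e. roughly 2000 IOPS.
```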
.sp
The effects can be easier to understand when the amount of delay is
represented on a log scale:
.sp
.nf
delay
100ms +-------------------------------------------------------------++
      +                                                              +
      |                                                              |
      +                                                             *+
 10ms +                                                             *+
      +                                                           ** +
      |                                              (midpoint)  **  |
      +                                                  |     **    +
  1ms +                                                  v ****      +
      +             zfs_delay_scale ---------->       *****          +
      |                                             ****             |
      +                                          ****                +
100us +                                        **                    +
      +                                       *                      +
      |                                      *                       |
      +                                      *                       +
 10us +                                     *                        +
      +                                                              +
      |                                                              |
      +                                                              +
      +--------------------------------------------------------------+
      0%                    <- zfs_dirty_data_max ->               100%
.fi
.sp
Note here that only as the amount of dirty data approaches its limit does
the delay start to increase rapidly. The goal of a properly tuned system
should be to keep the amount of dirty data out of that range by first
ensuring that the appropriate limits are set for the I/O scheduler to reach
optimal throughput on the backend storage, and then by changing the value
of \fBzfs_delay_scale\fR to increase the steepness of the curve.