'\" te
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" Copyright (c) 2019 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.TH ZFS-MODULE-PARAMETERS 5 "Feb 15, 2019"
.SH NAME
zfs\-module\-parameters \- ZFS module parameters
.SH DESCRIPTION
.sp
.LP
Description of the different parameters to the ZFS module.

.SS "Module parameters"
.sp
.LP
29.sp
30.ne 2
31.na
32\fBdbuf_cache_max_bytes\fR (ulong)
33.ad
34.RS 12n
35Maximum size in bytes of the dbuf cache. When \fB0\fR this value will default
36to \fB1/2^dbuf_cache_shift\fR (1/32) of the target ARC size, otherwise the
37provided value in bytes will be used. The behavior of the dbuf cache and its
38associated settings can be observed via the \fB/proc/spl/kstat/zfs/dbufstats\fR
39kstat.
40.sp
41Default value: \fB0\fR.
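.sp
As an illustrative example (Linux paths, assuming the zfs module is loaded),
the current setting and the resulting cache behavior can be inspected at
runtime:
.sp
.nf
# 0 means the size is derived from the ARC target (arc_c >> dbuf_cache_shift);
# e.g. a 4 GiB ARC target with dbuf_cache_shift=5 yields
#   4294967296 >> 5 = 134217728 bytes (128 MiB)
cat /sys/module/zfs/parameters/dbuf_cache_max_bytes
cat /proc/spl/kstat/zfs/dbufstats
.fi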
42.RE
43
44.sp
45.ne 2
46.na
47\fBdbuf_metadata_cache_max_bytes\fR (ulong)
48.ad
49.RS 12n
50Maximum size in bytes of the metadata dbuf cache. When \fB0\fR this value will
51default to \fB1/2^dbuf_cache_shift\fR (1/16) of the target ARC size, otherwise
52the provided value in bytes will be used. The behavior of the metadata dbuf
53cache and its associated settings can be observed via the
54\fB/proc/spl/kstat/zfs/dbufstats\fR kstat.
55.sp
56Default value: \fB0\fR.
57.RE
58
59.sp
60.ne 2
61.na
62\fBdbuf_cache_hiwater_pct\fR (uint)
63.ad
64.RS 12n
65The percentage over \fBdbuf_cache_max_bytes\fR when dbufs must be evicted
66directly.
67.sp
68Default value: \fB10\fR%.
69.RE
70
71.sp
72.ne 2
73.na
74\fBdbuf_cache_lowater_pct\fR (uint)
75.ad
76.RS 12n
77The percentage below \fBdbuf_cache_max_bytes\fR when the evict thread stops
78evicting dbufs.
79.sp
80Default value: \fB10\fR%.
81.RE
82
83.sp
84.ne 2
85.na
86\fBdbuf_cache_shift\fR (int)
87.ad
88.RS 12n
89Set the size of the dbuf cache, \fBdbuf_cache_max_bytes\fR, to a log2 fraction
90of the target arc size.
91.sp
92Default value: \fB5\fR.
93.RE
94
95.sp
96.ne 2
97.na
98\fBdbuf_metadata_cache_shift\fR (int)
99.ad
100.RS 12n
101Set the size of the dbuf metadata cache, \fBdbuf_metadata_cache_max_bytes\fR,
102to a log2 fraction of the target arc size.
103.sp
104Default value: \fB6\fR.
105.RE
106
107.sp
108.ne 2
109.na
110\fBdmu_object_alloc_chunk_shift\fR (int)
111.ad
112.RS 12n
113dnode slots allocated in a single operation as a power of 2. The default value
114minimizes lock contention for the bulk operation performed.
115.sp
116Default value: \fB7\fR (128).
117.RE
118
119.sp
120.ne 2
121.na
122\fBdmu_prefetch_max\fR (int)
123.ad
124.RS 12n
125Limit the amount we can prefetch with one call to this amount (in bytes).
126This helps to limit the amount of memory that can be used by prefetching.
127.sp
128Default value: \fB134,217,728\fR (128MB).
129.RE
130
131.sp
132.ne 2
133.na
134\fBignore_hole_birth\fR (int)
135.ad
136.RS 12n
6ce7b2d9 137This is an alias for \fBsend_holes_without_birth_time\fR.
138.RE
139
140.sp
141.ne 2
142.na
143\fBl2arc_feed_again\fR (int)
144.ad
145.RS 12n
146Turbo L2ARC warm-up. When the L2ARC is cold the fill interval will be set as
147fast as possible.
148.sp
149Use \fB1\fR for yes (default) and \fB0\fR to disable.
150.RE
151
152.sp
153.ne 2
154.na
155\fBl2arc_feed_min_ms\fR (ulong)
156.ad
157.RS 12n
158Min feed interval in milliseconds. Requires \fBl2arc_feed_again=1\fR and only
159applicable in related situations.
160.sp
161Default value: \fB200\fR.
162.RE
163
164.sp
165.ne 2
166.na
167\fBl2arc_feed_secs\fR (ulong)
168.ad
169.RS 12n
170Seconds between L2ARC writing
171.sp
172Default value: \fB1\fR.
173.RE
174
175.sp
176.ne 2
177.na
178\fBl2arc_headroom\fR (ulong)
179.ad
180.RS 12n
181How far through the ARC lists to search for L2ARC cacheable content, expressed
182as a multiplier of \fBl2arc_write_max\fR
183.sp
184Default value: \fB2\fR.
185.RE
186
187.sp
188.ne 2
189.na
190\fBl2arc_headroom_boost\fR (ulong)
191.ad
192.RS 12n
193Scales \fBl2arc_headroom\fR by this percentage when L2ARC contents are being
194successfully compressed before writing. A value of 100 disables this feature.
29714574 195.sp
be54a13c 196Default value: \fB200\fR%.
197.RE
198
199.sp
200.ne 2
201.na
202\fBl2arc_noprefetch\fR (int)
203.ad
204.RS 12n
205Do not write buffers to L2ARC if they were prefetched but not used by
206applications
207.sp
208Use \fB1\fR for yes (default) and \fB0\fR to disable.
209.RE
210
211.sp
212.ne 2
213.na
214\fBl2arc_norw\fR (int)
215.ad
216.RS 12n
217No reads during writes
218.sp
219Use \fB1\fR for yes and \fB0\fR for no (default).
220.RE
221
222.sp
223.ne 2
224.na
225\fBl2arc_write_boost\fR (ulong)
226.ad
227.RS 12n
603a1784 228Cold L2ARC devices will have \fBl2arc_write_max\fR increased by this amount
83426735 229while they remain cold.
230.sp
231Default value: \fB8,388,608\fR.
232.RE
233
234.sp
235.ne 2
236.na
237\fBl2arc_write_max\fR (ulong)
238.ad
239.RS 12n
240Max write bytes per interval
241.sp
242Default value: \fB8,388,608\fR.
243.RE
244
245.sp
246.ne 2
247.na
248\fBmetaslab_aliquot\fR (ulong)
249.ad
250.RS 12n
251Metaslab granularity, in bytes. This is roughly similar to what would be
252referred to as the "stripe size" in traditional RAID arrays. In normal
253operation, ZFS will try to write this amount of data to a top-level vdev
254before moving on to the next one.
255.sp
256Default value: \fB524,288\fR.
257.RE
258
259.sp
260.ne 2
261.na
262\fBmetaslab_bias_enabled\fR (int)
263.ad
264.RS 12n
265Enable metaslab group biasing based on its vdev's over- or under-utilization
266relative to the pool.
267.sp
268Use \fB1\fR for yes (default) and \fB0\fR for no.
269.RE
270
d830d479
MA
271.sp
272.ne 2
273.na
274\fBmetaslab_force_ganging\fR (ulong)
275.ad
276.RS 12n
277Make some blocks above a certain size be gang blocks. This option is used
278by the test suite to facilitate testing.
279.sp
280Default value: \fB16,777,217\fR.
281.RE
282
93e28d66
SD
283.sp
284.ne 2
285.na
286\fBzfs_keep_log_spacemaps_at_export\fR (int)
287.ad
288.RS 12n
289Prevent log spacemaps from being destroyed during pool exports and destroys.
290.sp
291Use \fB1\fR for yes and \fB0\fR for no (default).
292.RE
293
4e21fd06
DB
294.sp
295.ne 2
296.na
297\fBzfs_metaslab_segment_weight_enabled\fR (int)
298.ad
299.RS 12n
300Enable/disable segment-based metaslab selection.
301.sp
302Use \fB1\fR for yes (default) and \fB0\fR for no.
303.RE
304
305.sp
306.ne 2
307.na
308\fBzfs_metaslab_switch_threshold\fR (int)
309.ad
310.RS 12n
311When using segment-based metaslab selection, continue allocating
321204be 312from the active metaslab until \fBzfs_metaslab_switch_threshold\fR
4e21fd06
DB
313worth of buckets have been exhausted.
314.sp
315Default value: \fB2\fR.
316.RE
317
29714574
TF
318.sp
319.ne 2
320.na
aa7d06a9 321\fBmetaslab_debug_load\fR (int)
29714574
TF
322.ad
323.RS 12n
aa7d06a9
GW
324Load all metaslabs during pool import.
325.sp
326Use \fB1\fR for yes and \fB0\fR for no (default).
327.RE
328
329.sp
330.ne 2
331.na
332\fBmetaslab_debug_unload\fR (int)
333.ad
334.RS 12n
335Prevent metaslabs from being unloaded.
29714574
TF
336.sp
337Use \fB1\fR for yes and \fB0\fR for no (default).
338.RE
339
f3a7f661
GW
340.sp
341.ne 2
342.na
343\fBmetaslab_fragmentation_factor_enabled\fR (int)
344.ad
345.RS 12n
346Enable use of the fragmentation metric in computing metaslab weights.
347.sp
348Use \fB1\fR for yes (default) and \fB0\fR for no.
349.RE
350
d3230d76
MA
351.sp
352.ne 2
353.na
354\fBmetaslab_df_max_search\fR (int)
355.ad
356.RS 12n
357Maximum distance to search forward from the last offset. Without this limit,
358fragmented pools can see >100,000 iterations and metaslab_block_picker()
359becomes the performance limiting factor on high-performance storage.
360
361With the default setting of 16MB, we typically see less than 500 iterations,
362even with very fragmented, ashift=9 pools. The maximum number of iterations
363possible is: \fBmetaslab_df_max_search / (2 * (1<<ashift))\fR.
364With the default setting of 16MB this is 16*1024 (with ashift=9) or 2048
365(with ashift=12).
366.sp
367Default value: \fB16,777,216\fR (16MB)
368.RE
369
370.sp
371.ne 2
372.na
373\fBmetaslab_df_use_largest_segment\fR (int)
374.ad
375.RS 12n
376If we are not searching forward (due to metaslab_df_max_search,
377metaslab_df_free_pct, or metaslab_df_alloc_threshold), this tunable controls
378what segment is used. If it is set, we will use the largest free segment.
379If it is not set, we will use a segment of exactly the requested size (or
380larger).
381.sp
382Use \fB1\fR for yes and \fB0\fR for no (default).
383.RE
384
c81f1790
PD
385.sp
386.ne 2
387.na
388\fBzfs_metaslab_max_size_cache_sec\fR (ulong)
389.ad
390.RS 12n
391When we unload a metaslab, we cache the size of the largest free chunk. We use
392that cached size to determine whether or not to load a metaslab for a given
393allocation. As more frees accumulate in that metaslab while it's unloaded, the
394cached max size becomes less and less accurate. After a number of seconds
395controlled by this tunable, we stop considering the cached max size and start
396considering only the histogram instead.
397.sp
398Default value: \fB3600 seconds\fR (one hour)
399.RE
400
f09fda50
PD
401.sp
402.ne 2
403.na
404\fBzfs_metaslab_mem_limit\fR (int)
405.ad
406.RS 12n
407When we are loading a new metaslab, we check the amount of memory being used
408to store metaslab range trees. If it is over a threshold, we attempt to unload
409the least recently used metaslab to prevent the system from clogging all of
410its memory with range trees. This tunable sets the percentage of total system
411memory that is the threshold.
412.sp
eef0f4d8 413Default value: \fB25 percent\fR
f09fda50
PD
414.RE
415
b8bcca18
MA
416.sp
417.ne 2
418.na
c853f382 419\fBzfs_vdev_default_ms_count\fR (int)
b8bcca18
MA
420.ad
421.RS 12n
e4e94ca3 422When a vdev is added target this number of metaslabs per top-level vdev.
b8bcca18
MA
423.sp
424Default value: \fB200\fR.
425.RE
426
93e28d66
SD
427.sp
428.ne 2
429.na
430\fBzfs_vdev_default_ms_shift\fR (int)
431.ad
432.RS 12n
433Default limit for metaslab size.
434.sp
435Default value: \fB29\fR [meaning (1 << 29) = 512MB].
436.RE
437
d2734cce
SD
438.sp
439.ne 2
440.na
c853f382 441\fBzfs_vdev_min_ms_count\fR (int)
d2734cce
SD
442.ad
443.RS 12n
444Minimum number of metaslabs to create in a top-level vdev.
445.sp
446Default value: \fB16\fR.
447.RE
448
e4e94ca3
DB
449.sp
450.ne 2
451.na
67709516
D
452\fBvdev_validate_skip\fR (int)
453.ad
454.RS 12n
455Skip label validation steps during pool import. Changing is not recommended
456unless you know what you are doing and are recovering a damaged label.
457.sp
458Default value: \fB0\fR.
459.RE
460
461.sp
462.ne 2
463.na
464\fBzfs_vdev_ms_count_limit\fR (int)
e4e94ca3
DB
465.ad
466.RS 12n
467Practical upper limit of total metaslabs per top-level vdev.
468.sp
469Default value: \fB131,072\fR.
470.RE
471
f3a7f661
GW
472.sp
473.ne 2
474.na
475\fBmetaslab_preload_enabled\fR (int)
476.ad
477.RS 12n
478Enable metaslab group preloading.
479.sp
480Use \fB1\fR for yes (default) and \fB0\fR for no.
481.RE
482
483.sp
484.ne 2
485.na
486\fBmetaslab_lba_weighting_enabled\fR (int)
487.ad
488.RS 12n
489Give more weight to metaslabs with lower LBAs, assuming they have
490greater bandwidth as is typically the case on a modern constant
491angular velocity disk drive.
492.sp
493Use \fB1\fR for yes (default) and \fB0\fR for no.
494.RE
495
eef0f4d8
PD
496.sp
497.ne 2
498.na
499\fBmetaslab_unload_delay\fR (int)
500.ad
501.RS 12n
502After a metaslab is used, we keep it loaded for this many txgs, to attempt to
503reduce unnecessary reloading. Note that both this many txgs and
504\fBmetaslab_unload_delay_ms\fR milliseconds must pass before unloading will
505occur.
506.sp
507Default value: \fB32\fR.
508.RE
509
510.sp
511.ne 2
512.na
513\fBmetaslab_unload_delay_ms\fR (int)
514.ad
515.RS 12n
516After a metaslab is used, we keep it loaded for this many milliseconds, to
517attempt to reduce unnecessary reloading. Note that both this many
518milliseconds and \fBmetaslab_unload_delay\fR txgs must pass before unloading
519will occur.
520.sp
521Default value: \fB600000\fR (ten minutes).
522.RE
523
6ce7b2d9
RL
524.sp
525.ne 2
526.na
527\fBsend_holes_without_birth_time\fR (int)
528.ad
529.RS 12n
530When set, the hole_birth optimization will not be used, and all holes will
d0c3aa9c
TC
531always be sent on zfs send. This is useful if you suspect your datasets are
532affected by a bug in hole_birth.
6ce7b2d9
RL
533.sp
534Use \fB1\fR for on (default) and \fB0\fR for off.
535.RE
536
29714574
TF
537.sp
538.ne 2
539.na
540\fBspa_config_path\fR (charp)
541.ad
542.RS 12n
543SPA config file
544.sp
545Default value: \fB/etc/zfs/zpool.cache\fR.
546.RE
547
e8b96c60
MA
548.sp
549.ne 2
550.na
551\fBspa_asize_inflation\fR (int)
552.ad
553.RS 12n
554Multiplication factor used to estimate actual disk consumption from the
555size of data being written. The default value is a worst case estimate,
556but lower values may be valid for a given pool depending on its
557configuration. Pool administrators who understand the factors involved
558may wish to specify a more realistic inflation factor, particularly if
559they operate close to quota or capacity limits.
560.sp
83426735 561Default value: \fB24\fR.
e8b96c60
MA
562.RE
563
6cb8e530
PZ
564.sp
565.ne 2
566.na
567\fBspa_load_print_vdev_tree\fR (int)
568.ad
569.RS 12n
570Whether to print the vdev tree in the debugging message buffer during pool import.
571Use 0 to disable and 1 to enable.
572.sp
573Default value: \fB0\fR.
574.RE
575
dea377c0
MA
576.sp
577.ne 2
578.na
579\fBspa_load_verify_data\fR (int)
580.ad
581.RS 12n
582Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR)
583import. Use 0 to disable and 1 to enable.
584
585An extreme rewind import normally performs a full traversal of all
586blocks in the pool for verification. If this parameter is set to 0,
587the traversal skips non-metadata blocks. It can be toggled once the
588import has started to stop or start the traversal of non-metadata blocks.
589.sp
83426735 590Default value: \fB1\fR.
dea377c0
MA
591.RE
592
593.sp
594.ne 2
595.na
596\fBspa_load_verify_metadata\fR (int)
597.ad
598.RS 12n
599Whether to traverse blocks during an "extreme rewind" (\fB-X\fR)
600pool import. Use 0 to disable and 1 to enable.
601
602An extreme rewind import normally performs a full traversal of all
1c012083 603blocks in the pool for verification. If this parameter is set to 0,
dea377c0
MA
604the traversal is not performed. It can be toggled once the import has
605started to stop or start the traversal.
606.sp
83426735 607Default value: \fB1\fR.
dea377c0
MA
608.RE
609
610.sp
611.ne 2
612.na
c8242a96 613\fBspa_load_verify_shift\fR (int)
dea377c0
MA
614.ad
615.RS 12n
c8242a96
GW
616Sets the maximum number of bytes to consume during pool import to the log2
617fraction of the target arc size.
dea377c0 618.sp
c8242a96 619Default value: \fB4\fR.
dea377c0
MA
620.RE
621
6cde6435
BB
622.sp
623.ne 2
624.na
625\fBspa_slop_shift\fR (int)
626.ad
627.RS 12n
628Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space
629in the pool to be consumed. This ensures that we don't run the pool
630completely out of space, due to unaccounted changes (e.g. to the MOS).
631It also limits the worst-case time to allocate space. If we have
632less than this amount of free space, most ZPL operations (e.g. write,
633create) will return ENOSPC.
634.sp
83426735 635Default value: \fB5\fR.
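.sp
For example, the default \fBspa_slop_shift\fR of 5 reserves 1/2^5 = 1/32 of
the pool, the ~3.2% noted above; on a 10 TiB pool that is roughly 320 GiB.
A sketch of how the value could be raised at runtime (Linux path,
illustrative):
.sp
.nf
# reserve only 1/64 (~1.6%) instead
echo 6 > /sys/module/zfs/parameters/spa_slop_shift
.fi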
636.RE
637
0dc2f70c
MA
638.sp
639.ne 2
640.na
641\fBvdev_removal_max_span\fR (int)
642.ad
643.RS 12n
644During top-level vdev removal, chunks of data are copied from the vdev
645which may include free space in order to trade bandwidth for IOPS.
646This parameter determines the maximum span of free space (in bytes)
647which will be included as "unnecessary" data in a chunk of copied data.
648
649The default value here was chosen to align with
650\fBzfs_vdev_read_gap_limit\fR, which is a similar concept when doing
651regular reads (but there's no reason it has to be the same).
652.sp
653Default value: \fB32,768\fR.
654.RE
655
d9b4bf06
MA
656.sp
657.ne 2
658.na
659\fBzap_iterate_prefetch\fR (int)
660.ad
661.RS 12n
662If this is set, when we start iterating over a ZAP object, zfs will prefetch
663the entire object (all leaf blocks). However, this is limited by
664\fBdmu_prefetch_max\fR.
665.sp
666Use \fB1\fR for on (default) and \fB0\fR for off.
667.RE
668
29714574
TF
669.sp
670.ne 2
671.na
672\fBzfetch_array_rd_sz\fR (ulong)
673.ad
674.RS 12n
27b293be 675If prefetching is enabled, disable prefetching for reads larger than this size.
29714574
TF
676.sp
677Default value: \fB1,048,576\fR.
678.RE
679
680.sp
681.ne 2
682.na
7f60329a 683\fBzfetch_max_distance\fR (uint)
29714574
TF
684.ad
685.RS 12n
7f60329a 686Max bytes to prefetch per stream (default 8MB).
29714574 687.sp
7f60329a 688Default value: \fB8,388,608\fR.
29714574
TF
689.RE
690
691.sp
692.ne 2
693.na
694\fBzfetch_max_streams\fR (uint)
695.ad
696.RS 12n
27b293be 697Max number of streams per zfetch (prefetch streams per file).
29714574
TF
698.sp
699Default value: \fB8\fR.
700.RE
701
702.sp
703.ne 2
704.na
705\fBzfetch_min_sec_reap\fR (uint)
706.ad
707.RS 12n
27b293be 708Min time before an active prefetch stream can be reclaimed
29714574
TF
709.sp
710Default value: \fB2\fR.
711.RE
712
713.sp
714.ne 2
715.na
716\fBzfs_abd_scatter_enabled\fR (int)
717.ad
718.RS 12n
Enables the use of scatter/gather lists for ARC buffers. When disabled, all
allocations are forced to be linear in kernel memory. Disabling can improve
performance in some code paths at the expense of fragmented kernel memory.
722.sp
723Default value: \fB1\fR.
724.RE
725
726.sp
727.ne 2
728.na
\fBzfs_abd_scatter_max_order\fR (uint)
730.ad
731.RS 12n
732Maximum number of consecutive memory pages allocated in a single block for
733scatter/gather lists. Default value is specified by the kernel itself.
734.sp
735Default value: \fB10\fR at the time of this writing.
736.RE
737
87c25d56
MA
738.sp
739.ne 2
740.na
741\fBzfs_abd_scatter_min_size\fR (uint)
742.ad
743.RS 12n
744This is the minimum allocation size that will use scatter (page-based)
745ABD's. Smaller allocations will use linear ABD's.
746.sp
747Default value: \fB1536\fR (512B and 1KB allocations will be linear).
748.RE
749
25458cbe
TC
750.sp
751.ne 2
752.na
753\fBzfs_arc_dnode_limit\fR (ulong)
754.ad
755.RS 12n
When the number of bytes consumed by dnodes in the ARC exceeds this number of
bytes, try to unpin some of it in response to demand for non-metadata. This
value acts as a ceiling on the amount of dnode metadata, and defaults to 0,
which indicates that a percentage based on \fBzfs_arc_dnode_limit_percent\fR of
the ARC meta buffers may be used for dnodes.
25458cbe
TC
761
762See also \fBzfs_arc_meta_prune\fR which serves a similar purpose but is used
763when the amount of metadata in the ARC exceeds \fBzfs_arc_meta_limit\fR rather
764than in response to overall demand for non-metadata.
765
766.sp
9907cc1c
G
767Default value: \fB0\fR.
768.RE
769
770.sp
771.ne 2
772.na
773\fBzfs_arc_dnode_limit_percent\fR (ulong)
774.ad
775.RS 12n
Percentage of ARC meta buffers that can be consumed by dnodes.
777.sp
778See also \fBzfs_arc_dnode_limit\fR which serves a similar purpose but has a
779higher priority if set to nonzero value.
780.sp
be54a13c 781Default value: \fB10\fR%.
25458cbe
TC
782.RE
783
784.sp
785.ne 2
786.na
787\fBzfs_arc_dnode_reduce_percent\fR (ulong)
788.ad
789.RS 12n
790Percentage of ARC dnodes to try to scan in response to demand for non-metadata
6146e17e 791when the number of bytes consumed by dnodes exceeds \fBzfs_arc_dnode_limit\fR.
25458cbe
TC
792
793.sp
be54a13c 794Default value: \fB10\fR% of the number of dnodes in the ARC.
25458cbe
TC
795.RE
796
49ddb315
MA
797.sp
798.ne 2
799.na
800\fBzfs_arc_average_blocksize\fR (int)
801.ad
802.RS 12n
803The ARC's buffer hash table is sized based on the assumption of an average
804block size of \fBzfs_arc_average_blocksize\fR (default 8K). This works out
805to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers.
806For configurations with a known larger average block size this value can be
807increased to reduce the memory footprint.
808
809.sp
810Default value: \fB8192\fR.
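.sp
A rough sketch of the sizing arithmetic described above (illustrative only):
.sp
.nf
# hash table size ~= (physical memory / zfs_arc_average_blocksize) * 8 bytes
# e.g. 1 GiB of RAM / 8192-byte average blocks = 131072 entries
#      131072 entries * 8-byte pointers        = 1 MiB of hash table
.fi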
811.RE
812
ca0bf58d
PS
813.sp
814.ne 2
815.na
816\fBzfs_arc_evict_batch_limit\fR (int)
817.ad
818.RS 12n
8f343973 819Number ARC headers to evict per sub-list before proceeding to another sub-list.
ca0bf58d
PS
820This batch-style operation prevents entire sub-lists from being evicted at once
821but comes at a cost of additional unlocking and locking.
822.sp
823Default value: \fB10\fR.
824.RE
825
29714574
TF
826.sp
827.ne 2
828.na
829\fBzfs_arc_grow_retry\fR (int)
830.ad
831.RS 12n
If set to a non-zero value, it will replace the arc_grow_retry value with this value.
d4a72f23 833The arc_grow_retry value (default 5) is the number of seconds the ARC will wait before
ca85d690 834trying to resume growth after a memory pressure event.
29714574 835.sp
ca85d690 836Default value: \fB0\fR.
29714574
TF
837.RE
838
839.sp
840.ne 2
841.na
7e8bddd0 842\fBzfs_arc_lotsfree_percent\fR (int)
29714574
TF
843.ad
844.RS 12n
7e8bddd0
BB
845Throttle I/O when free system memory drops below this percentage of total
846system memory. Setting this value to 0 will disable the throttle.
29714574 847.sp
be54a13c 848Default value: \fB10\fR%.
29714574
TF
849.RE
850
851.sp
852.ne 2
853.na
7e8bddd0 854\fBzfs_arc_max\fR (ulong)
29714574
TF
855.ad
856.RS 12n
9a51738b
RM
857Max size of ARC in bytes. If set to 0 then the max size of ARC is determined
858by the amount of system memory installed. For Linux, 1/2 of system memory will
859be used as the limit. For FreeBSD, the larger of all system memory - 1GB or
8605/8 of system memory will be used as the limit. This value must be at least
86167108864 (64 megabytes).
83426735
D
862.sp
863This value can be changed dynamically with some caveats. It cannot be set back
864to 0 while running and reducing it below the current ARC size will not cause
865the ARC to shrink without memory pressure to induce shrinking.
29714574 866.sp
7e8bddd0 867Default value: \fB0\fR.
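.sp
A typical way to apply this limit (illustrative; the value must be at least
67108864 bytes as noted above):
.sp
.nf
# persistently, via /etc/modprobe.d/zfs.conf: limit the ARC to 4 GiB
options zfs zfs_arc_max=4294967296
# or dynamically at runtime on Linux
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
.fi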
868.RE
869
ca85d690 870.sp
871.ne 2
872.na
873\fBzfs_arc_meta_adjust_restarts\fR (ulong)
874.ad
875.RS 12n
The number of restart passes to make while scanning the ARC, attempting
to free buffers in order to stay below the \fBzfs_arc_meta_limit\fR.
878This value should not need to be tuned but is available to facilitate
879performance analysis.
880.sp
881Default value: \fB4096\fR.
882.RE
883
29714574
TF
884.sp
885.ne 2
886.na
887\fBzfs_arc_meta_limit\fR (ulong)
888.ad
889.RS 12n
2cbb06b5
BB
The maximum allowed size in bytes that meta data buffers are allowed to
consume in the ARC. When this limit is reached meta data buffers will
be reclaimed even if the overall arc_c_max has not been reached. This
value defaults to 0, which indicates that a percentage based on
\fBzfs_arc_meta_limit_percent\fR of the ARC may be used for meta data.
.sp
This value may be changed dynamically except that it cannot be set back to 0
for a specific percent of the ARC; it must be set to an explicit value.
83426735 898.sp
29714574
TF
899Default value: \fB0\fR.
900.RE
901
9907cc1c
G
902.sp
903.ne 2
904.na
905\fBzfs_arc_meta_limit_percent\fR (ulong)
906.ad
907.RS 12n
908Percentage of ARC buffers that can be used for meta data.
909
910See also \fBzfs_arc_meta_limit\fR which serves a similar purpose but has a
911higher priority if set to nonzero value.
912
913.sp
be54a13c 914Default value: \fB75\fR%.
9907cc1c
G
915.RE
916
ca0bf58d
PS
917.sp
918.ne 2
919.na
920\fBzfs_arc_meta_min\fR (ulong)
921.ad
922.RS 12n
The minimum allowed size in bytes that meta data buffers may consume in
the ARC. This value defaults to 0, which disables a floor on the amount
of the ARC devoted to meta data.
926.sp
927Default value: \fB0\fR.
928.RE
929
29714574
TF
930.sp
931.ne 2
932.na
933\fBzfs_arc_meta_prune\fR (int)
934.ad
935.RS 12n
2cbb06b5
BB
936The number of dentries and inodes to be scanned looking for entries
937which can be dropped. This may be required when the ARC reaches the
938\fBzfs_arc_meta_limit\fR because dentries and inodes can pin buffers
in the ARC. Increasing this value will cause the dentry and inode caches
940to be pruned more aggressively. Setting this value to 0 will disable
941pruning the inode and dentry caches.
29714574 942.sp
2cbb06b5 943Default value: \fB10,000\fR.
29714574
TF
944.RE
945
bc888666
BB
946.sp
947.ne 2
948.na
ca85d690 949\fBzfs_arc_meta_strategy\fR (int)
bc888666
BB
950.ad
951.RS 12n
Define the strategy for ARC meta data buffer eviction (meta reclaim strategy).
A value of 0 (META_ONLY) will evict only the ARC meta data buffers.
A value of 1 (BALANCED) indicates that additional data buffers may be evicted
if that is required in order to evict the required number of meta data buffers.
bc888666 956.sp
ca85d690 957Default value: \fB1\fR.
bc888666
BB
958.RE
959
29714574
TF
960.sp
961.ne 2
962.na
963\fBzfs_arc_min\fR (ulong)
964.ad
965.RS 12n
ca85d690 966Min arc size of ARC in bytes. If set to 0 then arc_c_min will default to
967consuming the larger of 32M or 1/32 of total system memory.
29714574 968.sp
ca85d690 969Default value: \fB0\fR.
29714574
TF
970.RE
971
972.sp
973.ne 2
974.na
d4a72f23 975\fBzfs_arc_min_prefetch_ms\fR (int)
29714574
TF
976.ad
977.RS 12n
d4a72f23 978Minimum time prefetched blocks are locked in the ARC, specified in ms.
2b84817f 979A value of \fB0\fR will default to 1000 ms.
d4a72f23
TC
980.sp
981Default value: \fB0\fR.
982.RE
983
984.sp
985.ne 2
986.na
987\fBzfs_arc_min_prescient_prefetch_ms\fR (int)
988.ad
989.RS 12n
990Minimum time "prescient prefetched" blocks are locked in the ARC, specified
ac3d4d0c 991in ms. These blocks are meant to be prefetched fairly aggressively ahead of
2b84817f 992the code that may use them. A value of \fB0\fR will default to 6000 ms.
29714574 993.sp
83426735 994Default value: \fB0\fR.
29714574
TF
995.RE
996
6cb8e530
PZ
997.sp
998.ne 2
999.na
1000\fBzfs_max_missing_tvds\fR (int)
1001.ad
1002.RS 12n
1003Number of missing top-level vdevs which will be allowed during
1004pool import (only in read-only mode).
1005.sp
1006Default value: \fB0\fR
1007.RE
1008
ca0bf58d
PS
1009.sp
1010.ne 2
1011.na
c30e58c4 1012\fBzfs_multilist_num_sublists\fR (int)
ca0bf58d
PS
1013.ad
1014.RS 12n
To allow more fine-grained locking, each ARC state contains a series
of lists for both data and meta data objects. Locking is performed at
the level of these "sub-lists". This parameter controls the number of
sub-lists per ARC state, and also applies to other uses of the
multilist data structure.
ca0bf58d 1020.sp
c30e58c4 1021Default value: \fB4\fR or the number of online CPUs, whichever is greater
ca0bf58d
PS
1022.RE
1023
1024.sp
1025.ne 2
1026.na
1027\fBzfs_arc_overflow_shift\fR (int)
1028.ad
1029.RS 12n
1030The ARC size is considered to be overflowing if it exceeds the current
1031ARC target size (arc_c) by a threshold determined by this parameter.
1032The threshold is calculated as a fraction of arc_c using the formula
1033"arc_c >> \fBzfs_arc_overflow_shift\fR".
1034
1035The default value of 8 causes the ARC to be considered to be overflowing
if it exceeds the target size by 1/256th (about 0.4%) of the target size.
1037
1038When the ARC is overflowing, new buffer allocations are stalled until
1039the reclaim thread catches up and the overflow condition no longer exists.
1040.sp
1041Default value: \fB8\fR.
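.sp
A worked example of the threshold calculation (illustrative):
.sp
.nf
# overflow threshold = arc_c >> zfs_arc_overflow_shift
# e.g. arc_c = 4 GiB with the default shift of 8:
#   4294967296 >> 8 = 16777216 bytes (16 MiB) over target counts as overflow
.fi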
1042.RE
1043
728d6ae9
BB
1044.sp
1045.ne 2
1046.na
1047
1048\fBzfs_arc_p_min_shift\fR (int)
1049.ad
1050.RS 12n
If set to a non-zero value, this will update arc_p_min_shift (default 4)
with the new value.
arc_p_min_shift is used as a shift of arc_c when calculating both the minimum
and maximum arc_p.
728d6ae9 1055.sp
ca85d690 1056Default value: \fB0\fR.
728d6ae9
BB
1057.RE
1058
62422785
PS
1059.sp
1060.ne 2
1061.na
1062\fBzfs_arc_p_dampener_disable\fR (int)
1063.ad
1064.RS 12n
1065Disable arc_p adapt dampener
1066.sp
1067Use \fB1\fR for yes (default) and \fB0\fR to disable.
1068.RE
1069
29714574
TF
1070.sp
1071.ne 2
1072.na
1073\fBzfs_arc_shrink_shift\fR (int)
1074.ad
1075.RS 12n
If set to a non-zero value, this will update arc_shrink_shift (default 7)
1077with the new value.
29714574 1078.sp
ca85d690 1079Default value: \fB0\fR.
29714574
TF
1080.RE
1081
03b60eee
DB
1082.sp
1083.ne 2
1084.na
1085\fBzfs_arc_pc_percent\fR (uint)
1086.ad
1087.RS 12n
1088Percent of pagecache to reclaim arc to
1089
1090This tunable allows ZFS arc to play more nicely with the kernel's LRU
1091pagecache. It can guarantee that the arc size won't collapse under scanning
1092pressure on the pagecache, yet still allows arc to be reclaimed down to
1093zfs_arc_min if necessary. This value is specified as percent of pagecache
1094size (as measured by NR_FILE_PAGES) where that percent may exceed 100. This
1095only operates during memory pressure/reclaim.
1096.sp
be54a13c 1097Default value: \fB0\fR% (disabled).
03b60eee
DB
1098.RE
1099
11f552fa
BB
1100.sp
1101.ne 2
1102.na
1103\fBzfs_arc_sys_free\fR (ulong)
1104.ad
1105.RS 12n
1106The target number of bytes the ARC should leave as free memory on the system.
1107Defaults to the larger of 1/64 of physical memory or 512K. Setting this
1108option to a non-zero value will override the default.
1109.sp
1110Default value: \fB0\fR.
1111.RE
1112
29714574
TF
1113.sp
1114.ne 2
1115.na
1116\fBzfs_autoimport_disable\fR (int)
1117.ad
1118.RS 12n
27b293be 1119Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR).
29714574 1120.sp
70081096 1121Use \fB1\fR for yes (default) and \fB0\fR for no.
29714574
TF
1122.RE
1123
80d52c39
TH
1124.sp
1125.ne 2
1126.na
67709516 1127\fBzfs_checksum_events_per_second\fR (uint)
80d52c39
TH
1128.ad
1129.RS 12n
1130Rate limit checksum events to this many per second. Note that this should
1131not be set below the zed thresholds (currently 10 checksums over 10 sec)
1132or else zed may not trigger any action.
1133.sp
1134Default value: 20
1135.RE
1136
2fe61a7e
PS
1137.sp
1138.ne 2
1139.na
1140\fBzfs_commit_timeout_pct\fR (int)
1141.ad
1142.RS 12n
1143This controls the amount of time that a ZIL block (lwb) will remain "open"
1144when it isn't "full", and it has a thread waiting for it to be committed to
1145stable storage. The timeout is scaled based on a percentage of the last lwb
1146latency to avoid significantly impacting the latency of each individual
1147transaction record (itx).
1148.sp
be54a13c 1149Default value: \fB5\fR%.
2fe61a7e
PS
1150.RE
1151
67709516
D
1152.sp
1153.ne 2
1154.na
1155\fBzfs_condense_indirect_commit_entry_delay_ms\fR (int)
1156.ad
1157.RS 12n
1158Vdev indirection layer (used for device removal) sleeps for this many
1159milliseconds during mapping generation. Intended for use with the test suite
1160to throttle vdev removal speed.
1161.sp
1162Default value: \fB0\fR (no throttle).
1163.RE
1164
0dc2f70c
MA
1165.sp
1166.ne 2
1167.na
1168\fBzfs_condense_indirect_vdevs_enable\fR (int)
1169.ad
1170.RS 12n
1171Enable condensing indirect vdev mappings. When set to a non-zero value,
1172attempt to condense indirect vdev mappings if the mapping uses more than
1173\fBzfs_condense_min_mapping_bytes\fR bytes of memory and if the obsolete
1174space map object uses more than \fBzfs_condense_max_obsolete_bytes\fR
1175bytes on-disk. The condensing process is an attempt to save memory by
1176removing obsolete mappings.
1177.sp
1178Default value: \fB1\fR.
1179.RE
1180
1181.sp
1182.ne 2
1183.na
1184\fBzfs_condense_max_obsolete_bytes\fR (ulong)
1185.ad
1186.RS 12n
1187Only attempt to condense indirect vdev mappings if the on-disk size
1188of the obsolete space map object is greater than this number of bytes
(see \fBzfs_condense_indirect_vdevs_enable\fR).
1190.sp
1191Default value: \fB1,073,741,824\fR.
1192.RE
1193
1194.sp
1195.ne 2
1196.na
1197\fBzfs_condense_min_mapping_bytes\fR (ulong)
1198.ad
1199.RS 12n
1200Minimum size vdev mapping to attempt to condense (see
1201\fBzfs_condense_indirect_vdevs_enable\fR).
1202.sp
1203Default value: \fB131,072\fR.
1204.RE
1205
3b36f831
BB
1206.sp
1207.ne 2
1208.na
1209\fBzfs_dbgmsg_enable\fR (int)
1210.ad
1211.RS 12n
1212Internally ZFS keeps a small log to facilitate debugging. By default the log
1213is disabled, to enable it set this option to 1. The contents of the log can
1214be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to
1215this proc file clears the log.
1216.sp
1217Default value: \fB0\fR.
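.sp
For example (Linux paths, as described above):
.sp
.nf
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable   # enable the log
cat /proc/spl/kstat/zfs/dbgmsg                          # read the log
echo 0 > /proc/spl/kstat/zfs/dbgmsg                     # clear the log
.fi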
1218.RE
1219
1220.sp
1221.ne 2
1222.na
1223\fBzfs_dbgmsg_maxsize\fR (int)
1224.ad
1225.RS 12n
1226The maximum size in bytes of the internal ZFS debug log.
1227.sp
1228Default value: \fB4M\fR.
1229.RE
1230
29714574
TF
1231.sp
1232.ne 2
1233.na
1234\fBzfs_dbuf_state_index\fR (int)
1235.ad
1236.RS 12n
83426735
D
1237This feature is currently unused. It is normally used for controlling what
1238reporting is available under /proc/spl/kstat/zfs.
29714574
TF
1239.sp
1240Default value: \fB0\fR.
1241.RE
1242
1243.sp
1244.ne 2
1245.na
1246\fBzfs_deadman_enabled\fR (int)
1247.ad
1248.RS 12n
b81a3ddc 1249When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR
8fb1ede1
BB
1250milliseconds, or when an individual I/O takes longer than
1251\fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to
1252be "hung". If \fBzfs_deadman_enabled\fR is set then the deadman behavior is
1253invoked as described by the \fBzfs_deadman_failmode\fR module option.
1254By default the deadman is enabled and configured to \fBwait\fR which results
1255in "hung" I/Os only being logged. The deadman is automatically disabled
1256when a pool gets suspended.
29714574 1257.sp
8fb1ede1
BB
1258Default value: \fB1\fR.
1259.RE
1260
1261.sp
1262.ne 2
1263.na
1264\fBzfs_deadman_failmode\fR (charp)
1265.ad
1266.RS 12n
1267Controls the failure behavior when the deadman detects a "hung" I/O. Valid
1268values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR.
1269.sp
1270\fBwait\fR - Wait for a "hung" I/O to complete. For each "hung" I/O a
1271"deadman" event will be posted describing that I/O.
1272.sp
1273\fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it
1274to the I/O pipeline if possible.
1275.sp
1276\fBpanic\fR - Panic the system. This can be used to facilitate an automatic
1277fail-over to a properly configured fail-over partner.
1278.sp
1279Default value: \fBwait\fR.
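.sp
For example, a cluster node that should fail over automatically might select
\fBpanic\fR (illustrative configuration):
.sp
.nf
# /etc/modprobe.d/zfs.conf
options zfs zfs_deadman_failmode=panic
# or changed at runtime on Linux
echo continue > /sys/module/zfs/parameters/zfs_deadman_failmode
.fi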
1280.RE
1281
1282.sp
1283.ne 2
1284.na
1285\fBzfs_deadman_checktime_ms\fR (int)
1286.ad
1287.RS 12n
8fb1ede1
BB
1288Check time in milliseconds. This defines the frequency at which we check
1289for hung I/O and potentially invoke the \fBzfs_deadman_failmode\fR behavior.
b81a3ddc 1290.sp
8fb1ede1 1291Default value: \fB60,000\fR.
29714574
TF
1292.RE
1293
1294.sp
1295.ne 2
1296.na
e8b96c60 1297\fBzfs_deadman_synctime_ms\fR (ulong)
29714574
TF
1298.ad
1299.RS 12n
b81a3ddc 1300Interval in milliseconds after which the deadman is triggered and also
8fb1ede1
BB
1301the interval after which a pool sync operation is considered to be "hung".
1302Once this limit is exceeded the deadman will be invoked every
1303\fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes.
1304.sp
1305Default value: \fB600,000\fR.
1306.RE
b81a3ddc 1307
29714574 1308.sp
8fb1ede1
BB
1309.ne 2
1310.na
1311\fBzfs_deadman_ziotime_ms\fR (ulong)
1312.ad
1313.RS 12n
1314Interval in milliseconds after which the deadman is triggered and an
ad796b8a 1315individual I/O operation is considered to be "hung". As long as the I/O
8fb1ede1
BB
1316remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR
1317milliseconds until the I/O completes.
1318.sp
1319Default value: \fB300,000\fR.
29714574
TF
1320.RE
1321
1322.sp
1323.ne 2
1324.na
1325\fBzfs_dedup_prefetch\fR (int)
1326.ad
1327.RS 12n
1328Enable prefetching dedup-ed blks
1329.sp
0dfc7324 1330Use \fB1\fR for yes and \fB0\fR to disable (default).
29714574
TF
1331.RE
1332
e8b96c60
MA
1333.sp
1334.ne 2
1335.na
1336\fBzfs_delay_min_dirty_percent\fR (int)
1337.ad
1338.RS 12n
1339Start to delay each transaction once there is this amount of dirty data,
1340expressed as a percentage of \fBzfs_dirty_data_max\fR.
1341This value should be >= zfs_vdev_async_write_active_max_dirty_percent.
1342See the section "ZFS TRANSACTION DELAY".
1343.sp
be54a13c 1344Default value: \fB60\fR%.
e8b96c60
MA
1345.RE
1346
1347.sp
1348.ne 2
1349.na
1350\fBzfs_delay_scale\fR (int)
1351.ad
1352.RS 12n
1353This controls how quickly the transaction delay approaches infinity.
1354Larger values cause longer delays for a given amount of dirty data.
1355.sp
1356For the smoothest delay, this value should be about 1 billion divided
1357by the maximum number of operations per second. This will smoothly
1358handle between 10x and 1/10th this number.
1359.sp
1360See the section "ZFS TRANSACTION DELAY".
1361.sp
1362Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
1363.sp
1364Default value: \fB500,000\fR.
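.sp
A sketch of the sizing rule above (illustrative): for a pool expected to
sustain roughly 20,000 operations per second, a smooth delay curve would use
about 1,000,000,000 / 20,000 = 50,000.
.sp
.nf
# Linux, at runtime
echo 50000 > /sys/module/zfs/parameters/zfs_delay_scale
.fi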
1365.RE
1366
67709516
D
1367.sp
1368.ne 2
1369.na
1370\fBzfs_disable_ivset_guid_check\fR (int)
1371.ad
1372.RS 12n
1373Disables requirement for IVset guids to be present and match when doing a raw
1374receive of encrypted datasets. Intended for users whose pools were created with
1375ZFS on Linux pre-release versions and now have compatibility issues.
1376.sp
1377Default value: \fB0\fR.
1378.RE
1379
1380.sp
1381.ne 2
1382.na
1383\fBzfs_key_max_salt_uses\fR (ulong)
1384.ad
1385.RS 12n
1386Maximum number of uses of a single salt value before generating a new one for
1387encrypted datasets. The default value is also the maximum that will be
1388accepted.
1389.sp
1390Default value: \fB400,000,000\fR.
1391.RE
1392
1393.sp
1394.ne 2
1395.na
1396\fBzfs_object_mutex_size\fR (uint)
1397.ad
1398.RS 12n
1399Size of the znode hashtable used for holds.
1400
1401Due to the need to hold locks on objects that may not exist yet, kernel mutexes
1402are not created per-object and instead a hashtable is used where collisions
1403will result in objects waiting when there is not actually contention on the
1404same object.
1405.sp
1406Default value: \fB64\fR.
1407.RE
1408
80d52c39
TH
1409.sp
1410.ne 2
1411.na
62ee31ad 1412\fBzfs_slow_io_events_per_second\fR (int)
80d52c39
TH
1413.ad
1414.RS 12n
ad796b8a 1415Rate limit delay zevents (which report slow I/Os) to this many per second.
80d52c39
TH
1416.sp
1417Default value: 20
1418.RE
1419
93e28d66
SD
1420.sp
1421.ne 2
1422.na
1423\fBzfs_unflushed_max_mem_amt\fR (ulong)
1424.ad
1425.RS 12n
1426Upper-bound limit for unflushed metadata changes to be held by the
1427log spacemap in memory (in bytes).
1428.sp
1429Default value: \fB1,073,741,824\fR (1GB).
1430.RE
1431
1432.sp
1433.ne 2
1434.na
1435\fBzfs_unflushed_max_mem_ppm\fR (ulong)
1436.ad
1437.RS 12n
1438Percentage of the overall system memory that ZFS allows to be used
1439for unflushed metadata changes by the log spacemap.
1440(value is calculated over 1000000 for finer granularity).
1441.sp
Default value: \fB1000\fR (which is divided by 1000000, resulting in
a limit of \fB0.1\fR% of memory).
1444.RE
1445
1446.sp
1447.ne 2
1448.na
1449\fBzfs_unflushed_log_block_max\fR (ulong)
1450.ad
1451.RS 12n
1452Describes the maximum number of log spacemap blocks allowed for each pool.
1453The default value of 262144 means that the space in all the log spacemaps
1454can add up to no more than 262144 blocks (which means 32GB of logical
1455space before compression and ditto blocks, assuming that blocksize is
1456128k).
1457.sp
1458This tunable is important because it involves a trade-off between import
1459time after an unclean export and the frequency of flushing metaslabs.
1460The higher this number is, the more log blocks we allow when the pool is
1461active which means that we flush metaslabs less often and thus decrease
1462the number of I/Os for spacemap updates per TXG.
1463At the same time though, that means that in the event of an unclean export,
1464there will be more log spacemap blocks for us to read, inducing overhead
1465in the import time of the pool.
The lower the number, the more the amount of flushing increases, destroying
log blocks more quickly as they become obsolete, which leaves fewer blocks
to be read during import time after a crash.
1469.sp
1470Each log spacemap block existing during pool import leads to approximately
1471one extra logical I/O issued.
1472This is the reason why this tunable is exposed in terms of blocks rather
1473than space used.
1474.sp
1475Default value: \fB262144\fR (256K).
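.sp
The arithmetic behind the default, restated (illustrative):
.sp
.nf
# 262144 blocks * 131072 bytes (128k) per block
#   = 34359738368 bytes = 32 GiB of logical log spacemap space
.fi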
1476.RE
1477
1478.sp
1479.ne 2
1480.na
1481\fBzfs_unflushed_log_block_min\fR (ulong)
1482.ad
1483.RS 12n
1484If the number of metaslabs is small and our incoming rate is high, we
1485could get into a situation that we are flushing all our metaslabs every
1486TXG.
1487Thus we always allow at least this many log blocks.
1488.sp
1489Default value: \fB1000\fR.
1490.RE
1491
1492.sp
1493.ne 2
1494.na
1495\fBzfs_unflushed_log_block_pct\fR (ulong)
1496.ad
1497.RS 12n
1498Tunable used to determine the number of blocks that can be used for
1499the spacemap log, expressed as a percentage of the total number of
1500metaslabs in the pool.
1501.sp
1502Default value: \fB400\fR (read as \fB400\fR% - meaning that the number
1503of log spacemap blocks are capped at 4 times the number of
1504metaslabs in the pool).
1505.RE
1506
dcec0a12
AP
1507.sp
1508.ne 2
1509.na
1510\fBzfs_unlink_suspend_progress\fR (uint)
1511.ad
1512.RS 12n
1513When enabled, files will not be asynchronously removed from the list of pending
1514unlinks and the space they consume will be leaked. Once this option has been
1515disabled and the dataset is remounted, the pending unlinks will be processed
1516and the freed space returned to the pool.
1517This option is used by the test suite to facilitate testing.
1518.sp
1519Uses \fB0\fR (default) to allow progress and \fB1\fR to pause progress.
1520.RE
1521
a966c564
K
1522.sp
1523.ne 2
1524.na
1525\fBzfs_delete_blocks\fR (ulong)
1526.ad
1527.RS 12n
This is used to define a large file for the purposes of delete. Files
containing more than \fBzfs_delete_blocks\fR blocks will be deleted asynchronously
1530while smaller files are deleted synchronously. Decreasing this value will
1531reduce the time spent in an unlink(2) system call at the expense of a longer
1532delay before the freed space is available.
1533.sp
1534Default value: \fB20,480\fR.
1535.RE
1536
e8b96c60
MA
1537.sp
1538.ne 2
1539.na
1540\fBzfs_dirty_data_max\fR (int)
1541.ad
1542.RS 12n
1543Determines the dirty space limit in bytes. Once this limit is exceeded, new
1544writes are halted until space frees up. This parameter takes precedence
1545over \fBzfs_dirty_data_max_percent\fR.
1546See the section "ZFS TRANSACTION DELAY".
1547.sp
be54a13c 1548Default value: \fB10\fR% of physical RAM, capped at \fBzfs_dirty_data_max_max\fR.
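.sp
For example, on a machine with 32 GiB of RAM the default works out to roughly
3.2 GiB (10% of RAM), subject to \fBzfs_dirty_data_max_max\fR. An explicit
limit can be set instead (illustrative):
.sp
.nf
# /etc/modprobe.d/zfs.conf: cap dirty data at 1 GiB
options zfs zfs_dirty_data_max=1073741824
.fi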
1549.RE
1550
1551.sp
1552.ne 2
1553.na
1554\fBzfs_dirty_data_max_max\fR (int)
1555.ad
1556.RS 12n
1557Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
1558This limit is only enforced at module load time, and will be ignored if
1559\fBzfs_dirty_data_max\fR is later changed. This parameter takes
1560precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
1561"ZFS TRANSACTION DELAY".
1562.sp
be54a13c 1563Default value: \fB25\fR% of physical RAM.
e8b96c60
MA
1564.RE
1565
1566.sp
1567.ne 2
1568.na
1569\fBzfs_dirty_data_max_max_percent\fR (int)
1570.ad
1571.RS 12n
1572Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
1573percentage of physical RAM. This limit is only enforced at module load
1574time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
1575The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
1576one. See the section "ZFS TRANSACTION DELAY".
1577.sp
be54a13c 1578Default value: \fB25\fR%.
e8b96c60
MA
1579.RE
1580
1581.sp
1582.ne 2
1583.na
1584\fBzfs_dirty_data_max_percent\fR (int)
1585.ad
1586.RS 12n
1587Determines the dirty space limit, expressed as a percentage of all
1588memory. Once this limit is exceeded, new writes are halted until space frees
1589up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
1590one. See the section "ZFS TRANSACTION DELAY".
1591.sp
be54a13c 1592Default value: \fB10\fR%, subject to \fBzfs_dirty_data_max_max\fR.
e8b96c60
MA
1593.RE
1594
1595.sp
1596.ne 2
1597.na
dfbe2675 1598\fBzfs_dirty_data_sync_percent\fR (int)
e8b96c60
MA
1599.ad
1600.RS 12n
dfbe2675
MA
1601Start syncing out a transaction group if there's at least this much dirty data
1602as a percentage of \fBzfs_dirty_data_max\fR. This should be less than
1603\fBzfs_vdev_async_write_active_min_dirty_percent\fR.
e8b96c60 1604.sp
dfbe2675 1605Default value: \fB20\fR% of \fBzfs_dirty_data_max\fR.
e8b96c60
MA
1606.RE
1607
1eeb4562
JX
1608.sp
1609.ne 2
1610.na
1611\fBzfs_fletcher_4_impl\fR (string)
1612.ad
1613.RS 12n
1614Select a fletcher 4 implementation.
1615.sp
35a76a03 1616Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
0b2a6423 1617\fBavx2\fR, \fBavx512f\fR, \fBavx512bw\fR, and \fBaarch64_neon\fR.
70b258fc
GN
1618All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
1619set extensions to be available and will only appear if ZFS detects that they are
1620present at runtime. If multiple implementations of fletcher 4 are available,
1621the \fBfastest\fR will be chosen using a micro benchmark. Selecting \fBscalar\fR
1622results in the original, CPU based calculation, being used. Selecting any option
1623other than \fBfastest\fR and \fBscalar\fR results in vector instructions from
1624the respective CPU instruction set being used.
1eeb4562
JX
1625.sp
1626Default value: \fBfastest\fR.
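.sp
For example, the selection can be inspected or overridden at runtime. The
path below assumes the parameter is exposed by the \fBzcommon\fR module on
Linux, which may vary between releases:
.sp
.nf
cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl   # list selectors
echo sse2 > /sys/module/zcommon/parameters/zfs_fletcher_4_impl
.fi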
1627.RE
1628
ba5ad9a4
GW
1629.sp
1630.ne 2
1631.na
1632\fBzfs_free_bpobj_enabled\fR (int)
1633.ad
1634.RS 12n
1635Enable/disable the processing of the free_bpobj object.
1636.sp
1637Default value: \fB1\fR.
1638.RE
1639
36283ca2
MG
1640.sp
1641.ne 2
1642.na
a1d477c2 1643\fBzfs_async_block_max_blocks\fR (ulong)
36283ca2
MG
1644.ad
1645.RS 12n
1646Maximum number of blocks freed in a single txg.
1647.sp
4fe3a842
MA
1648Default value: \fBULONG_MAX\fR (unlimited).
1649.RE
1650
1651.sp
1652.ne 2
1653.na
1654\fBzfs_max_async_dedup_frees\fR (ulong)
1655.ad
1656.RS 12n
1657Maximum number of dedup blocks freed in a single txg.
1658.sp
36283ca2
MG
1659Default value: \fB100,000\fR.
1660.RE
1661
ca0845d5
PD
1662.sp
1663.ne 2
1664.na
1665\fBzfs_override_estimate_recordsize\fR (ulong)
1666.ad
1667.RS 12n
1668Record size calculation override for zfs send estimates.
1669.sp
1670Default value: \fB0\fR.
1671.RE
1672
e8b96c60
MA
1673.sp
1674.ne 2
1675.na
1676\fBzfs_vdev_async_read_max_active\fR (int)
1677.ad
1678.RS 12n
83426735 1679Maximum asynchronous read I/Os active to each device.
e8b96c60
MA
1680See the section "ZFS I/O SCHEDULER".
1681.sp
1682Default value: \fB3\fR.
1683.RE
1684
1685.sp
1686.ne 2
1687.na
1688\fBzfs_vdev_async_read_min_active\fR (int)
1689.ad
1690.RS 12n
1691Minimum asynchronous read I/Os active to each device.
1692See the section "ZFS I/O SCHEDULER".
1693.sp
1694Default value: \fB1\fR.
1695.RE
1696
1697.sp
1698.ne 2
1699.na
1700\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
1701.ad
1702.RS 12n
1703When the pool has more than
1704\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
1705\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
1706the dirty data is between min and max, the active I/O limit is linearly
1707interpolated. See the section "ZFS I/O SCHEDULER".
1708.sp
be54a13c 1709Default value: \fB60\fR%.
e8b96c60
MA
1710.RE
1711
1712.sp
1713.ne 2
1714.na
1715\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
1716.ad
1717.RS 12n
1718When the pool has less than
1719\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
1720\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
1721the dirty data is between min and max, the active I/O limit is linearly
1722interpolated. See the section "ZFS I/O SCHEDULER".
1723.sp
be54a13c 1724Default value: \fB30\fR%.
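.sp
A worked example of the interpolation described above, using the default
tunables (illustrative):
.sp
.nf
# min_dirty = 30%, max_dirty = 60%, min_active = 2, max_active = 10
# with dirty data at 45% of zfs_dirty_data_max:
#   active = 2 + (45 - 30) / (60 - 30) * (10 - 2) = 6 async writes per device
.fi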
1725.RE
1726
1727.sp
1728.ne 2
1729.na
1730\fBzfs_vdev_async_write_max_active\fR (int)
1731.ad
1732.RS 12n
83426735 1733Maximum asynchronous write I/Os active to each device.
e8b96c60
MA
1734See the section "ZFS I/O SCHEDULER".
1735.sp
1736Default value: \fB10\fR.
1737.RE
1738
1739.sp
1740.ne 2
1741.na
1742\fBzfs_vdev_async_write_min_active\fR (int)
1743.ad
1744.RS 12n
1745Minimum asynchronous write I/Os active to each device.
1746See the section "ZFS I/O SCHEDULER".
1747.sp
06226b59
D
1748Lower values are associated with better latency on rotational media but poorer
1749resilver performance. The default value of 2 was chosen as a compromise. A
1750value of 3 has been shown to improve resilver performance further at a cost of
1751further increasing latency.
1752.sp
1753Default value: \fB2\fR.
e8b96c60
MA
1754.RE
1755
619f0976
GW
1756.sp
1757.ne 2
1758.na
1759\fBzfs_vdev_initializing_max_active\fR (int)
1760.ad
1761.RS 12n
1762Maximum initializing I/Os active to each device.
1763See the section "ZFS I/O SCHEDULER".
1764.sp
1765Default value: \fB1\fR.
1766.RE
1767
1768.sp
1769.ne 2
1770.na
1771\fBzfs_vdev_initializing_min_active\fR (int)
1772.ad
1773.RS 12n
1774Minimum initializing I/Os active to each device.
1775See the section "ZFS I/O SCHEDULER".
1776.sp
1777Default value: \fB1\fR.
1778.RE
1779
e8b96c60
MA
1780.sp
1781.ne 2
1782.na
1783\fBzfs_vdev_max_active\fR (int)
1784.ad
1785.RS 12n
1786The maximum number of I/Os active to each device. Ideally, this will be >=
1787the sum of each queue's max_active. It must be at least the sum of each
1788queue's min_active. See the section "ZFS I/O SCHEDULER".
1789.sp
1790Default value: \fB1,000\fR.
1791.RE
1792
619f0976
GW
1793.sp
1794.ne 2
1795.na
1796\fBzfs_vdev_removal_max_active\fR (int)
1797.ad
1798.RS 12n
1799Maximum removal I/Os active to each device.
1800See the section "ZFS I/O SCHEDULER".
1801.sp
1802Default value: \fB2\fR.
1803.RE
1804
1805.sp
1806.ne 2
1807.na
1808\fBzfs_vdev_removal_min_active\fR (int)
1809.ad
1810.RS 12n
1811Minimum removal I/Os active to each device.
1812See the section "ZFS I/O SCHEDULER".
1813.sp
1814Default value: \fB1\fR.
1815.RE
1816
e8b96c60
MA
1817.sp
1818.ne 2
1819.na
1820\fBzfs_vdev_scrub_max_active\fR (int)
1821.ad
1822.RS 12n
83426735 1823Maximum scrub I/Os active to each device.
e8b96c60
MA
1824See the section "ZFS I/O SCHEDULER".
1825.sp
1826Default value: \fB2\fR.
1827.RE
1828
1829.sp
1830.ne 2
1831.na
1832\fBzfs_vdev_scrub_min_active\fR (int)
1833.ad
1834.RS 12n
1835Minimum scrub I/Os active to each device.
1836See the section "ZFS I/O SCHEDULER".
1837.sp
1838Default value: \fB1\fR.
1839.RE
1840
1841.sp
1842.ne 2
1843.na
1844\fBzfs_vdev_sync_read_max_active\fR (int)
1845.ad
1846.RS 12n
83426735 1847Maximum synchronous read I/Os active to each device.
e8b96c60
MA
1848See the section "ZFS I/O SCHEDULER".
1849.sp
1850Default value: \fB10\fR.
1851.RE
1852
1853.sp
1854.ne 2
1855.na
1856\fBzfs_vdev_sync_read_min_active\fR (int)
1857.ad
1858.RS 12n
1859Minimum synchronous read I/Os active to each device.
1860See the section "ZFS I/O SCHEDULER".
1861.sp
1862Default value: \fB10\fR.
1863.RE
1864
1865.sp
1866.ne 2
1867.na
1868\fBzfs_vdev_sync_write_max_active\fR (int)
1869.ad
1870.RS 12n
83426735 1871Maximum synchronous write I/Os active to each device.
e8b96c60
MA
1872See the section "ZFS I/O SCHEDULER".
1873.sp
1874Default value: \fB10\fR.
1875.RE
1876
1877.sp
1878.ne 2
1879.na
1880\fBzfs_vdev_sync_write_min_active\fR (int)
1881.ad
1882.RS 12n
1883Minimum synchronous write I/Os active to each device.
1884See the section "ZFS I/O SCHEDULER".
1885.sp
1886Default value: \fB10\fR.
1887.RE
1888
1b939560
BB
1889.sp
1890.ne 2
1891.na
1892\fBzfs_vdev_trim_max_active\fR (int)
1893.ad
1894.RS 12n
1895Maximum trim/discard I/Os active to each device.
1896See the section "ZFS I/O SCHEDULER".
1897.sp
1898Default value: \fB2\fR.
1899.RE
1900
1901.sp
1902.ne 2
1903.na
1904\fBzfs_vdev_trim_min_active\fR (int)
1905.ad
1906.RS 12n
1907Minimum trim/discard I/Os active to each device.
1908See the section "ZFS I/O SCHEDULER".
1909.sp
1910Default value: \fB1\fR.
1911.RE
1912
3dfb57a3
DB
1913.sp
1914.ne 2
1915.na
1916\fBzfs_vdev_queue_depth_pct\fR (int)
1917.ad
1918.RS 12n
e815485f
TC
1919Maximum number of queued allocations per top-level vdev expressed as
1920a percentage of \fBzfs_vdev_async_write_max_active\fR which allows the
1921system to detect devices that are more capable of handling allocations
1922and to allocate more blocks to those devices. It allows for dynamic
1923allocation distribution when devices are imbalanced as fuller devices
1924will tend to be slower than empty devices.
1925
1926See also \fBzio_dva_throttle_enabled\fR.
3dfb57a3 1927.sp
be54a13c 1928Default value: \fB1000\fR%.
3dfb57a3
DB
1929.RE
1930
29714574
TF
1931.sp
1932.ne 2
1933.na
1934\fBzfs_expire_snapshot\fR (int)
1935.ad
1936.RS 12n
1937Seconds to expire .zfs/snapshot
1938.sp
1939Default value: \fB300\fR.
1940.RE
1941
0500e835
BB
1942.sp
1943.ne 2
1944.na
1945\fBzfs_admin_snapshot\fR (int)
1946.ad
1947.RS 12n
1948Allow the creation, removal, or renaming of entries in the .zfs/snapshot
1949directory to cause the creation, destruction, or renaming of snapshots.
1950When enabled this functionality works both locally and over NFS exports
1951which have the 'no_root_squash' option set. This functionality is disabled
1952by default.
1953.sp
1954Use \fB1\fR for yes and \fB0\fR for no (default).
1955.RE
1956
29714574
TF
1957.sp
1958.ne 2
1959.na
1960\fBzfs_flags\fR (int)
1961.ad
1962.RS 12n
33b6dbbc
NB
1963Set additional debugging flags. The following flags may be bitwise-or'd
1964together.
1965.sp
1966.TS
1967box;
1968rB lB
1969lB lB
1970r l.
1971Value Symbolic Name
1972 Description
1973_
19741 ZFS_DEBUG_DPRINTF
1975 Enable dprintf entries in the debug log.
1976_
19772 ZFS_DEBUG_DBUF_VERIFY *
1978 Enable extra dbuf verifications.
1979_
19804 ZFS_DEBUG_DNODE_VERIFY *
1981 Enable extra dnode verifications.
1982_
19838 ZFS_DEBUG_SNAPNAMES
1984 Enable snapshot name verification.
1985_
198616 ZFS_DEBUG_MODIFY
1987 Check for illegally modified ARC buffers.
1988_
33b6dbbc
NB
198964 ZFS_DEBUG_ZIO_FREE
1990 Enable verification of block frees.
1991_
1992128 ZFS_DEBUG_HISTOGRAM_VERIFY
1993 Enable extra spacemap histogram verifications.
8740cf4a
NB
1994_
1995256 ZFS_DEBUG_METASLAB_VERIFY
1996 Verify space accounting on disk matches in-core range_trees.
1997_
1998512 ZFS_DEBUG_SET_ERROR
1999 Enable SET_ERROR and dprintf entries in the debug log.
1b939560
BB
2000_
20011024 ZFS_DEBUG_INDIRECT_REMAP
2002 Verify split blocks created by device removal.
2003_
20042048 ZFS_DEBUG_TRIM
2005 Verify TRIM ranges are always within the allocatable range tree.
93e28d66
SD
2006_
20074096 ZFS_DEBUG_LOG_SPACEMAP
2008 Verify that the log summary is consistent with the spacemap log
2009 and enable zfs_dbgmsgs for metaslab loading and flushing.
33b6dbbc
NB
2010.TE
2011.sp
2012* Requires debug build.
29714574 2013.sp
33b6dbbc 2014Default value: \fB0\fR.
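.sp
Since the flags are bitwise-or'd together, combinations are expressed as the
sum of the individual values. For example (illustrative, Linux runtime path):
.sp
.nf
# ZFS_DEBUG_DPRINTF (1) + ZFS_DEBUG_SET_ERROR (512) = 513
echo 513 > /sys/module/zfs/parameters/zfs_flags
.fi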
2015.RE
2016
fbeddd60
MA
2017.sp
2018.ne 2
2019.na
2020\fBzfs_free_leak_on_eio\fR (int)
2021.ad
2022.RS 12n
2023If destroy encounters an EIO while reading metadata (e.g. indirect
2024blocks), space referenced by the missing metadata can not be freed.
2025Normally this causes the background destroy to become "stalled", as
2026it is unable to make forward progress. While in this stalled state,
2027all remaining space to free from the error-encountering filesystem is
2028"temporarily leaked". Set this flag to cause it to ignore the EIO,
2029permanently leak the space from indirect blocks that can not be read,
2030and continue to free everything else that it can.
2031
2032The default, "stalling" behavior is useful if the storage partially
2033fails (i.e. some but not all i/os fail), and then later recovers. In
2034this case, we will be able to continue pool operations while it is
2035partially failed, and when it recovers, we can continue to free the
2036space, with no leaks. However, note that this case is actually
2037fairly rare.
2038
2039Typically pools either (a) fail completely (but perhaps temporarily,
2040e.g. a top-level vdev going offline), or (b) have localized,
2041permanent errors (e.g. disk returns the wrong data due to bit flip or
2042firmware bug). In case (a), this setting does not matter because the
2043pool will be suspended and the sync thread will not be able to make
2044forward progress regardless. In case (b), because the error is
2045permanent, the best we can do is leak the minimum amount of space,
2046which is what setting this flag will do. Therefore, it is reasonable
2047for this flag to normally be set, but we chose the more conservative
2048approach of not setting it, so that there is no possibility of
2049leaking space in the "partial temporary" failure case.
2050.sp
2051Default value: \fB0\fR.
2052.RE
2053
29714574
TF
2054.sp
2055.ne 2
2056.na
2057\fBzfs_free_min_time_ms\fR (int)
2058.ad
2059.RS 12n
6146e17e 2060During a \fBzfs destroy\fR operation using \fBfeature@async_destroy\fR a minimum
83426735 2061of this much time will be spent working on freeing blocks per txg.
29714574
TF
2062.sp
2063Default value: \fB1,000\fR.
2064.RE
2065
67709516
D
2066.sp
2067.ne 2
2068.na
2069\fBzfs_obsolete_min_time_ms\fR (int)
2070.ad
2071.RS 12n
Similar to \fBzfs_free_min_time_ms\fR but for cleanup of old indirection records
2073for removed vdevs.
2074.sp
2075Default value: \fB500\fR.
2076.RE
2077
29714574
TF
2078.sp
2079.ne 2
2080.na
2081\fBzfs_immediate_write_sz\fR (long)
2082.ad
2083.RS 12n
Largest data block to write to the ZIL. Larger blocks will be treated as if the
6146e17e 2085dataset being written to had the property setting \fBlogbias=throughput\fR.
29714574
TF
2086.sp
2087Default value: \fB32,768\fR.
2088.RE
2089
619f0976
GW
2090.sp
2091.ne 2
2092.na
2093\fBzfs_initialize_value\fR (ulong)
2094.ad
2095.RS 12n
2096Pattern written to vdev free space by \fBzpool initialize\fR.
2097.sp
2098Default value: \fB16,045,690,984,833,335,022\fR (0xdeadbeefdeadbeee).
2099.RE
2100
e60e158e
JG
2101.sp
2102.ne 2
2103.na
2104\fBzfs_initialize_chunk_size\fR (ulong)
2105.ad
2106.RS 12n
2107Size of writes used by \fBzpool initialize\fR.
2108This option is used by the test suite to facilitate testing.
2109.sp
2110Default value: \fB1,048,576\fR
2111.RE
2112
37f03da8
SH
2113.sp
2114.ne 2
2115.na
2116\fBzfs_livelist_max_entries\fR (ulong)
2117.ad
2118.RS 12n
2119The threshold size (in block pointers) at which we create a new sub-livelist.
2120Larger sublists are more costly from a memory perspective but the fewer
2121sublists there are, the lower the cost of insertion.
2122.sp
2123Default value: \fB500,000\fR.
2124.RE
2125
2126.sp
2127.ne 2
2128.na
2129\fBzfs_livelist_min_percent_shared\fR (int)
2130.ad
2131.RS 12n
2132If the amount of shared space between a snapshot and its clone drops below
2133this threshold, the clone turns off the livelist and reverts to the old deletion
method. This is in place because once a clone has been overwritten enough,
livelists no longer provide a benefit.
2136.sp
2137Default value: \fB75\fR.
2138.RE
2139
2140.sp
2141.ne 2
2142.na
2143\fBzfs_livelist_condense_new_alloc\fR (int)
2144.ad
2145.RS 12n
2146Incremented each time an extra ALLOC blkptr is added to a livelist entry while
2147it is being condensed.
2148This option is used by the test suite to track race conditions.
2149.sp
2150Default value: \fB0\fR.
2151.RE
2152
2153.sp
2154.ne 2
2155.na
2156\fBzfs_livelist_condense_sync_cancel\fR (int)
2157.ad
2158.RS 12n
2159Incremented each time livelist condensing is canceled while in
2160spa_livelist_condense_sync.
2161This option is used by the test suite to track race conditions.
2162.sp
2163Default value: \fB0\fR.
2164.RE
2165
2166.sp
2167.ne 2
2168.na
2169\fBzfs_livelist_condense_sync_pause\fR (int)
2170.ad
2171.RS 12n
2172When set, the livelist condense process pauses indefinitely before
2173executing the synctask - spa_livelist_condense_sync.
2174This option is used by the test suite to trigger race conditions.
2175.sp
2176Default value: \fB0\fR.
2177.RE
2178
2179.sp
2180.ne 2
2181.na
2182\fBzfs_livelist_condense_zthr_cancel\fR (int)
2183.ad
2184.RS 12n
2185Incremented each time livelist condensing is canceled while in
2186spa_livelist_condense_cb.
2187This option is used by the test suite to track race conditions.
2188.sp
2189Default value: \fB0\fR.
2190.RE
2191
2192.sp
2193.ne 2
2194.na
2195\fBzfs_livelist_condense_zthr_pause\fR (int)
2196.ad
2197.RS 12n
2198When set, the livelist condense process pauses indefinitely before
2199executing the open context condensing work in spa_livelist_condense_cb.
2200This option is used by the test suite to trigger race conditions.
2201.sp
2202Default value: \fB0\fR.
2203.RE
2204
917f475f
JG
2205.sp
2206.ne 2
2207.na
2208\fBzfs_lua_max_instrlimit\fR (ulong)
2209.ad
2210.RS 12n
2211The maximum execution time limit that can be set for a ZFS channel program,
2212specified as a number of Lua instructions.
2213.sp
2214Default value: \fB100,000,000\fR.
2215.RE
2216
2217.sp
2218.ne 2
2219.na
2220\fBzfs_lua_max_memlimit\fR (ulong)
2221.ad
2222.RS 12n
2223The maximum memory limit that can be set for a ZFS channel program, specified
2224in bytes.
2225.sp
2226Default value: \fB104,857,600\fR.
2227.RE
2228
a7ed98d8
SD
2229.sp
2230.ne 2
2231.na
2232\fBzfs_max_dataset_nesting\fR (int)
2233.ad
2234.RS 12n
2235The maximum depth of nested datasets. This value can be tuned temporarily to
2236fix existing datasets that exceed the predefined limit.
2237.sp
2238Default value: \fB50\fR.
2239.RE
2240
93e28d66
SD
2241.sp
2242.ne 2
2243.na
2244\fBzfs_max_log_walking\fR (ulong)
2245.ad
2246.RS 12n
2247The number of past TXGs that the flushing algorithm of the log spacemap
2248feature uses to estimate incoming log blocks.
2249.sp
2250Default value: \fB5\fR.
2251.RE
2252
2253.sp
2254.ne 2
2255.na
2256\fBzfs_max_logsm_summary_length\fR (ulong)
2257.ad
2258.RS 12n
2259Maximum number of rows allowed in the summary of the spacemap log.
2260.sp
2261Default value: \fB10\fR.
2262.RE
2263
f1512ee6
MA
2264.sp
2265.ne 2
2266.na
2267\fBzfs_max_recordsize\fR (int)
2268.ad
2269.RS 12n
2270We currently support block sizes from 512 bytes to 16MB. The benefits of
ad796b8a 2271larger blocks, and thus larger I/O, need to be weighed against the cost of
f1512ee6
MA
2272COWing a giant block to modify one byte. Additionally, very large blocks
2273can have an impact on i/o latency, and also potentially on the memory
2274allocator. Therefore, we do not allow the recordsize to be set larger than
2275zfs_max_recordsize (default 1MB). Larger blocks can be created by changing
2276this tunable, and pools with larger blocks can always be imported and used,
2277regardless of this setting.
2278.sp
2279Default value: \fB1,048,576\fR.
2280.RE
2281
30af21b0
PD
2282.sp
2283.ne 2
2284.na
2285\fBzfs_allow_redacted_dataset_mount\fR (int)
2286.ad
2287.RS 12n
2288Allow datasets received with redacted send/receive to be mounted. Normally
2289disabled because these datasets may be missing key data.
2290.sp
2291Default value: \fB0\fR.
2292.RE
2293
93e28d66
SD
2294.sp
2295.ne 2
2296.na
2297\fBzfs_min_metaslabs_to_flush\fR (ulong)
2298.ad
2299.RS 12n
2300Minimum number of metaslabs to flush per dirty TXG
2301.sp
2302Default value: \fB1\fR.
2303.RE
2304
f3a7f661
GW
2305.sp
2306.ne 2
2307.na
2308\fBzfs_metaslab_fragmentation_threshold\fR (int)
2309.ad
2310.RS 12n
2311Allow metaslabs to keep their active state as long as their fragmentation
2312percentage is less than or equal to this value. An active metaslab that
2313exceeds this threshold will no longer keep its active status allowing
2314better metaslabs to be selected.
2315.sp
2316Default value: \fB70\fR.
2317.RE
2318
2319.sp
2320.ne 2
2321.na
2322\fBzfs_mg_fragmentation_threshold\fR (int)
2323.ad
2324.RS 12n
2325Metaslab groups are considered eligible for allocations if their
83426735 2326fragmentation metric (measured as a percentage) is less than or equal to
f3a7f661
GW
2327this value. If a metaslab group exceeds this threshold then it will be
2328skipped unless all metaslab groups within the metaslab class have also
2329crossed this threshold.
2330.sp
cb020f0d 2331Default value: \fB95\fR.
f3a7f661
GW
2332.RE
2333
f4a4046b
TC
2334.sp
2335.ne 2
2336.na
2337\fBzfs_mg_noalloc_threshold\fR (int)
2338.ad
2339.RS 12n
2340Defines a threshold at which metaslab groups should be eligible for
2341allocations. The value is expressed as a percentage of free space
2342beyond which a metaslab group is always eligible for allocations.
2343If a metaslab group's free space is less than or equal to the
6b4e21c6 2344threshold, the allocator will avoid allocating to that group
f4a4046b
TC
2345unless all groups in the pool have reached the threshold. Once all
2346groups have reached the threshold, all groups are allowed to accept
2347allocations. The default value of 0 disables the feature and causes
2348all metaslab groups to be eligible for allocations.
2349
b58237e7 2350This parameter allows one to deal with pools having heavily imbalanced
f4a4046b
TC
2351vdevs such as would be the case when a new vdev has been added.
2352Setting the threshold to a non-zero percentage will stop allocations
2353from being made to vdevs that aren't filled to the specified percentage
2354and allow lesser filled vdevs to acquire more allocations than they
2355otherwise would under the old \fBzfs_mg_alloc_failures\fR facility.
2356.sp
2357Default value: \fB0\fR.
2358.RE
2359
cc99f275
DB
2360.sp
2361.ne 2
2362.na
2363\fBzfs_ddt_data_is_special\fR (int)
2364.ad
2365.RS 12n
2366If enabled, ZFS will place DDT data into the special allocation class.
2367.sp
2368Default value: \fB1\fR.
2369.RE
2370
2371.sp
2372.ne 2
2373.na
2374\fBzfs_user_indirect_is_special\fR (int)
2375.ad
2376.RS 12n
2377If enabled, ZFS will place user data (both file and zvol) indirect blocks
2378into the special allocation class.
2379.sp
2380Default value: \fB1\fR.
2381.RE
2382
379ca9cf
OF
2383.sp
2384.ne 2
2385.na
2386\fBzfs_multihost_history\fR (int)
2387.ad
2388.RS 12n
2389Historical statistics for the last N multihost updates will be available in
2390\fB/proc/spl/kstat/zfs/<pool>/multihost\fR
2391.sp
2392Default value: \fB0\fR.
2393.RE
2394
2395.sp
2396.ne 2
2397.na
2398\fBzfs_multihost_interval\fR (ulong)
2399.ad
2400.RS 12n
2401Used to control the frequency of multihost writes which are performed when the
060f0226
OF
2402\fBmultihost\fR pool property is on. This is one factor used to determine the
2403length of the activity check during import.
379ca9cf 2404.sp
060f0226
OF
2405The multihost write period is \fBzfs_multihost_interval / leaf-vdevs\fR
2406milliseconds. On average a multihost write will be issued for each leaf vdev
2407every \fBzfs_multihost_interval\fR milliseconds. In practice, the observed
2408period can vary with the I/O load and this observed value is the delay which is
2409stored in the uberblock.
379ca9cf
OF
2410.sp
2411Default value: \fB1000\fR.
2412.RE
2413
2414.sp
2415.ne 2
2416.na
2417\fBzfs_multihost_import_intervals\fR (uint)
2418.ad
2419.RS 12n
2420Used to control the duration of the activity test on import. Smaller values of
2421\fBzfs_multihost_import_intervals\fR will reduce the import time but increase
2422the risk of failing to detect an active pool. The total activity check time is
060f0226
OF
2423never allowed to drop below one second.
2424.sp
2425On import the activity check waits a minimum amount of time determined by
2426\fBzfs_multihost_interval * zfs_multihost_import_intervals\fR, or the same
2427product computed on the host which last had the pool imported (whichever is
2428greater). The activity check time may be further extended if the value of mmp
2429delay found in the best uberblock indicates actual multihost updates happened
2430at longer intervals than \fBzfs_multihost_interval\fR. A minimum value of
2431\fB100ms\fR is enforced.
2432.sp
2433A value of 0 is ignored and treated as if it was set to 1.
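.sp
As a worked example with the default settings, the minimum activity check on
import lasts \fBzfs_multihost_interval * zfs_multihost_import_intervals\fR =
1000 ms * 20 = 20 seconds, or longer if the host which last imported the pool
used larger values.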
379ca9cf 2434.sp
db2af93d 2435Default value: \fB20\fR.
379ca9cf
OF
2436.RE
2437
2438.sp
2439.ne 2
2440.na
2441\fBzfs_multihost_fail_intervals\fR (uint)
2442.ad
2443.RS 12n
060f0226
OF
2444Controls the behavior of the pool when multihost write failures or delays are
2445detected.
379ca9cf 2446.sp
060f0226
OF
2447When \fBzfs_multihost_fail_intervals = 0\fR, multihost write failures or delays
are ignored. The failures will still be reported to the ZED which, depending on
its configuration, may take action such as suspending the pool or offlining a
2450device.
2451
379ca9cf 2452.sp
060f0226
OF
2453When \fBzfs_multihost_fail_intervals > 0\fR, the pool will be suspended if
2454\fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR milliseconds pass
2455without a successful mmp write. This guarantees the activity test will see
2456mmp writes if the pool is imported. A value of 1 is ignored and treated as
2457if it was set to 2. This is necessary to prevent the pool from being suspended
2458due to normal, small I/O latency variations.
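.sp
As a worked example with the default settings, the pool will be suspended if
\fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR = 10 * 1000 ms =
10 seconds pass without a successful mmp write.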
2459
379ca9cf 2460.sp
db2af93d 2461Default value: \fB10\fR.
379ca9cf
OF
2462.RE
2463
29714574
TF
2464.sp
2465.ne 2
2466.na
2467\fBzfs_no_scrub_io\fR (int)
2468.ad
2469.RS 12n
83426735
D
2470Set for no scrub I/O. This results in scrubs not actually scrubbing data and
2471simply doing a metadata crawl of the pool instead.
29714574
TF
2472.sp
2473Use \fB1\fR for yes and \fB0\fR for no (default).
2474.RE
2475
2476.sp
2477.ne 2
2478.na
2479\fBzfs_no_scrub_prefetch\fR (int)
2480.ad
2481.RS 12n
83426735 2482Set to disable block prefetching for scrubs.
29714574
TF
2483.sp
2484Use \fB1\fR for yes and \fB0\fR for no (default).
2485.RE
2486
29714574
TF
2487.sp
2488.ne 2
2489.na
2490\fBzfs_nocacheflush\fR (int)
2491.ad
2492.RS 12n
53b1f5ea
PS
2493Disable cache flush operations on disks when writing. Setting this will
2494cause pool corruption on power loss if a volatile out-of-order write cache
2495is enabled.
29714574
TF
2496.sp
2497Use \fB1\fR for yes and \fB0\fR for no (default).
2498.RE
2499
2500.sp
2501.ne 2
2502.na
2503\fBzfs_nopwrite_enabled\fR (int)
2504.ad
2505.RS 12n
2506Enable NOP writes
2507.sp
2508Use \fB1\fR for yes (default) and \fB0\fR to disable.
2509.RE
2510
66aca247
DB
2511.sp
2512.ne 2
2513.na
2514\fBzfs_dmu_offset_next_sync\fR (int)
2515.ad
2516.RS 12n
Enable forcing txg sync to find holes. When enabled, this forces ZFS to act
like prior versions when SEEK_HOLE or SEEK_DATA flags are used: when a dnode is
dirty, txgs are synced so that this data can be found.
2521.sp
2522Use \fB1\fR for yes and \fB0\fR to disable (default).
2523.RE
2524
29714574
TF
2525.sp
2526.ne 2
2527.na
b738bc5a 2528\fBzfs_pd_bytes_max\fR (int)
29714574
TF
2529.ad
2530.RS 12n
83426735 2531The number of bytes which should be prefetched during a pool traversal
6146e17e 2532(eg: \fBzfs send\fR or other data crawling operations)
29714574 2533.sp
74aa2ba2 2534Default value: \fB52,428,800\fR.
29714574
TF
2535.RE
2536
bef78122
DQ
2537.sp
2538.ne 2
2539.na
2540\fBzfs_per_txg_dirty_frees_percent \fR (ulong)
2541.ad
2542.RS 12n
65282ee9
AP
2543Tunable to control percentage of dirtied indirect blocks from frees allowed
2544into one TXG. After this threshold is crossed, additional frees will wait until
2545the next TXG.
bef78122
DQ
2546A value of zero will disable this throttle.
2547.sp
65282ee9 2548Default value: \fB5\fR, set to \fB0\fR to disable.
bef78122
DQ
2549.RE
2550
29714574
TF
2551.sp
2552.ne 2
2553.na
2554\fBzfs_prefetch_disable\fR (int)
2555.ad
2556.RS 12n
7f60329a
MA
2557This tunable disables predictive prefetch. Note that it leaves "prescient"
2558prefetch (e.g. prefetch for zfs send) intact. Unlike predictive prefetch,
2559prescient prefetch never issues i/os that end up not being needed, so it
2560can't hurt performance.
29714574
TF
2561.sp
2562Use \fB1\fR for yes and \fB0\fR for no (default).
2563.RE
2564
5090f727
CZ
2565.sp
2566.ne 2
2567.na
2568\fBzfs_qat_checksum_disable\fR (int)
2569.ad
2570.RS 12n
2571This tunable disables qat hardware acceleration for sha256 checksums. It
2572may be set after the zfs modules have been loaded to initialize the qat
2573hardware as long as support is compiled in and the qat driver is present.
2574.sp
2575Use \fB1\fR for yes and \fB0\fR for no (default).
2576.RE
2577
2578.sp
2579.ne 2
2580.na
2581\fBzfs_qat_compress_disable\fR (int)
2582.ad
2583.RS 12n
2584This tunable disables qat hardware acceleration for gzip compression. It
2585may be set after the zfs modules have been loaded to initialize the qat
2586hardware as long as support is compiled in and the qat driver is present.
2587.sp
2588Use \fB1\fR for yes and \fB0\fR for no (default).
2589.RE
2590
2591.sp
2592.ne 2
2593.na
2594\fBzfs_qat_encrypt_disable\fR (int)
2595.ad
2596.RS 12n
2597This tunable disables qat hardware acceleration for AES-GCM encryption. It
2598may be set after the zfs modules have been loaded to initialize the qat
2599hardware as long as support is compiled in and the qat driver is present.
2600.sp
2601Use \fB1\fR for yes and \fB0\fR for no (default).
2602.RE
2603
29714574
TF
2604.sp
2605.ne 2
2606.na
2607\fBzfs_read_chunk_size\fR (long)
2608.ad
2609.RS 12n
2610Bytes to read per chunk
2611.sp
2612Default value: \fB1,048,576\fR.
2613.RE
2614
2615.sp
2616.ne 2
2617.na
2618\fBzfs_read_history\fR (int)
2619.ad
2620.RS 12n
379ca9cf
OF
2621Historical statistics for the last N reads will be available in
2622\fB/proc/spl/kstat/zfs/<pool>/reads\fR
29714574 2623.sp
83426735 2624Default value: \fB0\fR (no data is kept).
29714574
TF
2625.RE
2626
2627.sp
2628.ne 2
2629.na
2630\fBzfs_read_history_hits\fR (int)
2631.ad
2632.RS 12n
2633Include cache hits in read history
2634.sp
2635Use \fB1\fR for yes and \fB0\fR for no (default).
2636.RE
2637
9e052db4
MA
2638.sp
2639.ne 2
2640.na
4589f3ae
BB
2641\fBzfs_reconstruct_indirect_combinations_max\fR (int)
2642.ad
.RS 12n
2644If an indirect split block contains more than this many possible unique
2645combinations when being reconstructed, consider it too computationally
2646expensive to check them all. Instead, try at most
2647\fBzfs_reconstruct_indirect_combinations_max\fR randomly-selected
2648combinations each time the block is accessed. This allows all segment
2649copies to participate fairly in the reconstruction when all combinations
2650cannot be checked and prevents repeated use of one bad copy.
2651.sp
64bdf63f 2652Default value: \fB4096\fR.
9e052db4
MA
2653.RE
2654
29714574
TF
2655.sp
2656.ne 2
2657.na
2658\fBzfs_recover\fR (int)
2659.ad
2660.RS 12n
2661Set to attempt to recover from fatal errors. This should only be used as a
2662last resort, as it typically results in leaked space, or worse.
2663.sp
2664Use \fB1\fR for yes and \fB0\fR for no (default).
2665.RE
2666
7c9a4292
BB
2667.sp
2668.ne 2
2669.na
2670\fBzfs_removal_ignore_errors\fR (int)
2671.ad
2672.RS 12n
2673.sp
2674Ignore hard IO errors during device removal. When set, if a device encounters
a hard IO error during the removal process, the removal will not be cancelled.
2676This can result in a normally recoverable block becoming permanently damaged
2677and is not recommended. This should only be used as a last resort when the
2678pool cannot be returned to a healthy state prior to removing the device.
2679.sp
2680Default value: \fB0\fR.
2681.RE
2682
53dce5ac
MA
2683.sp
2684.ne 2
2685.na
2686\fBzfs_removal_suspend_progress\fR (int)
2687.ad
2688.RS 12n
2689.sp
2690This is used by the test suite so that it can ensure that certain actions
2691happen while in the middle of a removal.
2692.sp
2693Default value: \fB0\fR.
2694.RE
2695
2696.sp
2697.ne 2
2698.na
2699\fBzfs_remove_max_segment\fR (int)
2700.ad
2701.RS 12n
2702.sp
2703The largest contiguous segment that we will attempt to allocate when removing
2704a device. This can be no larger than 16MB. If there is a performance
2705problem with attempting to allocate large blocks, consider decreasing this.
2706.sp
2707Default value: \fB16,777,216\fR (16MB).
2708.RE
2709
67709516
D
2710.sp
2711.ne 2
2712.na
2713\fBzfs_resilver_disable_defer\fR (int)
2714.ad
2715.RS 12n
2716Disables the \fBresilver_defer\fR feature, causing an operation that would
2717start a resilver to restart one in progress immediately.
2718.sp
2719Default value: \fB0\fR (feature enabled).
2720.RE
2721
29714574
TF
2722.sp
2723.ne 2
2724.na
d4a72f23 2725\fBzfs_resilver_min_time_ms\fR (int)
29714574
TF
2726.ad
2727.RS 12n
d4a72f23
TC
2728Resilvers are processed by the sync thread. While resilvering it will spend
2729at least this much time working on a resilver between txg flushes.
29714574 2730.sp
d4a72f23 2731Default value: \fB3,000\fR.
29714574
TF
2732.RE
2733
02638a30
TC
2734.sp
2735.ne 2
2736.na
2737\fBzfs_scan_ignore_errors\fR (int)
2738.ad
2739.RS 12n
2740If set to a nonzero value, remove the DTL (dirty time list) upon
2741completion of a pool scan (scrub) even if there were unrepairable
2742errors. It is intended to be used during pool repair or recovery to
2743stop resilvering when the pool is next imported.
2744.sp
2745Default value: \fB0\fR.
2746.RE
2747
29714574
TF
2748.sp
2749.ne 2
2750.na
d4a72f23 2751\fBzfs_scrub_min_time_ms\fR (int)
29714574
TF
2752.ad
2753.RS 12n
d4a72f23
TC
2754Scrubs are processed by the sync thread. While scrubbing it will spend
2755at least this much time working on a scrub between txg flushes.
29714574 2756.sp
d4a72f23 2757Default value: \fB1,000\fR.
29714574
TF
2758.RE
2759
2760.sp
2761.ne 2
2762.na
d4a72f23 2763\fBzfs_scan_checkpoint_intval\fR (int)
29714574
TF
2764.ad
2765.RS 12n
d4a72f23
TC
2766To preserve progress across reboots the sequential scan algorithm periodically
needs to stop metadata scanning and issue all the verification I/Os to disk.
2768The frequency of this flushing is determined by the
a8577bdb 2769\fBzfs_scan_checkpoint_intval\fR tunable.
29714574 2770.sp
d4a72f23 2771Default value: \fB7200\fR seconds (every 2 hours).
29714574
TF
2772.RE
2773
2774.sp
2775.ne 2
2776.na
d4a72f23 2777\fBzfs_scan_fill_weight\fR (int)
29714574
TF
2778.ad
2779.RS 12n
d4a72f23
TC
2780This tunable affects how scrub and resilver I/O segments are ordered. A higher
2781number indicates that we care more about how filled in a segment is, while a
2782lower number indicates we care more about the size of the extent without
2783considering the gaps within a segment. This value is only tunable upon module
insertion. Changing the value afterwards will have no effect on scrub or
2785resilver performance.
29714574 2786.sp
d4a72f23 2787Default value: \fB3\fR.
29714574
TF
2788.RE
2789
2790.sp
2791.ne 2
2792.na
d4a72f23 2793\fBzfs_scan_issue_strategy\fR (int)
29714574
TF
2794.ad
2795.RS 12n
d4a72f23
TC
2796Determines the order that data will be verified while scrubbing or resilvering.
2797If set to \fB1\fR, data will be verified as sequentially as possible, given the
2798amount of memory reserved for scrubbing (see \fBzfs_scan_mem_lim_fact\fR). This
2799may improve scrub performance if the pool's data is very fragmented. If set to
2800\fB2\fR, the largest mostly-contiguous chunk of found data will be verified
2801first. By deferring scrubbing of small segments, we may later find adjacent data
2802to coalesce and increase the segment size. If set to \fB0\fR, zfs will use
2803strategy \fB1\fR during normal verification and strategy \fB2\fR while taking a
2804checkpoint.
29714574 2805.sp
d4a72f23
TC
2806Default value: \fB0\fR.
2807.RE
2808
2809.sp
2810.ne 2
2811.na
2812\fBzfs_scan_legacy\fR (int)
2813.ad
2814.RS 12n
2815A value of 0 indicates that scrubs and resilvers will gather metadata in
2816memory before issuing sequential I/O. A value of 1 indicates that the legacy
2817algorithm will be used where I/O is initiated as soon as it is discovered.
2818Changing this value to 0 will not affect scrubs or resilvers that are already
2819in progress.
2820.sp
2821Default value: \fB0\fR.
2822.RE
2823
2824.sp
2825.ne 2
2826.na
2827\fBzfs_scan_max_ext_gap\fR (int)
2828.ad
2829.RS 12n
2830Indicates the largest gap in bytes between scrub / resilver I/Os that will still
2831be considered sequential for sorting purposes. Changing this value will not
2832affect scrubs or resilvers that are already in progress.
2833.sp
2834Default value: \fB2097152 (2 MB)\fR.
2835.RE
2836
2837.sp
2838.ne 2
2839.na
2840\fBzfs_scan_mem_lim_fact\fR (int)
2841.ad
2842.RS 12n
2843Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
2844This tunable determines the hard limit for I/O sorting memory usage.
2845When the hard limit is reached we stop scanning metadata and start issuing
2846data verification I/O. This is done until we get below the soft limit.
2847.sp
2848Default value: \fB20\fR which is 5% of RAM (1/20).
2849.RE
2850
2851.sp
2852.ne 2
2853.na
2854\fBzfs_scan_mem_lim_soft_fact\fR (int)
2855.ad
2856.RS 12n
The fraction of the hard limit used to determine the soft limit for I/O sorting
ac3d4d0c 2858by the sequential scan algorithm. When we cross this limit from below no action
d4a72f23
TC
2859is taken. When we cross this limit from above it is because we are issuing
2860verification I/O. In this case (unless the metadata scan is done) we stop
2861issuing verification I/O and start scanning metadata again until we get to the
2862hard limit.
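.sp
As a worked example with the defaults, the hard limit is 1/20 (5%) of RAM and
the soft limit is 1/20 of the hard limit, i.e. 1/400 (0.25%) of RAM.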
2863.sp
2864Default value: \fB20\fR which is 5% of the hard limit (1/20).
2865.RE
2866
67709516
D
2867.sp
2868.ne 2
2869.na
2870\fBzfs_scan_strict_mem_lim\fR (int)
2871.ad
2872.RS 12n
2873Enforces tight memory limits on pool scans when a sequential scan is in
2874progress. When disabled the memory limit may be exceeded by fast disks.
2875.sp
2876Default value: \fB0\fR.
2877.RE
2878
2879.sp
2880.ne 2
2881.na
2882\fBzfs_scan_suspend_progress\fR (int)
2883.ad
2884.RS 12n
2885Freezes a scrub/resilver in progress without actually pausing it. Intended for
2886testing/debugging.
2887.sp
2888Default value: \fB0\fR.
2889.RE
2890
2891
d4a72f23
TC
2892.sp
2893.ne 2
2894.na
2895\fBzfs_scan_vdev_limit\fR (int)
2896.ad
2897.RS 12n
2898Maximum amount of data that can be concurrently issued at once for scrubs and
2899resilvers per leaf device, given in bytes.
2900.sp
2901Default value: \fB41943040\fR.
29714574
TF
2902.RE
2903
fd8febbd
TF
2904.sp
2905.ne 2
2906.na
2907\fBzfs_send_corrupt_data\fR (int)
2908.ad
2909.RS 12n
83426735 2910Allow sending of corrupt data (ignore read/checksum errors when sending data)
fd8febbd
TF
2911.sp
2912Use \fB1\fR for yes and \fB0\fR for no (default).
2913.RE
2914
caf9dd20
BB
2915.sp
2916.ne 2
2917.na
2918\fBzfs_send_unmodified_spill_blocks\fR (int)
2919.ad
2920.RS 12n
2921Include unmodified spill blocks in the send stream. Under certain circumstances
2922previous versions of ZFS could incorrectly remove the spill block from an
2923existing object. Including unmodified copies of the spill blocks creates a
2924backwards compatible stream which will recreate a spill block if it was
2925incorrectly removed.
2926.sp
2927Use \fB1\fR for yes (default) and \fB0\fR for no.
2928.RE
2929
30af21b0
PD
2930.sp
2931.ne 2
2932.na
2933\fBzfs_send_no_prefetch_queue_ff\fR (int)
2934.ad
2935.RS 12n
2936The fill fraction of the \fBzfs send\fR internal queues. The fill fraction
2937controls the timing with which internal threads are woken up.
2938.sp
2939Default value: \fB20\fR.
2940.RE
2941
2942.sp
2943.ne 2
2944.na
2945\fBzfs_send_no_prefetch_queue_length\fR (int)
2946.ad
2947.RS 12n
2948The maximum number of bytes allowed in \fBzfs send\fR's internal queues.
2949.sp
2950Default value: \fB1,048,576\fR.
2951.RE
2952
2953.sp
2954.ne 2
2955.na
2956\fBzfs_send_queue_ff\fR (int)
2957.ad
2958.RS 12n
2959The fill fraction of the \fBzfs send\fR prefetch queue. The fill fraction
2960controls the timing with which internal threads are woken up.
2961.sp
2962Default value: \fB20\fR.
2963.RE
2964
3b0d9928
BB
2965.sp
2966.ne 2
2967.na
2968\fBzfs_send_queue_length\fR (int)
2969.ad
2970.RS 12n
30af21b0
PD
2971The maximum number of bytes allowed that will be prefetched by \fBzfs send\fR.
2972This value must be at least twice the maximum block size in use.
3b0d9928
BB
2973.sp
2974Default value: \fB16,777,216\fR.
2975.RE
2976
30af21b0
PD
2977.sp
2978.ne 2
2979.na
2980\fBzfs_recv_queue_ff\fR (int)
2981.ad
2982.RS 12n
2983The fill fraction of the \fBzfs receive\fR queue. The fill fraction
2984controls the timing with which internal threads are woken up.
2985.sp
2986Default value: \fB20\fR.
2987.RE
2988
3b0d9928
BB
2989.sp
2990.ne 2
2991.na
2992\fBzfs_recv_queue_length\fR (int)
2993.ad
2994.RS 12n
3b0d9928
BB
2995The maximum number of bytes allowed in the \fBzfs receive\fR queue. This value
2996must be at least twice the maximum block size in use.
2997.sp
2998Default value: \fB16,777,216\fR.
2999.RE
3000
7261fc2e
MA
3001.sp
3002.ne 2
3003.na
3004\fBzfs_recv_write_batch_size\fR (int)
3005.ad
3006.RS 12n
3007The maximum amount of data (in bytes) that \fBzfs receive\fR will write in
3008one DMU transaction. This is the uncompressed size, even when receiving a
3009compressed send stream. This setting will not reduce the write size below
a single block. Capped at a maximum of 32MB.
3011.sp
3012Default value: \fB1MB\fR.
3013.RE
3014
30af21b0
PD
3015.sp
3016.ne 2
3017.na
3018\fBzfs_override_estimate_recordsize\fR (ulong)
3019.ad
3020.RS 12n
3021Setting this variable overrides the default logic for estimating block
3022sizes when doing a zfs send. The default heuristic is that the average
3023block size will be the current recordsize. Override this value if most data
3024in your dataset is not of that size and you require accurate zfs send size
3025estimates.
3026.sp
3027Default value: \fB0\fR.
3028.RE
3029
29714574
TF
3030.sp
3031.ne 2
3032.na
3033\fBzfs_sync_pass_deferred_free\fR (int)
3034.ad
3035.RS 12n
Flushing of data to disk is done in passes. Defer frees starting in this pass.
29714574
TF
3037.sp
3038Default value: \fB2\fR.
3039.RE
3040
d2734cce
SD
3041.sp
3042.ne 2
3043.na
3044\fBzfs_spa_discard_memory_limit\fR (int)
3045.ad
3046.RS 12n
3047Maximum memory used for prefetching a checkpoint's space map on each
3048vdev while discarding the checkpoint.
3049.sp
3050Default value: \fB16,777,216\fR.
3051.RE
3052
1f02ecc5
D
3053.sp
3054.ne 2
3055.na
3056\fBzfs_special_class_metadata_reserve_pct\fR (int)
3057.ad
3058.RS 12n
3059Only allow small data blocks to be allocated on the special and dedup vdev
3060types when the available free space percentage on these vdevs exceeds this
3061value. This ensures reserved space is available for pool meta data as the
3062special vdevs approach capacity.
3063.sp
3064Default value: \fB25\fR.
3065.RE
3066
29714574
TF
3067.sp
3068.ne 2
3069.na
3070\fBzfs_sync_pass_dont_compress\fR (int)
3071.ad
3072.RS 12n
be89734a
MA
3073Starting in this sync pass, we disable compression (including of metadata).
3074With the default setting, in practice, we don't have this many sync passes,
3075so this has no effect.
3076.sp
3077The original intent was that disabling compression would help the sync passes
3078to converge. However, in practice disabling compression increases the average
number of sync passes, because when we turn compression off, many blocks'
sizes will change and thus we have to re-allocate (not overwrite) them. It
3081also increases the number of 128KB allocations (e.g. for indirect blocks and
3082spacemaps) because these will not be compressed. The 128K allocations are
3083especially detrimental to performance on highly fragmented systems, which may
3084have very few free segments of this size, and may need to load new metaslabs
3085to satisfy 128K allocations.
29714574 3086.sp
be89734a 3087Default value: \fB8\fR.
29714574
TF
3088.RE
3089
3090.sp
3091.ne 2
3092.na
3093\fBzfs_sync_pass_rewrite\fR (int)
3094.ad
3095.RS 12n
83426735 3096Rewrite new block pointers starting in this pass
29714574
TF
3097.sp
3098Default value: \fB2\fR.
3099.RE
3100
a032ac4b
BB
3101.sp
3102.ne 2
3103.na
3104\fBzfs_sync_taskq_batch_pct\fR (int)
3105.ad
3106.RS 12n
3107This controls the number of threads used by the dp_sync_taskq. The default
3108value of 75% will create a maximum of one thread per cpu.
3109.sp
be54a13c 3110Default value: \fB75\fR%.
a032ac4b
BB
3111.RE
3112
1b939560
BB
3113.sp
3114.ne 2
3115.na
67709516 3116\fBzfs_trim_extent_bytes_max\fR (uint)
1b939560
BB
3117.ad
3118.RS 12n
Maximum size of TRIM commands. Ranges larger than this will be split into
3120chunks no larger than \fBzfs_trim_extent_bytes_max\fR bytes before being
3121issued to the device.
3122.sp
3123Default value: \fB134,217,728\fR.
3124.RE
3125
3126.sp
3127.ne 2
3128.na
67709516 3129\fBzfs_trim_extent_bytes_min\fR (uint)
1b939560
BB
3130.ad
3131.RS 12n
3132Minimum size of TRIM commands. TRIM ranges smaller than this will be skipped
unless they're part of a larger range which was broken into chunks. This is
3134done because it's common for these small TRIMs to negatively impact overall
3135performance. This value can be set to 0 to TRIM all unallocated space.
3136.sp
3137Default value: \fB32,768\fR.
3138.RE
3139
3140.sp
3141.ne 2
3142.na
67709516 3143\fBzfs_trim_metaslab_skip\fR (uint)
1b939560
BB
3144.ad
3145.RS 12n
3146Skip uninitialized metaslabs during the TRIM process. This option is useful
3147for pools constructed from large thinly-provisioned devices where TRIM
operations are slow. As a pool ages, an increasing fraction of the pool's
metaslabs will be initialized, progressively degrading the usefulness of
3150this option. This setting is stored when starting a manual TRIM and will
3151persist for the duration of the requested TRIM.
3152.sp
3153Default value: \fB0\fR.
3154.RE
3155
3156.sp
3157.ne 2
3158.na
67709516 3159\fBzfs_trim_queue_limit\fR (uint)
1b939560
BB
3160.ad
3161.RS 12n
3162Maximum number of queued TRIMs outstanding per leaf vdev. The number of
3163concurrent TRIM commands issued to the device is controlled by the
3164\fBzfs_vdev_trim_min_active\fR and \fBzfs_vdev_trim_max_active\fR module
3165options.
3166.sp
3167Default value: \fB10\fR.
3168.RE
3169
3170.sp
3171.ne 2
3172.na
67709516 3173\fBzfs_trim_txg_batch\fR (uint)
1b939560
BB
3174.ad
3175.RS 12n
3176The number of transaction groups worth of frees which should be aggregated
3177before TRIM operations are issued to the device. This setting represents a
3178trade-off between issuing larger, more efficient TRIM operations and the
3179delay before the recently trimmed space is available for use by the device.
3180.sp
3181Increasing this value will allow frees to be aggregated for a longer time.
This will result in larger TRIM operations and potentially increased memory
3183usage. Decreasing this value will have the opposite effect. The default
3184value of 32 was determined to be a reasonable compromise.
3185.sp
3186Default value: \fB32\fR.
3187.RE
3188
29714574
TF
3189.sp
3190.ne 2
3191.na
3192\fBzfs_txg_history\fR (int)
3193.ad
3194.RS 12n
379ca9cf
OF
3195Historical statistics for the last N txgs will be available in
3196\fB/proc/spl/kstat/zfs/<pool>/txgs\fR
29714574 3197.sp
ca85d690 3198Default value: \fB0\fR.
29714574
TF
3199.RE
3200
29714574
TF
3201.sp
3202.ne 2
3203.na
3204\fBzfs_txg_timeout\fR (int)
3205.ad
3206.RS 12n
83426735 3207Flush dirty data to disk at least every N seconds (maximum txg duration)
29714574
TF
3208.sp
3209Default value: \fB5\fR.
3210.RE
3211
1b939560
BB
3212.sp
3213.ne 2
3214.na
3215\fBzfs_vdev_aggregate_trim\fR (int)
3216.ad
3217.RS 12n
3218Allow TRIM I/Os to be aggregated. This is normally not helpful because
the extents to be trimmed will already have been aggregated by the
3220metaslab. This option is provided for debugging and performance analysis.
3221.sp
3222Default value: \fB0\fR.
3223.RE
3224
29714574
TF
3225.sp
3226.ne 2
3227.na
3228\fBzfs_vdev_aggregation_limit\fR (int)
3229.ad
3230.RS 12n
3231Max vdev I/O aggregation size
3232.sp
1af240f3
AM
3233Default value: \fB1,048,576\fR.
3234.RE
3235
3236.sp
3237.ne 2
3238.na
3239\fBzfs_vdev_aggregation_limit_non_rotating\fR (int)
3240.ad
3241.RS 12n
3242Max vdev I/O aggregation size for non-rotating media
3243.sp
29714574
TF
3244Default value: \fB131,072\fR.
3245.RE
3246
3247.sp
3248.ne 2
3249.na
3250\fBzfs_vdev_cache_bshift\fR (int)
3251.ad
3252.RS 12n
Shift size to inflate reads to.
3254.sp
83426735 3255Default value: \fB16\fR (effectively 65536).
29714574
TF
3256.RE
3257
3258.sp
3259.ne 2
3260.na
3261\fBzfs_vdev_cache_max\fR (int)
3262.ad
3263.RS 12n
ca85d690 3264Inflate reads smaller than this value to meet the \fBzfs_vdev_cache_bshift\fR
3265size (default 64k).
83426735
D
3266.sp
3267Default value: \fB16384\fR.
29714574
TF
3268.RE
3269
3270.sp
3271.ne 2
3272.na
3273\fBzfs_vdev_cache_size\fR (int)
3274.ad
3275.RS 12n
83426735
D
3276Total size of the per-disk cache in bytes.
3277.sp
3278Currently this feature is disabled as it has been found to not be helpful
3279for performance and in some cases harmful.
29714574
TF
3280.sp
3281Default value: \fB0\fR.
3282.RE
3283
29714574
TF
3284.sp
3285.ne 2
3286.na
9f500936 3287\fBzfs_vdev_mirror_rotating_inc\fR (int)
29714574
TF
3288.ad
3289.RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O immediately
follows its predecessor on rotational vdevs.
29714574 3294.sp
9f500936 3295Default value: \fB0\fR.
3296.RE
3297
3298.sp
3299.ne 2
3300.na
3301\fBzfs_vdev_mirror_rotating_seek_inc\fR (int)
3302.ad
3303.RS 12n
3304A number by which the balancing algorithm increments the load calculation for
3305the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
this window that do not immediately follow the previous I/O are incremented by
half of this value.
3309.sp
3310Default value: \fB5\fR.
3311.RE
3312
3313.sp
3314.ne 2
3315.na
3316\fBzfs_vdev_mirror_rotating_seek_offset\fR (int)
3317.ad
3318.RS 12n
3319The maximum distance for the last queued I/O in which the balancing algorithm
3320considers an I/O to have locality.
3321See the section "ZFS I/O SCHEDULER".
3322.sp
3323Default value: \fB1048576\fR.
3324.RE
3325
3326.sp
3327.ne 2
3328.na
3329\fBzfs_vdev_mirror_non_rotating_inc\fR (int)
3330.ad
3331.RS 12n
3332A number by which the balancing algorithm increments the load calculation for
3333the purpose of selecting the least busy mirror member on non-rotational vdevs
3334when I/Os do not immediately follow one another.
3335.sp
3336Default value: \fB0\fR.
3337.RE
3338
3339.sp
3340.ne 2
3341.na
3342\fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int)
3343.ad
3344.RS 12n
3345A number by which the balancing algorithm increments the load calculation for
3346the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
this window that do not immediately follow the previous I/O are incremented by
half of this value.
3350.sp
3351Default value: \fB1\fR.
29714574
TF
3352.RE
3353
29714574
TF
3354.sp
3355.ne 2
3356.na
3357\fBzfs_vdev_read_gap_limit\fR (int)
3358.ad
3359.RS 12n
83426735
D
3360Aggregate read I/O operations if the gap on-disk between them is within this
3361threshold.
29714574
TF
3362.sp
3363Default value: \fB32,768\fR.
3364.RE
3365
29714574
TF
3366.sp
3367.ne 2
3368.na
3369\fBzfs_vdev_write_gap_limit\fR (int)
3370.ad
3371.RS 12n
3372Aggregate write I/O over gap
3373.sp
3374Default value: \fB4,096\fR.
3375.RE
3376
ab9f4b0b
GN
3377.sp
3378.ne 2
3379.na
3380\fBzfs_vdev_raidz_impl\fR (string)
3381.ad
3382.RS 12n
c9187d86 3383Parameter for selecting raidz parity implementation to use.
ab9f4b0b
GN
3384
3385Options marked (always) below may be selected on module load as they are
3386supported on all systems.
3387The remaining options may only be set after the module is loaded, as they
3388are available only if the implementations are compiled in and supported
3389on the running system.
3390
3391Once the module is loaded, the content of
3392/sys/module/zfs/parameters/zfs_vdev_raidz_impl will show available options
3393with the currently selected one enclosed in [].
3394Possible options are:
3395 fastest - (always) implementation selected using built-in benchmark
3396 original - (always) original raidz implementation
3397 scalar - (always) scalar raidz implementation
ae25d222
GN
3398 sse2 - implementation using SSE2 instruction set (64bit x86 only)
3399 ssse3 - implementation using SSSE3 instruction set (64bit x86 only)
ab9f4b0b 3400 avx2 - implementation using AVX2 instruction set (64bit x86 only)
7f547f85
RD
3401 avx512f - implementation using AVX512F instruction set (64bit x86 only)
3402 avx512bw - implementation using AVX512F & AVX512BW instruction sets (64bit x86 only)
62a65a65
RD
3403 aarch64_neon - implementation using NEON (Aarch64/64 bit ARMv8 only)
3404 aarch64_neonx2 - implementation using NEON with more unrolling (Aarch64/64 bit ARMv8 only)
35b07497 3405 powerpc_altivec - implementation using Altivec (PowerPC only)
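.sp
For example, the available implementations can be listed and one selected at
runtime (a minimal sketch; the options actually shown depend on the kernel and
CPU features of the running system):
.nf

  # cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
  [fastest] original scalar sse2 ssse3 avx2
  # echo avx2 > /sys/module/zfs/parameters/zfs_vdev_raidz_impl

.fi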
ab9f4b0b
GN
3406.sp
3407Default value: \fBfastest\fR.
3408.RE
3409
67709516
D
3410.sp
3411.ne 2
3412.na
3413\fBzfs_vdev_scheduler\fR (charp)
3414.ad
3415.RS 12n
3416\fBDEPRECATED\fR: This option exists for compatibility with older user
3417configurations. It does nothing except print a warning to the kernel log if
3418set.
3419.sp
3420.RE
3421
29714574
TF
3422.sp
3423.ne 2
3424.na
3425\fBzfs_zevent_cols\fR (int)
3426.ad
3427.RS 12n
83426735 3428When zevents are logged to the console use this as the word wrap width.
29714574
TF
3429.sp
3430Default value: \fB80\fR.
3431.RE
3432
3433.sp
3434.ne 2
3435.na
3436\fBzfs_zevent_console\fR (int)
3437.ad
3438.RS 12n
3439Log events to the console
3440.sp
3441Use \fB1\fR for yes and \fB0\fR for no (default).
3442.RE
3443
3444.sp
3445.ne 2
3446.na
3447\fBzfs_zevent_len_max\fR (int)
3448.ad
3449.RS 12n
83426735
D
3450Max event queue length. A value of 0 will result in a calculated value which
3451increases with the number of CPUs in the system (minimum 64 events). Events
3452in the queue can be viewed with the \fBzpool events\fR command.
29714574
TF
3453.sp
3454Default value: \fB0\fR.
3455.RE
3456
a032ac4b
BB
3457.sp
3458.ne 2
3459.na
3460\fBzfs_zil_clean_taskq_maxalloc\fR (int)
3461.ad
3462.RS 12n
3463The maximum number of taskq entries that are allowed to be cached. When this
limit is exceeded, transaction records (itxs) will be cleaned synchronously.
a032ac4b
BB
3465.sp
3466Default value: \fB1048576\fR.
3467.RE
3468
3469.sp
3470.ne 2
3471.na
3472\fBzfs_zil_clean_taskq_minalloc\fR (int)
3473.ad
3474.RS 12n
3475The number of taskq entries that are pre-populated when the taskq is first
3476created and are immediately available for use.
3477.sp
3478Default value: \fB1024\fR.
3479.RE
3480
3481.sp
3482.ne 2
3483.na
3484\fBzfs_zil_clean_taskq_nthr_pct\fR (int)
3485.ad
3486.RS 12n
3487This controls the number of threads used by the dp_zil_clean_taskq. The default
3488value of 100% will create a maximum of one thread per cpu.
3489.sp
be54a13c 3490Default value: \fB100\fR%.
a032ac4b
BB
3491.RE
3492
b8738257
MA
3493.sp
3494.ne 2
3495.na
3496\fBzil_maxblocksize\fR (int)
3497.ad
3498.RS 12n
3499This sets the maximum block size used by the ZIL. On very fragmented pools,
3500lowering this (typically to 36KB) can improve performance.
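.sp
For example, to apply a lower limit persistently (one common approach; the exact
configuration path may vary by distribution), the parameter can be set as a
module option, here using 36KB (36864 bytes):
.nf

  # /etc/modprobe.d/zfs.conf
  options zfs zil_maxblocksize=36864

.fi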
3501.sp
3502Default value: \fB131072\fR (128KB).
3503.RE
3504
53b1f5ea
PS
3505.sp
3506.ne 2
3507.na
3508\fBzil_nocacheflush\fR (int)
3509.ad
3510.RS 12n
3511Disable the cache flush commands that are normally sent to the disk(s) by
3512the ZIL after an LWB write has completed. Setting this will cause ZIL
3513corruption on power loss if a volatile out-of-order write cache is enabled.
3514.sp
3515Use \fB1\fR for yes and \fB0\fR for no (default).
3516.RE
3517
29714574
TF
3518.sp
3519.ne 2
3520.na
3521\fBzil_replay_disable\fR (int)
3522.ad
3523.RS 12n
83426735
D
Disable intent logging replay. Replay can be disabled to allow recovery from a
corrupted ZIL.
29714574
TF
3526.sp
3527Use \fB1\fR for yes and \fB0\fR for no (default).
3528.RE
3529
3530.sp
3531.ne 2
3532.na
1b7c1e5c 3533\fBzil_slog_bulk\fR (ulong)
29714574
TF
3534.ad
3535.RS 12n
1b7c1e5c
GDN
3536Limit SLOG write size per commit executed with synchronous priority.
3537Any writes above that will be executed with lower (asynchronous) priority
to limit potential SLOG device abuse by a single active ZIL writer.
29714574 3539.sp
1b7c1e5c 3540Default value: \fB786,432\fR.
29714574
TF
3541.RE
3542
638dd5f4
TC
3543.sp
3544.ne 2
3545.na
3546\fBzio_deadman_log_all\fR (int)
3547.ad
3548.RS 12n
3549If non-zero, the zio deadman will produce debugging messages (see
3550\fBzfs_dbgmsg_enable\fR) for all zios, rather than only for leaf
3551zios possessing a vdev. This is meant to be used by developers to gain
3552diagnostic information for hang conditions which don't involve a mutex
3553or other locking primitive; typically conditions in which a thread in
3554the zio pipeline is looping indefinitely.
3555.sp
3556Default value: \fB0\fR.
3557.RE
3558
c3bd3fb4
TC
3559.sp
3560.ne 2
3561.na
3562\fBzio_decompress_fail_fraction\fR (int)
3563.ad
3564.RS 12n
3565If non-zero, this value represents the denominator of the probability that zfs
3566should induce a decompression failure. For instance, for a 5% decompression
3567failure rate, this value should be set to 20.
3568.sp
3569Default value: \fB0\fR.
3570.RE
3571
29714574
TF
3572.sp
3573.ne 2
3574.na
ad796b8a 3575\fBzio_slow_io_ms\fR (int)
29714574
TF
3576.ad
3577.RS 12n
ad796b8a
TH
3578When an I/O operation takes more than \fBzio_slow_io_ms\fR milliseconds to
3579complete is marked as a slow I/O. Each slow I/O causes a delay zevent. Slow
3580I/O counters can be seen with "zpool status -s".
3581
29714574
TF
3582.sp
3583Default value: \fB30,000\fR.
3584.RE
3585
3dfb57a3
DB
3586.sp
3587.ne 2
3588.na
3589\fBzio_dva_throttle_enabled\fR (int)
3590.ad
3591.RS 12n
ad796b8a 3592Throttle block allocations in the I/O pipeline. This allows for
3dfb57a3 3593dynamic allocation distribution when devices are imbalanced.
e815485f
TC
3594When enabled, the maximum number of pending allocations per top-level vdev
3595is limited by \fBzfs_vdev_queue_depth_pct\fR.
3dfb57a3 3596.sp
27f2b90d 3597Default value: \fB1\fR.
3dfb57a3
DB
3598.RE
3599
29714574
TF
3600.sp
3601.ne 2
3602.na
3603\fBzio_requeue_io_start_cut_in_line\fR (int)
3604.ad
3605.RS 12n
3606Prioritize requeued I/O
3607.sp
3608Default value: \fB0\fR.
3609.RE
3610
dcb6bed1
D
3611.sp
3612.ne 2
3613.na
3614\fBzio_taskq_batch_pct\fR (uint)
3615.ad
3616.RS 12n
3617Percentage of online CPUs (or CPU cores, etc) which will run a worker thread
ad796b8a 3618for I/O. These workers are responsible for I/O work such as compression and
dcb6bed1
D
3619checksum calculations. Fractional number of CPUs will be rounded down.
3620.sp
3621The default value of 75 was chosen to avoid using all CPUs which can result in
3622latency issues and inconsistent application performance, especially when high
3623compression is enabled.
3624.sp
3625Default value: \fB75\fR.
3626.RE
3627
29714574
TF
3628.sp
3629.ne 2
3630.na
3631\fBzvol_inhibit_dev\fR (uint)
3632.ad
3633.RS 12n
83426735
D
3634Do not create zvol device nodes. This may slightly improve startup time on
3635systems with a very large number of zvols.
29714574
TF
3636.sp
3637Use \fB1\fR for yes and \fB0\fR for no (default).
3638.RE
3639
3640.sp
3641.ne 2
3642.na
3643\fBzvol_major\fR (uint)
3644.ad
3645.RS 12n
83426735 3646Major number for zvol block devices
29714574
TF
3647.sp
3648Default value: \fB230\fR.
3649.RE
3650
3651.sp
3652.ne 2
3653.na
3654\fBzvol_max_discard_blocks\fR (ulong)
3655.ad
3656.RS 12n
83426735
D
3657Discard (aka TRIM) operations done on zvols will be done in batches of this
3658many blocks, where block size is determined by the \fBvolblocksize\fR property
3659of a zvol.
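.sp
As an illustration, with the default of 16,384 blocks and a hypothetical
\fBvolblocksize\fR of 8 KB, each discard batch covers 128 MB of the volume.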
29714574
TF
3660.sp
3661Default value: \fB16,384\fR.
3662.RE
3663
9965059a
BB
3664.sp
3665.ne 2
3666.na
3667\fBzvol_prefetch_bytes\fR (uint)
3668.ad
3669.RS 12n
When adding a zvol to the system, prefetch \fBzvol_prefetch_bytes\fR
3671from the start and end of the volume. Prefetching these regions
3672of the volume is desirable because they are likely to be accessed
3673immediately by \fBblkid(8)\fR or by the kernel scanning for a partition
3674table.
3675.sp
3676Default value: \fB131,072\fR.
3677.RE
3678
692e55b8
CC
3679.sp
3680.ne 2
3681.na
3682\fBzvol_request_sync\fR (uint)
3683.ad
3684.RS 12n
When processing I/O requests for a zvol, submit them synchronously. This
3686effectively limits the queue depth to 1 for each I/O submitter. When set
3687to 0 requests are handled asynchronously by a thread pool. The number of
requests which can be handled concurrently is controlled by \fBzvol_threads\fR.
3689.sp
8fa5250f 3690Default value: \fB0\fR.
692e55b8
CC
3691.RE
3692
3693.sp
3694.ne 2
3695.na
3696\fBzvol_threads\fR (uint)
3697.ad
3698.RS 12n
3699Max number of threads which can handle zvol I/O requests concurrently.
3700.sp
3701Default value: \fB32\fR.
3702.RE
3703
cf8738d8 3704.sp
3705.ne 2
3706.na
3707\fBzvol_volmode\fR (uint)
3708.ad
3709.RS 12n
Defines the behaviour of zvol block devices when \fBvolmode\fR is set to \fBdefault\fR.
3711Valid values are \fB1\fR (full), \fB2\fR (dev) and \fB3\fR (none).
3712.sp
3713Default value: \fB1\fR.
3714.RE
3715
e8b96c60
MA
3716.SH ZFS I/O SCHEDULER
3717ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
3718The I/O scheduler determines when and in what order those operations are
3719issued. The I/O scheduler divides operations into five I/O classes
3720prioritized in the following order: sync read, sync write, async read,
3721async write, and scrub/resilver. Each queue defines the minimum and
3722maximum number of concurrent operations that may be issued to the
3723device. In addition, the device has an aggregate maximum,
3724\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
3725must not exceed the aggregate maximum. If the sum of the per-queue
3726maximums exceeds the aggregate maximum, then the number of active I/Os
3727may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
3728be issued regardless of whether all per-queue minimums have been met.
3729.sp
3730For many physical devices, throughput increases with the number of
3731concurrent operations, but latency typically suffers. Further, physical
3732devices typically have a limit at which more concurrent operations have no
3733effect on throughput or can actually cause it to decrease.
3734.sp
3735The scheduler selects the next operation to issue by first looking for an
3736I/O class whose minimum has not been satisfied. Once all are satisfied and
3737the aggregate maximum has not been hit, the scheduler looks for classes
3738whose maximum has not been satisfied. Iteration through the I/O classes is
3739done in the order specified above. No further operations are issued if the
3740aggregate maximum number of concurrent operations has been hit or if there
3741are no operations queued for an I/O class that has not hit its maximum.
3742Every time an I/O is queued or an operation completes, the I/O scheduler
3743looks for new operations to issue.
3744.sp
3745In general, smaller max_active's will lead to lower latency of synchronous
3746operations. Larger max_active's may lead to higher overall throughput,
3747depending on underlying storage.
3748.sp
3749The ratio of the queues' max_actives determines the balance of performance
3750between reads, writes, and scrubs. E.g., increasing
3751\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
3752more quickly, but reads and writes to have higher latency and lower throughput.
3753.sp
3754All I/O classes have a fixed maximum number of outstanding operations
3755except for the async write class. Asynchronous writes represent the data
3756that is committed to stable storage during the syncing stage for
3757transaction groups. Transaction groups enter the syncing state
3758periodically so the number of queued async writes will quickly burst up
3759and then bleed down to zero. Rather than servicing them as quickly as
3760possible, the I/O scheduler changes the maximum number of active async
3761write I/Os according to the amount of dirty data in the pool. Since
3762both throughput and latency typically increase with the number of
3763concurrent operations issued to physical devices, reducing the
3764burstiness in the number of concurrent operations also stabilizes the
3765response time of operations from other -- and in particular synchronous
3766-- queues. In broad strokes, the I/O scheduler will issue more
3767concurrent operations from the async write queue as there's more dirty
3768data in the pool.
3769.sp
3770Async Writes
3771.sp
3772The number of concurrent operations issued for the async write I/O class
3773follows a piece-wise linear function defined by a few adjustable points.
3774.nf
3775
3776 | o---------| <-- zfs_vdev_async_write_max_active
3777 ^ | /^ |
3778 | | / | |
3779active | / | |
3780 I/O | / | |
3781count | / | |
3782 | / | |
3783 |-------o | | <-- zfs_vdev_async_write_min_active
3784 0|_______^______|_________|
3785 0% | | 100% of zfs_dirty_data_max
3786 | |
3787 | `-- zfs_vdev_async_write_active_max_dirty_percent
3788 `--------- zfs_vdev_async_write_active_min_dirty_percent
3789
3790.fi
3791Until the amount of dirty data exceeds a minimum percentage of the dirty
3792data allowed in the pool, the I/O scheduler will limit the number of
3793concurrent operations to the minimum. As that threshold is crossed, the
3794number of concurrent operations issued increases linearly to the maximum at
3795the specified maximum percentage of the dirty data allowed in the pool.
3796.sp
3797Ideally, the amount of dirty data on a busy pool will stay in the sloped
3798part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
3799and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
3800maximum percentage, this indicates that the rate of incoming data is
3801greater than the rate that the backend storage can handle. In this case, we
3802must further throttle incoming writes, as described in the next section.
3803
3804.SH ZFS TRANSACTION DELAY
3805We delay transactions when we've determined that the backend storage
3806isn't able to accommodate the rate of incoming writes.
3807.sp
3808If there is already a transaction waiting, we delay relative to when
3809that transaction will finish waiting. This way the calculated delay time
3810is independent of the number of threads concurrently executing
3811transactions.
3812.sp
3813If we are the only waiter, wait relative to when the transaction
3814started, rather than the current time. This credits the transaction for
3815"time already served", e.g. reading indirect blocks.
3816.sp
3817The minimum time for a transaction to take is calculated as:
3818.nf
3819 min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
3820 min_time is then capped at 100 milliseconds.
3821.fi
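.sp
Reading this formula at the midpoint of the curve, where (dirty - min) equals
(max - dirty), the fraction is 1 and min_time equals \fBzfs_delay_scale\fR; in
the illustrations below this corresponds to a delay of roughly 500us, or about
2000 IOPS.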
3822.sp
3823The delay has two degrees of freedom that can be adjusted via tunables. The
3824percentage of dirty data at which we start to delay is defined by
3825\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
3826\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
3827delay after writing at full speed has failed to keep up with the incoming write
3828rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
3829this variable determines the amount of delay at the midpoint of the curve.
3830.sp
3831.nf
3832delay
3833 10ms +-------------------------------------------------------------*+
3834 | *|
3835 9ms + *+
3836 | *|
3837 8ms + *+
3838 | * |
3839 7ms + * +
3840 | * |
3841 6ms + * +
3842 | * |
3843 5ms + * +
3844 | * |
3845 4ms + * +
3846 | * |
3847 3ms + * +
3848 | * |
3849 2ms + (midpoint) * +
3850 | | ** |
3851 1ms + v *** +
3852 | zfs_delay_scale ----------> ******** |
3853 0 +-------------------------------------*********----------------+
3854 0% <- zfs_dirty_data_max -> 100%
3855.fi
3856.sp
3857Note that since the delay is added to the outstanding time remaining on the
3858most recent transaction, the delay is effectively the inverse of IOPS.
3859Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
3860was chosen such that small changes in the amount of accumulated dirty data
3861in the first 3/4 of the curve yield relatively small differences in the
3862amount of delay.
3863.sp
3864The effects can be easier to understand when the amount of delay is
3865represented on a log scale:
3866.sp
3867.nf
3868delay
3869100ms +-------------------------------------------------------------++
3870 + +
3871 | |
3872 + *+
3873 10ms + *+
3874 + ** +
3875 | (midpoint) ** |
3876 + | ** +
3877 1ms + v **** +
3878 + zfs_delay_scale ----------> ***** +
3879 | **** |
3880 + **** +
3881100us + ** +
3882 + * +
3883 | * |
3884 + * +
3885 10us + * +
3886 + +
3887 | |
3888 + +
3889 +--------------------------------------------------------------+
3890 0% <- zfs_dirty_data_max -> 100%
3891.fi
3892.sp
3893Note here that only as the amount of dirty data approaches its limit does
3894the delay start to increase rapidly. The goal of a properly tuned system
3895should be to keep the amount of dirty data out of that range by first
3896ensuring that the appropriate limits are set for the I/O scheduler to reach
3897optimal throughput on the backend storage, and then by changing the value
3898of \fBzfs_delay_scale\fR to increase the steepness of the curve.