'\" te
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" Copyright (c) 2019, 2020 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.TH ZFS-MODULE-PARAMETERS 5 "Aug 24, 2020" OpenZFS
.SH NAME
zfs\-module\-parameters \- ZFS module parameters
.SH DESCRIPTION
.sp
.LP
Description of the different parameters to the ZFS module.

.SS "Module parameters"
.sp
.LP

29.sp
30.ne 2
31.na
32\fBdbuf_cache_max_bytes\fR (ulong)
33.ad
34.RS 12n
8348fac3
RM
35Maximum size in bytes of the dbuf cache. The target size is determined by the
36MIN versus \fB1/2^dbuf_cache_shift\fR (1/32) of the target ARC size. The
37behavior of the dbuf cache and its associated settings can be observed via the
38\fB/proc/spl/kstat/zfs/dbufstats\fR kstat.
de4f8d5d 39.sp
8348fac3 40Default value: \fBULONG_MAX\fR.
de4f8d5d
BB
41.RE
42
2e5dc449
MA
43.sp
44.ne 2
45.na
46\fBdbuf_metadata_cache_max_bytes\fR (ulong)
47.ad
48.RS 12n
8348fac3
RM
49Maximum size in bytes of the metadata dbuf cache. The target size is
50determined by the MIN versus \fB1/2^dbuf_metadata_cache_shift\fR (1/64) of the
51target ARC size. The behavior of the metadata dbuf cache and its associated
52settings can be observed via the \fB/proc/spl/kstat/zfs/dbufstats\fR kstat.
2e5dc449 53.sp
8348fac3 54Default value: \fBULONG_MAX\fR.
2e5dc449
MA
55.RE
56
de4f8d5d
BB
57.sp
58.ne 2
59.na
60\fBdbuf_cache_hiwater_pct\fR (uint)
61.ad
62.RS 12n
63The percentage over \fBdbuf_cache_max_bytes\fR when dbufs must be evicted
64directly.
65.sp
66Default value: \fB10\fR%.
67.RE
68
69.sp
70.ne 2
71.na
72\fBdbuf_cache_lowater_pct\fR (uint)
73.ad
74.RS 12n
75The percentage below \fBdbuf_cache_max_bytes\fR when the evict thread stops
76evicting dbufs.
77.sp
78Default value: \fB10\fR%.
79.RE
80
81.sp
82.ne 2
83.na
84\fBdbuf_cache_shift\fR (int)
85.ad
86.RS 12n
87Set the size of the dbuf cache, \fBdbuf_cache_max_bytes\fR, to a log2 fraction
77f6826b 88of the target ARC size.
de4f8d5d
BB
89.sp
90Default value: \fB5\fR.
91.RE
92
2e5dc449
MA
93.sp
94.ne 2
95.na
96\fBdbuf_metadata_cache_shift\fR (int)
97.ad
98.RS 12n
99Set the size of the dbuf metadata cache, \fBdbuf_metadata_cache_max_bytes\fR,
77f6826b 100to a log2 fraction of the target ARC size.
2e5dc449
MA
101.sp
102Default value: \fB6\fR.
103.RE
104
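.sp
As a worked example of the two shift parameters above (illustrative arithmetic
only, not additional tuning guidance): with a target ARC size of 4 GiB, the
default \fBdbuf_cache_shift\fR of 5 gives a dbuf cache target of
4 GiB / 2^5 = 128 MiB, and the default \fBdbuf_metadata_cache_shift\fR of 6
gives a metadata dbuf cache target of 4 GiB / 2^6 = 64 MiB, unless
\fBdbuf_cache_max_bytes\fR or \fBdbuf_metadata_cache_max_bytes\fR is lower.
The resulting behavior can be observed via the
\fB/proc/spl/kstat/zfs/dbufstats\fR kstat.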
67709516
D
105.sp
106.ne 2
107.na
108\fBdmu_object_alloc_chunk_shift\fR (int)
109.ad
110.RS 12n
Dnode slots allocated in a single operation as a power of 2. The default value
minimizes lock contention for the bulk operation performed.
113.sp
114Default value: \fB7\fR (128).
115.RE
116
d9b4bf06
MA
117.sp
118.ne 2
119.na
120\fBdmu_prefetch_max\fR (int)
121.ad
122.RS 12n
123Limit the amount we can prefetch with one call to this amount (in bytes).
124This helps to limit the amount of memory that can be used by prefetching.
125.sp
126Default value: \fB134,217,728\fR (128MB).
127.RE
128
6d836e6f
RE
129.sp
130.ne 2
131.na
132\fBignore_hole_birth\fR (int)
133.ad
134.RS 12n
6ce7b2d9 135This is an alias for \fBsend_holes_without_birth_time\fR.
6d836e6f
RE
136.RE
137
29714574
TF
138.sp
139.ne 2
140.na
141\fBl2arc_feed_again\fR (int)
142.ad
143.RS 12n
83426735
D
144Turbo L2ARC warm-up. When the L2ARC is cold the fill interval will be set as
145fast as possible.
29714574
TF
146.sp
147Use \fB1\fR for yes (default) and \fB0\fR to disable.
148.RE
149
150.sp
151.ne 2
152.na
153\fBl2arc_feed_min_ms\fR (ulong)
154.ad
155.RS 12n
Min feed interval in milliseconds. Requires \fBl2arc_feed_again=1\fR and only
applies during the turbo L2ARC warm-up described above.
29714574
TF
158.sp
159Default value: \fB200\fR.
160.RE
161
162.sp
163.ne 2
164.na
165\fBl2arc_feed_secs\fR (ulong)
166.ad
167.RS 12n
168Seconds between L2ARC writing
169.sp
170Default value: \fB1\fR.
171.RE
172
173.sp
174.ne 2
175.na
176\fBl2arc_headroom\fR (ulong)
177.ad
178.RS 12n
How far through the ARC lists to search for L2ARC cacheable content, expressed
as a multiplier of \fBl2arc_write_max\fR.
ARC persistence across reboots can be achieved with persistent L2ARC by setting
this parameter to \fB0\fR, allowing the full length of the ARC lists to be
searched for cacheable content.
29714574
TF
184.sp
185Default value: \fB2\fR.
186.RE
187
188.sp
189.ne 2
190.na
191\fBl2arc_headroom_boost\fR (ulong)
192.ad
193.RS 12n
83426735 194Scales \fBl2arc_headroom\fR by this percentage when L2ARC contents are being
b7654bd7
GA
195successfully compressed before writing. A value of \fB100\fR disables this
196feature.
29714574 197.sp
be54a13c 198Default value: \fB200\fR%.
29714574
TF
199.RE
200
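.sp
As a worked example of the interaction between the two headroom parameters
above (illustrative arithmetic only): with the defaults
\fBl2arc_write_max\fR=8,388,608, \fBl2arc_headroom\fR=2 and
\fBl2arc_headroom_boost\fR=200%, each feed interval scans up to
8 MiB * 2 = 16 MiB deep into the eligible ARC lists, and up to
16 MiB * 200 / 100 = 32 MiB when the buffers being written are compressing
well.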
523e1295
AM
201.sp
202.ne 2
203.na
204\fBl2arc_meta_percent\fR (int)
205.ad
206.RS 12n
Percent of ARC size allowed for L2ARC-only headers.
Since L2ARC buffers are not evicted on memory pressure, too large an amount of
headers on a system with an irrationally large L2ARC can render it slow or
unusable. This parameter limits L2ARC writes and rebuilds to achieve that limit.
211.sp
212Default value: \fB33\fR%.
213.RE
214
b7654bd7
GA
215.sp
216.ne 2
217.na
218\fBl2arc_trim_ahead\fR (ulong)
219.ad
220.RS 12n
221Trims ahead of the current write size (\fBl2arc_write_max\fR) on L2ARC devices
222by this percentage of write size if we have filled the device. If set to
223\fB100\fR we TRIM twice the space required to accommodate upcoming writes. A
224minimum of 64MB will be trimmed. It also enables TRIM of the whole L2ARC device
225upon creation or addition to an existing pool or if the header of the device is
226invalid upon importing a pool or onlining a cache device. A value of \fB0\fR
227disables TRIM on L2ARC altogether and is the default as it can put significant
stress on the underlying storage devices. This will vary depending on how well
the specific device handles these commands.
230.sp
231Default value: \fB0\fR%.
232.RE
233
29714574
TF
234.sp
235.ne 2
236.na
237\fBl2arc_noprefetch\fR (int)
238.ad
239.RS 12n
83426735 240Do not write buffers to L2ARC if they were prefetched but not used by
77f6826b 241applications.
29714574
TF
242.sp
243Use \fB1\fR for yes (default) and \fB0\fR to disable.
244.RE
245
246.sp
247.ne 2
248.na
249\fBl2arc_norw\fR (int)
250.ad
251.RS 12n
77f6826b 252No reads during writes.
29714574
TF
253.sp
254Use \fB1\fR for yes and \fB0\fR for no (default).
255.RE
256
257.sp
258.ne 2
259.na
260\fBl2arc_write_boost\fR (ulong)
261.ad
262.RS 12n
603a1784 263Cold L2ARC devices will have \fBl2arc_write_max\fR increased by this amount
83426735 264while they remain cold.
29714574
TF
265.sp
266Default value: \fB8,388,608\fR.
267.RE
268
269.sp
270.ne 2
271.na
272\fBl2arc_write_max\fR (ulong)
273.ad
274.RS 12n
77f6826b 275Max write bytes per interval.
29714574
TF
276.sp
277Default value: \fB8,388,608\fR.
278.RE
279
77f6826b
GA
280.sp
281.ne 2
282.na
283\fBl2arc_rebuild_enabled\fR (int)
284.ad
285.RS 12n
286Rebuild the L2ARC when importing a pool (persistent L2ARC). This can be
287disabled if there are problems importing a pool or attaching an L2ARC device
288(e.g. the L2ARC device is slow in reading stored log metadata, or the metadata
289has become somehow fragmented/unusable).
290.sp
291Use \fB1\fR for yes (default) and \fB0\fR for no.
292.RE
293
294.sp
295.ne 2
296.na
297\fBl2arc_rebuild_blocks_min_l2size\fR (ulong)
298.ad
299.RS 12n
300Min size (in bytes) of an L2ARC device required in order to write log blocks
301in it. The log blocks are used upon importing the pool to rebuild
302the L2ARC (persistent L2ARC). Rationale: for L2ARC devices less than 1GB, the
303amount of data l2arc_evict() evicts is significant compared to the amount of
304restored L2ARC data. In this case do not write log blocks in L2ARC in order not
305to waste space.
306.sp
307Default value: \fB1,073,741,824\fR (1GB).
308.RE
309
99b14de4
ED
310.sp
311.ne 2
312.na
313\fBmetaslab_aliquot\fR (ulong)
314.ad
315.RS 12n
316Metaslab granularity, in bytes. This is roughly similar to what would be
317referred to as the "stripe size" in traditional RAID arrays. In normal
318operation, ZFS will try to write this amount of data to a top-level vdev
319before moving on to the next one.
320.sp
321Default value: \fB524,288\fR.
322.RE
323
f3a7f661
GW
324.sp
325.ne 2
326.na
327\fBmetaslab_bias_enabled\fR (int)
328.ad
329.RS 12n
330Enable metaslab group biasing based on its vdev's over- or under-utilization
331relative to the pool.
332.sp
333Use \fB1\fR for yes (default) and \fB0\fR for no.
334.RE
335
d830d479
MA
336.sp
337.ne 2
338.na
339\fBmetaslab_force_ganging\fR (ulong)
340.ad
341.RS 12n
342Make some blocks above a certain size be gang blocks. This option is used
343by the test suite to facilitate testing.
344.sp
345Default value: \fB16,777,217\fR.
346.RE
347
93e28d66
SD
348.sp
349.ne 2
350.na
351\fBzfs_keep_log_spacemaps_at_export\fR (int)
352.ad
353.RS 12n
354Prevent log spacemaps from being destroyed during pool exports and destroys.
355.sp
356Use \fB1\fR for yes and \fB0\fR for no (default).
357.RE
358
4e21fd06
DB
359.sp
360.ne 2
361.na
362\fBzfs_metaslab_segment_weight_enabled\fR (int)
363.ad
364.RS 12n
365Enable/disable segment-based metaslab selection.
366.sp
367Use \fB1\fR for yes (default) and \fB0\fR for no.
368.RE
369
370.sp
371.ne 2
372.na
373\fBzfs_metaslab_switch_threshold\fR (int)
374.ad
375.RS 12n
376When using segment-based metaslab selection, continue allocating
321204be 377from the active metaslab until \fBzfs_metaslab_switch_threshold\fR
4e21fd06
DB
378worth of buckets have been exhausted.
379.sp
380Default value: \fB2\fR.
381.RE
382
29714574
TF
383.sp
384.ne 2
385.na
aa7d06a9 386\fBmetaslab_debug_load\fR (int)
29714574
TF
387.ad
388.RS 12n
aa7d06a9
GW
389Load all metaslabs during pool import.
390.sp
391Use \fB1\fR for yes and \fB0\fR for no (default).
392.RE
393
394.sp
395.ne 2
396.na
397\fBmetaslab_debug_unload\fR (int)
398.ad
399.RS 12n
400Prevent metaslabs from being unloaded.
29714574
TF
401.sp
402Use \fB1\fR for yes and \fB0\fR for no (default).
403.RE
404
f3a7f661
GW
405.sp
406.ne 2
407.na
408\fBmetaslab_fragmentation_factor_enabled\fR (int)
409.ad
410.RS 12n
411Enable use of the fragmentation metric in computing metaslab weights.
412.sp
413Use \fB1\fR for yes (default) and \fB0\fR for no.
414.RE
415
d3230d76
MA
416.sp
417.ne 2
418.na
419\fBmetaslab_df_max_search\fR (int)
420.ad
421.RS 12n
422Maximum distance to search forward from the last offset. Without this limit,
423fragmented pools can see >100,000 iterations and metaslab_block_picker()
424becomes the performance limiting factor on high-performance storage.
425
426With the default setting of 16MB, we typically see less than 500 iterations,
427even with very fragmented, ashift=9 pools. The maximum number of iterations
428possible is: \fBmetaslab_df_max_search / (2 * (1<<ashift))\fR.
429With the default setting of 16MB this is 16*1024 (with ashift=9) or 2048
430(with ashift=12).
431.sp
432Default value: \fB16,777,216\fR (16MB)
433.RE
434
435.sp
436.ne 2
437.na
438\fBmetaslab_df_use_largest_segment\fR (int)
439.ad
440.RS 12n
441If we are not searching forward (due to metaslab_df_max_search,
442metaslab_df_free_pct, or metaslab_df_alloc_threshold), this tunable controls
b596585f 443what segment is used. If it is set, we will use the largest free segment.
d3230d76
MA
444If it is not set, we will use a segment of exactly the requested size (or
445larger).
446.sp
447Use \fB1\fR for yes and \fB0\fR for no (default).
448.RE
449
c81f1790
PD
450.sp
451.ne 2
452.na
453\fBzfs_metaslab_max_size_cache_sec\fR (ulong)
454.ad
455.RS 12n
456When we unload a metaslab, we cache the size of the largest free chunk. We use
457that cached size to determine whether or not to load a metaslab for a given
458allocation. As more frees accumulate in that metaslab while it's unloaded, the
459cached max size becomes less and less accurate. After a number of seconds
460controlled by this tunable, we stop considering the cached max size and start
461considering only the histogram instead.
462.sp
463Default value: \fB3600 seconds\fR (one hour)
464.RE
465
f09fda50
PD
466.sp
467.ne 2
468.na
469\fBzfs_metaslab_mem_limit\fR (int)
470.ad
471.RS 12n
472When we are loading a new metaslab, we check the amount of memory being used
473to store metaslab range trees. If it is over a threshold, we attempt to unload
474the least recently used metaslab to prevent the system from clogging all of
475its memory with range trees. This tunable sets the percentage of total system
476memory that is the threshold.
477.sp
eef0f4d8 478Default value: \fB25 percent\fR
f09fda50
PD
479.RE
480
b8bcca18
MA
481.sp
482.ne 2
483.na
c853f382 484\fBzfs_vdev_default_ms_count\fR (int)
b8bcca18
MA
485.ad
486.RS 12n
e4e94ca3 487When a vdev is added target this number of metaslabs per top-level vdev.
b8bcca18
MA
488.sp
489Default value: \fB200\fR.
490.RE
491
93e28d66
SD
492.sp
493.ne 2
494.na
495\fBzfs_vdev_default_ms_shift\fR (int)
496.ad
497.RS 12n
498Default limit for metaslab size.
499.sp
500Default value: \fB29\fR [meaning (1 << 29) = 512MB].
501.RE
502
6fe3498c
RM
503.sp
504.ne 2
505.na
506\fBzfs_vdev_max_auto_ashift\fR (ulong)
507.ad
508.RS 12n
509Maximum ashift used when optimizing for logical -> physical sector size on new
510top-level vdevs.
511.sp
512Default value: \fBASHIFT_MAX\fR (16).
513.RE
514
515.sp
516.ne 2
517.na
518\fBzfs_vdev_min_auto_ashift\fR (ulong)
519.ad
520.RS 12n
521Minimum ashift used when creating new top-level vdevs.
522.sp
523Default value: \fBASHIFT_MIN\fR (9).
524.RE
525
d2734cce
SD
526.sp
527.ne 2
528.na
c853f382 529\fBzfs_vdev_min_ms_count\fR (int)
d2734cce
SD
530.ad
531.RS 12n
532Minimum number of metaslabs to create in a top-level vdev.
533.sp
534Default value: \fB16\fR.
535.RE
536
e4e94ca3
DB
537.sp
538.ne 2
539.na
67709516
D
540\fBvdev_validate_skip\fR (int)
541.ad
542.RS 12n
543Skip label validation steps during pool import. Changing is not recommended
544unless you know what you are doing and are recovering a damaged label.
545.sp
546Default value: \fB0\fR.
547.RE
548
549.sp
550.ne 2
551.na
552\fBzfs_vdev_ms_count_limit\fR (int)
e4e94ca3
DB
553.ad
554.RS 12n
555Practical upper limit of total metaslabs per top-level vdev.
556.sp
557Default value: \fB131,072\fR.
558.RE
559
f3a7f661
GW
560.sp
561.ne 2
562.na
563\fBmetaslab_preload_enabled\fR (int)
564.ad
565.RS 12n
566Enable metaslab group preloading.
567.sp
568Use \fB1\fR for yes (default) and \fB0\fR for no.
569.RE
570
571.sp
572.ne 2
573.na
574\fBmetaslab_lba_weighting_enabled\fR (int)
575.ad
576.RS 12n
577Give more weight to metaslabs with lower LBAs, assuming they have
578greater bandwidth as is typically the case on a modern constant
579angular velocity disk drive.
580.sp
581Use \fB1\fR for yes (default) and \fB0\fR for no.
582.RE
583
eef0f4d8
PD
584.sp
585.ne 2
586.na
587\fBmetaslab_unload_delay\fR (int)
588.ad
589.RS 12n
590After a metaslab is used, we keep it loaded for this many txgs, to attempt to
591reduce unnecessary reloading. Note that both this many txgs and
592\fBmetaslab_unload_delay_ms\fR milliseconds must pass before unloading will
593occur.
594.sp
595Default value: \fB32\fR.
596.RE
597
598.sp
599.ne 2
600.na
601\fBmetaslab_unload_delay_ms\fR (int)
602.ad
603.RS 12n
604After a metaslab is used, we keep it loaded for this many milliseconds, to
605attempt to reduce unnecessary reloading. Note that both this many
606milliseconds and \fBmetaslab_unload_delay\fR txgs must pass before unloading
607will occur.
608.sp
609Default value: \fB600000\fR (ten minutes).
610.RE
611
6ce7b2d9
RL
612.sp
613.ne 2
614.na
615\fBsend_holes_without_birth_time\fR (int)
616.ad
617.RS 12n
618When set, the hole_birth optimization will not be used, and all holes will
d0c3aa9c
TC
619always be sent on zfs send. This is useful if you suspect your datasets are
620affected by a bug in hole_birth.
6ce7b2d9
RL
621.sp
622Use \fB1\fR for on (default) and \fB0\fR for off.
623.RE
624
29714574
TF
625.sp
626.ne 2
627.na
628\fBspa_config_path\fR (charp)
629.ad
630.RS 12n
631SPA config file
632.sp
633Default value: \fB/etc/zfs/zpool.cache\fR.
634.RE
635
e8b96c60
MA
636.sp
637.ne 2
638.na
639\fBspa_asize_inflation\fR (int)
640.ad
641.RS 12n
642Multiplication factor used to estimate actual disk consumption from the
643size of data being written. The default value is a worst case estimate,
644but lower values may be valid for a given pool depending on its
645configuration. Pool administrators who understand the factors involved
646may wish to specify a more realistic inflation factor, particularly if
647they operate close to quota or capacity limits.
648.sp
83426735 649Default value: \fB24\fR.
e8b96c60
MA
650.RE
651
6cb8e530
PZ
652.sp
653.ne 2
654.na
655\fBspa_load_print_vdev_tree\fR (int)
656.ad
657.RS 12n
658Whether to print the vdev tree in the debugging message buffer during pool import.
659Use 0 to disable and 1 to enable.
660.sp
661Default value: \fB0\fR.
662.RE
663
dea377c0
MA
664.sp
665.ne 2
666.na
667\fBspa_load_verify_data\fR (int)
668.ad
669.RS 12n
670Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR)
671import. Use 0 to disable and 1 to enable.
672
673An extreme rewind import normally performs a full traversal of all
674blocks in the pool for verification. If this parameter is set to 0,
675the traversal skips non-metadata blocks. It can be toggled once the
676import has started to stop or start the traversal of non-metadata blocks.
677.sp
83426735 678Default value: \fB1\fR.
dea377c0
MA
679.RE
680
681.sp
682.ne 2
683.na
684\fBspa_load_verify_metadata\fR (int)
685.ad
686.RS 12n
687Whether to traverse blocks during an "extreme rewind" (\fB-X\fR)
688pool import. Use 0 to disable and 1 to enable.
689
690An extreme rewind import normally performs a full traversal of all
1c012083 691blocks in the pool for verification. If this parameter is set to 0,
dea377c0
MA
692the traversal is not performed. It can be toggled once the import has
693started to stop or start the traversal.
694.sp
83426735 695Default value: \fB1\fR.
dea377c0
MA
696.RE
697
698.sp
699.ne 2
700.na
c8242a96 701\fBspa_load_verify_shift\fR (int)
dea377c0
MA
702.ad
703.RS 12n
c8242a96 704Sets the maximum number of bytes to consume during pool import to the log2
77f6826b 705fraction of the target ARC size.
dea377c0 706.sp
c8242a96 707Default value: \fB4\fR.
dea377c0
MA
708.RE
709
6cde6435
BB
710.sp
711.ne 2
712.na
713\fBspa_slop_shift\fR (int)
714.ad
715.RS 12n
716Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space
717in the pool to be consumed. This ensures that we don't run the pool
718completely out of space, due to unaccounted changes (e.g. to the MOS).
719It also limits the worst-case time to allocate space. If we have
720less than this amount of free space, most ZPL operations (e.g. write,
721create) will return ENOSPC.
722.sp
83426735 723Default value: \fB5\fR.
6cde6435
BB
724.RE
725
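.sp
Worked example (illustrative arithmetic only): the default
\fBspa_slop_shift\fR of 5 reserves 1/2^5 = 3.125% of the pool, i.e. about
312 GB of slop space on a 10 TB pool. Raising the shift to 6 would halve the
reservation to 1/2^6 = 1.5625%.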
0dc2f70c
MA
726.sp
727.ne 2
728.na
729\fBvdev_removal_max_span\fR (int)
730.ad
731.RS 12n
732During top-level vdev removal, chunks of data are copied from the vdev
733which may include free space in order to trade bandwidth for IOPS.
734This parameter determines the maximum span of free space (in bytes)
735which will be included as "unnecessary" data in a chunk of copied data.
736
737The default value here was chosen to align with
738\fBzfs_vdev_read_gap_limit\fR, which is a similar concept when doing
739regular reads (but there's no reason it has to be the same).
740.sp
741Default value: \fB32,768\fR.
742.RE
743
d9b4bf06
MA
744.sp
745.ne 2
746.na
747\fBzap_iterate_prefetch\fR (int)
748.ad
749.RS 12n
750If this is set, when we start iterating over a ZAP object, zfs will prefetch
751the entire object (all leaf blocks). However, this is limited by
752\fBdmu_prefetch_max\fR.
753.sp
754Use \fB1\fR for on (default) and \fB0\fR for off.
755.RE
756
29714574
TF
757.sp
758.ne 2
759.na
760\fBzfetch_array_rd_sz\fR (ulong)
761.ad
762.RS 12n
27b293be 763If prefetching is enabled, disable prefetching for reads larger than this size.
29714574
TF
764.sp
765Default value: \fB1,048,576\fR.
766.RE
767
768.sp
769.ne 2
770.na
7f60329a 771\fBzfetch_max_distance\fR (uint)
29714574
TF
772.ad
773.RS 12n
7f60329a 774Max bytes to prefetch per stream (default 8MB).
29714574 775.sp
7f60329a 776Default value: \fB8,388,608\fR.
29714574
TF
777.RE
778
779.sp
780.ne 2
781.na
782\fBzfetch_max_streams\fR (uint)
783.ad
784.RS 12n
27b293be 785Max number of streams per zfetch (prefetch streams per file).
29714574
TF
786.sp
787Default value: \fB8\fR.
788.RE
789
790.sp
791.ne 2
792.na
793\fBzfetch_min_sec_reap\fR (uint)
794.ad
795.RS 12n
27b293be 796Min time before an active prefetch stream can be reclaimed
29714574
TF
797.sp
798Default value: \fB2\fR.
799.RE
800
67709516
D
801.sp
802.ne 2
803.na
804\fBzfs_abd_scatter_enabled\fR (int)
805.ad
806.RS 12n
Enables the use of scatter/gather lists for ARC data buffers. Disabling this
forces all allocations to be linear in kernel memory, which can improve
performance in some code paths at the expense of fragmented kernel memory.
810.sp
811Default value: \fB1\fR.
812.RE
813
814.sp
815.ne 2
816.na
\fBzfs_abd_scatter_max_order\fR (uint)
818.ad
819.RS 12n
820Maximum number of consecutive memory pages allocated in a single block for
821scatter/gather lists. Default value is specified by the kernel itself.
822.sp
823Default value: \fB10\fR at the time of this writing.
824.RE
825
87c25d56
MA
826.sp
827.ne 2
828.na
829\fBzfs_abd_scatter_min_size\fR (uint)
830.ad
831.RS 12n
832This is the minimum allocation size that will use scatter (page-based)
833ABD's. Smaller allocations will use linear ABD's.
834.sp
835Default value: \fB1536\fR (512B and 1KB allocations will be linear).
836.RE
837
25458cbe
TC
838.sp
839.ne 2
840.na
841\fBzfs_arc_dnode_limit\fR (ulong)
842.ad
843.RS 12n
When the number of bytes consumed by dnodes in the ARC exceeds this number of
bytes, try to unpin some of it in response to demand for non-metadata. This
value acts as a ceiling to the amount of dnode metadata, and defaults to 0,
which indicates that a percentage of the ARC meta buffers, based on
\fBzfs_arc_dnode_limit_percent\fR, may be used for dnodes.
25458cbe
TC
849
850See also \fBzfs_arc_meta_prune\fR which serves a similar purpose but is used
851when the amount of metadata in the ARC exceeds \fBzfs_arc_meta_limit\fR rather
852than in response to overall demand for non-metadata.
853
854.sp
9907cc1c
G
855Default value: \fB0\fR.
856.RE
857
858.sp
859.ne 2
860.na
861\fBzfs_arc_dnode_limit_percent\fR (ulong)
862.ad
863.RS 12n
864Percentage that can be consumed by dnodes of ARC meta buffers.
865.sp
866See also \fBzfs_arc_dnode_limit\fR which serves a similar purpose but has a
867higher priority if set to nonzero value.
868.sp
be54a13c 869Default value: \fB10\fR%.
25458cbe
TC
870.RE
871
872.sp
873.ne 2
874.na
875\fBzfs_arc_dnode_reduce_percent\fR (ulong)
876.ad
877.RS 12n
878Percentage of ARC dnodes to try to scan in response to demand for non-metadata
6146e17e 879when the number of bytes consumed by dnodes exceeds \fBzfs_arc_dnode_limit\fR.
25458cbe
TC
880
881.sp
be54a13c 882Default value: \fB10\fR% of the number of dnodes in the ARC.
25458cbe
TC
883.RE
884
49ddb315
MA
885.sp
886.ne 2
887.na
888\fBzfs_arc_average_blocksize\fR (int)
889.ad
890.RS 12n
891The ARC's buffer hash table is sized based on the assumption of an average
892block size of \fBzfs_arc_average_blocksize\fR (default 8K). This works out
893to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers.
894For configurations with a known larger average block size this value can be
895increased to reduce the memory footprint.
896
897.sp
898Default value: \fB8192\fR.
899.RE
900
3442c2a0
MA
901.sp
902.ne 2
903.na
904\fBzfs_arc_eviction_pct\fR (int)
905.ad
906.RS 12n
907When \fBarc_is_overflowing()\fR, \fBarc_get_data_impl()\fR waits for this
908percent of the requested amount of data to be evicted. For example, by
909default for every 2KB that's evicted, 1KB of it may be "reused" by a new
910allocation. Since this is above 100%, it ensures that progress is made
911towards getting \fBarc_size\fR under \fBarc_c\fR. Since this is finite, it
912ensures that allocations can still happen, even during the potentially long
913time that \fBarc_size\fR is more than \fBarc_c\fR.
914.sp
915Default value: \fB200\fR.
916.RE
917
ca0bf58d
PS
918.sp
919.ne 2
920.na
921\fBzfs_arc_evict_batch_limit\fR (int)
922.ad
923.RS 12n
8f343973 924Number ARC headers to evict per sub-list before proceeding to another sub-list.
ca0bf58d
PS
925This batch-style operation prevents entire sub-lists from being evicted at once
926but comes at a cost of additional unlocking and locking.
927.sp
928Default value: \fB10\fR.
929.RE
930
29714574
TF
931.sp
932.ne 2
933.na
934\fBzfs_arc_grow_retry\fR (int)
935.ad
936.RS 12n
If set to a non-zero value, it will replace the arc_grow_retry value with this value.
The arc_grow_retry value (default 5) is the number of seconds the ARC will wait before
trying to resume growth after a memory pressure event.
29714574 940.sp
ca85d690 941Default value: \fB0\fR.
29714574
TF
942.RE
943
944.sp
945.ne 2
946.na
7e8bddd0 947\fBzfs_arc_lotsfree_percent\fR (int)
29714574
TF
948.ad
949.RS 12n
7e8bddd0
BB
950Throttle I/O when free system memory drops below this percentage of total
951system memory. Setting this value to 0 will disable the throttle.
29714574 952.sp
be54a13c 953Default value: \fB10\fR%.
29714574
TF
954.RE
955
956.sp
957.ne 2
958.na
7e8bddd0 959\fBzfs_arc_max\fR (ulong)
29714574
TF
960.ad
961.RS 12n
9a51738b
RM
Max size of ARC in bytes. If set to 0 then the max size of ARC is determined
by the amount of system memory installed. For Linux, 1/2 of system memory will
be used as the limit. For FreeBSD, the larger of all system memory minus 1GB,
or 5/8 of system memory, will be used as the limit. This value must be at least
67108864 (64 megabytes).
83426735
D
967.sp
968This value can be changed dynamically with some caveats. It cannot be set back
969to 0 while running and reducing it below the current ARC size will not cause
970the ARC to shrink without memory pressure to induce shrinking.
29714574 971.sp
7e8bddd0 972Default value: \fB0\fR.
29714574
TF
973.RE
974
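.sp
For example (an illustrative sketch assuming the standard Linux module
parameter layout; the 4 GiB value is arbitrary), \fBzfs_arc_max\fR can be
changed at runtime through sysfs, or set persistently through a modprobe.d
options file:
.sp
.nf
# echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
# echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf
.fi
.sp
As noted above, lowering the value below the current ARC size will not shrink
the ARC without memory pressure.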
ca85d690 975.sp
976.ne 2
977.na
978\fBzfs_arc_meta_adjust_restarts\fR (ulong)
979.ad
980.RS 12n
The number of restart passes to make while scanning the ARC, attempting
to free buffers in order to stay below the \fBzfs_arc_meta_limit\fR.
983This value should not need to be tuned but is available to facilitate
984performance analysis.
985.sp
986Default value: \fB4096\fR.
987.RE
988
29714574
TF
989.sp
990.ne 2
991.na
992\fBzfs_arc_meta_limit\fR (ulong)
993.ad
994.RS 12n
2cbb06b5
BB
The maximum size in bytes that meta data buffers are allowed to
consume in the ARC. When this limit is reached meta data buffers will
be reclaimed even if the overall arc_c_max has not been reached. This
value defaults to 0, which indicates that a percentage of the ARC, based on
\fBzfs_arc_meta_limit_percent\fR, may be used for meta data.
.sp
This value may be changed dynamically except that it cannot be set back to 0
for a specific percent of the ARC; it must be set to an explicit value.
83426735 1003.sp
29714574
TF
1004Default value: \fB0\fR.
1005.RE
1006
9907cc1c
G
1007.sp
1008.ne 2
1009.na
1010\fBzfs_arc_meta_limit_percent\fR (ulong)
1011.ad
1012.RS 12n
1013Percentage of ARC buffers that can be used for meta data.
1014
1015See also \fBzfs_arc_meta_limit\fR which serves a similar purpose but has a
1016higher priority if set to nonzero value.
1017
1018.sp
be54a13c 1019Default value: \fB75\fR%.
9907cc1c
G
1020.RE
1021
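.sp
Worked example combining the percentage-based meta data limits (illustrative
arithmetic only): with a 4 GiB ARC target, the default
\fBzfs_arc_meta_limit_percent\fR of 75% allows up to 3 GiB of meta data in the
ARC, and the default \fBzfs_arc_dnode_limit_percent\fR of 10% in turn allows
dnodes to consume up to roughly 307 MiB of those meta buffers.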
ca0bf58d
PS
1022.sp
1023.ne 2
1024.na
1025\fBzfs_arc_meta_min\fR (ulong)
1026.ad
1027.RS 12n
The minimum allowed size in bytes that meta data buffers may consume in
the ARC. This value defaults to 0, which disables a floor on the amount
of the ARC devoted to meta data.
1031.sp
1032Default value: \fB0\fR.
1033.RE
1034
29714574
TF
1035.sp
1036.ne 2
1037.na
1038\fBzfs_arc_meta_prune\fR (int)
1039.ad
1040.RS 12n
2cbb06b5
BB
1041The number of dentries and inodes to be scanned looking for entries
1042which can be dropped. This may be required when the ARC reaches the
1043\fBzfs_arc_meta_limit\fR because dentries and inodes can pin buffers
in the ARC. Increasing this value will cause the dentry and inode caches
to be pruned more aggressively. Setting this value to 0 will disable
1046pruning the inode and dentry caches.
29714574 1047.sp
2cbb06b5 1048Default value: \fB10,000\fR.
29714574
TF
1049.RE
1050
bc888666
BB
1051.sp
1052.ne 2
1053.na
ca85d690 1054\fBzfs_arc_meta_strategy\fR (int)
bc888666
BB
1055.ad
1056.RS 12n
ca85d690 1057Define the strategy for ARC meta data buffer eviction (meta reclaim strategy).
1058A value of 0 (META_ONLY) will evict only the ARC meta data buffers.
A value of 1 (BALANCED) indicates that additional data buffers may be evicted if
that is required in order to evict the required number of meta data buffers.
bc888666 1061.sp
ca85d690 1062Default value: \fB1\fR.
bc888666
BB
1063.RE
1064
29714574
TF
1065.sp
1066.ne 2
1067.na
1068\fBzfs_arc_min\fR (ulong)
1069.ad
1070.RS 12n
77f6826b 1071Min size of ARC in bytes. If set to 0 then arc_c_min will default to
ca85d690 1072consuming the larger of 32M or 1/32 of total system memory.
29714574 1073.sp
ca85d690 1074Default value: \fB0\fR.
29714574
TF
1075.RE
1076
1077.sp
1078.ne 2
1079.na
d4a72f23 1080\fBzfs_arc_min_prefetch_ms\fR (int)
29714574
TF
1081.ad
1082.RS 12n
d4a72f23 1083Minimum time prefetched blocks are locked in the ARC, specified in ms.
2b84817f 1084A value of \fB0\fR will default to 1000 ms.
d4a72f23
TC
1085.sp
1086Default value: \fB0\fR.
1087.RE
1088
1089.sp
1090.ne 2
1091.na
1092\fBzfs_arc_min_prescient_prefetch_ms\fR (int)
1093.ad
1094.RS 12n
1095Minimum time "prescient prefetched" blocks are locked in the ARC, specified
ac3d4d0c 1096in ms. These blocks are meant to be prefetched fairly aggressively ahead of
2b84817f 1097the code that may use them. A value of \fB0\fR will default to 6000 ms.
29714574 1098.sp
83426735 1099Default value: \fB0\fR.
29714574
TF
1100.RE
1101
6cb8e530
PZ
1102.sp
1103.ne 2
1104.na
1105\fBzfs_max_missing_tvds\fR (int)
1106.ad
1107.RS 12n
1108Number of missing top-level vdevs which will be allowed during
1109pool import (only in read-only mode).
1110.sp
1111Default value: \fB0\fR
1112.RE
1113
009cc8e8
RM
1114.sp
1115.ne 2
1116.na
1117\fBzfs_max_nvlist_src_size\fR (ulong)
1118.ad
1119.RS 12n
1120Maximum size in bytes allowed to be passed as zc_nvlist_src_size for ioctls on
1121/dev/zfs. This prevents a user from causing the kernel to allocate an excessive
1122amount of memory. When the limit is exceeded, the ioctl fails with EINVAL and a
1123description of the error is sent to the zfs-dbgmsg log. This parameter should
1124not need to be touched under normal circumstances. On FreeBSD, the default is
1125based on the system limit on user wired memory. On Linux, the default is
\fBKMALLOC_MAX_SIZE\fR.
1127.sp
1128Default value: \fB0\fR (kernel decides)
1129.RE
1130
ca0bf58d
PS
1131.sp
1132.ne 2
1133.na
c30e58c4 1134\fBzfs_multilist_num_sublists\fR (int)
ca0bf58d
PS
1135.ad
1136.RS 12n
To allow more fine-grained locking, each ARC state contains a series
of lists for both data and meta data objects. Locking is performed at
the level of these "sub-lists". This parameter controls the number of
sub-lists per ARC state, and also applies to other uses of the
multilist data structure.
ca0bf58d 1142.sp
c30e58c4 1143Default value: \fB4\fR or the number of online CPUs, whichever is greater
ca0bf58d
PS
1144.RE
1145
1146.sp
1147.ne 2
1148.na
1149\fBzfs_arc_overflow_shift\fR (int)
1150.ad
1151.RS 12n
1152The ARC size is considered to be overflowing if it exceeds the current
1153ARC target size (arc_c) by a threshold determined by this parameter.
1154The threshold is calculated as a fraction of arc_c using the formula
1155"arc_c >> \fBzfs_arc_overflow_shift\fR".
1156
1157The default value of 8 causes the ARC to be considered to be overflowing
1158if it exceeds the target size by 1/256th (0.3%) of the target size.
1159
1160When the ARC is overflowing, new buffer allocations are stalled until
1161the reclaim thread catches up and the overflow condition no longer exists.
1162.sp
1163Default value: \fB8\fR.
1164.RE
1165
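.sp
Worked example (illustrative arithmetic only): with an ARC target size (arc_c)
of 4 GiB and the default \fBzfs_arc_overflow_shift\fR of 8, the overflow
threshold is 4 GiB >> 8 = 16 MiB, so the ARC is considered to be overflowing
once its size exceeds the target by more than 16 MiB.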
728d6ae9
BB
1166.sp
1167.ne 2
1168.na
1169
1170\fBzfs_arc_p_min_shift\fR (int)
1171.ad
1172.RS 12n
If set to a non-zero value, this will update arc_p_min_shift (default 4)
with the new value.
arc_p_min_shift is used as a shift of arc_c when calculating both the minimum
and maximum arc_p.
728d6ae9 1177.sp
ca85d690 1178Default value: \fB0\fR.
728d6ae9
BB
1179.RE
1180
62422785
PS
1181.sp
1182.ne 2
1183.na
1184\fBzfs_arc_p_dampener_disable\fR (int)
1185.ad
1186.RS 12n
1187Disable arc_p adapt dampener
1188.sp
1189Use \fB1\fR for yes (default) and \fB0\fR to disable.
1190.RE
1191
29714574
TF
1192.sp
1193.ne 2
1194.na
1195\fBzfs_arc_shrink_shift\fR (int)
1196.ad
1197.RS 12n
If set to a non-zero value, this will update arc_shrink_shift (default 7)
1199with the new value.
29714574 1200.sp
ca85d690 1201Default value: \fB0\fR.
29714574
TF
1202.RE
1203
03b60eee
DB
1204.sp
1205.ne 2
1206.na
1207\fBzfs_arc_pc_percent\fR (uint)
1208.ad
1209.RS 12n
Percent of pagecache to reclaim ARC to.

This tunable allows the ZFS ARC to play more nicely with the kernel's LRU
pagecache. It can guarantee that the ARC size won't collapse under scanning
pressure on the pagecache, yet still allows the ARC to be reclaimed down to
zfs_arc_min if necessary. This value is specified as a percent of pagecache
size (as measured by NR_FILE_PAGES), where that percent may exceed 100. This
only operates during memory pressure/reclaim.
1218.sp
be54a13c 1219Default value: \fB0\fR% (disabled).
03b60eee
DB
1220.RE
1221
3442c2a0
MA
1222.sp
1223.ne 2
1224.na
1225\fBzfs_arc_shrinker_limit\fR (int)
1226.ad
1227.RS 12n
1228This is a limit on how many pages the ARC shrinker makes available for
1229eviction in response to one page allocation attempt. Note that in
1230practice, the kernel's shrinker can ask us to evict up to about 4x this
1231for one allocation attempt.
1232.sp
1233The default limit of 10,000 (in practice, 160MB per allocation attempt with
12344K pages) limits the amount of time spent attempting to reclaim ARC memory to
1235less than 100ms per allocation attempt, even with a small average compressed
1236block size of ~8KB.
1237.sp
1238The parameter can be set to 0 (zero) to disable the limit.
1239.sp
1240This parameter only applies on Linux.
1241.sp
1242Default value: \fB10,000\fR.
1243.RE
1244
11f552fa
BB
1245.sp
1246.ne 2
1247.na
1248\fBzfs_arc_sys_free\fR (ulong)
1249.ad
1250.RS 12n
1251The target number of bytes the ARC should leave as free memory on the system.
1252Defaults to the larger of 1/64 of physical memory or 512K. Setting this
1253option to a non-zero value will override the default.
1254.sp
1255Default value: \fB0\fR.
1256.RE
1257
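.sp
Worked example (illustrative arithmetic only): on a machine with 64 GiB of
physical memory the default works out to 64 GiB / 64 = 1 GiB, i.e. the ARC
attempts to leave roughly 1 GiB of memory free for the rest of the system.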
29714574
TF
1258.sp
1259.ne 2
1260.na
1261\fBzfs_autoimport_disable\fR (int)
1262.ad
1263.RS 12n
27b293be 1264Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR).
29714574 1265.sp
70081096 1266Use \fB1\fR for yes (default) and \fB0\fR for no.
29714574
TF
1267.RE
1268
80d52c39
TH
1269.sp
1270.ne 2
1271.na
67709516 1272\fBzfs_checksum_events_per_second\fR (uint)
80d52c39
TH
1273.ad
1274.RS 12n
1275Rate limit checksum events to this many per second. Note that this should
1276not be set below the zed thresholds (currently 10 checksums over 10 sec)
1277or else zed may not trigger any action.
1278.sp
Default value: \fB20\fR.
1280.RE
1281
2fe61a7e
PS
1282.sp
1283.ne 2
1284.na
1285\fBzfs_commit_timeout_pct\fR (int)
1286.ad
1287.RS 12n
1288This controls the amount of time that a ZIL block (lwb) will remain "open"
1289when it isn't "full", and it has a thread waiting for it to be committed to
1290stable storage. The timeout is scaled based on a percentage of the last lwb
1291latency to avoid significantly impacting the latency of each individual
1292transaction record (itx).
1293.sp
be54a13c 1294Default value: \fB5\fR%.
2fe61a7e
PS
1295.RE
1296
67709516
D
1297.sp
1298.ne 2
1299.na
1300\fBzfs_condense_indirect_commit_entry_delay_ms\fR (int)
1301.ad
1302.RS 12n
1303Vdev indirection layer (used for device removal) sleeps for this many
1304milliseconds during mapping generation. Intended for use with the test suite
1305to throttle vdev removal speed.
1306.sp
1307Default value: \fB0\fR (no throttle).
1308.RE
1309
0dc2f70c
MA
1310.sp
1311.ne 2
1312.na
1313\fBzfs_condense_indirect_vdevs_enable\fR (int)
1314.ad
1315.RS 12n
1316Enable condensing indirect vdev mappings. When set to a non-zero value,
1317attempt to condense indirect vdev mappings if the mapping uses more than
1318\fBzfs_condense_min_mapping_bytes\fR bytes of memory and if the obsolete
1319space map object uses more than \fBzfs_condense_max_obsolete_bytes\fR
1320bytes on-disk. The condensing process is an attempt to save memory by
1321removing obsolete mappings.
1322.sp
1323Default value: \fB1\fR.
1324.RE
1325
1326.sp
1327.ne 2
1328.na
1329\fBzfs_condense_max_obsolete_bytes\fR (ulong)
1330.ad
1331.RS 12n
1332Only attempt to condense indirect vdev mappings if the on-disk size
1333of the obsolete space map object is greater than this number of bytes
(see \fBzfs_condense_indirect_vdevs_enable\fR).
1335.sp
1336Default value: \fB1,073,741,824\fR.
1337.RE
1338
1339.sp
1340.ne 2
1341.na
1342\fBzfs_condense_min_mapping_bytes\fR (ulong)
1343.ad
1344.RS 12n
1345Minimum size vdev mapping to attempt to condense (see
1346\fBzfs_condense_indirect_vdevs_enable\fR).
1347.sp
1348Default value: \fB131,072\fR.
1349.RE
1350
3b36f831
BB
1351.sp
1352.ne 2
1353.na
1354\fBzfs_dbgmsg_enable\fR (int)
1355.ad
1356.RS 12n
1357Internally ZFS keeps a small log to facilitate debugging. By default the log
1358is disabled, to enable it set this option to 1. The contents of the log can
1359be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to
1360this proc file clears the log.
1361.sp
1362Default value: \fB0\fR.
1363.RE
1364
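.sp
For example (an illustrative sketch assuming the standard Linux module
parameter layout), the debug log can be enabled, read and cleared as follows:
.sp
.nf
# echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
# cat /proc/spl/kstat/zfs/dbgmsg
# echo 0 > /proc/spl/kstat/zfs/dbgmsg
.fi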
1365.sp
1366.ne 2
1367.na
1368\fBzfs_dbgmsg_maxsize\fR (int)
1369.ad
1370.RS 12n
1371The maximum size in bytes of the internal ZFS debug log.
1372.sp
1373Default value: \fB4M\fR.
1374.RE
1375
29714574
TF
1376.sp
1377.ne 2
1378.na
1379\fBzfs_dbuf_state_index\fR (int)
1380.ad
1381.RS 12n
83426735
D
1382This feature is currently unused. It is normally used for controlling what
1383reporting is available under /proc/spl/kstat/zfs.
29714574
TF
1384.sp
1385Default value: \fB0\fR.
1386.RE
1387
1388.sp
1389.ne 2
1390.na
1391\fBzfs_deadman_enabled\fR (int)
1392.ad
1393.RS 12n
b81a3ddc 1394When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR
8fb1ede1
BB
1395milliseconds, or when an individual I/O takes longer than
1396\fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to
1397be "hung". If \fBzfs_deadman_enabled\fR is set then the deadman behavior is
1398invoked as described by the \fBzfs_deadman_failmode\fR module option.
1399By default the deadman is enabled and configured to \fBwait\fR which results
1400in "hung" I/Os only being logged. The deadman is automatically disabled
1401when a pool gets suspended.
29714574 1402.sp
8fb1ede1
BB
1403Default value: \fB1\fR.
1404.RE
1405
1406.sp
1407.ne 2
1408.na
1409\fBzfs_deadman_failmode\fR (charp)
1410.ad
1411.RS 12n
1412Controls the failure behavior when the deadman detects a "hung" I/O. Valid
1413values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR.
1414.sp
1415\fBwait\fR - Wait for a "hung" I/O to complete. For each "hung" I/O a
1416"deadman" event will be posted describing that I/O.
1417.sp
1418\fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it
1419to the I/O pipeline if possible.
1420.sp
1421\fBpanic\fR - Panic the system. This can be used to facilitate an automatic
1422fail-over to a properly configured fail-over partner.
1423.sp
1424Default value: \fBwait\fR.
b81a3ddc
TC
1425.RE
1426
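.sp
For example (an illustrative sketch assuming the standard Linux module
parameter layout), a node in a fail-over pair might select the \fBpanic\fR
behavior at runtime or persistently:
.sp
.nf
# echo panic > /sys/module/zfs/parameters/zfs_deadman_failmode
# echo "options zfs zfs_deadman_failmode=panic" >> /etc/modprobe.d/zfs.conf
.fi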
1427.sp
1428.ne 2
1429.na
1430\fBzfs_deadman_checktime_ms\fR (int)
1431.ad
1432.RS 12n
8fb1ede1
BB
1433Check time in milliseconds. This defines the frequency at which we check
1434for hung I/O and potentially invoke the \fBzfs_deadman_failmode\fR behavior.
b81a3ddc 1435.sp
8fb1ede1 1436Default value: \fB60,000\fR.
29714574
TF
1437.RE
1438
1439.sp
1440.ne 2
1441.na
e8b96c60 1442\fBzfs_deadman_synctime_ms\fR (ulong)
29714574
TF
1443.ad
1444.RS 12n
b81a3ddc 1445Interval in milliseconds after which the deadman is triggered and also
8fb1ede1
BB
1446the interval after which a pool sync operation is considered to be "hung".
1447Once this limit is exceeded the deadman will be invoked every
1448\fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes.
1449.sp
1450Default value: \fB600,000\fR.
1451.RE
b81a3ddc 1452
29714574 1453.sp
8fb1ede1
BB
1454.ne 2
1455.na
1456\fBzfs_deadman_ziotime_ms\fR (ulong)
1457.ad
1458.RS 12n
1459Interval in milliseconds after which the deadman is triggered and an
ad796b8a 1460individual I/O operation is considered to be "hung". As long as the I/O
8fb1ede1
BB
1461remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR
1462milliseconds until the I/O completes.
1463.sp
1464Default value: \fB300,000\fR.
29714574
TF
1465.RE
1466
1467.sp
1468.ne 2
1469.na
1470\fBzfs_dedup_prefetch\fR (int)
1471.ad
1472.RS 12n
Enable prefetching of dedup-ed blocks.
1474.sp
0dfc7324 1475Use \fB1\fR for yes and \fB0\fR to disable (default).
29714574
TF
1476.RE
1477
e8b96c60
MA
1478.sp
1479.ne 2
1480.na
1481\fBzfs_delay_min_dirty_percent\fR (int)
1482.ad
1483.RS 12n
1484Start to delay each transaction once there is this amount of dirty data,
1485expressed as a percentage of \fBzfs_dirty_data_max\fR.
1486This value should be >= zfs_vdev_async_write_active_max_dirty_percent.
1487See the section "ZFS TRANSACTION DELAY".
1488.sp
be54a13c 1489Default value: \fB60\fR%.
e8b96c60
MA
1490.RE
1491
1492.sp
1493.ne 2
1494.na
1495\fBzfs_delay_scale\fR (int)
1496.ad
1497.RS 12n
1498This controls how quickly the transaction delay approaches infinity.
1499Larger values cause longer delays for a given amount of dirty data.
1500.sp
1501For the smoothest delay, this value should be about 1 billion divided
1502by the maximum number of operations per second. This will smoothly
1503handle between 10x and 1/10th this number.
1504.sp
1505See the section "ZFS TRANSACTION DELAY".
1506.sp
1507Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
1508.sp
1509Default value: \fB500,000\fR.
1510.RE
1511
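.sp
Worked example (illustrative arithmetic only): for a pool that can sustain
roughly 2,000 operations per second, the guideline above gives
1,000,000,000 / 2,000 = 500,000, which is the default value; per the 10x
rule of thumb above, that setting smoothly handles workloads between roughly
200 and 20,000 operations per second.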
67709516
D
1512.sp
1513.ne 2
1514.na
1515\fBzfs_disable_ivset_guid_check\fR (int)
1516.ad
1517.RS 12n
1518Disables requirement for IVset guids to be present and match when doing a raw
1519receive of encrypted datasets. Intended for users whose pools were created with
1520ZFS on Linux pre-release versions and now have compatibility issues.
1521.sp
1522Default value: \fB0\fR.
1523.RE
1524
1525.sp
1526.ne 2
1527.na
1528\fBzfs_key_max_salt_uses\fR (ulong)
1529.ad
1530.RS 12n
1531Maximum number of uses of a single salt value before generating a new one for
1532encrypted datasets. The default value is also the maximum that will be
1533accepted.
1534.sp
1535Default value: \fB400,000,000\fR.
1536.RE
1537
1538.sp
1539.ne 2
1540.na
1541\fBzfs_object_mutex_size\fR (uint)
1542.ad
1543.RS 12n
1544Size of the znode hashtable used for holds.
1545
1546Due to the need to hold locks on objects that may not exist yet, kernel mutexes
1547are not created per-object and instead a hashtable is used where collisions
1548will result in objects waiting when there is not actually contention on the
1549same object.
1550.sp
1551Default value: \fB64\fR.
1552.RE
1553
80d52c39
TH
1554.sp
1555.ne 2
1556.na
62ee31ad 1557\fBzfs_slow_io_events_per_second\fR (int)
80d52c39
TH
1558.ad
1559.RS 12n
ad796b8a 1560Rate limit delay zevents (which report slow I/Os) to this many per second.
80d52c39
TH
1561.sp
Default value: \fB20\fR.
1563.RE
1564
93e28d66
SD
1565.sp
1566.ne 2
1567.na
1568\fBzfs_unflushed_max_mem_amt\fR (ulong)
1569.ad
1570.RS 12n
1571Upper-bound limit for unflushed metadata changes to be held by the
1572log spacemap in memory (in bytes).
1573.sp
1574Default value: \fB1,073,741,824\fR (1GB).
1575.RE
1576
1577.sp
1578.ne 2
1579.na
1580\fBzfs_unflushed_max_mem_ppm\fR (ulong)
1581.ad
1582.RS 12n
1583Percentage of the overall system memory that ZFS allows to be used
1584for unflushed metadata changes by the log spacemap.
1585(value is calculated over 1000000 for finer granularity).
1586.sp
1587Default value: \fB1000\fR (which is divided by 1000000, resulting in
1588the limit to be \fB0.1\fR% of memory)
1589.RE
1590
1591.sp
1592.ne 2
1593.na
1594\fBzfs_unflushed_log_block_max\fR (ulong)
1595.ad
1596.RS 12n
1597Describes the maximum number of log spacemap blocks allowed for each pool.
1598The default value of 262144 means that the space in all the log spacemaps
1599can add up to no more than 262144 blocks (which means 32GB of logical
1600space before compression and ditto blocks, assuming that blocksize is
1601128k).
1602.sp
1603This tunable is important because it involves a trade-off between import
1604time after an unclean export and the frequency of flushing metaslabs.
1605The higher this number is, the more log blocks we allow when the pool is
1606active which means that we flush metaslabs less often and thus decrease
1607the number of I/Os for spacemap updates per TXG.
1608At the same time though, that means that in the event of an unclean export,
1609there will be more log spacemap blocks for us to read, inducing overhead
1610in the import time of the pool.
The lower the number, the more flushing occurs, destroying log blocks
more quickly as they become obsolete, which leaves fewer blocks
to be read during import time after a crash.
1614.sp
1615Each log spacemap block existing during pool import leads to approximately
1616one extra logical I/O issued.
1617This is the reason why this tunable is exposed in terms of blocks rather
1618than space used.
1619.sp
1620Default value: \fB262144\fR (256K).
1621.RE
1622
1623.sp
1624.ne 2
1625.na
1626\fBzfs_unflushed_log_block_min\fR (ulong)
1627.ad
1628.RS 12n
1629If the number of metaslabs is small and our incoming rate is high, we
1630could get into a situation that we are flushing all our metaslabs every
1631TXG.
1632Thus we always allow at least this many log blocks.
1633.sp
1634Default value: \fB1000\fR.
1635.RE
1636
1637.sp
1638.ne 2
1639.na
1640\fBzfs_unflushed_log_block_pct\fR (ulong)
1641.ad
1642.RS 12n
1643Tunable used to determine the number of blocks that can be used for
1644the spacemap log, expressed as a percentage of the total number of
1645metaslabs in the pool.
1646.sp
1647Default value: \fB400\fR (read as \fB400\fR% - meaning that the number
1648of log spacemap blocks are capped at 4 times the number of
1649metaslabs in the pool).
1650.RE
1651
dcec0a12
AP
1652.sp
1653.ne 2
1654.na
1655\fBzfs_unlink_suspend_progress\fR (uint)
1656.ad
1657.RS 12n
1658When enabled, files will not be asynchronously removed from the list of pending
1659unlinks and the space they consume will be leaked. Once this option has been
1660disabled and the dataset is remounted, the pending unlinks will be processed
1661and the freed space returned to the pool.
1662This option is used by the test suite to facilitate testing.
1663.sp
1664Uses \fB0\fR (default) to allow progress and \fB1\fR to pause progress.
1665.RE
1666
a966c564
K
1667.sp
1668.ne 2
1669.na
1670\fBzfs_delete_blocks\fR (ulong)
1671.ad
1672.RS 12n
This is used to define a large file for the purposes of delete. Files
containing more than \fBzfs_delete_blocks\fR blocks will be deleted asynchronously
1675while smaller files are deleted synchronously. Decreasing this value will
1676reduce the time spent in an unlink(2) system call at the expense of a longer
1677delay before the freed space is available.
1678.sp
1679Default value: \fB20,480\fR.
1680.RE
1681
e8b96c60
MA
1682.sp
1683.ne 2
1684.na
1685\fBzfs_dirty_data_max\fR (int)
1686.ad
1687.RS 12n
1688Determines the dirty space limit in bytes. Once this limit is exceeded, new
1689writes are halted until space frees up. This parameter takes precedence
1690over \fBzfs_dirty_data_max_percent\fR.
1691See the section "ZFS TRANSACTION DELAY".
1692.sp
be54a13c 1693Default value: \fB10\fR% of physical RAM, capped at \fBzfs_dirty_data_max_max\fR.
e8b96c60
MA
1694.RE
1695
1696.sp
1697.ne 2
1698.na
1699\fBzfs_dirty_data_max_max\fR (int)
1700.ad
1701.RS 12n
1702Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
1703This limit is only enforced at module load time, and will be ignored if
1704\fBzfs_dirty_data_max\fR is later changed. This parameter takes
1705precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
1706"ZFS TRANSACTION DELAY".
1707.sp
be54a13c 1708Default value: \fB25\fR% of physical RAM.
e8b96c60
MA
1709.RE
1710
1711.sp
1712.ne 2
1713.na
1714\fBzfs_dirty_data_max_max_percent\fR (int)
1715.ad
1716.RS 12n
1717Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
1718percentage of physical RAM. This limit is only enforced at module load
1719time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
1720The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
1721one. See the section "ZFS TRANSACTION DELAY".
1722.sp
be54a13c 1723Default value: \fB25\fR%.
e8b96c60
MA
1724.RE
1725
1726.sp
1727.ne 2
1728.na
1729\fBzfs_dirty_data_max_percent\fR (int)
1730.ad
1731.RS 12n
1732Determines the dirty space limit, expressed as a percentage of all
1733memory. Once this limit is exceeded, new writes are halted until space frees
1734up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
1735one. See the section "ZFS TRANSACTION DELAY".
1736.sp
be54a13c 1737Default value: \fB10\fR%, subject to \fBzfs_dirty_data_max_max\fR.
e8b96c60
MA
1738.RE
1739
1740.sp
1741.ne 2
1742.na
dfbe2675 1743\fBzfs_dirty_data_sync_percent\fR (int)
e8b96c60
MA
1744.ad
1745.RS 12n
dfbe2675
MA
1746Start syncing out a transaction group if there's at least this much dirty data
1747as a percentage of \fBzfs_dirty_data_max\fR. This should be less than
1748\fBzfs_vdev_async_write_active_min_dirty_percent\fR.
e8b96c60 1749.sp
dfbe2675 1750Default value: \fB20\fR% of \fBzfs_dirty_data_max\fR.
e8b96c60
MA
1751.RE
1752
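.sp
Worked example for the dirty data limits above (illustrative arithmetic only):
on a system with 32 GiB of RAM, \fBzfs_dirty_data_max\fR defaults to 10% of
RAM, i.e. about 3.2 GiB, well below the 25% cap imposed by
\fBzfs_dirty_data_max_max\fR. A transaction group then starts syncing once
dirty data reaches 20% of that limit (about 655 MiB), and transaction delays
begin at \fBzfs_delay_min_dirty_percent\fR, 60% of the limit (about 1.9 GiB).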
f734301d
AD
1753.sp
1754.ne 2
1755.na
1756\fBzfs_fallocate_reserve_percent\fR (uint)
1757.ad
1758.RS 12n
1759Since ZFS is a copy-on-write filesystem with snapshots, blocks cannot be
1760preallocated for a file in order to guarantee that later writes will not
1761run out of space. Instead, fallocate() space preallocation only checks
1762that sufficient space is currently available in the pool or the user's
1763project quota allocation, and then creates a sparse file of the requested
1764size. The requested space is multiplied by \fBzfs_fallocate_reserve_percent\fR
1765to allow additional space for indirect blocks and other internal metadata.
Setting this value to 0 disables support for fallocate(2) and causes
fallocate() space preallocation to return EOPNOTSUPP again.
1768.sp
1769Default value: \fB110\fR%
1770.RE
1771
1eeb4562
JX
1772.sp
1773.ne 2
1774.na
1775\fBzfs_fletcher_4_impl\fR (string)
1776.ad
1777.RS 12n
1778Select a fletcher 4 implementation.
1779.sp
35a76a03 1780Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
0b2a6423 1781\fBavx2\fR, \fBavx512f\fR, \fBavx512bw\fR, and \fBaarch64_neon\fR.
70b258fc
GN
1782All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
1783set extensions to be available and will only appear if ZFS detects that they are
1784present at runtime. If multiple implementations of fletcher 4 are available,
1785the \fBfastest\fR will be chosen using a micro benchmark. Selecting \fBscalar\fR
1786results in the original, CPU based calculation, being used. Selecting any option
1787other than \fBfastest\fR and \fBscalar\fR results in vector instructions from
1788the respective CPU instruction set being used.
1eeb4562
JX
1789.sp
1790Default value: \fBfastest\fR.
1791.RE
1792
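.sp
For example (an illustrative sketch; depending on how the modules are built
this parameter may be exposed by the \fBzcommon\fR module rather than
\fBzfs\fR), a specific implementation can be selected at runtime with:
.sp
.nf
# echo avx2 > /sys/module/zcommon/parameters/zfs_fletcher_4_impl
.fi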
ba5ad9a4
GW
1793.sp
1794.ne 2
1795.na
1796\fBzfs_free_bpobj_enabled\fR (int)
1797.ad
1798.RS 12n
1799Enable/disable the processing of the free_bpobj object.
1800.sp
1801Default value: \fB1\fR.
1802.RE
1803
36283ca2
MG
1804.sp
1805.ne 2
1806.na
a1d477c2 1807\fBzfs_async_block_max_blocks\fR (ulong)
36283ca2
MG
1808.ad
1809.RS 12n
1810Maximum number of blocks freed in a single txg.
1811.sp
4fe3a842
MA
1812Default value: \fBULONG_MAX\fR (unlimited).
1813.RE
1814
1815.sp
1816.ne 2
1817.na
1818\fBzfs_max_async_dedup_frees\fR (ulong)
1819.ad
1820.RS 12n
1821Maximum number of dedup blocks freed in a single txg.
1822.sp
36283ca2
MG
1823Default value: \fB100,000\fR.
1824.RE
1825
ca0845d5
PD
1826.sp
1827.ne 2
1828.na
1829\fBzfs_override_estimate_recordsize\fR (ulong)
1830.ad
1831.RS 12n
1832Record size calculation override for zfs send estimates.
1833.sp
1834Default value: \fB0\fR.
1835.RE
1836
e8b96c60
MA
1837.sp
1838.ne 2
1839.na
1840\fBzfs_vdev_async_read_max_active\fR (int)
1841.ad
1842.RS 12n
83426735 1843Maximum asynchronous read I/Os active to each device.
e8b96c60
MA
1844See the section "ZFS I/O SCHEDULER".
1845.sp
1846Default value: \fB3\fR.
1847.RE
1848
1849.sp
1850.ne 2
1851.na
1852\fBzfs_vdev_async_read_min_active\fR (int)
1853.ad
1854.RS 12n
1855Minimum asynchronous read I/Os active to each device.
1856See the section "ZFS I/O SCHEDULER".
1857.sp
1858Default value: \fB1\fR.
1859.RE
1860
1861.sp
1862.ne 2
1863.na
1864\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
1865.ad
1866.RS 12n
1867When the pool has more than
1868\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
1869\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
1870the dirty data is between min and max, the active I/O limit is linearly
1871interpolated. See the section "ZFS I/O SCHEDULER".
1872.sp
be54a13c 1873Default value: \fB60\fR%.
e8b96c60
MA
1874.RE
1875
1876.sp
1877.ne 2
1878.na
1879\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
1880.ad
1881.RS 12n
1882When the pool has less than
1883\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
1884\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
1885the dirty data is between min and max, the active I/O limit is linearly
1886interpolated. See the section "ZFS I/O SCHEDULER".
1887.sp
be54a13c 1888Default value: \fB30\fR%.
e8b96c60
MA
1889.RE
1890
1891.sp
1892.ne 2
1893.na
1894\fBzfs_vdev_async_write_max_active\fR (int)
1895.ad
1896.RS 12n
83426735 1897Maximum asynchronous write I/Os active to each device.
e8b96c60
MA
1898See the section "ZFS I/O SCHEDULER".
1899.sp
1900Default value: \fB10\fR.
1901.RE
1902
1903.sp
1904.ne 2
1905.na
1906\fBzfs_vdev_async_write_min_active\fR (int)
1907.ad
1908.RS 12n
1909Minimum asynchronous write I/Os active to each device.
1910See the section "ZFS I/O SCHEDULER".
1911.sp
06226b59
D
1912Lower values are associated with better latency on rotational media but poorer
1913resilver performance. The default value of 2 was chosen as a compromise. A
1914value of 3 has been shown to improve resilver performance further at a cost of
1915further increasing latency.
1916.sp
1917Default value: \fB2\fR.
e8b96c60
MA
1918.RE
1919
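.sp
Worked example of the interpolation described above (illustrative arithmetic
only): with the defaults, if dirty data sits at 45% of
\fBzfs_dirty_data_max\fR, halfway between the 30% minimum and 60% maximum
thresholds, the number of active asynchronous writes allowed per device is
interpolated halfway between \fBzfs_vdev_async_write_min_active\fR (2) and
\fBzfs_vdev_async_write_max_active\fR (10), i.e. 6.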
619f0976
GW
1920.sp
1921.ne 2
1922.na
1923\fBzfs_vdev_initializing_max_active\fR (int)
1924.ad
1925.RS 12n
1926Maximum initializing I/Os active to each device.
1927See the section "ZFS I/O SCHEDULER".
1928.sp
1929Default value: \fB1\fR.
1930.RE
1931
1932.sp
1933.ne 2
1934.na
1935\fBzfs_vdev_initializing_min_active\fR (int)
1936.ad
1937.RS 12n
1938Minimum initializing I/Os active to each device.
1939See the section "ZFS I/O SCHEDULER".
1940.sp
1941Default value: \fB1\fR.
1942.RE
1943
e8b96c60
MA
1944.sp
1945.ne 2
1946.na
1947\fBzfs_vdev_max_active\fR (int)
1948.ad
1949.RS 12n
1950The maximum number of I/Os active to each device. Ideally, this will be >=
1951the sum of each queue's max_active. It must be at least the sum of each
1952queue's min_active. See the section "ZFS I/O SCHEDULER".
1953.sp
1954Default value: \fB1,000\fR.
1955.RE
1956
9a49d3f3
BB
1957.sp
1958.ne 2
1959.na
1960\fBzfs_vdev_rebuild_max_active\fR (int)
1961.ad
1962.RS 12n
1963Maximum sequential resilver I/Os active to each device.
1964See the section "ZFS I/O SCHEDULER".
1965.sp
1966Default value: \fB3\fR.
1967.RE
1968
1969.sp
1970.ne 2
1971.na
1972\fBzfs_vdev_rebuild_min_active\fR (int)
1973.ad
1974.RS 12n
1975Minimum sequential resilver I/Os active to each device.
1976See the section "ZFS I/O SCHEDULER".
1977.sp
1978Default value: \fB1\fR.
1979.RE
1980
619f0976
GW
1981.sp
1982.ne 2
1983.na
1984\fBzfs_vdev_removal_max_active\fR (int)
1985.ad
1986.RS 12n
1987Maximum removal I/Os active to each device.
1988See the section "ZFS I/O SCHEDULER".
1989.sp
1990Default value: \fB2\fR.
1991.RE
1992
1993.sp
1994.ne 2
1995.na
1996\fBzfs_vdev_removal_min_active\fR (int)
1997.ad
1998.RS 12n
1999Minimum removal I/Os active to each device.
2000See the section "ZFS I/O SCHEDULER".
2001.sp
2002Default value: \fB1\fR.
2003.RE
2004
e8b96c60
MA
2005.sp
2006.ne 2
2007.na
2008\fBzfs_vdev_scrub_max_active\fR (int)
2009.ad
2010.RS 12n
83426735 2011Maximum scrub I/Os active to each device.
e8b96c60
MA
2012See the section "ZFS I/O SCHEDULER".
2013.sp
2014Default value: \fB2\fR.
2015.RE
2016
2017.sp
2018.ne 2
2019.na
2020\fBzfs_vdev_scrub_min_active\fR (int)
2021.ad
2022.RS 12n
2023Minimum scrub I/Os active to each device.
2024See the section "ZFS I/O SCHEDULER".
2025.sp
2026Default value: \fB1\fR.
2027.RE
2028
2029.sp
2030.ne 2
2031.na
2032\fBzfs_vdev_sync_read_max_active\fR (int)
2033.ad
2034.RS 12n
83426735 2035Maximum synchronous read I/Os active to each device.
e8b96c60
MA
2036See the section "ZFS I/O SCHEDULER".
2037.sp
2038Default value: \fB10\fR.
2039.RE
2040
2041.sp
2042.ne 2
2043.na
2044\fBzfs_vdev_sync_read_min_active\fR (int)
2045.ad
2046.RS 12n
2047Minimum synchronous read I/Os active to each device.
2048See the section "ZFS I/O SCHEDULER".
2049.sp
2050Default value: \fB10\fR.
2051.RE
2052
2053.sp
2054.ne 2
2055.na
2056\fBzfs_vdev_sync_write_max_active\fR (int)
2057.ad
2058.RS 12n
83426735 2059Maximum synchronous write I/Os active to each device.
e8b96c60
MA
2060See the section "ZFS I/O SCHEDULER".
2061.sp
2062Default value: \fB10\fR.
2063.RE
2064
2065.sp
2066.ne 2
2067.na
2068\fBzfs_vdev_sync_write_min_active\fR (int)
2069.ad
2070.RS 12n
2071Minimum synchronous write I/Os active to each device.
2072See the section "ZFS I/O SCHEDULER".
2073.sp
2074Default value: \fB10\fR.
2075.RE
2076
1b939560
BB
2077.sp
2078.ne 2
2079.na
2080\fBzfs_vdev_trim_max_active\fR (int)
2081.ad
2082.RS 12n
2083Maximum trim/discard I/Os active to each device.
2084See the section "ZFS I/O SCHEDULER".
2085.sp
2086Default value: \fB2\fR.
2087.RE
2088
2089.sp
2090.ne 2
2091.na
2092\fBzfs_vdev_trim_min_active\fR (int)
2093.ad
2094.RS 12n
2095Minimum trim/discard I/Os active to each device.
2096See the section "ZFS I/O SCHEDULER".
2097.sp
2098Default value: \fB1\fR.
2099.RE
2100
3dfb57a3
DB
2101.sp
2102.ne 2
2103.na
2104\fBzfs_vdev_queue_depth_pct\fR (int)
2105.ad
2106.RS 12n
e815485f
TC
2107Maximum number of queued allocations per top-level vdev expressed as
2108a percentage of \fBzfs_vdev_async_write_max_active\fR which allows the
2109system to detect devices that are more capable of handling allocations
2110and to allocate more blocks to those devices. It allows for dynamic
2111allocation distribution when devices are imbalanced as fuller devices
2112will tend to be slower than empty devices.
2113
2114See also \fBzio_dva_throttle_enabled\fR.
3dfb57a3 2115.sp
be54a13c 2116Default value: \fB1000\fR%.
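.sp
With the defaults this works out to 1000% of
\fBzfs_vdev_async_write_max_active\fR (10), i.e. roughly 100 queued
allocations per top-level vdev.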
3dfb57a3
DB
2117.RE
2118
29714574
TF
2119.sp
2120.ne 2
2121.na
2122\fBzfs_expire_snapshot\fR (int)
2123.ad
2124.RS 12n
2125Seconds to expire .zfs/snapshot
2126.sp
2127Default value: \fB300\fR.
2128.RE
2129
0500e835
BB
2130.sp
2131.ne 2
2132.na
2133\fBzfs_admin_snapshot\fR (int)
2134.ad
2135.RS 12n
2136Allow the creation, removal, or renaming of entries in the .zfs/snapshot
2137directory to cause the creation, destruction, or renaming of snapshots.
2138When enabled, this functionality works both locally and over NFS exports
2139which have the 'no_root_squash' option set. This functionality is disabled
2140by default.
2141.sp
2142Use \fB1\fR for yes and \fB0\fR for no (default).
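.sp
For example, assuming a hypothetical dataset mounted at /tank/home and this
parameter set to 1, snapshots could be managed directly through the filesystem:
.sp
 # Create a snapshot named "before-upgrade" of the dataset (hypothetical path).
 mkdir /tank/home/.zfs/snapshot/before-upgrade
 # Destroy that snapshot again by removing the directory.
 rmdir /tank/home/.zfs/snapshot/before-upgrade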
2143.RE
2144
29714574
TF
2145.sp
2146.ne 2
2147.na
2148\fBzfs_flags\fR (int)
2149.ad
2150.RS 12n
33b6dbbc
NB
2151Set additional debugging flags. The following flags may be bitwise-or'd
2152together.
2153.sp
2154.TS
2155box;
2156rB lB
2157lB lB
2158r l.
2159Value Symbolic Name
2160 Description
2161_
21621 ZFS_DEBUG_DPRINTF
2163 Enable dprintf entries in the debug log.
2164_
21652 ZFS_DEBUG_DBUF_VERIFY *
2166 Enable extra dbuf verifications.
2167_
21684 ZFS_DEBUG_DNODE_VERIFY *
2169 Enable extra dnode verifications.
2170_
21718 ZFS_DEBUG_SNAPNAMES
2172 Enable snapshot name verification.
2173_
217416 ZFS_DEBUG_MODIFY
2175 Check for illegally modified ARC buffers.
2176_
33b6dbbc
NB
217764 ZFS_DEBUG_ZIO_FREE
2178 Enable verification of block frees.
2179_
2180128 ZFS_DEBUG_HISTOGRAM_VERIFY
2181 Enable extra spacemap histogram verifications.
8740cf4a
NB
2182_
2183256 ZFS_DEBUG_METASLAB_VERIFY
2184 Verify space accounting on disk matches in-core range_trees.
2185_
2186512 ZFS_DEBUG_SET_ERROR
2187 Enable SET_ERROR and dprintf entries in the debug log.
1b939560
BB
2188_
21891024 ZFS_DEBUG_INDIRECT_REMAP
2190 Verify split blocks created by device removal.
2191_
21922048 ZFS_DEBUG_TRIM
2193 Verify TRIM ranges are always within the allocatable range tree.
93e28d66
SD
2194_
21954096 ZFS_DEBUG_LOG_SPACEMAP
2196 Verify that the log summary is consistent with the spacemap log
2197 and enable zfs_dbgmsgs for metaslab loading and flushing.
33b6dbbc
NB
2198.TE
2199.sp
2200* Requires debug build.
29714574 2201.sp
33b6dbbc 2202Default value: \fB0\fR.
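.sp
For example, a hypothetical way to enable both dprintf (1) and SET_ERROR (512)
entries at runtime, assuming the parameter is exposed under
/sys/module/zfs/parameters, is:
.sp
 # 1 | 512 = 513: enable ZFS_DEBUG_DPRINTF and ZFS_DEBUG_SET_ERROR.
 echo 513 > /sys/module/zfs/parameters/zfs_flags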
29714574
TF
2203.RE
2204
fbeddd60
MA
2205.sp
2206.ne 2
2207.na
2208\fBzfs_free_leak_on_eio\fR (int)
2209.ad
2210.RS 12n
2211If destroy encounters an EIO while reading metadata (e.g. indirect
2212blocks), space referenced by the missing metadata can not be freed.
2213Normally this causes the background destroy to become "stalled", as
2214it is unable to make forward progress. While in this stalled state,
2215all remaining space to free from the error-encountering filesystem is
2216"temporarily leaked". Set this flag to cause it to ignore the EIO,
2217permanently leak the space from indirect blocks that can not be read,
2218and continue to free everything else that it can.
2219
2220The default, "stalling" behavior is useful if the storage partially
2221fails (i.e. some but not all i/os fail), and then later recovers. In
2222this case, we will be able to continue pool operations while it is
2223partially failed, and when it recovers, we can continue to free the
2224space, with no leaks. However, note that this case is actually
2225fairly rare.
2226
2227Typically pools either (a) fail completely (but perhaps temporarily,
2228e.g. a top-level vdev going offline), or (b) have localized,
2229permanent errors (e.g. disk returns the wrong data due to bit flip or
2230firmware bug). In case (a), this setting does not matter because the
2231pool will be suspended and the sync thread will not be able to make
2232forward progress regardless. In case (b), because the error is
2233permanent, the best we can do is leak the minimum amount of space,
2234which is what setting this flag will do. Therefore, it is reasonable
2235for this flag to normally be set, but we chose the more conservative
2236approach of not setting it, so that there is no possibility of
2237leaking space in the "partial temporary" failure case.
2238.sp
2239Default value: \fB0\fR.
2240.RE
2241
29714574
TF
2242.sp
2243.ne 2
2244.na
2245\fBzfs_free_min_time_ms\fR (int)
2246.ad
2247.RS 12n
6146e17e 2248During a \fBzfs destroy\fR operation using \fBfeature@async_destroy\fR a minimum
83426735 2249of this much time will be spent working on freeing blocks per txg.
29714574
TF
2250.sp
2251Default value: \fB1,000\fR.
2252.RE
2253
67709516
D
2254.sp
2255.ne 2
2256.na
2257\fBzfs_obsolete_min_time_ms\fR (int)
2258.ad
2259.RS 12n
dd4bc569 2260Similar to \fBzfs_free_min_time_ms\fR but for cleanup of old indirection records
67709516
D
2261for removed vdevs.
2262.sp
2263Default value: \fB500\fR.
2264.RE
2265
29714574
TF
2266.sp
2267.ne 2
2268.na
2269\fBzfs_immediate_write_sz\fR (long)
2270.ad
2271.RS 12n
83426735 2272Largest data block to write to zil. Larger blocks will be treated as if the
6146e17e 2273dataset being written to had the property setting \fBlogbias=throughput\fR.
29714574
TF
2274.sp
2275Default value: \fB32,768\fR.
2276.RE
2277
619f0976
GW
2278.sp
2279.ne 2
2280.na
2281\fBzfs_initialize_value\fR (ulong)
2282.ad
2283.RS 12n
2284Pattern written to vdev free space by \fBzpool initialize\fR.
2285.sp
2286Default value: \fB16,045,690,984,833,335,022\fR (0xdeadbeefdeadbeee).
2287.RE
2288
e60e158e
JG
2289.sp
2290.ne 2
2291.na
2292\fBzfs_initialize_chunk_size\fR (ulong)
2293.ad
2294.RS 12n
2295Size of writes used by \fBzpool initialize\fR.
2296This option is used by the test suite to facilitate testing.
2297.sp
2298Default value: \fB1,048,576\fR
2299.RE
2300
37f03da8
SH
2301.sp
2302.ne 2
2303.na
2304\fBzfs_livelist_max_entries\fR (ulong)
2305.ad
2306.RS 12n
2307The threshold size (in block pointers) at which we create a new sub-livelist.
2308Larger sublists are more costly from a memory perspective but the fewer
2309sublists there are, the lower the cost of insertion.
2310.sp
2311Default value: \fB500,000\fR.
2312.RE
2313
2314.sp
2315.ne 2
2316.na
2317\fBzfs_livelist_min_percent_shared\fR (int)
2318.ad
2319.RS 12n
2320If the amount of shared space between a snapshot and its clone drops below
2321this threshold, the clone turns off the livelist and reverts to the old deletion
2322method. This is in place because once a clone has been overwritten enough,
2323livelists no longer give us a benefit.
2324.sp
2325Default value: \fB75\fR.
2326.RE
2327
2328.sp
2329.ne 2
2330.na
2331\fBzfs_livelist_condense_new_alloc\fR (int)
2332.ad
2333.RS 12n
2334Incremented each time an extra ALLOC blkptr is added to a livelist entry while
2335it is being condensed.
2336This option is used by the test suite to track race conditions.
2337.sp
2338Default value: \fB0\fR.
2339.RE
2340
2341.sp
2342.ne 2
2343.na
2344\fBzfs_livelist_condense_sync_cancel\fR (int)
2345.ad
2346.RS 12n
2347Incremented each time livelist condensing is canceled while in
2348spa_livelist_condense_sync.
2349This option is used by the test suite to track race conditions.
2350.sp
2351Default value: \fB0\fR.
2352.RE
2353
2354.sp
2355.ne 2
2356.na
2357\fBzfs_livelist_condense_sync_pause\fR (int)
2358.ad
2359.RS 12n
2360When set, the livelist condense process pauses indefinitely before
2361executing the synctask - spa_livelist_condense_sync.
2362This option is used by the test suite to trigger race conditions.
2363.sp
2364Default value: \fB0\fR.
2365.RE
2366
2367.sp
2368.ne 2
2369.na
2370\fBzfs_livelist_condense_zthr_cancel\fR (int)
2371.ad
2372.RS 12n
2373Incremented each time livelist condensing is canceled while in
2374spa_livelist_condense_cb.
2375This option is used by the test suite to track race conditions.
2376.sp
2377Default value: \fB0\fR.
2378.RE
2379
2380.sp
2381.ne 2
2382.na
2383\fBzfs_livelist_condense_zthr_pause\fR (int)
2384.ad
2385.RS 12n
2386When set, the livelist condense process pauses indefinitely before
2387executing the open context condensing work in spa_livelist_condense_cb.
2388This option is used by the test suite to trigger race conditions.
2389.sp
2390Default value: \fB0\fR.
2391.RE
2392
917f475f
JG
2393.sp
2394.ne 2
2395.na
2396\fBzfs_lua_max_instrlimit\fR (ulong)
2397.ad
2398.RS 12n
2399The maximum execution time limit that can be set for a ZFS channel program,
2400specified as a number of Lua instructions.
2401.sp
2402Default value: \fB100,000,000\fR.
2403.RE
2404
2405.sp
2406.ne 2
2407.na
2408\fBzfs_lua_max_memlimit\fR (ulong)
2409.ad
2410.RS 12n
2411The maximum memory limit that can be set for a ZFS channel program, specified
2412in bytes.
2413.sp
2414Default value: \fB104,857,600\fR.
2415.RE
2416
a7ed98d8
SD
2417.sp
2418.ne 2
2419.na
2420\fBzfs_max_dataset_nesting\fR (int)
2421.ad
2422.RS 12n
2423The maximum depth of nested datasets. This value can be tuned temporarily to
2424fix existing datasets that exceed the predefined limit.
2425.sp
2426Default value: \fB50\fR.
2427.RE
2428
93e28d66
SD
2429.sp
2430.ne 2
2431.na
2432\fBzfs_max_log_walking\fR (ulong)
2433.ad
2434.RS 12n
2435The number of past TXGs that the flushing algorithm of the log spacemap
2436feature uses to estimate incoming log blocks.
2437.sp
2438Default value: \fB5\fR.
2439.RE
2440
2441.sp
2442.ne 2
2443.na
2444\fBzfs_max_logsm_summary_length\fR (ulong)
2445.ad
2446.RS 12n
2447Maximum number of rows allowed in the summary of the spacemap log.
2448.sp
2449Default value: \fB10\fR.
2450.RE
2451
f1512ee6
MA
2452.sp
2453.ne 2
2454.na
2455\fBzfs_max_recordsize\fR (int)
2456.ad
2457.RS 12n
2458We currently support block sizes from 512 bytes to 16MB. The benefits of
ad796b8a 2459larger blocks, and thus larger I/O, need to be weighed against the cost of
f1512ee6
MA
2460COWing a giant block to modify one byte. Additionally, very large blocks
2461can have an impact on i/o latency, and also potentially on the memory
2462allocator. Therefore, we do not allow the recordsize to be set larger than
2463zfs_max_recordsize (default 1MB). Larger blocks can be created by changing
2464this tunable, and pools with larger blocks can always be imported and used,
2465regardless of this setting.
2466.sp
2467Default value: \fB1,048,576\fR.
2468.RE
2469
30af21b0
PD
2470.sp
2471.ne 2
2472.na
2473\fBzfs_allow_redacted_dataset_mount\fR (int)
2474.ad
2475.RS 12n
2476Allow datasets received with redacted send/receive to be mounted. Normally
2477disabled because these datasets may be missing key data.
2478.sp
2479Default value: \fB0\fR.
2480.RE
2481
93e28d66
SD
2482.sp
2483.ne 2
2484.na
2485\fBzfs_min_metaslabs_to_flush\fR (ulong)
2486.ad
2487.RS 12n
2488Minimum number of metaslabs to flush per dirty TXG
2489.sp
2490Default value: \fB1\fR.
2491.RE
2492
f3a7f661
GW
2493.sp
2494.ne 2
2495.na
2496\fBzfs_metaslab_fragmentation_threshold\fR (int)
2497.ad
2498.RS 12n
2499Allow metaslabs to keep their active state as long as their fragmentation
2500percentage is less than or equal to this value. An active metaslab that
2501exceeds this threshold will no longer keep its active status allowing
2502better metaslabs to be selected.
2503.sp
2504Default value: \fB70\fR.
2505.RE
2506
2507.sp
2508.ne 2
2509.na
2510\fBzfs_mg_fragmentation_threshold\fR (int)
2511.ad
2512.RS 12n
2513Metaslab groups are considered eligible for allocations if their
83426735 2514fragmentation metric (measured as a percentage) is less than or equal to
f3a7f661
GW
2515this value. If a metaslab group exceeds this threshold then it will be
2516skipped unless all metaslab groups within the metaslab class have also
2517crossed this threshold.
2518.sp
cb020f0d 2519Default value: \fB95\fR.
f3a7f661
GW
2520.RE
2521
f4a4046b
TC
2522.sp
2523.ne 2
2524.na
2525\fBzfs_mg_noalloc_threshold\fR (int)
2526.ad
2527.RS 12n
2528Defines a threshold at which metaslab groups should be eligible for
2529allocations. The value is expressed as a percentage of free space
2530beyond which a metaslab group is always eligible for allocations.
2531If a metaslab group's free space is less than or equal to the
6b4e21c6 2532threshold, the allocator will avoid allocating to that group
f4a4046b
TC
2533unless all groups in the pool have reached the threshold. Once all
2534groups have reached the threshold, all groups are allowed to accept
2535allocations. The default value of 0 disables the feature and causes
2536all metaslab groups to be eligible for allocations.
2537
b58237e7 2538This parameter allows one to deal with pools having heavily imbalanced
f4a4046b
TC
2539vdevs such as would be the case when a new vdev has been added.
2540Setting the threshold to a non-zero percentage will stop allocations
2541from being made to vdevs that aren't filled to the specified percentage
2542and allow lesser filled vdevs to acquire more allocations than they
2543otherwise would under the old \fBzfs_mg_alloc_failures\fR facility.
2544.sp
2545Default value: \fB0\fR.
2546.RE
2547
cc99f275
DB
2548.sp
2549.ne 2
2550.na
2551\fBzfs_ddt_data_is_special\fR (int)
2552.ad
2553.RS 12n
2554If enabled, ZFS will place DDT data into the special allocation class.
2555.sp
2556Default value: \fB1\fR.
2557.RE
2558
2559.sp
2560.ne 2
2561.na
2562\fBzfs_user_indirect_is_special\fR (int)
2563.ad
2564.RS 12n
2565If enabled, ZFS will place user data (both file and zvol) indirect blocks
2566into the special allocation class.
2567.sp
2568Default value: \fB1\fR.
2569.RE
2570
379ca9cf
OF
2571.sp
2572.ne 2
2573.na
2574\fBzfs_multihost_history\fR (int)
2575.ad
2576.RS 12n
2577Historical statistics for the last N multihost updates will be available in
2578\fB/proc/spl/kstat/zfs/<pool>/multihost\fR
2579.sp
2580Default value: \fB0\fR.
2581.RE
2582
2583.sp
2584.ne 2
2585.na
2586\fBzfs_multihost_interval\fR (ulong)
2587.ad
2588.RS 12n
2589Used to control the frequency of multihost writes which are performed when the
060f0226
OF
2590\fBmultihost\fR pool property is on. This is one factor used to determine the
2591length of the activity check during import.
379ca9cf 2592.sp
060f0226
OF
2593The multihost write period is \fBzfs_multihost_interval / leaf-vdevs\fR
2594milliseconds. On average a multihost write will be issued for each leaf vdev
2595every \fBzfs_multihost_interval\fR milliseconds. In practice, the observed
2596period can vary with the I/O load and this observed value is the delay which is
2597stored in the uberblock.
379ca9cf
OF
2598.sp
2599Default value: \fB1000\fR.
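.sp
For example, with the default \fBzfs_multihost_interval\fR of 1000 ms and a pool
containing 8 leaf vdevs, a multihost write is issued to some leaf roughly every
1000 / 8 = 125 ms, while each individual leaf vdev is still written about once
per second.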
2600.RE
2601
2602.sp
2603.ne 2
2604.na
2605\fBzfs_multihost_import_intervals\fR (uint)
2606.ad
2607.RS 12n
2608Used to control the duration of the activity test on import. Smaller values of
2609\fBzfs_multihost_import_intervals\fR will reduce the import time but increase
2610the risk of failing to detect an active pool. The total activity check time is
060f0226
OF
2611never allowed to drop below one second.
2612.sp
2613On import the activity check waits a minimum amount of time determined by
2614\fBzfs_multihost_interval * zfs_multihost_import_intervals\fR, or the same
2615product computed on the host which last had the pool imported (whichever is
2616greater). The activity check time may be further extended if the value of mmp
2617delay found in the best uberblock indicates actual multihost updates happened
2618at longer intervals than \fBzfs_multihost_interval\fR. A minimum value of
2619\fB100ms\fR is enforced.
2620.sp
2621A value of 0 is ignored and treated as if it was set to 1.
379ca9cf 2622.sp
db2af93d 2623Default value: \fB20\fR.
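.sp
With the defaults, the activity check therefore waits at least
\fBzfs_multihost_interval\fR * \fBzfs_multihost_import_intervals\fR =
1000 ms * 20 = 20 seconds before the import of a potentially active pool is
allowed to proceed.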
379ca9cf
OF
2624.RE
2625
2626.sp
2627.ne 2
2628.na
2629\fBzfs_multihost_fail_intervals\fR (uint)
2630.ad
2631.RS 12n
060f0226
OF
2632Controls the behavior of the pool when multihost write failures or delays are
2633detected.
379ca9cf 2634.sp
060f0226
OF
2635When \fBzfs_multihost_fail_intervals = 0\fR, multihost write failures or delays
2636are ignored. The failures will still be reported to the ZED which, depending on
2637its configuration, may take action such as suspending the pool or offlining a
2638device.
2639
379ca9cf 2640.sp
060f0226
OF
2641When \fBzfs_multihost_fail_intervals > 0\fR, the pool will be suspended if
2642\fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR milliseconds pass
2643without a successful mmp write. This guarantees the activity test will see
2644mmp writes if the pool is imported. A value of 1 is ignored and treated as
2645if it was set to 2. This is necessary to prevent the pool from being suspended
2646due to normal, small I/O latency variations.
2647
379ca9cf 2648.sp
db2af93d 2649Default value: \fB10\fR.
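.sp
With the defaults, an imported pool is therefore suspended after
\fBzfs_multihost_fail_intervals\fR * \fBzfs_multihost_interval\fR =
10 * 1000 ms = 10 seconds without a successful mmp write.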
379ca9cf
OF
2650.RE
2651
29714574
TF
2652.sp
2653.ne 2
2654.na
2655\fBzfs_no_scrub_io\fR (int)
2656.ad
2657.RS 12n
83426735
D
2658Set for no scrub I/O. This results in scrubs not actually scrubbing data and
2659simply doing a metadata crawl of the pool instead.
29714574
TF
2660.sp
2661Use \fB1\fR for yes and \fB0\fR for no (default).
2662.RE
2663
2664.sp
2665.ne 2
2666.na
2667\fBzfs_no_scrub_prefetch\fR (int)
2668.ad
2669.RS 12n
83426735 2670Set to disable block prefetching for scrubs.
29714574
TF
2671.sp
2672Use \fB1\fR for yes and \fB0\fR for no (default).
2673.RE
2674
29714574
TF
2675.sp
2676.ne 2
2677.na
2678\fBzfs_nocacheflush\fR (int)
2679.ad
2680.RS 12n
53b1f5ea
PS
2681Disable cache flush operations on disks when writing. Setting this will
2682cause pool corruption on power loss if a volatile out-of-order write cache
2683is enabled.
29714574
TF
2684.sp
2685Use \fB1\fR for yes and \fB0\fR for no (default).
2686.RE
2687
2688.sp
2689.ne 2
2690.na
2691\fBzfs_nopwrite_enabled\fR (int)
2692.ad
2693.RS 12n
2694Enable NOP writes
2695.sp
2696Use \fB1\fR for yes (default) and \fB0\fR to disable.
2697.RE
2698
66aca247
DB
2699.sp
2700.ne 2
2701.na
2702\fBzfs_dmu_offset_next_sync\fR (int)
2703.ad
2704.RS 12n
2705Enable forcing txg sync to find holes. When enabled, this forces ZFS to act
2706like prior versions when SEEK_HOLE or SEEK_DATA flags are used, which,
2707when a dnode is dirty, causes txgs to be synced so that this data can be
2708found.
2709.sp
2710Use \fB1\fR for yes and \fB0\fR to disable (default).
2711.RE
2712
29714574
TF
2713.sp
2714.ne 2
2715.na
b738bc5a 2716\fBzfs_pd_bytes_max\fR (int)
29714574
TF
2717.ad
2718.RS 12n
83426735 2719The number of bytes which should be prefetched during a pool traversal
6146e17e 2720(eg: \fBzfs send\fR or other data crawling operations)
29714574 2721.sp
74aa2ba2 2722Default value: \fB52,428,800\fR.
29714574
TF
2723.RE
2724
bef78122
DQ
2725.sp
2726.ne 2
2727.na
2728\fBzfs_per_txg_dirty_frees_percent \fR (ulong)
2729.ad
2730.RS 12n
65282ee9
AP
2731Tunable to control percentage of dirtied indirect blocks from frees allowed
2732into one TXG. After this threshold is crossed, additional frees will wait until
2733the next TXG.
bef78122
DQ
2734A value of zero will disable this throttle.
2735.sp
65282ee9 2736Default value: \fB5\fR, set to \fB0\fR to disable.
bef78122
DQ
2737.RE
2738
29714574
TF
2739.sp
2740.ne 2
2741.na
2742\fBzfs_prefetch_disable\fR (int)
2743.ad
2744.RS 12n
7f60329a
MA
2745This tunable disables predictive prefetch. Note that it leaves "prescient"
2746prefetch (e.g. prefetch for zfs send) intact. Unlike predictive prefetch,
2747prescient prefetch never issues i/os that end up not being needed, so it
2748can't hurt performance.
29714574
TF
2749.sp
2750Use \fB1\fR for yes and \fB0\fR for no (default).
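.sp
For example, predictive prefetch could be toggled at runtime, assuming the
usual module parameter path, as follows:
.sp
 # Check whether predictive prefetch is currently disabled (0 = enabled).
 cat /sys/module/zfs/parameters/zfs_prefetch_disable
 # Disable predictive prefetch; prescient prefetch is unaffected.
 echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable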
2751.RE
2752
5090f727
CZ
2753.sp
2754.ne 2
2755.na
2756\fBzfs_qat_checksum_disable\fR (int)
2757.ad
2758.RS 12n
2759This tunable disables qat hardware acceleration for sha256 checksums. It
2760may be set after the zfs modules have been loaded to initialize the qat
2761hardware as long as support is compiled in and the qat driver is present.
2762.sp
2763Use \fB1\fR for yes and \fB0\fR for no (default).
2764.RE
2765
2766.sp
2767.ne 2
2768.na
2769\fBzfs_qat_compress_disable\fR (int)
2770.ad
2771.RS 12n
2772This tunable disables qat hardware acceleration for gzip compression. It
2773may be set after the zfs modules have been loaded to initialize the qat
2774hardware as long as support is compiled in and the qat driver is present.
2775.sp
2776Use \fB1\fR for yes and \fB0\fR for no (default).
2777.RE
2778
2779.sp
2780.ne 2
2781.na
2782\fBzfs_qat_encrypt_disable\fR (int)
2783.ad
2784.RS 12n
2785This tunable disables qat hardware acceleration for AES-GCM encryption. It
2786may be set after the zfs modules have been loaded to initialize the qat
2787hardware as long as support is compiled in and the qat driver is present.
2788.sp
2789Use \fB1\fR for yes and \fB0\fR for no (default).
2790.RE
2791
29714574
TF
2792.sp
2793.ne 2
2794.na
2795\fBzfs_read_chunk_size\fR (long)
2796.ad
2797.RS 12n
2798Bytes to read per chunk
2799.sp
2800Default value: \fB1,048,576\fR.
2801.RE
2802
2803.sp
2804.ne 2
2805.na
2806\fBzfs_read_history\fR (int)
2807.ad
2808.RS 12n
379ca9cf
OF
2809Historical statistics for the last N reads will be available in
2810\fB/proc/spl/kstat/zfs/<pool>/reads\fR
29714574 2811.sp
83426735 2812Default value: \fB0\fR (no data is kept).
29714574
TF
2813.RE
2814
2815.sp
2816.ne 2
2817.na
2818\fBzfs_read_history_hits\fR (int)
2819.ad
2820.RS 12n
2821Include cache hits in read history
2822.sp
2823Use \fB1\fR for yes and \fB0\fR for no (default).
2824.RE
2825
9a49d3f3
BB
2826.sp
2827.ne 2
2828.na
2829\fBzfs_rebuild_max_segment\fR (ulong)
2830.ad
2831.RS 12n
2832Maximum read segment size to issue when sequentially resilvering a
2833top-level vdev.
2834.sp
2835Default value: \fB1,048,576\fR.
2836.RE
2837
9e052db4
MA
2838.sp
2839.ne 2
2840.na
4589f3ae
BB
2841\fBzfs_reconstruct_indirect_combinations_max\fR (int)
2842.ad
2843.RS 12n
2844If an indirect split block contains more than this many possible unique
2845combinations when being reconstructed, consider it too computationally
2846expensive to check them all. Instead, try at most
2847\fBzfs_reconstruct_indirect_combinations_max\fR randomly-selected
2848combinations each time the block is accessed. This allows all segment
2849copies to participate fairly in the reconstruction when all combinations
2850cannot be checked and prevents repeated use of one bad copy.
2851.sp
64bdf63f 2852Default value: \fB4096\fR.
9e052db4
MA
2853.RE
2854
29714574
TF
2855.sp
2856.ne 2
2857.na
2858\fBzfs_recover\fR (int)
2859.ad
2860.RS 12n
2861Set to attempt to recover from fatal errors. This should only be used as a
2862last resort, as it typically results in leaked space, or worse.
2863.sp
2864Use \fB1\fR for yes and \fB0\fR for no (default).
2865.RE
2866
7c9a4292
BB
2867.sp
2868.ne 2
2869.na
2870\fBzfs_removal_ignore_errors\fR (int)
2871.ad
2872.RS 12n
2873.sp
2874Ignore hard IO errors during device removal. When set, if a device encounters
2875a hard IO error during the removal process the removal will not be cancelled.
2876This can result in a normally recoverable block becoming permanently damaged
2877and is not recommended. This should only be used as a last resort when the
2878pool cannot be returned to a healthy state prior to removing the device.
2879.sp
2880Default value: \fB0\fR.
2881.RE
2882
53dce5ac
MA
2883.sp
2884.ne 2
2885.na
2886\fBzfs_removal_suspend_progress\fR (int)
2887.ad
2888.RS 12n
2889.sp
2890This is used by the test suite so that it can ensure that certain actions
2891happen while in the middle of a removal.
2892.sp
2893Default value: \fB0\fR.
2894.RE
2895
2896.sp
2897.ne 2
2898.na
2899\fBzfs_remove_max_segment\fR (int)
2900.ad
2901.RS 12n
2902.sp
2903The largest contiguous segment that we will attempt to allocate when removing
2904a device. This can be no larger than 16MB. If there is a performance
2905problem with attempting to allocate large blocks, consider decreasing this.
2906.sp
2907Default value: \fB16,777,216\fR (16MB).
2908.RE
2909
67709516
D
2910.sp
2911.ne 2
2912.na
2913\fBzfs_resilver_disable_defer\fR (int)
2914.ad
2915.RS 12n
2916Disables the \fBresilver_defer\fR feature, causing an operation that would
2917start a resilver to immediately restart the one in progress.
2918.sp
2919Default value: \fB0\fR (feature enabled).
2920.RE
2921
29714574
TF
2922.sp
2923.ne 2
2924.na
d4a72f23 2925\fBzfs_resilver_min_time_ms\fR (int)
29714574
TF
2926.ad
2927.RS 12n
d4a72f23
TC
2928Resilvers are processed by the sync thread. While resilvering it will spend
2929at least this much time working on a resilver between txg flushes.
29714574 2930.sp
d4a72f23 2931Default value: \fB3,000\fR.
29714574
TF
2932.RE
2933
02638a30
TC
2934.sp
2935.ne 2
2936.na
2937\fBzfs_scan_ignore_errors\fR (int)
2938.ad
2939.RS 12n
2940If set to a nonzero value, remove the DTL (dirty time list) upon
2941completion of a pool scan (scrub) even if there were unrepairable
2942errors. It is intended to be used during pool repair or recovery to
2943stop resilvering when the pool is next imported.
2944.sp
2945Default value: \fB0\fR.
2946.RE
2947
29714574
TF
2948.sp
2949.ne 2
2950.na
d4a72f23 2951\fBzfs_scrub_min_time_ms\fR (int)
29714574
TF
2952.ad
2953.RS 12n
d4a72f23
TC
2954Scrubs are processed by the sync thread. While scrubbing it will spend
2955at least this much time working on a scrub between txg flushes.
29714574 2956.sp
d4a72f23 2957Default value: \fB1,000\fR.
29714574
TF
2958.RE
2959
2960.sp
2961.ne 2
2962.na
d4a72f23 2963\fBzfs_scan_checkpoint_intval\fR (int)
29714574
TF
2964.ad
2965.RS 12n
d4a72f23
TC
2966To preserve progress across reboots the sequential scan algorithm periodically
2967needs to stop metadata scanning and issue all the verification I/Os to disk.
2968The frequency of this flushing is determined by the
a8577bdb 2969\fBzfs_scan_checkpoint_intval\fR tunable.
29714574 2970.sp
d4a72f23 2971Default value: \fB7200\fR seconds (every 2 hours).
29714574
TF
2972.RE
2973
2974.sp
2975.ne 2
2976.na
d4a72f23 2977\fBzfs_scan_fill_weight\fR (int)
29714574
TF
2978.ad
2979.RS 12n
d4a72f23
TC
2980This tunable affects how scrub and resilver I/O segments are ordered. A higher
2981number indicates that we care more about how filled in a segment is, while a
2982lower number indicates we care more about the size of the extent without
2983considering the gaps within a segment. This value is only tunable upon module
2984insertion. Changing the value afterwards will have no affect on scrub or
2985resilver performance.
29714574 2986.sp
d4a72f23 2987Default value: \fB3\fR.
29714574
TF
2988.RE
2989
2990.sp
2991.ne 2
2992.na
d4a72f23 2993\fBzfs_scan_issue_strategy\fR (int)
29714574
TF
2994.ad
2995.RS 12n
d4a72f23
TC
2996Determines the order that data will be verified while scrubbing or resilvering.
2997If set to \fB1\fR, data will be verified as sequentially as possible, given the
2998amount of memory reserved for scrubbing (see \fBzfs_scan_mem_lim_fact\fR). This
2999may improve scrub performance if the pool's data is very fragmented. If set to
3000\fB2\fR, the largest mostly-contiguous chunk of found data will be verified
3001first. By deferring scrubbing of small segments, we may later find adjacent data
3002to coalesce and increase the segment size. If set to \fB0\fR, zfs will use
3003strategy \fB1\fR during normal verification and strategy \fB2\fR while taking a
3004checkpoint.
29714574 3005.sp
d4a72f23
TC
3006Default value: \fB0\fR.
3007.RE
3008
3009.sp
3010.ne 2
3011.na
3012\fBzfs_scan_legacy\fR (int)
3013.ad
3014.RS 12n
3015A value of 0 indicates that scrubs and resilvers will gather metadata in
3016memory before issuing sequential I/O. A value of 1 indicates that the legacy
3017algorithm will be used where I/O is initiated as soon as it is discovered.
3018Changing this value to 0 will not affect scrubs or resilvers that are already
3019in progress.
3020.sp
3021Default value: \fB0\fR.
3022.RE
3023
3024.sp
3025.ne 2
3026.na
3027\fBzfs_scan_max_ext_gap\fR (int)
3028.ad
3029.RS 12n
3030Indicates the largest gap in bytes between scrub / resilver I/Os that will still
3031be considered sequential for sorting purposes. Changing this value will not
3032affect scrubs or resilvers that are already in progress.
3033.sp
3034Default value: \fB2097152 (2 MB)\fR.
3035.RE
3036
3037.sp
3038.ne 2
3039.na
3040\fBzfs_scan_mem_lim_fact\fR (int)
3041.ad
3042.RS 12n
3043Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
3044This tunable determines the hard limit for I/O sorting memory usage.
3045When the hard limit is reached we stop scanning metadata and start issuing
3046data verification I/O. This is done until we get below the soft limit.
3047.sp
3048Default value: \fB20\fR which is 5% of RAM (1/20).
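.sp
For example, on a system with 64 GB of RAM the default of 20 corresponds to a
hard limit of roughly 3.2 GB of sorting memory, and with the default
\fBzfs_scan_mem_lim_soft_fact\fR of 20 the derived soft limit is about 160 MB.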
3049.RE
3050
3051.sp
3052.ne 2
3053.na
3054\fBzfs_scan_mem_lim_soft_fact\fR (int)
3055.ad
3056.RS 12n
3057The fraction of the hard limit used to determine the soft limit for I/O sorting
ac3d4d0c 3058by the sequential scan algorithm. When we cross this limit from below no action
d4a72f23
TC
3059is taken. When we cross this limit from above it is because we are issuing
3060verification I/O. In this case (unless the metadata scan is done) we stop
3061issuing verification I/O and start scanning metadata again until we get to the
3062hard limit.
3063.sp
3064Default value: \fB20\fR which is 5% of the hard limit (1/20).
3065.RE
3066
67709516
D
3067.sp
3068.ne 2
3069.na
3070\fBzfs_scan_strict_mem_lim\fR (int)
3071.ad
3072.RS 12n
3073Enforces tight memory limits on pool scans when a sequential scan is in
3074progress. When disabled the memory limit may be exceeded by fast disks.
3075.sp
3076Default value: \fB0\fR.
3077.RE
3078
3079.sp
3080.ne 2
3081.na
3082\fBzfs_scan_suspend_progress\fR (int)
3083.ad
3084.RS 12n
3085Freezes a scrub/resilver in progress without actually pausing it. Intended for
3086testing/debugging.
3087.sp
3088Default value: \fB0\fR.
3089.RE
3090
3091
d4a72f23
TC
3092.sp
3093.ne 2
3094.na
3095\fBzfs_scan_vdev_limit\fR (int)
3096.ad
3097.RS 12n
3098Maximum amount of data that can be concurrently issued for scrubs and
3099resilvers per leaf device, given in bytes.
3100.sp
3101Default value: \fB41943040\fR.
29714574
TF
3102.RE
3103
fd8febbd
TF
3104.sp
3105.ne 2
3106.na
3107\fBzfs_send_corrupt_data\fR (int)
3108.ad
3109.RS 12n
83426735 3110Allow sending of corrupt data (ignore read/checksum errors when sending data)
fd8febbd
TF
3111.sp
3112Use \fB1\fR for yes and \fB0\fR for no (default).
3113.RE
3114
caf9dd20
BB
3115.sp
3116.ne 2
3117.na
3118\fBzfs_send_unmodified_spill_blocks\fR (int)
3119.ad
3120.RS 12n
3121Include unmodified spill blocks in the send stream. Under certain circumstances
3122previous versions of ZFS could incorrectly remove the spill block from an
3123existing object. Including unmodified copies of the spill blocks creates a
3124backwards compatible stream which will recreate a spill block if it was
3125incorrectly removed.
3126.sp
3127Use \fB1\fR for yes (default) and \fB0\fR for no.
3128.RE
3129
30af21b0
PD
3130.sp
3131.ne 2
3132.na
3133\fBzfs_send_no_prefetch_queue_ff\fR (int)
3134.ad
3135.RS 12n
3136The fill fraction of the \fBzfs send\fR internal queues. The fill fraction
3137controls the timing with which internal threads are woken up.
3138.sp
3139Default value: \fB20\fR.
3140.RE
3141
3142.sp
3143.ne 2
3144.na
3145\fBzfs_send_no_prefetch_queue_length\fR (int)
3146.ad
3147.RS 12n
3148The maximum number of bytes allowed in \fBzfs send\fR's internal queues.
3149.sp
3150Default value: \fB1,048,576\fR.
3151.RE
3152
3153.sp
3154.ne 2
3155.na
3156\fBzfs_send_queue_ff\fR (int)
3157.ad
3158.RS 12n
3159The fill fraction of the \fBzfs send\fR prefetch queue. The fill fraction
3160controls the timing with which internal threads are woken up.
3161.sp
3162Default value: \fB20\fR.
3163.RE
3164
3b0d9928
BB
3165.sp
3166.ne 2
3167.na
3168\fBzfs_send_queue_length\fR (int)
3169.ad
3170.RS 12n
30af21b0
PD
3171The maximum number of bytes that will be prefetched by \fBzfs send\fR.
3172This value must be at least twice the maximum block size in use.
3b0d9928
BB
3173.sp
3174Default value: \fB16,777,216\fR.
3175.RE
3176
30af21b0
PD
3177.sp
3178.ne 2
3179.na
3180\fBzfs_recv_queue_ff\fR (int)
3181.ad
3182.RS 12n
3183The fill fraction of the \fBzfs receive\fR queue. The fill fraction
3184controls the timing with which internal threads are woken up.
3185.sp
3186Default value: \fB20\fR.
3187.RE
3188
3b0d9928
BB
3189.sp
3190.ne 2
3191.na
3192\fBzfs_recv_queue_length\fR (int)
3193.ad
3194.RS 12n
3b0d9928
BB
3195The maximum number of bytes allowed in the \fBzfs receive\fR queue. This value
3196must be at least twice the maximum block size in use.
3197.sp
3198Default value: \fB16,777,216\fR.
3199.RE
3200
7261fc2e
MA
3201.sp
3202.ne 2
3203.na
3204\fBzfs_recv_write_batch_size\fR (int)
3205.ad
3206.RS 12n
3207The maximum amount of data (in bytes) that \fBzfs receive\fR will write in
3208one DMU transaction. This is the uncompressed size, even when receiving a
3209compressed send stream. This setting will not reduce the write size below
3210a single block. Capped at a maximum of 32MB.
3211.sp
3212Default value: \fB1MB\fR.
3213.RE
3214
30af21b0
PD
3215.sp
3216.ne 2
3217.na
3218\fBzfs_override_estimate_recordsize\fR (ulong)
3219.ad
3220.RS 12n
3221Setting this variable overrides the default logic for estimating block
3222sizes when doing a zfs send. The default heuristic is that the average
3223block size will be the current recordsize. Override this value if most data
3224in your dataset is not of that size and you require accurate zfs send size
3225estimates.
3226.sp
3227Default value: \fB0\fR.
3228.RE
3229
29714574
TF
3230.sp
3231.ne 2
3232.na
3233\fBzfs_sync_pass_deferred_free\fR (int)
3234.ad
3235.RS 12n
83426735 3236Flushing of data to disk is done in passes. Defer frees starting in this pass
29714574
TF
3237.sp
3238Default value: \fB2\fR.
3239.RE
3240
d2734cce
SD
3241.sp
3242.ne 2
3243.na
3244\fBzfs_spa_discard_memory_limit\fR (int)
3245.ad
3246.RS 12n
3247Maximum memory used for prefetching a checkpoint's space map on each
3248vdev while discarding the checkpoint.
3249.sp
3250Default value: \fB16,777,216\fR.
3251.RE
3252
1f02ecc5
D
3253.sp
3254.ne 2
3255.na
3256\fBzfs_special_class_metadata_reserve_pct\fR (int)
3257.ad
3258.RS 12n
3259Only allow small data blocks to be allocated on the special and dedup vdev
3260types when the available free space percentage on these vdevs exceeds this
3261value. This ensures reserved space is available for pool meta data as the
3262special vdevs approach capacity.
3263.sp
3264Default value: \fB25\fR.
3265.RE
3266
29714574
TF
3267.sp
3268.ne 2
3269.na
3270\fBzfs_sync_pass_dont_compress\fR (int)
3271.ad
3272.RS 12n
b596585f 3273Starting in this sync pass, we disable compression (including metadata).
be89734a
MA
3274With the default setting, in practice, we don't have this many sync passes,
3275so this has no effect.
3276.sp
3277The original intent was that disabling compression would help the sync passes
3278to converge. However, in practice disabling compression increases the average
3279number of sync passes, because when we turn compression off, many blocks'
3280sizes will change and thus we have to re-allocate (not overwrite) them. It
3281also increases the number of 128KB allocations (e.g. for indirect blocks and
3282spacemaps) because these will not be compressed. The 128K allocations are
3283especially detrimental to performance on highly fragmented systems, which may
3284have very few free segments of this size, and may need to load new metaslabs
3285to satisfy 128K allocations.
29714574 3286.sp
be89734a 3287Default value: \fB8\fR.
29714574
TF
3288.RE
3289
3290.sp
3291.ne 2
3292.na
3293\fBzfs_sync_pass_rewrite\fR (int)
3294.ad
3295.RS 12n
83426735 3296Rewrite new block pointers starting in this pass
29714574
TF
3297.sp
3298Default value: \fB2\fR.
3299.RE
3300
a032ac4b
BB
3301.sp
3302.ne 2
3303.na
3304\fBzfs_sync_taskq_batch_pct\fR (int)
3305.ad
3306.RS 12n
3307This controls the number of threads used by the dp_sync_taskq. The default
3308value of 75% will create a maximum of one thread per cpu.
3309.sp
be54a13c 3310Default value: \fB75\fR%.
a032ac4b
BB
3311.RE
3312
1b939560
BB
3313.sp
3314.ne 2
3315.na
67709516 3316\fBzfs_trim_extent_bytes_max\fR (uint)
1b939560
BB
3317.ad
3318.RS 12n
3319Maximum size of a TRIM command. Ranges larger than this will be split into
3320chunks no larger than \fBzfs_trim_extent_bytes_max\fR bytes before being
3321issued to the device.
3322.sp
3323Default value: \fB134,217,728\fR.
3324.RE
3325
3326.sp
3327.ne 2
3328.na
67709516 3329\fBzfs_trim_extent_bytes_min\fR (uint)
1b939560
BB
3330.ad
3331.RS 12n
3332Minimum size of TRIM commands. TRIM ranges smaller than this will be skipped
3333unless they're part of a larger range which was broken into chunks. This is
3334done because it's common for these small TRIMs to negatively impact overall
3335performance. This value can be set to 0 to TRIM all unallocated space.
3336.sp
3337Default value: \fB32,768\fR.
3338.RE
3339
3340.sp
3341.ne 2
3342.na
67709516 3343\fBzfs_trim_metaslab_skip\fR (uint)
1b939560
BB
3344.ad
3345.RS 12n
3346Skip uninitialized metaslabs during the TRIM process. This option is useful
3347for pools constructed from large thinly-provisioned devices where TRIM
3348operations are slow. As a pool ages, an increasing fraction of the pool's
3349metaslabs will be initialized, progressively degrading the usefulness of
3350this option. This setting is stored when starting a manual TRIM and will
3351persist for the duration of the requested TRIM.
3352.sp
3353Default value: \fB0\fR.
3354.RE
3355
3356.sp
3357.ne 2
3358.na
67709516 3359\fBzfs_trim_queue_limit\fR (uint)
1b939560
BB
3360.ad
3361.RS 12n
3362Maximum number of queued TRIMs outstanding per leaf vdev. The number of
3363concurrent TRIM commands issued to the device is controlled by the
3364\fBzfs_vdev_trim_min_active\fR and \fBzfs_vdev_trim_max_active\fR module
3365options.
3366.sp
3367Default value: \fB10\fR.
3368.RE
3369
3370.sp
3371.ne 2
3372.na
67709516 3373\fBzfs_trim_txg_batch\fR (uint)
1b939560
BB
3374.ad
3375.RS 12n
3376The number of transaction groups worth of frees which should be aggregated
3377before TRIM operations are issued to the device. This setting represents a
3378trade-off between issuing larger, more efficient TRIM operations and the
3379delay before the recently trimmed space is available for use by the device.
3380.sp
3381Increasing this value will allow frees to be aggregated for a longer time.
3382This will result in larger TRIM operations and potentially increased memory
3383usage. Decreasing this value will have the opposite effect. The default
3384value of 32 was determined to be a reasonable compromise.
3385.sp
3386Default value: \fB32\fR.
3387.RE
3388
29714574
TF
3389.sp
3390.ne 2
3391.na
3392\fBzfs_txg_history\fR (int)
3393.ad
3394.RS 12n
379ca9cf
OF
3395Historical statistics for the last N txgs will be available in
3396\fB/proc/spl/kstat/zfs/<pool>/txgs\fR
29714574 3397.sp
ca85d690 3398Default value: \fB0\fR.
29714574
TF
3399.RE
3400
29714574
TF
3401.sp
3402.ne 2
3403.na
3404\fBzfs_txg_timeout\fR (int)
3405.ad
3406.RS 12n
83426735 3407Flush dirty data to disk at least every N seconds (maximum txg duration)
29714574
TF
3408.sp
3409Default value: \fB5\fR.
3410.RE
3411
1b939560
BB
3412.sp
3413.ne 2
3414.na
3415\fBzfs_vdev_aggregate_trim\fR (int)
3416.ad
3417.RS 12n
3418Allow TRIM I/Os to be aggregated. This is normally not helpful because
3419the extents to be trimmed will already have been aggregated by the
3420metaslab. This option is provided for debugging and performance analysis.
3421.sp
3422Default value: \fB0\fR.
3423.RE
3424
29714574
TF
3425.sp
3426.ne 2
3427.na
3428\fBzfs_vdev_aggregation_limit\fR (int)
3429.ad
3430.RS 12n
3431Max vdev I/O aggregation size
3432.sp
1af240f3
AM
3433Default value: \fB1,048,576\fR.
3434.RE
3435
3436.sp
3437.ne 2
3438.na
3439\fBzfs_vdev_aggregation_limit_non_rotating\fR (int)
3440.ad
3441.RS 12n
3442Max vdev I/O aggregation size for non-rotating media
3443.sp
29714574
TF
3444Default value: \fB131,072\fR.
3445.RE
3446
3447.sp
3448.ne 2
3449.na
3450\fBzfs_vdev_cache_bshift\fR (int)
3451.ad
3452.RS 12n
3453Shift size to inflate reads to
3454.sp
83426735 3455Default value: \fB16\fR (effectively 65536).
29714574
TF
3456.RE
3457
3458.sp
3459.ne 2
3460.na
3461\fBzfs_vdev_cache_max\fR (int)
3462.ad
3463.RS 12n
ca85d690 3464Inflate reads smaller than this value to meet the \fBzfs_vdev_cache_bshift\fR
3465size (default 64k).
83426735
D
3466.sp
3467Default value: \fB16384\fR.
29714574
TF
3468.RE
3469
3470.sp
3471.ne 2
3472.na
3473\fBzfs_vdev_cache_size\fR (int)
3474.ad
3475.RS 12n
83426735
D
3476Total size of the per-disk cache in bytes.
3477.sp
3478Currently this feature is disabled as it has been found to not be helpful
3479for performance and in some cases harmful.
29714574
TF
3480.sp
3481Default value: \fB0\fR.
3482.RE
3483
29714574
TF
3484.sp
3485.ne 2
3486.na
9f500936 3487\fBzfs_vdev_mirror_rotating_inc\fR (int)
29714574
TF
3488.ad
3489.RS 12n
9f500936 3490A number by which the balancing algorithm increments the load calculation for
3491the purpose of selecting the least busy mirror member when an I/O immediately
3492follows its predecessor on rotational vdevs.
29714574 3494.sp
9f500936 3495Default value: \fB0\fR.
3496.RE
3497
3498.sp
3499.ne 2
3500.na
3501\fBzfs_vdev_mirror_rotating_seek_inc\fR (int)
3502.ad
3503.RS 12n
3504A number by which the balancing algorithm increments the load calculation for
3505the purpose of selecting the least busy mirror member when an I/O lacks
3506locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
3507this window that do not immediately follow the previous I/O are incremented by
3508half of this value.
3509.sp
3510Default value: \fB5\fR.
3511.RE
3512
3513.sp
3514.ne 2
3515.na
3516\fBzfs_vdev_mirror_rotating_seek_offset\fR (int)
3517.ad
3518.RS 12n
3519The maximum distance for the last queued I/O in which the balancing algorithm
3520considers an I/O to have locality.
3521See the section "ZFS I/O SCHEDULER".
3522.sp
3523Default value: \fB1048576\fR.
3524.RE
3525
3526.sp
3527.ne 2
3528.na
3529\fBzfs_vdev_mirror_non_rotating_inc\fR (int)
3530.ad
3531.RS 12n
3532A number by which the balancing algorithm increments the load calculation for
3533the purpose of selecting the least busy mirror member on non-rotational vdevs
3534when I/Os do not immediately follow one another.
3535.sp
3536Default value: \fB0\fR.
3537.RE
3538
3539.sp
3540.ne 2
3541.na
3542\fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int)
3543.ad
3544.RS 12n
3545A number by which the balancing algorithm increments the load calculation for
3546the purpose of selecting the least busy mirror member when an I/O lacks
3547locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
3548this window that do not immediately follow the previous I/O are incremented by
3549half of this value.
3550.sp
3551Default value: \fB1\fR.
29714574
TF
3552.RE
3553
29714574
TF
3554.sp
3555.ne 2
3556.na
3557\fBzfs_vdev_read_gap_limit\fR (int)
3558.ad
3559.RS 12n
83426735
D
3560Aggregate read I/O operations if the gap on-disk between them is within this
3561threshold.
29714574
TF
3562.sp
3563Default value: \fB32,768\fR.
3564.RE
3565
29714574
TF
3566.sp
3567.ne 2
3568.na
3569\fBzfs_vdev_write_gap_limit\fR (int)
3570.ad
3571.RS 12n
3572Aggregate write I/O over gap
3573.sp
3574Default value: \fB4,096\fR.
3575.RE
3576
ab9f4b0b
GN
3577.sp
3578.ne 2
3579.na
3580\fBzfs_vdev_raidz_impl\fR (string)
3581.ad
3582.RS 12n
c9187d86 3583Parameter for selecting the raidz parity implementation to use.
ab9f4b0b
GN
3584
3585Options marked (always) below may be selected on module load as they are
3586supported on all systems.
3587The remaining options may only be set after the module is loaded, as they
3588are available only if the implementations are compiled in and supported
3589on the running system.
3590
3591Once the module is loaded, the content of
3592/sys/module/zfs/parameters/zfs_vdev_raidz_impl will show available options
3593with the currently selected one enclosed in [].
3594Possible options are:
3595 fastest - (always) implementation selected using built-in benchmark
3596 original - (always) original raidz implementation
3597 scalar - (always) scalar raidz implementation
ae25d222
GN
3598 sse2 - implementation using SSE2 instruction set (64bit x86 only)
3599 ssse3 - implementation using SSSE3 instruction set (64bit x86 only)
ab9f4b0b 3600 avx2 - implementation using AVX2 instruction set (64bit x86 only)
7f547f85
RD
3601 avx512f - implementation using AVX512F instruction set (64bit x86 only)
3602 avx512bw - implementation using AVX512F & AVX512BW instruction sets (64bit x86 only)
62a65a65
RD
3603 aarch64_neon - implementation using NEON (Aarch64/64 bit ARMv8 only)
3604 aarch64_neonx2 - implementation using NEON with more unrolling (Aarch64/64 bit ARMv8 only)
35b07497 3605 powerpc_altivec - implementation using Altivec (PowerPC only)
ab9f4b0b
GN
3606.sp
3607Default value: \fBfastest\fR.
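.sp
For example, the implementation could be inspected and changed at runtime as
sketched below (the avx2 choice is only a hypothetical example and must be
listed as available on the running system):
.sp
 # Show available implementations; the active one is enclosed in [].
 cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
 # Select the AVX2 implementation, if supported.
 echo avx2 > /sys/module/zfs/parameters/zfs_vdev_raidz_impl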
3608.RE
3609
67709516
D
3610.sp
3611.ne 2
3612.na
3613\fBzfs_vdev_scheduler\fR (charp)
3614.ad
3615.RS 12n
3616\fBDEPRECATED\fR: This option exists for compatibility with older user
3617configurations. It does nothing except print a warning to the kernel log if
3618set.
3619.sp
3620.RE
3621
29714574
TF
3622.sp
3623.ne 2
3624.na
3625\fBzfs_zevent_cols\fR (int)
3626.ad
3627.RS 12n
83426735 3628When zevents are logged to the console use this as the word wrap width.
29714574
TF
3629.sp
3630Default value: \fB80\fR.
3631.RE
3632
3633.sp
3634.ne 2
3635.na
3636\fBzfs_zevent_console\fR (int)
3637.ad
3638.RS 12n
3639Log events to the console
3640.sp
3641Use \fB1\fR for yes and \fB0\fR for no (default).
3642.RE
3643
3644.sp
3645.ne 2
3646.na
3647\fBzfs_zevent_len_max\fR (int)
3648.ad
3649.RS 12n
83426735
D
3650Max event queue length. A value of 0 will result in a calculated value which
3651increases with the number of CPUs in the system (minimum 64 events). Events
3652in the queue can be viewed with the \fBzpool events\fR command.
29714574
TF
3653.sp
3654Default value: \fB0\fR.
3655.RE
3656
a032ac4b
BB
3657.sp
3658.ne 2
4f072827
DB
3659.na
3660\fBzfs_zevent_retain_max\fR (int)
3661.ad
3662.RS 12n
3663Maximum recent zevent records to retain for duplicate checking. Setting
3664this value to zero disables duplicate detection.
3665.sp
3666Default value: \fB2000\fR.
3667.RE
3668
3669.sp
3670.ne 2
3671.na
3672\fBzfs_zevent_retain_expire_secs\fR (int)
3673.ad
3674.RS 12n
3675Lifespan for a recent ereport that was retained for duplicate checking.
3676.sp
3677Default value: \fB900\fR.
3678.RE
3679
a032ac4b
BB
3680.sp
.ne 2
.na
3681\fBzfs_zil_clean_taskq_maxalloc\fR (int)
3682.ad
3683.RS 12n
3684The maximum number of taskq entries that are allowed to be cached. When this
2fe61a7e 3685limit is exceeded transaction records (itxs) will be cleaned synchronously.
a032ac4b
BB
3686.sp
3687Default value: \fB1048576\fR.
3688.RE
3689
3690.sp
3691.ne 2
3692.na
3693\fBzfs_zil_clean_taskq_minalloc\fR (int)
3694.ad
3695.RS 12n
3696The number of taskq entries that are pre-populated when the taskq is first
3697created and are immediately available for use.
3698.sp
3699Default value: \fB1024\fR.
3700.RE
3701
3702.sp
3703.ne 2
3704.na
3705\fBzfs_zil_clean_taskq_nthr_pct\fR (int)
3706.ad
3707.RS 12n
3708This controls the number of threads used by the dp_zil_clean_taskq. The default
3709value of 100% will create a maximum of one thread per cpu.
3710.sp
be54a13c 3711Default value: \fB100\fR%.
a032ac4b
BB
3712.RE
3713
b8738257
MA
3714.sp
3715.ne 2
3716.na
3717\fBzil_maxblocksize\fR (int)
3718.ad
3719.RS 12n
3720This sets the maximum block size used by the ZIL. On very fragmented pools,
3721lowering this (typically to 36KB) can improve performance.
3722.sp
3723Default value: \fB131072\fR (128KB).
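.sp
For example, a hypothetical runtime change to the 36KB block size mentioned
above, assuming the usual module parameter path, would be:
.sp
 # 36KB = 36864 bytes.
 echo 36864 > /sys/module/zfs/parameters/zil_maxblocksize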
3724.RE
3725
53b1f5ea
PS
3726.sp
3727.ne 2
3728.na
3729\fBzil_nocacheflush\fR (int)
3730.ad
3731.RS 12n
3732Disable the cache flush commands that are normally sent to the disk(s) by
3733the ZIL after an LWB write has completed. Setting this will cause ZIL
3734corruption on power loss if a volatile out-of-order write cache is enabled.
3735.sp
3736Use \fB1\fR for yes and \fB0\fR for no (default).
3737.RE
3738
29714574
TF
3739.sp
3740.ne 2
3741.na
3742\fBzil_replay_disable\fR (int)
3743.ad
3744.RS 12n
83426735
D
3745Disable intent logging replay. Can be disabled for recovery from corrupted
3746ZIL
29714574
TF
3747.sp
3748Use \fB1\fR for yes and \fB0\fR for no (default).
3749.RE
3750
3751.sp
3752.ne 2
3753.na
1b7c1e5c 3754\fBzil_slog_bulk\fR (ulong)
29714574
TF
3755.ad
3756.RS 12n
1b7c1e5c
GDN
3757Limit SLOG write size per commit executed with synchronous priority.
3758Any writes above that will be executed with lower (asynchronous) priority
3759to limit potential SLOG device abuse by a single active ZIL writer.
29714574 3760.sp
1b7c1e5c 3761Default value: \fB786,432\fR.
29714574
TF
3762.RE
3763
638dd5f4
TC
3764.sp
3765.ne 2
3766.na
3767\fBzio_deadman_log_all\fR (int)
3768.ad
3769.RS 12n
3770If non-zero, the zio deadman will produce debugging messages (see
3771\fBzfs_dbgmsg_enable\fR) for all zios, rather than only for leaf
3772zios possessing a vdev. This is meant to be used by developers to gain
3773diagnostic information for hang conditions which don't involve a mutex
3774or other locking primitive; typically conditions in which a thread in
3775the zio pipeline is looping indefinitely.
3776.sp
3777Default value: \fB0\fR.
3778.RE
3779
c3bd3fb4
TC
3780.sp
3781.ne 2
3782.na
3783\fBzio_decompress_fail_fraction\fR (int)
3784.ad
3785.RS 12n
3786If non-zero, this value represents the denominator of the probability that zfs
3787should induce a decompression failure. For instance, for a 5% decompression
3788failure rate, this value should be set to 20.
3789.sp
3790Default value: \fB0\fR.
3791.RE
3792
29714574
TF
3793.sp
3794.ne 2
3795.na
ad796b8a 3796\fBzio_slow_io_ms\fR (int)
29714574
TF
3797.ad
3798.RS 12n
ad796b8a
TH
3799When an I/O operation takes more than \fBzio_slow_io_ms\fR milliseconds to
3800complete, it is marked as a slow I/O. Each slow I/O causes a delay zevent. Slow
3801I/O counters can be seen with "zpool status -s".
3802
29714574
TF
3803.sp
3804Default value: \fB30,000\fR.
3805.RE
3806
3dfb57a3
DB
3807.sp
3808.ne 2
3809.na
3810\fBzio_dva_throttle_enabled\fR (int)
3811.ad
3812.RS 12n
ad796b8a 3813Throttle block allocations in the I/O pipeline. This allows for
3dfb57a3 3814dynamic allocation distribution when devices are imbalanced.
e815485f
TC
3815When enabled, the maximum number of pending allocations per top-level vdev
3816is limited by \fBzfs_vdev_queue_depth_pct\fR.
3dfb57a3 3817.sp
27f2b90d 3818Default value: \fB1\fR.
3dfb57a3
DB
3819.RE
3820
29714574
TF
3821.sp
3822.ne 2
3823.na
3824\fBzio_requeue_io_start_cut_in_line\fR (int)
3825.ad
3826.RS 12n
3827Prioritize requeued I/O
3828.sp
3829Default value: \fB0\fR.
3830.RE
3831
dcb6bed1
D
3832.sp
3833.ne 2
3834.na
3835\fBzio_taskq_batch_pct\fR (uint)
3836.ad
3837.RS 12n
3838Percentage of online CPUs (or CPU cores, etc) which will run a worker thread
ad796b8a 3839for I/O. These workers are responsible for I/O work such as compression and
dcb6bed1
D
3840checksum calculations. A fractional number of CPUs will be rounded down.
3841.sp
3842The default value of 75 was chosen to avoid using all CPUs which can result in
3843latency issues and inconsistent application performance, especially when high
3844compression is enabled.
3845.sp
3846Default value: \fB75\fR.
3847.RE
3848
29714574
TF
3849.sp
3850.ne 2
3851.na
3852\fBzvol_inhibit_dev\fR (uint)
3853.ad
3854.RS 12n
83426735
D
3855Do not create zvol device nodes. This may slightly improve startup time on
3856systems with a very large number of zvols.
29714574
TF
3857.sp
3858Use \fB1\fR for yes and \fB0\fR for no (default).
3859.RE
3860
3861.sp
3862.ne 2
3863.na
3864\fBzvol_major\fR (uint)
3865.ad
3866.RS 12n
83426735 3867Major number for zvol block devices
29714574
TF
3868.sp
3869Default value: \fB230\fR.
3870.RE
3871
3872.sp
3873.ne 2
3874.na
3875\fBzvol_max_discard_blocks\fR (ulong)
3876.ad
3877.RS 12n
83426735
D
3878Discard (aka TRIM) operations done on zvols will be done in batches of this
3879many blocks, where block size is determined by the \fBvolblocksize\fR property
3880of a zvol.
29714574
TF
3881.sp
3882Default value: \fB16,384\fR.
3883.RE
3884
9965059a
BB
3885.sp
3886.ne 2
3887.na
3888\fBzvol_prefetch_bytes\fR (uint)
3889.ad
3890.RS 12n
3891When adding a zvol to the system prefetch \fBzvol_prefetch_bytes\fR
3892from the start and end of the volume. Prefetching these regions
3893of the volume is desirable because they are likely to be accessed
3894immediately by \fBblkid(8)\fR or by the kernel scanning for a partition
3895table.
3896.sp
3897Default value: \fB131,072\fR.
3898.RE
3899
692e55b8
CC
3900.sp
3901.ne 2
3902.na
3903\fBzvol_request_sync\fR (uint)
3904.ad
3905.RS 12n
3906When processing I/O requests for a zvol, submit them synchronously. This
3907effectively limits the queue depth to 1 for each I/O submitter. When set
3908to 0, requests are handled asynchronously by a thread pool. The number of
3909requests which can be handled concurrently is controlled by \fBzvol_threads\fR.
3910.sp
8fa5250f 3911Default value: \fB0\fR.
692e55b8
CC
3912.RE

.sp
.ne 2
.na
\fBzvol_threads\fR (uint)
.ad
.RS 12n
Maximum number of threads which can handle zvol I/O requests concurrently.
.sp
Default value: \fB32\fR.
.RE

.sp
.ne 2
.na
\fBzvol_volmode\fR (uint)
.ad
.RS 12n
Defines zvol block device behavior when \fBvolmode\fR is set to \fBdefault\fR.
Valid values are \fB1\fR (full), \fB2\fR (dev), and \fB3\fR (none).
.sp
Default value: \fB1\fR.
.RE

.SH ZFS I/O SCHEDULER
ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
The I/O scheduler determines when and in what order those operations are
issued. The I/O scheduler divides operations into five I/O classes
prioritized in the following order: sync read, sync write, async read,
async write, and scrub/resilver. Each queue defines the minimum and
maximum number of concurrent operations that may be issued to the
device. In addition, the device has an aggregate maximum,
\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
must not exceed the aggregate maximum. If the sum of the per-queue
maximums exceeds the aggregate maximum, then the number of active I/Os
may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
be issued regardless of whether all per-queue minimums have been met.
.sp
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers. Further, physical
devices typically have a limit at which more concurrent operations have no
effect on throughput or can actually cause it to decrease.
.sp
The scheduler selects the next operation to issue by first looking for an
I/O class whose minimum has not been satisfied. Once all are satisfied and
the aggregate maximum has not been hit, the scheduler looks for classes
whose maximum has not been satisfied. Iteration through the I/O classes is
done in the order specified above. No further operations are issued if the
aggregate maximum number of concurrent operations has been hit or if there
are no operations queued for an I/O class that has not hit its maximum.
Every time an I/O is queued or an operation completes, the I/O scheduler
looks for new operations to issue.
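.sp
The selection logic can be summarized with the following simplified Python
sketch (illustrative only, not the actual vdev queue implementation; the class
order and limits follow the description above):
.nf

    # I/O classes in priority order.
    CLASSES = ["sync_read", "sync_write", "async_read",
               "async_write", "scrub"]

    def next_class_to_issue(active, queued, min_active, max_active,
                            total_active, zfs_vdev_max_active):
        # Stop issuing once the aggregate maximum is reached.
        if total_active >= zfs_vdev_max_active:
            return None
        # First, service any class that has not met its minimum.
        for c in CLASSES:
            if queued[c] and active[c] < min_active[c]:
                return c
        # Then, service classes that are still below their maximum.
        for c in CLASSES:
            if queued[c] and active[c] < max_active[c]:
                return c
        return None

.fi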
.sp
In general, smaller values of max_active will lead to lower latency of
synchronous operations. Larger values of max_active may lead to higher overall
throughput, depending on underlying storage.
.sp
The ratio of the queues' max_active values determines the balance of performance
between reads, writes, and scrubs. E.g., increasing
\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
more quickly, but will cause reads and writes to have higher latency and lower
throughput.
.sp
All I/O classes have a fixed maximum number of outstanding operations
except for the async write class. Asynchronous writes represent the data
that is committed to stable storage during the syncing stage for
transaction groups. Transaction groups enter the syncing state
periodically so the number of queued async writes will quickly burst up
and then bleed down to zero. Rather than servicing them as quickly as
possible, the I/O scheduler changes the maximum number of active async
write I/Os according to the amount of dirty data in the pool. Since
both throughput and latency typically increase with the number of
concurrent operations issued to physical devices, reducing the
burstiness in the number of concurrent operations also stabilizes the
response time of operations from other -- and in particular synchronous
-- queues. In broad strokes, the I/O scheduler will issue more
concurrent operations from the async write queue as there's more dirty
data in the pool.
.sp
Async Writes
.sp
The number of concurrent operations issued for the async write I/O class
follows a piece-wise linear function defined by a few adjustable points.
.nf

       |              o---------| <-- zfs_vdev_async_write_max_active
  ^    |             /^         |
  |    |            / |         |
active |           /  |         |
 I/O   |          /   |         |
count  |         /    |         |
       |        /     |         |
       |-------o      |         | <-- zfs_vdev_async_write_min_active
      0|_______^______|_________|
       0%      |      |           100% of zfs_dirty_data_max
               |      |
               |      `-- zfs_vdev_async_write_active_max_dirty_percent
               `--------- zfs_vdev_async_write_active_min_dirty_percent

.fi
Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum. As that threshold is crossed, the
number of concurrent operations issued increases linearly to the maximum at
the specified maximum percentage of the dirty data allowed in the pool.
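.sp
The piece-wise linear function above can be written out as a small sketch
(illustrative Python, not the in-kernel implementation; the parameter names
correspond to the tunables shown in the diagram):
.nf

    # Illustrative only: active async write limit as a function of the
    # fraction of zfs_dirty_data_max that is currently dirty (0.0 - 1.0).
    def async_write_max_active(dirty_frac, min_active, max_active,
                               min_dirty_pct, max_dirty_pct):
        lo, hi = min_dirty_pct / 100.0, max_dirty_pct / 100.0
        if dirty_frac <= lo:
            return min_active
        if dirty_frac >= hi:
            return max_active
        # Linear interpolation between the two break points.
        slope = (max_active - min_active) / (hi - lo)
        return round(min_active + slope * (dirty_frac - lo))

    # Example (made-up values): async_write_max_active(0.45, 2, 10, 30, 60) -> 6

.fi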
.sp
Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
maximum percentage, this indicates that the rate of incoming data is
greater than the rate that the backend storage can handle. In this case, we
must further throttle incoming writes, as described in the next section.

.SH ZFS TRANSACTION DELAY
We delay transactions when we've determined that the backend storage
isn't able to accommodate the rate of incoming writes.
.sp
If there is already a transaction waiting, we delay relative to when
that transaction will finish waiting. This way the calculated delay time
is independent of the number of threads concurrently executing
transactions.
.sp
If we are the only waiter, wait relative to when the transaction
started, rather than the current time. This credits the transaction for
"time already served", e.g. reading indirect blocks.
.sp
The minimum time for a transaction to take is calculated as:
.nf
    min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
    min_time is then capped at 100 milliseconds.
.fi
.sp
The delay has two degrees of freedom that can be adjusted via tunables. The
percentage of dirty data at which we start to delay is defined by
\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
delay after writing at full speed has failed to keep up with the incoming write
rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
this variable determines the amount of delay at the midpoint of the curve.
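.sp
Written out as a small sketch (illustrative Python, not the in-kernel code; the
variable names follow the formula above, with times expressed in microseconds
for readability), the per-transaction delay is:
.nf

    # Illustrative only: per-transaction delay from the formula above.
    def tx_min_time_us(zfs_delay_scale_us, dirty, dirty_min, dirty_max):
        if dirty <= dirty_min:
            return 0.0                       # no delay below the threshold
        min_time = (zfs_delay_scale_us * (dirty - dirty_min)
                    / (dirty_max - dirty))
        return min(min_time, 100_000.0)      # capped at 100 milliseconds

    # A midpoint delay of 500us is effectively 1s / 500us = 2000 IOPS.

.fi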
.sp
.nf
delay
 10ms +-------------------------------------------------------------*+
      |                                                             *|
  9ms +                                                             *+
      |                                                             *|
  8ms +                                                             *+
      |                                                            * |
  7ms +                                                            * +
      |                                                            * |
  6ms +                                                            * +
      |                                                            * |
  5ms +                                                            * +
      |                                                           *  |
  4ms +                                                           *  +
      |                                                           *  |
  3ms +                                                          *   +
      |                                                          *   |
  2ms +                                              (midpoint) *    +
      |                                              |        **     |
  1ms +                                              v    ***        +
      |               zfs_delay_scale ---------->   ********         |
    0 +-------------------------------------*********----------------+
      0%                    <- zfs_dirty_data_max ->               100%
.fi
.sp
Note that since the delay is added to the outstanding time remaining on the
most recent transaction, the delay is effectively the inverse of IOPS.
Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
was chosen such that small changes in the amount of accumulated dirty data
in the first 3/4 of the curve yield relatively small differences in the
amount of delay.
.sp
The effects can be easier to understand when the amount of delay is
represented on a log scale:
.sp
.nf
delay
100ms +-------------------------------------------------------------++
      +                                                              +
      |                                                              |
      +                                                             *+
 10ms +                                                             *+
      +                                                           **  +
      |                                       (midpoint)        **    |
      +                                       |               **      +
  1ms +                                       v           ****        +
      +               zfs_delay_scale ---------->    *****            +
      |                                          ****                 |
      +                                      ****                     +
100us +                                    **                         +
      +                                   *                           +
      |                                  *                            |
      +                                 *                             +
 10us +                                *                              +
      +                                                               +
      |                                                               |
      +                                                               +
      +--------------------------------------------------------------+
      0%                    <- zfs_dirty_data_max ->               100%
.fi
.sp
Note here that only as the amount of dirty data approaches its limit does
the delay start to increase rapidly. The goal of a properly tuned system
should be to keep the amount of dirty data out of that range by first
ensuring that the appropriate limits are set for the I/O scheduler to reach
optimal throughput on the backend storage, and then by changing the value
of \fBzfs_delay_scale\fR to increase the steepness of the curve.