'\" te
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" Copyright (c) 2019, 2021 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.TH ZFS-MODULE-PARAMETERS 5 "Aug 24, 2020" OpenZFS
.SH NAME
zfs\-module\-parameters \- ZFS module parameters
.SH DESCRIPTION
.sp
.LP
Description of the different parameters to the ZFS module.

.SS "Module parameters"
.sp
.LP

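.sp
.LP
On Linux, the current value of each parameter below can typically be read from,
and runtime-changeable parameters written to,
\fB/sys/module/zfs/parameters/<name>\fR; persistent settings are usually applied
with a \fBmodprobe.d\fR options line instead. The following Python sketch is
only an illustration of that interface (the helper names are not part of ZFS):
.sp
.nf
# Minimal sketch, assuming a Linux system with the zfs module loaded.
from pathlib import Path

PARAMS = Path("/sys/module/zfs/parameters")

def get_param(name: str) -> str:
    """Return the current value of a zfs module parameter as a string."""
    return (PARAMS / name).read_text().strip()

def set_param(name: str, value) -> None:
    """Set a runtime-changeable zfs module parameter (requires root)."""
    (PARAMS / name).write_text(str(value))

if __name__ == "__main__":
    print("l2arc_write_max =", get_param("l2arc_write_max"))
    # set_param("l2arc_write_max", 16 * 1024 * 1024)  # e.g. raise to 16MB
.fi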
29.sp
30.ne 2
31.na
32\fBdbuf_cache_max_bytes\fR (ulong)
33.ad
34.RS 12n
Maximum size in bytes of the dbuf cache. The target size is determined by the
MIN of this value and \fB1/2^dbuf_cache_shift\fR (1/32) of the target ARC size.
The behavior of the dbuf cache and its associated settings can be observed via
the \fB/proc/spl/kstat/zfs/dbufstats\fR kstat.
.sp
Default value: \fBULONG_MAX\fR.
41.RE
42
43.sp
44.ne 2
45.na
46\fBdbuf_metadata_cache_max_bytes\fR (ulong)
47.ad
48.RS 12n
Maximum size in bytes of the metadata dbuf cache. The target size is
determined by the MIN of this value and \fB1/2^dbuf_metadata_cache_shift\fR
(1/64) of the target ARC size. The behavior of the metadata dbuf cache and its
associated settings can be observed via the \fB/proc/spl/kstat/zfs/dbufstats\fR
kstat.
.sp
Default value: \fBULONG_MAX\fR.
55.RE
56
57.sp
58.ne 2
59.na
60\fBdbuf_cache_hiwater_pct\fR (uint)
61.ad
62.RS 12n
63The percentage over \fBdbuf_cache_max_bytes\fR when dbufs must be evicted
64directly.
65.sp
66Default value: \fB10\fR%.
67.RE
68
69.sp
70.ne 2
71.na
72\fBdbuf_cache_lowater_pct\fR (uint)
73.ad
74.RS 12n
75The percentage below \fBdbuf_cache_max_bytes\fR when the evict thread stops
76evicting dbufs.
77.sp
78Default value: \fB10\fR%.
79.RE
80
81.sp
82.ne 2
83.na
84\fBdbuf_cache_shift\fR (int)
85.ad
86.RS 12n
87Set the size of the dbuf cache, \fBdbuf_cache_max_bytes\fR, to a log2 fraction
77f6826b 88of the target ARC size.
89.sp
90Default value: \fB5\fR.
91.RE
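.sp
.LP
As a rough sketch (not code from the ZFS sources) of how the dbuf cache target
described above follows from \fBdbuf_cache_max_bytes\fR and
\fBdbuf_cache_shift\fR:
.sp
.nf
# Assumed relationship: the smaller of dbuf_cache_max_bytes and a
# 1/2^dbuf_cache_shift fraction of the target ARC size.
ULONG_MAX = 2**64 - 1

def dbuf_cache_target(arc_target_bytes: int,
                      dbuf_cache_max_bytes: int = ULONG_MAX,
                      dbuf_cache_shift: int = 5) -> int:
    return min(dbuf_cache_max_bytes, arc_target_bytes >> dbuf_cache_shift)

# With the defaults, a 16GB ARC target yields a 512MB dbuf cache target.
print(dbuf_cache_target(16 * 2**30) // 2**20, "MB")
.fi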
92
93.sp
94.ne 2
95.na
96\fBdbuf_metadata_cache_shift\fR (int)
97.ad
98.RS 12n
99Set the size of the dbuf metadata cache, \fBdbuf_metadata_cache_max_bytes\fR,
77f6826b 100to a log2 fraction of the target ARC size.
101.sp
102Default value: \fB6\fR.
103.RE
104
105.sp
106.ne 2
107.na
108\fBdmu_object_alloc_chunk_shift\fR (int)
109.ad
110.RS 12n
111dnode slots allocated in a single operation as a power of 2. The default value
112minimizes lock contention for the bulk operation performed.
113.sp
114Default value: \fB7\fR (128).
115.RE
116
117.sp
118.ne 2
119.na
120\fBdmu_prefetch_max\fR (int)
121.ad
122.RS 12n
123Limit the amount we can prefetch with one call to this amount (in bytes).
124This helps to limit the amount of memory that can be used by prefetching.
125.sp
126Default value: \fB134,217,728\fR (128MB).
127.RE
128
129.sp
130.ne 2
131.na
132\fBignore_hole_birth\fR (int)
133.ad
134.RS 12n
6ce7b2d9 135This is an alias for \fBsend_holes_without_birth_time\fR.
136.RE
137
138.sp
139.ne 2
140.na
141\fBl2arc_feed_again\fR (int)
142.ad
143.RS 12n
144Turbo L2ARC warm-up. When the L2ARC is cold the fill interval will be set as
145fast as possible.
146.sp
147Use \fB1\fR for yes (default) and \fB0\fR to disable.
148.RE
149
150.sp
151.ne 2
152.na
153\fBl2arc_feed_min_ms\fR (ulong)
154.ad
155.RS 12n
Minimum feed interval in milliseconds. Only applicable when
\fBl2arc_feed_again=1\fR.
158.sp
159Default value: \fB200\fR.
160.RE
161
162.sp
163.ne 2
164.na
165\fBl2arc_feed_secs\fR (ulong)
166.ad
167.RS 12n
Seconds between L2ARC writes.
169.sp
170Default value: \fB1\fR.
171.RE
172
173.sp
174.ne 2
175.na
176\fBl2arc_headroom\fR (ulong)
177.ad
178.RS 12n
How far through the ARC lists to search for L2ARC cacheable content, expressed
as a multiplier of \fBl2arc_write_max\fR.
ARC persistence across reboots can be achieved with persistent L2ARC by setting
this parameter to \fB0\fR, allowing the full length of the ARC lists to be
searched for cacheable content.
184.sp
185Default value: \fB2\fR.
186.RE
187
188.sp
189.ne 2
190.na
191\fBl2arc_headroom_boost\fR (ulong)
192.ad
193.RS 12n
83426735 194Scales \fBl2arc_headroom\fR by this percentage when L2ARC contents are being
195successfully compressed before writing. A value of \fB100\fR disables this
196feature.
29714574 197.sp
be54a13c 198Default value: \fB200\fR%.
199.RE
200
201.sp
202.ne 2
203.na
204\fBl2arc_mfuonly\fR (int)
205.ad
206.RS 12n
207Controls whether only MFU metadata and data are cached from ARC into L2ARC.
208This may be desired to avoid wasting space on L2ARC when reading/writing large
209amounts of data that are not expected to be accessed more than once. The
210default is \fB0\fR, meaning both MRU and MFU data and metadata are cached.
When this feature is turned off (\fB0\fR), some MRU buffers will still be
present in ARC and eventually cached on L2ARC. If \fBl2arc_noprefetch\fR is set
to 0,
213some prefetched buffers will be cached to L2ARC, and those might later
214transition to MRU, in which case the \fBl2arc_mru_asize\fR arcstat will not
215be 0. Regardless of \fBl2arc_noprefetch\fR, some MFU buffers might be evicted
216from ARC, accessed later on as prefetches and transition to MRU as prefetches.
217If accessed again they are counted as MRU and the \fBl2arc_mru_asize\fR arcstat
218will not be 0. The ARC status of L2ARC buffers when they were first cached in
219L2ARC can be seen in the \fBl2arc_mru_asize\fR, \fBl2arc_mfu_asize\fR and
220\fBl2arc_prefetch_asize\fR arcstats when importing the pool or onlining a cache
device if persistent L2ARC is enabled. The \fBevicted_l2_eligible_mru\fR
arcstat is not affected by whether this option is enabled, so the information
provided by the evicted_l2_eligible_* arcstats can be used to decide if
toggling this option is appropriate for the current workload.
225.sp
226Use \fB0\fR for no (default) and \fB1\fR for yes.
227.RE
228
229.sp
230.ne 2
231.na
232\fBl2arc_meta_percent\fR (int)
233.ad
234.RS 12n
Percent of ARC size allowed for L2ARC-only headers.
Since L2ARC buffers are not evicted on memory pressure, too large an amount of
headers on a system with an irrationally large L2ARC can render it slow or
unusable. This parameter limits L2ARC writes and rebuilds to achieve this.
239.sp
240Default value: \fB33\fR%.
241.RE
242
243.sp
244.ne 2
245.na
246\fBl2arc_trim_ahead\fR (ulong)
247.ad
248.RS 12n
249Trims ahead of the current write size (\fBl2arc_write_max\fR) on L2ARC devices
250by this percentage of write size if we have filled the device. If set to
251\fB100\fR we TRIM twice the space required to accommodate upcoming writes. A
252minimum of 64MB will be trimmed. It also enables TRIM of the whole L2ARC device
253upon creation or addition to an existing pool or if the header of the device is
254invalid upon importing a pool or onlining a cache device. A value of \fB0\fR
255disables TRIM on L2ARC altogether and is the default as it can put significant
stress on the underlying storage devices. This will vary depending on how well
the specific device handles these commands.
258.sp
259Default value: \fB0\fR%.
260.RE
261
262.sp
263.ne 2
264.na
265\fBl2arc_noprefetch\fR (int)
266.ad
267.RS 12n
83426735 268Do not write buffers to L2ARC if they were prefetched but not used by
269applications. In case there are prefetched buffers in L2ARC and this option
270is later set to \fB1\fR, we do not read the prefetched buffers from L2ARC.
271Setting this option to \fB0\fR is useful for caching sequential reads from the
272disks to L2ARC and serve those reads from L2ARC later on. This may be beneficial
273in case the L2ARC device is significantly faster in sequential reads than the
274disks of the pool.
275.sp
Use \fB1\fR to disable (default) and \fB0\fR to enable caching/reading
prefetches to/from L2ARC.
278.RE
279
280.sp
281.ne 2
282.na
283\fBl2arc_norw\fR (int)
284.ad
285.RS 12n
77f6826b 286No reads during writes.
287.sp
288Use \fB1\fR for yes and \fB0\fR for no (default).
289.RE
290
291.sp
292.ne 2
293.na
294\fBl2arc_write_boost\fR (ulong)
295.ad
296.RS 12n
603a1784 297Cold L2ARC devices will have \fBl2arc_write_max\fR increased by this amount
83426735 298while they remain cold.
299.sp
300Default value: \fB8,388,608\fR.
301.RE
302
303.sp
304.ne 2
305.na
306\fBl2arc_write_max\fR (ulong)
307.ad
308.RS 12n
77f6826b 309Max write bytes per interval.
310.sp
311Default value: \fB8,388,608\fR.
312.RE
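.sp
.LP
As a rough illustration of how \fBl2arc_write_max\fR and
\fBl2arc_write_boost\fR combine (a sketch based on the descriptions above, not
code from the ZFS sources):
.sp
.nf
# Target bytes written to a cache device per feed interval.
def l2arc_write_target(device_is_cold: bool,
                       l2arc_write_max: int = 8_388_608,
                       l2arc_write_boost: int = 8_388_608) -> int:
    return l2arc_write_max + (l2arc_write_boost if device_is_cold else 0)

print(l2arc_write_target(True))   # 16777216 while the device warms up
print(l2arc_write_target(False))  # 8388608 in steady state
.fi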
313
314.sp
315.ne 2
316.na
317\fBl2arc_rebuild_enabled\fR (int)
318.ad
319.RS 12n
320Rebuild the L2ARC when importing a pool (persistent L2ARC). This can be
321disabled if there are problems importing a pool or attaching an L2ARC device
322(e.g. the L2ARC device is slow in reading stored log metadata, or the metadata
323has become somehow fragmented/unusable).
324.sp
325Use \fB1\fR for yes (default) and \fB0\fR for no.
326.RE
327
328.sp
329.ne 2
330.na
331\fBl2arc_rebuild_blocks_min_l2size\fR (ulong)
332.ad
333.RS 12n
334Min size (in bytes) of an L2ARC device required in order to write log blocks
335in it. The log blocks are used upon importing the pool to rebuild
336the L2ARC (persistent L2ARC). Rationale: for L2ARC devices less than 1GB, the
337amount of data l2arc_evict() evicts is significant compared to the amount of
338restored L2ARC data. In this case do not write log blocks in L2ARC in order not
339to waste space.
340.sp
341Default value: \fB1,073,741,824\fR (1GB).
342.RE
343
344.sp
345.ne 2
346.na
347\fBmetaslab_aliquot\fR (ulong)
348.ad
349.RS 12n
350Metaslab granularity, in bytes. This is roughly similar to what would be
351referred to as the "stripe size" in traditional RAID arrays. In normal
352operation, ZFS will try to write this amount of data to a top-level vdev
353before moving on to the next one.
354.sp
355Default value: \fB524,288\fR.
356.RE
357
358.sp
359.ne 2
360.na
361\fBmetaslab_bias_enabled\fR (int)
362.ad
363.RS 12n
364Enable metaslab group biasing based on its vdev's over- or under-utilization
365relative to the pool.
366.sp
367Use \fB1\fR for yes (default) and \fB0\fR for no.
368.RE
369
370.sp
371.ne 2
372.na
373\fBmetaslab_force_ganging\fR (ulong)
374.ad
375.RS 12n
376Make some blocks above a certain size be gang blocks. This option is used
377by the test suite to facilitate testing.
378.sp
379Default value: \fB16,777,217\fR.
380.RE
381
382.sp
383.ne 2
384.na
385\fBzfs_history_output_max\fR (int)
386.ad
387.RS 12n
388When attempting to log the output nvlist of an ioctl in the on-disk history, the
389output will not be stored if it is larger than size (in bytes). This must be
less than DMU_MAX_ACCESS (64MB). This applies primarily to
391zfs_ioc_channel_program().
392.sp
393Default value: \fB1MB\fR.
394.RE
395
396.sp
397.ne 2
398.na
399\fBzfs_keep_log_spacemaps_at_export\fR (int)
400.ad
401.RS 12n
402Prevent log spacemaps from being destroyed during pool exports and destroys.
403.sp
404Use \fB1\fR for yes and \fB0\fR for no (default).
405.RE
406
407.sp
408.ne 2
409.na
410\fBzfs_metaslab_segment_weight_enabled\fR (int)
411.ad
412.RS 12n
413Enable/disable segment-based metaslab selection.
414.sp
415Use \fB1\fR for yes (default) and \fB0\fR for no.
416.RE
417
418.sp
419.ne 2
420.na
421\fBzfs_metaslab_switch_threshold\fR (int)
422.ad
423.RS 12n
424When using segment-based metaslab selection, continue allocating
321204be 425from the active metaslab until \fBzfs_metaslab_switch_threshold\fR
426worth of buckets have been exhausted.
427.sp
428Default value: \fB2\fR.
429.RE
430
431.sp
432.ne 2
433.na
aa7d06a9 434\fBmetaslab_debug_load\fR (int)
435.ad
436.RS 12n
437Load all metaslabs during pool import.
438.sp
439Use \fB1\fR for yes and \fB0\fR for no (default).
440.RE
441
442.sp
443.ne 2
444.na
445\fBmetaslab_debug_unload\fR (int)
446.ad
447.RS 12n
448Prevent metaslabs from being unloaded.
449.sp
450Use \fB1\fR for yes and \fB0\fR for no (default).
451.RE
452
453.sp
454.ne 2
455.na
456\fBmetaslab_fragmentation_factor_enabled\fR (int)
457.ad
458.RS 12n
459Enable use of the fragmentation metric in computing metaslab weights.
460.sp
461Use \fB1\fR for yes (default) and \fB0\fR for no.
462.RE
463
464.sp
465.ne 2
466.na
467\fBmetaslab_df_max_search\fR (int)
468.ad
469.RS 12n
470Maximum distance to search forward from the last offset. Without this limit,
471fragmented pools can see >100,000 iterations and metaslab_block_picker()
472becomes the performance limiting factor on high-performance storage.
473
474With the default setting of 16MB, we typically see less than 500 iterations,
475even with very fragmented, ashift=9 pools. The maximum number of iterations
476possible is: \fBmetaslab_df_max_search / (2 * (1<<ashift))\fR.
477With the default setting of 16MB this is 16*1024 (with ashift=9) or 2048
478(with ashift=12).
479.sp
480Default value: \fB16,777,216\fR (16MB)
481.RE
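.sp
.LP
A worked example of the iteration bound quoted above,
\fBmetaslab_df_max_search / (2 * (1<<ashift))\fR:
.sp
.nf
def max_search_iterations(metaslab_df_max_search: int = 16 * 1024 * 1024,
                          ashift: int = 9) -> int:
    return metaslab_df_max_search // (2 * (1 << ashift))

print(max_search_iterations(ashift=9))   # 16384 (16*1024)
print(max_search_iterations(ashift=12))  # 2048
.fi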
482
483.sp
484.ne 2
485.na
486\fBmetaslab_df_use_largest_segment\fR (int)
487.ad
488.RS 12n
489If we are not searching forward (due to metaslab_df_max_search,
490metaslab_df_free_pct, or metaslab_df_alloc_threshold), this tunable controls
b596585f 491what segment is used. If it is set, we will use the largest free segment.
492If it is not set, we will use a segment of exactly the requested size (or
493larger).
494.sp
495Use \fB1\fR for yes and \fB0\fR for no (default).
496.RE
497
498.sp
499.ne 2
500.na
501\fBzfs_metaslab_max_size_cache_sec\fR (ulong)
502.ad
503.RS 12n
504When we unload a metaslab, we cache the size of the largest free chunk. We use
505that cached size to determine whether or not to load a metaslab for a given
506allocation. As more frees accumulate in that metaslab while it's unloaded, the
507cached max size becomes less and less accurate. After a number of seconds
508controlled by this tunable, we stop considering the cached max size and start
509considering only the histogram instead.
510.sp
511Default value: \fB3600 seconds\fR (one hour)
512.RE
513
514.sp
515.ne 2
516.na
517\fBzfs_metaslab_mem_limit\fR (int)
518.ad
519.RS 12n
520When we are loading a new metaslab, we check the amount of memory being used
521to store metaslab range trees. If it is over a threshold, we attempt to unload
522the least recently used metaslab to prevent the system from clogging all of
523its memory with range trees. This tunable sets the percentage of total system
524memory that is the threshold.
525.sp
eef0f4d8 526Default value: \fB25 percent\fR
527.RE
528
529.sp
530.ne 2
531.na
532\fBzfs_metaslab_try_hard_before_gang\fR (int)
533.ad
534.RS 12n
535If not set (the default), we will first try normal allocation.
536If that fails then we will do a gang allocation.
537If that fails then we will do a "try hard" gang allocation.
538If that fails then we will have a multi-layer gang block.
539.sp
540If set, we will first try normal allocation.
541If that fails then we will do a "try hard" allocation.
542If that fails we will do a gang allocation.
543If that fails we will do a "try hard" gang allocation.
544If that fails then we will have a multi-layer gang block.
545.sp
546Default value: \fB0 (false)\fR
547.RE
548
549.sp
550.ne 2
551.na
552\fBzfs_metaslab_find_max_tries\fR (int)
553.ad
554.RS 12n
555When not trying hard, we only consider this number of the best metaslabs.
556This improves performance, especially when there are many metaslabs per vdev
557and the allocation can't actually be satisfied (so we would otherwise iterate
558all the metaslabs).
559.sp
560Default value: \fB100\fR
561.RE
562
563.sp
564.ne 2
565.na
c853f382 566\fBzfs_vdev_default_ms_count\fR (int)
567.ad
568.RS 12n
When a vdev is added, target this number of metaslabs per top-level vdev.
570.sp
571Default value: \fB200\fR.
572.RE
573
574.sp
575.ne 2
576.na
577\fBzfs_vdev_default_ms_shift\fR (int)
578.ad
579.RS 12n
580Default limit for metaslab size.
581.sp
582Default value: \fB29\fR [meaning (1 << 29) = 512MB].
583.RE
584
585.sp
586.ne 2
587.na
588\fBzfs_vdev_max_auto_ashift\fR (ulong)
589.ad
590.RS 12n
591Maximum ashift used when optimizing for logical -> physical sector size on new
592top-level vdevs.
593.sp
594Default value: \fBASHIFT_MAX\fR (16).
595.RE
596
597.sp
598.ne 2
599.na
600\fBzfs_vdev_min_auto_ashift\fR (ulong)
601.ad
602.RS 12n
603Minimum ashift used when creating new top-level vdevs.
604.sp
605Default value: \fBASHIFT_MIN\fR (9).
606.RE
607
608.sp
609.ne 2
610.na
c853f382 611\fBzfs_vdev_min_ms_count\fR (int)
612.ad
613.RS 12n
614Minimum number of metaslabs to create in a top-level vdev.
615.sp
616Default value: \fB16\fR.
617.RE
618
619.sp
620.ne 2
621.na
622\fBvdev_validate_skip\fR (int)
623.ad
624.RS 12n
Skip label validation steps during pool import. Changing this is not recommended
626unless you know what you are doing and are recovering a damaged label.
627.sp
628Default value: \fB0\fR.
629.RE
630
631.sp
632.ne 2
633.na
634\fBzfs_vdev_ms_count_limit\fR (int)
635.ad
636.RS 12n
637Practical upper limit of total metaslabs per top-level vdev.
638.sp
639Default value: \fB131,072\fR.
640.RE
641
642.sp
643.ne 2
644.na
645\fBmetaslab_preload_enabled\fR (int)
646.ad
647.RS 12n
648Enable metaslab group preloading.
649.sp
650Use \fB1\fR for yes (default) and \fB0\fR for no.
651.RE
652
653.sp
654.ne 2
655.na
656\fBmetaslab_lba_weighting_enabled\fR (int)
657.ad
658.RS 12n
659Give more weight to metaslabs with lower LBAs, assuming they have
660greater bandwidth as is typically the case on a modern constant
661angular velocity disk drive.
662.sp
663Use \fB1\fR for yes (default) and \fB0\fR for no.
664.RE
665
666.sp
667.ne 2
668.na
669\fBmetaslab_unload_delay\fR (int)
670.ad
671.RS 12n
672After a metaslab is used, we keep it loaded for this many txgs, to attempt to
673reduce unnecessary reloading. Note that both this many txgs and
674\fBmetaslab_unload_delay_ms\fR milliseconds must pass before unloading will
675occur.
676.sp
677Default value: \fB32\fR.
678.RE
679
680.sp
681.ne 2
682.na
683\fBmetaslab_unload_delay_ms\fR (int)
684.ad
685.RS 12n
686After a metaslab is used, we keep it loaded for this many milliseconds, to
687attempt to reduce unnecessary reloading. Note that both this many
688milliseconds and \fBmetaslab_unload_delay\fR txgs must pass before unloading
689will occur.
690.sp
691Default value: \fB600000\fR (ten minutes).
692.RE
693
694.sp
695.ne 2
696.na
697\fBreference_history\fR (int)
698.ad
699.RS 12n
700Maximum reference holders being tracked when reference_tracking_enable is
701active.
702.sp
703Default value: \fB3\fR.
704.RE
705
706.sp
707.ne 2
708.na
709\fBreference_tracking_enable\fR (int)
710.ad
711.RS 12n
712Track reference holders to refcount_t objects (debug builds only).
713.sp
714Use \fB1\fR for yes and \fB0\fR for no (default).
715.RE
716
717.sp
718.ne 2
719.na
720\fBsend_holes_without_birth_time\fR (int)
721.ad
722.RS 12n
723When set, the hole_birth optimization will not be used, and all holes will
724always be sent on zfs send. This is useful if you suspect your datasets are
725affected by a bug in hole_birth.
726.sp
727Use \fB1\fR for on (default) and \fB0\fR for off.
728.RE
729
730.sp
731.ne 2
732.na
733\fBspa_config_path\fR (charp)
734.ad
735.RS 12n
SPA config file.
737.sp
738Default value: \fB/etc/zfs/zpool.cache\fR.
739.RE
740
741.sp
742.ne 2
743.na
744\fBspa_asize_inflation\fR (int)
745.ad
746.RS 12n
747Multiplication factor used to estimate actual disk consumption from the
748size of data being written. The default value is a worst case estimate,
749but lower values may be valid for a given pool depending on its
750configuration. Pool administrators who understand the factors involved
751may wish to specify a more realistic inflation factor, particularly if
752they operate close to quota or capacity limits.
753.sp
83426735 754Default value: \fB24\fR.
755.RE
756
757.sp
758.ne 2
759.na
760\fBspa_load_print_vdev_tree\fR (int)
761.ad
762.RS 12n
763Whether to print the vdev tree in the debugging message buffer during pool import.
764Use 0 to disable and 1 to enable.
765.sp
766Default value: \fB0\fR.
767.RE
768
769.sp
770.ne 2
771.na
772\fBspa_load_verify_data\fR (int)
773.ad
774.RS 12n
775Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR)
776import. Use 0 to disable and 1 to enable.
777
778An extreme rewind import normally performs a full traversal of all
779blocks in the pool for verification. If this parameter is set to 0,
780the traversal skips non-metadata blocks. It can be toggled once the
781import has started to stop or start the traversal of non-metadata blocks.
782.sp
83426735 783Default value: \fB1\fR.
784.RE
785
786.sp
787.ne 2
788.na
789\fBspa_load_verify_metadata\fR (int)
790.ad
791.RS 12n
792Whether to traverse blocks during an "extreme rewind" (\fB-X\fR)
793pool import. Use 0 to disable and 1 to enable.
794
795An extreme rewind import normally performs a full traversal of all
1c012083 796blocks in the pool for verification. If this parameter is set to 0,
797the traversal is not performed. It can be toggled once the import has
798started to stop or start the traversal.
799.sp
83426735 800Default value: \fB1\fR.
801.RE
802
803.sp
804.ne 2
805.na
c8242a96 806\fBspa_load_verify_shift\fR (int)
807.ad
808.RS 12n
c8242a96 809Sets the maximum number of bytes to consume during pool import to the log2
77f6826b 810fraction of the target ARC size.
dea377c0 811.sp
c8242a96 812Default value: \fB4\fR.
813.RE
814
815.sp
816.ne 2
817.na
818\fBspa_slop_shift\fR (int)
819.ad
820.RS 12n
821Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space
822in the pool to be consumed. This ensures that we don't run the pool
823completely out of space, due to unaccounted changes (e.g. to the MOS).
824It also limits the worst-case time to allocate space. If we have
825less than this amount of free space, most ZPL operations (e.g. write,
826create) will return ENOSPC.
827.sp
83426735 828Default value: \fB5\fR.
829.RE
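.sp
.LP
As a rough sketch of the slop space calculation implied above (ignoring any
additional clamping the implementation may apply):
.sp
.nf
def slop_space(pool_size_bytes: int, spa_slop_shift: int = 5) -> int:
    # Roughly 1/2^spa_slop_shift of the pool is kept unusable.
    return pool_size_bytes >> spa_slop_shift

size = 10 * 2**40                       # 10TB pool
print(slop_space(size) / size)          # 0.03125, the ~3.2% noted above
print(slop_space(size) // 2**30, "GB")  # 320 GB reserved
.fi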
830
831.sp
832.ne 2
833.na
834\fBvdev_removal_max_span\fR (int)
835.ad
836.RS 12n
837During top-level vdev removal, chunks of data are copied from the vdev
838which may include free space in order to trade bandwidth for IOPS.
839This parameter determines the maximum span of free space (in bytes)
840which will be included as "unnecessary" data in a chunk of copied data.
841
842The default value here was chosen to align with
843\fBzfs_vdev_read_gap_limit\fR, which is a similar concept when doing
844regular reads (but there's no reason it has to be the same).
845.sp
846Default value: \fB32,768\fR.
847.RE
848
849.sp
850.ne 2
851.na
852\fBvdev_file_logical_ashift\fR (ulong)
853.ad
854.RS 12n
855Logical ashift for file-based devices.
856.sp
857Default value: \fB9\fR.
858.RE
859
860.sp
861.ne 2
862.na
863\fBvdev_file_physical_ashift\fR (ulong)
864.ad
865.RS 12n
866Physical ashift for file-based devices.
867.sp
868Default value: \fB9\fR.
869.RE
870
871.sp
872.ne 2
873.na
874\fBzap_iterate_prefetch\fR (int)
875.ad
876.RS 12n
877If this is set, when we start iterating over a ZAP object, zfs will prefetch
878the entire object (all leaf blocks). However, this is limited by
879\fBdmu_prefetch_max\fR.
880.sp
881Use \fB1\fR for on (default) and \fB0\fR for off.
882.RE
883
884.sp
885.ne 2
886.na
887\fBzfetch_array_rd_sz\fR (ulong)
888.ad
889.RS 12n
27b293be 890If prefetching is enabled, disable prefetching for reads larger than this size.
891.sp
892Default value: \fB1,048,576\fR.
893.RE
894
895.sp
896.ne 2
897.na
7f60329a 898\fBzfetch_max_distance\fR (uint)
899.ad
900.RS 12n
7dfc56d8 901Max bytes to prefetch per stream.
29714574 902.sp
903Default value: \fB8,388,608\fR (8MB).
904.RE
905
906.sp
907.ne 2
908.na
909\fBzfetch_max_idistance\fR (uint)
910.ad
911.RS 12n
912Max bytes to prefetch indirects for per stream.
913.sp
Default value: \fB67,108,864\fR (64MB).
915.RE
916
917.sp
918.ne 2
919.na
920\fBzfetch_max_streams\fR (uint)
921.ad
922.RS 12n
27b293be 923Max number of streams per zfetch (prefetch streams per file).
924.sp
925Default value: \fB8\fR.
926.RE
927
928.sp
929.ne 2
930.na
931\fBzfetch_min_sec_reap\fR (uint)
932.ad
933.RS 12n
Min time before an active prefetch stream can be reclaimed.
935.sp
936Default value: \fB2\fR.
937.RE
938
939.sp
940.ne 2
941.na
942\fBzfs_abd_scatter_enabled\fR (int)
943.ad
944.RS 12n
Controls whether ARC buffers may be allocated using scatter/gather lists.
When disabled, all allocations are forced to be linear in kernel memory.
Disabling can improve performance in some code paths at the expense of
fragmented kernel memory.
948.sp
949Default value: \fB1\fR.
950.RE
951
952.sp
953.ne 2
954.na
\fBzfs_abd_scatter_max_order\fR (uint)
956.ad
957.RS 12n
958Maximum number of consecutive memory pages allocated in a single block for
959scatter/gather lists. Default value is specified by the kernel itself.
960.sp
961Default value: \fB10\fR at the time of this writing.
962.RE
963
964.sp
965.ne 2
966.na
967\fBzfs_abd_scatter_min_size\fR (uint)
968.ad
969.RS 12n
970This is the minimum allocation size that will use scatter (page-based)
971ABD's. Smaller allocations will use linear ABD's.
972.sp
973Default value: \fB1536\fR (512B and 1KB allocations will be linear).
974.RE
975
976.sp
977.ne 2
978.na
979\fBzfs_arc_dnode_limit\fR (ulong)
980.ad
981.RS 12n
When the number of bytes consumed by dnodes in the ARC exceeds this number of
bytes, try to unpin some of it in response to demand for non-metadata. This
value acts as a ceiling to the amount of dnode metadata, and defaults to 0,
which indicates that a percentage of the ARC meta buffers, determined by
\fBzfs_arc_dnode_limit_percent\fR, may be used for dnodes.
987
988See also \fBzfs_arc_meta_prune\fR which serves a similar purpose but is used
989when the amount of metadata in the ARC exceeds \fBzfs_arc_meta_limit\fR rather
990than in response to overall demand for non-metadata.
991
992.sp
993Default value: \fB0\fR.
994.RE
995
996.sp
997.ne 2
998.na
999\fBzfs_arc_dnode_limit_percent\fR (ulong)
1000.ad
1001.RS 12n
1002Percentage that can be consumed by dnodes of ARC meta buffers.
1003.sp
1004See also \fBzfs_arc_dnode_limit\fR which serves a similar purpose but has a
1005higher priority if set to nonzero value.
1006.sp
be54a13c 1007Default value: \fB10\fR%.
1008.RE
1009
1010.sp
1011.ne 2
1012.na
1013\fBzfs_arc_dnode_reduce_percent\fR (ulong)
1014.ad
1015.RS 12n
1016Percentage of ARC dnodes to try to scan in response to demand for non-metadata
6146e17e 1017when the number of bytes consumed by dnodes exceeds \fBzfs_arc_dnode_limit\fR.
1018
1019.sp
be54a13c 1020Default value: \fB10\fR% of the number of dnodes in the ARC.
1021.RE
1022
1023.sp
1024.ne 2
1025.na
1026\fBzfs_arc_average_blocksize\fR (int)
1027.ad
1028.RS 12n
1029The ARC's buffer hash table is sized based on the assumption of an average
1030block size of \fBzfs_arc_average_blocksize\fR (default 8K). This works out
1031to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers.
1032For configurations with a known larger average block size this value can be
1033increased to reduce the memory footprint.
1034
1035.sp
1036Default value: \fB8192\fR.
1037.RE
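.sp
.LP
A rough sketch of the hash table sizing rule described above (one 8-byte
pointer per expected block of \fBzfs_arc_average_blocksize\fR bytes; an
estimate only, not code from the ZFS sources):
.sp
.nf
def hash_table_estimate(physmem_bytes: int,
                        zfs_arc_average_blocksize: int = 8192,
                        pointer_size: int = 8) -> int:
    return (physmem_bytes // zfs_arc_average_blocksize) * pointer_size

# Roughly 1MB of hash table per 1GB of physical memory with the 8K default.
print(hash_table_estimate(1 * 2**30) // 2**20, "MB")
.fi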
1038
1039.sp
1040.ne 2
1041.na
1042\fBzfs_arc_eviction_pct\fR (int)
1043.ad
1044.RS 12n
1045When \fBarc_is_overflowing()\fR, \fBarc_get_data_impl()\fR waits for this
1046percent of the requested amount of data to be evicted. For example, by
1047default for every 2KB that's evicted, 1KB of it may be "reused" by a new
1048allocation. Since this is above 100%, it ensures that progress is made
1049towards getting \fBarc_size\fR under \fBarc_c\fR. Since this is finite, it
1050ensures that allocations can still happen, even during the potentially long
1051time that \fBarc_size\fR is more than \fBarc_c\fR.
1052.sp
1053Default value: \fB200\fR.
1054.RE
1055
1056.sp
1057.ne 2
1058.na
1059\fBzfs_arc_evict_batch_limit\fR (int)
1060.ad
1061.RS 12n
Number of ARC headers to evict per sub-list before proceeding to another sub-list.
1063This batch-style operation prevents entire sub-lists from being evicted at once
1064but comes at a cost of additional unlocking and locking.
1065.sp
1066Default value: \fB10\fR.
1067.RE
1068
1069.sp
1070.ne 2
1071.na
1072\fBzfs_arc_grow_retry\fR (int)
1073.ad
1074.RS 12n
If set to a non-zero value, it will replace the arc_grow_retry value with this value.
The arc_grow_retry value (default 5) is the number of seconds the ARC will wait
before trying to resume growth after a memory pressure event.
29714574 1078.sp
ca85d690 1079Default value: \fB0\fR.
1080.RE
1081
1082.sp
1083.ne 2
1084.na
7e8bddd0 1085\fBzfs_arc_lotsfree_percent\fR (int)
1086.ad
1087.RS 12n
1088Throttle I/O when free system memory drops below this percentage of total
1089system memory. Setting this value to 0 will disable the throttle.
29714574 1090.sp
be54a13c 1091Default value: \fB10\fR%.
1092.RE
1093
1094.sp
1095.ne 2
1096.na
7e8bddd0 1097\fBzfs_arc_max\fR (ulong)
1098.ad
1099.RS 12n
1100Max size of ARC in bytes. If set to 0 then the max size of ARC is determined
1101by the amount of system memory installed. For Linux, 1/2 of system memory will
1102be used as the limit. For FreeBSD, the larger of all system memory - 1GB or
11035/8 of system memory will be used as the limit. This value must be at least
110467108864 (64 megabytes).
1105.sp
1106This value can be changed dynamically with some caveats. It cannot be set back
1107to 0 while running and reducing it below the current ARC size will not cause
1108the ARC to shrink without memory pressure to induce shrinking.
29714574 1109.sp
7e8bddd0 1110Default value: \fB0\fR.
1111.RE
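.sp
.LP
A sketch of how the default limit described above is derived when
\fBzfs_arc_max\fR is 0 (illustrative only; the \fBplatform\fR argument is not a
ZFS parameter):
.sp
.nf
def default_arc_max(physmem_bytes: int, platform: str = "linux") -> int:
    if platform == "linux":
        return physmem_bytes // 2
    # FreeBSD: the larger of (all memory - 1GB) and 5/8 of memory.
    return max(physmem_bytes - 2**30, physmem_bytes * 5 // 8)

print(default_arc_max(16 * 2**30) // 2**30, "GB")        # 8 on Linux
print(default_arc_max(16 * 2**30, "freebsd") // 2**30)   # 15
.fi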
1112
ca85d690 1113.sp
1114.ne 2
1115.na
1116\fBzfs_arc_meta_adjust_restarts\fR (ulong)
1117.ad
1118.RS 12n
The number of restart passes to make while scanning the ARC, attempting
to free buffers in order to stay below the \fBzfs_arc_meta_limit\fR.
1121This value should not need to be tuned but is available to facilitate
1122performance analysis.
1123.sp
1124Default value: \fB4096\fR.
1125.RE
1126
1127.sp
1128.ne 2
1129.na
1130\fBzfs_arc_meta_limit\fR (ulong)
1131.ad
1132.RS 12n
The maximum allowed size in bytes that meta data buffers are allowed to
consume in the ARC. When this limit is reached meta data buffers will
be reclaimed even if the overall arc_c_max has not been reached. This
value defaults to 0, which indicates that a percentage of the ARC, determined
by \fBzfs_arc_meta_limit_percent\fR, may be used for meta data.
.sp
This value may be changed dynamically except that it cannot be set back to 0
for a specific percent of the ARC; it must be set to an explicit value.
83426735 1141.sp
1142Default value: \fB0\fR.
1143.RE
1144
1145.sp
1146.ne 2
1147.na
1148\fBzfs_arc_meta_limit_percent\fR (ulong)
1149.ad
1150.RS 12n
1151Percentage of ARC buffers that can be used for meta data.
1152
1153See also \fBzfs_arc_meta_limit\fR which serves a similar purpose but has a
1154higher priority if set to nonzero value.
1155
1156.sp
be54a13c 1157Default value: \fB75\fR%.
1158.RE
1159
1160.sp
1161.ne 2
1162.na
1163\fBzfs_arc_meta_min\fR (ulong)
1164.ad
1165.RS 12n
1166The minimum allowed size in bytes that meta data buffers may consume in
1167the ARC. This value defaults to 0 which disables a floor on the amount
1168of the ARC devoted meta data.
1169.sp
1170Default value: \fB0\fR.
1171.RE
1172
1173.sp
1174.ne 2
1175.na
1176\fBzfs_arc_meta_prune\fR (int)
1177.ad
1178.RS 12n
1179The number of dentries and inodes to be scanned looking for entries
1180which can be dropped. This may be required when the ARC reaches the
1181\fBzfs_arc_meta_limit\fR because dentries and inodes can pin buffers
in the ARC. Increasing this value will cause the dentry and inode caches
to be pruned more aggressively. Setting this value to 0 will disable
1184pruning the inode and dentry caches.
29714574 1185.sp
2cbb06b5 1186Default value: \fB10,000\fR.
1187.RE
1188
1189.sp
1190.ne 2
1191.na
ca85d690 1192\fBzfs_arc_meta_strategy\fR (int)
1193.ad
1194.RS 12n
Define the strategy for ARC meta data buffer eviction (meta reclaim strategy).
A value of 0 (META_ONLY) will evict only the ARC meta data buffers.
A value of 1 (BALANCED) indicates that additional data buffers may be evicted
if that is required in order to evict the required number of meta data buffers.
bc888666 1199.sp
ca85d690 1200Default value: \fB1\fR.
1201.RE
1202
1203.sp
1204.ne 2
1205.na
1206\fBzfs_arc_min\fR (ulong)
1207.ad
1208.RS 12n
77f6826b 1209Min size of ARC in bytes. If set to 0 then arc_c_min will default to
ca85d690 1210consuming the larger of 32M or 1/32 of total system memory.
29714574 1211.sp
ca85d690 1212Default value: \fB0\fR.
1213.RE
1214
1215.sp
1216.ne 2
1217.na
d4a72f23 1218\fBzfs_arc_min_prefetch_ms\fR (int)
1219.ad
1220.RS 12n
d4a72f23 1221Minimum time prefetched blocks are locked in the ARC, specified in ms.
2b84817f 1222A value of \fB0\fR will default to 1000 ms.
1223.sp
1224Default value: \fB0\fR.
1225.RE
1226
1227.sp
1228.ne 2
1229.na
1230\fBzfs_arc_min_prescient_prefetch_ms\fR (int)
1231.ad
1232.RS 12n
1233Minimum time "prescient prefetched" blocks are locked in the ARC, specified
ac3d4d0c 1234in ms. These blocks are meant to be prefetched fairly aggressively ahead of
2b84817f 1235the code that may use them. A value of \fB0\fR will default to 6000 ms.
29714574 1236.sp
83426735 1237Default value: \fB0\fR.
1238.RE
1239
1240.sp
1241.ne 2
1242.na
1243\fBzfs_max_missing_tvds\fR (int)
1244.ad
1245.RS 12n
1246Number of missing top-level vdevs which will be allowed during
1247pool import (only in read-only mode).
1248.sp
1249Default value: \fB0\fR
1250.RE
1251
1252.sp
1253.ne 2
1254.na
1255\fBzfs_max_nvlist_src_size\fR (ulong)
1256.ad
1257.RS 12n
1258Maximum size in bytes allowed to be passed as zc_nvlist_src_size for ioctls on
1259/dev/zfs. This prevents a user from causing the kernel to allocate an excessive
1260amount of memory. When the limit is exceeded, the ioctl fails with EINVAL and a
1261description of the error is sent to the zfs-dbgmsg log. This parameter should
1262not need to be touched under normal circumstances. On FreeBSD, the default is
1263based on the system limit on user wired memory. On Linux, the default is
1dfc82a1 1264\fB128MB\fR.
1265.sp
1266Default value: \fB0\fR (kernel decides)
1267.RE
1268
1269.sp
1270.ne 2
1271.na
c30e58c4 1272\fBzfs_multilist_num_sublists\fR (int)
1273.ad
1274.RS 12n
1275To allow more fine-grained locking, each ARC state contains a series
1276of lists for both data and meta data objects. Locking is performed at
the level of these "sub-lists". This parameter controls the number of
1278sub-lists per ARC state, and also applies to other uses of the
1279multilist data structure.
ca0bf58d 1280.sp
c30e58c4 1281Default value: \fB4\fR or the number of online CPUs, whichever is greater
1282.RE
1283
1284.sp
1285.ne 2
1286.na
1287\fBzfs_arc_overflow_shift\fR (int)
1288.ad
1289.RS 12n
1290The ARC size is considered to be overflowing if it exceeds the current
1291ARC target size (arc_c) by a threshold determined by this parameter.
1292The threshold is calculated as a fraction of arc_c using the formula
1293"arc_c >> \fBzfs_arc_overflow_shift\fR".
1294
1295The default value of 8 causes the ARC to be considered to be overflowing
if it exceeds the target size by 1/256th (about 0.4%) of the target size.
1297
1298When the ARC is overflowing, new buffer allocations are stalled until
1299the reclaim thread catches up and the overflow condition no longer exists.
1300.sp
1301Default value: \fB8\fR.
1302.RE
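.sp
.LP
A minimal sketch of the overflow test described above (not code from the ZFS
sources):
.sp
.nf
def arc_is_overflowing(arc_size: int, arc_c: int,
                       zfs_arc_overflow_shift: int = 8) -> bool:
    return arc_size > arc_c + (arc_c >> zfs_arc_overflow_shift)

arc_c = 8 * 2**30                          # 8GB target size
print(arc_c >> 8)                          # 33554432: ~32MB of headroom
print(arc_is_overflowing(arc_c + 2**25 + 1, arc_c))   # True
.fi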
1303
1304.sp
1305.ne 2
1306.na
1307
1308\fBzfs_arc_p_min_shift\fR (int)
1309.ad
1310.RS 12n
If set to a non-zero value, this will update arc_p_min_shift (default 4)
with the new value.
arc_p_min_shift is used as a shift of arc_c when calculating both the minimum
and maximum arc_p.
728d6ae9 1315.sp
ca85d690 1316Default value: \fB0\fR.
1317.RE
1318
1319.sp
1320.ne 2
1321.na
1322\fBzfs_arc_p_dampener_disable\fR (int)
1323.ad
1324.RS 12n
1325Disable arc_p adapt dampener
1326.sp
1327Use \fB1\fR for yes (default) and \fB0\fR to disable.
1328.RE
1329
1330.sp
1331.ne 2
1332.na
1333\fBzfs_arc_shrink_shift\fR (int)
1334.ad
1335.RS 12n
If set to a non-zero value, this will update arc_shrink_shift (default 7)
1337with the new value.
29714574 1338.sp
ca85d690 1339Default value: \fB0\fR.
1340.RE
1341
1342.sp
1343.ne 2
1344.na
1345\fBzfs_arc_pc_percent\fR (uint)
1346.ad
1347.RS 12n
Percent of pagecache to reclaim ARC to.
1349
1350This tunable allows ZFS arc to play more nicely with the kernel's LRU
77f6826b 1351pagecache. It can guarantee that the ARC size won't collapse under scanning
1352pressure on the pagecache, yet still allows arc to be reclaimed down to
1353zfs_arc_min if necessary. This value is specified as percent of pagecache
1354size (as measured by NR_FILE_PAGES) where that percent may exceed 100. This
1355only operates during memory pressure/reclaim.
1356.sp
be54a13c 1357Default value: \fB0\fR% (disabled).
1358.RE
1359
1360.sp
1361.ne 2
1362.na
1363\fBzfs_arc_shrinker_limit\fR (int)
1364.ad
1365.RS 12n
1366This is a limit on how many pages the ARC shrinker makes available for
1367eviction in response to one page allocation attempt. Note that in
1368practice, the kernel's shrinker can ask us to evict up to about 4x this
1369for one allocation attempt.
1370.sp
1371The default limit of 10,000 (in practice, 160MB per allocation attempt with
13724K pages) limits the amount of time spent attempting to reclaim ARC memory to
1373less than 100ms per allocation attempt, even with a small average compressed
1374block size of ~8KB.
1375.sp
1376The parameter can be set to 0 (zero) to disable the limit.
1377.sp
1378This parameter only applies on Linux.
1379.sp
1380Default value: \fB10,000\fR.
1381.RE
1382
1383.sp
1384.ne 2
1385.na
1386\fBzfs_arc_sys_free\fR (ulong)
1387.ad
1388.RS 12n
1389The target number of bytes the ARC should leave as free memory on the system.
1390Defaults to the larger of 1/64 of physical memory or 512K. Setting this
1391option to a non-zero value will override the default.
1392.sp
1393Default value: \fB0\fR.
1394.RE
1395
1396.sp
1397.ne 2
1398.na
1399\fBzfs_autoimport_disable\fR (int)
1400.ad
1401.RS 12n
27b293be 1402Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR).
29714574 1403.sp
70081096 1404Use \fB1\fR for yes (default) and \fB0\fR for no.
1405.RE
1406
1407.sp
1408.ne 2
1409.na
67709516 1410\fBzfs_checksum_events_per_second\fR (uint)
1411.ad
1412.RS 12n
1413Rate limit checksum events to this many per second. Note that this should
1414not be set below the zed thresholds (currently 10 checksums over 10 sec)
1415or else zed may not trigger any action.
1416.sp
1417Default value: 20
1418.RE
1419
1420.sp
1421.ne 2
1422.na
1423\fBzfs_commit_timeout_pct\fR (int)
1424.ad
1425.RS 12n
1426This controls the amount of time that a ZIL block (lwb) will remain "open"
1427when it isn't "full", and it has a thread waiting for it to be committed to
1428stable storage. The timeout is scaled based on a percentage of the last lwb
1429latency to avoid significantly impacting the latency of each individual
1430transaction record (itx).
1431.sp
be54a13c 1432Default value: \fB5\fR%.
1433.RE
1434
1435.sp
1436.ne 2
1437.na
1438\fBzfs_condense_indirect_commit_entry_delay_ms\fR (int)
1439.ad
1440.RS 12n
1441Vdev indirection layer (used for device removal) sleeps for this many
1442milliseconds during mapping generation. Intended for use with the test suite
1443to throttle vdev removal speed.
1444.sp
1445Default value: \fB0\fR (no throttle).
1446.RE
1447
1448.sp
1449.ne 2
1450.na
1451\fBzfs_condense_indirect_vdevs_enable\fR (int)
1452.ad
1453.RS 12n
1454Enable condensing indirect vdev mappings. When set to a non-zero value,
1455attempt to condense indirect vdev mappings if the mapping uses more than
1456\fBzfs_condense_min_mapping_bytes\fR bytes of memory and if the obsolete
1457space map object uses more than \fBzfs_condense_max_obsolete_bytes\fR
1458bytes on-disk. The condensing process is an attempt to save memory by
1459removing obsolete mappings.
1460.sp
1461Default value: \fB1\fR.
1462.RE
1463
1464.sp
1465.ne 2
1466.na
1467\fBzfs_condense_max_obsolete_bytes\fR (ulong)
1468.ad
1469.RS 12n
1470Only attempt to condense indirect vdev mappings if the on-disk size
1471of the obsolete space map object is greater than this number of bytes
(see \fBzfs_condense_indirect_vdevs_enable\fR).
1473.sp
1474Default value: \fB1,073,741,824\fR.
1475.RE
1476
1477.sp
1478.ne 2
1479.na
1480\fBzfs_condense_min_mapping_bytes\fR (ulong)
1481.ad
1482.RS 12n
1483Minimum size vdev mapping to attempt to condense (see
1484\fBzfs_condense_indirect_vdevs_enable\fR).
1485.sp
1486Default value: \fB131,072\fR.
1487.RE
1488
1489.sp
1490.ne 2
1491.na
1492\fBzfs_dbgmsg_enable\fR (int)
1493.ad
1494.RS 12n
1495Internally ZFS keeps a small log to facilitate debugging. By default the log
1496is disabled, to enable it set this option to 1. The contents of the log can
1497be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to
1498this proc file clears the log.
1499.sp
1500Default value: \fB0\fR.
1501.RE
1502
1503.sp
1504.ne 2
1505.na
1506\fBzfs_dbgmsg_maxsize\fR (int)
1507.ad
1508.RS 12n
1509The maximum size in bytes of the internal ZFS debug log.
1510.sp
1511Default value: \fB4M\fR.
1512.RE
1513
1514.sp
1515.ne 2
1516.na
1517\fBzfs_dbuf_state_index\fR (int)
1518.ad
1519.RS 12n
1520This feature is currently unused. It is normally used for controlling what
1521reporting is available under /proc/spl/kstat/zfs.
1522.sp
1523Default value: \fB0\fR.
1524.RE
1525
1526.sp
1527.ne 2
1528.na
1529\fBzfs_deadman_enabled\fR (int)
1530.ad
1531.RS 12n
b81a3ddc 1532When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR
1533milliseconds, or when an individual I/O takes longer than
1534\fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to
1535be "hung". If \fBzfs_deadman_enabled\fR is set then the deadman behavior is
1536invoked as described by the \fBzfs_deadman_failmode\fR module option.
1537By default the deadman is enabled and configured to \fBwait\fR which results
1538in "hung" I/Os only being logged. The deadman is automatically disabled
1539when a pool gets suspended.
29714574 1540.sp
1541Default value: \fB1\fR.
1542.RE
1543
1544.sp
1545.ne 2
1546.na
1547\fBzfs_deadman_failmode\fR (charp)
1548.ad
1549.RS 12n
1550Controls the failure behavior when the deadman detects a "hung" I/O. Valid
1551values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR.
1552.sp
1553\fBwait\fR - Wait for a "hung" I/O to complete. For each "hung" I/O a
1554"deadman" event will be posted describing that I/O.
1555.sp
1556\fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it
1557to the I/O pipeline if possible.
1558.sp
1559\fBpanic\fR - Panic the system. This can be used to facilitate an automatic
1560fail-over to a properly configured fail-over partner.
1561.sp
1562Default value: \fBwait\fR.
1563.RE
1564
1565.sp
1566.ne 2
1567.na
1568\fBzfs_deadman_checktime_ms\fR (int)
1569.ad
1570.RS 12n
1571Check time in milliseconds. This defines the frequency at which we check
1572for hung I/O and potentially invoke the \fBzfs_deadman_failmode\fR behavior.
b81a3ddc 1573.sp
8fb1ede1 1574Default value: \fB60,000\fR.
1575.RE
1576
1577.sp
1578.ne 2
1579.na
e8b96c60 1580\fBzfs_deadman_synctime_ms\fR (ulong)
1581.ad
1582.RS 12n
b81a3ddc 1583Interval in milliseconds after which the deadman is triggered and also
1584the interval after which a pool sync operation is considered to be "hung".
1585Once this limit is exceeded the deadman will be invoked every
1586\fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes.
1587.sp
1588Default value: \fB600,000\fR.
1589.RE
b81a3ddc 1590
29714574 1591.sp
1592.ne 2
1593.na
1594\fBzfs_deadman_ziotime_ms\fR (ulong)
1595.ad
1596.RS 12n
1597Interval in milliseconds after which the deadman is triggered and an
ad796b8a 1598individual I/O operation is considered to be "hung". As long as the I/O
1599remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR
1600milliseconds until the I/O completes.
1601.sp
1602Default value: \fB300,000\fR.
1603.RE
1604
1605.sp
1606.ne 2
1607.na
1608\fBzfs_dedup_prefetch\fR (int)
1609.ad
1610.RS 12n
Enable prefetching of deduplicated blocks.
1612.sp
0dfc7324 1613Use \fB1\fR for yes and \fB0\fR to disable (default).
1614.RE
1615
1616.sp
1617.ne 2
1618.na
1619\fBzfs_delay_min_dirty_percent\fR (int)
1620.ad
1621.RS 12n
1622Start to delay each transaction once there is this amount of dirty data,
1623expressed as a percentage of \fBzfs_dirty_data_max\fR.
1624This value should be >= zfs_vdev_async_write_active_max_dirty_percent.
1625See the section "ZFS TRANSACTION DELAY".
1626.sp
be54a13c 1627Default value: \fB60\fR%.
1628.RE
1629
1630.sp
1631.ne 2
1632.na
1633\fBzfs_delay_scale\fR (int)
1634.ad
1635.RS 12n
1636This controls how quickly the transaction delay approaches infinity.
1637Larger values cause longer delays for a given amount of dirty data.
1638.sp
1639For the smoothest delay, this value should be about 1 billion divided
1640by the maximum number of operations per second. This will smoothly
1641handle between 10x and 1/10th this number.
1642.sp
1643See the section "ZFS TRANSACTION DELAY".
1644.sp
1645Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
1646.sp
1647Default value: \fB500,000\fR.
1648.RE
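.sp
.LP
A small helper following the guidance above: pick \fBzfs_delay_scale\fR from
the expected peak operation rate and check the documented 2^64 constraint
(illustrative only):
.sp
.nf
def suggested_delay_scale(max_ops_per_second: int) -> int:
    return int(1_000_000_000 / max_ops_per_second)

def scale_is_valid(zfs_delay_scale: int, zfs_dirty_data_max: int) -> bool:
    return zfs_delay_scale * zfs_dirty_data_max < 2**64

scale = suggested_delay_scale(2000)          # 500000 for ~2000 ops/s
print(scale, scale_is_valid(scale, 4 * 2**30))
.fi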
1649
1650.sp
1651.ne 2
1652.na
1653\fBzfs_disable_ivset_guid_check\fR (int)
1654.ad
1655.RS 12n
1656Disables requirement for IVset guids to be present and match when doing a raw
1657receive of encrypted datasets. Intended for users whose pools were created with
d0249a4b 1658OpenZFS pre-release versions and now have compatibility issues.
1659.sp
1660Default value: \fB0\fR.
1661.RE
1662
1663.sp
1664.ne 2
1665.na
1666\fBzfs_key_max_salt_uses\fR (ulong)
1667.ad
1668.RS 12n
1669Maximum number of uses of a single salt value before generating a new one for
1670encrypted datasets. The default value is also the maximum that will be
1671accepted.
1672.sp
1673Default value: \fB400,000,000\fR.
1674.RE
1675
1676.sp
1677.ne 2
1678.na
1679\fBzfs_object_mutex_size\fR (uint)
1680.ad
1681.RS 12n
1682Size of the znode hashtable used for holds.
1683
1684Due to the need to hold locks on objects that may not exist yet, kernel mutexes
1685are not created per-object and instead a hashtable is used where collisions
1686will result in objects waiting when there is not actually contention on the
1687same object.
1688.sp
1689Default value: \fB64\fR.
1690.RE
1691
1692.sp
1693.ne 2
1694.na
62ee31ad 1695\fBzfs_slow_io_events_per_second\fR (int)
1696.ad
1697.RS 12n
ad796b8a 1698Rate limit delay zevents (which report slow I/Os) to this many per second.
1699.sp
1700Default value: 20
1701.RE
1702
1703.sp
1704.ne 2
1705.na
1706\fBzfs_unflushed_max_mem_amt\fR (ulong)
1707.ad
1708.RS 12n
1709Upper-bound limit for unflushed metadata changes to be held by the
1710log spacemap in memory (in bytes).
1711.sp
1712Default value: \fB1,073,741,824\fR (1GB).
1713.RE
1714
1715.sp
1716.ne 2
1717.na
1718\fBzfs_unflushed_max_mem_ppm\fR (ulong)
1719.ad
1720.RS 12n
1721Percentage of the overall system memory that ZFS allows to be used
1722for unflushed metadata changes by the log spacemap.
1723(value is calculated over 1000000 for finer granularity).
1724.sp
1725Default value: \fB1000\fR (which is divided by 1000000, resulting in
1726the limit to be \fB0.1\fR% of memory)
1727.RE
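.sp
.LP
A minimal sketch of the ppm-based limit described above:
.sp
.nf
def unflushed_mem_limit(physmem_bytes: int,
                        zfs_unflushed_max_mem_ppm: int = 1000) -> int:
    return physmem_bytes * zfs_unflushed_max_mem_ppm // 1_000_000

# 0.1% of a 64GB system is about 65MB.
print(unflushed_mem_limit(64 * 2**30) // 2**20, "MB")
.fi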
1728
1729.sp
1730.ne 2
1731.na
1732\fBzfs_unflushed_log_block_max\fR (ulong)
1733.ad
1734.RS 12n
1735Describes the maximum number of log spacemap blocks allowed for each pool.
1736The default value of 262144 means that the space in all the log spacemaps
1737can add up to no more than 262144 blocks (which means 32GB of logical
1738space before compression and ditto blocks, assuming that blocksize is
1739128k).
1740.sp
1741This tunable is important because it involves a trade-off between import
1742time after an unclean export and the frequency of flushing metaslabs.
1743The higher this number is, the more log blocks we allow when the pool is
1744active which means that we flush metaslabs less often and thus decrease
1745the number of I/Os for spacemap updates per TXG.
1746At the same time though, that means that in the event of an unclean export,
1747there will be more log spacemap blocks for us to read, inducing overhead
1748in the import time of the pool.
The lower the number, the more flushing occurs, destroying log blocks
more quickly as they become obsolete, which leaves fewer blocks
to be read during import time after a crash.
1752.sp
1753Each log spacemap block existing during pool import leads to approximately
1754one extra logical I/O issued.
1755This is the reason why this tunable is exposed in terms of blocks rather
1756than space used.
1757.sp
1758Default value: \fB262144\fR (256K).
1759.RE
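.sp
.LP
A worked example of the sizing note above, expressing the block cap as logical
space and assuming 128k spacemap log blocks:
.sp
.nf
def log_space_cap(zfs_unflushed_log_block_max: int = 262144,
                  block_size: int = 128 * 1024) -> int:
    return zfs_unflushed_log_block_max * block_size

print(log_space_cap() // 2**30, "GB")   # 32
.fi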
1760
1761.sp
1762.ne 2
1763.na
1764\fBzfs_unflushed_log_block_min\fR (ulong)
1765.ad
1766.RS 12n
1767If the number of metaslabs is small and our incoming rate is high, we
1768could get into a situation that we are flushing all our metaslabs every
1769TXG.
1770Thus we always allow at least this many log blocks.
1771.sp
1772Default value: \fB1000\fR.
1773.RE
1774
1775.sp
1776.ne 2
1777.na
1778\fBzfs_unflushed_log_block_pct\fR (ulong)
1779.ad
1780.RS 12n
1781Tunable used to determine the number of blocks that can be used for
1782the spacemap log, expressed as a percentage of the total number of
1783metaslabs in the pool.
1784.sp
1785Default value: \fB400\fR (read as \fB400\fR% - meaning that the number
1786of log spacemap blocks are capped at 4 times the number of
1787metaslabs in the pool).
1788.RE
1789
1790.sp
1791.ne 2
1792.na
1793\fBzfs_unlink_suspend_progress\fR (uint)
1794.ad
1795.RS 12n
1796When enabled, files will not be asynchronously removed from the list of pending
1797unlinks and the space they consume will be leaked. Once this option has been
1798disabled and the dataset is remounted, the pending unlinks will be processed
1799and the freed space returned to the pool.
1800This option is used by the test suite to facilitate testing.
1801.sp
1802Uses \fB0\fR (default) to allow progress and \fB1\fR to pause progress.
1803.RE
1804
1805.sp
1806.ne 2
1807.na
1808\fBzfs_delete_blocks\fR (ulong)
1809.ad
1810.RS 12n
This is used to define a large file for the purposes of delete. Files
containing more than \fBzfs_delete_blocks\fR blocks will be deleted
asynchronously while smaller files are deleted synchronously. Decreasing this
value will
1814reduce the time spent in an unlink(2) system call at the expense of a longer
1815delay before the freed space is available.
1816.sp
1817Default value: \fB20,480\fR.
1818.RE
1819
1820.sp
1821.ne 2
1822.na
1823\fBzfs_dirty_data_max\fR (int)
1824.ad
1825.RS 12n
1826Determines the dirty space limit in bytes. Once this limit is exceeded, new
1827writes are halted until space frees up. This parameter takes precedence
1828over \fBzfs_dirty_data_max_percent\fR.
1829See the section "ZFS TRANSACTION DELAY".
1830.sp
be54a13c 1831Default value: \fB10\fR% of physical RAM, capped at \fBzfs_dirty_data_max_max\fR.
1832.RE
1833
1834.sp
1835.ne 2
1836.na
1837\fBzfs_dirty_data_max_max\fR (int)
1838.ad
1839.RS 12n
1840Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
1841This limit is only enforced at module load time, and will be ignored if
1842\fBzfs_dirty_data_max\fR is later changed. This parameter takes
1843precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
1844"ZFS TRANSACTION DELAY".
1845.sp
be54a13c 1846Default value: \fB25\fR% of physical RAM.
1847.RE
1848
1849.sp
1850.ne 2
1851.na
1852\fBzfs_dirty_data_max_max_percent\fR (int)
1853.ad
1854.RS 12n
1855Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
1856percentage of physical RAM. This limit is only enforced at module load
1857time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
1858The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
1859one. See the section "ZFS TRANSACTION DELAY".
1860.sp
be54a13c 1861Default value: \fB25\fR%.
1862.RE
1863
1864.sp
1865.ne 2
1866.na
1867\fBzfs_dirty_data_max_percent\fR (int)
1868.ad
1869.RS 12n
1870Determines the dirty space limit, expressed as a percentage of all
1871memory. Once this limit is exceeded, new writes are halted until space frees
1872up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
1873one. See the section "ZFS TRANSACTION DELAY".
1874.sp
be54a13c 1875Default value: \fB10\fR%, subject to \fBzfs_dirty_data_max_max\fR.
1876.RE
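.sp
.LP
A sketch of how the default dirty data limit follows from the percentages
documented above when \fBzfs_dirty_data_max\fR is not set explicitly
(illustrative only):
.sp
.nf
def default_dirty_data_max(physmem_bytes: int,
                           zfs_dirty_data_max_percent: int = 10,
                           zfs_dirty_data_max_max_percent: int = 25) -> int:
    limit = physmem_bytes * zfs_dirty_data_max_percent // 100
    cap = physmem_bytes * zfs_dirty_data_max_max_percent // 100
    return min(limit, cap)

# On a 32GB system the default works out to roughly 3.2GB of dirty data.
print(default_dirty_data_max(32 * 2**30) // 2**20, "MB")
.fi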
1877
1878.sp
1879.ne 2
1880.na
dfbe2675 1881\fBzfs_dirty_data_sync_percent\fR (int)
1882.ad
1883.RS 12n
1884Start syncing out a transaction group if there's at least this much dirty data
1885as a percentage of \fBzfs_dirty_data_max\fR. This should be less than
1886\fBzfs_vdev_async_write_active_min_dirty_percent\fR.
e8b96c60 1887.sp
dfbe2675 1888Default value: \fB20\fR% of \fBzfs_dirty_data_max\fR.
1889.RE
1890
1891.sp
1892.ne 2
1893.na
1894\fBzfs_fallocate_reserve_percent\fR (uint)
1895.ad
1896.RS 12n
1897Since ZFS is a copy-on-write filesystem with snapshots, blocks cannot be
1898preallocated for a file in order to guarantee that later writes will not
1899run out of space. Instead, fallocate() space preallocation only checks
1900that sufficient space is currently available in the pool or the user's
1901project quota allocation, and then creates a sparse file of the requested
1902size. The requested space is multiplied by \fBzfs_fallocate_reserve_percent\fR
1903to allow additional space for indirect blocks and other internal metadata.
Setting this value to 0 disables support for fallocate(2) and causes
fallocate() space preallocation to once again return EOPNOTSUPP.
1906.sp
1907Default value: \fB110\fR%
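.sp
As a worked example (the 1 GiB request is hypothetical), with the default of
110% a preallocation only succeeds if roughly 1.1 GiB appears to be available:
.sp
.nf
# requested bytes scaled by zfs_fallocate_reserve_percent (default 110)
echo $(( 1073741824 * 110 / 100 ))   # 1181116006 bytes must appear available
.fi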
1908.RE
1909
1eeb4562
JX
1910.sp
1911.ne 2
1912.na
1913\fBzfs_fletcher_4_impl\fR (string)
1914.ad
1915.RS 12n
1916Select a fletcher 4 implementation.
1917.sp
35a76a03 1918Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
0b2a6423 1919\fBavx2\fR, \fBavx512f\fR, \fBavx512bw\fR, and \fBaarch64_neon\fR.
70b258fc
GN
All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
set extensions to be available and will only appear if ZFS detects that they
are present at runtime. If multiple implementations of fletcher 4 are
available, the \fBfastest\fR will be chosen using a micro benchmark. Selecting
\fBscalar\fR results in the original CPU-based calculation being used.
Selecting any option other than \fBfastest\fR and \fBscalar\fR results in
vector instructions from the respective CPU instruction set being used.
1eeb4562
JX
1927.sp
1928Default value: \fBfastest\fR.
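.sp
A hedged example of inspecting and overriding the selection at runtime
(assuming the usual /sys/module parameter interface; the selectors actually
listed depend on the CPU and build):
.sp
.nf
# List the available implementations; the active one is shown in brackets
cat /sys/module/zfs/parameters/zfs_fletcher_4_impl

# Force the portable scalar implementation
echo scalar > /sys/module/zfs/parameters/zfs_fletcher_4_impl
.fi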
1929.RE
1930
ba5ad9a4
GW
1931.sp
1932.ne 2
1933.na
1934\fBzfs_free_bpobj_enabled\fR (int)
1935.ad
1936.RS 12n
1937Enable/disable the processing of the free_bpobj object.
1938.sp
1939Default value: \fB1\fR.
1940.RE
1941
36283ca2
MG
1942.sp
1943.ne 2
1944.na
a1d477c2 1945\fBzfs_async_block_max_blocks\fR (ulong)
36283ca2
MG
1946.ad
1947.RS 12n
1948Maximum number of blocks freed in a single txg.
1949.sp
4fe3a842
MA
1950Default value: \fBULONG_MAX\fR (unlimited).
1951.RE
1952
1953.sp
1954.ne 2
1955.na
1956\fBzfs_max_async_dedup_frees\fR (ulong)
1957.ad
1958.RS 12n
1959Maximum number of dedup blocks freed in a single txg.
1960.sp
36283ca2
MG
1961Default value: \fB100,000\fR.
1962.RE
1963
1974
e8b96c60
MA
1975.sp
1976.ne 2
1977.na
1978\fBzfs_vdev_async_read_max_active\fR (int)
1979.ad
1980.RS 12n
83426735 1981Maximum asynchronous read I/Os active to each device.
e8b96c60
MA
1982See the section "ZFS I/O SCHEDULER".
1983.sp
1984Default value: \fB3\fR.
1985.RE
1986
1987.sp
1988.ne 2
1989.na
1990\fBzfs_vdev_async_read_min_active\fR (int)
1991.ad
1992.RS 12n
1993Minimum asynchronous read I/Os active to each device.
1994See the section "ZFS I/O SCHEDULER".
1995.sp
1996Default value: \fB1\fR.
1997.RE
1998
1999.sp
2000.ne 2
2001.na
2002\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
2003.ad
2004.RS 12n
2005When the pool has more than
2006\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
2007\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
2008the dirty data is between min and max, the active I/O limit is linearly
2009interpolated. See the section "ZFS I/O SCHEDULER".
2010.sp
be54a13c 2011Default value: \fB60\fR%.
e8b96c60
MA
2012.RE
2013
2014.sp
2015.ne 2
2016.na
2017\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
2018.ad
2019.RS 12n
2020When the pool has less than
2021\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
2022\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
2023the dirty data is between min and max, the active I/O limit is linearly
2024interpolated. See the section "ZFS I/O SCHEDULER".
2025.sp
be54a13c 2026Default value: \fB30\fR%.
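.sp
As a worked example using the defaults in this section (min 30%, max 60%,
\fBzfs_vdev_async_write_min_active\fR=2, \fBzfs_vdev_async_write_max_active\fR=10),
a pool that is 45% dirty sits halfway between the two thresholds, so the
interpolated limit is 6 active async writes:
.sp
.nf
# limit = min_active + (dirty% - min%) * (max_active - min_active) / (max% - min%)
echo $(( 2 + (45 - 30) * (10 - 2) / (60 - 30) ))   # prints 6
.fi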
e8b96c60
MA
2027.RE
2028
2029.sp
2030.ne 2
2031.na
2032\fBzfs_vdev_async_write_max_active\fR (int)
2033.ad
2034.RS 12n
83426735 2035Maximum asynchronous write I/Os active to each device.
e8b96c60
MA
2036See the section "ZFS I/O SCHEDULER".
2037.sp
2038Default value: \fB10\fR.
2039.RE
2040
2041.sp
2042.ne 2
2043.na
2044\fBzfs_vdev_async_write_min_active\fR (int)
2045.ad
2046.RS 12n
2047Minimum asynchronous write I/Os active to each device.
2048See the section "ZFS I/O SCHEDULER".
2049.sp
06226b59
D
2050Lower values are associated with better latency on rotational media but poorer
2051resilver performance. The default value of 2 was chosen as a compromise. A
2052value of 3 has been shown to improve resilver performance further at a cost of
2053further increasing latency.
2054.sp
2055Default value: \fB2\fR.
e8b96c60
MA
2056.RE
2057
619f0976
GW
2058.sp
2059.ne 2
2060.na
2061\fBzfs_vdev_initializing_max_active\fR (int)
2062.ad
2063.RS 12n
2064Maximum initializing I/Os active to each device.
2065See the section "ZFS I/O SCHEDULER".
2066.sp
2067Default value: \fB1\fR.
2068.RE
2069
2070.sp
2071.ne 2
2072.na
2073\fBzfs_vdev_initializing_min_active\fR (int)
2074.ad
2075.RS 12n
2076Minimum initializing I/Os active to each device.
2077See the section "ZFS I/O SCHEDULER".
2078.sp
2079Default value: \fB1\fR.
2080.RE
2081
e8b96c60
MA
2082.sp
2083.ne 2
2084.na
2085\fBzfs_vdev_max_active\fR (int)
2086.ad
2087.RS 12n
2088The maximum number of I/Os active to each device. Ideally, this will be >=
6f5aac3c 2089the sum of each queue's max_active. See the section "ZFS I/O SCHEDULER".
e8b96c60
MA
2090.sp
2091Default value: \fB1,000\fR.
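.sp
For reference, summing the default per-queue max_active values documented in
this section (sync read/write 10+10, async read/write 3+10, scrub 2, removal
2, initializing 1, TRIM 2, rebuild 3) gives 43, comfortably below the default
of 1,000. A quick sanity check:
.sp
.nf
# Sum of the default per-queue max_active values
echo $(( 10 + 10 + 3 + 10 + 2 + 2 + 1 + 2 + 3 ))   # prints 43
.fi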
2092.RE
2093
9a49d3f3
BB
2094.sp
2095.ne 2
2096.na
2097\fBzfs_vdev_rebuild_max_active\fR (int)
2098.ad
2099.RS 12n
2100Maximum sequential resilver I/Os active to each device.
2101See the section "ZFS I/O SCHEDULER".
2102.sp
2103Default value: \fB3\fR.
2104.RE
2105
2106.sp
2107.ne 2
2108.na
2109\fBzfs_vdev_rebuild_min_active\fR (int)
2110.ad
2111.RS 12n
2112Minimum sequential resilver I/Os active to each device.
2113See the section "ZFS I/O SCHEDULER".
2114.sp
2115Default value: \fB1\fR.
2116.RE
2117
619f0976
GW
2118.sp
2119.ne 2
2120.na
2121\fBzfs_vdev_removal_max_active\fR (int)
2122.ad
2123.RS 12n
2124Maximum removal I/Os active to each device.
2125See the section "ZFS I/O SCHEDULER".
2126.sp
2127Default value: \fB2\fR.
2128.RE
2129
2130.sp
2131.ne 2
2132.na
2133\fBzfs_vdev_removal_min_active\fR (int)
2134.ad
2135.RS 12n
2136Minimum removal I/Os active to each device.
2137See the section "ZFS I/O SCHEDULER".
2138.sp
2139Default value: \fB1\fR.
2140.RE
2141
e8b96c60
MA
2142.sp
2143.ne 2
2144.na
2145\fBzfs_vdev_scrub_max_active\fR (int)
2146.ad
2147.RS 12n
83426735 2148Maximum scrub I/Os active to each device.
e8b96c60
MA
2149See the section "ZFS I/O SCHEDULER".
2150.sp
2151Default value: \fB2\fR.
2152.RE
2153
2154.sp
2155.ne 2
2156.na
2157\fBzfs_vdev_scrub_min_active\fR (int)
2158.ad
2159.RS 12n
2160Minimum scrub I/Os active to each device.
2161See the section "ZFS I/O SCHEDULER".
2162.sp
2163Default value: \fB1\fR.
2164.RE
2165
2166.sp
2167.ne 2
2168.na
2169\fBzfs_vdev_sync_read_max_active\fR (int)
2170.ad
2171.RS 12n
83426735 2172Maximum synchronous read I/Os active to each device.
e8b96c60
MA
2173See the section "ZFS I/O SCHEDULER".
2174.sp
2175Default value: \fB10\fR.
2176.RE
2177
2178.sp
2179.ne 2
2180.na
2181\fBzfs_vdev_sync_read_min_active\fR (int)
2182.ad
2183.RS 12n
2184Minimum synchronous read I/Os active to each device.
2185See the section "ZFS I/O SCHEDULER".
2186.sp
2187Default value: \fB10\fR.
2188.RE
2189
2190.sp
2191.ne 2
2192.na
2193\fBzfs_vdev_sync_write_max_active\fR (int)
2194.ad
2195.RS 12n
83426735 2196Maximum synchronous write I/Os active to each device.
e8b96c60
MA
2197See the section "ZFS I/O SCHEDULER".
2198.sp
2199Default value: \fB10\fR.
2200.RE
2201
2202.sp
2203.ne 2
2204.na
2205\fBzfs_vdev_sync_write_min_active\fR (int)
2206.ad
2207.RS 12n
2208Minimum synchronous write I/Os active to each device.
2209See the section "ZFS I/O SCHEDULER".
2210.sp
2211Default value: \fB10\fR.
2212.RE
2213
1b939560
BB
2214.sp
2215.ne 2
2216.na
2217\fBzfs_vdev_trim_max_active\fR (int)
2218.ad
2219.RS 12n
2220Maximum trim/discard I/Os active to each device.
2221See the section "ZFS I/O SCHEDULER".
2222.sp
2223Default value: \fB2\fR.
2224.RE
2225
2226.sp
2227.ne 2
2228.na
2229\fBzfs_vdev_trim_min_active\fR (int)
2230.ad
2231.RS 12n
2232Minimum trim/discard I/Os active to each device.
2233See the section "ZFS I/O SCHEDULER".
2234.sp
2235Default value: \fB1\fR.
2236.RE
2237
6f5aac3c
AM
2238.sp
2239.ne 2
2240.na
2241\fBzfs_vdev_nia_delay\fR (int)
2242.ad
2243.RS 12n
For non-interactive I/O (scrub, resilver, removal, initialize and rebuild),
the number of concurrently-active I/Os is limited to *_min_active, unless
the vdev is "idle". When there are no interactive I/Os active (sync or
async), and zfs_vdev_nia_delay I/Os have completed since the last
interactive I/O, then the vdev is considered to be "idle", and the number
of concurrently-active non-interactive I/Os is increased to *_max_active.
2250See the section "ZFS I/O SCHEDULER".
2251.sp
2252Default value: \fB5\fR.
2253.RE
2254
2255.sp
2256.ne 2
2257.na
2258\fBzfs_vdev_nia_credit\fR (int)
2259.ad
2260.RS 12n
Some HDDs tend to prioritize sequential I/O so highly that concurrent
random I/O latency reaches several seconds. On some HDDs this happens
even if sequential I/Os are submitted one at a time, and so setting
*_max_active to 1 does not help. To prevent non-interactive I/Os, like
scrub, from monopolizing the device, no more than zfs_vdev_nia_credit
I/Os can be sent while there are outstanding incomplete interactive
I/Os. This enforced wait ensures the HDD services the interactive I/O
within a reasonable amount of time.
2269See the section "ZFS I/O SCHEDULER".
2270.sp
2271Default value: \fB5\fR.
2272.RE
2273
3dfb57a3
DB
2274.sp
2275.ne 2
2276.na
2277\fBzfs_vdev_queue_depth_pct\fR (int)
2278.ad
2279.RS 12n
e815485f
TC
Maximum number of queued allocations per top-level vdev expressed as
a percentage of \fBzfs_vdev_async_write_max_active\fR, which allows the
system to detect devices that are more capable of handling allocations
and to allocate more blocks to those devices. It allows for dynamic
allocation distribution when devices are imbalanced, as fuller devices
will tend to be slower than empty devices.
2286
2287See also \fBzio_dva_throttle_enabled\fR.
3dfb57a3 2288.sp
be54a13c 2289Default value: \fB1000\fR%.
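.sp
As a worked example with the defaults, 1000% of
\fBzfs_vdev_async_write_max_active\fR (10) allows roughly 100 queued
allocations per top-level vdev:
.sp
.nf
# queued allocation limit = queue_depth_pct / 100 * async_write_max_active
echo $(( 1000 * 10 / 100 ))   # prints 100
.fi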
3dfb57a3
DB
2290.RE
2291
29714574
TF
2292.sp
2293.ne 2
2294.na
2295\fBzfs_expire_snapshot\fR (int)
2296.ad
2297.RS 12n
2298Seconds to expire .zfs/snapshot
2299.sp
2300Default value: \fB300\fR.
2301.RE
2302
0500e835
BB
2303.sp
2304.ne 2
2305.na
2306\fBzfs_admin_snapshot\fR (int)
2307.ad
2308.RS 12n
2309Allow the creation, removal, or renaming of entries in the .zfs/snapshot
2310directory to cause the creation, destruction, or renaming of snapshots.
2311When enabled this functionality works both locally and over NFS exports
2312which have the 'no_root_squash' option set. This functionality is disabled
2313by default.
2314.sp
2315Use \fB1\fR for yes and \fB0\fR for no (default).
2316.RE
2317
29714574
TF
2318.sp
2319.ne 2
2320.na
2321\fBzfs_flags\fR (int)
2322.ad
2323.RS 12n
33b6dbbc
NB
2324Set additional debugging flags. The following flags may be bitwise-or'd
2325together.
2326.sp
2327.TS
2328box;
2329rB lB
2330lB lB
2331r l.
2332Value Symbolic Name
2333 Description
2334_
23351 ZFS_DEBUG_DPRINTF
2336 Enable dprintf entries in the debug log.
2337_
23382 ZFS_DEBUG_DBUF_VERIFY *
2339 Enable extra dbuf verifications.
2340_
23414 ZFS_DEBUG_DNODE_VERIFY *
2342 Enable extra dnode verifications.
2343_
23448 ZFS_DEBUG_SNAPNAMES
2345 Enable snapshot name verification.
2346_
234716 ZFS_DEBUG_MODIFY
2348 Check for illegally modified ARC buffers.
2349_
33b6dbbc
NB
235064 ZFS_DEBUG_ZIO_FREE
2351 Enable verification of block frees.
2352_
2353128 ZFS_DEBUG_HISTOGRAM_VERIFY
2354 Enable extra spacemap histogram verifications.
8740cf4a
NB
2355_
2356256 ZFS_DEBUG_METASLAB_VERIFY
2357 Verify space accounting on disk matches in-core range_trees.
2358_
2359512 ZFS_DEBUG_SET_ERROR
2360 Enable SET_ERROR and dprintf entries in the debug log.
1b939560
BB
2361_
23621024 ZFS_DEBUG_INDIRECT_REMAP
2363 Verify split blocks created by device removal.
2364_
23652048 ZFS_DEBUG_TRIM
2366 Verify TRIM ranges are always within the allocatable range tree.
93e28d66
SD
2367_
23684096 ZFS_DEBUG_LOG_SPACEMAP
2369 Verify that the log summary is consistent with the spacemap log
2370 and enable zfs_dbgmsgs for metaslab loading and flushing.
33b6dbbc
NB
2371.TE
2372.sp
2373* Requires debug build.
29714574 2374.sp
33b6dbbc 2375Default value: \fB0\fR.
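.sp
As a hedged example, individual flags are combined by bitwise-OR'ing their
values; enabling ZFS_DEBUG_DPRINTF (1) together with ZFS_DEBUG_SET_ERROR (512)
would look like the following (the sysfs path is assumed to follow the usual
module parameter convention):
.sp
.nf
# 1 | 512 = 513
echo $(( 1 | 512 ))                              # prints 513
echo 513 > /sys/module/zfs/parameters/zfs_flags
.fi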
29714574
TF
2376.RE
2377
fbeddd60
MA
2378.sp
2379.ne 2
2380.na
2381\fBzfs_free_leak_on_eio\fR (int)
2382.ad
2383.RS 12n
If destroy encounters an EIO while reading metadata (e.g. indirect
blocks), space referenced by the missing metadata cannot be freed.
Normally this causes the background destroy to become "stalled", as
it is unable to make forward progress. While in this stalled state,
all remaining space to free from the error-encountering filesystem is
"temporarily leaked". Set this flag to cause it to ignore the EIO,
permanently leak the space from indirect blocks that cannot be read,
and continue to free everything else that it can.
2392
The default "stalling" behavior is useful if the storage partially
fails (i.e. some but not all I/Os fail), and then later recovers. In
this case, we will be able to continue pool operations while it is
partially failed, and when it recovers, we can continue to free the
space, with no leaks. However, note that this case is actually
fairly rare.
2399
2400Typically pools either (a) fail completely (but perhaps temporarily,
2401e.g. a top-level vdev going offline), or (b) have localized,
2402permanent errors (e.g. disk returns the wrong data due to bit flip or
2403firmware bug). In case (a), this setting does not matter because the
2404pool will be suspended and the sync thread will not be able to make
2405forward progress regardless. In case (b), because the error is
2406permanent, the best we can do is leak the minimum amount of space,
2407which is what setting this flag will do. Therefore, it is reasonable
2408for this flag to normally be set, but we chose the more conservative
2409approach of not setting it, so that there is no possibility of
2410leaking space in the "partial temporary" failure case.
2411.sp
2412Default value: \fB0\fR.
2413.RE
2414
29714574
TF
2415.sp
2416.ne 2
2417.na
2418\fBzfs_free_min_time_ms\fR (int)
2419.ad
2420.RS 12n
6146e17e 2421During a \fBzfs destroy\fR operation using \fBfeature@async_destroy\fR a minimum
83426735 2422of this much time will be spent working on freeing blocks per txg.
29714574
TF
2423.sp
2424Default value: \fB1,000\fR.
2425.RE
2426
67709516
D
2427.sp
2428.ne 2
2429.na
2430\fBzfs_obsolete_min_time_ms\fR (int)
2431.ad
2432.RS 12n
dd4bc569 2433Similar to \fBzfs_free_min_time_ms\fR but for cleanup of old indirection records
67709516
D
2434for removed vdevs.
2435.sp
2436Default value: \fB500\fR.
2437.RE
2438
29714574
TF
2439.sp
2440.ne 2
2441.na
2442\fBzfs_immediate_write_sz\fR (long)
2443.ad
2444.RS 12n
83426735 2445Largest data block to write to zil. Larger blocks will be treated as if the
6146e17e 2446dataset being written to had the property setting \fBlogbias=throughput\fR.
29714574
TF
2447.sp
2448Default value: \fB32,768\fR.
2449.RE
2450
619f0976
GW
2451.sp
2452.ne 2
2453.na
2454\fBzfs_initialize_value\fR (ulong)
2455.ad
2456.RS 12n
2457Pattern written to vdev free space by \fBzpool initialize\fR.
2458.sp
2459Default value: \fB16,045,690,984,833,335,022\fR (0xdeadbeefdeadbeee).
2460.RE
2461
e60e158e
JG
2462.sp
2463.ne 2
2464.na
2465\fBzfs_initialize_chunk_size\fR (ulong)
2466.ad
2467.RS 12n
2468Size of writes used by \fBzpool initialize\fR.
2469This option is used by the test suite to facilitate testing.
2470.sp
2471Default value: \fB1,048,576\fR
2472.RE
2473
37f03da8
SH
2474.sp
2475.ne 2
2476.na
2477\fBzfs_livelist_max_entries\fR (ulong)
2478.ad
2479.RS 12n
2480The threshold size (in block pointers) at which we create a new sub-livelist.
2481Larger sublists are more costly from a memory perspective but the fewer
2482sublists there are, the lower the cost of insertion.
2483.sp
2484Default value: \fB500,000\fR.
2485.RE
2486
2487.sp
2488.ne 2
2489.na
2490\fBzfs_livelist_min_percent_shared\fR (int)
2491.ad
2492.RS 12n
If the amount of shared space between a snapshot and its clone drops below
this threshold, the clone turns off the livelist and reverts to the old
deletion method, because once a clone has been overwritten enough, livelists
no longer provide a benefit.
2497.sp
2498Default value: \fB75\fR.
2499.RE
2500
2501.sp
2502.ne 2
2503.na
2504\fBzfs_livelist_condense_new_alloc\fR (int)
2505.ad
2506.RS 12n
2507Incremented each time an extra ALLOC blkptr is added to a livelist entry while
2508it is being condensed.
2509This option is used by the test suite to track race conditions.
2510.sp
2511Default value: \fB0\fR.
2512.RE
2513
2514.sp
2515.ne 2
2516.na
2517\fBzfs_livelist_condense_sync_cancel\fR (int)
2518.ad
2519.RS 12n
2520Incremented each time livelist condensing is canceled while in
2521spa_livelist_condense_sync.
2522This option is used by the test suite to track race conditions.
2523.sp
2524Default value: \fB0\fR.
2525.RE
2526
2527.sp
2528.ne 2
2529.na
2530\fBzfs_livelist_condense_sync_pause\fR (int)
2531.ad
2532.RS 12n
2533When set, the livelist condense process pauses indefinitely before
2534executing the synctask - spa_livelist_condense_sync.
2535This option is used by the test suite to trigger race conditions.
2536.sp
2537Default value: \fB0\fR.
2538.RE
2539
2540.sp
2541.ne 2
2542.na
2543\fBzfs_livelist_condense_zthr_cancel\fR (int)
2544.ad
2545.RS 12n
2546Incremented each time livelist condensing is canceled while in
2547spa_livelist_condense_cb.
2548This option is used by the test suite to track race conditions.
2549.sp
2550Default value: \fB0\fR.
2551.RE
2552
2553.sp
2554.ne 2
2555.na
2556\fBzfs_livelist_condense_zthr_pause\fR (int)
2557.ad
2558.RS 12n
2559When set, the livelist condense process pauses indefinitely before
2560executing the open context condensing work in spa_livelist_condense_cb.
2561This option is used by the test suite to trigger race conditions.
2562.sp
2563Default value: \fB0\fR.
2564.RE
2565
917f475f
JG
2566.sp
2567.ne 2
2568.na
2569\fBzfs_lua_max_instrlimit\fR (ulong)
2570.ad
2571.RS 12n
2572The maximum execution time limit that can be set for a ZFS channel program,
2573specified as a number of Lua instructions.
2574.sp
2575Default value: \fB100,000,000\fR.
2576.RE
2577
2578.sp
2579.ne 2
2580.na
2581\fBzfs_lua_max_memlimit\fR (ulong)
2582.ad
2583.RS 12n
2584The maximum memory limit that can be set for a ZFS channel program, specified
2585in bytes.
2586.sp
2587Default value: \fB104,857,600\fR.
2588.RE
2589
a7ed98d8
SD
2590.sp
2591.ne 2
2592.na
2593\fBzfs_max_dataset_nesting\fR (int)
2594.ad
2595.RS 12n
2596The maximum depth of nested datasets. This value can be tuned temporarily to
2597fix existing datasets that exceed the predefined limit.
2598.sp
2599Default value: \fB50\fR.
2600.RE
2601
93e28d66
SD
2602.sp
2603.ne 2
2604.na
2605\fBzfs_max_log_walking\fR (ulong)
2606.ad
2607.RS 12n
2608The number of past TXGs that the flushing algorithm of the log spacemap
2609feature uses to estimate incoming log blocks.
2610.sp
2611Default value: \fB5\fR.
2612.RE
2613
2614.sp
2615.ne 2
2616.na
2617\fBzfs_max_logsm_summary_length\fR (ulong)
2618.ad
2619.RS 12n
2620Maximum number of rows allowed in the summary of the spacemap log.
2621.sp
2622Default value: \fB10\fR.
2623.RE
2624
f1512ee6
MA
2625.sp
2626.ne 2
2627.na
2628\fBzfs_max_recordsize\fR (int)
2629.ad
2630.RS 12n
We currently support block sizes from 512 bytes to 16MB. The benefits of
larger blocks, and thus larger I/O, need to be weighed against the cost of
COWing a giant block to modify one byte. Additionally, very large blocks
can have an impact on I/O latency, and also potentially on the memory
allocator. Therefore, we do not allow the recordsize to be set larger than
\fBzfs_max_recordsize\fR (default 1MB). Larger blocks can be created by
changing this tunable, and pools with larger blocks can always be imported
and used, regardless of this setting.
2639.sp
2640Default value: \fB1,048,576\fR.
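.sp
A hedged sketch of raising the cap and then using a larger recordsize (the
16M/4M figures and the pool/dataset name "tank/data" are examples only; a
recordsize above 128KB requires the pool's large_blocks feature):
.sp
.nf
# Allow recordsize values up to 16 MiB to be set
echo 16777216 > /sys/module/zfs/parameters/zfs_max_recordsize

# Then a dataset may use a larger block size
zfs set recordsize=4M tank/data
.fi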
2641.RE
2642
30af21b0
PD
2643.sp
2644.ne 2
2645.na
2646\fBzfs_allow_redacted_dataset_mount\fR (int)
2647.ad
2648.RS 12n
2649Allow datasets received with redacted send/receive to be mounted. Normally
2650disabled because these datasets may be missing key data.
2651.sp
2652Default value: \fB0\fR.
2653.RE
2654
93e28d66
SD
2655.sp
2656.ne 2
2657.na
2658\fBzfs_min_metaslabs_to_flush\fR (ulong)
2659.ad
2660.RS 12n
2661Minimum number of metaslabs to flush per dirty TXG
2662.sp
2663Default value: \fB1\fR.
2664.RE
2665
f3a7f661
GW
2666.sp
2667.ne 2
2668.na
2669\fBzfs_metaslab_fragmentation_threshold\fR (int)
2670.ad
2671.RS 12n
2672Allow metaslabs to keep their active state as long as their fragmentation
2673percentage is less than or equal to this value. An active metaslab that
2674exceeds this threshold will no longer keep its active status allowing
2675better metaslabs to be selected.
2676.sp
2677Default value: \fB70\fR.
2678.RE
2679
2680.sp
2681.ne 2
2682.na
2683\fBzfs_mg_fragmentation_threshold\fR (int)
2684.ad
2685.RS 12n
2686Metaslab groups are considered eligible for allocations if their
83426735 2687fragmentation metric (measured as a percentage) is less than or equal to
f3a7f661
GW
2688this value. If a metaslab group exceeds this threshold then it will be
2689skipped unless all metaslab groups within the metaslab class have also
2690crossed this threshold.
2691.sp
cb020f0d 2692Default value: \fB95\fR.
f3a7f661
GW
2693.RE
2694
f4a4046b
TC
2695.sp
2696.ne 2
2697.na
2698\fBzfs_mg_noalloc_threshold\fR (int)
2699.ad
2700.RS 12n
2701Defines a threshold at which metaslab groups should be eligible for
2702allocations. The value is expressed as a percentage of free space
2703beyond which a metaslab group is always eligible for allocations.
2704If a metaslab group's free space is less than or equal to the
6b4e21c6 2705threshold, the allocator will avoid allocating to that group
f4a4046b
TC
2706unless all groups in the pool have reached the threshold. Once all
2707groups have reached the threshold, all groups are allowed to accept
2708allocations. The default value of 0 disables the feature and causes
2709all metaslab groups to be eligible for allocations.
2710
b58237e7 2711This parameter allows one to deal with pools having heavily imbalanced
f4a4046b
TC
2712vdevs such as would be the case when a new vdev has been added.
2713Setting the threshold to a non-zero percentage will stop allocations
2714from being made to vdevs that aren't filled to the specified percentage
2715and allow lesser filled vdevs to acquire more allocations than they
2716otherwise would under the old \fBzfs_mg_alloc_failures\fR facility.
2717.sp
2718Default value: \fB0\fR.
2719.RE
2720
cc99f275
DB
2721.sp
2722.ne 2
2723.na
2724\fBzfs_ddt_data_is_special\fR (int)
2725.ad
2726.RS 12n
2727If enabled, ZFS will place DDT data into the special allocation class.
2728.sp
2729Default value: \fB1\fR.
2730.RE
2731
2732.sp
2733.ne 2
2734.na
2735\fBzfs_user_indirect_is_special\fR (int)
2736.ad
2737.RS 12n
2738If enabled, ZFS will place user data (both file and zvol) indirect blocks
2739into the special allocation class.
2740.sp
2741Default value: \fB1\fR.
2742.RE
2743
379ca9cf
OF
2744.sp
2745.ne 2
2746.na
2747\fBzfs_multihost_history\fR (int)
2748.ad
2749.RS 12n
2750Historical statistics for the last N multihost updates will be available in
2751\fB/proc/spl/kstat/zfs/<pool>/multihost\fR
2752.sp
2753Default value: \fB0\fR.
2754.RE
2755
2756.sp
2757.ne 2
2758.na
2759\fBzfs_multihost_interval\fR (ulong)
2760.ad
2761.RS 12n
2762Used to control the frequency of multihost writes which are performed when the
060f0226
OF
2763\fBmultihost\fR pool property is on. This is one factor used to determine the
2764length of the activity check during import.
379ca9cf 2765.sp
060f0226
OF
2766The multihost write period is \fBzfs_multihost_interval / leaf-vdevs\fR
2767milliseconds. On average a multihost write will be issued for each leaf vdev
2768every \fBzfs_multihost_interval\fR milliseconds. In practice, the observed
2769period can vary with the I/O load and this observed value is the delay which is
2770stored in the uberblock.
379ca9cf
OF
2771.sp
2772Default value: \fB1000\fR.
2773.RE
2774
2775.sp
2776.ne 2
2777.na
2778\fBzfs_multihost_import_intervals\fR (uint)
2779.ad
2780.RS 12n
2781Used to control the duration of the activity test on import. Smaller values of
2782\fBzfs_multihost_import_intervals\fR will reduce the import time but increase
2783the risk of failing to detect an active pool. The total activity check time is
060f0226
OF
2784never allowed to drop below one second.
2785.sp
2786On import the activity check waits a minimum amount of time determined by
2787\fBzfs_multihost_interval * zfs_multihost_import_intervals\fR, or the same
2788product computed on the host which last had the pool imported (whichever is
2789greater). The activity check time may be further extended if the value of mmp
2790delay found in the best uberblock indicates actual multihost updates happened
2791at longer intervals than \fBzfs_multihost_interval\fR. A minimum value of
2792\fB100ms\fR is enforced.
2793.sp
2794A value of 0 is ignored and treated as if it was set to 1.
379ca9cf 2795.sp
db2af93d 2796Default value: \fB20\fR.
379ca9cf
OF
2797.RE
2798
2799.sp
2800.ne 2
2801.na
2802\fBzfs_multihost_fail_intervals\fR (uint)
2803.ad
2804.RS 12n
060f0226
OF
2805Controls the behavior of the pool when multihost write failures or delays are
2806detected.
379ca9cf 2807.sp
060f0226
OF
2808When \fBzfs_multihost_fail_intervals = 0\fR, multihost write failures or delays
2809are ignored. The failures will still be reported to the ZED which depending on
2810its configuration may take action such as suspending the pool or offlining a
2811device.
2812
379ca9cf 2813.sp
060f0226
OF
2814When \fBzfs_multihost_fail_intervals > 0\fR, the pool will be suspended if
2815\fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR milliseconds pass
2816without a successful mmp write. This guarantees the activity test will see
2817mmp writes if the pool is imported. A value of 1 is ignored and treated as
2818if it was set to 2. This is necessary to prevent the pool from being suspended
2819due to normal, small I/O latency variations.
2820
379ca9cf 2821.sp
db2af93d 2822Default value: \fB10\fR.
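.sp
As a worked example with the defaults (\fBzfs_multihost_interval\fR=1000,
\fBzfs_multihost_import_intervals\fR=20, \fBzfs_multihost_fail_intervals\fR=10),
the import activity check waits at least 20 seconds, and the pool is suspended
after 10 seconds without a successful mmp write:
.sp
.nf
# activity check (ms) = zfs_multihost_interval * zfs_multihost_import_intervals
echo $(( 1000 * 20 ))   # 20000 ms = 20 s
# suspension window (ms) = zfs_multihost_fail_intervals * zfs_multihost_interval
echo $(( 10 * 1000 ))   # 10000 ms = 10 s
.fi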
379ca9cf
OF
2823.RE
2824
29714574
TF
2825.sp
2826.ne 2
2827.na
2828\fBzfs_no_scrub_io\fR (int)
2829.ad
2830.RS 12n
83426735
D
2831Set for no scrub I/O. This results in scrubs not actually scrubbing data and
2832simply doing a metadata crawl of the pool instead.
29714574
TF
2833.sp
2834Use \fB1\fR for yes and \fB0\fR for no (default).
2835.RE
2836
2837.sp
2838.ne 2
2839.na
2840\fBzfs_no_scrub_prefetch\fR (int)
2841.ad
2842.RS 12n
83426735 2843Set to disable block prefetching for scrubs.
29714574
TF
2844.sp
2845Use \fB1\fR for yes and \fB0\fR for no (default).
2846.RE
2847
29714574
TF
2848.sp
2849.ne 2
2850.na
2851\fBzfs_nocacheflush\fR (int)
2852.ad
2853.RS 12n
53b1f5ea
PS
2854Disable cache flush operations on disks when writing. Setting this will
2855cause pool corruption on power loss if a volatile out-of-order write cache
2856is enabled.
29714574
TF
2857.sp
2858Use \fB1\fR for yes and \fB0\fR for no (default).
2859.RE
2860
2861.sp
2862.ne 2
2863.na
2864\fBzfs_nopwrite_enabled\fR (int)
2865.ad
2866.RS 12n
2867Enable NOP writes
2868.sp
2869Use \fB1\fR for yes (default) and \fB0\fR to disable.
2870.RE
2871
66aca247
DB
2872.sp
2873.ne 2
2874.na
2875\fBzfs_dmu_offset_next_sync\fR (int)
2876.ad
2877.RS 12n
Enable forcing txg sync to find holes. When enabled, this forces ZFS to act
like prior versions when SEEK_HOLE or SEEK_DATA flags are used: when a dnode
is dirty, txgs are synced so that the hole information can be found.
2882.sp
2883Use \fB1\fR for yes and \fB0\fR to disable (default).
2884.RE
2885
29714574
TF
2886.sp
2887.ne 2
2888.na
b738bc5a 2889\fBzfs_pd_bytes_max\fR (int)
29714574
TF
2890.ad
2891.RS 12n
83426735 2892The number of bytes which should be prefetched during a pool traversal
6146e17e 2893(eg: \fBzfs send\fR or other data crawling operations)
29714574 2894.sp
74aa2ba2 2895Default value: \fB52,428,800\fR.
29714574
TF
2896.RE
2897
bef78122
DQ
2898.sp
2899.ne 2
2900.na
2901\fBzfs_per_txg_dirty_frees_percent \fR (ulong)
2902.ad
2903.RS 12n
65282ee9
AP
2904Tunable to control percentage of dirtied indirect blocks from frees allowed
2905into one TXG. After this threshold is crossed, additional frees will wait until
2906the next TXG.
bef78122
DQ
2907A value of zero will disable this throttle.
2908.sp
65282ee9 2909Default value: \fB5\fR, set to \fB0\fR to disable.
bef78122
DQ
2910.RE
2911
29714574
TF
2912.sp
2913.ne 2
2914.na
2915\fBzfs_prefetch_disable\fR (int)
2916.ad
2917.RS 12n
7f60329a
MA
This tunable disables predictive prefetch. Note that it leaves "prescient"
prefetch (e.g. prefetch for zfs send) intact. Unlike predictive prefetch,
prescient prefetch never issues I/Os that end up not being needed, so it
can't hurt performance.
29714574
TF
2922.sp
2923Use \fB1\fR for yes and \fB0\fR for no (default).
2924.RE
2925
5090f727
CZ
2926.sp
2927.ne 2
2928.na
2929\fBzfs_qat_checksum_disable\fR (int)
2930.ad
2931.RS 12n
2932This tunable disables qat hardware acceleration for sha256 checksums. It
2933may be set after the zfs modules have been loaded to initialize the qat
2934hardware as long as support is compiled in and the qat driver is present.
2935.sp
2936Use \fB1\fR for yes and \fB0\fR for no (default).
2937.RE
2938
2939.sp
2940.ne 2
2941.na
2942\fBzfs_qat_compress_disable\fR (int)
2943.ad
2944.RS 12n
2945This tunable disables qat hardware acceleration for gzip compression. It
2946may be set after the zfs modules have been loaded to initialize the qat
2947hardware as long as support is compiled in and the qat driver is present.
2948.sp
2949Use \fB1\fR for yes and \fB0\fR for no (default).
2950.RE
2951
2952.sp
2953.ne 2
2954.na
2955\fBzfs_qat_encrypt_disable\fR (int)
2956.ad
2957.RS 12n
2958This tunable disables qat hardware acceleration for AES-GCM encryption. It
2959may be set after the zfs modules have been loaded to initialize the qat
2960hardware as long as support is compiled in and the qat driver is present.
2961.sp
2962Use \fB1\fR for yes and \fB0\fR for no (default).
2963.RE
2964
29714574
TF
2965.sp
2966.ne 2
2967.na
2968\fBzfs_read_chunk_size\fR (long)
2969.ad
2970.RS 12n
2971Bytes to read per chunk
2972.sp
2973Default value: \fB1,048,576\fR.
2974.RE
2975
2976.sp
2977.ne 2
2978.na
2979\fBzfs_read_history\fR (int)
2980.ad
2981.RS 12n
379ca9cf
OF
2982Historical statistics for the last N reads will be available in
2983\fB/proc/spl/kstat/zfs/<pool>/reads\fR
29714574 2984.sp
83426735 2985Default value: \fB0\fR (no data is kept).
29714574
TF
2986.RE
2987
2988.sp
2989.ne 2
2990.na
2991\fBzfs_read_history_hits\fR (int)
2992.ad
2993.RS 12n
2994Include cache hits in read history
2995.sp
2996Use \fB1\fR for yes and \fB0\fR for no (default).
2997.RE
2998
9a49d3f3
BB
2999.sp
3000.ne 2
3001.na
3002\fBzfs_rebuild_max_segment\fR (ulong)
3003.ad
3004.RS 12n
3005Maximum read segment size to issue when sequentially resilvering a
3006top-level vdev.
3007.sp
3008Default value: \fB1,048,576\fR.
3009.RE
3010
b2255edc
BB
3011.sp
3012.ne 2
3013.na
3014\fBzfs_rebuild_scrub_enabled\fR (int)
3015.ad
3016.RS 12n
3017Automatically start a pool scrub when the last active sequential resilver
3018completes in order to verify the checksums of all blocks which have been
3019resilvered. This option is enabled by default and is strongly recommended.
3020.sp
3021Default value: \fB1\fR.
3022.RE
3023
3024.sp
3025.ne 2
3026.na
3027\fBzfs_rebuild_vdev_limit\fR (ulong)
3028.ad
3029.RS 12n
3030Maximum amount of i/o that can be concurrently issued for a sequential
3031resilver per leaf device, given in bytes.
3032.sp
3033Default value: \fB33,554,432\fR.
3034.RE
3035
9e052db4
MA
3036.sp
3037.ne 2
3038.na
4589f3ae
BB
3039\fBzfs_reconstruct_indirect_combinations_max\fR (int)
3040.ad
.RS 12n
3042If an indirect split block contains more than this many possible unique
3043combinations when being reconstructed, consider it too computationally
3044expensive to check them all. Instead, try at most
3045\fBzfs_reconstruct_indirect_combinations_max\fR randomly-selected
3046combinations each time the block is accessed. This allows all segment
3047copies to participate fairly in the reconstruction when all combinations
3048cannot be checked and prevents repeated use of one bad copy.
3049.sp
64bdf63f 3050Default value: \fB4096\fR.
9e052db4
MA
3051.RE
3052
29714574
TF
3053.sp
3054.ne 2
3055.na
3056\fBzfs_recover\fR (int)
3057.ad
3058.RS 12n
3059Set to attempt to recover from fatal errors. This should only be used as a
3060last resort, as it typically results in leaked space, or worse.
3061.sp
3062Use \fB1\fR for yes and \fB0\fR for no (default).
3063.RE
3064
7c9a4292
BB
3065.sp
3066.ne 2
3067.na
3068\fBzfs_removal_ignore_errors\fR (int)
3069.ad
3070.RS 12n
3071.sp
3072Ignore hard IO errors during device removal. When set, if a device encounters
3073a hard IO error during the removal process the removal will not be cancelled.
3074This can result in a normally recoverable block becoming permanently damaged
3075and is not recommended. This should only be used as a last resort when the
3076pool cannot be returned to a healthy state prior to removing the device.
3077.sp
3078Default value: \fB0\fR.
3079.RE
3080
53dce5ac
MA
3081.sp
3082.ne 2
3083.na
3084\fBzfs_removal_suspend_progress\fR (int)
3085.ad
3086.RS 12n
3087.sp
3088This is used by the test suite so that it can ensure that certain actions
3089happen while in the middle of a removal.
3090.sp
3091Default value: \fB0\fR.
3092.RE
3093
3094.sp
3095.ne 2
3096.na
3097\fBzfs_remove_max_segment\fR (int)
3098.ad
3099.RS 12n
3100.sp
3101The largest contiguous segment that we will attempt to allocate when removing
3102a device. This can be no larger than 16MB. If there is a performance
3103problem with attempting to allocate large blocks, consider decreasing this.
3104.sp
3105Default value: \fB16,777,216\fR (16MB).
3106.RE
3107
67709516
D
3108.sp
3109.ne 2
3110.na
3111\fBzfs_resilver_disable_defer\fR (int)
3112.ad
3113.RS 12n
3114Disables the \fBresilver_defer\fR feature, causing an operation that would
3115start a resilver to restart one in progress immediately.
3116.sp
3117Default value: \fB0\fR (feature enabled).
3118.RE
3119
29714574
TF
3120.sp
3121.ne 2
3122.na
d4a72f23 3123\fBzfs_resilver_min_time_ms\fR (int)
29714574
TF
3124.ad
3125.RS 12n
d4a72f23
TC
3126Resilvers are processed by the sync thread. While resilvering it will spend
3127at least this much time working on a resilver between txg flushes.
29714574 3128.sp
d4a72f23 3129Default value: \fB3,000\fR.
29714574
TF
3130.RE
3131
02638a30
TC
3132.sp
3133.ne 2
3134.na
3135\fBzfs_scan_ignore_errors\fR (int)
3136.ad
3137.RS 12n
3138If set to a nonzero value, remove the DTL (dirty time list) upon
3139completion of a pool scan (scrub) even if there were unrepairable
3140errors. It is intended to be used during pool repair or recovery to
3141stop resilvering when the pool is next imported.
3142.sp
3143Default value: \fB0\fR.
3144.RE
3145
29714574
TF
3146.sp
3147.ne 2
3148.na
d4a72f23 3149\fBzfs_scrub_min_time_ms\fR (int)
29714574
TF
3150.ad
3151.RS 12n
d4a72f23
TC
3152Scrubs are processed by the sync thread. While scrubbing it will spend
3153at least this much time working on a scrub between txg flushes.
29714574 3154.sp
d4a72f23 3155Default value: \fB1,000\fR.
29714574
TF
3156.RE
3157
3158.sp
3159.ne 2
3160.na
d4a72f23 3161\fBzfs_scan_checkpoint_intval\fR (int)
29714574
TF
3162.ad
3163.RS 12n
d4a72f23
TC
3164To preserve progress across reboots the sequential scan algorithm periodically
3165needs to stop metadata scanning and issue all the verifications I/Os to disk.
3166The frequency of this flushing is determined by the
a8577bdb 3167\fBzfs_scan_checkpoint_intval\fR tunable.
29714574 3168.sp
d4a72f23 3169Default value: \fB7200\fR seconds (every 2 hours).
29714574
TF
3170.RE
3171
3172.sp
3173.ne 2
3174.na
d4a72f23 3175\fBzfs_scan_fill_weight\fR (int)
29714574
TF
3176.ad
3177.RS 12n
d4a72f23
TC
This tunable affects how scrub and resilver I/O segments are ordered. A higher
number indicates that we care more about how filled-in a segment is, while a
lower number indicates we care more about the size of the extent without
considering the gaps within a segment. This value is only tunable upon module
insertion. Changing the value afterwards will have no effect on scrub or
resilver performance.
29714574 3184.sp
d4a72f23 3185Default value: \fB3\fR.
29714574
TF
3186.RE
3187
3188.sp
3189.ne 2
3190.na
d4a72f23 3191\fBzfs_scan_issue_strategy\fR (int)
29714574
TF
3192.ad
3193.RS 12n
d4a72f23
TC
3194Determines the order that data will be verified while scrubbing or resilvering.
3195If set to \fB1\fR, data will be verified as sequentially as possible, given the
3196amount of memory reserved for scrubbing (see \fBzfs_scan_mem_lim_fact\fR). This
3197may improve scrub performance if the pool's data is very fragmented. If set to
3198\fB2\fR, the largest mostly-contiguous chunk of found data will be verified
3199first. By deferring scrubbing of small segments, we may later find adjacent data
3200to coalesce and increase the segment size. If set to \fB0\fR, zfs will use
3201strategy \fB1\fR during normal verification and strategy \fB2\fR while taking a
3202checkpoint.
29714574 3203.sp
d4a72f23
TC
3204Default value: \fB0\fR.
3205.RE
3206
3207.sp
3208.ne 2
3209.na
3210\fBzfs_scan_legacy\fR (int)
3211.ad
3212.RS 12n
3213A value of 0 indicates that scrubs and resilvers will gather metadata in
3214memory before issuing sequential I/O. A value of 1 indicates that the legacy
3215algorithm will be used where I/O is initiated as soon as it is discovered.
3216Changing this value to 0 will not affect scrubs or resilvers that are already
3217in progress.
3218.sp
3219Default value: \fB0\fR.
3220.RE
3221
3222.sp
3223.ne 2
3224.na
3225\fBzfs_scan_max_ext_gap\fR (int)
3226.ad
3227.RS 12n
3228Indicates the largest gap in bytes between scrub / resilver I/Os that will still
3229be considered sequential for sorting purposes. Changing this value will not
3230affect scrubs or resilvers that are already in progress.
3231.sp
3232Default value: \fB2097152 (2 MB)\fR.
3233.RE
3234
3235.sp
3236.ne 2
3237.na
3238\fBzfs_scan_mem_lim_fact\fR (int)
3239.ad
3240.RS 12n
3241Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
3242This tunable determines the hard limit for I/O sorting memory usage.
3243When the hard limit is reached we stop scanning metadata and start issuing
3244data verification I/O. This is done until we get below the soft limit.
3245.sp
3246Default value: \fB20\fR which is 5% of RAM (1/20).
3247.RE
3248
3249.sp
3250.ne 2
3251.na
3252\fBzfs_scan_mem_lim_soft_fact\fR (int)
3253.ad
3254.RS 12n
The fraction of the hard limit used to determine the soft limit for I/O sorting
ac3d4d0c 3256by the sequential scan algorithm. When we cross this limit from below no action
d4a72f23
TC
3257is taken. When we cross this limit from above it is because we are issuing
3258verification I/O. In this case (unless the metadata scan is done) we stop
3259issuing verification I/O and start scanning metadata again until we get to the
3260hard limit.
3261.sp
3262Default value: \fB20\fR which is 5% of the hard limit (1/20).
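.sp
As a worked example on a hypothetical 64 GiB system with both factors at their
default of 20, the hard limit is about 3.2 GiB and the soft limit about
164 MiB:
.sp
.nf
# hard limit = RAM / zfs_scan_mem_lim_fact
echo $(( 64 * 1024 * 1024 * 1024 / 20 ))        # 3435973836 bytes (~3.2 GiB)
# soft limit = hard limit / zfs_scan_mem_lim_soft_fact
echo $(( 64 * 1024 * 1024 * 1024 / 20 / 20 ))   # 171798691 bytes (~164 MiB)
.fi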
3263.RE
3264
67709516
D
3265.sp
3266.ne 2
3267.na
3268\fBzfs_scan_strict_mem_lim\fR (int)
3269.ad
3270.RS 12n
3271Enforces tight memory limits on pool scans when a sequential scan is in
3272progress. When disabled the memory limit may be exceeded by fast disks.
3273.sp
3274Default value: \fB0\fR.
3275.RE
3276
3277.sp
3278.ne 2
3279.na
3280\fBzfs_scan_suspend_progress\fR (int)
3281.ad
3282.RS 12n
3283Freezes a scrub/resilver in progress without actually pausing it. Intended for
3284testing/debugging.
3285.sp
3286Default value: \fB0\fR.
3287.RE
3288
3289
d4a72f23
TC
3290.sp
3291.ne 2
3292.na
3293\fBzfs_scan_vdev_limit\fR (int)
3294.ad
3295.RS 12n
3296Maximum amount of data that can be concurrently issued at once for scrubs and
3297resilvers per leaf device, given in bytes.
3298.sp
3299Default value: \fB41943040\fR.
29714574
TF
3300.RE
3301
fd8febbd
TF
3302.sp
3303.ne 2
3304.na
3305\fBzfs_send_corrupt_data\fR (int)
3306.ad
3307.RS 12n
83426735 3308Allow sending of corrupt data (ignore read/checksum errors when sending data)
fd8febbd
TF
3309.sp
3310Use \fB1\fR for yes and \fB0\fR for no (default).
3311.RE
3312
caf9dd20
BB
3313.sp
3314.ne 2
3315.na
3316\fBzfs_send_unmodified_spill_blocks\fR (int)
3317.ad
3318.RS 12n
3319Include unmodified spill blocks in the send stream. Under certain circumstances
3320previous versions of ZFS could incorrectly remove the spill block from an
3321existing object. Including unmodified copies of the spill blocks creates a
3322backwards compatible stream which will recreate a spill block if it was
3323incorrectly removed.
3324.sp
3325Use \fB1\fR for yes (default) and \fB0\fR for no.
3326.RE
3327
30af21b0
PD
3328.sp
3329.ne 2
3330.na
3331\fBzfs_send_no_prefetch_queue_ff\fR (int)
3332.ad
3333.RS 12n
3334The fill fraction of the \fBzfs send\fR internal queues. The fill fraction
3335controls the timing with which internal threads are woken up.
3336.sp
3337Default value: \fB20\fR.
3338.RE
3339
3340.sp
3341.ne 2
3342.na
3343\fBzfs_send_no_prefetch_queue_length\fR (int)
3344.ad
3345.RS 12n
3346The maximum number of bytes allowed in \fBzfs send\fR's internal queues.
3347.sp
3348Default value: \fB1,048,576\fR.
3349.RE
3350
3351.sp
3352.ne 2
3353.na
3354\fBzfs_send_queue_ff\fR (int)
3355.ad
3356.RS 12n
3357The fill fraction of the \fBzfs send\fR prefetch queue. The fill fraction
3358controls the timing with which internal threads are woken up.
3359.sp
3360Default value: \fB20\fR.
3361.RE
3362
3b0d9928
BB
3363.sp
3364.ne 2
3365.na
3366\fBzfs_send_queue_length\fR (int)
3367.ad
3368.RS 12n
30af21b0
PD
3369The maximum number of bytes allowed that will be prefetched by \fBzfs send\fR.
3370This value must be at least twice the maximum block size in use.
3b0d9928
BB
3371.sp
3372Default value: \fB16,777,216\fR.
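.sp
For example, with a 1 MiB maximum block size in use the value must be at least
2 MiB; the default of 16 MiB satisfies this comfortably:
.sp
.nf
# minimum allowed value = 2 * maximum block size in use
echo $(( 2 * 1048576 ))   # 2097152 bytes
.fi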
3373.RE
3374
30af21b0
PD
3375.sp
3376.ne 2
3377.na
3378\fBzfs_recv_queue_ff\fR (int)
3379.ad
3380.RS 12n
3381The fill fraction of the \fBzfs receive\fR queue. The fill fraction
3382controls the timing with which internal threads are woken up.
3383.sp
3384Default value: \fB20\fR.
3385.RE
3386
3b0d9928
BB
3387.sp
3388.ne 2
3389.na
3390\fBzfs_recv_queue_length\fR (int)
3391.ad
3392.RS 12n
3b0d9928
BB
3393The maximum number of bytes allowed in the \fBzfs receive\fR queue. This value
3394must be at least twice the maximum block size in use.
3395.sp
3396Default value: \fB16,777,216\fR.
3397.RE
3398
7261fc2e
MA
3399.sp
3400.ne 2
3401.na
3402\fBzfs_recv_write_batch_size\fR (int)
3403.ad
3404.RS 12n
3405The maximum amount of data (in bytes) that \fBzfs receive\fR will write in
3406one DMU transaction. This is the uncompressed size, even when receiving a
3407compressed send stream. This setting will not reduce the write size below
a single block, and is capped at a maximum of 32MB.
3409.sp
3410Default value: \fB1MB\fR.
3411.RE
3412
30af21b0
PD
3413.sp
3414.ne 2
3415.na
3416\fBzfs_override_estimate_recordsize\fR (ulong)
3417.ad
3418.RS 12n
3419Setting this variable overrides the default logic for estimating block
3420sizes when doing a zfs send. The default heuristic is that the average
3421block size will be the current recordsize. Override this value if most data
3422in your dataset is not of that size and you require accurate zfs send size
3423estimates.
3424.sp
3425Default value: \fB0\fR.
3426.RE
3427
29714574
TF
3428.sp
3429.ne 2
3430.na
3431\fBzfs_sync_pass_deferred_free\fR (int)
3432.ad
3433.RS 12n
83426735 3434Flushing of data to disk is done in passes. Defer frees starting in this pass
29714574
TF
3435.sp
3436Default value: \fB2\fR.
3437.RE
3438
d2734cce
SD
3439.sp
3440.ne 2
3441.na
3442\fBzfs_spa_discard_memory_limit\fR (int)
3443.ad
3444.RS 12n
3445Maximum memory used for prefetching a checkpoint's space map on each
3446vdev while discarding the checkpoint.
3447.sp
3448Default value: \fB16,777,216\fR.
3449.RE
3450
1f02ecc5
D
3451.sp
3452.ne 2
3453.na
3454\fBzfs_special_class_metadata_reserve_pct\fR (int)
3455.ad
3456.RS 12n
3457Only allow small data blocks to be allocated on the special and dedup vdev
3458types when the available free space percentage on these vdevs exceeds this
value. This ensures reserved space is available for pool metadata as the
special vdevs approach capacity.
3461.sp
3462Default value: \fB25\fR.
3463.RE
3464
29714574
TF
3465.sp
3466.ne 2
3467.na
3468\fBzfs_sync_pass_dont_compress\fR (int)
3469.ad
3470.RS 12n
b596585f 3471Starting in this sync pass, we disable compression (including of metadata).
be89734a
MA
3472With the default setting, in practice, we don't have this many sync passes,
3473so this has no effect.
3474.sp
3475The original intent was that disabling compression would help the sync passes
3476to converge. However, in practice disabling compression increases the average
number of sync passes, because when we turn compression off, the size of many
blocks will change and thus we have to re-allocate (not overwrite) them. It
3479also increases the number of 128KB allocations (e.g. for indirect blocks and
3480spacemaps) because these will not be compressed. The 128K allocations are
3481especially detrimental to performance on highly fragmented systems, which may
3482have very few free segments of this size, and may need to load new metaslabs
3483to satisfy 128K allocations.
29714574 3484.sp
be89734a 3485Default value: \fB8\fR.
29714574
TF
3486.RE
3487
3488.sp
3489.ne 2
3490.na
3491\fBzfs_sync_pass_rewrite\fR (int)
3492.ad
3493.RS 12n
83426735 3494Rewrite new block pointers starting in this pass
29714574
TF
3495.sp
3496Default value: \fB2\fR.
3497.RE
3498
a032ac4b
BB
3499.sp
3500.ne 2
3501.na
3502\fBzfs_sync_taskq_batch_pct\fR (int)
3503.ad
3504.RS 12n
3505This controls the number of threads used by the dp_sync_taskq. The default
3506value of 75% will create a maximum of one thread per cpu.
3507.sp
be54a13c 3508Default value: \fB75\fR%.
a032ac4b
BB
3509.RE
3510
1b939560
BB
3511.sp
3512.ne 2
3513.na
67709516 3514\fBzfs_trim_extent_bytes_max\fR (uint)
1b939560
BB
3515.ad
3516.RS 12n
Maximum size of a TRIM command. Ranges larger than this will be split into
3518chunks no larger than \fBzfs_trim_extent_bytes_max\fR bytes before being
3519issued to the device.
3520.sp
3521Default value: \fB134,217,728\fR.
3522.RE
3523
3524.sp
3525.ne 2
3526.na
67709516 3527\fBzfs_trim_extent_bytes_min\fR (uint)
1b939560
BB
3528.ad
3529.RS 12n
3530Minimum size of TRIM commands. TRIM ranges smaller than this will be skipped
unless they're part of a larger range which was broken into chunks. This is
3532done because it's common for these small TRIMs to negatively impact overall
3533performance. This value can be set to 0 to TRIM all unallocated space.
3534.sp
3535Default value: \fB32,768\fR.
3536.RE
3537
3538.sp
3539.ne 2
3540.na
67709516 3541\fBzfs_trim_metaslab_skip\fR (uint)
1b939560
BB
3542.ad
3543.RS 12n
3544Skip uninitialized metaslabs during the TRIM process. This option is useful
3545for pools constructed from large thinly-provisioned devices where TRIM
operations are slow. As a pool ages, an increasing fraction of the pool's
metaslabs will be initialized, progressively degrading the usefulness of
this option. This setting is stored when starting a manual TRIM and will
3549persist for the duration of the requested TRIM.
3550.sp
3551Default value: \fB0\fR.
3552.RE
3553
3554.sp
3555.ne 2
3556.na
67709516 3557\fBzfs_trim_queue_limit\fR (uint)
1b939560
BB
3558.ad
3559.RS 12n
3560Maximum number of queued TRIMs outstanding per leaf vdev. The number of
3561concurrent TRIM commands issued to the device is controlled by the
3562\fBzfs_vdev_trim_min_active\fR and \fBzfs_vdev_trim_max_active\fR module
3563options.
3564.sp
3565Default value: \fB10\fR.
3566.RE
3567
3568.sp
3569.ne 2
3570.na
67709516 3571\fBzfs_trim_txg_batch\fR (uint)
1b939560
BB
3572.ad
3573.RS 12n
3574The number of transaction groups worth of frees which should be aggregated
3575before TRIM operations are issued to the device. This setting represents a
3576trade-off between issuing larger, more efficient TRIM operations and the
3577delay before the recently trimmed space is available for use by the device.
3578.sp
3579Increasing this value will allow frees to be aggregated for a longer time.
This will result in larger TRIM operations and potentially increased memory
3581usage. Decreasing this value will have the opposite effect. The default
3582value of 32 was determined to be a reasonable compromise.
3583.sp
3584Default value: \fB32\fR.
3585.RE
3586
29714574
TF
3587.sp
3588.ne 2
3589.na
3590\fBzfs_txg_history\fR (int)
3591.ad
3592.RS 12n
379ca9cf
OF
3593Historical statistics for the last N txgs will be available in
3594\fB/proc/spl/kstat/zfs/<pool>/txgs\fR
29714574 3595.sp
ca85d690 3596Default value: \fB0\fR.
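.sp
A hedged example of enabling txg history at runtime and reading it back (the
pool name "tank" is a placeholder):
.sp
.nf
# Keep statistics for the last 100 txgs
echo 100 > /sys/module/zfs/parameters/zfs_txg_history

# Inspect them for the pool "tank"
cat /proc/spl/kstat/zfs/tank/txgs
.fi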
29714574
TF
3597.RE
3598
29714574
TF
3599.sp
3600.ne 2
3601.na
3602\fBzfs_txg_timeout\fR (int)
3603.ad
3604.RS 12n
83426735 3605Flush dirty data to disk at least every N seconds (maximum txg duration)
29714574
TF
3606.sp
3607Default value: \fB5\fR.
3608.RE
3609
1b939560
BB
3610.sp
3611.ne 2
3612.na
3613\fBzfs_vdev_aggregate_trim\fR (int)
3614.ad
3615.RS 12n
Allow TRIM I/Os to be aggregated. This is normally not helpful because
the extents to be trimmed will already have been aggregated by the
metaslab. This option is provided for debugging and performance analysis.
3619.sp
3620Default value: \fB0\fR.
3621.RE
3622
29714574
TF
3623.sp
3624.ne 2
3625.na
3626\fBzfs_vdev_aggregation_limit\fR (int)
3627.ad
3628.RS 12n
3629Max vdev I/O aggregation size
3630.sp
1af240f3
AM
3631Default value: \fB1,048,576\fR.
3632.RE
3633
3634.sp
3635.ne 2
3636.na
3637\fBzfs_vdev_aggregation_limit_non_rotating\fR (int)
3638.ad
3639.RS 12n
3640Max vdev I/O aggregation size for non-rotating media
3641.sp
29714574
TF
3642Default value: \fB131,072\fR.
3643.RE
3644
3645.sp
3646.ne 2
3647.na
3648\fBzfs_vdev_cache_bshift\fR (int)
3649.ad
3650.RS 12n
Shift size to inflate reads to.
3652.sp
83426735 3653Default value: \fB16\fR (effectively 65536).
29714574
TF
3654.RE
3655
3656.sp
3657.ne 2
3658.na
3659\fBzfs_vdev_cache_max\fR (int)
3660.ad
3661.RS 12n
ca85d690 3662Inflate reads smaller than this value to meet the \fBzfs_vdev_cache_bshift\fR
3663size (default 64k).
83426735
D
3664.sp
3665Default value: \fB16384\fR.
29714574
TF
3666.RE
3667
3668.sp
3669.ne 2
3670.na
3671\fBzfs_vdev_cache_size\fR (int)
3672.ad
3673.RS 12n
83426735
D
3674Total size of the per-disk cache in bytes.
3675.sp
3676Currently this feature is disabled as it has been found to not be helpful
3677for performance and in some cases harmful.
29714574
TF
3678.sp
3679Default value: \fB0\fR.
3680.RE
3681
29714574
TF
3682.sp
3683.ne 2
3684.na
9f500936 3685\fBzfs_vdev_mirror_rotating_inc\fR (int)
29714574
TF
3686.ad
3687.RS 12n
9f500936 3688A number by which the balancing algorithm increments the load calculation for
3689the purpose of selecting the least busy mirror member when an I/O immediately
3690follows its predecessor on rotational vdevs for the purpose of making decisions
3691based on load.
29714574 3692.sp
9f500936 3693Default value: \fB0\fR.
3694.RE
3695
3696.sp
3697.ne 2
3698.na
3699\fBzfs_vdev_mirror_rotating_seek_inc\fR (int)
3700.ad
3701.RS 12n
3702A number by which the balancing algorithm increments the load calculation for
3703the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within
this distance that are not immediately following the previous I/O are
incremented by half of this value.
3707.sp
3708Default value: \fB5\fR.
3709.RE
3710
3711.sp
3712.ne 2
3713.na
3714\fBzfs_vdev_mirror_rotating_seek_offset\fR (int)
3715.ad
3716.RS 12n
3717The maximum distance for the last queued I/O in which the balancing algorithm
3718considers an I/O to have locality.
3719See the section "ZFS I/O SCHEDULER".
3720.sp
3721Default value: \fB1048576\fR.
3722.RE
3723
3724.sp
3725.ne 2
3726.na
3727\fBzfs_vdev_mirror_non_rotating_inc\fR (int)
3728.ad
3729.RS 12n
3730A number by which the balancing algorithm increments the load calculation for
3731the purpose of selecting the least busy mirror member on non-rotational vdevs
3732when I/Os do not immediately follow one another.
3733.sp
3734Default value: \fB0\fR.
3735.RE
3736
3737.sp
3738.ne 2
3739.na
3740\fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int)
3741.ad
3742.RS 12n
3743A number by which the balancing algorithm increments the load calculation for
3744the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within
this distance that are not immediately following the previous I/O are
incremented by half of this value.
3748.sp
3749Default value: \fB1\fR.
29714574
TF
3750.RE
3751
29714574
TF
3752.sp
3753.ne 2
3754.na
3755\fBzfs_vdev_read_gap_limit\fR (int)
3756.ad
3757.RS 12n
83426735
D
3758Aggregate read I/O operations if the gap on-disk between them is within this
3759threshold.
29714574
TF
3760.sp
3761Default value: \fB32,768\fR.
3762.RE
3763
29714574
TF
3764.sp
3765.ne 2
3766.na
3767\fBzfs_vdev_write_gap_limit\fR (int)
3768.ad
3769.RS 12n
Aggregate write I/O operations if the gap on-disk between them is within this
threshold.
3771.sp
3772Default value: \fB4,096\fR.
3773.RE
3774
ab9f4b0b
GN
3775.sp
3776.ne 2
3777.na
3778\fBzfs_vdev_raidz_impl\fR (string)
3779.ad
3780.RS 12n
c9187d86 3781Parameter for selecting raidz parity implementation to use.
ab9f4b0b
GN
3782
3783Options marked (always) below may be selected on module load as they are
3784supported on all systems.
3785The remaining options may only be set after the module is loaded, as they
3786are available only if the implementations are compiled in and supported
3787on the running system.
3788
3789Once the module is loaded, the content of
3790/sys/module/zfs/parameters/zfs_vdev_raidz_impl will show available options
3791with the currently selected one enclosed in [].
3792Possible options are:
3793 fastest - (always) implementation selected using built-in benchmark
3794 original - (always) original raidz implementation
3795 scalar - (always) scalar raidz implementation
ae25d222
GN
3796 sse2 - implementation using SSE2 instruction set (64bit x86 only)
3797 ssse3 - implementation using SSSE3 instruction set (64bit x86 only)
ab9f4b0b 3798 avx2 - implementation using AVX2 instruction set (64bit x86 only)
7f547f85
RD
3799 avx512f - implementation using AVX512F instruction set (64bit x86 only)
3800 avx512bw - implementation using AVX512F & AVX512BW instruction sets (64bit x86 only)
62a65a65
RD
3801 aarch64_neon - implementation using NEON (Aarch64/64 bit ARMv8 only)
3802 aarch64_neonx2 - implementation using NEON with more unrolling (Aarch64/64 bit ARMv8 only)
35b07497 3803 powerpc_altivec - implementation using Altivec (PowerPC only)
ab9f4b0b
GN
3804.sp
3805Default value: \fBfastest\fR.
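.sp
For example (a sketch; the set of selectors shown depends on the CPU and
build, as noted above):
.sp
.nf
# Show available implementations; the current selection appears in brackets
cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl

# Switch to the AVX2 implementation if it is listed
echo avx2 > /sys/module/zfs/parameters/zfs_vdev_raidz_impl
.fi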
3806.RE
3807
67709516
D
3808.sp
3809.ne 2
3810.na
3811\fBzfs_vdev_scheduler\fR (charp)
3812.ad
3813.RS 12n
3814\fBDEPRECATED\fR: This option exists for compatibility with older user
3815configurations. It does nothing except print a warning to the kernel log if
3816set.
3817.sp
3818.RE
3819
29714574
TF
3820.sp
3821.ne 2
3822.na
3823\fBzfs_zevent_cols\fR (int)
3824.ad
3825.RS 12n
83426735 3826When zevents are logged to the console use this as the word wrap width.
29714574
TF
3827.sp
3828Default value: \fB80\fR.
3829.RE
3830
3831.sp
3832.ne 2
3833.na
3834\fBzfs_zevent_console\fR (int)
3835.ad
3836.RS 12n
3837Log events to the console
3838.sp
3839Use \fB1\fR for yes and \fB0\fR for no (default).
3840.RE
3841
3842.sp
3843.ne 2
3844.na
3845\fBzfs_zevent_len_max\fR (int)
3846.ad
3847.RS 12n
83426735
D
3848Max event queue length. A value of 0 will result in a calculated value which
3849increases with the number of CPUs in the system (minimum 64 events). Events
3850in the queue can be viewed with the \fBzpool events\fR command.
29714574
TF
3851.sp
3852Default value: \fB0\fR.
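.sp
A hedged example of bounding the queue and then inspecting it (the sysfs path
is assumed to follow the usual module parameter convention):
.sp
.nf
# Cap the event queue at 512 entries instead of the auto-calculated value
echo 512 > /sys/module/zfs/parameters/zfs_zevent_len_max

# View queued events
zpool events -v
.fi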
.RE

.sp
.ne 2
.na
\fBzfs_zevent_retain_max\fR (int)
.ad
.RS 12n
Maximum recent zevent records to retain for duplicate checking. Setting
this value to zero disables duplicate detection.
.sp
Default value: \fB2000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_zevent_retain_expire_secs\fR (int)
.ad
.RS 12n
Lifespan for a recent ereport that was retained for duplicate checking.
.sp
Default value: \fB900\fR.
.RE

.sp
.ne 2
.na
\fBzfs_zil_clean_taskq_maxalloc\fR (int)
.ad
.RS 12n
The maximum number of taskq entries that are allowed to be cached. When this
limit is exceeded, transaction records (itxs) will be cleaned synchronously.
.sp
Default value: \fB1048576\fR.
.RE

.sp
.ne 2
.na
\fBzfs_zil_clean_taskq_minalloc\fR (int)
.ad
.RS 12n
The number of taskq entries that are pre-populated when the taskq is first
created and are immediately available for use.
.sp
Default value: \fB1024\fR.
.RE

.sp
.ne 2
.na
\fBzfs_zil_clean_taskq_nthr_pct\fR (int)
.ad
.RS 12n
This controls the number of threads used by the dp_zil_clean_taskq. The default
value of 100% will create a maximum of one thread per CPU.
.sp
Default value: \fB100\fR%.
.RE

.sp
.ne 2
.na
\fBzil_maxblocksize\fR (int)
.ad
.RS 12n
This sets the maximum block size used by the ZIL. On very fragmented pools,
lowering this (typically to 36KB) can improve performance.
.sp
Default value: \fB131072\fR (128KB).
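.sp
For instance, on a badly fragmented pool the limit could be lowered to 36KB
(36,864 bytes) at runtime; this is a sketch, not a general recommendation:
.nf

    # echo 36864 > /sys/module/zfs/parameters/zil_maxblocksize

.fi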
.RE

.sp
.ne 2
.na
\fBzil_nocacheflush\fR (int)
.ad
.RS 12n
Disable the cache flush commands that are normally sent to the disk(s) by
the ZIL after an LWB write has completed. Setting this will cause ZIL
corruption on power loss if a volatile out-of-order write cache is enabled.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzil_replay_disable\fR (int)
.ad
.RS 12n
Disable intent logging replay. This can be useful when recovering from a
corrupted ZIL.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzil_slog_bulk\fR (ulong)
.ad
.RS 12n
Limit SLOG write size per commit executed with synchronous priority.
Any writes above that will be executed with lower (asynchronous) priority
to limit potential SLOG device abuse by a single active ZIL writer.
.sp
Default value: \fB786,432\fR.
.RE

.sp
.ne 2
.na
\fBzfs_embedded_slog_min_ms\fR (int)
.ad
.RS 12n
Usually, one metaslab from each (normal-class) vdev is dedicated for use by
the ZIL (to log synchronous writes).
However, if there are fewer than zfs_embedded_slog_min_ms metaslabs in the
vdev, this functionality is disabled.
This ensures that we don't set aside an unreasonable amount of space for the
ZIL.
.sp
Default value: \fB64\fR.
.RE

.sp
.ne 2
.na
\fBzio_deadman_log_all\fR (int)
.ad
.RS 12n
If non-zero, the zio deadman will produce debugging messages (see
\fBzfs_dbgmsg_enable\fR) for all zios, rather than only for leaf
zios possessing a vdev. This is meant to be used by developers to gain
diagnostic information for hang conditions which don't involve a mutex
or other locking primitive; typically conditions in which a thread in
the zio pipeline is looping indefinitely.
.sp
Default value: \fB0\fR.
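.sp
As a debugging sketch, the extra deadman messages can be enabled and then read
back from the kernel debug log kstat (this assumes \fBzfs_dbgmsg_enable\fR has
also been turned on):
.nf

    # echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
    # echo 1 > /sys/module/zfs/parameters/zio_deadman_log_all
    # cat /proc/spl/kstat/zfs/dbgmsg

.fi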
.RE

.sp
.ne 2
.na
\fBzio_decompress_fail_fraction\fR (int)
.ad
.RS 12n
If non-zero, this value represents the denominator of the probability that zfs
should induce a decompression failure. For instance, for a 5% decompression
failure rate, this value should be set to 20.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzio_slow_io_ms\fR (int)
.ad
.RS 12n
When an I/O operation takes more than \fBzio_slow_io_ms\fR milliseconds to
complete, it is marked as a slow I/O. Each slow I/O causes a delay zevent.
Slow I/O counters can be seen with "zpool status -s".
.sp
Default value: \fB30,000\fR.
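.sp
For example, the threshold could be tightened to 10 seconds and the resulting
per-vdev slow I/O counters then observed (the 10,000 ms value and the pool
name "tank" are placeholders):
.nf

    # echo 10000 > /sys/module/zfs/parameters/zio_slow_io_ms   # example value
    # zpool status -s tank

.fi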
.RE

.sp
.ne 2
.na
\fBzio_dva_throttle_enabled\fR (int)
.ad
.RS 12n
Throttle block allocations in the I/O pipeline. This allows for
dynamic allocation distribution when devices are imbalanced.
When enabled, the maximum number of pending allocations per top-level vdev
is limited by \fBzfs_vdev_queue_depth_pct\fR.
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzio_requeue_io_start_cut_in_line\fR (int)
.ad
.RS 12n
Prioritize requeued I/O.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzio_taskq_batch_pct\fR (uint)
.ad
.RS 12n
Percentage of online CPUs (or CPU cores, etc.) which will run a worker thread
for I/O. These workers are responsible for I/O work such as compression and
checksum calculations. A fractional number of CPUs will be rounded down.
.sp
The default value of 75 was chosen to avoid using all CPUs, which can result in
latency issues and inconsistent application performance, especially when high
compression is enabled.
.sp
Default value: \fB75\fR.
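.sp
As a quick worked illustration of the rounding (the CPU counts are examples):
.nf

    8 online CPUs * 75% = 6 I/O worker threads
    6 online CPUs * 75% = 4.5, rounded down to 4 threads

.fi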
.RE

.sp
.ne 2
.na
\fBzvol_inhibit_dev\fR (uint)
.ad
.RS 12n
Do not create zvol device nodes. This may slightly improve startup time on
systems with a very large number of zvols.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzvol_major\fR (uint)
.ad
.RS 12n
Major number for zvol block devices.
.sp
Default value: \fB230\fR.
.RE

.sp
.ne 2
.na
\fBzvol_max_discard_blocks\fR (ulong)
.ad
.RS 12n
Discard (aka TRIM) operations done on zvols will be done in batches of this
many blocks, where block size is determined by the \fBvolblocksize\fR property
of a zvol.
.sp
Default value: \fB16,384\fR.
.RE

.sp
.ne 2
.na
\fBzvol_prefetch_bytes\fR (uint)
.ad
.RS 12n
When adding a zvol to the system, prefetch \fBzvol_prefetch_bytes\fR
from the start and end of the volume. Prefetching these regions
of the volume is desirable because they are likely to be accessed
immediately by \fBblkid(8)\fR or by the kernel scanning for a partition
table.
.sp
Default value: \fB131,072\fR.
.RE

.sp
.ne 2
.na
\fBzvol_request_sync\fR (uint)
.ad
.RS 12n
When processing I/O requests for a zvol, submit them synchronously. This
effectively limits the queue depth to 1 for each I/O submitter. When set
to 0, requests are handled asynchronously by a thread pool. The number of
requests which can be handled concurrently is controlled by \fBzvol_threads\fR.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzvol_threads\fR (uint)
.ad
.RS 12n
Max number of threads which can handle zvol I/O requests concurrently.
.sp
Default value: \fB32\fR.
.RE

.sp
.ne 2
.na
\fBzvol_volmode\fR (uint)
.ad
.RS 12n
Defines the behaviour of zvol block devices when \fBvolmode\fR is set to
\fBdefault\fR.
Valid values are \fB1\fR (full), \fB2\fR (dev) and \fB3\fR (none).
.sp
Default value: \fB1\fR.
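.sp
This parameter is typically set persistently, for example from a modprobe
configuration file, so that it is already in effect when pools are imported
(the file path and the value \fB2\fR are illustrative):
.nf

    # cat /etc/modprobe.d/zfs.conf
    options zfs zvol_volmode=2

.fi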
.RE

.SH ZFS I/O SCHEDULER
ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
The I/O scheduler determines when and in what order those operations are
issued. The I/O scheduler divides operations into five I/O classes
prioritized in the following order: sync read, sync write, async read,
async write, and scrub/resilver. Each queue defines the minimum and
maximum number of concurrent operations that may be issued to the
device. In addition, the device has an aggregate maximum,
\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
must not exceed the aggregate maximum. If the sum of the per-queue
maximums exceeds the aggregate maximum, then the number of active I/Os
may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
be issued regardless of whether all per-queue minimums have been met.
.sp
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers. Further, physical
devices typically have a limit at which more concurrent operations have no
effect on throughput or can actually cause it to decrease.
.sp
The scheduler selects the next operation to issue by first looking for an
I/O class whose minimum has not been satisfied. Once all are satisfied and
the aggregate maximum has not been hit, the scheduler looks for classes
whose maximum has not been satisfied. Iteration through the I/O classes is
done in the order specified above. No further operations are issued if the
aggregate maximum number of concurrent operations has been hit or if there
are no operations queued for an I/O class that has not hit its maximum.
Every time an I/O is queued or an operation completes, the I/O scheduler
looks for new operations to issue.
.sp
In general, smaller max_active's will lead to lower latency of synchronous
operations. Larger max_active's may lead to higher overall throughput,
depending on underlying storage.
.sp
The ratio of the queues' max_actives determines the balance of performance
between reads, writes, and scrubs. E.g., increasing
\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
more quickly, but reads and writes to have higher latency and lower throughput.
.sp
All I/O classes have a fixed maximum number of outstanding operations
except for the async write class. Asynchronous writes represent the data
that is committed to stable storage during the syncing stage for
transaction groups. Transaction groups enter the syncing state
periodically so the number of queued async writes will quickly burst up
and then bleed down to zero. Rather than servicing them as quickly as
possible, the I/O scheduler changes the maximum number of active async
write I/Os according to the amount of dirty data in the pool. Since
both throughput and latency typically increase with the number of
concurrent operations issued to physical devices, reducing the
burstiness in the number of concurrent operations also stabilizes the
response time of operations from other -- and in particular synchronous
-- queues. In broad strokes, the I/O scheduler will issue more
concurrent operations from the async write queue as there's more dirty
data in the pool.
.sp
Async Writes
.sp
The number of concurrent operations issued for the async write I/O class
follows a piece-wise linear function defined by a few adjustable points.
.nf

        |              o---------| <-- zfs_vdev_async_write_max_active
   ^    |             /^         |
   |    |            / |         |
 active |           /  |         |
  I/O   |          /   |         |
 count  |         /    |         |
        |        /     |         |
        |-------o      |         | <-- zfs_vdev_async_write_min_active
       0|_______^______|_________|
        0%      |      |           100% of zfs_dirty_data_max
                |      |
                |      `-- zfs_vdev_async_write_active_max_dirty_percent
                `--------- zfs_vdev_async_write_active_min_dirty_percent

.fi
Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum. As that threshold is crossed, the
number of concurrent operations issued increases linearly to the maximum at
the specified maximum percentage of the dirty data allowed in the pool.
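.sp
As a worked example of that interpolation, take
\fBzfs_vdev_async_write_min_active\fR=2, \fBzfs_vdev_async_write_max_active\fR=10,
and dirty-data bounds of 30% and 60% (illustrative values; consult the defaults
of those tunables). A pool holding 45% of \fBzfs_dirty_data_max\fR in dirty data
sits halfway up the slope:
.nf

    active = 2 + (45 - 30) / (60 - 30) * (10 - 2) = 6 concurrent async writes

.fi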
.sp
Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
maximum percentage, this indicates that the rate of incoming data is
greater than the rate that the backend storage can handle. In this case, we
must further throttle incoming writes, as described in the next section.

.SH ZFS TRANSACTION DELAY
We delay transactions when we've determined that the backend storage
isn't able to accommodate the rate of incoming writes.
.sp
If there is already a transaction waiting, we delay relative to when
that transaction will finish waiting. This way the calculated delay time
is independent of the number of threads concurrently executing
transactions.
.sp
If we are the only waiter, wait relative to when the transaction
started, rather than the current time. This credits the transaction for
"time already served", e.g. reading indirect blocks.
.sp
The minimum time for a transaction to take is calculated as:
.nf
    min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
    min_time is then capped at 100 milliseconds.
.fi
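.sp
As a worked example, assume the default \fBzfs_delay_scale\fR of 500,000 ns and
\fBzfs_delay_min_dirty_percent\fR of 60%, and express dirty data as a
percentage of \fBzfs_dirty_data_max\fR (the ratio is unchanged by doing so):
.nf

    dirty = 80% (the midpoint):
        min_time = 500,000 * (80 - 60) / (100 - 80) =   500,000 ns = 500 us
    dirty = 90%:
        min_time = 500,000 * (90 - 60) / (100 - 90) = 1,500,000 ns = 1.5 ms
    dirty = 98%:
        min_time = 500,000 * (98 - 60) / (100 - 98) = 9,500,000 ns = 9.5 ms

.fi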
.sp
The delay has two degrees of freedom that can be adjusted via tunables. The
percentage of dirty data at which we start to delay is defined by
\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
delay after writing at full speed has failed to keep up with the incoming write
rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
this variable determines the amount of delay at the midpoint of the curve.
.sp
.nf
delay
 10ms +-------------------------------------------------------------*+
      |                                                             *|
  9ms +                                                             *+
      |                                                             *|
  8ms +                                                             *+
      |                                                            * |
  7ms +                                                            * +
      |                                                            * |
  6ms +                                                            * +
      |                                                            * |
  5ms +                                                            * +
      |                                                           *  |
  4ms +                                                           *  +
      |                                                           *  |
  3ms +                                                          *   +
      |                                                          *   |
  2ms +                                              (midpoint)  *   +
      |                                              |          **   |
  1ms +                                              v        ***    +
      |             zfs_delay_scale ---------->      ********        |
    0 +-------------------------------------*********----------------+
      0%                    <- zfs_dirty_data_max ->               100%
.fi
.sp
Note that since the delay is added to the outstanding time remaining on the
most recent transaction, the delay is effectively the inverse of IOPS.
Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
was chosen such that small changes in the amount of accumulated dirty data
in the first 3/4 of the curve yield relatively small differences in the
amount of delay.
.sp
The effects can be easier to understand when the amount of delay is
represented on a log scale:
.sp
.nf
delay
100ms +-------------------------------------------------------------++
      +                                                              +
      |                                                              |
      +                                                             *+
 10ms +                                                             *+
      +                                                            ** +
      |                                            (midpoint)     **  |
      +                                            |             **   +
  1ms +                                            v            ****  +
      +             zfs_delay_scale ---------->        *****          +
      |                                            ****               |
      +                                        ****                   +
100us +                                      **                       +
      +                                     *                         +
      |                                    *                          |
      +                                   *                           +
 10us +                                  *                            +
      +                                                               +
      |                                                               |
      +                                                               +
      +--------------------------------------------------------------+
      0%                    <- zfs_dirty_data_max ->                100%
.fi
.sp
Note here that only as the amount of dirty data approaches its limit does
the delay start to increase rapidly. The goal of a properly tuned system
should be to keep the amount of dirty data out of that range by first
ensuring that the appropriate limits are set for the I/O scheduler to reach
optimal throughput on the backend storage, and then by changing the value
of \fBzfs_delay_scale\fR to increase the steepness of the curve.