'\" te
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.TH ZFS-MODULE-PARAMETERS 5 "Nov 16, 2013"
.SH NAME
zfs\-module\-parameters \- ZFS module parameters
.SH DESCRIPTION
.sp
.LP
Description of the different parameters to the ZFS module.

.SS "Module parameters"
.sp
.LP

.sp
.ne 2
.na
\fBl2arc_feed_again\fR (int)
.ad
.RS 12n
Turbo L2ARC warmup
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBl2arc_feed_min_ms\fR (ulong)
.ad
.RS 12n
Min feed interval in milliseconds
.sp
Default value: \fB200\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_feed_secs\fR (ulong)
.ad
.RS 12n
Seconds between L2ARC writing
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_headroom\fR (ulong)
.ad
.RS 12n
Number of max device writes to precache
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_headroom_boost\fR (ulong)
.ad
.RS 12n
Compressed l2arc_headroom multiplier
.sp
Default value: \fB200\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_nocompress\fR (int)
.ad
.RS 12n
Skip compressing L2ARC buffers
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBl2arc_noprefetch\fR (int)
.ad
.RS 12n
Skip caching prefetched buffers
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBl2arc_norw\fR (int)
.ad
.RS 12n
No reads during writes
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBl2arc_write_boost\fR (ulong)
.ad
.RS 12n
Extra write bytes during device warmup
.sp
Default value: \fB8,388,608\fR.
.RE

.sp
.ne 2
.na
\fBl2arc_write_max\fR (ulong)
.ad
.RS 12n
Max write bytes per interval
.sp
Default value: \fB8,388,608\fR.
.RE

.sp
.ne 2
.na
\fBmetaslab_debug\fR (int)
.ad
.RS 12n
Keep space maps in core to verify frees
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBspa_config_path\fR (charp)
.ad
.RS 12n
SPA config file
.sp
Default value: \fB/etc/zfs/zpool.cache\fR.
.RE

.sp
.ne 2
.na
\fBspa_asize_inflation\fR (int)
.ad
.RS 12n
Multiplication factor used to estimate actual disk consumption from the
size of data being written. The default value is a worst case estimate,
but lower values may be valid for a given pool depending on its
configuration. Pool administrators who understand the factors involved
may wish to specify a more realistic inflation factor, particularly if
they operate close to quota or capacity limits.
.sp
Default value: \fB24\fR.
.RE
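.sp
With the default factor, for example, every byte written is assumed to
consume up to 24 bytes on disk when space consumption is estimated. A
minimal sketch of that estimate (Python, purely illustrative; the helper
name is not part of the module interface):
.sp
.nf
# Worst-case on-disk consumption estimated from the logical write size.
def estimated_write_size(logical_bytes, spa_asize_inflation=24):
    return logical_bytes * spa_asize_inflation

print(estimated_write_size(1 << 20))     # 25165824: 1 MiB charged as 24 MiB
print(estimated_write_size(1 << 20, 6))  # a lower, pool-specific estimate
.fi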

.sp
.ne 2
.na
\fBzfetch_array_rd_sz\fR (ulong)
.ad
.RS 12n
Number of bytes in an array_read
.sp
Default value: \fB1,048,576\fR.
.RE

.sp
.ne 2
.na
\fBzfetch_block_cap\fR (uint)
.ad
.RS 12n
Max number of blocks to fetch at a time
.sp
Default value: \fB256\fR.
.RE

.sp
.ne 2
.na
\fBzfetch_max_streams\fR (uint)
.ad
.RS 12n
Max number of streams per zfetch
.sp
Default value: \fB8\fR.
.RE

.sp
.ne 2
.na
\fBzfetch_min_sec_reap\fR (uint)
.ad
.RS 12n
Min time before stream reclaim
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_grow_retry\fR (int)
.ad
.RS 12n
Seconds before growing arc size
.sp
Default value: \fB5\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_max\fR (ulong)
.ad
.RS 12n
Max arc size
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_memory_throttle_disable\fR (int)
.ad
.RS 12n
Disable memory throttle
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBzfs_arc_meta_limit\fR (ulong)
.ad
.RS 12n
Meta limit for arc size
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_meta_prune\fR (int)
.ad
.RS 12n
Bytes of meta data to prune
.sp
Default value: \fB1,048,576\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_min\fR (ulong)
.ad
.RS 12n
Min arc size
.sp
Default value: \fB100\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_min_prefetch_lifespan\fR (int)
.ad
.RS 12n
Min life of prefetch block
.sp
Default value: \fB100\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_p_min_shift\fR (int)
.ad
.RS 12n
arc_c shift to calc min/max arc_p
.sp
Default value: \fB4\fR.
.RE

.sp
.ne 2
.na
\fBzfs_arc_shrink_shift\fR (int)
.ad
.RS 12n
log2(fraction of arc to reclaim)
.sp
Default value: \fB5\fR.
.RE

.sp
.ne 2
.na
\fBzfs_autoimport_disable\fR (int)
.ad
.RS 12n
Disable pool import at module load
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_dbuf_state_index\fR (int)
.ad
.RS 12n
Calculate arc header index
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_deadman_enabled\fR (int)
.ad
.RS 12n
Enable deadman timer
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBzfs_deadman_synctime_ms\fR (ulong)
.ad
.RS 12n
Expiration time in milliseconds. This value has two meanings. First it is
used to determine when the spa_deadman() logic should fire. By default the
spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
Secondly, the value determines if an I/O is considered "hung". Any I/O that
has not completed in zfs_deadman_synctime_ms is considered "hung" resulting
in a zevent being logged.
.sp
Default value: \fB1,000,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dedup_prefetch\fR (int)
.ad
.RS 12n
Enable prefetching dedup-ed blks
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBzfs_delay_min_dirty_percent\fR (int)
.ad
.RS 12n
Start to delay each transaction once there is this amount of dirty data,
expressed as a percentage of \fBzfs_dirty_data_max\fR.
This value should be >= \fBzfs_vdev_async_write_active_max_dirty_percent\fR.
See the section "ZFS TRANSACTION DELAY".
.sp
Default value: \fB60\fR.
.RE

.sp
.ne 2
.na
\fBzfs_delay_scale\fR (int)
.ad
.RS 12n
This controls how quickly the transaction delay approaches infinity.
Larger values cause longer delays for a given amount of dirty data.
.sp
For the smoothest delay, this value should be about 1 billion divided
by the maximum number of operations per second. This will smoothly
handle between 10x and 1/10th this number.
.sp
See the section "ZFS TRANSACTION DELAY".
.sp
Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
.sp
Default value: \fB500,000\fR.
.RE
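.sp
As a worked example of the guidance above (a sketch in Python, purely
illustrative; the helper name is not part of the module interface), a
backend that can sustain roughly 2,000 operations per second suggests the
default value:
.sp
.nf
# Suggested zfs_delay_scale: about one billion divided by the maximum
# number of operations per second the backend storage can sustain.
def suggested_delay_scale(max_ops_per_second):
    return 1_000_000_000 // max_ops_per_second

print(suggested_delay_scale(2000))    # 500000, the default
print(suggested_delay_scale(20000))   # 50000, for a much faster backend
.fi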

.sp
.ne 2
.na
\fBzfs_dirty_data_max\fR (int)
.ad
.RS 12n
Determines the dirty space limit in bytes. Once this limit is exceeded, new
writes are halted until space frees up. This parameter takes precedence
over \fBzfs_dirty_data_max_percent\fR.
See the section "ZFS TRANSACTION DELAY".
.sp
Default value: 10 percent of all memory, capped at \fBzfs_dirty_data_max_max\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dirty_data_max_max\fR (int)
.ad
.RS 12n
Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
This limit is only enforced at module load time, and will be ignored if
\fBzfs_dirty_data_max\fR is later changed. This parameter takes
precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
"ZFS TRANSACTION DELAY".
.sp
Default value: 25% of physical RAM.
.RE

.sp
.ne 2
.na
\fBzfs_dirty_data_max_max_percent\fR (int)
.ad
.RS 12n
Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
percentage of physical RAM. This limit is only enforced at module load
time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
one. See the section "ZFS TRANSACTION DELAY".
.sp
Default value: \fB25\fR.
.RE

.sp
.ne 2
.na
\fBzfs_dirty_data_max_percent\fR (int)
.ad
.RS 12n
Determines the dirty space limit, expressed as a percentage of all
memory. Once this limit is exceeded, new writes are halted until space frees
up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
one. See the section "ZFS TRANSACTION DELAY".
.sp
Default value: 10%, subject to \fBzfs_dirty_data_max_max\fR.
.RE
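.sp
The interaction between these four parameters can be summarized as follows
(a sketch in Python, purely illustrative; the helper name is not part of
the module interface):
.sp
.nf
# Effective dirty data limit chosen at module load time.
def effective_dirty_data_max(phys_ram,
                             dirty_data_max=None,         # bytes, if set
                             dirty_data_max_percent=10,
                             dirty_data_max_max=None,     # bytes, if set
                             dirty_data_max_max_percent=25):
    # The byte-valued cap takes precedence over its percentage form.
    if dirty_data_max_max is None:
        dirty_data_max_max = phys_ram * dirty_data_max_max_percent // 100
    # The byte-valued limit takes precedence over its percentage form.
    if dirty_data_max is None:
        dirty_data_max = phys_ram * dirty_data_max_percent // 100
    # The cap is applied only at load time.
    return min(dirty_data_max, dirty_data_max_max)

print(effective_dirty_data_max(16 * 2**30))   # 1717986918, 10% of 16 GiB
.fi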

.sp
.ne 2
.na
\fBzfs_dirty_data_sync\fR (int)
.ad
.RS 12n
Start syncing out a transaction group if there is at least this much dirty data.
.sp
Default value: \fB67,108,864\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_read_max_active\fR (int)
.ad
.RS 12n
Maximum asynchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB3\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_read_min_active\fR (int)
.ad
.RS 12n
Minimum asynchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
.ad
.RS 12n
When the pool has more than
\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
the dirty data is between min and max, the active I/O limit is linearly
interpolated. See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB60\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
.ad
.RS 12n
When the pool has less than
\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
the dirty data is between min and max, the active I/O limit is linearly
interpolated. See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB30\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_write_max_active\fR (int)
.ad
.RS 12n
Maximum asynchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_async_write_min_active\fR (int)
.ad
.RS 12n
Minimum asynchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_max_active\fR (int)
.ad
.RS 12n
The maximum number of I/Os active to each device. Ideally, this will be >=
the sum of each queue's max_active. It must be at least the sum of each
queue's min_active. See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_scrub_max_active\fR (int)
.ad
.RS 12n
Maximum scrub I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_scrub_min_active\fR (int)
.ad
.RS 12n
Minimum scrub I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_sync_read_max_active\fR (int)
.ad
.RS 12n
Maximum synchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_sync_read_min_active\fR (int)
.ad
.RS 12n
Minimum synchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_sync_write_max_active\fR (int)
.ad
.RS 12n
Maximum synchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_sync_write_min_active\fR (int)
.ad
.RS 12n
Minimum synchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB10\fR.
.RE

.sp
.ne 2
.na
\fBzfs_disable_dup_eviction\fR (int)
.ad
.RS 12n
Disable duplicate buffer eviction
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_expire_snapshot\fR (int)
.ad
.RS 12n
Seconds to expire .zfs/snapshot
.sp
Default value: \fB300\fR.
.RE

.sp
.ne 2
.na
\fBzfs_flags\fR (int)
.ad
.RS 12n
Set additional debugging flags
.sp
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_free_min_time_ms\fR (int)
.ad
.RS 12n
Min millisecs to free per txg
.sp
Default value: \fB1,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_immediate_write_sz\fR (long)
.ad
.RS 12n
Largest data block to write to zil
.sp
Default value: \fB32,768\fR.
.RE

.sp
.ne 2
.na
\fBzfs_mdcomp_disable\fR (int)
.ad
.RS 12n
Disable meta data compression
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_no_scrub_io\fR (int)
.ad
.RS 12n
Set for no scrub I/O
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_no_scrub_prefetch\fR (int)
.ad
.RS 12n
Set for no scrub prefetching
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_nocacheflush\fR (int)
.ad
.RS 12n
Disable cache flushes
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_nopwrite_enabled\fR (int)
.ad
.RS 12n
Enable NOP writes
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE

.sp
.ne 2
.na
\fBzfs_pd_blks_max\fR (int)
.ad
.RS 12n
Max number of blocks to prefetch
.sp
Default value: \fB100\fR.
.RE

.sp
.ne 2
.na
\fBzfs_prefetch_disable\fR (int)
.ad
.RS 12n
Disable all ZFS prefetching
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_read_chunk_size\fR (long)
.ad
.RS 12n
Bytes to read per chunk
.sp
Default value: \fB1,048,576\fR.
.RE

.sp
.ne 2
.na
\fBzfs_read_history\fR (int)
.ad
.RS 12n
Historic statistics for the last N reads
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_read_history_hits\fR (int)
.ad
.RS 12n
Include cache hits in read history
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_recover\fR (int)
.ad
.RS 12n
Set to attempt to recover from fatal errors. This should only be used as a
last resort, as it typically results in leaked space, or worse.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_resilver_delay\fR (int)
.ad
.RS 12n
Number of ticks to delay resilver
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_resilver_min_time_ms\fR (int)
.ad
.RS 12n
Min millisecs to resilver per txg
.sp
Default value: \fB3,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_scan_idle\fR (int)
.ad
.RS 12n
Idle window in clock ticks
.sp
Default value: \fB50\fR.
.RE

.sp
.ne 2
.na
\fBzfs_scan_min_time_ms\fR (int)
.ad
.RS 12n
Min millisecs to scrub per txg
.sp
Default value: \fB1,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_scrub_delay\fR (int)
.ad
.RS 12n
Number of ticks to delay scrub
.sp
Default value: \fB4\fR.
.RE

.sp
.ne 2
.na
\fBzfs_sync_pass_deferred_free\fR (int)
.ad
.RS 12n
Defer frees starting in this pass
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_sync_pass_dont_compress\fR (int)
.ad
.RS 12n
Don't compress starting in this pass
.sp
Default value: \fB5\fR.
.RE

.sp
.ne 2
.na
\fBzfs_sync_pass_rewrite\fR (int)
.ad
.RS 12n
Rewrite new bps starting in this pass
.sp
Default value: \fB2\fR.
.RE

.sp
.ne 2
.na
\fBzfs_top_maxinflight\fR (int)
.ad
.RS 12n
Max I/Os per top-level
.sp
Default value: \fB32\fR.
.RE

.sp
.ne 2
.na
\fBzfs_txg_history\fR (int)
.ad
.RS 12n
Historic statistics for the last N txgs
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_txg_timeout\fR (int)
.ad
.RS 12n
Max seconds worth of delta per txg
.sp
Default value: \fB5\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_aggregation_limit\fR (int)
.ad
.RS 12n
Max vdev I/O aggregation size
.sp
Default value: \fB131,072\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_cache_bshift\fR (int)
.ad
.RS 12n
Shift size to inflate reads to
.sp
Default value: \fB16\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_cache_max\fR (int)
.ad
.RS 12n
Inflate reads smaller than max
.RE

.sp
.ne 2
.na
\fBzfs_vdev_cache_size\fR (int)
.ad
.RS 12n
Total size of the per-disk cache
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_switch_us\fR (int)
.ad
.RS 12n
Switch mirrors every N usecs
.sp
Default value: \fB10,000\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_read_gap_limit\fR (int)
.ad
.RS 12n
Aggregate read I/O over gap
.sp
Default value: \fB32,768\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_scheduler\fR (charp)
.ad
.RS 12n
I/O scheduler
.sp
Default value: \fBnoop\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_write_gap_limit\fR (int)
.ad
.RS 12n
Aggregate write I/O over gap
.sp
Default value: \fB4,096\fR.
.RE

.sp
.ne 2
.na
\fBzfs_zevent_cols\fR (int)
.ad
.RS 12n
Max event column width
.sp
Default value: \fB80\fR.
.RE

.sp
.ne 2
.na
\fBzfs_zevent_console\fR (int)
.ad
.RS 12n
Log events to the console
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzfs_zevent_len_max\fR (int)
.ad
.RS 12n
Max event queue length
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzil_replay_disable\fR (int)
.ad
.RS 12n
Disable intent logging replay
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzil_slog_limit\fR (ulong)
.ad
.RS 12n
Max commit bytes to separate log device
.sp
Default value: \fB1,048,576\fR.
.RE

.sp
.ne 2
.na
\fBzio_bulk_flags\fR (int)
.ad
.RS 12n
Additional flags to pass to bulk buffers
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzio_delay_max\fR (int)
.ad
.RS 12n
Max zio millisec delay before posting event
.sp
Default value: \fB30,000\fR.
.RE

.sp
.ne 2
.na
\fBzio_injection_enabled\fR (int)
.ad
.RS 12n
Enable fault injection
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzio_requeue_io_start_cut_in_line\fR (int)
.ad
.RS 12n
Prioritize requeued I/O
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzvol_inhibit_dev\fR (uint)
.ad
.RS 12n
Do not create zvol device nodes
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE

.sp
.ne 2
.na
\fBzvol_major\fR (uint)
.ad
.RS 12n
Major number for zvol device
.sp
Default value: \fB230\fR.
.RE

.sp
.ne 2
.na
\fBzvol_max_discard_blocks\fR (ulong)
.ad
.RS 12n
Max number of blocks to discard at once
.sp
Default value: \fB16,384\fR.
.RE

.sp
.ne 2
.na
\fBzvol_threads\fR (uint)
.ad
.RS 12n
Number of threads for zvol device
.sp
Default value: \fB32\fR.
.RE

.SH ZFS I/O SCHEDULER
ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
The I/O scheduler determines when and in what order those operations are
issued. The I/O scheduler divides operations into five I/O classes
prioritized in the following order: sync read, sync write, async read,
async write, and scrub/resilver. Each queue defines the minimum and
maximum number of concurrent operations that may be issued to the
device. In addition, the device has an aggregate maximum,
\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
must not exceed the aggregate maximum. If the sum of the per-queue
maximums exceeds the aggregate maximum, then the number of active I/Os
may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
be issued regardless of whether all per-queue minimums have been met.
.sp
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers. Further, physical
devices typically have a limit at which more concurrent operations have no
effect on throughput or can actually cause it to decrease.
.sp
The scheduler selects the next operation to issue by first looking for an
I/O class whose minimum has not been satisfied. Once all are satisfied and
the aggregate maximum has not been hit, the scheduler looks for classes
whose maximum has not been satisfied. Iteration through the I/O classes is
done in the order specified above. No further operations are issued if the
aggregate maximum number of concurrent operations has been hit or if there
are no operations queued for an I/O class that has not hit its maximum.
Every time an I/O is queued or an operation completes, the I/O scheduler
looks for new operations to issue.
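.sp
That selection order can be sketched as follows (Python, purely
illustrative; the names are not taken from the ZFS sources):
.sp
.nf
# I/O classes in priority order, as listed above.
CLASSES = ["sync_read", "sync_write", "async_read", "async_write", "scrub"]

def next_class_to_issue(active, queued, limits, aggregate_max):
    # active/queued: per-class counts; limits: per-class (min, max) pairs.
    if sum(active.values()) >= aggregate_max:
        return None                    # aggregate maximum reached
    # First, look for a class whose minimum has not been satisfied.
    for c in CLASSES:
        if queued[c] and active[c] < limits[c][0]:
            return c
    # Then, look for a class whose maximum has not been satisfied.
    for c in CLASSES:
        if queued[c] and active[c] < limits[c][1]:
            return c
    return None                        # nothing eligible to issue
.fi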
.sp
In general, smaller max_active's will lead to lower latency of synchronous
operations. Larger max_active's may lead to higher overall throughput,
depending on underlying storage.
.sp
The ratio of the queues' max_actives determines the balance of performance
between reads, writes, and scrubs. E.g., increasing
\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
more quickly, but reads and writes to have higher latency and lower throughput.
.sp
All I/O classes have a fixed maximum number of outstanding operations
except for the async write class. Asynchronous writes represent the data
that is committed to stable storage during the syncing stage for
transaction groups. Transaction groups enter the syncing state
periodically so the number of queued async writes will quickly burst up
and then bleed down to zero. Rather than servicing them as quickly as
possible, the I/O scheduler changes the maximum number of active async
write I/Os according to the amount of dirty data in the pool. Since
both throughput and latency typically increase with the number of
concurrent operations issued to physical devices, reducing the
burstiness in the number of concurrent operations also stabilizes the
response time of operations from other -- and in particular synchronous
-- queues. In broad strokes, the I/O scheduler will issue more
concurrent operations from the async write queue as there's more dirty
data in the pool.
.sp
Async Writes
.sp
The number of concurrent operations issued for the async write I/O class
follows a piece-wise linear function defined by a few adjustable points.
.nf

       |              o---------| <-- zfs_vdev_async_write_max_active
  ^    |             /^         |
  |    |            / |         |
active |           /  |         |
 I/O   |          /   |         |
count  |         /    |         |
       |        /     |         |
       |-------o      |         | <-- zfs_vdev_async_write_min_active
      0|_______^______|_________|
       0%      |      |           100% of zfs_dirty_data_max
               |      |
               |      `-- zfs_vdev_async_write_active_max_dirty_percent
               `--------- zfs_vdev_async_write_active_min_dirty_percent

.fi
Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum. As that threshold is crossed, the
number of concurrent operations issued increases linearly to the maximum at
the specified maximum percentage of the dirty data allowed in the pool.
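.sp
Written out, the piece-wise linear function above looks like this (a
sketch in Python using the documented default values; the function name is
not taken from the ZFS sources):
.sp
.nf
def async_write_active_limit(dirty_percent,
                             min_active=1, max_active=10,
                             min_dirty_percent=30, max_dirty_percent=60):
    # Below the minimum threshold, issue only min_active operations.
    if dirty_percent <= min_dirty_percent:
        return min_active
    # Above the maximum threshold, issue up to max_active operations.
    if dirty_percent >= max_dirty_percent:
        return max_active
    # In between, interpolate linearly between the two points.
    span = max_dirty_percent - min_dirty_percent
    frac = (dirty_percent - min_dirty_percent) / span
    return round(min_active + frac * (max_active - min_active))

print(async_write_active_limit(50))   # 7, two thirds of the way up the slope
.fi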
.sp
Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
maximum percentage, this indicates that the rate of incoming data is
greater than the rate that the backend storage can handle. In this case, we
must further throttle incoming writes, as described in the next section.

.SH ZFS TRANSACTION DELAY
We delay transactions when we've determined that the backend storage
isn't able to accommodate the rate of incoming writes.
.sp
If there is already a transaction waiting, we delay relative to when
that transaction will finish waiting. This way the calculated delay time
is independent of the number of threads concurrently executing
transactions.
.sp
If we are the only waiter, wait relative to when the transaction
started, rather than the current time. This credits the transaction for
"time already served", e.g. reading indirect blocks.
.sp
The minimum time for a transaction to take is calculated as:
.nf
    min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
    min_time is then capped at 100 milliseconds.
.fi
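.sp
The following worked example (a sketch in Python, purely illustrative; the
names are not taken from the ZFS sources) shows that with the default
\fBzfs_delay_scale\fR of 500,000, the delay at the midpoint between
\fBzfs_delay_min_dirty_percent\fR and 100% of \fBzfs_dirty_data_max\fR
works out to about 500us, matching the curves below:
.sp
.nf
# min_time = zfs_delay_scale * (dirty - min) / (max - dirty), capped at
# 100 milliseconds; with the default scale the result is in nanoseconds.
def tx_delay(dirty, dirty_data_max,
             delay_scale=500_000, delay_min_dirty_percent=60):
    dirty_min = dirty_data_max * delay_min_dirty_percent // 100
    if dirty <= dirty_min:
        return 0                            # below the delay threshold
    min_time = delay_scale * (dirty - dirty_min) / (dirty_data_max - dirty)
    return min(min_time, 100_000_000)       # cap at 100 milliseconds

ddm = 4 * 2**30                             # e.g. zfs_dirty_data_max of 4 GiB
midpoint = (ddm * 60 // 100 + ddm) // 2     # halfway between min and max
print(tx_delay(midpoint, ddm))              # ~500000, i.e. about 500us
.fi
.sp
Since the delay is applied per operation, that 500us midpoint corresponds
to roughly 2,000 operations per second, which is also where the default
\fBzfs_delay_scale\fR of 1,000,000,000 / 2,000 comes from.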
.sp
The delay has two degrees of freedom that can be adjusted via tunables. The
percentage of dirty data at which we start to delay is defined by
\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
delay after writing at full speed has failed to keep up with the incoming write
rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
this variable determines the amount of delay at the midpoint of the curve.
.sp
.nf
delay
 10ms +-------------------------------------------------------------*+
      |                                                             *|
  9ms +                                                             *+
      |                                                             *|
  8ms +                                                             *+
      |                                                            * |
  7ms +                                                            * +
      |                                                            * |
  6ms +                                                            * +
      |                                                            * |
  5ms +                                                            * +
      |                                                           *  |
  4ms +                                                           *  +
      |                                                           *  |
  3ms +                                                          *   +
      |                                                          *   |
  2ms +                                              (midpoint)  *   +
      |                                                    |    **   |
  1ms +                                                    v ***     +
      |             zfs_delay_scale ---------->     ********         |
    0 +-------------------------------------*********----------------+
      0%                    <- zfs_dirty_data_max ->               100%
.fi
.sp
Note that since the delay is added to the outstanding time remaining on the
most recent transaction, the delay is effectively the inverse of IOPS.
Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
was chosen such that small changes in the amount of accumulated dirty data
in the first 3/4 of the curve yield relatively small differences in the
amount of delay.
.sp
The effects can be easier to understand when the amount of delay is
represented on a log scale:
.sp
.nf
delay
100ms +-------------------------------------------------------------++
      +                                                              +
      |                                                              |
      +                                                             *+
 10ms +                                                             *+
      +                                                            ** +
      |                                (midpoint)                  ** |
      +                                    |                      **  +
  1ms +                                    v                   ****   +
      +             zfs_delay_scale ---------->              *****    +
      |                                                   ****        |
      +                                                ****           +
100us +                                              **               +
      +                                             *                 +
      |                                            *                  |
      +                                            *                  +
 10us +                                            *                  +
      +                                                               +
      |                                                               |
      +                                                               +
      +--------------------------------------------------------------+
      0%                    <- zfs_dirty_data_max ->                100%
.fi
.sp
Note here that only as the amount of dirty data approaches its limit does
the delay start to increase rapidly. The goal of a properly tuned system
should be to keep the amount of dirty data out of that range by first
ensuring that the appropriate limits are set for the I/O scheduler to reach
optimal throughput on the backend storage, and then by changing the value
of \fBzfs_delay_scale\fR to increase the steepness of the curve.