1'\" te
2.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
3.\" The contents of this file are subject to the terms of the Common Development
4.\" and Distribution License (the "License"). You may not use this file except
5.\" in compliance with the License. You can obtain a copy of the license at
6.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
7.\"
8.\" See the License for the specific language governing permissions and
9.\" limitations under the License. When distributing Covered Code, include this
10.\" CDDL HEADER in each file and include the License file at
11.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
12.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
13.\" own identifying information:
14.\" Portions Copyright [yyyy] [name of copyright owner]
15.TH ZFS-MODULE-PARAMETERS 5 "Nov 16, 2013"
16.SH NAME
17zfs\-module\-parameters \- ZFS module parameters
18.SH DESCRIPTION
19.sp
20.LP
21Description of the different parameters to the ZFS module.
22
23.SS "Module parameters"
24.sp
25.LP
26
27.sp
28.ne 2
29.na
30\fBl2arc_feed_again\fR (int)
31.ad
32.RS 12n
33Turbo L2ARC warmup
34.sp
35Use \fB1\fR for yes (default) and \fB0\fR to disable.
36.RE
37
38.sp
39.ne 2
40.na
41\fBl2arc_feed_min_ms\fR (ulong)
42.ad
43.RS 12n
44Min feed interval in milliseconds
45.sp
46Default value: \fB200\fR.
47.RE
48
49.sp
50.ne 2
51.na
52\fBl2arc_feed_secs\fR (ulong)
53.ad
54.RS 12n
55Seconds between L2ARC writing
56.sp
57Default value: \fB1\fR.
58.RE
59
60.sp
61.ne 2
62.na
63\fBl2arc_headroom\fR (ulong)
64.ad
65.RS 12n
66Number of max device writes to precache
67.sp
68Default value: \fB2\fR.
69.RE
70
71.sp
72.ne 2
73.na
74\fBl2arc_headroom_boost\fR (ulong)
75.ad
76.RS 12n
77Compressed l2arc_headroom multiplier
78.sp
79Default value: \fB200\fR.
80.RE
81
82.sp
83.ne 2
84.na
85\fBl2arc_nocompress\fR (int)
86.ad
87.RS 12n
88Skip compressing L2ARC buffers
89.sp
90Use \fB1\fR for yes and \fB0\fR for no (default).
91.RE
92
93.sp
94.ne 2
95.na
96\fBl2arc_noprefetch\fR (int)
97.ad
98.RS 12n
99Skip caching prefetched buffers
100.sp
101Use \fB1\fR for yes (default) and \fB0\fR to disable.
102.RE
103
104.sp
105.ne 2
106.na
107\fBl2arc_norw\fR (int)
108.ad
109.RS 12n
110No reads during writes
111.sp
112Use \fB1\fR for yes and \fB0\fR for no (default).
113.RE
114
115.sp
116.ne 2
117.na
118\fBl2arc_write_boost\fR (ulong)
119.ad
120.RS 12n
121Extra write bytes during device warmup
122.sp
123Default value: \fB8,388,608\fR.
124.RE
125
126.sp
127.ne 2
128.na
129\fBl2arc_write_max\fR (ulong)
130.ad
131.RS 12n
132Max write bytes per interval
133.sp
134Default value: \fB8,388,608\fR.
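.sp
As an illustration only (not code from the ZFS sources), the sketch below
shows how the L2ARC feed parameters combine, assuming that
\fBl2arc_write_boost\fR is simply added on top of \fBl2arc_write_max\fR
while the cache device is still warming up:
.sp
.nf
#include <stdint.h>

/* Hypothetical helper: bytes the L2ARC feed thread may write in one
 * interval of l2arc_feed_secs seconds. With the defaults this is
 * 8 MiB per second once warm, or 16 MiB per second during warmup. */
static uint64_t
l2arc_bytes_per_interval(uint64_t write_max, uint64_t write_boost,
    int warming_up)
{
        return (warming_up ? write_max + write_boost : write_max);
}
.fi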
135.RE
136
137.sp
138.ne 2
139.na
140\fBmetaslab_debug\fR (int)
141.ad
142.RS 12n
143Keep space maps in core to verify frees
144.sp
145Use \fB1\fR for yes and \fB0\fR for no (default).
146.RE
147
148.sp
149.ne 2
150.na
151\fBspa_config_path\fR (charp)
152.ad
153.RS 12n
154SPA config file
155.sp
156Default value: \fB/etc/zfs/zpool.cache\fR.
157.RE
158
159.sp
160.ne 2
161.na
162\fBspa_asize_inflation\fR (int)
163.ad
164.RS 12n
165Multiplication factor used to estimate actual disk consumption from the
166size of data being written. The default value is a worst case estimate,
167but lower values may be valid for a given pool depending on its
168configuration. Pool administrators who understand the factors involved
169may wish to specify a more realistic inflation factor, particularly if
170they operate close to quota or capacity limits.
171.sp
172Default value: \fB24\fR.
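.sp
As a purely illustrative sketch (not code from the ZFS sources), the
worst-case estimate implied by this factor is a simple multiplication:
.sp
.nf
#include <stdint.h>

/* Hypothetical helper: worst-case bytes that writing "size" logical
 * bytes may consume on disk, given the inflation factor above
 * (24 by default). */
static uint64_t
worst_case_asize(uint64_t size, uint64_t asize_inflation)
{
        return (size * asize_inflation);
}
.fi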
173.RE
174
175.sp
176.ne 2
177.na
178\fBzfetch_array_rd_sz\fR (ulong)
179.ad
180.RS 12n
181Number of bytes in an array_read
182.sp
183Default value: \fB1,048,576\fR.
184.RE
185
186.sp
187.ne 2
188.na
189\fBzfetch_block_cap\fR (uint)
190.ad
191.RS 12n
192Max number of blocks to fetch at a time
193.sp
194Default value: \fB256\fR.
195.RE
196
197.sp
198.ne 2
199.na
200\fBzfetch_max_streams\fR (uint)
201.ad
202.RS 12n
203Max number of streams per zfetch
204.sp
205Default value: \fB8\fR.
206.RE
207
208.sp
209.ne 2
210.na
211\fBzfetch_min_sec_reap\fR (uint)
212.ad
213.RS 12n
214Min time before stream reclaim
215.sp
216Default value: \fB2\fR.
217.RE
218
219.sp
220.ne 2
221.na
222\fBzfs_arc_grow_retry\fR (int)
223.ad
224.RS 12n
225Seconds before growing arc size
226.sp
227Default value: \fB5\fR.
228.RE
229
230.sp
231.ne 2
232.na
233\fBzfs_arc_max\fR (ulong)
234.ad
235.RS 12n
236Max arc size
237.sp
238Default value: \fB0\fR.
239.RE
240
241.sp
242.ne 2
243.na
244\fBzfs_arc_memory_throttle_disable\fR (int)
245.ad
246.RS 12n
247Disable memory throttle
248.sp
249Use \fB1\fR for yes (default) and \fB0\fR to disable.
250.RE
251
252.sp
253.ne 2
254.na
255\fBzfs_arc_meta_limit\fR (ulong)
256.ad
257.RS 12n
258Meta limit for arc size
259.sp
260Default value: \fB0\fR.
261.RE
262
263.sp
264.ne 2
265.na
266\fBzfs_arc_meta_prune\fR (int)
267.ad
268.RS 12n
269Bytes of meta data to prune
270.sp
271Default value: \fB1,048,576\fR.
272.RE
273
274.sp
275.ne 2
276.na
277\fBzfs_arc_min\fR (ulong)
278.ad
279.RS 12n
280Min arc size
281.sp
282Default value: \fB100\fR.
283.RE
284
285.sp
286.ne 2
287.na
288\fBzfs_arc_min_prefetch_lifespan\fR (int)
289.ad
290.RS 12n
291Min life of prefetch block
292.sp
293Default value: \fB100\fR.
294.RE
295
296.sp
297.ne 2
298.na
299\fBzfs_arc_p_min_shift\fR (int)
300.ad
301.RS 12n
302arc_c shift to calc min/max arc_p
303.sp
304Default value: \fB4\fR.
305.RE
306
307.sp
308.ne 2
309.na
310\fBzfs_arc_shrink_shift\fR (int)
311.ad
312.RS 12n
313log2(fraction of arc to reclaim)
314.sp
315Default value: \fB5\fR.
316.RE
317
318.sp
319.ne 2
320.na
321\fBzfs_autoimport_disable\fR (int)
322.ad
323.RS 12n
324Disable pool import at module load
325.sp
326Use \fB1\fR for yes and \fB0\fR for no (default).
327.RE
328
329.sp
330.ne 2
331.na
332\fBzfs_dbuf_state_index\fR (int)
333.ad
334.RS 12n
335Calculate arc header index
336.sp
337Default value: \fB0\fR.
338.RE
339
340.sp
341.ne 2
342.na
343\fBzfs_deadman_enabled\fR (int)
344.ad
345.RS 12n
346Enable deadman timer
347.sp
348Use \fB1\fR for yes (default) and \fB0\fR to disable.
349.RE
350
351.sp
352.ne 2
353.na
354\fBzfs_deadman_synctime_ms\fR (ulong)
355.ad
356.RS 12n
357Expiration time in milliseconds. This value has two meanings. First, it is
358used to determine when the spa_deadman() logic should fire. By default,
359spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
360Second, the value determines whether an I/O is considered "hung". Any I/O that
361has not completed within \fBzfs_deadman_synctime_ms\fR is considered "hung",
362resulting in a zevent being logged.
363.sp
364Default value: \fB1,000,000\fR.
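.sp
A minimal sketch of the "hung" I/O check described above, assuming
nanosecond timestamps and a hypothetical helper (this is not the actual
ZFS implementation):
.sp
.nf
#include <stdint.h>

#define MSEC2NSEC(ms)   ((uint64_t)(ms) * 1000000ULL)

/* Hypothetical check: an I/O outstanding for longer than
 * zfs_deadman_synctime_ms is considered hung and a zevent should be
 * logged for it. */
static int
io_is_hung(uint64_t io_start_ns, uint64_t now_ns, uint64_t deadman_ms)
{
        return (now_ns - io_start_ns > MSEC2NSEC(deadman_ms));
}
.fi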
365.RE
366
367.sp
368.ne 2
369.na
370\fBzfs_dedup_prefetch\fR (int)
371.ad
372.RS 12n
373Enable prefetching dedup-ed blks
374.sp
375Use \fB1\fR for yes (default) and \fB0\fR to disable.
376.RE
377
378.sp
379.ne 2
380.na
381\fBzfs_delay_min_dirty_percent\fR (int)
382.ad
383.RS 12n
384Start to delay each transaction once there is this amount of dirty data,
385expressed as a percentage of \fBzfs_dirty_data_max\fR.
386This value should be >= \fBzfs_vdev_async_write_active_max_dirty_percent\fR.
387See the section "ZFS TRANSACTION DELAY".
388.sp
389Default value: \fB60\fR.
390.RE
391
392.sp
393.ne 2
394.na
395\fBzfs_delay_scale\fR (int)
396.ad
397.RS 12n
398This controls how quickly the transaction delay approaches infinity.
399Larger values cause longer delays for a given amount of dirty data.
400.sp
401For the smoothest delay, this value should be about 1 billion divided
402by the maximum number of operations per second. This will smoothly
403handle between 10x and 1/10th this number.
404.sp
405See the section "ZFS TRANSACTION DELAY".
406.sp
407Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
408.sp
409Default value: \fB500,000\fR.
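.sp
The sizing guidance above can be written out as a small sketch
(illustrative only; the constant is simply one billion):
.sp
.nf
#include <stdint.h>

/* Hypothetical helper: a suggested zfs_delay_scale for a pool that can
 * sustain max_iops operations per second. For 2,000 IOPS this yields
 * 500,000, the default value. */
static uint64_t
suggested_delay_scale(uint64_t max_iops)
{
        return (1000000000ULL / max_iops);
}
.fi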
410.RE
411
412.sp
413.ne 2
414.na
415\fBzfs_dirty_data_max\fR (int)
416.ad
417.RS 12n
418Determines the dirty space limit in bytes. Once this limit is exceeded, new
419writes are halted until space frees up. This parameter takes precedence
420over \fBzfs_dirty_data_max_percent\fR.
421See the section "ZFS TRANSACTION DELAY".
422.sp
423Default value: 10 percent of all memory, capped at \fBzfs_dirty_data_max_max\fR.
424.RE
425
426.sp
427.ne 2
428.na
429\fBzfs_dirty_data_max_max\fR (int)
430.ad
431.RS 12n
432Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
433This limit is only enforced at module load time, and will be ignored if
434\fBzfs_dirty_data_max\fR is later changed. This parameter takes
435precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
436"ZFS TRANSACTION DELAY".
437.sp
438Default value: 25% of physical RAM.
439.RE
440
441.sp
442.ne 2
443.na
444\fBzfs_dirty_data_max_max_percent\fR (int)
445.ad
446.RS 12n
447Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
448percentage of physical RAM. This limit is only enforced at module load
449time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
450The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
451one. See the section "ZFS TRANSACTION DELAY".
452.sp
453Default value: \fB25\fR.
454.RE
455
456.sp
457.ne 2
458.na
459\fBzfs_dirty_data_max_percent\fR (int)
460.ad
461.RS 12n
462Determines the dirty space limit, expressed as a percentage of all
463memory. Once this limit is exceeded, new writes are halted until space frees
464up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
465one. See the section "ZFS TRANSACTION DELAY".
466.sp
467Default value: 10%, subject to \fBzfs_dirty_data_max_max\fR.
468.RE
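.sp
The interaction of the four dirty data limits above can be sketched as
follows. This only illustrates the documented precedence rules and is not
the actual module initialization code; \fIphysmem\fR is assumed to be the
total physical memory in bytes, and a value of 0 stands for "not set by
the administrator".
.sp
.nf
#include <stdint.h>

static uint64_t
effective_dirty_data_max(uint64_t physmem,
    uint64_t dirty_data_max,               /* bytes, 0 = unset */
    uint64_t dirty_data_max_percent,       /* default 10 */
    uint64_t dirty_data_max_max,           /* bytes, 0 = unset */
    uint64_t dirty_data_max_max_percent)   /* default 25 */
{
        /* zfs_dirty_data_max_max wins over its _percent form. */
        uint64_t cap = dirty_data_max_max != 0 ? dirty_data_max_max :
            physmem * dirty_data_max_max_percent / 100;

        /* zfs_dirty_data_max wins over zfs_dirty_data_max_percent. */
        uint64_t limit = dirty_data_max != 0 ? dirty_data_max :
            physmem * dirty_data_max_percent / 100;

        /* The cap is only applied at module load time. */
        return (limit < cap ? limit : cap);
}
.fi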
469
470.sp
471.ne 2
472.na
473\fBzfs_dirty_data_sync\fR (int)
474.ad
475.RS 12n
476Start syncing out a transaction group if there is at least this much dirty data.
477.sp
478Default value: \fB67,108,864\fR.
479.RE
480
481.sp
482.ne 2
483.na
484\fBzfs_vdev_async_read_max_active\fR (int)
485.ad
486.RS 12n
487Maximum asynchronous read I/Os active to each device.
488See the section "ZFS I/O SCHEDULER".
489.sp
490Default value: \fB3\fR.
491.RE
492
493.sp
494.ne 2
495.na
496\fBzfs_vdev_async_read_min_active\fR (int)
497.ad
498.RS 12n
499Minimum asynchronous read I/Os active to each device.
500See the section "ZFS I/O SCHEDULER".
501.sp
502Default value: \fB1\fR.
503.RE
504
505.sp
506.ne 2
507.na
508\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
509.ad
510.RS 12n
511When the pool has more than
512\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
513\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
514the dirty data is between min and max, the active I/O limit is linearly
515interpolated. See the section "ZFS I/O SCHEDULER".
516.sp
517Default value: \fB60\fR.
518.RE
519
520.sp
521.ne 2
522.na
523\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
524.ad
525.RS 12n
526When the pool has less than
527\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
528\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
529the dirty data is between min and max, the active I/O limit is linearly
530interpolated. See the section "ZFS I/O SCHEDULER".
531.sp
532Default value: \fB30\fR.
533.RE
534
535.sp
536.ne 2
537.na
538\fBzfs_vdev_async_write_max_active\fR (int)
539.ad
540.RS 12n
541Maximum asynchronous write I/Os active to each device.
542See the section "ZFS I/O SCHEDULER".
543.sp
544Default value: \fB10\fR.
545.RE
546
547.sp
548.ne 2
549.na
550\fBzfs_vdev_async_write_min_active\fR (int)
551.ad
552.RS 12n
553Minimum asynchronous write I/Os active to each device.
554See the section "ZFS I/O SCHEDULER".
555.sp
556Default value: \fB1\fR.
557.RE
558
559.sp
560.ne 2
561.na
562\fBzfs_vdev_max_active\fR (int)
563.ad
564.RS 12n
565The maximum number of I/Os active to each device. Ideally, this will be >=
566the sum of each queue's max_active. It must be at least the sum of each
567queue's min_active. See the section "ZFS I/O SCHEDULER".
568.sp
569Default value: \fB1,000\fR.
570.RE
571
572.sp
573.ne 2
574.na
575\fBzfs_vdev_scrub_max_active\fR (int)
576.ad
577.RS 12n
578Maximum scrub I/Os active to each device.
579See the section "ZFS I/O SCHEDULER".
580.sp
581Default value: \fB2\fR.
582.RE
583
584.sp
585.ne 2
586.na
587\fBzfs_vdev_scrub_min_active\fR (int)
588.ad
589.RS 12n
590Minimum scrub I/Os active to each device.
591See the section "ZFS I/O SCHEDULER".
592.sp
593Default value: \fB1\fR.
594.RE
595
596.sp
597.ne 2
598.na
599\fBzfs_vdev_sync_read_max_active\fR (int)
600.ad
601.RS 12n
602Maximum synchronous read I/Os active to each device.
603See the section "ZFS I/O SCHEDULER".
604.sp
605Default value: \fB10\fR.
606.RE
607
608.sp
609.ne 2
610.na
611\fBzfs_vdev_sync_read_min_active\fR (int)
612.ad
613.RS 12n
614Minimum synchronous read I/Os active to each device.
615See the section "ZFS I/O SCHEDULER".
616.sp
617Default value: \fB10\fR.
618.RE
619
620.sp
621.ne 2
622.na
623\fBzfs_vdev_sync_write_max_active\fR (int)
624.ad
625.RS 12n
626Maximum synchronous write I/Os active to each device.
627See the section "ZFS I/O SCHEDULER".
628.sp
629Default value: \fB10\fR.
630.RE
631
632.sp
633.ne 2
634.na
635\fBzfs_vdev_sync_write_min_active\fR (int)
636.ad
637.RS 12n
638Minimum synchronous write I/Os active to each device.
639See the section "ZFS I/O SCHEDULER".
640.sp
641Default value: \fB10\fR.
642.RE
643
644.sp
645.ne 2
646.na
647\fBzfs_disable_dup_eviction\fR (int)
648.ad
649.RS 12n
650Disable duplicate buffer eviction
651.sp
652Use \fB1\fR for yes and \fB0\fR for no (default).
653.RE
654
655.sp
656.ne 2
657.na
658\fBzfs_expire_snapshot\fR (int)
659.ad
660.RS 12n
661Seconds to expire .zfs/snapshot
662.sp
663Default value: \fB300\fR.
664.RE
665
666.sp
667.ne 2
668.na
669\fBzfs_flags\fR (int)
670.ad
671.RS 12n
672Set additional debugging flags
673.sp
674Default value: \fB1\fR.
675.RE
676
677.sp
678.ne 2
679.na
680\fBzfs_free_min_time_ms\fR (int)
681.ad
682.RS 12n
683Min millisecs to free per txg
684.sp
685Default value: \fB1,000\fR.
686.RE
687
688.sp
689.ne 2
690.na
691\fBzfs_immediate_write_sz\fR (long)
692.ad
693.RS 12n
694Largest data block to write to zil
695.sp
696Default value: \fB32,768\fR.
697.RE
698
699.sp
700.ne 2
701.na
702\fBzfs_mdcomp_disable\fR (int)
703.ad
704.RS 12n
705Disable meta data compression
706.sp
707Use \fB1\fR for yes and \fB0\fR for no (default).
708.RE
709
710.sp
711.ne 2
712.na
713\fBzfs_no_scrub_io\fR (int)
714.ad
715.RS 12n
716Set for no scrub I/O
717.sp
718Use \fB1\fR for yes and \fB0\fR for no (default).
719.RE
720
721.sp
722.ne 2
723.na
724\fBzfs_no_scrub_prefetch\fR (int)
725.ad
726.RS 12n
727Set for no scrub prefetching
728.sp
729Use \fB1\fR for yes and \fB0\fR for no (default).
730.RE
731
732.sp
733.ne 2
734.na
735\fBzfs_nocacheflush\fR (int)
736.ad
737.RS 12n
738Disable cache flushes
739.sp
740Use \fB1\fR for yes and \fB0\fR for no (default).
741.RE
742
743.sp
744.ne 2
745.na
746\fBzfs_nopwrite_enabled\fR (int)
747.ad
748.RS 12n
749Enable NOP writes
750.sp
751Use \fB1\fR for yes (default) and \fB0\fR to disable.
752.RE
753
754.sp
755.ne 2
756.na
757\fBzfs_pd_blks_max\fR (int)
758.ad
759.RS 12n
760Max number of blocks to prefetch
761.sp
762Default value: \fB100\fR.
763.RE
764
765.sp
766.ne 2
767.na
768\fBzfs_prefetch_disable\fR (int)
769.ad
770.RS 12n
771Disable all ZFS prefetching
772.sp
773Use \fB1\fR for yes and \fB0\fR for no (default).
774.RE
775
776.sp
777.ne 2
778.na
779\fBzfs_read_chunk_size\fR (long)
780.ad
781.RS 12n
782Bytes to read per chunk
783.sp
784Default value: \fB1,048,576\fR.
785.RE
786
787.sp
788.ne 2
789.na
790\fBzfs_read_history\fR (int)
791.ad
792.RS 12n
793Historic statistics for the last N reads
794.sp
795Default value: \fB0\fR.
796.RE
797
798.sp
799.ne 2
800.na
801\fBzfs_read_history_hits\fR (int)
802.ad
803.RS 12n
804Include cache hits in read history
805.sp
806Use \fB1\fR for yes and \fB0\fR for no (default).
807.RE
808
809.sp
810.ne 2
811.na
812\fBzfs_recover\fR (int)
813.ad
814.RS 12n
815Set to attempt to recover from fatal errors. This should only be used as a
816last resort, as it typically results in leaked space, or worse.
817.sp
818Use \fB1\fR for yes and \fB0\fR for no (default).
819.RE
820
821.sp
822.ne 2
823.na
824\fBzfs_resilver_delay\fR (int)
825.ad
826.RS 12n
827Number of ticks to delay resilver
828.sp
829Default value: \fB2\fR.
830.RE
831
832.sp
833.ne 2
834.na
835\fBzfs_resilver_min_time_ms\fR (int)
836.ad
837.RS 12n
838Min millisecs to resilver per txg
839.sp
840Default value: \fB3,000\fR.
841.RE
842
843.sp
844.ne 2
845.na
846\fBzfs_scan_idle\fR (int)
847.ad
848.RS 12n
849Idle window in clock ticks
850.sp
851Default value: \fB50\fR.
852.RE
853
854.sp
855.ne 2
856.na
857\fBzfs_scan_min_time_ms\fR (int)
858.ad
859.RS 12n
860Min millisecs to scrub per txg
861.sp
862Default value: \fB1,000\fR.
863.RE
864
865.sp
866.ne 2
867.na
868\fBzfs_scrub_delay\fR (int)
869.ad
870.RS 12n
871Number of ticks to delay scrub
872.sp
873Default value: \fB4\fR.
874.RE
875
876.sp
877.ne 2
878.na
879\fBzfs_send_corrupt_data\fR (int)
880.ad
881.RS 12n
882Allow sending corrupt data (ignore read/checksum errors when sending data)
883.sp
884Use \fB1\fR for yes and \fB0\fR for no (default).
885.RE
886
887.sp
888.ne 2
889.na
890\fBzfs_sync_pass_deferred_free\fR (int)
891.ad
892.RS 12n
893Defer frees starting in this pass
894.sp
895Default value: \fB2\fR.
896.RE
897
898.sp
899.ne 2
900.na
901\fBzfs_sync_pass_dont_compress\fR (int)
902.ad
903.RS 12n
904Don't compress starting in this pass
905.sp
906Default value: \fB5\fR.
907.RE
908
909.sp
910.ne 2
911.na
912\fBzfs_sync_pass_rewrite\fR (int)
913.ad
914.RS 12n
915Rewrite new bps starting in this pass
916.sp
917Default value: \fB2\fR.
918.RE
919
920.sp
921.ne 2
922.na
923\fBzfs_top_maxinflight\fR (int)
924.ad
925.RS 12n
926Max I/Os per top-level vdev
927.sp
928Default value: \fB32\fR.
929.RE
930
931.sp
932.ne 2
933.na
934\fBzfs_txg_history\fR (int)
935.ad
936.RS 12n
937Historic statistics for the last N txgs
938.sp
939Default value: \fB0\fR.
940.RE
941
942.sp
943.ne 2
944.na
945\fBzfs_txg_timeout\fR (int)
946.ad
947.RS 12n
948Max seconds worth of delta per txg
949.sp
950Default value: \fB5\fR.
951.RE
952
953.sp
954.ne 2
955.na
956\fBzfs_vdev_aggregation_limit\fR (int)
957.ad
958.RS 12n
959Max vdev I/O aggregation size
960.sp
961Default value: \fB131,072\fR.
962.RE
963
964.sp
965.ne 2
966.na
967\fBzfs_vdev_cache_bshift\fR (int)
968.ad
969.RS 12n
970Shift size to inflate reads to
971.sp
972Default value: \fB16\fR.
973.RE
974
975.sp
976.ne 2
977.na
978\fBzfs_vdev_cache_max\fR (int)
979.ad
980.RS 12n
981Inflate reads smaller than this value
982.RE
983
984.sp
985.ne 2
986.na
987\fBzfs_vdev_cache_size\fR (int)
988.ad
989.RS 12n
990Total size of the per-disk cache
991.sp
992Default value: \fB0\fR.
993.RE
994
995.sp
996.ne 2
997.na
998\fBzfs_vdev_mirror_switch_us\fR (int)
999.ad
1000.RS 12n
1001Switch mirrors every N usecs
1002.sp
1003Default value: \fB10,000\fR.
1004.RE
1005
1006.sp
1007.ne 2
1008.na
1009\fBzfs_vdev_read_gap_limit\fR (int)
1010.ad
1011.RS 12n
1012Aggregate read I/O over gap
1013.sp
1014Default value: \fB32,768\fR.
1015.RE
1016
1017.sp
1018.ne 2
1019.na
1020\fBzfs_vdev_scheduler\fR (charp)
1021.ad
1022.RS 12n
1023I/O scheduler
1024.sp
1025Default value: \fBnoop\fR.
1026.RE
1027
1028.sp
1029.ne 2
1030.na
1031\fBzfs_vdev_write_gap_limit\fR (int)
1032.ad
1033.RS 12n
1034Aggregate write I/O over gap
1035.sp
1036Default value: \fB4,096\fR.
1037.RE
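.sp
A simplified sketch of the aggregation rule implied by
\fBzfs_vdev_aggregation_limit\fR and the two gap limits above. The real
vdev queue logic is more involved; this only illustrates the documented
limits:
.sp
.nf
#include <stdint.h>

/* Hypothetical check: may two neighbouring I/Os of the same type
 * (io1 immediately preceding io2 on disk) be merged into one larger
 * I/O? */
static int
may_aggregate(uint64_t off1, uint64_t size1,
    uint64_t off2, uint64_t size2,
    uint64_t gap_limit,            /* read or write gap limit */
    uint64_t aggregation_limit)
{
        uint64_t gap = off2 - (off1 + size1);
        uint64_t span = (off2 + size2) - off1;

        return (gap <= gap_limit && span <= aggregation_limit);
}
.fi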
1038
1039.sp
1040.ne 2
1041.na
1042\fBzfs_zevent_cols\fR (int)
1043.ad
1044.RS 12n
1045Max event column width
1046.sp
1047Default value: \fB80\fR.
1048.RE
1049
1050.sp
1051.ne 2
1052.na
1053\fBzfs_zevent_console\fR (int)
1054.ad
1055.RS 12n
1056Log events to the console
1057.sp
1058Use \fB1\fR for yes and \fB0\fR for no (default).
1059.RE
1060
1061.sp
1062.ne 2
1063.na
1064\fBzfs_zevent_len_max\fR (int)
1065.ad
1066.RS 12n
1067Max event queue length
1068.sp
1069Default value: \fB0\fR.
1070.RE
1071
1072.sp
1073.ne 2
1074.na
1075\fBzil_replay_disable\fR (int)
1076.ad
1077.RS 12n
1078Disable intent logging replay
1079.sp
1080Use \fB1\fR for yes and \fB0\fR for no (default).
1081.RE
1082
1083.sp
1084.ne 2
1085.na
1086\fBzil_slog_limit\fR (ulong)
1087.ad
1088.RS 12n
1089Max commit bytes to separate log device
1090.sp
1091Default value: \fB1,048,576\fR.
1092.RE
1093
1094.sp
1095.ne 2
1096.na
1097\fBzio_bulk_flags\fR (int)
1098.ad
1099.RS 12n
1100Additional flags to pass to bulk buffers
1101.sp
1102Default value: \fB0\fR.
1103.RE
1104
1105.sp
1106.ne 2
1107.na
1108\fBzio_delay_max\fR (int)
1109.ad
1110.RS 12n
1111Max zio millisec delay before posting event
1112.sp
1113Default value: \fB30,000\fR.
1114.RE
1115
1116.sp
1117.ne 2
1118.na
1119\fBzio_injection_enabled\fR (int)
1120.ad
1121.RS 12n
1122Enable fault injection
1123.sp
1124Use \fB1\fR for yes and \fB0\fR for no (default).
1125.RE
1126
1127.sp
1128.ne 2
1129.na
1130\fBzio_requeue_io_start_cut_in_line\fR (int)
1131.ad
1132.RS 12n
1133Prioritize requeued I/O
1134.sp
1135Default value: \fB0\fR.
1136.RE
1137
1138.sp
1139.ne 2
1140.na
1141\fBzvol_inhibit_dev\fR (uint)
1142.ad
1143.RS 12n
1144Do not create zvol device nodes
1145.sp
1146Use \fB1\fR for yes and \fB0\fR for no (default).
1147.RE
1148
1149.sp
1150.ne 2
1151.na
1152\fBzvol_major\fR (uint)
1153.ad
1154.RS 12n
1155Major number for zvol device
1156.sp
1157Default value: \fB230\fR.
1158.RE
1159
1160.sp
1161.ne 2
1162.na
1163\fBzvol_max_discard_blocks\fR (ulong)
1164.ad
1165.RS 12n
1166Max number of blocks to discard at once
1167.sp
1168Default value: \fB16,384\fR.
1169.RE
1170
1171.sp
1172.ne 2
1173.na
1174\fBzvol_threads\fR (uint)
1175.ad
1176.RS 12n
1177Number of threads for zvol device
1178.sp
1179Default value: \fB32\fR.
1180.RE
1181
1182.SH ZFS I/O SCHEDULER
1183ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
1184The I/O scheduler determines when and in what order those operations are
1185issued. The I/O scheduler divides operations into five I/O classes
1186prioritized in the following order: sync read, sync write, async read,
1187async write, and scrub/resilver. Each queue defines the minimum and
1188maximum number of concurrent operations that may be issued to the
1189device. In addition, the device has an aggregate maximum,
1190\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
1191must not exceed the aggregate maximum. If the sum of the per-queue
1192maximums exceeds the aggregate maximum, then the number of active I/Os
1193may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
1194be issued regardless of whether all per-queue minimums have been met.
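.sp
The per-class defaults documented in this manual page, together with the
constraint just described, can be restated as a short C sketch
(illustrative only):
.sp
.nf
/* Per-class defaults from this manual page:
 *   class         min_active  max_active
 *   sync read         10          10
 *   sync write        10          10
 *   async read         1           3
 *   async write        1          10
 *   scrub              1           2
 */
static const int vdev_min_active[5] = { 10, 10, 1, 1, 1 };
static const int vdev_max_active[5] = { 10, 10, 3, 10, 2 };

/* Sanity check: the sum of the per-queue minimums must not exceed
 * the aggregate maximum (23 <= 1000 with the defaults). */
static int
mins_fit_aggregate(int zfs_vdev_max_active)
{
        int i, sum = 0;

        for (i = 0; i < 5; i++)
                sum += vdev_min_active[i];
        return (sum <= zfs_vdev_max_active);
}
.fi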
1195.sp
1196For many physical devices, throughput increases with the number of
1197concurrent operations, but latency typically suffers. Further, physical
1198devices typically have a limit at which more concurrent operations have no
1199effect on throughput or can actually cause it to decrease.
1200.sp
1201The scheduler selects the next operation to issue by first looking for an
1202I/O class whose minimum has not been satisfied. Once all are satisfied and
1203the aggregate maximum has not been hit, the scheduler looks for classes
1204whose maximum has not been satisfied. Iteration through the I/O classes is
1205done in the order specified above. No further operations are issued if the
1206aggregate maximum number of concurrent operations has been hit or if there
1207are no operations queued for an I/O class that has not hit its maximum.
1208Every time an I/O is queued or an operation completes, the I/O scheduler
1209looks for new operations to issue.
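.sp
A C-style sketch of that selection rule (illustrative only, not the
actual scheduler code; the class index follows the priority order given
above):
.sp
.nf
/* Return the class to issue from next, or -1 if nothing may be issued.
 * active, queued, min_active and max_active are per-class counts;
 * total_active is compared against zfs_vdev_max_active. */
static int
next_class_to_issue(const int active[5], const int queued[5],
    const int min_active[5], const int max_active[5],
    int total_active, int zfs_vdev_max_active)
{
        int c;

        /* No further I/Os once the aggregate maximum has been hit. */
        if (total_active >= zfs_vdev_max_active)
                return (-1);

        /* First, satisfy every class minimum, in priority order. */
        for (c = 0; c < 5; c++)
                if (queued[c] > 0 && active[c] < min_active[c])
                        return (c);

        /* Then fill classes up to their maximums, in the same order. */
        for (c = 0; c < 5; c++)
                if (queued[c] > 0 && active[c] < max_active[c])
                        return (c);

        return (-1);
}
.fi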
1210.sp
1211In general, smaller values of max_active will lead to lower latency of
1212synchronous operations. Larger values of max_active may lead to higher overall
1213throughput, depending on underlying storage.
1214.sp
1215The ratio of the queues' max_actives determines the balance of performance
1216between reads, writes, and scrubs. E.g., increasing
1217\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
1218more quickly, but reads and writes to have higher latency and lower throughput.
1219.sp
1220All I/O classes have a fixed maximum number of outstanding operations
1221except for the async write class. Asynchronous writes represent the data
1222that is committed to stable storage during the syncing stage for
1223transaction groups. Transaction groups enter the syncing state
1224periodically so the number of queued async writes will quickly burst up
1225and then bleed down to zero. Rather than servicing them as quickly as
1226possible, the I/O scheduler changes the maximum number of active async
1227write I/Os according to the amount of dirty data in the pool. Since
1228both throughput and latency typically increase with the number of
1229concurrent operations issued to physical devices, reducing the
1230burstiness in the number of concurrent operations also stabilizes the
1231response time of operations from other -- and in particular synchronous
1232-- queues. In broad strokes, the I/O scheduler will issue more
1233concurrent operations from the async write queue as there's more dirty
1234data in the pool.
1235.sp
1236Async Writes
1237.sp
1238The number of concurrent operations issued for the async write I/O class
1239follows a piece-wise linear function defined by a few adjustable points.
1240.nf
1241
1242 | o---------| <-- zfs_vdev_async_write_max_active
1243 ^ | /^ |
1244 | | / | |
1245active | / | |
1246 I/O | / | |
1247count | / | |
1248 | / | |
1249 |-------o | | <-- zfs_vdev_async_write_min_active
1250 0|_______^______|_________|
1251 0% | | 100% of zfs_dirty_data_max
1252 | |
1253 | `-- zfs_vdev_async_write_active_max_dirty_percent
1254 `--------- zfs_vdev_async_write_active_min_dirty_percent
1255
1256.fi
1257Until the amount of dirty data exceeds a minimum percentage of the dirty
1258data allowed in the pool, the I/O scheduler will limit the number of
1259concurrent operations to the minimum. As that threshold is crossed, the
1260number of concurrent operations issued increases linearly to the maximum at
1261the specified maximum percentage of the dirty data allowed in the pool.
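.sp
The piece-wise linear function in the figure can be written out as a
short sketch (illustrative only; percentages are of
\fBzfs_dirty_data_max\fR):
.sp
.nf
/* Maximum number of active async write I/Os for a given amount of
 * dirty data, following the three segments of the figure above. */
static int
async_write_max_active(int dirty_pct,
    int min_dirty_pct,  /* zfs_vdev_async_write_active_min_dirty_percent */
    int max_dirty_pct,  /* zfs_vdev_async_write_active_max_dirty_percent */
    int min_active,     /* zfs_vdev_async_write_min_active */
    int max_active)     /* zfs_vdev_async_write_max_active */
{
        if (dirty_pct <= min_dirty_pct)
                return (min_active);
        if (dirty_pct >= max_dirty_pct)
                return (max_active);

        /* Linear interpolation between the two break points. */
        return (min_active + (max_active - min_active) *
            (dirty_pct - min_dirty_pct) / (max_dirty_pct - min_dirty_pct));
}
.fi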
1262.sp
1263Ideally, the amount of dirty data on a busy pool will stay in the sloped
1264part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
1265and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
1266maximum percentage, this indicates that the rate of incoming data is
1267greater than the rate that the backend storage can handle. In this case, we
1268must further throttle incoming writes, as described in the next section.
1269
1270.SH ZFS TRANSACTION DELAY
1271We delay transactions when we've determined that the backend storage
1272isn't able to accommodate the rate of incoming writes.
1273.sp
1274If there is already a transaction waiting, we delay relative to when
1275that transaction will finish waiting. This way the calculated delay time
1276is independent of the number of threads concurrently executing
1277transactions.
1278.sp
1279If we are the only waiter, wait relative to when the transaction
1280started, rather than the current time. This credits the transaction for
1281"time already served", e.g. reading indirect blocks.
1282.sp
1283The minimum time for a transaction to take is calculated as:
1284.nf
1285 min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
1286 min_time is then capped at 100 milliseconds.
1287.fi
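.sp
Written out as a C sketch (illustrative only; \fIdirty\fR, \fImin\fR and
\fImax\fR are amounts of dirty data in bytes, and the result is in
nanoseconds, which is consistent with the 500us midpoint for the default
scale discussed below):
.sp
.nf
#include <stdint.h>

/* min_time = zfs_delay_scale * (dirty - min) / (max - dirty),
 * capped at 100 milliseconds. */
static uint64_t
tx_delay_min_time(uint64_t zfs_delay_scale,
    uint64_t dirty, uint64_t min, uint64_t max)
{
        uint64_t cap = 100ULL * 1000 * 1000;    /* 100 ms in ns */
        uint64_t t;

        if (dirty <= min)
                return (0);
        if (dirty >= max)                       /* delay saturates */
                return (cap);
        t = zfs_delay_scale * (dirty - min) / (max - dirty);
        return (t < cap ? t : cap);
}
.fi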
1288.sp
1289The delay has two degrees of freedom that can be adjusted via tunables. The
1290percentage of dirty data at which we start to delay is defined by
1291\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
1292\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
1293delay after writing at full speed has failed to keep up with the incoming write
1294rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
1295this variable determines the amount of delay at the midpoint of the curve.
1296.sp
1297.nf
1298delay
1299 10ms +-------------------------------------------------------------*+
1300 | *|
1301 9ms + *+
1302 | *|
1303 8ms + *+
1304 | * |
1305 7ms + * +
1306 | * |
1307 6ms + * +
1308 | * |
1309 5ms + * +
1310 | * |
1311 4ms + * +
1312 | * |
1313 3ms + * +
1314 | * |
1315 2ms + (midpoint) * +
1316 | | ** |
1317 1ms + v *** +
1318 | zfs_delay_scale ----------> ******** |
1319 0 +-------------------------------------*********----------------+
1320 0% <- zfs_dirty_data_max -> 100%
1321.fi
1322.sp
1323Note that since the delay is added to the outstanding time remaining on the
1324most recent transaction, the delay is effectively the inverse of IOPS.
1325Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
1326was chosen such that small changes in the amount of accumulated dirty data
1327in the first 3/4 of the curve yield relatively small differences in the
1328amount of delay.
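.sp
The inverse relationship can be made explicit with a one-line sketch
(illustrative only):
.sp
.nf
/* A steady per-operation delay of delay_ns nanoseconds limits the
 * effective rate to 1e9 / delay_ns operations per second; 500,000 ns
 * (500us) corresponds to 2,000 IOPS. */
static unsigned long long
effective_iops(unsigned long long delay_ns)
{
        return (1000000000ULL / delay_ns);
}
.fi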
1329.sp
1330The effects can be easier to understand when the amount of delay is
1331represented on a log scale:
1332.sp
1333.nf
1334delay
1335100ms +-------------------------------------------------------------++
1336 + +
1337 | |
1338 + *+
1339 10ms + *+
1340 + ** +
1341 | (midpoint) ** |
1342 + | ** +
1343 1ms + v **** +
1344 + zfs_delay_scale ----------> ***** +
1345 | **** |
1346 + **** +
1347100us + ** +
1348 + * +
1349 | * |
1350 + * +
1351 10us + * +
1352 + +
1353 | |
1354 + +
1355 +--------------------------------------------------------------+
1356 0% <- zfs_dirty_data_max -> 100%
1357.fi
1358.sp
1359Note here that only as the amount of dirty data approaches its limit does
1360the delay start to increase rapidly. The goal of a properly tuned system
1361should be to keep the amount of dirty data out of that range by first
1362ensuring that the appropriate limits are set for the I/O scheduler to reach
1363optimal throughput on the backend storage, and then by changing the value
1364of \fBzfs_delay_scale\fR to increase the steepness of the curve.