1'\" te
2.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
3.\" The contents of this file are subject to the terms of the Common Development
4.\" and Distribution License (the "License"). You may not use this file except
5.\" in compliance with the License. You can obtain a copy of the license at
6.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
7.\"
8.\" See the License for the specific language governing permissions and
9.\" limitations under the License. When distributing Covered Code, include this
10.\" CDDL HEADER in each file and include the License file at
11.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
12.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
13.\" own identifying information:
14.\" Portions Copyright [yyyy] [name of copyright owner]
15.TH ZFS-MODULE-PARAMETERS 5 "Nov 16, 2013"
16.SH NAME
17zfs\-module\-parameters \- ZFS module parameters
18.SH DESCRIPTION
19.sp
20.LP
21Description of the different parameters to the ZFS module.
22
23.SS "Module parameters"
24.sp
25.LP
26
27.sp
28.ne 2
29.na
30\fBl2arc_feed_again\fR (int)
31.ad
32.RS 12n
33Turbo L2ARC warmup
34.sp
35Use \fB1\fR for yes (default) and \fB0\fR to disable.
36.RE
37
38.sp
39.ne 2
40.na
41\fBl2arc_feed_min_ms\fR (ulong)
42.ad
43.RS 12n
44Min feed interval in milliseconds
45.sp
46Default value: \fB200\fR.
47.RE
48
49.sp
50.ne 2
51.na
52\fBl2arc_feed_secs\fR (ulong)
53.ad
54.RS 12n
55Seconds between L2ARC writing
56.sp
57Default value: \fB1\fR.
58.RE
59
60.sp
61.ne 2
62.na
63\fBl2arc_headroom\fR (ulong)
64.ad
65.RS 12n
66Number of max device writes to precache
67.sp
68Default value: \fB2\fR.
69.RE
70
71.sp
72.ne 2
73.na
74\fBl2arc_headroom_boost\fR (ulong)
75.ad
76.RS 12n
77Compressed l2arc_headroom multiplier
78.sp
79Default value: \fB200\fR.
80.RE
81
82.sp
83.ne 2
84.na
85\fBl2arc_nocompress\fR (int)
86.ad
87.RS 12n
88Skip compressing L2ARC buffers
89.sp
90Use \fB1\fR for yes and \fB0\fR for no (default).
91.RE
92
93.sp
94.ne 2
95.na
96\fBl2arc_noprefetch\fR (int)
97.ad
98.RS 12n
99Skip caching prefetched buffers
100.sp
101Use \fB1\fR for yes (default) and \fB0\fR to disable.
102.RE
103
104.sp
105.ne 2
106.na
107\fBl2arc_norw\fR (int)
108.ad
109.RS 12n
110No reads during writes
111.sp
112Use \fB1\fR for yes and \fB0\fR for no (default).
113.RE
114
115.sp
116.ne 2
117.na
118\fBl2arc_write_boost\fR (ulong)
119.ad
120.RS 12n
121Extra write bytes during device warmup
122.sp
123Default value: \fB8,388,608\fR.
124.RE
125
126.sp
127.ne 2
128.na
129\fBl2arc_write_max\fR (ulong)
130.ad
131.RS 12n
132Max write bytes per interval
133.sp
134Default value: \fB8,388,608\fR.
135.RE
136
137.sp
138.ne 2
139.na
140\fBmetaslab_debug\fR (int)
141.ad
142.RS 12n
143Keep space maps in core to verify frees
144.sp
145Use \fB1\fR for yes and \fB0\fR for no (default).
146.RE
147
148.sp
149.ne 2
150.na
151\fBspa_config_path\fR (charp)
152.ad
153.RS 12n
154SPA config file
155.sp
156Default value: \fB/etc/zfs/zpool.cache\fR.
157.RE
158
159.sp
160.ne 2
161.na
162\fBspa_asize_inflation\fR (int)
163.ad
164.RS 12n
165Multiplication factor used to estimate actual disk consumption from the
166size of data being written. The default value is a worst case estimate,
167but lower values may be valid for a given pool depending on its
168configuration. Pool administrators who understand the factors involved
169may wish to specify a more realistic inflation factor, particularly if
170they operate close to quota or capacity limits.
171.sp
172Default value: \fB24\fR.
173.RE
174
175.sp
176.ne 2
177.na
178\fBzfetch_array_rd_sz\fR (ulong)
179.ad
180.RS 12n
181Number of bytes in an array_read
182.sp
183Default value: \fB1,048,576\fR.
184.RE
185
186.sp
187.ne 2
188.na
189\fBzfetch_block_cap\fR (uint)
190.ad
191.RS 12n
192Max number of blocks to fetch at a time
193.sp
194Default value: \fB256\fR.
195.RE
196
197.sp
198.ne 2
199.na
200\fBzfetch_max_streams\fR (uint)
201.ad
202.RS 12n
203Max number of streams per zfetch
204.sp
205Default value: \fB8\fR.
206.RE
207
208.sp
209.ne 2
210.na
211\fBzfetch_min_sec_reap\fR (uint)
212.ad
213.RS 12n
214Min time before stream reclaim
215.sp
216Default value: \fB2\fR.
217.RE
218
219.sp
220.ne 2
221.na
222\fBzfs_arc_grow_retry\fR (int)
223.ad
224.RS 12n
225Seconds before growing arc size
226.sp
227Default value: \fB5\fR.
228.RE
229
230.sp
231.ne 2
232.na
233\fBzfs_arc_max\fR (ulong)
234.ad
235.RS 12n
236Max arc size
237.sp
238Default value: \fB0\fR.
239.RE
240
241.sp
242.ne 2
243.na
244\fBzfs_arc_memory_throttle_disable\fR (int)
245.ad
246.RS 12n
247Disable memory throttle
248.sp
249Use \fB1\fR for yes (default) and \fB0\fR to disable.
250.RE
251
252.sp
253.ne 2
254.na
255\fBzfs_arc_meta_limit\fR (ulong)
256.ad
257.RS 12n
258Meta limit for arc size
259.sp
260Default value: \fB0\fR.
261.RE
262
263.sp
264.ne 2
265.na
266\fBzfs_arc_meta_prune\fR (int)
267.ad
268.RS 12n
269Bytes of meta data to prune
270.sp
271Default value: \fB1,048,576\fR.
272.RE
273
274.sp
275.ne 2
276.na
277\fBzfs_arc_min\fR (ulong)
278.ad
279.RS 12n
280Min arc size
281.sp
282Default value: \fB100\fR.
283.RE
284
285.sp
286.ne 2
287.na
288\fBzfs_arc_min_prefetch_lifespan\fR (int)
289.ad
290.RS 12n
291Min life of prefetch block
292.sp
293Default value: \fB100\fR.
294.RE
295
296.sp
297.ne 2
298.na
299\fBzfs_arc_p_min_shift\fR (int)
300.ad
301.RS 12n
302arc_c shift to calc min/max arc_p
303.sp
304Default value: \fB4\fR.
305.RE
306
307.sp
308.ne 2
309.na
310\fBzfs_arc_p_aggressive_disable\fR (int)
311.ad
312.RS 12n
313Disable aggressive arc_p growth
314.sp
315Use \fB1\fR for yes (default) and \fB0\fR to disable.
316.RE
317
318.sp
319.ne 2
320.na
321\fBzfs_arc_shrink_shift\fR (int)
322.ad
323.RS 12n
324log2(fraction of arc to reclaim)
325.sp
326Default value: \fB5\fR.
327.RE
328
329.sp
330.ne 2
331.na
332\fBzfs_autoimport_disable\fR (int)
333.ad
334.RS 12n
335Disable pool import at module load
336.sp
337Use \fB1\fR for yes and \fB0\fR for no (default).
338.RE
339
340.sp
341.ne 2
342.na
343\fBzfs_dbuf_state_index\fR (int)
344.ad
345.RS 12n
346Calculate arc header index
347.sp
348Default value: \fB0\fR.
349.RE
350
351.sp
352.ne 2
353.na
354\fBzfs_deadman_enabled\fR (int)
355.ad
356.RS 12n
357Enable deadman timer
358.sp
359Use \fB1\fR for yes (default) and \fB0\fR to disable.
360.RE
361
362.sp
363.ne 2
364.na
365\fBzfs_deadman_synctime_ms\fR (ulong)
366.ad
367.RS 12n
368Expiration time in milliseconds. This value has two meanings. First, it is
369used to determine when the spa_deadman() logic should fire. By default,
370spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
371Second, the value determines if an I/O is considered "hung". Any I/O that
372has not completed within zfs_deadman_synctime_ms is considered "hung", resulting
373in a zevent being logged.
374.sp
375Default value: \fB1,000,000\fR.
376.RE
377
378.sp
379.ne 2
380.na
381\fBzfs_dedup_prefetch\fR (int)
382.ad
383.RS 12n
384Enable prefetching of deduplicated blocks
385.sp
386Use \fB1\fR for yes (default) and \fB0\fR to disable.
387.RE
388
389.sp
390.ne 2
391.na
392\fBzfs_delay_min_dirty_percent\fR (int)
393.ad
394.RS 12n
395Start to delay each transaction once there is this amount of dirty data,
396expressed as a percentage of \fBzfs_dirty_data_max\fR.
397This value should be >= zfs_vdev_async_write_active_max_dirty_percent.
398See the section "ZFS TRANSACTION DELAY".
399.sp
400Default value: \fB60\fR.
401.RE
402
403.sp
404.ne 2
405.na
406\fBzfs_delay_scale\fR (int)
407.ad
408.RS 12n
409This controls how quickly the transaction delay approaches infinity.
410Larger values cause longer delays for a given amount of dirty data.
411.sp
412For the smoothest delay, this value should be about 1 billion divided
413by the maximum number of operations per second. This will smoothly
414handle between 10x and 1/10th this number.
415.sp
416See the section "ZFS TRANSACTION DELAY".
417.sp
418Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
419.sp
420Default value: \fB500,000\fR.
421.RE
422
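.sp
As a rough illustration of the sizing rule above, the following C sketch (not
ZFS code; the 2,000 operations-per-second figure is only an assumed example)
derives a \fBzfs_delay_scale\fR value from an estimate of the maximum
operation rate the backend storage can sustain:
.nf

    #include <stdint.h>

    /*
     * Per the guidance above: about 1 billion divided by the maximum
     * number of operations per second the pool is expected to sustain
     * (assumed nonzero).
     */
    uint64_t
    suggested_delay_scale(uint64_t max_ops_per_sec)
    {
            return (1000000000ULL / max_ops_per_sec);
    }

    /*
     * Example: a pool sustaining about 2,000 operations per second gives
     * 1,000,000,000 / 2,000 = 500,000, which matches the default value.
     */

.fi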
423.sp
424.ne 2
425.na
426\fBzfs_dirty_data_max\fR (int)
427.ad
428.RS 12n
429Determines the dirty space limit in bytes. Once this limit is exceeded, new
430writes are halted until space frees up. This parameter takes precedence
431over \fBzfs_dirty_data_max_percent\fR.
432See the section "ZFS TRANSACTION DELAY".
433.sp
434Default value: 10 percent of all memory, capped at \fBzfs_dirty_data_max_max\fR.
435.RE
436
437.sp
438.ne 2
439.na
440\fBzfs_dirty_data_max_max\fR (int)
441.ad
442.RS 12n
443Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
444This limit is only enforced at module load time, and will be ignored if
445\fBzfs_dirty_data_max\fR is later changed. This parameter takes
446precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
447"ZFS TRANSACTION DELAY".
448.sp
449Default value: 25% of physical RAM.
450.RE
451
452.sp
453.ne 2
454.na
455\fBzfs_dirty_data_max_max_percent\fR (int)
456.ad
457.RS 12n
458Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
459percentage of physical RAM. This limit is only enforced at module load
460time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
461The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
462one. See the section "ZFS TRANSACTION DELAY".
463.sp
464Default value: \fB25\fR.
465.RE
466
467.sp
468.ne 2
469.na
470\fBzfs_dirty_data_max_percent\fR (int)
471.ad
472.RS 12n
473Determines the dirty space limit, expressed as a percentage of all
474memory. Once this limit is exceeded, new writes are halted until space frees
475up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
476one. See the section "ZFS TRANSACTION DELAY".
477.sp
478Default value: 10%, subject to \fBzfs_dirty_data_max_max\fR.
479.RE
480
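.sp
The interaction of the four preceding dirty data limits can be summarized with
a small C sketch. This only restates the documented defaults and precedence as
computed at module load time; it is not ZFS code, and \fBphysmem_bytes\fR and
the function name are invented for the example:
.nf

    #include <stdint.h>

    /* Illustrative defaults taken from the entries above. */
    #define DIRTY_DATA_MAX_PERCENT      10  /* zfs_dirty_data_max_percent */
    #define DIRTY_DATA_MAX_MAX_PERCENT  25  /* zfs_dirty_data_max_max_percent */

    /*
     * Effective dirty data limit in bytes as established at module load:
     * an explicitly set zfs_dirty_data_max (explicit_max != 0) takes
     * precedence over zfs_dirty_data_max_percent, and the result is capped
     * by zfs_dirty_data_max_max (25% of physical RAM by default).  Changing
     * zfs_dirty_data_max later bypasses this cap.
     */
    uint64_t
    effective_dirty_data_max(uint64_t physmem_bytes, uint64_t explicit_max)
    {
            uint64_t max_max = physmem_bytes * DIRTY_DATA_MAX_MAX_PERCENT / 100;
            uint64_t limit = (explicit_max != 0) ? explicit_max :
                physmem_bytes * DIRTY_DATA_MAX_PERCENT / 100;

            return (limit < max_max ? limit : max_max);
    }

.fi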
481.sp
482.ne 2
483.na
484\fBzfs_dirty_data_sync\fR (int)
485.ad
486.RS 12n
487Start syncing out a transaction group if there is at least this much dirty data.
488.sp
489Default value: \fB67,108,864\fR.
490.RE
491
492.sp
493.ne 2
494.na
495\fBzfs_vdev_async_read_max_active\fR (int)
496.ad
497.RS 12n
498Maximum asynchronous read I/Os active to each device.
499See the section "ZFS I/O SCHEDULER".
500.sp
501Default value: \fB3\fR.
502.RE
503
504.sp
505.ne 2
506.na
507\fBzfs_vdev_async_read_min_active\fR (int)
508.ad
509.RS 12n
510Minimum asynchronous read I/Os active to each device.
511See the section "ZFS I/O SCHEDULER".
512.sp
513Default value: \fB1\fR.
514.RE
515
516.sp
517.ne 2
518.na
519\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
520.ad
521.RS 12n
522When the pool has more than
523\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
524\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
525the dirty data is between min and max, the active I/O limit is linearly
526interpolated. See the section "ZFS I/O SCHEDULER".
527.sp
528Default value: \fB60\fR.
529.RE
530
531.sp
532.ne 2
533.na
534\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
535.ad
536.RS 12n
537When the pool has less than
538\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
539\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
540the dirty data is between min and max, the active I/O limit is linearly
541interpolated. See the section "ZFS I/O SCHEDULER".
542.sp
543Default value: \fB30\fR.
544.RE
545
546.sp
547.ne 2
548.na
549\fBzfs_vdev_async_write_max_active\fR (int)
550.ad
551.RS 12n
552Maximum asynchronous write I/Os active to each device.
553See the section "ZFS I/O SCHEDULER".
554.sp
555Default value: \fB10\fR.
556.RE
557
558.sp
559.ne 2
560.na
561\fBzfs_vdev_async_write_min_active\fR (int)
562.ad
563.RS 12n
564Minimum asynchronous write I/Os active to each device.
565See the section "ZFS I/O SCHEDULER".
566.sp
567Default value: \fB1\fR.
568.RE
569
570.sp
571.ne 2
572.na
573\fBzfs_vdev_max_active\fR (int)
574.ad
575.RS 12n
576The maximum number of I/Os active to each device. Ideally, this will be >=
577the sum of each queue's max_active. It must be at least the sum of each
578queue's min_active. See the section "ZFS I/O SCHEDULER".
579.sp
580Default value: \fB1,000\fR.
581.RE
582
583.sp
584.ne 2
585.na
586\fBzfs_vdev_scrub_max_active\fR (int)
587.ad
588.RS 12n
589Maximum scrub I/Os active to each device.
590See the section "ZFS I/O SCHEDULER".
591.sp
592Default value: \fB2\fR.
593.RE
594
595.sp
596.ne 2
597.na
598\fBzfs_vdev_scrub_min_active\fR (int)
599.ad
600.RS 12n
601Minimum scrub I/Os active to each device.
602See the section "ZFS I/O SCHEDULER".
603.sp
604Default value: \fB1\fR.
605.RE
606
607.sp
608.ne 2
609.na
610\fBzfs_vdev_sync_read_max_active\fR (int)
611.ad
612.RS 12n
613Maximum synchronous read I/Os active to each device.
614See the section "ZFS I/O SCHEDULER".
615.sp
616Default value: \fB10\fR.
617.RE
618
619.sp
620.ne 2
621.na
622\fBzfs_vdev_sync_read_min_active\fR (int)
623.ad
624.RS 12n
625Minimum synchronous read I/Os active to each device.
626See the section "ZFS I/O SCHEDULER".
627.sp
628Default value: \fB10\fR.
629.RE
630
631.sp
632.ne 2
633.na
634\fBzfs_vdev_sync_write_max_active\fR (int)
635.ad
636.RS 12n
637Maximum synchronous write I/Os active to each device.
638See the section "ZFS I/O SCHEDULER".
639.sp
640Default value: \fB10\fR.
641.RE
642
643.sp
644.ne 2
645.na
646\fBzfs_vdev_sync_write_min_active\fR (int)
647.ad
648.RS 12n
649Minimum synchronous write I/Os active to each device.
650See the section "ZFS I/O SCHEDULER".
651.sp
652Default value: \fB10\fR.
653.RE
654
655.sp
656.ne 2
657.na
658\fBzfs_disable_dup_eviction\fR (int)
659.ad
660.RS 12n
661Disable duplicate buffer eviction
662.sp
663Use \fB1\fR for yes and \fB0\fR for no (default).
664.RE
665
666.sp
667.ne 2
668.na
669\fBzfs_expire_snapshot\fR (int)
670.ad
671.RS 12n
672Seconds to expire .zfs/snapshot
673.sp
674Default value: \fB300\fR.
675.RE
676
677.sp
678.ne 2
679.na
680\fBzfs_flags\fR (int)
681.ad
682.RS 12n
683Set additional debugging flags
684.sp
685Default value: \fB1\fR.
686.RE
687
688.sp
689.ne 2
690.na
691\fBzfs_free_min_time_ms\fR (int)
692.ad
693.RS 12n
694Min millisecs to free per txg
695.sp
696Default value: \fB1,000\fR.
697.RE
698
699.sp
700.ne 2
701.na
702\fBzfs_immediate_write_sz\fR (long)
703.ad
704.RS 12n
705Largest data block to write to zil
706.sp
707Default value: \fB32,768\fR.
708.RE
709
710.sp
711.ne 2
712.na
713\fBzfs_mdcomp_disable\fR (int)
714.ad
715.RS 12n
716Disable meta data compression
717.sp
718Use \fB1\fR for yes and \fB0\fR for no (default).
719.RE
720
721.sp
722.ne 2
723.na
724\fBzfs_no_scrub_io\fR (int)
725.ad
726.RS 12n
727Set for no scrub I/O
728.sp
729Use \fB1\fR for yes and \fB0\fR for no (default).
730.RE
731
732.sp
733.ne 2
734.na
735\fBzfs_no_scrub_prefetch\fR (int)
736.ad
737.RS 12n
738Set for no scrub prefetching
739.sp
740Use \fB1\fR for yes and \fB0\fR for no (default).
741.RE
742
743.sp
744.ne 2
745.na
746\fBzfs_nocacheflush\fR (int)
747.ad
748.RS 12n
749Disable cache flushes
750.sp
751Use \fB1\fR for yes and \fB0\fR for no (default).
752.RE
753
754.sp
755.ne 2
756.na
757\fBzfs_nopwrite_enabled\fR (int)
758.ad
759.RS 12n
760Enable NOP writes
761.sp
762Use \fB1\fR for yes (default) and \fB0\fR to disable.
763.RE
764
765.sp
766.ne 2
767.na
768\fBzfs_pd_blks_max\fR (int)
769.ad
770.RS 12n
771Max number of blocks to prefetch
772.sp
773Default value: \fB100\fR.
774.RE
775
776.sp
777.ne 2
778.na
779\fBzfs_prefetch_disable\fR (int)
780.ad
781.RS 12n
782Disable all ZFS prefetching
783.sp
784Use \fB1\fR for yes and \fB0\fR for no (default).
785.RE
786
787.sp
788.ne 2
789.na
790\fBzfs_read_chunk_size\fR (long)
791.ad
792.RS 12n
793Bytes to read per chunk
794.sp
795Default value: \fB1,048,576\fR.
796.RE
797
798.sp
799.ne 2
800.na
801\fBzfs_read_history\fR (int)
802.ad
803.RS 12n
804Historic statistics for the last N reads
805.sp
806Default value: \fB0\fR.
807.RE
808
809.sp
810.ne 2
811.na
812\fBzfs_read_history_hits\fR (int)
813.ad
814.RS 12n
815Include cache hits in read history
816.sp
817Use \fB1\fR for yes and \fB0\fR for no (default).
818.RE
819
820.sp
821.ne 2
822.na
823\fBzfs_recover\fR (int)
824.ad
825.RS 12n
826Set to attempt to recover from fatal errors. This should only be used as a
827last resort, as it typically results in leaked space, or worse.
828.sp
829Use \fB1\fR for yes and \fB0\fR for no (default).
830.RE
831
832.sp
833.ne 2
834.na
835\fBzfs_resilver_delay\fR (int)
836.ad
837.RS 12n
838Number of ticks to delay resilver
839.sp
840Default value: \fB2\fR.
841.RE
842
843.sp
844.ne 2
845.na
846\fBzfs_resilver_min_time_ms\fR (int)
847.ad
848.RS 12n
849Min millisecs to resilver per txg
850.sp
851Default value: \fB3,000\fR.
852.RE
853
854.sp
855.ne 2
856.na
857\fBzfs_scan_idle\fR (int)
858.ad
859.RS 12n
860Idle window in clock ticks
861.sp
862Default value: \fB50\fR.
863.RE
864
865.sp
866.ne 2
867.na
868\fBzfs_scan_min_time_ms\fR (int)
869.ad
870.RS 12n
871Min millisecs to scrub per txg
872.sp
873Default value: \fB1,000\fR.
874.RE
875
876.sp
877.ne 2
878.na
879\fBzfs_scrub_delay\fR (int)
880.ad
881.RS 12n
882Number of ticks to delay scrub
883.sp
884Default value: \fB4\fR.
885.RE
886
887.sp
888.ne 2
889.na
890\fBzfs_send_corrupt_data\fR (int)
891.ad
892.RS 12n
893Allow sending of corrupt data (ignore read/checksum errors when sending data)
894.sp
895Use \fB1\fR for yes and \fB0\fR for no (default).
896.RE
897
898.sp
899.ne 2
900.na
901\fBzfs_sync_pass_deferred_free\fR (int)
902.ad
903.RS 12n
904Defer frees starting in this pass
905.sp
906Default value: \fB2\fR.
907.RE
908
909.sp
910.ne 2
911.na
912\fBzfs_sync_pass_dont_compress\fR (int)
913.ad
914.RS 12n
915Don't compress starting in this pass
916.sp
917Default value: \fB5\fR.
918.RE
919
920.sp
921.ne 2
922.na
923\fBzfs_sync_pass_rewrite\fR (int)
924.ad
925.RS 12n
926Rewrite new bps starting in this pass
927.sp
928Default value: \fB2\fR.
929.RE
930
931.sp
932.ne 2
933.na
934\fBzfs_top_maxinflight\fR (int)
935.ad
936.RS 12n
937Max I/Os per top-level vdev
938.sp
939Default value: \fB32\fR.
940.RE
941
942.sp
943.ne 2
944.na
945\fBzfs_txg_history\fR (int)
946.ad
947.RS 12n
948Historic statistics for the last N txgs
949.sp
950Default value: \fB0\fR.
951.RE
952
953.sp
954.ne 2
955.na
956\fBzfs_txg_timeout\fR (int)
957.ad
958.RS 12n
959Max seconds worth of delta per txg
960.sp
961Default value: \fB5\fR.
962.RE
963
964.sp
965.ne 2
966.na
967\fBzfs_vdev_aggregation_limit\fR (int)
968.ad
969.RS 12n
970Max vdev I/O aggregation size
971.sp
972Default value: \fB131,072\fR.
973.RE
974
975.sp
976.ne 2
977.na
978\fBzfs_vdev_cache_bshift\fR (int)
979.ad
980.RS 12n
981Shift size to inflate reads to
982.sp
983Default value: \fB16\fR.
984.RE
985
986.sp
987.ne 2
988.na
989\fBzfs_vdev_cache_max\fR (int)
990.ad
991.RS 12n
992Inflate reads smaller than this value
993.RE
994
995.sp
996.ne 2
997.na
998\fBzfs_vdev_cache_size\fR (int)
999.ad
1000.RS 12n
1001Total size of the per-disk cache
1002.sp
1003Default value: \fB0\fR.
1004.RE
1005
1006.sp
1007.ne 2
1008.na
1009\fBzfs_vdev_mirror_switch_us\fR (int)
1010.ad
1011.RS 12n
1012Switch mirrors every N usecs
1013.sp
1014Default value: \fB10,000\fR.
1015.RE
1016
1017.sp
1018.ne 2
1019.na
1020\fBzfs_vdev_read_gap_limit\fR (int)
1021.ad
1022.RS 12n
1023Aggregate read I/O over gap
1024.sp
1025Default value: \fB32,768\fR.
1026.RE
1027
1028.sp
1029.ne 2
1030.na
1031\fBzfs_vdev_scheduler\fR (charp)
1032.ad
1033.RS 12n
1034I/O scheduler
1035.sp
1036Default value: \fBnoop\fR.
1037.RE
1038
1039.sp
1040.ne 2
1041.na
1042\fBzfs_vdev_write_gap_limit\fR (int)
1043.ad
1044.RS 12n
1045Aggregate write I/O over gap
1046.sp
1047Default value: \fB4,096\fR.
1048.RE
1049
1050.sp
1051.ne 2
1052.na
1053\fBzfs_zevent_cols\fR (int)
1054.ad
1055.RS 12n
1056Max event column width
1057.sp
1058Default value: \fB80\fR.
1059.RE
1060
1061.sp
1062.ne 2
1063.na
1064\fBzfs_zevent_console\fR (int)
1065.ad
1066.RS 12n
1067Log events to the console
1068.sp
1069Use \fB1\fR for yes and \fB0\fR for no (default).
1070.RE
1071
1072.sp
1073.ne 2
1074.na
1075\fBzfs_zevent_len_max\fR (int)
1076.ad
1077.RS 12n
1078Max event queue length
1079.sp
1080Default value: \fB0\fR.
1081.RE
1082
1083.sp
1084.ne 2
1085.na
1086\fBzil_replay_disable\fR (int)
1087.ad
1088.RS 12n
1089Disable intent logging replay
1090.sp
1091Use \fB1\fR for yes and \fB0\fR for no (default).
1092.RE
1093
1094.sp
1095.ne 2
1096.na
1097\fBzil_slog_limit\fR (ulong)
1098.ad
1099.RS 12n
1100Max commit bytes to separate log device
1101.sp
1102Default value: \fB1,048,576\fR.
1103.RE
1104
1105.sp
1106.ne 2
1107.na
1108\fBzio_bulk_flags\fR (int)
1109.ad
1110.RS 12n
1111Additional flags to pass to bulk buffers
1112.sp
1113Default value: \fB0\fR.
1114.RE
1115
1116.sp
1117.ne 2
1118.na
1119\fBzio_delay_max\fR (int)
1120.ad
1121.RS 12n
1122Max zio millisec delay before posting event
1123.sp
1124Default value: \fB30,000\fR.
1125.RE
1126
1127.sp
1128.ne 2
1129.na
1130\fBzio_injection_enabled\fR (int)
1131.ad
1132.RS 12n
1133Enable fault injection
1134.sp
1135Use \fB1\fR for yes and \fB0\fR for no (default).
1136.RE
1137
1138.sp
1139.ne 2
1140.na
1141\fBzio_requeue_io_start_cut_in_line\fR (int)
1142.ad
1143.RS 12n
1144Prioritize requeued I/O
1145.sp
1146Default value: \fB0\fR.
1147.RE
1148
1149.sp
1150.ne 2
1151.na
1152\fBzvol_inhibit_dev\fR (uint)
1153.ad
1154.RS 12n
1155Do not create zvol device nodes
1156.sp
1157Use \fB1\fR for yes and \fB0\fR for no (default).
1158.RE
1159
1160.sp
1161.ne 2
1162.na
1163\fBzvol_major\fR (uint)
1164.ad
1165.RS 12n
1166Major number for zvol device
1167.sp
1168Default value: \fB230\fR.
1169.RE
1170
1171.sp
1172.ne 2
1173.na
1174\fBzvol_max_discard_blocks\fR (ulong)
1175.ad
1176.RS 12n
1177Max number of blocks to discard at once
1178.sp
1179Default value: \fB16,384\fR.
1180.RE
1181
1182.sp
1183.ne 2
1184.na
1185\fBzvol_threads\fR (uint)
1186.ad
1187.RS 12n
1188Number of threads for zvol device
1189.sp
1190Default value: \fB32\fR.
1191.RE
1192
1193.SH ZFS I/O SCHEDULER
1194ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
1195The I/O scheduler determines when and in what order those operations are
1196issued. The I/O scheduler divides operations into five I/O classes
1197prioritized in the following order: sync read, sync write, async read,
1198async write, and scrub/resilver. Each queue defines the minimum and
1199maximum number of concurrent operations that may be issued to the
1200device. In addition, the device has an aggregate maximum,
1201\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
1202must not exceed the aggregate maximum. If the sum of the per-queue
1203maximums exceeds the aggregate maximum, then the number of active I/Os
1204may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
1205be issued regardless of whether all per-queue minimums have been met.
1206.sp
1207For many physical devices, throughput increases with the number of
1208concurrent operations, but latency typically suffers. Further, physical
1209devices typically have a limit at which more concurrent operations have no
1210effect on throughput or can actually cause it to decrease.
1211.sp
1212The scheduler selects the next operation to issue by first looking for an
1213I/O class whose minimum has not been satisfied. Once all are satisfied and
1214the aggregate maximum has not been hit, the scheduler looks for classes
1215whose maximum has not been satisfied. Iteration through the I/O classes is
1216done in the order specified above. No further operations are issued if the
1217aggregate maximum number of concurrent operations has been hit or if there
1218are no operations queued for an I/O class that has not hit its maximum.
1219Every time an I/O is queued or an operation completes, the I/O scheduler
1220looks for new operations to issue.
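.sp
The selection policy just described can be sketched in C as follows. This is a
simplified illustration of the documented behavior, not the actual ZFS
implementation; the type and function names are invented for the example:
.nf

    enum io_class_id {
            CLASS_SYNC_READ,        /* highest priority */
            CLASS_SYNC_WRITE,
            CLASS_ASYNC_READ,
            CLASS_ASYNC_WRITE,
            CLASS_SCRUB,            /* lowest priority */
            CLASS_COUNT
    };

    struct io_class {
            int     active;         /* I/Os currently issued to the device */
            int     queued;         /* I/Os waiting in this class's queue */
            int     min_active;     /* e.g. zfs_vdev_sync_read_min_active */
            int     max_active;     /* e.g. zfs_vdev_sync_read_max_active */
    };

    /*
     * Return the class to issue the next I/O from, or -1 if nothing may be
     * issued.  total_active is compared against zfs_vdev_max_active.
     */
    int
    pick_class(const struct io_class c[CLASS_COUNT], int total_active,
        int vdev_max_active)
    {
            int i;

            if (total_active >= vdev_max_active)
                    return (-1);    /* aggregate maximum reached */

            /* First pass: classes that have not met their minimum. */
            for (i = 0; i < CLASS_COUNT; i++)
                    if (c[i].queued > 0 && c[i].active < c[i].min_active)
                            return (i);

            /* Second pass: classes that are still below their maximum. */
            for (i = 0; i < CLASS_COUNT; i++)
                    if (c[i].queued > 0 && c[i].active < c[i].max_active)
                            return (i);

            return (-1);    /* every queued class is at its maximum */
    }

.fi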
1221.sp
1222In general, smaller max_active's will lead to lower latency of synchronous
1223operations. Larger max_active's may lead to higher overall throughput,
1224depending on underlying storage.
1225.sp
1226The ratio of the queues' max_actives determines the balance of performance
1227between reads, writes, and scrubs. E.g., increasing
1228\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
1229more quickly, but reads and writes to have higher latency and lower throughput.
1230.sp
1231All I/O classes have a fixed maximum number of outstanding operations
1232except for the async write class. Asynchronous writes represent the data
1233that is committed to stable storage during the syncing stage for
1234transaction groups. Transaction groups enter the syncing state
1235periodically so the number of queued async writes will quickly burst up
1236and then bleed down to zero. Rather than servicing them as quickly as
1237possible, the I/O scheduler changes the maximum number of active async
1238write I/Os according to the amount of dirty data in the pool. Since
1239both throughput and latency typically increase with the number of
1240concurrent operations issued to physical devices, reducing the
1241burstiness in the number of concurrent operations also stabilizes the
1242response time of operations from other -- and in particular synchronous
1243-- queues. In broad strokes, the I/O scheduler will issue more
1244concurrent operations from the async write queue as there's more dirty
1245data in the pool.
1246.sp
1247Async Writes
1248.sp
1249The number of concurrent operations issued for the async write I/O class
1250follows a piece-wise linear function defined by a few adjustable points.
1251.nf
1252
1253 | o---------| <-- zfs_vdev_async_write_max_active
1254 ^ | /^ |
1255 | | / | |
1256active | / | |
1257 I/O | / | |
1258count | / | |
1259 | / | |
1260 |-------o | | <-- zfs_vdev_async_write_min_active
1261 0|_______^______|_________|
1262 0% | | 100% of zfs_dirty_data_max
1263 | |
1264 | `-- zfs_vdev_async_write_active_max_dirty_percent
1265 `--------- zfs_vdev_async_write_active_min_dirty_percent
1266
1267.fi
1268Until the amount of dirty data exceeds a minimum percentage of the dirty
1269data allowed in the pool, the I/O scheduler will limit the number of
1270concurrent operations to the minimum. As that threshold is crossed, the
1271number of concurrent operations issued increases linearly to the maximum at
1272the specified maximum percentage of the dirty data allowed in the pool.
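.sp
A minimal C sketch of this piece-wise linear function is shown below. It only
mirrors the description above (clamp to the minimum below the lower threshold,
clamp to the maximum above the upper threshold, and interpolate linearly in
between); the function and argument names are illustrative, not the ZFS
implementation:
.nf

    /*
     * dirty_pct:  current dirty data as a percentage of zfs_dirty_data_max
     * min_pct:    zfs_vdev_async_write_active_min_dirty_percent
     * max_pct:    zfs_vdev_async_write_active_max_dirty_percent
     * min_active: zfs_vdev_async_write_min_active
     * max_active: zfs_vdev_async_write_max_active
     */
    int
    async_write_max_active(int dirty_pct, int min_pct, int max_pct,
        int min_active, int max_active)
    {
            if (dirty_pct <= min_pct)
                    return (min_active);
            if (dirty_pct >= max_pct)
                    return (max_active);

            /* Linear interpolation between the two fixed points. */
            return (min_active + (max_active - min_active) *
                (dirty_pct - min_pct) / (max_pct - min_pct));
    }

.fi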
1273.sp
1274Ideally, the amount of dirty data on a busy pool will stay in the sloped
1275part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
1276and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
1277maximum percentage, this indicates that the rate of incoming data is
1278greater than the rate that the backend storage can handle. In this case, we
1279must further throttle incoming writes, as described in the next section.
1280
1281.SH ZFS TRANSACTION DELAY
1282We delay transactions when we've determined that the backend storage
1283isn't able to accommodate the rate of incoming writes.
1284.sp
1285If there is already a transaction waiting, we delay relative to when
1286that transaction will finish waiting. This way the calculated delay time
1287is independent of the number of threads concurrently executing
1288transactions.
1289.sp
1290If we are the only waiter, wait relative to when the transaction
1291started, rather than the current time. This credits the transaction for
1292"time already served", e.g. reading indirect blocks.
1293.sp
1294The minimum time for a transaction to take is calculated as:
1295.nf
1296 min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
1297 min_time is then capped at 100 milliseconds.
1298.fi
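.sp
A C sketch of this calculation follows. It simply restates the formula and the
100 millisecond cap; the function name, the argument names, and the
interpretation of the result as nanoseconds (consistent with the 500us
midpoint shown in the curves below for the default scale of 500,000) are
illustrative rather than a copy of the ZFS code:
.nf

    #include <stdint.h>

    /*
     * dirty: outstanding dirty data, in bytes
     * min:   the dirty level at which delays begin, i.e.
     *        zfs_delay_min_dirty_percent of zfs_dirty_data_max
     * max:   zfs_dirty_data_max
     */
    uint64_t
    transaction_delay_ns(uint64_t dirty, uint64_t min, uint64_t max,
        uint64_t delay_scale)
    {
            uint64_t cap_ns = 100ULL * 1000 * 1000;  /* 100 milliseconds */
            uint64_t min_time;

            if (dirty <= min)
                    return (0);      /* below the threshold: no delay */
            if (dirty >= max)
                    return (cap_ns); /* avoid division by zero at the limit */

            /*
             * At the midpoint between min and max the quotient is 1, so the
             * delay equals delay_scale; it grows without bound as dirty
             * approaches max, hence the cap.
             */
            min_time = delay_scale * (dirty - min) / (max - dirty);

            return (min_time < cap_ns ? min_time : cap_ns);
    }

.fi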
1299.sp
1300The delay has two degrees of freedom that can be adjusted via tunables. The
1301percentage of dirty data at which we start to delay is defined by
1302\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
1303\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
1304delay after writing at full speed has failed to keep up with the incoming write
1305rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
1306this variable determines the amount of delay at the midpoint of the curve.
1307.sp
1308.nf
1309delay
1310 10ms +-------------------------------------------------------------*+
1311 | *|
1312 9ms + *+
1313 | *|
1314 8ms + *+
1315 | * |
1316 7ms + * +
1317 | * |
1318 6ms + * +
1319 | * |
1320 5ms + * +
1321 | * |
1322 4ms + * +
1323 | * |
1324 3ms + * +
1325 | * |
1326 2ms + (midpoint) * +
1327 | | ** |
1328 1ms + v *** +
1329 | zfs_delay_scale ----------> ******** |
1330 0 +-------------------------------------*********----------------+
1331 0% <- zfs_dirty_data_max -> 100%
1332.fi
1333.sp
1334Note that since the delay is added to the outstanding time remaining on the
1335most recent transaction, the delay is effectively the inverse of IOPS.
1336Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
1337was chosen such that small changes in the amount of accumulated dirty data
1338in the first 3/4 of the curve yield relatively small differences in the
1339amount of delay.
1340.sp
1341The effects can be easier to understand when the amount of delay is
1342represented on a log scale:
1343.sp
1344.nf
1345delay
1346100ms +-------------------------------------------------------------++
1347 + +
1348 | |
1349 + *+
1350 10ms + *+
1351 + ** +
1352 | (midpoint) ** |
1353 + | ** +
1354 1ms + v **** +
1355 + zfs_delay_scale ----------> ***** +
1356 | **** |
1357 + **** +
1358100us + ** +
1359 + * +
1360 | * |
1361 + * +
1362 10us + * +
1363 + +
1364 | |
1365 + +
1366 +--------------------------------------------------------------+
1367 0% <- zfs_dirty_data_max -> 100%
1368.fi
1369.sp
1370Note here that only as the amount of dirty data approaches its limit does
1371the delay start to increase rapidly. The goal of a properly tuned system
1372should be to keep the amount of dirty data out of that range by first
1373ensuring that the appropriate limits are set for the I/O scheduler to reach
1374optimal throughput on the backend storage, and then by changing the value
1375of \fBzfs_delay_scale\fR to increase the steepness of the curve.