1 '\" te
2 .\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
3 .\" The contents of this file are subject to the terms of the Common Development
4 .\" and Distribution License (the "License"). You may not use this file except
5 .\" in compliance with the License. You can obtain a copy of the license at
6 .\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
7 .\"
8 .\" See the License for the specific language governing permissions and
9 .\" limitations under the License. When distributing Covered Code, include this
10 .\" CDDL HEADER in each file and include the License file at
11 .\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
12 .\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
13 .\" own identifying information:
14 .\" Portions Copyright [yyyy] [name of copyright owner]
15 .TH ZFS-MODULE-PARAMETERS 5 "Nov 16, 2013"
16 .SH NAME
17 zfs\-module\-parameters \- ZFS module parameters
18 .SH DESCRIPTION
19 .sp
20 .LP
21 Description of the different parameters to the ZFS module.
22
23 .SS "Module parameters"
24 .sp
25 .LP
26
27 .sp
28 .ne 2
29 .na
30 \fBl2arc_feed_again\fR (int)
31 .ad
32 .RS 12n
33 Turbo L2ARC warmup
34 .sp
35 Use \fB1\fR for yes (default) and \fB0\fR to disable.
36 .RE
37
38 .sp
39 .ne 2
40 .na
41 \fBl2arc_feed_min_ms\fR (ulong)
42 .ad
43 .RS 12n
44 Min feed interval in milliseconds
45 .sp
46 Default value: \fB200\fR.
47 .RE
48
49 .sp
50 .ne 2
51 .na
52 \fBl2arc_feed_secs\fR (ulong)
53 .ad
54 .RS 12n
55 Seconds between L2ARC writing
56 .sp
57 Default value: \fB1\fR.
58 .RE
59
60 .sp
61 .ne 2
62 .na
63 \fBl2arc_headroom\fR (ulong)
64 .ad
65 .RS 12n
66 Number of max device writes to precache
67 .sp
68 Default value: \fB2\fR.
69 .RE
70
71 .sp
72 .ne 2
73 .na
74 \fBl2arc_headroom_boost\fR (ulong)
75 .ad
76 .RS 12n
77 Compressed l2arc_headroom multiplier
78 .sp
79 Default value: \fB200\fR.
80 .RE
81
82 .sp
83 .ne 2
84 .na
85 \fBl2arc_nocompress\fR (int)
86 .ad
87 .RS 12n
88 Skip compressing L2ARC buffers
89 .sp
90 Use \fB1\fR for yes and \fB0\fR for no (default).
91 .RE
92
93 .sp
94 .ne 2
95 .na
96 \fBl2arc_noprefetch\fR (int)
97 .ad
98 .RS 12n
99 Skip caching prefetched buffers
100 .sp
101 Use \fB1\fR for yes (default) and \fB0\fR to disable.
102 .RE
103
104 .sp
105 .ne 2
106 .na
107 \fBl2arc_norw\fR (int)
108 .ad
109 .RS 12n
110 No reads during writes
111 .sp
112 Use \fB1\fR for yes and \fB0\fR for no (default).
113 .RE
114
115 .sp
116 .ne 2
117 .na
118 \fBl2arc_write_boost\fR (ulong)
119 .ad
120 .RS 12n
121 Extra write bytes during device warmup
122 .sp
123 Default value: \fB8,388,608\fR.
124 .RE
125
126 .sp
127 .ne 2
128 .na
129 \fBl2arc_write_max\fR (ulong)
130 .ad
131 .RS 12n
132 Max write bytes per interval
133 .sp
134 Default value: \fB8,388,608\fR.
135 .RE
136
137 .sp
138 .ne 2
139 .na
140 \fBmetaslab_bias_enabled\fR (int)
141 .ad
142 .RS 12n
143 Enable metaslab group biasing based on its vdev's over- or under-utilization
144 relative to the pool.
145 .sp
146 Use \fB1\fR for yes (default) and \fB0\fR for no.
147 .RE
148
149 .sp
150 .ne 2
151 .na
152 \fBmetaslab_debug_load\fR (int)
153 .ad
154 .RS 12n
155 Load all metaslabs during pool import.
156 .sp
157 Use \fB1\fR for yes and \fB0\fR for no (default).
158 .RE
159
160 .sp
161 .ne 2
162 .na
163 \fBmetaslab_debug_unload\fR (int)
164 .ad
165 .RS 12n
166 Prevent metaslabs from being unloaded.
167 .sp
168 Use \fB1\fR for yes and \fB0\fR for no (default).
169 .RE
170
171 .sp
172 .ne 2
173 .na
174 \fBmetaslab_fragmentation_factor_enabled\fR (int)
175 .ad
176 .RS 12n
177 Enable use of the fragmentation metric in computing metaslab weights.
178 .sp
179 Use \fB1\fR for yes (default) and \fB0\fR for no.
180 .RE
181
182 .sp
183 .ne 2
184 .na
185 \fBmetaslab_preload_enabled\fR (int)
186 .ad
187 .RS 12n
188 Enable metaslab group preloading.
189 .sp
190 Use \fB1\fR for yes (default) and \fB0\fR for no.
191 .RE
192
193 .sp
194 .ne 2
195 .na
196 \fBmetaslab_lba_weighting_enabled\fR (int)
197 .ad
198 .RS 12n
199 Give more weight to metaslabs with lower LBAs, assuming they have
200 greater bandwidth as is typically the case on a modern constant
201 angular velocity disk drive.
202 .sp
203 Use \fB1\fR for yes (default) and \fB0\fR for no.
204 .RE
205
206 .sp
207 .ne 2
208 .na
209 \fBspa_config_path\fR (charp)
210 .ad
211 .RS 12n
212 SPA config file
213 .sp
214 Default value: \fB/etc/zfs/zpool.cache\fR.
215 .RE
216
217 .sp
218 .ne 2
219 .na
220 \fBspa_asize_inflation\fR (int)
221 .ad
222 .RS 12n
223 Multiplication factor used to estimate actual disk consumption from the
224 size of data being written. The default value is a worst case estimate,
225 but lower values may be valid for a given pool depending on its
226 configuration. Pool administrators who understand the factors involved
227 may wish to specify a more realistic inflation factor, particularly if
228 they operate close to quota or capacity limits.
229 .sp
230 Default value: \fB24\fR.
231 .RE
232
233 .sp
234 .ne 2
235 .na
236 \fBspa_load_verify_data\fR (int)
237 .ad
238 .RS 12n
239 Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR)
240 import. Use 0 to disable and 1 to enable.
241
242 An extreme rewind import normally performs a full traversal of all
243 blocks in the pool for verification. If this parameter is set to 0,
244 the traversal skips non-metadata blocks. It can be toggled once the
245 import has started to stop or start the traversal of non-metadata blocks.
246 .sp
247 Default value: \fB1\fR.
248 .RE
249
250 .sp
251 .ne 2
252 .na
253 \fBspa_load_verify_metadata\fR (int)
254 .ad
255 .RS 12n
256 Whether to traverse blocks during an "extreme rewind" (\fB-X\fR)
257 pool import. Use 0 to disable and 1 to enable.
258
259 An extreme rewind import normally performs a full traversal of all
260 blocks in the pool for verification. If this parameter is set to 0,
261 the traversal is not performed. It can be toggled once the import has
262 started to stop or start the traversal.
263 .sp
264 Default value: \fB1\fR.
265 .RE
266
267 .sp
268 .ne 2
269 .na
270 \fBspa_load_verify_maxinflight\fR (int)
271 .ad
272 .RS 12n
273 Maximum concurrent I/Os during the traversal performed during an "extreme
274 rewind" (\fB-X\fR) pool import.
275 .sp
276 Default value: \fB10,000\fR.
277 .RE
278
279 .sp
280 .ne 2
281 .na
282 \fBzfetch_array_rd_sz\fR (ulong)
283 .ad
284 .RS 12n
285 If prefetching is enabled, disable prefetching for reads larger than this size.
286 .sp
287 Default value: \fB1,048,576\fR.
288 .RE
289
290 .sp
291 .ne 2
292 .na
293 \fBzfetch_block_cap\fR (uint)
294 .ad
295 .RS 12n
296 Max number of blocks to prefetch at a time
297 .sp
298 Default value: \fB256\fR.
299 .RE
300
301 .sp
302 .ne 2
303 .na
304 \fBzfetch_max_streams\fR (uint)
305 .ad
306 .RS 12n
307 Max number of streams per zfetch (prefetch streams per file).
308 .sp
309 Default value: \fB8\fR.
310 .RE
311
312 .sp
313 .ne 2
314 .na
315 \fBzfetch_min_sec_reap\fR (uint)
316 .ad
317 .RS 12n
318 Min time in seconds before an active prefetch stream can be reclaimed
319 .sp
320 Default value: \fB2\fR.
321 .RE
322
323 .sp
324 .ne 2
325 .na
326 \fBzfs_arc_average_blocksize\fR (int)
327 .ad
328 .RS 12n
329 The ARC's buffer hash table is sized based on the assumption of an average
330 block size of \fBzfs_arc_average_blocksize\fR (default 8K). This works out
331 to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers.
332 For configurations with a known larger average block size this value can be
333 increased to reduce the memory footprint.
334
335 .sp
336 Default value: \fB8192\fR.
337 .RE
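.sp
The "roughly 1MB of hash table per 1GB of physical memory" figure quoted above
can be reproduced with a short back-of-the-envelope sketch. This is an
illustration only, not the in-kernel sizing code, and the variable names are
invented for the example:
.sp
.nf
# One hash-table pointer per assumed average-sized block in memory.
physmem_bytes = 1 << 30                  # 1 GB of physical memory
zfs_arc_average_blocksize = 8192         # default average block size
pointer_size = 8                         # 8-byte pointers

buckets = physmem_bytes // zfs_arc_average_blocksize    # 131072 buckets
table_bytes = buckets * pointer_size                    # 1048576 bytes, ~1MB
print(table_bytes)
.fi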
338
339 .sp
340 .ne 2
341 .na
342 \fBzfs_arc_grow_retry\fR (int)
343 .ad
344 .RS 12n
345 Seconds before growing arc size
346 .sp
347 Default value: \fB5\fR.
348 .RE
349
350 .sp
351 .ne 2
352 .na
353 \fBzfs_arc_max\fR (ulong)
354 .ad
355 .RS 12n
356 Max arc size
357 .sp
358 Default value: \fB0\fR.
359 .RE
360
361 .sp
362 .ne 2
363 .na
364 \fBzfs_arc_memory_throttle_disable\fR (int)
365 .ad
366 .RS 12n
367 Disable memory throttle
368 .sp
369 Use \fB1\fR for yes (default) and \fB0\fR to disable.
370 .RE
371
372 .sp
373 .ne 2
374 .na
375 \fBzfs_arc_meta_limit\fR (ulong)
376 .ad
377 .RS 12n
378 Meta limit for arc size
379 .sp
380 Default value: \fB0\fR.
381 .RE
382
383 .sp
384 .ne 2
385 .na
386 \fBzfs_arc_meta_prune\fR (int)
387 .ad
388 .RS 12n
389 Bytes of metadata to prune
390 .sp
391 Default value: \fB1,048,576\fR.
392 .RE
393
394 .sp
395 .ne 2
396 .na
397 \fBzfs_arc_min\fR (ulong)
398 .ad
399 .RS 12n
400 Min arc size
401 .sp
402 Default value: \fB100\fR.
403 .RE
404
405 .sp
406 .ne 2
407 .na
408 \fBzfs_arc_min_prefetch_lifespan\fR (int)
409 .ad
410 .RS 12n
411 Min life of prefetch block
412 .sp
413 Default value: \fB100\fR.
414 .RE
415
416 .sp
417 .ne 2
418 .na
419 \fBzfs_arc_p_aggressive_disable\fR (int)
420 .ad
421 .RS 12n
422 Disable aggressive arc_p growth
423 .sp
424 Use \fB1\fR for yes (default) and \fB0\fR to disable.
425 .RE
426
427 .sp
428 .ne 2
429 .na
430 \fBzfs_arc_p_dampener_disable\fR (int)
431 .ad
432 .RS 12n
433 Disable arc_p adapt dampener
434 .sp
435 Use \fB1\fR for yes (default) and \fB0\fR to disable.
436 .RE
437
438 .sp
439 .ne 2
440 .na
441 \fBzfs_arc_shrink_shift\fR (int)
442 .ad
443 .RS 12n
444 log2(fraction of arc to reclaim)
445 .sp
446 Default value: \fB5\fR.
447 .RE
448
449 .sp
450 .ne 2
451 .na
452 \fBzfs_autoimport_disable\fR (int)
453 .ad
454 .RS 12n
455 Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR).
456 .sp
457 Use \fB1\fR for yes and \fB0\fR for no (default).
458 .RE
459
460 .sp
461 .ne 2
462 .na
463 \fBzfs_dbuf_state_index\fR (int)
464 .ad
465 .RS 12n
466 Calculate arc header index
467 .sp
468 Default value: \fB0\fR.
469 .RE
470
471 .sp
472 .ne 2
473 .na
474 \fBzfs_deadman_enabled\fR (int)
475 .ad
476 .RS 12n
477 Enable deadman timer
478 .sp
479 Use \fB1\fR for yes (default) and \fB0\fR to disable.
480 .RE
481
482 .sp
483 .ne 2
484 .na
485 \fBzfs_deadman_synctime_ms\fR (ulong)
486 .ad
487 .RS 12n
488 Expiration time in milliseconds. This value has two meanings. First, it is
489 used to determine when the spa_deadman() logic should fire. By default,
490 spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
491 Second, the value determines whether an I/O is considered "hung". Any I/O that
492 has not completed within \fBzfs_deadman_synctime_ms\fR is considered "hung", and
493 a zevent is logged for it.
494 .sp
495 Default value: \fB1,000,000\fR.
496 .RE
497
498 .sp
499 .ne 2
500 .na
501 \fBzfs_dedup_prefetch\fR (int)
502 .ad
503 .RS 12n
504 Enable prefetching of deduplicated blocks
505 .sp
506 Use \fB1\fR for yes and \fB0\fR to disable (default).
507 .RE
508
509 .sp
510 .ne 2
511 .na
512 \fBzfs_delay_min_dirty_percent\fR (int)
513 .ad
514 .RS 12n
515 Start to delay each transaction once there is this amount of dirty data,
516 expressed as a percentage of \fBzfs_dirty_data_max\fR.
517 This value should be >= \fBzfs_vdev_async_write_active_max_dirty_percent\fR.
518 See the section "ZFS TRANSACTION DELAY".
519 .sp
520 Default value: \fB60\fR.
521 .RE
522
523 .sp
524 .ne 2
525 .na
526 \fBzfs_delay_scale\fR (int)
527 .ad
528 .RS 12n
529 This controls how quickly the transaction delay approaches infinity.
530 Larger values cause longer delays for a given amount of dirty data.
531 .sp
532 For the smoothest delay, this value should be about 1 billion divided by the
533 maximum number of operations per second the backend storage can sustain (for
534 example, 2,000 writes per second gives the default of 500,000). The throttle will then smoothly handle between 10x and 1/10th this number.
535 .sp
536 See the section "ZFS TRANSACTION DELAY".
537 .sp
538 Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
539 .sp
540 Default value: \fB500,000\fR.
541 .RE
542
543 .sp
544 .ne 2
545 .na
546 \fBzfs_dirty_data_max\fR (int)
547 .ad
548 .RS 12n
549 Determines the dirty space limit in bytes. Once this limit is exceeded, new
550 writes are halted until space frees up. This parameter takes precedence
551 over \fBzfs_dirty_data_max_percent\fR.
552 See the section "ZFS TRANSACTION DELAY".
553 .sp
554 Default value: 10 percent of all memory, capped at \fBzfs_dirty_data_max_max\fR.
555 .RE
556
557 .sp
558 .ne 2
559 .na
560 \fBzfs_dirty_data_max_max\fR (int)
561 .ad
562 .RS 12n
563 Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
564 This limit is only enforced at module load time, and will be ignored if
565 \fBzfs_dirty_data_max\fR is later changed. This parameter takes
566 precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
567 "ZFS TRANSACTION DELAY".
568 .sp
569 Default value: 25% of physical RAM.
570 .RE
571
572 .sp
573 .ne 2
574 .na
575 \fBzfs_dirty_data_max_max_percent\fR (int)
576 .ad
577 .RS 12n
578 Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
579 percentage of physical RAM. This limit is only enforced at module load
580 time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
581 The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
582 one. See the section "ZFS TRANSACTION DELAY".
583 .sp
584 Default value: \fB25\fR.
585 .RE
586
587 .sp
588 .ne 2
589 .na
590 \fBzfs_dirty_data_max_percent\fR (int)
591 .ad
592 .RS 12n
593 Determines the dirty space limit, expressed as a percentage of all
594 memory. Once this limit is exceeded, new writes are halted until space frees
595 up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
596 one. See the section "ZFS TRANSACTION DELAY".
597 .sp
598 Default value: 10%, subject to \fBzfs_dirty_data_max_max\fR.
599 .RE
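.sp
To make the interplay of the dirty data tunables above concrete, the following
is a hedged sketch of how the effective limit falls out of the defaults and
the precedence rules described in the preceding entries. It is not the
module's own code, and the function name is invented:
.sp
.nf
def effective_dirty_data_max(physmem_bytes,
                             zfs_dirty_data_max=None,
                             zfs_dirty_data_max_percent=10,
                             zfs_dirty_data_max_max=None,
                             zfs_dirty_data_max_max_percent=25):
    # The cap defaults to 25% of RAM and is only applied at module load time.
    if zfs_dirty_data_max_max is None:
        zfs_dirty_data_max_max = physmem_bytes * zfs_dirty_data_max_max_percent // 100
    # An explicit byte value takes precedence over the percentage.
    if zfs_dirty_data_max is None:
        zfs_dirty_data_max = physmem_bytes * zfs_dirty_data_max_percent // 100
    return min(zfs_dirty_data_max, zfs_dirty_data_max_max)

# With 16 GiB of RAM: 10% is about 1.6 GiB, well under the 4 GiB cap.
print(effective_dirty_data_max(16 * 2**30))
.fi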
600
601 .sp
602 .ne 2
603 .na
604 \fBzfs_dirty_data_sync\fR (int)
605 .ad
606 .RS 12n
607 Start syncing out a transaction group if there is at least this much dirty data.
608 .sp
609 Default value: \fB67,108,864\fR.
610 .RE
611
612 .sp
613 .ne 2
614 .na
615 \fBzfs_vdev_async_read_max_active\fR (int)
616 .ad
617 .RS 12n
618 Maximum asynchronous read I/Os active to each device.
619 See the section "ZFS I/O SCHEDULER".
620 .sp
621 Default value: \fB3\fR.
622 .RE
623
624 .sp
625 .ne 2
626 .na
627 \fBzfs_vdev_async_read_min_active\fR (int)
628 .ad
629 .RS 12n
630 Minimum asynchronous read I/Os active to each device.
631 See the section "ZFS I/O SCHEDULER".
632 .sp
633 Default value: \fB1\fR.
634 .RE
635
636 .sp
637 .ne 2
638 .na
639 \fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
640 .ad
641 .RS 12n
642 When the pool has more than
643 \fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
644 \fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
645 the dirty data is between min and max, the active I/O limit is linearly
646 interpolated. See the section "ZFS I/O SCHEDULER".
647 .sp
648 Default value: \fB60\fR.
649 .RE
650
651 .sp
652 .ne 2
653 .na
654 \fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
655 .ad
656 .RS 12n
657 When the pool has less than
658 \fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
659 \fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
660 the dirty data is between min and max, the active I/O limit is linearly
661 interpolated. See the section "ZFS I/O SCHEDULER".
662 .sp
663 Default value: \fB30\fR.
664 .RE
665
666 .sp
667 .ne 2
668 .na
669 \fBzfs_vdev_async_write_max_active\fR (int)
670 .ad
671 .RS 12n
672 Maximum asynchronous write I/Os active to each device.
673 See the section "ZFS I/O SCHEDULER".
674 .sp
675 Default value: \fB10\fR.
676 .RE
677
678 .sp
679 .ne 2
680 .na
681 \fBzfs_vdev_async_write_min_active\fR (int)
682 .ad
683 .RS 12n
684 Minimum asynchronous write I/Os active to each device.
685 See the section "ZFS I/O SCHEDULER".
686 .sp
687 Default value: \fB1\fR.
688 .RE
689
690 .sp
691 .ne 2
692 .na
693 \fBzfs_vdev_max_active\fR (int)
694 .ad
695 .RS 12n
696 The maximum number of I/Os active to each device. Ideally, this will be >=
697 the sum of each queue's max_active. It must be at least the sum of each
698 queue's min_active. See the section "ZFS I/O SCHEDULER".
699 .sp
700 Default value: \fB1,000\fR.
701 .RE
702
703 .sp
704 .ne 2
705 .na
706 \fBzfs_vdev_scrub_max_active\fR (int)
707 .ad
708 .RS 12n
709 Maximum scrub I/Os active to each device.
710 See the section "ZFS I/O SCHEDULER".
711 .sp
712 Default value: \fB2\fR.
713 .RE
714
715 .sp
716 .ne 2
717 .na
718 \fBzfs_vdev_scrub_min_active\fR (int)
719 .ad
720 .RS 12n
721 Minimum scrub I/Os active to each device.
722 See the section "ZFS I/O SCHEDULER".
723 .sp
724 Default value: \fB1\fR.
725 .RE
726
727 .sp
728 .ne 2
729 .na
730 \fBzfs_vdev_sync_read_max_active\fR (int)
731 .ad
732 .RS 12n
733 Maximum synchronous read I/Os active to each device.
734 See the section "ZFS I/O SCHEDULER".
735 .sp
736 Default value: \fB10\fR.
737 .RE
738
739 .sp
740 .ne 2
741 .na
742 \fBzfs_vdev_sync_read_min_active\fR (int)
743 .ad
744 .RS 12n
745 Minimum synchronous read I/Os active to each device.
746 See the section "ZFS I/O SCHEDULER".
747 .sp
748 Default value: \fB10\fR.
749 .RE
750
751 .sp
752 .ne 2
753 .na
754 \fBzfs_vdev_sync_write_max_active\fR (int)
755 .ad
756 .RS 12n
757 Maximum synchronous write I/Os active to each device.
758 See the section "ZFS I/O SCHEDULER".
759 .sp
760 Default value: \fB10\fR.
761 .RE
762
763 .sp
764 .ne 2
765 .na
766 \fBzfs_vdev_sync_write_min_active\fR (int)
767 .ad
768 .RS 12n
769 Minimum synchronous write I/Os active to each device.
770 See the section "ZFS I/O SCHEDULER".
771 .sp
772 Default value: \fB10\fR.
773 .RE
774
775 .sp
776 .ne 2
777 .na
778 \fBzfs_disable_dup_eviction\fR (int)
779 .ad
780 .RS 12n
781 Disable duplicate buffer eviction
782 .sp
783 Use \fB1\fR for yes and \fB0\fR for no (default).
784 .RE
785
786 .sp
787 .ne 2
788 .na
789 \fBzfs_expire_snapshot\fR (int)
790 .ad
791 .RS 12n
792 Seconds to expire .zfs/snapshot
793 .sp
794 Default value: \fB300\fR.
795 .RE
796
797 .sp
798 .ne 2
799 .na
800 \fBzfs_flags\fR (int)
801 .ad
802 .RS 12n
803 Set additional debugging flags
804 .sp
805 Default value: \fB1\fR.
806 .RE
807
808 .sp
809 .ne 2
810 .na
811 \fBzfs_free_leak_on_eio\fR (int)
812 .ad
813 .RS 12n
814 If destroy encounters an EIO while reading metadata (e.g. indirect
815 blocks), space referenced by the missing metadata cannot be freed.
816 Normally this causes the background destroy to become "stalled", as
817 it is unable to make forward progress. While in this stalled state,
818 all remaining space to free from the error-encountering filesystem is
819 "temporarily leaked". Set this flag to cause it to ignore the EIO,
820 permanently leak the space from indirect blocks that cannot be read,
821 and continue to free everything else that it can.
822
823 The default, "stalling" behavior is useful if the storage partially
824 fails (i.e. some but not all I/Os fail), and then later recovers. In
825 this case, we will be able to continue pool operations while it is
826 partially failed, and when it recovers, we can continue to free the
827 space, with no leaks. However, note that this case is actually
828 fairly rare.
829
830 Typically pools either (a) fail completely (but perhaps temporarily,
831 e.g. a top-level vdev going offline), or (b) have localized,
832 permanent errors (e.g. disk returns the wrong data due to bit flip or
833 firmware bug). In case (a), this setting does not matter because the
834 pool will be suspended and the sync thread will not be able to make
835 forward progress regardless. In case (b), because the error is
836 permanent, the best we can do is leak the minimum amount of space,
837 which is what setting this flag will do. Therefore, it is reasonable
838 for this flag to normally be set, but we chose the more conservative
839 approach of not setting it, so that there is no possibility of
840 leaking space in the "partial temporary" failure case.
841 .sp
842 Default value: \fB0\fR.
843 .RE
844
845 .sp
846 .ne 2
847 .na
848 \fBzfs_free_min_time_ms\fR (int)
849 .ad
850 .RS 12n
851 Min milliseconds to free per txg
852 .sp
853 Default value: \fB1,000\fR.
854 .RE
855
856 .sp
857 .ne 2
858 .na
859 \fBzfs_immediate_write_sz\fR (long)
860 .ad
861 .RS 12n
862 Largest data block to write to the ZIL
863 .sp
864 Default value: \fB32,768\fR.
865 .RE
866
867 .sp
868 .ne 2
869 .na
870 \fBzfs_mdcomp_disable\fR (int)
871 .ad
872 .RS 12n
873 Disable metadata compression
874 .sp
875 Use \fB1\fR for yes and \fB0\fR for no (default).
876 .RE
877
878 .sp
879 .ne 2
880 .na
881 \fBzfs_metaslab_fragmentation_threshold\fR (int)
882 .ad
883 .RS 12n
884 Allow metaslabs to keep their active state as long as their fragmentation
885 percentage is less than or equal to this value. An active metaslab that
886 exceeds this threshold will no longer keep its active status, allowing
887 better metaslabs to be selected.
888 .sp
889 Default value: \fB70\fR.
890 .RE
891
892 .sp
893 .ne 2
894 .na
895 \fBzfs_mg_fragmentation_threshold\fR (int)
896 .ad
897 .RS 12n
898 Metaslab groups are considered eligible for allocations if their
899 fragmentation metric (measured as a percentage) is less than or equal to
900 this value. If a metaslab group exceeds this threshold then it will be
901 skipped unless all metaslab groups within the metaslab class have also
902 crossed this threshold.
903 .sp
904 Default value: \fB85\fR.
905 .RE
906
907 .sp
908 .ne 2
909 .na
910 \fBzfs_mg_noalloc_threshold\fR (int)
911 .ad
912 .RS 12n
913 Defines a threshold at which metaslab groups should be eligible for
914 allocations. The value is expressed as a percentage of free space
915 beyond which a metaslab group is always eligible for allocations.
916 If a metaslab group's free space is less than or equal to the
917 threshold, the allocator will avoid allocating to that group
918 unless all groups in the pool have reached the threshold. Once all
919 groups have reached the threshold, all groups are allowed to accept
920 allocations. The default value of 0 disables the feature and causes
921 all metaslab groups to be eligible for allocations.
922
923 This parameter makes it possible to deal with pools having heavily imbalanced
924 vdevs such as would be the case when a new vdev has been added.
925 Setting the threshold to a non-zero percentage will stop allocations
926 from being made to vdevs that aren't filled to the specified percentage
927 and allow lesser filled vdevs to acquire more allocations than they
928 otherwise would under the old \fBzfs_mg_alloc_failures\fR facility.
929 .sp
930 Default value: \fB0\fR.
931 .RE
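.sp
A minimal sketch of the eligibility rule described in the
\fBzfs_mg_noalloc_threshold\fR entry above, assuming per-group free space
percentages are already known. It is illustrative only, not the allocator's
actual logic:
.sp
.nf
def eligible_groups(free_pct, zfs_mg_noalloc_threshold=0):
    # free_pct maps each metaslab group to its free space percentage.
    if zfs_mg_noalloc_threshold == 0:
        return list(free_pct)          # feature disabled: all groups eligible
    above = [g for g, pct in free_pct.items()
             if pct > zfs_mg_noalloc_threshold]
    # Once every group has crossed the threshold, all become eligible again.
    return above or list(free_pct)
.fi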
932
933 .sp
934 .ne 2
935 .na
936 \fBzfs_no_scrub_io\fR (int)
937 .ad
938 .RS 12n
939 Set to disable scrub I/O
940 .sp
941 Use \fB1\fR for yes and \fB0\fR for no (default).
942 .RE
943
944 .sp
945 .ne 2
946 .na
947 \fBzfs_no_scrub_prefetch\fR (int)
948 .ad
949 .RS 12n
950 Set to disable scrub prefetching
951 .sp
952 Use \fB1\fR for yes and \fB0\fR for no (default).
953 .RE
954
955 .sp
956 .ne 2
957 .na
958 \fBzfs_nocacheflush\fR (int)
959 .ad
960 .RS 12n
961 Disable cache flushes
962 .sp
963 Use \fB1\fR for yes and \fB0\fR for no (default).
964 .RE
965
966 .sp
967 .ne 2
968 .na
969 \fBzfs_nopwrite_enabled\fR (int)
970 .ad
971 .RS 12n
972 Enable NOP writes
973 .sp
974 Use \fB1\fR for yes (default) and \fB0\fR to disable.
975 .RE
976
977 .sp
978 .ne 2
979 .na
980 \fBzfs_pd_blks_max\fR (int)
981 .ad
982 .RS 12n
983 Max number of blocks to prefetch
984 .sp
985 Default value: \fB100\fR.
986 .RE
987
988 .sp
989 .ne 2
990 .na
991 \fBzfs_prefetch_disable\fR (int)
992 .ad
993 .RS 12n
994 Disable all ZFS prefetching
995 .sp
996 Use \fB1\fR for yes and \fB0\fR for no (default).
997 .RE
998
999 .sp
1000 .ne 2
1001 .na
1002 \fBzfs_read_chunk_size\fR (long)
1003 .ad
1004 .RS 12n
1005 Bytes to read per chunk
1006 .sp
1007 Default value: \fB1,048,576\fR.
1008 .RE
1009
1010 .sp
1011 .ne 2
1012 .na
1013 \fBzfs_read_history\fR (int)
1014 .ad
1015 .RS 12n
1016 Historic statistics for the last N reads
1017 .sp
1018 Default value: \fB0\fR.
1019 .RE
1020
1021 .sp
1022 .ne 2
1023 .na
1024 \fBzfs_read_history_hits\fR (int)
1025 .ad
1026 .RS 12n
1027 Include cache hits in read history
1028 .sp
1029 Use \fB1\fR for yes and \fB0\fR for no (default).
1030 .RE
1031
1032 .sp
1033 .ne 2
1034 .na
1035 \fBzfs_recover\fR (int)
1036 .ad
1037 .RS 12n
1038 Set to attempt to recover from fatal errors. This should only be used as a
1039 last resort, as it typically results in leaked space, or worse.
1040 .sp
1041 Use \fB1\fR for yes and \fB0\fR for no (default).
1042 .RE
1043
1044 .sp
1045 .ne 2
1046 .na
1047 \fBzfs_resilver_delay\fR (int)
1048 .ad
1049 .RS 12n
1050 Number of ticks to delay prior to issuing a resilver I/O operation when
1051 a non-resilver or non-scrub I/O operation has occurred within the past
1052 \fBzfs_scan_idle\fR ticks.
1053 .sp
1054 Default value: \fB2\fR.
1055 .RE
1056
1057 .sp
1058 .ne 2
1059 .na
1060 \fBzfs_resilver_min_time_ms\fR (int)
1061 .ad
1062 .RS 12n
1063 Min milliseconds to resilver per txg
1064 .sp
1065 Default value: \fB3,000\fR.
1066 .RE
1067
1068 .sp
1069 .ne 2
1070 .na
1071 \fBzfs_scan_idle\fR (int)
1072 .ad
1073 .RS 12n
1074 Idle window in clock ticks. During a scrub or a resilver, if
1075 a non-scrub or non-resilver I/O operation has occurred during this
1076 window, the next scrub or resilver operation is delayed by
1077 \fBzfs_scrub_delay\fR or \fBzfs_resilver_delay\fR ticks, respectively.
1078 .sp
1079 Default value: \fB50\fR.
1080 .RE
1081
1082 .sp
1083 .ne 2
1084 .na
1085 \fBzfs_scan_min_time_ms\fR (int)
1086 .ad
1087 .RS 12n
1088 Min milliseconds to scrub per txg
1089 .sp
1090 Default value: \fB1,000\fR.
1091 .RE
1092
1093 .sp
1094 .ne 2
1095 .na
1096 \fBzfs_scrub_delay\fR (int)
1097 .ad
1098 .RS 12n
1099 Number of ticks to delay prior to issuing a scrub I/O operation when
1100 a non-scrub or non-resilver I/O operation has occurred within the past
1101 \fBzfs_scan_idle\fR ticks.
1102 .sp
1103 Default value: \fB4\fR.
1104 .RE
1105
1106 .sp
1107 .ne 2
1108 .na
1109 \fBzfs_send_corrupt_data\fR (int)
1110 .ad
1111 .RS 12n
1112 Allow sending corrupt data (ignore read/checksum errors when sending data)
1113 .sp
1114 Use \fB1\fR for yes and \fB0\fR for no (default).
1115 .RE
1116
1117 .sp
1118 .ne 2
1119 .na
1120 \fBzfs_sync_pass_deferred_free\fR (int)
1121 .ad
1122 .RS 12n
1123 Defer frees starting in this pass
1124 .sp
1125 Default value: \fB2\fR.
1126 .RE
1127
1128 .sp
1129 .ne 2
1130 .na
1131 \fBzfs_sync_pass_dont_compress\fR (int)
1132 .ad
1133 .RS 12n
1134 Don't compress starting in this pass
1135 .sp
1136 Default value: \fB5\fR.
1137 .RE
1138
1139 .sp
1140 .ne 2
1141 .na
1142 \fBzfs_sync_pass_rewrite\fR (int)
1143 .ad
1144 .RS 12n
1145 Rewrite new block pointers starting in this pass
1146 .sp
1147 Default value: \fB2\fR.
1148 .RE
1149
1150 .sp
1151 .ne 2
1152 .na
1153 \fBzfs_top_maxinflight\fR (int)
1154 .ad
1155 .RS 12n
1156 Max I/Os per top-level vdev during scrub or resilver operations.
1157 .sp
1158 Default value: \fB32\fR.
1159 .RE
1160
1161 .sp
1162 .ne 2
1163 .na
1164 \fBzfs_txg_history\fR (int)
1165 .ad
1166 .RS 12n
1167 Historic statistics for the last N txgs
1168 .sp
1169 Default value: \fB0\fR.
1170 .RE
1171
1172 .sp
1173 .ne 2
1174 .na
1175 \fBzfs_txg_timeout\fR (int)
1176 .ad
1177 .RS 12n
1178 Max seconds worth of delta per txg
1179 .sp
1180 Default value: \fB5\fR.
1181 .RE
1182
1183 .sp
1184 .ne 2
1185 .na
1186 \fBzfs_vdev_aggregation_limit\fR (int)
1187 .ad
1188 .RS 12n
1189 Max vdev I/O aggregation size
1190 .sp
1191 Default value: \fB131,072\fR.
1192 .RE
1193
1194 .sp
1195 .ne 2
1196 .na
1197 \fBzfs_vdev_cache_bshift\fR (int)
1198 .ad
1199 .RS 12n
1200 Shift size to inflate reads to
1201 .sp
1202 Default value: \fB16\fR.
1203 .RE
1204
1205 .sp
1206 .ne 2
1207 .na
1208 \fBzfs_vdev_cache_max\fR (int)
1209 .ad
1210 .RS 12n
1211 Inflate reads smaller than this value
1212 .RE
1213
1214 .sp
1215 .ne 2
1216 .na
1217 \fBzfs_vdev_cache_size\fR (int)
1218 .ad
1219 .RS 12n
1220 Total size of the per-disk cache
1221 .sp
1222 Default value: \fB0\fR.
1223 .RE
1224
1225 .sp
1226 .ne 2
1227 .na
1228 \fBzfs_vdev_mirror_switch_us\fR (int)
1229 .ad
1230 .RS 12n
1231 Switch mirrors every N microseconds
1232 .sp
1233 Default value: \fB10,000\fR.
1234 .RE
1235
1236 .sp
1237 .ne 2
1238 .na
1239 \fBzfs_vdev_read_gap_limit\fR (int)
1240 .ad
1241 .RS 12n
1242 Aggregate read I/O over gap
1243 .sp
1244 Default value: \fB32,768\fR.
1245 .RE
1246
1247 .sp
1248 .ne 2
1249 .na
1250 \fBzfs_vdev_scheduler\fR (charp)
1251 .ad
1252 .RS 12n
1253 I/O scheduler
1254 .sp
1255 Default value: \fBnoop\fR.
1256 .RE
1257
1258 .sp
1259 .ne 2
1260 .na
1261 \fBzfs_vdev_write_gap_limit\fR (int)
1262 .ad
1263 .RS 12n
1264 Aggregate write I/O over gap
1265 .sp
1266 Default value: \fB4,096\fR.
1267 .RE
1268
1269 .sp
1270 .ne 2
1271 .na
1272 \fBzfs_zevent_cols\fR (int)
1273 .ad
1274 .RS 12n
1275 Max event column width
1276 .sp
1277 Default value: \fB80\fR.
1278 .RE
1279
1280 .sp
1281 .ne 2
1282 .na
1283 \fBzfs_zevent_console\fR (int)
1284 .ad
1285 .RS 12n
1286 Log events to the console
1287 .sp
1288 Use \fB1\fR for yes and \fB0\fR for no (default).
1289 .RE
1290
1291 .sp
1292 .ne 2
1293 .na
1294 \fBzfs_zevent_len_max\fR (int)
1295 .ad
1296 .RS 12n
1297 Max event queue length
1298 .sp
1299 Default value: \fB0\fR.
1300 .RE
1301
1302 .sp
1303 .ne 2
1304 .na
1305 \fBzil_replay_disable\fR (int)
1306 .ad
1307 .RS 12n
1308 Disable intent logging replay
1309 .sp
1310 Use \fB1\fR for yes and \fB0\fR for no (default).
1311 .RE
1312
1313 .sp
1314 .ne 2
1315 .na
1316 \fBzil_slog_limit\fR (ulong)
1317 .ad
1318 .RS 12n
1319 Max commit bytes to separate log device
1320 .sp
1321 Default value: \fB1,048,576\fR.
1322 .RE
1323
1324 .sp
1325 .ne 2
1326 .na
1327 \fBzio_bulk_flags\fR (int)
1328 .ad
1329 .RS 12n
1330 Additional flags to pass to bulk buffers
1331 .sp
1332 Default value: \fB0\fR.
1333 .RE
1334
1335 .sp
1336 .ne 2
1337 .na
1338 \fBzio_delay_max\fR (int)
1339 .ad
1340 .RS 12n
1341 Max zio delay in milliseconds before posting an event
1342 .sp
1343 Default value: \fB30,000\fR.
1344 .RE
1345
1346 .sp
1347 .ne 2
1348 .na
1349 \fBzio_injection_enabled\fR (int)
1350 .ad
1351 .RS 12n
1352 Enable fault injection
1353 .sp
1354 Use \fB1\fR for yes and \fB0\fR for no (default).
1355 .RE
1356
1357 .sp
1358 .ne 2
1359 .na
1360 \fBzio_requeue_io_start_cut_in_line\fR (int)
1361 .ad
1362 .RS 12n
1363 Prioritize requeued I/O
1364 .sp
1365 Default value: \fB0\fR.
1366 .RE
1367
1368 .sp
1369 .ne 2
1370 .na
1371 \fBzvol_inhibit_dev\fR (uint)
1372 .ad
1373 .RS 12n
1374 Do not create zvol device nodes
1375 .sp
1376 Use \fB1\fR for yes and \fB0\fR for no (default).
1377 .RE
1378
1379 .sp
1380 .ne 2
1381 .na
1382 \fBzvol_major\fR (uint)
1383 .ad
1384 .RS 12n
1385 Major number for zvol device
1386 .sp
1387 Default value: \fB230\fR.
1388 .RE
1389
1390 .sp
1391 .ne 2
1392 .na
1393 \fBzvol_max_discard_blocks\fR (ulong)
1394 .ad
1395 .RS 12n
1396 Max number of blocks to discard at once
1397 .sp
1398 Default value: \fB16,384\fR.
1399 .RE
1400
1401 .sp
1402 .ne 2
1403 .na
1404 \fBzvol_threads\fR (uint)
1405 .ad
1406 .RS 12n
1407 Number of threads for zvol device
1408 .sp
1409 Default value: \fB32\fR.
1410 .RE
1411
1412 .SH ZFS I/O SCHEDULER
1413 ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
1414 The I/O scheduler determines when and in what order those operations are
1415 issued. The I/O scheduler divides operations into five I/O classes
1416 prioritized in the following order: sync read, sync write, async read,
1417 async write, and scrub/resilver. Each queue defines the minimum and
1418 maximum number of concurrent operations that may be issued to the
1419 device. In addition, the device has an aggregate maximum,
1420 \fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
1421 must not exceed the aggregate maximum. If the sum of the per-queue
1422 maximums exceeds the aggregate maximum, then the number of active I/Os
1423 may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
1424 be issued regardless of whether all per-queue minimums have been met.
1425 .sp
1426 For many physical devices, throughput increases with the number of
1427 concurrent operations, but latency typically suffers. Further, physical
1428 devices typically have a limit at which more concurrent operations have no
1429 effect on throughput or can actually cause it to decrease.
1430 .sp
1431 The scheduler selects the next operation to issue by first looking for an
1432 I/O class whose minimum has not been satisfied. Once all are satisfied and
1433 the aggregate maximum has not been hit, the scheduler looks for classes
1434 whose maximum has not been satisfied. Iteration through the I/O classes is
1435 done in the order specified above. No further operations are issued if the
1436 aggregate maximum number of concurrent operations has been hit or if there
1437 are no operations queued for an I/O class that has not hit its maximum.
1438 Every time an I/O is queued or an operation completes, the I/O scheduler
1439 looks for new operations to issue.
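.sp
The selection order described above can be outlined as follows. This is an
illustrative sketch that only considers classes with queued operations; it is
not the actual vdev queue implementation, and the names are invented:
.sp
.nf
# I/O classes in priority order, as listed earlier in this section.
CLASSES = ["sync_read", "sync_write", "async_read", "async_write", "scrub"]

def next_class(queued, active, min_active, max_active, zfs_vdev_max_active):
    if sum(active.values()) >= zfs_vdev_max_active:
        return None                    # aggregate maximum reached
    # First pass: classes still below their per-queue minimum.
    for c in CLASSES:
        if queued[c] and active[c] < min_active[c]:
            return c
    # Second pass: classes below their per-queue maximum.
    for c in CLASSES:
        if queued[c] and active[c] < max_active[c]:
            return c
    return None                        # nothing eligible to issue
.fi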
1440 .sp
1441 In general, smaller max_active values will lead to lower latency of synchronous
1442 operations. Larger max_active values may lead to higher overall throughput,
1443 depending on underlying storage.
1444 .sp
1445 The ratio of the queues' max_active values determines the balance of performance
1446 between reads, writes, and scrubs. E.g., increasing
1447 \fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
1448 more quickly, but will also cause reads and writes to have higher latency and lower throughput.
1449 .sp
1450 All I/O classes have a fixed maximum number of outstanding operations
1451 except for the async write class. Asynchronous writes represent the data
1452 that is committed to stable storage during the syncing stage for
1453 transaction groups. Transaction groups enter the syncing state
1454 periodically so the number of queued async writes will quickly burst up
1455 and then bleed down to zero. Rather than servicing them as quickly as
1456 possible, the I/O scheduler changes the maximum number of active async
1457 write I/Os according to the amount of dirty data in the pool. Since
1458 both throughput and latency typically increase with the number of
1459 concurrent operations issued to physical devices, reducing the
1460 burstiness in the number of concurrent operations also stabilizes the
1461 response time of operations from other -- and in particular synchronous
1462 -- queues. In broad strokes, the I/O scheduler will issue more
1463 concurrent operations from the async write queue as there's more dirty
1464 data in the pool.
1465 .sp
1466 Async Writes
1467 .sp
1468 The number of concurrent operations issued for the async write I/O class
1469 follows a piece-wise linear function defined by a few adjustable points.
1470 .nf
1471
1472 | o---------| <-- zfs_vdev_async_write_max_active
1473 ^ | /^ |
1474 | | / | |
1475 active | / | |
1476 I/O | / | |
1477 count | / | |
1478 | / | |
1479 |-------o | | <-- zfs_vdev_async_write_min_active
1480 0|_______^______|_________|
1481 0% | | 100% of zfs_dirty_data_max
1482 | |
1483 | `-- zfs_vdev_async_write_active_max_dirty_percent
1484 `--------- zfs_vdev_async_write_active_min_dirty_percent
1485
1486 .fi
1487 Until the amount of dirty data exceeds a minimum percentage of the dirty
1488 data allowed in the pool, the I/O scheduler will limit the number of
1489 concurrent operations to the minimum. As that threshold is crossed, the
1490 number of concurrent operations issued increases linearly to the maximum at
1491 the specified maximum percentage of the dirty data allowed in the pool.
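.sp
A hedged sketch of the piece-wise linear function shown in the figure, using
the defaults documented earlier (1 and 10 active I/Os, 30% and 60% dirty
data). The function name is invented and this is not the actual
implementation:
.sp
.nf
def async_write_max_active(dirty_bytes, zfs_dirty_data_max,
                           min_active=1, max_active=10,
                           min_dirty_percent=30, max_dirty_percent=60):
    pct = 100.0 * dirty_bytes / zfs_dirty_data_max
    if pct <= min_dirty_percent:
        return min_active
    if pct >= max_dirty_percent:
        return max_active
    # Linear interpolation between the two break points.
    frac = (pct - min_dirty_percent) / (max_dirty_percent - min_dirty_percent)
    return int(min_active + frac * (max_active - min_active))
.fi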
1492 .sp
1493 Ideally, the amount of dirty data on a busy pool will stay in the sloped
1494 part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
1495 and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
1496 maximum percentage, this indicates that the rate of incoming data is
1497 greater than the rate that the backend storage can handle. In this case, we
1498 must further throttle incoming writes, as described in the next section.
1499
1500 .SH ZFS TRANSACTION DELAY
1501 We delay transactions when we've determined that the backend storage
1502 isn't able to accommodate the rate of incoming writes.
1503 .sp
1504 If there is already a transaction waiting, we delay relative to when
1505 that transaction will finish waiting. This way the calculated delay time
1506 is independent of the number of threads concurrently executing
1507 transactions.
1508 .sp
1509 If we are the only waiter, wait relative to when the transaction
1510 started, rather than the current time. This credits the transaction for
1511 "time already served", e.g. reading indirect blocks.
1512 .sp
1513 The minimum time for a transaction to take is calculated as:
1514 .nf
1515 min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
1516 min_time is then capped at 100 milliseconds.
1517 .fi
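.sp
As a worked example, the following sketch evaluates the formula with the delay
expressed in nanoseconds (the 500us midpoint discussed later in this section
implies that \fBzfs_delay_scale\fR is in nanoseconds). It follows the man page
text rather than the actual ZFS code, and the names are invented:
.sp
.nf
def tx_delay_ns(dirty, zfs_dirty_data_max,
                zfs_delay_scale=500000, zfs_delay_min_dirty_percent=60):
    dirty_min = zfs_dirty_data_max * zfs_delay_min_dirty_percent // 100
    if dirty <= dirty_min:
        return 0                             # below the delay threshold
    delay = zfs_delay_scale * (dirty - dirty_min) / (zfs_dirty_data_max - dirty)
    return min(delay, 100 * 1000 * 1000)     # capped at 100 milliseconds

dmax = 4 * 2**30                             # e.g. a 4 GiB zfs_dirty_data_max
mid = (dmax + dmax * 60 // 100) // 2         # midpoint of the delay curve
print(tx_delay_ns(mid, dmax))                # about 500000 ns = 500us (~2000 IOPS)
.fi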
1518 .sp
1519 The delay has two degrees of freedom that can be adjusted via tunables. The
1520 percentage of dirty data at which we start to delay is defined by
1521 \fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
1522 \fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
1523 delay after writing at full speed has failed to keep up with the incoming write
1524 rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
1525 this variable determines the amount of delay at the midpoint of the curve.
1526 .sp
1527 .nf
1528 delay
1529 10ms +-------------------------------------------------------------*+
1530 | *|
1531 9ms + *+
1532 | *|
1533 8ms + *+
1534 | * |
1535 7ms + * +
1536 | * |
1537 6ms + * +
1538 | * |
1539 5ms + * +
1540 | * |
1541 4ms + * +
1542 | * |
1543 3ms + * +
1544 | * |
1545 2ms + (midpoint) * +
1546 | | ** |
1547 1ms + v *** +
1548 | zfs_delay_scale ----------> ******** |
1549 0 +-------------------------------------*********----------------+
1550 0% <- zfs_dirty_data_max -> 100%
1551 .fi
1552 .sp
1553 Note that since the delay is added to the outstanding time remaining on the
1554 most recent transaction, the delay is effectively the inverse of IOPS.
1555 Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
1556 was chosen such that small changes in the amount of accumulated dirty data
1557 in the first 3/4 of the curve yield relatively small differences in the
1558 amount of delay.
1559 .sp
1560 The effects can be easier to understand when the amount of delay is
1561 represented on a log scale:
1562 .sp
1563 .nf
1564 delay
1565 100ms +-------------------------------------------------------------++
1566 + +
1567 | |
1568 + *+
1569 10ms + *+
1570 + ** +
1571 | (midpoint) ** |
1572 + | ** +
1573 1ms + v **** +
1574 + zfs_delay_scale ----------> ***** +
1575 | **** |
1576 + **** +
1577 100us + ** +
1578 + * +
1579 | * |
1580 + * +
1581 10us + * +
1582 + +
1583 | |
1584 + +
1585 +--------------------------------------------------------------+
1586 0% <- zfs_dirty_data_max -> 100%
1587 .fi
1588 .sp
1589 Note here that only as the amount of dirty data approaches its limit does
1590 the delay start to increase rapidly. The goal of a properly tuned system
1591 should be to keep the amount of dirty data out of that range by first
1592 ensuring that the appropriate limits are set for the I/O scheduler to reach
1593 optimal throughput on the backend storage, and then by changing the value
1594 of \fBzfs_delay_scale\fR to increase the steepness of the curve.