2 .\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
3 .\" The contents of this file are subject to the terms of the Common Development
4 .\" and Distribution License (the "License"). You may not use this file except
5 .\" in compliance with the License. You can obtain a copy of the license at
6 .\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
8 .\" See the License for the specific language governing permissions and
9 .\" limitations under the License. When distributing Covered Code, include this
10 .\" CDDL HEADER in each file and include the License file at
11 .\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
12 .\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
13 .\" own identifying information:
14 .\" Portions Copyright [yyyy] [name of copyright owner]
.TH ZFS-MODULE-PARAMETERS 5 "Nov 16, 2013"
.SH NAME
zfs\-module\-parameters \- ZFS module parameters
.SH DESCRIPTION
Description of the different parameters to the ZFS module.
.SS "Module parameters"
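Most of these parameters can be inspected and, where supported, changed at
runtime; load\-time defaults are usually set through modprobe options. A
minimal sketch is shown below, assuming the standard Linux sysfs layout for
module parameters; \fBzfs_txg_timeout\fR and the value 10 are used purely as
an illustration, not a recommendation.
.nf
# Read the current value of a parameter (Linux sysfs interface).
cat /sys/module/zfs/parameters/zfs_txg_timeout

# Change a parameter at runtime; some parameters are read-only or are
# only honored at module load time.
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout

# Persist a setting across module loads, e.g. in /etc/modprobe.d/zfs.conf:
#   options zfs zfs_txg_timeout=10
.fi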
\fBl2arc_feed_again\fR (int)
Use \fB1\fR for yes (default) and \fB0\fR to disable.
\fBl2arc_feed_min_ms\fR (ulong)
Min feed interval in milliseconds
Default value: \fB200\fR.
\fBl2arc_feed_secs\fR (ulong)
Seconds between L2ARC writing
Default value: \fB1\fR.
\fBl2arc_headroom\fR (ulong)
Number of max device writes to precache
Default value: \fB2\fR.
\fBl2arc_headroom_boost\fR (ulong)
Compressed l2arc_headroom multiplier
Default value: \fB200\fR.
\fBl2arc_nocompress\fR (int)
Skip compressing L2ARC buffers
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBl2arc_noprefetch\fR (int)
Skip caching prefetched buffers
Use \fB1\fR for yes (default) and \fB0\fR to disable.
\fBl2arc_norw\fR (int)
No reads during writes
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBl2arc_write_boost\fR (ulong)
Extra write bytes during device warmup
Default value: \fB8,388,608\fR.
\fBl2arc_write_max\fR (ulong)
Max write bytes per interval
Default value: \fB8,388,608\fR.
\fBmetaslab_debug_load\fR (int)
Load all metaslabs during pool import.
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBmetaslab_debug_unload\fR (int)
Prevent metaslabs from being unloaded.
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBspa_config_path\fR (charp)
SPA config file
Default value: \fB/etc/zfs/zpool.cache\fR.
\fBspa_asize_inflation\fR (int)
Multiplication factor used to estimate actual disk consumption from the
size of data being written. The default value is a worst case estimate,
but lower values may be valid for a given pool depending on its
configuration. Pool administrators who understand the factors involved
may wish to specify a more realistic inflation factor, particularly if
they operate close to quota or capacity limits.
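.sp
As a hedged illustration only (assuming the parameter is exposed through the
standard Linux sysfs path for module parameters), the worst\-case on\-disk
footprint of a write can be estimated by multiplying the logical write size
by the current inflation factor; the 1 GiB figure is an arbitrary example.
.nf
# Estimate worst-case disk consumption for a 1 GiB logical write.
factor=$(cat /sys/module/zfs/parameters/spa_asize_inflation)
echo $(( 1024 * 1024 * 1024 * factor )) bytes
.fi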
\fBzfetch_array_rd_sz\fR (ulong)
If prefetching is enabled, disable prefetching for reads larger than this size.
Default value: \fB1,048,576\fR.
\fBzfetch_block_cap\fR (uint)
Max number of blocks to prefetch at a time
Default value: \fB256\fR.
\fBzfetch_max_streams\fR (uint)
Max number of streams per zfetch (prefetch streams per file).
Default value: \fB8\fR.
\fBzfetch_min_sec_reap\fR (uint)
Min time before an active prefetch stream can be reclaimed
Default value: \fB2\fR.
\fBzfs_arc_grow_retry\fR (int)
Seconds before growing arc size
Default value: \fB5\fR.
\fBzfs_arc_max\fR (ulong)
Max arc size
Default value: \fB0\fR.
\fBzfs_arc_memory_throttle_disable\fR (int)
Disable memory throttle
Use \fB1\fR for yes (default) and \fB0\fR to disable.
\fBzfs_arc_meta_limit\fR (ulong)
Meta limit for arc size
Default value: \fB0\fR.
\fBzfs_arc_meta_prune\fR (int)
Bytes of metadata to prune
Default value: \fB1,048,576\fR.
\fBzfs_arc_min\fR (ulong)
Min arc size
Default value: \fB100\fR.
\fBzfs_arc_min_prefetch_lifespan\fR (int)
Min life of prefetch block
Default value: \fB100\fR.
\fBzfs_arc_p_aggressive_disable\fR (int)
Disable aggressive arc_p growth
Use \fB1\fR for yes (default) and \fB0\fR to disable.
\fBzfs_arc_p_dampener_disable\fR (int)
Disable arc_p adapt dampener
Use \fB1\fR for yes (default) and \fB0\fR to disable.
\fBzfs_arc_shrink_shift\fR (int)
log2(fraction of arc to reclaim)
Default value: \fB5\fR.
\fBzfs_autoimport_disable\fR (int)
Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR).
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_dbuf_state_index\fR (int)
Calculate arc header index
Default value: \fB0\fR.
\fBzfs_deadman_enabled\fR (int)
Enable the deadman timer
Use \fB1\fR for yes (default) and \fB0\fR to disable.
\fBzfs_deadman_synctime_ms\fR (ulong)
Expiration time in milliseconds. This value has two meanings. First, it is
used to determine when the spa_deadman() logic should fire. By default,
spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
Second, the value determines whether an I/O is considered "hung". Any I/O that
has not completed within \fBzfs_deadman_synctime_ms\fR is considered "hung",
resulting in a zevent being logged.
Default value: \fB1,000,000\fR.
\fBzfs_dedup_prefetch\fR (int)
Enable prefetching of deduplicated blocks
Use \fB1\fR for yes (default) and \fB0\fR to disable.
\fBzfs_delay_min_dirty_percent\fR (int)
Start to delay each transaction once there is this amount of dirty data,
expressed as a percentage of \fBzfs_dirty_data_max\fR.
This value should be >= \fBzfs_vdev_async_write_active_max_dirty_percent\fR.
See the section "ZFS TRANSACTION DELAY".
Default value: \fB60\fR.
\fBzfs_delay_scale\fR (int)
This controls how quickly the transaction delay approaches infinity.
Larger values cause longer delays for a given amount of dirty data.
For the smoothest delay, this value should be about 1 billion divided
by the maximum number of operations per second. This will smoothly
handle between 10x and 1/10th this number.
See the section "ZFS TRANSACTION DELAY".
Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
Default value: \fB500,000\fR.
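.sp
As a hedged sizing sketch based on the guidance above (not an official
formula), a starting value can be derived from the peak operations per second
the backend storage is expected to sustain; the 2,000 ops/sec figure is only
an example.
.nf
# zfs_delay_scale ~= 1,000,000,000 / max_ops_per_second
max_ops=2000
echo $(( 1000000000 / max_ops ))    # 500000, which is the default
.fi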
\fBzfs_dirty_data_max\fR (int)
Determines the dirty space limit in bytes. Once this limit is exceeded, new
writes are halted until space frees up. This parameter takes precedence
over \fBzfs_dirty_data_max_percent\fR.
See the section "ZFS TRANSACTION DELAY".
Default value: 10 percent of all memory, capped at \fBzfs_dirty_data_max_max\fR.
\fBzfs_dirty_data_max_max\fR (int)
Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
This limit is only enforced at module load time, and will be ignored if
\fBzfs_dirty_data_max\fR is later changed. This parameter takes
precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
"ZFS TRANSACTION DELAY".
Default value: 25% of physical RAM.
\fBzfs_dirty_data_max_max_percent\fR (int)
Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
percentage of physical RAM. This limit is only enforced at module load
time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
one. See the section "ZFS TRANSACTION DELAY".
\fBzfs_dirty_data_max_percent\fR (int)
Determines the dirty space limit, expressed as a percentage of all
memory. Once this limit is exceeded, new writes are halted until space frees
up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
one. See the section "ZFS TRANSACTION DELAY".
Default value: 10%, subject to \fBzfs_dirty_data_max_max\fR.
\fBzfs_dirty_data_sync\fR (int)
Start syncing out a transaction group if there is at least this much dirty data.
Default value: \fB67,108,864\fR.
\fBzfs_vdev_async_read_max_active\fR (int)
Maximum asynchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB3\fR.
\fBzfs_vdev_async_read_min_active\fR (int)
Minimum asynchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB1\fR.
\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
When the pool has more than
\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
the dirty data is between min and max, the active I/O limit is linearly
interpolated. See the section "ZFS I/O SCHEDULER".
Default value: \fB60\fR.
\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
When the pool has less than
\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
the dirty data is between min and max, the active I/O limit is linearly
interpolated. See the section "ZFS I/O SCHEDULER".
Default value: \fB30\fR.
\fBzfs_vdev_async_write_max_active\fR (int)
Maximum asynchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB10\fR.
\fBzfs_vdev_async_write_min_active\fR (int)
Minimum asynchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB1\fR.
\fBzfs_vdev_max_active\fR (int)
The maximum number of I/Os active to each device. Ideally, this will be >=
the sum of each queue's max_active. It must be at least the sum of each
queue's min_active. See the section "ZFS I/O SCHEDULER".
Default value: \fB1,000\fR.
\fBzfs_vdev_scrub_max_active\fR (int)
Maximum scrub I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB2\fR.
\fBzfs_vdev_scrub_min_active\fR (int)
Minimum scrub I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB1\fR.
\fBzfs_vdev_sync_read_max_active\fR (int)
Maximum synchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB10\fR.
\fBzfs_vdev_sync_read_min_active\fR (int)
Minimum synchronous read I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB10\fR.
\fBzfs_vdev_sync_write_max_active\fR (int)
Maximum synchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB10\fR.
\fBzfs_vdev_sync_write_min_active\fR (int)
Minimum synchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
Default value: \fB10\fR.
\fBzfs_disable_dup_eviction\fR (int)
Disable duplicate buffer eviction
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_expire_snapshot\fR (int)
Seconds to expire .zfs/snapshot
Default value: \fB300\fR.
\fBzfs_flags\fR (int)
Set additional debugging flags
Default value: \fB1\fR.
\fBzfs_free_min_time_ms\fR (int)
Min milliseconds to free per txg
Default value: \fB1,000\fR.
\fBzfs_immediate_write_sz\fR (long)
Largest data block to write to the ZIL
Default value: \fB32,768\fR.
\fBzfs_mdcomp_disable\fR (int)
Disable metadata compression
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_no_scrub_io\fR (int)
Set for no scrub I/O
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_no_scrub_prefetch\fR (int)
Set for no scrub prefetching
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_nocacheflush\fR (int)
Disable cache flushes
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_nopwrite_enabled\fR (int)
Enable NOP writes
Use \fB1\fR for yes (default) and \fB0\fR to disable.
\fBzfs_pd_blks_max\fR (int)
Max number of blocks to prefetch
Default value: \fB100\fR.
\fBzfs_prefetch_disable\fR (int)
Disable all ZFS prefetching
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_read_chunk_size\fR (long)
Bytes to read per chunk
Default value: \fB1,048,576\fR.
\fBzfs_read_history\fR (int)
Historic statistics for the last N reads
Default value: \fB0\fR.
\fBzfs_read_history_hits\fR (int)
Include cache hits in read history
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_recover\fR (int)
Set to attempt to recover from fatal errors. This should only be used as a
last resort, as it typically results in leaked space, or worse.
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_resilver_delay\fR (int)
Number of ticks to delay prior to issuing a resilver I/O operation when
a non-resilver or non-scrub I/O operation has occurred within the past
\fBzfs_scan_idle\fR ticks.
Default value: \fB2\fR.
\fBzfs_resilver_min_time_ms\fR (int)
Min milliseconds to resilver per txg
Default value: \fB3,000\fR.
\fBzfs_scan_idle\fR (int)
Idle window in clock ticks. During a scrub or a resilver, if
a non-scrub or non-resilver I/O operation has occurred during this
window, the next scrub or resilver operation is delayed by
\fBzfs_scrub_delay\fR or \fBzfs_resilver_delay\fR ticks, respectively.
Default value: \fB50\fR.
\fBzfs_scan_min_time_ms\fR (int)
Min milliseconds to scrub per txg
Default value: \fB1,000\fR.
\fBzfs_scrub_delay\fR (int)
Number of ticks to delay prior to issuing a scrub I/O operation when
a non-scrub or non-resilver I/O operation has occurred within the past
\fBzfs_scan_idle\fR ticks.
Default value: \fB4\fR.
\fBzfs_send_corrupt_data\fR (int)
Allow sending of corrupt data (ignore read/checksum errors when sending data)
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_sync_pass_deferred_free\fR (int)
Defer frees starting in this pass
Default value: \fB2\fR.
\fBzfs_sync_pass_dont_compress\fR (int)
Don't compress starting in this pass
Default value: \fB5\fR.
\fBzfs_sync_pass_rewrite\fR (int)
Rewrite new bps starting in this pass
Default value: \fB2\fR.
\fBzfs_top_maxinflight\fR (int)
Max I/Os per top-level vdev during scrub or resilver operations.
Default value: \fB32\fR.
\fBzfs_txg_history\fR (int)
Historic statistics for the last N txgs
Default value: \fB0\fR.
\fBzfs_txg_timeout\fR (int)
Max seconds worth of delta per txg
Default value: \fB5\fR.
\fBzfs_vdev_aggregation_limit\fR (int)
Max vdev I/O aggregation size
Default value: \fB131,072\fR.
\fBzfs_vdev_cache_bshift\fR (int)
Shift size to inflate reads to
Default value: \fB16\fR.
\fBzfs_vdev_cache_max\fR (int)
Inflate reads smaller than max
\fBzfs_vdev_cache_size\fR (int)
Total size of the per-disk cache
Default value: \fB0\fR.
\fBzfs_vdev_mirror_switch_us\fR (int)
Switch mirrors every N usecs
Default value: \fB10,000\fR.
\fBzfs_vdev_read_gap_limit\fR (int)
Aggregate read I/O over gap
Default value: \fB32,768\fR.
\fBzfs_vdev_scheduler\fR (charp)
I/O scheduler
Default value: \fBnoop\fR.
\fBzfs_vdev_write_gap_limit\fR (int)
Aggregate write I/O over gap
Default value: \fB4,096\fR.
\fBzfs_zevent_cols\fR (int)
Max event column width
Default value: \fB80\fR.
\fBzfs_zevent_console\fR (int)
Log events to the console
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzfs_zevent_len_max\fR (int)
Max event queue length
Default value: \fB0\fR.
\fBzil_replay_disable\fR (int)
Disable intent logging replay
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzil_slog_limit\fR (ulong)
Max commit bytes to separate log device
Default value: \fB1,048,576\fR.
\fBzio_bulk_flags\fR (int)
Additional flags to pass to bulk buffers
Default value: \fB0\fR.
\fBzio_delay_max\fR (int)
Max zio delay in milliseconds before posting an event
Default value: \fB30,000\fR.
\fBzio_injection_enabled\fR (int)
Enable fault injection
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzio_requeue_io_start_cut_in_line\fR (int)
Prioritize requeued I/O
Default value: \fB0\fR.
\fBzvol_inhibit_dev\fR (uint)
Do not create zvol device nodes
Use \fB1\fR for yes and \fB0\fR for no (default).
\fBzvol_major\fR (uint)
Major number for zvol device
Default value: \fB230\fR.
\fBzvol_max_discard_blocks\fR (ulong)
Max number of blocks to discard at once
Default value: \fB16,384\fR.
\fBzvol_threads\fR (uint)
Number of threads for zvol device
Default value: \fB32\fR.
.SH ZFS I/O SCHEDULER
ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
The I/O scheduler determines when and in what order those operations are
issued. The I/O scheduler divides operations into five I/O classes
prioritized in the following order: sync read, sync write, async read,
async write, and scrub/resilver. Each queue defines the minimum and
maximum number of concurrent operations that may be issued to the
device. In addition, the device has an aggregate maximum,
\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
must not exceed the aggregate maximum. If the sum of the per-queue
maximums exceeds the aggregate maximum, then the number of active I/Os
may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
be issued regardless of whether all per-queue minimums have been met.
.sp
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers. Further, physical
devices typically have a limit at which more concurrent operations have no
effect on throughput or can actually cause it to decrease.
.sp
The scheduler selects the next operation to issue by first looking for an
I/O class whose minimum has not been satisfied. Once all are satisfied and
the aggregate maximum has not been hit, the scheduler looks for classes
whose maximum has not been satisfied. Iteration through the I/O classes is
done in the order specified above. No further operations are issued if the
aggregate maximum number of concurrent operations has been hit or if there
are no operations queued for an I/O class that has not hit its maximum.
Every time an I/O is queued or an operation completes, the I/O scheduler
looks for new operations to issue.
.sp
In general, smaller values of max_active will lead to lower latency of
synchronous operations. Larger values of max_active may lead to higher
overall throughput, depending on underlying storage.
.sp
The ratio of the queues' max_active values determines the balance of
performance between reads, writes, and scrubs. E.g., increasing
\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
more quickly, but reads and writes will have higher latency and lower
throughput.
.sp
All I/O classes have a fixed maximum number of outstanding operations
except for the async write class. Asynchronous writes represent the data
that is committed to stable storage during the syncing stage for
transaction groups. Transaction groups enter the syncing state
periodically so the number of queued async writes will quickly burst up
and then bleed down to zero. Rather than servicing them as quickly as
possible, the I/O scheduler changes the maximum number of active async
write I/Os according to the amount of dirty data in the pool. Since
both throughput and latency typically increase with the number of
concurrent operations issued to physical devices, reducing the
burstiness in the number of concurrent operations also stabilizes the
response time of operations from other -- and in particular synchronous
-- queues. In broad strokes, the I/O scheduler will issue more
concurrent operations from the async write queue as there's more dirty
data in the pool.
.sp
The number of concurrent operations issued for the async write I/O class
follows a piece-wise linear function defined by a few adjustable points.
.nf
       |              o---------| <-- zfs_vdev_async_write_max_active
  ^    |             /^         |
  |    |            / |         |
active |           /  |         |
 I/O   |          /   |         |
count  |         /    |         |
       |        /     |         |
       |-------o      |         | <-- zfs_vdev_async_write_min_active
      0|_______^______|_________|
       0%      |      |       100% of zfs_dirty_data_max
               |      |
               |      `-- zfs_vdev_async_write_active_max_dirty_percent
               `--------- zfs_vdev_async_write_active_min_dirty_percent
.fi
.sp
Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum. As that threshold is crossed, the
number of concurrent operations issued increases linearly to the maximum at
the specified maximum percentage of the dirty data allowed in the pool.
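.sp
As a rough, hedged sketch of this interpolation (an approximation for
illustration, not the exact kernel code path), the effective async write
limit at a given dirty percentage can be computed from the default values
of the four tunables documented above:
.nf
# Approximate active async write limit at a given dirty percentage.
dirty_pct=45                                    # example input
min_active=1 max_active=10 min_pct=30 max_pct=60
awk -v d=$dirty_pct -v lo=$min_active -v hi=$max_active \
    -v a=$min_pct -v b=$max_pct 'BEGIN {
    if (d <= a)      print lo;
    else if (d >= b) print hi;
    else             print lo + (hi - lo) * (d - a) / (b - a);
}'
.fi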
.sp
Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
maximum percentage, this indicates that the rate of incoming data is
greater than the rate that the backend storage can handle. In this case, we
must further throttle incoming writes, as described in the next section.
.SH ZFS TRANSACTION DELAY
We delay transactions when we've determined that the backend storage
isn't able to accommodate the rate of incoming writes.
.sp
If there is already a transaction waiting, we delay relative to when
that transaction will finish waiting. This way the calculated delay time
is independent of the number of threads concurrently executing
transactions.
.sp
If we are the only waiter, wait relative to when the transaction
started, rather than the current time. This credits the transaction for
"time already served", e.g. reading indirect blocks.
.sp
The minimum time for a transaction to take is calculated as:
.nf
    min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
.fi
min_time is then capped at 100 milliseconds.
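.sp
As a hedged worked example (assuming min_time is expressed in nanoseconds,
which is consistent with the 500us midpoint discussed below): when the dirty
data sits exactly halfway between the delay threshold and the limit,
(dirty - min) equals (max - dirty), so min_time reduces to
\fBzfs_delay_scale\fR itself.
.nf
# Midpoint example with the default zfs_delay_scale of 500,000.
scale=500000; half=500
echo "$(( scale * half / half )) ns"    # 500000 ns = 500 us, i.e. ~2000 IOPS
.fi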
.sp
The delay has two degrees of freedom that can be adjusted via tunables. The
percentage of dirty data at which we start to delay is defined by
\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
delay after writing at full speed has failed to keep up with the incoming write
rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
this variable determines the amount of delay at the midpoint of the curve.
.nf
[delay curve, linear scale: x axis from 0% to 100% of zfs_dirty_data_max,
y axis from 0 to 10ms; the delay stays near zero for most of the range,
passes through zfs_delay_scale at the midpoint, and rises steeply as the
dirty data approaches 100%]
.fi
Note that since the delay is added to the outstanding time remaining on the
most recent transaction, the delay is effectively the inverse of IOPS.
Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
was chosen such that small changes in the amount of accumulated dirty data
in the first 3/4 of the curve yield relatively small differences in the
amount of delay.
.sp
The effects can be easier to understand when the amount of delay is
represented on a log scale:
.nf
[delay curve, log scale: x axis from 0% to 100% of zfs_dirty_data_max,
y axis up to 100ms on a logarithmic scale; the curve passes through
zfs_delay_scale at the midpoint and increases rapidly only as the dirty
data approaches its limit]
.fi
.sp
Note here that only as the amount of dirty data approaches its limit does
the delay start to increase rapidly. The goal of a properly tuned system
should be to keep the amount of dirty data out of that range by first
ensuring that the appropriate limits are set for the I/O scheduler to reach
optimal throughput on the backend storage, and then by changing the value
of \fBzfs_delay_scale\fR to increase the steepness of the curve.