man/man5/zfs-module-parameters.5

   1 '\" te
   2 .\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
   3 .\" The contents of this file are subject to the terms of the Common Development
   4 .\" and Distribution License (the "License").  You may not use this file except
   5 .\" in compliance with the License. You can obtain a copy of the license at
   6 .\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
   7 .\"
   8 .\" See the License for the specific language governing permissions and
   9 .\" limitations under the License. When distributing Covered Code, include this
  10 .\" CDDL HEADER in each file and include the License file at
  11 .\" usr/src/OPENSOLARIS.LICENSE.  If applicable, add the following below this
  12 .\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
  13 .\" own identifying information:
  14 .\" Portions Copyright [yyyy] [name of copyright owner]
  15 .TH ZFS-MODULE-PARAMETERS 5 "Nov 16, 2013"
  16 .SH NAME
  17 zfs\-module\-parameters \- ZFS module parameters
  18 .SH DESCRIPTION
  19 .sp
  20 .LP
  21 Description of the different parameters to the ZFS module.
  22
  23 .SS "Module parameters"
  24 .sp
  25 .LP
  26
  27 .sp
  28 .ne 2
  29 .na
  30 \fBl2arc_feed_again\fR (int)
  31 .ad
  32 .RS 12n
  33 Turbo L2ARC warmup
  34 .sp
  35 Use \fB1\fR for yes (default) and \fB0\fR to disable.
  36 .RE
  37
  38 .sp
  39 .ne 2
  40 .na
  41 \fBl2arc_feed_min_ms\fR (ulong)
  42 .ad
  43 .RS 12n
  44 Min feed interval in milliseconds
  45 .sp
  46 Default value: \fB200\fR.
  47 .RE
  48
  49 .sp
  50 .ne 2
  51 .na
  52 \fBl2arc_feed_secs\fR (ulong)
  53 .ad
  54 .RS 12n
  55 Seconds between L2ARC writing
  56 .sp
  57 Default value: \fB1\fR.
  58 .RE
  59
  60 .sp
  61 .ne 2
  62 .na
  63 \fBl2arc_headroom\fR (ulong)
  64 .ad
  65 .RS 12n
  66 Number of max device writes to precache
  67 .sp
  68 Default value: \fB2\fR.
  69 .RE
  70
  71 .sp
  72 .ne 2
  73 .na
  74 \fBl2arc_headroom_boost\fR (ulong)
  75 .ad
  76 .RS 12n
  77 Compressed l2arc_headroom multiplier
  78 .sp
  79 Default value: \fB200\fR.
  80 .RE
  81
  82 .sp
  83 .ne 2
  84 .na
  85 \fBl2arc_nocompress\fR (int)
  86 .ad
  87 .RS 12n
  88 Skip compressing L2ARC buffers
  89 .sp
  90 Use \fB1\fR for yes and \fB0\fR for no (default).
  91 .RE
  92
  93 .sp
  94 .ne 2
  95 .na
  96 \fBl2arc_noprefetch\fR (int)
  97 .ad
  98 .RS 12n
  99 Skip caching prefetched buffers
 100 .sp
 101 Use \fB1\fR for yes (default) and \fB0\fR to disable.
 102 .RE
 103
 104 .sp
 105 .ne 2
 106 .na
 107 \fBl2arc_norw\fR (int)
 108 .ad
 109 .RS 12n
 110 No reads during writes
 111 .sp
 112 Use \fB1\fR for yes and \fB0\fR for no (default).
 113 .RE
 114
 115 .sp
 116 .ne 2
 117 .na
 118 \fBl2arc_write_boost\fR (ulong)
 119 .ad
 120 .RS 12n
 121 Extra write bytes during device warmup
 122 .sp
 123 Default value: \fB8,388,608\fR.
 124 .RE
 125
 126 .sp
 127 .ne 2
 128 .na
 129 \fBl2arc_write_max\fR (ulong)
 130 .ad
 131 .RS 12n
 132 Max write bytes per interval
 133 .sp
 134 Default value: \fB8,388,608\fR.
 135 .RE
 136
 137 .sp
 138 .ne 2
 139 .na
 140 \fBmetaslab_bias_enabled\fR (int)
 141 .ad
 142 .RS 12n
 143 Enable metaslab group biasing based on its vdev's over- or under-utilization
 144 relative to the pool.
 145 .sp
 146 Use \fB1\fR for yes (default) and \fB0\fR for no.
 147 .RE
 148
 149 .sp
 150 .ne 2
 151 .na
 152 \fBmetaslab_debug_load\fR (int)
 153 .ad
 154 .RS 12n
 155 Load all metaslabs during pool import.
 156 .sp
 157 Use \fB1\fR for yes and \fB0\fR for no (default).
 158 .RE
 159
 160 .sp
 161 .ne 2
 162 .na
 163 \fBmetaslab_debug_unload\fR (int)
 164 .ad
 165 .RS 12n
 166 Prevent metaslabs from being unloaded.
 167 .sp
 168 Use \fB1\fR for yes and \fB0\fR for no (default).
 169 .RE
 170
 171 .sp
 172 .ne 2
 173 .na
 174 \fBmetaslab_fragmentation_factor_enabled\fR (int)
 175 .ad
 176 .RS 12n
 177 Enable use of the fragmentation metric in computing metaslab weights.
 178 .sp
 179 Use \fB1\fR for yes (default) and \fB0\fR for no.
 180 .RE
 181
 182 .sp
 183 .ne 2
 184 .na
 185 \fBmetaslabs_per_vdev\fR (int)
 186 .ad
 187 .RS 12n
 188 When a vdev is added, it will be divided into approximately (but no more than) this number of metaslabs.
 189 .sp
 190 Default value: \fB200\fR.
 191 .RE
 192
 193 .sp
 194 .ne 2
 195 .na
 196 \fBmetaslab_preload_enabled\fR (int)
 197 .ad
 198 .RS 12n
 199 Enable metaslab group preloading.
 200 .sp
 201 Use \fB1\fR for yes (default) and \fB0\fR for no.
 202 .RE
 203
 204 .sp
 205 .ne 2
 206 .na
 207 \fBmetaslab_lba_weighting_enabled\fR (int)
 208 .ad
 209 .RS 12n
 210 Give more weight to metaslabs with lower LBAs, assuming they have
 211 greater bandwidth as is typically the case on a modern constant
 212 angular velocity disk drive.
 213 .sp
 214 Use \fB1\fR for yes (default) and \fB0\fR for no.
 215 .RE
 216
 217 .sp
 218 .ne 2
 219 .na
 220 \fBspa_config_path\fR (charp)
 221 .ad
 222 .RS 12n
 223 SPA config file
 224 .sp
 225 Default value: \fB/etc/zfs/zpool.cache\fR.
 226 .RE
 227
 228 .sp
 229 .ne 2
 230 .na
 231 \fBspa_asize_inflation\fR (int)
 232 .ad
 233 .RS 12n
 234 Multiplication factor used to estimate actual disk consumption from the
 235 size of data being written. The default value is a worst case estimate,
 236 but lower values may be valid for a given pool depending on its
 237 configuration.  Pool administrators who understand the factors involved
 238 may wish to specify a more realistic inflation factor, particularly if
 239 they operate close to quota or capacity limits.
 240 .sp
 241 Default value: 24
 242 .RE
 243
 244 .sp
 245 .ne 2
 246 .na
 247 \fBspa_load_verify_data\fR (int)
 248 .ad
 249 .RS 12n
 250 Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR)
 251 import.  Use 0 to disable and 1 to enable.
 252
 253 An extreme rewind import normally performs a full traversal of all
 254 blocks in the pool for verification.  If this parameter is set to 0,
 255 the traversal skips non-metadata blocks.  It can be toggled once the
 256 import has started to stop or start the traversal of non-metadata blocks.
 257 .sp
 258 Default value: 1
 259 .RE
 260
 261 .sp
 262 .ne 2
 263 .na
 264 \fBspa_load_verify_metadata\fR (int)
 265 .ad
 266 .RS 12n
 267 Whether to traverse blocks during an "extreme rewind" (\fB-X\fR)
 268 pool import.  Use 0 to disable and 1 to enable.
 269
 270 An extreme rewind import normally performs a full traversal of all
 271 blocks in the pool for verification.  If this parameter is set to 1,
 272 the traversal is not performed.  It can be toggled once the import has
 273 started to stop or start the traversal.
 274 .sp
 275 Default value: 1
 276 .RE
 277
 278 .sp
 279 .ne 2
 280 .na
 281 \fBspa_load_verify_maxinflight\fR (int)
 282 .ad
 283 .RS 12n
 284 Maximum concurrent I/Os during the traversal performed during an "extreme
 285 rewind" (\fB-X\fR) pool import.
 286 .sp
 287 Default value: 10000
 288 .RE
 289
 290 .sp
 291 .ne 2
 292 .na
 293 \fBzfetch_array_rd_sz\fR (ulong)
 294 .ad
 295 .RS 12n
 296 If prefetching is enabled, disable prefetching for reads larger than this size.
 297 .sp
 298 Default value: \fB1,048,576\fR.
 299 .RE
 300
 301 .sp
 302 .ne 2
 303 .na
 304 \fBzfetch_block_cap\fR (uint)
 305 .ad
 306 .RS 12n
 307 Max number of blocks to prefetch at a time
 308 .sp
 309 Default value: \fB256\fR.
 310 .RE
 311
 312 .sp
 313 .ne 2
 314 .na
 315 \fBzfetch_max_streams\fR (uint)
 316 .ad
 317 .RS 12n
 318 Max number of streams per zfetch (prefetch streams per file).
 319 .sp
 320 Default value: \fB8\fR.
 321 .RE
 322
 323 .sp
 324 .ne 2
 325 .na
 326 \fBzfetch_min_sec_reap\fR (uint)
 327 .ad
 328 .RS 12n
 329 Min time before an active prefetch stream can be reclaimed
 330 .sp
 331 Default value: \fB2\fR.
 332 .RE
 333
 334 .sp
 335 .ne 2
 336 .na
 337 \fBzfs_arc_average_blocksize\fR (int)
 338 .ad
 339 .RS 12n
 340 The ARC's buffer hash table is sized based on the assumption of an average
 341 block size of \fBzfs_arc_average_blocksize\fR (default 8K).  This works out
 342 to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers.
 343 For configurations with a known larger average block size this value can be
 344 increased to reduce the memory footprint.
 345
 346 .sp
 347 Default value: \fB8192\fR.
 348 .RE
 349
 350 .sp
 351 .ne 2
 352 .na
 353 \fBzfs_arc_grow_retry\fR (int)
 354 .ad
 355 .RS 12n
 356 Seconds before growing arc size
 357 .sp
 358 Default value: \fB5\fR.
 359 .RE
 360
 361 .sp
 362 .ne 2
 363 .na
 364 \fBzfs_arc_max\fR (ulong)
 365 .ad
 366 .RS 12n
 367 Max arc size
 368 .sp
 369 Default value: \fB0\fR.
 370 .RE
 371
 372 .sp
 373 .ne 2
 374 .na
 375 \fBzfs_arc_memory_throttle_disable\fR (int)
 376 .ad
 377 .RS 12n
 378 Disable memory throttle
 379 .sp
 380 Use \fB1\fR for yes (default) and \fB0\fR to disable.
 381 .RE
 382
 383 .sp
 384 .ne 2
 385 .na
 386 \fBzfs_arc_meta_limit\fR (ulong)
 387 .ad
 388 .RS 12n
 389 Meta limit for arc size
 390 .sp
 391 Default value: \fB0\fR.
 392 .RE
 393
 394 .sp
 395 .ne 2
 396 .na
 397 \fBzfs_arc_meta_prune\fR (int)
 398 .ad
 399 .RS 12n
 400 Bytes of meta data to prune
 401 .sp
 402 Default value: \fB1,048,576\fR.
 403 .RE
 404
 405 .sp
 406 .ne 2
 407 .na
 408 \fBzfs_arc_min\fR (ulong)
 409 .ad
 410 .RS 12n
 411 Min arc size
 412 .sp
 413 Default value: \fB100\fR.
 414 .RE
 415
 416 .sp
 417 .ne 2
 418 .na
 419 \fBzfs_arc_min_prefetch_lifespan\fR (int)
 420 .ad
 421 .RS 12n
 422 Min life of prefetch block
 423 .sp
 424 Default value: \fB100\fR.
 425 .RE
 426
 427 .sp
 428 .ne 2
 429 .na
 430 \fBzfs_arc_p_aggressive_disable\fR (int)
 431 .ad
 432 .RS 12n
 433 Disable aggressive arc_p growth
 434 .sp
 435 Use \fB1\fR for yes (default) and \fB0\fR to disable.
 436 .RE
 437
 438 .sp
 439 .ne 2
 440 .na
 441 \fBzfs_arc_p_dampener_disable\fR (int)
 442 .ad
 443 .RS 12n
 444 Disable arc_p adapt dampener
 445 .sp
 446 Use \fB1\fR for yes (default) and \fB0\fR to disable.
 447 .RE
 448
 449 .sp
 450 .ne 2
 451 .na
 452 \fBzfs_arc_shrink_shift\fR (int)
 453 .ad
 454 .RS 12n
 455 log2(fraction of arc to reclaim)
 456 .sp
 457 Default value: \fB5\fR.
 458 .RE
 459
 460 .sp
 461 .ne 2
 462 .na
 463 \fBzfs_autoimport_disable\fR (int)
 464 .ad
 465 .RS 12n
 466 Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR).
 467 .sp
 468 Use \fB1\fR for yes and \fB0\fR for no (default).
 469 .RE
 470
 471 .sp
 472 .ne 2
 473 .na
 474 \fBzfs_dbuf_state_index\fR (int)
 475 .ad
 476 .RS 12n
 477 Calculate arc header index
 478 .sp
 479 Default value: \fB0\fR.
 480 .RE
 481
 482 .sp
 483 .ne 2
 484 .na
 485 \fBzfs_deadman_enabled\fR (int)
 486 .ad
 487 .RS 12n
 488 Enable deadman timer
 489 .sp
 490 Use \fB1\fR for yes (default) and \fB0\fR to disable.
 491 .RE
 492
 493 .sp
 494 .ne 2
 495 .na
 496 \fBzfs_deadman_synctime_ms\fR (ulong)
 497 .ad
 498 .RS 12n
 499 Expiration time in milliseconds. This value has two meanings. First it is
 500 used to determine when the spa_deadman() logic should fire. By default the
 501 spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
 502 Secondly, the value determines if an I/O is considered "hung". Any I/O that
 503 has not completed in zfs_deadman_synctime_ms is considered "hung" resulting
 504 in a zevent being logged.
 505 .sp
 506 Default value: \fB1,000,000\fR.
 507 .RE
 508
 509 .sp
 510 .ne 2
 511 .na
 512 \fBzfs_dedup_prefetch\fR (int)
 513 .ad
 514 .RS 12n
 515 Enable prefetching dedup-ed blks
 516 .sp
 517 Use \fB1\fR for yes and \fB0\fR to disable (default).
 518 .RE
 519
 520 .sp
 521 .ne 2
 522 .na
 523 \fBzfs_delay_min_dirty_percent\fR (int)
 524 .ad
 525 .RS 12n
 526 Start to delay each transaction once there is this amount of dirty data,
 527 expressed as a percentage of \fBzfs_dirty_data_max\fR.
 528 This value should be >= zfs_vdev_async_write_active_max_dirty_percent.
 529 See the section "ZFS TRANSACTION DELAY".
 530 .sp
 531 Default value: \fB60\fR.
 532 .RE
 533
 534 .sp
 535 .ne 2
 536 .na
 537 \fBzfs_delay_scale\fR (int)
 538 .ad
 539 .RS 12n
 540 This controls how quickly the transaction delay approaches infinity.
 541 Larger values cause longer delays for a given amount of dirty data.
 542 .sp
 543 For the smoothest delay, this value should be about 1 billion divided
 544 by the maximum number of operations per second.  This will smoothly
 545 handle between 10x and 1/10th this number.
 546 .sp
 547 See the section "ZFS TRANSACTION DELAY".
 548 .sp
 549 Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
 550 .sp
 551 Default value: \fB500,000\fR.
 552 .RE
 553
 554 .sp
 555 .ne 2
 556 .na
 557 \fBzfs_dirty_data_max\fR (int)
 558 .ad
 559 .RS 12n
 560 Determines the dirty space limit in bytes.  Once this limit is exceeded, new
 561 writes are halted until space frees up. This parameter takes precedence
 562 over \fBzfs_dirty_data_max_percent\fR.
 563 See the section "ZFS TRANSACTION DELAY".
 564 .sp
 565 Default value: 10 percent of all memory, capped at \fBzfs_dirty_data_max_max\fR.
 566 .RE
 567
 568 .sp
 569 .ne 2
 570 .na
 571 \fBzfs_dirty_data_max_max\fR (int)
 572 .ad
 573 .RS 12n
 574 Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
 575 This limit is only enforced at module load time, and will be ignored if
 576 \fBzfs_dirty_data_max\fR is later changed.  This parameter takes
 577 precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
 578 "ZFS TRANSACTION DELAY".
 579 .sp
 580 Default value: 25% of physical RAM.
 581 .RE
 582
 583 .sp
 584 .ne 2
 585 .na
 586 \fBzfs_dirty_data_max_max_percent\fR (int)
 587 .ad
 588 .RS 12n
 589 Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
 590 percentage of physical RAM.  This limit is only enforced at module load
 591 time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
 592 The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
 593 one. See the section "ZFS TRANSACTION DELAY".
 594 .sp
 595 Default value: 25
 596 .RE
 597
 598 .sp
 599 .ne 2
 600 .na
 601 \fBzfs_dirty_data_max_percent\fR (int)
 602 .ad
 603 .RS 12n
 604 Determines the dirty space limit, expressed as a percentage of all
 605 memory.  Once this limit is exceeded, new writes are halted until space frees
 606 up.  The parameter \fBzfs_dirty_data_max\fR takes precedence over this
 607 one.  See the section "ZFS TRANSACTION DELAY".
 608 .sp
 609 Default value: 10%, subject to \fBzfs_dirty_data_max_max\fR.
 610 .RE
 611
 612 .sp
 613 .ne 2
 614 .na
 615 \fBzfs_dirty_data_sync\fR (int)
 616 .ad
 617 .RS 12n
 618 Start syncing out a transaction group if there is at least this much dirty data.
 619 .sp
 620 Default value: \fB67,108,864\fR.
 621 .RE
 622
 623 .sp
 624 .ne 2
 625 .na
 626 \fBzfs_vdev_async_read_max_active\fR (int)
 627 .ad
 628 .RS 12n
 629 Maxium asynchronous read I/Os active to each device.
 630 See the section "ZFS I/O SCHEDULER".
 631 .sp
 632 Default value: \fB3\fR.
 633 .RE
 634
 635 .sp
 636 .ne 2
 637 .na
 638 \fBzfs_vdev_async_read_min_active\fR (int)
 639 .ad
 640 .RS 12n
 641 Minimum asynchronous read I/Os active to each device.
 642 See the section "ZFS I/O SCHEDULER".
 643 .sp
 644 Default value: \fB1\fR.
 645 .RE
 646
 647 .sp
 648 .ne 2
 649 .na
 650 \fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
 651 .ad
 652 .RS 12n
 653 When the pool has more than
 654 \fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
 655 \fBzfs_vdev_async_write_max_active\fR to limit active async writes.  If
 656 the dirty data is between min and max, the active I/O limit is linearly
 657 interpolated. See the section "ZFS I/O SCHEDULER".
 658 .sp
 659 Default value: \fB60\fR.
 660 .RE
 661
 662 .sp
 663 .ne 2
 664 .na
 665 \fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
 666 .ad
 667 .RS 12n
 668 When the pool has less than
 669 \fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
 670 \fBzfs_vdev_async_write_min_active\fR to limit active async writes.  If
 671 the dirty data is between min and max, the active I/O limit is linearly
 672 interpolated. See the section "ZFS I/O SCHEDULER".
 673 .sp
 674 Default value: \fB30\fR.
 675 .RE
 676
 677 .sp
 678 .ne 2
 679 .na
 680 \fBzfs_vdev_async_write_max_active\fR (int)
 681 .ad
 682 .RS 12n
 683 Maxium asynchronous write I/Os active to each device.
 684 See the section "ZFS I/O SCHEDULER".
 685 .sp
 686 Default value: \fB10\fR.
 687 .RE
 688
 689 .sp
 690 .ne 2
 691 .na
 692 \fBzfs_vdev_async_write_min_active\fR (int)
 693 .ad
 694 .RS 12n
 695 Minimum asynchronous write I/Os active to each device.
 696 See the section "ZFS I/O SCHEDULER".
 697 .sp
 698 Default value: \fB1\fR.
 699 .RE
 700
 701 .sp
 702 .ne 2
 703 .na
 704 \fBzfs_vdev_max_active\fR (int)
 705 .ad
 706 .RS 12n
 707 The maximum number of I/Os active to each device.  Ideally, this will be >=
 708 the sum of each queue's max_active.  It must be at least the sum of each
 709 queue's min_active.  See the section "ZFS I/O SCHEDULER".
 710 .sp
 711 Default value: \fB1,000\fR.
 712 .RE
 713
 714 .sp
 715 .ne 2
 716 .na
 717 \fBzfs_vdev_scrub_max_active\fR (int)
 718 .ad
 719 .RS 12n
 720 Maxium scrub I/Os active to each device.
 721 See the section "ZFS I/O SCHEDULER".
 722 .sp
 723 Default value: \fB2\fR.
 724 .RE
 725
 726 .sp
 727 .ne 2
 728 .na
 729 \fBzfs_vdev_scrub_min_active\fR (int)
 730 .ad
 731 .RS 12n
 732 Minimum scrub I/Os active to each device.
 733 See the section "ZFS I/O SCHEDULER".
 734 .sp
 735 Default value: \fB1\fR.
 736 .RE
 737
 738 .sp
 739 .ne 2
 740 .na
 741 \fBzfs_vdev_sync_read_max_active\fR (int)
 742 .ad
 743 .RS 12n
 744 Maxium synchronous read I/Os active to each device.
 745 See the section "ZFS I/O SCHEDULER".
 746 .sp
 747 Default value: \fB10\fR.
 748 .RE
 749
 750 .sp
 751 .ne 2
 752 .na
 753 \fBzfs_vdev_sync_read_min_active\fR (int)
 754 .ad
 755 .RS 12n
 756 Minimum synchronous read I/Os active to each device.
 757 See the section "ZFS I/O SCHEDULER".
 758 .sp
 759 Default value: \fB10\fR.
 760 .RE
 761
 762 .sp
 763 .ne 2
 764 .na
 765 \fBzfs_vdev_sync_write_max_active\fR (int)
 766 .ad
 767 .RS 12n
 768 Maxium synchronous write I/Os active to each device.
 769 See the section "ZFS I/O SCHEDULER".
 770 .sp
 771 Default value: \fB10\fR.
 772 .RE
 773
 774 .sp
 775 .ne 2
 776 .na
 777 \fBzfs_vdev_sync_write_min_active\fR (int)
 778 .ad
 779 .RS 12n
 780 Minimum synchronous write I/Os active to each device.
 781 See the section "ZFS I/O SCHEDULER".
 782 .sp
 783 Default value: \fB10\fR.
 784 .RE
 785
 786 .sp
 787 .ne 2
 788 .na
 789 \fBzfs_disable_dup_eviction\fR (int)
 790 .ad
 791 .RS 12n
 792 Disable duplicate buffer eviction
 793 .sp
 794 Use \fB1\fR for yes and \fB0\fR for no (default).
 795 .RE
 796
 797 .sp
 798 .ne 2
 799 .na
 800 \fBzfs_expire_snapshot\fR (int)
 801 .ad
 802 .RS 12n
 803 Seconds to expire .zfs/snapshot
 804 .sp
 805 Default value: \fB300\fR.
 806 .RE
 807
 808 .sp
 809 .ne 2
 810 .na
 811 \fBzfs_flags\fR (int)
 812 .ad
 813 .RS 12n
 814 Set additional debugging flags
 815 .sp
 816 Default value: \fB1\fR.
 817 .RE
 818
 819 .sp
 820 .ne 2
 821 .na
 822 \fBzfs_free_leak_on_eio\fR (int)
 823 .ad
 824 .RS 12n
 825 If destroy encounters an EIO while reading metadata (e.g. indirect
 826 blocks), space referenced by the missing metadata can not be freed.
 827 Normally this causes the background destroy to become "stalled", as
 828 it is unable to make forward progress.  While in this stalled state,
 829 all remaining space to free from the error-encountering filesystem is
 830 "temporarily leaked".  Set this flag to cause it to ignore the EIO,
 831 permanently leak the space from indirect blocks that can not be read,
 832 and continue to free everything else that it can.
 833
 834 The default, "stalling" behavior is useful if the storage partially
 835 fails (i.e. some but not all i/os fail), and then later recovers.  In
 836 this case, we will be able to continue pool operations while it is
 837 partially failed, and when it recovers, we can continue to free the
 838 space, with no leaks.  However, note that this case is actually
 839 fairly rare.
 840
 841 Typically pools either (a) fail completely (but perhaps temporarily,
 842 e.g. a top-level vdev going offline), or (b) have localized,
 843 permanent errors (e.g. disk returns the wrong data due to bit flip or
 844 firmware bug).  In case (a), this setting does not matter because the
 845 pool will be suspended and the sync thread will not be able to make
 846 forward progress regardless.  In case (b), because the error is
 847 permanent, the best we can do is leak the minimum amount of space,
 848 which is what setting this flag will do.  Therefore, it is reasonable
 849 for this flag to normally be set, but we chose the more conservative
 850 approach of not setting it, so that there is no possibility of
 851 leaking space in the "partial temporary" failure case.
 852 .sp
 853 Default value: \fB0\fR.
 854 .RE
 855
 856 .sp
 857 .ne 2
 858 .na
 859 \fBzfs_free_min_time_ms\fR (int)
 860 .ad
 861 .RS 12n
 862 Min millisecs to free per txg
 863 .sp
 864 Default value: \fB1,000\fR.
 865 .RE
 866
 867 .sp
 868 .ne 2
 869 .na
 870 \fBzfs_immediate_write_sz\fR (long)
 871 .ad
 872 .RS 12n
 873 Largest data block to write to zil
 874 .sp
 875 Default value: \fB32,768\fR.
 876 .RE
 877
 878 .sp
 879 .ne 2
 880 .na
 881 \fBzfs_mdcomp_disable\fR (int)
 882 .ad
 883 .RS 12n
 884 Disable meta data compression
 885 .sp
 886 Use \fB1\fR for yes and \fB0\fR for no (default).
 887 .RE
 888
 889 .sp
 890 .ne 2
 891 .na
 892 \fBzfs_metaslab_fragmentation_threshold\fR (int)
 893 .ad
 894 .RS 12n
 895 Allow metaslabs to keep their active state as long as their fragmentation
 896 percentage is less than or equal to this value. An active metaslab that
 897 exceeds this threshold will no longer keep its active status allowing
 898 better metaslabs to be selected.
 899 .sp
 900 Default value: \fB70\fR.
 901 .RE
 902
 903 .sp
 904 .ne 2
 905 .na
 906 \fBzfs_mg_fragmentation_threshold\fR (int)
 907 .ad
 908 .RS 12n
 909 Metaslab groups are considered eligible for allocations if their
 910 fragmenation metric (measured as a percentage) is less than or equal to
 911 this value. If a metaslab group exceeds this threshold then it will be
 912 skipped unless all metaslab groups within the metaslab class have also
 913 crossed this threshold.
 914 .sp
 915 Default value: \fB85\fR.
 916 .RE
 917
 918 .sp
 919 .ne 2
 920 .na
 921 \fBzfs_mg_noalloc_threshold\fR (int)
 922 .ad
 923 .RS 12n
 924 Defines a threshold at which metaslab groups should be eligible for
 925 allocations.  The value is expressed as a percentage of free space
 926 beyond which a metaslab group is always eligible for allocations.
 927 If a metaslab group's free space is less than or equal to the
 928 the threshold, the allocator will avoid allocating to that group
 929 unless all groups in the pool have reached the threshold.  Once all
 930 groups have reached the threshold, all groups are allowed to accept
 931 allocations.  The default value of 0 disables the feature and causes
 932 all metaslab groups to be eligible for allocations.
 933
 934 This parameter allows to deal with pools having heavily imbalanced
 935 vdevs such as would be the case when a new vdev has been added.
 936 Setting the threshold to a non-zero percentage will stop allocations
 937 from being made to vdevs that aren't filled to the specified percentage
 938 and allow lesser filled vdevs to acquire more allocations than they
 939 otherwise would under the old \fBzfs_mg_alloc_failures\fR facility.
 940 .sp
 941 Default value: \fB0\fR.
 942 .RE
 943
 944 .sp
 945 .ne 2
 946 .na
 947 \fBzfs_no_scrub_io\fR (int)
 948 .ad
 949 .RS 12n
 950 Set for no scrub I/O
 951 .sp
 952 Use \fB1\fR for yes and \fB0\fR for no (default).
 953 .RE
 954
 955 .sp
 956 .ne 2
 957 .na
 958 \fBzfs_no_scrub_prefetch\fR (int)
 959 .ad
 960 .RS 12n
 961 Set for no scrub prefetching
 962 .sp
 963 Use \fB1\fR for yes and \fB0\fR for no (default).
 964 .RE
 965
 966 .sp
 967 .ne 2
 968 .na
 969 \fBzfs_nocacheflush\fR (int)
 970 .ad
 971 .RS 12n
 972 Disable cache flushes
 973 .sp
 974 Use \fB1\fR for yes and \fB0\fR for no (default).
 975 .RE
 976
 977 .sp
 978 .ne 2
 979 .na
 980 \fBzfs_nopwrite_enabled\fR (int)
 981 .ad
 982 .RS 12n
 983 Enable NOP writes
 984 .sp
 985 Use \fB1\fR for yes (default) and \fB0\fR to disable.
 986 .RE
 987
 988 .sp
 989 .ne 2
 990 .na
 991 \fBzfs_pd_blks_max\fR (int)
 992 .ad
 993 .RS 12n
 994 Max number of blocks to prefetch
 995 .sp
 996 Default value: \fB100\fR.
 997 .RE
 998
 999 .sp
1000 .ne 2
1001 .na
1002 \fBzfs_prefetch_disable\fR (int)
1003 .ad
1004 .RS 12n
1005 Disable all ZFS prefetching
1006 .sp
1007 Use \fB1\fR for yes and \fB0\fR for no (default).
1008 .RE
1009
1010 .sp
1011 .ne 2
1012 .na
1013 \fBzfs_read_chunk_size\fR (long)
1014 .ad
1015 .RS 12n
1016 Bytes to read per chunk
1017 .sp
1018 Default value: \fB1,048,576\fR.
1019 .RE
1020
1021 .sp
1022 .ne 2
1023 .na
1024 \fBzfs_read_history\fR (int)
1025 .ad
1026 .RS 12n
1027 Historic statistics for the last N reads
1028 .sp
1029 Default value: \fB0\fR.
1030 .RE
1031
1032 .sp
1033 .ne 2
1034 .na
1035 \fBzfs_read_history_hits\fR (int)
1036 .ad
1037 .RS 12n
1038 Include cache hits in read history
1039 .sp
1040 Use \fB1\fR for yes and \fB0\fR for no (default).
1041 .RE
1042
1043 .sp
1044 .ne 2
1045 .na
1046 \fBzfs_recover\fR (int)
1047 .ad
1048 .RS 12n
1049 Set to attempt to recover from fatal errors. This should only be used as a
1050 last resort, as it typically results in leaked space, or worse.
1051 .sp
1052 Use \fB1\fR for yes and \fB0\fR for no (default).
1053 .RE
1054
1055 .sp
1056 .ne 2
1057 .na
1058 \fBzfs_resilver_delay\fR (int)
1059 .ad
1060 .RS 12n
1061 Number of ticks to delay prior to issuing a resilver I/O operation when
1062 a non-resilver or non-scrub I/O operation has occurred within the past
1063 \fBzfs_scan_idle\fR ticks.
1064 .sp
1065 Default value: \fB2\fR.
1066 .RE
1067
1068 .sp
1069 .ne 2
1070 .na
1071 \fBzfs_resilver_min_time_ms\fR (int)
1072 .ad
1073 .RS 12n
1074 Min millisecs to resilver per txg
1075 .sp
1076 Default value: \fB3,000\fR.
1077 .RE
1078
1079 .sp
1080 .ne 2
1081 .na
1082 \fBzfs_scan_idle\fR (int)
1083 .ad
1084 .RS 12n
1085 Idle window in clock ticks.  During a scrub or a resilver, if
1086 a non-scrub or non-resilver I/O operation has occurred during this
1087 window, the next scrub or resilver operation is delayed by, respectively
1088 \fBzfs_scrub_delay\fR or \fBzfs_resilver_delay\fR ticks.
1089 .sp
1090 Default value: \fB50\fR.
1091 .RE
1092
1093 .sp
1094 .ne 2
1095 .na
1096 \fBzfs_scan_min_time_ms\fR (int)
1097 .ad
1098 .RS 12n
1099 Min millisecs to scrub per txg
1100 .sp
1101 Default value: \fB1,000\fR.
1102 .RE
1103
1104 .sp
1105 .ne 2
1106 .na
1107 \fBzfs_scrub_delay\fR (int)
1108 .ad
1109 .RS 12n
1110 Number of ticks to delay prior to issuing a scrub I/O operation when
1111 a non-scrub or non-resilver I/O operation has occurred within the past
1112 \fBzfs_scan_idle\fR ticks.
1113 .sp
1114 Default value: \fB4\fR.
1115 .RE
1116
1117 .sp
1118 .ne 2
1119 .na
1120 \fBzfs_send_corrupt_data\fR (int)
1121 .ad
1122 .RS 12n
1123 Allow to send corrupt data (ignore read/checksum errors when sending data)
1124 .sp
1125 Use \fB1\fR for yes and \fB0\fR for no (default).
1126 .RE
1127
1128 .sp
1129 .ne 2
1130 .na
1131 \fBzfs_sync_pass_deferred_free\fR (int)
1132 .ad
1133 .RS 12n
1134 Defer frees starting in this pass
1135 .sp
1136 Default value: \fB2\fR.
1137 .RE
1138
1139 .sp
1140 .ne 2
1141 .na
1142 \fBzfs_sync_pass_dont_compress\fR (int)
1143 .ad
1144 .RS 12n
1145 Don't compress starting in this pass
1146 .sp
1147 Default value: \fB5\fR.
1148 .RE
1149
1150 .sp
1151 .ne 2
1152 .na
1153 \fBzfs_sync_pass_rewrite\fR (int)
1154 .ad
1155 .RS 12n
1156 Rewrite new bps starting in this pass
1157 .sp
1158 Default value: \fB2\fR.
1159 .RE
1160
1161 .sp
1162 .ne 2
1163 .na
1164 \fBzfs_top_maxinflight\fR (int)
1165 .ad
1166 .RS 12n
1167 Max I/Os per top-level vdev during scrub or resilver operations.
1168 .sp
1169 Default value: \fB32\fR.
1170 .RE
1171
1172 .sp
1173 .ne 2
1174 .na
1175 \fBzfs_txg_history\fR (int)
1176 .ad
1177 .RS 12n
1178 Historic statistics for the last N txgs
1179 .sp
1180 Default value: \fB0\fR.
1181 .RE
1182
1183 .sp
1184 .ne 2
1185 .na
1186 \fBzfs_txg_timeout\fR (int)
1187 .ad
1188 .RS 12n
1189 Max seconds worth of delta per txg
1190 .sp
1191 Default value: \fB5\fR.
1192 .RE
1193
1194 .sp
1195 .ne 2
1196 .na
1197 \fBzfs_vdev_aggregation_limit\fR (int)
1198 .ad
1199 .RS 12n
1200 Max vdev I/O aggregation size
1201 .sp
1202 Default value: \fB131,072\fR.
1203 .RE
1204
1205 .sp
1206 .ne 2
1207 .na
1208 \fBzfs_vdev_cache_bshift\fR (int)
1209 .ad
1210 .RS 12n
1211 Shift size to inflate reads too
1212 .sp
1213 Default value: \fB16\fR.
1214 .RE
1215
1216 .sp
1217 .ne 2
1218 .na
1219 \fBzfs_vdev_cache_max\fR (int)
1220 .ad
1221 .RS 12n
1222 Inflate reads small than max
1223 .RE
1224
1225 .sp
1226 .ne 2
1227 .na
1228 \fBzfs_vdev_cache_size\fR (int)
1229 .ad
1230 .RS 12n
1231 Total size of the per-disk cache
1232 .sp
1233 Default value: \fB0\fR.
1234 .RE
1235
1236 .sp
1237 .ne 2
1238 .na
1239 \fBzfs_vdev_mirror_switch_us\fR (int)
1240 .ad
1241 .RS 12n
1242 Switch mirrors every N usecs
1243 .sp
1244 Default value: \fB10,000\fR.
1245 .RE
1246
1247 .sp
1248 .ne 2
1249 .na
1250 \fBzfs_vdev_read_gap_limit\fR (int)
1251 .ad
1252 .RS 12n
1253 Aggregate read I/O over gap
1254 .sp
1255 Default value: \fB32,768\fR.
1256 .RE
1257
1258 .sp
1259 .ne 2
1260 .na
1261 \fBzfs_vdev_scheduler\fR (charp)
1262 .ad
1263 .RS 12n
1264 I/O scheduler
1265 .sp
1266 Default value: \fBnoop\fR.
1267 .RE
1268
1269 .sp
1270 .ne 2
1271 .na
1272 \fBzfs_vdev_write_gap_limit\fR (int)
1273 .ad
1274 .RS 12n
1275 Aggregate write I/O over gap
1276 .sp
1277 Default value: \fB4,096\fR.
1278 .RE
1279
1280 .sp
1281 .ne 2
1282 .na
1283 \fBzfs_zevent_cols\fR (int)
1284 .ad
1285 .RS 12n
1286 Max event column width
1287 .sp
1288 Default value: \fB80\fR.
1289 .RE
1290
1291 .sp
1292 .ne 2
1293 .na
1294 \fBzfs_zevent_console\fR (int)
1295 .ad
1296 .RS 12n
1297 Log events to the console
1298 .sp
1299 Use \fB1\fR for yes and \fB0\fR for no (default).
1300 .RE
1301
1302 .sp
1303 .ne 2
1304 .na
1305 \fBzfs_zevent_len_max\fR (int)
1306 .ad
1307 .RS 12n
1308 Max event queue length
1309 .sp
1310 Default value: \fB0\fR.
1311 .RE
1312
1313 .sp
1314 .ne 2
1315 .na
1316 \fBzil_replay_disable\fR (int)
1317 .ad
1318 .RS 12n
1319 Disable intent logging replay
1320 .sp
1321 Use \fB1\fR for yes and \fB0\fR for no (default).
1322 .RE
1323
1324 .sp
1325 .ne 2
1326 .na
1327 \fBzil_slog_limit\fR (ulong)
1328 .ad
1329 .RS 12n
1330 Max commit bytes to separate log device
1331 .sp
1332 Default value: \fB1,048,576\fR.
1333 .RE
1334
1335 .sp
1336 .ne 2
1337 .na
1338 \fBzio_bulk_flags\fR (int)
1339 .ad
1340 .RS 12n
1341 Additional flags to pass to bulk buffers
1342 .sp
1343 Default value: \fB0\fR.
1344 .RE
1345
1346 .sp
1347 .ne 2
1348 .na
1349 \fBzio_delay_max\fR (int)
1350 .ad
1351 .RS 12n
1352 Max zio millisec delay before posting event
1353 .sp
1354 Default value: \fB30,000\fR.
1355 .RE
1356
1357 .sp
1358 .ne 2
1359 .na
1360 \fBzio_injection_enabled\fR (int)
1361 .ad
1362 .RS 12n
1363 Enable fault injection
1364 .sp
1365 Use \fB1\fR for yes and \fB0\fR for no (default).
1366 .RE
1367
1368 .sp
1369 .ne 2
1370 .na
1371 \fBzio_requeue_io_start_cut_in_line\fR (int)
1372 .ad
1373 .RS 12n
1374 Prioritize requeued I/O
1375 .sp
1376 Default value: \fB0\fR.
1377 .RE
1378
1379 .sp
1380 .ne 2
1381 .na
1382 \fBzvol_inhibit_dev\fR (uint)
1383 .ad
1384 .RS 12n
1385 Do not create zvol device nodes
1386 .sp
1387 Use \fB1\fR for yes and \fB0\fR for no (default).
1388 .RE
1389
1390 .sp
1391 .ne 2
1392 .na
1393 \fBzvol_major\fR (uint)
1394 .ad
1395 .RS 12n
1396 Major number for zvol device
1397 .sp
1398 Default value: \fB230\fR.
1399 .RE
1400
1401 .sp
1402 .ne 2
1403 .na
1404 \fBzvol_max_discard_blocks\fR (ulong)
1405 .ad
1406 .RS 12n
1407 Max number of blocks to discard at once
1408 .sp
1409 Default value: \fB16,384\fR.
1410 .RE
1411
1412 .sp
1413 .ne 2
1414 .na
1415 \fBzvol_threads\fR (uint)
1416 .ad
1417 .RS 12n
1418 Number of threads for zvol device
1419 .sp
1420 Default value: \fB32\fR.
1421 .RE
1422
1423 .SH ZFS I/O SCHEDULER
1424 ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
1425 The I/O scheduler determines when and in what order those operations are
1426 issued.  The I/O scheduler divides operations into five I/O classes
1427 prioritized in the following order: sync read, sync write, async read,
1428 async write, and scrub/resilver.  Each queue defines the minimum and
1429 maximum number of concurrent operations that may be issued to the
1430 device.  In addition, the device has an aggregate maximum,
1431 \fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
1432 must not exceed the aggregate maximum.  If the sum of the per-queue
1433 maximums exceeds the aggregate maximum, then the number of active I/Os
1434 may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
1435 be issued regardless of whether all per-queue minimums have been met.
1436 .sp
1437 For many physical devices, throughput increases with the number of
1438 concurrent operations, but latency typically suffers. Further, physical
1439 devices typically have a limit at which more concurrent operations have no
1440 effect on throughput or can actually cause it to decrease.
1441 .sp
1442 The scheduler selects the next operation to issue by first looking for an
1443 I/O class whose minimum has not been satisfied. Once all are satisfied and
1444 the aggregate maximum has not been hit, the scheduler looks for classes
1445 whose maximum has not been satisfied. Iteration through the I/O classes is
1446 done in the order specified above. No further operations are issued if the
1447 aggregate maximum number of concurrent operations has been hit or if there
1448 are no operations queued for an I/O class that has not hit its maximum.
1449 Every time an I/O is queued or an operation completes, the I/O scheduler
1450 looks for new operations to issue.
1451 .sp
1452 In general, smaller max_active's will lead to lower latency of synchronous
1453 operations.  Larger max_active's may lead to higher overall throughput,
1454 depending on underlying storage.
1455 .sp
1456 The ratio of the queues' max_actives determines the balance of performance
1457 between reads, writes, and scrubs.  E.g., increasing
1458 \fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
1459 more quickly, but reads and writes to have higher latency and lower throughput.
1460 .sp
1461 All I/O classes have a fixed maximum number of outstanding operations
1462 except for the async write class. Asynchronous writes represent the data
1463 that is committed to stable storage during the syncing stage for
1464 transaction groups. Transaction groups enter the syncing state
1465 periodically so the number of queued async writes will quickly burst up
1466 and then bleed down to zero. Rather than servicing them as quickly as
1467 possible, the I/O scheduler changes the maximum number of active async
1468 write I/Os according to the amount of dirty data in the pool.  Since
1469 both throughput and latency typically increase with the number of
1470 concurrent operations issued to physical devices, reducing the
1471 burstiness in the number of concurrent operations also stabilizes the
1472 response time of operations from other -- and in particular synchronous
1473 -- queues. In broad strokes, the I/O scheduler will issue more
1474 concurrent operations from the async write queue as there's more dirty
1475 data in the pool.
1476 .sp
1477 Async Writes
1478 .sp
1479 The number of concurrent operations issued for the async write I/O class
1480 follows a piece-wise linear function defined by a few adjustable points.
1481 .nf
1482
1483        |              o---------| <-- zfs_vdev_async_write_max_active
1484   ^    |             /^         |
1485   |    |            / |         |
1486 active |           /  |         |
1487  I/O   |          /   |         |
1488 count  |         /    |         |
1489        |        /     |         |
1490        |-------o      |         | <-- zfs_vdev_async_write_min_active
1491       0|_______^______|_________|
1492        0%      |      |       100% of zfs_dirty_data_max
1493                |      |
1494                |      `-- zfs_vdev_async_write_active_max_dirty_percent
1495                `--------- zfs_vdev_async_write_active_min_dirty_percent
1496
1497 .fi
1498 Until the amount of dirty data exceeds a minimum percentage of the dirty
1499 data allowed in the pool, the I/O scheduler will limit the number of
1500 concurrent operations to the minimum. As that threshold is crossed, the
1501 number of concurrent operations issued increases linearly to the maximum at
1502 the specified maximum percentage of the dirty data allowed in the pool.
1503 .sp
1504 Ideally, the amount of dirty data on a busy pool will stay in the sloped
1505 part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
1506 and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
1507 maximum percentage, this indicates that the rate of incoming data is
1508 greater than the rate that the backend storage can handle. In this case, we
1509 must further throttle incoming writes, as described in the next section.
1510
1511 .SH ZFS TRANSACTION DELAY
1512 We delay transactions when we've determined that the backend storage
1513 isn't able to accommodate the rate of incoming writes.
1514 .sp
1515 If there is already a transaction waiting, we delay relative to when
1516 that transaction will finish waiting.  This way the calculated delay time
1517 is independent of the number of threads concurrently executing
1518 transactions.
1519 .sp
1520 If we are the only waiter, wait relative to when the transaction
1521 started, rather than the current time.  This credits the transaction for
1522 "time already served", e.g. reading indirect blocks.
1523 .sp
1524 The minimum time for a transaction to take is calculated as:
1525 .nf
1526     min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
1527     min_time is then capped at 100 milliseconds.
1528 .fi
1529 .sp
1530 The delay has two degrees of freedom that can be adjusted via tunables.  The
1531 percentage of dirty data at which we start to delay is defined by
1532 \fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
1533 \fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
1534 delay after writing at full speed has failed to keep up with the incoming write
1535 rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
1536 this variable determines the amount of delay at the midpoint of the curve.
1537 .sp
1538 .nf
1539 delay
1540  10ms +-------------------------------------------------------------*+
1541       |                                                             *|
1542   9ms +                                                             *+
1543       |                                                             *|
1544   8ms +                                                             *+
1545       |                                                            * |
1546   7ms +                                                            * +
1547       |                                                            * |
1548   6ms +                                                            * +
1549       |                                                            * |
1550   5ms +                                                           *  +
1551       |                                                           *  |
1552   4ms +                                                           *  +
1553       |                                                           *  |
1554   3ms +                                                          *   +
1555       |                                                          *   |
1556   2ms +                                              (midpoint) *    +
1557       |                                                  |    **     |
1558   1ms +                                                  v ***       +
1559       |             zfs_delay_scale ---------->     ********         |
1560     0 +-------------------------------------*********----------------+
1561       0%                    <- zfs_dirty_data_max ->               100%
1562 .fi
1563 .sp
1564 Note that since the delay is added to the outstanding time remaining on the
1565 most recent transaction, the delay is effectively the inverse of IOPS.
1566 Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
1567 was chosen such that small changes in the amount of accumulated dirty data
1568 in the first 3/4 of the curve yield relatively small differences in the
1569 amount of delay.
1570 .sp
1571 The effects can be easier to understand when the amount of delay is
1572 represented on a log scale:
1573 .sp
1574 .nf
1575 delay
1576 100ms +-------------------------------------------------------------++
1577       +                                                              +
1578       |                                                              |
1579       +                                                             *+
1580  10ms +                                                             *+
1581       +                                                           ** +
1582       |                                              (midpoint)  **  |
1583       +                                                  |     **    +
1584   1ms +                                                  v ****      +
1585       +             zfs_delay_scale ---------->        *****         +
1586       |                                             ****             |
1587       +                                          ****                +
1588 100us +                                        **                    +
1589       +                                       *                      +
1590       |                                      *                       |
1591       +                                     *                        +
1592  10us +                                     *                        +
1593       +                                                              +
1594       |                                                              |
1595       +                                                              +
1596       +--------------------------------------------------------------+
1597       0%                    <- zfs_dirty_data_max ->               100%
1598 .fi
1599 .sp
1600 Note here that only as the amount of dirty data approaches its limit does
1601 the delay start to increase rapidly. The goal of a properly tuned system
1602 should be to keep the amount of dirty data out of that range by first
1603 ensuring that the appropriate limits are set for the I/O scheduler to reach
1604 optimal throughput on the backend storage, and then by changing the value
1605 of \fBzfs_delay_scale\fR to increase the steepness of the curve.