git.proxmox.com Git - mirror_zfs.git/commit - include/sys/metaslab.h
OpenZFS 7090 - zfs should throttle allocations
author Don Brady <don.brady@intel.com>
Fri, 14 Oct 2016 00:59:18 +0000 (18:59 -0600)
committer Brian Behlendorf <behlendorf1@llnl.gov>
Fri, 14 Oct 2016 00:59:18 +0000 (17:59 -0700)
commit 3dfb57a35e8cbaa7c424611235d669f3c575ada1
tree d0958fdc57be43a540bba035580f0d8b39f1a99c
parent a85a90557dfc70e09475c156a376f6923a6c89f0
OpenZFS 7090 - zfs should throttle allocations

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

When write I/Os are issued, they are issued in block order, but the ZIO
pipeline drives them asynchronously through the allocation stage, which
can result in blocks being allocated out of order. It would be nice to
preserve as much of the logical order as possible.

In addition, allocations are scattered equally across all top-level
vdevs, but not all top-level vdevs are equally capable. The pipeline
should be able to detect devices that are better able to handle
allocations and allocate more blocks to them. This allows allocations
to be distributed dynamically when devices are imbalanced, since fuller
devices tend to be slower than emptier ones.

This change introduces a new pool-wide allocation queue that throttles
and orders allocations in the ZIO pipeline. The queue is ordered by
issue time and offset and provides an initial amount of allocation work
to each top-level vdev. The allocation logic uses a reservation system
to reserve allocations that will be performed by the allocator. Once an
allocation completes successfully, it is scheduled on a given top-level
vdev. Each top-level vdev maintains a maximum number of allocations that
it can handle (mg_alloc_queue_depth). The pool-wide reserved allocations
(number of top-level vdevs * mg_alloc_queue_depth) are distributed across
the top-level vdevs' metaslab groups, rotating round-robin across all
eligible metaslab groups to spread the work. As top-level vdevs complete
their work, they receive additional work from the pool-wide allocation
queue until the queue is empty.

OpenZFS-issue: https://www.illumos.org/issues/7090
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4756c3d7
Closes #5258

Porting Notes:
- Maintained minimal stack in zio_done
- Preserved Linux-specific I/O sizes in zio_write_compress
- Added module params and documentation
- Updated to use optimized AVL cmp macros
18 files changed:
include/sys/fs/zfs.h
include/sys/metaslab.h
include/sys/metaslab_impl.h
include/sys/refcount.h
include/sys/spa_impl.h
include/sys/vdev_impl.h
include/sys/zio.h
include/sys/zio_impl.h
man/man5/zfs-module-parameters.5
module/zfs/metaslab.c
module/zfs/refcount.c
module/zfs/spa.c
module/zfs/spa_misc.c
module/zfs/vdev.c
module/zfs/vdev_cache.c
module/zfs/vdev_mirror.c
module/zfs/vdev_queue.c
module/zfs/zio.c