]> git.proxmox.com Git - mirror_zfs.git/commit - module/zfs/dbuf.c
Improve zfs receive performance with lightweight write
authorMatthew Ahrens <matthew.ahrens@delphix.com>
Fri, 11 Dec 2020 18:26:02 +0000 (10:26 -0800)
committerGitHub <noreply@github.com>
Fri, 11 Dec 2020 18:26:02 +0000 (10:26 -0800)
commitba67d82142bc7034734a49a62998cfc96b1d0038
tree571de79e9293abc687e93aea8912431997d00669
parent7d4b365ce32ecaf97020178f2a847a01f7e35476
Improve zfs receive performance with lightweight write

The performance of `zfs receive` can be bottlenecked on the CPU consumed
by the `receive_writer` thread, especially when receiving streams with
small compressed block sizes.  Much of the CPU is spent creating and
destroying dbuf's and arc buf's, one for each `WRITE` record in the send
stream.

This commit introduces the concept of "lightweight writes", which allows
`zfs receive` to write to the DMU by providing an ABD, and instantiating
only a new type of `dbuf_dirty_record_t`.  The dbuf and arc buf for this
"dirty leaf block" are not instantiated.

Because there is no dbuf with the dirty data, this mechanism doesn't
support reading from "lightweight-dirty" blocks (they would see the
on-disk state rather than the dirty data).  Since the dedup-receive code
has been removed, `zfs receive` is write-only, so this works fine.

Because there are no arc bufs for the received data, the received data
is no longer cached in the ARC.

Testing a receive of a stream with average compressed block size of 4KB,
this commit improves performance by 50%, while also reducing CPU usage
by 50% of a CPU.  On a per-block basis, CPU consumed by receive_writer()
and dbuf_evict() is now 1/7th (14%) of what it was.

Baseline: 450MB/s, CPU in receive_writer() 40% + dbuf_evict() 35%
New: 670MB/s, CPU in receive_writer() 17% + dbuf_evict() 0%

The code is also restructured in a few ways:

Added a `dr_dnode` field to the dbuf_dirty_record_t.  This simplifies
some existing code that no longer needs `DB_DNODE_ENTER()` and related
routines.  The new field is needed by the lightweight-type dirty record.

To ensure that the `dr_dnode` field remains valid until the dirty record
is freed, we have to ensure that the `dnode_move()` doesn't relocate the
dnode_t.  To do this we keep a hold on the dnode until it's zio's have
completed.  This is already done by the user-accounting code
(`userquota_updates_task()`), this commit extends that so that it always
keeps the dnode hold until zio completion (see `dnode_rele_task()`).

`dn_dirty_txg` was previously zeroed when the dnode was synced.  This
was not necessary, since its meaning can be "when was this dnode last
dirtied".  This change simplifies the new `dnode_rele_task()` code.

Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11105
include/sys/dbuf.h
include/sys/dmu_objset.h
module/zfs/dbuf.c
module/zfs/dmu.c
module/zfs/dmu_objset.c
module/zfs/dmu_recv.c
module/zfs/dnode.c
module/zfs/dnode_sync.c
module/zfs/dsl_dataset.c
module/zfs/dsl_pool.c