OpenZFS 8166 - zpool scrub thinks it repaired offline device
author Matthew Ahrens <mahrens@delphix.com>
Wed, 10 May 2017 17:32:40 +0000 (10:32 -0700)
committer Brian Behlendorf <behlendorf1@llnl.gov>
Wed, 10 May 2017 17:32:39 +0000 (10:32 -0700)
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Matthew Ahrens <mahrens@delphix.com>

If we do a scrub while a leaf device is offline (via "zpool offline"),
we will inadvertently clear the DTL (dirty time log) of the offline
device, even though it is still damaged.  When the device comes back
online, we will incompletely resilver it, thinking that the scrub
repaired blocks written before the scrub was started.  The incomplete
resilver can lead to data loss if there is a subsequent failure of a
different leaf device.

The fix is to never clear the DTL of offline devices.  Note that if a
device is onlined while a scrub is in progress, the scrub will be
restarted.
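
The state check behind the fix can be sketched in isolation as follows.
This is only an illustration, not the code in vdev.c: the enum is
reproduced on the assumption that it matches the ordering of
vdev_state_t in the ZFS headers, and dtl_excise_allowed_for_state() is
a hypothetical helper name. The real vdev_dtl_should_excise() applies
further conditions after this check.

```c
#include <stdbool.h>

/*
 * Assumed to mirror the ordering of vdev_state_t in the ZFS headers.
 * Every state below VDEV_STATE_DEGRADED means the leaf cannot be
 * read or written right now.
 */
typedef enum {
	VDEV_STATE_UNKNOWN = 0,	/* uninitialized */
	VDEV_STATE_CLOSED,	/* not currently open */
	VDEV_STATE_OFFLINE,	/* taken offline, e.g. via "zpool offline" */
	VDEV_STATE_REMOVED,	/* removed from the system */
	VDEV_STATE_CANT_OPEN,	/* open was attempted and failed */
	VDEV_STATE_FAULTED,	/* faulted */
	VDEV_STATE_DEGRADED,	/* usable, but not fully healthy */
	VDEV_STATE_HEALTHY	/* presumed good */
} vdev_state_t;

/*
 * Hypothetical helper: a scrub cannot have repaired a device it could
 * not reach, so the DTL of such a device must never be excised.
 * Returns true only when the state alone does not rule out excision.
 */
static bool
dtl_excise_allowed_for_state(vdev_state_t state)
{
	if (state < VDEV_STATE_DEGRADED)
		return (false);
	return (true);
}
```

Under these assumptions the helper rejects VDEV_STATE_OFFLINE as well as
the other unreachable states, which is why a single ordered comparison
is enough and no separate test for the offline state is needed.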

The problem can be worked around by running "zpool scrub" after
"zpool online".

OpenZFS-issue: https://www.illumos.org/issues/8166
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/372
Closes #5806
Closes #6103

module/zfs/vdev.c

index 3b2ba8e259c7b5bce8eae864d72709e72fac864b..b979509c57849e625b3facb5b269824c05495f33 100644
@@ -1868,6 +1868,9 @@ vdev_dtl_should_excise(vdev_t *vd)
        ASSERT0(scn->scn_phys.scn_errors);
        ASSERT0(vd->vdev_children);
 
+       if (vd->vdev_state < VDEV_STATE_DEGRADED)
+               return (B_FALSE);
+
        if (vd->vdev_resilver_txg == 0 ||
            range_tree_space(vd->vdev_dtl[DTL_MISSING]) == 0)
                return (B_TRUE);