git.proxmox.com Git - mirror_zfs.git/commitdiff
vdev_open: clear async fault flag after reopen
author Rob Norris <rob.norris@klarasystems.com>
Tue, 11 Jun 2024 10:49:10 +0000 (20:49 +1000)
committer Tony Hutter <hutter2@llnl.gov>
Wed, 17 Jul 2024 17:03:41 +0000 (10:03 -0700)
After c3f2f1aa2, vdev_fault_wanted is set on a vdev after a probe fails.
An end-of-txg async task is charged with actually faulting the vdev.

In a single-disk pool, the probe failure will degrade the last disk and
then suspend the pool. However, vdev_fault_wanted is not cleared. After
the pool resumes, the transaction finishes, the async task runs and
faults the vdev, and the pool is suspended again.

The fix is simple: when reopening a vdev, clear the async fault flag. If
the vdev is still failed, the startup probe will quickly notice and
degrade/suspend it again. If not, all is well!
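
The sequence above can be sketched as a toy model. This is not the
OpenZFS code: only the field name vdev_fault_wanted comes from this
commit. The toy_* names, the faulted field, and the assumption that the
async task clears the flag when it acts on it are hypothetical and exist
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for vdev_t; only vdev_fault_wanted is a real field name. */
typedef struct toy_vdev {
	bool vdev_fault_wanted;	/* set when a probe fails */
	bool faulted;		/* hypothetical: vdev has been faulted */
} toy_vdev_t;

/* A failed probe only requests the fault; it does not fault the vdev. */
static void
toy_probe_failed(toy_vdev_t *vd)
{
	vd->vdev_fault_wanted = true;
}

/* Reopen the vdev; clear_flag models the one-line fix in this commit. */
static void
toy_reopen(toy_vdev_t *vd, bool clear_flag)
{
	vd->faulted = false;
	if (clear_flag)
		vd->vdev_fault_wanted = false;
}

/*
 * End-of-txg async task: acts on a pending fault request.
 * Returns true if it faulted the vdev.
 */
static bool
toy_async_task(toy_vdev_t *vd)
{
	if (!vd->vdev_fault_wanted)
		return (false);
	vd->vdev_fault_wanted = false;	/* assumed: request is consumed */
	vd->faulted = true;
	return (true);
}

/*
 * Single-disk scenario: probe fails, pool suspends, disk comes back and
 * the vdev is reopened, then the stalled txg finishes. Returns true if
 * the stale request faults the vdev (and so suspends the pool) again.
 */
bool
toy_resuspends_after_resume(bool clear_flag)
{
	toy_vdev_t vd = { false, false };

	toy_probe_failed(&vd);
	assert(vd.vdev_fault_wanted);
	toy_reopen(&vd, clear_flag);
	return (toy_async_task(&vd));
}
```

In this model, toy_resuspends_after_resume(false) reports a second fault
from the stale request, while toy_resuspends_after_resume(true) does not,
which mirrors the behaviour described before and after the fix.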

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
module/zfs/vdev.c

index c74f72159dc9a2a94d12c2a92b8daf0b0b03b656..11cc39ba3527318c51afc320c340146ccf5f8a4a 100644
@@ -2021,6 +2021,7 @@ vdev_open(vdev_t *vd)
        vd->vdev_stat.vs_aux = VDEV_AUX_NONE;
        vd->vdev_cant_read = B_FALSE;
        vd->vdev_cant_write = B_FALSE;
+       vd->vdev_fault_wanted = B_FALSE;
        vd->vdev_min_asize = vdev_get_min_asize(vd);
 
        /*