git.proxmox.com Git - mirror_ubuntu-focal-kernel.git/commitdiff
nvme: fix identify error status silent ignore
author Sagi Grimberg <sagi@grimberg.me>
Fri, 26 Jun 2020 17:46:29 +0000 (10:46 -0700)
committer Khalid Elmously <khalid.elmously@canonical.com>
Sat, 8 Aug 2020 05:53:12 +0000 (01:53 -0400)
BugLink: https://bugs.launchpad.net/bugs/1886995
[ Upstream commit ea43d9709f727e728e933a8157a7a7ca1a868281 ]

Commit 59c7c3caaaf8 intended to silently ignore only non-retryable
errors (DNR bit set), so that we can still identify misbehaving
controllers, and on the other hand to propagate retryable errors (DNR
bit cleared) so that we don't wrongly abandon a namespace just because
it happens to be temporarily inaccessible.
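The intended policy can be sketched as a small filter over the value returned by a synchronous Identify command. This is a minimal, hypothetical illustration, not kernel code: `filter_identify_status()` is an invented name, and `NVME_SC_DNR` is redefined locally with the value it is assumed to have in `include/linux/nvme.h` (0x4000, the Do Not Retry bit once the kernel has shifted the phase-tag bit out of the completion status). It also assumes the kernel convention that negative values are host-side errnos and positive values are NVMe status codes.

```c
#include <assert.h>

/*
 * Assumed to match NVME_SC_DNR in include/linux/nvme.h: the kernel shifts
 * out the phase-tag bit of the completion status, so DNR lands on bit 14.
 */
#define NVME_SC_DNR 0x4000
static_assert(NVME_SC_DNR == (1 << 14), "DNR is bit 14 of the shifted status");

/*
 * Hypothetical helper, not a kernel function. Input follows the kernel
 * convention for synchronous commands:
 *   status < 0  : host-side errno, always propagated
 *   status == 0 : success
 *   status > 0  : NVMe status code, possibly with DNR set
 */
int filter_identify_status(int status)
{
	/* Non-retryable: ignore it, an NGUID or EUI-64 may already be known. */
	if (status > 0 && (status & NVME_SC_DNR))
		return 0;
	/* Success, errno, or retryable NVMe status: hand it back to the caller. */
	return status;
}
```

For example, 0x4002 (Invalid Field in Command with DNR set) maps to 0, while 0x370 (the value the kernel uses for `NVME_SC_HOST_PATH_ERROR`, with DNR clear) is returned unchanged so the caller can retry later.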

The goal remains the same as in the original commit that introduced
this check, but that commit unfortunately had the logic backwards.

Fixes: 59c7c3caaaf8 ("nvme: fix possible hang when ns scanning fails during error recovery")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
drivers/nvme/host/core.c

index d4b388793f40d99ca040dcb3112b07c1a4933c1c..071b63146d4b756dcebd37ce7e697cd4e24385eb 100644
@@ -1088,10 +1088,16 @@ static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, unsigned nsid,
                dev_warn(ctrl->device,
                        "Identify Descriptors failed (%d)\n", status);
                 /*
-                 * Don't treat an error as fatal, as we potentially already
-                 * have a NGUID or EUI-64.
+                 * Don't treat non-retryable errors as fatal, as we potentially
+                 * already have a NGUID or EUI-64.  If we failed with DNR set,
+                 * we want to silently ignore the error as we can still
+                 * identify the device, but if the status has DNR cleared, we
+                 * want to propagate the error back specifically for the disk
+                 * revalidation flow to make sure we don't abandon the
+                 * device just because of a temporary retryable error (such
+                 * as path or transport errors).
                  */
-               if (status > 0 && !(status & NVME_SC_DNR))
+               if (status > 0 && (status & NVME_SC_DNR))
                        status = 0;
                goto free_data;
        }
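To see why the original condition was backwards, the removed and added conditions from the hunk above can be lifted into standalone predicates and compared. The function names are hypothetical and `NVME_SC_DNR` is again redefined locally with its assumed kernel value of 0x4000; only the two boolean expressions are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match NVME_SC_DNR in include/linux/nvme.h (bit 14). */
#define NVME_SC_DNR 0x4000
static_assert(NVME_SC_DNR == (1 << 14), "DNR is bit 14 of the shifted status");

/* Condition removed by the patch: swallows NVMe errors with DNR cleared. */
bool old_rule_ignores(int status)
{
	return status > 0 && !(status & NVME_SC_DNR);
}

/* Condition added by the patch: swallows NVMe errors with DNR set. */
bool new_rule_ignores(int status)
{
	return status > 0 && (status & NVME_SC_DNR);
}
```

For every positive status exactly one predicate is true, so the old code swallowed precisely the retryable errors it meant to propagate and propagated the non-retryable ones it meant to ignore. Zero and negative errno values are never swallowed by either rule.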