git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/commit
nvme: fix regression when disconnect a recovering ctrl
author Ruozhu Li <liruozhu@huawei.com>
Thu, 23 Jun 2022 06:45:39 +0000 (14:45 +0800)
committer Stefan Bader <stefan.bader@canonical.com>
Fri, 16 Sep 2022 08:52:56 +0000 (10:52 +0200)
commit 15159f22edf65a6922d66bcfa4309900a240988f
tree 4c8f47f6225ff4a197ee4fd36a0a18c194fd2248
parent c2ba2ca8187f163254c54125004cb8b1e312be53

BugLink: https://bugs.launchpad.net/bugs/1988351
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]

We encountered a hang in the disconnect command. After analyzing
the log and stack traces, we found that it is triggered by the
following sequence:
CPU0                          CPU1
                                nvme_rdma_error_recovery_work
                                  nvme_rdma_teardown_io_queues
nvme_do_delete_ctrl                 nvme_stop_queues
  nvme_remove_namespaces
  --clear ctrl->namespaces
                                    nvme_start_queues
                                    --no ns in ctrl->namespaces
    nvme_ns_remove                  return(because ctrl is deleting)
      blk_freeze_queue
        blk_mq_freeze_queue_wait
        --wait for ns to unquiesce to clean inflight IO, hang forever

This problem did not occur in older kernels because nvme_stop_ctrl
flushed the err work before nvme_remove_namespaces ran. That behavior
does not seem to have been changed for functional reasons, so the
patch that removed it can be reverted to solve the problem.

Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
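
The ordering problem described above can be illustrated with a small
user-space toy model. This is only a sketch and not the kernel code
touched by this commit: every toy_* name and field is hypothetical, the
two CPUs are replayed sequentially in the order shown in the trace, and
"flushing the err work" is modeled as simply running the rest of the
recovery work before the namespace list is cleared.

```c
#include <stdbool.h>

/* Toy stand-in for a controller; fields are illustrative assumptions. */
struct toy_ctrl {
	int  nr_listed_ns;     /* namespaces still linked in ctrl->namespaces */
	bool ns_quiesced;      /* the namespace queue is currently stopped */
	bool err_work_pending; /* recovery stopped the queues, not yet restarted */
};

/* First half of error recovery: stop every namespace queue on the list. */
static void toy_err_work_stop_queues(struct toy_ctrl *c)
{
	if (c->nr_listed_ns > 0)
		c->ns_quiesced = true;
	c->err_work_pending = true;
}

/* Second half: restart queues, but only for namespaces still on the list. */
static void toy_err_work_start_queues(struct toy_ctrl *c)
{
	if (c->nr_listed_ns > 0)
		c->ns_quiesced = false;
	c->err_work_pending = false;
}

/* Models a stop_ctrl callout that waits for recovery work to finish. */
static void toy_stop_ctrl(struct toy_ctrl *c)
{
	if (c->err_work_pending)
		toy_err_work_start_queues(c);
}

/* Returns true if the freeze in the delete path would wait forever. */
static bool toy_delete_hangs(bool flush_err_work)
{
	struct toy_ctrl c = { .nr_listed_ns = 1 };

	toy_err_work_stop_queues(&c);          /* CPU1: nvme_stop_queues */
	if (flush_err_work)
		toy_stop_ctrl(&c);             /* CPU0: flush before removal */
	c.nr_listed_ns = 0;                    /* CPU0: clear ctrl->namespaces */
	if (c.err_work_pending)
		toy_err_work_start_queues(&c); /* CPU1: finds no ns to restart */
	return c.ns_quiesced;                  /* CPU0: freeze wait on a stopped queue */
}
```

In this model, toy_delete_hangs(false) leaves the queue stopped when the
freeze begins, while toy_delete_hangs(true) restarts it first, which is
the ordering the revert is meant to restore.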

Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
drivers/nvme/host/core.c
drivers/nvme/host/nvme.h
drivers/nvme/host/rdma.c
drivers/nvme/host/tcp.c