Jun Chen [Mon, 10 Apr 2023 02:37:23 +0000 (10:37 +0800)]
scsi: lpfc: Silence an incorrect device output
In lpfc_sli4_pci_mem_unset(), case LPFC_SLI_INTF_IF_TYPE_1 does not have a
break statement, resulting in an incorrect device output.
Fix this by adding a break statement before the default option.
Signed-off-by: Jun Chen <jun_c@hust.edu.cn> Link: https://lore.kernel.org/r/20230410023724.3209455-1-jun_c@hust.edu.cn Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation
Driver uses spin lock without irqsave when it needs to acquire a chain
frame. This is done to protect chain frame allocation from multiple
submission threads. If there is any I/O queued from an interrupt context,
and if that requires a chain frame, and if the chain lock is held by the CPU
which got interrupted, then there will be a possible deadlock.
Although it is unlikely that KMEM_CACHE might fail, but if it does then ret
might be zero. So to fix this explicitly mark ret as "-ENOMEM" and then
goto driver_unreg.
Fixes: 1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20230406074607.3637097-1-harshit.m.mogalapalli@oracle.com Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: hisi_sas: Work around build failure in suspend function
The suspend/resume functions in this driver seem to have multiple problems,
the latest one just got introduced by a bugfix:
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
5142 | if (atomic_read(&device->power.usage_count)) {
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
5142 | if (atomic_read(&device->power.usage_count)) {
As far as I can tell, the 'usage_count' is not meant to be accessed by
device drivers at all, though I don't know what the driver is supposed to
do instead.
Another problem is the use of the deprecated UNIVERSAL_DEV_PM_OPS(), and
marking functions as __maybe_unused to avoid warnings about unused
functions. This should probably be changed to using
DEFINE_RUNTIME_DEV_PM_OPS().
Both changes require actually understanding what the driver needs to do,
and being able to test this, so instead here is the simplest patch to make
it pass the randconfig builds instead.
Fixes: e368d38cb952 ("scsi: hisi_sas: Exit suspend state when usage count is greater than 0") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230405083611.3376739-1-arnd@kernel.org Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shuchang Li [Tue, 4 Apr 2023 07:21:32 +0000 (15:21 +0800)]
scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
When if_type equals zero and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns false, drbl_regs_memmap_p is not remapped. This passes a NULL
pointer to iounmap(), which can trigger a WARN() on certain arches.
When if_type equals six and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns true, drbl_regs_memmap_p may has been remapped and
ctrl_regs_memmap_p is not remapped. This is a resource leak and passes a
NULL pointer to iounmap().
To fix these issues, we need to add null checks before iounmap(), and
change some goto labels.
Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4") Signed-off-by: Shuchang Li <lishuchang@hust.edu.cn> Link: https://lore.kernel.org/r/20230404072133.1022-1-lishuchang@hust.edu.cn Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The issue is related to how the driver manages submit queues and tags. A
single array of submit queues - sdebug_q_arr - with its own set of tags is
shared among all shosts. As such, for occasions when we have more than one
host it is possible to overload the submit queues and run out of tags.
Another separate issue that we may reduce the shost submit queue depth,
sdebug_max_queue, dynamically causing the shost to be overloaded. How many
IOs which the shost may be sent is fixed at can_queue at init time, which
is the same initial value for sdebug_max_queue. So reducing
sdebug_max_queue means that the shost may be sent more IOs than it is
configured to handle, causing overloading.
This series removes the scsi_debug submit queue concept and uses
pre-existing APIs to manage and examine tags, like scsi_block_requests()
and blk_mq_tagset_busy_iter(). Using standard APIs makes the driver more
maintainable and extensible in future.
A restriction is also added to allow sdebug_max_queue only be modified when
no shosts are present, i.e. we need to remove shosts, modify
sdebug_max_queue, and then re-add the shosts.
The issue is related to how the driver manages submit queues and tags. A
single array of submit queues - sdebug_q_arr - with its own set of tags is
shared among all shosts. As such, for occasions when we have more than one
shost it is possible to overload the submit queues and run out of tags.
The struct sdebug_queue is to manage tags and hold the associated
queued command entry pointer (for that tag).
Since the tagset iters are now used for functions like
sdebug_blk_mq_poll(), there is no need to manage these queues. Indeed,
blk-mq already provides what we need for managing tags and queues.
Drop sdebug_queue and all its usage in the driver.
John Garry [Mon, 27 Mar 2023 07:43:09 +0000 (07:43 +0000)]
scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts
The shost->can_queue value is initially used to set per-HW queue context
tag depth in the block layer. This ensures that the shost is not sent too
many commands which it can deal with. However lowering sdebug_max_queue
separately means that we can easily overload the shost, as in the following
example:
Ensure that this cannot happen by allowing sdebug_max_queue be modified
only when we have no shosts. As such, any shost->can_queue value will match
sdebug_max_queue, and sdebug_max_queue cannot be modified separately.
Since retired_max_queue is no longer set, remove support.
Continue to apply the restriction that sdebug_host_max_queue cannot be
modified when sdebug_host_max_queue is set. Adding support for that would
mean extra code, and no one has complained about this restriction
previously.
A command like the following may be used to remove a shost:
echo -1 > /sys/bus/pseudo/drivers/scsi_debug/add_host
John Garry [Mon, 27 Mar 2023 07:43:08 +0000 (07:43 +0000)]
scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store()
The functions to update ndelay and delay value first check whether we have
any in-flight IO for any host. It does this by checking if any tag is used
in the global submit queues.
We can achieve the same by setting the host as blocked and then ensuring
that we have no in-flight commands with scsi_host_busy().
Note that scsi_host_busy() checks SCMD_STATE_INFLIGHT flag, which is only
set per command after we ensure that the host is not blocked, i.e. we see
more commands active after the check for scsi_host_busy() returns 0.
Eventually we will drop the sdebug_queue struct as it is not really
required, so start with making the sdebug_queued_cmd dynamically allocated
for the lifetime of the scsi_cmnd in the driver.
As an interim measure, make sdebug_queued_cmd.sd_dp a pointer to struct
sdebug_defer. Also keep a value of the index allocated in
sdebug_queued_cmd.qc_arr in struct sdebug_queued_cmd.
To deal with an races in accessing the scsi cmnd allocated struct
sdebug_queued_cmd, add a spinlock for the scsi command in its priv area.
Races may be between scheduling a command for completion, aborting a
command, and the command actually completing and freeing the struct
sdebug_queued_cmd.
John Garry [Mon, 27 Mar 2023 07:43:03 +0000 (07:43 +0000)]
scsi: scsi_debug: Protect block_unblock_all_queues() with mutex
There is no reason that calls to block_unblock_all_queues() from different
context can't race with one another, so protect with the
sdebug_host_list_mutex. There's no need for a more fine-grained per shost
locking here (and we don't have a per-host lock anyway).
Also simplify some touched code in sdebug_change_qdepth().
John Garry [Mon, 27 Mar 2023 07:43:02 +0000 (07:43 +0000)]
scsi: scsi_debug: Change shost list lock to a mutex
The shost list lock, sdebug_host_list_lock, is a spinlock. We would only
lock in non-atomic context in this driver, so use a mutex instead, which is
friendlier if we need to schedule when iterating.
John Garry [Mon, 27 Mar 2023 07:43:01 +0000 (07:43 +0000)]
scsi: scsi_debug: Don't iter all shosts in clear_luns_changed_on_target()
In clear_luns_changed_on_target(), we iter all devices for all shosts to
conditionally clear the SDEBUG_UA_LUNS_CHANGED flag in the per-device
uas_bm.
One condition to see whether we clear the flag is to test whether the host
for the device under consideration is the same as the matching device's
(devip) host. This check will only ever pass for devices for the same
shost, so only iter the devices for the matching device shost.
We can now drop the spinlock'ing of the sdebug_host_list_lock in the same
function. This will allow us to use a mutex instead of the spinlock for the
global shost lock, as clear_luns_changed_on_target() could be called in
non-blocking context, in scsi_debug_queuecommand() -> make_ua() ->
clear_luns_changed_on_target() (which is why required a spinlock).
John Garry [Mon, 27 Mar 2023 07:43:00 +0000 (07:43 +0000)]
scsi: scsi_debug: Fix check for sdev queue full
There is a report that the blktests scsi/004 test for "TASK SET FULL" (TSF)
now fails.
The condition upon we should issue this TSF is when the sdev queue is
full. The check for a full queue has an off-by-1 error. Previously we would
increment the number of requests in the queue after testing if the queue
would be full, i.e. test if one less than full. Since we now use
scsi_device_busy() to count the number of requests in the queue, this would
already account for the current request, so fix the test for queue full
accordingly.
Yihang Li [Mon, 20 Mar 2023 03:34:25 +0000 (11:34 +0800)]
scsi: hisi_sas: Exit suspend state when usage count is greater than 0
When the current status of the host controller is suspended, enabling a
local PHY just after disabling all local PHYs in expander environment, a
hang as follows occurs:
We find that when all local PHYs are disabled, all the devices will be
removed, the ->runtime_suspend() callback suspend_v3_hw() directly execute
since the controller usage count drop to 0. On the other side, the first
local PHY is enabled through the sysfs interface, and ensures that function
phy_up_v3_hw() is completed due to suspend_v3_hw()->
interrupt_disable_v3_hw(). In the expander scenario,
sas_discover_root_expander() is executed in event work
DISCE_DISCOVER_DOMAIN, which will increases the controller usage count and
carry out a resume and sends SMPIO, it cannot be completed because the
runtime PM status of the controller is RPM_SUSPENDING. At the same time,
the ->runtime_suspend() callback suspend_v3_hw() also cannot complete the
process because of drain libsas event queue in sas_suspend_ha(), so hung
occurs.
To fix this, check if the current runtime PM status of the controller
allows to be suspended continue after interrupt_disable_v3_hw(), return
immediately if not.
Yihang Li [Mon, 20 Mar 2023 03:34:24 +0000 (11:34 +0800)]
scsi: hisi_sas: Ensure all enabled PHYs up during controller reset
For the controller reset operation, hisi_sas_phy_enable() is executed for
each enabled local PHY, and refresh the port id of each device based on the
latest hisi_sas_phy->port_id after 1 second sleep, hisi_sas_phy->port_id is
configured in the interrupt processing function phy_up_v3_hw(). However, in
directly attached scenario, for some SATA disks the amount of time for
phyup more than 1s sometimes. In this case, incorrect port id may be
configured in hisi_sas_refresh_port_id(). As a result, all the internal
IOs fail and disk lost, such as follows:
So the time of waitting for PHYs up is 1s which may be not enough. To solve
the issue, running hisi_sas_phy_enable() in parallel through async
operations and use wait_for_completion_timeout() to wait for PHYs come up
instead of directly sleep for 1 second.
Xingui Yang [Mon, 20 Mar 2023 03:34:23 +0000 (11:34 +0800)]
scsi: hisi_sas: Handle NCQ error when IPTT is valid
If an NCQ error occurs when the IPTT is valid and slot->abort flag is set
in completion path, sas_task_abort() will be called to abort only one NCQ
command now, and the host would be set to SHOST_RECOVERY state. But this
may not kick-off EH Immediately until other outstanding QCs timeouts. As a
result, the host may remain in the SHOST_RECOVERY state for up to 30
seconds, such as follows:
Xingui Yang [Mon, 20 Mar 2023 03:34:22 +0000 (11:34 +0800)]
scsi: hisi_sas: Grab sas_dev lock when traversing the members of sas_dev.list
When freeing slots in function slot_complete_v3_hw(), it is possible that
sas_dev.list is being traversed elsewhere, and it may trigger a NULL
pointer exception, such as follows:
To fix the issue, grab sas_dev lock when traversing the members of
sas_dev.list in dereg_device_v3_hw() and hisi_sas_release_tasks() to avoid
concurrency of adding and deleting member. When function
hisi_sas_release_tasks() calls hisi_sas_do_release_task() to free slot, the
lock cannot be grabbed again in hisi_sas_slot_task_free(), then a bool
parameter need_lock is added.
Tom Rix [Fri, 31 Mar 2023 17:57:57 +0000 (13:57 -0400)]
scsi: qla4xxx: Remove unused 'count' variable
clang with W=1 reports:
drivers/scsi/qla4xxx/ql4_isr.c:475:11: error: variable
'count' set but not used [-Werror,-Wunused-but-set-variable]
uint32_t count = 0;
^
This variable is not used so remove it.
Tom Rix [Tue, 28 Mar 2023 00:16:47 +0000 (20:16 -0400)]
scsi: snic: Remove unused 'xfer_len' variable
clang with W=1 reports:
drivers/scsi/snic/snic_scsi.c:490:6: error: variable
'xfer_len' set but not used [-Werror,-Wunused-but-set-variable]
u64 xfer_len = 0;
^
This variable is not used so remove it.
Tom Rix [Thu, 30 Mar 2023 20:34:44 +0000 (16:34 -0400)]
scsi: qedf: Remove unused 'num_handled' variable
clang with W=1 reports:
drivers/scsi/qedf/qedf_main.c:2227:6: error: variable
'num_handled' set but not used [-Werror,-Wunused-but-set-variable]
int num_handled = 0;
^
This variable is not used so remove it.
drivers/scsi/scsi_transport_fc.c:908:6: error: variable
'desc_cnt' set but not used [-Werror,-Wunused-but-set-variable]
u32 desc_cnt = 0, bytes_remain;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230326003222.1343671-1-trix@redhat.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enze Li [Mon, 27 Mar 2023 03:02:37 +0000 (11:02 +0800)]
scsi: sr: Simplify the sr_open() function
Simplify the sr_open() by removing the goto label as the function only
returns one error code.
Signed-off-by: Enze Li <lienze@kylinos.cn> Link: https://lore.kernel.org/r/20230327030237.3407253-1-lienze@kylinos.cn Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drivers/target/target_core_spc.c:229:6: error: variable
'prod_len' set but not used [-Werror,-Wunused-but-set-variable]
u32 prod_len;
^
This variable is not used so remove it.
Jason Yan [Thu, 30 Mar 2023 11:09:30 +0000 (19:09 +0800)]
scsi: libsas: Abort all in-flight requests when device is gone
When a disk is removed with in-flight I/O, the application needs to wait
for 30 seconds (depending on the timeout configuration) to hear back from
the kernel. Xingui tried to fix this issue by aborting the ATA link for
SATA devices[1], however this approach left the SAS devices unresolved.
Try to fix this issue by aborting all in-flight requests when the device is
gone. This is implemented by iterating over the tagset.
Cc: Xingui Yang <yangxingui@huawei.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Link: https://lore.kernel.org/r/20230330110930.175539-1-yanaijie@huawei.com Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stanley Chu [Thu, 30 Mar 2023 01:29:18 +0000 (09:29 +0800)]
scsi: core: Clean up struct ufs_saved_pwr_info
The "is_valid" field of the struct ufs_saved_pwr_info is no longer used,
and this struct can be replaced by struct ufs_pa_layer_attr without any
changes to the functionality.
Jerry Snitselaar [Fri, 24 Mar 2023 19:32:04 +0000 (12:32 -0700)]
scsi: mpt3sas: Don't print sense pool info twice
_base_allocate_sense_dma_pool() already prints out the sense pool
information, so don't print it a second time after calling it in
_base_allocate_memory_pools(). In addition the version in
_base_allocate_memory_pools() was using the wrong size value, sz, which was
last assigned when doing some nvme calculations instead of sense_sz to
determine the pool size in kilobytes.
Cc: Sathya Prakash <sathya.prakash@broadcom.com> Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> Cc: MPT-FusionLinux.pdl@broadcom.com Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Fixes: 970ac2bb70e7 ("scsi: mpt3sas: Force sense buffer allocations to be within same 4 GB region") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230324193204.567932-1-jsnitsel@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Wed, 22 Mar 2023 02:22:11 +0000 (11:22 +0900)]
scsi: core: Improve scsi_vpd_inquiry() checks
Some USB-SATA adapters have broken behavior when an unsupported VPD page is
probed: Depending on the VPD page number, a 4-byte header with a valid VPD
page number but with a 0 length is returned. Currently, scsi_vpd_inquiry()
only checks that the page number is valid to determine if the page is
valid, which results in receiving only the 4-byte header for the
non-existent page. This error manifests itself very often with page 0xb9
for the Concurrent Positioning Ranges detection done by sd_read_cpr(),
resulting in the following error message:
Tomas Henzl [Fri, 24 Mar 2023 15:01:34 +0000 (16:01 +0100)]
scsi: megaraid_sas: Fix crash after a double completion
When a physical disk is attached directly "without JBOD MAP support" (see
megasas_get_tm_devhandle()) then there is no real error handling in the
driver. Return FAILED instead of SUCCESS.
Zheng Wang [Sat, 18 Mar 2023 08:16:35 +0000 (16:16 +0800)]
scsi: message: mptlan: Fix use after free bug in mptlan_remove() due to race condition
mptlan_probe() calls mpt_register_lan_device() which initializes the
&priv->post_buckets_task workqueue. A call to
mpt_lan_wake_post_buckets_task() will subsequently start the work.
During driver unload in mptlan_remove() the following race may occur:
Merge patch series "Constify most SCSI host templates"
Bart Van Assche <bvanassche@acm.org> says:
It helps humans and the compiler if it is made explicit that SCSI host
templates are not modified. Hence this patch series that constifies most
SCSI host templates. Please consider this patch series for the next merge
window.