OSDN Git Service

scsi: libsas: Don't always drain event workqueue for HA resume
authorJohn Garry <john.garry@huawei.com>
Mon, 20 Dec 2021 11:21:24 +0000 (19:21 +0800)
committerMartin K. Petersen <martin.petersen@oracle.com>
Thu, 23 Dec 2021 04:38:29 +0000 (23:38 -0500)
commitfbefe22811c3140a686e407e114789ebf328a9a2
tree75086b929e23c76bd9f950582c387c87a5645757
parent4be6181fea1dbfd21a8d73f69d87a6cae2d3023d
scsi: libsas: Don't always drain event workqueue for HA resume

For the hisi_sas driver, if a directly attached disk is removed during
suspend, a hang will occur in the resume process:

The background is that in commit 16fd4a7c5917 ("scsi: hisi_sas: Add device
link between SCSI devices and hisi_hba"), it is ensured that the HBA device
cannot be runtime suspended when any SCSI device associated is active.

Other drivers which use libsas don't worry about this as none support
runtime suspend.

The mentioned hang occurs when an disk is removed during suspend. In the
removal process - from PHYE_RESUME_TIMEOUT event processing - we call into
scsi_remove_device(), which is being processed in the HA event workqueue.
Here we wait for all suppliers of the SCSI device to resume, which includes
the HBA device (from the above commit). However the HBA device cannot
resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from
calling sas_resume_ha() -> sas_drain_work()). This is the deadlock.

There does not appear to be any need for the sas_drain_work() to be called
at all in sas_resume_ha() as it is not syncing against anything, so allow
LLDDs to avoid this by providing a variant of sas_resume_ha() which does
"sync", i.e. doesn't drain the event workqueue.

Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
drivers/scsi/libsas/sas_init.c
include/scsi/libsas.h