From: Igor Pylypiv <ipylypiv@google.com>
It's possible to end up in a state where pm8001_dev->running_req never
reaches zero. In that state we will be sleeping forever.
sas_execute_internal_abort_dev() can wait for a response for
up to 60 seconds (3 retries x 20 seconds). 60 seconds should be enough
for pm8001_dev->running_req to get to zero.
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: TJ Adams <tadamsjr@google.com>
---
drivers/scsi/pm8001/pm8001_sas.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index a5a31dfa4512..513e9a49838c 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -712,8 +712,11 @@ static void pm8001_dev_gone_notify(struct domain_device *dev)
 		if (atomic_read(&pm8001_dev->running_req)) {
 			spin_unlock_irqrestore(&pm8001_ha->lock, flags);
 			sas_execute_internal_abort_dev(dev, 0, NULL);
-			while (atomic_read(&pm8001_dev->running_req))
-				msleep(20);
+			if (atomic_read(&pm8001_dev->running_req)) {
+				pm8001_dbg(pm8001_ha, FAIL,
+					   "device_id: %u: Failed to abort %d requests!\n",
+					   device_id, atomic_read(&pm8001_dev->running_req));
+			}
 			spin_lock_irqsave(&pm8001_ha->lock, flags);
 		}
 		PM8001_CHIP_DISP->dereg_dev_req(pm8001_ha, device_id);
--
2.45.2.803.g4e1b14247a-goog
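The control-flow change in the hunk above can be tried outside the kernel. What follows is only a minimal user-space sketch, not driver code: `struct model_dev`, `model_abort_dev()` and `model_dev_gone()` are hypothetical names, C11 `<stdatomic.h>` stands in for the kernel's `atomic_t`, and the `completed` parameter is an assumption used to simulate how many requests the abort manages to finish. It shows the patched behaviour: issue the abort once, then log whatever is still outstanding and move on instead of sleeping until the counter reaches zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for pm8001_dev; only the fields used here. */
struct model_dev {
	atomic_int running_req;
	unsigned int device_id;
};

/*
 * Hypothetical stand-in for sas_execute_internal_abort_dev(): it
 * "completes" at most `completed` outstanding requests, then returns.
 */
static void model_abort_dev(struct model_dev *dev, int completed)
{
	assert(completed >= 0);
	while (completed-- > 0 && atomic_load(&dev->running_req) > 0)
		atomic_fetch_sub(&dev->running_req, 1);
}

/*
 * Patched flow: abort once, report leftovers, never loop.
 * Returns the number of requests still outstanding.
 */
static int model_dev_gone(struct model_dev *dev, int completed)
{
	int left;

	if (!atomic_load(&dev->running_req))
		return 0;

	model_abort_dev(dev, completed);

	left = atomic_load(&dev->running_req);
	if (left)
		fprintf(stderr,
			"device_id: %u: Failed to abort %d requests!\n",
			dev->device_id, left);
	return left;
}
```

With three requests outstanding and an abort that finishes only one of them, `model_dev_gone()` returns 2 and logs the failure. The pre-patch `while (...) msleep(20);` loop would never return in that situation.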
On 09/07/2024 17:00, TJ Adams wrote:
> From: Igor Pylypiv <ipylypiv@google.com>
>
> It's possible to end up in a state where pm8001_dev->running_req never
> reaches zero.
Is that a driver bug then?
> In that state we will be sleeping forever.
>
> sas_execute_internal_abort_dev() can wait for a response for
> up to 60 seconds (3 retries x 20 seconds). 60 seconds should be enough
> for pm8001_dev->running_req to get to zero.
May I suggest you drop running_req at some stage, and use other methods
to find how many IOs are active?
Sorry for the late response.
> > It's possible to end up in a state where pm8001_dev->running_req never
> > reaches zero.
>
> Is that a driver bug then?
I haven't seen this happen except when creating the situation
artificially. This is a preventative change rather than a response to
a specific issue we have seen.
> > In that state we will be sleeping forever.
> >
> > sas_execute_internal_abort_dev() can wait for a response for
> > up to 60 seconds (3 retries x 20 seconds). 60 seconds should be enough
> > for pm8001_dev->running_req to get to zero.
> May I suggest you drop running_req at some stage, and use other methods
> to find how many IOs are active?
I haven't given much thought to better ways of keeping track of active
I/Os, so that will have to come later, but it's definitely noted!
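The thread does not say which "other methods" John has in mind. Purely as an illustration of the general idea, the sketch below derives the active count by scanning a table of command slots rather than maintaining a separate counter. Everything in it is an assumption: `struct model_slot`, `MODEL_MAX_SLOTS` and `model_count_active()` are hypothetical names and do not reflect the driver's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_SLOTS 8	/* hypothetical table size */

/* Hypothetical per-command slot; not the driver's real layout. */
struct model_slot {
	bool in_use;
	unsigned int device_id;
};

/* Count the slots currently in use for one device by scanning the table. */
static size_t model_count_active(const struct model_slot *slots, size_t n,
				 unsigned int device_id)
{
	size_t i, active = 0;

	assert(slots != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (slots[i].in_use && slots[i].device_id == device_id)
			active++;
	return active;
}
```

The trade-off in this sketch is that the count cannot drift from the slot state, since it is computed from it, at the cost of a scan on each query. A real driver would also have to hold whatever lock protects the table while scanning.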
On Tue, Jul 9, 2024 at 9:09 AM John Garry <john.g.garry@oracle.com> wrote:
>
> On 09/07/2024 17:00, TJ Adams wrote:
> > From: Igor Pylypiv <ipylypiv@google.com>
> >
> > It's possible to end up in a state where pm8001_dev->running_req never
> > reaches zero.
>
> Is that a driver bug then?
>
> > In that state we will be sleeping forever.
> >
> > sas_execute_internal_abort_dev() can wait for a response for
> > up to 60 seconds (3 retries x 20 seconds). 60 seconds should be enough
> > for pm8001_dev->running_req to get to zero.
>
> May I suggest you drop running_req at some stage, and use other methods
> to find how many IOs are active?