[PATCH 0/2] scsi: smartpqi: fix PCIe hot reset recovery

Mateusz Nowicki posted 2 patches 1 month, 1 week ago
drivers/scsi/smartpqi/smartpqi_init.c | 47 +++++++++++++++++++++++++++
drivers/scsi/smartpqi/smartpqi_sis.c  |  2 +-
2 files changed, 48 insertions(+), 1 deletion(-)
[PATCH 0/2] scsi: smartpqi: fix PCIe hot reset recovery
Posted by Mateusz Nowicki 1 month, 1 week ago
On the HPE SR932i-p Gen10+, which lacks FLR support, a PCIe bus reset
(e.g. "echo 1 > /sys/bus/pci/devices/<bdf>/reset") leaves the controller
unusable until reboot.  smartpqi registers no pci_error_handlers, so the
driver is never notified of the reset: the firmware reverts to SIS mode
and drops all queue mappings while the driver still drives the
controller in PQI mode.

Patch 1 adds .reset_prepare / .reset_done callbacks that reuse
pqi_ofa_ctrl_quiesce() / _unquiesce() / pqi_ctrl_init_resume().

Patch 2 raises SIS_CTRL_READY_RESUME_TIMEOUT_SECS from 90s to 180s,
matching the cold-boot path.  Without it, patch 1 fails at the SIS
ready check, because firmware boot after a reset takes ~125s on the
SR932i-p Gen10+.

Tested on HPE SR932i-p Gen10+ against Linus' master at 74fe02ce122a.

Note: the From: header is my Posteo address because my employer's SMTP
is unavailable for external mailing lists.  The Signed-off-by carries
the Microchip attribution.

Mateusz Nowicki (2):
  scsi: smartpqi: add pci_error_handlers for bus reset recovery
  scsi: smartpqi: increase SIS ctrl ready resume timeout to 180s

 drivers/scsi/smartpqi/smartpqi_init.c | 47 +++++++++++++++++++++++++++
 drivers/scsi/smartpqi/smartpqi_sis.c  |  2 +-
 2 files changed, 48 insertions(+), 1 deletion(-)

--
2.43.0
Re: [PATCH 0/2] scsi: smartpqi: fix PCIe hot reset recovery
Posted by Laurence Oberman 1 month, 1 week ago
On Wed, 2026-05-06 at 14:01 +0000, Mateusz Nowicki wrote:
> [...]
> 
Hello

I was able to reproduce this, so I am testing the patches as well.
They look correct to me; I will reply again with a review after
testing.

Thanks
Laurence


[2513778.140012] smartpqi 0000:64:00.0: no heartbeat detected - last heartbeat count: 4207808511
[2513778.140031] smartpqi 0000:64:00.0: controller offline: reason code 0x4 (no controller heartbeat detected)
[2513778.141346] sd 1:0:0:0: [sda] tag#549 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=18s
[2513778.141355] sd 1:0:0:0: [sda] tag#550 FAILED Result:
[...]
"xfs_buf_ioend_handle_error+0xd5/0x3f0 [xfs]" at daddr 0x9f78 len 8 error 5
[2513778.141526] XFS (dm-0): log I/O error -5
[2513778.141526] XFS (dm-0): log I/O error -5
Re: [PATCH 0/2] scsi: smartpqi: fix PCIe hot reset recovery
Posted by Laurence Oberman 1 month, 1 week ago
On Wed, 2026-05-06 at 18:21 -0400, Laurence Oberman wrote:
> [...]
> 

Hello 

For the series:

I tested the patches, and the controller recovers with them applied.
The patches look good.

Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Re: [PATCH 0/2] scsi: smartpqi: fix PCIe hot reset recovery
Posted by mateusz.nowicki@posteo.net 1 month, 1 week ago
Hello,

Thank you, Laurence, appreciated. I'll add your Tested-by and Reviewed-by
to both patches in v2 if the series needs a respin; otherwise Don or
Martin will pick them up when applying.

Thanks,
Mateusz

On 07.05.2026 03:45, Laurence Oberman wrote:
> [...]
> 
> Hello
> 
> For the series:
> 
> I tested the patches and it recovers with them applied.
> The patches look good.
> 
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Reviewed-by: Laurence Oberman <loberman@redhat.com>