drivers/scsi/hisi_sas/hisi_sas_main.c | 5 +- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 22 ++- drivers/scsi/libsas/sas_ata.c | 12 ++ drivers/scsi/libsas/sas_init.c | 3 - drivers/scsi/libsas/sas_internal.h | 4 + drivers/scsi/pm8001/pm8001_hwi.c | 186 ++++--------------------- drivers/scsi/pm8001/pm8001_sas.c | 8 ++ drivers/scsi/pm8001/pm8001_sas.h | 4 - drivers/scsi/pm8001/pm80xx_hwi.c | 177 +++-------------------- include/scsi/libsas.h | 4 - include/scsi/sas_ata.h | 6 + 11 files changed, 96 insertions(+), 335 deletions(-)
As reported in [0], the pm8001 driver NCQ error handling more or less
duplicates what libata does in link error handling, as follows:
- abort all commands
- do autopsy with read log ext 10 command
- reset the target to recover, if necessary
Indeed for the hisi_sas driver we want to add similar handling for NCQ
errors.
This series add a new libsas API - sas_ata_device_link_abort() - to handle
host NCQ errors, and fixes up pm8001 and hisi_sas drivers to use it.
A difference in the pm8001 driver NCQ error handling is that we send
SATA_ABORT per-task prior to read log ext10, but I feel that this should
not make a difference to the error handling.
Damien kindly tested previous the series for pm8001, but any further pm8001
testing would be appreciated as I have since tweaked pm8001 handling again.
This is because the pm8001 driver hangs on my arm64 machine read log ext10
command.
Finally with these changes we can make the libsas task alloc/free APIs
private, which they should always have been.
Based on v6.0-rc4
[0] https://lore.kernel.org/linux-scsi/8fb3b093-55f0-1fab-81f4-e8519810a978@huawei.com/
Changes since v2:
- Stop sending SATA_ABORT all for pm8001 handling
- Make "reset" optional in sas_ata_device_link_abort()
Changes since v1:
- Rename sas_ata_link_abort() -> sas_ata_device_link_abort()
- Set EH RESET flag in sas_ata_device_link_abort()
- Add Jack's Ack tags
- Rebase
John Garry (5):
scsi: pm8001: Modify task abort handling for SATA task
scsi: libsas: Add sas_ata_device_link_abort()
scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors
scsi: hisi_sas: Don't issue ATA softreset in hisi_sas_abort_task()
scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private
Xingui Yang (1):
scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw
drivers/scsi/hisi_sas/hisi_sas_main.c | 5 +-
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 22 ++-
drivers/scsi/libsas/sas_ata.c | 12 ++
drivers/scsi/libsas/sas_init.c | 3 -
drivers/scsi/libsas/sas_internal.h | 4 +
drivers/scsi/pm8001/pm8001_hwi.c | 186 ++++---------------------
drivers/scsi/pm8001/pm8001_sas.c | 8 ++
drivers/scsi/pm8001/pm8001_sas.h | 4 -
drivers/scsi/pm8001/pm80xx_hwi.c | 177 +++--------------------
include/scsi/libsas.h | 4 -
include/scsi/sas_ata.h | 6 +
11 files changed, 96 insertions(+), 335 deletions(-)
--
2.35.3
On 9/7/22 00:08, John Garry wrote:
> As reported in [0], the pm8001 driver NCQ error handling more or less
> duplicates what libata does in link error handling, as follows:
> - abort all commands
> - do autopsy with read log ext 10 command
> - reset the target to recover, if necessary
>
> Indeed for the hisi_sas driver we want to add similar handling for NCQ
> errors.
>
> This series add a new libsas API - sas_ata_device_link_abort() - to handle
> host NCQ errors, and fixes up pm8001 and hisi_sas drivers to use it.
>
> A difference in the pm8001 driver NCQ error handling is that we send
> SATA_ABORT per-task prior to read log ext10, but I feel that this should
> not make a difference to the error handling.
>
> Damien kindly tested previous the series for pm8001, but any further pm8001
> testing would be appreciated as I have since tweaked pm8001 handling again.
> This is because the pm8001 driver hangs on my arm64 machine read log ext10
> command.
I tested this series on top of the current Linus tree. All look good. As
ususal, I hammered an SMR drive connected to a pm80xx adapter and
generated lots of invalid commands to check EH. No issues that I can see.
E.g., an unaligned write error look like this:
[ 5384.194853] pm80xx0:: mpi_sata_event 2685: SATA EVENT 0x23
[ 5384.238720] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[ 5384.246171] sas: sas_scsi_find_task: aborting task 0x0000000050be2a4b
[ 5384.254654] pm80xx0:: mpi_sata_completion 2292: task null, freeing
CCB tag 2
[ 5384.254676] sas: sas_scsi_find_task: task 0x0000000050be2a4b is aborted
[ 5384.294659] sas: sas_eh_handle_sas_errors: task 0x0000000050be2a4b is
aborted
[ 5384.318691] ata22.00: exception Emask 0x0 SAct 0x200000 SErr 0x0
action 0x0
[ 5384.328425] ata22.00: failed command: WRITE FPDMA QUEUED
[ 5384.336277] ata22.00: cmd 61/01:00:01:00:ea/00:00:02:00:00/40 tag 21
ncq dma 4096 out
[ 5384.336277] res 43/04:01:01:00:ea/00:00:02:00:00/00 Emask
0x400 (NCQ error) <F>
[ 5384.357299] ata22.00: status: { DRDY SENSE ERR }
[ 5384.364459] ata22.00: error: { ABRT }
[ 5384.553177] ata22.00: configured for UDMA/133
[ 5384.560343] sd 19:0:3:0: [sdl] tag#80 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_OK cmd_age=0s
[ 5384.572175] sd 19:0:3:0: [sdl] tag#80 Sense Key : Illegal Request
[current]
[ 5384.581765] sd 19:0:3:0: [sdl] tag#80 Add. Sense: Unaligned write command
[ 5384.591126] sd 19:0:3:0: [sdl] tag#80 CDB: Write(16) 8a 00 00 00 00
00 02 ea 00 01 00 00 00 01 00 00
[ 5384.602938] I/O error, dev sdl, sector 391118856 op 0x1:(WRITE) flags
0x8800 phys_seg 1 prio class 2
[ 5384.613854] ata22: EH complete
[ 5384.618570] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1
tries: 1
So looks good to me.
Feel free to add:
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>
> Finally with these changes we can make the libsas task alloc/free APIs
> private, which they should always have been.
>
> Based on v6.0-rc4
>
> [0] https://lore.kernel.org/linux-scsi/8fb3b093-55f0-1fab-81f4-e8519810a978@huawei.com/
>
> Changes since v2:
> - Stop sending SATA_ABORT all for pm8001 handling
> - Make "reset" optional in sas_ata_device_link_abort()
>
> Changes since v1:
> - Rename sas_ata_link_abort() -> sas_ata_device_link_abort()
> - Set EH RESET flag in sas_ata_device_link_abort()
> - Add Jack's Ack tags
> - Rebase
>
> John Garry (5):
> scsi: pm8001: Modify task abort handling for SATA task
> scsi: libsas: Add sas_ata_device_link_abort()
> scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors
> scsi: hisi_sas: Don't issue ATA softreset in hisi_sas_abort_task()
> scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private
>
> Xingui Yang (1):
> scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw
>
> drivers/scsi/hisi_sas/hisi_sas_main.c | 5 +-
> drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 22 ++-
> drivers/scsi/libsas/sas_ata.c | 12 ++
> drivers/scsi/libsas/sas_init.c | 3 -
> drivers/scsi/libsas/sas_internal.h | 4 +
> drivers/scsi/pm8001/pm8001_hwi.c | 186 ++++---------------------
> drivers/scsi/pm8001/pm8001_sas.c | 8 ++
> drivers/scsi/pm8001/pm8001_sas.h | 4 -
> drivers/scsi/pm8001/pm80xx_hwi.c | 177 +++--------------------
> include/scsi/libsas.h | 4 -
> include/scsi/sas_ata.h | 6 +
> 11 files changed, 96 insertions(+), 335 deletions(-)
>
--
Damien Le Moal
Western Digital Research
On 21/09/2022 04:59, Damien Le Moal wrote:
> I tested this series on top of the current Linus tree. All look good. As
> ususal, I hammered an SMR drive connected to a pm80xx adapter and
> generated lots of invalid commands to check EH. No issues that I can see.
> E.g., an unaligned write error look like this:
>
> [ 5384.194853] pm80xx0:: mpi_sata_event 2685: SATA EVENT 0x23
> [ 5384.238720] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
> [ 5384.246171] sas: sas_scsi_find_task: aborting task 0x0000000050be2a4b
> [ 5384.254654] pm80xx0:: mpi_sata_completion 2292: task null, freeing
> CCB tag 2
> [ 5384.254676] sas: sas_scsi_find_task: task 0x0000000050be2a4b is aborted
> [ 5384.294659] sas: sas_eh_handle_sas_errors: task 0x0000000050be2a4b is
> aborted
> [ 5384.318691] ata22.00: exception Emask 0x0 SAct 0x200000 SErr 0x0
> action 0x0
> [ 5384.328425] ata22.00: failed command: WRITE FPDMA QUEUED
> [ 5384.336277] ata22.00: cmd 61/01:00:01:00:ea/00:00:02:00:00/40 tag 21
> ncq dma 4096 out
> [ 5384.336277] res 43/04:01:01:00:ea/00:00:02:00:00/00 Emask
> 0x400 (NCQ error) <F>
> [ 5384.357299] ata22.00: status: { DRDY SENSE ERR }
> [ 5384.364459] ata22.00: error: { ABRT }
> [ 5384.553177] ata22.00: configured for UDMA/133
> [ 5384.560343] sd 19:0:3:0: [sdl] tag#80 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_OK cmd_age=0s
> [ 5384.572175] sd 19:0:3:0: [sdl] tag#80 Sense Key : Illegal Request
> [current]
> [ 5384.581765] sd 19:0:3:0: [sdl] tag#80 Add. Sense: Unaligned write
> command
> [ 5384.591126] sd 19:0:3:0: [sdl] tag#80 CDB: Write(16) 8a 00 00 00 00
> 00 02 ea 00 01 00 00 00 01 00 00
> [ 5384.602938] I/O error, dev sdl, sector 391118856 op 0x1:(WRITE) flags
> 0x8800 phys_seg 1 prio class 2
> [ 5384.613854] ata22: EH complete
> [ 5384.618570] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1
> tries: 1
>
> So looks good to me.
>
> Feel free to add:
>
> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Great, thanks very much. I'll send an updated version based on mkp-scsi
6.1 queue today.
John
© 2016 - 2026 Red Hat, Inc.