[PATCH v2 00/16] qla2xxx target mode improvements

Tony Battersby posted 16 patches 4 months, 1 week ago
There is a newer version of this series
[PATCH v2 00/16] qla2xxx target mode improvements
Posted by Tony Battersby 4 months, 1 week ago
v1 -> v2
- Add new patch "scsi: qla2xxx: clear cmds after chip reset" suggested
by Dmitry Bogdanov.
- Rename "scsi: qla2xxx: fix oops during cmd abort" to "scsi: qla2xxx:
fix races with aborting commands" and make SCST reset the ISP on a HW
timeout instead of unmapping DMA that might still be in use.
- Fix "scsi: qla2xxx: fix TMR failure handling" to free mcmds properly
for LIO.
- In "scsi: qla2xxx: add back SRR support", detect more buggy HBA fw
versions based on the fw release notes.
- Shorten code comment in "scsi: qla2xxx: improve safety of cmd lookup
by handle" and improve patch description.
- Rebase other patches as needed.

v1:
https://lore.kernel.org/r/f8977250-638c-4d7d-ac0c-65f742b8d535@cybernetics.com/

This patch series improves the qla2xxx FC driver in target mode.  I
developed these patches using the out-of-tree SCST target-mode subsystem
(https://scst.sourceforge.net/), although most of the improvements will
also apply to the other target-mode subsystems such as the in-tree LIO. 
Unfortunately qla2xxx+LIO does not pass all of my tests, but my patches
do not make it any worse (results below).  These patches have been
well-tested at my employer with qla2xxx+SCST in both initiator mode and
target mode and with a variety of FC HBAs and initiators.  Since SCST is
out-of-tree, some of the patches have parts that apply in-tree and other
parts that apply out-of-tree to SCST.  I am going to include the
out-of-tree SCST patches to provide additional context; feel free to
ignore them if you are not interested.

All patches apply to Linux 6.17 and the SCST 3.10 master branch.

Summary of patches:
- bugfixes
- cleanups
- improve handling of aborts and task management requests
- improve log messages
- add back SLER / SRR support (removed in 2017)

Some of these patches improve handling of aborts and task management
requests.  This is some of the testing that I did:

Test 1: Use /dev/sg to queue random disk I/O with short timeouts; make
sure cmds are aborted successfully.
Test 2: Queue lots of disk I/O, then use "sg_reset -N -d /dev/sg" on
initiator to reset logical unit.
Test 3: Queue lots of disk I/O, then use "sg_reset -N -t /dev/sg" on
initiator to reset target.
Test 4: Queue lots of disk I/O, then use "sg_reset -N -b /dev/sg" on
initiator to reset bus.
Test 5: Queue lots of disk I/O, then use "sg_reset -N -H /dev/sg" on
initiator to reset host.
Test 6: Use a Fibre Channel attenuator to trigger SRR during a
write/read/compare test; check data integrity.
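
For anyone wanting to repeat Tests 2-5, the reset commands can be
generated with a small helper.  This is only a sketch: the function name
sg_reset_cmd() and the placeholder device "/dev/sgN" are made up for
illustration, and the script prints the commands rather than running
them.  The sg_reset options are the ones listed above.

```python
import shlex
import sys

# Reset type for each test and the sg_reset option that selects it.
RESET_OPTS = {
    "lun": "-d",     # Test 2: reset logical unit
    "target": "-t",  # Test 3: reset target
    "bus": "-b",     # Test 4: reset bus
    "host": "-H",    # Test 5: reset host
}


def sg_reset_cmd(kind, sg_dev):
    """Build the sg_reset argv for one reset type.

    -N asks sg_reset not to escalate to a stronger reset on failure,
    so each test exercises exactly one reset level.
    """
    return ["sg_reset", "-N", RESET_OPTS[kind], sg_dev]


if __name__ == "__main__":
    # "/dev/sgN" is a placeholder; pass the real sg node as argv[1].
    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sgN"
    for kind in RESET_OPTS:
        print(shlex.join(sg_reset_cmd(kind, dev)))
```

Each printed command is meant to be run on the initiator while disk I/O
is queued against the target.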

With my patches, SCST passes all of these tests.

Results with in-tree LIO target-mode subsystem:

Test 1: Seems to abort the same cmd multiple times (both
qlt_24xx_retry_term_exchange() and __qlt_send_term_exchange()).  But
cmds get aborted, so give it a pass?

Test 2: Seems to work; cmds are aborted.

Test 3: Target reset doesn't seem to abort cmds; instead, a few seconds
later:
qla2xxx [0000:04:00.0]-f058:9: qla_target(0): tag 1314312, op 2a: CTIO
with TIMEOUT status 0xb received (state 1, port 51:40:2e:c0:18:1d:9f:cc,
LUN 0)

Tests 4 and 5: The initiator is unable to log back in to the target; the
following messages are repeated over and over on the target:
qla2xxx [0000:04:00.0]-e01c:9: Sending TERM ELS CTIO (ha=00000000f8811390)
qla2xxx [0000:04:00.0]-f097:9: Linking sess 000000008df5aba8 [0] wwn
51:40:2e:c0:18:1d:9f:cc with PLOGI ACK to wwn 51:40:2e:c0:18:1d:9f:cc
s_id 00:00:01, ref=2 pla 00000000835a9271 link 0

Test 6: passes with my patches; SRR not supported previously.
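
The write/read/compare check in Test 6 can be approximated along these
lines.  This is a minimal sketch and not the actual test tool: the names
pattern() and write_read_compare(), the 4096-byte block size, and the
SHA-256-based data pattern are all assumptions chosen for illustration.
It uses ordinary buffered I/O, so against a real disk the read-back may
be served from the page cache; a real test would need direct I/O or
/dev/sg pass-through to force the data over the FC link.

```python
import hashlib
import os

BLOCK = 4096  # assumed block size for the sketch


def pattern(seed, index):
    """Return a deterministic BLOCK-byte pattern for (seed, block index)."""
    out = b""
    counter = 0
    while len(out) < BLOCK:
        out += hashlib.sha256(f"{seed}:{index}:{counter}".encode()).digest()
        counter += 1
    return out[:BLOCK]


def write_read_compare(path, nblocks, seed=0):
    """Write nblocks of pattern data to an existing file or device,
    read them back, and return the indices of blocks that differ."""
    bad = []
    with open(path, "r+b") as f:
        for i in range(nblocks):
            f.seek(i * BLOCK)
            f.write(pattern(seed, i))
        f.flush()
        os.fsync(f.fileno())
        for i in range(nblocks):
            f.seek(i * BLOCK)
            if f.read(BLOCK) != pattern(seed, i):
                bad.append(i)
    return bad
```

An empty list means every block read back as written; any returned
index identifies a block whose data was corrupted somewhere along the
path.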

So qla2xxx+LIO seems a bit flaky when handling exceptions, but my
patches do not make it any worse.  Perhaps someone who is more familiar
with LIO can look at the difference between LIO and SCST and figure out
how to improve it.

Tony Battersby
https://www.cybernetics.com/

Tony Battersby (16):
  Revert "scsi: qla2xxx: Perform lockless command completion in abort
    path"
  scsi: qla2xxx: fix initiator mode with qlini_mode=exclusive
  scsi: qla2xxx: fix lost interrupts with qlini_mode=disabled
  scsi: qla2xxx: use reinit_completion on mbx_intr_comp
  scsi: qla2xxx: remove code for unsupported hardware
  scsi: qla2xxx: improve debug output for term exchange
  scsi: qla2xxx: fix term exchange when cmd_sent_to_fw == 1
  scsi: qla2xxx: clear cmds after chip reset
  scsi: qla2xxx: fix races with aborting commands
  scsi: qla2xxx: improve checks in qlt_xmit_response / qlt_rdy_to_xfer
  scsi: qla2xxx: fix TMR failure handling
  scsi: qla2xxx: fix invalid memory access with big CDBs
  scsi: qla2xxx: add cmd->rsp_sent
  scsi: qla2xxx: improve cmd logging
  scsi: qla2xxx: add back SRR support
  scsi: qla2xxx: improve safety of cmd lookup by handle

 drivers/scsi/qla2xxx/qla_dbg.c     |    3 +-
 drivers/scsi/qla2xxx/qla_def.h     |    1 -
 drivers/scsi/qla2xxx/qla_gbl.h     |    2 +-
 drivers/scsi/qla2xxx/qla_init.c    |    1 +
 drivers/scsi/qla2xxx/qla_isr.c     |   32 +-
 drivers/scsi/qla2xxx/qla_mbx.c     |    2 +
 drivers/scsi/qla2xxx/qla_mid.c     |    4 +-
 drivers/scsi/qla2xxx/qla_os.c      |   35 +-
 drivers/scsi/qla2xxx/qla_target.c  | 1775 +++++++++++++++++++++++-----
 drivers/scsi/qla2xxx/qla_target.h  |  112 +-
 drivers/scsi/qla2xxx/tcm_qla2xxx.c |   17 +
 11 files changed, 1646 insertions(+), 338 deletions(-)


base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
-- 
2.43.0
Re: [PATCH v2 00/16] qla2xxx target mode improvements
Posted by Tony Battersby 3 months ago
On 9/29/25 10:28, Tony Battersby wrote:
> [...]

Martin,

Could you apply this patch series for 6.19?  I have addressed all review
comments and no one has given me any objections.  All patches are on v2
except patch #11 which is on v3.

https://lore.kernel.org/linux-scsi/e95ee7d0-3580-4124-b854-7f73ca3a3a84@cybernetics.com/

Thanks,
Tony Battersby
https://www.cybernetics.com/
Re: [PATCH v2 00/16] qla2xxx target mode improvements
Posted by Martin K. Petersen 3 months ago
Tony,

> Could you apply this patch series for 6.19? I have addressed all
> review comments and no one has given me any objections. All patches
> are on v2 except patch #11 which is on v3.

b4 and patchwork don't know how to deal with this mix of SCST and
kernel changes.

Please post a v3 series which only has the Linux bits.

Thanks!

-- 
Martin K. Petersen