From: Ionut Nechita <ionut_n2001@yahoo.com>
When the MegaRAID firmware returns MFI_STAT_SCSI_DONE_WITH_ERROR (0x2d)
with zero bytes transferred on a data-bearing command, the driver
currently returns DID_OK to the SCSI midlayer. This causes the I/O to
appear complete with no data, leading to hung tasks that block
indefinitely.
Production systems show the following repeated pattern:
  sd 0:0:9:0: [sdb] tag#24 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
  INFO: task systemd-udevd:267 blocked for more than 245 seconds.
  INFO: task modprobe:296 blocked for more than 246 seconds.
When the firmware reports DONE_WITH_ERROR with no data transferred and
no CHECK_CONDITION sense data, return DID_SOFT_ERROR instead of DID_OK.
This causes the SCSI midlayer to retry the command up to cmd->allowed
times (default 5), matching the established pattern used by mpt3sas and
smartpqi for similar conditions.
Commands with CHECK_CONDITION sense data are not affected -- they
continue to be completed immediately with the sense data intact.
Fixes: 9c915a8c99bc ("[SCSI] megaraid_sas: Add 9565/9285 specific code")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com>
---
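Not part of the commit: a minimal userspace sketch of the retry decision
described above, for review purposes only. The helper names
should_soft_retry() and done_with_error_result() are hypothetical and do
not exist in the driver. The constants are redefined locally with the
values the kernel headers are assumed to use. The sketch follows the
fusion path; the MFI path cannot test data_length because the firmware
does not report bytes transferred, so it would behave as if data_length
were always 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies so the sketch builds outside the kernel (assumed values). */
#define DID_OK                   0x00
#define DID_SOFT_ERROR           0x0b
#define SAM_STAT_CHECK_CONDITION 0x02

/*
 * Hypothetical helper: true when a DONE_WITH_ERROR completion should be
 * retried, i.e. data was expected, none arrived, and the target did not
 * return CHECK_CONDITION sense data.
 */
static bool should_soft_retry(uint8_t scsi_status, uint32_t bufflen,
			      uint32_t data_length)
{
	return data_length == 0 && bufflen > 0 &&
	       scsi_status != SAM_STAT_CHECK_CONDITION;
}

/*
 * Hypothetical helper: returns the result word for the command. The host
 * byte lives in bits 16..23, hence the shift. Completions that do not
 * qualify for a retry keep the result they already had.
 */
static uint32_t done_with_error_result(uint32_t current_result,
				       uint8_t scsi_status, uint32_t bufflen,
				       uint32_t data_length)
{
	if (should_soft_retry(scsi_status, bufflen, data_length))
		return (uint32_t)DID_SOFT_ERROR << 16;
	return current_result;
}
```

With the values from the log above (0x1000 requested, 0x0 completed, no
CHECK_CONDITION), done_with_error_result() yields DID_SOFT_ERROR << 16,
while a completion carrying CHECK_CONDITION keeps its original result.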
drivers/scsi/megaraid/megaraid_sas_base.c | 16 ++++++++++++++++
drivers/scsi/megaraid/megaraid_sas_fusion.c | 14 +++++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index abbbc4b36cd1d..de35b7d5094d7 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3682,6 +3682,22 @@ megasas_complete_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd,
hdr->sense_len);
}
+ /*
+ * MFI firmware does not report actual bytes
+ * transferred, so we cannot compute residuals.
+ * If data was expected and no CHECK_CONDITION,
+ * retry via DID_SOFT_ERROR. The SCSI midlayer
+ * retries up to cmd->allowed times (default 5).
+ */
+ if (hdr->scsi_status != SAM_STAT_CHECK_CONDITION &&
+ scsi_bufflen(cmd->scmd) > 0) {
+ cmd->scmd->result = DID_SOFT_ERROR << 16;
+ dev_warn(&instance->pdev->dev,
+ "megaraid_sas: DONE_WITH_ERROR (stat 0x%x) on cmd 0x%x to tgt %d, retrying\n",
+ hdr->cmd_status, hdr->cmd,
+ hdr->target_id);
+ }
+
break;
case MFI_STAT_LD_OFFLINE:
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index a6794f49e9fae..6021f1363ef4c 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2066,7 +2066,19 @@ map_cmd_status(struct fusion_context *fusion,
resid = (scsi_bufflen(scmd) - data_length);
scsi_set_resid(scmd, resid);
- if (resid &&
+ /*
+ * If data was expected but zero bytes were transferred
+ * and there is no CHECK_CONDITION sense data, retry via
+ * DID_SOFT_ERROR. The SCSI midlayer retries up to
+ * cmd->allowed times (default 5).
+ */
+ if (data_length == 0 && scsi_bufflen(scmd) > 0 &&
+ ext_status != SAM_STAT_CHECK_CONDITION) {
+ scmd->result = DID_SOFT_ERROR << 16;
+ scmd_printk(KERN_WARNING, scmd,
+ "megaraid_sas: zero data on DONE_WITH_ERROR (stat 0x%x, bufflen 0x%x), retrying\n",
+ status, scsi_bufflen(scmd));
+ } else if (resid &&
((cmd_type == READ_WRITE_LDIO) ||
(cmd_type == READ_WRITE_SYSPDIO)))
scmd_printk(KERN_INFO, scmd, "BRCM Debug mfi stat 0x%x, data len"
--
2.52.0