[PATCH AUTOSEL 6.19-6.12] scsi: mpi3mr: Clear reset history on ready and recheck state after timeout

Sasha Levin posted 1 patch 4 weeks, 1 day ago
drivers/scsi/mpi3mr/mpi3mr_fw.c | 10 ++++++++++
1 file changed, 10 insertions(+)
[PATCH AUTOSEL 6.19-6.12] scsi: mpi3mr: Clear reset history on ready and recheck state after timeout
Posted by Sasha Levin 4 weeks, 1 day ago
From: Ranjan Kumar <ranjan.kumar@broadcom.com>

[ Upstream commit dbd53975ed4132d161b6a97ebe785a262380182d ]

The driver retains reset history even after the IOC has successfully
reached the READY state. That leaves stale reset information active during
normal operation and can mislead recovery and diagnostics.  In addition, if
the IOC becomes READY just as the ready timeout loop exits, the driver
still follows the failure path and may retry or report failure incorrectly.

Clear reset history once READY is confirmed so driver state matches actual
IOC status. After the timeout loop, recheck the IOC state and treat READY
as success instead of failing.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://patch.msgid.link/20260225082622.82588-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have a clear picture of the changes. Let me finalize my analysis.

## Analysis

### What the commit does

This commit makes two small changes to `mpi3mr_bring_ioc_ready()` in the
mpi3mr SCSI driver:

1. **Clears reset history on READY** (line 1533): When the IOC
   transitions to READY state inside the polling loop, it now calls
   `mpi3mr_clear_reset_history()` before returning success. Without
   this, stale reset history bits remain set, which can confuse
   subsequent recovery logic that checks
   `MPI3_SYSIF_IOC_STATUS_RESET_HISTORY`.

2. **Rechecks state after timeout** (lines 1553-1560): After the
   `do...while` timeout loop exits, the code does one final check to see
   if the IOC became READY right at the boundary. Previously, if the IOC
   transitioned to READY during the last `msleep(100)` (after the
   elapsed time already exceeded the timeout), the driver would fall
   through to `out_failed` and potentially retry unnecessarily or report
   failure.

### Bug assessment

**Race at timeout boundary (bug #2):** This is a real timing bug. The
loop checks `elapsed_time_sec < mrioc->ready_timeout` *after*
`msleep(100)`. If the IOC becomes READY during that last sleep,
`elapsed_time_sec` has already exceeded the timeout, so the loop exits.
Without the post-loop recheck, the driver goes to `out_failed`, which
triggers up to 2 retries (unnecessary soft resets) or reports
initialization failure. This can cause:
- Unnecessary controller resets during probe/resume
- Spurious initialization failures if retries are exhausted
- I/O failures during reset recovery (since `mpi3mr_reinit_ioc` fails)

**Stale reset history (bug #1):** If reset history is not cleared when
the IOC reaches READY, subsequent iterations of the loop in future calls
to this function will see `MPI3_SYSIF_IOC_STATUS_RESET_HISTORY` set and
jump to `out_failed` (line 1537-1540), causing spurious failure.

### Stable kernel criteria

- **Fixes a real bug:** Yes - timing race causes false initialization
  failures; stale state causes spurious failures in subsequent
  operations.
- **Obviously correct:** Yes - `mpi3mr_clear_reset_history()` is already
  called elsewhere in the same function (at line 1510 after soft reset).
  The post-timeout recheck follows the same pattern as the in-loop
  check.
- **Small and contained:** Yes - 10 lines added, single file, single
  function.
- **No new features:** Correct - pure bug fix.
- **Risk:** Very low. The added code only adds a success path where
  there was none (post-timeout READY check) and clears a status bit that
  should be cleared (reset history). No existing success/failure paths
  are altered.

### Impact

This function is called on three critical paths: PCI probe, controller
reset recovery, and system resume. A false failure here means the SCSI
controller doesn't initialize, which means loss of access to all
attached storage devices. This affects real users with Broadcom MPI3
(SAS/NVMe) storage controllers.

### Verification

- Confirmed `mpi3mr_clear_reset_history` is a trivial inline that clears
  the RESET_HISTORY bit in IOC status register (lines 1256-1263).
- Confirmed the function is called from probe (`mpi3mr_init_ioc`), reset
  (`mpi3mr_reinit_ioc`), and resume paths.
- Confirmed that `mpi3mr_clear_reset_history` is already used elsewhere
  in the same function, so the pattern is established.
- Confirmed the timeout race: the `do...while` loop exits when
  `elapsed_time_sec >= ready_timeout`, but the IOC could have become
  READY during the final `msleep(100)`, which is not checked without
  this patch.
- Confirmed that falling through to `out_failed` triggers retries (up to
  2) or returns failure (line 1564-1577).
- The commit is authored by Broadcom (the hardware vendor), accepted by
  the SCSI maintainer Martin Petersen, indicating domain expertise.
- Could NOT verify specific user reports of this race triggering in the
  field (commit message implies it was found via code
  review/diagnostics), but the race window is real and the consequences
  are severe.

**YES**

 drivers/scsi/mpi3mr/mpi3mr_fw.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c
index 8c4bb7169a87c..6d36575997871 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_fw.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c
@@ -1530,6 +1530,7 @@ static int mpi3mr_bring_ioc_ready(struct mpi3mr_ioc *mrioc)
 			ioc_info(mrioc,
 			    "successfully transitioned to %s state\n",
 			    mpi3mr_iocstate_name(ioc_state));
+			mpi3mr_clear_reset_history(mrioc);
 			return 0;
 		}
 		ioc_status = readl(&mrioc->sysif_regs->ioc_status);
@@ -1549,6 +1550,15 @@ static int mpi3mr_bring_ioc_ready(struct mpi3mr_ioc *mrioc)
 		elapsed_time_sec = jiffies_to_msecs(jiffies - start_time)/1000;
 	} while (elapsed_time_sec < mrioc->ready_timeout);
 
+	ioc_state = mpi3mr_get_iocstate(mrioc);
+	if (ioc_state == MRIOC_STATE_READY) {
+		ioc_info(mrioc,
+		    "successfully transitioned to %s state after %llu seconds\n",
+		    mpi3mr_iocstate_name(ioc_state), elapsed_time_sec);
+		mpi3mr_clear_reset_history(mrioc);
+		return 0;
+	}
+
 out_failed:
 	elapsed_time_sec = jiffies_to_msecs(jiffies - start_time)/1000;
 	if ((retry < 2) && (elapsed_time_sec < (mrioc->ready_timeout - 60))) {
-- 
2.51.0