Hi,

This patch series addresses timeout handling issues in the Tegra QSPI driver
that occur under high system load conditions. We've observed that when CPUs
are saturated (due to error injection, RAS firmware activity, or general CPU
contention), QSPI interrupt handlers can be delayed, causing spurious transfer
failures even though the hardware completed the operation successfully.

Patch 1 fixes a stale pointer issue by ensuring curr_xfer is cleared on timeout
and checked when the IRQ thread finally runs. It also ensures interrupts are
properly cleared on failure paths.

Patch 2 refactors the timeout cleanup code into dedicated helper functions
(tegra_qspi_reset, tegra_qspi_dma_stop, tegra_qspi_pio_stop) to improve code
readability and maintainability. This is purely a code reorganization with no
functional changes.

Patch 3 adds hardware status checking on timeout. Before failing a transfer,
the driver now reads QSPI_TRANS_STATUS to verify if the hardware actually
completed the operation. If so, it manually invokes the completion handler
instead of failing the transfer. This distinguishes genuine hardware timeouts
from delayed/lost interrupts.

These changes have been tested in production environments under various high
load scenarios including RAS testing and CPU saturation workloads.
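The interaction between patch 1 and patch 3 can be sketched in user-space C.
This is a simplified model, not the driver code: the sim_* names, the
SIM_TRANS_STATUS_RDY bit position, and the counters are all hypothetical, and
the trans_status field only stands in for the QSPI_TRANS_STATUS register. It
shows the two ideas described above: on timeout, check whether the hardware
already finished before failing the transfer, and clear curr_xfer so a late
IRQ thread finds nothing stale to act on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "transfer done" bit; the real register layout is not given here. */
#define SIM_TRANS_STATUS_RDY (1u << 30)

struct sim_xfer {
	int len;
};

struct sim_qspi {
	uint32_t trans_status;      /* stands in for QSPI_TRANS_STATUS */
	struct sim_xfer *curr_xfer; /* transfer in flight, NULL when idle */
	int completed;              /* transfers finished successfully */
	int resets;                 /* controller resets after genuine timeouts */
};

/* Completion path shared by the IRQ thread and the timeout path. */
static void sim_complete(struct sim_qspi *q)
{
	q->trans_status &= ~SIM_TRANS_STATUS_RDY; /* acknowledge the status */
	q->curr_xfer = NULL;
	q->completed++;
}

/*
 * Called when waiting for the completion interrupt timed out.
 * Returns 0 if the hardware had in fact finished (delayed or lost
 * interrupt), -1 for a genuine hardware timeout.
 */
static int sim_handle_timeout(struct sim_qspi *q)
{
	assert(q != NULL);

	if (q->curr_xfer && (q->trans_status & SIM_TRANS_STATUS_RDY)) {
		sim_complete(q); /* finish the transfer by hand */
		return 0;
	}

	/* Genuine timeout: clear status, drop the pointer, reset. */
	q->trans_status = 0;
	q->curr_xfer = NULL;
	q->resets++;
	return -1;
}

/* IRQ thread that may run long after the timeout path already did. */
static bool sim_irq_thread(struct sim_qspi *q)
{
	assert(q != NULL);

	if (!q->curr_xfer) {
		q->trans_status = 0; /* nothing in flight, only clear status */
		return false;
	}

	sim_complete(q);
	return true;
}
```

In this model, calling sim_handle_timeout() with the ready bit set completes
the transfer and returns 0, and a later sim_irq_thread() call returns false
because curr_xfer is already NULL. The real driver also has to deal with
locking and with the DMA and PIO paths, which this sketch leaves out.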
Changes in v5:
- No code changes, rebased to resolve conflicts

Changes in v4:
- Removed Change-Id from commit messages

Changes in v3:
- Added missing tqspi->curr_xfer = NULL assignment in handle_cpu_based_xfer()
- Split the previous patch 2/2 into two separate patches (now 2/3 and 3/3)
  - Patch 2/3: New patch - refactoring only, no functional changes
  - Patch 3/3: Functional changes to add hardware timeout checking

Changes in v2:
- Fixed indentation in patch 1/2: The "Reset controller if timeout happens"
  block now has correct indentation (inside the WARN_ON_ONCE block)
- No functional changes

Thierry Reding (1):
  spi: tegra210-quad: Fix timeout handling

Vishwaroop A (2):
  spi: tegra210-quad: Refactor error handling into helper functions
  spi: tegra210-quad: Check hardware status on timeout

 drivers/spi/spi-tegra210-quad.c | 174 +++++++++++++++++++++++---------
 1 file changed, 128 insertions(+), 46 deletions(-)

--
2.17.1
On Tue, 28 Oct 2025 15:57:00 +0000, Vishwaroop A wrote:
> This patch series addresses timeout handling issues in the Tegra QSPI driver
> that occur under high system load conditions. We've observed that when CPUs
> are saturated (due to error injection, RAS firmware activity, or general CPU
> contention), QSPI interrupt handlers can be delayed, causing spurious transfer
> failures even though the hardware completed the operation successfully.
>
> Patch 1 fixes a stale pointer issue by ensuring curr_xfer is cleared on timeout
> and checked when the IRQ thread finally runs. It also ensures interrupts are
> properly cleared on failure paths.
>
> [...]
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
Thanks!
[1/3] spi: tegra210-quad: Fix timeout handling
commit: b4e002d8a7cee3b1d70efad0e222567f92a73000
[2/3] spi: tegra210-quad: Refactor error handling into helper functions
commit: 6022eacdda8b0b06a2e1d4122e5268099b62ff5d
[3/3] spi: tegra210-quad: Check hardware status on timeout
commit: 380fd29d57abe6679d87ec56babe65ddc5873a37
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
On 28/10/2025 15:57, Vishwaroop A wrote:
> Hi,
>
> This patch series addresses timeout handling issues in the Tegra QSPI driver
> that occur under high system load conditions. We've observed that when CPUs
> are saturated (due to error injection, RAS firmware activity, or general CPU
> contention), QSPI interrupt handlers can be delayed, causing spurious transfer
> failures even though the hardware completed the operation successfully.
>
> Patch 1 fixes a stale pointer issue by ensuring curr_xfer is cleared on timeout
> and checked when the IRQ thread finally runs. It also ensures interrupts are
> properly cleared on failure paths.
>
> Patch 2 refactors the timeout cleanup code into dedicated helper functions
> (tegra_qspi_reset, tegra_qspi_dma_stop, tegra_qspi_pio_stop) to improve code
> readability and maintainability. This is purely a code reorganization with no
> functional changes.
>
> Patch 3 adds hardware status checking on timeout. Before failing a transfer,
> the driver now reads QSPI_TRANS_STATUS to verify if the hardware actually
> completed the operation. If so, it manually invokes the completion handler
> instead of failing the transfer. This distinguishes genuine hardware timeouts
> from delayed/lost interrupts.
>
> These changes have been tested in production environments under various high
> load scenarios including RAS testing and CPU saturation workloads.

For the series ...

Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>

Thanks
Jon

--
nvpublic