The current threaded IRQ implementation in spi-tegra210-quad suffers from
scheduler-induced latency on heavily loaded systems. Because threaded IRQ
handlers run in kernel threads that must wait to be scheduled, they can be
delayed long enough to trigger transfer timeouts even though the hardware
completes the transfer in microseconds. This results in false timeout
errors and WARN_ON splats during normal operation.

This series addresses the problem in three steps:

1. Convert the threaded IRQ handler to a hard IRQ + high-priority unbound
   workqueue model. The hard IRQ does the minimum: verify interrupt
   ownership, cache status registers, clear and mask interrupts. The
   workqueue bottom-half handles the rest in process context and can run
   on any available CPU, avoiding the CPU0 bottleneck inherent in threaded
   IRQs.

2. Cache QSPI_TRANS_STATUS in the ISR before clearing it. This allows the
   timeout handler to distinguish between a real hardware timeout (QSPI_RDY
   not set) and a delayed workqueue (QSPI_RDY set), preventing false
   timeout errors when hardware has already completed.

3. Process small PIO transfers (≤ FIFO depth, 256 bytes) directly in
   hard IRQ context. This eliminates workqueue scheduling overhead for
   latency-sensitive devices like TPMs, reducing completion latency from
   potentially seconds to microseconds.
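
The decisions described in steps 2 and 3 can be modeled in a few lines of
userspace C. This is a minimal sketch for illustration only, not the driver
code: struct qspi_model, qspi_model_isr() and qspi_model_timeout_is_real()
are hypothetical names, and the QSPI_RDY bit position used here is an
assumed value. Register access, interrupt masking and error handling are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the driver defines the real ones. */
#define QSPI_RDY              (1u << 30)
#define QSPI_FIFO_DEPTH_BYTES 256u

/* What the modeled ISR decides to do with a completed interrupt. */
enum isr_action {
	ISR_COMPLETE_INLINE,     /* finish the transfer in hard IRQ context */
	ISR_DEFER_TO_WORKQUEUE,  /* hand off to the bottom half */
};

/* Hypothetical per-controller state shared with the timeout path. */
struct qspi_model {
	uint32_t cached_trans_status; /* snapshot taken before clearing */
};

/*
 * Model of the hard IRQ: cache the status first, then pick a path.
 * Only PIO transfers that fit in the FIFO are completed inline.
 */
static enum isr_action qspi_model_isr(struct qspi_model *q,
				      uint32_t hw_trans_status,
				      bool is_dma, size_t len)
{
	assert(q != NULL);

	q->cached_trans_status = hw_trans_status;

	if (!is_dma && len <= QSPI_FIFO_DEPTH_BYTES)
		return ISR_COMPLETE_INLINE;

	return ISR_DEFER_TO_WORKQUEUE;
}

/*
 * Model of the timeout check: if the cached status shows QSPI_RDY, the
 * hardware already finished and only the bottom half is late.
 */
static bool qspi_model_timeout_is_real(const struct qspi_model *q)
{
	assert(q != NULL);

	return !(q->cached_trans_status & QSPI_RDY);
}
```

In this model, a 64-byte PIO transfer whose cached status has QSPI_RDY set
is completed inline, and a later timeout would be treated as spurious. A
4096-byte transfer with QSPI_RDY clear is deferred, and its timeout is
treated as a real hardware timeout.
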
Tested on TH500 with TPM and QSPI flash devices under sustained load.
The series applies cleanly on top of linux-next (20260508).

Vishwaroop A (3):
  spi: tegra210-quad: Convert to hard IRQ with high-priority workqueue
  spi: tegra210-quad: Cache TRANS_STATUS in ISR for timeout handler
  spi: tegra210-quad: Process small PIO transfers in hard IRQ context

 drivers/spi/spi-tegra210-quad.c | 169 ++++++++++++++++++++++----------
 1 file changed, 117 insertions(+), 52 deletions(-)

--
2.17.1