drm/msm: make sure to not queue up recovery more than once

[PATCH] drm/msm: make sure to not queue up recovery more than once

Posted by Antonino Maniscalco 5 months, 2 weeks ago

If two fault IRQs arrive in quick succession, recovery work will be
queued up twice.

When recovery runs a second time, it may end up killing an unrelated
context.

Prevent this by masking off interrupts when triggering recovery.

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 45dd5fd1c2bfcb0a01b71a326c7d95b0f9496d99..f8992a68df7fb77362273206859e696c1a52e02f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1727,6 +1727,9 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
 	/* Turn off the hangcheck timer to keep it from bothering us */
 	timer_delete(&gpu->hangcheck_timer);
 
+	/* Turn off interrupts to avoid triggering recovery again */
+	gpu_write(gpu, REG_A6XX_RBBM_INT_0_MASK, 0);
+
 	kthread_queue_work(gpu->worker, &gpu->recover_work);
 }
 

---
base-commit: ba0f4c4c0f9d0f90300578fc8d081f43be281a71
change-id: 20250821-recovery-fix-350c07a92f97

Best regards,
-- 
Antonino Maniscalco <antomani103@gmail.com>
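
For readers following along, the race described in the commit message can be
sketched as a small user-space model. This is only an illustration under
stated assumptions, not the driver code: struct fake_gpu, FAKE_INT_FAULT,
fake_fault_irq(), fake_worker_run() and simulate() are all hypothetical names,
and the model assumes that the second fault arrives after the worker has
already dequeued the first recovery item, and that recovery restores the
interrupt mask when it re-initializes the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the real msm driver types. */
#define FAKE_INT_FAULT (1u << 0)

struct fake_gpu {
	uint32_t int_mask;   /* models the role of REG_A6XX_RBBM_INT_0_MASK */
	bool work_pending;   /* recovery queued but not yet started */
	int recoveries_run;  /* how many times recovery actually executed */
	bool mask_on_fault;  /* true models the behaviour added by the patch */
};

/* Models the fault handler: optionally mask interrupts, then queue recovery. */
static void fake_fault_irq(struct fake_gpu *gpu)
{
	if (!(gpu->int_mask & FAKE_INT_FAULT))
		return; /* masked: the IRQ never reaches the handler */

	if (gpu->mask_on_fault)
		gpu->int_mask = 0;

	gpu->work_pending = true;
}

/* Models the worker; one extra fault may fire while recovery is running. */
static void fake_worker_run(struct fake_gpu *gpu, bool irq_during_run)
{
	while (gpu->work_pending) {
		/* The item is dequeued before it runs, so it can be requeued. */
		gpu->work_pending = false;

		if (irq_during_run) {
			fake_fault_irq(gpu);
			irq_during_run = false;
		}

		gpu->recoveries_run++;

		/* Assumption: recovery re-inits the hardware and restores the mask. */
		gpu->int_mask = FAKE_INT_FAULT;
	}
}

/* Returns how many times recovery ran for two back-to-back faults. */
int simulate(bool mask_on_fault)
{
	struct fake_gpu gpu = {
		.int_mask = FAKE_INT_FAULT,
		.mask_on_fault = mask_on_fault,
	};

	fake_fault_irq(&gpu);        /* first fault queues recovery */
	assert(gpu.work_pending);
	fake_worker_run(&gpu, true); /* second fault lands mid-recovery */

	return gpu.recoveries_run;
}
```

In this model, simulate(false) runs recovery twice for a single hang, while
simulate(true) runs it once because the second fault is never delivered.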

Re: [PATCH] drm/msm: make sure to not queue up recovery more than once

Posted by Akhil P Oommen 5 months, 2 weeks ago

On 8/21/2025 6:36 PM, Antonino Maniscalco wrote:
> If two fault IRQs arrive in quick succession, recovery work will be
> queued up twice.
> 
> When recovery runs a second time, it may end up killing an unrelated
> context.
> 
> Prevent this by masking off interrupts when triggering recovery.
> 
> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>

Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>

-Akhil
