[PATCH v2 1/2] drm/nouveau: Fix WARN_ON in nouveau_fence_context_kill()

Philipp Stanner posted 2 patches 8 months, 1 week ago
Posted by Philipp Stanner 8 months, 1 week ago
Nouveau is mostly designed with the expectation that fences only ever
get signaled through nouveau_fence_signal(). However, at least one
other place, nouveau_fence_done(), can signal fences, too. If that
happens (race), a signaled fence remains in the pending list for a
while, until it gets removed by nouveau_fence_update().

Should nouveau_fence_context_kill() run in the meantime, it would
attempt to set an error code on an already signaled fence. That is a
bug and triggers the WARN_ON in dma_fence_set_error().

Have nouveau_fence_context_kill() check whether a fence is already
signaled before setting the error code.

Cc: <stable@vger.kernel.org> # v5.10+
Fixes: ea13e5abf807 ("drm/nouveau: signal pending fences when channel has been killed")
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 7622587f149e..6ded8c2b6d3b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -90,7 +90,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, int error)
 	while (!list_empty(&fctx->pending)) {
 		fence = list_entry(fctx->pending.next, typeof(*fence), head);
 
-		if (error)
+		if (error && !dma_fence_is_signaled_locked(&fence->base))
 			dma_fence_set_error(&fence->base, error);
 
 		if (nouveau_fence_signal(fence))
-- 
2.48.1

Re: [PATCH v2 1/2] drm/nouveau: Fix WARN_ON in nouveau_fence_context_kill()
Posted by Danilo Krummrich 8 months ago
On Tue, Apr 15, 2025 at 02:19:00PM +0200, Philipp Stanner wrote:
> Nouveau is mostly designed with the expectation that fences only ever
> get signaled through nouveau_fence_signal(). However, at least one
> other place, nouveau_fence_done(), can signal fences, too. If that
> happens (race), a signaled fence remains in the pending list for a
> while, until it gets removed by nouveau_fence_update().
> 
> Should nouveau_fence_context_kill() run in the meantime, this would be
> a bug because the function would attempt to set an error code on an
> already signaled fence.
> 
> Have nouveau_fence_context_kill() check for a fence being signaled.
> 
> Cc: <stable@vger.kernel.org> # v5.10+
> Fixes: ea13e5abf807 ("drm/nouveau: signal pending fences when channel has been killed")
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Philipp Stanner <phasta@kernel.org>

Applied to drm-misc-fixes, thanks!
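
For readers following along, the race described in the commit message can
be modeled in user space. The sketch below is only an illustration, not
kernel code: struct model_fence, model_set_error(), model_context_kill()
and model_run() are hypothetical stand-ins for struct dma_fence,
dma_fence_set_error(), nouveau_fence_context_kill() and a test scenario,
and -ENODEV is just an arbitrary example error. It assumes the modeled
set-error helper warns when called on an already signaled fence, as the
patch title implies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct dma_fence; not the kernel type. */
struct model_fence {
	bool signaled;
	int error;
	struct model_fence *next;	/* singly linked "pending" list */
};

static int warn_count;	/* how often the modeled warning fired */

/* Models dma_fence_set_error(): warns if the fence is already signaled. */
static void model_set_error(struct model_fence *f, int error)
{
	if (f->signaled) {
		warn_count++;
		fprintf(stderr, "WARN: error set on signaled fence\n");
	}
	f->error = error;
}

/*
 * Models the loop in nouveau_fence_context_kill(). With check_signaled
 * set, it behaves like the patched code and skips fences that were
 * already signaled through another path.
 */
static void model_context_kill(struct model_fence **pending, int error,
			       bool check_signaled)
{
	while (*pending) {
		struct model_fence *f = *pending;

		if (error && !(check_signaled && f->signaled))
			model_set_error(f, error);

		f->signaled = true;	/* models nouveau_fence_signal() */
		*pending = f->next;	/* drop the fence from the list */
		f->next = NULL;
	}
}

/* Runs one scenario and returns the number of warnings it produced. */
static int model_run(bool check_signaled)
{
	/* Signaled elsewhere (the race), but still on the pending list. */
	struct model_fence raced = { .signaled = true };
	struct model_fence normal = { .signaled = false };
	struct model_fence *pending = &raced;

	raced.next = &normal;
	warn_count = 0;
	model_context_kill(&pending, -ENODEV, check_signaled);
	return warn_count;
}
```

Calling model_run(false) models the code before the patch and hits the
warning once, for the fence that was signaled while still pending.
model_run(true) models the patched check and produces no warning, while
the not-yet-signaled fence still gets its error code set.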