From: Philipp Stanner
To: Matthew Brost, Danilo Krummrich, Philipp Stanner, Christian König,
 Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
 Simona Vetter
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 James Flowers
Subject: [PATCH v2] drm/sched: Document race condition in drm_sched_fini()
Date: Wed, 13 Aug 2025 10:56:55 +0200
Message-ID: <20250813085654.102504-2-phasta@kernel.org>
X-Mailer: git-send-email 2.49.0
X-Mailing-List: linux-kernel@vger.kernel.org

In drm_sched_fini() all entities are marked as stopped - without taking
the appropriate lock, because that would deadlock. That means that
drm_sched_fini() and drm_sched_entity_push_job() can race against each
other.

This should most likely be fixed by establishing the rule that all
entities associated with a scheduler must be torn down first. Then,
however, the locking should be removed from drm_sched_fini() altogether
with an appropriate comment.

Reported-by: James Flowers
Link: https://lore.kernel.org/dri-devel/20250720235748.2798-1-bold.zone2373@fastmail.com/
Signed-off-by: Philipp Stanner
Reviewed-by: Christian König
---
Changes in v2:
  - Fix typo.
---
 drivers/gpu/drm/scheduler/sched_main.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 5a550fd76bf0..46119aacb809 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1424,6 +1424,22 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 			 * Prevents reinsertion and marks job_queue as idle,
 			 * it will be removed from the rq in drm_sched_entity_fini()
 			 * eventually
+			 *
+			 * FIXME:
+			 * This lacks the proper spin_lock(&s_entity->lock) and
+			 * is, therefore, a race condition. Most notably, it
+			 * can race with drm_sched_entity_push_job(). The lock
+			 * cannot be taken here, however, because this would
+			 * lead to lock inversion -> deadlock.
+			 *
+			 * The best solution probably is to enforce the
+			 * lifetime rule of all entities having to be torn down
+			 * before their scheduler. Then, however, locking could
+			 * be dropped altogether from this function.
+			 *
+			 * For now, this remains a potential race in all
+			 * drivers that keep entities alive for longer than
+			 * the scheduler.
 			 */
 			s_entity->stopped = true;
 		spin_unlock(&rq->lock);
-- 
2.49.0
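
For illustration only, the lifetime rule proposed in the FIXME can be
modelled in plain userspace C. This is a minimal sketch under stated
assumptions, not kernel code: model_sched, model_entity and every
function below are hypothetical names that do not exist in drm/sched,
and reporting a violation with -EBUSY is an assumption (the real
drm_sched_fini() returns void). The point it tries to show is that once
every entity has to be torn down before its scheduler, the teardown path
no longer needs to touch per-entity state at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; they do not mirror the real drm/sched structs. */
struct model_sched {
	int live_entities;	/* entities still attached to this scheduler */
	bool finished;		/* set once model_sched_fini() has succeeded */
};

struct model_entity {
	struct model_sched *sched;	/* NULL once the entity is torn down */
	int queued_jobs;
};

void model_sched_init(struct model_sched *s)
{
	s->live_entities = 0;
	s->finished = false;
}

void model_entity_init(struct model_entity *e, struct model_sched *s)
{
	e->sched = s;
	e->queued_jobs = 0;
	s->live_entities++;
}

/* Pushing a job only works while the entity is still attached. */
int model_entity_push_job(struct model_entity *e)
{
	if (!e->sched)
		return -ENODEV;
	e->queued_jobs++;
	return 0;
}

/* Tearing down an entity detaches it from its scheduler exactly once. */
void model_entity_fini(struct model_entity *e)
{
	if (!e->sched)
		return;
	e->sched->live_entities--;
	e->sched = NULL;
}

/*
 * Enforces the lifetime rule: teardown is refused while entities are
 * alive, so no entity has to be marked as stopped from here.
 */
int model_sched_fini(struct model_sched *s)
{
	if (s->live_entities > 0)
		return -EBUSY;
	s->finished = true;
	return 0;
}
```

In this model a driver that calls model_sched_fini() while an entity is
still attached gets -EBUSY, and the call succeeds once model_entity_fini()
has run for every entity. How the kernel would actually report or enforce
such a violation is left open here.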