From nobody Tue Oct 7 19:24:15 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 95F35277028; Mon, 7 Jul 2025 13:42:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895755; cv=none; b=UwHz+j8NLnPf8Oggcc6nu2Sf67wpQJ0TM9QGJ/aYF+HNeUz3EKAQ3o53RdRgc7d3El49cucciBJmGtf6bwKjK+EI8Sy2ng/teh1sRAhr6EGb1beQ/RhVUDS+V8wepwcAvY8mjulZ9s7RAQFuDMPcJNtu2nSVBuM5gD7WT+FRR1Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895755; c=relaxed/simple; bh=crPQc1BXWo+zV232yN0wvYJrZBZg43BUNUjW4auuQw8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=u5hJ1IPrvuEZZ8tfN/gMWwbKrTxeTXggS1U0yInE1je19TvzbzuKf4fvcJVSv0Nrsbm/4UF9IIGgH3dF6dbIjWIESqtbWdwlLy9fV2RPiuUnDc77CT/35p/g7MirtCZbk8+pydVdcA4k6Agc6c9Ixyn0KFKLXKHKm2khs3Wamz4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Cz2JhRko; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Cz2JhRko" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F28B7C4CEF4; Mon, 7 Jul 2025 13:42:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1751895755; bh=crPQc1BXWo+zV232yN0wvYJrZBZg43BUNUjW4auuQw8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Cz2JhRkoeedY5jsYLkcZ1J69fK5VX/jzB0l98Rla+AezIbFy+jiJP3v6AqdLy3l2g Mm9QIpsNiFeZjpL0DinV5aE6HBiZtbpMw2TzaIijROuX2lBKOycfQVhb0O+19wWzKF 8w1aQqfIir5qRUZxFYiPvV4Wct5ZDKgiuIfVDEMgBgUSzplgScMCAlOQeFPARNpvPw 
Es6GygnUw6YS9WVn9g3ZX4iZ87sw6O48/aPE7EP9DIDfD7oLkF96/xio1tyq3ABrZt CQNNM4MmDV5a1bpsl1Ztp3JohAepXB9cxIBH2QkxBqL1TT0FwboXz96htsUpdGpyT7 WemGA2yRvSJ6A== From: Philipp Stanner To: Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Matthew Brost , Philipp Stanner , =?UTF-8?q?Christian=20K=C3=B6nig?= , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , Sumit Semwal , Tvrtko Ursulin , Pierre-Eric Pelloux-Prayer Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, =?UTF-8?q?Ma=C3=ADra=20Canal?= Subject: [PATCH v2 1/7] drm/sched: Avoid memory leaks with cancel_job() callback Date: Mon, 7 Jul 2025 15:42:14 +0200 Message-ID: <20250707134221.34291-3-phasta@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250707134221.34291-2-phasta@kernel.org> References: <20250707134221.34291-2-phasta@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Since its inception, the GPU scheduler has been able to leak memory if the driver calls drm_sched_fini() while there are still jobs in flight. The simplest way to solve this in a backwards-compatible manner is by adding a new callback, drm_sched_backend_ops.cancel_job(), which instructs the driver to signal the hardware fence associated with the job. Afterwards, the scheduler can safely use the established free_job() callback for freeing the job. Implement the new backend_ops callback cancel_job().
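As a rough illustration of the teardown order described above, the following userspace C sketch models a pending list whose jobs are first canceled and then freed. Every name in it (model_sched, model_job, model_ops, model_demo) is made up for this example and is not a kernel API; the actual scheduler code uses list_for_each_entry_safe_reverse() and the dma_fence helpers, and the plain flag and error field here merely stand in for a hardware fence.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_sched;

/* Hypothetical stand-in for a job with an embedded hardware fence. */
struct model_job {
	struct model_sched *sched;
	struct model_job *next;     /* singly linked pending list */
	bool fence_signaled;
	int fence_error;
};

/* Hypothetical stand-in for the backend ops; cancel_job may be NULL. */
struct model_ops {
	void (*free_job)(struct model_job *job);
	void (*cancel_job)(struct model_job *job);
};

struct model_sched {
	const struct model_ops *ops;
	struct model_job *pending;
	int canceled_and_freed;     /* bookkeeping for the demo only */
};

/* Signal the fence with an error, but do not free the job. */
static void model_cancel_job(struct model_job *job)
{
	if (!job->fence_signaled) {
		job->fence_error = -ECANCELED;
		job->fence_signaled = true;
	}
}

/* Free the job; count it if its fence was canceled beforehand. */
static void model_free_job(struct model_job *job)
{
	if (job->fence_signaled && job->fence_error == -ECANCELED)
		job->sched->canceled_and_freed++;
	free(job);
}

static const struct model_ops model_ops_with_cancel = {
	.free_job = model_free_job,
	.cancel_job = model_cancel_job,
};

/* Teardown: cancel, unlink, then free each pending job. */
static void model_sched_fini(struct model_sched *sched)
{
	if (!sched->ops->cancel_job)
		return; /* without the callback, pending jobs would leak */

	while (sched->pending) {
		struct model_job *job = sched->pending;

		sched->ops->cancel_job(job);
		sched->pending = job->next;
		sched->ops->free_job(job);
	}
}

static int model_submit(struct model_sched *sched)
{
	struct model_job *job = calloc(1, sizeof(*job));

	if (!job)
		return -ENOMEM;
	job->sched = sched;
	job->next = sched->pending;
	sched->pending = job;
	return 0;
}

/* Submit njobs jobs, tear down, and report how many were canceled and freed. */
int model_demo(int njobs)
{
	struct model_sched sched = { .ops = &model_ops_with_cancel };
	int i;

	for (i = 0; i < njobs; i++)
		if (model_submit(&sched))
			break;

	model_sched_fini(&sched);
	assert(sched.pending == NULL);
	return sched.canceled_and_freed;
}
```

In this model, every job that is still pending at teardown passes through cancel_job before free_job, so the free path never sees an unsignaled fence. If the ops table lacks cancel_job, model_sched_fini() leaves the list untouched, which corresponds to the leak this patch is meant to avoid.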
Suggested-by: Tvrtko Ursulin Link: https://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursul= in@igalia.com/ Signed-off-by: Philipp Stanner Reviewed-by: Ma=C3=ADra Canal --- drivers/gpu/drm/scheduler/sched_main.c | 34 ++++++++++++++++---------- include/drm/gpu_scheduler.h | 18 ++++++++++++++ 2 files changed, 39 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/sched= uler/sched_main.c index c63543132f9d..1239954f5f7c 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1353,6 +1353,18 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, = const struct drm_sched_init_ } EXPORT_SYMBOL(drm_sched_init); =20 +static void drm_sched_cancel_remaining_jobs(struct drm_gpu_scheduler *sche= d) +{ + struct drm_sched_job *job, *tmp; + + /* All other accessors are stopped. No locking necessary. */ + list_for_each_entry_safe_reverse(job, tmp, &sched->pending_list, list) { + sched->ops->cancel_job(job); + list_del(&job->list); + sched->ops->free_job(job); + } +} + /** * drm_sched_fini - Destroy a gpu scheduler * @@ -1360,19 +1372,11 @@ EXPORT_SYMBOL(drm_sched_init); * * Tears down and cleans up the scheduler. * - * This stops submission of new jobs to the hardware through - * drm_sched_backend_ops.run_job(). Consequently, drm_sched_backend_ops.fr= ee_job() - * will not be called for all jobs still in drm_gpu_scheduler.pending_list. - * There is no solution for this currently. Thus, it is up to the driver t= o make - * sure that: - * - * a) drm_sched_fini() is only called after for all submitted jobs - * drm_sched_backend_ops.free_job() has been called or that - * b) the jobs for which drm_sched_backend_ops.free_job() has not been ca= lled - * after drm_sched_fini() ran are freed manually. - * - * FIXME: Take care of the above problem and prevent this function from le= aking - * the jobs in drm_gpu_scheduler.pending_list under any circumstances. 
+ * This stops submission of new jobs to the hardware through &struct + * drm_sched_backend_ops.run_job. If &struct drm_sched_backend_ops.cancel_= job + * is implemented, all jobs will be canceled through it and afterwards cle= aned + * up through &struct drm_sched_backend_ops.free_job. If cancel_job is not + * implemented, memory could leak. */ void drm_sched_fini(struct drm_gpu_scheduler *sched) { @@ -1402,6 +1406,10 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) /* Confirm no work left behind accessing device structures */ cancel_delayed_work_sync(&sched->work_tdr); =20 + /* Avoid memory leaks if supported by the driver. */ + if (sched->ops->cancel_job) + drm_sched_cancel_remaining_jobs(sched); + if (sched->own_submit_wq) destroy_workqueue(sched->submit_wq); sched->ready =3D false; diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index e62a7214e052..190844370f48 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -512,6 +512,24 @@ struct drm_sched_backend_ops { * and it's time to clean it up. */ void (*free_job)(struct drm_sched_job *sched_job); + + /** + * @cancel_job: Used by the scheduler to guarantee remaining jobs' fences + * get signaled in drm_sched_fini(). + * + * Used by the scheduler to cancel all jobs that have not been executed + * with &struct drm_sched_backend_ops.run_job by the time + * drm_sched_fini() gets invoked. + * + * Drivers need to signal the passed job's hardware fence with an + * appropriate error code (e.g., -ECANCELED) in this callback. They + * must not free the job. + * + * The scheduler will only call this callback once it stopped calling + * all other callbacks forever, with the exception of &struct + * drm_sched_backend_ops.free_job. 
+ */ + void (*cancel_job)(struct drm_sched_job *sched_job); }; =20 /** --=20 2.49.0 From nobody Tue Oct 7 19:24:15 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C531277028; Mon, 7 Jul 2025 13:42:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895759; cv=none; b=FyT5BWRgcRiH3Xse6uh8aibNVUNjm6iJxZAPNBVE/+DJPjeVlkA+5cgbZERbW+1v0/0ko+P23146lr8zYGLZiCunLRJUSdhZYQRAfBdLaz/i+ZtrNoe11HO3SSICB6uL3ok3NCE7VZjgDb5XSa4tjLiaaYiZLHjiZPKV64G2dKw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895759; c=relaxed/simple; bh=Sgl1fbx3s/Q+rdymOYlAFyb0iIci+VpWW4JgdhHhMxs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=AhqeMJBnoMD0O0Gp4GYR83CwEQX846JX/dl1pC/trWJ529LCYo7G7XE0UomDTb8+H4l7mhwJH6lChymIn9kY8CTTAytck00LCjKG63CMqo3pcllRZ+1CIX8zG5CpEcTFqrA11xsI+TdCV4AmYlHM6ul/tGsJMg+C6vMV/Enz2mY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=lywvtWQ/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lywvtWQ/" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 808CAC4CEF7; Mon, 7 Jul 2025 13:42:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1751895759; bh=Sgl1fbx3s/Q+rdymOYlAFyb0iIci+VpWW4JgdhHhMxs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lywvtWQ/CCfrXQyOmnnnhVJaDD75fJW9DkdBFPYQ1H7lNFejHhcdDXFezF6C6v0GC uSn4wGUNfG/gGEDhCuvufgKB3bbcbA2BpK0G+vOMX8DztBtOtebrNeLH3mW5Irjgt8 
giXlcEmigKSfZ+oxNP2ztl73pBQpaQ9HOs7xoA3iMJTE1iTM+KjU71AuDDkPdQ8owI vNMBVj5/LXpaVtzdVQxeEsjmZb2oDgEFhdkIovjwNRF28oAIuORNNpgZ+cfAprfqRe 1gSBFwhNSJJ/nUXr200peua9veNIq86RpmTf3DBJAfO/C4cMDNW+iERD+xXESaalBm 86x9RiWJ1R5BA== From: Philipp Stanner To: Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Matthew Brost , Philipp Stanner , =?UTF-8?q?Christian=20K=C3=B6nig?= , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , Sumit Semwal , Tvrtko Ursulin , Pierre-Eric Pelloux-Prayer Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org Subject: [PATCH v2 2/7] drm/sched/tests: Implement cancel_job() callback Date: Mon, 7 Jul 2025 15:42:15 +0200 Message-ID: <20250707134221.34291-4-phasta@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250707134221.34291-2-phasta@kernel.org> References: <20250707134221.34291-2-phasta@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The GPU Scheduler now supports a new callback, cancel_job(), which lets the scheduler cancel all jobs which might not yet be freed when drm_sched_fini() runs. Using this callback allows for significantly simplifying the mock scheduler teardown code. Implement the cancel_job() callback and adjust the code where necessary. 
Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 66 +++++++------------ 1 file changed, 23 insertions(+), 43 deletions(-) diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu= /drm/scheduler/tests/mock_scheduler.c index 49d067fecd67..2d3169d95200 100644 --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c @@ -63,7 +63,7 @@ static void drm_mock_sched_job_complete(struct drm_mock_s= ched_job *job) lockdep_assert_held(&sched->lock); =20 job->flags |=3D DRM_MOCK_SCHED_JOB_DONE; - list_move_tail(&job->link, &sched->done_list); + list_del(&job->link); dma_fence_signal_locked(&job->hw_fence); complete(&job->done); } @@ -236,26 +236,39 @@ mock_sched_timedout_job(struct drm_sched_job *sched_j= ob) =20 static void mock_sched_free_job(struct drm_sched_job *sched_job) { - struct drm_mock_scheduler *sched =3D - drm_sched_to_mock_sched(sched_job->sched); struct drm_mock_sched_job *job =3D drm_sched_job_to_mock_job(sched_job); - unsigned long flags; =20 - /* Remove from the scheduler done list. */ - spin_lock_irqsave(&sched->lock, flags); - list_del(&job->link); - spin_unlock_irqrestore(&sched->lock, flags); dma_fence_put(&job->hw_fence); - drm_sched_job_cleanup(sched_job); =20 /* Mock job itself is freed by the kunit framework. */ } =20 +static void mock_sched_cancel_job(struct drm_sched_job *sched_job) +{ + struct drm_mock_scheduler *sched =3D drm_sched_to_mock_sched(sched_job->s= ched); + struct drm_mock_sched_job *job =3D drm_sched_job_to_mock_job(sched_job); + unsigned long flags; + + hrtimer_cancel(&job->timer); + + spin_lock_irqsave(&sched->lock, flags); + if (!dma_fence_is_signaled_locked(&job->hw_fence)) { + list_del(&job->link); + dma_fence_set_error(&job->hw_fence, -ECANCELED); + dma_fence_signal_locked(&job->hw_fence); + } + spin_unlock_irqrestore(&sched->lock, flags); + + /* The GPU Scheduler will call drm_sched_backend_ops.free_job(), still. 
+ * Mock job itself is freed by the kunit framework. */ +} + static const struct drm_sched_backend_ops drm_mock_scheduler_ops =3D { .run_job =3D mock_sched_run_job, .timedout_job =3D mock_sched_timedout_job, - .free_job =3D mock_sched_free_job + .free_job =3D mock_sched_free_job, + .cancel_job =3D mock_sched_cancel_job, }; =20 /** @@ -289,7 +302,6 @@ struct drm_mock_scheduler *drm_mock_sched_new(struct ku= nit *test, long timeout) sched->hw_timeline.context =3D dma_fence_context_alloc(1); atomic_set(&sched->hw_timeline.next_seqno, 0); INIT_LIST_HEAD(&sched->job_list); - INIT_LIST_HEAD(&sched->done_list); spin_lock_init(&sched->lock); =20 return sched; @@ -304,38 +316,6 @@ struct drm_mock_scheduler *drm_mock_sched_new(struct k= unit *test, long timeout) */ void drm_mock_sched_fini(struct drm_mock_scheduler *sched) { - struct drm_mock_sched_job *job, *next; - unsigned long flags; - LIST_HEAD(list); - - drm_sched_wqueue_stop(&sched->base); - - /* Force complete all unfinished jobs. */ - spin_lock_irqsave(&sched->lock, flags); - list_for_each_entry_safe(job, next, &sched->job_list, link) - list_move_tail(&job->link, &list); - spin_unlock_irqrestore(&sched->lock, flags); - - list_for_each_entry(job, &list, link) - hrtimer_cancel(&job->timer); - - spin_lock_irqsave(&sched->lock, flags); - list_for_each_entry_safe(job, next, &list, link) - drm_mock_sched_job_complete(job); - spin_unlock_irqrestore(&sched->lock, flags); - - /* - * Free completed jobs and jobs not yet processed by the DRM scheduler - * free worker. 
- */ - spin_lock_irqsave(&sched->lock, flags); - list_for_each_entry_safe(job, next, &sched->done_list, link) - list_move_tail(&job->link, &list); - spin_unlock_irqrestore(&sched->lock, flags); - - list_for_each_entry_safe(job, next, &list, link) - mock_sched_free_job(&job->base); - drm_sched_fini(&sched->base); } =20 --=20 2.49.0 From nobody Tue Oct 7 19:24:15 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2CC672BCF4A; Mon, 7 Jul 2025 13:42:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895764; cv=none; b=RhMol7/g2fUDhJq8yxiZYFo3EO5Ts0RaDwM8sgOdSALqAlnxXhc+flwv02bFo94a8Qg65RH12l8+bZYFN7tkllS1bcNNu6XGidVSUgxrQi2Qz0cueGHhl1u0VJv1z6l+B7dImLownTKjzUDqaLHzv2La9c8lijnTGNr80PYBrQw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895764; c=relaxed/simple; bh=Q7QXEZRxCNICRO2IlbnIh26kG/g8UZeXIk3BFaIKQJk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OAzA3Hlse1nxiAAo3uRtiWfLQ5KwAYrSFICZSEGYgQoemw8onXs1affgGh5vwEiojiC7VtcE0V0+moCaj15o8sX6mEX5ASd0T+uh/X7dIOj5RG7s/5vJjJtc6LiL31heNCeOUxrHJixTNPYp/gj910Iy7Eu4XdVzf1UocG3nTNg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=PKws/67E; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="PKws/67E" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E3428C4CEE3; Mon, 7 Jul 2025 13:42:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1751895763; 
bh=Q7QXEZRxCNICRO2IlbnIh26kG/g8UZeXIk3BFaIKQJk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PKws/67ExSqQWUmERDOLccJ/K7eEsvAYaqK3dnWLk0pIuiP55W7JTeMv9/4iQDjCI 6UNn2RQ6BMLnul54rD1RL8Jv4NeigznMb0x3Azsd4QzPDnUS/6KvFPQrib6rFHn3bK uW8aC7fN5CtpWYlHXbgmMio2SuZ/ryD0B98q/FUiOyG8VewJKThlo8+oPqXIuSZIWo ImmfdX1w93AYC/pUUNdgLgIS3IjGKq1l0f6ENMnbOqbec16kl+DawXWL/V7OAq4nwU WjDnw3kUnKNI7Omr5ZfHubOKbL5ajywtA7zA76ax3NgPVOgV4FC5paLaJEap5nXZf3 VjVW7cqLcnjGw== From: Philipp Stanner To: Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Matthew Brost , Philipp Stanner , =?UTF-8?q?Christian=20K=C3=B6nig?= , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , Sumit Semwal , Tvrtko Ursulin , Pierre-Eric Pelloux-Prayer Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org Subject: [PATCH v2 3/7] drm/sched/tests: Add unit test for cancel_job() Date: Mon, 7 Jul 2025 15:42:16 +0200 Message-ID: <20250707134221.34291-5-phasta@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250707134221.34291-2-phasta@kernel.org> References: <20250707134221.34291-2-phasta@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The scheduler unit tests now provide a new callback, cancel_job(). This callback gets used by drm_sched_fini() for all still pending jobs to cancel them. Implement a new unit test to test this. 
Signed-off-by: Philipp Stanner Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/scheduler/tests/tests_basic.c | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/dr= m/scheduler/tests/tests_basic.c index 7230057e0594..fa3da2db4893 100644 --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c @@ -204,6 +204,48 @@ static struct kunit_suite drm_sched_basic =3D { .test_cases =3D drm_sched_basic_tests, }; =20 +static void drm_sched_basic_cancel(struct kunit *test) +{ + struct drm_mock_sched_entity *entity; + struct drm_mock_scheduler *sched; + struct drm_mock_sched_job *job; + bool done; + + /* + * Check that drm_sched_fini() cancels jobs that are still pending. + */ + + sched =3D drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT); + sched->base.credit_limit =3D 1; + + entity =3D drm_mock_sched_entity_new(test, DRM_SCHED_PRIORITY_NORMAL, + sched); + + job =3D drm_mock_sched_job_new(test, entity); + + drm_mock_sched_job_submit(job); + + done =3D drm_mock_sched_job_wait_scheduled(job, HZ); + KUNIT_ASSERT_TRUE(test, done); + + drm_mock_sched_entity_free(entity); + drm_mock_sched_fini(sched); + + KUNIT_ASSERT_EQ(test, job->hw_fence.error, -ECANCELED); +} + +static struct kunit_case drm_sched_cancel_tests[] =3D { + KUNIT_CASE(drm_sched_basic_cancel), + {} +}; + +static struct kunit_suite drm_sched_cancel =3D { + .name =3D "drm_sched_basic_cancel_tests", + .init =3D drm_sched_basic_init, + .exit =3D drm_sched_basic_exit, + .test_cases =3D drm_sched_cancel_tests, +}; + static void drm_sched_basic_timeout(struct kunit *test) { struct drm_mock_scheduler *sched =3D test->priv; @@ -471,6 +513,7 @@ static struct kunit_suite drm_sched_credits =3D { =20 kunit_test_suites(&drm_sched_basic, &drm_sched_timeout, + &drm_sched_cancel, &drm_sched_priority, &drm_sched_modify_sched, &drm_sched_credits); --=20 2.49.0 From nobody Tue Oct 7 19:24:15 2025 Received: from smtp.kernel.org
(aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 68DF72BD58F; Mon, 7 Jul 2025 13:42:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895768; cv=none; b=hL5iziVIMIj6ldqIkgtbhhEIu5Q3dGxe9ceX4gh7ptTpkYPHDHdP2C26GkOojBLK8p+BRQMwD6MFzErtNF4Lzg+h8fbSikLoxm0iFNdR3EuMWc61feFpAKsosY82KeJhjVSzSntyQTDjMugg7mYaxP3ZLQ2mru2yuZ7a8Q7YRh8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895768; c=relaxed/simple; bh=UQqJCi3q3248PmPaymeTCZH4g0d/zd1eLvZ5EVok/aw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kSJVb4WVdNYDl16A2+bZgRvLsKvpCS5eiXeCvNHD6T6BJT120Ud5boqG/TUYeJGNYnoZ3PuKOW049f+5r9YGZPDLQl1w4FB7uruHxNRaU81JfSj1VQ9MtnteTXREHGZthI1MCW/Vto5wedTMaWMjy/6nuP2ysr4dA5bsFpTHgW0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=sKaWRNw6; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="sKaWRNw6" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1CE9AC4CEF4; Mon, 7 Jul 2025 13:42:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1751895768; bh=UQqJCi3q3248PmPaymeTCZH4g0d/zd1eLvZ5EVok/aw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=sKaWRNw6rzetS4hgXXzsihe0ACF22tZm2IJXFpvG5XItP0bHQJTC2obbzQMYHd3me XYMlaPiCZdaaOsdoMVInJSQuMggkxzcU9/XL1vhfIZ47c9z8QIbMt2DyOrC2yOq+bL 7dpBu1pFR27MUxJR5C9/PogpOphu6U+dYqpa0eRZ/HLvICpiSi+H36hbPfgXnZCTCR vB3IS3ceH4NKjQKqQdt4I2NLP9LPGxPc43x8E5n/h1I8w7mngG24TkqjJtLkCasyOT BbmbKOgVObQc4KbXmRzi3poIjr+EcgOQwh+Mi+X1UhPOb/kKivMG2XuiTjuNeqXvs5 
aQyDFs4mrgOWA== From: Philipp Stanner To: Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Matthew Brost , Philipp Stanner , =?UTF-8?q?Christian=20K=C3=B6nig?= , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , Sumit Semwal , Tvrtko Ursulin , Pierre-Eric Pelloux-Prayer Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org Subject: [PATCH v2 4/7] drm/sched: Warn if pending list is not empty Date: Mon, 7 Jul 2025 15:42:17 +0200 Message-ID: <20250707134221.34291-6-phasta@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250707134221.34291-2-phasta@kernel.org> References: <20250707134221.34291-2-phasta@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" drm_sched_fini() can still leak jobs if the driver does not implement drm_sched_backend_ops.cancel_job() and tears the scheduler down while jobs remain in the pending list. Warn if that happens.
Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/sched= uler/sched_main.c index 1239954f5f7c..dadf1a22ddf6 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1415,6 +1415,9 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) sched->ready =3D false; kfree(sched->sched_rq); sched->sched_rq =3D NULL; + + if (!list_empty(&sched->pending_list)) + dev_err(sched->dev, "Tearing down scheduler while jobs are pending!\n"); } EXPORT_SYMBOL(drm_sched_fini); =20 --=20 2.49.0 From nobody Tue Oct 7 19:24:15 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C1C662BE05A; Mon, 7 Jul 2025 13:42:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895772; cv=none; b=QiMWWt8j5NM41jsS4aDVb6n4yWh6qmjVr+nwIK30ELLxdUm9ZdxTb+1Z9wjU8hVe35GG6yTqAsb9uirfmwp6LU0RsjgmJj5YxuJi207/Muu9/iCsOhA9jSkvTlEbBbL+Q4JZdteWBzq9G60fA4w46Jo992v+J80sbRccuBxA5mU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895772; c=relaxed/simple; bh=QPT51nD7e7AHwZ27sJjHwy+UmpkZSyFDrE7MKORACwU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=B5iZ2nk+4kuJPAKUlmfc6POOinop6JdM0QHQPSX2QKVz/nEY+9BHZNUtz6X4fFPNaXrA/4Vf1ggqRAYCo88INfwLJTAjnJZ0v+Hq3wrp9ZPdci0zqKXTjMajYkm+fj1XhRk8178H+6CVS//46T2sI7mQinW0KbzXz/+eJRTUTVM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=aydjUyN/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; 
dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="aydjUyN/" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 81E25C4CEE3; Mon, 7 Jul 2025 13:42:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1751895772; bh=QPT51nD7e7AHwZ27sJjHwy+UmpkZSyFDrE7MKORACwU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=aydjUyN/A760J3oT4NQShTZLe4+vNdMYHm0mMfV5EPzWH78zRrt3nK8KXYk/B8TNB dI8VEBa6mIlmqSW/Xx7jyC8eRFkQN5wW99NLgYnIXTPBSLhWfwlP/zk0ke2BIAoUTi qbFa3qmH7YrTFes/rSGWb5EzD1ZoAIoiiwQCxQaRtWeoOhnzd+5QjiahfO2+86WT30 rxqRwDjp6OUTdgkQJ3AtLqVRX21vjZjZfJyXXohxbIYOTrmxamzhvLRqCfXnDMMUbz fLRNMO46JSMg00fJRzh12565lK/7Bn3pO94EO9XLTMZ+kcOopix2gWTzo1I+OKChMw RSxKvJ0FBnP9Q== From: Philipp Stanner To: Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Matthew Brost , Philipp Stanner , =?UTF-8?q?Christian=20K=C3=B6nig?= , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , Sumit Semwal , Tvrtko Ursulin , Pierre-Eric Pelloux-Prayer Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org Subject: [PATCH v2 5/7] drm/nouveau: Make fence container helper usable driver-wide Date: Mon, 7 Jul 2025 15:42:18 +0200 Message-ID: <20250707134221.34291-7-phasta@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250707134221.34291-2-phasta@kernel.org> References: <20250707134221.34291-2-phasta@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In order to implement a new DRM GPU scheduler callback in Nouveau, a helper for obtaining a nouveau_fence from a dma_fence is necessary. Such a helper exists already inside nouveau_fence.c, called from_fence(). Make that helper available to other C files with a more precise name. 
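For readers unfamiliar with the pattern, such a helper is a thin wrapper around container_of(): given a pointer to an embedded member, it recovers the enclosing structure. The userspace sketch below re-creates the idea with offsetof(). The names demo_container_of, demo_base_fence and demo_driver_fence are hypothetical stand-ins chosen for this example; they are not the kernel's or Nouveau's definitions, and the member layout is an assumption made only to show that the offset is taken into account.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic fence, playing the role of the embedded base object. */
struct demo_base_fence {
	int seqno;
};

/* Hypothetical driver fence; base is deliberately not the first member. */
struct demo_driver_fence {
	unsigned long timeout;
	struct demo_base_fence base;
};

/* Recover the driver fence from a pointer to its embedded base fence. */
static inline struct demo_driver_fence *
to_demo_driver_fence(struct demo_base_fence *fence)
{
	return demo_container_of(fence, struct demo_driver_fence, base);
}

/* Returns 1 if the round trip base -> container yields the original object. */
int demo_roundtrip(void)
{
	struct demo_driver_fence f = { .timeout = 42 };

	return to_demo_driver_fence(&f.base) == &f;
}
```

Placing such a helper in a header as a static inline function lets several C files share it without an exported symbol, which is the approach this patch takes.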
Signed-off-by: Philipp Stanner Acked-by: Danilo Krummrich --- drivers/gpu/drm/nouveau/nouveau_fence.c | 20 +++++++------------- drivers/gpu/drm/nouveau/nouveau_fence.h | 6 ++++++ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouv= eau/nouveau_fence.c index d5654e26d5bc..869d4335c0f4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -38,12 +38,6 @@ static const struct dma_fence_ops nouveau_fence_ops_uevent; static const struct dma_fence_ops nouveau_fence_ops_legacy; =20 -static inline struct nouveau_fence * -from_fence(struct dma_fence *fence) -{ - return container_of(fence, struct nouveau_fence, base); -} - static inline struct nouveau_fence_chan * nouveau_fctx(struct nouveau_fence *fence) { @@ -77,7 +71,7 @@ nouveau_local_fence(struct dma_fence *fence, struct nouve= au_drm *drm) fence->ops !=3D &nouveau_fence_ops_uevent) return NULL; =20 - return from_fence(fence); + return to_nouveau_fence(fence); } =20 void @@ -268,7 +262,7 @@ nouveau_fence_done(struct nouveau_fence *fence) static long nouveau_fence_wait_legacy(struct dma_fence *f, bool intr, long wait) { - struct nouveau_fence *fence =3D from_fence(f); + struct nouveau_fence *fence =3D to_nouveau_fence(f); unsigned long sleep_time =3D NSEC_PER_MSEC / 1000; unsigned long t =3D jiffies, timeout =3D t + wait; =20 @@ -448,7 +442,7 @@ static const char *nouveau_fence_get_get_driver_name(st= ruct dma_fence *fence) =20 static const char *nouveau_fence_get_timeline_name(struct dma_fence *f) { - struct nouveau_fence *fence =3D from_fence(f); + struct nouveau_fence *fence =3D to_nouveau_fence(f); struct nouveau_fence_chan *fctx =3D nouveau_fctx(fence); =20 return !fctx->dead ? 
fctx->name : "dead channel"; @@ -462,7 +456,7 @@ static const char *nouveau_fence_get_timeline_name(stru= ct dma_fence *f) */ static bool nouveau_fence_is_signaled(struct dma_fence *f) { - struct nouveau_fence *fence =3D from_fence(f); + struct nouveau_fence *fence =3D to_nouveau_fence(f); struct nouveau_fence_chan *fctx =3D nouveau_fctx(fence); struct nouveau_channel *chan; bool ret =3D false; @@ -478,7 +472,7 @@ static bool nouveau_fence_is_signaled(struct dma_fence = *f) =20 static bool nouveau_fence_no_signaling(struct dma_fence *f) { - struct nouveau_fence *fence =3D from_fence(f); + struct nouveau_fence *fence =3D to_nouveau_fence(f); =20 /* * caller should have a reference on the fence, @@ -503,7 +497,7 @@ static bool nouveau_fence_no_signaling(struct dma_fence= *f) =20 static void nouveau_fence_release(struct dma_fence *f) { - struct nouveau_fence *fence =3D from_fence(f); + struct nouveau_fence *fence =3D to_nouveau_fence(f); struct nouveau_fence_chan *fctx =3D nouveau_fctx(fence); =20 kref_put(&fctx->fence_ref, nouveau_fence_context_put); @@ -521,7 +515,7 @@ static const struct dma_fence_ops nouveau_fence_ops_leg= acy =3D { =20 static bool nouveau_fence_enable_signaling(struct dma_fence *f) { - struct nouveau_fence *fence =3D from_fence(f); + struct nouveau_fence *fence =3D to_nouveau_fence(f); struct nouveau_fence_chan *fctx =3D nouveau_fctx(fence); bool ret; =20 diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouv= eau/nouveau_fence.h index 6a983dd9f7b9..183dd43ecfff 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.h +++ b/drivers/gpu/drm/nouveau/nouveau_fence.h @@ -17,6 +17,12 @@ struct nouveau_fence { unsigned long timeout; }; =20 +static inline struct nouveau_fence * +to_nouveau_fence(struct dma_fence *fence) +{ + return container_of(fence, struct nouveau_fence, base); +} + int nouveau_fence_create(struct nouveau_fence **, struct nouveau_channel = *); int nouveau_fence_new(struct nouveau_fence **, struct nouveau_channel *); 
void nouveau_fence_unref(struct nouveau_fence **); --=20 2.49.0 From nobody Tue Oct 7 19:24:15 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 223D44594A; Mon, 7 Jul 2025 13:42:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895777; cv=none; b=IZlepTvVn6xk1ztfqF0knprgV/5VO53rKhh9wKmzwNNmJDYcDJHOdEd1/gK1UKsPncA+MjhQhQ7vEzGzsqTXtFigSs9/bqgkIsSNIfabGjDvLYRpBMlIr09bwtryK6u1xKGVrNvhjqNh2ruY13TL/jAkvMgAZ8Vi+/MiSApfwjo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751895777; c=relaxed/simple; bh=YPEvwsrhEYilovZE/FzKLWSDVL/XkoOF8+zmDx8isVw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JFFrchjYmxzfcnZaaGpeNsL7lsd5GzlgBHFjq0Q31xn8hvKfZRgUU57+6Y/SEUO46j+U6Wd0dkLqGcZiI9VfCmKNatjVRcz2C9MC44PlYEx+R78axb8s9k/xVBEE45/RmlV24NpvRZ30GYMAeEbbr4RQCsaSuRs0My7IXR77PaU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=r6XocvBv; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="r6XocvBv" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CB5BDC4CEF4; Mon, 7 Jul 2025 13:42:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1751895776; bh=YPEvwsrhEYilovZE/FzKLWSDVL/XkoOF8+zmDx8isVw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=r6XocvBvqNUB5/9tCW7d8fPhNJFKSQMo4bgJIqkJp2LyWU6RQlPwZf31Jb+UN0OBu lNUqkl9Y/oUiJZ930NUmN4vKajMjqHq5sYyK8r/7U6LN9ceo90L8M6JrsxkDQi+J5u mgDJOTQFkRfgcWbAzn6XEfR5rliAly+leHYK5BW6E5bvIsus1LQzKwkxN/A1II/K1L 
	kejRTDxLnQIiA/b314drqx6M5Kqn+fJH7uCvygv3Kmg8Y/a2hXJg39Ky9H2d3BmOmf
	oyYY2uWyTNFtTRCtAw0oBeB7OC6m4QCDsbUrS81G7712SxsQkxZq7R74S8WGlqy1P5
	hasr6upNTcCng==
From: Philipp Stanner
To: Lyude Paul,
	Danilo Krummrich,
	David Airlie,
	Simona Vetter,
	Matthew Brost,
	Philipp Stanner,
	=?UTF-8?q?Christian=20K=C3=B6nig?=,
	Maarten Lankhorst,
	Maxime Ripard,
	Thomas Zimmermann,
	Sumit Semwal,
	Tvrtko Ursulin,
	Pierre-Eric Pelloux-Prayer
Cc: dri-devel@lists.freedesktop.org,
	nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org
Subject: [PATCH v2 6/7] drm/nouveau: Add new callback for scheduler teardown
Date: Mon, 7 Jul 2025 15:42:19 +0200
Message-ID: <20250707134221.34291-8-phasta@kernel.org>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250707134221.34291-2-phasta@kernel.org>
References: <20250707134221.34291-2-phasta@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

There is a new callback for always tearing the scheduler down in a
leak-free, deadlock-free manner.

Port Nouveau as its first user by providing the scheduler with a
callback that ensures the fence context gets killed in drm_sched_fini().

Signed-off-by: Philipp Stanner
Acked-by: Danilo Krummrich
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 15 +++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_fence.h |  1 +
 drivers/gpu/drm/nouveau/nouveau_sched.c | 15 ++++++++++++++-
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouv=
eau/nouveau_fence.c
index 869d4335c0f4..9f345a008717 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -240,6 +240,21 @@ nouveau_fence_emit(struct nouveau_fence *fence)
 	return ret;
 }
=20
+void
+nouveau_fence_cancel(struct nouveau_fence *fence)
+{
+	struct nouveau_fence_chan *fctx =3D nouveau_fctx(fence);
+	unsigned long flags;
+
+	spin_lock_irqsave(&fctx->lock, flags);
+	if (!dma_fence_is_signaled_locked(&fence->base)) {
+		dma_fence_set_error(&fence->base, -ECANCELED);
+		if (nouveau_fence_signal(fence))
+			nvif_event_block(&fctx->event);
+	}
+	spin_unlock_irqrestore(&fctx->lock, flags);
+}
+
 bool
 nouveau_fence_done(struct nouveau_fence *fence)
 {
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouv=
eau/nouveau_fence.h
index 183dd43ecfff..9957a919bd38 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -29,6 +29,7 @@ void nouveau_fence_unref(struct nouveau_fence **);
=20
 int nouveau_fence_emit(struct nouveau_fence *);
 bool nouveau_fence_done(struct nouveau_fence *);
+void nouveau_fence_cancel(struct nouveau_fence *fence);
 int nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
 int nouveau_fence_sync(struct nouveau_bo *, struct nouveau_channel *, boo=
l exclusive, bool intr);
=20
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouv=
eau/nouveau_sched.c
index 460a5fb02412..2ec62059c351 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -11,6 +11,7 @@
 #include "nouveau_exec.h"
 #include "nouveau_abi16.h"
 #include "nouveau_sched.h"
+#include "nouveau_chan.h"
=20
 #define NOUVEAU_SCHED_JOB_TIMEOUT_MS 10000
=20
@@ -393,10 +394,23 @@ nouveau_sched_free_job(struct drm_sched_job *sched_jo=
b)
 	nouveau_job_fini(job);
 }
=20
+static void
+nouveau_sched_cancel_job(struct drm_sched_job *sched_job)
+{
+	struct nouveau_fence *fence;
+	struct nouveau_job *job;
+
+	job =3D to_nouveau_job(sched_job);
+	fence =3D to_nouveau_fence(job->done_fence);
+
+	nouveau_fence_cancel(fence);
+}
+
 static const struct drm_sched_backend_ops nouveau_sched_ops =3D {
 	.run_job =3D nouveau_sched_run_job,
 	.timedout_job =3D nouveau_sched_timedout_job,
 	.free_job =3D nouveau_sched_free_job,
+	.cancel_job =3D nouveau_sched_cancel_job,
 };
=20
 static int
@@ -482,7 +496,6 @@ nouveau_sched_create(struct nouveau_sched **psched, str=
uct nouveau_drm *drm,
 	return 0;
 }
=20
-
 static void
 nouveau_sched_fini(struct nouveau_sched *sched)
 {
--=20
2.49.0

From nobody Tue Oct 7 19:24:15 2025
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 83D992BEFF5;
	Mon, 7 Jul 2025 13:43:01 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1751895781; cv=none;
	b=cZldyAckqF5XH05tyT++X3kqBCcN0so+Bg+nBpVSzJdJ7PNUoys9Lq3HidySp4mK2HPUTj6cSSCnRVgU0ZvQvZkgGINKJ1UgwJx/V9hXT70iXPtPe9aAvWEXozjjLPidpQ9OEehjLUjpuwi7y+y+JerMuwI8NDEqLY0yXt1l5Jo=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1751895781; c=relaxed/simple;
	bh=kFeQuslhr/3hnNLg6rE8C4RDktELrENqA7ibJjiDglQ=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	MIME-Version;
	b=hB45jdSGRLNrWR9g0Rvi6Zn7NREc93eIynS1EdmQeSaIAtrTJy6pQSY0V7G0rYqZvM3v+HV+tnuBVCVHae1PTVzW+aeHmfPqWM0RV9oO+nru3BwyBEzD4NZlEfLIFsAJrtGsGCi8BWZVr4/lsqQLvNpNis+vvg7SNjn/cHuxJbI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=FCiDJQO3;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="FCiDJQO3"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 22655C4CEE3;
	Mon, 7 Jul 2025 13:42:56 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1751895781;
	bh=kFeQuslhr/3hnNLg6rE8C4RDktELrENqA7ibJjiDglQ=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=FCiDJQO3TLWkMR1UB2l25UOB7Cnz92eRbGlBhrORo4bms4FxKzSIPOETrajXSVj8c
	dOOE+6vir6JSjXFB4L0reTiS8Q0A46QNGBRwmXCTdfo22CNxvcStmgWrmR/JE1e+AL
	PvwZvBp3RQakfGrCjlCQurC3T5yNmZAQv5TopTIi4d3Je62azAUwaQlLa6p3RwLhH2
	hEts6+sqxHSEHdJFY1GaSKVGtJggmzf4WXClcEMuUqHHHiGyAjGdQrMXi9KAq7UMtb
	Ve41v+KGrkbfmxxm5G+500uSA7y/8KO9tqRGHz/BBX46KJ40nP7fGRPVfeqiQaTPR7
	D7U86KksH/Mww==
From: Philipp Stanner
To: Lyude Paul,
	Danilo Krummrich,
	David Airlie,
	Simona Vetter,
	Matthew Brost,
	Philipp Stanner,
	=?UTF-8?q?Christian=20K=C3=B6nig?=,
	Maarten Lankhorst,
	Maxime Ripard,
	Thomas Zimmermann,
	Sumit Semwal,
	Tvrtko Ursulin,
	Pierre-Eric Pelloux-Prayer
Cc: dri-devel@lists.freedesktop.org,
	nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org
Subject: [PATCH v2 7/7] drm/nouveau: Remove waitque for sched teardown
Date: Mon, 7 Jul 2025 15:42:20 +0200
Message-ID: <20250707134221.34291-9-phasta@kernel.org>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250707134221.34291-2-phasta@kernel.org>
References: <20250707134221.34291-2-phasta@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

struct nouveau_sched contains a waitqueue needed to prevent
drm_sched_fini() from being called while there are still jobs pending.
Until now, doing so would have caused memory leaks.

With the new memleak-free mode of operation, switched on in
drm_sched_fini() by providing the callback nouveau_sched_cancel_job(),
the waitqueue is no longer necessary.

Remove the waitqueue.

Signed-off-by: Philipp Stanner
Acked-by: Danilo Krummrich
---
 drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++++++-------------
 drivers/gpu/drm/nouveau/nouveau_sched.h |  9 +++------
 drivers/gpu/drm/nouveau/nouveau_uvmm.c  |  8 ++++----
 3 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouv=
eau/nouveau_sched.c
index 2ec62059c351..7d9c3418e76b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -122,11 +122,9 @@ nouveau_job_done(struct nouveau_job *job)
 {
 	struct nouveau_sched *sched =3D job->sched;
=20
-	spin_lock(&sched->job.list.lock);
+	spin_lock(&sched->job_list.lock);
 	list_del(&job->entry);
-	spin_unlock(&sched->job.list.lock);
-
-	wake_up(&sched->job.wq);
+	spin_unlock(&sched->job_list.lock);
 }
=20
 void
@@ -307,9 +305,9 @@ nouveau_job_submit(struct nouveau_job *job)
 	}
=20
 	/* Submit was successful; add the job to the schedulers job list. */
-	spin_lock(&sched->job.list.lock);
-	list_add(&job->entry, &sched->job.list.head);
-	spin_unlock(&sched->job.list.lock);
+	spin_lock(&sched->job_list.lock);
+	list_add(&job->entry, &sched->job_list.head);
+	spin_unlock(&sched->job_list.lock);
=20
 	drm_sched_job_arm(&job->base);
 	job->done_fence =3D dma_fence_get(&job->base.s_fence->finished);
@@ -460,9 +458,8 @@ nouveau_sched_init(struct nouveau_sched *sched, struct =
nouveau_drm *drm,
 		goto fail_sched;
=20
 	mutex_init(&sched->mutex);
-	spin_lock_init(&sched->job.list.lock);
-	INIT_LIST_HEAD(&sched->job.list.head);
-	init_waitqueue_head(&sched->job.wq);
+	spin_lock_init(&sched->job_list.lock);
+	INIT_LIST_HEAD(&sched->job_list.head);
=20
 	return 0;
=20
@@ -502,9 +499,6 @@ nouveau_sched_fini(struct nouveau_sched *sched)
 	struct drm_gpu_scheduler *drm_sched =3D &sched->base;
 	struct drm_sched_entity *entity =3D &sched->entity;
=20
-	rmb(); /* for list_empty to work without lock */
-	wait_event(sched->job.wq, list_empty(&sched->job.list.head));
-
 	drm_sched_entity_fini(entity);
 	drm_sched_fini(drm_sched);
=20
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h b/drivers/gpu/drm/nouv=
eau/nouveau_sched.h
index 20cd1da8db73..b98c3f0bef30 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.h
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
@@ -103,12 +103,9 @@ struct nouveau_sched {
 	struct mutex mutex;
=20
 	struct {
-		struct {
-			struct list_head head;
-			spinlock_t lock;
-		} list;
-		struct wait_queue_head wq;
-	} job;
+		struct list_head head;
+		spinlock_t lock;
+	} job_list;
 };
=20
 int nouveau_sched_create(struct nouveau_sched **psched, struct nouveau_drm=
 *drm,
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouve=
au/nouveau_uvmm.c
index 48f105239f42..ddfc46bc1b3e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -1019,8 +1019,8 @@ bind_validate_map_sparse(struct nouveau_job *job, u64=
 addr, u64 range)
 	u64 end =3D addr + range;
=20
 again:
-	spin_lock(&sched->job.list.lock);
-	list_for_each_entry(__job, &sched->job.list.head, entry) {
+	spin_lock(&sched->job_list.lock);
+	list_for_each_entry(__job, &sched->job_list.head, entry) {
 		struct nouveau_uvmm_bind_job *bind_job =3D to_uvmm_bind_job(__job);
=20
 		list_for_each_op(op, &bind_job->ops) {
@@ -1030,7 +1030,7 @@ bind_validate_map_sparse(struct nouveau_job *job, u64=
 addr, u64 range)
=20
 				if (!(end <=3D op_addr || addr >=3D op_end)) {
 					nouveau_uvmm_bind_job_get(bind_job);
-					spin_unlock(&sched->job.list.lock);
+					spin_unlock(&sched->job_list.lock);
 					wait_for_completion(&bind_job->complete);
 					nouveau_uvmm_bind_job_put(bind_job);
 					goto again;
@@ -1038,7 +1038,7 @@ bind_validate_map_sparse(struct nouveau_job *job, u64=
 addr, u64 range)
 			}
 		}
 	}
-	spin_unlock(&sched->job.list.lock);
+	spin_unlock(&sched->job_list.lock);
 }
=20
 static int
--=20
2.49.0
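
For readers following the series, the teardown ordering that patches 6/7
and 7/7 rely on can be modeled in plain userspace C. This is only a sketch
under stated assumptions: every toy_* name below is hypothetical, struct
toy_fence is a stand-in and not the kernel's struct dma_fence, and there is
no locking or real hardware involved. It illustrates why a cancel_job()
callback that signals each pending fence with -ECANCELED (the same shape as
nouveau_fence_cancel() above) lets teardown free every job through
free_job() without first waiting for the job list to drain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a hardware fence; NOT the kernel's struct dma_fence. */
struct toy_fence {
	bool signaled;
	int error;
};

/* Hypothetical job: one fence plus a link in the pending list. */
struct toy_job {
	struct toy_fence fence;
	struct toy_job *next;
};

/* Hypothetical backend ops; cancel_job is optional, free_job is required. */
struct toy_backend_ops {
	void (*cancel_job)(struct toy_job *job);
	void (*free_job)(struct toy_job *job);
};

struct toy_sched {
	const struct toy_backend_ops *ops;
	struct toy_job *pending;
};

/* Same shape as nouveau_fence_cancel(): set an error, then signal. */
static void toy_cancel_job(struct toy_job *job)
{
	if (!job->fence.signaled) {
		job->fence.error = -ECANCELED;
		job->fence.signaled = true;
	}
}

static void toy_free_job(struct toy_job *job)
{
	free(job);
}

/* Queue one unsignaled job; returns 0 on success or -ENOMEM. */
static int toy_sched_submit(struct toy_sched *s)
{
	struct toy_job *job = calloc(1, sizeof(*job));

	if (!job)
		return -ENOMEM;
	job->next = s->pending;
	s->pending = job;
	return 0;
}

/*
 * Tear down: cancel each pending job if the backend provides cancel_job,
 * then free every job whose fence is signaled. Returns the number of jobs
 * that had to be left behind (leaked) because their fence never signaled.
 */
static int toy_sched_fini(struct toy_sched *s)
{
	struct toy_job **link = &s->pending;
	int leaked = 0;

	assert(s->ops && s->ops->free_job);

	while (*link) {
		struct toy_job *job = *link;

		if (s->ops->cancel_job)
			s->ops->cancel_job(job);

		if (job->fence.signaled) {
			*link = job->next;	/* unlink before freeing */
			s->ops->free_job(job);
		} else {
			leaked++;
			link = &job->next;
		}
	}
	return leaked;
}
```

With .cancel_job set, toy_sched_fini() returns 0 for any number of
submitted jobs. With .cancel_job left NULL, unsignaled jobs stay on the
list and are counted as leaked, which is the situation the removed
waitqueue used to guard against.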