[v2] drm/sched: Don't crash kernel on wrong params

[PATCH v2] drm/sched: Don't crash kernel on wrong params

Posted by Philipp Stanner 2 months, 3 weeks ago

drm_sched_job_arm() just panics the kernel with BUG_ON() in case of an
entity being NULL. If the entity is NULL, subsequent accesses will crash
the particular CPU anyways with a NULL pointer exception backtrace.

Remove the BUG_ON().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
Changes in v2:
  - Drop BUG_ON() instead of replacing it. (Tvrtko)
---
 drivers/gpu/drm/scheduler/sched_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 1d4f1b822e7b..05eb50d4cf08 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -857,7 +857,6 @@ void drm_sched_job_arm(struct drm_sched_job *job)
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_entity *entity = job->entity;
 
-	BUG_ON(!entity);
 	drm_sched_entity_select_rq(entity);
 	sched = entity->rq->sched;
 
-- 
2.49.0

Re: [PATCH v2] drm/sched: Don't crash kernel on wrong params

Posted by Tvrtko Ursulin 2 months, 3 weeks ago

On 12/11/2025 09:18, Philipp Stanner wrote:
> drm_sched_job_arm() just panics the kernel with BUG_ON() in case of an
> entity being NULL. If the entity is NULL, subsequent accesses will crash
> the particular CPU anyways with a NULL pointer exception backtrace.
> 
> Remove the BUG_ON().
> 
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ---
> Changes in v2:
>    - Drop BUG_ON() instead of replacing it. (Tvrtko)

The option of removing the BUG_ON was conditional on brainstorming a bit 
whether we think the null pointer dereference is the worst that can 
happen or not.

Other option was "WARN_ON_ONCE() return" in arm and push.

The problem is: if we allow it to continue, are we opening up the
possibility of messing up the kernel in a worse way?

For example, push job writes to the entity. Okay, the offsets are low, but
is the zero page always safe to write to? I don't know, but it sounds scary.
From that point of view, BUG_ON or WARN_ON_ONCE with an early return are the
safer options.
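
To make the "WARN_ON_ONCE() return" option concrete, here is a rough userspace sketch of the pattern. It is not the upstream function: the structs are cut-down stand-ins, WARN_ON_ONCE() is a mock of the kernel macro, and drm_sched_job_arm_sketch() is a hypothetical name used only for illustration. It relies on GNU statement expressions, as the kernel does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace mock of the kernel's WARN_ON_ONCE(): evaluates to the truth
 * value of the condition and prints a warning only the first time the
 * condition is true at this call site.
 */
#define WARN_ON_ONCE(cond) ({                                      \
	static bool warned_;                                       \
	bool c_ = !!(cond);                                        \
	if (c_ && !warned_) {                                      \
		warned_ = true;                                    \
		fprintf(stderr, "WARNING: %s at %s:%d\n",          \
			#cond, __FILE__, __LINE__);                \
	}                                                          \
	c_;                                                        \
})

/* Simplified stand-ins for the scheduler types (assumption, not the real layout). */
struct drm_gpu_scheduler { const char *name; };
struct drm_sched_rq { struct drm_gpu_scheduler *sched; };
struct drm_sched_entity { struct drm_sched_rq *rq; };
struct drm_sched_job {
	struct drm_sched_entity *entity;
	struct drm_gpu_scheduler *sched;
};

/* Hypothetical variant of the arm path: warn once and bail out on a NULL entity. */
static void drm_sched_job_arm_sketch(struct drm_sched_job *job)
{
	struct drm_sched_entity *entity = job->entity;

	/* Return before anything dereferences or writes through the entity. */
	if (WARN_ON_ONCE(!entity))
		return;

	job->sched = entity->rq->sched;
}
```

With this shape a NULL entity leaves the job untouched and logs a single warning, instead of either panicking or touching memory near address zero. Since the function returns void, the caller still cannot tell that arming was skipped.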

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 1d4f1b822e7b..05eb50d4cf08 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -857,7 +857,6 @@ void drm_sched_job_arm(struct drm_sched_job *job)
>   	struct drm_gpu_scheduler *sched;
>   	struct drm_sched_entity *entity = job->entity;
>   
> -	BUG_ON(!entity);
>   	drm_sched_entity_select_rq(entity);
>   	sched = entity->rq->sched;
>

Re: [PATCH v2] drm/sched: Don't crash kernel on wrong params

Posted by Philipp Stanner 2 months, 3 weeks ago

On Wed, 2025-11-12 at 09:46 +0000, Tvrtko Ursulin wrote:
> 
> On 12/11/2025 09:18, Philipp Stanner wrote:
> > drm_sched_job_arm() just panics the kernel with BUG_ON() in case of an
> > entity being NULL. If the entity is NULL, subsequent accesses will crash
> > the particular CPU anyways with a NULL pointer exception backtrace.
> > 
> > Remove the BUG_ON().
> > 
> > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > ---
> > Changes in v2:
> >    - Drop BUG_ON() instead of replacing it. (Tvrtko)
> 
> The option of removing the BUG_ON was conditional on brainstorming a bit 
> whether we think the null pointer dereference is the worst that can 
> happen or not.
> 
> Other option was "WARN_ON_ONCE() return" in arm and push.
> 

Maybe even WARN_ON() is OK to make it noticeable.

I mostly care about getting rid of BUG_ON().


P.