sched_ext: Avoid NULL scx_root deref in __scx_exit()

[PATCH sched_ext/for-6.16] sched_ext: Avoid NULL scx_root deref in __scx_exit()

Posted by Andrea Righi 9 months, 1 week ago

A sched_ext scheduler may trigger __scx_exit() from a BPF timer
callback, where scx_root may be NULL and is dereferenced without RCU
protection.

This can lead to a NULL pointer dereference as shown below (triggered by
scx_tickless):

 BUG: kernel NULL pointer dereference, address: 0000000000000330
...
 CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.14.0-virtme #1 PREEMPT(full)
 RIP: 0010:__scx_exit+0x2b/0x190
...
 Call Trace:
  <IRQ>
  scx_bpf_get_idle_smtmask+0x59/0x80
  bpf_prog_8320d4217989178c_dispatch_all_cpus+0x35/0x1b6
...
  bpf_prog_97f847d871513f95_sched_timerfn+0x4c/0x264
  bpf_timer_cb+0x7a/0x140
  __hrtimer_run_queues+0x1f9/0x3a0
  hrtimer_run_softirq+0x8c/0xd0
  handle_softirqs+0xd3/0x3d0
  __irq_exit_rcu+0x9a/0xc0
  irq_exit_rcu+0xe/0x20

Fix this by reading scx_root with rcu_dereference() under
rcu_read_lock() and returning early if it is NULL.

Fixes: 48e1267773866 ("sched_ext: Introduce scx_sched")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 66d21c7cdff7d..1ee5a0088b1a1 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5212,12 +5212,19 @@ static void scx_error_irq_workfn(struct irq_work *irq_work)
 static __printf(3, 4) void __scx_exit(enum scx_exit_kind kind, s64 exit_code,
 				      const char *fmt, ...)
 {
-	struct scx_exit_info *ei = scx_root->exit_info;
+	struct scx_sched *sch;
+	struct scx_exit_info *ei;
 	int none = SCX_EXIT_NONE;
 	va_list args;
 
-	if (!atomic_try_cmpxchg(&scx_root->exit_kind, &none, kind))
-		return;
+	rcu_read_lock();
+	sch = rcu_dereference(scx_root);
+	if (!sch)
+		goto out_unlock;
+	ei = sch->exit_info;
+
+	if (!atomic_try_cmpxchg(&sch->exit_kind, &none, kind))
+		goto out_unlock;
 
 	ei->exit_code = exit_code;
 #ifdef CONFIG_STACKTRACE
@@ -5235,7 +5242,9 @@ static __printf(3, 4) void __scx_exit(enum scx_exit_kind kind, s64 exit_code,
 	ei->kind = kind;
 	ei->reason = scx_exit_reason(ei->kind);
 
-	irq_work_queue(&scx_root->error_irq_work);
+	irq_work_queue(&sch->error_irq_work);
+out_unlock:
+	rcu_read_unlock();
 }
 
 static struct scx_sched *scx_alloc_and_add_sched(struct sched_ext_ops *ops)
-- 
2.49.0
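
For readers outside the kernel tree, the shape of the fix can be
illustrated with a small userspace sketch. It is not the kernel code:
C11 atomics stand in for rcu_dereference() and atomic_try_cmpxchg(), and
the names `struct sched`, `root`, and `request_exit()` are hypothetical
stand-ins for scx_sched, scx_root, and __scx_exit(). The sketch shows
only the "snapshot the pointer once, check it for NULL, then claim the
exit with a compare-and-exchange" pattern. It does not model the
lifetime guarantee that rcu_read_lock() provides in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum exit_kind { EXIT_NONE = 0, EXIT_ERROR = 1 };

struct sched {
	_Atomic int exit_kind;	/* EXIT_NONE until an exit is claimed */
	long exit_code;
};

/* Stand-in for scx_root: NULL while no scheduler is loaded. */
static _Atomic(struct sched *) root;

/*
 * Returns true if this call recorded the exit, false if there was no
 * scheduler or if an earlier caller had already claimed the exit.
 */
static bool request_exit(int kind, long exit_code)
{
	/* Snapshot the pointer once (rcu_dereference() in the patch). */
	struct sched *sch = atomic_load_explicit(&root, memory_order_acquire);
	int none = EXIT_NONE;

	/* The check the original code lacked: no scheduler, nothing to do. */
	if (!sch)
		return false;

	/* Only the first caller moves exit_kind away from EXIT_NONE. */
	if (!atomic_compare_exchange_strong(&sch->exit_kind, &none, kind))
		return false;

	/* Use the snapshot, never a second read of the global pointer. */
	sch->exit_code = exit_code;
	return true;
}
```

Every access after the NULL check goes through the local snapshot, which
mirrors how the patch replaces each use of scx_root with sch.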

Re: [PATCH sched_ext/for-6.16] sched_ext: Avoid NULL scx_root deref in __scx_exit()

Posted by Tejun Heo 9 months, 1 week ago

On Wed, Apr 30, 2025 at 10:40:21AM +0200, Andrea Righi wrote:
> A sched_ext scheduler may trigger __scx_exit() from a BPF timer
> callback, where scx_root may be NULL and is dereferenced without RCU
> protection.
> 
> This can lead to a NULL pointer dereference as shown below (triggered by
> scx_tickless):
> 
>  BUG: kernel NULL pointer dereference, address: 0000000000000330
> ...
>  CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.14.0-virtme #1 PREEMPT(full)
>  RIP: 0010:__scx_exit+0x2b/0x190
> ...
>  Call Trace:
>   <IRQ>
>   scx_bpf_get_idle_smtmask+0x59/0x80
>   bpf_prog_8320d4217989178c_dispatch_all_cpus+0x35/0x1b6
> ...
>   bpf_prog_97f847d871513f95_sched_timerfn+0x4c/0x264
>   bpf_timer_cb+0x7a/0x140
>   __hrtimer_run_queues+0x1f9/0x3a0
>   hrtimer_run_softirq+0x8c/0xd0
>   handle_softirqs+0xd3/0x3d0
>   __irq_exit_rcu+0x9a/0xc0
>   irq_exit_rcu+0xe/0x20
> 
> Fix this by reading scx_root with rcu_dereference() under
> rcu_read_lock() and returning early if it is NULL.
> 
> Fixes: 48e1267773866 ("sched_ext: Introduce scx_sched")
> Signed-off-by: Andrea Righi <arighi@nvidia.com>

Applied to sched_ext/for-6.16.

Thanks.

-- 
tejun