[PATCH sched_ext/for-6.16] sched_ext: Add RCU protection to scx_root in DSQ iterator

Andrea Righi posted 1 patch 9 months, 1 week ago
There is a newer version of this series
[PATCH sched_ext/for-6.16] sched_ext: Add RCU protection to scx_root in DSQ iterator
Posted by Andrea Righi 9 months, 1 week ago
Using a DSQ iterator from a timer callback can trigger the following
lockdep splat:

 =============================
 WARNING: suspicious RCU usage
 6.14.0-virtme #1 Not tainted
 -----------------------------
 kernel/sched/ext.c:6907 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 no locks held by swapper/0/0.

 stack backtrace:
 CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.14.0-virtme #1 PREEMPT(full)
 Sched_ext: tickless (enabled+all)
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x6f/0xb0
  lockdep_rcu_suspicious.cold+0x4e/0xa3
  bpf_iter_scx_dsq_new+0xb1/0xd0
  bpf_prog_63f4fd1bccc101e7_dispatch_cpu+0x3e/0x156
  bpf_prog_8320d4217989178c_dispatch_all_cpus+0x153/0x1b6
  bpf_prog_97f847d871513f95_sched_timerfn+0x4c/0x264
  ? hrtimer_run_softirq+0x4f/0xd0
  bpf_timer_cb+0x7a/0x140
  __hrtimer_run_queues+0x1f9/0x3a0
  hrtimer_run_softirq+0x8c/0xd0
  handle_softirqs+0xd3/0x3d0
  __irq_exit_rcu+0x9a/0xc0
  irq_exit_rcu+0xe/0x20
  sysvec_apic_timer_interrupt+0x73/0x80

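The BPF timer callback runs from softirq context without rcu_read_lock()
held, so the rcu_dereference() of scx_root in bpf_iter_scx_dsq_new() trips
lockdep. Add rcu_read_lock()/rcu_read_unlock() around the scx_root access
to prevent this.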

Fixes: cdf5a6faa8cf0 ("sched_ext: Move dsq_hash into scx_sched")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 1ee5a0088b1a1..d2856529711d7 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6907,28 +6907,39 @@ __bpf_kfunc int bpf_iter_scx_dsq_new(struct bpf_iter_scx_dsq *it, u64 dsq_id,
 {
 	struct bpf_iter_scx_dsq_kern *kit = (void *)it;
 	struct scx_sched *sch;
+	int ret = 0;
 
 	BUILD_BUG_ON(sizeof(struct bpf_iter_scx_dsq_kern) >
 		     sizeof(struct bpf_iter_scx_dsq));
 	BUILD_BUG_ON(__alignof__(struct bpf_iter_scx_dsq_kern) !=
 		     __alignof__(struct bpf_iter_scx_dsq));
 
+	rcu_read_lock();
 	sch = rcu_dereference(scx_root);
-	if (!sch)
-		return -ENODEV;
+	if (!sch) {
+		ret = -ENODEV;
+		goto out_unlock;
+	}
 
-	if (flags & ~__SCX_DSQ_ITER_USER_FLAGS)
-		return -EINVAL;
+	if (flags & ~__SCX_DSQ_ITER_USER_FLAGS) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
 
 	kit->dsq = find_user_dsq(sch, dsq_id);
-	if (!kit->dsq)
-		return -ENOENT;
+	if (!kit->dsq) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
 
 	INIT_LIST_HEAD(&kit->cursor.node);
 	kit->cursor.flags = SCX_DSQ_LNODE_ITER_CURSOR | flags;
 	kit->cursor.priv = READ_ONCE(kit->dsq->seq);
 
-	return 0;
+out_unlock:
+	rcu_read_unlock();
+
+	return ret;
 }
 
 /**
-- 
2.49.0
Re: [PATCH sched_ext/for-6.16] sched_ext: Add RCU protection to scx_root in DSQ iterator
Posted by Tejun Heo 9 months, 1 week ago
On Wed, Apr 30, 2025 at 11:09:30AM +0200, Andrea Righi wrote:
> @@ -6907,28 +6907,39 @@ __bpf_kfunc int bpf_iter_scx_dsq_new(struct bpf_iter_scx_dsq *it, u64 dsq_id,
>  {
>  	struct bpf_iter_scx_dsq_kern *kit = (void *)it;
>  	struct scx_sched *sch;
> +	int ret = 0;
>  
>  	BUILD_BUG_ON(sizeof(struct bpf_iter_scx_dsq_kern) >
>  		     sizeof(struct bpf_iter_scx_dsq));
>  	BUILD_BUG_ON(__alignof__(struct bpf_iter_scx_dsq_kern) !=
>  		     __alignof__(struct bpf_iter_scx_dsq));
>  
> +	rcu_read_lock();
>  	sch = rcu_dereference(scx_root);

Can you instead try changing rcu_dereference(scx_root) to
rcu_dereference_check(scx_root, rcu_read_lock_bh_held())? The warning
triggers because this can be called from both rcu-locked and bh contexts.
While grabbing rcu_read_lock() here works, telling RCU that the dereference
can happen in either context is more accurate.

Thanks.

-- 
tejun
Re: [PATCH sched_ext/for-6.16] sched_ext: Add RCU protection to scx_root in DSQ iterator
Posted by Andrea Righi 9 months, 1 week ago
On Wed, Apr 30, 2025 at 08:59:06AM -1000, Tejun Heo wrote:
> On Wed, Apr 30, 2025 at 11:09:30AM +0200, Andrea Righi wrote:
> > @@ -6907,28 +6907,39 @@ __bpf_kfunc int bpf_iter_scx_dsq_new(struct bpf_iter_scx_dsq *it, u64 dsq_id,
> >  {
> >  	struct bpf_iter_scx_dsq_kern *kit = (void *)it;
> >  	struct scx_sched *sch;
> > +	int ret = 0;
> >  
> >  	BUILD_BUG_ON(sizeof(struct bpf_iter_scx_dsq_kern) >
> >  		     sizeof(struct bpf_iter_scx_dsq));
> >  	BUILD_BUG_ON(__alignof__(struct bpf_iter_scx_dsq_kern) !=
> >  		     __alignof__(struct bpf_iter_scx_dsq));
> >  
> > +	rcu_read_lock();
> >  	sch = rcu_dereference(scx_root);
> 
> Can you instead try changing rcu_dereference(scx_root) to
> rcu_dereference_check(scx_root, rcu_read_lock_bh_held())? The warning
> triggers because this can be called from both rcu-locked and bh contexts.
> While grabbing rcu_read_lock() here works, telling RCU that the dereference
> can happen in either context is more accurate.

Yeah, that seems better, also considering that bpf_iter_scx_dsq_next() is
marked with KF_RCU_PROTECTED, so checking for rcu_read_lock_bh_held()
should cover all the possible cases.

Will test that and send a v2.

Thanks!
-Andrea
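
For readers following the thread, the difference between the two approaches
can be illustrated with a small userspace model. This is only a sketch, not
kernel code and not the v2 patch: every mock_* name, the depth counters, and
splats_in_softirq() are hypothetical stand-ins invented for illustration. The
model assumes that rcu_dereference_check(p, c) complains unless the caller is
inside an rcu_read_lock() section or the extra condition c is true, and that
rcu_read_lock_bh_held() is true while running in softirq context. In the
kernel, the suggested change would presumably be the single line
sch = rcu_dereference_check(scx_root, rcu_read_lock_bh_held());

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins for kernel state; NOT the real RCU implementation. */
static int rcu_read_depth;	/* nesting level of mock_rcu_read_lock() */
static int bh_depth;		/* non-zero while "in softirq" */
static int splat_count;		/* number of lockdep-style reports so far */

static void mock_rcu_read_lock(void)
{
	rcu_read_depth++;
}

static void mock_rcu_read_unlock(void)
{
	assert(rcu_read_depth > 0);	/* catch unbalanced unlocks */
	rcu_read_depth--;
}

static bool mock_rcu_read_lock_held(void)
{
	return rcu_read_depth > 0;
}

static bool mock_rcu_read_lock_bh_held(void)
{
	return bh_depth > 0;
}

/* Models rcu_dereference_check(p, c): report unless RCU-locked or c holds. */
static void *mock_rcu_dereference_check(void *p, bool c)
{
	if (!mock_rcu_read_lock_held() && !c) {
		splat_count++;
		fprintf(stderr, "suspicious rcu_dereference_check() usage!\n");
	}
	return p;
}

static int fake_sched;				/* placeholder for struct scx_sched */
static void *mock_scx_root = &fake_sched;	/* placeholder for scx_root */

/* Plain rcu_dereference(): no extra condition is accepted. */
static void *lookup_plain(void)
{
	return mock_rcu_dereference_check(mock_scx_root, false);
}

/* Suggested alternative: softirq (bh) context is also accepted. */
static void *lookup_bh_aware(void)
{
	return mock_rcu_dereference_check(mock_scx_root,
					  mock_rcu_read_lock_bh_held());
}

/* Run one lookup as if from a timer softirq; return the new report count. */
static int splats_in_softirq(void *(*lookup)(void))
{
	int before = splat_count;

	bh_depth++;		/* enter softirq */
	lookup();
	bh_depth--;		/* leave softirq */

	return splat_count - before;
}
```

In this model, calling lookup_plain() through splats_in_softirq() produces a
report, while lookup_bh_aware() does not, and either lookup is quiet when the
caller already holds mock_rcu_read_lock(). The second approach leaves the
function body free of the extra lock/unlock pair and the goto-based error
paths that the original patch needed.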