scx_bpf_cpu_curr() has been introduced to retrieve the current task of a
given runqueue, allowing schedulers to interact with that task.

The kfunc assumes that it is always called in an RCU context, but this
is not always guaranteed and some BPF schedulers can trigger the
following warning:

  WARNING: suspicious RCU usage
  sched_ext: BPF scheduler "cosmos_1.0.2_gd0e71ca_x86_64_unknown_linux_gnu_debug" enabled
  6.17.0-rc1 #1-NixOS Not tainted
  -----------------------------
  kernel/sched/ext.c:6415 suspicious rcu_dereference_check() usage!

The correct behavior is to acquire a reference to the returned task, so
the scheduler can safely access it and then release it with
bpf_task_release().

Update the kfunc and the corresponding compatibility helper to implement
reference acquisition and prevent potential RCU warnings.

Fixes: 20b158094a1ad ("sched_ext: Introduce scx_bpf_cpu_curr()")
Cc: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
kernel/sched/ext.c | 19 +++++++++++++++----
tools/sched_ext/include/scx/compat.bpf.h | 8 +++++++-
2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 477eccf023388..760f49a9c846c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6392,16 +6392,27 @@ __bpf_kfunc struct rq *scx_bpf_locked_rq(void)
 }
 
 /**
- * scx_bpf_cpu_curr - Return remote CPU's curr task
+ * scx_bpf_cpu_curr - Acquire a reference and return the current task
+ * running on a target CPU
  * @cpu: CPU of interest
  *
- * Callers must hold RCU read lock (KF_RCU).
+ * A task acquired by this kfunc must be released by calling
+ * bpf_task_release().
  */
 __bpf_kfunc struct task_struct *scx_bpf_cpu_curr(s32 cpu)
 {
+	struct task_struct *p;
+
 	if (!kf_cpu_valid(cpu, NULL))
 		return NULL;
-	return rcu_dereference(cpu_rq(cpu)->curr);
+
+	rcu_read_lock();
+	p = rcu_dereference(cpu_rq(cpu)->curr);
+	if (p && !refcount_inc_not_zero(&p->rcu_users))
+		p = NULL;
+	rcu_read_unlock();
+
+	return p;
 }
 
 /**
@@ -6569,7 +6580,7 @@ BTF_ID_FLAGS(func, scx_bpf_task_running, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_cpu_rq)
 BTF_ID_FLAGS(func, scx_bpf_locked_rq, KF_RET_NULL)
-BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_RET_NULL | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_RET_NULL | KF_ACQUIRE)
 #ifdef CONFIG_CGROUP_SCHED
 BTF_ID_FLAGS(func, scx_bpf_task_cgroup, KF_RCU | KF_ACQUIRE)
 #endif
diff --git a/tools/sched_ext/include/scx/compat.bpf.h b/tools/sched_ext/include/scx/compat.bpf.h
index dd9144624dc99..551635746f25c 100644
--- a/tools/sched_ext/include/scx/compat.bpf.h
+++ b/tools/sched_ext/include/scx/compat.bpf.h
@@ -238,13 +238,19 @@ static inline bool __COMPAT_is_enq_cpu_selected(u64 enq_flags)
 static inline struct task_struct *__COMPAT_scx_bpf_cpu_curr(int cpu)
 {
 	struct rq *rq;
+	struct task_struct *p;
 
 	if (bpf_ksym_exists(scx_bpf_cpu_curr))
 		return scx_bpf_cpu_curr(cpu);
 
+	bpf_rcu_read_lock();
 	rq = scx_bpf_cpu_rq(cpu);
+	p = rq ? rq->curr : NULL;
+	if (p)
+		p = bpf_task_from_pid(p->pid);
+	bpf_rcu_read_unlock();
 
-	return rq ? rq->curr : NULL;
+	return p;
 }
 
 /*
--
2.51.0
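The core of this first version is the `refcount_inc_not_zero(&p->rcu_users)` check: a new reference is taken only if the task's count has not already dropped to zero, so a task that is being torn down is reported as NULL instead of being handed to the scheduler. The sketch below is a userspace toy model of that acquire/release contract, not kernel or BPF code, and assumes only C11 `<stdatomic.h>`. `struct toy_task`, `toy_ref_inc_not_zero()`, `toy_acquire()` and `toy_release()` are hypothetical stand-ins for `task_struct::rcu_users`, `refcount_inc_not_zero()`, the patched `scx_bpf_cpu_curr()` and `bpf_task_release()`; the kernel's `refcount_t` hardening (saturation on overflow) and the actual freeing on the last put are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task whose lifetime is tied to a reference
 * count, loosely modeled on task_struct::rcu_users. */
struct toy_task {
	atomic_int users;
	int pid;
};

/* Toy analogue of refcount_inc_not_zero(): take a reference only if the
 * count is still non-zero. Zero means the object is already being torn
 * down, so no new reference may be handed out. */
bool toy_ref_inc_not_zero(atomic_int *r)
{
	int old = atomic_load(r);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}

/* Toy analogue of the patched scx_bpf_cpu_curr(): return p with an extra
 * reference held, or NULL if p is NULL or already dying. */
struct toy_task *toy_acquire(struct toy_task *p)
{
	if (p && !toy_ref_inc_not_zero(&p->users))
		p = NULL;
	return p;
}

/* Toy analogue of bpf_task_release(): drop a reference obtained from
 * toy_acquire(). Freeing on the final put is not modeled. */
void toy_release(struct toy_task *p)
{
	int old = atomic_fetch_sub(&p->users, 1);

	assert(old > 0); /* releasing a reference that was never taken */
}
```

In this model, `toy_acquire()` on a task whose count is 1 returns the task and leaves the count at 2 until `toy_release()` is called, while a task whose count is already 0 yields NULL. That NULL path is why the patch keeps `KF_RET_NULL` next to `KF_ACQUIRE`: the verifier then forces the BPF program to check the result before using it and to release every non-NULL reference it obtained.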
Hello, Andrea.

On Tue, Sep 09, 2025 at 09:57:09PM +0200, Andrea Righi wrote:
[...]
> The correct behavior is to acquire a reference to the returned task, so
> the scheduler can safely access it and then release it with
> bpf_task_release().
>
> Update the kfunc and the corresponding compatibility helper to implement
> reference acquisition and prevent potential RCU warnings.

I think KF_RCU likely fits better for peeking kernel data structures than
having to acquire/release them. Can you post the full backtrace? Is it being
called from a sleepable bpf prog? Or is it that we just need to expand the
rcu check scope to cover regular rcu, bh and sched? And, everything aside,
if KF_RCU, should we be tripping on rcu_dereference() in the first place?

Thanks.

--
tejun
Hi Tejun,

On Tue, Sep 09, 2025 at 10:01:16AM -1000, Tejun Heo wrote:
[...]
> I think KF_RCU likely fits better for peeking kernel data structures than
> having to acquire/release them. Can you post the full backtrace? Is it being
> called from a sleepable bpf prog? Or is it that we just need to expand the
> rcu check scope to cover regular rcu, bh and sched? And, everything aside,
> if KF_RCU, should we be tripping on rcu_dereference() in the first place?

For the record, as discussed offline, we should be fine marking the kfunc
as KF_RCU_PROTECTED instead of acquiring a reference to the task.

Right now the kfunc is marked as KF_RCU, which is not really necessary,
because KF_RCU only ensures that the kfunc *arguments* are either
RCU-protected or trusted.

KF_RCU_PROTECTED, instead, should ensure that the kfunc is called inside an
RCU read-side critical section, which is what we need.

In this way the kfunc can safely return a pointer to the task, and sleepable
BPF programs can wrap the call in a bpf_rcu_read_lock/unlock() section.
This should prevent the RCU warning while still letting schedulers safely
use the returned task.

I'll send a new patch with a proper fix.

Thanks,
-Andrea
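The KF_RCU_PROTECTED approach moves the burden from a reference count to the calling context: the kfunc may only be called inside an RCU read-side critical section, and the pointer it returns is borrowed, so it is usable only until that section ends and, since no reference was taken, nothing is released. The userspace toy below illustrates that contract as a runtime assertion on a nesting counter; in BPF the verifier is expected to enforce it statically, and real RCU grace periods are not modeled at all. Every name in it (`struct toy_curr_task`, `toy_curr`, `toy_rcu_read_lock()`, `toy_peek_curr()`, `toy_curr_pid()`) is hypothetical, and the fixed 4-CPU table is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4 /* arbitrary size for the toy */

/* Hypothetical task and per-CPU "current task" table. */
struct toy_curr_task {
	int pid;
};

struct toy_curr_task *toy_curr[TOY_NR_CPUS];

/* Read-side nesting depth: a single-threaded stand-in for "are we inside
 * an RCU read-side critical section?". */
int toy_rcu_nesting;

void toy_rcu_read_lock(void)
{
	toy_rcu_nesting++;
}

void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_nesting > 0);
	toy_rcu_nesting--;
}

/* Analogue of a KF_RCU_PROTECTED kfunc: only legal inside a read-side
 * section. The result is borrowed: no reference is taken, so the caller
 * must finish using it before unlocking and must not release it. */
struct toy_curr_task *toy_peek_curr(int cpu)
{
	assert(toy_rcu_nesting > 0);
	if (cpu < 0 || cpu >= TOY_NR_CPUS)
		return NULL;
	return toy_curr[cpu];
}

/* Caller pattern for a context that is not already inside a read-side
 * section (e.g. a sleepable program): bracket the call and every use of
 * the result with one lock/unlock pair. Returns -1 if nothing is found. */
int toy_curr_pid(int cpu)
{
	struct toy_curr_task *p;
	int pid = -1;

	toy_rcu_read_lock();
	p = toy_peek_curr(cpu);
	if (p)
		pid = p->pid; /* copy out what we need while still protected */
	toy_rcu_read_unlock();

	return pid;
}
```

Compared with the acquire/release version of the patch, this keeps the kfunc a cheap peek with no atomic operation and nothing for the program to release, which is the trade-off Tejun raises between peeking at kernel data structures and acquiring/releasing them.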
On 9/9/25 21:45, Andrea Righi wrote:
[...]
> I'll send a new patch with a proper fix.

Hi Andrea,
is this patch still outstanding or am I out of the loop now?
Hi Christian,

On Thu, Sep 18, 2025 at 04:48:14PM +0100, Christian Loehle wrote:
[...]
> Hi Andrea,
> is this patch still outstanding or am I out of the loop now?

I have a fix in my tree:
https://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git/commit/?h=scx-6.18&id=225dccfba74877ba7c74971801f8d8f47d124373

But it also requires this BPF fix:
https://lore.kernel.org/all/20250917032755.4068726-1-memxor@gmail.com/

I was waiting for the BPF fix to land in bpf-next before sending the fix
for the scx_bpf_cpu_curr() kfunc (I'll Cc you).

-Andrea
On 9/18/25 17:06, Andrea Righi wrote:
[...]
> https://lore.kernel.org/all/20250917032755.4068726-1-memxor@gmail.com/

Ah, I missed that, thanks for the hint!