[PATCH v6 0/3] sched_ext: Harden scx_bpf_cpu_rq()

Christian Loehle posted 3 patches 1 month ago
Posted by Christian Loehle 1 month ago
scx_bpf_cpu_rq() currently allows accessing struct rq fields without
holding the associated rq lock.
It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and
scx_tickless. Fortunately it is only ever used to fetch rq->curr.
So provide an alternative scx_bpf_remote_curr() that doesn't expose struct rq
and provide a hardened scx_bpf_cpu_rq_locked() by ensuring we hold the rq lock.
Add a deprecation warning to scx_bpf_cpu_rq() that mentions the two alternatives.

This also simplifies scx code from:

rq = scx_bpf_cpu_rq(cpu);
if (!rq)
	return;
p = rq->curr;
/* ... Do something with p */

into:

p = scx_bpf_remote_curr(cpu);
/* ... Do something with p */

Changes since:
v5:
https://lore.kernel.org/lkml/20250901132605.2282650-2-christian.loehle@arm.com/
- Actually expose the RCU pointer in scx_bpf_remote_curr() as such (Andrea)
v4:
https://lore.kernel.org/lkml/20250811212150.85759-1-christian.loehle@arm.com/
- Remove cpu argument from scx_bpf_cpu_rq_locked() as SCX has a unique
locked_rq_state anyway. (Tejun)
- Expose RCU pointer in scx_bpf_remote_curr() (Peter)
v3:
https://lore.kernel.org/lkml/20250805111036.130121-1-christian.loehle@arm.com/
- Don't change scx_bpf_cpu_rq(), so as not to break BPF schedulers without a
grace period. Just add the deprecation warning and do the hardening in
the new scx_bpf_cpu_rq_locked(). (Andrea, Tejun, Jake)
v2:
https://lore.kernel.org/lkml/20250804112743.711816-1-christian.loehle@arm.com/
- Open-code bpf_task_acquire() to avoid the forward declaration (Andrea)
- Rename scx_bpf_task_acquire_remote_curr() to make it more explicit that it
behaves like bpf_task_acquire()
v1:
https://lore.kernel.org/lkml/20250801141741.355059-1-christian.loehle@arm.com/
- scx_bpf_cpu_rq() now errors when an unlocked rq is requested. (Andrea)
- scx_bpf_remote_curr() calls bpf_task_acquire(); the BPF user needs to
release the acquired reference. (Andrea)

Christian Loehle (3):
  sched_ext: Introduce scx_bpf_cpu_rq_locked()
  sched_ext: Introduce scx_bpf_remote_curr()
  sched_ext: deprecation warn for scx_bpf_cpu_rq()

 kernel/sched/ext.c                       | 40 ++++++++++++++++++++++++
 tools/sched_ext/include/scx/common.bpf.h |  2 ++
 2 files changed, 42 insertions(+)

--
2.34.1
Re: [PATCH v6 0/3] sched_ext: Harden scx_bpf_cpu_rq()
Posted by Andrea Righi 1 month ago
On Tue, Sep 02, 2025 at 12:11:40PM +0100, Christian Loehle wrote:
> scx_bpf_cpu_rq() currently allows accessing struct rq fields without
> holding the associated rq.
> It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and
> scx_tickless. Fortunately it is only ever used to fetch rq->curr.
> So provide an alternative scx_bpf_remote_curr() that doesn't expose struct rq
> and provide a hardened scx_bpf_cpu_rq_locked() by ensuring we hold the rq lock.
> Add a deprecation warning to scx_bpf_cpu_rq() that mentions the two alternatives.
> 
> This also simplifies scx code from:
> 
> rq = scx_bpf_cpu_rq(cpu);
> if (!rq)
> 	return;
> p = rq->curr
> /* ... Do something with p */
> 
> into:
> 
> p = scx_bpf_remote_curr(cpu);
> /* ... Do something with p */

This looks good to me.

We should probably add a __COMPAT_scx_bpf_remote_curr() macro, so that the
BPF schedulers can be updated to use this new kfunc without breaking
compatibility with older kernels. We can do this later; I'll send a
follow-up patch. For now:

Acked-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea
Re: [PATCH v6 0/3] sched_ext: Harden scx_bpf_cpu_rq()
Posted by Christian Loehle 1 month ago
On 9/2/25 12:58, Andrea Righi wrote:
> On Tue, Sep 02, 2025 at 12:11:40PM +0100, Christian Loehle wrote:
>> scx_bpf_cpu_rq() currently allows accessing struct rq fields without
>> holding the associated rq.
>> It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and
>> scx_tickless. Fortunately it is only ever used to fetch rq->curr.
>> So provide an alternative scx_bpf_remote_curr() that doesn't expose struct rq
>> and provide a hardened scx_bpf_cpu_rq_locked() by ensuring we hold the rq lock.
>> Add a deprecation warning to scx_bpf_cpu_rq() that mentions the two alternatives.
>>
>> This also simplifies scx code from:
>>
>> rq = scx_bpf_cpu_rq(cpu);
>> if (!rq)
>> 	return;
>> p = rq->curr
>> /* ... Do something with p */
>>
>> into:
>>
>> p = scx_bpf_remote_curr(cpu);
>> /* ... Do something with p */
> 
> This looks good to me.
> 
> We should probably add a __COMPAT_scx_bpf_remote_curr() macro, so that the
> BPF schedulers can be updated to use this new kfunc without breaking the
> compatibility with older kernels, but we can do this later, I'll send a
> follow-up patch. For now:
> 
> Acked-by: Andrea Righi <arighi@nvidia.com>

Thanks!
I have the compat patch ready as well and will send it out in a bit.
Re: [PATCH v6 0/3] sched_ext: Harden scx_bpf_cpu_rq()
Posted by Andrea Righi 1 month ago
On Tue, Sep 02, 2025 at 02:53:56PM +0100, Christian Loehle wrote:
> On 9/2/25 12:58, Andrea Righi wrote:
> > On Tue, Sep 02, 2025 at 12:11:40PM +0100, Christian Loehle wrote:
> >> scx_bpf_cpu_rq() currently allows accessing struct rq fields without
> >> holding the associated rq.
> >> It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and
> >> scx_tickless. Fortunately it is only ever used to fetch rq->curr.
> >> So provide an alternative scx_bpf_remote_curr() that doesn't expose struct rq
> >> and provide a hardened scx_bpf_cpu_rq_locked() by ensuring we hold the rq lock.
> >> Add a deprecation warning to scx_bpf_cpu_rq() that mentions the two alternatives.
> >>
> >> This also simplifies scx code from:
> >>
> >> rq = scx_bpf_cpu_rq(cpu);
> >> if (!rq)
> >> 	return;
> >> p = rq->curr
> >> /* ... Do something with p */
> >>
> >> into:
> >>
> >> p = scx_bpf_remote_curr(cpu);
> >> /* ... Do something with p */
> > 
> > This looks good to me.
> > 
> > We should probably add a __COMPAT_scx_bpf_remote_curr() macro, so that the
> > BPF schedulers can be updated to use this new kfunc without breaking the
> > compatibility with older kernels, but we can do this later, I'll send a
> > follow-up patch. For now:
> > 
> > Acked-by: Andrea Righi <arighi@nvidia.com>
> 
> Thanks!
> I'd have the compat patch ready as well and would send it out in a bit.

Awesome, I was thinking about something like the following (untested).
Feel free to include this in your patch.

Thanks,
-Andrea

 tools/sched_ext/include/scx/compat.bpf.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/sched_ext/include/scx/compat.bpf.h b/tools/sched_ext/include/scx/compat.bpf.h
index 36e0cd2fd4eda..67594ff99a461 100644
--- a/tools/sched_ext/include/scx/compat.bpf.h
+++ b/tools/sched_ext/include/scx/compat.bpf.h
@@ -230,6 +230,15 @@ static inline bool __COMPAT_is_enq_cpu_selected(u64 enq_flags)
 	 scx_bpf_pick_any_cpu_node(cpus_allowed, node, flags) :			\
 	 scx_bpf_pick_any_cpu(cpus_allowed, flags))
 
+/*
+ * v6.18: Add a helper to retrieve the current task from a runqueue.
+ *
+ * Keep this macro available until v6.20 for compatibility.
+ */
+#define __COMPAT_scx_bpf_remote_curr(cpu)					\
+	(bpf_ksym_exists(scx_bpf_remote_curr) ?					\
+	 scx_bpf_remote_curr(cpu) : scx_bpf_cpu_rq(cpu)->curr)
+
 /*
  * Define sched_ext_ops. This may be expanded to define multiple variants for
  * backward compatibility. See compat.h::SCX_OPS_LOAD/ATTACH().
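
One thing to watch in the untested macro above: the fallback dereferences the
result of scx_bpf_cpu_rq(cpu) directly, while the snippet in the cover letter
checks that pointer for NULL first. A NULL-safe fallback could be written as a
static inline helper instead. The sketch below models that logic in plain
userspace C so it can be compiled and checked outside the kernel. The struct
definitions, the mocked scx_bpf_cpu_rq() and scx_bpf_remote_curr(), the
have_remote_curr flag (standing in for bpf_ksym_exists()) and the name
compat_remote_curr() are all assumptions made for illustration, not the
kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types involved. */
struct task_struct { int pid; };
struct rq { struct task_struct *curr; };

#define NR_CPUS 4

static struct task_struct tasks[NR_CPUS] = { {100}, {101}, {102}, {103} };
static struct rq runqueues[NR_CPUS] = {
	{ &tasks[0] }, { &tasks[1] }, { &tasks[2] }, { &tasks[3] },
};

/*
 * Stand-in for bpf_ksym_exists(scx_bpf_remote_curr): set to true to model a
 * kernel that has the new kfunc, false to model an older one.
 */
static bool have_remote_curr;

/* Mock of the deprecated kfunc: returns NULL for an out-of-range CPU. */
static struct rq *scx_bpf_cpu_rq(int cpu)
{
	return (cpu >= 0 && cpu < NR_CPUS) ? &runqueues[cpu] : NULL;
}

/* Mock of the new kfunc: returns the task directly, NULL for a bad CPU. */
static struct task_struct *scx_bpf_remote_curr(int cpu)
{
	return (cpu >= 0 && cpu < NR_CPUS) ? runqueues[cpu].curr : NULL;
}

/*
 * NULL-safe compat helper (hypothetical name): prefer the new kfunc when it
 * is available, otherwise fall back to the rq and check it before the
 * dereference.
 */
static inline struct task_struct *compat_remote_curr(int cpu)
{
	struct rq *rq;

	if (have_remote_curr)
		return scx_bpf_remote_curr(cpu);

	rq = scx_bpf_cpu_rq(cpu);
	return rq ? rq->curr : NULL;
}
```

In both modes the helper returns NULL for an invalid CPU rather than
dereferencing a NULL rq, so a caller would still need to check the result
before using it. In a real BPF scheduler the pointer is exposed as an RCU
pointer as of v5 of this series, so the access would presumably have to sit
inside an RCU read-side section as well.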