[PATCH v4 0/3] sched_ext: Harden scx_bpf_cpu_rq()

Christian Loehle posted 3 patches 1 month, 3 weeks ago
There is a newer version of this series
kernel/sched/ext.c                       | 49 ++++++++++++++++++++++++
tools/sched_ext/include/scx/common.bpf.h |  2 +
2 files changed, 51 insertions(+)
[PATCH v4 0/3] sched_ext: Harden scx_bpf_cpu_rq()
Posted by Christian Loehle 1 month, 3 weeks ago
scx_bpf_cpu_rq() currently allows accessing struct rq fields without
holding the associated rq.
It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and
scx_tickless. Fortunately it is only ever used to fetch rq->curr.
So provide an alternative scx_bpf_task_acquire_remote_curr() that
doesn't expose struct rq and provide a hardened scx_bpf_cpu_rq_locked()
by ensuring we hold the rq lock.
Add a deprecation warning to scx_bpf_cpu_rq_locked() that mentions the
two alternatives.

This also simplifies scx code from:

rq = scx_bpf_cpu_rq(cpu);
if (!rq)
	return;
p = rq->curr
if (!p)
	return;
/* ... Do something with p */

into:

p = scx_bpf_task_acquire_remote_curr(cpu);
if (!p)
	return;
/* ... Do something with p */
bpf_task_release(p);


v3:
https://lore.kernel.org/lkml/20250805111036.130121-1-christian.loehle@arm.com/
Don't change scx_bpf_cpu_rq() do not break BPF schedulers without the
grace period. Just add the deprecation warning and do the hardening in
the new scx_bpf_cpu_rq_locked(). (Andrea, Tejun, Jake)
v2:
https://lore.kernel.org/lkml/20250804112743.711816-1-christian.loehle@arm.com/
- Open-code bpf_task_acquire() to avoid the forward declaration (Andrea)
- Rename scx_bpf_task_acquire_remote_curr() to make it more explicit it
behaves like bpf_task_acquire()
- Dis
v1:
https://lore.kernel.org/lkml/20250801141741.355059-1-christian.loehle@arm.com/
- scx_bpf_cpu_rq() now errors when a not locked rq is requested. (Andrea)
- scx_bpf_remote_curr() calls bpf_task_acquire() which BPF user needs to
release. (Andrea)
Christian Loehle (3):
  sched_ext: Introduce scx_bpf_cpu_rq_locked()
  sched_ext: Provide scx_bpf_task_acquire_remote_curr()
  sched_ext: deprecation warn for scx_bpf_cpu_rq()

 kernel/sched/ext.c                       | 49 ++++++++++++++++++++++++
 tools/sched_ext/include/scx/common.bpf.h |  2 +
 2 files changed, 51 insertions(+)

-- 
2.34.1
Re: [PATCH v4 0/3] sched_ext: Harden scx_bpf_cpu_rq()
Posted by Peter Zijlstra 1 month, 3 weeks ago
On Mon, Aug 11, 2025 at 10:21:47PM +0100, Christian Loehle wrote:
> scx_bpf_cpu_rq() currently allows accessing struct rq fields without
> holding the associated rq.
> It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and
> scx_tickless. Fortunately it is only ever used to fetch rq->curr.
> So provide an alternative scx_bpf_task_acquire_remote_curr() that
> doesn't expose struct rq and provide a hardened scx_bpf_cpu_rq_locked()
> by ensuring we hold the rq lock.
> Add a deprecation warning to scx_bpf_cpu_rq_locked() that mentions the
> two alternatives.
> 
> This also simplifies scx code from:
> 
> rq = scx_bpf_cpu_rq(cpu);
> if (!rq)
> 	return;
> p = rq->curr
> if (!p)
> 	return;
> /* ... Do something with p */
> 
> into:
> 
> p = scx_bpf_task_acquire_remote_curr(cpu);
> if (!p)
> 	return;
> /* ... Do something with p */
> bpf_task_release(p);

Why do that mandatory refcount dance, rather than directly exposing the
RCU-ness of that pointer? IIRC BPF was good with RCU.
Re: [PATCH v4 0/3] sched_ext: Harden scx_bpf_cpu_rq()
Posted by Christian Loehle 1 month, 3 weeks ago
On 8/12/25 09:00, Peter Zijlstra wrote:
> On Mon, Aug 11, 2025 at 10:21:47PM +0100, Christian Loehle wrote:
>> scx_bpf_cpu_rq() currently allows accessing struct rq fields without
>> holding the associated rq.
>> It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and
>> scx_tickless. Fortunately it is only ever used to fetch rq->curr.
>> So provide an alternative scx_bpf_task_acquire_remote_curr() that
>> doesn't expose struct rq and provide a hardened scx_bpf_cpu_rq_locked()
>> by ensuring we hold the rq lock.
>> Add a deprecation warning to scx_bpf_cpu_rq_locked() that mentions the
>> two alternatives.
>>
>> This also simplifies scx code from:
>>
>> rq = scx_bpf_cpu_rq(cpu);
>> if (!rq)
>> 	return;
>> p = rq->curr
>> if (!p)
>> 	return;
>> /* ... Do something with p */
>>
>> into:
>>
>> p = scx_bpf_task_acquire_remote_curr(cpu);
>> if (!p)
>> 	return;
>> /* ... Do something with p */
>> bpf_task_release(p);
> 
> Why do that mandatory refcount dance, rather than directly exposing the
> RCU-ness of that pointer? IIRC BPF was good with RCU.

Just because that's how
bpf_task_from_pid()
bpf_task_from_vpid()
already work. I have no strong preference either way.
Apart from the above just returning
rcu_dereference(cpu_rq(cpu)->curr);
is obviously a bit less cumbersome (and yes, RCUs are exposed to BPF,
for scx most callbacks have that implicit anyway.)

I'll change it to scx_bpf_remote_curr() that does that in the next
version, thanks!
Re: [PATCH v4 0/3] sched_ext: Harden scx_bpf_cpu_rq()
Posted by Andrea Righi 1 month, 3 weeks ago
On Tue, Aug 12, 2025 at 12:40:39PM +0100, Christian Loehle wrote:
> On 8/12/25 09:00, Peter Zijlstra wrote:
> > On Mon, Aug 11, 2025 at 10:21:47PM +0100, Christian Loehle wrote:
> >> scx_bpf_cpu_rq() currently allows accessing struct rq fields without
> >> holding the associated rq.
> >> It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and
> >> scx_tickless. Fortunately it is only ever used to fetch rq->curr.
> >> So provide an alternative scx_bpf_task_acquire_remote_curr() that
> >> doesn't expose struct rq and provide a hardened scx_bpf_cpu_rq_locked()
> >> by ensuring we hold the rq lock.
> >> Add a deprecation warning to scx_bpf_cpu_rq_locked() that mentions the
> >> two alternatives.
> >>
> >> This also simplifies scx code from:
> >>
> >> rq = scx_bpf_cpu_rq(cpu);
> >> if (!rq)
> >> 	return;
> >> p = rq->curr
> >> if (!p)
> >> 	return;
> >> /* ... Do something with p */
> >>
> >> into:
> >>
> >> p = scx_bpf_task_acquire_remote_curr(cpu);
> >> if (!p)
> >> 	return;
> >> /* ... Do something with p */
> >> bpf_task_release(p);
> > 
> > Why do that mandatory refcount dance, rather than directly exposing the
> > RCU-ness of that pointer? IIRC BPF was good with RCU.
> 
> Just because that's how
> bpf_task_from_pid()
> bpf_task_from_vpid()
> already work. I have no strong preference either way.
> Apart from the above just returning
> rcu_dereference(cpu_rq(cpu)->curr);
> is obviously a bit less cumbersome (and yes, RCUs are exposed to BPF,
> for scx most callbacks have that implicit anyway.)
> 
> I'll change it to scx_bpf_remote_curr() that does that in the next
> version, thanks!

Yeah, I suggested Christian to do the refcount dance to be consistent with
bpf_task_from_[v]pid(), but we can probably mark the kfunc as KF_RCU and
rely on RCU locking to save a bit of overhead. So, it's probably better to
follow Peter's suggestion.

Thanks,
-Andrea