kernel/sched/ext.c | 49 ++++++++++++++++++++++++ tools/sched_ext/include/scx/common.bpf.h | 2 + 2 files changed, 51 insertions(+)
scx_bpf_cpu_rq() currently allows accessing struct rq fields without holding the associated rq. It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and scx_tickless. Fortunately it is only ever used to fetch rq->curr. So provide an alternative scx_bpf_task_acquire_remote_curr() that doesn't expose struct rq and provide a hardened scx_bpf_cpu_rq_locked() by ensuring we hold the rq lock. Add a deprecation warning to scx_bpf_cpu_rq_locked() that mentions the two alternatives. This also simplifies scx code from: rq = scx_bpf_cpu_rq(cpu); if (!rq) return; p = rq->curr if (!p) return; /* ... Do something with p */ into: p = scx_bpf_task_acquire_remote_curr(cpu); if (!p) return; /* ... Do something with p */ bpf_task_release(p); v3: https://lore.kernel.org/lkml/20250805111036.130121-1-christian.loehle@arm.com/ Don't change scx_bpf_cpu_rq() do not break BPF schedulers without the grace period. Just add the deprecation warning and do the hardening in the new scx_bpf_cpu_rq_locked(). (Andrea, Tejun, Jake) v2: https://lore.kernel.org/lkml/20250804112743.711816-1-christian.loehle@arm.com/ - Open-code bpf_task_acquire() to avoid the forward declaration (Andrea) - Rename scx_bpf_task_acquire_remote_curr() to make it more explicit it behaves like bpf_task_acquire() - Dis v1: https://lore.kernel.org/lkml/20250801141741.355059-1-christian.loehle@arm.com/ - scx_bpf_cpu_rq() now errors when a not locked rq is requested. (Andrea) - scx_bpf_remote_curr() calls bpf_task_acquire() which BPF user needs to release. (Andrea) Christian Loehle (3): sched_ext: Introduce scx_bpf_cpu_rq_locked() sched_ext: Provide scx_bpf_task_acquire_remote_curr() sched_ext: deprecation warn for scx_bpf_cpu_rq() kernel/sched/ext.c | 49 ++++++++++++++++++++++++ tools/sched_ext/include/scx/common.bpf.h | 2 + 2 files changed, 51 insertions(+) -- 2.34.1
On Mon, Aug 11, 2025 at 10:21:47PM +0100, Christian Loehle wrote: > scx_bpf_cpu_rq() currently allows accessing struct rq fields without > holding the associated rq. > It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and > scx_tickless. Fortunately it is only ever used to fetch rq->curr. > So provide an alternative scx_bpf_task_acquire_remote_curr() that > doesn't expose struct rq and provide a hardened scx_bpf_cpu_rq_locked() > by ensuring we hold the rq lock. > Add a deprecation warning to scx_bpf_cpu_rq_locked() that mentions the > two alternatives. > > This also simplifies scx code from: > > rq = scx_bpf_cpu_rq(cpu); > if (!rq) > return; > p = rq->curr > if (!p) > return; > /* ... Do something with p */ > > into: > > p = scx_bpf_task_acquire_remote_curr(cpu); > if (!p) > return; > /* ... Do something with p */ > bpf_task_release(p); Why do that mandatory refcount dance, rather than directly exposing the RCU-ness of that pointer? IIRC BPF was good with RCU.
On 8/12/25 09:00, Peter Zijlstra wrote: > On Mon, Aug 11, 2025 at 10:21:47PM +0100, Christian Loehle wrote: >> scx_bpf_cpu_rq() currently allows accessing struct rq fields without >> holding the associated rq. >> It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and >> scx_tickless. Fortunately it is only ever used to fetch rq->curr. >> So provide an alternative scx_bpf_task_acquire_remote_curr() that >> doesn't expose struct rq and provide a hardened scx_bpf_cpu_rq_locked() >> by ensuring we hold the rq lock. >> Add a deprecation warning to scx_bpf_cpu_rq_locked() that mentions the >> two alternatives. >> >> This also simplifies scx code from: >> >> rq = scx_bpf_cpu_rq(cpu); >> if (!rq) >> return; >> p = rq->curr >> if (!p) >> return; >> /* ... Do something with p */ >> >> into: >> >> p = scx_bpf_task_acquire_remote_curr(cpu); >> if (!p) >> return; >> /* ... Do something with p */ >> bpf_task_release(p); > > Why do that mandatory refcount dance, rather than directly exposing the > RCU-ness of that pointer? IIRC BPF was good with RCU. Just because that's how bpf_task_from_pid() bpf_task_from_vpid() already work. I have no strong preference either way. Apart from the above just returning rcu_dereference(cpu_rq(cpu)->curr); is obviously a bit less cumbersome (and yes, RCUs are exposed to BPF, for scx most callbacks have that implicit anyway.) I'll change it to scx_bpf_remote_curr() that does that in the next version, thanks!
On Tue, Aug 12, 2025 at 12:40:39PM +0100, Christian Loehle wrote: > On 8/12/25 09:00, Peter Zijlstra wrote: > > On Mon, Aug 11, 2025 at 10:21:47PM +0100, Christian Loehle wrote: > >> scx_bpf_cpu_rq() currently allows accessing struct rq fields without > >> holding the associated rq. > >> It is being used by scx_cosmos, scx_flash, scx_lavd, scx_layered, and > >> scx_tickless. Fortunately it is only ever used to fetch rq->curr. > >> So provide an alternative scx_bpf_task_acquire_remote_curr() that > >> doesn't expose struct rq and provide a hardened scx_bpf_cpu_rq_locked() > >> by ensuring we hold the rq lock. > >> Add a deprecation warning to scx_bpf_cpu_rq_locked() that mentions the > >> two alternatives. > >> > >> This also simplifies scx code from: > >> > >> rq = scx_bpf_cpu_rq(cpu); > >> if (!rq) > >> return; > >> p = rq->curr > >> if (!p) > >> return; > >> /* ... Do something with p */ > >> > >> into: > >> > >> p = scx_bpf_task_acquire_remote_curr(cpu); > >> if (!p) > >> return; > >> /* ... Do something with p */ > >> bpf_task_release(p); > > > > Why do that mandatory refcount dance, rather than directly exposing the > > RCU-ness of that pointer? IIRC BPF was good with RCU. > > Just because that's how > bpf_task_from_pid() > bpf_task_from_vpid() > already work. I have no strong preference either way. > Apart from the above just returning > rcu_dereference(cpu_rq(cpu)->curr); > is obviously a bit less cumbersome (and yes, RCUs are exposed to BPF, > for scx most callbacks have that implicit anyway.) > > I'll change it to scx_bpf_remote_curr() that does that in the next > version, thanks! Yeah, I suggested Christian to do the refcount dance to be consistent with bpf_task_from_[v]pid(), but we can probably mark the kfunc as KF_RCU and rely on RCU locking to save a bit of overhead. So, it's probably better to follow Peter's suggestion. Thanks, -Andrea
© 2016 - 2025 Red Hat, Inc.