SCX tasks re-dispatched from BPF (e.g., after bandwidth throttling)
bypass enqueue_task() and go directly through dispatch_enqueue().
This skips sched_info_enqueue(), leaving last_queued at 0, which
prevents run_delay from accumulating in /proc/<pid>/schedstat.

Add sched_info_enqueue() in dispatch_enqueue() when last_queued is
not already set. This ensures run_delay correctly reflects the time
a task spends waiting for a CPU after being dispatched, including
time spent in BPF-managed throttle queues.

Without this fix, schedstat shows frozen run_delay values for SCX
tasks that go through throttle/unthrottle cycles.

Signed-off-by: Fernand Sieber <sieberf@amazon.com>
---
kernel/sched/ext.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 7ebdaf75d..827a96e39 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1525,6 +1525,10 @@ static void dispatch_enqueue(struct scx_sched *sch, struct rq *rq,
 	WARN_ON_ONCE((p->scx.dsq_flags & SCX_TASK_DSQ_ON_PRIQ) ||
 		     !RB_EMPTY_NODE(&p->scx.dsq_priq));
 
+	/* Track queue time for schedstat run_delay accounting */
+	if (!p->sched_info.last_queued)
+		sched_info_enqueue(task_rq(p), p);
+
 	if (!is_local) {
 		raw_spin_lock_nested(&dsq->lock,
 			(enq_flags & SCX_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0);
--
2.47.3
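
For readers following the commit message, the accounting problem it describes can be sketched as a small userspace model. This is not kernel code: the struct and the model_* names are hypothetical, the timestamps are made up, and the two helpers only approximate what sched_info_enqueue() and sched_info_arrive() are understood to do (stamp last_queued if unset, then add the elapsed time to run_delay on arrival).

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-task sched_info fields. */
struct model_sched_info {
	uint64_t last_queued;	/* timestamp when queued, 0 if not stamped */
	uint64_t run_delay;	/* accumulated time spent waiting for a CPU */
};

/* Models the enqueue side: stamp the queue time only if not already set. */
void model_enqueue(struct model_sched_info *si, uint64_t now)
{
	if (!si->last_queued)
		si->last_queued = now;
}

/* Models the arrival side: account waiting time when the task gets a CPU. */
void model_arrive(struct model_sched_info *si, uint64_t now)
{
	if (!si->last_queued)
		return;		/* no timestamp, so nothing is accounted */
	si->run_delay += now - si->last_queued;
	si->last_queued = 0;
}

/*
 * Assumed scenario: a task is re-dispatched at t=100 without passing
 * through the normal enqueue path, then gets a CPU at t=350.
 * stamp_on_dispatch selects whether the dispatch path stamps last_queued.
 */
uint64_t model_redispatch_delay(int stamp_on_dispatch)
{
	struct model_sched_info si = { 0, 0 };

	if (stamp_on_dispatch)
		model_enqueue(&si, 100);
	model_arrive(&si, 350);
	return si.run_delay;
}
```

In this model, model_redispatch_delay(0) leaves run_delay at 0 (the "frozen" symptom), while model_redispatch_delay(1) accounts the 250 time units between dispatch and arrival. The model says nothing about whether that interval is a meaningful measure for every BPF scheduler.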
On Mon, May 25, 2026 at 09:19:42PM +0200, Fernand Sieber wrote:
> SCX tasks re-dispatched from BPF (e.g., after bandwidth throttling)
> bypass enqueue_task() and go directly through dispatch_enqueue().
> This skips sched_info_enqueue(), leaving last_queued at 0, which
> prevents run_delay from accumulating in /proc/<pid>/schedstat.
>
> Add sched_info_enqueue() in dispatch_enqueue() when last_queued is
> not already set. This ensures run_delay correctly reflects the time
> a task spends waiting for a CPU after being dispatched, including
> time spent in BPF-managed throttle queues.
>
> Without this fix, schedstat shows frozen run_delay values for SCX
> tasks that go through throttle/unthrottle cycles.
>
> Signed-off-by: Fernand Sieber <sieberf@amazon.com>
> ---
>  kernel/sched/ext.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 7ebdaf75d..827a96e39 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -1525,6 +1525,10 @@ static void dispatch_enqueue(struct scx_sched *sch, struct rq *rq,
>  	WARN_ON_ONCE((p->scx.dsq_flags & SCX_TASK_DSQ_ON_PRIQ) ||
>  		     !RB_EMPTY_NODE(&p->scx.dsq_priq));
>
> +	/* Track queue time for schedstat run_delay accounting */
> +	if (!p->sched_info.last_queued)
> +		sched_info_enqueue(task_rq(p), p);

I don't think this works. A DSQ can be used for throttling too and a BPF
data structure can be used for non-throttling queueing too. I don't see how
doing the above unconditionally would capture something meaningful reliably.

Thanks.

--
tejun
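
To observe the run_delay behaviour discussed in this thread, one option is to sample /proc/<pid>/schedstat before and after a throttle cycle and compare the second field. The sketch below assumes the documented three-field layout (time on CPU in ns, time waiting on a runqueue in ns, number of timeslices); the struct and function names are hypothetical and not taken from the patch.

```c
#include <stdio.h>

/* Hypothetical container for one /proc/<pid>/schedstat sample. */
struct schedstat_sample {
	unsigned long long run_time;	/* ns spent on a CPU */
	unsigned long long run_delay;	/* ns spent waiting on a runqueue */
	unsigned long long pcount;	/* number of timeslices run */
};

/* Parse one schedstat line; returns 0 on success, -1 on malformed input. */
int parse_schedstat(const char *line, struct schedstat_sample *out)
{
	if (sscanf(line, "%llu %llu %llu",
		   &out->run_time, &out->run_delay, &out->pcount) != 3)
		return -1;
	return 0;
}

/* Read /proc/<pid>/schedstat for the given pid; returns 0 on success. */
int read_schedstat(int pid, struct schedstat_sample *out)
{
	char path[64], line[128];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/schedstat", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* file missing or not readable */
	if (fgets(line, sizeof(line), f))
		ret = parse_schedstat(line, out);
	fclose(f);
	return ret;
}
```

Calling read_schedstat() twice on the same pid and subtracting the run_delay fields gives the delay accounted in between. A difference of zero across a known waiting period would match the symptom described in the patch.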