Add a BPF kfunc that allows SCX schedulers to report throttled time
back to the kernel's cfs_bandwidth accounting. This makes cpu.stat's
throttled_usec reflect SCX-enforced throttling, not just CFS throttling.
Without this, tools reading cpu.stat see zero throttled time when
bandwidth control is handled entirely in BPF by the SCX scheduler.
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
---
kernel/sched/ext.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 65631e577..7ebdaf75d 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -9760,6 +9760,38 @@ __bpf_kfunc struct cgroup *scx_bpf_task_cgroup(struct task_struct *p,
}
#endif /* CONFIG_CGROUP_SCHED */
+/**
+ * scx_bpf_cgroup_report_throttled - Report throttled time for a cgroup
+ * @cgrp: target cgroup
+ * @throttled_ns: amount of throttled time in nanoseconds to add
+ *
+ * BPF schedulers implementing bandwidth control should call this to update
+ * the kernel's throttled_usec accounting in cpu.stat. Without this, tools
+ * reading cpu.stat will see zero throttled time under SCX scheduling.
+ */
+__bpf_kfunc void scx_bpf_cgroup_report_throttled(struct cgroup *cgrp,
+						 u64 throttled_ns)
+{
+#ifdef CONFIG_CFS_BANDWIDTH
+	struct task_group *tg;
+
+	rcu_read_lock();
+	tg = cgrp->subsys[cpu_cgrp_id] ?
+		container_of(cgrp->subsys[cpu_cgrp_id], struct task_group, css) :
+		NULL;
+	if (tg) {
+		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+
+		raw_spin_lock_irq(&cfs_b->lock);
+		cfs_b->throttled_time += throttled_ns;
+		cfs_b->nr_throttled++;
+		cfs_b->nr_periods++;
+		raw_spin_unlock_irq(&cfs_b->lock);
+	}
+	rcu_read_unlock();
+#endif
+}
+
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(scx_kfunc_ids_any)
@@ -9795,6 +9827,7 @@ BTF_ID_FLAGS(func, scx_bpf_events)
#ifdef CONFIG_CGROUP_SCHED
BTF_ID_FLAGS(func, scx_bpf_task_cgroup, KF_IMPLICIT_ARGS | KF_RCU | KF_ACQUIRE)
#endif
+BTF_ID_FLAGS(func, scx_bpf_cgroup_report_throttled, KF_TRUSTED_ARGS)
BTF_KFUNCS_END(scx_kfunc_ids_any)
static const struct btf_kfunc_id_set scx_kfunc_set_any = {
--
2.47.3
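To make the accounting in the patch easier to follow, here is a minimal userspace sketch of what one call does to the three counters and how the result would surface as throttled_usec. It is a model, not kernel code: struct bw_stats_model and both helper names are hypothetical, locking and the cgroup-to-task_group lookup are left out, and the nanosecond-to-microsecond division mirrors how cpu.stat presents the value.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/*
 * Hypothetical stand-in for the three cfs_bandwidth fields that the
 * proposed kfunc updates. Not a kernel structure.
 */
struct bw_stats_model {
	uint64_t throttled_time;	/* accumulated nanoseconds */
	uint64_t nr_throttled;
	uint64_t nr_periods;
};

/*
 * Mirrors the body of the proposed kfunc without locking or the
 * cgroup lookup: one call adds the reported time and bumps both
 * counters by exactly one.
 */
static void model_report_throttled(struct bw_stats_model *s,
				   uint64_t throttled_ns)
{
	s->throttled_time += throttled_ns;
	s->nr_throttled++;
	s->nr_periods++;
}

/* cpu.stat reports microseconds, so the ns total is divided down. */
static uint64_t model_throttled_usec(const struct bw_stats_model *s)
{
	return s->throttled_time / NSEC_PER_USEC;
}
```

Because each call increments nr_throttled and nr_periods once, the values a reader sees in cpu.stat depend on how often the BPF scheduler chooses to report, not only on how much time it reports.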
Hi Fernand,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on linus/master v7.1-rc5 next-20260528]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-ext-add-scx_bpf_cgroup_report_throttled-kfunc/20260526-032545
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20260525192432.20487-1-sieberf%40amazon.com
patch subject: [PATCH] sched/ext: add scx_bpf_cgroup_report_throttled kfunc
config: x86_64-rhel-9.4-bpf (https://download.01.org/0day-ci/archive/20260530/202605300827.2IdN367O-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260530/202605300827.2IdN367O-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605300827.2IdN367O-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `__BTF_ID__set8__scx_kfunc_ids_any':
>> build_policy.c:(.BTF_ids+0x1cc): undefined reference to `KF_TRUSTED_ARGS'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Fernand,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20260527]
[cannot apply to linus/master v6.16-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-ext-add-scx_bpf_cgroup_report_throttled-kfunc/20260526-032545
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20260525192432.20487-1-sieberf%40amazon.com
patch subject: [PATCH] sched/ext: add scx_bpf_cgroup_report_throttled kfunc
config: x86_64-rhel-9.4-bpf (https://download.01.org/0day-ci/archive/20260528/202605281242.Y7eMHdOU-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260528/202605281242.Y7eMHdOU-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605281242.Y7eMHdOU-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `__BTF_ID__set8__scx_kfunc_ids_any':
>> build_policy.c:(.BTF_ids+0x1cc): undefined reference to `KF_TRUSTED_ARGS'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hello,
On Mon, May 25, 2026 at 09:24:32PM +0200, Fernand Sieber wrote:
> +__bpf_kfunc void scx_bpf_cgroup_report_throttled(struct cgroup *cgrp,
> +						 u64 throttled_ns)
> +{
> +#ifdef CONFIG_CFS_BANDWIDTH
> +	struct task_group *tg;
> +
> +	rcu_read_lock();
> +	tg = cgrp->subsys[cpu_cgrp_id] ?
> +		container_of(cgrp->subsys[cpu_cgrp_id], struct task_group, css) :
> +		NULL;
> +	if (tg) {
> +		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
> +
> +		raw_spin_lock_irq(&cfs_b->lock);
> +		cfs_b->throttled_time += throttled_ns;
> +		cfs_b->nr_throttled++;
> +		cfs_b->nr_periods++;
> +		raw_spin_unlock_irq(&cfs_b->lock);
> +	}
> +	rcu_read_unlock();
> +#endif
I don't think modifying fair's internal state from scx is a good idea. All
the cgroup piping is split at the interface level. I think it'd be better to
make both cfs and scx report their own numbers and then sum them up in the
output.
Thanks.
--
tejun
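The alternative suggested in the review, where each scheduler class keeps its own numbers and the output sums them, could look roughly like the userspace sketch below. Everything here is illustrative: struct throttle_stats, struct tg_model and the three helpers are hypothetical names, not existing kernel interfaces, and locking is omitted. The only point is the ownership split: fair code writes one set of counters, sched_ext code writes another, and only the reporting path reads both.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-class throttling counters. */
struct throttle_stats {
	uint64_t nr_periods;
	uint64_t nr_throttled;
	uint64_t throttled_ns;
};

/* Hypothetical task-group model with one counter set per class. */
struct tg_model {
	struct throttle_stats cfs;	/* written only by fair code */
	struct throttle_stats scx;	/* written only by sched_ext code */
};

/* Fair-side accounting touches only its own counters. */
static void cfs_model_account(struct tg_model *tg, uint64_t ns)
{
	tg->cfs.throttled_ns += ns;
	tg->cfs.nr_throttled++;
	tg->cfs.nr_periods++;
}

/* sched_ext-side reporting never reaches into the fair counters. */
static void scx_model_report(struct tg_model *tg, uint64_t ns)
{
	tg->scx.throttled_ns += ns;
	tg->scx.nr_throttled++;
	tg->scx.nr_periods++;
}

/* The output path is the single place where both sets are combined. */
static struct throttle_stats tg_model_total(const struct tg_model *tg)
{
	struct throttle_stats sum;

	sum.nr_periods = tg->cfs.nr_periods + tg->scx.nr_periods;
	sum.nr_throttled = tg->cfs.nr_throttled + tg->scx.nr_throttled;
	sum.throttled_ns = tg->cfs.throttled_ns + tg->scx.throttled_ns;
	return sum;
}
```

With this split, neither side needs the other's lock or struct layout, which is the separation at the interface level that the review asks for.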