[PATCH] sched/ext: add scx_bpf_cgroup_report_throttled kfunc

Posted by Fernand Sieber 2 weeks ago

Add a BPF kfunc that allows SCX schedulers to report throttled time
back to the kernel's cfs_bandwidth accounting. This makes cpu.stat's
throttled_usec reflect SCX-enforced throttling, not just CFS throttling.

Without this, tools reading cpu.stat see zero throttled time when
bandwidth control is handled entirely in BPF by the SCX scheduler.

Signed-off-by: Fernand Sieber <sieberf@amazon.com>
---
 kernel/sched/ext.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 65631e577..7ebdaf75d 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -9760,6 +9760,38 @@ __bpf_kfunc struct cgroup *scx_bpf_task_cgroup(struct task_struct *p,
 }
 #endif	/* CONFIG_CGROUP_SCHED */
 
+/**
+ * scx_bpf_cgroup_report_throttled - Report throttled time for a cgroup
+ * @cgrp: target cgroup
+ * @throttled_ns: amount of throttled time in nanoseconds to add
+ *
+ * BPF schedulers implementing bandwidth control should call this to update
+ * the kernel's throttled_usec accounting in cpu.stat. Without this, tools
+ * reading cpu.stat will see zero throttled time under SCX scheduling.
+ */
+__bpf_kfunc void scx_bpf_cgroup_report_throttled(struct cgroup *cgrp,
+						  u64 throttled_ns)
+{
+#ifdef CONFIG_CFS_BANDWIDTH
+	struct task_group *tg;
+
+	rcu_read_lock();
+	tg = cgrp->subsys[cpu_cgrp_id] ?
+	     container_of(cgrp->subsys[cpu_cgrp_id], struct task_group, css) :
+	     NULL;
+	if (tg) {
+		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+
+		raw_spin_lock_irq(&cfs_b->lock);
+		cfs_b->throttled_time += throttled_ns;
+		cfs_b->nr_throttled++;
+		cfs_b->nr_periods++;
+		raw_spin_unlock_irq(&cfs_b->lock);
+	}
+	rcu_read_unlock();
+#endif
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(scx_kfunc_ids_any)
@@ -9795,6 +9827,7 @@ BTF_ID_FLAGS(func, scx_bpf_events)
 #ifdef CONFIG_CGROUP_SCHED
 BTF_ID_FLAGS(func, scx_bpf_task_cgroup, KF_IMPLICIT_ARGS | KF_RCU | KF_ACQUIRE)
 #endif
+BTF_ID_FLAGS(func, scx_bpf_cgroup_report_throttled, KF_TRUSTED_ARGS)
 BTF_KFUNCS_END(scx_kfunc_ids_any)
 
 static const struct btf_kfunc_id_set scx_kfunc_set_any = {
-- 
2.47.3

Re: [PATCH] sched/ext: add scx_bpf_cgroup_report_throttled kfunc

Posted by kernel test robot 1 week, 2 days ago

Hi Fernand,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on linus/master v7.1-rc5 next-20260528]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-ext-add-scx_bpf_cgroup_report_throttled-kfunc/20260526-032545
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20260525192432.20487-1-sieberf%40amazon.com
patch subject: [PATCH] sched/ext: add scx_bpf_cgroup_report_throttled kfunc
config: x86_64-rhel-9.4-bpf (https://download.01.org/0day-ci/archive/20260530/202605300827.2IdN367O-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260530/202605300827.2IdN367O-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605300827.2IdN367O-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `__BTF_ID__set8__scx_kfunc_ids_any':
>> build_policy.c:(.BTF_ids+0x1cc): undefined reference to `KF_TRUSTED_ARGS'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH] sched/ext: add scx_bpf_cgroup_report_throttled kfunc

Posted by kernel test robot 1 week, 4 days ago

Hi Fernand,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20260527]
[cannot apply to linus/master v6.16-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-ext-add-scx_bpf_cgroup_report_throttled-kfunc/20260526-032545
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20260525192432.20487-1-sieberf%40amazon.com
patch subject: [PATCH] sched/ext: add scx_bpf_cgroup_report_throttled kfunc
config: x86_64-rhel-9.4-bpf (https://download.01.org/0day-ci/archive/20260528/202605281242.Y7eMHdOU-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260528/202605281242.Y7eMHdOU-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605281242.Y7eMHdOU-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `__BTF_ID__set8__scx_kfunc_ids_any':
>> build_policy.c:(.BTF_ids+0x1cc): undefined reference to `KF_TRUSTED_ARGS'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH] sched/ext: add scx_bpf_cgroup_report_throttled kfunc

Posted by Tejun Heo 1 week, 5 days ago

Hello,

On Mon, May 25, 2026 at 09:24:32PM +0200, Fernand Sieber wrote:
> +__bpf_kfunc void scx_bpf_cgroup_report_throttled(struct cgroup *cgrp,
> +						  u64 throttled_ns)
> +{
> +#ifdef CONFIG_CFS_BANDWIDTH
> +	struct task_group *tg;
> +
> +	rcu_read_lock();
> +	tg = cgrp->subsys[cpu_cgrp_id] ?
> +	     container_of(cgrp->subsys[cpu_cgrp_id], struct task_group, css) :
> +	     NULL;
> +	if (tg) {
> +		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
> +
> +		raw_spin_lock_irq(&cfs_b->lock);
> +		cfs_b->throttled_time += throttled_ns;
> +		cfs_b->nr_throttled++;
> +		cfs_b->nr_periods++;
> +		raw_spin_unlock_irq(&cfs_b->lock);
> +	}
> +	rcu_read_unlock();
> +#endif

I don't think modifying fair's internal state from scx is a good idea. All
the cgroup piping is split at the interface level. I think it'd be better to
make both cfs and scx report their own numbers and then sum them up in the
output.

Thanks.

-- 
tejun
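
The alternative suggested in the review above (each scheduler class keeps its
own throttle counters and the totals are summed only when cpu.stat is
rendered) can be modelled in plain userspace C. This is a rough sketch, not
kernel code: struct throttle_stats, struct cgroup_cpu_stats,
scx_report_throttled(), sum_throttle_stats() and format_cpu_stat() are all
hypothetical names invented for illustration, and locking is omitted.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_USEC UINT64_C(1000)

/* Hypothetical per-scheduler throttle counters (not a kernel struct). */
struct throttle_stats {
	uint64_t nr_throttled;
	uint64_t throttled_ns;
};

/* Hypothetical per-cgroup container: each scheduler class owns one slot. */
struct cgroup_cpu_stats {
	struct throttle_stats cfs;	/* written only by the fair class */
	struct throttle_stats scx;	/* written only by the SCX side */
};

/* SCX-side report: touches only the scx slot, never the cfs one. */
static void scx_report_throttled(struct cgroup_cpu_stats *cs, uint64_t ns)
{
	assert(cs != NULL);
	cs->scx.nr_throttled++;
	cs->scx.throttled_ns += ns;
}

/* Combine both sources at read time. */
static struct throttle_stats sum_throttle_stats(const struct cgroup_cpu_stats *cs)
{
	struct throttle_stats total = {
		.nr_throttled = cs->cfs.nr_throttled + cs->scx.nr_throttled,
		.throttled_ns = cs->cfs.throttled_ns + cs->scx.throttled_ns,
	};

	return total;
}

/* Render the summed values using cpu.stat-style keys. */
static int format_cpu_stat(const struct cgroup_cpu_stats *cs, char *buf,
			   size_t len)
{
	struct throttle_stats total = sum_throttle_stats(cs);

	return snprintf(buf, len,
			"nr_throttled %" PRIu64 "\nthrottled_usec %" PRIu64 "\n",
			total.nr_throttled,
			total.throttled_ns / NSEC_PER_USEC);
}
```

In this model the SCX path never writes to the fair-class counters, so the
two accounting paths stay independent and only the output routine needs to
know about both.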