A WARN fires when systemd's user manager writes "+cpu +memory +pids" to
its own subtree_control while a sched_ext scheduler is loaded:

  WARNING: at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0
    scx_cgroup_move_task+0xa8/0xb0
    sched_move_task+0x134/0x290
    cpu_cgroup_attach+0x39/0x70
    cgroup_migrate_execute+0x37d/0x450
    cgroup_update_dfl_csses+0x1e3/0x270
    cgroup_subtree_control_write+0x3e7/0x440

scx_cgroup_can_attach() arms cgrp_moving_from only when a task's cpu
cgroup changes. It can still be NULL when scx_cgroup_move_task() runs,
through this sequence:

  Step                               Result
  ---------------------------------  ----------------------------------
  1. cpu enabled on cgroup G         cpu css = A
  2. cpu toggled off then on for G   A killed, B created (same cgroup)
  3. an exiting task keeps A alive   migration skips it, A now stale
  4. +memory migrates G              stale A vs current B pulls cpu in
  5. cpu attach runs for all tasks   hits a live, cpu-unchanged task
  6. scx_cgroup_move_task() on it    cgrp_moving_from NULL -> WARN

The mismatch is that scx_cgroup_can_attach() keys on cgroup identity
while migration drives the move on css identity, so a NULL cgrp_moving_from
here is a legitimate css-only migration, not a missing prep.

The call is already gated on cgrp_moving_from, so just drop the warning.
ops.cgroup_prep_move() and ops.cgroup_move() stay paired.

Fixes: 819513666966 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Matt Fleming <mfleming@cloudflare.com>
Closes: https://lore.kernel.org/all/20260601124156.2205704-1-mfleming@cloudflare.com/
Signed-off-by: Tejun Heo <tj@kernel.org>
---
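The pairing argument in the changelog can be checked with a small
userspace model. This is only a sketch, not kernel code: every model_*
type and function below is a hypothetical stand-in, and the clearing of
cgrp_moving_from after the move is an assumption about the surrounding
function rather than something this hunk shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the real kernel types. */
struct model_cgroup { const char *name; };

/* A cpu css belongs to one cgroup; one cgroup can see several csses over time. */
struct model_css { struct model_cgroup *cgrp; };

struct model_task {
	struct model_css *cpu_css;             /* css the task is attached to */
	struct model_cgroup *cgrp_moving_from; /* armed by prep, cleared by move */
};

/* Counters standing in for the BPF scheduler's callbacks. */
struct model_ops { int prep_move_calls; int move_calls; };

/* Models scx_cgroup_can_attach(): arm only when the cgroup itself changes. */
static void model_can_attach(struct model_ops *ops, struct model_task *t,
			     struct model_css *dst)
{
	if (t->cpu_css->cgrp == dst->cgrp)
		return;				/* css-only migration: no prep */
	ops->prep_move_calls++;			/* ops.cgroup_prep_move() */
	t->cgrp_moving_from = t->cpu_css->cgrp;
}

/* Models the patched scx_cgroup_move_task(): gate on the field, no WARN. */
static void model_move_task(struct model_ops *ops, struct model_task *t,
			    struct model_css *dst)
{
	if (t->cgrp_moving_from)
		ops->move_calls++;		/* ops.cgroup_move() */
	t->cgrp_moving_from = NULL;		/* assumed reset after the move */
	t->cpu_css = dst;
}

/* Migration is driven per css, so both steps run whatever the cgroup is. */
static void model_migrate(struct model_ops *ops, struct model_task *t,
			  struct model_css *dst)
{
	model_can_attach(ops, t, dst);
	model_move_task(ops, t, dst);
	assert(ops->prep_move_calls == ops->move_calls);	/* stay paired */
}
```

In this model, migrating a task from css A to css B of the same cgroup G
leaves both counters at 0, while migrating it to a css of a different
cgroup H bumps both to 1, so the two callbacks never go out of step.
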
kernel/sched/ext.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 012ca8b..a1f7698 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4293,11 +4293,13 @@ void scx_cgroup_move_task(struct task_struct *p)
 		return;
 
 	/*
-	 * @p must have ops.cgroup_prep_move() called on it and thus
-	 * cgrp_moving_from set.
+	 * scx_cgroup_can_attach() sets cgrp_moving_from only when the task's
+	 * cgroup changes. Migration keys off css rather than cgroup identity,
+	 * so it can hand an unchanged-cgroup task here with cgrp_moving_from
+	 * NULL. Nothing to report to the BPF scheduler then, so skip it and
+	 * keep prep_move and move paired.
 	 */
-	if (SCX_HAS_OP(sch, cgroup_move) &&
-	    !WARN_ON_ONCE(!p->scx.cgrp_moving_from))
+	if (SCX_HAS_OP(sch, cgroup_move) && p->scx.cgrp_moving_from)
 		SCX_CALL_OP_TASK(sch, cgroup_move, task_rq(p),
 				 p, p->scx.cgrp_moving_from,
 				 tg_cgrp(task_group(p)));

Applied to sched_ext/for-7.1-fixes.

Thanks.

--
tejun

Hello,
On Mon, Jun 01, 2026 at 09:22:37AM -1000, Tejun Heo wrote:
> A WARN fires when systemd's user manager writes "+cpu +memory +pids" to
> its own subtree_control while a sched_ext scheduler is loaded:
>
> WARNING: at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0
> scx_cgroup_move_task+0xa8/0xb0
> sched_move_task+0x134/0x290
> cpu_cgroup_attach+0x39/0x70
> cgroup_migrate_execute+0x37d/0x450
> cgroup_update_dfl_csses+0x1e3/0x270
> cgroup_subtree_control_write+0x3e7/0x440
>
> scx_cgroup_can_attach() arms cgrp_moving_from only when a task's cpu
> cgroup changes. It can still be NULL when scx_cgroup_move_task() runs,
> through this sequence:
>
> Step Result
> --------------------------------- ----------------------------------
> 1. cpu enabled on cgroup G cpu css = A
> 2. cpu toggled off then on for G A killed, B created (same cgroup)
> 3. an exiting task keeps A alive migration skips it, A now stale
> 4. +memory migrates G stale A vs current B pulls cpu in
> 5. cpu attach runs for all tasks hits a live, cpu-unchanged task
> 6. scx_cgroup_move_task() on it cgrp_moving_from NULL -> WARN
>
> The mismatch is that scx_cgroup_can_attach() keys on cgroup identity
> while migration drives the move on css identity, so a NULL cgrp_moving_from
> here is a legitimate css-only migration, not a missing prep.
>
> The call is already gated on cgrp_moving_from, so just drop the warning.
> ops.cgroup_prep_move() and ops.cgroup_move() stay paired.
>
> Fixes: 819513666966 ("sched_ext: Add cgroup support")
> Cc: stable@vger.kernel.org # v6.12+
> Reported-by: Matt Fleming <mfleming@cloudflare.com>
> Closes: https://lore.kernel.org/all/20260601124156.2205704-1-mfleming@cloudflare.com/
> Signed-off-by: Tejun Heo <tj@kernel.org>
Makes sense to me.
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Thanks,
-Andrea