[PATCH] sched_ext: idle: honor built-in idle disablement in node kfuncs
Posted by Joseph Salisbury 1 week, 4 days ago
The node-aware idle kfunc helpers validate per-node idle tracking, but they
don't check whether built-in idle tracking itself is enabled.

As a result, when ops.update_idle() disables built-in idle tracking, the
node helpers can still read per-node idle masks and attempt idle CPU
selection.  This violates the documented behavior and can expose stale
idle state to BPF schedulers.

Fix this by checking check_builtin_idle_enabled() in the node mask getters
and in scx_bpf_pick_idle_cpu_node(), matching the behavior of the non-node
helpers.

scx_bpf_pick_any_cpu_node() is a different case: when built-in idle
tracking is disabled, it should skip idle selection and fall back directly
to the any-CPU path.  Make it do so, matching scx_bpf_pick_any_cpu().

Fixes: 01059219b0cf ("sched_ext: idle: Introduce node-aware idle cpu kfunc helpers")
Cc: stable@vger.kernel.org # v6.15+
Assisted-by: Codex:GPT-5
Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com>
---
 kernel/sched/ext_idle.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index ba298ac3ce6c..948f6b4f8ab5 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -1082,6 +1082,9 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask_node(int node)
 	if (node < 0)
 		return cpu_none_mask;
 
+	if (!check_builtin_idle_enabled(sch))
+		return cpu_none_mask;
+
 	return idle_cpumask(node)->cpu;
 }
 
@@ -1137,6 +1140,9 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_smtmask_node(int node)
 	if (node < 0)
 		return cpu_none_mask;
 
+	if (!check_builtin_idle_enabled(sch))
+		return cpu_none_mask;
+
 	if (sched_smt_active())
 		return idle_cpumask(node)->smt;
 	else
@@ -1253,6 +1259,9 @@ __bpf_kfunc s32 scx_bpf_pick_idle_cpu_node(const struct cpumask *cpus_allowed,
 	if (node < 0)
 		return node;
 
+	if (!check_builtin_idle_enabled(sch))
+		return -EBUSY;
+
 	return scx_pick_idle_cpu(cpus_allowed, node, flags);
 }
 
@@ -1337,9 +1346,11 @@ __bpf_kfunc s32 scx_bpf_pick_any_cpu_node(const struct cpumask *cpus_allowed,
 	if (node < 0)
 		return node;
 
-	cpu = scx_pick_idle_cpu(cpus_allowed, node, flags);
-	if (cpu >= 0)
-		return cpu;
+	if (static_branch_likely(&scx_builtin_idle_enabled)) {
+		cpu = scx_pick_idle_cpu(cpus_allowed, node, flags);
+		if (cpu >= 0)
+			return cpu;
+	}
 
 	if (flags & SCX_PICK_IDLE_IN_NODE)
 		cpu = cpumask_any_and_distribute(cpumask_of_node(node), cpus_allowed);
-- 
2.47.3
Re: [PATCH] sched_ext: idle: honor built-in idle disablement in node kfuncs
Posted by Andrea Righi 1 week, 3 days ago
Hi Joe,

On Tue, Mar 24, 2026 at 03:42:35PM -0400, Joseph Salisbury wrote:
> The node-aware idle kfunc helpers validate per-node idle tracking, but they
> don't check whether built-in idle tracking itself is enabled.
> 
> As a result, when ops.update_idle() disables built-in idle tracking, the
> node helpers can still read per-node idle masks and attempt idle CPU
> selection.  This violates the documented behavior and can expose stale
> idle state to BPF schedulers.
> 
> Fix this by checking check_builtin_idle_enabled() in the node mask getters
> and in scx_bpf_pick_idle_cpu_node(), matching the behavior of the non-node
> helpers.
> 
> scx_bpf_pick_any_cpu_node() is a different case: when built-in idle
> tracking is disabled, it should skip idle selection and fall back directly
> to the any-CPU path.  Make it do so, matching scx_bpf_pick_any_cpu().
> 
> Fixes: 01059219b0cf ("sched_ext: idle: Introduce node-aware idle cpu kfunc helpers")
> Cc: stable@vger.kernel.org # v6.15+
> Assisted-by: Codex:GPT-5
> Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com>

We are already validating this at load time, see validate_ops():
...
        /*
         * SCX_OPS_BUILTIN_IDLE_PER_NODE requires built-in CPU idle
         * selection policy to be enabled.
         */
        if ((ops->flags & SCX_OPS_BUILTIN_IDLE_PER_NODE) &&
            (ops->update_idle && !(ops->flags & SCX_OPS_KEEP_BUILTIN_IDLE))) {
                scx_error(sch, "SCX_OPS_BUILTIN_IDLE_PER_NODE requires CPU idle selection enabled");
                return -EINVAL;
        }
...

In practice you can't have SCX_OPS_BUILTIN_IDLE_PER_NODE set without
built-in idle enabled if a scheduler is running and we are checking for
SCX_OPS_BUILTIN_IDLE_PER_NODE in validate_node(). So I think these extra
checks are not needed.
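
IOW, from the BPF side the only configuration that can legitimately reach
the node kfuncs looks roughly like this (illustrative sketch relying on the
scx BPF headers; the scheduler and callback names are made up):

	void BPF_STRUCT_OPS(example_update_idle, s32 cpu, bool idle)
	{
		/* custom idle tracking would go here */
	}

	SEC(".struct_ops.link")
	struct sched_ext_ops example_ops = {
		/* ops.update_idle() alone would disable built-in idle tracking... */
		.update_idle	= (void *)example_update_idle,
		/* ...so KEEP_BUILTIN_IDLE is required with BUILTIN_IDLE_PER_NODE */
		.flags		= SCX_OPS_BUILTIN_IDLE_PER_NODE |
				  SCX_OPS_KEEP_BUILTIN_IDLE,
		.name		= "example",
	};

Drop SCX_OPS_KEEP_BUILTIN_IDLE from that combination and validate_ops()
fails the load; drop SCX_OPS_BUILTIN_IDLE_PER_NODE and the node kfuncs are
rejected by validate_node() instead.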

Thanks,
-Andrea

> ---
>  kernel/sched/ext_idle.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> index ba298ac3ce6c..948f6b4f8ab5 100644
> --- a/kernel/sched/ext_idle.c
> +++ b/kernel/sched/ext_idle.c
> @@ -1082,6 +1082,9 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask_node(int node)
>  	if (node < 0)
>  		return cpu_none_mask;
>  
> +	if (!check_builtin_idle_enabled(sch))
> +		return cpu_none_mask;
> +
>  	return idle_cpumask(node)->cpu;
>  }
>  
> @@ -1137,6 +1140,9 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_smtmask_node(int node)
>  	if (node < 0)
>  		return cpu_none_mask;
>  
> +	if (!check_builtin_idle_enabled(sch))
> +		return cpu_none_mask;
> +
>  	if (sched_smt_active())
>  		return idle_cpumask(node)->smt;
>  	else
> @@ -1253,6 +1259,9 @@ __bpf_kfunc s32 scx_bpf_pick_idle_cpu_node(const struct cpumask *cpus_allowed,
>  	if (node < 0)
>  		return node;
>  
> +	if (!check_builtin_idle_enabled(sch))
> +		return -EBUSY;
> +
>  	return scx_pick_idle_cpu(cpus_allowed, node, flags);
>  }
>  
> @@ -1337,9 +1346,11 @@ __bpf_kfunc s32 scx_bpf_pick_any_cpu_node(const struct cpumask *cpus_allowed,
>  	if (node < 0)
>  		return node;
>  
> -	cpu = scx_pick_idle_cpu(cpus_allowed, node, flags);
> -	if (cpu >= 0)
> -		return cpu;
> +	if (static_branch_likely(&scx_builtin_idle_enabled)) {
> +		cpu = scx_pick_idle_cpu(cpus_allowed, node, flags);
> +		if (cpu >= 0)
> +			return cpu;
> +	}
>  
>  	if (flags & SCX_PICK_IDLE_IN_NODE)
>  		cpu = cpumask_any_and_distribute(cpumask_of_node(node), cpus_allowed);
> -- 
> 2.47.3
>
Re: [External] : Re: [PATCH] sched_ext: idle: honor built-in idle disablement in node kfuncs
Posted by Joseph Salisbury 1 week, 2 days ago

On 3/25/26 6:22 PM, Andrea Righi wrote:
> Hi Joe,
>
> On Tue, Mar 24, 2026 at 03:42:35PM -0400, Joseph Salisbury wrote:
>> The node-aware idle kfunc helpers validate per-node idle tracking, but they
>> don't check whether built-in idle tracking itself is enabled.
>>
>> As a result, when ops.update_idle() disables built-in idle tracking, the
>> node helpers can still read per-node idle masks and attempt idle CPU
>> selection.  This violates the documented behavior and can expose stale
>> idle state to BPF schedulers.
>>
>> Fix this by checking check_builtin_idle_enabled() in the node mask getters
>> and in scx_bpf_pick_idle_cpu_node(), matching the behavior of the non-node
>> helpers.
>>
>> scx_bpf_pick_any_cpu_node() is a different case: when built-in idle
>> tracking is disabled, it should skip idle selection and fall back directly
>> to the any-CPU path.  Make it do so, matching scx_bpf_pick_any_cpu().
>>
>> Fixes: 01059219b0cf ("sched_ext: idle: Introduce node-aware idle cpu kfunc helpers")
>> Cc: stable@vger.kernel.org # v6.15+
>> Assisted-by: Codex:GPT-5
>> Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com>
> We are already validating this at load time, see validate_ops():
> ...
>         /*
>           * SCX_OPS_BUILTIN_IDLE_PER_NODE requires built-in CPU idle
>           * selection policy to be enabled.
>           */
>          if ((ops->flags & SCX_OPS_BUILTIN_IDLE_PER_NODE) &&
>              (ops->update_idle && !(ops->flags & SCX_OPS_KEEP_BUILTIN_IDLE))) {
>                  scx_error(sch, "SCX_OPS_BUILTIN_IDLE_PER_NODE requires CPU idle selection enabled");
>                  return -EINVAL;
>          }
> ...
>
> In practice you can't have SCX_OPS_BUILTIN_IDLE_PER_NODE set without
> built-in idle enabled if a scheduler is running and we are checking for
> SCX_OPS_BUILTIN_IDLE_PER_NODE in validate_node(). So I think these extra
> checks are not needed.
>
> Thanks,
> -Andrea

Hi Andrea,

Thanks for the review.  I missed the validate_ops() check and focused on 
the helper-side behavior.

SCX_OPS_BUILTIN_IDLE_PER_NODE is rejected when ops.update_idle() 
disables built-in idle unless SCX_OPS_KEEP_BUILTIN_IDLE is set, so the 
state I was trying to guard is not reachable for a running scheduler.  
That makes the added checks unnecessary.

I thought scx_bpf_pick_any_cpu_node() should mirror scx_bpf_pick_any_cpu()
and fall back when built-in idle is disabled.  However, that only makes
sense for the non-node helper, where running with built-in idle disabled is
a valid configuration.  For the per-node case, validate_ops() rejects that
combination, so there is no runtime case to handle.
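
(For reference, the non-node helper's flow is roughly the following; this
is a paraphrase from memory, not a verbatim copy of mainline:

	if (static_branch_likely(&scx_builtin_idle_enabled)) {
		cpu = scx_pick_idle_cpu(cpus_allowed, NUMA_NO_NODE, flags);
		if (cpu >= 0)
			return cpu;
	}

	/* built-in idle disabled or nothing idle: pick any allowed CPU */
	cpu = cpumask_any_distribute(cpus_allowed);
	return cpu < nr_cpu_ids ? cpu : -EBUSY;

so the fall-back-when-disabled case only exists there.)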

While looking at this, I noticed the comment above 
scx_bpf_pick_any_cpu_node() still describes that unreachable 
built-in-idle-disabled fallback case (Line 1364 in ext_idle.c in 
mainline).  I can create a comment-only cleanup to align the comment 
with the current behavior.  Do you think that is worth sending? Maybe 
something like this:


- * If ops.update_idle() is implemented and %SCX_OPS_KEEP_BUILTIN_IDLE is not
- * set, this function can't tell which CPUs are idle and will always pick any
- * CPU.
+ * %SCX_OPS_BUILTIN_IDLE_PER_NODE requires built-in idle tracking, so
+ * this helper always attempts node-aware idle selection before falling
+ * back to picking any CPU.


Thanks for the explanation, and sorry for the noise.

Thanks,

Joe

>
>> ---
>>   kernel/sched/ext_idle.c | 17 ++++++++++++++---
>>   1 file changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
>> index ba298ac3ce6c..948f6b4f8ab5 100644
>> --- a/kernel/sched/ext_idle.c
>> +++ b/kernel/sched/ext_idle.c
>> @@ -1082,6 +1082,9 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask_node(int node)
>>   	if (node < 0)
>>   		return cpu_none_mask;
>>   
>> +	if (!check_builtin_idle_enabled(sch))
>> +		return cpu_none_mask;
>> +
>>   	return idle_cpumask(node)->cpu;
>>   }
>>   
>> @@ -1137,6 +1140,9 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_smtmask_node(int node)
>>   	if (node < 0)
>>   		return cpu_none_mask;
>>   
>> +	if (!check_builtin_idle_enabled(sch))
>> +		return cpu_none_mask;
>> +
>>   	if (sched_smt_active())
>>   		return idle_cpumask(node)->smt;
>>   	else
>> @@ -1253,6 +1259,9 @@ __bpf_kfunc s32 scx_bpf_pick_idle_cpu_node(const struct cpumask *cpus_allowed,
>>   	if (node < 0)
>>   		return node;
>>   
>> +	if (!check_builtin_idle_enabled(sch))
>> +		return -EBUSY;
>> +
>>   	return scx_pick_idle_cpu(cpus_allowed, node, flags);
>>   }
>>   
>> @@ -1337,9 +1346,11 @@ __bpf_kfunc s32 scx_bpf_pick_any_cpu_node(const struct cpumask *cpus_allowed,
>>   	if (node < 0)
>>   		return node;
>>   
>> -	cpu = scx_pick_idle_cpu(cpus_allowed, node, flags);
>> -	if (cpu >= 0)
>> -		return cpu;
>> +	if (static_branch_likely(&scx_builtin_idle_enabled)) {
>> +		cpu = scx_pick_idle_cpu(cpus_allowed, node, flags);
>> +		if (cpu >= 0)
>> +			return cpu;
>> +	}
>>   
>>   	if (flags & SCX_PICK_IDLE_IN_NODE)
>>   		cpu = cpumask_any_and_distribute(cpumask_of_node(node), cpus_allowed);
>> -- 
>> 2.47.3
>>