[RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()

Xuewen Yan posted 1 patch 2 weeks, 5 days ago
Recently, while enabling sched-ext debugging, we observed abnormal behavior
in our thermal power_allocator’s temperature control.
Through debugging, we found that the CPU util was too low, causing
the CPU frequency to remain unrestricted.

This issue stems from the fact that in the sched_cpu_util() function,
when scx is enabled, cpu_util_cfs becomes zero. As a result,
the thermal subsystem perceives an extremely low CPU utilization,
which degrades the effectiveness of the power_allocator’s control.

To address this, we propose adding scx_cpuperf_target in the sched_cpu_util()
as a replacement for cpu_util_cfs, ensuring that the thermal subsystem receives
accurate load information and restores proper control behavior.

Reported-by: Di Shen <di.shen@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
 kernel/sched/fair.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..20adb6fede2a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
 
 unsigned long sched_cpu_util(int cpu)
 {
-	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
+	unsigned long util = scx_cpuperf_target(cpu);
+
+	if (!scx_switched_all())
+		util += cpu_util_cfs(cpu);
+
+	return effective_cpu_util(cpu, util, NULL, NULL);
 }
 
 /*
-- 
2.25.1

Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Tejun Heo 2 weeks, 4 days ago
On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> Recently, while enabling sched-ext debugging, we observed abnormal behavior
> in our thermal power_allocator’s temperature control.
> Through debugging, we found that the CPU util was too low, causing
> the CPU frequency to remain unrestricted.
> 
> This issue stems from the fact that in the sched_cpu_util() function,
> when scx is enabled, cpu_util_cfs becomes zero. As a result,
> the thermal subsystem perceives an extremely low CPU utilization,
> which degrades the effectiveness of the power_allocator’s control.
> 
> To address this, we propose adding scx_cpuperf_target in the sched_cpu_util()
> as a replacement for cpu_util_cfs, ensuring that the thermal subsystem receives
> accurate load information and restores proper control behavior.
> 
> Reported-by: Di Shen <di.shen@unisoc.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>

Yeah, I missed that path. In either this or Peter's suggested form:

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Christian Loehle 2 weeks, 5 days ago
On 3/18/26 12:17, Xuewen Yan wrote:
> Recently, while enabling sched-ext debugging, we observed abnormal behavior
> in our thermal power_allocator’s temperature control.
> Through debugging, we found that the CPU util was too low, causing
> the CPU frequency to remain unrestricted.
> 
> This issue stems from the fact that in the sched_cpu_util() function,
> when scx is enabled, cpu_util_cfs becomes zero. As a result,
> the thermal subsystem perceives an extremely low CPU utilization,
> which degrades the effectiveness of the power_allocator’s control.
> 
> To address this, we propose adding scx_cpuperf_target in the sched_cpu_util()
> as a replacement for cpu_util_cfs, ensuring that the thermal subsystem receives
> accurate load information and restores proper control behavior.
> 
> Reported-by: Di Shen <di.shen@unisoc.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>

Fixes: d86adb4fc0655 ("sched_ext: Add cpuperf support")?

> ---
>  kernel/sched/fair.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..20adb6fede2a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>  
>  unsigned long sched_cpu_util(int cpu)
>  {
> -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> +	unsigned long util = scx_cpuperf_target(cpu);
> +
> +	if (!scx_switched_all())
> +		util += cpu_util_cfs(cpu);
> +
> +	return effective_cpu_util(cpu, util, NULL, NULL);
>  }
>  
>  /*

Ah right, I guess it's not as good as sugov, as there's no guarantee it behaves
somewhat low-pass-filterish, but better than nothing.
No idea if it's worth factoring out this and sugov_get_util; maybe not, given that
sugov includes the ('temporary') cfs runnable boost while sched_cpu_util() doesn't.

Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Peter Zijlstra 2 weeks, 5 days ago
On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..20adb6fede2a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>  
>  unsigned long sched_cpu_util(int cpu)
>  {
> -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> +	unsigned long util = scx_cpuperf_target(cpu);
> +
> +	if (!scx_switched_all())
> +		util += cpu_util_cfs(cpu);
> +
> +	return effective_cpu_util(cpu, util, NULL, NULL);
>  }

This puts the common case of no ext muck into the slow path of that
static_branch.

This wants to be something like:

unsigned long sched_cpu_util(int cpu)
{
	unsigned long util = cpu_util_cfs(cpu);

	if (scx_enabled()) {
		unsigned long scx_util = scx_cpuperf_target(cpu);

		if (!scx_switched_all())
			scx_util += util;

		util = scx_util;
	}

	return effective_cpu_util(cpu, util, NULL, NULL);
}
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Tejun Heo 2 weeks, 4 days ago
On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed..20adb6fede2a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> >  
> >  unsigned long sched_cpu_util(int cpu)
> >  {
> > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > +	unsigned long util = scx_cpuperf_target(cpu);
> > +
> > +	if (!scx_switched_all())
> > +		util += cpu_util_cfs(cpu);
> > +
> > +	return effective_cpu_util(cpu, util, NULL, NULL);
> >  }
> 
> This puts the common case of no ext muck into the slow path of that
> static_branch.
> 
> This wants to be something like:
> 
> unsigned long sched_cpu_util(int cpu)
> {
> 	unsigned long util = cpu_util_cfs(cpu);
> 
> 	if (scx_enabled()) {
> 		unsigned long scx_util = scx_cpuperf_target(cpu);
> 
> 		if (!scx_switched_all())
> 			scx_util += util;
> 
> 		util = scx_util;
> 	}
> 
> 	return effective_cpu_util(cpu, util, NULL, NULL);
> }

scx_switched_all() is an unlikely static branch just like scx_enabled() and
scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
fair path between the two versions is two noop run-throughs vs. one. Either
way is fine but it is more code for likely no discernible gain.

Thanks.

-- 
tejun
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Peter Zijlstra 2 weeks, 4 days ago
On Wed, Mar 18, 2026 at 03:08:43PM -1000, Tejun Heo wrote:
> On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >  
> > >  unsigned long sched_cpu_util(int cpu)
> > >  {
> > > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > +	unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > +	if (!scx_switched_all())
> > > +		util += cpu_util_cfs(cpu);
> > > +
> > > +	return effective_cpu_util(cpu, util, NULL, NULL);
> > >  }
> > 
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
> > 
> > This wants to be something like:
> > 
> > unsigned long sched_cpu_util(int cpu)
> > {
> > 	unsigned long util = cpu_util_cfs(cpu);
> > 
> > 	if (scx_enabled()) {
> > 		unsigned long scx_util = scx_cpuperf_target(cpu);
> > 
> > 		if (!scx_switched_all())
> > 			scx_util += util;
> > 
> > 		util = scx_util;
> > 	}
> > 
> > 	return effective_cpu_util(cpu, util, NULL, NULL);
> > }
> 
> scx_switched_all() is an unlikely static branch just like scx_enabled() and
> scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> fair path between the two versions is two noop run-throughs vs. one. Either
> way is fine but it is more code for likely no discernible gain.

(added noinline to effective_cpu_util() for clarity)

So the original patch generates this:

sched_cpu_util:
1c5240:  sched_cpu_util+0x0       endbr64
1c5244:  sched_cpu_util+0x4       call   0x1c5249 <__fentry__>
1c5249:  sched_cpu_util+0x9       push   %rbp
1c524a:  sched_cpu_util+0xa       push   %rbx
1c524b:  sched_cpu_util+0xb       mov    %edi,%ebx
1c524d:  sched_cpu_util+0xd       <jump_table.1c524d>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c5271 <sched_cpu_util+0x31>    (if JUMP)
1c524f:  sched_cpu_util+0xf       xor    %ebp,%ebp
1c5251:  sched_cpu_util+0x11      <jump_table.1c5251>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c5261 <sched_cpu_util+0x21>    (if JUMP)
1c5253:  sched_cpu_util+0x13      xor    %edx,%edx
1c5255:  sched_cpu_util+0x15      xor    %esi,%esi
1c5257:  sched_cpu_util+0x17      mov    %ebx,%edi
1c5259:  sched_cpu_util+0x19      call   0x1bc5b0 <cpu_util.constprop.0>
1c525e:  sched_cpu_util+0x1e      add    %rax,%rbp
1c5261:  sched_cpu_util+0x21      mov    %rbp,%rsi
1c5264:  sched_cpu_util+0x24      mov    %ebx,%edi
1c5266:  sched_cpu_util+0x26      xor    %ecx,%ecx
1c5268:  sched_cpu_util+0x28      pop    %rbx
1c5269:  sched_cpu_util+0x29      xor    %edx,%edx
1c526b:  sched_cpu_util+0x2b      pop    %rbp
1c526c:  sched_cpu_util+0x2c      jmp    0x1c5160 <effective_cpu_util>

(slowpath)

1c5271:  sched_cpu_util+0x31      movslq %edi,%rdx
1c5274:  sched_cpu_util+0x34      mov    $0x0,%rax
1c527b:  sched_cpu_util+0x3b      mov    0x0(,%rdx,8),%rdx
1c5283:  sched_cpu_util+0x43      mov    0xa34(%rdx,%rax,1),%ebp
1c528a:  sched_cpu_util+0x4a      jmp    0x1c5251 <sched_cpu_util+0x11>


While my proposal generates this:

sched_cpu_util:
1c5240:  sched_cpu_util+0x0       endbr64
1c5244:  sched_cpu_util+0x4       call   0x1c5249 <__fentry__>
1c5249:  sched_cpu_util+0x9       push   %rbx
1c524a:  sched_cpu_util+0xa       xor    %esi,%esi
1c524c:  sched_cpu_util+0xc       xor    %edx,%edx
1c524e:  sched_cpu_util+0xe       mov    %edi,%ebx
1c5250:  sched_cpu_util+0x10      call   0x1bc5b0 <cpu_util.constprop.0>
1c5255:  sched_cpu_util+0x15      mov    %rax,%rsi
1c5258:  sched_cpu_util+0x18      <jump_table.1c5258>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c5266 <sched_cpu_util+0x26>    (if JUMP)
1c525a:  sched_cpu_util+0x1a      mov    %ebx,%edi
1c525c:  sched_cpu_util+0x1c      xor    %ecx,%ecx
1c525e:  sched_cpu_util+0x1e      xor    %edx,%edx
1c5260:  sched_cpu_util+0x20      pop    %rbx
1c5261:  sched_cpu_util+0x21      jmp    0x1c5160 <effective_cpu_util>

(slowpath)

1c5266:  sched_cpu_util+0x26      <jump_table.1c5266>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c527b <sched_cpu_util+0x3b>    (if JUMP)
1c5268:  sched_cpu_util+0x28      xor    %eax,%eax
1c526a:  sched_cpu_util+0x2a      <jump_table.1c526a>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c5296 <sched_cpu_util+0x56>    (if JUMP)
1c526c:  sched_cpu_util+0x2c      mov    %ebx,%edi
1c526e:  sched_cpu_util+0x2e      add    %rax,%rsi
1c5271:  sched_cpu_util+0x31      xor    %ecx,%ecx
1c5273:  sched_cpu_util+0x33      xor    %edx,%edx
1c5275:  sched_cpu_util+0x35      pop    %rbx
1c5276:  sched_cpu_util+0x36      jmp    0x1c5160 <effective_cpu_util>
1c527b:  sched_cpu_util+0x3b      movslq %ebx,%rdx
1c527e:  sched_cpu_util+0x3e      mov    $0x0,%rax
1c5285:  sched_cpu_util+0x45      mov    0x0(,%rdx,8),%rdx
1c528d:  sched_cpu_util+0x4d      mov    0xa34(%rdx,%rax,1),%eax
1c5294:  sched_cpu_util+0x54      jmp    0x1c526a <sched_cpu_util+0x2a>
1c5296:  sched_cpu_util+0x56      mov    %ebx,%edi
1c5298:  sched_cpu_util+0x58      mov    %rax,%rsi
1c529b:  sched_cpu_util+0x5b      xor    %ecx,%ecx
1c529d:  sched_cpu_util+0x5d      xor    %edx,%edx
1c529f:  sched_cpu_util+0x5f      pop    %rbx
1c52a0:  sched_cpu_util+0x60      jmp    0x1c5160 <effective_cpu_util>


That fastpath is definitely better; the slowpath is worse, but that is
in part because the compilers are stupid and cannot eliminate
static_branch().

/me goes try again .. Yeah, the below patch does nothing :-( It will
happily emit scx_enabled() twice.


---
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 05b16299588d..47cd1a1f9784 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -32,7 +32,7 @@
 	JUMP_TABLE_ENTRY(key, label)
 #endif /* CONFIG_HAVE_JUMP_LABEL_HACK */
 
-static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch(struct static_key * const key, const bool branch)
 {
 	asm goto(ARCH_STATIC_BRANCH_ASM("%c0 + %c1", "%l[l_yes]")
 		: :  "i" (key), "i" (branch) : : l_yes);
@@ -42,7 +42,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, co
 	return true;
 }
 
-static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch_jump(struct static_key * const key, const bool branch)
 {
 	asm goto("1:"
 		"jmp %l[l_yes]\n\t"
diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index c16d4199bf92..553fc9f3f7eb 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -312,6 +312,7 @@
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-pure-function-attribute
  */
 #define __pure                          __attribute__((__pure__))
+#define __const                         __attribute__((__const__))
 
 /*
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-section-function-attribute
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Uros Bizjak 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:

> That fastpath is definitely better; the slowpath is worse, but that is
> in part because the compilers are stupid and cannot eliminate
> static_branch().

asm gotos are implicitly volatile because they are control flow
primitives. The compiler will *not* remove them.

Uros.
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Peter Zijlstra 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > That fastpath is definitely better; the slowpath is worse, but that is
> > in part because the compilers are stupid and cannot eliminate
> > static_branch().
> 
> asm gotos are implicitly volatile because they are control flow
> primitives. The compiler will *not* remove them.

Yes, but I want ponies ;-)

  if (static_branch_unlikely(&foo)) {
	if (static_branch_unlikely(&foo)) {
		/* A */
	} else {
		/* B */
	}
	/* C */
  }

Is a very common occurrence. And we all know this really should be:

  if (static_branch_unlikely(&foo)) {
	  /* A */
	  /* C */
  }

So how can we make this happen? IMO marking those functions __const
should tell the compiler that yes, it can eliminate them.

You should not try and protect the user. If they use __const
incorrectly, they get to keep the pieces and all that.
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Uros Bizjak 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > That fastpath is definitely better; the slowpath is worse, but that is
> > > in part because the compilers are stupid and cannot eliminate
> > > static_branch().
> >
> > asm gotos are implicitly volatile because they are control flow
> > primitives. The compiler will *not* remove them.
>
> Yes, but I want ponies ;-)
>
>   if (static_branch_unlikely(&foo)) {
>         if (static_branch_unlikely(&foo)) {
>                 /* A */
>         } else {
>                 /* B */
>         }
>         /* C */
>   }
>
> Is a very common occurrence. And we all know this really should be:
>
>   if (static_branch_unlikely(&foo)) {
>           /* A */
>           /* C */
>   }
>
> So how can we make this happen? IMO marking those functions __const
> should tell the compiler that yes, it can eliminate them.

Huh, __const promises that the function does not access global memory
and that the function does not have side effects other than returning
a value. asm volatile inside the __const function creates a side
effect, so removing function calls would be considered a
misoptimization. Probably this could lead to undefined behavior in
terms of what the compiler expects from a __const function.

> You should not try and protect the user. If they use __const
> incorrectly, they get to keep the pieces and all that.

I'm afraid that here the user wants the "__const with only partial
properties of __const" function, where the compiler does not know what
the user really wants.

Uros.
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Peter Zijlstra 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 12:02:29PM +0100, Uros Bizjak wrote:
> On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > > That fastpath is definitely better; the slowpath is worse, but that is
> > > > in part because the compilers are stupid and cannot eliminate
> > > > static_branch().
> > >
> > > asm gotos are implicitly volatile because they are control flow
> > > primitives. The compiler will *not* remove them.
> >
> > Yes, but I want ponies ;-)
> >
> >   if (static_branch_unlikely(&foo)) {
> >         if (static_branch_unlikely(&foo)) {
> >                 /* A */
> >         } else {
> >                 /* B */
> >         }
> >         /* C */
> >   }
> >
> > Is a very common occurrence. And we all know this really should be:
> >
> >   if (static_branch_unlikely(&foo)) {
> >           /* A */
> >           /* C */
> >   }
> >
> > So how can we make this happen? IMO marking those functions __const
> > should tell the compiler that yes, it can eliminate them.
> 
> Huh, __const promises that the function does not access global memory
> and that the function does not have side effects other than returning
> a value. asm volatile inside the __const function creates a side
> effect, so removing function calls would be considered a
> misoptimization. Probably this could lead to undefined behavior in
> terms of what the compiler expects from a __const function.

So since the whole function reduces to a single branch or nop, it does
not in fact access memory or have side effects, right?

(and there is still __pure, for if we somehow consider the key governing
the text patching to be an 'access' in this case)

> > You should not try and protect the user. If they use __const
> > incorrectly, they get to keep the pieces and all that.
> 
> I'm afraid that here the user wants the "__const with only partial
> properties of __const" function, where the compiler does not know what
> the user really wants.

Well, clearly I'm not a compiler person :-) I just want to be able to
tell the compiler that it can collapse these branches. Do you have
another option that would be less offensive to compiler folks such as
yourself?
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Peter Zijlstra 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 12:12:42PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 19, 2026 at 12:02:29PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > > > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > > That fastpath is definitely better; the slowpath is worse, but that is
> > > > > in part because the compilers are stupid and cannot eliminate
> > > > > static_branch().
> > > >
> > > > asm gotos are implicitly volatile because they are control flow
> > > > primitives. The compiler will *not* remove them.
> > >
> > > Yes, but I want ponies ;-)
> > >
> > >   if (static_branch_unlikely(&foo)) {
> > >         if (static_branch_unlikely(&foo)) {
> > >                 /* A */
> > >         } else {
> > >                 /* B */
> > >         }
> > >         /* C */
> > >   }
> > >
> > > Is a very common occurrence. And we all know this really should be:
> > >
> > >   if (static_branch_unlikely(&foo)) {
> > >           /* A */
> > >           /* C */
> > >   }
> > >
> > > So how can we make this happen? IMO marking those functions __const
> > > should tell the compiler that yes, it can eliminate them.
> > 
> > Huh, __const promises that the function does not access global memory
> > and that the function does not have side effects other than returning
> > a value. asm volatile inside the __const function creates a side
> > effect, so removing function calls would be considered a
> > misoptimization. Probably this could lead to undefined behavior in
> > terms of what the compiler expects from a __const function.
> 
> So since the whole function reduces to a single branch or nop, it does
> not in fact access memory or have side effects, right?
> 
> (and there is still __pure, for if we somehow consider the key governing
> the text patching to be an 'access' in this case)

So is it really just the problem that since the compiler doesn't know
what asm volatile ends up doing, it must assume the worst and hence
invalidates the __const (without a warning even).

So you're letting the unknown weigh heavier than the explicit
instruction from the author who put on __const (and supposedly does
know).

And I feel somewhat strongly that this is wrong.

Yes, if the author gets it wrong, and there are side effects and things
are elided it is a misoptimization, but that is on the author. You can't
blame the compiler for doing what it was told to do.

Or is there really more to this?
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Uros Bizjak 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 12:12 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 19, 2026 at 12:02:29PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > > > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > > That fastpath is definitely better; the slowpath is worse, but that is
> > > > > in part because the compilers are stupid and cannot eliminate
> > > > > static_branch().
> > > >
> > > > asm gotos are implicitly volatile because they are control flow
> > > > primitives. The compiler will *not* remove them.
> > >
> > > Yes, but I want ponies ;-)
> > >
> > >   if (static_branch_unlikely(&foo)) {
> > >         if (static_branch_unlikely(&foo)) {
> > >                 /* A */
> > >         } else {
> > >                 /* B */
> > >         }
> > >         /* C */
> > >   }
> > >
> > > Is a very common occurrence. And we all know this really should be:
> > >
> > >   if (static_branch_unlikely(&foo)) {
> > >           /* A */
> > >           /* C */
> > >   }
> > >
> > > So how can we make this happen? IMO marking those functions __const
> > > should tell the compiler that yes, it can eliminate them.
> >
> > Huh, __const promises that the function does not access global memory
> > and that the function does not have side effects other than returning
> > a value. asm volatile inside the __const function creates a side
> > effect, so removing function calls would be considered a
> > misoptimization. Probably this could lead to undefined behavior in
> > terms of what the compiler expects from a __const function.
>
> So since the whole function reduces to a single branch or nop, it does
> not in fact access memory or have side effects, right?

But there is "asm goto" inside; this tells the compiler that there is
a side effect, no matter what is written in the asm template. Even
'asm volatile ("nop");' inside a function declares a side effect.

> (and there is still __pure, for if we somehow consider the key governing
> the text patching to be an 'access' in this case)

__pure function also can't have side effects.

> > > You should not try and protect the user. If they use __const
> > > incorrectly, they get to keep the pieces and all that.
> >
> > I'm afraid that here the user wants the "__const with only partial
> > properties of __const" function, where the compiler does not know what
> > the user really wants.
>
> Well, clearly I'm not a compiler person :-) I just want to be able to
> tell the compiler that it can collapse these branches. Do you have
> another option that would be less offensive to compiler folks such as
> yourself?

I don't see any in this case (mixing volatiles with __const or __pure
attribute), but please ask on gcc@ mailing list, if there is any other
option I have missed.

Uros.
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Peter Zijlstra 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 12:19:50PM +0100, Uros Bizjak wrote:

> I don't see any in this case (mixing volatiles with __const or __pure
> attribute), but please ask on gcc@ mailing list, if there is any other
> option I have missed.

I raised this on toolchains a while ago, but that discussions got
derailed into irrelevance :/

  https://lore.kernel.org/all/YG80wg%2F2iZjXfCDJ@hirez.programming.kicks-ass.net/

I really just want it fixed, I don't particularly care how.
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Xuewen Yan 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 9:08 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >
> > >  unsigned long sched_cpu_util(int cpu)
> > >  {
> > > -   return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > +   unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > +   if (!scx_switched_all())
> > > +           util += cpu_util_cfs(cpu);
> > > +
> > > +   return effective_cpu_util(cpu, util, NULL, NULL);
> > >  }
> >
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
> >
> > This wants to be something like:
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> >       unsigned long util = cpu_util_cfs(cpu);
> >
> >       if (scx_enabled()) {
> >               unsigned long scx_util = scx_cpuperf_target(cpu);
> >
> >               if (!scx_switched_all())
> >                       scx_util += util;
> >
> >               util = scx_util;
> >       }
> >
> >       return effective_cpu_util(cpu, util, NULL, NULL);
> > }
>
> scx_switched_all() is an unlikely static branch just like scx_enabled() and
> scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> fair path between the two versions is two noop run-throughs vs. one. Either
> way is fine but it is more code for likely no discernible gain.

Agree, and for scx_switched_all() there is no need to call cpu_util_cfs().
Maybe we could modify it as follows?

--->
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..8f5d0c7e8ea2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8198,7 +8198,18 @@ unsigned long effective_cpu_util(int cpu,
unsigned long util_cfs,

 unsigned long sched_cpu_util(int cpu)
 {
-       return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
+        unsigned long util;
+
+       if (scx_enabled()) {
+               unsigned long util = scx_cpuperf_target(cpu);
+
+                if (!scx_switched_all())
+                       util += cpu_util_cfs(cpu);
+       } else {
+               util = cpu_util_cfs(cpu);
+       }
+
+       return effective_cpu_util(cpu, util, NULL, NULL);
 }
---


Thanks!
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Xuewen Yan 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 10:24 AM Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>
> On Thu, Mar 19, 2026 at 9:08 AM Tejun Heo <tj@kernel.org> wrote:
> >
> > On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index bf948db905ed..20adb6fede2a 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > >
> > > >  unsigned long sched_cpu_util(int cpu)
> > > >  {
> > > > -   return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > +   unsigned long util = scx_cpuperf_target(cpu);
> > > > +
> > > > +   if (!scx_switched_all())
> > > > +           util += cpu_util_cfs(cpu);
> > > > +
> > > > +   return effective_cpu_util(cpu, util, NULL, NULL);
> > > >  }
> > >
> > > This puts the common case of no ext muck into the slow path of that
> > > static_branch.
> > >
> > > This wants to be something like:
> > >
> > > unsigned long sched_cpu_util(int cpu)
> > > {
> > >       unsigned long util = cpu_util_cfs(cpu);
> > >
> > >       if (scx_enabled()) {
> > >               unsigned long scx_util = scx_cpuperf_target(cpu);
> > >
> > >               if (!scx_switched_all())
> > >                       scx_util += util;
> > >
> > >               util = scx_util;
> > >       }
> > >
> > >       return effective_cpu_util(cpu, util, NULL, NULL);
> > > }
> >
> > scx_switched_all() is an unlikely static branch just like scx_enabled() and
> > scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> > fair path between the two versions is two noop run-throughs vs. one. Either
> > way is fine but it is more code for likely no discernible gain.
>
> Agree, and for scx_switched_all(), there is no need to call cpu_util_cfs(),
> Maybe we could modify it as follows?
>
> --->
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..8f5d0c7e8ea2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8198,7 +8198,18 @@ unsigned long effective_cpu_util(int cpu,
> unsigned long util_cfs,
>
>  unsigned long sched_cpu_util(int cpu)
>  {
> -       return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> +        unsigned long util;
> +
> +       if (scx_enabled()) {
> +               unsigned long util = scx_cpuperf_target(cpu);

Sorry, that was a typo; it will be fixed in patch v2 if the proposed
fix is acceptable.

Thanks!
> +
> +                if (!scx_switched_all())
> +                       util += cpu_util_cfs(cpu);
> +       } else {
> +               util = cpu_util_cfs(cpu);
> +       }
> +
> +       return effective_cpu_util(cpu, util, NULL, NULL);
>  }
> ---
>
>
> Thanks!
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Vincent Guittot 2 weeks, 5 days ago
On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed..20adb6fede2a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> >
> >  unsigned long sched_cpu_util(int cpu)
> >  {
> > -     return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > +     unsigned long util = scx_cpuperf_target(cpu);
> > +
> > +     if (!scx_switched_all())
> > +             util += cpu_util_cfs(cpu);
> > +
> > +     return effective_cpu_util(cpu, util, NULL, NULL);
> >  }
>
> This puts the common case of no ext muck into the slow path of that
> static_branch.

+1
I was about to say the same

>
> This wants to be something like:
>
> unsigned long sched_cpu_util(int cpu)
> {
>         unsigned long util = cpu_util_cfs(cpu);
>
>         if (scx_enabled()) {
>                 unsigned long scx_util = scx_cpuperf_target(cpu);

also scx_cpuperf_target() does not reflect the utilization of the CPU
but the targeted performance level


>
>                 if (!scx_switched_all())
>                         scx_util += util;
>
>                 util = scx_util;
>         }
>
>         return effective_cpu_util(cpu, util, NULL, NULL);
> }
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Qais Yousef 2 weeks, 5 days ago
On 03/18/26 13:55, Vincent Guittot wrote:
> On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >
> > >  unsigned long sched_cpu_util(int cpu)
> > >  {
> > > -     return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > +     unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > +     if (!scx_switched_all())
> > > +             util += cpu_util_cfs(cpu);
> > > +
> > > +     return effective_cpu_util(cpu, util, NULL, NULL);
> > >  }
> >
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
> 
> +1
> I was about to same
> 
> >
> > This wants to be something like:
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> >         unsigned long util = cpu_util_cfs(cpu);
> >
> >         if (scx_enabled()) {
> >                 unsigned long scx_util = scx_cpuperf_target(cpu);
> 
> also scx_cpuperf_target() does not reflect the utilization of the CPU
> but the targeted perfromance level

Besides that, this sort of plug-and-play is a big concern. You picked up sched
ext and changed the behavior, then you'd need to get your thermal management to
work with that. Not retrospectively sprinkle these hacks around to force things
to work again.

This is a no from me. I think we have to keep the separation clear. And
I haven't seen a single contribution back to the scheduler out of these
'experiments'. Clearly everyone is going their own way, and raising the
threshold for us to get patches merged even higher so as not to break these
out-of-tree 'experiments' is a big nuisance. I think the sugov one was already
a mistake to accept.

> 
> 
> >
> >                 if (!scx_switched_all())
> >                         scx_util += util;
> >
> >                 util = scx_util;
> >         }
> >
> >         return effective_cpu_util(cpu, util, NULL, NULL);
> > }
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Xuewen Yan 2 weeks, 4 days ago
On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@layalina.io> wrote:
>
> On 03/18/26 13:55, Vincent Guittot wrote:
> > On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index bf948db905ed..20adb6fede2a 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > >
> > > >  unsigned long sched_cpu_util(int cpu)
> > > >  {
> > > > -     return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > +     unsigned long util = scx_cpuperf_target(cpu);
> > > > +
> > > > +     if (!scx_switched_all())
> > > > +             util += cpu_util_cfs(cpu);
> > > > +
> > > > +     return effective_cpu_util(cpu, util, NULL, NULL);
> > > >  }
> > >
> > > This puts the common case of no ext muck into the slow path of that
> > > static_branch.
> >
> > +1
> > I was about to same
> >
> > >
> > > This wants to be something like:
> > >
> > > unsigned long sched_cpu_util(int cpu)
> > > {
> > >         unsigned long util = cpu_util_cfs(cpu);
> > >
> > >         if (scx_enabled()) {
> > >                 unsigned long scx_util = scx_cpuperf_target(cpu);
> >
> > also scx_cpuperf_target() does not reflect the utilization of the CPU
> > but the targeted perfromance level
>
> Beside that, this sort of plug-and-play is a big concern. You picked up sched
> ext and changed the behavior, then you'd need to get your thermal management to
> work with that. Not retrospectively sprinkle these hacks around to force things
> to work again.

In fact, I had considered this issue even before sending this patch.
Our initial fix was to modify the thermal subsystem specifically: in
cpufreq_cooling.c, when SCX is enabled, we stopped calling
sched_cpu_util() and instead used an idle-time-based calculation to
obtain CPU util.

However, we later realized that sched_cpu_util() is used not only in
cpufreq_cooling.c but also in dtpm_cpu.c.
Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
This led us to propose the current patch.
Although scx_cpuperf_target may not accurately reflect the true CPU
utilization, as Christian pointed out, it’s still better than having
nothing at all.

That’s exactly why I marked this patch as RFC: I wasn’t sure whether
the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
It now seems clear that modifying only sched_cpu_util() is
insufficient. Ideally, we should also update cpufreq_cooling.c, since
we cannot guarantee that all SCX BPF programs will provide an accurate
cpuperf_target.
I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.

Thanks!


>
> This is a no from me. I think we have to keep the separation clear. And
> I haven't seen a single contribution back to scheduler out of these
> 'experiments'. Clearly everyone is going their own way and getting the
> threshold for us to get patches merged even higher not to break these out of
> tree 'experiments' is a big nuance. I think the sugov one was already a mistake
> to accept.
>
> >
> >
> > >
> > >                 if (!scx_switched_all())
> > >                         scx_util += util;
> > >
> > >                 util = scx_util;
> > >         }
> > >
> > >         return effective_cpu_util(cpu, util, NULL, NULL);
> > > }
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Qais Yousef 1 week, 6 days ago
On 03/19/26 10:13, Xuewen Yan wrote:

> > Beside that, this sort of plug-and-play is a big concern. You picked up sched
> > ext and changed the behavior, then you'd need to get your thermal management to
> > work with that. Not retrospectively sprinkle these hacks around to force things
> > to work again.
> 
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically,
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used idle-time-based calculation to
> obtain CPU util.
> 
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.
> 
> That’s exactly why I marked this patch as RFC, I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.

The bigger-picture problem is that sched_ext is not trustworthy. For something
as critical as thermal management, you'd have to bake it in with your out-of-tree
sched_ext changes. The current system was designed to work together under
some assumptions, and by opting into sched_ext you're opting out of these in-tree
assumptions. A user space scheduler can be used with a user space cpufreq
governor and a user space thermal solution; I don't think there are
restrictions on creating out-of-tree governors.

If you have a problem that needs to be fixed in the scheduler, perhaps we can
help with that ;-)

If you just want to go and do your own thing, perhaps you can go all the way
then and redo it all.
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Lukasz Luba 2 weeks, 4 days ago
Hi Qais and Xuewen,

On 3/19/26 02:13, Xuewen Yan wrote:
> On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@layalina.io> wrote:
>>
>> On 03/18/26 13:55, Vincent Guittot wrote:
>>> On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>
>>>> On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index bf948db905ed..20adb6fede2a 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>>>>>
>>>>>   unsigned long sched_cpu_util(int cpu)
>>>>>   {
>>>>> -     return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
>>>>> +     unsigned long util = scx_cpuperf_target(cpu);
>>>>> +
>>>>> +     if (!scx_switched_all())
>>>>> +             util += cpu_util_cfs(cpu);
>>>>> +
>>>>> +     return effective_cpu_util(cpu, util, NULL, NULL);
>>>>>   }
>>>>
>>>> This puts the common case of no ext muck into the slow path of that
>>>> static_branch.
>>>
>>> +1
>>> I was about to same
>>>
>>>>
>>>> This wants to be something like:
>>>>
>>>> unsigned long sched_cpu_util(int cpu)
>>>> {
>>>>          unsigned long util = cpu_util_cfs(cpu);
>>>>
>>>>          if (scx_enabled()) {
>>>>                  unsigned long scx_util = scx_cpuperf_target(cpu);
>>>
>>> also scx_cpuperf_target() does not reflect the utilization of the CPU
>>> but the targeted perfromance level
>>
>> Beside that, this sort of plug-and-play is a big concern. You picked up sched
>> ext and changed the behavior, then you'd need to get your thermal management to
>> work with that. Not retrospectively sprinkle these hacks around to force things
>> to work again.
> 
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically,
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used idle-time-based calculation to
> obtain CPU util.
> 
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.
> 
> That’s exactly why I marked this patch as RFC, I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.
> 

If you are referring to the utilization-driven power estimation, I tend to
agree. You might remember we had a few corner cases due to that
design, and I tried to address it differently [1][2].

Xuewen, please send your proposal for cpufreq_cooling.c and I will
have a look there. If there is still a need to fix it more deeply,
I can help with some deeper thermal changes.

Estimating the CPU's real power based on the information available in the
kernel is really hard. Still, let's try to fix what you
observe on your devices.

Regards,
Lukasz

[1] 
https://lore.kernel.org/linux-pm/20210622075925.16189-1-lukasz.luba@arm.com/

[2] 
https://lore.kernel.org/linux-pm/20220406220809.22555-1-lukasz.luba@arm.com/
Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Posted by Vincent Guittot 2 weeks, 4 days ago
On Thu, 19 Mar 2026 at 03:13, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>
> On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@layalina.io> wrote:
> >
> > On 03/18/26 13:55, Vincent Guittot wrote:
> > > On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > > index bf948db905ed..20adb6fede2a 100644
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > > >
> > > > >  unsigned long sched_cpu_util(int cpu)
> > > > >  {
> > > > > -     return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > > +     unsigned long util = scx_cpuperf_target(cpu);
> > > > > +
> > > > > +     if (!scx_switched_all())
> > > > > +             util += cpu_util_cfs(cpu);
> > > > > +
> > > > > +     return effective_cpu_util(cpu, util, NULL, NULL);
> > > > >  }
> > > >
> > > > This puts the common case of no ext muck into the slow path of that
> > > > static_branch.
> > >
> > > +1
> > > I was about to same
> > >
> > > >
> > > > This wants to be something like:
> > > >
> > > > unsigned long sched_cpu_util(int cpu)
> > > > {
> > > >         unsigned long util = cpu_util_cfs(cpu);
> > > >
> > > >         if (scx_enabled()) {
> > > >                 unsigned long scx_util = scx_cpuperf_target(cpu);
> > >
> > > also scx_cpuperf_target() does not reflect the utilization of the CPU
> > > but the targeted perfromance level
> >
> > Beside that, this sort of plug-and-play is a big concern. You picked up sched
> > ext and changed the behavior, then you'd need to get your thermal management to
> > work with that. Not retrospectively sprinkle these hacks around to force things
> > to work again.
>
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically,
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used idle-time-based calculation to
> obtain CPU util.
>
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.

scx_cpuperf_target() reflects the targeted performance level, not the
utilisation. That is fine for cpufreq, but not here: you can have a
high performance target with low CPU utilization. You must provide the
right value.

>
> That’s exactly why I marked this patch as RFC, I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.
>
> Thanks!
>
>
> >
> > This is a no from me. I think we have to keep the separation clear. And
> > I haven't seen a single contribution back to scheduler out of these
> > 'experiments'. Clearly everyone is going their own way and getting the
> > threshold for us to get patches merged even higher not to break these out of
> > tree 'experiments' is a big nuance. I think the sugov one was already a mistake
> > to accept.
> >
> > >
> > >
> > > >
> > > >                 if (!scx_switched_all())
> > > >                         scx_util += util;
> > > >
> > > >                 util = scx_util;
> > > >         }
> > > >
> > > >         return effective_cpu_util(cpu, util, NULL, NULL);
> > > > }
[PATCH] sched/cpufreq: Reorder so non-SCX is common path
Posted by Christian Loehle 2 weeks, 5 days ago
scx_enabled() with sugov is probably the rarer case compared to fair
with sugov, so reorder the condition into an scx_enabled() check instead
of the !scx_switched_all() check, so that the static branch is not
evaluated when sched_ext isn't enabled.

No functional change intended.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
@Peter
We probably also want the analogous reorder in sugov then....

 kernel/sched/cpufreq_schedutil.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 153232dd8276..06bd453a37eb 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -225,10 +225,16 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 
 static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
 {
-	unsigned long min, max, util = scx_cpuperf_target(sg_cpu->cpu);
+	unsigned long min, max, util = cpu_util_cfs_boost(sg_cpu->cpu);
 
-	if (!scx_switched_all())
-		util += cpu_util_cfs_boost(sg_cpu->cpu);
+	if (scx_enabled()) {
+		unsigned long scx_util = scx_cpuperf_target(sg_cpu->cpu);
+
+		if (!scx_switched_all())
+			scx_util += util;
+
+		util = scx_util;
+	}
 	util = effective_cpu_util(sg_cpu->cpu, util, &min, &max);
 	util = max(util, boost);
 	sg_cpu->bw_min = min;
-- 
2.34.1