Recently, while enabling sched-ext debugging, we observed abnormal behavior
in our thermal power_allocator’s temperature control.
Through debugging, we found that the CPU util was too low, causing
the CPU frequency to remain unrestricted.
This issue stems from the fact that in the sched_cpu_util() function,
when scx is enabled, cpu_util_cfs becomes zero. As a result,
the thermal subsystem perceives an extremely low CPU utilization,
which degrades the effectiveness of the power_allocator’s control.
To address this, we propose adding scx_cpuperf_target in the sched_cpu_util()
as a replacement for cpu_util_cfs, ensuring that the thermal subsystem receives
accurate load information and restores proper control behavior.
Reported-by: Di Shen <di.shen@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
kernel/sched/fair.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..20adb6fede2a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
 
 unsigned long sched_cpu_util(int cpu)
 {
-	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
+	unsigned long util = scx_cpuperf_target(cpu);
+
+	if (!scx_switched_all())
+		util += cpu_util_cfs(cpu);
+
+	return effective_cpu_util(cpu, util, NULL, NULL);
 }
 
 /*
--
2.25.1
On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> Recently, while enabling sched-ext debugging, we observed abnormal behavior
> in our thermal power_allocator’s temperature control.
> Through debugging, we found that the CPU util was too low, causing
> the CPU frequency to remain unrestricted.
>
> This issue stems from the fact that in the sched_cpu_util() function,
> when scx is enabled, cpu_util_cfs becomes zero. As a result,
> the thermal subsystem perceives an extremely low CPU utilization,
> which degrades the effectiveness of the power_allocator’s control.
>
> To address this, we propose adding scx_cpuperf_target in the sched_cpu_util()
> as a replacement for cpu_util_cfs, ensuring that the thermal subsystem receives
> accurate load information and restores proper control behavior.
>
> Reported-by: Di Shen <di.shen@unisoc.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>

Yeah, I missed that path. In either this or Peter's suggested form:

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

--
tejun
On 3/18/26 12:17, Xuewen Yan wrote:
> Recently, while enabling sched-ext debugging, we observed abnormal behavior
> in our thermal power_allocator’s temperature control.
> Through debugging, we found that the CPU util was too low, causing
> the CPU frequency to remain unrestricted.
>
> This issue stems from the fact that in the sched_cpu_util() function,
> when scx is enabled, cpu_util_cfs becomes zero. As a result,
> the thermal subsystem perceives an extremely low CPU utilization,
> which degrades the effectiveness of the power_allocator’s control.
>
> To address this, we propose adding scx_cpuperf_target in the sched_cpu_util()
> as a replacement for cpu_util_cfs, ensuring that the thermal subsystem receives
> accurate load information and restores proper control behavior.
>
> Reported-by: Di Shen <di.shen@unisoc.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Fixes: d86adb4fc0655 ("sched_ext: Add cpuperf support")?
> ---
> kernel/sched/fair.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..20adb6fede2a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>
> unsigned long sched_cpu_util(int cpu)
> {
> - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> + unsigned long util = scx_cpuperf_target(cpu);
> +
> + if (!scx_switched_all())
> + util += cpu_util_cfs(cpu);
> +
> + return effective_cpu_util(cpu, util, NULL, NULL);
> }
>
> /*
Ah right, I guess it's not as good as sugov as there's no guarantee it behaves
somewhat low-pass-filterish, but better than nothing.
No idea if it's worth factoring out this and sugov_get_util, maybe not given that
sugov includes ('temporary') cfs runnable boost while sched_cpu_util() doesn't.
On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..20adb6fede2a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>
> unsigned long sched_cpu_util(int cpu)
> {
> - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> + unsigned long util = scx_cpuperf_target(cpu);
> +
> + if (!scx_switched_all())
> + util += cpu_util_cfs(cpu);
> +
> + return effective_cpu_util(cpu, util, NULL, NULL);
> }
This puts the common case of no ext muck into the slow path of that
static_branch.
This wants to be something like:
unsigned long sched_cpu_util(int cpu)
{
	unsigned long util = cpu_util_cfs(cpu);

	if (scx_enabled()) {
		unsigned long scx_util = scx_cpuperf_target(cpu);

		if (!scx_switched_all())
			scx_util += util;

		util = scx_util;
	}

	return effective_cpu_util(cpu, util, NULL, NULL);
}
On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed..20adb6fede2a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > + unsigned long util = scx_cpuperf_target(cpu);
> > +
> > + if (!scx_switched_all())
> > + util += cpu_util_cfs(cpu);
> > +
> > + return effective_cpu_util(cpu, util, NULL, NULL);
> > }
>
> This puts the common case of no ext muck into the slow path of that
> static_branch.
>
> This wants to be something like:
>
> unsigned long sched_cpu_util(int cpu)
> {
> unsigned long util = cpu_util_cfs(cpu);
>
> if (scx_enabled()) {
> unsigned long scx_util = scx_cpuperf_target(cpu);
>
> if (!scx_switched_all())
> scx_util += util;
>
> util = scx_util;
> }
>
> return effective_cpu_util(cpu, util, NULL, NULL);
> }
scx_switched_all() is an unlikely static branch just like scx_enabled() and
scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
fair path between the two versions is two noop run-throughs vs. one. Either
way is fine but it is more code for likely no discernible gain.
Thanks.
--
tejun
On Wed, Mar 18, 2026 at 03:08:43PM -1000, Tejun Heo wrote:
> On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >
> > > unsigned long sched_cpu_util(int cpu)
> > > {
> > > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > + unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > + if (!scx_switched_all())
> > > + util += cpu_util_cfs(cpu);
> > > +
> > > + return effective_cpu_util(cpu, util, NULL, NULL);
> > > }
> >
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
> >
> > This wants to be something like:
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> > unsigned long util = cpu_util_cfs(cpu);
> >
> > if (scx_enabled()) {
> > unsigned long scx_util = scx_cpuperf_target(cpu);
> >
> > if (!scx_switched_all())
> > scx_util += util;
> >
> > util = scx_util;
> > }
> >
> > return effective_cpu_util(cpu, util, NULL, NULL);
> > }
>
> scx_switched_all() is an unlikely static branch just like scx_enabled() and
> scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> fair path between the two versions is two noop run-throughs vs. one. Either
> way is fine but it is more code for likely no discernible gain.
(added noinline to effective_cpu_util() for clarity)
So the original patch generates this:
sched_cpu_util:
1c5240: sched_cpu_util+0x0 endbr64
1c5244: sched_cpu_util+0x4 call 0x1c5249 <__fentry__>
1c5249: sched_cpu_util+0x9 push %rbp
1c524a: sched_cpu_util+0xa push %rbx
1c524b: sched_cpu_util+0xb mov %edi,%ebx
1c524d: sched_cpu_util+0xd <jump_table.1c524d>
= nop2 (if DEFAULT)
= jmp 1c5271 <sched_cpu_util+0x31> (if JUMP)
1c524f: sched_cpu_util+0xf xor %ebp,%ebp
1c5251: sched_cpu_util+0x11 <jump_table.1c5251>
= nop2 (if DEFAULT)
= jmp 1c5261 <sched_cpu_util+0x21> (if JUMP)
1c5253: sched_cpu_util+0x13 xor %edx,%edx
1c5255: sched_cpu_util+0x15 xor %esi,%esi
1c5257: sched_cpu_util+0x17 mov %ebx,%edi
1c5259: sched_cpu_util+0x19 call 0x1bc5b0 <cpu_util.constprop.0>
1c525e: sched_cpu_util+0x1e add %rax,%rbp
1c5261: sched_cpu_util+0x21 mov %rbp,%rsi
1c5264: sched_cpu_util+0x24 mov %ebx,%edi
1c5266: sched_cpu_util+0x26 xor %ecx,%ecx
1c5268: sched_cpu_util+0x28 pop %rbx
1c5269: sched_cpu_util+0x29 xor %edx,%edx
1c526b: sched_cpu_util+0x2b pop %rbp
1c526c: sched_cpu_util+0x2c jmp 0x1c5160 <effective_cpu_util>
(slowpath)
1c5271: sched_cpu_util+0x31 movslq %edi,%rdx
1c5274: sched_cpu_util+0x34 mov $0x0,%rax
1c527b: sched_cpu_util+0x3b mov 0x0(,%rdx,8),%rdx
1c5283: sched_cpu_util+0x43 mov 0xa34(%rdx,%rax,1),%ebp
1c528a: sched_cpu_util+0x4a jmp 0x1c5251 <sched_cpu_util+0x11>
While my proposal generates this:
sched_cpu_util:
1c5240: sched_cpu_util+0x0 endbr64
1c5244: sched_cpu_util+0x4 call 0x1c5249 <__fentry__>
1c5249: sched_cpu_util+0x9 push %rbx
1c524a: sched_cpu_util+0xa xor %esi,%esi
1c524c: sched_cpu_util+0xc xor %edx,%edx
1c524e: sched_cpu_util+0xe mov %edi,%ebx
1c5250: sched_cpu_util+0x10 call 0x1bc5b0 <cpu_util.constprop.0>
1c5255: sched_cpu_util+0x15 mov %rax,%rsi
1c5258: sched_cpu_util+0x18 <jump_table.1c5258>
= nop2 (if DEFAULT)
= jmp 1c5266 <sched_cpu_util+0x26> (if JUMP)
1c525a: sched_cpu_util+0x1a mov %ebx,%edi
1c525c: sched_cpu_util+0x1c xor %ecx,%ecx
1c525e: sched_cpu_util+0x1e xor %edx,%edx
1c5260: sched_cpu_util+0x20 pop %rbx
1c5261: sched_cpu_util+0x21 jmp 0x1c5160 <effective_cpu_util>
(slowpath)
1c5266: sched_cpu_util+0x26 <jump_table.1c5266>
= nop2 (if DEFAULT)
= jmp 1c527b <sched_cpu_util+0x3b> (if JUMP)
1c5268: sched_cpu_util+0x28 xor %eax,%eax
1c526a: sched_cpu_util+0x2a <jump_table.1c526a>
= nop2 (if DEFAULT)
= jmp 1c5296 <sched_cpu_util+0x56> (if JUMP)
1c526c: sched_cpu_util+0x2c mov %ebx,%edi
1c526e: sched_cpu_util+0x2e add %rax,%rsi
1c5271: sched_cpu_util+0x31 xor %ecx,%ecx
1c5273: sched_cpu_util+0x33 xor %edx,%edx
1c5275: sched_cpu_util+0x35 pop %rbx
1c5276: sched_cpu_util+0x36 jmp 0x1c5160 <effective_cpu_util>
1c527b: sched_cpu_util+0x3b movslq %ebx,%rdx
1c527e: sched_cpu_util+0x3e mov $0x0,%rax
1c5285: sched_cpu_util+0x45 mov 0x0(,%rdx,8),%rdx
1c528d: sched_cpu_util+0x4d mov 0xa34(%rdx,%rax,1),%eax
1c5294: sched_cpu_util+0x54 jmp 0x1c526a <sched_cpu_util+0x2a>
1c5296: sched_cpu_util+0x56 mov %ebx,%edi
1c5298: sched_cpu_util+0x58 mov %rax,%rsi
1c529b: sched_cpu_util+0x5b xor %ecx,%ecx
1c529d: sched_cpu_util+0x5d xor %edx,%edx
1c529f: sched_cpu_util+0x5f pop %rbx
1c52a0: sched_cpu_util+0x60 jmp 0x1c5160 <effective_cpu_util>
That fastpath is definitely better; the slowpath is worse, but that is
in part because the compilers are stupid and cannot eliminate
static_branch().
/me goes try again .. Yeah, the below patch does nothing :-( It will
happily emit scx_enabled() twice.
---
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 05b16299588d..47cd1a1f9784 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -32,7 +32,7 @@
JUMP_TABLE_ENTRY(key, label)
#endif /* CONFIG_HAVE_JUMP_LABEL_HACK */
-static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch(struct static_key * const key, const bool branch)
{
asm goto(ARCH_STATIC_BRANCH_ASM("%c0 + %c1", "%l[l_yes]")
: : "i" (key), "i" (branch) : : l_yes);
@@ -42,7 +42,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, co
return true;
}
-static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch_jump(struct static_key * const key, const bool branch)
{
asm goto("1:"
"jmp %l[l_yes]\n\t"
diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index c16d4199bf92..553fc9f3f7eb 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -312,6 +312,7 @@
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-pure-function-attribute
*/
#define __pure __attribute__((__pure__))
+#define __const __attribute__((__const__))
/*
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-section-function-attribute
On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> That fastpath is definitely better; the slowpath is worse, but that is
> in part because the compilers are stupid and cannot eliminate
> static_branch().

asm gotos are implicitly volatile because they are control flow
primitives. The compiler will *not* remove them.

Uros.
On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> > That fastpath is definitely better; the slowpath is worse, but that is
> > in part because the compilers are stupid and cannot eliminate
> > static_branch().
>
> asm gotos are implicitly volatile because they are control flow
> primitives. The compiler will *not* remove them.
Yes, but I want ponies ;-)
if (static_branch_unlikely(&foo)) {
	if (static_branch_unlikely(&foo)) {
		/* A */
	} else {
		/* B */
	}
	/* C */
}
Is a very common occurrence. And we all know this really should be:
if (static_branch_unlikely(&foo)) {
	/* A */
	/* C */
}
So how can we make this happen? IMO marking those functions __const
should tell the compiler that yes, it can eliminate them.
You should not try and protect the user. If they use __const
incorrectly, they get to keep the pieces and all that.
On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > That fastpath is definitely better; the slowpath is worse, but that is
> > > in part because the compilers are stupid and cannot eliminate
> > > static_branch().
> >
> > asm gotos are implicitly volatile because they are control flow
> > primitives. The compiler will *not* remove them.
>
> Yes, but I want ponies ;-)
>
> if (static_branch_unlikely(&foo)) {
> if (static_branch_unlikely(&foo)) {
> /* A */
> } else {
> /* B */
> }
> /* C */
> }
>
> Is a very common occurrence. And we all know this really should be:
>
> if (static_branch_unlikely(&foo)) {
> /* A */
> /* C */
> }
>
> So how can we make this happen? IMO marking those functions __const
> should tell the compiler that yes, it can eliminate them.
Huh, __const promises that the function does not access global memory
and that the function does not have side effects other than returning
a value. asm volatile inside the __const function creates a side
effect, so removing function calls would be considered a
misoptimization. Probably this could lead to undefined behavior in
terms of what the compiler expects from a __const function.
> You should not try and protect the user. If they use __const
> incorrectly, they get to keep the pieces and all that.
I'm afraid that here the user wants the "__const with only partial
properties of __const" function, where the compiler does not know what
the user really wants.
Uros.
On Thu, Mar 19, 2026 at 12:02:29PM +0100, Uros Bizjak wrote:
> On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > > That fastpath is definitely better; the slowpath is worse, but that is
> > > > in part because the compilers are stupid and cannot eliminate
> > > > static_branch().
> > >
> > > asm gotos are implicitly volatile because they are control flow
> > > primitives. The compiler will *not* remove them.
> >
> > Yes, but I want ponies ;-)
> >
> > if (static_branch_unlikely(&foo)) {
> > if (static_branch_unlikely(&foo)) {
> > /* A */
> > } else {
> > /* B */
> > }
> > /* C */
> > }
> >
> > Is a very common occurrence. And we all know this really should be:
> >
> > if (static_branch_unlikely(&foo)) {
> > /* A */
> > /* C */
> > }
> >
> > So how can we make this happen? IMO marking those functions __const
> > should tell the compiler that yes, it can eliminate them.
>
> Huh, __const promises that the function does not access global memory
> and that the function does not have side effects other than returning
> a value. asm volatile inside the __const function creates a side
> effect, so removing function calls would be considered a
> misoptimization. Probably this could lead to undefined behavior in
> terms of what the compiler expects from a __const function.
So since the whole function reduces to a single branch or nop, it does
not in fact access memory or have side effects, right?
(and there is still __pure, for if we somehow consider the key governing
the text patching to be an 'access' in this case)
> > You should not try and protect the user. If they use __const
> > incorrectly, they get to keep the pieces and all that.
>
> I'm afraid that here the user wants the "__const with only partial
> properties of __const" function, where the compiler does not know what
> the user really wants.
Well, clearly I'm not a compiler person :-) I just want to be able to
tell the compiler that it can collapse these branches. Do you have
another option that would be less offensive to compiler folks such as
yourself?
On Thu, Mar 19, 2026 at 12:12:42PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 19, 2026 at 12:02:29PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > > > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > > That fastpath is definitely better; the slowpath is worse, but that is
> > > > > in part because the compilers are stupid and cannot eliminate
> > > > > static_branch().
> > > >
> > > > asm gotos are implicitly volatile because they are control flow
> > > > primitives. The compiler will *not* remove them.
> > >
> > > Yes, but I want ponies ;-)
> > >
> > > if (static_branch_unlikely(&foo)) {
> > > if (static_branch_unlikely(&foo)) {
> > > /* A */
> > > } else {
> > > /* B */
> > > }
> > > /* C */
> > > }
> > >
> > Is a very common occurrence. And we all know this really should be:
> > >
> > > if (static_branch_unlikely(&foo)) {
> > > /* A */
> > > /* C */
> > > }
> > >
> > > So how can we make this happen? IMO marking those functions __const
> > should tell the compiler that yes, it can eliminate them.
> >
> > Huh, __const promises that the function does not access global memory
> > and that the function does not have side effects other than returning
> > a value. asm volatile inside the __const function creates a side
> > effect, so removing function calls would be considered a
> > misoptimization. Probably this could lead to undefined behavior in
> > terms of what the compiler expects from a __const function.
>
> So since the whole function reduces to a single branch or nop, it does
> not in fact access memory or have side effects, right?
>
> (and there is still __pure, for if we somehow consider the key governing
> the text patching to be an 'access' in this case)
So is it really just the problem that since the compiler doesn't know
what asm volatile ends up doing, it must assume the worst and hence
invalidate the __const (without a warning even)?
So you're letting the unknown weigh heavier than the explicit
instruction from the author who put on __const (and supposedly does
know).
And I feel somewhat strongly that this is wrong.
Yes, if the author gets it wrong, and there are side effects and things
are elided it is a misoptimization, but that is on the author. You can't
blame the compiler for doing what it was told to do.
Or is there really more to this?
On Thu, Mar 19, 2026 at 12:12 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 19, 2026 at 12:02:29PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > > > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > > That fastpath is definitely better; the slowpath is worse, but that is
> > > > > in part because the compilers are stupid and cannot eliminate
> > > > > static_branch().
> > > >
> > > > asm gotos are implicitly volatile because they are control flow
> > > > primitives. The compiler will *not* remove them.
> > >
> > > Yes, but I want ponies ;-)
> > >
> > > if (static_branch_unlikely(&foo)) {
> > > if (static_branch_unlikely(&foo)) {
> > > /* A */
> > > } else {
> > > /* B */
> > > }
> > > /* C */
> > > }
> > >
> > > Is a very common occurrence. And we all know this really should be:
> > >
> > > if (static_branch_unlikely(&foo)) {
> > > /* A */
> > > /* C */
> > > }
> > >
> > > So how can we make this happen? IMO marking those functions __const
> > > should tell the compiler that yes, it can eliminate them.
> >
> > Huh, __const promises that the function does not access global memory
> > and that the function does not have side effects other than returning
> > a value. asm volatile inside the __const function creates a side
> > effect, so removing function calls would be considered a
> > misoptimization. Probably this could lead to undefined behavior in
> > terms of what the compiler expects from a __const function.
>
> So since the whole function reduces to a single branch or nop, it does
> not in fact access memory or have side effects, right?
But there is "asm goto" inside; this tells the compiler that there is
a side effect, no matter what is written in the asm template. Even
'asm volatile ("nop");' inside function declares a side effect.
> (and there is still __pure, for if we somehow consider the key governing
> the text patching to be an 'access' in this case)
A __pure function also can't have side effects.
> > > You should not try and protect the user. If they use __const
> > > incorrectly, they get to keep the pieces and all that.
> >
> > I'm afraid that here the user wants the "__const with only partial
> > properties of __const" function, where the compiler does not know what
> > the user really wants.
>
> Well, clearly I'm not a compiler person :-) I just want to be able to
> tell the compiler that it can collapse these branches. Do you have
> another option that would be less offensive to compiler folks such as
> yourself?
I don't see any in this case (mixing volatiles with __const or __pure
attribute), but please ask on gcc@ mailing list, if there is any other
option I have missed.
Uros.
On Thu, Mar 19, 2026 at 12:19:50PM +0100, Uros Bizjak wrote:
> I don't see any in this case (mixing volatiles with __const or __pure
> attribute), but please ask on gcc@ mailing list, if there is any other
> option I have missed.

I raised this on toolchains a while ago, but that discussion got
derailed into irrelevance :/

  https://lore.kernel.org/all/YG80wg%2F2iZjXfCDJ@hirez.programming.kicks-ass.net/

I really just want it fixed, I don't particularly care how.
On Thu, Mar 19, 2026 at 9:08 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >
> > > unsigned long sched_cpu_util(int cpu)
> > > {
> > > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > + unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > + if (!scx_switched_all())
> > > + util += cpu_util_cfs(cpu);
> > > +
> > > + return effective_cpu_util(cpu, util, NULL, NULL);
> > > }
> >
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
> >
> > This wants to be something like:
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> > unsigned long util = cpu_util_cfs(cpu);
> >
> > if (scx_enabled()) {
> > unsigned long scx_util = scx_cpuperf_target(cpu);
> >
> > if (!scx_switched_all())
> > scx_util += util;
> >
> > util = scx_util;
> > }
> >
> > return effective_cpu_util(cpu, util, NULL, NULL);
> > }
>
> scx_switched_all() is an unlikely static branch just like scx_enabled() and
> scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> fair path between the two versions is two noop run-throughs vs. one. Either
> way is fine but it is more code for likely no discernible gain.
Agree. And for scx_switched_all(), there is no need to call cpu_util_cfs().
Maybe we could modify it as follows?
--->
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..8f5d0c7e8ea2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8198,7 +8198,18 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
 
 unsigned long sched_cpu_util(int cpu)
 {
-	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
+	unsigned long util;
+
+	if (scx_enabled()) {
+		unsigned long util = scx_cpuperf_target(cpu);
+
+		if (!scx_switched_all())
+			util += cpu_util_cfs(cpu);
+	} else {
+		util = cpu_util_cfs(cpu);
+	}
+
+	return effective_cpu_util(cpu, util, NULL, NULL);
 }
---
Thanks!
On Thu, Mar 19, 2026 at 10:24 AM Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>
> On Thu, Mar 19, 2026 at 9:08 AM Tejun Heo <tj@kernel.org> wrote:
> >
> > On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index bf948db905ed..20adb6fede2a 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > >
> > > > unsigned long sched_cpu_util(int cpu)
> > > > {
> > > > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > + unsigned long util = scx_cpuperf_target(cpu);
> > > > +
> > > > + if (!scx_switched_all())
> > > > + util += cpu_util_cfs(cpu);
> > > > +
> > > > + return effective_cpu_util(cpu, util, NULL, NULL);
> > > > }
> > >
> > > This puts the common case of no ext muck into the slow path of that
> > > static_branch.
> > >
> > > This wants to be something like:
> > >
> > > unsigned long sched_cpu_util(int cpu)
> > > {
> > > unsigned long util = cpu_util_cfs(cpu);
> > >
> > > if (scx_enabled()) {
> > > unsigned long scx_util = scx_cpuperf_target(cpu);
> > >
> > > if (!scx_switched_all())
> > > scx_util += util;
> > >
> > > util = scx_util;
> > > }
> > >
> > > return effective_cpu_util(cpu, util, NULL, NULL);
> > > }
> >
> > scx_switched_all() is an unlikely static branch just like scx_enabled() and
> > scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> > fair path between the two versions is two noop run-throughs vs. one. Either
> > way is fine but it is more code for likely no discernible gain.
>
> Agree, and for scx_switched_all(), there is no need to call cpu_util_cfs(),
> Maybe we could modify it as follows?
>
> --->
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..8f5d0c7e8ea2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8198,7 +8198,18 @@ unsigned long effective_cpu_util(int cpu,
> unsigned long util_cfs,
>
> unsigned long sched_cpu_util(int cpu)
> {
> - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> + unsigned long util;
> +
> + if (scx_enabled()) {
> + unsigned long util = scx_cpuperf_target(cpu);
Sorry, that was a typo; it will be fixed in v2 if the proposed
fix is acceptable.
Thanks!
> +
> + if (!scx_switched_all())
> + util += cpu_util_cfs(cpu);
> + } else {
> + util = cpu_util_cfs(cpu);
> + }
> +
> + return effective_cpu_util(cpu, util, NULL, NULL);
> }
> ---
>
>
> Thanks!
On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed..20adb6fede2a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > + unsigned long util = scx_cpuperf_target(cpu);
> > +
> > + if (!scx_switched_all())
> > + util += cpu_util_cfs(cpu);
> > +
> > + return effective_cpu_util(cpu, util, NULL, NULL);
> > }
>
> This puts the common case of no ext muck into the slow path of that
> static_branch.
+1
I was about to say the same.
>
> This wants to be something like:
>
> unsigned long sched_cpu_util(int cpu)
> {
> unsigned long util = cpu_util_cfs(cpu);
>
> if (scx_enabled()) {
> unsigned long scx_util = scx_cpuperf_target(cpu);
also scx_cpuperf_target() does not reflect the utilization of the CPU
but the targeted performance level.
>
> if (!scx_switched_all())
> scx_util += util;
>
> util = scx_util;
> }
>
> return effective_cpu_util(cpu, util, NULL, NULL);
> }
On 03/18/26 13:55, Vincent Guittot wrote:
> On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >
> > > unsigned long sched_cpu_util(int cpu)
> > > {
> > > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > + unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > + if (!scx_switched_all())
> > > + util += cpu_util_cfs(cpu);
> > > +
> > > + return effective_cpu_util(cpu, util, NULL, NULL);
> > > }
> >
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
>
> +1
> I was about to say the same.
>
> >
> > This wants to be something like:
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> > unsigned long util = cpu_util_cfs(cpu);
> >
> > if (scx_enabled()) {
> > unsigned long scx_util = scx_cpuperf_target(cpu);
>
> also scx_cpuperf_target() does not reflect the utilization of the CPU
> but the targeted performance level.
Besides that, this sort of plug-and-play is a big concern. You picked up sched
ext and changed the behavior; then you need to get your thermal management to
work with that, not retrospectively sprinkle these hacks around to force things
to work again.
This is a no from me. I think we have to keep the separation clear. And
I haven't seen a single contribution back to the scheduler out of these
'experiments'. Clearly everyone is going their own way, and raising the
threshold for getting patches merged so as not to break these out-of-tree
'experiments' is a big nuisance. I think the sugov one was already a mistake
to accept.
>
>
> >
> > if (!scx_switched_all())
> > scx_util += util;
> >
> > util = scx_util;
> > }
> >
> > return effective_cpu_util(cpu, util, NULL, NULL);
> > }
On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@layalina.io> wrote:
>
> On 03/18/26 13:55, Vincent Guittot wrote:
> > On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index bf948db905ed..20adb6fede2a 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > >
> > > > unsigned long sched_cpu_util(int cpu)
> > > > {
> > > > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > + unsigned long util = scx_cpuperf_target(cpu);
> > > > +
> > > > + if (!scx_switched_all())
> > > > + util += cpu_util_cfs(cpu);
> > > > +
> > > > + return effective_cpu_util(cpu, util, NULL, NULL);
> > > > }
> > >
> > > This puts the common case of no ext muck into the slow path of that
> > > static_branch.
> >
> > +1
> > I was about to say the same
> >
> > >
> > > This wants to be something like:
> > >
> > > unsigned long sched_cpu_util(int cpu)
> > > {
> > > unsigned long util = cpu_util_cfs(cpu);
> > >
> > > if (scx_enabled()) {
> > > unsigned long scx_util = scx_cpuperf_target(cpu);
> >
> > also scx_cpuperf_target() does not reflect the utilization of the CPU
> > but the targeted performance level
>
> Beside that, this sort of plug-and-play is a big concern. You picked up sched
> ext and changed the behavior, then you'd need to get your thermal management to
> work with that. Not retrospectively sprinkle these hacks around to force things
> to work again.
In fact, I had considered this issue even before sending this patch.
Our initial fix modified the thermal subsystem directly: in
cpufreq_cooling.c, when SCX is enabled, we stopped calling
sched_cpu_util() and instead used an idle-time-based calculation to
obtain CPU util.
However, we later realized that sched_cpu_util() is used not only in
cpufreq_cooling.c but also in dtpm_cpu.c.
Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
This led us to propose the current patch.
Although scx_cpuperf_target may not accurately reflect the true CPU
utilization, as Christian pointed out, it’s still better than having
nothing at all.
That’s exactly why I marked this patch as RFC: I wasn’t sure whether
the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
It now seems clear that modifying only sched_cpu_util() is
insufficient. Ideally, we should also update cpufreq_cooling.c, since
we cannot guarantee that all SCX BPF programs will provide an accurate
cpuperf_target.
I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.
Thanks!
>
> This is a no from me. I think we have to keep the separation clear. And
> I haven't seen a single contribution back to scheduler out of these
> 'experiments'. Clearly everyone is going their own way and getting the
> threshold for us to get patches merged even higher not to break these out of
> tree 'experiments' is a big nuisance. I think the sugov one was already a mistake
> to accept.
>
> >
> >
> > >
> > > if (!scx_switched_all())
> > > scx_util += util;
> > >
> > > util = scx_util;
> > > }
> > >
> > > return effective_cpu_util(cpu, util, NULL, NULL);
> > > }
On 03/19/26 10:13, Xuewen Yan wrote:
> > Beside that, this sort of plug-and-play is a big concern. You picked up sched
> > ext and changed the behavior, then you'd need to get your thermal management to
> > work with that. Not retrospectively sprinkle these hacks around to force things
> > to work again.
>
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically,
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used idle-time-based calculation to
> obtain CPU util.
>
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.
>
> That’s exactly why I marked this patch as RFC, I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.

The bigger-picture problem is that sched_ext cannot be trusted here. For
something as critical as thermal management, you'd have to bake it into
your out-of-tree sched_ext changes. The current system was designed to
work together under certain assumptions, and by opting into sched_ext
you're opting out of those in-tree assumptions. A user-space scheduler
can be used with a user-space cpufreq governor and a user-space thermal
solution; I don't think there are restrictions on creating out-of-tree
governors.
If you have a problem that needs to be fixed in the scheduler, perhaps we
can help with that ;-) If you just want to go and do your own thing,
perhaps you can go all the way and redo it all.
Hi Qais and Xuewen,
On 3/19/26 02:13, Xuewen Yan wrote:
> On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@layalina.io> wrote:
>>
>> On 03/18/26 13:55, Vincent Guittot wrote:
>>> On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>
>>>> On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index bf948db905ed..20adb6fede2a 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>>>>>
>>>>> unsigned long sched_cpu_util(int cpu)
>>>>> {
>>>>> - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
>>>>> + unsigned long util = scx_cpuperf_target(cpu);
>>>>> +
>>>>> + if (!scx_switched_all())
>>>>> + util += cpu_util_cfs(cpu);
>>>>> +
>>>>> + return effective_cpu_util(cpu, util, NULL, NULL);
>>>>> }
>>>>
>>>> This puts the common case of no ext muck into the slow path of that
>>>> static_branch.
>>>
>>> +1
>>> I was about to say the same
>>>
>>>>
>>>> This wants to be something like:
>>>>
>>>> unsigned long sched_cpu_util(int cpu)
>>>> {
>>>> unsigned long util = cpu_util_cfs(cpu);
>>>>
>>>> if (scx_enabled()) {
>>>> unsigned long scx_util = scx_cpuperf_target(cpu);
>>>
>>> also scx_cpuperf_target() does not reflect the utilization of the CPU
>>> but the targeted performance level
>>
>> Beside that, this sort of plug-and-play is a big concern. You picked up sched
>> ext and changed the behavior, then you'd need to get your thermal management to
>> work with that. Not retrospectively sprinkle these hacks around to force things
>> to work again.
>
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically,
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used idle-time-based calculation to
> obtain CPU util.
>
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.
>
> That’s exactly why I marked this patch as RFC, I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.
>
If you are referring to the utilization-driven power estimation, I tend
to agree. You might remember we had a few corner cases due to that
design, and I tried to address it differently [1][2].
Xuewen, please send your proposal for cpufreq_cooling.c and I will
have a look there. If there is still a need for a more fundamental fix,
I can help with deeper thermal changes.
Estimating the real CPU power from the information available in the
kernel is really hard. Still, let's try to fix what you observe on
your devices.
Regards,
Lukasz
[1]
https://lore.kernel.org/linux-pm/20210622075925.16189-1-lukasz.luba@arm.com/
[2]
https://lore.kernel.org/linux-pm/20220406220809.22555-1-lukasz.luba@arm.com/
On Thu, 19 Mar 2026 at 03:13, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>
> On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@layalina.io> wrote:
> >
> > On 03/18/26 13:55, Vincent Guittot wrote:
> > > On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > > index bf948db905ed..20adb6fede2a 100644
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > > >
> > > > > unsigned long sched_cpu_util(int cpu)
> > > > > {
> > > > > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > > + unsigned long util = scx_cpuperf_target(cpu);
> > > > > +
> > > > > + if (!scx_switched_all())
> > > > > + util += cpu_util_cfs(cpu);
> > > > > +
> > > > > + return effective_cpu_util(cpu, util, NULL, NULL);
> > > > > }
> > > >
> > > > This puts the common case of no ext muck into the slow path of that
> > > > static_branch.
> > >
> > > +1
> > > I was about to say the same
> > >
> > > >
> > > > This wants to be something like:
> > > >
> > > > unsigned long sched_cpu_util(int cpu)
> > > > {
> > > > unsigned long util = cpu_util_cfs(cpu);
> > > >
> > > > if (scx_enabled()) {
> > > > unsigned long scx_util = scx_cpuperf_target(cpu);
> > >
> > > also scx_cpuperf_target() does not reflect the utilization of the CPU
> > > but the targeted performance level
> >
> > Beside that, this sort of plug-and-play is a big concern. You picked up sched
> > ext and changed the behavior, then you'd need to get your thermal management to
> > work with that. Not retrospectively sprinkle these hacks around to force things
> > to work again.
>
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically,
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used idle-time-based calculation to
> obtain CPU util.
>
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.
scx_cpuperf_target() reflects the targeted performance level, not the
utilization. That is fine for cpufreq, but not here: you can have a
high performance target with low CPU utilization. You must provide the
right value.
>
> That’s exactly why I marked this patch as RFC, I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.
>
> Thanks!
>
>
> >
> > This is a no from me. I think we have to keep the separation clear. And
> > I haven't seen a single contribution back to scheduler out of these
> > 'experiments'. Clearly everyone is going their own way and getting the
> > threshold for us to get patches merged even higher not to break these out of
> > tree 'experiments' is a big nuisance. I think the sugov one was already a mistake
> > to accept.
> >
> > >
> > >
> > > >
> > > > if (!scx_switched_all())
> > > > scx_util += util;
> > > >
> > > > util = scx_util;
> > > > }
> > > >
> > > > return effective_cpu_util(cpu, util, NULL, NULL);
> > > > }
Using scx with sugov is probably rarer than using fair with sugov, so
reorder the code around an scx_enabled() check instead of the
!scx_switched_all() check, so that the common no-scx case does not sit
in the slow path of that static branch.
No functional change intended.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
@Peter
We probably also want the analogous reorder in sugov then....
kernel/sched/cpufreq_schedutil.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 153232dd8276..06bd453a37eb 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -225,10 +225,16 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
{
- unsigned long min, max, util = scx_cpuperf_target(sg_cpu->cpu);
+ unsigned long min, max, util = cpu_util_cfs_boost(sg_cpu->cpu);
- if (!scx_switched_all())
- util += cpu_util_cfs_boost(sg_cpu->cpu);
+ if (scx_enabled()) {
+ unsigned long scx_util = scx_cpuperf_target(sg_cpu->cpu);
+
+ if (!scx_switched_all())
+ scx_util += util;
+
+ util = scx_util;
+ }
util = effective_cpu_util(sg_cpu->cpu, util, &min, &max);
util = max(util, boost);
sg_cpu->bw_min = min;
--
2.34.1