The inline asm used with alternative_input() specifies the address
operand for clflush with the "a" input operand constraint and
explicit "(%[addr])" dereference:
"clflush (%[addr])", [addr] "a" (addr)
This forces the pointer into %rax and manually encodes the memory
operand in the template. Instead, use the %a address operand
modifier and relax the constraint from "a" to "r":
"clflush %a[addr]", [addr] "r" (addr)
This lets the compiler choose the register while generating the
correct addressing mode.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/include/asm/mwait.h | 3 ++-
arch/x86/kernel/process.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index e4815e15dc9a..fcb7299b293a 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -116,7 +116,8 @@ static __always_inline void mwait_idle_with_hints(u32 eax, u32 ecx)
if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
const void *addr = &current_thread_info()->flags;
- alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
+ alternative_input("", "clflush %a[addr]",
+ X86_BUG_CLFLUSH_MONITOR, [addr] "r" (addr));
__monitor(addr, 0, 0);
if (need_resched())
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4c718f8adc59..8e295fb19b10 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -921,7 +921,8 @@ static __cpuidle void mwait_idle(void)
if (!current_set_polling_and_test()) {
const void *addr = &current_thread_info()->flags;
- alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
+ alternative_input("", "clflush %a[addr]",
+ X86_BUG_CLFLUSH_MONITOR, [addr] "r" (addr));
__monitor(addr, 0, 0);
if (need_resched())
goto out;
--
2.53.0
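
For readers who want to try the two operand forms from the commit message outside the kernel, a minimal user-space sketch might look like the following. The helper names flush_line_fixed_reg() and flush_line_any_reg() are made up for illustration, plain inline asm stands in for alternative_input(), and the non-x86 fallback is an assumption added only so the file builds on other targets.

```c
/* User-space sketch; the helper names are hypothetical, not kernel API. */
#if defined(__x86_64__) || defined(__i386__)

/*
 * Old form: "a" pins the pointer to %rax (%eax on 32-bit) and the
 * template spells out the AT&T dereference by hand.
 */
static inline void flush_line_fixed_reg(const void *addr)
{
	__asm__ volatile("clflush (%[addr])" : : [addr] "a" (addr));
}

/*
 * New form: "r" allows any general-purpose register and the %a
 * modifier prints the operand as a memory address.
 */
static inline void flush_line_any_reg(const void *addr)
{
	__asm__ volatile("clflush %a[addr]" : : [addr] "r" (addr));
}

#else

/* Assumption: no clflush on this target, so both helpers do nothing. */
static inline void flush_line_fixed_reg(const void *addr) { (void)addr; }
static inline void flush_line_any_reg(const void *addr) { (void)addr; }

#endif
```

Both helpers should flush the same cache line; comparing the output of "gcc -O2 -S" for a caller of each is one way to see which register the compiler picks in the second form.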
On Wed, 18 Mar 2026 10:08:11 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:
> The inline asm used with alternative_input() specifies the address
> operand for clflush with the "a" input operand constraint and
> explicit "(%[addr])" dereference:
>
> "clflush (%[addr])", [addr] "a" (addr)
>
> This forces the pointer into %rax and manually encodes the memory
> operand in the template. Instead, use the %a address operand
> modifier and relax the constraint from "a" to "r":
>
> "clflush %a[addr]", [addr] "r" (addr)
>
> This lets the compiler choose the register while generating the
> correct addressing mode.
Aren't these two independent changes?
%a saves you having to know how to write the memory reference for the
architecture - so is the same as (%[addr]) (assuming att syntax).
I think the assembler handles the one 'odd' case of (%rbp).
Was there ever a reason for using "a" rather than "r" - it seems an
unusual choice.
I also think there should be a "memory" clobber - but it probably
makes no difference for these two cases.
David
>
> No functional change intended.
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
> arch/x86/include/asm/mwait.h | 3 ++-
> arch/x86/kernel/process.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index e4815e15dc9a..fcb7299b293a 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -116,7 +116,8 @@ static __always_inline void mwait_idle_with_hints(u32 eax, u32 ecx)
> if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
> const void *addr = &current_thread_info()->flags;
>
> - alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
> + alternative_input("", "clflush %a[addr]",
> + X86_BUG_CLFLUSH_MONITOR, [addr] "r" (addr));
> __monitor(addr, 0, 0);
>
> if (need_resched())
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 4c718f8adc59..8e295fb19b10 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -921,7 +921,8 @@ static __cpuidle void mwait_idle(void)
> if (!current_set_polling_and_test()) {
> const void *addr = &current_thread_info()->flags;
>
> - alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
> + alternative_input("", "clflush %a[addr]",
> + X86_BUG_CLFLUSH_MONITOR, [addr] "r" (addr));
> __monitor(addr, 0, 0);
> if (need_resched())
> goto out;
On Wed, Mar 18, 2026 at 4:03 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Wed, 18 Mar 2026 10:08:11 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > The inline asm used with alternative_input() specifies the address
> > operand for clflush with the "a" input operand constraint and
> > explicit "(%[addr])" dereference:
> >
> > "clflush (%[addr])", [addr] "a" (addr)
> >
> > This forces the pointer into %rax and manually encodes the memory
> > operand in the template. Instead, use the %a address operand
> > modifier and relax the constraint from "a" to "r":
> >
> > "clflush %a[addr]", [addr] "r" (addr)
> >
> > This lets the compiler choose the register while generating the
> > correct addressing mode.
>
> Aren't these two independent changes?

I was hoping I can put a trivial "a" -> "r" change under the "also
..." change. OTOH, let's change the summary to "x86/asm: Improve
clflush alternatives assembly", that will also handle your proposed
addition of "memory" clobber.

> %a saves you having to know how to write the memory reference for the
> architecture - so is the same as (%[addr]) (assuming att syntax).
> I think the assembler handles the one 'odd' case of (%rbp).

Yes, it does, and also fixes another 'odd' case of (%r13).

> Was there ever a reason for using "a" rather than "r" - it seems an
> unusual choice.

Probably just an oversight due to a follow-up __monitor() that wants
its operand in %rax.

> I also think there should be a "memory" clobber - but it probably
> makes no difference for these two cases.

Hm, I think this is a good proposal. The pointer in the register is
invisible to the compiler memory tracker, so the compiler is free to
schedule (potentially related!) memory access around clflush. The
clobber doesn't make a difference in this particular case, but should
be there nevertheless as a memory read/write barrier.

Thanks,
Uros.
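
The "memory" clobber discussed in this reply can be sketched in user space as follows. This is an illustration under stated assumptions rather than the eventual kernel change: flush_line_clobber() and store_then_flush() are hypothetical names, plain inline asm replaces alternative_input(), and the non-x86 branch is a stand-in that keeps only the compiler barrier.

```c
/* User-space sketch; function names are hypothetical, not kernel API. */
#if defined(__x86_64__) || defined(__i386__)

/*
 * The pointer is passed in a register, so without the clobber the
 * compiler has no way to know that the asm touches memory.
 */
static inline void flush_line_clobber(const void *addr)
{
	__asm__ volatile("clflush %a[addr]"
			 : : [addr] "r" (addr) : "memory");
}

#else

/* Assumption: no clflush here, keep only the compiler barrier. */
static inline void flush_line_clobber(const void *addr)
{
	(void)addr;
	__asm__ volatile("" : : : "memory");
}

#endif

/*
 * With the clobber, the store to *p may not be moved past the asm and
 * the value of *p is not kept cached in a register across it.
 */
static int store_then_flush(int *p, int v)
{
	*p = v;
	flush_line_clobber(p);
	return *p;
}
```

The clobber costs nothing at run time; it only constrains how the compiler may schedule and cache memory accesses around the statement.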
On Wed, 18 Mar 2026 16:45:28 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> On Wed, Mar 18, 2026 at 4:03 PM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Wed, 18 Mar 2026 10:08:11 +0100
> > Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > > The inline asm used with alternative_input() specifies the address
> > > operand for clflush with the "a" input operand constraint and
> > > explicit "(%[addr])" dereference:
> > >
> > > "clflush (%[addr])", [addr] "a" (addr)
> > >
> > > This forces the pointer into %rax and manually encodes the memory
> > > operand in the template. Instead, use the %a address operand
> > > modifier and relax the constraint from "a" to "r":
> > >
> > > "clflush %a[addr]", [addr] "r" (addr)
> > >
> > > This lets the compiler choose the register while generating the
> > > correct addressing mode.
> >
> > Aren't these two independent changes?
>
> I was hoping I can put a trivial "a" -> "r" change under the "also
> ..." change. OTOH, let's change the summary to "x86/asm: Improve
> clflush alternatives assembly", that will also handle your proposed
> addition of "memory" clobber.
>
> > %a saves you having to know how to write the memory reference for the
> > architecture - so is the same as (%[addr]) (assuming att syntax).
> > I think the assembler handles the one 'odd' case of (%rbp).
>
> Yes, it does, and also fixes another 'odd' case of (%r13).
>
> > Was there ever a reason for using "a" rather than "r" - it seems an
> > unusual choice.
>
> Probably just an oversight due to a follow-up __monitor() that wants
> its operand in %rax.

Actually gcc can be quite bad at reverse tracking register requirements.
So forcing 'addr' into %rax for the clflush might actually remove
a register move before the monitor.
Indeed, were it to pick a different register there will always be an
extra register move.
If the value is in a different register (eg from a function call)
then you'll move the register move instruction - but there'll still
be one.

So I suspect this change can never improve the code.

David

> > I also think there should be a "memory" clobber - but it probably
> > makes no difference for these two cases.
>
> Hm, I think this is a good proposal. The pointer in the register is
> invisible to the compiler memory tracker, so the compiler is free to
> schedule (potentially related!) memory access around clflush. The
> clobber doesn't make a difference in this particular case, but should
> be there nevertheless as a memory read/write barrier.
>
> Thanks,
> Uros.
>
On Thu, Mar 19, 2026 at 11:20 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Wed, 18 Mar 2026 16:45:28 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > On Wed, Mar 18, 2026 at 4:03 PM David Laight
> > <david.laight.linux@gmail.com> wrote:
> > >
> > > On Wed, 18 Mar 2026 10:08:11 +0100
> > > Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > > The inline asm used with alternative_input() specifies the address
> > > > operand for clflush with the "a" input operand constraint and
> > > > explicit "(%[addr])" dereference:
> > > >
> > > > "clflush (%[addr])", [addr] "a" (addr)
> > > >
> > > > This forces the pointer into %rax and manually encodes the memory
> > > > operand in the template. Instead, use the %a address operand
> > > > modifier and relax the constraint from "a" to "r":
> > > >
> > > > "clflush %a[addr]", [addr] "r" (addr)
> > > >
> > > > This lets the compiler choose the register while generating the
> > > > correct addressing mode.
> > >
> > > Aren't these two independent changes?
> >
> > I was hoping I can put a trivial "a" -> "r" change under the "also
> > ..." change. OTOH, let's change the summary to "x86/asm: Improve
> > clflush alternatives assembly", that will also handle your proposed
> > addition of "memory" clobber.
> >
> > > %a saves you having to know how to write the memory reference for the
> > > architecture - so is the same as (%[addr]) (assuming att syntax).
> > > I think the assembler handles the one 'odd' case of (%rbp).
> >
> > Yes, it does, and also fixes another 'odd' case of (%r13).
> >
> > > Was there ever a reason for using "a" rather than "r" - it seems an
> > > unusual choice.
> >
> > Probably just an oversight due to a follow-up __monitor() that wants
> > its operand in %rax.
>
> Actually gcc can be quite bad at reverse tracking register requirements.
This must be a very old GCC as I'm not aware of this deficiency.
--cut here--
void foo (int a)
{
asm volatile ("# 1" : : "r" (a));
asm volatile ("# 2" : : "a" (a));
}
void bar (int a)
{
asm volatile ("# 1" : : "a" (a));
asm volatile ("# 2" : : "a" (a));
}
--cut here--
foo:
movl %edi, %eax
# 1
# 2
ret
bar:
movl %edi, %eax
# 1
# 2
ret
Do you perhaps have a testcase to illustrate your claim?
> So forcing 'addr' into %rax for the clflush might actually remove
> a register move before the monitor.
> Indeed, were it to pick a different register there will always be an
> extra register move.
> If the value is in a different register (eg from a function call)
> then you'll move the register move instruction - but there'll still
> be one.
>
> So I suspect this change can never improve the code.
Of course, there will always be a register move in the above case, but
please look at [1].
[1] https://claude.ai/share/cf559f66-dfcf-451a-8260-6f687aead052
Uros.
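
A testcase closer to the code under discussion than foo()/bar() could pair a real clflush with a placeholder for the following __monitor(). The sketch below is only a starting point for inspecting compiler output (for example with "gcc -O2 -S"); it makes no claim about what any particular compiler version emits. The function names are hypothetical, and the "# monitor" comment merely stands in for an instruction that wants its operand in %rax.

```c
/* Inspection-only sketch; function names are hypothetical. */
#if defined(__x86_64__) || defined(__i386__)

/* Variant from the patch: clflush may use any register. */
void flush_r_then_monitor(const void *addr)
{
	__asm__ volatile("clflush %a[addr]" : : [addr] "r" (addr));
	__asm__ volatile("# monitor" : : "a" (addr));
}

/* Variant before the patch: clflush operand pinned to %rax. */
void flush_a_then_monitor(const void *addr)
{
	__asm__ volatile("clflush (%[addr])" : : [addr] "a" (addr));
	__asm__ volatile("# monitor" : : "a" (addr));
}

#else

/* Assumption: nothing to compare on non-x86 targets. */
void flush_r_then_monitor(const void *addr) { (void)addr; }
void flush_a_then_monitor(const void *addr) { (void)addr; }

#endif
```

Comparing the two functions in the generated assembly shows whether the relaxed constraint costs an extra register move in this simple arrangement.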
On Thu, 19 Mar 2026 11:45:59 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Mar 19, 2026 at 11:20 AM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Wed, 18 Mar 2026 16:45:28 +0100
> > Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > > On Wed, Mar 18, 2026 at 4:03 PM David Laight
> > > <david.laight.linux@gmail.com> wrote:
> > > >
> > > > On Wed, 18 Mar 2026 10:08:11 +0100
> > > > Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > > The inline asm used with alternative_input() specifies the address
> > > > > operand for clflush with the "a" input operand constraint and
> > > > > explicit "(%[addr])" dereference:
> > > > >
> > > > > "clflush (%[addr])", [addr] "a" (addr)
> > > > >
> > > > > This forces the pointer into %rax and manually encodes the memory
> > > > > operand in the template. Instead, use the %a address operand
> > > > > modifier and relax the constraint from "a" to "r":
> > > > >
> > > > > "clflush %a[addr]", [addr] "r" (addr)
> > > > >
> > > > > This lets the compiler choose the register while generating the
> > > > > correct addressing mode.
> > > >
> > > > Aren't these two independent changes?
> > >
> > > I was hoping I can put a trivial "a" -> "r" change under the "also
> > > ..." change. OTOH, let's change the summary to "x86/asm: Improve
> > > clflush alternatives assembly", that will also handle your proposed
> > > addition of "memory" clobber.
> > >
> > > > %a saves you having to know how to write the memory reference for the
> > > > architecture - so is the same as (%[addr]) (assuming att syntax).
> > > > I think the assembler handles the one 'odd' case of (%rbp).
> > >
> > > Yes, it does, and also fixes another 'odd' case of (%r13).
> > >
> > > > Was there ever a reason for using "a" rather than "r" - it seems an
> > > > unusual choice.
> > >
> > > Probably just an oversight due to a follow-up __monitor() that wants
> > > its operand in %rax.
> >
> > Actually gcc can be quite bad at reverse tracking register requirements.
>
> This must be a very old GCC as I'm not aware of this deficiency.
>
> --cut here--
> void foo (int a)
> {
> asm volatile ("# 1" : : "r" (a));
> asm volatile ("# 2" : : "a" (a));
> }
>
> void bar (int a)
> {
> asm volatile ("# 1" : : "a" (a));
> asm volatile ("# 2" : : "a" (a));
> }
> --cut here--
>
> foo:
> movl %edi, %eax
> # 1
> # 2
> ret
>
> bar:
> movl %edi, %eax
> # 1
> # 2
> ret
>
> Do you perhaps have a testcase to illustrate your claim?
If you look at enough gcc output you'll see places where there are register
moves that look like they could be removed by adjusting the register
assignments.
I'm pretty sure Linus has commented about that as well.
Whether it can happen in this trivial case is another matter.
Oh - I can't see anything in the gcc 15.2 doc that says that the order
of 'asm volatile' statements can't get swapped.
I'm also pretty sure that some older (possibly very much older) versions
definitely would swap them over.
There might have been a post from someone saying that 'it doesn't do that
any more', but it isn't documented.
David
>
> > So forcing 'addr' into %rax for the clflush might actually remove
> > a register move before the monitor.
> > Indeed, were it to pick a different register there will always be an
> > extra register move.
> > If the value is in a different register (eg from a function call)
> > then you'll move the register move instruction - but there'll still
> > be one.
> >
> > So I suspect this change can never improve the code.
>
> Of course, there will always be a register move in the above case, but
> please look at [1].
>
> [1] https://claude.ai/share/cf559f66-dfcf-451a-8260-6f687aead052
>
> Uros.
On Thu, Mar 19, 2026 at 12:21 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Thu, 19 Mar 2026 11:45:59 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > On Thu, Mar 19, 2026 at 11:20 AM David Laight
> > <david.laight.linux@gmail.com> wrote:
> > >
> > > On Wed, 18 Mar 2026 16:45:28 +0100
> > > Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > > On Wed, Mar 18, 2026 at 4:03 PM David Laight
> > > > <david.laight.linux@gmail.com> wrote:
> > > > >
> > > > > On Wed, 18 Mar 2026 10:08:11 +0100
> > > > > Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > > The inline asm used with alternative_input() specifies the address
> > > > > > operand for clflush with the "a" input operand constraint and
> > > > > > explicit "(%[addr])" dereference:
> > > > > >
> > > > > > "clflush (%[addr])", [addr] "a" (addr)
> > > > > >
> > > > > > This forces the pointer into %rax and manually encodes the memory
> > > > > > operand in the template. Instead, use the %a address operand
> > > > > > modifier and relax the constraint from "a" to "r":
> > > > > >
> > > > > > "clflush %a[addr]", [addr] "r" (addr)
> > > > > >
> > > > > > This lets the compiler choose the register while generating the
> > > > > > correct addressing mode.
> > > > >
> > > > > Aren't these two independent changes?
> > > >
> > > > I was hoping I can put a trivial "a" -> "r" change under the "also
> > > > ..." change. OTOH, let's change the summary to "x86/asm: Improve
> > > > clflush alternatives assembly", that will also handle your proposed
> > > > addition of "memory" clobber.
> > > >
> > > > > %a saves you having to know how to write the memory reference for the
> > > > > architecture - so is the same as (%[addr]) (assuming att syntax).
> > > > > I think the assembler handles the one 'odd' case of (%rbp).
> > > >
> > > > Yes, it does, and also fixes another 'odd' case of (%r13).
> > > >
> > > > > Was there ever a reason for using "a" rather than "r" - it seems an
> > > > > unusual choice.
> > > >
> > > > Probably just an oversight due to a follow-up __monitor() that wants
> > > > its operand in %rax.
> > >
> > > Actually gcc can be quite bad at reverse tracking register requirements.
> >
> > This must be a very old GCC as I'm not aware of this deficiency.
> >
> > --cut here--
> > void foo (int a)
> > {
> > asm volatile ("# 1" : : "r" (a));
> > asm volatile ("# 2" : : "a" (a));
> > }
> >
> > void bar (int a)
> > {
> > asm volatile ("# 1" : : "a" (a));
> > asm volatile ("# 2" : : "a" (a));
> > }
> > --cut here--
> >
> > foo:
> > movl %edi, %eax
> > # 1
> > # 2
> > ret
> >
> > bar:
> > movl %edi, %eax
> > # 1
> > # 2
> > ret
> >
> > Do you perhaps have a testcase to illustrate your claim?
>
> If you look at enough gcc output you'll see places where there are register
> moves that look like they could be removed by adjusting the register
> assignments.
> I'm pretty sure Linus has commented about that as well.
> Whether it can happen in this trivial case is another matter.
>
> Oh - I can't see anything in the gcc 15.2 doc that says that the order
> of 'asm volatile' statements can't get swapped.
> I'm also pretty sure that some older (possibly very much older) versions
> definitely would swap them over.
> There might have been a post from someone saying that 'it doesn't do that
> any more', but it isn't documented.
It isn't explicitly documented. But rest assured that they won't be
scheduled around:
from gcc/sched-deps.cc:
Traditional and volatile asm instructions must be considered to use
and clobber all hard registers, all pseudo-registers and all of
memory. So must TRAP_IF and UNSPEC_VOLATILE operations.
...
reg_pending_barrier = TRUE_BARRIER;
Uros.
On Wed, Mar 18, 2026 at 10:08 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> The inline asm used with alternative_input() specifies the address
> operand for clflush with the "a" input operand constraint and
> explicit "(%[addr])" dereference:
>
> "clflush (%[addr])", [addr] "a" (addr)
>
> This forces the pointer into %rax and manually encodes the memory
> operand in the template. Instead, use the %a address operand
> modifier and relax the constraint from "a" to "r":
>
> "clflush %a[addr]", [addr] "r" (addr)
>
> This lets the compiler choose the register while generating the
> correct addressing mode.
>
> No functional change intended.
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>

Whoops, I copied the wrong stack of emails. This patch is *NOT*
Acked-by Peter.

Uros.