[PATCH] x86/asm: Switch clflush alternatives to use %a address operand modifier

Posted by Uros Bizjak 2 weeks, 5 days ago
The inline asm used with alternative_input() specifies the address
operand for clflush with the "a" input operand constraint and
explicit "(%[addr])" dereference:

    "clflush (%[addr])", [addr] "a" (addr)

This forces the pointer into %rax and manually encodes the memory
operand in the template. Instead, use the %a address operand
modifier and relax the constraint from "a" to "r":

    "clflush %a[addr]", [addr] "r" (addr)

This lets the compiler choose the register while generating the
correct addressing mode.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/mwait.h | 3 ++-
 arch/x86/kernel/process.c    | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index e4815e15dc9a..fcb7299b293a 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -116,7 +116,8 @@ static __always_inline void mwait_idle_with_hints(u32 eax, u32 ecx)
 	if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
 		const void *addr = &current_thread_info()->flags;
 
-		alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
+		alternative_input("", "clflush %a[addr]",
+				  X86_BUG_CLFLUSH_MONITOR, [addr] "r" (addr));
 		__monitor(addr, 0, 0);
 
 		if (need_resched())
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4c718f8adc59..8e295fb19b10 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -921,7 +921,8 @@ static __cpuidle void mwait_idle(void)
 	if (!current_set_polling_and_test()) {
 		const void *addr = &current_thread_info()->flags;
 
-		alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
+		alternative_input("", "clflush %a[addr]",
+				  X86_BUG_CLFLUSH_MONITOR, [addr] "r" (addr));
 		__monitor(addr, 0, 0);
 		if (need_resched())
 			goto out;
-- 
2.53.0
Re: [PATCH] x86/asm: Switch clflush alternatives to use %a address operand modifier
Posted by David Laight 2 weeks, 5 days ago
On Wed, 18 Mar 2026 10:08:11 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> The inline asm used with alternative_input() specifies the address
> operand for clflush with the "a" input operand constraint and
> explicit "(%[addr])" dereference:
> 
>     "clflush (%[addr])", [addr] "a" (addr)
> 
> This forces the pointer into %rax and manually encodes the memory
> operand in the template. Instead, use the %a address operand
> modifier and relax the constraint from "a" to "r":
> 
>     "clflush %a[addr]", [addr] "r" (addr)
> 
> This lets the compiler choose the register while generating the
> correct addressing mode.

Aren't these two independent changes?
%a saves you having to know how to write the memory reference for the
architecture - so is the same as (%[addr]) (assuming att syntax).
I think the assembler handles the one 'odd' case of (%rbp).

Was there ever a reason for using "a" rather than "r" - it seems an
unusual choice.
I also think there should be a "memory" clobber - but it probably
makes no difference for these two cases.

	David
 
> 
> No functional change intended.
> 
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  arch/x86/include/asm/mwait.h | 3 ++-
>  arch/x86/kernel/process.c    | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index e4815e15dc9a..fcb7299b293a 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -116,7 +116,8 @@ static __always_inline void mwait_idle_with_hints(u32 eax, u32 ecx)
>  	if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
>  		const void *addr = &current_thread_info()->flags;
>  
> -		alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
> +		alternative_input("", "clflush %a[addr]",
> +				  X86_BUG_CLFLUSH_MONITOR, [addr] "r" (addr));
>  		__monitor(addr, 0, 0);
>  
>  		if (need_resched())
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 4c718f8adc59..8e295fb19b10 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -921,7 +921,8 @@ static __cpuidle void mwait_idle(void)
>  	if (!current_set_polling_and_test()) {
>  		const void *addr = &current_thread_info()->flags;
>  
> -		alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
> +		alternative_input("", "clflush %a[addr]",
> +				  X86_BUG_CLFLUSH_MONITOR, [addr] "r" (addr));
>  		__monitor(addr, 0, 0);
>  		if (need_resched())
>  			goto out;
Re: [PATCH] x86/asm: Switch clflush alternatives to use %a address operand modifier
Posted by Uros Bizjak 2 weeks, 5 days ago
On Wed, Mar 18, 2026 at 4:03 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Wed, 18 Mar 2026 10:08:11 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > The inline asm used with alternative_input() specifies the address
> > operand for clflush with the "a" input operand constraint and
> > explicit "(%[addr])" dereference:
> >
> >     "clflush (%[addr])", [addr] "a" (addr)
> >
> > This forces the pointer into %rax and manually encodes the memory
> > operand in the template. Instead, use the %a address operand
> > modifier and relax the constraint from "a" to "r":
> >
> >     "clflush %a[addr]", [addr] "r" (addr)
> >
> > This lets the compiler choose the register while generating the
> > correct addressing mode.
>
> Aren't these two independent changes?

I was hoping I could fold the trivial "a" -> "r" change in as an "also
..." change. OTOH, let's change the summary to "x86/asm: Improve
clflush alternatives assembly", which will also cover your proposed
addition of a "memory" clobber.

> %a saves you having to know how to write the memory reference for the
> architecture - so is the same as (%[addr]) (assuming att syntax).
> I think the assembler handles the one 'odd' case of (%rbp).

Yes, it does, and also fixes another 'odd' case of (%r13).
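For reference, a sketch of why these base registers are 'odd' at the encoding level. The function encode_clflush() below is a hypothetical illustration, not assembler or kernel code; it assumes 64-bit mode and register numbers 0..15 (rax=0, rsp=4, rbp=5, r12=12, r13=13). The rsp/r12 SIB case is included for completeness, although it is not discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode "clflush (%reg)" (opcode 0F AE /7) into out[], which must hold
 * at least 5 bytes. Returns the number of bytes written.
 */
static size_t encode_clflush(unsigned int reg, uint8_t *out)
{
	unsigned int rm = reg & 7;	/* low three bits go into ModRM.rm */
	size_t n = 0;

	assert(reg < 16);

	if (reg >= 8)
		out[n++] = 0x41;	/* REX.B selects r8..r15 as base */

	out[n++] = 0x0f;
	out[n++] = 0xae;

	if (rm == 5) {
		/*
		 * mod=00 with rm=101 means RIP-relative disp32, so a plain
		 * (%rbp) or (%r13) is emitted as mod=01 with a zero disp8.
		 */
		out[n++] = (uint8_t)(0x40 | (7u << 3) | rm);
		out[n++] = 0x00;
	} else {
		out[n++] = (uint8_t)((7u << 3) | rm);	/* mod=00 */
		if (rm == 4)
			out[n++] = 0x24;	/* rm=100 requires a SIB byte */
	}

	return n;
}
```

With this sketch, register 0 gives 0f ae 38, register 5 gives 0f ae 7d 00, and register 13 gives 41 0f ae 7d 00, i.e. the extra zero displacement byte is what makes (%rbp) and (%r13) one byte longer than the other bases.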

> Was there ever a reason for using "a" rather than "r" - it seems an
> unusual choice.

Probably just an oversight due to a follow-up __monitor() that wants
its operand in %rax.

> I also think there should be a "memory" clobber - but it probably
> makes no difference for these two cases.

Hm, I think this is a good proposal. The pointer in the register is
invisible to the compiler memory tracker, so the compiler is free to
schedule (potentially related!) memory access around clflush. The
clobber doesn't make a difference in this particular case, but should
be there nevertheless as a memory read/write barrier.

Thanks,
Uros.
Re: [PATCH] x86/asm: Switch clflush alternatives to use %a address operand modifier
Posted by David Laight 2 weeks, 4 days ago
On Wed, 18 Mar 2026 16:45:28 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> On Wed, Mar 18, 2026 at 4:03 PM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Wed, 18 Mar 2026 10:08:11 +0100
> > Uros Bizjak <ubizjak@gmail.com> wrote:
> >  
> > > The inline asm used with alternative_input() specifies the address
> > > operand for clflush with the "a" input operand constraint and
> > > explicit "(%[addr])" dereference:
> > >
> > >     "clflush (%[addr])", [addr] "a" (addr)
> > >
> > > This forces the pointer into %rax and manually encodes the memory
> > > operand in the template. Instead, use the %a address operand
> > > modifier and relax the constraint from "a" to "r":
> > >
> > >     "clflush %a[addr]", [addr] "r" (addr)
> > >
> > > This lets the compiler choose the register while generating the
> > > correct addressing mode.  
> >
> > Aren't these two independent changes?  
> 
> I was hoping I can put a trivial "a" -> "r" change under the "also
> ..." change. OTOH, let's change the summary to "x86/asm: Improve
> clflush alternatives assembly", that will also handle your proposed
> addition of "memory" clobber.
> 
> > %a saves you having to know how to write the memory reference for the
> > architecture - so is the same as (%[addr]) (assuming att syntax).
> > I think the assembler handles the one 'odd' case of (%rbp).  
> 
> Yes, it does, and also fixes another 'odd' case of (%r13).
> 
> > Was there ever a reason for using "a" rather than "r" - it seems an
> > unusual choice.  
> 
> Probably just an oversight due to a follow-up __monitor() that wants
> its operand in %rax.

Actually gcc can be quite bad at reverse-tracking register requirements.
So forcing 'addr' into %rax for the clflush might actually remove
a register move before the monitor.
Indeed, were it to pick a different register there will always be an
extra register move.
If the value is in a different register (eg from a function call)
then the register move just ends up in a different place - but there'll
still be one.

So I suspect this change can never improve the code.

	David

> 
> > I also think there should be a "memory" clobber - but it probably
> > makes no difference for these two cases.  
> 
> Hm, I think this is a good proposal. The pointer in the register is
> invisible to the compiler memory tracker, so the compiler is free to
> schedule (potentially related!) memory access around clflush. The
> clobber doesn't make a difference in this particular case, but should
> be there nevertheless as a memory read/write barrier.
> 
> Thanks,
> Uros.
> 
Re: [PATCH] x86/asm: Switch clflush alternatives to use %a address operand modifier
Posted by Uros Bizjak 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 11:20 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Wed, 18 Mar 2026 16:45:28 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > On Wed, Mar 18, 2026 at 4:03 PM David Laight
> > <david.laight.linux@gmail.com> wrote:
> > >
> > > On Wed, 18 Mar 2026 10:08:11 +0100
> > > Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > > The inline asm used with alternative_input() specifies the address
> > > > operand for clflush with the "a" input operand constraint and
> > > > explicit "(%[addr])" dereference:
> > > >
> > > >     "clflush (%[addr])", [addr] "a" (addr)
> > > >
> > > > This forces the pointer into %rax and manually encodes the memory
> > > > operand in the template. Instead, use the %a address operand
> > > > modifier and relax the constraint from "a" to "r":
> > > >
> > > >     "clflush %a[addr]", [addr] "r" (addr)
> > > >
> > > > This lets the compiler choose the register while generating the
> > > > correct addressing mode.
> > >
> > > Aren't these two independent changes?
> >
> > I was hoping I can put a trivial "a" -> "r" change under the "also
> > ..." change. OTOH, let's change the summary to "x86/asm: Improve
> > clflush alternatives assembly", that will also handle your proposed
> > addition of "memory" clobber.
> >
> > > %a saves you having to know how to write the memory reference for the
> > > architecture - so is the same as (%[addr]) (assuming att syntax).
> > > I think the assembler handles the one 'odd' case of (%rbp).
> >
> > Yes, it does, and also fixes another 'odd' case of (%r13).
> >
> > > Was there ever a reason for using "a" rather than "r" - it seems an
> > > unusual choice.
> >
> > Probably just an oversight due to a follow-up __monitor() that wants
> > its operand in %rax.
>
> Actually gcc can be quite bad at reverse-tracking register requirements.

This must be a very old GCC as I'm not aware of this deficiency.

--cut here--
void foo (int a)
{
  asm volatile ("# 1" : : "r" (a));
  asm volatile ("# 2" : : "a" (a));
}

void bar (int a)
{
  asm volatile ("# 1" : : "a" (a));
  asm volatile ("# 2" : : "a" (a));
}
--cut here--

foo:
       movl    %edi, %eax
       # 1
       # 2
       ret

bar:
       movl    %edi, %eax
       # 1
       # 2
       ret

Do you perhaps have a testcase to illustrate your claim?

> So forcing 'addr' into %rax for the clflush might actually remove
> a register move before the monitor.
> Indeed, were it to pick a different register there will always be an
> extra register move.
> If the value is in a different register (eg from a function call)
> then you'll move the register move instruction - but there'll still
> be one.
>
> So I suspect this change can never improve the code.

Of course, there will always be a register move in the above case, but
please look at [1].

[1] https://claude.ai/share/cf559f66-dfcf-451a-8260-6f687aead052

Uros.
Re: [PATCH] x86/asm: Switch clflush alternatives to use %a address operand modifier
Posted by David Laight 2 weeks, 4 days ago
On Thu, 19 Mar 2026 11:45:59 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> On Thu, Mar 19, 2026 at 11:20 AM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Wed, 18 Mar 2026 16:45:28 +0100
> > Uros Bizjak <ubizjak@gmail.com> wrote:
> >  
> > > On Wed, Mar 18, 2026 at 4:03 PM David Laight
> > > <david.laight.linux@gmail.com> wrote:  
> > > >
> > > > On Wed, 18 Mar 2026 10:08:11 +0100
> > > > Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >  
> > > > > The inline asm used with alternative_input() specifies the address
> > > > > operand for clflush with the "a" input operand constraint and
> > > > > explicit "(%[addr])" dereference:
> > > > >
> > > > >     "clflush (%[addr])", [addr] "a" (addr)
> > > > >
> > > > > This forces the pointer into %rax and manually encodes the memory
> > > > > operand in the template. Instead, use the %a address operand
> > > > > modifier and relax the constraint from "a" to "r":
> > > > >
> > > > >     "clflush %a[addr]", [addr] "r" (addr)
> > > > >
> > > > > This lets the compiler choose the register while generating the
> > > > > correct addressing mode.  
> > > >
> > > > Aren't these two independent changes?  
> > >
> > > I was hoping I can put a trivial "a" -> "r" change under the "also
> > > ..." change. OTOH, let's change the summary to "x86/asm: Improve
> > > clflush alternatives assembly", that will also handle your proposed
> > > addition of "memory" clobber.
> > >  
> > > > %a saves you having to know how to write the memory reference for the
> > > > architecture - so is the same as (%[addr]) (assuming att syntax).
> > > > I think the assembler handles the one 'odd' case of (%rbp).  
> > >
> > > Yes, it does, and also fixes another 'odd' case of (%r13).
> > >  
> > > > Was there ever a reason for using "a" rather than "r" - it seems an
> > > > unusual choice.  
> > >
> > > Probably just an oversight due to a follow-up __monitor() that wants
> > > its operand in %rax.  
> >
> > > Actually gcc can be quite bad at reverse-tracking register requirements.
> 
> This must be a very old GCC as I'm not aware of this deficiency.
> 
> --cut here--
> void foo (int a)
> {
>   asm volatile ("# 1" : : "r" (a));
>   asm volatile ("# 2" : : "a" (a));
> }
> 
> void bar (int a)
> {
>   asm volatile ("# 1" : : "a" (a));
>   asm volatile ("# 2" : : "a" (a));
> }
> --cut here--
> 
> foo:
>        movl    %edi, %eax
>        # 1
>        # 2
>        ret
> 
> bar:
>        movl    %edi, %eax
>        # 1
>        # 2
>        ret
> 
> Do you perhaps have a testcase to illustrate your claim?

If you look at enough gcc output you'll see places where there are register
moves that look like they could be removed by adjusting the register
assignments.
I'm pretty sure Linus has commented about that as well.
Whether it can happen in this trivial case is another matter.

Oh - I can't see anything in the gcc 15.2 doc that says that the order
of 'asm volatile' statements can't get swapped.
I'm also pretty sure that some older (possibly very much older) versions
definitely would swap them over.
There might have been a post from someone saying that 'it doesn't do that
any more', but it isn't documented. 

	David

> 
> > So forcing 'addr' into %rax for the clflush might actually remove
> > a register move before the monitor.
> > Indeed, were it to pick a different register there will always be an
> > extra register move.
> > If the value is in a different register (eg from a function call)
> > then you'll move the register move instruction - but there'll still
> > be one.
> >
> > So I suspect this change can never improve the code.  
> 
> Of course, there will always be a register move in the above case, but
> please look at [1].
> 
> [1] https://claude.ai/share/cf559f66-dfcf-451a-8260-6f687aead052
> 
> Uros.
Re: [PATCH] x86/asm: Switch clflush alternatives to use %a address operand modifier
Posted by Uros Bizjak 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 12:21 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Thu, 19 Mar 2026 11:45:59 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > On Thu, Mar 19, 2026 at 11:20 AM David Laight
> > <david.laight.linux@gmail.com> wrote:
> > >
> > > On Wed, 18 Mar 2026 16:45:28 +0100
> > > Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > > On Wed, Mar 18, 2026 at 4:03 PM David Laight
> > > > <david.laight.linux@gmail.com> wrote:
> > > > >
> > > > > On Wed, 18 Mar 2026 10:08:11 +0100
> > > > > Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > > The inline asm used with alternative_input() specifies the address
> > > > > > operand for clflush with the "a" input operand constraint and
> > > > > > explicit "(%[addr])" dereference:
> > > > > >
> > > > > >     "clflush (%[addr])", [addr] "a" (addr)
> > > > > >
> > > > > > This forces the pointer into %rax and manually encodes the memory
> > > > > > operand in the template. Instead, use the %a address operand
> > > > > > modifier and relax the constraint from "a" to "r":
> > > > > >
> > > > > >     "clflush %a[addr]", [addr] "r" (addr)
> > > > > >
> > > > > > This lets the compiler choose the register while generating the
> > > > > > correct addressing mode.
> > > > >
> > > > > Aren't these two independent changes?
> > > >
> > > > I was hoping I can put a trivial "a" -> "r" change under the "also
> > > > ..." change. OTOH, let's change the summary to "x86/asm: Improve
> > > > clflush alternatives assembly", that will also handle your proposed
> > > > addition of "memory" clobber.
> > > >
> > > > > %a saves you having to know how to write the memory reference for the
> > > > > architecture - so is the same as (%[addr]) (assuming att syntax).
> > > > > I think the assembler handles the one 'odd' case of (%rbp).
> > > >
> > > > Yes, it does, and also fixes another 'odd' case of (%r13).
> > > >
> > > > > Was there ever a reason for using "a" rather than "r" - it seems an
> > > > > unusual choice.
> > > >
> > > > Probably just an oversight due to a follow-up __monitor() that wants
> > > > its operand in %rax.
> > >
> > > Actually gcc can be quite bad at reverse-tracking register requirements.
> >
> > This must be a very old GCC as I'm not aware of this deficiency.
> >
> > --cut here--
> > void foo (int a)
> > {
> >   asm volatile ("# 1" : : "r" (a));
> >   asm volatile ("# 2" : : "a" (a));
> > }
> >
> > void bar (int a)
> > {
> >   asm volatile ("# 1" : : "a" (a));
> >   asm volatile ("# 2" : : "a" (a));
> > }
> > --cut here--
> >
> > foo:
> >        movl    %edi, %eax
> >        # 1
> >        # 2
> >        ret
> >
> > bar:
> >        movl    %edi, %eax
> >        # 1
> >        # 2
> >        ret
> >
> > Do you perhaps have a testcase to illustrate your claim?
>
> If you look at enough gcc output you'll see places where there are register
> moves that look like they could be removed by adjusting the register
> assignments.
> I'm pretty sure Linus has commented about that as well.
> Whether it can happen in this trivial case is another matter.
>
> Oh - I can't see anything in the gcc 15.2 doc that says that the order
> of 'asm volatile' statements can't get swapped.
> I'm also pretty sure that some older (possibly very much older) versions
> definitely would swap them over.
> There might have been a post from someone saying that 'it doesn't do that
> any more', but it isn't documented.

It isn't explicitly documented. But rest assured that volatile asm
statements won't be scheduled around each other:

from gcc/sched-deps.cc:

       Traditional and volatile asm instructions must be considered to use
       and clobber all hard registers, all pseudo-registers and all of
       memory.  So must TRAP_IF and UNSPEC_VOLATILE operations.
...
      reg_pending_barrier = TRUE_BARRIER;

Uros.
Re: [PATCH] x86/asm: Switch clflush alternatives to use %a address operand modifier
Posted by Uros Bizjak 2 weeks, 5 days ago
On Wed, Mar 18, 2026 at 10:08 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> The inline asm used with alternative_input() specifies the address
> operand for clflush with the "a" input operand constraint and
> explicit "(%[addr])" dereference:
>
>     "clflush (%[addr])", [addr] "a" (addr)
>
> This forces the pointer into %rax and manually encodes the memory
> operand in the template. Instead, use the %a address operand
> modifier and relax the constraint from "a" to "r":
>
>     "clflush %a[addr]", [addr] "r" (addr)
>
> This lets the compiler choose the register while generating the
> correct addressing mode.
>
> No functional change intended.
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>

Whoops, I copied the wrong stack of emails. This patch is *NOT* Acked-by Peter.

Uros.