[PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions

Josh Poimboeuf posted 20 patches 9 months, 1 week ago
[PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Josh Poimboeuf 9 months, 1 week ago
Use the standard alternative_io() interface.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 arch/x86/include/asm/barrier.h | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index db70832232d4..489a7ea76384 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -12,12 +12,23 @@
  */
 
 #ifdef CONFIG_X86_32
-#define mb() asm volatile(ALTERNATIVE("lock addl $0,-4(%%esp)", "mfence", \
-				      X86_FEATURE_XMM2) ::: "memory", "cc")
-#define rmb() asm volatile(ALTERNATIVE("lock addl $0,-4(%%esp)", "lfence", \
-				       X86_FEATURE_XMM2) ::: "memory", "cc")
-#define wmb() asm volatile(ALTERNATIVE("lock addl $0,-4(%%esp)", "sfence", \
-				       X86_FEATURE_XMM2) ::: "memory", "cc")
+#define mb() alternative_io("lock addl $0,-4(%%esp)",			\
+			    "mfence", X86_FEATURE_XMM2,			\
+			    ARG(),					\
+			    ARG(),					\
+			    ARG("memory", "cc"))
+
+#define rmb() alternative_io("lock addl $0, -4(%%esp)",			\
+			     "lfence", X86_FEATURE_XMM2,		\
+			     ARG(),					\
+			     ARG(),					\
+			     ARG("memory", "cc"))
+
+#define wmb() alternative_io("lock addl $0, -4(%%esp)",			\
+			     "sfence", X86_FEATURE_XMM2,		\
+			     ARG(),					\
+			     ARG(),					\
+			     ARG("memory", "cc"))
 #else
 #define __mb()	asm volatile("mfence":::"memory")
 #define __rmb()	asm volatile("lfence":::"memory")
-- 
2.48.1
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Linus Torvalds 9 months, 1 week ago
On Fri, 14 Mar 2025 at 11:42, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> +#define mb() alternative_io("lock addl $0,-4(%%esp)",                  \
> +                           "mfence", X86_FEATURE_XMM2,                 \
> +                           ARG(),                                      \
> +                           ARG(),                                      \
> +                           ARG("memory", "cc"))

So all of these patches look like good cleanups to me, but I do wonder
if we should

 (a) not use some naming *quite* as generic as 'ARG()'

 (b) make the asms use ARG_OUT/ARG_IN/ARG_CLOBBER() to clarify

because that ARG(), ARG(), ARGC() pattern looks odd to me.

Maybe it's just me.

Regardless, I do think the series looks like a nice improvement even
in the current form, even if that particular repeated pattern feels
strange.

              Linus
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Ingo Molnar 9 months, 1 week ago
* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 14 Mar 2025 at 11:42, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > +#define mb() alternative_io("lock addl $0,-4(%%esp)",                  \
> > +                           "mfence", X86_FEATURE_XMM2,                 \
> > +                           ARG(),                                      \
> > +                           ARG(),                                      \
> > +                           ARG("memory", "cc"))
> 
> So all of these patches look like good cleanups to me, but I do wonder
> if we should
> 
>  (a) not use some naming *quite* as generic as 'ARG()'
> 
>  (b) make the asms use ARG_OUT/ARG_IN/ARG_CLOBBER() to clarify
> 
> because that ARG(), ARG(), ARGC() pattern looks odd to me.
> 
> Maybe it's just me.

Not just you, and I think the ARG_ prefix still looks a bit too 
generic-C to me, it should be something more specific and unambiguously 
asm() related, like:

	ASM_ARGS_IN(),
	ASM_ARGS_OUT(),
	ASM_ARGS_CLOBBER(),

or maybe even:

	ASM_CONSTRAINT_IN(),
	ASM_CONSTRAINT_OUT(),
	ASM_CONSTRAINT_CLOBBER(),

Because these asm()-ish syntactic constructs look better in separate 
lines anyway, and it's not like we are at risk of running out of 
letters in the kernel anytime soon.

Thanks,

	Ingo
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Josh Poimboeuf 9 months, 1 week ago
On Fri, Mar 14, 2025 at 01:49:48PM -1000, Linus Torvalds wrote:
> So all of these patches look like good cleanups to me, but I do wonder
> if we should
> 
>  (a) not use some naming *quite* as generic as 'ARG()'
> 
>  (b) make the asms use ARG_OUT/ARG_IN/ARG_CLOBBER() to clarify
> 
> because that ARG(), ARG(), ARGC() pattern looks odd to me.
> 
> Maybe it's just me.
> 
> Regardless, I do think the series looks like a nice improvement even
> in the current form, even if that particular repeated pattern feels
> strange.

So originally I had ASM_OUTPUT/ASM_INPUT/ASM_CLOBBER, but I ended up
going with ARG() due to its nice vertical alignment and conciseness:


	__asm_call(qual,						\
		ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
			    "cmpxchg8b " __percpu_arg([var]),		\
			    X86_FEATURE_CX8),				\
		ARG([var] "+m" (__my_cpu_var(_var)), "+a" (old__.low),	\
		    "+d" (old__.high)),					\
		ARG("b" (new__.low), "c" (new__.high), "S" (&(_var))),	\
		ARG("memory"));						\


Though ASM_OUTPUT/ASM_INPUT/ASM_CLOBBER isn't so bad either:

	__asm_call(qual,						\
		ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
			    "cmpxchg8b " __percpu_arg([var]),		\
			    X86_FEATURE_CX8),				\
		ASM_OUTPUT([var] "+m" (__my_cpu_var(_var)),		\
			   "+a" (old__.low), "+d" (old__.high)),	\
		ASM_INPUT("b" (new__.low), "c" (new__.high),		\
			  "S" (&(_var))),				\
		ASM_CLOBBER("memory"));					\


That has the nice benefit of being more self-documenting, albeit more
verbose and less vertically aligned.

So I could go either way, really.

-- 
Josh
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by David Laight 9 months ago
On Fri, 14 Mar 2025 17:05:34 -0700
Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> On Fri, Mar 14, 2025 at 01:49:48PM -1000, Linus Torvalds wrote:
> > So all of these patches look like good cleanups to me, but I do wonder
> > if we should
> > 
> >  (a) not use some naming *quite* as generic as 'ARG()'
> > 
> >  (b) make the asms use ARG_OUT/ARG_IN/ARG_CLOBBER() to clarify
> > 
> > because that ARG(), ARG(), ARGC() pattern looks odd to me.
> > 
> > Maybe it's just me.
> > 
> > Regardless, I do think the series looks like a nice improvement even
> > in the current form, even if that particular repeated pattern feels
> > strange.  
> 
> So originally I had ASM_OUTPUT/ASM_INPUT/ASM_CLOBBER, but I ended up
> going with ARG() due to its nice vertical alignment and conciseness:

But ARG() does look horrid.

Is the ARG() necessary just to handle the comma separated lists?
If so is it only actually needed if there is more than one item?

Another option is to just require () and add the ARG in the expansion.
So with:
#define __asm_call(qual, alt, out, in, clobber) \
	asm("zzz", ARG out, ARG in, ARG clobber)

__asm_call(qual, ALT(), \
		([var] "+m" (__my_cpu_var(_var)), "+a" (old__.low),	\
		    "+d" (old__.high)),					\
		("b" (new__.low), "c" (new__.high), "S" (&(_var))),	\
		("memory"));

would get expanded the same as the line below.

	David
	
> 
> 
> 	__asm_call(qual,						\
> 		ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
> 			    "cmpxchg8b " __percpu_arg([var]),		\
> 			    X86_FEATURE_CX8),				\
> 		ARG([var] "+m" (__my_cpu_var(_var)), "+a" (old__.low),	\
> 		    "+d" (old__.high)),					\
> 		ARG("b" (new__.low), "c" (new__.high), "S" (&(_var))),	\
> 		ARG("memory"));						\
> 
> 
> Though ASM_OUTPUT/ASM_INPUT/ASM_CLOBBER isn't so bad either:
> 
> 	__asm_call(qual,						\
> 		ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
> 			    "cmpxchg8b " __percpu_arg([var]),		\
> 			    X86_FEATURE_CX8),				\
> 		ASM_OUTPUT([var] "+m" (__my_cpu_var(_var)),		\
> 			   "+a" (old__.low), "+d" (old__.high)),	\
> 		ASM_INPUT("b" (new__.low), "c" (new__.high),		\
> 			  "S" (&(_var))),				\
> 		ASM_CLOBBER("memory"));					\
> 
> 
> That has the nice benefit of being more self-documenting, albeit more
> verbose and less vertically aligned.
> 
> So I could go either way, really.
>
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Josh Poimboeuf 9 months ago
On Mon, Mar 17, 2025 at 08:04:32PM +0000, David Laight wrote:
> Is the ARG() necessary just to handle the comma separated lists?
> If so is it only actually needed if there is more than one item?

No, but my preference is to require the use of the macro even for single
constraints as it helps visually separate the lists.

> Another option is to just require () and add the ARG in the expansion.
> So with:
> #define __asm_call(qual, alt, out, in, clobber) \
> 	asm("zzz", ARG out, ARG in, ARG clobber)
> 
> __asm_call(qual, ALT(), \
> 		([var] "+m" (__my_cpu_var(_var)), "+a" (old__.low),	\
> 		    "+d" (old__.high)),					\
> 		("b" (new__.low), "c" (new__.high), "S" (&(_var))),	\
> 		("memory"));
> 
> would get expanded the same as the line below.

Interesting idea, though I still prefer the self-documenting ASM_OUTPUT
/ ASM_INPUT / ASM_CLOBBER macros which are self-documenting and make it
easier to read and visually distinguish the constraint lists.

-- 
Josh
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by David Laight 9 months ago
On Mon, 17 Mar 2025 17:11:58 -0700
Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> On Mon, Mar 17, 2025 at 08:04:32PM +0000, David Laight wrote:
> > Is the ARG() necessary just to handle the comma separated lists?
> > If so is it only actually needed if there is more than one item?  
> 
> No, but my preference is to require the use of the macro even for single
> constraints as it helps visually separate the lists.
> 
> > Another option is to just require () and add the ARG in the expansion.
> > So with:
> > #define __asm_call(qual, alt, out, in, clobber) \
> > 	asm("zzz", ARG out, ARG in, ARG clobber)
> > 
> > __asm_call(qual, ALT(), \
> > 		([var] "+m" (__my_cpu_var(_var)), "+a" (old__.low),	\
> > 		    "+d" (old__.high)),					\
> > 		("b" (new__.low), "c" (new__.high), "S" (&(_var))),	\
> > 		("memory"));
> > 
> > would get expanded the same as the line below.  
> 
> Interesting idea, though I still prefer the self-documenting ASM_OUTPUT
> / ASM_INPUT / ASM_CLOBBER macros which are self-documenting and make it
> easier to read and visually distinguish the constraint lists.

Except that non of this really makes it easier to get out/in in the correct
order or to use the right constraints.
So are you just adding 'syntactic sugar' for no real gain?

Looking back at one of the changes:
-#define mb() asm volatile(ALTERNATIVE("lock addl $0,-4(%%esp)", "mfence", \
-				      X86_FEATURE_XMM2) ::: "memory", "cc")
+#define mb() alternative_io("lock addl $0,-4(%%esp)",			\
+			    "mfence", X86_FEATURE_XMM2,			\
+			    ARG(),					\
+			    ARG(),					\
+			    ARG("memory", "cc")) 

is it really an improvement?

	David
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Josh Poimboeuf 9 months ago
On Tue, Mar 18, 2025 at 10:06:05PM +0000, David Laight wrote:
> > > So with:
> > > #define __asm_call(qual, alt, out, in, clobber) \
> > > 	asm("zzz", ARG out, ARG in, ARG clobber)
> > > 
> > > __asm_call(qual, ALT(), \
> > > 		([var] "+m" (__my_cpu_var(_var)), "+a" (old__.low),	\
> > > 		    "+d" (old__.high)),					\
> > > 		("b" (new__.low), "c" (new__.high), "S" (&(_var))),	\
> > > 		("memory"));
> > > 
> > > would get expanded the same as the line below.  
> > 
> > Interesting idea, though I still prefer the self-documenting ASM_OUTPUT
> > / ASM_INPUT / ASM_CLOBBER macros which are self-documenting and make it
> > easier to read and visually distinguish the constraint lists.
> 
> Except that non of this really makes it easier to get out/in in the correct
> order or to use the right constraints.

At least it's still no worse than asm() itself in that respect.

> So are you just adding 'syntactic sugar' for no real gain?

Some wrappers need to modify their constraint lists, so the sugar does
have a functional purpose.  The new alternative_io() (or whatever it
will be called) interface will especially be needed for the followup to
this patch set which introduces asm_call() to try to fix an
ASM_CALL_CONSTRAINT mess.

> Looking back at one of the changes:
> -#define mb() asm volatile(ALTERNATIVE("lock addl $0,-4(%%esp)", "mfence", \
> -				      X86_FEATURE_XMM2) ::: "memory", "cc")
> +#define mb() alternative_io("lock addl $0,-4(%%esp)",			\
> +			    "mfence", X86_FEATURE_XMM2,			\
> +			    ARG(),					\
> +			    ARG(),					\
> +			    ARG("memory", "cc")) 
> 
> is it really an improvement?

The motivation here is to use the alternative*() wrappers whenever
possible.  It helps achieve consistent behaviors and also removes the
ugly nested ALTERNATIVE() macro.

In fact, the change in your example actually improves code generation:
it changes the asm() to asm_inline() which prevents GCC from doing crazy
things due to the exploded size of the asm string.

-- 
Josh
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Ingo Molnar 9 months, 1 week ago
* Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> On Fri, Mar 14, 2025 at 01:49:48PM -1000, Linus Torvalds wrote:
> > So all of these patches look like good cleanups to me, but I do wonder
> > if we should
> > 
> >  (a) not use some naming *quite* as generic as 'ARG()'
> > 
> >  (b) make the asms use ARG_OUT/ARG_IN/ARG_CLOBBER() to clarify
> > 
> > because that ARG(), ARG(), ARGC() pattern looks odd to me.
> > 
> > Maybe it's just me.
> > 
> > Regardless, I do think the series looks like a nice improvement even
> > in the current form, even if that particular repeated pattern feels
> > strange.
> 
> So originally I had ASM_OUTPUT/ASM_INPUT/ASM_CLOBBER, but I ended up
> going with ARG() due to its nice vertical alignment and conciseness:
> 
> 
> 	__asm_call(qual,						\
> 		ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
> 			    "cmpxchg8b " __percpu_arg([var]),		\
> 			    X86_FEATURE_CX8),				\
> 		ARG([var] "+m" (__my_cpu_var(_var)), "+a" (old__.low),	\
> 		    "+d" (old__.high)),					\
> 		ARG("b" (new__.low), "c" (new__.high), "S" (&(_var))),	\
> 		ARG("memory"));						\

Two nits:

1)

In justified cases we can align vertically just fine by using spaces:

  ASM_INPUT  ([var] "+m" (__my_cpu_var(_var)), "+a" (old__.low)...
  ASM_OUTPUT ("b" (new__.low), "c" (new__.high), "S" (&(_var))),
  ASM_CLOBBER("memory")

But I don't think the vertical alignment of visually disjoint, 
comma-separated arguments is an improvement in this specific case.

A *truly* advanced typographically aware syntactic construct would be 
something like:

		ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
			    "cmpxchg8b " __percpu_arg([var]),		\
			    X86_FEATURE_CX8),				\
									\
		  ASM_INPUT( [var] "+m" (__my_cpu_var(_var)),		\
				   "+a" (old__.low),			\
				   "+d" (old__.high)),			\
									\
		  ASM_OUTPUT(       "b" (new__.low),			\
				    "c" (new__.high),			\
				    "S" (&(_var))),			\
									\
		  ASM_CLOBBER(	    "memory"));

Note how horizontal and vertical grouping improves readability by an 
order of magnitude and properly highlights the only named operand and 
makes it very easy to review this code, should it be a new submission 
(which it isn't).

And as Knuth said, the intersection of the sets of good coders and good 
typographers is necessarily a tiny percentage of humanity 
(paraphrased), but I digress ...

2)

If 'ARGS' is included in the naming then I'd like to insist on the 
plural 'ARGS', not 'ARG', because the common case for more complicated 
asm() statements is multiple asm template constraint arguments 
separated by commas.

But I don't think we need the 'ARGS':

> Though ASM_OUTPUT/ASM_INPUT/ASM_CLOBBER isn't so bad either:
> 
> 	__asm_call(qual,						\
> 		ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
> 			    "cmpxchg8b " __percpu_arg([var]),		\
> 			    X86_FEATURE_CX8),				\
> 		ASM_OUTPUT([var] "+m" (__my_cpu_var(_var)),		\
> 			   "+a" (old__.low), "+d" (old__.high)),	\
> 		ASM_INPUT("b" (new__.low), "c" (new__.high),		\
> 			  "S" (&(_var))),				\
> 		ASM_CLOBBER("memory"));					\
> 
> 
> That has the nice benefit of being more self-documenting, albeit more
> verbose and less vertically aligned.
> 
> So I could go either way, really.

I'd vote on:

	ASM_INPUT(),
	ASM_OUTPUT(),
	ASM_CLOBBER()

... because the ASM_ prefix is already unique enough visually to reset 
any cross-pollination from other well-known namespaces, and because in 
coding shorter is better, all other things equal.

Thanks,

	Ingo
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Linus Torvalds 9 months, 1 week ago
On Fri, 14 Mar 2025 at 13:49, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> because that ARG(), ARG(), ARGC() pattern looks odd to me.
>
> Maybe it's just me.

Oh, and the other thing I reacted to is that I think the
"alternative_io()" thing should be renamed.

The "io" makes me think "actual I/O". As in PCI or disks or whatever.
It always read oddly, but now it's *comletely* pointless, because the
new macro model actually takes pretty much arbitrary asm arguments, to
the "both input and output arguments" no longer makes any real sense.

So I think it would be better to just call this "alternative_asm()",
and make naming simpler. Hmm?

            Linus
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Josh Poimboeuf 9 months, 1 week ago
On Fri, Mar 14, 2025 at 01:54:00PM -1000, Linus Torvalds wrote:
> On Fri, 14 Mar 2025 at 13:49, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > because that ARG(), ARG(), ARGC() pattern looks odd to me.
> >
> > Maybe it's just me.
> 
> Oh, and the other thing I reacted to is that I think the
> "alternative_io()" thing should be renamed.
> 
> The "io" makes me think "actual I/O". As in PCI or disks or whatever.
> It always read oddly, but now it's *comletely* pointless, because the
> new macro model actually takes pretty much arbitrary asm arguments, to
> the "both input and output arguments" no longer makes any real sense.
> 
> So I think it would be better to just call this "alternative_asm()",
> and make naming simpler. Hmm?

Thing is, we still have alternative(), which is also an asm wrapper, but
it's for when the caller doesn't care about adding any constraints.

So the "_io()" distinguishes from that.

-- 
Josh
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Linus Torvalds 9 months, 1 week ago
On Fri, 14 Mar 2025 at 14:09, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> Thing is, we still have alternative(), which is also an asm wrapper, but
> it's for when the caller doesn't care about adding any constraints.
>
> So the "_io()" distinguishes from that.

.. but I think it does so very badly because "io" really means
something else entirely in absolutely all other contexts.

And it really makes no sense as "io", since it doesn't take inputs and
outputs, it takes inputs, outputs AND CLOBBERS.

So it would make more sense to call it "ioc", but that's just obvious
nonsense, and "ioc" is already taken as a globally recognized
shorthand for "corruption in sports".

So "ioc" is bad too, but that should make you go "Oh, 'io' is _doubly_
nonsensical".

Ergo: I think "asm" would be a better distinguishing marker, withg the
plain "alternative()" being used for particularly simple asms.

              Linus
Re: [PATCH 14/20] x86/barrier: Use alternative_io() in 32-bit barrier functions
Posted by Ingo Molnar 9 months, 1 week ago
* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 14 Mar 2025 at 14:09, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > Thing is, we still have alternative(), which is also an asm wrapper, but
> > it's for when the caller doesn't care about adding any constraints.
> >
> > So the "_io()" distinguishes from that.
> 
> .. but I think it does so very badly because "io" really means
> something else entirely in absolutely all other contexts.

Yeah, alternative_io() is really a misnomer we should fix.

As a minor side note, it's *doubly* a misnomer, because 'io' mixes up 
the defined 'o/i' order of the output/input constraints:

  arch/x86/include/asm/alternative.h:#define alternative_io(oldinstr, newinstr, ft_flags, output, input...)       \

So it should have been alternative_oi().


> And it really makes no sense as "io", since it doesn't take inputs and
> outputs, it takes inputs, outputs AND CLOBBERS.
> 
> So it would make more sense to call it "ioc", but that's just obvious
> nonsense, and "ioc" is already taken as a globally recognized
> shorthand for "corruption in sports".

lol ...

> So "ioc" is bad too, but that should make you go "Oh, 'io' is _doubly_
> nonsensical".
> 
> Ergo: I think "asm" would be a better distinguishing marker, withg the
> plain "alternative()" being used for particularly simple asms.

Yeah, alternative_asm() or alternative_opts(). Anything but '_io()' :-)

Thanks,

	Ingo