[PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly

Uros Bizjak posted 1 patch 11 months ago
There is a newer version of this series
[PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Uros Bizjak 11 months ago
a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
instruction from being scheduled before the frame pointer gets set
up by the containing function, causing objtool to print a "call
without frame pointer save/setup" warning.

b) Use asm_inline to instruct the compiler that the size of asm()
is the minimum size of one instruction, ignoring how many instructions
the compiler thinks it is. The ALTERNATIVE macro expands to several
pseudo directives, which causes the instruction length estimate to
count more than 20 instructions.

c) Use named operands in inline asm.

More inlining causes slight increase in the code size:

    text    data     bss      dec     hex filename
27261832 4640296  814660 32716788 1f337f4 vmlinux-new.o
27261222 4640320  814660 32716202 1f335aa vmlinux-old.o

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/arch_hweight.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/arch_hweight.h b/arch/x86/include/asm/arch_hweight.h
index ba88edd0d58b..20b0633744e4 100644
--- a/arch/x86/include/asm/arch_hweight.h
+++ b/arch/x86/include/asm/arch_hweight.h
@@ -16,10 +16,10 @@ static __always_inline unsigned int __arch_hweight32(unsigned int w)
 {
 	unsigned int res;
 
-	asm (ALTERNATIVE("call __sw_hweight32", "popcntl %1, %0", X86_FEATURE_POPCNT)
-			 : "="REG_OUT (res)
-			 : REG_IN (w));
-
+	asm_inline (ALTERNATIVE("call __sw_hweight32",
+				"popcntl %[val], %[cnt]", X86_FEATURE_POPCNT)
+			 : [cnt] "="REG_OUT (res), ASM_CALL_CONSTRAINT
+			 : [val] REG_IN (w));
 	return res;
 }
 
@@ -44,10 +44,10 @@ static __always_inline unsigned long __arch_hweight64(__u64 w)
 {
 	unsigned long res;
 
-	asm (ALTERNATIVE("call __sw_hweight64", "popcntq %1, %0", X86_FEATURE_POPCNT)
-			 : "="REG_OUT (res)
-			 : REG_IN (w));
-
+	asm_inline (ALTERNATIVE("call __sw_hweight64",
+				"popcntq %[val], %[cnt]", X86_FEATURE_POPCNT)
+			 : [cnt] "="REG_OUT (res), ASM_CALL_CONSTRAINT
+			 : [val] REG_IN (w));
 	return res;
 }
 #endif /* CONFIG_X86_32 */
-- 
2.42.0
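
As an illustration of item c) only, here is a minimal user-space sketch of the named-operand syntax the patch switches to. It is not the kernel code: demo_hweight32() and sw_hweight32() are hypothetical names, the "=r" and "rm" constraints stand in for the REG_OUT and REG_IN macros, and a run-time __builtin_cpu_supports() check replaces the boot-time ALTERNATIVE patching. It does not model ASM_CALL_CONSTRAINT or asm_inline.

```c
#include <stdint.h>

/*
 * Software fallback. Hypothetical stand-in for the kernel's
 * __sw_hweight32(); clears the lowest set bit on each iteration.
 */
static unsigned int sw_hweight32(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;
		n++;
	}
	return n;
}

/*
 * Population count using named asm operands. [val] and [cnt] replace
 * the positional %1 and %0, so adding or reordering operands cannot
 * silently renumber them.
 */
static unsigned int demo_hweight32(uint32_t w)
{
#if defined(__x86_64__) || defined(__i386__)
	if (__builtin_cpu_supports("popcnt")) {
		unsigned int res;

		__asm__ ("popcntl %[val], %[cnt]"
			 : [cnt] "=r" (res)
			 : [val] "rm" (w)
			 : "cc");
		return res;
	}
#endif
	/* Non-x86 builds and CPUs without POPCNT take the software path. */
	return sw_hweight32(w);
}
```

Calling demo_hweight32(0xF0F0) returns 8 on either path. The kernel version differs in that the choice between the call and the popcnt instruction is made once by patching the instruction stream, not by a branch at each call site.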
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Ingo Molnar 11 months ago
* Uros Bizjak <ubizjak@gmail.com> wrote:

> a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> instruction from being scheduled before the frame pointer gets set
> up by the containing function, causing objtool to print a "call
> without frame pointer save/setup" warning.
> 
> b) Use asm_inline to instruct the compiler that the size of asm()
> is the minimum size of one instruction, ignoring how many instructions
> the compiler thinks it is. ALTERNATIVE macro that expands to several
> pseudo directives causes instruction length estimate to count
> more than 20 instructions.
> 
> c) Use named operands in inline asm.
> 
> More inlining causes slight increase in the code size:
> 
>     text    data     bss      dec     hex filename
> 27261832 4640296  814660 32716788 1f337f4 vmlinux-new.o
> 27261222 4640320  814660 32716202 1f335aa vmlinux-old.o

What is the per call/inlining-instance change in code size, measured in 
fast-path instruction bytes? Also, exception code or cold branches near 
the epilogue of the function after the main RET don't fully count as a 
size increase.

This kind of normalization and filtering of changes to relevant 
generated instructions is a better metric than some rather meaningless 
'+610 bytes of code' figure.

Also, please always specify the kind of config you used for building 
the vmlinux.

Thanks,

	Ingo
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Uros Bizjak 11 months ago
On Mon, Mar 10, 2025 at 9:16 PM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> > instruction from being scheduled before the frame pointer gets set
> > up by the containing function, causing objtool to print a "call
> > without frame pointer save/setup" warning.
> >
> > b) Use asm_inline to instruct the compiler that the size of asm()
> > is the minimum size of one instruction, ignoring how many instructions
> > the compiler thinks it is. ALTERNATIVE macro that expands to several
> > pseudo directives causes instruction length estimate to count
> > more than 20 instructions.
> >
> > c) Use named operands in inline asm.
> >
> > More inlining causes slight increase in the code size:
> >
> >     text    data     bss      dec     hex filename
> > 27261832 4640296  814660 32716788 1f337f4 vmlinux-new.o
> > 27261222 4640320  814660 32716202 1f335aa vmlinux-old.o
>
> What is the per call/inlining-instance change in code size, measured in
> fast-path instruction bytes? Also, exception code or cold branches near
> the epilogue of the function after the main RET don't fully count as a
> size increase.
>
> This kind of normalization and filtering of changes to relevant
> generated instructions is a better metric than some rather meaningless
> '+610 bytes of code' figure.
>
> Also, please always specify the kind of config you used for building
> the vmlinux.

Sorry, this just slipped my mind. x86_64 defconfig - I'll note this in
the revised commit entry.

BTW: The difference between old and new number of inlined __sw_hweight
calls is: 367 -> 396. I'll try to analyze this some more.

Thanks,
Uros.
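
For reference, the figures quoted so far can be cross-checked with a small sketch. The numbers are copied from the size output in the commit message and from the call counts above; the struct and function names are hypothetical. The per-site average it prints is a naive figure: it assumes all text growth comes from the 29 additional call sites, and it is not the fast-path instruction-byte measurement requested in the review.

```c
#include <stdio.h>

/* One row of size(1) output; values copied from the commit message. */
struct size_row {
	long text;
	long data;
	long bss;
};

static const struct size_row vmlinux_new = { 27261832, 4640296, 814660 };
static const struct size_row vmlinux_old = { 27261222, 4640320, 814660 };

/* Inlined __sw_hweight call counts reported in the thread. */
static const long calls_old = 367;
static const long calls_new = 396;

/* The "dec" column is the sum of text, data and bss. */
static long size_total(struct size_row r)
{
	return r.text + r.data + r.bss;
}

static long text_delta(void)
{
	return vmlinux_new.text - vmlinux_old.text;
}

static long data_delta(void)
{
	return vmlinux_new.data - vmlinux_old.data;
}

static long extra_calls(void)
{
	return calls_new - calls_old;
}

static void print_size_summary(void)
{
	printf("text delta:  %+ld bytes\n", text_delta());
	printf("data delta:  %+ld bytes\n", data_delta());
	printf("total delta: %+ld bytes\n",
	       size_total(vmlinux_new) - size_total(vmlinux_old));
	printf("extra inlined call sites: %ld\n", extra_calls());
	/* Naive average; assumes every added byte belongs to a new site. */
	printf("naive text bytes per extra site: %.1f\n",
	       (double)text_delta() / (double)extra_calls());
}
```

The text section grows by 610 bytes, data shrinks by 24, and the total grows by 586, which matches the dec column. Spread over 29 extra call sites, the text growth averages about 21 bytes per site under the assumption stated above.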
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Borislav Petkov 11 months ago
On Mon, Mar 10, 2025 at 09:08:04PM +0100, Uros Bizjak wrote:
> a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> instruction from being scheduled before the frame pointer gets set
> up by the containing function, causing objtool to print a "call
> without frame pointer save/setup" warning.

The other two are ok but this is new. How do you trigger this? I've never seen
it in my randconfig builds...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Uros Bizjak 11 months ago
On Mon, Mar 10, 2025 at 9:12 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Mar 10, 2025 at 09:08:04PM +0100, Uros Bizjak wrote:
> > a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> > instruction from being scheduled before the frame pointer gets set
> > up by the containing function, causing objtool to print a "call
> > without frame pointer save/setup" warning.
>
> The other two are ok but this is new. How do you trigger this? I've never seen
> it in my randconfig builds...

It is not triggered now, but without this constraint, nothing prevents
the compiler from scheduling the insn in front of frame creation.

Uros.
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Borislav Petkov 11 months ago
On Mon, Mar 10, 2025 at 09:35:42PM +0100, Uros Bizjak wrote:
> On Mon, Mar 10, 2025 at 9:12 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Mon, Mar 10, 2025 at 09:08:04PM +0100, Uros Bizjak wrote:
> > > a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> > > instruction from being scheduled before the frame pointer gets set
> > > up by the containing function, causing objtool to print a "call
> > > without frame pointer save/setup" warning.
> >
> > The other two are ok but this is new. How do you trigger this? I've never seen
> > it in my randconfig builds...
> 
> It is not triggered now, but without this constraint, nothing prevents
> the compiler from scheduling the insn in front of frame creation.

Can you please stop with this silliness?

When we start doing git archeology months, years from now, it should be
perfectly clear why a commit was done. This one is not. So either the compiler
is doing the bad scheduling or it isn't. Things can't just work by chance.

Geez.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Ingo Molnar 11 months ago
* Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Mar 10, 2025 at 09:35:42PM +0100, Uros Bizjak wrote:
> > On Mon, Mar 10, 2025 at 9:12 PM Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > On Mon, Mar 10, 2025 at 09:08:04PM +0100, Uros Bizjak wrote:
> > > > a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> > > > instruction from being scheduled before the frame pointer gets set
> > > > up by the containing function, causing objtool to print a "call
> > > > without frame pointer save/setup" warning.
> > >
> > > The other two are ok but this is new. How do you trigger this? I've never seen
> > > it in my randconfig builds...
> > 
> > It is not triggered now, but without this constraint, nothing prevents
> > the compiler from scheduling the insn in front of frame creation.
> 
> Can you please stop with this silliness?
> 
> When we start doing git archeology months, years from now, it should 
> be perfectly clear why a commit was done. This one is not. So either 
> the compiler is doing the bad scheduling or it isn't. Things can't 
> just work by chance.

So this particular code generation aspect seems to be working by random 
implementational chance right now: objtool is basically a second, 
independent layer of tooling with its own assumptions and expectations, 
which is why objtool warnings are not hard build failures.

But whether unexpected instruction scheduling is known to occur or not 
with current compilers should be included in the changelog and is 
relevant information.

Thanks,

	Ingo
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Uros Bizjak 11 months ago
On Mon, Mar 10, 2025 at 9:45 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Mar 10, 2025 at 09:35:42PM +0100, Uros Bizjak wrote:
> > On Mon, Mar 10, 2025 at 9:12 PM Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > On Mon, Mar 10, 2025 at 09:08:04PM +0100, Uros Bizjak wrote:
> > > > a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> > > > instruction from being scheduled before the frame pointer gets set
> > > > up by the containing function, causing objtool to print a "call
> > > > without frame pointer save/setup" warning.
> > >
> > > The other two are ok but this is new. How do you trigger this? I've never seen
> > > it in my randconfig builds...
> >
> > It is not triggered now, but without this constraint, nothing prevents
> > the compiler from scheduling the insn in front of frame creation.
>
> Can you please stop with this silliness?
>
> When we start doing git archeology months, years from now, it should be
> perfectly clear why a commit was done. This one is not. So either the compiler
> is doing the bad scheduling or it isn't. Things can't just work by chance.
>
> Geez.

Ok, so let it be your way and let's just sweep the issue under the carpet.

BR,
Uros.
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Borislav Petkov 11 months ago
On Mon, Mar 10, 2025 at 09:54:25PM +0100, Uros Bizjak wrote:
> Ok, so let it be your way and let's just sweep the issue under the carpet.

Can you please read my mails more carefully? Where did I say we should sweep
the issue under the carpet?

The commit message should be *perfectly* clear what it is fixing. This

"a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
instruction from being scheduled before the frame pointer gets set
up by the containing function, causing objtool to print a "call
without frame pointer save/setup" warning."

says that objtool is printing a warning. When I ask, it is not really printing
a warning but it can potentially do so because the compiler is allowed to
schedule things wrongly.

Do you notice the difference?

Dammit, it is very important *why* a commit message is there - it is not
write-only and people look at it. So *again* *please* be precise when
explaining why your patch exists!

All that stuff has been documented at length:

https://kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

Thanks.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Uros Bizjak 11 months ago
On Mon, Mar 10, 2025 at 10:08 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Mar 10, 2025 at 09:54:25PM +0100, Uros Bizjak wrote:
> > Ok, so let it be your way and let's just sweep the issue under the carpet.
>
> Can you please read my mails more carefully? Where did I say we should sweep
> the issue under the carpet?

The "stop with this silliness" part? But let's put this at rest.

> The commit message should be *perfectly* clear what it is fixing. This
>
> "a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> instruction from being scheduled before the frame pointer gets set
> up by the containing function, causing objtool to print a "call
> without frame pointer save/setup" warning."
>
> says that objtool is printing a warning. When I ask, it is not really printing
> a warning but it can potentially do so because the compiler is allowed to
> schedule things wrongly.
>
> Do you notice the difference?

So, rewording this part to:

a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
instruction from being scheduled by the compiler before the frame
pointer gets set up by the containing function. This unconstrained
scheduling might cause objtool to print a "call without frame pointer
save/setup" warning.

would be ok?

Thanks,
Uros.
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Borislav Petkov 11 months ago
On Mon, Mar 10, 2025 at 10:18:50PM +0100, Uros Bizjak wrote:
> a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> instruction from being scheduled by the compiler before the frame
> pointer gets set up by the containing function. This unconstrained
> scheduling might cause objtool to print a "call without frame pointer
> save/setup" warning.
> 
> would be ok?

Yes, and pls say something along the lines of: this is not a currently
triggered issue but it can potentially happen, so that it is perfectly clear
what this patch is addressing.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
Posted by Ingo Molnar 11 months ago
* Uros Bizjak <ubizjak@gmail.com> wrote:

> On Mon, Mar 10, 2025 at 9:12 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Mon, Mar 10, 2025 at 09:08:04PM +0100, Uros Bizjak wrote:
> > > a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
> > > instruction from being scheduled before the frame pointer gets set
> > > up by the containing function, causing objtool to print a "call
> > > without frame pointer save/setup" warning.
> >
> > The other two are ok but this is new. How do you trigger this? I've never seen
> > it in my randconfig builds...
> 
> It is not triggered now, but without this constraint, nothing prevents
> the compiler from scheduling the insn in front of frame creation.

Please add:

 'Current versions of compilers don't seem to trigger this condition,
  but without this constraint there's nothing to prevent the compiler
  from scheduling the insn in front of frame creation.'

Thanks,

	Ingo