[PATCH 7/7] xen/bitops: Delete find_first_set_bit()

Andrew Cooper posted 7 patches 1 year, 11 months ago
There is a newer version of this series
[PATCH 7/7] xen/bitops: Delete find_first_set_bit()
Posted by Andrew Cooper 1 year, 11 months ago
No more users.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
 xen/arch/arm/include/asm/bitops.h | 12 ------------
 xen/arch/ppc/include/asm/bitops.h |  9 ---------
 xen/arch/x86/include/asm/bitops.h | 12 ------------
 3 files changed, 33 deletions(-)

diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
index 59ae8ed150b6..5104334e4874 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -160,18 +160,6 @@ static inline int fls(unsigned int x)
 #define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
 #define arch_ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- *
- * Returns the bit-number of the first set bit (first bit being 0).
- * The input must *not* be zero.
- */
-static inline unsigned int find_first_set_bit(unsigned long word)
-{
-        return ffsl(word) - 1;
-}
-
 /**
  * hweightN - returns the hamming weight of a N-bit word
  * @x: the word to weigh
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index ecec2a826660..989d341a44c7 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -206,13 +206,4 @@ static always_inline unsigned long __ffs(unsigned long word)
     return __builtin_ctzl(word);
 }
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- *
- * Returns the bit-number of the first set bit (first bit being 0).
- * The input must *not* be zero.
- */
-#define find_first_set_bit(x) (ffsl(x) - 1)
-
 #endif /* _ASM_PPC_BITOPS_H */
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 99342877e32f..2835bb6814d5 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
     r__;                                                                    \
 })
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- * 
- * Returns the bit-number of the first set bit. The input must *not* be zero.
- */
-static inline unsigned int find_first_set_bit(unsigned long word)
-{
-    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
-    return (unsigned int)word;
-}
-
 static inline unsigned int arch_ffs(unsigned int x)
 {
     int r = -1;
-- 
2.30.2


Re: [PATCH 7/7] xen/bitops: Delete find_first_set_bit()
Posted by Jan Beulich 1 year, 11 months ago
On 13.03.2024 18:27, Andrew Cooper wrote:
> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>      r__;                                                                    \
>  })
>  
> -/**
> - * find_first_set_bit - find the first set bit in @word
> - * @word: the word to search
> - * 
> - * Returns the bit-number of the first set bit. The input must *not* be zero.
> - */
> -static inline unsigned int find_first_set_bit(unsigned long word)
> -{
> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
> -    return (unsigned int)word;
> -}

And you think it's okay to no longer use TZCNT like this when available,
where the output doesn't have to have its value set up front?

Jan
Re: [PATCH 7/7] xen/bitops: Delete find_first_set_bit()
Posted by Andrew Cooper 1 year, 11 months ago
On 14/03/2024 3:59 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>>      r__;                                                                    \
>>  })
>>  
>> -/**
>> - * find_first_set_bit - find the first set bit in @word
>> - * @word: the word to search
>> - * 
>> - * Returns the bit-number of the first set bit. The input must *not* be zero.
>> - */
>> -static inline unsigned int find_first_set_bit(unsigned long word)
>> -{
>> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
>> -    return (unsigned int)word;
>> -}
> And you think it's okay to no longer use TZCNT like this when available,
> where the output doesn't have to have its value set up front?

This is a particularly evil piece of inline asm.

It is interpreted as BSF or TZCNT depending on the BMI instruction set
(Haswell/Piledriver era).  Furthermore there are errata on some Intel
systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated.

Which means this piece of asm suffers from all of an undefined output
register, undefined CF behaviour, and differing ZF behaviour (I believe)
depending on which hardware you're running on.

The only thing the REP prefix is getting you is a deterministic 0 in the
destination register, on some hardware only, for code which has already
violated the input safety condition.  As a piece of defence in depth,
then perhaps it's useful.

But following up from the other thread,
https://gcc.gnu.org/pipermail/gcc/2024-March/243475.html is form where
the compiler can (and does!) simplify back to the plain BSF form when it
can prove that this is safe.


The only case where using TZCNT is helpful is when we're compiling for
x86_64-v3 and there is no need to work around BSF's undefined behaviour.

Even with x86's arch_ffs() now split nicely based on whether the
compiler knows BSF is safe or not, an alternative to swap between BSF
and TZCNT probably isn't a win; you still have to cover up 6 or 7 bytes
of the -1 setup, which you can't do with leading prefixes on the TZCNT
itself.

All CPUs with BMI can swallow the double-instruction data dependency
without breaking a sweat, at which point you're trading off (at best) a
1-cycle improvement vs the setup costs of the alternative.  If there is
any real improvement to be had here, it's marginal enough that I'm not
sure it's worth doing.

~Andrew

Re: [PATCH 7/7] xen/bitops: Delete find_first_set_bit()
Posted by Andrew Cooper 1 year, 11 months ago
On 14/03/2024 5:14 pm, Andrew Cooper wrote:
> On 14/03/2024 3:59 pm, Jan Beulich wrote:
>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/include/asm/bitops.h
>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>>>      r__;                                                                    \
>>>  })
>>>  
>>> -/**
>>> - * find_first_set_bit - find the first set bit in @word
>>> - * @word: the word to search
>>> - * 
>>> - * Returns the bit-number of the first set bit. The input must *not* be zero.
>>> - */
>>> -static inline unsigned int find_first_set_bit(unsigned long word)
>>> -{
>>> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
>>> -    return (unsigned int)word;
>>> -}
>> And you think it's okay to no longer use TZCNT like this when available,
>> where the output doesn't have to have its value set up front?
> This is a particularly evil piece of inline asm.
>
> It is interpreted as BSF or TZCNT depending on the BMI instruction set
> (Haswell/Piledriver era).  Furthermore there are errata on some Intel
> systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated.
>
> Which means this piece of asm suffers from all of an undefined output
> register, undefined CF behaviour, and differing ZF behaviour (I believe)
> depending on which hardware you're running on.
>
> The only thing the REP prefix is getting you is a deterministic 0 in the
> destination register,

No, it doesn't.

For a zero input, TZCNT yields the operand size, so you get 16/32/64; 64
in this case.

It also means there's no chance of coming up with a useful alternative
for ffs() to use TZCNT when available.

~Andrew

Re: [PATCH 7/7] xen/bitops: Delete find_first_set_bit()
Posted by Jan Beulich 1 year, 11 months ago
On 15.03.2024 14:48, Andrew Cooper wrote:
> On 14/03/2024 5:14 pm, Andrew Cooper wrote:
>> On 14/03/2024 3:59 pm, Jan Beulich wrote:
>>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>>> --- a/xen/arch/x86/include/asm/bitops.h
>>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>>>>      r__;                                                                    \
>>>>  })
>>>>  
>>>> -/**
>>>> - * find_first_set_bit - find the first set bit in @word
>>>> - * @word: the word to search
>>>> - * 
>>>> - * Returns the bit-number of the first set bit. The input must *not* be zero.
>>>> - */
>>>> -static inline unsigned int find_first_set_bit(unsigned long word)
>>>> -{
>>>> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
>>>> -    return (unsigned int)word;
>>>> -}
>>> And you think it's okay to no longer use TZCNT like this when available,
>>> where the output doesn't have to have its value set up front?
>> This is a particularly evil piece of inline asm.
>>
>> It is interpreted as BSF or TZCNT depending on the BMI instruction set
>> (Haswell/Piledriver era).  Furthermore there are errata on some Intel
>> systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated.
>>
>> Which means this piece of asm suffers from all of an undefined output
>> register, undefined CF behaviour, and differing ZF behaviour (I believe)
>> depending on which hardware you're running on.
>>
>> The only thing the REP prefix is getting you is a deterministic 0 in the
>> destination register,
> 
> No, it doesn't.
> 
> For a zero input, TZCNT yields the operand size, so you get 16/32/64; 64
> in this case.
> 
> It also means there's no chance of coming up with a useful alternative
> for ffs() to use TZCNT when available.

Right, for ffs() TZCNT isn't suitable. But for find_first_set_bit() it was,
yielding a reliably out-of-range output for zero input (which BSF wouldn't
guarantee).

Jan