x86/bitops: adjust partial first word handling in __find_next{,_zero}_bit()

[PATCH] x86/bitops: adjust partial first word handling in __find_next{,_zero}_bit()

Posted by Jan Beulich 1 week, 4 days ago

There's no need to subtract "bit" in what is passed to __scanbit(), as the
other bits are zero anyway after the shift (in __find_next_bit()) or can
be made so (in __find_next_zero_bit()) by flipping negation and shift. (We
actually leverage the same facts in find_next{,_zero}_bit() as well.) This
way in __scanbit() the TZCNT alternative can be engaged.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Register allocation (and hence effects on code size of the change here)
is pretty "interesting". The compiler doesn't look to realize that while
for 64-bit insns it doesn't matter which GPR is used (a REX prefix is
needed anyway), 32-bit insns can be helped by preferring the low 8 GPRs.
(Granted the inline assembly in __scanbit() may also be a little difficult
to deal with.)

--- a/xen/arch/x86/bitops.c
+++ b/xen/arch/x86/bitops.c
@@ -35,8 +35,8 @@ unsigned int __find_next_bit(
     if ( bit != 0 )
     {
         /* Look for a bit in the first word. */
-        set = __scanbit(*p >> bit, BITS_PER_LONG - bit);
-        if ( set < (BITS_PER_LONG - bit) )
+        set = __scanbit(*p >> bit, BITS_PER_LONG);
+        if ( set < BITS_PER_LONG )
             return (offset + set);
         offset += BITS_PER_LONG - bit;
         p++;
@@ -85,8 +85,8 @@ unsigned int __find_next_zero_bit(
     if ( bit != 0 )
     {
         /* Look for zero in the first word. */
-        set = __scanbit(~(*p >> bit), BITS_PER_LONG - bit);
-        if ( set < (BITS_PER_LONG - bit) )
+        set = __scanbit(~*p >> bit, BITS_PER_LONG);
+        if ( set < BITS_PER_LONG )
             return (offset + set);
         offset += BITS_PER_LONG - bit;
         p++;

Re: [PATCH] x86/bitops: adjust partial first word handling in __find_next{,_zero}_bit()

Posted by Andrew Cooper 1 week, 4 days ago

On 19/02/2026 3:42 pm, Jan Beulich wrote:
> There's no need to subtract "bit" in what is passed to __scanbit(), as the
> other bits are zero anyway after the shift (in __find_next_bit()) or can
> be made so (in __find_next_zero_bit()) by flipping negation and shift. (We
> actually leverage the same facts in find_next{,_zero}_bit() as well.) This
> way in __scanbit() the TZCNT alternative can be engaged.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

I agree that we're already doing that in find_next{,_zero}_bit() and
therefore ought to be safe here.

What's really confusing about all of this is that __scanbit() differs
from ffs() by wanting a return value in the range [1, 64).  With that in
mind, I think we can do exactly the same as we do in arch_ffs(), but
preloading with BITS_PER_LONG rather than -1.  That in turn removes the
alternative and simplifies things further.

~Andrew