There's no need to subtract "bit" in what is passed to __scanbit(), as the
other bits are zero anyway after the shift (in __find_next_bit()) or can
be made so (in __find_next_zero_bit()) by flipping negation and shift. (We
actually leverage the same facts in find_next{,_zero}_bit() as well.) This
way in __scanbit() the TZCNT alternative can be engaged.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Register allocation (and hence effects on code size of the change here)
is pretty "interesting". The compiler doesn't look to realize that while
for 64-bit insns it doesn't matter which GPR is used (a REX prefix is
needed anyway), 32-bit insns can be helped by preferring the low 8 GPRs.
(Granted the inline assembly in __scanbit() may also be a little difficult
to deal with.)
--- a/xen/arch/x86/bitops.c
+++ b/xen/arch/x86/bitops.c
@@ -35,8 +35,8 @@ unsigned int __find_next_bit(
if ( bit != 0 )
{
/* Look for a bit in the first word. */
- set = __scanbit(*p >> bit, BITS_PER_LONG - bit);
- if ( set < (BITS_PER_LONG - bit) )
+ set = __scanbit(*p >> bit, BITS_PER_LONG);
+ if ( set < BITS_PER_LONG )
return (offset + set);
offset += BITS_PER_LONG - bit;
p++;
@@ -85,8 +85,8 @@ unsigned int __find_next_zero_bit(
if ( bit != 0 )
{
/* Look for zero in the first word. */
- set = __scanbit(~(*p >> bit), BITS_PER_LONG - bit);
- if ( set < (BITS_PER_LONG - bit) )
+ set = __scanbit(~*p >> bit, BITS_PER_LONG);
+ if ( set < BITS_PER_LONG )
return (offset + set);
offset += BITS_PER_LONG - bit;
p++;
On 19/02/2026 3:42 pm, Jan Beulich wrote:
> There's no need to subtract "bit" in what is passed to __scanbit(), as the
> other bits are zero anyway after the shift (in __find_next_bit()) or can
> be made so (in __find_next_zero_bit()) by flipping negation and shift. (We
> actually leverage the same facts in find_next{,_zero}_bit() as well.) This
> way in __scanbit() the TZCNT alternative can be engaged.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
I agree that we're already doing that in find_next{,_zero}_bit() and
therefore ought to be safe here.
What's really confusing about all of this is that __scanbit() differs
from ffs() by wanting a return value in the range [1, 64). With that in
mind, I think we can do exactly the same as we do in arch_ffs(), but
preloading with BITS_PER_LONG rather than -1. That in turn removes the
alternative and simplifies things further.
~Andrew
© 2016 - 2026 Red Hat, Inc.