[RFC PATCH 0/2] Helper to isolate least-significant bit

Petr Tesarik posted 2 patches 1 month ago
arch/alpha/include/asm/bitops.h              |  2 +-
arch/alpha/kernel/core_cia.c                 |  2 +-
arch/alpha/kernel/smp.c                      |  2 +-
arch/arc/include/asm/bitops.h                |  2 +-
arch/m68k/include/asm/bitops.h               | 12 +++++------
arch/mips/dec/ecc-berr.c                     |  2 +-
arch/mips/include/asm/bitops.h               |  4 ++--
arch/mips/pci/pci-malta.c                    |  4 ++--
arch/powerpc/include/asm/bitops.h            |  4 ++--
arch/powerpc/kvm/e500_mmu_host.c             |  2 +-
arch/powerpc/lib/sstep.c                     |  2 +-
arch/powerpc/xmon/ppc-dis.c                  |  3 ++-
arch/powerpc/xmon/ppc-opc.c                  |  6 +++---
arch/s390/include/asm/bitops.h               |  2 +-
arch/xtensa/include/asm/bitops.h             |  6 +++---
arch/xtensa/kernel/traps.c                   |  2 +-
drivers/gpu/drm/gma500/psb_intel_sdvo.c      |  2 +-
drivers/iommu/dma-iommu.c                    |  2 +-
drivers/net/ethernet/netronome/nfp/bpf/jit.c |  2 +-
drivers/net/usb/cdc_ncm.c                    |  4 ++--
include/asm-generic/div64.h                  |  4 ++--
include/linux/bitfield.h                     |  3 ++-
include/linux/bitops.h                       |  1 +
include/linux/ffs_val.h                      | 21 ++++++++++++++++++++
include/linux/log2.h                         |  2 +-
include/linux/min_heap.h                     |  5 +++--
lib/math/gcd.c                               |  4 ++--
lib/sort.c                                   |  3 ++-
net/bluetooth/mgmt.c                         |  2 +-
net/netfilter/nft_set_pipapo.c               |  2 +-
30 files changed, 70 insertions(+), 44 deletions(-)
create mode 100644 include/linux/ffs_val.h
[RFC PATCH 0/2] Helper to isolate least-significant bit
Posted by Petr Tesarik 1 month ago
Isolation of the least significant bit can be achieved with 3 basic
ALU operations which are already open-coded in various places in the
kernel.

However, since other places less efficient constructs, for example
`1UL << ffs(x)`, I assume the trick is known only to some authors, and
it's worth adding a helper to promote its use.

Petr Tesarik (2):
  bits: introduce ffs_val()
  treewide, bits: use ffs_val() where it is open-coded

 arch/alpha/include/asm/bitops.h              |  2 +-
 arch/alpha/kernel/core_cia.c                 |  2 +-
 arch/alpha/kernel/smp.c                      |  2 +-
 arch/arc/include/asm/bitops.h                |  2 +-
 arch/m68k/include/asm/bitops.h               | 12 +++++------
 arch/mips/dec/ecc-berr.c                     |  2 +-
 arch/mips/include/asm/bitops.h               |  4 ++--
 arch/mips/pci/pci-malta.c                    |  4 ++--
 arch/powerpc/include/asm/bitops.h            |  4 ++--
 arch/powerpc/kvm/e500_mmu_host.c             |  2 +-
 arch/powerpc/lib/sstep.c                     |  2 +-
 arch/powerpc/xmon/ppc-dis.c                  |  3 ++-
 arch/powerpc/xmon/ppc-opc.c                  |  6 +++---
 arch/s390/include/asm/bitops.h               |  2 +-
 arch/xtensa/include/asm/bitops.h             |  6 +++---
 arch/xtensa/kernel/traps.c                   |  2 +-
 drivers/gpu/drm/gma500/psb_intel_sdvo.c      |  2 +-
 drivers/iommu/dma-iommu.c                    |  2 +-
 drivers/net/ethernet/netronome/nfp/bpf/jit.c |  2 +-
 drivers/net/usb/cdc_ncm.c                    |  4 ++--
 include/asm-generic/div64.h                  |  4 ++--
 include/linux/bitfield.h                     |  3 ++-
 include/linux/bitops.h                       |  1 +
 include/linux/ffs_val.h                      | 21 ++++++++++++++++++++
 include/linux/log2.h                         |  2 +-
 include/linux/min_heap.h                     |  5 +++--
 lib/math/gcd.c                               |  4 ++--
 lib/sort.c                                   |  3 ++-
 net/bluetooth/mgmt.c                         |  2 +-
 net/netfilter/nft_set_pipapo.c               |  2 +-
 30 files changed, 70 insertions(+), 44 deletions(-)
 create mode 100644 include/linux/ffs_val.h

-- 
2.52.0
Re: [RFC PATCH 0/2] Helper to isolate least-significant bit
Posted by Andrew Cooper 1 month ago
> diff --git a/arch/alpha/kernel/core_cia.c b/arch/alpha/kernel/core_cia.c
> index 6e577228e175d..b3c273c13c104 100644
> --- a/arch/alpha/kernel/core_cia.c
> +++ b/arch/alpha/kernel/core_cia.c
> @@ -1035,7 +1035,7 @@ cia_decode_ecc_error(struct el_CIA_sysdata_mcheck *cia, const char *msg)
>      cia_decode_mem_error(cia, msg);
>  
>      syn = cia->cia_syn & 0xff;
> -    if (syn == (syn & -syn)) {
> +    if (syn == ffs_val(syn)) {
>          fmt = KERN_CRIT "  ECC syndrome %#x -- check bit %d\n";
>          i = ffs(syn) - 1;
>      } else {

This is a check for multiple bits set, which has a simpler expression to
calculate.  (the ffs() in context shows that syn isn't expected to be 0.)

In Xen I added this helper while tiding things up, and it's applicable
to the more common pattern of checking hweight() of a bitmap against 1.

/*
 * Calculate if a value has two or more bits set.  Always use this in
 * preference to an expression of the form 'hweight(x) > 1'.
 */
#define multiple_bits_set(x)                    \
    ({                                          \
        typeof(x) _v = (x);                     \
        (_v & (_v - 1)) != 0;                   \
    })


~Andrew
Re: [RFC PATCH 0/2] Helper to isolate least-significant bit
Posted by Kuan-Wei Chiu 1 month ago
Hi Petr,

On Fri, Jan 09, 2026 at 05:41:34PM +0100, Petr Tesarik wrote:
> Isolation of the least significant bit can be achieved with 3 basic
> ALU operations which are already open-coded in various places in the
> kernel.
> 
> However, since other places less efficient constructs, for example
> `1UL << ffs(x)`, I assume the trick is known only to some authors, and
> it's worth adding a helper to promote its use.

Just out of curiosity, are there any existing users employing 1 <<
ffs(x) (or other inefficient variants) in performance-critical
hotpaths?

From a quick grep, I only found one instance in drivers/clk/ti/mux.c
matching the 1 << ffs(x) pattern. However, this doesn't appear to be a
bottleneck since it is followed by ti_clk_ll_ops->clk_writel(...). The
latency of the MMIO write would likely overshadow the savings of a few
ALU cycles.

Additionally, it seems that patch #2 focuses on replacing the x & -x
implementation with the new API, rather than converting inefficient
constructs like 1 << ffs(x) to use ffs_val().

Regards,
Kuan-Wei

> 
> Petr Tesarik (2):
>   bits: introduce ffs_val()
>   treewide, bits: use ffs_val() where it is open-coded
> 
>  arch/alpha/include/asm/bitops.h              |  2 +-
>  arch/alpha/kernel/core_cia.c                 |  2 +-
>  arch/alpha/kernel/smp.c                      |  2 +-
>  arch/arc/include/asm/bitops.h                |  2 +-
>  arch/m68k/include/asm/bitops.h               | 12 +++++------
>  arch/mips/dec/ecc-berr.c                     |  2 +-
>  arch/mips/include/asm/bitops.h               |  4 ++--
>  arch/mips/pci/pci-malta.c                    |  4 ++--
>  arch/powerpc/include/asm/bitops.h            |  4 ++--
>  arch/powerpc/kvm/e500_mmu_host.c             |  2 +-
>  arch/powerpc/lib/sstep.c                     |  2 +-
>  arch/powerpc/xmon/ppc-dis.c                  |  3 ++-
>  arch/powerpc/xmon/ppc-opc.c                  |  6 +++---
>  arch/s390/include/asm/bitops.h               |  2 +-
>  arch/xtensa/include/asm/bitops.h             |  6 +++---
>  arch/xtensa/kernel/traps.c                   |  2 +-
>  drivers/gpu/drm/gma500/psb_intel_sdvo.c      |  2 +-
>  drivers/iommu/dma-iommu.c                    |  2 +-
>  drivers/net/ethernet/netronome/nfp/bpf/jit.c |  2 +-
>  drivers/net/usb/cdc_ncm.c                    |  4 ++--
>  include/asm-generic/div64.h                  |  4 ++--
>  include/linux/bitfield.h                     |  3 ++-
>  include/linux/bitops.h                       |  1 +
>  include/linux/ffs_val.h                      | 21 ++++++++++++++++++++
>  include/linux/log2.h                         |  2 +-
>  include/linux/min_heap.h                     |  5 +++--
>  lib/math/gcd.c                               |  4 ++--
>  lib/sort.c                                   |  3 ++-
>  net/bluetooth/mgmt.c                         |  2 +-
>  net/netfilter/nft_set_pipapo.c               |  2 +-
>  30 files changed, 70 insertions(+), 44 deletions(-)
>  create mode 100644 include/linux/ffs_val.h
> 
> -- 
> 2.52.0
>
Re: [RFC PATCH 0/2] Helper to isolate least-significant bit
Posted by Petr Tesarik 1 month ago
On Sat, 10 Jan 2026 01:20:59 +0800
Kuan-Wei Chiu <visitorckw@gmail.com> wrote:

> Hi Petr,
> 
> On Fri, Jan 09, 2026 at 05:41:34PM +0100, Petr Tesarik wrote:
> > Isolation of the least significant bit can be achieved with 3 basic
> > ALU operations which are already open-coded in various places in the
> > kernel.
> > 
> > However, since other places less efficient constructs, for example
> > `1UL << ffs(x)`, I assume the trick is known only to some authors, and
> > it's worth adding a helper to promote its use.  
> 
> Just out of curiosity, are there any existing users employing 1 <<
> ffs(x) (or other inefficient variants) in performance-critical
> hotpaths?
>
> From a quick grep, I only found one instance in drivers/clk/ti/mux.c
> matching the 1 << ffs(x) pattern. However, this doesn't appear to be a
> bottleneck since it is followed by ti_clk_ll_ops->clk_writel(...). The
> latency of the MMIO write would likely overshadow the savings of a few
> ALU cycles.

Most expressions are a bit more complex, like this one in
page_cache_async_ra():

  align = 1UL << min(ra->order, ffs(max_pages) - 1);

Or split across multiple lines, like this one in sata_down_spd_limit():

  bit = ffs(mask) - 1;
  mask = 1 << bit;

I agree that there is most likely no measurable performance win. The
resulting machine code merely looks quite silly on architectures
without an instruction to do ffs() and a little bit silly on
architectures where the instruction has a slightly different semantics
(bit position numbering and/or handling of zero value).

> Additionally, it seems that patch #2 focuses on replacing the x & -x
> implementation with the new API, rather than converting inefficient
> constructs like 1 << ffs(x) to use ffs_val().

This is correct. This patch series merely introduces the helper. Patch
2/2 was created mechanically. I haven't sent any follow-up yet to
replace inefficient use, because that would require more effort, and I'm
trying to get some early feedback first (that's why the series is
tagged RFC).

If we agree that it's worth the effort, and I split the patches by
subsystem, it may become a rather long series.

Petr T