[PATCH v3 00/54] lib/bitmap: optimize bitmap_weight() usage

Yury Norov posted 54 patches 4 years, 5 months ago
MAINTAINERS                                   |  4 +
arch/alpha/kernel/process.c                   |  2 +-
arch/ia64/kernel/setup.c                      |  2 +-
arch/ia64/mm/tlb.c                            |  2 +-
arch/mips/cavium-octeon/octeon-irq.c          |  4 +-
arch/mips/kernel/crash.c                      |  2 +-
arch/nds32/kernel/perf_event_cpu.c            |  2 +-
arch/powerpc/kernel/smp.c                     |  2 +-
arch/powerpc/kernel/watchdog.c                |  2 +-
arch/powerpc/xmon/xmon.c                      |  4 +-
arch/s390/kernel/perf_cpum_cf.c               |  2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 16 ++--
arch/x86/kernel/smpboot.c                     |  4 +-
arch/x86/kvm/hyperv.c                         |  8 +-
arch/x86/mm/amdtopology.c                     |  2 +-
arch/x86/mm/mmio-mod.c                        |  2 +-
arch/x86/mm/numa_emulation.c                  |  4 +-
arch/x86/platform/uv/uv_nmi.c                 |  2 +-
drivers/acpi/numa/srat.c                      |  2 +-
drivers/cpufreq/qcom-cpufreq-hw.c             |  2 +-
drivers/cpufreq/scmi-cpufreq.c                |  2 +-
drivers/firmware/psci/psci_checker.c          |  2 +-
drivers/gpu/drm/i915/i915_pmu.c               |  2 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c      |  2 +-
drivers/hv/channel_mgmt.c                     |  4 +-
drivers/iio/dummy/iio_simple_dummy_buffer.c   |  4 +-
drivers/iio/industrialio-trigger.c            |  2 +-
drivers/infiniband/hw/hfi1/affinity.c         | 13 ++-
drivers/infiniband/hw/qib/qib_file_ops.c      |  2 +-
drivers/infiniband/hw/qib/qib_iba7322.c       |  2 +-
drivers/irqchip/irq-bcm6345-l1.c              |  2 +-
drivers/memstick/core/ms_block.c              |  4 +-
drivers/net/dsa/b53/b53_common.c              |  6 +-
drivers/net/ethernet/broadcom/bcmsysport.c    |  6 +-
.../net/ethernet/intel/ice/ice_virtchnl_pf.c  |  4 +-
.../net/ethernet/intel/ixgbe/ixgbe_sriov.c    |  2 +-
.../marvell/octeontx2/nic/otx2_ethtool.c      |  2 +-
.../marvell/octeontx2/nic/otx2_flows.c        |  8 +-
.../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  2 +-
drivers/net/ethernet/mellanox/mlx4/cmd.c      | 33 +++-----
drivers/net/ethernet/mellanox/mlx4/eq.c       |  4 +-
drivers/net/ethernet/mellanox/mlx4/fw.c       |  4 +-
drivers/net/ethernet/mellanox/mlx4/main.c     |  2 +-
drivers/net/ethernet/qlogic/qed/qed_rdma.c    |  4 +-
drivers/net/ethernet/qlogic/qed/qed_roce.c    |  2 +-
drivers/perf/arm-cci.c                        |  2 +-
drivers/perf/arm_pmu.c                        |  4 +-
drivers/perf/hisilicon/hisi_uncore_pmu.c      |  2 +-
drivers/perf/thunderx2_pmu.c                  |  4 +-
drivers/perf/xgene_pmu.c                      |  2 +-
drivers/scsi/lpfc/lpfc_init.c                 |  2 +-
drivers/soc/fsl/qbman/qman_test_stash.c       |  2 +-
drivers/staging/media/tegra-video/vi.c        |  2 +-
drivers/thermal/intel/intel_powerclamp.c      |  9 +--
include/linux/bitmap.h                        | 80 +++++++++++++++++++
include/linux/cpumask.h                       | 50 ++++++++++++
include/linux/nodemask.h                      | 40 ++++++++++
kernel/irq/affinity.c                         |  2 +-
kernel/padata.c                               |  2 +-
kernel/rcu/tree_nocb.h                        |  4 +-
kernel/rcu/tree_plugin.h                      |  2 +-
kernel/sched/core.c                           | 10 +--
kernel/sched/topology.c                       |  4 +-
kernel/time/clockevents.c                     |  2 +-
kernel/time/clocksource.c                     |  2 +-
lib/bitmap.c                                  | 21 +++++
mm/mempolicy.c                                |  2 +-
mm/page_alloc.c                               |  2 +-
mm/vmstat.c                                   |  4 +-
tools/include/linux/bitmap.h                  | 44 ++++++++++
tools/lib/bitmap.c                            | 20 +++++
tools/perf/builtin-c2c.c                      |  4 +-
tools/perf/util/pmu.c                         |  2 +-
73 files changed, 374 insertions(+), 142 deletions(-)
[PATCH v3 00/54] lib/bitmap: optimize bitmap_weight() usage
Posted by Yury Norov 4 years, 5 months ago
In many cases people use bitmap_weight()-based functions to compare
the result against a number or an expression:

	if (cpumask_weight(mask) > 1)
		do_something();

This may take a considerable amount of time on many-CPU machines because
cpumask_weight() traverses every word of the underlying cpumask
unconditionally.

We can significantly improve on this in many real cases if we stop
traversing the mask as soon as the count of CPUs exceeds 1:

	if (cpumask_weight_gt(mask, 1))
		do_something();
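
To illustrate the difference, here is a minimal userspace sketch, not the
kernel code. The names sketch_weight(), sketch_weight_gt() and word_weight()
are hypothetical, and the sketch assumes the mask is a whole number of words
with no stray bits set past the end:

```c
#include <stdbool.h>
#include <stddef.h>

/* Count set bits in one word (portable Kernighan loop). */
static unsigned int word_weight(unsigned long w)
{
	unsigned int n = 0;

	for (; w; w &= w - 1)
		n++;
	return n;
}

/* Full traversal: always visits every word of the mask. */
static unsigned int sketch_weight(const unsigned long *map, size_t nwords)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < nwords; i++)
		total += word_weight(map[i]);
	return total;
}

/* Early exit: stop as soon as the running count exceeds num. */
static bool sketch_weight_gt(const unsigned long *map, size_t nwords,
			     unsigned int num)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < nwords; i++) {
		total += word_weight(map[i]);
		if (total > num)
			return true;
	}
	return false;
}
```

With two bits set in the first word of a large mask, sketch_weight_gt(map,
nwords, 1) returns after looking at one word, while sketch_weight() still
walks all of them.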

The first part of the series converts cpumask_weight() to cpumask_empty()
if the number to compare with is 0. Ditto for bitmap_weight() and
nodes_weight().
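
The reasoning behind that conversion can be sketched in userspace as
follows. sketch_empty() is a hypothetical name, not the kernel helper, and
the sketch again assumes a whole number of words: a "weight == 0" test needs
no bit counting at all and can return at the first non-zero word.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if no bit is set; stops at the first non-zero word. */
static bool sketch_empty(const unsigned long *map, size_t nwords)
{
	size_t i;

	for (i = 0; i < nwords; i++)
		if (map[i])
			return false;
	return true;
}
```

So a check written as "weight(mask) == 0" becomes "empty(mask)", and
"weight(mask) != 0" or "weight(mask) > 0" becomes "!empty(mask)".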

In the 2nd part of the series bitmap_weight_cmp() is added together with
bitmap_weight_{eq,gt,ge,lt,le} wrappers on top of it. Corresponding
wrappers for cpumask and nodemask are added as well.
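
One possible shape for such a compare helper and its wrappers is sketched
below in userspace C. This is an illustration under stated assumptions, not
the code from the patch: all sketch_* names and the macro are hypothetical,
and the return convention (negative, zero or positive, like a comparison
function) is assumed here.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Count set bits in one word (portable Kernighan loop). */
static unsigned int sketch_hweight(unsigned long w)
{
	unsigned int n = 0;

	for (; w; w &= w - 1)
		n++;
	return n;
}

/*
 * Compare the number of set bits in the first nbits of map with num.
 * Returns <0, 0 or >0. The scan stops once the count exceeds num.
 */
static int sketch_weight_cmp(const unsigned long *map, unsigned int nbits,
			     unsigned int num)
{
	unsigned int full = nbits / SKETCH_BITS_PER_LONG;
	unsigned int tail = nbits % SKETCH_BITS_PER_LONG;
	unsigned int k, w = 0;

	for (k = 0; k < full; k++) {
		w += sketch_hweight(map[k]);
		if (w > num)
			return 1;
	}
	/* Only the low 'tail' bits of the last partial word are valid. */
	if (tail)
		w += sketch_hweight(map[full] & ((1UL << tail) - 1));

	return (w > num) - (w < num);
}

static bool sketch_weight_eq(const unsigned long *m, unsigned int n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) == 0;
}

static bool sketch_weight_gt(const unsigned long *m, unsigned int n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) > 0;
}

static bool sketch_weight_ge(const unsigned long *m, unsigned int n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) >= 0;
}

static bool sketch_weight_lt(const unsigned long *m, unsigned int n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) < 0;
}

static bool sketch_weight_le(const unsigned long *m, unsigned int n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) <= 0;
}
```

Keeping the traversal in a single compare function means the early-exit
logic lives in one place, and the five wrappers only differ in how they
interpret the sign of the result.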

v1: https://lkml.org/lkml/2021/11/27/339
v2: https://lkml.org/lkml/2021/12/18/241
v3:
  - drop subseries for possible, present and active cpumasks. Will
    submit it separately if needed;
  - split patches per subsystems as requested by Greg and Michał;
  - trim the recipient list. Add drivers and arch maintainers to 
    corresponding patches only.

Yury Norov (54):
  net/dsa: don't use bitmap_weight() in b53_arl_read()
  net/ethernet: don't use bitmap_weight() in bcm_sysport_rule_set()
  thermal/intel: don't use bitmap_weight() in end_power_clamp()
  net: mellanox: fix open-coded for_each_set_bit()
  nds32: perf: replace bitmap_weight with bitmap_empty where appropriate
  x86/kvm: replace bitmap_weight with bitmap_empty where appropriate
  gpu: drm: replace bitmap_weight with bitmap_empty where appropriate
  net: ethernet: replace bitmap_weight with bitmap_empty for intel
  net: ethernet: replace bitmap_weight with bitmap_empty for Marvell
  net: ethernet: replace bitmap_weight with bitmap_empty for qlogic
  perf: replace bitmap_weight with bitmap_empty where appropriate
  tools/perf: replace bitmap_weight with bitmap_empty where appropriate
  arch/alpha: replace cpumask_weight with cpumask_empty where
    appropriate
  arch/ia64: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace cpumask_weight with cpumask_empty where appropriate
  cpufreq: replace cpumask_weight with cpumask_empty where appropriate
  gpu: drm: replace cpumask_weight with cpumask_empty where appropriate
  drivers/infiniband: replace cpumask_weight with cpumask_empty where
    appropriate
  drivers/irqchip: replace cpumask_weight with cpumask_empty where
    appropriate
  kernel/irq: replace cpumask_weight with cpumask_empty where
    appropriate
  kernel: replace cpumask_weight with cpumask_empty in padata.c
  rcu: replace cpumask_weight with cpumask_empty where appropriate
  sched: replace cpumask_weight with cpumask_empty where appropriate
  time: replace cpumask_weight with cpumask_empty in clocksource.c
  mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace nodes_weight with nodes_empty where appropriate
  lib/bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions
  arch/x86: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le}
    where appropriate
  drivers/iio: replace bitmap_weight() with bitmap_weight_{eq,gt} where
    appropriate
  drivers/memstick: replace bitmap_weight with bitmap_weight_eq where
    appropriate
  net: ethernet: replace bitmap_weight with bitmap_weight_eq for intel
  net: ethernet: replace bitmap_weight with bitmap_weight_{eq,gt} for
    OcteonTX2
  net: ethernet: replace bitmap_weight with
    bitmap_weight_{eq,gt,ge,lt,le} for mellanox
  perf: replace bitmap_weight with bitmap_weight_eq for ThunderX2
  drivers/staging: replace bitmap_weight with bitmap_weight_le for
    tegra-video
  lib/cpumask: add cpumask_weight_{eq,gt,ge,lt,le}
  arch/ia64: replace cpumask_weight with cpumask_weight_eq in mm/tlb.c
  arch/mips: replace cpumask_weight with cpumask_weight_{eq, ...} where
    appropriate
  arch/powerpc: replace cpumask_weight with cpumask_weight_{eq, ...}
    where appropriate
  arch/s390: replace cpumask_weight with cpumask_weight_eq where
    appropriate
  arch/x86: replace cpumask_weight with cpumask_weight_eq where
    appropriate
  firmware: psci: replace cpumask_weight with cpumask_weight_eq
  drivers/hv: replace cpumask_weight with cpumask_weight_eq
  infiniband: replace cpumask_weight with cpumask_weight_{eq, ...} where
    appropriate
  scsi: replace cpumask_weight with cpumask_weight_gt
  soc: replace cpumask_weight with cpumask_weight_lt
  sched: replace cpumask_weight with cpumask_weight_eq where appropriate
  kernel/time: replace cpumask_weight with cpumask_weight_eq where
    appropriate
  lib/nodemask: add nodemask_weight_{eq,gt,ge,lt,le}
  acpi: replace nodes_weight with nodes_weight_ge for numa
  mm: replace nodes_weight with nodes_weight_eq in mempolicy
  lib/nodemask: add num_node_state_eq()
  tools/bitmap: sync bitmap_weight
  MAINTAINERS: add cpumask and nodemask files to BITMAP_API

-- 
2.30.2

Re: [PATCH v3 00/54] lib/bitmap: optimize bitmap_weight() usage
Posted by Vaittinen, Matti 4 years, 5 months ago
On 1/23/22 20:38, Yury Norov wrote:
> In many cases people use bitmap_weight()-based functions to compare
> the result against a number or an expression:
> 
> 	if (cpumask_weight(mask) > 1)
> 		do_something();
> 
> This may take a considerable amount of time on many-CPU machines because
> cpumask_weight() traverses every word of the underlying cpumask
> unconditionally.
> 
> We can significantly improve on this in many real cases if we stop
> traversing the mask as soon as the count of CPUs exceeds 1:
> 
> 	if (cpumask_weight_gt(mask, 1))
> 		do_something();

I guess I am part of the recipient list because I made the original
suggestion to add single_bit_set()?

If this is the case - well, I do like this series. Overall it looks good
to me - but I certainly did not go through all the changes in detail ;)
If there is some other reason to loop me in (e.g., if someone expects me
to take a closer look at something) - please give me a nudge.

Best Regards
	-- Matti Vaittinen


-- 
The Linux Kernel guy at ROHM Semiconductors

Matti Vaittinen, Linux device drivers
ROHM Semiconductors, Finland SWDC
Kiviharjunlenkki 1E
90220 OULU
FINLAND

~~ this year is the year of a signature writers block ~~
Re: [PATCH v3 00/54] lib/bitmap: optimize bitmap_weight() usage
Posted by Yury Norov 4 years, 5 months ago
On Tue, Jan 25, 2022 at 11:30 PM Vaittinen, Matti
<Matti.Vaittinen@fi.rohmeurope.com> wrote:
>
> On 1/23/22 20:38, Yury Norov wrote:
> > In many cases people use bitmap_weight()-based functions to compare
> > the result against a number or an expression:
> >
> >       if (cpumask_weight(mask) > 1)
> >               do_something();
> >
> > This may take a considerable amount of time on many-CPU machines because
> > cpumask_weight() traverses every word of the underlying cpumask
> > unconditionally.
> >
> > We can significantly improve on this in many real cases if we stop
> > traversing the mask as soon as the count of CPUs exceeds 1:
> >
> >       if (cpumask_weight_gt(mask, 1))
> >               do_something();
>
> I guess I am part of the recipient list because I made the original
> suggestion to add single_bit_set()?

Yes, because of single_bit_set()

> If this is the case - well, I do like this series. Overall it looks good
> to me - but I certainly did not go through all the changes in detail ;)
> If there is some other reason to loop me in (e.g., if someone expects me
> to take a closer look at something) - please give me a nudge.

The key patch of the series is #27: "lib/bitmap: add bitmap_weight_{cmp, eq,
gt, ge, lt, le} functions"

Feel free to add suggested/reviewed (or whatever you find appropriate) tags
if you want.

Thanks,
Yury