Using hweight() is an especially expensive way of determining simply if
multiple bits are set in a value. Worse, 4 of the 10 hweight() calls in Xen
are of this form.
Switch to the new multiple_bits_set() helper. This is far more efficient than
the longhand hweight() algorithm and, owing to its simplicity, likely more
efficient than even a dedicated instruction on a superscalar processor.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
For future work, do related work for bitmap_weight() so cpumask_weight() in
particular can stop being used for so many 0/1 special cases.
On x86, the code reduction speaks for itself:
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-240 (-240)
Function old new delta
vlapic_ipi 722 650 -72
numa_emulation 577 497 -80
do_xenpmu_op 1665 1577 -88
That's an aweful lot of wasted calculation for the same answer.
I can't find a way of enabling CONFIG_NUMA on ARM (yet?) so right now there's
no change in any other architecture. However, a synthetic helper shows the
following on ARM32:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-128 (-128)
Function old new delta
test_mbs 176 48 -128
and on ARM64:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-44 (-44)
Function old new delta
test_mbs 60 16 -44
PPC64 has POPCNTD in a default build of Xen and by chance both algorithms
compile to the same number of instructions.
RISC-V doesn't have hweight() wired up yet at all.
---
xen/arch/x86/cpu/vpmu.c | 2 +-
xen/arch/x86/hvm/vlapic.c | 10 ++++++----
xen/common/numa.c | 2 +-
3 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index b2ba9994129b..a5bb1689c7d5 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -673,7 +673,7 @@ long do_xenpmu_op(
{
if ( (pmu_params.val &
~(XENPMU_MODE_SELF | XENPMU_MODE_HV | XENPMU_MODE_ALL)) ||
- (hweight64(pmu_params.val) > 1) )
+ multiple_bits_set(pmu_params.val) )
return -EINVAL;
/* 32-bit dom0 can only sample itself. */
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 2ec95942713e..4a3e21a65f9d 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -467,12 +467,14 @@ static bool is_multicast_dest(struct vlapic *vlapic, unsigned int short_hand,
return short_hand != APIC_DEST_SELF;
if ( vlapic_x2apic_mode(vlapic) )
- return dest_mode ? hweight16(dest) > 1 : dest == 0xffffffffU;
+ return dest_mode ? multiple_bits_set((uint16_t)dest)
+ : dest == 0xffffffffU;
if ( dest_mode )
- return hweight8(dest &
- GET_xAPIC_DEST_FIELD(vlapic_get_reg(vlapic,
- APIC_DFR))) > 1;
+ {
+ dest &= GET_xAPIC_DEST_FIELD(vlapic_get_reg(vlapic, APIC_DFR));
+ return multiple_bits_set((uint8_t)dest);
+ }
return dest == 0xff;
}
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 28a09766fabc..ce3991929ce5 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -546,7 +546,7 @@ static int __init numa_emulation(unsigned long start_pfn,
uint64_t sz = pfn_to_paddr(end_pfn - start_pfn) / numa_fake;
/* Kludge needed for the hash function */
- if ( hweight64(sz) > 1 )
+ if ( multiple_bits_set(sz) )
{
uint64_t x = 1;
--
2.39.2