[PATCH 2/2] uapi: Refactor __GENMASK_ULL() for speed-up

I Hsin Cheng posted 2 patches 12 months ago
[PATCH 2/2] uapi: Refactor __GENMASK_ULL() for speed-up
Posted by I Hsin Cheng 12 months ago
The calculation of "((~_ULL(0)) - (_ULL(1) << (l)) + 1)" is to generate
a bitmask with "l" trailing zeroes, which is equivalent to
"(~_ULL(0) << (l))".

Refactor the calculation so the number of arithmetic instruction will be
reduced from 3 to 1.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
---
Test is done to show the speed-up we can get from reducing the number of
instruction. The test machine runs with 6.9.0-0-generic kernel on x86_64
architecture with processor AMD Ryzen 7 5700X3D 8-Core Processor.

The test executes in the form of kernel module.

Test result:
[452858.156718] new __GENMASK_ULL() : 5908678
[452858.156722] old __GENMASK_ULL() : 5950998

Test script snippet:
/* switch to __BITS_PER_LONG_LONG when testing __GENMASK_ULL() */
\#define __GENMASK_ULL_NEW(h, l) \
        ((~_ULL(0) << (l)) & \
         (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))

int init_module(void)
{
    ktime_t start, end, total1 = 0, total2 = 0;
    for (int k = 0; k < 100; k++) {
        for (int i = 0; i < __BITS_PER_LONG_LONG; i++) {
            for (int j = 0; j <= i; j++) {
                unsigned result1, result2;

                start = ktime_get();
                result1 = __GENMASK_ULL_NEW(i, j);
                end = ktime_get();
                total1 += (end - start);

                start = ktime_get();
                result2 = __GENMASK_ULL(i, j);
                end = ktime_get();
                total2 += (end - start);

                if (result1 != result2) {
                    pr_info("Wrong calculation of GENMASK_ULL_NEW()\n");
                    return 0;
                }
            }
        }
    }

    pr_info("new __GENMASK_ULL() : %lld\n", total1);
    pr_info("old __GENMASK_ULL() : %lld\n", total2);

    return 0;
}
---
 include/uapi/linux/bits.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/bits.h b/include/uapi/linux/bits.h
index 8fc7fea65288..965f6eb91ff3 100644
--- a/include/uapi/linux/bits.h
+++ b/include/uapi/linux/bits.h
@@ -9,7 +9,7 @@
          (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
 
 #define __GENMASK_ULL(h, l) \
-        (((~_ULL(0)) - (_ULL(1) << (l)) + 1) & \
+	((~_ULL(0) << (l)) & \
          (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
 
 #define __GENMASK_U128(h, l) \
-- 
2.43.0
Re: [PATCH 2/2] uapi: Refactor __GENMASK_ULL() for speed-up
Posted by David Laight 12 months ago
On Wed, 12 Feb 2025 00:24:12 +0800
I Hsin Cheng <richard120310@gmail.com> wrote:

> The calculation of "((~_ULL(0)) - (_ULL(1) << (l)) + 1)" is to generate
> a bitmask with "l" trailing zeroes, which is equivalent to
> "(~_ULL(0) << (l))".

Yes, and if you look through the commit history you'll see it was changed
to avoid a compiler warning about the shift losing significant bits.
So you are just reverting that change and the compiler warnings will
reappear.

For non-constants I suspect that (2ul << hi) - (1ul << lo) is the
best answer.
But the compiler (clang with some options?) will still complain
for constants when trying to set the high bit.

That version also doesn't need BITS_PER_[U]LONG.
While that is valid for C, the _ULL() are there for the assembler
(when they are no-ops) so there is no way asm copies can be right
for both GENMASK() ans GENMASK_ULL().

	David
Re: [PATCH 2/2] uapi: Refactor __GENMASK_ULL() for speed-up
Posted by I Hsin Cheng 12 months ago
On Tue, Feb 11, 2025 at 10:30:45PM +0000, David Laight wrote:
> On Wed, 12 Feb 2025 00:24:12 +0800
> I Hsin Cheng <richard120310@gmail.com> wrote:
> 
> > The calculation of "((~_ULL(0)) - (_ULL(1) << (l)) + 1)" is to generate
> > a bitmask with "l" trailing zeroes, which is equivalent to
> > "(~_ULL(0) << (l))".
> 
> Yes, and if you look through the commit history you'll see it was changed
> to avoid a compiler warning about the shift losing significant bits.
> So you are just reverting that change and the compiler warnings will
> reappear.
> 
> For non-constants I suspect that (2ul << hi) - (1ul << lo) is the
> best answer.
> But the compiler (clang with some options?) will still complain
> for constants when trying to set the high bit.
> 
> That version also doesn't need BITS_PER_[U]LONG.
> While that is valid for C, the _ULL() are there for the assembler
> (when they are no-ops) so there is no way asm copies can be right
> for both GENMASK() ans GENMASK_ULL().
> 
> 	David

Hi David,

Thanks for the review!

> Yes, and if you look through the commit history you'll see it was changed
> to avoid a compiler warning about the shift losing significant bits.

I've browse through the commits of include/linux/bits.h , where
GENMASK() was placed under. Though I didn't find specific commit of it,
would you be so kind to paste the link of the commit?

I assume you're talking about warnings like the following?
warning: left shift count >= width of type [-Wshift-count-overflow]

If this is the case then it happens for the current version as well, I
mean from the perspective of operations, "(~_ULL(0) << (l))" and
"(_ULL(1) << (1))" are basically the same, they all shift a constant
left with an amount of "l". When "l" is large enough the compiler will
complain about losing msb.

> While that is valid for C, the _ULL() are there for the assembler
> (when they are no-ops) so there is no way asm copies can be right
> for both GENMASK() ans GENMASK_ULL().

I don't understand this part sorry, if asm copies can't be right for
"~_ULL(0) << (l)" , how can it be right for "(_ULL(1) << (l))" ?
At least from my pespective these 2 ops only differs at the value of
constant. Please let me know about it, I'm not familiar with assembler
and would love to know more about it. Thanks.

Best regards,
I Hsin Cheng
Re: [PATCH 2/2] uapi: Refactor __GENMASK_ULL() for speed-up
Posted by David Laight 12 months ago
On Wed, 12 Feb 2025 20:39:09 +0800
I Hsin Cheng <richard120310@gmail.com> wrote:

> On Tue, Feb 11, 2025 at 10:30:45PM +0000, David Laight wrote:
> > On Wed, 12 Feb 2025 00:24:12 +0800
> > I Hsin Cheng <richard120310@gmail.com> wrote:
> >   
> > > The calculation of "((~_ULL(0)) - (_ULL(1) << (l)) + 1)" is to generate
> > > a bitmask with "l" trailing zeroes, which is equivalent to
> > > "(~_ULL(0) << (l))".  
> > 
> > Yes, and if you look through the commit history you'll see it was changed
> > to avoid a compiler warning about the shift losing significant bits.
> > So you are just reverting that change and the compiler warnings will
> > reappear.
> > 
> > For non-constants I suspect that (2ul << hi) - (1ul << lo) is the
> > best answer.
> > But the compiler (clang with some options?) will still complain
> > for constants when trying to set the high bit.
> > 
> > That version also doesn't need BITS_PER_[U]LONG.
> > While that is valid for C, the _ULL() are there for the assembler
> > (when they are no-ops) so there is no way asm copies can be right
> > for both GENMASK() ans GENMASK_ULL().
> > 
> > 	David  
> 
> Hi David,
> 
> Thanks for the review!
> 
> > Yes, and if you look through the commit history you'll see it was changed
> > to avoid a compiler warning about the shift losing significant bits.  
> 
> I've browse through the commits of include/linux/bits.h , where
> GENMASK() was placed under. Though I didn't find specific commit of it,
> would you be so kind to paste the link of the commit?
> 
> I assume you're talking about warnings like the following?
> warning: left shift count >= width of type [-Wshift-count-overflow]

No, there is one about clang 'objecting' to ~0u << count because
significant bits get discarded.
The 'strange' expression was used to avoid that - even though it is
mathematically equivalent.

> 
> If this is the case then it happens for the current version as well, I
> mean from the perspective of operations, "(~_ULL(0) << (l))" and
> "(_ULL(1) << (1))" are basically the same, they all shift a constant
> left with an amount of "l". When "l" is large enough the compiler will
> complain about losing msb.
> 
> > While that is valid for C, the _ULL() are there for the assembler
> > (when they are no-ops) so there is no way asm copies can be right
> > for both GENMASK() ans GENMASK_ULL().  
> 
> I don't understand this part sorry, if asm copies can't be right for
> "~_ULL(0) << (l)" , how can it be right for "(_ULL(1) << (l))" ?
> At least from my pespective these 2 ops only differs at the value of
> constant. Please let me know about it, I'm not familiar with assembler
> and would love to know more about it. Thanks.

It is the >> that go wrong, since they rely on the number of bits in
the integer type being used.
For asm both ~_UL(0) and ~_ULL(0) are just ~0, but the >> uses a
different constant.
I'm not even sure that the _ULL() should even be in a uapi header.

I proposed using:
	((1u << (hi)) - 1) * 2) + 1 - (((1u << (lo)) - 1)
not that long ago.
That avoid any bits being shifted off the top and I'm sure it
that gcc optimised add the +/- away.
It also has the advantage that the type of the result only depends
of the type of the 1u (so can be 1ul, 1ull etc).

	David

> 
> Best regards,
> I Hsin Cheng
>
Re: [PATCH 2/2] uapi: Refactor __GENMASK_ULL() for speed-up
Posted by Yury Norov 12 months ago
+ Matthias, Andrew

On Wed, Feb 12, 2025 at 08:39:09PM +0800, I Hsin Cheng wrote:
> On Tue, Feb 11, 2025 at 10:30:45PM +0000, David Laight wrote:
> > On Wed, 12 Feb 2025 00:24:12 +0800
> > I Hsin Cheng <richard120310@gmail.com> wrote:
> > 
> > > The calculation of "((~_ULL(0)) - (_ULL(1) << (l)) + 1)" is to generate
> > > a bitmask with "l" trailing zeroes, which is equivalent to
> > > "(~_ULL(0) << (l))".
> > 
> > Yes, and if you look through the commit history you'll see it was changed
> > to avoid a compiler warning about the shift losing significant bits.
> > So you are just reverting that change and the compiler warnings will
> > reappear.
> > For non-constants I suspect that (2ul << hi) - (1ul << lo) is the
> > best answer.
> > But the compiler (clang with some options?) will still complain
> > for constants when trying to set the high bit.
> > 
> > That version also doesn't need BITS_PER_[U]LONG.
> > While that is valid for C, the _ULL() are there for the assembler
> > (when they are no-ops) so there is no way asm copies can be right
> > for both GENMASK() ans GENMASK_ULL().
> > 
> > 	David
> 
> Hi David,
> 
> Thanks for the review!
> 
> > Yes, and if you look through the commit history you'll see it was changed
> > to avoid a compiler warning about the shift losing significant bits.
> 
> I've browse through the commits of include/linux/bits.h , where
> GENMASK() was placed under. Though I didn't find specific commit of it,
> would you be so kind to paste the link of the commit?

It's c32ee3d9abd284b4f ("bitops: avoid integer overflow in GENMASK(_ULL)").

[From cover letter] 

> $ ./scripts/bloat-o-meter vmlinux_old vmlinux_new
> add/remove: 0/2 grow/shrink: 46/510 up/down: 464/-1733 (-1269)

You can see, Andrew tested it and found no changes in .text size. If that
has changed, we need to revert the original patch and suppress warning.

Can you please print the full bloat-o-meter result? Which compiler(s) did
you try?

Thanks,
Yury