[PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops

Posted by Andrew Cooper 8 months, 1 week ago
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Depends on "xen/riscv: make zbb as mandatory"
---
 xen/arch/riscv/include/asm/bitops.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/arch/riscv/include/asm/bitops.h b/xen/arch/riscv/include/asm/bitops.h
index d22eec1e87c7..df3df93520c5 100644
--- a/xen/arch/riscv/include/asm/bitops.h
+++ b/xen/arch/riscv/include/asm/bitops.h
@@ -125,6 +125,13 @@ static inline void clear_bit(int nr, volatile void *p)
 #undef NOT
 #undef __AMO
 
+#define arch_ffs(x)     ((x) ? 1 + __builtin_ctz(x) : 0)
+#define arch_ffsl(x)    ((x) ? 1 + __builtin_ctzl(x) : 0)
+#define arch_fls(x)     ((x) ? 32 - __builtin_clz(x) : 0)
+#define arch_flsl(x)    ((x) ? BITS_PER_LONG - __builtin_clzl(x) : 0)
+
+#define arch_heightl(x) __builtin_popcountl(x)
+
 #endif /* ASM__RISCV__BITOPS_H */
 
 /*

base-commit: 7cf163879c5add0a4f7f9c987b61f04f8f7051b1
prerequisite-patch-id: 9ee1f7ebf5d34b1c565ee2d3d4fb319164bb8bcd
prerequisite-patch-id: 8a05c87c8d051a3ac0820887f676bbd318e4ae88
prerequisite-patch-id: 6b56e42d130d8b5ee39457b6760b05cc6e16b049
prerequisite-patch-id: c139f1f5741d695cd5e5aa6be904edcb61b73885
prerequisite-patch-id: 99f8b701000e9ee11060934e627a988ddf9aaaa7
-- 
2.39.5
Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Jan Beulich 8 months, 1 week ago
On 26.02.2025 18:20, Andrew Cooper wrote:
> --- a/xen/arch/riscv/include/asm/bitops.h
> +++ b/xen/arch/riscv/include/asm/bitops.h
> @@ -125,6 +125,13 @@ static inline void clear_bit(int nr, volatile void *p)
>  #undef NOT
>  #undef __AMO
>  
> +#define arch_ffs(x)     ((x) ? 1 + __builtin_ctz(x) : 0)
> +#define arch_ffsl(x)    ((x) ? 1 + __builtin_ctzl(x) : 0)
> +#define arch_fls(x)     ((x) ? 32 - __builtin_clz(x) : 0)

I fear you won't like me to say this, but can't we avoid baking in yet
another assumption on sizeof(int) == 4, by using at least sizeof(int) * 8
here (yet better might be sizeof(int) * BITS_PER_BYTE)?

> +#define arch_flsl(x)    ((x) ? BITS_PER_LONG - __builtin_clzl(x) : 0)
> +
> +#define arch_heightl(x) __builtin_popcountl(x)

arch_hweightl()?

Jan
Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Andrew Cooper 8 months ago
On 27/02/2025 8:11 am, Jan Beulich wrote:
> On 26.02.2025 18:20, Andrew Cooper wrote:
>> --- a/xen/arch/riscv/include/asm/bitops.h
>> +++ b/xen/arch/riscv/include/asm/bitops.h
>> @@ -125,6 +125,13 @@ static inline void clear_bit(int nr, volatile void *p)
>>  #undef NOT
>>  #undef __AMO
>>  
>> +#define arch_ffs(x)     ((x) ? 1 + __builtin_ctz(x) : 0)
>> +#define arch_ffsl(x)    ((x) ? 1 + __builtin_ctzl(x) : 0)
>> +#define arch_fls(x)     ((x) ? 32 - __builtin_clz(x) : 0)
> I fear you won't like me to say this, but can't we avoid baking in yet
> another assumption on sizeof(int) == 4, by using at least sizeof(int) * 8
> here (yet better might be sizeof(int) * BITS_PER_BYTE)?

Yes and no.

No, because 32 here is consistent with ARM and PPC when it comes to
arch_fls().  Given the effort it took to get these consistent, I'm not
interested in letting them diverge.

But, if someone wants to introduce BITS_PER_INT to mirror BITS_PER_LONG
and use it consistently, then that would be ok too.


>
>> +#define arch_flsl(x)    ((x) ? BITS_PER_LONG - __builtin_clzl(x) : 0)
>> +
>> +#define arch_heightl(x) __builtin_popcountl(x)
> arch_hweightl()?

Oops yes.  And RISC-V does have two uses, via __bitmap_weight.

~Andrew

Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Jan Beulich 8 months ago
On 28.02.2025 17:24, Andrew Cooper wrote:
> On 27/02/2025 8:11 am, Jan Beulich wrote:
>> On 26.02.2025 18:20, Andrew Cooper wrote:
>>> --- a/xen/arch/riscv/include/asm/bitops.h
>>> +++ b/xen/arch/riscv/include/asm/bitops.h
>>> @@ -125,6 +125,13 @@ static inline void clear_bit(int nr, volatile void *p)
>>>  #undef NOT
>>>  #undef __AMO
>>>  
>>> +#define arch_ffs(x)     ((x) ? 1 + __builtin_ctz(x) : 0)
>>> +#define arch_ffsl(x)    ((x) ? 1 + __builtin_ctzl(x) : 0)
>>> +#define arch_fls(x)     ((x) ? 32 - __builtin_clz(x) : 0)
>> I fear you won't like me to say this, but can't we avoid baking in yet
>> another assumption on sizeof(int) == 4, by using at least sizeof(int) * 8
>> here (yet better might be sizeof(int) * BITS_PER_BYTE)?
> 
> Yes and no.
> 
> No, because 32 here is consistent with ARM and PPC when it comes to
> arch_fls().  Given the effort it took to get these consistent, I'm not
> interested in letting them diverge.
> 
> But, if someone wants to introduce BITS_PER_INT to mirror BITS_PER_LONG
> and use it consistently, then that would be ok too.

I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
retain a shorthand of that name, if so desired, but I see no reason why
each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)

Jan

Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Andrew Cooper 8 months ago
On 05/03/2025 7:34 am, Jan Beulich wrote:
> On 28.02.2025 17:24, Andrew Cooper wrote:
>> On 27/02/2025 8:11 am, Jan Beulich wrote:
>>> On 26.02.2025 18:20, Andrew Cooper wrote:
>>>> --- a/xen/arch/riscv/include/asm/bitops.h
>>>> +++ b/xen/arch/riscv/include/asm/bitops.h
>>>> @@ -125,6 +125,13 @@ static inline void clear_bit(int nr, volatile void *p)
>>>>  #undef NOT
>>>>  #undef __AMO
>>>>  
>>>> +#define arch_ffs(x)     ((x) ? 1 + __builtin_ctz(x) : 0)
>>>> +#define arch_ffsl(x)    ((x) ? 1 + __builtin_ctzl(x) : 0)
>>>> +#define arch_fls(x)     ((x) ? 32 - __builtin_clz(x) : 0)
>>> I fear you won't like me to say this, but can't we avoid baking in yet
>>> another assumption on sizeof(int) == 4, by using at least sizeof(int) * 8
>>> here (yet better might be sizeof(int) * BITS_PER_BYTE)?
>> Yes and no.
>>
>> No, because 32 here is consistent with ARM and PPC when it comes to
>> arch_fls().  Given the effort it took to get these consistent, I'm not
>> interested in letting them diverge.
>>
>> But, if someone wants to introduce BITS_PER_INT to mirror BITS_PER_LONG
>> and use it consistently, then that would be ok too.

Oleksii: I see your patch is committed, but when I said "use it
consistently", I meant "patch ARM and PPC too".
> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
> retain a shorthand of that name, if so desired, but I see no reason why
> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)

The concern is legibility and clarity.

This:

    ((x) ? 32 - __builtin_clz(x) : 0)

is a clear expression in a way that this:

    ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)

is not.  The problem is the extra binary expression, and this:

    ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)

is still clear, because the reader doesn't have to perform a multiply
just to figure out what's going on.


It is definitely stupid to have each architecture provide their own
BITS_PER_*.  The compiler is in a superior position to provide those
details, and it should be in a common location.

I don't particularly mind how those constants are derived, but one key
thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.

The following files use BITS_PER_LONG preprocessing:

arch/arm/include/asm/div64.h
arch/x86/cpu/mcheck/mce.c
arch/x86/smpboot.c
common/bitops.c
common/coverage/gcov.h
common/coverage/llvm.c
common/cpu.c
common/event_channel.c
common/time.c
common/ubsan/ubsan.c
include/asm-generic/div64.h
include/xen/cpumask.h
include/xen/inttypes.h
include/xen/nodemask.h
include/xen/sched.h
include/xen/xxhash.h
lib/find-next-bit.c
lib/generic-ffsl.c
lib/generic-flsl.c
lib/generic-hweightl.c

And I really don't think they can be replaced with a sizeof().

~Andrew

Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Oleksii Kurochko 7 months, 4 weeks ago
On 3/6/25 9:19 PM, Andrew Cooper wrote:
> On 05/03/2025 7:34 am, Jan Beulich wrote:
>> On 28.02.2025 17:24, Andrew Cooper wrote:
>>> On 27/02/2025 8:11 am, Jan Beulich wrote:
>>>> On 26.02.2025 18:20, Andrew Cooper wrote:
>>>>> --- a/xen/arch/riscv/include/asm/bitops.h
>>>>> +++ b/xen/arch/riscv/include/asm/bitops.h
>>>>> @@ -125,6 +125,13 @@ static inline void clear_bit(int nr, volatile void *p)
>>>>>   #undef NOT
>>>>>   #undef __AMO
>>>>>   
>>>>> +#define arch_ffs(x)     ((x) ? 1 + __builtin_ctz(x) : 0)
>>>>> +#define arch_ffsl(x)    ((x) ? 1 + __builtin_ctzl(x) : 0)
>>>>> +#define arch_fls(x)     ((x) ? 32 - __builtin_clz(x) : 0)
>>>> I fear you won't like me to say this, but can't we avoid baking in yet
>>>> another assumption on sizeof(int) == 4, by using at least sizeof(int) * 8
>>>> here (yet better might be sizeof(int) * BITS_PER_BYTE)?
>>> Yes and no.
>>>
>>> No, because 32 here is consistent with ARM and PPC when it comes to
>>> arch_fls().  Given the effort it took to get these consistent, I'm not
>>> interested in letting them diverge.
>>>
>>> But, if someone wants to introduce BITS_PER_INT to mirror BITS_PER_LONG
>>> and use it consistently, then that would be ok too.
> Oleksii: I see your patch is committed, but when I said "use it
> consistently", I meant "patch ARM and PPC too".

I planned to do that in a separate patch next week.

>> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
>> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
>> retain a shorthand of that name, if so desired, but I see no reason why
>> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)
> The concern is legibility and clarity.
>
> This:
>
>      ((x) ? 32 - __builtin_clz(x) : 0)
>
> is a clear expression in a way that this:
>
>      ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)
>
> is not.  The problem is the extra binary expression, and this:
>
>      ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)
>
> is still clear, because the reader doesn't have to perform a multiply to
> just to figure out what's going on.
>
>
> It is definitely stupid to have each architecture provide their own
> BITS_PER_*.  The compiler is in a superior position to provide those
> details, and it should be in a common location.
>
> I don't particularly mind how those constants are derived, but one key
> thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.

What about moving them to xen/config.h? (if it isn't the best one place, any suggestion which is better?)

#define BYTES_PER_INT  (1 << INT_BYTEORDER)
#define BITS_PER_INT  (BYTES_PER_INT << 3)

#define BYTES_PER_LONG (1 << LONG_BYTEORDER)
#define BITS_PER_LONG (BYTES_PER_LONG << 3)
#define BITS_PER_BYTE 8

Also, it seems like the following could be moved there too:

#define POINTER_ALIGN  BYTES_PER_LONG

#define BITS_PER_LLONG 64

#define BITS_PER_BYTE 8

However, BITS_PER_XEN_ULONG is different for x86, so it should remain in the arch-specific config.h.

~ Oleksii

>
> The following files use BITS_PER_LONG preprocessing:
>
> arch/arm/include/asm/div64.h
> arch/x86/cpu/mcheck/mce.c
> arch/x86/smpboot.c
> common/bitops.c
> common/coverage/gcov.h
> common/coverage/llvm.c
> common/cpu.c
> common/event_channel.c
> common/time.c
> common/ubsan/ubsan.c
> include/asm-generic/div64.h
> include/xen/cpumask.h
> include/xen/inttypes.h
> include/xen/nodemask.h
> include/xen/sched.h
> include/xen/xxhash.h
> lib/find-next-bit.c
> lib/generic-ffsl.c
> lib/generic-flsl.c
> lib/generic-hweightl.c
>
> And I really don't think they can be replaced with a sizeof().
>
> ~Andrew
Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Jan Beulich 7 months, 4 weeks ago
On 07.03.2025 12:50, Oleksii Kurochko wrote:
> On 3/6/25 9:19 PM, Andrew Cooper wrote:
>> On 05/03/2025 7:34 am, Jan Beulich wrote:
>>> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
>>> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
>>> retain a shorthand of that name, if so desired, but I see no reason why
>>> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)
>> The concern is legibility and clarity.
>>
>> This:
>>
>>      ((x) ? 32 - __builtin_clz(x) : 0)
>>
>> is a clear expression in a way that this:
>>
>>      ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)
>>
>> is not.  The problem is the extra binary expression, and this:
>>
>>      ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)
>>
>> is still clear, because the reader doesn't have to perform a multiply to
>> just to figure out what's going on.
>>
>>
>> It is definitely stupid to have each architecture provide their own
>> BITS_PER_*.  The compiler is in a superior position to provide those
>> details, and it should be in a common location.
>>
>> I don't particularly mind how those constants are derived, but one key
>> thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.
> 
> What about moving them to xen/config.h? (if it isn't the best one place, any suggestion which is better?)
> 
> #define BYTES_PER_INT  (1 << INT_BYTEORDER)
> #define BITS_PER_INT  (BYTES_PER_INT << 3)
> 
> #define BYTES_PER_LONG (1 << LONG_BYTEORDER)
> #define BITS_PER_LONG (BYTES_PER_LONG << 3)
> #define BITS_PER_BYTE 8
> 
> Also, it seems like the follwoing could be moved there too:
> 
> #define POINTER_ALIGN  BYTES_PER_LONG

This one is likely fine to move.

> #define BITS_PER_LLONG 64

This one is only fine to move imo when converted to

#define BITS_PER_LONG (BYTES_PER_LLONG << 3)

> #define BITS_PER_BYTE 8

Personally I'd rather leave this per-arch. The others can truly be derived;
this one can't be. If we centralize, imo we should also convert the " << 3"
to " * BITS_PER_BYTE".

Jan

Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Andrew Cooper 7 months, 4 weeks ago
On 07/03/2025 12:01 pm, Jan Beulich wrote:
> On 07.03.2025 12:50, Oleksii Kurochko wrote:
>> On 3/6/25 9:19 PM, Andrew Cooper wrote:
>>> On 05/03/2025 7:34 am, Jan Beulich wrote:
>>>> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
>>>> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
>>>> retain a shorthand of that name, if so desired, but I see no reason why
>>>> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)
>>> The concern is legibility and clarity.
>>>
>>> This:
>>>
>>>      ((x) ? 32 - __builtin_clz(x) : 0)
>>>
>>> is a clear expression in a way that this:
>>>
>>>      ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)
>>>
>>> is not.  The problem is the extra binary expression, and this:
>>>
>>>      ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)
>>>
>>> is still clear, because the reader doesn't have to perform a multiply to
>>> just to figure out what's going on.
>>>
>>>
>>> It is definitely stupid to have each architecture provide their own
>>> BITS_PER_*.  The compiler is in a superior position to provide those
>>> details, and it should be in a common location.
>>>
>>> I don't particularly mind how those constants are derived, but one key
>>> thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.
>> What about moving them to xen/config.h? (if it isn't the best one place, any suggestion which is better?)
>>
>> #define BYTES_PER_INT  (1 << INT_BYTEORDER)
>> #define BITS_PER_INT  (BYTES_PER_INT << 3)
>>
>> #define BYTES_PER_LONG (1 << LONG_BYTEORDER)
>> #define BITS_PER_LONG (BYTES_PER_LONG << 3)
>> #define BITS_PER_BYTE 8

The *_BYTEORDER's are useless indirection.  BITS_PER_* should be defined
straight up, and this will simplify quite a lot of preprocessing.

>>
>> Also, it seems like the follwoing could be moved there too:
>>
>> #define POINTER_ALIGN  BYTES_PER_LONG
> This one is likely fine to move.
>
>> #define BITS_PER_LLONG 64
> This one is only fine to move imo when converted to
>
> #define BITS_PER_LONG (BYTES_PER_LLONG << 3)

Erm?  That's mixing long and long long types.  Presuming that's an
errant L, then sure.

>
>> #define BITS_PER_BYTE 8
> Personally I'd rather leave this per-arch. The others can truly be derived;
> this one can't be. If we centralize, imo we should also convert the " << 3"
> to " * BITS_PER_BYTE".

It is highly unlikely that Xen will ever run on a system where CHAR_BIT
isn't 8.

So I suggest it stays central to reduce complexity.  If an arch ever
needs to change it, the complexity can be added then.

~Andrew

Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Oleksii Kurochko 7 months, 1 week ago
On 3/7/25 1:12 PM, Andrew Cooper wrote:
> On 07/03/2025 12:01 pm, Jan Beulich wrote:
>> On 07.03.2025 12:50, Oleksii Kurochko wrote:
>>> On 3/6/25 9:19 PM, Andrew Cooper wrote:
>>>> On 05/03/2025 7:34 am, Jan Beulich wrote:
>>>>> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
>>>>> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
>>>>> retain a shorthand of that name, if so desired, but I see no reason why
>>>>> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)
>>>> The concern is legibility and clarity.
>>>>
>>>> This:
>>>>
>>>>       ((x) ? 32 - __builtin_clz(x) : 0)
>>>>
>>>> is a clear expression in a way that this:
>>>>
>>>>       ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)
>>>>
>>>> is not.  The problem is the extra binary expression, and this:
>>>>
>>>>       ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)
>>>>
>>>> is still clear, because the reader doesn't have to perform a multiply to
>>>> just to figure out what's going on.
>>>>
>>>>
>>>> It is definitely stupid to have each architecture provide their own
>>>> BITS_PER_*.  The compiler is in a superior position to provide those
>>>> details, and it should be in a common location.
>>>>
>>>> I don't particularly mind how those constants are derived, but one key
>>>> thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.
>>> What about moving them to xen/config.h? (if it isn't the best one place, any suggestion which is better?)
>>>
>>> #define BYTES_PER_INT  (1 << INT_BYTEORDER)
>>> #define BITS_PER_INT  (BYTES_PER_INT << 3)
>>>
>>> #define BYTES_PER_LONG (1 << LONG_BYTEORDER)
>>> #define BITS_PER_LONG (BYTES_PER_LONG << 3)
>>> #define BITS_PER_BYTE 8
> The *_BYTEORDER's are useless indirection.  BITS_PER_* should be defined
> straight up, and this will simplify quite a lot of preprocessing.

Could we really drop *_BYTEORDER?

For example, LONG_BYTEORDER for Arm can be either 2 or 3, depending on whether Arm32 or Arm64 is used.

>
>>> Also, it seems like the follwoing could be moved there too:
>>>
>>> #define POINTER_ALIGN  BYTES_PER_LONG
>> This one is likely fine to move.
>>
>>> #define BITS_PER_LLONG 64
>> This one is only fine to move imo when converted to
>>
>> #define BITS_PER_LONG (BYTES_PER_LLONG << 3)
> Erm?  That's mixing long and long long types.  Presuming that's an
> errant L, then sure.
>
>>> #define BITS_PER_BYTE 8
>> Personally I'd rather leave this per-arch. The others can truly be derived;
>> this one can't be. If we centralize, imo we should also convert the " << 3"
>> to " * BITS_PER_BYTE".
> It is highly unlikely that Xen will ever run on a system where CHAR_BIT
> isn't 8.
>
> So I suggest it stays central to reduce complexity.  If an arch ever
> needs to change it, the complexity can be added then.

Does it make sense to #ifdef that? Or, at least, to have something like this before the definition of BITS_PER_BYTE:
#if CHAR_BIT != 8
#error "CHAR_BIT isn't 8"
#endif

~ Oleksii
Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Jan Beulich 7 months, 1 week ago
On 25.03.2025 17:35, Oleksii Kurochko wrote:
> 
> On 3/7/25 1:12 PM, Andrew Cooper wrote:
>> On 07/03/2025 12:01 pm, Jan Beulich wrote:
>>> On 07.03.2025 12:50, Oleksii Kurochko wrote:
>>>> On 3/6/25 9:19 PM, Andrew Cooper wrote:
>>>>> On 05/03/2025 7:34 am, Jan Beulich wrote:
>>>>>> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
>>>>>> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
>>>>>> retain a shorthand of that name, if so desired, but I see no reason why
>>>>>> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)
>>>>> The concern is legibility and clarity.
>>>>>
>>>>> This:
>>>>>
>>>>>       ((x) ? 32 - __builtin_clz(x) : 0)
>>>>>
>>>>> is a clear expression in a way that this:
>>>>>
>>>>>       ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)
>>>>>
>>>>> is not.  The problem is the extra binary expression, and this:
>>>>>
>>>>>       ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)
>>>>>
>>>>> is still clear, because the reader doesn't have to perform a multiply to
>>>>> just to figure out what's going on.
>>>>>
>>>>>
>>>>> It is definitely stupid to have each architecture provide their own
>>>>> BITS_PER_*.  The compiler is in a superior position to provide those
>>>>> details, and it should be in a common location.
>>>>>
>>>>> I don't particularly mind how those constants are derived, but one key
>>>>> thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.
>>>> What about moving them to xen/config.h? (if it isn't the best one place, any suggestion which is better?)
>>>>
>>>> #define BYTES_PER_INT  (1 << INT_BYTEORDER)
>>>> #define BITS_PER_INT  (BYTES_PER_INT << 3)
>>>>
>>>> #define BYTES_PER_LONG (1 << LONG_BYTEORDER)
>>>> #define BITS_PER_LONG (BYTES_PER_LONG << 3)
>>>> #define BITS_PER_BYTE 8
>> The *_BYTEORDER's are useless indirection.  BITS_PER_* should be defined
>> straight up, and this will simplify quite a lot of preprocessing.
> 
> Could we really drop *_BYTEORDER?
> 
> For example, LONG_BYTEORDER for Arm could be either 2 or 3 depends on Arm32 or Arm64 is used.

They can still #define BITS_PER_LONG to 32 or 64 respectively, can't they?

>>>> Also, it seems like the follwoing could be moved there too:
>>>>
>>>> #define POINTER_ALIGN  BYTES_PER_LONG
>>> This one is likely fine to move.
>>>
>>>> #define BITS_PER_LLONG 64
>>> This one is only fine to move imo when converted to
>>>
>>> #define BITS_PER_LONG (BYTES_PER_LLONG << 3)
>> Erm?  That's mixing long and long long types.  Presuming that's an
>> errant L, then sure.
>>
>>>> #define BITS_PER_BYTE 8
>>> Personally I'd rather leave this per-arch. The others can truly be derived;
>>> this one can't be. If we centralize, imo we should also convert the " << 3"
>>> to " * BITS_PER_BYTE".
>> It is highly unlikely that Xen will ever run on a system where CHAR_BIT
>> isn't 8.
>>
>> So I suggest it stays central to reduce complexity.  If an arch ever
>> needs to change it, the complexity can be added then.
> 
> Does it make sense to ifdef that? Or, at least, before defintion of BITS_PER_BYTE something like:
> #if CHAR_BIT != 8
> #error "CHAR_BIT isn't 8"
> #endif

Where would CHAR_BIT come from? Oh, perhaps you mean __CHAR_BIT__? If all
compilers we can build with supply that value, we could indeed centrally
use either

#define BITS_PER_BYTE __CHAR_BIT__

or

#define BITS_PER_BYTE 8
#if BITS_PER_BYTE != __CHAR_BIT__
# error "..."
#endif

Jan

Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Oleksii Kurochko 7 months, 1 week ago
On 3/25/25 5:46 PM, Jan Beulich wrote:
> On 25.03.2025 17:35, Oleksii Kurochko wrote:
>> On 3/7/25 1:12 PM, Andrew Cooper wrote:
>>> On 07/03/2025 12:01 pm, Jan Beulich wrote:
>>>> On 07.03.2025 12:50, Oleksii Kurochko wrote:
>>>>> On 3/6/25 9:19 PM, Andrew Cooper wrote:
>>>>>> On 05/03/2025 7:34 am, Jan Beulich wrote:
>>>>>>> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
>>>>>>> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
>>>>>>> retain a shorthand of that name, if so desired, but I see no reason why
>>>>>>> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)
>>>>>> The concern is legibility and clarity.
>>>>>>
>>>>>> This:
>>>>>>
>>>>>>        ((x) ? 32 - __builtin_clz(x) : 0)
>>>>>>
>>>>>> is a clear expression in a way that this:
>>>>>>
>>>>>>        ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)
>>>>>>
>>>>>> is not.  The problem is the extra binary expression, and this:
>>>>>>
>>>>>>        ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)
>>>>>>
>>>>>> is still clear, because the reader doesn't have to perform a multiply to
>>>>>> just to figure out what's going on.
>>>>>>
>>>>>>
>>>>>> It is definitely stupid to have each architecture provide their own
>>>>>> BITS_PER_*.  The compiler is in a superior position to provide those
>>>>>> details, and it should be in a common location.
>>>>>>
>>>>>> I don't particularly mind how those constants are derived, but one key
>>>>>> thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.
>>>>> What about moving them to xen/config.h? (if it isn't the best one place, any suggestion which is better?)
>>>>>
>>>>> #define BYTES_PER_INT  (1 << INT_BYTEORDER)
>>>>> #define BITS_PER_INT  (BYTES_PER_INT << 3)
>>>>>
>>>>> #define BYTES_PER_LONG (1 << LONG_BYTEORDER)
>>>>> #define BITS_PER_LONG (BYTES_PER_LONG << 3)
>>>>> #define BITS_PER_BYTE 8
>>> The *_BYTEORDER's are useless indirection.  BITS_PER_* should be defined
>>> straight up, and this will simplify quite a lot of preprocessing.
>> Could we really drop *_BYTEORDER?
>>
>> For example, LONG_BYTEORDER for Arm could be either 2 or 3 depends on Arm32 or Arm64 is used.
> The can still #define BITS_PER_LONG to 32 or 64 respectively, can't they?

Yes, but then if we want to move it to xen/config.h, we would need something like this in Arm's asm/config.h:
   #ifdef CONFIG_ARM_32
       #define BITS_PER_LONG 32
   #endif

and then in xen/config.h
   #ifndef BITS_PER_LONG
       #define BITS_PER_LONG 64
   #endif

But I wanted to avoid having #ifndef BITS_PER_LONG (or using CONFIG_ARM_32) in xen/config.h.

If it is okay to have this #ifndef in xen/config.h, then we can do it that way.

>
>>>>> Also, it seems like the follwoing could be moved there too:
>>>>>
>>>>> #define POINTER_ALIGN  BYTES_PER_LONG
>>>> This one is likely fine to move.
>>>>
>>>>> #define BITS_PER_LLONG 64
>>>> This one is only fine to move imo when converted to
>>>>
>>>> #define BITS_PER_LONG (BYTES_PER_LLONG << 3)
>>> Erm?  That's mixing long and long long types.  Presuming that's an
>>> errant L, then sure.
>>>
>>>>> #define BITS_PER_BYTE 8
>>>> Personally I'd rather leave this per-arch. The others can truly be derived;
>>>> this one can't be. If we centralize, imo we should also convert the " << 3"
>>>> to " * BITS_PER_BYTE".
>>> It is highly unlikely that Xen will ever run on a system where CHAR_BIT
>>> isn't 8.
>>>
>>> So I suggest it stays central to reduce complexity.  If an arch ever
>>> needs to change it, the complexity can be added then.
>> Does it make sense to ifdef that? Or, at least, before defintion of BITS_PER_BYTE something like:
>> #if CHAR_BIT != 8
>> #error "CHAR_BIT isn't 8"
>> #endif
> Where would CHAR_BIT come from? Oh, perhaps you mean __CHAR_BIT__? If all
> compilers we can build with supply that value, we could indeed centrally
> use either
>
> #define BITS_PER_BYTE __CHAR_BIT__
>
> or
>
> #define BITS_PER_BYTE 8
> #if BITS_PER_BYTE != __CHAR_BIT__
> # error "..."
> #endif

Sorry, I meant __CHAR_BIT__.

Thanks.

~ Oleksii
Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Jan Beulich 7 months, 1 week ago
On 25.03.2025 18:04, Oleksii Kurochko wrote:
> 
> On 3/25/25 5:46 PM, Jan Beulich wrote:
>> On 25.03.2025 17:35, Oleksii Kurochko wrote:
>>> On 3/7/25 1:12 PM, Andrew Cooper wrote:
>>>> On 07/03/2025 12:01 pm, Jan Beulich wrote:
>>>>> On 07.03.2025 12:50, Oleksii Kurochko wrote:
>>>>>> On 3/6/25 9:19 PM, Andrew Cooper wrote:
>>>>>>> On 05/03/2025 7:34 am, Jan Beulich wrote:
>>>>>>>> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
>>>>>>>> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
>>>>>>>> retain a shorthand of that name, if so desired, but I see no reason why
>>>>>>>> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)
>>>>>>> The concern is legibility and clarity.
>>>>>>>
>>>>>>> This:
>>>>>>>
>>>>>>>        ((x) ? 32 - __builtin_clz(x) : 0)
>>>>>>>
>>>>>>> is a clear expression in a way that this:
>>>>>>>
>>>>>>>        ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)
>>>>>>>
>>>>>>> is not.  The problem is the extra binary expression, and this:
>>>>>>>
>>>>>>>        ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)
>>>>>>>
>>>>>>> is still clear, because the reader doesn't have to perform a multiply to
>>>>>>> just to figure out what's going on.
>>>>>>>
>>>>>>>
>>>>>>> It is definitely stupid to have each architecture provide their own
>>>>>>> BITS_PER_*.  The compiler is in a superior position to provide those
>>>>>>> details, and it should be in a common location.
>>>>>>>
>>>>>>> I don't particularly mind how those constants are derived, but one key
>>>>>>> thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.
>>>>>> What about moving them to xen/config.h? (if it isn't the best one place, any suggestion which is better?)
>>>>>>
>>>>>> #define BYTES_PER_INT  (1 << INT_BYTEORDER)
>>>>>> #define BITS_PER_INT  (BYTES_PER_INT << 3)
>>>>>>
>>>>>> #define BYTES_PER_LONG (1 << LONG_BYTEORDER)
>>>>>> #define BITS_PER_LONG (BYTES_PER_LONG << 3)
>>>>>> #define BITS_PER_BYTE 8
>>>> The *_BYTEORDER's are useless indirection.  BITS_PER_* should be defined
>>>> straight up, and this will simplify quite a lot of preprocessing.
>>> Could we really drop *_BYTEORDER?
>>>
>>> For example, LONG_BYTEORDER for Arm could be either 2 or 3 depends on Arm32 or Arm64 is used.
>> The can still #define BITS_PER_LONG to 32 or 64 respectively, can't they?
> 
> Yes, but then if we want to move it to xen/config.h then it will be needed to:
> in Arm's asm/config.h to have something like:
>    #ifdef CONFIG_ARM_32
>        #define BITS_PER_LONG 32
>    #endif
> 
> and then in xen/config.h
>    #ifndef BITS_PER_LONG
>        #define BITS_PER_LONG 64
>    #endif
> 
> But I wanted to not have #ifndef BITS_PER_LONG in xen/config.h. (or using CONFIG_ARM_32 in xen/config.h)

Whatever fundamental definitions can vary per arch - those should imo
live in asm/config.h. For the case here, if we deem *_BYTEORDER as
fundamental, they go there (and BITS_PER_* go into xen/config.h). If we deem
BITS_PER_* fundamental, they go into asm/config.h.

Stuff that we expect to remain uniform across ports (BITS_PER_BYTE, possibly
also BITS_PER_INT, BITS_PER_LLONG) may also go into xen/config.h, as long as
we're okay with the possible future churn if a port appeared that has
different needs.

I agree that there had better not be such an #ifndef in xen/config.h.

Jan

Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Andrew Cooper 7 months, 1 week ago
On 26/03/2025 10:15 am, Jan Beulich wrote:
> On 25.03.2025 18:04, Oleksii Kurochko wrote:
>> On 3/25/25 5:46 PM, Jan Beulich wrote:
>>> On 25.03.2025 17:35, Oleksii Kurochko wrote:
>>>> On 3/7/25 1:12 PM, Andrew Cooper wrote:
>>>>> On 07/03/2025 12:01 pm, Jan Beulich wrote:
>>>>>> On 07.03.2025 12:50, Oleksii Kurochko wrote:
>>>>>>> On 3/6/25 9:19 PM, Andrew Cooper wrote:
>>>>>>>> On 05/03/2025 7:34 am, Jan Beulich wrote:
>>>>>>>>> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
>>>>>>>>> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
>>>>>>>>> retain a shorthand of that name, if so desired, but I see no reason why
>>>>>>>>> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)
>>>>>>>> The concern is legibility and clarity.
>>>>>>>>
>>>>>>>> This:
>>>>>>>>
>>>>>>>>        ((x) ? 32 - __builtin_clz(x) : 0)
>>>>>>>>
>>>>>>>> is a clear expression in a way that this:
>>>>>>>>
>>>>>>>>        ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)
>>>>>>>>
>>>>>>>> is not.  The problem is the extra binary expression, and this:
>>>>>>>>
>>>>>>>>        ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)
>>>>>>>>
>>>>>>>> is still clear, because the reader doesn't have to perform a multiply
>>>>>>>> just to figure out what's going on.
>>>>>>>>
>>>>>>>>
>>>>>>>> It is definitely stupid to have each architecture provide their own
>>>>>>>> BITS_PER_*.  The compiler is in a superior position to provide those
>>>>>>>> details, and it should be in a common location.
>>>>>>>>
>>>>>>>> I don't particularly mind how those constants are derived, but one key
>>>>>>>> thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.
>>>>>>> What about moving them to xen/config.h? (if it isn't the best one place, any suggestion which is better?)
>>>>>>>
>>>>>>> #define BYTES_PER_INT  (1 << INT_BYTEORDER)
>>>>>>> #define BITS_PER_INT  (BYTES_PER_INT << 3)
>>>>>>>
>>>>>>> #define BYTES_PER_LONG (1 << LONG_BYTEORDER)
>>>>>>> #define BITS_PER_LONG (BYTES_PER_LONG << 3)
>>>>>>> #define BITS_PER_BYTE 8
>>>>> The *_BYTEORDER's are useless indirection.  BITS_PER_* should be defined
>>>>> straight up, and this will simplify quite a lot of preprocessing.
>>>> Could we really drop *_BYTEORDER?
>>>>
>>>> For example, LONG_BYTEORDER for Arm could be either 2 or 3, depending on whether Arm32 or Arm64 is used.
>>> They can still #define BITS_PER_LONG to 32 or 64 respectively, can't they?
>> Yes, but if we want to move it to xen/config.h, then in Arm's asm/config.h we would need to have something like:
>>    #ifdef CONFIG_ARM_32
>>        #define BITS_PER_LONG 32
>>    #endif
>>
>> and then in xen/config.h
>>    #ifndef BITS_PER_LONG
>>        #define BITS_PER_LONG 64
>>    #endif
>>
>> But I wanted to avoid having #ifndef BITS_PER_LONG (or using CONFIG_ARM_32) in xen/config.h.
> Whatever fundamental definitions can vary per arch - those should
> imo live in asm/config.h. For the case here, if we deem *_BYTEORDER as
> fundamental, they go there (and BITS_PER_* go into xen/config.h). If we deem
> BITS_PER_* fundamental, they go into asm/config.h.
>
> Stuff that we expect to remain uniform across ports (BITS_PER_BYTE, possibly
> also BITS_PER_INT BITS_PER_LLONG) may also go into xen/config.h, as long as
> we're okay with the possible future churn if a port appeared that has
> different needs.
>
> I agree that it would be better not to have #ifndef for such things in xen/config.h.

With a new toolchain baseline, we get both of these concepts directly
from the compiler environment.

I don't see any need for arch-specific overrides to these.  They
ultimately come from the -march used.

~Andrew

Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Jan Beulich 7 months, 4 weeks ago
On 06.03.2025 21:19, Andrew Cooper wrote:
> On 05/03/2025 7:34 am, Jan Beulich wrote:
>> On 28.02.2025 17:24, Andrew Cooper wrote:
>>> On 27/02/2025 8:11 am, Jan Beulich wrote:
>>>> On 26.02.2025 18:20, Andrew Cooper wrote:
>>>>> --- a/xen/arch/riscv/include/asm/bitops.h
>>>>> +++ b/xen/arch/riscv/include/asm/bitops.h
>>>>> @@ -125,6 +125,13 @@ static inline void clear_bit(int nr, volatile void *p)
>>>>>  #undef NOT
>>>>>  #undef __AMO
>>>>>  
>>>>> +#define arch_ffs(x)     ((x) ? 1 + __builtin_ctz(x) : 0)
>>>>> +#define arch_ffsl(x)    ((x) ? 1 + __builtin_ctzl(x) : 0)
>>>>> +#define arch_fls(x)     ((x) ? 32 - __builtin_clz(x) : 0)
>>>> I fear you won't like me to say this, but can't we avoid baking in yet
>>>> another assumption on sizeof(int) == 4, by using at least sizeof(int) * 8
>>>> here (yet better might be sizeof(int) * BITS_PER_BYTE)?
>>> Yes and no.
>>>
>>> No, because 32 here is consistent with ARM and PPC when it comes to
>>> arch_fls().  Given the effort it took to get these consistent, I'm not
>>> interested in letting them diverge.
>>>
>>> But, if someone wants to introduce BITS_PER_INT to mirror BITS_PER_LONG
>>> and use it consistently, then that would be ok too.
> 
> Oleksii: I see your patch is committed, but when I said "use it
> consistently", I meant "patch ARM and PPC too".
>> I was actually hoping to eliminate BITS_PER_LONG at some point, in favor
>> of using sizeof(long) * BITS_PER_BYTE. (Surely in common code we could
>> retain a shorthand of that name, if so desired, but I see no reason why
>> each arch would need to provide all three BITS_PER_{BYTE,INT,LONG}.)
> 
> The concern is legibility and clarity.
> 
> This:
> 
>     ((x) ? 32 - __builtin_clz(x) : 0)
> 
> is a clear expression in a way that this:
> 
>     ((x) ? (sizeof(int) * BITS_PER_BYTE) - __builtin_clz(x) : 0)
> 
> is not.  The problem is the extra binary expression, and this:
> 
>     ((x) ? BITS_PER_INT - __builtin_clz(x) : 0)
> 
> is still clear, because the reader doesn't have to perform a multiply
> just to figure out what's going on.
> 
> 
> It is definitely stupid to have each architecture provide their own
> BITS_PER_*.  The compiler is in a superior position to provide those
> details, and it should be in a common location.
> 
> I don't particularly mind how those constants are derived, but one key
> thing that BITS_PER_* can do which sizeof() can't is be used in #ifdef/etc.

This is a fair point indeed.

Jan

Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Oleksii Kurochko 8 months, 1 week ago
On 2/26/25 6:20 PM, Andrew Cooper wrote:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>
> Depends on "xen/riscv: make zbb as mandatory"
> ---
>   xen/arch/riscv/include/asm/bitops.h | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/xen/arch/riscv/include/asm/bitops.h b/xen/arch/riscv/include/asm/bitops.h
> index d22eec1e87c7..df3df93520c5 100644
> --- a/xen/arch/riscv/include/asm/bitops.h
> +++ b/xen/arch/riscv/include/asm/bitops.h
> @@ -125,6 +125,13 @@ static inline void clear_bit(int nr, volatile void *p)
>   #undef NOT
>   #undef __AMO
>   
> +#define arch_ffs(x)     ((x) ? 1 + __builtin_ctz(x) : 0)
> +#define arch_ffsl(x)    ((x) ? 1 + __builtin_ctzl(x) : 0)
> +#define arch_fls(x)     ((x) ? 32 - __builtin_clz(x) : 0)
> +#define arch_flsl(x)    ((x) ? BITS_PER_LONG - __builtin_clzl(x) : 0)
> +
> +#define arch_heightl(x) __builtin_popcountl(x)
> +
>   #endif /* ASM__RISCV__BITOPS_H */
>   
>   /*

LGTM: Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks.

> base-commit: 7cf163879c5add0a4f7f9c987b61f04f8f7051b1
> prerequisite-patch-id: 9ee1f7ebf5d34b1c565ee2d3d4fb319164bb8bcd
> prerequisite-patch-id: 8a05c87c8d051a3ac0820887f676bbd318e4ae88
> prerequisite-patch-id: 6b56e42d130d8b5ee39457b6760b05cc6e16b049
> prerequisite-patch-id: c139f1f5741d695cd5e5aa6be904edcb61b73885
> prerequisite-patch-id: 99f8b701000e9ee11060934e627a988ddf9aaaa7

Could you please tell me how you generate this one?

~ Oleksii
Re: [PATCH] RISCV/bitops: Use Zbb to provide arch-optimised bitops
Posted by Andrew Cooper 8 months, 1 week ago
On 26/02/2025 9:01 pm, Oleksii Kurochko wrote:
>
>
> On 2/26/25 6:20 PM, Andrew Cooper wrote:
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>> Depends on "xen/riscv: make zbb as mandatory"
>> ---
>>  xen/arch/riscv/include/asm/bitops.h | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/xen/arch/riscv/include/asm/bitops.h b/xen/arch/riscv/include/asm/bitops.h
>> index d22eec1e87c7..df3df93520c5 100644
>> --- a/xen/arch/riscv/include/asm/bitops.h
>> +++ b/xen/arch/riscv/include/asm/bitops.h
>> @@ -125,6 +125,13 @@ static inline void clear_bit(int nr, volatile void *p)
>>  #undef NOT
>>  #undef __AMO
>>  
>> +#define arch_ffs(x)     ((x) ? 1 + __builtin_ctz(x) : 0)
>> +#define arch_ffsl(x)    ((x) ? 1 + __builtin_ctzl(x) : 0)
>> +#define arch_fls(x)     ((x) ? 32 - __builtin_clz(x) : 0)
>> +#define arch_flsl(x)    ((x) ? BITS_PER_LONG - __builtin_clzl(x) : 0)
>> +
>> +#define arch_heightl(x) __builtin_popcountl(x)
>> +
>>  #endif /* ASM__RISCV__BITOPS_H */
>>  
>>  /*
> LGTM: Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks.  This form is much easier than making alternatives for it.

> Thanks.
>> base-commit: 7cf163879c5add0a4f7f9c987b61f04f8f7051b1
>> prerequisite-patch-id: 9ee1f7ebf5d34b1c565ee2d3d4fb319164bb8bcd
>> prerequisite-patch-id: 8a05c87c8d051a3ac0820887f676bbd318e4ae88
>> prerequisite-patch-id: 6b56e42d130d8b5ee39457b6760b05cc6e16b049
>> prerequisite-patch-id: c139f1f5741d695cd5e5aa6be904edcb61b73885
>> prerequisite-patch-id: 99f8b701000e9ee11060934e627a988ddf9aaaa7
> Could you please tell me how you generate this one?

In gitconfig,

[format]
        useAutoBase = "whenAble"

or --base=auto on a git format-patch command.

This is a poor example.  Those prereq ids are:

* 307e136282d8 - (HEAD -> riscv-isa) RISCV/bitops: Use Zbb to provide
arch-optimised bitops (7 hours ago) <Andrew Cooper>
* 519dcd50e4cd - xen/riscv: make zbb as mandatory (12 hours ago)
<Oleksii Kurochko>
* c9bf2a9ac22c - xen/riscv: identify specific ISA supported by cpu (12
hours ago) <Oleksii Kurochko>
* 9014de63aa14 - automation: drop debian:11-riscv64 container (12 hours
ago) <Oleksii Kurochko>
* 5febb98d11f3 - xen/riscv: drop CONFIG_RISCV_ISA_RV64G (12 hours ago)
<Oleksii Kurochko>
* 5a7c9fd746af - xen/README: add compiler and binutils versions for
RISCV-64 (12 hours ago) <Oleksii Kurochko>
* 7cf163879c5a - (xenbits/staging, xenbits/master, upstream/staging,
upstream/master, origin/staging, origin/master, origin/HEAD, staging,
pending, master) PPC: Activate UBSAN in testing (12 hours ago) <Andrew
Cooper>

as I pulled your series in locally.

~Andrew