arch/x86/include/asm/bitops.h | 64 +++++++++++++++++++++-------------- 1 file changed, 38 insertions(+), 26 deletions(-)
The compilers provide some builtin expression equivalent to the ffs(),
__ffs() and ffz() functions of the kernel. The kernel uses optimized
assembly which produces better code than the builtin
functions. However, such assembly code can not be folded when used
with constant expressions.
This series relies on __builtin_constant_p to select the optimal solution:
* use kernel assembly for non constant expressions
* use compiler's __builtin function for constant expressions.
** Statistics **
Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
of __ffs() and ffz() calls (details of the calculation in each patch).
** Changelog **
v5 -> v6:
* Rename variable___ffs() into variable__ffs() (two underscores
instead of three)
v4 -> v5:
* (no changes on code, only commit comment was modified)
* Rewrite the commit log:
- Use two spaces instead of `| ' to indent code snippets.
- Do not use `we'.
- Do not use `this patch' in the commit description. Instead,
use imperative tone.
Link: https://lore.kernel.org/all/YvUZVYxbOMcZtR5G@zn.tnic/
v3 -> v4:
* (no changes on code, only commit comment was modified)
* Remove note and link to Nick's message in patch 1/2, c.f.:
Link: https://lore.kernel.org/all/CAKwvOdnnDaiJcV1gr9vV+ya-jWxx7+2KJNTDThyFctVDOgt9zQ@mail.gmail.com/
* Add Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> tag in
patch 2/2.
v2 -> v3:
* Redacted out the instructions after ret and before next function
in the assembly output.
* Added a note and a link to Nick's message on the constant
propagation missed-optimization in clang:
Link: https://lore.kernel.org/all/CAKwvOdnH_gYv4qRN9pKY7jNTQK95xNeH1w1KZJJmvCkh8xJLBg@mail.gmail.com/
* Fix copy/paste typo in statistics of patch 1/2. Number of
occurences before patches are 1081 and not 3607 (percentage
reduction of 26.7% remains correct)
* Rename the functions as follow:
- __varible_ffs() -> variable___ffs()
- __variable_ffz() -> variable_ffz()
* Add Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> tag in
patch 1/2.
Vincent Mailhol (2):
x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant
expressions
x86/asm/bitops: __ffs,ffz: use __builtin_ctzl to evaluate constant
expressions
arch/x86/include/asm/bitops.h | 64 +++++++++++++++++++++--------------
1 file changed, 38 insertions(+), 26 deletions(-)
--
2.35.1
On Wed, Aug 31, 2022 at 04:57:40PM +0900, Vincent Mailhol wrote:
> The compilers provide some builtin expression equivalent to the ffs(),
> __ffs() and ffz() functions of the kernel. The kernel uses optimized
> assembly which produces better code than the builtin
> functions. However, such assembly code can not be folded when used
> with constant expressions.
>
> This series relies on __builtin_constant_p to select the optimal solution:
>
> * use kernel assembly for non constant expressions
>
> * use compiler's __builtin function for constant expressions.
>
>
> ** Statistics **
>
> Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
> of __ffs() and ffz() calls (details of the calculation in each patch).
Hi Vincent,
Can you please add a test for this? We've recently added a very similar
test_bitmap_const_eval() in lib/test_bitmap.c.
dc34d5036692c ("lib: test_bitmap: add compile-time optimization/evaluations
assertions")
Would be nice to have something like this for ffs() and ffz() in
lib/test_bitops.c.
Please keep me in loop in case of new versions.
Thanks,
Yury
On Tue. 31 Aug 2022 at 17:51, Yury Norov <yury.norov@gmail.com> wrote: > On Wed, Aug 31, 2022 at 04:57:40PM +0900, Vincent Mailhol wrote: (...) > Please keep me in loop in case of new versions. A side comment, but if you want not to miss such discussion again in the future, why not do this? diff --git a/MAINTAINERS b/MAINTAINERS index 56af0182a93b..f6caf4252799 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3602,6 +3602,7 @@ M: Yury Norov <yury.norov@gmail.com> R: Andy Shevchenko <andriy.shevchenko@linux.intel.com> R: Rasmus Villemoes <linux@rasmusvillemoes.dk> S: Maintained +F: arch/*/include/asm/bitops.h F: include/linux/bitmap.h F: include/linux/cpumask.h F: include/linux/find.h N.B. this is just a suggestion, please feel free to discard it. > Thanks, > Yury
On Wed, Aug 31, 2022 at 01:54:01AM -0700, Yury Norov wrote:
> On Wed, Aug 31, 2022 at 04:57:40PM +0900, Vincent Mailhol wrote:
> > The compilers provide some builtin expression equivalent to the ffs(),
> > __ffs() and ffz() functions of the kernel. The kernel uses optimized
> > assembly which produces better code than the builtin
> > functions. However, such assembly code can not be folded when used
> > with constant expressions.
> >
> > This series relies on __builtin_constant_p to select the optimal solution:
> >
> > * use kernel assembly for non constant expressions
> >
> > * use compiler's __builtin function for constant expressions.
> >
> >
> > ** Statistics **
> >
> > Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
> > of __ffs() and ffz() calls (details of the calculation in each patch).
>
> Hi Vincent,
>
> Can you please add a test for this? We've recently added a very similar
> test_bitmap_const_eval() in lib/test_bitmap.c.
>
> dc34d5036692c ("lib: test_bitmap: add compile-time optimization/evaluations
> assertions")
>
> Would be nice to have something like this for ffs() and ffz() in
> lib/test_bitops.c.
>
> Please keep me in loop in case of new versions.
Also, what about fls? Is there any difference with ffs/ffz wrt compile
time optimizations? If not, would be great if the series will take
care of it too.
Thanks,
Yury
On Tue. 1 sept. 2022 at 12:49, Yury Norov <yury.norov@gmail.com> wrote:
> On Wed, Aug 31, 2022 at 01:54:01AM -0700, Yury Norov wrote:
> > On Wed, Aug 31, 2022 at 04:57:40PM +0900, Vincent Mailhol wrote:
> > > The compilers provide some builtin expression equivalent to the ffs(),
> > > __ffs() and ffz() functions of the kernel. The kernel uses optimized
> > > assembly which produces better code than the builtin
> > > functions. However, such assembly code can not be folded when used
> > > with constant expressions.
> > >
> > > This series relies on __builtin_constant_p to select the optimal solution:
> > >
> > > * use kernel assembly for non constant expressions
> > >
> > > * use compiler's __builtin function for constant expressions.
> > >
> > >
> > > ** Statistics **
> > >
> > > Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
> > > of __ffs() and ffz() calls (details of the calculation in each patch).
> >
> > Hi Vincent,
> >
> > Can you please add a test for this? We've recently added a very similar
> > test_bitmap_const_eval() in lib/test_bitmap.c.
> >
> > dc34d5036692c ("lib: test_bitmap: add compile-time optimization/evaluations
> > assertions")
> >
> > Would be nice to have something like this for ffs() and ffz() in
> > lib/test_bitops.c.
> >
> > Please keep me in loop in case of new versions.
Hi Yury,
My patch only takes care of the x86 architecture. Assuming some other
architectures are not optimized yet, adding such a test might break
some builds. I am fine with adding the test, however, I will not write
patches for the other architecture because I do not have the
environment to compile and test it.
Does it still make sense to add the test before fixing all the architectures?
> Also, what about fls? Is there any difference with ffs/ffz wrt compile
> time optimizations? If not, would be great if the series will take
> care of it too.
Agree. The fls() and fls64() can use __builtin_ctz() and
__builtin_ctzll(). However, those two functions are a bit less
trivial. I wanted to have this first series approved first before
working on *fls*().
Yours sincerely,
Vincent Mailhol
On Thu, Sep 01, 2022 at 07:30:10PM +0900, Vincent MAILHOL wrote:
> On Tue. 1 sept. 2022 at 12:49, Yury Norov <yury.norov@gmail.com> wrote:
> > On Wed, Aug 31, 2022 at 01:54:01AM -0700, Yury Norov wrote:
> > > On Wed, Aug 31, 2022 at 04:57:40PM +0900, Vincent Mailhol wrote:
> > > > The compilers provide some builtin expression equivalent to the ffs(),
> > > > __ffs() and ffz() functions of the kernel. The kernel uses optimized
> > > > assembly which produces better code than the builtin
> > > > functions. However, such assembly code can not be folded when used
> > > > with constant expressions.
> > > >
> > > > This series relies on __builtin_constant_p to select the optimal solution:
> > > >
> > > > * use kernel assembly for non constant expressions
> > > >
> > > > * use compiler's __builtin function for constant expressions.
> > > >
> > > >
> > > > ** Statistics **
> > > >
> > > > Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
> > > > of __ffs() and ffz() calls (details of the calculation in each patch).
> > >
> > > Hi Vincent,
> > >
> > > Can you please add a test for this? We've recently added a very similar
> > > test_bitmap_const_eval() in lib/test_bitmap.c.
> > >
> > > dc34d5036692c ("lib: test_bitmap: add compile-time optimization/evaluations
> > > assertions")
> > >
> > > Would be nice to have something like this for ffs() and ffz() in
> > > lib/test_bitops.c.
> > >
> > > Please keep me in loop in case of new versions.
>
> Hi Yury,
>
> My patch only takes care of the x86 architecture.
OK, I just realized that you started submitting this at least back in May.
For me, v6 is good enough and well-described. So, for the series:
Reviewed-by: Yury Norov <yury.norov@gmail.com>
How are you going to merge it? If you haven't a specific tree in mind
already, I can take it in my bitmap tree because ffs and ffz are closely
related to find_bit() functions.
> Assuming some other
> architectures are not optimized yet, adding such a test might break
> some builds. I am fine with adding the test, however, I will not write
> patches for the other architecture because I do not have the
> environment to compile and test it.
>
> Does it still make sense to add the test before fixing all the architectures?
All-arches fix should begin with changing the ffs design. Namely, there
should be a generic ffs() in include/linux/bitops.h, and arch-specific
arch__ffs() in arch/xxx/include/asm/bitops.h; like we do for the set_bit()
family. I have a feeling that it's far beyond the scope of your series.
The test is a different story. Good tests are always welcome, even if
they don't cover all the arches.
> > Also, what about fls? Is there any difference with ffs/ffz wrt compile
> > time optimizations? If not, would be great if the series will take
> > care of it too.
>
> Agree. The fls() and fls64() can use __builtin_ctz() and
> __builtin_ctzll(). However, those two functions are a bit less
> trivial. I wanted to have this first series approved first before
> working on *fls*().
OK, the test and fls() can be a matter of a follow-up series, taking
into account how long are these 2 patches moving.
Thanks,
Yury
On Thu. 1 Sep. 2022 at 23:19, Yury Norov <yury.norov@gmail.com> wrote:
> On Thu, Sep 01, 2022 at 07:30:10PM +0900, Vincent MAILHOL wrote:
> > On Tue. 1 sept. 2022 at 12:49, Yury Norov <yury.norov@gmail.com> wrote:
> > > On Wed, Aug 31, 2022 at 01:54:01AM -0700, Yury Norov wrote:
> > > > On Wed, Aug 31, 2022 at 04:57:40PM +0900, Vincent Mailhol wrote:
> > > > > The compilers provide some builtin expression equivalent to the ffs(),
> > > > > __ffs() and ffz() functions of the kernel. The kernel uses optimized
> > > > > assembly which produces better code than the builtin
> > > > > functions. However, such assembly code can not be folded when used
> > > > > with constant expressions.
> > > > >
> > > > > This series relies on __builtin_constant_p to select the optimal solution:
> > > > >
> > > > > * use kernel assembly for non constant expressions
> > > > >
> > > > > * use compiler's __builtin function for constant expressions.
> > > > >
> > > > >
> > > > > ** Statistics **
> > > > >
> > > > > Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
> > > > > of __ffs() and ffz() calls (details of the calculation in each patch).
> > > >
> > > > Hi Vincent,
> > > >
> > > > Can you please add a test for this? We've recently added a very similar
> > > > test_bitmap_const_eval() in lib/test_bitmap.c.
> > > >
> > > > dc34d5036692c ("lib: test_bitmap: add compile-time optimization/evaluations
> > > > assertions")
> > > >
> > > > Would be nice to have something like this for ffs() and ffz() in
> > > > lib/test_bitops.c.
> > > >
> > > > Please keep me in loop in case of new versions.
> >
> > Hi Yury,
> >
> > My patch only takes care of the x86 architecture.
>
> OK, I just realized that you started submitting this at least back in May.
>
> For me, v6 is good enough and well-described. So, for the series:
> Reviewed-by: Yury Norov <yury.norov@gmail.com>
Thanks for the review!
> How are you going to merge it? If you haven't a specific tree in mind
> already, I can take it in my bitmap tree because ffs and ffz are closely
> related to find_bit() functions.
I never thought of a specific tree. I just CCed the x86 architecture
maintainers according to get_maintainer.pl and was expecting it to go
through the x86/asm branch of the tip tree. But I am perfectly fine if
it goes through your tree.
So same as Nick's comment below, unless Borislav still has concern on
the v6, please take it in your tree.
> > Assuming some other
> > architectures are not optimized yet, adding such a test might break
> > some builds. I am fine with adding the test, however, I will not write
> > patches for the other architecture because I do not have the
> > environment to compile and test it.
> >
> > Does it still make sense to add the test before fixing all the architectures?
>
> All-arches fix should begin with changing the ffs design. Namely, there
> should be a generic ffs() in include/linux/bitops.h,
Currently, the generic ffl, ffs, flz are under:
/include/asm-generic/bitops
especially, here is the generic ffs():
https://elixir.bootlin.com/linux/latest/source/include/asm-generic/bitops/ffs.h
Isn't this sufficient?
> and arch-specific
> arch__ffs() in arch/xxx/include/asm/bitops.h; like we do for the set_bit()
> family. I have a feeling that it's far beyond the scope of your series.
>
> The test is a different story. Good tests are always welcome, even if
> they don't cover all the arches.
ACK. I will add the test in a different patch *after* this series gets
accepted. But to be clear, I will not fix other architectures.
> > > Also, what about fls? Is there any difference with ffs/ffz wrt compile
> > > time optimizations? If not, would be great if the series will take
> > > care of it too.
> >
> > Agree. The fls() and fls64() can use __builtin_ctz() and
> > __builtin_ctzll(). However, those two functions are a bit less
> > trivial. I wanted to have this first series approved first before
> > working on *fls*().
>
> OK, the test and fls() can be a matter of a follow-up series, taking
> into account how long are these 2 patches moving.
ACK.
> Thanks,
> Yury
On Thu, Sep 1, 2022 at 7:19 AM Yury Norov <yury.norov@gmail.com> wrote:
>
> On Thu, Sep 01, 2022 at 07:30:10PM +0900, Vincent MAILHOL wrote:
> > On Tue. 1 sept. 2022 at 12:49, Yury Norov <yury.norov@gmail.com> wrote:
> > > On Wed, Aug 31, 2022 at 01:54:01AM -0700, Yury Norov wrote:
> > > > On Wed, Aug 31, 2022 at 04:57:40PM +0900, Vincent Mailhol wrote:
> > > > > The compilers provide some builtin expression equivalent to the ffs(),
> > > > > __ffs() and ffz() functions of the kernel. The kernel uses optimized
> > > > > assembly which produces better code than the builtin
> > > > > functions. However, such assembly code can not be folded when used
> > > > > with constant expressions.
> > > > >
> > > > > This series relies on __builtin_constant_p to select the optimal solution:
> > > > >
> > > > > * use kernel assembly for non constant expressions
> > > > >
> > > > > * use compiler's __builtin function for constant expressions.
> > > > >
> > > > >
> > > > > ** Statistics **
> > > > >
> > > > > Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
> > > > > of __ffs() and ffz() calls (details of the calculation in each patch).
> > > >
> > > > Hi Vincent,
> > > >
> > > > Can you please add a test for this? We've recently added a very similar
> > > > test_bitmap_const_eval() in lib/test_bitmap.c.
> > > >
> > > > dc34d5036692c ("lib: test_bitmap: add compile-time optimization/evaluations
> > > > assertions")
> > > >
> > > > Would be nice to have something like this for ffs() and ffz() in
> > > > lib/test_bitops.c.
> > > >
> > > > Please keep me in loop in case of new versions.
> >
> > Hi Yury,
> >
> > My patch only takes care of the x86 architecture.
>
> OK, I just realized that you started submitting this at least back in May.
>
> For me, v6 is good enough and well-described. So, for the series:
> Reviewed-by: Yury Norov <yury.norov@gmail.com>
>
> How are you going to merge it? If you haven't a specific tree in mind
> already, I can take it in my bitmap tree because ffs and ffz are closely
> related to find_bit() functions.
Unless Boris feels strong enough to nack the series, I think Yury
accepting the series is the way to go.
--
Thanks,
~Nick Desaulniers
On Thu, Sep 01, 2022 at 10:06:48AM -0700, Nick Desaulniers wrote:
> Unless Boris feels strong enough to nack the series, I think Yury
> accepting the series is the way to go.
arch/x86/ patches go through the tip tree unless there's a compelling
reason to carry them through another. I don't see one here.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
© 2016 - 2026 Red Hat, Inc.