[v7] x86/asm/bitops: optimize ff{s,z} functions for constant expressions

[PATCH v7 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions

Posted by Vincent Mailhol 3 years, 7 months ago

The compilers provide some builtin expression equivalent to the ffs(),
__ffs() and ffz() functions of the kernel. The kernel uses optimized
assembly which produces better code than the builtin
functions. However, such assembly code can not be folded when used
with constant expressions.

This series relies on __builtin_constant_p to select the optimal solution:

  * use kernel assembly for non constant expressions

  * use compiler's __builtin function for constant expressions.


** Statistics **

Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
of __ffs() and ffz() calls (details of the calculation in each patch).


** Changelog **

v6 -> v7:

  * (no changes on code, only commit tag was modified)

  * Add Reviewed-by: Yury Norov <yury.norov@gmail.com> in both patches


v5 -> v6:
  * Rename variable___ffs() into variable__ffs() (two underscores
    instead of three)


v4 -> v5:

  * (no changes on code, only commit comment was modified)

  * Rewrite the commit log:
    - Use two spaces instead of `| ' to indent code snippets.
    - Do not use `we'.
    - Do not use `this patch' in the commit description. Instead,
      use imperative tone.
  Link: https://lore.kernel.org/all/YvUZVYxbOMcZtR5G@zn.tnic/


v3 -> v4:

  * (no changes on code, only commit comment was modified)

  * Remove note and link to Nick's message in patch 1/2, c.f.:
  Link: https://lore.kernel.org/all/CAKwvOdnnDaiJcV1gr9vV+ya-jWxx7+2KJNTDThyFctVDOgt9zQ@mail.gmail.com/

  * Add Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> tag in
    patch 2/2.


v2 -> v3:

  * Redacted out the instructions after ret and before next function
    in the assembly output.

  * Added a note and a link to Nick's message on the constant
    propagation missed-optimization in clang:
    Link: https://lore.kernel.org/all/CAKwvOdnH_gYv4qRN9pKY7jNTQK95xNeH1w1KZJJmvCkh8xJLBg@mail.gmail.com/

  * Fix copy/paste typo in statistics of patch 1/2. Number of
    occurences before patches are 1081 and not 3607 (percentage
    reduction of 26.7% remains correct)

  * Rename the functions as follow:
    - __varible_ffs() -> variable___ffs()
    - __variable_ffz() -> variable_ffz()

  * Add Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> tag in
    patch 1/2.


Vincent Mailhol (2):
  x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant
    expressions
  x86/asm/bitops: __ffs,ffz: use __builtin_ctzl to evaluate constant
    expressions

 arch/x86/include/asm/bitops.h | 64 +++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 26 deletions(-)

-- 
2.35.1

Re: [PATCH v7 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions

Posted by Nick Desaulniers 3 years, 7 months ago

On Sun, Sep 4, 2022 at 5:38 PM Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
>
> The compilers provide some builtin expression equivalent to the ffs(),
> __ffs() and ffz() functions of the kernel. The kernel uses optimized
> assembly which produces better code than the builtin
> functions. However, such assembly code can not be folded when used
> with constant expressions.

Another tact which may help additional sources other than just the
Linux kernel; it seems that compilers should be able to fold this.

Vincent, if you're interested in making such an optimization in LLVM,
we'd welcome the contribution, and I'd be happy to show you where to
make such changes within LLVM; please let me know off thread.

If not, at the least, we should file feature requests in both:
* https://github.com/llvm/llvm-project/issues
* https://gcc.gnu.org/bugzilla/

>
> This series relies on __builtin_constant_p to select the optimal solution:
>
>   * use kernel assembly for non constant expressions
>
>   * use compiler's __builtin function for constant expressions.
>
>
> ** Statistics **
>
> Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
> of __ffs() and ffz() calls (details of the calculation in each patch).
>
>
> ** Changelog **
>
> v6 -> v7:
>
>   * (no changes on code, only commit tag was modified)
>
>   * Add Reviewed-by: Yury Norov <yury.norov@gmail.com> in both patches
>
>
> v5 -> v6:
>   * Rename variable___ffs() into variable__ffs() (two underscores
>     instead of three)
>
>
> v4 -> v5:
>
>   * (no changes on code, only commit comment was modified)
>
>   * Rewrite the commit log:
>     - Use two spaces instead of `| ' to indent code snippets.
>     - Do not use `we'.
>     - Do not use `this patch' in the commit description. Instead,
>       use imperative tone.
>   Link: https://lore.kernel.org/all/YvUZVYxbOMcZtR5G@zn.tnic/
>
>
> v3 -> v4:
>
>   * (no changes on code, only commit comment was modified)
>
>   * Remove note and link to Nick's message in patch 1/2, c.f.:
>   Link: https://lore.kernel.org/all/CAKwvOdnnDaiJcV1gr9vV+ya-jWxx7+2KJNTDThyFctVDOgt9zQ@mail.gmail.com/
>
>   * Add Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> tag in
>     patch 2/2.
>
>
> v2 -> v3:
>
>   * Redacted out the instructions after ret and before next function
>     in the assembly output.
>
>   * Added a note and a link to Nick's message on the constant
>     propagation missed-optimization in clang:
>     Link: https://lore.kernel.org/all/CAKwvOdnH_gYv4qRN9pKY7jNTQK95xNeH1w1KZJJmvCkh8xJLBg@mail.gmail.com/
>
>   * Fix copy/paste typo in statistics of patch 1/2. Number of
>     occurences before patches are 1081 and not 3607 (percentage
>     reduction of 26.7% remains correct)
>
>   * Rename the functions as follow:
>     - __varible_ffs() -> variable___ffs()
>     - __variable_ffz() -> variable_ffz()
>
>   * Add Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> tag in
>     patch 1/2.
>
>
> Vincent Mailhol (2):
>   x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant
>     expressions
>   x86/asm/bitops: __ffs,ffz: use __builtin_ctzl to evaluate constant
>     expressions
>
>  arch/x86/include/asm/bitops.h | 64 +++++++++++++++++++++--------------
>  1 file changed, 38 insertions(+), 26 deletions(-)
>
> --
> 2.35.1
>


-- 
Thanks,
~Nick Desaulniers

Re: [PATCH v7 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions

Posted by Nick Desaulniers 3 years, 7 months ago

On Tue, Sep 6, 2022 at 11:26 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Sun, Sep 4, 2022 at 5:38 PM Vincent Mailhol
> <mailhol.vincent@wanadoo.fr> wrote:
> >
> > The compilers provide some builtin expression equivalent to the ffs(),
> > __ffs() and ffz() functions of the kernel. The kernel uses optimized
> > assembly which produces better code than the builtin
> > functions. However, such assembly code can not be folded when used
> > with constant expressions.
>
> Another tact which may help additional sources other than just the
> Linux kernel; it seems that compilers should be able to fold this.
>
> Vincent, if you're interested in making such an optimization in LLVM,
> we'd welcome the contribution, and I'd be happy to show you where to
> make such changes within LLVM; please let me know off thread.

Oh right, it already does.
https://github.com/llvm/llvm-project/blob/ea953b9d9a65c202985a79f1f95da115829baef6/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp#L2635
I see what's happening. Constant propagation sinks constants into a
specialized version of ffs when there's only 1 callsite in a given
translation unit (or multiple call sites with the same constant).
Then dead argument elimination removes the argument, so libcall
optimization thinks this isn't the ffs(int) you're looking for, and
skips it.  Nice.
https://github.com/llvm/llvm-project/issues/57599
I guess ffs() is usually forward declared in strings.h, so we don't
have such a static inline definition available to constant
prop/specialize in normal C code.
-- 
Thanks,
~Nick Desaulniers

Re: [PATCH v7 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions

Posted by Vincent MAILHOL 3 years, 7 months ago

On Wed. 7 Sep. 2022 at 16:04, Nick Desaulniers <ndesaulniers@google.com> wrote:
> On Tue, Sep 6, 2022 at 11:26 AM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Sun, Sep 4, 2022 at 5:38 PM Vincent Mailhol
> > <mailhol.vincent@wanadoo.fr> wrote:
> > >
> > > The compilers provide some builtin expression equivalent to the ffs(),
> > > __ffs() and ffz() functions of the kernel. The kernel uses optimized
> > > assembly which produces better code than the builtin
> > > functions. However, such assembly code can not be folded when used
> > > with constant expressions.
> >
> > Another tact which may help additional sources other than just the
> > Linux kernel; it seems that compilers should be able to fold this.

Initially, I thought that you were suggesting folding the asm code
(which doesn’t seem trivial at all).

> > Vincent, if you're interested in making such an optimization in LLVM,
> > we'd welcome the contribution, and I'd be happy to show you where to
> > make such changes within LLVM; please let me know off thread.
>
> Oh right, it already does.
> https://github.com/llvm/llvm-project/blob/ea953b9d9a65c202985a79f1f95da115829baef6/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp#L2635
> I see what's happening. Constant propagation sinks constants into a
> specialized version of ffs when there's only 1 callsite in a given
> translation unit (or multiple call sites with the same constant).
> Then dead argument elimination removes the argument, so libcall
> optimization thinks this isn't the ffs(int) you're looking for, and
> skips it.

Isn’t it a wise decision to skip it? How should the optimization be
able to decide that the redefined ffs() is equivalent to
__builtin_ffs()?

More generally, if I write my own foo() which shadows a
__builtin_foo() function, the two functions might do something totally
different and I would be pissed off if the compiler decided to
constant-fold my foo().

Dummy example:

===================
char *s;

/* ffs: fast forward string
 * @i: how many bytes to move forward
 *
 * Move forward the global s pointer by @i or strlen(s) (whoever is smaller).
 *
 * Return: how many bytes we move forward.
 */
int ffs(int i)
{
        int len = strlen(s);
        int forward = i < len ? i : len;

        s += forward;
        return forward;
}
===================

How would you instruct the compiler to constant-fold the kernel’s
ffs() but not fold above dummy ffs()?

> Nice.
> https://github.com/llvm/llvm-project/issues/57599
> I guess ffs() is usually forward declared in strings.h, so we don't
> have such a static inline definition available to constant
> prop/specialize in normal C code.