[PATCH 09/12] lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code

Eric Biggers posted 12 patches 1 month, 1 week ago
Posted by Eric Biggers 1 month, 1 week ago
When support for a crypto algorithm is enabled, the arch-optimized
implementation of that algorithm should be enabled too.  We've learned
this the hard way many times over the years: people regularly forget to
enable the arch-optimized implementations of the crypto algorithms,
resulting in significant performance being left on the table.

Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
uses it.  Therefore, the arch-optimized BLAKE2s code, which exists for
ARM and x86_64, should be always enabled too.  Let's do that.

Note that the effect on kernel image size is very small and should not
be a concern.  On ARM, enabling CRYPTO_BLAKE2S_ARM actually *shrinks*
the kernel size by about 1200 bytes, since the ARM-optimized
blake2s_compress() completely replaces the generic blake2s_compress().
On x86_64, enabling CRYPTO_BLAKE2S_X86 increases the kernel size by
about 1400 bytes, as the generic blake2s_compress() is still included as
a fallback; however, for context, that is only about a quarter of the size
of the generic blake2s_compress().  The x86_64 optimized BLAKE2s code
uses much less icache at runtime than the generic code.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 lib/crypto/arm/Kconfig | 2 +-
 lib/crypto/x86/Kconfig | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/crypto/arm/Kconfig b/lib/crypto/arm/Kconfig
index 740341aa35d21..a5607ad079c4f 100644
--- a/lib/crypto/arm/Kconfig
+++ b/lib/crypto/arm/Kconfig
@@ -1,9 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 config CRYPTO_BLAKE2S_ARM
-	bool "Hash functions: BLAKE2s"
+	def_bool y
 	select CRYPTO_ARCH_HAVE_LIB_BLAKE2S
 	help
 	  BLAKE2s cryptographic hash function (RFC 7693)
 
 	  Architecture: arm
diff --git a/lib/crypto/x86/Kconfig b/lib/crypto/x86/Kconfig
index eb47da71aa6b6..ffa718321369f 100644
--- a/lib/crypto/x86/Kconfig
+++ b/lib/crypto/x86/Kconfig
@@ -1,9 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 config CRYPTO_BLAKE2S_X86
-	bool "Hash functions: BLAKE2s (SSSE3/AVX-512)"
+	def_bool y
 	depends on 64BIT
 	select CRYPTO_LIB_BLAKE2S_GENERIC
 	select CRYPTO_ARCH_HAVE_LIB_BLAKE2S
 	help
 	  BLAKE2s cryptographic hash function (RFC 7693)
-- 
2.50.1
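
The commit message describes two ways the arch code can relate to the generic code: on ARM the optimized blake2s_compress() replaces the generic one outright, while on x86_64 the generic one stays in the image as a fallback. The following is only a rough userspace sketch of that difference, not the kernel's actual code. Every name in it (compress_generic, compress_arch, compress_replace, compress_with_fallback, and both boolean parameters) is hypothetical, and the functions return a tag instead of hashing anything so that the chosen path is visible.

```c
#include <stdbool.h>

/* Hypothetical tags identifying which implementation ran. */
enum impl { IMPL_GENERIC, IMPL_ARCH };

/* Stand-ins for the real compression routines (assumed names). */
static enum impl compress_generic(void) { return IMPL_GENERIC; }
static enum impl compress_arch(void)    { return IMPL_ARCH; }

/*
 * ARM-style: the arch routine is the only one that gets built, so the
 * generic routine can be dropped from the image entirely.
 */
static enum impl compress_replace(void)
{
	return compress_arch();
}

/*
 * x86_64-style: the arch routine is used only when the CPU feature is
 * present and SIMD is usable in the current context (both conditions
 * are assumptions of this sketch); otherwise the generic code runs,
 * so it must stay in the image.
 */
static enum impl compress_with_fallback(bool cpu_has_simd, bool simd_usable)
{
	if (cpu_has_simd && simd_usable)
		return compress_arch();
	return compress_generic();
}
```

Under these assumptions, the size figures in the commit message are consistent with the two shapes: the first removes one function body from the image, and the second adds one.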
Re: [PATCH 09/12] lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
Posted by Honza Fikar 1 month ago
On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers@kernel.org> wrote:

> Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> uses it.  Therefore, the arch-optimized BLAKE2s code, which exists for
> ARM and x86_64, should be always enabled too.

Maybe a stupid question: what about ARM64? The current NEON
implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
for ARM.

While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):

https://github.com/BLAKE2/BLAKE2/blob/master/neon
Re: [PATCH 09/12] lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
Posted by Eric Biggers 1 month ago
On Fri, Aug 29, 2025 at 03:08:56PM +0200, Honza Fikar wrote:
> On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers@kernel.org> wrote:
> 
> > Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> > uses it.  Therefore, the arch-optimized BLAKE2s code, which exists for
> > ARM and x86_64, should be always enabled too.
> 
> Maybe a stupid question: what about ARM64? The current NEON
> implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
> for ARM.
> 
> While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):
> 
> https://github.com/BLAKE2/BLAKE2/blob/master/neon

There's no ARM64 optimized BLAKE2s code in the Linux kernel yet.  If
it's useful, someone would need to contribute it.

- Eric
Re: [PATCH 09/12] lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
Posted by Ard Biesheuvel 1 month ago
On Fri, 29 Aug 2025 at 17:30, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Fri, Aug 29, 2025 at 03:08:56PM +0200, Honza Fikar wrote:
> > On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > > Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> > > uses it.  Therefore, the arch-optimized BLAKE2s code, which exists for
> > > ARM and x86_64, should be always enabled too.
> >
> > Maybe a stupid question: what about ARM64? The current NEON
> > implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
> > for ARM.
> >

That code is scalar not NEON, and is carefully tuned to make use of
the ARM barrel shifter, which does not exist on arm64.

> > While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):
> >
> > https://github.com/BLAKE2/BLAKE2/blob/master/neon
>
> There's no ARM64 optimized BLAKE2s code in the Linux kernel yet.  If
> it's useful, someone would need to contribute it.
>

NEON is cumbersome in the kernel so this only makes sense if it is
substantially more performant, and I'm skeptical that this is the
case, as you pointed out yourself in

commit 5172d322d34c30fb926b29aeb5a064e1fd8a5e13
Author: Eric Biggers <ebiggers@google.com>
Date:   Wed Dec 23 00:09:59 2020 -0800

    crypto: arm/blake2s - add ARM scalar optimized BLAKE2s

    Add an ARM scalar optimized implementation of BLAKE2s.

    NEON isn't very useful for BLAKE2s because the BLAKE2s block size
    is too small for NEON to help.  Each NEON instruction would depend
    on the previous one, resulting in poor performance.

Even if NEON code might be slightly faster on some cores, the fact
that it is sensitive to micro-architectural details makes it less
attractive.
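
For readers following the barrel shifter and dependency-chain points, here is a minimal C sketch of the BLAKE2s mixing function G as specified in RFC 7693. It is not taken from the kernel sources; the names ror32, struct quad, and blake2s_g are chosen for this illustration only. Each of the eight steps consumes the result of the step before it, which is the serial dependency the quoted commit message refers to, and every rotate is paired with an XOR, which is the kind of operation a 32-bit ARM shifted operand can presumably fold into a single instruction.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits, for n in 1..31. */
static inline uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/* The four state words that one G call mixes (illustrative type). */
struct quad { uint32_t a, b, c, d; };

/*
 * BLAKE2s G function (RFC 7693, rotation constants 16, 12, 8, 7).
 * x and y are the two message words selected for this call.
 * Note that every line depends on the line above it.
 */
static struct quad blake2s_g(struct quad v, uint32_t x, uint32_t y)
{
	v.a = v.a + v.b + x;
	v.d = ror32(v.d ^ v.a, 16);
	v.c = v.c + v.d;
	v.b = ror32(v.b ^ v.c, 12);
	v.a = v.a + v.b + y;
	v.d = ror32(v.d ^ v.a, 8);
	v.c = v.c + v.d;
	v.b = ror32(v.b ^ v.c, 7);
	return v;
}
```

A full round applies G to four columns and then four diagonals of the 16-word state, so at most four G calls are independent at any point, which limits how much a wide vector unit can do for a single 64-byte block.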
Re: [PATCH 09/12] lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
Posted by Eric Biggers 1 month ago
On Fri, Aug 29, 2025 at 06:05:42PM +0200, Ard Biesheuvel wrote:
> On Fri, 29 Aug 2025 at 17:30, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > On Fri, Aug 29, 2025 at 03:08:56PM +0200, Honza Fikar wrote:
> > > On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > > Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> > > > uses it.  Therefore, the arch-optimized BLAKE2s code, which exists for
> > > > ARM and x86_64, should be always enabled too.
> > >
> > > Maybe a stupid question: what about ARM64? The current NEON
> > > implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
> > > for ARM.
> > >
> 
> That code is scalar not NEON, and is carefully tuned to make use of
> the ARM barrel shifter, which does not exist on arm64.
> 
> > > While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):
> > >
> > > https://github.com/BLAKE2/BLAKE2/blob/master/neon
> >
> > There's no ARM64 optimized BLAKE2s code in the Linux kernel yet.  If
> > it's useful, someone would need to contribute it.
> >
> 
> NEON is cumbersome in the kernel so this only makes sense if it is
> substantially more performant, and I'm skeptical that this is the
> case, as you pointed out yourself in
> 
> commit 5172d322d34c30fb926b29aeb5a064e1fd8a5e13
> Author: Eric Biggers <ebiggers@google.com>
> Date:   Wed Dec 23 00:09:59 2020 -0800
> 
>     crypto: arm/blake2s - add ARM scalar optimized BLAKE2s
> 
>     Add an ARM scalar optimized implementation of BLAKE2s.
> 
>     NEON isn't very useful for BLAKE2s because the BLAKE2s block size
>     is too small for NEON to help.  Each NEON instruction would depend
>     on the previous one, resulting in poor performance.
> 
> Even if NEON code might be slightly faster on some cores, the fact
> that it is sensitive to micro-architectural details makes it less
> attractive.

Yes, agreed: there isn't much opportunity for an ARM64 optimized BLAKE2s
implementation to be faster than the generic C code.

- Eric