When support for a crypto algorithm is enabled, the arch-optimized
implementation of that algorithm should be enabled too. We've learned
this the hard way many times over the years: people regularly forget to
enable the arch-optimized implementations of the crypto algorithms,
resulting in significant performance being left on the table.

Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
uses it. Therefore, the arch-optimized BLAKE2s code, which exists for
ARM and x86_64, should be always enabled too. Let's do that.

Note that the effect on kernel image size is very small and should not
be a concern. On ARM, enabling CRYPTO_BLAKE2S_ARM actually *shrinks*
the kernel size by about 1200 bytes, since the ARM-optimized
blake2s_compress() completely replaces the generic blake2s_compress().
On x86_64, enabling CRYPTO_BLAKE2S_X86 increases the kernel size by
about 1400 bytes, as the generic blake2s_compress() is still included as
a fallback; however, for context, that is only about a quarter the size
of the generic blake2s_compress(). The x86_64 optimized BLAKE2s code
uses much less icache at runtime than the generic code.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
lib/crypto/arm/Kconfig | 2 +-
lib/crypto/x86/Kconfig | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/crypto/arm/Kconfig b/lib/crypto/arm/Kconfig
index 740341aa35d21..a5607ad079c4f 100644
--- a/lib/crypto/arm/Kconfig
+++ b/lib/crypto/arm/Kconfig
@@ -1,9 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 config CRYPTO_BLAKE2S_ARM
-	bool "Hash functions: BLAKE2s"
+	def_bool y
 	select CRYPTO_ARCH_HAVE_LIB_BLAKE2S
 	help
 	  BLAKE2s cryptographic hash function (RFC 7693)
 
 	  Architecture: arm
diff --git a/lib/crypto/x86/Kconfig b/lib/crypto/x86/Kconfig
index eb47da71aa6b6..ffa718321369f 100644
--- a/lib/crypto/x86/Kconfig
+++ b/lib/crypto/x86/Kconfig
@@ -1,9 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 config CRYPTO_BLAKE2S_X86
-	bool "Hash functions: BLAKE2s (SSSE3/AVX-512)"
+	def_bool y
 	depends on 64BIT
 	select CRYPTO_LIB_BLAKE2S_GENERIC
 	select CRYPTO_ARCH_HAVE_LIB_BLAKE2S
 	help
 	  BLAKE2s cryptographic hash function (RFC 7693)
--
2.50.1

On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers@kernel.org> wrote:
> Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> uses it. Therefore, the arch-optimized BLAKE2s code, which exists for
> ARM and x86_64, should be always enabled too.

Maybe a stupid question: what about ARM64? The current NEON
implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
for ARM.

While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):

https://github.com/BLAKE2/BLAKE2/blob/master/neon

On Fri, Aug 29, 2025 at 03:08:56PM +0200, Honza Fikar wrote:
> On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> > Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> > uses it. Therefore, the arch-optimized BLAKE2s code, which exists for
> > ARM and x86_64, should be always enabled too.
>
> Maybe a stupid question: what about ARM64? The current NEON
> implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
> for ARM.
>
> While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):
>
> https://github.com/BLAKE2/BLAKE2/blob/master/neon

There's no ARM64 optimized BLAKE2s code in the Linux kernel yet. If
it's useful, someone would need to contribute it.

- Eric

On Fri, 29 Aug 2025 at 17:30, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Fri, Aug 29, 2025 at 03:08:56PM +0200, Honza Fikar wrote:
> > On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > > Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> > > uses it. Therefore, the arch-optimized BLAKE2s code, which exists for
> > > ARM and x86_64, should be always enabled too.
> >
> > Maybe a stupid question: what about ARM64? The current NEON
> > implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
> > for ARM.
> >

That code is scalar not NEON, and is carefully tuned to make use of
the ARM barrel shifter, which does not exist on arm64.

> > While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):
> >
> > https://github.com/BLAKE2/BLAKE2/blob/master/neon
>
> There's no ARM64 optimized BLAKE2s code in the Linux kernel yet. If
> it's useful, someone would need to contribute it.
>

NEON is cumbersome in the kernel so this only makes sense if it is
substantially more performant, and I'm skeptical that this is the
case, as you pointed out yourself in

commit 5172d322d34c30fb926b29aeb5a064e1fd8a5e13
Author: Eric Biggers <ebiggers@google.com>
Date:   Wed Dec 23 00:09:59 2020 -0800

    crypto: arm/blake2s - add ARM scalar optimized BLAKE2s

    Add an ARM scalar optimized implementation of BLAKE2s.

    NEON isn't very useful for BLAKE2s because the BLAKE2s block size
    is too small for NEON to help. Each NEON instruction would depend
    on the previous one, resulting in poor performance.

    Even if NEON code might be slightly faster on some cores, the fact
    that it is sensitive to micro-architectural details makes it less
    attractive.

On Fri, Aug 29, 2025 at 06:05:42PM +0200, Ard Biesheuvel wrote:
> On Fri, 29 Aug 2025 at 17:30, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > On Fri, Aug 29, 2025 at 03:08:56PM +0200, Honza Fikar wrote:
> > > On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > > Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> > > > uses it. Therefore, the arch-optimized BLAKE2s code, which exists for
> > > > ARM and x86_64, should be always enabled too.
> > >
> > > Maybe a stupid question: what about ARM64? The current NEON
> > > implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
> > > for ARM.
> > >
>
> That code is scalar not NEON, and is carefully tuned to make use of
> the ARM barrel shifter, which does not exist on arm64.
>
> > > While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):
> > >
> > > https://github.com/BLAKE2/BLAKE2/blob/master/neon
> >
> > There's no ARM64 optimized BLAKE2s code in the Linux kernel yet. If
> > it's useful, someone would need to contribute it.
> >
>
> NEON is cumbersome in the kernel so this only makes sense if it is
> substantially more performant, and I'm skeptical that this is the
> case, as you pointed out yourself in
>
> commit 5172d322d34c30fb926b29aeb5a064e1fd8a5e13
> Author: Eric Biggers <ebiggers@google.com>
> Date:   Wed Dec 23 00:09:59 2020 -0800
>
>     crypto: arm/blake2s - add ARM scalar optimized BLAKE2s
>
>     Add an ARM scalar optimized implementation of BLAKE2s.
>
>     NEON isn't very useful for BLAKE2s because the BLAKE2s block size
>     is too small for NEON to help. Each NEON instruction would depend
>     on the previous one, resulting in poor performance.
>
>     Even if NEON code might be slightly faster on some cores, the fact
>     that it is sensitive to micro-architectural details makes it less
>     attractive.

Yes, agreed: there isn't much opportunity for an ARM64 optimized
BLAKE2s implementation to be faster than the generic C code.

- Eric
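
For readers following the barrel-shifter and dependency-chain points in the
thread, here is a rough, standalone C sketch of the BLAKE2s mixing function G
as specified in RFC 7693 (rotation amounts 16, 12, 8, 7). It is not the
kernel's code; the names ror32() and blake2s_g() and the array-index calling
convention are chosen only for illustration. It may help show why rotates
dominate the inner loop and why each step has to wait for the one before it.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (n is reduced modulo 32). */
static inline uint32_t ror32(uint32_t x, unsigned int n)
{
	n &= 31;
	return (x >> n) | (x << ((32 - n) & 31));
}

/*
 * BLAKE2s mixing function G from RFC 7693, operating on four words of
 * the 16-word working vector v. Illustrative sketch only: every
 * statement consumes the result of the previous one, so the eight
 * steps form a single serial dependency chain.
 */
static inline void blake2s_g(uint32_t v[16], int a, int b, int c, int d,
			     uint32_t x, uint32_t y)
{
	v[a] = v[a] + v[b] + x;
	v[d] = ror32(v[d] ^ v[a], 16);
	v[c] = v[c] + v[d];
	v[b] = ror32(v[b] ^ v[c], 12);
	v[a] = v[a] + v[b] + y;
	v[d] = ror32(v[d] ^ v[a], 8);
	v[c] = v[c] + v[d];
	v[b] = ror32(v[b] ^ v[c], 7);
}
```

In a full compression round, G would be applied to the four columns and then
the four diagonals of v; that outer loop is omitted here.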