lib/crypto/Makefile | 4 ++++ 1 file changed, 4 insertions(+)
The GCC bug only occurred on i386 and has been resolved since GCC 12.2.
Limit the frame size workaround to GCC < 12.2 on i386.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
---
lib/crypto/Makefile | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index b5346cebbb55..5ee36a231484 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -33,7 +33,11 @@ obj-$(CONFIG_CRYPTO_LIB_GF128MUL) += gf128mul.o
obj-$(CONFIG_CRYPTO_LIB_BLAKE2B) += libblake2b.o
libblake2b-y := blake2b.o
+ifeq ($(CONFIG_X86_32),y)
+ifeq ($(CONFIG_CC_IS_GCC)_$(call gcc-min-version, 120200),y_)
CFLAGS_blake2b.o := -Wframe-larger-than=4096 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
+endif # CONFIG_CC_IS_GCC
+endif # CONFIG_X86_32
ifeq ($(CONFIG_CRYPTO_LIB_BLAKE2B_ARCH),y)
CFLAGS_blake2b.o += -I$(src)/$(SRCARCH)
libblake2b-$(CONFIG_ARM) += arm/blake2b-neon-core.o
--
2.51.1
On Sat, 22 Nov 2025 11:55:31 +0100 Thorsten Blum <thorsten.blum@linux.dev> wrote: > The GCC bug only occurred on i386 and has been resolved since GCC 12.2. > Limit the frame size workaround to GCC < 12.2 on i386. > > Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> > --- > lib/crypto/Makefile | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile > index b5346cebbb55..5ee36a231484 100644 > --- a/lib/crypto/Makefile > +++ b/lib/crypto/Makefile > @@ -33,7 +33,11 @@ obj-$(CONFIG_CRYPTO_LIB_GF128MUL) += gf128mul.o > > obj-$(CONFIG_CRYPTO_LIB_BLAKE2B) += libblake2b.o > libblake2b-y := blake2b.o > +ifeq ($(CONFIG_X86_32),y) > +ifeq ($(CONFIG_CC_IS_GCC)_$(call gcc-min-version, 120200),y_) > CFLAGS_blake2b.o := -Wframe-larger-than=4096 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 > +endif # CONFIG_CC_IS_GCC > +endif # CONFIG_X86_32 Isn't that just going to cause a run-time stack overflow? The compile-time check is a vague attempt to stop run-type overflow by limiting the largest single stack frames. I can't remember the kernel stack size for x86-32, might only be 8k. The only real solution is to either fix the source so it doesn't blow the stack (I suspect the code is 'unrolled' and the spills everything to stack - making it slow), or just disable the code from the build. David > ifeq ($(CONFIG_CRYPTO_LIB_BLAKE2B_ARCH),y) > CFLAGS_blake2b.o += -I$(src)/$(SRCARCH) > libblake2b-$(CONFIG_ARM) += arm/blake2b-neon-core.o
On 23. Nov 2025, at 10:28, david laight wrote:
> On Sat, 22 Nov 2025 11:55:31 +0100
> Thorsten Blum <thorsten.blum@linux.dev> wrote:
>
>> The GCC bug only occurred on i386 and has been resolved since GCC 12.2.
>> Limit the frame size workaround to GCC < 12.2 on i386.
>>
>> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
>> ---
>> lib/crypto/Makefile | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
>> index b5346cebbb55..5ee36a231484 100644
>> --- a/lib/crypto/Makefile
>> +++ b/lib/crypto/Makefile
>> @@ -33,7 +33,11 @@ obj-$(CONFIG_CRYPTO_LIB_GF128MUL) += gf128mul.o
>>
>> obj-$(CONFIG_CRYPTO_LIB_BLAKE2B) += libblake2b.o
>> libblake2b-y := blake2b.o
>> +ifeq ($(CONFIG_X86_32),y)
>> +ifeq ($(CONFIG_CC_IS_GCC)_$(call gcc-min-version, 120200),y_)
>> CFLAGS_blake2b.o := -Wframe-larger-than=4096 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
>> +endif # CONFIG_CC_IS_GCC
>> +endif # CONFIG_X86_32
>
> Isn't that just going to cause a run-time stack overflow?
My change doesn't cause a runtime stack overflow, it's just a compiler
warning. There's more information in commit 1d3551ced64e ("crypto:
blake2b: effectively disable frame size warning").
Given the kernel test robot results with GCC 15.1.0 on m68k, we should
probably make this conditional on GCC (any version). Clang produces much
smaller stack frames and should be fine with the default warning
threshold.
I'll send a v2.
Thanks,
Thorsten
On Sun, 23 Nov 2025 18:00:01 +0100
Thorsten Blum <thorsten.blum@linux.dev> wrote:
> On 23. Nov 2025, at 10:28, david laight wrote:
> > On Sat, 22 Nov 2025 11:55:31 +0100
> > Thorsten Blum <thorsten.blum@linux.dev> wrote:
> >
> >> The GCC bug only occurred on i386 and has been resolved since GCC 12.2.
> >> Limit the frame size workaround to GCC < 12.2 on i386.
> >>
> >> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
> >> ---
> >> lib/crypto/Makefile | 4 ++++
> >> 1 file changed, 4 insertions(+)
> >>
> >> diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
> >> index b5346cebbb55..5ee36a231484 100644
> >> --- a/lib/crypto/Makefile
> >> +++ b/lib/crypto/Makefile
> >> @@ -33,7 +33,11 @@ obj-$(CONFIG_CRYPTO_LIB_GF128MUL) += gf128mul.o
> >>
> >> obj-$(CONFIG_CRYPTO_LIB_BLAKE2B) += libblake2b.o
> >> libblake2b-y := blake2b.o
> >> +ifeq ($(CONFIG_X86_32),y)
> >> +ifeq ($(CONFIG_CC_IS_GCC)_$(call gcc-min-version, 120200),y_)
> >> CFLAGS_blake2b.o := -Wframe-larger-than=4096 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
> >> +endif # CONFIG_CC_IS_GCC
> >> +endif # CONFIG_X86_32
> >
> > Isn't that just going to cause a run-time stack overflow?
>
> My change doesn't cause a runtime stack overflow, it's just a compiler
> warning. There's more information in commit 1d3551ced64e ("crypto:
> blake2b: effectively disable frame size warning").
>
> Given the kernel test robot results with GCC 15.1.0 on m68k, we should
> probably make this conditional on GCC (any version). Clang produces much
> smaller stack frames and should be fine with the default warning
> threshold.
But if anyone tries to run the kernel they'll need space for the '3k monster stack'.
So changing the limit is 'fine' for a test build, but not for a proper build.
(Yes this has been wrong since Linus did the original patch in 2022.)
Does allmodconfig set COMPILE_TEST ?
If so that could be included in the conditional.
A more interesting question is whether the change can just be removed.
I'd guess no one is actively using gcc 12.1 any more.
David
>
> I'll send a v2.
>
> Thanks,
> Thorsten
>
>
On Sun, Nov 23, 2025 at 06:58:18PM +0000, david laight wrote:
> On Sun, 23 Nov 2025 18:00:01 +0100
> Thorsten Blum <thorsten.blum@linux.dev> wrote:
>
> > On 23. Nov 2025, at 10:28, david laight wrote:
> > > On Sat, 22 Nov 2025 11:55:31 +0100
> > > Thorsten Blum <thorsten.blum@linux.dev> wrote:
> > >
> > >> The GCC bug only occurred on i386 and has been resolved since GCC 12.2.
> > >> Limit the frame size workaround to GCC < 12.2 on i386.
> > >>
> > >> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
> > >> ---
> > >> lib/crypto/Makefile | 4 ++++
> > >> 1 file changed, 4 insertions(+)
> > >>
> > >> diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
> > >> index b5346cebbb55..5ee36a231484 100644
> > >> --- a/lib/crypto/Makefile
> > >> +++ b/lib/crypto/Makefile
> > >> @@ -33,7 +33,11 @@ obj-$(CONFIG_CRYPTO_LIB_GF128MUL) += gf128mul.o
> > >>
> > >> obj-$(CONFIG_CRYPTO_LIB_BLAKE2B) += libblake2b.o
> > >> libblake2b-y := blake2b.o
> > >> +ifeq ($(CONFIG_X86_32),y)
> > >> +ifeq ($(CONFIG_CC_IS_GCC)_$(call gcc-min-version, 120200),y_)
> > >> CFLAGS_blake2b.o := -Wframe-larger-than=4096 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
> > >> +endif # CONFIG_CC_IS_GCC
> > >> +endif # CONFIG_X86_32
> > >
> > > Isn't that just going to cause a run-time stack overflow?
> >
> > My change doesn't cause a runtime stack overflow, it's just a compiler
> > warning. There's more information in commit 1d3551ced64e ("crypto:
> > blake2b: effectively disable frame size warning").
> >
> > Given the kernel test robot results with GCC 15.1.0 on m68k, we should
> > probably make this conditional on GCC (any version). Clang produces much
> > smaller stack frames and should be fine with the default warning
> > threshold.
>
> But if anyone tries to run the kernel they'll need space for the '3k monster stack'.
> So changing the limit is 'fine' for a test build, but not for a proper build.
> (Yes this has been wrong since Linus did the original patch in 2022.)
>
> Does allmodconfig set COMPILE_TEST ?
> If so that could be included in the conditional.
>
> A more interesting question is whether the change can just be removed.
> I'd guess no one is actively using gcc 12.1 any more.
How about we roll up the BLAKE2b rounds loop if !CONFIG_64BIT?
- Eric
On Sun, 23 Nov 2025 12:26:29 -0800
Eric Biggers <ebiggers@kernel.org> wrote:
> On Sun, Nov 23, 2025 at 06:58:18PM +0000, david laight wrote:
> > On Sun, 23 Nov 2025 18:00:01 +0100
> > Thorsten Blum <thorsten.blum@linux.dev> wrote:
> >
> > > On 23. Nov 2025, at 10:28, david laight wrote:
> > > > On Sat, 22 Nov 2025 11:55:31 +0100
> > > > Thorsten Blum <thorsten.blum@linux.dev> wrote:
> > > >
> > > >> The GCC bug only occurred on i386 and has been resolved since GCC 12.2.
> > > >> Limit the frame size workaround to GCC < 12.2 on i386.
> > > >>
> > > >> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
> > > >> ---
> > > >> lib/crypto/Makefile | 4 ++++
> > > >> 1 file changed, 4 insertions(+)
> > > >>
> > > >> diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
> > > >> index b5346cebbb55..5ee36a231484 100644
> > > >> --- a/lib/crypto/Makefile
> > > >> +++ b/lib/crypto/Makefile
> > > >> @@ -33,7 +33,11 @@ obj-$(CONFIG_CRYPTO_LIB_GF128MUL) += gf128mul.o
> > > >>
> > > >> obj-$(CONFIG_CRYPTO_LIB_BLAKE2B) += libblake2b.o
> > > >> libblake2b-y := blake2b.o
> > > >> +ifeq ($(CONFIG_X86_32),y)
> > > >> +ifeq ($(CONFIG_CC_IS_GCC)_$(call gcc-min-version, 120200),y_)
> > > >> CFLAGS_blake2b.o := -Wframe-larger-than=4096 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
> > > >> +endif # CONFIG_CC_IS_GCC
> > > >> +endif # CONFIG_X86_32
> > > >
> > > > Isn't that just going to cause a run-time stack overflow?
> > >
> > > My change doesn't cause a runtime stack overflow, it's just a compiler
> > > warning. There's more information in commit 1d3551ced64e ("crypto:
> > > blake2b: effectively disable frame size warning").
> > >
> > > Given the kernel test robot results with GCC 15.1.0 on m68k, we should
> > > probably make this conditional on GCC (any version). Clang produces much
> > > smaller stack frames and should be fine with the default warning
> > > threshold.
> >
> > But if anyone tries to run the kernel they'll need space for the '3k monster stack'.
> > So changing the limit is 'fine' for a test build, but not for a proper build.
> > (Yes this has been wrong since Linus did the original patch in 2022.)
> >
> > Does allmodconfig set COMPILE_TEST ?
> > If so that could be included in the conditional.
> >
> > A more interesting question is whether the change can just be removed.
> > I'd guess no one is actively using gcc 12.1 any more.
>
> How about we roll up the BLAKE2b rounds loop if !CONFIG_64BIT?
I do wonder about the real benefit of some of the massive loop unrolling
that happens in a lot of these algorithms (not just blake2b).
It might speed up (some) benchmarks, but the 'I-cache busting' effect
may well some down any real uses - especially on small/moderate sized buffers.
Loop unrolling is so 1980s...
And that is an entirely separate issue from any register spills.
If the compiler is going to spill to stack the benefits of unrolling are
likely to disappear - especially on a modern 'out of order' and 'multi issue'
cpu.
On x86 you normally get any 'loop control' for free, normal loop unrolling
is pretty pointless except for very short loops (you can't do a 1 clock loop).
Register pressure on a 32bit cpu doing 64bit operations is immense.
Worse for old architectures with very few registers - x86 can only hold
three 64bit values in registers.
So the compiler ends up spilling 'temporary' values from the middle of
expressions as well as all obvious named variables.
So yes, rolling it up (or not unrolling it) on 32bit is a good idea.
David
>
> - Eric
On Mon, Nov 24, 2025 at 10:08 AM david laight <david.laight@runbox.com> wrote: > > How about we roll up the BLAKE2b rounds loop if !CONFIG_64BIT? > > I do wonder about the real benefit of some of the massive loop unrolling > that happens in a lot of these algorithms (not just blake2b). I remember looking at this in the context of blake2s, with two paths, depending on CONFIG_CC_OPTIMIZE_FOR_SIZE, but the savings didn't seem enough for the performance hit. It might be platform specific though. I guess try it and post numbers, and that'll either be a compelling reason to adjust it or still "meh"? Jason
On Mon, Nov 24, 2025 at 06:14:31PM +0100, Jason A. Donenfeld wrote: > On Mon, Nov 24, 2025 at 10:08 AM david laight <david.laight@runbox.com> wrote: > > > How about we roll up the BLAKE2b rounds loop if !CONFIG_64BIT? > > > > I do wonder about the real benefit of some of the massive loop unrolling > > that happens in a lot of these algorithms (not just blake2b). > > I remember looking at this in the context of blake2s, with two paths, > depending on CONFIG_CC_OPTIMIZE_FOR_SIZE, but the savings didn't seem > enough for the performance hit. It might be platform specific though. > I guess try it and post numbers, and that'll either be a compelling > reason to adjust it or still "meh"? Earlier I did some quick microbenchmarks with blake2b_kunit. The existing unrolling does increase throughput by as much as 50%. It's probably mostly due to inlining the blake2b_sigma constants. However, the increased code size is a real issue that doesn't show up in that microbenchmark. Naturally, it will be especially bad on 32-bit CPUs, given that BLAKE2b works with 64-bit words. The 32-bit code gets the code size blow-up from emulating the 64-bit arithmetic using 32-bit instructions, in addition to the unrolling. Rolling up the rounds loop when !CONFIG_64BIT seems like a reasonable first step. We could consider rolling up the rounds loop even when CONFIG_64BIT. If optimal BLAKE2b throughput was actually important on x86_64, we should have an AVX optimized implementation anyway. But no one has ever cared to add one. I think btrfs is the only user currently, but btrfs's use case is non-cryptographic and it already supports much faster non-cryptographic checksums (crc32c and xxhash64). - Eric
Hi Thorsten,
kernel test robot noticed the following build warnings:
[auto build test WARNING on ebiggers/libcrypto-next]
[also build test WARNING on next-20251121]
[cannot apply to ebiggers/libcrypto-fixes crng-random/master linus/master v6.18-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Thorsten-Blum/lib-crypto-blake2b-Limit-frame-size-workaround-to-GCC-12-2-on-i386/20251122-185851
base: https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git libcrypto-next
patch link: https://lore.kernel.org/r/20251122105530.441350-2-thorsten.blum%40linux.dev
patch subject: [PATCH] lib/crypto: blake2b: Limit frame size workaround to GCC < 12.2 on i386
:::::: branch date: 13 hours ago
:::::: commit date: 13 hours ago
config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20251123/202511230820.9CzhKlk6-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251123/202511230820.9CzhKlk6-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/r/202511230820.9CzhKlk6-lkp@intel.com/
All warnings (new ones prefixed by >>):
lib/crypto/blake2b.c: In function 'blake2b_compress_generic':
>> lib/crypto/blake2b.c:108:1: warning: the frame size of 3440 bytes is larger than 1024 bytes [-Wframe-larger-than=]
108 | }
| ^
vim +108 lib/crypto/blake2b.c
23a16c9533ed92 Eric Biggers 2025-10-17 101
23a16c9533ed92 Eric Biggers 2025-10-17 102 for (i = 0; i < 8; ++i)
23a16c9533ed92 Eric Biggers 2025-10-17 103 ctx->h[i] ^= v[i] ^ v[i + 8];
23a16c9533ed92 Eric Biggers 2025-10-17 104
23a16c9533ed92 Eric Biggers 2025-10-17 105 data += BLAKE2B_BLOCK_SIZE;
23a16c9533ed92 Eric Biggers 2025-10-17 106 --nblocks;
23a16c9533ed92 Eric Biggers 2025-10-17 107 }
23a16c9533ed92 Eric Biggers 2025-10-17 @108 }
23a16c9533ed92 Eric Biggers 2025-10-17 109
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Sat, Nov 22, 2025 at 11:55:31AM +0100, Thorsten Blum wrote: > The GCC bug only occurred on i386 and has been resolved since GCC 12.2. > Limit the frame size workaround to GCC < 12.2 on i386. > > Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> > --- > lib/crypto/Makefile | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile > index b5346cebbb55..5ee36a231484 100644 > --- a/lib/crypto/Makefile > +++ b/lib/crypto/Makefile > @@ -33,7 +33,11 @@ obj-$(CONFIG_CRYPTO_LIB_GF128MUL) += gf128mul.o > > obj-$(CONFIG_CRYPTO_LIB_BLAKE2B) += libblake2b.o > libblake2b-y := blake2b.o > +ifeq ($(CONFIG_X86_32),y) > +ifeq ($(CONFIG_CC_IS_GCC)_$(call gcc-min-version, 120200),y_) > CFLAGS_blake2b.o := -Wframe-larger-than=4096 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 > +endif # CONFIG_CC_IS_GCC > +endif # CONFIG_X86_32 How about we do it without the nested ifeq? ifeq ($(CONFIG_X86_32)$(CONFIG_CC_IS_GCC)_$(call gcc-min-version, 120200),yy_) CFLAGS_blake2b.o := -Wframe-larger-than=4096 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 endif Also, according to the bugreport this was a regression in gcc 12. With it having been fixed in 12.2, i.e. within the same gcc release series, is this workaround still worth carrying at all? - Eric
On 22. Nov 2025, at 21:04, Eric Biggers wrote: > On Sat, Nov 22, 2025 at 11:55:31AM +0100, Thorsten Blum wrote: >> The GCC bug only occurred on i386 and has been resolved since GCC 12.2. >> Limit the frame size workaround to GCC < 12.2 on i386. >> >> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> >> --- >> [...] > > How about we do it without the nested ifeq? > > ifeq ($(CONFIG_X86_32)$(CONFIG_CC_IS_GCC)_$(call gcc-min-version, 120200),yy_) > CFLAGS_blake2b.o := -Wframe-larger-than=4096 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 > endif I considered it and went with the nested ifeq, but I'm fine with both. > Also, according to the bugreport this was a regression in gcc 12. With > it having been fixed in 12.2, i.e. within the same gcc release series, > is this workaround still worth carrying at all? Not sure - gcc 8.1.0 is still the min version supported by the kernel. Thorsten
© 2016 - 2025 Red Hat, Inc.