[PATCH 0/5] arm64: Move kernel mode FPSIMD buffer to the stack

Ard Biesheuvel posted 5 patches 2 weeks ago
There is a newer version of this series
[PATCH 0/5] arm64: Move kernel mode FPSIMD buffer to the stack
Posted by Ard Biesheuvel 2 weeks ago
From: Ard Biesheuvel <ardb@kernel.org>

Move the buffer for preserving/restoring the kernel mode FPSIMD state on a
context switch out of struct thread_struct, and onto the stack, so that
the memory cost is not imposed needlessly on all tasks in the system.

Patches #1 - #3 contain some prep work so that patch #4 can tighten the
rules around permitted usage patterns of kernel_neon_begin() and
kernel_neon_end(). This permits #5 to provide a stack buffer to
kernel_neon_begin() transparently, in a manner that ensures that it will
remain available until after the associated call to kernel_neon_end()
returns.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>

Ard Biesheuvel (5):
  crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
  arm64/fpsimd: Require kernel NEON begin/end calls from the same scope
  arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack

 arch/arm64/crypto/aes-ce-ccm-glue.c |  5 +--
 arch/arm64/crypto/sm4-ce-ccm-glue.c | 10 ++----
 arch/arm64/crypto/sm4-ce-gcm-glue.c | 10 ++----
 arch/arm64/include/asm/neon.h       |  7 ++--
 arch/arm64/include/asm/processor.h  |  2 +-
 arch/arm64/kernel/fpsimd.c          | 34 +++++++++++++-------
 6 files changed, 34 insertions(+), 34 deletions(-)


base-commit: f83ec76bf285bea5727f478a68b894f5543ca76e
-- 
2.51.0.384.g4c02a37b29-goog
Re: [PATCH 0/5] arm64: Move kernel mode FPSIMD buffer to the stack
Posted by Eric Biggers 1 week, 5 days ago
On Thu, Sep 18, 2025 at 08:35:40AM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Move the buffer for preserving/restoring the kernel mode FPSIMD state on a
> context switch out of struct thread_struct, and onto the stack, so that
> the memory cost is not imposed needlessly on all tasks in the system.
> 
> Patches #1 - #3 contain some prep work so that patch #4 can tighten the
> rules around permitted usage patterns of kernel_neon_begin() and
> kernel_neon_end(). This permits #5 to provide a stack buffer to
> kernel_neon_begin() transparently, in a manner that ensures that it will
> remain available until after the associated call to kernel_neon_end()
> returns.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Brown <broonie@kernel.org>
> 
> Ard Biesheuvel (5):
>   crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
>   crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
>   crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
>   arm64/fpsimd: Require kernel NEON begin/end calls from the same scope
>   arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
> 
>  arch/arm64/crypto/aes-ce-ccm-glue.c |  5 +--
>  arch/arm64/crypto/sm4-ce-ccm-glue.c | 10 ++----
>  arch/arm64/crypto/sm4-ce-gcm-glue.c | 10 ++----
>  arch/arm64/include/asm/neon.h       |  7 ++--
>  arch/arm64/include/asm/processor.h  |  2 +-
>  arch/arm64/kernel/fpsimd.c          | 34 +++++++++++++-------
>  6 files changed, 34 insertions(+), 34 deletions(-)

This looks like the right decision: saving 528 bytes per task is
significant.  528 bytes is a lot to allocate on the stack too, but
functions that use the NEON registers are either leaf functions or very
close to being leaf functions, so it should be okay.
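
For reference, the 528-byte figure can be checked with a small user-space
sketch. It assumes the layout of arm64's struct user_fpsimd_state (32
128-bit V registers, then fpsr, fpcr and two reserved 32-bit words); the
struct and function names below are stand-ins for illustration, not
kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for arm64's struct user_fpsimd_state (assumed layout).
 * The real struct uses __uint128_t for vregs; a 16-byte aligned byte
 * array gives the same size and alignment in portable C11.
 */
struct fpsimd_state_sketch {
	_Alignas(16) uint8_t vregs[32][16];	/* V0..V31, 128 bits each: 512 bytes */
	uint32_t fpsr;				/* status register: 4 bytes */
	uint32_t fpcr;				/* control register: 4 bytes */
	uint32_t reserved[2];			/* padding: 8 bytes */
};

/* 512 + 4 + 4 + 8 = 528, already a multiple of the 16-byte alignment. */
static_assert(sizeof(struct fpsimd_state_sketch) == 528,
	      "unexpected FP/SIMD state size");

/* Hypothetical helper so the size can also be checked at run time. */
size_t fpsimd_state_sketch_size(void)
{
	return sizeof(struct fpsimd_state_sketch);
}
```

So the cost being moved is one such structure per task today, versus one
per active kernel mode NEON section with the stack approach.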

The implementation is a bit unusual, though:

   #define kernel_neon_begin()	do { __kernel_neon_begin(&(struct user_fpsimd_state){})
   #define kernel_neon_end()	__kernel_neon_end(); } while (0)

It works, but normally macros don't start or end code blocks behind the
scenes like this.  Perhaps it should be more like s390's
kernel_fpu_begin(), where the caller provides the buffer that the
registers are stored in?  

- Eric
Re: [PATCH 0/5] arm64: Move kernel mode FPSIMD buffer to the stack
Posted by Ard Biesheuvel 1 week, 5 days ago
On Fri, 19 Sept 2025 at 21:32, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Thu, Sep 18, 2025 at 08:35:40AM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Move the buffer for preserving/restoring the kernel mode FPSIMD state on a
> > context switch out of struct thread_struct, and onto the stack, so that
> > the memory cost is not imposed needlessly on all tasks in the system.
> >
> > Patches #1 - #3 contain some prep work so that patch #4 can tighten the
> > rules around permitted usage patterns of kernel_neon_begin() and
> > kernel_neon_end(). This permits #5 to provide a stack buffer to
> > kernel_neon_begin() transparently, in a manner that ensures that it will
> > remain available until after the associated call to kernel_neon_end()
> > returns.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Mark Brown <broonie@kernel.org>
> >
> > Ard Biesheuvel (5):
> >   crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
> >   crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
> >   crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
> >   arm64/fpsimd: Require kernel NEON begin/end calls from the same scope
> >   arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
> >
> >  arch/arm64/crypto/aes-ce-ccm-glue.c |  5 +--
> >  arch/arm64/crypto/sm4-ce-ccm-glue.c | 10 ++----
> >  arch/arm64/crypto/sm4-ce-gcm-glue.c | 10 ++----
> >  arch/arm64/include/asm/neon.h       |  7 ++--
> >  arch/arm64/include/asm/processor.h  |  2 +-
> >  arch/arm64/kernel/fpsimd.c          | 34 +++++++++++++-------
> >  6 files changed, 34 insertions(+), 34 deletions(-)
>
> This looks like the right decision: saving 528 bytes per task is
> significant.  528 bytes is a lot to allocate on the stack too, but
> functions that use the NEON registers are either leaf functions or very
> close to being leaf functions, so it should be okay.
>

Indeed.

> The implementation is a bit unusual, though:
>
>    #define kernel_neon_begin()  do { __kernel_neon_begin(&(struct user_fpsimd_state){})
>    #define kernel_neon_end()    __kernel_neon_end(); } while (0)
>
> It works, but normally macros don't start or end code blocks behind the
> scenes like this.

That is kind of the point, as it restricts the use of them to an idiom
that guarantees that the stack variable lives long enough.
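
To make the lifetime argument concrete, here is a user-space mock of the
idiom. All names (mock_neon_begin(), struct fp_state_mock and so on) are
hypothetical stand-ins, and the bodies only track a pointer; the real
__kernel_neon_begin()/__kernel_neon_end() obviously do much more. The
point is only the scoping: the compound literal has automatic storage
tied to the block opened by the begin macro, and that block cannot be
closed except by the end macro.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the 528-byte FP/SIMD save area (hypothetical type). */
struct fp_state_mock {
	unsigned char regs[528];
};

/* Buffer registered by the currently open section, or NULL. */
static struct fp_state_mock *active_buf;

static void mock_neon_begin(struct fp_state_mock *buf)
{
	assert(active_buf == NULL);	/* no nesting in this mock */
	active_buf = buf;
}

static void mock_neon_end(void)
{
	assert(active_buf != NULL);
	active_buf = NULL;
}

/*
 * The begin macro opens a block and creates the buffer inside it; the
 * end macro closes that block. Using one without the other in the same
 * scope leaves an unbalanced "do {" or "} while (0)" and fails to build.
 */
#define mock_kernel_neon_begin() \
	do { mock_neon_begin(&(struct fp_state_mock){ 0 })
#define mock_kernel_neon_end() \
	mock_neon_end(); } while (0)

/* Returns 1 if the buffer was registered inside the section and
 * unregistered again once the section was closed. */
int run_neon_section(void)
{
	int buf_was_live = 0;

	mock_kernel_neon_begin();
	buf_was_live = (active_buf != NULL);	/* "NEON code" would go here */
	mock_kernel_neon_end();

	return buf_was_live && active_buf == NULL;
}
```

A side effect worth noting is that anything declared between the two
macros is local to the hidden block, so results have to be stored in
variables declared before the begin call, as buf_was_live is here.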

> Perhaps it should be more like s390's
> kernel_fpu_begin(), where the caller provides the buffer that the
> registers are stored in?
>

If we're happy to change the API on both arm64 and ARM, then we could
make it more explicit. It's a lot more work, though.
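
For comparison, a rough sketch of what the more explicit variant could
look like from the caller's side. It is only loosely modelled on the
caller-provided-buffer approach mentioned above; the names and
signatures (sketch_neon_begin(), struct neon_buf_sketch) are invented
for illustration and are not a proposal for the actual arm64 or ARM API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical caller-owned save area; in_use exists only for the demo. */
struct neon_buf_sketch {
	unsigned char regs[528];
	int in_use;
};

/* Begin takes the buffer explicitly instead of creating it behind a macro. */
void sketch_neon_begin(struct neon_buf_sketch *buf)
{
	memset(buf->regs, 0, sizeof(buf->regs));
	buf->in_use = 1;
}

/* End receives the same buffer, so the pairing is visible in the source. */
void sketch_neon_end(struct neon_buf_sketch *buf)
{
	assert(buf->in_use);
	buf->in_use = 0;
}

/* Returns 1 if the buffer was marked live between begin and end. */
int explicit_buffer_demo(void)
{
	struct neon_buf_sketch buf;	/* lifetime is the caller's responsibility */
	int was_live;

	sketch_neon_begin(&buf);
	was_live = buf.in_use;		/* "NEON code" would go here */
	sketch_neon_end(&buf);

	return was_live == 1 && buf.in_use == 0;
}
```

The trade-off is the one discussed above: this form is ordinary C with
no hidden blocks, but nothing stops a caller from passing a buffer that
goes out of scope too early, and every existing call site would need to
be converted.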