From: Ard Biesheuvel <ardb@kernel.org>

Move the buffer for preserving/restoring the kernel mode FPSIMD state on a
context switch out of struct thread_struct, and onto the stack, so that
the memory cost is not imposed needlessly on all tasks in the system.

Patches #1 - #3 contains some prepwork so that patch #4 can tighten the
rules around permitted usage patterns of kernel_neon_begin() and
kernel_neon_end(). This permits #5 to provide a stack buffer to
kernel_neon_begin() transparently, in a manner that ensures that it will
remain available until after the associated call to kernel_neon_end()
returns.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>

Ard Biesheuvel (5):
  crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
  arm64/fpsimd: Require kernel NEON begin/end calls from the same scope
  arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack

 arch/arm64/crypto/aes-ce-ccm-glue.c |  5 +--
 arch/arm64/crypto/sm4-ce-ccm-glue.c | 10 ++----
 arch/arm64/crypto/sm4-ce-gcm-glue.c | 10 ++----
 arch/arm64/include/asm/neon.h       |  7 ++--
 arch/arm64/include/asm/processor.h  |  2 +-
 arch/arm64/kernel/fpsimd.c          | 34 +++++++++++++-------
 6 files changed, 34 insertions(+), 34 deletions(-)

base-commit: f83ec76bf285bea5727f478a68b894f5543ca76e
--
2.51.0.384.g4c02a37b29-goog

On Thu, Sep 18, 2025 at 08:35:40AM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Move the buffer for preserving/restoring the kernel mode FPSIMD state on a
> context switch out of struct thread_struct, and onto the stack, so that
> the memory cost is not imposed needlessly on all tasks in the system.
>
> Patches #1 - #3 contains some prepwork so that patch #4 can tighten the
> rules around permitted usage patterns of kernel_neon_begin() and
> kernel_neon_end(). This permits #5 to provide a stack buffer to
> kernel_neon_begin() transparently, in a manner that ensures that it will
> remain available until after the associated call to kernel_neon_end()
> returns.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Brown <broonie@kernel.org>
>
> Ard Biesheuvel (5):
>   crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
>   crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
>   crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
>   arm64/fpsimd: Require kernel NEON begin/end calls from the same scope
>   arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
>
>  arch/arm64/crypto/aes-ce-ccm-glue.c |  5 +--
>  arch/arm64/crypto/sm4-ce-ccm-glue.c | 10 ++----
>  arch/arm64/crypto/sm4-ce-gcm-glue.c | 10 ++----
>  arch/arm64/include/asm/neon.h       |  7 ++--
>  arch/arm64/include/asm/processor.h  |  2 +-
>  arch/arm64/kernel/fpsimd.c          | 34 +++++++++++++-------
>  6 files changed, 34 insertions(+), 34 deletions(-)

This looks like the right decision: saving 528 bytes per task is
significant. 528 bytes is a lot to allocate on the stack too, but
functions that use the NEON registers are either leaf functions or very
close to being leaf functions, so it should be okay.

The implementation is a bit unusual, though:

    #define kernel_neon_begin()	do { __kernel_neon_begin(&(struct user_fpsimd_state){})
    #define kernel_neon_end()	__kernel_neon_end(); } while (0)

It works, but normally macros don't start or end code blocks behind the
scenes like this. Perhaps it should be more like s390's
kernel_fpu_begin(), where the caller provides the buffer that the
registers are stored in?

- Eric

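To make the idiom under discussion concrete, here is a minimal user-space
sketch of a begin/end macro pair that opens and closes a block behind the
scenes. It is not the kernel code: every mock_* name is hypothetical, the
struct only approximates the layout of struct user_fpsimd_state (32 128-bit
vector registers plus fpsr, fpcr and padding, which is where the 528 bytes
come from), and `{ 0 }` stands in for the `{}` initializer in the patch. The
point it illustrates is that a compound literal created inside the
`do { ... } while (0)` block has automatic storage tied to that block, so the
buffer stays valid until the closing brace supplied by the end macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock approximating the layout of struct user_fpsimd_state. */
struct mock_fpsimd_state {
	unsigned char vregs[32][16];	/* 32 vector registers, 16 bytes each */
	uint32_t fpsr;
	uint32_t fpcr;
	uint32_t reserved[2];
};

/* Hypothetical stand-ins for __kernel_neon_begin()/__kernel_neon_end(). */
static struct mock_fpsimd_state *active_buf;

static void mock_neon_begin(struct mock_fpsimd_state *buf)
{
	assert(active_buf == NULL);	/* this sketch does not allow nesting */
	active_buf = buf;
}

static void mock_neon_end(void)
{
	assert(active_buf != NULL);
	active_buf = NULL;
}

/*
 * The begin macro opens a block and creates the buffer as a compound
 * literal inside it; the end macro closes that block. Using one without
 * the other in the same scope fails to compile.
 */
#define mock_kernel_neon_begin() \
	do { mock_neon_begin(&(struct mock_fpsimd_state){ 0 })
#define mock_kernel_neon_end() \
	mock_neon_end(); } while (0)

/* Returns the size of the buffer that was live inside the section. */
size_t use_simd_once(void)
{
	size_t live = 0;

	mock_kernel_neon_begin();
	/* The compound literal is still alive here, inside the hidden block. */
	active_buf->fpsr = 1;
	live = sizeof(*active_buf);
	mock_kernel_neon_end();

	return live;
}
```

One consequence of this shape is that variables declared between the two
macros are scoped to the hidden block, which is why `live` is declared before
the begin macro in the sketch.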
On Fri, 19 Sept 2025 at 21:32, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Thu, Sep 18, 2025 at 08:35:40AM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Move the buffer for preserving/restoring the kernel mode FPSIMD state on a
> > context switch out of struct thread_struct, and onto the stack, so that
> > the memory cost is not imposed needlessly on all tasks in the system.
> >
> > Patches #1 - #3 contains some prepwork so that patch #4 can tighten the
> > rules around permitted usage patterns of kernel_neon_begin() and
> > kernel_neon_end(). This permits #5 to provide a stack buffer to
> > kernel_neon_begin() transparently, in a manner that ensures that it will
> > remain available until after the associated call to kernel_neon_end()
> > returns.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Mark Brown <broonie@kernel.org>
> >
> > Ard Biesheuvel (5):
> >   crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
> >   crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
> >   crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
> >   arm64/fpsimd: Require kernel NEON begin/end calls from the same scope
> >   arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
> >
> >  arch/arm64/crypto/aes-ce-ccm-glue.c |  5 +--
> >  arch/arm64/crypto/sm4-ce-ccm-glue.c | 10 ++----
> >  arch/arm64/crypto/sm4-ce-gcm-glue.c | 10 ++----
> >  arch/arm64/include/asm/neon.h       |  7 ++--
> >  arch/arm64/include/asm/processor.h  |  2 +-
> >  arch/arm64/kernel/fpsimd.c          | 34 +++++++++++++-------
> >  6 files changed, 34 insertions(+), 34 deletions(-)
>
> This looks like the right decision: saving 528 bytes per task is
> significant. 528 bytes is a lot to allocate on the stack too, but
> functions that use the NEON registers are either leaf functions or very
> close to being leaf functions, so it should be okay.
>

Indeed.

> The implementation is a bit unusual, though:
>
>     #define kernel_neon_begin()	do { __kernel_neon_begin(&(struct user_fpsimd_state){})
>     #define kernel_neon_end()	__kernel_neon_end(); } while (0)
>
> It works, but normally macros don't start or end code blocks behind the
> scenes like this.

That is kind of the point, as it restricts the use of them to an idiom
that guarantees that the stack variable lives long enough.

> Perhaps it should be more like s390's
> kernel_fpu_begin(), where the caller provides the buffer that the
> registers are stored in?
>

If we're happy to change the API on both arm64 and ARM, then we could
make it more explicit. It's a lot more work, though.
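
For comparison, the more explicit alternative raised above, where the caller
declares the save buffer and passes it to both calls, might look roughly like
the following user-space sketch. All names here (mock_fp_state,
mock_fpu_begin(), mock_fpu_end(), sum_bytes_with_simd()) are hypothetical and
are not the s390, arm64 or ARM signatures; the sketch only shows the shape of
such an API and why the buffer's lifetime and stack cost become visible at
each call site.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical save area; 528 bytes matches the figure quoted in the thread. */
struct mock_fp_state {
	unsigned char regs[528];
};

/* Tracks which caller-provided buffer currently holds the saved state. */
static struct mock_fp_state *saved_in;

/* Hypothetical begin call: the caller supplies the buffer explicitly. */
static void mock_fpu_begin(struct mock_fp_state *state)
{
	assert(saved_in == NULL);
	saved_in = state;
}

/* Hypothetical end call: must be given the same buffer as the begin call. */
static void mock_fpu_end(struct mock_fp_state *state)
{
	assert(saved_in == state);
	saved_in = NULL;
}

/* Example caller: the buffer is an ordinary local, so its scope is explicit. */
unsigned int sum_bytes_with_simd(const unsigned char *data, size_t len)
{
	struct mock_fp_state state;	/* stack cost is visible at the call site */
	unsigned int sum = 0;
	size_t i;

	mock_fpu_begin(&state);
	for (i = 0; i < len; i++)	/* stands in for the real SIMD work */
		sum += data[i];
	mock_fpu_end(&state);

	return sum;
}
```

The trade-off discussed in the thread shows up directly: every caller has to
be converted to declare and pass the buffer, whereas the macro pair keeps
existing call sites unchanged.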