This patchset adds a VAES and AVX512 / AVX10 implementation of AES-GCM
(Galois/Counter Mode), which improves AES-GCM performance by up to 162%.
In addition, it replaces the old AES-NI GCM code from Intel with new
code that is slightly faster and fixes a number of issues, including
the massive binary size of over 250 KB.  See the patches for details.

The end state of the x86_64 AES-GCM assembly code is two assembly
files: one that generates AES-NI code with or without AVX, and one that
generates VAES code with AVX512 / AVX10 using 256-bit or 512-bit
vectors.  There is no support for VAES alone (without AVX512 / AVX10).
This differs slightly from what I did with AES-XTS, where one file
generates both AVX and AVX512 / AVX10 code, including code using VAES
alone (without AVX512 / AVX10), and another file generates non-AVX code
only.  For now this seems like the right choice for each algorithm:
being limited to 16 SIMD registers and 128-bit vectors forced
significantly different design choices for AES-GCM, but not quite as
much for AES-XTS.  CPUs shipping with VAES alone also seem to be a
temporary thing, so we perhaps shouldn't go too far out of our way to
support that combination.

Changed in v5:
- Fixed sparse warnings in gcm_setkey()
- Fixed some comments in aes-gcm-aesni-x86_64.S

Changed in v4:
- Added AES-NI rewrite patch.
- Adjusted the VAES-AVX10 patch slightly to make it possible to cleanly
  add the AES-NI support on top of it.

Changed in v3:
- Optimized the finalization code slightly.
- Fixed a minor issue in my userspace benchmark program (a guard page
  after the key struct made "AVX512_Cloudflare" extra slow on some
  input lengths) and regenerated tables 3-4.  Also upgraded to Emerald
  Rapids.
- Eliminated an instruction from _aes_gcm_precompute.

Changed in v2:
- Additional assembly optimizations
- Improved some comments
- Aligned key struct to 64 bytes
- Added comparison with Cloudflare's implementation of AES-GCM
- Other cleanups

Eric Biggers (2):
  crypto: x86/aes-gcm - add VAES and AVX512 / AVX10 optimized AES-GCM
  crypto: x86/aes-gcm - rewrite the AES-NI optimized AES-GCM

 arch/x86/crypto/Kconfig                  |    1 +
 arch/x86/crypto/Makefile                 |    8 +-
 arch/x86/crypto/aes-gcm-aesni-x86_64.S   | 1128 +++++++++
 arch/x86/crypto/aes-gcm-avx10-x86_64.S   | 1222 ++++++++++
 arch/x86/crypto/aesni-intel_asm.S        | 1503 +-----------
 arch/x86/crypto/aesni-intel_avx-x86_64.S | 2804 ----------------------
 arch/x86/crypto/aesni-intel_glue.c       | 1269 ++++++----
 7 files changed, 3125 insertions(+), 4810 deletions(-)
 create mode 100644 arch/x86/crypto/aes-gcm-aesni-x86_64.S
 create mode 100644 arch/x86/crypto/aes-gcm-avx10-x86_64.S
 delete mode 100644 arch/x86/crypto/aesni-intel_avx-x86_64.S

base-commit: aabbf2135f9a9526991f17cb0c78cf1ec878f1c2
-- 
2.45.1
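For context on how these alternative implementations coexist: glue code
of this kind typically registers one implementation at module load time
based on CPU feature flags.  Below is a minimal, hypothetical sketch of
that dispatch pattern in kernel C.  The function names, printed
messages, and exact set of feature checks are illustrative assumptions,
not the actual code from aesni-intel_glue.c.

	/*
	 * Hypothetical sketch of CPU-feature-based dispatch between the
	 * AES-GCM implementations described above.  Not the actual code
	 * from the patchset.
	 */
	#include <asm/cpufeature.h>
	#include <linux/errno.h>
	#include <linux/kernel.h>
	#include <linux/module.h>

	static int __init aes_gcm_demo_init(void)
	{
		/*
		 * A real driver would also confirm OS support for the
		 * AVX-512 register state (e.g. via cpu_has_xfeatures());
		 * that check is omitted here for brevity.
		 */
		if (boot_cpu_has(X86_FEATURE_VAES) &&
		    boot_cpu_has(X86_FEATURE_VPCLMULQDQ) &&
		    boot_cpu_has(X86_FEATURE_AVX512BW) &&
		    boot_cpu_has(X86_FEATURE_AVX512VL)) {
			/* VAES + AVX512/AVX10: 256-bit or 512-bit vectors */
			pr_info("aes-gcm-demo: using vaes-avx10 code\n");
		} else if (boot_cpu_has(X86_FEATURE_AES) &&
			   boot_cpu_has(X86_FEATURE_AVX)) {
			/* AES-NI + AVX: 128-bit vectors, VEX-coded insns */
			pr_info("aes-gcm-demo: using aesni-avx code\n");
		} else if (boot_cpu_has(X86_FEATURE_AES)) {
			/* Plain AES-NI: 128-bit vectors, legacy encoding */
			pr_info("aes-gcm-demo: using aesni code\n");
		} else {
			return -ENODEV;
		}
		return 0;
	}
	module_init(aes_gcm_demo_init);
	MODULE_LICENSE("GPL");

Note how the AVX512BW / AVX512VL checks are what exclude the "VAES
alone" combination discussed above: a CPU with VAES but without those
features simply falls through to the AES-NI paths.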
Eric Biggers <ebiggers@kernel.org> wrote:
> This patchset adds a VAES and AVX512 / AVX10 implementation of AES-GCM
> (Galois/Counter Mode), which improves AES-GCM performance by up to 162%.
> In addition, it replaces the old AES-NI GCM code from Intel with new
> code that is slightly faster and fixes a number of issues, including
> the massive binary size of over 250 KB.  See the patches for details.
>
> [...]

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Mon, 3 Jun 2024 at 00:24, Eric Biggers <ebiggers@kernel.org> wrote:
>
> This patchset adds a VAES and AVX512 / AVX10 implementation of AES-GCM
> (Galois/Counter Mode), which improves AES-GCM performance by up to 162%.
> In addition, it replaces the old AES-NI GCM code from Intel with new
> code that is slightly faster and fixes a number of issues, including
> the massive binary size of over 250 KB.  See the patches for details.
>
> [...]
>
> Changed in v5:
> - Fixed sparse warnings in gcm_setkey()
> - Fixed some comments in aes-gcm-aesni-x86_64.S
>

This version

Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>