[PATCH v9 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm

Posted by Chang S. Bae 1 year, 10 months ago
Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion, as well as all subsequent data transformations, is
provided by new AES instructions ('AES-KL'). AES-KL is analogous to
AES-NI in that it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The support
has some details worth mentioning, which differentiate it from AES-NI
and which users may need to be aware of:

== Key Handle Restriction ==

The AES-KL instruction set supports selecting key usage restrictions at
key handle creation time. Restrict all key handles created by the kernel
to kernel mode use only.

Although the AES-KL instructions themselves are executable in
userspace, this restriction enforces mode consistency in their
operation.

If the key handle is created in userspace but referenced in the kernel,
then encrypt() and decrypt() functions will return -EINVAL.
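
For illustration, a minimal C sketch (the wrapper name is hypothetical;
__aeskl_enc() comes from this patch) of how a rejected handle surfaces
to a caller:

  /*
   * Illustrative sketch only: the AES-KL instructions set ZF when a
   * handle is rejected -- e.g. a handle that was not created for
   * kernel-mode use -- and the assembly maps that to -EINVAL.
   */
  static int example_encrypt_one_block(const struct aeskl_ctx *ctx,
                                       u8 *out, const u8 *in)
  {
          int err;

          kernel_fpu_begin();
          err = __aeskl_enc(ctx, out, in); /* 0 on success, -EINVAL on a bad handle */
          kernel_fpu_end();

          return err;
  }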

== AES-NI Dependency for AES Compliance ==

Key Locker is not AES compliant as it lacks 192-bit key support.
However, per the expectations of Linux crypto-cipher implementations,
the software cipher implementation must support all the AES-compliant
key sizes.

The AES-KL cipher implementation achieves this constraint by logging a
warning and falling back to AES-NI. In other words, the 192-bit
key-size limitation for what can be converted into a key handle is
only documented, not enforced.

This creates a rather strong dependency on AES-NI. If this driver were
built as a module, the exported AES-NI functions could not be inlined;
more importantly, the resulting indirect calls would impact performance.

To simplify, disallow a module build for AES-KL and always select
AES-NI. This restriction can be relaxed later if strong use cases arise.
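
A condensed sketch of the resulting dispatch, using the helper names
from the glue code added by this patch:

  /* 192-bit keys are handed to AES-NI; everything else stays on AES-KL. */
  static int example_xts_encrypt(struct skcipher_request *req)
  {
          if (likely(xts_keylen(req) != AES_KEYSIZE_192))
                  return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);

          return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
  }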

== Wrapping Key Restore Failure Handling ==

In the event of a hardware failure, the wrapping key is lost across
deep sleep states. The wrapping key then turns to zero, which is an
unusable state.

The x86 core provides valid_keylocker() to indicate this failure.
Subsequent setkey() as well as encrypt()/decrypt() calls can check it
and return -ENODEV if the wrapping key is no longer valid. This way, an
error code is returned instead of triggering abrupt exceptions.
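
The check pattern looks like the following sketch, which mirrors
aeskl_enc() in the glue code below:

  static inline int example_aeskl_enc(const void *ctx, u8 *out, const u8 *in)
  {
          /* The wrapping key was lost, e.g. across a deep sleep state. */
          if (!valid_keylocker())
                  return -ENODEV;

          return __aeskl_enc(ctx, out, in);
  }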

== Userspace Exposition ==

The Key Locker implementations so far have measurable performance
penalties, so keep AES-NI as the default.

However, with a slow storage device, storage bandwidth is the
bottleneck even when disk encryption is enabled via AES-KL. Thus,
selecting AES-KL is an end-user consideration. Users may pick it by the
name 'xts-aes-aeskl' shown in /proc/crypto.
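
As an aside, an in-kernel user can also request this implementation
explicitly by its driver name (illustrative sketch only; not part of
this patch):

  static int example_request_aeskl(void)
  {
          struct crypto_skcipher *tfm;

          /* Resolve by driver name rather than the generic "xts(aes)". */
          tfm = crypto_alloc_skcipher("xts-aes-aeskl", 0, 0);
          if (IS_ERR(tfm))
                  return PTR_ERR(tfm);

          crypto_free_skcipher(tfm);
          return 0;
  }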

== 64-bit Only ==

Support 64-bit only, as the 32-bit kernel is being deprecated.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
Changes from v8:
* Rebase on the upstream changes.
* Combine the XTS enc/dec assembly code in a macro. (Eric Biggers)
* Define setkey() as void instead of returning 'int'. (Eric Biggers)
* Rearrange the assembly code to reduce jumps especially for success
  cases. (Eric Biggers)
* Update the changelog for clarification. (Eric Biggers)
* Exclude module build.

Changes from v7:
* Update the changelog -- remove 'API Limitation'. (Eric Biggers)
* Update the comment for valid_keylocker(). (Eric Biggers)
* Improve the code:
  - Remove the key-length check and simplify the code. (Eric Biggers)
  - Remove aeskl_dec() and __aeskl_dec() as not needed.
  - Simplify the register-function return handling. (Eric Biggers)
  - Rename setkey functions for coherent naming:
    aeskl_setkey() -> __aeskl_setkey(),
    aeskl_setkey_common() -> aeskl_setkey(),
    aeskl_xts_setkey() -> xts_setkey()
  - Revert an unnecessary comment.

Changes from v6:
* Merge all the AES-KL patches. (Eric Biggers)
* Make the driver for the 64-bit mode only. (Eric Biggers)
* Rework the key-size check code:
  - Trim unnecessary checks. (Eric Biggers)
  - Document the reason
  - Make sure both XTS keys with the same size
* Adjust the Kconfig change:
  - Move the location. (Robert Elliott)
  - Trim the description to follow others such as AES-NI.
* Update the changelog:
  - Explain the priority value for the common name under 'User
    Exposition' (renamed from 'Performance'). (Eric Biggers)
  - Trim the introduction
  - Switch to more imperative mood for those explaining the code
    change
  - Add a new section '64-bit Only'
* Adjust the ASM code to return a proper error code. (Eric Biggers)
* Update assembly code macros:
  - Remove unused one.
  - Document the reason for the duplicated ones.

Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
  f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlstra)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
---
 arch/x86/Kconfig.assembler         |   5 +
 arch/x86/crypto/Kconfig            |  17 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aes-helper_glue.h  |   7 +-
 arch/x86/crypto/aeskl-intel_asm.S  | 412 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 187 +++++++++++++
 arch/x86/crypto/aeskl-intel_glue.h |  35 +++
 arch/x86/crypto/aesni-intel_glue.c |  30 +--
 arch/x86/crypto/aesni-intel_glue.h |  40 +++
 9 files changed, 704 insertions(+), 32 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 8ad41da301e5..0e58f2b61dd3 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -25,6 +25,11 @@ config AS_GFNI
 	help
 	  Supported by binutils >= 2.30 and LLVM integrated assembler
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config AS_WRUSS
 	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
 	help
diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index c9e59589a1ce..067bb149998b 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -29,6 +29,23 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	bool "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_HAS_KEYLOCKER
+	select CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..ae2aa7abd151 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
index 52ba1fe5cf71..262c1cec0011 100644
--- a/arch/x86/crypto/aes-helper_glue.h
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -19,16 +19,17 @@
 #include <crypto/internal/aead.h>
 #include <crypto/internal/simd.h>
 
+#include "aeskl-intel_glue.h"
+
 #define AES_ALIGN		16
 #define AES_ALIGN_ATTR		__attribute__((__aligned__(AES_ALIGN)))
 #define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
 #define XTS_AES_CTX_SIZE	(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
 
-/*
- * Preserve data types for various AES implementations available in x86
- */
+/* Data types for the two AES implementations available in x86 */
 union x86_aes_ctx {
 	struct crypto_aes_ctx aesni;
+	struct aeskl_ctx aeskl;
 };
 
 struct aes_xts_ctx {
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..81af7f61aab5
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,412 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-helper_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define INC	%xmm13
+
+#define IN	%xmm8
+
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * void __aeskl_setkey(struct crypto_aes_ctx *handlep, const u8 *ukeyp,
+ *		       unsigned int key_len)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+	FRAME_BEGIN
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
+	mov $1, %eax
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_setkey)
+
+/*
+ * int __aeskl_enc(const void *handlep, u8 *outp, const u8 *inp)
+ */
+SYM_FUNC_START(__aeskl_enc)
+	FRAME_BEGIN
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	xor %rax, %rax
+	jmp .Lenc_end
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+	xor %rax, %rax
+	jmp .Lenc_end
+
+.Lenc_err:
+	mov $(-EINVAL), %rax
+.Lenc_end:
+	movdqu STATE, (OUTP)
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: 	internal ABI
+ *	Multiply in GF(2^128) for XTS IVs
+ * input:
+ *	IV:	current IV
+ *	GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ *	IV:	next IV
+ * changed:
+ *	CTR:	== temporary value
+ *
+ * While based on the AES-NI code, this macro is separated here due to
+ * the register constraint. E.g., aesencwide256kl has implicit
+ * operands: XMM0-7.
+ */
+#define _aeskl_gf128mul_x_ble() \
+	pshufd $0x13, IV, KEY; \
+	paddq IV, IV; \
+	psrad $31, KEY; \
+	pand GF128MUL_MASK, KEY; \
+	pxor KEY, IV;
+
+.macro XTS_ENC_DEC operation
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+.ifc \operation, dec
+	test $15, LEN
+	jz .Lxts_op8_\@
+	sub $16, LEN
+.endif
+
+.Lxts_op8_\@:
+	sub $128, LEN
+	jl .Lxts_op1_pre_\@
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_op8_128_\@
+.ifc \operation, dec
+	aesdecwide256kl (%rdi)
+.else
+	aesencwide256kl (%rdi)
+.endif
+	jz .Lxts_op_err_\@
+	jmp .Lxts_op8_end_\@
+.Lxts_op8_128_\@:
+.ifc \operation, dec
+	aesdecwide128kl (%rdi)
+.else
+	aesencwide128kl (%rdi)
+.endif
+	jz .Lxts_op_err_\@
+
+.Lxts_op8_end_\@:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_op8_\@
+
+.Lxts_op_ret_\@:
+	movups IV, (IVP)
+	xor %rax, %rax
+	FRAME_END
+	RET
+
+.Lxts_op1_pre_\@:
+	add $128, LEN
+	jz .Lxts_op_ret_\@
+.ifc \operation, enc
+	sub $16, LEN
+	jl .Lxts_op_cts4_\@
+.endif
+
+.Lxts_op1_\@:
+	movdqu (INP), STATE1
+
+.ifc \operation, dec
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_op_cts1_\@
+.endif
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_op1_128_\@
+.ifc \operation, dec
+	aesdec256kl (HANDLEP), STATE1
+.else
+	aesenc256kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+	jmp .Lxts_op1_end_\@
+.Lxts_op1_128_\@:
+.ifc \operation, dec
+	aesdec128kl (HANDLEP), STATE1
+.else
+	aesenc128kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+
+.Lxts_op1_end_\@:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_op1_out_\@
+
+.ifc \operation, enc
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_op_cts1_\@
+.endif
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_op1_\@
+
+.Lxts_op1_out_\@:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_op_ret_\@
+
+.Lxts_op_cts4_\@:
+.ifc \operation, enc
+	movdqu STATE8, STATE1
+	sub $16, OUTP
+.endif
+
+.Lxts_op_cts1_\@:
+.ifc \operation, dec
+	movdqa IV, STATE5
+	_aeskl_gf128mul_x_ble()
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_pre_128_\@
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_op_err_\@
+	jmp .Lxts_dec1_cts_pre_end_\@
+.Lxts_dec1_cts_pre_128_\@:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_op_err_\@
+.Lxts_dec1_cts_pre_end_\@:
+	pxor IV, STATE1
+.endif
+
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN
+	pblendvb STATE3, IN
+	movaps IN, STATE1
+
+.ifc \operation, dec
+	pxor STATE5, STATE1
+.else
+	pxor IV, STATE1
+.endif
+
+	cmp $16, KLEN
+	je .Lxts_op1_cts_128_\@
+.ifc \operation, dec
+	aesdec256kl (HANDLEP), STATE1
+.else
+	aesenc256kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+	jmp .Lxts_op1_cts_end_\@
+.Lxts_op1_cts_128_\@:
+.ifc \operation, dec
+	aesdec128kl (HANDLEP), STATE1
+.else
+	aesenc128kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+
+.Lxts_op1_cts_end_\@:
+.ifc \operation, dec
+	pxor STATE5, STATE1
+.else
+	pxor IV, STATE1
+.endif
+	movups STATE1, (OUTP)
+	xor %rax, %rax
+	FRAME_END
+	RET
+
+.Lxts_op_err_\@:
+	mov $(-EINVAL), %rax
+	FRAME_END
+	RET
+.endm
+
+/*
+ * int __aeskl_xts_encrypt(const struct aeskl_ctx *handlep, u8 *outp,
+ *			   const u8 *inp, unsigned int klen, le128 *ivp)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	XTS_ENC_DEC enc
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct aeskl_ctx *handlep, u8 *outp,
+ *			   const u8 *inp, unsigned int klen, le128 *ivp)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	XTS_ENC_DEC dec
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..7672c4836da8
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage void __aeskl_setkey(struct aeskl_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+
+/*
+ * If a hardware failure occurs, the wrapping key may be lost during
+ * sleep states. The state of the feature can be retrieved via
+ * valid_keylocker().
+ *
+ * Since disabling can occur preemptively, check for availability on
+ * every use along with kernel_fpu_begin().
+ */
+
+static int aeskl_setkey(union x86_aes_ctx *ctx, const u8 *in_key, unsigned int keylen)
+{
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	err = aes_check_keylen(keylen);
+	if (err)
+		return err;
+
+	if (unlikely(keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		kernel_fpu_begin();
+		aesni_set_key(&ctx->aesni, in_key, keylen);
+		kernel_fpu_end();
+		return 0;
+	}
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	kernel_fpu_begin();
+	__aeskl_setkey(&ctx->aeskl, in_key, keylen);
+	kernel_fpu_end();
+	return 0;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(ctx, out, in);
+}
+
+static inline int aeskl_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(&ctx->aeskl, out, in, len, iv);
+}
+
+static inline int aeskl_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(&ctx->aeskl, out, in, len, iv);
+}
+
+static int xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+		      unsigned int keylen)
+{
+	return xts_setkey_common(tfm, key, keylen, aeskl_setkey);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
+
+	return ctx->crypt_ctx.aeskl.key_length;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= xts_setkey,
+		.encrypt	= xts_encrypt,
+		.decrypt	= xts_decrypt,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not rely on AES-NI. But, AES-KL does not
+	 * support 192-bit keys. To ensure AES compliance, AES-KL falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					      aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aeskl-intel_glue.h b/arch/x86/crypto/aeskl-intel_glue.h
new file mode 100644
index 000000000000..57cfd6c55a4f
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _AESKL_INTEL_GLUE_H
+#define _AESKL_INTEL_GLUE_H
+
+#include <crypto/aes.h>
+#include <linux/types.h>
+
+#define AESKL_AAD_SIZE		16
+#define AESKL_TAG_SIZE		16
+#define AESKL_CIPHERTEXT_MAX	AES_KEYSIZE_256
+
+/* The Key Locker handle is an encoded form of the AES key. */
+struct aeskl_handle {
+	u8 additional_authdata[AESKL_AAD_SIZE];
+	u8 integrity_tag[AESKL_TAG_SIZE];
+	u8 cipher_text[AESKL_CIPHERTEXT_MAX];
+};
+
+/*
+ * Key Locker does not support 192-bit key size. The driver needs to
+ * retrieve the key size in the first place. The offset of the
+ * 'key_length' field here should be compatible with struct
+ * crypto_aes_ctx.
+ */
+#define AESKL_CTX_RESERVED (sizeof(struct crypto_aes_ctx) - sizeof(struct aeskl_handle) \
+			    - sizeof(u32))
+
+struct aeskl_ctx {
+	struct aeskl_handle handle;
+	u8 reserved[AESKL_CTX_RESERVED];
+	u32 key_length;
+};
+
+#endif /* _AESKL_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 4ac7b9a28967..d9c4aa055383 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -37,6 +37,7 @@
 #include <linux/static_call.h>
 
 #include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
 
 #define RFC4106_HASH_SUBKEY_SIZE 16
 #define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -72,9 +73,6 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			      unsigned int key_len);
-asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
@@ -89,21 +87,9 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
-static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
-{
-	__aesni_enc(ctx, out, in);
-	return 0;
-}
-
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				    const u8 *in, unsigned int len, u8 *iv);
-
-asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				    const u8 *in, unsigned int len, u8 *iv);
-
 #ifdef CONFIG_X86_64
 
 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -271,20 +257,6 @@ static inline int aesni_xts_setkey(union x86_aes_ctx *ctx,
 	return aes_set_key_common(&ctx->aesni, in_key, key_len);
 }
 
-static inline int aesni_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
-				    unsigned int len, u8 *iv)
-{
-	__aesni_xts_encrypt(&ctx->aesni, out, in, len, iv);
-	return 0;
-}
-
-static inline int aesni_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
-				    unsigned int len, u8 *iv)
-{
-	__aesni_xts_decrypt(&ctx->aesni, out, in, len, iv);
-	return 0;
-}
-
 static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			         unsigned int len)
 {
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..999f81f5bcde
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * These are AES-NI functions that are used by the AES-KL code as a
+ * fallback when it is given a 192-bit key. Key Locker does not support
+ * 192-bit keys.
+ */
+
+#ifndef _AESNI_INTEL_GLUE_H
+#define _AESNI_INTEL_GLUE_H
+
+asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+
+static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_enc(ctx, out, in);
+	return 0;
+}
+
+static inline int aesni_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_encrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
+static inline int aesni_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_decrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
+#endif /* _AESNI_INTEL_GLUE_H */
-- 
2.34.1
[PATCH v9a 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm
Posted by Chang S. Bae 1 year, 8 months ago
Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion, along with all subsequent data transformations, is
provided by new AES instructions ('AES-KL'). AES-KL is analogous to
AES-NI in that it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The support
has some details worth mentioning, which differentiate it from AES-NI
and which users may need to be aware of:

== Key Handle Restriction ==

The AES-KL instruction set supports selecting key usage restrictions at
key handle creation time. Restrict all key handles created by the kernel
to kernel mode use only.

Although the AES-KL instructions themselves are executable in userspace,
this restriction enforces mode consistency in their operation.

If the key handle is created in userspace but referenced in the kernel,
then encrypt() and decrypt() functions will return -EINVAL.

== AES-NI Dependency for AES Compliance ==

Key Locker is not AES compliant as it lacks 192-bit key support. However,
per the expectations of Linux crypto-cipher implementations, the software
cipher implementation must support all the AES-compliant key sizes.

The AES-KL cipher implementation achieves this constraint by logging a
warning and falling back to VAES. In other words, the 192-bit key-size
limitation is documented but not enforced.

== Wrapping Key Restore Failure Handling ==

In the event of a hardware failure, the wrapping key is lost across
deep sleep states. The wrapping key then turns to zero, which is an
unusable state.

The x86 core provides valid_keylocker() to indicate this failure.
Subsequent setkey() as well as encrypt()/decrypt() calls can check it
and return -ENODEV if the wrapping key is no longer valid. This allows
an error code to be returned instead of encountering abrupt exceptions.

== Userspace Exposition ==

Key Locker implementations have measurable performance penalties.
Therefore, the current default remains unchanged.

However, with a slow storage device, storage bandwidth is the
bottleneck even when disk encryption is enabled via AES-KL. Thus, it is
up to the end user to decide whether to use AES-KL. Users can select it
by the name 'xts-aes-aeskl' shown in /proc/crypto.

== 64-bit Only ==

Support 64-bit only, as the 32-bit kernel is being deprecated.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
I've reworked this patch based on feedback,
    https://lore.kernel.org/lkml/20240408014806.GA965@quark.localdomain/
and rebased to upstream v6.10 Linus merge tree on May 13th: commit
84c7d76b5ab6 ("Merge tag 'v6.10-p1' of
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6")

According to the dm-crypt benchmark, using VEX-encoded instructions for
tweak processing enhances performance by approximately 2-3%. The
PCLMULQDQ instruction did not yield a measurable impact, so I dropped it
to simplify the implementation.

In contrast to other AES instructions, AES-KL does not permit tweak
processing between rounds. In XTS mode, a single instruction covers all
rounds of 8 blocks without interleaving instructions. Maybe this is one
of the reasons for the limited performance gain.
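
For reference, the per-block tweak update that the _next_tweak macro
below implements is functionally equivalent to this C sketch (multiply
the 128-bit little-endian tweak by x in GF(2^128)):

  static void example_next_tweak(u64 t[2])
  {
          u64 carry = t[1] >> 63;         /* bit shifted out of the top */

          t[1] = (t[1] << 1) | (t[0] >> 63);
          t[0] <<= 1;
          if (carry)
                  t[0] ^= 0x87;           /* fold back via the GF polynomial */
  }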

Moving forward, I would like to address any further feedback on this
AES-KL driver code first before the next revision of the whole series.

Changes from v9:
* Duplicate the new XTS glue code, instead of sharing (Eric).
* Use VEX-coded instructions for non-AES parts of the code (Eric).
* Adjust ASM code to stylistically follow the new VAES support (Eric).
* Export and reference the high-level AES-NI XTS functions (Eric). Then,
  support a module build, along with rearranging build dependencies.
* Reorganize the glue code and improve ASM code readability.
* Revoke the review tag due to major changes.
---
 arch/x86/Kconfig.assembler         |   5 +
 arch/x86/crypto/Kconfig            |  18 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-xts-x86_64.S | 358 +++++++++++++++++++++++++++
 arch/x86/crypto/aeskl_glue.c       | 376 +++++++++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_glue.c |  13 +-
 arch/x86/crypto/aesni-xts.h        |  15 ++
 7 files changed, 783 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-xts-x86_64.S
 create mode 100644 arch/x86/crypto/aeskl_glue.c
 create mode 100644 arch/x86/crypto/aesni-xts.h

diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 59aedf32c4ea..89e326c9dbfe 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -35,6 +35,11 @@ config AS_VPCLMULQDQ
 	help
 	  Supported by binutils >= 2.30 and LLVM integrated assembler
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config AS_WRUSS
 	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
 	help
diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index c9e59589a1ce..d55704fc9a8f 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -29,6 +29,24 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	tristate "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_HAS_KEYLOCKER
+	select CRYPTO_AES_NI_INTEL
+	select CRYPTO_SIMD
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9c5ce5613738..c46fd2d9dd16 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -51,6 +51,9 @@ aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o \
 	aes_ctrby8_avx-x86_64.o aes-xts-avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-x86_64.o
+aeskl-x86_64-y := aeskl-xts-x86_64.o aeskl_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-xts-x86_64.S b/arch/x86/crypto/aeskl-xts-x86_64.S
new file mode 100644
index 000000000000..6ff8b5feebfc
--- /dev/null
+++ b/arch/x86/crypto/aeskl-xts-x86_64.S
@@ -0,0 +1,358 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is primarily derived from aesni-intel_asm.S and
+ * stylistically aligned with aes-xts-avx-x86_64.S.
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+
+/* Constant values shared between AES implementations: */
+
+.section .rodata
+.p2align 4
+.Lgf_poly:
+	/*
+	 * Represents the polynomial x^7 + x^2 + x + 1, where the low 64
+	 * bits are XOR'd into the tweak's low 64 bits when a carry
+	 * occurs from the high 64 bits.
+	 */
+	.quad	0x87, 1
+
+	/*
+	 * Table of constants for variable byte shifts and blending
+	 * during ciphertext stealing operations.
+	 */
+.Lcts_permute_table:
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+
+.text
+
+.set	V0,		%xmm0
+.set	V1,		%xmm1
+.set	V2,		%xmm2
+.set	V3,		%xmm3
+.set	V4,		%xmm4
+.set	V5,		%xmm5
+.set	V6,		%xmm6
+.set	V7,		%xmm7
+.set	V8,		%xmm8
+.set	V9,		%xmm9
+.set	V10,		%xmm10
+.set	V11,		%xmm11
+.set	V12,		%xmm12
+.set	V13,		%xmm13
+.set	V14,		%xmm14
+.set	V15,		%xmm15
+
+.set	TWEAK_XMM1,	V8
+.set	TWEAK_XMM2,	V9
+.set	TWEAK_XMM3,	V10
+.set	TWEAK_XMM4,	V11
+.set	TWEAK_XMM5,	V12
+.set	TWEAK_XMM6,	V13
+.set	TWEAK_XMM7,	V14
+.set	GF_POLY_XMM,	V15
+.set	TWEAK_TMP,	TWEAK_XMM1
+.set	TWEAK_XMM,	TWEAK_XMM2
+.set	TMP,		%r10
+
+/* Function parameters */
+.set	HANDLEP,	%rdi	/* Pointer to struct aeskl_ctx */
+.set	DST,		%rsi	/* Pointer to next destination data */
+.set	UKEYP,		DST	/* Pointer to the original key */
+.set	KLEN,		%r9d	/* AES key length in bytes */
+.set	SRC,		%rdx	/* Pointer to next source data */
+.set	LEN,		%rcx	/* Remaining length in bytes */
+.set	TWEAK,		%r8	/* Pointer to next tweak */
+
+/*
+ * void __aeskl_setkey(struct crypto_aes_ctx *handlep, const u8 *ukeyp,
+ *		       unsigned int key_len)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+	FRAME_BEGIN
+	movl		%edx, 480(HANDLEP)
+	vmovdqu		(UKEYP), V0
+	mov		$1, %eax
+	cmp		$16, %dl
+	je		.Lsetkey_128
+
+	vmovdqu		0x10(UKEYP), V1
+	encodekey256	%eax, %eax
+	vmovdqu		V3, 0x30(HANDLEP)
+	jmp		.Lsetkey_end
+.Lsetkey_128:
+	encodekey128	%eax, %eax
+
+.Lsetkey_end:
+	vmovdqu		V0, 0x00(HANDLEP)
+	vmovdqu		V1, 0x10(HANDLEP)
+	vmovdqu		V2, 0x20(HANDLEP)
+
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_setkey)
+
+.macro _aeskl		width, operation
+	cmp		$16, KLEN
+	je		.Laeskl128\@
+.ifc \width, wide
+ .ifc \operation, dec
+	aesdecwide256kl	(HANDLEP)
+ .else
+	aesencwide256kl	(HANDLEP)
+ .endif
+.else
+ .ifc \operation, dec
+	aesdec256kl	(HANDLEP), V0
+ .else
+	aesenc256kl	(HANDLEP), V0
+ .endif
+.endif
+	jmp		.Laesklend\@
+.Laeskl128\@:
+.ifc \width, wide
+ .ifc \operation, dec
+	aesdecwide128kl	(HANDLEP)
+ .else
+	aesencwide128kl	(HANDLEP)
+ .endif
+.else
+ .ifc \operation, dec
+	aesdec128kl	(HANDLEP), V0
+ .else
+	aesenc128kl	(HANDLEP), V0
+ .endif
+.endif
+.Laesklend\@:
+.endm
+
+/* int __aeskl_enc(const void *handlep, u8 *dst, const u8 *src) */
+SYM_FUNC_START(__aeskl_enc)
+	FRAME_BEGIN
+	vmovdqu		(SRC), V0
+	movl		480(HANDLEP), KLEN
+
+	_aeskl		oneblock, enc
+	jz		.Lerror
+	xor		%rax, %rax
+	vmovdqu		V0, (DST)
+	FRAME_END
+	RET
+.Lerror:
+	mov		$(-EINVAL), %rax
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * Calculate the next 128-bit XTS tweak by multiplying the polynomial 'x'
+ * with the current tweak stored in the xmm register \src, and store the
+ * result in \dst.
+ */
+.macro _next_tweak	src, tmp, dst
+	vpshufd		$0x13, \src, \tmp
+	vpaddq		\src, \src, \dst
+	vpsrad		$31, \tmp, \tmp
+	vpand		GF_POLY_XMM, \tmp, \tmp
+	vpxor		\tmp, \dst, \dst
+.endm
+
+.macro _aeskl_xts_crypt operation
+	FRAME_BEGIN
+	vmovdqa		.Lgf_poly(%rip), GF_POLY_XMM
+	vmovups		(TWEAK), TWEAK_XMM
+	mov		480(HANDLEP), KLEN
+
+.ifc \operation, dec
+	/*
+	 * During decryption, if the message length is not a multiple of
+	 * the AES block length, exclude the last complete block from the
+	 * decryption loop by subtracting 16 from LEN. This adjustment is
+	 * necessary because ciphertext stealing decryption uses the last
+	 * two tweaks in reverse order. Special handling is required for
+	 * the last complete block and any remaining partial block at the
+	 * end.
+	 */
+	test		$15, LEN
+	jz		.L8block_at_a_time\@
+	sub		$16, LEN
+.endif
+
+.L8block_at_a_time\@:
+	sub		$128, LEN
+	jl		.Lhandle_remainder\@
+
+	vpxor		(SRC), TWEAK_XMM, V0
+	vmovups		TWEAK_XMM, (DST)
+
+	/*
+	 * Calculate and cache tweak values. Note that the tweak
+	 * computation cannot be interleaved with AES rounds here using
+	 * Key Locker instructions.
+	 */
+	_next_tweak	TWEAK_XMM,  V1, TWEAK_XMM1
+	_next_tweak	TWEAK_XMM1, V1, TWEAK_XMM2
+	_next_tweak	TWEAK_XMM2, V1, TWEAK_XMM3
+	_next_tweak	TWEAK_XMM3, V1, TWEAK_XMM4
+	_next_tweak	TWEAK_XMM4, V1, TWEAK_XMM5
+	_next_tweak	TWEAK_XMM5, V1, TWEAK_XMM6
+	_next_tweak	TWEAK_XMM6, V1, TWEAK_XMM7
+
+	/* XOR each source block with its tweak. */
+	vpxor		0x10(SRC), TWEAK_XMM1, V1
+	vpxor		0x20(SRC), TWEAK_XMM2, V2
+	vpxor		0x30(SRC), TWEAK_XMM3, V3
+	vpxor		0x40(SRC), TWEAK_XMM4, V4
+	vpxor		0x50(SRC), TWEAK_XMM5, V5
+	vpxor		0x60(SRC), TWEAK_XMM6, V6
+	vpxor		0x70(SRC), TWEAK_XMM7, V7
+
+	/* Encrypt or decrypt 8 blocks per iteration. */
+	_aeskl		wide, \operation
+	jz		.Lerror\@
+
+	/* XOR tweaks again. */
+	vpxor		(DST), V0, V0
+	vpxor		TWEAK_XMM1, V1, V1
+	vpxor		TWEAK_XMM2, V2, V2
+	vpxor		TWEAK_XMM3, V3, V3
+	vpxor		TWEAK_XMM4, V4, V4
+	vpxor		TWEAK_XMM5, V5, V5
+	vpxor		TWEAK_XMM6, V6, V6
+	vpxor		TWEAK_XMM7, V7, V7
+
+	/* Store destination blocks. */
+	vmovdqu		V0, 0x00(DST)
+	vmovdqu		V1, 0x10(DST)
+	vmovdqu		V2, 0x20(DST)
+	vmovdqu		V3, 0x30(DST)
+	vmovdqu		V4, 0x40(DST)
+	vmovdqu		V5, 0x50(DST)
+	vmovdqu		V6, 0x60(DST)
+	vmovdqu		V7, 0x70(DST)
+
+	_next_tweak	TWEAK_XMM7, TWEAK_TMP, TWEAK_XMM
+	add		$128, SRC
+	add		$128, DST
+	test		LEN, LEN
+	jz		.Lend\@
+	jmp		.L8block_at_a_time\@
+
+.Lhandle_remainder\@:
+	add		$128, LEN
+	jz		.Lend\@
+.ifc \operation, enc
+	vmovdqu		V7, V0
+.endif
+	sub		$16, LEN
+	jl		.Lcts\@
+
+	/* Encrypt or decrypt one block per iteration */
+.Lblock_at_a_time\@:
+	vpxor		(SRC), TWEAK_XMM, V0
+	_aeskl		oneblock, \operation
+	jz		.Lerror\@
+	vpxor		TWEAK_XMM, V0, V0
+	_next_tweak	TWEAK_XMM, TWEAK_TMP, TWEAK_XMM
+	test		LEN, LEN
+	jz		.Lout\@
+
+	add		$16, SRC
+	vmovdqu		V0, (DST)
+	add		$16, DST
+	sub		$16, LEN
+	jge		.Lblock_at_a_time\@
+
+.Lcts\@:
+.ifc \operation, dec
+	/*
+	 * If decrypting, the last block was not decrypted because CTS
+	 * decryption uses the last two tweaks in reverse order. This is
+	 * done by advancing the tweak and decrypting the last block.
+	 */
+	_next_tweak	TWEAK_XMM, TWEAK_TMP, V4
+	vpxor		(SRC), V4, V0
+	_aeskl		oneblock, \operation
+	jz		.Lerror\@
+	vpxor		V4, V0, V0
+	add		$16, SRC
+.else
+	/*
+	 * If encrypting, the last block was already encrypted in V0.
+	 * Prepare the CTS encryption by rewinding the pointer.
+	 */
+	sub		$16, DST
+.endif
+	lea		.Lcts_permute_table(%rip), TMP
+
+	/* Load the source partial block */
+	vmovdqu		(SRC, LEN, 1), V3
+
+	/*
+	 * Shift the first LEN bytes of the encryption and decryption of
+	 * the last block to the end of a register, then store it to
+	 * DST+LEN.
+	 */
+	add		$16, LEN
+	vpshufb		(TMP, LEN, 1), V0, V2
+	vmovdqu		V2, (DST, LEN, 1)
+
+	/* Shift the source partial block to the beginning */
+	sub		LEN, TMP
+	vmovdqu		32(TMP), V2
+	vpshufb		V2, V3, V3
+
+	/* Blend to generate the source partial block */
+	vpblendvb	V2, V0, V3, V3
+
+	/* Encrypt or decrypt again and store the last block. */
+	vpxor		TWEAK_XMM, V3, V0
+	_aeskl		oneblock, \operation
+	jz		.Lerror\@
+	vpxor		TWEAK_XMM, V0, V0
+	vmovdqu		V0, (DST)
+
+	xor		%rax, %rax
+	FRAME_END
+	RET
+.Lout\@:
+	vmovdqu		V0, (DST)
+.Lend\@:
+	vmovups		TWEAK_XMM, (TWEAK)
+	xor		%rax, %rax
+	FRAME_END
+	RET
+.Lerror\@:
+	mov		$(-EINVAL), %rax
+	FRAME_END
+	RET
+.endm
+
+/*
+ * int __aeskl_xts_encrypt(const struct aeskl_ctx *handlep, u8 *dst,
+ *			   const u8 *src, unsigned int klen, le128 *tweak)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	_aeskl_xts_crypt	enc
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct aeskl_ctx *handlep, u8 *dst,
+ *			   const u8 *src, unsigned int klen, le128 *tweak)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	_aeskl_xts_crypt	dec
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl_glue.c b/arch/x86/crypto/aeskl_glue.c
new file mode 100644
index 000000000000..6dc4d380be54
--- /dev/null
+++ b/arch/x86/crypto/aeskl_glue.c
@@ -0,0 +1,376 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-xts-x86_64.S.
+ *
+ * Most code is based on aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include "aesni-xts.h"
+
+#define AESKL_ALIGN		16
+#define AESKL_ALIGN_ATTR	__attribute__ ((__aligned__(AESKL_ALIGN)))
+#define AESKL_ALIGN_EXTRA	((AESKL_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+
+#define AESKL_AAD_SIZE		16
+#define AESKL_TAG_SIZE		16
+#define AESKL_CIPHERTEXT_MAX	AES_KEYSIZE_256
+
+/* The Key Locker handle is an encoded form of the AES key. */
+struct aeskl_handle {
+	u8 additional_authdata[AESKL_AAD_SIZE];
+	u8 integrity_tag[AESKL_TAG_SIZE];
+	u8 cipher_text[AESKL_CIPHERTEXT_MAX];
+};
+
+/*
+ * Key Locker does not support 192-bit key size. The driver needs to
+ * retrieve the key size in the first place. The offset of the
+ * 'key_length' field here should be compatible with struct
+ * crypto_aes_ctx.
+ */
+#define AESKL_CTX_RESERVED (sizeof(struct crypto_aes_ctx) - sizeof(struct aeskl_handle) \
+			    - sizeof(u32))
+
+struct aeskl_ctx {
+	struct aeskl_handle handle;
+	u8 reserved[AESKL_CTX_RESERVED];
+	u32 key_length;
+};
+
+struct aeskl_xts_ctx {
+	struct aeskl_ctx tweak_ctx AESKL_ALIGN_ATTR;
+	struct aeskl_ctx crypt_ctx AESKL_ALIGN_ATTR;
+};
+
+#define XTS_AES_CTX_SIZE (sizeof(struct aeskl_xts_ctx) + AESKL_ALIGN_EXTRA)
+
+static inline struct aeskl_xts_ctx *aeskl_xts_ctx(struct crypto_skcipher *tfm)
+{
+	void *addr = crypto_skcipher_ctx(tfm);
+
+	if (crypto_tfm_ctx_alignment() >= AESKL_ALIGN)
+		return addr;
+
+	return PTR_ALIGN(addr, AESKL_ALIGN);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+	struct aeskl_xts_ctx *ctx = aeskl_xts_ctx(crypto_skcipher_reqtfm(req));
+
+	return ctx->crypt_ctx.key_length;
+}
+
+asmlinkage void __aeskl_setkey(struct aeskl_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+
+/*
+ * If a hardware failure occurs, the wrapping key may be lost during
+ * sleep states. The state of the feature can be retrieved via
+ * valid_keylocker().
+ *
+ * Since disabling can occur preemptively, check for availability on
+ * every use along with kernel_fpu_begin().
+ */
+
+static int aeskl_setkey(struct aeskl_ctx *ctx, const u8 *in_key, unsigned int keylen)
+{
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	kernel_fpu_begin();
+	if (!valid_keylocker()) {
+		kernel_fpu_end();
+		return -ENODEV;
+	}
+
+	__aeskl_setkey(ctx, in_key, keylen);
+	kernel_fpu_end();
+	return 0;
+}
+
+static int aeskl_xts_encrypt_iv(const struct aeskl_ctx *tweak_key,
+				u8 iv[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(tweak_key, iv, iv);
+}
+
+static int aeskl_xts_encrypt(const struct aeskl_ctx *key,
+			     const u8 *src, u8 *dst, unsigned int len,
+			     u8 tweak[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(key, dst, src, len, tweak);
+}
+
+static int aeskl_xts_decrypt(const struct aeskl_ctx *key,
+			     const u8 *src, u8 *dst, unsigned int len,
+			     u8 tweak[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(key, dst, src, len, tweak);
+}
+
+/*
+ * The glue code in xts_crypt() and xts_crypt_slowpath() follows
+ * aesni-intel_glue.c. While this code is shareable, the key
+ * material format difference can cause more destructive code changes in
+ * the AES-NI side.
+ */
+
+typedef int (*xts_encrypt_iv_func)(const struct aeskl_ctx *tweak_key,
+				   u8 iv[AES_BLOCK_SIZE]);
+typedef int (*xts_crypt_func)(const struct aeskl_ctx *key,
+			      const u8 *src, u8 *dst, unsigned int len,
+			      u8 tweak[AES_BLOCK_SIZE]);
+
+/* This handles cases where the source and/or destination span pages. */
+static noinline int
+xts_crypt_slowpath(struct skcipher_request *req, xts_crypt_func crypt_func)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct aeskl_xts_ctx *ctx = aeskl_xts_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct scatterlist sg_src[2], sg_dst[2];
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	struct scatterlist *src, *dst;
+	int err;
+
+	/*
+	 * If the message length isn't divisible by the AES block size, then
+	 * separate off the last full block and the partial block.  This ensures
+	 * that they are processed in the same call to the assembly function,
+	 * which is required for ciphertext stealing.
+	 */
+	if (tail) {
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   req->cryptlen - tail - AES_BLOCK_SIZE,
+					   req->iv);
+		req = &subreq;
+	}
+
+	err = skcipher_walk_virt(&walk, req, false);
+
+	while (walk.nbytes) {
+		kernel_fpu_begin();
+		err |= (*crypt_func)(&ctx->crypt_ctx,
+				     walk.src.virt.addr, walk.dst.virt.addr,
+				     walk.nbytes & ~(AES_BLOCK_SIZE - 1), req->iv);
+		kernel_fpu_end();
+		err |= skcipher_walk_done(&walk,
+					  walk.nbytes & (AES_BLOCK_SIZE - 1));
+	}
+
+	if (err || !tail)
+		return err;
+
+	/* Do ciphertext stealing with the last full block and partial block. */
+
+	dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+	if (req->dst != req->src)
+		dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+	skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+				   req->iv);
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (err)
+		return err;
+
+	kernel_fpu_begin();
+	err = (*crypt_func)(&ctx->crypt_ctx, walk.src.virt.addr, walk.dst.virt.addr,
+			    walk.nbytes, req->iv);
+	kernel_fpu_end();
+	if (err)
+		return err;
+
+	return skcipher_walk_done(&walk, 0);
+}
+
+/* __always_inline to avoid indirect call in fastpath */
+static __always_inline int
+xts_crypt(struct skcipher_request *req, xts_encrypt_iv_func encrypt_iv,
+	  xts_crypt_func crypt_func)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct aeskl_xts_ctx *ctx = aeskl_xts_ctx(tfm);
+	const unsigned int cryptlen = req->cryptlen;
+	struct scatterlist *src = req->src;
+	struct scatterlist *dst = req->dst;
+	int err;
+
+	if (unlikely(cryptlen < AES_BLOCK_SIZE))
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	err = (*encrypt_iv)(&ctx->tweak_ctx, req->iv);
+	if (err)
+		goto out;
+
+	/*
+	 * In practice, virtually all XTS plaintexts and ciphertexts are either
+	 * 512 or 4096 bytes, aligned such that they don't span page boundaries.
+	 * To optimize the performance of these cases, and also any other case
+	 * where no page boundary is spanned, the below fast-path handles
+	 * single-page sources and destinations as efficiently as possible.
+	 */
+	if (likely(src->length >= cryptlen && dst->length >= cryptlen &&
+		   src->offset + cryptlen <= PAGE_SIZE &&
+		   dst->offset + cryptlen <= PAGE_SIZE)) {
+		struct page *src_page = sg_page(src);
+		struct page *dst_page = sg_page(dst);
+		void *src_virt = kmap_local_page(src_page) + src->offset;
+		void *dst_virt = kmap_local_page(dst_page) + dst->offset;
+
+		err = (*crypt_func)(&ctx->crypt_ctx, src_virt, dst_virt, cryptlen,
+				    req->iv);
+		if (err)
+			goto out;
+		kunmap_local(dst_virt);
+		kunmap_local(src_virt);
+		kernel_fpu_end();
+		return 0;
+	}
+out:
+	kernel_fpu_end();
+	if (err)
+		return err;
+	return xts_crypt_slowpath(req, crypt_func);
+}
+
+static int xts_setkey_aeskl(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen)
+{
+	struct aeskl_xts_ctx *ctx = aeskl_xts_ctx(tfm);
+	unsigned int aes_keylen;
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	aes_keylen = keylen / 2;
+	err = aes_check_keylen(aes_keylen);
+	if (err)
+		return err;
+
+	if (unlikely(aes_keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		return xts_setkey_aesni(tfm, key, keylen);
+	}
+
+	err = aeskl_setkey(&ctx->crypt_ctx, key, aes_keylen);
+	if (err)
+		return err;
+	return aeskl_setkey(&ctx->tweak_ctx, key + aes_keylen, aes_keylen);
+}
+
+static int xts_encrypt_aeskl(struct skcipher_request *req)
+{
+	if (unlikely(xts_keylen(req) == AES_KEYSIZE_192))
+		return xts_encrypt_aesni(req);
+
+	return xts_crypt(req, aeskl_xts_encrypt_iv, aeskl_xts_encrypt);
+}
+
+static int xts_decrypt_aeskl(struct skcipher_request *req)
+{
+	if (unlikely(xts_keylen(req) == AES_KEYSIZE_192))
+		return xts_decrypt_aesni(req);
+
+	return xts_crypt(req, aeskl_xts_encrypt_iv, aeskl_xts_decrypt);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= xts_setkey_aeskl,
+		.encrypt	= xts_encrypt_aeskl,
+		.decrypt	= xts_decrypt_aeskl,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not rely on AES-NI. But, AES-KL does not
+	 * support 192-bit keys. To ensure AES compliance, AES-KL falls
+	 * back to AES-NI.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_AES))
+		return -ENODEV;
+
+	/* The tweak processing is optimized using AVX instructions. */
+	if (!cpu_feature_enabled(X86_FEATURE_AVX))
+		return -ENODEV;
+
+	return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					      aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 5b25d2a58aeb..61456f0a99fa 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -35,7 +35,7 @@
 #include <linux/workqueue.h>
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
-
+#include "aesni-xts.h"
 
 #define AESNI_ALIGN	16
 #define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
@@ -864,8 +864,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 }
 #endif
 
-static int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
-			    unsigned int keylen)
+int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
+		     unsigned int keylen)
 {
 	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
 	int err;
@@ -884,6 +884,7 @@ static int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
 	/* second half of xts-key is for tweak */
 	return aes_set_key_common(&ctx->tweak_ctx, key + keylen, keylen);
 }
+EXPORT_SYMBOL_GPL(xts_setkey_aesni);
 
 typedef void (*xts_encrypt_iv_func)(const struct crypto_aes_ctx *tweak_key,
 				    u8 iv[AES_BLOCK_SIZE]);
@@ -1020,15 +1021,17 @@ static void aesni_xts_decrypt(const struct crypto_aes_ctx *key,
 	aesni_xts_dec(key, dst, src, len, tweak);
 }
 
-static int xts_encrypt_aesni(struct skcipher_request *req)
+int xts_encrypt_aesni(struct skcipher_request *req)
 {
 	return xts_crypt(req, aesni_xts_encrypt_iv, aesni_xts_encrypt);
 }
+EXPORT_SYMBOL_GPL(xts_encrypt_aesni);
 
-static int xts_decrypt_aesni(struct skcipher_request *req)
+int xts_decrypt_aesni(struct skcipher_request *req)
 {
 	return xts_crypt(req, aesni_xts_encrypt_iv, aesni_xts_decrypt);
 }
+EXPORT_SYMBOL_GPL(xts_decrypt_aesni);
 
 static struct crypto_alg aesni_cipher_alg = {
 	.cra_name		= "aes",
diff --git a/arch/x86/crypto/aesni-xts.h b/arch/x86/crypto/aesni-xts.h
new file mode 100644
index 000000000000..9833da2bd9d2
--- /dev/null
+++ b/arch/x86/crypto/aesni-xts.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _AESNI_XTS_H
+#define _AESNI_XTS_H
+
+/*
+ * These AES-NI functions are used by the AES-KL code as a fallback when
+ * a 192-bit key is provided. Key Locker does not support 192-bit keys.
+ */
+
+int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen);
+int xts_encrypt_aesni(struct skcipher_request *req);
+int xts_decrypt_aesni(struct skcipher_request *req);
+
+#endif /* _AESNI_XTS_H */
-- 
2.34.1
Re: [PATCH v9a 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm
Posted by Eric Biggers 1 year, 8 months ago
On Wed, May 22, 2024 at 11:42:35AM -0700, Chang S. Bae wrote:
> I've reworked this patch based on feedback,
>     https://lore.kernel.org/lkml/20240408014806.GA965@quark.localdomain/
> and rebased to upstream v6.10 Linus merge tree on May 13th: commit
> 84c7d76b5ab6 ("Merge tag 'v6.10-p1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6")
> 
> According to the dm-crypt benchmark, using VEX-encoded instructions for
> tweak processing enhances performance by approximately 2-3%. The
> PCLMULQDQ instruction did not yield a measurable impact, so I dropped it
> to simplify the implementation.
> 
> In contrast to other AES instructions, AES-KL does not permit tweak
> processing between rounds. In XTS mode, a single instruction covers all
> rounds of 8 blocks without interleaving instructions. Maybe this is one
> of the reasons for the limited performance gain.
> 
> Moving forward, I would like to address any further feedback on this
> AES-KL driver code first before the next revision of the whole series.
> 
> Changes from v9:
> * Duplicate the new XTS glue code, instead of sharing (Eric).
> * Use VEX-coded instructions for non-AES parts of the code (Eric).
> * Adjust ASM code to stylistically follow the new VAES support (Eric).
> * Export and reference the high-level AES-NI XTS functions (Eric). Then,
>   support a module build, along with rearranging build dependencies.
> * Reorganize the glue code and improve ASM code readability.
> * Revoke the review tag due to major changes.
> ---

Thanks for the updated patch!

> diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
> index 59aedf32c4ea..89e326c9dbfe 100644
> --- a/arch/x86/Kconfig.assembler
> +++ b/arch/x86/Kconfig.assembler
> @@ -35,6 +35,11 @@ config AS_VPCLMULQDQ
>  	help
>  	  Supported by binutils >= 2.30 and LLVM integrated assembler
>  
> +config AS_HAS_KEYLOCKER
> +	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
> +	help
> +	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12

Adding AS_HAS_KEYLOCKER should be its own patch.

> diff --git a/arch/x86/crypto/aeskl-xts-x86_64.S b/arch/x86/crypto/aeskl-xts-x86_64.S
> new file mode 100644
> index 000000000000..6ff8b5feebfc
> --- /dev/null
> +++ b/arch/x86/crypto/aeskl-xts-x86_64.S
> @@ -0,0 +1,358 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Implement AES algorithm using AES Key Locker instructions.
> + *
> + * Most code is primarily derived from aesni-intel_asm.S and
> + * stylistically aligned with aes-xts-avx-x86_64.S.
> + */
> +
> +#include <linux/linkage.h>
> +#include <linux/cfi_types.h>
> +#include <asm/errno.h>
> +#include <asm/inst.h>
> +#include <asm/frame.h>
> +
> +/* Constant values shared between AES implementations: */
> +
> +.section .rodata
> +.p2align 4
> +.Lgf_poly:
> +	/*
> +	 * Represents the polynomial x^7 + x^2 + x + 1, where the low 64
> +	 * bits are XOR'd into the tweak's low 64 bits when a carry
> +	 * occurs from the high 64 bits.
> +	 */
> +	.quad	0x87, 1
> +
> +	/*
> +	 * Table of constants for variable byte shifts and blending
> +	 * during ciphertext stealing operations.
> +	 */
> +.Lcts_permute_table:
> +	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
> +	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
> +	.byte	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
> +	.byte	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> +	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
> +	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
> +
> +.text
> +
> +.set	V0,		%xmm0
> +.set	V1,		%xmm1
> +.set	V2,		%xmm2
> +.set	V3,		%xmm3
> +.set	V4,		%xmm4
> +.set	V5,		%xmm5
> +.set	V6,		%xmm6
> +.set	V7,		%xmm7
> +.set	V8,		%xmm8
> +.set	V9,		%xmm9
> +.set	V10,		%xmm10
> +.set	V11,		%xmm11
> +.set	V12,		%xmm12
> +.set	V13,		%xmm13
> +.set	V14,		%xmm14
> +.set	V15,		%xmm15

The point of the V[0-15] aliases in aes-xts-avx-x86_64.S is to support both ymm
and zmm registers.  Here all registers are xmm, so there's no need for the layer
of indirection and you should just use xmm[0-15] directly.

> +.set	TWEAK_XMM1,	V8
> +.set	TWEAK_XMM2,	V9
> +.set	TWEAK_XMM3,	V10
> +.set	TWEAK_XMM4,	V11
> +.set	TWEAK_XMM5,	V12
> +.set	TWEAK_XMM6,	V13
> +.set	TWEAK_XMM7,	V14
> +.set	GF_POLY_XMM,	V15
> +.set	TWEAK_TMP,	TWEAK_XMM1
> +.set	TWEAK_XMM,	TWEAK_XMM2
> +.set	TMP,		%r10

Similarly, the _XMM suffixes are not really helpful since ymm and zmm registers
are not in play here.

> +/* Function parameters */
> +.set	HANDLEP,	%rdi	/* Pointer to struct aeskl_ctx */
> +.set	DST,		%rsi	/* Pointer to next destination data */
> +.set	UKEYP,		DST	/* Pointer to the original key */
> +.set	KLEN,		%r9d	/* AES key length in bytes */
> +.set	SRC,		%rdx	/* Pointer to next source data */
> +.set	LEN,		%rcx	/* Remaining length in bytes */
> +.set	TWEAK,		%r8	/* Pointer to next tweak */

Please don't put parameters for different functions in the same list like this.
There should be a separate list at the beginning of each function.  (Yes, it
doesn't work perfectly because '.set' is global and doesn't go out of scope once
the function ends.  But at least this would make it clear what the intent is.)

Also LEN needs to be %ecx, not %rcx, because it is unsigned int.

> +SYM_FUNC_START(__aeskl_setkey)
> +	FRAME_BEGIN
> +	movl		%edx, 480(HANDLEP)
> +	vmovdqu		(UKEYP), V0
> +	mov		$1, %eax
> +	cmp		$16, %dl
> +	je		.Lsetkey_128
> +
> +	vmovdqu		0x10(UKEYP), V1
> +	encodekey256	%eax, %eax
> +	vmovdqu		V3, 0x30(HANDLEP)
> +	jmp		.Lsetkey_end
> +.Lsetkey_128:
> +	encodekey128	%eax, %eax
> +
> +.Lsetkey_end:
> +	vmovdqu		V0, 0x00(HANDLEP)
> +	vmovdqu		V1, 0x10(HANDLEP)
> +	vmovdqu		V2, 0x20(HANDLEP)
> +
> +	FRAME_END
> +	RET
> +SYM_FUNC_END(__aeskl_setkey)

These are all leaf functions, so they don't need FRAME_BEGIN and FRAME_END.

> +.macro _aeskl		width, operation
> +	cmp		$16, KLEN
> +	je		.Laeskl128\@
> +.ifc \width, wide
> + .ifc \operation, dec
> +	aesdecwide256kl	(HANDLEP)
> + .else
> +	aesencwide256kl	(HANDLEP)
> + .endif
> +.else
> + .ifc \operation, dec
> +	aesdec256kl	(HANDLEP), V0
> + .else
> +	aesenc256kl	(HANDLEP), V0
> + .endif
> +.endif
> +	jmp		.Laesklend\@
> +.Laeskl128\@:
> +.ifc \width, wide
> + .ifc \operation, dec
> +	aesdecwide128kl	(HANDLEP)
> + .else
> +	aesencwide128kl	(HANDLEP)
> + .endif
> +.else
> + .ifc \operation, dec
> +	aesdec128kl	(HANDLEP), V0
> + .else
> +	aesenc128kl	(HANDLEP), V0
> + .endif
> +.endif
> +.Laesklend\@:
> +.endm

I think it would be easier to read if this was split into two macros, one for
1-block and one for 8-block.

> +/* int __aeskl_enc(const void *handlep, u8 *dst, const u8 *src) */
> +SYM_FUNC_START(__aeskl_enc)
> +	FRAME_BEGIN
> +	vmovdqu		(SRC), V0
> +	movl		480(HANDLEP), KLEN
> +
> +	_aeskl		oneblock, enc
> +	jz		.Lerror
> +	xor		%rax, %rax
> +	vmovdqu		V0, (DST)
> +	FRAME_END
> +	RET
> +.Lerror:
> +	mov		$(-EINVAL), %rax

For returning an int, %eax should be used, not %rax.

(Note that instructions that operate on %eax also tend to be slightly shorter.)

> +/*
> + * int __aeskl_xts_encrypt(const struct aeskl_ctx *handlep, u8 *dst,
> + *			   const u8 *src, unsigned int klen, le128 *tweak)
> + */
> +SYM_FUNC_START(__aeskl_xts_encrypt)
> +	_aeskl_xts_crypt	enc
> +SYM_FUNC_END(__aeskl_xts_encrypt)
> +
> +/*
> + * int __aeskl_xts_decrypt(const struct crypto_aes_ctx *handlep, u8 *dst,
> + *			   const u8 *src, unsigned int klen, le128 *twek)
> + */

Please make sure the function prototypes, including the parameter names, match
the ones used in the .c file and also the lists of register aliases.

> diff --git a/arch/x86/crypto/aeskl_glue.c b/arch/x86/crypto/aeskl_glue.c
> new file mode 100644
> index 000000000000..6dc4d380be54
> --- /dev/null
> +++ b/arch/x86/crypto/aeskl_glue.c
> @@ -0,0 +1,376 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Support for AES Key Locker instructions. This file contains glue
> + * code and the real AES implementation is in aeskl-intel_asm.S.
> + *
> + * Most code is based on aesni-intel_glue.c
> + */
> +
> +#include <linux/types.h>
> +#include <linux/module.h>
> +#include <linux/err.h>
> +#include <crypto/algapi.h>
> +#include <crypto/aes.h>
> +#include <crypto/xts.h>
> +#include <crypto/scatterwalk.h>
> +#include <crypto/internal/skcipher.h>
> +#include <crypto/internal/simd.h>
> +#include <asm/simd.h>
> +#include <asm/cpu_device_id.h>
> +#include <asm/fpu/api.h>
> +#include <asm/keylocker.h>
> +#include "aesni-xts.h"
> +
> +#define AESKL_ALIGN		16
> +#define AESKL_ALIGN_ATTR	__attribute__ ((__aligned__(AESKL_ALIGN)))
> +#define AESKL_ALIGN_EXTRA	((AESKL_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
> +
> +#define AESKL_AAD_SIZE		16
> +#define AESKL_TAG_SIZE		16
> +#define AESKL_CIPHERTEXT_MAX	AES_KEYSIZE_256
> +
> +/* The Key Locker handle is an encoded form of the AES key. */
> +struct aeskl_handle {
> +	u8 additional_authdata[AESKL_AAD_SIZE];
> +	u8 integrity_tag[AESKL_TAG_SIZE];
> +	u8 ciphre_text[AESKL_CIPHERTEXT_MAX];
> +};

ciphre_text => ciphertext

> +/*
> + * Key Locker does not support 192-bit key size. The driver needs to
> + * retrieve the key size in the first place. The offset of the
> + * 'key_length' field here should be compatible with struct

should => must

> + * crypto_aes_ctx.
> + */
> +#define AESKL_CTX_RESERVED (sizeof(struct crypto_aes_ctx) - sizeof(struct aeskl_handle) \
> +			    - sizeof(u32))
> +
> +struct aeskl_ctx {
> +	struct aeskl_handle handle;
> +	u8 reserved[AESKL_CTX_RESERVED];
> +	u32 key_length;
> +};
> +
> +struct aeskl_xts_ctx {
> +	struct aeskl_ctx tweak_ctx AESKL_ALIGN_ATTR;
> +	struct aeskl_ctx crypt_ctx AESKL_ALIGN_ATTR;
> +};

So there's a union between aeskl_ctx and crypto_aes_ctx going on here, but it's
not made explicit through a C union.  How about doing that?

Also, there should be a BUILD_BUG_ON() that enforces that the key_length is
really at the same offset in both.
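
For instance, a rough sketch of that shape (untested; the union and field
names here are just illustrative):

	union x86_aes_ctx {
		struct aeskl_ctx      aeskl;	/* 128/256-bit keys: Key Locker handle */
		struct crypto_aes_ctx aesni;	/* 192-bit keys: AES-NI fallback */
	};

	struct aeskl_xts_ctx {
		union x86_aes_ctx tweak_ctx AESKL_ALIGN_ATTR;
		union x86_aes_ctx crypt_ctx AESKL_ALIGN_ATTR;
	};

	/* e.g. in whatever helper reads key_length through either view: */
	BUILD_BUG_ON(offsetof(struct crypto_aes_ctx, key_length) !=
		     offsetof(struct aeskl_ctx, key_length));

That would also catch any future layout drift at build time instead of
relying on the reserved[] padding to keep key_length lined up.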

> +/*
> + * The glue code in xts_crypt() and xts_crypt_slowpath() follows
> + * aesni-intel_glue.c. While this code is shareable, the key
> + * material format difference can cause more destructive code changes in
> + * the AES-NI side.
> + */
> +
> +typedef int (*xts_encrypt_iv_func)(const struct aeskl_ctx *tweak_key,
> +				   u8 iv[AES_BLOCK_SIZE]);
> +typedef int (*xts_crypt_func)(const struct aeskl_ctx *key,
> +			      const u8 *src, u8 *dst, unsigned int len,
> +			      u8 tweak[AES_BLOCK_SIZE]);

Since there are so few functions in play here (one xts_encrypt_iv_func and two
xts_crypt_func) I think you should just use direct calls instead of function
pointers.  A simple parameter 'bool enc' would take care of selecting the
encryption function vs. the decryption one.

One of the issues with indirect calls, even when inlined with the intention that
they be optimized out, is that there's no guarantee that the compiler will
actually optimize them out.  That has the consequence that CFI stubs are still
needed in the assembly.

Direct calls avoid this issue.

(BTW, in my AES-GCM patchset I'm using direct calls:
https://lore.kernel.org/linux-crypto/20240527075626.142576-1-ebiggers@kernel.org/.
I'm thinking I should have used that approach with AES-XTS too.)
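
To illustrate, a rough and untested sketch of the direct-call shape
(assuming the __aeskl_xts_* C prototypes are (key, dst, src, len, tweak) as
in this patch; the helper name is just illustrative):

	/*
	 * A plain branch keeps both calls direct, so the compiler resolves
	 * them statically and no CFI stubs are needed in the .S file.
	 */
	static __always_inline int aeskl_xts_crypt_blocks(const struct aeskl_ctx *key,
							  const u8 *src, u8 *dst,
							  unsigned int len,
							  u8 tweak[AES_BLOCK_SIZE],
							  bool enc)
	{
		if (!valid_keylocker())
			return -ENODEV;

		if (enc)
			return __aeskl_xts_encrypt(key, dst, src, len, tweak);

		return __aeskl_xts_decrypt(key, dst, src, len, tweak);
	}

xts_crypt() and xts_crypt_slowpath() would then take the same 'bool enc' and
pass it down, and xts_encrypt_aeskl()/xts_decrypt_aeskl() become trivial
wrappers around xts_crypt(req, true) and xts_crypt(req, false).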

> +static struct skcipher_alg aeskl_skciphers[] = {
> +	{
> +		.base = {
> +			.cra_name		= "__xts(aes)",
> +			.cra_driver_name	= "__xts-aes-aeskl",
> +			.cra_priority		= 200,

Maybe add a comment here that explains that this is intentionally made lower
priority than xts-aes-aesni.

> +static int __init aeskl_init(void)
> +{
> +	u32 eax, ebx, ecx, edx;
> +
> +	if (!valid_keylocker())
> +		return -ENODEV;
> +
> +	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
> +	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
> +		return -ENODEV;
> +
> +	/*
> +	 * AES-KL itself does not rely on AES-NI. But, AES-KL does not
> +	 * support 192-bit keys. To ensure AES compliance, AES-KL falls
> +	 * back to AES-NI.
> +	 */
> +	if (!cpu_feature_enabled(X86_FEATURE_AES))
> +		return -ENODEV;

Everywhere else in arch/x86/crypto/ uses boot_cpu_has(), not
cpu_feature_enabled().

> +
> +	/* The tweak processing is optimized using AVX instructions. */
> +	if (!cpu_feature_enabled(X86_FEATURE_AVX))
> +		return -ENODEV;

The whole implementation uses VEX-coded instructions now, not just the tweak
processing.  So either fix or delete the above comment.

- Eric
[PATCH v9b 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm
Posted by Chang S. Bae 1 year, 8 months ago
Hi Eric,

I really appreciate your review. Keeping track of about 800 lines of
code, including assembly lines, in a single patch is demanding. I hope
this version meets your expectations.

The overall diff can be found here:
  https://github.com/intel-staging/keylocker/compare/f8420d4e27fc..57472d9b3f8e

Thanks,
Chang

---
Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion, along with all subsequent data transformations, is
provided by new AES instructions ('AES-KL'). AES-KL is analogous to
AES-NI in that it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The support has
some details worth mentioning that differentiate it from AES-NI, which
users may need to be aware of:

== Key Handle Restriction ==

The AES-KL instruction set supports selecting key usage restrictions at
key handle creation time. Restrict all key handles created by the kernel
to kernel mode use only.

Although the AES-KL instructions themselves are executable in userspace,
this restriction enforces mode consistency in their operation.

If the key handle is created in userspace but referenced in the kernel,
then encrypt() and decrypt() functions will return -EINVAL.

=== AES-NI Dependency for AES Compliance ===

Key Locker is not AES compliant as it lacks 192-bit key support. However,
per the expectations of Linux crypto-cipher implementations, the software
cipher implementation must support all the AES-compliant key sizes.

The AES-KL cipher implementation achieves this constraint by logging a
warning and falling back to AES-NI. In other words, the 192-bit key-size
limitation is documented but not enforced.

== Wrapping Key Restore Failure Handling ==

In the event of a hardware failure, the wrapping key is lost across deep
sleep states. The wrapping key then turns to zero, which is an unusable
state.

The x86 core provides valid_keylocker() to indicate the failure.
Subsequent setkey() as well as encode()/decode() can check it and return
-ENODEV on failure. This allows an error code to be returned instead of
encountering abrupt exceptions.

== Userspace Exposition ==

Keylocker implementations have measurable performance penalties.
Therefore, the current default remains unchanged.

However, with a slow storage device, storage bandwidth is the bottleneck,
even if disk encryption is enabled by AES-KL. Thus, it is up to the end
user to decide whether to use AES-KL. Users can select it by the name
'xts-aes-aeskl' shown in /proc/crypto.

== 64-bit Only ==

Support 64-bit only, as the 32-bit kernel is being deprecated.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the previous posting (Eric):
* Assembly Code
  - Improve function argument descriptions.
  - Simplify the AES processing macro.
  - Rename some symbols.
* Glue Code:
  - Do not use function pointer variables; direct calls.
  - Define a union for the two 'ctx' structures.
  - Clarify a few code spots with comments.
  - Adjust some variable and struct names.
* Kconfig
  - Separate out the Kconfig.assembler change.
---
 arch/x86/crypto/Kconfig            |  18 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-xts-x86_64.S | 337 ++++++++++++++++++++++++
 arch/x86/crypto/aeskl_glue.c       | 409 +++++++++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_glue.c |  13 +-
 arch/x86/crypto/aesni-xts.h        |  15 ++
 6 files changed, 790 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-xts-x86_64.S
 create mode 100644 arch/x86/crypto/aeskl_glue.c
 create mode 100644 arch/x86/crypto/aesni-xts.h

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index c9e59589a1ce..c45d8f48f24e 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -29,6 +29,24 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	tristate "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_KEYLOCKER
+	select CRYPTO_AES_NI_INTEL
+	select CRYPTO_SIMD
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9c5ce5613738..c46fd2d9dd16 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -51,6 +51,9 @@ aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o \
 	aes_ctrby8_avx-x86_64.o aes-xts-avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-x86_64.o
+aeskl-x86_64-y := aeskl-xts-x86_64.o aeskl_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-xts-x86_64.S b/arch/x86/crypto/aeskl-xts-x86_64.S
new file mode 100644
index 000000000000..261d03789452
--- /dev/null
+++ b/arch/x86/crypto/aeskl-xts-x86_64.S
@@ -0,0 +1,337 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is primarily derived from aesni-intel_asm.S and
+ * stylistically aligned with aes-xts-avx-x86_64.S.
+ */
+
+#include <linux/linkage.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+
+.section .rodata
+.p2align 4
+.Lgf_poly:
+	/*
+	 * Represents the polynomial x^7 + x^2 + x + 1, where the low 64
+	 * bits are XOR'd into the tweak's low 64 bits when a carry
+	 * occurs from the high 64 bits.
+	 */
+	.quad	0x87, 1
+
+	/*
+	 * Table of constants for variable byte shifts and blending
+	 * during ciphertext stealing operations.
+	 */
+.Lcts_permute_table:
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+
+.text
+
+.set	TWEAK_NEXT1,	%xmm8
+.set	TWEAK_NEXT2,	%xmm9
+.set	TWEAK_NEXT3,	%xmm10
+.set	TWEAK_NEXT4,	%xmm11
+.set	TWEAK_NEXT5,	%xmm12
+.set	TWEAK_NEXT6,	%xmm13
+.set	TWEAK_NEXT7,	%xmm14
+.set	GF_POLY,	%xmm15
+.set	TWEAK_TMP,	TWEAK_NEXT1
+.set	TWEAK_NEXT,	TWEAK_NEXT2
+.set	TMP,		%r10
+.set	KLEN,		%r9d
+
+/*
+ * void __aeskl_setkey(struct aeskl_ctx *handle, const u8 *key,
+ *		       unsigned int keylen)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+	.set	HANDLE,	%rdi	/* Pointer to struct aeskl_ctx */
+	.set	KEY,	%rsi	/* Pointer to the original key */
+	.set	KEYLEN,	%edx	/* AES key length in bytes */
+	movl		KEYLEN, 480(HANDLE)
+	vmovdqu		(KEY), %xmm0
+	mov		$1, %eax
+	cmp		$16, %dl
+	je		.Lsetkey_128
+
+	vmovdqu		0x10(KEY), %xmm1
+	encodekey256	%eax, %eax
+	vmovdqu		%xmm3, 0x30(HANDLE)
+	jmp		.Lsetkey_end
+.Lsetkey_128:
+	encodekey128	%eax, %eax
+
+.Lsetkey_end:
+	vmovdqu		%xmm0, 0x00(HANDLE)
+	vmovdqu		%xmm1, 0x10(HANDLE)
+	vmovdqu		%xmm2, 0x20(HANDLE)
+	RET
+SYM_FUNC_END(__aeskl_setkey)
+
+.macro _aeskl		operation
+	cmp		$16, KLEN
+	je		.Laes_128\@
+.ifc \operation, dec
+	aesdec256kl	(HANDLE), %xmm0
+.else
+	aesenc256kl	(HANDLE), %xmm0
+.endif
+	jmp		.Laes_end\@
+.Laes_128\@:
+.ifc \operation, dec
+	aesdec128kl	(HANDLE), %xmm0
+.else
+	aesenc128kl	(HANDLE), %xmm0
+.endif
+.Laes_end\@:
+.endm
+
+.macro _aesklwide	operation
+	cmp		$16, KLEN
+	je		.Laesw_128\@
+.ifc \operation, dec
+	aesdecwide256kl	(HANDLE)
+.else
+	aesencwide256kl	(HANDLE)
+.endif
+	jmp		.Laesw_end\@
+.Laesw_128\@:
+.ifc \operation, dec
+	aesdecwide128kl	(HANDLE)
+.else
+	aesencwide128kl	(HANDLE)
+.endif
+.Laesw_end\@:
+.endm
+
+/* int __aeskl_enc(const void *handle, u8 *dst, const u8 *src) */
+SYM_FUNC_START(__aeskl_enc)
+	.set	HANDLE,	%rdi	/* Pointer to struct aeskl_ctx */
+	.set	DST,	%rsi	/* Pointer to next destination data */
+	.set	SRC,	%rdx	/* Pointer to next source data */
+	vmovdqu		(SRC), %xmm0
+	movl		480(HANDLE), KLEN
+
+	_aeskl		enc
+	jz		.Lerror
+	xor		%rax, %rax
+	vmovdqu		%xmm0, (DST)
+	RET
+.Lerror:
+	mov		$(-EINVAL), %eax
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * Calculate the next 128-bit XTS tweak by multiplying the polynomial 'x'
+ * with the current tweak stored in the register \src, and store the
+ * result in the \dst register.
+ */
+.macro _next_tweak	src, tmp, dst
+	vpshufd		$0x13, \src, \tmp
+	vpaddq		\src, \src, \dst
+	vpsrad		$31, \tmp, \tmp
+	vpand		GF_POLY, \tmp, \tmp
+	vpxor		\tmp, \dst, \dst
+.endm
+
+.macro _aeskl_xts_crypt operation
+	vmovdqa		.Lgf_poly(%rip), GF_POLY
+	vmovups		(TWEAK), TWEAK_NEXT
+	mov		480(HANDLE), KLEN
+
+.ifc \operation, dec
+	/*
+	 * During decryption, if the message length is not a multiple of
+	 * the AES block length, exclude the last complete block from the
+	 * decryption loop by subtracting 16 from LEN. This adjustment is
+	 * necessary because ciphertext stealing decryption uses the last
+	 * two tweaks in reverse order. Special handling is required for
+	 * the last complete block and any remaining partial block at the
+	 * end.
+	 */
+	test		$15, LEN
+	jz		.L8block_at_a_time\@
+	sub		$16, LEN
+.endif
+
+.L8block_at_a_time\@:
+	sub		$128, LEN
+	jl		.Lhandle_remainder\@
+
+	vpxor		(SRC), TWEAK_NEXT, %xmm0
+	vmovups		TWEAK_NEXT, (DST)
+
+	/*
+	 * Calculate and cache tweak values. Note that the tweak
+	 * computation cannot be interleaved with AES rounds here using
+	 * Key Locker instructions.
+	 */
+	_next_tweak	TWEAK_NEXT,  %xmm1, TWEAK_NEXT1
+	_next_tweak	TWEAK_NEXT1, %xmm1, TWEAK_NEXT2
+	_next_tweak	TWEAK_NEXT2, %xmm1, TWEAK_NEXT3
+	_next_tweak	TWEAK_NEXT3, %xmm1, TWEAK_NEXT4
+	_next_tweak	TWEAK_NEXT4, %xmm1, TWEAK_NEXT5
+	_next_tweak	TWEAK_NEXT5, %xmm1, TWEAK_NEXT6
+	_next_tweak	TWEAK_NEXT6, %xmm1, TWEAK_NEXT7
+
+	/* XOR each source block with its tweak. */
+	vpxor		0x10(SRC), TWEAK_NEXT1, %xmm1
+	vpxor		0x20(SRC), TWEAK_NEXT2, %xmm2
+	vpxor		0x30(SRC), TWEAK_NEXT3, %xmm3
+	vpxor		0x40(SRC), TWEAK_NEXT4, %xmm4
+	vpxor		0x50(SRC), TWEAK_NEXT5, %xmm5
+	vpxor		0x60(SRC), TWEAK_NEXT6, %xmm6
+	vpxor		0x70(SRC), TWEAK_NEXT7, %xmm7
+
+	/* Encrypt or decrypt 8 blocks per iteration. */
+	_aesklwide	\operation
+	jz		.Lerror\@
+
+	/* XOR tweaks again. */
+	vpxor		(DST), %xmm0, %xmm0
+	vpxor		TWEAK_NEXT1, %xmm1, %xmm1
+	vpxor		TWEAK_NEXT2, %xmm2, %xmm2
+	vpxor		TWEAK_NEXT3, %xmm3, %xmm3
+	vpxor		TWEAK_NEXT4, %xmm4, %xmm4
+	vpxor		TWEAK_NEXT5, %xmm5, %xmm5
+	vpxor		TWEAK_NEXT6, %xmm6, %xmm6
+	vpxor		TWEAK_NEXT7, %xmm7, %xmm7
+
+	/* Store destination blocks. */
+	vmovdqu		%xmm0, 0x00(DST)
+	vmovdqu		%xmm1, 0x10(DST)
+	vmovdqu		%xmm2, 0x20(DST)
+	vmovdqu		%xmm3, 0x30(DST)
+	vmovdqu		%xmm4, 0x40(DST)
+	vmovdqu		%xmm5, 0x50(DST)
+	vmovdqu		%xmm6, 0x60(DST)
+	vmovdqu		%xmm7, 0x70(DST)
+
+	_next_tweak	TWEAK_NEXT7, TWEAK_TMP, TWEAK_NEXT
+	add		$128, SRC
+	add		$128, DST
+	test		LEN, LEN
+	jz		.Lend\@
+	jmp		.L8block_at_a_time\@
+
+.Lhandle_remainder\@:
+	add		$128, LEN
+	jz		.Lend\@
+.ifc \operation, enc
+	vmovdqu		%xmm7, %xmm0
+.endif
+	sub		$16, LEN
+	jl		.Lcts\@
+
+	/* Encrypt or decrypt one block per iteration */
+.Lblock_at_a_time\@:
+	vpxor		(SRC), TWEAK_NEXT, %xmm0
+	_aeskl		\operation
+	jz		.Lerror\@
+	vpxor		TWEAK_NEXT, %xmm0, %xmm0
+	_next_tweak	TWEAK_NEXT, TWEAK_TMP, TWEAK_NEXT
+	test		LEN, LEN
+	jz		.Lout\@
+
+	add		$16, SRC
+	vmovdqu		%xmm0, (DST)
+	add		$16, DST
+	sub		$16, LEN
+	jge		.Lblock_at_a_time\@
+
+.Lcts\@:
+.ifc \operation, dec
+	/*
+	 * If decrypting, the last block was not decrypted because CTS
+	 * decryption uses the last two tweaks in reverse order. Handle it
+	 * here by advancing the tweak and decrypting that last block.
+	 */
+	_next_tweak	TWEAK_NEXT, TWEAK_TMP, %xmm4
+	vpxor		(SRC), %xmm4, %xmm0
+	_aeskl		\operation
+	jz		.Lerror\@
+	vpxor		%xmm4, %xmm0, %xmm0
+	add		$16, SRC
+.else
+	/*
+	 * If encrypting, the last block was already encrypted in %xmm0.
+	 * Prepare the CTS encryption by rewinding the pointer.
+	 */
+	sub		$16, DST
+.endif
+	lea		.Lcts_permute_table(%rip), TMP
+
+	/* Load the source partial block */
+	vmovdqu		(SRC, LEN, 1), %xmm3
+
+	/*
+	 * Shift the first LEN bytes of the encryption and decryption of
+	 * the last block to the end of a register, then store it to
+	 * DST+LEN.
+	 */
+	add		$16, LEN
+	vpshufb		(TMP, LEN, 1), %xmm0, %xmm2
+	vmovdqu		%xmm2, (DST, LEN, 1)
+
+	/* Shift the source partial block to the beginning */
+	sub		LEN, TMP
+	vmovdqu		32(TMP), %xmm2
+	vpshufb		%xmm2, %xmm3, %xmm3
+
+	/* Blend to generate the source partial block */
+	vpblendvb	%xmm2, %xmm0, %xmm3, %xmm3
+
+	/* Encrypt or decrypt again and store the last block. */
+	vpxor		TWEAK_NEXT, %xmm3, %xmm0
+	_aeskl		\operation
+	jz		.Lerror\@
+	vpxor		TWEAK_NEXT, %xmm0, %xmm0
+	vmovdqu		%xmm0, (DST)
+
+	xor		%rax, %rax
+	RET
+.Lout\@:
+	vmovdqu		%xmm0, (DST)
+.Lend\@:
+	vmovups		TWEAK_NEXT, (TWEAK)
+	xor		%rax, %rax
+	RET
+.Lerror\@:
+	mov		$(-EINVAL), %eax
+	RET
+.endm
+
+/*
+ * int __aeskl_xts_encrypt(const struct aeskl_ctx *handle, u8 *dst,
+ *			   const u8 *src, unsigned int len, u8 *tweak)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	.set	HANDLE,	%rdi	/* Pointer to struct aeskl_ctx */
+	.set	DST,	%rsi	/* Pointer to next destination data */
+	.set	SRC,	%rdx	/* Pointer to next source data */
+	.set	LEN,	%rcx	/* Remaining length in bytes */
+	.set	TWEAK,	%r8	/* Pointer to next tweak */
+	_aeskl_xts_crypt	enc
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct aeskl_ctx *handle, u8 *dst,
+ *			   const u8 *src, unsigned int len, u8 *tweak)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	.set	HANDLE,	%rdi	/* Pointer to struct aeskl_ctx */
+	.set	DST,	%rsi	/* Pointer to next destination data */
+	.set	SRC,	%rdx	/* Pointer to next source data */
+	.set	LEN,	%rcx	/* Remaining length in bytes */
+	.set	TWEAK,	%r8	/* Pointer to next tweak */
+	_aeskl_xts_crypt	dec
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl_glue.c b/arch/x86/crypto/aeskl_glue.c
new file mode 100644
index 000000000000..51b8daf7e72a
--- /dev/null
+++ b/arch/x86/crypto/aeskl_glue.c
@@ -0,0 +1,409 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-xts-x86_64.S.
+ *
+ * Most code is based on aesni-intel_glue.c
+ */
+
+#include <linux/err.h>
+#include <linux/types.h>
+#include <linux/module.h>
+
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/simd.h>
+
+#include "aesni-xts.h"
+
+#define AES_ALIGN		16
+#define AES_ALIGN_ATTR		__attribute__ ((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+
+#define AESKL_AAD_SIZE		16
+#define AESKL_TAG_SIZE		16
+#define AESKL_CIPHERTEXT_MAX	AES_KEYSIZE_256
+
+/* The Key Locker handle is an encoded form of the AES key. */
+struct aeskl_handle {
+	u8 additional_authdata[AESKL_AAD_SIZE];
+	u8 integrity_tag[AESKL_TAG_SIZE];
+	u8 cipher_text[AESKL_CIPHERTEXT_MAX];
+};
+
+/*
+ * Key Locker does not support 192-bit key size. The driver needs to
+ * retrieve the key size in the first place. The offset of the
+ * 'key_length' field here must be compatible with struct
+ * crypto_aes_ctx.
+ */
+#define AESKL_CTX_RESERVED (sizeof(struct crypto_aes_ctx) \
+			    - sizeof(struct aeskl_handle) \
+			    - sizeof(u32))
+
+struct aeskl_ctx {
+	struct aeskl_handle handle;
+	u8 reserved[AESKL_CTX_RESERVED];
+	u32 key_length;
+};
+
+/*
+ * Unify the two context structures to represent the crypto context.
+ * Depending on the key size, either AES-KL or AES-NI will be used.
+ */
+union x86_aes_ctx {
+	struct aeskl_ctx      aeskl;
+	struct crypto_aes_ctx aesni;
+};
+
+struct xts_aes_ctx {
+	union x86_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	union x86_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline struct xts_aes_ctx *xts_aes_ctx(struct crypto_skcipher *tfm)
+{
+	void *addr = crypto_skcipher_ctx(tfm);
+
+	if (crypto_tfm_ctx_alignment() >= AES_ALIGN)
+		return addr;
+
+	return PTR_ALIGN(addr, AES_ALIGN);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+	struct xts_aes_ctx *ctx = xts_aes_ctx(crypto_skcipher_reqtfm(req));
+
+	BUILD_BUG_ON(offsetof(struct crypto_aes_ctx, key_length) !=
+		     offsetof(struct aeskl_ctx, key_length));
+
+	return ctx->crypt_ctx.aeskl.key_length;
+}
+
+asmlinkage void __aeskl_setkey(struct aeskl_ctx *handle, const u8 *key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *handle, u8 *dst, const u8 *src);
+
+asmlinkage int __aeskl_xts_encrypt(const struct aeskl_ctx *handle, u8 *dst, const u8 *src,
+				   unsigned int len, u8 *tweak);
+asmlinkage int __aeskl_xts_decrypt(const struct aeskl_ctx *handle, u8 *dst, const u8 *src,
+				   unsigned int len, u8 *tweak);
+
+/*
+ * If a hardware failure occurs, the wrapping key may be lost during
+ * sleep states. The state of the feature can be retrieved via
+ * valid_keylocker().
+ *
+ * Since disabling can occur preemptively, check for availability on
+ * every use along with kernel_fpu_begin().
+ */
+
+static int aeskl_setkey(struct aeskl_ctx *ctx, const u8 *in_key, unsigned int keylen)
+{
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	kernel_fpu_begin();
+	if (!valid_keylocker()) {
+		kernel_fpu_end();
+		return -ENODEV;
+	}
+
+	__aeskl_setkey(ctx, in_key, keylen);
+	kernel_fpu_end();
+	return 0;
+}
+
+static int aeskl_xts_encrypt_iv(const struct aeskl_ctx *tweak_key,
+				u8 iv[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(tweak_key, iv, iv);
+}
+
+static int aeskl_xts_encrypt(const struct aeskl_ctx *key,
+			     const u8 *src, u8 *dst, unsigned int len,
+			     u8 tweak[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(key, dst, src, len, tweak);
+}
+
+static int aeskl_xts_decrypt(const struct aeskl_ctx *key,
+			     const u8 *src, u8 *dst, unsigned int len,
+			     u8 tweak[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(key, dst, src, len, tweak);
+}
+
+/*
+ * The glue code in xts_crypt() and xts_crypt_slowpath() follows
+ * aesni-intel_glue.c. While this code could be shared, the difference in
+ * key material format would require more invasive changes on the AES-NI
+ * side.
+ */
+
+enum xts_ops {
+	XTS_ENCRYPTION,
+	XTS_DECRYPTION
+};
+
+/* This handles cases where the source and/or destination span pages. */
+static noinline int xts_crypt_slowpath(struct skcipher_request *req, enum xts_ops ops)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct xts_aes_ctx *ctx = xts_aes_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct scatterlist sg_src[2], sg_dst[2];
+	struct skcipher_request subreq;
+	struct scatterlist *src, *dst;
+	struct skcipher_walk walk;
+	int err;
+
+	/*
+	 * If the message length isn't divisible by the AES block size, then
+	 * separate off the last full block and the partial block.  This ensures
+	 * that they are processed in the same call to the assembly function,
+	 * which is required for ciphertext stealing.
+	 */
+	if (tail) {
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   req->cryptlen - tail - AES_BLOCK_SIZE,
+					   req->iv);
+		req = &subreq;
+	}
+
+	err = skcipher_walk_virt(&walk, req, false);
+
+	while (walk.nbytes) {
+		kernel_fpu_begin();
+		if (ops == XTS_ENCRYPTION) {
+			err |= aeskl_xts_encrypt(&ctx->crypt_ctx.aeskl, walk.src.virt.addr,
+						 walk.dst.virt.addr,
+						 walk.nbytes & ~(AES_BLOCK_SIZE - 1), req->iv);
+		} else {
+			err |= aeskl_xts_decrypt(&ctx->crypt_ctx.aeskl, walk.src.virt.addr,
+						 walk.dst.virt.addr,
+						 walk.nbytes & ~(AES_BLOCK_SIZE - 1), req->iv);
+		}
+		kernel_fpu_end();
+		err |= skcipher_walk_done(&walk,
+					  walk.nbytes & (AES_BLOCK_SIZE - 1));
+	}
+
+	if (err || !tail)
+		return err;
+
+	/* Do ciphertext stealing with the last full block and partial block. */
+
+	dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+	if (req->dst != req->src)
+		dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+	skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+				   req->iv);
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (err)
+		return err;
+
+	kernel_fpu_begin();
+	if (ops == XTS_ENCRYPTION) {
+		err = aeskl_xts_encrypt(&ctx->crypt_ctx.aeskl, walk.src.virt.addr,
+					walk.dst.virt.addr, walk.nbytes, req->iv);
+	} else {
+		err = aeskl_xts_decrypt(&ctx->crypt_ctx.aeskl, walk.src.virt.addr,
+					walk.dst.virt.addr, walk.nbytes, req->iv);
+	}
+	kernel_fpu_end();
+	if (err)
+		return err;
+
+	return skcipher_walk_done(&walk, 0);
+}
+
+/* __always_inline to avoid indirect call in fastpath */
+static __always_inline int xts_crypt(struct skcipher_request *req, enum xts_ops ops)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct xts_aes_ctx *ctx = xts_aes_ctx(tfm);
+	const unsigned int cryptlen = req->cryptlen;
+	struct scatterlist *src = req->src;
+	struct scatterlist *dst = req->dst;
+	int err;
+
+	if (unlikely(cryptlen < AES_BLOCK_SIZE))
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	err = aeskl_xts_encrypt_iv(&ctx->tweak_ctx.aeskl, req->iv);
+	if (err)
+		goto out;
+
+	/*
+	 * In practice, virtually all XTS plaintexts and ciphertexts are either
+	 * 512 or 4096 bytes, aligned such that they don't span page boundaries.
+	 * To optimize the performance of these cases, and also any other case
+	 * where no page boundary is spanned, the below fast-path handles
+	 * single-page sources and destinations as efficiently as possible.
+	 */
+	if (likely(src->length >= cryptlen && dst->length >= cryptlen &&
+		   src->offset + cryptlen <= PAGE_SIZE &&
+		   dst->offset + cryptlen <= PAGE_SIZE)) {
+		struct page *src_page = sg_page(src);
+		struct page *dst_page = sg_page(dst);
+		void *src_virt = kmap_local_page(src_page) + src->offset;
+		void *dst_virt = kmap_local_page(dst_page) + dst->offset;
+
+		if (ops == XTS_ENCRYPTION) {
+			err = aeskl_xts_encrypt(&ctx->crypt_ctx.aeskl, src_virt,
+						dst_virt, cryptlen, req->iv);
+		} else {
+			err = aeskl_xts_decrypt(&ctx->crypt_ctx.aeskl, src_virt,
+						dst_virt, cryptlen, req->iv);
+		}
+		/* Unmap unconditionally so the kmap_local pair is balanced on error too. */
+		kunmap_local(dst_virt);
+		kunmap_local(src_virt);
+		kernel_fpu_end();
+		return err;
+	}
+out:
+	kernel_fpu_end();
+	if (err)
+		return err;
+	return xts_crypt_slowpath(req, ops);
+}
+
+static int xts_setkey_aeskl(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen)
+{
+	struct xts_aes_ctx *ctx = xts_aes_ctx(tfm);
+	unsigned int aes_keylen;
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	aes_keylen = keylen / 2;
+	err = aes_check_keylen(aes_keylen);
+	if (err)
+		return err;
+
+	if (unlikely(aes_keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		return xts_setkey_aesni(tfm, key, keylen);
+	}
+
+	err = aeskl_setkey(&ctx->crypt_ctx.aeskl, key, aes_keylen);
+	if (err)
+		return err;
+
+	return aeskl_setkey(&ctx->tweak_ctx.aeskl, key + aes_keylen, aes_keylen);
+}
+
+static int xts_encrypt_aeskl(struct skcipher_request *req)
+{
+	if (unlikely(xts_keylen(req) == AES_KEYSIZE_192))
+		return xts_encrypt_aesni(req);
+
+	return xts_crypt(req, XTS_ENCRYPTION);
+}
+
+static int xts_decrypt_aeskl(struct skcipher_request *req)
+{
+	if (unlikely(xts_keylen(req) == AES_KEYSIZE_192))
+		return xts_decrypt_aesni(req);
+
+	return xts_crypt(req, XTS_DECRYPTION);
+}
+
+#define XTS_AES_CTX_SIZE (sizeof(struct xts_aes_ctx) + AES_ALIGN_EXTRA)
+
+/*
+ * The 'cra_priority' value is intentionally set lower than
+ * xts-aes-aesni.
+ */
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= xts_setkey_aeskl,
+		.encrypt	= xts_encrypt_aeskl,
+		.decrypt	= xts_decrypt_aeskl,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	/*
+	 * For performance, use the Key Locker AES wide and AVX
+	 * instructions.
+	 */
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+	if (!boot_cpu_has(X86_FEATURE_AVX))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not rely on AES-NI. But, AES-KL does not
+	 * support 192-bit keys. To ensure AES compliance, AES-KL falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					      aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index ef031655b2d3..49fb56efac56 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -35,7 +35,7 @@
 #include <linux/workqueue.h>
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
-
+#include "aesni-xts.h"
 
 #define AESNI_ALIGN	16
 #define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
@@ -864,8 +864,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 }
 #endif
 
-static int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
-			    unsigned int keylen)
+int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
+		     unsigned int keylen)
 {
 	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
 	int err;
@@ -884,6 +884,7 @@ static int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
 	/* second half of xts-key is for tweak */
 	return aes_set_key_common(&ctx->tweak_ctx, key + keylen, keylen);
 }
+EXPORT_SYMBOL_GPL(xts_setkey_aesni);
 
 typedef void (*xts_encrypt_iv_func)(const struct crypto_aes_ctx *tweak_key,
 				    u8 iv[AES_BLOCK_SIZE]);
@@ -1020,15 +1021,17 @@ static void aesni_xts_decrypt(const struct crypto_aes_ctx *key,
 	aesni_xts_dec(key, dst, src, len, tweak);
 }
 
-static int xts_encrypt_aesni(struct skcipher_request *req)
+int xts_encrypt_aesni(struct skcipher_request *req)
 {
 	return xts_crypt(req, aesni_xts_encrypt_iv, aesni_xts_encrypt);
 }
+EXPORT_SYMBOL_GPL(xts_encrypt_aesni);
 
-static int xts_decrypt_aesni(struct skcipher_request *req)
+int xts_decrypt_aesni(struct skcipher_request *req)
 {
 	return xts_crypt(req, aesni_xts_encrypt_iv, aesni_xts_decrypt);
 }
+EXPORT_SYMBOL_GPL(xts_decrypt_aesni);
 
 static struct crypto_alg aesni_cipher_alg = {
 	.cra_name		= "aes",
diff --git a/arch/x86/crypto/aesni-xts.h b/arch/x86/crypto/aesni-xts.h
new file mode 100644
index 000000000000..9833da2bd9d2
--- /dev/null
+++ b/arch/x86/crypto/aesni-xts.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _AESNI_XTS_H
+#define _AESNI_XTS_H
+
+/*
+ * These AES-NI functions are used by the AES-KL code as a fallback when
+ * a 192-bit key is provided. Key Locker does not support 192-bit keys.
+ */
+
+int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen);
+int xts_encrypt_aesni(struct skcipher_request *req);
+int xts_decrypt_aesni(struct skcipher_request *req);
+
+#endif /* _AESNI_XTS_H */
-- 
2.34.1