[PATCH v8 00/12] x86: Support Key Locker
Posted by Chang S. Bae 2 years, 8 months ago
Hi all,

Posting V8 here, with a brief status of this enabling:

The last two revisions have mainly updated the crypto code. The
existing AES-XTS code was improved further before being shared with
the new code. Then, the new implementation was tuned for the AES-XTS
mode, which aligns with the claimed use case -- dm-crypt.

But I'd say some additional changes might still be needed.

The overall changes addressing the last review:

  * PATCH12:
    - Clarify some documentation (Eric Biggers)
    - Simplify (Eric Biggers) and cleanup code

  * PATCH11:
    - Remove dead code.

  * PATCH10:
    - Deduplicate the alignment code (Eric Biggers)

The series can be found in this repo:
    git://github.com/intel-staging/keylocker.git kl-v8

The overall diff against v7 is available at:
    https://raw.githubusercontent.com/intel-staging/keylocker/diff/kl-v8-vs-v7.diff

The feature is already available on recent Intel client systems. The
V3 cover letter covered the usage, the threat model and other details:
    https://lore.kernel.org/lkml/20211124200700.15888-1-chang.seok.bae@intel.com/

And the V6 cover letter followed up with updated performance data:
    https://lore.kernel.org/lkml/20230410225936.8940-1-chang.seok.bae@intel.com/

V7 posting:
    https://lore.kernel.org/lkml/20230524165717.14062-1-chang.seok.bae@intel.com/

Thanks,
Chang

Chang S. Bae (12):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load a wrapping key at boot-time
  x86/PM/keylocker: Restore the wrapping key on the resume from ACPI
    S3/4
  x86/cpu: Add a configuration and command line option for Key Locker
  crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  crypto: x86/aes - Prepare for a new AES-XTS implementation
  crypto: x86/aes-kl - Implement the AES-XTS algorithm

 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/arch/x86/index.rst              |   1 +
 Documentation/arch/x86/keylocker.rst          |  97 +++
 arch/x86/Kconfig                              |   3 +
 arch/x86/crypto/Kconfig                       |  22 +
 arch/x86/crypto/Makefile                      |   3 +
 arch/x86/crypto/aes-helper_asm.S              |  22 +
 arch/x86/crypto/aes-helper_glue.h             | 161 +++++
 arch/x86/crypto/aeskl-intel_asm.S             | 552 ++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c            | 188 ++++++
 arch/x86/crypto/aesni-intel_asm.S             |  55 +-
 arch/x86/crypto/aesni-intel_glue.c            | 241 +++-----
 arch/x86/crypto/aesni-intel_glue.h            |  16 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/keylocker.h              |  45 ++
 arch/x86/include/asm/msr-index.h              |   6 +
 arch/x86/include/asm/special_insns.h          |  32 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  21 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/keylocker.c                   | 212 +++++++
 arch/x86/kernel/smpboot.c                     |   2 +
 arch/x86/lib/x86-opcode-map.txt               |  11 +-
 arch/x86/power/cpu.c                          |   2 +
 tools/arch/x86/lib/x86-opcode-map.txt         |  11 +-
 27 files changed, 1507 insertions(+), 211 deletions(-)
 create mode 100644 Documentation/arch/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 054377e4774eee812b7930933d7a354ed5a7ddd6
-- 
2.17.1
[PATCH v9 00/14] x86: Support Key Locker
Posted by Chang S. Bae 1 year, 10 months ago
Hi all,

With this posting, I first want to make sure these code changes are
acknowledged:

The previous enabling effort was paused to address vulnerabilities
[1][2] that could compromise Key Locker's ability to protect AES keys.
Now, with the mitigations mainlined [3][4], patches (Patch 10-11) were
added to ensure that these mitigations are applied.

During this period, there was a significant change in mainline: commit
b81fac906a8f ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"). This affected Key Locker's initialization
code, which clobbers XMM registers when loading a wrapping key and thus
depends on FPU initialization.

In this revision, the setup code was split so that the initialization
part is invoked during arch_initcall(). The remaining code, which
copies the wrapping key from the backup, resides in the identify_cpu()
-> setup_keylocker() path. This separation simplifies the code and
resolves an issue with hotplug.

The remaining changes mainly focus on the AES crypto driver, addressing
feedback from Eric. Notably, while doing so, it became clear that it is
better to disallow a module build. Key Locker's AES instructions do not
support 192-bit keys. Supporting a module build would require exporting
some AES-NI functions, leading to performance-impacting indirect calls.
I think we can revisit module support later if necessary.

Then, the following is a summary of changes per patch since v8 [6]:

PATCH7-8:
* Invoke the setup code via arch_initcall() due to upstream changes
  delaying the FPU setup.

PATCH9-11:
* Add new patches for security and hotplug support clarification

PATCH12:
* Drop the "nokeylocker" option. (Borislav Petkov)

PATCH13:
* Introduce 'union x86_aes_ctx'. (Eric Biggers)
* Ensure 'inline' for wrapper functions.

PATCH14:
* Combine the XTS enc/dec assembly code in a macro.  (Eric Biggers)
* Define setkey() as void instead of returning 'int'.  (Eric Biggers)
* Rearrange the assembly code to reduce jumps, especially for success
  cases.  (Eric Biggers)
* Update the changelog for clarification. (Eric Biggers)
* Exclude module build.

This series is based on my AES-NI setkey() cleanup [7], which was
recently merged into the crypto repository [8]; I thought it was better
for that to go first. You can also find this series here:
    git://github.com/intel-staging/keylocker.git kl-v9

Thanks,
Chang

[1] Gather Data Sampling (GDS)
    https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
[2] Register File Data Sampling (RFDS)
    https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
[3] Mainlining of GDS mitigation
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64094e7e3118aff4b0be8ff713c242303e139834
[4] Mainlining of RFDS Mitigation
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0e33cf955f07e3991e45109cb3e29fbc9ca51d06
[5] Initialize FPU late
    https://lore.kernel.org/lkml/168778151512.3634408.11432553576702911909.tglx@vps.praguecc.cz/
[6] V8: https://lore.kernel.org/lkml/20230603152227.12335-1-chang.seok.bae@intel.com/
[7] https://lore.kernel.org/lkml/20240322230459.456606-1-chang.seok.bae@intel.com/
[8] git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git

Chang S. Bae (14):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load a wrapping key at boot time
  x86/PM/keylocker: Restore the wrapping key on the resume from ACPI
    S3/4
  x86/hotplug/keylocker: Ensure wrapping key backup capability
  x86/cpu/keylocker: Check Gather Data Sampling mitigation
  x86/cpu/keylocker: Check Register File Data Sampling mitigation
  x86/Kconfig: Add a configuration for Key Locker
  crypto: x86/aes - Prepare for new AES-XTS implementation
  crypto: x86/aes-kl - Implement the AES-XTS algorithm

 Documentation/arch/x86/index.rst            |   1 +
 Documentation/arch/x86/keylocker.rst        |  96 +++++
 arch/x86/Kconfig                            |   3 +
 arch/x86/Kconfig.assembler                  |   5 +
 arch/x86/crypto/Kconfig                     |  17 +
 arch/x86/crypto/Makefile                    |   3 +
 arch/x86/crypto/aes-helper_asm.S            |  22 ++
 arch/x86/crypto/aes-helper_glue.h           | 168 ++++++++
 arch/x86/crypto/aeskl-intel_asm.S           | 412 ++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c          | 187 +++++++++
 arch/x86/crypto/aeskl-intel_glue.h          |  35 ++
 arch/x86/crypto/aesni-intel_asm.S           |  47 +--
 arch/x86/crypto/aesni-intel_glue.c          | 193 ++-------
 arch/x86/crypto/aesni-intel_glue.h          |  40 ++
 arch/x86/include/asm/cpufeatures.h          |   1 +
 arch/x86/include/asm/disabled-features.h    |   8 +-
 arch/x86/include/asm/keylocker.h            |  42 ++
 arch/x86/include/asm/msr-index.h            |   6 +
 arch/x86/include/asm/special_insns.h        |  28 ++
 arch/x86/include/uapi/asm/processor-flags.h |   2 +
 arch/x86/kernel/Makefile                    |   1 +
 arch/x86/kernel/cpu/common.c                |   4 +-
 arch/x86/kernel/cpu/cpuid-deps.c            |   1 +
 arch/x86/kernel/keylocker.c                 | 219 +++++++++++
 arch/x86/lib/x86-opcode-map.txt             |  11 +-
 arch/x86/power/cpu.c                        |   2 +
 tools/arch/x86/lib/x86-opcode-map.txt       |  11 +-
 27 files changed, 1363 insertions(+), 202 deletions(-)
 create mode 100644 Documentation/arch/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 3a447c31d337bdec7fbc605a7a1e00aff4c492d0
-- 
2.34.1
Re: [PATCH v9 00/14] x86: Support Key Locker
Posted by Chang S. Bae 1 year, 10 months ago
On 3/28/2024 6:53 PM, Chang S. Bae wrote:
> 
> Then, the following is a summary of changes per patch since v8 [6]:
> 
> PATCH7-8:
> * Invoke the setup code via arch_initcall() due to upstream changes
>    delaying the FPU setup.
> 
> PATCH9-11:
> * Add new patches for security and hotplug support clarification

I've recently made updates to a few patches, primarily related to the
mitigation parts. While the series is still under review, Eric's VAES
patches have been merged into the crypto tree and are currently being
sorted out. Once things settle down, I will make a few adjustments on the
crypto side. Then, another revision will be necessary thereafter.

Thanks,
Chang
Re: [PATCH v9 00/14] x86: Support Key Locker
Posted by Eric Biggers 1 year, 10 months ago
Hi,

On Sun, Apr 07, 2024 at 04:24:18PM -0700, Chang S. Bae wrote:
> On 3/28/2024 6:53 PM, Chang S. Bae wrote:
> > 
> > Then, the following is a summary of changes per patch since v8 [6]:
> > 
> > PATCH7-8:
> > * Invoke the setup code via arch_initcall() due to upstream changes
> >    delaying the FPU setup.
> > 
> > PATCH9-11:
> > * Add new patches for security and hotplug support clarification
> 
> I've recently made updates to a few patches, primarily related to the
> mitigation parts. While the series is still under review, Eric's VAES
> patches have been merged into the crypto tree and are currently being
> sorted out. Once things settle down, I will make a few adjustments on the
> crypto side. Then, another revision will be necessary thereafter.
> 
> Thanks,
> Chang

Thanks for the updated patchset!

Do you have a plan for how this will be merged?  Which trees will the patches go
through?  I think that the actual AES-XTS implementation could still use a bit
more polishing; see my comments below.  However, patches 1-12 don't need to wait
for that.  Perhaps the x86 maintainers would like to take patches 1-12 for
v6.10?  Then the AES-XTS support can go through the crypto tree afterwards.

As you noticed, this cycle I've been optimizing AES-XTS for x86_64 by adding new
VAES and AES-NI + AVX implementations.  I have some ideas for the Key Locker
based implementation of AES-XTS:

First, surely it's the case that in practice, all CPUs that support Key Locker
also support AVX?  If so, then there's no need for the Key Locker assembly to
use legacy SSE instructions.  It should instead target AVX and use VEX-coded
instructions.  This would save some instructions and improve performance.

Since the Key Locker assembly only supports 64-bit mode, it should also feel
free to use registers xmm8-xmm15 for purposes such as caching the XTS tweaks.
This would improve performance.

Since the Key Locker assembly advances a large number of XTS tweaks at a time
(8), I'm also wondering if it would be faster to multiply by x^8 directly
instead of multiplying by x sequentially eight times.  This can be done using
the pclmulqdq instruction; see aes-xts-avx-x86_64.S which implements this
optimization.  Probably all CPUs that support Key Locker also support PCLMULQDQ.
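
To illustrate the idea -- a rough intrinsics sketch with a made-up
function name (the in-kernel version would be assembly):

#include <immintrin.h>

/*
 * Advance an XTS tweak by 8 blocks at once: multiply by x^8 in
 * GF(2^128) modulo x^128 + x^7 + x^2 + x + 1.
 */
static __m128i xts_mul_x8(__m128i tweak)
{
	const __m128i poly = _mm_set_epi64x(0, 0x87);	/* reduction polynomial */
	__m128i carry = _mm_srli_si128(tweak, 15);	/* the byte shifted out past bit 127 */
	__m128i shifted = _mm_slli_si128(tweak, 1);	/* tweak << 8 */

	/*
	 * Fold the carried-out byte back in.  carry * 0x87 is at most
	 * 15 bits, so a single reduction step is enough.
	 */
	return _mm_xor_si128(shifted, _mm_clmulepi64_si128(carry, poly, 0x00));
}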

I'm also trying to think of the best way to organize the Key Locker AES-XTS glue
code.  I see that you're proposing to share the glue code with the existing
AES-XTS implementations.  Unfortunately I don't think this ends up working very
well, due to the facts that the Key Locker code can return errors and uses a
different key type.  I think that for now, I'd prefer that you simply copied the
XTS glue code into aeskl-intel_glue.c and modified it as needed.  (But make sure
to use the new version of the glue code, which is faster.)

For falling back to AES-NI, I think the cleanest solution is to call the
top-level setkey, encrypt, and decrypt functions (the ones that are set in the
xts-aes-aesni skcipher_alg), instead of calling lower-level functions as your
current patchset does.
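
Something along these lines is what I have in mind (sketch only; all
names here are placeholders, with xts_aesni_setkey()/xts_aesni_encrypt()
standing in for whatever xts-aes-aesni actually registers):

/* Sketch only -- not the actual symbols. */
struct aeskl_ctx {
	bool use_aesni;	/* set for 192-bit keys, which AES-KL can't handle */
	/* ... key handle or fallback AES-NI context ... */
};

static int xts_aeskl_setkey(struct crypto_skcipher *tfm, const u8 *key,
			    unsigned int keylen)
{
	struct aeskl_ctx *ctx = crypto_skcipher_ctx(tfm);

	ctx->use_aesni = (keylen == AES_KEYSIZE_192);
	if (ctx->use_aesni)	/* delegate entirely to the AES-NI top level */
		return xts_aesni_setkey(tfm, key, keylen);
	return aeskl_encode_key(ctx, key, keylen);
}

static int xts_aeskl_encrypt(struct skcipher_request *req)
{
	struct aeskl_ctx *ctx =
		crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));

	if (ctx->use_aesni)
		return xts_aesni_encrypt(req);	/* top-level AES-NI path */
	return aeskl_xts_crypt(req, true);	/* true: encrypt */
}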

If you could keep the Key Locker assembly roughly stylistically consistent with
the new aes-xts-avx-x86_64.S, that would be great too.

Do you happen to know if there's any way to test the Key Locker AES-XTS code
without having access to a bare metal machine with a CPU that supports Key
Locker?  I tried a Sapphire Rapids based VM in Google Compute Engine, but it
doesn't enumerate Key Locker.  I also don't see anything in QEMU related to Key
Locker.  So I don't currently have an easy way to test this patchset.

Finally, a high level question.  Key Locker has been reported to be
substantially slower than AES-NI.  At the same time, VAES has recently doubled
performance over AES-NI.  I'd guess this leaves Key Locker even further behind.
Given that, how useful is this patchset?  I'm a bit concerned that this might be
something that sounds good in theory but won't be used in practice.  Are
performance improvements for Key Locker on the horizon?  (Well, there are the
improvements I suggested above, which should help, but it sounds like main issue
is the Key Locker instructions themselves which are just fundamentally slower.)

- Eric
Re: [PATCH v9 00/14] x86: Support Key Locker
Posted by Chang S. Bae 1 year, 9 months ago
Hi Eric,

Sorry for my delay. I just found your message in my cluttered junk 
folder today after returning from my week off.

On 4/7/2024 6:48 PM, Eric Biggers wrote:
> 
> Do you have a plan for how this will be merged?  Which trees will the patches go
> through?  I think that the actual AES-XTS implementation could still use a bit
> more polishing; see my comments below.  However, patches 1-12 don't need to wait
> for that.  Perhaps the x86 maintainers would like to take patches 1-12 for
> v6.10?  Then the AES-XTS support can go through the crypto tree afterwards.

Yeah, this series spans both the x86 code and the crypto driver. I
believe the decision should be made by the maintainers. But I suspect
they'll want to see well-established use case code before moving
forward.

> As you noticed, this cycle I've been optimizing AES-XTS for x86_64 by adding new
> VAES and AES-NI + AVX implementations.  I have some ideas for the Key Locker
> based implementation of AES-XTS:

Thanks for the effort! I agree that the code could be beneficial for 
daily disk encryption needs.

> First, surely it's the case that in practice, all CPUs that support Key Locker
> also support AVX?  If so, then there's no need for the Key Locker assembly to
> use legacy SSE instructions.  It should instead target AVX and use VEX-coded
> instructions.  This would save some instructions and improve performance.

Unfortunately, the Key Locker instructions using the AVX states were 
never implemented.

> Since the Key Locker assembly only supports 64-bit mode, it should also feel
> free to use registers xmm8-xmm15 for purposes such as caching the XTS tweaks.
> This would improve performance.
> 
> Since the Key Locker assembly advances a large number of XTS tweaks at a time
> (8), I'm also wondering if it would be faster to multiply by x^8 directly
> instead of multiplying by x sequentially eight times.  This can be done using
> the pclmulqdq instruction; see aes-xts-avx-x86_64.S which implements this
> optimization.  Probably all CPUs that support Key Locker also support PCLMULQDQ.

I'll revisit the assembly code to incorporate your suggestions.

> I'm also trying to think of the best way to organize the Key Locker AES-XTS glue
> code.  I see that you're proposing to share the glue code with the existing
> AES-XTS implementations.  Unfortunately I don't think this ends up working very
> well, due to the facts that the Key Locker code can return errors and uses a
> different key type.  I think that for now, I'd prefer that you simply copied the
> XTS glue code into aeskl-intel_glue.c and modified it as needed.  (But make sure
> to use the new version of the glue code, which is faster.)

I agree; the proposed glue code looks messy due to the different
return errors and key types. Ard made a point earlier [1] about
establishing shared common code, as the implementations are logically
quite similar. But I suppose it is more practical to pursue separate
glue code at this point.

> For falling back to AES-NI, I think the cleanest solution is to call the
> top-level setkey, encrypt, and decrypt functions (the ones that are set in the
> xts-aes-aesni skcipher_alg), instead of calling lower-level functions as your
> current patchset does.

Yes, falling back is indeed one of the ugly parts of this series. Let me 
retry this as you suggested.

> If you could keep the Key Locker assembly roughly stylistically consistent with
> the new aes-xts-avx-x86_64.S, that would be great too.

Okay.

> Do you happen to know if there's any way to test the Key Locker AES-XTS code
> without having access to a bare metal machine with a CPU that supports Key
> Locker?  I tried a Sapphire Rapids based VM in Google Compute Engine, but it
> doesn't enumerate Key Locker.  I also don't see anything in QEMU related to Key
> Locker.  So I don't currently have an easy way to test this patchset.

No, there isn't currently an emulation option available to the public
that I'm aware of. This feature has been available on client systems
since the Tiger Lake generation.

> Finally, a high level question.  Key Locker has been reported to be
> substantially slower than AES-NI.  At the same time, VAES has recently doubled
> performance over AES-NI.  I'd guess this leaves Key Locker even further behind.
> Given that, how useful is this patchset?  I'm a bit concerned that this might be
> something that sounds good in theory but won't be used in practice.  Are
> performance improvements for Key Locker on the horizon?  (Well, there are the
> improvements I suggested above, which should help, but it sounds like main issue
> is the Key Locker instructions themselves which are just fundamentally slower.)

On our latest implementations, we've observed that the Key Locker
performance with cryptsetup is roughly the same as what we posted
earlier [2]. Yes, this sounds like a fair assessment to me, especially
given your VAES code.

Thanks,
Chang

[1] 
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
[2] 
https://lore.kernel.org/lkml/20230410225936.8940-1-chang.seok.bae@intel.com/
Re: [PATCH v9 00/14] x86: Support Key Locker
Posted by Eric Biggers 1 year, 9 months ago
On Mon, Apr 15, 2024 at 03:16:18PM -0700, Chang S. Bae wrote:
> > First, surely it's the case that in practice, all CPUs that support Key Locker
> > also support AVX?  If so, then there's no need for the Key Locker assembly to
> > use legacy SSE instructions.  It should instead target AVX and use VEX-coded
> > instructions.  This would save some instructions and improve performance.
> 
> Unfortunately, the Key Locker instructions using the AVX states were never
> implemented.

Sure, you could still use VEX-coded 128-bit instructions for everything other
than the actual AES (for example, the XTS tweak computation) though, right?
They're a bit more convenient to work with since they are non-destructive.
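
For example, keeping the tweak live across blocks costs an extra move
in the legacy form that the VEX form avoids (trivial illustration):

	/* legacy SSE: pxor is destructive, so preserving the tweak needs a copy */
	movdqa	%xmm0, %xmm3		/* copy the tweak */
	pxor	%xmm1, %xmm3		/* xmm3 = tweak ^ block */

	/* VEX-coded: separate destination, %xmm0 keeps the tweak */
	vpxor	%xmm1, %xmm0, %xmm3	/* xmm3 = xmm0 ^ xmm1 */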

- Eric
Re: [PATCH v9 00/14] x86: Support Key Locker
Posted by Chang S. Bae 1 year, 9 months ago
On 4/15/2024 3:54 PM, Eric Biggers wrote:
> 
> Sure, you could still use VEX-coded 128-bit instructions for everything other
> than the actual AES (for example, the XTS tweak computation) though, right?
> They're a bit more convenient to work with since they are non-destructive.

Right.

Thanks,
Chang
[PATCH v9 01/14] Documentation/x86: Document Key Locker
Posted by Chang S. Bae 1 year, 10 months ago
Document the overview of the feature along with relevant considerations
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
---
Changes from v8:
* Change wording of documentation slightly. (Randy Dunlap and Bagas
  Sanjaya)

Changes from v6:
* Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
  documentation into Documentation/arch/"). (Nathan Huckleberry)
* Remove a duplicated sentence -- 'But there is no AES-KL instruction
  to process a 192-bit key.'
* Update the text for clarity and readability:
  - Clarify the error code and exemplify the backup failure
  - Use 'wrapping key' instead of less readable 'IWKey'

Changes from v5:
* Fix a typo: 'feature feature' -> 'feature'

Changes from RFC v2:
* Add as a new patch.

The preview is available here:
  https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/arch/x86/keylocker.html
---
 Documentation/arch/x86/index.rst     |  1 +
 Documentation/arch/x86/keylocker.rst | 96 ++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)
 create mode 100644 Documentation/arch/x86/keylocker.rst

diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
index 8ac64d7de4dc..669c239c009f 100644
--- a/Documentation/arch/x86/index.rst
+++ b/Documentation/arch/x86/index.rst
@@ -43,3 +43,4 @@ x86-specific Documentation
    features
    elf_auxvec
    xstate
+   keylocker
diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
new file mode 100644
index 000000000000..b28addb8eaf4
--- /dev/null
+++ b/Documentation/arch/x86/keylocker.rst
@@ -0,0 +1,96 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration opportunities
+while maintaining a programming interface similar to AES-NI. It
+converts the AES key into an encoded form, called the 'key handle'.
+The key handle is a wrapped version of the clear-text key where the
+wrapping key has limited exposure. Once converted, all subsequent data
+encryption using new AES instructions (AES-KL) uses this key handle,
+reducing the exposure of private key material in memory.
+
+CPU-internal Wrapping Key
+=========================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So the key handle that
+was encoded by the old wrapping key is no longer usable on system shutdown
+or reboot.
+
+And the key may be lost on the following exceptional situation upon wakeup:
+
+Wrapping Key Restore Failure
+----------------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+Upon the event of a wrapping key restore failure upon resume from suspend,
+all established key handles become invalid. In flight dm-crypt operations
+receive error results from pending operations. In the likely scenario that
+dm-crypt is hosting the root filesystem the recovery is identical to if a
+storage controller failed to resume from suspend or reboot. If the volume
+impacted by a wrapping key restore failure is a data volume then it is
+possible that I/O errors on that volume do not bring down the rest of the
+system. However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new wrapping key at initial boot, not any time after like
+resume from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private wrapping key state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
+The backup / restore facility is also not performant enough to be suitable
+for guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature accompanies a new AES instruction set. This instruction set is
+analogous to AES-NI. A set of AES-NI instructions can be mapped to an
+AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
+of transformation, which is equivalent to nine times AESENC and one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+  restriction failure, the instruction fails with setting RFLAGS.ZF. The
+  crypto library implementation includes the flag check to return -EINVAL.
+  Note that this flag is also set if the wrapping key is changed, e.g.,
+  because of the backup error.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+  AES-KL instruction to process a 192-bit key. The AES-KL cipher
+  implementation logs a warning message with a 192-bit key and then falls
+  back to AES-NI. So, this 192-bit key-size limitation is only documented,
+  not enforced. It means the key will remain in clear-text in memory. This
+  is to meet Linux crypto-cipher expectation that each implementation must
+  support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+  overhead when compared with AES-NI instructions.
-- 
2.34.1
Re: [PATCH v9 01/14] Documentation/x86: Document Key Locker
Posted by Randy Dunlap 1 year, 10 months ago
Hi,


On 3/28/24 18:53, Chang S. Bae wrote:
> Document the overview of the feature along with relevant considerations
> when provisioning dm-crypt volumes with AES-KL instead of AES-NI.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> ---
> Changes from v8:
> * Change wording of documentation slightly. (Randy Dunlap and Bagas
>   Sanjaya)
> 
> Changes from v6:
> * Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
>   documentation into Documentation/arch/"). (Nathan Huckleberry)
> * Remove a duplicated sentence -- 'But there is no AES-KL instruction
>   to process a 192-bit key.'
> * Update the text for clarity and readability:
>   - Clarify the error code and exemplify the backup failure
>   - Use 'wrapping key' instead of less readable 'IWKey'
> 
> Changes from v5:
> * Fix a typo: 'feature feature' -> 'feature'
> 
> Changes from RFC v2:
> * Add as a new patch.
> 
> The preview is available here:
>   https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/arch/x86/keylocker.html
> ---
>  Documentation/arch/x86/index.rst     |  1 +
>  Documentation/arch/x86/keylocker.rst | 96 ++++++++++++++++++++++++++++
>  2 files changed, 97 insertions(+)
>  create mode 100644 Documentation/arch/x86/keylocker.rst
> 
> diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
> index 8ac64d7de4dc..669c239c009f 100644
> --- a/Documentation/arch/x86/index.rst
> +++ b/Documentation/arch/x86/index.rst
> @@ -43,3 +43,4 @@ x86-specific Documentation
>     features
>     elf_auxvec
>     xstate
> +   keylocker
> diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
> new file mode 100644
> index 000000000000..b28addb8eaf4
> --- /dev/null
> +++ b/Documentation/arch/x86/keylocker.rst
> @@ -0,0 +1,96 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============
> +x86 Key Locker
> +==============
> +
> +Introduction
> +============
> +
> +Key Locker is a CPU feature to reduce key exfiltration opportunities
> +while maintaining a programming interface similar to AES-NI. It
> +converts the AES key into an encoded form, called the 'key handle'.
> +The key handle is a wrapped version of the clear-text key where the
> +wrapping key has limited exposure. Once converted, all subsequent data
> +encryption using new AES instructions (AES-KL) uses this key handle,
> +reducing the exposure of private key material in memory.
> +
> +CPU-internal Wrapping Key
> +=========================
> +
> +The CPU-internal wrapping key is an entity in a software-invisible CPU
> +state. On every system boot, a new key is loaded. So the key handle that
> +was encoded by the old wrapping key is no longer usable on system shutdown
> +or reboot.
> +
> +And the key may be lost on the following exceptional situation upon wakeup:
> +
> +Wrapping Key Restore Failure
> +----------------------------
> +
> +The CPU state is volatile with the ACPI S3/4 sleep states. When the system
> +supports those states, the key has to be backed up so that it is restored
> +on wake up. The kernel saves the key in non-volatile media.
> +
> +Upon the event of a wrapping key restore failure upon resume from suspend,
> +all established key handles become invalid. In flight dm-crypt operations

                                               In-flight

> +receive error results from pending operations. In the likely scenario that
> +dm-crypt is hosting the root filesystem the recovery is identical to if a
> +storage controller failed to resume from suspend or reboot. If the volume
> +impacted by a wrapping key restore failure is a data volume then it is
> +possible that I/O errors on that volume do not bring down the rest of the
> +system. However, a reboot is still required because the kernel will have
> +soft-disabled Key Locker. Upon the failure, the crypto library code will
> +return -ENODEV on every AES-KL function call. The Key Locker implementation
> +only loads a new wrapping key at initial boot, not any time after like
> +resume from suspend.
> +
> +Use Case and Non-use Cases
> +==========================
> +
> +Bare metal disk encryption is the only intended use case.
> +
> +Userspace usage is not supported because there is no ABI provided to
> +communicate and coordinate wrapping-key restore failure to userspace. For
> +now, key restore failures are only coordinated with kernel users. But the
> +kernel can not prevent userspace from using the feature's AES instructions
> +('AES-KL') when the feature has been enabled. So, the lack of userspace
> +support is only documented, not actively enforced.
> +
> +Key Locker is not expected to be advertised to guest VMs and the kernel
> +implementation ignores it even if the VMM enumerates the capability. The
> +expectation is that a guest VM wants private wrapping key state, but the
> +architecture does not provide that. An emulation of that capability, by
> +caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
> +The backup / restore facility is also not performant enough to be suitable
> +for guest VM context switches.
> +
> +AES Instruction Set
> +===================
> +
> +The feature accompanies a new AES instruction set. This instruction set is
> +analogous to AES-NI. A set of AES-NI instructions can be mapped to an
> +AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
> +of transformation, which is equivalent to nine times AESENC and one
> +AESENCLAST in AES-NI.
> +
> +But they have some notable differences:
> +
> +* AES-KL provides a secure data transformation using an encrypted key.
> +
> +* If an invalid key handle is provided, e.g. a corrupted one or a handle
> +  restriction failure, the instruction fails with setting RFLAGS.ZF. The
> +  crypto library implementation includes the flag check to return -EINVAL.
> +  Note that this flag is also set if the wrapping key is changed, e.g.,
> +  because of the backup error.
> +
> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
> +  AES-KL instruction to process a 192-bit key. The AES-KL cipher
> +  implementation logs a warning message with a 192-bit key and then falls
> +  back to AES-NI. So, this 192-bit key-size limitation is only documented,
> +  not enforced. It means the key will remain in clear-text in memory. This
> +  is to meet Linux crypto-cipher expectation that each implementation must
> +  support all the AES-compliant key sizes.
> +
> +* Some AES-KL hardware implementations may have noticeable performance
> +  overhead when compared with AES-NI instructions.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>

thanks.
-- 
#Randy
[PATCH v9 02/14] x86/cpufeature: Enumerate Key Locker feature
Posted by Chang S. Bae 1 year, 10 months ago
Key Locker is a CPU feature to minimize exposure of clear-text key
material. An encoded form, called 'key handle', is referenced for data
encryption or decryption instead of accessing the clear text key.

A wrapping key loaded in the CPU's software-inaccessible state is used
to transform a user key into a key handle. On a rare, unexpected
hardware failure, the key could be lost.

Enumerate this hardware capability here. It will not show up in
/proc/cpuinfo, as userspace usage is not supported. This is because
there is no ABI to coordinate the wrapping-key failure.

The feature supports the Advanced Encryption Standard (AES) cipher
algorithm with a new SIMD instruction set, like its predecessor
(AES-NI). Mark the feature as having a dependency on XMM2, as AES-NI
does. The new AES implementation will be in the crypto library.

Add X86_FEATURE_KEYLOCKER to the disabled feature list. It will be
enabled by a new Kconfig option.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v6:
* Massage the changelog -- re-organize the change descriptions

Changes from RFC v2:
* Do not publish the feature flag to userspace.
* Update the changelog.

Changes from RFC v1:
* Updated the changelog.
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f0337f7bcf16..dd30435af487 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -399,6 +399,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ	(16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_KEYLOCKER		(16*32+23) /* "" Key Locker */
 #define X86_FEATURE_BUS_LOCK_DETECT	(16*32+24) /* Bus Lock detect */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index da4054fbf533..14aa6dc3b846 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -38,6 +38,12 @@
 # define DISABLE_OSPKE		(1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_X86_KEYLOCKER
+# define DISABLE_KEYLOCKER	0
+#else
+# define DISABLE_KEYLOCKER	(1<<(X86_FEATURE_KEYLOCKER & 31))
+#endif /* CONFIG_X86_KEYLOCKER */
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57	0
 #else
@@ -150,7 +156,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_KEYLOCKER)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	(DISABLE_IBT)
 #define DISABLED_MASK19	(DISABLE_SEV_SNP)
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index f1a4adc78272..a24f7cb2cd68 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -128,6 +128,8 @@
 #define X86_CR4_PCIDE		_BITUL(X86_CR4_PCIDE_BIT)
 #define X86_CR4_OSXSAVE_BIT	18 /* enable xsave and xrestore */
 #define X86_CR4_OSXSAVE		_BITUL(X86_CR4_OSXSAVE_BIT)
+#define X86_CR4_KEYLOCKER_BIT	19 /* enable Key Locker */
+#define X86_CR4_KEYLOCKER	_BITUL(X86_CR4_KEYLOCKER_BIT)
 #define X86_CR4_SMEP_BIT	20 /* enable SMEP support */
 #define X86_CR4_SMEP		_BITUL(X86_CR4_SMEP_BIT)
 #define X86_CR4_SMAP_BIT	21 /* enable SMAP support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index b7174209d855..820dcf35eca9 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_FRED,			X86_FEATURE_LKGS      },
 	{ X86_FEATURE_FRED,			X86_FEATURE_WRMSRNS   },
+	{ X86_FEATURE_KEYLOCKER,		X86_FEATURE_XMM2      },
 	{}
 };
 
-- 
2.34.1
[PATCH v9 03/14] x86/insn: Add Key Locker instructions to the opcode map
Posted by Chang S. Bae 1 year, 10 months ago
The x86 instruction decoder needs to know about these new instructions,
which are going to be used in the crypto library as well as in the x86
core code. Add the following:

LOADIWKEY:
	Load a CPU-internal wrapping key.

ENCODEKEY128:
	Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
	Wrap a 256-bit AES key to a key handle.

AESENC128KL:
	Encrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESENC256KL:
	Encrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESDEC128KL:
	Decrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESDEC256KL:
	Decrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESENCWIDE128KL:
	Encrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESENCWIDE256KL:
	Encrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

AESDECWIDE128KL:
	Decrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESDECWIDE256KL:
	Decrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

The details can be found in the Intel Software Developer's Manual.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v6:
* Massage the changelog -- add the reason a bit.

Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
  Locker module is built.
---
 arch/x86/lib/x86-opcode-map.txt       | 11 +++++++----
 tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 12af572201a2..c94988d5130d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 12af572201a2..c94988d5130d 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
-- 
2.34.1
[PATCH v9 04/14] x86/asm: Add a wrapper function for the LOADIWKEY instruction
Posted by Chang S. Bae 1 year, 10 months ago
Key Locker introduces a CPU-internal wrapping key to encode a user key
into a key handle. The key handle is then referenced instead of the
plain-text key.

LOADIWKEY loads a wrapping key in the software-inaccessible CPU state.
It operates only in kernel mode.

The kernel will use this to load a new key at boot time. Establish an
accessor for the feature setup, and define struct iwkey to pass a key
value.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v6:
* Massage the changelog -- clarify the reason and the changes a bit.

Changes from v5:
* Fix a typo: kernel_cpu_begin() -> kernel_fpu_begin()

Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan
  Williams)

Previously, Dan questioned the necessity of 'WARN_ON(!irq_fpu_usable())'
in the load_xmm_iwkey() function. However, it's worth noting that the
function comment emphasizes the caller's responsibility for invoking
kernel_fpu_begin(), which effectively performs the sanity check through
kernel_fpu_begin_mask().
---
 arch/x86/include/asm/keylocker.h     | 25 +++++++++++++++++++++++++
 arch/x86/include/asm/special_insns.h | 28 ++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)
 create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..4e731f577c50
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary wrapping key storage.
+ * @integrity_key:	A 128-bit key used to verify the integrity of
+ *			key handles
+ * @encryption_key:	A 256-bit encryption key used for wrapping and
+ *			unwrapping clear text keys.
+ *
+ * This storage should be flushed immediately after being loaded.
+ */
+struct iwkey {
+	struct reg_128_bit integrity_key;
+	struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 2e9fc5c400cd..65267013f1e1 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
 #include <linux/errno.h>
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
+#include <asm/keylocker.h>
 
 /*
  * The compiler should not reorder volatile asm statements with respect to each
@@ -301,6 +302,33 @@ static __always_inline void tile_release(void)
 	asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
 }
 
+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key into XMM registers.
+ * @key:	A pointer to a struct iwkey containing the key data.
+ *
+ * The caller is responsible for invoking kernel_fpu_begin() before.
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+	struct reg_128_bit zeros = { 0 };
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+		      :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+			 "m"(key->encryption_key[1]));
+
+	/*
+	 * 'LOADIWKEY %xmm1,%xmm2' loads a key from XMM0-2 into a
+	 * software-invisible CPU state. With zero in EAX, CPU does not
+	 * perform hardware randomization and allows key backup.
+	 *
+	 * This instruction is supported by binutils >= 2.36.
+	 */
+	asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+		      :: "m"(zeros));
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
-- 
2.34.1
[PATCH v9 05/14] x86/msr-index: Add MSRs for Key Locker wrapping key
Posted by Chang S. Bae 1 year, 10 months ago
The wrapping key resides in the same power domain as the CPU cache.
Consequently, any sleep state that invalidates the cache, such as S3,
also affects the wrapping key's state.

However, as the wrapping key's state is inaccessible to software, a
specialized mechanism is necessary to save and restore the key during
deep sleep.

A set of new MSRs is provided as an abstract interface for saving,
restoring, and checking the wrapping key's status. The wrapping key
is securely saved in a platform-scoped state using non-volatile media.
Both the backup storage and its path from the CPU are encrypted and
integrity-protected to ensure security.

Define those MSRs for saving and restoring the key during S3/4 sleep
states.

Note that the non-volatility of the backup storage is not architecturally
guaranteed across off-states such as S5 and G3. In such cases, the kernel
may generate a new key during the next boot.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v8:
* Tweak the changelog.

Changes from v6:
* Tweak the changelog -- put the last for those about other sleep states

Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
 arch/x86/include/asm/msr-index.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 05956bd8bacf..a451fa1e2cd9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1192,4 +1192,10 @@
 						* a #GP
 						*/
 
+/* MSRs for managing a CPU-internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS		0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS		0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM	0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL		0x00000d92
+
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
2.34.1
[PATCH v9 06/14] x86/keylocker: Define Key Locker CPUID leaf
Posted by Chang S. Bae 1 year, 10 months ago
Both the Key Locker enabling code in the x86 core and the AES Key
Locker code in the crypto library will need to reference
feature-specific CPUID bits. Define this CPUID leaf and its bits.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v6:
* Tweak the changelog -- comment the reason first and then brief the
  change.

Changes from RFC v2:
* Separate out the code as a new patch.
---
 arch/x86/include/asm/keylocker.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 4e731f577c50..1213d273c369 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bits.h>
 #include <asm/fpu/types.h>
 
 /**
@@ -21,5 +22,11 @@ struct iwkey {
 	struct reg_128_bit encryption_key[2];
 };
 
+#define KEYLOCKER_CPUID			0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR	BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE	BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
-- 
2.34.1
[PATCH v9 07/14] x86/cpu/keylocker: Load a wrapping key at boot time
Posted by Chang S. Bae 1 year, 10 months ago
The wrapping key is an entity to encode a clear text key into a key
handle. This key is a pivot in protecting user keys. So the value has
to be randomized before being loaded in the software-invisible CPU
state.

The wrapping key needs to be established before the first user. Given
that the only proposed Linux use case for Key Locker is dm-crypt, the
feature could be lazily enabled before the first dm-crypt user arrives.

But there is no precedent for late enabling of CPU features and it
adds maintenance burden without demonstrative benefit outside of
minimizing the visibility of Key Locker to userspace.

Therefore, generate random bytes and load them at boot time, which
involves clobbering XMM registers. Perform this process under
arch_initcall(), ensuring that it occurs after FPU initialization.
Finally, flush out the random bytes after loading.

Given that the Linux Key Locker support is only intended for bare
metal dm-crypt use, and that switching wrapping key per virtual machine
is impractical, explicitly skip this setup in the X86_FEATURE_HYPERVISOR
case.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Elliott, Robert (Servers)" <elliott@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
Changes from v8:
* Invoke the setup code via arch_initcall(). The move was due to the
  upstream changes. Commit b81fac906a8f ("x86/fpu: Move FPU
  initialization into arch_cpu_finalize_init()") delays the FPU setup.
* Tweak code comments and the changelog.
* Revoke the review tag as the code change is significant.

Changes from v6:
* Switch to use 'static inline' for the empty functions, instead of
  macro that disallows type checks. (Eric Biggers and Dave Hansen)
* Use memzero_explicit() to wipe out the key data instead of writing
  the poison value over there. (Robert Elliott)
* Massage the changelog for the better readability.

Changes from v5:
* Call out the disabling when the feature is available on a virtual
  machine. Then, it will turn off the feature flag

Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
  (Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.
---
 arch/x86/kernel/Makefile    |  1 +
 arch/x86/kernel/keylocker.c | 77 +++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 74077694da7d..d105e5785b90 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -137,6 +137,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
+obj-$(CONFIG_X86_KEYLOCKER)		+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..0d6b715baf1e
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support the wrapping key management.
+ */
+
+#include <linux/random.h>
+#include <linux/string.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/processor.h>
+
+static struct iwkey wrapping_key __initdata;
+
+static void __init generate_keylocker_data(void)
+{
+	get_random_bytes(&wrapping_key.integrity_key, sizeof(wrapping_key.integrity_key));
+	get_random_bytes(&wrapping_key.encryption_key, sizeof(wrapping_key.encryption_key));
+}
+
+static void __init destroy_keylocker_data(void)
+{
+	memzero_explicit(&wrapping_key, sizeof(wrapping_key));
+}
+
+/*
+ * For loading the wrapping key into each CPU, the feature bit is set
+ * in the control register and FPU context management is performed.
+ */
+static void __init load_keylocker(struct work_struct *unused)
+{
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	kernel_fpu_begin();
+	load_xmm_iwkey(&wrapping_key);
+	kernel_fpu_end();
+}
+
+static int __init init_keylocker(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		goto disable;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+		pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+		goto clear_cap;
+	}
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	/* AESKLE depends on CR4.KEYLOCKER */
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+	    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+		pr_debug("x86/keylocker: Not fully supported.\n");
+		goto clear_cap;
+	}
+
+	generate_keylocker_data();
+	schedule_on_each_cpu(load_keylocker);
+	destroy_keylocker_data();
+
+	pr_info_once("x86/keylocker: Enabled.\n");
+	return 0;
+
+clear_cap:
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info_once("x86/keylocker: Disabled.\n");
+disable:
+	cr4_clear_bits(X86_CR4_KEYLOCKER);
+	return -ENODEV;
+}
+
+arch_initcall(init_keylocker);
-- 
2.34.1
[PATCH v9a 07/14] x86/cpu/keylocker: Load a wrapping key at boot time
Posted by Chang S. Bae 1 year, 10 months ago
The wrapping key is used to encode a clear-text key into a key handle.
It is the pivot for protecting user keys, so its value has to be
randomized before being loaded into the software-invisible CPU state.

The wrapping key needs to be established before the first user. Given
that the only proposed Linux use case for Key Locker is dm-crypt, the
feature could be lazily enabled before the first dm-crypt user arrives.

But there is no precedent for late enabling of CPU features, and it
adds maintenance burden without demonstrable benefit beyond minimizing
the visibility of Key Locker to userspace.

Therefore, generate random bytes and load them at boot time, which
involves clobbering XMM registers. Perform this process under
arch_initcall(), ensuring that it occurs after FPU initialization.
Finally, wipe the random bytes after loading.

Given that the Linux Key Locker support is only intended for bare-metal
dm-crypt use, and that switching the wrapping key per virtual machine
is impractical, explicitly skip this setup in the
X86_FEATURE_HYPERVISOR case.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Elliott, Robert (Servers)" <elliott@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
Changes from v9:
* Re-include 'tlbflush.h', which had been removed by mistake.
---
 arch/x86/kernel/Makefile    |  1 +
 arch/x86/kernel/keylocker.c | 78 +++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 74077694da7d..d105e5785b90 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -137,6 +137,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
+obj-$(CONFIG_X86_KEYLOCKER)		+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..8569b92971da
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support the wrapping key management.
+ */
+
+#include <linux/random.h>
+#include <linux/string.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/processor.h>
+#include <asm/tlbflush.h>
+
+static struct iwkey wrapping_key __initdata;
+
+static void __init generate_keylocker_data(void)
+{
+	get_random_bytes(&wrapping_key.integrity_key, sizeof(wrapping_key.integrity_key));
+	get_random_bytes(&wrapping_key.encryption_key, sizeof(wrapping_key.encryption_key));
+}
+
+static void __init destroy_keylocker_data(void)
+{
+	memzero_explicit(&wrapping_key, sizeof(wrapping_key));
+}
+
+/*
+ * For loading the wrapping key into each CPU, the feature bit is set
+ * in the control register and FPU context management is performed.
+ */
+static void __init load_keylocker(struct work_struct *unused)
+{
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	kernel_fpu_begin();
+	load_xmm_iwkey(&wrapping_key);
+	kernel_fpu_end();
+}
+
+static int __init init_keylocker(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		goto disable;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+		pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+		goto clear_cap;
+	}
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	/* AESKLE depends on CR4.KEYLOCKER */
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+	    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+		pr_debug("x86/keylocker: Not fully supported.\n");
+		goto clear_cap;
+	}
+
+	generate_keylocker_data();
+	schedule_on_each_cpu(load_keylocker);
+	destroy_keylocker_data();
+
+	pr_info_once("x86/keylocker: Enabled.\n");
+	return 0;
+
+clear_cap:
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info_once("x86/keylocker: Disabled.\n");
+disable:
+	cr4_clear_bits(X86_CR4_KEYLOCKER);
+	return -ENODEV;
+}
+
+arch_initcall(init_keylocker);
-- 
2.40.1
[PATCH v9 08/14] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4
Posted by Chang S. Bae 1 year, 10 months ago
The primary use case for the feature is bare-metal dm-crypt. The key
needs to be restored properly on wakeup, as dm-crypt does not prompt
for the key on resume from suspend. Even if a prompt is issued to
unlock the volume where the hibernation image is stored, dm-crypt
still expects to reuse the key handles within the hibernation image
once it is loaded.

== Wrapping-key Restore ==

To meet dm-crypt's expectations, the key handles in the suspend image
have to remain valid after resuming from an S-state. However, when the
system enters the ACPI S3 or S4 sleep states, the wrapping key is
discarded.

Key Locker provides a mechanism to back up the wrapping key in
non-volatile storage. Therefore, upon boot, request a backup of the
wrapping key and copy it back to each CPU upon wakeup. If the backup
mechanism is unavailable, disable the feature unless CONFIG_SUSPEND=n.

== Restore Failure ==

In the event of a key restore failure, the kernel proceeds with a
freshly initialized wrapping key state. This invalidates any key
handles present in the suspend image, leading to I/O errors in
dm-crypt operations.

However, data integrity remains intact, and access is restored with
new handles created by the new wrapping key at the next boot. At a
minimum, manage a feature-specific flag to communicate with the crypto
implementation, ensuring that it stops using the AES instructions upon
key restore failure, instead of abruptly disabling the feature.
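
To make the flag's role concrete, the consumer side would look roughly
like the sketch below (illustrative only; the AES-KL glue code arrives
in a later patch, so the surrounding context is assumed):

	/* In the AES-KL crypto glue (hypothetical placement): */
	if (!valid_keylocker())
		return -ENODEV;	/* stop issuing AES-KL instructions */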

== Off-states ==

While the backup may persist in non-volatile media across S5 and G3 "off"
states, it is neither architecturally guaranteed nor expected by
dm-crypt. Therefore, a reboot can address this scenario with a new
wrapping key, as dm-crypt prompts for the key whenever the volume is
started.
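
For illustration, the backup/restore protocol from the diff below,
condensed into one sketch (error handling omitted; the MSR names are
as in this patch, the function itself is not part of the diff):

	static void backup_restore_sketch(void)
	{
		u64 status;

		/* Boot: after loading the wrapping key on all CPUs, back it up. */
		wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);

		/* Resume: verify the backup, then copy it back on each CPU. */
		rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, status);
		if (status & BIT(0)) {	/* the backup is valid */
			wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
			rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
			/* BIT(0) set here means the copy succeeded. */
		}
	}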

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Sangwhan Moon <sxm@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
Changes from v8:
* Rebase on the previous patch (patch7) changes, separating the wrapping
  key restoration code from the initial load. Previously, the
  identify_cpu() -> setup_keylocker() sequence in the hotplug path could
  hit __init code, leading to an explosion. This change removes the
  initialization code from the hotplug path. (Sangwhan Moon)
* Make copy_keylocker() return bool for simplification.
* Rename the flag for clarity: 'valid_kl' -> 'valid_wrapping_key'.
* Don't export the symbol for valid_keylocker(), as AES-KL will be
  built-in (see patch14 for details).
* Tweak code comments and the changelog.
* Revoke the review tag as the code change is significant.

Changes from v6:
* Limit the symbol export only when needed.
* Improve the coding style -- reduce an indent after
  'if () { ... return; }', and tweak the comment along with that.
  (Eric Biggers)
* Improve the function prototype, instead of using a macro. (Eric
  Biggers and Dave Hansen)
* Update the documentation:
  - Massage the changelog to clarify the problem-and-solution by
    sections
  - Clarify the comment about the key restore failure.

Changes from v5:
* Fix the 'valid_kl' flag so it is not set when the feature is
  disabled. (Reported by Marvin Hsu <marvin.hsu@intel.com>) Add a
  function comment about this.
* Improve the error handling in setup_keylocker(). All the error cases
  fall through to the end, which disables the feature; all the
  successful cases return immediately.

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
  Wysocki)
* Rebase on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
 arch/x86/include/asm/keylocker.h | 10 ++++
 arch/x86/kernel/cpu/common.c     |  4 +-
 arch/x86/kernel/keylocker.c      | 88 ++++++++++++++++++++++++++++++++
 arch/x86/power/cpu.c             |  2 +
 4 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 1213d273c369..c93102101c41 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -28,5 +28,15 @@ struct iwkey {
 #define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
 #define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
 
+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
+#else
+static inline void setup_keylocker(void) { }
+static inline void restore_keylocker(void) { }
+static inline bool valid_keylocker(void) { return false; }
+#endif
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5c1e6d6be267..bfbb1ca64664 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -62,6 +62,7 @@
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
 #include <asm/fred.h>
+#include <asm/keylocker.h>
 #include <asm/uv/uv.h>
 #include <asm/ia32.h>
 #include <asm/set_memory.h>
@@ -1826,10 +1827,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Setup various Intel-specific CPU security features */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_keylocker();
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 0d6b715baf1e..d5d11d0263b7 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -9,10 +9,24 @@
 
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
+#include <asm/msr.h>
 #include <asm/processor.h>
 
 static struct iwkey wrapping_key __initdata;
 
+/*
+ * This flag is set when a wrapping key is successfully loaded. If a key
+ * restoration fails, it is reset. This state is exported to the crypto
+ * library, indicating whether Key Locker is usable. Thus, the feature
+ * can be soft-disabled based on this flag.
+ */
+static bool valid_wrapping_key;
+
+bool valid_keylocker(void)
+{
+	return valid_wrapping_key;
+}
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&wrapping_key.integrity_key, sizeof(wrapping_key.integrity_key));
@@ -37,9 +51,69 @@ static void __init load_keylocker(struct work_struct *unused)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the wrapping key from the backup.
+ *
+ * Returns:	true if successful, otherwise false.
+ */
+static bool copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	return !!(status & BIT(0));
+}
+
+/*
+ * On wakeup, APs copy a wrapping key after the boot CPU verifies a valid
+ * backup status through restore_keylocker(). Subsequently, they adhere
+ * to the error handling protocol by invalidating the flag.
+ */
+void setup_keylocker(void)
+{
+	if (!valid_wrapping_key)
+		return;
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	if (copy_keylocker())
+		return;
+
+	pr_err_once("x86/keylocker: Invalid copy status.\n");
+	valid_wrapping_key = false;
+}
+
+/* The boot CPU restores the wrapping key in the first place on wakeup. */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+
+	if (!valid_wrapping_key)
+		return;
+
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		if (copy_keylocker())
+			return;
+		pr_err("x86/keylocker: Invalid copy state.\n");
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Invalidate the feature via this flag to indicate that the
+	 * crypto code should voluntarily stop using the feature, rather
+	 * than abruptly disabling it.
+	 */
+	valid_wrapping_key = false;
+}
+
 static int __init init_keylocker(void)
 {
 	u32 eax, ebx, ecx, edx;
+	bool backup_available;
 
 	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
 		goto disable;
@@ -59,9 +133,23 @@ static int __init init_keylocker(void)
 		goto clear_cap;
 	}
 
+	/*
+	 * The backup is critical for restoring the wrapping key upon
+	 * wakeup.
+	 */
+	backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+	if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+		pr_debug("x86/keylocker: No key backup with possible S3/4.\n");
+		goto clear_cap;
+	}
+
 	generate_keylocker_data();
 	schedule_on_each_cpu(load_keylocker);
 	destroy_keylocker_data();
+	valid_wrapping_key = true;
+
+	if (backup_available)
+		wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
 
 	pr_info_once("x86/keylocker: Enabled.\n");
 	return 0;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 63230ff8cf4f..e99be45354cd 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
 #include <asm/microcode.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -264,6 +265,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	x86_platform.restore_sched_clock_state();
 	cache_bp_restore();
 	perf_restore_debug_store();
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.34.1
[PATCH v9a 08/14] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4
Posted by Chang S. Bae 1 year, 8 months ago
The primary use case for the feature is bare-metal dm-crypt. The key
needs to be restored properly on wakeup, as dm-crypt does not prompt
for the key on resume from suspend. Even if a prompt is issued to
unlock the volume where the hibernation image is stored, dm-crypt
still expects to reuse the key handles within the hibernation image
once it is loaded.

== Wrapping-key Restore ==

To meet dm-crypt's expectations, the key handles in the suspend image
have to remain valid after resuming from an S-state. However, when the
system enters the ACPI S3 or S4 sleep states, the wrapping key is
discarded.

Key Locker provides a mechanism to back up the wrapping key in
non-volatile storage. Therefore, upon boot, request a backup of the
wrapping key and copy it back to each CPU upon wakeup. If the backup
mechanism is unavailable, disable the feature unless CONFIG_SUSPEND=n.

== Restore Failure ==

In the event of a key restore failure, the kernel proceeds with a
freshly initialized wrapping key state. This invalidates any key
handles present in the suspend image, leading to I/O errors in
dm-crypt operations.

However, data integrity remains intact, and access is restored with
new handles created by the new wrapping key at the next boot. At a
minimum, manage a feature-specific flag to communicate with the crypto
implementation, ensuring that it stops using the AES instructions upon
key restore failure, instead of abruptly disabling the feature.

== Off-states ==

While the backup may persist in non-volatile media across S5 and G3 "off"
states, it is neither architecturally guaranteed nor expected by
dm-crypt. Therefore, a reboot can address this scenario with a new
wrapping key, as dm-crypt prompts for the key whenever the volume is
started.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
Change from v9:
* Export valid_keylocker() again for the AES-KL module.
---
 arch/x86/include/asm/keylocker.h | 10 ++++
 arch/x86/kernel/cpu/common.c     |  4 +-
 arch/x86/kernel/keylocker.c      | 89 ++++++++++++++++++++++++++++++++
 arch/x86/power/cpu.c             |  2 +
 4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 1213d273c369..c93102101c41 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -28,5 +28,15 @@ struct iwkey {
 #define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
 #define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
 
+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
+#else
+static inline void setup_keylocker(void) { }
+static inline void restore_keylocker(void) { }
+static inline bool valid_keylocker(void) { return false; }
+#endif
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 605c26c009c8..85946d79cb96 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -62,6 +62,7 @@
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
 #include <asm/fred.h>
+#include <asm/keylocker.h>
 #include <asm/uv/uv.h>
 #include <asm/ia32.h>
 #include <asm/set_memory.h>
@@ -1834,10 +1835,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Setup various Intel-specific CPU security features */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_keylocker();
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 8569b92971da..da0830e980ed 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -9,11 +9,26 @@
 
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
+#include <asm/msr.h>
 #include <asm/processor.h>
 #include <asm/tlbflush.h>
 
 static struct iwkey wrapping_key __initdata;
 
+/*
+ * This flag is set when a wrapping key is successfully loaded. If a key
+ * restoration fails, it is reset. This state is exported to the crypto
+ * library, indicating whether Key Locker is usable. Thus, the feature
+ * can be soft-disabled based on this flag.
+ */
+static bool valid_wrapping_key;
+
+bool valid_keylocker(void)
+{
+	return valid_wrapping_key;
+}
+EXPORT_SYMBOL_GPL(valid_keylocker);
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&wrapping_key.integrity_key, sizeof(wrapping_key.integrity_key));
@@ -38,9 +53,69 @@ static void __init load_keylocker(struct work_struct *unused)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the wrapping key from the backup.
+ *
+ * Returns:	true if successful, otherwise false.
+ */
+static bool copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	return !!(status & BIT(0));
+}
+
+/*
+ * On wakeup, APs copy a wrapping key after the boot CPU verifies a valid
+ * backup status through restore_keylocker(). Subsequently, they adhere
+ * to the error handling protocol by invalidating the flag.
+ */
+void setup_keylocker(void)
+{
+	if (!valid_wrapping_key)
+		return;
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	if (copy_keylocker())
+		return;
+
+	pr_err_once("x86/keylocker: Invalid copy status.\n");
+	valid_wrapping_key = false;
+}
+
+/* The boot CPU restores the wrapping key in the first place on wakeup. */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+
+	if (!valid_wrapping_key)
+		return;
+
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		if (copy_keylocker())
+			return;
+		pr_err("x86/keylocker: Invalid copy state.\n");
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Invalidate the feature via this flag to indicate that the
+	 * crypto code should voluntarily stop using the feature, rather
+	 * than abruptly disabling it.
+	 */
+	valid_wrapping_key = false;
+}
+
 static int __init init_keylocker(void)
 {
 	u32 eax, ebx, ecx, edx;
+	bool backup_available;
 
 	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
 		goto disable;
@@ -60,9 +135,23 @@ static int __init init_keylocker(void)
 		goto clear_cap;
 	}
 
+	/*
+	 * The backup is critical for restoring the wrapping key upon
+	 * wakeup.
+	 */
+	backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+	if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+		pr_debug("x86/keylocker: No key backup with possible S3/4.\n");
+		goto clear_cap;
+	}
+
 	generate_keylocker_data();
 	schedule_on_each_cpu(load_keylocker);
 	destroy_keylocker_data();
+	valid_wrapping_key = true;
+
+	if (backup_available)
+		wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
 
 	pr_info_once("x86/keylocker: Enabled.\n");
 	return 0;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 63230ff8cf4f..e99be45354cd 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
 #include <asm/microcode.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -264,6 +265,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	x86_platform.restore_sched_clock_state();
 	cache_bp_restore();
 	perf_restore_debug_store();
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.34.1
[PATCH v9 09/14] x86/hotplug/keylocker: Ensure wrapping key backup capability
Posted by Chang S. Bae 1 year, 10 months ago
To facilitate CPU hotplug, the wrapping key needs to be loaded on the
incoming CPU during hotplug bringup. setup_keylocker() already
establishes this routine for the wakeup path by copying the key from
the backup state.

Disable the feature if the backup capability is missing while
CONFIG_HOTPLUG_CPU=y. Also, update the code comment to indicate
support for CPU hotplug.
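
For reference, a rough sketch of the hotplug bringup path that reaches
this routine (function names are approximate and vary across kernel
versions):

	start_secondary()			/* on the incoming CPU */
	  -> smp_store_cpu_info()
	       -> identify_secondary_cpu()
	            -> identify_cpu()
	                 -> setup_keylocker()	/* copies the key from the backup */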

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Sangwhan Moon <sxm@google.com>
---
Changes from v8:
* Add as a new patch.
---
 arch/x86/kernel/keylocker.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index d5d11d0263b7..1b57e11d93ad 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -69,6 +69,8 @@ static bool copy_keylocker(void)
  * On wakeup, APs copy a wrapping key after the boot CPU verifies a valid
  * backup status through restore_keylocker(). Subsequently, they adhere
  * to the error handling protocol by invalidating the flag.
+ *
+ * This setup routine is also invoked in the hotplug bringup path.
  */
 void setup_keylocker(void)
 {
@@ -135,11 +137,11 @@ static int __init init_keylocker(void)
 
 	/*
 	 * The backup is critical for restoring the wrapping key upon
-	 * wakeup.
+	 * wakeup or during hotplug bringup.
 	 */
 	backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
-	if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
-		pr_debug("x86/keylocker: No key backup with possible S3/4.\n");
+	if (!backup_available && (IS_ENABLED(CONFIG_SUSPEND) || IS_ENABLED(CONFIG_HOTPLUG_CPU))) {
+		pr_debug("x86/keylocker: No key backup with possible S3/4 or CPU hotplug.\n");
 		goto clear_cap;
 	}
 
-- 
2.34.1
[PATCH v9 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
Posted by Chang S. Bae 1 year, 10 months ago
Gather Data Sampling is a transient execution side-channel issue in
some CPU models. Stale data in registers is not guaranteed to be
secure while this vulnerability is unaddressed.

In Key Locker usage, the temporary storage of the original key in
registers during AES transformations poses a risk. The key material
can become stale in some implementations, making the AES key
susceptible to leakage.

To mitigate this vulnerability, a qualified microcode image must be
applied. Software then verifies the mitigation state using MSRs. Add
code to ensure that the mitigation is installed and securely locked.
Otherwise, disable the feature.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Changes from v8:
* Add as a new patch.

Note that the code follows the guidance from [1]:
  "Intel recommends that system software does not enable Key Locker (by
   setting CR4.KL) unless the GDS mitigation is enabled
   (IA32_MCU_OPT_CTRL[GDS_MITG_DIS] (bit 4) is 0) and locked (IA32_MCU_OPT_CTRL
   [GDS_MITG_LOCK](bit 5) is 1)."

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
---
 arch/x86/kernel/keylocker.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 1b57e11d93ad..d4f3aa65ea8a 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -7,6 +7,7 @@
 #include <linux/random.h>
 #include <linux/string.h>
 
+#include <asm/cpu.h>
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
 #include <asm/msr.h>
@@ -112,6 +113,37 @@ void restore_keylocker(void)
 	valid_wrapping_key = false;
 }
 
+/*
+ * The mitigation is implemented at a microcode level. Ensure that the
+ * microcode update is applied and the mitigation is locked.
+ */
+static bool __init have_gds_mitigation(void)
+{
+	u64 mcu_ctrl;
+
+	/* GDS_CTRL is set if new microcode is loaded. */
+	if (!(x86_read_arch_cap_msr() & ARCH_CAP_GDS_CTRL))
+		goto vulnerable;
+
+	/* If GDS_MITG_LOCKED is set, GDS_MITG_DIS is forced to 0. */
+	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
+	if (mcu_ctrl & GDS_MITG_LOCKED)
+		return true;
+
+vulnerable:
+	pr_warn("x86/keylocker: Susceptible to the GDS vulnerability.\n");
+	return false;
+}
+
+/* Check if Key Locker is secure enough to be used. */
+static bool __init secure_keylocker(void)
+{
+	if (boot_cpu_has_bug(X86_BUG_GDS) && !have_gds_mitigation())
+		return false;
+
+	return true;
+}
+
 static int __init init_keylocker(void)
 {
 	u32 eax, ebx, ecx, edx;
@@ -125,6 +157,9 @@ static int __init init_keylocker(void)
 		goto clear_cap;
 	}
 
+	if (!secure_keylocker())
+		goto clear_cap;
+
 	cr4_set_bits(X86_CR4_KEYLOCKER);
 
 	/* AESKLE depends on CR4.KEYLOCKER */
-- 
2.34.1
Re: [PATCH v9 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
Posted by Pawan Gupta 1 year, 10 months ago
On Thu, Mar 28, 2024 at 06:53:42PM -0700, Chang S. Bae wrote:
> +/*
> + * The mitigation is implemented at a microcode level. Ensure that the
> + * microcode update is applied and the mitigation is locked.
> + */
> +static bool __init have_gds_mitigation(void)
> +{
> +	u64 mcu_ctrl;
> +
> +	/* GDS_CTRL is set if new microcode is loaded. */
> +	if (!(x86_read_arch_cap_msr() & ARCH_CAP_GDS_CTRL))
> +		goto vulnerable;
> +
> +	/* If GDS_MITG_LOCKED is set, GDS_MITG_DIS is forced to 0. */
> +	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
> +	if (mcu_ctrl & GDS_MITG_LOCKED)
> +		return true;

Similar to RFDS, above checks can be simplified to:

	if (gds_mitigation == GDS_MITIGATION_FULL_LOCKED)
		return true;
> +
> +vulnerable:
> +	pr_warn("x86/keylocker: Susceptible to the GDS vulnerability.\n");
> +	return false;
> +}
[PATCH v9a 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
Posted by Chang S. Bae 1 year, 10 months ago
Gather Data Sampling is a transient execution side-channel issue in
some CPU models. Stale data in registers is not guaranteed to be
secure while this vulnerability is unaddressed.

In Key Locker usage, the temporary storage of the original key in
registers during AES transformations poses a risk. The key material
can become stale in some implementations, making the AES key
susceptible to leakage.

To mitigate this vulnerability, a qualified microcode image must be
applied. Add code to ensure that the mitigation is installed and
securely locked. Otherwise, disable the feature.

Expand gds_ucode_mitigated() to examine the lock state.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Changes from v9:
* Removed MSR reads and utilized the helper function. (Pawan Gupta)

Alternatively, 'gds_mitigation' could be exported and referenced
directly; using 'gds_mitigation == GDS_MITIGATION_FULL_LOCKED' may also
read well. However, expanding gds_ucode_mitigated() was chosen for
consistency, as it is already established.

Note that this approach aligns with Intel's guidance, as the bugs.c code
checks the following MSR bits:
  "Intel recommends that system software does not enable Key Locker (by
   setting CR4.KL) unless the GDS mitigation is enabled
   (IA32_MCU_OPT_CTRL[GDS_MITG_DIS] (bit 4) is 0) and locked
   (IA32_MCU_OPT_CTRL [GDS_MITG_LOCK](bit 5) is 1)."

For more information, refer to Intel's technical documentation on Gather
Data Sampling:
  https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
---
 arch/x86/include/asm/processor.h |  7 ++++++-
 arch/x86/kernel/cpu/bugs.c       |  5 ++++-
 arch/x86/kernel/keylocker.c      | 12 ++++++++++++
 arch/x86/kvm/x86.c               |  2 +-
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f131f4..74eaa3a2b85b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -721,7 +721,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
-extern bool gds_ucode_mitigated(void);
+enum mitigation_info {
+	MITG_FULL,
+	MITG_LOCKED,
+};
+
+extern bool gds_ucode_mitigated(enum mitigation_info mitg);
 
 /*
  * Make previous memory operations globally visible before
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index e7ba936d798b..80f6e70619cb 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -752,8 +752,11 @@ static const char * const gds_strings[] = {
 	[GDS_MITIGATION_HYPERVISOR]	= "Unknown: Dependent on hypervisor status",
 };
 
-bool gds_ucode_mitigated(void)
+bool gds_ucode_mitigated(enum mitigation_info mitg)
 {
+	if (mitg == MITG_LOCKED)
+		return gds_mitigation == GDS_MITIGATION_FULL_LOCKED;
+
 	return (gds_mitigation == GDS_MITIGATION_FULL ||
 		gds_mitigation == GDS_MITIGATION_FULL_LOCKED);
 }
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 1e81d0704eea..23cf4a235f11 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -113,6 +113,15 @@ void restore_keylocker(void)
 	valid_wrapping_key = false;
 }
 
+/* Check if Key Locker is secure enough to be used. */
+static bool __init secure_keylocker(void)
+{
+	if (boot_cpu_has_bug(X86_BUG_GDS) && !gds_ucode_mitigated(MITG_LOCKED))
+		return false;
+
+	return true;
+}
+
 static int __init init_keylocker(void)
 {
 	u32 eax, ebx, ecx, edx;
@@ -126,6 +135,9 @@ static int __init init_keylocker(void)
 		goto clear_cap;
 	}
 
+	if (!secure_keylocker())
+		goto clear_cap;
+
 	cr4_set_bits(X86_CR4_KEYLOCKER);
 
 	/* AESKLE depends on CR4.KEYLOCKER */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47d9f03b7778..4ab50e95fdb5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1675,7 +1675,7 @@ static u64 kvm_get_arch_capabilities(void)
 		 */
 	}
 
-	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
+	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated(MITG_FULL))
 		data |= ARCH_CAP_GDS_NO;
 
 	return data;
-- 
2.40.1
Re: [PATCH v9a 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
Posted by Pawan Gupta 1 year, 9 months ago
On Sun, Apr 07, 2024 at 04:04:32PM -0700, Chang S. Bae wrote:
> Gather Data Sampling is a transient execution side channel issue in some
> CPU models. The stale data in registers is not guaranteed as secure when
> this vulnerability is not addressed.
> 
> In the Key Locker usage during AES transformations, the temporary storage
> of the original key in registers poses a risk. The key material can be
> staled in some implementations, leading to susceptibility to leakage of
> the AES key.
> 
> To mitigate this vulnerability, a qualified microcode image must be
> applied. Add code to ensure that the mitigation is installed and securely
> locked. Disable the feature, otherwise.
> 
> Expand gds_ucode_mitigated() to examine the lock state.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
> Changes from v9:
> * Removed MSR reads and utilized the helper function. (Pawan Gupta)
> 
> Alternatively, 'gds_mitigation' can be exported and referenced directly.
> Using 'gds_mitigation == GDS_MITIGATION_FULL_LOCKED' may also be
> readable. However, it was opted to expand gds_ucode_mitigated() for
> consistency, as it is already established.
> 
> Note that this approach aligns with Intel's guidance, as the bugs.c code
> checks the following MSR bits:
>   "Intel recommends that system software does not enable Key Locker (by
>    setting CR4.KL) unless the GDS mitigation is enabled
>    (IA32_MCU_OPT_CTRL[GDS_MITG_DIS] (bit 4) is 0) and locked
>    (IA32_MCU_OPT_CTRL [GDS_MITG_LOCK](bit 5) is 1)."
> 
> For more information, refer to Intel's technical documentation on Gather
> Data Sampling:
>   https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
> ---
>  arch/x86/include/asm/processor.h |  7 ++++++-
>  arch/x86/kernel/cpu/bugs.c       |  5 ++++-
>  arch/x86/kernel/keylocker.c      | 12 ++++++++++++
>  arch/x86/kvm/x86.c               |  2 +-
>  4 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 811548f131f4..74eaa3a2b85b 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -721,7 +721,12 @@ enum mds_mitigations {
>  	MDS_MITIGATION_VMWERV,
>  };
>  
> -extern bool gds_ucode_mitigated(void);
> +enum mitigation_info {
> +	MITG_FULL,
> +	MITG_LOCKED,
> +};
> +
> +extern bool gds_ucode_mitigated(enum mitigation_info mitg);
>  
>  /*
>   * Make previous memory operations globally visible before
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index e7ba936d798b..80f6e70619cb 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -752,8 +752,11 @@ static const char * const gds_strings[] = {
>  	[GDS_MITIGATION_HYPERVISOR]	= "Unknown: Dependent on hypervisor status",
>  };
>  
> -bool gds_ucode_mitigated(void)
> +bool gds_ucode_mitigated(enum mitigation_info mitg)
>  {
> +	if (mitg == MITG_LOCKED)
> +		return gds_mitigation == GDS_MITIGATION_FULL_LOCKED;
> +
>  	return (gds_mitigation == GDS_MITIGATION_FULL ||
>  		gds_mitigation == GDS_MITIGATION_FULL_LOCKED);
>  }
> diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
> index 1e81d0704eea..23cf4a235f11 100644
> --- a/arch/x86/kernel/keylocker.c
> +++ b/arch/x86/kernel/keylocker.c
> @@ -113,6 +113,15 @@ void restore_keylocker(void)
>  	valid_wrapping_key = false;
>  }
>  
> +/* Check if Key Locker is secure enough to be used. */
> +static bool __init secure_keylocker(void)
> +{
> +	if (boot_cpu_has_bug(X86_BUG_GDS) && !gds_ucode_mitigated(MITG_LOCKED))
> +		return false;
> +
> +	return true;
> +}
> +
>  static int __init init_keylocker(void)
>  {
>  	u32 eax, ebx, ecx, edx;
> @@ -126,6 +135,9 @@ static int __init init_keylocker(void)
>  		goto clear_cap;
>  	}
>  
> +	if (!secure_keylocker())
> +		goto clear_cap;
> +
>  	cr4_set_bits(X86_CR4_KEYLOCKER);
>  
>  	/* AESKLE depends on CR4.KEYLOCKER */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 47d9f03b7778..4ab50e95fdb5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1675,7 +1675,7 @@ static u64 kvm_get_arch_capabilities(void)
>  		 */
>  	}
>  
> -	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
> +	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated(MITG_FULL))
>  		data |= ARCH_CAP_GDS_NO;
>  
>  	return data;

Repurposing gds_ucode_mitigated() to check for the locked state adds
a bit of churn. We can introduce gds_mitigation_locked() instead.

Is below looking okay:

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f131f4..8ba96e8a8754 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -722,6 +722,7 @@ enum mds_mitigations {
 };
 
 extern bool gds_ucode_mitigated(void);
+extern bool gds_mitigation_locked(void);
 
 /*
  * Make previous memory operations globally visible before
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ca295b0c1eee..a7ec26988ddb 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -753,6 +753,11 @@ bool gds_ucode_mitigated(void)
 }
 EXPORT_SYMBOL_GPL(gds_ucode_mitigated);
 
+bool gds_mitigation_locked(void)
+{
+	return gds_mitigation == GDS_MITIGATION_FULL_LOCKED;
+}
+
 void update_gds_msr(void)
 {
 	u64 mcu_ctrl_after;
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 1b57e11d93ad..c40e72f482b1 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -112,6 +112,15 @@ void restore_keylocker(void)
 	valid_wrapping_key = false;
 }
 
+/* Check if Key Locker is secure enough to be used. */
+static bool __init secure_keylocker(void)
+{
+	if (boot_cpu_has_bug(X86_BUG_GDS) && !gds_mitigation_locked())
+		return false;
+
+	return true;
+}
+
 static int __init init_keylocker(void)
 {
 	u32 eax, ebx, ecx, edx;
@@ -125,6 +134,9 @@ static int __init init_keylocker(void)
 		goto clear_cap;
 	}
 
+	if (!secure_keylocker())
+		goto clear_cap;
+
 	cr4_set_bits(X86_CR4_KEYLOCKER);
 
 	/* AESKLE depends on CR4.KEYLOCKER */
Re: [PATCH v9a 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
Posted by Chang S. Bae 1 year, 9 months ago
On 4/18/2024 5:01 PM, Pawan Gupta wrote:
> 
> Repurposing gds_ucode_mitigated() to check for the locked state is
> adding a bit of a churn. We can introduce gds_mitigation_locked()
> instead.

I considered this option, but I was less convinced about adding a new 
function for every new but slightly different check.

Thanks,
Chang
[PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
Posted by Pawan Gupta 1 year, 9 months ago
In order to safely enable the Intel Key Locker feature, the Gather Data
Sampling (GDS) mitigation should be enabled and locked. Hardware provides
a way to lock the mitigation, such that it cannot be disabled until the
CPU is reset. Currently, the GDS mitigation is enabled without the lock.

Below is the recommendation from Intel:

  "Intel recommends that system software does not enable Key Locker (by
  setting CR4.KL) unless the GDS mitigation is enabled (IA32_MCU_OPT_CTRL
  [GDS_MITG_DIS] (bit 4) is 0) and locked (IA32_MCU_OPT_CTRL
  [GDS_MITG_LOCK](bit 5) is 1). This will prevent an adversary that takes
  control of the system from turning off the mitigation in order to infer
  the keys behind Key Locker handles." [1]

When the GDS mitigation is enabled and the Key Locker feature is present,
also lock the mitigation.

[1] Gather Data Sampling (ID# 785676)
    https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
This should ideally go before the patch that enables Keylocker. It is
only compile tested.

 arch/x86/kernel/cpu/bugs.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ca295b0c1eee..2777a58110e0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -755,8 +755,8 @@ EXPORT_SYMBOL_GPL(gds_ucode_mitigated);
 
 void update_gds_msr(void)
 {
-	u64 mcu_ctrl_after;
-	u64 mcu_ctrl;
+	u64 mcu_ctrl, mcu_ctrl_after;
+	u64 gds_lock = 0;
 
 	switch (gds_mitigation) {
 	case GDS_MITIGATION_OFF:
@@ -769,6 +769,8 @@ void update_gds_msr(void)
 		 * the same state. Make sure the mitigation is enabled on all
 		 * CPUs.
 		 */
+		gds_lock = GDS_MITG_LOCKED;
+		fallthrough;
 	case GDS_MITIGATION_FULL:
 		rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
 		mcu_ctrl &= ~GDS_MITG_DIS;
@@ -779,6 +781,7 @@ void update_gds_msr(void)
 		return;
 	}
 
+	mcu_ctrl |= gds_lock;
 	wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
 
 	/*
@@ -840,6 +843,11 @@ static void __init gds_select_mitigation(void)
 		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
 	}
 
+	/* Keylocker can only be enabled when GDS mitigation is locked */
+	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
+	    gds_mitigation == GDS_MITIGATION_FULL)
+		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
+
 	update_gds_msr();
 out:
 	pr_info("%s\n", gds_strings[gds_mitigation]);

---
base-commit: 0bbac3facb5d6cc0171c45c9873a2dc96bea9680
change-id: 20240418-gds-lock-26ecbce88470

Best regards,
-- 
Thanks,
Pawan
Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
Posted by Chang S. Bae 1 year, 9 months ago
On 4/19/2024 10:47 AM, Pawan Gupta wrote:
>   
>   	/*
> @@ -840,6 +843,11 @@ static void __init gds_select_mitigation(void)
>   		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
>   	}
>   
> +	/* Keylocker can only be enabled when GDS mitigation is locked */
> +	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
> +	    gds_mitigation == GDS_MITIGATION_FULL)
> +		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> +

I'm having trouble understanding this change:

gds_select_mitigation()
{
	...
	if (gds_mitigation == GDS_MITIGATION_FORCE)
		gds_mitigation = GDS_MITIGATION_FULL;

	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
	if (mcu_ctrl & GDS_MITG_LOCKED) {
		...
		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
	}

	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
	    gds_mitigation == GDS_MITIGATION_FULL)
		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;

As I understand it, gds_mitigation is set to GDS_MITIGATION_FULL only if 
the gds force option is enabled but IA32_MCU_OPT_CTRL[GDS_MITG_LOCK] is 
not set.

Then, if the CPU has Key Locker, this code sets gds_mitigation to 
GDS_MITIGATION_FULL_LOCKED, which seems contradictory. I'm not sure why 
this change is necessary.

I'm also not convinced that the Key Locker series needs to modify this 
function. The Key Locker setup code should simply check the current 
mitigation status and enable the feature only if proper mitigation is in 
place. Am I missing something here?

Thanks,
Chang
Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
Posted by Pawan Gupta 1 year, 9 months ago
On Mon, Apr 22, 2024 at 12:35:45AM -0700, Chang S. Bae wrote:
> On 4/19/2024 10:47 AM, Pawan Gupta wrote:
> >   	/*
> > @@ -840,6 +843,11 @@ static void __init gds_select_mitigation(void)
> >   		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> >   	}
> > +	/* Keylocker can only be enabled when GDS mitigation is locked */
> > +	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
> > +	    gds_mitigation == GDS_MITIGATION_FULL)
> > +		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> > +
> 
> I'm having trouble understanding this change:
> 
> gds_select_mitigation()
> {
> 	...
> 	if (gds_mitigation == GDS_MITIGATION_FORCE)
> 		gds_mitigation = GDS_MITIGATION_FULL;
> 
> 	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
> 	if (mcu_ctrl & GDS_MITG_LOCKED) {
> 		...
> 		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> 	}
> 
> 	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
> 	    gds_mitigation == GDS_MITIGATION_FULL)
> 		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> 
> As I understand it, gds_mitigation is set to GDS_MITIGATION_FULL only if the
> gds force option is enabled but IA32_MCU_OPT_CTRL[GDS_MITG_LOCK] is not set.

Not true, GDS_MITIGATION_FULL is the default. The cmdline option
gather_data_sampling=force deploys a software fallback mitigation when
the microcode mitigation is not present. But, when the microcode
mitigation is present, the mitigation is set to GDS_MITIGATION_FULL.

> Then, if the CPU has Key Locker, this code sets gds_mitigation to
> GDS_MITIGATION_FULL_LOCKED, which seems contradictory. I'm not sure why this
> change is necessary.
>
> I'm also not convinced that the Key Locker series needs to modify this
> function. The Key Locker setup code should simply check the current
> mitigation status and enable the feature only if proper mitigation is in
> place. Am I missing something here?

To enable the Key Locker feature, the "proper mitigation" is the
microcode mitigation enabled and the GDS_MITG_LOCK bit set in
MSR_IA32_MCU_OPT_CTRL. Do you agree?

If not via this patch, how is GDS_MITG_LOCK going to be set?

Below is from Intel's documentation:

  "Intel recommends that system software does not enable Key Locker (by
  setting CR4.KL) unless the GDS mitigation is enabled (IA32_MCU_OPT_CTRL
  [GDS_MITG_DIS] (bit 4) is 0) and locked (IA32_MCU_OPT_CTRL
  [GDS_MITG_LOCK](bit 5) is 1). This will prevent an adversary that takes
  control of the system from turning off the mitigation in order to infer
  the keys behind Key Locker handles.

  To support GDS mitigation locking for Key Locker, microcode updates
  for Tiger Lake systems enable the following model-specific behavior
  for GDS_MITG_LOCK. On these systems, a write to IA32_MCU_OPT_CTRL MSR
  with GDS_MITG_DIS (bit 4) value 0 and GDS_MITG_LOCK (bit 5) value 1
  will lock both bits at these values until reset."
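
To make that concrete, the lock could be applied with something like
the sketch below, using the kernel's existing GDS_MITG_* definitions
(the patch above wires the equivalent into update_gds_msr()):

	u64 mcu_ctrl;

	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
	mcu_ctrl &= ~GDS_MITG_DIS;	/* mitigation enabled... */
	mcu_ctrl |= GDS_MITG_LOCKED;	/* ...and locked until reset */
	wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);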
Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
Posted by Chang S. Bae 1 year, 9 months ago
On 4/22/2024 2:32 PM, Pawan Gupta wrote:
> 
> To enable Key Locker feature, "proper mitigation" is microcode mitigation
> enabled and the GDS_MITG_LOCK bit set in MSR_IA32_MCU_OPT_CTRL. Do you
> agree?
> > If not via this patch, how is GDS_MITG_LOCK going to be set?

The lock bit seems to be set by the microcode when SGX is available. 
However, it does seem odd if the lock bit is not also set for Key 
Locker. Introducing kernel code to override this situation might be 
seen as a workaround rather than a proper solution, potentially 
leading to more confusion.

I'd rather investigate the behavior of the microcode further, verify its 
consistency, and gain a clearer understanding of the requirement for 
this lock bit.

Thanks,
Chang
Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
Posted by Daniel Sneddon 1 year, 9 months ago
On 4/19/24 10:47, Pawan Gupta wrote:
> +	u64 gds_lock = 0;
>  
>  	switch (gds_mitigation) {
>  	case GDS_MITIGATION_OFF:
> @@ -769,6 +769,8 @@ void update_gds_msr(void)
>  		 * the same state. Make sure the mitigation is enabled on all
>  		 * CPUs.
>  		 */
> +		gds_lock = GDS_MITG_LOCKED;
Can't we just drop the new gds_lock var and set mcu_ctrl |= GDS_MITG_LOCKED here?
> +		fallthrough;
>  	case GDS_MITIGATION_FULL:
>  		rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
>  		mcu_ctrl &= ~GDS_MITG_DIS;
> @@ -779,6 +781,7 @@ void update_gds_msr(void)
>  		return;
>  	}
>  
> +	mcu_ctrl |= gds_lock;
>  	wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
Posted by Pawan Gupta 1 year, 9 months ago
On Fri, Apr 19, 2024 at 11:03:28AM -0700, Daniel Sneddon wrote:
> On 4/19/24 10:47, Pawan Gupta wrote:
> > +	u64 gds_lock = 0;
> >  
> >  	switch (gds_mitigation) {
> >  	case GDS_MITIGATION_OFF:
> > @@ -769,6 +769,8 @@ void update_gds_msr(void)
> >  		 * the same state. Make sure the mitigation is enabled on all
> >  		 * CPUs.
> >  		 */
> > +		gds_lock = GDS_MITG_LOCKED;
> Can't we just drop the new gds_lock var and set mcu_ctrl |= GDS_MITG_LOCKED here?

Unfortunately no, because ...

> > +		fallthrough;
> >  	case GDS_MITIGATION_FULL:
> >  		rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);

... mcu_ctrl is read here, it will overwrite any previous value.

> >  		mcu_ctrl &= ~GDS_MITG_DIS;
> > @@ -779,6 +781,7 @@ void update_gds_msr(void)
> >  		return;
> >  	}
> >  
> > +	mcu_ctrl |= gds_lock;
> >  	wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
Posted by Daniel Sneddon 1 year, 9 months ago
On 4/19/24 13:19, Pawan Gupta wrote:
> ... mcu_ctrl is read here, it will overwrite any previous value.

Ah, yep. Bummer.
[PATCH v9 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation
Posted by Chang S. Bae 1 year, 10 months ago
The Register File Data Sampling vulnerability may allow malicious
userspace programs to infer stale kernel register data, potentially
exposing sensitive key values, including AES keys.

To address this vulnerability, a microcode update needs to be applied to
the CPU, which modifies the VERW instruction to flush the affected CPU
buffers.

The kernel already has a facility to flush CPU buffers before returning
to userspace, which is indicated by the X86_FEATURE_CLEAR_CPU_BUF flag.

Ensure the mitigation is in place before enabling Key Locker. Do not
enable the feature on CPUs that are affected by the vulnerability but
lack the mitigation.
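
For context, that facility is the VERW issued on the exit-to-userspace
path; a sketch of its shape in the entry code of that period (the
macro name is from mainline, details simplified):

	/* arch/x86/entry: just before returning to userspace */
		CLEAR_CPU_BUFFERS	/* ALTERNATIVE that emits VERW when
					 * X86_FEATURE_CLEAR_CPU_BUF is set */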

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Change from v8:
* Add as a new patch.

Note that the code change follows the mitigation guidance [1]:
  "Software loading Key Locker keys using LOADIWKEY should execute a VERW
   to clear registers before transitioning to untrusted code to prevent
   later software from inferring the loaded key."

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
---
 arch/x86/kernel/keylocker.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index d4f3aa65ea8a..6e805c4da76d 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -135,12 +135,29 @@ static bool __init have_gds_mitigation(void)
 	return false;
 }
 
+/*
+ * IA32_ARCH_CAPABILITIES MSR is retrieved during the setting of
+ * X86_BUG_RFDS. Ensure that the mitigation is applied to flush CPU
+ * buffers by checking the flag.
+ */
+static bool __init have_rfds_mitigation(void)
+{
+	if (boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
+		return true;
+
+	pr_warn("x86/keylocker: Susceptible to the RFDS vulnerability.\n");
+	return false;
+}
+
 /* Check if Key Locker is secure enough to be used. */
 static bool __init secure_keylocker(void)
 {
 	if (boot_cpu_has_bug(X86_BUG_GDS) && !have_gds_mitigation())
 		return false;
 
+	if (boot_cpu_has_bug(X86_BUG_RFDS) && !have_rfds_mitigation())
+		return false;
+
 	return true;
 }
 
-- 
2.34.1
Re: [PATCH v9 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation
Posted by Pawan Gupta 1 year, 10 months ago
On Thu, Mar 28, 2024 at 06:53:43PM -0700, Chang S. Bae wrote:
> The Register File Data Sampling vulnerability may allow malicious
> userspace programs to infer stale kernel register data, potentially
> exposing sensitive key values, including AES keys.
> 
> To address this vulnerability, a microcode update needs to be applied to
> the CPU, which modifies the VERW instruction to flush the affected CPU
> buffers.
> 
> The kernel already has a facility to flush CPU buffers before returning
> to userspace, which is indicated by the X86_FEATURE_CLEAR_CPU_BUF flag.
> 
> Ensure the mitigation before enabling Key Locker. Do not enable the
> feature on CPUs affected by the vulnerability but lacks mitigation.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
> Change from v8:
> * Add as a new patch.
> 
> Note that the code change follows the mitigation guidance [1]:
>   "Software loading Key Locker keys using LOADIWKEY should execute a VERW
>    to clear registers before transitioning to untrusted code to prevent
>    later software from inferring the loaded key."
> 
> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
> ---
>  arch/x86/kernel/keylocker.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
> index d4f3aa65ea8a..6e805c4da76d 100644
> --- a/arch/x86/kernel/keylocker.c
> +++ b/arch/x86/kernel/keylocker.c
> @@ -135,12 +135,29 @@ static bool __init have_gds_mitigation(void)
>  	return false;
>  }
>  
> +/*
> + * IA32_ARCH_CAPABILITIES MSR is retrieved during the setting of
> + * X86_BUG_RFDS. Ensure that the mitigation is applied to flush CPU
> + * buffers by checking the flag.
> + */
> +static bool __init have_rfds_mitigation(void)
> +{
> +	if (boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
> +		return true;

X86_FEATURE_CLEAR_CPU_BUF is also set by other VERW based mitigations
like MDS. The feature flag does not guarantee that the microcode
required to mitigate RFDS is loaded.

A more robust check would be:

	if (rfds_mitigation == RFDS_MITIGATION_VERW)
		return true;

And it would be apt to move this function to arch/x86/kernel/cpu/bugs.c

> +
> +	pr_warn("x86/keylocker: Susceptible to the RFDS vulnerability.\n");
> +	return false;
> +}
> +
>  /* Check if Key Locker is secure enough to be used. */
>  static bool __init secure_keylocker(void)
>  {
>  	if (boot_cpu_has_bug(X86_BUG_GDS) && !have_gds_mitigation())
>  		return false;
>  
> +	if (boot_cpu_has_bug(X86_BUG_RFDS) && !have_rfds_mitigation())
> +		return false;
> +
>  	return true;
>  }
[PATCH v9a 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation
Posted by Chang S. Bae 1 year, 10 months ago
The Register File Data Sampling vulnerability may allow malicious
userspace programs to infer stale kernel register data, potentially
exposing sensitive key values, including AES keys.

To address this vulnerability, a microcode update needs to be applied to
the CPU, which modifies the VERW instruction to flush the affected CPU
buffers.

Reference the 'rfds_mitigation' variable to check the mitigation status.
Do not enable Key Locker on CPUs affected by the vulnerability but
lacking mitigation.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Changes from v9:
* Remove the helper function and simplify the code by directly reading
  the status variable. (Pawan Gupta)

Note that this code change aligns with mitigation guidance, which
recommends:
  "Software loading Key Locker keys using LOADIWKEY should execute a VERW
   to clear registers before transitioning to untrusted code to prevent
   later software from inferring the loaded key."

For more information, refer to Intel's guidance on Register File Data
Sampling:
  https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
---
 arch/x86/include/asm/processor.h | 8 ++++++++
 arch/x86/kernel/cpu/bugs.c       | 8 +-------
 arch/x86/kernel/keylocker.c      | 3 +++
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 74eaa3a2b85b..b823163f4786 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -728,6 +728,14 @@ enum mitigation_info {
 
 extern bool gds_ucode_mitigated(enum mitigation_info mitg);
 
+enum rfds_mitigations {
+	RFDS_MITIGATION_OFF,
+	RFDS_MITIGATION_VERW,
+	RFDS_MITIGATION_UCODE_NEEDED,
+};
+
+extern enum rfds_mitigations rfds_mitigation;
+
 /*
  * Make previous memory operations globally visible before
  * a WRMSR.
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 80f6e70619cb..a2ba1a0ef872 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -483,14 +483,8 @@ early_param("mmio_stale_data", mmio_stale_data_parse_cmdline);
 #undef pr_fmt
 #define pr_fmt(fmt)	"Register File Data Sampling: " fmt
 
-enum rfds_mitigations {
-	RFDS_MITIGATION_OFF,
-	RFDS_MITIGATION_VERW,
-	RFDS_MITIGATION_UCODE_NEEDED,
-};
-
 /* Default mitigation for Register File Data Sampling */
-static enum rfds_mitigations rfds_mitigation __ro_after_init =
+enum rfds_mitigations rfds_mitigation __ro_after_init =
 	IS_ENABLED(CONFIG_MITIGATION_RFDS) ? RFDS_MITIGATION_VERW : RFDS_MITIGATION_OFF;
 
 static const char * const rfds_strings[] = {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 23cf4a235f11..09876693414c 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -119,6 +119,9 @@ static bool __init secure_keylocker(void)
 	if (boot_cpu_has_bug(X86_BUG_GDS) && !gds_ucode_mitigated(MITG_LOCKED))
 		return false;
 
+	if (boot_cpu_has_bug(X86_BUG_RFDS) && rfds_mitigation != RFDS_MITIGATION_VERW)
+		return false;
+
 	return true;
 }
 
-- 
2.40.1
[PATCH v9 12/14] x86/Kconfig: Add a configuration for Key Locker
Posted by Chang S. Bae 1 year, 10 months ago
Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at
boot. The option is selected by the Key Locker cipher module
CRYPTO_AES_KL (to be added in a later patch).

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
---
Changes from v8:
* Drop the "nokeylocker" option. (Borislav Petkov)

Changes from v6:
* Rebase on the upstream: commit a894a8a56b57 ("Documentation:
  kernel-parameters: sort all "no..." parameters")

Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
 arch/x86/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 39886bab943a..41eb88dcfb62 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1878,6 +1878,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config X86_KEYLOCKER
+	bool
+
 choice
 	prompt "TSX enable mode"
 	depends on CPU_SUP_INTEL
-- 
2.34.1
[PATCH v9b 12/14] x86/Kconfig: Add symbols for Key Locker
Posted by Chang S. Bae 1 year, 8 months ago
Add CONFIG_X86_KEYLOCKER to control whether Key Locker is initialized at
boot. Additionally, add the AS_KEYLOCKER config symbol to indicate
whether the assembler supports Key Locker.

The former will be selected, and the latter will be referenced by the Key
Locker cipher module CRYPTO_AES_KL, to be added in a later patch.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from v9:
* Include AS_KEYLOCKER symbol (Eric Biggers).
* Revoke the earlier tag.
---
 arch/x86/Kconfig           | 3 +++
 arch/x86/Kconfig.assembler | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d7122a1883e..ce4e4c1641da 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1881,6 +1881,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config X86_KEYLOCKER
+	bool
+
 choice
 	prompt "TSX enable mode"
 	depends on CPU_SUP_INTEL
diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 59aedf32c4ea..e6ce80d23113 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -35,6 +35,11 @@ config AS_VPCLMULQDQ
 	help
 	  Supported by binutils >= 2.30 and LLVM integrated assembler
 
+config AS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config AS_WRUSS
 	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
 	help
-- 
2.34.1
[PATCH v9 13/14] crypto: x86/aes - Prepare for new AES-XTS implementation
Posted by Chang S. Bae 1 year, 10 months ago
Key Locker's AES instruction set ('AES-KL') shares a similar
programming interface with AES-NI. The internal ABI in the assembly code
will have the same prototype as AES-NI, and the glue code will also be
identical.

The upcoming AES code will exclusively support the XTS mode as disk
encryption is the only intended use case.

Refactor the XTS-related code to eliminate code duplication and relocate
certain constant values to make them shareable. Also, introduce wrappers
for data transformation functions to return an error code, as AES-KL may
produce one.

Introduce union x86_aes_ctx as AES-KL will reference an encoded form
instead of an expanded AES key. This allows different AES context formats
in the shared code.

Inline the refactored code into its callers to avoid the potential
overhead of indirect calls.

No functional changes or performance regressions are intended.
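
Condensed, the shape of the refactor looks as follows (a sketch only;
the names match the diff below):

	/* One context type per x86 AES implementation sharing the glue code. */
	union x86_aes_ctx {
		struct crypto_aes_ctx aesni;	/* expanded key (AES-NI) */
		/* AES-KL later adds its encoded key-handle form here */
	};

	/*
	 * Wrappers adapt the void AES-NI primitives to an int-returning
	 * internal ABI, since the AES-KL equivalents can fail at runtime:
	 */
	static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
	{
		__aesni_enc(ctx, out, in);
		return 0;	/* AES-NI itself cannot fail */
	}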

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
Changes from v8:
* Rebase on the AES-NI changes in the mainline -- mostly cleanup works.
* Introduce 'union x86_aes_ctx'. (Eric Biggers)
* Ensure 'inline' for wrapper functions.
* Tweak the changelog.

Changes from v7:
* Remove aesni_dec() as not referenced by the refactored helpers. But,
  keep the ASM symbol '__aesni_dec' to make it appear consistent with
  its counterpart '__aesni_enc'.
* Call out 'AES-XTS' in the subject.

Changes from v6:
* Inline the helper code to avoid the indirect call. (Eric Biggers)
* Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
* Don't export symbols yet here. Instead, do it when needed later.
* Improve the coding style:
  - Follow the symbol convention: '_' -> '__' (Eric Biggers)
  - Fix a style issue -- 'dst = src = ...' caught by checkpatch.pl:
    "CHECK: multiple assignments should be avoided"
* Cleanup: move some define back to AES-NI code as not used by AES-KL

Changes from v5:
* Clean up the staled function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() for the possible error return from
  xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
---
 arch/x86/crypto/aes-helper_asm.S   |  22 +++
 arch/x86/crypto/aes-helper_glue.h  | 167 ++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |  47 +++----
 arch/x86/crypto/aesni-intel_glue.c | 213 ++++++++---------------------
 4 files changed, 261 insertions(+), 188 deletions(-)
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h

diff --git a/arch/x86/crypto/aes-helper_asm.S b/arch/x86/crypto/aes-helper_asm.S
new file mode 100644
index 000000000000..b31abcdf63cb
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_asm.S
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
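
The gf128mul_x_ble mask above serves the XTS tweak update: multiplying
the 128-bit tweak by x in GF(2^128), using the little-endian block
convention. A C sketch of the equivalent operation (illustrative only;
the assembly achieves the same carry handling with a per-qword shift
plus the 0x01/0x87 mask):

	static void gf128mul_x_ble(u64 iv[2])	/* iv[0] = low, iv[1] = high */
	{
		u64 carry_lo = iv[0] >> 63;	/* bit moving into the high half */
		u64 carry_hi = iv[1] >> 63;	/* bit shifted out of the top */

		iv[1] = (iv[1] << 1) | carry_lo;
		/* reduce modulo x^128 + x^7 + x^2 + x + 1 */
		iv[0] = (iv[0] << 1) ^ (carry_hi ? 0x87 : 0);
	}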
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
new file mode 100644
index 000000000000..52ba1fe5cf71
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -0,0 +1,167 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ *
+ * The helper code is inlined for performance reasons. With mitigations for
+ * speculative execution, such as retpoline, indirect calls become expensive
+ * enough to incur measurable overhead.
+ */
+
+#ifndef _AES_HELPER_GLUE_H
+#define _AES_HELPER_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN		16
+#define AES_ALIGN_ATTR		__attribute__((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define XTS_AES_CTX_SIZE	(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+/*
+ * Preserve data types for various AES implementations available in x86
+ */
+union x86_aes_ctx {
+	struct crypto_aes_ctx aesni;
+};
+
+struct aes_xts_ctx {
+	union x86_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	union x86_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline void *aes_align_addr(void *addr)
+{
+	return (crypto_tfm_ctx_alignment() >= AES_ALIGN) ? addr : PTR_ALIGN(addr, AES_ALIGN);
+}
+
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+	return aes_align_addr(crypto_skcipher_ctx(tfm));
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		  int (*fn)(union x86_aes_ctx *ctx, const u8 *in_key, unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(&ctx->crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(&ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+		 int (*crypt_fn)(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				 unsigned int len, u8 *iv),
+		 int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+		else
+			dst = src;
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+
+#endif /* _AES_HELPER_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 7ecb55cae3d6..1015a36a73a0 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-helper_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1934,9 +1935,9 @@ SYM_FUNC_START(aesni_set_key)
 SYM_FUNC_END(aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(__aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1955,7 +1956,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(__aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2123,9 +2124,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(__aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2145,7 +2146,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(__aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2688,22 +2689,14 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	RET
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
+#ifdef __x86_64__
+
 .pushsection .rodata
 .align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
 .Lbswap_mask:
 	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
 .popsection
 
-#ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
  *	setup registers used by _aesni_inc
@@ -2818,12 +2811,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2843,10 +2830,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(__aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2995,13 +2982,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(__aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(__aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3157,4 +3144,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(__aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 0ea3abaaa645..4ac7b9a28967 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,25 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-helper_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
+#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
+#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,17 +72,10 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-static inline void *aes_align_addr(void *addr)
-{
-	if (crypto_tfm_ctx_alignment() >= AESNI_ALIGN)
-		return addr;
-	return PTR_ALIGN(addr, AESNI_ALIGN);
-}
-
 asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
 			      unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -104,14 +89,20 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_enc(ctx, out, in);
+	return 0;
+}
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
 
 #ifdef CONFIG_X86_64
 
@@ -223,11 +214,6 @@ static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 	return aes_align_addr(raw_ctx);
 }
 
-static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
-{
-	return aes_align_addr(crypto_skcipher_ctx(tfm));
-}
-
 static int aes_set_key_common(struct crypto_aes_ctx *ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -261,7 +247,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		__aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -274,11 +260,31 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		__aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
 
+static inline int aesni_xts_setkey(union x86_aes_ctx *ctx,
+				   const u8 *in_key, unsigned int key_len)
+{
+	return aes_set_key_common(&ctx->aesni, in_key, key_len);
+}
+
+static inline int aesni_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_encrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
+static inline int aesni_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_decrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
 static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			         unsigned int len)
 {
@@ -524,7 +530,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			__aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -668,8 +674,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -824,8 +830,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -852,8 +858,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -878,126 +884,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(&ctx->crypt_ctx, key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(&ctx->tweak_ctx, key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aesni_xts_setkey);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1152,8 +1049,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1169,8 +1066,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
-- 
2.34.1
Re: [PATCH v9 13/14] crypto: x86/aes - Prepare for new AES-XTS implementation
Posted by Chang S. Bae 1 year, 8 months ago
This patch is no longer needed, as the rework leads to more invasive
code changes on the AES-NI side.

Thanks,
Chang
[PATCH v9 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm
Posted by Chang S. Bae 1 year, 10 months ago
Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion, as well as all subsequent data transformations, is
provided by new AES instructions ('AES-KL'). AES-KL is analogous to
AES-NI, as it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The support
has some details worth mentioning that differentiate it from AES-NI and
that users may need to be aware of:

== Key Handle Restriction ==

The AES-KL instruction set supports selecting key usage restrictions at
key handle creation time. Restrict all key handles created by the kernel
to kernel mode use only.

Although the AES-KL instructions themselves are executable in userspace,
this restriction enforces mode consistency in their operation.

If the key handle is created in userspace but referenced in the kernel,
then encrypt() and decrypt() functions will return -EINVAL.
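
For reference, the restriction is selected through the ENCODEKEY* source
operand; the assembly in this patch passes the value 1, i.e. bit 0 set.
The bit assignments sketched below are assumptions taken from the ISA
reference, not constants defined by this series:

	/* Assumed ENCODEKEY* restriction bits -- illustrative only */
	#define AESKL_HANDLE_CPL0_ONLY	(1U << 0)	/* kernel-mode use only */
	#define AESKL_HANDLE_NO_ENCRYPT	(1U << 1)
	#define AESKL_HANDLE_NO_DECRYPT	(1U << 2)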

=== AES-NI Dependency for AES Compliance ===

Key Locker is not AES compliant as it lacks 192-bit key support.
However, per the expectations of Linux crypto-cipher implementations,
the software cipher implementation must support all the AES-compliant
key sizes.

The AES-KL cipher implementation satisfies this constraint by logging a
warning and falling back to AES-NI. In other words, the 192-bit
key-size limitation for what can be converted into a key handle is
only documented, not enforced.

This then creates a rather strong dependency on AES-NI. If this driver
supports a module build, the exported AES-NI functions cannot be inlined.
More importantly, indirect calls would impact performance.

To simplify, disallow a module build for AES-KL and always select AES-NI.
This restriction can be relaxed if strong use cases arise against it.

== Wrapping Key Restore Failure Handling ==

In the event of a hardware failure, the wrapping key is lost across deep
sleep states. The wrapping key then reverts to zero, which is an unusable
state.

The x86 core provides valid_keylocker() to indicate the failure.
Subsequent setkey() as well as encode()/decode() can check it and return
-ENODEV on failure. This way, an error code is returned instead of
triggering abrupt exceptions.
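
Sketched, the resulting calling convention looks like this (it mirrors
the glue code in this patch):

	static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
	{
		if (!valid_keylocker())
			return -ENODEV;	/* wrapping key lost, e.g. across deep sleep */

		return __aeskl_enc(ctx, out, in);
	}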

== Userspace Exposition ==

The Key Locker implementations so far have measurable performance
penalties, so keep AES-NI as the default.

However, with a slow storage device, storage bandwidth is the bottleneck,
even if disk encryption is enabled by AES-KL. Thus, selecting AES-KL is
an end-user decision. Users may pick it by the name 'xts-aes-aeskl'
shown in /proc/crypto.

== 64-bit Only ==

Support 64-bit only, as the 32-bit kernel is being deprecated.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
Changes from v8:
* Rebase on the upstream changes.
* Combine the XTS enc/dec assembly code in a macro. (Eric Biggers)
* Define setkey() as void instead of returning 'int'. (Eric Biggers)
* Rearrange the assembly code to reduce jumps especially for success
  cases. (Eric Biggers)
* Update the changelog for clarification. (Eric Biggers)
* Exclude module build.

Changes from v7:
* Update the changelog -- remove 'API Limitation'. (Eric Biggers)
* Update the comment for valid_keylocker(). (Eric Biggers)
* Improve the code:
  - Remove the key-length check and simplify the code. (Eric Biggers)
  - Remove aeskl_dec() and __aeskl_dec() as not needed.
  - Simplify the register-function return handling. (Eric Biggers)
  - Rename setkey functions for coherent naming:
    aeskl_setkey() -> __aeskl_setkey(),
    aeskl_setkey_common() -> aeskl_setkey(),
    aeskl_xts_setkey() -> xts_setkey()
  - Revert an unnecessary comment.

Changes from v6:
* Merge all the AES-KL patches. (Eric Biggers)
* Make the driver for the 64-bit mode only. (Eric Biggers)
* Rework the key-size check code:
  - Trim unnecessary checks. (Eric Biggers)
  - Document the reason
  - Make sure both XTS keys with the same size
* Adjust the Kconfig change:
  - Move the location. (Robert Elliott)
  - Trim the description to follow others such as AES-NI.
* Update the changelog:
  - Explain the priority value for the common name under 'User
    Exposition' (renamed from 'Performance'). (Eric Biggers)
  - Trim the introduction
  - Switch to more imperative mood for those explaining the code
    change
  - Add a new section '64-bit Only'
* Adjust the ASM code to return a proper error code. (Eric Biggers)
* Update assembly code macros:
  - Remove unused one.
  - Document the reason for the duplicated ones.

Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
  f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlstra)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
---
 arch/x86/Kconfig.assembler         |   5 +
 arch/x86/crypto/Kconfig            |  17 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aes-helper_glue.h  |   7 +-
 arch/x86/crypto/aeskl-intel_asm.S  | 412 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 187 +++++++++++++
 arch/x86/crypto/aeskl-intel_glue.h |  35 +++
 arch/x86/crypto/aesni-intel_glue.c |  30 +--
 arch/x86/crypto/aesni-intel_glue.h |  40 +++
 9 files changed, 704 insertions(+), 32 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 8ad41da301e5..0e58f2b61dd3 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -25,6 +25,11 @@ config AS_GFNI
 	help
 	  Supported by binutils >= 2.30 and LLVM integrated assembler
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config AS_WRUSS
 	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
 	help
diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index c9e59589a1ce..067bb149998b 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -29,6 +29,23 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	bool "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_HAS_KEYLOCKER
+	select CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..ae2aa7abd151 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
index 52ba1fe5cf71..262c1cec0011 100644
--- a/arch/x86/crypto/aes-helper_glue.h
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -19,16 +19,17 @@
 #include <crypto/internal/aead.h>
 #include <crypto/internal/simd.h>
 
+#include "aeskl-intel_glue.h"
+
 #define AES_ALIGN		16
 #define AES_ALIGN_ATTR		__attribute__((__aligned__(AES_ALIGN)))
 #define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
 #define XTS_AES_CTX_SIZE	(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
 
-/*
- * Preserve data types for various AES implementations available in x86
- */
+/* Data types for the two AES implementations available in x86 */
 union x86_aes_ctx {
 	struct crypto_aes_ctx aesni;
+	struct aeskl_ctx aeskl;
 };
 
 struct aes_xts_ctx {
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..81af7f61aab5
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,412 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-helper_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define INC	%xmm13
+
+#define IN	%xmm8
+
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * void __aeskl_setkey(struct aeskl_ctx *handlep, const u8 *ukeyp,
+ *		       unsigned int key_len)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+	FRAME_BEGIN
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
+	mov $1, %eax		# request a kernel (CPL0-only) key handle
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_setkey)
+
+/*
+ * int __aeskl_enc(const void *handlep, u8 *outp, const u8 *inp)
+ */
+SYM_FUNC_START(__aeskl_enc)
+	FRAME_BEGIN
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	xor %rax, %rax
+	jmp .Lenc_end
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+	xor %rax, %rax
+	jmp .Lenc_end
+
+.Lenc_err:
+	mov $(-EINVAL), %rax
+.Lenc_end:
+	movdqu STATE, (OUTP)
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: 	internal ABI
+ *	Multiply in GF(2^128) for XTS IVs
+ * input:
+ *	IV:	current IV
+ *	GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ *	IV:	next IV
+ * changed:
+ *	CTR:	== temporary value
+ *
+ * While based on the AES-NI code, this macro is separated here due to
+ * the register constraint. E.g., aesencwide256kl has implicit
+ * operands: XMM0-7.
+ */
+#define _aeskl_gf128mul_x_ble() \
+	pshufd $0x13, IV, KEY; \
+	paddq IV, IV; \
+	psrad $31, KEY; \
+	pand GF128MUL_MASK, KEY; \
+	pxor KEY, IV;
+
+.macro XTS_ENC_DEC operation
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+.ifc \operation, dec
+	test $15, LEN
+	jz .Lxts_op8_\@
+	sub $16, LEN
+.endif
+
+.Lxts_op8_\@:
+	sub $128, LEN
+	jl .Lxts_op1_pre_\@
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_op8_128_\@
+.ifc \operation, dec
+	aesdecwide256kl (%rdi)
+.else
+	aesencwide256kl (%rdi)
+.endif
+	jz .Lxts_op_err_\@
+	jmp .Lxts_op8_end_\@
+.Lxts_op8_128_\@:
+.ifc \operation, dec
+	aesdecwide128kl (%rdi)
+.else
+	aesencwide128kl (%rdi)
+.endif
+	jz .Lxts_op_err_\@
+
+.Lxts_op8_end_\@:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_op8_\@
+
+.Lxts_op_ret_\@:
+	movups IV, (IVP)
+	xor %rax, %rax
+	FRAME_END
+	RET
+
+.Lxts_op1_pre_\@:
+	add $128, LEN
+	jz .Lxts_op_ret_\@
+.ifc \operation, enc
+	sub $16, LEN
+	jl .Lxts_op_cts4_\@
+.endif
+
+.Lxts_op1_\@:
+	movdqu (INP), STATE1
+
+.ifc \operation, dec
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_op_cts1_\@
+.endif
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_op1_128_\@
+.ifc \operation, dec
+	aesdec256kl (HANDLEP), STATE1
+.else
+	aesenc256kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+	jmp .Lxts_op1_end_\@
+.Lxts_op1_128_\@:
+.ifc \operation, dec
+	aesdec128kl (HANDLEP), STATE1
+.else
+	aesenc128kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+
+.Lxts_op1_end_\@:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_op1_out_\@
+
+.ifc \operation, enc
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_op_cts1_\@
+.endif
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_op1_\@
+
+.Lxts_op1_out_\@:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_op_ret_\@
+
+.Lxts_op_cts4_\@:
+.ifc \operation, enc
+	movdqu STATE8, STATE1
+	sub $16, OUTP
+.endif
+
+.Lxts_op_cts1_\@:
+.ifc \operation, dec
+	movdqa IV, STATE5
+	_aeskl_gf128mul_x_ble()
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_pre_128_\@
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_op_err_\@
+	jmp .Lxts_dec1_cts_pre_end_\@
+.Lxts_dec1_cts_pre_128_\@:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_op_err_\@
+.Lxts_dec1_cts_pre_end_\@:
+	pxor IV, STATE1
+.endif
+
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN
+	pblendvb STATE3, IN
+	movaps IN, STATE1
+
+.ifc \operation, dec
+	pxor STATE5, STATE1
+.else
+	pxor IV, STATE1
+.endif
+
+	cmp $16, KLEN
+	je .Lxts_op1_cts_128_\@
+.ifc \operation, dec
+	aesdec256kl (HANDLEP), STATE1
+.else
+	aesenc256kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+	jmp .Lxts_op1_cts_end_\@
+.Lxts_op1_cts_128_\@:
+.ifc \operation, dec
+	aesdec128kl (HANDLEP), STATE1
+.else
+	aesenc128kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+
+.Lxts_op1_cts_end_\@:
+.ifc \operation, dec
+	pxor STATE5, STATE1
+.else
+	pxor IV, STATE1
+.endif
+	movups STATE1, (OUTP)
+	xor %rax, %rax
+	FRAME_END
+	RET
+
+.Lxts_op_err_\@:
+	mov $(-EINVAL), %rax
+	FRAME_END
+	RET
+.endm
+
+/*
+ * int __aeskl_xts_encrypt(const struct aeskl_ctx *handlep, u8 *outp,
+ *			   const u8 *inp, unsigned int klen, le128 *ivp)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	XTS_ENC_DEC enc
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct aeskl_ctx *handlep, u8 *outp,
+ *			   const u8 *inp, unsigned int klen, le128 *ivp)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	XTS_ENC_DEC dec
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..7672c4836da8
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains the glue
+ * code, while the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage void __aeskl_setkey(struct aeskl_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+
+/*
+ * If a hardware failure occurs, the wrapping key may be lost during
+ * sleep states. The state of the feature can be retrieved via
+ * valid_keylocker().
+ *
+ * Since the feature can be disabled at any point, check for
+ * availability on every use along with kernel_fpu_begin().
+ */
+
+static int aeskl_setkey(union x86_aes_ctx *ctx, const u8 *in_key, unsigned int keylen)
+{
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	err = aes_check_keylen(keylen);
+	if (err)
+		return err;
+
+	if (unlikely(keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		kernel_fpu_begin();
+		aesni_set_key(&ctx->aesni, in_key, keylen);
+		kernel_fpu_end();
+		return 0;
+	}
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	kernel_fpu_begin();
+	__aeskl_setkey(&ctx->aeskl, in_key, keylen);
+	kernel_fpu_end();
+	return 0;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(ctx, out, in);
+}
+
+static inline int aeskl_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(&ctx->aeskl, out, in, len, iv);
+}
+
+static inline int aeskl_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(&ctx->aeskl, out, in, len, iv);
+}
+
+static int xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+		      unsigned int keylen)
+{
+	return xts_setkey_common(tfm, key, keylen, aeskl_setkey);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
+
+	return ctx->crypt_ctx.aeskl.key_length;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= xts_setkey,
+		.encrypt	= xts_encrypt,
+		.decrypt	= xts_decrypt,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not rely on AES-NI, but AES-KL does not
+	 * support 192-bit keys. To ensure AES compliance, AES-KL falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					      aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aeskl-intel_glue.h b/arch/x86/crypto/aeskl-intel_glue.h
new file mode 100644
index 000000000000..57cfd6c55a4f
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _AESKL_INTEL_GLUE_H
+#define _AESKL_INTEL_GLUE_H
+
+#include <crypto/aes.h>
+#include <linux/types.h>
+
+#define AESKL_AAD_SIZE		16
+#define AESKL_TAG_SIZE		16
+#define AESKL_CIPHERTEXT_MAX	AES_KEYSIZE_256
+
+/* The Key Locker handle is an encoded form of the AES key. */
+struct aeskl_handle {
+	u8 additional_authdata[AESKL_AAD_SIZE];
+	u8 integrity_tag[AESKL_TAG_SIZE];
+	u8 cipher_text[AESKL_CIPHERTEXT_MAX];
+};
+
+/*
+ * Key Locker does not support the 192-bit key size, so the driver needs
+ * to retrieve the key size before dispatching. The offset of the
+ * 'key_length' field here must therefore be compatible with struct
+ * crypto_aes_ctx.
+ */
+#define AESKL_CTX_RESERVED (sizeof(struct crypto_aes_ctx) - sizeof(struct aeskl_handle) \
+			    - sizeof(u32))
+
+struct aeskl_ctx {
+	struct aeskl_handle handle;
+	u8 reserved[AESKL_CTX_RESERVED];
+	u32 key_length;
+};
+
+#endif /* _AESKL_INTEL_GLUE_H */
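
The layout constraint described in the comment above could also be
enforced at build time; a minimal sketch, not part of the patch:

	#include <linux/build_bug.h>

	static_assert(offsetof(struct aeskl_ctx, key_length) ==
		      offsetof(struct crypto_aes_ctx, key_length));
	static_assert(sizeof(struct aeskl_ctx) == sizeof(struct crypto_aes_ctx));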
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 4ac7b9a28967..d9c4aa055383 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -37,6 +37,7 @@
 #include <linux/static_call.h>
 
 #include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
 
 #define RFC4106_HASH_SUBKEY_SIZE 16
 #define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -72,9 +73,6 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			      unsigned int key_len);
-asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
@@ -89,21 +87,9 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
-static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
-{
-	__aesni_enc(ctx, out, in);
-	return 0;
-}
-
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				    const u8 *in, unsigned int len, u8 *iv);
-
-asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				    const u8 *in, unsigned int len, u8 *iv);
-
 #ifdef CONFIG_X86_64
 
 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -271,20 +257,6 @@ static inline int aesni_xts_setkey(union x86_aes_ctx *ctx,
 	return aes_set_key_common(&ctx->aesni, in_key, key_len);
 }
 
-static inline int aesni_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
-				    unsigned int len, u8 *iv)
-{
-	__aesni_xts_encrypt(&ctx->aesni, out, in, len, iv);
-	return 0;
-}
-
-static inline int aesni_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
-				    unsigned int len, u8 *iv)
-{
-	__aesni_xts_decrypt(&ctx->aesni, out, in, len, iv);
-	return 0;
-}
-
 static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			         unsigned int len)
 {
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..999f81f5bcde
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * These are AES-NI functions that are used by the AES-KL code as a
+ * fallback when it is given a 192-bit key. Key Locker does not support
+ * 192-bit keys.
+ */
+
+#ifndef _AESNI_INTEL_GLUE_H
+#define _AESNI_INTEL_GLUE_H
+
+asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+
+static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_enc(ctx, out, in);
+	return 0;
+}
+
+static inline int aesni_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_encrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
+static inline int aesni_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_decrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
+#endif /* _AESNI_INTEL_GLUE_H */
-- 
2.34.1
[PATCH v9a 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm
Posted by Chang S. Bae 1 year, 8 months ago
Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion, along with all subsequent data transformations, is
provided by new AES instructions ('AES-KL'). AES-KL is analogous to
AES-NI, as it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The support
has some details worth mentioning, which differentiate it from AES-NI
and which users may need to be aware of:

== Key Handle Restriction ==

The AES-KL instruction set supports selecting key usage restrictions at
key handle creation time. Restrict all key handles created by the kernel
to kernel mode use only.

Although the AES-KL instructions themselves are executable in userspace,
this restriction enforces mode consistency in their operation.

If the key handle is created in userspace but referenced in the kernel,
then encrypt() and decrypt() functions will return -EINVAL.

== AES-NI Dependency for AES Compliance ==

Key Locker is not AES compliant as it lacks 192-bit key support. However,
per the expectations of Linux crypto-cipher implementations, the software
cipher implementation must support all the AES-compliant key sizes.

The AES-KL cipher implementation satisfies this constraint by logging a
warning and falling back to AES-NI. In other words, the 192-bit key-size
limitation is documented but not enforced.

== Wrapping Key Restore Failure Handling ==

In the event of a hardware failure, the wrapping key is lost across deep
sleep states. The wrapping key then reads as zero, which is an unusable
state.

The x86 core provides valid_keylocker() to indicate the failure.
Subsequent setkey() as well as encrypt()/decrypt() calls can check it and
return -ENODEV on failure. This allows an error code to be returned
instead of triggering abrupt exceptions.

== Userspace Exposition ==

Key Locker implementations have measurable performance penalties.
Therefore, keep the current default the same.

However, with a slow storage device, storage bandwidth is the bottleneck
even when disk encryption uses AES-KL. Thus, it is up to the end user to
decide whether to use AES-KL. Users can select it by the name
'xts-aes-aeskl' shown in /proc/crypto.

== 64-bit Only ==

Support 64-bit only, as the 32-bit kernel is being deprecated.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
I've reworked this patch based on feedback,
    https://lore.kernel.org/lkml/20240408014806.GA965@quark.localdomain/
and rebased onto the v6.10 upstream merge in Linus' tree on May 13th: commit
84c7d76b5ab6 ("Merge tag 'v6.10-p1' of
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6")

According to the dm-crypt benchmark, using VEX-encoded instructions for
tweak processing enhances performance by approximately 2-3%. The
PCLMULQDQ instruction did not yield a measurable impact, so I dropped it
to simplify the implementation.

In contrast to other AES instructions, AES-KL does not permit tweak
processing between rounds. In XTS mode, a single instruction covers all
rounds of 8 blocks without interleaving instructions. Maybe this is one
of the reasons for the limited performance gain.
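
For reference, the tweak update performed by the VEX-coded _next_tweak
macro is plain multiplication by x in GF(2^128) with the reducing
polynomial x^128 + x^7 + x^2 + x + 1. A scalar C sketch of the same
computation (illustration only, not part of the patch):

	static void next_tweak(u64 t[2])	/* t[0] = low, t[1] = high */
	{
		u64 carry = t[1] >> 63;	/* bit shifted out of bit 127 */

		t[1] = (t[1] << 1) | (t[0] >> 63);
		t[0] <<= 1;
		if (carry)
			t[0] ^= 0x87;	/* fold back x^7 + x^2 + x + 1 */
	}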

Moving forward, I would like to address any further feedback on this
AES-KL driver code first before the next revision of the whole series.

Changes from v9:
* Duplicate the new XTS glue code, instead of sharing (Eric).
* Use VEX-coded instructions for non-AES parts of the code (Eric).
* Adjust ASM code to stylistically follow the new VAES support (Eric).
* Export and reference the high-level AES-NI XTS functions (Eric). Then,
  support a module build, along with rearranging build dependencies.
* Reorganize the glue code and improve ASM code readability.
* Revoke the review tag due to major changes.
---
 arch/x86/Kconfig.assembler         |   5 +
 arch/x86/crypto/Kconfig            |  18 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-xts-x86_64.S | 358 +++++++++++++++++++++++++++
 arch/x86/crypto/aeskl_glue.c       | 376 +++++++++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_glue.c |  13 +-
 arch/x86/crypto/aesni-xts.h        |  15 ++
 7 files changed, 783 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-xts-x86_64.S
 create mode 100644 arch/x86/crypto/aeskl_glue.c
 create mode 100644 arch/x86/crypto/aesni-xts.h

diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 59aedf32c4ea..89e326c9dbfe 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -35,6 +35,11 @@ config AS_VPCLMULQDQ
 	help
 	  Supported by binutils >= 2.30 and LLVM integrated assembler
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config AS_WRUSS
 	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
 	help
diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index c9e59589a1ce..d55704fc9a8f 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -29,6 +29,24 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	tristate "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_HAS_KEYLOCKER
+	select CRYPTO_AES_NI_INTEL
+	select CRYPTO_SIMD
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9c5ce5613738..c46fd2d9dd16 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -51,6 +51,9 @@ aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o \
 	aes_ctrby8_avx-x86_64.o aes-xts-avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-x86_64.o
+aeskl-x86_64-y := aeskl-xts-x86_64.o aeskl_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-xts-x86_64.S b/arch/x86/crypto/aeskl-xts-x86_64.S
new file mode 100644
index 000000000000..6ff8b5feebfc
--- /dev/null
+++ b/arch/x86/crypto/aeskl-xts-x86_64.S
@@ -0,0 +1,358 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is primarily derived from aesni-intel_asm.S and
+ * stylistically aligned with aes-xts-avx-x86_64.S.
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+
+/* Constant values shared between AES implementations: */
+
+.section .rodata
+.p2align 4
+.Lgf_poly:
+	/*
+	 * Represents the polynomial x^7 + x^2 + x + 1, where the low 64
+	 * bits are XOR'd into the tweak's low 64 bits when a carry
+	 * occurs from the high 64 bits.
+	 */
+	.quad	0x87, 1
+
+	/*
+	 * Table of constants for variable byte shifts and blending
+	 * during ciphertext stealing operations.
+	 */
+.Lcts_permute_table:
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+
+.text
+
+.set	V0,		%xmm0
+.set	V1,		%xmm1
+.set	V2,		%xmm2
+.set	V3,		%xmm3
+.set	V4,		%xmm4
+.set	V5,		%xmm5
+.set	V6,		%xmm6
+.set	V7,		%xmm7
+.set	V8,		%xmm8
+.set	V9,		%xmm9
+.set	V10,		%xmm10
+.set	V11,		%xmm11
+.set	V12,		%xmm12
+.set	V13,		%xmm13
+.set	V14,		%xmm14
+.set	V15,		%xmm15
+
+.set	TWEAK_XMM1,	V8
+.set	TWEAK_XMM2,	V9
+.set	TWEAK_XMM3,	V10
+.set	TWEAK_XMM4,	V11
+.set	TWEAK_XMM5,	V12
+.set	TWEAK_XMM6,	V13
+.set	TWEAK_XMM7,	V14
+.set	GF_POLY_XMM,	V15
+.set	TWEAK_TMP,	TWEAK_XMM1
+.set	TWEAK_XMM,	TWEAK_XMM2
+.set	TMP,		%r10
+
+/* Function parameters */
+.set	HANDLEP,	%rdi	/* Pointer to struct aeskl_ctx */
+.set	DST,		%rsi	/* Pointer to next destination data */
+.set	UKEYP,		DST	/* Pointer to the original key */
+.set	KLEN,		%r9d	/* AES key length in bytes */
+.set	SRC,		%rdx	/* Pointer to next source data */
+.set	LEN,		%rcx	/* Remaining length in bytes */
+.set	TWEAK,		%r8	/* Pointer to next tweak */
+
+/*
+ * void __aeskl_setkey(struct crypto_aes_ctx *handlep, const u8 *ukeyp,
+ *		       unsigned int key_len)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+	FRAME_BEGIN
+	movl		%edx, 480(HANDLEP)
+	vmovdqu		(UKEYP), V0
+	mov		$1, %eax
+	cmp		$16, %dl
+	je		.Lsetkey_128
+
+	vmovdqu		0x10(UKEYP), V1
+	encodekey256	%eax, %eax
+	vmovdqu		V3, 0x30(HANDLEP)
+	jmp		.Lsetkey_end
+.Lsetkey_128:
+	encodekey128	%eax, %eax
+
+.Lsetkey_end:
+	vmovdqu		V0, 0x00(HANDLEP)
+	vmovdqu		V1, 0x10(HANDLEP)
+	vmovdqu		V2, 0x20(HANDLEP)
+
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_setkey)
+
+.macro _aeskl		width, operation
+	cmp		$16, KLEN
+	je		.Laeskl128\@
+.ifc \width, wide
+ .ifc \operation, dec
+	aesdecwide256kl	(HANDLEP)
+ .else
+	aesencwide256kl	(HANDLEP)
+ .endif
+.else
+ .ifc \operation, dec
+	aesdec256kl	(HANDLEP), V0
+ .else
+	aesenc256kl	(HANDLEP), V0
+ .endif
+.endif
+	jmp		.Laesklend\@
+.Laeskl128\@:
+.ifc \width, wide
+ .ifc \operation, dec
+	aesdecwide128kl	(HANDLEP)
+ .else
+	aesencwide128kl	(HANDLEP)
+ .endif
+.else
+ .ifc \operation, dec
+	aesdec128kl	(HANDLEP), V0
+ .else
+	aesenc128kl	(HANDLEP), V0
+ .endif
+.endif
+.Laesklend\@:
+.endm
+
+/* int __aeskl_enc(const void *handlep, u8 *dst, const u8 *src) */
+SYM_FUNC_START(__aeskl_enc)
+	FRAME_BEGIN
+	vmovdqu		(SRC), V0
+	movl		480(HANDLEP), KLEN
+
+	_aeskl		oneblock, enc
+	jz		.Lerror
+	xor		%rax, %rax
+	vmovdqu		V0, (DST)
+	FRAME_END
+	RET
+.Lerror:
+	mov		$(-EINVAL), %rax
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * Calculate the next 128-bit XTS tweak by multiplying the polynomial 'x'
+ * with the current tweak stored in the xmm register \src, and store the
+ * result in \dst.
+ */
+.macro _next_tweak	src, tmp, dst
+	vpshufd		$0x13, \src, \tmp
+	vpaddq		\src, \src, \dst
+	vpsrad		$31, \tmp, \tmp
+	vpand		GF_POLY_XMM, \tmp, \tmp
+	vpxor		\tmp, \dst, \dst
+.endm
+
+.macro _aeskl_xts_crypt operation
+	FRAME_BEGIN
+	vmovdqa		.Lgf_poly(%rip), GF_POLY_XMM
+	vmovups		(TWEAK), TWEAK_XMM
+	mov		480(HANDLEP), KLEN
+
+.ifc \operation, dec
+	/*
+	 * During decryption, if the message length is not a multiple of
+	 * the AES block length, exclude the last complete block from the
+	 * decryption loop by subtracting 16 from LEN. This adjustment is
+	 * necessary because ciphertext stealing decryption uses the last
+	 * two tweaks in reverse order. Special handling is required for
+	 * the last complete block and any remaining partial block at the
+	 * end.
+	 */
+	test		$15, LEN
+	jz		.L8block_at_a_time\@
+	sub		$16, LEN
+.endif
+
+.L8block_at_a_time\@:
+	sub		$128, LEN
+	jl		.Lhandle_remainder\@
+
+	vpxor		(SRC), TWEAK_XMM, V0
+	vmovups		TWEAK_XMM, (DST)
+
+	/*
+	 * Calculate and cache tweak values. Note that the tweak
+	 * computation cannot be interleaved with AES rounds here using
+	 * Key Locker instructions.
+	 */
+	_next_tweak	TWEAK_XMM,  V1, TWEAK_XMM1
+	_next_tweak	TWEAK_XMM1, V1, TWEAK_XMM2
+	_next_tweak	TWEAK_XMM2, V1, TWEAK_XMM3
+	_next_tweak	TWEAK_XMM3, V1, TWEAK_XMM4
+	_next_tweak	TWEAK_XMM4, V1, TWEAK_XMM5
+	_next_tweak	TWEAK_XMM5, V1, TWEAK_XMM6
+	_next_tweak	TWEAK_XMM6, V1, TWEAK_XMM7
+
+	/* XOR each source block with its tweak. */
+	vpxor		0x10(SRC), TWEAK_XMM1, V1
+	vpxor		0x20(SRC), TWEAK_XMM2, V2
+	vpxor		0x30(SRC), TWEAK_XMM3, V3
+	vpxor		0x40(SRC), TWEAK_XMM4, V4
+	vpxor		0x50(SRC), TWEAK_XMM5, V5
+	vpxor		0x60(SRC), TWEAK_XMM6, V6
+	vpxor		0x70(SRC), TWEAK_XMM7, V7
+
+	/* Encrypt or decrypt 8 blocks per iteration. */
+	_aeskl		wide, \operation
+	jz		.Lerror\@
+
+	/* XOR tweaks again. */
+	vpxor		(DST), V0, V0
+	vpxor		TWEAK_XMM1, V1, V1
+	vpxor		TWEAK_XMM2, V2, V2
+	vpxor		TWEAK_XMM3, V3, V3
+	vpxor		TWEAK_XMM4, V4, V4
+	vpxor		TWEAK_XMM5, V5, V5
+	vpxor		TWEAK_XMM6, V6, V6
+	vpxor		TWEAK_XMM7, V7, V7
+
+	/* Store destination blocks. */
+	vmovdqu		V0, 0x00(DST)
+	vmovdqu		V1, 0x10(DST)
+	vmovdqu		V2, 0x20(DST)
+	vmovdqu		V3, 0x30(DST)
+	vmovdqu		V4, 0x40(DST)
+	vmovdqu		V5, 0x50(DST)
+	vmovdqu		V6, 0x60(DST)
+	vmovdqu		V7, 0x70(DST)
+
+	_next_tweak	TWEAK_XMM7, TWEAK_TMP, TWEAK_XMM
+	add		$128, SRC
+	add		$128, DST
+	test		LEN, LEN
+	jz		.Lend\@
+	jmp		.L8block_at_a_time\@
+
+.Lhandle_remainder\@:
+	add		$128, LEN
+	jz		.Lend\@
+.ifc \operation, enc
+	vmovdqu		V7, V0
+.endif
+	sub		$16, LEN
+	jl		.Lcts\@
+
+	/* Encrypt or decrypt one block per iteration */
+.Lblock_at_a_time\@:
+	vpxor		(SRC), TWEAK_XMM, V0
+	_aeskl		oneblock, \operation
+	jz		.Lerror\@
+	vpxor		TWEAK_XMM, V0, V0
+	_next_tweak	TWEAK_XMM, TWEAK_TMP, TWEAK_XMM
+	test		LEN, LEN
+	jz		.Lout\@
+
+	add		$16, SRC
+	vmovdqu		V0, (DST)
+	add		$16, DST
+	sub		$16, LEN
+	jge		.Lblock_at_a_time\@
+
+.Lcts\@:
+.ifc \operation, dec
+	/*
+	 * If decrypting, the last block was not decrypted because CTS
+	 * decryption uses the last two tweaks in reverse order. This is
+	 * done by advancing the tweak and decrypting the last block.
+	 */
+	_next_tweak	TWEAK_XMM, TWEAK_TMP, V4
+	vpxor		(SRC), V4, V0
+	_aeskl		oneblock, \operation
+	jz		.Lerror\@
+	vpxor		V4, V0, V0
+	add		$16, SRC
+.else
+	/*
+	 * If encrypting, the last block was already encrypted in V0.
+	 * Prepare the CTS encryption by rewinding the pointer.
+	 */
+	sub		$16, DST
+.endif
+	lea		.Lcts_permute_table(%rip), TMP
+
+	/* Load the source partial block */
+	vmovdqu		(SRC, LEN, 1), V3
+
+	/*
+	 * Shift the first LEN bytes of the encryption and decryption of
+	 * the last block to the end of a register, then store it to
+	 * DST+LEN.
+	 */
+	add		$16, LEN
+	vpshufb		(TMP, LEN, 1), V0, V2
+	vmovdqu		V2, (DST, LEN, 1)
+
+	/* Shift the source partial block to the beginning */
+	sub		LEN, TMP
+	vmovdqu		32(TMP), V2
+	vpshufb		V2, V3, V3
+
+	/* Blend to generate the source partial block */
+	vpblendvb	V2, V0, V3, V3
+
+	/* Encrypt or decrypt again and store the last block. */
+	vpxor		TWEAK_XMM, V3, V0
+	_aeskl		oneblock, \operation
+	jz		.Lerror\@
+	vpxor		TWEAK_XMM, V0, V0
+	vmovdqu		V0, (DST)
+
+	xor		%rax, %rax
+	FRAME_END
+	RET
+.Lout\@:
+	vmovdqu		V0, (DST)
+.Lend\@:
+	vmovups		TWEAK_XMM, (TWEAK)
+	xor		%rax, %rax
+	FRAME_END
+	RET
+.Lerror\@:
+	mov		$(-EINVAL), %rax
+	FRAME_END
+	RET
+.endm
+
+/*
+ * int __aeskl_xts_encrypt(const struct aeskl_ctx *handlep, u8 *dst,
+ *			   const u8 *src, unsigned int klen, le128 *tweak)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	_aeskl_xts_crypt	enc
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct crypto_aes_ctx *handlep, u8 *dst,
+ *			   const u8 *src, unsigned int klen, le128 *twek)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	_aeskl_xts_crypt	dec
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl_glue.c b/arch/x86/crypto/aeskl_glue.c
new file mode 100644
index 000000000000..6dc4d380be54
--- /dev/null
+++ b/arch/x86/crypto/aeskl_glue.c
@@ -0,0 +1,376 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include "aesni-xts.h"
+
+#define AESKL_ALIGN		16
+#define AESKL_ALIGN_ATTR	__attribute__ ((__aligned__(AESKL_ALIGN)))
+#define AESKL_ALIGN_EXTRA	((AESKL_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+
+#define AESKL_AAD_SIZE		16
+#define AESKL_TAG_SIZE		16
+#define AESKL_CIPHERTEXT_MAX	AES_KEYSIZE_256
+
+/* The Key Locker handle is an encoded form of the AES key. */
+struct aeskl_handle {
+	u8 additional_authdata[AESKL_AAD_SIZE];
+	u8 integrity_tag[AESKL_TAG_SIZE];
+	u8 ciphre_text[AESKL_CIPHERTEXT_MAX];
+};
+
+/*
+ * Key Locker does not support 192-bit key size. The driver needs to
+ * retrieve the key size in the first place. The offset of the
+ * 'key_length' field here should be compatible with struct
+ * crypto_aes_ctx.
+ */
+#define AESKL_CTX_RESERVED (sizeof(struct crypto_aes_ctx) - sizeof(struct aeskl_handle) \
+			    - sizeof(u32))
+
+struct aeskl_ctx {
+	struct aeskl_handle handle;
+	u8 reserved[AESKL_CTX_RESERVED];
+	u32 key_length;
+};
+
+struct aeskl_xts_ctx {
+	struct aeskl_ctx tweak_ctx AESKL_ALIGN_ATTR;
+	struct aeskl_ctx crypt_ctx AESKL_ALIGN_ATTR;
+};
+
+#define XTS_AES_CTX_SIZE (sizeof(struct aeskl_xts_ctx) + AESKL_ALIGN_EXTRA)
+
+static inline struct aeskl_xts_ctx *aeskl_xts_ctx(struct crypto_skcipher *tfm)
+{
+	void *addr = crypto_skcipher_ctx(tfm);
+
+	if (crypto_tfm_ctx_alignment() >= AESKL_ALIGN)
+		return addr;
+
+	return PTR_ALIGN(addr, AESKL_ALIGN);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+	struct aeskl_xts_ctx *ctx = aeskl_xts_ctx(crypto_skcipher_reqtfm(req));
+
+	return ctx->crypt_ctx.key_length;
+}
+
+asmlinkage void __aeskl_setkey(struct aeskl_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+
+/*
+ * If a hardware failure occurs, the wrapping key may be lost during
+ * sleep states. The state of the feature can be retrieved via
+ * valid_keylocker().
+ *
+ * Since disabling can occur preemptively, check for availability on
+ * every use along with kernel_fpu_begin().
+ */
+
+static int aeskl_setkey(struct aeskl_ctx *ctx, const u8 *in_key, unsigned int keylen)
+{
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	kernel_fpu_begin();
+	if (!valid_keylocker()) {
+		kernel_fpu_end();
+		return -ENODEV;
+	}
+
+	__aeskl_setkey(ctx, in_key, keylen);
+	kernel_fpu_end();
+	return 0;
+}
+
+static int aeskl_xts_encrypt_iv(const struct aeskl_ctx *tweak_key,
+				u8 iv[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(tweak_key, iv, iv);
+}
+
+static int aeskl_xts_encrypt(const struct aeskl_ctx *key,
+			     const u8 *src, u8 *dst, unsigned int len,
+			     u8 tweak[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(key, dst, src, len, tweak);
+}
+
+static int aeskl_xts_decrypt(const struct aeskl_ctx *key,
+			     const u8 *src, u8 *dst, unsigned int len,
+			     u8 tweak[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(key, dst, src, len, tweak);
+}
+
+/*
+ * The glue code in xts_crypt() and xts_crypt_slowpath() follows
+ * aesni-intel_glue.c. While this code is shareable, the key
+ * material format difference can cause more destructive code changes in
+ * the AES-NI side.
+ */
+
+typedef int (*xts_encrypt_iv_func)(const struct aeskl_ctx *tweak_key,
+				   u8 iv[AES_BLOCK_SIZE]);
+typedef int (*xts_crypt_func)(const struct aeskl_ctx *key,
+			      const u8 *src, u8 *dst, unsigned int len,
+			      u8 tweak[AES_BLOCK_SIZE]);
+
+/* This handles cases where the source and/or destination span pages. */
+static noinline int
+xts_crypt_slowpath(struct skcipher_request *req, xts_crypt_func crypt_func)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct aeskl_xts_ctx *ctx = aeskl_xts_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct scatterlist sg_src[2], sg_dst[2];
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	struct scatterlist *src, *dst;
+	int err;
+
+	/*
+	 * If the message length isn't divisible by the AES block size, then
+	 * separate off the last full block and the partial block.  This ensures
+	 * that they are processed in the same call to the assembly function,
+	 * which is required for ciphertext stealing.
+	 */
+	if (tail) {
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   req->cryptlen - tail - AES_BLOCK_SIZE,
+					   req->iv);
+		req = &subreq;
+	}
+
+	err = skcipher_walk_virt(&walk, req, false);
+
+	while (walk.nbytes) {
+		kernel_fpu_begin();
+		err |= (*crypt_func)(&ctx->crypt_ctx,
+				     walk.src.virt.addr, walk.dst.virt.addr,
+				     walk.nbytes & ~(AES_BLOCK_SIZE - 1), req->iv);
+		kernel_fpu_end();
+		err |= skcipher_walk_done(&walk,
+					  walk.nbytes & (AES_BLOCK_SIZE - 1));
+	}
+
+	if (err || !tail)
+		return err;
+
+	/* Do ciphertext stealing with the last full block and partial block. */
+
+	dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+	if (req->dst != req->src)
+		dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+	skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+				   req->iv);
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (err)
+		return err;
+
+	kernel_fpu_begin();
+	err = (*crypt_func)(&ctx->crypt_ctx, walk.src.virt.addr, walk.dst.virt.addr,
+			    walk.nbytes, req->iv);
+	kernel_fpu_end();
+	if (err)
+		return err;
+
+	return skcipher_walk_done(&walk, 0);
+}
+
+/* __always_inline to avoid indirect call in fastpath */
+static __always_inline int
+xts_crypt(struct skcipher_request *req, xts_encrypt_iv_func encrypt_iv,
+	  xts_crypt_func crypt_func)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct aeskl_xts_ctx *ctx = aeskl_xts_ctx(tfm);
+	const unsigned int cryptlen = req->cryptlen;
+	struct scatterlist *src = req->src;
+	struct scatterlist *dst = req->dst;
+	int err;
+
+	if (unlikely(cryptlen < AES_BLOCK_SIZE))
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	err = (*encrypt_iv)(&ctx->tweak_ctx, req->iv);
+	if (err)
+		goto out;
+
+	/*
+	 * In practice, virtually all XTS plaintexts and ciphertexts are either
+	 * 512 or 4096 bytes, aligned such that they don't span page boundaries.
+	 * To optimize the performance of these cases, and also any other case
+	 * where no page boundary is spanned, the below fast-path handles
+	 * single-page sources and destinations as efficiently as possible.
+	 */
+	if (likely(src->length >= cryptlen && dst->length >= cryptlen &&
+		   src->offset + cryptlen <= PAGE_SIZE &&
+		   dst->offset + cryptlen <= PAGE_SIZE)) {
+		struct page *src_page = sg_page(src);
+		struct page *dst_page = sg_page(dst);
+		void *src_virt = kmap_local_page(src_page) + src->offset;
+		void *dst_virt = kmap_local_page(dst_page) + dst->offset;
+
+		err = (*crypt_func)(&ctx->crypt_ctx, src_virt, dst_virt, cryptlen,
+				    req->iv);
+		if (err)
+			goto out;
+		kunmap_local(dst_virt);
+		kunmap_local(src_virt);
+		kernel_fpu_end();
+		return 0;
+	}
+out:
+	kernel_fpu_end();
+	if (err)
+		return err;
+	return xts_crypt_slowpath(req, crypt_func);
+}
+
+static int xts_setkey_aeskl(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen)
+{
+	struct aeskl_xts_ctx *ctx = aeskl_xts_ctx(tfm);
+	unsigned int aes_keylen;
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	aes_keylen = keylen / 2;
+	err = aes_check_keylen(aes_keylen);
+	if (err)
+		return err;
+
+	if (unlikely(aes_keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		return xts_setkey_aesni(tfm, key, keylen);
+	}
+
+	err = aeskl_setkey(&ctx->crypt_ctx, key, aes_keylen);
+	if (err)
+		return err;
+	return aeskl_setkey(&ctx->tweak_ctx, key + aes_keylen, aes_keylen);
+}
+
+static int xts_encrypt_aeskl(struct skcipher_request *req)
+{
+	if (unlikely(xts_keylen(req) == AES_KEYSIZE_192))
+		return xts_encrypt_aesni(req);
+
+	return xts_crypt(req, aeskl_xts_encrypt_iv, aeskl_xts_encrypt);
+}
+
+static int xts_decrypt_aeskl(struct skcipher_request *req)
+{
+	if (unlikely(xts_keylen(req) == AES_KEYSIZE_192))
+		return xts_decrypt_aesni(req);
+
+	return xts_crypt(req, aeskl_xts_encrypt_iv, aeskl_xts_decrypt);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= xts_setkey_aeskl,
+		.encrypt	= xts_encrypt_aeskl,
+		.decrypt	= xts_decrypt_aeskl,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not rely on AES-NI. But, AES-KL does not
+	 * support 192-bit keys. To ensure AES compliance, AES-KL falls
+	 * back to AES-NI.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_AES))
+		return -ENODEV;
+
+	/* The tweak processing is optimized using AVX instructions. */
+	if (!cpu_feature_enabled(X86_FEATURE_AVX))
+		return -ENODEV;
+
+	return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					      aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 5b25d2a58aeb..61456f0a99fa 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -35,7 +35,7 @@
 #include <linux/workqueue.h>
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
-
+#include "aesni-xts.h"
 
 #define AESNI_ALIGN	16
 #define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
@@ -864,8 +864,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 }
 #endif
 
-static int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
-			    unsigned int keylen)
+int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
+		     unsigned int keylen)
 {
 	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
 	int err;
@@ -884,6 +884,7 @@ static int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
 	/* second half of xts-key is for tweak */
 	return aes_set_key_common(&ctx->tweak_ctx, key + keylen, keylen);
 }
+EXPORT_SYMBOL_GPL(xts_setkey_aesni);
 
 typedef void (*xts_encrypt_iv_func)(const struct crypto_aes_ctx *tweak_key,
 				    u8 iv[AES_BLOCK_SIZE]);
@@ -1020,15 +1021,17 @@ static void aesni_xts_decrypt(const struct crypto_aes_ctx *key,
 	aesni_xts_dec(key, dst, src, len, tweak);
 }
 
-static int xts_encrypt_aesni(struct skcipher_request *req)
+int xts_encrypt_aesni(struct skcipher_request *req)
 {
 	return xts_crypt(req, aesni_xts_encrypt_iv, aesni_xts_encrypt);
 }
+EXPORT_SYMBOL_GPL(xts_encrypt_aesni);
 
-static int xts_decrypt_aesni(struct skcipher_request *req)
+int xts_decrypt_aesni(struct skcipher_request *req)
 {
 	return xts_crypt(req, aesni_xts_encrypt_iv, aesni_xts_decrypt);
 }
+EXPORT_SYMBOL_GPL(xts_decrypt_aesni);
 
 static struct crypto_alg aesni_cipher_alg = {
 	.cra_name		= "aes",
diff --git a/arch/x86/crypto/aesni-xts.h b/arch/x86/crypto/aesni-xts.h
new file mode 100644
index 000000000000..9833da2bd9d2
--- /dev/null
+++ b/arch/x86/crypto/aesni-xts.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _AESNI_XTS_H
+#define _AESNI_XTS_H
+
+/*
+ * These AES-NI functions are used by the AES-KL code as a fallback when
+ * a 192-bit key is provided. Key Locker does not support 192-bit keys.
+ */
+
+int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen);
+int xts_encrypt_aesni(struct skcipher_request *req);
+int xts_decrypt_aesni(struct skcipher_request *req);
+
+#endif /* _AESNI_XTS_H */
-- 
2.34.1
Re: [PATCH v9a 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm
Posted by Eric Biggers 1 year, 8 months ago
On Wed, May 22, 2024 at 11:42:35AM -0700, Chang S. Bae wrote:
> I've reworked this patch based on feedback,
>     https://lore.kernel.org/lkml/20240408014806.GA965@quark.localdomain/
> and rebased onto the v6.10 upstream merge in Linus' tree on May 13th: commit
> 84c7d76b5ab6 ("Merge tag 'v6.10-p1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6")
> 
> According to the dm-crypt benchmark, using VEX-encoded instructions for
> tweak processing enhances performance by approximately 2-3%. The
> PCLMULQDQ instruction did not yield a measurable impact, so I dropped it
> to simplify the implementation.
> 
> In contrast to other AES instructions, AES-KL does not permit tweak
> processing between rounds. In XTS mode, a single instruction covers all
> rounds of 8 blocks without interleaving instructions. Maybe this is one
> of the reasons for the limited performance gain.
> 
> Moving forward, I would like to address any further feedback on this
> AES-KL driver code first before the next revision of the whole series.
> 
> Changes from v9:
> * Duplicate the new XTS glue code, instead of sharing (Eric).
> * Use VEX-coded instructions for non-AES parts of the code (Eric).
> * Adjust ASM code to stylistically follow the new VAES support (Eric).
> * Export and reference the high-level AES-NI XTS functions (Eric). Then,
>   support a module build, along with rearranging build dependencies.
> * Reorganize the glue code and improve ASM code readability.
> * Revoke the review tag due to major changes.
> ---

Thanks for the updated patch!

> diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
> index 59aedf32c4ea..89e326c9dbfe 100644
> --- a/arch/x86/Kconfig.assembler
> +++ b/arch/x86/Kconfig.assembler
> @@ -35,6 +35,11 @@ config AS_VPCLMULQDQ
>  	help
>  	  Supported by binutils >= 2.30 and LLVM integrated assembler
>  
> +config AS_HAS_KEYLOCKER
> +	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
> +	help
> +	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12

Adding AS_HAS_KEYLOCKER should be its own patch.

> diff --git a/arch/x86/crypto/aeskl-xts-x86_64.S b/arch/x86/crypto/aeskl-xts-x86_64.S
> new file mode 100644
> index 000000000000..6ff8b5feebfc
> --- /dev/null
> +++ b/arch/x86/crypto/aeskl-xts-x86_64.S
> @@ -0,0 +1,358 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Implement AES algorithm using AES Key Locker instructions.
> + *
> + * Most code is primarily derived from aesni-intel_asm.S and
> + * stylistically aligned with aes-xts-avx-x86_64.S.
> + */
> +
> +#include <linux/linkage.h>
> +#include <linux/cfi_types.h>
> +#include <asm/errno.h>
> +#include <asm/inst.h>
> +#include <asm/frame.h>
> +
> +/* Constant values shared between AES implementations: */
> +
> +.section .rodata
> +.p2align 4
> +.Lgf_poly:
> +	/*
> +	 * Represents the polynomial x^7 + x^2 + x + 1, where the low 64
> +	 * bits are XOR'd into the tweak's low 64 bits when a carry
> +	 * occurs from the high 64 bits.
> +	 */
> +	.quad	0x87, 1
> +
> +	/*
> +	 * Table of constants for variable byte shifts and blending
> +	 * during ciphertext stealing operations.
> +	 */
> +.Lcts_permute_table:
> +	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
> +	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
> +	.byte	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
> +	.byte	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> +	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
> +	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
> +
> +.text
> +
> +.set	V0,		%xmm0
> +.set	V1,		%xmm1
> +.set	V2,		%xmm2
> +.set	V3,		%xmm3
> +.set	V4,		%xmm4
> +.set	V5,		%xmm5
> +.set	V6,		%xmm6
> +.set	V7,		%xmm7
> +.set	V8,		%xmm8
> +.set	V9,		%xmm9
> +.set	V10,		%xmm10
> +.set	V11,		%xmm11
> +.set	V12,		%xmm12
> +.set	V13,		%xmm13
> +.set	V14,		%xmm14
> +.set	V15,		%xmm15

The point of the V[0-15] aliases in aes-xts-avx-x86_64.S is to support both ymm
and zmm registers.  Here all registers are xmm, so there's no need for the layer
of indirection and you should just use xmm[0-15] directly.

> +.set	TWEAK_XMM1,	V8
> +.set	TWEAK_XMM2,	V9
> +.set	TWEAK_XMM3,	V10
> +.set	TWEAK_XMM4,	V11
> +.set	TWEAK_XMM5,	V12
> +.set	TWEAK_XMM6,	V13
> +.set	TWEAK_XMM7,	V14
> +.set	GF_POLY_XMM,	V15
> +.set	TWEAK_TMP,	TWEAK_XMM1
> +.set	TWEAK_XMM,	TWEAK_XMM2
> +.set	TMP,		%r10

Similarly, the _XMM suffixes are not really helpful since ymm and zmm registers
are not in play here.

> +/* Function parameters */
> +.set	HANDLEP,	%rdi	/* Pointer to struct aeskl_ctx */
> +.set	DST,		%rsi	/* Pointer to next destination data */
> +.set	UKEYP,		DST	/* Pointer to the original key */
> +.set	KLEN,		%r9d	/* AES key length in bytes */
> +.set	SRC,		%rdx	/* Pointer to next source data */
> +.set	LEN,		%rcx	/* Remaining length in bytes */
> +.set	TWEAK,		%r8	/* Pointer to next tweak */

Please don't put parameters for different functions in the same list like this.
There should be a separate list at the beginning of each function.  (Yes, it
doesn't work perfectly because '.set' is global and doesn't go out of scope once
the function ends.  But at least this would make it clear what the intent is.)

Also LEN needs to be %ecx, not %rcx, because it is unsigned int.

> +SYM_FUNC_START(__aeskl_setkey)
> +	FRAME_BEGIN
> +	movl		%edx, 480(HANDLEP)
> +	vmovdqu		(UKEYP), V0
> +	mov		$1, %eax
> +	cmp		$16, %dl
> +	je		.Lsetkey_128
> +
> +	vmovdqu		0x10(UKEYP), V1
> +	encodekey256	%eax, %eax
> +	vmovdqu		V3, 0x30(HANDLEP)
> +	jmp		.Lsetkey_end
> +.Lsetkey_128:
> +	encodekey128	%eax, %eax
> +
> +.Lsetkey_end:
> +	vmovdqu		V0, 0x00(HANDLEP)
> +	vmovdqu		V1, 0x10(HANDLEP)
> +	vmovdqu		V2, 0x20(HANDLEP)
> +
> +	FRAME_END
> +	RET
> +SYM_FUNC_END(__aeskl_setkey)

These are all leaf functions, so they don't need FRAME_BEGIN and FRAME_END.

> +.macro _aeskl		width, operation
> +	cmp		$16, KLEN
> +	je		.Laeskl128\@
> +.ifc \width, wide
> + .ifc \operation, dec
> +	aesdecwide256kl	(HANDLEP)
> + .else
> +	aesencwide256kl	(HANDLEP)
> + .endif
> +.else
> + .ifc \operation, dec
> +	aesdec256kl	(HANDLEP), V0
> + .else
> +	aesenc256kl	(HANDLEP), V0
> + .endif
> +.endif
> +	jmp		.Laesklend\@
> +.Laeskl128\@:
> +.ifc \width, wide
> + .ifc \operation, dec
> +	aesdecwide128kl	(HANDLEP)
> + .else
> +	aesencwide128kl	(HANDLEP)
> + .endif
> +.else
> + .ifc \operation, dec
> +	aesdec128kl	(HANDLEP), V0
> + .else
> +	aesenc128kl	(HANDLEP), V0
> + .endif
> +.endif
> +.Laesklend\@:
> +.endm

I think it would be easier to read if this was split into two macros, one for
1-block and one for 8-block.

> +/* int __aeskl_enc(const void *handlep, u8 *dst, const u8 *src) */
> +SYM_FUNC_START(__aeskl_enc)
> +	FRAME_BEGIN
> +	vmovdqu		(SRC), V0
> +	movl		480(HANDLEP), KLEN
> +
> +	_aeskl		oneblock, enc
> +	jz		.Lerror
> +	xor		%rax, %rax
> +	vmovdqu		V0, (DST)
> +	FRAME_END
> +	RET
> +.Lerror:
> +	mov		$(-EINVAL), %rax

For returning an int, %eax should be used, not %rax.

(Note that instructions that operate on %eax also tend to be slightly shorter.)

> +/*
> + * int __aeskl_xts_encrypt(const struct aeskl_ctx *handlep, u8 *dst,
> + *			   const u8 *src, unsigned int klen, le128 *tweak)
> + */
> +SYM_FUNC_START(__aeskl_xts_encrypt)
> +	_aeskl_xts_crypt	enc
> +SYM_FUNC_END(__aeskl_xts_encrypt)
> +
> +/*
> + * int __aeskl_xts_decrypt(const struct crypto_aes_ctx *handlep, u8 *dst,
> + *			   const u8 *src, unsigned int klen, le128 *twek)
> + */

Please make sure the function prototypes, including the parameter names, match
the ones used in the .c file and also the lists of register aliases.

> diff --git a/arch/x86/crypto/aeskl_glue.c b/arch/x86/crypto/aeskl_glue.c
> new file mode 100644
> index 000000000000..6dc4d380be54
> --- /dev/null
> +++ b/arch/x86/crypto/aeskl_glue.c
> @@ -0,0 +1,376 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Support for AES Key Locker instructions. This file contains glue
> + * code and the real AES implementation is in aeskl-intel_asm.S.
> + *
> + * Most code is based on aesni-intel_glue.c
> + */
> +
> +#include <linux/types.h>
> +#include <linux/module.h>
> +#include <linux/err.h>
> +#include <crypto/algapi.h>
> +#include <crypto/aes.h>
> +#include <crypto/xts.h>
> +#include <crypto/scatterwalk.h>
> +#include <crypto/internal/skcipher.h>
> +#include <crypto/internal/simd.h>
> +#include <asm/simd.h>
> +#include <asm/cpu_device_id.h>
> +#include <asm/fpu/api.h>
> +#include <asm/keylocker.h>
> +#include "aesni-xts.h"
> +
> +#define AESKL_ALIGN		16
> +#define AESKL_ALIGN_ATTR	__attribute__ ((__aligned__(AESKL_ALIGN)))
> +#define AESKL_ALIGN_EXTRA	((AESKL_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
> +
> +#define AESKL_AAD_SIZE		16
> +#define AESKL_TAG_SIZE		16
> +#define AESKL_CIPHERTEXT_MAX	AES_KEYSIZE_256
> +
> +/* The Key Locker handle is an encoded form of the AES key. */
> +struct aeskl_handle {
> +	u8 additional_authdata[AESKL_AAD_SIZE];
> +	u8 integrity_tag[AESKL_TAG_SIZE];
> +	u8 ciphre_text[AESKL_CIPHERTEXT_MAX];
> +};

ciphre_text => ciphertext

> +/*
> + * Key Locker does not support 192-bit key size. The driver needs to
> + * retrieve the key size in the first place. The offset of the
> + * 'key_length' field here should be compatible with struct

should => must

> + * crypto_aes_ctx.
> + */
> +#define AESKL_CTX_RESERVED (sizeof(struct crypto_aes_ctx) - sizeof(struct aeskl_handle) \
> +			    - sizeof(u32))
> +
> +struct aeskl_ctx {
> +	struct aeskl_handle handle;
> +	u8 reserved[AESKL_CTX_RESERVED];
> +	u32 key_length;
> +};
> +
> +struct aeskl_xts_ctx {
> +	struct aeskl_ctx tweak_ctx AESKL_ALIGN_ATTR;
> +	struct aeskl_ctx crypt_ctx AESKL_ALIGN_ATTR;
> +};

So there's a union between aeskl_ctx and crypto_aes_ctx going on here, but it's
not made explicit through a C union.  How about doing that?

Also, there should be a BUILD_BUG_ON() that enforces that the key_length is
really at the same offset in both.

> +/*
> + * The glue code in xts_crypt() and xts_crypt_slowpath() follows
> + * aesni-intel_glue.c. While this code is shareable, the key
> + * material format difference can cause more destructive code changes in
> + * the AES-NI side.
> + */
> +
> +typedef int (*xts_encrypt_iv_func)(const struct aeskl_ctx *tweak_key,
> +				   u8 iv[AES_BLOCK_SIZE]);
> +typedef int (*xts_crypt_func)(const struct aeskl_ctx *key,
> +			      const u8 *src, u8 *dst, unsigned int len,
> +			      u8 tweak[AES_BLOCK_SIZE]);

Since there are so few functions in play here (one xts_encrypt_iv_func and two
xts_crypt_func) I think you should just use direct calls instead of function
pointers.  A simple parameter 'bool enc' would take care of selecting the
encryption function vs. the decryption one.

One of the issues with indirect calls, even when inlined with the intention that
they be optimized out, is that there's no guarantee that the compiler will
actually optimize them out.  That has the consequence that CFI stubs are still
needed in the assembly.

Direct calls avoid this issue.

(BTW, in my AES-GCM patchset I'm using direct calls:
https://lore.kernel.org/linux-crypto/20240527075626.142576-1-ebiggers@kernel.org/.
I'm thinking I should have used that approach with AES-XTS too.)
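
A minimal, untested sketch of that shape (hypothetical helper name,
reusing the __aeskl_* functions declared in this patch):

	static int aeskl_xts_crypt_one(const struct aeskl_ctx *key,
				       const u8 *src, u8 *dst, unsigned int len,
				       u8 tweak[AES_BLOCK_SIZE], bool enc)
	{
		if (!valid_keylocker())
			return -ENODEV;

		if (enc)
			return __aeskl_xts_encrypt(key, dst, src, len, tweak);
		return __aeskl_xts_decrypt(key, dst, src, len, tweak);
	}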

> +static struct skcipher_alg aeskl_skciphers[] = {
> +	{
> +		.base = {
> +			.cra_name		= "__xts(aes)",
> +			.cra_driver_name	= "__xts-aes-aeskl",
> +			.cra_priority		= 200,

Maybe add a comment here that explains that this is intentionally made lower
priority than xts-aes-aesni.

> +static int __init aeskl_init(void)
> +{
> +	u32 eax, ebx, ecx, edx;
> +
> +	if (!valid_keylocker())
> +		return -ENODEV;
> +
> +	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
> +	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
> +		return -ENODEV;
> +
> +	/*
> +	 * AES-KL itself does not rely on AES-NI. But, AES-KL does not
> +	 * support 192-bit keys. To ensure AES compliance, AES-KL falls
> +	 * back to AES-NI.
> +	 */
> +	if (!cpu_feature_enabled(X86_FEATURE_AES))
> +		return -ENODEV;

Everywhere else in arch/x86/crypto/ uses boot_cpu_has(), not
cpu_feature_enabled().
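
E.g., a sketch of the same checks using boot_cpu_has():

	if (!boot_cpu_has(X86_FEATURE_AES) || !boot_cpu_has(X86_FEATURE_AVX))
		return -ENODEV;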

> +
> +	/* The tweak processing is optimized using AVX instructions. */
> +	if (!cpu_feature_enabled(X86_FEATURE_AVX))
> +		return -ENODEV;

The whole implementation uses VEX-coded instructions now, not just the tweak
processing.  So either fix or delete the above comment.

- Eric
[PATCH v9b 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm
Posted by Chang S. Bae 1 year, 8 months ago
Hi Eric,

I really appreciate your review. Keeping track of about 800 lines of
code, including assembly lines, in a single patch is demanding. I hope
this version meets your expectations.

The overall diff can be found here:
  https://github.com/intel-staging/keylocker/compare/f8420d4e27fc..57472d9b3f8e

Thanks,
Chang

---
Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion, along with all subsequent data transformations, is
provided by new AES instructions ('AES-KL'). AES-KL is analogous to
AES-NI, as it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The support
has some details worth mentioning, which differentiate it from AES-NI
and which users may need to be aware of:

== Key Handle Restriction ==

The AES-KL instruction set supports selecting key usage restrictions at
key handle creation time. Restrict all key handles created by the kernel
to kernel mode use only.

Although the AES-KL instructions themselves are executable in userspace,
this restriction enforces mode consistency in their operation.

If the key handle is created in userspace but referenced in the kernel,
then encrypt() and decrypt() functions will return -EINVAL.

== AES-NI Dependency for AES Compliance ==

Key Locker is not AES compliant as it lacks 192-bit key support. However,
per the expectations of Linux crypto-cipher implementations, the software
cipher implementation must support all the AES-compliant key sizes.

The AES-KL cipher implementation satisfies this constraint by logging a
warning and falling back to AES-NI. In other words, the 192-bit key-size
limitation is documented but not enforced.

== Wrapping Key Restore Failure Handling ==

In the event of a hardware failure, the wrapping key is lost across deep
sleep states. The wrapping key then reads as zero, which is an unusable
state.

The x86 core provides valid_keylocker() to indicate the failure.
Subsequent setkey() as well as encrypt()/decrypt() calls can check it and
return -ENODEV on failure. This allows an error code to be returned
instead of triggering abrupt exceptions.

== Userspace Exposition ==

Key Locker implementations have measurable performance penalties.
Therefore, keep the current default the same.

However, with a slow storage device, storage bandwidth is the bottleneck
even when disk encryption uses AES-KL. Thus, it is up to the end user to
decide whether to use AES-KL. Users can select it by the name
'xts-aes-aeskl' shown in /proc/crypto.

== 64-bit Only ==

Support 64-bit only, as the 32-bit kernel is being deprecated.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the previous posting (Eric):
* Assembly Code
  - Improve function argument descriptions.
  - Simplify the AES processing macro.
  - Rename some symbols.
* Glue Code:
  - Do not use function pointer variables; direct calls.
  - Define a union for the two 'ctx' structures.
  - Clarify a few code spots with comments.
  - Adjust some variable and struct names.
* Kconfig
  - Separate out the Kconfig.assembler change.
---
 arch/x86/crypto/Kconfig            |  18 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-xts-x86_64.S | 337 ++++++++++++++++++++++++
 arch/x86/crypto/aeskl_glue.c       | 409 +++++++++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_glue.c |  13 +-
 arch/x86/crypto/aesni-xts.h        |  15 ++
 6 files changed, 790 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-xts-x86_64.S
 create mode 100644 arch/x86/crypto/aeskl_glue.c
 create mode 100644 arch/x86/crypto/aesni-xts.h

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index c9e59589a1ce..c45d8f48f24e 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -29,6 +29,24 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	tristate "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_KEYLOCKER
+	select CRYPTO_AES_NI_INTEL
+	select CRYPTO_SIMD
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9c5ce5613738..c46fd2d9dd16 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -51,6 +51,9 @@ aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o \
 	aes_ctrby8_avx-x86_64.o aes-xts-avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-x86_64.o
+aeskl-x86_64-y := aeskl-xts-x86_64.o aeskl_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-xts-x86_64.S b/arch/x86/crypto/aeskl-xts-x86_64.S
new file mode 100644
index 000000000000..261d03789452
--- /dev/null
+++ b/arch/x86/crypto/aeskl-xts-x86_64.S
@@ -0,0 +1,337 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is primarily derived from aesni-intel_asm.S and
+ * stylistically aligned with aes-xts-avx-x86_64.S.
+ */
+
+#include <linux/linkage.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+
+.section .rodata
+.p2align 4
+.Lgf_poly:
+	/*
+	 * Represents the polynomial x^7 + x^2 + x + 1, where the low 64
+	 * bits are XOR'd into the tweak's low 64 bits when a carry
+	 * occurs from the high 64 bits.
+	 */
+	.quad	0x87, 1
+
+	/*
+	 * Table of constants for variable byte shifts and blending
+	 * during ciphertext stealing operations.
+	 */
+.Lcts_permute_table:
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+
+.text
+
+.set	TWEAK_NEXT1,	%xmm8
+.set	TWEAK_NEXT2,	%xmm9
+.set	TWEAK_NEXT3,	%xmm10
+.set	TWEAK_NEXT4,	%xmm11
+.set	TWEAK_NEXT5,	%xmm12
+.set	TWEAK_NEXT6,	%xmm13
+.set	TWEAK_NEXT7,	%xmm14
+.set	GF_POLY,	%xmm15
+.set	TWEAK_TMP,	TWEAK_NEXT1
+.set	TWEAK_NEXT,	TWEAK_NEXT2
+.set	TMP,		%r10
+.set	KLEN,		%r9d
+
+/*
+ * void __aeskl_setkey(struct aeskl_ctx *handle, const u8 *key,
+ *		       unsigned int keylen)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+	.set	HANDLE,	%rdi	/* Pointer to struct aeskl_ctx */
+	.set	KEY,	%rsi	/* Pointer to the original key */
+	.set	KEYLEN,	%edx	/* AES key length in bytes */
+	movl		KEYLEN, 480(HANDLE)
+	vmovdqu		(KEY), %xmm0
+	mov		$1, %eax
+	cmp		$16, %dl
+	je		.Lsetkey_128
+
+	vmovdqu		0x10(KEY), %xmm1
+	encodekey256	%eax, %eax
+	vmovdqu		%xmm3, 0x30(HANDLE)
+	jmp		.Lsetkey_end
+.Lsetkey_128:
+	encodekey128	%eax, %eax
+
+.Lsetkey_end:
+	vmovdqu		%xmm0, 0x00(HANDLE)
+	vmovdqu		%xmm1, 0x10(HANDLE)
+	vmovdqu		%xmm2, 0x20(HANDLE)
+	RET
+SYM_FUNC_END(__aeskl_setkey)
+
+.macro _aeskl		operation
+	cmp		$16, KLEN
+	je		.Laes_128\@
+.ifc \operation, dec
+	aesdec256kl	(HANDLE), %xmm0
+.else
+	aesenc256kl	(HANDLE), %xmm0
+.endif
+	jmp		.Laes_end\@
+.Laes_128\@:
+.ifc \operation, dec
+	aesdec128kl	(HANDLE), %xmm0
+.else
+	aesenc128kl	(HANDLE), %xmm0
+.endif
+.Laes_end\@:
+.endm
+
+.macro _aesklwide	operation
+	cmp		$16, KLEN
+	je		.Laesw_128\@
+.ifc \operation, dec
+	aesdecwide256kl	(HANDLE)
+.else
+	aesencwide256kl	(HANDLE)
+.endif
+	jmp		.Laesw_end\@
+.Laesw_128\@:
+.ifc \operation, dec
+	aesdecwide128kl	(HANDLE)
+.else
+	aesencwide128kl	(HANDLE)
+.endif
+.Laesw_end\@:
+.endm
+
+/* int __aeskl_enc(const void *handle, u8 *dst, const u8 *src) */
+SYM_FUNC_START(__aeskl_enc)
+	.set	HANDLE,	%rdi	/* Pointer to struct aeskl_ctx */
+	.set	DST,	%rsi	/* Pointer to next destination data */
+	.set	SRC,	%rdx	/* Pointer to next source data */
+	vmovdqu		(SRC), %xmm0
+	movl		480(HANDLE), KLEN
+
+	_aeskl		enc
+	jz		.Lerror
+	xor		%rax, %rax
+	vmovdqu		%xmm0, (DST)
+	RET
+.Lerror:
+	mov		$(-EINVAL), %eax
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * Calculate the next 128-bit XTS tweak by multiplying the polynomial 'x'
+ * with the current tweak stored in the register \src, and store the
+ * result in the \dst register.
+ */
+.macro _next_tweak	src, tmp, dst
+	vpshufd		$0x13, \src, \tmp
+	vpaddq		\src, \src, \dst
+	vpsrad		$31, \tmp, \tmp
+	vpand		GF_POLY, \tmp, \tmp
+	vpxor		\tmp, \dst, \dst
+.endm
+
+.macro _aeskl_xts_crypt operation
+	vmovdqa		.Lgf_poly(%rip), GF_POLY
+	vmovups		(TWEAK), TWEAK_NEXT
+	mov		480(HANDLE), KLEN
+
+.ifc \operation, dec
+	/*
+	 * During decryption, if the message length is not a multiple of
+	 * the AES block length, exclude the last complete block from the
+	 * decryption loop by subtracting 16 from LEN. This adjustment is
+	 * necessary because ciphertext stealing decryption uses the last
+	 * two tweaks in reverse order. Special handling is required for
+	 * the last complete block and any remaining partial block at the
+	 * end.
+	 */
+	test		$15, LEN
+	jz		.L8block_at_a_time\@
+	sub		$16, LEN
+.endif
+
+.L8block_at_a_time\@:
+	sub		$128, LEN
+	jl		.Lhandle_remainder\@
+
+	vpxor		(SRC), TWEAK_NEXT, %xmm0
+	vmovups		TWEAK_NEXT, (DST)
+
+	/*
+	 * Calculate and cache tweak values. Note that the tweak
+	 * computation cannot be interleaved with AES rounds here using
+	 * Key Locker instructions.
+	 */
+	_next_tweak	TWEAK_NEXT,  %xmm1, TWEAK_NEXT1
+	_next_tweak	TWEAK_NEXT1, %xmm1, TWEAK_NEXT2
+	_next_tweak	TWEAK_NEXT2, %xmm1, TWEAK_NEXT3
+	_next_tweak	TWEAK_NEXT3, %xmm1, TWEAK_NEXT4
+	_next_tweak	TWEAK_NEXT4, %xmm1, TWEAK_NEXT5
+	_next_tweak	TWEAK_NEXT5, %xmm1, TWEAK_NEXT6
+	_next_tweak	TWEAK_NEXT6, %xmm1, TWEAK_NEXT7
+
+	/* XOR each source block with its tweak. */
+	vpxor		0x10(SRC), TWEAK_NEXT1, %xmm1
+	vpxor		0x20(SRC), TWEAK_NEXT2, %xmm2
+	vpxor		0x30(SRC), TWEAK_NEXT3, %xmm3
+	vpxor		0x40(SRC), TWEAK_NEXT4, %xmm4
+	vpxor		0x50(SRC), TWEAK_NEXT5, %xmm5
+	vpxor		0x60(SRC), TWEAK_NEXT6, %xmm6
+	vpxor		0x70(SRC), TWEAK_NEXT7, %xmm7
+
+	/* Encrypt or decrypt 8 blocks per iteration. */
+	_aesklwide	\operation
+	jz		.Lerror\@
+
+	/* XOR tweaks again. */
+	vpxor		(DST), %xmm0, %xmm0
+	vpxor		TWEAK_NEXT1, %xmm1, %xmm1
+	vpxor		TWEAK_NEXT2, %xmm2, %xmm2
+	vpxor		TWEAK_NEXT3, %xmm3, %xmm3
+	vpxor		TWEAK_NEXT4, %xmm4, %xmm4
+	vpxor		TWEAK_NEXT5, %xmm5, %xmm5
+	vpxor		TWEAK_NEXT6, %xmm6, %xmm6
+	vpxor		TWEAK_NEXT7, %xmm7, %xmm7
+
+	/* Store destination blocks. */
+	vmovdqu		%xmm0, 0x00(DST)
+	vmovdqu		%xmm1, 0x10(DST)
+	vmovdqu		%xmm2, 0x20(DST)
+	vmovdqu		%xmm3, 0x30(DST)
+	vmovdqu		%xmm4, 0x40(DST)
+	vmovdqu		%xmm5, 0x50(DST)
+	vmovdqu		%xmm6, 0x60(DST)
+	vmovdqu		%xmm7, 0x70(DST)
+
+	_next_tweak	TWEAK_NEXT7, TWEAK_TMP, TWEAK_NEXT
+	add		$128, SRC
+	add		$128, DST
+	test		LEN, LEN
+	jz		.Lend\@
+	jmp		.L8block_at_a_time\@
+
+.Lhandle_remainder\@:
+	add		$128, LEN
+	jz		.Lend\@
+.ifc \operation, enc
+	vmovdqu		%xmm7, %xmm0
+.endif
+	sub		$16, LEN
+	jl		.Lcts\@
+
+	/* Encrypt or decrypt one block per iteration */
+.Lblock_at_a_time\@:
+	vpxor		(SRC), TWEAK_NEXT, %xmm0
+	_aeskl		\operation
+	jz		.Lerror\@
+	vpxor		TWEAK_NEXT, %xmm0, %xmm0
+	_next_tweak	TWEAK_NEXT, TWEAK_TMP, TWEAK_NEXT
+	test		LEN, LEN
+	jz		.Lout\@
+
+	add		$16, SRC
+	vmovdqu		%xmm0, (DST)
+	add		$16, DST
+	sub		$16, LEN
+	jge		.Lblock_at_a_time\@
+
+.Lcts\@:
+.ifc \operation, dec
+	/*
+	 * If decrypting, the last block was not decrypted because CTS
+	 * decryption uses the last two tweaks in reverse order. This is
+	 * done by advancing the tweak and decrypting the last block.
+	 */
+	_next_tweak	TWEAK_NEXT, TWEAK_TMP, %xmm4
+	vpxor		(SRC), %xmm4, %xmm0
+	_aeskl		\operation
+	jz		.Lerror\@
+	vpxor		%xmm4, %xmm0, %xmm0
+	add		$16, SRC
+.else
+	/*
+	 * If encrypting, the last block was already encrypted in %xmm0.
+	 * Prepare the CTS encryption by rewinding the pointer.
+	 */
+	sub		$16, DST
+.endif
+	lea		.Lcts_permute_table(%rip), TMP
+
+	/* Load the source partial block */
+	vmovdqu		(SRC, LEN, 1), %xmm3
+
+	/*
+	 * Shift the first LEN bytes of the encryption and decryption of
+	 * the last block to the end of a register, then store it to
+	 * DST+LEN.
+	 */
+	add		$16, LEN
+	vpshufb		(TMP, LEN, 1), %xmm0, %xmm2
+	vmovdqu		%xmm2, (DST, LEN, 1)
+
+	/* Shift the source partial block to the beginning */
+	sub		LEN, TMP
+	vmovdqu		32(TMP), %xmm2
+	vpshufb		%xmm2, %xmm3, %xmm3
+
+	/* Blend to generate the source partial block */
+	vpblendvb	%xmm2, %xmm0, %xmm3, %xmm3
+
+	/* Encrypt or decrypt again and store the last block. */
+	vpxor		TWEAK_NEXT, %xmm3, %xmm0
+	_aeskl		\operation
+	jz		.Lerror\@
+	vpxor		TWEAK_NEXT, %xmm0, %xmm0
+	vmovdqu		%xmm0, (DST)
+
+	xor		%rax, %rax
+	RET
+.Lout\@:
+	vmovdqu		%xmm0, (DST)
+.Lend\@:
+	vmovups		TWEAK_NEXT, (TWEAK)
+	xor		%rax, %rax
+	RET
+.Lerror\@:
+	mov		$(-EINVAL), %eax
+	RET
+.endm
+
+/*
+ * int __aeskl_xts_encrypt(const struct aeskl_ctx *handle, u8 *dst,
+ *			   const u8 *src, unsigned int len, u8 *tweak)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	.set	HANDLE,	%rdi	/* Pointer to struct aeskl_ctx */
+	.set	DST,	%rsi	/* Pointer to next destination data */
+	.set	SRC,	%rdx	/* Pointer to next source data */
+	.set	LEN,	%rcx	/* Remaining length in bytes */
+	.set	TWEAK,	%r8	/* Pointer to next tweak */
+	_aeskl_xts_crypt	enc
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct aeskl_ctx *handle, u8 *dst,
+ *			   const u8 *src, unsigned int len, u8 *tweak)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	.set	HANDLE,	%rdi	/* Pointer to struct aeskl_ctx */
+	.set	DST,	%rsi	/* Pointer to next destination data */
+	.set	SRC,	%rdx	/* Pointer to next source data */
+	.set	LEN,	%rcx	/* Remaining length in bytes */
+	.set	TWEAK,	%r8	/* Pointer to next tweak */
+	_aeskl_xts_crypt	dec
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl_glue.c b/arch/x86/crypto/aeskl_glue.c
new file mode 100644
index 000000000000..51b8daf7e72a
--- /dev/null
+++ b/arch/x86/crypto/aeskl_glue.c
@@ -0,0 +1,409 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-xts-x86_64.S.
+ *
+ * Most code is based on aesni-intel_glue.c
+ */
+
+#include <linux/err.h>
+#include <linux/types.h>
+#include <linux/module.h>
+
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/simd.h>
+
+#include "aesni-xts.h"
+
+#define AES_ALIGN		16
+#define AES_ALIGN_ATTR		__attribute__ ((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+
+#define AESKL_AAD_SIZE		16
+#define AESKL_TAG_SIZE		16
+#define AESKL_CIPHERTEXT_MAX	AES_KEYSIZE_256
+
+/*
+ * The Key Locker handle is an encoded (wrapped) form of the AES key,
+ * produced by the ENCODEKEY128/ENCODEKEY256 instructions. The
+ * ciphertext field is sized for the largest (256-bit) key.
+ */
+struct aeskl_handle {
+	u8 additional_authdata[AESKL_AAD_SIZE];
+	u8 integrity_tag[AESKL_TAG_SIZE];
+	u8 cipher_text[AESKL_CIPHERTEXT_MAX];
+};
+
+/*
+ * Key Locker does not support the 192-bit key size, so such keys are
+ * handed to AES-NI instead. To decide which implementation to use,
+ * the glue code reads the key length first; the offset of the
+ * 'key_length' field must therefore match struct crypto_aes_ctx.
+ */
+#define AESKL_CTX_RESERVED (sizeof(struct crypto_aes_ctx) \
+			    - sizeof(struct aeskl_handle) \
+			    - sizeof(u32))
+
+struct aeskl_ctx {
+	struct aeskl_handle handle;
+	u8 reserved[AESKL_CTX_RESERVED];
+	u32 key_length;
+};
+
+/*
+ * A union of the two context structures serves as the crypto
+ * context. Depending on the key size, either AES-KL or AES-NI
+ * operates on it.
+ */
+union x86_aes_ctx {
+	struct aeskl_ctx      aeskl;
+	struct crypto_aes_ctx aesni;
+};
+
+struct xts_aes_ctx {
+	union x86_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	union x86_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
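+/* Return the per-tfm context, aligned to AES_ALIGN if needed. */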
+static inline struct xts_aes_ctx *xts_aes_ctx(struct crypto_skcipher *tfm)
+{
+	void *addr = crypto_skcipher_ctx(tfm);
+
+	if (crypto_tfm_ctx_alignment() >= AES_ALIGN)
+		return addr;
+
+	return PTR_ALIGN(addr, AES_ALIGN);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+	struct xts_aes_ctx *ctx = xts_aes_ctx(crypto_skcipher_reqtfm(req));
+
+	BUILD_BUG_ON(offsetof(struct crypto_aes_ctx, key_length) !=
+		     offsetof(struct aeskl_ctx, key_length));
+
+	return ctx->crypt_ctx.aeskl.key_length;
+}
+
+asmlinkage void __aeskl_setkey(struct aeskl_ctx *handle, const u8 *key, unsigned int keylen);
+
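+/*
+ * The en/decryption helpers below return 0 on success, or -EINVAL if
+ * the hardware rejects the key handle.
+ */
+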
+asmlinkage int __aeskl_enc(const void *handle, u8 *dst, const u8 *src);
+
+asmlinkage int __aeskl_xts_encrypt(const struct aeskl_ctx *handle, u8 *dst, const u8 *src,
+				   unsigned int len, u8 *tweak);
+asmlinkage int __aeskl_xts_decrypt(const struct aeskl_ctx *handle, u8 *dst, const u8 *src,
+				   unsigned int len, u8 *tweak);
+
+/*
+ * The wrapping key may be lost across sleep states if a hardware
+ * failure occurs. Whether the feature is still usable can be checked
+ * via valid_keylocker().
+ *
+ * Since the feature can become unavailable at any time, check it on
+ * every use, inside the kernel_fpu_begin()/kernel_fpu_end() section.
+ */
+
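+/* Wrap the raw AES key into a Key Locker handle via ENCODEKEY*. */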
+static int aeskl_setkey(struct aeskl_ctx *ctx, const u8 *in_key, unsigned int keylen)
+{
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	kernel_fpu_begin();
+	if (!valid_keylocker()) {
+		kernel_fpu_end();
+		return -ENODEV;
+	}
+
+	__aeskl_setkey(ctx, in_key, keylen);
+	kernel_fpu_end();
+	return 0;
+}
+
+static int aeskl_xts_encrypt_iv(const struct aeskl_ctx *tweak_key,
+				u8 iv[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(tweak_key, iv, iv);
+}
+
+static int aeskl_xts_encrypt(const struct aeskl_ctx *key,
+			     const u8 *src, u8 *dst, unsigned int len,
+			     u8 tweak[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(key, dst, src, len, tweak);
+}
+
+static int aeskl_xts_decrypt(const struct aeskl_ctx *key,
+			     const u8 *src, u8 *dst, unsigned int len,
+			     u8 tweak[AES_BLOCK_SIZE])
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(key, dst, src, len, tweak);
+}
+
+/*
+ * The glue code in xts_crypt() and xts_crypt_slowpath() follows
+ * aesni-intel_glue.c. While the code could be shared, the difference
+ * in key material formats would force intrusive changes on the
+ * AES-NI side.
+ */
+
+enum xts_ops {
+	XTS_ENCRYPTION,
+	XTS_DECRYPTION
+};
+
+/* This handles cases where the source and/or destination span pages. */
+static noinline int xts_crypt_slowpath(struct skcipher_request *req, enum xts_ops ops)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct xts_aes_ctx *ctx = xts_aes_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct scatterlist sg_src[2], sg_dst[2];
+	struct skcipher_request subreq;
+	struct scatterlist *src, *dst;
+	struct skcipher_walk walk;
+	int err;
+
+	/*
+	 * If the message length isn't divisible by the AES block size, then
+	 * separate off the last full block and the partial block.  This ensures
+	 * that they are processed in the same call to the assembly function,
+	 * which is required for ciphertext stealing.
+	 */
+	if (tail) {
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   req->cryptlen - tail - AES_BLOCK_SIZE,
+					   req->iv);
+		req = &subreq;
+	}
+
+	err = skcipher_walk_virt(&walk, req, false);
+
+	while (walk.nbytes) {
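+		/*
+		 * On failure, keep walking so that skcipher_walk_done()
+		 * still releases the walk's resources; errors accumulate
+		 * in 'err'.
+		 */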
+		kernel_fpu_begin();
+		if (ops == XTS_ENCRYPTION) {
+			err |= aeskl_xts_encrypt(&ctx->crypt_ctx.aeskl, walk.src.virt.addr,
+						 walk.dst.virt.addr,
+						 walk.nbytes & ~(AES_BLOCK_SIZE - 1), req->iv);
+		} else {
+			err |= aeskl_xts_decrypt(&ctx->crypt_ctx.aeskl, walk.src.virt.addr,
+						 walk.dst.virt.addr,
+						 walk.nbytes & ~(AES_BLOCK_SIZE - 1), req->iv);
+		}
+		kernel_fpu_end();
+		err |= skcipher_walk_done(&walk,
+					  walk.nbytes & (AES_BLOCK_SIZE - 1));
+	}
+
+	if (err || !tail)
+		return err;
+
+	/* Do ciphertext stealing with the last full block and partial block. */
+
+	dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+	if (req->dst != req->src)
+		dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+	skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+				   req->iv);
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (err)
+		return err;
+
+	kernel_fpu_begin();
+	if (ops == XTS_ENCRYPTION) {
+		err = aeskl_xts_encrypt(&ctx->crypt_ctx.aeskl, walk.src.virt.addr,
+					walk.dst.virt.addr, walk.nbytes, req->iv);
+	} else {
+		err = aeskl_xts_decrypt(&ctx->crypt_ctx.aeskl, walk.src.virt.addr,
+					walk.dst.virt.addr, walk.nbytes, req->iv);
+	}
+	kernel_fpu_end();
+	if (err)
+		return err;
+
+	return skcipher_walk_done(&walk, 0);
+}
+
+/* __always_inline so the constant 'ops' folds away the branches in the fastpath */
+static __always_inline int xts_crypt(struct skcipher_request *req, enum xts_ops ops)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct xts_aes_ctx *ctx = xts_aes_ctx(tfm);
+	const unsigned int cryptlen = req->cryptlen;
+	struct scatterlist *src = req->src;
+	struct scatterlist *dst = req->dst;
+	int err;
+
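+	/* XTS requires the input to be at least one full block */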
+	if (unlikely(cryptlen < AES_BLOCK_SIZE))
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	err = aeskl_xts_encrypt_iv(&ctx->tweak_ctx.aeskl, req->iv);
+	if (err)
+		goto out;
+
+	/*
+	 * In practice, virtually all XTS plaintexts and ciphertexts are either
+	 * 512 or 4096 bytes, aligned such that they don't span page boundaries.
+	 * To optimize the performance of these cases, and also any other case
+	 * where no page boundary is spanned, the below fast-path handles
+	 * single-page sources and destinations as efficiently as possible.
+	 */
+	if (likely(src->length >= cryptlen && dst->length >= cryptlen &&
+		   src->offset + cryptlen <= PAGE_SIZE &&
+		   dst->offset + cryptlen <= PAGE_SIZE)) {
+		struct page *src_page = sg_page(src);
+		struct page *dst_page = sg_page(dst);
+		void *src_virt = kmap_local_page(src_page) + src->offset;
+		void *dst_virt = kmap_local_page(dst_page) + dst->offset;
+
+		if (ops == XTS_ENCRYPTION) {
+			err = aeskl_xts_encrypt(&ctx->crypt_ctx.aeskl, src_virt,
+						dst_virt, cryptlen, req->iv);
+		} else {
+			err = aeskl_xts_decrypt(&ctx->crypt_ctx.aeskl, src_virt,
+						dst_virt, cryptlen, req->iv);
+		}
+		/* Unmap the pages and end the FPU section even on error. */
+		kunmap_local(dst_virt);
+		kunmap_local(src_virt);
+		kernel_fpu_end();
+		return err;
+	}
+out:
+	kernel_fpu_end();
+	if (err)
+		return err;
+	return xts_crypt_slowpath(req, ops);
+}
+
+static int xts_setkey_aeskl(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen)
+{
+	struct xts_aes_ctx *ctx = xts_aes_ctx(tfm);
+	unsigned int aes_keylen;
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	aes_keylen = keylen / 2;
+	err = aes_check_keylen(aes_keylen);
+	if (err)
+		return err;
+
+	if (unlikely(aes_keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit keys; falling back to AES-NI.\n");
+		return xts_setkey_aesni(tfm, key, keylen);
+	}
+
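+	/* First half: data key. Second half: tweak key. */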
+	err = aeskl_setkey(&ctx->crypt_ctx.aeskl, key, aes_keylen);
+	if (err)
+		return err;
+
+	return aeskl_setkey(&ctx->tweak_ctx.aeskl, key + aes_keylen, aes_keylen);
+}
+
+static int xts_encrypt_aeskl(struct skcipher_request *req)
+{
+	if (unlikely(xts_keylen(req) == AES_KEYSIZE_192))
+		return xts_encrypt_aesni(req);
+
+	return xts_crypt(req, XTS_ENCRYPTION);
+}
+
+static int xts_decrypt_aeskl(struct skcipher_request *req)
+{
+	if (unlikely(xts_keylen(req) == AES_KEYSIZE_192))
+		return xts_decrypt_aesni(req);
+
+	return xts_crypt(req, XTS_DECRYPTION);
+}
+
+#define XTS_AES_CTX_SIZE (sizeof(struct xts_aes_ctx) + AES_ALIGN_EXTRA)
+
+/*
+ * The 'cra_priority' value is intentionally set lower than that of
+ * xts-aes-aesni, so AES-NI remains preferred where both are
+ * available.
+ */
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= xts_setkey_aeskl,
+		.encrypt	= xts_encrypt_aeskl,
+		.decrypt	= xts_decrypt_aeskl,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	/*
+	 * For performance, require the wide (multi-block) Key Locker
+	 * AES instructions along with AVX.
+	 */
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+	if (!boot_cpu_has(X86_FEATURE_AVX))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not depend on AES-NI, but it lacks
+	 * 192-bit key support. To provide a complete AES
+	 * implementation, fall back to AES-NI for that key size.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					      aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
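+/* Run after Key Locker setup has completed during boot. */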
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index ef031655b2d3..49fb56efac56 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -35,7 +35,7 @@
 #include <linux/workqueue.h>
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
-
+#include "aesni-xts.h"
 
 #define AESNI_ALIGN	16
 #define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
@@ -864,8 +864,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 }
 #endif
 
-static int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
-			    unsigned int keylen)
+int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
+		     unsigned int keylen)
 {
 	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
 	int err;
@@ -884,6 +884,7 @@ static int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
 	/* second half of xts-key is for tweak */
 	return aes_set_key_common(&ctx->tweak_ctx, key + keylen, keylen);
 }
+EXPORT_SYMBOL_GPL(xts_setkey_aesni);
 
 typedef void (*xts_encrypt_iv_func)(const struct crypto_aes_ctx *tweak_key,
 				    u8 iv[AES_BLOCK_SIZE]);
@@ -1020,15 +1021,17 @@ static void aesni_xts_decrypt(const struct crypto_aes_ctx *key,
 	aesni_xts_dec(key, dst, src, len, tweak);
 }
 
-static int xts_encrypt_aesni(struct skcipher_request *req)
+int xts_encrypt_aesni(struct skcipher_request *req)
 {
 	return xts_crypt(req, aesni_xts_encrypt_iv, aesni_xts_encrypt);
 }
+EXPORT_SYMBOL_GPL(xts_encrypt_aesni);
 
-static int xts_decrypt_aesni(struct skcipher_request *req)
+int xts_decrypt_aesni(struct skcipher_request *req)
 {
 	return xts_crypt(req, aesni_xts_encrypt_iv, aesni_xts_decrypt);
 }
+EXPORT_SYMBOL_GPL(xts_decrypt_aesni);
 
 static struct crypto_alg aesni_cipher_alg = {
 	.cra_name		= "aes",
diff --git a/arch/x86/crypto/aesni-xts.h b/arch/x86/crypto/aesni-xts.h
new file mode 100644
index 000000000000..9833da2bd9d2
--- /dev/null
+++ b/arch/x86/crypto/aesni-xts.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _AESNI_XTS_H
+#define _AESNI_XTS_H
+
+#include <linux/types.h>
+
+struct crypto_skcipher;
+struct skcipher_request;
+
+/*
+ * These AES-NI functions are used by the AES-KL code as a fallback when
+ * a 192-bit key is provided. Key Locker does not support 192-bit keys.
+ */
+
+int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen);
+int xts_encrypt_aesni(struct skcipher_request *req);
+int xts_decrypt_aesni(struct skcipher_request *req);
+
+#endif /* _AESNI_XTS_H */
-- 
2.34.1