[PATCH 00/35] crypto: Provide aes-round.h and host accel

Richard Henderson posted 35 patches 10 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20230603023426.1064431-1-richard.henderson@linaro.org
Maintainers: "Daniel P. Berrangé" <berrange@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Daniel Henrique Barboza <danielhb413@gmail.com>, "Cédric Le Goater" <clg@kaod.org>, David Gibson <david@gibson.dropbear.id.au>, Greg Kurz <groug@kaod.org>, Palmer Dabbelt <palmer@dabbelt.com>, Alistair Francis <alistair.francis@wdc.com>, Bin Meng <bin.meng@windriver.com>, Weiwei Li <liweiwei@iscas.ac.cn>, Liu Zhiwei <zhiwei_liu@linux.alibaba.com>, Eduardo Habkost <eduardo@habkost.net>, "Alex Bennée" <alex.bennee@linaro.org>
There is a newer version of this series
[PATCH 00/35] crypto: Provide aes-round.h and host accel
Posted by Richard Henderson 10 months ago
Inspired by Ard Biesheuvel's RFC patches for accelerating AES
under emulation, provide a set of primitives that maps between
the guest and host fragments.

There is a small guest correctness test case.

I think the end result is quite a bit cleaner, since the logic
is now centralized, rather than spread across 4 different guests.

Further work could clean up crypto/aes.c itself to use these
instead of the tables directly.  I'm sure that's just an ultimate
fallback when an appropriate system library is not available, and
so not terribly important, but it could still significantly reduce
the amount of code we carry.

I would imagine structuring a polynomial multiplication header
in a similar way.  There are 4 or 5 versions of those spread across
the different guests.
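
For readers unfamiliar with the term, "polynomial multiplication" here means carry-less multiplication over GF(2), where partial products are combined with XOR rather than addition. The following is only a minimal bitwise sketch of a 64x64 -> 128 bit version; the type and function names (SketchPoly128, sketch_clmul64) are hypothetical and are not taken from this series or from QEMU.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type: hi:lo holds the product polynomial. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} SketchPoly128;

/*
 * Carry-less multiply: for every set bit i of b, XOR (a << i) into the
 * 128-bit result.  XOR replaces addition, so no carries propagate.
 */
SketchPoly128 sketch_clmul64(uint64_t a, uint64_t b)
{
    SketchPoly128 r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i > 0) {
                /* Bits of a shifted out of the low half (skip i == 0
                 * to avoid an undefined 64-bit shift). */
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}
```

As a quick check, (x + 1) * (x + 1) = x^2 + 1 over GF(2), so sketch_clmul64(3, 3) yields lo = 5, hi = 0. A host-accelerated variant would presumably replace the loop with a single instruction where one is available.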

Anyway, please review.


r~


Richard Henderson (35):
  tests/multiarch: Add test-aes
  target/arm: Move aesmc and aesimc tables to crypto/aes.c
  crypto/aes: Add constants for ShiftRows, InvShiftRows
  crypto: Add aesenc_SB_SR
  target/i386: Use aesenc_SB_SR
  target/arm: Demultiplex AESE and AESMC
  target/arm: Use aesenc_SB_SR
  target/ppc: Use aesenc_SB_SR
  target/riscv: Use aesenc_SB_SR
  crypto: Add aesdec_ISB_ISR
  target/i386: Use aesdec_ISB_ISR
  target/arm: Use aesdec_ISB_ISR
  target/ppc: Use aesdec_ISB_ISR
  target/riscv: Use aesdec_ISB_ISR
  crypto: Add aesenc_MC
  target/arm: Use aesenc_MC
  crypto: Add aesdec_IMC
  target/i386: Use aesdec_IMC
  target/arm: Use aesdec_IMC
  target/riscv: Use aesdec_IMC
  crypto: Add aesenc_SB_SR_MC_AK
  target/i386: Use aesenc_SB_SR_MC_AK
  target/ppc: Use aesenc_SB_SR_MC_AK
  target/riscv: Use aesenc_SB_SR_MC_AK
  crypto: Add aesdec_ISB_ISR_IMC_AK
  target/i386: Use aesdec_ISB_ISR_IMC_AK
  target/riscv: Use aesdec_ISB_ISR_IMC_AK
  crypto: Add aesdec_ISB_ISR_AK_IMC
  target/ppc: Use aesdec_ISB_ISR_AK_IMC
  host/include/i386: Implement aes-round.h
  host/include/aarch64: Implement aes-round.h
  crypto: Remove AES_shifts, AES_ishifts
  crypto: Implement aesdec_IMC with AES_imc_rot
  crypto: Remove AES_imc
  crypto: Unexport AES_*_rot, AES_TeN, AES_TdN
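
The primitive names in the list above appear to spell out which AES round steps each one combines: SB = SubBytes, SR = ShiftRows, MC = MixColumns, AK = AddRoundKey, with an I prefix for the inverse steps. As a rough illustration only, here is a plain C sketch of the combined encrypt round. All names (SketchState, sketch_*) are hypothetical, the state layout is assumed to be column-major, and the S-box is computed from its definition to keep the example short; none of this is the code from the series.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical state: 16 bytes, column-major (byte i = row i % 4, col i / 4). */
typedef struct {
    uint8_t b[16];
} SketchState;

/* Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
uint8_t sketch_xtime(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiply by shift-and-add. */
uint8_t sketch_gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;

    for (; b; b >>= 1, a = sketch_xtime(a)) {
        if (b & 1) {
            p ^= a;
        }
    }
    return p;
}

/* S-box from its definition: inverse (a^254, 0 -> 0), then the affine map.
 * Slow on purpose; a real implementation would use a table. */
uint8_t sketch_sbox(uint8_t a)
{
    uint8_t s = 1, r;

    for (int i = 0; i < 254; i++) {
        s = sketch_gf_mul(s, a);
    }
    r = s;
    for (int n = 1; n <= 4; n++) {
        r ^= (uint8_t)((s << n) | (s >> (8 - n)));
    }
    return (uint8_t)(r ^ 0x63);
}

/* ShiftRows as a permutation: output byte i comes from input byte sr[i]. */
static const uint8_t sketch_sr[16] = {
    0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11
};

/* SB + SR: both are bytewise, so they fuse into one pass. */
void sketch_SB_SR(SketchState *r, const SketchState *st)
{
    SketchState t;

    for (int i = 0; i < 16; i++) {
        t.b[i] = sketch_sbox(st->b[sketch_sr[i]]);
    }
    memcpy(r, &t, sizeof(t));
}

/* MC: each column is multiplied by the fixed matrix {2,3,1,1}. */
void sketch_MC(SketchState *r, const SketchState *st)
{
    SketchState t;

    for (int c = 0; c < 4; c++) {
        const uint8_t *a = &st->b[4 * c];
        uint8_t all = (uint8_t)(a[0] ^ a[1] ^ a[2] ^ a[3]);

        for (int i = 0; i < 4; i++) {
            /* 2*a[i] ^ 3*a[i+1] ^ a[i+2] ^ a[i+3], rearranged. */
            t.b[4 * c + i] = (uint8_t)(a[i] ^ all ^
                                       sketch_xtime(a[i] ^ a[(i + 1) & 3]));
        }
    }
    memcpy(r, &t, sizeof(t));
}

/* Full encrypt round: SubBytes, ShiftRows, MixColumns, AddRoundKey. */
void sketch_SB_SR_MC_AK(SketchState *r, const SketchState *st,
                        const SketchState *rk)
{
    SketchState t;

    sketch_SB_SR(&t, st);
    sketch_MC(&t, &t);
    for (int i = 0; i < 16; i++) {
        r->b[i] = t.b[i] ^ rk->b[i];
    }
}
```

Feeding this the start-of-round-1 state and round-1 key from the FIPS-197 Appendix B example reproduces the start-of-round-2 state given there. The point of splitting the round this way is that each guest instruction set fuses the steps differently, so a guest helper can pick whichever fragment matches its instruction.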

 host/include/aarch64/host/aes-round.h   | 204 ++++++
 host/include/aarch64/host/cpuinfo.h     |   1 +
 host/include/generic/host/aes-round.h   |  36 ++
 host/include/i386/host/aes-round.h      | 148 +++++
 host/include/i386/host/cpuinfo.h        |   1 +
 host/include/x86_64/host/aes-round.h    |   1 +
 include/crypto/aes-round.h              | 158 +++++
 include/crypto/aes.h                    |  30 -
 target/arm/helper.h                     |   2 +
 target/i386/ops_sse.h                   |  64 +-
 target/arm/tcg/sve.decode               |   4 +-
 crypto/aes.c                            | 808 ++++++++++++++++--------
 target/arm/tcg/crypto_helper.c          | 245 +++----
 target/arm/tcg/translate-a64.c          |  13 +-
 target/arm/tcg/translate-neon.c         |   4 +-
 target/arm/tcg/translate-sve.c          |   8 +-
 target/ppc/int_helper.c                 |  58 +-
 target/riscv/crypto_helper.c            | 142 ++---
 tests/tcg/aarch64/test-aes.c            |  58 ++
 tests/tcg/i386/test-aes.c               |  68 ++
 tests/tcg/ppc64/test-aes.c              | 116 ++++
 tests/tcg/riscv64/test-aes.c            |  76 +++
 util/cpuinfo-aarch64.c                  |   2 +
 util/cpuinfo-i386.c                     |   3 +
 tests/tcg/multiarch/test-aes-main.c.inc | 183 ++++++
 tests/tcg/aarch64/Makefile.target       |   4 +
 tests/tcg/i386/Makefile.target          |   4 +
 tests/tcg/ppc64/Makefile.target         |   1 +
 tests/tcg/riscv64/Makefile.target       |   4 +
 29 files changed, 1776 insertions(+), 670 deletions(-)
 create mode 100644 host/include/aarch64/host/aes-round.h
 create mode 100644 host/include/generic/host/aes-round.h
 create mode 100644 host/include/i386/host/aes-round.h
 create mode 100644 host/include/x86_64/host/aes-round.h
 create mode 100644 include/crypto/aes-round.h
 create mode 100644 tests/tcg/aarch64/test-aes.c
 create mode 100644 tests/tcg/i386/test-aes.c
 create mode 100644 tests/tcg/ppc64/test-aes.c
 create mode 100644 tests/tcg/riscv64/test-aes.c
 create mode 100644 tests/tcg/multiarch/test-aes-main.c.inc

-- 
2.34.1
Re: [PATCH 00/35] crypto: Provide aes-round.h and host accel
Posted by Ard Biesheuvel 9 months, 4 weeks ago
On Sat, 3 Jun 2023 at 04:34, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Inspired by Ard Biesheuvel's RFC patches for accelerating AES
> under emulation, provide a set of primitives that maps between
> the guest and host fragments.

This is looking very good - it is clearly a much better abstraction
than what I proposed, and I'd expect the performance boost to be the
same.
Re: [PATCH 00/35] crypto: Provide aes-round.h and host accel
Posted by Ard Biesheuvel 9 months, 4 weeks ago
On Sat, 3 Jun 2023 at 15:23, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Sat, 3 Jun 2023 at 04:34, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > Inspired by Ard Biesheuvel's RFC patches for accelerating AES
> > under emulation, provide a set of primitives that maps between
> > the guest and host fragments.
>
> This is looking very good - it is clearly a much better abstraction
> than what I proposed, and I'd expect the performance boost to be the
> same.

Benchmark results for OpenSSL running in emulation on TX2:

Without acceleration:

$ ../qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-ctr
version: 3.2.0-dev
built on: Thu Jun  1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CTR      25146.07k    50482.19k    69373.44k    76236.80k    78391.98k    78381.06k


With acceleration:

$ ../qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-ctr
version: 3.2.0-dev
built on: Thu Jun  1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CTR      28774.46k    81173.59k   162346.24k   206301.53k   224214.22k   225600.56k
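
To put the two runs side by side, the per-block-size speedup is just the accelerated throughput divided by the unaccelerated one. A small sketch of that arithmetic, with the figures copied from the two tables above (the names sketch_base, sketch_accel and sketch_speedups are made up for this illustration):

```c
#include <stddef.h>

/* Throughput in 1000s of bytes per second, copied from the runs above,
 * for block sizes 16, 64, 256, 1024, 8192 and 16384 bytes. */
const double sketch_base[6] = {
    25146.07, 50482.19, 69373.44, 76236.80, 78391.98, 78381.06
};
const double sketch_accel[6] = {
    28774.46, 81173.59, 162346.24, 206301.53, 224214.22, 225600.56
};

/* Fill out[i] with accel[i] / base[i] for each of the n block sizes. */
void sketch_speedups(const double *base, const double *accel,
                     double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = accel[i] / base[i];
    }
}
```

With these numbers the ratio grows with block size, from roughly 1.14x at 16 bytes to roughly 2.88x at 16384 bytes, which would be consistent with fixed per-call overhead dominating the small-block case.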