[PATCH v3 00/19] crypto: Provide clmul.h and host accel

Richard Henderson posted 19 patches 8 months, 2 weeks ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20230821161854.419893-1-richard.henderson@linaro.org
Maintainers: "Daniel P. Berrangé" <berrange@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Daniel Henrique Barboza <danielhb413@gmail.com>, "Cédric Le Goater" <clg@kaod.org>, David Gibson <david@gibson.dropbear.id.au>, Greg Kurz <groug@kaod.org>, Nicholas Piggin <npiggin@gmail.com>, David Hildenbrand <david@redhat.com>, Ilya Leoshkevich <iii@linux.ibm.com>, Thomas Huth <thuth@redhat.com>
[PATCH v3 00/19] crypto: Provide clmul.h and host accel
Posted by Richard Henderson 8 months, 2 weeks ago
Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
carry-less multiply under emulation.

Changes for v3:
  * Update target/i386 ops_sse.h.
  * Apply r-b.

Changes for v2:
  * Only accelerate clmul_64; keep generic helpers for other sizes.
  * Drop most of the Int128 interfaces, except for clmul_64.
  * Use the same acceleration format as aes-round.h.
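
For readers new to the operation: a carry-less multiply is a polynomial multiply over GF(2), so partial products are combined with XOR instead of addition. The following is only a minimal sketch of a portable 64x64->128 fallback plus a feature-flag dispatch, not the code from this series. Every name in it (sketch_u128, sketch_clmul_64_gen, sketch_have_clmul_accel, and so on) is hypothetical, and the "accelerated" path is a placeholder that reuses the generic code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit result type (the series itself uses Int128). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} sketch_u128;

/* Portable fallback: XOR a shifted copy of a for every set bit of b. */
static sketch_u128 sketch_clmul_64_gen(uint64_t a, uint64_t b)
{
    sketch_u128 r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            /* Skip i == 0: shifting a 64-bit value by 64 is undefined. */
            if (i != 0) {
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}

/* Placeholder for a host CPU feature check; always false in this sketch. */
static const bool sketch_have_clmul_accel = false;

/*
 * Placeholder for a host-specific path (for example one built on
 * PCLMULQDQ or PMULL). Here it just calls the generic routine.
 */
static sketch_u128 sketch_clmul_64_accel(uint64_t a, uint64_t b)
{
    return sketch_clmul_64_gen(a, b);
}

/* Dispatch: take the host path when available, else the fallback. */
static sketch_u128 sketch_clmul_64(uint64_t a, uint64_t b)
{
    return sketch_have_clmul_accel ? sketch_clmul_64_accel(a, b)
                                   : sketch_clmul_64_gen(a, b);
}
```

For example, sketch_clmul_64(3, 3) gives lo == 5 and hi == 0, because (x + 1)^2 = x^2 + 1 once the 2x term cancels under XOR.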


r~


[1] https://patchew.org/QEMU/20230601123332.3297404-1-ardb@kernel.org/


Richard Henderson (19):
  crypto: Add generic 8-bit carry-less multiply routines
  target/arm: Use clmul_8* routines
  target/s390x: Use clmul_8* routines
  target/ppc: Use clmul_8* routines
  crypto: Add generic 16-bit carry-less multiply routines
  target/arm: Use clmul_16* routines
  target/s390x: Use clmul_16* routines
  target/ppc: Use clmul_16* routines
  crypto: Add generic 32-bit carry-less multiply routines
  target/arm: Use clmul_32* routines
  target/s390x: Use clmul_32* routines
  target/ppc: Use clmul_32* routines
  crypto: Add generic 64-bit carry-less multiply routine
  target/arm: Use clmul_64
  target/i386: Use clmul_64
  target/s390x: Use clmul_64
  target/ppc: Use clmul_64
  host/include/i386: Implement clmul.h
  host/include/aarch64: Implement clmul.h
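
As an illustration of what a generic routine for the narrower element sizes can look like, here is a rough sketch of eight independent 8x8->8 carry-less multiplies packed into one 64-bit word, next to a scalar reference for a single lane. The function names are hypothetical and this is an assumed approach for illustration, not necessarily the implementation in these patches.

```c
#include <stdint.h>

/* Scalar reference for one lane: low 8 bits of an 8x8 carry-less product. */
static uint8_t sketch_clmul_8_low(uint8_t n, uint8_t m)
{
    uint8_t r = 0;

    for (int i = 0; i < 8; i++) {
        if ((n >> i) & 1) {
            r ^= (uint8_t)(m << i);
        }
    }
    return r;
}

/* Eight lanes at once: each byte of the result is the low half of n[k] * m[k]. */
static uint64_t sketch_clmul_8x8_low(uint64_t n, uint64_t m)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        /* Expand bit 0 of every byte of n into a full 0x00/0xff byte mask. */
        uint64_t mask = (n & 0x0101010101010101ull) * 0xff;

        r ^= m & mask;
        /* Shift each lane of m left by one, dropping bits that cross lanes. */
        m = (m << 1) & 0xfefefefefefefefeull;
        /* Bring the next bit of every byte of n down into bit 0. */
        n >>= 1;
    }
    return r;
}
```

The packed version performs eight iterations for all eight lanes together, where a per-lane loop would need eight iterations for each lane.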

 host/include/aarch64/host/cpuinfo.h      |   1 +
 host/include/aarch64/host/crypto/clmul.h |  41 +++++
 host/include/generic/host/crypto/clmul.h |  15 ++
 host/include/i386/host/cpuinfo.h         |   1 +
 host/include/i386/host/crypto/clmul.h    |  29 ++++
 host/include/x86_64/host/crypto/clmul.h  |   1 +
 include/crypto/clmul.h                   |  83 ++++++++++
 include/qemu/cpuid.h                     |   3 +
 target/arm/tcg/vec_internal.h            |  11 --
 target/i386/ops_sse.h                    |  40 ++---
 crypto/clmul.c                           | 112 ++++++++++++++
 target/arm/tcg/mve_helper.c              |  16 +-
 target/arm/tcg/vec_helper.c              | 102 ++-----------
 target/ppc/int_helper.c                  |  64 ++++----
 target/s390x/tcg/vec_int_helper.c        | 186 ++++++++++-------------
 util/cpuinfo-aarch64.c                   |   4 +-
 util/cpuinfo-i386.c                      |   1 +
 crypto/meson.build                       |   9 +-
 18 files changed, 434 insertions(+), 285 deletions(-)
 create mode 100644 host/include/aarch64/host/crypto/clmul.h
 create mode 100644 host/include/generic/host/crypto/clmul.h
 create mode 100644 host/include/i386/host/crypto/clmul.h
 create mode 100644 host/include/x86_64/host/crypto/clmul.h
 create mode 100644 include/crypto/clmul.h
 create mode 100644 crypto/clmul.c

-- 
2.34.1
Re: [PATCH v3 00/19] crypto: Provide clmul.h and host accel
Posted by Richard Henderson 7 months, 3 weeks ago
Ping.

Still missing r-b on patches 1, 4, 5, 8, 9, 12, 13, 18.

r~

On 8/21/23 09:18, Richard Henderson wrote:
> Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
> carry-less multiply under emulation.
> 
> Changes for v3:
>    * Update target/i386 ops_sse.h.
>    * Apply r-b.
> 
> Changes for v2:
>    * Only accelerate clmul_64; keep generic helpers for other sizes.
>    * Drop most of the Int128 interfaces, except for clmul_64.
>    * Use the same acceleration format as aes-round.h.
> 
> 
> r~
> 
> 
> [1] https://patchew.org/QEMU/20230601123332.3297404-1-ardb@kernel.org/
> 
> 
> Richard Henderson (19):
>    crypto: Add generic 8-bit carry-less multiply routines
>    target/arm: Use clmul_8* routines
>    target/s390x: Use clmul_8* routines
>    target/ppc: Use clmul_8* routines
>    crypto: Add generic 16-bit carry-less multiply routines
>    target/arm: Use clmul_16* routines
>    target/s390x: Use clmul_16* routines
>    target/ppc: Use clmul_16* routines
>    crypto: Add generic 32-bit carry-less multiply routines
>    target/arm: Use clmul_32* routines
>    target/s390x: Use clmul_32* routines
>    target/ppc: Use clmul_32* routines
>    crypto: Add generic 64-bit carry-less multiply routine
>    target/arm: Use clmul_64
>    target/i386: Use clmul_64
>    target/s390x: Use clmul_64
>    target/ppc: Use clmul_64
>    host/include/i386: Implement clmul.h
>    host/include/aarch64: Implement clmul.h
Re: [PATCH v3 00/19] crypto: Provide clmul.h and host accel
Posted by Ard Biesheuvel 8 months, 2 weeks ago
On Mon, 21 Aug 2023 at 18:18, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
> carry-less multiply under emulation.
>
> Changes for v3:
>   * Update target/i386 ops_sse.h.
>   * Apply r-b.
>
> Changes for v2:
>   * Only accelerate clmul_64; keep generic helpers for other sizes.
>   * Drop most of the Int128 interfaces, except for clmul_64.
>   * Use the same acceleration format as aes-round.h.
>
>
> r~
>
>
> [1] https://patchew.org/QEMU/20230601123332.3297404-1-ardb@kernel.org/
>
>
> Richard Henderson (19):
>   crypto: Add generic 8-bit carry-less multiply routines
>   target/arm: Use clmul_8* routines
>   target/s390x: Use clmul_8* routines
>   target/ppc: Use clmul_8* routines
>   crypto: Add generic 16-bit carry-less multiply routines
>   target/arm: Use clmul_16* routines
>   target/s390x: Use clmul_16* routines
>   target/ppc: Use clmul_16* routines
>   crypto: Add generic 32-bit carry-less multiply routines
>   target/arm: Use clmul_32* routines
>   target/s390x: Use clmul_32* routines
>   target/ppc: Use clmul_32* routines
>   crypto: Add generic 64-bit carry-less multiply routine
>   target/arm: Use clmul_64
>   target/i386: Use clmul_64
>   target/s390x: Use clmul_64
>   target/ppc: Use clmul_64
>   host/include/i386: Implement clmul.h
>   host/include/aarch64: Implement clmul.h
>

OK, I did the OpenSSL benchmark this time, using an x86_64 cross build
on arm64/ThunderX2, and the speedup is about 7x at the largest block
sizes (\o/)

Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>



Distro qemu (no acceleration):

$ qemu-x86_64 --version
qemu-x86_64 version 7.2.4 (Debian 1:7.2+dfsg-7+deb12u1)

$ apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM       8856.13k    13820.95k    17375.49k    16826.37k    16870.06k    17208.66k


QEMU built with this series applied onto latest master:

$ ~/build/qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfffa320b0fcbfffd:0x8041020c01dc47a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM      14237.01k    34176.34k    70633.13k    97372.84k   119668.74k   122049.88k
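
To put the two runs side by side, the per-block-size ratio can be worked out directly from the figures above: it comes to roughly 1.6x at 16 bytes and about 7.1x at 8192 and 16384 bytes. The sketch below only restates that arithmetic. The array and function names are hypothetical, and since the baseline is a distro 7.2.4 build while the second run is current master plus this series, the ratio may also include unrelated changes between the two versions.

```c
#include <stddef.h>

/* Block sizes in bytes, in the column order of the openssl output. */
static const int sketch_block_bytes[] = { 16, 64, 256, 1024, 8192, 16384 };

/* Throughput in 1000s of bytes per second, copied from the runs above. */
static const double sketch_baseline[] = {
    8856.13, 13820.95, 17375.49, 16826.37, 16870.06, 17208.66
};
static const double sketch_patched[] = {
    14237.01, 34176.34, 70633.13, 97372.84, 119668.74, 122049.88
};

/* Number of block sizes measured. */
static const size_t sketch_count =
    sizeof(sketch_block_bytes) / sizeof(sketch_block_bytes[0]);

/* Speedup of the patched build over the baseline for column i. */
static double sketch_speedup(size_t i)
{
    return sketch_patched[i] / sketch_baseline[i];
}
```

The gain grows with block size, so the 7x figure describes the large-block end of the table.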
Re: [PATCH v3 00/19] crypto: Provide clmul.h and host accel
Posted by Richard Henderson 8 months, 2 weeks ago
On 8/21/23 11:08, Ard Biesheuvel wrote:
> OK, I did the OpenSSL benchmark this time, using a x86_64 cross build
> on arm64/ThunderX2, and the speedup is 7x (\o/)

Excellent, thanks.


r~