[PATCH v2 00/15] SHA-3 library

Eric Biggers posted 15 patches 3 months, 2 weeks ago
[PATCH v2 00/15] SHA-3 library
Posted by Eric Biggers 3 months, 2 weeks ago
This series is targeting libcrypto-next.  It can also be retrieved from:

    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2

This series adds SHA-3 support to lib/crypto/.  This includes support
for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
and also support for the extendable-output functions SHAKE128 and
SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
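As a userspace illustration only (this uses Python's hashlib, not the kernel library API added by this series), the difference between the fixed-length digests and the extendable-output functions can be sketched as follows. The variable names are made up for the example.

```python
import hashlib

msg = b"abc"

# Fixed-length digests: the output size is implied by the algorithm name.
digest_bits = {}
for name in ("sha3_224", "sha3_256", "sha3_384", "sha3_512"):
    digest_bits[name] = len(hashlib.new(name, msg).digest()) * 8
    print(name, digest_bits[name], "bits")

# Extendable-output functions (XOFs): the caller chooses the output length,
# and a shorter output is a prefix of a longer one for the same input.
short_out = hashlib.shake_128(msg).digest(16)
long_out = hashlib.shake_128(msg).digest(64)
print("SHAKE128 prefix property holds:", long_out.startswith(short_out))
```

The prefix property is what makes it meaningful to squeeze XOF output in several steps.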

The architecture-optimized SHA-3 code for arm64 and s390 is migrated
into lib/crypto/.  (The existing s390 code couldn't really be reused, so
I rewrote it from scratch.)  As a result, the SHA-3 library functions
are accelerated on these architectures.

Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
algorithms are reimplemented on top of the library API.

If the s390 folks could re-test the s390 optimized SHA-3 code (by
enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
would be helpful.  QEMU doesn't support the instructions it uses.  Also,
it would be helpful to provide the benchmark output from just before
"lib/crypto: s390/sha3: Add optimized Keccak functions", just after it,
and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
functions".  Then we can verify that each change is useful.

Changed in v2:
  - Added missing selection of CRYPTO_LIB_SHA3 from CRYPTO_SHA3.
  - Fixed a bug where incorrect SHAKE output was produced if a
    zero-length squeeze was followed by a nonzero-length squeeze.
  - Improved the SHAKE tests.
  - Utilized the one-shot SHA-3 digest instructions on s390.
  - Split the s390 changes into several patches.
  - Folded some of my patches into David's.
  - Dropped some unnecessary changes from the first 2 patches.
  - Lots more cleanups, mainly to "lib/crypto: sha3: Add SHA-3 support".
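The SHAKE squeeze fix listed above concerns an invariant that is easy to state: however the output is split across squeeze calls, including zero-length ones, the concatenation must equal the one-shot output. The following is a minimal userspace model of that invariant, not the kernel code; the class `Shake128` and its `update`/`squeeze` methods are hypothetical names chosen for this sketch, and the result is checked against Python's hashlib.

```python
import hashlib

def _rol64(a, n):
    n %= 64
    return ((a >> (64 - n)) | (a << n)) & 0xFFFFFFFFFFFFFFFF

def keccak_f1600(state):
    """Apply the Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1  # generates the round constants
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out

class Shake128:
    """Hypothetical incremental SHAKE128 model (illustration only)."""
    RATE = 168  # bytes, i.e. 1344 bits

    def __init__(self):
        self.state = bytearray(200)
        self.pos = 0            # absorb offset, later the squeeze offset
        self.squeezing = False

    def update(self, data):
        assert not self.squeezing, "cannot absorb after squeezing has begun"
        for byte in data:
            self.state[self.pos] ^= byte
            self.pos += 1
            if self.pos == self.RATE:
                self.state = keccak_f1600(self.state)
                self.pos = 0

    def squeeze(self, n):
        if not self.squeezing:
            # Pad exactly once, on the first squeeze call of ANY length,
            # so that a zero-length call leaves the state consistent.
            self.state[self.pos] ^= 0x1F
            self.state[self.RATE - 1] ^= 0x80
            self.state = keccak_f1600(self.state)
            self.pos = 0
            self.squeezing = True
        out = bytearray()
        while n > 0:
            if self.pos == self.RATE:
                self.state = keccak_f1600(self.state)
                self.pos = 0
            take = min(n, self.RATE - self.pos)
            out += self.state[self.pos:self.pos + take]
            self.pos += take
            n -= take
        return bytes(out)

msg = b"abc" * 100
xof = Shake128()
xof.update(msg)
pieces = xof.squeeze(0) + xof.squeeze(1) + xof.squeeze(0) + xof.squeeze(399)
print(pieces == hashlib.shake_128(msg).digest(400))
```

How the kernel implementation tracks the transition from absorbing to squeezing may differ; the sketch only demonstrates the property that a test for this class of bug would check.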

Changed in v1 (vs. first 5 patches of David's v6 patchset):
  - Migrated the arm64 and s390 code into lib/crypto/
  - Simplified the library API
  - Added FIPS test
  - Many other fixes and improvements

The first 5 patches are derived from David's v6 patchset
(https://lore.kernel.org/linux-crypto/20251017144311.817771-1-dhowells@redhat.com/).
Earlier changelogs can be found there.

David Howells (5):
  crypto: s390/sha3 - Rename conflicting functions
  crypto: arm64/sha3 - Rename conflicting function
  lib/crypto: sha3: Add SHA-3 support
  lib/crypto: sha3: Move SHA3 Iota step mapping into round function
  lib/crypto: tests: Add SHA3 kunit tests

Eric Biggers (10):
  lib/crypto: tests: Add additional SHAKE tests
  lib/crypto: sha3: Add FIPS cryptographic algorithm self-test
  crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library
  lib/crypto: arm64/sha3: Migrate optimized code into library
  lib/crypto: s390/sha3: Add optimized Keccak functions
  lib/crypto: sha3: Support arch overrides of one-shot digest functions
  lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
  crypto: jitterentropy - Use default sha3 implementation
  crypto: sha3 - Reimplement using library API
  crypto: s390/sha3 - Remove superseded SHA-3 code

 Documentation/crypto/index.rst                |   1 +
 Documentation/crypto/sha3.rst                 | 130 ++++++
 arch/arm64/configs/defconfig                  |   2 +-
 arch/arm64/crypto/Kconfig                     |  11 -
 arch/arm64/crypto/Makefile                    |   3 -
 arch/arm64/crypto/sha3-ce-glue.c              | 151 -------
 arch/s390/configs/debug_defconfig             |   3 +-
 arch/s390/configs/defconfig                   |   3 +-
 arch/s390/crypto/Kconfig                      |  20 -
 arch/s390/crypto/Makefile                     |   2 -
 arch/s390/crypto/sha.h                        |  51 ---
 arch/s390/crypto/sha3_256_s390.c              | 157 -------
 arch/s390/crypto/sha3_512_s390.c              | 157 -------
 arch/s390/crypto/sha_common.c                 | 117 -----
 crypto/Kconfig                                |   1 +
 crypto/Makefile                               |   2 +-
 crypto/jitterentropy-kcapi.c                  |  12 +-
 crypto/sha3.c                                 | 166 +++++++
 crypto/sha3_generic.c                         | 290 ------------
 crypto/testmgr.c                              |   8 +
 include/crypto/sha3.h                         | 306 ++++++++++++-
 lib/crypto/Kconfig                            |  13 +
 lib/crypto/Makefile                           |  10 +
 .../crypto/arm64}/sha3-ce-core.S              |  67 +--
 lib/crypto/arm64/sha3.h                       |  62 +++
 lib/crypto/fips.h                             |   7 +
 lib/crypto/s390/sha3.h                        | 151 +++++++
 lib/crypto/sha3.c                             | 411 +++++++++++++++++
 lib/crypto/tests/Kconfig                      |  11 +
 lib/crypto/tests/Makefile                     |   1 +
 lib/crypto/tests/sha3-testvecs.h              | 249 +++++++++++
 lib/crypto/tests/sha3_kunit.c                 | 422 ++++++++++++++++++
 scripts/crypto/gen-fips-testvecs.py           |   4 +
 scripts/crypto/gen-hash-testvecs.py           |  27 +-
 34 files changed, 2012 insertions(+), 1016 deletions(-)
 create mode 100644 Documentation/crypto/sha3.rst
 delete mode 100644 arch/arm64/crypto/sha3-ce-glue.c
 delete mode 100644 arch/s390/crypto/sha.h
 delete mode 100644 arch/s390/crypto/sha3_256_s390.c
 delete mode 100644 arch/s390/crypto/sha3_512_s390.c
 delete mode 100644 arch/s390/crypto/sha_common.c
 create mode 100644 crypto/sha3.c
 delete mode 100644 crypto/sha3_generic.c
 rename {arch/arm64/crypto => lib/crypto/arm64}/sha3-ce-core.S (84%)
 create mode 100644 lib/crypto/arm64/sha3.h
 create mode 100644 lib/crypto/s390/sha3.h
 create mode 100644 lib/crypto/sha3.c
 create mode 100644 lib/crypto/tests/sha3-testvecs.h
 create mode 100644 lib/crypto/tests/sha3_kunit.c

base-commit: e3068492d0016d0ea9a1ff07dbfa624d2ec773ca
-- 
2.51.1.dirty
Re: [PATCH v2 00/15] SHA-3 library
Posted by Ard Biesheuvel 3 months, 1 week ago
On Sun, 26 Oct 2025 at 06:53, Eric Biggers <ebiggers@kernel.org> wrote:
>
> This series is targeting libcrypto-next.  It can also be retrieved from:
>
>     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2
>
> This series adds SHA-3 support to lib/crypto/.  This includes support
> for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> and also support for the extendable-output functions SHAKE128 and
> SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
>
> The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> into lib/crypto/.  (The existing s390 code couldn't really be reused, so
> really I rewrote it from scratch.)  This makes the SHA-3 library
> functions be accelerated on these architectures.
>
> Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> algorithms are reimplemented on top of the library API.
>
> If the s390 folks could re-test the s390 optimized SHA-3 code (by
> enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> it would be helpful to provide the benchmark output from just before
> "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> functions".  Then we can verify that each change is useful.
>
> Changed in v2:
>   - Added missing selection of CRYPTO_LIB_SHA3 from CRYPTO_SHA3.
>   - Fixed a bug where incorrect SHAKE output was produced if a
>     zero-length squeeze was followed by a nonzero-length squeeze.
>   - Improved the SHAKE tests.
>   - Utilized the one-shot SHA-3 digest instructions on s390.
>   - Split the s390 changes into several patches.
>   - Folded some of my patches into David's.
>   - Dropped some unnecessary changes from the first 2 patches.
>   - Lots more cleanups, mainly to "lib/crypto: sha3: Add SHA-3 support".
>
> Changed in v1 (vs. first 5 patches of David's v6 patchset):
>   - Migrated the arm64 and s390 code into lib/crypto/
>   - Simplified the library API
>   - Added FIPS test
>   - Many other fixes and improvements
>
> The first 5 patches are derived from David's v6 patchset
> (https://lore.kernel.org/linux-crypto/20251017144311.817771-1-dhowells@redhat.com/).
> Earlier changelogs can be found there.
>
> David Howells (5):
>   crypto: s390/sha3 - Rename conflicting functions
>   crypto: arm64/sha3 - Rename conflicting function
>   lib/crypto: sha3: Add SHA-3 support
>   lib/crypto: sha3: Move SHA3 Iota step mapping into round function
>   lib/crypto: tests: Add SHA3 kunit tests
>
> Eric Biggers (10):
>   lib/crypto: tests: Add additional SHAKE tests
>   lib/crypto: sha3: Add FIPS cryptographic algorithm self-test
>   crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library
>   lib/crypto: arm64/sha3: Migrate optimized code into library
>   lib/crypto: s390/sha3: Add optimized Keccak functions
>   lib/crypto: sha3: Support arch overrides of one-shot digest functions
>   lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
>   crypto: jitterentropy - Use default sha3 implementation
>   crypto: sha3 - Reimplement using library API
>   crypto: s390/sha3 - Remove superseded SHA-3 code
>

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Re: [PATCH v2 00/15] SHA-3 library
Posted by Harald Freudenberger 3 months, 1 week ago
On 2025-10-26 06:50, Eric Biggers wrote:
> This series is targeting libcrypto-next.  It can also be retrieved from:
> 
>     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2
> 
> This series adds SHA-3 support to lib/crypto/.  This includes support
> for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> and also support for the extendable-output functions SHAKE128 and
> SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
> 
> The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> into lib/crypto/.  (The existing s390 code couldn't really be reused, so
> really I rewrote it from scratch.)  This makes the SHA-3 library
> functions be accelerated on these architectures.
> 
> Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> algorithms are reimplemented on top of the library API.
> 
> If the s390 folks could re-test the s390 optimized SHA-3 code (by
> enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> it would be helpful to provide the benchmark output from just before
> "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> functions".  Then we can verify that each change is useful.
> 
> Changed in v2:
>   - Added missing selection of CRYPTO_LIB_SHA3 from CRYPTO_SHA3.
>   - Fixed a bug where incorrect SHAKE output was produced if a
>     zero-length squeeze was followed by a nonzero-length squeeze.
>   - Improved the SHAKE tests.
>   - Utilized the one-shot SHA-3 digest instructions on s390.
>   - Split the s390 changes into several patches.
>   - Folded some of my patches into David's.
>   - Dropped some unnecessary changes from the first 2 patches.
>   - Lots more cleanups, mainly to "lib/crypto: sha3: Add SHA-3 support".
> 
> Changed in v1 (vs. first 5 patches of David's v6 patchset):
>   - Migrated the arm64 and s390 code into lib/crypto/
>   - Simplified the library API
>   - Added FIPS test
>   - Many other fixes and improvements
> 
> The first 5 patches are derived from David's v6 patchset
> (https://lore.kernel.org/linux-crypto/20251017144311.817771-1-dhowells@redhat.com/).
> Earlier changelogs can be found there.
> 
> David Howells (5):
>   crypto: s390/sha3 - Rename conflicting functions
>   crypto: arm64/sha3 - Rename conflicting function
>   lib/crypto: sha3: Add SHA-3 support
>   lib/crypto: sha3: Move SHA3 Iota step mapping into round function
>   lib/crypto: tests: Add SHA3 kunit tests
> 
> Eric Biggers (10):
>   lib/crypto: tests: Add additional SHAKE tests
>   lib/crypto: sha3: Add FIPS cryptographic algorithm self-test
>   crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library
>   lib/crypto: arm64/sha3: Migrate optimized code into library
>   lib/crypto: s390/sha3: Add optimized Keccak functions
>   lib/crypto: sha3: Support arch overrides of one-shot digest functions
>   lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
>   crypto: jitterentropy - Use default sha3 implementation
>   crypto: sha3 - Reimplement using library API
>   crypto: s390/sha3 - Remove superseded SHA-3 code
> 
>  Documentation/crypto/index.rst                |   1 +
>  Documentation/crypto/sha3.rst                 | 130 ++++++
>  arch/arm64/configs/defconfig                  |   2 +-
>  arch/arm64/crypto/Kconfig                     |  11 -
>  arch/arm64/crypto/Makefile                    |   3 -
>  arch/arm64/crypto/sha3-ce-glue.c              | 151 -------
>  arch/s390/configs/debug_defconfig             |   3 +-
>  arch/s390/configs/defconfig                   |   3 +-
>  arch/s390/crypto/Kconfig                      |  20 -
>  arch/s390/crypto/Makefile                     |   2 -
>  arch/s390/crypto/sha.h                        |  51 ---
>  arch/s390/crypto/sha3_256_s390.c              | 157 -------
>  arch/s390/crypto/sha3_512_s390.c              | 157 -------
>  arch/s390/crypto/sha_common.c                 | 117 -----
>  crypto/Kconfig                                |   1 +
>  crypto/Makefile                               |   2 +-
>  crypto/jitterentropy-kcapi.c                  |  12 +-
>  crypto/sha3.c                                 | 166 +++++++
>  crypto/sha3_generic.c                         | 290 ------------
>  crypto/testmgr.c                              |   8 +
>  include/crypto/sha3.h                         | 306 ++++++++++++-
>  lib/crypto/Kconfig                            |  13 +
>  lib/crypto/Makefile                           |  10 +
>  .../crypto/arm64}/sha3-ce-core.S              |  67 +--
>  lib/crypto/arm64/sha3.h                       |  62 +++
>  lib/crypto/fips.h                             |   7 +
>  lib/crypto/s390/sha3.h                        | 151 +++++++
>  lib/crypto/sha3.c                             | 411 +++++++++++++++++
>  lib/crypto/tests/Kconfig                      |  11 +
>  lib/crypto/tests/Makefile                     |   1 +
>  lib/crypto/tests/sha3-testvecs.h              | 249 +++++++++++
>  lib/crypto/tests/sha3_kunit.c                 | 422 ++++++++++++++++++
>  scripts/crypto/gen-fips-testvecs.py           |   4 +
>  scripts/crypto/gen-hash-testvecs.py           |  27 +-
>  34 files changed, 2012 insertions(+), 1016 deletions(-)
>  create mode 100644 Documentation/crypto/sha3.rst
>  delete mode 100644 arch/arm64/crypto/sha3-ce-glue.c
>  delete mode 100644 arch/s390/crypto/sha.h
>  delete mode 100644 arch/s390/crypto/sha3_256_s390.c
>  delete mode 100644 arch/s390/crypto/sha3_512_s390.c
>  delete mode 100644 arch/s390/crypto/sha_common.c
>  create mode 100644 crypto/sha3.c
>  delete mode 100644 crypto/sha3_generic.c
>  rename {arch/arm64/crypto => lib/crypto/arm64}/sha3-ce-core.S (84%)
>  create mode 100644 lib/crypto/arm64/sha3.h
>  create mode 100644 lib/crypto/s390/sha3.h
>  create mode 100644 lib/crypto/sha3.c
>  create mode 100644 lib/crypto/tests/sha3-testvecs.h
>  create mode 100644 lib/crypto/tests/sha3_kunit.c
> 
> base-commit: e3068492d0016d0ea9a1ff07dbfa624d2ec773ca

Picked this series from your ebiggers repo branch sha3-lib-v2.
Build on s390 runs without any complaints, no warnings.
As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
With a "modprobe tcrypt" I forced the selftests to run
and in parallel I checked that the s390 specific CPACF instructions
are really used (can be done with the pai command and check for
the KIMD_SHA3_* counters). Also ran some AF_ALG tests to verify
all the sha3 hashes and check for thread safety.
All this ran without any findings. However, NO performance
related tests were involved.

What's a little bit tricky here is that the sha3 lib is statically
built into the kernel. So no chance to unload/load this as a module.
For sha1 and the sha2 stuff I can understand the need to have this
statically enabled in the kernel. Sha3 is only supposed to be available
as a backup in case of sha2 deficiencies. So I can't see why this is
really statically needed.

Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Re: [PATCH v2 00/15] SHA-3 library
Posted by Eric Biggers 3 months, 1 week ago
On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
> > If the s390 folks could re-test the s390 optimized SHA-3 code (by
> > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> > it would be helpful to provide the benchmark output from just before
> > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> > functions".  Then we can verify that each change is useful.
[...]
> 
> Picked this series from your ebiggers repo branch sha3-lib-v2.
> Build on s390 runs without any complaints, no warnings.
> As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
> With a "modprobe tcrypt" I forced the selftests to run
> and in parallel I checked that the s390 specific CPACF instructions
> are really used (can be done with the pai command and check for
> the KIMD_SHA3_* counters). Also ran some AF_ALG tests to verify
> all the sha3 hashes and check for thread safety.
> All this ran without any findings. However, NO performance
> related tests were involved.

Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
verify that all its test cases passed?  That's the most important one.
It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
enabled, and I was hoping to see your results from that after each
change.  The results get printed to the kernel log when the test runs.

> What's a little bit tricky here is that the sha3 lib is statically
> built into the kernel. So no chance to unload/load this as a module.
> For sha1 and the sha2 stuff I can understand the need to have this
> statically enabled in the kernel. Sha3 is only supposed to be available
> as a backup in case of sha2 deficiencies. So I can't see why this is
> really statically needed.

CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
or a loadable module, depending on what other kconfig options select it.
Same as all the other crypto library modules.

- Eric
Re: [PATCH v2 00/15] SHA-3 library
Posted by Harald Freudenberger 3 months, 1 week ago
On 2025-10-29 17:32, Eric Biggers wrote:
> On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
>> > If the s390 folks could re-test the s390 optimized SHA-3 code (by
>> > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
>> > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
>> > it would be helpful to provide the benchmark output from just before
>> > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
>> > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > functions".  Then we can verify that each change is useful.
> [...]
>> 
>> Picked this series from your ebiggers repo branch sha3-lib-v2.
>> Build on s390 runs without any complaints, no warnings.
>> As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
>> With a "modprobe tcrypt" I forced the selftests to run
>> and in parallel I checked that the s390 specific CPACF instructions
>> are really used (can be done with the pai command and check for
>> the KIMD_SHA3_* counters). Also ran some AF_ALG tests to verify
>> all the sha3 hashes and check for thread safety.
>> All this ran without any findings. However, NO performance
>> related tests were involved.
> 
> Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
> verify that all its test cases passed?  That's the most important one.
> It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
> enabled, and I was hoping to see your results from that after each
> change.  The results get printed to the kernel log when the test runs.
> 

Here it is. As this is a z/VM system, the benchmark values may show poor
performance.

Oct 30 10:46:44 b3545008.lnxne.boe kernel: KTAP version 1
Oct 30 10:46:44 b3545008.lnxne.boe kernel: 1..1
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     KTAP version 1
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # Subtest: sha3
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # module: sha3_kunit
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     1..21
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 14 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 109 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 911 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1849 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1872 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 2647 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 3338 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 5484 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 5562 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 8297 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 12625 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 11242 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12853 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
Oct 30 10:46:44 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
Oct 30 10:46:44 b3545008.lnxne.boe kernel: # Totals: pass:21 fail:0 skip:0 total:21
Oct 30 10:46:44 b3545008.lnxne.boe kernel: ok 1 sha3
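
To compare benchmark output taken at different commits, the benchmark_hash lines in a log like the one above could be extracted with a small script. This is only a sketch: the helper names `parse_benchmark` and `speedup` are made up here, and the sample text reuses three lines from this log (one of them wrapped, as a mail client might leave it).

```python
import re

# Matches entries such as "# benchmark_hash: len=64: 911 MB/s".  \s* also
# spans a line break, because mail clients may wrap the log between the
# length and the rate.
BENCH_RE = re.compile(r"benchmark_hash:\s*len=(\d+):\s*(\d+)\s*MB/s")

def parse_benchmark(log_text):
    """Return {message length in bytes: throughput in MB/s}."""
    return {int(length): int(rate) for length, rate in BENCH_RE.findall(log_text)}

def speedup(before, after):
    """Ratio after/before for every length present in both result sets."""
    return {n: after[n] / before[n] for n in before if n in after and before[n]}

sample = """\
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1:
14 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 8297 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12853 MB/s
"""

results = parse_benchmark(sample)
for length, rate in sorted(results.items()):
    print(f"len={length:>6}  {rate:>6} MB/s")
```

Feeding `speedup()` the parsed results from two kernel builds would give a per-length ratio, which is the comparison requested in the cover letter.
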

>> What's a little bit tricky here is that the sha3 lib is statically
>> built into the kernel. So no chance to unload/load this as a module.
>> For sha1 and the sha2 stuff I can understand the need to have this
>> statically enabled in the kernel. Sha3 is only supposed to be available
>> as a backup in case of sha2 deficiencies. So I can't see why this is
>> really statically needed.
> 
> CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
> or a loadable module, depending on what other kconfig options select it.
> Same as all the other crypto library modules.

I know and see this. However, I am unable to switch this to 'm'. It seems
like the root cause is that CRYPTO_SHA3='y' and I can't change this to 'm'.
And honestly I am unable to read these dependencies (forgive my ignorance):
CONFIG_CRYPTO_SHA3:
SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
  Symbol: CRYPTO_SHA3 [=y]
   Type  : tristate
   Defined at crypto/Kconfig:1006
     Prompt: SHA-3
     Depends on: CRYPTO [=y]
     Location:
       -> Cryptographic API (CRYPTO [=y])
         -> Hashes, digests, and MACs
           -> SHA-3 (CRYPTO_SHA3 [=y])
   Selects: CRYPTO_HASH [=y] && CRYPTO_LIB_SHA3 [=y]
   Selected by [y]:
     - CRYPTO_JITTERENTROPY [=y] && CRYPTO [=y]
   Selected by [n]:
     - MODULE_SIG_SHA3_256 [=n] && MODULES [=y] && (MODULE_SIG [=y] || IMA_APPRAISE_MODSIG [=n])
     - MODULE_SIG_SHA3_384 [=n] && MODULES [=y] && (MODULE_SIG [=y] || IMA_APPRAISE_MODSIG [=n])
     - MODULE_SIG_SHA3_512 [=n] && MODULES [=y] && (MODULE_SIG [=y] || IMA_APPRAISE_MODSIG [=n])
     - CRYPTO_DEV_ZYNQMP_SHA3 [=n] && CRYPTO [=y] && CRYPTO_HW [=y] && (ZYNQMP_FIRMWARE [=n] || COMPILE_TEST [=n])
     - CRYPTO_DEV_STM32_HASH [=n] && CRYPTO [=y] && CRYPTO_HW [=y] && (ARCH_STM32 || ARCH_U8500) && HAS_DMA [=y]
     - CRYPTO_DEV_SAFEXCEL [=n] && CRYPTO [=y] && CRYPTO_HW [=y] && (OF [=n] || PCI [=y] || COMPILE_TEST [=n]) && HAS_IOMEM [=y]
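
One way to read the listing above: in Kconfig, "select" imposes a lower limit on the selected symbol, and with several selectors the largest value wins. The sketch below models only that rule (it ignores "depends on", conditional selects, and everything else Kconfig does), using the symbol values shown in the listing. The function names are made up for the illustration.

```python
# Tristate ordering used by Kconfig: n < m < y.
ORDER = {"n": 0, "m": 1, "y": 2}

def select_floor(selector_values):
    """Largest value among the selecting symbols; 'n' if there are none."""
    return max(selector_values, key=ORDER.__getitem__, default="n")

def effective(requested, selector_values):
    """A symbol cannot be set lower than the floor its selectors impose."""
    return max(requested, select_floor(selector_values), key=ORDER.__getitem__)

# Values of the symbols that select CRYPTO_SHA3, copied from the listing.
sha3_selectors = {
    "CRYPTO_JITTERENTROPY": "y",
    "MODULE_SIG_SHA3_256": "n",
    "MODULE_SIG_SHA3_384": "n",
    "MODULE_SIG_SHA3_512": "n",
    "CRYPTO_DEV_ZYNQMP_SHA3": "n",
    "CRYPTO_DEV_STM32_HASH": "n",
    "CRYPTO_DEV_SAFEXCEL": "n",
}

# Asking for 'm' still yields 'y', because CRYPTO_JITTERENTROPY=y selects it.
crypto_sha3 = effective("m", sha3_selectors.values())
# CRYPTO_SHA3 in turn selects CRYPTO_LIB_SHA3, so the floor propagates.
crypto_lib_sha3 = effective("m", [crypto_sha3])
print("CRYPTO_SHA3 =", crypto_sha3, " CRYPTO_LIB_SHA3 =", crypto_lib_sha3)

# What-if (hypothetical configuration): with the only active selector at 'm',
# a modular build would become possible.
what_if = effective("m", ["m"])
print("with CRYPTO_JITTERENTROPY=m:", what_if)
```

Under this reading, the listing suggests that CRYPTO_SHA3 is pinned to 'y' by CRYPTO_JITTERENTROPY=y rather than by anything in the SHA-3 library itself.
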

> 
> - Eric
Re: [PATCH v2 00/15] SHA-3 library
Posted by Eric Biggers 3 months, 1 week ago
On Thu, Oct 30, 2025 at 11:10:22AM +0100, Harald Freudenberger wrote:
> On 2025-10-29 17:32, Eric Biggers wrote:
> > On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
> > > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
> > > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> > > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> > > > it would be helpful to provide the benchmark output from just before
> > > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> > > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> > > > functions".  Then we can verify that each change is useful.
> > [...]
> > > 
> > > Picked this series from your ebiggers repo branch sha3-lib-v2.
> > > Build on s390 runs without any complaints, no warnings.
> > > As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
> > > With a "modprobe tcrypt" I forced the selftests to run
> > > and in parallel I checked that the s390 specific CPACF instructions
> > > are really used (can be done with the pai command and check for
> > > the KIMD_SHA3_* counters). Also ran some AF_ALG tests to verify
> > > all the sha3 hashes and check for thread safety.
> > > All this ran without any findings. However, NO performance
> > > related tests were involved.
> > 
> > Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
> > verify that all its test cases passed?  That's the most important one.
> > It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
> > enabled, and I was hoping to see your results from that after each
> > change.  The results get printed to the kernel log when the test runs.
> > 
> 
> Here it is. As this is a z/VM system, the benchmark values may show poor
> performance.
> 
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: KTAP version 1
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: 1..1
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     KTAP version 1
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # Subtest: sha3
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     1..21
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 14 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 109 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 911 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1849 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1872 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 2647 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 3338 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 5484 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 5562 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 8297 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 12625 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 11242 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12853 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # Totals: pass:21 fail:0 skip:0
> total:21
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: ok 1 sha3

Thanks!  Is this with the whole series applied?  Those numbers are
pretty fast, so probably at least the Keccak acceleration part is
worthwhile.  But just to reiterate what I asked for:

    Also, it would be helpful to provide the benchmark output from just
    before "lib/crypto: s390/sha3: Add optimized Keccak function", just
    after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
    SHA-3 digest functions".

So I'd like to see how much each change helped, which isn't clear if you
show only the result at the end.

If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
one-shot SHA-3 digest functions" actually helps significantly vs. simply
doing the Keccak acceleration, then we should drop it for simplicity.

> > > What's a little bit tricky here is that the sha3 lib is statically
> > > built into the kernel. So there is no chance to unload/load this as a
> > > module. For sha1 and the sha2 stuff I can understand the need to have
> > > this statically enabled in the kernel. Sha3 is only supposed to be
> > > available as backup in case of sha2 deficiencies. So I can't see why
> > > this is really statically needed.
> > 
> > CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
> > or a loadable module, depending on what other kconfig options select it.
> > Same as all the other crypto library modules.
> 
> I know and see this. However, I am unable to switch this to 'm'. It seems
> like the root cause is that CRYPTO_SHA3='y' and I can't change this to 'm'.
> And honestly I am unable to read these dependencies (forgive my ignorance):
> 
> CONFIG_CRYPTO_SHA3:
> SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
>  Symbol: CRYPTO_SHA3 [=y]
>   Type  : tristate
>   Defined at crypto/Kconfig:1006
>     Prompt: SHA-3
>     Depends on: CRYPTO [=y]
>     Location:
>       -> Cryptographic API (CRYPTO [=y])
>         -> Hashes, digests, and MACs
>           -> SHA-3 (CRYPTO_SHA3 [=y])
>   Selects: CRYPTO_HASH [=y] && CRYPTO_LIB_SHA3 [=y]
>   Selected by [y]:
>     - CRYPTO_JITTERENTROPY [=y] && CRYPTO [=y]

Well, all that is saying is that there is a built-in option that selects
SHA-3, which causes it to be built-in.  So SHA-3 being built-in is
working as intended in that case.  (And it's also intended that we no
longer allow the architecture-optimized code to be built as a module
when the generic code is built-in.  That was always a huge footgun.)  If
you want to know why something that needs SHA-3 is being built-in, you'd
need to follow the chain of dependencies up to see how it gets selected.
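
As a rough illustration of that chain, here is a minimal sketch of how
"select" acts as a lower bound on a tristate symbol.  It is a simplified
model, not the real Kconfig evaluator: effective_value() and the
user-requested values are hypothetical, and conditional selects and
"depends on" limits are ignored.

```python
# Simplified model of Kconfig "select" on tristate symbols (n < m < y).
# Assumption: a selected symbol ends up at least as high as every symbol
# that selects it; "depends on" and conditional selects are not modelled.
ORDER = {"n": 0, "m": 1, "y": 2}


def effective_value(user_choice, selector_values):
    """Return the highest of the user's choice and all selectors' values."""
    return max([user_choice, *selector_values], key=ORDER.__getitem__)


# Values taken from the menuconfig output quoted above.
crypto_jitterentropy = "y"

# The user asks for 'm', but a built-in selector forces 'y'.
crypto_sha3 = effective_value("m", [crypto_jitterentropy])

# CRYPTO_SHA3 in turn selects CRYPTO_LIB_SHA3.
crypto_lib_sha3 = effective_value("n", [crypto_sha3])

print("CRYPTO_SHA3 =", crypto_sha3)
print("CRYPTO_LIB_SHA3 =", crypto_lib_sha3)
```

In this model the only way to get 'm' for the library is for every
selector up the chain to be 'm' or 'n' as well.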

- Eric
Re: [PATCH v2 00/15] SHA-3 library
Posted by Harald Freudenberger 3 months ago
On 2025-10-30 18:14, Eric Biggers wrote:
> On Thu, Oct 30, 2025 at 11:10:22AM +0100, Harald Freudenberger wrote:
>> On 2025-10-29 17:32, Eric Biggers wrote:
>> > On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
>> > > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
>> > > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
>> > > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
>> > > > it would be helpful to provide the benchmark output from just before
>> > > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
>> > > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > > > functions".  Then we can verify that each change is useful.
>> > [...]
>> > >
>> > > Picked this series from your ebiggers repo branch sha3-lib-v2.
>> > > The build on s390 runs without any complaints or warnings.
>> > > As recommended I enabled the KUNIT option and also
>> > > CRYPTO_SELFTESTS_FULL.
>> > > With a "modprobe tcrypt" I forced the selftests to run
>> > > and in parallel I checked that the s390 specific CPACF instructions
>> > > are really used (can be done with the pai command and check for
>> > > the KIMD_SHA3_* counters). Also ran some AF-alg tests to verify
>> > > all the sha3 hashes and check for thread safety.
>> > > All this ran without any findings. However there are NO performance
>> > > related tests involved.
>> >
>> > Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
>> > verify that all its test cases passed?  That's the most important one.
>> > It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
>> > enabled, and I was hoping to see your results from that after each
>> > change.  The results get printed to the kernel log when the test runs.
>> >
>> 
>> Here it is - as this is a zVM system the benchmark values may show poor
>> performance.
>> 
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: KTAP version 1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: 1..1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     KTAP version 1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # Subtest: sha3
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     1..21
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 14 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 109 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 911 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1849 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1872 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 2647 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 3338 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 5484 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 5562 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 8297 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 12625 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 11242 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12853 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # Totals: pass:21 fail:0 skip:0 total:21
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: ok 1 sha3
> 
> Thanks!  Is this with the whole series applied?  Those numbers are
> pretty fast, so probably at least the Keccak acceleration part is
> worthwhile.  But just to reiterate what I asked for:
> 
>     Also, it would be helpful to provide the benchmark output from just
>     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>     SHA-3 digest functions".
> 
> So I'd like to see how much each change helped, which isn't clear if you
> show only the result at the end.
> 
> If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> one-shot SHA-3 digest functions" actually helps significantly vs. simply
> doing the Keccak acceleration, then we should drop it for simplicity.
> 
>> > > What's a little bit tricky here is that the sha3 lib is statically
>> > > built into the kernel. So there is no chance to unload/load this as a
>> > > module. For sha1 and the sha2 stuff I can understand the need to have
>> > > this statically enabled in the kernel. Sha3 is only supposed to be
>> > > available as backup in case of sha2 deficiencies. So I can't see why
>> > > this is really statically needed.
>> >
>> > CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
>> > or a loadable module, depending on what other kconfig options select it.
>> > Same as all the other crypto library modules.
>> 
>> I know and see this. However, I am unable to switch this to 'm'. It seems
>> like the root cause is that CRYPTO_SHA3='y' and I can't change this to 'm'.
>> And honestly I am unable to read these dependencies (forgive my ignorance):
>> 
>> CONFIG_CRYPTO_SHA3:
>> SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
>>  Symbol: CRYPTO_SHA3 [=y]
>>   Type  : tristate
>>   Defined at crypto/Kconfig:1006
>>     Prompt: SHA-3
>>     Depends on: CRYPTO [=y]
>>     Location:
>>       -> Cryptographic API (CRYPTO [=y])
>>         -> Hashes, digests, and MACs
>>           -> SHA-3 (CRYPTO_SHA3 [=y])
>>   Selects: CRYPTO_HASH [=y] && CRYPTO_LIB_SHA3 [=y]
>>   Selected by [y]:
>>     - CRYPTO_JITTERENTROPY [=y] && CRYPTO [=y]
> 
> Well, all that is saying is that there is a built-in option that selects
> SHA-3, which causes it to be built-in.  So SHA-3 being built-in is
> working as intended in that case.  (And it's also intended that we no
> longer allow the architecture-optimized code to be built as a module
> when the generic code is built-in.  That was always a huge footgun.)  If
> you want to know why something that needs SHA-3 is being built-in, you'd
> need to follow the chain of dependencies up to see how it gets selected.
> 
> - Eric

And here is a benchmark where I used commit
151fbe15a6cb lib/crypto: s390/sha3: Migrate optimized code into library
from your branch sha3-lib-v1. As far as I remember, this lacks the
code optimizations:

Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 196 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 648 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1011 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1014 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 1281 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 1396 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 2593 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 2624 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 4637 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 8931 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 10636 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12339 MB/s
Re: [PATCH v2 00/15] SHA-3 library
Posted by Harald Freudenberger 3 months ago

commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3 
digest functions:

Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 80 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 785 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 812 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1619 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 2319 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 2176 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 4881 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 4968 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 7565 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 11909 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 10378 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12273 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21

commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak 
functions:

Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 211 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 835 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1557 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1617 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 1457 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 1830 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 3035 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 3245 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 5319 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 9969 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 11123 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12767 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21

commit aaca0ebc0717 lib/crypto: arm64/sha3: Migrate optimized code into 
library:

Nov 04 12:02:31 b3545008.lnxne.boe kernel:     # module: sha3_kunit
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     1..21
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 1 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 29 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 120 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 236 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 238 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 185 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 237 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 240 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 239 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 246 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 251 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 253 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 259 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
Nov 04 12:02:32 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21

Obviously, this is without s390-specific acceleration.
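
To see how much each change helped, the benchmark_hash lines from two
runs can be compared per message length.  The following is a minimal
sketch, not part of the series: parse_benchmark() and speedups() are
hypothetical helper names, and the sample strings are three lines each
copied from the 02266b8a383e and b2e169dd8ca5 logs above.  The regex
tolerates the mail client's line wrapping.

```python
import re

# Matches "benchmark_hash: len=<N>: <X> MB/s", even when the mail client
# wrapped the line somewhere between the fields.
BENCH_RE = re.compile(r"benchmark_hash:\s*len=(\d+):\s*(\d+)\s*MB/s")


def parse_benchmark(log_text):
    """Return {message length: MB/s} for each benchmark_hash entry."""
    return {int(n): int(mbps) for n, mbps in BENCH_RE.findall(log_text)}


def speedups(before, after):
    """Return {message length: after/before} for lengths present in both."""
    return {n: after[n] / before[n] for n in sorted(before) if n in after}


# Sample lines from the run at "Add optimized Keccak functions".
KECCAK_LOG = """\
kernel:     # benchmark_hash: len=16:
211 MB/s
kernel:     # benchmark_hash:
len=1024: 5319 MB/s
kernel:     # benchmark_hash:
len=16384: 12767 MB/s
"""

# Sample lines from the run at "Add optimized one-shot SHA-3 digest
# functions".
ONESHOT_LOG = """\
kernel:     # benchmark_hash: len=16:
80 MB/s
kernel:     # benchmark_hash:
len=1024: 7565 MB/s
kernel:     # benchmark_hash:
len=16384: 12273 MB/s
"""

keccak = parse_benchmark(KECCAK_LOG)
oneshot = parse_benchmark(ONESHOT_LOG)
ratios = speedups(keccak, oneshot)

for length, ratio in ratios.items():
    print(f"len={length}: {keccak[length]} -> {oneshot[length]} MB/s "
          f"({ratio:.2f}x)")
```

A ratio below 1.0 means the later commit was slower for that length in
this particular run.  Since these are single runs on a zVM guest, small
differences are probably noise.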
Re: [PATCH v2 00/15] SHA-3 library
Posted by Eric Biggers 3 months ago
On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
> > Thanks!  Is this with the whole series applied?  Those numbers are
> > pretty fast, so probably at least the Keccak acceleration part is
> > worthwhile.  But just to reiterate what I asked for:
> > 
> >     Also, it would be helpful to provide the benchmark output from just
> >     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
> >     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
> >     SHA-3 digest functions".
> > 
> > So I'd like to see how much each change helped, which isn't clear if you
> > show only the result at the end.
> > 
> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
> > doing the Keccak acceleration, then we should drop it for simplicity.
[...]
> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3
> digest functions:
> 
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3
> test_hash_incremental_updates
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4
> test_hash_buffer_overruns
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6
> test_hash_alignment_consistency
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7
> test_hash_ctx_zeroization
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8
> test_hash_interrupt_context_1
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9
> test_hash_interrupt_context_2
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19
> test_shake_multiple_squeezes
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 80
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 785
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127:
> 812 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128:
> 1619 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200:
> 2319 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256:
> 2176 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511:
> 4881 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512:
> 4968 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024:
> 7565 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173:
> 11909 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096:
> 10378 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384:
> 12273 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21
> 
> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak functions:
> 
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3
> test_hash_incremental_updates
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4
> test_hash_buffer_overruns
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6
> test_hash_alignment_consistency
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7
> test_hash_ctx_zeroization
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8
> test_hash_interrupt_context_1
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9
> test_hash_interrupt_context_2
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19
> test_shake_multiple_squeezes
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 211
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 835
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127:
> 1557 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128:
> 1617 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200:
> 1457 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256:
> 1830 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511:
> 3035 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512:
> 3245 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024:
> 5319 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173:
> 9969 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096:
> 11123 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384:
> 12767 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21

Thanks.  So the results before and after "lib/crypto: s390/sha3: Add
optimized one-shot SHA-3 digest functions" are:

    Length (bytes)      Before            After
    ==============    ==========        ==========
         1               12 MB/s           12 MB/s
        16              211 MB/s           80 MB/s
        64              835 MB/s          785 MB/s
       127             1557 MB/s          812 MB/s
       128             1617 MB/s         1619 MB/s
       200             1457 MB/s         2319 MB/s
       256             1830 MB/s         2176 MB/s
       511             3035 MB/s         4881 MB/s
       512             3245 MB/s         4968 MB/s
      1024             5319 MB/s         7565 MB/s
      3173             9969 MB/s        11909 MB/s
      4096            11123 MB/s        10378 MB/s
     16384            12767 MB/s        12273 MB/s

Unfortunately that seems inconclusive.  len=200, 256, 511, 512, 1024,
3173 improved.  But len=16, 64, 127, 4096, 16384 regressed.

I expected the most improvement on short lengths.  The fact that some of
the short lengths actually regressed is concerning.

It's also clear that the Keccak acceleration itself matters far more than
this additional one-shot optimization, as expected.  The generic code
maxed out at only 259 MB/s for you.

I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
SHA-3 digest functions" for now, to avoid the extra maintenance cost
and opportunity for bugs.

If you can provide more accurate numbers that show it's worthwhile, we
can reconsider.  Maybe set the CPU to a fixed frequency, and run
sha3_kunit multiple times (triggered via KUnit's debugfs interface)?

- Eric
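
For reference, the before/after comparison in the table above can be
reproduced with a short script.  This is only a sketch: the throughput
numbers are copied from the table, and the 1% cutoff for treating a
length as unchanged is an arbitrary assumption made for this example,
not part of the benchmark.

```python
# Throughput in MB/s copied from the table above: length -> (before, after).
results = {
    1: (12, 12),
    16: (211, 80),
    64: (835, 785),
    127: (1557, 812),
    128: (1617, 1619),
    200: (1457, 2319),
    256: (1830, 2176),
    511: (3035, 4881),
    512: (3245, 4968),
    1024: (5319, 7565),
    3173: (9969, 11909),
    4096: (11123, 10378),
    16384: (12767, 12273),
}


def percent_change(before, after):
    """Relative change of 'after' versus 'before', in percent."""
    return (after - before) / before * 100.0


def classify(results, threshold=1.0):
    """Split lengths into improved / regressed / unchanged lists.

    'threshold' is in percent and is an assumption of this sketch.
    """
    improved, regressed, unchanged = [], [], []
    for length, (before, after) in sorted(results.items()):
        delta = percent_change(before, after)
        if delta > threshold:
            improved.append(length)
        elif delta < -threshold:
            regressed.append(length)
        else:
            unchanged.append(length)
    return improved, regressed, unchanged


improved, regressed, unchanged = classify(results)

for length, (before, after) in sorted(results.items()):
    delta = percent_change(before, after)
    print(f"{length:>6}  {before:>6} -> {after:>6} MB/s  {delta:+7.1f}%")
print("improved: ", improved)
print("regressed:", regressed)
print("unchanged:", unchanged)
```

With a single run per commit, differences of a few percent (such as
len=64 or len=16384) may well be noise, which is why repeated runs are
suggested above.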
Re: [PATCH v2 00/15] SHA-3 library
Posted by Harald Freudenberger 3 months ago
On 2025-11-04 19:27, Eric Biggers wrote:
> On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
>> > Thanks!  Is this with the whole series applied?  Those numbers are
>> > pretty fast, so probably at least the Keccak acceleration part is
>> > worthwhile.  But just to reiterate what I asked for:
>> >
>> >     Also, it would be helpful to provide the benchmark output from just
>> >     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>> >     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>> >     SHA-3 digest functions".
>> >
>> > So I'd like to see how much each change helped, which isn't clear if you
>> > show only the result at the end.
>> >
>> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
>> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
>> > doing the Keccak acceleration, then we should drop it for simplicity.
> [...]
>> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot 
>> SHA-3
>> digest functions:
>> 
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 
>> test_hash_test_vectors
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2
>> test_hash_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3
>> test_hash_incremental_updates
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4
>> test_hash_buffer_overruns
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6
>> test_hash_alignment_consistency
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7
>> test_hash_ctx_zeroization
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8
>> test_hash_interrupt_context_1
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9
>> test_hash_interrupt_context_2
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 
>> test_sha3_224_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 
>> test_sha3_256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 
>> test_sha3_384_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 
>> test_sha3_512_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 
>> test_shake128_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 
>> test_shake256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 
>> test_shake128_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 
>> test_shake256_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18
>> test_shake_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19
>> test_shake_multiple_squeezes
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20
>> test_shake_with_guarded_bufs
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1: 12
>> MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16: 80
>> MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=64: 785
>> MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=127:
>> 812 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=128:
>> 1619 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=200:
>> 2319 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=256:
>> 2176 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=511:
>> 4881 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=512:
>> 4968 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1024:
>> 7565 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=3173:
>> 11909 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=4096:
>> 10378 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16384:
>> 12273 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 
>> skip:0
>> total:21
>> 
>> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak 
>> functions:
>> 
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 
>> test_hash_test_vectors
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2
>> test_hash_all_lens_up_to_4096
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3
>> test_hash_incremental_updates
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4
>> test_hash_buffer_overruns
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6
>> test_hash_alignment_consistency
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7
>> test_hash_ctx_zeroization
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8
>> test_hash_interrupt_context_1
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9
>> test_hash_interrupt_context_2
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 
>> test_sha3_224_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 
>> test_sha3_256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 
>> test_sha3_384_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 
>> test_sha3_512_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 
>> test_shake128_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 
>> test_shake256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 
>> test_shake128_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 
>> test_shake256_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18
>> test_shake_all_lens_up_to_4096
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19
>> test_shake_multiple_squeezes
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20
>> test_shake_with_guarded_bufs
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1: 12
>> MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16: 211
>> MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=64: 835
>> MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=127:
>> 1557 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=128:
>> 1617 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=200:
>> 1457 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=256:
>> 1830 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=511:
>> 3035 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=512:
>> 3245 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1024:
>> 5319 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=3173:
>> 9969 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=4096:
>> 11123 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16384:
>> 12767 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 
>> skip:0
>> total:21
> 
> Thanks.  So the results before and after "lib/crypto: s390/sha3: Add
> optimized one-shot SHA-3 digest functions" are:
> 
>     Length (bytes)      Before            After
>     ==============    ==========        ==========
>          1               12 MB/s           12 MB/s
>         16              211 MB/s           80 MB/s
>         64              835 MB/s          785 MB/s
>        127             1557 MB/s          812 MB/s
>        128             1617 MB/s         1619 MB/s
>        200             1457 MB/s         2319 MB/s
>        256             1830 MB/s         2176 MB/s
>        511             3035 MB/s         4881 MB/s
>        512             3245 MB/s         4968 MB/s
>       1024             5319 MB/s         7565 MB/s
>       3173             9969 MB/s        11909 MB/s
>       4096            11123 MB/s        10378 MB/s
>      16384            12767 MB/s        12273 MB/s
> 
> Unfortunately that seems inconclusive.  len=200, 256, 511, 512, 1024,
> 3173 improved.  But len=16, 64, 127, 4096, 16384 regressed.
> 
> I expected the most improvement on short lengths.  The fact that some 
> of
> the short lengths actually regressed is concerning.
> 
> It's also clear that the Keccak acceleration itself matters far more than
> this additional one-shot optimization, as expected.  The generic code
> maxed out at only 259 MB/s for you.
> 
> I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
> SHA-3 digest functions" for now, to avoid the extra maintenance cost
> and opportunity for bugs.
> 
> If you can provide more accurate numbers that show it's worthwhile, we
> can reconsider.  Maybe set the CPU to a fixed frequency, and run
> sha3_kunit multiple times (triggered via KUnit's debugfs interface)?
> 
> - Eric

The focus should be on the small data. Let me see what I can do ...
I used a z/VM guest for this. Using an LPAR instead, with some CPU
pinning, may be an option. I would also run more tests so that a
Gaussian distribution can be calculated. However, not within the next
few days.
So I agree with you: let's hold back the one-shot optimization.

Harald Freudenberger
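
A rough sketch of how repeated benchmark runs could be summarized is
shown here.  It assumes each run's kernel log has been saved as text and
only relies on the "benchmark_hash: len=N: X MB/s" line format seen in
this thread.  The function names are made up for the example.  The two
sample excerpts reuse values from logs posted in this thread (Oct 30 and
Nov 04 10:50:50) purely to exercise the parser; it was not confirmed
that they come from the same kernel build, so the resulting numbers are
not a meaningful measurement.

```python
import re
import statistics
from collections import defaultdict

# Matches "benchmark_hash: len=127: 236 MB/s".  \s* also matches newlines,
# so lines that a mail client wrapped after "benchmark_hash:" or before
# "MB/s" still match, as long as no quote markers are in between.
BENCH_RE = re.compile(r"benchmark_hash:\s*len=(\d+):\s*(\d+)\s*MB/s")


def parse_run(log_text):
    """Return {length: MB/s} for one saved benchmark log."""
    return {int(length): int(mbps) for length, mbps in BENCH_RE.findall(log_text)}


def summarize(runs):
    """Return {length: (mean, sample standard deviation)} across runs."""
    samples = defaultdict(list)
    for run in runs:
        for length, mbps in parse_run(run).items():
            samples[length].append(mbps)
    summary = {}
    for length, values in sorted(samples.items()):
        # stdev needs at least two samples; report 0.0 for a single run.
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[length] = (statistics.mean(values), spread)
    return summary


# Sample input only; see the caveat in the text above.
run_a = """
kernel:     # benchmark_hash: len=16: 109 MB/s
kernel:     # benchmark_hash: len=64: 911 MB/s
"""
run_b = """
kernel:     # benchmark_hash:
len=16: 80 MB/s
kernel:     # benchmark_hash: len=64: 785 MB/s
"""

stats = summarize([run_a, run_b])
for length, (mean, spread) in stats.items():
    print(f"len={length}: mean {mean:.1f} MB/s, stdev {spread:.1f} MB/s")
```

With enough runs per commit, comparing the means against the spread
should give a better idea of whether a difference at a given length is
real.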
Re: [PATCH v2 00/15] SHA-3 library
Posted by Harald Freudenberger 3 months, 1 week ago
On 2025-10-30 18:14, Eric Biggers wrote:
> On Thu, Oct 30, 2025 at 11:10:22AM +0100, Harald Freudenberger wrote:
>> On 2025-10-29 17:32, Eric Biggers wrote:
>> > On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
>> > > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
>> > > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
>> > > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
>> > > > it would be helpful to provide the benchmark output from just before
>> > > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
>> > > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > > > functions".  Then we can verify that each change is useful.
>> > [...]
>> > >
>> > > Picked this series from your ebiggers repo branch sha3-lib-v2.
>> > > The build on s390 runs without any complaints and no warnings.
>> > > As recommended I enabled the KUNIT option and also
>> > > CRYPTO_SELFTESTS_FULL.
>> > > With a "modprobe tcrypt" I forced the selftests to run,
>> > > and in parallel I checked that the s390 specific CPACF instructions
>> > > are really used (can be done with the pai command and check for
>> > > the KIMD_SHA3_* counters). Also ran some AF-alg tests to verify
>> > > all the sha3 hashes and check for thread safety.
>> > > All this ran without any findings. However there are NO performance
>> > > related tests involved.
>> >
>> > Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
>> > verify that all its test cases passed?  That's the most important one.
>> > It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
>> > enabled, and I was hoping to see your results from that after each
>> > change.  The results get printed to the kernel log when the test runs.
>> >
>> 
>> Here it is - as this is a zVM system the benchmark values may show 
>> poor
>> performance.
>> 
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: KTAP version 1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: 1..1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     KTAP version 1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # Subtest: sha3
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     1..21
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 1 
>> test_hash_test_vectors
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 2
>> test_hash_all_lens_up_to_4096
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 3
>> test_hash_incremental_updates
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 4
>> test_hash_buffer_overruns
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 6
>> test_hash_alignment_consistency
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 7
>> test_hash_ctx_zeroization
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 8
>> test_hash_interrupt_context_1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 9
>> test_hash_interrupt_context_2
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 10 
>> test_sha3_224_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 11 
>> test_sha3_256_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 12 
>> test_sha3_384_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 13 
>> test_sha3_512_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 14 
>> test_shake128_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 15 
>> test_shake256_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 16 
>> test_shake128_nist
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 17 
>> test_shake256_nist
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 18
>> test_shake_all_lens_up_to_4096
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 19
>> test_shake_multiple_squeezes
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 20
>> test_shake_with_guarded_bufs
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1: 14
>> MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16: 109
>> MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=64: 911
>> MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=127:
>> 1849 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=128:
>> 1872 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=200:
>> 2647 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=256:
>> 3338 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=511:
>> 5484 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=512:
>> 5562 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1024:
>> 8297 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=3173:
>> 12625 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=4096:
>> 11242 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16384:
>> 12853 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 
>> skip:0
>> total:21
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # Totals: pass:21 fail:0 
>> skip:0
>> total:21
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: ok 1 sha3
> 
> Thanks!  Is this with the whole series applied?  Those numbers are
> pretty fast, so probably at least the Keccak acceleration part is
> worthwhile.  But just to reiterate what I asked for:
> 
>     Also, it would be helpful to provide the benchmark output from just
>     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>     SHA-3 digest functions".
> 
> So I'd like to see how much each change helped, which isn't clear if 
> you
> show only the result at the end.

Yea, let's see ... Monday maybe ...

> 
> If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> one-shot SHA-3 digest functions" actually helps significantly vs. 
> simply
> doing the Keccak acceleration, then we should drop it for simplicity.
> 
>> > > What's a little bit tricky here is that the sha3 lib is statically
>> > > built into the kernel. So no chance to unload/load this as a module.
>> > > For sha1 and the sha2 stuff I can understand the need to have this
>> > > statically enabled in the kernel. Sha3 is only supposed to be
>> > > available
>> > > as backup in case of sha2 deficiencies. So I can't see why this is
>> > > really statically needed.
>> >
>> > CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
>> > or a loadable module, depending on what other kconfig options select it.
>> > Same as all the other crypto library modules.
>> 
>> I know and see this. However, I am unable to switch this to 'm'. It 
>> seems
>> like the root cause is that CRYPTO_SHA3='y' and I can't change this to 
>> 'm'.
>> And honestly I am unable to read these dependencies (forgive my 
>> ignorance):
>> 
>> CONFIG_CRYPTO_SHA3:
>> SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
>>  Symbol: CRYPTO_SHA3 [=y]
>>   Type  : tristate
>>   Defined at crypto/Kconfig:1006
>>     Prompt: SHA-3
>>     Depends on: CRYPTO [=y]
>>     Location:
>>       -> Cryptographic API (CRYPTO [=y])
>>         -> Hashes, digests, and MACs
>>           -> SHA-3 (CRYPTO_SHA3 [=y])
>>   Selects: CRYPTO_HASH [=y] && CRYPTO_LIB_SHA3 [=y]
>>   Selected by [y]:
>>     - CRYPTO_JITTERENTROPY [=y] && CRYPTO [=y]
> 
> Well, all that is saying is that there is a built-in option that 
> selects
> SHA-3, which causes it to be built-in.  So SHA-3 being built-in is
> working as intended in that case.  (And it's also intended that we no
> longer allow the architecture-optimized code to be built as a module
> when the generic code is built-in.  That was always a huge footgun.)  
> If
> you want to know why something that needs SHA-3 is being built-in, 
> you'd
> need to follow the chain of dependencies up to see how it gets 
> selected.
> 
> - Eric

Thanks
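
To illustrate the point about following the chain of "select"
statements, here is a deliberately simplified model of how a built-in
selector forces the symbols it selects to be built-in too.  It only
models plain "select" with tristate values and ignores "depends on",
conditional selects, and everything else the real Kconfig does.  The
symbol names and select relations are the ones from the menuconfig
output above; the function names are invented for this sketch.

```python
# Tristate ordering used by Kconfig: n < m < y.
ORDER = {"n": 0, "m": 1, "y": 2}


def resolve(requested, selects):
    """Propagate 'select' lower bounds until nothing changes.

    requested: symbol -> value the user asked for ("n", "m" or "y")
    selects:   symbol -> list of symbols it selects
    A selected symbol ends up with at least its selector's value.
    """
    value = dict(requested)
    changed = True
    while changed:
        changed = False
        for selector, targets in selects.items():
            selector_value = value.get(selector, "n")
            for target in targets:
                if ORDER[selector_value] > ORDER[value.get(target, "n")]:
                    value[target] = selector_value
                    changed = True
    return value


def builtin_selectors(symbol, value, selects):
    """List the built-in symbols that directly select 'symbol'."""
    return [s for s, targets in selects.items()
            if symbol in targets and value.get(s, "n") == "y"]


selects = {
    "CRYPTO_JITTERENTROPY": ["CRYPTO_SHA3"],
    "CRYPTO_SHA3": ["CRYPTO_HASH", "CRYPTO_LIB_SHA3"],
}

# The user would like SHA-3 as a module, but jitterentropy is built-in.
requested = {
    "CRYPTO_JITTERENTROPY": "y",
    "CRYPTO_SHA3": "m",
    "CRYPTO_LIB_SHA3": "m",
}

resolved = resolve(requested, selects)
print(resolved["CRYPTO_SHA3"], resolved["CRYPTO_LIB_SHA3"])
print(builtin_selectors("CRYPTO_SHA3", resolved, selects))
```

In this model, CRYPTO_SHA3 and CRYPTO_LIB_SHA3 can only become 'm' once
nothing built-in selects them, which matches the "Selected by [y]"
entry in the output above.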
Re: [PATCH v2 00/15] SHA-3 library
Posted by Eric Biggers 3 months, 1 week ago
On Wed, Oct 29, 2025 at 09:32:16AM -0700, Eric Biggers wrote:
> On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
> > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
> > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> > > it would be helpful to provide the benchmark output from just before
> > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> > > functions".  Then we can verify that each change is useful.
> [...]
> > 
> > Picked this series from your ebiggers repo branch sha3-lib-v2.
> > The build on s390 runs without any complaints and no warnings.
> > As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
> > With a "modprobe tcrypt" I forced the selftests to run,
> > and in parallel I checked that the s390 specific CPACF instructions
> > are really used (can be done with the pai command and check for
> > the KIMD_SHA3_* counters). Also ran some AF-alg tests to verify
> > all the sha3 hashes and check for thread safety.
> > All this ran without any findings. However there are NO performance
> > related tests involved.
> 
> Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
> verify that all its test cases passed?  That's the most important one.
> It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
> enabled, and I was hoping to see your results from that after each
> change.  The results get printed to the kernel log when the test runs.
> 

Also, can you confirm that you ran the test on a CPU that has
"facility 86", so that the one-shot digest functions get exercised?

(By the way, I recommend defining named constants somewhere in
arch/s390/ for the different facilities.  I borrowed the
"test_facility(86)" from the existing code, which does not say what 86
means.  After doing some research, it looks like it means MSA12.)

- Eric
Re: [PATCH v2 00/15] SHA-3 library
Posted by Harald Freudenberger 3 months, 1 week ago
On 2025-10-29 21:33, Eric Biggers wrote:
> On Wed, Oct 29, 2025 at 09:32:16AM -0700, Eric Biggers wrote:
>> On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
>> > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
>> > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
>> > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
>> > > it would be helpful to provide the benchmark output from just before
>> > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
>> > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > > functions".  Then we can verify that each change is useful.
>> [...]
>> >
>> > Picked this series from your ebiggers repo branch sha3-lib-v2.
>> > The build on s390 runs without any complaints and no warnings.
>> > As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
>> > With a "modprobe tcrypt" I forced the selftests to run,
>> > and in parallel I checked that the s390 specific CPACF instructions
>> > are really used (can be done with the pai command and check for
>> > the KIMD_SHA3_* counters). Also ran some AF-alg tests to verify
>> > all the sha3 hashes and check for thread safety.
>> > All this ran without any findings. However there are NO performance
>> > related tests involved.
>> 
>> Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
>> verify that all its test cases passed?  That's the most important one.
>> It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
>> enabled, and I was hoping to see your results from that after each
>> change.  The results get printed to the kernel log when the test runs.
>> 
> 
> Also, can you confirm that you ran the test on a CPU that has
> "facility 86", so that the one-shot digest functions get exercised?
> 
> (By the way, I recommend defining named constants somewhere in
> arch/s390/ for the different facilities.  I borrowed the
> "test_facility(86)" from the existing code, which does not say what 86
> means.  After doing some research, it looks like it means MSA12.)
> 

Of course, the machine I used has MSA level 12 (stfle bit 86).

> - Eric
Re: [PATCH v2 00/15] SHA-3 library
Posted by Heiko Carstens 3 months, 1 week ago
On Wed, Oct 29, 2025 at 08:33:45PM +0000, Eric Biggers wrote:
> (By the way, I recommend defining named constants somewhere in
> arch/s390/ for the different facilities.  I borrowed the
> "test_facility(86)" from the existing code, which does not say what 86
> means.  After doing some research, it looks like it means MSA12.)

Not so surprisingly this has been discussed several times in the
past. It would have been easy if each of those bits would represent
exactly one facility, but then there is e.g. bit 46 which means:

the distinct-operands, fast-BCR-serialization, high-word, and
population-count facilities, the interlocked-access facility 1, and
the load/store-on-condition facility 1 are installed in the
z/Architecture architectural mode

Some proposed to add defines like "FACILITY_MULTI_46", which is of
course pointless, since there is no added benefit over just using plain
46. Alternatively it would be possible to have a define for each of
them. But if you need two or three of them for your code, then there
would be several tests needed - all for the same bit, and each one
generating a static branch - which also doesn't make too much sense.

So in the end we decided to stick with the plain numbers.

That said, users are still free to add aliases like e.g. cpu_has_vx(),
see arch/s390/include/asm/cpufeature.h. It is just not an all or
nothing approach.
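
The alias approach can be sketched as follows. This is a standalone
userspace illustration, not the kernel's test_facility() implementation.
It assumes the facility list is a byte array whose bits are numbered from
the most significant bit of the first byte (the STFLE convention). The
names FACILITY_BIT_MSA12, facility_bit_set() and cpu_has_msa12() are
hypothetical, and the meaning of bit 86 is taken from this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; per this thread, stfle bit 86 indicates MSA level 12. */
#define FACILITY_BIT_MSA12 86

/*
 * Test bit 'nr' of an STFLE-style facility list of 'len' bytes.
 * Bit 0 is the most significant bit (0x80) of byte 0.
 * Bits beyond the end of the list are reported as not installed.
 */
static inline bool facility_bit_set(const uint8_t *list, size_t len,
				    unsigned int nr)
{
	if (nr / 8 >= len)
		return false;
	return (list[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/*
 * Feature-specific alias in the spirit of cpu_has_vx(): callers ask for
 * the feature they need and never see the bare facility number.
 */
static inline bool cpu_has_msa12(const uint8_t *list, size_t len)
{
	return facility_bit_set(list, len, FACILITY_BIT_MSA12);
}
```

An alias like this documents one use of a bit without trying to name every
facility the bit covers, which is why it still works for multi-facility
bits such as 46.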
Re: [PATCH v2 00/15] SHA-3 library
Posted by Eric Biggers 3 months ago
On Sat, Oct 25, 2025 at 10:50:17PM -0700, Eric Biggers wrote:
> This series is targeting libcrypto-next.  It can also be retrieved from:
> 
>     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2
> 
> This series adds SHA-3 support to lib/crypto/.  This includes support
> for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> and also support for the extendable-output functions SHAKE128 and
> SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
> 
> The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> into lib/crypto/.  (The existing s390 code couldn't really be reused, so
> really I rewrote it from scratch.)  This makes the SHA-3 library
> functions be accelerated on these architectures.
> 
> Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> algorithms are reimplemented on top of the library API.

I've applied this series to
https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next,
excluding the following 2 patches which are waiting on benchmark results
from the s390 folks:

    lib/crypto: sha3: Support arch overrides of one-shot digest functions
    lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions

I'd be glad to apply those too if they're shown to be worthwhile.

Note: I also reordered the commits in libcrypto-next to put the new
KUnit test suites (blake2b and sha3) last, and to put the AES-GCM
improvements on a separate branch that's merged in.  This will allow
making separate pull requests for the tests and the AES-GCM
improvements, which I think aligns with what Linus had requested before
(https://lore.kernel.org/linux-crypto/CAHk-=wi5d4K+sF2L=tuRW6AopVxO1DDXzstMQaECmU2QHN13KA@mail.gmail.com/).

- Eric
Re: [PATCH v2 00/15] SHA-3 library
Posted by Harald Freudenberger 3 months ago
On 2025-11-03 18:34, Eric Biggers wrote:
> On Sat, Oct 25, 2025 at 10:50:17PM -0700, Eric Biggers wrote:
>> This series is targeting libcrypto-next.  It can also be retrieved from:
>> 
>>     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2
>> 
>> This series adds SHA-3 support to lib/crypto/.  This includes support
>> for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
>> and also support for the extendable-output functions SHAKE128 and
>> SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
>> 
>> The architecture-optimized SHA-3 code for arm64 and s390 is migrated
>> into lib/crypto/.  (The existing s390 code couldn't really be reused, so
>> really I rewrote it from scratch.)  This makes the SHA-3 library
>> functions be accelerated on these architectures.
>> 
>> Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
>> algorithms are reimplemented on top of the library API.
> 
> I've applied this series to
> https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next,
> excluding the following 2 patches which are waiting on benchmark results
> from the s390 folks:
> 
>     lib/crypto: sha3: Support arch overrides of one-shot digest functions
>     lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
> 
> I'd be glad to apply those too if they're shown to be worthwhile.
> 
> Note: I also reordered the commits in libcrypto-next to put the new
> KUnit test suites (blake2b and sha3) last, and to put the AES-GCM
> improvements on a separate branch that's merged in.  This will allow
> making separate pull requests for the tests and the AES-GCM
> improvements, which I think aligns with what Linus had requested before
> (https://lore.kernel.org/linux-crypto/CAHk-=wi5d4K+sF2L=tuRW6AopVxO1DDXzstMQaECmU2QHN13KA@mail.gmail.com/).
> 
> - Eric

Here are some measurements from an LPAR, with 500 runs each: once with
the full sha3-lib-v2 branch ("with") and once with only the patch
b2e169dd8ca5 "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
functions" reverted ("without"). With the help of gnuplot I generated
distribution charts of the results for the len=16, 64, 256, 1024 and 4096
benchmarks. See the attached pictures. Sorry, but I see no other way to
provide this data than as an attachment.

Clearly the patch brings a boost - especially for the 256 byte case.

Harald Freudenberger
Re: [PATCH v2 00/15] SHA-3 library
Posted by Eric Biggers 3 months ago
On Wed, Nov 05, 2025 at 04:39:01PM +0100, Harald Freudenberger wrote:
> On 2025-11-03 18:34, Eric Biggers wrote:
> > On Sat, Oct 25, 2025 at 10:50:17PM -0700, Eric Biggers wrote:
> > > This series is targeting libcrypto-next.  It can also be retrieved
> > > from:
> > > 
> > >     git fetch
> > > https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git
> > > sha3-lib-v2
> > > 
> > > This series adds SHA-3 support to lib/crypto/.  This includes support
> > > for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> > > and also support for the extendable-output functions SHAKE128 and
> > > SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
> > > 
> > > The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> > > into lib/crypto/.  (The existing s390 code couldn't really be
> > > reused, so
> > > really I rewrote it from scratch.)  This makes the SHA-3 library
> > > functions be accelerated on these architectures.
> > > 
> > > Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> > > algorithms are reimplemented on top of the library API.
> > 
> > I've applied this series to
> > https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next,
> > excluding the following 2 patches which are waiting on benchmark results
> > from the s390 folks:
> > 
> >     lib/crypto: sha3: Support arch overrides of one-shot digest
> > functions
> >     lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
> > 
> > I'd be glad to apply those too if they're shown to be worthwhile.
> > 
> > Note: I also reordered the commits in libcrypto-next to put the new
> > KUnit test suites (blake2b and sha3) last, and to put the AES-GCM
> > improvements on a separate branch that's merged in.  This will allow
> > making separate pull requests for the tests and the AES-GCM
> > improvements, which I think aligns with what Linus had requested before
> > (https://lore.kernel.org/linux-crypto/CAHk-=wi5d4K+sF2L=tuRW6AopVxO1DDXzstMQaECmU2QHN13KA@mail.gmail.com/).
> > 
> > - Eric
> 
> Here are now some measurements on a LPAR with 500 runs once with
> sha3-lib-v2 branch full ("with") and once with reverting only the
> b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> functions
> patch ("without"). With the help of gnuplot I generated distribution
> charts over the results of the len=16, 64, 256, 1024 and 4096 benchmark.
> See attached pictures - Sorry but I see no other way to provide this data
> than using an attachment.
> 
> Clearly the patch brings a boost - especially for the 256 byte case.
> 
> Harald Freudenberger

Thanks.  I applied "lib/crypto: sha3: Support arch overrides of one-shot
digest functions" and "lib/crypto: s390/sha3: Add optimized one-shot
SHA-3 digest functions" to libcrypto-next.  For the latter, I improved
the commit message to mention your benchmark results:

commit 862445d3b9e74f58360a7a89787da4dca783e6dd
Author: Eric Biggers <ebiggers@kernel.org>
Date:   Sat Oct 25 22:50:29 2025 -0700

    lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
    
    Some z/Architecture processors can compute a SHA-3 digest in a single
    instruction.  arch/s390/crypto/ already uses this capability to optimize
    the SHA-3 crypto_shash algorithms.
    
    Use this capability to implement the sha3_224(), sha3_256(), sha3_384(),
    and sha3_512() library functions too.
    
    SHA3-256 benchmark results provided by Harald Freudenberger
    (https://lore.kernel.org/r/4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com/)
    on a z/Architecture machine with "facility 86" (MSA level 12):
    
        Length (bytes)    Before (MB/s)   After (MB/s)
        ==============    =============   ============
              16                212             225
              64                820             915
             256               1850            3350
            1024               5400            8300
            4096              11200           11300
    
    Note: the original data from Harald was given in the form of a graph for
    each length, showing the distribution of throughputs from 500 runs.  I
    guesstimated the peak of each one.
    
    Harald also reported that the generic SHA-3 code was at most 259 MB/s
    (https://lore.kernel.org/r/c39f6b6c110def0095e5da5becc12085@linux.ibm.com/).
    So as expected, the earlier commit that optimized sha3_absorb_blocks()
    and sha3_keccakf() is the more important one; it optimized the Keccak
    permutation which is the most performance-critical part of SHA-3.
    Still, this additional commit does notably improve performance further
    on some lengths.
    
    Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
    Tested-by: Harald Freudenberger <freude@linux.ibm.com>
    Link: https://lore.kernel.org/r/20251026055032.1413733-13-ebiggers@kernel.org
    Signed-off-by: Eric Biggers <ebiggers@kernel.org>
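
To make the size of the improvement at each length easier to read off, the
relative gains can be computed from the table above. This is a minimal
sketch: the throughput values are copied from the commit message, where
they are described as guesstimated peaks, so the percentages are only
approximate. The struct and function names are hypothetical.

```c
#include <stddef.h>

struct sha3_bench_row {
	int len;	/* message length in bytes */
	int before;	/* MB/s without the one-shot patch */
	int after;	/* MB/s with the one-shot patch */
};

/* SHA3-256 figures from the commit message above (estimated peaks). */
static const struct sha3_bench_row sha3_256_bench[] = {
	{   16,   212,   225 },
	{   64,   820,   915 },
	{  256,  1850,  3350 },
	{ 1024,  5400,  8300 },
	{ 4096, 11200, 11300 },
};

#define SHA3_BENCH_ROWS (sizeof(sha3_256_bench) / sizeof(sha3_256_bench[0]))

/* Relative throughput gain in whole percent, truncated toward zero. */
static int gain_percent(const struct sha3_bench_row *row)
{
	return (row->after - row->before) * 100 / row->before;
}

/* Index of the row with the largest relative gain. */
static size_t best_gain_index(void)
{
	size_t best = 0;

	for (size_t i = 1; i < SHA3_BENCH_ROWS; i++) {
		if (gain_percent(&sha3_256_bench[i]) >
		    gain_percent(&sha3_256_bench[best]))
			best = i;
	}
	return best;
}
```

With these inputs the gains come out at roughly 6%, 11%, 81%, 53% and 0%
for lengths 16, 64, 256, 1024 and 4096, which matches the observation that
the 256 byte case benefits the most.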
Re: [PATCH v2 00/15] SHA-3 library
Posted by Eric Biggers 3 months ago
On Wed, Nov 05, 2025 at 08:33:40PM -0800, Eric Biggers wrote:
> On Wed, Nov 05, 2025 at 04:39:01PM +0100, Harald Freudenberger wrote:
> > On 2025-11-03 18:34, Eric Biggers wrote:
> > > On Sat, Oct 25, 2025 at 10:50:17PM -0700, Eric Biggers wrote:
> > > > This series is targeting libcrypto-next.  It can also be retrieved
> > > > from:
> > > > 
> > > >     git fetch
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git
> > > > sha3-lib-v2
> > > > 
> > > > This series adds SHA-3 support to lib/crypto/.  This includes support
> > > > for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> > > > and also support for the extendable-output functions SHAKE128 and
> > > > SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
> > > > 
> > > > The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> > > > into lib/crypto/.  (The existing s390 code couldn't really be
> > > > reused, so
> > > > really I rewrote it from scratch.)  This makes the SHA-3 library
> > > > functions be accelerated on these architectures.
> > > > 
> > > > Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> > > > algorithms are reimplemented on top of the library API.
> > > 
> > > I've applied this series to
> > > https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next,
> > > excluding the following 2 patches which are waiting on benchmark results
> > > from the s390 folks:
> > > 
> > >     lib/crypto: sha3: Support arch overrides of one-shot digest
> > > functions
> > >     lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
> > > 
> > > I'd be glad to apply those too if they're shown to be worthwhile.
> > > 
> > > Note: I also reordered the commits in libcrypto-next to put the new
> > > KUnit test suites (blake2b and sha3) last, and to put the AES-GCM
> > > improvements on a separate branch that's merged in.  This will allow
> > > making separate pull requests for the tests and the AES-GCM
> > > improvements, which I think aligns with what Linus had requested before
> > > (https://lore.kernel.org/linux-crypto/CAHk-=wi5d4K+sF2L=tuRW6AopVxO1DDXzstMQaECmU2QHN13KA@mail.gmail.com/).
> > > 
> > > - Eric
> > 
> > Here are now some measurements on a LPAR with 500 runs once with
> > sha3-lib-v2 branch full ("with") and once with reverting only the
> > b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> > functions
> > patch ("without"). With the help of gnuplot I generated distribution
> > charts over the results of the len=16, 64, 256, 1024 and 4096 benchmark.
> > See attached pictures - Sorry but I see no other way to provide this data
> > than using an attachment.
> > 
> > Clearly the patch brings a boost - especially for the 256 byte case.
> > 
> > Harald Freudenberger
> 
> Thanks.  I applied "lib/crypto: sha3: Support arch overrides of one-shot
> digest functions" and "lib/crypto: s390/sha3: Add optimized one-shot
> SHA-3 digest functions" to libcrypto-next.  For the latter, I improved
> the commit message to mention your benchmark results:

Also, I'm wondering what your plan to add support for these instructions
to QEMU is?  The status quo, where only people with an s390 mainframe
can test this code, isn't sustainable.

I already have s390 in my testing matrix; I run the crypto and CRC tests
on all architectures with optimized crypto or CRC code.  But most of the
s390 optimized crypto code isn't actually being executed.

- Eric
Re: [PATCH v2 00/15] SHA-3 library
Posted by Harald Freudenberger 3 months ago
On 2025-11-06 08:22, Eric Biggers wrote:
> On Wed, Nov 05, 2025 at 08:33:40PM -0800, Eric Biggers wrote:
>> On Wed, Nov 05, 2025 at 04:39:01PM +0100, Harald Freudenberger wrote:
>> > On 2025-11-03 18:34, Eric Biggers wrote:
>> > > On Sat, Oct 25, 2025 at 10:50:17PM -0700, Eric Biggers wrote:
>> > > > This series is targeting libcrypto-next.  It can also be retrieved
>> > > > from:
>> > > >
>> > > >     git fetch
>> > > > https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git
>> > > > sha3-lib-v2
>> > > >
>> > > > This series adds SHA-3 support to lib/crypto/.  This includes support
>> > > > for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
>> > > > and also support for the extendable-output functions SHAKE128 and
>> > > > SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
>> > > >
>> > > > The architecture-optimized SHA-3 code for arm64 and s390 is migrated
>> > > > into lib/crypto/.  (The existing s390 code couldn't really be
>> > > > reused, so
>> > > > really I rewrote it from scratch.)  This makes the SHA-3 library
>> > > > functions be accelerated on these architectures.
>> > > >
>> > > > Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
>> > > > algorithms are reimplemented on top of the library API.
>> > >
>> > > I've applied this series to
>> > > https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next,
>> > > excluding the following 2 patches which are waiting on benchmark results
>> > > from the s390 folks:
>> > >
>> > >     lib/crypto: sha3: Support arch overrides of one-shot digest
>> > > functions
>> > >     lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
>> > >
>> > > I'd be glad to apply those too if they're shown to be worthwhile.
>> > >
>> > > Note: I also reordered the commits in libcrypto-next to put the new
>> > > KUnit test suites (blake2b and sha3) last, and to put the AES-GCM
>> > > improvements on a separate branch that's merged in.  This will allow
>> > > making separate pull requests for the tests and the AES-GCM
>> > > improvements, which I think aligns with what Linus had requested before
>> > > (https://lore.kernel.org/linux-crypto/CAHk-=wi5d4K+sF2L=tuRW6AopVxO1DDXzstMQaECmU2QHN13KA@mail.gmail.com/).
>> > >
>> > > - Eric
>> >
>> > Here are now some measurements on a LPAR with 500 runs once with
>> > sha3-lib-v2 branch full ("with") and once with reverting only the
>> > b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > functions
>> > patch ("without"). With the help of gnuplot I generated distribution
>> > charts over the results of the len=16, 64, 256, 1024 and 4096 benchmark.
>> > See attached pictures - Sorry but I see no other way to provide this data
>> > than using an attachment.
>> >
>> > Clearly the patch brings a boost - especially for the 256 byte case.
>> >
>> > Harald Freudenberger
>> 
>> Thanks.  I applied "lib/crypto: sha3: Support arch overrides of one-shot
>> digest functions" and "lib/crypto: s390/sha3: Add optimized one-shot
>> SHA-3 digest functions" to libcrypto-next.  For the latter, I improved
>> the commit message to mention your benchmark results:
> 
> Also, I'm wondering what your plan to add support for these instructions
> to QEMU is?  The status quo, where only people with an s390 mainframe
> can test this code, isn't sustainable.
> 
> I already have s390 in my testing matrix; I run the crypto and CRC tests
> on all architectures with optimized crypto or CRC code.  But most of the
> s390 optimized crypto code isn't actually being executed.
> 
> - Eric

Well, there are no plans. However, there was a decision some while ago
that "we" may support this in the future. But as there are currently no
human resources available to work on it, I suspect QEMU CPACF support
in general will not come soon. Please note also that this would really
be an implementation of crypto algorithms, and as such it needs to
comply with some regulations with regard to the EAR of the US Bureau of
Industry and Security...

Harald Freudenberger
Re: [PATCH v2 00/15] SHA-3 library
Posted by Eric Biggers 3 months ago
On Thu, Nov 06, 2025 at 09:54:59AM +0100, Harald Freudenberger wrote:
> > Also, I'm wondering what your plan to add support for these instructions
> > to QEMU is?  The status quo, where only people with an s390 mainframe
> > can test this code, isn't sustainable.
> > 
> > I already have s390 in my testing matrix; I run the crypto and CRC tests
> > on all architectures with optimized crypto or CRC code.  But most of the
> > s390 optimized crypto code isn't actually being executed.
> > 
> > - Eric
> 
> Well, there are no plans. However, there has been a decision some while ago
> that "we" may support this in the future. But as there are currently no
> human resources available and working there I suspect a qemu CPACF support
> in general will not come soon.

Great to hear that you might have someone work on this in the future.
It should be noted that this is a significant gap that puts s390 behind
all the major architectures (x86_64, arm64, arm32, riscv, etc.) and
makes it much more likely that s390-specific bugs will be introduced.

We need to have higher standards for cryptography code.

As I've mentioned before, I don't plan to accept code that uses new
instructions without QEMU support.  The SHA-{1,2,3} code is allowed only
because the instructions were already being used by arch/s390/crypto/.

I see that Jason actually added support for CPACF_*_SHA_512 to QEMU a
few years ago
(https://github.com/qemu/qemu/commit/9f17bfdab422887807cbd5260ed6b0b6e54ddb33).
So clearly it's possible to support these instructions in QEMU.
Someone just needs to add support for the other SHA algorithms.

> Please note also that this is really an implementation of crypto
> algorithms then and as such it needs to apply to some regulations with
> regards of the EAR of the US Bureau of Industry and Security...

Like Linux, QEMU is an open source project, published publicly, and
which already contains many cryptographic algorithms.  Check out
https://www.linuxfoundation.org/resources/publications/understanding-us-export-controls-with-open-source-projects

- Eric