crypto, lib/crypto: Add SHAKE128/256 support and move SHA3 to lib/crypto

[PATCH v3 0/8] crypto, lib/crypto: Add SHAKE128/256 support and move SHA3 to lib/crypto

Posted by David Howells 5 days, 7 hours ago

Hi Eric, Herbert,

Here's a set of patches that does the following:

 (1) Renames s390 and arm64 sha3_* functions to avoid name collisions.

 (2) Copies the core of SHA3 support from crypto/ to lib/crypto/.

 (3) Simplifies the internal code to maintain the buffer in little endian
     form, so the update and extraction code no longer needs to handle
     endianness.  Instead, the state buffer is byteswapped before and
     after the permutation.

 (4) Moves the Iota transform into the function with the rest of the
     transforms.

 (5) Adds SHAKE128 and SHAKE256 support (needed for ML-DSA).

 (6) Adds a kunit test for SHA3 in lib/crypto/tests/.

 (7) Adds proper API documentation for SHA3.

 (8) Makes crypto/sha3_generic.c use lib/crypto/sha3.  This necessitates a
     slight enlargement of the context buffers which might affect optimised
     assembly/hardware drivers.
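
The little-endian buffer handling in item (3) can be illustrated with a
small userspace sketch.  This is not the code from the patch: every
demo_* name is hypothetical, and demo_stub_permute() is only a placeholder
for the real Keccak permutation.  The point is that absorbing becomes a
plain byte-wise XOR, and conversion to native 64-bit lanes happens only
around the permutation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LANES 25

/* Hypothetical state: 25 lanes kept as little-endian bytes. */
struct demo_state {
	uint8_t bytes[DEMO_LANES * 8];
};

static void demo_state_init(struct demo_state *st)
{
	memset(st, 0, sizeof(*st));
}

/* Load one 64-bit lane from little-endian bytes on any host. */
static uint64_t demo_load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Store one 64-bit lane as little-endian bytes on any host. */
static void demo_store_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		p[i] = (uint8_t)v;
		v >>= 8;
	}
}

/* Absorbing needs no endianness handling: it is a byte-wise XOR. */
static void demo_absorb_bytes(struct demo_state *st, size_t offset,
			      const uint8_t *data, size_t len)
{
	assert(offset <= sizeof(st->bytes) && len <= sizeof(st->bytes) - offset);
	for (size_t i = 0; i < len; i++)
		st->bytes[offset + i] ^= data[i];
}

/* Placeholder only, NOT Keccak: XOR each lane with its index. */
static void demo_stub_permute(uint64_t lanes[DEMO_LANES])
{
	for (int i = 0; i < DEMO_LANES; i++)
		lanes[i] ^= (uint64_t)i;
}

/* Convert to native lanes, permute, and convert back. */
static void demo_permute_wrapped(struct demo_state *st,
				 void (*permute)(uint64_t lanes[DEMO_LANES]))
{
	uint64_t lanes[DEMO_LANES];

	for (int i = 0; i < DEMO_LANES; i++)
		lanes[i] = demo_load_le64(&st->bytes[i * 8]);
	permute(lanes);
	for (int i = 0; i < DEMO_LANES; i++)
		demo_store_le64(&st->bytes[i * 8], lanes[i]);
}
```

On a little-endian host the load/store loops amount to a copy, which is
why the conversion only costs anything on big-endian machines.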

Note that only the generic code is moved across; the asm-optimised stuff is
not touched as I'm not familiar with that.

I have done what Eric required and made a separate wrapper struct and set
of wrapper functions for each algorithm, though I think this is excessively
bureaucratic as this multiplies the API load by 7 (and maybe 9 in the
future[*]).

[*] The Kyber algorithm also uses CSHAKE variants in the SHA3 family - and
    NIST mentions some other variants too.

This does, however, cause a problem for what I need to do as the ML-DSA
prehash is dynamically selectable by certificate OID, so I have to add
SHAKE128/256 support to the crypto shash API too - though hopefully it will
only require an output of 16 or 32 bytes respectively for the prehash case
and won't require multiple squeezing.

This is based on Eric's libcrypto-next branch.

The patches can also be found here:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-pqc

David

Changes
=======
ver #3)
 - Renamed conflicting arm64 functions.
 - Made a separate wrapper API for each algorithm in the family.
 - Removed sha3_init(), sha3_reinit() and sha3_final().
 - Removed sha3_ctx::digest_size.
 - Renamed sha3_ctx::partial to sha3_ctx::absorb_offset.
 - Refer to the output of SHAKE* as "output" not "digest".
 - Moved the Iota transform into the one-round function.
 - Made sha3_update() warn if called after sha3_squeeze().
 - Simplified the module-load test to not do update after squeeze.
 - Added Return: and Context: kdoc statements and expanded the kdoc
   headers.
 - Added an API description document.
 - Overhauled the kunit tests.
   - Only have one kunit test.
   - Only call the general hash tester on one algo.
   - Add separate simple cursory checks for the other algos.
   - Add resqueezing tests.
   - Add some NIST example tests.
 - Changed crypto/sha3_generic to use this.
 - Added SHAKE128/256 to crypto/sha3_generic and crypto/testmgr.
 - Folded struct sha3_state into struct sha3_ctx.

ver #2)
  - Simplify the endianness handling.
  - Rename sha3_final() to sha3_squeeze() and don't clear the context at the
    end as it's permitted to continue calling sha3_squeeze() to extract
    continuations of the digest (needed by ML-DSA).
  - Don't reapply the end marker to the hash state in continuation
    sha3_squeeze() unless sha3_update() gets called again (needed by
    ML-DSA).
  - Give sha3_squeeze() the amount of digest to produce as a parameter
    rather than using ctx->digest_size and don't return the amount digested.
  - Reimplement sha3_final() as a wrapper around sha3_squeeze() that
    extracts ctx->digest_size amount of digest and then zeroes out the
    context.  The latter is necessary to avoid upsetting
    hash-test-template.h.
  - Provide a sha3_reinit() function to clear the state, but to leave the
    parameters that indicate the hash properties unaffected, allowing for
    reuse.
  - Provide a sha3_set_digestsize() function to change the size of the
    digest to be extracted by sha3_final().  sha3_squeeze() takes a
    parameter for this instead.
  - Don't pass the digest size as a parameter to shake128/256_init() but
    rather default to 128/256 bits as per the function name.
  - Provide a sha3_clear() function to zero out the context.

David Howells (8):
  s390/sha3: Rename conflicting functions
  arm64/sha3: Rename conflicting functions
  lib/crypto: Add SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE128,
    SHAKE256
  lib/crypto: Move the SHA3 Iota transform into the single round
    function
  lib/crypto: Add SHA3 kunit tests
  crypto/sha3: Use lib/crypto/sha3
  crypto/sha3: Add SHAKE128/256 support
  crypto: SHAKE tests

 Documentation/crypto/index.rst      |   1 +
 Documentation/crypto/sha3.rst       | 241 +++++++++++++
 arch/arm64/crypto/sha3-ce-glue.c    |  47 +--
 arch/s390/crypto/sha3_256_s390.c    |  26 +-
 arch/s390/crypto/sha3_512_s390.c    |  26 +-
 crypto/sha3_generic.c               | 233 +++---------
 crypto/testmgr.c                    |  14 +
 crypto/testmgr.h                    |  59 ++++
 include/crypto/sha3.h               | 467 +++++++++++++++++++++++-
 lib/crypto/Kconfig                  |   7 +
 lib/crypto/Makefile                 |   6 +
 lib/crypto/sha3.c                   | 529 ++++++++++++++++++++++++++++
 lib/crypto/tests/Kconfig            |  12 +
 lib/crypto/tests/Makefile           |   1 +
 lib/crypto/tests/sha3_kunit.c       | 338 ++++++++++++++++++
 lib/crypto/tests/sha3_testvecs.h    | 231 ++++++++++++
 scripts/crypto/gen-hash-testvecs.py |   8 +-
 17 files changed, 2012 insertions(+), 234 deletions(-)
 create mode 100644 Documentation/crypto/sha3.rst
 create mode 100644 lib/crypto/sha3.c
 create mode 100644 lib/crypto/tests/sha3_kunit.c
 create mode 100644 lib/crypto/tests/sha3_testvecs.h

Re: [PATCH v3 0/8] crypto, lib/crypto: Add SHAKE128/256 support and move SHA3 to lib/crypto

Posted by Eric Biggers 5 days, 1 hour ago

Hi David,

On Fri, Sep 26, 2025 at 03:19:43PM +0100, David Howells wrote:
> I have done what Eric required and made a separate wrapper struct and set
> of wrapper functions for each algorithm, though I think this is excessively
> bureaucratic as this multiplies the API load by 7 (and maybe 9 in the
> future[*]).

I don't think I "required" that it be implemented in exactly this way.
Sorry if I wasn't clear.  Let me quote what I wrote:

    First, this patch's proposed API is error-prone due to the weak
    typing that allows mixing steps of different algorithms together.
    For example, users could initialize a sha3_ctx with sha3_256_init()
    and then squeeze an arbitrary amount from it, incorrectly treating
    it as a XOF.  It would be worth considering separating the APIs for
    the different algorithms that are part of SHA-3, similar to what I
    did with SHA-224 and SHA-256.  (They would of course still share
    code internally, just like SHA-2.)

So I asked that to prevent usage errors such as treating a digest as a
XOF, you consider separating the APIs.  There is more than one way to do
that, and I was hoping that you'd consider different ways.  One way is
separate functions and types for all six SHA-3 algorithms.

However, if that is not scaling well, then we could instead just
separate the SHA-3 algorithms into two groups, the digests and the XOFs:

    void sha3_224_init(struct sha3_ctx *ctx);
    void sha3_256_init(struct sha3_ctx *ctx);
    void sha3_384_init(struct sha3_ctx *ctx);
    void sha3_512_init(struct sha3_ctx *ctx);
    void sha3_update(struct sha3_ctx *ctx, const u8 *data, size_t data_len);
    void sha3_final(struct sha3_ctx *ctx, u8 *out);

    void shake128_init(struct shake_ctx *ctx);
    void shake256_init(struct shake_ctx *ctx);
    void shake_update(struct shake_ctx *ctx, const u8 *data, size_t data_len);
    void shake_squeeze(struct shake_ctx *ctx, u8 *out, size_t out_len);
    void shake_clear(struct shake_ctx *ctx);

(With "sha3_ctx" being used for the digests specifically, the internal
context struct would then have to have a third name, like "__sha3_ctx".)

The *_init() functions would store the correct information in the
context so that the other functions would know what to do.  This would
be similar to how blake2s_init() saves the 'outlen' for blake2s_final().

That would be sufficient to prevent misuse errors where steps of
different algorithms are mixed together, right?
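
A minimal userspace sketch of that two-type split, under stated
assumptions: the field names and the internal type name sha3_core_ctx are
hypothetical (standing in for something like "__sha3_ctx"), and only the
init functions are shown.  The block and digest sizes are the standard
SHA-3 parameters in bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shared internal context used by both public types. */
struct sha3_core_ctx {
	uint8_t state[200];		/* 1600-bit Keccak state */
	unsigned int block_size;	/* rate in bytes */
	unsigned int digest_size;	/* 0 for the XOFs */
	unsigned int absorb_offset;
};

/* Distinct wrapper types: one for digests, one for XOFs. */
struct sha3_ctx  { struct sha3_core_ctx core; };
struct shake_ctx { struct sha3_core_ctx core; };

/* Record the per-algorithm parameters so later calls know what to do. */
static inline void sha3_core_init(struct sha3_core_ctx *c,
				  unsigned int block_size,
				  unsigned int digest_size)
{
	memset(c, 0, sizeof(*c));
	c->block_size = block_size;
	c->digest_size = digest_size;
}

static inline void sha3_224_init(struct sha3_ctx *ctx)
{
	sha3_core_init(&ctx->core, 144, 28);
}

static inline void sha3_256_init(struct sha3_ctx *ctx)
{
	sha3_core_init(&ctx->core, 136, 32);
}

static inline void sha3_384_init(struct sha3_ctx *ctx)
{
	sha3_core_init(&ctx->core, 104, 48);
}

static inline void sha3_512_init(struct sha3_ctx *ctx)
{
	sha3_core_init(&ctx->core, 72, 64);
}

static inline void shake128_init(struct shake_ctx *ctx)
{
	sha3_core_init(&ctx->core, 168, 0);
}

static inline void shake256_init(struct shake_ctx *ctx)
{
	sha3_core_init(&ctx->core, 136, 0);
}
```

With this layout, passing a struct sha3_ctx pointer to a function that
takes a struct shake_ctx pointer is an incompatible-pointer-type
diagnostic at compile time, which is what rules out treating a digest as
a XOF.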

Keep in mind that for SHA-2 we have to have completely different code
and underlying state for the 32-bit hashes (SHA-224 and SHA-256) and
64-bit hashes (SHA-384 and SHA-512) anyway.  We also traditionally
haven't kept any information in the SHA-2 context about which SHA-2
algorithm is being executed.  So that led us more down the road of the
separate functions and types for each SHA-2 algorithm.  With SHA-3,
where e.g. the 224, 256, 384, and 512-bit digests all use the same
underlying state, a slightly more unified API might be appropriate.

All I'm really requesting is that we don't create footguns, like the
following that the API in the v2 patch permitted:

    1. sha3_init() + sha3_update()
        [infinite loop]

    2. sha3_256_init() + sha3_update() + sha3_squeeze()
        [not valid, treats SHA3-256 as a XOF]

    3. sha3_update() + sha3_squeeze() + sha3_update() + sha3_squeeze()
        [not valid, as discussed]

(1) is prevented just by not having the internal function sha3_init() as
a public function.

Splitting the context into two types, one for the digests and one for
the XOFs, is sufficient to prevent (2), as long as there's still one
init function per algorithm.  We don't necessarily need six types.

(3) isn't preventable via the type system, but it's detectable by a
run-time check, which you've done by adding a WARN_ON_ONCE() to
sha3_update().
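
For illustration, a run-time check of this kind could look roughly like
the userspace sketch below.  All demo_* names are hypothetical, and
demo_warn_once() is only a crude stand-in for the kernel's WARN_ON_ONCE().
Ignoring the data after the warning is an assumption of this sketch; the
patch may behave differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical XOF context tracking whether squeezing has begun. */
struct demo_xof_ctx {
	bool squeezing;		/* set by the first squeeze */
	size_t absorbed;	/* stand-in for the real absorb state */
};

/*
 * Stand-in for WARN_ON_ONCE(): print at most one warning and return the
 * condition.  Unlike the kernel macro, the "once" flag here is shared by
 * all callers rather than being per call site.
 */
static bool demo_warn_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

static void demo_update(struct demo_xof_ctx *ctx, const uint8_t *data,
			size_t len)
{
	(void)data;	/* a real implementation would XOR this into the state */

	/* Assumption: absorbing after squeezing is refused, not just logged. */
	if (demo_warn_once(ctx->squeezing, "update called after squeeze"))
		return;
	ctx->absorbed += len;
}

static void demo_squeeze(struct demo_xof_ctx *ctx, uint8_t *out,
			 size_t out_len)
{
	ctx->squeezing = true;
	memset(out, 0, out_len);	/* placeholder output, not real SHAKE */
}
```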

So, I think we'd be in a good position with just the digests and XOFs
separated out into different functions + types.

> This does, however, cause a problem for what I need to do as the ML-DSA
> prehash is dynamically selectable by certificate OID, so I have to add
> SHAKE128/256 support to the crypto shash API too - though hopefully it will
> only require an output of 16 or 32 bytes respectively for the prehash case
> and won't require multiple squeezing.

When there's only a small number of supported algorithms, just doing the
dispatch in the calling code tends to be simpler than using
crypto_shash.  For example, see the recent conversion of fs/verity/ to
use the SHA-2 library API instead of crypto_shash.
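
As a rough sketch of what dispatching in the calling code could look
like: the enum, the demo_* functions, and the xof_len parameter below are
all hypothetical, and the three "hash" functions are placeholders that
only tag their output so the control flow can be followed.  They are not
the library API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical set of algorithms the caller supports. */
enum demo_hash_algo {
	DEMO_ALGO_SHA3_256,
	DEMO_ALGO_SHA3_512,
	DEMO_ALGO_SHAKE256,
};

/* Placeholders for one-shot library calls; they only tag the output. */
static void demo_sha3_256(const uint8_t *in, size_t len, uint8_t out[32])
{
	(void)in; (void)len;
	memset(out, 0xA1, 32);
}

static void demo_sha3_512(const uint8_t *in, size_t len, uint8_t out[64])
{
	(void)in; (void)len;
	memset(out, 0xA2, 64);
}

static void demo_shake256(const uint8_t *in, size_t len, uint8_t *out,
			  size_t out_len)
{
	(void)in; (void)len;
	memset(out, 0xA3, out_len);
}

/*
 * Dispatch on the algorithm selected at run time.  Returns the number of
 * bytes written, or 0 if the algorithm is unknown or the buffer is too
 * small.  xof_len is used only for the XOF case.
 */
static size_t demo_prehash(enum demo_hash_algo algo, const uint8_t *in,
			   size_t len, uint8_t *out, size_t out_size,
			   size_t xof_len)
{
	switch (algo) {
	case DEMO_ALGO_SHA3_256:
		if (out_size < 32)
			return 0;
		demo_sha3_256(in, len, out);
		return 32;
	case DEMO_ALGO_SHA3_512:
		if (out_size < 64)
			return 0;
		demo_sha3_512(in, len, out);
		return 64;
	case DEMO_ALGO_SHAKE256:
		if (xof_len == 0 || out_size < xof_len)
			return 0;
		demo_shake256(in, len, out, xof_len);
		return xof_len;
	}
	return 0;
}
```

The switch keeps each call strongly typed, so the digest and XOF paths
cannot be mixed up by the caller.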

- Eric

Re: [PATCH v3 0/8] crypto, lib/crypto: Add SHAKE128/256 support and move SHA3 to lib/crypto

Posted by Eric Biggers 6 hours ago

Hi David,

On Fri, Sep 26, 2025 at 12:59:58PM -0700, Eric Biggers wrote:
> Hi David,
> 
> On Fri, Sep 26, 2025 at 03:19:43PM +0100, David Howells wrote:
> > I have done what Eric required and made a separate wrapper struct and set
> > of wrapper functions for each algorithm, though I think this is excessively
> > bureaucratic as this multiplies the API load by 7 (and maybe 9 in the
> > future[*]).
> 
> I don't think I "required" that it be implemented in exactly this way.
> Sorry if I wasn't clear.  Let me quote what I wrote:

Have you had a chance to read this reply?  In the v4 patchset I don't
see any evidence that you read this reply.  And you didn't respond to it
either.

- Eric