 arch/x86/crypto/crc32-pclmul_asm.S         |   6 +-
 arch/x86/crypto/crc32-pclmul_glue.c        |  36 ++-
 arch/x86/crypto/crc32c-intel_glue.c        |  58 +++--
 arch/x86/crypto/crct10dif-pclmul_glue.c    |  54 ++--
 arch/x86/crypto/ghash-clmulni-intel_asm.S  |   4 +-
 arch/x86/crypto/ghash-clmulni-intel_glue.c |  43 ++--
 arch/x86/crypto/nhpoly1305-avx2-glue.c     |  21 +-
 arch/x86/crypto/nhpoly1305-sse2-glue.c     |  21 +-
 arch/x86/crypto/poly1305_glue.c            |  49 +++-
 arch/x86/crypto/polyval-clmulni_glue.c     |  14 +-
 arch/x86/crypto/sha1_ssse3_glue.c          | 276 +++++++++++++--------
 arch/x86/crypto/sha256_ssse3_glue.c        | 268 +++++++++++++-------
 arch/x86/crypto/sha512_ssse3_glue.c        | 191 ++++++++------
 arch/x86/crypto/sm3_avx_glue.c             |  45 +++-
 crypto/tcrypt.c                            |  56 +++--
 15 files changed, 764 insertions(+), 378 deletions(-)
This series fixes the RCU stalls triggered by the x86 crypto
modules discussed in
https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

Two root causes were:
- too much data processed between kernel_fpu_begin and
  kernel_fpu_end calls (which are heavily used by the x86
  optimized drivers)
- tcrypt not calling cond_resched during speed test loops

These problems have always been lurking, but improving the
loading of the x86/sha512 module led to the stalls happening
a lot during boot when using SHA-512 for module signature
checking. Fixing these problems makes it safer to improve the
loading of the rest of the x86 modules, like the sha512 module.

This series only handles the x86 modules. Except for the tcrypt
change, v3 only tackles the hash functions, as discussed in
https://lore.kernel.org/lkml/MW5PR84MB184284FBED63E2D043C93A6FAB369@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

The limits are implemented as static const unsigned ints at the
module level, which makes them easy to expose as module parameters
for testing, like this:

-static const unsigned int bytes_per_fpu = 655 * 1024;
+static unsigned int bytes_per_fpu = 655 * 1024;
+module_param(bytes_per_fpu, uint, 0644);
+MODULE_PARM_DESC(bytes_per_fpu, "Bytes per FPU context");

Robert Elliott (17):
  crypto: tcrypt - test crc32
  crypto: tcrypt - test nhpoly1305
  crypto: tcrypt - reschedule during cycles speed tests
  crypto: x86/sha - limit FPU preemption
  crypto: x86/crc - limit FPU preemption
  crypto: x86/sm3 - limit FPU preemption
  crypto: x86/ghash - use u8 rather than char
  crypto: x86/ghash - restructure FPU context saving
  crypto: x86/ghash - limit FPU preemption
  crypto: x86/*poly* - limit FPU preemption
  crypto: x86/sha - register all variations
  crypto: x86/sha - minimize time in FPU context
  crypto: x86/sha1, sha256 - load based on CPU features
  crypto: x86/crc - load based on CPU features
  crypto: x86/sm3 - load based on CPU features
  crypto: x86/ghash,polyval - load based on CPU features
  crypto: x86/nhpoly1305, poly1305 - load based on CPU features

-- 
2.37.3
This series fixes the RCU stalls triggered by the x86 crypto
modules discussed in
https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

Two root causes were:
- too much data processed between kernel_fpu_begin and
  kernel_fpu_end calls (which are heavily used by the x86
  optimized drivers)
- tcrypt not calling cond_resched during speed test loops

These problems have always been lurking, but improving the
loading of the x86/sha512 module led to the stalls happening
a lot during boot when using SHA-512 for module signature
checking. Fixing these problems makes it safer to improve the
loading of the rest of the x86 modules, like the sha512 module.

This series only handles the x86 modules.

Version 4 tackles lingering comments from version 2.

1. Unlike the hash functions, the skcipher and aead functions
   accept pointers to scatter-gather lists, and the helper
   functions that walk through those lists limit processing to a
   page size at a time. The aegis module did everything inside
   one pair of kernel_fpu_begin() and kernel_fpu_end() calls,
   including walking through the sglist, so it could hold off
   preemption without constraint.

   The aesni aead functions for gcm process the additional data
   (data that is included in the authentication tag calculation
   but not encrypted) in one FPU context, so that can be a
   problem. Fixing that will require some asm changes. However,
   I don't think that is a typical use case, so this series
   defers that fix.

   The series adds device table matching for all the x86 crypto
   modules.

2. I replaced all the positive and negative prints with module
   parameters, including enough clues in the modinfo descriptions
   that a user can determine what is working and what is not.
Robert Elliott (24):
  crypto: tcrypt - test crc32
  crypto: tcrypt - test nhpoly1305
  crypto: tcrypt - reschedule during cycles speed tests
  crypto: x86/sha - limit FPU preemption
  crypto: x86/crc - limit FPU preemption
  crypto: x86/sm3 - limit FPU preemption
  crypto: x86/ghash - use u8 rather than char
  crypto: x86/ghash - restructure FPU context saving
  crypto: x86/ghash - limit FPU preemption
  crypto: x86/poly - limit FPU preemption
  crypto: x86/aegis - limit FPU preemption
  crypto: x86/sha - register all variations
  crypto: x86/sha - minimize time in FPU context
  crypto: x86/sha - load based on CPU features
  crypto: x86/crc - load based on CPU features
  crypto: x86/sm3 - load based on CPU features
  crypto: x86/poly - load based on CPU features
  crypto: x86/ghash - load based on CPU features
  crypto: x86/aesni - avoid type conversions
  crypto: x86/ciphers - load based on CPU features
  crypto: x86 - report used CPU features via module parameters
  crypto: x86 - report missing CPU features via module parameters
  crypto: x86 - report suboptimal CPUs via module parameters
  crypto: x86 - standardize module descriptions

 arch/x86/crypto/aegis128-aesni-glue.c      |  66 +++--
 arch/x86/crypto/aesni-intel_glue.c         |  45 ++--
 arch/x86/crypto/aria_aesni_avx_glue.c      |  43 ++-
 arch/x86/crypto/blake2s-glue.c             |  18 +-
 arch/x86/crypto/blowfish_glue.c            |  39 ++-
 arch/x86/crypto/camellia_aesni_avx2_glue.c |  40 ++-
 arch/x86/crypto/camellia_aesni_avx_glue.c  |  38 ++-
 arch/x86/crypto/camellia_glue.c            |  37 ++-
 arch/x86/crypto/cast5_avx_glue.c           |  30 ++-
 arch/x86/crypto/cast6_avx_glue.c           |  30 ++-
 arch/x86/crypto/chacha_glue.c              |  18 +-
 arch/x86/crypto/crc32-pclmul_asm.S         |   6 +-
 arch/x86/crypto/crc32-pclmul_glue.c        |  39 ++-
 arch/x86/crypto/crc32c-intel_glue.c        |  66 +++--
 arch/x86/crypto/crct10dif-pclmul_glue.c    |  56 ++--
 arch/x86/crypto/curve25519-x86_64.c        |  29 +-
 arch/x86/crypto/des3_ede_glue.c            |  36 ++-
 arch/x86/crypto/ghash-clmulni-intel_asm.S  |   4 +-
 arch/x86/crypto/ghash-clmulni-intel_glue.c |  45 ++--
 arch/x86/crypto/nhpoly1305-avx2-glue.c     |  36 ++-
 arch/x86/crypto/nhpoly1305-sse2-glue.c     |  22 +-
 arch/x86/crypto/poly1305_glue.c            |  56 +++-
 arch/x86/crypto/polyval-clmulni_glue.c     |  31 ++-
 arch/x86/crypto/serpent_avx2_glue.c        |  36 ++-
 arch/x86/crypto/serpent_avx_glue.c         |  31 ++-
 arch/x86/crypto/serpent_sse2_glue.c        |  13 +-
 arch/x86/crypto/sha1_ssse3_glue.c          | 298 ++++++++++++++-------
 arch/x86/crypto/sha256_ssse3_glue.c        | 294 +++++++++++++-------
 arch/x86/crypto/sha512_ssse3_glue.c        | 205 +++++++++-----
 arch/x86/crypto/sm3_avx_glue.c             |  70 +++--
 arch/x86/crypto/sm4_aesni_avx2_glue.c      |  37 ++-
 arch/x86/crypto/sm4_aesni_avx_glue.c       |  39 ++-
 arch/x86/crypto/twofish_avx_glue.c         |  29 +-
 arch/x86/crypto/twofish_glue.c             |  12 +-
 arch/x86/crypto/twofish_glue_3way.c        |  36 ++-
 crypto/aes_ti.c                            |   2 +-
 crypto/blake2b_generic.c                   |   2 +-
 crypto/blowfish_common.c                   |   2 +-
 crypto/crct10dif_generic.c                 |   2 +-
 crypto/curve25519-generic.c                |   1 +
 crypto/sha256_generic.c                    |   2 +-
 crypto/sha512_generic.c                    |   2 +-
 crypto/sm3.c                               |   2 +-
 crypto/sm4.c                               |   2 +-
 crypto/tcrypt.c                            |  56 ++--
 crypto/twofish_common.c                    |   2 +-
 crypto/twofish_generic.c                   |   2 +-
 47 files changed, 1377 insertions(+), 630 deletions(-)

-- 
2.38.1
On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
> This series fixes the RCU stalls triggered by the x86 crypto
> modules discussed in
> https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/
>
> Two root causes were:
> - too much data processed between kernel_fpu_begin and
>   kernel_fpu_end calls (which are heavily used by the x86
>   optimized drivers)
> - tcrypt not calling cond_resched during speed test loops
>
> These problems have always been lurking, but improving the
> loading of the x86/sha512 module led to it happening a lot
> during boot when using SHA-512 for module signature checking.

Can we split this series up please? The fixes to the stalls should
stand separately from the changes to how modules are loaded. The
latter is more of an improvement while the former should be applied
ASAP.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Wednesday, November 16, 2022 9:59 PM
> Subject: Re: [PATCH v4 00/24] crypto: fix RCU stalls
>
> On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
...
> > These problems have always been lurking, but improving the
> > loading of the x86/sha512 module led to it happening a lot
> > during boot when using SHA-512 for module signature checking.
>
> Can we split this series up please? The fixes to the stalls should
> stand separately from the changes to how modules are loaded. The
> latter is more of an improvement while the former should be applied
> ASAP.
Yes. With the v4 patch numbers:
[PATCH v4 01/24] crypto: tcrypt - test crc32
[PATCH v4 02/24] crypto: tcrypt - test nhpoly1305
Those ensure the changes to those hash modules are testable.
[PATCH v4 03/24] crypto: tcrypt - reschedule during cycles speed
That's only for tcrypt so not urgent for users, but pretty
simple.
[PATCH v4 04/24] crypto: x86/sha - limit FPU preemption
[PATCH v4 05/24] crypto: x86/crc - limit FPU preemption
[PATCH v4 06/24] crypto: x86/sm3 - limit FPU preemption
[PATCH v4 07/24] crypto: x86/ghash - use u8 rather than char
[PATCH v4 08/24] crypto: x86/ghash - restructure FPU context saving
[PATCH v4 09/24] crypto: x86/ghash - limit FPU preemption
[PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
[PATCH v4 11/24] crypto: x86/aegis - limit FPU preemption
[PATCH v4 12/24] crypto: x86/sha - register all variations
[PATCH v4 13/24] crypto: x86/sha - minimize time in FPU context
That's the end of the fixes set.
[PATCH v4 14/24] crypto: x86/sha - load based on CPU features
[PATCH v4 15/24] crypto: x86/crc - load based on CPU features
[PATCH v4 16/24] crypto: x86/sm3 - load based on CPU features
[PATCH v4 17/24] crypto: x86/poly - load based on CPU features
[PATCH v4 18/24] crypto: x86/ghash - load based on CPU features
[PATCH v4 19/24] crypto: x86/aesni - avoid type conversions
[PATCH v4 20/24] crypto: x86/ciphers - load based on CPU features
[PATCH v4 21/24] crypto: x86 - report used CPU features via module
[PATCH v4 22/24] crypto: x86 - report missing CPU features via module
[PATCH v4 23/24] crypto: x86 - report suboptimal CPUs via module
[PATCH v4 24/24] crypto: x86 - standardize module descriptions
I'll put those in a new series.
For 6.1, I still suggest reverting aa031b8f702e ("crypto: x86/sha512 -
load based on CPU features") since that exposed the problem. Target
the fixes for 6.2 and the module loading changes for 6.2 or 6.3.
On Thu, Nov 17, 2022 at 4:14 PM Elliott, Robert (Servers) <elliott@hpe.com> wrote:
>
> > -----Original Message-----
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Sent: Wednesday, November 16, 2022 9:59 PM
> > Subject: Re: [PATCH v4 00/24] crypto: fix RCU stalls
> >
> > On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
> ...
> > > These problems have always been lurking, but improving the
> > > loading of the x86/sha512 module led to it happening a lot
> > > during boot when using SHA-512 for module signature checking.
> >
> > Can we split this series up please? The fixes to the stalls should
> > stand separately from the changes to how modules are loaded. The
> > latter is more of an improvement while the former should be applied
> > ASAP.
>
> Yes. With the v4 patch numbers:
> [PATCH v4 01/24] crypto: tcrypt - test crc32
> [PATCH v4 02/24] crypto: tcrypt - test nhpoly1305
>
> Those ensure the changes to those hash modules are testable.
>
> [PATCH v4 03/24] crypto: tcrypt - reschedule during cycles speed
>
> That's only for tcrypt so not urgent for users, but pretty
> simple.
>
> [PATCH v4 04/24] crypto: x86/sha - limit FPU preemption
> [PATCH v4 05/24] crypto: x86/crc - limit FPU preemption
> [PATCH v4 06/24] crypto: x86/sm3 - limit FPU preemption
> [PATCH v4 07/24] crypto: x86/ghash - use u8 rather than char
> [PATCH v4 08/24] crypto: x86/ghash - restructure FPU context saving
> [PATCH v4 09/24] crypto: x86/ghash - limit FPU preemption
> [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
> [PATCH v4 11/24] crypto: x86/aegis - limit FPU preemption
> [PATCH v4 12/24] crypto: x86/sha - register all variations
> [PATCH v4 13/24] crypto: x86/sha - minimize time in FPU context
>
> That's the end of the fixes set.
>
> [PATCH v4 14/24] crypto: x86/sha - load based on CPU features
> [PATCH v4 15/24] crypto: x86/crc - load based on CPU features
> [PATCH v4 16/24] crypto: x86/sm3 - load based on CPU features
> [PATCH v4 17/24] crypto: x86/poly - load based on CPU features
> [PATCH v4 18/24] crypto: x86/ghash - load based on CPU features
> [PATCH v4 19/24] crypto: x86/aesni - avoid type conversions
> [PATCH v4 20/24] crypto: x86/ciphers - load based on CPU features
> [PATCH v4 21/24] crypto: x86 - report used CPU features via module
> [PATCH v4 22/24] crypto: x86 - report missing CPU features via module
> [PATCH v4 23/24] crypto: x86 - report suboptimal CPUs via module
> [PATCH v4 24/24] crypto: x86 - standardize module descriptions
>
> I'll put those in a new series.

Thanks. Please take into account my review feedback this time for
your next series.

Jason
Add self-test and speed tests for crc32, paralleling those
offered for crc32c and crct10dif.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
crypto/tcrypt.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index a82679b576bb..4426386dfb42 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1711,6 +1711,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 		ret += tcrypt_test("gcm(aria)");
 		break;
 
+	case 59:
+		ret += tcrypt_test("crc32");
+		break;
+
 	case 100:
 		ret += tcrypt_test("hmac(md5)");
 		break;
@@ -2317,6 +2321,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 				generic_hash_speed_template);
 		if (mode > 300 && mode < 400) break;
 		fallthrough;
+	case 329:
+		test_hash_speed("crc32", sec, generic_hash_speed_template);
+		if (mode > 300 && mode < 400) break;
+		fallthrough;
 	case 399:
 		break;
--
2.38.1
Add self-test mode for nhpoly1305.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
crypto/tcrypt.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 4426386dfb42..7a6a56751043 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1715,6 +1715,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 		ret += tcrypt_test("crc32");
 		break;
 
+	case 60:
+		ret += tcrypt_test("nhpoly1305");
+		break;
+
 	case 100:
 		ret += tcrypt_test("hmac(md5)");
 		break;
--
2.38.1
commit 2af632996b89 ("crypto: tcrypt - reschedule during speed tests")
added cond_resched() calls to "Avoid RCU stalls in the case of
non-preemptible kernel and lengthy speed tests by rescheduling when
advancing from one block size to another."
It only makes those calls if the sec module parameter is used
(run the speed test for a certain number of seconds), not the
default "cycles" mode.
Expand those calls to also run in "cycles" mode, to reduce the
rate of RCU stall warnings:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks:
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
crypto/tcrypt.c | 44 ++++++++++++++++++--------------------------
1 file changed, 18 insertions(+), 26 deletions(-)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 7a6a56751043..c025ba26b663 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -408,14 +408,13 @@ static void test_mb_aead_speed(const char *algo, int enc, int secs,
}
- if (secs) {
+ if (secs)
ret = test_mb_aead_jiffies(data, enc, bs,
secs, num_mb);
- cond_resched();
- } else {
+ else
ret = test_mb_aead_cycles(data, enc, bs,
num_mb);
- }
+ cond_resched();
if (ret) {
pr_err("%s() failed return code=%d\n", e, ret);
@@ -661,13 +660,11 @@ static void test_aead_speed(const char *algo, int enc, unsigned int secs,
bs + (enc ? 0 : authsize),
iv);
- if (secs) {
- ret = test_aead_jiffies(req, enc, bs,
- secs);
- cond_resched();
- } else {
+ if (secs)
+ ret = test_aead_jiffies(req, enc, bs, secs);
+ else
ret = test_aead_cycles(req, enc, bs);
- }
+ cond_resched();
if (ret) {
pr_err("%s() failed return code=%d\n", e, ret);
@@ -917,14 +914,13 @@ static void test_ahash_speed_common(const char *algo, unsigned int secs,
ahash_request_set_crypt(req, sg, output, speed[i].plen);
- if (secs) {
+ if (secs)
ret = test_ahash_jiffies(req, speed[i].blen,
speed[i].plen, output, secs);
- cond_resched();
- } else {
+ else
ret = test_ahash_cycles(req, speed[i].blen,
speed[i].plen, output);
- }
+ cond_resched();
if (ret) {
pr_err("hashing failed ret=%d\n", ret);
@@ -1184,15 +1180,14 @@ static void test_mb_skcipher_speed(const char *algo, int enc, int secs,
cur->sg, bs, iv);
}
- if (secs) {
+ if (secs)
ret = test_mb_acipher_jiffies(data, enc,
bs, secs,
num_mb);
- cond_resched();
- } else {
+ else
ret = test_mb_acipher_cycles(data, enc,
bs, num_mb);
- }
+ cond_resched();
if (ret) {
pr_err("%s() failed flags=%x\n", e,
@@ -1401,14 +1396,11 @@ static void test_skcipher_speed(const char *algo, int enc, unsigned int secs,
skcipher_request_set_crypt(req, sg, sg, bs, iv);
- if (secs) {
- ret = test_acipher_jiffies(req, enc,
- bs, secs);
- cond_resched();
- } else {
- ret = test_acipher_cycles(req, enc,
- bs);
- }
+ if (secs)
+ ret = test_acipher_jiffies(req, enc, bs, secs);
+ else
+ ret = test_acipher_cycles(req, enc, bs);
+ cond_resched();
if (ret) {
pr_err("%s() failed flags=%x\n", e,
--
2.38.1
Limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.
Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while running.
This leads to "rcu_preempt detected expedited stalls" with stack dumps
pointing to the optimized hash function if the module is loaded and
used a lot:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...
For example, that can occur during boot with the stack trace
pointing to the sha512-x86 function if the system is set to use
SHA-512 for module signature checking. The call trace includes:
module_sig_check
mod_verify_sig
pkcs7_verify
pkcs7_digest
sha512_finup
sha512_base_do_update
Fixes: 66be89515888 ("crypto: sha1 - SSSE3 based SHA1 implementation for x86-64")
Fixes: 8275d1aa6422 ("crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions.")
Fixes: 87de4579f92d ("crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions.")
Fixes: aa031b8f702e ("crypto: x86/sha512 - load based on CPU features")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v3 simplify to while loops rather than do..while loops, avoid
redundant checks for zero length, rename the limit macro and
change into a const, vary the limit for each algo
---
arch/x86/crypto/sha1_ssse3_glue.c | 64 ++++++++++++++++++++++-------
arch/x86/crypto/sha256_ssse3_glue.c | 64 ++++++++++++++++++++++-------
arch/x86/crypto/sha512_ssse3_glue.c | 55 +++++++++++++++++++------
3 files changed, 140 insertions(+), 43 deletions(-)
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 44340a1139e0..4bc77c84b0fb 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -26,8 +26,17 @@
#include <crypto/sha1_base.h>
#include <asm/simd.h>
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+#ifdef CONFIG_AS_SHA1_NI
+static const unsigned int bytes_per_fpu_shani = 34 * 1024;
+#endif
+static const unsigned int bytes_per_fpu_avx2 = 34 * 1024;
+static const unsigned int bytes_per_fpu_avx = 30 * 1024;
+static const unsigned int bytes_per_fpu_ssse3 = 26 * 1024;
+
static int sha1_update(struct shash_desc *desc, const u8 *data,
- unsigned int len, sha1_block_fn *sha1_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ sha1_block_fn *sha1_xform)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -41,22 +50,39 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sha1_state, state) != 0);
- kernel_fpu_begin();
- sha1_base_do_update(desc, data, len, sha1_xform);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha1_base_do_update(desc, data, chunk, sha1_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
return 0;
}
static int sha1_finup(struct shash_desc *desc, const u8 *data,
- unsigned int len, u8 *out, sha1_block_fn *sha1_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ u8 *out, sha1_block_fn *sha1_xform)
{
if (!crypto_simd_usable())
return crypto_sha1_finup(desc, data, len, out);
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha1_base_do_update(desc, data, chunk, sha1_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+
kernel_fpu_begin();
- if (len)
- sha1_base_do_update(desc, data, len, sha1_xform);
sha1_base_do_finalize(desc, sha1_xform);
kernel_fpu_end();
@@ -69,13 +95,15 @@ asmlinkage void sha1_transform_ssse3(struct sha1_state *state,
static int sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha1_update(desc, data, len, sha1_transform_ssse3);
+ return sha1_update(desc, data, len, bytes_per_fpu_ssse3,
+ sha1_transform_ssse3);
}
static int sha1_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha1_finup(desc, data, len, out, sha1_transform_ssse3);
+ return sha1_finup(desc, data, len, bytes_per_fpu_ssse3, out,
+ sha1_transform_ssse3);
}
/* Add padding and return the message digest. */
@@ -119,13 +147,15 @@ asmlinkage void sha1_transform_avx(struct sha1_state *state,
static int sha1_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha1_update(desc, data, len, sha1_transform_avx);
+ return sha1_update(desc, data, len, bytes_per_fpu_avx,
+ sha1_transform_avx);
}
static int sha1_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha1_finup(desc, data, len, out, sha1_transform_avx);
+ return sha1_finup(desc, data, len, bytes_per_fpu_avx, out,
+ sha1_transform_avx);
}
static int sha1_avx_final(struct shash_desc *desc, u8 *out)
@@ -201,13 +231,15 @@ static void sha1_apply_transform_avx2(struct sha1_state *state,
static int sha1_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha1_update(desc, data, len, sha1_apply_transform_avx2);
+ return sha1_update(desc, data, len, bytes_per_fpu_avx2,
+ sha1_apply_transform_avx2);
}
static int sha1_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha1_finup(desc, data, len, out, sha1_apply_transform_avx2);
+ return sha1_finup(desc, data, len, bytes_per_fpu_avx2, out,
+ sha1_apply_transform_avx2);
}
static int sha1_avx2_final(struct shash_desc *desc, u8 *out)
@@ -251,13 +283,15 @@ asmlinkage void sha1_ni_transform(struct sha1_state *digest, const u8 *data,
static int sha1_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha1_update(desc, data, len, sha1_ni_transform);
+ return sha1_update(desc, data, len, bytes_per_fpu_shani,
+ sha1_ni_transform);
}
static int sha1_ni_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha1_finup(desc, data, len, out, sha1_ni_transform);
+ return sha1_finup(desc, data, len, bytes_per_fpu_shani, out,
+ sha1_ni_transform);
}
static int sha1_ni_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 3a5f6be7dbba..cdcdf5a80ffe 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -40,11 +40,20 @@
#include <linux/string.h>
#include <asm/simd.h>
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+#ifdef CONFIG_AS_SHA256_NI
+static const unsigned int bytes_per_fpu_shani = 13 * 1024;
+#endif
+static const unsigned int bytes_per_fpu_avx2 = 13 * 1024;
+static const unsigned int bytes_per_fpu_avx = 11 * 1024;
+static const unsigned int bytes_per_fpu_ssse3 = 11 * 1024;
+
asmlinkage void sha256_transform_ssse3(struct sha256_state *state,
const u8 *data, int blocks);
static int _sha256_update(struct shash_desc *desc, const u8 *data,
- unsigned int len, sha256_block_fn *sha256_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ sha256_block_fn *sha256_xform)
{
struct sha256_state *sctx = shash_desc_ctx(desc);
@@ -58,22 +67,39 @@ static int _sha256_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
- kernel_fpu_begin();
- sha256_base_do_update(desc, data, len, sha256_xform);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha256_base_do_update(desc, data, chunk, sha256_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
return 0;
}
static int sha256_finup(struct shash_desc *desc, const u8 *data,
- unsigned int len, u8 *out, sha256_block_fn *sha256_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ u8 *out, sha256_block_fn *sha256_xform)
{
if (!crypto_simd_usable())
return crypto_sha256_finup(desc, data, len, out);
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha256_base_do_update(desc, data, chunk, sha256_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+
kernel_fpu_begin();
- if (len)
- sha256_base_do_update(desc, data, len, sha256_xform);
sha256_base_do_finalize(desc, sha256_xform);
kernel_fpu_end();
@@ -83,13 +109,15 @@ static int sha256_finup(struct shash_desc *desc, const u8 *data,
static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return _sha256_update(desc, data, len, sha256_transform_ssse3);
+ return _sha256_update(desc, data, len, bytes_per_fpu_ssse3,
+ sha256_transform_ssse3);
}
static int sha256_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha256_finup(desc, data, len, out, sha256_transform_ssse3);
+ return sha256_finup(desc, data, len, bytes_per_fpu_ssse3,
+ out, sha256_transform_ssse3);
}
/* Add padding and return the message digest. */
@@ -149,13 +177,15 @@ asmlinkage void sha256_transform_avx(struct sha256_state *state,
static int sha256_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return _sha256_update(desc, data, len, sha256_transform_avx);
+ return _sha256_update(desc, data, len, bytes_per_fpu_avx,
+ sha256_transform_avx);
}
static int sha256_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha256_finup(desc, data, len, out, sha256_transform_avx);
+ return sha256_finup(desc, data, len, bytes_per_fpu_avx,
+ out, sha256_transform_avx);
}
static int sha256_avx_final(struct shash_desc *desc, u8 *out)
@@ -225,13 +255,15 @@ asmlinkage void sha256_transform_rorx(struct sha256_state *state,
static int sha256_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return _sha256_update(desc, data, len, sha256_transform_rorx);
+ return _sha256_update(desc, data, len, bytes_per_fpu_avx2,
+ sha256_transform_rorx);
}
static int sha256_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha256_finup(desc, data, len, out, sha256_transform_rorx);
+ return sha256_finup(desc, data, len, bytes_per_fpu_avx2,
+ out, sha256_transform_rorx);
}
static int sha256_avx2_final(struct shash_desc *desc, u8 *out)
@@ -300,13 +332,15 @@ asmlinkage void sha256_ni_transform(struct sha256_state *digest,
static int sha256_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return _sha256_update(desc, data, len, sha256_ni_transform);
+ return _sha256_update(desc, data, len, bytes_per_fpu_shani,
+ sha256_ni_transform);
}
static int sha256_ni_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha256_finup(desc, data, len, out, sha256_ni_transform);
+ return sha256_finup(desc, data, len, bytes_per_fpu_shani,
+ out, sha256_ni_transform);
}
static int sha256_ni_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 6d3b85e53d0e..c7036cfe2a7e 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -39,11 +39,17 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu_avx2 = 20 * 1024;
+static const unsigned int bytes_per_fpu_avx = 17 * 1024;
+static const unsigned int bytes_per_fpu_ssse3 = 17 * 1024;
+
asmlinkage void sha512_transform_ssse3(struct sha512_state *state,
const u8 *data, int blocks);
static int sha512_update(struct shash_desc *desc, const u8 *data,
- unsigned int len, sha512_block_fn *sha512_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ sha512_block_fn *sha512_xform)
{
struct sha512_state *sctx = shash_desc_ctx(desc);
@@ -57,22 +63,39 @@ static int sha512_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);
- kernel_fpu_begin();
- sha512_base_do_update(desc, data, len, sha512_xform);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha512_base_do_update(desc, data, chunk, sha512_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
return 0;
}
static int sha512_finup(struct shash_desc *desc, const u8 *data,
- unsigned int len, u8 *out, sha512_block_fn *sha512_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ u8 *out, sha512_block_fn *sha512_xform)
{
if (!crypto_simd_usable())
return crypto_sha512_finup(desc, data, len, out);
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha512_base_do_update(desc, data, chunk, sha512_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+
kernel_fpu_begin();
- if (len)
- sha512_base_do_update(desc, data, len, sha512_xform);
sha512_base_do_finalize(desc, sha512_xform);
kernel_fpu_end();
@@ -82,13 +105,15 @@ static int sha512_finup(struct shash_desc *desc, const u8 *data,
static int sha512_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha512_update(desc, data, len, sha512_transform_ssse3);
+ return sha512_update(desc, data, len, bytes_per_fpu_ssse3,
+ sha512_transform_ssse3);
}
static int sha512_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha512_finup(desc, data, len, out, sha512_transform_ssse3);
+ return sha512_finup(desc, data, len, bytes_per_fpu_ssse3,
+ out, sha512_transform_ssse3);
}
/* Add padding and return the message digest. */
@@ -158,13 +183,15 @@ static bool avx_usable(void)
static int sha512_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha512_update(desc, data, len, sha512_transform_avx);
+ return sha512_update(desc, data, len, bytes_per_fpu_avx,
+ sha512_transform_avx);
}
static int sha512_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha512_finup(desc, data, len, out, sha512_transform_avx);
+ return sha512_finup(desc, data, len, bytes_per_fpu_avx,
+ out, sha512_transform_avx);
}
/* Add padding and return the message digest. */
@@ -224,13 +251,15 @@ asmlinkage void sha512_transform_rorx(struct sha512_state *state,
static int sha512_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha512_update(desc, data, len, sha512_transform_rorx);
+ return sha512_update(desc, data, len, bytes_per_fpu_avx2,
+ sha512_transform_rorx);
}
static int sha512_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha512_finup(desc, data, len, out, sha512_transform_rorx);
+ return sha512_finup(desc, data, len, bytes_per_fpu_avx2,
+ out, sha512_transform_rorx);
}
/* Add padding and return the message digest. */
--
2.38.1
Limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.
Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while running, leading to:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...
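The fix is the same pattern in every driver: cap how many bytes are processed inside any one kernel_fpu_begin()/kernel_fpu_end() section. A minimal user-space model of that loop (the FPU calls are stubbed out as a counter; the 20 KiB limit and the function names are illustrative, not the kernel API):

```c
#include <assert.h>

/* Stand-ins for kernel_fpu_begin()/kernel_fpu_end(), which disable and
 * re-enable preemption in the kernel; here they just count sections. */
static unsigned int fpu_sections;
static void fpu_begin(void) { fpu_sections++; }
static void fpu_end(void) { }

/* Chunked update loop: never hold the FPU (and thus keep preemption
 * disabled) for more than bytes_per_fpu bytes of processing. */
static void chunked_update(unsigned int len, unsigned int bytes_per_fpu)
{
	while (len) {
		unsigned int chunk = len < bytes_per_fpu ? len : bytes_per_fpu;

		fpu_begin();
		/* real code processes 'chunk' bytes here, e.g.
		 * sha512_base_do_update(desc, data, chunk, sha512_xform);
		 * and then advances data += chunk */
		fpu_end();

		len -= chunk;
	}
}
```

A 100 KiB update with a 20 KiB limit becomes five short FPU sections rather than one long one, giving the scheduler five chances to run between chunks.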
Fixes: 78c37d191dd6 ("crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table implementation")
Fixes: 6a8ce1ef3940 ("crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction")
Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v3 use while loops and static int, simplify one of the loop structures,
add algorithm-specific limits, use local stack variable in crc32 finup
rather than the context pointer like update uses
---
arch/x86/crypto/crc32-pclmul_asm.S | 6 +--
arch/x86/crypto/crc32-pclmul_glue.c | 27 +++++++++----
arch/x86/crypto/crc32c-intel_glue.c | 52 ++++++++++++++++++-------
arch/x86/crypto/crct10dif-pclmul_glue.c | 48 +++++++++++++++++------
4 files changed, 99 insertions(+), 34 deletions(-)
diff --git a/arch/x86/crypto/crc32-pclmul_asm.S b/arch/x86/crypto/crc32-pclmul_asm.S
index ca53e96996ac..9abd861636c3 100644
--- a/arch/x86/crypto/crc32-pclmul_asm.S
+++ b/arch/x86/crypto/crc32-pclmul_asm.S
@@ -72,15 +72,15 @@
.text
/**
* Calculate crc32
- * BUF - buffer (16 bytes aligned)
- * LEN - sizeof buffer (16 bytes aligned), LEN should be grater than 63
+ * BUF - buffer - must be 16 bytes aligned
+ * LEN - sizeof buffer - must be multiple of 16 bytes and greater than 63
* CRC - initial crc32
* return %eax crc32
* uint crc32_pclmul_le_16(unsigned char const *buffer,
* size_t len, uint crc32)
*/
-SYM_FUNC_START(crc32_pclmul_le_16) /* buffer and buffer size are 16 bytes aligned */
+SYM_FUNC_START(crc32_pclmul_le_16)
movdqa (BUF), %xmm1
movdqa 0x10(BUF), %xmm2
movdqa 0x20(BUF), %xmm3
diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index 98cf3b4e4c9f..df3dbc754818 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -46,6 +46,9 @@
#define SCALE_F 16L /* size of xmm register */
#define SCALE_F_MASK (SCALE_F - 1)
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 655 * 1024;
+
u32 crc32_pclmul_le_16(unsigned char const *buffer, size_t len, u32 crc32);
static u32 __attribute__((pure))
@@ -55,6 +58,9 @@ static u32 __attribute__((pure))
unsigned int iremainder;
unsigned int prealign;
+ BUILD_BUG_ON(bytes_per_fpu < PCLMUL_MIN_LEN);
+ BUILD_BUG_ON(bytes_per_fpu & SCALE_F_MASK);
+
if (len < PCLMUL_MIN_LEN + SCALE_F_MASK || !crypto_simd_usable())
return crc32_le(crc, p, len);
@@ -70,12 +76,19 @@ static u32 __attribute__((pure))
iquotient = len & (~SCALE_F_MASK);
iremainder = len & SCALE_F_MASK;
- kernel_fpu_begin();
- crc = crc32_pclmul_le_16(p, iquotient, crc);
- kernel_fpu_end();
+ while (iquotient >= PCLMUL_MIN_LEN) {
+ unsigned int chunk = min(iquotient, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ crc = crc32_pclmul_le_16(p, chunk, crc);
+ kernel_fpu_end();
+
+ iquotient -= chunk;
+ p += chunk;
+ }
- if (iremainder)
- crc = crc32_le(crc, p + iquotient, iremainder);
+ if (iquotient || iremainder)
+ crc = crc32_le(crc, p, iquotient + iremainder);
return crc;
}
@@ -120,8 +133,8 @@ static int crc32_pclmul_update(struct shash_desc *desc, const u8 *data,
}
/* No final XOR 0xFFFFFFFF, like crc32_le */
-static int __crc32_pclmul_finup(u32 *crcp, const u8 *data, unsigned int len,
- u8 *out)
+static int __crc32_pclmul_finup(const u32 *crcp, const u8 *data,
+ unsigned int len, u8 *out)
{
*(__le32 *)out = cpu_to_le32(crc32_pclmul_le(*crcp, data, len));
return 0;
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb5254c7e..f08ed68ec93d 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -45,7 +45,10 @@ asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
unsigned int crc_init);
#endif /* CONFIG_X86_64 */
-static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length)
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 868 * 1024;
+
+static u32 crc32c_intel_le_hw_byte(u32 crc, const unsigned char *data, size_t length)
{
while (length--) {
asm("crc32b %1, %0"
@@ -56,7 +59,7 @@ static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t le
return crc;
}
-static u32 __pure crc32c_intel_le_hw(u32 crc, unsigned char const *p, size_t len)
+static u32 __pure crc32c_intel_le_hw(u32 crc, const unsigned char *p, size_t len)
{
unsigned int iquotient = len / SCALE_F;
unsigned int iremainder = len % SCALE_F;
@@ -110,8 +113,8 @@ static int crc32c_intel_update(struct shash_desc *desc, const u8 *data,
return 0;
}
-static int __crc32c_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
- u8 *out)
+static int __crc32c_intel_finup(const u32 *crcp, const u8 *data,
+ unsigned int len, u8 *out)
{
*(__le32 *)out = ~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
return 0;
@@ -153,29 +156,52 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
{
u32 *crcp = shash_desc_ctx(desc);
+ BUILD_BUG_ON(bytes_per_fpu < CRC32C_PCL_BREAKEVEN);
+ BUILD_BUG_ON(bytes_per_fpu % SCALE_F);
+
/*
* use faster PCL version if datasize is large enough to
* overcome kernel fpu state save/restore overhead
*/
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *crcp = crc_pcl(data, len, *crcp);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ *crcp = crc_pcl(data, chunk, *crcp);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
} else
*crcp = crc32c_intel_le_hw(*crcp, data, len);
return 0;
}
-static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
- u8 *out)
+static int __crc32c_pcl_intel_finup(const u32 *crcp, const u8 *data,
+ unsigned int len, u8 *out)
{
+ u32 crc = *crcp;
+
+ BUILD_BUG_ON(bytes_per_fpu < CRC32C_PCL_BREAKEVEN);
+ BUILD_BUG_ON(bytes_per_fpu % SCALE_F);
+
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ crc = crc_pcl(data, chunk, crc);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+ *(__le32 *)out = ~cpu_to_le32(crc);
} else
*(__le32 *)out =
- ~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
+ ~cpu_to_le32(crc32c_intel_le_hw(crc, data, len));
return 0;
}
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 71291d5af9f4..4f6b8c727d88 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -34,6 +34,11 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>
+#define PCLMUL_MIN_LEN 16U /* minimum size of buffer for crc_t10dif_pcl */
+
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 614 * 1024;
+
asmlinkage u16 crc_t10dif_pcl(u16 init_crc, const u8 *buf, size_t len);
struct chksum_desc_ctx {
@@ -54,11 +59,21 @@ static int chksum_update(struct shash_desc *desc, const u8 *data,
{
struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
- if (length >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- ctx->crc = crc_t10dif_pcl(ctx->crc, data, length);
- kernel_fpu_end();
- } else
+ BUILD_BUG_ON(bytes_per_fpu < PCLMUL_MIN_LEN);
+
+ if (length >= PCLMUL_MIN_LEN && crypto_simd_usable()) {
+ while (length >= PCLMUL_MIN_LEN) {
+ unsigned int chunk = min(length, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ ctx->crc = crc_t10dif_pcl(ctx->crc, data, chunk);
+ kernel_fpu_end();
+
+ length -= chunk;
+ data += chunk;
+ }
+ }
+ if (length)
ctx->crc = crc_t10dif_generic(ctx->crc, data, length);
return 0;
}
@@ -73,12 +88,23 @@ static int chksum_final(struct shash_desc *desc, u8 *out)
static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out)
{
- if (len >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__u16 *)out = crc_t10dif_pcl(crc, data, len);
- kernel_fpu_end();
- } else
- *(__u16 *)out = crc_t10dif_generic(crc, data, len);
+ BUILD_BUG_ON(bytes_per_fpu < PCLMUL_MIN_LEN);
+
+ if (len >= PCLMUL_MIN_LEN && crypto_simd_usable()) {
+ while (len >= PCLMUL_MIN_LEN) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ crc = crc_t10dif_pcl(crc, data, chunk);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+ }
+ if (len)
+ crc = crc_t10dif_generic(crc, data, len);
+ *(__u16 *)out = crc;
return 0;
}
--
2.38.1
Limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.
Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while running, causing:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...
Fixes: 930ab34d906d ("crypto: x86/sm3 - add AVX assembly implementation")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v3 use while loop, static int
---
arch/x86/crypto/sm3_avx_glue.c | 35 ++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index 661b6f22ffcd..483aaed996ba 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -17,6 +17,9 @@
#include <crypto/sm3_base.h>
#include <asm/simd.h>
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 11 * 1024;
+
asmlinkage void sm3_transform_avx(struct sm3_state *state,
const u8 *data, int nblocks);
@@ -25,8 +28,10 @@ static int sm3_avx_update(struct shash_desc *desc, const u8 *data,
{
struct sm3_state *sctx = shash_desc_ctx(desc);
+ BUILD_BUG_ON(bytes_per_fpu == 0);
+
if (!crypto_simd_usable() ||
- (sctx->count % SM3_BLOCK_SIZE) + len < SM3_BLOCK_SIZE) {
+ (sctx->count % SM3_BLOCK_SIZE) + len < SM3_BLOCK_SIZE) {
sm3_update(sctx, data, len);
return 0;
}
@@ -37,9 +42,16 @@ static int sm3_avx_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0);
- kernel_fpu_begin();
- sm3_base_do_update(desc, data, len, sm3_transform_avx);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sm3_base_do_update(desc, data, chunk, sm3_transform_avx);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
return 0;
}
@@ -47,6 +59,8 @@ static int sm3_avx_update(struct shash_desc *desc, const u8 *data,
static int sm3_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
+ BUILD_BUG_ON(bytes_per_fpu == 0);
+
if (!crypto_simd_usable()) {
struct sm3_state *sctx = shash_desc_ctx(desc);
@@ -57,9 +71,18 @@ static int sm3_avx_finup(struct shash_desc *desc, const u8 *data,
return 0;
}
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sm3_base_do_update(desc, data, chunk, sm3_transform_avx);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+
kernel_fpu_begin();
- if (len)
- sm3_base_do_update(desc, data, len, sm3_transform_avx);
sm3_base_do_finalize(desc, sm3_transform_avx);
kernel_fpu_end();
--
2.38.1
Use more consistent, unambiguous types for the source and destination
buffer pointer arguments.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 ++--
arch/x86/crypto/ghash-clmulni-intel_glue.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index 2bf871899920..c7b8542facee 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -88,7 +88,7 @@ SYM_FUNC_START_LOCAL(__clmul_gf128mul_ble)
RET
SYM_FUNC_END(__clmul_gf128mul_ble)
-/* void clmul_ghash_mul(char *dst, const u128 *shash) */
+/* void clmul_ghash_mul(u8 *dst, const u128 *shash) */
SYM_FUNC_START(clmul_ghash_mul)
FRAME_BEGIN
movups (%rdi), DATA
@@ -103,7 +103,7 @@ SYM_FUNC_START(clmul_ghash_mul)
SYM_FUNC_END(clmul_ghash_mul)
/*
- * void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
+ * void clmul_ghash_update(u8 *dst, const u8 *src, unsigned int srclen,
* const u128 *shash);
*/
SYM_FUNC_START(clmul_ghash_update)
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 1f1a95f3dd0c..e996627c6583 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -23,9 +23,9 @@
#define GHASH_BLOCK_SIZE 16
#define GHASH_DIGEST_SIZE 16
-void clmul_ghash_mul(char *dst, const u128 *shash);
+void clmul_ghash_mul(u8 *dst, const u128 *shash);
-void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
+void clmul_ghash_update(u8 *dst, const u8 *src, unsigned int srclen,
const u128 *shash);
struct ghash_async_ctx {
--
2.38.1
Wrap each of the calls to clmul_ghash_update and clmul_ghash_mul
in its own set of kernel_fpu_begin and kernel_fpu_end calls, preparing
to limit the amount of data processed by each _update call to avoid
RCU stalls.
This is more like how polyval-clmulni_glue is structured.
Fixes: 0e1227d356e9 ("crypto: ghash - Add PCLMULQDQ accelerated implementation")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index e996627c6583..22367e363d72 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -80,7 +80,6 @@ static int ghash_update(struct shash_desc *desc,
struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
u8 *dst = dctx->buffer;
- kernel_fpu_begin();
if (dctx->bytes) {
int n = min(srclen, dctx->bytes);
u8 *pos = dst + (GHASH_BLOCK_SIZE - dctx->bytes);
@@ -91,10 +90,14 @@ static int ghash_update(struct shash_desc *desc,
while (n--)
*pos++ ^= *src++;
- if (!dctx->bytes)
+ if (!dctx->bytes) {
+ kernel_fpu_begin();
clmul_ghash_mul(dst, &ctx->shash);
+ kernel_fpu_end();
+ }
}
+ kernel_fpu_begin();
clmul_ghash_update(dst, src, srclen, &ctx->shash);
kernel_fpu_end();
--
2.38.1
Limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.
Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while running, leading to:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...
Fixes: 0e1227d356e9 ("crypto: ghash - Add PCLMULQDQ accelerated implementation")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v3 change to static int, simplify while loop
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 28 +++++++++++++++-------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 22367e363d72..0f24c3b23fd2 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -20,8 +20,11 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>
-#define GHASH_BLOCK_SIZE 16
-#define GHASH_DIGEST_SIZE 16
+#define GHASH_BLOCK_SIZE 16U
+#define GHASH_DIGEST_SIZE 16U
+
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 50 * 1024;
void clmul_ghash_mul(u8 *dst, const u128 *shash);
@@ -80,9 +83,11 @@ static int ghash_update(struct shash_desc *desc,
struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
u8 *dst = dctx->buffer;
+ BUILD_BUG_ON(bytes_per_fpu < GHASH_BLOCK_SIZE);
+
if (dctx->bytes) {
int n = min(srclen, dctx->bytes);
- u8 *pos = dst + (GHASH_BLOCK_SIZE - dctx->bytes);
+ u8 *pos = dst + GHASH_BLOCK_SIZE - dctx->bytes;
dctx->bytes -= n;
srclen -= n;
@@ -97,13 +102,18 @@ static int ghash_update(struct shash_desc *desc,
}
}
- kernel_fpu_begin();
- clmul_ghash_update(dst, src, srclen, &ctx->shash);
- kernel_fpu_end();
+ while (srclen >= GHASH_BLOCK_SIZE) {
+ unsigned int chunk = min(srclen, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ clmul_ghash_update(dst, src, chunk, &ctx->shash);
+ kernel_fpu_end();
+
+ src += chunk & ~(GHASH_BLOCK_SIZE - 1);
+ srclen -= chunk & ~(GHASH_BLOCK_SIZE - 1);
+ }
- if (srclen & 0xf) {
- src += srclen - (srclen & 0xf);
- srclen &= 0xf;
+ if (srclen) {
dctx->bytes = GHASH_BLOCK_SIZE - srclen;
while (srclen--)
*dst++ ^= *src++;
--
2.38.1
Use a static const unsigned int for the limit of the number of bytes
processed between kernel_fpu_begin() and kernel_fpu_end() rather than
using the SZ_4K macro (which is a signed value), or a magic value
of 4096U embedded in the C code.
Use unsigned int rather than size_t for some of the arguments to
avoid typecasting for the min() macro.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v3 use static int rather than macro, change to while loops
rather than do/while loops
---
arch/x86/crypto/nhpoly1305-avx2-glue.c | 11 +++++---
arch/x86/crypto/nhpoly1305-sse2-glue.c | 11 +++++---
arch/x86/crypto/poly1305_glue.c | 37 +++++++++++++++++---------
arch/x86/crypto/polyval-clmulni_glue.c | 8 ++++--
4 files changed, 46 insertions(+), 21 deletions(-)
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 8ea5ab0f1ca7..f7dc9c563bb5 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -13,6 +13,9 @@
#include <linux/sizes.h>
#include <asm/simd.h>
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 337 * 1024;
+
asmlinkage void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);
@@ -26,18 +29,20 @@ static void _nh_avx2(const u32 *key, const u8 *message, size_t message_len,
static int nhpoly1305_avx2_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
+ BUILD_BUG_ON(bytes_per_fpu == 0);
+
if (srclen < 64 || !crypto_simd_usable())
return crypto_nhpoly1305_update(desc, src, srclen);
- do {
- unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+ while (srclen) {
+ unsigned int n = min(srclen, bytes_per_fpu);
kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_avx2);
kernel_fpu_end();
src += n;
srclen -= n;
- } while (srclen);
+ }
return 0;
}
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index 2b353d42ed13..daffcc7019ad 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -13,6 +13,9 @@
#include <linux/sizes.h>
#include <asm/simd.h>
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 199 * 1024;
+
asmlinkage void nh_sse2(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);
@@ -26,18 +29,20 @@ static void _nh_sse2(const u32 *key, const u8 *message, size_t message_len,
static int nhpoly1305_sse2_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
+ BUILD_BUG_ON(bytes_per_fpu == 0);
+
if (srclen < 64 || !crypto_simd_usable())
return crypto_nhpoly1305_update(desc, src, srclen);
- do {
- unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+ while (srclen) {
+ unsigned int n = min(srclen, bytes_per_fpu);
kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_sse2);
kernel_fpu_end();
src += n;
srclen -= n;
- } while (srclen);
+ }
return 0;
}
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 1dfb8af48a3c..16831c036d71 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -15,20 +15,27 @@
#include <asm/intel-family.h>
#include <asm/simd.h>
+#define POLY1305_BLOCK_SIZE_MASK (~(POLY1305_BLOCK_SIZE - 1))
+
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 217 * 1024;
+
asmlinkage void poly1305_init_x86_64(void *ctx,
const u8 key[POLY1305_BLOCK_SIZE]);
asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp,
- const size_t len, const u32 padbit);
+ const unsigned int len,
+ const u32 padbit);
asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
const u32 nonce[4]);
asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
const u32 nonce[4]);
-asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, const size_t len,
- const u32 padbit);
-asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp, const size_t len,
- const u32 padbit);
+asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp,
+ const unsigned int len, const u32 padbit);
+asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp,
+ const unsigned int len, const u32 padbit);
asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp,
- const size_t len, const u32 padbit);
+ const unsigned int len,
+ const u32 padbit);
static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx);
static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx2);
@@ -86,14 +93,12 @@ static void poly1305_simd_init(void *ctx, const u8 key[POLY1305_BLOCK_SIZE])
poly1305_init_x86_64(ctx, key);
}
-static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
+static void poly1305_simd_blocks(void *ctx, const u8 *inp, unsigned int len,
const u32 padbit)
{
struct poly1305_arch_internal *state = ctx;
- /* SIMD disables preemption, so relax after processing each page. */
- BUILD_BUG_ON(SZ_4K < POLY1305_BLOCK_SIZE ||
- SZ_4K % POLY1305_BLOCK_SIZE);
+ BUILD_BUG_ON(bytes_per_fpu < POLY1305_BLOCK_SIZE);
if (!static_branch_likely(&poly1305_use_avx) ||
(len < (POLY1305_BLOCK_SIZE * 18) && !state->is_base2_26) ||
@@ -103,8 +108,14 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
return;
}
- do {
- const size_t bytes = min_t(size_t, len, SZ_4K);
+ while (len) {
+ unsigned int bytes;
+
+ if (len < POLY1305_BLOCK_SIZE)
+ bytes = len;
+ else
+ bytes = min(len,
+ bytes_per_fpu & POLY1305_BLOCK_SIZE_MASK);
kernel_fpu_begin();
if (IS_ENABLED(CONFIG_AS_AVX512) && static_branch_likely(&poly1305_use_avx512))
@@ -117,7 +128,7 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
len -= bytes;
inp += bytes;
- } while (len);
+ }
}
static void poly1305_simd_emit(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index b7664d018851..de1c908f7412 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -29,6 +29,9 @@
#define NUM_KEY_POWERS 8
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 393 * 1024;
+
struct polyval_tfm_ctx {
/*
* These powers must be in the order h^8, ..., h^1.
@@ -107,6 +110,8 @@ static int polyval_x86_update(struct shash_desc *desc,
unsigned int nblocks;
unsigned int n;
+ BUILD_BUG_ON(bytes_per_fpu < POLYVAL_BLOCK_SIZE);
+
if (dctx->bytes) {
n = min(srclen, dctx->bytes);
pos = dctx->buffer + POLYVAL_BLOCK_SIZE - dctx->bytes;
@@ -123,8 +128,7 @@ static int polyval_x86_update(struct shash_desc *desc,
}
while (srclen >= POLYVAL_BLOCK_SIZE) {
- /* Allow rescheduling every 4K bytes. */
- nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
+ nblocks = min(srclen, bytes_per_fpu) / POLYVAL_BLOCK_SIZE;
internal_polyval_update(tctx, src, nblocks, dctx->buffer);
srclen -= nblocks * POLYVAL_BLOCK_SIZE;
src += nblocks * POLYVAL_BLOCK_SIZE;
--
2.38.1
On Tue, Nov 15, 2022 at 10:13:28PM -0600, Robert Elliott wrote:
> +/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
> +static const unsigned int bytes_per_fpu = 337 * 1024;
> +
Use an enum for constants like this:
enum { BYTES_PER_FPU = ... };
You can even make it function-local, so it's near the code that uses it,
which will better justify its existence.
Also, where did you get this number? Seems kind of weird.
> asmlinkage void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
> u8 hash[NH_HASH_BYTES]);
>
> @@ -26,18 +29,20 @@ static void _nh_avx2(const u32 *key, const u8 *message, size_t message_len,
> static int nhpoly1305_avx2_update(struct shash_desc *desc,
> const u8 *src, unsigned int srclen)
> {
> + BUILD_BUG_ON(bytes_per_fpu == 0);
Make the constant function local and remove this check.
> +
> if (srclen < 64 || !crypto_simd_usable())
> return crypto_nhpoly1305_update(desc, src, srclen);
>
> - do {
> - unsigned int n = min_t(unsigned int, srclen, SZ_4K);
> + while (srclen) {
Does this add a needless additional check or does it generate better
code? Would be nice to have some explanation of the rationale.
Same comments as above apply for the rest of this patch and series.
On Wed, Nov 16, 2022 at 12:13:51PM +0100, Jason A. Donenfeld wrote:
> On Tue, Nov 15, 2022 at 10:13:28PM -0600, Robert Elliott wrote:
> > +/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
> > +static const unsigned int bytes_per_fpu = 337 * 1024;
> > +
>
> Use an enum for constants like this:
>
> enum { BYTES_PER_FPU = ... };
>
> You can even make it function-local, so it's near the code that uses it,
> which will better justify its existence.
>
> Also, where did you get this number? Seems kind of weird.
These numbers are highly dependent on hardware and I think having
them hard-coded is wrong.
Perhaps we should try a different approach. How about just limiting
the size to 4K, and then depending on need_resched we break out of
the loop? Something like:
if (!len)
return 0;
kernel_fpu_begin();
for (;;) {
unsigned int chunk = min(len, 4096);
sha1_base_do_update(desc, data, chunk, sha1_xform);
len -= chunk;
data += chunk;
if (!len)
break;
if (need_resched()) {
kernel_fpu_end();
cond_resched();
kernel_fpu_begin();
}
}
kernel_fpu_end();
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Friday, November 25, 2022 2:41 AM
> To: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Elliott, Robert (Servers) <elliott@hpe.com>; davem@davemloft.net;
> tim.c.chen@linux.intel.com; ap420073@gmail.com; ardb@kernel.org;
> David.Laight@aculab.com; ebiggers@kernel.org; linux-
> crypto@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
>
> On Wed, Nov 16, 2022 at 12:13:51PM +0100, Jason A. Donenfeld wrote:
> > On Tue, Nov 15, 2022 at 10:13:28PM -0600, Robert Elliott wrote:
> > > +/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
> > > +static const unsigned int bytes_per_fpu = 337 * 1024;
> > > +
> >
> > Use an enum for constants like this:
> >
> > enum { BYTES_PER_FPU = ... };
> >
> > You can even make it function-local, so it's near the code that uses it,
> > which will better justify its existence.
> >
> > Also, where did you get this number? Seems kind of weird.
>
> These numbers are highly dependent on hardware and I think having
> them hard-coded is wrong.
>
> Perhaps we should try a different approach. How about just limiting
> the size to 4K, and then depending on need_resched we break out of
> the loop? Something like:
>
> if (!len)
> return 0;
>
> kernel_fpu_begin();
> for (;;) {
> unsigned int chunk = min(len, 4096);
>
> sha1_base_do_update(desc, data, chunk, sha1_xform);
>
> len -= chunk;
> data += chunk;
>
> if (!len)
> break;
>
> if (need_resched()) {
> kernel_fpu_end();
> cond_resched();
> kernel_fpu_begin();
> }
> }
> kernel_fpu_end();
>
I implemented that conditional approach in the sha algorithms.
The results of a boot (using sha512 for module signatures, with
crypto extra tests enabled, comparing to sha512 with a 20 KiB
fixed limit) are:
sha1 cond: 14479 calls; 784256 cycles doing begin/end; longest FPU context 35828 cycles
sha256 cond: 26763 calls; 1273570 cycles doing begin/end; longest FPU context 118612 cycles
sha512 cond: 26957 calls; 1680046 cycles doing begin/end; longest FPU context 169140982 cycles
sha512 20KiB: 161011 calls; 16232280 cycles doing begin/end; longest FPU context 4049644 cycles
NOTE: I didn't have a patch in place to isolate the counts for each variation
(ssse3 vs. avx vs. avx2) and
- for sha512: sha512 vs. sha384
- for sha256: sha256 vs. sha224
so the numbers include sha256 and sha512 running twice as many tests
as sha1.
This approach looks very good:
- 16% of the number of begin/end calls
- 10% of the CPU cycles spent making the calls
- the FPU context is held for a long time (77 ms) but only while
it's not needed.
That's much more efficient than releasing it every 30 us just in case.
I'll keep testing this to make sure RCU stalls stay away, and apply
the approach to the other algorithms.
In x86, need_resched() has to deal with a PER_CPU variable, so I'm
not sure it's worth the hassle to figure out how to do that from
assembly code.
> > Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
> > Perhaps we should try a different approach. How about just limiting
> > the size to 4K, and then depending on need_resched we break out of
> > the loop? Something like:
> >
> > if (!len)
> > return 0;
> >
> > kernel_fpu_begin();
> > for (;;) {
> > unsigned int chunk = min(len, 4096);
> >
> > sha1_base_do_update(desc, data, chunk, sha1_xform);
> >
> > len -= chunk;
> > data += chunk;
> >
> > if (!len)
> > break;
> >
> > if (need_resched()) {
> > kernel_fpu_end();
> > cond_resched();
> > kernel_fpu_begin();
> > }
> > }
> > kernel_fpu_end();
>
>
> I implemented that conditional approach in the sha algorithms.
>
> The results of a boot (using sha512 for module signatures, with
> crypto extra tests enabled, comparing to sha512 with a 20 KiB
> fixed limit) are:
>
> sha1 cond: 14479 calls; 784256 cycles doing begin/end; longest FPU context 35828 cycles
> sha256 cond: 26763 calls; 1273570 cycles doing begin/end; longest FPU context 118612 cycles
> sha512 cond: 26957 calls; 1680046 cycles doing begin/end; longest FPU context 169140982 cycles
> sha512 20KiB: 161011 calls; 16232280 cycles doing begin/end; longest FPU context 4049644 cycles
>
> NOTE: I didn't have a patch in place to isolate the counts for each variation
> (ssse3 vs. avx vs. avx2) and
> - for sha512: sha512 vs. sha384
> - for sha256: sha256 vs. sha224
> so the numbers include sha256 and sha512 running twice as many tests
> as sha1.
>
> This approach looks very good:
> - 16% of the number of begin/end calls
> - 10% of the CPU cycles spent making the calls
> - the FPU context is held for a long time (77 ms) but only while
> it's not needed.
>
> That's much more efficient than releasing it every 30 us just in case.
How recently did you make this change? I implemented this conditional
approach for ecb_cbc_helpers.h, but saw no changes at all to performance
on serpent-avx2 and twofish-avx.
kernel_fpu_{begin,end} (after the first call to begin) don't do anything
more than enable/disable preemption and make a few writes to the mxcsr.
It's likely that the above approach has the tiniest bit less overhead,
and it will preempt on non CONFIG_PREEMPT kernels, but nothing suggests
a performance uplift.
This brings us back to this question: should crypto routines be
preempted under PREEMPT_VOLUNTARY or not?
> I'll keep testing this to make sure RCU stalls stay away, and apply
> the approach to the other algorithms.
I missed the earlier discussions. Have you seen issues with RCU
stalls/latency spikes because of crypto routines? If so, what preemption
model were you running?
> In x86, need_resched() has to deal with a PER_CPU variable, so I'm
> not sure it's worth the hassle to figure out how to do that from
> assembly code.
Leave it in C. It'll be more maintainable that way.
Cheers,
Peter Lafreniere <peter@n8pjl.ca>
> -----Original Message-----
> From: Peter Lafreniere <peter@n8pjl.ca>
> Sent: Tuesday, December 6, 2022 5:06 PM
> To: Elliott, Robert (Servers) <elliott@hpe.com>
> Subject: RE: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
>
> > > Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
> > > Perhaps we should try a different approach. How about just limiting
> > > the size to 4K, and then depending on need_resched we break out of
> > > the loop? Something like:
> > >
> > > 	if (!len)
> > > 		return 0;
> > >
> > > 	kernel_fpu_begin();
> > > 	for (;;) {
> > > 		unsigned int chunk = min(len, 4096);
> > >
> > > 		sha1_base_do_update(desc, data, chunk, sha1_xform);
> > >
> > > 		len -= chunk;
> > > 		data += chunk;
> > >
> > > 		if (!len)
> > > 			break;
> > >
> > > 		if (need_resched()) {
> > > 			kernel_fpu_end();
> > > 			cond_resched();
> > > 			kernel_fpu_begin();
> > > 		}
> > > 	}
> > > 	kernel_fpu_end();
> >
> >
> > I implemented that conditional approach in the sha algorithms.
> >
> > The results of a boot (using sha512 for module signatures, with
> > crypto extra tests enabled, comparing to sha512 with a 20 KiB
> > fixed limit) are:
> >
> > sha1 cond: 14479 calls; 784256 cycles doing begin/end; longest FPU
> context 35828 cycles
> > sha256 cond: 26763 calls; 1273570 cycles doing begin/end; longest FPU
> context 118612 cycles
> > sha512 cond: 26957 calls; 1680046 cycles doing begin/end; longest FPU
> context 169140982 cycles
> > sha512 20KiB: 161011 calls; 16232280 cycles doing begin/end; longest FPU
> context 4049644 cycles
> >
> > NOTE: I didn't have a patch in place to isolate the counts for each
> variation
> > (ssse3 vs. avx vs. avx2) and
> > - for sha512: sha512 vs. sha384
> > - for sha256: sha256 vs. sha224
> > so the numbers include sha256 and sha512 running twice as many tests
> > as sha1.
> >
> > This approach looks very good:
> > - 16% of the number of begin/end calls
> > - 10% of the CPU cycles spent making the calls
> > - the FPU context is held for a long time (77 ms) but only while
> > it's not needed.
> >
> > That's much more efficient than releasing it every 30 us just in case.
>
> How recently did you make this change? I implemented this conditional
> approach for ecb_cbc_helpers.h, but saw no changes at all to performance
> on serpent-avx2 and twofish-avx.
The hash functions are the main problem; the skciphers receive
requests already broken into 4 KiB chunks by the SG list helpers.
> kernel_fpu_{begin,end} (after the first call to begin) don't do anything
> more than enable/disable preemption and make a few writes to the mxcsr.
> It's likely that the above approach has the tiniest bit less overhead,
> and it will preempt on non CONFIG_PREEMPT kernels, but nothing suggests
> a performance uplift.
>
> > I'll keep testing this to make sure RCU stalls stay away, and apply
> > the approach to the other algorithms.
>
> I missed the earlier discussions. Have you seen issues with RCU
> stalls/latency spikes because of crypto routines? If so, what preemption
> model were you running?
While running Wireshark in Fedora, I noticed the top function consuming
CPU cycles (per "perf top") was sha512_generic.
Although Fedora and RHEL have the x86 optimized driver compiled as a
module, nothing in the distro or application space noticed it was there
and loaded it. The only x86 optimized drivers that do get used are the
ones built-in to the kernel.
After making changes to load the x86 sha512 module, I noticed several
boots over the next few weeks reported RCU stalls, all in the sha512_avx2
function. Because the stack traces take a long time to print to the
serial port, these can trigger soft lockups as well. Fedora and RHEL
default to "Voluntary Kernel Preemption (Desktop)":
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
The reason was that sha512 and all the other x86 crypto hash functions
process the entire data in one kernel_fpu_begin()/end() block, which
blocks preemption. Each boot checks module signatures for about 4000
files, totaling about 2.4 GB. Breaking the loops into smaller chunks
fixes the problem. However, since functions like crc32c are 20x faster
than sha1, one value like 4 KiB is not ideal.
A few non-hash functions have issues too. Although most skciphers are
broken up into 4 KiB chunks by the sg list walking functions, aegis
packs everything inside one kernel_fpu_begin()/end() block. All the
aead functions handle the main data with sg list walking functions,
but handle all the associated data inside one kernel_fpu_begin()/end()
block.
> > In x86, need_resched() has to deal with a PER_CPU variable, so I'm
> > not sure it's worth the hassle to figure out how to do that from
> > assembly code.
>
> Leave it in C. It'll be more maintainable that way.
I'm testing a new kernel_fpu_yield() utility function that looks nice:
void __sha1_transform_avx2(struct sha1_state *state, const u8 *data, int blocks)
{
	if (blocks <= 0)
		return;

	kernel_fpu_begin();
	for (;;) {
		const int chunks = min(blocks, 4096 / SHA1_BLOCK_SIZE);

		sha1_transform_avx2(state->state, data, chunks);
		blocks -= chunks;

		if (blocks <= 0)
			break;

		data += chunks * SHA1_BLOCK_SIZE;
		kernel_fpu_yield();
	}
	kernel_fpu_end();
}
This construction also makes it easy to add debug counters to
observe what is happening.
In a boot with preempt=none and the crypto extra self-tests
enabled, two modules benefitted from that new yield call:
/sys/module/sha256_ssse3/parameters/fpu_rescheds:3
/sys/module/sha512_ssse3/parameters/fpu_rescheds:515
10 passes of 1 MiB buffer tests on all the drivers
shows several others benefitting:
/sys/module/aegis128_aesni/parameters/fpu_rescheds:1
/sys/module/aesni_intel/parameters/fpu_rescheds:0
/sys/module/aria_aesni_avx_x86_64/parameters/fpu_rescheds:45
/sys/module/camellia_aesni_avx2/parameters/fpu_rescheds:0
/sys/module/camellia_aesni_avx_x86_64/parameters/fpu_rescheds:0
/sys/module/camellia_x86_64/parameters/fpu_rescheds:0
/sys/module/cast5_avx_x86_64/parameters/fpu_rescheds:0
/sys/module/cast6_avx_x86_64/parameters/fpu_rescheds:0
/sys/module/chacha_x86_64/parameters/fpu_rescheds:0
/sys/module/crc32c_intel/parameters/fpu_rescheds:1
/sys/module/crc32_pclmul/parameters/fpu_rescheds:1
/sys/module/crct10dif_pclmul/parameters/fpu_rescheds:1
/sys/module/ghash_clmulni_intel/parameters/fpu_rescheds:1
/sys/module/libblake2s_x86_64/parameters/fpu_rescheds:0
/sys/module/nhpoly1305_avx2/parameters/fpu_rescheds:1
/sys/module/nhpoly1305_sse2/parameters/fpu_rescheds:1
/sys/module/poly1305_x86_64/parameters/fpu_rescheds:1
/sys/module/polyval_clmulni/parameters/fpu_rescheds:1
/sys/module/serpent_avx2/parameters/fpu_rescheds:0
/sys/module/serpent_avx_x86_64/parameters/fpu_rescheds:0
/sys/module/serpent_sse2_x86_64/parameters/fpu_rescheds:0
/sys/module/sha1_ssse3/parameters/fpu_rescheds:3
/sys/module/sha256_ssse3/parameters/fpu_rescheds:9
/sys/module/sha512_ssse3/parameters/fpu_rescheds:723
/sys/module/sm3_avx_x86_64/parameters/fpu_rescheds:171
/sys/module/sm4_aesni_avx_x86_64/parameters/fpu_rescheds:0
/sys/module/twofish_avx_x86_64/parameters/fpu_rescheds:0
/sys/module/twofish_x86_64_3way/parameters/fpu_rescheds:0
I'll keep experimenting with all the preempt modes, heavier
workloads, and shorter RCU timeouts to confirm this solution
is robust. It might even be appropriate for the generic
drivers, if they suffer from the problems that sm4 shows here.
> This brings us back to this question: should crypto routines be
> preempted under PREEMPT_VOLUNTARY or not?
I think so. The RCU stall and soft lockup detectors aren't disabled,
so there is still an expectation of sharing the CPUs even in
PREEMPT=none mode.
1 MiB tests under CONFIG_PREEMPT=none triggered soft lockups while
running CBC mode for SM4, Camellia, and Serpent:
[ 208.975253] tcrypt: PERL "cfb-sm4-aesni-avx2" => 22499840,
[ 218.187217] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [modprobe:3433]
...
[ 219.391776] tcrypt: PERL "cbc-sm4-aesni-avx2" => 22528138,
[ 244.471181] tcrypt: PERL "ecb-sm4-aesni-avx" => 4469626,
[ 246.181064] watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [modprobe:3433]
...
[ 250.168239] tcrypt: PERL "cbc-camellia-aesni-avx2" => 12202738,
[ 264.047440] tcrypt: PERL "cbc-cast5-avx" => 17744280,
[ 273.091258] tcrypt: PERL "cbc-cast6-avx" => 19375400,
[ 274.183249] watchdog: BUG: soft lockup - CPU#1 stuck for 78s! [modprobe:3433]
...
[ 283.066260] tcrypt: PERL "cbc-serpent-avx2" => 21454930,
SM4 falls back to the generic driver for encryption; it only has
optimized decryption functions. Therefore, it doesn't make any
kernel_fpu_end() calls and thus makes no rescheduling calls.
This shows the CPU cycles for 1 MiB of encrypt and decrypt for
each algorithm (no soft lockups this time). SM4, Serpent, Cast5,
and Cast6 encryption in CBC mode are the slowest by far.
[ 2233.362748] tcrypt: PERL my %speeds_skcipher = (
[ 2234.427387] tcrypt: PERL "cbc-aes-aesni" => 2178586,
[ 2234.738823] tcrypt: PERL "cbc-aes-aesni" => 538752,
[ 2235.064335] tcrypt: PERL "ctr-aes-aesni" => 574026,
[ 2235.389427] tcrypt: PERL "ctr-aes-aesni" => 574060,
[ 2236.451594] tcrypt: PERL "cts-cbc-aes-aesni" => 2178946,
[ 2236.762174] tcrypt: PERL "cts-cbc-aes-aesni" => 540066,
[ 2237.070371] tcrypt: PERL "ecb-aes-aesni" => 536970,
[ 2237.379549] tcrypt: PERL "ecb-aes-aesni" => 538012,
[ 2237.686137] tcrypt: PERL "xctr-aes-aesni" => 534690,
[ 2237.993315] tcrypt: PERL "xctr-aes-aesni" => 534632,
[ 2238.304077] tcrypt: PERL "xts-aes-aesni" => 542590,
[ 2238.615057] tcrypt: PERL "xts-aes-aesni" => 541296,
[ 2240.233298] tcrypt: PERL "ctr-aria-avx" => 3393212,
[ 2241.849000] tcrypt: PERL "ctr-aria-avx" => 3391982,
[ 2242.081296] tcrypt: PERL "xchacha12-simd" => 370794,
[ 2242.316868] tcrypt: PERL "xchacha12-simd" => 373788,
[ 2242.626165] tcrypt: PERL "xchacha20-simd" => 536310,
[ 2242.936646] tcrypt: PERL "xchacha20-simd" => 537094,
[ 2243.250356] tcrypt: PERL "chacha20-simd" => 540542,
[ 2243.559396] tcrypt: PERL "chacha20-simd" => 536604,
[ 2244.831594] tcrypt: PERL "ctr-sm4-aesni-avx2" => 2642674,
[ 2246.106143] tcrypt: PERL "ctr-sm4-aesni-avx2" => 2640350,
[ 2256.475661] tcrypt: PERL "cfb-sm4-aesni-avx2" => 22496346,
[ 2257.732511] tcrypt: PERL "cfb-sm4-aesni-avx2" => 2604932,
[ 2268.123821] tcrypt: PERL "cbc-sm4-aesni-avx2" => 22528268,
[ 2269.378028] tcrypt: PERL "cbc-sm4-aesni-avx2" => 2601090,
[ 2271.533556] tcrypt: PERL "ctr-sm4-aesni-avx" => 4559648,
[ 2273.688772] tcrypt: PERL "ctr-sm4-aesni-avx" => 4561300,
[ 2284.073187] tcrypt: PERL "cfb-sm4-aesni-avx" => 22499496,
[ 2286.177732] tcrypt: PERL "cfb-sm4-aesni-avx" => 4457588,
[ 2296.569751] tcrypt: PERL "cbc-sm4-aesni-avx" => 22529182,
[ 2298.677312] tcrypt: PERL "cbc-sm4-aesni-avx" => 4457226,
[ 2300.789931] tcrypt: PERL "ecb-sm4-aesni-avx" => 4464282,
[ 2302.899974] tcrypt: PERL "ecb-sm4-aesni-avx" => 4466052,
[ 2308.589365] tcrypt: PERL "cbc-camellia-aesni-avx2" => 12260426,
[ 2309.737064] tcrypt: PERL "cbc-camellia-aesni-avx2" => 2350988,
[ 2315.433319] tcrypt: PERL "cbc-camellia-aesni" => 12248986,
[ 2317.262589] tcrypt: PERL "cbc-camellia-aesni" => 3814202,
[ 2325.460542] tcrypt: PERL "cbc-cast5-avx" => 17739828,
[ 2327.856127] tcrypt: PERL "cbc-cast5-avx" => 5061992,
[ 2336.668992] tcrypt: PERL "cbc-cast6-avx" => 19066440,
[ 2340.470787] tcrypt: PERL "cbc-cast6-avx" => 8147336,
[ 2350.376676] tcrypt: PERL "cbc-serpent-avx2" => 21466002,
[ 2351.646295] tcrypt: PERL "cbc-serpent-avx2" => 2611362,
[ 2361.562736] tcrypt: PERL "cbc-serpent-avx" => 21471118,
[ 2364.019693] tcrypt: PERL "cbc-serpent-avx" => 5201506,
[ 2373.930747] tcrypt: PERL "cbc-serpent-sse2" => 21465594,
[ 2376.697210] tcrypt: PERL "cbc-serpent-sse2" => 5855766,
[ 2380.944596] tcrypt: PERL "cbc-twofish-avx" => 9058090,
[ 2383.308215] tcrypt: PERL "cbc-twofish-avx" => 4989064,
[ 2384.904158] tcrypt: PERL "ecb-aria-avx" => 3299260,
[ 2386.498365] tcrypt: PERL "ecb-aria-avx" => 3297534,
[ 2387.625226] tcrypt: PERL "ecb-camellia-aesni-avx2" => 2306326,
[ 2388.757749] tcrypt: PERL "ecb-camellia-aesni-avx2" => 2312876,
[ 2390.549340] tcrypt: PERL "ecb-camellia-aesni" => 3752534,
[ 2392.335240] tcrypt: PERL "ecb-camellia-aesni" => 3751896,
[ 2394.724956] tcrypt: PERL "ecb-cast5-avx" => 5032914,
[ 2397.116268] tcrypt: PERL "ecb-cast5-avx" => 5041908,
[ 2400.935093] tcrypt: PERL "ecb-cast6-avx" => 8148418,
[ 2404.754816] tcrypt: PERL "ecb-cast6-avx" => 8150448,
[ 2406.025861] tcrypt: PERL "ecb-serpent-avx2" => 2613024,
[ 2407.286682] tcrypt: PERL "ecb-serpent-avx2" => 2602556,
[ 2409.732474] tcrypt: PERL "ecb-serpent-avx" => 5191944,
[ 2412.161829] tcrypt: PERL "ecb-serpent-avx" => 5165230,
[ 2414.678835] tcrypt: PERL "ecb-serpent-sse2" => 5345630,
[ 2417.217632] tcrypt: PERL "ecb-serpent-sse2" => 5331110,
[ 2419.545136] tcrypt: PERL "ecb-twofish-avx" => 4917424,
[ 2421.870457] tcrypt: PERL "ecb-twofish-avx" => 4915194,
[ 2421.870564] tcrypt: PERL );
> I'll keep experimenting with all the preempt modes, heavier
> workloads, and shorter RCU timeouts to confirm this solution
> is robust. It might even be appropriate for the generic
> drivers, if they suffer from the problems that sm4 shows here.
I have a set of patches that's looking promising. It's no longer
generating RCU stall warnings or soft lockups with either x86
drivers or generic drivers (sm4 is particularly taxing).
Test case:
* added 28 clones of the tcrypt module so modprobe can run it
many times in parallel (1 thread per CPU core)
* added 1 MiB big buffer functional tests (compare to
generic results)
* added 1 MiB big buffer speed tests
* 3 windows running
* 28 threads running
* modprobe with each defined test mode in order 1, 2, 3, etc.
* RCU stall timeouts set to shortest supported values
* run in preempt=none, preempt=voluntary, preempt=full modes
Patches include:
* Ard's kmap_local() patch
* Suppress RCU stall warnings during speed tests. Change the
rcu_sysrq_start()/end() functions to be general purpose and
call them from tcrypt test functions that measure time of
a crypto operation
* add crypto_yield() unilaterally in skcipher_walk_done so
it is run even if data is aligned
* add crypto_yield() in aead_encrypt/decrypt so they always
call it like skcipher
* add crypto_yield() at the end each hash update(), digest(),
and finup() function so they always call it like skcipher
* add kernel_fpu_yield() calls every 4 KiB inside x86
kernel_fpu_begin()/end() blocks, so the x86 functions always
yield to the scheduler even when they're bypassing those
helper functions (that now call crypto_yield() more
consistently)
I'll keep trying to break it over the weekend. If it holds
up I'll post the patches next week.
On Fri, Dec 02, 2022 at 06:21:02AM +0000, Elliott, Robert (Servers) wrote:
>
> I'll keep testing this to make sure RCU stalls stay away, and apply
> the approach to the other algorithms.

Thanks for doing all the hard work!

BTW, just a minor nit but you can delete the cond_resched() call
because kernel_fpu_end()/preempt_enable() will do it anyway.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Friday, December 2, 2022 3:25 AM
> To: Elliott, Robert (Servers) <elliott@hpe.com>
> Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
>
> On Fri, Dec 02, 2022 at 06:21:02AM +0000, Elliott, Robert (Servers) wrote:
...
> BTW, just a minor nit but you can delete the cond_resched() call
> because kernel_fpu_end()/preempt_enable() will do it anyway.

That happens under
CONFIG_PREEMPTION=y
(from include/linux/preempt.h and arch/x86/include/asm/preempt.h)

Is calling cond_resched() still helpful if that is not the configuration?
On Fri, Dec 02, 2022 at 04:15:23PM +0000, Elliott, Robert (Servers) wrote:
>
> > -----Original Message-----
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Sent: Friday, December 2, 2022 3:25 AM
> > To: Elliott, Robert (Servers) <elliott@hpe.com>
> > Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
> >
> > On Fri, Dec 02, 2022 at 06:21:02AM +0000, Elliott, Robert (Servers) wrote:
> ...
> > BTW, just a minor nit but you can delete the cond_resched() call
> > because kernel_fpu_end()/preempt_enable() will do it anyway.
>
> That happens under
> CONFIG_PREEMPTION=y
> (from include/linux/preempt.h and arch/x86/include/asm/preempt.h)
>
> Is calling cond_resched() still helpful if that is not the configuration?

Perhaps, but then again perhaps if preemption is off, maybe we
shouldn't even bother with the 4K split. Were the initial
warnings with or without preemption?

Personally I don't really care since I always use preemption.

The PREEMPT Kconfigs do provide a bit of nuance with the split
between PREEMPT_NONE vs. PREEMPT_VOLUNTARY. But perhaps that is
just overkill for our situation.

I'll leave it to you to decide :)

Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> > > BTW, just a minor nit but you can delete the cond_resched() call
> > > because kernel_fpu_end()/preempt_enable() will do it anyway.
> >
> > That happens under
> > CONFIG_PREEMPTION=y
> > (from include/linux/preempt.h and arch/x86/include/asm/preempt.h)
> >
> > Is calling cond_resched() still helpful if that is not the configuration?
>
>
> Perhaps, but then again perhaps if preemption is off, maybe we
> shouldn't even bother with the 4K split. Were the initial
> warnings with or without preemption?
>
> Personally I don't really care since I always use preemption.
>
> The PREEMPT Kconfigs do provide a bit of nuance with the split
> between PREEMPT_NONE vs. PREEMPT_VOLUNTARY. But perhaps that is
> just overkill for our situation.
I was thinking about this a few days ago, and my 2¢ is that it's
probably best to not preempt the kernel in the middle of a crypto
operation under PREEMPT_VOLUNTARY. We're already not preempting during
these operations, and there haven't been complaints of excessive latency
because of these crypto operations.
If we skip the kernel_fpu_{begin,end} pair when not under
CONFIG_PREEMPT, we'll save a significant cycle count that is wasted
currently. See Robert Elliott's numbers on conditional begin/end in sha
to see the benefits of not saving/restoring unnecessarily: "10% of the
CPU cycles spent making the [kernel_fpu_{begin,end}] calls".
> I'll leave it to you to decide :)
One extra thought: commit 827ee47 ("crypto: x86 - add some helper macros
for ECB and CBC modes") mentions that FPU save/restore is done lazily.
I don't know the details, so would that change this discussion?
Thanks for listening,
Peter Lafreniere <peter@n8pjl.ca>
From: Peter Lafreniere
> Sent: 06 December 2022 14:04
>
> > > > BTW, just a minor nit but you can delete the cond_resched() call
> > > > because kernel_fpu_end()/preempt_enable() will do it anyway.
> > >
> > > That happens under
> > > CONFIG_PREEMPTION=y
> > > (from include/linux/preempt.h and arch/x86/include/asm/preempt.h)
> > >
> > > Is calling cond_resched() still helpful if that is not the configuration?
> >
> > Perhaps, but then again perhaps if preemption is off, maybe we
> > shouldn't even bother with the 4K split. Were the initial
> > warnings with or without preemption?
> >
> > Personally I don't really care since I always use preemption.
> >
> > The PREEMPT Kconfigs do provide a bit of nuance with the split
> > between PREEMPT_NONE vs. PREEMPT_VOLUNTARY. But perhaps that is
> > just overkill for our situation.
>
> I was thinking about this a few days ago, and my 2¢ is that it's
> probably best to not preempt the kernel in the middle of a crypto
> operation under PREEMPT_VOLUNTARY. We're already not preempting during
> these operations, and there haven't been complaints of excessive latency
> because of these crypto operations.
...

Probably because the people who have been suffering from (and looking
for) latency issues aren't running crypto tests.

I've found some terrible pre-emption latency issues trying to get RT
processes scheduled in a sensible timeframe.

I wouldn't worry about 100us - I'm doing audio processing every 10ms,
but anything much longer causes problems when trying to use 90+% of
the cpu time for lots of audio channels.

I didn't try a CONFIG_RT kernel, the application needs to run on a
standard 'distro' kernel. In any case I suspect all the extra process
switches (etc) the RT kernel adds will completely kill performance.

I wonder how much it would cost to measure the time spent with
pre-empt disabled (and not checked) and to trace long intervals.
	David
On Fri, 25 Nov 2022 at 09:41, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Wed, Nov 16, 2022 at 12:13:51PM +0100, Jason A. Donenfeld wrote:
> > On Tue, Nov 15, 2022 at 10:13:28PM -0600, Robert Elliott wrote:
> > > +/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
> > > +static const unsigned int bytes_per_fpu = 337 * 1024;
> > > +
> >
> > Use an enum for constants like this:
> >
> > enum { BYTES_PER_FPU = ... };
> >
> > You can even make it function-local, so it's near the code that uses it,
> > which will better justify its existence.
> >
> > Also, where did you get this number? Seems kind of weird.
>
> These numbers are highly dependent on hardware and I think having
> them hard-coded is wrong.
>
> Perhaps we should try a different approach. How about just limiting
> the size to 4K, and then depending on need_resched we break out of
> the loop? Something like:
>
> 	if (!len)
> 		return 0;
>
> 	kernel_fpu_begin();
> 	for (;;) {
> 		unsigned int chunk = min(len, 4096);
>
> 		sha1_base_do_update(desc, data, chunk, sha1_xform);
>
> 		len -= chunk;
> 		data += chunk;
>
> 		if (!len)
> 			break;
>
> 		if (need_resched()) {
> 			kernel_fpu_end();
> 			cond_resched();
> 			kernel_fpu_begin();
> 		}
> 	}
> 	kernel_fpu_end();
>
On arm64, this is implemented in an assembler macro 'cond_yield' so we
don't need to preserve/restore the SIMD state at all if the
yield is not going to result in a call to schedule(). For example, the
SHA3 code keeps 400 bytes of state in registers, which we don't want
to save and reload unless needed. (5f6cb2e617681 'crypto:
arm64/sha512-ce - simplify NEON yield')
So the asm routines will call cond_yield, and return early if a yield
is required, with the number of blocks or bytes left to process as the
return value. The C wrapper just calls the asm routine in a loop until
the return value becomes 0.
That way, we don't need magic values at all, and the yield will occur
as soon as the asm inner loop observes the yield condition so the
latency should be much lower as well.
Note that it is only used in shash implementations, given that they
are the only ones that may receive unbounded inputs.
On Fri, Nov 25, 2022 at 09:59:17AM +0100, Ard Biesheuvel wrote:
>
> On arm64, this is implemented in an assembler macro 'cond_yield' so we
> don't need to preserve/restore the SIMD state at all if the
> yield is not going to result in a call to schedule(). For example, the
> SHA3 code keeps 400 bytes of state in registers, which we don't want
> to save and reload unless needed. (5f6cb2e617681 'crypto:
> arm64/sha512-ce - simplify NEON yield')

Yes this would be optimally done from the assembly code which would
make a difference if they benefited from larger block sizes.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Fri, Nov 25, 2022 at 09:59:17AM +0100, Ard Biesheuvel wrote:
> On arm64, this is implemented in an assembler macro 'cond_yield' so we
> don't need to preserve/restore the SIMD state at all if the
> yield is not going to result in a call to schedule(). For example, the
> SHA3 code keeps 400 bytes of state in registers, which we don't want
> to save and reload unless needed. (5f6cb2e617681 'crypto:
> arm64/sha512-ce - simplify NEON yield')

That sounds like the optimal approach. There is a cost to unnecessary
kernel_fpu_begin()/end() calls - increasing their usage in the x86
sha512 driver added 929 us during one boot. The cond_yield check is
just a few memory reads and conditional branches.

I see that is built on the asm-offsets.c technique mentioned by Dave
Hansen in the x86 aria driver thread.

> Note that it is only used in shash implementations, given that they
> are the only ones that may receive unbounded inputs.

Although typical usage probably doesn't stress this, the length of the
additional associated data presented to aead implementations is
unconstrained as well. At least in x86, they can end up processing
multiple megabytes in one chunk like the hash functions (if the
associated data is a big buffer described by a sg list created with
sg_init_one()).
> On Fri, Nov 25, 2022 at 09:59:17AM +0100, Ard Biesheuvel wrote:
> > Note that it is only used in shash implementations, given that they
> > are the only ones that may receive unbounded inputs.
>
> Although typical usage probably doesn't stress this, the length of the
> additional associated data presented to aead implementations is
> unconstrained as well. At least in x86, they can end up processing
> multiple megabytes in one chunk like the hash functions (if the
> associated data is a big buffer described by a sg list created
> with sg_init_one()).
>
Reviewing the two arm64 aead drivers, aes-ce-ccm-glue.c solves that by
including this in the do/while loop in ccm_calculate_auth_mac():
n = min_t(u32, n, SZ_4K); /* yield NEON at least every 4k */
That was added by 36a916af641 ("crypto: arm64/aes-ccm - yield NEON
when processing auth-only data") in 2021, also relying on
41691c44606b ("crypto: arm64/aes-ccm - reduce NEON begin/end calls
for common case").
ghash-ce-glue.c seems to be missing that in its similar function named
gcm_calculate_auth_mac().
> -----Original Message-----
> From: Jason A. Donenfeld <Jason@zx2c4.com>
> Sent: Wednesday, November 16, 2022 5:14 AM
> To: Elliott, Robert (Servers) <elliott@hpe.com>
> Cc: herbert@gondor.apana.org.au; davem@davemloft.net;
> tim.c.chen@linux.intel.com; ap420073@gmail.com; ardb@kernel.org;
> David.Laight@ACULAB.COM; ebiggers@kernel.org; linux-
> crypto@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
>
> On Tue, Nov 15, 2022 at 10:13:28PM -0600, Robert Elliott wrote:
> > +/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
> > +static const unsigned int bytes_per_fpu = 337 * 1024;
> > +
>
> Use an enum for constants like this:
>
> enum { BYTES_PER_FPU = ... };
>
> You can even make it function-local, so it's near the code that uses it,
> which will better justify its existence.
Using crc32c-intel as an example, the gcc 12.2.1 assembly output is the
same for:
1. const variable per-module
static const unsigned int bytes_per_fpu = 868 * 1024;
2. enum per-module
enum { bytes_per_fpu = 868 * 1024 };
3. enum per function (_update and _finup)
enum { bytes_per_fpu = 868 * 1024 };
Each function gets a movl instruction with that constant, and has the
same compare, add, subtract, and jump instructions.
# ../arch/x86/crypto/crc32c-intel_glue.c:198: unsigned int chunk = min(len, bytes_per_fpu);
movl $888832, %eax #, tmp127
# ../arch/x86/crypto/crc32c-intel_glue.c:171: unsigned int chunk = min(len, bytes_per_fpu);
movl $888832, %r13d #, tmp128
Since enum doesn't guarantee any particular type, those variations
upset the min() macro. min_t() is necessary to eliminate the
compiler warning.
../arch/x86/crypto/crc32c-intel_glue.c: In function ‘crc32c_pcl_intel_update’:
../arch/x86/crypto/crc32c-intel_glue.c:171:97: warning: comparison of distinct pointer types lacks a cast
171 | unsigned int chunk = min(len, bytes_per_fpu);
> Also, where did you get this number? Seems kind of weird.
As described in replies on the v2 patches, I created a tcrypt test that
runs each algorithm on a 1 MiB buffer with no loop limits (and irqs
disabled), picks the best result out of 10 passes, and calculates the
number of bytes that nominally fit in 30 us (on a 2.2 GHz Cascade
Lake CPU).
Actual results with those values vary from 37 to 102 us; it is much
better than running unlimited, but still imperfect.
https://lore.kernel.org/lkml/MW5PR84MB184284FBED63E2D043C93A6FAB369@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/
The hash algorithms seem to congregate around three speeds:
- slow: 10 to 20 KiB for sha* and sm3
- medium: 200 to 400 KiB for poly*
- fast: 600 to 800 KiB for crc*
so it might be preferable to just go with three values (e.g., 16, 256, and
512 KiB). There's a lot of variability in CPU architecture, CPU speeds, and
other system activity that make this impossible to perfect.
It'd be ideal if the loops checked the CPU cycle count rather than
worked on a byte count, but the RDTSC instruction is notoriously slow and
would slow down overall performance.
The RAID6 library supports running a benchmark during each boot to pick
the best implementation to use (not always enabled):
[ 3.341372] raid6: skipped pq benchmark and selected avx512x4
The crypto subsystem does something similar with its self-tests, which could
be expanded to include speed tests to tune the loop values. However, that
slows down boot and could be misled by an NMI or SMI during the test, which
could lead to even worse results.
The selected values could be kept in each file or put in a shared .h file.
Both approaches seem difficult to maintain.
The powerpc modules have paragraph-sized comments explaining their MAX_BYTES
macros (e.g., in arch/powerpc/crypto/sha256-spe-glue.c) that might be a good
model for documenting the values:
* MAX_BYTES defines the number of bytes that are allowed to be processed
* between preempt_disable() and preempt_enable(). SHA256 takes ~2,000
* operations per 64 bytes. e500 cores can issue two arithmetic instructions
* per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
* Thus 1KB of input data will need an estimated maximum of 18,000 cycles.
* Headroom for cache misses included. Even with the low end model clocked
* at 667 MHz this equals to a critical time window of less than 27us.
> > asmlinkage void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
> > u8 hash[NH_HASH_BYTES]);
> >
> > @@ -26,18 +29,20 @@ static void _nh_avx2(const u32 *key, const u8 *message, size_t message_len,
> > static int nhpoly1305_avx2_update(struct shash_desc *desc,
> > const u8 *src, unsigned int srclen)
> > {
> > + BUILD_BUG_ON(bytes_per_fpu == 0);
>
> Make the constant function local and remove this check.
That just makes sure someone editing the source code doesn't pick a value that
will cause the loops to hang; it's stronger than a comment saying "don't set
this to 0". It's only a compile-time check, and doesn't result in any
change to the assembly language output that I can see.
> > +
> > if (srclen < 64 || !crypto_simd_usable())
> > return crypto_nhpoly1305_update(desc, src, srclen);
> >
> > - do {
> > - unsigned int n = min_t(unsigned int, srclen, SZ_4K);
> > + while (srclen) {
>
> Does this add a needless additional check or does it generate better
> code? Would be nice to have some explanation of the rationale.
Each module's assembly function can have different handling of
- length 0
- length < block size
- length < some minimum length
- length < a performance switchover point
- length not a multiple of block size
- current buffer pointer not aligned to block size
Sometimes the C glue logic checks values upfront; sometimes
it doesn't.
The while loops help get them to follow one of two patterns:
while (length)
or
while (length >= BLOCK_SIZE)
and sidestep some of the special handling concerns.
Performance-wise, the patches are either
- adding lots of kernel_fpu_begin() and kernel_fpu_end() calls
(all the ones that were running unlimited)
- removing lots of kernel_fpu_begin() and kernel_fpu_end() calls
(e.g., polyval relaxed from 4 KiB to 383 KiB)
which is much more impactful than the common while loop entry.
I created tcrypt tests that try lengths around all the special
values like 0, 16, 4096, and the selected bytes per FPU size
(comparing results to the generic algorithms like the extended
self-tests), so I think these loops are functionally correct
(I've found that violating the undocumented assumptions of the
assembly functions is a good way to exercise RCU stall
reporting).
From: Elliott, Robert
> Sent: 22 November 2022 05:06
...
> Since enum doesn't guarantee any particular type, those variations
> upset the min() macro. min_t() is necessary to eliminate the
> compiler warning.

Yes, min() is fundamentally broken. min_t() isn't really a solution.
I think min() needs to include something like:
#define min(a, b) \
	__builtin_constant_p(b) && (b) + 0u <= INT_MAX ? \
		((a) < (int)(b) ? (a) : (int)(b)) : \
		...
So in the common case where 'b' is a small constant integer it doesn't
matter whether it is signed or unsigned.
I might try compiling a kernel where min_t() does that instead of the
casts - just to see how many of the casts are actually needed.
	David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Make kernel_fpu_begin() and kernel_fpu_end() calls around each
assembly language function that uses FPU context, rather than
around the entire set (init, ad, crypt, final).
Limit the processing of bulk data based on a module parameter,
so multiple blocks are processed within one FPU context
(associated data is not limited).
Allow the skcipher_walk functions to sleep again, since they are
no longer called inside FPU context.
Motivation: calling crypto_aead_encrypt() with a single scatter-gather
list entry pointing to a 1 MiB plaintext buffer caused the aesni_encrypt
function to receive a length of 1048576 bytes and consume 306348 cycles
within FPU context to process that data.
Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations")
Fixes: ba6771c0a0bc ("crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP")
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/aegis128-aesni-glue.c | 39 ++++++++++++++++++++-------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index 4623189000d8..6e96bdda2811 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -23,6 +23,9 @@
#define AEGIS128_MIN_AUTH_SIZE 8
#define AEGIS128_MAX_AUTH_SIZE 16
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 4 * 1024;
+
asmlinkage void crypto_aegis128_aesni_init(void *state, void *key, void *iv);
asmlinkage void crypto_aegis128_aesni_ad(
@@ -85,15 +88,19 @@ static void crypto_aegis128_aesni_process_ad(
if (pos > 0) {
unsigned int fill = AEGIS128_BLOCK_SIZE - pos;
memcpy(buf.bytes + pos, src, fill);
- crypto_aegis128_aesni_ad(state,
+ kernel_fpu_begin();
+ crypto_aegis128_aesni_ad(state->blocks,
AEGIS128_BLOCK_SIZE,
buf.bytes);
+ kernel_fpu_end();
pos = 0;
left -= fill;
src += fill;
}
- crypto_aegis128_aesni_ad(state, left, src);
+ kernel_fpu_begin();
+ crypto_aegis128_aesni_ad(state->blocks, left, src);
+ kernel_fpu_end();
src += left & ~(AEGIS128_BLOCK_SIZE - 1);
left &= AEGIS128_BLOCK_SIZE - 1;
@@ -110,7 +117,9 @@ static void crypto_aegis128_aesni_process_ad(
if (pos > 0) {
memset(buf.bytes + pos, 0, AEGIS128_BLOCK_SIZE - pos);
- crypto_aegis128_aesni_ad(state, AEGIS128_BLOCK_SIZE, buf.bytes);
+ kernel_fpu_begin();
+ crypto_aegis128_aesni_ad(state->blocks, AEGIS128_BLOCK_SIZE, buf.bytes);
+ kernel_fpu_end();
}
}
@@ -119,15 +128,23 @@ static void crypto_aegis128_aesni_process_crypt(
const struct aegis_crypt_ops *ops)
{
while (walk->nbytes >= AEGIS128_BLOCK_SIZE) {
- ops->crypt_blocks(state,
- round_down(walk->nbytes, AEGIS128_BLOCK_SIZE),
+ unsigned int chunk = min(walk->nbytes, bytes_per_fpu);
+
+ chunk = round_down(chunk, AEGIS128_BLOCK_SIZE);
+
+ kernel_fpu_begin();
+ ops->crypt_blocks(state->blocks, chunk,
walk->src.virt.addr, walk->dst.virt.addr);
- skcipher_walk_done(walk, walk->nbytes % AEGIS128_BLOCK_SIZE);
+ kernel_fpu_end();
+
+ skcipher_walk_done(walk, walk->nbytes - chunk);
}
if (walk->nbytes) {
- ops->crypt_tail(state, walk->nbytes, walk->src.virt.addr,
+ kernel_fpu_begin();
+ ops->crypt_tail(state->blocks, walk->nbytes, walk->src.virt.addr,
walk->dst.virt.addr);
+ kernel_fpu_end();
skcipher_walk_done(walk, 0);
}
}
@@ -172,15 +189,17 @@ static void crypto_aegis128_aesni_crypt(struct aead_request *req,
struct skcipher_walk walk;
struct aegis_state state;
- ops->skcipher_walk_init(&walk, req, true);
+ ops->skcipher_walk_init(&walk, req, false);
kernel_fpu_begin();
+ crypto_aegis128_aesni_init(&state.blocks, ctx->key.bytes, req->iv);
+ kernel_fpu_end();
- crypto_aegis128_aesni_init(&state, ctx->key.bytes, req->iv);
crypto_aegis128_aesni_process_ad(&state, req->src, req->assoclen);
crypto_aegis128_aesni_process_crypt(&state, &walk, ops);
- crypto_aegis128_aesni_final(&state, tag_xor, req->assoclen, cryptlen);
+ kernel_fpu_begin();
+ crypto_aegis128_aesni_final(&state.blocks, tag_xor, req->assoclen, cryptlen);
kernel_fpu_end();
}
--
2.38.1
Don't register and unregister each of the functions from least-
to most-optimized (e.g., SSSE3 then AVX then AVX2); register all
variations.
This enables selecting those other algorithms if needed,
such as for testing with:
modprobe tcrypt mode=300 alg=sha512-avx
modprobe tcrypt mode=400 alg=sha512-avx
Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v3: register all the variations, not just the best one, per
Herbert's feedback. Return -ENODEV if none are successful, 0
if any are successful.
v4: remove driver_name strings that are only used by later
patches no longer included in this series that enhance the
prints. A future patch series might remove existing prints
rather than add and enhance them.
Reported-by: kernel test robot <lkp@intel.com>
---
arch/x86/crypto/sha1_ssse3_glue.c | 132 +++++++++++++--------------
arch/x86/crypto/sha256_ssse3_glue.c | 136 +++++++++++++---------------
arch/x86/crypto/sha512_ssse3_glue.c | 99 +++++++++-----------
3 files changed, 168 insertions(+), 199 deletions(-)
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 4bc77c84b0fb..e75a1060bb5f 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -34,6 +34,13 @@ static const unsigned int bytes_per_fpu_avx2 = 34 * 1024;
static const unsigned int bytes_per_fpu_avx = 30 * 1024;
static const unsigned int bytes_per_fpu_ssse3 = 26 * 1024;
+static int using_x86_ssse3;
+static int using_x86_avx;
+static int using_x86_avx2;
+#ifdef CONFIG_AS_SHA1_NI
+static int using_x86_shani;
+#endif
+
static int sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha1_block_fn *sha1_xform)
@@ -128,17 +135,12 @@ static struct shash_alg sha1_ssse3_alg = {
}
};
-static int register_sha1_ssse3(void)
-{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shash(&sha1_ssse3_alg);
- return 0;
-}
-
static void unregister_sha1_ssse3(void)
{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (using_x86_ssse3) {
crypto_unregister_shash(&sha1_ssse3_alg);
+ using_x86_ssse3 = 0;
+ }
}
asmlinkage void sha1_transform_avx(struct sha1_state *state,
@@ -179,28 +181,12 @@ static struct shash_alg sha1_avx_alg = {
}
};
-static bool avx_usable(void)
-{
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
- if (boot_cpu_has(X86_FEATURE_AVX))
- pr_info("AVX detected but unusable.\n");
- return false;
- }
-
- return true;
-}
-
-static int register_sha1_avx(void)
-{
- if (avx_usable())
- return crypto_register_shash(&sha1_avx_alg);
- return 0;
-}
-
static void unregister_sha1_avx(void)
{
- if (avx_usable())
+ if (using_x86_avx) {
crypto_unregister_shash(&sha1_avx_alg);
+ using_x86_avx = 0;
+ }
}
#define SHA1_AVX2_BLOCK_OPTSIZE 4 /* optimal 4*64 bytes of SHA1 blocks */
@@ -208,16 +194,6 @@ static void unregister_sha1_avx(void)
asmlinkage void sha1_transform_avx2(struct sha1_state *state,
const u8 *data, int blocks);
-static bool avx2_usable(void)
-{
- if (avx_usable() && boot_cpu_has(X86_FEATURE_AVX2)
- && boot_cpu_has(X86_FEATURE_BMI1)
- && boot_cpu_has(X86_FEATURE_BMI2))
- return true;
-
- return false;
-}
-
static void sha1_apply_transform_avx2(struct sha1_state *state,
const u8 *data, int blocks)
{
@@ -263,17 +239,12 @@ static struct shash_alg sha1_avx2_alg = {
}
};
-static int register_sha1_avx2(void)
-{
- if (avx2_usable())
- return crypto_register_shash(&sha1_avx2_alg);
- return 0;
-}
-
static void unregister_sha1_avx2(void)
{
- if (avx2_usable())
+ if (using_x86_avx2) {
crypto_unregister_shash(&sha1_avx2_alg);
+ using_x86_avx2 = 0;
+ }
}
#ifdef CONFIG_AS_SHA1_NI
@@ -315,49 +286,70 @@ static struct shash_alg sha1_ni_alg = {
}
};
-static int register_sha1_ni(void)
-{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
- return crypto_register_shash(&sha1_ni_alg);
- return 0;
-}
-
static void unregister_sha1_ni(void)
{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
+ if (using_x86_shani) {
crypto_unregister_shash(&sha1_ni_alg);
+ using_x86_shani = 0;
+ }
}
#else
-static inline int register_sha1_ni(void) { return 0; }
static inline void unregister_sha1_ni(void) { }
#endif
static int __init sha1_ssse3_mod_init(void)
{
- if (register_sha1_ssse3())
- goto fail;
+ const char *feature_name;
+ int ret;
+
+#ifdef CONFIG_AS_SHA1_NI
+ /* SHA-NI */
+ if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
- if (register_sha1_avx()) {
- unregister_sha1_ssse3();
- goto fail;
+ ret = crypto_register_shash(&sha1_ni_alg);
+ if (!ret)
+ using_x86_shani = 1;
}
+#endif
+
+ /* AVX2 */
+ if (boot_cpu_has(X86_FEATURE_AVX2)) {
- if (register_sha1_avx2()) {
- unregister_sha1_avx();
- unregister_sha1_ssse3();
- goto fail;
+ if (boot_cpu_has(X86_FEATURE_BMI1) &&
+ boot_cpu_has(X86_FEATURE_BMI2)) {
+
+ ret = crypto_register_shash(&sha1_avx2_alg);
+ if (!ret)
+ using_x86_avx2 = 1;
+ }
}
- if (register_sha1_ni()) {
- unregister_sha1_avx2();
- unregister_sha1_avx();
- unregister_sha1_ssse3();
- goto fail;
+ /* AVX */
+ if (boot_cpu_has(X86_FEATURE_AVX)) {
+
+ if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+
+ ret = crypto_register_shash(&sha1_avx_alg);
+ if (!ret)
+ using_x86_avx = 1;
+ }
}
- return 0;
-fail:
+ /* SSSE3 */
+ if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ ret = crypto_register_shash(&sha1_ssse3_alg);
+ if (!ret)
+ using_x86_ssse3 = 1;
+ }
+
+#ifdef CONFIG_AS_SHA1_NI
+ if (using_x86_shani)
+ return 0;
+#endif
+ if (using_x86_avx2 || using_x86_avx || using_x86_ssse3)
+ return 0;
return -ENODEV;
}
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index cdcdf5a80ffe..c6261ede4bae 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -51,6 +51,13 @@ static const unsigned int bytes_per_fpu_ssse3 = 11 * 1024;
asmlinkage void sha256_transform_ssse3(struct sha256_state *state,
const u8 *data, int blocks);
+static int using_x86_ssse3;
+static int using_x86_avx;
+static int using_x86_avx2;
+#ifdef CONFIG_AS_SHA256_NI
+static int using_x86_shani;
+#endif
+
static int _sha256_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha256_block_fn *sha256_xform)
@@ -156,19 +163,13 @@ static struct shash_alg sha256_ssse3_algs[] = { {
}
} };
-static int register_sha256_ssse3(void)
-{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shashes(sha256_ssse3_algs,
- ARRAY_SIZE(sha256_ssse3_algs));
- return 0;
-}
-
static void unregister_sha256_ssse3(void)
{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (using_x86_ssse3) {
crypto_unregister_shashes(sha256_ssse3_algs,
ARRAY_SIZE(sha256_ssse3_algs));
+ using_x86_ssse3 = 0;
+ }
}
asmlinkage void sha256_transform_avx(struct sha256_state *state,
@@ -223,30 +224,13 @@ static struct shash_alg sha256_avx_algs[] = { {
}
} };
-static bool avx_usable(void)
-{
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
- if (boot_cpu_has(X86_FEATURE_AVX))
- pr_info("AVX detected but unusable.\n");
- return false;
- }
-
- return true;
-}
-
-static int register_sha256_avx(void)
-{
- if (avx_usable())
- return crypto_register_shashes(sha256_avx_algs,
- ARRAY_SIZE(sha256_avx_algs));
- return 0;
-}
-
static void unregister_sha256_avx(void)
{
- if (avx_usable())
+ if (using_x86_avx) {
crypto_unregister_shashes(sha256_avx_algs,
ARRAY_SIZE(sha256_avx_algs));
+ using_x86_avx = 0;
+ }
}
asmlinkage void sha256_transform_rorx(struct sha256_state *state,
@@ -301,28 +285,13 @@ static struct shash_alg sha256_avx2_algs[] = { {
}
} };
-static bool avx2_usable(void)
-{
- if (avx_usable() && boot_cpu_has(X86_FEATURE_AVX2) &&
- boot_cpu_has(X86_FEATURE_BMI2))
- return true;
-
- return false;
-}
-
-static int register_sha256_avx2(void)
-{
- if (avx2_usable())
- return crypto_register_shashes(sha256_avx2_algs,
- ARRAY_SIZE(sha256_avx2_algs));
- return 0;
-}
-
static void unregister_sha256_avx2(void)
{
- if (avx2_usable())
+ if (using_x86_avx2) {
crypto_unregister_shashes(sha256_avx2_algs,
ARRAY_SIZE(sha256_avx2_algs));
+ using_x86_avx2 = 0;
+ }
}
#ifdef CONFIG_AS_SHA256_NI
@@ -378,51 +347,72 @@ static struct shash_alg sha256_ni_algs[] = { {
}
} };
-static int register_sha256_ni(void)
-{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
- return crypto_register_shashes(sha256_ni_algs,
- ARRAY_SIZE(sha256_ni_algs));
- return 0;
-}
-
static void unregister_sha256_ni(void)
{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
+ if (using_x86_shani) {
crypto_unregister_shashes(sha256_ni_algs,
ARRAY_SIZE(sha256_ni_algs));
+ using_x86_shani = 0;
+ }
}
#else
-static inline int register_sha256_ni(void) { return 0; }
static inline void unregister_sha256_ni(void) { }
#endif
static int __init sha256_ssse3_mod_init(void)
{
- if (register_sha256_ssse3())
- goto fail;
+ const char *feature_name;
+ int ret;
+
+#ifdef CONFIG_AS_SHA256_NI
+ /* SHA-NI */
+ if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
- if (register_sha256_avx()) {
- unregister_sha256_ssse3();
- goto fail;
+ ret = crypto_register_shashes(sha256_ni_algs,
+ ARRAY_SIZE(sha256_ni_algs));
+ if (!ret)
+ using_x86_shani = 1;
}
+#endif
+
+ /* AVX2 */
+ if (boot_cpu_has(X86_FEATURE_AVX2)) {
- if (register_sha256_avx2()) {
- unregister_sha256_avx();
- unregister_sha256_ssse3();
- goto fail;
+ if (boot_cpu_has(X86_FEATURE_BMI2)) {
+ ret = crypto_register_shashes(sha256_avx2_algs,
+ ARRAY_SIZE(sha256_avx2_algs));
+ if (!ret)
+ using_x86_avx2 = 1;
+ }
}
- if (register_sha256_ni()) {
- unregister_sha256_avx2();
- unregister_sha256_avx();
- unregister_sha256_ssse3();
- goto fail;
+ /* AVX */
+ if (boot_cpu_has(X86_FEATURE_AVX)) {
+
+ if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+ ret = crypto_register_shashes(sha256_avx_algs,
+ ARRAY_SIZE(sha256_avx_algs));
+ if (!ret)
+ using_x86_avx = 1;
+ }
}
- return 0;
-fail:
+ /* SSSE3 */
+ if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ ret = crypto_register_shashes(sha256_ssse3_algs,
+ ARRAY_SIZE(sha256_ssse3_algs));
+ if (!ret)
+ using_x86_ssse3 = 1;
+ }
+
+#ifdef CONFIG_AS_SHA256_NI
+ if (using_x86_shani)
+ return 0;
+#endif
+ if (using_x86_avx2 || using_x86_avx || using_x86_ssse3)
+ return 0;
return -ENODEV;
}
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index c7036cfe2a7e..feae85933270 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -47,6 +47,10 @@ static const unsigned int bytes_per_fpu_ssse3 = 17 * 1024;
asmlinkage void sha512_transform_ssse3(struct sha512_state *state,
const u8 *data, int blocks);
+static int using_x86_ssse3;
+static int using_x86_avx;
+static int using_x86_avx2;
+
static int sha512_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha512_block_fn *sha512_xform)
@@ -152,33 +156,17 @@ static struct shash_alg sha512_ssse3_algs[] = { {
}
} };
-static int register_sha512_ssse3(void)
-{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shashes(sha512_ssse3_algs,
- ARRAY_SIZE(sha512_ssse3_algs));
- return 0;
-}
-
static void unregister_sha512_ssse3(void)
{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (using_x86_ssse3) {
crypto_unregister_shashes(sha512_ssse3_algs,
ARRAY_SIZE(sha512_ssse3_algs));
+ using_x86_ssse3 = 0;
+ }
}
asmlinkage void sha512_transform_avx(struct sha512_state *state,
const u8 *data, int blocks);
-static bool avx_usable(void)
-{
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
- if (boot_cpu_has(X86_FEATURE_AVX))
- pr_info("AVX detected but unusable.\n");
- return false;
- }
-
- return true;
-}
static int sha512_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
@@ -230,19 +218,13 @@ static struct shash_alg sha512_avx_algs[] = { {
}
} };
-static int register_sha512_avx(void)
-{
- if (avx_usable())
- return crypto_register_shashes(sha512_avx_algs,
- ARRAY_SIZE(sha512_avx_algs));
- return 0;
-}
-
static void unregister_sha512_avx(void)
{
- if (avx_usable())
+ if (using_x86_avx) {
crypto_unregister_shashes(sha512_avx_algs,
ARRAY_SIZE(sha512_avx_algs));
+ using_x86_avx = 0;
+ }
}
asmlinkage void sha512_transform_rorx(struct sha512_state *state,
@@ -298,22 +280,6 @@ static struct shash_alg sha512_avx2_algs[] = { {
}
} };
-static bool avx2_usable(void)
-{
- if (avx_usable() && boot_cpu_has(X86_FEATURE_AVX2) &&
- boot_cpu_has(X86_FEATURE_BMI2))
- return true;
-
- return false;
-}
-
-static int register_sha512_avx2(void)
-{
- if (avx2_usable())
- return crypto_register_shashes(sha512_avx2_algs,
- ARRAY_SIZE(sha512_avx2_algs));
- return 0;
-}
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
@@ -324,32 +290,53 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static void unregister_sha512_avx2(void)
{
- if (avx2_usable())
+ if (using_x86_avx2) {
crypto_unregister_shashes(sha512_avx2_algs,
ARRAY_SIZE(sha512_avx2_algs));
+ using_x86_avx2 = 0;
+ }
}
static int __init sha512_ssse3_mod_init(void)
{
+ const char *feature_name;
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (register_sha512_ssse3())
- goto fail;
+ /* AVX2 */
+ if (boot_cpu_has(X86_FEATURE_AVX2)) {
+ if (boot_cpu_has(X86_FEATURE_BMI2)) {
+ ret = crypto_register_shashes(sha512_avx2_algs,
+ ARRAY_SIZE(sha512_avx2_algs));
+ if (!ret)
+ using_x86_avx2 = 1;
+ }
+ }
+
+ /* AVX */
+ if (boot_cpu_has(X86_FEATURE_AVX)) {
- if (register_sha512_avx()) {
- unregister_sha512_ssse3();
- goto fail;
+ if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+ ret = crypto_register_shashes(sha512_avx_algs,
+ ARRAY_SIZE(sha512_avx_algs));
+ if (!ret)
+ using_x86_avx = 1;
+ }
}
- if (register_sha512_avx2()) {
- unregister_sha512_avx();
- unregister_sha512_ssse3();
- goto fail;
+ /* SSSE3 */
+ if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ ret = crypto_register_shashes(sha512_ssse3_algs,
+ ARRAY_SIZE(sha512_ssse3_algs));
+ if (!ret)
+ using_x86_ssse3 = 1;
}
- return 0;
-fail:
+ if (using_x86_avx2 || using_x86_avx || using_x86_ssse3)
+ return 0;
return -ENODEV;
}
--
2.38.1
Narrow the kernel_fpu_begin()/kernel_fpu_end() to just wrap the
assembly functions, not any extra C code around them (which includes
several memcpy() calls).
This reduces unnecessary time in FPU context, in which the scheduler
is prevented from preempting and the RCU subsystem is kept from
doing its work.
Example results measuring a boot, in which SHA-512 is used to check all
module signatures using finup() calls:
Before:
calls maxcycles bpf update finup algorithm module
======== ============ ======== ======== ======== =========== ==============
168390 1233188 19456 0 19456 sha512-avx2 sha512_ssse3
After:
182694 1007224 19456 0 19456 sha512-avx2 sha512_ssse3
That means it stayed in FPU context for 226k fewer clock cycles (which
is 102 microseconds on this system, 18% less).
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/sha1_ssse3_glue.c | 82 ++++++++++++++++++++---------
arch/x86/crypto/sha256_ssse3_glue.c | 67 ++++++++++++++++++-----
arch/x86/crypto/sha512_ssse3_glue.c | 48 ++++++++++++-----
3 files changed, 145 insertions(+), 52 deletions(-)
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index e75a1060bb5f..32f3310e19e2 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -34,6 +34,54 @@ static const unsigned int bytes_per_fpu_avx2 = 34 * 1024;
static const unsigned int bytes_per_fpu_avx = 30 * 1024;
static const unsigned int bytes_per_fpu_ssse3 = 26 * 1024;
+asmlinkage void sha1_transform_ssse3(struct sha1_state *state,
+ const u8 *data, int blocks);
+
+asmlinkage void sha1_transform_avx(struct sha1_state *state,
+ const u8 *data, int blocks);
+
+asmlinkage void sha1_transform_avx2(struct sha1_state *state,
+ const u8 *data, int blocks);
+
+#ifdef CONFIG_AS_SHA1_NI
+asmlinkage void sha1_ni_transform(struct sha1_state *digest, const u8 *data,
+ int rounds);
+#endif
+
+static void fpu_sha1_transform_ssse3(struct sha1_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha1_transform_ssse3(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha1_transform_avx(struct sha1_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha1_transform_avx(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha1_transform_avx2(struct sha1_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha1_transform_avx2(state, data, blocks);
+ kernel_fpu_end();
+}
+
+#ifdef CONFIG_AS_SHA1_NI
+static void fpu_sha1_transform_shani(struct sha1_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha1_ni_transform(state, data, blocks);
+ kernel_fpu_end();
+}
+#endif
+
static int using_x86_ssse3;
static int using_x86_avx;
static int using_x86_avx2;
@@ -60,9 +108,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);
- kernel_fpu_begin();
sha1_base_do_update(desc, data, chunk, sha1_xform);
- kernel_fpu_end();
len -= chunk;
data += chunk;
@@ -81,36 +127,29 @@ static int sha1_finup(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);
- kernel_fpu_begin();
sha1_base_do_update(desc, data, chunk, sha1_xform);
- kernel_fpu_end();
len -= chunk;
data += chunk;
}
- kernel_fpu_begin();
sha1_base_do_finalize(desc, sha1_xform);
- kernel_fpu_end();
return sha1_base_finish(desc, out);
}
-asmlinkage void sha1_transform_ssse3(struct sha1_state *state,
- const u8 *data, int blocks);
-
static int sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha1_update(desc, data, len, bytes_per_fpu_ssse3,
- sha1_transform_ssse3);
+ fpu_sha1_transform_ssse3);
}
static int sha1_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha1_finup(desc, data, len, bytes_per_fpu_ssse3, out,
- sha1_transform_ssse3);
+ fpu_sha1_transform_ssse3);
}
/* Add padding and return the message digest. */
@@ -143,21 +182,18 @@ static void unregister_sha1_ssse3(void)
}
}
-asmlinkage void sha1_transform_avx(struct sha1_state *state,
- const u8 *data, int blocks);
-
static int sha1_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha1_update(desc, data, len, bytes_per_fpu_avx,
- sha1_transform_avx);
+ fpu_sha1_transform_avx);
}
static int sha1_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha1_finup(desc, data, len, bytes_per_fpu_avx, out,
- sha1_transform_avx);
+ fpu_sha1_transform_avx);
}
static int sha1_avx_final(struct shash_desc *desc, u8 *out)
@@ -191,17 +227,14 @@ static void unregister_sha1_avx(void)
#define SHA1_AVX2_BLOCK_OPTSIZE 4 /* optimal 4*64 bytes of SHA1 blocks */
-asmlinkage void sha1_transform_avx2(struct sha1_state *state,
- const u8 *data, int blocks);
-
static void sha1_apply_transform_avx2(struct sha1_state *state,
const u8 *data, int blocks)
{
/* Select the optimal transform based on data block size */
if (blocks >= SHA1_AVX2_BLOCK_OPTSIZE)
- sha1_transform_avx2(state, data, blocks);
+ fpu_sha1_transform_avx2(state, data, blocks);
else
- sha1_transform_avx(state, data, blocks);
+ fpu_sha1_transform_avx(state, data, blocks);
}
static int sha1_avx2_update(struct shash_desc *desc, const u8 *data,
@@ -248,21 +281,18 @@ static void unregister_sha1_avx2(void)
}
#ifdef CONFIG_AS_SHA1_NI
-asmlinkage void sha1_ni_transform(struct sha1_state *digest, const u8 *data,
- int rounds);
-
static int sha1_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha1_update(desc, data, len, bytes_per_fpu_shani,
- sha1_ni_transform);
+ fpu_sha1_transform_shani);
}
static int sha1_ni_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha1_finup(desc, data, len, bytes_per_fpu_shani, out,
- sha1_ni_transform);
+ fpu_sha1_transform_shani);
}
static int sha1_ni_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index c6261ede4bae..839da1b36273 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -51,6 +51,51 @@ static const unsigned int bytes_per_fpu_ssse3 = 11 * 1024;
asmlinkage void sha256_transform_ssse3(struct sha256_state *state,
const u8 *data, int blocks);
+asmlinkage void sha256_transform_avx(struct sha256_state *state,
+ const u8 *data, int blocks);
+
+asmlinkage void sha256_transform_rorx(struct sha256_state *state,
+ const u8 *data, int blocks);
+
+#ifdef CONFIG_AS_SHA256_NI
+asmlinkage void sha256_ni_transform(struct sha256_state *digest,
+ const u8 *data, int rounds);
+#endif
+
+static void fpu_sha256_transform_ssse3(struct sha256_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha256_transform_ssse3(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha256_transform_avx(struct sha256_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha256_transform_avx(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha256_transform_avx2(struct sha256_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha256_transform_rorx(state, data, blocks);
+ kernel_fpu_end();
+}
+
+#ifdef CONFIG_AS_SHA256_NI
+static void fpu_sha256_transform_shani(struct sha256_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha256_ni_transform(state, data, blocks);
+ kernel_fpu_end();
+}
+#endif
+
static int using_x86_ssse3;
static int using_x86_avx;
static int using_x86_avx2;
@@ -77,9 +122,7 @@ static int _sha256_update(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);
- kernel_fpu_begin();
sha256_base_do_update(desc, data, chunk, sha256_xform);
- kernel_fpu_end();
len -= chunk;
data += chunk;
@@ -98,17 +141,13 @@ static int sha256_finup(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);
- kernel_fpu_begin();
sha256_base_do_update(desc, data, chunk, sha256_xform);
- kernel_fpu_end();
len -= chunk;
data += chunk;
}
- kernel_fpu_begin();
sha256_base_do_finalize(desc, sha256_xform);
- kernel_fpu_end();
return sha256_base_finish(desc, out);
}
@@ -117,14 +156,14 @@ static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return _sha256_update(desc, data, len, bytes_per_fpu_ssse3,
- sha256_transform_ssse3);
+ fpu_sha256_transform_ssse3);
}
static int sha256_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha256_finup(desc, data, len, bytes_per_fpu_ssse3,
- out, sha256_transform_ssse3);
+ out, fpu_sha256_transform_ssse3);
}
/* Add padding and return the message digest. */
@@ -179,14 +218,14 @@ static int sha256_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return _sha256_update(desc, data, len, bytes_per_fpu_avx,
- sha256_transform_avx);
+ fpu_sha256_transform_avx);
}
static int sha256_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha256_finup(desc, data, len, bytes_per_fpu_avx,
- out, sha256_transform_avx);
+ out, fpu_sha256_transform_avx);
}
static int sha256_avx_final(struct shash_desc *desc, u8 *out)
@@ -240,14 +279,14 @@ static int sha256_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return _sha256_update(desc, data, len, bytes_per_fpu_avx2,
- sha256_transform_rorx);
+ fpu_sha256_transform_avx2);
}
static int sha256_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha256_finup(desc, data, len, bytes_per_fpu_avx2,
- out, sha256_transform_rorx);
+ out, fpu_sha256_transform_avx2);
}
static int sha256_avx2_final(struct shash_desc *desc, u8 *out)
@@ -302,14 +341,14 @@ static int sha256_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return _sha256_update(desc, data, len, bytes_per_fpu_shani,
- sha256_ni_transform);
+ fpu_sha256_transform_shani);
}
static int sha256_ni_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha256_finup(desc, data, len, bytes_per_fpu_shani,
- out, sha256_ni_transform);
+ out, fpu_sha256_transform_shani);
}
static int sha256_ni_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index feae85933270..48586ab40d55 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -47,6 +47,36 @@ static const unsigned int bytes_per_fpu_ssse3 = 17 * 1024;
asmlinkage void sha512_transform_ssse3(struct sha512_state *state,
const u8 *data, int blocks);
+asmlinkage void sha512_transform_avx(struct sha512_state *state,
+ const u8 *data, int blocks);
+
+asmlinkage void sha512_transform_rorx(struct sha512_state *state,
+ const u8 *data, int blocks);
+
+static void fpu_sha512_transform_ssse3(struct sha512_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha512_transform_ssse3(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha512_transform_avx(struct sha512_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha512_transform_avx(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha512_transform_avx2(struct sha512_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha512_transform_rorx(state, data, blocks);
+ kernel_fpu_end();
+}
+
static int using_x86_ssse3;
static int using_x86_avx;
static int using_x86_avx2;
@@ -70,9 +100,7 @@ static int sha512_update(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);
- kernel_fpu_begin();
sha512_base_do_update(desc, data, chunk, sha512_xform);
- kernel_fpu_end();
len -= chunk;
data += chunk;
@@ -91,17 +119,13 @@ static int sha512_finup(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);
- kernel_fpu_begin();
sha512_base_do_update(desc, data, chunk, sha512_xform);
- kernel_fpu_end();
len -= chunk;
data += chunk;
}
- kernel_fpu_begin();
sha512_base_do_finalize(desc, sha512_xform);
- kernel_fpu_end();
return sha512_base_finish(desc, out);
}
@@ -110,14 +134,14 @@ static int sha512_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha512_update(desc, data, len, bytes_per_fpu_ssse3,
- sha512_transform_ssse3);
+ fpu_sha512_transform_ssse3);
}
static int sha512_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha512_finup(desc, data, len, bytes_per_fpu_ssse3,
- out, sha512_transform_ssse3);
+ out, fpu_sha512_transform_ssse3);
}
/* Add padding and return the message digest. */
@@ -172,14 +196,14 @@ static int sha512_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha512_update(desc, data, len, bytes_per_fpu_avx,
- sha512_transform_avx);
+ fpu_sha512_transform_avx);
}
static int sha512_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha512_finup(desc, data, len, bytes_per_fpu_avx,
- out, sha512_transform_avx);
+ out, fpu_sha512_transform_avx);
}
/* Add padding and return the message digest. */
@@ -234,14 +258,14 @@ static int sha512_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha512_update(desc, data, len, bytes_per_fpu_avx2,
- sha512_transform_rorx);
+ fpu_sha512_transform_avx2);
}
static int sha512_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha512_finup(desc, data, len, bytes_per_fpu_avx2,
- out, sha512_transform_rorx);
+ out, fpu_sha512_transform_avx2);
}
/* Add padding and return the message digest. */
--
2.38.1
Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), add module aliases for x86-optimized crypto modules:
sha1, sha256
based on CPU feature bits so udev gets a chance to load them later in
the boot process, once the filesystems are all mounted.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v3 put device table SHA_NI entries inside CONFIG_SHAn_NI ifdefs,
ensuring the modules build properly with arch/x86/Kconfig.assembler
changed to not set CONFIG_AS_SHA*_NI
---
arch/x86/crypto/sha1_ssse3_glue.c | 15 +++++++++++++++
arch/x86/crypto/sha256_ssse3_glue.c | 15 +++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 32f3310e19e2..806463f57b6d 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -24,6 +24,7 @@
#include <linux/types.h>
#include <crypto/sha1.h>
#include <crypto/sha1_base.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -328,11 +329,25 @@ static void unregister_sha1_ni(void)
static inline void unregister_sha1_ni(void) { }
#endif
+static const struct x86_cpu_id module_cpu_ids[] = {
+#ifdef CONFIG_AS_SHA1_NI
+ X86_MATCH_FEATURE(X86_FEATURE_SHA_NI, NULL),
+#endif
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init sha1_ssse3_mod_init(void)
{
const char *feature_name;
int ret;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
#ifdef CONFIG_AS_SHA1_NI
/* SHA-NI */
if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 839da1b36273..30c8c50c1123 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -38,6 +38,7 @@
#include <crypto/sha2.h>
#include <crypto/sha256_base.h>
#include <linux/string.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -399,11 +400,25 @@ static void unregister_sha256_ni(void)
static inline void unregister_sha256_ni(void) { }
#endif
+static const struct x86_cpu_id module_cpu_ids[] = {
+#ifdef CONFIG_AS_SHA256_NI
+ X86_MATCH_FEATURE(X86_FEATURE_SHA_NI, NULL),
+#endif
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init sha256_ssse3_mod_init(void)
{
const char *feature_name;
int ret;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
#ifdef CONFIG_AS_SHA256_NI
/* SHA-NI */
if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
--
2.38.1
Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), these x86-optimized crypto modules already have
module aliases based on CPU feature bits:
crc32, crc32c, and crct10dif
Rename the unique device table data structure to a generic name
so the code has the same pattern in all the modules.
Remove the print on a device table mismatch from crc32; it is not
present in the other modules, and modules are not supposed to print
unless they are active.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/crc32-pclmul_glue.c | 10 ++++------
arch/x86/crypto/crc32c-intel_glue.c | 6 +++---
arch/x86/crypto/crct10dif-pclmul_glue.c | 6 +++---
3 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index df3dbc754818..d5e889c24bea 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -182,20 +182,18 @@ static struct shash_alg alg = {
}
};
-static const struct x86_cpu_id crc32pclmul_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, crc32pclmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static int __init crc32_pclmul_mod_init(void)
{
-
- if (!x86_match_cpu(crc32pclmul_cpu_id)) {
- pr_info("PCLMULQDQ-NI instructions are not detected.\n");
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- }
+
return crypto_register_shash(&alg);
}
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index f08ed68ec93d..aff132e925ea 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -240,15 +240,15 @@ static struct shash_alg alg = {
}
};
-static const struct x86_cpu_id crc32c_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static int __init crc32c_intel_mod_init(void)
{
- if (!x86_match_cpu(crc32c_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 4f6b8c727d88..a26dbd27da96 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -139,15 +139,15 @@ static struct shash_alg alg = {
}
};
-static const struct x86_cpu_id crct10dif_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, crct10dif_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static int __init crct10dif_intel_mod_init(void)
{
- if (!x86_match_cpu(crct10dif_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
return crypto_register_shash(&alg);
--
2.38.1
Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), add module aliases for x86-optimized crypto modules:
sm3
based on CPU feature bits so udev gets a chance to load them later in
the boot process, once the filesystems are all mounted.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v4 removed second AVX check that is unreachable
---
arch/x86/crypto/sm3_avx_glue.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index 483aaed996ba..c7786874319c 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -15,6 +15,7 @@
#include <linux/types.h>
#include <crypto/sm3.h>
#include <crypto/sm3_base.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -119,14 +120,18 @@ static struct shash_alg sm3_avx_alg = {
}
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init sm3_avx_mod_init(void)
{
const char *feature_name;
- if (!boot_cpu_has(X86_FEATURE_AVX)) {
- pr_info("AVX instruction are not detected.\n");
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- }
if (!boot_cpu_has(X86_FEATURE_BMI2)) {
pr_info("BMI2 instruction are not detected.\n");
--
2.38.1
Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), these x86-optimized crypto modules already have
module aliases based on CPU feature bits:
nhpoly1305
poly1305
polyval
Rename the unique device table data structure to a generic name
so the code has the same pattern in all the modules.
Remove the __maybe_unused attribute from polyval since it is
always used.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v4 Removed CPU feature checks that are unreachable because
the x86_match_cpu call already handles them.
Made poly1305 match on any CPU (X86_FEATURE_ANY), since it provides
a plain x86_64 asm function even when AVX, AVX2, and AVX512F are
not available.
Move polyval into this patch rather than pair with ghash.
Remove __maybe_unused from polyval.
---
arch/x86/crypto/nhpoly1305-avx2-glue.c | 13 +++++++++++--
arch/x86/crypto/nhpoly1305-sse2-glue.c | 9 ++++++++-
arch/x86/crypto/poly1305_glue.c | 10 ++++++++++
arch/x86/crypto/polyval-clmulni_glue.c | 6 +++---
4 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index f7dc9c563bb5..fa415fec5793 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -11,6 +11,7 @@
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -60,10 +61,18 @@ static struct shash_alg nhpoly1305_alg = {
.descsize = sizeof(struct nhpoly1305_state),
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init nhpoly1305_mod_init(void)
{
- if (!boot_cpu_has(X86_FEATURE_AVX2) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE))
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_OSXSAVE))
return -ENODEV;
return crypto_register_shash(&nhpoly1305_alg);
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index daffcc7019ad..c47765e46236 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -11,6 +11,7 @@
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -60,9 +61,15 @@ static struct shash_alg nhpoly1305_alg = {
.descsize = sizeof(struct nhpoly1305_state),
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_XMM2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init nhpoly1305_mod_init(void)
{
- if (!boot_cpu_has(X86_FEATURE_XMM2))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
return crypto_register_shash(&nhpoly1305_alg);
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 16831c036d71..f1e39e23b2a3 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
#include <asm/simd.h>
@@ -268,8 +269,17 @@ static struct shash_alg alg = {
},
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init poly1305_simd_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (boot_cpu_has(X86_FEATURE_AVX) &&
cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
static_branch_enable(&poly1305_use_avx);
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index de1c908f7412..b98e32f8e2a4 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -176,15 +176,15 @@ static struct shash_alg polyval_alg = {
},
};
-__maybe_unused static const struct x86_cpu_id pcmul_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static int __init polyval_clmulni_mod_init(void)
{
- if (!x86_match_cpu(pcmul_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_AVX))
--
2.38.1
On Tue, Nov 15, 2022 at 10:13:35PM -0600, Robert Elliott wrote:
> +static const struct x86_cpu_id module_cpu_ids[] = {
> + X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
> + {}
> +};
> +MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
> +
> static int __init poly1305_simd_mod_init(void)
> {
> + if (!x86_match_cpu(module_cpu_ids))
> + return -ENODEV;
What exactly does this accomplish? Isn't this just a no-op?
Jason
Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), these x86-optimized crypto modules already have
module aliases based on CPU feature bits:
ghash
Rename the unique device table data structure to a generic name
so the code has the same pattern in all the modules.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v4 move polyval into a separate patch
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 0f24c3b23fd2..d19a8e9b34a6 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -325,17 +325,17 @@ static struct ahash_alg ghash_async_alg = {
},
};
-static const struct x86_cpu_id pcmul_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL), /* Pickle-Mickle-Duck */
{}
};
-MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static int __init ghash_pclmulqdqni_mod_init(void)
{
int err;
- if (!x86_match_cpu(pcmul_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
err = crypto_register_shash(&ghash_alg);
--
2.38.1
Change the type of the GCM auth_tag_len argument and derivative
variables from unsigned long to unsigned int, so they preserve the
type returned by crypto_aead_authsize().
Continue to pass it to the asm functions as an unsigned long,
but let those function calls be the place where the conversion
to the possibly larger type occurs.
This avoids possible truncation for calculations like:
scatterwalk_map_and_copy(auth_tag_msg, req->src,
req->assoclen + req->cryptlen - auth_tag_len,
auth_tag_len, 0);
whose third argument is an unsigned int. If unsigned long is wider
than unsigned int, that calculation is performed at the wider width
and then silently truncated at the call.
Use unsigned int rather than int for intermediate variables
containing byte counts and block counts, since all the functions
using them accept unsigned int arguments.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/aesni-intel_glue.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..921680373855 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -381,7 +381,7 @@ static int cts_cbc_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
- int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+ unsigned int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
struct scatterlist *src = req->src, *dst = req->dst;
struct scatterlist sg_src[2], sg_dst[2];
struct skcipher_request subreq;
@@ -437,7 +437,7 @@ static int cts_cbc_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
- int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+ unsigned int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
struct scatterlist *src = req->src, *dst = req->dst;
struct scatterlist sg_src[2], sg_dst[2];
struct skcipher_request subreq;
@@ -671,11 +671,11 @@ static int generic_gcmaes_set_authsize(struct crypto_aead *tfm,
static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
unsigned int assoclen, u8 *hash_subkey,
u8 *iv, void *aes_ctx, u8 *auth_tag,
- unsigned long auth_tag_len)
+ unsigned int auth_tag_len)
{
u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
- unsigned long left = req->cryptlen;
+ unsigned int left = req->cryptlen;
struct scatter_walk assoc_sg_walk;
struct skcipher_walk walk;
bool do_avx, do_avx2;
@@ -782,7 +782,7 @@ static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
u8 *hash_subkey, u8 *iv, void *aes_ctx)
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
- unsigned long auth_tag_len = crypto_aead_authsize(tfm);
+ unsigned int auth_tag_len = crypto_aead_authsize(tfm);
u8 auth_tag[16];
int err;
@@ -801,7 +801,7 @@ static int gcmaes_decrypt(struct aead_request *req, unsigned int assoclen,
u8 *hash_subkey, u8 *iv, void *aes_ctx)
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
- unsigned long auth_tag_len = crypto_aead_authsize(tfm);
+ unsigned int auth_tag_len = crypto_aead_authsize(tfm);
u8 auth_tag_msg[16];
u8 auth_tag[16];
int err;
@@ -907,7 +907,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
- int tail = req->cryptlen % AES_BLOCK_SIZE;
+ unsigned int tail = req->cryptlen % AES_BLOCK_SIZE;
struct skcipher_request subreq;
struct skcipher_walk walk;
int err;
@@ -920,7 +920,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
return err;
if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
- int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+ unsigned int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
skcipher_walk_abort(&walk);
@@ -945,7 +945,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
while (walk.nbytes > 0) {
- int nbytes = walk.nbytes;
+ unsigned int nbytes = walk.nbytes;
if (nbytes < walk.total)
nbytes &= ~(AES_BLOCK_SIZE - 1);
--
2.38.1
Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), add module aliases based on CPU feature bits for
modules not implementing hash algorithms:
aegis, aesni, aria
blake2s, blowfish
camellia, cast5, cast6, chacha, curve25519
des3_ede
serpent, sm4
twofish
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
v4 Remove CPU feature checks that are unreachable because
x86_match_cpu already handles them. Make curve25519 match
on ADX and check BMI2.
---
arch/x86/crypto/aegis128-aesni-glue.c | 10 +++++++++-
arch/x86/crypto/aesni-intel_glue.c | 6 +++---
arch/x86/crypto/aria_aesni_avx_glue.c | 15 ++++++++++++---
arch/x86/crypto/blake2s-glue.c | 12 +++++++++++-
arch/x86/crypto/blowfish_glue.c | 10 ++++++++++
arch/x86/crypto/camellia_aesni_avx2_glue.c | 17 +++++++++++++----
arch/x86/crypto/camellia_aesni_avx_glue.c | 15 ++++++++++++---
arch/x86/crypto/camellia_glue.c | 10 ++++++++++
arch/x86/crypto/cast5_avx_glue.c | 10 ++++++++++
arch/x86/crypto/cast6_avx_glue.c | 10 ++++++++++
arch/x86/crypto/chacha_glue.c | 11 +++++++++--
arch/x86/crypto/curve25519-x86_64.c | 19 ++++++++++++++-----
arch/x86/crypto/des3_ede_glue.c | 10 ++++++++++
arch/x86/crypto/serpent_avx2_glue.c | 14 ++++++++++++--
arch/x86/crypto/serpent_avx_glue.c | 10 ++++++++++
arch/x86/crypto/serpent_sse2_glue.c | 11 ++++++++---
arch/x86/crypto/sm4_aesni_avx2_glue.c | 13 +++++++++++--
arch/x86/crypto/sm4_aesni_avx_glue.c | 15 ++++++++++++---
arch/x86/crypto/twofish_avx_glue.c | 10 ++++++++++
arch/x86/crypto/twofish_glue.c | 10 ++++++++++
arch/x86/crypto/twofish_glue_3way.c | 10 ++++++++++
21 files changed, 216 insertions(+), 32 deletions(-)
diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index 6e96bdda2811..a3ebd018953c 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -282,12 +282,20 @@ static struct aead_alg crypto_aegis128_aesni_alg = {
}
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_aead_alg *simd_alg;
static int __init crypto_aegis128_aesni_module_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_XMM2) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
!cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
return -ENODEV;
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 921680373855..0505d4f9d2a2 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1228,17 +1228,17 @@ static struct aead_alg aesni_aeads[0];
static struct simd_aead_alg *aesni_simd_aeads[ARRAY_SIZE(aesni_aeads)];
-static const struct x86_cpu_id aesni_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static int __init aesni_init(void)
{
int err;
- if (!x86_match_cpu(aesni_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_AVX2)) {
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index c561ea4fefa5..6a135203a767 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -5,6 +5,7 @@
* Copyright (c) 2022 Taehee Yoo <ap420073@gmail.com>
*/
+#include <asm/cpu_device_id.h>
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/aria.h>
@@ -165,14 +166,22 @@ static struct skcipher_alg aria_algs[] = {
static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init aria_avx_init(void)
{
const char *feature_name;
- if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
+ pr_info("AES or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}
diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index aaba21230528..df757d18a35a 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -10,7 +10,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
-
+#include <asm/cpu_device_id.h>
#include <asm/cpufeature.h>
#include <asm/fpu/api.h>
#include <asm/processor.h>
@@ -55,8 +55,18 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,
}
EXPORT_SYMBOL(blake2s_compress);
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX512VL, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init blake2s_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (boot_cpu_has(X86_FEATURE_SSSE3))
static_branch_enable(&blake2s_use_ssse3);
diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 019c64c1340a..4c0ead71b198 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -15,6 +15,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>
/* regular block cipher functions */
asmlinkage void __blowfish_enc_blk(struct bf_ctx *ctx, u8 *dst, const u8 *src,
@@ -303,10 +304,19 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init blowfish_init(void)
{
int err;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"blowfish-x86_64: performance on this CPU "
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index e7e4d64e9577..6c48fc9f3fde 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -11,6 +11,7 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>
#include "camellia.h"
#include "ecb_cbc_helpers.h"
@@ -98,17 +99,25 @@ static struct skcipher_alg camellia_algs[] = {
},
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];
static int __init camellia_aesni_init(void)
{
const char *feature_name;
- if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AVX2) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_AES) ||
+ !boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX2 or AES-NI instructions are not detected.\n");
+ pr_info("AES-NI, AVX, or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index c7ccf63e741e..6d7fc96d242e 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -11,6 +11,7 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>
#include "camellia.h"
#include "ecb_cbc_helpers.h"
@@ -98,16 +99,24 @@ static struct skcipher_alg camellia_algs[] = {
}
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];
static int __init camellia_aesni_init(void)
{
const char *feature_name;
- if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
+ pr_info("AES-NI or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}
diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index d45e9c0c42ac..a3df1043ed73 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -8,6 +8,7 @@
* Copyright (C) 2006 NTT (Nippon Telegraph and Telephone Corporation)
*/
+#include <asm/cpu_device_id.h>
#include <asm/unaligned.h>
#include <linux/crypto.h>
#include <linux/init.h>
@@ -1377,10 +1378,19 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init camellia_init(void)
{
int err;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"camellia-x86_64: performance on this CPU "
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index 3976a87f92ad..bdc3c763334c 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -13,6 +13,7 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>
#include "ecb_cbc_helpers.h"
@@ -93,12 +94,21 @@ static struct skcipher_alg cast5_algs[] = {
}
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *cast5_simd_algs[ARRAY_SIZE(cast5_algs)];
static int __init cast5_init(void)
{
const char *feature_name;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 7e2aea372349..addca34b3511 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -15,6 +15,7 @@
#include <crypto/algapi.h>
#include <crypto/cast6.h>
#include <crypto/internal/simd.h>
+#include <asm/cpu_device_id.h>
#include "ecb_cbc_helpers.h"
@@ -93,12 +94,21 @@ static struct skcipher_alg cast6_algs[] = {
},
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *cast6_simd_algs[ARRAY_SIZE(cast6_algs)];
static int __init cast6_init(void)
{
const char *feature_name;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 7b3a1cf0984b..546ab0abf30c 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -13,6 +13,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
asmlinkage void chacha_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
@@ -276,10 +277,16 @@ static struct skcipher_alg algs[] = {
},
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init chacha_simd_mod_init(void)
{
- if (!boot_cpu_has(X86_FEATURE_SSSE3))
- return 0;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
static_branch_enable(&chacha_use_simd);
diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index d55fa9e9b9e6..ae7536b17bf9 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -12,7 +12,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/scatterlist.h>
-
+#include <asm/cpu_device_id.h>
#include <asm/cpufeature.h>
#include <asm/processor.h>
@@ -1697,13 +1697,22 @@ static struct kpp_alg curve25519_alg = {
.max_size = curve25519_max_size,
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ADX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static int __init curve25519_mod_init(void)
{
- if (boot_cpu_has(X86_FEATURE_BMI2) && boot_cpu_has(X86_FEATURE_ADX))
- static_branch_enable(&curve25519_use_bmi2_adx);
- else
- return 0;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_BMI2))
+ return -ENODEV;
+
+ static_branch_enable(&curve25519_use_bmi2_adx);
+
return IS_REACHABLE(CONFIG_CRYPTO_KPP) ?
crypto_register_kpp(&curve25519_alg) : 0;
}
diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index abb8b1fe123b..168cac5c6ca6 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -15,6 +15,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>
struct des3_ede_x86_ctx {
struct des3_ede_ctx enc;
@@ -354,10 +355,19 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init des3_ede_x86_init(void)
{
int err;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
pr_info("des3_ede-x86_64: performance on this CPU would be suboptimal: disabling des3_ede-x86_64.\n");
return -ENODEV;
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index 347e97f4b713..bc18149fb928 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -12,6 +12,7 @@
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/serpent.h>
+#include <asm/cpu_device_id.h>
#include "serpent-avx.h"
#include "ecb_cbc_helpers.h"
@@ -94,14 +95,23 @@ static struct skcipher_alg serpent_algs[] = {
},
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
static int __init serpent_avx2_init(void)
{
const char *feature_name;
- if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX2 instructions are not detected.\n");
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ pr_info("OSXSAVE instructions are not detected.\n");
return -ENODEV;
}
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 6c248e1ea4ef..0db18d99da50 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -15,6 +15,7 @@
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/serpent.h>
+#include <asm/cpu_device_id.h>
#include "serpent-avx.h"
#include "ecb_cbc_helpers.h"
@@ -100,12 +101,21 @@ static struct skcipher_alg serpent_algs[] = {
},
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
static int __init serpent_init(void)
{
const char *feature_name;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
diff --git a/arch/x86/crypto/serpent_sse2_glue.c b/arch/x86/crypto/serpent_sse2_glue.c
index d78f37e9b2cf..74f0c89f55ef 100644
--- a/arch/x86/crypto/serpent_sse2_glue.c
+++ b/arch/x86/crypto/serpent_sse2_glue.c
@@ -20,6 +20,7 @@
#include <crypto/b128ops.h>
#include <crypto/internal/simd.h>
#include <crypto/serpent.h>
+#include <asm/cpu_device_id.h>
#include "serpent-sse2.h"
#include "ecb_cbc_helpers.h"
@@ -103,14 +104,18 @@ static struct skcipher_alg serpent_algs[] = {
},
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_XMM2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
static int __init serpent_sse2_init(void)
{
- if (!boot_cpu_has(X86_FEATURE_XMM2)) {
- printk(KERN_INFO "SSE2 instructions are not detected.\n");
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- }
return simd_register_skciphers_compat(serpent_algs,
ARRAY_SIZE(serpent_algs),
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 84bc718f49a3..125b00db89b1 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
@@ -126,6 +127,12 @@ static struct skcipher_alg sm4_aesni_avx2_skciphers[] = {
}
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx2_skciphers[ARRAY_SIZE(sm4_aesni_avx2_skciphers)];
@@ -133,11 +140,13 @@ static int __init sm4_init(void)
{
const char *feature_name;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX2 or AES-NI instructions are not detected.\n");
+ pr_info("AVX, AES-NI, and/or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index 7800f77d68ad..ac8182b197cf 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
@@ -445,6 +446,12 @@ static struct skcipher_alg sm4_aesni_avx_skciphers[] = {
}
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx_skciphers[ARRAY_SIZE(sm4_aesni_avx_skciphers)];
@@ -452,10 +459,12 @@ static int __init sm4_init(void)
{
const char *feature_name;
- if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
+ pr_info("AES-NI or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 3eb3440b477a..4657e6efc35d 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -15,6 +15,7 @@
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/twofish.h>
+#include <asm/cpu_device_id.h>
#include "twofish.h"
#include "ecb_cbc_helpers.h"
@@ -103,12 +104,21 @@ static struct skcipher_alg twofish_algs[] = {
},
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *twofish_simd_algs[ARRAY_SIZE(twofish_algs)];
static int __init twofish_init(void)
{
const char *feature_name;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
diff --git a/arch/x86/crypto/twofish_glue.c b/arch/x86/crypto/twofish_glue.c
index f9c4adc27404..ade98aef3402 100644
--- a/arch/x86/crypto/twofish_glue.c
+++ b/arch/x86/crypto/twofish_glue.c
@@ -43,6 +43,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>
asmlinkage void twofish_enc_blk(struct twofish_ctx *ctx, u8 *dst,
const u8 *src);
@@ -81,8 +82,17 @@ static struct crypto_alg alg = {
}
};
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init twofish_glue_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
return crypto_register_alg(&alg);
}
diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 90454cf18e0d..790e5a59a9a7 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -11,6 +11,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>
#include "twofish.h"
#include "ecb_cbc_helpers.h"
@@ -140,8 +141,17 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init twofish_3way_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"twofish-x86_64-3way: performance on this CPU "
--
2.38.1
On Tue, Nov 15, 2022 at 10:13:38PM -0600, Robert Elliott wrote:
> diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
> index d55fa9e9b9e6..ae7536b17bf9 100644
> --- a/arch/x86/crypto/curve25519-x86_64.c
> +++ b/arch/x86/crypto/curve25519-x86_64.c
> @@ -12,7 +12,7 @@
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/scatterlist.h>
> -
> +#include <asm/cpu_device_id.h>
> #include <asm/cpufeature.h>
> #include <asm/processor.h>
>
> @@ -1697,13 +1697,22 @@ static struct kpp_alg curve25519_alg = {
> .max_size = curve25519_max_size,
> };
>
> +static const struct x86_cpu_id module_cpu_ids[] = {
> + X86_MATCH_FEATURE(X86_FEATURE_ADX, NULL),
> + {}
> +};
> +MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
>
> static int __init curve25519_mod_init(void)
> {
> - if (boot_cpu_has(X86_FEATURE_BMI2) && boot_cpu_has(X86_FEATURE_ADX))
> - static_branch_enable(&curve25519_use_bmi2_adx);
> - else
> - return 0;
> + if (!x86_match_cpu(module_cpu_ids))
> + return -ENODEV;
> +
> + if (!boot_cpu_has(X86_FEATURE_BMI2))
> + return -ENODEV;
> +
> + static_branch_enable(&curve25519_use_bmi2_adx);
Can the user still insmod this? If so, you can't remove the ADX check.
Ditto for rest of patch.
For modules that have multiple choices, add read-only module parameters
reporting which CPU features a module is using.
The parameters show up as follows for modules that modify the behavior
of their registered drivers or register additional drivers for
each choice:
/sys/module/aesni_intel/parameters/using_x86_avx:1
/sys/module/aesni_intel/parameters/using_x86_avx2:1
/sys/module/aria_aesni_avx_x86_64/parameters/using_x86_gfni:0
/sys/module/chacha_x86_64/parameters/using_x86_avx2:1
/sys/module/chacha_x86_64/parameters/using_x86_avx512:1
/sys/module/crc32c_intel/parameters/using_x86_pclmulqdq:1
/sys/module/curve25519_x86_64/parameters/using_x86_adx:1
/sys/module/libblake2s_x86_64/parameters/using_x86_avx512:1
/sys/module/libblake2s_x86_64/parameters/using_x86_ssse3:1
/sys/module/poly1305_x86_64/parameters/using_x86_avx:1
/sys/module/poly1305_x86_64/parameters/using_x86_avx2:1
/sys/module/poly1305_x86_64/parameters/using_x86_avx512:0
/sys/module/sha1_ssse3/parameters/using_x86_avx:1
/sys/module/sha1_ssse3/parameters/using_x86_avx2:1
/sys/module/sha1_ssse3/parameters/using_x86_shani:0
/sys/module/sha1_ssse3/parameters/using_x86_ssse3:1
/sys/module/sha256_ssse3/parameters/using_x86_avx:1
/sys/module/sha256_ssse3/parameters/using_x86_avx2:1
/sys/module/sha256_ssse3/parameters/using_x86_shani:0
/sys/module/sha256_ssse3/parameters/using_x86_ssse3:1
/sys/module/sha512_ssse3/parameters/using_x86_avx:1
/sys/module/sha512_ssse3/parameters/using_x86_avx2:1
/sys/module/sha512_ssse3/parameters/using_x86_ssse3:1
Delete the aesni_intel prints reporting those selections:
pr_info("AVX2 version of gcm_enc/dec engaged.\n");
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/aesni-intel_glue.c | 19 ++++++++-----------
arch/x86/crypto/aria_aesni_avx_glue.c | 6 ++++++
arch/x86/crypto/blake2s-glue.c | 5 +++++
arch/x86/crypto/chacha_glue.c | 5 +++++
arch/x86/crypto/crc32c-intel_glue.c | 6 ++++++
arch/x86/crypto/curve25519-x86_64.c | 3 +++
arch/x86/crypto/poly1305_glue.c | 7 +++++++
arch/x86/crypto/sha1_ssse3_glue.c | 11 +++++++++++
arch/x86/crypto/sha256_ssse3_glue.c | 20 +++++++++++---------
arch/x86/crypto/sha512_ssse3_glue.c | 7 +++++++
10 files changed, 69 insertions(+), 20 deletions(-)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 0505d4f9d2a2..80dbf98c53fd 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1228,6 +1228,11 @@ static struct aead_alg aesni_aeads[0];
static struct simd_aead_alg *aesni_simd_aeads[ARRAY_SIZE(aesni_aeads)];
+module_param_named(using_x86_avx2, gcm_use_avx2.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx, gcm_use_avx.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2 (for GCM mode)");
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX (for CTR and GCM modes)");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
{}
@@ -1241,22 +1246,14 @@ static int __init aesni_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
#ifdef CONFIG_X86_64
- if (boot_cpu_has(X86_FEATURE_AVX2)) {
- pr_info("AVX2 version of gcm_enc/dec engaged.\n");
- static_branch_enable(&gcm_use_avx);
+ if (boot_cpu_has(X86_FEATURE_AVX2))
static_branch_enable(&gcm_use_avx2);
- } else
+
if (boot_cpu_has(X86_FEATURE_AVX)) {
- pr_info("AVX version of gcm_enc/dec engaged.\n");
static_branch_enable(&gcm_use_avx);
- } else {
- pr_info("SSE version of gcm_enc/dec engaged.\n");
- }
- if (boot_cpu_has(X86_FEATURE_AVX)) {
- /* optimize performance of ctr mode encryption transform */
static_call_update(aesni_ctr_enc_tfm, aesni_ctr_enc_avx_tfm);
- pr_info("AES CTR mode by8 optimization enabled\n");
}
+
#endif /* CONFIG_X86_64 */
err = crypto_register_alg(&aesni_cipher_alg);
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index 6a135203a767..9fd3d1fe1105 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -166,6 +166,10 @@ static struct skcipher_alg aria_algs[] = {
static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];
+static int using_x86_gfni;
+module_param(using_x86_gfni, int, 0444);
+MODULE_PARM_DESC(using_x86_gfni, "Using x86 instruction set extensions: GF-NI");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
{}
@@ -192,6 +196,7 @@ static int __init aria_avx_init(void)
}
if (boot_cpu_has(X86_FEATURE_GFNI)) {
+ using_x86_gfni = 1;
aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way;
@@ -210,6 +215,7 @@ static void __exit aria_avx_exit(void)
{
simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
aria_simd_algs);
+ using_x86_gfni = 0;
}
module_init(aria_avx_init);
diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index df757d18a35a..781cf9471cb6 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -55,6 +55,11 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,
}
EXPORT_SYMBOL(blake2s_compress);
+module_param_named(using_x86_ssse3, blake2s_use_ssse3.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx512vl, blake2s_use_avx512.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_ssse3, "Using x86 instruction set extensions: SSSE3");
+MODULE_PARM_DESC(using_x86_avx512vl, "Using x86 instruction set extensions: AVX-512VL");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
X86_MATCH_FEATURE(X86_FEATURE_AVX512VL, NULL),
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 546ab0abf30c..ec7461412c5e 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -277,6 +277,11 @@ static struct skcipher_alg algs[] = {
},
};
+module_param_named(using_x86_avx512vl, chacha_use_avx512vl.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx2, chacha_use_avx2.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_avx512vl, "Using x86 instruction set extensions: AVX-512VL");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
{}
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index aff132e925ea..3c2bf7032667 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -240,6 +240,10 @@ static struct shash_alg alg = {
}
};
+static int using_x86_pclmulqdq;
+module_param(using_x86_pclmulqdq, int, 0444);
+MODULE_PARM_DESC(using_x86_pclmulqdq, "Using x86 instruction set extensions: PCLMULQDQ");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
{}
@@ -252,6 +256,7 @@ static int __init crc32c_intel_mod_init(void)
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
+ using_x86_pclmulqdq = 1;
alg.update = crc32c_pcl_intel_update;
alg.finup = crc32c_pcl_intel_finup;
alg.digest = crc32c_pcl_intel_digest;
@@ -263,6 +268,7 @@ static int __init crc32c_intel_mod_init(void)
static void __exit crc32c_intel_mod_fini(void)
{
crypto_unregister_shash(&alg);
+ using_x86_pclmulqdq = 0;
}
module_init(crc32c_intel_mod_init);
diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index ae7536b17bf9..6d222849e409 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -1697,6 +1697,9 @@ static struct kpp_alg curve25519_alg = {
.max_size = curve25519_max_size,
};
+module_param_named(using_x86_adx, curve25519_use_bmi2_adx.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_adx, "Using x86 instruction set extensions: ADX");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ADX, NULL),
{}
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index f1e39e23b2a3..d3c0d5b335ea 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -269,6 +269,13 @@ static struct shash_alg alg = {
},
};
+module_param_named(using_x86_avx, poly1305_use_avx.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx2, poly1305_use_avx2.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx512f, poly1305_use_avx512.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+MODULE_PARM_DESC(using_x86_avx512f, "Using x86 instruction set extensions: AVX-512F");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
{}
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 806463f57b6d..2445648cf234 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -90,6 +90,17 @@ static int using_x86_avx2;
static int using_x86_shani;
#endif
+#ifdef CONFIG_AS_SHA1_NI
+module_param(using_x86_shani, int, 0444);
+MODULE_PARM_DESC(using_x86_shani, "Using x86 instruction set extensions: SHA-NI");
+#endif
+module_param(using_x86_ssse3, int, 0444);
+module_param(using_x86_avx, int, 0444);
+module_param(using_x86_avx2, int, 0444);
+MODULE_PARM_DESC(using_x86_ssse3, "Using x86 instruction set extensions: SSSE3");
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+
static int sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha1_block_fn *sha1_xform)
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 30c8c50c1123..1464e6ccf912 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -104,6 +104,17 @@ static int using_x86_avx2;
static int using_x86_shani;
#endif
+#ifdef CONFIG_AS_SHA256_NI
+module_param(using_x86_shani, int, 0444);
+MODULE_PARM_DESC(using_x86_shani, "Using x86 instruction set extensions: SHA-NI");
+#endif
+module_param(using_x86_ssse3, int, 0444);
+module_param(using_x86_avx, int, 0444);
+module_param(using_x86_avx2, int, 0444);
+MODULE_PARM_DESC(using_x86_ssse3, "Using x86 instruction set extensions: SSSE3");
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+
static int _sha256_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha256_block_fn *sha256_xform)
@@ -212,9 +223,6 @@ static void unregister_sha256_ssse3(void)
}
}
-asmlinkage void sha256_transform_avx(struct sha256_state *state,
- const u8 *data, int blocks);
-
static int sha256_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
@@ -273,9 +281,6 @@ static void unregister_sha256_avx(void)
}
}
-asmlinkage void sha256_transform_rorx(struct sha256_state *state,
- const u8 *data, int blocks);
-
static int sha256_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
@@ -335,9 +340,6 @@ static void unregister_sha256_avx2(void)
}
#ifdef CONFIG_AS_SHA256_NI
-asmlinkage void sha256_ni_transform(struct sha256_state *digest,
- const u8 *data, int rounds);
-
static int sha256_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 48586ab40d55..04e2af951a3e 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -81,6 +81,13 @@ static int using_x86_ssse3;
static int using_x86_avx;
static int using_x86_avx2;
+module_param(using_x86_ssse3, int, 0444);
+module_param(using_x86_avx, int, 0444);
+module_param(using_x86_avx2, int, 0444);
+MODULE_PARM_DESC(using_x86_ssse3, "Using x86 instruction set extensions: SSSE3");
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+
static int sha512_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha512_block_fn *sha512_xform)
--
2.38.1
On Tue, Nov 15, 2022 at 10:13:39PM -0600, Robert Elliott wrote:
> For modules that have multiple choices, add read-only module parameters
> reporting which CPU features a module is using.
>
> The parameters show up as follows for modules that modify the behavior
> of their registered drivers or register additional drivers for
> each choice:
> /sys/module/aesni_intel/parameters/using_x86_avx:1
> /sys/module/aesni_intel/parameters/using_x86_avx2:1
> /sys/module/aria_aesni_avx_x86_64/parameters/using_x86_gfni:0
> /sys/module/chacha_x86_64/parameters/using_x86_avx2:1
> /sys/module/chacha_x86_64/parameters/using_x86_avx512:1
> /sys/module/crc32c_intel/parameters/using_x86_pclmulqdq:1
> /sys/module/curve25519_x86_64/parameters/using_x86_adx:1
> /sys/module/libblake2s_x86_64/parameters/using_x86_avx512:1
> /sys/module/libblake2s_x86_64/parameters/using_x86_ssse3:1
> /sys/module/poly1305_x86_64/parameters/using_x86_avx:1
> /sys/module/poly1305_x86_64/parameters/using_x86_avx2:1
> /sys/module/poly1305_x86_64/parameters/using_x86_avx512:0
> /sys/module/sha1_ssse3/parameters/using_x86_avx:1
> /sys/module/sha1_ssse3/parameters/using_x86_avx2:1
> /sys/module/sha1_ssse3/parameters/using_x86_shani:0
> /sys/module/sha1_ssse3/parameters/using_x86_ssse3:1
> /sys/module/sha256_ssse3/parameters/using_x86_avx:1
> /sys/module/sha256_ssse3/parameters/using_x86_avx2:1
> /sys/module/sha256_ssse3/parameters/using_x86_shani:0
> /sys/module/sha256_ssse3/parameters/using_x86_ssse3:1
> /sys/module/sha512_ssse3/parameters/using_x86_avx:1
> /sys/module/sha512_ssse3/parameters/using_x86_avx2:1
> /sys/module/sha512_ssse3/parameters/using_x86_ssse3:1
Isn't chacha missing?
However, what's the point of any of this? Who benefits from this info?
If something seems slow, I'll generally look at perf top, which provides
this same thing.
Also, "using" isn't quite correct. Some AVX2 machines will never use any
ssse3 instructions, despite the code being executable.
>
> Delete the aesni_intel prints reporting those selections:
> pr_info("AVX2 version of gcm_enc/dec engaged.\n");
This part I like.
> +module_param_named(using_x86_adx, curve25519_use_bmi2_adx.key.enabled.counter, int, 0444);
> +MODULE_PARM_DESC(using_x86_adx, "Using x86 instruction set extensions: ADX");
And BMI2, not just ADX.
Don't refuse to load modules based on missing additional x86 features
(e.g., OSXSAVE) or x86 XSAVE features (e.g., YMM). Instead, load the
module, but don't register any crypto drivers. Report whether anything
is missing via a new missing_x86_features module parameter
(0 = no problems, 1 = something is missing); each module parameter
description lists all the features that the module wants.
For the SHA functions that register up to four drivers based on CPU
features, report separate module parameters for each set:
missing_x86_features_avx2
missing_x86_features_avx
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/aegis128-aesni-glue.c | 15 ++++++++++---
arch/x86/crypto/aria_aesni_avx_glue.c | 24 +++++++++++---------
arch/x86/crypto/camellia_aesni_avx2_glue.c | 25 ++++++++++++---------
arch/x86/crypto/camellia_aesni_avx_glue.c | 25 ++++++++++++---------
arch/x86/crypto/cast5_avx_glue.c | 20 ++++++++++-------
arch/x86/crypto/cast6_avx_glue.c | 20 ++++++++++-------
arch/x86/crypto/curve25519-x86_64.c | 12 ++++++++--
arch/x86/crypto/nhpoly1305-avx2-glue.c | 14 +++++++++---
arch/x86/crypto/polyval-clmulni_glue.c | 15 ++++++++++---
arch/x86/crypto/serpent_avx2_glue.c | 24 +++++++++++---------
arch/x86/crypto/serpent_avx_glue.c | 21 ++++++++++-------
arch/x86/crypto/sha1_ssse3_glue.c | 20 +++++++++++++----
arch/x86/crypto/sha256_ssse3_glue.c | 18 +++++++++++++--
arch/x86/crypto/sha512_ssse3_glue.c | 18 +++++++++++++--
arch/x86/crypto/sm3_avx_glue.c | 22 ++++++++++--------
arch/x86/crypto/sm4_aesni_avx2_glue.c | 26 +++++++++++++---------
arch/x86/crypto/sm4_aesni_avx_glue.c | 26 +++++++++++++---------
arch/x86/crypto/twofish_avx_glue.c | 19 ++++++++++------
18 files changed, 243 insertions(+), 121 deletions(-)
diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index a3ebd018953c..e0312ecf34a8 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -288,6 +288,11 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (SSE2) and/or XSAVE features (SSE)");
+
static struct simd_aead_alg *simd_alg;
static int __init crypto_aegis128_aesni_module_init(void)
@@ -296,8 +301,10 @@ static int __init crypto_aegis128_aesni_module_init(void)
return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_XMM2) ||
- !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
- return -ENODEV;
+ !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL)) {
+ missing_x86_features = 1;
+ return 0;
+ }
return simd_register_aeads_compat(&crypto_aegis128_aesni_alg, 1,
&simd_alg);
@@ -305,7 +312,9 @@ static int __init crypto_aegis128_aesni_module_init(void)
static void __exit crypto_aegis128_aesni_module_exit(void)
{
- simd_unregister_aeads(&crypto_aegis128_aesni_alg, 1, &simd_alg);
+ if (!missing_x86_features)
+ simd_unregister_aeads(&crypto_aegis128_aesni_alg, 1, &simd_alg);
+ missing_x86_features = 0;
}
module_init(crypto_aegis128_aesni_module_init);
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index 9fd3d1fe1105..ebb9760967b5 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -176,23 +176,25 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static int __init aria_avx_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AES or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
if (boot_cpu_has(X86_FEATURE_GFNI)) {
@@ -213,8 +215,10 @@ static int __init aria_avx_init(void)
static void __exit aria_avx_exit(void)
{
- simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
- aria_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
+ aria_simd_algs);
+ missing_x86_features = 0;
using_x86_gfni = 0;
}
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index 6c48fc9f3fde..e8ae1e1a801d 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -105,26 +105,28 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, AVX, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];
static int __init camellia_aesni_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AES-NI, AVX, or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return simd_register_skciphers_compat(camellia_algs,
@@ -134,8 +136,11 @@ static int __init camellia_aesni_init(void)
static void __exit camellia_aesni_fini(void)
{
- simd_unregister_skciphers(camellia_algs, ARRAY_SIZE(camellia_algs),
- camellia_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(camellia_algs,
+ ARRAY_SIZE(camellia_algs),
+ camellia_simd_algs);
+ missing_x86_features = 0;
}
module_init(camellia_aesni_init);
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 6d7fc96d242e..6784d631575c 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -105,25 +105,27 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];
static int __init camellia_aesni_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AES-NI or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return simd_register_skciphers_compat(camellia_algs,
@@ -133,8 +135,11 @@ static int __init camellia_aesni_init(void)
static void __exit camellia_aesni_fini(void)
{
- simd_unregister_skciphers(camellia_algs, ARRAY_SIZE(camellia_algs),
- camellia_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(camellia_algs,
+ ARRAY_SIZE(camellia_algs),
+ camellia_simd_algs);
+ missing_x86_features = 0;
}
module_init(camellia_aesni_init);
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index bdc3c763334c..34ef032bb8d0 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -100,19 +100,21 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *cast5_simd_algs[ARRAY_SIZE(cast5_algs)];
static int __init cast5_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return simd_register_skciphers_compat(cast5_algs,
@@ -122,8 +124,10 @@ static int __init cast5_init(void)
static void __exit cast5_exit(void)
{
- simd_unregister_skciphers(cast5_algs, ARRAY_SIZE(cast5_algs),
- cast5_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(cast5_algs, ARRAY_SIZE(cast5_algs),
+ cast5_simd_algs);
+ missing_x86_features = 0;
}
module_init(cast5_init);
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index addca34b3511..71559fd3ea87 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -100,19 +100,21 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *cast6_simd_algs[ARRAY_SIZE(cast6_algs)];
static int __init cast6_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return simd_register_skciphers_compat(cast6_algs,
@@ -122,8 +124,10 @@ static int __init cast6_init(void)
static void __exit cast6_exit(void)
{
- simd_unregister_skciphers(cast6_algs, ARRAY_SIZE(cast6_algs),
- cast6_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(cast6_algs, ARRAY_SIZE(cast6_algs),
+ cast6_simd_algs);
+ missing_x86_features = 0;
}
module_init(cast6_init);
diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index 6d222849e409..74672351e534 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -1706,13 +1706,20 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (BMI2)");
+
static int __init curve25519_mod_init(void)
{
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!boot_cpu_has(X86_FEATURE_BMI2))
- return -ENODEV;
+ if (!boot_cpu_has(X86_FEATURE_BMI2)) {
+ missing_x86_features = 1;
+ return 0;
+ }
static_branch_enable(&curve25519_use_bmi2_adx);
@@ -1725,6 +1732,7 @@ static void __exit curve25519_mod_exit(void)
if (IS_REACHABLE(CONFIG_CRYPTO_KPP) &&
static_branch_likely(&curve25519_use_bmi2_adx))
crypto_unregister_kpp(&curve25519_alg);
+ missing_x86_features = 0;
}
module_init(curve25519_mod_init);
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index fa415fec5793..2e63947bc9fa 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -67,20 +67,28 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (OSXSAVE)");
+
static int __init nhpoly1305_mod_init(void)
{
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!boot_cpu_has(X86_FEATURE_OSXSAVE))
- return -ENODEV;
+ if (!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ missing_x86_features = 1;
+ return 0;
+ }
return crypto_register_shash(&nhpoly1305_alg);
}
static void __exit nhpoly1305_mod_exit(void)
{
- crypto_unregister_shash(&nhpoly1305_alg);
+ if (!missing_x86_features)
+ crypto_unregister_shash(&nhpoly1305_alg);
}
module_init(nhpoly1305_mod_init);
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index b98e32f8e2a4..20d4a68ec1d7 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -182,20 +182,29 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AVX)");
+
static int __init polyval_clmulni_mod_init(void)
{
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!boot_cpu_has(X86_FEATURE_AVX))
- return -ENODEV;
+ if (!boot_cpu_has(X86_FEATURE_AVX)) {
+ missing_x86_features = 1;
+ return 0;
+ }
return crypto_register_shash(&polyval_alg);
}
static void __exit polyval_clmulni_mod_exit(void)
{
- crypto_unregister_shash(&polyval_alg);
+ if (!missing_x86_features)
+ crypto_unregister_shash(&polyval_alg);
+ missing_x86_features = 0;
}
module_init(polyval_clmulni_mod_init);
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index bc18149fb928..2aa62c93a16f 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -101,23 +101,25 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
static int __init serpent_avx2_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return simd_register_skciphers_compat(serpent_algs,
@@ -127,8 +129,10 @@ static int __init serpent_avx2_init(void)
static void __exit serpent_avx2_fini(void)
{
- simd_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs),
- serpent_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs),
+ serpent_simd_algs);
+ missing_x86_features = 0;
}
module_init(serpent_avx2_init);
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 0db18d99da50..28ee9717df49 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -107,19 +107,21 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
static int __init serpent_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return simd_register_skciphers_compat(serpent_algs,
@@ -129,8 +131,11 @@ static int __init serpent_init(void)
static void __exit serpent_exit(void)
{
- simd_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs),
- serpent_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(serpent_algs,
+ ARRAY_SIZE(serpent_algs),
+ serpent_simd_algs);
+ missing_x86_features = 0;
}
module_init(serpent_init);
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 2445648cf234..405af5e14b67 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -351,9 +351,17 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features_avx2;
+static int missing_x86_features_avx;
+module_param(missing_x86_features_avx2, int, 0444);
+module_param(missing_x86_features_avx, int, 0444);
+MODULE_PARM_DESC(missing_x86_features_avx2,
+ "Missing x86 instruction set extensions (BMI1, BMI2) to support AVX2");
+MODULE_PARM_DESC(missing_x86_features_avx,
+ "Missing x86 XSAVE features (SSE, YMM) to support AVX");
+
static int __init sha1_ssse3_mod_init(void)
{
- const char *feature_name;
int ret;
if (!x86_match_cpu(module_cpu_ids))
@@ -374,10 +382,11 @@ static int __init sha1_ssse3_mod_init(void)
if (boot_cpu_has(X86_FEATURE_BMI1) &&
boot_cpu_has(X86_FEATURE_BMI2)) {
-
ret = crypto_register_shash(&sha1_avx2_alg);
if (!ret)
using_x86_avx2 = 1;
+ } else {
+ missing_x86_features_avx2 = 1;
}
}
@@ -385,11 +394,12 @@ static int __init sha1_ssse3_mod_init(void)
if (boot_cpu_has(X86_FEATURE_AVX)) {
if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
-
+ NULL)) {
ret = crypto_register_shash(&sha1_avx_alg);
if (!ret)
using_x86_avx = 1;
+ } else {
+ missing_x86_features_avx = 1;
}
}
@@ -415,6 +425,8 @@ static void __exit sha1_ssse3_mod_fini(void)
unregister_sha1_avx2();
unregister_sha1_avx();
unregister_sha1_ssse3();
+ missing_x86_features_avx2 = 0;
+ missing_x86_features_avx = 0;
}
module_init(sha1_ssse3_mod_init);
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 1464e6ccf912..293cf7085dd3 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -413,9 +413,17 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features_avx2;
+static int missing_x86_features_avx;
+module_param(missing_x86_features_avx2, int, 0444);
+module_param(missing_x86_features_avx, int, 0444);
+MODULE_PARM_DESC(missing_x86_features_avx2,
+ "Missing x86 instruction set extensions (BMI2) to support AVX2");
+MODULE_PARM_DESC(missing_x86_features_avx,
+ "Missing x86 XSAVE features (SSE, YMM) to support AVX");
+
static int __init sha256_ssse3_mod_init(void)
{
- const char *feature_name;
int ret;
if (!x86_match_cpu(module_cpu_ids))
@@ -440,6 +448,8 @@ static int __init sha256_ssse3_mod_init(void)
ARRAY_SIZE(sha256_avx2_algs));
if (!ret)
using_x86_avx2 = 1;
+ } else {
+ missing_x86_features_avx2 = 1;
}
}
@@ -447,11 +457,13 @@ static int __init sha256_ssse3_mod_init(void)
if (boot_cpu_has(X86_FEATURE_AVX)) {
if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
+ NULL)) {
ret = crypto_register_shashes(sha256_avx_algs,
ARRAY_SIZE(sha256_avx_algs));
if (!ret)
using_x86_avx = 1;
+ } else {
+ missing_x86_features_avx = 1;
}
}
@@ -478,6 +490,8 @@ static void __exit sha256_ssse3_mod_fini(void)
unregister_sha256_avx2();
unregister_sha256_avx();
unregister_sha256_ssse3();
+ missing_x86_features_avx2 = 0;
+ missing_x86_features_avx = 0;
}
module_init(sha256_ssse3_mod_init);
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 04e2af951a3e..9f13baf7dda9 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -319,6 +319,15 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features_avx2;
+static int missing_x86_features_avx;
+module_param(missing_x86_features_avx2, int, 0444);
+module_param(missing_x86_features_avx, int, 0444);
+MODULE_PARM_DESC(missing_x86_features_avx2,
+ "Missing x86 instruction set extensions (BMI2) to support AVX2");
+MODULE_PARM_DESC(missing_x86_features_avx,
+ "Missing x86 XSAVE features (SSE, YMM) to support AVX");
+
static void unregister_sha512_avx2(void)
{
if (using_x86_avx2) {
@@ -330,7 +339,6 @@ static void unregister_sha512_avx2(void)
static int __init sha512_ssse3_mod_init(void)
{
- const char *feature_name;
int ret;
if (!x86_match_cpu(module_cpu_ids))
@@ -343,6 +351,8 @@ static int __init sha512_ssse3_mod_init(void)
ARRAY_SIZE(sha512_avx2_algs));
if (!ret)
using_x86_avx2 = 1;
+ } else {
+ missing_x86_features_avx2 = 1;
}
}
@@ -350,11 +360,13 @@ static int __init sha512_ssse3_mod_init(void)
if (boot_cpu_has(X86_FEATURE_AVX)) {
if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
+ NULL)) {
ret = crypto_register_shashes(sha512_avx_algs,
ARRAY_SIZE(sha512_avx_algs));
if (!ret)
using_x86_avx = 1;
+ } else {
+ missing_x86_features_avx = 1;
}
}
@@ -376,6 +388,8 @@ static void __exit sha512_ssse3_mod_fini(void)
unregister_sha512_avx2();
unregister_sha512_avx();
unregister_sha512_ssse3();
+ missing_x86_features_avx2 = 0;
+ missing_x86_features_avx = 0;
}
module_init(sha512_ssse3_mod_init);
diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index c7786874319c..169ba6a2c806 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -126,22 +126,24 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (BMI2) and/or XSAVE features (SSE, YMM)");
+
static int __init sm3_avx_mod_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_BMI2)) {
- pr_info("BMI2 instruction are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return crypto_register_shash(&sm3_avx_alg);
@@ -149,7 +151,9 @@ static int __init sm3_avx_mod_init(void)
static void __exit sm3_avx_mod_exit(void)
{
- crypto_unregister_shash(&sm3_avx_alg);
+ if (!missing_x86_features)
+ crypto_unregister_shash(&sm3_avx_alg);
+ missing_x86_features = 0;
}
module_init(sm3_avx_mod_init);
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 125b00db89b1..6bcf78231888 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -133,27 +133,29 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, AVX, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx2_skciphers[ARRAY_SIZE(sm4_aesni_avx2_skciphers)];
static int __init sm4_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX, AES-NI, and/or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return simd_register_skciphers_compat(sm4_aesni_avx2_skciphers,
@@ -163,9 +165,11 @@ static int __init sm4_init(void)
static void __exit sm4_exit(void)
{
- simd_unregister_skciphers(sm4_aesni_avx2_skciphers,
- ARRAY_SIZE(sm4_aesni_avx2_skciphers),
- simd_sm4_aesni_avx2_skciphers);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(sm4_aesni_avx2_skciphers,
+ ARRAY_SIZE(sm4_aesni_avx2_skciphers),
+ simd_sm4_aesni_avx2_skciphers);
+ missing_x86_features = 0;
}
module_init(sm4_init);
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index ac8182b197cf..03775b1079dc 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -452,26 +452,28 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx_skciphers[ARRAY_SIZE(sm4_aesni_avx_skciphers)];
static int __init sm4_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AES-NI or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return simd_register_skciphers_compat(sm4_aesni_avx_skciphers,
@@ -481,9 +483,11 @@ static int __init sm4_init(void)
static void __exit sm4_exit(void)
{
- simd_unregister_skciphers(sm4_aesni_avx_skciphers,
- ARRAY_SIZE(sm4_aesni_avx_skciphers),
- simd_sm4_aesni_avx_skciphers);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(sm4_aesni_avx_skciphers,
+ ARRAY_SIZE(sm4_aesni_avx_skciphers),
+ simd_sm4_aesni_avx_skciphers);
+ missing_x86_features = 0;
}
module_init(sm4_init);
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 4657e6efc35d..ae3cc4ad6f4f 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -110,18 +110,21 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *twofish_simd_algs[ARRAY_SIZE(twofish_algs)];
static int __init twofish_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}
return simd_register_skciphers_compat(twofish_algs,
@@ -131,8 +134,10 @@ static int __init twofish_init(void)
static void __exit twofish_exit(void)
{
- simd_unregister_skciphers(twofish_algs, ARRAY_SIZE(twofish_algs),
- twofish_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(twofish_algs, ARRAY_SIZE(twofish_algs),
+ twofish_simd_algs);
+ missing_x86_features = 0;
}
module_init(twofish_init);
--
2.38.1
Don't refuse to load modules on certain CPUs and print a message
to the console. Instead, load the module but don't register the
crypto functions, and report this condition via a new
suboptimal_x86 module parameter with this description:
Crypto driver not registered because performance on this CPU would be suboptimal
Reword the description of the existing force module parameter
in each driver to match this modified behavior:
force: Force crypto driver registration on suboptimal CPUs
Make the new module parameters readable via sysfs:
/sys/module/blowfish_x86_64/parameters/suboptimal_x86:0
/sys/module/camellia_x86_64/parameters/suboptimal_x86:0
/sys/module/des3_ede_x86_64/parameters/suboptimal_x86:1
/sys/module/twofish_x86_64_3way/parameters/suboptimal_x86:1
If a module has been loaded and is reporting suboptimal_x86=1,
remove it and load it again with force=1:
modprobe -r blowfish_x86_64
modprobe blowfish_x86_64 force=1
or specify the parameter on the kernel command line:
blowfish_x86_64.force=1
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/blowfish_glue.c | 29 +++++++++++++++++------------
arch/x86/crypto/camellia_glue.c | 27 ++++++++++++++++-----------
arch/x86/crypto/des3_ede_glue.c | 26 +++++++++++++++++---------
arch/x86/crypto/twofish_glue_3way.c | 26 +++++++++++++++-----------
4 files changed, 65 insertions(+), 43 deletions(-)
diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 4c0ead71b198..8e4de7859e34 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -283,7 +283,7 @@ static struct skcipher_alg bf_skcipher_algs[] = {
},
};
-static bool is_blacklisted_cpu(void)
+static bool is_suboptimal_cpu(void)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
@@ -292,7 +292,7 @@ static bool is_blacklisted_cpu(void)
/*
* On Pentium 4, blowfish-x86_64 is slower than generic C
* implementation because use of 64bit rotates (which are really
- * slow on P4). Therefore blacklist P4s.
+ * slow on P4).
*/
return true;
}
@@ -302,7 +302,12 @@ static bool is_blacklisted_cpu(void)
static int force;
module_param(force, int, 0);
-MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+MODULE_PARM_DESC(force, "Force crypto driver registration on suboptimal CPUs");
+
+static int suboptimal_x86;
+module_param(suboptimal_x86, int, 0444);
+MODULE_PARM_DESC(suboptimal_x86,
+ "Crypto driver not registered because performance on this CPU would be suboptimal");
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
@@ -317,12 +322,9 @@ static int __init blowfish_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!force && is_blacklisted_cpu()) {
- printk(KERN_INFO
- "blowfish-x86_64: performance on this CPU "
- "would be suboptimal: disabling "
- "blowfish-x86_64.\n");
- return -ENODEV;
+ if (!force && is_suboptimal_cpu()) {
+ suboptimal_x86 = 1;
+ return 0;
}
err = crypto_register_alg(&bf_cipher_alg);
@@ -339,9 +341,12 @@ static int __init blowfish_init(void)
static void __exit blowfish_fini(void)
{
- crypto_unregister_alg(&bf_cipher_alg);
- crypto_unregister_skciphers(bf_skcipher_algs,
- ARRAY_SIZE(bf_skcipher_algs));
+ if (!suboptimal_x86) {
+ crypto_unregister_alg(&bf_cipher_alg);
+ crypto_unregister_skciphers(bf_skcipher_algs,
+ ARRAY_SIZE(bf_skcipher_algs));
+ }
+ suboptimal_x86 = 0;
}
module_init(blowfish_init);
diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index a3df1043ed73..2cb9b24d9437 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -1356,7 +1356,7 @@ static struct skcipher_alg camellia_skcipher_algs[] = {
}
};
-static bool is_blacklisted_cpu(void)
+static bool is_suboptimal_cpu(void)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
@@ -1376,7 +1376,12 @@ static bool is_blacklisted_cpu(void)
static int force;
module_param(force, int, 0);
-MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+MODULE_PARM_DESC(force, "Force crypto driver registration on suboptimal CPUs");
+
+static int suboptimal_x86;
+module_param(suboptimal_x86, int, 0444);
+MODULE_PARM_DESC(suboptimal_x86,
+ "Crypto driver not registered because performance on this CPU would be suboptimal");
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
@@ -1391,12 +1396,9 @@ static int __init camellia_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!force && is_blacklisted_cpu()) {
- printk(KERN_INFO
- "camellia-x86_64: performance on this CPU "
- "would be suboptimal: disabling "
- "camellia-x86_64.\n");
- return -ENODEV;
+ if (!force && is_suboptimal_cpu()) {
+ suboptimal_x86 = 1;
+ return 0;
}
err = crypto_register_alg(&camellia_cipher_alg);
@@ -1413,9 +1415,12 @@ static int __init camellia_init(void)
static void __exit camellia_fini(void)
{
- crypto_unregister_alg(&camellia_cipher_alg);
- crypto_unregister_skciphers(camellia_skcipher_algs,
- ARRAY_SIZE(camellia_skcipher_algs));
+ if (!suboptimal_x86) {
+ crypto_unregister_alg(&camellia_cipher_alg);
+ crypto_unregister_skciphers(camellia_skcipher_algs,
+ ARRAY_SIZE(camellia_skcipher_algs));
+ }
+ suboptimal_x86 = 0;
}
module_init(camellia_init);
diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index 168cac5c6ca6..a4cac5129148 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -334,7 +334,7 @@ static struct skcipher_alg des3_ede_skciphers[] = {
}
};
-static bool is_blacklisted_cpu(void)
+static bool is_suboptimal_cpu(void)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
@@ -343,7 +343,7 @@ static bool is_blacklisted_cpu(void)
/*
* On Pentium 4, des3_ede-x86_64 is slower than generic C
* implementation because use of 64bit rotates (which are really
- * slow on P4). Therefore blacklist P4s.
+ * slow on P4).
*/
return true;
}
@@ -353,7 +353,12 @@ static bool is_blacklisted_cpu(void)
static int force;
module_param(force, int, 0);
-MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+MODULE_PARM_DESC(force, "Force crypto driver registration on suboptimal CPUs");
+
+static int suboptimal_x86;
+module_param(suboptimal_x86, int, 0444);
+MODULE_PARM_DESC(suboptimal_x86,
+ "Crypto driver not registered because performance on this CPU would be suboptimal");
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
@@ -368,9 +373,9 @@ static int __init des3_ede_x86_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!force && is_blacklisted_cpu()) {
- pr_info("des3_ede-x86_64: performance on this CPU would be suboptimal: disabling des3_ede-x86_64.\n");
- return -ENODEV;
+ if (!force && is_suboptimal_cpu()) {
+ suboptimal_x86 = 1;
+ return 0;
}
err = crypto_register_alg(&des3_ede_cipher);
@@ -387,9 +392,12 @@ static int __init des3_ede_x86_init(void)
static void __exit des3_ede_x86_fini(void)
{
- crypto_unregister_alg(&des3_ede_cipher);
- crypto_unregister_skciphers(des3_ede_skciphers,
- ARRAY_SIZE(des3_ede_skciphers));
+ if (!suboptimal_x86) {
+ crypto_unregister_alg(&des3_ede_cipher);
+ crypto_unregister_skciphers(des3_ede_skciphers,
+ ARRAY_SIZE(des3_ede_skciphers));
+ }
+ suboptimal_x86 = 0;
}
module_init(des3_ede_x86_init);
diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 790e5a59a9a7..8db2f23b3056 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -103,7 +103,7 @@ static struct skcipher_alg tf_skciphers[] = {
},
};
-static bool is_blacklisted_cpu(void)
+static bool is_suboptimal_cpu(void)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
@@ -118,8 +118,7 @@ static bool is_blacklisted_cpu(void)
* storing blocks in 64bit registers to allow three blocks to
* be processed parallel. Parallel operation then allows gaining
* more performance than was trade off, on out-of-order CPUs.
- * However Atom does not benefit from this parallelism and
- * should be blacklisted.
+ * However Atom does not benefit from this parallelism.
*/
return true;
}
@@ -139,7 +138,12 @@ static bool is_blacklisted_cpu(void)
static int force;
module_param(force, int, 0);
-MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+MODULE_PARM_DESC(force, "Force crypto driver registration on suboptimal CPUs");
+
+static int suboptimal_x86;
+module_param(suboptimal_x86, int, 0444);
+MODULE_PARM_DESC(suboptimal_x86,
+ "Crypto driver not registered because performance on this CPU would be suboptimal");
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
@@ -152,12 +156,9 @@ static int __init twofish_3way_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- if (!force && is_blacklisted_cpu()) {
- printk(KERN_INFO
- "twofish-x86_64-3way: performance on this CPU "
- "would be suboptimal: disabling "
- "twofish-x86_64-3way.\n");
- return -ENODEV;
+ if (!force && is_suboptimal_cpu()) {
+ suboptimal_x86 = 1;
+ return 0;
}
return crypto_register_skciphers(tf_skciphers,
@@ -166,7 +167,10 @@ static int __init twofish_3way_init(void)
static void __exit twofish_3way_fini(void)
{
- crypto_unregister_skciphers(tf_skciphers, ARRAY_SIZE(tf_skciphers));
+ if (!suboptimal_x86)
+ crypto_unregister_skciphers(tf_skciphers, ARRAY_SIZE(tf_skciphers));
+
+ suboptimal_x86 = 0;
}
module_init(twofish_3way_init);
--
2.38.1
Make the module descriptions for the x86 optimized crypto modules match
the descriptions of the generic modules and the names in Kconfig.
End each description with "-- accelerated for x86 with <feature name>",
naming the feature used for module matching:
"-- accelerated for x86 with AVX2"
Mention any other required CPU features:
"(also required: AES-NI)"
Mention any CPU features that are not required but enable additional
acceleration:
"(optional: GF-NI)"
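Put together, the convention yields a module tail like the following sketch
(the algorithm and feature names here are illustrative, not taken from any
real file in this series):

```c
/* Illustrative only -- a made-up glue module tail following the convention:
 * "<algorithm> -- accelerated for x86 with <matched feature>
 *  (also required: <other needed features>) (optional: <extra features>)"
 */
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Example Cipher Algorithm -- accelerated for x86 with AVX2 (also required: AES-NI) (optional: GF-NI)");
MODULE_ALIAS_CRYPTO("example");
```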
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
arch/x86/crypto/aegis128-aesni-glue.c | 2 +-
arch/x86/crypto/aesni-intel_glue.c | 2 +-
arch/x86/crypto/aria_aesni_avx_glue.c | 2 +-
arch/x86/crypto/blake2s-glue.c | 1 +
arch/x86/crypto/blowfish_glue.c | 2 +-
arch/x86/crypto/camellia_aesni_avx2_glue.c | 2 +-
arch/x86/crypto/camellia_aesni_avx_glue.c | 2 +-
arch/x86/crypto/camellia_glue.c | 2 +-
arch/x86/crypto/cast5_avx_glue.c | 2 +-
arch/x86/crypto/cast6_avx_glue.c | 2 +-
arch/x86/crypto/chacha_glue.c | 2 +-
arch/x86/crypto/crc32-pclmul_glue.c | 2 +-
arch/x86/crypto/crc32c-intel_glue.c | 2 +-
arch/x86/crypto/crct10dif-pclmul_glue.c | 2 +-
arch/x86/crypto/curve25519-x86_64.c | 1 +
arch/x86/crypto/des3_ede_glue.c | 2 +-
arch/x86/crypto/ghash-clmulni-intel_glue.c | 2 +-
arch/x86/crypto/nhpoly1305-avx2-glue.c | 2 +-
arch/x86/crypto/nhpoly1305-sse2-glue.c | 2 +-
arch/x86/crypto/poly1305_glue.c | 2 +-
arch/x86/crypto/polyval-clmulni_glue.c | 2 +-
arch/x86/crypto/serpent_avx2_glue.c | 2 +-
arch/x86/crypto/serpent_avx_glue.c | 2 +-
arch/x86/crypto/serpent_sse2_glue.c | 2 +-
arch/x86/crypto/sha1_ssse3_glue.c | 2 +-
arch/x86/crypto/sha256_ssse3_glue.c | 2 +-
arch/x86/crypto/sha512_ssse3_glue.c | 2 +-
arch/x86/crypto/sm3_avx_glue.c | 2 +-
arch/x86/crypto/sm4_aesni_avx2_glue.c | 2 +-
arch/x86/crypto/sm4_aesni_avx_glue.c | 2 +-
arch/x86/crypto/twofish_avx_glue.c | 2 +-
arch/x86/crypto/twofish_glue.c | 2 +-
arch/x86/crypto/twofish_glue_3way.c | 2 +-
crypto/aes_ti.c | 2 +-
crypto/blake2b_generic.c | 2 +-
crypto/blowfish_common.c | 2 +-
crypto/crct10dif_generic.c | 2 +-
crypto/curve25519-generic.c | 1 +
crypto/sha256_generic.c | 2 +-
crypto/sha512_generic.c | 2 +-
crypto/sm3.c | 2 +-
crypto/sm4.c | 2 +-
crypto/twofish_common.c | 2 +-
crypto/twofish_generic.c | 2 +-
44 files changed, 44 insertions(+), 41 deletions(-)
diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index e0312ecf34a8..e72ae7ba5f12 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -322,6 +322,6 @@ module_exit(crypto_aegis128_aesni_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
-MODULE_DESCRIPTION("AEGIS-128 AEAD algorithm -- AESNI+SSE2 implementation");
+MODULE_DESCRIPTION("AEGIS-128 AEAD algorithm -- accelerated for x86 with AES-NI (also required: SSE2)");
MODULE_ALIAS_CRYPTO("aegis128");
MODULE_ALIAS_CRYPTO("aegis128-aesni");
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 80dbf98c53fd..3d8508598e76 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1311,6 +1311,6 @@ static void __exit aesni_exit(void)
late_initcall(aesni_init);
module_exit(aesni_exit);
-MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, Intel AES-NI instructions optimized");
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm -- accelerated for x86 with AES-NI (optional: AVX, AVX2)");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index ebb9760967b5..1d23c7ef7aef 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -227,6 +227,6 @@ module_exit(aria_avx_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Taehee Yoo <ap420073@gmail.com>");
-MODULE_DESCRIPTION("ARIA Cipher Algorithm, AVX/AES-NI/GFNI optimized");
+MODULE_DESCRIPTION("ARIA Cipher Algorithm -- accelerated for x86 with AVX (also required: AES-NI, OSXSAVE) (optional: GF-NI)");
MODULE_ALIAS_CRYPTO("aria");
MODULE_ALIAS_CRYPTO("aria-aesni-avx");
diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index 781cf9471cb6..0618f0d31fae 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -90,3 +90,4 @@ static int __init blake2s_mod_init(void)
module_init(blake2s_mod_init);
MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("BLAKE2s hash algorithm -- accelerated for x86 with SSSE3 or AVX-512VL");
diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 8e4de7859e34..67f7562d2d02 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -353,6 +353,6 @@ module_init(blowfish_init);
module_exit(blowfish_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Blowfish Cipher Algorithm, asm optimized");
+MODULE_DESCRIPTION("Blowfish Cipher Algorithm -- accelerated for x86");
MODULE_ALIAS_CRYPTO("blowfish");
MODULE_ALIAS_CRYPTO("blowfish-asm");
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index e8ae1e1a801d..da89fef184d2 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -147,6 +147,6 @@ module_init(camellia_aesni_init);
module_exit(camellia_aesni_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Camellia Cipher Algorithm, AES-NI/AVX2 optimized");
+MODULE_DESCRIPTION("Camellia Cipher Algorithm -- accelerated for x86 with AVX2 (also required: AES-NI, AVX, OSXSAVE)");
MODULE_ALIAS_CRYPTO("camellia");
MODULE_ALIAS_CRYPTO("camellia-asm");
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 6784d631575c..0eebb56bc440 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -146,6 +146,6 @@ module_init(camellia_aesni_init);
module_exit(camellia_aesni_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Camellia Cipher Algorithm, AES-NI/AVX optimized");
+MODULE_DESCRIPTION("Camellia Cipher Algorithm -- accelerated for x86 with AVX (also required: AES-NI, OSXSAVE)");
MODULE_ALIAS_CRYPTO("camellia");
MODULE_ALIAS_CRYPTO("camellia-asm");
diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index 2cb9b24d9437..b8cad1655c66 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -1427,6 +1427,6 @@ module_init(camellia_init);
module_exit(camellia_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Camellia Cipher Algorithm, asm optimized");
+MODULE_DESCRIPTION("Camellia Cipher Algorithm -- accelerated for x86");
MODULE_ALIAS_CRYPTO("camellia");
MODULE_ALIAS_CRYPTO("camellia-asm");
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index 34ef032bb8d0..4a11d3ea9838 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -133,6 +133,6 @@ static void __exit cast5_exit(void)
module_init(cast5_init);
module_exit(cast5_exit);
-MODULE_DESCRIPTION("Cast5 Cipher Algorithm, AVX optimized");
+MODULE_DESCRIPTION("Cast5 Cipher Algorithm -- accelerated for x86 with AVX");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("cast5");
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 71559fd3ea87..53a92999a234 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -133,6 +133,6 @@ static void __exit cast6_exit(void)
module_init(cast6_init);
module_exit(cast6_exit);
-MODULE_DESCRIPTION("Cast6 Cipher Algorithm, AVX optimized");
+MODULE_DESCRIPTION("Cast6 Cipher Algorithm -- accelerated for x86 with AVX");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("cast6");
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index ec7461412c5e..563546d0bc2a 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -320,7 +320,7 @@ module_exit(chacha_simd_mod_fini);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
-MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (x64 SIMD accelerated)");
+MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers -- accelerated for x86 with SSSE3 (optional: AVX, AVX2, AVX-512VL, AVX-512BW)");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-simd");
MODULE_ALIAS_CRYPTO("xchacha20");
diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index d5e889c24bea..1c297fae5d39 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -207,6 +207,6 @@ module_exit(crc32_pclmul_mod_fini);
MODULE_AUTHOR("Alexander Boyko <alexander_boyko@xyratex.com>");
MODULE_LICENSE("GPL");
-
+MODULE_DESCRIPTION("CRC32 -- accelerated for x86 with PCLMULQDQ");
MODULE_ALIAS_CRYPTO("crc32");
MODULE_ALIAS_CRYPTO("crc32-pclmul");
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 3c2bf7032667..ba7899d04bb1 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -275,7 +275,7 @@ module_init(crc32c_intel_mod_init);
module_exit(crc32c_intel_mod_fini);
MODULE_AUTHOR("Austin Zhang <austin.zhang@intel.com>, Kent Liu <kent.liu@intel.com>");
-MODULE_DESCRIPTION("CRC32c (Castagnoli) optimization using Intel Hardware.");
+MODULE_DESCRIPTION("CRC32c (Castagnoli) -- accelerated for x86 with SSE4.2 (optional: PCLMULQDQ)");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("crc32c");
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index a26dbd27da96..df9f81ee97a3 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -162,7 +162,7 @@ module_init(crct10dif_intel_mod_init);
module_exit(crct10dif_intel_mod_fini);
MODULE_AUTHOR("Tim Chen <tim.c.chen@linux.intel.com>");
-MODULE_DESCRIPTION("T10 DIF CRC calculation accelerated with PCLMULQDQ.");
+MODULE_DESCRIPTION("T10 DIF CRC -- accelerated for x86 with PCLMULQDQ");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("crct10dif");
diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index 74672351e534..078508f53ff0 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -1742,3 +1742,4 @@ MODULE_ALIAS_CRYPTO("curve25519");
MODULE_ALIAS_CRYPTO("curve25519-x86");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_DESCRIPTION("Curve25519 algorithm -- accelerated for x86 with ADX (also required: BMI2)");
diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index a4cac5129148..fc90c0a076e3 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -404,7 +404,7 @@ module_init(des3_ede_x86_init);
module_exit(des3_ede_x86_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Triple DES EDE Cipher Algorithm, asm optimized");
+MODULE_DESCRIPTION("Triple DES EDE Cipher Algorithm -- accelerated for x86");
MODULE_ALIAS_CRYPTO("des3_ede");
MODULE_ALIAS_CRYPTO("des3_ede-asm");
MODULE_AUTHOR("Jussi Kivilinna <jussi.kivilinna@iki.fi>");
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index d19a8e9b34a6..30f4966df4de 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -363,5 +363,5 @@ module_init(ghash_pclmulqdqni_mod_init);
module_exit(ghash_pclmulqdqni_mod_exit);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("GHASH hash function, accelerated by PCLMULQDQ-NI");
+MODULE_DESCRIPTION("GHASH hash function -- accelerated for x86 with PCLMULQDQ");
MODULE_ALIAS_CRYPTO("ghash");
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 2e63947bc9fa..ed6209f027e7 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -94,7 +94,7 @@ static void __exit nhpoly1305_mod_exit(void)
module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);
-MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (AVX2-accelerated)");
+MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function -- accelerated for x86 with AVX2 (also required: OSXSAVE)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index c47765e46236..d09156e702dd 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -83,7 +83,7 @@ static void __exit nhpoly1305_mod_exit(void)
module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);
-MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (SSE2-accelerated)");
+MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function -- accelerated for x86 with SSE2");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index d3c0d5b335ea..78f88be4a22a 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -313,6 +313,6 @@ module_exit(poly1305_simd_mod_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
-MODULE_DESCRIPTION("Poly1305 authenticator");
+MODULE_DESCRIPTION("Poly1305 authenticator -- accelerated for x86 (optional: AVX, AVX2, AVX-512F)");
MODULE_ALIAS_CRYPTO("poly1305");
MODULE_ALIAS_CRYPTO("poly1305-simd");
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index 20d4a68ec1d7..447f0f219759 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -211,6 +211,6 @@ module_init(polyval_clmulni_mod_init);
module_exit(polyval_clmulni_mod_exit);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("POLYVAL hash function accelerated by PCLMULQDQ-NI");
+MODULE_DESCRIPTION("POLYVAL hash function -- accelerated for x86 with PCLMULQDQ (also required: AVX)");
MODULE_ALIAS_CRYPTO("polyval");
MODULE_ALIAS_CRYPTO("polyval-clmulni");
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index 2aa62c93a16f..0a57779a7559 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -139,6 +139,6 @@ module_init(serpent_avx2_init);
module_exit(serpent_avx2_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Serpent Cipher Algorithm, AVX2 optimized");
+MODULE_DESCRIPTION("Serpent Cipher Algorithm -- accelerated for x86 with AVX2 (also required: OSXSAVE)");
MODULE_ALIAS_CRYPTO("serpent");
MODULE_ALIAS_CRYPTO("serpent-asm");
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 28ee9717df49..9d03fb25537f 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -141,6 +141,6 @@ static void __exit serpent_exit(void)
module_init(serpent_init);
module_exit(serpent_exit);
-MODULE_DESCRIPTION("Serpent Cipher Algorithm, AVX optimized");
+MODULE_DESCRIPTION("Serpent Cipher Algorithm -- accelerated for x86 with AVX");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("serpent");
diff --git a/arch/x86/crypto/serpent_sse2_glue.c b/arch/x86/crypto/serpent_sse2_glue.c
index 74f0c89f55ef..287b19527105 100644
--- a/arch/x86/crypto/serpent_sse2_glue.c
+++ b/arch/x86/crypto/serpent_sse2_glue.c
@@ -131,6 +131,6 @@ static void __exit serpent_sse2_exit(void)
module_init(serpent_sse2_init);
module_exit(serpent_sse2_exit);
-MODULE_DESCRIPTION("Serpent Cipher Algorithm, SSE2 optimized");
+MODULE_DESCRIPTION("Serpent Cipher Algorithm -- accelerated for x86 with SSE2");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("serpent");
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 405af5e14b67..113756544d4e 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -433,7 +433,7 @@ module_init(sha1_ssse3_mod_init);
module_exit(sha1_ssse3_mod_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA1 Secure Hash Algorithm, Supplemental SSE3 accelerated");
+MODULE_DESCRIPTION("SHA1 Secure Hash Algorithm -- accelerated for x86 with SSSE3, AVX, AVX2, or SHA-NI");
MODULE_ALIAS_CRYPTO("sha1");
MODULE_ALIAS_CRYPTO("sha1-ssse3");
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 293cf7085dd3..78fa25d2e4ba 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -498,7 +498,7 @@ module_init(sha256_ssse3_mod_init);
module_exit(sha256_ssse3_mod_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA256 Secure Hash Algorithm, Supplemental SSE3 accelerated");
+MODULE_DESCRIPTION("SHA-224 and SHA-256 Secure Hash Algorithms -- accelerated for x86 with SSSE3, AVX, AVX2, or SHA-NI");
MODULE_ALIAS_CRYPTO("sha256");
MODULE_ALIAS_CRYPTO("sha256-ssse3");
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 9f13baf7dda9..2fa951069604 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -396,7 +396,7 @@ module_init(sha512_ssse3_mod_init);
module_exit(sha512_ssse3_mod_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA512 Secure Hash Algorithm, Supplemental SSE3 accelerated");
+MODULE_DESCRIPTION("SHA-384 and SHA-512 Secure Hash Algorithms -- accelerated for x86 with SSSE3, AVX, or AVX2");
MODULE_ALIAS_CRYPTO("sha512");
MODULE_ALIAS_CRYPTO("sha512-ssse3");
diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index 169ba6a2c806..9e1177fbf032 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -161,6 +161,6 @@ module_exit(sm3_avx_mod_exit);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Tianjia Zhang <tianjia.zhang@linux.alibaba.com>");
-MODULE_DESCRIPTION("SM3 Secure Hash Algorithm, AVX assembler accelerated");
+MODULE_DESCRIPTION("SM3 Secure Hash Algorithm -- accelerated for x86 with AVX (also required: BMI2)");
MODULE_ALIAS_CRYPTO("sm3");
MODULE_ALIAS_CRYPTO("sm3-avx");
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 6bcf78231888..b497a6006c8d 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -177,6 +177,6 @@ module_exit(sm4_exit);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Tianjia Zhang <tianjia.zhang@linux.alibaba.com>");
-MODULE_DESCRIPTION("SM4 Cipher Algorithm, AES-NI/AVX2 optimized");
+MODULE_DESCRIPTION("SM4 Cipher Algorithm -- accelerated for x86 with AVX2 (also required: AES-NI, AVX, OSXSAVE)");
MODULE_ALIAS_CRYPTO("sm4");
MODULE_ALIAS_CRYPTO("sm4-aesni-avx2");
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index 03775b1079dc..e583ee0948af 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -495,6 +495,6 @@ module_exit(sm4_exit);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Tianjia Zhang <tianjia.zhang@linux.alibaba.com>");
-MODULE_DESCRIPTION("SM4 Cipher Algorithm, AES-NI/AVX optimized");
+MODULE_DESCRIPTION("SM4 Cipher Algorithm -- accelerated for x86 with AVX (also required: AES-NI, OSXSAVE)");
MODULE_ALIAS_CRYPTO("sm4");
MODULE_ALIAS_CRYPTO("sm4-aesni-avx");
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index ae3cc4ad6f4f..7b405c66d5fa 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -143,6 +143,6 @@ static void __exit twofish_exit(void)
module_init(twofish_init);
module_exit(twofish_exit);
-MODULE_DESCRIPTION("Twofish Cipher Algorithm, AVX optimized");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm -- accelerated for x86 with AVX");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("twofish");
diff --git a/arch/x86/crypto/twofish_glue.c b/arch/x86/crypto/twofish_glue.c
index ade98aef3402..10729675e79c 100644
--- a/arch/x86/crypto/twofish_glue.c
+++ b/arch/x86/crypto/twofish_glue.c
@@ -105,6 +105,6 @@ module_init(twofish_glue_init);
module_exit(twofish_glue_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION ("Twofish Cipher Algorithm, asm optimized");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm -- accelerated for x86");
MODULE_ALIAS_CRYPTO("twofish");
MODULE_ALIAS_CRYPTO("twofish-asm");
diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 8db2f23b3056..43f428b59684 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -177,6 +177,6 @@ module_init(twofish_3way_init);
module_exit(twofish_3way_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Twofish Cipher Algorithm, 3-way parallel asm optimized");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm -- accelerated for x86 (3-way parallel)");
MODULE_ALIAS_CRYPTO("twofish");
MODULE_ALIAS_CRYPTO("twofish-asm");
diff --git a/crypto/aes_ti.c b/crypto/aes_ti.c
index 205c2c257d49..3cff553495ad 100644
--- a/crypto/aes_ti.c
+++ b/crypto/aes_ti.c
@@ -78,6 +78,6 @@ static void __exit aes_fini(void)
module_init(aes_init);
module_exit(aes_fini);
-MODULE_DESCRIPTION("Generic fixed time AES");
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm -- generic fixed time");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
diff --git a/crypto/blake2b_generic.c b/crypto/blake2b_generic.c
index 6704c0355889..ee53f25ff254 100644
--- a/crypto/blake2b_generic.c
+++ b/crypto/blake2b_generic.c
@@ -175,7 +175,7 @@ subsys_initcall(blake2b_mod_init);
module_exit(blake2b_mod_fini);
MODULE_AUTHOR("David Sterba <kdave@kernel.org>");
-MODULE_DESCRIPTION("BLAKE2b generic implementation");
+MODULE_DESCRIPTION("BLAKE2b hash algorithm");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("blake2b-160");
MODULE_ALIAS_CRYPTO("blake2b-160-generic");
diff --git a/crypto/blowfish_common.c b/crypto/blowfish_common.c
index 1c072012baff..8c75fdfcd09c 100644
--- a/crypto/blowfish_common.c
+++ b/crypto/blowfish_common.c
@@ -394,4 +394,4 @@ int blowfish_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen)
EXPORT_SYMBOL_GPL(blowfish_setkey);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Blowfish Cipher common functions");
+MODULE_DESCRIPTION("Blowfish Cipher Algorithm common functions");
diff --git a/crypto/crct10dif_generic.c b/crypto/crct10dif_generic.c
index e843982073bb..81c131c8ccd0 100644
--- a/crypto/crct10dif_generic.c
+++ b/crypto/crct10dif_generic.c
@@ -116,7 +116,7 @@ subsys_initcall(crct10dif_mod_init);
module_exit(crct10dif_mod_fini);
MODULE_AUTHOR("Tim Chen <tim.c.chen@linux.intel.com>");
-MODULE_DESCRIPTION("T10 DIF CRC calculation.");
+MODULE_DESCRIPTION("T10 DIF CRC calculation");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("crct10dif");
MODULE_ALIAS_CRYPTO("crct10dif-generic");
diff --git a/crypto/curve25519-generic.c b/crypto/curve25519-generic.c
index d055b0784c77..4f96583b31dd 100644
--- a/crypto/curve25519-generic.c
+++ b/crypto/curve25519-generic.c
@@ -88,3 +88,4 @@ module_exit(curve25519_exit);
MODULE_ALIAS_CRYPTO("curve25519");
MODULE_ALIAS_CRYPTO("curve25519-generic");
MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Curve25519 algorithm");
diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index bf147b01e313..141430c25e15 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -102,7 +102,7 @@ subsys_initcall(sha256_generic_mod_init);
module_exit(sha256_generic_mod_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA-224 and SHA-256 Secure Hash Algorithm");
+MODULE_DESCRIPTION("SHA-224 and SHA-256 Secure Hash Algorithms");
MODULE_ALIAS_CRYPTO("sha224");
MODULE_ALIAS_CRYPTO("sha224-generic");
diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index be70e76d6d86..63c5616ec770 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -219,7 +219,7 @@ subsys_initcall(sha512_generic_mod_init);
module_exit(sha512_generic_mod_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA-512 and SHA-384 Secure Hash Algorithms");
+MODULE_DESCRIPTION("SHA-384 and SHA-512 Secure Hash Algorithms");
MODULE_ALIAS_CRYPTO("sha384");
MODULE_ALIAS_CRYPTO("sha384-generic");
diff --git a/crypto/sm3.c b/crypto/sm3.c
index d473e358a873..2a400eb69e66 100644
--- a/crypto/sm3.c
+++ b/crypto/sm3.c
@@ -242,5 +242,5 @@ void sm3_final(struct sm3_state *sctx, u8 *out)
}
EXPORT_SYMBOL_GPL(sm3_final);
-MODULE_DESCRIPTION("Generic SM3 library");
+MODULE_DESCRIPTION("SM3 Secure Hash Algorithm generic library");
MODULE_LICENSE("GPL v2");
diff --git a/crypto/sm4.c b/crypto/sm4.c
index 2c44193bc27e..d46b598b41cd 100644
--- a/crypto/sm4.c
+++ b/crypto/sm4.c
@@ -180,5 +180,5 @@ void sm4_crypt_block(const u32 *rk, u8 *out, const u8 *in)
}
EXPORT_SYMBOL_GPL(sm4_crypt_block);
-MODULE_DESCRIPTION("Generic SM4 library");
+MODULE_DESCRIPTION("SM4 Cipher Algorithm generic library");
MODULE_LICENSE("GPL v2");
diff --git a/crypto/twofish_common.c b/crypto/twofish_common.c
index f921f30334f4..daa28045069d 100644
--- a/crypto/twofish_common.c
+++ b/crypto/twofish_common.c
@@ -690,4 +690,4 @@ int twofish_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int key_len)
EXPORT_SYMBOL_GPL(twofish_setkey);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Twofish cipher common functions");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm common functions");
diff --git a/crypto/twofish_generic.c b/crypto/twofish_generic.c
index 86b2f067a416..4fe42b4ac82d 100644
--- a/crypto/twofish_generic.c
+++ b/crypto/twofish_generic.c
@@ -191,6 +191,6 @@ subsys_initcall(twofish_mod_init);
module_exit(twofish_mod_fini);
MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION ("Twofish Cipher Algorithm");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm");
MODULE_ALIAS_CRYPTO("twofish");
MODULE_ALIAS_CRYPTO("twofish-generic");
--
2.38.1