Zhaoxin CPUs implement SHA (Secure Hash Algorithm) as CPU instructions
through the PHE (Padlock Hash Engine) extensions, which include the
XSHA1, XSHA256, XSHA384 and XSHA512 instructions.

Implementing SHA in hardware instead of software lets applications
achieve higher performance, more security and more flexibility.

This patch adds an XSHA1-optimized implementation of the SHA-1
transform function.

Signed-off-by: AlanSong-oc <AlanSong-oc@zhaoxin.com>
---
lib/crypto/x86/sha1.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/lib/crypto/x86/sha1.h b/lib/crypto/x86/sha1.h
index c48a0131f..d4946270a 100644
--- a/lib/crypto/x86/sha1.h
+++ b/lib/crypto/x86/sha1.h
@@ -48,6 +48,26 @@ static void sha1_blocks_avx2(struct sha1_block_state *state,
}
}
+#if IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN)
+#define PHE_ALIGNMENT 16
+static void sha1_blocks_phe(struct sha1_block_state *state,
+ const u8 *data, size_t nblocks)
+{
+ /*
+ * XSHA1 requires %edi to point to a 32-byte, 16-byte-aligned
+ * buffer on Zhaoxin processors.
+ */
+ u8 buf[32 + PHE_ALIGNMENT - 1];
+ u8 *dst = PTR_ALIGN(&buf[0], PHE_ALIGNMENT);
+
+ memcpy(dst, state, SHA1_DIGEST_SIZE);
+ asm volatile(".byte 0xf3,0x0f,0xa6,0xc8"
+ : "+S"(data), "+D"(dst)
+ : "a"((long)-1), "c"(nblocks));
+ memcpy(state, dst, SHA1_DIGEST_SIZE);
+}
+#endif /* CONFIG_CPU_SUP_ZHAOXIN */
+
static void sha1_blocks(struct sha1_block_state *state,
const u8 *data, size_t nblocks)
{
@@ -59,6 +79,11 @@ static void sha1_mod_init_arch(void)
{
if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
static_call_update(sha1_blocks_x86, sha1_blocks_ni);
+#if IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN)
+ } else if (boot_cpu_has(X86_FEATURE_PHE_EN)) {
+ if (boot_cpu_data.x86 >= 0x07)
+ static_call_update(sha1_blocks_x86, sha1_blocks_phe);
+#endif
} else if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
NULL) &&
boot_cpu_has(X86_FEATURE_AVX)) {
--
2.34.1
On Fri, Jan 16, 2026 at 03:15:12PM +0800, AlanSong-oc wrote:
> Zhaoxin CPUs implement SHA (Secure Hash Algorithm) as CPU instructions
> through the PHE (Padlock Hash Engine) extensions, which include the
> XSHA1, XSHA256, XSHA384 and XSHA512 instructions.
>
> Implementing SHA in hardware instead of software lets applications
> achieve higher performance, more security and more flexibility.
>
> This patch adds an XSHA1-optimized implementation of the SHA-1
> transform function.
>
> Signed-off-by: AlanSong-oc <AlanSong-oc@zhaoxin.com>
Please include the information I've asked for (benchmark results, test
results, and link to the specification) directly in the commit message.
> +#if IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN)
> +#define PHE_ALIGNMENT 16
> +static void sha1_blocks_phe(struct sha1_block_state *state,
> + const u8 *data, size_t nblocks)
The IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN) should go in the CPU feature
check, so that the code will be parsed regardless of the setting. That
reduces the chance that future changes will cause compilation errors.
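
To illustrate the point outside the kernel, here is a minimal userspace sketch. DEMO_FEATURE_ENABLED is a hypothetical stand-in for a Kconfig option (it is not the real IS_ENABLED() macro), and the function names are made up for the example. Because the check is an ordinary C condition, blocks_phe() is still parsed and type-checked when the option is 0, and the compiler can drop the dead branch:

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for a Kconfig option. Set to 0 to mimic a build
 * without the feature. This is not the kernel's IS_ENABLED() macro.
 */
#define DEMO_FEATURE_ENABLED 0

typedef const char *(*blocks_fn)(void);

static const char *blocks_generic(void)
{
	return "generic";
}

/* Always parsed and type-checked, even when DEMO_FEATURE_ENABLED is 0. */
static const char *blocks_phe(void)
{
	return "phe";
}

/* cpu_has_phe is a hypothetical runtime flag standing in for boot_cpu_has(). */
static blocks_fn select_blocks(bool cpu_has_phe)
{
	if (DEMO_FEATURE_ENABLED && cpu_has_phe)
		return blocks_phe;	/* dead code when the option is 0 */
	return blocks_generic;
}
```

With an #if guard around blocks_phe() instead, a later change that breaks it would go unnoticed in any build where the option is off.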
> + /*
> + * XSHA1 requires %edi to point to a 32-byte, 16-byte-aligned
> + * buffer on Zhaoxin processors.
> + */
This seems implausible. In 64-bit mode a pointer can't fit in %edi. I
thought you mentioned that this instruction is 64-bit compatible? You
may have meant %rdi.
Interestingly, the spec you provided specifically says the registers
operated on are %eax, %ecx, %esi, and %edi.
So assuming the code works, perhaps both the spec and your code comment
are incorrect?
These errors don't really inspire confidence in this instruction.
> + memcpy(dst, state, SHA1_DIGEST_SIZE);
> + asm volatile(".byte 0xf3,0x0f,0xa6,0xc8"
> + : "+S"(data), "+D"(dst)
> + : "a"((long)-1), "c"(nblocks));
> + memcpy(state, dst, SHA1_DIGEST_SIZE);
Is the reason for using '.byte' that the GNU and clang assemblers don't
implement the mnemonic for this Zhaoxin-specific instruction? The spec
implies that the intended mnemonic is "rep sha1".
If that's correct, could you add a comment like /* rep sha1 */ so that
it's clear what the intended instruction is?
Also, the spec describes all four registers as both input and output
registers. Yet your inline asm marks %rax and %rcx as inputs only.
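
As a sketch of what read-write constraints look like, here is a userspace example that uses "rep movsb" as a stand-in, since it is a well-documented instruction that also modifies its pointer and count registers. It is not XSHA1, and which registers XSHA1 actually modifies is for the spec to settle. The copy_bytes() name is made up for the example:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy n bytes with "rep movsb". The instruction advances %rsi and %rdi
 * and counts %rcx down to zero, so all three operands are declared
 * read-write ("+"). The "memory" clobber tells the compiler that memory
 * is read and written by the instruction.
 */
static void copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	__asm__ __volatile__("rep movsb"
			     : "+D"(dst), "+S"(src), "+c"(n)
			     :
			     : "memory");
#else
	memcpy(dst, src, n);	/* portable fallback for other targets */
#endif
}
```

If a register that the instruction modifies is listed as input-only, the compiler is free to assume it still holds the original value afterwards.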
> @@ -59,6 +79,11 @@ static void sha1_mod_init_arch(void)
> {
> if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
> static_call_update(sha1_blocks_x86, sha1_blocks_ni);
> +#if IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN)
> + } else if (boot_cpu_has(X86_FEATURE_PHE_EN)) {
> + if (boot_cpu_data.x86 >= 0x07)
> + static_call_update(sha1_blocks_x86, sha1_blocks_phe);
> +#endif
I think it should be:
	} else if (IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN) &&
		   boot_cpu_has(X86_FEATURE_PHE_EN) &&
		   boot_cpu_data.x86 >= 0x07) {
		static_call_update(sha1_blocks_x86, sha1_blocks_phe);
... so (a) the code will be parsed even when !CONFIG_CPU_SUP_ZHAOXIN,
and (b) functions won't be unnecessarily disabled when
boot_cpu_has(X86_FEATURE_PHE_EN) && boot_cpu_data.x86 < 0x07.
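
Point (b) can be seen in a simplified model of the selection chain. Everything below (struct cpu, the enum, the function names) is hypothetical and only mirrors the shape of the if/else chain, assuming the generic code is the default. With the nested check, a CPU that has PHE but a family below 7 enters the PHE branch, selects nothing, and never reaches the AVX branch:

```c
#include <stdbool.h>

enum impl { IMPL_GENERIC, IMPL_NI, IMPL_PHE, IMPL_AVX };

/* Hypothetical, simplified CPU description for illustration only. */
struct cpu {
	bool sha_ni;
	bool phe_en;
	bool avx;
	int family;
};

/* Nested check: a PHE CPU with family < 7 never reaches the AVX branch. */
static enum impl select_nested(const struct cpu *c)
{
	enum impl impl = IMPL_GENERIC;

	if (c->sha_ni) {
		impl = IMPL_NI;
	} else if (c->phe_en) {
		if (c->family >= 0x07)
			impl = IMPL_PHE;
	} else if (c->avx) {
		impl = IMPL_AVX;
	}
	return impl;
}

/* Flat check: the family test is part of the condition, so AVX stays reachable. */
static enum impl select_flat(const struct cpu *c)
{
	enum impl impl = IMPL_GENERIC;

	if (c->sha_ni) {
		impl = IMPL_NI;
	} else if (c->phe_en && c->family >= 0x07) {
		impl = IMPL_PHE;
	} else if (c->avx) {
		impl = IMPL_AVX;
	}
	return impl;
}
```

For a CPU with phe_en set, avx set and family 6, select_nested() stays on the generic code while select_flat() picks the AVX implementation.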
As before, all these comments apply to the SHA-256 patch too.
- Eric