From nobody Sat Feb 7 23:23:27 2026
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld", Herbert Xu, x86@kernel.org, Samuel Neves, Eric Biggers, stable@vger.kernel.org
Subject: [PATCH 1/6] lib/crypto: x86/blake2s: Fix 32-bit arg treated as 64-bit
Date: Sun, 2 Nov 2025 15:42:04 -0800
Message-ID: <20251102234209.62133-2-ebiggers@kernel.org>
In-Reply-To: <20251102234209.62133-1-ebiggers@kernel.org>

In the C code, the 'inc' argument to the assembly functions
blake2s_compress_ssse3() and blake2s_compress_avx512() is declared with
type u32, matching blake2s_compress().  The assembly code then reads it
from the 64-bit %rcx.  However, the ABI doesn't guarantee zero-extension
to 64 bits, nor do gcc or clang guarantee it.  Therefore, fix these
functions to read this argument from the 32-bit %ecx.

In theory, this bug could have caused the wrong 'inc' value to be used,
causing incorrect BLAKE2s hashes.  In practice, probably not: I've fixed
essentially this same bug in many other assembly files too, but there's
never been a real report of it having caused a problem.  In x86_64, all
writes to 32-bit registers are zero-extended to 64 bits.  That results
in zero-extension in nearly all situations.  I've only been able to
demonstrate a lack of zero-extension with a somewhat contrived example
involving truncation, e.g. when the C code has a u64 variable holding
0x1234567800000040 and passes it as a u32 expecting it to be truncated
to 0x40 (64).  But that's not what the real code does, of course.
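A minimal C model of the truncation case above may make the failure mode more concrete. It is not kernel code, and the helper names are hypothetical. It only models what ends up in the low 64-bit lane of the XMM register that the main loop later adds to 't' with paddq. The old movq/vmovq copied all 64 bits of %rcx, including whatever the caller left in bits 63:32. The fixed movd/vmovd copies only %ecx and zeroes the rest.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not kernel code) of the low 64-bit lane of %xmm15
 * (or %xmm5) after loading the u32 'inc' argument from %rcx.  The ABI
 * does not guarantee that bits 63:32 of %rcx are zero for a u32 argument.
 */

/* Old code: movq %rcx,%xmm15 copies all 64 bits, upper garbage included. */
static uint64_t inc_lane_from_movq_rcx(uint64_t rcx)
{
	return rcx;
}

/* Fixed code: movd %ecx,%xmm15 copies bits 31:0 and zeroes the rest. */
static uint64_t inc_lane_from_movd_ecx(uint64_t rcx)
{
	return (uint32_t)rcx;
}

/*
 * Models "paddq %xmm15,%xmm14" on the lane holding t[0] (low half) and
 * t[1] (high half): a single 64-bit addition, so a carry out of t[0]
 * propagates into t[1].
 */
static uint64_t counter_add(uint64_t t, uint64_t inc_lane)
{
	return t + inc_lane;
}
```

For example, with %rcx = 0x1234567800000040, the fixed code advances the counter by 64. The old code would have advanced it by 0x1234567800000040.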
Fixes: ed0356eda153 ("crypto: blake2s - x86_64 SIMD implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/x86/blake2s-core.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/crypto/x86/blake2s-core.S b/lib/crypto/x86/blake2s-core.S
index ef8e9f427aab..093e7814f387 100644
--- a/lib/crypto/x86/blake2s-core.S
+++ b/lib/crypto/x86/blake2s-core.S
@@ -50,11 +50,11 @@ SYM_FUNC_START(blake2s_compress_ssse3)
 	movdqu	(%rdi),%xmm0
 	movdqu	0x10(%rdi),%xmm1
 	movdqa	ROT16(%rip),%xmm12
 	movdqa	ROR328(%rip),%xmm13
 	movdqu	0x20(%rdi),%xmm14
-	movq	%rcx,%xmm15
+	movd	%ecx,%xmm15
 	leaq	SIGMA+0xa0(%rip),%r8
 	jmp	.Lbeginofloop
 	.align	32
 .Lbeginofloop:
 	movdqa	%xmm0,%xmm10
@@ -174,11 +174,11 @@ SYM_FUNC_END(blake2s_compress_ssse3)
 
 SYM_FUNC_START(blake2s_compress_avx512)
 	vmovdqu	(%rdi),%xmm0
 	vmovdqu	0x10(%rdi),%xmm1
 	vmovdqu	0x20(%rdi),%xmm4
-	vmovq	%rcx,%xmm5
+	vmovd	%ecx,%xmm5
 	vmovdqa	IV(%rip),%xmm14
 	vmovdqa	IV+16(%rip),%xmm15
 	jmp	.Lblake2s_compress_avx512_mainloop
 .align 32
 .Lblake2s_compress_avx512_mainloop:
-- 
2.51.2

From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld", Herbert Xu, x86@kernel.org, Samuel Neves, Eric Biggers
Subject: [PATCH 2/6] lib/crypto: x86/blake2s: Drop check for nblocks == 0
Date: Sun, 2 Nov 2025 15:42:05 -0800
Message-ID: <20251102234209.62133-3-ebiggers@kernel.org>
In-Reply-To: <20251102234209.62133-1-ebiggers@kernel.org>

Since blake2s_compress() is always passed nblocks != 0, remove the
unnecessary check for nblocks == 0 from blake2s_compress_ssse3().

Note that this makes it consistent with blake2s_compress_avx512() in the
same file as well as the arm32 blake2s_compress().

Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/x86/blake2s-core.S | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/lib/crypto/x86/blake2s-core.S b/lib/crypto/x86/blake2s-core.S
index 093e7814f387..aee13b97cc34 100644
--- a/lib/crypto/x86/blake2s-core.S
+++ b/lib/crypto/x86/blake2s-core.S
@@ -43,12 +43,10 @@ SIGMA2:
 	.byte	15, 5, 4, 13, 10, 7, 3, 11, 12, 2, 0, 6, 9, 8, 1, 14
 	.byte	8, 7, 14, 11, 13, 15, 0, 12, 10, 4, 5, 6, 3, 2, 1, 9
 
 .text
 SYM_FUNC_START(blake2s_compress_ssse3)
-	testq	%rdx,%rdx
-	je	.Lendofloop
 	movdqu	(%rdi),%xmm0
 	movdqu	0x10(%rdi),%xmm1
 	movdqa	ROT16(%rip),%xmm12
 	movdqa	ROR328(%rip),%xmm13
 	movdqu	0x20(%rdi),%xmm14
@@ -166,11 +164,10 @@ SYM_FUNC_START(blake2s_compress_ssse3)
 	decq	%rdx
 	jnz	.Lbeginofloop
 	movdqu	%xmm0,(%rdi)
 	movdqu	%xmm1,0x10(%rdi)
 	movdqu	%xmm14,0x20(%rdi)
-.Lendofloop:
 	RET
 SYM_FUNC_END(blake2s_compress_ssse3)
 
 SYM_FUNC_START(blake2s_compress_avx512)
 	vmovdqu	(%rdi),%xmm0
-- 
2.51.2

From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld", Herbert Xu, x86@kernel.org, Samuel Neves, Eric Biggers
Subject: [PATCH 3/6] lib/crypto: x86/blake2s: Use local labels for data
Date: Sun, 2 Nov 2025 15:42:06 -0800
Message-ID: <20251102234209.62133-4-ebiggers@kernel.org>
In-Reply-To: <20251102234209.62133-1-ebiggers@kernel.org>

Following the usual practice, prefix the names of the data labels with
".L" so that the assembler treats them as truly local.  This more
clearly expresses the intent and is less error-prone.

Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/x86/blake2s-core.S | 45 ++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/lib/crypto/x86/blake2s-core.S b/lib/crypto/x86/blake2s-core.S
index aee13b97cc34..14e487559c09 100644
--- a/lib/crypto/x86/blake2s-core.S
+++ b/lib/crypto/x86/blake2s-core.S
@@ -4,36 +4,43 @@
  * Copyright (C) 2017-2019 Samuel Neves. All Rights Reserved.
  */
 
 #include
 
-.section .rodata.cst32.BLAKE2S_IV, "aM", @progbits, 32
+.section .rodata.cst32.iv, "aM", @progbits, 32
 .align 32
-IV:	.octa 0xA54FF53A3C6EF372BB67AE856A09E667
+.Liv:
+	.octa 0xA54FF53A3C6EF372BB67AE856A09E667
 	.octa 0x5BE0CD191F83D9AB9B05688C510E527F
-.section .rodata.cst16.ROT16, "aM", @progbits, 16
+
+.section .rodata.cst16.ror16, "aM", @progbits, 16
 .align 16
-ROT16:	.octa 0x0D0C0F0E09080B0A0504070601000302
-.section .rodata.cst16.ROR328, "aM", @progbits, 16
+.Lror16:
+	.octa 0x0D0C0F0E09080B0A0504070601000302
+
+.section .rodata.cst16.ror8, "aM", @progbits, 16
 .align 16
-ROR328:	.octa 0x0C0F0E0D080B0A090407060500030201
-.section .rodata.cst64.BLAKE2S_SIGMA, "aM", @progbits, 160
+.Lror8:
+	.octa 0x0C0F0E0D080B0A090407060500030201
+
+.section .rodata.cst64.sigma, "aM", @progbits, 160
 .align 64
-SIGMA:
+.Lsigma:
 	.byte 0, 2, 4, 6, 1, 3, 5, 7, 14, 8, 10, 12, 15, 9, 11, 13
 	.byte 14, 4, 9, 13, 10, 8, 15, 6, 5, 1, 0, 11, 3, 12, 2, 7
 	.byte 11, 12, 5, 15, 8, 0, 2, 13, 9, 10, 3, 7, 4, 14, 6, 1
 	.byte 7, 3, 13, 11, 9, 1, 12, 14, 15, 2, 5, 4, 8, 6, 10, 0
 	.byte 9, 5, 2, 10, 0, 7, 4, 15, 3, 14, 11, 6, 13, 1, 12, 8
 	.byte 2, 6, 0, 8, 12, 10, 11, 3, 1, 4, 7, 15, 9, 13, 5, 14
 	.byte 12, 1, 14, 4, 5, 15, 13, 10, 8, 0, 6, 9, 11, 7, 3, 2
 	.byte 13, 7, 12, 3, 11, 14, 1, 9, 2, 5, 15, 8, 10, 0, 4, 6
 	.byte 6, 14, 11, 0, 15, 9, 3, 8, 10, 12, 13, 1, 5, 2, 7, 4
 	.byte 10, 8, 7, 1, 2, 4, 6, 5, 13, 15, 9, 3, 0, 11, 14, 12
-.section .rodata.cst64.BLAKE2S_SIGMA2, "aM", @progbits, 160
+
+.section .rodata.cst64.sigma2, "aM", @progbits, 160
 .align 64
-SIGMA2:
+.Lsigma2:
 	.byte 0, 2, 4, 6, 1, 3, 5, 7, 14, 8, 10, 12, 15, 9, 11, 13
 	.byte 8, 2, 13, 15, 10, 9, 12, 3, 6, 4, 0, 14, 5, 11, 1, 7
 	.byte 11, 13, 8, 6, 5, 10, 14, 3, 2, 4, 12, 15, 1, 0, 7, 9
 	.byte 11, 10, 7, 0, 8, 15, 1, 13, 3, 6, 2, 12, 4, 14, 9, 5
 	.byte 4, 10, 9, 14, 15, 0, 11, 8, 1, 7, 3, 13, 2, 5, 6, 12
@@ -45,25 +52,25 @@ SIGMA2:
 
 .text
 SYM_FUNC_START(blake2s_compress_ssse3)
 	movdqu	(%rdi),%xmm0
 	movdqu	0x10(%rdi),%xmm1
-	movdqa	ROT16(%rip),%xmm12
-	movdqa	ROR328(%rip),%xmm13
+	movdqa	.Lror16(%rip),%xmm12
+	movdqa	.Lror8(%rip),%xmm13
 	movdqu	0x20(%rdi),%xmm14
 	movd	%ecx,%xmm15
-	leaq	SIGMA+0xa0(%rip),%r8
+	leaq	.Lsigma+0xa0(%rip),%r8
 	jmp	.Lbeginofloop
 	.align	32
 .Lbeginofloop:
 	movdqa	%xmm0,%xmm10
 	movdqa	%xmm1,%xmm11
 	paddq	%xmm15,%xmm14
-	movdqa	IV(%rip),%xmm2
+	movdqa	.Liv(%rip),%xmm2
 	movdqa	%xmm14,%xmm3
-	pxor	IV+0x10(%rip),%xmm3
-	leaq	SIGMA(%rip),%rcx
+	pxor	.Liv+0x10(%rip),%xmm3
+	leaq	.Lsigma(%rip),%rcx
 .Lroundloop:
 	movzbl	(%rcx),%eax
 	movd	(%rsi,%rax,4),%xmm4
 	movzbl	0x1(%rcx),%eax
 	movd	(%rsi,%rax,4),%xmm5
@@ -172,12 +179,12 @@ SYM_FUNC_END(blake2s_compress_ssse3)
 SYM_FUNC_START(blake2s_compress_avx512)
 	vmovdqu	(%rdi),%xmm0
 	vmovdqu	0x10(%rdi),%xmm1
 	vmovdqu	0x20(%rdi),%xmm4
 	vmovd	%ecx,%xmm5
-	vmovdqa	IV(%rip),%xmm14
-	vmovdqa	IV+16(%rip),%xmm15
+	vmovdqa	.Liv(%rip),%xmm14
+	vmovdqa	.Liv+16(%rip),%xmm15
 	jmp	.Lblake2s_compress_avx512_mainloop
 .align 32
 .Lblake2s_compress_avx512_mainloop:
 	vmovdqa	%xmm0,%xmm10
 	vmovdqa	%xmm1,%xmm11
@@ -185,11 +192,11 @@ SYM_FUNC_START(blake2s_compress_avx512)
 	vmovdqa	%xmm14,%xmm2
 	vpxor	%xmm15,%xmm4,%xmm3
 	vmovdqu	(%rsi),%ymm6
 	vmovdqu	0x20(%rsi),%ymm7
 	addq	$0x40,%rsi
-	leaq	SIGMA2(%rip),%rax
+	leaq	.Lsigma2(%rip),%rax
 	movb	$0xa,%cl
 .Lblake2s_compress_avx512_roundloop:
 	vpmovzxbd	(%rax),%ymm8
 	vpmovzxbd	0x8(%rax),%ymm9
 	addq	$0x10,%rax
-- 
2.51.2

From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld", Herbert Xu, x86@kernel.org, Samuel Neves, Eric Biggers
Subject: [PATCH 4/6] lib/crypto: x86/blake2s: Improve readability
Date: Sun, 2 Nov 2025 15:42:07 -0800
Message-ID: <20251102234209.62133-5-ebiggers@kernel.org>
In-Reply-To: <20251102234209.62133-1-ebiggers@kernel.org>

Various cleanups for readability.  No change to the generated code:

- Add some comments
- Add #defines for arguments
- Rename some labels
- Use decimal constants instead of hex where it makes sense.
  (The pshufd immediates intentionally remain as hex.)
- Add blank lines when there's a logical break

The round loop still could use some work, but this is at least a start.

Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/x86/blake2s-core.S | 231 ++++++++++++++++++++-----------
 1 file changed, 134 insertions(+), 97 deletions(-)

diff --git a/lib/crypto/x86/blake2s-core.S b/lib/crypto/x86/blake2s-core.S
index 14e487559c09..f805a49c590d 100644
--- a/lib/crypto/x86/blake2s-core.S
+++ b/lib/crypto/x86/blake2s-core.S
@@ -48,209 +48,246 @@
 	.byte 4, 8, 15, 9, 14, 11, 13, 5, 3, 2, 1, 12, 6, 10, 7, 0
 	.byte 6, 13, 0, 14, 12, 2, 1, 11, 15, 4, 5, 8, 7, 9, 3, 10
 	.byte 15, 5, 4, 13, 10, 7, 3, 11, 12, 2, 0, 6, 9, 8, 1, 14
 	.byte 8, 7, 14, 11, 13, 15, 0, 12, 10, 4, 5, 6, 3, 2, 1, 9
 
+#define CTX	%rdi
+#define DATA	%rsi
+#define NBLOCKS	%rdx
+#define INC	%ecx
+
 .text
+//
+// void blake2s_compress_ssse3(struct blake2s_ctx *ctx,
+//			       const u8 *data, size_t nblocks, u32 inc);
+//
+// Only the first three fields of struct blake2s_ctx are used:
+//	u32 h[8];	(inout)
+//	u32 t[2];	(inout)
+//	u32 f[2];	(in)
+//
 SYM_FUNC_START(blake2s_compress_ssse3)
-	movdqu	(%rdi),%xmm0
-	movdqu	0x10(%rdi),%xmm1
+	movdqu	(CTX),%xmm0		// Load h[0..3]
+	movdqu	16(CTX),%xmm1		// Load h[4..7]
 	movdqa	.Lror16(%rip),%xmm12
 	movdqa	.Lror8(%rip),%xmm13
-	movdqu	0x20(%rdi),%xmm14
-	movd	%ecx,%xmm15
-	leaq	.Lsigma+0xa0(%rip),%r8
-	jmp	.Lbeginofloop
+	movdqu	32(CTX),%xmm14		// Load t and f
+	movd	INC,%xmm15		// Load inc
+	leaq	.Lsigma+160(%rip),%r8
+	jmp	.Lssse3_mainloop
+
+
 	.align	32
-.Lbeginofloop:
-	movdqa	%xmm0,%xmm10
-	movdqa	%xmm1,%xmm11
-	paddq	%xmm15,%xmm14
-	movdqa	.Liv(%rip),%xmm2
+.Lssse3_mainloop:
+	// Main loop: each iteration processes one 64-byte block.
+	movdqa	%xmm0,%xmm10	// Save h[0..3] and let v[0..3] = h[0..3]
+	movdqa	%xmm1,%xmm11	// Save h[4..7] and let v[4..7] = h[4..7]
+	paddq	%xmm15,%xmm14	// t += inc (64-bit addition)
+	movdqa	.Liv(%rip),%xmm2	// v[8..11] = iv[0..3]
 	movdqa	%xmm14,%xmm3
-	pxor	.Liv+0x10(%rip),%xmm3
+	pxor	.Liv+16(%rip),%xmm3	// v[12..15] = iv[4..7] ^ [t, f]
 	leaq	.Lsigma(%rip),%rcx
-.Lroundloop:
+
+.Lssse3_roundloop:
+	// Round loop: each iteration does 1 round (of 10 rounds total).
 	movzbl	(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm4
-	movzbl	0x1(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm5
-	movzbl	0x2(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm6
-	movzbl	0x3(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm7
+	movd	(DATA,%rax,4),%xmm4
+	movzbl	1(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm5
+	movzbl	2(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm6
+	movzbl	3(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm7
 	punpckldq	%xmm5,%xmm4
 	punpckldq	%xmm7,%xmm6
 	punpcklqdq	%xmm6,%xmm4
 	paddd	%xmm4,%xmm0
 	paddd	%xmm1,%xmm0
 	pxor	%xmm0,%xmm3
 	pshufb	%xmm12,%xmm3
 	paddd	%xmm3,%xmm2
 	pxor	%xmm2,%xmm1
 	movdqa	%xmm1,%xmm8
-	psrld	$0xc,%xmm1
-	pslld	$0x14,%xmm8
+	psrld	$12,%xmm1
+	pslld	$20,%xmm8
 	por	%xmm8,%xmm1
-	movzbl	0x4(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm5
-	movzbl	0x5(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm6
-	movzbl	0x6(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm7
-	movzbl	0x7(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm4
+	movzbl	4(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm5
+	movzbl	5(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm6
+	movzbl	6(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm7
+	movzbl	7(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm4
 	punpckldq	%xmm6,%xmm5
 	punpckldq	%xmm4,%xmm7
 	punpcklqdq	%xmm7,%xmm5
 	paddd	%xmm5,%xmm0
 	paddd	%xmm1,%xmm0
 	pxor	%xmm0,%xmm3
 	pshufb	%xmm13,%xmm3
 	paddd	%xmm3,%xmm2
 	pxor	%xmm2,%xmm1
 	movdqa	%xmm1,%xmm8
-	psrld	$0x7,%xmm1
-	pslld	$0x19,%xmm8
+	psrld	$7,%xmm1
+	pslld	$25,%xmm8
 	por	%xmm8,%xmm1
 	pshufd	$0x93,%xmm0,%xmm0
 	pshufd	$0x4e,%xmm3,%xmm3
 	pshufd	$0x39,%xmm2,%xmm2
-	movzbl	0x8(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm6
-	movzbl	0x9(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm7
-	movzbl	0xa(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm4
-	movzbl	0xb(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm5
+	movzbl	8(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm6
+	movzbl	9(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm7
+	movzbl	10(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm4
+	movzbl	11(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm5
 	punpckldq	%xmm7,%xmm6
 	punpckldq	%xmm5,%xmm4
 	punpcklqdq	%xmm4,%xmm6
 	paddd	%xmm6,%xmm0
 	paddd	%xmm1,%xmm0
 	pxor	%xmm0,%xmm3
 	pshufb	%xmm12,%xmm3
 	paddd	%xmm3,%xmm2
 	pxor	%xmm2,%xmm1
 	movdqa	%xmm1,%xmm8
-	psrld	$0xc,%xmm1
-	pslld	$0x14,%xmm8
+	psrld	$12,%xmm1
+	pslld	$20,%xmm8
 	por	%xmm8,%xmm1
-	movzbl	0xc(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm7
-	movzbl	0xd(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm4
-	movzbl	0xe(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm5
-	movzbl	0xf(%rcx),%eax
-	movd	(%rsi,%rax,4),%xmm6
+	movzbl	12(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm7
+	movzbl	13(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm4
+	movzbl	14(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm5
+	movzbl	15(%rcx),%eax
+	movd	(DATA,%rax,4),%xmm6
 	punpckldq	%xmm4,%xmm7
 	punpckldq	%xmm6,%xmm5
 	punpcklqdq	%xmm5,%xmm7
 	paddd	%xmm7,%xmm0
 	paddd	%xmm1,%xmm0
 	pxor	%xmm0,%xmm3
 	pshufb	%xmm13,%xmm3
 	paddd	%xmm3,%xmm2
 	pxor	%xmm2,%xmm1
 	movdqa	%xmm1,%xmm8
-	psrld	$0x7,%xmm1
-	pslld	$0x19,%xmm8
+	psrld	$7,%xmm1
+	pslld	$25,%xmm8
 	por	%xmm8,%xmm1
 	pshufd	$0x39,%xmm0,%xmm0
 	pshufd	$0x4e,%xmm3,%xmm3
 	pshufd	$0x93,%xmm2,%xmm2
-	addq	$0x10,%rcx
+	addq	$16,%rcx
 	cmpq	%r8,%rcx
-	jnz	.Lroundloop
+	jnz	.Lssse3_roundloop
+
+	// Compute the new h: h[0..7] ^= v[0..7] ^ v[8..15]
 	pxor	%xmm2,%xmm0
 	pxor	%xmm3,%xmm1
 	pxor	%xmm10,%xmm0
 	pxor	%xmm11,%xmm1
-	addq	$0x40,%rsi
-	decq	%rdx
-	jnz	.Lbeginofloop
-	movdqu	%xmm0,(%rdi)
-	movdqu	%xmm1,0x10(%rdi)
-	movdqu	%xmm14,0x20(%rdi)
+	addq	$64,DATA
+	decq	NBLOCKS
+	jnz	.Lssse3_mainloop
+
+	movdqu	%xmm0,(CTX)		// Store new h[0..3]
+	movdqu	%xmm1,16(CTX)		// Store new h[4..7]
+	movdqu	%xmm14,32(CTX)		// Store new t and f
 	RET
 SYM_FUNC_END(blake2s_compress_ssse3)
 
+//
+// void blake2s_compress_avx512(struct blake2s_ctx *ctx,
+//				const u8 *data, size_t nblocks, u32 inc);
+//
+// Only the first three fields of struct blake2s_ctx are used:
+//	u32 h[8];	(inout)
+//	u32 t[2];	(inout)
+//	u32 f[2];	(in)
+//
 SYM_FUNC_START(blake2s_compress_avx512)
-	vmovdqu	(%rdi),%xmm0
-	vmovdqu	0x10(%rdi),%xmm1
-	vmovdqu	0x20(%rdi),%xmm4
-	vmovd	%ecx,%xmm5
-	vmovdqa	.Liv(%rip),%xmm14
-	vmovdqa	.Liv+16(%rip),%xmm15
-	jmp	.Lblake2s_compress_avx512_mainloop
-.align 32
-.Lblake2s_compress_avx512_mainloop:
-	vmovdqa	%xmm0,%xmm10
-	vmovdqa	%xmm1,%xmm11
-	vpaddq	%xmm5,%xmm4,%xmm4
-	vmovdqa	%xmm14,%xmm2
-	vpxor	%xmm15,%xmm4,%xmm3
-	vmovdqu	(%rsi),%ymm6
-	vmovdqu	0x20(%rsi),%ymm7
-	addq	$0x40,%rsi
+	vmovdqu	(CTX),%xmm0		// Load h[0..3]
+	vmovdqu	16(CTX),%xmm1		// Load h[4..7]
+	vmovdqu	32(CTX),%xmm4		// Load t and f
+	vmovd	INC,%xmm5		// Load inc
+	vmovdqa	.Liv(%rip),%xmm14	// Load iv[0..3]
+	vmovdqa	.Liv+16(%rip),%xmm15	// Load iv[4..7]
+	jmp	.Lavx512_mainloop
+
+	.align	32
+.Lavx512_mainloop:
+	// Main loop: each iteration processes one 64-byte block.
+	vmovdqa	%xmm0,%xmm10	// Save h[0..3] and let v[0..3] = h[0..3]
+	vmovdqa	%xmm1,%xmm11	// Save h[4..7] and let v[4..7] = h[4..7]
+	vpaddq	%xmm5,%xmm4,%xmm4	// t += inc (64-bit addition)
+	vmovdqa	%xmm14,%xmm2	// v[8..11] = iv[0..3]
+	vpxor	%xmm15,%xmm4,%xmm3	// v[12..15] = iv[4..7] ^ [t, f]
+	vmovdqu	(DATA),%ymm6	// Load first 8 data words
+	vmovdqu	32(DATA),%ymm7	// Load second 8 data words
+	addq	$64,DATA
 	leaq	.Lsigma2(%rip),%rax
-	movb	$0xa,%cl
-.Lblake2s_compress_avx512_roundloop:
+	movb	$10,%cl		// Set num rounds remaining
+
+.Lavx512_roundloop:
+	// Round loop: each iteration does 1 round (of 10 rounds total).
 	vpmovzxbd	(%rax),%ymm8
-	vpmovzxbd	0x8(%rax),%ymm9
-	addq	$0x10,%rax
+	vpmovzxbd	8(%rax),%ymm9
+	addq	$16,%rax
 	vpermi2d	%ymm7,%ymm6,%ymm8
 	vpermi2d	%ymm7,%ymm6,%ymm9
 	vmovdqa	%ymm8,%ymm6
 	vmovdqa	%ymm9,%ymm7
 	vpaddd	%xmm8,%xmm0,%xmm0
 	vpaddd	%xmm1,%xmm0,%xmm0
 	vpxor	%xmm0,%xmm3,%xmm3
-	vprord	$0x10,%xmm3,%xmm3
+	vprord	$16,%xmm3,%xmm3
 	vpaddd	%xmm3,%xmm2,%xmm2
 	vpxor	%xmm2,%xmm1,%xmm1
-	vprord	$0xc,%xmm1,%xmm1
-	vextracti128	$0x1,%ymm8,%xmm8
+	vprord	$12,%xmm1,%xmm1
+	vextracti128	$1,%ymm8,%xmm8
 	vpaddd	%xmm8,%xmm0,%xmm0
 	vpaddd	%xmm1,%xmm0,%xmm0
 	vpxor	%xmm0,%xmm3,%xmm3
-	vprord	$0x8,%xmm3,%xmm3
+	vprord	$8,%xmm3,%xmm3
 	vpaddd	%xmm3,%xmm2,%xmm2
 	vpxor	%xmm2,%xmm1,%xmm1
-	vprord	$0x7,%xmm1,%xmm1
+	vprord	$7,%xmm1,%xmm1
 	vpshufd	$0x93,%xmm0,%xmm0
 	vpshufd	$0x4e,%xmm3,%xmm3
 	vpshufd	$0x39,%xmm2,%xmm2
 	vpaddd	%xmm9,%xmm0,%xmm0
 	vpaddd	%xmm1,%xmm0,%xmm0
 	vpxor	%xmm0,%xmm3,%xmm3
-	vprord	$0x10,%xmm3,%xmm3
+	vprord	$16,%xmm3,%xmm3
 	vpaddd	%xmm3,%xmm2,%xmm2
 	vpxor	%xmm2,%xmm1,%xmm1
-	vprord	$0xc,%xmm1,%xmm1
-	vextracti128	$0x1,%ymm9,%xmm9
+	vprord	$12,%xmm1,%xmm1
+	vextracti128	$1,%ymm9,%xmm9
 	vpaddd	%xmm9,%xmm0,%xmm0
 	vpaddd	%xmm1,%xmm0,%xmm0
 	vpxor	%xmm0,%xmm3,%xmm3
-	vprord	$0x8,%xmm3,%xmm3
+	vprord	$8,%xmm3,%xmm3
 	vpaddd	%xmm3,%xmm2,%xmm2
 	vpxor	%xmm2,%xmm1,%xmm1
-	vprord	$0x7,%xmm1,%xmm1
+	vprord	$7,%xmm1,%xmm1
 	vpshufd	$0x39,%xmm0,%xmm0
 	vpshufd	$0x4e,%xmm3,%xmm3
 	vpshufd	$0x93,%xmm2,%xmm2
 	decb	%cl
-	jne	.Lblake2s_compress_avx512_roundloop
+	jne	.Lavx512_roundloop
+
+	// Compute the new h: h[0..7] ^= v[0..7] ^ v[8..15]
 	vpxor	%xmm10,%xmm0,%xmm0
 	vpxor	%xmm11,%xmm1,%xmm1
 	vpxor	%xmm2,%xmm0,%xmm0
 	vpxor	%xmm3,%xmm1,%xmm1
-	decq	%rdx
-	jne	.Lblake2s_compress_avx512_mainloop
-	vmovdqu	%xmm0,(%rdi)
-	vmovdqu	%xmm1,0x10(%rdi)
-	vmovdqu	%xmm4,0x20(%rdi)
+	decq	NBLOCKS
+	jne	.Lavx512_mainloop
+
+	vmovdqu	%xmm0,(CTX)		// Store new h[0..3]
+	vmovdqu	%xmm1,16(CTX)		// Store new h[4..7]
+	vmovdqu	%xmm4,32(CTX)		// Store new t and f
 	vzeroupper
 	RET
 SYM_FUNC_END(blake2s_compress_avx512)
-- 
2.51.2

From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld", Herbert Xu, x86@kernel.org, Samuel Neves, Eric Biggers
Subject: [PATCH 5/6] lib/crypto: x86/blake2s: Avoid writing back unchanged 'f' value
Date: Sun, 2 Nov 2025 15:42:08 -0800
Message-ID: <20251102234209.62133-6-ebiggers@kernel.org>
In-Reply-To: <20251102234209.62133-1-ebiggers@kernel.org>

Just before returning, blake2s_compress_ssse3() and
blake2s_compress_avx512() store updated values to the 'h', 't', and 'f'
fields of struct blake2s_ctx.  But 'f' is always unchanged (which is
correct; only the C code changes it).  So, there's no need to write to
'f'.  Use 64-bit stores (movq and vmovq) instead of 128-bit stores
(movdqu and vmovdqu) so that only 't' is written.
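For reference, here is a portable C sketch of what blake2s_compress_ssse3() and blake2s_compress_avx512() compute. It follows the comments added in patch 4:

- v[0..7] = h
- v[8..15] = iv, with [t, f] XORed into v[12..15]
- ten rounds
- h ^= v[0..7] ^ v[8..15]

It is an illustration, not the kernel's generic implementation. The struct and function names are hypothetical, and it uses the standard BLAKE2s message schedule from RFC 7693 rather than the SIMD-permuted .Lsigma/.Lsigma2 tables. It writes back h and t but only reads f, which is why a 64-bit store of t is sufficient.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first three fields of struct blake2s_ctx. */
struct blake2s_ctx_sketch {
	uint32_t h[8];	/* chaining value (inout) */
	uint32_t t[2];	/* 64-bit byte counter, low word first (inout) */
	uint32_t f[2];	/* finalization flags (in) */
};

static const uint32_t blake2s_iv[8] = {
	0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
	0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
};

/* Standard BLAKE2s message schedule (RFC 7693), one row per round. */
static const uint8_t blake2s_sigma[10][16] = {
	{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
	{ 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 },
	{ 11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4 },
	{ 7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8 },
	{ 9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13 },
	{ 2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9 },
	{ 12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11 },
	{ 13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10 },
	{ 6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5 },
	{ 10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0 },
};

static uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* The G function, with the same rotation amounts (16, 12, 8, 7) as the asm. */
static void blake2s_g(uint32_t v[16], int a, int b, int c, int d,
		      uint32_t x, uint32_t y)
{
	v[a] += v[b] + x; v[d] = ror32(v[d] ^ v[a], 16);
	v[c] += v[d];     v[b] = ror32(v[b] ^ v[c], 12);
	v[a] += v[b] + y; v[d] = ror32(v[d] ^ v[a], 8);
	v[c] += v[d];     v[b] = ror32(v[b] ^ v[c], 7);
}

static void blake2s_compress_sketch(struct blake2s_ctx_sketch *ctx,
				    const uint8_t *data, size_t nblocks,
				    uint32_t inc)
{
	uint32_t m[16], v[16];

	while (nblocks--) {	/* unlike the asm, also tolerates nblocks == 0 */
		/* t += inc as one 64-bit addition (paddq/vpaddq in the asm) */
		uint64_t t = ((uint64_t)ctx->t[1] << 32 | ctx->t[0]) + inc;

		ctx->t[0] = (uint32_t)t;
		ctx->t[1] = (uint32_t)(t >> 32);
		for (int i = 0; i < 16; i++)
			m[i] = get_le32(data + 4 * i);
		for (int i = 0; i < 8; i++) {
			v[i] = ctx->h[i];		/* v[0..7] = h[0..7] */
			v[i + 8] = blake2s_iv[i];	/* v[8..15] = iv[0..7] */
		}
		v[12] ^= ctx->t[0];	/* v[12..15] ^= [t, f] */
		v[13] ^= ctx->t[1];
		v[14] ^= ctx->f[0];
		v[15] ^= ctx->f[1];
		for (int r = 0; r < 10; r++) {
			const uint8_t *s = blake2s_sigma[r];

			blake2s_g(v, 0, 4, 8, 12, m[s[0]], m[s[1]]);	/* columns */
			blake2s_g(v, 1, 5, 9, 13, m[s[2]], m[s[3]]);
			blake2s_g(v, 2, 6, 10, 14, m[s[4]], m[s[5]]);
			blake2s_g(v, 3, 7, 11, 15, m[s[6]], m[s[7]]);
			blake2s_g(v, 0, 5, 10, 15, m[s[8]], m[s[9]]);	/* diagonals */
			blake2s_g(v, 1, 6, 11, 12, m[s[10]], m[s[11]]);
			blake2s_g(v, 2, 7, 8, 13, m[s[12]], m[s[13]]);
			blake2s_g(v, 3, 4, 9, 14, m[s[14]], m[s[15]]);
		}
		for (int i = 0; i < 8; i++)	/* h ^= v[0..7] ^ v[8..15] */
			ctx->h[i] ^= v[i] ^ v[i + 8];
		data += 64;
	}
	/* ctx->f is only read, never written. */
}
```

As a usage example, you can hash the single block "abc" with this sketch. Initialize h to the IV, then apply h[0] ^= 0x01010020 for an unkeyed 32-byte digest. Set f[0] = 0xffffffff to mark the last block, and use inc = 3. The result should reproduce the BLAKE2s-256 test vector in RFC 7693, Appendix B, whose digest begins 50 8c 5e 8c.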
Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/x86/blake2s-core.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/crypto/x86/blake2s-core.S b/lib/crypto/x86/blake2s-core.S
index f805a49c590d..869064f6ac16 100644
--- a/lib/crypto/x86/blake2s-core.S
+++ b/lib/crypto/x86/blake2s-core.S
@@ -191,11 +191,11 @@ SYM_FUNC_START(blake2s_compress_ssse3)
 	decq	NBLOCKS
 	jnz	.Lssse3_mainloop
 
 	movdqu	%xmm0,(CTX)		// Store new h[0..3]
 	movdqu	%xmm1,16(CTX)		// Store new h[4..7]
-	movdqu	%xmm14,32(CTX)		// Store new t and f
+	movq	%xmm14,32(CTX)		// Store new t (f is unchanged)
 	RET
 SYM_FUNC_END(blake2s_compress_ssse3)
 
 //
 // void blake2s_compress_avx512(struct blake2s_ctx *ctx,
@@ -285,9 +285,9 @@ SYM_FUNC_START(blake2s_compress_avx512)
 	decq	NBLOCKS
 	jne	.Lavx512_mainloop
 
 	vmovdqu	%xmm0,(CTX)		// Store new h[0..3]
 	vmovdqu	%xmm1,16(CTX)		// Store new h[4..7]
-	vmovdqu	%xmm4,32(CTX)		// Store new t and f
+	vmovq	%xmm4,32(CTX)		// Store new t (f is unchanged)
 	vzeroupper
 	RET
 SYM_FUNC_END(blake2s_compress_avx512)
-- 
2.51.2
b=U3UxjFKPxqdNe1QKkNUVnJucVxGEFiYUQ049pye3p2LOfYZE1CzFkwHktGI6NSwhxPbfWbXaGALJ8phynMgww2WfJf0qGO4SVM8g1EdgiTrJ1DFJhUj+qglefFh0BzMvaDkbjNWGLKTgdQHJfmklM68qbHJxHClSBCB2gwct9lI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=GtVTM6xJ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="GtVTM6xJ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7AC21C19423; Sun, 2 Nov 2025 23:44:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1762127044; bh=j+LSbTUig2K+EmVWaRZVVpXdTDuVAi/PAB71FfKJnIs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GtVTM6xJL/gvPY+zJsdv8/aaTPE5/cHj/iGZYsJ0X3+DOisrk8LuSao0kmvFtknQ8 Wu8fQnerrUXuSTurB7CmlXX8TfKbc0IfNJDBBNWIuCo5zAaHDmDQUvqJ1F+BYIZ5q3 ZyYUg0MdounUp9QUFldvD6ITKq3E+z31/5Qk2mK9hO/EBPbBFKupJBV7b9LMUSNYH4 /OIjUeQ2jhaPc+eqwrhKximaFVSi30aDC8ocVnXGCYWA6tXMUEmVeFOhnKR5gHyykv ZNOdTvGJv6KngG7E8829H437XRN8qV3lwA4dRdNP9tH+wa90yGAEcg2eRlLxVIaTFC dP+ZhJXQcHeDA== From: Eric Biggers To: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel , "Jason A . Donenfeld" , Herbert Xu , x86@kernel.org, Samuel Neves , Eric Biggers Subject: [PATCH 6/6] lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORs Date: Sun, 2 Nov 2025 15:42:09 -0800 Message-ID: <20251102234209.62133-7-ebiggers@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20251102234209.62133-1-ebiggers@kernel.org> References: <20251102234209.62133-1-ebiggers@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" AVX-512 supports 3-input XORs via the vpternlogd (or vpternlogq) instruction with immediate 0x96. This approach, vs. 
the alternative of two vpxor instructions, is already used in the CRC, AES-GCM, and AES-XTS code, since it reduces the instruction count and is faster on some CPUs. Make blake2s_compress_avx512() take advantage of it too. Signed-off-by: Eric Biggers Reviewed-by: Ard Biesheuvel --- lib/crypto/x86/blake2s-core.S | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/lib/crypto/x86/blake2s-core.S b/lib/crypto/x86/blake2s-core.S index 869064f6ac16..7b1d98ca7482 100644 --- a/lib/crypto/x86/blake2s-core.S +++ b/lib/crypto/x86/blake2s-core.S @@ -276,14 +276,12 @@ SYM_FUNC_START(blake2s_compress_avx512) vpshufd $0x93,%xmm2,%xmm2 decb %cl jne .Lavx512_roundloop =20 // Compute the new h: h[0..7] ^=3D v[0..7] ^ v[8..15] - vpxor %xmm10,%xmm0,%xmm0 - vpxor %xmm11,%xmm1,%xmm1 - vpxor %xmm2,%xmm0,%xmm0 - vpxor %xmm3,%xmm1,%xmm1 + vpternlogd $0x96,%xmm10,%xmm2,%xmm0 + vpternlogd $0x96,%xmm11,%xmm3,%xmm1 decq NBLOCKS jne .Lavx512_mainloop =20 vmovdqu %xmm0,(CTX) // Store new h[0..3] vmovdqu %xmm1,16(CTX) // Store new h[4..7] --=20 2.51.2