From nobody Tue Oct 7 19:50:50 2025
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld",
 linux-arm-kernel@lists.infradead.org, x86@kernel.org, Eric Biggers
Subject: [PATCH 1/5] lib/crypto: arm/poly1305: Remove unneeded empty weak function
Date: Sun, 6 Jul 2025 16:10:56 -0700
Message-ID: <20250706231100.176113-2-ebiggers@kernel.org>
In-Reply-To: <20250706231100.176113-1-ebiggers@kernel.org>
References: <20250706231100.176113-1-ebiggers@kernel.org>

The __weak and empty definition of poly1305_blocks_neon() was a
workaround to prevent link errors when CONFIG_KERNEL_MODE_NEON=n, as
compilers didn't always optimize out the call.  This call is now
guarded by IS_ENABLED(CONFIG_KERNEL_MODE_NEON).  That guarantees the
call is removed at compile time when NEON support is disabled.
Therefore, the workaround is no longer needed.
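A minimal sketch of the guard being referred to, simplified from the
caller in poly1305_blocks_arch() (the loop and scalar fallback are
omitted here):

	if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
	    static_branch_likely(&have_neon)) {
		/*
		 * With CONFIG_KERNEL_MODE_NEON=n, IS_ENABLED() expands to the
		 * constant 0, the whole branch is dead code, and the compiler
		 * drops this call before the linker ever needs a definition
		 * of poly1305_blocks_neon().
		 */
		poly1305_blocks_neon(state, src, len, padbit);
	}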
Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/arm/poly1305-glue.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/lib/crypto/arm/poly1305-glue.c b/lib/crypto/arm/poly1305-glue.c
index 2603b0771f2c..5b65b840c166 100644
--- a/lib/crypto/arm/poly1305-glue.c
+++ b/lib/crypto/arm/poly1305-glue.c
@@ -25,15 +25,10 @@ asmlinkage void poly1305_blocks_neon(struct poly1305_block_state *state,
 asmlinkage void poly1305_emit_arch(const struct poly1305_state *state,
				    u8 digest[POLY1305_DIGEST_SIZE],
				    const u32 nonce[4]);
 EXPORT_SYMBOL_GPL(poly1305_emit_arch);
 
-void __weak poly1305_blocks_neon(struct poly1305_block_state *state,
-				 const u8 *src, u32 len, u32 hibit)
-{
-}
-
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
 
 void poly1305_blocks_arch(struct poly1305_block_state *state, const u8 *src,
			   unsigned int len, u32 padbit)
 {
-- 
2.50.0
From nobody Tue Oct 7 19:50:50 2025
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld",
 linux-arm-kernel@lists.infradead.org, x86@kernel.org, Eric Biggers,
 stable@vger.kernel.org
Subject: [PATCH 2/5] lib/crypto: arm/poly1305: Fix register corruption in no-SIMD contexts
Date: Sun, 6 Jul 2025 16:10:57 -0700
Message-ID: <20250706231100.176113-3-ebiggers@kernel.org>
In-Reply-To: <20250706231100.176113-1-ebiggers@kernel.org>
References: <20250706231100.176113-1-ebiggers@kernel.org>

Restore the SIMD usability check that was removed by commit
773426f4771b ("crypto: arm/poly1305 - Add block-only interface").  This
safety check is cheap and well worth it to eliminate a footgun.  While
the Poly1305 functions *should* be called only where SIMD registers are
usable, if they are called from a no-SIMD context anyway, they should
just do the right thing instead of corrupting random tasks' registers
and/or computing incorrect MACs.  Fixing this is also needed for
poly1305_kunit to pass.

Just use may_use_simd() instead of the original crypto_simd_usable(),
since poly1305_kunit won't rely on crypto_simd_disabled_for_test.

Fixes: 773426f4771b ("crypto: arm/poly1305 - Add block-only interface")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/arm/poly1305-glue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/crypto/arm/poly1305-glue.c b/lib/crypto/arm/poly1305-glue.c
index 5b65b840c166..2d86c78af883 100644
--- a/lib/crypto/arm/poly1305-glue.c
+++ b/lib/crypto/arm/poly1305-glue.c
@@ -5,10 +5,11 @@
  * Copyright (C) 2019 Linaro Ltd.
  */
 
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
@@ -32,11 +33,11 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
 void poly1305_blocks_arch(struct poly1305_block_state *state, const u8 *src,
			   unsigned int len, u32 padbit)
 {
	len = round_down(len, POLY1305_BLOCK_SIZE);
	if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
-	    static_branch_likely(&have_neon)) {
+	    static_branch_likely(&have_neon) && likely(may_use_simd())) {
		do {
			unsigned int todo = min_t(unsigned int, len, SZ_4K);
 
			kernel_neon_begin();
			poly1305_blocks_neon(state, src, todo, padbit);
-- 
2.50.0
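The control flow in poly1305_blocks_arch() that the restored check
produces looks roughly like the sketch below; kernel_neon_end() and the
length bookkeeping are filled in from the surrounding code, and
scalar_blocks() is a hypothetical stand-in for the scalar Poly1305
fallback:

	if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
	    static_branch_likely(&have_neon) && likely(may_use_simd())) {
		do {
			unsigned int todo = min_t(unsigned int, len, SZ_4K);

			/* may_use_simd() guarantees a context where saving
			 * and restoring the NEON registers is allowed. */
			kernel_neon_begin();
			poly1305_blocks_neon(state, src, todo, padbit);
			kernel_neon_end();

			len -= todo;
			src += todo;
		} while (len);
	} else {
		/* No-SIMD context (e.g. hardirq): use the scalar code rather
		 * than clobbering NEON state that belongs to another task. */
		scalar_blocks(state, src, len, padbit);
	}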
From nobody Tue Oct 7 19:50:50 2025
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld",
 linux-arm-kernel@lists.infradead.org, x86@kernel.org, Eric Biggers,
 stable@vger.kernel.org
Subject: [PATCH 3/5] lib/crypto: arm64/poly1305: Fix register corruption in no-SIMD contexts
Date: Sun, 6 Jul 2025 16:10:58 -0700
Message-ID: <20250706231100.176113-4-ebiggers@kernel.org>
In-Reply-To: <20250706231100.176113-1-ebiggers@kernel.org>
References: <20250706231100.176113-1-ebiggers@kernel.org>

Restore the SIMD usability check that was removed by commit
a59e5468a921 ("crypto: arm64/poly1305 - Add block-only interface").
This safety check is cheap and well worth it to eliminate a footgun.
While the Poly1305 functions *should* be called only where SIMD
registers are usable, if they are called from a no-SIMD context anyway,
they should just do the right thing instead of corrupting random tasks'
registers and/or computing incorrect MACs.  Fixing this is also needed
for poly1305_kunit to pass.

Just use may_use_simd() instead of the original crypto_simd_usable(),
since poly1305_kunit won't rely on crypto_simd_disabled_for_test.

Fixes: a59e5468a921 ("crypto: arm64/poly1305 - Add block-only interface")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/arm64/poly1305-glue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/crypto/arm64/poly1305-glue.c b/lib/crypto/arm64/poly1305-glue.c
index c9a74766785b..31aea21ce42f 100644
--- a/lib/crypto/arm64/poly1305-glue.c
+++ b/lib/crypto/arm64/poly1305-glue.c
@@ -5,10 +5,11 @@
  * Copyright (C) 2019 Linaro Ltd.
  */
 
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
@@ -31,11 +32,11 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
 
 void poly1305_blocks_arch(struct poly1305_block_state *state, const u8 *src,
			   unsigned int len, u32 padbit)
 {
	len = round_down(len, POLY1305_BLOCK_SIZE);
-	if (static_branch_likely(&have_neon)) {
+	if (static_branch_likely(&have_neon) && likely(may_use_simd())) {
		do {
			unsigned int todo = min_t(unsigned int, len, SZ_4K);
 
			kernel_neon_begin();
			poly1305_blocks_neon(state, src, todo, padbit);
-- 
2.50.0

From nobody Tue Oct 7 19:50:50 2025
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld",
 linux-arm-kernel@lists.infradead.org, x86@kernel.org, Eric Biggers,
 stable@vger.kernel.org
Subject: [PATCH 4/5] lib/crypto: x86/poly1305: Fix register corruption in no-SIMD contexts
Date: Sun, 6 Jul 2025 16:10:59 -0700
Message-ID: <20250706231100.176113-5-ebiggers@kernel.org>
In-Reply-To: <20250706231100.176113-1-ebiggers@kernel.org>
References: <20250706231100.176113-1-ebiggers@kernel.org>

Restore the SIMD usability check and base conversion that were removed
by commit 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only
interface").  This safety check is cheap and well worth it to eliminate
a footgun.  While the Poly1305 functions *should* be called only where
SIMD registers are usable, if they are called from a no-SIMD context
anyway, they should just do the right thing instead of corrupting
random tasks' registers and/or computing incorrect MACs.  Fixing this
is also needed for poly1305_kunit to pass.

Just use irq_fpu_usable() instead of the original crypto_simd_usable(),
since poly1305_kunit won't rely on crypto_simd_disabled_for_test.

Fixes: 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only interface")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/x86/poly1305_glue.c | 40 +++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/lib/crypto/x86/poly1305_glue.c b/lib/crypto/x86/poly1305_glue.c
index b7e78a583e07..968d84677631 100644
--- a/lib/crypto/x86/poly1305_glue.c
+++ b/lib/crypto/x86/poly1305_glue.c
@@ -23,10 +23,46 @@ struct poly1305_arch_internal {
	u64 r[2];
	u64 pad;
	struct { u32 r2, r1, r4, r3; } rn[9];
 };
 
+/*
+ * The AVX code uses base 2^26, while the scalar code uses base 2^64. If we hit
+ * the unfortunate situation of using AVX and then having to go back to scalar
+ * -- because the user is silly and has called the update function from two
+ * separate contexts -- then we need to convert back to the original base before
+ * proceeding. It is possible to reason that the initial reduction below is
+ * sufficient given the implementation invariants. However, for an avoidance of
+ * doubt and because this is not performance critical, we do the full reduction
+ * anyway. Z3 proof of below function: https://xn--4db.cc/ltPtHCKN/py
+ */
+static void convert_to_base2_64(void *ctx)
+{
+	struct poly1305_arch_internal *state = ctx;
+	u32 cy;
+
+	if (!state->is_base2_26)
+		return;
+
+	cy = state->h[0] >> 26; state->h[0] &= 0x3ffffff; state->h[1] += cy;
+	cy = state->h[1] >> 26; state->h[1] &= 0x3ffffff; state->h[2] += cy;
+	cy = state->h[2] >> 26; state->h[2] &= 0x3ffffff; state->h[3] += cy;
+	cy = state->h[3] >> 26; state->h[3] &= 0x3ffffff; state->h[4] += cy;
+	state->hs[0] = ((u64)state->h[2] << 52) | ((u64)state->h[1] << 26) | state->h[0];
+	state->hs[1] = ((u64)state->h[4] << 40) | ((u64)state->h[3] << 14) | (state->h[2] >> 12);
+	state->hs[2] = state->h[4] >> 24;
+	/* Unsigned Less Than: branchlessly produces 1 if a < b, else 0. */
+#define ULT(a, b) ((a ^ ((a ^ b) | ((a - b) ^ b))) >> (sizeof(a) * 8 - 1))
+	cy = (state->hs[2] >> 2) + (state->hs[2] & ~3ULL);
+	state->hs[2] &= 3;
+	state->hs[0] += cy;
+	state->hs[1] += (cy = ULT(state->hs[0], cy));
+	state->hs[2] += ULT(state->hs[1], cy);
+#undef ULT
+	state->is_base2_26 = 0;
+}
+
 asmlinkage void poly1305_block_init_arch(
	struct poly1305_block_state *state,
	const u8 raw_key[POLY1305_BLOCK_SIZE]);
 EXPORT_SYMBOL_GPL(poly1305_block_init_arch);
 asmlinkage void poly1305_blocks_x86_64(struct poly1305_arch_internal *ctx,
@@ -60,11 +96,13 @@ void poly1305_blocks_arch(struct poly1305_block_state *state, const u8 *inp,
 
	/* SIMD disables preemption, so relax after processing each page. */
	BUILD_BUG_ON(SZ_4K < POLY1305_BLOCK_SIZE || SZ_4K % POLY1305_BLOCK_SIZE);
 
-	if (!static_branch_likely(&poly1305_use_avx)) {
+	if (!static_branch_likely(&poly1305_use_avx) ||
+	    unlikely(!irq_fpu_usable())) {
+		convert_to_base2_64(ctx);
		poly1305_blocks_x86_64(ctx, inp, len, padbit);
		return;
	}
 
	do {
-- 
2.50.0
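The repacking in convert_to_base2_64() above rewrites the same
accumulator value (h[0] + h[1]*2^26 + h[2]*2^52 + h[3]*2^78 +
h[4]*2^104) from five 26-bit limbs into 64-bit words.  A small
standalone check, for illustration only: it uses separate h[]/hs[]
arrays where the kernel converts the accumulator in place, verifies
that the packing round-trips, and exercises the branchless ULT()
comparison:

	#include <assert.h>
	#include <stdint.h>

	/* Branchless "a < b" for unsigned operands, same expression as the patch. */
	#define ULT(a, b) ((a ^ ((a ^ b) | ((a - b) ^ b))) >> (sizeof(a) * 8 - 1))

	int main(void)
	{
		/* Arbitrary base 2^26 limbs, each already reduced below 2^26. */
		uint32_t h[5] = { 0x3ffffff, 0x1234567, 0x2abcdef, 0x0badf00, 0x3c0ffee };
		uint64_t hs[3];

		/* The same packing that convert_to_base2_64() performs. */
		hs[0] = ((uint64_t)h[2] << 52) | ((uint64_t)h[1] << 26) | h[0];
		hs[1] = ((uint64_t)h[4] << 40) | ((uint64_t)h[3] << 14) | (h[2] >> 12);
		hs[2] = h[4] >> 24;

		/* Unpacking back into base 2^26 must reproduce the original limbs. */
		assert((hs[0] & 0x3ffffff) == h[0]);
		assert(((hs[0] >> 26) & 0x3ffffff) == h[1]);
		assert((((hs[0] >> 52) | (hs[1] << 12)) & 0x3ffffff) == h[2]);
		assert(((hs[1] >> 14) & 0x3ffffff) == h[3]);
		assert((((hs[1] >> 40) | (hs[2] << 24)) & 0x3ffffff) == h[4]);

		/* ULT() yields 1 exactly when its first operand is smaller. */
		assert(ULT(hs[0], hs[1]) == (hs[0] < hs[1] ? 1 : 0));
		assert(ULT((uint64_t)3, (uint64_t)5) == 1);
		assert(ULT((uint64_t)5, (uint64_t)3) == 0);
		assert(ULT((uint64_t)7, (uint64_t)7) == 0);
		return 0;
	}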
From nobody Tue Oct 7 19:50:50 2025
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, "Jason A. Donenfeld",
 linux-arm-kernel@lists.infradead.org, x86@kernel.org, Eric Biggers,
 stable@vger.kernel.org
Subject: [PATCH 5/5] lib/crypto: x86/poly1305: Fix performance regression on short messages
Date: Sun, 6 Jul 2025 16:11:00 -0700
Message-ID: <20250706231100.176113-6-ebiggers@kernel.org>
In-Reply-To: <20250706231100.176113-1-ebiggers@kernel.org>
References: <20250706231100.176113-1-ebiggers@kernel.org>

Restore the len >= 288 condition on using the AVX implementation, which
was incidentally removed by commit 318c53ae02f2 ("crypto: x86/poly1305
- Add block-only interface").  This check took into account the
overhead in key power computation, kernel-mode "FPU", and tail handling
associated with the AVX code.  Indeed, restoring this check slightly
improves performance for len < 256 as measured using poly1305_kunit on
an "AMD Ryzen AI 9 365" (Zen 5) CPU:

    Length  Before       After
    ======  ==========   ==========
         1     30 MB/s      36 MB/s
        16    516 MB/s     598 MB/s
        64   1700 MB/s    1882 MB/s
       127   2265 MB/s    2651 MB/s
       128   2457 MB/s    2827 MB/s
       200   2702 MB/s    3238 MB/s
       256   3841 MB/s    3768 MB/s
       511   4580 MB/s    4585 MB/s
       512   5430 MB/s    5398 MB/s
      1024   7268 MB/s    7305 MB/s
      3173   8999 MB/s    8948 MB/s
      4096   9942 MB/s    9921 MB/s
     16384  10557 MB/s   10545 MB/s

While the optimal threshold for this CPU might be slightly lower than
288 (see the len == 256 case), other CPUs would need to be tested too,
and these sorts of benchmarks can underestimate the true cost of
kernel-mode "FPU".  Therefore, for now just restore the 288 threshold.

Fixes: 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only interface")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
---
 lib/crypto/x86/poly1305_glue.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/crypto/x86/poly1305_glue.c b/lib/crypto/x86/poly1305_glue.c
index 968d84677631..856d48fd422b 100644
--- a/lib/crypto/x86/poly1305_glue.c
+++ b/lib/crypto/x86/poly1305_glue.c
@@ -96,11 +96,19 @@ void poly1305_blocks_arch(struct poly1305_block_state *state, const u8 *inp,
 
	/* SIMD disables preemption, so relax after processing each page. */
	BUILD_BUG_ON(SZ_4K < POLY1305_BLOCK_SIZE || SZ_4K % POLY1305_BLOCK_SIZE);
 
+	/*
+	 * The AVX implementations have significant setup overhead (e.g. key
+	 * power computation, kernel FPU enabling) which makes them slower for
+	 * short messages.  Fall back to the scalar implementation for messages
+	 * shorter than 288 bytes, unless the AVX-specific key setup has already
+	 * been performed (indicated by ctx->is_base2_26).
+	 */
	if (!static_branch_likely(&poly1305_use_avx) ||
+	    (len < POLY1305_BLOCK_SIZE * 18 && !ctx->is_base2_26) ||
	    unlikely(!irq_fpu_usable())) {
		convert_to_base2_64(ctx);
		poly1305_blocks_x86_64(ctx, inp, len, padbit);
		return;
	}
-- 
2.50.0
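As a quick cross-check of the condition restored above: Poly1305
processes 16-byte blocks, so the `len < POLY1305_BLOCK_SIZE * 18` test
in the diff corresponds to the 288-byte figure in the commit message
(the define is repeated here only to make the snippet standalone):

	#define POLY1305_BLOCK_SIZE 16
	_Static_assert(POLY1305_BLOCK_SIZE * 18 == 288,
		       "the AVX cut-over is 18 blocks, i.e. 288 bytes");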