From nobody Sun Dec 14 21:54:18 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1752223044729568.6320601396171; Fri, 11 Jul 2025 01:37:24 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ua99h-0003iG-2t; Fri, 11 Jul 2025 04:30:53 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ua90e-0001y0-NP; Fri, 11 Jul 2025 04:21:34 -0400 Received: from isrv.corpit.ru ([212.248.84.144]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ua90c-0004lo-Lg; Fri, 11 Jul 2025 04:21:32 -0400 Received: from tsrv.corpit.ru (tsrv.tls.msk.ru [192.168.177.2]) by isrv.corpit.ru (Postfix) with ESMTP id DDEC61356F2; Fri, 11 Jul 2025 11:17:20 +0300 (MSK) Received: from think4mjt.tls.msk.ru (mjtthink.wg.tls.msk.ru [192.168.177.146]) by tsrv.corpit.ru (Postfix) with ESMTP id D63B823FA67; Fri, 11 Jul 2025 11:17:47 +0300 (MSK) From: Michael Tokarev To: qemu-devel@nongnu.org Cc: qemu-stable@nongnu.org, Richard Henderson , Peter Maydell , Michael Tokarev Subject: [Stable-10.0.3 39/39] target/arm: Fix bfdotadd_ebf vs nan selection Date: Fri, 11 Jul 2025 11:16:35 +0300 Message-ID: <20250711081745.1785806-39-mjt@tls.msk.ru> X-Mailer: git-send-email 2.47.2 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=212.248.84.144; envelope-from=mjt@tls.msk.ru; helo=isrv.corpit.ru X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1752223045560116600 Content-Type: text/plain; charset="utf-8" From: Richard Henderson Implement FPProcessNaNs4 within bfdotadd_ebf, rather than simply letting NaNs propagate through the function. Cc: qemu-stable@nongnu.org Fixes: 0e1850182a1 ("target/arm: Implement FPCR.EBF=3D1 semantics for bfdot= add()") Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell Message-id: 20250704142112.1018902-10-richard.henderson@linaro.org Signed-off-by: Peter Maydell (cherry picked from commit bf020eaa6741711902a425016e2c7585f222562d) Signed-off-by: Michael Tokarev diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index 986eaf8ffa..3b7f308803 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -2989,31 +2989,62 @@ float32 bfdotadd(float32 sum, uint32_t e1, uint32_t= e2, float_status *fpst) float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst, float_status *fpst_odd) { - /* - * Compare f16_dotadd() in sme_helper.c, but here we have - * bfloat16 inputs. In particular that means that we do not - * want the FPCR.FZ16 flush semantics, so we use the normal - * float_status for the input handling here. - */ - float64 e1r =3D float32_to_float64(e1 << 16, fpst); - float64 e1c =3D float32_to_float64(e1 & 0xffff0000u, fpst); - float64 e2r =3D float32_to_float64(e2 << 16, fpst); - float64 e2c =3D float32_to_float64(e2 & 0xffff0000u, fpst); - float64 t64; + float32 s1r =3D e1 << 16; + float32 s1c =3D e1 & 0xffff0000u; + float32 s2r =3D e2 << 16; + float32 s2c =3D e2 & 0xffff0000u; float32 t32; =20 - /* - * The ARM pseudocode function FPDot performs both multiplies - * and the add with a single rounding operation. Emulate this - * by performing the first multiply in round-to-odd, then doing - * the second multiply as fused multiply-add, and rounding to - * float32 all in one step. - */ - t64 =3D float64_mul(e1r, e2r, fpst_odd); - t64 =3D float64r32_muladd(e1c, e2c, t64, 0, fpst); + /* C.f. FPProcessNaNs4 */ + if (float32_is_any_nan(s1r) || float32_is_any_nan(s1c) || + float32_is_any_nan(s2r) || float32_is_any_nan(s2c)) { + if (float32_is_signaling_nan(s1r, fpst)) { + t32 =3D s1r; + } else if (float32_is_signaling_nan(s1c, fpst)) { + t32 =3D s1c; + } else if (float32_is_signaling_nan(s2r, fpst)) { + t32 =3D s2r; + } else if (float32_is_signaling_nan(s2c, fpst)) { + t32 =3D s2c; + } else if (float32_is_any_nan(s1r)) { + t32 =3D s1r; + } else if (float32_is_any_nan(s1c)) { + t32 =3D s1c; + } else if (float32_is_any_nan(s2r)) { + t32 =3D s2r; + } else { + t32 =3D s2c; + } + /* + * FPConvertNaN(FPProcessNaN(t32)) will be done as part + * of the final addition below. + */ + } else { + /* + * Compare f16_dotadd() in sme_helper.c, but here we have + * bfloat16 inputs. In particular that means that we do not + * want the FPCR.FZ16 flush semantics, so we use the normal + * float_status for the input handling here. + */ + float64 e1r =3D float32_to_float64(s1r, fpst); + float64 e1c =3D float32_to_float64(s1c, fpst); + float64 e2r =3D float32_to_float64(s2r, fpst); + float64 e2c =3D float32_to_float64(s2c, fpst); + float64 t64; =20 - /* This conversion is exact, because we've already rounded. */ - t32 =3D float64_to_float32(t64, fpst); + /* + * The ARM pseudocode function FPDot performs both multiplies + * and the add with a single rounding operation. Emulate this + * by performing the first multiply in round-to-odd, then doing + * the second multiply as fused multiply-add, and rounding to + * float32 all in one step. + */ + t64 =3D float64_mul(e1r, e2r, fpst_odd); + t64 =3D float64r32_muladd(e1c, e2c, t64, 0, fpst); + + /* This conversion is exact, because we've already rounded. */ + t32 =3D float64_to_float32(t64, fpst); + } =20 /* The final accumulation step is not fused. */ return float32_add(sum, t32, fpst); --=20 2.47.2