From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: laurent.desnogues@gmail.com, Alex Bennée, Richard Henderson
Date: Mon, 24 Dec 2018 14:15:55 -0500
Message-Id: <20181224191555.14187-1-cota@braap.org>
X-Mailer: git-send-email 2.17.1
Subject: [Qemu-devel] [PATCH] softfloat: enforce softfloat if the host's FMA is broken

The added branch to the FMA ops is marked as unlikely and therefore
its impact on performance (measured with fp-bench) is within noise range
when measured on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz.

In addition, when the host doesn't have a hardware FMA instruction
we force the use of softfloat, since whatever the libc does (e.g. checking
the host's FP flags) is unlikely to be faster than our softfloat
implementation. For instance, on an i386 machine with no hardware
support for FMA, we get:

  $ for precision in single double; do
        ./fp-bench -o mulAdd -p $precision
    done

- before:
5.07 MFlops
1.85 MFlops

- after:
12.65 MFlops
10.05 MFlops

Reported-by: Laurent Desnogues
Suggested-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/qemu/cpuid.h |  6 ++++
 fpu/softfloat.c      | 85 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h
index 69301700bd..320926ffe0 100644
--- a/include/qemu/cpuid.h
+++ b/include/qemu/cpuid.h
@@ -25,6 +25,9 @@
 #endif
 
 /* Leaf 1, %ecx */
+#ifndef bit_FMA3
+#define bit_FMA3        (1 << 12)
+#endif
 #ifndef bit_SSE4_1
 #define bit_SSE4_1      (1 << 19)
 #endif
@@ -53,5 +56,8 @@
 #ifndef bit_LZCNT
 #define bit_LZCNT       (1 << 5)
 #endif
+#ifndef bit_FMA4
+#define bit_FMA4        (1 << 16)
+#endif
 
 #endif /* QEMU_CPUID_H */
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 59eac97d10..ccaed85b0f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1542,6 +1542,8 @@ soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
     return float64_round_pack_canonical(pr, status);
 }
 
+static bool force_soft_fma;
+
 float32 QEMU_FLATTEN
 float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
 {
@@ -1562,6 +1564,11 @@ float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
     if (unlikely(!f32_is_zon3(ua, ub, uc))) {
         goto soft;
     }
+
+    if (unlikely(force_soft_fma)) {
+        goto soft;
+    }
+
     /*
      * When (a || b) == 0, there's no need to check for under/over flow,
      * since we know the addend is (normal || 0) and the product is 0.
@@ -1623,6 +1630,11 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
     if (unlikely(!f64_is_zon3(ua, ub, uc))) {
         goto soft;
     }
+
+    if (unlikely(force_soft_fma)) {
+        goto soft;
+    }
+
     /*
      * When (a || b) == 0, there's no need to check for under/over flow,
      * since we know the addend is (normal || 0) and the product is 0.
@@ -7974,3 +7986,76 @@ float128 float128_scalbn(float128 a, int n, float_status *status)
                                          , status);
 
 }
+
+#ifdef CONFIG_CPUID_H
+#include "qemu/cpuid.h"
+#endif
+
+static void check_host_hw_fma(void)
+{
+#ifdef CONFIG_CPUID_H
+    int max = __get_cpuid_max(0, NULL);
+    int a, b, c, d;
+    bool has_fma3 = false;
+    bool has_fma4 = false;
+    bool has_avx = false;
+
+    if (max >= 1) {
+        __cpuid(1, a, b, c, d);
+
+        /* check whether avx is usable */
+        if (c & bit_OSXSAVE) {
+            int bv;
+
+            __asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0));
+            if ((bv & 6) == 6) {
+                has_avx = c & bit_AVX;
+            }
+        }
+
+        if (has_avx) {
+            /* fma3 */
+            has_fma3 = c & bit_FMA3;
+
+            /* fma4 */
+            __cpuid(0x80000000, a, b, c, d);
+            if (a >= 0x80000001) {
+                __cpuid(0x80000001, a, b, c, d);
+
+                has_fma4 = c & bit_FMA4;
+            }
+        }
+    }
+    /*
+     * Without HW FMA, whatever the libc does is probably slower than our
+     * softfloat implementation.
+     */
+    if (!has_fma3 && !has_fma4) {
+        force_soft_fma = true;
+    }
+#endif
+}
+
+static void __attribute__((constructor)) softfloat_init(void)
+{
+    union_float64 ua, ub, uc, ur;
+
+    if (QEMU_NO_HARDFLOAT) {
+        return;
+    }
+
+    /*
+     * Test that the host's FMA is not obviously broken. For example,
+     * glibc < 2.23 can perform an incorrect FMA on certain hosts; see
+     *   https://sourceware.org/bugzilla/show_bug.cgi?id=13304
+     */
+    ua.s = 0x0020000000000001ULL;
+    ub.s = 0x3ca0000000000000ULL;
+    uc.s = 0x0020000000000000ULL;
+    ur.h = fma(ua.h, ub.h, uc.h);
+    if (ur.s != 0x0020000000000001ULL) {
+        force_soft_fma = true;
+    }
+
+    check_host_hw_fma();
+}
-- 
2.17.1
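
For readers who want to try the FMA sanity test from softfloat_init()
outside of QEMU, here is a minimal standalone sketch. It is not part of
the patch. The helper names (double_from_bits, bits_from_double,
muladd_is_fused, unfused_muladd) are hypothetical and chosen only for
illustration. Only the three input bit patterns and the expected result
come from the patch. The sketch also includes a deliberately unfused
multiply-add to show the kind of result the probe is meant to catch:
with these inputs the product lies just above half an ulp of the addend,
so a single rounding yields 0x0020000000000001, whereas rounding the
product first loses the excess and the sum then ties to even.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as a double. */
static double double_from_bits(uint64_t bits)
{
    double d;

    assert(sizeof(d) == sizeof(bits));
    memcpy(&d, &bits, sizeof(d));
    return d;
}

/* Hypothetical helper: read back the bit pattern of a double. */
static uint64_t bits_from_double(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof(bits));
    return bits;
}

typedef double (*muladd_fn)(double, double, double);

/*
 * Probe a multiply-add implementation with the inputs used in the patch.
 * Returns true if the result matches the correctly rounded fused result.
 */
bool muladd_is_fused(muladd_fn fn)
{
    double a = double_from_bits(0x0020000000000001ULL);
    double b = double_from_bits(0x3ca0000000000000ULL);
    double c = double_from_bits(0x0020000000000000ULL);

    return bits_from_double(fn(a, b, c)) == 0x0020000000000001ULL;
}

/*
 * Illustration only: a multiply-add that rounds twice. The volatile
 * store keeps the compiler from contracting a * b + c into a hardware
 * FMA instruction.
 */
double unfused_muladd(double a, double b, double c)
{
    volatile double product = a * b;

    return product + c;
}
```

Passing fma from <math.h> (linking with -lm where needed) as the
argument gives the same comparison the patch performs, while
muladd_is_fused(unfused_muladd) returns false and so corresponds to the
case in which the patch sets force_soft_fma.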