From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Richard Henderson, Alex Bennée
Date: Sat, 24 Nov 2018 18:55:53 -0500
Message-Id: <20181124235553.17371-14-cota@braap.org>
In-Reply-To: <20181124235553.17371-1-cota@braap.org>
References: <20181124235553.17371-1-cota@braap.org>
X-Mailer: git-send-email 2.17.1
Subject: [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison

Performance results for fp-bench:

Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
  cmp-single: 110.98 MFlops
  cmp-double: 107.12 MFlops
- after:
  cmp-single: 506.28 MFlops
  cmp-double:
524.77 MFlops

Note that flattening both eq and eq_signaling versions would give
us extra performance (695v506, 615v524 Mflops for single/double,
respectively) but this would emit two essentially identical functions
for each eq/signaling pair, which is a waste.

Aggregate performance improvement for the last few patches:
[ all charts in png: https://imgur.com/a/4yV8p ]

1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

[ ASCII bar chart not recoverable from this copy.
  Title: qemu-aarch64 NBench score; higher is better
  Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  x-axis: FOURIER, NEURAL NET, LU DECOMPOSITION, gmean
  y-axis: score, 0 to 16
  Legend (partly overwritten): "before", then one series per
  operation, ending with "sqr" and "cmp" ]

[ ASCII bar chart not recoverable from this copy.
  Title: qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905
  Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  error bars: 95% confidence interval
  x-axis: SPEC06fp benchmarks (labels overlap in the original), geomean
  y-axis: speedup, 0 to 4.5 ]

2. Host: ARM Aarch64 A57 @ 2.4GHz

[ ASCII bar chart not recoverable from this copy.
  Title: qemu-aarch64 NBench score; higher is better
  Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz
  x-axis: FOURIER, NEURAL NET, LU DECOMPOSITION, gmean
  y-axis: score, 0 to 5
  Legend (partly overwritten): "before", then one series per
  operation, ending with "sqr" and "cmp" ]

Signed-off-by: Emilio G. Cota
Reviewed-by: Alex Bennée
---
 fpu/softfloat.c | 109 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 95 insertions(+), 14 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4c6ecd1883..b29a2b6714 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2899,28 +2899,109 @@ static int compare_floats(FloatParts a, FloatParts b, bool is_quiet,
     }
 }
 
-#define COMPARE(sz)                                                     \
-int float ## sz ## _compare(float ## sz a, float ## sz b,               \
-                      float_status *s)                                  \
+#define COMPARE(name, attr, sz)                                         \
+static int attr                                                         \
+name(float ## sz a, float ## sz b, bool is_quiet, float_status *s)      \
 {                                                                       \
     FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
     FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
-    return compare_floats(pa, pb, false, s);                            \
-}                                                                       \
-int float ## sz ## _compare_quiet(float ## sz a, float ## sz b,         \
-                      float_status *s)                                  \
-{                                                                       \
-    FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
-    FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
-    return compare_floats(pa, pb, true, s);                             \
+    return compare_floats(pa, pb, is_quiet, s);                         \
 }
 
-COMPARE(16)
-COMPARE(32)
-COMPARE(64)
+COMPARE(soft_f16_compare, QEMU_FLATTEN, 16)
+COMPARE(soft_f32_compare, QEMU_SOFTFLOAT_ATTR, 32)
+COMPARE(soft_f64_compare, QEMU_SOFTFLOAT_ATTR, 64)
 
 #undef COMPARE
 
+int float16_compare(float16 a, float16 b, float_status *s)
+{
+    return soft_f16_compare(a, b, false, s);
+}
+
+int float16_compare_quiet(float16 a, float16 b, float_status *s)
+{
+    return soft_f16_compare(a, b, true, s);
+}
+
+static int QEMU_FLATTEN
+f32_compare(float32 xa, float32 xb, bool is_quiet, float_status *s)
+{
+    union_float32 ua, ub;
+
+    ua.s = xa;
+    ub.s = xb;
+
+    if (QEMU_NO_HARDFLOAT) {
+        goto soft;
+    }
+
+    float32_input_flush2(&ua.s, &ub.s, s);
+    if (isgreaterequal(ua.h, ub.h)) {
+        if (isgreater(ua.h, ub.h)) {
+            return float_relation_greater;
+        }
+        return float_relation_equal;
+    }
+    if (likely(isless(ua.h, ub.h))) {
+        return float_relation_less;
+    }
+    /* The only condition remaining is unordered.
+     * Fall through to set flags.
+     */
+ soft:
+    return soft_f32_compare(ua.s, ub.s, is_quiet, s);
+}
+
+int float32_compare(float32 a, float32 b, float_status *s)
+{
+    return f32_compare(a, b, false, s);
+}
+
+int float32_compare_quiet(float32 a, float32 b, float_status *s)
+{
+    return f32_compare(a, b, true, s);
+}
+
+static int QEMU_FLATTEN
+f64_compare(float64 xa, float64 xb, bool is_quiet, float_status *s)
+{
+    union_float64 ua, ub;
+
+    ua.s = xa;
+    ub.s = xb;
+
+    if (QEMU_NO_HARDFLOAT) {
+        goto soft;
+    }
+
+    float64_input_flush2(&ua.s, &ub.s, s);
+    if (isgreaterequal(ua.h, ub.h)) {
+        if (isgreater(ua.h, ub.h)) {
+            return float_relation_greater;
+        }
+        return float_relation_equal;
+    }
+    if (likely(isless(ua.h, ub.h))) {
+        return float_relation_less;
+    }
+    /* The only condition remaining is unordered.
+     * Fall through to set flags.
+     */
+ soft:
+    return soft_f64_compare(ua.s, ub.s, is_quiet, s);
+}
+
+int float64_compare(float64 a, float64 b, float_status *s)
+{
+    return f64_compare(a, b, false, s);
+}
+
+int float64_compare_quiet(float64 a, float64 b, float_status *s)
+{
+    return f64_compare(a, b, true, s);
+}
+
 /* Multiply A by 2 raised to the power N. */
 static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s)
 {
-- 
2.17.1
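
The fast path in f32_compare/f64_compare relies on the C99 quiet
comparison macros from <math.h> (isgreaterequal, isgreater, isless),
which are false whenever an operand is a NaN and do not raise the
invalid exception on quiet NaNs. The following is only a standalone
sketch of that three-way decision, not the code from the patch: the
enum and the function name compare_host_float are hypothetical, and
QEMU's input flushing, QEMU_NO_HARDFLOAT check and soft-float fallback
are left out.

```c
#include <math.h>

/* Hypothetical stand-ins for QEMU's float_relation_* values. */
enum relation {
    REL_LESS = -1,
    REL_EQUAL = 0,
    REL_GREATER = 1,
    REL_UNORDERED = 2,
};

/*
 * Three-way comparison of two host floats using the C99 quiet
 * comparison macros. Each macro evaluates to false if either
 * operand is a NaN, so a NaN falls through every test below.
 */
static enum relation compare_host_float(float a, float b)
{
    if (isgreaterequal(a, b)) {
        if (isgreater(a, b)) {
            return REL_GREATER;
        }
        return REL_EQUAL;
    }
    if (isless(a, b)) {
        return REL_LESS;
    }
    /*
     * Only the unordered case is left. The patch hands this case to
     * the soft-float routine so that the guest flags get set; this
     * sketch simply reports it.
     */
    return REL_UNORDERED;
}
```

With these definitions, compare_host_float(1.0f, 2.0f) yields REL_LESS,
+0.0 and -0.0 compare as REL_EQUAL, and any comparison involving NAN
yields REL_UNORDERED. Keeping the NaN case on the slow path is what lets
the common ordered cases avoid unpacking the operands at all.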