From nobody Fri Oct 24 22:12:12 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1519211996620402.2076186294132; Wed, 21 Feb 2018 03:19:56 -0800 (PST) Received: from localhost ([::1]:60115 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eoSRF-0006jI-ND for importer@patchew.org; Wed, 21 Feb 2018 06:19:53 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35402) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eoSDe-000422-Fu for qemu-devel@nongnu.org; Wed, 21 Feb 2018 06:05:53 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eoSDb-0006ii-Ai for qemu-devel@nongnu.org; Wed, 21 Feb 2018 06:05:50 -0500 Received: from mail-wr0-x241.google.com ([2a00:1450:400c:c0c::241]:40318) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eoSDa-0006iM-S9 for qemu-devel@nongnu.org; Wed, 21 Feb 2018 06:05:47 -0500 Received: by mail-wr0-x241.google.com with SMTP id o76so3270891wrb.7 for ; Wed, 21 Feb 2018 03:05:46 -0800 (PST) Received: from zen.linaro.local ([81.128.185.34]) by smtp.gmail.com with ESMTPSA id 63sm29389269wms.46.2018.02.21.03.05.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 21 Feb 2018 03:05:36 -0800 (PST) Received: from zen.linaroharston (localhost [127.0.0.1]) by zen.linaro.local (Postfix) with ESMTP id 2661A3E0A37; Wed, 21 Feb 2018 11:05:24 +0000 (GMT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=XR8PyRuFoPPXrKMDDk3W6aaN1ZdO641hBWbNiAqNQI0=; b=XmmJONH1fb1Dk++1UooY+6m0p9hEoDd3fsV5CTk/UA0OcSRE8Clde289q04EPoO1I3 Z9cq9gGsSDaVp04sKKZstM1jEdpwl0jmoZTqIelX/0y9figWEcls3hl9JN9rsK3zpQn+ U1M5sFyBDTiR3gH/sZW9B6ApWPxxE4IhNGCG0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=XR8PyRuFoPPXrKMDDk3W6aaN1ZdO641hBWbNiAqNQI0=; b=PfjYAl3MzvE7beY7XXugm3rjN9b8/iIz7uv28XglWiSkLv8SFr+CpJIlGVLHiFWb+O lya01FNCps+a2ZGE7lPbFV9m6naHGVMVBn0TeHlnqKcUPNwKWBMOViiL/CY6/877bg/y AT4toNNQQOU2GVTi8Kpn0ZYXzYeX4ew+zKEhx/pFm+jL6ixzKn9we1oq14bMp7rrCDOw +qxmJWujiMD9cVsbPxjSkp66pbXXaHFcrUCMC74xEiBNRBJuSdbqIY9PNYRXwsE/LgD0 onLhGyvOV2OOgxBDQqlQxAhSZY+to6PFUj+b7SktCU+kt293qaT1P1lNxDbkIpT9+nf+ sPDw== X-Gm-Message-State: APf1xPB89BiNeZ02DQqY/iGywmeoNUjXsfYutta6ewYyVqaJWJDAVREm H2AosQHQeXjaYf5f3lxaw+R9HQ== X-Google-Smtp-Source: AH8x225ogppZtIu93v7Iky7Zl81/bWuwOwL76SCPDMYTmZV8F0VhDrtrDcnqGdW3pesOEKdW8HVBrg== X-Received: by 10.28.114.8 with SMTP id n8mr1710188wmc.30.1519211145304; Wed, 21 Feb 2018 03:05:45 -0800 (PST) From: =?UTF-8?q?Alex=20Benn=C3=A9e?= To: peter.maydell@linaro.org Date: Wed, 21 Feb 2018 11:05:12 +0000 Message-Id: <20180221110523.859-13-alex.bennee@linaro.org> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20180221110523.859-1-alex.bennee@linaro.org> References: <20180221110523.859-1-alex.bennee@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:400c:c0c::241 Subject: [Qemu-devel] [PULL 12/22] fpu/softfloat: re-factor add/sub X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Richard Henderson , =?UTF-8?q?Alex=20Benn=C3=A9e?= , qemu-devel@nongnu.org, Aurelien Jarno Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZohoMail: RDKM_2 RSF_0 Z_629925259 SPT_0 We can now add float16_add/sub and use the common decompose and canonicalize functions to have a single implementation for float16/32/64 add and sub functions. Signed-off-by: Alex Benn=C3=A9e Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daud=C3=A9 diff --git a/fpu/softfloat.c b/fpu/softfloat.c index 568d555595..2190e7de56 100644 --- a/fpu/softfloat.c +++ b/fpu/softfloat.c @@ -83,6 +83,7 @@ this code that are retained. * target-dependent and needs the TARGET_* macros. */ #include "qemu/osdep.h" +#include "qemu/bitops.h" #include "fpu/softfloat.h" =20 /* We only need stdlib for abort() */ @@ -270,6 +271,470 @@ static const FloatFmt float64_params =3D { FLOAT_PARAMS(11, 52) }; =20 +/* Unpack a float to parts, but do not canonicalize. */ +static inline FloatParts unpack_raw(FloatFmt fmt, uint64_t raw) +{ + const int sign_pos =3D fmt.frac_size + fmt.exp_size; + + return (FloatParts) { + .cls =3D float_class_unclassified, + .sign =3D extract64(raw, sign_pos, 1), + .exp =3D extract64(raw, fmt.frac_size, fmt.exp_size), + .frac =3D extract64(raw, 0, fmt.frac_size), + }; +} + +static inline FloatParts float16_unpack_raw(float16 f) +{ + return unpack_raw(float16_params, f); +} + +static inline FloatParts float32_unpack_raw(float32 f) +{ + return unpack_raw(float32_params, f); +} + +static inline FloatParts float64_unpack_raw(float64 f) +{ + return unpack_raw(float64_params, f); +} + +/* Pack a float from parts, but do not canonicalize. */ +static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p) +{ + const int sign_pos =3D fmt.frac_size + fmt.exp_size; + uint64_t ret =3D deposit64(p.frac, fmt.frac_size, fmt.exp_size, p.exp); + return deposit64(ret, sign_pos, 1, p.sign); +} + +static inline float16 float16_pack_raw(FloatParts p) +{ + return make_float16(pack_raw(float16_params, p)); +} + +static inline float32 float32_pack_raw(FloatParts p) +{ + return make_float32(pack_raw(float32_params, p)); +} + +static inline float64 float64_pack_raw(FloatParts p) +{ + return make_float64(pack_raw(float64_params, p)); +} + +/* Canonicalize EXP and FRAC, setting CLS. */ +static FloatParts canonicalize(FloatParts part, const FloatFmt *parm, + float_status *status) +{ + if (part.exp =3D=3D parm->exp_max) { + if (part.frac =3D=3D 0) { + part.cls =3D float_class_inf; + } else { +#ifdef NO_SIGNALING_NANS + part.cls =3D float_class_qnan; +#else + int64_t msb =3D part.frac << (parm->frac_shift + 2); + if ((msb < 0) =3D=3D status->snan_bit_is_one) { + part.cls =3D float_class_snan; + } else { + part.cls =3D float_class_qnan; + } +#endif + } + } else if (part.exp =3D=3D 0) { + if (likely(part.frac =3D=3D 0)) { + part.cls =3D float_class_zero; + } else if (status->flush_inputs_to_zero) { + float_raise(float_flag_input_denormal, status); + part.cls =3D float_class_zero; + part.frac =3D 0; + } else { + int shift =3D clz64(part.frac) - 1; + part.cls =3D float_class_normal; + part.exp =3D parm->frac_shift - parm->exp_bias - shift + 1; + part.frac <<=3D shift; + } + } else { + part.cls =3D float_class_normal; + part.exp -=3D parm->exp_bias; + part.frac =3D DECOMPOSED_IMPLICIT_BIT + (part.frac << parm->frac_s= hift); + } + return part; +} + +/* Round and uncanonicalize a floating-point number by parts. There + * are FRAC_SHIFT bits that may require rounding at the bottom of the + * fraction; these bits will be removed. The exponent will be biased + * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0]. + */ + +static FloatParts round_canonical(FloatParts p, float_status *s, + const FloatFmt *parm) +{ + const uint64_t frac_lsbm1 =3D parm->frac_lsbm1; + const uint64_t round_mask =3D parm->round_mask; + const uint64_t roundeven_mask =3D parm->roundeven_mask; + const int exp_max =3D parm->exp_max; + const int frac_shift =3D parm->frac_shift; + uint64_t frac, inc; + int exp, flags =3D 0; + bool overflow_norm; + + frac =3D p.frac; + exp =3D p.exp; + + switch (p.cls) { + case float_class_normal: + switch (s->float_rounding_mode) { + case float_round_nearest_even: + overflow_norm =3D false; + inc =3D ((frac & roundeven_mask) !=3D frac_lsbm1 ? frac_lsbm1 = : 0); + break; + case float_round_ties_away: + overflow_norm =3D false; + inc =3D frac_lsbm1; + break; + case float_round_to_zero: + overflow_norm =3D true; + inc =3D 0; + break; + case float_round_up: + inc =3D p.sign ? 0 : round_mask; + overflow_norm =3D p.sign; + break; + case float_round_down: + inc =3D p.sign ? round_mask : 0; + overflow_norm =3D !p.sign; + break; + default: + g_assert_not_reached(); + } + + exp +=3D parm->exp_bias; + if (likely(exp > 0)) { + if (frac & round_mask) { + flags |=3D float_flag_inexact; + frac +=3D inc; + if (frac & DECOMPOSED_OVERFLOW_BIT) { + frac >>=3D 1; + exp++; + } + } + frac >>=3D frac_shift; + + if (unlikely(exp >=3D exp_max)) { + flags |=3D float_flag_overflow | float_flag_inexact; + if (overflow_norm) { + exp =3D exp_max - 1; + frac =3D -1; + } else { + p.cls =3D float_class_inf; + goto do_inf; + } + } + } else if (s->flush_to_zero) { + flags |=3D float_flag_output_denormal; + p.cls =3D float_class_zero; + goto do_zero; + } else { + bool is_tiny =3D (s->float_detect_tininess + =3D=3D float_tininess_before_rounding) + || (exp < 0) + || !((frac + inc) & DECOMPOSED_OVERFLOW_BIT); + + shift64RightJamming(frac, 1 - exp, &frac); + if (frac & round_mask) { + /* Need to recompute round-to-even. */ + if (s->float_rounding_mode =3D=3D float_round_nearest_even= ) { + inc =3D ((frac & roundeven_mask) !=3D frac_lsbm1 + ? frac_lsbm1 : 0); + } + flags |=3D float_flag_inexact; + frac +=3D inc; + } + + exp =3D (frac & DECOMPOSED_IMPLICIT_BIT ? 1 : 0); + frac >>=3D frac_shift; + + if (is_tiny && (flags & float_flag_inexact)) { + flags |=3D float_flag_underflow; + } + if (exp =3D=3D 0 && frac =3D=3D 0) { + p.cls =3D float_class_zero; + } + } + break; + + case float_class_zero: + do_zero: + exp =3D 0; + frac =3D 0; + break; + + case float_class_inf: + do_inf: + exp =3D exp_max; + frac =3D 0; + break; + + case float_class_qnan: + case float_class_snan: + exp =3D exp_max; + break; + + default: + g_assert_not_reached(); + } + + float_raise(flags, s); + p.exp =3D exp; + p.frac =3D frac; + return p; +} + +static FloatParts float16_unpack_canonical(float16 f, float_status *s) +{ + return canonicalize(float16_unpack_raw(f), &float16_params, s); +} + +static float16 float16_round_pack_canonical(FloatParts p, float_status *s) +{ + switch (p.cls) { + case float_class_dnan: + return float16_default_nan(s); + case float_class_msnan: + return float16_maybe_silence_nan(float16_pack_raw(p), s); + default: + p =3D round_canonical(p, s, &float16_params); + return float16_pack_raw(p); + } +} + +static FloatParts float32_unpack_canonical(float32 f, float_status *s) +{ + return canonicalize(float32_unpack_raw(f), &float32_params, s); +} + +static float32 float32_round_pack_canonical(FloatParts p, float_status *s) +{ + switch (p.cls) { + case float_class_dnan: + return float32_default_nan(s); + case float_class_msnan: + return float32_maybe_silence_nan(float32_pack_raw(p), s); + default: + p =3D round_canonical(p, s, &float32_params); + return float32_pack_raw(p); + } +} + +static FloatParts float64_unpack_canonical(float64 f, float_status *s) +{ + return canonicalize(float64_unpack_raw(f), &float64_params, s); +} + +static float64 float64_round_pack_canonical(FloatParts p, float_status *s) +{ + switch (p.cls) { + case float_class_dnan: + return float64_default_nan(s); + case float_class_msnan: + return float64_maybe_silence_nan(float64_pack_raw(p), s); + default: + p =3D round_canonical(p, s, &float64_params); + return float64_pack_raw(p); + } +} + +/* Simple helpers for checking if what NaN we have */ +static bool is_nan(FloatClass c) +{ + return unlikely(c >=3D float_class_qnan); +} +static bool is_snan(FloatClass c) +{ + return c =3D=3D float_class_snan; +} +static bool is_qnan(FloatClass c) +{ + return c =3D=3D float_class_qnan; +} + +static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s) +{ + if (is_snan(a.cls) || is_snan(b.cls)) { + s->float_exception_flags |=3D float_flag_invalid; + } + + if (s->default_nan_mode) { + a.cls =3D float_class_dnan; + } else { + if (pickNaN(is_qnan(a.cls), is_snan(a.cls), + is_qnan(b.cls), is_snan(b.cls), + a.frac > b.frac || + (a.frac =3D=3D b.frac && a.sign < b.sign))) { + a =3D b; + } + a.cls =3D float_class_msnan; + } + return a; +} + +/* + * Returns the result of adding or subtracting the values of the + * floating-point values `a' and `b'. The operation is performed + * according to the IEC/IEEE Standard for Binary Floating-Point + * Arithmetic. + */ + +static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract, + float_status *s) +{ + bool a_sign =3D a.sign; + bool b_sign =3D b.sign ^ subtract; + + if (a_sign !=3D b_sign) { + /* Subtraction */ + + if (a.cls =3D=3D float_class_normal && b.cls =3D=3D float_class_no= rmal) { + if (a.exp > b.exp || (a.exp =3D=3D b.exp && a.frac >=3D b.frac= )) { + shift64RightJamming(b.frac, a.exp - b.exp, &b.frac); + a.frac =3D a.frac - b.frac; + } else { + shift64RightJamming(a.frac, b.exp - a.exp, &a.frac); + a.frac =3D b.frac - a.frac; + a.exp =3D b.exp; + a_sign ^=3D 1; + } + + if (a.frac =3D=3D 0) { + a.cls =3D float_class_zero; + a.sign =3D s->float_rounding_mode =3D=3D float_round_down; + } else { + int shift =3D clz64(a.frac) - 1; + a.frac =3D a.frac << shift; + a.exp =3D a.exp - shift; + a.sign =3D a_sign; + } + return a; + } + if (is_nan(a.cls) || is_nan(b.cls)) { + return pick_nan(a, b, s); + } + if (a.cls =3D=3D float_class_inf) { + if (b.cls =3D=3D float_class_inf) { + float_raise(float_flag_invalid, s); + a.cls =3D float_class_dnan; + } + return a; + } + if (a.cls =3D=3D float_class_zero && b.cls =3D=3D float_class_zero= ) { + a.sign =3D s->float_rounding_mode =3D=3D float_round_down; + return a; + } + if (a.cls =3D=3D float_class_zero || b.cls =3D=3D float_class_inf)= { + b.sign =3D a_sign ^ 1; + return b; + } + if (b.cls =3D=3D float_class_zero) { + return a; + } + } else { + /* Addition */ + if (a.cls =3D=3D float_class_normal && b.cls =3D=3D float_class_no= rmal) { + if (a.exp > b.exp) { + shift64RightJamming(b.frac, a.exp - b.exp, &b.frac); + } else if (a.exp < b.exp) { + shift64RightJamming(a.frac, b.exp - a.exp, &a.frac); + a.exp =3D b.exp; + } + a.frac +=3D b.frac; + if (a.frac & DECOMPOSED_OVERFLOW_BIT) { + a.frac >>=3D 1; + a.exp +=3D 1; + } + return a; + } + if (is_nan(a.cls) || is_nan(b.cls)) { + return pick_nan(a, b, s); + } + if (a.cls =3D=3D float_class_inf || b.cls =3D=3D float_class_zero)= { + return a; + } + if (b.cls =3D=3D float_class_inf || a.cls =3D=3D float_class_zero)= { + b.sign =3D b_sign; + return b; + } + } + g_assert_not_reached(); +} + +/* + * Returns the result of adding or subtracting the floating-point + * values `a' and `b'. The operation is performed according to the + * IEC/IEEE Standard for Binary Floating-Point Arithmetic. + */ + +float16 __attribute__((flatten)) float16_add(float16 a, float16 b, + float_status *status) +{ + FloatParts pa =3D float16_unpack_canonical(a, status); + FloatParts pb =3D float16_unpack_canonical(b, status); + FloatParts pr =3D addsub_floats(pa, pb, false, status); + + return float16_round_pack_canonical(pr, status); +} + +float32 __attribute__((flatten)) float32_add(float32 a, float32 b, + float_status *status) +{ + FloatParts pa =3D float32_unpack_canonical(a, status); + FloatParts pb =3D float32_unpack_canonical(b, status); + FloatParts pr =3D addsub_floats(pa, pb, false, status); + + return float32_round_pack_canonical(pr, status); +} + +float64 __attribute__((flatten)) float64_add(float64 a, float64 b, + float_status *status) +{ + FloatParts pa =3D float64_unpack_canonical(a, status); + FloatParts pb =3D float64_unpack_canonical(b, status); + FloatParts pr =3D addsub_floats(pa, pb, false, status); + + return float64_round_pack_canonical(pr, status); +} + +float16 __attribute__((flatten)) float16_sub(float16 a, float16 b, + float_status *status) +{ + FloatParts pa =3D float16_unpack_canonical(a, status); + FloatParts pb =3D float16_unpack_canonical(b, status); + FloatParts pr =3D addsub_floats(pa, pb, true, status); + + return float16_round_pack_canonical(pr, status); +} + +float32 __attribute__((flatten)) float32_sub(float32 a, float32 b, + float_status *status) +{ + FloatParts pa =3D float32_unpack_canonical(a, status); + FloatParts pb =3D float32_unpack_canonical(b, status); + FloatParts pr =3D addsub_floats(pa, pb, true, status); + + return float32_round_pack_canonical(pr, status); +} + +float64 __attribute__((flatten)) float64_sub(float64 a, float64 b, + float_status *status) +{ + FloatParts pa =3D float64_unpack_canonical(a, status); + FloatParts pb =3D float64_unpack_canonical(b, status); + FloatParts pr =3D addsub_floats(pa, pb, true, status); + + return float64_round_pack_canonical(pr, status); +} + /*------------------------------------------------------------------------= ---- | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 | and 7, and returns the properly rounded 32-bit integer corresponding to = the @@ -2081,220 +2546,6 @@ float32 float32_round_to_int(float32 a, float_statu= s *status) =20 } =20 -/*------------------------------------------------------------------------= ---- -| Returns the result of adding the absolute values of the single-precision -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated -| before being returned. `zSign' is ignored if the result is a NaN. -| The addition is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*-------------------------------------------------------------------------= ---*/ - -static float32 addFloat32Sigs(float32 a, float32 b, flag zSign, - float_status *status) -{ - int aExp, bExp, zExp; - uint32_t aSig, bSig, zSig; - int expDiff; - - aSig =3D extractFloat32Frac( a ); - aExp =3D extractFloat32Exp( a ); - bSig =3D extractFloat32Frac( b ); - bExp =3D extractFloat32Exp( b ); - expDiff =3D aExp - bExp; - aSig <<=3D 6; - bSig <<=3D 6; - if ( 0 < expDiff ) { - if ( aExp =3D=3D 0xFF ) { - if (aSig) { - return propagateFloat32NaN(a, b, status); - } - return a; - } - if ( bExp =3D=3D 0 ) { - --expDiff; - } - else { - bSig |=3D 0x20000000; - } - shift32RightJamming( bSig, expDiff, &bSig ); - zExp =3D aExp; - } - else if ( expDiff < 0 ) { - if ( bExp =3D=3D 0xFF ) { - if (bSig) { - return propagateFloat32NaN(a, b, status); - } - return packFloat32( zSign, 0xFF, 0 ); - } - if ( aExp =3D=3D 0 ) { - ++expDiff; - } - else { - aSig |=3D 0x20000000; - } - shift32RightJamming( aSig, - expDiff, &aSig ); - zExp =3D bExp; - } - else { - if ( aExp =3D=3D 0xFF ) { - if (aSig | bSig) { - return propagateFloat32NaN(a, b, status); - } - return a; - } - if ( aExp =3D=3D 0 ) { - if (status->flush_to_zero) { - if (aSig | bSig) { - float_raise(float_flag_output_denormal, status); - } - return packFloat32(zSign, 0, 0); - } - return packFloat32( zSign, 0, ( aSig + bSig )>>6 ); - } - zSig =3D 0x40000000 + aSig + bSig; - zExp =3D aExp; - goto roundAndPack; - } - aSig |=3D 0x20000000; - zSig =3D ( aSig + bSig )<<1; - --zExp; - if ( (int32_t) zSig < 0 ) { - zSig =3D aSig + bSig; - ++zExp; - } - roundAndPack: - return roundAndPackFloat32(zSign, zExp, zSig, status); - -} - -/*------------------------------------------------------------------------= ---- -| Returns the result of subtracting the absolute values of the single- -| precision floating-point values `a' and `b'. If `zSign' is 1, the -| difference is negated before being returned. `zSign' is ignored if the -| result is a NaN. The subtraction is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*-------------------------------------------------------------------------= ---*/ - -static float32 subFloat32Sigs(float32 a, float32 b, flag zSign, - float_status *status) -{ - int aExp, bExp, zExp; - uint32_t aSig, bSig, zSig; - int expDiff; - - aSig =3D extractFloat32Frac( a ); - aExp =3D extractFloat32Exp( a ); - bSig =3D extractFloat32Frac( b ); - bExp =3D extractFloat32Exp( b ); - expDiff =3D aExp - bExp; - aSig <<=3D 7; - bSig <<=3D 7; - if ( 0 < expDiff ) goto aExpBigger; - if ( expDiff < 0 ) goto bExpBigger; - if ( aExp =3D=3D 0xFF ) { - if (aSig | bSig) { - return propagateFloat32NaN(a, b, status); - } - float_raise(float_flag_invalid, status); - return float32_default_nan(status); - } - if ( aExp =3D=3D 0 ) { - aExp =3D 1; - bExp =3D 1; - } - if ( bSig < aSig ) goto aBigger; - if ( aSig < bSig ) goto bBigger; - return packFloat32(status->float_rounding_mode =3D=3D float_round_down= , 0, 0); - bExpBigger: - if ( bExp =3D=3D 0xFF ) { - if (bSig) { - return propagateFloat32NaN(a, b, status); - } - return packFloat32( zSign ^ 1, 0xFF, 0 ); - } - if ( aExp =3D=3D 0 ) { - ++expDiff; - } - else { - aSig |=3D 0x40000000; - } - shift32RightJamming( aSig, - expDiff, &aSig ); - bSig |=3D 0x40000000; - bBigger: - zSig =3D bSig - aSig; - zExp =3D bExp; - zSign ^=3D 1; - goto normalizeRoundAndPack; - aExpBigger: - if ( aExp =3D=3D 0xFF ) { - if (aSig) { - return propagateFloat32NaN(a, b, status); - } - return a; - } - if ( bExp =3D=3D 0 ) { - --expDiff; - } - else { - bSig |=3D 0x40000000; - } - shift32RightJamming( bSig, expDiff, &bSig ); - aSig |=3D 0x40000000; - aBigger: - zSig =3D aSig - bSig; - zExp =3D aExp; - normalizeRoundAndPack: - --zExp; - return normalizeRoundAndPackFloat32(zSign, zExp, zSig, status); - -} - -/*------------------------------------------------------------------------= ---- -| Returns the result of adding the single-precision floating-point values = `a' -| and `b'. The operation is performed according to the IEC/IEEE Standard = for -| Binary Floating-Point Arithmetic. -*-------------------------------------------------------------------------= ---*/ - -float32 float32_add(float32 a, float32 b, float_status *status) -{ - flag aSign, bSign; - a =3D float32_squash_input_denormal(a, status); - b =3D float32_squash_input_denormal(b, status); - - aSign =3D extractFloat32Sign( a ); - bSign =3D extractFloat32Sign( b ); - if ( aSign =3D=3D bSign ) { - return addFloat32Sigs(a, b, aSign, status); - } - else { - return subFloat32Sigs(a, b, aSign, status); - } - -} - -/*------------------------------------------------------------------------= ---- -| Returns the result of subtracting the single-precision floating-point va= lues -| `a' and `b'. The operation is performed according to the IEC/IEEE Stand= ard -| for Binary Floating-Point Arithmetic. -*-------------------------------------------------------------------------= ---*/ - -float32 float32_sub(float32 a, float32 b, float_status *status) -{ - flag aSign, bSign; - a =3D float32_squash_input_denormal(a, status); - b =3D float32_squash_input_denormal(b, status); - - aSign =3D extractFloat32Sign( a ); - bSign =3D extractFloat32Sign( b ); - if ( aSign =3D=3D bSign ) { - return subFloat32Sigs(a, b, aSign, status); - } - else { - return addFloat32Sigs(a, b, aSign, status); - } - -} - /*------------------------------------------------------------------------= ---- | Returns the result of multiplying the single-precision floating-point va= lues | `a' and `b'. The operation is performed according to the IEC/IEEE Stand= ard @@ -3891,219 +4142,6 @@ float64 float64_trunc_to_int(float64 a, float_statu= s *status) return res; } =20 -/*------------------------------------------------------------------------= ---- -| Returns the result of adding the absolute values of the double-precision -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated -| before being returned. `zSign' is ignored if the result is a NaN. -| The addition is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*-------------------------------------------------------------------------= ---*/ - -static float64 addFloat64Sigs(float64 a, float64 b, flag zSign, - float_status *status) -{ - int aExp, bExp, zExp; - uint64_t aSig, bSig, zSig; - int expDiff; - - aSig =3D extractFloat64Frac( a ); - aExp =3D extractFloat64Exp( a ); - bSig =3D extractFloat64Frac( b ); - bExp =3D extractFloat64Exp( b ); - expDiff =3D aExp - bExp; - aSig <<=3D 9; - bSig <<=3D 9; - if ( 0 < expDiff ) { - if ( aExp =3D=3D 0x7FF ) { - if (aSig) { - return propagateFloat64NaN(a, b, status); - } - return a; - } - if ( bExp =3D=3D 0 ) { - --expDiff; - } - else { - bSig |=3D LIT64( 0x2000000000000000 ); - } - shift64RightJamming( bSig, expDiff, &bSig ); - zExp =3D aExp; - } - else if ( expDiff < 0 ) { - if ( bExp =3D=3D 0x7FF ) { - if (bSig) { - return propagateFloat64NaN(a, b, status); - } - return packFloat64( zSign, 0x7FF, 0 ); - } - if ( aExp =3D=3D 0 ) { - ++expDiff; - } - else { - aSig |=3D LIT64( 0x2000000000000000 ); - } - shift64RightJamming( aSig, - expDiff, &aSig ); - zExp =3D bExp; - } - else { - if ( aExp =3D=3D 0x7FF ) { - if (aSig | bSig) { - return propagateFloat64NaN(a, b, status); - } - return a; - } - if ( aExp =3D=3D 0 ) { - if (status->flush_to_zero) { - if (aSig | bSig) { - float_raise(float_flag_output_denormal, status); - } - return packFloat64(zSign, 0, 0); - } - return packFloat64( zSign, 0, ( aSig + bSig )>>9 ); - } - zSig =3D LIT64( 0x4000000000000000 ) + aSig + bSig; - zExp =3D aExp; - goto roundAndPack; - } - aSig |=3D LIT64( 0x2000000000000000 ); - zSig =3D ( aSig + bSig )<<1; - --zExp; - if ( (int64_t) zSig < 0 ) { - zSig =3D aSig + bSig; - ++zExp; - } - roundAndPack: - return roundAndPackFloat64(zSign, zExp, zSig, status); - -} - -/*------------------------------------------------------------------------= ---- -| Returns the result of subtracting the absolute values of the double- -| precision floating-point values `a' and `b'. If `zSign' is 1, the -| difference is negated before being returned. `zSign' is ignored if the -| result is a NaN. The subtraction is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*-------------------------------------------------------------------------= ---*/ - -static float64 subFloat64Sigs(float64 a, float64 b, flag zSign, - float_status *status) -{ - int aExp, bExp, zExp; - uint64_t aSig, bSig, zSig; - int expDiff; - - aSig =3D extractFloat64Frac( a ); - aExp =3D extractFloat64Exp( a ); - bSig =3D extractFloat64Frac( b ); - bExp =3D extractFloat64Exp( b ); - expDiff =3D aExp - bExp; - aSig <<=3D 10; - bSig <<=3D 10; - if ( 0 < expDiff ) goto aExpBigger; - if ( expDiff < 0 ) goto bExpBigger; - if ( aExp =3D=3D 0x7FF ) { - if (aSig | bSig) { - return propagateFloat64NaN(a, b, status); - } - float_raise(float_flag_invalid, status); - return float64_default_nan(status); - } - if ( aExp =3D=3D 0 ) { - aExp =3D 1; - bExp =3D 1; - } - if ( bSig < aSig ) goto aBigger; - if ( aSig < bSig ) goto bBigger; - return packFloat64(status->float_rounding_mode =3D=3D float_round_down= , 0, 0); - bExpBigger: - if ( bExp =3D=3D 0x7FF ) { - if (bSig) { - return propagateFloat64NaN(a, b, status); - } - return packFloat64( zSign ^ 1, 0x7FF, 0 ); - } - if ( aExp =3D=3D 0 ) { - ++expDiff; - } - else { - aSig |=3D LIT64( 0x4000000000000000 ); - } - shift64RightJamming( aSig, - expDiff, &aSig ); - bSig |=3D LIT64( 0x4000000000000000 ); - bBigger: - zSig =3D bSig - aSig; - zExp =3D bExp; - zSign ^=3D 1; - goto normalizeRoundAndPack; - aExpBigger: - if ( aExp =3D=3D 0x7FF ) { - if (aSig) { - return propagateFloat64NaN(a, b, status); - } - return a; - } - if ( bExp =3D=3D 0 ) { - --expDiff; - } - else { - bSig |=3D LIT64( 0x4000000000000000 ); - } - shift64RightJamming( bSig, expDiff, &bSig ); - aSig |=3D LIT64( 0x4000000000000000 ); - aBigger: - zSig =3D aSig - bSig; - zExp =3D aExp; - normalizeRoundAndPack: - --zExp; - return normalizeRoundAndPackFloat64(zSign, zExp, zSig, status); - -} - -/*------------------------------------------------------------------------= ---- -| Returns the result of adding the double-precision floating-point values = `a' -| and `b'. The operation is performed according to the IEC/IEEE Standard = for -| Binary Floating-Point Arithmetic. -*-------------------------------------------------------------------------= ---*/ - -float64 float64_add(float64 a, float64 b, float_status *status) -{ - flag aSign, bSign; - a =3D float64_squash_input_denormal(a, status); - b =3D float64_squash_input_denormal(b, status); - - aSign =3D extractFloat64Sign( a ); - bSign =3D extractFloat64Sign( b ); - if ( aSign =3D=3D bSign ) { - return addFloat64Sigs(a, b, aSign, status); - } - else { - return subFloat64Sigs(a, b, aSign, status); - } - -} - -/*------------------------------------------------------------------------= ---- -| Returns the result of subtracting the double-precision floating-point va= lues -| `a' and `b'. The operation is performed according to the IEC/IEEE Stand= ard -| for Binary Floating-Point Arithmetic. -*-------------------------------------------------------------------------= ---*/ - -float64 float64_sub(float64 a, float64 b, float_status *status) -{ - flag aSign, bSign; - a =3D float64_squash_input_denormal(a, status); - b =3D float64_squash_input_denormal(b, status); - - aSign =3D extractFloat64Sign( a ); - bSign =3D extractFloat64Sign( b ); - if ( aSign =3D=3D bSign ) { - return subFloat64Sigs(a, b, aSign, status); - } - else { - return addFloat64Sigs(a, b, aSign, status); - } - -} =20 /*------------------------------------------------------------------------= ---- | Returns the result of multiplying the double-precision floating-point va= lues diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h index 23824a3000..693ece0974 100644 --- a/include/fpu/softfloat.h +++ b/include/fpu/softfloat.h @@ -236,6 +236,10 @@ float64 float16_to_float64(float16 a, flag ieee, float= _status *status); /*------------------------------------------------------------------------= ---- | Software half-precision operations. *-------------------------------------------------------------------------= ---*/ + +float16 float16_add(float16, float16, float_status *status); +float16 float16_sub(float16, float16, float_status *status); + int float16_is_quiet_nan(float16, float_status *status); int float16_is_signaling_nan(float16, float_status *status); float16 float16_maybe_silence_nan(float16, float_status *status); --=20 2.15.1