From nobody Fri Dec 19 20:35:49 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1597303045; cv=none; d=zohomail.com; s=zohoarc; b=CtrhaLrntgYYdOIP1ub8RRiToJgipENbsFuado3KmHey1StPYsq/4RsjHXcs+IRu0sBZgvt+pL1iD0icTDV8d+WCxncKITIc3ChjklYxAazdGkpNMzbjNzrQ7v8QpJN/03kFztPGqp9lPFFZzfSuJ8zxvJbOZuS4UryEO8M6AF0= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1597303045; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=su/x2nUAmaVHKRfvrXBbqw8lBpLIoBy7smzx/dyPIzw=; b=WcBriw56rjrEuznEHKSOHFMQ1Zy9hBdmXM0IkZNKOricevl0Yqqi+ilMPWFW/DUtY2BbQyeoZaWvB/s4KOKaBuZJgrKf/HBaHQMf+35ILIE89plyKz+j1/5I8A40WKtDvkGjNIyVVgS2/2Eh5x3TBZWnKd5jl65q+ZF2nMDCOKI= ARC-Authentication-Results: i=1; mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1597303045085998.8848565209888; Thu, 13 Aug 2020 00:17:25 -0700 (PDT) Received: from localhost ([::1]:43464 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1k67UJ-0005I3-Sm for importer@patchew.org; Thu, 13 Aug 2020 03:17:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36904) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k67Rk-0001uU-Vf for qemu-devel@nongnu.org; Thu, 13 Aug 2020 03:14:44 -0400 Received: from smtp2200-217.mail.aliyun.com ([121.197.200.217]:43551) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k67Rh-0007eV-Uu for qemu-devel@nongnu.org; Thu, 13 Aug 2020 03:14:44 -0400 Received: from L-PF1D6DP4-1208.hz.ali.com(mailfrom:zhiwei_liu@c-sky.com fp:SMTPD_---.IGxBKGW_1597302868) by smtp.aliyun-inc.com(10.147.41.143); Thu, 13 Aug 2020 15:14:29 +0800 X-Alimail-AntiSpam: AC=CONTINUE; BC=0.06614426|-1; CH=green; DM=|CONTINUE|false|; DS=CONTINUE|ham_system_inform|0.0232365-0.000612012-0.976151; FP=0|0|0|0|0|-1|-1|-1; HT=e02c03267; MF=zhiwei_liu@c-sky.com; NM=1; PH=DS; RN=6; RT=6; SR=0; TI=SMTPD_---.IGxBKGW_1597302868; From: LIU Zhiwei To: qemu-devel@nongnu.org Subject: [PATCH 1/3] fpu/softfloat: Define operations for bfloat16 Date: Thu, 13 Aug 2020 15:14:19 +0800 Message-Id: <20200813071421.2509-2-zhiwei_liu@c-sky.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20200813071421.2509-1-zhiwei_liu@c-sky.com> References: <20200813071421.2509-1-zhiwei_liu@c-sky.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: none client-ip=121.197.200.217; envelope-from=zhiwei_liu@c-sky.com; helo=smtp2200-217.mail.aliyun.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/08/13 03:14:30 X-ACL-Warn: Detected OS = Linux 3.x [generic] [fuzzy] X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_NONE=0.001, UNPARSEABLE_RELAY=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: alex.bennee@linaro.org, richard.henderson@linaro.org, LIU Zhiwei , aurelien@aurel32.net, peter.maydell@linaro.org Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Type: text/plain; charset="utf-8" This patch implements operations for bfloat16 except conversion and some mi= sc operations. We also add FloatFmt and pack/unpack interfaces for bfloat16. As they are both static fields, we can't make a sperate patch for them. Signed-off-by: LIU Zhiwei Reviewed-by: Richard Henderson --- fpu/softfloat.c | 168 ++++++++++++++++++++++++++++++++++ include/fpu/softfloat-types.h | 5 + include/fpu/softfloat.h | 44 +++++++++ 3 files changed, 217 insertions(+) diff --git a/fpu/softfloat.c b/fpu/softfloat.c index 79be4f5840..d4205f92d5 100644 --- a/fpu/softfloat.c +++ b/fpu/softfloat.c @@ -554,6 +554,10 @@ static const FloatFmt float16_params_ahp =3D { .arm_althp =3D true }; =20 +static const FloatFmt bfloat16_params =3D { + FLOAT_PARAMS(8, 7) +}; + static const FloatFmt float32_params =3D { FLOAT_PARAMS(8, 23) }; @@ -580,6 +584,11 @@ static inline FloatParts float16_unpack_raw(float16 f) return unpack_raw(float16_params, f); } =20 +static inline FloatParts bfloat16_unpack_raw(bfloat16 f) +{ + return unpack_raw(bfloat16_params, f); +} + static inline FloatParts float32_unpack_raw(float32 f) { return unpack_raw(float32_params, f); @@ -603,6 +612,11 @@ static inline float16 float16_pack_raw(FloatParts p) return make_float16(pack_raw(float16_params, p)); } =20 +static inline bfloat16 bfloat16_pack_raw(FloatParts p) +{ + return pack_raw(bfloat16_params, p); +} + static inline float32 float32_pack_raw(FloatParts p) { return make_float32(pack_raw(float32_params, p)); @@ -820,6 +834,11 @@ static FloatParts float16_unpack_canonical(float16 f, = float_status *s) return float16a_unpack_canonical(f, s, &float16_params); } =20 +static FloatParts bfloat16_unpack_canonical(bfloat16 f, float_status *s) +{ + return sf_canonicalize(bfloat16_unpack_raw(f), &bfloat16_params, s); +} + static float16 float16a_round_pack_canonical(FloatParts p, float_status *s, const FloatFmt *params) { @@ -831,6 +850,11 @@ static float16 float16_round_pack_canonical(FloatParts= p, float_status *s) return float16a_round_pack_canonical(p, s, &float16_params); } =20 +static bfloat16 bfloat16_round_pack_canonical(FloatParts p, float_status *= s) +{ + return bfloat16_pack_raw(round_canonical(p, s, &bfloat16_params)); +} + static FloatParts float32_unpack_canonical(float32 f, float_status *s) { return sf_canonicalize(float32_unpack_raw(f), &float32_params, s); @@ -1158,6 +1182,28 @@ float64_sub(float64 a, float64 b, float_status *s) return float64_addsub(a, b, s, hard_f64_sub, soft_f64_sub); } =20 +/* + * Returns the result of adding or subtracting the bfloat16 + * values `a' and `b'. + */ +bfloat16 QEMU_FLATTEN bfloat16_add(bfloat16 a, bfloat16 b, float_status *s= tatus) +{ + FloatParts pa =3D bfloat16_unpack_canonical(a, status); + FloatParts pb =3D bfloat16_unpack_canonical(b, status); + FloatParts pr =3D addsub_floats(pa, pb, false, status); + + return bfloat16_round_pack_canonical(pr, status); +} + +bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status *s= tatus) +{ + FloatParts pa =3D bfloat16_unpack_canonical(a, status); + FloatParts pb =3D bfloat16_unpack_canonical(b, status); + FloatParts pr =3D addsub_floats(pa, pb, true, status); + + return bfloat16_round_pack_canonical(pr, status); +} + /* * Returns the result of multiplying the floating-point values `a' and * `b'. The operation is performed according to the IEC/IEEE Standard @@ -1260,6 +1306,20 @@ float64_mul(float64 a, float64 b, float_status *s) f64_is_zon2, f64_addsubmul_post); } =20 +/* + * Returns the result of multiplying the bfloat16 + * values `a' and `b'. + */ + +bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *s= tatus) +{ + FloatParts pa =3D bfloat16_unpack_canonical(a, status); + FloatParts pb =3D bfloat16_unpack_canonical(b, status); + FloatParts pr =3D mul_floats(pa, pb, status); + + return bfloat16_round_pack_canonical(pr, status); +} + /* * Returns the result of multiplying the floating-point values `a' and * `b' then adding 'c', with no intermediate rounding step after the @@ -1642,6 +1702,23 @@ float64_muladd(float64 xa, float64 xb, float64 xc, i= nt flags, float_status *s) return soft_f64_muladd(ua.s, ub.s, uc.s, flags, s); } =20 +/* + * Returns the result of multiplying the bfloat16 values `a' + * and `b' then adding 'c', with no intermediate rounding step after the + * multiplication. + */ + +bfloat16 QEMU_FLATTEN bfloat16_muladd(bfloat16 a, bfloat16 b, bfloat16 c, + int flags, float_status *status) +{ + FloatParts pa =3D bfloat16_unpack_canonical(a, status); + FloatParts pb =3D bfloat16_unpack_canonical(b, status); + FloatParts pc =3D bfloat16_unpack_canonical(c, status); + FloatParts pr =3D muladd_floats(pa, pb, pc, flags, status); + + return bfloat16_round_pack_canonical(pr, status); +} + /* * Returns the result of dividing the floating-point value `a' by the * corresponding value `b'. The operation is performed according to @@ -1808,6 +1885,20 @@ float64_div(float64 a, float64 b, float_status *s) f64_div_pre, f64_div_post); } =20 +/* + * Returns the result of dividing the bfloat16 + * value `a' by the corresponding value `b'. + */ + +bfloat16 bfloat16_div(bfloat16 a, bfloat16 b, float_status *status) +{ + FloatParts pa =3D bfloat16_unpack_canonical(a, status); + FloatParts pb =3D bfloat16_unpack_canonical(b, status); + FloatParts pr =3D div_floats(pa, pb, status); + + return bfloat16_round_pack_canonical(pr, status); +} + /* * Float to Float conversions * @@ -2847,6 +2938,25 @@ MINMAX(64, maxnummag, false, true, true) =20 #undef MINMAX =20 +#define BF16_MINMAX(name, ismin, isiee, ismag) \ +bfloat16 bfloat16_ ## name(bfloat16 a, bfloat16 b, float_status *s) \ +{ \ + FloatParts pa =3D bfloat16_unpack_canonical(a, s); \ + FloatParts pb =3D bfloat16_unpack_canonical(b, s); \ + FloatParts pr =3D minmax_floats(pa, pb, ismin, isiee, ismag, s); \ + \ + return bfloat16_round_pack_canonical(pr, s); \ +} + +BF16_MINMAX(min, true, false, false) +BF16_MINMAX(minnum, true, true, false) +BF16_MINMAX(minnummag, true, true, true) +BF16_MINMAX(max, false, false, false) +BF16_MINMAX(maxnum, false, true, false) +BF16_MINMAX(maxnummag, false, true, true) + +#undef BF16_MINMAX + /* Floating point compare */ static FloatRelation compare_floats(FloatParts a, FloatParts b, bool is_qu= iet, float_status *s) @@ -3008,6 +3118,24 @@ FloatRelation float64_compare_quiet(float64 a, float= 64 b, float_status *s) return f64_compare(a, b, true, s); } =20 +static int QEMU_FLATTEN +soft_bf16_compare(bfloat16 a, bfloat16 b, bool is_quiet, float_status *s) +{ + FloatParts pa =3D bfloat16_unpack_canonical(a, s); + FloatParts pb =3D bfloat16_unpack_canonical(b, s); + return compare_floats(pa, pb, is_quiet, s); +} + +int bfloat16_compare(bfloat16 a, bfloat16 b, float_status *s) +{ + return soft_bf16_compare(a, b, false, s); +} + +int bfloat16_compare_quiet(bfloat16 a, bfloat16 b, float_status *s) +{ + return soft_bf16_compare(a, b, true, s); +} + /* Multiply A by 2 raised to the power N. */ static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s) { @@ -3047,6 +3175,13 @@ float64 float64_scalbn(float64 a, int n, float_statu= s *status) return float64_round_pack_canonical(pr, status); } =20 +bfloat16 bfloat16_scalbn(bfloat16 a, int n, float_status *status) +{ + FloatParts pa =3D bfloat16_unpack_canonical(a, status); + FloatParts pr =3D scalbn_decomposed(pa, n, status); + return bfloat16_round_pack_canonical(pr, status); +} + /* * Square Root * @@ -3197,6 +3332,13 @@ float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_= status *s) return soft_f64_sqrt(ua.s, s); } =20 +bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status) +{ + FloatParts pa =3D bfloat16_unpack_canonical(a, status); + FloatParts pr =3D sqrt_float(pa, status, &bfloat16_params); + return bfloat16_round_pack_canonical(pr, status); +} + /*------------------------------------------------------------------------= ---- | The pattern for a default generated NaN. *-------------------------------------------------------------------------= ---*/ @@ -3239,6 +3381,13 @@ float128 float128_default_nan(float_status *status) return r; } =20 +bfloat16 bfloat16_default_nan(float_status *status) +{ + FloatParts p =3D parts_default_nan(status); + p.frac >>=3D bfloat16_params.frac_shift; + return bfloat16_pack_raw(p); +} + /*------------------------------------------------------------------------= ---- | Returns a quiet NaN from a signalling NaN for the floating point value `= a'. *-------------------------------------------------------------------------= ---*/ @@ -3270,6 +3419,14 @@ float64 float64_silence_nan(float64 a, float_status = *status) return float64_pack_raw(p); } =20 +bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status) +{ + FloatParts p =3D bfloat16_unpack_raw(a); + p.frac <<=3D bfloat16_params.frac_shift; + p =3D parts_silence_nan(p, status); + p.frac >>=3D bfloat16_params.frac_shift; + return bfloat16_pack_raw(p); +} =20 /*------------------------------------------------------------------------= ---- | If `a' is denormal and we are in flush-to-zero mode then set the @@ -3319,6 +3476,17 @@ float64 float64_squash_input_denormal(float64 a, flo= at_status *status) return a; } =20 +bfloat16 bfloat16_squash_input_denormal(bfloat16 a, float_status *status) +{ + if (status->flush_inputs_to_zero) { + FloatParts p =3D bfloat16_unpack_raw(a); + if (parts_squash_denormal(p, status)) { + return bfloat16_set_sign(bfloat16_zero, p.sign); + } + } + return a; +} + /*------------------------------------------------------------------------= ---- | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 | and 7, and returns the properly rounded 32-bit integer corresponding to = the diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h index 7680193ebc..3a3bf55631 100644 --- a/include/fpu/softfloat-types.h +++ b/include/fpu/softfloat-types.h @@ -112,6 +112,11 @@ typedef struct { #define make_float128(high_, low_) ((float128) { .high =3D high_, .low =3D= low_ }) #define make_float128_init(high_, low_) { .high =3D high_, .low =3D low_ } =20 +/* + * Software neural-network floating-point types. + */ +typedef uint16_t bfloat16; + /* * Software IEC/IEEE floating-point underflow tininess-detection mode. */ diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h index 659218b5c7..198f954e4d 100644 --- a/include/fpu/softfloat.h +++ b/include/fpu/softfloat.h @@ -239,6 +239,37 @@ bool float16_is_quiet_nan(float16, float_status *statu= s); bool float16_is_signaling_nan(float16, float_status *status); float16 float16_silence_nan(float16, float_status *status); =20 +/*------------------------------------------------------------------------= ---- +| Software bfloat16 operations. +*-------------------------------------------------------------------------= ---*/ + +bfloat16 bfloat16_add(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_sub(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_mul(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_div(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_muladd(bfloat16, bfloat16, bfloat16, int, + float_status *status); +float16 bfloat16_scalbn(bfloat16, int, float_status *status); +bfloat16 bfloat16_min(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_max(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_minnum(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_maxnum(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_minnummag(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_maxnummag(bfloat16, bfloat16, float_status *status); +bfloat16 bfloat16_sqrt(bfloat16, float_status *status); +int bfloat16_compare(bfloat16, bfloat16, float_status *status); +int bfloat16_compare_quiet(bfloat16, bfloat16, float_status *status); +int bfloat16_unordered_quiet(bfloat16, bfloat16, float_status *status); +int bfloat16_le(bfloat16, bfloat16, float_status *status); +int bfloat16_lt(bfloat16, bfloat16, float_status *status); +int bfloat16_eq_quiet(bfloat16, bfloat16, float_status *status); + +int bfloat16_is_quiet_nan(bfloat16, float_status *status); +int bfloat16_is_signaling_nan(bfloat16, float_status *status); +bfloat16 bfloat16_silence_nan(bfloat16, float_status *status); +bfloat16 bfloat16_default_nan(float_status *status); +bfloat16 bfloat16_squash_input_denormal(bfloat16 a, float_status *status); + static inline bool float16_is_any_nan(float16 a) { return ((float16_val(a) & ~0x8000) > 0x7c00); @@ -293,6 +324,19 @@ static inline float16 float16_set_sign(float16 a, int = sign) #define float16_three make_float16(0x4200) #define float16_infinity make_float16(0x7c00) =20 +static inline bfloat16 bfloat16_set_sign(bfloat16 a, int sign) +{ + return (a & 0x7fff) | (sign << 15); +} + +#define bfloat16_zero 0 +#define bfloat16_half 0x3f00 +#define bfloat16_one 0x3f80 +#define bfloat16_one_point_five 0x3fc0 +#define bfloat16_two 0x4000 +#define bfloat16_three 0x4040 +#define bfloat16_infinity 0x7f80 + /*------------------------------------------------------------------------= ---- | The pattern for a default generated half-precision NaN. *-------------------------------------------------------------------------= ---*/ --=20 2.23.0