From nobody Fri Oct 24 09:38:25 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1519400506552998.5666782179516; Fri, 23 Feb 2018 07:41:46 -0800 (PST) Received: from localhost ([::1]:45263 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1epFTl-0006zJ-K3 for importer@patchew.org; Fri, 23 Feb 2018 10:41:45 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44377) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1epFOx-0002lS-0c for qemu-devel@nongnu.org; Fri, 23 Feb 2018 10:36:48 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1epFOv-000745-CP for qemu-devel@nongnu.org; Fri, 23 Feb 2018 10:36:47 -0500 Received: from mail-wm0-x242.google.com ([2a00:1450:400c:c09::242]:35686) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1epFOv-00072H-2P for qemu-devel@nongnu.org; Fri, 23 Feb 2018 10:36:45 -0500 Received: by mail-wm0-x242.google.com with SMTP id x7so3053878wmc.0 for ; Fri, 23 Feb 2018 07:36:44 -0800 (PST) Received: from zen.linaro.local ([81.128.185.34]) by smtp.gmail.com with ESMTPSA id m187sm2910038wmg.0.2018.02.23.07.36.38 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 23 Feb 2018 07:36:42 -0800 (PST) Received: from zen.linaroharston (localhost [127.0.0.1]) by zen.linaro.local (Postfix) with ESMTP id EEC463E043F; Fri, 23 Feb 2018 15:36:36 +0000 (GMT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=tIJ5d2GcFeAwMZmYlpoC01XfdO99ce2ZsNfJkGnJ5JY=; b=HQd15bvGLeie0iCaWPmJFobuvO4MGRjOVlm9NA+T+iFSsuGa7fjCmnAXAUHUJKWexu drVQWEpgYotFG00ZLJM97cpSV7fIhbXNOfCGs2AGQXEJkuML8CqDwh758hSKMyrHNR6s l+W05CkPz/JW5eLUPCZEC2zU4nhf1jw/vkJmw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tIJ5d2GcFeAwMZmYlpoC01XfdO99ce2ZsNfJkGnJ5JY=; b=CgvFMJXBSEMkzHy8o2YSfvMqdRpRke8zOqDv+wT6lWN1bICSqzMroaFlugAa5HTJer 1c0aeIdsVJyDjm/5ljgy5HmkFj8l3zDMWwJShqY9u8rKQf5/Qxl0Eak8T1TL/KSWTaSq AtfrJXGqxaDM260BRLEQ0LfwUiRai4goME5JscpimCQQIcNHfZCafNXW3a50YMfomCnF adUOK7vWfQzomvDiJjpxdW63pMFVxUD7WCAWvrw6qdbFmgDs/cYMi6r9pEgm3GcWS2YY EFUWTCoKH2Ul/LijR/Q9idvSid1uT3QJUvaTSQPhKZTWP+ixDMeT4Rwkk0bq7Gc3ZMoZ UUaA== X-Gm-Message-State: APf1xPA/r6R1bYOLy5HD+35hf0j+RAEB9hgmD2upfpSd4U5pwbrA5JjI 7ppBoQ5/PzscTeCDBv6KTH32kg== X-Google-Smtp-Source: AG47ELuxLK+IAw2/UcPfRnzxfCJMSV/XP2oP7Dz9ZWH/07Ck45YbZDp1WiZ3aaWYz1eJsGU2bl18fQ== X-Received: by 10.28.138.6 with SMTP id m6mr2215789wmd.146.1519400203909; Fri, 23 Feb 2018 07:36:43 -0800 (PST) From: =?UTF-8?q?Alex=20Benn=C3=A9e?= To: qemu-arm@nongnu.org Date: Fri, 23 Feb 2018 15:36:11 +0000 Message-Id: <20180223153636.29809-7-alex.bennee@linaro.org> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20180223153636.29809-1-alex.bennee@linaro.org> References: <20180223153636.29809-1-alex.bennee@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:400c:c09::242 Subject: [Qemu-devel] [PATCH v3 06/31] arm/translate-a64: implement half-precision F(MIN|MAX)(V|NMV) X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?UTF-8?q?Alex=20Benn=C3=A9e?= , richard.henderson@linaro.org, qemu-devel@nongnu.org, Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZohoMail: RDKM_2 RSF_0 Z_629925259 SPT_0 This implements the half-precision variants of the across vector reduction operations. This involves a re-factor of the reduction code which more closely matches the ARM ARM order (and handles 8 element reductions). Signed-off-by: Alex Benn=C3=A9e Reviewed-by: Richard Henderson -- v1 - dropped the advsimd_2a stuff v2 - fixed up checkpatch v3 - add TCG_CALL_NO_RWG to helper definitions --- target/arm/helper-a64.c | 18 ++++++ target/arm/helper-a64.h | 4 ++ target/arm/translate-a64.c | 142 ++++++++++++++++++++++++++++-------------= ---- 3 files changed, 110 insertions(+), 54 deletions(-) diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c index 10e08bdc1f..fddd5d242b 100644 --- a/target/arm/helper-a64.c +++ b/target/arm/helper-a64.c @@ -572,3 +572,21 @@ uint64_t HELPER(paired_cmpxchg64_be_parallel)(CPUARMSt= ate *env, uint64_t addr, { return do_paired_cmpxchg64_be(env, addr, new_lo, new_hi, true, GETPC()= ); } + +/* + * AdvSIMD half-precision + */ + +#define ADVSIMD_HELPER(name, suffix) HELPER(glue(glue(advsimd_, name), suf= fix)) + +#define ADVSIMD_HALFOP(name) \ +float16 ADVSIMD_HELPER(name, h)(float16 a, float16 b, void *fpstp) \ +{ \ + float_status *fpst =3D fpstp; \ + return float16_ ## name(a, b, fpst); \ +} + +ADVSIMD_HALFOP(min) +ADVSIMD_HALFOP(max) +ADVSIMD_HALFOP(minnum) +ADVSIMD_HALFOP(maxnum) diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h index 85d86741db..cb2a73124d 100644 --- a/target/arm/helper-a64.h +++ b/target/arm/helper-a64.h @@ -48,3 +48,7 @@ DEF_HELPER_FLAGS_4(paired_cmpxchg64_le_parallel, TCG_CALL= _NO_WG, DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64= , i64) DEF_HELPER_FLAGS_4(paired_cmpxchg64_be_parallel, TCG_CALL_NO_WG, i64, env, i64, i64, i64) +DEF_HELPER_FLAGS_3(advsimd_maxh, TCG_CALL_NO_RWG, f16, f16, f16, ptr) +DEF_HELPER_FLAGS_3(advsimd_minh, TCG_CALL_NO_RWG, f16, f16, f16, ptr) +DEF_HELPER_FLAGS_3(advsimd_maxnumh, TCG_CALL_NO_RWG, f16, f16, f16, ptr) +DEF_HELPER_FLAGS_3(advsimd_minnumh, TCG_CALL_NO_RWG, f16, f16, f16, ptr) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 91c2b8ed11..ebaf4571ac 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -5741,26 +5741,75 @@ static void disas_simd_zip_trn(DisasContext *s, uin= t32_t insn) tcg_temp_free_i64(tcg_resh); } =20 -static void do_minmaxop(DisasContext *s, TCGv_i32 tcg_elt1, TCGv_i32 tcg_e= lt2, - int opc, bool is_min, TCGv_ptr fpst) -{ - /* Helper function for disas_simd_across_lanes: do a single precision - * min/max operation on the specified two inputs, - * and return the result in tcg_elt1. - */ - if (opc =3D=3D 0xc) { - if (is_min) { - gen_helper_vfp_minnums(tcg_elt1, tcg_elt1, tcg_elt2, fpst); - } else { - gen_helper_vfp_maxnums(tcg_elt1, tcg_elt1, tcg_elt2, fpst); - } +/* + * do_reduction_op helper + * + * This mirrors the Reduce() pseudocode in the ARM ARM. It is + * important for correct NaN propagation that we do these + * operations in exactly the order specified by the pseudocode. + * + * This is a recursive function, TCG temps should be freed by the + * calling function once it is done with the values. + */ +static TCGv_i32 do_reduction_op(DisasContext *s, int fpopcode, int rn, + int esize, int size, int vmap, TCGv_ptr fp= st) +{ + if (esize =3D=3D size) { + int element; + TCGMemOp msize =3D esize =3D=3D 16 ? MO_16 : MO_32; + TCGv_i32 tcg_elem; + + /* We should have one register left here */ + assert(ctpop8(vmap) =3D=3D 1); + element =3D ctz32(vmap); + assert(element < 8); + + tcg_elem =3D tcg_temp_new_i32(); + read_vec_element_i32(s, tcg_elem, rn, element, msize); + return tcg_elem; } else { - assert(opc =3D=3D 0xf); - if (is_min) { - gen_helper_vfp_mins(tcg_elt1, tcg_elt1, tcg_elt2, fpst); - } else { - gen_helper_vfp_maxs(tcg_elt1, tcg_elt1, tcg_elt2, fpst); + int bits =3D size / 2; + int shift =3D ctpop8(vmap) / 2; + int vmap_lo =3D (vmap >> shift) & vmap; + int vmap_hi =3D (vmap & ~vmap_lo); + TCGv_i32 tcg_hi, tcg_lo, tcg_res; + + tcg_hi =3D do_reduction_op(s, fpopcode, rn, esize, bits, vmap_hi, = fpst); + tcg_lo =3D do_reduction_op(s, fpopcode, rn, esize, bits, vmap_lo, = fpst); + tcg_res =3D tcg_temp_new_i32(); + + switch (fpopcode) { + case 0x0c: /* fmaxnmv half-precision */ + gen_helper_advsimd_maxnumh(tcg_res, tcg_lo, tcg_hi, fpst); + break; + case 0x0f: /* fmaxv half-precision */ + gen_helper_advsimd_maxh(tcg_res, tcg_lo, tcg_hi, fpst); + break; + case 0x1c: /* fminnmv half-precision */ + gen_helper_advsimd_minnumh(tcg_res, tcg_lo, tcg_hi, fpst); + break; + case 0x1f: /* fminv half-precision */ + gen_helper_advsimd_minh(tcg_res, tcg_lo, tcg_hi, fpst); + break; + case 0x2c: /* fmaxnmv */ + gen_helper_vfp_maxnums(tcg_res, tcg_lo, tcg_hi, fpst); + break; + case 0x2f: /* fmaxv */ + gen_helper_vfp_maxs(tcg_res, tcg_lo, tcg_hi, fpst); + break; + case 0x3c: /* fminnmv */ + gen_helper_vfp_minnums(tcg_res, tcg_lo, tcg_hi, fpst); + break; + case 0x3f: /* fminv */ + gen_helper_vfp_mins(tcg_res, tcg_lo, tcg_hi, fpst); + break; + default: + g_assert_not_reached(); } + + tcg_temp_free_i32(tcg_hi); + tcg_temp_free_i32(tcg_lo); + return tcg_res; } } =20 @@ -5802,16 +5851,21 @@ static void disas_simd_across_lanes(DisasContext *s= , uint32_t insn) break; case 0xc: /* FMAXNMV, FMINNMV */ case 0xf: /* FMAXV, FMINV */ - if (!is_u || !is_q || extract32(size, 0, 1)) { - unallocated_encoding(s); - return; - } - /* Bit 1 of size field encodes min vs max, and actual size is alwa= ys - * 32 bits: adjust the size variable so following code can rely on= it + /* Bit 1 of size field encodes min vs max and the actual size + * depends on the encoding of the U bit. If not set (and FP16 + * enabled) then we do half-precision float instead of single + * precision. */ is_min =3D extract32(size, 1, 1); is_fp =3D true; - size =3D 2; + if (!is_u && arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { + size =3D 1; + } else if (!is_u || !is_q || extract32(size, 0, 1)) { + unallocated_encoding(s); + return; + } else { + size =3D 2; + } break; default: unallocated_encoding(s); @@ -5868,38 +5922,18 @@ static void disas_simd_across_lanes(DisasContext *s= , uint32_t insn) =20 } } else { - /* Floating point ops which work on 32 bit (single) intermediates. + /* Floating point vector reduction ops which work across 32 + * bit (single) or 16 bit (half-precision) intermediates. * Note that correct NaN propagation requires that we do these * operations in exactly the order specified by the pseudocode. */ - TCGv_i32 tcg_elt1 =3D tcg_temp_new_i32(); - TCGv_i32 tcg_elt2 =3D tcg_temp_new_i32(); - TCGv_i32 tcg_elt3 =3D tcg_temp_new_i32(); - TCGv_ptr fpst =3D get_fpstatus_ptr(false); - - assert(esize =3D=3D 32); - assert(elements =3D=3D 4); - - read_vec_element(s, tcg_elt, rn, 0, MO_32); - tcg_gen_extrl_i64_i32(tcg_elt1, tcg_elt); - read_vec_element(s, tcg_elt, rn, 1, MO_32); - tcg_gen_extrl_i64_i32(tcg_elt2, tcg_elt); - - do_minmaxop(s, tcg_elt1, tcg_elt2, opcode, is_min, fpst); - - read_vec_element(s, tcg_elt, rn, 2, MO_32); - tcg_gen_extrl_i64_i32(tcg_elt2, tcg_elt); - read_vec_element(s, tcg_elt, rn, 3, MO_32); - tcg_gen_extrl_i64_i32(tcg_elt3, tcg_elt); - - do_minmaxop(s, tcg_elt2, tcg_elt3, opcode, is_min, fpst); - - do_minmaxop(s, tcg_elt1, tcg_elt2, opcode, is_min, fpst); - - tcg_gen_extu_i32_i64(tcg_res, tcg_elt1); - tcg_temp_free_i32(tcg_elt1); - tcg_temp_free_i32(tcg_elt2); - tcg_temp_free_i32(tcg_elt3); + TCGv_ptr fpst =3D get_fpstatus_ptr(size =3D=3D MO_16); + int fpopcode =3D opcode | is_min << 4 | is_u << 5; + int vmap =3D (1 << elements) - 1; + TCGv_i32 tcg_res32 =3D do_reduction_op(s, fpopcode, rn, esize, + (is_q ? 128 : 64), vmap, fpst= ); + tcg_gen_extu_i32_i64(tcg_res, tcg_res32); + tcg_temp_free_i32(tcg_res32); tcg_temp_free_ptr(fpst); } =20 --=20 2.15.1