From nobody Mon Feb  9 08:15:11 2026
From: Alex Bennée <alex.bennee@linaro.org>
To: qemu-arm@nongnu.org
Date: Fri, 23 Feb 2018 15:36:11 +0000
Message-Id: <20180223153636.29809-7-alex.bennee@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180223153636.29809-1-alex.bennee@linaro.org>
References: <20180223153636.29809-1-alex.bennee@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Subject: [Qemu-devel] [PATCH v3 06/31] arm/translate-a64: implement
 half-precision F(MIN|MAX)(V|NMV)
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
Cc: Alex Bennée <alex.bennee@linaro.org>,
	richard.henderson@linaro.org, qemu-devel@nongnu.org,
	Peter Maydell <peter.maydell@linaro.org>

This implements the half-precision variants of the across-vector
reduction operations. Doing so involves refactoring the reduction
code so that it follows the order of the ARM ARM pseudocode more
closely and can also handle 8-element reductions.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
--
v1
  - dropped the advsimd_2a stuff
v2
  - fixed up checkpatch
v3
  - add TCG_CALL_NO_RWG to helper definitions
---
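Note (illustrative only, not part of the patch): do_reduction_op
halves the lane bitmap at each level of recursion, and that fixes the
order in which pairs of elements are combined. The standalone sketch
below reproduces just that splitting logic and writes the resulting
pairing tree as text. The names reduction_tree, popcount8 and
lowest_set are hypothetical stand-ins, vmap is assumed to have a
power-of-two number of contiguous set bits, and the buffer is assumed
large enough; only the vmap_lo/vmap_hi expressions are taken from the
patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Count set bits in the lane map (stand-in for QEMU's ctpop8). */
static int popcount8(unsigned v)
{
    int n = 0;
    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Index of the lowest set bit; v must be non-zero. */
static int lowest_set(unsigned v)
{
    int i = 0;
    while (!(v & 1u)) {
        v >>= 1;
        i++;
    }
    return i;
}

/*
 * Write the pairing tree for the lanes selected by vmap into buf and
 * return the number of characters written. Assumes buf is big enough
 * (59 bytes suffice for 8 lanes) and vmap is a non-empty run of 1, 2,
 * 4 or 8 contiguous set bits.
 */
static int reduction_tree(unsigned vmap, char *buf, int len)
{
    if (popcount8(vmap) == 1) {
        /* One lane left: this is a leaf element. */
        return snprintf(buf, (size_t)len, "e%d", lowest_set(vmap));
    } else {
        /* Same split as the patch: low half first, then high half. */
        int shift = popcount8(vmap) / 2;
        unsigned vmap_lo = (vmap >> shift) & vmap;
        unsigned vmap_hi = vmap & ~vmap_lo;
        int n = 0;

        n += snprintf(buf + n, (size_t)(len - n), "op(");
        n += reduction_tree(vmap_lo, buf + n, len - n);
        n += snprintf(buf + n, (size_t)(len - n), ", ");
        n += reduction_tree(vmap_hi, buf + n, len - n);
        n += snprintf(buf + n, (size_t)(len - n), ")");
        return n;
    }
}
```

With vmap = 0xf (four lanes) the buffer ends up holding
"op(op(e0, e1), op(e2, e3))", which is the same pairing the old
open-coded single-precision path used; with vmap = 0xff the tree gains
one more level for the eight half-precision lanes.
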
 target/arm/helper-a64.c    |  18 ++++++
 target/arm/helper-a64.h    |   4 ++
 target/arm/translate-a64.c | 142 ++++++++++++++++++++++++++++-----------------
 3 files changed, 110 insertions(+), 54 deletions(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 10e08bdc1f..fddd5d242b 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -572,3 +572,21 @@ uint64_t HELPER(paired_cmpxchg64_be_parallel)(CPUARMState *env, uint64_t addr,
 {
     return do_paired_cmpxchg64_be(env, addr, new_lo, new_hi, true, GETPC());
 }
+
+/*
+ * AdvSIMD half-precision
+ */
+
+#define ADVSIMD_HELPER(name, suffix) HELPER(glue(glue(advsimd_, name), suffix))
+
+#define ADVSIMD_HALFOP(name) \
+float16 ADVSIMD_HELPER(name, h)(float16 a, float16 b, void *fpstp) \
+{ \
+    float_status *fpst = fpstp; \
+    return float16_ ## name(a, b, fpst);    \
+}
+
+ADVSIMD_HALFOP(min)
+ADVSIMD_HALFOP(max)
+ADVSIMD_HALFOP(minnum)
+ADVSIMD_HALFOP(maxnum)
diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 85d86741db..cb2a73124d 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -48,3 +48,7 @@ DEF_HELPER_FLAGS_4(paired_cmpxchg64_le_parallel, TCG_CALL_NO_WG,
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_be_parallel, TCG_CALL_NO_WG,
                    i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(advsimd_maxh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
+DEF_HELPER_FLAGS_3(advsimd_minh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
+DEF_HELPER_FLAGS_3(advsimd_maxnumh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
+DEF_HELPER_FLAGS_3(advsimd_minnumh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 91c2b8ed11..ebaf4571ac 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5741,26 +5741,75 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t insn)
     tcg_temp_free_i64(tcg_resh);
 }
 
-static void do_minmaxop(DisasContext *s, TCGv_i32 tcg_elt1, TCGv_i32 tcg_elt2,
-                        int opc, bool is_min, TCGv_ptr fpst)
-{
-    /* Helper function for disas_simd_across_lanes: do a single precision
-     * min/max operation on the specified two inputs,
-     * and return the result in tcg_elt1.
-     */
-    if (opc == 0xc) {
-        if (is_min) {
-            gen_helper_vfp_minnums(tcg_elt1, tcg_elt1, tcg_elt2, fpst);
-        } else {
-            gen_helper_vfp_maxnums(tcg_elt1, tcg_elt1, tcg_elt2, fpst);
-        }
+/*
+ * do_reduction_op helper
+ *
+ * This mirrors the Reduce() pseudocode in the ARM ARM. It is
+ * important for correct NaN propagation that we do these
+ * operations in exactly the order specified by the pseudocode.
+ *
+ * This is a recursive function, TCG temps should be freed by the
+ * calling function once it is done with the values.
+ */
+static TCGv_i32 do_reduction_op(DisasContext *s, int fpopcode, int rn,
+                                int esize, int size, int vmap, TCGv_ptr fpst)
+{
+    if (esize == size) {
+        int element;
+        TCGMemOp msize = esize == 16 ? MO_16 : MO_32;
+        TCGv_i32 tcg_elem;
+
+        /* We should have one register left here */
+        assert(ctpop8(vmap) == 1);
+        element = ctz32(vmap);
+        assert(element < 8);
+
+        tcg_elem = tcg_temp_new_i32();
+        read_vec_element_i32(s, tcg_elem, rn, element, msize);
+        return tcg_elem;
     } else {
-        assert(opc == 0xf);
-        if (is_min) {
-            gen_helper_vfp_mins(tcg_elt1, tcg_elt1, tcg_elt2, fpst);
-        } else {
-            gen_helper_vfp_maxs(tcg_elt1, tcg_elt1, tcg_elt2, fpst);
+        int bits = size / 2;
+        int shift = ctpop8(vmap) / 2;
+        int vmap_lo = (vmap >> shift) & vmap;
+        int vmap_hi = (vmap & ~vmap_lo);
+        TCGv_i32 tcg_hi, tcg_lo, tcg_res;
+
+        tcg_hi = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_hi, fpst);
+        tcg_lo = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_lo, fpst);
+        tcg_res = tcg_temp_new_i32();
+
+        switch (fpopcode) {
+        case 0x0c: /* fmaxnmv half-precision */
+            gen_helper_advsimd_maxnumh(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x0f: /* fmaxv half-precision */
+            gen_helper_advsimd_maxh(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x1c: /* fminnmv half-precision */
+            gen_helper_advsimd_minnumh(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x1f: /* fminv half-precision */
+            gen_helper_advsimd_minh(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x2c: /* fmaxnmv */
+            gen_helper_vfp_maxnums(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x2f: /* fmaxv */
+            gen_helper_vfp_maxs(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x3c: /* fminnmv */
+            gen_helper_vfp_minnums(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x3f: /* fminv */
+            gen_helper_vfp_mins(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        default:
+            g_assert_not_reached();
         }
+
+        tcg_temp_free_i32(tcg_hi);
+        tcg_temp_free_i32(tcg_lo);
+        return tcg_res;
     }
 }
 
@@ -5802,16 +5851,21 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
         break;
     case 0xc: /* FMAXNMV, FMINNMV */
     case 0xf: /* FMAXV, FMINV */
-        if (!is_u || !is_q || extract32(size, 0, 1)) {
-            unallocated_encoding(s);
-            return;
-        }
-        /* Bit 1 of size field encodes min vs max, and actual size is always
-         * 32 bits: adjust the size variable so following code can rely on it
+        /* Bit 1 of size field encodes min vs max and the actual size
+         * depends on the encoding of the U bit. If not set (and FP16
+         * enabled) then we do half-precision float instead of single
+         * precision.
          */
         is_min = extract32(size, 1, 1);
         is_fp = true;
-        size = 2;
+        if (!is_u && arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            size = 1;
+        } else if (!is_u || !is_q || extract32(size, 0, 1)) {
+            unallocated_encoding(s);
+            return;
+        } else {
+            size = 2;
+        }
         break;
     default:
         unallocated_encoding(s);
@@ -5868,38 +5922,18 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
 
         }
     } else {
-        /* Floating point ops which work on 32 bit (single) intermediates.
+        /* Floating point vector reduction ops which work across 32
+         * bit (single) or 16 bit (half-precision) intermediates.
          * Note that correct NaN propagation requires that we do these
          * operations in exactly the order specified by the pseudocode.
          */
-        TCGv_i32 tcg_elt1 = tcg_temp_new_i32();
-        TCGv_i32 tcg_elt2 = tcg_temp_new_i32();
-        TCGv_i32 tcg_elt3 = tcg_temp_new_i32();
-        TCGv_ptr fpst = get_fpstatus_ptr(false);
-
-        assert(esize == 32);
-        assert(elements == 4);
-
-        read_vec_element(s, tcg_elt, rn, 0, MO_32);
-        tcg_gen_extrl_i64_i32(tcg_elt1, tcg_elt);
-        read_vec_element(s, tcg_elt, rn, 1, MO_32);
-        tcg_gen_extrl_i64_i32(tcg_elt2, tcg_elt);
-
-        do_minmaxop(s, tcg_elt1, tcg_elt2, opcode, is_min, fpst);
-
-        read_vec_element(s, tcg_elt, rn, 2, MO_32);
-        tcg_gen_extrl_i64_i32(tcg_elt2, tcg_elt);
-        read_vec_element(s, tcg_elt, rn, 3, MO_32);
-        tcg_gen_extrl_i64_i32(tcg_elt3, tcg_elt);
-
-        do_minmaxop(s, tcg_elt2, tcg_elt3, opcode, is_min, fpst);
-
-        do_minmaxop(s, tcg_elt1, tcg_elt2, opcode, is_min, fpst);
-
-        tcg_gen_extu_i32_i64(tcg_res, tcg_elt1);
-        tcg_temp_free_i32(tcg_elt1);
-        tcg_temp_free_i32(tcg_elt2);
-        tcg_temp_free_i32(tcg_elt3);
+        TCGv_ptr fpst = get_fpstatus_ptr(size == MO_16);
+        int fpopcode = opcode | is_min << 4 | is_u << 5;
+        int vmap = (1 << elements) - 1;
+        TCGv_i32 tcg_res32 = do_reduction_op(s, fpopcode, rn, esize,
+                                             (is_q ? 128 : 64), vmap, fpst);
+        tcg_gen_extu_i32_i64(tcg_res, tcg_res32);
+        tcg_temp_free_i32(tcg_res32);
         tcg_temp_free_ptr(fpst);
     }
 
-- 
2.15.1