From nobody Tue Apr 15 18:52:49 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=linaro.org ARC-Seal: i=1; a=rsa-sha256; t=1582292409; cv=none; d=zohomail.com; s=zohoarc; b=dtkXBMZohKpNWkrJVIQ23yRQWiatXcTWXMIiyHLte30EKNYvDurRINNItxK4GNRLbd2jx+9NY+cde9HIZCCLqgqdrl+vqYOA6GD7W4Ta4N6DvtdJK1ZUuptabaRc0Mk+5lORePF6fGC70D9JpNoUftCDMM2CvVY8Mkc0VfR2vvo= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1582292409; h=Content-Type:Content-Transfer-Encoding:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=N2CqiOBafoAQTHmhaY96Iclyx2NNEgn6MWbHgGdY59o=; b=Yzo3w/2SvYnU5Psq1fmm0MD7GGu886UecFrs9RHvyjokB7gAHGgKuFxe+aHqV0MCna46dXyJ6rzC5AaqQxXITqUCATpU5P/DJ6YDDFJIG3l7LNSz3NBOYj45K+K+8GMCBLu+ZexcnAq95KAq1PtNjA37YvUdXpTPJPjC+8vS/Uo= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) header.from= Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 15822924091391009.4000769143965; Fri, 21 Feb 2020 05:40:09 -0800 (PST) Received: from localhost ([::1]:57986 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j58XH-0007OH-Rh for importer@patchew.org; Fri, 21 Feb 2020 08:40:07 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:57109) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j582n-0008Gs-GE for qemu-devel@nongnu.org; Fri, 21 Feb 2020 08:08:39 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j582l-0003kJ-I6 for qemu-devel@nongnu.org; Fri, 21 Feb 2020 08:08:37 -0500 Received: from mail-wm1-x336.google.com ([2a00:1450:4864:20::336]:37992) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j582l-0003iu-9r for qemu-devel@nongnu.org; Fri, 21 Feb 2020 08:08:35 -0500 Received: by mail-wm1-x336.google.com with SMTP id a9so1777867wmj.3 for ; Fri, 21 Feb 2020 05:08:35 -0800 (PST) Received: from orth.archaic.org.uk (orth.archaic.org.uk. [81.2.115.148]) by smtp.gmail.com with ESMTPSA id p15sm3598892wma.40.2020.02.21.05.08.31 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 21 Feb 2020 05:08:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=N2CqiOBafoAQTHmhaY96Iclyx2NNEgn6MWbHgGdY59o=; b=GNbFM9MYYTQPZyvNGhKWjn1llBgEaUsFapHZcOGo/v4VAnC/2v9RAJ2StFEr3M6+PK KRki88ZxBNOBsjcFQuC9zxdpuvIiA5QrxmLExMXEUV2TBV8kp2fyP+0+Fv6LG0dnGmZb aI76+8sTM70e1YiZzj0eaP+RubjUd4tkBPKygk/LBckT/sRTZtXMItvA7EnYl5UPG6lD 2dF5iZ5mj37RlwDIkjmjxixHRM6gU7Um7uYaTyxlV8WVrp7hne8ng1sMQNvgfcA6fa1Y m0suyMFGmEbzO/sNEenPp5w+jtYo2RY2vswWze+Woxz01wd2SAmknvwUpJl/3MzB+5fJ A/Xw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=N2CqiOBafoAQTHmhaY96Iclyx2NNEgn6MWbHgGdY59o=; b=nbOdszRM7ZDrDHw916JI11ejW1efLNFZ21G7sac8z8Qd8YzYnh5KARbWxcyb3WKVFN e0hIIQa5kttMZPDHz0zrdP3cGF+EFwpF1M1AGqR+78Omchpg8yFWMojawCaPAsQHIL+Q Y70JqZagGkGHfTcmuVapRCVYzSpaoujZ1MXQpiQIHYSBKpjt12QdYoH/lhLV9iFoDoVP y1b2JV/4NA9bzOg3lfO4cG9QDmt2HPZQKcsxL/UQOlo2nwkIrWXjt7HlJIMW23k3vzXe x3jbkvVUv0H+W6jBP5iZh04OUQ3VWrkXBgmiZachnb62gjXdxQjidpA0feWRjj5bUY8O +0qg== X-Gm-Message-State: APjAAAWzKVR5cG+PPfagDIcav+SFP9JF0wTZNAYGV3wHFysvdfvG/upk lXpNPFuSfnUlRKh+ymBvgDZJApUXRVxApQ== X-Google-Smtp-Source: APXvYqxTXkv8JwZSS/qoLmGgLFLKtF8ABZJYIE8TMa9tALWpWmIAfsrIKk4XJ87saCCk/2oCEWf+nw== X-Received: by 2002:a7b:cf39:: with SMTP id m25mr3803729wmg.146.1582290512714; Fri, 21 Feb 2020 05:08:32 -0800 (PST) From: Peter Maydell To: qemu-devel@nongnu.org Subject: [PULL 41/52] target/arm: Convert PMULL.8 to gvec Date: Fri, 21 Feb 2020 13:07:29 +0000 Message-Id: <20200221130740.7583-42-peter.maydell@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200221130740.7583-1-peter.maydell@linaro.org> References: <20200221130740.7583-1-peter.maydell@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:4864:20::336 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: pass (identity @linaro.org) From: Richard Henderson We still need two different helpers, since NEON and SVE2 get the inputs from different locations within the source vector. However, we can convert both to the same internal form for computation. The sve2 helper is not used yet, but adding it with this patch helps illustrate why the neon changes are helpful. Tested-by: Alex Benn=C3=A9e Reviewed-by: Alex Benn=C3=A9e Signed-off-by: Richard Henderson Message-id: 20200216214232.4230-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell --- target/arm/helper-sve.h | 2 ++ target/arm/helper.h | 3 +- target/arm/neon_helper.c | 32 -------------------- target/arm/translate-a64.c | 27 +++++++++++------ target/arm/translate.c | 26 ++++++++--------- target/arm/vec_helper.c | 60 ++++++++++++++++++++++++++++++++++++++ 6 files changed, 95 insertions(+), 55 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 9e79182ab46..2f472791558 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1574,3 +1574,5 @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve2_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/helper.h b/target/arm/helper.h index 4352fae3dbf..fcbf5041213 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -342,7 +342,6 @@ DEF_HELPER_2(neon_sub_u8, i32, i32, i32) DEF_HELPER_2(neon_sub_u16, i32, i32, i32) DEF_HELPER_2(neon_mul_u8, i32, i32, i32) DEF_HELPER_2(neon_mul_u16, i32, i32, i32) -DEF_HELPER_2(neon_mull_p8, i64, i32, i32) =20 DEF_HELPER_2(neon_tst_u8, i32, i32, i32) DEF_HELPER_2(neon_tst_u16, i32, i32, i32) @@ -695,6 +694,8 @@ DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, = ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_pmull_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) =20 +DEF_HELPER_FLAGS_4(neon_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c index 6a107da0e11..c7a8438b42a 100644 --- a/target/arm/neon_helper.c +++ b/target/arm/neon_helper.c @@ -1129,38 +1129,6 @@ NEON_VOP(mul_u8, neon_u8, 4) NEON_VOP(mul_u16, neon_u16, 2) #undef NEON_FN =20 -/* Polynomial multiplication is like integer multiplication except the - partial products are XORed, not added. */ -uint64_t HELPER(neon_mull_p8)(uint32_t op1, uint32_t op2) -{ - uint64_t result =3D 0; - uint64_t mask; - uint64_t op2ex =3D op2; - op2ex =3D (op2ex & 0xff) | - ((op2ex & 0xff00) << 8) | - ((op2ex & 0xff0000) << 16) | - ((op2ex & 0xff000000) << 24); - while (op1) { - mask =3D 0; - if (op1 & 1) { - mask |=3D 0xffff; - } - if (op1 & (1 << 8)) { - mask |=3D (0xffffU << 16); - } - if (op1 & (1 << 16)) { - mask |=3D (0xffffULL << 32); - } - if (op1 & (1 << 24)) { - mask |=3D (0xffffULL << 48); - } - result ^=3D op2ex & mask; - op1 =3D (op1 >> 1) & 0x7f7f7f7f; - op2ex <<=3D 1; - } - return result; -} - #define NEON_FN(dest, src1, src2) dest =3D (src1 & src2) ? -1 : 0 NEON_VOP(tst_u8, neon_u8, 4) NEON_VOP(tst_u16, neon_u16, 2) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 03ce879497d..596bf4cf734 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -10542,10 +10542,6 @@ static void handle_3rd_widening(DisasContext *s, i= nt is_q, int is_u, int size, gen_helper_neon_addl_saturate_s32(tcg_passres, cpu_env, tcg_passres, tcg_passres= ); break; - case 14: /* PMULL */ - assert(size =3D=3D 0); - gen_helper_neon_mull_p8(tcg_passres, tcg_op1, tcg_op2); - break; default: g_assert_not_reached(); } @@ -10709,11 +10705,21 @@ static void disas_simd_three_reg_diff(DisasContex= t *s, uint32_t insn) handle_3rd_narrowing(s, is_q, is_u, size, opcode, rd, rn, rm); break; case 14: /* PMULL, PMULL2 */ - if (is_u || size =3D=3D 1 || size =3D=3D 2) { + if (is_u) { unallocated_encoding(s); return; } - if (size =3D=3D 3) { + switch (size) { + case 0: /* PMULL.P8 */ + if (!fp_access_check(s)) { + return; + } + /* The Q field specifies lo/hi half input for this insn. */ + gen_gvec_op3_ool(s, true, rd, rn, rm, is_q, + gen_helper_neon_pmull_h); + break; + + case 3: /* PMULL.P64 */ if (!dc_isar_feature(aa64_pmull, s)) { unallocated_encoding(s); return; @@ -10724,9 +10730,13 @@ static void disas_simd_three_reg_diff(DisasContext= *s, uint32_t insn) /* The Q field specifies lo/hi half input for this insn. */ gen_gvec_op3_ool(s, true, rd, rn, rm, is_q, gen_helper_gvec_pmull_q); - return; + break; + + default: + unallocated_encoding(s); + break; } - goto is_widening; + return; case 9: /* SQDMLAL, SQDMLAL2 */ case 11: /* SQDMLSL, SQDMLSL2 */ case 13: /* SQDMULL, SQDMULL2 */ @@ -10747,7 +10757,6 @@ static void disas_simd_three_reg_diff(DisasContext = *s, uint32_t insn) unallocated_encoding(s); return; } - is_widening: if (!fp_access_check(s)) { return; } diff --git a/target/arm/translate.c b/target/arm/translate.c index 57d61c4aa57..ea6e984da65 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -5866,15 +5866,20 @@ static int disas_neon_data_insn(DisasContext *s, ui= nt32_t insn) return 1; } =20 - /* Handle VMULL.P64 (Polynomial 64x64 to 128 bit multiply) - * outside the loop below as it only performs a single pas= s. - */ - if (op =3D=3D 14 && size =3D=3D 2) { - if (!dc_isar_feature(aa32_pmull, s)) { - return 1; + /* Handle polynomial VMULL in a single pass. */ + if (op =3D=3D 14) { + if (size =3D=3D 0) { + /* VMULL.P8 */ + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, + 0, gen_helper_neon_pmull_h); + } else { + /* VMULL.P64 */ + if (!dc_isar_feature(aa32_pmull, s)) { + return 1; + } + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, + 0, gen_helper_gvec_pmull_q); } - tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, - 0, gen_helper_gvec_pmull_q); return 0; } =20 @@ -5952,11 +5957,6 @@ static int disas_neon_data_insn(DisasContext *s, uin= t32_t insn) /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL = */ gen_neon_mull(cpu_V0, tmp, tmp2, size, u); break; - case 14: /* Polynomial VMULL */ - gen_helper_neon_mull_p8(cpu_V0, tmp, tmp2); - tcg_temp_free_i32(tmp2); - tcg_temp_free_i32(tmp); - break; default: /* 15 is RESERVED: caught earlier */ abort(); } diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 79d2624f7b1..8017bd88c4c 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -1197,3 +1197,63 @@ void HELPER(gvec_pmull_q)(void *vd, void *vn, void *= vm, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +/* + * 8x8->16 polynomial multiply. + * + * The byte inputs are expanded to (or extracted from) half-words. + * Note that neon and sve2 get the inputs from different positions. + * This allows 4 bytes to be processed in parallel with uint64_t. + */ + +static uint64_t expand_byte_to_half(uint64_t x) +{ + return (x & 0x000000ff) + | ((x & 0x0000ff00) << 8) + | ((x & 0x00ff0000) << 16) + | ((x & 0xff000000) << 24); +} + +static uint64_t pmull_h(uint64_t op1, uint64_t op2) +{ + uint64_t result =3D 0; + int i; + + for (i =3D 0; i < 8; ++i) { + uint64_t mask =3D (op1 & 0x0001000100010001ull) * 0xffff; + result ^=3D op2 & mask; + op1 >>=3D 1; + op2 <<=3D 1; + } + return result; +} + +void HELPER(neon_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ + int hi =3D simd_data(desc); + uint64_t *d =3D vd, *n =3D vn, *m =3D vm; + uint64_t nn =3D n[hi], mm =3D m[hi]; + + d[0] =3D pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm)); + nn >>=3D 32; + mm >>=3D 32; + d[1] =3D pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm)); + + clear_tail(d, 16, simd_maxsz(desc)); +} + +#ifdef TARGET_AARCH64 +void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ + int shift =3D simd_data(desc) * 8; + intptr_t i, opr_sz =3D simd_oprsz(desc); + uint64_t *d =3D vd, *n =3D vn, *m =3D vm; + + for (i =3D 0; i < opr_sz / 8; ++i) { + uint64_t nn =3D (n[i] >> shift) & 0x00ff00ff00ff00ffull; + uint64_t mm =3D (m[i] >> shift) & 0x00ff00ff00ff00ffull; + + d[i] =3D pmull_h(nn, mm); + } +} +#endif --=20 2.20.1