From nobody Tue Oct 28 21:06:30 2025 Delivered-To: importer@patchew.org Received-SPF: temperror (zoho.com: Error in retrieving data from DNS) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=temperror (zoho.com: Error in retrieving data from DNS) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 151361911901444.521967591654175; Mon, 18 Dec 2017 09:45:19 -0800 (PST) Received: from localhost ([::1]:59083 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzTD-0006cZ-Ba for importer@patchew.org; Mon, 18 Dec 2017 12:44:55 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37594) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3q-0000iB-F6 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:46 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3o-0001Y2-L6 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:42 -0500 Received: from mail-pf0-x241.google.com ([2607:f8b0:400e:c00::241]:39324) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3o-0001X6-BO for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:40 -0500 Received: by mail-pf0-x241.google.com with SMTP id l24so9940974pfj.6 for ; Mon, 18 Dec 2017 09:18:40 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.37 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=8i/JRKaTSpgOJSSRFkW5UKcEKW6ZwDPhYDe8Wqjfx8I=; b=jZYNM72c+UJ2kJmkbJ/YQCAfFmAcrPcvY4k53BsqArSGKITfvRWdXrdzvHaPMXaCPL cUQ+nUloYBzR1GB2qWxRipDYmsq5DPBnORHE8nnx2TxGH9zY1e9CGOo65L5GHTOYK+6Z eZXpbIM4PR3nL3HtDMunHxXVDvlRsOsP3zyng= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=8i/JRKaTSpgOJSSRFkW5UKcEKW6ZwDPhYDe8Wqjfx8I=; b=GmSWftpjaKvDU1oQg6O7ovEtp1q5mVHozwzoOAi6UrU041vsQRUeAtXJoXw+B6MEDB qQVZVCOyopzVi8p/a7GBCcmQ9slEXIai+evk+5Anq2SqMeI3Xyo12iRaFul9Yggx33ye tVYLBAN4b22AKFpvv4ICGUwRyA0214qzE+KpcS9p9qNsUIpTOdtA8ItpYtERbGBDmoQy gqPxCuRMIuqdYU+3SBnh/2X3qzFHZx8ruX2HVvnr4Gr+LD7xq72H1FzYumNn7ipsy3d9 +VzVJjx/RFf/RfBlALgxHNJ/YCD6lRN8s3U8ghx8lc59RdYFToFyDl7mFEqKi3piRbyb hOvg== X-Gm-Message-State: AKGB3mLtjqab77itgGGKLhLaUKsT2E94lH57wHtgjFd1xJcTLZvwyOe0 wFtIpBJ2V25vZ3uPSYs/E1Z61qu+/fI= X-Google-Smtp-Source: ACJfBovkekgck1FJBaOsHh7xESgpbVnedFJA2BAko6N8Qv7z2Lj3O1kmXH1ORVvXwYyM9z/n2SskXA== X-Received: by 10.98.57.131 with SMTP id u3mr404785pfj.7.1513617518894; Mon, 18 Dec 2017 09:18:38 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:56 -0800 Message-Id: <20171218171758.16964-25-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::241 Subject: [Qemu-devel] [PATCH v7 24/26] tcg: Add support for 4 operand vector ops X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZohoMail: RDKM_2 RSF_6 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Signed-off-by: Richard Henderson --- tcg/tcg-op-gvec.h | 45 +++++++++-- tcg/tcg-op-gvec.c | 226 ++++++++++++++++++++++++++++++++++++++++++++++++++= ++++ 2 files changed, 266 insertions(+), 5 deletions(-) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 188c3368bd..91459b9c38 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -30,29 +30,43 @@ =20 /* Expand a call to a gvec-style helper, with pointers to two vector operands, and a descriptor (see tcg-gvec-desc.h). */ -typedef void (gen_helper_gvec_2)(TCGv_ptr, TCGv_ptr, TCGv_i32); +typedef void gen_helper_gvec_2(TCGv_ptr, TCGv_ptr, TCGv_i32); void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_2 *fn); =20 /* Similarly, passing an extra pointer (e.g. env or float_status). */ -typedef void (gen_helper_gvec_2_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i3= 2); +typedef void gen_helper_gvec_2_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_2_ptr *fn); =20 /* Similarly, with three vector operands. */ -typedef void (gen_helper_gvec_3)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +typedef void gen_helper_gvec_3(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_3 *fn); =20 -typedef void (gen_helper_gvec_3_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, - TCGv_ptr, TCGv_i32); +/* Similarly, with four vector operands. */ +typedef void gen_helper_gvec_4(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_4 *fn); + +typedef void gen_helper_gvec_3_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_3_ptr *fn); =20 +typedef void gen_helper_gvec_4_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, TCGv_ptr ptr, uint32_t oprsz, + uint32_t maxsz, int32_t data, + gen_helper_gvec_4_ptr *fn); + /* Expand a gvec operation. Either inline or out-of-line depending on the actual vector size and the operations supported by the host. */ typedef struct { @@ -114,12 +128,33 @@ typedef struct { bool load_dest; } GVecGen3; =20 +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64); + void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_4 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The data argument to the out-of-line helper. */ + uint32_t data; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; +} GVecGen4; + void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, uint32_t opsz, uint32_t clsz, const GVecGen2 *); void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t opsz, uint32_t clsz, unsigned c, const GVecGen2i *); void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz, const GVecGen3 *); +void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t = cofs, + uint32_t opsz, uint32_t clsz, const GVecGen4 *); =20 /* Expand a specific vector operation. */ =20 diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 8ec1817727..5f58859a1e 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -55,6 +55,18 @@ static void check_overlap_3(uint32_t d, uint32_t a, uint= 32_t b, uint32_t s) check_overlap_2(a, b, s); } =20 +/* Verify vector overlap rules for four operands. */ +static void check_overlap_4(uint32_t d, uint32_t a, uint32_t b, + uint32_t c, uint32_t s) +{ + check_overlap_2(d, a, s); + check_overlap_2(d, b, s); + check_overlap_2(d, c, s); + check_overlap_2(a, b, s); + check_overlap_2(a, c, s); + check_overlap_2(b, c, s); +} + /* Create a descriptor from components. */ uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data) { @@ -118,6 +130,33 @@ void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, = uint32_t bofs, tcg_temp_free_i32(desc); } =20 +/* Generate a call to a gvec-style helper with four vector operands. */ +void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_4 *fn) +{ + TCGv_ptr a0, a1, a2, a3; + TCGv_i32 desc =3D tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 =3D tcg_temp_new_ptr(); + a1 =3D tcg_temp_new_ptr(); + a2 =3D tcg_temp_new_ptr(); + a3 =3D tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + tcg_gen_addi_ptr(a3, cpu_env, cofs); + + fn(a0, a1, a2, a3, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_ptr(a3); + tcg_temp_free_i32(desc); +} + /* Generate a call to a gvec-style helper with three vector operands and an extra pointer operand. */ void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, @@ -165,6 +204,35 @@ void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, = uint32_t bofs, tcg_temp_free_i32(desc); } =20 +/* Generate a call to a gvec-style helper with four vector operands + and an extra pointer operand. */ +void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, TCGv_ptr ptr, uint32_t oprsz, + uint32_t maxsz, int32_t data, + gen_helper_gvec_4_ptr *fn) +{ + TCGv_ptr a0, a1, a2, a3; + TCGv_i32 desc =3D tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 =3D tcg_temp_new_ptr(); + a1 =3D tcg_temp_new_ptr(); + a2 =3D tcg_temp_new_ptr(); + a3 =3D tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + tcg_gen_addi_ptr(a3, cpu_env, cofs); + + fn(a0, a1, a2, a3, ptr, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_ptr(a3); + tcg_temp_free_i32(desc); +} + /* Return true if we want to implement something of OPRSZ bytes in units of LNSZ. This limits the expansion of inline code. */ static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz) @@ -468,6 +536,30 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs, tcg_temp_free_i32(t0); } =20 +/* Expand OPSZ bytes worth of three-operand operations using i32 elements.= */ +static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t opsz, + void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i3= 2)) +{ + TCGv_i32 t0 =3D tcg_temp_new_i32(); + TCGv_i32 t1 =3D tcg_temp_new_i32(); + TCGv_i32 t2 =3D tcg_temp_new_i32(); + TCGv_i32 t3 =3D tcg_temp_new_i32(); + uint32_t i; + + for (i =3D 0; i < opsz; i +=3D 4) { + tcg_gen_ld_i32(t1, cpu_env, aofs + i); + tcg_gen_ld_i32(t2, cpu_env, bofs + i); + tcg_gen_ld_i32(t3, cpu_env, cofs + i); + fni(t0, t1, t2, t3); + tcg_gen_st_i32(t0, cpu_env, dofs + i); + } + tcg_temp_free_i32(t3); + tcg_temp_free_i32(t2); + tcg_temp_free_i32(t1); + tcg_temp_free_i32(t0); +} + /* Expand OPSZ bytes worth of two-operand operations using i64 elements. = */ static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t opsz, void (*fni)(TCGv_i64, TCGv_i64)) @@ -527,6 +619,30 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs, tcg_temp_free_i64(t0); } =20 +/* Expand OPSZ bytes worth of three-operand operations using i64 elements.= */ +static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t opsz, + void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i6= 4)) +{ + TCGv_i64 t0 =3D tcg_temp_new_i64(); + TCGv_i64 t1 =3D tcg_temp_new_i64(); + TCGv_i64 t2 =3D tcg_temp_new_i64(); + TCGv_i64 t3 =3D tcg_temp_new_i64(); + uint32_t i; + + for (i =3D 0; i < opsz; i +=3D 4) { + tcg_gen_ld_i64(t1, cpu_env, aofs + i); + tcg_gen_ld_i64(t2, cpu_env, bofs + i); + tcg_gen_ld_i64(t3, cpu_env, cofs + i); + fni(t0, t1, t2, t3); + tcg_gen_st_i64(t0, cpu_env, dofs + i); + } + tcg_temp_free_i64(t3); + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t0); +} + /* Expand OPSZ bytes worth of two-operand operations using host vectors. = */ static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t opsz, uint32_t tysz, TCGType type, @@ -591,6 +707,32 @@ static void expand_3_vec(unsigned vece, uint32_t dofs,= uint32_t aofs, tcg_temp_free_vec(t0); } =20 +/* Expand OPSZ bytes worth of four-operand operations using host vectors. = */ +static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t cofs, uint32_t opsz, + uint32_t tysz, TCGType type, + void (*fni)(unsigned, TCGv_vec, TCGv_vec, + TCGv_vec, TCGv_vec)) +{ + TCGv_vec t0 =3D tcg_temp_new_vec(type); + TCGv_vec t1 =3D tcg_temp_new_vec(type); + TCGv_vec t2 =3D tcg_temp_new_vec(type); + TCGv_vec t3 =3D tcg_temp_new_vec(type); + uint32_t i; + + for (i =3D 0; i < opsz; i +=3D tysz) { + tcg_gen_ld_vec(t1, cpu_env, aofs + i); + tcg_gen_ld_vec(t2, cpu_env, bofs + i); + tcg_gen_ld_vec(t3, cpu_env, cofs + i); + fni(vece, t0, t1, t2, t3); + tcg_gen_st_vec(t0, cpu_env, dofs + i); + } + tcg_temp_free_vec(t3); + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t0); +} + /* Expand a vector two-operand operation. */ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, uint32_t oprsz, uint32_t maxsz, const GVecGen2 *g) @@ -831,6 +973,90 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint= 32_t bofs, tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, g->data, g->fno); } =20 +/* Expand a vector four-operand operation. */ +void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t = cofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen4 *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs | bofs | cofs); + check_overlap_4(dofs, aofs, bofs, cofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz =3D=3D 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece)= )) { + uint32_t done =3D QEMU_ALIGN_DOWN(oprsz, 32); + expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done, + 32, TCG_TYPE_V256, g->fniv); + dofs +=3D done; + aofs +=3D done; + bofs +=3D done; + oprsz -=3D done; + maxsz -=3D done; + } + + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece)= )) { + uint32_t done =3D QEMU_ALIGN_DOWN(oprsz, 16); + expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done, + 16, TCG_TYPE_V128, g->fniv); + dofs +=3D done; + aofs +=3D done; + bofs +=3D done; + oprsz -=3D done; + maxsz -=3D done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done =3D QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { + expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done, + 8, TCG_TYPE_V64, g->fniv); + } else if (g->fni8) { + expand_4_i64(dofs, aofs, bofs, cofs, done, g->fni8); + } else { + done =3D 0; + } + dofs +=3D done; + aofs +=3D done; + bofs +=3D done; + oprsz -=3D done; + maxsz -=3D done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done =3D QEMU_ALIGN_DOWN(oprsz, 4); + expand_4_i32(dofs, aofs, bofs, cofs, done, g->fni4); + dofs +=3D done; + aofs +=3D done; + bofs +=3D done; + oprsz -=3D done; + maxsz -=3D done; + } + + if (oprsz =3D=3D 0) { + if (maxsz !=3D 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_4_ool(dofs, aofs, bofs, cofs, oprsz, maxsz, g->data, g->f= no); +} + /* * Expand specific vector operations. */ --=20 2.14.3