From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:07 -0700
Message-Id: <20170817230114.3655-2-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 1/8] tcg: Add generic vector infrastructure and ops for add/sub/logic
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Signed-off-by: Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé
---
 Makefile.target        |   5 +-
 tcg/tcg-op-gvec.h      |  88 ++++++++++
 tcg/tcg-runtime.h      |  16 ++
 tcg/tcg-op-gvec.c      | 443 ++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-runtime-gvec.c | 199 ++++++++++++++++++++++
 5 files changed, 749 insertions(+), 2 deletions(-)
 create mode 100644 tcg/tcg-op-gvec.h
 create mode 100644 tcg/tcg-op-gvec.c
 create mode 100644 tcg/tcg-runtime-gvec.c

diff --git a/Makefile.target b/Makefile.target
index 7f42c45db8..9ae3e904f7 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -93,8 +93,9 @@ all: $(PROGS) stap
 # cpu emulator library
 obj-y += exec.o
 obj-y += accel/
-obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
-obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/tcg-runtime.o
+obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-common.o tcg/optimize.o
+obj-$(CONFIG_TCG) += tcg/tcg-op.o tcg/tcg-op-gvec.o
+obj-$(CONFIG_TCG) += tcg/tcg-runtime.o tcg/tcg-runtime-gvec.o
 obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-y += fpu/softfloat.o
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
new file mode 100644
index 0000000000..10db3599a5
--- /dev/null
+++ b/tcg/tcg-op-gvec.h
@@ -0,0 +1,88 @@
+/*
+ * Generic vector operation expansion
+ *
+ * Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * "Generic" vectors.  All operands are given as offsets from ENV,
+ * and therefore cannot also be allocated via tcg_global_mem_new_*.
+ * OPSZ is the byte size of the vector upon which the operation is performed.
+ * CLSZ is the byte size of the full vector; bytes beyond OPSZ are cleared.
+ *
+ * All sizes must be 8 or any multiple of 16.
+ * When OPSZ is 8, the alignment may be 8, otherwise must be 16.
+ * Operands may completely, but not partially, overlap.
+ */
+
+/* Fundamental operation expanders.  These are exposed to the front ends
+   so that target-specific SIMD operations can be handled similarly to
+   the standard SIMD operations.  */
+
+typedef struct {
+    /* "Small" sizes: expand inline as a 64-bit or 32-bit lane.
+       Generally only one of these will be non-NULL.  */
+    void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64);
+    void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32);
+    /* Similarly, but load up a constant and re-use across lanes.  */
+    void (*fni8x)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64);
+    uint64_t extra_value;
+    /* Larger sizes: expand out-of-line helper w/size descriptor.  */
+    void (*fno)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+} GVecGen3;
+
+void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                    uint32_t opsz, uint32_t clsz, const GVecGen3 *);
+
+#define DEF_GVEC_2(X) \
+    void tcg_gen_gvec_##X(uint32_t dofs, uint32_t aofs, uint32_t bofs, \
+                          uint32_t opsz, uint32_t clsz)
+
+DEF_GVEC_2(add8);
+DEF_GVEC_2(add16);
+DEF_GVEC_2(add32);
+DEF_GVEC_2(add64);
+
+DEF_GVEC_2(sub8);
+DEF_GVEC_2(sub16);
+DEF_GVEC_2(sub32);
+DEF_GVEC_2(sub64);
+
+DEF_GVEC_2(and8);
+DEF_GVEC_2(or8);
+DEF_GVEC_2(xor8);
+DEF_GVEC_2(andc8);
+DEF_GVEC_2(orc8);
+
+#undef DEF_GVEC_2
+
+/*
+ * 64-bit vector operations.  Use these when the register has been
+ * allocated with tcg_global_mem_new_i64.  OPSZ = CLSZ = 8.
+ */
+
+#define DEF_VEC8_2(X) \
+    void tcg_gen_vec8_##X(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+
+DEF_VEC8_2(add8);
+DEF_VEC8_2(add16);
+DEF_VEC8_2(add32);
+
+DEF_VEC8_2(sub8);
+DEF_VEC8_2(sub16);
+DEF_VEC8_2(sub32);
+
+#undef DEF_VEC8_2
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index c41d38a557..f8d07090f8 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -134,3 +134,19 @@ GEN_ATOMIC_HELPERS(xor_fetch)
 GEN_ATOMIC_HELPERS(xchg)
 
 #undef GEN_ATOMIC_HELPERS
+
+DEF_HELPER_FLAGS_4(gvec_add8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_add16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_add32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_add64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_sub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_and8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_or8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_xor8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_andc8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_orc8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
new file mode 100644
index 0000000000..6de49dc07f
--- /dev/null
+++ b/tcg/tcg-op-gvec.c
@@ -0,0 +1,443 @@
+/*
+ * Generic vector operation expansion
+ *
+ * Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "tcg.h"
+#include "tcg-op.h"
+#include "tcg-op-gvec.h"
+#include "trace-tcg.h"
+#include "trace/mem.h"
+
+#define REP8(x)    ((x) * 0x0101010101010101ull)
+#define REP16(x)   ((x) * 0x0001000100010001ull)
+
+#define MAX_INLINE 16
+
+static inline void check_size_s(uint32_t opsz, uint32_t clsz)
+{
+    tcg_debug_assert(opsz % 8 == 0);
+    tcg_debug_assert(clsz % 8 == 0);
+    tcg_debug_assert(opsz <= clsz);
+}
+
+static inline void check_align_s_3(uint32_t dofs, uint32_t aofs, uint32_t bofs)
+{
+    tcg_debug_assert(dofs % 8 == 0);
+    tcg_debug_assert(aofs % 8 == 0);
+    tcg_debug_assert(bofs % 8 == 0);
+}
+
+static inline void check_size_l(uint32_t opsz, uint32_t clsz)
+{
+    tcg_debug_assert(opsz % 16 == 0);
+    tcg_debug_assert(clsz % 16 == 0);
+    tcg_debug_assert(opsz <= clsz);
+}
+
+static inline void check_align_l_3(uint32_t dofs, uint32_t aofs, uint32_t bofs)
+{
+    tcg_debug_assert(dofs % 16 == 0);
+    tcg_debug_assert(aofs % 16 == 0);
+    tcg_debug_assert(bofs % 16 == 0);
+}
+
+static inline void check_overlap_3(uint32_t d, uint32_t a,
+                                   uint32_t b, uint32_t s)
+{
+    tcg_debug_assert(d == a || d + s <= a || a + s <= d);
+    tcg_debug_assert(d == b || d + s <= b || b + s <= d);
+    tcg_debug_assert(a == b || a + s <= b || b + s <= a);
+}
+
+static void expand_clr(uint32_t dofs, uint32_t opsz, uint32_t clsz)
+{
+    if (clsz > opsz) {
+        TCGv_i64 zero = tcg_const_i64(0);
+        uint32_t i;
+
+        for (i = opsz; i < clsz; i += 8) {
+            tcg_gen_st_i64(zero, tcg_ctx.tcg_env, dofs + i);
+        }
+        tcg_temp_free_i64(zero);
+    }
+}
+
+static TCGv_i32 make_desc(uint32_t opsz, uint32_t clsz)
+{
+    tcg_debug_assert(opsz >= 16 && opsz <= 255 * 16 && opsz % 16 == 0);
+    tcg_debug_assert(clsz >= 16 && clsz <= 255 * 16 && clsz % 16 == 0);
+    opsz /= 16;
+    clsz /= 16;
+    opsz -= 1;
+    clsz -= 1;
+    return tcg_const_i32(deposit32(opsz, 8, 8, clsz));
+}
+
+static void expand_3_o(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz,
+                       void (*fno)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32))
+{
+    TCGv_ptr d = tcg_temp_new_ptr();
+    TCGv_ptr a = tcg_temp_new_ptr();
+    TCGv_ptr b = tcg_temp_new_ptr();
+    TCGv_i32 desc = make_desc(opsz, clsz);
+
+    tcg_gen_addi_ptr(d, tcg_ctx.tcg_env, dofs);
+    tcg_gen_addi_ptr(a, tcg_ctx.tcg_env, aofs);
+    tcg_gen_addi_ptr(b, tcg_ctx.tcg_env, bofs);
+    fno(d, a, b, desc);
+
+    tcg_temp_free_ptr(d);
+    tcg_temp_free_ptr(a);
+    tcg_temp_free_ptr(b);
+    tcg_temp_free_i32(desc);
+}
+
+static void expand_3x4(uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t opsz,
+                       void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    uint32_t i;
+
+    if (aofs == bofs) {
+        for (i = 0; i < opsz; i += 4) {
+            tcg_gen_ld_i32(t0, tcg_ctx.tcg_env, aofs + i);
+            fni(t0, t0, t0);
+            tcg_gen_st_i32(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+    } else {
+        TCGv_i32 t1 = tcg_temp_new_i32();
+        for (i = 0; i < opsz; i += 4) {
+            tcg_gen_ld_i32(t0, tcg_ctx.tcg_env, aofs + i);
+            tcg_gen_ld_i32(t1, tcg_ctx.tcg_env, bofs + i);
+            fni(t0, t0, t1);
+            tcg_gen_st_i32(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+        tcg_temp_free_i32(t1);
+    }
+    tcg_temp_free_i32(t0);
+}
+
+static void expand_3x8(uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t opsz,
+                       void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    uint32_t i;
+
+    if (aofs == bofs) {
+        for (i = 0; i < opsz; i += 8) {
+            tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);
+            fni(t0, t0, t0);
+            tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+    } else {
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        for (i = 0; i < opsz; i += 8) {
+            tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);
+            tcg_gen_ld_i64(t1, tcg_ctx.tcg_env, bofs + i);
+            fni(t0, t0, t1);
+            tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+        tcg_temp_free_i64(t1);
+    }
+    tcg_temp_free_i64(t0);
+}
+
+static void expand_3x8p1(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                         uint32_t opsz, uint64_t data,
+                         void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_const_i64(data);
+    uint32_t i;
+
+    if (aofs == bofs) {
+        for (i = 0; i < opsz; i += 8) {
+            tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);
+            fni(t0, t0, t0, t2);
+            tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+    } else {
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        for (i = 0; i < opsz; i += 8) {
+            tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);
+            tcg_gen_ld_i64(t1, tcg_ctx.tcg_env, bofs + i);
+            fni(t0, t0, t1, t2);
+            tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+        tcg_temp_free_i64(t1);
+    }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t2);
+}
+
+void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                    uint32_t opsz, uint32_t clsz, const GVecGen3 *g)
+{
+    check_overlap_3(dofs, aofs, bofs, clsz);
+    if (opsz <= MAX_INLINE) {
+        check_size_s(opsz, clsz);
+        check_align_s_3(dofs, aofs, bofs);
+        if (g->fni8) {
+            expand_3x8(dofs, aofs, bofs, opsz, g->fni8);
+        } else if (g->fni4) {
+            expand_3x4(dofs, aofs, bofs, opsz, g->fni4);
+        } else if (g->fni8x) {
+            expand_3x8p1(dofs, aofs, bofs, opsz, g->extra_value, g->fni8x);
+        } else {
+            g_assert_not_reached();
+        }
+        expand_clr(dofs, opsz, clsz);
+    } else {
+        check_size_l(opsz, clsz);
+        check_align_l_3(dofs, aofs, bofs);
+        expand_3_o(dofs, aofs, bofs, opsz, clsz, g->fno);
+    }
+}
+
+static void gen_addv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+
+    tcg_gen_andc_i64(t1, a, m);
+    tcg_gen_andc_i64(t2, b, m);
+    tcg_gen_xor_i64(t3, a, b);
+    tcg_gen_add_i64(d, t1, t2);
+    tcg_gen_and_i64(t3, t3, m);
+    tcg_gen_xor_i64(d, d, t3);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
+}
+
+void tcg_gen_gvec_add8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .extra_value = REP8(0x80),
+        .fni8x = gen_addv_mask,
+        .fno = gen_helper_gvec_add8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_add16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .extra_value = REP16(0x8000),
+        .fni8x = gen_addv_mask,
+        .fno = gen_helper_gvec_add16,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_add32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni4 = tcg_gen_add_i32,
+        .fno = gen_helper_gvec_add32,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_add64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_add_i64,
+        .fno = gen_helper_gvec_add64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_vec8_add8(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP8(0x80));
+    gen_addv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec8_add16(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP16(0x8000));
+    gen_addv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec8_add32(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, a, ~0xffffffffull);
+    tcg_gen_add_i64(t2, a, b);
+    tcg_gen_add_i64(t1, t1, b);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+static void gen_subv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+
+    tcg_gen_or_i64(t1, a, m);
+    tcg_gen_andc_i64(t2, b, m);
+    tcg_gen_eqv_i64(t3, a, b);
+    tcg_gen_sub_i64(d, t1, t2);
+    tcg_gen_and_i64(t3, t3, m);
+    tcg_gen_xor_i64(d, d, t3);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
+}
+
+void tcg_gen_gvec_sub8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .extra_value = REP8(0x80),
+        .fni8x = gen_subv_mask,
+        .fno = gen_helper_gvec_sub8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_sub16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .extra_value = REP16(0x8000),
+        .fni8x = gen_subv_mask,
+        .fno = gen_helper_gvec_sub16,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_sub32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni4 = tcg_gen_sub_i32,
+        .fno = gen_helper_gvec_sub32,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_sub64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_sub_i64,
+        .fno = gen_helper_gvec_sub64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_vec8_sub8(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP8(0x80));
+    gen_subv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec8_sub16(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP16(0x8000));
+    gen_subv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec8_sub32(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, b, ~0xffffffffull);
+    tcg_gen_sub_i64(t2, a, b);
+    tcg_gen_sub_i64(t1, a, t1);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+void tcg_gen_gvec_and8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_and_i64,
+        .fno = gen_helper_gvec_and8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_or8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                      uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_or_i64,
+        .fno = gen_helper_gvec_or8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_xor8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_xor_i64,
+        .fno = gen_helper_gvec_xor8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_andc8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_andc_i64,
+        .fno = gen_helper_gvec_andc8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_orc8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_orc_i64,
+        .fno = gen_helper_gvec_orc8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
diff --git a/tcg/tcg-runtime-gvec.c b/tcg/tcg-runtime-gvec.c
new file mode 100644
index 0000000000..9a37ce07a2
--- /dev/null
+++ b/tcg/tcg-runtime-gvec.c
@@ -0,0 +1,199 @@
+/*
+ * Generic vectorized operation runtime
+ *
+ * Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "cpu.h"
+#include "exec/helper-proto.h"
+
+/* Virtually all hosts support 16-byte vectors.  Those that don't
+   can emulate them via GCC's generic vector extension.
+
+   In tcg-op-gvec.c, we asserted that both the size and alignment
+   of the data are multiples of 16.  */
+
+typedef uint8_t vec8 __attribute__((vector_size(16)));
+typedef uint16_t vec16 __attribute__((vector_size(16)));
+typedef uint32_t vec32 __attribute__((vector_size(16)));
+typedef uint64_t vec64 __attribute__((vector_size(16)));
+
+static inline intptr_t extract_opsz(uint32_t desc)
+{
+    return ((desc & 0xff) + 1) * 16;
+}
+
+static inline intptr_t extract_clsz(uint32_t desc)
+{
+    return (((desc >> 8) & 0xff) + 1) * 16;
+}
+
+static inline void clear_high(void *d, intptr_t opsz, uint32_t desc)
+{
+    intptr_t clsz = extract_clsz(desc);
+    intptr_t i;
+
+    if (unlikely(clsz > opsz)) {
+        for (i = opsz; i < clsz; i += sizeof(vec64)) {
+            *(vec64 *)(d + i) = (vec64){ 0 };
+        }
+    }
+}
+
+void HELPER(gvec_add8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec8)) {
+        *(vec8 *)(d + i) = *(vec8 *)(a + i) + *(vec8 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_add16)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec16)) {
+        *(vec16 *)(d + i) = *(vec16 *)(a + i) + *(vec16 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_add32)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec32)) {
+        *(vec32 *)(d + i) = *(vec32 *)(a + i) + *(vec32 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_add64)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) + *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_sub8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec8)) {
+        *(vec8 *)(d + i) = *(vec8 *)(a + i) - *(vec8 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_sub16)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec16)) {
+        *(vec16 *)(d + i) = *(vec16 *)(a + i) - *(vec16 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_sub32)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec32)) {
+        *(vec32 *)(d + i) = *(vec32 *)(a + i) - *(vec32 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_sub64)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) - *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_and8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) & *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_or8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) | *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_xor8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) ^ *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_andc8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) &~ *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_orc8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) |~ *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
-- 
2.13.5
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:08 -0700
Message-Id: <20170817230114.3655-3-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 2/8] target/arm: Use generic vector infrastructure for aa64 add/sub/logic
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Signed-off-by: Richard Henderson
---
 target/arm/translate-a64.c | 137 ++++++++++++++++++++++++++++------------
 1 file changed, 87 insertions(+), 50 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 2200e25be0..025354f983 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -21,6 +21,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "tcg-op.h"
+#include "tcg-op-gvec.h"
 #include "qemu/log.h"
 #include "arm_ldst.h"
 #include "translate.h"
@@ -82,6 +83,7 @@ typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
 typedef void CryptoTwoOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32);
 typedef void CryptoThreeOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void GVecGenTwoFn(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t);
 
 /* initialize TCG globals.
*/ void a64_translate_init(void) @@ -537,6 +539,21 @@ static inline int vec_reg_offset(DisasContext *s, int = regno, return offs; } =20 +/* Return the offset info CPUARMState of the "whole" vector register Qn. = */ +static inline int vec_full_reg_offset(DisasContext *s, int regno) +{ + assert_fp_access_checked(s); + return offsetof(CPUARMState, vfp.regs[regno * 2]); +} + +/* Return the byte size of the "whole" vector register, VL / 8. */ +static inline int vec_full_reg_size(DisasContext *s) +{ + /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags. + In the meantime this is just the AdvSIMD length of 128. */ + return 128 / 8; +} + /* Return the offset into CPUARMState of a slice (from * the least significant end) of FP register Qn (ie * Dn, Sn, Hn or Bn). @@ -9042,11 +9059,38 @@ static void disas_simd_3same_logic(DisasContext *s,= uint32_t insn) bool is_q =3D extract32(insn, 30, 1); TCGv_i64 tcg_op1, tcg_op2, tcg_res[2]; int pass; + GVecGenTwoFn *gvec_op; =20 if (!fp_access_check(s)) { return; } =20 + switch (size + 4 * is_u) { + case 0: /* AND */ + gvec_op =3D tcg_gen_gvec_and8; + goto do_gvec; + case 1: /* BIC */ + gvec_op =3D tcg_gen_gvec_andc8; + goto do_gvec; + case 2: /* ORR */ + gvec_op =3D tcg_gen_gvec_or8; + goto do_gvec; + case 3: /* ORN */ + gvec_op =3D tcg_gen_gvec_orc8; + goto do_gvec; + case 4: /* EOR */ + gvec_op =3D tcg_gen_gvec_xor8; + goto do_gvec; + do_gvec: + gvec_op(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + } + + /* Note that we've now eliminated all !is_u. 
 */
+
     tcg_op1 = tcg_temp_new_i64();
     tcg_op2 = tcg_temp_new_i64();
     tcg_res[0] = tcg_temp_new_i64();
@@ -9056,47 +9100,27 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
         read_vec_element(s, tcg_op1, rn, pass, MO_64);
         read_vec_element(s, tcg_op2, rm, pass, MO_64);
 
-        if (!is_u) {
-            switch (size) {
-            case 0: /* AND */
-                tcg_gen_and_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 1: /* BIC */
-                tcg_gen_andc_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 2: /* ORR */
-                tcg_gen_or_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 3: /* ORN */
-                tcg_gen_orc_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            }
-        } else {
-            if (size != 0) {
-                /* B* ops need res loaded to operate on */
-                read_vec_element(s, tcg_res[pass], rd, pass, MO_64);
-            }
+        /* B* ops need res loaded to operate on */
+        read_vec_element(s, tcg_res[pass], rd, pass, MO_64);
 
-            switch (size) {
-            case 0: /* EOR */
-                tcg_gen_xor_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 1: /* BSL bitwise select */
-                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2);
-                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]);
-                tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1);
-                break;
-            case 2: /* BIT, bitwise insert if true */
-                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
-                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2);
-                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
-                break;
-            case 3: /* BIF, bitwise insert if false */
-                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
-                tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2);
-                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
-                break;
-            }
+        switch (size) {
+        case 1: /* BSL bitwise select */
+            tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2);
+            tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+            tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1);
+            break;
+        case 2: /* BIT, bitwise insert if true */
+            tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+            tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2);
+            tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
+            break;
+        case 3: /* BIF, bitwise insert if false */
+            tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+            tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2);
+            tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
+            break;
+        default:
+            g_assert_not_reached();
         }
     }
 
@@ -9370,6 +9394,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
     int rn = extract32(insn, 5, 5);
     int rd = extract32(insn, 0, 5);
     int pass;
+    GVecGenTwoFn *gvec_op;
 
     switch (opcode) {
     case 0x13: /* MUL, PMUL */
@@ -9409,6 +9434,28 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
         return;
     }
 
+    switch (opcode) {
+    case 0x10: /* ADD, SUB */
+        {
+            static GVecGenTwoFn * const fns[4][2] = {
+                { tcg_gen_gvec_add8, tcg_gen_gvec_sub8 },
+                { tcg_gen_gvec_add16, tcg_gen_gvec_sub16 },
+                { tcg_gen_gvec_add32, tcg_gen_gvec_sub32 },
+                { tcg_gen_gvec_add64, tcg_gen_gvec_sub64 },
+            };
+            gvec_op = fns[size][u];
+            goto do_gvec;
+        }
+        break;
+
+    do_gvec:
+        gvec_op(vec_full_reg_offset(s, rd),
+                vec_full_reg_offset(s, rn),
+                vec_full_reg_offset(s, rm),
+                is_q ? 16 : 8, vec_full_reg_size(s));
+        return;
+    }
+
     if (size == 3) {
         assert(is_q);
         for (pass = 0; pass < 2; pass++) {
@@ -9581,16 +9628,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             genfn = fns[size][u];
             break;
         }
-    case 0x10: /* ADD, SUB */
-        {
-            static NeonGenTwoOpFn * const fns[3][2] = {
-                { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 },
-                { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
-                { tcg_gen_add_i32, tcg_gen_sub_i32 },
-            };
-            genfn = fns[size][u];
-            break;
-        }
     case 0x11: /* CMTST, CMEQ */
         {
             static NeonGenTwoOpFn * const fns[3][2] = {
-- 
2.13.5

From nobody Fri May 3 04:08:12 2024
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:09 -0700
Message-Id: <20170817230114.3655-4-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 3/8] tcg: Add types for host vectors
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Nothing uses or enables them yet.

Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
---
 tcg/tcg.h | 5 +++++
 tcg/tcg.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index dd97095af5..1277caed3d 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -256,6 +256,11 @@ typedef struct TCGPool {
 typedef enum TCGType {
     TCG_TYPE_I32,
     TCG_TYPE_I64,
+
+    TCG_TYPE_V64,
+    TCG_TYPE_V128,
+    TCG_TYPE_V256,
+
     TCG_TYPE_COUNT, /* number of different types */
 
     /* An alias for the size of the host register.
 */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 787c8ba0f7..ea78d47fad 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -118,7 +118,7 @@ static TCGReg tcg_reg_alloc_new(TCGContext *s, TCGType t)
 static bool tcg_out_ldst_finalize(TCGContext *s);
 #endif
 
-static TCGRegSet tcg_target_available_regs[2];
+static TCGRegSet tcg_target_available_regs[TCG_TYPE_COUNT];
 static TCGRegSet tcg_target_call_clobber_regs;
 
 #if TCG_TARGET_INSN_UNIT_SIZE == 1
-- 
2.13.5

From nobody Fri May 3 04:08:12 2024
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:10 -0700
Message-Id: <20170817230114.3655-5-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 4/8] tcg: Add operations for host vectors
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Nothing uses or implements them yet.

Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
---
 tcg/tcg-opc.h | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.h     | 24 ++++++++++++++++
 2 files changed, 113 insertions(+)

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 956fb1e9f3..9162125fac 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -206,6 +206,95 @@ DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
 
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
+
+/* Host integer vector operations.  */
+/* These opcodes are required whenever the base vector size is enabled.
 */
+
+DEF(mov_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(mov_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(mov_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(movi_v64, 1, 0, 1, IMPL(TCG_TARGET_HAS_v64))
+DEF(movi_v128, 1, 0, 1, IMPL(TCG_TARGET_HAS_v128))
+DEF(movi_v256, 1, 0, 1, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(ld_v64, 1, 1, 1, IMPL(TCG_TARGET_HAS_v64))
+DEF(ld_v128, 1, 1, 1, IMPL(TCG_TARGET_HAS_v128))
+DEF(ld_v256, 1, 1, 1, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(st_v64, 0, 2, 1, IMPL(TCG_TARGET_HAS_v64))
+DEF(st_v128, 0, 2, 1, IMPL(TCG_TARGET_HAS_v128))
+DEF(st_v256, 0, 2, 1, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(and_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(and_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(and_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(or_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(or_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(or_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(xor_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(xor_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(xor_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(add8_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(add16_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(add32_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+
+DEF(add8_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(add16_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(add32_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(add64_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+
+DEF(add8_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(add16_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(add32_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(add64_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(sub8_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(sub16_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(sub32_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+
+DEF(sub8_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(sub16_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(sub32_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(sub64_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+
+DEF(sub8_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(sub16_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(sub32_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(sub64_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+/* These opcodes are optional.
+   All element counts must be supported if any are.  */
+
+DEF(not_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_not_v64))
+DEF(not_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_not_v128))
+DEF(not_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_not_v256))
+
+DEF(andc_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_andc_v64))
+DEF(andc_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_andc_v128))
+DEF(andc_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_andc_v256))
+
+DEF(orc_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_v64))
+DEF(orc_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_v128))
+DEF(orc_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_v256))
+
+DEF(neg8_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v64))
+DEF(neg16_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v64))
+DEF(neg32_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v64))
+
+DEF(neg8_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v128))
+DEF(neg16_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v128))
+DEF(neg32_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v128))
+DEF(neg64_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v128))
+
+DEF(neg8_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v256))
+DEF(neg16_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v256))
+DEF(neg32_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v256))
+DEF(neg64_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v256))
+
 #undef IMPL
 #undef IMPL64
 #undef DEF
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 1277caed3d..b9e15da13b 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -166,6 +166,30 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_rem_i64 0
 #endif
 
+#ifndef TCG_TARGET_HAS_v64
+#define TCG_TARGET_HAS_v64 0
+#define TCG_TARGET_HAS_andc_v64 0
+#define TCG_TARGET_HAS_orc_v64 0
+#define TCG_TARGET_HAS_not_v64 0
+#define TCG_TARGET_HAS_neg_v64 0
+#endif
+
+#ifndef TCG_TARGET_HAS_v128
+#define TCG_TARGET_HAS_v128 0
+#define TCG_TARGET_HAS_andc_v128 0
+#define TCG_TARGET_HAS_orc_v128 0
+#define TCG_TARGET_HAS_not_v128 0
+#define TCG_TARGET_HAS_neg_v128 0
+#endif
+
+#ifndef TCG_TARGET_HAS_v256
+#define TCG_TARGET_HAS_v256 0
+#define TCG_TARGET_HAS_andc_v256 0
+#define TCG_TARGET_HAS_orc_v256 0
+#define TCG_TARGET_HAS_not_v256 0
+#define TCG_TARGET_HAS_neg_v256 0
+#endif
+
 /* For 32-bit targets, some sort of unsigned widening multiply is required.  */
 #if TCG_TARGET_REG_BITS == 32 \
     && !(defined(TCG_TARGET_HAS_mulu2_i32) \
-- 
2.13.5

From nobody Fri May 3 04:08:12 2024
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:11 -0700
Message-Id: <20170817230114.3655-6-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 5/8] tcg: Add tcg_op_supported
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
---
 tcg/tcg.h |   2 +
 tcg/tcg.c | 310 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 312 insertions(+)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index b9e15da13b..b443143b21 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -962,6 +962,8 @@ do {\
 #define tcg_temp_free_ptr(T) tcg_temp_free_i64(TCGV_PTR_TO_NAT(T))
 #endif
 
+bool tcg_op_supported(TCGOpcode op);
+
 void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
                    int nargs, TCGArg *args);
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index ea78d47fad..3c3cdda938 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -751,6 +751,316 @@ int tcg_check_temp_count(void)
 }
 #endif
 
+/* Return true if OP may appear in the opcode stream.
+   Test the runtime variable that controls each opcode.
*/ +bool tcg_op_supported(TCGOpcode op) +{ + switch (op) { + case INDEX_op_discard: + case INDEX_op_set_label: + case INDEX_op_call: + case INDEX_op_br: + case INDEX_op_mb: + case INDEX_op_insn_start: + case INDEX_op_exit_tb: + case INDEX_op_goto_tb: + case INDEX_op_qemu_ld_i32: + case INDEX_op_qemu_st_i32: + case INDEX_op_qemu_ld_i64: + case INDEX_op_qemu_st_i64: + return true; + + case INDEX_op_goto_ptr: + return TCG_TARGET_HAS_goto_ptr; + + case INDEX_op_mov_i32: + case INDEX_op_movi_i32: + case INDEX_op_setcond_i32: + case INDEX_op_brcond_i32: + case INDEX_op_ld8u_i32: + case INDEX_op_ld8s_i32: + case INDEX_op_ld16u_i32: + case INDEX_op_ld16s_i32: + case INDEX_op_ld_i32: + case INDEX_op_st8_i32: + case INDEX_op_st16_i32: + case INDEX_op_st_i32: + case INDEX_op_add_i32: + case INDEX_op_sub_i32: + case INDEX_op_mul_i32: + case INDEX_op_and_i32: + case INDEX_op_or_i32: + case INDEX_op_xor_i32: + case INDEX_op_shl_i32: + case INDEX_op_shr_i32: + case INDEX_op_sar_i32: + return true; + + case INDEX_op_movcond_i32: + return TCG_TARGET_HAS_movcond_i32; + case INDEX_op_div_i32: + case INDEX_op_divu_i32: + return TCG_TARGET_HAS_div_i32; + case INDEX_op_rem_i32: + case INDEX_op_remu_i32: + return TCG_TARGET_HAS_rem_i32; + case INDEX_op_div2_i32: + case INDEX_op_divu2_i32: + return TCG_TARGET_HAS_div2_i32; + case INDEX_op_rotl_i32: + case INDEX_op_rotr_i32: + return TCG_TARGET_HAS_rot_i32; + case INDEX_op_deposit_i32: + return TCG_TARGET_HAS_deposit_i32; + case INDEX_op_extract_i32: + return TCG_TARGET_HAS_extract_i32; + case INDEX_op_sextract_i32: + return TCG_TARGET_HAS_sextract_i32; + case INDEX_op_add2_i32: + return TCG_TARGET_HAS_add2_i32; + case INDEX_op_sub2_i32: + return TCG_TARGET_HAS_sub2_i32; + case INDEX_op_mulu2_i32: + return TCG_TARGET_HAS_mulu2_i32; + case INDEX_op_muls2_i32: + return TCG_TARGET_HAS_muls2_i32; + case INDEX_op_muluh_i32: + return TCG_TARGET_HAS_muluh_i32; + case INDEX_op_mulsh_i32: + return TCG_TARGET_HAS_mulsh_i32; + case 
INDEX_op_ext8s_i32: + return TCG_TARGET_HAS_ext8s_i32; + case INDEX_op_ext16s_i32: + return TCG_TARGET_HAS_ext16s_i32; + case INDEX_op_ext8u_i32: + return TCG_TARGET_HAS_ext8u_i32; + case INDEX_op_ext16u_i32: + return TCG_TARGET_HAS_ext16u_i32; + case INDEX_op_bswap16_i32: + return TCG_TARGET_HAS_bswap16_i32; + case INDEX_op_bswap32_i32: + return TCG_TARGET_HAS_bswap32_i32; + case INDEX_op_not_i32: + return TCG_TARGET_HAS_not_i32; + case INDEX_op_neg_i32: + return TCG_TARGET_HAS_neg_i32; + case INDEX_op_andc_i32: + return TCG_TARGET_HAS_andc_i32; + case INDEX_op_orc_i32: + return TCG_TARGET_HAS_orc_i32; + case INDEX_op_eqv_i32: + return TCG_TARGET_HAS_eqv_i32; + case INDEX_op_nand_i32: + return TCG_TARGET_HAS_nand_i32; + case INDEX_op_nor_i32: + return TCG_TARGET_HAS_nor_i32; + case INDEX_op_clz_i32: + return TCG_TARGET_HAS_clz_i32; + case INDEX_op_ctz_i32: + return TCG_TARGET_HAS_ctz_i32; + case INDEX_op_ctpop_i32: + return TCG_TARGET_HAS_ctpop_i32; + + case INDEX_op_brcond2_i32: + case INDEX_op_setcond2_i32: + return TCG_TARGET_REG_BITS =3D=3D 32; + + case INDEX_op_mov_i64: + case INDEX_op_movi_i64: + case INDEX_op_setcond_i64: + case INDEX_op_brcond_i64: + case INDEX_op_ld8u_i64: + case INDEX_op_ld8s_i64: + case INDEX_op_ld16u_i64: + case INDEX_op_ld16s_i64: + case INDEX_op_ld32u_i64: + case INDEX_op_ld32s_i64: + case INDEX_op_ld_i64: + case INDEX_op_st8_i64: + case INDEX_op_st16_i64: + case INDEX_op_st32_i64: + case INDEX_op_st_i64: + case INDEX_op_add_i64: + case INDEX_op_sub_i64: + case INDEX_op_mul_i64: + case INDEX_op_and_i64: + case INDEX_op_or_i64: + case INDEX_op_xor_i64: + case INDEX_op_shl_i64: + case INDEX_op_shr_i64: + case INDEX_op_sar_i64: + case INDEX_op_ext_i32_i64: + case INDEX_op_extu_i32_i64: + return TCG_TARGET_REG_BITS =3D=3D 64; + + case INDEX_op_movcond_i64: + return TCG_TARGET_HAS_movcond_i64; + case INDEX_op_div_i64: + case INDEX_op_divu_i64: + return TCG_TARGET_HAS_div_i64; + case INDEX_op_rem_i64: + case INDEX_op_remu_i64: + return 
TCG_TARGET_HAS_rem_i64; + case INDEX_op_div2_i64: + case INDEX_op_divu2_i64: + return TCG_TARGET_HAS_div2_i64; + case INDEX_op_rotl_i64: + case INDEX_op_rotr_i64: + return TCG_TARGET_HAS_rot_i64; + case INDEX_op_deposit_i64: + return TCG_TARGET_HAS_deposit_i64; + case INDEX_op_extract_i64: + return TCG_TARGET_HAS_extract_i64; + case INDEX_op_sextract_i64: + return TCG_TARGET_HAS_sextract_i64; + case INDEX_op_extrl_i64_i32: + return TCG_TARGET_HAS_extrl_i64_i32; + case INDEX_op_extrh_i64_i32: + return TCG_TARGET_HAS_extrh_i64_i32; + case INDEX_op_ext8s_i64: + return TCG_TARGET_HAS_ext8s_i64; + case INDEX_op_ext16s_i64: + return TCG_TARGET_HAS_ext16s_i64; + case INDEX_op_ext32s_i64: + return TCG_TARGET_HAS_ext32s_i64; + case INDEX_op_ext8u_i64: + return TCG_TARGET_HAS_ext8u_i64; + case INDEX_op_ext16u_i64: + return TCG_TARGET_HAS_ext16u_i64; + case INDEX_op_ext32u_i64: + return TCG_TARGET_HAS_ext32u_i64; + case INDEX_op_bswap16_i64: + return TCG_TARGET_HAS_bswap16_i64; + case INDEX_op_bswap32_i64: + return TCG_TARGET_HAS_bswap32_i64; + case INDEX_op_bswap64_i64: + return TCG_TARGET_HAS_bswap64_i64; + case INDEX_op_not_i64: + return TCG_TARGET_HAS_not_i64; + case INDEX_op_neg_i64: + return TCG_TARGET_HAS_neg_i64; + case INDEX_op_andc_i64: + return TCG_TARGET_HAS_andc_i64; + case INDEX_op_orc_i64: + return TCG_TARGET_HAS_orc_i64; + case INDEX_op_eqv_i64: + return TCG_TARGET_HAS_eqv_i64; + case INDEX_op_nand_i64: + return TCG_TARGET_HAS_nand_i64; + case INDEX_op_nor_i64: + return TCG_TARGET_HAS_nor_i64; + case INDEX_op_clz_i64: + return TCG_TARGET_HAS_clz_i64; + case INDEX_op_ctz_i64: + return TCG_TARGET_HAS_ctz_i64; + case INDEX_op_ctpop_i64: + return TCG_TARGET_HAS_ctpop_i64; + case INDEX_op_add2_i64: + return TCG_TARGET_HAS_add2_i64; + case INDEX_op_sub2_i64: + return TCG_TARGET_HAS_sub2_i64; + case INDEX_op_mulu2_i64: + return TCG_TARGET_HAS_mulu2_i64; + case INDEX_op_muls2_i64: + return TCG_TARGET_HAS_muls2_i64; + case INDEX_op_muluh_i64: + return 
TCG_TARGET_HAS_muluh_i64; + case INDEX_op_mulsh_i64: + return TCG_TARGET_HAS_mulsh_i64; + + case INDEX_op_mov_v64: + case INDEX_op_movi_v64: + case INDEX_op_ld_v64: + case INDEX_op_st_v64: + case INDEX_op_and_v64: + case INDEX_op_or_v64: + case INDEX_op_xor_v64: + case INDEX_op_add8_v64: + case INDEX_op_add16_v64: + case INDEX_op_add32_v64: + case INDEX_op_sub8_v64: + case INDEX_op_sub16_v64: + case INDEX_op_sub32_v64: + return TCG_TARGET_HAS_v64; + + case INDEX_op_mov_v128: + case INDEX_op_movi_v128: + case INDEX_op_ld_v128: + case INDEX_op_st_v128: + case INDEX_op_and_v128: + case INDEX_op_or_v128: + case INDEX_op_xor_v128: + case INDEX_op_add8_v128: + case INDEX_op_add16_v128: + case INDEX_op_add32_v128: + case INDEX_op_add64_v128: + case INDEX_op_sub8_v128: + case INDEX_op_sub16_v128: + case INDEX_op_sub32_v128: + case INDEX_op_sub64_v128: + return TCG_TARGET_HAS_v128; + + case INDEX_op_mov_v256: + case INDEX_op_movi_v256: + case INDEX_op_ld_v256: + case INDEX_op_st_v256: + case INDEX_op_and_v256: + case INDEX_op_or_v256: + case INDEX_op_xor_v256: + case INDEX_op_add8_v256: + case INDEX_op_add16_v256: + case INDEX_op_add32_v256: + case INDEX_op_add64_v256: + case INDEX_op_sub8_v256: + case INDEX_op_sub16_v256: + case INDEX_op_sub32_v256: + case INDEX_op_sub64_v256: + return TCG_TARGET_HAS_v256; + + case INDEX_op_not_v64: + return TCG_TARGET_HAS_not_v64; + case INDEX_op_not_v128: + return TCG_TARGET_HAS_not_v128; + case INDEX_op_not_v256: + return TCG_TARGET_HAS_not_v256; + + case INDEX_op_andc_v64: + return TCG_TARGET_HAS_andc_v64; + case INDEX_op_andc_v128: + return TCG_TARGET_HAS_andc_v128; + case INDEX_op_andc_v256: + return TCG_TARGET_HAS_andc_v256; + + case INDEX_op_orc_v64: + return TCG_TARGET_HAS_orc_v64; + case INDEX_op_orc_v128: + return TCG_TARGET_HAS_orc_v128; + case INDEX_op_orc_v256: + return TCG_TARGET_HAS_orc_v256; + + case INDEX_op_neg8_v64: + case INDEX_op_neg16_v64: + case INDEX_op_neg32_v64: + return TCG_TARGET_HAS_neg_v64; + + case 
INDEX_op_neg8_v128: + case INDEX_op_neg16_v128: + case INDEX_op_neg32_v128: + case INDEX_op_neg64_v128: + return TCG_TARGET_HAS_neg_v128; + + case INDEX_op_neg8_v256: + case INDEX_op_neg16_v256: + case INDEX_op_neg32_v256: + case INDEX_op_neg64_v256: + return TCG_TARGET_HAS_neg_v256; + + case NB_OPS: + break; + } + g_assert_not_reached(); +} + /* Note: we convert the 64 bit args to 32 bit and do some alignment and endian swap. Maybe it would be better to do the alignment and endian swap in tcg_reg_alloc_call(). */ --=20 2.13.5 From nobody Fri May 3 04:08:12 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 15030112950371008.613745663873; Thu, 17 Aug 2017 16:08:15 -0700 (PDT) Received: from localhost ([::1]:56502 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diTtd-0001rw-Og for importer@patchew.org; Thu, 17 Aug 2017 19:08:13 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44564) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diTn5-0004xV-P7 for qemu-devel@nongnu.org; Thu, 17 Aug 2017 19:01:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1diTn5-0000y2-3z for qemu-devel@nongnu.org; Thu, 17 Aug 2017 19:01:27 -0400 Received: from mail-pg0-x22d.google.com ([2607:f8b0:400e:c05::22d]:36560) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1diTn4-0000xc-VB for qemu-devel@nongnu.org; Thu, 17 Aug 2017 19:01:27 -0400 Received: 
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:12 -0700
Message-Id: <20170817230114.3655-7-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 6/8] tcg: Add INDEX_op_invalid
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Add with value 0 so that structure zero initialization can
indicate that the field is not present.

Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
---
 tcg/tcg-opc.h | 2 ++
 tcg/tcg.c     | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 9162125fac..b1445a4c24 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -26,6 +26,8 @@
  * DEF(name, oargs, iargs, cargs, flags)
  */
 
+DEF(invalid, 0, 0, 0, TCG_OPF_NOT_PRESENT)
+
 /* predefined ops */
 DEF(discard, 1, 0, 0, TCG_OPF_NOT_PRESENT)
 DEF(set_label, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 3c3cdda938..879b29e81f 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -756,6 +756,9 @@ int tcg_check_temp_count(void)
 bool tcg_op_supported(TCGOpcode op)
 {
     switch (op) {
+    case INDEX_op_invalid:
+        return false;
+
     case INDEX_op_discard:
     case INDEX_op_set_label:
     case INDEX_op_call:
-- 
2.13.5

From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:13 -0700
Message-Id: <20170817230114.3655-8-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 7/8] tcg: Expand target vector ops with host vector ops
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Signed-off-by: Richard Henderson
---
 tcg/tcg-op-gvec.h |   4 +
 tcg/tcg.h         |   6 +-
 tcg/tcg-op-gvec.c | 230 ++++++++++++++++++++++++++++++++++++++++---------
 tcg/tcg.c         |   8 +-
 4 files changed, 197 insertions(+), 51 deletions(-)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 10db3599a5..99f36d208e 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -40,6 +40,10 @@ typedef struct {
     /* Similarly, but load up a constant and re-use across lanes.  */
     void (*fni8x)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64);
     uint64_t extra_value;
+    /* Operations with host vector ops.  */
+    TCGOpcode op_v256;
+    TCGOpcode op_v128;
+    TCGOpcode op_v64;
     /* Larger sizes: expand out-of-line helper w/size descriptor.  */
     void (*fno)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 } GVecGen3;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index b443143b21..7f10501d31 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -825,9 +825,11 @@ int tcg_global_mem_new_internal(TCGType, TCGv_ptr, intptr_t, const char *);
 TCGv_i32 tcg_global_reg_new_i32(TCGReg reg, const char *name);
 TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name);
 
-TCGv_i32 tcg_temp_new_internal_i32(int temp_local);
-TCGv_i64 tcg_temp_new_internal_i64(int temp_local);
+int tcg_temp_new_internal(TCGType type, bool temp_local);
+TCGv_i32 tcg_temp_new_internal_i32(bool temp_local);
+TCGv_i64 tcg_temp_new_internal_i64(bool temp_local);
 
+void tcg_temp_free_internal(int arg);
 void tcg_temp_free_i32(TCGv_i32 arg);
 void tcg_temp_free_i64(TCGv_i64 arg);
 
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 6de49dc07f..3aca565dc0 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -30,54 +30,73 @@
 #define REP8(x)    ((x) * 0x0101010101010101ull)
 #define REP16(x)   ((x) * 0x0001000100010001ull)
 
-#define MAX_INLINE 16
+#define MAX_UNROLL  4
 
-static inline void check_size_s(uint32_t opsz, uint32_t clsz)
+static inline void check_size_align(uint32_t opsz, uint32_t clsz, uint32_t ofs)
 {
-    tcg_debug_assert(opsz % 8 == 0);
-    tcg_debug_assert(clsz % 8 == 0);
+    uint32_t align = clsz > 16 || opsz >= 16 ? 15 : 7;
+    tcg_debug_assert(opsz > 0);
     tcg_debug_assert(opsz <= clsz);
+    tcg_debug_assert((opsz & align) == 0);
+    tcg_debug_assert((clsz & align) == 0);
+    tcg_debug_assert((ofs & align) == 0);
 }
 
-static inline void check_align_s_3(uint32_t dofs, uint32_t aofs, uint32_t bofs)
+static inline void check_overlap_3(uint32_t d, uint32_t a,
+                                   uint32_t b, uint32_t s)
 {
-    tcg_debug_assert(dofs % 8 == 0);
-    tcg_debug_assert(aofs % 8 == 0);
-    tcg_debug_assert(bofs % 8 == 0);
+    tcg_debug_assert(d == a || d + s <= a || a + s <= d);
+    tcg_debug_assert(d == b || d + s <= b || b + s <= d);
+    tcg_debug_assert(a == b || a + s <= b || b + s <= a);
 }
 
-static inline void check_size_l(uint32_t opsz, uint32_t clsz)
+static inline bool check_size_impl(uint32_t opsz, uint32_t lnsz)
 {
-    tcg_debug_assert(opsz % 16 == 0);
-    tcg_debug_assert(clsz % 16 == 0);
-    tcg_debug_assert(opsz <= clsz);
+    uint32_t lnct = opsz / lnsz;
+    return lnct >= 1 && lnct <= MAX_UNROLL;
 }
 
-static inline void check_align_l_3(uint32_t dofs, uint32_t aofs, uint32_t bofs)
+static void expand_clr_v(uint32_t dofs, uint32_t clsz, uint32_t lnsz,
+                         TCGType type, TCGOpcode opc_mv, TCGOpcode opc_st)
 {
-    tcg_debug_assert(dofs % 16 == 0);
-    tcg_debug_assert(aofs % 16 == 0);
-    tcg_debug_assert(bofs % 16 == 0);
-}
+    TCGArg t0 = tcg_temp_new_internal(type, 0);
+    TCGArg env = GET_TCGV_PTR(tcg_ctx.tcg_env);
+    uint32_t i;
 
-static inline void check_overlap_3(uint32_t d, uint32_t a,
-                                   uint32_t b, uint32_t s)
-{
-    tcg_debug_assert(d == a || d + s <= a || a + s <= d);
-    tcg_debug_assert(d == b || d + s <= b || b + s <= d);
-    tcg_debug_assert(a == b || a + s <= b || b + s <= a);
+    tcg_gen_op2(&tcg_ctx, opc_mv, t0, 0);
+    for (i = 0; i < clsz; i += lnsz) {
+        tcg_gen_op3(&tcg_ctx, opc_st, t0, env, dofs + i);
+    }
+    tcg_temp_free_internal(t0);
 }
 
-static void expand_clr(uint32_t dofs, uint32_t opsz, uint32_t clsz)
+static void expand_clr(uint32_t dofs, uint32_t clsz)
 {
-    if (clsz > opsz) {
-        TCGv_i64 zero = tcg_const_i64(0);
-        uint32_t i;
+    if (clsz >= 32 && TCG_TARGET_HAS_v256) {
+        uint32_t done = QEMU_ALIGN_DOWN(clsz, 32);
+        expand_clr_v(dofs, done, 32, TCG_TYPE_V256,
+                     INDEX_op_movi_v256, INDEX_op_st_v256);
+        dofs += done;
+        clsz -= done;
+    }
 
-        for (i = opsz; i < clsz; i += 8) {
-            tcg_gen_st_i64(zero, tcg_ctx.tcg_env, dofs + i);
-        }
-        tcg_temp_free_i64(zero);
+    if (clsz >= 16 && TCG_TARGET_HAS_v128) {
+        uint16_t done = QEMU_ALIGN_DOWN(clsz, 16);
+        expand_clr_v(dofs, done, 16, TCG_TYPE_V128,
+                     INDEX_op_movi_v128, INDEX_op_st_v128);
+        dofs += done;
+        clsz -= done;
+    }
+
+    if (TCG_TARGET_REG_BITS == 64) {
+        expand_clr_v(dofs, clsz, 8, TCG_TYPE_I64,
+                     INDEX_op_movi_i64, INDEX_op_st_i64);
+    } else if (TCG_TARGET_HAS_v64) {
+        expand_clr_v(dofs, clsz, 8, TCG_TYPE_V64,
+                     INDEX_op_movi_v64, INDEX_op_st_v64);
+    } else {
+        expand_clr_v(dofs, clsz, 4, TCG_TYPE_I32,
+                     INDEX_op_movi_i32, INDEX_op_st_i32);
     }
 }
 
@@ -164,6 +183,7 @@ static void expand_3x8(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i64(t0);
 }
 
+/* FIXME: add CSE for constants and we can eliminate this.  */
 static void expand_3x8p1(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                          uint32_t opsz, uint64_t data,
                          void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
@@ -192,28 +212,111 @@ static void expand_3x8p1(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_i64(t2);
 }
 
+static void expand_3_v(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t lnsz, TCGType type,
+                       TCGOpcode opc_op, TCGOpcode opc_ld, TCGOpcode opc_st)
+{
+    TCGArg t0 = tcg_temp_new_internal(type, 0);
+    TCGArg env = GET_TCGV_PTR(tcg_ctx.tcg_env);
+    uint32_t i;
+
+    if (aofs == bofs) {
+        for (i = 0; i < opsz; i += lnsz) {
+            tcg_gen_op3(&tcg_ctx, opc_ld, t0, env, aofs + i);
+            tcg_gen_op3(&tcg_ctx, opc_op, t0, t0, t0);
+            tcg_gen_op3(&tcg_ctx, opc_st, t0, env, dofs + i);
+        }
+    } else {
+        TCGArg t1 = tcg_temp_new_internal(type, 0);
+        for (i = 0; i < opsz; i += lnsz) {
+            tcg_gen_op3(&tcg_ctx, opc_ld, t0, env, aofs + i);
+            tcg_gen_op3(&tcg_ctx, opc_ld, t1, env, bofs + i);
+            tcg_gen_op3(&tcg_ctx, opc_op, t0, t0, t1);
+            tcg_gen_op3(&tcg_ctx, opc_st, t0, env, dofs + i);
+        }
+        tcg_temp_free_internal(t1);
+    }
+    tcg_temp_free_internal(t0);
+}
+
 void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                     uint32_t opsz, uint32_t clsz, const GVecGen3 *g)
 {
+    check_size_align(opsz, clsz, dofs | aofs | bofs);
     check_overlap_3(dofs, aofs, bofs, clsz);
-    if (opsz <= MAX_INLINE) {
-        check_size_s(opsz, clsz);
-        check_align_s_3(dofs, aofs, bofs);
-        if (g->fni8) {
-            expand_3x8(dofs, aofs, bofs, opsz, g->fni8);
-        } else if (g->fni4) {
-            expand_3x4(dofs, aofs, bofs, opsz, g->fni4);
+
+    if (opsz > MAX_UNROLL * 32 || clsz > MAX_UNROLL * 32) {
+        goto do_ool;
+    }
+
+    /* Recall that ARM SVE allows vector sizes that are not a power of 2.
+       Expand with successively smaller host vector sizes.  The intent is
+       that e.g. opsz == 80 would be expanded with 2x32 + 1x16.  */
+    /* ??? For clsz > opsz, the host may be able to use an op-sized
+       operation, zeroing the balance of the register.  We can then
+       use a cl-sized store to implement the clearing without an extra
+       store operation.  This is true for aarch64 and x86_64 hosts.  */
+
+    if (check_size_impl(opsz, 32) && tcg_op_supported(g->op_v256)) {
+        uint32_t done = QEMU_ALIGN_DOWN(opsz, 32);
+        expand_3_v(dofs, aofs, bofs, done, 32, TCG_TYPE_V256,
+                   g->op_v256, INDEX_op_ld_v256, INDEX_op_st_v256);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        opsz -= done;
+        clsz -= done;
+    }
+
+    if (check_size_impl(opsz, 16) && tcg_op_supported(g->op_v128)) {
+        uint32_t done = QEMU_ALIGN_DOWN(opsz, 16);
+        expand_3_v(dofs, aofs, bofs, done, 16, TCG_TYPE_V128,
+                   g->op_v128, INDEX_op_ld_v128, INDEX_op_st_v128);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        opsz -= done;
+        clsz -= done;
+    }
+
+    if (check_size_impl(opsz, 8)) {
+        uint32_t done = QEMU_ALIGN_DOWN(opsz, 8);
+        if (tcg_op_supported(g->op_v64)) {
+            expand_3_v(dofs, aofs, bofs, done, 8, TCG_TYPE_V64,
+                       g->op_v64, INDEX_op_ld_v64, INDEX_op_st_v64);
+        } else if (g->fni8) {
+            expand_3x8(dofs, aofs, bofs, done, g->fni8);
         } else if (g->fni8x) {
-            expand_3x8p1(dofs, aofs, bofs, opsz, g->extra_value, g->fni8x);
+            expand_3x8p1(dofs, aofs, bofs, done, g->extra_value, g->fni8x);
         } else {
-            g_assert_not_reached();
+            done = 0;
         }
-        expand_clr(dofs, opsz, clsz);
-    } else {
-        check_size_l(opsz, clsz);
-        check_align_l_3(dofs, aofs, bofs);
-        expand_3_o(dofs, aofs, bofs, opsz, clsz, g->fno);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        opsz -= done;
+        clsz -= done;
     }
+
+    if (check_size_impl(opsz, 4)) {
+        uint32_t done = QEMU_ALIGN_DOWN(opsz, 4);
+        expand_3x4(dofs, aofs, bofs, done, g->fni4);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        opsz -= done;
+        clsz -= done;
+    }
+
+    if (opsz == 0) {
+        if (clsz != 0) {
+            expand_clr(dofs, clsz);
+        }
+        return;
+    }
+
+ do_ool:
+    expand_3_o(dofs, aofs, bofs, opsz, clsz, g->fno);
 }
 
 static void gen_addv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)
@@ -240,6 +343,9 @@ void tcg_gen_gvec_add8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     static const GVecGen3 g = {
         .extra_value = REP8(0x80),
         .fni8x = gen_addv_mask,
+        .op_v256 = INDEX_op_add8_v256,
+        .op_v128 = INDEX_op_add8_v128,
+        .op_v64 = INDEX_op_add8_v64,
         .fno = gen_helper_gvec_add8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -251,6 +357,9 @@ void tcg_gen_gvec_add16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     static const GVecGen3 g = {
         .extra_value = REP16(0x8000),
         .fni8x = gen_addv_mask,
+        .op_v256 = INDEX_op_add16_v256,
+        .op_v128 = INDEX_op_add16_v128,
+        .op_v64 = INDEX_op_add16_v64,
         .fno = gen_helper_gvec_add16,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -261,6 +370,9 @@ void tcg_gen_gvec_add32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni4 = tcg_gen_add_i32,
+        .op_v256 = INDEX_op_add32_v256,
+        .op_v128 = INDEX_op_add32_v128,
+        .op_v64 = INDEX_op_add32_v64,
         .fno = gen_helper_gvec_add32,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -271,6 +383,8 @@ void tcg_gen_gvec_add64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_add_i64,
+        .op_v256 = INDEX_op_add64_v256,
+        .op_v128 = INDEX_op_add64_v128,
         .fno = gen_helper_gvec_add64,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -328,6 +442,9 @@ void tcg_gen_gvec_sub8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     static const GVecGen3 g = {
         .extra_value = REP8(0x80),
         .fni8x = gen_subv_mask,
+        .op_v256 = INDEX_op_sub8_v256,
+        .op_v128 = INDEX_op_sub8_v128,
+        .op_v64 = INDEX_op_sub8_v64,
         .fno = gen_helper_gvec_sub8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -339,6 +456,9 @@ void tcg_gen_gvec_sub16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     static const GVecGen3 g = {
         .extra_value = REP16(0x8000),
         .fni8x = gen_subv_mask,
+        .op_v256 = INDEX_op_sub16_v256,
+        .op_v128 = INDEX_op_sub16_v128,
+        .op_v64 = INDEX_op_sub16_v64,
         .fno = gen_helper_gvec_sub16,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -349,6 +469,9 @@ void tcg_gen_gvec_sub32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni4 = tcg_gen_sub_i32,
+        .op_v256 = INDEX_op_sub32_v256,
+        .op_v128 = INDEX_op_sub32_v128,
+        .op_v64 = INDEX_op_sub32_v64,
         .fno = gen_helper_gvec_sub32,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -359,6 +482,8 @@ void tcg_gen_gvec_sub64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_sub_i64,
+        .op_v256 = INDEX_op_sub64_v256,
+        .op_v128 = INDEX_op_sub64_v128,
        .fno = gen_helper_gvec_sub64,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -397,6 +522,9 @@ void tcg_gen_gvec_and8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_and_i64,
+        .op_v256 = INDEX_op_and_v256,
+        .op_v128 = INDEX_op_and_v128,
+        .op_v64 = INDEX_op_and_v64,
         .fno = gen_helper_gvec_and8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -407,6 +535,9 @@ void tcg_gen_gvec_or8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_or_i64,
+        .op_v256 = INDEX_op_or_v256,
+        .op_v128 = INDEX_op_or_v128,
+        .op_v64 = INDEX_op_or_v64,
         .fno = gen_helper_gvec_or8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -417,6 +548,9 @@ void tcg_gen_gvec_xor8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_xor_i64,
+        .op_v256 = INDEX_op_xor_v256,
+        .op_v128 = INDEX_op_xor_v128,
+        .op_v64 = INDEX_op_xor_v64,
         .fno = gen_helper_gvec_xor8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -427,6 +561,9 @@ void tcg_gen_gvec_andc8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_andc_i64,
+        .op_v256 = INDEX_op_andc_v256,
+        .op_v128 = INDEX_op_andc_v128,
+        .op_v64 = INDEX_op_andc_v64,
         .fno = gen_helper_gvec_andc8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -437,6 +574,9 @@ void tcg_gen_gvec_orc8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_orc_i64,
+        .op_v256 = INDEX_op_orc_v256,
+        .op_v128 = INDEX_op_orc_v128,
+        .op_v64 = INDEX_op_orc_v64,
         .fno = gen_helper_gvec_orc8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 879b29e81f..86eb4214b0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -604,7 +604,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
     return temp_idx(s, ts);
 }
 
-static int tcg_temp_new_internal(TCGType type, int temp_local)
+int tcg_temp_new_internal(TCGType type, bool temp_local)
 {
     TCGContext *s = &tcg_ctx;
     TCGTemp *ts;
@@ -650,7 +650,7 @@ static int tcg_temp_new_internal(TCGType type, int temp_local)
     return idx;
 }
 
-TCGv_i32 tcg_temp_new_internal_i32(int temp_local)
+TCGv_i32 tcg_temp_new_internal_i32(bool temp_local)
 {
     int idx;
 
@@ -658,7 +658,7 @@ TCGv_i32 tcg_temp_new_internal_i32(int temp_local)
     return MAKE_TCGV_I32(idx);
 }
 
-TCGv_i64 tcg_temp_new_internal_i64(int temp_local)
+TCGv_i64 tcg_temp_new_internal_i64(bool temp_local)
 {
     int idx;
 
@@ -666,7 +666,7 @@ TCGv_i64 tcg_temp_new_internal_i64(int temp_local)
     return MAKE_TCGV_I64(idx);
 }
 
-static void tcg_temp_free_internal(int idx)
+void tcg_temp_free_internal(int idx)
 {
     TCGContext *s = &tcg_ctx;
     TCGTemp *ts;
-- 
2.13.5

From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:14 -0700
Message-Id: <20170817230114.3655-9-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 8/8] tcg/i386: Add vector operations
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Signed-off-by: Richard Henderson
---
 tcg/i386/tcg-target.h     |  46 +++++-
 tcg/tcg-opc.h             |  12 +-
 tcg/i386/tcg-target.inc.c | 382 ++++++++++++++++++++++++++++++++++++++----
 3 files changed, 399 insertions(+), 41 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index e512648c95..147f82062b 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -30,11 +30,10 @@
 
 #ifdef __x86_64__
 # define TCG_TARGET_REG_BITS 64
-# define TCG_TARGET_NB_REGS 16
 #else
 # define TCG_TARGET_REG_BITS 32
-# define TCG_TARGET_NB_REGS 8
 #endif
+# define TCG_TARGET_NB_REGS 24
 
 typedef enum {
     TCG_REG_EAX = 0,
@@ -56,6 +55,19 @@ typedef enum {
     TCG_REG_R13,
     TCG_REG_R14,
     TCG_REG_R15,
+
+    /* SSE registers; 64-bit has access to 8 more, but we won't
+       need more than a few and using only the first 8 minimizes
+       the need for a rex prefix on the sse instructions.  */
+    TCG_REG_XMM0,
+    TCG_REG_XMM1,
+    TCG_REG_XMM2,
+    TCG_REG_XMM3,
+    TCG_REG_XMM4,
+    TCG_REG_XMM5,
+    TCG_REG_XMM6,
+    TCG_REG_XMM7,
+
     TCG_REG_RAX = TCG_REG_EAX,
     TCG_REG_RCX = TCG_REG_ECX,
     TCG_REG_RDX = TCG_REG_EDX,
@@ -79,6 +91,17 @@ extern bool have_bmi1;
 extern bool have_bmi2;
 extern bool have_popcnt;
 
+#ifdef __SSE2__
+#define have_sse2  true
+#else
+extern bool have_sse2;
+#endif
+#ifdef __AVX2__
+#define have_avx2  true
+#else
+extern bool have_avx2;
+#endif
+
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32         1
 #define TCG_TARGET_HAS_rot_i32          1
@@ -147,6 +170,25 @@ extern bool have_popcnt;
 #define TCG_TARGET_HAS_mulsh_i64        0
 #endif
 
+#define TCG_TARGET_HAS_v64              have_sse2
+#define TCG_TARGET_HAS_v128             have_sse2
+#define TCG_TARGET_HAS_v256             have_avx2
+
+#define TCG_TARGET_HAS_andc_v64         TCG_TARGET_HAS_v64
+#define TCG_TARGET_HAS_orc_v64          0
+#define TCG_TARGET_HAS_not_v64          0
+#define TCG_TARGET_HAS_neg_v64          0
+
+#define TCG_TARGET_HAS_andc_v128        TCG_TARGET_HAS_v128
+#define TCG_TARGET_HAS_orc_v128         0
+#define TCG_TARGET_HAS_not_v128         0
+#define TCG_TARGET_HAS_neg_v128         0
+
+#define TCG_TARGET_HAS_andc_v256        TCG_TARGET_HAS_v256
+#define TCG_TARGET_HAS_orc_v256         0
+#define TCG_TARGET_HAS_not_v256         0
+#define TCG_TARGET_HAS_neg_v256         0
+
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (have_bmi2 ||                              \
      ((ofs) == 0 && (len) == 8) ||             \
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index b1445a4c24..b84cd584fb 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -212,13 +212,13 @@ DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
 /* Host integer vector operations.  */
 /* These opcodes are required whenever the base vector size is enabled.  */
 
-DEF(mov_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_v64))
-DEF(mov_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_v128))
-DEF(mov_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(mov_v64, 1, 1, 0, TCG_OPF_NOT_PRESENT)
+DEF(mov_v128, 1, 1, 0, TCG_OPF_NOT_PRESENT)
+DEF(mov_v256, 1, 1, 0, TCG_OPF_NOT_PRESENT)
 
-DEF(movi_v64, 1, 0, 1, IMPL(TCG_TARGET_HAS_v64))
-DEF(movi_v128, 1, 0, 1, IMPL(TCG_TARGET_HAS_v128))
-DEF(movi_v256, 1, 0, 1, IMPL(TCG_TARGET_HAS_v256))
+DEF(movi_v64, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(movi_v128, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(movi_v256, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 
 DEF(ld_v64, 1, 1, 1, IMPL(TCG_TARGET_HAS_v64))
 DEF(ld_v128, 1, 1, 1, IMPL(TCG_TARGET_HAS_v128))
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index aeefb72aa0..0e01b54aa0 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -31,7 +31,9 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
 #else
     "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
+    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 #endif
+    "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7",
 };
 #endif
 
@@ -61,6 +63,14 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_EDX,
     TCG_REG_EAX,
 #endif
+    TCG_REG_XMM0,
+    TCG_REG_XMM1,
+    TCG_REG_XMM2,
+    TCG_REG_XMM3,
+    TCG_REG_XMM4,
+    TCG_REG_XMM5,
+    TCG_REG_XMM6,
+    TCG_REG_XMM7,
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -94,7 +104,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define TCG_CT_CONST_I32 0x400
 #define TCG_CT_CONST_WSZ 0x800
 
-/* Registers used with L constraint, which are the first argument 
+/* Registers used with L constraint, which are the first argument
    registers on x86_64, and two random call clobbered registers on
    i386. */
 #if TCG_TARGET_REG_BITS == 64
@@ -127,6 +137,16 @@ bool have_bmi1;
 bool have_bmi2;
 bool have_popcnt;
 
+#ifndef have_sse2
+bool have_sse2;
+#endif
+#ifdef have_avx2
+#define have_avx1  have_avx2
+#else
+static bool have_avx1;
+bool have_avx2;
+#endif
+
 #ifdef CONFIG_CPUID_H
 static bool have_movbe;
 static bool have_lzcnt;
@@ -215,6 +235,10 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         /* With TZCNT/LZCNT, we can have operand-size as an input.  */
         ct->ct |= TCG_CT_CONST_WSZ;
         break;
+    case 'x':
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, 0xff0000);
+        break;
 
         /* qemu_ld/st address constraint */
     case 'L':
@@ -292,6 +316,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #endif
 #define P_SIMDF3        0x20000         /* 0xf3 opcode prefix */
 #define P_SIMDF2        0x40000         /* 0xf2 opcode prefix */
+#define P_VEXL          0x80000         /* Set VEX.L = 1 */
 
 #define OPC_ARITH_EvIz  (0x81)
 #define OPC_ARITH_EvIb  (0x83)
@@ -324,13 +349,31 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVL_Iv     (0xb8)
 #define OPC_MOVBE_GyMy  (0xf0 | P_EXT38)
 #define OPC_MOVBE_MyGy  (0xf1 | P_EXT38)
+#define OPC_MOVDQA_GyMy (0x6f | P_EXT | P_DATA16)
+#define OPC_MOVDQA_MyGy (0x7f | P_EXT | P_DATA16)
+#define OPC_MOVDQU_GyMy (0x6f | P_EXT | P_SIMDF3)
+#define OPC_MOVDQU_MyGy (0x7f | P_EXT | P_SIMDF3)
+#define OPC_MOVQ_GyMy   (0x7e | P_EXT | P_SIMDF3)
+#define OPC_MOVQ_MyGy   (0xd6 | P_EXT | P_DATA16)
 #define OPC_MOVSBL      (0xbe | P_EXT)
 #define OPC_MOVSWL      (0xbf | P_EXT)
 #define OPC_MOVSLQ      (0x63 | P_REXW)
 #define OPC_MOVZBL      (0xb6 | P_EXT)
 #define OPC_MOVZWL      (0xb7 | P_EXT)
+#define OPC_PADDB       (0xfc | P_EXT | P_DATA16)
+#define OPC_PADDW       (0xfd | P_EXT | P_DATA16)
+#define OPC_PADDD       (0xfe | P_EXT | P_DATA16)
+#define OPC_PADDQ       (0xd4 | P_EXT | P_DATA16)
+#define OPC_PAND        (0xdb | P_EXT | P_DATA16)
+#define OPC_PANDN       (0xdf | P_EXT | P_DATA16)
 #define OPC_PDEP        (0xf5 | P_EXT38 | P_SIMDF2)
 #define OPC_PEXT        (0xf5 | P_EXT38 | P_SIMDF3)
+#define OPC_POR         (0xeb | P_EXT | P_DATA16)
+#define OPC_PSUBB       (0xf8 | P_EXT | P_DATA16)
+#define OPC_PSUBW       (0xf9 | P_EXT | P_DATA16)
+#define OPC_PSUBD       (0xfa | P_EXT | P_DATA16)
+#define OPC_PSUBQ       (0xfb | P_EXT | P_DATA16)
+#define OPC_PXOR        (0xef | P_EXT | P_DATA16)
 #define OPC_POP_r32     (0x58)
 #define OPC_POPCNT      (0xb8 | P_EXT | P_SIMDF3)
 #define OPC_PUSH_r32    (0x50)
@@ -500,7 +543,8 @@ static void tcg_out_modrm(TCGContext *s, int opc, int r, int rm)
     tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm));
 }
 
-static void tcg_out_vex_pfx_opc(TCGContext *s, int opc, int r, int v, int rm)
+static void tcg_out_vex_pfx_opc(TCGContext *s, int opc, int r, int v,
+                                int rm, int index)
 {
     int tmp;
 
@@ -515,14 +559,16 @@ static void tcg_out_vex_pfx_opc(TCGContext *s, int opc, int r, int v, int rm)
     } else if (opc & P_EXT) {
         tmp = 1;
     } else {
-        tcg_abort();
+        g_assert_not_reached();
     }
-    tmp |= 0x40;                           /* VEX.X */
     tmp |= (r & 8 ? 0 : 0x80);             /* VEX.R */
+    tmp |= (index & 8 ? 0 : 0x40);         /* VEX.X */
     tmp |= (rm & 8 ? 0 : 0x20);            /* VEX.B */
     tcg_out8(s, tmp);
 
     tmp = (opc & P_REXW ? 0x80 : 0);       /* VEX.W */
+    tmp |= (opc & P_VEXL ? 0x04 : 0);      /* VEX.L */
+
     /* VEX.pp */
     if (opc & P_DATA16) {
         tmp |= 1;                          /* 0x66 */
@@ -538,7 +584,7 @@ static void tcg_out_vex_pfx_opc(TCGContext *s, int opc, int r, int v, int rm)
 
 static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm)
 {
-    tcg_out_vex_pfx_opc(s, opc, r, v, rm);
+    tcg_out_vex_pfx_opc(s, opc, r, v, rm, 0);
     tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm));
 }
 
@@ -565,7 +611,7 @@ static void tcg_out_opc_pool_imm(TCGContext *s, int opc, int r,
 static void tcg_out_vex_pool_imm(TCGContext *s, int opc, int r, int v,
                                  tcg_target_ulong data)
 {
-    tcg_out_vex_pfx_opc(s, opc, r, v, 0);
+    tcg_out_vex_pfx_opc(s, opc, r, v, 0, 0);
     tcg_out_sfx_pool_imm(s, r, data);
 }
 
@@ -574,8 +620,8 @@ static void tcg_out_vex_pool_imm(TCGContext *s, int opc, int r, int v,
    mode for absolute addresses, ~RM is the size of the immediate
    operand that will follow the instruction.  */
 
-static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm,
-                                     int index, int shift, intptr_t offset)
+static void tcg_out_sib_offset(TCGContext *s, int r, int rm, int index,
+                               int shift, intptr_t offset)
 {
     int mod, len;
 
@@ -586,7 +632,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm,
         intptr_t pc = (intptr_t)s->code_ptr + 5 + ~rm;
         intptr_t disp = offset - pc;
         if (disp == (int32_t)disp) {
-            tcg_out_opc(s, opc, r, 0, 0);
             tcg_out8(s, (LOWREGMASK(r) << 3) | 5);
             tcg_out32(s, disp);
             return;
@@ -596,7 +641,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm,
        use of the MODRM+SIB encoding and is therefore larger than
        rip-relative addressing.  */
     if (offset == (int32_t)offset) {
-        tcg_out_opc(s, opc, r, 0, 0);
         tcg_out8(s, (LOWREGMASK(r) << 3) | 4);
         tcg_out8(s, (4 << 3) | 5);
         tcg_out32(s, offset);
@@ -604,10 +648,9 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm,
 
     /* ??? The memory isn't directly addressable.
*/ - tcg_abort(); + g_assert_not_reached(); } else { /* Absolute address. */ - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (r << 3) | 5); tcg_out32(s, offset); return; @@ -630,7 +673,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int= opc, int r, int rm, that would be used for %esp is the escape to the two byte form. */ if (index < 0 && LOWREGMASK(rm) !=3D TCG_REG_ESP) { /* Single byte MODRM format. */ - tcg_out_opc(s, opc, r, rm, 0); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } else { /* Two byte MODRM+SIB format. */ @@ -644,7 +686,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int= opc, int r, int rm, tcg_debug_assert(index !=3D TCG_REG_ESP); } =20 - tcg_out_opc(s, opc, r, rm, index); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | 4); tcg_out8(s, (shift << 6) | (LOWREGMASK(index) << 3) | LOWREGMASK(r= m)); } @@ -656,6 +697,21 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, in= t opc, int r, int rm, } } =20 +static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, + int index, int shift, intptr_t offset) +{ + tcg_out_opc(s, opc, r, rm < 0 ? 0 : rm, index < 0 ? 0 : index); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + +static void tcg_out_vex_modrm_sib_offset(TCGContext *s, int opc, int r, in= t v, + int rm, int index, int shift, + intptr_t offset) +{ + tcg_out_vex_pfx_opc(s, opc, r, v, rm < 0 ? 0 : rm, index < 0 ? 0 : ind= ex); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + /* A simplification of the above with no index or shift. 
*/ static inline void tcg_out_modrm_offset(TCGContext *s, int opc, int r, int rm, intptr_t offset) @@ -663,6 +719,31 @@ static inline void tcg_out_modrm_offset(TCGContext *s,= int opc, int r, tcg_out_modrm_sib_offset(s, opc, r, rm, -1, 0, offset); } =20 +static inline void tcg_out_vex_modrm_offset(TCGContext *s, int opc, int r, + int v, int rm, intptr_t offset) +{ + tcg_out_vex_modrm_sib_offset(s, opc, r, v, rm, -1, 0, offset); +} + +static void tcg_out_maybe_vex_modrm(TCGContext *s, int opc, int r, int rm) +{ + if (have_avx1) { + tcg_out_vex_modrm(s, opc, r, 0, rm); + } else { + tcg_out_modrm(s, opc, r, rm); + } +} + +static void tcg_out_maybe_vex_modrm_offset(TCGContext *s, int opc, int r, + int rm, intptr_t offset) +{ + if (have_avx1) { + tcg_out_vex_modrm_offset(s, opc, r, 0, rm, offset); + } else { + tcg_out_modrm_offset(s, opc, r, rm, offset); + } +} + /* Generate dest op=3D src. Uses the same ARITH_* codes as tgen_arithi. = */ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src) { @@ -673,12 +754,32 @@ static inline void tgen_arithr(TCGContext *s, int sub= op, int dest, int src) tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src); } =20 -static inline void tcg_out_mov(TCGContext *s, TCGType type, - TCGReg ret, TCGReg arg) +static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg ar= g) { if (arg !=3D ret) { - int opc =3D OPC_MOVL_GvEv + (type =3D=3D TCG_TYPE_I64 ? 
P_REXW : 0= ); - tcg_out_modrm(s, opc, ret, arg); + int opc =3D 0; + + switch (type) { + case TCG_TYPE_I64: + opc =3D P_REXW; + /* fallthru */ + case TCG_TYPE_I32: + opc |=3D OPC_MOVL_GvEv; + tcg_out_modrm(s, opc, ret, arg); + break; + + case TCG_TYPE_V256: + opc =3D P_VEXL; + /* fallthru */ + case TCG_TYPE_V128: + case TCG_TYPE_V64: + opc |=3D OPC_MOVDQA_GyMy; + tcg_out_maybe_vex_modrm(s, opc, ret, arg); + break; + + default: + g_assert_not_reached(); + } } } =20 @@ -687,6 +788,27 @@ static void tcg_out_movi(TCGContext *s, TCGType type, { tcg_target_long diff; =20 + switch (type) { + case TCG_TYPE_I32: + case TCG_TYPE_I64: + break; + + case TCG_TYPE_V64: + case TCG_TYPE_V128: + case TCG_TYPE_V256: + /* ??? Revisit this as the implementation progresses. */ + tcg_debug_assert(arg =3D=3D 0); + if (have_avx1) { + tcg_out_vex_modrm(s, OPC_PXOR, ret, ret, ret); + } else { + tcg_out_modrm(s, OPC_PXOR, ret, ret); + } + return; + + default: + g_assert_not_reached(); + } + if (arg =3D=3D 0) { tgen_arithr(s, ARITH_XOR, ret, ret); return; @@ -750,18 +872,54 @@ static inline void tcg_out_pop(TCGContext *s, int reg) tcg_out_opc(s, OPC_POP_r32 + LOWREGMASK(reg), 0, reg, 0); } =20 -static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, - TCGReg arg1, intptr_t arg2) +static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, + TCGReg arg1, intptr_t arg2) { - int opc =3D OPC_MOVL_GvEv + (type =3D=3D TCG_TYPE_I64 ? 
P_REXW : 0); - tcg_out_modrm_offset(s, opc, ret, arg1, arg2); + switch (type) { + case TCG_TYPE_I64: + tcg_out_modrm_offset(s, OPC_MOVL_GvEv | P_REXW, ret, arg1, arg2); + break; + case TCG_TYPE_I32: + tcg_out_modrm_offset(s, OPC_MOVL_GvEv, ret, arg1, arg2); + break; + case TCG_TYPE_V64: + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVQ_GyMy, ret, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVDQU_GyMy, ret, arg1, arg2= ); + break; + case TCG_TYPE_V256: + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_GyMy | P_VEXL, + ret, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } =20 -static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, + TCGReg arg1, intptr_t arg2) { - int opc =3D OPC_MOVL_EvGv + (type =3D=3D TCG_TYPE_I64 ? P_REXW : 0); - tcg_out_modrm_offset(s, opc, arg, arg1, arg2); + switch (type) { + case TCG_TYPE_I64: + tcg_out_modrm_offset(s, OPC_MOVL_EvGv | P_REXW, arg, arg1, arg2); + break; + case TCG_TYPE_I32: + tcg_out_modrm_offset(s, OPC_MOVL_EvGv, arg, arg1, arg2); + break; + case TCG_TYPE_V64: + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVQ_MyGy, arg, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVDQU_MyGy, arg, arg1, arg2= ); + break; + case TCG_TYPE_V256: + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_MyGy | P_VEXL, + arg, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } =20 static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, @@ -773,6 +931,8 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TC= GArg val, return false; } rexw =3D P_REXW; + } else if (type !=3D TCG_TYPE_I32) { + return false; } tcg_out_modrm_offset(s, OPC_MOVL_EvIz | rexw, 0, base, ofs); tcg_out32(s, val); @@ -1914,6 +2074,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpco= de opc, case glue(glue(INDEX_op_, x), _i32) #endif =20 +#define OP_128_256(x) \ + case 
glue(glue(INDEX_op_, x), _v256): \ + rexw =3D P_VEXL; /* FALLTHRU */ \ + case glue(glue(INDEX_op_, x), _v128) + +#define OP_64_128_256(x) \ + OP_128_256(x): \ + case glue(glue(INDEX_op_, x), _v64) + /* Hoist the loads of the most common arguments. */ a0 =3D args[0]; a1 =3D args[1]; @@ -2379,19 +2548,94 @@ static inline void tcg_out_op(TCGContext *s, TCGOpc= ode opc, } break; =20 + OP_64_128_256(add8): + c =3D OPC_PADDB; + goto gen_simd; + OP_64_128_256(add16): + c =3D OPC_PADDW; + goto gen_simd; + OP_64_128_256(add32): + c =3D OPC_PADDD; + goto gen_simd; + OP_128_256(add64): + c =3D OPC_PADDQ; + goto gen_simd; + OP_64_128_256(sub8): + c =3D OPC_PSUBB; + goto gen_simd; + OP_64_128_256(sub16): + c =3D OPC_PSUBW; + goto gen_simd; + OP_64_128_256(sub32): + c =3D OPC_PSUBD; + goto gen_simd; + OP_128_256(sub64): + c =3D OPC_PSUBQ; + goto gen_simd; + OP_64_128_256(and): + c =3D OPC_PAND; + goto gen_simd; + OP_64_128_256(andc): + c =3D OPC_PANDN; + goto gen_simd; + OP_64_128_256(or): + c =3D OPC_POR; + goto gen_simd; + OP_64_128_256(xor): + c =3D OPC_PXOR; + gen_simd: + if (have_avx1) { + tcg_out_vex_modrm(s, c, a0, a1, a2); + } else { + tcg_out_modrm(s, c, a0, a2); + } + break; + + case INDEX_op_ld_v64: + c =3D TCG_TYPE_V64; + goto gen_simd_ld; + case INDEX_op_ld_v128: + c =3D TCG_TYPE_V128; + goto gen_simd_ld; + case INDEX_op_ld_v256: + c =3D TCG_TYPE_V256; + gen_simd_ld: + tcg_out_ld(s, c, a0, a1, a2); + break; + + case INDEX_op_st_v64: + c =3D TCG_TYPE_V64; + goto gen_simd_st; + case INDEX_op_st_v128: + c =3D TCG_TYPE_V128; + goto gen_simd_st; + case INDEX_op_st_v256: + c =3D TCG_TYPE_V256; + gen_simd_st: + tcg_out_st(s, c, a0, a1, a2); + break; + case INDEX_op_mb: tcg_out_mb(s, a0); break; case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. */ case INDEX_op_mov_i64: + case INDEX_op_mov_v64: + case INDEX_op_mov_v128: + case INDEX_op_mov_v256: case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi. 
*/ case INDEX_op_movi_i64: + case INDEX_op_movi_v64: + case INDEX_op_movi_v128: + case INDEX_op_movi_v256: case INDEX_op_call: /* Always emitted via tcg_out_call. */ default: tcg_abort(); } =20 #undef OP_32_64 +#undef OP_128_256 +#undef OP_64_128_256 } =20 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) @@ -2417,6 +2661,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpc= ode op) =3D { .args_ct_str =3D { "r", "r", "L", "L" } }; static const TCGTargetOpDef L_L_L_L =3D { .args_ct_str =3D { "L", "L", "L", "L" } }; + static const TCGTargetOpDef x_0_x =3D { .args_ct_str =3D { "x", "0", "= x" } }; + static const TCGTargetOpDef x_x_x =3D { .args_ct_str =3D { "x", "x", "= x" } }; + static const TCGTargetOpDef x_r =3D { .args_ct_str =3D { "x", "r" } }; =20 switch (op) { case INDEX_op_goto_ptr: @@ -2620,6 +2867,52 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOp= code op) return &s2; } =20 + case INDEX_op_ld_v64: + case INDEX_op_ld_v128: + case INDEX_op_ld_v256: + case INDEX_op_st_v64: + case INDEX_op_st_v128: + case INDEX_op_st_v256: + return &x_r; + + case INDEX_op_add8_v64: + case INDEX_op_add8_v128: + case INDEX_op_add16_v64: + case INDEX_op_add16_v128: + case INDEX_op_add32_v64: + case INDEX_op_add32_v128: + case INDEX_op_add64_v128: + case INDEX_op_sub8_v64: + case INDEX_op_sub8_v128: + case INDEX_op_sub16_v64: + case INDEX_op_sub16_v128: + case INDEX_op_sub32_v64: + case INDEX_op_sub32_v128: + case INDEX_op_sub64_v128: + case INDEX_op_and_v64: + case INDEX_op_and_v128: + case INDEX_op_andc_v64: + case INDEX_op_andc_v128: + case INDEX_op_or_v64: + case INDEX_op_or_v128: + case INDEX_op_xor_v64: + case INDEX_op_xor_v128: + return have_avx1 ? 
&x_x_x : &x_0_x; + + case INDEX_op_add8_v256: + case INDEX_op_add16_v256: + case INDEX_op_add32_v256: + case INDEX_op_add64_v256: + case INDEX_op_sub8_v256: + case INDEX_op_sub16_v256: + case INDEX_op_sub32_v256: + case INDEX_op_sub64_v256: + case INDEX_op_and_v256: + case INDEX_op_andc_v256: + case INDEX_op_or_v256: + case INDEX_op_xor_v256: + return &x_x_x; + default: break; } @@ -2725,9 +3018,16 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int c= ount) static void tcg_target_init(TCGContext *s) { #ifdef CONFIG_CPUID_H - unsigned a, b, c, d; + unsigned a, b, c, d, b7 =3D 0; int max =3D __get_cpuid_max(0, 0); =20 + if (max >=3D 7) { + /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ + __cpuid_count(7, 0, a, b7, c, d); + have_bmi1 =3D (b7 & bit_BMI) !=3D 0; + have_bmi2 =3D (b7 & bit_BMI2) !=3D 0; + } + if (max >=3D 1) { __cpuid(1, a, b, c, d); #ifndef have_cmov @@ -2736,17 +3036,26 @@ static void tcg_target_init(TCGContext *s) available, we'll use a small forward branch. */ have_cmov =3D (d & bit_CMOV) !=3D 0; #endif +#ifndef have_sse2 + have_sse2 =3D (d & bit_SSE2) !=3D 0; +#endif /* MOVBE is only available on Intel Atom and Haswell CPUs, so we need to probe for it. */ have_movbe =3D (c & bit_MOVBE) !=3D 0; have_popcnt =3D (c & bit_POPCNT) !=3D 0; - } =20 - if (max >=3D 7) { - /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ - __cpuid_count(7, 0, a, b, c, d); - have_bmi1 =3D (b & bit_BMI) !=3D 0; - have_bmi2 =3D (b & bit_BMI2) !=3D 0; +#ifndef have_avx2 + /* There are a number of things we must check before we can be + sure of not hitting invalid opcode. 
*/ + if (c & bit_OSXSAVE) { + unsigned xcrl, xcrh; + asm ("xgetbv" : "=3Da" (xcrl), "=3Dd" (xcrh) : "c" (0)); + if (xcrl & 6 =3D=3D 6) { + have_avx1 =3D (c & bit_AVX) !=3D 0; + have_avx2 =3D (b7 & bit_AVX2) !=3D 0; + } + } +#endif } =20 max =3D __get_cpuid_max(0x8000000, 0); @@ -2763,6 +3072,13 @@ static void tcg_target_init(TCGContext *s) } else { tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xff); } + if (have_sse2) { + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_V64], 0, 0xff0= 000); + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_V128], 0, 0xff= 0000); + } + if (have_avx2) { + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_V256], 0, 0xff= 0000); + } =20 tcg_regset_clear(tcg_target_call_clobber_regs); tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_EAX); --=20 2.13.5