From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:07 -0700
Message-Id: <20170817230114.3655-2-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 1/8] tcg: Add generic vector infrastructure and ops for add/sub/logic
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Signed-off-by: Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé
---
 Makefile.target        |   5 +-
 tcg/tcg-op-gvec.h      |  88 ++++++++++
 tcg/tcg-runtime.h      |  16 ++
 tcg/tcg-op-gvec.c      | 443 ++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-runtime-gvec.c | 199 ++++++++++++++++++++++
 5 files changed, 749 insertions(+), 2 deletions(-)
 create mode 100644 tcg/tcg-op-gvec.h
 create mode 100644 tcg/tcg-op-gvec.c
 create mode 100644 tcg/tcg-runtime-gvec.c

diff --git a/Makefile.target b/Makefile.target
index 7f42c45db8..9ae3e904f7 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -93,8 +93,9 @@ all: $(PROGS) stap
 # cpu emulator library
 obj-y += exec.o
 obj-y += accel/
-obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
-obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/tcg-runtime.o
+obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-common.o tcg/optimize.o
+obj-$(CONFIG_TCG) += tcg/tcg-op.o tcg/tcg-op-gvec.o
+obj-$(CONFIG_TCG) += tcg/tcg-runtime.o tcg/tcg-runtime-gvec.o
 obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-y += fpu/softfloat.o
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
new file mode 100644
index 0000000000..10db3599a5
--- /dev/null
+++ b/tcg/tcg-op-gvec.h
@@ -0,0 +1,88 @@
+/*
+ * Generic vector operation expansion
+ *
+ * Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * "Generic" vectors.  All operands are given as offsets from ENV,
+ * and therefore cannot also be allocated via tcg_global_mem_new_*.
+ * OPSZ is the byte size of the vector upon which the operation is performed.
+ * CLSZ is the byte size of the full vector; bytes beyond OPSZ are cleared.
+ *
+ * All sizes must be 8 or any multiple of 16.
+ * When OPSZ is 8, the alignment may be 8, otherwise must be 16.
+ * Operands may completely, but not partially, overlap.
+ */
+
+/* Fundamental operation expanders.  These are exposed to the front ends
+   so that target-specific SIMD operations can be handled similarly to
+   the standard SIMD operations.  */
+
+typedef struct {
+    /* "Small" sizes: expand inline as a 64-bit or 32-bit lane.
+       Generally only one of these will be non-NULL.  */
+    void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64);
+    void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32);
+    /* Similarly, but load up a constant and re-use across lanes.  */
+    void (*fni8x)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64);
+    uint64_t extra_value;
+    /* Larger sizes: expand out-of-line helper w/size descriptor.  */
+    void (*fno)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+} GVecGen3;
+
+void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                    uint32_t opsz, uint32_t clsz, const GVecGen3 *);
+
+#define DEF_GVEC_2(X) \
+    void tcg_gen_gvec_##X(uint32_t dofs, uint32_t aofs, uint32_t bofs, \
+                          uint32_t opsz, uint32_t clsz)
+
+DEF_GVEC_2(add8);
+DEF_GVEC_2(add16);
+DEF_GVEC_2(add32);
+DEF_GVEC_2(add64);
+
+DEF_GVEC_2(sub8);
+DEF_GVEC_2(sub16);
+DEF_GVEC_2(sub32);
+DEF_GVEC_2(sub64);
+
+DEF_GVEC_2(and8);
+DEF_GVEC_2(or8);
+DEF_GVEC_2(xor8);
+DEF_GVEC_2(andc8);
+DEF_GVEC_2(orc8);
+
+#undef DEF_GVEC_2
+
+/*
+ * 64-bit vector operations.  Use these when the register has been
+ * allocated with tcg_global_mem_new_i64.  OPSZ = CLSZ = 8.
+ */
+
+#define DEF_VEC8_2(X) \
+    void tcg_gen_vec8_##X(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+
+DEF_VEC8_2(add8);
+DEF_VEC8_2(add16);
+DEF_VEC8_2(add32);
+
+DEF_VEC8_2(sub8);
+DEF_VEC8_2(sub16);
+DEF_VEC8_2(sub32);
+
+#undef DEF_VEC8_2
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index c41d38a557..f8d07090f8 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -134,3 +134,19 @@ GEN_ATOMIC_HELPERS(xor_fetch)
 GEN_ATOMIC_HELPERS(xchg)
 
 #undef GEN_ATOMIC_HELPERS
+
+DEF_HELPER_FLAGS_4(gvec_add8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_add16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_add32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_add64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_sub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_and8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_or8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_xor8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_andc8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_orc8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
new file mode 100644
index 0000000000..6de49dc07f
--- /dev/null
+++ b/tcg/tcg-op-gvec.c
@@ -0,0 +1,443 @@
+/*
+ * Generic vector operation expansion
+ *
+ * Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "tcg.h"
+#include "tcg-op.h"
+#include "tcg-op-gvec.h"
+#include "trace-tcg.h"
+#include "trace/mem.h"
+
+#define REP8(x)    ((x) * 0x0101010101010101ull)
+#define REP16(x)   ((x) * 0x0001000100010001ull)
+
+#define MAX_INLINE 16
+
+static inline void check_size_s(uint32_t opsz, uint32_t clsz)
+{
+    tcg_debug_assert(opsz % 8 == 0);
+    tcg_debug_assert(clsz % 8 == 0);
+    tcg_debug_assert(opsz <= clsz);
+}
+
+static inline void check_align_s_3(uint32_t dofs, uint32_t aofs, uint32_t bofs)
+{
+    tcg_debug_assert(dofs % 8 == 0);
+    tcg_debug_assert(aofs % 8 == 0);
+    tcg_debug_assert(bofs % 8 == 0);
+}
+
+static inline void check_size_l(uint32_t opsz, uint32_t clsz)
+{
+    tcg_debug_assert(opsz % 16 == 0);
+    tcg_debug_assert(clsz % 16 == 0);
+    tcg_debug_assert(opsz <= clsz);
+}
+
+static inline void check_align_l_3(uint32_t dofs, uint32_t aofs, uint32_t bofs)
+{
+    tcg_debug_assert(dofs % 16 == 0);
+    tcg_debug_assert(aofs % 16 == 0);
+    tcg_debug_assert(bofs % 16 == 0);
+}
+
+static inline void check_overlap_3(uint32_t d, uint32_t a,
+                                   uint32_t b, uint32_t s)
+{
+    tcg_debug_assert(d == a || d + s <= a || a + s <= d);
+    tcg_debug_assert(d == b || d + s <= b || b + s <= d);
+    tcg_debug_assert(a == b || a + s <= b || b + s <= a);
+}
+
+static void expand_clr(uint32_t dofs, uint32_t opsz, uint32_t clsz)
+{
+    if (clsz > opsz) {
+        TCGv_i64 zero = tcg_const_i64(0);
+        uint32_t i;
+
+        for (i = opsz; i < clsz; i += 8) {
+            tcg_gen_st_i64(zero, tcg_ctx.tcg_env, dofs + i);
+        }
+        tcg_temp_free_i64(zero);
+    }
+}
+
+static TCGv_i32 make_desc(uint32_t opsz, uint32_t clsz)
+{
+    tcg_debug_assert(opsz >= 16 && opsz <= 255 * 16 && opsz % 16 == 0);
+    tcg_debug_assert(clsz >= 16 && clsz <= 255 * 16 && clsz % 16 == 0);
+    opsz /= 16;
+    clsz /= 16;
+    opsz -= 1;
+    clsz -= 1;
+    return tcg_const_i32(deposit32(opsz, 8, 8, clsz));
+}
+
+static void expand_3_o(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz,
+                       void (*fno)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32))
+{
+    TCGv_ptr d = tcg_temp_new_ptr();
+    TCGv_ptr a = tcg_temp_new_ptr();
+    TCGv_ptr b = tcg_temp_new_ptr();
+    TCGv_i32 desc = make_desc(opsz, clsz);
+
+    tcg_gen_addi_ptr(d, tcg_ctx.tcg_env, dofs);
+    tcg_gen_addi_ptr(a, tcg_ctx.tcg_env, aofs);
+    tcg_gen_addi_ptr(b, tcg_ctx.tcg_env, bofs);
+    fno(d, a, b, desc);
+
+    tcg_temp_free_ptr(d);
+    tcg_temp_free_ptr(a);
+    tcg_temp_free_ptr(b);
+    tcg_temp_free_i32(desc);
+}
+
+static void expand_3x4(uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t opsz,
+                       void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    uint32_t i;
+
+    if (aofs == bofs) {
+        for (i = 0; i < opsz; i += 4) {
+            tcg_gen_ld_i32(t0, tcg_ctx.tcg_env, aofs + i);
+            fni(t0, t0, t0);
+            tcg_gen_st_i32(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+    } else {
+        TCGv_i32 t1 = tcg_temp_new_i32();
+        for (i = 0; i < opsz; i += 4) {
+            tcg_gen_ld_i32(t0, tcg_ctx.tcg_env, aofs + i);
+            tcg_gen_ld_i32(t1, tcg_ctx.tcg_env, bofs + i);
+            fni(t0, t0, t1);
+            tcg_gen_st_i32(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+        tcg_temp_free_i32(t1);
+    }
+    tcg_temp_free_i32(t0);
+}
+
+static void expand_3x8(uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t opsz,
+                       void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    uint32_t i;
+
+    if (aofs == bofs) {
+        for (i = 0; i < opsz; i += 8) {
+            tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);
+            fni(t0, t0, t0);
+            tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+    } else {
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        for (i = 0; i < opsz; i += 8) {
+            tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);
+            tcg_gen_ld_i64(t1, tcg_ctx.tcg_env, bofs + i);
+            fni(t0, t0, t1);
+            tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+        tcg_temp_free_i64(t1);
+    }
+    tcg_temp_free_i64(t0);
+}
+
+static void expand_3x8p1(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                         uint32_t opsz, uint64_t data,
+                         void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_const_i64(data);
+    uint32_t i;
+
+    if (aofs == bofs) {
+        for (i = 0; i < opsz; i += 8) {
+            tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);
+            fni(t0, t0, t0, t2);
+            tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+    } else {
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        for (i = 0; i < opsz; i += 8) {
+            tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);
+            tcg_gen_ld_i64(t1, tcg_ctx.tcg_env, bofs + i);
+            fni(t0, t0, t1, t2);
+            tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i);
+        }
+        tcg_temp_free_i64(t1);
+    }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t2);
+}
+
+void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                    uint32_t opsz, uint32_t clsz, const GVecGen3 *g)
+{
+    check_overlap_3(dofs, aofs, bofs, clsz);
+    if (opsz <= MAX_INLINE) {
+        check_size_s(opsz, clsz);
+        check_align_s_3(dofs, aofs, bofs);
+        if (g->fni8) {
+            expand_3x8(dofs, aofs, bofs, opsz, g->fni8);
+        } else if (g->fni4) {
+            expand_3x4(dofs, aofs, bofs, opsz, g->fni4);
+        } else if (g->fni8x) {
+            expand_3x8p1(dofs, aofs, bofs, opsz, g->extra_value, g->fni8x);
+        } else {
+            g_assert_not_reached();
+        }
+        expand_clr(dofs, opsz, clsz);
+    } else {
+        check_size_l(opsz, clsz);
+        check_align_l_3(dofs, aofs, bofs);
+        expand_3_o(dofs, aofs, bofs, opsz, clsz, g->fno);
+    }
+}
+
+static void gen_addv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+
+    tcg_gen_andc_i64(t1, a, m);
+    tcg_gen_andc_i64(t2, b, m);
+    tcg_gen_xor_i64(t3, a, b);
+    tcg_gen_add_i64(d, t1, t2);
+    tcg_gen_and_i64(t3, t3, m);
+    tcg_gen_xor_i64(d, d, t3);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
+}
+
+void tcg_gen_gvec_add8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .extra_value = REP8(0x80),
+        .fni8x = gen_addv_mask,
+        .fno = gen_helper_gvec_add8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_add16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .extra_value = REP16(0x8000),
+        .fni8x = gen_addv_mask,
+        .fno = gen_helper_gvec_add16,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_add32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni4 = tcg_gen_add_i32,
+        .fno = gen_helper_gvec_add32,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_add64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_add_i64,
+        .fno = gen_helper_gvec_add64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_vec8_add8(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP8(0x80));
+    gen_addv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec8_add16(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP16(0x8000));
+    gen_addv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec8_add32(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, a, ~0xffffffffull);
+    tcg_gen_add_i64(t2, a, b);
+    tcg_gen_add_i64(t1, t1, b);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+static void gen_subv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+
+    tcg_gen_or_i64(t1, a, m);
+    tcg_gen_andc_i64(t2, b, m);
+    tcg_gen_eqv_i64(t3, a, b);
+    tcg_gen_sub_i64(d, t1, t2);
+    tcg_gen_and_i64(t3, t3, m);
+    tcg_gen_xor_i64(d, d, t3);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
+}
+
+void tcg_gen_gvec_sub8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .extra_value = REP8(0x80),
+        .fni8x = gen_subv_mask,
+        .fno = gen_helper_gvec_sub8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_sub16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .extra_value = REP16(0x8000),
+        .fni8x = gen_subv_mask,
+        .fno = gen_helper_gvec_sub16,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_sub32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni4 = tcg_gen_sub_i32,
+        .fno = gen_helper_gvec_sub32,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_sub64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_sub_i64,
+        .fno = gen_helper_gvec_sub64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_vec8_sub8(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP8(0x80));
+    gen_subv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec8_sub16(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP16(0x8000));
+    gen_subv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec8_sub32(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, b, ~0xffffffffull);
+    tcg_gen_sub_i64(t2, a, b);
+    tcg_gen_sub_i64(t1, a, t1);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+void tcg_gen_gvec_and8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_and_i64,
+        .fno = gen_helper_gvec_and8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_or8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                      uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_or_i64,
+        .fno = gen_helper_gvec_or8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_xor8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_xor_i64,
+        .fno = gen_helper_gvec_xor8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_andc8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_andc_i64,
+        .fno = gen_helper_gvec_andc8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_orc8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_orc_i64,
+        .fno = gen_helper_gvec_orc8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
diff --git a/tcg/tcg-runtime-gvec.c b/tcg/tcg-runtime-gvec.c
new file mode 100644
index 0000000000..9a37ce07a2
--- /dev/null
+++ b/tcg/tcg-runtime-gvec.c
@@ -0,0 +1,199 @@
+/*
+ * Generic vectorized operation runtime
+ *
+ * Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "cpu.h"
+#include "exec/helper-proto.h"
+
+/* Virtually all hosts support 16-byte vectors.  Those that don't
+   can emulate them via GCC's generic vector extension.
+
+   In tcg-op-gvec.c, we asserted that both the size and alignment
+   of the data are multiples of 16.  */
+
+typedef uint8_t vec8 __attribute__((vector_size(16)));
+typedef uint16_t vec16 __attribute__((vector_size(16)));
+typedef uint32_t vec32 __attribute__((vector_size(16)));
+typedef uint64_t vec64 __attribute__((vector_size(16)));
+
+static inline intptr_t extract_opsz(uint32_t desc)
+{
+    return ((desc & 0xff) + 1) * 16;
+}
+
+static inline intptr_t extract_clsz(uint32_t desc)
+{
+    return (((desc >> 8) & 0xff) + 1) * 16;
+}
+
+static inline void clear_high(void *d, intptr_t opsz, uint32_t desc)
+{
+    intptr_t clsz = extract_clsz(desc);
+    intptr_t i;
+
+    if (unlikely(clsz > opsz)) {
+        for (i = opsz; i < clsz; i += sizeof(vec64)) {
+            *(vec64 *)(d + i) = (vec64){ 0 };
+        }
+    }
+}
+
+void HELPER(gvec_add8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec8)) {
+        *(vec8 *)(d + i) = *(vec8 *)(a + i) + *(vec8 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_add16)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec16)) {
+        *(vec16 *)(d + i) = *(vec16 *)(a + i) + *(vec16 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_add32)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec32)) {
+        *(vec32 *)(d + i) = *(vec32 *)(a + i) + *(vec32 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_add64)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) + *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_sub8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec8)) {
+        *(vec8 *)(d + i) = *(vec8 *)(a + i) - *(vec8 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_sub16)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec16)) {
+        *(vec16 *)(d + i) = *(vec16 *)(a + i) - *(vec16 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_sub32)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec32)) {
+        *(vec32 *)(d + i) = *(vec32 *)(a + i) - *(vec32 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_sub64)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) - *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_and8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) & *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_or8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) | *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_xor8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) ^ *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_andc8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) &~ *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
+
+void HELPER(gvec_orc8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t opsz = extract_opsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < opsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = *(vec64 *)(a + i) |~ *(vec64 *)(b + i);
+    }
+    clear_high(d, opsz, desc);
+}
-- 
2.13.5
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:08 -0700
Message-Id: <20170817230114.3655-3-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 2/8] target/arm: Use generic vector infrastructure for aa64 add/sub/logic
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Signed-off-by: Richard Henderson
---
 target/arm/translate-a64.c | 137 ++++++++++++++++++++++++++++------------
 1 file changed, 87 insertions(+), 50 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 2200e25be0..025354f983 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -21,6 +21,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "tcg-op.h"
+#include "tcg-op-gvec.h"
 #include "qemu/log.h"
 #include "arm_ldst.h"
 #include "translate.h"
@@ -82,6 +83,7 @@ typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
 typedef void CryptoTwoOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32);
 typedef void CryptoThreeOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void GVecGenTwoFn(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t);
 
 /* initialize TCG globals.
*/ void a64_translate_init(void) @@ -537,6 +539,21 @@ static inline int vec_reg_offset(DisasContext *s, int = regno, return offs; } =20 +/* Return the offset info CPUARMState of the "whole" vector register Qn. = */ +static inline int vec_full_reg_offset(DisasContext *s, int regno) +{ + assert_fp_access_checked(s); + return offsetof(CPUARMState, vfp.regs[regno * 2]); +} + +/* Return the byte size of the "whole" vector register, VL / 8. */ +static inline int vec_full_reg_size(DisasContext *s) +{ + /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags. + In the meantime this is just the AdvSIMD length of 128. */ + return 128 / 8; +} + /* Return the offset into CPUARMState of a slice (from * the least significant end) of FP register Qn (ie * Dn, Sn, Hn or Bn). @@ -9042,11 +9059,38 @@ static void disas_simd_3same_logic(DisasContext *s,= uint32_t insn) bool is_q =3D extract32(insn, 30, 1); TCGv_i64 tcg_op1, tcg_op2, tcg_res[2]; int pass; + GVecGenTwoFn *gvec_op; =20 if (!fp_access_check(s)) { return; } =20 + switch (size + 4 * is_u) { + case 0: /* AND */ + gvec_op =3D tcg_gen_gvec_and8; + goto do_gvec; + case 1: /* BIC */ + gvec_op =3D tcg_gen_gvec_andc8; + goto do_gvec; + case 2: /* ORR */ + gvec_op =3D tcg_gen_gvec_or8; + goto do_gvec; + case 3: /* ORN */ + gvec_op =3D tcg_gen_gvec_orc8; + goto do_gvec; + case 4: /* EOR */ + gvec_op =3D tcg_gen_gvec_xor8; + goto do_gvec; + do_gvec: + gvec_op(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + } + + /* Note that we've now eliminated all !is_u. 
 */
+
     tcg_op1 = tcg_temp_new_i64();
     tcg_op2 = tcg_temp_new_i64();
     tcg_res[0] = tcg_temp_new_i64();
@@ -9056,47 +9100,27 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
         read_vec_element(s, tcg_op1, rn, pass, MO_64);
         read_vec_element(s, tcg_op2, rm, pass, MO_64);
 
-        if (!is_u) {
-            switch (size) {
-            case 0: /* AND */
-                tcg_gen_and_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 1: /* BIC */
-                tcg_gen_andc_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 2: /* ORR */
-                tcg_gen_or_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 3: /* ORN */
-                tcg_gen_orc_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            }
-        } else {
-            if (size != 0) {
-                /* B* ops need res loaded to operate on */
-                read_vec_element(s, tcg_res[pass], rd, pass, MO_64);
-            }
+        /* B* ops need res loaded to operate on */
+        read_vec_element(s, tcg_res[pass], rd, pass, MO_64);
 
-            switch (size) {
-            case 0: /* EOR */
-                tcg_gen_xor_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 1: /* BSL bitwise select */
-                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2);
-                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]);
-                tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1);
-                break;
-            case 2: /* BIT, bitwise insert if true */
-                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
-                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2);
-                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
-                break;
-            case 3: /* BIF, bitwise insert if false */
-                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
-                tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2);
-                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
-                break;
-            }
+        switch (size) {
+        case 1: /* BSL bitwise select */
+            tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2);
+            tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+            tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1);
+            break;
+        case 2: /* BIT, bitwise insert if true */
+            tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+            tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2);
+            tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
+            break;
+        case 3: /* BIF, bitwise insert if false */
+            tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+            tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2);
+            tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
+            break;
+        default:
+            g_assert_not_reached();
         }
     }
 
@@ -9370,6 +9394,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
     int rn = extract32(insn, 5, 5);
     int rd = extract32(insn, 0, 5);
     int pass;
+    GVecGenTwoFn *gvec_op;
 
     switch (opcode) {
     case 0x13: /* MUL, PMUL */
@@ -9409,6 +9434,28 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
         return;
     }
 
+    switch (opcode) {
+    case 0x10: /* ADD, SUB */
+        {
+            static GVecGenTwoFn * const fns[4][2] = {
+                { tcg_gen_gvec_add8, tcg_gen_gvec_sub8 },
+                { tcg_gen_gvec_add16, tcg_gen_gvec_sub16 },
+                { tcg_gen_gvec_add32, tcg_gen_gvec_sub32 },
+                { tcg_gen_gvec_add64, tcg_gen_gvec_sub64 },
+            };
+            gvec_op = fns[size][u];
+            goto do_gvec;
+        }
+        break;
+
+    do_gvec:
+        gvec_op(vec_full_reg_offset(s, rd),
+                vec_full_reg_offset(s, rn),
+                vec_full_reg_offset(s, rm),
+                is_q ? 16 : 8, vec_full_reg_size(s));
+        return;
+    }
+
     if (size == 3) {
         assert(is_q);
         for (pass = 0; pass < 2; pass++) {
@@ -9581,16 +9628,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             genfn = fns[size][u];
             break;
         }
-    case 0x10: /* ADD, SUB */
-        {
-            static NeonGenTwoOpFn * const fns[3][2] = {
-                { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 },
-                { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
-                { tcg_gen_add_i32, tcg_gen_sub_i32 },
-            };
-            genfn = fns[size][u];
-            break;
-        }
     case 0x11: /* CMTST, CMEQ */
         {
             static NeonGenTwoOpFn * const fns[3][2] = {
-- 
2.13.5

From nobody Fri May 3 04:08:12 2024
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:09 -0700
Message-Id: <20170817230114.3655-4-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 3/8] tcg: Add types for host vectors
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Nothing uses or enables them yet.

Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
---
 tcg/tcg.h | 5 +++++
 tcg/tcg.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index dd97095af5..1277caed3d 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -256,6 +256,11 @@ typedef struct TCGPool {
 typedef enum TCGType {
     TCG_TYPE_I32,
     TCG_TYPE_I64,
+
+    TCG_TYPE_V64,
+    TCG_TYPE_V128,
+    TCG_TYPE_V256,
+
     TCG_TYPE_COUNT, /* number of different types */
 
     /* An alias for the size of the host register.
 */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 787c8ba0f7..ea78d47fad 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -118,7 +118,7 @@ static TCGReg tcg_reg_alloc_new(TCGContext *s, TCGType t)
 static bool tcg_out_ldst_finalize(TCGContext *s);
 #endif
 
-static TCGRegSet tcg_target_available_regs[2];
+static TCGRegSet tcg_target_available_regs[TCG_TYPE_COUNT];
 static TCGRegSet tcg_target_call_clobber_regs;
 
 #if TCG_TARGET_INSN_UNIT_SIZE == 1
-- 
2.13.5

From nobody Fri May 3 04:08:12 2024
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:10 -0700
Message-Id: <20170817230114.3655-5-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 4/8] tcg: Add operations for host vectors
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Nothing uses or implements them yet.

Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
---
 tcg/tcg-opc.h | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.h     | 24 ++++++++++++++++
 2 files changed, 113 insertions(+)

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 956fb1e9f3..9162125fac 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -206,6 +206,95 @@ DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
 
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
+
+/* Host integer vector operations.  */
+/* These opcodes are required whenever the base vector size is enabled.
 */
+
+DEF(mov_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(mov_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(mov_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(movi_v64, 1, 0, 1, IMPL(TCG_TARGET_HAS_v64))
+DEF(movi_v128, 1, 0, 1, IMPL(TCG_TARGET_HAS_v128))
+DEF(movi_v256, 1, 0, 1, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(ld_v64, 1, 1, 1, IMPL(TCG_TARGET_HAS_v64))
+DEF(ld_v128, 1, 1, 1, IMPL(TCG_TARGET_HAS_v128))
+DEF(ld_v256, 1, 1, 1, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(st_v64, 0, 2, 1, IMPL(TCG_TARGET_HAS_v64))
+DEF(st_v128, 0, 2, 1, IMPL(TCG_TARGET_HAS_v128))
+DEF(st_v256, 0, 2, 1, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(and_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(and_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(and_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(or_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(or_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(or_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(xor_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(xor_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(xor_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(add8_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(add16_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(add32_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+
+DEF(add8_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(add16_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(add32_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(add64_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+
+DEF(add8_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(add16_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(add32_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(add64_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+DEF(sub8_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(sub16_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+DEF(sub32_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_v64))
+
+DEF(sub8_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(sub16_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(sub32_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+DEF(sub64_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_v128))
+
+DEF(sub8_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(sub16_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(sub32_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(sub64_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_v256))
+
+/* These opcodes are optional.
+   All element counts must be supported if any are.  */
+
+DEF(not_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_not_v64))
+DEF(not_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_not_v128))
+DEF(not_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_not_v256))
+
+DEF(andc_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_andc_v64))
+DEF(andc_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_andc_v128))
+DEF(andc_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_andc_v256))
+
+DEF(orc_v64, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_v64))
+DEF(orc_v128, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_v128))
+DEF(orc_v256, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_v256))
+
+DEF(neg8_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v64))
+DEF(neg16_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v64))
+DEF(neg32_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v64))
+
+DEF(neg8_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v128))
+DEF(neg16_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v128))
+DEF(neg32_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v128))
+DEF(neg64_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v128))
+
+DEF(neg8_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v256))
+DEF(neg16_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v256))
+DEF(neg32_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v256))
+DEF(neg64_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_v256))
+
 #undef IMPL
 #undef IMPL64
 #undef DEF
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 1277caed3d..b9e15da13b 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -166,6 +166,30 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_rem_i64 0
 #endif
 
+#ifndef TCG_TARGET_HAS_v64
+#define TCG_TARGET_HAS_v64 0
+#define TCG_TARGET_HAS_andc_v64 0
+#define TCG_TARGET_HAS_orc_v64 0
+#define TCG_TARGET_HAS_not_v64 0
+#define TCG_TARGET_HAS_neg_v64 0
+#endif
+
+#ifndef TCG_TARGET_HAS_v128
+#define TCG_TARGET_HAS_v128 0
+#define TCG_TARGET_HAS_andc_v128 0
+#define TCG_TARGET_HAS_orc_v128 0
+#define TCG_TARGET_HAS_not_v128 0
+#define TCG_TARGET_HAS_neg_v128 0
+#endif
+
+#ifndef TCG_TARGET_HAS_v256
+#define TCG_TARGET_HAS_v256 0
+#define TCG_TARGET_HAS_andc_v256 0
+#define TCG_TARGET_HAS_orc_v256 0
+#define TCG_TARGET_HAS_not_v256 0
+#define TCG_TARGET_HAS_neg_v256 0
+#endif
+
 /* For 32-bit targets, some sort of unsigned widening multiply is required.  */
 #if TCG_TARGET_REG_BITS == 32 \
     && !(defined(TCG_TARGET_HAS_mulu2_i32) \
-- 
2.13.5

From nobody Fri May 3 04:08:12 2024
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:11 -0700
Message-Id: <20170817230114.3655-6-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 5/8] tcg: Add tcg_op_supported
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
---
 tcg/tcg.h |   2 +
 tcg/tcg.c | 310 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 312 insertions(+)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index b9e15da13b..b443143b21 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -962,6 +962,8 @@ do {\
 #define tcg_temp_free_ptr(T) tcg_temp_free_i64(TCGV_PTR_TO_NAT(T))
 #endif
 
+bool tcg_op_supported(TCGOpcode op);
+
 void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
                    int nargs, TCGArg *args);
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index ea78d47fad..3c3cdda938 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -751,6 +751,316 @@ int tcg_check_temp_count(void)
 }
 #endif
 
+/* Return true if OP may appear in the opcode stream.
+   Test the runtime variable that controls each opcode.
*/ +bool tcg_op_supported(TCGOpcode op) +{ + switch (op) { + case INDEX_op_discard: + case INDEX_op_set_label: + case INDEX_op_call: + case INDEX_op_br: + case INDEX_op_mb: + case INDEX_op_insn_start: + case INDEX_op_exit_tb: + case INDEX_op_goto_tb: + case INDEX_op_qemu_ld_i32: + case INDEX_op_qemu_st_i32: + case INDEX_op_qemu_ld_i64: + case INDEX_op_qemu_st_i64: + return true; + + case INDEX_op_goto_ptr: + return TCG_TARGET_HAS_goto_ptr; + + case INDEX_op_mov_i32: + case INDEX_op_movi_i32: + case INDEX_op_setcond_i32: + case INDEX_op_brcond_i32: + case INDEX_op_ld8u_i32: + case INDEX_op_ld8s_i32: + case INDEX_op_ld16u_i32: + case INDEX_op_ld16s_i32: + case INDEX_op_ld_i32: + case INDEX_op_st8_i32: + case INDEX_op_st16_i32: + case INDEX_op_st_i32: + case INDEX_op_add_i32: + case INDEX_op_sub_i32: + case INDEX_op_mul_i32: + case INDEX_op_and_i32: + case INDEX_op_or_i32: + case INDEX_op_xor_i32: + case INDEX_op_shl_i32: + case INDEX_op_shr_i32: + case INDEX_op_sar_i32: + return true; + + case INDEX_op_movcond_i32: + return TCG_TARGET_HAS_movcond_i32; + case INDEX_op_div_i32: + case INDEX_op_divu_i32: + return TCG_TARGET_HAS_div_i32; + case INDEX_op_rem_i32: + case INDEX_op_remu_i32: + return TCG_TARGET_HAS_rem_i32; + case INDEX_op_div2_i32: + case INDEX_op_divu2_i32: + return TCG_TARGET_HAS_div2_i32; + case INDEX_op_rotl_i32: + case INDEX_op_rotr_i32: + return TCG_TARGET_HAS_rot_i32; + case INDEX_op_deposit_i32: + return TCG_TARGET_HAS_deposit_i32; + case INDEX_op_extract_i32: + return TCG_TARGET_HAS_extract_i32; + case INDEX_op_sextract_i32: + return TCG_TARGET_HAS_sextract_i32; + case INDEX_op_add2_i32: + return TCG_TARGET_HAS_add2_i32; + case INDEX_op_sub2_i32: + return TCG_TARGET_HAS_sub2_i32; + case INDEX_op_mulu2_i32: + return TCG_TARGET_HAS_mulu2_i32; + case INDEX_op_muls2_i32: + return TCG_TARGET_HAS_muls2_i32; + case INDEX_op_muluh_i32: + return TCG_TARGET_HAS_muluh_i32; + case INDEX_op_mulsh_i32: + return TCG_TARGET_HAS_mulsh_i32; + case 
INDEX_op_ext8s_i32: + return TCG_TARGET_HAS_ext8s_i32; + case INDEX_op_ext16s_i32: + return TCG_TARGET_HAS_ext16s_i32; + case INDEX_op_ext8u_i32: + return TCG_TARGET_HAS_ext8u_i32; + case INDEX_op_ext16u_i32: + return TCG_TARGET_HAS_ext16u_i32; + case INDEX_op_bswap16_i32: + return TCG_TARGET_HAS_bswap16_i32; + case INDEX_op_bswap32_i32: + return TCG_TARGET_HAS_bswap32_i32; + case INDEX_op_not_i32: + return TCG_TARGET_HAS_not_i32; + case INDEX_op_neg_i32: + return TCG_TARGET_HAS_neg_i32; + case INDEX_op_andc_i32: + return TCG_TARGET_HAS_andc_i32; + case INDEX_op_orc_i32: + return TCG_TARGET_HAS_orc_i32; + case INDEX_op_eqv_i32: + return TCG_TARGET_HAS_eqv_i32; + case INDEX_op_nand_i32: + return TCG_TARGET_HAS_nand_i32; + case INDEX_op_nor_i32: + return TCG_TARGET_HAS_nor_i32; + case INDEX_op_clz_i32: + return TCG_TARGET_HAS_clz_i32; + case INDEX_op_ctz_i32: + return TCG_TARGET_HAS_ctz_i32; + case INDEX_op_ctpop_i32: + return TCG_TARGET_HAS_ctpop_i32; + + case INDEX_op_brcond2_i32: + case INDEX_op_setcond2_i32: + return TCG_TARGET_REG_BITS =3D=3D 32; + + case INDEX_op_mov_i64: + case INDEX_op_movi_i64: + case INDEX_op_setcond_i64: + case INDEX_op_brcond_i64: + case INDEX_op_ld8u_i64: + case INDEX_op_ld8s_i64: + case INDEX_op_ld16u_i64: + case INDEX_op_ld16s_i64: + case INDEX_op_ld32u_i64: + case INDEX_op_ld32s_i64: + case INDEX_op_ld_i64: + case INDEX_op_st8_i64: + case INDEX_op_st16_i64: + case INDEX_op_st32_i64: + case INDEX_op_st_i64: + case INDEX_op_add_i64: + case INDEX_op_sub_i64: + case INDEX_op_mul_i64: + case INDEX_op_and_i64: + case INDEX_op_or_i64: + case INDEX_op_xor_i64: + case INDEX_op_shl_i64: + case INDEX_op_shr_i64: + case INDEX_op_sar_i64: + case INDEX_op_ext_i32_i64: + case INDEX_op_extu_i32_i64: + return TCG_TARGET_REG_BITS =3D=3D 64; + + case INDEX_op_movcond_i64: + return TCG_TARGET_HAS_movcond_i64; + case INDEX_op_div_i64: + case INDEX_op_divu_i64: + return TCG_TARGET_HAS_div_i64; + case INDEX_op_rem_i64: + case INDEX_op_remu_i64: + return 
TCG_TARGET_HAS_rem_i64; + case INDEX_op_div2_i64: + case INDEX_op_divu2_i64: + return TCG_TARGET_HAS_div2_i64; + case INDEX_op_rotl_i64: + case INDEX_op_rotr_i64: + return TCG_TARGET_HAS_rot_i64; + case INDEX_op_deposit_i64: + return TCG_TARGET_HAS_deposit_i64; + case INDEX_op_extract_i64: + return TCG_TARGET_HAS_extract_i64; + case INDEX_op_sextract_i64: + return TCG_TARGET_HAS_sextract_i64; + case INDEX_op_extrl_i64_i32: + return TCG_TARGET_HAS_extrl_i64_i32; + case INDEX_op_extrh_i64_i32: + return TCG_TARGET_HAS_extrh_i64_i32; + case INDEX_op_ext8s_i64: + return TCG_TARGET_HAS_ext8s_i64; + case INDEX_op_ext16s_i64: + return TCG_TARGET_HAS_ext16s_i64; + case INDEX_op_ext32s_i64: + return TCG_TARGET_HAS_ext32s_i64; + case INDEX_op_ext8u_i64: + return TCG_TARGET_HAS_ext8u_i64; + case INDEX_op_ext16u_i64: + return TCG_TARGET_HAS_ext16u_i64; + case INDEX_op_ext32u_i64: + return TCG_TARGET_HAS_ext32u_i64; + case INDEX_op_bswap16_i64: + return TCG_TARGET_HAS_bswap16_i64; + case INDEX_op_bswap32_i64: + return TCG_TARGET_HAS_bswap32_i64; + case INDEX_op_bswap64_i64: + return TCG_TARGET_HAS_bswap64_i64; + case INDEX_op_not_i64: + return TCG_TARGET_HAS_not_i64; + case INDEX_op_neg_i64: + return TCG_TARGET_HAS_neg_i64; + case INDEX_op_andc_i64: + return TCG_TARGET_HAS_andc_i64; + case INDEX_op_orc_i64: + return TCG_TARGET_HAS_orc_i64; + case INDEX_op_eqv_i64: + return TCG_TARGET_HAS_eqv_i64; + case INDEX_op_nand_i64: + return TCG_TARGET_HAS_nand_i64; + case INDEX_op_nor_i64: + return TCG_TARGET_HAS_nor_i64; + case INDEX_op_clz_i64: + return TCG_TARGET_HAS_clz_i64; + case INDEX_op_ctz_i64: + return TCG_TARGET_HAS_ctz_i64; + case INDEX_op_ctpop_i64: + return TCG_TARGET_HAS_ctpop_i64; + case INDEX_op_add2_i64: + return TCG_TARGET_HAS_add2_i64; + case INDEX_op_sub2_i64: + return TCG_TARGET_HAS_sub2_i64; + case INDEX_op_mulu2_i64: + return TCG_TARGET_HAS_mulu2_i64; + case INDEX_op_muls2_i64: + return TCG_TARGET_HAS_muls2_i64; + case INDEX_op_muluh_i64: + return 
TCG_TARGET_HAS_muluh_i64; + case INDEX_op_mulsh_i64: + return TCG_TARGET_HAS_mulsh_i64; + + case INDEX_op_mov_v64: + case INDEX_op_movi_v64: + case INDEX_op_ld_v64: + case INDEX_op_st_v64: + case INDEX_op_and_v64: + case INDEX_op_or_v64: + case INDEX_op_xor_v64: + case INDEX_op_add8_v64: + case INDEX_op_add16_v64: + case INDEX_op_add32_v64: + case INDEX_op_sub8_v64: + case INDEX_op_sub16_v64: + case INDEX_op_sub32_v64: + return TCG_TARGET_HAS_v64; + + case INDEX_op_mov_v128: + case INDEX_op_movi_v128: + case INDEX_op_ld_v128: + case INDEX_op_st_v128: + case INDEX_op_and_v128: + case INDEX_op_or_v128: + case INDEX_op_xor_v128: + case INDEX_op_add8_v128: + case INDEX_op_add16_v128: + case INDEX_op_add32_v128: + case INDEX_op_add64_v128: + case INDEX_op_sub8_v128: + case INDEX_op_sub16_v128: + case INDEX_op_sub32_v128: + case INDEX_op_sub64_v128: + return TCG_TARGET_HAS_v128; + + case INDEX_op_mov_v256: + case INDEX_op_movi_v256: + case INDEX_op_ld_v256: + case INDEX_op_st_v256: + case INDEX_op_and_v256: + case INDEX_op_or_v256: + case INDEX_op_xor_v256: + case INDEX_op_add8_v256: + case INDEX_op_add16_v256: + case INDEX_op_add32_v256: + case INDEX_op_add64_v256: + case INDEX_op_sub8_v256: + case INDEX_op_sub16_v256: + case INDEX_op_sub32_v256: + case INDEX_op_sub64_v256: + return TCG_TARGET_HAS_v256; + + case INDEX_op_not_v64: + return TCG_TARGET_HAS_not_v64; + case INDEX_op_not_v128: + return TCG_TARGET_HAS_not_v128; + case INDEX_op_not_v256: + return TCG_TARGET_HAS_not_v256; + + case INDEX_op_andc_v64: + return TCG_TARGET_HAS_andc_v64; + case INDEX_op_andc_v128: + return TCG_TARGET_HAS_andc_v128; + case INDEX_op_andc_v256: + return TCG_TARGET_HAS_andc_v256; + + case INDEX_op_orc_v64: + return TCG_TARGET_HAS_orc_v64; + case INDEX_op_orc_v128: + return TCG_TARGET_HAS_orc_v128; + case INDEX_op_orc_v256: + return TCG_TARGET_HAS_orc_v256; + + case INDEX_op_neg8_v64: + case INDEX_op_neg16_v64: + case INDEX_op_neg32_v64: + return TCG_TARGET_HAS_neg_v64; + + case 
INDEX_op_neg8_v128: + case INDEX_op_neg16_v128: + case INDEX_op_neg32_v128: + case INDEX_op_neg64_v128: + return TCG_TARGET_HAS_neg_v128; + + case INDEX_op_neg8_v256: + case INDEX_op_neg16_v256: + case INDEX_op_neg32_v256: + case INDEX_op_neg64_v256: + return TCG_TARGET_HAS_neg_v256; + + case NB_OPS: + break; + } + g_assert_not_reached(); +} + /* Note: we convert the 64 bit args to 32 bit and do some alignment and endian swap. Maybe it would be better to do the alignment and endian swap in tcg_reg_alloc_call(). */ --=20 2.13.5 From nobody Fri May 3 04:08:12 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 15030112950371008.613745663873; Thu, 17 Aug 2017 16:08:15 -0700 (PDT) Received: from localhost ([::1]:56502 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diTtd-0001rw-Og for importer@patchew.org; Thu, 17 Aug 2017 19:08:13 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44564) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diTn5-0004xV-P7 for qemu-devel@nongnu.org; Thu, 17 Aug 2017 19:01:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1diTn5-0000y2-3z for qemu-devel@nongnu.org; Thu, 17 Aug 2017 19:01:27 -0400 Received: from mail-pg0-x22d.google.com ([2607:f8b0:400e:c05::22d]:36560) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1diTn4-0000xc-VB for qemu-devel@nongnu.org; Thu, 17 Aug 2017 19:01:27 -0400 Received: 
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:12 -0700
Message-Id: <20170817230114.3655-7-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 6/8] tcg: Add INDEX_op_invalid
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Add with value 0 so that structure zero initialization can
indicate that the field is not present.

Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
---
 tcg/tcg-opc.h | 2 ++
 tcg/tcg.c     | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 9162125fac..b1445a4c24 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -26,6 +26,8 @@
  * DEF(name, oargs, iargs, cargs, flags)
  */
 
+DEF(invalid, 0, 0, 0, TCG_OPF_NOT_PRESENT)
+
 /* predefined ops */
 DEF(discard, 1, 0, 0, TCG_OPF_NOT_PRESENT)
 DEF(set_label, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 3c3cdda938..879b29e81f 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -756,6 +756,9 @@ int tcg_check_temp_count(void)
 bool tcg_op_supported(TCGOpcode op)
 {
     switch (op) {
+    case INDEX_op_invalid:
+        return false;
+
     case INDEX_op_discard:
     case INDEX_op_set_label:
     case INDEX_op_call:
-- 
2.13.5

From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:13 -0700
Message-Id: <20170817230114.3655-8-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 7/8] tcg: Expand target vector ops with host vector ops
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Signed-off-by: Richard Henderson
---
 tcg/tcg-op-gvec.h |   4 +
 tcg/tcg.h         |   6 +-
 tcg/tcg-op-gvec.c | 230 ++++++++++++++++++++++++++++++++++++++++---------
 tcg/tcg.c         |   8 +-
 4 files changed, 197 insertions(+), 51 deletions(-)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 10db3599a5..99f36d208e 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -40,6 +40,10 @@ typedef struct {
     /* Similarly, but load up a constant and re-use across lanes.  */
     void (*fni8x)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64);
     uint64_t extra_value;
+    /* Operations with host vector ops.  */
+    TCGOpcode op_v256;
+    TCGOpcode op_v128;
+    TCGOpcode op_v64;
     /* Larger sizes: expand out-of-line helper w/size descriptor.  */
     void (*fno)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 } GVecGen3;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index b443143b21..7f10501d31 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -825,9 +825,11 @@ int tcg_global_mem_new_internal(TCGType, TCGv_ptr, intptr_t, const char *);
 TCGv_i32 tcg_global_reg_new_i32(TCGReg reg, const char *name);
 TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name);
 
-TCGv_i32 tcg_temp_new_internal_i32(int temp_local);
-TCGv_i64 tcg_temp_new_internal_i64(int temp_local);
+int tcg_temp_new_internal(TCGType type, bool temp_local);
+TCGv_i32 tcg_temp_new_internal_i32(bool temp_local);
+TCGv_i64 tcg_temp_new_internal_i64(bool temp_local);
 
+void tcg_temp_free_internal(int arg);
 void tcg_temp_free_i32(TCGv_i32 arg);
 void tcg_temp_free_i64(TCGv_i64 arg);
 
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 6de49dc07f..3aca565dc0 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -30,54 +30,73 @@
 #define REP8(x)    ((x) * 0x0101010101010101ull)
 #define REP16(x)   ((x) * 0x0001000100010001ull)
 
-#define MAX_INLINE 16
+#define MAX_UNROLL  4
 
-static inline void check_size_s(uint32_t opsz, uint32_t clsz)
+static inline void check_size_align(uint32_t opsz, uint32_t clsz, uint32_t ofs)
 {
-    tcg_debug_assert(opsz % 8 == 0);
-    tcg_debug_assert(clsz % 8 == 0);
+    uint32_t align = clsz > 16 || opsz >= 16 ? 15 : 7;
+    tcg_debug_assert(opsz > 0);
     tcg_debug_assert(opsz <= clsz);
+    tcg_debug_assert((opsz & align) == 0);
+    tcg_debug_assert((clsz & align) == 0);
+    tcg_debug_assert((ofs & align) == 0);
 }
 
-static inline void check_align_s_3(uint32_t dofs, uint32_t aofs, uint32_t bofs)
+static inline void check_overlap_3(uint32_t d, uint32_t a,
+                                   uint32_t b, uint32_t s)
 {
-    tcg_debug_assert(dofs % 8 == 0);
-    tcg_debug_assert(aofs % 8 == 0);
-    tcg_debug_assert(bofs % 8 == 0);
+    tcg_debug_assert(d == a || d + s <= a || a + s <= d);
+    tcg_debug_assert(d == b || d + s <= b || b + s <= d);
+    tcg_debug_assert(a == b || a + s <= b || b + s <= a);
 }
 
-static inline void check_size_l(uint32_t opsz, uint32_t clsz)
+static inline bool check_size_impl(uint32_t opsz, uint32_t lnsz)
 {
-    tcg_debug_assert(opsz % 16 == 0);
-    tcg_debug_assert(clsz % 16 == 0);
-    tcg_debug_assert(opsz <= clsz);
+    uint32_t lnct = opsz / lnsz;
+    return lnct >= 1 && lnct <= MAX_UNROLL;
 }
 
-static inline void check_align_l_3(uint32_t dofs, uint32_t aofs, uint32_t bofs)
+static void expand_clr_v(uint32_t dofs, uint32_t clsz, uint32_t lnsz,
+                         TCGType type, TCGOpcode opc_mv, TCGOpcode opc_st)
 {
-    tcg_debug_assert(dofs % 16 == 0);
-    tcg_debug_assert(aofs % 16 == 0);
-    tcg_debug_assert(bofs % 16 == 0);
-}
+    TCGArg t0 = tcg_temp_new_internal(type, 0);
+    TCGArg env = GET_TCGV_PTR(tcg_ctx.tcg_env);
+    uint32_t i;
 
-static inline void check_overlap_3(uint32_t d, uint32_t a,
-                                   uint32_t b, uint32_t s)
-{
-    tcg_debug_assert(d == a || d + s <= a || a + s <= d);
-    tcg_debug_assert(d == b || d + s <= b || b + s <= d);
-    tcg_debug_assert(a == b || a + s <= b || b + s <= a);
+    tcg_gen_op2(&tcg_ctx, opc_mv, t0, 0);
+    for (i = 0; i < clsz; i += lnsz) {
+        tcg_gen_op3(&tcg_ctx, opc_st, t0, env, dofs + i);
+    }
+    tcg_temp_free_internal(t0);
 }
 
-static void expand_clr(uint32_t dofs, uint32_t opsz, uint32_t clsz)
+static void expand_clr(uint32_t dofs, uint32_t clsz)
 {
-    if (clsz > opsz) {
-        TCGv_i64 zero = tcg_const_i64(0);
-        uint32_t i;
+    if (clsz >= 32 && TCG_TARGET_HAS_v256) {
+        uint32_t done = QEMU_ALIGN_DOWN(clsz, 32);
+        expand_clr_v(dofs, done, 32, TCG_TYPE_V256,
+                     INDEX_op_movi_v256, INDEX_op_st_v256);
+        dofs += done;
+        clsz -= done;
+    }
 
-        for (i = opsz; i < clsz; i += 8) {
-            tcg_gen_st_i64(zero, tcg_ctx.tcg_env, dofs + i);
-        }
-        tcg_temp_free_i64(zero);
+    if (clsz >= 16 && TCG_TARGET_HAS_v128) {
+        uint16_t done = QEMU_ALIGN_DOWN(clsz, 16);
+        expand_clr_v(dofs, done, 16, TCG_TYPE_V128,
+                     INDEX_op_movi_v128, INDEX_op_st_v128);
+        dofs += done;
+        clsz -= done;
+    }
+
+    if (TCG_TARGET_REG_BITS == 64) {
+        expand_clr_v(dofs, clsz, 8, TCG_TYPE_I64,
+                     INDEX_op_movi_i64, INDEX_op_st_i64);
+    } else if (TCG_TARGET_HAS_v64) {
+        expand_clr_v(dofs, clsz, 8, TCG_TYPE_V64,
+                     INDEX_op_movi_v64, INDEX_op_st_v64);
+    } else {
+        expand_clr_v(dofs, clsz, 4, TCG_TYPE_I32,
+                     INDEX_op_movi_i32, INDEX_op_st_i32);
     }
 }
 
@@ -164,6 +183,7 @@ static void expand_3x8(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i64(t0);
 }
 
+/* FIXME: add CSE for constants and we can eliminate this.  */
 static void expand_3x8p1(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                          uint32_t opsz, uint64_t data,
                          void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
@@ -192,28 +212,111 @@ static void expand_3x8p1(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_i64(t2);
 }
 
+static void expand_3_v(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t lnsz, TCGType type,
+                       TCGOpcode opc_op, TCGOpcode opc_ld, TCGOpcode opc_st)
+{
+    TCGArg t0 = tcg_temp_new_internal(type, 0);
+    TCGArg env = GET_TCGV_PTR(tcg_ctx.tcg_env);
+    uint32_t i;
+
+    if (aofs == bofs) {
+        for (i = 0; i < opsz; i += lnsz) {
+            tcg_gen_op3(&tcg_ctx, opc_ld, t0, env, aofs + i);
+            tcg_gen_op3(&tcg_ctx, opc_op, t0, t0, t0);
+            tcg_gen_op3(&tcg_ctx, opc_st, t0, env, dofs + i);
+        }
+    } else {
+        TCGArg t1 = tcg_temp_new_internal(type, 0);
+        for (i = 0; i < opsz; i += lnsz) {
+            tcg_gen_op3(&tcg_ctx, opc_ld, t0, env, aofs + i);
+            tcg_gen_op3(&tcg_ctx, opc_ld, t1, env, bofs + i);
+            tcg_gen_op3(&tcg_ctx, opc_op, t0, t0, t1);
+            tcg_gen_op3(&tcg_ctx, opc_st, t0, env, dofs + i);
+        }
+        tcg_temp_free_internal(t1);
+    }
+    tcg_temp_free_internal(t0);
+}
+
 void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                     uint32_t opsz, uint32_t clsz, const GVecGen3 *g)
 {
+    check_size_align(opsz, clsz, dofs | aofs | bofs);
     check_overlap_3(dofs, aofs, bofs, clsz);
-    if (opsz <= MAX_INLINE) {
-        check_size_s(opsz, clsz);
-        check_align_s_3(dofs, aofs, bofs);
-        if (g->fni8) {
-            expand_3x8(dofs, aofs, bofs, opsz, g->fni8);
-        } else if (g->fni4) {
-            expand_3x4(dofs, aofs, bofs, opsz, g->fni4);
+
+    if (opsz > MAX_UNROLL * 32 || clsz > MAX_UNROLL * 32) {
+        goto do_ool;
+    }
+
+    /* Recall that ARM SVE allows vector sizes that are not a power of 2.
+       Expand with successively smaller host vector sizes.  The intent is
+       that e.g. opsz == 80 would be expanded with 2x32 + 1x16.  */
+    /* ??? For clsz > opsz, the host may be able to use an op-sized
+       operation, zeroing the balance of the register.  We can then
+       use a cl-sized store to implement the clearing without an extra
+       store operation.  This is true for aarch64 and x86_64 hosts.  */
+
+    if (check_size_impl(opsz, 32) && tcg_op_supported(g->op_v256)) {
+        uint32_t done = QEMU_ALIGN_DOWN(opsz, 32);
+        expand_3_v(dofs, aofs, bofs, done, 32, TCG_TYPE_V256,
+                   g->op_v256, INDEX_op_ld_v256, INDEX_op_st_v256);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        opsz -= done;
+        clsz -= done;
+    }
+
+    if (check_size_impl(opsz, 16) && tcg_op_supported(g->op_v128)) {
+        uint32_t done = QEMU_ALIGN_DOWN(opsz, 16);
+        expand_3_v(dofs, aofs, bofs, done, 16, TCG_TYPE_V128,
+                   g->op_v128, INDEX_op_ld_v128, INDEX_op_st_v128);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        opsz -= done;
+        clsz -= done;
+    }
+
+    if (check_size_impl(opsz, 8)) {
+        uint32_t done = QEMU_ALIGN_DOWN(opsz, 8);
+        if (tcg_op_supported(g->op_v64)) {
+            expand_3_v(dofs, aofs, bofs, done, 8, TCG_TYPE_V64,
+                       g->op_v64, INDEX_op_ld_v64, INDEX_op_st_v64);
+        } else if (g->fni8) {
+            expand_3x8(dofs, aofs, bofs, done, g->fni8);
         } else if (g->fni8x) {
-            expand_3x8p1(dofs, aofs, bofs, opsz, g->extra_value, g->fni8x);
+            expand_3x8p1(dofs, aofs, bofs, done, g->extra_value, g->fni8x);
         } else {
-            g_assert_not_reached();
+            done = 0;
         }
-        expand_clr(dofs, opsz, clsz);
-    } else {
-        check_size_l(opsz, clsz);
-        check_align_l_3(dofs, aofs, bofs);
-        expand_3_o(dofs, aofs, bofs, opsz, clsz, g->fno);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        opsz -= done;
+        clsz -= done;
     }
+
+    if (check_size_impl(opsz, 4)) {
+        uint32_t done = QEMU_ALIGN_DOWN(opsz, 4);
+        expand_3x4(dofs, aofs, bofs, done, g->fni4);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        opsz -= done;
+        clsz -= done;
+    }
+
+    if (opsz == 0) {
+        if (clsz != 0) {
+            expand_clr(dofs, clsz);
+        }
+        return;
+    }
+
+ do_ool:
+    expand_3_o(dofs, aofs, bofs, opsz, clsz, g->fno);
 }
 
 static void gen_addv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)
@@ -240,6 +343,9 @@ void tcg_gen_gvec_add8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     static const GVecGen3 g = {
         .extra_value = REP8(0x80),
         .fni8x = gen_addv_mask,
+        .op_v256 = INDEX_op_add8_v256,
+        .op_v128 = INDEX_op_add8_v128,
+        .op_v64 = INDEX_op_add8_v64,
         .fno = gen_helper_gvec_add8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -251,6 +357,9 @@ void tcg_gen_gvec_add16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     static const GVecGen3 g = {
         .extra_value = REP16(0x8000),
         .fni8x = gen_addv_mask,
+        .op_v256 = INDEX_op_add16_v256,
+        .op_v128 = INDEX_op_add16_v128,
+        .op_v64 = INDEX_op_add16_v64,
         .fno = gen_helper_gvec_add16,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -261,6 +370,9 @@ void tcg_gen_gvec_add32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni4 = tcg_gen_add_i32,
+        .op_v256 = INDEX_op_add32_v256,
+        .op_v128 = INDEX_op_add32_v128,
+        .op_v64 = INDEX_op_add32_v64,
         .fno = gen_helper_gvec_add32,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -271,6 +383,8 @@ void tcg_gen_gvec_add64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_add_i64,
+        .op_v256 = INDEX_op_add64_v256,
+        .op_v128 = INDEX_op_add64_v128,
         .fno = gen_helper_gvec_add64,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -328,6 +442,9 @@ void tcg_gen_gvec_sub8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     static const GVecGen3 g = {
         .extra_value = REP8(0x80),
         .fni8x = gen_subv_mask,
+        .op_v256 = INDEX_op_sub8_v256,
+        .op_v128 = INDEX_op_sub8_v128,
+        .op_v64 = INDEX_op_sub8_v64,
         .fno = gen_helper_gvec_sub8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -339,6 +456,9 @@ void tcg_gen_gvec_sub16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     static const GVecGen3 g = {
         .extra_value = REP16(0x8000),
         .fni8x = gen_subv_mask,
+        .op_v256 = INDEX_op_sub16_v256,
+        .op_v128 = INDEX_op_sub16_v128,
+        .op_v64 = INDEX_op_sub16_v64,
         .fno = gen_helper_gvec_sub16,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -349,6 +469,9 @@ void tcg_gen_gvec_sub32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni4 = tcg_gen_sub_i32,
+        .op_v256 = INDEX_op_sub32_v256,
+        .op_v128 = INDEX_op_sub32_v128,
+        .op_v64 = INDEX_op_sub32_v64,
         .fno = gen_helper_gvec_sub32,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -359,6 +482,8 @@ void tcg_gen_gvec_sub64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_sub_i64,
+        .op_v256 = INDEX_op_sub64_v256,
+        .op_v128 = INDEX_op_sub64_v128,
        .fno = gen_helper_gvec_sub64,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -397,6 +522,9 @@ void tcg_gen_gvec_and8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_and_i64,
+        .op_v256 = INDEX_op_and_v256,
+        .op_v128 = INDEX_op_and_v128,
+        .op_v64 = INDEX_op_and_v64,
         .fno = gen_helper_gvec_and8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -407,6 +535,9 @@ void tcg_gen_gvec_or8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_or_i64,
+        .op_v256 = INDEX_op_or_v256,
+        .op_v128 = INDEX_op_or_v128,
+        .op_v64 = INDEX_op_or_v64,
         .fno = gen_helper_gvec_or8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -417,6 +548,9 @@ void tcg_gen_gvec_xor8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_xor_i64,
+        .op_v256 = INDEX_op_xor_v256,
+        .op_v128 = INDEX_op_xor_v128,
+        .op_v64 = INDEX_op_xor_v64,
         .fno = gen_helper_gvec_xor8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -427,6 +561,9 @@ void tcg_gen_gvec_andc8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_andc_i64,
+        .op_v256 = INDEX_op_andc_v256,
+        .op_v128 = INDEX_op_andc_v128,
+        .op_v64 = INDEX_op_andc_v64,
         .fno = gen_helper_gvec_andc8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
@@ -437,6 +574,9 @@ void tcg_gen_gvec_orc8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 {
     static const GVecGen3 g = {
         .fni8 = tcg_gen_orc_i64,
+        .op_v256 = INDEX_op_orc_v256,
+        .op_v128 = INDEX_op_orc_v128,
+        .op_v64 = INDEX_op_orc_v64,
         .fno = gen_helper_gvec_orc8,
     };
     tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 879b29e81f..86eb4214b0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -604,7 +604,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
     return temp_idx(s, ts);
 }
 
-static int tcg_temp_new_internal(TCGType type, int temp_local)
+int tcg_temp_new_internal(TCGType type, bool temp_local)
 {
     TCGContext *s = &tcg_ctx;
     TCGTemp *ts;
@@ -650,7 +650,7 @@ static int tcg_temp_new_internal(TCGType type, int temp_local)
     return idx;
 }
 
-TCGv_i32 tcg_temp_new_internal_i32(int temp_local)
+TCGv_i32 tcg_temp_new_internal_i32(bool temp_local)
 {
     int idx;
 
@@ -658,7 +658,7 @@ TCGv_i32 tcg_temp_new_internal_i32(int temp_local)
     return MAKE_TCGV_I32(idx);
 }
 
-TCGv_i64 tcg_temp_new_internal_i64(int temp_local)
+TCGv_i64 tcg_temp_new_internal_i64(bool temp_local)
 {
     int idx;
 
@@ -666,7 +666,7 @@ TCGv_i64 tcg_temp_new_internal_i64(int temp_local)
     return MAKE_TCGV_I64(idx);
 }
 
-static void tcg_temp_free_internal(int idx)
+void tcg_temp_free_internal(int idx)
 {
     TCGContext *s = &tcg_ctx;
     TCGTemp *ts;
-- 
2.13.5

From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 17 Aug 2017 16:01:14 -0700
Message-Id: <20170817230114.3655-9-richard.henderson@linaro.org>
In-Reply-To: <20170817230114.3655-1-richard.henderson@linaro.org>
References: <20170817230114.3655-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 8/8] tcg/i386: Add vector operations
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org

Signed-off-by: Richard Henderson
---
 tcg/i386/tcg-target.h     |  46 +++++-
 tcg/tcg-opc.h             |  12 +-
 tcg/i386/tcg-target.inc.c | 382 ++++++++++++++++++++++++++++++++++++++----
 3 files changed, 399 insertions(+), 41 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index e512648c95..147f82062b 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -30,11 +30,10 @@
 
 #ifdef __x86_64__
 # define TCG_TARGET_REG_BITS 64
-# define TCG_TARGET_NB_REGS 16
 #else
 # define TCG_TARGET_REG_BITS 32
-# define TCG_TARGET_NB_REGS 8
 #endif
+# define TCG_TARGET_NB_REGS 24
 
 typedef enum {
     TCG_REG_EAX = 0,
@@ -56,6 +55,19 @@ typedef enum {
     TCG_REG_R13,
     TCG_REG_R14,
     TCG_REG_R15,
+
+    /* SSE registers; 64-bit has access to 8 more, but we won't
+       need more than a few and using only the first 8 minimizes
+       the need for a rex prefix on the sse instructions.  */
+    TCG_REG_XMM0,
+    TCG_REG_XMM1,
+    TCG_REG_XMM2,
+    TCG_REG_XMM3,
+    TCG_REG_XMM4,
+    TCG_REG_XMM5,
+    TCG_REG_XMM6,
+    TCG_REG_XMM7,
+
     TCG_REG_RAX = TCG_REG_EAX,
     TCG_REG_RCX = TCG_REG_ECX,
     TCG_REG_RDX = TCG_REG_EDX,
@@ -79,6 +91,17 @@ extern bool have_bmi1;
 extern bool have_bmi2;
 extern bool have_popcnt;
 
+#ifdef __SSE2__
+#define have_sse2  true
+#else
+extern bool have_sse2;
+#endif
+#ifdef __AVX2__
+#define have_avx2  true
+#else
+extern bool have_avx2;
+#endif
+
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32         1
 #define TCG_TARGET_HAS_rot_i32          1
@@ -147,6 +170,25 @@ extern bool have_popcnt;
 #define TCG_TARGET_HAS_mulsh_i64        0
 #endif
 
+#define TCG_TARGET_HAS_v64              have_sse2
+#define TCG_TARGET_HAS_v128             have_sse2
+#define TCG_TARGET_HAS_v256             have_avx2
+
+#define TCG_TARGET_HAS_andc_v64         TCG_TARGET_HAS_v64
+#define TCG_TARGET_HAS_orc_v64          0
+#define TCG_TARGET_HAS_not_v64          0
+#define TCG_TARGET_HAS_neg_v64          0
+
+#define TCG_TARGET_HAS_andc_v128        TCG_TARGET_HAS_v128
+#define TCG_TARGET_HAS_orc_v128         0
+#define TCG_TARGET_HAS_not_v128         0
+#define TCG_TARGET_HAS_neg_v128         0
+
+#define TCG_TARGET_HAS_andc_v256        TCG_TARGET_HAS_v256
+#define TCG_TARGET_HAS_orc_v256         0
+#define TCG_TARGET_HAS_not_v256         0
+#define TCG_TARGET_HAS_neg_v256         0
+
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (have_bmi2 ||                              \
      ((ofs) == 0 && (len) == 8) ||             \
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index b1445a4c24..b84cd584fb 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -212,13 +212,13 @@ DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
 /* Host integer vector operations.  */
 /* These opcodes are required whenever the base vector size is enabled.  */
 
-DEF(mov_v64, 1, 1, 0, IMPL(TCG_TARGET_HAS_v64))
-DEF(mov_v128, 1, 1, 0, IMPL(TCG_TARGET_HAS_v128))
-DEF(mov_v256, 1, 1, 0, IMPL(TCG_TARGET_HAS_v256))
+DEF(mov_v64, 1, 1, 0, TCG_OPF_NOT_PRESENT)
+DEF(mov_v128, 1, 1, 0, TCG_OPF_NOT_PRESENT)
+DEF(mov_v256, 1, 1, 0, TCG_OPF_NOT_PRESENT)
 
-DEF(movi_v64, 1, 0, 1, IMPL(TCG_TARGET_HAS_v64))
-DEF(movi_v128, 1, 0, 1, IMPL(TCG_TARGET_HAS_v128))
-DEF(movi_v256, 1, 0, 1, IMPL(TCG_TARGET_HAS_v256))
+DEF(movi_v64, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(movi_v128, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(movi_v256, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 
 DEF(ld_v64, 1, 1, 1, IMPL(TCG_TARGET_HAS_v64))
 DEF(ld_v128, 1, 1, 1, IMPL(TCG_TARGET_HAS_v128))
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index aeefb72aa0..0e01b54aa0 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -31,7 +31,9 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
 #else
     "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
+    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 #endif
+    "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7",
 };
 #endif
 
@@ -61,6 +63,14 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_EDX,
     TCG_REG_EAX,
 #endif
+    TCG_REG_XMM0,
+    TCG_REG_XMM1,
+    TCG_REG_XMM2,
+    TCG_REG_XMM3,
+    TCG_REG_XMM4,
+    TCG_REG_XMM5,
+    TCG_REG_XMM6,
+    TCG_REG_XMM7,
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -94,7 +104,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define TCG_CT_CONST_I32 0x400
 #define TCG_CT_CONST_WSZ 0x800
 
-/* Registers used with L constraint, which are the first argument 
+/* Registers used with L constraint, which are the first argument
    registers on x86_64, and two random call clobbered registers on
    i386. */
 #if TCG_TARGET_REG_BITS == 64
@@ -127,6 +137,16 @@ bool have_bmi1;
 bool have_bmi2;
 bool have_popcnt;
 
+#ifndef have_sse2
+bool have_sse2;
+#endif
+#ifdef have_avx2
+#define have_avx1  have_avx2
+#else
+static bool have_avx1;
+bool have_avx2;
+#endif
+
 #ifdef CONFIG_CPUID_H
 static bool have_movbe;
 static bool have_lzcnt;
@@ -215,6 +235,10 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         /* With TZCNT/LZCNT, we can have operand-size as an input.  */
         ct->ct |= TCG_CT_CONST_WSZ;
         break;
+    case 'x':
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, 0xff0000);
+        break;
 
         /* qemu_ld/st address constraint */
     case 'L':
@@ -292,6 +316,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #endif
 #define P_SIMDF3        0x20000         /* 0xf3 opcode prefix */
 #define P_SIMDF2        0x40000         /* 0xf2 opcode prefix */
+#define P_VEXL          0x80000         /* Set VEX.L = 1 */
 
 #define OPC_ARITH_EvIz  (0x81)
 #define OPC_ARITH_EvIb  (0x83)
@@ -324,13 +349,31 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVL_Iv     (0xb8)
 #define OPC_MOVBE_GyMy  (0xf0 | P_EXT38)
 #define OPC_MOVBE_MyGy  (0xf1 | P_EXT38)
+#define OPC_MOVDQA_GyMy (0x6f | P_EXT | P_DATA16)
+#define OPC_MOVDQA_MyGy (0x7f | P_EXT | P_DATA16)
+#define OPC_MOVDQU_GyMy (0x6f | P_EXT | P_SIMDF3)
+#define OPC_MOVDQU_MyGy (0x7f | P_EXT | P_SIMDF3)
+#define OPC_MOVQ_GyMy   (0x7e | P_EXT | P_SIMDF3)
+#define OPC_MOVQ_MyGy   (0xd6 | P_EXT | P_DATA16)
 #define OPC_MOVSBL      (0xbe | P_EXT)
 #define OPC_MOVSWL      (0xbf | P_EXT)
 #define OPC_MOVSLQ      (0x63 | P_REXW)
 #define OPC_MOVZBL      (0xb6 | P_EXT)
 #define OPC_MOVZWL      (0xb7 | P_EXT)
+#define OPC_PADDB       (0xfc | P_EXT | P_DATA16)
+#define OPC_PADDW       (0xfd | P_EXT | P_DATA16)
+#define OPC_PADDD       (0xfe | P_EXT | P_DATA16)
+#define OPC_PADDQ       (0xd4 | P_EXT | P_DATA16)
+#define OPC_PAND        (0xdb | P_EXT | P_DATA16)
+#define OPC_PANDN       (0xdf | P_EXT | P_DATA16)
 #define OPC_PDEP        (0xf5 | P_EXT38 | P_SIMDF2)
 #define OPC_PEXT        (0xf5 | P_EXT38 | P_SIMDF3)
+#define OPC_POR         (0xeb | P_EXT | P_DATA16)
+#define OPC_PSUBB       (0xf8 | P_EXT | P_DATA16)
+#define OPC_PSUBW       (0xf9 | P_EXT | P_DATA16)
+#define OPC_PSUBD       (0xfa | P_EXT | P_DATA16)
+#define OPC_PSUBQ       (0xfb | P_EXT | P_DATA16)
+#define OPC_PXOR        (0xef | P_EXT | P_DATA16)
 #define OPC_POP_r32     (0x58)
 #define OPC_POPCNT      (0xb8 | P_EXT | P_SIMDF3)
 #define OPC_PUSH_r32    (0x50)
@@ -500,7 +543,8 @@ static void tcg_out_modrm(TCGContext *s, int opc, int r, int rm)
     tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm));
 }
 
-static void tcg_out_vex_pfx_opc(TCGContext *s, int opc, int r, int v, int rm)
+static void tcg_out_vex_pfx_opc(TCGContext *s, int opc, int r, int v,
+                                int rm, int index)
 {
     int tmp;
 
@@ -515,14 +559,16 @@ static void tcg_out_vex_pfx_opc(TCGContext *s, int opc, int r, int v, int rm)
     } else if (opc & P_EXT) {
         tmp = 1;
     } else {
-        tcg_abort();
+        g_assert_not_reached();
     }
-    tmp |= 0x40;                           /* VEX.X */
     tmp |= (r & 8 ? 0 : 0x80);             /* VEX.R */
+    tmp |= (index & 8 ? 0 : 0x40);         /* VEX.X */
     tmp |= (rm & 8 ? 0 : 0x20);            /* VEX.B */
     tcg_out8(s, tmp);
 
     tmp = (opc & P_REXW ? 0x80 : 0);       /* VEX.W */
+    tmp |= (opc & P_VEXL ? 0x04 : 0);      /* VEX.L */
+
     /* VEX.pp */
     if (opc & P_DATA16) {
         tmp |= 1;                          /* 0x66 */
@@ -538,7 +584,7 @@ static void tcg_out_vex_pfx_opc(TCGContext *s, int opc, int r, int v, int rm)
 
 static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm)
 {
-    tcg_out_vex_pfx_opc(s, opc, r, v, rm);
+    tcg_out_vex_pfx_opc(s, opc, r, v, rm, 0);
     tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm));
 }
 
@@ -565,7 +611,7 @@ static void tcg_out_opc_pool_imm(TCGContext *s, int opc, int r,
 static void tcg_out_vex_pool_imm(TCGContext *s, int opc, int r, int v,
                                  tcg_target_ulong data)
 {
-    tcg_out_vex_pfx_opc(s, opc, r, v, 0);
+    tcg_out_vex_pfx_opc(s, opc, r, v, 0, 0);
     tcg_out_sfx_pool_imm(s, r, data);
 }
 
@@ -574,8 +620,8 @@ static void tcg_out_vex_pool_imm(TCGContext *s, int opc, int r, int v,
    mode for absolute addresses, ~RM is the size of the immediate
    operand that will follow the instruction.  */
 
-static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm,
-                                     int index, int shift, intptr_t offset)
+static void tcg_out_sib_offset(TCGContext *s, int r, int rm, int index,
+                               int shift, intptr_t offset)
 {
     int mod, len;
 
@@ -586,7 +632,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm,
         intptr_t pc = (intptr_t)s->code_ptr + 5 + ~rm;
         intptr_t disp = offset - pc;
         if (disp == (int32_t)disp) {
-            tcg_out_opc(s, opc, r, 0, 0);
             tcg_out8(s, (LOWREGMASK(r) << 3) | 5);
             tcg_out32(s, disp);
             return;
@@ -596,7 +641,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm,
        use of the MODRM+SIB encoding and is therefore larger than
        rip-relative addressing.  */
     if (offset == (int32_t)offset) {
-        tcg_out_opc(s, opc, r, 0, 0);
         tcg_out8(s, (LOWREGMASK(r) << 3) | 4);
         tcg_out8(s, (4 << 3) | 5);
         tcg_out32(s, offset);
@@ -604,10 +648,9 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm,
 
     /* ??? The memory isn't directly addressable.
*/ - tcg_abort(); + g_assert_not_reached(); } else { /* Absolute address. */ - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (r << 3) | 5); tcg_out32(s, offset); return; @@ -630,7 +673,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int= opc, int r, int rm, that would be used for %esp is the escape to the two byte form. */ if (index < 0 && LOWREGMASK(rm) !=3D TCG_REG_ESP) { /* Single byte MODRM format. */ - tcg_out_opc(s, opc, r, rm, 0); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } else { /* Two byte MODRM+SIB format. */ @@ -644,7 +686,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int= opc, int r, int rm, tcg_debug_assert(index !=3D TCG_REG_ESP); } =20 - tcg_out_opc(s, opc, r, rm, index); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | 4); tcg_out8(s, (shift << 6) | (LOWREGMASK(index) << 3) | LOWREGMASK(r= m)); } @@ -656,6 +697,21 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, in= t opc, int r, int rm, } } =20 +static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, + int index, int shift, intptr_t offset) +{ + tcg_out_opc(s, opc, r, rm < 0 ? 0 : rm, index < 0 ? 0 : index); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + +static void tcg_out_vex_modrm_sib_offset(TCGContext *s, int opc, int r, in= t v, + int rm, int index, int shift, + intptr_t offset) +{ + tcg_out_vex_pfx_opc(s, opc, r, v, rm < 0 ? 0 : rm, index < 0 ? 0 : ind= ex); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + /* A simplification of the above with no index or shift. 
*/ static inline void tcg_out_modrm_offset(TCGContext *s, int opc, int r, int rm, intptr_t offset) @@ -663,6 +719,31 @@ static inline void tcg_out_modrm_offset(TCGContext *s,= int opc, int r, tcg_out_modrm_sib_offset(s, opc, r, rm, -1, 0, offset); } =20 +static inline void tcg_out_vex_modrm_offset(TCGContext *s, int opc, int r, + int v, int rm, intptr_t offset) +{ + tcg_out_vex_modrm_sib_offset(s, opc, r, v, rm, -1, 0, offset); +} + +static void tcg_out_maybe_vex_modrm(TCGContext *s, int opc, int r, int rm) +{ + if (have_avx1) { + tcg_out_vex_modrm(s, opc, r, 0, rm); + } else { + tcg_out_modrm(s, opc, r, rm); + } +} + +static void tcg_out_maybe_vex_modrm_offset(TCGContext *s, int opc, int r, + int rm, intptr_t offset) +{ + if (have_avx1) { + tcg_out_vex_modrm_offset(s, opc, r, 0, rm, offset); + } else { + tcg_out_modrm_offset(s, opc, r, rm, offset); + } +} + /* Generate dest op=3D src. Uses the same ARITH_* codes as tgen_arithi. = */ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src) { @@ -673,12 +754,32 @@ static inline void tgen_arithr(TCGContext *s, int sub= op, int dest, int src) tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src); } =20 -static inline void tcg_out_mov(TCGContext *s, TCGType type, - TCGReg ret, TCGReg arg) +static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg ar= g) { if (arg !=3D ret) { - int opc =3D OPC_MOVL_GvEv + (type =3D=3D TCG_TYPE_I64 ? 
P_REXW : 0= ); - tcg_out_modrm(s, opc, ret, arg); + int opc =3D 0; + + switch (type) { + case TCG_TYPE_I64: + opc =3D P_REXW; + /* fallthru */ + case TCG_TYPE_I32: + opc |=3D OPC_MOVL_GvEv; + tcg_out_modrm(s, opc, ret, arg); + break; + + case TCG_TYPE_V256: + opc =3D P_VEXL; + /* fallthru */ + case TCG_TYPE_V128: + case TCG_TYPE_V64: + opc |=3D OPC_MOVDQA_GyMy; + tcg_out_maybe_vex_modrm(s, opc, ret, arg); + break; + + default: + g_assert_not_reached(); + } } } =20 @@ -687,6 +788,27 @@ static void tcg_out_movi(TCGContext *s, TCGType type, { tcg_target_long diff; =20 + switch (type) { + case TCG_TYPE_I32: + case TCG_TYPE_I64: + break; + + case TCG_TYPE_V64: + case TCG_TYPE_V128: + case TCG_TYPE_V256: + /* ??? Revisit this as the implementation progresses. */ + tcg_debug_assert(arg =3D=3D 0); + if (have_avx1) { + tcg_out_vex_modrm(s, OPC_PXOR, ret, ret, ret); + } else { + tcg_out_modrm(s, OPC_PXOR, ret, ret); + } + return; + + default: + g_assert_not_reached(); + } + if (arg =3D=3D 0) { tgen_arithr(s, ARITH_XOR, ret, ret); return; @@ -750,18 +872,54 @@ static inline void tcg_out_pop(TCGContext *s, int reg) tcg_out_opc(s, OPC_POP_r32 + LOWREGMASK(reg), 0, reg, 0); } =20 -static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, - TCGReg arg1, intptr_t arg2) +static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, + TCGReg arg1, intptr_t arg2) { - int opc =3D OPC_MOVL_GvEv + (type =3D=3D TCG_TYPE_I64 ? 
P_REXW : 0); - tcg_out_modrm_offset(s, opc, ret, arg1, arg2); + switch (type) { + case TCG_TYPE_I64: + tcg_out_modrm_offset(s, OPC_MOVL_GvEv | P_REXW, ret, arg1, arg2); + break; + case TCG_TYPE_I32: + tcg_out_modrm_offset(s, OPC_MOVL_GvEv, ret, arg1, arg2); + break; + case TCG_TYPE_V64: + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVQ_GyMy, ret, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVDQU_GyMy, ret, arg1, arg2= ); + break; + case TCG_TYPE_V256: + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_GyMy | P_VEXL, + ret, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } =20 -static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, + TCGReg arg1, intptr_t arg2) { - int opc =3D OPC_MOVL_EvGv + (type =3D=3D TCG_TYPE_I64 ? P_REXW : 0); - tcg_out_modrm_offset(s, opc, arg, arg1, arg2); + switch (type) { + case TCG_TYPE_I64: + tcg_out_modrm_offset(s, OPC_MOVL_EvGv | P_REXW, arg, arg1, arg2); + break; + case TCG_TYPE_I32: + tcg_out_modrm_offset(s, OPC_MOVL_EvGv, arg, arg1, arg2); + break; + case TCG_TYPE_V64: + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVQ_MyGy, arg, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVDQU_MyGy, arg, arg1, arg2= ); + break; + case TCG_TYPE_V256: + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_MyGy | P_VEXL, + arg, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } =20 static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, @@ -773,6 +931,8 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TC= GArg val, return false; } rexw =3D P_REXW; + } else if (type !=3D TCG_TYPE_I32) { + return false; } tcg_out_modrm_offset(s, OPC_MOVL_EvIz | rexw, 0, base, ofs); tcg_out32(s, val); @@ -1914,6 +2074,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpco= de opc, case glue(glue(INDEX_op_, x), _i32) #endif =20 +#define OP_128_256(x) \ + case 
glue(glue(INDEX_op_, x), _v256): \ + rexw =3D P_VEXL; /* FALLTHRU */ \ + case glue(glue(INDEX_op_, x), _v128) + +#define OP_64_128_256(x) \ + OP_128_256(x): \ + case glue(glue(INDEX_op_, x), _v64) + /* Hoist the loads of the most common arguments. */ a0 =3D args[0]; a1 =3D args[1]; @@ -2379,19 +2548,94 @@ static inline void tcg_out_op(TCGContext *s, TCGOpc= ode opc, } break; =20 + OP_64_128_256(add8): + c =3D OPC_PADDB; + goto gen_simd; + OP_64_128_256(add16): + c =3D OPC_PADDW; + goto gen_simd; + OP_64_128_256(add32): + c =3D OPC_PADDD; + goto gen_simd; + OP_128_256(add64): + c =3D OPC_PADDQ; + goto gen_simd; + OP_64_128_256(sub8): + c =3D OPC_PSUBB; + goto gen_simd; + OP_64_128_256(sub16): + c =3D OPC_PSUBW; + goto gen_simd; + OP_64_128_256(sub32): + c =3D OPC_PSUBD; + goto gen_simd; + OP_128_256(sub64): + c =3D OPC_PSUBQ; + goto gen_simd; + OP_64_128_256(and): + c =3D OPC_PAND; + goto gen_simd; + OP_64_128_256(andc): + c =3D OPC_PANDN; + goto gen_simd; + OP_64_128_256(or): + c =3D OPC_POR; + goto gen_simd; + OP_64_128_256(xor): + c =3D OPC_PXOR; + gen_simd: + if (have_avx1) { + tcg_out_vex_modrm(s, c, a0, a1, a2); + } else { + tcg_out_modrm(s, c, a0, a2); + } + break; + + case INDEX_op_ld_v64: + c =3D TCG_TYPE_V64; + goto gen_simd_ld; + case INDEX_op_ld_v128: + c =3D TCG_TYPE_V128; + goto gen_simd_ld; + case INDEX_op_ld_v256: + c =3D TCG_TYPE_V256; + gen_simd_ld: + tcg_out_ld(s, c, a0, a1, a2); + break; + + case INDEX_op_st_v64: + c =3D TCG_TYPE_V64; + goto gen_simd_st; + case INDEX_op_st_v128: + c =3D TCG_TYPE_V128; + goto gen_simd_st; + case INDEX_op_st_v256: + c =3D TCG_TYPE_V256; + gen_simd_st: + tcg_out_st(s, c, a0, a1, a2); + break; + case INDEX_op_mb: tcg_out_mb(s, a0); break; case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. */ case INDEX_op_mov_i64: + case INDEX_op_mov_v64: + case INDEX_op_mov_v128: + case INDEX_op_mov_v256: case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi. 
*/ case INDEX_op_movi_i64: + case INDEX_op_movi_v64: + case INDEX_op_movi_v128: + case INDEX_op_movi_v256: case INDEX_op_call: /* Always emitted via tcg_out_call. */ default: tcg_abort(); } =20 #undef OP_32_64 +#undef OP_128_256 +#undef OP_64_128_256 } =20 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) @@ -2417,6 +2661,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpc= ode op) =3D { .args_ct_str =3D { "r", "r", "L", "L" } }; static const TCGTargetOpDef L_L_L_L =3D { .args_ct_str =3D { "L", "L", "L", "L" } }; + static const TCGTargetOpDef x_0_x =3D { .args_ct_str =3D { "x", "0", "= x" } }; + static const TCGTargetOpDef x_x_x =3D { .args_ct_str =3D { "x", "x", "= x" } }; + static const TCGTargetOpDef x_r =3D { .args_ct_str =3D { "x", "r" } }; =20 switch (op) { case INDEX_op_goto_ptr: @@ -2620,6 +2867,52 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOp= code op) return &s2; } =20 + case INDEX_op_ld_v64: + case INDEX_op_ld_v128: + case INDEX_op_ld_v256: + case INDEX_op_st_v64: + case INDEX_op_st_v128: + case INDEX_op_st_v256: + return &x_r; + + case INDEX_op_add8_v64: + case INDEX_op_add8_v128: + case INDEX_op_add16_v64: + case INDEX_op_add16_v128: + case INDEX_op_add32_v64: + case INDEX_op_add32_v128: + case INDEX_op_add64_v128: + case INDEX_op_sub8_v64: + case INDEX_op_sub8_v128: + case INDEX_op_sub16_v64: + case INDEX_op_sub16_v128: + case INDEX_op_sub32_v64: + case INDEX_op_sub32_v128: + case INDEX_op_sub64_v128: + case INDEX_op_and_v64: + case INDEX_op_and_v128: + case INDEX_op_andc_v64: + case INDEX_op_andc_v128: + case INDEX_op_or_v64: + case INDEX_op_or_v128: + case INDEX_op_xor_v64: + case INDEX_op_xor_v128: + return have_avx1 ? 
&x_x_x : &x_0_x; + + case INDEX_op_add8_v256: + case INDEX_op_add16_v256: + case INDEX_op_add32_v256: + case INDEX_op_add64_v256: + case INDEX_op_sub8_v256: + case INDEX_op_sub16_v256: + case INDEX_op_sub32_v256: + case INDEX_op_sub64_v256: + case INDEX_op_and_v256: + case INDEX_op_andc_v256: + case INDEX_op_or_v256: + case INDEX_op_xor_v256: + return &x_x_x; + default: break; } @@ -2725,9 +3018,16 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int c= ount) static void tcg_target_init(TCGContext *s) { #ifdef CONFIG_CPUID_H - unsigned a, b, c, d; + unsigned a, b, c, d, b7 =3D 0; int max =3D __get_cpuid_max(0, 0); =20 + if (max >=3D 7) { + /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ + __cpuid_count(7, 0, a, b7, c, d); + have_bmi1 =3D (b7 & bit_BMI) !=3D 0; + have_bmi2 =3D (b7 & bit_BMI2) !=3D 0; + } + if (max >=3D 1) { __cpuid(1, a, b, c, d); #ifndef have_cmov @@ -2736,17 +3036,26 @@ static void tcg_target_init(TCGContext *s) available, we'll use a small forward branch. */ have_cmov =3D (d & bit_CMOV) !=3D 0; #endif +#ifndef have_sse2 + have_sse2 =3D (d & bit_SSE2) !=3D 0; +#endif /* MOVBE is only available on Intel Atom and Haswell CPUs, so we need to probe for it. */ have_movbe =3D (c & bit_MOVBE) !=3D 0; have_popcnt =3D (c & bit_POPCNT) !=3D 0; - } =20 - if (max >=3D 7) { - /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ - __cpuid_count(7, 0, a, b, c, d); - have_bmi1 =3D (b & bit_BMI) !=3D 0; - have_bmi2 =3D (b & bit_BMI2) !=3D 0; +#ifndef have_avx2 + /* There are a number of things we must check before we can be + sure of not hitting invalid opcode. 
*/ + if (c & bit_OSXSAVE) { + unsigned xcrl, xcrh; + asm ("xgetbv" : "=3Da" (xcrl), "=3Dd" (xcrh) : "c" (0)); + if (xcrl & 6 =3D=3D 6) { + have_avx1 =3D (c & bit_AVX) !=3D 0; + have_avx2 =3D (b7 & bit_AVX2) !=3D 0; + } + } +#endif } =20 max =3D __get_cpuid_max(0x8000000, 0); @@ -2763,6 +3072,13 @@ static void tcg_target_init(TCGContext *s) } else { tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xff); } + if (have_sse2) { + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_V64], 0, 0xff0= 000); + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_V128], 0, 0xff= 0000); + } + if (have_avx2) { + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_V256], 0, 0xff= 0000); + } =20 tcg_regset_clear(tcg_target_call_clobber_regs); tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_EAX); --=20 2.13.5