From nobody Mon Apr 29 12:11:19 2024
Delivered-To: importer@patchew.org
Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org;
Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org
Return-Path:
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1552655108274194.6143699038729; Fri, 15 Mar 2019 06:05:08 -0700 (PDT)
Received: from localhost ([127.0.0.1]:54629 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1h4mWH-0004qA-5b for importer@patchew.org; Fri, 15 Mar 2019 09:05:05 -0400
Received: from eggs.gnu.org ([209.51.188.92]:50494) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1h4mTG-0002TT-LM for qemu-devel@nongnu.org; Fri, 15 Mar 2019 09:01:59 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1h4mQP-00075z-GL for qemu-devel@nongnu.org; Fri, 15 Mar 2019 08:59:03 -0400
Received: from mx2.rt-rk.com ([89.216.37.149]:37814 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1h4mQO-00073q-N8 for qemu-devel@nongnu.org; Fri, 15 Mar 2019 08:59:01 -0400
Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 0F3881A1D4E; Fri, 15 Mar 2019 13:02:54 +0100 (CET)
Received: from rtrkw310-lin.domain.local (rtrkw310-lin.domain.local [10.10.13.72]) by mail.rt-rk.com (Postfix) with ESMTPSA id E7A551A202C; Fri, 15 Mar 2019 13:02:53 +0100 (CET)
X-Virus-Scanned: amavisd-new at rt-rk.com
From: Mateja Marjanovic
To: qemu-devel@nongnu.org
Date: Fri, 15 Mar 2019 13:02:47 +0100
Message-Id: <1552651368-7422-2-git-send-email-mateja.marjanovic@rt-rk.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1552651368-7422-1-git-send-email-mateja.marjanovic@rt-rk.com>
References: <1552651368-7422-1-git-send-email-mateja.marjanovic@rt-rk.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 89.216.37.149
Subject: [Qemu-devel] [PATCH 1/2] target/mips: Optimize ILVOD. MSA instructions
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: arikalo@wavecomp.com, amarkovic@wavecomp.com, aurelien@aurel32.net
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel"
Content-Type: text/plain; charset="utf-8"

From: Mateja Marjanovic

Optimize the ILVOD set of MSA instructions by operating directly on
TCG registers and performing the logic on them, instead of calling
helpers.

Performance was measured by executing each instruction a large number
of times on a computer with an Intel Core i7-3770 CPU @ 3.40GHz×8.

 instruction ||    before    ||    after     ||
==============================================
  ilvod.b:   ||   66.97 ms   ||   26.34 ms   ||
  ilvod.h:   ||   44.75 ms   ||   25.17 ms   ||
  ilvod.w:   ||   41.27 ms   ||   24.37 ms   ||
  ilvod.d:   ||   41.75 ms   ||   20.50 ms   ||

Signed-off-by: Mateja Marjanovic
---
 target/mips/helper.h     |   1 -
 target/mips/msa_helper.c |  51 --------------------
 target/mips/translate.c  | 119 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 118 insertions(+), 53 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index a6d687e..d162836 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -865,7 +865,6 @@ DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvod_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index 9d9dafe..cbcfd57 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1363,57 +1363,6 @@ void helper_msa_ilvev_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
     }
 }
 
-void helper_msa_ilvod_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
-                         uint32_t ws, uint32_t wt)
-{
-    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
-    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
-    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
-
-    switch (df) {
-    case DF_BYTE:
-        pwd->b[0]  = pwt->b[1];
-        pwd->b[1]  = pws->b[1];
-        pwd->b[2]  = pwt->b[3];
-        pwd->b[3]  = pws->b[3];
-        pwd->b[4]  = pwt->b[5];
-        pwd->b[5]  = pws->b[5];
-        pwd->b[6]  = pwt->b[7];
-        pwd->b[7]  = pws->b[7];
-        pwd->b[8]  = pwt->b[9];
-        pwd->b[9]  = pws->b[9];
-        pwd->b[10] = pwt->b[11];
-        pwd->b[11] = pws->b[11];
-        pwd->b[12] = pwt->b[13];
-        pwd->b[13] = pws->b[13];
-        pwd->b[14] = pwt->b[15];
-        pwd->b[15] = pws->b[15];
-        break;
-    case DF_HALF:
-        pwd->h[0] = pwt->h[1];
-        pwd->h[1] = pws->h[1];
-        pwd->h[2] = pwt->h[3];
-        pwd->h[3] = pws->h[3];
-        pwd->h[4] = pwt->h[5];
-        pwd->h[5] = pws->h[5];
-        pwd->h[6] = pwt->h[7];
-        pwd->h[7] = pws->h[7];
-        break;
-    case DF_WORD:
-        pwd->w[0] = pwt->w[1];
-        pwd->w[1] = pws->w[1];
-        pwd->w[2] = pwt->w[3];
-        pwd->w[3] = pws->w[3];
-        break;
-    case DF_DOUBLE:
-        pwd->d[0] = pwt->d[1];
-        pwd->d[1] = pws->d[1];
-        break;
-    default:
-        assert(0);
-    }
-}
-
 void helper_msa_ilvl_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
                         uint32_t ws, uint32_t wt)
 {
diff --git a/target/mips/translate.c b/target/mips/translate.c
index b4a1103..101d2de 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28889,6 +28889,108 @@ static void gen_msa_bit(CPUMIPSState *env, DisasContext *ctx)
     tcg_temp_free_i32(tws);
 }
 
+static inline void gen_ilvod_b(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt) {
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    uint64_t mask = (1ULL << 8) - 1;
+    mask |= mask << 16;
+    mask |= mask << 32;
+    mask <<= 8;
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2], mask);
+    tcg_gen_shri_i64(t0, t0, 8);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], t1);
+
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shri_i64(t0, t0, 8);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+static inline void gen_ilvod_h(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt) {
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    uint64_t mask = (1ULL << 16) - 1;
+    mask |= mask << 32;
+    mask <<= 16;
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2], mask);
+    tcg_gen_shri_i64(t0, t0, 16);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], t1);
+
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shri_i64(t0, t0, 16);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt) {
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    uint64_t mask = (1ULL << 32) - 1;
+    tcg_gen_movi_i64(t1, 0);
+
+    mask <<= 32;
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2], mask);
+    tcg_gen_shri_i64(t0, t0, 32);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], t1);
+
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shri_i64(t0, t0, 32);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt) {
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
+}
+
 static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
 {
 #define MASK_MSA_3R(op)    (MASK_MSA_MINOR(op) | (op & (0x7 << 23)))
@@ -29060,7 +29162,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_mod_u_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVOD_df:
-        gen_helper_msa_ilvod_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_ilvod_b(env, wd, ws, wt);
+            break;
+        case DF_HALF:
+            gen_ilvod_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvod_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvod_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
 
     case OPC_DOTP_S_df:
-- 
2.7.4

From nobody Mon Apr 29 12:11:19 2024
Delivered-To: importer@patchew.org
Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org;
Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org
Return-Path:
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1552655201893674.8727657460889; Fri, 15 Mar 2019 06:06:41 -0700 (PDT)
Received: from localhost ([127.0.0.1]:54685 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1h4mXk-00065M-Lo for importer@patchew.org; Fri, 15 Mar 2019 09:06:36 -0400
Received: from eggs.gnu.org ([209.51.188.92]:50301) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1h4mTC-0002AM-NQ for qemu-devel@nongnu.org; Fri, 15 Mar 2019 09:01:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1h4mQP-00075e-5e for qemu-devel@nongnu.org; Fri, 15 Mar 2019 08:59:03 -0400
Received: from mx2.rt-rk.com ([89.216.37.149]:37812 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1h4mQO-00073k-E0 for qemu-devel@nongnu.org; Fri, 15 Mar 2019 08:59:00 -0400
Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id
 1B9DF1A202C; Fri, 15 Mar 2019 13:02:54 +0100 (CET)
Received: from rtrkw310-lin.domain.local (rtrkw310-lin.domain.local [10.10.13.72]) by mail.rt-rk.com (Postfix) with ESMTPSA id F1F221A20EB; Fri, 15 Mar 2019 13:02:53 +0100 (CET)
X-Virus-Scanned: amavisd-new at rt-rk.com
From: Mateja Marjanovic
To: qemu-devel@nongnu.org
Date: Fri, 15 Mar 2019 13:02:48 +0100
Message-Id: <1552651368-7422-3-git-send-email-mateja.marjanovic@rt-rk.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1552651368-7422-1-git-send-email-mateja.marjanovic@rt-rk.com>
References: <1552651368-7422-1-git-send-email-mateja.marjanovic@rt-rk.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 89.216.37.149
Subject: [Qemu-devel] [PATCH 2/2] target/mips: Optimize ILVEV. MSA instructions
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: arikalo@wavecomp.com, amarkovic@wavecomp.com, aurelien@aurel32.net
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel"
Content-Type: text/plain; charset="utf-8"

From: Mateja Marjanovic

Optimize the ILVEV set of MSA instructions by operating directly on
TCG registers and performing the logic on them, instead of calling
helpers.

Performance was measured by executing each instruction a large number
of times on a computer with an Intel Core i7-3770 CPU @ 3.40GHz×8.

 instruction ||    before    ||    after     ||
==============================================
  ilvev.b    ||   74.38 ms   ||   38.85 ms   ||
  ilvev.h    ||   46.78 ms   ||   33.98 ms   ||
  ilvev.w    ||   45.50 ms   ||   28.93 ms   ||
  ilvev.d    ||   37.67 ms   ||   23.09 ms   ||

Signed-off-by: Mateja Marjanovic
---
 target/mips/helper.h     |   1 -
 target/mips/msa_helper.c |  52 ---------------------
 target/mips/translate.c  | 117 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 116 insertions(+), 54 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index d162836..2f23b0d 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -864,7 +864,6 @@ DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index cbcfd57..421dced 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1311,58 +1311,6 @@ void helper_msa_pckev_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
     }
 }
 
-
-void helper_msa_ilvev_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
-                         uint32_t ws, uint32_t wt)
-{
-    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
-    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
-    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
-
-    switch (df) {
-    case DF_BYTE:
-        pwd->b[15] = pws->b[14];
-        pwd->b[14] = pwt->b[14];
-        pwd->b[13] = pws->b[12];
-        pwd->b[12] = pwt->b[12];
-        pwd->b[11] = pws->b[10];
-        pwd->b[10] = pwt->b[10];
-        pwd->b[9]  = pws->b[8];
-        pwd->b[8]  = pwt->b[8];
-        pwd->b[7]  = pws->b[6];
-        pwd->b[6]  = pwt->b[6];
-        pwd->b[5]  = pws->b[4];
-        pwd->b[4]  = pwt->b[4];
-        pwd->b[3]  = pws->b[2];
-        pwd->b[2]  = pwt->b[2];
-        pwd->b[1]  = pws->b[0];
-        pwd->b[0]  = pwt->b[0];
-        break;
-    case DF_HALF:
-        pwd->h[7] = pws->h[6];
-        pwd->h[6] = pwt->h[6];
-        pwd->h[5] = pws->h[4];
-        pwd->h[4] = pwt->h[4];
-        pwd->h[3] = pws->h[2];
-        pwd->h[2] = pwt->h[2];
-        pwd->h[1] = pws->h[0];
-        pwd->h[0] = pwt->h[0];
-        break;
-    case DF_WORD:
-        pwd->w[3] = pws->w[2];
-        pwd->w[2] = pwt->w[2];
-        pwd->w[1] = pws->w[0];
-        pwd->w[0] = pwt->w[0];
-        break;
-    case DF_DOUBLE:
-        pwd->d[1] = pws->d[0];
-        pwd->d[0] = pwt->d[0];
-        break;
-    default:
-        assert(0);
-    }
-}
-
 void helper_msa_ilvl_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
                         uint32_t ws, uint32_t wt)
 {
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 101d2de..1526d24 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28991,6 +28991,106 @@ static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,
     tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
 }
 
+static inline void gen_ilvev_b(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt) {
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    uint64_t mask = (1ULL << 8) - 1;
+    mask |= mask << 16;
+    mask |= mask << 32;
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2], mask);
+    tcg_gen_shli_i64(t0, t0, 8);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], t1);
+
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shli_i64(t0, t0, 8);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+static inline void gen_ilvev_h(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt) {
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    uint64_t mask = (1ULL << 16) - 1;
+    mask |= mask << 32;
+
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2], mask);
+    tcg_gen_shli_i64(t0, t0, 16);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], t1);
+
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shli_i64(t0, t0, 16);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt) {
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    uint64_t mask = (1ULL << 32) - 1;
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2], mask);
+    tcg_gen_shli_i64(t0, t0, 32);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], t1);
+
+    tcg_gen_movi_i64(t1, 0);
+
+    tcg_gen_andi_i64(t0, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_or_i64(t1, t1, t0);
+    tcg_gen_andi_i64(t0, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shli_i64(t0, t0, 32);
+    tcg_gen_or_i64(t1, t1, t0);
+
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+static inline void gen_ilvev_d(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt) {
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2]);
+}
+
 static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
 {
 #define MASK_MSA_3R(op)    (MASK_MSA_MINOR(op) | (op & (0x7 << 23)))
@@ -29147,7 +29247,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_mod_s_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVEV_df:
-        gen_helper_msa_ilvev_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_ilvev_b(env, wd, ws, wt);
+            break;
+        case DF_HALF:
+            gen_ilvev_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvev_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvev_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
     case OPC_BINSR_df:
         gen_helper_msa_binsr_df(cpu_env, tdf, twd, tws, twt);
-- 
2.7.4
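
The mask-and-shift logic used by the byte variants in both patches can be checked outside of QEMU. The following is only a minimal host-side sketch, not code from the series: the names ilvod_b64, ilvev_b64, ilv_ref_b64 and ilv_b64_matches_reference are hypothetical, and it assumes that byte element 0 of each 64-bit half is the least significant byte, which is the layout the masks in gen_ilvod_b and gen_ilvev_b imply. It works on one 64-bit half of a vector register at a time and compares the result with an element-wise reference modelled on the removed helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Masks selecting the odd and even byte lanes of a 64-bit doubleword. */
#define ODD_BYTES  0xFF00FF00FF00FF00ULL
#define EVEN_BYTES 0x00FF00FF00FF00FFULL

/*
 * ILVOD.B on one doubleword (hypothetical helper name): the odd bytes of
 * wt move down into the even lanes, the odd bytes of ws stay in place.
 */
static uint64_t ilvod_b64(uint64_t ws, uint64_t wt)
{
    return ((wt & ODD_BYTES) >> 8) | (ws & ODD_BYTES);
}

/*
 * ILVEV.B on one doubleword (hypothetical helper name): the even bytes of
 * wt stay in place, the even bytes of ws move up into the odd lanes.
 */
static uint64_t ilvev_b64(uint64_t ws, uint64_t wt)
{
    return (wt & EVEN_BYTES) | ((ws & EVEN_BYTES) << 8);
}

/* Byte lane i of x, assuming lane 0 is the least significant byte. */
static uint8_t lane(uint64_t x, int i)
{
    return (uint8_t)(x >> (8 * i));
}

/*
 * Element-wise reference modelled on the removed helpers:
 * odd = 1 gives ILVOD.B, odd = 0 gives ILVEV.B.
 */
static uint64_t ilv_ref_b64(uint64_t ws, uint64_t wt, int odd)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i += 2) {
        r |= (uint64_t)lane(wt, i + odd) << (8 * i);
        r |= (uint64_t)lane(ws, i + odd) << (8 * (i + 1));
    }
    return r;
}

/* Returns 1 when both mask-and-shift variants agree with the reference. */
static int ilv_b64_matches_reference(uint64_t ws, uint64_t wt)
{
    return ilvod_b64(ws, wt) == ilv_ref_b64(ws, wt, 1) &&
           ilvev_b64(ws, wt) == ilv_ref_b64(ws, wt, 0);
}
```

For example, with ws = 0x1122334455667788 and wt = 0xAABBCCDDEEFF0099, ilvod_b64 yields 0x11AA33CC55EE7700 and ilvev_b64 yields 0x22BB44DD66FF8899. The halfword and word variants follow the same pattern with 16-bit and 32-bit masks and shifts.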