From nobody Mon Apr 29 04:44:20 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1554218334452897.3889565919608; Tue, 2 Apr 2019 08:18:54 -0700 (PDT) Received: from localhost ([127.0.0.1]:59892 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hBLBb-000785-7z for importer@patchew.org; Tue, 02 Apr 2019 11:18:51 -0400 Received: from eggs.gnu.org ([209.51.188.92]:43855) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hBL9R-00062E-9N for qemu-devel@nongnu.org; Tue, 02 Apr 2019 11:16:38 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hBL9P-0003FP-TO for qemu-devel@nongnu.org; Tue, 02 Apr 2019 11:16:37 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:44651 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hBL9P-0001cv-Fi for qemu-devel@nongnu.org; Tue, 02 Apr 2019 11:16:35 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 014791A2139; Tue, 2 Apr 2019 17:15:26 +0200 (CEST) Received: from rtrkw310-lin.domain.local (rtrkw310-lin.domain.local [10.10.13.97]) by mail.rt-rk.com (Postfix) with ESMTPSA id CC1581A2172; Tue, 2 Apr 2019 17:15:25 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Mateja Marjanovic To: qemu-devel@nongnu.org Date: Tue, 2 Apr 2019 17:15:20 +0200 Message-Id: <1554218121-26766-2-git-send-email-mateja.marjanovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1554218121-26766-1-git-send-email-mateja.marjanovic@rt-rk.com> References: <1554218121-26766-1-git-send-email-mateja.marjanovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 89.216.37.149 Subject: [Qemu-devel] [PATCH v5 1/2] target/mips: Optimize ILVOD. MSA instructions X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: arikalo@wavecomp.com, richard.henderson@linaro.org, amarkovic@wavecomp.com, aurelien@aurel32.net Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Mateja Marjanovic Optimize set of MSA instructions ILVOD, using directly tcg registers and performing logic on them instead of using helpers. instr || before || after =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D ilvod.b || 117.50 ms || 24.99 ms ilvod.h || 93.16 ms || 24.27 ms ilvod.w || 119.90 ms || 24.81 ms ilvod.d || 43.01 ms || 20.69 ms Signed-off-by: Mateja Marjanovic --- target/mips/helper.h | 1 - target/mips/msa_helper.c | 7 ---- target/mips/translate.c | 107 +++++++++++++++++++++++++++++++++++++++++++= +++- 3 files changed, 106 insertions(+), 9 deletions(-) diff --git a/target/mips/helper.h b/target/mips/helper.h index 2863f60..02e16c7 100644 --- a/target/mips/helper.h +++ b/target/mips/helper.h @@ -865,7 +865,6 @@ DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i3= 2) DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32) -DEF_HELPER_5(msa_ilvod_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32) diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c index 6c57281..a7ea6aa 100644 --- a/target/mips/msa_helper.c +++ b/target/mips/msa_helper.c @@ -1206,13 +1206,6 @@ MSA_FN_DF(ilvr_df) MSA_FN_DF(ilvev_df) #undef MSA_DO =20 -#define MSA_DO(DF) \ - do { \ - pwx->DF[2*i] =3D pwt->DF[2*i+1]; \ - pwx->DF[2*i+1] =3D pws->DF[2*i+1]; \ - } while (0) -MSA_FN_DF(ilvod_df) -#undef MSA_DO #undef MSA_LOOP_COND =20 #define MSA_LOOP_COND(DF) \ diff --git a/target/mips/translate.c b/target/mips/translate.c index bba8b6c..04406d6 100644 --- a/target/mips/translate.c +++ b/target/mips/translate.c @@ -28884,6 +28884,96 @@ static void gen_msa_bit(CPUMIPSState *env, DisasCo= ntext *ctx) tcg_temp_free_i32(tws); } =20 +/* + * [MSA] ILVOD.B wd, ws, wt + * + * Vector Interleave Odd (byte data elements) + * + */ +static inline void gen_ilvod_b(CPUMIPSState *env, uint32_t wd, + uint32_t ws, uint32_t wt) +{ + TCGv_i64 t1 =3D tcg_temp_new_i64(); + TCGv_i64 t2 =3D tcg_temp_new_i64(); + const uint64_t mask =3D 0xff00ff00ff00ff00ULL; + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask); + tcg_gen_shri_i64(t1, t1, 8); + tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask); + tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2); + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask); + tcg_gen_shri_i64(t1, t1, 8); + tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask); + tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +/* + * [MSA] ILVOD.H wd, ws, wt + * + * Vector Interleave Odd (halfword data elements) + * + */ +static inline void gen_ilvod_h(CPUMIPSState *env, uint32_t wd, + uint32_t ws, uint32_t wt) +{ + TCGv_i64 t1 =3D tcg_temp_new_i64(); + TCGv_i64 t2 =3D tcg_temp_new_i64(); + const uint64_t mask =3D 0xffff0000ffff0000ULL; + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask); + tcg_gen_shri_i64(t1, t1, 16); + tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask); + tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2); + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask); + tcg_gen_shri_i64(t1, t1, 16); + tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask); + tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +/* + * [MSA] ILVOD.W wd, ws, wt + * + * Vector Interleave Odd (word data elements) + * + */ +static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd, + uint32_t ws, uint32_t wt) +{ + TCGv_i64 t1 =3D tcg_temp_new_i64(); + const uint64_t mask =3D 0xffffffff00000000ULL; + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask); + tcg_gen_shri_i64(t1, t1, 32); + tcg_gen_deposit_i64(msa_wr_d[wd * 2], msa_wr_d[ws * 2], t1, 0, 32); + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask); + tcg_gen_shri_i64(t1, t1, 32); + tcg_gen_deposit_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1], t1, 0,= 32); + + tcg_temp_free_i64(t1); +} + +/* + * [MSA] ILVOD.D wd, ws, wt + * + * Vector Interleave Odd (double data elements) + * + */ +static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd, + uint32_t ws, uint32_t wt) +{ + tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]); + tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]); +} + static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx) { #define MASK_MSA_3R(op) (MASK_MSA_MINOR(op) | (op & (0x7 << 23))) @@ -29055,7 +29145,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasCon= text *ctx) gen_helper_msa_mod_u_df(cpu_env, tdf, twd, tws, twt); break; case OPC_ILVOD_df: - gen_helper_msa_ilvod_df(cpu_env, tdf, twd, tws, twt); + switch (df) { + case DF_BYTE: + gen_ilvod_b(env, wd, ws, wt); + break; + case DF_HALF: + gen_ilvod_h(env, wd, ws, wt); + break; + case DF_WORD: + gen_ilvod_w(env, wd, ws, wt); + break; + case DF_DOUBLE: + gen_ilvod_d(env, wd, ws, wt); + break; + default: + assert(0); + } break; =20 case OPC_DOTP_S_df: --=20 2.7.4 From nobody Mon Apr 29 04:44:20 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 155421848437115.746934524221274; Tue, 2 Apr 2019 08:21:24 -0700 (PDT) Received: from localhost ([127.0.0.1]:60510 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hBLE1-0000kb-7D for importer@patchew.org; Tue, 02 Apr 2019 11:21:21 -0400 Received: from eggs.gnu.org ([209.51.188.92]:43850) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hBL9R-000628-72 for qemu-devel@nongnu.org; Tue, 02 Apr 2019 11:16:38 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hBL9P-0003FU-Tl for qemu-devel@nongnu.org; Tue, 02 Apr 2019 11:16:37 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:44687 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hBL9P-0001cw-GA for qemu-devel@nongnu.org; Tue, 02 Apr 2019 11:16:35 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 111381A227B; Tue, 2 Apr 2019 17:15:26 +0200 (CEST) Received: from rtrkw310-lin.domain.local (rtrkw310-lin.domain.local [10.10.13.97]) by mail.rt-rk.com (Postfix) with ESMTPSA id D797A1A21C5; Tue, 2 Apr 2019 17:15:25 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Mateja Marjanovic To: qemu-devel@nongnu.org Date: Tue, 2 Apr 2019 17:15:21 +0200 Message-Id: <1554218121-26766-3-git-send-email-mateja.marjanovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1554218121-26766-1-git-send-email-mateja.marjanovic@rt-rk.com> References: <1554218121-26766-1-git-send-email-mateja.marjanovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 89.216.37.149 Subject: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: arikalo@wavecomp.com, richard.henderson@linaro.org, amarkovic@wavecomp.com, aurelien@aurel32.net Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Mateja Marjanovic Optimize set of MSA instructions ILVEV, using directly tcg registers and performing logic on them instead of using helpers. In the following table, the first column is the performance before this patch. The second represents the performance, after converting from helpers to tcg, but without using tcg_gen_deposit function. The third one is the solution which is implemented in this patch. instr || before || no-deposit || with-deposit =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D ilvev.b || 126.92 ms || 24.52 ms || 24.43 ms ilvev.h || 93.67 ms || 23.92 ms || 23.86 ms ilvev.w || 117.86 ms || 23.83 ms || 22.17 ms ilvev.d || 45.49 ms || 19.74 ms || 19.71 ms The solution with deposit is suggested by Richard Henderson. Signed-off-by: Mateja Marjanovic Suggested-by: Richard Henderson --- target/mips/helper.h | 1 - target/mips/msa_helper.c | 9 ---- target/mips/translate.c | 105 +++++++++++++++++++++++++++++++++++++++++++= +++- 3 files changed, 104 insertions(+), 11 deletions(-) diff --git a/target/mips/helper.h b/target/mips/helper.h index 02e16c7..82f6a40 100644 --- a/target/mips/helper.h +++ b/target/mips/helper.h @@ -864,7 +864,6 @@ DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i3= 2) DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32) -DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32) DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32) diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c index a7ea6aa..d5c3842 100644 --- a/target/mips/msa_helper.c +++ b/target/mips/msa_helper.c @@ -1197,15 +1197,6 @@ MSA_FN_DF(ilvl_df) } while (0) MSA_FN_DF(ilvr_df) #undef MSA_DO - -#define MSA_DO(DF) \ - do { \ - pwx->DF[2*i] =3D pwt->DF[2*i]; \ - pwx->DF[2*i+1] =3D pws->DF[2*i]; \ - } while (0) -MSA_FN_DF(ilvev_df) -#undef MSA_DO - #undef MSA_LOOP_COND =20 #define MSA_LOOP_COND(DF) \ diff --git a/target/mips/translate.c b/target/mips/translate.c index 04406d6..e26c6a6 100644 --- a/target/mips/translate.c +++ b/target/mips/translate.c @@ -28974,6 +28974,94 @@ static inline void gen_ilvod_d(CPUMIPSState *env, = uint32_t wd, tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]); } =20 +/* + * [MSA] ILVEV.B wd, ws, wt + * + * Vector Interleave Even (byte data elements) + * + */ +static inline void gen_ilvev_b(CPUMIPSState *env, uint32_t wd, + uint32_t ws, uint32_t wt) +{ + TCGv_i64 t1 =3D tcg_temp_new_i64(); + TCGv_i64 t2 =3D tcg_temp_new_i64(); + const uint64_t mask =3D 0x00ff00ff00ff00ffULL; + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask); + tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask); + tcg_gen_shli_i64(t2, t2, 8); + tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2); + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask); + tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask); + tcg_gen_shli_i64(t2, t2, 8); + tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +/* + * [MSA] ILVEV.H wd, ws, wt + * + * Vector Interleave Even (halfword data elements) + * + */ +static inline void gen_ilvev_h(CPUMIPSState *env, uint32_t wd, + uint32_t ws, uint32_t wt) +{ + TCGv_i64 t1 =3D tcg_temp_new_i64(); + TCGv_i64 t2 =3D tcg_temp_new_i64(); + const uint64_t mask =3D 0x0000ffff0000ffffULL; + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask); + tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask); + tcg_gen_shli_i64(t2, t2, 16); + tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2); + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask); + tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask); + tcg_gen_shli_i64(t2, t2, 16); + tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +/* + * [MSA] ILVEV.W wd, ws, wt + * + * Vector Interleave Even (word data elements) + * + */ +static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd, + uint32_t ws, uint32_t wt) +{ + TCGv_i64 t1 =3D tcg_temp_new_i64(); + const uint64_t mask =3D 0x00000000ffffffffULL; + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask); + tcg_gen_deposit_i64(msa_wr_d[wd * 2], t1, msa_wr_d[ws * 2], 32, 32); + + tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask); + tcg_gen_deposit_i64(msa_wr_d[wd * 2 + 1], t1, msa_wr_d[ws * 2 + 1], 32= , 32); + + tcg_temp_free_i64(t1); +} + +/* + * [MSA] ILVEV.D wd, ws, wt + * + * Vector Interleave Even (Double data elements) + * + */ +static inline void gen_ilvev_d(CPUMIPSState *env, uint32_t wd, + uint32_t ws, uint32_t wt) +{ + tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]); + tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2]); +} + static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx) { #define MASK_MSA_3R(op) (MASK_MSA_MINOR(op) | (op & (0x7 << 23))) @@ -29130,7 +29218,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasCon= text *ctx) gen_helper_msa_mod_s_df(cpu_env, tdf, twd, tws, twt); break; case OPC_ILVEV_df: - gen_helper_msa_ilvev_df(cpu_env, tdf, twd, tws, twt); + switch (df) { + case DF_BYTE: + gen_ilvev_b(env, wd, ws, wt); + break; + case DF_HALF: + gen_ilvev_h(env, wd, ws, wt); + break; + case DF_WORD: + gen_ilvev_w(env, wd, ws, wt); + break; + case DF_DOUBLE: + gen_ilvev_d(env, wd, ws, wt); + break; + default: + assert(0); + } break; case OPC_BINSR_df: gen_helper_msa_binsr_df(cpu_env, tdf, twd, tws, twt); --=20 2.7.4