[PATCH] target/mips/translate: Simplify PCPYH using deposit_i64()

Philippe Mathieu-Daudé posted 1 patch 4 years, 9 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20210213001901.75562-1-f4bug@amsat.org
Maintainers: Jiaxun Yang <jiaxun.yang@flygoat.com>, "Philippe Mathieu-Daudé" <f4bug@amsat.org>, Aurelien Jarno <aurelien@aurel32.net>, Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
There is a newer version of this series
Posted by Philippe Mathieu-Daudé 4 years, 9 months ago
Simplify the translation of the PCPYH (Parallel Copy Halfword)
instruction by using multiple calls to tcg_gen_deposit_i64(),
which some TCG backends can optimize.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 target/mips/translate.c | 36 ++++++------------------------------
 1 file changed, 6 insertions(+), 30 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index a5cf1742a8b..5b31aa44f30 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -24786,36 +24786,12 @@ static void gen_mmi_pcpyh(DisasContext *ctx)
         tcg_gen_movi_i64(cpu_gpr[rd], 0);
         tcg_gen_movi_i64(cpu_mmr[rd], 0);
     } else {
-        TCGv_i64 t0 = tcg_temp_new();
-        TCGv_i64 t1 = tcg_temp_new();
-        uint64_t mask = (1ULL << 16) - 1;
-
-        tcg_gen_andi_i64(t0, cpu_gpr[rt], mask);
-        tcg_gen_movi_i64(t1, 0);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-
-        tcg_gen_mov_i64(cpu_gpr[rd], t1);
-
-        tcg_gen_andi_i64(t0, cpu_mmr[rt], mask);
-        tcg_gen_movi_i64(t1, 0);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-
-        tcg_gen_mov_i64(cpu_mmr[rd], t1);
-
-        tcg_temp_free(t0);
-        tcg_temp_free(t1);
+        for (int i = 0; i < 4; i++) {
+            tcg_gen_deposit_i64(cpu_gpr[rd],
+                                cpu_gpr[rd], cpu_gpr[rd], 8 * i, 8);
+            tcg_gen_deposit_i64(cpu_mmr[rd],
+                                cpu_mmr[rd], cpu_mmr[rd], 8 * i, 8);
+        }
     }
 }
 
-- 
2.26.2
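
For readers comparing the two forms, here is a minimal plain-C sketch of what the removed shift/or sequence computes for one 64-bit half, next to one way a deposit-based form could compute the same value. It is a model under stated assumptions, not the patch's code: deposit64_model() is a local stand-in for "insert the low len bits of a field into a base value at offset ofs", and pcpyh_shift_or() and pcpyh_deposit() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local model of a 64-bit deposit (an assumption for this sketch):
 * replace bits [ofs, ofs + len) of 'base' with the low 'len' bits of 'field'.
 */
static uint64_t deposit64_model(uint64_t base, unsigned ofs, unsigned len,
                                uint64_t field)
{
    assert(len > 0 && len < 64 && ofs + len <= 64);
    uint64_t mask = ((UINT64_C(1) << len) - 1) << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* Mirrors the removed and/shift/or sequence for one 64-bit half. */
static uint64_t pcpyh_shift_or(uint64_t src)
{
    uint64_t t0 = src & 0xffff;   /* keep the low halfword */
    uint64_t t1 = 0;

    t1 |= t0;                     /* halfword 0 */
    t0 <<= 16;
    t1 |= t0;                     /* halfword 1 */
    t0 <<= 16;
    t1 |= t0;                     /* halfword 2 */
    t0 <<= 16;
    t1 |= t0;                     /* halfword 3 */
    return t1;
}

/* Hypothetical deposit-based equivalent: copy the low halfword of src
 * into halfwords 1..3 (halfword 0 is already in place). */
static uint64_t pcpyh_deposit(uint64_t src)
{
    uint64_t dst = src;

    for (unsigned i = 1; i < 4; i++) {
        dst = deposit64_model(dst, 16 * i, 16, src);
    }
    return dst;
}
```

For the input 0x123456789abcdef0, both functions return 0xdef0def0def0def0: the low halfword repeated four times.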

Re: [PATCH] target/mips/translate: Simplify PCPYH using deposit_i64()
Posted by Philippe Mathieu-Daudé 4 years, 9 months ago
On 2/13/21 1:19 AM, Philippe Mathieu-Daudé wrote:
> Simplify the PCPYH (Parallel Copy Halfword) instruction by using
> multiple calls to deposit_i64() which can be optimized by some
> TCG backends.
> 
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>  target/mips/translate.c | 36 ++++++------------------------------
>  1 file changed, 6 insertions(+), 30 deletions(-)
> 
> diff --git a/target/mips/translate.c b/target/mips/translate.c
> index a5cf1742a8b..5b31aa44f30 100644
> --- a/target/mips/translate.c
> +++ b/target/mips/translate.c
> @@ -24786,36 +24786,12 @@ static void gen_mmi_pcpyh(DisasContext *ctx)
>          tcg_gen_movi_i64(cpu_gpr[rd], 0);
>          tcg_gen_movi_i64(cpu_mmr[rd], 0);
>      } else {
> -        TCGv_i64 t0 = tcg_temp_new();
> -        TCGv_i64 t1 = tcg_temp_new();
> -        uint64_t mask = (1ULL << 16) - 1;
> -
> -        tcg_gen_andi_i64(t0, cpu_gpr[rt], mask);
> -        tcg_gen_movi_i64(t1, 0);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -
> -        tcg_gen_mov_i64(cpu_gpr[rd], t1);
> -
> -        tcg_gen_andi_i64(t0, cpu_mmr[rt], mask);
> -        tcg_gen_movi_i64(t1, 0);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -
> -        tcg_gen_mov_i64(cpu_mmr[rd], t1);
> -
> -        tcg_temp_free(t0);
> -        tcg_temp_free(t1);
> +        for (int i = 0; i < 4; i++) {
> +            tcg_gen_deposit_i64(cpu_gpr[rd],
> +                                cpu_gpr[rd], cpu_gpr[rd], 8 * i, 8);
> +            tcg_gen_deposit_i64(cpu_mmr[rd],
> +                                cpu_mmr[rd], cpu_mmr[rd], 8 * i, 8);

Oops, sorry, please disregard this patch: wrong opcode.
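
The reply gives no detail beyond "wrong opcode". As a hedged illustration only, the plain-C sketch below models the loop as posted, assuming tcg_gen_deposit_i64(ret, arg1, arg2, ofs, len) inserts the low len bits of arg2 into arg1 at offset ofs. The names deposit_bits() and posted_loop_model() are hypothetical. Under that assumption the posted loop reads only the destination register and copies 8-bit fields, so it does not reproduce the halfword replication of the removed sequence.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local model of a 64-bit deposit (an assumption for this sketch):
 * replace bits [ofs, ofs + len) of 'base' with the low 'len' bits of 'field'.
 */
static uint64_t deposit_bits(uint64_t base, unsigned ofs, unsigned len,
                             uint64_t field)
{
    assert(len > 0 && len < 64 && ofs + len <= 64);
    uint64_t mask = ((UINT64_C(1) << len) - 1) << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* Models the posted loop for one 64-bit half: the same register is used
 * as destination, base and field, with 8-bit fields at offsets 0..24. */
static uint64_t posted_loop_model(uint64_t rd)
{
    for (unsigned i = 0; i < 4; i++) {
        rd = deposit_bits(rd, 8 * i, 8, rd);
    }
    return rd;
}
```

With rd = 0x123456789abcdef0 this model yields 0x12345678f0f0f0f0: the low byte is copied into the low four bytes and the upper 32 bits are left untouched, whereas the removed sequence would produce the low halfword of rt repeated four times.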