[PULL v2 77/91] target/arm: Avoid tcg_const_ptr in handle_vec_simd_sqshrn

Posted by Richard Henderson 2 years ago
It is easy enough to use mov instead of or-with-zero,
rather than relying on the optimizer to fold away the or.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2ad7c48901..082a8b82dd 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8459,7 +8459,7 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
     tcg_rn = tcg_temp_new_i64();
     tcg_rd = tcg_temp_new_i64();
     tcg_rd_narrowed = tcg_temp_new_i32();
-    tcg_final = tcg_const_i64(0);
+    tcg_final = tcg_temp_new_i64();
 
     if (round) {
         tcg_round = tcg_constant_i64(1ULL << (shift - 1));
@@ -8473,7 +8473,11 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
                                 false, is_u_shift, size+1, shift);
         narrowfn(tcg_rd_narrowed, cpu_env, tcg_rd);
         tcg_gen_extu_i32_i64(tcg_rd, tcg_rd_narrowed);
-        tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd, esize * i, esize);
+        if (i == 0) {
+            tcg_gen_mov_i64(tcg_final, tcg_rd);
+        } else {
+            tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd, esize * i, esize);
+        }
     }
 
     if (!is_q) {
-- 
2.34.1


Re: [PULL v2 77/91] target/arm: Avoid tcg_const_ptr in handle_vec_simd_sqshrn
Posted by Peter Maydell 1 year, 2 months ago
On Thu, 9 Mar 2023 at 20:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> It is easy enough to use mov instead of or-with-zero,
> rather than relying on the optimizer to fold away the or.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/tcg/translate-a64.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
> index 2ad7c48901..082a8b82dd 100644
> --- a/target/arm/tcg/translate-a64.c
> +++ b/target/arm/tcg/translate-a64.c
> @@ -8459,7 +8459,7 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
>      tcg_rn = tcg_temp_new_i64();
>      tcg_rd = tcg_temp_new_i64();
>      tcg_rd_narrowed = tcg_temp_new_i32();
> -    tcg_final = tcg_const_i64(0);
> +    tcg_final = tcg_temp_new_i64();
>
>      if (round) {
>          tcg_round = tcg_constant_i64(1ULL << (shift - 1));
> @@ -8473,7 +8473,11 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
>                                  false, is_u_shift, size+1, shift);
>          narrowfn(tcg_rd_narrowed, cpu_env, tcg_rd);
>          tcg_gen_extu_i32_i64(tcg_rd, tcg_rd_narrowed);
> -        tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd, esize * i, esize);
> +        if (i == 0) {
> +            tcg_gen_mov_i64(tcg_final, tcg_rd);
> +        } else {
> +            tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd, esize * i, esize);
> +        }

So, it turns out that this causes a regression:
https://gitlab.com/qemu-project/qemu/-/issues/2089

The change here is fine for the vector case, because when
we loop round, the subsequent deposit ops will overwrite
the bits of tcg_final above the initial element, whatever
they happen to be in tcg_rd. However, for the scalar case
we execute this loop only once, and so after this change
the high bits of the result are left as whatever they were
in tcg_rd, instead of being 0. If the narrow is a signed
one and the result was negative, those high bits will now
be 1 instead of the 0 they should be.
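
For example (illustrative values): a scalar SQSHRN narrowing
-1 to a 16-bit element would leave tcg_final holding
0x00000000ffffffff after the mov, where the correct scalar
result is 0x000000000000ffff.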

Using
 tcg_gen_extract_i64(tcg_final, tcg_rd, 0, esize);
instead of the tcg_gen_mov_i64() should fix this.
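
That is, the loop body would become something like this
(a sketch, untested; the actual patch may differ):

    if (i == 0) {
        /* First element: write it and zero the bits above esize. */
        tcg_gen_extract_i64(tcg_final, tcg_rd, 0, esize);
    } else {
        tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd,
                            esize * i, esize);
    }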

I'll send a patch later this afternoon.

thanks
-- PMM