[Qemu-devel] [PATCH 0/2] target/i386: Implement all TBM instructions

Richard Henderson posted 2 patches 6 years, 8 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170711212123.4368-1-rth@twiddle.net
Test FreeBSD passed
Test checkpatch passed
Test docker passed
Test s390x passed
target/i386/cc_helper_template.h |  18 +++++
target/i386/cpu.h                |   7 +-
target/i386/cc_helper.c          |  28 ++++++-
target/i386/cpu.c                |   3 +-
target/i386/translate.c          | 163 +++++++++++++++++++++++++++++++++++----
5 files changed, 199 insertions(+), 20 deletions(-)
[Qemu-devel] [PATCH 0/2] target/i386: Implement all TBM instructions
Posted by Richard Henderson 6 years, 8 months ago
These are general purpose bit manipulation instructions akin
to the BMI1 and BMI2 instructions.  This is an AMD extension
and uses the XOP instruction prefix.

I am in the process of trying to run the gcc testsuite with -mtbm,
with and without the patchset, to see (1) if the new insns get used
and (2) that they run ok.

Please review.


r~


Richard Henderson (2):
  target/i386: Decode AMD XOP prefix
  target/i386: Implement all TBM instructions

 target/i386/cc_helper_template.h |  18 +++++
 target/i386/cpu.h                |   7 +-
 target/i386/cc_helper.c          |  28 ++++++-
 target/i386/cpu.c                |   3 +-
 target/i386/translate.c          | 163 +++++++++++++++++++++++++++++++++++----
 5 files changed, 199 insertions(+), 20 deletions(-)

-- 
2.9.4


Re: [Qemu-devel] [PATCH 0/2] target/i386: Implement all TBM instructions
Posted by Richard Henderson 6 years, 8 months ago
On 07/11/2017 11:21 AM, Richard Henderson wrote:
> I am in the process of trying to run the gcc testsuite with -mtbm,
> with and without the patchset, to see (1) if the new insns get used
> and (2) that they run ok.

FWIW, make check-gcc RUNTESTFLAGS='--target_board=unix/-mtbm execute.exp' shows 
204 failures on a host that does not support TBM, so the extension is being 
used.  A browse through exactly one of these used only bextr.  Running the same 
tests with dejagnu using qemu-x86_64 -cpu qemu64,+tbm shows zero failures.


r~

Re: [Qemu-devel] [PATCH 0/2] target/i386: Implement all TBM instructions
Posted by Ricardo Ribalda Delgado 6 years, 8 months ago
Hi Richard

Thanks for your patch! I have applied it to my tree, but i still get
SIGSEGV. I think that I might have found the problem. It seems to be
related to the bmi instruction blsr, which seems to be not properly
implemented.

On this example:

#include <stdio.h>

int test_blsr(int val){

return (val & (val - 1));
}


int main(int argc, char *argv) {
volatile int val = 4096;

fprintf(stdout, "%d\n", test_blsr(val));

return 0;
}

When it is compiled with -march=bdver4 -static -O3 test_blsr , the
compiler produces:

0000000000400af0 <test_blsr>:
  400af0:       c4 e2 78 f3 cf          blsr   %edi,%eax
  400af5:       c3                      retq
  400af6:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  400afd:       00 00 00

If I run the emulator: /tmp/qemu/x86_64-linux-user/qemu-x86_64 -cpu
Haswell ./a.out

The function prints 4096

A fast look in the code shows that
https://github.com/qemu/qemu/blob/master/target/i386/translate.c#L4028
does not really match
https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#BMI1_.28Bit_Manipulation_Instruction_Set_1.29

It appears that case 1 and case 3 are swapped. I tried to fix it, but
with no results :(. Anyway, the wiki could also be wrong.

What is sure is that the code produces different results on qemu than
on the target, which is not good


Thanks again for your help!


On Wed, Jul 12, 2017 at 6:04 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 07/11/2017 11:21 AM, Richard Henderson wrote:
>>
>> I am in the process of trying to run the gcc testsuite with -mtbm,
>> with and without the patchset, to see (1) if the new insns get used
>> and (2) that they run ok.
>
>
> FWIW, make check-gcc RUNTESTFLAGS='--target_board=unix/-mtbm execute.exp'
> shows 204 failures on a host that does not support TBM, so the extension is
> being used.  A browse through exactly one of these used only bextr.  Running
> the same tests with dejagnu using qemu-x86_64 -cpu qemu64,+tbm shows zero
> failures.
>
>
> r~



-- 
Ricardo Ribalda

Re: [Qemu-devel] [PATCH 0/2] target/i386: Implement all TBM instructions
Posted by Richard Henderson 6 years, 8 months ago
On 07/12/2017 03:28 AM, Ricardo Ribalda Delgado wrote:
> Hi Richard
> 
> Thanks for your patch! I have applied it to my tree, but i still get
> SIGSEGV. I think that I might have found the problem. It seems to be
> related to the bmi instruction blsr, which seems to be not properly
> implemented.

You're absolutely right.


r~

[Qemu-devel] [PATCH] target/i386: Fix BLSR and BLSI
Posted by Richard Henderson 6 years, 8 months ago
The implementation of these two instructions was swapped.
At the same time, unify the setup of eflags for the insn group.

Reported-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/i386/translate.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 8365a6d..087a2e6 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4029,36 +4029,27 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     goto illegal_op;
                 }
                 ot = mo_64_32(s->dflag);
-                gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
+                gen_ldst_modrm(env, s, modrm, ot, OR_TMP1, 0);
 
                 switch (reg & 7) {
                 case 1: /* blsr By,Ey */
-                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
+                    tcg_gen_subi_tl(cpu_T0, cpu_T1, 1);
                     tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
                     gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
-                    gen_op_update2_cc();
-                    set_cc_op(s, CC_OP_BMILGB + ot);
                     break;
-
                 case 2: /* blsmsk By,Ey */
-                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
-                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
-                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_cc_src);
-                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
-                    set_cc_op(s, CC_OP_BMILGB + ot);
+                    tcg_gen_subi_tl(cpu_T0, cpu_T1, 1);
+                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
                     break;
-
                 case 3: /* blsi By, Ey */
-                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
-                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
-                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_cc_src);
-                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
-                    set_cc_op(s, CC_OP_BMILGB + ot);
+                    tcg_gen_neg_tl(cpu_T0, cpu_T1);
+                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
                     break;
-
                 default:
                     goto unknown_op;
                 }
+                gen_op_update2_cc();
+                set_cc_op(s, CC_OP_BMILGB + ot);
                 break;
 
             default:
-- 
2.9.4


Re: [Qemu-devel] [PATCH] target/i386: Fix BLSR and BLSI
Posted by Ricardo Ribalda Delgado 6 years, 8 months ago
Hi Richard

Thanks again!, When I apply this patch I get the following error:

/tmp/qemu/tcg/tcg.c:2042: tcg fatal error

Regards!

On Wed, Jul 12, 2017 at 8:45 PM, Richard Henderson <rth@twiddle.net> wrote:
> The implementation of these two instructions was swapped.
> At the same time, unify the setup of eflags for the insn group.
>
> Reported-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target/i386/translate.c | 25 ++++++++-----------------
>  1 file changed, 8 insertions(+), 17 deletions(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 8365a6d..087a2e6 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -4029,36 +4029,27 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
>                      goto illegal_op;
>                  }
>                  ot = mo_64_32(s->dflag);
> -                gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
> +                gen_ldst_modrm(env, s, modrm, ot, OR_TMP1, 0);
>
>                  switch (reg & 7) {
>                  case 1: /* blsr By,Ey */
> -                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
> +                    tcg_gen_subi_tl(cpu_T0, cpu_T1, 1);
>                      tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>                      gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
> -                    gen_op_update2_cc();
> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>                      break;
> -
>                  case 2: /* blsmsk By,Ey */
> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
> -                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_cc_src);
> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
> -                    set_cc_op(s, CC_OP_BMILGB + ot);
> +                    tcg_gen_subi_tl(cpu_T0, cpu_T1, 1);
> +                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
>                      break;
> -
>                  case 3: /* blsi By, Ey */
> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
> -                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_cc_src);
> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
> -                    set_cc_op(s, CC_OP_BMILGB + ot);
> +                    tcg_gen_neg_tl(cpu_T0, cpu_T1);
> +                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>                      break;
> -
>                  default:
>                      goto unknown_op;
>                  }
> +                gen_op_update2_cc();
> +                set_cc_op(s, CC_OP_BMILGB + ot);
>                  break;
>
>              default:
> --
> 2.9.4
>



-- 
Ricardo Ribalda

Re: [Qemu-devel] [PATCH] target/i386: Fix BLSR and BLSI
Posted by Richard Henderson 6 years, 8 months ago
On 07/12/2017 08:58 AM, Ricardo Ribalda Delgado wrote:
> Hi Richard
> 
> Thanks again!, When I apply this patch I get the following error:
> 
> /tmp/qemu/tcg/tcg.c:2042: tcg fatal error

Bah.  I misremembered that OR_TMP1 is unusable in this context.


r~

Re: [Qemu-devel] [PATCH] target/i386: Fix BLSR and BLSI
Posted by Ricardo Ribalda Delgado 6 years, 8 months ago
This seems to work fine with the example. But my app still throughs sigsegv :(

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 2c64d2b71ec4..564b9c6057c2 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4033,32 +4033,23 @@ static void gen_sse(CPUX86State *env,
DisasContext *s, int b,

                 switch (reg & 7) {
                 case 1: /* blsr By,Ey */
-                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
+                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
                     tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
                     gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
-                    gen_op_update2_cc();
-                    set_cc_op(s, CC_OP_BMILGB + ot);
                     break;
-
                 case 2: /* blsmsk By,Ey */
-                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
-                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
-                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_cc_src);
-                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
-                    set_cc_op(s, CC_OP_BMILGB + ot);
+                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
+                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
                     break;
-
                 case 3: /* blsi By, Ey */
-                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
-                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
-                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_cc_src);
-                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
-                    set_cc_op(s, CC_OP_BMILGB + ot);
+                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
+                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
                     break;
-
                 default:
                     goto unknown_op;
                 }
+                gen_op_update2_cc();
+                set_cc_op(s, CC_OP_BMILGB + ot);
                 break;

             default:

On Wed, Jul 12, 2017 at 9:12 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 07/12/2017 08:58 AM, Ricardo Ribalda Delgado wrote:
>>
>> Hi Richard
>>
>> Thanks again!, When I apply this patch I get the following error:
>>
>> /tmp/qemu/tcg/tcg.c:2042: tcg fatal error
>
>
> Bah.  I misremembered that OR_TMP1 is unusable in this context.
>
>
> r~



-- 
Ricardo Ribalda

Re: [Qemu-devel] [PATCH] target/i386: Fix BLSR and BLSI
Posted by Ricardo Ribalda Delgado 5 years, 9 months ago
Hi Richard,

I cannot find this patch on qemu master branch. Do you need any help
to get this done?

Thanks!
On Wed, Jul 12, 2017 at 8:45 PM Richard Henderson <rth@twiddle.net> wrote:
>
> The implementation of these two instructions was swapped.
> At the same time, unify the setup of eflags for the insn group.
>
> Reported-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target/i386/translate.c | 25 ++++++++-----------------
>  1 file changed, 8 insertions(+), 17 deletions(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 8365a6d..087a2e6 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -4029,36 +4029,27 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
>                      goto illegal_op;
>                  }
>                  ot = mo_64_32(s->dflag);
> -                gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
> +                gen_ldst_modrm(env, s, modrm, ot, OR_TMP1, 0);
>
>                  switch (reg & 7) {
>                  case 1: /* blsr By,Ey */
> -                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
> +                    tcg_gen_subi_tl(cpu_T0, cpu_T1, 1);
>                      tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>                      gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
> -                    gen_op_update2_cc();
> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>                      break;
> -
>                  case 2: /* blsmsk By,Ey */
> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
> -                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_cc_src);
> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
> -                    set_cc_op(s, CC_OP_BMILGB + ot);
> +                    tcg_gen_subi_tl(cpu_T0, cpu_T1, 1);
> +                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
>                      break;
> -
>                  case 3: /* blsi By, Ey */
> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
> -                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_cc_src);
> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
> -                    set_cc_op(s, CC_OP_BMILGB + ot);
> +                    tcg_gen_neg_tl(cpu_T0, cpu_T1);
> +                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>                      break;
> -
>                  default:
>                      goto unknown_op;
>                  }
> +                gen_op_update2_cc();
> +                set_cc_op(s, CC_OP_BMILGB + ot);
>                  break;
>
>              default:
> --
> 2.9.4
>


-- 
Ricardo Ribalda

Re: [Qemu-devel] [PATCH] target/i386: Fix BLSR and BLSI
Posted by Paolo Bonzini 5 years, 9 months ago
On 06/06/2018 11:13, Ricardo Ribalda Delgado wrote:
> Hi Richard,
> 
> I cannot find this patch on qemu master branch. Do you need any help
> to get this done?

I queued it now, thanks for the reminder!

Paolo


> Thanks!
> On Wed, Jul 12, 2017 at 8:45 PM Richard Henderson <rth@twiddle.net> wrote:
>>
>> The implementation of these two instructions was swapped.
>> At the same time, unify the setup of eflags for the insn group.
>>
>> Reported-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  target/i386/translate.c | 25 ++++++++-----------------
>>  1 file changed, 8 insertions(+), 17 deletions(-)
>>
>> diff --git a/target/i386/translate.c b/target/i386/translate.c
>> index 8365a6d..087a2e6 100644
>> --- a/target/i386/translate.c
>> +++ b/target/i386/translate.c
>> @@ -4029,36 +4029,27 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
>>                      goto illegal_op;
>>                  }
>>                  ot = mo_64_32(s->dflag);
>> -                gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
>> +                gen_ldst_modrm(env, s, modrm, ot, OR_TMP1, 0);
>>
>>                  switch (reg & 7) {
>>                  case 1: /* blsr By,Ey */
>> -                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
>> +                    tcg_gen_subi_tl(cpu_T0, cpu_T1, 1);
>>                      tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>>                      gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
>> -                    gen_op_update2_cc();
>> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>>                      break;
>> -
>>                  case 2: /* blsmsk By,Ey */
>> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
>> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
>> -                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_cc_src);
>> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
>> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>> +                    tcg_gen_subi_tl(cpu_T0, cpu_T1, 1);
>> +                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
>>                      break;
>> -
>>                  case 3: /* blsi By, Ey */
>> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
>> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
>> -                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_cc_src);
>> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
>> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>> +                    tcg_gen_neg_tl(cpu_T0, cpu_T1);
>> +                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>>                      break;
>> -
>>                  default:
>>                      goto unknown_op;
>>                  }
>> +                gen_op_update2_cc();
>> +                set_cc_op(s, CC_OP_BMILGB + ot);
>>                  break;
>>
>>              default:
>> --
>> 2.9.4
>>
> 
> 


[Qemu-devel] [PATCH v2] target/i386: Fix BLSR and BLSI
Posted by Richard Henderson 6 years, 8 months ago
The implementation of these two instructions was swapped.
At the same time, unify the setup of eflags for the insn group.

Reported-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/i386/translate.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 9d5f1c3..69d3787 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4031,34 +4031,26 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 ot = mo_64_32(s->dflag);
                 gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
 
+                tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
                 switch (reg & 7) {
                 case 1: /* blsr By,Ey */
-                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
+                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
                     tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
-                    gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
-                    gen_op_update2_cc();
-                    set_cc_op(s, CC_OP_BMILGB + ot);
                     break;
-
                 case 2: /* blsmsk By,Ey */
-                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
-                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
-                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_cc_src);
-                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
-                    set_cc_op(s, CC_OP_BMILGB + ot);
+                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
+                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
                     break;
-
                 case 3: /* blsi By, Ey */
-                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
-                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
-                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_cc_src);
-                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
-                    set_cc_op(s, CC_OP_BMILGB + ot);
+                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
+                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
                     break;
-
                 default:
                     goto unknown_op;
                 }
+                tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
+                gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
+                set_cc_op(s, CC_OP_BMILGB + ot);
                 break;
 
             default:
-- 
2.9.4


Re: [Qemu-devel] [PATCH v2] target/i386: Fix BLSR and BLSI
Posted by Ricardo Ribalda Delgado 6 years, 8 months ago
Hi Richard

The simple example works as expected, but my big application
(gobject-introspection) still crashes with sigsegv :(.

it seems to be something related to the bmi and tbm instructions. If I
disable them in gcc ( -mno-bmi -mno-tbm), the application
runs ok.

A look at qemu's code does not show anything obvious, but I am not
that familiar with qemu source yet to find something like this through
static analysis.

My plan (as soon as I have some time) is to create a small set of apps
to validate bmi/tbm/ (Are you aware of something already existing for
this?)
My stupid guess is that maybe the ops are switched, or the flags are
not properly modified.

If you want, I can share the application that crashes with you, just
be aware that the number of dependencies is considerable.

BTW I can only run the gdb stub on version 2.8.0. On git HEAD I am getting only:

Quit
(gdb) c
Continuing.
warning: Remote failure reply: E22

Program stopped.
0x00000040017bac07 in ?? ()
(gdb) c
Continuing.



Thanks for your help, it is greatly appreciated!

On Wed, Jul 12, 2017 at 9:29 PM, Richard Henderson <rth@twiddle.net> wrote:
> The implementation of these two instructions was swapped.
> At the same time, unify the setup of eflags for the insn group.
>
> Reported-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target/i386/translate.c | 26 +++++++++-----------------
>  1 file changed, 9 insertions(+), 17 deletions(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 9d5f1c3..69d3787 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -4031,34 +4031,26 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
>                  ot = mo_64_32(s->dflag);
>                  gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
>
> +                tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
>                  switch (reg & 7) {
>                  case 1: /* blsr By,Ey */
> -                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
> +                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
>                      tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
> -                    gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
> -                    gen_op_update2_cc();
> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>                      break;
> -
>                  case 2: /* blsmsk By,Ey */
> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
> -                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_cc_src);
> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
> -                    set_cc_op(s, CC_OP_BMILGB + ot);
> +                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
> +                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
>                      break;
> -
>                  case 3: /* blsi By, Ey */
> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
> -                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_cc_src);
> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
> -                    set_cc_op(s, CC_OP_BMILGB + ot);
> +                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
> +                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>                      break;
> -
>                  default:
>                      goto unknown_op;
>                  }
> +                tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
> +                gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
> +                set_cc_op(s, CC_OP_BMILGB + ot);
>                  break;
>
>              default:
> --
> 2.9.4
>



-- 
Ricardo Ribalda

Re: [Qemu-devel] [PATCH v2] target/i386: Fix BLSR and BLSI
Posted by Ricardo Ribalda Delgado 6 years, 8 months ago
Hi again

Some progress here, I think that I have found a bug in andn, I have
already send a patch.

I have made a rudimentary testcase for bmi. I will try tomorrow o
build something similar for tbm.

For reference, I am using this script:

for a in $(seq 0 255); do for b in $(seq 0 255); do for c in $(seq 0
255); do /tmp/qemu/x86_64-linux-user/qemu-x86_64 -cpu qemu64,+bmi1
./a.out $a $b $c >/tmp/res.qemu; ./pc $a $b $c > /tmp/res.pc; if !
diff  /tmp/res.pc /tmp/res.qemu; then echo $a $b $c; fi ; done ; done
; done

with this build options
gcc kk.c -mbmi -O3 -Wall
gcc kk.c -march=native -O3 -Wall -o pc

and this code:

#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>


long test_blsr(long val){

return (val & (val - 1));
}

long test_blsi(long val){

return (val & (-val));
}

long test_blmsk(long val){

return (val ^ (val -1));
}

long test_andn(long v1, long v2){

return (~v1 & v2);
}

long test_tzcnt(long val) {
#ifdef PC
return val ? __builtin_ctz(val) : 32;
#else
return __tzcnt_u32(val);
#endif
}

long test_bextr(long src, unsigned char start, unsigned char len) {
#ifdef PC
return (src >> start) & ((1 << len)-1);
#else
return __bextr_u32(src, start | len <<8);
#endif
}

int main(int argc, char *argv[]) {
long op1, op2, op3;
long ret;

if (argc < 4) {
fprintf(stderr, "use %s op1 op2 op3\n", argv[0]);
return -1;
}
op1 = strtoul(argv[1], NULL, 0);
op2 = strtoul(argv[2], NULL, 0);
op3 = strtoul(argv[3], NULL, 0);

fprintf(stdout, "op 1 %ld (0x%lx)\n", op1, op1);
fprintf(stdout, "op 2 %ld (0x%lx)\n", op2, op2);

ret = test_blsr(op1);
fprintf(stdout, "blsr %ld (0x%lx)\n",ret, ret);

ret = test_blsi(op1);
fprintf(stdout, "blsi %ld (0x%lx)\n",ret, ret);

ret = test_blmsk(op1);
fprintf(stdout, "blmsk %ld (0x%lx)\n",ret, ret);

ret = test_andn(op1,op2);
fprintf(stdout, "andn %ld (0x%lx)\n",ret, ret);

ret = test_tzcnt(op1);
fprintf(stdout, "tzcnt %ld (0x%lx)\n",ret, ret);

ret = test_bextr(op1, op2, op3);
fprintf(stdout, "bextr %ld (0x%lx)\n",ret, ret);

return 0;
}

On Thu, Jul 13, 2017 at 10:42 PM, Ricardo Ribalda Delgado
<ricardo.ribalda@gmail.com> wrote:
> Hi Richard
>
> The simple example works as expected, but my big application
> (gobject-introspection) still crashes with sigsegv :(.
>
> it seems to be something related to the bmi and tbm instructions. If I
> disable them in gcc ( -mno-bmi -mno-tbm), the application
> runs ok.
>
> A look at qemu's code does not show anything obvious, but I am not
> that familiar with qemu source yet to find something like this through
> static analysis.
>
> My plan (as soon as I have some time) is to create a small set of apps
> to validate bmi/tbm/ (Are you aware of something already existing for
> this?)
> My stupid guess is that maybe the ops are switched, or the flags are
> not properly modified.
>
> If you want, I can share the application that crashes with you, just
> be aware that the number of dependencies is considerable.
>
> BTW I can only run the gdb stub on version 2.8.0. On git HEAD I am getting only:
>
> Quit
> (gdb) c
> Continuing.
> warning: Remote failure reply: E22
>
> Program stopped.
> 0x00000040017bac07 in ?? ()
> (gdb) c
> Continuing.
>
>
>
> Thanks for your help, it is greatly appreciated!
>
> On Wed, Jul 12, 2017 at 9:29 PM, Richard Henderson <rth@twiddle.net> wrote:
>> The implementation of these two instructions was swapped.
>> At the same time, unify the setup of eflags for the insn group.
>>
>> Reported-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  target/i386/translate.c | 26 +++++++++-----------------
>>  1 file changed, 9 insertions(+), 17 deletions(-)
>>
>> diff --git a/target/i386/translate.c b/target/i386/translate.c
>> index 9d5f1c3..69d3787 100644
>> --- a/target/i386/translate.c
>> +++ b/target/i386/translate.c
>> @@ -4031,34 +4031,26 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
>>                  ot = mo_64_32(s->dflag);
>>                  gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
>>
>> +                tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
>>                  switch (reg & 7) {
>>                  case 1: /* blsr By,Ey */
>> -                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
>> +                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
>>                      tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>> -                    gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
>> -                    gen_op_update2_cc();
>> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>>                      break;
>> -
>>                  case 2: /* blsmsk By,Ey */
>> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
>> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
>> -                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_cc_src);
>> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
>> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>> +                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
>> +                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
>>                      break;
>> -
>>                  case 3: /* blsi By, Ey */
>> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
>> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
>> -                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_cc_src);
>> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
>> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>> +                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
>> +                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>>                      break;
>> -
>>                  default:
>>                      goto unknown_op;
>>                  }
>> +                tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
>> +                gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
>> +                set_cc_op(s, CC_OP_BMILGB + ot);
>>                  break;
>>
>>              default:
>> --
>> 2.9.4
>>
>
>
>
> --
> Ricardo Ribalda



-- 
Ricardo Ribalda

Re: [Qemu-devel] [PATCH v2] target/i386: Fix BLSR and BLSI
Posted by Ricardo Ribalda Delgado 6 years, 8 months ago
Hi

For completion. This is my poor man tbm test. It has run for 5 minutes
with no errors

gcc tbm.c -O3 -march=native  -o pc
gcc tbm.c -mtbm -O3

 for a in $(seq 0 65535); do /tmp/qemu/x86_64-linux-user/qemu-x86_64
-cpu qemu64,+tbm ./a.out $a >/tmp/res.qemu ; ./pc $a  > /tmp/res.pc ;
if !  diff  /tmp/res.pc /tmp/res.qemu; then echo $a ; fi ; done



#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>


long test_bextr(long src) {
unsigned char start =1, len=3;
return (src >> start) & ((1 << len)-1);
}

long test_blcfill(long val){

return val & (val + 1);
}

long test_blci(long val){

return val | ~(val + 1);
}

long test_blcic(long val){

return ~val & (val + 1);
}

long test_blcmsk(long val){

return val ^ (val + 1);
}

long test_blcs(long val){

return val | (val + 1);
}

long test_blsfill(long val){

return val | (val - 1);
}

long test_blsic(long val){

return ~val | (val - 1);
}

long test_t1mskc(long val){

return ~val | (val + 1);
}

long test_tzmsk(long val){

return ~val & (val - 1);
}

int main(int argc, char *argv[]) {
long op1;
long ret;

if (argc < 2) {
fprintf(stderr, "use %s op1 \n", argv[0]);
return -1;
}
op1 = strtoul(argv[1], NULL, 0);

fprintf(stdout, "op 1 %ld (0x%lx)\n", op1, op1);

ret = test_bextr(op1);
fprintf(stdout, "bextr %ld (0x%lx)\n",ret, ret);

ret = test_blcfill(op1);
fprintf(stdout, "blcfill %ld (0x%lx)\n",ret, ret);

ret = test_blci(op1);
fprintf(stdout, "blci %ld (0x%lx)\n",ret, ret);

ret = test_blcic(op1);
fprintf(stdout, "blcic %ld (0x%lx)\n",ret, ret);

ret = test_blcmsk(op1);
fprintf(stdout, "blcmsk %ld (0x%lx)\n",ret, ret);

ret = test_blcs(op1);
fprintf(stdout, "blcs %ld (0x%lx)\n",ret, ret);

ret = test_blsfill(op1);
fprintf(stdout, "blsfill %ld (0x%lx)\n",ret, ret);

ret = test_blsic(op1);
fprintf(stdout, "blsic %ld (0x%lx)\n",ret, ret);

ret = test_t1mskc(op1);
fprintf(stdout, "t1mskc %ld (0x%lx)\n",ret, ret);

ret = test_tzmsk(op1);
fprintf(stdout, "tzmsk %ld (0x%lx)\n",ret, ret);



return 0;
}

On Thu, Jul 13, 2017 at 11:55 PM, Ricardo Ribalda Delgado
<ricardo.ribalda@gmail.com> wrote:
> Hi again
>
> Some progress here, I think that I have found a bug in andn, I have
> already send a patch.
>
> I have made a rudimentary testcase for bmi. I will try tomorrow o
> build something similar for tbm.
>
> For reference, I am using this script:
>
> for a in $(seq 0 255); do for b in $(seq 0 255); do for c in $(seq 0
> 255); do /tmp/qemu/x86_64-linux-user/qemu-x86_64 -cpu qemu64,+bmi1
> ./a.out $a $b $c >/tmp/res.qemu; ./pc $a $b $c > /tmp/res.pc; if !
> diff  /tmp/res.pc /tmp/res.qemu; then echo $a $b $c; fi ; done ; done
> ; done
>
> with this build options
> gcc kk.c -mbmi -O3 -Wall
> gcc kk.c -march=native -O3 -Wall -o pc
>
> and this code:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <x86intrin.h>
>
>
> long test_blsr(long val){
>
> return (val & (val - 1));
> }
>
> long test_blsi(long val){
>
> return (val & (-val));
> }
>
> long test_blmsk(long val){
>
> return (val ^ (val -1));
> }
>
> long test_andn(long v1, long v2){
>
> return (~v1 & v2);
> }
>
> long test_tzcnt(long val) {
> #ifdef PC
> return val ? __builtin_ctz(val) : 32;
> #else
> return __tzcnt_u32(val);
> #endif
> }
>
> long test_bextr(long src, unsigned char start, unsigned char len) {
> #ifdef PC
> return (src >> start) & ((1 << len)-1);
> #else
> return __bextr_u32(src, start | len <<8);
> #endif
> }
>
> int main(int argc, char *argv[]) {
> long op1, op2, op3;
> long ret;
>
> if (argc < 4) {
> fprintf(stderr, "use %s op1 op2 op3\n", argv[0]);
> return -1;
> }
> op1 = strtoul(argv[1], NULL, 0);
> op2 = strtoul(argv[2], NULL, 0);
> op3 = strtoul(argv[3], NULL, 0);
>
> fprintf(stdout, "op 1 %ld (0x%lx)\n", op1, op1);
> fprintf(stdout, "op 2 %ld (0x%lx)\n", op2, op2);
>
> ret = test_blsr(op1);
> fprintf(stdout, "blsr %ld (0x%lx)\n",ret, ret);
>
> ret = test_blsi(op1);
> fprintf(stdout, "blsi %ld (0x%lx)\n",ret, ret);
>
> ret = test_blmsk(op1);
> fprintf(stdout, "blmsk %ld (0x%lx)\n",ret, ret);
>
> ret = test_andn(op1,op2);
> fprintf(stdout, "andn %ld (0x%lx)\n",ret, ret);
>
> ret = test_tzcnt(op1);
> fprintf(stdout, "tzcnt %ld (0x%lx)\n",ret, ret);
>
> ret = test_bextr(op1, op2, op3);
> fprintf(stdout, "bextr %ld (0x%lx)\n",ret, ret);
>
> return 0;
> }
>
> On Thu, Jul 13, 2017 at 10:42 PM, Ricardo Ribalda Delgado
> <ricardo.ribalda@gmail.com> wrote:
>> Hi Richard
>>
>> The simple example works as expected, but my big application
>> (gobject-introspection) still crashes with sigsegv :(.
>>
>> it seems to be something related to the bmi and tbm instructions. If I
>> disable them in gcc ( -mno-bmi -mno-tbm), the application
>> runs ok.
>>
>> A look at qemu's code does not show anything obvious, but I am not
>> that familiar with qemu source yet to find something like this through
>> static analysis.
>>
>> My plan (as soon as I have some time) is to create a small set of apps
>> to validate bmi/tbm/ (Are you aware of something already existing for
>> this?)
>> My stupid guess is that maybe the ops are switched, or the flags are
>> not properly modified.
>>
>> If you want, I can share the application that crashes with you, just
>> be aware that the number of dependencies is considerable.
>>
>> BTW I can only run the gdb stub on version 2.8.0. On git HEAD I am getting only:
>>
>> Quit
>> (gdb) c
>> Continuing.
>> warning: Remote failure reply: E22
>>
>> Program stopped.
>> 0x00000040017bac07 in ?? ()
>> (gdb) c
>> Continuing.
>>
>>
>>
>> Thanks for your help, it is greatly appreciated!
>>
>> On Wed, Jul 12, 2017 at 9:29 PM, Richard Henderson <rth@twiddle.net> wrote:
>>> The implementation of these two instructions was swapped.
>>> At the same time, unify the setup of eflags for the insn group.
>>>
>>> Reported-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>> ---
>>>  target/i386/translate.c | 26 +++++++++-----------------
>>>  1 file changed, 9 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/target/i386/translate.c b/target/i386/translate.c
>>> index 9d5f1c3..69d3787 100644
>>> --- a/target/i386/translate.c
>>> +++ b/target/i386/translate.c
>>> @@ -4031,34 +4031,26 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
>>>                  ot = mo_64_32(s->dflag);
>>>                  gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
>>>
>>> +                tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
>>>                  switch (reg & 7) {
>>>                  case 1: /* blsr By,Ey */
>>> -                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
>>> +                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
>>>                      tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>>> -                    gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
>>> -                    gen_op_update2_cc();
>>> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>>>                      break;
>>> -
>>>                  case 2: /* blsmsk By,Ey */
>>> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
>>> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
>>> -                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_cc_src);
>>> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
>>> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>>> +                    tcg_gen_subi_tl(cpu_T1, cpu_T0, 1);
>>> +                    tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
>>>                      break;
>>> -
>>>                  case 3: /* blsi By, Ey */
>>> -                    tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
>>> -                    tcg_gen_subi_tl(cpu_T0, cpu_T0, 1);
>>> -                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_cc_src);
>>> -                    tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
>>> -                    set_cc_op(s, CC_OP_BMILGB + ot);
>>> +                    tcg_gen_neg_tl(cpu_T1, cpu_T0);
>>> +                    tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
>>>                      break;
>>> -
>>>                  default:
>>>                      goto unknown_op;
>>>                  }
>>> +                tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
>>> +                gen_op_mov_reg_v(ot, s->vex_v, cpu_T0);
>>> +                set_cc_op(s, CC_OP_BMILGB + ot);
>>>                  break;
>>>
>>>              default:
>>> --
>>> 2.9.4
>>>
>>
>>
>>
>> --
>> Ricardo Ribalda
>
>
>
> --
> Ricardo Ribalda



-- 
Ricardo Ribalda