target/i386/tcg: more fixes

[PATCH 5/5] target/i386/tcg: remove register case from decode_modrm_address

Posted by Paolo Bonzini 1 month ago

Unlike the older code in translate.c, mod=11b *is* filtered out earlier
by decode_modrm.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/tcg/decode-new.c.inc | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 243df7e3735..7b595607fa7 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -2024,12 +2024,6 @@ static AddressParts decode_modrm_address(CPUX86State *env, DisasContext *s,
     rm = modrm & 7;
     base = rm | REX_B(s);
 
-    if (mod == 3) {
-        /* Normally filtered out earlier, but including this path
-           simplifies multi-byte nop, as well as bndcl, bndcu, bndcn.  */
-        goto done;
-    }
-
     switch (s->aflag) {
     case MO_64:
     case MO_32:
@@ -2127,7 +2121,6 @@ static AddressParts decode_modrm_address(CPUX86State *env, DisasContext *s,
         g_assert_not_reached();
     }
 
- done:
     return (AddressParts){ def_seg, base, index, scale, disp };
 }
 
-- 
2.52.0

Re: [PATCH 5/5] target/i386/tcg: remove register case from decode_modrm_address

Posted by Richard Henderson 1 month ago

On 1/8/26 02:14, Paolo Bonzini wrote:
> Unlike the older code in translate.c, mod=11b *is* filtered out earlier
> by decode_modrm.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   target/i386/tcg/decode-new.c.inc | 7 -------
>   1 file changed, 7 deletions(-)
> 
> diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
> index 243df7e3735..7b595607fa7 100644
> --- a/target/i386/tcg/decode-new.c.inc
> +++ b/target/i386/tcg/decode-new.c.inc
> @@ -2024,12 +2024,6 @@ static AddressParts decode_modrm_address(CPUX86State *env, DisasContext *s,
>       rm = modrm & 7;
>       base = rm | REX_B(s);
>   
> -    if (mod == 3) {
> -        /* Normally filtered out earlier, but including this path
> -           simplifies multi-byte nop, as well as bndcl, bndcu, bndcn.  */
> -        goto done;
> -    }
> -

I can see that this is true, but one has to dig around to see that it's so.

There's enough prep code duplicated between decode_modrm and decode_modrm_address that I 
think it would be worthwhile to merge the two functions and simplify.

Also, not do things like

         switch (mod) {
...
         default:
         case 2:

where the default isn't reachable.


r~

Re: [PATCH 5/5] target/i386/tcg: remove register case from decode_modrm_address

Posted by Paolo Bonzini 1 month ago

On 1/8/26 00:44, Richard Henderson wrote:
> On 1/8/26 02:14, Paolo Bonzini wrote:
>> Unlike the older code in translate.c, mod=11b *is* filtered out earlier
>> by decode_modrm.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   target/i386/tcg/decode-new.c.inc | 7 -------
>>   1 file changed, 7 deletions(-)
>>
>> diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/ 
>> decode-new.c.inc
>> index 243df7e3735..7b595607fa7 100644
>> --- a/target/i386/tcg/decode-new.c.inc
>> +++ b/target/i386/tcg/decode-new.c.inc
>> @@ -2024,12 +2024,6 @@ static AddressParts 
>> decode_modrm_address(CPUX86State *env, DisasContext *s,
>>       rm = modrm & 7;
>>       base = rm | REX_B(s);
>> -    if (mod == 3) {
>> -        /* Normally filtered out earlier, but including this path
>> -           simplifies multi-byte nop, as well as bndcl, bndcu, 
>> bndcn.  */
>> -        goto done;
>> -    }
>> -
> 
> I can see that this is true, but one has to dig around to see that it's so.
> 
> There's enough prep code duplicated between decode_modrm and 
> decode_modrm_address that I think it would be worthwhile to merge the 
> two functions and simplify.

Is there really so much?  I guess you could write the combined function like so:

     int modrm = get_modrm(s, env);
     int mod = (s->modrm >> 6) & 3;
     int rm = s->modrm & 7;
     if (mod == 3) {
         op->n = (rm | REX_B(s)) & reg_nb_mask(s, op->unit);
         return;
     }

     int def_seg = R_DS;
     int base = rm | REX_B(s);
     int index = -1;
     int scale = 0;
     targe_long disp = 0;
     bool is_vsib = decode->e.vex_class == 12;
     switch(s->aflag) {
         ...
     }

     op->has_ea = true;
     op->n = -1;
     decode->mem = (AddressParts){ def_seg, base, index, scale, disp };

but it does not seem so much better...  I guess I could do the modrm
split just once:

diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 41012659104..23cae451eb7 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -2247,21 +2247,19 @@
  
  /* Decompose an address.  */
  static AddressParts decode_modrm_address(CPUX86State *env, DisasContext *s,
-                                         int modrm, bool is_vsib)
+                                         int mod, int rm, bool is_vsib)
  {
-    int def_seg, base, index, scale, mod, rm;
+    int def_seg, base, index, scale;
      target_long disp;
      bool havesib;
  
+    assert(mod == 3);
      def_seg = R_DS;
+    base = rm | REX_B(s);
      index = -1;
      scale = 0;
      disp = 0;
  
-    mod = (modrm >> 6) & 3;
-    rm = modrm & 7;
-    base = rm | REX_B(s);
-
      switch (s->aflag) {
      case MO_64:
      case MO_32:
@@ -2379,16 +2376,18 @@
  static int decode_modrm(DisasContext *s, CPUX86State *env,
                          X86DecodedInsn *decode, X86DecodedOp *op)
  {
-    int modrm = s->modrm;
-    if ((modrm >> 6) == 3) {
-        op->n = (modrm & 7);
+    int modrm = get_modrm(s, env);
+    int mod = (s->modrm >> 6) & 3;
+    int rm = s->modrm & 7;
+    if (mod == 3) {
+        op->n = rm;
          if (op->unit != X86_OP_MMX) {
              op->n |= REX_B(s);
          }
      } else {
          op->has_ea = true;
          op->n = -1;
-        decode->mem = decode_modrm_address(env, s, get_modrm(s, env),
+        decode->mem = decode_modrm_address(env, s, mod, rm,
                                             decode->e.vex_class == 12);
      }
      return modrm;

> 
> Also, not do things like
> 
>          switch (mod) {
> ...
>          default:
>          case 2:
> 
> where the default isn't reachable.

I agree and was curious to when it dated... and it's March 2003 :) (commit 4b74fe1f001, "many fixes", which brought in the very first implementation of 16-bit addressing modes):

+        default:
+        case 7:
+            gen_op_movl_A0_reg[R_EBX]();
+            break;

Paolo

Re: [PATCH 5/5] target/i386/tcg: remove register case from decode_modrm_address

Posted by Richard Henderson 1 month ago

On 1/9/26 00:28, Paolo Bonzini wrote:
> On 1/8/26 00:44, Richard Henderson wrote:
>> On 1/8/26 02:14, Paolo Bonzini wrote:
>>> Unlike the older code in translate.c, mod=11b *is* filtered out earlier
>>> by decode_modrm.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>   target/i386/tcg/decode-new.c.inc | 7 -------
>>>   1 file changed, 7 deletions(-)
>>>
>>> diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/ decode-new.c.inc
>>> index 243df7e3735..7b595607fa7 100644
>>> --- a/target/i386/tcg/decode-new.c.inc
>>> +++ b/target/i386/tcg/decode-new.c.inc
>>> @@ -2024,12 +2024,6 @@ static AddressParts decode_modrm_address(CPUX86State *env, 
>>> DisasContext *s,
>>>       rm = modrm & 7;
>>>       base = rm | REX_B(s);
>>> -    if (mod == 3) {
>>> -        /* Normally filtered out earlier, but including this path
>>> -           simplifies multi-byte nop, as well as bndcl, bndcu, bndcn.  */
>>> -        goto done;
>>> -    }
>>> -
>>
>> I can see that this is true, but one has to dig around to see that it's so.
>>
>> There's enough prep code duplicated between decode_modrm and decode_modrm_address that I 
>> think it would be worthwhile to merge the two functions and simplify.
> 
> Is there really so much?  I guess you could write the combined function like so:
> 
>      int modrm = get_modrm(s, env);
>      int mod = (s->modrm >> 6) & 3;
>      int rm = s->modrm & 7;
>      if (mod == 3) {
>          op->n = (rm | REX_B(s)) & reg_nb_mask(s, op->unit);
>          return;
>      }
> 
>      int def_seg = R_DS;
>      int base = rm | REX_B(s);
>      int index = -1;
>      int scale = 0;
>      targe_long disp = 0;
>      bool is_vsib = decode->e.vex_class == 12;
>      switch(s->aflag) {
>          ...
>      }
> 
>      op->has_ea = true;
>      op->n = -1;
>      decode->mem = (AddressParts){ def_seg, base, index, scale, disp };

Yes, this is the sort of thing that I was thinking.

Where, in particular, the compiler gets to see that bounds of mod were [0-3], and that mod 
== 3 had been eliminated, so that later in the function the bounds of mod are [0-2] for 
those later switch statements.


> but it does not seem so much better...  I guess I could do the modrm
> split just once:
I suppose the compiler would be able to do the same bounds discovery with that, since 
it'll definitely inline the single-use function.  So at least that's an improvement over 
the current situation, where I think the two shift + mask extractions from a memory 
operand probably aren't CSE'd.

But the fact that you felt the need to add an assert there at the beginning suggests to me 
that it really is clearer as a single function.


r~