target/sh4: decode_gusa: recognize add#imm with prior mov Rm, Rn

[PATCH] target/sh4: decode_gusa: recognize add#imm with prior mov Rm, Rn

Posted by Matt Turner 5 days, 1 hour ago

The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
`mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
cpu_exec_step_atomic for sequences like:

  mov.l  @r2, r3      ; load
  mov    r3, r7       ; save old value (mv_src == ld_dst)
  add    #1, r7       ; increment copy
  mov.l  r7, @r2      ; store

When mv_src == ld_dst the move merely copies the loaded value to
preserve it -- exactly the situation already accepted for the
`add Rm, Rn` form. The immediate form can be handled identically with
tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
instead of taking the slower single-step atomic fallback.
---
 target/sh4/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git ./target/sh4/translate.c ./target/sh4/translate.c
index 5adf650744..d38a6bd352 100644
--- ./target/sh4/translate.c
+++ ./target/sh4/translate.c
@@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
         break;
 
     case 0x7000 ... 0x700f: /* add #imm,Rn */
-        if (op_dst != B11_8 || mv_src >= 0) {
+        if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
             goto fail;
         }
         op_opc = INDEX_op_add;
-- 
2.53.0

Re: [PATCH] target/sh4: decode_gusa: recognize add#imm with prior mov Rm,Rn

Posted by Richard Henderson 4 days, 16 hours ago

On 5/25/26 08:27, Matt Turner wrote:
> The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
> `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
> cpu_exec_step_atomic for sequences like:
> 
>    mov.l  @r2, r3      ; load
>    mov    r3, r7       ; save old value (mv_src == ld_dst)
>    add    #1, r7       ; increment copy
>    mov.l  r7, @r2      ; store
> 
> When mv_src == ld_dst the move merely copies the loaded value to
> preserve it -- exactly the situation already accepted for the
> `add Rm, Rn` form. The immediate form can be handled identically with
> tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
> instead of taking the slower single-step atomic fallback.
> ---
>   target/sh4/translate.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git ./target/sh4/translate.c ./target/sh4/translate.c
> index 5adf650744..d38a6bd352 100644
> --- ./target/sh4/translate.c
> +++ ./target/sh4/translate.c
> @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
>           break;
>   
>       case 0x7000 ... 0x700f: /* add #imm,Rn */
> -        if (op_dst != B11_8 || mv_src >= 0) {
> +        if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
>               goto fail;

Hmm, but then you don't further constrain B11_8.

For the given example, afaics, B11_8 == op_dst, so we're failing only because mv_src >= 0. 
  Do we actually need the mv_src check?  I'm guessing not.


r~

Re: [PATCH] target/sh4: decode_gusa: recognize add#imm with prior mov Rm, Rn

Posted by Matt Turner 1 day, 17 hours ago

On Mon, May 25, 2026 at 8:55 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 5/25/26 08:27, Matt Turner wrote:
> > The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
> > `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
> > cpu_exec_step_atomic for sequences like:
> >
> >    mov.l  @r2, r3      ; load
> >    mov    r3, r7       ; save old value (mv_src == ld_dst)
> >    add    #1, r7       ; increment copy
> >    mov.l  r7, @r2      ; store
> >
> > When mv_src == ld_dst the move merely copies the loaded value to
> > preserve it -- exactly the situation already accepted for the
> > `add Rm, Rn` form. The immediate form can be handled identically with
> > tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
> > instead of taking the slower single-step atomic fallback.
> > ---
> >   target/sh4/translate.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git ./target/sh4/translate.c ./target/sh4/translate.c
> > index 5adf650744..d38a6bd352 100644
> > --- ./target/sh4/translate.c
> > +++ ./target/sh4/translate.c
> > @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
> >           break;
> >
> >       case 0x7000 ... 0x700f: /* add #imm,Rn */
> > -        if (op_dst != B11_8 || mv_src >= 0) {
> > +        if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
> >               goto fail;
>
> Hmm, but then you don't further constrain B11_8.
>
> For the given example, afaics, B11_8 == op_dst, so we're failing only because mv_src >= 0.
>   Do we actually need the mv_src check?  I'm guessing not.

I think you're exactly right that `mv_src >= 0` doesn't do anything by
itself. With this patch, the `mv_src >= 0` check is needed to prevent
a spurious failure for:

mov.l  @r2, r3   ; ld_dst=3, op_dst=3 (set to ld_dst by default)
add    #1, r3    ; B11_8=3
mov.l  r3, @r2

With just mv_src != ld_dst (no mv_src >= 0 guard):
- mv_src = -1, ld_dst = 3
- -1 != 3 -> true -> spurious fail

Re: [PATCH] target/sh4: decode_gusa: recognize add#imm with prior mov Rm, Rn

Posted by yoshinori.sato@nifty.com 3 days, 3 hours ago

On Tue, 26 May 2026 09:55:28 +0900,
Richard Henderson wrote:
> 
> On 5/25/26 08:27, Matt Turner wrote:
> > The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
> > `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
> > cpu_exec_step_atomic for sequences like:
> > 
> >    mov.l  @r2, r3      ; load
> >    mov    r3, r7       ; save old value (mv_src == ld_dst)
> >    add    #1, r7       ; increment copy
> >    mov.l  r7, @r2      ; store
> > 
> > When mv_src == ld_dst the move merely copies the loaded value to
> > preserve it -- exactly the situation already accepted for the
> > `add Rm, Rn` form. The immediate form can be handled identically with
> > tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
> > instead of taking the slower single-step atomic fallback.
> > ---
> >   target/sh4/translate.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git ./target/sh4/translate.c ./target/sh4/translate.c
> > index 5adf650744..d38a6bd352 100644
> > --- ./target/sh4/translate.c
> > +++ ./target/sh4/translate.c
> > @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
> >           break;
> >         case 0x7000 ... 0x700f: /* add #imm,Rn */
> > -        if (op_dst != B11_8 || mv_src >= 0) {
> > +        if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
> >               goto fail;
> 
> Hmm, but then you don't further constrain B11_8.
> 
> For the given example, afaics, B11_8 == op_dst, so we're failing only
> because mv_src >= 0.  Do we actually need the mv_src check?  I'm
> guessing not.
> 
> 
> r~

This check is necessary because the intention is to cause failure in the
case of an unintended sequence of instructions.
If atomic addition results in such an instruction sequence, then it seems
like a reasonable change.

-- 
Yosinori Sato

Re: [PATCH] target/sh4: decode_gusa: recognize add#imm with prior mov Rm,Rn

Posted by Richard Henderson 3 days, 1 hour ago

On 5/27/26 06:27, yoshinori.sato@nifty.com wrote:
> On Tue, 26 May 2026 09:55:28 +0900,
> Richard Henderson wrote:
>>
>> On 5/25/26 08:27, Matt Turner wrote:
>>> The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
>>> `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
>>> cpu_exec_step_atomic for sequences like:
>>>
>>>     mov.l  @r2, r3      ; load
>>>     mov    r3, r7       ; save old value (mv_src == ld_dst)
>>>     add    #1, r7       ; increment copy
>>>     mov.l  r7, @r2      ; store
>>>
>>> When mv_src == ld_dst the move merely copies the loaded value to
>>> preserve it -- exactly the situation already accepted for the
>>> `add Rm, Rn` form. The immediate form can be handled identically with
>>> tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
>>> instead of taking the slower single-step atomic fallback.
>>> ---
>>>    target/sh4/translate.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git ./target/sh4/translate.c ./target/sh4/translate.c
>>> index 5adf650744..d38a6bd352 100644
>>> --- ./target/sh4/translate.c
>>> +++ ./target/sh4/translate.c
>>> @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
>>>            break;
>>>          case 0x7000 ... 0x700f: /* add #imm,Rn */
>>> -        if (op_dst != B11_8 || mv_src >= 0) {
>>> +        if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
>>>                goto fail;
>>
>> Hmm, but then you don't further constrain B11_8.
>>
>> For the given example, afaics, B11_8 == op_dst, so we're failing only
>> because mv_src >= 0.  Do we actually need the mv_src check?  I'm
>> guessing not.
>>
>>
>> r~
> 
> This check is necessary because the intention is to cause failure in the
> case of an unintended sequence of instructions.

Of course, but what's the unintended sequence prevented by mv_src >= 0?
Matt has identified a sequence that *is* possible, which is prevented by that check.


r~

Re: [PATCH] target/sh4: decode_gusa: recognize add#imm with prior mov Rm, Rn

Posted by yoshinori.sato@nifty.com 2 days, 1 hour ago

On Thu, 28 May 2026 00:26:49 +0900,
Richard Henderson wrote:
> 
> On 5/27/26 06:27, yoshinori.sato@nifty.com wrote:
> > On Tue, 26 May 2026 09:55:28 +0900,
> > Richard Henderson wrote:
> >> 
> >> On 5/25/26 08:27, Matt Turner wrote:
> >>> The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
> >>> `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
> >>> cpu_exec_step_atomic for sequences like:
> >>> 
> >>>     mov.l  @r2, r3      ; load
> >>>     mov    r3, r7       ; save old value (mv_src == ld_dst)
> >>>     add    #1, r7       ; increment copy
> >>>     mov.l  r7, @r2      ; store
> >>> 
> >>> When mv_src == ld_dst the move merely copies the loaded value to
> >>> preserve it -- exactly the situation already accepted for the
> >>> `add Rm, Rn` form. The immediate form can be handled identically with
> >>> tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
> >>> instead of taking the slower single-step atomic fallback.
> >>> ---
> >>>    target/sh4/translate.c | 2 +-
> >>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>> 
> >>> diff --git ./target/sh4/translate.c ./target/sh4/translate.c
> >>> index 5adf650744..d38a6bd352 100644
> >>> --- ./target/sh4/translate.c
> >>> +++ ./target/sh4/translate.c
> >>> @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
> >>>            break;
> >>>          case 0x7000 ... 0x700f: /* add #imm,Rn */
> >>> -        if (op_dst != B11_8 || mv_src >= 0) {
> >>> +        if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
> >>>                goto fail;
> >> 
> >> Hmm, but then you don't further constrain B11_8.
> >> 
> >> For the given example, afaics, B11_8 == op_dst, so we're failing only
> >> because mv_src >= 0.  Do we actually need the mv_src check?  I'm
> >> guessing not.
> >> 
> >> 
> >> r~
> > 
> > This check is necessary because the intention is to cause failure in the
> > case of an unintended sequence of instructions.
> 
> Of course, but what's the unintended sequence prevented by mv_src >= 0?
> Matt has identified a sequence that *is* possible, which is prevented by that check.
> 
> 
> r~

Decoding "mov r3, r7" in the example code snippet results in mv_src being 3,
which causes the current code to fail.
This is an excessive failure, so it should be made effective.

-- 
Yosinori Sato

Re: [PATCH] target/sh4: decode_gusa: recognize add#imm with prior mov Rm, Rn

Posted by Matt Turner 4 days, 19 hours ago

On Mon, May 25, 2026 at 11:27 AM Matt Turner <mattst88@gmail.com> wrote:
>
> The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
> `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
> cpu_exec_step_atomic for sequences like:
>
>   mov.l  @r2, r3      ; load
>   mov    r3, r7       ; save old value (mv_src == ld_dst)
>   add    #1, r7       ; increment copy
>   mov.l  r7, @r2      ; store
>
> When mv_src == ld_dst the move merely copies the loaded value to
> preserve it -- exactly the situation already accepted for the
> `add Rm, Rn` form. The immediate form can be handled identically with
> tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
> instead of taking the slower single-step atomic fallback.
> ---

Signed-off-by: Matt Turner <mattst88@gmail.com>