The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
`mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
cpu_exec_step_atomic for sequences like:
mov.l @r2, r3 ; load
mov r3, r7 ; save old value (mv_src == ld_dst)
add #1, r7 ; increment copy
mov.l r7, @r2 ; store
When mv_src == ld_dst the move merely copies the loaded value to
preserve it -- exactly the situation already accepted for the
`add Rm, Rn` form. The immediate form can be handled identically with
tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
instead of taking the slower single-step atomic fallback.
---
target/sh4/translate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git ./target/sh4/translate.c ./target/sh4/translate.c
index 5adf650744..d38a6bd352 100644
--- ./target/sh4/translate.c
+++ ./target/sh4/translate.c
@@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
break;
case 0x7000 ... 0x700f: /* add #imm,Rn */
- if (op_dst != B11_8 || mv_src >= 0) {
+ if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
goto fail;
}
op_opc = INDEX_op_add;
--
2.53.0
On 5/25/26 08:27, Matt Turner wrote:
> The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
> `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
> cpu_exec_step_atomic for sequences like:
>
> mov.l @r2, r3 ; load
> mov r3, r7 ; save old value (mv_src == ld_dst)
> add #1, r7 ; increment copy
> mov.l r7, @r2 ; store
>
> When mv_src == ld_dst the move merely copies the loaded value to
> preserve it -- exactly the situation already accepted for the
> `add Rm, Rn` form. The immediate form can be handled identically with
> tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
> instead of taking the slower single-step atomic fallback.
> ---
> target/sh4/translate.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git ./target/sh4/translate.c ./target/sh4/translate.c
> index 5adf650744..d38a6bd352 100644
> --- ./target/sh4/translate.c
> +++ ./target/sh4/translate.c
> @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
> break;
>
> case 0x7000 ... 0x700f: /* add #imm,Rn */
> - if (op_dst != B11_8 || mv_src >= 0) {
> + if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
> goto fail;
Hmm, but then you don't further constrain B11_8.
For the given example, afaics, B11_8 == op_dst, so we're failing only because mv_src >= 0.
Do we actually need the mv_src check? I'm guessing not.
r~
On Mon, May 25, 2026 at 8:55 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 5/25/26 08:27, Matt Turner wrote:
> > The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
> > `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
> > cpu_exec_step_atomic for sequences like:
> >
> > mov.l @r2, r3 ; load
> > mov r3, r7 ; save old value (mv_src == ld_dst)
> > add #1, r7 ; increment copy
> > mov.l r7, @r2 ; store
> >
> > When mv_src == ld_dst the move merely copies the loaded value to
> > preserve it -- exactly the situation already accepted for the
> > `add Rm, Rn` form. The immediate form can be handled identically with
> > tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
> > instead of taking the slower single-step atomic fallback.
> > ---
> > target/sh4/translate.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git ./target/sh4/translate.c ./target/sh4/translate.c
> > index 5adf650744..d38a6bd352 100644
> > --- ./target/sh4/translate.c
> > +++ ./target/sh4/translate.c
> > @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
> > break;
> >
> > case 0x7000 ... 0x700f: /* add #imm,Rn */
> > - if (op_dst != B11_8 || mv_src >= 0) {
> > + if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
> > goto fail;
>
> Hmm, but then you don't further constrain B11_8.
>
> For the given example, afaics, B11_8 == op_dst, so we're failing only because mv_src >= 0.
> Do we actually need the mv_src check? I'm guessing not.
I think you're exactly right that `mv_src >= 0` doesn't do anything by
itself. With this patch, the `mv_src >= 0` check is needed to prevent
a spurious failure for:
mov.l @r2, r3 ; ld_dst=3, op_dst=3 (set to ld_dst by default)
add #1, r3 ; B11_8=3
mov.l r3, @r2
With just mv_src != ld_dst (no mv_src >= 0 guard):
- mv_src = -1, ld_dst = 3
- -1 != 3 -> true -> spurious fail
On Tue, 26 May 2026 09:55:28 +0900,
Richard Henderson wrote:
>
> On 5/25/26 08:27, Matt Turner wrote:
> > The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
> > `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
> > cpu_exec_step_atomic for sequences like:
> >
> > mov.l @r2, r3 ; load
> > mov r3, r7 ; save old value (mv_src == ld_dst)
> > add #1, r7 ; increment copy
> > mov.l r7, @r2 ; store
> >
> > When mv_src == ld_dst the move merely copies the loaded value to
> > preserve it -- exactly the situation already accepted for the
> > `add Rm, Rn` form. The immediate form can be handled identically with
> > tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
> > instead of taking the slower single-step atomic fallback.
> > ---
> > target/sh4/translate.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git ./target/sh4/translate.c ./target/sh4/translate.c
> > index 5adf650744..d38a6bd352 100644
> > --- ./target/sh4/translate.c
> > +++ ./target/sh4/translate.c
> > @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
> > break;
> > case 0x7000 ... 0x700f: /* add #imm,Rn */
> > - if (op_dst != B11_8 || mv_src >= 0) {
> > + if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
> > goto fail;
>
> Hmm, but then you don't further constrain B11_8.
>
> For the given example, afaics, B11_8 == op_dst, so we're failing only
> because mv_src >= 0. Do we actually need the mv_src check? I'm
> guessing not.
>
>
> r~
This check is necessary because the intention is to cause failure in the
case of an unintended sequence of instructions.
If atomic addition results in such an instruction sequence, then it seems
like a reasonable change.
--
Yosinori Sato
On 5/27/26 06:27, yoshinori.sato@nifty.com wrote:
> On Tue, 26 May 2026 09:55:28 +0900,
> Richard Henderson wrote:
>>
>> On 5/25/26 08:27, Matt Turner wrote:
>>> The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
>>> `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
>>> cpu_exec_step_atomic for sequences like:
>>>
>>> mov.l @r2, r3 ; load
>>> mov r3, r7 ; save old value (mv_src == ld_dst)
>>> add #1, r7 ; increment copy
>>> mov.l r7, @r2 ; store
>>>
>>> When mv_src == ld_dst the move merely copies the loaded value to
>>> preserve it -- exactly the situation already accepted for the
>>> `add Rm, Rn` form. The immediate form can be handled identically with
>>> tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
>>> instead of taking the slower single-step atomic fallback.
>>> ---
>>> target/sh4/translate.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git ./target/sh4/translate.c ./target/sh4/translate.c
>>> index 5adf650744..d38a6bd352 100644
>>> --- ./target/sh4/translate.c
>>> +++ ./target/sh4/translate.c
>>> @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
>>> break;
>>> case 0x7000 ... 0x700f: /* add #imm,Rn */
>>> - if (op_dst != B11_8 || mv_src >= 0) {
>>> + if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
>>> goto fail;
>>
>> Hmm, but then you don't further constrain B11_8.
>>
>> For the given example, afaics, B11_8 == op_dst, so we're failing only
>> because mv_src >= 0. Do we actually need the mv_src check? I'm
>> guessing not.
>>
>>
>> r~
>
> This check is necessary because the intention is to cause failure in the
> case of an unintended sequence of instructions.
Of course, but what's the unintended sequence prevented by mv_src >= 0?
Matt has identified a sequence that *is* possible, which is prevented by that check.
r~
On Thu, 28 May 2026 00:26:49 +0900,
Richard Henderson wrote:
>
> On 5/27/26 06:27, yoshinori.sato@nifty.com wrote:
> > On Tue, 26 May 2026 09:55:28 +0900,
> > Richard Henderson wrote:
> >>
> >> On 5/25/26 08:27, Matt Turner wrote:
> >>> The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior
> >>> `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to
> >>> cpu_exec_step_atomic for sequences like:
> >>>
> >>> mov.l @r2, r3 ; load
> >>> mov r3, r7 ; save old value (mv_src == ld_dst)
> >>> add #1, r7 ; increment copy
> >>> mov.l r7, @r2 ; store
> >>>
> >>> When mv_src == ld_dst the move merely copies the loaded value to
> >>> preserve it -- exactly the situation already accepted for the
> >>> `add Rm, Rn` form. The immediate form can be handled identically with
> >>> tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline
> >>> instead of taking the slower single-step atomic fallback.
> >>> ---
> >>> target/sh4/translate.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git ./target/sh4/translate.c ./target/sh4/translate.c
> >>> index 5adf650744..d38a6bd352 100644
> >>> --- ./target/sh4/translate.c
> >>> +++ ./target/sh4/translate.c
> >>> @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, CPUSH4State *env)
> >>> break;
> >>> case 0x7000 ... 0x700f: /* add #imm,Rn */
> >>> - if (op_dst != B11_8 || mv_src >= 0) {
> >>> + if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) {
> >>> goto fail;
> >>
> >> Hmm, but then you don't further constrain B11_8.
> >>
> >> For the given example, afaics, B11_8 == op_dst, so we're failing only
> >> because mv_src >= 0. Do we actually need the mv_src check? I'm
> >> guessing not.
> >>
> >>
> >> r~
> >
> > This check is necessary because the intention is to cause failure in the
> > case of an unintended sequence of instructions.
>
> Of course, but what's the unintended sequence prevented by mv_src >= 0?
> Matt has identified a sequence that *is* possible, which is prevented by that check.
>
>
> r~
Decoding "mov r3, r7" in the example code snippet results in mv_src being 3,
which causes the current code to fail.
This is an excessive failure, so it should be made effective.
--
Yosinori Sato
On Mon, May 25, 2026 at 11:27 AM Matt Turner <mattst88@gmail.com> wrote: > > The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior > `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to > cpu_exec_step_atomic for sequences like: > > mov.l @r2, r3 ; load > mov r3, r7 ; save old value (mv_src == ld_dst) > add #1, r7 ; increment copy > mov.l r7, @r2 ; store > > When mv_src == ld_dst the move merely copies the loaded value to > preserve it -- exactly the situation already accepted for the > `add Rm, Rn` form. The immediate form can be handled identically with > tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline > instead of taking the slower single-step atomic fallback. > --- Signed-off-by: Matt Turner <mattst88@gmail.com>
© 2016 - 2026 Red Hat, Inc.