target/arm/translate.c | 58 ++++++++++++++++++++++++++++++++++-------- 1 file changed, 48 insertions(+), 10 deletions(-)
The SMLAD instruction is supposed to:
* signed multiply Rn[15:0] * Rm[15:0]
* signed multiply Rn[31:16] * Rm[31:16]
* perform a signed addition of the products and Ra
* set Rd to the low 32 bits of the theoretical
infinite-precision result
* set the Q flag if the sign-extension of Rd
would differ from the infinite-precision result
(ie on overflow)
Our current implementation doesn't quite do this, though: it performs
an addition of the products setting Q on overflow, and then it adds
Ra, again possibly setting Q. This sometimes incorrectly sets Q when
the architecturally mandated only-check-for-overflow-once algorithm
does not. For instance:
r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff
smlad r0, r1, r2, r3
This is (-32768 * -32768) + (-32768 * -32768) - 1
The products are both 0x4000_0000, so when added together as 32-bit
signed numbers they overflow (and QEMU sets Q), but because the
addition of Ra == -1 brings the total back down to 0x7fff_ffff
there is no overflow for the complete operation and setting Q is
incorrect.
Fix this edge case by resorting to 64-bit arithmetic for the
case where we need to add three values together.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/translate.c | 58 ++++++++++++++++++++++++++++++++++--------
1 file changed, 48 insertions(+), 10 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d34c1d351a6..d8729e42c48 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7401,21 +7401,59 @@ static bool op_smlad(DisasContext *s, arg_rrrr *a, bool m_swap, bool sub)
gen_smul_dual(t1, t2);
if (sub) {
- /* This subtraction cannot overflow. */
+ /*
+ * This subtraction cannot overflow, so we can do a simple
+ * 32-bit subtraction and then a possible 32-bit saturating
+ * addition of Ra.
+ */
tcg_gen_sub_i32(t1, t1, t2);
+ tcg_temp_free_i32(t2);
+
+ if (a->ra != 15) {
+ t2 = load_reg(s, a->ra);
+ gen_helper_add_setq(t1, cpu_env, t1, t2);
+ tcg_temp_free_i32(t2);
+ }
+ } else if (a->ra == 15) {
+ /* Single saturation-checking addition */
+ gen_helper_add_setq(t1, cpu_env, t1, t2);
+ tcg_temp_free_i32(t2);
} else {
/*
- * This addition cannot overflow 32 bits; however it may
- * overflow considered as a signed operation, in which case
- * we must set the Q flag.
+ * We need to add the products and Ra together and then
+ * determine whether the final result overflowed. Doing
+ * this as two separate add-and-check-overflow steps incorrectly
+ * sets Q for cases like (-32768 * -32768) + (-32768 * -32768) + -1.
+ * Do all the arithmetic at 64-bits and then check for overflow.
*/
- gen_helper_add_setq(t1, cpu_env, t1, t2);
- }
- tcg_temp_free_i32(t2);
+ TCGv_i64 p64, q64;
+ TCGv_i32 t3, qf, one;
- if (a->ra != 15) {
- t2 = load_reg(s, a->ra);
- gen_helper_add_setq(t1, cpu_env, t1, t2);
+ p64 = tcg_temp_new_i64();
+ q64 = tcg_temp_new_i64();
+ tcg_gen_ext_i32_i64(p64, t1);
+ tcg_gen_ext_i32_i64(q64, t2);
+ tcg_gen_add_i64(p64, p64, q64);
+ load_reg_var(s, t2, a->ra);
+ tcg_gen_ext_i32_i64(q64, t2);
+ tcg_gen_add_i64(p64, p64, q64);
+ tcg_temp_free_i64(q64);
+
+ tcg_gen_extr_i64_i32(t1, t2, p64);
+ tcg_temp_free_i64(p64);
+ /*
+ * t1 is the low half of the result which goes into Rd.
+ * We have overflow and must set Q if the high half (t2)
+ * is different from the sign-extension of t1.
+ */
+ t3 = tcg_temp_new_i32();
+ tcg_gen_sari_i32(t3, t1, 31);
+ qf = load_cpu_field(QF);
+ one = tcg_const_i32(1);
+ tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
+ store_cpu_field(qf, QF);
+ tcg_temp_free_i32(one);
+ tcg_temp_free_i32(t3);
tcg_temp_free_i32(t2);
}
store_reg(s, a->rd, t1);
--
2.20.1
On 10/9/20 9:47 AM, Peter Maydell wrote:
> + /*
> + * t1 is the low half of the result which goes into Rd.
> + * We have overflow and must set Q if the high half (t2)
> + * is different from the sign-extension of t1.
> + */
> + t3 = tcg_temp_new_i32();
> + tcg_gen_sari_i32(t3, t1, 31);
> + qf = load_cpu_field(QF);
> + one = tcg_const_i32(1);
> + tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
> + store_cpu_field(qf, QF);
This works, however, QF is not restricted to {0,1}.
Perhaps this is simpler?
qf = load_cpu_field(QF);
/* t1 is the low half; extract the sign bit */
tcg_gen_shri_i32(t3, t1, 31);
/* t2 is the high half; must be 0 or -1,
and the extension of the sign bit. adding them
must equal 0 (0 + 0 = 0; -1 + 1 = 0). */
tcg_gen_add_i32(t3, t3, t2);
/* Any non-zero result sets QF */
tcg_gen_or_i32(qf, qf, t3);
store_cpu_field(qf, QF);
Either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
On Fri, 9 Oct 2020 at 18:57, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 10/9/20 9:47 AM, Peter Maydell wrote:
> > + /*
> > + * t1 is the low half of the result which goes into Rd.
> > + * We have overflow and must set Q if the high half (t2)
> > + * is different from the sign-extension of t1.
> > + */
> > + t3 = tcg_temp_new_i32();
> > + tcg_gen_sari_i32(t3, t1, 31);
> > + qf = load_cpu_field(QF);
> > + one = tcg_const_i32(1);
> > + tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
> > + store_cpu_field(qf, QF);
>
> This works, however, QF is not restricted to {0,1}.
I'm not sure if you mean "this code doesn't maintain that
invariant" or "there is no such invariant". If the former,
the declaration of the field in cpu.h disagrees:
uint32_t QF; /* 0 or 1 */
If the latter, I think the code does preserve the invariant.
Either we set QF to 'one', or we write back the value it had
to start with (which must have been {0,1}, but we don't really
care, because we don't want to touch it.)
> Perhaps this is simpler?
>
> qf = load_cpu_field(QF);
> /* t1 is the low half; extract the sign bit */
> tcg_gen_shri_i32(t3, t1, 31);
> /* t2 is the high half; must be 0 or -1,
> and the extension of the sign bit. adding them
> must equal 0 (0 + 0 = 0; -1 + 1 = 0). */
> tcg_gen_add_i32(t3, t3, t2);
> /* Any non-zero result sets QF */
> tcg_gen_or_i32(qf, qf, t3);
> store_cpu_field(qf, QF);
This looks like it can write something to QF that isn't 0 or 1.
thanks
-- PMM
On 10/9/20 1:47 PM, Peter Maydell wrote:
> On Fri, 9 Oct 2020 at 18:57, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 10/9/20 9:47 AM, Peter Maydell wrote:
>>> + /*
>>> + * t1 is the low half of the result which goes into Rd.
>>> + * We have overflow and must set Q if the high half (t2)
>>> + * is different from the sign-extension of t1.
>>> + */
>>> + t3 = tcg_temp_new_i32();
>>> + tcg_gen_sari_i32(t3, t1, 31);
>>> + qf = load_cpu_field(QF);
>>> + one = tcg_const_i32(1);
>>> + tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
>>> + store_cpu_field(qf, QF);
>>
>> This works, however, QF is not restricted to {0,1}.
>
> I'm not sure if you mean "this code doesn't maintain that
> invariant" or "there is no such invariant". If the former,
> the declaration of the field in cpu.h disagrees:
> uint32_t QF; /* 0 or 1 */
Oops, I confused QF with QC, the neon saturation bit.
r~
On Fri, 9 Oct 2020 at 15:47, Peter Maydell <peter.maydell@linaro.org> wrote: > + tcg_gen_extr_i64_i32(t1, t2, p64); Oh, I forgot to mention, but it looks like extr_i64_i32 isn't documented in tcg/README. Is that because it isn't really a TCG IR op, or just an omission? thanks -- PMM
On 10/9/20 1:48 PM, Peter Maydell wrote: > On Fri, 9 Oct 2020 at 15:47, Peter Maydell <peter.maydell@linaro.org> wrote: >> + tcg_gen_extr_i64_i32(t1, t2, p64); > > Oh, I forgot to mention, but it looks like extr_i64_i32 > isn't documented in tcg/README. Is that because it isn't > really a TCG IR op, or just an omission? Because it's not an IR op. It's the combo of extrl and extrh. r~
On Fri, 9 Oct 2020 at 23:36, Richard Henderson <richard.henderson@linaro.org> wrote: > > On 10/9/20 1:48 PM, Peter Maydell wrote: > > On Fri, 9 Oct 2020 at 15:47, Peter Maydell <peter.maydell@linaro.org> wrote: > >> + tcg_gen_extr_i64_i32(t1, t2, p64); > > > > Oh, I forgot to mention, but it looks like extr_i64_i32 > > isn't documented in tcg/README. Is that because it isn't > > really a TCG IR op, or just an omission? > > Because it's not an IR op. It's the combo of extrl and extrh. We really should figure out somewhere to document the interface and operations that frontends can use. (Among other important things, extr_i64_i32 is usable generically, but extrl/extrh are 64-bit hosts only according to tcg/README, so if you trusted the README docs then you'd end up trying to synthesize this out of shifts and trunc.) thanks -- PMM
© 2016 - 2025 Red Hat, Inc.