The SMLAD instruction is supposed to:
* signed multiply Rn[15:0] * Rm[15:0]
* signed multiply Rn[31:16] * Rm[31:16]
* perform a signed addition of the products and Ra
* set Rd to the low 32 bits of the theoretical
infinite-precision result
* set the Q flag if the sign-extension of Rd
would differ from the infinite-precision result
(ie on overflow)
Our current implementation doesn't quite do this, though: it performs
an addition of the products setting Q on overflow, and then it adds
Ra, again possibly setting Q. This sometimes incorrectly sets Q when
the architecturally mandated only-check-for-overflow-once algorithm
does not. For instance:
r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff
smlad r0, r1, r2, r3
This is (-32768 * -32768) + (-32768 * -32768) - 1
The products are both 0x4000_0000, so when added together as 32-bit
signed numbers they overflow (and QEMU sets Q), but because the
addition of Ra == -1 brings the total back down to 0x7fff_ffff
there is no overflow for the complete operation and setting Q is
incorrect.
Fix this edge case by resorting to 64-bit arithmetic for the
case where we need to add three values together.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201009144712.11187-1-peter.maydell@linaro.org
target/arm/translate.c | 58 ++++++++++++++++++++++++++++++++++--------
1 file changed, 48 insertions(+), 10 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d34c1d351a6..d8729e42c48 100644
@@ -7401,21 +7401,59 @@ static bool op_smlad(DisasContext *s, arg_rrrr *a, bool m_swap, bool sub)
gen_smul_dual(t1, t2);
if (sub) {
- /* This subtraction cannot overflow. */
+ /*
+ * This subtraction cannot overflow, so we can do a simple
+ * 32-bit subtraction and then a possible 32-bit saturating
+ * addition of Ra.
+ */
tcg_gen_sub_i32(t1, t1, t2);
+ tcg_temp_free_i32(t2);
+
+ if (a->ra != 15) {
+ t2 = load_reg(s, a->ra);
+ gen_helper_add_setq(t1, cpu_env, t1, t2);
+ tcg_temp_free_i32(t2);
+ }
+ } else if (a->ra == 15) {
+ /* Single saturation-checking addition */
+ gen_helper_add_setq(t1, cpu_env, t1, t2);
+ tcg_temp_free_i32(t2);
} else {
/*
- * This addition cannot overflow 32 bits; however it may
- * overflow considered as a signed operation, in which case
- * we must set the Q flag.
+ * We need to add the products and Ra together and then
+ * determine whether the final result overflowed. Doing
+ * this as two separate add-and-check-overflow steps incorrectly
+ * sets Q for cases like (-32768 * -32768) + (-32768 * -32768) + -1.
+ * Do all the arithmetic at 64-bits and then check for overflow.
*/
- gen_helper_add_setq(t1, cpu_env, t1, t2);
- }
- tcg_temp_free_i32(t2);
+ TCGv_i64 p64, q64;
+ TCGv_i32 t3, qf, one;
- if (a->ra != 15) {
- t2 = load_reg(s, a->ra);
- gen_helper_add_setq(t1, cpu_env, t1, t2);
+ p64 = tcg_temp_new_i64();
+ q64 = tcg_temp_new_i64();
+ tcg_gen_ext_i32_i64(p64, t1);
+ tcg_gen_ext_i32_i64(q64, t2);
+ tcg_gen_add_i64(p64, p64, q64);
+ load_reg_var(s, t2, a->ra);
+ tcg_gen_ext_i32_i64(q64, t2);
+ tcg_gen_add_i64(p64, p64, q64);
+ tcg_temp_free_i64(q64);
+
+ tcg_gen_extr_i64_i32(t1, t2, p64);
+ tcg_temp_free_i64(p64);
+ /*
+ * t1 is the low half of the result which goes into Rd.
+ * We have overflow and must set Q if the high half (t2)
+ * is different from the sign-extension of t1.
+ */
+ t3 = tcg_temp_new_i32();
+ tcg_gen_sari_i32(t3, t1, 31);
+ qf = load_cpu_field(QF);
+ one = tcg_const_i32(1);
+ tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
+ store_cpu_field(qf, QF);
+ tcg_temp_free_i32(one);
+ tcg_temp_free_i32(t3);
tcg_temp_free_i32(t2);
}
store_reg(s, a->rd, t1);
--
2.20.1